A Clinical Study of MLC-Based IMRT Lung Dose Calculation Accuracy on Plan Evaluation Parameters

It is well known that intensity modulated radiation therapy (IMRT) provides improved target coverage while sparing surrounding healthy tissue compared to three dimensional conformal radiotherapy. Delivering IMRT using a multileaf collimator (MLC) introduces some complexities (Oldham and Webb, 1997; Ma et al., 2000) such as leaf leakage and photon back scattered to the monitor chamber from the MLC leaves (Chui et al., 1994; Wang et al., 1996; Oldham and Webb, 1997; Hounsell, 1998; Jiang et al., 2001). Pencil beam algorithms (Boyer and Mok, 1985; Mohan et al., 1986; Ahnesjo et al., 1992) are often employed in IMRT treatment planning systems (TPS). Because pencil beam algorithms do not account for electronic disequilibrium in heterogeneous media (e.g., lung), the dose accuracy may be significantly affected(Wang et al., 1996; Ma et al., 1999; Wang et al., 2002; Ma et al., 2004; Yang et al., 2005).


Introduction
It is well known that intensity modulated radiation therapy (IMRT) provides improved target coverage while sparing surrounding healthy tissue compared to three dimensional conformal radiotherapy. Delivering IMRT using a multileaf collimator (MLC) introduces some complexities (Oldham and Webb, 1997;Ma et al., 2000) such as leaf leakage and photon back scattered to the monitor chamber from the MLC leaves (Chui et al., 1994;Wang et al., 1996;Oldham and Webb, 1997;Hounsell, 1998;Jiang et al., 2001). Pencil beam algorithms (Boyer and Mok, 1985;Mohan et al., 1986;Ahnesjo et al., 1992) are often employed in IMRT treatment planning systems (TPS). Because pencil beam algorithms do not account for electronic disequilibrium in heterogeneous media (e.g., lung), the dose accuracy may be significantly affected (Wang et al., 1996;Ma et al., 1999;Wang et al., 2002;Ma et al., 2004;Yang et al., 2005).
Monte Carlo (MC) methods have been shown to be the most accurate method for radiotherapy dose calculations in homogeneous and heterogeneous geometries (Rogers et al., 1995;Ma et al., 1999;Siebers et al., 2000;McDermott et al., 2003;Rogers, 2006). The use of MC in radiation therapy has increased in the last two decades due in part to advancements in hardware and MC algorithms. At the same time, several MC codes have been developed for treatment planning (Wang et al., 1998;Ma et al., 1999), and for verification of IMRT dose distributions (Ma et al., 2003;Boudreau et al., 2005). Monte Carlo and pencil beam (PB) IMRT dose calculations have been previously compared for prostate and head and neck patients (Ma et al., 1999;Wang et al., 2002;Boudreau et al., 2005;Yang et al., 2005). It has been shown that pencil beam calculations overestimate the dose by 5-10% compared to MC calculations and measurements in slab lung phantoms (Boyer and Mok, 1985 Previous studies were primarily devoted to implementing MC in treatment planning (Wang et al., 1998;Ma et al., 1999;Jelen et al., 2005). To our knowledge only two studies were devoted to evaluating lung IMRT dose calculation accuracy in a clinical setting. Both studies used the MC method to verify the IMRT lung dose calculation. One study used the MSKCC TPS (Wang et al., 2002) and the other study used the Helax PB, Helax-TMS's, and Helax-CC TPSs (Vanderstraeten et al., 2006 compared in terms of the mean target dose and cumulated dose volume histogram (cDVH) using dose volume points, D x and V x . A problem is that these dose points do not appropriately penalize target underdosing or critical structure overdosing. Furthermore, comparisons based on the mean target dose for relatively small dose nonuniformity or comparisons based on the minimum target dose for large dose heterogeneities are not adequate and are first order approximations as explained by Niemierko, (1997). To circumvent these problems, the Equivalent Uniform Dose (EUD) can used to perform dose comparison and patient plan evaluation. The EUD is based on the assumption that two dose distributions are equal if they have the same radiobiological effect. EUD also describes the volume of interest in a single value which is more realistic and useful for comparison.
In this work, an expanded study using 25 lung cancer IMRT patients and different number of fields is performed to evaluate the effect of dose distribution accuracy on treatment plan evaluation parameters. Pencil beam and Monte Carlo algorithms are used to compare the target mean dose, dose-volume points, and the EUD for targets and OARs. In order to attribute any dose differences related to internal patient scatter versus linac head modeling, MC ion chamber and pencil beam doses were compared and reported for a homogenous water phantom.

Material and Methods
Twenty five lung cancer IMRT patients, which had been previously treated at Massachusetts General Hospital (MGH) were randomly selected for this study. The tumor size and locations vary from patient to patient. The specific parameters for each patient are shown in Table 1. The clinical treatment plans were based on the patient CT scanned in supine position under normal free breathing condition. All patient plans used in this study were also used for treatment. Lungs, spinal cord, esophagus, and heart were contoured as OARs by the treating physician. The inverse treatment planning system CORVUS v6.2 (North American Scientific, Inc., Chatsworth, CA, USA) was used for IMRT plan optimization and final dose calculation. The CORVUS inverse treatment planning system, which is widely used throughout the world, utilizes a finite pencil beam algorithm for optimization and dose calculation. From here onward we refer to the CORVUS TPS finite size pencil beam algorithm as PB. Each patient's plan was optimized using 6 MV photon beam for a Varian 21 EX linac equipped with the Millennium MLC. Five to 12 gantry angles were used with a beamlet size of 1x1 cm 2 for all treatment plans.
Once the plans were approved by the physician, the point dose for each plan was verified in a homogenous solid water phantom. The process was done by optimizing the plan to the water phantom, then measuring the dose to a point in the water phantom using a farmer ionization chamber (0.6 cc active volume). The measurement point of the ion chamber for each patient was chosen in a homogenous dose region, which roughly corresponds to the target location in the patient plan. The ion chamber fluence perturbation correction factor was considered small because the measurement point is chosen in a homogenous dose region. Also, the combination of multiple-beam treatment plans reduce the ionization correction factor (Boudreau et al., 2005). Plans were accepted for treatment if the point dose was within 4% difference (PB versus measurements). For the purpose of comparison, the dose from PB, MC, and the measurements was normalized to the prescribed dose. Once a plan was approved for treatment, the patient's CT data as well as the water phantom's CT data and leaf sequence files were exported to the MC work station in order to re-compute the dose distribution to both the water phantom and patients.

Monte carlo calculations
The Monte Carlo calculation was performed in two main steps. The first step was to model the linac treatment head. This process consisted of selecting the beam energy and beam full width half maximum incident on the target. A phase space file was generated below the jaws for 40×40 cm 2 field size, based on the linac modeling and the selected beam parameters (Aljarrah et al., 2006). The EGSnrc/BEAMnrc code system was used to model the Varian 21EX 6 MV photon beam (Rogers et al., 1995). The transport parameters for the BEAMnrc simulation were set as ECUT = 0.7 MeV, PCUT = 0.01 MeV, and the number of Bremsstrahlung splitting is set to 20. The dimensions and materials for the accelerator components were set based on the manufacturer's specifications. The parameters are
In the second step, the generated phase space file was used as a source for the photon beam to calculate dose distribution in the patients and the water phantom as well. The MCSIM code, which is an EGS4/PRESTA user code, was used for dose calculation (Ma, 2004). The simulations were carried out using the following transport parameters: ECUT = 0.7 MeV, PCUT = 0.01 MeV, and ESTEPE = 0.04. The average leaf leakage was used as 1.8% based on ion chamber measurements and the transition to the area under the jaws was considered zero. In all patient calculations, we have kept the statistical uncertainty to be 2% or less so as not to significantly affect isodose lines, DVHs, or biological indices .
The absolute dose was calculated by converting the MC calculated dose per fluence to dose per MU at the linac calibration conditions in water (depth of 5 cm, 10×10 cm 2 field size, 100 cm SSD, and 100 cGy for 100 MU).

Plan comparison and dose reporting
The dose to a point in the solid water phantom was calculated using MC and PB for each patient's plan and measured using the ion chamber. Both TPS-PB dose and chamber measurements were compared to MC predicted dose.
Patient IMRT dose distributions and plan evaluation parameters calculated with the MC method were compared to the PB calculation as the PB result minus the MC result expressed as a percentage. Dosevolume histograms (DVHs) were created for each patient, using MC simulations and the PB inverse planning system, for the following targets and OARs: GTV, PTV, ipsilateral and contralateral lungs, spinal cord, esophagus, and heart. The mean dose to each target and OARs was compared. Due to their clinical utility to predict OAR toxicity, dose volume points were used for comparison. For lung, volume dose points V 20 , and V 30 (the lung volume that receives at least 20 and 30 Gy, respectively) were used. For esophagus, D 33 , heart D 33 , (dose received by 33% of the volume), and spinal cord maximum dose (D max ) were used. The spinal cord maximum dose was defined as spinal cord D 2 (dose received by 2% of the spinal cord volume). To evaluate the difference of the stability of dose-volume points that denote target coverage, the dose volume points D 95 and D 98 (dose received by 95 and 98% of the volume, respectively) were compared.
The comparison based on the mean dose difference is unbiased when the dose is uniformly distributed across the region of interest. This, however, is not the situation for most IMRT dose distributions. The EUD is a promising function for IMRT dose distribution plan comparisons. EUD values are controlled by the parameter a, where EUD approaches maximum dose as (a >> 1), minimum dose as (a << 1), and equal to the mean dose as a = 1. Previous studies show that EUD function can be effectively used to represent the characteristics of treatment plans seen with dose-volume constraints and therefore may be effective substitutes in IMRT treatment planning and plan evaluation. A strong correlation between dose-volume constraints and EUD was found in previous studies as well ( The equivalent uniform dose (Niemierko, 1997) was calculated from the MC and the TPS differential dose-volume histograms for each structure and compared. The following formula was used to calculate the EUD values: where d i and v i are the dose and the volume of the i th voxel, and a is the tumor or structure control parameter that describes the effect of dose heterogeneity in the structure. Following Niemierko, the control parameters are -15 for tumor, 1 for lung, 12 for spinal cord, 5 for heart, and 8 for esophagus (Niemierko, 1997).

Statistical analysis
The percent difference between the PB and MC algorithms were compared for all plan evaluation parameters previously mentioned. Pairwise comparisons were performed using the two-tailed paired Student's t-test. P values less than 5% were considered significant. A linear regression model was used to test correlations between the percent difference in plan evaluation parameters and treatment plan parameters. R 2 values were used to evaluate goodness of fit of the linear models.

Water phantom dose comparison
The average dose difference of the measurement point between PB and ion chamber and between MC and ion chamber was found as -0.3% (min: -2.6, max: 2.2) and -0.7% (min: -2.5, max: 1.6) as shown in the histogram of Figure 1. Both the PB and MC algorithms predicted the dose to the measurement point with an average of 0.4% (min: -1.5, max: 1.3).

Plan evaluation parameter comparison
Mean dose difference: On average, the difference (MC vs. PB) in the GTV mean dose for all the plans is 3.6% (min: 0.0, max: 8.4) and 4.3% (min: 0.0, max: 10.5) for the PTV. Based on patient-to-patient comparison, the mean dose difference between PB and MC for the PTV is found to be higher than that for the GTV for most patients even though the difference in their mean averages is about 1%. For all the OARs, the mean dose difference between PB and MC vary within an average of 1%, but the mean dose difference in terms of individual patient's comparison shows a minimum dose difference as low as -2.6% in the case of esophagus and up to 2.8% in the case of the ipsilateral lung.  The mean dose differences between the PB and MC algorithms for the GTV, PTV, and OARs of all 25 patients are shown in Table  2. Statistically different results between the two algorithms were obtained for the GTV, PTV, esophagus and spinal cord mean dose. The greatest difference was for the PTV (4.3%, p < 0.01) and the GTV (3.6%, p < 0.01). The esophagus and spinal cord differences were both within 1% (p = 0.01).  30 . On average, no significant dose difference between PB and MC is found for the percentage dose difference for spinal cord maximum dose (D max ), heart D 33 , and esophagus D 33 . However, in the case of the esophagus, individual comparisons show differences for two patients of +6.2% and -5.5%.
The DVH differences between the PB and MC algorithms are shown in Table 3. A statistically significant difference between the two algorithms is found for GTV D 95 (4.4%, p < 0.01) and D 98 (5.3%, p < 0.01) as well as the PTV D 95 (7.1%, p < 0.01) and D 98 (7.8%, p < 0.01). The ipsilateral lung V 30 (0.6%, p < 0.01) is the only OAR for which a statistically significant difference between the two algorithms is found.

EUD comparison:
For the GTV, the EUD average difference between PB and MC is 4.1% (min: -0.2, max: 8.7). The EUD average difference for the PTV is 4.8% (min: -17.9, max: 14.6). For all patients, the EUD average difference for both the GTV and PTV are close in values, but the EUD difference data for PTV show higher deviation  = 6.7% compared to 2.4% for the GTV. Both the GTV and PTV EUD, however, are different depending on whether the PB or MC algorithm is used (p < 0.01). For the OARs, the EUD average difference for all the patients for each structure is within 0 to 2%. The esophagus shows relatively higher EUD difference for some patients compared to other OARs such as the spinal cord. The EUD differences of the PTV are most likely greater than zero except one patient which shows about 17.9% reduction in the PB PTV EUD. For this patient, the DVH of the GTV and PTV as well as the PB dose distribution, MC dose distribution, and the dose difference at the isocenter slice are shown      in Figure 2. The EUD and the mean dose difference for OARs in this case and the mean dose difference are within 3%. The calculated dose from PB and MC and the measured dose from the ion chamber in the water phantom are all within a difference of 2%.

Parameter correlations
Only the treatment and plan evaluation parameter correlations with p < 0.05 are presented. Linear regression parameters associated with the difference in the mean dose between PB and MC algorithms are shown in Table 5. Covariates that have predictive ability for the difference in EUD between the PB and MC algorithms for the PTV are shown in Table 6. With linear modeling of the GTV, PTV, and OAR plan parameters' dependence on total MU, number of fields, GTV volume, and PTV volume, the best fits were for the PTV EUD versus GTV volume (R 2 = 0.39) and esophagus mean dose versus total MU (R 2 = 0.33). Other statistically significant associations under univariate analysis were PTV EUD versus total MU (R 2 = 0.27), PTV EUD versus number of field (R 2 = 0.21), and PTV versus PTV volume (R 2 = 0.31). Plots of the PTV EUD with respect to GTV volume and total MU are shown in Figure 3.

Discussion
In this study we have used a large cohort of patients (25) to investigate the effect of dose calculation accuracy on plan evaluation parameters for IMRT lung treatments. Differences have been found in comparing the two algorithms. Several factors are illustrated regarding the variation in the dose difference between PB and MC.
PB-based algorithms calculate the dose-to-water while the MC results reported herein are dose-to-medium, which composed the patient anatomy based on the CT number conversions (tissue, air, bone). As reported by Siebers et al. (2000) for 6 MV beam , the dose-to-medium dose-to-water correction is about 1, 10, 13% for tissue, bone, and air, respectively. In this work, we have considered the dose values calculated by MC as the ground truth. Therefore, the dose differences between dose-to-water and doseto-tissue are taken as real differences between the two algorithms. Others have found that the dose-to-material dose-to-water conversion can affect the final plan parameters from 0 -8% for head and neck and prostate cases (Dogan et al., 2006). The largest error (8%) in that study was for the femoral heads in prostate cases that included nodal irradiation. In heterogeneous geometries such as lung, the dose-tomedium dose-to-water conversion will not account for the internal scattered dose that affects the dose distribution differences between PB and MC algorithms.
Similarly, another source of dose difference between PB and MC is the perturbation to the fluence of the secondary electrons by the tissue composition. It is well known that the ability of the path-length correction within the PB algorithm to account for this is limited. The level of the perturbation most likely depends on the tumor location and size relative to bone structures or air cavities. For example, a tumor located beyond or in front of a bony structure or an air cavity relative to the incident beam should have a different dose depending on whether PB or MC is used. Such an issue is shown in the two cases of Figure 4a (case number 1) and Figure 4b (case number 7) where both patients have an air cavity inside the PTV. The significant dose prediction differences between PB and MC due to the secondary electron fluence are evident in the isodose curves as shown in the figures (the MC isodose curves being much more constricted in the high-dose region).     Variations in the homogeneity of the target tissue composition may also contribute to dose differences between PB and MC algorithms. For example, the GTV usually represents the tumor which has a relatively uniform density of ~ 1 gm/cm 3 while the PTV for lung has less tissue homogeneity, usually consisting of a lung tissue shell (density = 0.23 gm/cm 3 ) surrounding the GTV. The EUD is similarly affected where EUD differences of 13% and 14% are found for case number 1 and 7, respectively. These effects are minimized in homogeneous phantoms as demonstrated by the results in Figure 1 where the agreement between PB, MC, and ion chamber measurements are within 2%.
Previous studies usually compared the performance of a TPS or MC with a TPS based on dose-volume points from the DVH or the mean dose difference. The DVH points usually used are based on their ability to specify target coverage or predict OAR complications. These dose-volume points are unreliable due to their dependence on the DVH slope at the location of the comparison points. Therefore, a clinical decision or comparison based on these points could be misleading. The mean difference evenly penalized positive and negative outliers. Therefore, if there is any cold or hot spot in the target or the structure, it could be canceled out and not appear in the calculated mean. For example, even though a cold spot in MC calculated dose is shown in the patient of Figure 2, the mean dose difference is 1%. This patient has a large tumor GTV of 490cc and a PTV of 1189cc. This case has 12 gantry angles which is the highest among the other 25 plans tested in this study. The PTV for this patient contains tissues from lung, esophagus and bronchi. The mean dose difference is 1%. The dose calculated by MC shows a cold spot at the PTV which is shown in Figure 2a. The beam number and arrangement and the tissue complication mainly affects the significant reduction in the EUD value.
It is a natural assumption that the patient anatomy, tumor size and location, number of fields and field angles, and number of segments may play an important role in the dose distribution differences between PB and MC. As previously discussed, the exact reason for the dosimetric differences on a per case basis can be difficult if not impossible to identify. Even if one could identify the physical reason for the differences, it would not be clinically useful. The statistical differences in plan parameters between the PB and MC algorithms for the cohort of patients studied may be more clinically relevant. To this end, the targets show the greatest difference in plan evaluation parameters for the two algorithms. Even though the OAR plan evaluation parameters have shown a statistically significant difference between the two algorithms, the differences are only within about 2%. These differences may or may not be clinically significant but they nevertheless represent a systematic dose difference that can be addressed by accurate dose calculations.
The EUD shows more variability on a case-by-case basis for the targets than the OARs compared to the other plan parameters. For example, the PTV mean dose differences have a minimum of 0.0% and maximum of 10.5%. Correspondingly, the PTV EUD dose differences have a minimum of -17.9% to a maximum of 14.6%. This variability difference between the EUD and other plan parameters reflects the ability of the EUD to appropriately (i.e., in terms of radiobiological response) account for hot and cold spots in the targets.
The linear regression analysis is helpful to identify which, if any, treatment plan parameters are predictive of potential differences between a PB calculation model and a MC calculation model. One would expect that as the number of beams increases that the differences between the two algorithms would be mitigated. This is what was found in our analysis. There is a statistically significant (although weak) negative correlation with PTV EUD and total MU, number of fields, GTV and PTV volume. As the total plan MU, number of fields, GTV and PTV volume increase, then there is a decrease in the difference between the PTV EUD calculated with the PB algorithm and the MC algorithm. The only OAR that shows a similar dependence was for the esophagus mean dose difference with respect to the total MU. It is unclear why this is the only OAR that shows dependence the treatment planning parameters studied. In any case, this study shows that the degree to which PB and MC differences can be predicted using readily available treatment planning parameters (e.g., MU, GTV volume, etc.) has been shown to be weakly linked at best. Furthermore, the difference between PB and MC on plan evaluation parameters can vary widely from case to case as previously shown in the results of Figure 2. It must, therefore, be recommended that developers should still continue to work to implement MC methods into routine clinical use.

Conclusion
The dose distributions for 25 IMRT lung patients were evaluated in this study using MC and PB dose calculation methods. Dose calculated using the PB algorithm on average overestimated the GTV, PTV, and OARs plan evaluation parameters compared to MC calculations. The average difference for the PTV is 4.3% while it is 7.8% for the D 98 of the PTV. Even though the dose comparison in terms of average difference did not indicate high significant variation  for targets or OARs, the dose difference based on individual patient comparisons could vary significantly for some patients. There was a statistically significant difference between plan evaluation parameters calculated using the PB algorithm compared to the MC algorithm. The plan evaluation parameters of mean dose and EUD have a weak but statistically significant dependence on the number of fields, total MU, GTV volume and PTV volume for some structures. One main conclusion of this work is that the most accurate dose calculation methods (i.e. Monte Carlo) should still be implemented in the clinical routine. One can not tell a priori which cases will show the largest errors in plan evaluation parameters. Based on the statistical analysis, we know there are appreciable differences between the PB and MC calculations for the targets. Accurate MC calculations can remove those remaining systematic errors from treatment plans compared to PB calculations. Lastly, MC algorithms will need to be benchmarked or standardized upon clinical implementation so different levels of accuracy are avoided.