Dimitrios Nikolopoulos^{1*}, Sofia Kottou^{2}, Anna Louizi^{2}, Ermioni Petraki^{1,3}, Efstratios Vogiannis^{4} and Panayiotis H.Yannakopoulos^{1}  
^{1}Department of Electronic Computer Engineering, TEI of Piraeus, Greece, Petrou Ralli & Thivon GR250, 12244, Aigaleo, Greece  
^{2}Medical Physics Department, Medical School, University of Athens, Mikras Asias 75, GR11527, Goudi, Greece  
^{3}Department of Engineering and Design, Brunel University, Kingston Lane, Uxbridge, Middlesex UB8 3PH, London, UK  
^{4}Evangeliki Model School of Smyrna, Lesvou 4, GR17123, Nea Smirni, Athens, Greece  
Corresponding Author :  Dimitrios Nikolopoulos Department of Electronic Computer Engineering TEI of Piraeus, Greece, Petrou Ralli & Thivon GR250 12244, Aigaleo, Greece Tel: +00302105381110 Fax: +00302105381436 Mobile: +00306977208318 Email: [email protected]; [email protected] 
Received April 02, 2014; Accepted May 22, 2014; Published May 24, 2014  
Citation: Nikolopoulos D, Kottou S, Louizi A, Petraki E, Vogiannis E et al. (2014) Factors Affecting Indoor Radon Concentrations of Greek Dwellings through Multivariate Statistics  First Approach. J Phys Chem Biophys 4:145. doi:10.4172/21610398.1000145  
Copyright: © 2014 Nikolopoulos D, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
A large-scale nationwide radon survey was conducted in Greek dwellings between 1994 and 2000. Twelve hundred passive CR-39 detectors were distributed and collected along with 963 filled-in questionnaires. These were rechecked during 2012-13 to evaluate factors that potentially affect indoor radon concentrations, namely the factors (i) area (environment), (ii) building level-floor, (iii) ground type, (iv) basement, (v) building type, (vi) construction year, (vii) building walls contact, (viii) wall materials and (ix) floor materials. The questionnaires were prepared by the research team according to international standards. One-way and multivariate statistical methods were applied for the analysis, specifically (I) Linear Regression Analysis, (II) One-way or multi-way ANOVA, (III) General MANOVA, (IV) Stepwise Regression Analysis and (V) Principal Components Analysis. The results revealed that approximately 0.1% of the dwellings exhibited outlier radon concentrations. Noteworthy statistical correlations were detected between the measured mean annual indoor radon concentration levels and the factors (ii) (building level-floor) and (viii) (wall materials). Weak evidence was provided for the corresponding correlation with the factors (v) (building type) and (vii) (building walls contact). The association with the factors (vi) (construction year) and (ix) (floor materials) was minor. Significant differences were detected among the results of the applied statistical methods.
Introduction 
Natural environmental radiation depends on local geology; hence, human exposure to cosmic and terrestrial radiation varies from place to place [1]. ^{238}U and ^{232}Th are two natural parent isotopes which are present in soil and contribute significantly to natural terrestrial radioactivity. Radon (^{222}Rn) is a radioactive noble gas and originates from ^{238}U. ^{220}Rn originates from ^{232}Th and ^{219}Rn from ^{235}U. ^{222}Rn, ^{220}Rn and ^{219}Rn are the radon isotopes found in soil, with ^{222}Rn being dominant in rocks, soil, building materials, underground and surface waters [2] and considered the most hazardous radionuclide. Radon (^{222}Rn) and its short-lived progeny (^{218}Po, ^{214}Pb, ^{214}Bi, ^{214}Po) attach to dust and water droplets, creating radioactive aerosols that are inhaled and enter the human lungs. Radon enters buildings through gaps around pipes or cables and through cracks in floors [3]. Primary studies have shown that radon is the second most dangerous cause of lung cancer after smoking. This happens when alpha particles emitted from radon progeny damage the pulmonary epithelium. Many studies have measured indoor radon concentrations in several countries [4-9]. In Greece, to our knowledge, indoor radon concentration measurements over the years are as follows: several small-scale [10-12], two middle-scale [3,13] and one large-scale [14].
Under the National Strategic Reference Framework (NSRF) “Thales” project of the Technological Education Institute of Piraeus, and extending the aforementioned large-scale radon survey, this paper addresses the degree to which certain factors influence indoor radon concentration levels. Similar studies in other countries have shown that indoor radon concentrations are higher at lower floor levels [15-17]. Moreover, recent studies indicate that radon emanation from building materials contributes significantly to indoor radon concentration in dwellings [18-20]. The current work provides alternative approaches for studying the relative importance of various radon-affecting factors. In several complementary ways, associations are quantified between radon concentrations and factors such as the “basement existence” and “ground type (sloppy, overgrounded)”.
Materials and Methods 
A thorough investigation was performed on whether nine factors, namely i) area (environment), ii) building level-floor, iii) ground type, iv) basement, v) building type, vi) construction year, vii) building walls contact, viii) wall materials and ix) floor materials, may affect indoor radon concentration independently or jointly. These factors were recorded on the 963 filled-in questionnaires of the large-scale survey in Greece [14]. The following statistical methods were applied to the questionnaire data: i) Linear Regression Analysis; ii) One-way or multi-way ANOVA; iii) General MANOVA; iv) Stepwise Regression Analysis; and v) Principal Components Analysis. It is noted that the radon data from reference [14] were dispersed across Greece.
Linear regression analysis 
In linear regression analysis, a straight line is fitted through a set of points (observations) in such a way that the sum of squared residuals is minimal [21]. In multiple linear regression the dependent variable is written as a linear combination of the independent variables. The regression equation describes the relation between the mean value of a variable y and specific values of the x-variables used to predict y.
If (x_{1}, y_{1}), (x_{2}, y_{2}), …, (x_{n}, y_{n}) are realisations of the random variable pairs (X_{1}, Y_{1}), (X_{2}, Y_{2}), …, (X_{n}, Y_{n}), then the linear regression equation expresses the mean of Y as a straight-line function of X, which can be represented as
E(Y_{i}) = β_{0} + β_{1} X_{i} (2.1.1)
or E(Y_{i}) = β_{0} + β_{1} X_{i1} + β_{2} X_{i2} + … + β_{p} X_{ip} for p independent (predictor) variables.
E(Y_{i}) denotes the expected (mean) value of Y for the i-th member of the population. The estimated/fitted model is then:
ŷ = b_{0} + b_{1} x (2.1.2)
where b_{0} and b_{1} are the sample estimates of β_{0} and β_{1}. From (2.1.2), the estimated/fitted values for each of the n observations are
ŷ_{i} = b_{0} + b_{1} x_{i} (2.1.3)
where i = 1, …, n is the consecutive number of the observation. From (2.1.2) and (2.1.3) the so-called observed error or fitted residual is calculated as:
e_{i} = y_{i} − ŷ_{i} (2.1.4)
Equation (2.1.4) gives the estimated error of the i-th observation in the sample. From (2.1.4) the sum of squared observed errors (SSE) equals
SSE = Σ_{i=1}^{n} e_{i}^{2} = Σ_{i=1}^{n} (y_{i} − ŷ_{i})^{2} (2.1.5)
for all observations in a sample of size n. The mean square error (MSE) then equals
MSE = SSE / (n − 2) (2.1.6)
(“n − 2” should be substituted by “n − p − 1” when there are p predictor (independent) variables) and this is the sample variance of the error. The residual standard error is then calculated as
s = √MSE (2.1.7)
which estimates σ. The error variance σ^{2} should be constant across observations, otherwise the confidence intervals will be misleading.
As 
(2.1.8) 
the total sum of squares (SST) equals
SST = Σ_{i=1}^{n} (y_{i} − ȳ)^{2} (2.1.9)
with ȳ set to be the mean of all observed Y values.
The coefficient of determination
R^{2} = (SST − SSE) / SST = 1 − SSE / SST (2.1.10)
represents the proportion of variation in Y that is explained by X. The parameters β_{0}, β_{1}, ..., β_{p} (regression coefficients), σ^{2} (variance) and R^{2} (coefficient of determination) need to be estimated in order to examine whether the linear regression model applies to this group of data. However, even if the R^{2} value is close to zero, this does not mean that X and Y are unrelated: the association may be non-linear, in which case polynomial terms may be included to improve the fit.
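The quantities named above (fitted values, residuals, SSE, MSE, SST and R^{2}) can be illustrated with a minimal sketch for the single-predictor case. The function name and the data are hypothetical and are not taken from the survey; the divisor n − 2 assumes one predictor plus an intercept.

```python
def fit_simple_regression(x, y):
    """Least-squares fit of y = b0 + b1*x with the usual fit statistics."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                      # slope estimate
    b0 = y_bar - b1 * x_bar             # intercept estimate
    fitted = [b0 + b1 * xi for xi in x]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]
    sse = sum(e ** 2 for e in residuals)        # sum of squared errors
    mse = sse / (n - 2)                         # sample variance of the error
    sst = sum((yi - y_bar) ** 2 for yi in y)    # total sum of squares
    return {"b0": b0, "b1": b1, "sse": sse, "mse": mse,
            "s": mse ** 0.5, "sst": sst, "r2": 1 - sse / sst}

# Hypothetical data: floor level (x) against log10 of radon concentration (y)
x = [0, 1, 2, 3, 4]
y = [2.0, 1.9, 1.7, 1.6, 1.3]
result = fit_simple_regression(x, y)
print(result["b0"], result["b1"], result["r2"])
```

For these illustrative numbers the slope is negative, i.e. the fitted log concentration decreases with floor level.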
One-way or multi-way ANOVA
Analysis of variance (ANOVA) is considered the generalization of a t-test to more than two statistical groups. ANOVA is divided into two categories: i) one-way, where a single factor exists, and ii) two-way or multi-way, where two or more factors exist. ANOVA is used for distributional assumptions about a set of effects in a model, with the ability to extrapolate the inferences to a wider population and to improve the accounting for system uncertainty and the efficiency of estimation. ANOVA has been implemented as the basic method for the statistical analysis of radon concentrations in many studies [9,22-24]. For the analysis of factors affecting indoor radon concentrations, each factor was initially analysed independently, according to Table 1. In this way a first assessment of the weight of the effect of each factor is possible. In the next step, the effects of the factors are no longer estimated independently; instead, the factors are allowed to influence each other and are therefore treated as dependent. This approach is called the random-effects assumption of the analysis of variance. ANOVA can be implemented only for sampling distributions similar to the Gaussian and, for this reason, it was applied to the logarithmic distributions of the radon measurements.
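As an illustration of the single-factor case, the following minimal sketch computes the one-way ANOVA f value on log-transformed concentrations. The concentrations, the grouping into three floor levels and the function name are hypothetical; base-10 logarithms are an assumption.

```python
import math

def one_way_anova_f(groups):
    """Return (f_value, df_between, df_within) for a list of groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the observations around their own group mean
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = k - 1
    df_within = n_total - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within

# Hypothetical concentrations (Bq per cubic metre) for three floor levels
ground = [80.0, 120.0, 60.0, 150.0]
first = [50.0, 70.0, 40.0, 90.0]
upper = [30.0, 45.0, 25.0, 60.0]
log_groups = [[math.log10(c) for c in g] for g in (ground, first, upper)]
f_value, df_b, df_w = one_way_anova_f(log_groups)
print(f_value, df_b, df_w)
```

The calculated f value is then compared with the critical value of the F distribution for (df_b, df_w) degrees of freedom at the chosen confidence level.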
General MANOVA 
Multivariate (multiple dependent variables) analysis of variance (MANOVA) is defined as a partition of the sum of squares and the sum of the cross products (SSCP) matrix 
(2.1.2.1) 
into independent Wishart matrices [25]. MANOVA is applied instead of a series of one-at-a-time ANOVAs. In several situations the power of MANOVA is inferior to that of ANOVA applied to one variable at a time; however, MANOVA takes into account the intercorrelations among the dependent variables, according to Table 2. Hence, MANOVA is considered more efficient than ANOVA for multivariate data [25].
Issues in Multiple Regression 
The difficulty with model selection emerges from the fact that for p predictors there are 2^{p} different candidate models. With so many possible combinations it can be difficult to find a good model. Model selection methods try to simplify this task. A true model may only depend on a subset of X_{1}, ..., X_{p}. In other words, in model
Y = β_{0} + β_{1}X_{1} + ... + β_{p}X_{p} + ε (2.1.3.1) 
some of the coefficients are zeros. The result will be the disclosure of those predictors with nonzero coefficients, i.e. the “best subset” of all predictors. 
R^{2} can be used to compare models with the same number of parameters/coefficients; otherwise the adjusted R^{2} should be used. The best model has the largest value.
Selecting p predictors, the Mallows’ C_{p} criterion should be small with a value near to p. 
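To give a sense of the size of the search space, the short sketch below enumerates every subset of the nine factors studied here. The factor labels are shortened for illustration, and the empty subset stands for the intercept-only model.

```python
from itertools import combinations

factors = ["area", "level", "ground type", "basement", "building type",
           "construction year", "wall contact", "wall materials",
           "floor materials"]

# Every subset of the factors, from the empty set up to all nine together
candidate_models = [subset
                    for size in range(len(factors) + 1)
                    for subset in combinations(factors, size)]

print(len(candidate_models))  # 512, i.e. 2**9
```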
Stepwise Regression Analysis 
Stepwise methods are used in several areas of applied statistics. A statistical model can be constructed in two ways, namely (i) forward selection and (ii) backward elimination. Forward selection means that a specific number of variables exist in the beginning and variables are gradually added, one at a time, in an optimal way in order to analyse the effect of each variable. Alternatively, with backward elimination, all potential variables exist in the beginning and non-effective variables are removed, one at a time, until a desirable stopping point is reached.
Stepwise regression forms a hybrid between forward selection and backward elimination. More precisely, steps have a forward direction with variable addition; however, if a variable is characterized as non-significant, it is removed as in backward elimination. In the literature, stepwise regression has been used for the prediction of mean indoor radon concentrations [26], in the construction of radon maps based on indoor radon measurements and soil geochemical parameters [27] and in risk analysis of factors affecting lung cancer [28].
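The forward half of the procedure can be sketched as follows. This is a simplified illustration and not the procedure used for Table 7: it performs forward selection only (no removal step), it uses a partial F-to-enter threshold (an assumed value of 4.0) in place of alpha-to-enter probabilities, and the data and predictor names are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def residual_sum_of_squares(columns, y):
    """SSE of the least-squares fit of y on an intercept plus the given columns."""
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(k)]
    beta = solve_linear_system(xtx, xty)  # normal equations
    return sum((yi - sum(bj * xj for bj, xj in zip(beta, row))) ** 2
               for row, yi in zip(design, y))

def forward_selection(predictors, y, f_to_enter=4.0):
    """Add, one at a time, the predictor with the largest partial F value."""
    selected = []
    current_sse = residual_sum_of_squares([], y)
    n = len(y)
    while len(selected) < len(predictors):
        trial = {name: residual_sum_of_squares(
                     [predictors[s] for s in selected + [name]], y)
                 for name in predictors if name not in selected}
        best = min(trial, key=trial.get)
        df_resid = n - (len(selected) + 1) - 1
        if df_resid <= 0:
            break
        if trial[best] == 0:
            f_stat = float("inf")
        else:
            f_stat = (current_sse - trial[best]) / (trial[best] / df_resid)
        if f_stat < f_to_enter:
            break  # no remaining predictor improves the fit enough
        selected.append(best)
        current_sse = trial[best]
    return selected

# Hypothetical coded predictors and log10 concentrations
predictors = {
    "level": [0, 0, 1, 1, 2, 2, 3, 3],
    "wall": [1, 0, 1, 0, 1, 0, 1, 0],
    "noise": [3, 1, 4, 1, 5, 9, 2, 6],
}
log_c = [2.11, 1.99, 1.89, 1.81, 1.71, 1.59, 1.49, 1.41]
chosen = forward_selection(predictors, log_c)
print(chosen)
```

A full stepwise procedure would, after each addition, also re-test the variables already in the model and drop any that no longer meet the removal criterion.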
Principal Components Analysis 
Principal components analysis (PCA) is used for the reduction of the number of possible clusters. PCA offers the ability to identify patterns within large sets of data [29]. Its significance relies on the occurrence of a relative redundancy in the variables, due to their correlation when measuring the same construct. During the analysis of the principal components, the eigenvalues represent the relative participation of each factor in the general variability of the sampled data [30]. PCA has been implemented in the investigation of factors of water quality, in drug development, in cancer detection and in health care [30-33]. PCA has also been used for the investigation of the dependence among variables and for the prediction of relationships among variables.
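The role of the eigenvalues can be illustrated with a minimal sketch that extracts the first principal component of a correlation matrix by power iteration. Power iteration is a stand-in chosen for brevity, not necessarily the algorithm of the statistical package used for Table 8, and the two variables are hypothetical.

```python
import math

def correlation_matrix(columns):
    """Pearson correlation matrix of a list of equally long columns."""
    n = len(columns[0])
    means = [sum(c) / n for c in columns]
    norms = [math.sqrt(sum((v - m) ** 2 for v in c)) for c, m in zip(columns, means)]
    k = len(columns)
    return [[sum((a - means[i]) * (b - means[j])
                 for a, b in zip(columns[i], columns[j])) / (norms[i] * norms[j])
             for j in range(k)] for i in range(k)]

def leading_component(matrix, iterations=500):
    """Largest eigenvalue and unit eigenvector by power iteration."""
    k = len(matrix)
    v = [1.0 / (i + 1) for i in range(k)]  # asymmetric starting vector
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(matrix[i][j] * v[j] for j in range(k))
                     for i in range(k))
    return eigenvalue, v

# Hypothetical data: log10 concentration and floor level of six dwellings
log_c = [2.1, 1.9, 1.8, 1.6, 1.5, 1.4]
level = [0, 1, 1, 2, 3, 3]
corr = correlation_matrix([log_c, level])
eigenvalue, vector = leading_component(corr)
share = eigenvalue / len(corr)  # fraction of total variance on component 1
loadings = [x * math.sqrt(eigenvalue) for x in vector]
print(share, loadings)
```

For two standardised variables the leading eigenvalue equals 1 + |r|, and opposite signs of the two loadings indicate that one variable decreases as the other increases.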
Results and Discussion 
Figure 1 presents characteristic residual plots calculated from the measured average concentrations of radon (C) and their logarithms (log (C)). It is noted that the C values of Figure 1 correspond to time-integration over a year, constitute a representative sample for Greece, were derived in accordance with international standards and delineate the radon profile of Greece [26]. In this context, Figure 1 is of significance since it may show actual tendencies regarding the randomness or predictability of the employed concentration sample. Indeed, completely randomised responses to the normal distribution, either of C or log (C), would exhibit no deterministic residuals and, hence, would be completely described by stochastic processes. The normal probability plots of Figures 1a and 1b indicate, however, that the logarithms of the measured concentrations followed the normal distribution for up to 95% of the log (C) values, namely that the C values followed the lognormal distribution. This is also evident from the shapes of the frequency distributions of the residuals: the frequency distribution of Figure 1a was clearly lognormal, while that of Figure 1b was clearly normal. This is also of significance because all international large-scale radon surveys reported lognormal behaviour of indoor radon concentrations. This is the reason why the C values did not follow the normal distribution, as shown in the corresponding normal probability plot of Figure 1a. From another point of view, the residual plots of log (C) versus the values fitted to the normal distribution showed random patterns for fitted residual values of log (C) above 1.6. It is noted that a residual of 1.6 in log (C) is consistent with an uncertainty of σ_{C} = 39.8 Bq.m^{-3} in the predicted C values. This, according to the recording capabilities of the employed dosemeters [34], accompanies high C values, namely C values above the EU action limit of 200 Bq.m^{-3}.
Moreover, the majority of predicted residuals were below 1.2. This is very significant because this residual range is consistent with the concentrations usually encountered, namely between 10 and 120 Bq.m^{-3}. Other factors may affect concentrations in this range and, surely, the potential factors could not be continuous under the normal distribution. Indeed, the Versus Fits diagram of Figure 1a shows characteristic predictability, different from the normal distribution, for the residual C range below 75 Bq.m^{-3}. On the other hand, the residuals versus observation order did not show tendencies for concentrations up to 160 Bq.m^{-3}, either in the order of the concentration (Figure 1a) or in the order of the concentration's logarithm.
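The correspondence between a residual of 1.6 in log (C) and the quoted uncertainty of 39.8 can be checked with one line of arithmetic, assuming base-10 logarithms (an assumption that reproduces the quoted value).

```python
log_residual = 1.6
sigma_c = 10 ** log_residual  # back-transform from log10 units

print(round(sigma_c, 1))  # 39.8
```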
Table 3 presents the analysed factors, factor levels and level values with their corresponding description. The data of Table 3 were formulated in accordance with the contents of the 963 questionnaires which were filled in during the radon survey of Greece. It is noted that these questionnaires were developed in agreement with other national surveys. All factors exhibited 3-4 levels. This is worth noting, because a multi-level collection of factors can distort results, especially for a limited number of measurements. Factor (Level L) was 5-level, marking the usual situation of apartment dwellings in big cities of Greece. Nevertheless, this 5-level factor is easily convertible to a lower-level one. Factor (Floor’s material F) was a free-text entry, so a 6-level collection was finally obtained.
It is well established that the Gaussian distribution offers a rigorous and justified pathway for statistical analysis. Since the logarithms of the concentrations followed the Gaussian distribution, log (C) was considered favourable. Hereafter, analysis was conducted only on log (C). Table 4 presents the unusual observations in the log (C) values, given that these followed the normal distribution, viz. were treated according to the Gaussian distribution. Leverage points were considered to be those observations corresponding to extreme or outlying values of log (C) in a manner that any lack of neighbouring observations implied that the fitted Gaussian regression model passed close to the particular observation. Specifically, leverage points were calculated by moving all points one-by-one up or down and calculating the proportionality constant (leverage) of the change of the corresponding Gaussian fitted value. Outliers were calculated as the observations that presented residuals above 1.5 times the interquartile range. According to Table 4, eight outlier (R) and two leverage (X) residual points were identified. In any case, unusual log (C) values were approximately 0.1% of the total concentration sample size. Therefore, they constituted a negligible part of the measurements. Importantly, however, the latter finding indicates that only a small portion (<0.1%) of Greek dwellings presented unusual concentrations. Considering that high unusual concentration extremes may be associated with a high human radiation burden, this fact implies that indoor radon in Greece may not lie in the international extremes. It should also be emphasised that outlier data affect any type of fit and should be removed prior to regression analysis, whereas leverage points may or may not affect it. For this reason, all outlier and leverage points were finally removed from the dataset.
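The interquartile-range criterion for outliers can be sketched as follows. The sketch assumes the common reading of the rule, i.e. flagging residuals that lie more than 1.5 times the interquartile range below the first or above the third quartile; the residual values and the function name are hypothetical.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the values lying outside the fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

# Hypothetical residuals of log (C); the last one is deliberately extreme
residuals = [-0.2, -0.1, -0.05, 0.0, 0.05, 0.1, 0.15, 0.2, 1.9]
flagged = iqr_outliers(residuals)
print(flagged)  # [1.9]
```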
Table 5 presents the combinations used to define the best subsets from the nine factors of Table 3 for the regression of log (C). As in Table 4, regression was linear in the factors employed in each entry of Table 5. Mallows’ C_{p} (MCP) was calculated for any subset of the explanatory variables as C_{p} = SSE_{p} / MSE − (n − 2p), where SSE_{p} was the residual sum of squares for the subset model containing p explanatory variables counting the intercept (i.e., the number of parameters in the subset model), MSE the mean square error and n the sample size. It should be emphasised that acceptable models, in the sense of minimizing the total bias of the predicted values, are those models for which C_{p} approaches the value p, i.e., those subset models that fall near the line C_{p} = p in a plot of C_{p} against p for the collection of all subset models under consideration. Under this view, only the combination of all factors except factor G (Ground Type, Table 3) constitutes an acceptable subset that could minimise total bias.
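A minimal sketch of the criterion, assuming the standard form C_{p} = SSE_{p} / MSE − (n − 2p) with MSE taken from the model containing all candidate predictors. The function name and the numbers are hypothetical and are not taken from Table 5.

```python
def mallows_cp(sse_p, mse_full, n, p):
    """Mallows' Cp for a subset model with p parameters (intercept included)."""
    return sse_p / mse_full - (n - 2 * p)

# Hypothetical subset model: 2 parameters, fitted to 52 observations
cp = mallows_cp(sse_p=50.0, mse_full=1.0, n=52, p=2)
print(cp)  # 2.0, equal to p, so this subset would count as acceptable
```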
In addition to the analysis presented so far, one-way ANOVA was applied to single-factor data from the whole data set. Analysing factor “Level L” in its full depth, an f value of 2.551 was calculated, whereas the critical f value at the 95% confidence interval (CI) is 2.03. With p = 0.014, the differences among the various levels were significant at the 95% CI, but not at the 99% CI. When the full dwelling-level data were reorganised into 3 levels (ground floor, first floor and upper floors), the f value was found equal to 7.156, while the corresponding critical value at the 95% CI is 3.01, with a corresponding p = 0.000877. This finding is significant, since it implies that, with p < 0.001, lower-level dwellings present higher indoor radon concentrations. Further evidence was provided by reorganising the dwelling-level data into 2 levels, namely ground floor and higher floor dwellings. Applying a t-test to the average concentrations, it was calculated that, at p < 0.001, ground floor dwellings presented higher radon concentrations. Analysing factor “Wall Contact Con” with one-way ANOVA, an f value of 0.893 was calculated, whereas the critical f value at the 95% CI is 2.624. Namely, dwellings with different levels of the Wall Contact factor did not present significant differences at the 95% CI. The outcome of the one-way ANOVA for factor “Floor’s material F” was similar: no significant variations were found, since the calculated f value was 1.298 and the critical value at the 95% CI was 2.119. On the contrary, the one-way analysis of factor “Wall’s material W” provided an f value of 4.314, with a critical value at the 95% CI of 2.624 and an associated p value of 0.0051. The latter finding was associated with a tendency towards higher concentrations in rock dwellings.
Table 6 presents the results of the general MANOVA method. These results further support the findings of the one-way ANOVA. No significant statistical interactions between any combinations of factors were detected by General MANOVA.
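The two-level comparison (ground floor versus higher floors) can be illustrated with a minimal two-sample t statistic. The sketch assumes equal variances in the two groups (pooled-variance Student's t), which the text does not specify; the log concentrations and the function name are hypothetical.

```python
import math
import statistics

def pooled_t_statistic(a, b):
    """Student's t statistic and degrees of freedom for two independent samples."""
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * statistics.variance(a)
                  + (n2 - 1) * statistics.variance(b)) / df
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(
        pooled_var * (1 / n1 + 1 / n2))
    return t, df

# Hypothetical log10 concentrations
ground_floor = [1.95, 2.10, 1.80, 2.20, 2.05]
higher_floors = [1.60, 1.75, 1.50, 1.85, 1.70]
t_value, df = pooled_t_statistic(ground_floor, higher_floors)
print(t_value, df)
```

A positive t value here means that the ground floor group has the higher mean; with only two groups, the square of this t statistic equals the one-way ANOVA f value.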
Table 7 presents the results of stepwise regression of log (C) versus all factors. Through stepwise regression, a linear model was sought containing only those variables which were significant in modelling log (C). The qualitative factor levels of Table 3 were employed with their original values so as to be transformed into quantitative variables. It should be stressed that stepwise regression is particularly useful when there are many possible explanatory (independent) variables. Some of these variables may be highly correlated with each other and therefore may explain the same variation in the response and not be independently predictive. Some may also not influence the response in any meaningful way. Employing high alpha values for the error probability in Table 7, only three factors were finally selected, and these after the third step. It is noted that probability values, p, and alpha values, a, are related as p = 1 − a and, thus, the results of Table 7 correspond to the 50% significance level, either for accepting the entry of a certain factor or for its removal. According to Table 7, the main factors found to influence indoor radon concentrations were “Wall’s material W”, “Level L” and “Wall Contact Con”. Specifically, factor contact exhibited a p-value of 0.162, factor level 0.131 and factor wall 0.276 (error probability 50%). This implies that at a significance level below 17%, level and contact affect indoor radon concentration, while at the 30% significance level all three factors do. These results, however, provide only vague evidence that the above factors actually affect indoor radon concentrations. This is also indicated by the small value of MCP relative to an acceptable value of 3 for two factors and 4 for three factors. Moreover, since R^{2} exhibited a maximum value of 2.92%, only a small percentage of the total variance (<3%) can be described by a linear model of the three factors of Table 7.
According to the data presented so far, no single factor or linear subset of factors could sufficiently describe the variance of the analysed radon concentration data. This implies that a multivariate set of factors could probably be more adequate. Table 8 presents the four unrotated principal factor loadings together with the corresponding communalities according to principal component analysis, applied, however, to radon concentrations. It is very interesting that, although factor 1 is loaded on five factors (bold numbers), the remaining three are each loaded on only one single independent factor. More specifically, 16.2% of the total variance may be described by factor 2, loaded only on the construction year. 12.3% of the total variance could be attributed to factor 3, loaded mainly on the existence of a basement, and 10.9% to the existence of contact (factor 4). A very important finding of Table 8, however, is that since the loadings of factor 1 on “Level L”, “Building Type” and “Construction Year Y” are negative with respect to that of C, it is rational to accept that concentrations would increase as Level, Building Type and Year decrease. According to Table 3, this implies that ground floor dwellings tend to present higher radon concentrations. This is rational, since the lower the floor, the higher the contribution of the soil's exhalation to indoor radon. Detached houses also tend to present higher concentrations, since other types offer pathways for the interchange of radon between dwellings in contact. It is also significant that aged dwellings, especially those of the previous century, presented higher radon concentrations. The latter finding is reinforced by the positive loading of the “Wall’s material W”, especially given its rather high value. Since higher wall values correspond to rock materials, it can be supported that higher concentrations are found in dwellings of the beginning of the twentieth century made of rocks.
To some degree these results were supported by Table 7, since the dwelling’s “Level L” and building “Wall’s material W” were considered to be more significant compared to other factors. 
Conclusion 
The statistical methods employed in this paper revealed significant tendencies regarding the potential ways in which certain factors may affect indoor radon concentrations in Greece. These methods were applied to a large dataset of the principal large-scale indoor radon survey in Greece with passive dosemeters. The following important results were extracted:
(1) Approximately 95% of the measured log (C) values followed the normal distribution, i.e., the vast majority of C values followed the lognormal distribution.
(2) About 0.1% of the investigated dwellings exhibited outlier radon concentrations.
(3) The majority of log (C) values exhibited residuals consistent with C values between 10 and 120 Bq.m^{-3}.
(4) Under the view of MCP, only a combination of all factors except factor G (Ground Type) constitutes an acceptable subset that could minimise total bias. Hence, under this view, no certain factor tendencies were found.
(5) According to ANOVA, lower-level dwellings presented higher mean annual indoor radon concentrations.
(6) A t-test on ground and upper floor dwellings showed that ground floor dwellings presented higher mean annual indoor radon concentrations.
(7) ANOVA statistics revealed a tendency towards higher concentrations in rock dwellings.
(8) No single factor or linear subset of factors sufficiently described the variance of the analysed radon concentration data.
(9) According to PCA, detached houses tend to present higher concentrations, and aged dwellings, especially those of the previous century, presented higher mean annual indoor radon concentrations.
(10) Significant differences were detected in the results produced by some of the applied statistical methods.
New methods and, most importantly, more passive measurements of mean annual indoor radon concentrations in Greece will assist future research in this field. It is important that these future measurements account for, at least, the presented factors and associated levels and, preferably, include additional surveyed factors. It should be noted, however, that this work constitutes only a first approach of this type. Focused studies accounting for the role of certain factors in controlled dwellings, or active multi-level measurements, will improve the methods and provide new pathways for assessing the effects of certain factors on indoor radon accumulation in Greece.
Acknowledgement 
This research has been co‐financed by the European Union (European Social Fund – ESF) and Greek national funds through the Operational Program “Education and Lifelong Learning” of the National Strategic Reference Framework (NSRF) ‐ Research Funding Program: THALES Investing in knowledge society through the European Social Fund. 
References 

Table 1  Table 2  Table 3  Table 4 
Table 5  Table 6  Table 7  Table 8 
Figure 1 