Brambilla G^{1}, Gallo V^{1} and Zambon G^{2*}  
^{1}CNR-Institute of Acoustics and Sensors “O.M. Corbino”, 00133 Rome, Italy  
^{2}University of Milano Bicocca DISAT, 20126 Milan, Italy  
Corresponding Author: Zambon G, University of Milano Bicocca DISAT, 20126 Milan, Italy, Tel: +3902 64482744, Email: [email protected]  
Received: August 05, 2015; Accepted: October 20, 2015; Published: October 26, 2015  
Citation: Brambilla G, Gallo V, Zambon G (2015) Prediction of Accuracy of Temporal Sampling Applied to Non-Urban Road Traffic Noise. J Pollut Eff Cont 3:147. doi:10.4172/2375-4397.1000147  
Copyright: © 2015 Brambilla G, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.  
Legislation on road traffic noise often requires that acoustic descriptors be determined over a medium or long term. Such durations are not feasible for attended monitoring, so temporal sampling is often applied to save time and resources. However, noise descriptors estimated from levels measured at the sampling times are affected by uncertainty, the amount of which depends on the ratio between the total measurement time and the period to which the estimate refers, as well as on the variability of the noise immission at the measurement point. This paper describes the results of a statistical analysis performed on a large set of acoustic data collected at 80 sites along the non-urban road network in the Lombardia region (Italy). The aim of the analysis is to determine the accuracy of two procedures that estimate: i) the daytime (06 to 22 h) A-weighted equivalent level L_{Aeqd} and the nighttime (22 to 06 h) A-weighted equivalent level L_{Aeqn} from the hourly A-weighted equivalent level L_{Aeqh}; and ii) the L_{Aeqh} level from the L_{Aeqt} measured continuously for a shorter time interval t. The proposed procedures make it possible to predict the accuracy of both estimates; the accuracy of the second, that is the L_{Aeqh} level from the L_{Aeqt}, was found to increase with hourly traffic flow and measurement time. An example of the application of the two procedures is also described.
Keywords 
Road traffic noise; Noise monitoring; Temporal sampling; Accuracy 
Introduction 
Because road traffic noise is a random phenomenon, the relevant legislation and standards often require the determination of the acoustic descriptors over either a medium or a long term. For instance, the day-evening-night level L_{den} introduced by the European Directive 2002/49/EC (2002) [1], even if referred to 24 h, should be representative of the annual period, and, in Italy, the current legislation requires that road traffic noise monitoring lasts at least one week [2]. 
Current instrumentation enables measurements over a long time, as it can store and transmit a large amount of data. However, such a duration is not feasible for monitoring attended by an operator, and unattended monitoring requires time-consuming post-processing validation of the acquired data to eliminate all of the sound events that are not associated with road traffic noise. In addition, the need to save resources and to improve the spatial sampling resolution where required often leads to the use of temporal sampling procedures [3]. 
These procedures make attended monitoring feasible and therefore remove the need for data validation. However, the values of the noise descriptors for a medium or long term estimated from those measured at the sampling times are affected by uncertainty, the amount of which depends on the ratio between the measurement time and the medium or long term, as well as on the variability of the noise immission at the measurement point. Several studies on this aspect are available in the literature. For instance, Bordone-Sacerdote et al. [4] described a simple approximate criterion to calculate the uncertainty in evaluating the noise level due to N vehicles per hour, all of the same type, moving with constant speed on one line and direction. Alberola et al. [5] investigated the statistical variability of two weeks of noise recordings at 50 locations in residential areas affected mainly by road traffic noise. The observed relationships between variability and either the logarithmic or the arithmetic mean L_{Aeq} over the time periods investigated may be of assistance when estimating the noise level variability and the uncertainty associated with a noise measurement affected by road traffic or other environmental noise sources. In a theoretical approach, Makarewicz et al. [6] proposed that the long-term average sound level L_{AeqT} can be approximated by a few, m, short-term, τ, average sound levels, L_{Aeqτ}, so that mτ << T, and that the uncertainty of such an approximation should be calculated by the nonlinear uncertainty of L_{Aeqτ} for m<10. The analysis of 5 years of continuous noise measurements carried out at one site in Valencia led Gaja et al. [7] to conclude that a random-day strategy gives a more accurate estimate of the annual equivalent level from the 24-h noise level than a consecutive-day strategy. Other things being equal, further studies, such as Brambilla et al. [8], confirmed that random sampling is more efficient than continuous sampling. 
Bellucci et al. [9] analyzed the noise data collected at 10 sites along non-urban roads to evaluate the accuracy of 10 and 20 minute continuous sampling in the estimate of the hourly L_{Aeqh} and of the daytime (06 ÷ 22 h) L_{Aeqd} and nighttime (22 ÷ 06 h) L_{Aeqn} values. When more than 100 vehicle pass-bys occurred during the measurement time, the accuracy in the estimate of L_{Aeqh} from the measured L_{Aeq} was observed to be within ± 1 dB for both sampling times. Brocolini et al. [10] analyzed acoustic measurements carried out continuously during three months in Paris at six locations, considering samples of 5 min, 10 min, 15 min, 20 min, 30 min and 1 h duration. The results showed that a sampling duration of at least 10 min is necessary to discriminate among homogeneous time periods. 
Predicting the accuracy of the estimated values of L_{Aeq} is important because this accuracy can have a large influence on the compliance with the limits required by legislation and standards and the corresponding costs of mitigation actions. 
To address this issue, this paper presents a practical approach for determining and predicting the above accuracy. It describes the results of the statistical analysis performed on a large set of acoustic data collected by continuous monitoring during weekdays at 80 sites alongside the non-urban road network in the Lombardia region (Italy). The roads have different layouts, from the widest, with two carriageways and three lanes for each direction, to the narrowest, with one carriageway and one lane for each direction. 
The aim of the analysis was twofold, that is to determine the accuracy of: 
• the estimate of the daytime (06 to 22 h) A-weighted equivalent level L_{Aeqd} and the nighttime (22 to 06 h) A-weighted equivalent level L_{Aeqn} from the 24-h hourly pattern of the A-weighted equivalent level L_{Aeqh}; 
• the estimate of the hourly L_{Aeqh} from the L_{Aeqt} measured continuously for different shorter durations, namely t=5, 10, 15, 20 and 30 minutes. 
The analysis also took the hourly traffic flow into account. 
Acoustic Database and Processing 
The acoustic monitoring carried out on weekdays at the 80 sites alongside the network of non-urban roads in the Lombardia region in the years 2000-2006 provided a large database containing not only the A-weighted equivalent level L_{Aeq} measured at a sampling interval of 1 minute (L_{Aeq1m}) but also, at 15 sites, the hourly traffic flow for 24 hours. At another 35 sites the traffic flow was available for 1 hour during the daytime. The traffic was always free flowing during the monitoring and the microphone was located at 3 to 60 m from the kerbside, as already described in Zambon et al. [11]. 
The L_{Aeq1m} data have been pooled to obtain the corresponding values L_{Aeqt} at the times t of 5, 10, 15, 20, 30 and 60 minutes, as well as the daytime L_{Aeqd} and nighttime L_{Aeqn} levels. 
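The pooling step can be illustrated with a minimal Python sketch. It assumes that the 1-minute levels are combined by energy averaging, which is the usual definition of an equivalent level over equal-duration intervals; the function names `leq` and `pool` and the sample data are hypothetical and are not taken from the study.

```python
import math

def leq(levels_db):
    """Energy-average a sequence of equal-duration levels given in dB."""
    if not levels_db:
        raise ValueError("at least one level is required")
    mean_energy = sum(10 ** (level / 10) for level in levels_db) / len(levels_db)
    return 10 * math.log10(mean_energy)

def pool(levels_1min, t_minutes):
    """Return LAeq,t for each consecutive full block of t_minutes one-minute levels."""
    n_blocks = len(levels_1min) // t_minutes
    return [
        leq(levels_1min[i * t_minutes:(i + 1) * t_minutes])
        for i in range(n_blocks)
    ]

# Hypothetical hour: 30 quiet minutes at 60 dB followed by 30 busy minutes at 70 dB.
hour = [60.0] * 30 + [70.0] * 30
laeq_h = leq(hour)        # hourly level, dominated by the louder half
laeq_30 = pool(hour, 30)  # two 30-minute levels
print(round(laeq_h, 1), [round(v, 1) for v in laeq_30])
```

Because the average is taken over energies and not over decibel values, the hourly level of this hypothetical hour lies much closer to 70 dB than to the arithmetic mean of 65 dB.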
The ranges of the daytime L_{Aeqd} and nighttime L_{Aeqn} at the 80 sites were very wide, as reported in Table 1, and the distributions of the hourly L_{Aeqh} levels, with a bin width of 1.5 dB, are shown in Figure 1. The difference between the mean values of the day and night L_{Aeqh} distributions was 7.0 dB. 
At the 21 sites where the monitoring was performed for longer than 24 h, the median of the corresponding L_{Aeqh} values for each i^{th} hour was considered. This led to 59 (80 − 21) profiles of hourly L_{Aeqh} available for the statistical analysis. The median was preferred to the mean value because the former is less influenced by outliers. This data pooling prevented 24-h L_{Aeqh} profiles measured at the same road but on different days from being allocated to different groups by the subsequent cluster analysis. 
Because the measurements were performed in various environmental setups, a direct comparison among the 24-h profiles of the hourly L_{Aeqh} was not meaningful. For this reason, and to perform further statistical analysis of the data, each i^{th} value of hourly L_{Aeqh} in the j^{th} temporal series was referred to the corresponding daytime level, L_{Aeqdj}, and the difference δ_{ij} was considered: 
δ_{ij} = L_{Aeqhij} − L_{Aeqdj} [dB] (i = 1, ..., 24; j = 1, ..., 59) (1) 
as shown in the example in Figure 2. 
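The normalization in equation (1) can be sketched in Python as follows. This is only an illustration: it assumes that the daytime level is the energy average of the 16 hourly levels between 06 and 22 h, it indexes hours from 0 (hour h covers h:00 to h+1:00) instead of the 1 to 24 index used in the text, and the profile values are invented.

```python
import math

def leq(levels_db):
    """Energy-average a sequence of equal-duration levels given in dB."""
    mean_energy = sum(10 ** (level / 10) for level in levels_db) / len(levels_db)
    return 10 * math.log10(mean_energy)

def daytime_level(hourly):
    """LAeqd from 24 hourly levels; index h is the hour starting at h:00."""
    if len(hourly) != 24:
        raise ValueError("a 24-h profile needs exactly 24 hourly levels")
    return leq(hourly[6:22])  # hours 06-22, i.e. 16 values

def normalised_profile(hourly):
    """delta_i = LAeqh_i - LAeqd for every hour of one 24-h profile."""
    l_d = daytime_level(hourly)
    return [level - l_d for level in hourly]

# Hypothetical profile: 55 dB during the night hours, 65 dB during the day hours.
profile = [55.0] * 6 + [65.0] * 16 + [55.0] * 2
delta = normalised_profile(profile)
print([round(d, 1) for d in delta])
```

After this step, profiles measured at different distances from the road can be compared, since only their shape relative to the daytime level remains.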
The reference to the daytime L_{Aeqd} was chosen because this descriptor is more often available than the nighttime L_{Aeqn}. However, the methodology described in the following for the estimate of the daytime L_{Aeqd} can also be applied to develop similar procedures for estimating the nighttime L_{Aeqn}, provided that each i^{th} value of hourly L_{Aeqh} in the j^{th} temporal series is referred to the corresponding nighttime level, L_{Aeqnj}. Table 2 reports the distribution of the 24-h profiles of the hourly L_{Aeqh} available for the statistical analysis. 
The 24-h profiles were grouped according to the day of monitoring (Monday to Friday). In addition, the unsupervised technique of clustering was used to group together profiles which are “close” to one another in a multidimensional feature space, to uncover some inherent structure of the data. Various clustering algorithms were used, namely the hierarchical agglomeration using the Ward method [12], the K-means using the Hartigan and Wong algorithm [13], the partitioning around medoids by Kaufman and Rousseeuw [14] and the model-based method. The results of these algorithms were compared and the most appropriate number of clusters for the data, a compromise between satisfactory discrimination and the need for a limited number of clusters, was chosen. The number of clusters k was varied from five (for a straightforward comparison with the categorization according to the day of monitoring) down to two, which corresponds to the minimal discrimination. The Euclidean distance was chosen as the metric of the distance among observations. 
The statistical software R (an open-source programming environment for data analysis, graphics and statistical computing) was applied for the above clustering and the package “clValid” by Brock et al. [15-17] was used to validate the results. For such validation, three features of the cluster partitions were considered, namely, compactness, connectedness, and separation. Connectedness relates to the extent to which observations are placed in the same cluster as their nearest neighbours in the data space and is measured by the connectivity [18]. The connectivity has a value between zero and ∞ and should be minimized. Compactness assesses cluster homogeneity, usually by examining the intra-cluster variance, while separation quantifies the degree of separation between clusters (usually by measuring the distance between cluster centroids). Because compactness and separation demonstrate opposing trends (compactness increases with the number of clusters but separation decreases), popular methods combine the two measures into a single score, such as the Dunn index [19] and silhouette width [20]. The Dunn index has a value between zero and ∞ and should be maximized. The silhouette width lies in the interval [−1, 1] and should be maximized. 
Considering the estimate of the hourly L_{Aeqh} from the L_{Aeqt} level measured continuously for a shorter time t: 
t = m·M [s], with 0 < m < 1 (2) 
where M=3600 s, the L_{Aeqt} values referring to the measurement times t of 5, 10, 15, 20 and 30 minutes were compared with the corresponding hourly L_{Aeqh} to determine the difference: 
ε_{t} = L_{Aeqt} − L_{Aeqh} [dB] (3) 
Thus, with the assumption that the estimated L_{Aeqh} is equal to the measured L_{Aeqt}, the above difference represents the error ε_{t} of such an estimate. The errors ε_{t} were analyzed as a function of the standard deviation of the L_{Aeqt} values belonging to the relevant hour and of the hourly traffic flow, as well as in terms of the probability P_{tE} that the accuracy of the hourly L_{Aeqh} estimate from L_{Aeqt} is within a specific interval E, namely ± 0.5 and ± 1.0 dB, corresponding to interval widths of 1 and 2 dB respectively. The value of the probability P_{tE} was obtained as the number of measurements within the selected interval divided by the total number of measurements. 
The available data sets for the above analyses are reported in Table 3. 
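The error in equation (3) and the empirical probability P_{tE} can be sketched in Python as below. The function names and the sample levels are hypothetical, and treating the interval bounds as inclusive is an assumption, since the text does not specify it.

```python
def estimate_errors(laeq_t_values, laeq_h):
    """Error of taking each short-term LAeq,t as the estimate of the hourly LAeq,h."""
    return [laeq_t - laeq_h for laeq_t in laeq_t_values]

def probability_within(errors, half_width):
    """Fraction of errors inside [-half_width, +half_width] (bounds assumed inclusive)."""
    if not errors:
        raise ValueError("at least one error value is required")
    hits = sum(1 for e in errors if abs(e) <= half_width)
    return hits / len(errors)

# Hypothetical hour with LAeq,h = 65.0 dB and four 15-minute levels.
errors = estimate_errors([64.2, 65.8, 64.9, 63.7], 65.0)
p_05 = probability_within(errors, 0.5)  # share of estimates within +/- 0.5 dB
p_10 = probability_within(errors, 1.0)  # share of estimates within +/- 1.0 dB
print(p_05, p_10)
```

Widening the interval can only keep or increase the probability, which is consistent with the higher values reported later in the text for ± 1.0 dB.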
Results and Discussion 
The main results of the estimate of the daytime L_{Aeqd} from the hourly L_{Aeqh} and those of the estimate of the hourly L_{Aeqh} from the L_{Aeqt} values measured continuously for a shorter time t are described separately. 
Estimate of daytime L_{Aeqd} from the hourly L_{Aeqh} 
Figure 3 shows the 24-h profiles of the average δ_{ij} for each weekday from Monday to Friday. With this grouping the profiles overlap very often, especially during the day period from 06 to 22 h. For the night period (22 to 06 h) the highest and lowest average profiles correspond to Friday and Monday, respectively. 
A different classification was obtained by clustering. Table 4 summarizes the output of the validation of the results obtained by the various clustering methods in terms of the optimal scores observed for the connectivity, the Dunn index and the silhouette width. 
After the analysis of the detailed results for each clustering method, the two groups obtained by the K-means were considered to be a reasonable solution, also because the corresponding values of the Dunn index and the connectivity were not far from the optimal scores (0.13 and 18.55 respectively). Figure 4 shows the results of the multidimensional scaling (MDS) applied to the data to provide a visual representation of the pattern of proximities among the data. 
The discrimination between the two clusters is rather good; the centroids C1 and C2 are marked by stars in the plot. Clusters 1 and 2 comprise 33 and 26 profiles respectively and their correspondence (in percentage) with the categorization based on weekdays is reported in Table 5. For each day the 24-h profiles are split fairly evenly between the two clusters. Cluster 2 groups the majority of the profiles observed on Monday, whereas Cluster 1 is formed by the majority of the profiles of the other weekdays. 
The average profiles δ_{ik} for each cluster and the 95% confidence intervals of the mean are shown in Figure 5. Because the distributions of the data were not normal for some hours, the mean and its confidence interval were calculated using the bootstrap method [21], considering 1000 resamples with replacement. 
The hourly intervals with significant differences between the two average profiles at the confidence level of 95% were identified by the Mann-Whitney test and are listed in Table 6 with the corresponding significance value. The best discrimination between the clusters occurs during the nighttime (22-06 h). In the 07-19 h period, the average profile of cluster 1 has very small fluctuations around the L_{Aeqd}, whereas that of cluster 2 shows larger fluctuations, but still within 1 dB. The median value of the difference L_{Aeqd} – L_{Aeqn}, together with the standard deviation given in parentheses, is also reported in Figure 5: cluster 1 shows a value 1.7 dB lower than that observed for cluster 2. As the noise emission of the road under consideration is not known “a priori”, additional information linked to such emission, i.e. traffic flow, is necessary for the selection of the average profile most appropriate for the road itself. For this purpose, the average daily traffic flows (ADT) and their 95% confidence intervals for all the roads belonging to each cluster were calculated by the bootstrap method. The results are reported in Table 7. 
The boxplot of the ADT values given in Figure 6 shows a clear overlap of the data associated with the two clusters, which leads to uncertainty in the selection of the appropriate profile. Thus, this parameter is not suitable for the above purpose. To overcome this problem, a deeper analysis of the traffic flows was performed on an hourly basis. The Mann-Whitney test, which was applied to the hourly traffic flows of the roads according to their cluster membership, showed that the differences between the two clusters were not significant at the 95% confidence level for the period between 10 and 16 h, as shown in Figure 7, where the hourly average values of traffic flow and the corresponding 95% confidence intervals are reported. 
Thus, the hourly traffic flow is suitable for the appropriate selection of the cluster average profile, provided that it is not measured in the 10-16 h period. After all, traffic flow data are usually available for rush hours, which usually are outside the overlapping period, as shown in Figure 7. Cluster 1 includes the busiest roads. On the other hand, looking at the hourly cluster profiles and their corresponding 95% confidence intervals plotted in Figure 5, and zoomed in for the day period 7-19 h in Figure 8, the hourly intervals most suitable for the best accuracy in the L_{Aeqd} estimate are observed in the period from 12 to 16 h for both clusters. 
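A bootstrap confidence interval of the mean can be sketched in Python as follows. The sketch uses the percentile interval, which is an assumption: the text states only that 1000 resamples were drawn, not which type of bootstrap interval was computed. The function name, the seed and the sample δ values are hypothetical.

```python
import random
import statistics

def bootstrap_mean_ci(values, n_resamples=1000, confidence=0.95, seed=0):
    """Mean of `values` with a percentile bootstrap confidence interval."""
    if len(values) < 2:
        raise ValueError("at least two values are required")
    rng = random.Random(seed)  # fixed seed so that the sketch is reproducible
    n = len(values)
    # Resample with replacement and record the mean of every resample.
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    alpha = (1.0 - confidence) / 2.0
    lower_idx = max(0, round(alpha * n_resamples))
    upper_idx = min(n_resamples - 1, round((1.0 - alpha) * n_resamples) - 1)
    return statistics.fmean(values), means[lower_idx], means[upper_idx]

# Hypothetical delta values (dB) of one hour across the profiles of a cluster.
deltas = [-0.4, 0.1, 0.3, -0.2, 0.0, 0.5, -0.1, 0.2]
mean, ci_low, ci_high = bootstrap_mean_ci(deltas)
print(round(mean, 3), round(ci_low, 3), round(ci_high, 3))
```

The approach needs no normality assumption, which is the reason given in the text for preferring it for hours with non-normal data.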
Estimate of hourly L_{Aeqh} from L_{Aeqt} measured for shorter time interval t 
The hourly L_{Aeqh} is not often measured continuously, whereas it is frequently estimated by the L_{Aeqt} values measured for a shorter time t according to the following relationship: 
(4) 
To evaluate the accuracy of such an estimate, the differences ε_{t} in equation (3), calculated for the measurement times t of 5, 10, 15, 20 and 30 minutes, were determined for all the monitoring data considering their cluster memberships. Figure 9 shows the box plots of the obtained values of ε_{t} for each measurement time t and cluster. 
As expected, the amplitude of the error ε_{t} decreases as the measurement time t increases, and the means and medians tend to zero. For each measurement time t, the mean closest to zero and the smallest standard deviation are observed for cluster 1, which includes the roads with the highest traffic flows. The Kolmogorov-Smirnov test showed that none of the distributions was normal at the 95% confidence level; their means and standard deviations are reported in Table 8. 
In order to predict the error ε_{t} for each measurement time, the median absolute value of the error of the L_{Aeqh} estimate obtained from L_{Aeqt} was related to the hourly traffic flow. The traffic flow data were grouped in bins with a width of 100 vehicles/hour, and the median absolute value of the error in each bin was considered. The plots in Figure 10 report the traffic flow on the x axis on a log scale and show the regression lines for the five measurement times t and the two clusters. For a fixed hourly traffic flow, the errors observed for cluster 2 are greater than those for cluster 1 and, as expected, for both clusters the errors decrease with increasing traffic flow and measurement time t. In addition, for a fixed measurement time t the regression line for cluster 2 is steeper than that for cluster 1. 
Table 9 reports the values of parameters A and B in the relationship used for the data interpolation, together with the adjusted coefficient of determination R^{2}. The last column gives the parameters obtained by interpolation of all the data, regardless of their cluster membership. 
The differences in the accuracy of the L_{Aeqh} estimate obtained with the various sampling times decrease with increasing hourly traffic flow, as clearly shown in Figure 11 for all the data pooled together. The 30 minute sampling was taken as reference, as it was the most accurate in the L_{Aeqh} estimate, and the y axis reports the corresponding differences of the median absolute value of the error for the sampling times t=5, 10, 15 and 20 minutes; the greater this difference, the lower the accuracy. It can be seen that above 4000 vehicles/hour the 10 minute sampling performs slightly better than the 15 minute one, but the former depends more strongly on the traffic flow than the latter (its regression line is steeper). 
The results for the measurement time t=10 minutes were compared with those obtained in a previous similar study carried out along non-urban roads in the Lazio region in Italy [9]. As can be seen in Figure 12, the regression relationship of the data collected in the Lazio region is steeper than that obtained for the present study, but the differences are rather small and increase with increasing hourly traffic flow. 
Regarding the probability P_{tE} that ε_{t} is within a specific accuracy range, Figure 13 reports the regression lines obtained by fitting the data of the hourly traffic flow with those corresponding to the five measurement times for the accuracy range of ± 0.5 dB and for both clusters. The values of the regression parameters obtained from the fitting are given in Table 10, together with the adjusted coefficient of determination R^{2}. The last column gives the parameters obtained by interpolation of all the data, regardless of their cluster membership. The probabilities P_{t0.5} obtained for cluster 2 are lower than those for cluster 1 and these differences decrease with increasing hourly traffic flow. 
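The binning and fitting steps can be sketched in Python as below. The functional form is an assumption: since Figure 10 shows straight regression lines against a logarithmic flow axis, the sketch fits median |ε_{t}| = A + B·log10(Q) by ordinary least squares, and the assignment of A to the intercept and B to the slope is also assumed. Representing each bin by its centre, the function names and all data values are hypothetical.

```python
import math
import statistics
from collections import defaultdict

def median_abs_error_by_bin(flows, errors, bin_width=100):
    """Median |error| per traffic-flow bin, returned as (bin centre, median) pairs."""
    bins = defaultdict(list)
    for q, e in zip(flows, errors):
        bins[int(q // bin_width)].append(abs(e))
    return sorted(
        ((b + 0.5) * bin_width, statistics.median(v)) for b, v in bins.items()
    )

def fit_log_line(points):
    """Least-squares fit of y = A + B*log10(Q); returns (A, B). Form is assumed."""
    xs = [math.log10(q) for q, _ in points]
    ys = [y for _, y in points]
    x_mean, y_mean = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("at least two distinct flow values are required")
    slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / sxx
    return y_mean - slope * x_mean, slope

# Hypothetical hourly flows (vehicles/hour) and errors (dB).
binned = median_abs_error_by_bin([120, 180, 250], [0.4, -0.8, 0.3])

# Synthetic points lying exactly on y = 2.0 - 0.5*log10(Q), to check the fit.
synthetic = [(q, 2.0 - 0.5 * math.log10(q)) for q in (100, 1000, 10000)]
a, b = fit_log_line(synthetic)
print(binned, round(a, 3), round(b, 3))
```

A negative slope B corresponds to the behaviour described in the text, with the error decreasing as the hourly traffic flow increases.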
The above-mentioned probabilities P_{t0.5} were compared with those computed according to the following relationship proposed by Bordone-Sacerdote et al. [4]: 
(5) 
where T=3600 s, t is the measurement time [s], N the hourly traffic flow, n the vehicles counted in the measurement time t calculated by: 
(6) 
n_{1} and n_{2} the uncertainty in vehicle counting corresponding to the uncertainty E=± 0.5 dB in the noise level, that is: 
(7) 
(8) 
Equation (5) provides values of P_{t0.5} under the assumption of N vehicles per hour, all of the same type, moving with constant speed on one line and direction. Figure 14 shows that the experimental data and their fitting (solid lines) are somewhat lower than the corresponding values provided by equation (5), reported by dashed lines. This is most likely due to the difference between real traffic conditions and those assumed for equation (5). 
Regarding the accuracy range of E= ± 1.0 dB, higher probabilities P_{t1.0} were observed, as expected, other factors being equal, as shown in Figure 15, which deals with all the data regardless of their cluster membership. For instance, for t=15 minutes at the hourly traffic flow of 1000 vehicles/hour, widening the accuracy range from ± 0.5 to ± 1.0 dB increases the probability P_{15mE} by 26.1 percentage points (from 49.4% at ± 0.5 dB up to 75.5% at ± 1.0 dB). 
Example of application 
To illustrate the features of the procedures described above and the associated uncertainties in the estimation, let us assume that road traffic monitoring carried out continuously for 15 minutes in the interval 8:15-8:30 h gives L_{Aeq15m}=64.0 dB(A) and a traffic flow of 250 vehicles during the 15 min measurement time. Assuming that the traffic flow is evenly distributed throughout the hourly interval from 8 to 9 h, the corresponding hourly traffic flow is 250×4=1000 vehicles/hour. Thus, the road can be associated with cluster 2 (see Figure 7) and, for the 15 min measurement time, the median value of |ε_{15m}| is estimated, as shown in Figure 10 and Table 9, to be: 
(9) 
which can be assumed as the standard uncertainty of the estimate of the hourly L_{Aeqh} from the L_{Aeq15m} measured for 15 minutes. For the 8-9 h hourly interval and cluster 2, Figure 8 provides the corresponding δ_{h} value: 
(10) 
Thus, the estimated value of the day L_{Aeqd} is equal to: 
(11) 
(12) 
with a standard uncertainty of 0.65 dB. The combined uncertainty of the two procedures, under the simplifying hypothesis that they are uncorrelated, is calculated by: 
(13) 
Considering the coverage factor k=1.96 corresponding to the 95% confidence level, the estimated daytime L_{Aeqd} with the expanded uncertainty is as follows: 
(14) 
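The combination of uncorrelated standard uncertainties and the expansion with a coverage factor can be sketched in Python as follows. The root-sum-of-squares rule is the standard one for uncorrelated contributions; the value `u_hourly` below is purely hypothetical, because the numerical result of equation (9) is not available here, whereas 0.65 dB and k=1.96 are taken from the text. The printed numbers are therefore illustrative only and are not the results of the paper.

```python
import math

def combined_standard_uncertainty(*components):
    """Root-sum-of-squares of uncorrelated standard uncertainties (dB)."""
    if not components:
        raise ValueError("at least one uncertainty component is required")
    return math.sqrt(sum(u * u for u in components))

def expanded_uncertainty(u_combined, k=1.96):
    """Expanded uncertainty for coverage factor k (1.96 for about 95% confidence)."""
    return k * u_combined

u_hourly = 0.8    # HYPOTHETICAL: uncertainty of the LAeq,h estimate from LAeq,15m
u_profile = 0.65  # from the text: uncertainty of the LAeq,d estimate from LAeq,h

u_c = combined_standard_uncertainty(u_hourly, u_profile)
u_expanded = expanded_uncertainty(u_c)
print(round(u_c, 2), round(u_expanded, 2))
```

Further uncorrelated contributions, such as that of the instrumentation, could be passed as additional arguments to the same function.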
Regarding the estimate of L_{Aeqn}, in addition to the similar procedures which can be developed as described for the L_{Aeqd} estimate, a straightforward calculation can be based on the estimated value of L_{Aeqd}, considering the median value of the differences L_{Aeqd} – L_{Aeqn} and taking the standard deviation of these differences as the standard uncertainty of such an estimate. Figure 5 shows for cluster 2 the median value and the standard deviation s as follows: 
(15) 
(16) 
Then: 
(17) 
with a standard uncertainty of 1.4 dB. 
However, the above uncertainty budgets are limited to the proposed procedures under the simplifying hypothesis that these are uncorrelated; the standard uncertainties due to other sources, at least that due to the instrumentation, should also be considered, for instance as described by Craven et al. [22]. 
References 
