Audrey W. Zhu^{1*} and Halton Pi^{2}  
^{1}Santa Monica High School 601 Pico Blvd., Santa Monica, CA 90405, USA  
^{2}Torrey Pines High School, 3710 Del Mar Heights Rd., San Diego, CA 92310, USA  
Corresponding Author :  Audrey W. Zhu Santa Monica High School 601 Pico Blvd. Santa Monica, CA 90405, USA Tel: +13103159823 Email: [email protected] 

Received February 28, 2014; Accepted April 11, 2014; Published April 18, 2014  
Citation: Zhu AW, Halton Pi (2014) A Method for Improving the Accuracy of Weather Forecasts Based on a Comprehensive Statistical Analysis of Historical Data for the Contiguous United States. J Climatol Weather Forecasting 2: 110. doi:10.4172/23322594.1000110  
Copyright: © 2014 Zhu AW, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.  
Using historical weather forecast data downloaded from the National Oceanic and Atmospheric Administration’s [NOAA] National Weather Service Digital Library, we performed statistical analysis on the forecast accuracies of temperature, probability of precipitation, quantitative precipitation and wind speed. The major findings of this study are: (1) There are significant variations in forecast accuracies at different geographical locations in the United States; (2) The overall errors of 3-day or longer temperature forecasts are similar in magnitude to the standard deviations of historical daily changes in temperature; (3) There are statistically significant biases in the forecasts of large positive or negative changes in temperature; (4) The observed probabilities of precipitation are significantly lower than forecasted probabilities for 2-day or longer horizons; (5) On average, the 3-day or longer forecasts for quantitative precipitation tend to significantly underestimate the actual amount for periods of heavy precipitation; and (6) Forecasters generally underpredict wind speeds by a large margin for days when wind speeds exceed 20 mph. An improved weather forecast model can be constructed based on some of the empirical statistical parameters from this study.
Keywords  
Weather forecast; Statistical models  
Introduction  
Despite advances in computing and satellite technologies and improvements in the various atmospheric models scientists use to predict weather [1], there are still significant uncertainties in weather forecasts. Almost everyone has his or her own anecdotal experiences of the unreliability of weather forecasts. Some past studies have focused on the accuracies of forecasting severe weather conditions, such as tropical cyclones [2,3], but there have been fewer studies focused on quantifying the uncertainties of general everyday weather forecasts, or comparing forecast accuracies across different geographical regions within the United States [4-6].  
In this study, we make use of the historical forecast data from the National Oceanic and Atmospheric Administration’s [7] National Weather Service Digital Library website [8] to perform statistical analysis on the forecast accuracies of the most commonly watched weather forecasts: temperature, probability of precipitation, quantitative precipitation and wind speed. To compare geographical variations in forecast accuracies, we selected 60 geographical locations almost evenly spaced throughout the continental United States in terms of latitude and longitude^{1}.  
The main goal of this study is to focus on answering questions such as:  
1. Statistically, do temperature forecasts do a better job than simply using the previous day’s number? Or, in other words, are the uncertainties of forecasts smaller than normal daily fluctuations?  
2. When the forecast predicts a coming heat wave or calls for heavy rain, does it get it right on average, or does it over- or underpredict its target, statistically speaking? What are the standard deviations of observed versus predicted values?  
3. When the forecast calls for a 50% chance of precipitation, does it simply mean that they are not really sure whether it’s going to rain, or the weather model actually calculates a 50% probability? Does 50% chance of precipitation in Buffalo, NY mean the same thing as in Los Angeles, CA?  
4. Among the weather variables, wind speeds are probably the most difficult to predict. But how uncertain are wind speed forecasts? Do the forecasters generally over- or underforecast wind speeds?  
5. Are there variations in weather forecast accuracies across different parts of the continental United States? Is it harder to predict rain or large changes in temperature in the Great Lakes areas as compared to the Coastal Northeast?  
Even though some of our analysis will focus on the occurrences of heat waves with temperature changes of over 10 degrees or heavy precipitation of more than half an inch, the focus of our study can still be characterized as high-probability weather events. This is in contrast to studies of rare weather events such as tornados that could cause multibillion-dollar damages, or a flood with a 100-year recurrence interval. In principle one can also use the weather forecast database from the NWS Digital Library to study rare weather events. But since rare weather events happen at very low frequency, using a general everyday forecast database is less efficient.  
Second, assessing forecast accuracies for rare weather events can be intrinsically difficult as the science for modeling these rare events continues to improve over the years. It is hard to measure the accuracy of a continuously improving forecast model for a rare weather event that happens once every hundred years.  
For this study, we will use statistical measures based on the mean and standard deviation, as opposed to composite scores, for ranking purposes. We will give estimates for the sampling errors^{2} of the mean and standard deviation whenever appropriate. For example, for comparison purposes we will list the average daytime maximum temperature for different geographical locations, as well as their overall variations as measured by the standard deviations. For both of these measures, we will not list their sampling errors as the number of measurements is sufficiently large for this study. We will, however, give estimates of sampling errors for comparisons that have a limited number of data points, e.g., the average forecast error for quantitative precipitation greater than 1.0 inches for Los Angeles, CA.  
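The sampling-error approximations used in this study (σ/√N for the mean and σ/√(2N) for the standard deviation, per footnote 2) can be illustrated with a minimal Python sketch. The function name `summarize` and the sample values are hypothetical and are not taken from the study's data.

```python
import math
import statistics


def summarize(errors):
    """Return mean, sample standard deviation, and their approximate sampling errors."""
    n = len(errors)
    mean = statistics.fmean(errors)
    sigma = statistics.stdev(errors)  # sample standard deviation (n - 1 denominator)
    return {
        "n": n,
        "mean": mean,
        "std": sigma,
        "mean_err": sigma / math.sqrt(n),      # sampling error of the mean
        "std_err": sigma / math.sqrt(2 * n),   # sampling error of the standard deviation
    }


# Hypothetical actual-minus-forecast precipitation errors, in inches
sample = [0.31, -0.12, 0.45, 0.08, 0.27, -0.05, 0.52, 0.19]
stats = summarize(sample)
print(stats)
```

With only a handful of data points, as in this toy sample, the sampling error of the mean is a sizeable fraction of the mean itself, which is why such estimates matter for the sparsely populated comparisons.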
In the next section, we will give a brief description of the structure and type of weather data that was used in this study, and the software that we use to decode and access the database. We follow that with sections that discuss the main results for temperatures, precipitation, and wind speeds. Finally, we offer some ideas for future work and some potential practical applications of our findings.  
Data Collection and Processing  
Weather data are typically stored in a binary format known as the GRIB [Gridded Binary] format. This format is used by most of the world to store weather-related data. It contains standardized header information that can be read by a variety of software packages^{3}. The web site we used to download historical weather forecasts is maintained by the National Environmental Satellite, Data, and Information Service’s National Climatic Data Center [9] under the US Department of Commerce. The data set is extremely comprehensive and covers time periods from 2004 to the present. It is freely available to anyone who is interested in studying weather data, but one generally needs a very high-speed internet connection to be able to download the data in any reasonable amount of time.  
The files are identified primarily by their WMO [World Meteorological Organization] header codes. For example, files that start with ‘YG’ contain daytime maximum temperature, and ‘YD’ files contain data for probability of precipitation. Some of these WMO codes are listed in Table 1.  
In this study, we analyze data from the time period of January 2009 to June 2013. Since each day is one file, for each WMO code, there are approximately 1,600 files to download. The file sizes vary depending on the type of weather data, but they range from 50 MB to 200 MB each^{4}. The total amount of data eventually downloaded adds up to approximately 2 TB^{5}. To read the binary-formatted GRIB data, we used the Degrib [10,11] software package. Degrib works by probing the data for a particular geographical location using the latitude and longitude of the location. In this study, we chose 60 geographical locations roughly evenly spaced throughout the continental United States^{6}. Since GRIB data are structured around the latitude and longitude grid [using bilinear interpolations between points^{7}], ideally one should select a grid with fixed spacing between grid points, but for this study, we use mostly metropolitan areas for the purposes of easy name recognition.  
For post-processing of the GRIB data, we extract the relevant weather measure for each geographical location for the entire 2009-2013 time period and put the results in one data file. For example, we would extract daytime maximum temperature data for Los Angeles, CA from the approximately 1,600 “YG” files into one single file. This makes processing all 60 geographical locations easier and faster, as one does not need to go through a large GRIB file for each geographical location, and it saves the time otherwise spent accessing a large number of files.  
Daytime Maximum Temperature  
In this part, the first set of questions we will attempt to answer is: What is the magnitude of the overall error in daytime maximum temperature forecasts? Are there statistically significant geographical differences? If so, do those differences correlate with the location’s overall volatility of temperature changes?  
For 6-day forecast results, Table 2 ranks overall forecast errors, expressed as the standard deviation of actual minus forecast. This analysis shows that the forecast errors in most cases are similar in magnitude to the day-to-day change in actual temperature. At first glance, this might seem to indicate that the 6-day forecasts do not provide much value, as someone making up numbers at random could produce errors comparable to the day-to-day fluctuations. What we need to keep in mind here, however, is that the results listed here are simply overall averages. They do not imply that the forecasters are completely incapable of forecasting anything 6 days ahead. They simply mean that the forecast errors are fairly large, on average. In other words, forecasters could be off by a similar amount on both normal days and days that have large changes in temperature.  
In Table 2, we also list the ratios of the forecast errors to the standard deviations of day-to-day changes. We see that there are fairly large variations in this ratio among the geographical locations listed^{8}. The highest ratios are those of Santa Monica, CA and many other locations in California, and the lowest are generally those of locations that have fairly large day-to-day changes in temperature. This could mean that forecasts of daytime maximum temperatures in many parts of California (especially those close to the coast) are little better than random guesses on most days, as the day-to-day fluctuations are relatively small and the 6-day horizon is simply too long to make precise forecasts. On the other hand, large day-to-day temperature changes in many areas are most likely due to tangible weather events, and forecasters might have an easier job in those cases.  
The means of actual minus forecast listed in Table 2 are mostly zero within the sampling errors, indicating that there are no apparent biases, either high or low, for the majority of the locations considered. But some areas do show statistically consistent biases in actual minus forecast. For example, forecasters tend to overforecast the daytime high temperature by an average of about 0.8°F for New York City, and underforecast it by 0.7°F in Las Cruces, NM.  
Not surprisingly, 3-day forecasts are more accurate than 6-day forecasts, as shown in Table 3. The average errors are still fairly large, however, e.g., 5.2°F for Wichita, KS, and 4.5°F for Dallas, TX. The improvements are relatively modest for many West Coast locations in terms of absolute temperature, but the percent changes relative to day-to-day temperature fluctuations are similar overall [about 40% improvements].  
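The ratio discussed here, the standard deviation of actual minus forecast divided by the standard deviation of day-to-day changes in the actual temperature, could be computed along the following lines. This is a sketch under stated assumptions: `error_ratio` is a hypothetical helper, the temperatures are made-up values, and each forecast is assumed to be already aligned with the date it targets.

```python
import statistics


def error_ratio(actual, forecast):
    """Return (forecast error std, day-to-day change std, ratio of the two)."""
    # Forecast error on each date: actual minus forecast
    errors = [a - f for a, f in zip(actual, forecast)]
    # Day-to-day change in the actual series
    daily_changes = [later - earlier for earlier, later in zip(actual, actual[1:])]
    forecast_std = statistics.stdev(errors)
    change_std = statistics.stdev(daily_changes)
    return forecast_std, change_std, forecast_std / change_std


# Hypothetical daytime maximum temperatures (°F) and matching 6-day forecasts
actual = [70.0, 72.0, 68.0, 75.0, 71.0, 69.0]
forecast = [71.0, 70.0, 70.0, 72.0, 73.0, 68.0]
f_std, c_std, ratio = error_ratio(actual, forecast)
print(f"forecast std={f_std:.2f}, day-to-day std={c_std:.2f}, ratio={ratio:.2f}")
```

A ratio near or above 1 would correspond to the situation described for coastal California, where the forecast error is about as large as the ordinary day-to-day fluctuation.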
Having looked at overall average errors, we next examine how well forecasters are able to forecast large swings in daytime high temperatures. We first look at days when the actual temperature at the forecast date differs by a large amount from that of the previous day^{9}. In other words, for 3-day forecasts, we will look at $T_a[3] - T_f[3]$ given that $T_a[3] - T_a[2] \geq C$, or $T_a[3] - T_a[2] \leq -C$. Here the subscript $a$ means actual, $f$ means forecasted, and $C$ is a large cutoff constant, say 10°F. Table 4 shows the results as ranked by the mean of actual minus forecasted. Here the results clearly indicate that for most of the areas, the forecasters significantly underforecast the daytime high if the temperature experienced a surge of at least 10°F from the previous day. For example, the forecasters underpredicted the daytime high by 4.2°F on average for New York City. There are strong variations among the 60 geographical regions considered, but with no apparent correlations to the variations shown in Table 3. For some locations, the averages of the forecast errors are similar in magnitude to the standard deviations of the forecast errors.  
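The conditional analysis described above (forecast error restricted to days on which the actual temperature rose by at least a cutoff from the previous day) might be sketched as follows. The helper name `swing_bias`, the default cutoff, and the sample series are illustrative assumptions, not the study's code or data.

```python
import math
import statistics


def swing_bias(actual, forecast, cutoff=10.0):
    """Mean and std of actual - forecast on days with a day-over-day rise >= cutoff."""
    errors = [
        actual[i] - forecast[i]
        for i in range(1, len(actual))
        if actual[i] - actual[i - 1] >= cutoff
    ]
    if len(errors) < 2:
        return None  # too few qualifying days to estimate a standard deviation
    sigma = statistics.stdev(errors)
    return {
        "n": len(errors),
        "mean": statistics.fmean(errors),
        "std": sigma,
        "mean_err": sigma / math.sqrt(len(errors)),  # sampling error of the mean
    }


# Hypothetical actual highs (°F) and forecasts aligned to the same dates
actual = [60.0, 72.0, 70.0, 85.0, 84.0, 95.0]
forecast = [61.0, 68.0, 71.0, 80.0, 85.0, 92.0]
result = swing_bias(actual, forecast)
print(result)
```

A positive mean indicates that forecasts fell short of the actual high on surge days. The mirror-image case (drops of at least the cutoff) follows by flipping the sign of the condition.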
Probabilities of Precipitation  
In this section, we look at results of actual probabilities of precipitation vs. forecasted probabilities. Here the actual probability is calculated by taking the number of days that actually experienced precipitation among the days for which forecasters predicted rain with a certain probability, and dividing that by the total number of days with that forecast. For example, let’s say we want to assess the actual realized probability of a 20% chance of rain. We first find the total number of days for which the forecasters predicted a 20% chance of rain 3 days ahead. We then count the number of those target dates [the dates referenced in the 3-day forecasts] on which it actually rained, and divide that count by the total number of forecast days.  
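As a minimal sketch of this calculation, the realized probability for a forecast bin is the fraction of days in that bin on which precipitation actually occurred. The function name `realized_probability`, the half-open bin convention, and the sample data are assumptions made for illustration.

```python
def realized_probability(forecast_pop, rained, low, high):
    """Fraction of days with forecast PoP in [low, high) on which it actually rained."""
    outcomes = [r for p, r in zip(forecast_pop, rained) if low <= p < high]
    if not outcomes:
        return None  # no forecasts fell in this bin
    return sum(outcomes) / len(outcomes)  # True counts as 1, False as 0


# Hypothetical 3-day forecast probabilities (%) and observed outcomes on the target dates
forecast_pop = [20, 30, 60, 40, 10, 55, 25, 45]
rained = [False, True, True, False, False, False, False, False]

p_low = realized_probability(forecast_pop, rained, 20, 50)
p_high = realized_probability(forecast_pop, rained, 50, 70)
print(p_low, p_high)
```

If the forecasts were well calibrated, the realized probability in each bin would fall inside the bin's own range.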
Table 5 shows the actual probabilities^{10} of precipitation for both 20%-50% and 50%-70% chances of 3-day forecasted precipitation. The results are ranked by their actual probabilities for the 20%-50% chance forecasts^{11}. The results in this table seem to indicate that actual probabilities of precipitation can be significantly lower than forecasted, especially for probabilities below 50%. Additionally, the actual realized probabilities can differ between geographical locations by as much as an order of magnitude. The variations in actual probabilities for the 50%-70% range seem to be smaller than those for the 20%-50% range, but the differences are still significant. In other words, when the 3-day forecast calls for a chance of precipitation in the range of 20%-50%^{12}, the actual chance of precipitation is much lower than the forecasts, on average. For example, for San Jose, CA, the average actual probability is only 4.8%, significantly lower than the lower bound of the forecasted range, which is 20%. But the average actual probability goes up to 26% if the forecasted probability is between 50% and 70%. This is still much lower than the lower bound of the forecasted range, but the difference here is somewhat smaller.  
For comparison, actual probabilities corresponding to shorter-horizon forecasts are shown in Table 6. The results seem to indicate that realized probabilities are noticeably smaller for 1-day forecasts as compared to those of 3-day forecasts for the same 50%-70% range of forecasted probabilities [with a higher dispersion of geographical variations]. In other words, 3-day forecasts seem to be more accurate than 1-day forecasts if the forecasted chance falls between 50% and 70%.  
Quantitative Precipitation  
Quantitative precipitation refers to the total cumulative precipitation observed [or in the case of a forecast, predicted] in a 24-hour period. It is typically measured in inches. For most geographical locations, the average quantitative precipitation is around a tenth of an inch. In this section, we present an analysis of forecasting accuracies for moderate to heavy precipitation^{13}.  
Table 7 shows the comparisons between actual and 3-day forecasted 24-hour quantitative precipitation for values between 0.25 and 2.0 inches. The results appear to indicate that the forecasters consistently underpredict quantitative precipitation for relatively heavy precipitation. Again, significant differences exist among the geographical locations considered.  
Wind Speeds  
The last weather measure we will look at is the wind speed forecast. Among the weather measures discussed so far in this report, wind speed (along with wind direction) is probably the most volatile, or the most unpredictable. The actual and forecast comparisons for wind speeds are performed at more granular time intervals than those of most other weather measures discussed so far. This could, in principle, introduce serial correlation (autocorrelation) in the data series, as stronger winds tend to persist for periods of 24 hours or more, and the same goes for periods of calmer winds. This should not affect our estimates of the mean and the standard deviations, but it might introduce a bias in the estimation of the sampling error^{14}.  
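One common way to gauge how much persistence in a series inflates the sampling error is to replace N by an effective sample size. The sketch below uses the first-order autoregressive approximation N_eff ≈ N(1 − ρ)/(1 + ρ), where ρ is the lag-1 autocorrelation. This adjustment is an illustrative assumption rather than a procedure used in this study, and the function names and sample wind speeds are hypothetical.

```python
import math
import statistics


def lag1_autocorrelation(series):
    """Sample lag-1 autocorrelation of a series."""
    mean = statistics.fmean(series)
    num = sum((a - mean) * (b - mean) for a, b in zip(series, series[1:]))
    den = sum((x - mean) ** 2 for x in series)
    return num / den


def adjusted_mean_error(series):
    """Return (naive sampling error, autocorrelation-adjusted error, effective N)."""
    n = len(series)
    # Clamp rho so that negative or near-unit estimates do not produce nonsense
    rho = max(0.0, min(lag1_autocorrelation(series), 0.99))
    n_eff = n * (1 - rho) / (1 + rho)  # AR(1) effective sample size (assumption)
    sigma = statistics.stdev(series)
    return sigma / math.sqrt(n), sigma / math.sqrt(n_eff), n_eff


# Hypothetical, strongly persistent wind speed errors (mph) at consecutive intervals
sample = [5, 6, 7, 8, 9, 9, 8, 7, 6, 5, 4, 3, 2, 3, 4, 5]
naive, adjusted, n_eff = adjusted_mean_error(sample)
print(f"naive={naive:.2f}, adjusted={adjusted:.2f}, effective N={n_eff:.1f}")
```

For a persistent series the adjusted error is larger than the naive σ/√N value, which is the direction of the bias noted in the text.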
Table 8 shows results for actual minus 2-day forecasted wind speeds when the actual speeds are in the range of 20-30 mph. For comparison, the table also lists the average daily wind speeds and actual minus 2-day forecasted wind speeds for all values of wind speed. We see that on average, the means of forecast errors are fairly close to zero if one considers all magnitudes of wind speed. But for speeds between 20 and 30 mph^{15}, there seems to be significant underforecasting for almost half of the regions considered here. The strong variations among the regions in terms of their forecast biases do not seem to correlate strongly with the average wind speeds [or their standard deviations^{16}]. For example, for ‘Windy City’ Chicago, IL, even though the average daily wind speed is a relatively high 8.3 mph, its average forecast error for high-speed winds is a modest 4.4 mph. But for Los Angeles, CA, with its average daily wind speed of 3.7 mph, the error in forecasting strong winds is 9.2 mph, on average.  
Method for Improving the Accuracy of Weather Forecasts  
Using historical weather forecast data from the National Weather Service Digital Library, we performed statistical analysis on the forecast accuracies of daytime high temperature, probability of precipitation, quantitative precipitation, and wind speeds^{17}. The results suggest that forecast errors, as measured by both their means and standard deviations, vary widely among a representative set of 60 geographical locations selected for this study. Additionally, the forecasters generally do a poor job predicting relatively severe weather conditions, such as large temperature swings, heavy rains or strong winds. Also, we find that actual chances of rain are generally much smaller than forecasted.  
The results from the above analyses can be used to design a method for improving the accuracy of weather forecasts based on the statistical parameters obtained from the various analyses performed. The analysis of historical data described above can be performed for any location for which historical weather forecast data are available, and can be performed to map out the entire Continental United States at finer latitude-longitude resolution. The method will also include parameters that depend on seasonality, the horizon period of the forecast (e.g. 5-day forecast, 10-day forecast, etc.), and the absolute value of the weather variable. The method can be used to provide corrections to weather forecasts based on the geographical location, the horizon period of the forecast, and the magnitude of the forecasted variable. As demonstrated in this report, for certain weather conditions, these corrections can be substantial for some geographical locations.  
An embodiment of the present invention provides a method for generating more accurate weather forecasts by correcting standard current weather forecasts using correction values obtained from historical data, as schematically illustrated in Figure 1. First, for a given geographic location, a statistical analysis of historical weather data, both forecast and actual, is performed as described in detail above (step S11). Although not shown in the data presented above, the analysis is optionally performed on a seasonal basis; in other words, the analysis described above can be performed separately for data from each seasonal period of the year, such as each month of the year. From the statistical analysis, historical forecast error of each weather variable is calculated. The weather variables may include, for example, daytime maximum temperature, nighttime minimum temperature, probability of precipitation, quantitative precipitation, snowfall amount, wind direction, wind speed, relative humidity, etc. The historical forecast error is in the form of “actual minus forecast” values, i.e. the difference between the actual value and forecast value of each weather variable.  
The current weather forecast for that geographic location is then obtained (step S12). The “current forecast” refers to the standard forecast that has not been corrected using historical data. Current forecast data may be obtained from, for example, NOAA or other weather forecast services. Then, for some or all weather variables in the current weather forecast, correction terms, which are the historical “actual minus forecast” values calculated in step S11, are added to the respective forecast values provided by the current forecast, to calculate the corrected forecasts for these weather variables (step S13). Each correction term may have a positive or negative value. The correction values depend on the geographical location and horizon period of the forecast, and may also depend on season.  
For example, for most locations, for a 6-day forecast of maximum daytime temperature, the mean of the actual minus forecast value (see Table 2) is less than 1 degree and therefore no correction term needs to be added.  
In the case of probability of precipitation, on the other hand, a correction can be made to improve the accuracy of the forecast. For example, for Los Angeles, CA, if the currently forecasted probability of precipitation of a 3-day forecast falls within a range of 20%-50%, the forecasted probability should be corrected downwards significantly, e.g. the corrected probability of precipitation should be below 10% based on the result shown in Table 5. Analysis of historical data in the manner described in Section IV above can be performed for finer ranges of forecasted probabilities of precipitation (e.g., (20%, 30%), (30%, 40%), (40%, 50%)) to obtain the corresponding actual probabilities and the actual-minus-forecast values, which are then used to correct the forecast values.  
As another example, corrections in quantitative precipitation can be meaningful for some locations. For example, for Seattle, WA, if a 3-day forecasted 24-hour quantitative precipitation is 0.4 inches, then a correction of 0.27 inches will be added (see Table 7), resulting in a corrected 3-day forecasted 24-hour quantitative precipitation of 0.67 inches.  
As yet another example, for actual vs. 2-day forecasted wind speeds (see Table 8), a correction term may be added when the forecasted wind speed is relatively high. For example, for Los Angeles, analysis of historical data shows that when the actual wind speed is 20-30 mph, the forecasts tend to underforecast it by about 9 mph. Therefore, a correction term can be added when the forecasted wind speed is over a certain value. Further, additional analysis of the historical data can be performed to calculate the actual-minus-forecast wind speed value when the forecast value is within various ranges [10-20 mph, 20-30 mph, etc.] so that appropriate correction values can be added based on the forecasted value.  
The corrected weather forecast values calculated in step S13 are provided to a user (step S14). This step may be implemented in a web-based system such as a weather forecast website.  
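Steps S12 to S14 amount to a table lookup followed by an addition. The sketch below is one possible arrangement, not a definitive implementation: the `CORRECTIONS` table layout, the key names, and the choice to bin on the forecasted value are assumptions. The only number taken from this report is the 0.27-inch correction for Seattle, WA (Table 7).

```python
# Hypothetical correction table built in step S11.
# Key: (location, variable, horizon in days)
# Value: list of (bin low, bin high, mean actual-minus-forecast correction)
CORRECTIONS = {
    ("Seattle, WA", "qpf_inches", 3): [(0.25, 2.0, 0.27)],
}


def corrected_forecast(location, variable, horizon_days, forecast_value):
    """Add the historical actual-minus-forecast correction, if one applies (step S13)."""
    bins = CORRECTIONS.get((location, variable, horizon_days), [])
    for low, high, correction in bins:
        if low <= forecast_value < high:
            return forecast_value + correction
    return forecast_value  # no applicable correction: pass the forecast through


# Step S12 supplies the current forecast; step S14 reports the corrected value
value = corrected_forecast("Seattle, WA", "qpf_inches", 3, 0.4)
print(round(value, 2))
```

Seasonal dependence could be handled by adding a month or season field to the key, at the cost of fewer historical samples per bin.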
In addition, in the analysis step S11, statistical confidence levels for the various forecasts may be calculated (optional). As shown in our studies, the standard deviations of forecast errors can depend strongly on location, horizon period of the forecast and the magnitude of the weather variable. Adding confidence levels to certain weather forecasts [for certain locations] has the potential for improving the quality of the forecasts. In one implementation, standard deviations of the “actual minus forecast” values are used as an indication of the confidence level. For example, for a 6-day forecast of daytime maximum temperature, the standard deviation of actual minus forecast (see Table 2) may be used as the confidence level. Taking Los Angeles as an example, σ = 4.96 degrees, meaning that, assuming approximately normally distributed errors, for 68% of the time the actual temperature will be within ± 5 degrees (1 sigma) of the forecast value, and for 95% of the time the actual temperature will be within ± 10 degrees (2 sigma) of the forecast value.  
In an alternative implementation, a reliability score (similar to a FICO score) can be devised in place of the traditional statistics-based confidence level.  
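The confidence band in the Los Angeles example can be written out as a short calculation, together with a check of how often historical errors actually fall inside the band. The 68% and 95% figures hold only if the errors are roughly normally distributed, which is an assumption. The function names, the forecast of 75°F, and the error sample are hypothetical; σ = 4.96 is the Los Angeles value quoted in this report.

```python
def band(forecast_value, sigma, n_sigma=1.0):
    """Lower and upper bounds of forecast ± n_sigma standard deviations."""
    half_width = n_sigma * sigma
    return forecast_value - half_width, forecast_value + half_width


def empirical_coverage(errors, sigma, n_sigma=1.0):
    """Fraction of historical actual-minus-forecast errors inside ± n_sigma * sigma."""
    inside = sum(1 for e in errors if abs(e) <= n_sigma * sigma)
    return inside / len(errors)


SIGMA_LA = 4.96  # 6-day daytime maximum temperature, Los Angeles (from this report)

low, high = band(75.0, SIGMA_LA, n_sigma=2.0)
print(f"2-sigma band: {low:.2f} to {high:.2f} °F")

# Hypothetical historical errors, used to test the normality assumption
errors = [-6.0, -3.0, 0.0, 2.0, 5.0, 11.0]
print(empirical_coverage(errors, SIGMA_LA), empirical_coverage(errors, SIGMA_LA, 2.0))
```

If the empirical coverage differs markedly from 68% and 95%, quoting empirical percentiles of the error distribution would be a more faithful confidence level than a multiple of σ.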
The confidence levels for the weather forecasts calculated in step S11 are also provided to the user in step S14 (optional).  
The methods described above may be implemented as a web-based weather forecast service.  
^{1}We selected mostly metropolitan cities as opposed to fixed distance grid points in order to make the geographical locations more recognizable.  
^{2}We use σ/√N as an approximation for the sampling error of the mean, and σ/√(2N) as an approximation for the sampling error of the standard deviation.  
^{3}This is similar to the html formatted web content, which adheres to a set of rules so that they can be processed by a variety of browsers, except the weather data are in binary form.  
^{4}Wind related data files typically are the largest as the forecasts usually target more granular time periods due to the volatile nature of wind direction and speed.  
^{5}It took approximately 3 weeks to download all the data. To maximize throughput, we used an 8 CPU desktop machine, which allowed us to run 8 simultaneous download jobs in parallel.  
^{6}For most locations, we used the latitudes and longitudes of local weather stations if available.  
^{7}Although one does have the option of using nearest neighbor interpolation method as well  
^{8}Many of the results can be requested in Excel format for users to perform their own data manipulations. But for space considerations in this report, we won’t list a separate ranking table for each column in the same data set.  
^{9}Due to limitations on the total number of pages for this report, some results of this study have been omitted here. The full result set will be presented in another report.  
^{10}The full result set will be presented in another report.  
^{11}Which can also be viewed as the realized, or the measured probabilities.  
^{12}Due to size limitations on this report, we chose to include a set of unranked results along with a set of ranked results.  
^{13}There are considerable differences between regions, however. But due to space limitations of this report, a more comprehensive analysis of quantitative precipitation will be presented in a separate report.  
^{14}This is because sampling errors are typically estimated under the assumption of independence. But serial correlation introduces correlations among consecutive data points in the time series; therefore the standard estimate tends to underestimate the true sampling error.  
^{15}It is possible that extremely strong winds could skew the mean of forecast errors for some regions; therefore we chose to work with a speed range of 20-30 mph here as opposed to 20 mph and up.  
^{16}Not shown here in Table 8.  
^{17}It is worth noting that the statistical models presented in this study cannot be substitutes for weather forecast models based on atmospheric sciences. One should expect that advances in atmospheric sciences will result in better and more accurate weather forecast models going forward. The method proposed in this paper can be set up using empirical weather data from a rolling time window of 4 years or so, as this time interval cannot be too short since some weather events such as heavy rains do not occur frequently in many geographical locations. Also, as mentioned earlier in this paper, methods proposed in this study cannot be easily applied to studies of forecast accuracies of extremely rare weather events such as floods that occur once every hundred years.  
References  

Table 1  Table 2  Table 3  Table 4 
Table 5  Table 6  Table 7  Table 8 
Figure 1 