The National Oceanic and Atmospheric Administration (NOAA) divides the globe into 5°x5° grid cells for the monthly GHCN dataset described at NOAA [1]. The dataset can be obtained from the National Climatic Data Center [2]. As of this writing, the dataset contains monthly global values for each year from 1880 to 2014. The values in the dataset are anomalies multiplied by 100; each anomaly must then be added to the corresponding reference monthly mean value found in Global Surface Temperature Anomalies from the National Climatic Data Center [3]. Reading the 5°x5° grids is described by NOAA as follows:
The data are formatted by year, month, latitude and longitude. There are twelve longitude grid values per line, so there are 6 lines (72/12=6) for each of the 36 latitude bands. Longitude values are written from 180W to 180E, and latitude values from 90N to 90S. Data for each month is preceded by a label containing the month and year of the gridded data.
for year = begin_yr to end_yr
  for month = 1 to 12
    format(2i5) month, year
    for ylat = 1 to 36 (85-90N, 80-85N, ..., 80-85S, 85-90S)
      format(12i5) 180-175W, 175-170W, ..., 130-125W, 125-120W
      format(12i5) 120-115W, 115-110W, ..., 70-65W, 65-60W
      format(12i5) 60-55W, 55-50W, ..., 10-5W, 5-0W
      format(12i5) 0-5E, 5-10E, ..., 50-55E, 55-60E
      format(12i5) 60-65E, 65-70E, ..., 110-115E, 115-120E
      format(12i5) 120-125E, 125-130E, ..., 170-175E, 175-180E
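To make the layout concrete, the fixed-width i5 fields described above can be read with a few lines of Python. The following is a minimal sketch, not NOAA's own reader: the helper name read_ghcn_grid and the -9999 missing-value flag are illustrative assumptions, not NOAA specifications.

import numpy as np

def read_ghcn_grid(path):
    # Parse the gridded anomaly file laid out as above. Returns a dict
    # mapping (year, month) to a 36x72 array of anomalies in degrees C.
    # The stored integers are anomalies x 100; -9999 is assumed here to
    # be the missing-value flag and is mapped to NaN.
    grids = {}
    with open(path) as f:
        lines = [ln for ln in f.read().split("\n") if ln.strip()]
    i = 0
    while i < len(lines):
        month = int(lines[i][0:5])           # header: format(2i5) month, year
        year = int(lines[i][5:10])
        i += 1
        grid = np.empty((36, 72))
        for ylat in range(36):               # 85-90N down to 85-90S
            row = []
            for _ in range(6):               # 6 lines x 12 values = 72 longitudes
                row += [int(lines[i][j * 5:(j + 1) * 5]) for j in range(12)]
                i += 1
            grid[ylat, :] = row
        grid[grid == -9999] = np.nan         # assumed missing flag
        grids[(year, month)] = grid / 100.0  # undo the x100 scaling
    return grids

A stored value of 37, for example, decodes to an anomaly of +0.37 °C; to recover an absolute temperature it must still be added to that cell's reference monthly mean from [3].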
A cursory examination of the data revealed a fundamental issue, shown in Figure 1: the grid cells that actually contain a value represent only 7% to 22% of the Earth’s total surface area, and even the best coverage anywhere in the dataset is less than 25% (a sketch for computing this area-weighted coverage follows the quotation below). Unless the sites have been strategically located to adequately represent the Earth’s temperature regions and their values validated, or the sites are at random locations, a point addressed by Michael E. Mann [4], NOAA has created a grave statistical problem for which there is no valid statistical solution. IBM, describing its statistical package SPSS, states:
“. . . variables that have more than 50% missing values are not imputed, nor are they used as predictors in imputation models”.
That is, SPSS will not impute a variable that is more than 50% missing, and it assumes the analyst is working with a random sample, which the NOAA sites are not. Even though what has just been presented is sufficient to statistically reject any data “supporting Global Warming/Cooling”, this study will further examine NCDC/NOAA’s suggested procedures for calculating global temperature.
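Both complaints, the sparse area coverage and the 50% imputation screen, can be checked numerically. The sketch below assumes the illustrative read_ghcn_grid reader from earlier and weights each 5°x5° cell by its true share of the Earth's surface, proportional to the difference of sines of its bounding latitudes; the function names are mine, not NOAA's or IBM's.

import numpy as np

def coverage_fraction(grid):
    # Fraction of the Earth's surface covered by non-missing cells in
    # one 36x72 monthly field (rows run 85-90N ... 85-90S). Every cell
    # in a 5-degree latitude band has the same area, proportional to
    # sin(lat_top) - sin(lat_bottom).
    lat_top = np.radians(np.arange(90, -90, -5))     # 90, 85, ..., -85
    lat_bot = np.radians(np.arange(85, -95, -5))     # 85, 80, ..., -90
    band = np.sin(lat_top) - np.sin(lat_bot)         # 36 band weights
    w = np.repeat(band[:, None], 72, axis=1) / 72.0  # per-cell area weight
    return (w * ~np.isnan(grid)).sum() / w.sum()

def cells_over_half_missing(grids):
    # Apply the SPSS 50% rule cell by cell: return a 36x72 boolean mask
    # of cells whose time series is more than half missing and which
    # SPSS would therefore neither impute nor use as a predictor.
    stack = np.stack(list(grids.values()))           # (months, 36, 72)
    return np.isnan(stack).mean(axis=0) > 0.5

Running coverage_fraction over every (year, month) grid yields per-month coverage percentages of the kind plotted in Figure 1, and the second function shows which cells would fail IBM's 50% screen outright.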
Statistical and numerical approaches, including modeling, are used and reported in professional journals without anyone asking the fundamental question of whether statistical procedures can legitimately be applied to non-random samples [5]. Furthermore, the use of models to supply missing values is problematic at best, since none of the models has had an independent assessment and none has been validated by independent laboratories or corporations. Validation means more than duplicating historical data: it means proving that each subroutine does as advertised and that the entire model has sufficient data detail without resorting to forcing factors for different epochs. Yet this is what NASA does to fill in the missing values, according to Hansen et al. [6]. Since the weather models used by local forecasters are usually accurate for no more than five days, and climate is weather over time, using these models adds nothing but bias to our temperature estimates.