Posted by: genezeien | December 28, 2009

Step by step: Is the climate changing?


Eugene Zeien, BS Applied Physics 1991  18 years experience in data analysis & IT support at the University of Iowa.


Having heard, so frequently, that the data underlying the current consensus was robustly supportive, I decided to take the time to find raw, unadjusted data and undertake some simple analyses.  I was quite surprised by the results.  I am posting those here for comments and suggestions, along with source code and links to the raw data.

The majority of climate researchers use the adjusted data in their work, because CRU, GISS, and NCDC make the adjusted data easily accessible and easy to use.  Since evidence has surfaced which suggests those three entities are not independent, all three adjustment methods may be suspect.  Let’s take a look.


Starting with a home PC, I installed Sun’s VirtualBox.  Next, since my experience is primarily Linux/Unix based, I installed Ubuntu 9.10 on a virtual 40Gb disk.  GHCN maintains a nice, though hard to find, ftp repository of raw climate station data, which was downloaded and decompressed.  The documentation in readme.txt was fairly easy to follow.

As a first pass, all the stations available in GHCN were included.  Temperature data was combined into a bin from all the stations within a geographical 1×1 degree sector.  This methodology allows for station relocation and overlap with minimal impact upon the results.   An annual temperature was computed for a sector with more than 240 daily readings within that year.  240 was a completely arbitrary decision, based upon the reasoning that the stations with high dropout rates are in the most inhospitable regions.


# Monthly data
#wget -Nr -o monthly.log
#find -name \*.Z -exec uncompress '{}' \;

# Daily data
wget -Nr --exclude-directories=grid -o daily.log

# Start with list of stations' coordinates & codes
# ghcnd-stations.txt:   List of stations and their metadata (e.g., coordinates)
# Contents look like:
# ID           Lat       Long      Elev   State     Name    GSNflag  HCNflag     WMOid
# AJ000037749  40.7000   47.7000   95.0   GEOKCAY                                37749

# Look up min & max temps from ${StationID}.dly files
# ex: "AJ000037749.dly"  line 1 looks like:
#            YearMo      Day1    Day2    Day3    ...
# AJ000037749193602PRCP    0  I   17  I    0  I    0  I    0  I    0  I    0  I   78  I    0  I    0  I    0  I    0  I    0  I   20  I    0  I    0  I    0  I    0  I    0  I    0  I    0  I   12  I    0  I    0  I    0  I    0  I    0  I    0  I    0  I-9999   -9999
# AJ000037749200910TMAX  240  S  224  S  252  S  256  S  250  S  250  S  239  S  212  S  220  S  221  S  225  S  239  S-9999     253  S  256  S  231  S  239  S  240  S  230  S  231  S  229  S  230  S  231  S-9999     243  S-9999     181  S  187  S  212  S  170  S  173  S

# Build list of Lat Lon pairs to check for data
awk '{printf("%d %d\n",$2,$3)}' ghcnd-stations.txt | sort -u | grep . > ${H}/Coords.txt

# Going to populate a 180x360 degree grid
# anything from X.0 - X.99 goes into X's box
echo -n "" > ${H}/TAvg_1x1.txt

let Lines=`cat ${H}/Coords.txt | wc -l`
let C=0
while [ $C -lt $Lines ] ; do
      let C++
      Coords=(`awk 'NR=='$C'{print $0; exit}' ${H}/Coords.txt`)
      Lat=${Coords[0]} ; Lon=${Coords[1]}
      ## select stations with this Lat & Lon
      stations=(`awk -v lat=$Lat -v lon=$Lon '\
                (lon+0.0<=$3)&&($3<lon+1.0){print $1}' ghcnd-stations.txt`)
      if [ ${#stations[@]} -gt 0 ] ; then
         printf '+ %s %s %s\n' $Lat $Lon ${#stations[@]}
         printf '  %s %s\n' $Lat $Lon
      ## Poll stations for years with temperature data.  -9999 is missing value.
      for S in ${stations[@]} ; do
         Tstations=(${Tstations[@]} `awk '$1~/TMAX/{print FILENAME; exit}' all/${S}.dly`)
      if [ ${#Tstations[@]} -eq 0 ] ; then
         let Lon++
      printf 'T %s %s %s\n' $Lat $Lon ${#Tstations[@]}
      ## Would be best to pair up TMIN & TMAX
      ## Parsing is FUN!
      ## Output file should be Lat Lon Year TAvg (divided by 10.0 to remove builtin T*10)
      ## Added a minimum N>240 to avoid regions with one temperature obs/year  😉
      awk '{St=substr($0,0,11);Yr=substr($0,12,4);Mo=substr($0,16,2);}\
           $1~/TMAX/{l=length($0); d=1; \
                     tmx[Yr]+=mx; Nx[Yr]++; \
                        D[Yr]++; Day[Yr,Mo,d]=1;\
           $1~/TMIN/{l=length($0); d=1; \
                  if((mn<100)&&(mn>(-100))){tmn[Yr]+=mn; Nn[Yr]++;}\
           END{for(i in tmx){\
                 if((tmn[i]!="")&&(D[i]>240)){ \
                    print "'$Lat'","'$Lon'",i,((tmx[i]/Nx[i])+(tmn[i]/Nn[i]))/2.0}\
               }}' ${Tstations[@]} | sort -n >> ${H}/TAvg_1x1.txt

awk '{Temp[$3]+=$4;n[$3]++}\
     END{for(i in Temp){print i,Temp[i]/n[i]}}' ${H}/TAvg_1x1.txt > ${H}/TAvg_Globe.txt

In order to minimize the effect of temperature stations appearing and dropping out, sectors were selected which had continuous annual temperature data from 1900 to 2009.  The data from the 613 sectors has a fairly strong sinusoidal pattern.  The poorly correlated linear trend is probably related to the point in the wave’s peaks and troughs where the data begins and ends.  Do not copy this graph, I have an idea…

Correlation 0.43 using a 30 year sine wave combined with a 0.3/century drop in temperature.

## This is a follow-up from the previous processing script.
## Sift out sector that are have measures for all of 1900-1999
awk '{Lat=$1; Lon=$2; Year=$3;\
         Coords[Lat,Lon]=1; T[Lat,Lon,Year]=$4;\
         for(C in Coords){\
            for(Y=1900; Y<=1999; Y++){\
               for(Y=1900; Y<=2009; Y++){Keep[C,Y]+=T[C,Y]; N[C,Y]++}\
         print "Year Celsius";\
         for(Y=1900; Y<=2009; Y++){\
            for(C in Coords){\
               if(N[C,Y]!=0){AnnualT[Y]+=(Keep[C,Y]/N[C,Y]); Nsec[Y]++}\
            print Y,AnnualT[Y]/Nsec[Y],Y,Nsec[Y];\
      }' ${H}/TAvg_1x1.txt > ${H}/Continuous_TAvg_Globe.txt

The reason 1900 was chosen is due to the increase in the number of stations just prior to the turn of the century. The next two graphs do not represent the 613 sectors used above. I will post source code.. tomorrow?

Intuitively, the monthly average temperature oscillates with the Northern seasons.

Having established a basic overview of what the raw data looks like, now it is time to look at the temperature anomalies.  Starting with the 1×1 degree sector data, those with continuous temperatures 1900-1999 were used to demonstrate the effect of transforming and averaging raw temperatures across the globe versus averaging anomalies(base 1961-1990). This graph illustrates the wild fluctuations of land temperature anomalies from one year to the next and the 11 year moving average.

## Get sectors with complete 1900-1999 data, compute anomalies (base 1961-1990)
## 11 yr average is easily done in spreadsheet, so that's not here
awk '{Lat=$1; Lon=$2; Year=$3;\
         Coords[Lat,Lon]=1; T[Lat,Lon,Year]=$4;\
         for(C in Coords){\
            for(Y=1900; Y<=1999; Y++){\
               for(Y=1900; Y<=2009; Y++){Keep[C,Y]+=T[C,Y]; N[C,Y]++}\
               for(Y=1961; Y<=1990; Y++){BaseSum[C]+=T[C,Y]; BaseN[C]++}\
         print "Year Celsius";\
         for(Y=1900; Y ${H}/Continuous_Anomaly_Globe.txt

How does an anomaly differ from real temperature minus a constant? Surprise! A constant was chosen that left the two plots offset by 0.1 C.

How does the temperature data go from the chaotic, variable state seen above to the relatively quiet plot posted by GISS:

Note the subtitle (Meteorological Stations).  Clearly, this does not include ocean data.  Other GISS graphs are available here.

How do the GISS computed anomalies compare with unadjusted anomalies?  The 1961-1990 mean from the GISS anomalies was 0.09, whereas the 1961-1990 mean from my anomalies was 0.00.  Clearly, the GISS data is adjusted further after the conversion to anomalies.

Clearly, something needs to be done to reduce the variance in the raw temperature data.  Let’s take a look at temperature within 10 degree latitude bands, i.e. latitude 80 is 75 to 84.99.  Using all sectors with 240 days of data within the year:

## First pass, average temp by latitude x10
awk '{if($1 < 0){Lat=int(($1-5)/10)}else{Lat=int(($1+5)/10)} \
      Temp[Lat,$3]+=$4;n[Lat,$3]++; Years[$3]=1; Lats[Lat]=1;}\
     END{printf("Year ");\
         for(L=90; L>=-90; L-=10){printf("Lat=%d ",L);}\
         for(Y=1900; Y<=2009; Y++){\
           printf("%d ",Y);\
           for(L=9; L>=-9; L-=1){\
              printf("%f ",Temp[L,Y]/(n[L,Y]*1.0));\
     }' ${H}/TAvg_1x1.txt > ${H}/TAvg_byLat_Globe.txt

Two surprises:  there was no data in the range -45 to -84.99 latitude, and none above 80 latitude.  The entire Lat=80 band is data from +75 to +79.99.  At last, a warming trend has been found.  Perhaps this graph is easier to read.  Latitudes with no data have been eliminated.  The legend has been rearranged with equatorial latitudes at the top, near their temperature plots.

The Arctic trend will be considered in depth here

UHI effects will be considered as well.  This one is too good to hide, greater NYC area versus 40N latitude stations and 35N to 45N stations.

## Gather up all the sectors in the 40.0 to 40.99 Latitude range
## Odd side-effect of my gridding method,
## NY NEW YORK CNTRL PRK  lat=40.7800  long=-73.9700
## ends up in the lat=40 long=-74 sector.
awk 'BEGIN{print "Year 40N 35-45N NYC"}\
     (($1==40) && ($2!=(-74)))){Avg[$3]+=$4; N[$3]++}\
     (($1>=35)&&($1<=45)){Zavg[$3]+=$4; Zn[$3]++}\
     (($1==40)&&($2==(-74))){NYt[$3]+=$4; NYn[$3]++}\
         for(Y in Avg){print Y,Avg[Y]/N[Y],Zavg[Y]/Zn[Y],NYt[Y]/NYn[Y]}\
     }' TAvg_1x1.txt | sort -n > 40N_latitude3.txt



  1. It would be interesting to see the number of stations included by year, plus some index of the relative climate of new stations added each year- I suspect your curves are heavily influenced by the year-to-year composition of stations.

    • The early portion definitely is, A large influx of stations appear ~1900. I’ll post up a graph after work, today.

  2. What I’m really really interested in is
    (a) stations with long continuous records up to the present – this avoids say starting with weighting in cool-temperate climes (eg abundant early records in European North Atlantic – Arctic) and progressing to tropical climes with the effect of skewing the global trend inappropriately up
    (b) rural stations with good siting and longish records
    (c) comparisons of neighbouring pairs rural/urban

    But – it is already stunning that, far from climate warming, your graph suggests COOLING.

    I’d do these graphs myself but lack the skills.

    • Indeed. I will add that “quality control” metric to the project’s time line.

  3. If you havent done so already take a look at what EM Smith has been doing at his website “musings of the chiefio” He has been doing some detailed work on similar issues with the GHCN network and the GISTemp programme.

    • I have been following EM Smith’s work. Our goals differ: while he is examining the code to see if sensible choices were made, I am crunching through the data with homemade code. Hopefully, I am making sensible choices along the way.

  4. Love it! There are more and more people (myself included) burning through theses datasets. Truth shall prevail!

  5. Hi guys,

    You guys being into dataprocessing, I wonder how to start fiddling with the idl (can I use gdl in Linux?) pro-files in the leaked zip?? It’s a lot of files in there, which to use?? Any tips welcomed.

    Tnx /Rob

    • GDL is open source, and is supposed to compile easily in Linux. I have not tried it.

  6. Eugene

    This is a very interesting analysis.

    Just some questions and thoughts.

    How do you know that the data is really “raw” and not changed in some way. I got the impression that the “raw” data had been lost or “homogenized” into oblivion. Why have other people not analysed this data before?

    This is important stuff. It should be replicated and verified by others – perhaps Jeff Id (Air Vent)or even Steve McIntyre if you could get him interested.

    I find the decrease in trend quite plausible. I have always been perplexed by the fact that the 1930/40 and the Arctic seemed to be as warm if not warmer that today. I was sort of convinced that the 30/40 warmth was just the continental US Arctic.
    However, GISS, Hadcrut etc always showed the 1990 as warmer than the 1930/40.

    Also, any way to compare the raw data with just the UAH/RSS data timeframe?

    • How do I know the data is uncooked:

      a. Obviously, the “really raw” data would be on 100+ year old pieces of paper. So this data is somewhat processed.
      b. The data I am using is from GHCN, not CRU (the guys who lost their unadjusted data), there’s a link at the beginning of methods
      c. I still have not found any significant warming trend.

      Because I have not “cooked the books”, I have not added in the magical “time of observation” adjustment Doing so would give the 1990’s a boost 😉

  7. I would second (pretty much) Lucy Skywalkers comment.

    I’m interesting in the trends from long-lived, high quality weather stations. I’d calculate trends from individual station data and NOT try to calculate a complicated average. Then I’d try to answer questions by characterizing the trends.

    Has there been warming, where, when, how consistent are station trends, do they warm at different times, etc. You get the idea.

    Love your idea. Keep it up (easy for me to say).

    And Good Luck!

  8. Eugene,

    The step by step approach is very sensible, and I will be avidly following what you are doing.

    One thing that gets lost in reading papers dealing with temp measurements is that the assumptions behind choosing certain approaches are lost. These assumptions are critical. In a step by step approach, these assumptions naturally fall out and can be tested and challenged.

    With that in mind, I would suggest that you consider separate threads for each step… You may be satisfied that you have covered all the issues and got the code to do what you think should happen and want to move on. But if where you move to gets interrupted by comments relating to previous steps, the conversation may become incoherent – so – best for it to continue where it is “on topic”.

    It is also natural that people will want to compliment you or otherwise. If this happens in the middle of technical discussions, again, the conversation may become incoherent. It may be best if you have a dedicated general discussion thread where you allow people to raise issues that may not be on topic… You can move any comments from technical threads there so as to unclutter them and allow anyone to easily follow the arguments / logic.

    Just a thought…

    Closer to the topic. When you are showing a very busy graph such as the “Raw Temp by Latitude”, if your software alows this, grade your colours by tone or progress through the colour spectrum (think of how SSTs are represented for example) – cross referencing to the legend almost becomes unnecessary.

    It would also be better than just hyperlinking to a data source / paper via a key word or simple phrase, to parenthetically provide a fuller description. It is also good to know what you are hyperlinking to (i.e. a text file, pdf, etc). Knowing how large it is is also good. You may consider a separate thread just devoted to links etc.

    I for one would like to do something similar albeit on a more local scale – so what you are doing here is likley to be a great resource.

    I wish you good luck and keep it comming.

    Thank you.


  9. Eugene, given that the data is “raw” and shows a linear (?) increase in temperature in the Arctic since 1960, I’d love to see a further breakdown of the Arctic stations: Urban vs. rural, Canada/Alaska vs. Siberia, etc.
    Also, what parts of the year are affected? There have been hints of warming mostly during the Nothern hemisphere Winter e.g., in E.M. Smith’s workthrough of GIStemp.
    However this goes, what you are doing is important work. Thanks.
    Al S.

    • You read my mind 😉 The upward trend in the arctic temperature is impressive.

  10. I’ve just been directed (via Climate Audit) to your site. Very impressive indeed, and “on the ball”. I’ve been analysing climate time series since about 1994, using my own software, but can’t readily handle large sets of multiple sites. I tend to use single site data, or other people’s compilations of means because, like Lucy, I lack the skills required to handle large multiple sets. I would /dearly/ love to have the type of data that you have assembled into various types of (monthly) means. Is there any chance of this? I am fully prepared to provide my email address, (with minor coding!), and I look forward to a contact. Stupid me, you already know it! Regarding methods, I tend to be wary of any smoothing techniques. Long industrial experience, now 25 years ago, led me to this stance. You may have views on this too. I would be happy to send you some “sample” plots of station data, or such things as PDO, produced by my method. You would probably find them rather interesting (I hope).


  11. […] free to leave an analysis suggestion, or point out coding problems in the comments area on “Step by Step“. Contrary to typical blogging, the “Step by Step” entry […]

  12. Eugene:

    This is excellent. Will definitely follow your work going forward to see what else you uncover.

    I think there is tremendous value in reviewing the original data, insofar as it is available. Not necessarily because the original data is the best or “correct,” but because it provides a critical baseline — a realty check — against which the later adjustments, deviations and conclusions should be assessed.

  13. Another approach would be to:

    1. Find all raw records for every station within each 1 X 1 degree grid cell (or 5 X 5, if not enough records for 1 X 1) which have homogeneous data for at least 1 decade (no changes in station location, instrument, etc., during the decade). Call these “qualified decadal records.”

    2. Average all QDFs in each cell for each decade, to create a smoothed decadal trend segment for that decade/cell.

    3. Splice the QDFs in each cell end-to-end, to create a 100+ year trend for the cell. While the absolute temps for the period would be questionable, the trend —- which is what interests us — should be valid.

    4. Average the trends in the cells to produce trends over wider areas, such as latitudinal bands.

    That approach would utilize a larger database than stations with 100+ year records, and sidestep the problem of inhomogeneities in those long records.

    (Idea proposed by commenter “supercritical” at WUWT).

  14. Eugene,

    Great work, thank you. I have been looking for something along these lines for a while and have come across a couple of stumbling blocks.

    If we could only have had a weekly anomaly, as opposed to monthly, we could see most clearly that hot and cold, and dry and wet etc, events happen right next to one-another.

    These events jump out of the Central England Temperature record where we have a very cold month/year next to a hot month/year. The point I suppose I am making is that one cold January can skew a whole year’s anomaly as much as a hot December may. It may well be that just one week within a particular month made all the difference.

    Maybe a look at what happens when a “year” is started in a month other than January?

    Monthly anomalies fog up an analysis purely because they vary in length and “what a difference a day makes”. @1/50th of a year samples would have been a wonderful tool.

  15. […] Re: NASA GISS shows 2009 as tied for 2nd warmest year on record // Step by step: Is the climate changing? Data analytics […]

  16. No precision or error bars in evidence.
    No discussion of sources of error, or their contribution to the overall uncertainty.
    Without these I am unable to give any substantive comment.
    Such a report would have resulted in an instant fail when I was studying science.

    • There is sufficient consideration given to “errors” in the documentation at NCDC. Hansen et al. determined that adding a 0.5C/century trend to temperature data was appropriate to compensate for “known” measurement errors.
      For me, the whole point of this exercise was to look at the RAW data without applying statistics tools. There should at least have been an underlying trend. In fact, there is a sixty year cycle in evidence. The next step would be to begin applying appropriate statistical analyses, perhaps during my next vacation… If you have the time, links to the data & processing source code are provided.

  17. 60 year cycle. Very interesting may you should check out

  18. I have been searching for the “raw data” that feeds the AGW hypothesis. Just looking at your graphs, I really can’t tell if the upward trend is consistently displayed or not. AGW suggests this will only be 1C degree for 100 years. More than that is alarming to the AGW crowd. A good graph for you to add to this collection is the amount of adjustment GISS makes to the raw data – just the adjustment, not combined with the raw data. To me, the adjustment fully explains the rise in temperatures for the last 100 years. Add to that the reliance on city temperatures (heat islands) and you have a history of temperature rise that cannot be refuted.

    Another comparison would be the raw data to the results from UAH’s satellite data. How do they compare?

    Another point: just how many temperature recorders are used? If land mass is 1/3 of the total, do we have 2/3’s of the recorders in the oceans? If not, why not? If, in fact the oceans are becomming more acidic because of all of the CO2, does that mean they are getting cooler (because equilibrium is based on temperature, the higher the temp the more CO2 they release and vice-versa)?

    The AGW crowd has been at this for over 40 years; very well funded through government grants; many hundreds of scientists in their clique; dopey press on their side: why are we still talking about the fundamentals of the science?? They should be able to articulate their view extremely well by now!

    I am just an amateur with some doubts…..

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s


%d bloggers like this: