25/04/2010

Accessing Canadian temperature data

One of the things that has come out of climategate is the huge amount of data manipulation that has taken place in order to make regional temperature data fit the AGW model. This has been demonstrated numerous times on WUWT. Canadian climate data is available from the National Climate Data and Information Archive, where temperature data from thousands of weather stations is available for one's perusal. This is very nice except when one attempts to download large chunks of data. The site works fine for the first few downloads (done through the bulk-data option) and then seems to hang. I've tried several web browsers and different IP addresses and found the same problem. It appears that this behavior is deliberate and designed to prevent people from downloading more than a few months of data at a time.

In order to look for trends and cycles in raw climate data, one needs to use as large a date range as possible. The longest temperature record I've found in the Canadian data thus far is for Edmonton, which goes back to 1880 (I'm sure there are earlier records but I just haven't explored the site enough yet). The temperature data at this site appears to be uncorrected and there is some urgency in downloading it all before it becomes "homogenized" and "adjusted". Thus, in December of 2009, when the nasty little adjustments being made to regional temperature data started to surface, I wrote a quick and dirty program, climate_scraper, to download data in various formats from the National Climate Data and Information Archive.

Curiously, if one copied the query strings from this site into a browser and manually changed the date and station parameters, there didn't seem to be any throttling of the data flow from the site. All that my climate_scraper program does is automate this process: one gives it the station ID#, the year range that one wants data for and the type of data desired. Right now the only options are hourly data and daily data. The climate data server is rather simple-minded and, when one gives it a year for which it has no data, it will return a file of dates with comma-separated null strings. Needless to say, a file of hourly data for an invalid date is just a waste of time and space, so make sure you have the right date range when you use the program.
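As a rough illustration of a check worth running after each download, here is a minimal Python sketch (it is not part of the VB6 program) that reports whether a downloaded file holds any observations at all. It assumes that any station header lines have already been stripped and that the first five columns are date fields, as in the hourly format; `has_observations` and `date_columns` are names made up for this example, and the sample rows are invented purely to show the shape of the data.

```python
import csv
import io


def has_observations(csv_text, date_columns=5):
    """Return True if any row has a non-empty field after the date columns."""
    for row in csv.reader(io.StringIO(csv_text)):
        if any(field.strip() for field in row[date_columns:]):
            return True
    return False


# Made-up rows in the shape of the hourly format, for illustration only.
empty_year = (
    '"1850-1-1 00:00","1850","1","1","00:00","","","",""\n'
    '"1850-1-1 01:00","1850","1","1","01:00","","","",""\n'
)
real_year = '"2009-7-14 19:00","2009","7","14","19:00","14.40","","4.10",""\n'

print(has_observations(empty_year))  # only dates and null strings
print(has_observations(real_year))   # at least one reading present
```

A file for which this returns False can be deleted straight away rather than kept around taking up space.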

Climate_scraper is written in VB6 (the best version of VB that M$ put out before they went to the incompatible VB.NET) and the download, available on my website, contains both an executable file and full source code. I'm releasing this code under GPL v3, which basically means it is yours to do with as you please except that, if you create a derivative work based on my code, you have to indicate that you made the changes and release all your code with the executable file.

What motivated me to finish up this program was the WUWT posting about possible errors in the Eureka, Nunavut weather station data for July 2009. This station is used by GISS to calculate temperatures for essentially the whole of N. America north of this station. Fortunately, the manager of the Eureka station has posted explanations on WUWT which have been very helpful in sorting out what's going on with the temperature at this location. Such willingness to share information with WUWT is unusual in the scientific climate community and the manager, Rai LeCotey, is to be commended for his openness. When I read the first posting regarding the Eureka station, I was motivated to write the code to download hourly data from any station that has this data. (This is the h option for DataType.)

I've been hesitant to release this program for general use, primarily because of my concern that if everyone decides to make their own copy of the Canadian temperature data, the increased server load will be noticed and this means of access will be terminated. It would be impossible to close off entirely without rewriting all of the JavaScript code used to access data, but I'm sure that it would be easy enough to restrict the number of downloads an individual user is allowed to do during a given time period. For this reason, I would ask people to limit themselves to looking at their local area and to think about setting up a mirror server that has the whole dataset. This isn't going to be me as I don't have the time, or anything close to the bandwidth, required for such a project. The data format for temperature data is an incredibly inefficient one and here is a line of temperature data from the Eureka station:

"2009-7-14 19:00","2009","7","14","19:00","14.40","","4.10","","50.00","","36.00","","22.00","","64.40","","101.87","","","","","","Mainly Clear"

This, incidentally, was the temperature which started the whole post at WUWT. While this data format might not be a concern on someone's local 1 TB disk, it is grossly inefficient for internet transfers. Note that the date appears twice and 41 bytes are used to hold it, whereas only 8 bytes would be required even if one used M$'s bloated VB Date variable. 19 variables then follow, and each of them could be represented in 2 bytes as a signed (or, in some cases, unsigned) integer, which would give a binary representation requiring 8 + 19 × 2 = 46 bytes/line instead of the 145 bytes currently required (I may be off by a few bytes as I counted them manually). It appears that this is a standard format for representing temperature data but, when I have a chance to start playing with this data, there is no way that I'm going to store it in this format on my computers.
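To make the arithmetic concrete, here is a minimal Python sketch of one way such a record could be packed. It is only an illustration under several assumptions of mine: the timestamp is stored as an 8-byte count of minutes since 1970, numeric readings are stored as hundredths in signed 2-byte integers, empty fields get the sentinel `MISSING`, and text such as "Mainly Clear" is mapped through a hypothetical `WEATHER_CODES` table. None of these choices come from the archive or from climate_scraper.

```python
import csv
import struct
from datetime import datetime
from decimal import Decimal, InvalidOperation

SAMPLE = ('"2009-7-14 19:00","2009","7","14","19:00","14.40","","4.10","",'
          '"50.00","","36.00","","22.00","","64.40","","101.87","","","","",'
          '"","Mainly Clear"')

MISSING = -32768                      # assumed sentinel for an empty field
WEATHER_CODES = {"Mainly Clear": 1}   # hypothetical lookup table for text fields
RECORD = struct.Struct("<q19h")       # 8-byte timestamp + 19 two-byte integers
EPOCH = datetime(1970, 1, 1)


def encode_field(text):
    """Turn one text field into a 2-byte integer value."""
    if text == "":
        return MISSING
    try:
        return int(Decimal(text) * 100)   # keep two decimal places
    except InvalidOperation:
        return WEATHER_CODES[text]        # non-numeric text becomes a small code


def pack_record(line):
    """Pack one line of hourly data into a fixed-size binary record."""
    fields = next(csv.reader([line]))
    year, month, day = (int(f) for f in fields[1:4])
    hour, minute = (int(p) for p in fields[4].split(":"))
    stamp = datetime(year, month, day, hour, minute)
    minutes = int((stamp - EPOCH).total_seconds()) // 60
    values = [encode_field(f) for f in fields[5:]]
    return RECORD.pack(minutes, *values)


packed = pack_record(SAMPLE)
print(len(SAMPLE), "bytes as text,", len(packed), "bytes packed")
```

In this layout the meaning of each 2-byte value depends on its position in the record, so a weather code of 1 cannot be confused with a reading of 0.01. A real archive format would also need a full table of weather descriptions and a check that scaled values fit in 2 bytes.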

One problem with data conversion is that errors may creep in, and this is what we are trying to avoid. One possible way of flagging errors would be to compute a pseudo-checksum of all of the numeric values in the ASCII format temperature reading; if converting the binary variables back to text and recomputing the checksum gives a different result, then clearly there is a problem. I'll leave this problem for whoever takes on the task of coming up with an open-source world temperature data archive.
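A minimal Python sketch of the pseudo-checksum idea follows. The checksum used here (the sum of all numeric fields in hundredths, modulo 2^32) and the helper names `pseudo_checksum`, `to_binary` and `to_text` are my own assumptions for illustration; any other checksum over the numeric values would serve the same purpose.

```python
import struct
from decimal import Decimal, InvalidOperation


def pseudo_checksum(text_fields):
    """Sum every numeric field, in hundredths, modulo 2**32."""
    total = 0
    for text in text_fields:
        try:
            total += int(Decimal(text) * 100)
        except InvalidOperation:
            continue  # empty or non-numeric field
    return total % 2**32


def to_binary(readings):
    """Store each reading as a signed 2-byte integer in hundredths."""
    values = [int(Decimal(text) * 100) for text in readings]
    return struct.pack("<%dh" % len(values), *values)


def to_text(blob):
    """Convert the binary form back to two-decimal text."""
    values = struct.unpack("<%dh" % (len(blob) // 2), blob)
    return ["%.2f" % (value / 100) for value in values]


# The numeric readings from the Eureka line quoted above.
readings = ["14.40", "4.10", "50.00", "36.00", "22.00", "64.40", "101.87"]
before = pseudo_checksum(readings)
after = pseudo_checksum(to_text(to_binary(readings)))
print(before == after)        # a clean round trip keeps the checksum

damaged = bytearray(to_binary(readings))
damaged[0] ^= 0x01            # flip one bit in the stored temperature
print(pseudo_checksum(to_text(bytes(damaged))) == before)
```

A plain sum will miss errors that happen to cancel each other out, which is why this is only a pseudo-checksum; it is cheap enough to compute for every record.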

Going through this data by hand is a gargantuan task but, with the average desktop machine being significantly more powerful than the supercomputers of even 20 years ago, automatic analysis of huge amounts of data is possible in short amounts of time. The main things we're interested in are temperature trends and correlations between geographically close stations, as well as temperature outliers which might be errors or may be the result of the numerous artifacts that plague weather stations in urban areas and that have been looked at in Anthony Watts' surface stations project. This project demonstrates the power of thousands of interested amateurs to exhaustively document the potential temperature errors in every surface weather station in the US. Analysis of temperature data is something which any amateur scientist armed with a computer and often self-written software can do, potentially making valuable contributions to climate science. Open-source code has been described as peer-reviewed code and the contributions of extended peer review by amateur climatologists to this area have been embarrassing to the inbred climatologic establishment. Steve McIntyre has done outstanding work, WUWT has been a hotbed for dissemination of independent climate information and E.M. Smith has audited GISS FORTRAN code, finding major bugs as well as doing an impressive amount of other climatologic work. I'm happy to make my minor contribution to the process of finding climate truth.
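For anyone wondering what such automatic analysis might look like, here is a small Python sketch of two of the checks mentioned above: the correlation between two nearby stations and the flagging of outliers. The numbers are invented for illustration and are not real station data, and the choice of a median-based outlier test with a threshold of 5 is an assumption of mine rather than an established method for this dataset.

```python
from statistics import mean, median


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def flag_outliers(values, threshold=5.0):
    """Indices of readings far from the median, in units of the
    median absolute deviation (MAD)."""
    centre = median(values)
    mad = median([abs(v - centre) for v in values])
    if mad == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - centre) / mad > threshold]


# Invented readings for two neighbouring stations (illustration only).
station_a = [10.0, 11.0, 12.0, 13.0, 14.0]
station_b = [9.5, 10.5, 11.5, 12.5, 13.5]
print(round(pearson(station_a, station_b), 6))

# The same kind of series with one suspicious spike in it.
suspect = [10.0, 11.0, 12.0, 40.0, 13.0, 14.0]
print(flag_outliers(suspect))
```

A flagged reading is only a candidate for closer inspection; whether it is an instrument error or a genuine event still has to be decided by looking at the station's circumstances.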

Unless someone with more time than me decides to take on the Kamloops temperature data, I plan on analyzing this data RSN. Time is one thing that is in very short supply for me.

One warning about the program is that it runs fairly slowly. The data returned by the internet control is transferred a byte at a time, primarily because I didn't want to write the code to move chunks of data at once; I had my code working inside of an hour and I was more interested in looking at the climate data than I was in coming up with the ideal download program. Also, be warned that the VB6 code is not very nice as I used the first thing that worked rather than striving for elegant code. I don't normally program like that and was inspired by some of the climategate FORTRAN code from FOI2009.zip. My code is at least as good as that FORTRAN code and better documented.
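For comparison, moving the data in chunks rather than a byte at a time looks roughly like the following Python sketch. It is not a translation of the VB6 code; `copy_in_chunks` and the 8192-byte chunk size are arbitrary choices for illustration, and an in-memory stream stands in for the downloaded data.

```python
import io


def copy_in_chunks(source, destination, chunk_size=8192):
    """Copy everything from source to destination, chunk_size bytes at a
    time, and return the number of bytes copied."""
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:          # an empty read means the stream is finished
            break
        destination.write(chunk)
        total += len(chunk)
    return total


# Stand-in for downloaded data: 20000 bytes held in memory.
source = io.BytesIO(b"x" * 20000)
destination = io.BytesIO()
print(copy_in_chunks(source, destination))
```

With a chunk size of 8192 the loop body runs three times for this input instead of 20000 times, which is where the speed-up comes from.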

Now that you've read through my verbose and rambling description of the program (no time to write a shorter entry) here is where you can get the code. Right now it is only available as a 7zip format file and one can download 7zip at this link.

25/4/2010 T:=14:04

Sorry for the error in the download link; Thingamablog used "downloads.html" instead of "download" resulting in a 404 error for anyone who tried this. Puzzling as the program has been quite good until now. Lack of appropriate testing on my part.

Posted by Boris Gimbarzevsky at 2:23.15
Edited on: 25/04/2010 13:54.58
Categories: Climatology, Computers

18/04/2010

Yardwork

I've been looking forward to nice weather for a while and had forgotten that one of the side effects of a rise in temperature is that plants begin to grow. This means that I now have to allocate time for mowing lawns, weeding and pondering what to plant in my garden this year. As seems to be the norm for all humans, in the winter I was looking forward to long warm sunny days and today, while weeding, I was looking fondly back at winter evenings where I had the time to do more programming and reading.

After seeing my patients in hospital I spent 4 hours mowing the front lawn and weeding its border and realized that this area constitutes only about 1/4 of my yard. When I talk to other doctors about how much time they spend on yard maintenance I usually get a blank look and the name of a company that they use to perform all of these tasks. I might end up going this route later in the summer but at this time of year I still enjoy doing something different.

The nice thing about spending time on yard maintenance is that it is about as different from medicine as one can get. Also, at the end of the day one can see the results of one's work and get the feeling of having really accomplished something. Furthermore, it is good exercise, as I'm noticing twinges of pain in places I don't normally feel pain. I'm still at the stage where the novelty of doing this hasn't quite worn off. Given a choice between going to a gym to work out or performing the equivalent work gardening, I'll take gardening any day. This may be a holdover from my youth: my father's opinion was that working out with weights was a total waste of time and, when I was a teenager, before I could hit the weights in the basement I'd first have to chop half a cord of wood in the garage, which my father viewed as useful muscular work.

After living in an apartment in Vancouver for years it was really neat to finally have a yard and garden. Plant hacking is similar to programming if one looks at plants as machines that execute unique growth programs. I used to just let plants do their thing, but after moving to Kamloops I finally decided to modify the growth of plants to see what happened. I obviously don't have the time to create stunning flower gardens, as only retired people seem to be able to do this sort of thing, but I can try out various plants to see what they do over the next few months. The other interesting thing to observe is the various insects that appear during the course of the spring and summer, and these vary from year to year. Leafcutter bees were very prevalent a few years ago but don't seem to be around yet this year.

Posted by Boris Gimbarzevsky at 21:50.28
Categories: Miscellaneous