ISTI has more data

When we’re using the ISTI dataset with ccc-gistemp, what advantage does it give us? The northern hemisphere is already well sampled, so it doesn’t give us much there. Does it do any better in the southern hemisphere?

This is a plot after figure 2 of Hansen and Lebedeff 1987. It shows, for each of the 80 boxes used in the analysis, the earliest year that has data:


A word of caution for those comparing this to the actual figure 2 of Hansen and Lebedeff 1987. Their figure shows the date “when continuous coverage began” for each box, which I take to mean the date when continuous reporting began for any station within the box. The plot I give above is of data used in an analysis, so a box will include data from stations outside of the box (as per the 1200 km rule in Hansen and Lebedeff 1987). Being a plot of the analysis is also why my boxes are clamped at 1880, the year the analysis starts.
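To make the 1200 km rule concrete, here is a minimal sketch of how a station outside a box can still contribute to it. It assumes the weighting described in Hansen and Lebedeff 1987, where a station's weight falls linearly from 1 at the reference point to 0 at 1200 km. The function names and the spherical-Earth radius are my own choices for illustration; this is not the ccc-gistemp code.

```python
import math

EARTH_RADIUS_KM = 6371.0   # mean radius; a spherical Earth is assumed
LIMIT_KM = 1200.0          # radius of influence from Hansen and Lebedeff 1987


def great_circle_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees (haversine)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat = p2 - p1
    dlon = math.radians(lon2 - lon1)
    a = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def station_weight(distance_km, limit_km=LIMIT_KM):
    """Linear weight: 1 at zero distance, 0 at or beyond limit_km."""
    return max(0.0, 1.0 - distance_km / limit_km)
```

Any station with a non-zero weight feeds into the box, wherever the box boundary happens to lie, which is how a box can inherit its earliest year from a station that sits outside it.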

The top figure in each box is the earliest year of data for the ISTI MADQC dataset; the bottom figure is the earliest year of data for the GHCN-M QCU dataset (years in brackets mean that the given year is the earliest year of continuous reporting, but there are earlier fragments). The little figure in the bottom right corner of each box is the box number, using the same convention as figure 2 of Hansen and Lebedeff 1987. A box is blue when ISTI has earlier data (hence, more), and is pink when GHCN-M has earlier data.
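The bracket convention can be sketched in a few lines. This assumes that "continuous" means no missing years at all between the start of the final run and the last year of data, which is stricter than any real analysis would be; the helper names `earliest_years` and `box_label` are hypothetical and only for illustration.

```python
def earliest_years(years, clamp=1880):
    """Return (first_year, continuous_start) for the years that have data.

    Years before `clamp` are ignored, mirroring the 1880 clamp on the plot.
    continuous_start is the start of the unbroken run ending at the latest year.
    """
    ys = sorted({y for y in years if y >= clamp})
    if not ys:
        raise ValueError("no data at or after the clamp year")
    continuous_start = ys[-1]
    # Walk backwards from the latest year until a missing year is found.
    for y in reversed(ys[:-1]):
        if continuous_start - y > 1:
            break
        continuous_start = y
    return ys[0], continuous_start


def box_label(years, clamp=1880):
    """Bracket the year when earlier fragments precede the continuous run."""
    first, continuous_start = earliest_years(years, clamp)
    if first < continuous_start:
        return "({})".format(continuous_start)
    return str(first)
```

For example, a box with data in 1900, 1901 and then every year from 1935 would be labelled `(1935)`, while a box with unbroken data from 1950 would be labelled `1950`.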

Where ISTI wins the most is box 65, where ISTI has extended the data period back from 1950 to 1887 and a little before (almost the full period of the analysis). There are only a handful of stations contributing to this box, so it’s just about sensible to show them all on one plot. All the stations start reporting around 1950, except for one:


That station is MP00006199, Plaisance (now renamed Sir Seewoosagur Ramgoolam International Airport; the renaming of airports is a history-in-miniature of colonialism: airports are built by the colonial powers, then renamed as the newly independent ex-colonies stamp their mark on them).

The equivalent plot for the ccc-gistemp analysis done with the GHCN-M QCU dataset has all of the stations (including 12961990000 Plaisance) starting in 1950 or later:


Meteorological stations don’t just spring up out of nowhere and all start reporting in the same year, 1951. The fact that many stations have records starting in 1951 is an artefact of the collection process: there were deliberate attempts to recover and digitise existing records from 1951 to 1980 so that they could be used for normals (Peterson and Vose, 1997). This is one of the key benefits of the ISTI dataset. By bringing together data from diverse sources, we find and can make use of longer records.

So it seems likely that some of these other stations contributing to box 65 could have more data coaxed out of them. For this box the station of most interest would be WMO station 61996 (listed as 15761996000 Ile Nouvelle-Amsterdam in GHCN-M v3, and FS000061996 Martin-de-Viviès in ISTI Stage 3) because this station is actually within the bounds of the box, whereas the others are not. Sadly, this is a remote island that didn’t have any settlement at all until 1949, so we’re not going to suddenly find significantly more data for this station.

Box 58 is a case where ISTI has more data, but it’s not in a period that connects to the data starting in 1935. Plotting the contributing stations, we see that the period of continuous reporting hasn’t actually changed:


What’s changed is that we have two extra data fragments for a single station: FP000091938, Tahiti. Given the huge gaps between reporting periods, for an analysis like ccc-gistemp we would be better off just discarding those data. I’m sharpening my scalpels.

Perhaps it would be worth doing some data archaeology for Tahiti. Can it really be the case that reliable temperatures were only reported from 1935 onwards? The international airport opened in 1960, so the period of reporting isn’t directly related to the airport opening. Perhaps there’s a stack of yellowing paper forms in wherever the French keep their archives.

Box 40 is the only box in the northern hemisphere to have its period of data extended by ISTI. The by-now-traditional plot of contributing stations shows that this is due to three stations (I’ve truncated the plot at 1945 to avoid showing a large number of stations that began in 1950/1951):


The contributions of BPXLT466819, Tulagi, and KRXLT605164, Ocean Island, are most welcome. The record for the third station, NRXLT092567, Nauru, is somewhat obscured, but when I plot it alone, we see that it’s just the sort of record that I’ve come to complain about:


A series of unrelated periods joined into a single record for no particularly good reason. The ISTI dataset is certainly good for studies of inhomogeneity because it seems to create lots of inhomogeneities to study. I think if we’re going to continue to use ISTI for ccc-gistemp, I’ll have to implement something like Rohde’s scalpel.
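As a rough idea of what such a scalpel might look like, here is a minimal sketch that cuts a record wherever the gap between data years is too long. This is a deliberate simplification: it only cuts at gaps and makes no attempt to detect breakpoints within continuous data. The function name and the `max_missing` threshold are assumptions for illustration, not part of ccc-gistemp or of Rohde's method.

```python
def split_at_gaps(years, max_missing=5):
    """Split the years that have data into fragments.

    A new fragment starts whenever more than `max_missing` consecutive
    years are absent. The default of 5 is an arbitrary illustrative value.
    """
    fragments = []
    for y in sorted(set(years)):
        # Number of missing years between y and the previous year with data.
        if fragments and y - fragments[-1][-1] - 1 <= max_missing:
            fragments[-1].append(y)
        else:
            fragments.append([y])
    return fragments
```

Each fragment could then be treated as a separate record, or the short ones simply discarded, for instance by keeping only `max(split_at_gaps(years), key=len)`.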
