How simple can a temperature analysis be?
We wanted the average person on the Clapham omnibus to be able to use it, inspect it, reason about it, and change it. ccc-gistemp successfully reproduces the NASA GISTEMP analysis (to within a few millikelvin for any particular monthly value) in a few thousand lines of Python code.
A few thousand lines is still a lot of code. There are still a few corners of ccc-gistemp that I haven’t fully looked into. Can we make something simpler and smaller that does more or less the same job? Obviously we can’t expect to still use exactly the same algorithm as NASA GISTEMP, nor would we want to, because the exact details of which Arctic locations use SST anomalies and which use LSAT anomalies are just not very important (for estimating global temperature change). It is distracting to get bogged down in that sort of detail.
ZONTEM attempts to discard all constraints and make an analysis that is as simple as possible. The input data is monthly temperature records from GHCN-M. The Earth is divided into 20 latitudinal zones. The input records are distributed into the zones (by choosing the zone according to the latitude of the station). The records are then combined in two steps: first combining all the stations in a zone into a zonal record; then combining all zonal records into a single global record. The global record is converted into monthly anomalies, and then averaged into yearly anomalies. The zones are chosen to be equal area.
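The equal-area choice rests on a standard fact about spheres: the area of the band between two lines of latitude is proportional to the difference of the sines of those latitudes. So dividing the range of sin(latitude), which runs from −1 to 1, into 20 equal intervals gives 20 equal-area zones. A minimal sketch of the zone assignment (the function name and signature here are mine, not necessarily ZONTEM’s):

```python
import math

def zone_of(latitude, n_zones=20):
    """Return the index (0 at the south pole) of the equal-area
    latitudinal zone containing *latitude* (in degrees)."""
    # Band area is proportional to the difference of the sines of its
    # bounding latitudes, so splitting [-1, 1] (the range of sin) into
    # n_zones equal intervals yields equal-area zones.
    z = int((math.sin(math.radians(latitude)) + 1.0) / 2.0 * n_zones)
    return min(z, n_zones - 1)  # clamp so latitude +90 lands in the top zone
```

Note that equal-area zones are narrower (in degrees of latitude) near the equator than near the poles: a station at 30°N lands in zone 15 of 20, not zone 13 as it would if the zones were 9° of latitude each.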
This is simpler in so many ways: only one source of input; no ad hoc fixing or rejection of records; no correction for UHI or other inhomogeneity; no use of Sea Surface Temperatures; only a single global result (no gridded output, no separate hemispherical series).
The result is about 600 lines of Python, split into three roughly equal-sized pieces: ghcn.py, series.py, and zontem.py. ghcn.py understands the GHCN-M v3 file format for data and metadata (vital, but scientifically uninteresting); series.py is borrowed from the ccc-gistemp project and contains the detailed routines for combining monthly records and converting them to anomalies; zontem.py is the main driver: it allocates stations to zones and picks a particular order in which to combine station records.
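The anomaly conversion is easy to sketch. Assuming a record is a flat list of monthly values (None for missing months) starting in January of some first year, and picking 1951–1980 as the base period (the actual base period and data layout in series.py may differ), it looks something like this:

```python
def annual_anomaly(monthly, first_year=1880, base=(1951, 1981)):
    """Convert a monthly series to a list of annual anomalies.

    *monthly* is a flat list of monthly values, None for missing,
    starting in January of *first_year*.
    """
    # One baseline per calendar month, averaged over the base period,
    # so that subtracting it removes the seasonal cycle.
    baseline = []
    for m in range(12):
        vals = []
        for year in range(*base):
            i = (year - first_year) * 12 + m
            if 0 <= i < len(monthly) and monthly[i] is not None:
                vals.append(monthly[i])
        baseline.append(sum(vals) / len(vals) if vals else None)
    # Average the 12 monthly anomalies of each year.
    annual = []
    for y0 in range(0, len(monthly), 12):
        anoms = [v - b for v, b in zip(monthly[y0:y0 + 12], baseline)
                 if v is not None and b is not None]
        annual.append(sum(anoms) / len(anoms) if anoms else None)
    return annual
```

Computing a separate baseline for each calendar month matters: it removes the seasonal cycle, so that a warm January and a warm July both count as positive anomalies of comparable size.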
A good chunk of zontem.py is concerned with finding files and parsing command line arguments. The actual interesting bit, the core of the ZONTEM algorithm, is expressed as a very short Python function:
```python
def zontem(input, n_zones):
    zones = split(input, n_zones)
    zonal_average = map(combine_stations, zones)
    global_average = combine_stations(zonal_average)
    global_annual_average = annual_anomaly(global_average.series)
    zonal_annual_average = [annual_anomaly(zonal.series) for zonal in zonal_average]
    return global_annual_average, zonal_annual_average
```
This is a useful seven-line summary of the algorithm, even though it glosses over some essential details (how are stations split into zones? how are station records combined together?). The details are of course found in the remaining source code.
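For the second question, the real answer is the reference station method inherited from ccc-gistemp’s series.py. A much cruder sketch conveys the idea: start from the record with the most data, shift each remaining record so that it agrees on average with the combined record over the months both cover, and average where they overlap. This function is mine, for illustration only; it assumes records are equal-length flat lists with None for missing months, and it omits the weighting the real method applies.

```python
def combine(records):
    """Combine several equal-length monthly records (None = missing)
    into one, by shifting each to agree with the growing combination."""
    # Longest record (most non-missing months) first.
    records = sorted(records,
                     key=lambda r: sum(v is not None for v in r),
                     reverse=True)
    combined = list(records[0])
    counts = [0 if v is None else 1 for v in combined]
    for rec in records[1:]:
        overlap = [(c, v) for c, v in zip(combined, rec)
                   if c is not None and v is not None]
        if not overlap:
            continue  # no common months: cannot align this record
        # Mean difference over the overlap: adding it to the new record
        # makes it agree, on average, with what we have so far.
        offset = sum(c - v for c, v in overlap) / len(overlap)
        for i, v in enumerate(rec):
            if v is None:
                continue
            v += offset
            if combined[i] is None:
                combined[i] = v
                counts[i] = 1
            else:
                combined[i] = (combined[i] * counts[i] + v) / (counts[i] + 1)
                counts[i] += 1
    return combined
```

Shifting before averaging is the crucial step: two nearby stations can sit a few degrees apart in absolute temperature (altitude, siting) while recording the same *changes*, and it is the changes we want to preserve.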
I like to think of ZONTEM as a napkin explanation of how a simple way to estimate global historical temperature change works. I can imagine describing the whole thing over a couple of pints in the pub. ZONTEM has probably been simplified to the point where it is no longer useful to Science. But that does not mean it is not useful for Science Communication. In just the same way we might use a simplified sketch of a cell or an atom to explain how a real cell or atom works, ZONTEM is a sketch of a program to explain how a real (“better”) analysis works.
If ZONTEM seems simplistic because it doesn’t increase coverage by using ocean data, well, that’s because GISTEMP and HadCRUT4 do that. If it seems simplistic because it doesn’t produce gridded maps at monthly resolution, well, that’s because Berkeley Earth (and the others) do that. Every way in which ZONTEM has been made simpler is probably a way in which NASA GISTEMP, or a similar analysis that already exists, gets a more accurate result.