Open BEST Project status

This guest post was written by György Kovács, who worked all summer on a reimplementation of some climate science software using only free software tools, thanks to the excellent Google Summer of Code. This is his second post; here is the first.

In the Google Summer of Code 2012 program my task was to reimplement the Berkeley Earth Surface Temperature (BEST) Matlab software in C. At the start of the work I was very optimistic, but I now see that I greatly underestimated the amount of work required to complete the entire project.

The BEST software is professional Matlab code: it routinely uses all the finer features and structures of Matlab, from dynamically extending structures with new fields to cell arrays. Representing these things in C is definitely not the mechanical job I had expected. Another drawback is the lack of a Matlab-to-C compiler in the form we had expected when we scheduled the work. Older versions of Matlab could compile Matlab code to C source, but current releases produce only a header and an encrypted library, which definitely does not fit the goals of the Climate Code Foundation.
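To give a feel for why this translation is not mechanical, here is a minimal sketch of what a Matlab-style structure with run-time fields might look like in C. It is not taken from the BEST port: every name in it (`DynStruct`, `dyn_set_num`, `dyn_set_str` and so on) is hypothetical, and it only handles scalar numbers and strings, whereas real Matlab fields can also hold matrices, cell arrays and nested structures.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical names throughout; this is an illustration, not BEST code. */
typedef enum { VAL_NUM, VAL_STR } ValueKind;

/* One Matlab-like value: a tagged union of a double or an owned string. */
typedef struct {
    ValueKind kind;
    union { double num; char *str; } as;
} Value;

typedef struct { char *name; Value value; } Field;

/* A structure whose fields can be added at run time, like s.newfield = x. */
typedef struct { Field *fields; size_t count; size_t capacity; } DynStruct;

static char *copy_string(const char *s) {
    size_t n = strlen(s) + 1;
    char *p = malloc(n);
    if (p) memcpy(p, s, n);
    return p;
}

void dyn_init(DynStruct *s) { s->fields = NULL; s->count = 0; s->capacity = 0; }

/* Linear lookup by field name; returns NULL if the field does not exist. */
Value *dyn_get(DynStruct *s, const char *name) {
    assert(s != NULL && name != NULL);
    for (size_t i = 0; i < s->count; i++)
        if (strcmp(s->fields[i].name, name) == 0) return &s->fields[i].value;
    return NULL;
}

/* Find the field, or append a new one; returns NULL when out of memory. */
static Value *dyn_slot(DynStruct *s, const char *name) {
    Value *v = dyn_get(s, name);
    if (v) {                                  /* overwrite: drop old contents */
        if (v->kind == VAL_STR) free(v->as.str);
        v->kind = VAL_NUM;
        v->as.num = 0.0;
        return v;
    }
    if (s->count == s->capacity) {            /* grow the field table */
        size_t cap = s->capacity ? s->capacity * 2 : 4;
        Field *grown = realloc(s->fields, cap * sizeof *grown);
        if (!grown) return NULL;
        s->fields = grown;
        s->capacity = cap;
    }
    char *key = copy_string(name);
    if (!key) return NULL;
    Field *f = &s->fields[s->count++];
    f->name = key;
    f->value.kind = VAL_NUM;
    f->value.as.num = 0.0;
    return &f->value;
}

int dyn_set_num(DynStruct *s, const char *name, double x) {
    Value *v = dyn_slot(s, name);
    if (!v) return -1;
    v->kind = VAL_NUM;
    v->as.num = x;
    return 0;
}

int dyn_set_str(DynStruct *s, const char *name, const char *text) {
    char *copy = copy_string(text);           /* copy first, in case of aliasing */
    if (!copy) return -1;
    Value *v = dyn_slot(s, name);
    if (!v) { free(copy); return -1; }
    v->kind = VAL_STR;
    v->as.str = copy;
    return 0;
}

void dyn_free(DynStruct *s) {
    for (size_t i = 0; i < s->count; i++) {
        free(s->fields[i].name);
        if (s->fields[i].value.kind == VAL_STR) free(s->fields[i].value.as.str);
    }
    free(s->fields);
    dyn_init(s);
}
```

With this in place, the Matlab statement `s.latitude = 47.5` corresponds roughly to `dyn_set_num(&s, "latitude", 47.5)`. Even this toy version needs manual memory management and type tags for something Matlab does in a single line, and the linear name lookup would probably have to become a hash table for structures with many fields.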

Anyway, after two months of coding I managed to implement the main run path in C. The BEST software has plenty of parameters and three predefined parameter sets. The ‘quick’ parameter set runs a single iteration of the kriging process and generates simple but illustrative results. This run path is the backbone of every parameterization of the software, so the most important part of the development is completed and working.

Although a large amount of code has been written, the development is far from finished. Several features need to be added to get more accurate results. The main lesson I have drawn is that C is not the easiest route for reimplementing interpreted Matlab code in a compiled imperative language. Although C is simple and extremely fast, it is hard to represent the data structures and operations that are routinely used in Matlab. Either a thorough refactoring is required before coding in C, or some higher-level programming tools (classes, overloading, templates) should be used to code the handy but complex features of Matlab. Perhaps a reimplementation of the reimplementation could make the code simpler and easier for the community to use.

Special thanks to Nick Barnes and Nick Levine for the great mentoring. GSoC is great, and I really hope I have created something valuable for the Climate Code Foundation, science and mankind this summer. The code is available on GitHub, and some sample charts can be seen in my blog posts.


2 Responses to Open BEST Project status

  1. Mark Hadfield says:

    Re-implementing matlab code in C? You’re a glutton for punishment! Did you consider Python with the Numpy extension?

  2. Nick Barnes says:

    Mark: yes, of course we did (in fact, that was my preferred technology in the original project idea). But there were two obstacles. The first was performance: the Berkeley algorithm involves some heavy numerical lifting (the Berkeley team are running it on a cluster, of 80 CPUs if I remember correctly), which might have required writing NumPy extensions to run at acceptable speed. The second was that György’s experience was in numerical algorithms using C/C++ (and MATLAB), and we didn’t want him to have to spend much of the summer coming up to speed on NumPy.
