This guest post is written by György Kovács, who worked all summer on a reimplementation of some climate science software using only free software tools, thanks to the excellent Google Summer of Code. This is his second post; here is the first.
In the Google Summer of Code 2012 program my task was the reimplementation of the Berkeley Earth Surface Temperature (BEST) Matlab software in C. At the beginning of the work I was very optimistic, but now I see that I greatly underestimated the amount of work required to complete the entire project.
The BEST software is professional Matlab code, that is, all the fine features and structures of Matlab are routinely used in it, from the dynamic extension of structures with new fields to the use of cell arrays. Representing these things in C is definitely not the mechanical job I had expected. Another drawback is the lack of a Matlab-to-C compiler in the form we expected when we scheduled the work. In older versions of Matlab it was possible to compile Matlab code to C source, but in the current releases Matlab creates only a header and an encrypted library, which definitely does not fit the goals of the Climate Code Foundation.
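To give a feeling for the kind of plumbing this involves, here is a minimal sketch of how a Matlab cell array could be represented in C with a tagged union. It is only an illustration: the names (cell, cell_array, cell_array_set_number and so on) are hypothetical and are not taken from the BEST port, only numbers and strings are supported, and indices are 0-based.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Kind of value held by one cell (hypothetical names, illustration only). */
typedef enum { CELL_EMPTY, CELL_NUMBER, CELL_STRING } cell_kind;

typedef struct {
    cell_kind kind;
    union {
        double number;
        char *string;               /* heap copy owned by the cell */
    } as;
} cell;

typedef struct {
    cell *items;
    size_t count;                   /* number of cells in use */
    size_t capacity;                /* number of cells allocated */
} cell_array;

void cell_array_init(cell_array *a)
{
    a->items = NULL;
    a->count = 0;
    a->capacity = 0;
}

/* Release whatever a cell owns and mark it empty. */
void cell_clear(cell *c)
{
    if (c->kind == CELL_STRING)
        free(c->as.string);
    c->kind = CELL_EMPTY;
}

/* Make `index` valid, padding with empty cells, as Matlab does when
   assigning past the end. Returns 0 on success, -1 if out of memory. */
int cell_array_reserve(cell_array *a, size_t index)
{
    if (index >= a->capacity) {
        size_t new_cap = a->capacity ? a->capacity : 4;
        while (new_cap <= index)
            new_cap *= 2;
        cell *p = realloc(a->items, new_cap * sizeof *p);
        if (p == NULL)
            return -1;
        a->items = p;
        a->capacity = new_cap;
    }
    while (a->count <= index) {
        a->items[a->count].kind = CELL_EMPTY;
        a->count++;
    }
    return 0;
}

int cell_array_set_number(cell_array *a, size_t index, double value)
{
    if (cell_array_reserve(a, index) != 0)
        return -1;
    cell_clear(&a->items[index]);   /* drop the previous content, if any */
    a->items[index].kind = CELL_NUMBER;
    a->items[index].as.number = value;
    return 0;
}

int cell_array_set_string(cell_array *a, size_t index, const char *text)
{
    assert(text != NULL);
    size_t n = strlen(text) + 1;
    char *copy = malloc(n);
    if (copy == NULL)
        return -1;
    memcpy(copy, text, n);
    if (cell_array_reserve(a, index) != 0) {
        free(copy);
        return -1;
    }
    cell_clear(&a->items[index]);
    a->items[index].kind = CELL_STRING;
    a->items[index].as.string = copy;
    return 0;
}

void cell_array_free(cell_array *a)
{
    for (size_t i = 0; i < a->count; i++)
        cell_clear(&a->items[i]);
    free(a->items);
    cell_array_init(a);
}
```

In this sketch, cell_array_set_string(&a, 3, "quick") plays the role of the Matlab one-liner a{4} = 'quick'. Even this toy version needs explicit ownership rules and error handling, and it does not yet cover nested cells or matrices.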
Anyway, after two months of coding I managed to implement the main run path in C. The BEST software has plenty of parameters and three predefined parameter sets. The ‘quick’ parameter set is the one that runs a single iteration of the kriging process and generates simple but demonstrative results. This run path is the backbone of any parameterization of the software, so the most important part of the development is completed and working.
Although a large amount of code has been written, the development is far from finished. Several features need to be added to get more accurate results. The main lesson I have drawn is that C is not the easiest target when reimplementing interpreted Matlab code in a compiled imperative language. Although C is simple and extremely fast, it’s a hard task to represent the data structures and operations that are routinely used in Matlab. Either a thorough refactoring is required before coding in C, or some higher-level programming tools (classes, overloading, templates) should be used to code the handy but complex features of Matlab. Perhaps a reimplementation of the reimplementation could make the code simpler and easier for the community to use.
Special thanks to Nick Barnes and Nick Levine for the great mentoring. GSoC is great, and I really hope I have created something valuable for the Climate Code Foundation, science and mankind this summer. The code is available on GitHub, while some sample charts can be seen in my blog posts.