2012 Student: György Kovács

György Kovács is a student mentored by the Foundation in the Google Summer of Code 2012. This is his project proposal. There are also blog posts reporting his project progress.

Short description: The aim of the project is to reimplement the Matlab code of the Berkeley Earth Surface Temperature software using open source tools, to make it available to the open source community researching climate change. For this work I would use R or C++ with GNU GSL.

I’m an assistant lecturer at the University of Debrecen (http://www.inf.unideb.hu), with an MSc in Computer Science and an MSc in Applied Mathematics. In addition, I am currently a BSc student in Physics.
I live in Debrecen, Hungary, GMT+1, and my preferred communication language is English.
My main areas of interest are machine learning and computer vision. I’m familiar with several technologies, frameworks and libraries related to scientific and numerical computation and parallelization, such as Matlab, GNU GSL, OpenCL, OpenMP and pthreads.
I lead an open source development project called OpenIP. Within this project I have implemented a vast number of image processing and machine learning algorithms, including many numerical solutions. Without aiming for completeness: histogram and continuous distribution based classifiers, decision tree based classifiers, validation and discretization techniques, graph search algorithms, stochastic optimization techniques, wavelet transforms, etc. for the statistical segmentation of 2D/3D images. Many related open source packages are used routinely in this project: GSL, libpng, libjpeg, libtiff, libsvm, etc.
I use R occasionally; unfortunately, I’m not very experienced with Python or JavaScript.
I’m a PhD student in Signal Processing and Digital Communication. During the implementation of the OpenIP library, I have studied and implemented the content of many scientific papers, and have also written some. You can find the list in my CV on my website.
I have no experience in climate science (apart from the shows on the National Geographic and Discovery channels), but I have a keen personal interest in it. I know some general facts and/or urban legends about changes in the Gulf Stream, etc., but I would like to learn more from reliable sources.
I would like to participate in Google Summer of Code because I work at my university as an assistant lecturer, my summer is free, and I want to do something interesting and beneficial. Furthermore, I want to test myself in an international team.
I have chosen the Climate Code Foundation as my mentor because I found some of its project proposals really interesting, and I am also interested in the ongoing research in climate science.
In the summer I will take a two-week holiday in Hungary and watch the European Football Championship and the Olympic Games on TV :), but for the rest of the time I want to carry on with research and programming.

The Project

I am interested in the reimplementation of the Berkeley Earth Surface Temperature code. As stated, it is written in Matlab, and an open source implementation is needed.
First I would like to understand what is going on in the code, then reimplement it in C++/C using GNU GSL, or in R. (Although I have mentioned that I’m not too experienced in R, I have wanted to learn it for a long time, so this would be a perfect opportunity.) Furthermore, if required, I could do a C++ implementation and write R bindings for it. Since I’m experienced in statistical methods (PCA, feature selection, curve fitting with Support Vector Machines, B-splines, etc.), if I have ideas for enhancing the functionality of the code, I could implement them as well, after discussing them with the mentors.
I would like to carry out this work in July and August.
My project does not require travel.
I could not access the Matlab code at the specified link, so I have not been able to analyse it. At first glance the risks are low, since I have already done many Matlab-to-C++ reimplementations (filtered backprojection for positron emission tomography, etc.). If the code must be merged into some existing library (apart from GSL), and I have to use its data structures, methods and methodologies, the work could become harder, since I would first need to understand the existing concepts, discover the existing tools, etc. But in general, I can do it.
For the best results and the preferred enhancements, I would like to have a scientist mentor. If the code must be merged into an existing system, I would need a programmer mentor as well, to help me discover the key data structures and functionality of the existing libraries.
I can work in a team, but if required I can also work efficiently on my own; it depends on the details.