As a scientist I’m constantly worried about the reproducibility of my results and the clarity of my code.
My research topic is directly related to climate science and that’s the main reason I chose to apply to the Climate Code Foundation project for the GSoC. That way, I will be exposed to a learning environment with good coding practice, and I’ll have the opportunity to work with other scientists in a programming effort, as well as sharpen my Python skills.
I’ll separate my ideas into some quick hacks (short term goals) and a more in-depth, open-ended programming exercise (long term goals). The short term goals will deliver simple solutions to the project ideas presented at the website and, at the same time, serve as a starting point to dig deeper into the ccc-gistemp code. The long term goals will serve as the primary programming project for the GSoC.
Short term goals
1) One-click packaging
- Make the code installable using Python distutils via a setup.py for Linux users.
- Create binary packages for many Linux distros using OpenSUSE Build Service as a test of the setup.py approach.
- Create a Windows binary via py2exe and a Mac app via Appinstall.
2) Excel support
- Add read/write capabilities with the Python module xlrd (which only reads Excel files, so writing would need a companion module such as xlwt), or just a function to read/write a csv file of the data.
3) Climate science libraries
- Adapt parts of my own project, python-seawater (TEOS-10) to ccc-gistemp.
- Implement the Conservative Temperature (CT) algorithm.
- Propose a new Sea Surface Temperature (CT) over oceans when Salinity data is available.
Long term goals
4) NumPy implementation
- Make use of NumPy functions when applicable inside the ccc-gistemp code.
- Create unit tests for all NumPy and non-NumPy computations to guarantee a safe programming environment and testing for installation on other platforms.
- Write automatic documentation with Sphinx, including some examples on how to use the code.
5) Super Whizzy Visualisation Browser
- Create an easy to use GUI for the scientific and non-scientific communities alike, built on top of Enthought’s Traits-GUI.
My PhD research is a study of the winter water formation in the Gulf of Maine and the Ocean Mixed Layer evolution throughout the year. Right now I’m working on a preliminary paper where most of the code used will be made available on my personal web page together with the results.
My proposal is to tackle all short term goals in the first quarter of the summer, to get familiar with ccc-gistemp and at the same time produce some simple results.
The last 3 quarters will be used to execute the long term goals.
My proposal is to expand the idea at the website for NumPy support with unit tests and automatic documentation generation with examples inside the docstrings. My qualification for the NumPy support idea comes from my experience with NumPy and code testing (see my projects below), and from the everyday use of NumPy as an alternative to Matlab in our lab.
The first step is to study the ccc-gistemp code and identify code blocks where a NumPy function or object (like interpolators or array objects) makes sense. Once a code block with potential for a NumPy alternative is identified, I will write a unit test for the code “as is.” That way, the current result is stored, and the NumPy implementation can start. After each modification the unit test will be used to certify that the results agree with the non-NumPy code.
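The workflow above can be sketched as follows. This is only an illustration: `mean_anomaly_pure` is a hypothetical stand-in for an existing pure-Python block (it is not taken from ccc-gistemp), `mean_anomaly_numpy` is the candidate replacement, and the input values are placeholders.

```python
import unittest

import numpy as np


def mean_anomaly_pure(values, baseline):
    """Hypothetical 'as is' block: mean of (value - baseline) in plain Python."""
    anomalies = [v - baseline for v in values]
    return sum(anomalies) / len(anomalies)


def mean_anomaly_numpy(values, baseline):
    """Candidate NumPy replacement for the block above."""
    return float(np.mean(np.asarray(values, dtype=float) - baseline))


class TestMeanAnomaly(unittest.TestCase):
    # Placeholder inputs, not real station data.
    values = [14.1, 13.8, 15.2, 14.9]
    baseline = 14.0

    def setUp(self):
        # Store the result of the current code before any modification.
        self.expected = mean_anomaly_pure(self.values, self.baseline)

    def test_numpy_matches_reference(self):
        # The NumPy version must agree with the stored reference result.
        result = mean_anomaly_numpy(self.values, self.baseline)
        self.assertAlmostEqual(result, self.expected, places=12)
```

Saved in a test module, this would run with `python -m unittest`. Comparing with a tolerance rather than exact equality is a deliberate choice, since NumPy may sum in a different order than the plain loop.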
The next step is to create a speed test to identify whether the modification was worth it or whether more work is necessary in the block.
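A minimal sketch of such a speed test, using the standard `timeit` module. The two running-sum functions are hypothetical examples of a loop and its NumPy counterpart, not code from ccc-gistemp, and the default sizes are arbitrary.

```python
import timeit

import numpy as np


def running_sum_pure(values):
    """Hypothetical loop-based block: cumulative sum in plain Python."""
    total = 0.0
    out = []
    for v in values:
        total += v
        out.append(total)
    return out


def running_sum_numpy(values):
    """NumPy counterpart of the loop above."""
    return np.cumsum(values)


def compare_speed(n=100_000, repeat=5, number=10):
    """Return the best (pure, numpy) timings in seconds for `number` calls."""
    data_list = [float(i % 17) for i in range(n)]
    data_array = np.asarray(data_list)
    t_pure = min(timeit.repeat(lambda: running_sum_pure(data_list),
                               repeat=repeat, number=number))
    t_numpy = min(timeit.repeat(lambda: running_sum_numpy(data_array),
                                repeat=repeat, number=number))
    return t_pure, t_numpy
```

Calling `compare_speed()` returns the two timings, which can then be compared to decide whether the block needs more work. Taking the minimum over several repeats reduces the influence of other processes running on the machine.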
If applicable, there is still the possibility of creating a Python decorator to treat inputs in a homogeneous form. That will allow users to input scalars, arrays, and masked_arrays seamlessly, with no modification needed on the user’s side.
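One possible shape for such a decorator is sketched below. The names `accepts_array_like` and `celsius_to_kelvin` are hypothetical, and the sketch assumes that every positional argument is numeric data; keyword arguments are passed through unchanged.

```python
import functools

import numpy as np


def accepts_array_like(func):
    """Convert positional inputs to masked arrays (at least 1-D) before calling func."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Remember whether the caller passed only scalars.
        all_scalars = all(np.ndim(arg) == 0 for arg in args)
        # Existing masks are preserved; lists and scalars become masked arrays.
        arrays = [np.ma.array(arg, dtype=float, ndmin=1) for arg in args]
        result = func(*arrays, **kwargs)
        # Give scalars back to callers who passed scalars.
        return result[0] if all_scalars else result
    return wrapper


@accepts_array_like
def celsius_to_kelvin(temperature):
    """Hypothetical example function; it always receives a masked array."""
    return temperature + 273.15
```

With this in place, `celsius_to_kelvin(10.0)`, `celsius_to_kelvin([0.0, 10.0])` and a call with a masked array all go through the same code path, and masked entries stay masked in the output.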
My proposal for the visualization browser is to start a framework using Enthought’s Traits-GUI. I’m not very experienced with Traits, but I’ve been studying it for a while and I believe that it provides a nice API to quickly implement GUIs. My plan is to keep this part of the project open and use any extra time available to give it a try.
I’m taking an informal course in Traits with an expert here at our university and I believe I can make some progress.
My idea is to add links, parts of the code, and other “hints” to every user “click”. The goal is to give the user a “transparency feeling” to what the browser is doing under the hood.
The main risk of my proposal is to be unable to fully identify all possible code blocks that could be optimized by NumPy.
Also, the visualization browser will have a 3rd party dependency on Enthought’s Traits-GUI. Even though Enthought products are open source, it can sometimes be difficult for users to get all the modules to work due to the intricate dependencies. Another option is to use Enthought’s binary package, which is free of charge for students.
The main goal of the Climate Code Foundation is to promote the understanding of climate science among the general public.
At the moment ccc-gistemp is a big blob of Python code: the user needs to download it, unpack it, read the instructions, type commands at a command line, wait for 30 minutes, and then look at plain-text output files.
The work I propose is intended to improve that situation by:
- Enhancing public accessibility via packaging (one click install for windows/mac/linux).
- Writing a comma separated values file (or Excel spreadsheet) with the results.
- Delivering faster code through a NumPy implementation, allowing quick runs for demonstrations (e.g., a professor giving a classroom demonstration to students).
- Making runs and visualization easier for the general public via a point-and-click Graphical User Interface (GUI).
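As a rough idea of the comma separated values output in the list above, the sketch below writes and reads a two-column file with the standard `csv` module. The function names, the column names and the numbers are made-up placeholders, not ccc-gistemp output or its actual file layout.

```python
import csv
import io


def write_series_csv(stream, rows, header=("year", "anomaly_c")):
    """Write (year, value) pairs to an open text stream as CSV."""
    writer = csv.writer(stream)
    writer.writerow(header)
    writer.writerows(rows)


def read_series_csv(stream):
    """Read the pairs back, skipping the header row."""
    reader = csv.reader(stream)
    next(reader)
    return [(int(year), float(value)) for year, value in reader]


# Round trip through an in-memory buffer with placeholder values.
buffer = io.StringIO()
write_series_csv(buffer, [(2001, 0.5), (2002, 0.25)])
buffer.seek(0)
rows = read_series_csv(buffer)
```

A file written this way opens directly in Excel and other spreadsheet programs, which is the main argument for CSV over a full Excel dependency.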
In addition to those goals the foundation values software clarity, management and openness. Those key points will be respected during the development of this proposal.
The final outcome of this proposal is one step forward in a long-term plan to build an interface to ccc-gistemp that lets anyone run the code and visualize the results, while maintaining the scientific reliability and robustness of ccc-gistemp.
Proposal time-line and development methodology
Before coding (1st quarter)
- Familiarize myself with ccc-gistemp code.
- Create a packaging system with distutils.
- Create binary installers for ccc-gistemp and test the average user’s install/run experience on Mac and Windows.
- Discuss with the mentor the idea of serving smaller pre-processed chunks of the data for faster runs.
- Discuss the possibility of Comma Separated Values (CSV) versus full Excel file support.
- Discuss the incorporation of the Conservative Temperature algorithm into ccc-gistemp.
- Identify places where a NumPy function could be applied.
- Implement NumPy functions where possible.
- Create a unit test for each implementation with/without NumPy.
- Discuss masked_array support with the mentors.
- Test scalar/arrays/masked_arrays as input when applicable.
- Further refine tests and documentation for the whole project.
Coding (Visualization Browser)
- Use the last 2 weeks to start a framework for a visualization browser.
- Write/polish documentation for the code produced.
- Discuss possible improvements with the mentor, especially for the visualization browser.
- Write a report with final thoughts and suggestions regarding the code.
This summer, if selected, I intend to work mostly on the coding, with intermissions for my own studying, but I won’t be taking any courses.
I expect to deliver working (and hopefully faster) code making use of NumPy.
I consider my level of experience with scientific computing as intermediate. Recently I’ve been migrating from Matlab to Python, and the experience has been very exciting. However, because the standard oceanography toolboxes available in Matlab have limited availability in Python, most of my Python study time has been spent “translating” toolboxes to Python.
In the past 2 years, I’ve managed to fully switch from Matlab to Python, making use of NumPy/SciPy/Matplotlib and other modules. Most of my computations and plotting are now 100% in Python, with rare exceptions for Fortran wrapped code. I have experience reading and writing a significant number of data formats (NetCDF 3/4, HDF 4/5, custom Fortran binaries etc).
I have some Fortran skills as well, mostly modifying/adapting code from numerical ocean models to our studies. My favorite Fortran compiler is gfortran; due to its openness, it is readily available on a great number of platforms and is easy to use. However, we do use ifort heavily in day-to-day operations in our lab for optimizations.
I’m converting the OASP package to Python making use of NumPy/SciPy functions. My focus is to modernize the code and create a course framework with tutorials and examples integrated into the code documentation. The final objective is to produce an entry point for students in scientific computing with Python.
Open source experience
I’m a big fan of the open source world, and knee-deep in it. Most of my projects are licensed under OSI approved licenses and hosted at Google Code and/or PyPI.
Furthermore, I’m a Linux packager for the OpenSUSE distribution.
The packaging experience brought me not only compiling experience, but experience in software testing, quality control, and patch work-flow to upstream as well.
The objective is to host a one-stop repository for oceanographers to obtain scientific software.
Python is nowadays my main language. Here’s a list of my current projects:
- Python translation of http://www.teos-10.org/software.htm:
- Air-sea fluxes computations:
- Miscellaneous scripts: