2012 Student: Jeremy Wang

Jeremy Wang is a student mentored by the Foundation in the Google Summer of Code 2012. This is his project proposal. There are also blog posts reporting his project progress.

Short description: I propose a web-based tool supporting dynamic visualization of spatially and temporally variable surface temperature data from ccc-gistemp. The tool will support dynamic client-side interaction including navigation of climate data and visualization at variable resolutions and throughout history.

I am proficient in MATLAB and Mathematica. I greatly prefer Python and regularly use NumPy and SciPy through it. I have extensive experience in data mining and numerical processing – most of it in relation to computational genetics and its massive reams of sequence and genotype data.

I am both a producer and consumer of open-source software. I am most familiar with and favor Linux and its suite of open-source applications.

I much prefer Python over alternatives such as R, and I am very good with it. My speciality is HTML(5), JavaScript, and the host of technologies and languages used in web development and web-based applications.

I have published extensively in graduate school. Please see my CV for a full list. The publication most pertinent to this project is “Dynamic Visualization and Comparative Analysis of Multiple Collinear Genomic Data.”

I have not had any formal education in climate science outside what one learns in introductory college-level science courses. I’ve developed a personal interest based on what I’ve read and heard. I’ve performed some rudimentary data mining and analyses based on publicly available data to assess climate trends in the past century and a half. I’m very much the kind of person who wants to prove something for themselves rather than taking statements at face value. This led me to want to explore climate change for myself given the, perhaps unfounded, controversy.

I intend to pursue the Google Summer of Code as a sort of independent summer internship, in no uncertain terms. Climate change is a field I am interested in and one where I think I can apply my skills, and of course I would love to be paid so that I can afford the time to do it. I am really excited about the proposed project and I want to make sure it’s done and done right. I enjoy making dynamic web-based tools like this and I think it will prove to be a great way to expose these kinds of climate data and problems to lay people and climate scientists alike.

This summer, aside from GSoC, if I am so lucky, I have planned vacations with my family and my fiance’s family amounting to 2 weeks. I also plan to continue progress toward my degree including a part-time research assistantship. I will not be taking any courses and my primary focus for the summer will be GSoC and this project. I have completed similar projects (see my CV) and I am confident this project can be completed in excellent fashion in the given time.

Project

I propose a web-based tool supporting dynamic visualization of spatially and temporally variable surface temperature data. Time permitting, the tool may be made extensible to generic geographical (GIS) data.

The project consists of, at its heart, a website with supporting server-side data mining but minimal analysis or computation. The user interface will be presented as a web page utilizing web technologies including HTML5, DOM, JavaScript, and CSS to allow dynamic client-side manipulation of the data presented, including navigation through the data both spatially and through time. The fundamental interface will be a map view à la Google Maps, supporting zooming in and out, and panning in two dimensions. Additional information and annotation will be available by selecting a feature on the map (e.g., grid or station information). Variation through time will be adjustable and animated using a slider or some other client-side control.
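Zooming out on the map view described above implies showing the same gridded data at a coarser resolution. The following is a minimal sketch of one way the server might do that, assuming the data has already been parsed into a rectangular grid of values with `None` marking missing cells. The function name `coarsen` and the grid layout are hypothetical, and the plain average used here ignores the latitude-dependent area of grid cells, which a real implementation would likely need to weight for.

```python
from typing import List, Optional

# Hypothetical layout: grid[row][col], where None marks a cell with no data.
Grid = List[List[Optional[float]]]


def coarsen(grid: Grid, factor: int) -> Grid:
    """Average the non-missing cells in each factor x factor block."""
    if factor < 1:
        raise ValueError("factor must be >= 1")
    rows = len(grid)
    cols = len(grid[0]) if rows else 0
    out: Grid = []
    for r0 in range(0, rows, factor):
        out_row: List[Optional[float]] = []
        for c0 in range(0, cols, factor):
            # Collect the valid values in this block, clipping at the grid edge.
            vals = [
                grid[r][c]
                for r in range(r0, min(r0 + factor, rows))
                for c in range(c0, min(c0 + factor, cols))
                if grid[r][c] is not None
            ]
            # A block with no valid cells stays missing in the coarse grid.
            out_row.append(sum(vals) / len(vals) if vals else None)
        out.append(out_row)
    return out
```

Each zoom level could then request a different `factor`, so that the client only ever receives roughly as many cells as it can usefully draw.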

Many web-based visualizations and tools exist pertaining to climate and/or geographical data, most of which offer a map-based UI conceptually similar to what I propose. However, each of these has limited functionality, operates on a fixed data set, and often performs poorly. There is no generic online GIS viewer, much less one which is open source and can be easily extended to the format specific to GISTEMP. I plan to create (1) a tool exposing GISTEMP data, (2) a more dynamic and necessarily higher-performance and more responsive tool to allow fluid user interaction, (3) a greater range and flexibility of data and visualization than is possible with what is essentially a gridded image (the technique used by Google Maps and many existing viewers), and (4) if time permits, an open-source tool for visualization of generic GIS data.

Timeline

April 23rd – May 21st

Exploration and parsing of GISTEMP data and/or any data that will be exposed by the tool.

May 21st – June 11th (1st quarter)

Setup of server architecture and data, server-side data mining, filtering, and exposure by JSON/AJAX

June 11th – July 9th (2nd quarter)

Client-side map-based representation including panning and zooming, JSON/AJAX interface with server

July 9th – Midterm evaluation

At this stage, we should at least meet the baseline of existing, static, map-based visualization tools

July 9th – July 30th (3rd quarter)

Advanced client-side manipulation including off-map annotation, “side” widgets, and navigation through time

July 30th – August 13th (4th quarter)

Polishing UI, user manual (if I’ve done my job, it won’t be long), documentation, and cross-browser testing

August 13th – August 20th

Final testing and documentation

August 20th – Final evaluation

Deliverables

Server-side (LAMP)

Data mining, filtering, and exposure via a JSON/AJAX interface to the web-based client.
While the server architecture and setup will be largely industry-standard, the primary server-side deliverable will be the Python code used to serve client requests (JSON/AJAX), mine the requisite data, and return a response.
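As a rough illustration of the request/response cycle described above, the sketch below filters grid cells by the client's current map view and time step and returns them as JSON. Everything specific in it is an assumption for illustration: the parameter names (`month`, `south`, `north`, `west`, `east`), the record fields (`lat`, `lon`, `anomaly`), and the in-memory `store` standing in for whatever the parsed GISTEMP data ends up looking like. It also does not handle views that cross the antimeridian.

```python
import json
from typing import Dict, List

# Hypothetical record: {"lat": ..., "lon": ..., "anomaly": ...}
Cell = Dict[str, float]


def cells_in_view(cells: List[Cell], south: float, north: float,
                  west: float, east: float) -> List[Cell]:
    """Keep only the cells whose coordinates fall inside the bounding box."""
    return [
        c for c in cells
        if south <= c["lat"] <= north and west <= c["lon"] <= east
    ]


def handle_request(params: Dict[str, str], store: Dict[str, List[Cell]]) -> str:
    """Turn query parameters into a JSON response body.

    `store` maps a time step label (assumed here to be "YYYY-MM") to its cells.
    """
    try:
        month = params["month"]
        south, north = float(params["south"]), float(params["north"])
        west, east = float(params["west"]), float(params["east"])
    except (KeyError, ValueError) as exc:
        # Missing or non-numeric parameters produce an error payload.
        return json.dumps({"error": f"bad request: {exc}"})
    cells = cells_in_view(store.get(month, []), south, north, west, east)
    return json.dumps({"month": month, "cells": cells})
```

Keeping the handler a plain function of its parameters and data would let it sit behind whichever web framework or CGI setup the server ends up using.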

Client-side (HTML)

JSON/AJAX client

Visualization widgets, primarily dynamic map-based display as described above

All client-side functionality will be encapsulated in a set of HTML, CSS, and JavaScript (JS) files serving, in the classical sense, as the model, view, and controller, respectively.

Data mining and filtering components of the server will interface with existing data files/formats and code generating annotation or processed/post-analysis data. Client software consisting of the website/visualization tool(s) will be largely independent and may stand alone. It may also be integrated into the existing site at climatecode.org or any other location as a stand-alone widget.

The tool/website will promote the goals of the Climate Code Foundation by providing an intuitive and informative interface exposing the GISTEMP data at a level comprehensible and usable for anyone from lay persons with a vague interest in climate change to climate scientists.

This project requires no travel.

The most pressing concern is simple exposure and availability of the data. I expect access to and parsing of the existing data to be the biggest hurdle to the success of this project. Using the month before May 21st, the official start of the project, to familiarize myself with the data and the difficulties of accessing and parsing it should ensure that this won’t be a problem and that we avoid issues popping up later on.

The ideal mentor will be a scientist familiar with the GISTEMP data, both its technical format and its interpretation from a climate researcher’s perspective, who also knows which data sets and potential visualizations would be most useful.

Aside from regular mentoring, this project will require the GISTEMP data we plan to include in the tool and a server, physical or virtual, on which to develop the server-side components. I am most familiar with a LAMP setup – Linux, Apache, MySQL, and Python (not PHP) – and this is widely available and understood. Of course the GISTEMP data is not a simple MySQL database, so the setup will depend entirely on the existing data formats. If the Foundation does not have these resources, a roughly LAMP-like architecture is available via Google App Engine or Amazon Web Services (EC2). I can also certainly negotiate a development environment, if not a production server, from the university.