The Climate Code Foundation is a non-profit organisation to promote the public understanding of climate science.

Google Summer of Code 2012 update

The Climate Code Foundation is taking part in the Google Summer of Code (GSoC) again this year. GSoC is a program in which Google sponsors students who contribute to open-source software projects. Each student works full-time for three months (late May to late August), mentored by experts from an organization such as the Foundation, and earns $5000.

The Foundation took part last year, and mentored three student projects. All three projects were successful, and all three students presented work at the AMS annual meeting in New Orleans in January.

This year we have two students, György Kovács and Jeremy Wang. György, in Debrecen, Hungary, is working to produce a version of the Berkeley Earth Surface Temperature analysis code which can be run without any closed-source components (the current BEST analysis requires Matlab). Jeremy, in North Carolina, USA, is building a web visualisation tool for climate datasets, with the primary goal of a clear and integrated visualisation of the GISTEMP analysis. Both are making excellent progress in their projects, and both have written blog posts with more information, which I will post shortly.

We’re lucky to have a pair of such talented scientists and programmers working for us, and we’re grateful to Google for their support.


Climate Informatics workshop

Apologies for the enormous delay in writing this post. There is a massive backlog, and I intend to write one post every day until it's cleared.

On 2011-08-26 I attended the First International Workshop on Climate Informatics, in New York City. It was a very interesting event which gave me some new insights into computer use in climate science. This blog post is my trip report.


Student Internships

Are you a student? And a programmer? Do you want to work for three months this summer, paid by Google, mentored by experts, writing code to help climate science?

The Climate Code Foundation has been selected as a mentoring organization for the Google Summer of Code 2012. We took part last year, and mentored three student projects. All three projects were successful, and all three students presented work at the AMS annual meeting in New Orleans in January. The experience was very positive, and we are looking forward to repeating it.

We have many exciting ideas for student projects, several of which involve working directly with climate scientists in the US or the UK. If you have experience with Android or iOS, you could make an app. If you have worked with Python and Matplotlib, you could build visualization tools. If you have skills with web frameworks, or RDF, or data formats, or porting old applications, then we have something for you.

Visit our organization page, read our ideas page, join our mailing list, and take part.


Presentations

Here is a table of recent presentations I have given about the Climate Code Foundation:

Date       | Location                            | Occasion                                                                             | Slides
2011-08-29 | Google offices, New York City       | Invited talk                                                                         | PPT, Google Docs
2011-10-24 | Googleplex, Mountain View           | Invited talk                                                                         | PPT, Google Docs
2011-11-22 | WMO offices, Geneva                 | Short talk at ICES Foundation meeting                                                | PPT, Google Docs
2011-11-25 | NCAS, Reading, UK                   | Seminar                                                                              | PPT, Google Docs
2012-01-18 | NCDC, Asheville, NC                 | Seminar                                                                              | PPT, Google Docs
2012-01-24 | AMS Annual Meeting, New Orleans, LA | Guest Speaker at Second Symposium on Advances in Modeling and Analysis Using Python  | PPT, Google Docs
2012-02-28 | UKMO, Exeter, UK                    | Seminar                                                                              | PPT, Google Docs

Also available are the Google Docs and PPT slides for the talk I originally prepared to give at the ICES meeting in Geneva on 2011-11-22. I discarded those slides and rewrote the talk after reading Michael Nielsen's book "Reinventing Discovery" on my way to the meeting.

These presentations by the Climate Code Foundation are licensed under a Creative Commons Attribution 3.0 Unported License.


Google Summer of Code 2012

At the Climate Code Foundation we are hoping to take part in this year’s Google Summer of Code (GSoC). GSoC is a program in which Google sponsors students who contribute to open-source software projects. Each student works full-time for three months (late May to late August), mentored by experts from an organization such as the Foundation, and earns $5000.

If you are a student with programming experience and an interest in climate science, why not work with us over the summer to improve the public understanding of climate science?

The Foundation took part last year, and mentored three student projects. All three projects were successful, and all three students presented work at the AMS annual meeting in New Orleans in January. The experience was very positive, and we are looking forward to repeating it.

We have an ideas page, listing several dozen ideas which could be developed into project proposals. If you have ideas of your own, we’d like to hear about those too.

If you are interested in participating as a student, then please get in touch.


Activity and Status

It’s been some months since we updated this blog. My apologies for that. Here’s a quick summary of our recent activities. I hope to make a quick series of blog posts over the next week or two describing some of these in more detail, and linking to presentations and related materials:

  • 2011-08-26: Invited panelist at the First International Workshop on Climate Informatics, at the New York Academy of Sciences in New York City. This fascinating workshop, organized by Gavin Schmidt (of Columbia and NASA GISS) and Claire Monteleoni, brought together researchers in climate science and in informatics to find common ground.
  • 2011-08-29: Invited talk at Google’s New York City offices, describing the Foundation, our Clear Climate Code project, our Summer of Code successes, and the then-draft Science Code Manifesto.
  • 2011-09-01: Attended a meeting of the ‘Science as a Public Enterprise’ policy study, at the Royal Society in London. Michael Nielsen spoke about the Polymath project, and the open science revolution. He took the time to talk with me later, and offered his support for the Science Code Manifesto.
  • 2011-09-02: Invited panel member at Science Online London, providing an outsider’s perspective in a discussion of the research funding system.
  • 2011-09-22: Visited the Macmillan offices in London, to meet Olive Heffernan (editor of Nature Climate Change) and Mark Hahnel (the man behind FigShare).
  • 2011-10-10/11: Attended the two-day meeting at the Royal Society on “Warm Climates of the Past”. A lot of fascinating science attempting to draw lessons for the Anthropocene from some particular paleoclimate episodes (the LIG, the PETM, and so on).
  • 2011-10-13: Launched the Science Code Manifesto, to a great response from a wide range of scientists.
  • 2011-10-21/23: Invited to the Google Summer of Code Mentor Summit, in Mountain View, CA. A terrific gathering of people from across the open-source world. We had a couple of positive sessions on Open Science: open data, open source code, open access publications.
  • 2011-10-22/23: I couldn’t get to the Open Science Summit (also in Mountain View, CA), because it clashed with the Google event. I did show my face at the Saturday evening social, and met a number of Open Science movers and shakers in person for the first time (apparently I just missed Victoria Stodden).
  • 2011-10-24: Talk at the GooglePlex (invited by Peter Norvig). Covered the Science Code Manifesto especially.
  • 2011-10-24: Attended a seminar at Stanford by Robert Reicher, on sustainable energy futures. Met John Mashey in the flesh (20+ years after first encountering him online).
  • 2011-10-25: Invited visit to the Berkeley Earth Surface Temperatures team at the LBL. Met Richard Muller and the rest of the team. Sat in on their weekly team meeting (honoured to sit opposite Saul Perlmutter and Arthur Rosenfeld).
  • 2011-11: Our paper on the ccc-gistemp project was published in IEEE Software. This is the Foundation’s first peer-reviewed publication. Sadly my copy of this particular issue has been lost in the post; I must chase it up with the IEEE.
  • 2011-11-10: The GHCN 3.1.0 dataset was released by NCDC. This incorporates fixes to their homogenization code prompted by Dan Rothenberg’s Summer of Code work.
  • 2011-11-22: Invited talk to an ICES Foundation meeting at the WMO in Geneva. They’re a group in the very early stages of an exciting project. I had finally received my copy of Michael Nielsen’s book “Reinventing Discovery” just before this trip. I read it on my outward journey, and as a result completely rewrote my talk. Read this book now.
  • 2011-11-25: Invited seminar at NCAS in Reading. Covered the usual ground: the CCF, ccc-gistemp, Summer of Code, the Science Code Manifesto. Very lively Q&A session afterwards, which continued through coffee into lunch.
  • 2011-12-15: Attended AAAS-sponsored seminar by David MacKay at Imperial College, London. MacKay is the chief scientific adviser to the UK Department of Energy and Climate Change (DECC), and talked about their "2050 Pathways" web tool, which has grown out of his great book on sustainable energy. He is far too busy, but I managed to buttonhole him briefly to discuss a possible project to build a Pathways app for smartphones.
  • 2012-01-18: Invited talk at NCDC in Asheville, NC, hosted by Peter Thorne. Very interesting meetings with Tom Peterson and Scott Hausman, then the Ingest and Analysis group, and the Climate Data Records team.
  • 2012-01-22/26: Invited speaker and panellist at the AMS annual meeting in New Orleans. Johnny Lin (of PyAOS) ran the “Second Symposium on Advances in Modeling and Analysis Using Python” and kindly invited me to speak. An excellent meeting, full of new contacts and ideas. One particular highlight for me was that all three of our Summer of Code students gave presentations. I was also very glad to meet Travis Oliphant, the creator of NumPy – I hope to be able to work with him, and his new Continuum Analytics company and NumFocus foundation, in future.

In the next month I am giving a seminar at the Met Office and attending a round-table meeting of the Royal Society policy study. We’re expecting the Google Summer of Code 2012 to be announced shortly, and are hoping to take part.

It’s possible I’ve missed a few items. I haven’t mentioned any of the amazing related work going on, especially in the open science field. I’ll try to summarize that in another blog post.

As you can see, the blog silence has not been a sign of inactivity – rather the reverse. In fact, I’m actually writing this blog post in a stolen moment between sessions at the AMS meeting. Some other aspects of the Foundation’s work have also been neglected (for instance, we failed to schedule a meeting of our advisory committee). But the Foundation is in rude health.

In other news, one of our founders, David Jones, is now working for ScraperWiki, a truly excellent open-source/open-data website. He continues as a director of the Foundation, and is writing a paper on some of our work.

The Foundation is still unfunded and all our work continues to be unpaid (and most "invited talks" do not include travel or accommodation expenses). We are meeting most of our expenses from a small fee received for contract programming in March 2011.


Science Code Manifesto

I am pleased to announce the launch of the Science Code Manifesto, laying out general principles of publication for science software. Please read the manifesto, endorse it if you agree, then come back here to discuss it.

This issue isn’t specific to climate science. I originally created this as a response and contribution to the Royal Society’s policy study on “Science as a Public Enterprise”. It is partly inspired by the Panton Principles, a bold statement of ideals in scientific data sharing. It refines the ideas I laid out in an opinion piece for Nature in 2010.

However, I did not originate these ideas. They are simply extensions of the core principle of science: publication. Publication is what distinguishes science from alchemy, and is what has propelled science – and human society – so far and so fast in the last 300 years. The Manifesto is the natural application of this principle to the relatively new, and increasingly important, area of science software.

Go on, endorse it now.


Royal Society submission

The Royal Society is conducting a policy study entitled "Science as a Public Enterprise", and called for submissions from the public. The Climate Code Foundation made the following submission in August (I have just realised that I never posted it to the blog):

1. What ethical and legal principles should govern access to research results and data? How can ethics and law assist in simultaneously protecting and promoting both public and private interests?

Two prefatory remarks which apply to all our answers: First, the Climate Code Foundation is a non-profit organisation to promote the public understanding of climate science. Thus, it is focused specifically on climate science. Although its arguments may well apply to other fields, it takes no formal view on those fields.

Secondly, in this and the following answers, we take the use of ‘data’ in the questions to mean both scientific measurements (and their accompanying metadata such as instrument type and time of observation) and code (that is, computer programs written by scientists to process their raw data into results).

In specific response to question 1: since climate science results are of critical public importance, the ethical principle of least harm is relevant to these datasets, and dictates that they should be made available to the general public as open data (http://opendefinition.org/). That is, at no cost, and under no restrictions save, at most, attribution and share-alike requirements.

The software source code, which creates, defines, and interprets the datasets, should be available as open source software (http://opensource.org/) for the same reason: so that any interested party may inspect, copy, reason about, criticise, improve, or run it, without restrictions.

Regarding legal principles and law, the Climate Code Foundation does not have a view. We have observed many legal actions against climate scientists, the use of legal instruments in place of polite enquiry, and legal threats in place of debate, and we find the effects damaging to discourse and chilling to research. The criminal and civil law should of course be available as a last resort, but that is not how it has been used.

2 a) How should principles apply to publicly-funded research conducted in the public interest?

In this case, and again restricting my remarks to climate science code and data, the public-pays principle confirms the conclusion that publicly-funded research results should be open source and open data.

2 b) How should principles apply to privately-funded research involving data collected about or from individuals and/or organisations (e.g. clinical trials)?

This is not relevant to climate science and so the Climate Code Foundation has no view.

2 c) How should principles apply to research that is entirely privately-funded but with possible public implications?

The possible threat of climate change to our collective well-being is so severe that privately-funded research data and code, in this field, should be open. However, I see no way to enforce this.

2 d) How should principles apply to research or communication of data that involves the promotion of the public interest but which might have implications for the privacy interests of citizens?

This is not relevant to climate science and so the Climate Code Foundation has no view.

3. What activities are currently under way that could improve the sharing and communication of scientific information?

There are many, and I’m sure other responses will give a much broader view. The top trends I would mention are as follows:

  • open access publication;
  • open data repositories;
  • open bibliographic data;
  • linked open data;
  • electronic lab notebooks, and open notebook science;
  • open source software;
  • blog science and tweet science;
  • citizen science and ‘crowd-sourced’ science.

In the Climate Code Foundation we ourselves are working with several climate science groups to create shared science software resources, and to improve access to and understanding of existing climate data resources.

4. How do/should new media, including the blogosphere, change how scientists conduct and communicate their research?

The Climate Code Foundation takes no view on the use of new media in conducting research, except to welcome the possibility of increased public engagement through online citizen science projects such as Old Weather. Much science can and must continue to be done in a traditional way (e.g. a month digging up tree-stumps from permafrost, followed by several months of long lab hours processing them).

However, new media (email, the web, the blogosphere) do provide new opportunities for communicating research, both to other researchers and to the public. Many of these changes have already taken place over the last few decades: indeed, the early users of the internet were almost all researchers, and the web was invented precisely to allow better communication between scientists.

First, the usual medium of communication between scientists is now email. It may seem trivial to mention this; I mention it to illustrate a potential problem with the question. Email is a new medium, but it has become so completely integrated into our lives that it is hard to imagine science without it.

Secondly, the internet allows researchers to communicate results fully to each other: datasets, source code, and all. Science communication is no longer constrained to the narrow bottleneck of the paper publication. Modern science is entirely dependent on such uses of the internet.

Thirdly, the web provides an unfiltered channel for scientists to communicate their results directly to interested members of the public. This might be by providing raw data, commentary, and tools for data analysis, and by linking to other information. Many sites are tremendous resources for learning, exploration, and education. In climate science, we encourage and assist with this process.

Certain web-based tools should also be included in "new media", and these can provide direct benefits to scientists who are able to use them. If access to your source code does not require a signed faxed agreement, then the department secretary can be taken out of the loop and the source code can be placed on one of the many freely available project management tools (GitHub, Google Code, KnowledgeForge, etc.). Individual researchers can even use the same tools to manage their code on a day-to-day basis, in the spirit of Open Notebook research; this may reduce the burden on departmental IT staff. BitTorrent, and other similar tools, can be used to share data in a peer-to-peer fashion, reducing the burden on departmental datacentres, but again only if the grip on data is released slightly and it is made more openly available. We doubt that benefits are restricted to these particular examples.

Whether scientists engage in the discussions which take place in ‘the blogosphere’ is a matter of choice; scientists should feel no obligation to take part. In climate science, at least, it is not generally a polite environment, and large parts of it are completely hostile to almost any informed point of view. It can be a boundless and toxic time-sink.

5. What additional challenges are there in making data usable by scientists in the same field, scientists in other fields, “citizen scientists” and the general public?

There are a few. Data is not always accompanied by metadata, by a description of its format, its provenance, or its scope, or by information about limitations and errata. This information may be understood by the originating scientists, but needs to be added to the data to make it usable by others. However, this process is not compulsory, and should be encouraged simply by the rewards which automatically flow: a scientist whose data is more usable by others will receive more citations and other professional recognition.

That recognition does assume that mechanisms are in place to identify the effort: the researcher who does this work should of course be credited with it.

6 a) What might be the benefits of more widespread sharing of data for the productivity and efficiency of scientific research?

Simply put, more sharing of data will allow scientists to build more rapidly and reliably on each others’ results, causing science to progress more quickly in the discovery of truth and the construction of scientific knowledge. It will reduce duplication of effort, it will result in more rapid discovery and correction of errors, and it will increase the speed with which new ideas and approaches are developed.

More sharing of code in particular will have the same effects, and will also allow the development of large, reliable, shared libraries and bodies of code, increasing the productivity and reliability of science across entire fields.

Sharing of data and code should help to ‘level the playing-field’ for poorly-resourced scientists, and may also lead to higher levels of collaboration and to greater community-building.

6 b) What might be the benefits of more widespread sharing of data for new sorts of science?

Open data and code allow relatively novel types of science: crowd-sourced citizen science, such as Zooniverse, and especially research using automated data-mining and scraping systems, such as the various projects under Peter Murray-Rust's "Blue Obelisk" banner. Without open data and code, these projects would be impossible. Without significant improvements in policies and statements of openness, they will be unable to progress.

6 c) What might be the benefits of more widespread sharing of data for public policy?

In climate science, very great indeed, and this is the raison d'être of the Climate Code Foundation: by encouraging and enabling climate scientists to share and communicate their results more clearly and effectively, we work to improve public understanding of the science, and thus to allow the formation of public policy in an informed context.

In climate science, the unavailability of some code and data, and the impression that other code or data is either simply unavailable or more mysteriously 'missing', has been immensely damaging to public perceptions for several years, especially since the theft of emails from the University of East Anglia in 2009. Those emails were grossly misrepresented as revealing a culture of secrecy, data abuse, and incompetence.

In fact, practices in climate science are broadly in line with those in many other sciences – although there is a certain level of openness, some data and much source code are not freely available to the public. They may be available on request, but at the discretion of the researcher, and often not to a member of the public. This inconsistent environment allows noxious allegations to flourish, and to be exploited in public policy debates. Many times in the last decade, policy makers have repeated these allegations in debate, to cast doubt on the science, and to avoid or delay important climate-related policies. The negative effect of this overall atmosphere on policy related to climate change is hard to overstate.

Greatly increased openness in climate science would not draw all the poison from this discourse: positions are far too entrenched for that. Allegations of secrecy and corruption will continue. But it would give climate scientists an unambiguous, consistent, and verifiable answer to all such questions. Here is the data. Here is the code. Here are the results.

6 d) What might be the benefits of more widespread sharing of data for other social benefits?

It is hard to estimate the general social benefits of a better-informed public.

6 e) What might be the benefits of more widespread sharing of data for innovation and economic growth?

The Climate Code Foundation has no view on this question as it is posed.

6 f) What might be the benefits of more widespread sharing of data for public trust in the processes of science?

See my answer to (6c).

7. How should concerns about privacy, security and intellectual property be balanced against the proposed benefits of openness?

There is no conflict.

Scientists have a right to privacy, and openness does not conflict with that. We only advocate the sharing of research products: data, code, results, and publications. None of those contain any private information.

“Security” is a catch-all word, but again, there is no conflict.

“Intellectual property” is a much-abused term. I will take its use here to mean copyrights. Copyright is not threatened by openness. Indeed, many frameworks for openness (such as the Creative Commons) depend on copyright to enforce sharing conditions such as attribution. “Intellectual property” is often used as a post-hoc argument to justify delaying or avoiding openness.

There is no copyright on data (although there may be database rights within the European Union), and datasets have been protected in the past by secrecy and embargoes. Such embargoes are becoming a thing of the past, as publication policies change, and scientists are realising that they receive more credit for becoming the originators and curators of a widely-used, widely-published, foundational dataset than they could garner by eking out a few more papers on their own.

Software may be protected by copyright, but the protected work is rarely of any value which could be realised. Either the research is described by a publication which gives full details of the algorithm, or it is not. In the latter case, we would argue that the publication is seriously flawed – the science depends on methods which are not published – and is not truly part of the collective enterprise of science. In the former case, because algorithms per se cannot be protected by copyright, there is no “intellectual property” left to protect.

In a very few cases, research bodies may have come to rely on revenue generated by licensing software or data protected by copyright or database rights. In the specific case of climate science, we hold that the public interest argument for openness is so strong that these bodies must restructure their business models, and funding agencies may have to allow for this.

In short, the “intellectual property” of science is a commons, to which all researchers contribute, and from which all of society benefits.

8. What should be expected and/or required of scientists (in companies, universities or elsewhere), research funders, regulators, scientific publishers, research institutions, international organisations and other bodies?

Public funders of research in climate science should require all research products to be made available to the public: publications should be open-access, data should be open data, software should be open-source.

The responsibilities of all other actors in climate science (scientists, institutions, publishers, etc) will follow naturally from this requirement.

Many stakeholders and powers-that-be have made positive statements or policies about openness, but they are often hedged around with phrases such as “where possible”, “when available”, or “subject to commercial constraints”. No such hedges are viable in climate science. The public policy issues are so serious, and the possible consequences of inaction so grave, that there must be no exceptions.

Other comments

The Climate Code Foundation is launching a “Science Code Manifesto”, on the specific subject of science software availability, and its consequences for science stakeholders (strongly related to your question 8). Please see:

http://code.google.com/p/climatecode/wiki/ScienceCodeManifesto.


Homogenization report

This guest post is written by Daniel Rothenberg, who worked all summer on homogenization code, thanks to the excellent Google Summer of Code. This is his third post, here are the first and second.

As you may recall, I spent the past summer working on behalf of the Climate Code Foundation to port and revise the Pairwise Homogenization software used by the National Climatic Data Center to produce the US Historical Climate Network dataset. Since my last update in the middle of July, I have successfully worked through my first pass at the remaining sections of the algorithm, and have arrived at a major milestone – a Python program which can take an arbitrary network of stations from the USHCN raw data and homogenize it based on pairwise comparisons.


Figure 1 illustrates the homogenization results for two stations which were passed into the algorithm along with a random selection of 50 other stations from across the USHCN. This test illustrates that the new code does some things very well, but still has some work to be done. For starters, when investigating the diagnostic output log from running the code on this test case, it is clear that the code nearly exactly reproduces its Fortran parent's results up through the final "CONFIRMFILT" stage of analysis. At this stage, the code attempts to condense a large number of suspected breakpoints into a best fit over the data. There are still some discrepancies between my code and the original, which tend to suppress the final number of detected changepoints. A perfect example of this is in the NEW ULM plot in Figure 1; the Python code misses the first detected changepoint around the year 2000, while it successfully finds others that the Fortran code spots. By contrast, the Python code sometimes fails to remove extra changepoints – particularly around swaths of 'deleted' data (data which cannot be analyzed in this algorithm, usually because there aren't enough paired neighbors to provide supporting information); this is illustrated well by the COLFAX plot in Figure 1, around 1910.
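
To make the pairwise idea concrete, here is a toy sketch – my illustration only, far simpler than the real SNHT-based tests and the CONFIRMFILT consolidation described above – of how a step change in one station can be located against a single neighbour: subtract the two series, so that the shared climate signal cancels, and find the split point that best explains a shift in the mean of the difference series.

    import numpy as np

    def find_changepoint(target, neighbour):
        # Toy pairwise changepoint finder. A station move or instrument
        # change appears as a step in the difference series
        # target - neighbour, because the shared climate signal cancels
        # while the inhomogeneity does not.
        diff = np.asarray(target, dtype=float) - np.asarray(neighbour, dtype=float)
        n = len(diff)
        best_t, best_score = None, 0.0
        for t in range(2, n - 2):
            left, right = diff[:t], diff[t:]
            # Two-sample t-like statistic for a mean shift at time t.
            spread = np.sqrt(left.var(ddof=1) / len(left) +
                             right.var(ddof=1) / len(right))
            score = abs(left.mean() - right.mean()) / spread
            if score > best_score:
                best_t, best_score = t, score
        return best_t, best_score

    # Synthetic example: shared signal, plus a +1.0 step at index 60
    # in the target station only.
    rng = np.random.default_rng(0)
    signal = rng.normal(size=120)
    neighbour = signal + rng.normal(scale=0.3, size=120)
    target = signal + rng.normal(scale=0.3, size=120)
    target[60:] += 1.0
    print(find_changepoint(target, neighbour))  # changepoint near index 60

The real algorithm does this over many station pairs at once, so that a breakpoint can be attributed to the one station common to all the pairs in which it appears.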

Although this is the major glitch in the code at this point, there are some other issues which need to be ironed out. First, there are some numerical issues associated with calculating the standardized adjustments to apply at each changepoint. From my experience with other parts of the code, this is likely a sign error in the statistical test which calculates the final adjustment at each changepoint, so it should be simple to find and fix in the future. Second, the algorithm needs to be adjusted to accept external sources of documented changepoints – this will greatly improve its ability to find the “best” changepoints in the cloud of suspect ones it finds through the first half of the algorithm. Finally, I am still working on re-engineering the code in its existing form to work more in the fashion of an API so that it can be more easily used on various datasets in the future.
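
Continuing the toy sketch above (again my illustration, not the NCDC code): once a changepoint and its step size have been estimated from the difference series, the adjustment is applied by shifting the segment before the break, so that the series becomes consistent with its most recent segment.

    def apply_adjustment(series, changepoint, step):
        # Shift the pre-break segment by the estimated step so that the
        # series matches its most recent segment (toy illustration).
        adjusted = np.array(series, dtype=float)
        adjusted[:changepoint] += step
        return adjusted

    cp, _ = find_changepoint(target, neighbour)
    diff = target - neighbour
    step = diff[cp:].mean() - diff[:cp].mean()   # estimated size of the break
    homogenized = apply_adjustment(target, cp, step)

Getting the sign of that step right is exactly the kind of detail that is easy to flip, which is why a sign error is the leading suspect for the adjustment discrepancies.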

This project wouldn’t have been possible without the support and adivce of David Jones and Nick Barnes of the Climate Code Foundation, as well as with the help and advice of Claude Williams and Matt Menne at the National Climatic Data Center. Hannah and Filipe – my GSoC compatriots – also provided great feedback and help throughout our code reviews and meetings. I’d like to thank them for all their time and effort over the summer!

Finally, I'm excited to continue working on this project – especially over the next few months, leading up to the 2012 Annual Meeting of the American Meteorological Society, where I hope to present a talk entitled "Lessons From Deploying the USHCN Pairwise Homogenization Algorithm in Python" as part of the 2nd Symposium on Advances in Modeling and Analysis Using Python. There is much work to continue in the future:

  • Refinement of the Python homogenization code, including addressing known bugs in the CONFIRMFILT process.
  • Further collaboration with Menne/Williams to improve the code and to understand thoroughly how it differs from the Fortran homogenization code.
  • A possible project with David Jones, looking at applying this algorithm to data from the Canadian climate record.

If you’re interested in helping continue this project, please contact me or the Climate Code Foundation – we’d love to have you on board!


Common Climate Project demonstration

This guest post is written by Hannah Aizenman, who worked all summer on the Common Climate Project, thanks to the excellent Google Summer of Code. The summer is now past, and Hannah has built a useful demonstrator website. This is her third post, here are the first and second.

Since the last blog post, the project has gained a (very) barebones web interface, so you can test out the functionality here. So please go and play with the project, and if you end up thinking "hmm, this could work for a dataset I have", grab the code. And if instead you think "hmm, this could work, but …", file a ticket, contribute a patch, or email the mailing list and see if it can get sorted out. The project has been tested with GISTEMP and CCSM-C, and should work out of the box on any NetCDF dataset using float coordinates and Unidata-style time (assuming an "x since y" unit is given for time). I'm currently sorting out how to support non-gridded data, with a focus on the Mann temperature reconstructions.
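
For anyone unfamiliar with that convention: NetCDF files usually store time as a number counted from a reference date, with the decoding rule carried in the unit string. A minimal sketch of reading such a file with the netCDF4 library (the file and variable names here are hypothetical):

    from netCDF4 import Dataset, num2date

    ds = Dataset("gistemp.nc")                # hypothetical file name
    tvar = ds.variables["time"]               # e.g. units = "days since 1800-01-01"
    dates = num2date(tvar[:], units=tvar.units,
                     calendar=getattr(tvar, "calendar", "standard"))
    lats = ds.variables["lat"][:]             # float coordinates
    lons = ds.variables["lon"][:]
    print(dates[0], dates[-1])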

I hope that by giving people a simple little toolset to play with datasets, it’ll simplify dataset exploration so that anyone (even somebody who doesn’t understand climate/geophysical dataset conventions) can just join in and play. All the plotting is still handled server side using ccplib, so if you’ve got local data and don’t really want or need the web aspect, just grab that code, run setup.py, and go. There’s one demo already, and I’d love contributions of more!

I’ve also added support for making time series graphs, and I hope to add support for more visualization tasks as this project grows. I’m also including an example of a spatial graph for anyone who missed the last blog post:

[Figures: a CCP example time series chart, and a CCP example map, both using GISTEMP data]

I had originally intended this project to lay down the framework for building a flexible toolkit for visualizing data, and I hope I've at least accomplished that much. Adding new visualizations and file types mostly boils down to hooking into an existing class (CCPData or Graph respectively), and the client-side HTML, CSS, and JavaScript are kept strictly separate so that the form can easily be styled to fit within a larger website. Adding more content to the web is a matter of adding another JavaScript function, HTML element, and pyramid view. The backend web architecture tries to be RESTful (though it doesn't yet conform to the HTML RFC), so the URLs for the images contain all the user-defined attributes of the graph, and the graphs can be created and manipulated directly from the URL (see the sketch below). I hope to maintain the flexibility of the project so that it can grow into something really useful for all sorts of scientists. I very much hope my project will simplify the current chore of getting data on the web, because I think making the data public-friendly is key to improving public data literacy.
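
To illustrate that URL scheme (a hypothetical URL, showing the idea rather than the site's actual routes), a map of a dataset over a given period might be addressed as:

    /graphs/spatial?dataset=GISTEMP&start=1980-01&end=2010-12&projection=robinson

so that changing a parameter in the URL is enough to produce a different graph, and any graph can be bookmarked or linked to directly.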
