Nature article: “Publish your computer code”

I am proud to announce that this week’s Nature features an opinion piece by me, arguing that all science software, from tiny scripts to huge models, should be published.  There is also a related news article, with quotes from many luminaries (distinguished company for yours truly), about the very widespread use of software in science and some related problems, primarily a lack of training and of openness.

This issue is important for the whole of science, and I was delighted to be approached by Nature to write the article.  However, the word limit was quite strict, and it was not possible to address many questions and concerns.  Over the next few weeks we will be posting a number of blog articles and white papers here on the Foundation website to help fill in these gaps.  As a quick taster, here are a few points:

  • No, publication on its own is not enough.  We want to see open-source publication, so that code can be re-used by other scientists and join the great competitive collaborative enterprise that is peer-reviewed science.
  • Yes, training is important and must be funded.  If I had to pick five top software skills which all scientists should learn, they would be source code management, defect tracking, literate programming, unit testing, and evolutionary development.
  • Yes, open development is important: open source-code management and open defect tracking.  SourceForge and Google Code are good models here; there are also science-specific open workflow tools.
  • Yes, there are specific requirements and a specific urgency in climate science, because of present and future public policies which are developed and decided based on the science results.  Public support for these policies has been substantially eroded, in part due to doubts about climate science software.  That is why the Foundation exists.
  • Yes, I am an outsider, but I have known scientists all my life and have worked with climate scientists for several years on Clear Climate Code.  I do know what I’m talking about.
  • Yes, right now the Foundation is tiny and unfunded.  But we have advisors, goals, and a plan, and we are seeking sponsorships and partnerships.  We won’t be so small for long.
  • Yes, our goals are ambitious, but the longest program starts with a single line of code.

This week I am in Brussels talking to NGOs and European officials, building up the organisational networks which we will need to have any effect on policies of agencies and inter-governmental bodies.  So I haven’t actually been able to pick up a copy of Nature.  I look forward to seeing my name in print when I get home tomorrow.

This entry was posted in News.

9 Responses to Nature article: “Publish your computer code”

  1. Derecho64 says:

    NCAR, for one, has been very open with its CESM code. You even get access to the SVN repo!

    How much more open can you get than that?

    • Nick Barnes says:

      Right. Some science code is published, and there are other moves in this direction, as I say in my article. We are pushing at an opening door. One of our activities will be to make a comprehensive web directory of published climate science code.

      As for “How much more open?”, most of CESM is public-domain (yay!) but parts are under other licenses (several broken links on that page), so in fact I don’t know whether it is truly open-source. Still, it is an excellent example of published source code, and I applaud it.

  2. Rattus Norvegicus says:

    Your op-ed was really good. Having worked on kernel code for 15 years (various flavors of System V, Irix, and then NT), then 2 years on a web content management system and 6 years on a customer relationship management system, I can attest to the dirty little secrets of what commercial software really looks like, and it ain’t pretty.

    Both the Unix and the NT kernels had poor internal documentation, and this was true for the application-side code as well. Luckily for the Unix kernel there was a small cottage industry producing books which explained its internal workings, but mostly I learned them by reading the code and, where possible, stepping through it with a debugger. You don’t even want to know what locore.s (or the locore directory), written in assembler no less, looked like, and this is where I spent most of my time, since my job involved bringing the OS up on new architectures. At this level debugging involved logic analyzers, ICEs, and occasionally an oscilloscope. Real Programming(tm) at its finest. The situation with NT was better because of the HAL layer, but it still involved a bit of Real Programming(tm). In sum, commercial software ain’t much better than the scripts seen in the Climategate data dump, although the commercial software was better structured than those considerably shorter scripts.

    One point you do bring up, source control, is key. If you are going to release your software, you have to use source control. At SGI I was heavily involved in my division’s switch from our internally developed ptools system (based on RCS and somewhat like the original CVS scripts) to Perforce (I was the one who did the evaluations and recommended it as the system of choice). This was back in the day when Perforce was new and there were a lot of questions about the longevity of the company, etc. But they had a great product, our programmers loved it, and it was frikkin’ fast. The adoption in our division was so successful that SGI eventually adopted it company-wide. When I worked at Unisys I drove the adoption of CVS in our division in the late 1980s/early 1990s on the strength of Brian Berliner’s papers at USENIX. Ah, the good old days when companies would actually send programmers to conferences as part of continuing education.

    I don’t think that scientists need to be software engineers, as some seem to be arguing, but learning the basics of the development process would improve the situation quite a bit. Note that this means putting data sets under source control as well. Minimal CM is also a good thing; even CVS provides that, but SVN and Perforce handle the job much better.

    • Nick Barnes says:

      I am familiar with some of the same code to which you refer, having struggled in the early 90s with low-level issues in Solaris and Irix (a compiler we were writing turned out to be a useful tool for discovering CPU design defects).
      Scientists don’t need full professional training as software engineers (a devalued term in any case, which has come to mean quite different things in different parts of the field). They need something like Software Carpentry.

  3. Cheng Soon Ong says:

    I share your sentiments. Nice opinion piece in Nature. We’ve been pushing open source software in my area of research (called machine learning) at mloss.org. Since 2007, we have slowly managed to change a few minds but it has been an uphill battle. We’ve published a longer article (without word limits) detailing why one should publish software at:

    http://jmlr.csail.mit.edu/papers/v8/sonnenburg07a.html

    There are two points which I would like to add based on the responses I have got when trying to advocate publication of scientific software:

    1. Software is not considered a scientific contribution. In the chase for academic tenure, writing (and publishing) software is something that does not show up on the radar. Hence most academics dismiss software writing as a “waste of time”. We are trying to change this in our field, but there is a long way to go. I know cases of people who have written significant pieces of scientific software and still have not received recognition in academia for it.

    2. Reuse. In computational sciences, the dream scenario would be to be able to download some source code and include it in your project. If many people publish their software, some good-quality code invariably emerges, and this gives the field a big push forward, since scientists no longer have to spend extra time coding up well-known ideas.

  4. Steven Sullivan says:

    Nick, can you recommend any resources I could use to educate myself better in “the five top software skills which all scientists should learn…source code management, defect tracking, literate programming, unit testing, and evolutionary development”?

  5. Frobisher says:

    I think this is a good start, but the scope of the problem, as I see it, is much larger.

    “Code” is multilingual, for example, so that just having code available is often not much use. Being able to connect useful code quickly and easily is still not practicable.

    Various attempts have been made to produce ‘global’ environments, but I would not think anyone is going to use Java for heavy-duty programming, for example. Any large project that uses Java, especially a numerically intensive one, is going to be inadequate compared to the alternatives. Fortran is still the preferred language for GCMs.

    Setting up a proper framework for the development, testing, archiving and control of source code is not a trivial matter. You then have to maintain that environment. Then you have to be trained in its use. It is time-consuming and expensive, and time and money are things scientists tend not to have much of after doing the rest of their work. Software Carpentry is a good example of something that would be useful for their code, but not something that they should have to bother with. This is the age of specialisation.

    Software development as a discipline is still in its infancy, as can be seen by the rapid development and change that is happening. Much of what you call for will serve as little more than to provide us with some historical curiosities.

    Perhaps you could provide an environment for scientists that will make the task of using version control and change management easy, so that they don’t have to worry about such things. Not because they aren’t capable of it, but because they would find it useful to offload such concerns.
