Ten reasons you must publish your code

Last week I gave a short talk to the SoundSoftware Workshop 2013. SoundSoftware is a group of researchers in the field of music and acoustics, based at Queen Mary University of London, who promote the use of sustainable and reusable software and data. My talk was entitled “Ten reasons you must publish your code”, and SoundSoftware have now published the video. It was intended to stimulate debate, and time was limited, so it’s long on polemic and short on evidence, although there is plenty of evidence out there to support almost everything I said. The short list of reasons is as follows:

  1. Review: to improve your chances of passing review, publish your code;
  2. Reproducibility: if you want others to be able to reproduce your results, publish your code;
  3. Citations: if you want to boost your citation counts (or altmetrics), publish your code;
  4. Collaboration: to find new collaborators and teams, to cross-fertilise with new areas, publish your code;
  5. Skills: to boost your software skills, publish your code;
  6. Career: to improve your resume and your job prospects, publish your code;
  7. Reputation: to avoid getting egg on your face, or worse, publish your code;
  8. Policies: to get a job, to publish in a particular journal, to secure funding, publish your code;
  9. Preparation: to prepare for the great future of web science, publish your code;
  10. Science! To do science rather than alchemy, publish your code.
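
On the reproducibility point, published code is easier to re-run when it comes with a record of the exact version and environment that produced the results. The following is only a minimal sketch in Python, not something shown in the talk: the function names `git_commit` and `provenance_record` and the `sample_rate` parameter are hypothetical, and it assumes the analysis code lives in a git repository with the `git` command on the path.

```python
import json
import platform
import subprocess
import sys
from importlib import metadata


def git_commit(repo_dir="."):
    """Return the current commit hash of repo_dir, or None if it cannot be read."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            cwd=repo_dir, capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # git is missing, the directory does not exist, or it is not a repository.
        return None
    return result.stdout.strip()


def provenance_record(parameters, repo_dir="."):
    """Collect code version, platform, installed libraries and run parameters."""
    libraries = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip distributions with broken metadata
            libraries[name] = dist.version
    return {
        "commit": git_commit(repo_dir),
        "python": sys.version,
        "implementation": platform.python_implementation(),
        "platform": platform.platform(),
        "libraries": dict(sorted(libraries.items())),
        "parameters": parameters,
    }


if __name__ == "__main__":
    # "sample_rate" is a made-up parameter, used only for illustration.
    print(json.dumps(provenance_record({"sample_rate": 44100}), indent=2))
```

Saving this record next to each set of results does not guarantee that the code will still run years later, but it does tell a reader which version and which environment to try to rebuild.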

2 Responses to Ten reasons you must publish your code

  1. Tim Daly says:

    Nick,

    You and David Jones are “Founding Signatories” of the Science Code Manifesto which, as you know, lays out five principles for science software (http://sciencecodemanifesto.org).

    I’m interested in how you think these goals might be accomplished.

    These issues affect me directly. Axiom was developed at IBM Research, was sold commercially as a competitor to Mathematica and Maple, and was eventually released by me as open source software. (http://axiom-developer.org)

    You write:

    CODE: All source code written specifically to process data for a published paper must be available to the reviewers and readers of the paper.

    If someone writes an algorithm using Axiom they need to include the Axiom source code, which is fine and allowed by the license. In particular, you require that they provide a *link*, along with a full description of the platform, language implementation, tools, libraries, and parameters to run the platform.

    Perhaps you aren’t considering the problem in sufficient depth. I can provide you a link to a particular, reproducible version of Axiom using the git hash code. You can extract the exact version.

    Unfortunately, I’d bet that the code won’t run. Indeed, I spend a significant amount of time getting Axiom to run on “this week’s latest release of the operating system”. Without a solid “reference platform” that does not change, almost no software will run.

    Even commercial software has that problem. I bought a GoPro a while ago and used it with my Android. But now the software no longer works on the latest Android and GoPro has abandoned it.

    What “reference platform” would you suggest as the permanent basis?

    You write:

    CURATION: Source code must remain available, linked to related materials, for the useful lifetime of the publication.

    “The curator must provide the specific version of the software used in the publication, along with ownership and licensing information, accessible by a unique stable identifier such as a DOI or URI.”

    A git hash code uniquely identifies a particular version of software, so this subgoal is easy to fulfill… except that there are rumors that sites like GitHub might disappear. Where is the stable platform that will survive “for the useful lifetime of the publication”?

    I have had several discussions of this “Digital Vellum” issue with Vint Cerf, albeit without any particular solution.

    “Bodies asserting code ownership, and not using open-source licenses, have a particular duty of curation, as they prevent others from voluntarily curating their code.”

    Indeed. What happens when Wolfram Research goes out of business? Does all of the mathematical research based on Mathematica disappear? Will there be any way to reproduce the published results?

    You miss another long-term issue: how to make software “live” beyond the participation of the original authors. You’ll find that EVERY code repository (GitHub, Savannah, SourceForge, …) is 95% dead code. The original authors stopped working on it and nobody else can understand it enough to make it useful again.

    This is a “long term” problem which is not addressed by any of the hot software development methods. It became apparent to me only after I, as one of the original Axiom authors, got my own code back after 10 years. I understood the code. I just didn’t understand WHY I wrote what I did.

    Literate Software, à la Knuth, seems to be the only long-term solution to survivable software. Software needs to achieve a level similar to that seen in “Physically Based Rendering” by Pharr and Humphreys.

    We need a “Stable Software Institute” that publishes a “Reference Platform” so scientific software developers can write reproducible software. Linux presents a stable API. We need to extend the idea to higher levels using published standards (e.g. ANSI Common Lisp) so scientific software can have a stable place to live.

    Scientific commercial software needs to have a “dead man clause” that will enable the software to be held “in escrow”. It would be publicly released if it was no longer commercially available. This will prevent things like Macsyma (Symbolics died) and the TI scientific calculator software (based on Derive from Soft Warehouse), etc. from disappearing. Companies die. There are very few companies that are 100 years old. We are letting our scientific software rot among the fallen as a result. The only science that will benefit will be the new branch called “Software Archeology”.

    Tim Daly

    • Tim Daly says:

      I’ve spent years thinking about and writing about this problem.

      Axiom’s code, written in the 1970s, represents the “Newton’s Notebooks” of computational mathematics software. I am doing my very best to make sure that it is not lost to history.

      We need to collect and promote common, openly available, scientific software on a git-like STABLE platform. Scientists should be encouraged to use that code when possible. Hopefully that platform would reach “critical mass” so that it becomes “the standard reference”.

      Of course, being open source, this is a pipe dream. There is no money in open source and I don’t see how to make such a platform into a commercially viable venture. The grant agencies won’t fund it because it isn’t connected to a University. The government won’t fund it because it would be considered “competition” with existing commercial software. So we are left with hundreds of random rotting code piles underlying our scientific work. (Scientific Linux might be an exception, see https://www.scientificlinux.org/)

      It will be a generation before the maintainable software (aka Literate Software) meme becomes apparent. Companies have to come to understand that they live or die by software and that it needs to survive beyond the current Scrum nonsense. Once the team is disbanded they will have millions of lines of “legacy code”. Think COBOL, still the basis for many banking platforms, or Ada, still the basis for government code. None of it is literate and yet it is vital to the companies.

      It will take another generation for that insight to escape companies into the open source platforms.

      So I expect that at least a few generations of scientific work will not be viable.

      Tim Daly
