One of our goals is to see more scientific code published. Nature kindly gave us space to voice this opinion earlier in the year. In the world of software tools (our home planet if you like) we have seen huge strides forward because people published the source code to their software. It’s where the Open Source movement began. We believe that science will similarly be improved by having more scientists publish more of their code.
By publish we don’t necessarily mean polished, documented, formatted, and printed in a glossy peer-reviewed journal. We just mean made available. Whatever you wrote. Just stick it on the web somewhere. A zipfile is fine.
What should this look like? The upcoming issue of Annals of Applied Statistics provides a fine example. McShane and Wyner have published an article in this journal, and there are various discussions of the article in the same issue. One of these is by Davis & Liu, whose code is published as supplementary material. It’s a great example of what we mean when we say you should publish your code. The code (it’s a few dozen lines of R) is clearly more or less as Davis & Liu wrote it. It has a comment at the top, which looks like it might have been added in haste later, telling you how to download the inputs and run it.
As code goes it’s not great: it’s poorly documented, and full of magic numbers. But it doesn’t have to be great code. It does the job; no-one is going to be building nuclear power stations or recommending the purchase of a cancer-busting drug using this code. The important thing is that it’s the code used to produce the figures in the paper, and it’s published.
(Davis & Liu are not the only ones to make their code available; there is plenty more in the supplementary materials. McShane and Wyner make their R code available; the rest use Matlab: Smerdon’s, Tingley’s, Holmström’s, and Kaplan’s.)
Stein’s editorial in the same issue of Annals of Applied Statistics is well worth reading, and he has useful things to say on peer-review, data, statistical testing, uncertainty, and the relationship between code and reproducibility. He notes that (emphasis mine) “There is a movement in various disciplines to make all numerical results reported on in published papers reproducible by providing all of the data and code used to generate the results”, and goes on to say that this reproducibility “should be a requirement for research that has potentially important public policy implications whenever permissible”.
Naturally we agree.