This is the first of a series of posts intended to help scientists take part in the open science revolution by pointing them towards effective tools, approaches, and technologies.
When I make presentations to groups of scientists about the Climate Code Foundation and the Science Code Manifesto, I am often asked to recommend particular systems, or to advise on the advantages and disadvantages of some particular combination of tools. Many scientists are keen to adopt free software systems, open-source tools, and modern development methods, to become part of the growing open science revolution, and are looking for guidance.
I’m happy to discuss such questions individually, but there are far too many possible systems, configurations, and sets of requirements for anyone to give a single comprehensive answer. One tool might be excellent for most uses but fall short in some particular aspect which is critical for your project. Another might be generally weak but have a tremendously strong feature which addresses your needs. Over the next few days I will be giving some specific answers to common questions—identifying the strengths and weaknesses of my own favourite systems—but first I will try to help any scientist to pick winners for themselves, by giving some useful rules of thumb.
To that end, I have formulated five general rules for making good choices in the open-source development world. This post sets out the first.
1. Is it really free? Is it really open?
The first question to ask about any particular tool is: what restrictions are there on using, modifying, or sharing it? Can my audience use it too?
The open science revolution is built on free software, and I strongly encourage all researchers to use free software whenever possible. However, the term “free software” has a very specific technical meaning, which is often misunderstood by those outside the software world. It does not mean “zero-cost software”; it means “software that does not restrict its users from studying, modifying, and sharing it”.
Because of this confusion, I usually favour the term “open-source software”, which isn’t so easily misunderstood and also has two other advantages. First, it has the ring of jargon, so listeners immediately grasp that it has a specific technical meaning. Second, it chimes with other pillars of twenty-first-century science, “open data” and “open access”, with which researchers are often already familiar. Openness (making one’s results available to others to be criticised, praised, and built upon) is a core scientific value, and this term emphasizes that the same principle is at work for software.
The formal definitions of the two terms overlap almost totally—certainly all the software I recommend is both free and open-source—and many people use them synonymously: I mostly use “free software” for audiences of software professionals, and “open-source software” for others. It’s very regrettable that some people, including software community leaders, see conflict between the two. In my experience most people actually writing free software aren’t very interested in the distinction.
In any case, the important thing to remember is that software which is zero-cost but not truly free is a very risky choice which will not provide many of the great advantages of open-source software. Scientists and researchers working in large institutions in rich countries often have a very wide choice of zero-cost software tools:
- Some will be available “for free” for some limited trial period. For example, many new computers come pre-installed with suites of software which become unusable after 30 days, unless a fee is paid. The same is often true of software provided at a departmental or group level, and of many web services (for example, search tools for research literature). Remember, the first hit of any drug is always free.
- Some will have been bought for the institution, the department, or the team. A license may need renewing—often annually—at a price to be determined by the vendor. Next year, or next week, the license may come up for renewal and be cancelled due to budget pressure.
- Some will not have been bought, but will have been copied illegally, and any licensing mechanism will have been subverted. Such software may be suddenly disabled (and the people responsible punished).
- Some may have been bought by a single researcher or small group, and made available to a few people. The fact that it’s on your computer doesn’t guarantee that you have the right to use it.
- Finally, some will be truly free, open-source software.
Always ask yourself this: can my audience use this tool? Open science is all about sharing your research, including your code, with others. If they can’t use the tool, your code is much less useful to them (they won’t read it and they won’t improve it) and to you (they won’t cite it).
Your audience might be your research colleagues, or those elsewhere in your department, or at other institutions around the world (including those with less generous budgets), or independent researchers, or the public. One key audience member, often the most important, is your own future self: you can use the tool right now, but your institution might stop licensing it, or you might move to work somewhere without a license, or the software vendor might go broke (so that suddenly nobody has a license). Do you want to be able to use your own research in five years’ time?
Open science uses open-source software. Make sure you do too.