Royal Society submission

The Royal Society is conducting a policy study entitled “Science as a Public Enterprise”, and called for submissions from the public. The Climate Code Foundation made the following submission in August (and I just realised that I never posted it to the blog):

1. What ethical and legal principles should govern access to research results and data? How can ethics and law assist in simultaneously protecting and promoting both public and private interests?

Two prefatory remarks which apply to all our answers: First, the Climate Code Foundation is a non-profit organisation to promote the public understanding of climate science. Thus, it is focused specifically on climate science. Although its arguments may well apply to other fields, it takes no formal view on those fields.

Secondly, in this and the following answers, we take the use of ‘data’ in the questions to mean both scientific measurements (and their accompanying metadata such as instrument type and time of observation) and code (that is, computer programs written by scientists to process their raw data into results).

In specific response to question 1: since climate science results are of critical public importance, the ethical principle of least harm is relevant to these datasets, and dictates that they should be made available to the general public as open data http://opendefinition.org/. That is, at no cost, and under no restrictions save only, at most, attribution and share-alike requirements.

The software source code, which creates, defines, and interprets the datasets, should be available as open source software http://opensource.org/ for the same reason: so that any interested party may inspect, copy, reason about, criticise, improve, or run it, without restrictions.

Regarding legal principles and law, the Climate Code Foundation does not have a view. We have observed many legal actions against climate scientists, and use of legal instruments in place of polite enquiry, and legal threats in place of debate, and find the effects to be damaging to discourse and chilling to research. The criminal and civil law should of course be available as a last resort, but that is not how it has been used.

2 a) How should principles apply to publicly-funded research conducted in the public interest?

In this case, and again restricting my remarks to climate science code and data, the public-pays principle confirms the conclusion that publicly-funded research results should be open source and open data.

2 b) How should principles apply to privately-funded research involving data collected about or from individuals and/or organisations (e.g. clinical trials)?

This is not relevant to climate science and so the Climate Code Foundation has no view.

2 c) How should principles apply to research that is entirely privately-funded but with possible public implications?

The possible threat of climate change to our collective well-being is so severe that privately-funded research data and code, in this field, should be open. However, I see no way to enforce this.

2 d) How should principles apply to research or communication of data that involves the promotion of the public interest but which might have implications for the privacy interests of citizens?

This is not relevant to climate science and so the Climate Code Foundation has no view.

3. What activities are currently under way that could improve the sharing and communication of scientific information?

There are many, and I’m sure other responses will give a much broader view. The top trends I would mention are as follows:

  • open access publication;
  • open data repositories;
  • open bibliographic data;
  • linked open data;
  • electronic lab notebooks, and open notebook science;
  • open source software;
  • blog science and tweet science;
  • citizen science and ‘crowd-sourced’ science.

In the Climate Code Foundation we ourselves are working with several climate science groups to create shared science software resources, and to improve access to and understanding of existing climate data resources.

4. How do/should new media, including the blogosphere, change how scientists conduct and communicate their research?

The Climate Code Foundation takes no view on the use of new media in conducting research, except to welcome the possibility of increased public engagement through online citizen science projects such as Old Weather. Much science can and must continue to be done in a traditional way (e.g. a month digging up tree-stumps from permafrost, followed by several months of long lab hours processing them).

However, new media (email, the web, the blogosphere) do provide new opportunities for communicating research, both to other researchers and to the public. Many of these changes have already taken place over the last few decades, and indeed early users of the internet were almost all researchers, and the web was invented precisely to allow better communication between scientists.

First, the usual medium of communication between scientists is now email. It may seem trivial to mention this; I mention it to illustrate a potential problem with the question. Email is a new medium, but it has become so completely integrated into our lives that it is hard to imagine science without it.

Secondly, the internet allows researchers to communicate results fully to each other: datasets, source code, and all. Science communication is no longer constrained to the narrow bottleneck of the paper publication. Modern science is entirely dependent on such uses of the internet.

Thirdly, the web provides an unfiltered channel for scientists to communicate their results directly to interested members of the public. This might be by providing raw data, commentary, tools for data analysis, and linking to other information. Many sites are tremendous resources for learning, exploration, and education. In climate science, we encourage and assist with this process.

Certain web-based tools should also be included in “new media”, and these can provide direct benefits to scientists who are able to use them. If access to your source code does not require a signed faxed agreement, then the department secretary can be taken out of the loop and the source code can be placed on one of the many freely available project management tools (GitHub, Google Code, KnowledgeForge, etc). Individual researchers can even use the same tools to manage the code on a day-to-day basis, in the spirit of Open Notebook research; this may reduce the burden on departmental IT staff. BitTorrent, and other similar tools, can be used to share data in a peer-to-peer fashion, reducing the burden on departmental datacentres, but again only if the grip on data is released slightly and it is made more openly available. We doubt that benefits are restricted to these particular examples.

Whether scientists engage in the discussions which take place in ‘the blogosphere’ is a matter of choice; scientists should feel no obligation to take part. In climate science, at least, it is not generally a polite environment, and large parts of it are completely hostile to almost any informed point of view. It can be a boundless and toxic time-sink.

5. What additional challenges are there in making data usable by scientists in the same field, scientists in other fields, “citizen scientists” and the general public?

There are a few. Data is not always accompanied by metadata, by a description of its format, its provenance, or its scope, or by information about limitations and errata. This information may be understood by the originating scientists, but needs to be added to the data to make it usable by others. However, this process is not compulsory, and should be encouraged simply by the rewards which automatically flow: a scientist whose data is more usable by others will receive more citations and other professional recognition.

That recognition does assume that mechanisms are in place to identify the effort: the researcher who does this work should of course be credited with it.
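As a rough illustration of the kind of information described above, here is a minimal sketch of a metadata record that could travel alongside a dataset, with a check for gaps. The field names, the `missing_metadata` helper, and the example record are all invented for illustration; they are assumptions, not an existing standard or any group's actual practice.

```python
import json

# Hypothetical field names, based on the kinds of information listed above.
REQUIRED_FIELDS = ("format", "provenance", "scope", "limitations", "errata")


def missing_metadata(record):
    """Return the required fields that are absent or empty in a metadata record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


# An invented record for an imaginary dataset; none of this is real data.
example = {
    "format": "CSV, one row per station per month, temperatures in degrees Celsius",
    "provenance": "Hypothetical station network, digitised from paper logs",
    "scope": "Monthly means, 1950-2000",
    "limitations": "",  # left empty on purpose, so the check reports it
    "errata": "None known at time of release",
}

if __name__ == "__main__":
    # Print the record as it might be stored next to the data file.
    print(json.dumps(example, indent=2))
    print("Missing:", missing_metadata(example))
```

A check like this could be run before a dataset is released, so that gaps in the description are noticed by the originating scientists rather than by the people trying to reuse the data.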

6 a) What might be the benefits of more widespread sharing of data for the productivity and efficiency of scientific research?

Simply put, more sharing of data will allow scientists to build more rapidly and reliably on each other’s results, causing science to progress more quickly in the discovery of truth and the construction of scientific knowledge. It will reduce duplication of effort, it will result in more rapid discovery and correction of errors, and it will increase the speed with which new ideas and approaches are developed.

More sharing of code in particular will have the same effects, and will also allow the development of large, reliable, shared libraries and bodies of code, increasing the productivity and reliability of science across entire fields.

Sharing of data and code should help to ‘level the playing-field’ for poorly-resourced scientists, and may also lead to higher levels of collaboration and to greater community-building.

6 b) What might be the benefits of more widespread sharing of data for new sorts of science?

Open data and code allow relatively novel types of science, such as crowd-sourced citizen science (for example, Zooniverse) and especially research using automated data-mining and scraping systems, such as the various projects under Peter Murray-Rust’s “Blue Obelisk” banner. Without open data and code, these projects would be impossible. Without significant improvements in policies and statements of openness, they will be unable to progress.

6 c) What might be the benefits of more widespread sharing of data for public policy?

In climate science, very great indeed, and this is the raison d’etre of the Climate Code Foundation: by encouraging and enabling climate scientists to share and communicate their results more clearly and effectively, we work to improve public understanding of the science, and thus to allow the formation of public policy in an informed context.

In climate science, the unavailability of some code and data, and the impression that other code or data is either simply unavailable or more mysteriously ‘missing’, has been immensely damaging to public perceptions for several years, especially since the theft of emails from the University of East Anglia in 2009. Those emails were grossly misrepresented as revealing a culture of secrecy, data abuse, and incompetence.

In fact, practices in climate science are broadly in line with those in many other sciences – although there is a certain level of openness, some data and much source code are not freely available to the public. They may be available on request, but at the discretion of the researcher, and often not to a member of the public. This inconsistent environment allows noxious allegations to flourish, and to be exploited in public policy debates. Many times in the last decade, policy makers have repeated these allegations in debate, to cast doubt on the science and to avoid or delay important climate-related policies. The negative effect of this overall atmosphere on policy related to climate change is hard to overstate.

Greatly increased openness in climate science would not draw all the poison from this discourse: positions are far too entrenched for that. Allegations of secrecy and corruption will continue. But it would give climate scientists an unambiguous, consistent, and verifiable answer to all such questions. Here is the data. Here is the code. Here are the results.

6 d) What might be the benefits of more widespread sharing of data for other social benefits?

It is hard to estimate the general social benefits of a better-informed public.

6 e) What might be the benefits of more widespread sharing of data for innovation and economic growth?

The Climate Code Foundation has no view on this question as it is posed.

6 f) What might be the benefits of more widespread sharing of data for public trust in the processes of science?

See my answer to (6c).

7. How should concerns about privacy, security and intellectual property be balanced against the proposed benefits of openness?

There is no conflict.

Scientists have a right to privacy, and openness does not conflict with that. We only advocate the sharing of research products: data, code, results, and publications. None of those contain any private information.

“Security” is a catch-all word, but again, there is no conflict.

“Intellectual property” is a much-abused term. I will take its use here to mean copyrights. Copyright is not threatened by openness. Indeed, many frameworks for openness (such as the Creative Commons) depend on copyright to enforce sharing conditions such as attribution. “Intellectual property” is often used as a post-hoc argument to justify delaying or avoiding openness.

There is no copyright on data (although there may be database rights within the European Union), and datasets have been protected in the past by secrecy and embargos. Such embargos are becoming a thing of the past, as publication policies change, and scientists are realising that they receive more credit for becoming the originators and curators of a widely-used, widely-published, foundational dataset, than they could garner by eking out a few more papers on their own.

Software may be protected by copyright, but the protected work is rarely of any value which could be realised. Either the research is described by a publication which gives full details of the algorithm, or it is not. In the latter case, we would argue that the publication is seriously flawed – the science depends on methods which are not published – and is not truly part of the collective enterprise of science. In the former case, because algorithms per se cannot be protected by copyright, there is no “intellectual property” left to protect.

In a very few cases, research bodies may have come to rely on revenue generated by licensing software or data protected by copyright or database rights. In the specific case of climate science, we hold that the public interest argument for openness is so strong that these bodies must restructure their business models, and funding agencies may have to allow for this.

In short, the “intellectual property” of science is a commons, to which all researchers contribute, and from which all of society benefits.

8. What should be expected and/or required of scientists (in companies, universities or elsewhere), research funders, regulators, scientific publishers, research institutions, international organisations and other bodies?

Public funders of research in climate science should require all research products to be made available to the public: publications should be open-access, data should be open data, software should be open-source.

The responsibilities of all other actors in climate science (scientists, institutions, publishers, etc) will follow naturally from this requirement.

Many stakeholders and powers-that-be have made positive statements or policies about openness, but they are often hedged around with phrases such as “where possible”, “when available”, or “subject to commercial constraints”. No such hedges are viable in climate science. The public policy issues are so serious, and the possible consequences of inaction so grave, that there must be no exceptions.

Other comments

The Climate Code Foundation is launching a “Science Code Manifesto”, on the specific subject of science software availability, and its consequences for science stakeholders (strongly related to your question 8). Please see:

http://code.google.com/p/climatecode/wiki/ScienceCodeManifesto
