Once upon a time the internet didn’t exist. People corresponded by sending written or typed letters to one another through the postal service. Knowledge was transferred mostly through reports, articles and books. Important letters and books were gathered into library collections, which one had to make the effort to visit in order to learn what they had to tell.
Today things are different. Online collections and websites can be visited comfortably from anywhere with an internet connection. Online means of collecting, transferring and communicating knowledge are increasingly supplanting traditional print publishing. Figures, data and code are now research outputs in their own right: they can stand alone, they may be essential for interpreting and communicating results, and they may be shared before any peer-reviewed publication. This is, on the whole, taken for granted. However, not everything is as rosy as it could be. Specifically, I want to tackle how software and code are released.
It was once the case that the easiest way for researchers to make software they had created available to others was to release it on their own personal websites (or those managed by their university). This is not ideal, as these websites may disappear if the domain is not maintained, e.g. if a researcher leaves the university (by changing jobs, retiring, or worse, dying) or, more often for personal websites, if domain payments are not kept up. Software and code released in these ways are not persistent and may not stand the test of time, which creates a problem for anyone trying to access them later. Another phrase commonly seen in manuscripts has been something along the lines of:
“available from authors upon request”
This is also unsatisfying, as these materials will only be available for as long as the authors can be reached (i.e. their current contact details can be found and they reply) and for as long as they keep a copy of their original code or software. As the time since release increases, the probability of being able to access the relevant resources in either of these ways is likely to decrease.
Things are getting better. Nowadays various online repositories exist that provide a space specifically for uploading software code, e.g. GitHub and Bitbucket. This is great, as it encourages researchers to upload the code they have used, which allows others to view the methods themselves rather than just an application that is said to perform certain operations (in an unseen way).
However, some of the same problems that affected previous methods of releasing software and code also apply to these sites. The companies that run these repositories could collapse, change their funding structure such that payment is required to access the material, or change their website address. In these cases the original software would become harder to access or, in the worst case, inaccessible.
To quote the editor for my latest paper in Royal Society Open Science:
“Whilst github is a useful platform it does not provide a final version of record”
On top of potential problems from a business perspective, other things need to be considered. Code or software within a GitHub repository may continue to be altered after a paper using it has been published. This could be good (bugs that had gone undetected get fixed) or bad (the repository changes so much that it no longer reflects the original code, or is deleted altogether).
Fortunately, there is a solution for making code persistently accessible to future users: archiving the code and assigning it a DOI. DOIs (digital object identifiers) are permanent, unique identifiers that can be assigned to almost any type of resource. A consortium of agencies supports the DOI system and works collaboratively to ensure that each DOI continues to point to its resource, regardless of how the URL may change.
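To make the "regardless of how the URL may change" point concrete, here is a minimal sketch of how a DOI is turned into a link. It assumes only that DOIs start with the prefix `10.` and are resolved through https://doi.org/; the function name `doi_to_url` and the example identifier are my own placeholders, not a real archived record.

```python
from urllib.parse import quote

# Public DOI resolver; it redirects to wherever the resource currently lives.
DOI_RESOLVER = "https://doi.org/"


def doi_to_url(doi: str) -> str:
    """Build a resolver link for a DOI (hypothetical helper, for illustration)."""
    doi = doi.strip()
    # Accept the common "doi:" prefix seen in reference lists.
    if doi.lower().startswith("doi:"):
        doi = doi[4:].strip()
    # Rough sanity check only: a "10." prefix followed by "/" and a suffix.
    if not doi.startswith("10.") or "/" not in doi:
        raise ValueError(f"does not look like a DOI: {doi!r}")
    # Percent-encode anything unusual in the suffix, but keep the slashes.
    return DOI_RESOLVER + quote(doi, safe="/")


# Placeholder identifier, made up for this example.
print(doi_to_url("doi:10.5281/zenodo.1234567"))
```

The link that goes into a manuscript is the doi.org one, so the archive can move the files to a new address without breaking the citation.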
In addition, I think it looks much nicer to cite an online resource through the repository and its DOI than by inserting a website address into your manuscript! This makes it easier to cite and give credit to code and software materials.
Both Zenodo (https://zenodo.org/) and figshare (https://figshare.com/) are examples of current repositories that allow for assigning a DOI to code (and both include options to integrate with GitHub). There is a great how-to guide for making your code citable here.
I encourage you to archive your code and software with a DOI.