How can you publish FAIR software

Publish & Share
A step-wise guide to make your software Findable, Accesible, Interoperable and Reusable.

Introduction

This document provides some pointers that can help to set you up with a coding repository that follows the FAIR guiding principles, meaning Findability, Accessibility, Interoperability, and Reuse of digital assets. It is not meant to be an exhaustive guide for every step, but rather acts as a starting point and links to more exhaustive sources.

Publishing online

Research code (or other code) is best stored on a git repostory, like GitHub or GitLab. Both platforms are is used by millions of programmers and researchers around the world. To upload your code to GitHub/GitLab, you first need to make an account here. Once you have an account, you can create a repository that hosts your code.

GitHub: You can create a free account. In addition, as a university employee, you can use your university email address to get a free “Pro” account, which gives you access to some additional features. You can find more information at the GitHub Education platform. GitLab: As VU employee, you can access GitLab using your VU credentials (VUnetID) by logging in through the VU portal.

Creating a repository

GitHub: After logging in, click your profile picture on the top right -> “Your repositories” -> “New”. Choose a name and set the visibility (do you want everyone to see your code already or wait a bit before sharing with the world?). It is common practice to immediately add a “README” file and add a licence (see next section). You can update the visibility of your repository later under the repository settings.

GitLab: After logging, click on “Projects” in the left menu, and then on “Create project”. Choose a name and set the visibility (do you want everyone to see your code already or wait a bit before sharing with the world?). It is common practice to immediately add a “README” file and add a licence (see next section). You can update the visibility of your repository later under the repository settings.

Adding code to your repository

GitHub: You can create a new file using the “Add file” or “+”-button.

GitLab: You can create a new file using the “+”-button and click “New file”.

Once you have uploaded your files or written your code, you need to write a “commit” message. This is basically a description of the changes you made between the previous version of the code and the current version. Since this is the first version of the files (that is available online) you can write something like “initial”.

Note: never put sentitive information (like passwords or API keys) in your code.

Using Git locally

When you plan to use GitHub/GitLab more often, adding files through the web interface can become cumbersome. In that case, it is useful to consider using GitHub Desktop or a git manager which is integrated in your IDE (integrated development environment), like Visual Studio Code. Git can also be used on the command line, for example, when using git from a server. The course Learn Git & GitHub provides a useful tutorial on using git from the command line.

Well done! Your research code is not sitting on your own computer (or worse – deleted after publication*), but is available online for others to find. This is step one in following the FAIR principles. In the remainder of this guide, you will find steps that will guide you through all steps to release a “perfect” code repository. The first sections are most important, further sections provide room for further improvement, so please use this guide at your own discretion.

*VU Amsterdam requires research data and software to be preserved for at least 10 years, unless legal provisions or discipline-specific guidelines dictate otherwise, as per the Research Data and Software Management Policy. Note that funding organisations may have specific requirements for preserving code as well. If you receive external funding for your research, please make sure to familiarise yourself with such requirements.

Licence

Licensing software

Publishing research software under an appropriate licence is crucial for its accessibility, usability, and further integration into research. Choosing a licence usually happens right when you start developing the software or when you put it in a public repository, rather than when the software is finished and fully baked.

A software licence states how other people may re-use your code and under which circumstances. For research software, it is recommended (and often required by funders) that licences are as permissible as possible.

There are many licences out there; below we list some very frequently used licences in research software. However, if none of these licences fit your case, there are several tools that can help you to choose a suitable software licence. If you need guidance in choosing a licence for your software, get in touch with the RDM Support Desk.

MIT License

The MIT License is a popular choice, due to its readability and permissiveness. It allows users to reuse the software for any purpose, including using, copying, modifying, and distributing it, provided they include the original copyright notice and licence text.

However, its permissiveness means that derivative works can be closed-source and do not need to mention that they use your code, which might not align with all scientific openness goals or general.

GNU GPLv3

The GNU General Public License (GPLv3) is another option, designed to ensure that the software and any derivatives remain open-source.

This encourages collaborative improvement of software. Any software that includes GPL-licensed code must also be open-source under the GPLpotentially deterring commercial use or integration with proprietary software. In conclusion, when you want your code to be used by others, but only the code that uses your code is also open source, this is the way to go.

Apache License 2.0

The Apache License 2.0 allows for modification and distribution of the software and its derivative works, with the requirement that changes to the original code are documented.

It is a more complex licence than the MIT License and can be incompatible with GPL-licensed software. The specifics of this go beyond the scope of the handbook.

Adding a licence to GitHub

On GitHub you add a licence on creating a new repository, by selecting the licence from the drop-down menu. If your repository already exists, add a new file called “LICENSE” using the “+”-button on top of the repository (see below).

Location of file creation button

One the next page, start to type LICENSE as the file name, and a button to “Choose a license template” should automatically pop up. Follow the steps provided by GitHub to finish adding the licence to the repository.

You should now see your licence shown on the main page of your repository.

Further considerations
  • If you are reusing software or libraries written by someone else, you must stick to the clauses of the licence given to the original software/library;
  • When choosing a licence, do not just think about what others may do with the software, but also what you might want to do with the software in the future.

Persistent Identifier

A persistent identifier is a durable reference to a digital dataset, document, website or other object. A Digital Object Identifier (DOI) is a widely used Persistent Identifier in the research domain and hence for research software as well. There are two ways to generate a DOI for your code. Depending on how often you intend to update your code, one or the other may be simpler.

  1. If you do not intend to update your code often, it is sufficient to upload your code separately to Zenodo (or similar repository). This will then automatically generate a DOI for you, which can be included in a publication.
  2. If however, you intend to update your code frequently, it may be easier to set up an automatic link between GitHub and Zenodo, so that a new version of the code is released on Zenodo on publication of a new version on GitHub. You can find instructions for this in this guide on referencing and citing content.

Readme, commenting, code formatting

To allow others to understand and use your code effectively it is important to include a “README” file, and to comment your code.

Readme file

The README file is the general introduction to a code repository. A good readme includes the following components (but may include other sections depending on the project):

  • Project Title
  • Project description
  • Installation Instructions: Provide step-by-step guidelines to get the project running.
  • Usage Examples: Include examples of how to use the code, which can be particularly beneficial for libraries or APIs.
  • Dependencies: List any libraries or other software required to run the project.
  • License: Specify the licensing under which the code is released (see above).
  • Citation: Collect more citations 😉 by providing researchers a way to easily cite your paper.
  • [if applicable] Contributing Guidelines: Explain how others can contribute to the project.

On GitHub, the README file should ideally be named “README.md”. GitHub automatically renders this file and displays it on the home page of your repository. You can find an example of a README file in the numpy repository.

Code commenting

Code commenting serves several essential purposes, such as enhancing readability, maintainability, and usability for original authors, future contributors and others interested in your code. Comments provide context or explanations for complex logic, variables, and algorithms, making the codebase accessible and understandable. Good practices include describing the purpose of functions, explaining the rationale behind decisions, and highlighting potential side effects.

A huge part of commenting your code is variable naming. Good variable naming can go a long way in making your code “self-documenting”. In general, do not use abbreviations or very short variable names. This may save a couple of keystrokes when writing your code, but this time is easily recovered when looking at the code again months later. Good variable naming will help your colleagues or other interested people even more. Modern integrated development environments also usually include autocompletion.

Code formatting

Consistency in code formatting, and adherence to standards can be highly beneficial, allowing other people and yourself to more quickly grasp your code. Each language has different conventions. For example, Python uses PEP8, for which some examples are given below:

  • Use 4 spaces as indentation
  • local variable_names are all lowercase
  • ClassNames have each word starting with a Capital letter
  • GLOBAL_VARIABLES are all uppercase

Other examples are Google’s R Style Guide and Google’s C++ Style Guide.

Code formatting tools

You can also configure Visual Studio Code to automatically help you formatting your code. You can find more information in the Visual Studio code formatting manual. Even more strict formatting can be done with ruff, which is PEP8-compliant but introduces additional rules. ruff can also format your code in place, running ruff format {source_file_or_directory}.

You can also use GitHub Actions to automatically check and format your code when you push new commits to your repository. For example for Python, ruff publishes GitHub actions that can be used to automatically check and format your code.

If you follow the steps above, your code is now following the FAIR principles already! However, there is always room for improvement. Further steps are explained below.

Documentation

Extensive documentation can help users of your code re-use your code effectively. There are some great tools out there to help you automatically build documentation from your code comments, which can be supplemented with general documentation. Sphinx is a great way to do this, and can be nicely integrated with your existing GitHub repository. An example of code documentation can be found in the ReadTheDocs documentation, which uses Sphinx. This guide is not meant to be a full guide on how to use Sphinx as many useful resources can be found on the internet. An example of a great resource is this blog post.

Installing Sphinx

To use Sphinx for automatic documentation generation in a GitHub repository, start by installing Sphinx using pip (pip install sphinx). Next, run sphinx-quickstart in your project’s root directory to generate the basic configuration (conf.py) and structure for your documentation.

Automatic documentation building

Documentation can be automatically updated when you push new commits to your GitHub repository. To do so, you can set up a continuous integration (CI) workflow using GitHub Actions. In a workflow file, you specify the steps to install Sphinx, build the documentation using the sphinx-build command, and then push the generated HTML files to the gh-pages branch or another branch designated for hosting your documentation. A guide on how to set this up can be found in this Sphinx manual.

This guide is not meant to be a full guide on how to use Sphinx, as many useful resources can be found on the internet. However, some useful resources (here, and here) are listed, and pointers are given.

Automatically documenting your code

To document a Python function using autodoc, ensure that the functions in your code have a properly formatted docstring. For example:

def add_numbers(a, b):
    """
    Adds two numbers together.

    :param a: first number
    :type a: int or float
    :param b: second number
    :type b: int or float
    :return: The sum of `a` and `b`
    :rtype: int or float
    """
    return a + b

In the conf.py file, enable the autodoc extension by adding 'sphinx.ext.autodoc' to the extensions list. This extension allows Sphinx to generate documentation directly from your source code’s docstrings. Then, in your Sphinx documentation source directory (usually docs/source), create a .rst file where you want this function’s documentation to appear. Use the autofunction directive to automatically include the function’s documentation:

.. autofunction:: path.to.module.add_numbers

Replace path.to.module with the actual Python import path to your function. Sphinx will extract the docstring from the function and incorporate it into the generated documentation, complete with the parameter descriptions and types.

Released as a package

Creating a Python package allows you to share your code with a broader audience, promoting collaboration, and ensuring reproducibility. There are two “levels” of publishing your code. First of all, you can ensure that your code is directly installable from the GitHub repository. To do so, you can follow the steps explained in this Python tutorial, but only up to and including the section “Configuring metadata”. Here, we assume you already created a README.md and added a LICENCE as explained above.

In brief, you need to take the following steps, which are explained in full below:

  1. Structure your code as a package
  2. Create a metadata file

Then, once this is added to GitHub, you can install the package using the following command (of course adjusted to point to your own repository):

pip install git+https://github.com/your_username/your_repository.git

PyPi

If this works, you can go on to the next step and publish your package on PyPi. When your package is published here, it is possible to directly install your code from pip, as such:

pip install your_package

For a full explanation of how to do so, follow the remaining steps explained in this Python tutorial.

Automatic publishing versions

If you publish new versions frequently, it may be useful to automatically release these versions to PyPi, directly from GitHub without taking manual steps. To do so, you can follow the guide in this repository on PyPI publish GitHub Action.

Testing

Testing your code is a crucial aspect of software development, ensuring that your application behaves as expected and maintains a high level of quality. You can usually catch bugs much earlier, although it also requires some investment, especially in the beginning. Apart from catching bugs, testing can also focus on syntax use.

Code testing

One of the most frequently used Python packages for software testing is pytest. Here are some quick steps to set up pytest in your repository:

Python

  1. Install pytest:

    pip install pytest
  2. Create a tests directory in your repository’s main folder.

  3. Assuming you have a function called add_one in the file my_package/math_functions.py, you can start by creating a test file within the tests directory to test this function:

    # my_package/math_functions.py
    def add_one(x):
        return x + 1
  4. Create a test file named test_math_functions.py inside the tests directory. This file will contain the tests for your add_one function:

    # tests/test_math_functions.py
    from my_package.math_functions import add_one
    
    def test_add_one():
        assert add_one(1) == 2
        assert add_one(0) == 1
        assert add_one(-1) == 0
        assert add_one(100) == 101
  5. Run the tests by executing pytest from the command line. Navigate to your project’s root directory and run:

    pytest

R

For R, the most frequently used package for software testing is testthat, more information can be found in the testthat repository.

C++

For C++, there are a lot of different testing frameworks available, but one of the most frequently used are Catch2 and Google Test.

Testing tools

  • The Visual Studio Code Python extension has really good support for testing, saving you lots of time.
  • On GitHub, you can use GitHub Actions to automatically run your tests when you push new commits to your repository. For example for Python, pytest publishes GitHub actions that can be used to automatically run your tests.

Updating

Regularly updating your repository and engaging with users can help build a community around your software. For example, users can report bugs to your repository or even contribute to new features (or fix those bugs).

GitHub Issues

GitHub Issues is a feature that allows project maintainers and contributors to track tasks, enhancements, and bugs for their projects. It can be found by clicking on “Issues” for most repositories (unless explicitly disabled). For example, issues can be used to:

  • Report Bugs: Users and developers can report bugs they encounter, including descriptions, steps to reproduce, and screenshots.
  • Request Features: Suggestions for new features or improvements can be tracked as issues, allowing maintainers to prioritize and discuss them.

Publication

Finally, publishing your code alongside a scientific publication can help build trust in the software you developed. This can, of course, be done in a traditional journal alongside a journal article. However, in some cases, it may be useful to publish your code in a journal specifically focused on software development. Here, we recommend JOSS (Journal of Open Source Software).

Increasingly, traditional scientific publications also include a section that describes where the code can be found. This can be a DOI, a link to a GitHub repository, or a link to a Zenodo repository. This allows other researchers to reliably cite your software, which also improves the findability. An example of such a section can be found here:

Code for data cleaning and analysis is provided as part of the replication package, written in R. The specific code that was used for this article is available at https://doi.org/10.xxxx/path/to/journal/archive, while further updates are released at https://github.com/repository/your_code.