1.5 Open Data

Data sharing is a core element of Open Science: it enhances the credibility of, and trust in, our research practices. Sharing data makes it possible to reproduce previous findings and lets others inspect the data, the metadata, and the research workflow and spot mistakes in them. Additionally, it helps prevent, to a certain extent, scientific fraud arising from fabricated data, results, or visualizations, or from questionable data-analytic practices.

Data sharing also speeds progress in fields of research where relevant data are difficult to access (Gewin, 2016). It also benefits researchers working at institutions in countries with fewer resources and limited access to paywalled journals and repositories.

However, sharing data has its drawbacks. Some researchers invest years collecting data that is very difficult to obtain (e.g., long-term primate behavior in Kibale National Park, Uganda), intending to complete a research project that will produce several outputs. If the data is shared with the first publication, this might compromise future publications based on the same data set (Hunt, 2019). To partially mitigate this problem, many journals allow researchers to publish their data sets in specialized data journals such as Nature Scientific Data, GigaScience, BMC Research Notes, or Data in Brief. For example, in Nature Scientific Data, data sets are described in curated Data Descriptors, which can be modified to incorporate new data collected later. Moreover, Nature-titled journals do not consider prior Data Descriptor publications to compromise the novelty of new manuscript submissions, provided those manuscripts go substantially beyond a descriptive analysis of the data and report important new scientific findings appropriate for the journal in question (Nature—Scientific Data, 2023).

Moreover, sharing data is not as easy and straightforward as it seems (e.g., simply uploading an anonymized CSV file to a public repository). Researchers, especially postgraduate students and early career researchers, must learn a new set of skills in order to publish their research products:

  • Data curation
  • Data management plan
  • Storing, saving, archiving, and data preservation
  • Metadata
  • Data analysis and visualization
  • Data wrangling
  • Reproducibility and data reuse
  • Compliance with FAIR data principles
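
To give a sense of what some of these skills involve in practice, the following is a minimal Python sketch of one small step: writing a machine-readable metadata file next to a CSV data set before depositing it. The function names (describe_csv, write_sidecar), the ".metadata.json" file name, and the metadata fields are assumptions chosen for illustration; they do not follow any particular repository's schema, and a record like this does not by itself make a data set FAIR compliant.

```python
import csv
import hashlib
import json
from pathlib import Path


def describe_csv(csv_path, title, license_id):
    """Build a small metadata record for a CSV file (illustrative fields only)."""
    path = Path(csv_path)
    raw = path.read_bytes()
    with path.open(newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])        # first row is assumed to hold column names
        n_rows = sum(1 for _ in reader)  # remaining rows are data records
    return {
        "title": title,
        "file": path.name,
        "license": license_id,
        "columns": header,
        "n_rows": n_rows,
        # A checksum lets others verify that the file they downloaded is unchanged.
        "sha256": hashlib.sha256(raw).hexdigest(),
    }


def write_sidecar(csv_path, title, license_id):
    """Write the record next to the data file as <name>.metadata.json."""
    path = Path(csv_path)
    record = describe_csv(path, title, license_id)
    sidecar = path.with_name(path.stem + ".metadata.json")
    sidecar.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return sidecar
```

Calling write_sidecar("observations.csv", "Example data set", "CC-BY-4.0") would create observations.metadata.json in the same folder (the file name, title, and license here are placeholders). In a real project, the record would normally be extended with a description of each variable, its units, and the collection procedure, following the requirements of the chosen repository or data journal.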