Open Life Sciences OLS6

This project has been selected to participate in the Open Life Sciences mentoring program as part of OLS cohort 6, which runs from September 2022 to January 2023. The text of the application to the program is given below:

Title

An extensible notebook for open specimens

TL;DR

Define and develop a research tool to enable exploration and citation of all specimens and associated data necessary to develop a formal species description.

Project

This project is developing a prototype “extensible notebook for open specimens”. This is a link-aware editor for semi-structured data based on personal knowledge management software (Obsidian). This environment plus standard open science tools (reference management tooling and pandoc document production) could help the adoption of open science principles amongst biodiversity researchers.

The project is split into three main areas of investigation (effort so far has been focussed on the first):

  1. Working environment: can we extend personal knowledge management software to reference biodiversity-relevant data classes (in a similar way to how bibliographic citations are managed)?

    We have developed a set of Obsidian plugins which give easy access to the data resources needed to (a) work with existing species descriptions from the literature and (b) recognise and formally describe new species. (This work is an entry for the forthcoming Ebbe Nielsen Challenge.)

  2. Review environment: can we generate snapshots for peer review and publication?
  3. Publication environment: can we package data for harvesting into data aggregators?
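As a rough illustration of the third area, the sketch below shows one way a specimen note could be packaged as a table for an aggregator to harvest. It is an assumption-laden sketch, not the project's implementation: the note layout (flat "key: value" front matter between `---` lines), the function names and the sample values are all hypothetical. Only the column names are taken from the Darwin Core standard maintained by TDWG.

```python
import csv
import io

# Hypothetical note: flat "key: value" front matter, then free text.
# All values are placeholders, not real specimen data.
NOTE = """---
institutionCode: EXAMPLE
catalogNumber: EX-0001
scientificName: Aus bus
recordedBy: A. Collector
---
Free-text observations about the specimen go here.
"""

# Darwin Core terms used as column headers in the exported table.
DWC_TERMS = ["institutionCode", "catalogNumber", "scientificName", "recordedBy"]


def parse_front_matter(text):
    """Return the flat key/value pairs found in a note's front matter."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # no front matter present
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of the front matter block
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def notes_to_dwc_csv(notes):
    """Serialise notes as CSV text, one row per note, Darwin Core headers."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=DWC_TERMS, lineterminator="\n")
    writer.writeheader()
    for note in notes:
        fields = parse_front_matter(note)
        # Missing terms are exported as empty cells rather than dropped.
        writer.writerow({term: fields.get(term, "") for term in DWC_TERMS})
    return buffer.getvalue()


print(notes_to_dwc_csv([NOTE]))
```

Keeping the export as a plain function over note text means it does not depend on any particular editor, which fits the aim of not prescribing a workflow.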

We aim to enable researchers to develop the “digital extended specimen” without being prescriptive about their workflow: they should be free to access and publish the necessary data, and equally free to choose how to organise their work.

Problem

Biodiversity research depends upon literature and specimens from museum and herbarium collections. The biodiversity informatics community has defined data standards (www.tdwg.org) and mobilised these data through multiple research infrastructures (www.gbif.org). We are moving towards the “digital extended specimen”: a concept that links specimens with associated data across multiple research infrastructures to investigate wider-scale questions. Exactly how these linkages would be achieved is still the subject of discussion and experimentation, but useful activities could contribute at very different scales: from large-scale computational processes run at (or between) research infrastructures, to a lightweight toolkit that supports expert-led link construction in context, at research time. To date, attention has mostly focussed on the former, and the range of tools and techniques which could be combined to enable researchers to participate in this effort has not been fully investigated.

It is difficult to recruit technical staff to work on biodiversity informatics projects. Approaches which enable us to more fully utilise standard cross-domain tools for research note-taking, reference management and document production should free us to focus technical efforts on the problems particular to our own domain: accessing, integrating and mobilising data about specimens and the species which they represent.