Open Science Tools: supporting hands on creation of the digital extended specimen

Nicolson, Nicky and Lucas, Eve

As a biodiversity informatics community, we have mobilised and interconnected a wide array of information, including specimen collections, published literature and metadata resources that compile facts about collections and the people who work with them. We have defined data standards to facilitate data interoperability and tool development. Along with colleagues in allied research disciplines, we have helped to develop training resources that enable researchers to automate routine tasks such as data access and reference management. We have also started to explore how we could realise the vision of the digital extended specimen, which would integrate specimens and associated data across multiple research infrastructures, allowing the investigation of wider-scale research questions. How this would be achieved is still the subject of discussion and experimentation: an open community will support a diverse range of approaches. In the construction of the digital extended specimen, we can envision useful activities operating at very different scales: from large-scale computational processes run at (or between) research infrastructures, to an ecosystem of lightweight tools that support link construction in context, closer to researchers. A toolset enabling in-context link construction could play a similar role to OpenRefine (which has been effective at democratising data linking between different sources), and supply valuable training data for the development of machine learning approaches. We will review the use of OpenRefine in the biodiversity informatics (and wider research) communities and examine the resources and working practices that facilitated the adoption of this tool.
We will showcase work towards “an extensible notebook for open science”, and we aim to open a discussion on how a link-aware editor for semi-structured data plus standard open science tools (i.e., those covered by training resources such as Software, Data and Author Carpentry) could be viewed as a lightweight alternative to traditional document production, just as OpenRefine is a viable alternative to many traditional spreadsheet use cases. Our aim is to enable researchers to develop the digital extended specimen at research time, but without being prescriptive about their workflow. We will conclude by discussing how this effort supports open science, showing how researchers are open to access the data needed to explore their area of study and to form their hypotheses, using well-recognised entities (specimens, names, people, institutions, citations, etc.), represented in data standards and accessed via open APIs, but also open to choose how they organise their work. In the authors' own domain (botany), such a tool will be fundamental to e-taxonomic undertakings to build an online reference system in which all known plant species are described, as well as to significant acceleration of parts of the taxonomic process to address the biodiversity crisis.

10.3897/biss.6.91123