Journal:Making data and workflows findable for machines

From LIMSWiki
Full article title: Making data and workflows findable for machines
Journal: Data Intelligence
Author(s): Weigel, Tobias; Schwardmann, Ulrich; Klump, Jens; Bendoukha, Sofiane; Quick, Robert
Author affiliation(s): Deutsches Klimarechenzentrum, Gesellschaft für wissenschaftliche Datenverarbeitung Göttingen, CSIRO, Indiana University Bloomington
Primary contact: weigel at dkrz dot de (email)
Year published: 2020
Volume and issue: 2(1–2)
Page(s): 40–46
DOI: 10.1162/dint_a_00026
ISSN: 2641-435X
Distribution license: Creative Commons Attribution 4.0 International
Website: https://www.mitpressjournals.org/doi/full/10.1162/dint_a_00026
Download: https://www.mitpressjournals.org/doi/pdf/10.1162/dint_a_00026 (PDF)

Abstract

Research data management currently faces a huge increase in the number of data objects, along with a growing variety of types (data types, formats) and of workflows by which those objects need to be managed across their lifecycle by data infrastructures. Researchers want to shorten the workflows from data generation to analysis and publication, and the full workflow needs to become transparent to multiple stakeholders, including research administrators and funders. This poses challenges for research infrastructures and user-oriented data services, which must not only make data and workflows findable, accessible, interoperable, and reusable (FAIR), but also do so in a way that leverages machine support for better efficiency. One primary need yet to be addressed is findability, and achieving better findability benefits other aspects of data and workflow management. In this article, we describe how machine capabilities can be extended to make workflows more findable, in particular by leveraging the Digital Object Architecture, common object operations, and machine learning techniques.

Keywords: findability, workflows, automation, FAIR data, data infrastructures, data services

Introduction

In several scientific disciplines, the number, size, and variety of data objects to be managed are growing. Examples of particular interest to the challenges discussed in this article include climate modeling[1], geophysics[2], and “omics”-based scientific approaches.[3] The supporting data infrastructures and services are challenged to offer adequate solutions, and researchers are looking toward increased automation in their processes to cope with these demands. Aspects of automation are intrinsic to making data and workflows findable, accessible, interoperable, and reusable according to the FAIR guiding principles.[4] This article highlights the automation steps required to identify data objects, associate them with metadata, and make both the data and the processes that generated them more findable. Persistent identifiers, machine processes with autonomous decision-making capability, and machine-actionable metadata are critical elements for practical solutions.


References

  1. Balaji, V.; Taylor, K.E.; Juckes, M. et al. (2018). "Requirements for a global data infrastructure in support of CMIP6". Geoscientific Model Development 11 (9): 3659–3680. doi:10.5194/gmd-11-3659-2018. 
  2. Squire, G.; Wu, M.; Friedrich, C. et al. (2018). "IN43C-0903: Scientific Software Solution Centre for Discovering, Sharing and Reusing Research Software". Proceedings from the 2018 AGU Fall Meeting. https://agu.confex.com/agu/fm18/meetingapp.cgi/Paper/459873. 
  3. Goble, C.; Cohen-Boulakia, S.; Soiland-Reyes, S. et al. (2020). "FAIR Computational Workflows". Data Intelligence 2 (1–2): 108–121. doi:10.1162/dint_a_00033.
  4. Mons, B.; Neylon, C.; Velterop, J. et al. (2017). "Cloudy, increasingly FAIR; revisiting the FAIR Data guiding principles for the European Open Science Cloud". Information Services & Use 37 (1): 49–56. doi:10.3233/ISU-170824. 

Notes

This presentation is faithful to the original, with only a few minor changes to formatting. In some cases important information was missing from the references, and that information was added.