Journal:A robust, format-agnostic scientific data transfer framework

From LIMSWiki
Revision as of 22:05, 7 November 2016 by Shawndouglas (talk | contribs) (Created stub. Saving and adding more.)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to navigationJump to search
Full article title A robust, format-agnostic scientific data transfer framework
Journal Data Science Journal
Author(s) Hester, James
Author affiliation(s) Australian Nuclear Science and Technology Organisation
Primary contact Email: jxh at ansto dot gov dot au
Year published 2016
Volume and issue 15
Page(s) 12
DOI 10.5334/dsj-2016-012
ISSN 1683-1470
Distribution license Creative Commons Attribution 4.0 International
Website http://datascience.codata.org/articles/10.5334/dsj-2016-012/
Download http://datascience.codata.org/articles/10.5334/dsj-2016-012/galley/605/download/ (PDF)

Abstract

The olog approach of Spivak and Kent[1] is applied to the practical development of data transfer frameworks, yielding simple rules for construction and assessment of data transfer standards. The simplicity, extensibility and modularity of such descriptions allows discipline experts unfamiliar with complex ontological constructs or toolsets to synthesise multiple pre-existing standards, potentially including a variety of file formats, into a single overarching ontology. These ontologies nevertheless capture all scientifically-relevant prior knowledge, and when expressed in machine-readable form are sufficiently expressive to mediate translation between legacy and modern data formats. A format-independent programming interface informed by this ontology consists of six functions, of which only two handle data. Demonstration software implementing this interface is used to translate between two common diffraction image formats using such an ontology in place of an intermediate format.

Keywords: metadata, ontology, knowledge representation, data formats

Introduction

For most of scientific history, results and data were communicated using words and numbers on paper, with correct interpretation of this information reliant on the informal standards created by scholarly reference works, linguistic background, and educational traditions. Modern scientists increasingly rely on computers to perform such data transfer, and in this context the sender and receiver agree on the meaning of the data via a specification as interpreted by authors of the sending and receiving software. Recent calls to preserve raw data[2][3] and a growing awareness of a need to manage the explosion in the variety and quantity of data produced by modern large-scale experimental facilities (big data) have led to an increase in the number and coverage of these data transfer standards. Overlap in the areas of knowledge covered by each standard is increasingly common, either because the newer standards aim to replace older ad hoc or de facto standards, or because of natural expansion into the territory of ontologically “neighbouring” standards. One example of such overlap is found in single-crystal diffraction: the newer NeXus standard for raw data[4] partly covers the same ontological space as the older imgCIF standard[5], and both aim to replace the multiplicity of ad hoc standards for diffraction images.

References

  1. Spivak, D.I.; Kent, R.E. (2012). "Ologs: A categorical framework for knowledge representation". PLoS One 7 (1): e24274. doi:10.1371/journal.pone.0024274. PMC PMC3269434. PMID 22303434. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3269434. 
  2. Boulton, G. (2012). "Open your minds and share your results". Nature 486 (7404): 441. doi:10.1038/486441a. PMID 22739274. 
  3. Kroon-Batenburg, L.M.; Helliwell, J.R. (2014). "Experiences with making diffraction image data available: What metadata do we need to archive?". Acta Crystallographica Section D Biological Crystallography 70 (Pt. 10): 2502-9. doi:10.1107/S1399004713029817. PMC PMC4187998. PMID 25286836. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4187998. 
  4. "NXmx – Nexus: Manual 3.1 documentation". NeXusformat.org. NIAC. 2015. http://download.nexusformat.org/doc/html/classes/applications/NXmx.html. 
  5. Bernstein, H.J. (2006). "Classification and use of image data". International Tables for Crystallography G (3.7): 199–205. doi:10.1107/97809553602060000739. 

Notes

This presentation is faithful to the original, with only a few minor changes to presentation. In some cases important information was missing from the references, and that information was added. The original article lists references alphabetically, but this version — by design — lists them in order of appearance.