User:Shawndouglas/sandbox/sublevel2

From LIMSWiki
Revision as of 16:52, 10 August 2018 by Shawndouglas (talk | contribs) (Created stub. Saving and adding more.)


Full article title: A data quality strategy to enable FAIR, programmatic access across large, diverse data collections for high performance data analysis
Journal: Informatics
Author(s): Evans, Ben; Druken, Kelsey; Wang, Jingbo; Yang, Rui; Richards, Clare; Wyborn, Lesley
Author affiliation(s): Australian National University
Primary contact: Email: Jingbo dot Wang at anu dot edu dot au
Editors: Ge, Mouzhi; Dohnal, Vlastislav
Year published: 2017
Volume and issue: 4(4)
Page(s): 45
DOI: 10.3390/informatics4040045
ISSN: 2227-9709
Distribution license: Creative Commons Attribution 4.0 International
Website: http://www.mdpi.com/2227-9709/4/4/45/htm
Download: http://www.mdpi.com/2227-9709/4/4/45/pdf (PDF)

Abstract

To ensure seamless, programmatic access to data for high-performance computing (HPC) and analysis across multiple research domains, it is vital to have a methodology for standardization of both data and services. At the Australian National Computational Infrastructure (NCI), we have developed a data quality strategy (DQS) that currently provides processes for: (1) consistency of data structures needed for a high-performance data (HPD) platform; (2) quality control (QC) through compliance with recognized community standards; (3) benchmark cases for operational performance tests; and (4) quality assurance (QA) of data through demonstrated functionality and performance across common platforms, tools, and services. By implementing the NCI DQS, we have seen progressive improvement in the quality and usefulness of the datasets across different subject domains, and have demonstrated the ease with which modern programmatic methods can be used to access the data, either in situ or via web services, for uses ranging from traditional analysis methods through to emerging machine learning techniques. To help increase data re-usability by broader communities, particularly in high-performance environments, the DQS is also used to identify the need for any extensions to the relevant international standards for interoperability and/or programmatic access.

Keywords: data quality, quality control, quality assurance, benchmarks, performance, data management policy, netCDF, high-performance computing, HPC, FAIR data

Introduction

References

Notes

This presentation is faithful to the original, with only a few minor changes to presentation. In some cases important information was missing from the references, and that information was added.