Difference between revisions of "Journal:Building open access to research (OAR) data infrastructure at NIST"

From LIMSWiki
Revision as of 17:39, 26 August 2019

Full article title: Building open access to research (OAR) data infrastructure at NIST
Journal: Data Science Journal
Author(s): Greene, Gretchen; Plante, Raymond; Hanisch, Robert
Author affiliation(s): National Institute of Standards and Technology
Primary contact: Email: gretchen dot greene at nist dot gov
Year published: 2019
Volume and issue: 18(1)
Page(s): 30
DOI: 10.5334/dsj-2019-030
ISSN: 1683-1470
Distribution license: Creative Commons Attribution 4.0 International
Website: https://datascience.codata.org/articles/10.5334/dsj-2019-030/
Download: https://datascience.codata.org/articles/10.5334/dsj-2019-030/galley/861/download/ (PDF)

Abstract

At the U.S. National Institute of Standards and Technology (NIST), a National Metrology Institute (NMI), scientists, engineers, and technology experts conduct research across a full spectrum of physical science domains. NIST is a non-regulatory agency within the U.S. Department of Commerce with a mission to promote U.S. innovation and industrial competitiveness by advancing measurement science, standards, and technology in ways that enhance economic security and improve our quality of life. NIST research results in the production and distribution of standard reference materials, calibration services, and datasets. These are generated from a wide range of complex laboratory instrumentation, expert analyses, and calibration processes. In response to a government open data policy, and in collaboration with the broader research community, NIST has developed a federated Open Access to Research (OAR) scientific data infrastructure aligned with FAIR (findable, accessible, interoperable, reusable) data principles. Through the OAR initiatives, NIST's Material Measurement Laboratory Office of Data and Informatics (ODI) recently released a new scientific data discovery portal and public data repository. These science-oriented applications provide dissemination and public access for data from across the broad spectrum of NIST research disciplines, including chemistry, biology, materials science (such as crystallography and nanomaterials), physics, disaster resilience, cyberinfrastructure, communications, forensics, and others. NIST's public data consist of carefully curated Standard Reference Data, legacy high-value data, and new research data publications. The repository is thus evolving both in content and features as the nature of research progresses. Implementation of the OAR infrastructure is key to NIST's role in sharing high-integrity, reproducible research for measurement science in a rapidly changing world.

Keywords: data repository, FAIR, research metadata, metrology, data portal, government

Introduction

NIST research is predominantly characterized as “long tail” in terms of the data produced, i.e., small datasets that are highly varied in topic and content.[1] This is colloquially described as “a mile wide and an inch deep” and may be classified as big data in the context of variety and veracity. Newer laboratory instruments such as nuclear magnetic resonance spectrometers, electron microscopes, synchrotron beamlines, and high-performance computers usher NIST into the realm of managing the velocity and volume of big data. Furthermore, new strategic initiatives in the areas of artificial intelligence (AI) require an infrastructure designed to support digital mining and transformation. Management and exchange of the underlying research domain-specific data with both internal and external communities are important considerations for the OAR architecture and implementation.

The overarching goal of OAR is to deliver a robust research data infrastructure to share the results of NIST research with the community at large. Our strategy for achieving this goal involves collaborative data science as demonstrated through usage statistics from astronomical archives’ data discovery and access patterns.[2] Organizations face many challenges in striving to balance rapid advancements in technology and data-driven research with internal operational costs and constraints. To meet these challenges, NIST assembled a diverse group of experts with key leaders and engaged stakeholders via cross-organizational advisors. This resulted in a joint effort to build an integrated system engineered to support data workflow processes, systems infrastructure, and public dissemination with secure, publicly accessible platforms for scientific collaboration.

At the onset of the OAR project, priority was placed on developing a system that would allow us to comply with government open data policy.[3] This resulted in a baseline Minimum Viable Product (MVP), delivering a NIST public data listing (PDL) which enforces adherence to a new government data standard semantic model, the Project Open Data (POD) schema. The NIST PDL continues to be routinely harvested by the Department of Commerce and made available through the U.S. data.gov web portal, which hosts records of all POD-compliant government public datasets. Following enactment of the OPEN Government Data Act[4], our OAR infrastructure will be updated further to advance its compliance.

However, to achieve FAIR (findable, accessible, interoperable, reusable) capabilities[5], the OAR infrastructure supporting a science data portal and public data repository was designed to extend the limited MVP. The extension includes standard open formats, protocols, and demonstrated best practices in data management and publication, in order to harness the full potential for community reuse of NIST research data products. The data portal provides both discovery and data access (distribution) capabilities through a science-oriented web user interface and REST (Representational State Transfer)[6] application programming interfaces (APIs). The repository enables interoperability for scientific disciplines such as crystallography, biology, and chemistry, as shown in the organizational context (Figure 1), by supporting programmatic access to semantically rich data structures captured through the NIST data publication process. Key to the reuse of these data is the implementation of data citation for each of the records, along with the inclusion of provenance metadata and a link to usage policy.

Figure 1. NIST OAR organizational context. NIST laboratory sites are in Gaithersburg, Maryland and Boulder, Colorado in the U.S., in addition to partner remote site locations as listed in the figure.

References

  1. "Long Tail of Data: e-IRG Task Force Report" (PDF). e-IRG Secretariat. September 2016. http://e-irg.eu/documents/10920/238968/LongTailOfData2016.pdf. Retrieved 29 January 2019. 
  2. White, R.L.; Accomazzi, A.; Berriman, G.B. et al. (2009). "The High Impact of Astronomical Data Archives". Astro2010: The Astronomy and Astrophysics Decadal Survey: 64. https://ui.adsabs.harvard.edu/abs/2009astro2010P..64W/abstract. 
  3. Burwell, S.M.; VanRoekel, S.; Park, T.; Mancini, D.J. (9 May 2013). "Open Data Policy—Managing Information as an Asset" (PDF). M-13-13 Memorandum for the Heads of Executive Departments and Agencies. https://obamawhitehouse.archives.gov/sites/default/files/omb/memoranda/2013/m-13-13.pdf. Retrieved 20 April 2019. 
  4. "Title II - Open Government Data Act". HR 4174: Foundations for Evidence-Based Policymaking Act of 2018. 115th Congress. 2018. https://www.congress.gov/bill/115th-congress/house-bill/4174/text#toc-H8E449FBAEFA34E45A6F1F20EFB13ED95. Retrieved 29 January 2019. 
  5. Wilkinson, M.D.; Dumontier, M.; Aalbersberg, I.J. et al. (2016). "The FAIR Guiding Principles for scientific data management and stewardship". Scientific Data 3: 160018. doi:10.1038/sdata.2016.18. PMC 4792175. PMID 26978244. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4792175. 
  6. Booth, D.; Haas, H.; McCabe, F. et al. (11 February 2004). "3.1.3 Relationship to the World Wide Web and REST Architectures". Web Services Architecture. W3C. https://www.w3.org/TR/2004/NOTE-ws-arch-20040211/#relwwwrest. 

Notes

This presentation is faithful to the original, with only a few minor changes to formatting. In some cases important information was missing from the references, and that information was added. The original article lists references alphabetically, but this version, by design, lists them in order of appearance. The original used Wikipedia as a source for "REST," which is frowned upon; we replaced it with the first source cited in the Wikipedia entry.