Difference between revisions of "Journal:Kadi4Mat: A research data infrastructure for materials science"

Revision as of 22:29, 22 February 2021

Full article title Kadi4Mat: A research data infrastructure for materials science
Journal Data Science Journal
Author(s) Brandt, Nico; Griem, Lars; Herrmann, Christoph; Schoof, Ephraim; Tosato, Giovanna; Zhao, Yinghan;
Zschumme, Philipp; Selzer, Michael
Author affiliation(s) Karlsruhe Institute of Technology, Karlsruhe University of Applied Sciences, Helmholtz Institute Ulm
Primary contact Email: nico dot brandt at kit dot edu
Year published 2021
Volume and issue 20(1)
Article # 8
DOI 10.5334/dsj-2021-008
ISSN 1683-1470
Distribution license Creative Commons Attribution 4.0 International
Website https://datascience.codata.org/articles/10.5334/dsj-2021-008/
Download https://datascience.codata.org/articles/10.5334/dsj-2021-008/galley/1048/download/ (PDF)

Abstract

The concepts and current developments of a research data infrastructure for materials science are presented, extending and combining the features of an electronic laboratory notebook (ELN) and a repository. The objective of this infrastructure is to combine structured data storage and data exchange with documented and reproducible data analysis and visualization, ultimately leading to the publication of the data. In this way, researchers can be supported throughout the entire research process. The software is being developed as a web-based and desktop-based system, offering both a graphical user interface (GUI) and a programmatic interface. The development focuses on integrating technologies and systems based on both established and new concepts. Due to the heterogeneous nature of materials science data, the current features are kept mostly generic, and the structuring of the data is largely left to the users. As a result, the research data infrastructure can be extended to other disciplines in the future. The source code of the project is publicly available under the permissive Apache 2.0 license.

Keywords: research data management, electronic laboratory notebook, repository, open source, materials science

Introduction

In the engineering sciences, the handling of digital research data plays an increasingly important role in all fields of application.[1] This is especially the case because of the growing amount of data obtained from experiments and simulations.[2] The extraction of knowledge from these data is referred to as a data-driven, fourth paradigm of science, filed under the keyword "data science."[3] This is particularly true in materials science, as the research and understanding of new materials are becoming more and more complex.[4] Without suitable analysis methods, the ever-growing amount of data will no longer be manageable. For appropriate data analyses to run smoothly, the structured storage of research data and associated metadata is essential. Specifically, uniform research data management is needed, which is made possible by appropriate infrastructures such as research data repositories. In addition to uniform data storage, such systems can help to overcome inter-institutional hurdles in data exchange, compare theoretical and experimental data, and provide reproducible workflows for data analysis. Furthermore, linking the data with persistent identifiers enables other researchers to directly reference them in their work.

In particular, repositories for the storage and internal or public exchange of research data are becoming increasingly widespread. The publication of such data, either on its own or as a supplement to a text publication, is increasingly encouraged or sometimes even required.[5] In order to find a suitable repository, services such as re3data.org[6] or FAIRsharing[7] are available. These services also make it possible to find subject-specific repositories for materials science data. Two well-known examples are the Materials Project[8] and the NOMAD Repository.[9] Indexed repositories are usually hosted centrally or institutionally and are mostly used for the publication of data. However, some of the underlying systems can also be installed by the user, e.g., for internal use within individual research groups. Additionally, this allows full control over stored data as well as internal data exchanges, if this function is not already part of the repository. In this respect, open-source systems are particularly important, as they provide independence from vendors and open up the possibility of modifying the existing functionality or adding features, sometimes via built-in plug-in systems. Examples of such systems are CKAN[10], Dataverse[11], DSpace[12], and Invenio[13], the last of which is the basis of Zenodo.[14] The listed repositories are all generic and represent only a selection of the existing open-source systems.[15]

In addition to repositories, a second type of system increasingly being used in experiment-oriented research areas is the electronic laboratory notebook (ELN).[16] Nowadays, the functionality of ELNs goes far beyond the simple replacement of paper-based laboratory notebooks, and can also include aspects such as data analysis, as seen, for example, in Galaxy[17] or Jupyter Notebook.[18] Both systems focus primarily on providing accessible and reproducible computational research data. However, the boundary between unstructured and structured data is becoming increasingly blurred, the latter being traditionally only found in laboratory information management systems (LIMS).[19][20][21] Most existing ELNs are domain-specific and limited to research disciplines such as biology or chemistry.[21] According to current knowledge, a system specifically tailored to materials science does not exist. For ELNs, there are also open-source systems such as eLabFTW[22], sciNote[23], or Chemotion.[24] Compared to the repositories, however, the selection of ELNs is smaller. Furthermore, only the first two systems mentioned are generic.



References

  1. Sandfeld, S.; Dahmen, T.; Fischer, F.O.R. et al. (2018). "Strategiepapier - Digitale Transformation in der Materialwissenschaft und Werkstofftechnik". Deutsche Gesellschaft für Materialkunde e.V. https://www.tib.eu/en/search/id/TIBKAT%3A1028913559/. 
  2. Hey, T.; Trefethen, A. (2003). "Chapter 36: The Data Deluge: An e‐Science Perspective". In Berman, F.; Fox, G.; Hey, T. (eds.). Grid Computing: Making the Global Infrastructure a Reality. John Wiley & Sons, Ltd. doi:10.1002/0470867167.ch36. ISBN 9780470867167.
  3. Hey, T.; Tansley, S.; Tolle, K. (2009). The Fourth Paradigm: Data-Intensive Scientific Discovery. Microsoft Research. ISBN 9780982544204. https://www.microsoft.com/en-us/research/publication/fourth-paradigm-data-intensive-scientific-discovery/. 
  4. Hill, J.; Mulholland, G.; Persson, K. et al. (2016). "Materials science with large-scale data and informatics: Unlocking new opportunities". MRS Bulletin 41 (5): 399–409. doi:10.1557/mrs.2016.93. 
  5. Naughton, L.; Kernohan, D. (2016). "Making sense of journal research data policies". Insights 29 (1): 84–9. doi:10.1629/uksg.284. 
  6. Pampel, H.; Vierkant, P.; Scholze, F. et al. (2013). "Making Research Data Repositories Visible: The re3data.org Registry". PLoS One 8 (11): e78080. doi:10.1371/journal.pone.0078080. 
  7. Sansone, S.-A.; McQuilton, P.; Rocca-Serra, P. et al. (2019). "FAIRsharing as a community approach to standards, repositories and policies". Nature Biotechnology 37: 358–67. doi:10.1038/s41587-019-0080-8. 
  8. Jain, A.; Ong, S.P.; Hautier, G. et al. (2013). "Commentary: The Materials Project: A materials genome approach to accelerating materials innovation". APL Materials 1 (1): 011002. doi:10.1063/1.4812323. 
  9. Draxl, C.; Scheffler, M. (2018). "NOMAD: The FAIR concept for big data-driven materials science". MRS Bulletin 43 (9): 676–82. doi:10.1557/mrs.2018.208. 
  10. "CKAN". CKAN Association. https://ckan.org/. Retrieved 19 May 2020. 
  11. King, G. (2007). "An Introduction to the Dataverse Network as an Infrastructure for Data Sharing". Sociological Methods & Research 36 (2): 173–99. doi:10.1177/0049124107306660. 
  12. Smith, M.; Barton, M.; Bass, M. et al. (2003). "DSpace: An Open Source Dynamic Digital Repository". D-Lib Magazine 9 (1). doi:10.1045/january2003-smith. 
  13. "Invenio". CERN. https://invenio-software.org/. Retrieved 19 May 2020. 
  14. European Organization for Nuclear Research (2013). "Zenodo". CERN. doi:10.25495/7GXK-RD71. https://www.zenodo.org/. 
  15. Amorim, R.C.; Castro, J.A.; da Silva, J.R. et al. (2017). "A comparison of research data management platforms: Architecture, flexible metadata and interoperability". Universal Access in the Information Society 16: 851–62. doi:10.1007/s10209-016-0475-y. 
  16. Rubacha, M.; Rattan, A.K.; Hosselet, S.C. (2011). "A Review of Electronic Laboratory Notebooks Available in the Market Today". SLAS Technology 16 (1). doi:10.1016/j.jala.2009.01.002. 
  17. Afgan, E.; Baker, D.; Batut, B. et al. (2018). "The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2018 update". Nucleic Acids Research 46 (W1). doi:10.1093/nar/gky379. 
  18. Kluyver, T.; Ragan-Kelley, B.; Pérez, F. et al. (2016). "Jupyter Notebooks—A publishing format for reproducible computational workflows". In Loizides, F.; Schmidt, B. (eds.). Positioning and Power in Academic Publishing: Players, Agents and Agendas. IOS Press. pp. 87–90. doi:10.3233/978-1-61499-649-1-87.
  19. Bird, C.L.; Willoughby, C.; Frey, J.G. (2013). "Laboratory notebooks in the digital era: The role of ELNs in record keeping for chemistry and other sciences". Chemical Society Reviews 42 (20): 8157–8175. doi:10.1039/C3CS60122F. 
  20. Elliott, M.H. (2009). "Thinking Beyond ELN". Scientific Computing 26 (6): 6–10. Archived from the original on 20 May 2011. https://web.archive.org/web/20110520065023/http://www.scientificcomputing.com/articles-IN-Thinking-Beyond-ELN-120809.aspx. 
  21. Taylor, K.T. (2006). "The status of electronic laboratory notebooks for chemistry and biology". Current Opinion in Drug Discovery and Development 9 (3): 348–53. PMID 16729731.
  22. Carpi, N.; Minges, A.; Piel, M. (2017). "eLabFTW: An open source laboratory notebook for research labs". Journal of Open Source Software 2 (12): 146. doi:10.21105/joss.00146. 
  23. "SciNote". SciNote LLC. https://www.scinote.net/. Retrieved 21 May 2020. 
  24. Tremouilhac, P.; Nguyen, A.; Huang, Y.-C. et al. (2017). "Chemotion ELN: An open source electronic lab notebook for chemists in academia". Journal of Cheminformatics 9: 54. doi:10.1186/s13321-017-0240-0. 

Notes

This version is faithful to the original, with only a few minor changes to presentation. In some cases important information was missing from the references, and that information was added. The original article lists references in alphabetical order; however, this version lists them in order of appearance, by design.