Full article title: Water, water, everywhere: Defining and assessing data sharing in academia
Journal: PLOS ONE
Author(s): Tuyl, Steven V.; Whitmire, Amanda L.
Author affiliation(s): Oregon State University, Stanford University
Primary contact: Email: steve dot vantuyl at oregonstate dot edu
Editors: Ouzounis, Christos A.
Year published: 2016
Volume and issue: 11(2)
Page(s): e0147942
DOI: 10.1371/journal.pone.0147942
ISSN: 1932-6203
Distribution license: Creative Commons Attribution 4.0 International
Website: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0147942
Download: http://journals.plos.org/plosone/article/asset?id=10.1371%2Fjournal.pone.0147942.PDF (PDF)

Abstract

Sharing of research data has begun to gain traction in many areas of the sciences in the past few years because of changing expectations from the scientific community, funding agencies, and academic journals. National Science Foundation (NSF) requirements for a data management plan (DMP) went into effect in 2011, with the intent of facilitating the dissemination and sharing of research results. Many projects that were funded during 2011 and 2012 should now have implemented the elements of the data management plans required for their grant proposals. In this paper we define "data sharing" and present a protocol for assessing whether data have been shared and how effective the sharing was. We then evaluate the data sharing practices of researchers funded by the NSF at Oregon State University in two ways: by attempting to discover project-level research data using the associated DMP as a starting point, and by examining data sharing associated with journal articles that acknowledge NSF support. Sharing at both the project level and the journal article level was not carried out in the majority of cases, and when sharing was accomplished, the shared data were often of questionable usability due to access, documentation, and formatting issues. We close the article by offering recommendations for how data producers, journal publishers, data repositories, and funding agencies can facilitate the process of sharing data in a meaningful way.

Introduction

“It is one thing to encourage data deposition and resource sharing through guidelines and policy statements, and quite another to ensure that it happens in practice.”[1]

In 2011, the National Science Foundation (NSF) reaffirmed a longstanding requirement for the dissemination and sharing of research results by adding a requirement for the submission of a data management plan (DMP) with grant proposals.[2] DMPs are intended to explain how researchers will address the requirement that they will “share with other researchers, at no more than incremental cost and within a reasonable time, the primary data, samples, physical collections and other supporting materials created or gathered in the course of work under NSF grants. Grantees are expected to encourage and facilitate such sharing.”[3] The expectation that NSF-funded researchers will share data has been in place since at least 1995, the year of the oldest NSF Grant Proposal Guide that we could locate in the NSF online archive[4], but the requirement is likely much older. A memorandum put forth by the White House Office of Science and Technology Policy (OSTP) in 2013 aimed at ensuring public access to the results of federally funded research[5], and the subsequent responses from funding agencies, lend credence to the notion that federal funding agencies are now beginning to take seriously the idea that federally funded data are products that should be managed and shared in order to maximize scientific output from federal investments.

While the NSF does not currently require sharing the dataset that underlies an article at the time of publication, many scientific journals have begun to require or request data sharing as part of the publication process.[6] This move has been motivated by recent high-profile cases of scientific misconduct related to falsified or poorly analyzed data[7] and the increasing acknowledgment among scientific communities that data sharing should be part of the process of communicating research results.[8][9][10][11]

A challenge has arisen, though, in defining data sharing in a way that is useful to a broad spectrum of data producers and consumers. The NSF, for example, has been reluctant to define not only data sharing or data sharing best practices, but the meaning of data itself, insisting that these definitions should “be determined by the community of interest through the process of peer review and program management,” rather than being mandated.[12] This lack of guidance has caused some confusion among researchers trying to share data, and among service providers attempting to offer venues for data sharing. We have begun to see communities of practice offering guidance on best practices for data sharing from individual research domains (for examples see references [1], [13], [14] and DataONE.org) and from broad-level organizations such as Force11[15] and Kratz and Strasser.[9] While many of these resources are helpful for understanding how to effectively share data, we have yet to see a rubric for evaluating how well a dataset is shared and assessing where improvements should be made to facilitate more effective sharing.

In this study we set a definition of data sharing and create a rubric for evaluating how well data have been shared at two significant levels: for research projects as a whole and as a dataset that underlies a journal article. We focus on research projects because of the NSF and OSTP focus on project-level requirements for data sharing (as cited above), and on journal articles because these represent a logical and common venue for data sharing.[16] We use our rubric to evaluate data sharing from NSF-funded projects that went into effect after the requirement for a data management and sharing plan was put in place. Likewise, we use our rubric to evaluate data sharing in journal articles that originate from NSF-funded research projects that are subject to said policy. We conclude by offering guidance to authors, journals, data repositories, and funding agencies on best practices for facilitating data sharing.
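
To make the idea of rubric-based assessment concrete, here is a minimal, hypothetical sketch in Python of how such a rubric could be scored. The criterion names (discoverability, accessibility, documentation, file format), the 0–2 scale, and the RubricAssessment class are illustrative assumptions for this overview only, loosely echoing the access, documentation, and formatting issues noted in the abstract; they are not the authors' actual rubric, which is defined later in the full article.

# Hypothetical sketch of a data sharing rubric expressed in code.
# Criterion names and the 0-2 scale are illustrative assumptions,
# not the rubric actually used in this study.
from dataclasses import dataclass, field

CRITERIA = ["discoverability", "accessibility", "documentation", "file_format"]
SCALE = {0: "not met", 1: "partially met", 2: "fully met"}

@dataclass
class RubricAssessment:
    """Scores one unit of sharing: a research project or an article's dataset."""
    unit_id: str                    # e.g., an NSF award number or an article DOI
    scores: dict = field(default_factory=dict)

    def score(self, criterion: str, value: int) -> None:
        if criterion not in CRITERIA or value not in SCALE:
            raise ValueError("unknown criterion or score value")
        self.scores[criterion] = value

    def total(self) -> int:
        # Unscored criteria count as 0 ("not met").
        return sum(self.scores.get(c, 0) for c in CRITERIA)

    def summary(self) -> str:
        parts = [f"{c}: {SCALE[self.scores.get(c, 0)]}" for c in CRITERIA]
        return f"{self.unit_id} ({self.total()}/{2 * len(CRITERIA)}): " + "; ".join(parts)

if __name__ == "__main__":
    # Hypothetical example: a dataset that is findable and downloadable
    # but thinly documented and shared only in a proprietary format.
    a = RubricAssessment("doi:10.xxxx/example-dataset")
    a.score("discoverability", 2)
    a.score("accessibility", 2)
    a.score("documentation", 1)
    a.score("file_format", 0)
    print(a.summary())

Scoring every project or article against the same fixed set of criteria is what makes it possible to compare sharing quality across the project level and the journal article level, as reported in the abstract.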


Data availability

All raw and processed data for this paper are shared at ScholarsArchive@OSU - Oregon State University's repository for scholarly materials. Data may be accessed at: http://dx.doi.org/10.7267/N9W66HPQ.

Funding

Publication of this article in an open access journal was funded by the Oregon State University Libraries & Press Open Access Fund. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests

The authors have declared that no competing interests exist.

References

  1. Schofield, P.N.; Bubela, T.; Weaver, T. et al. (2009). "Post-publication sharing of data and tools". Nature 461 (7261): 171–3. doi:10.1038/461171a. PMID 19741686. 
  2. National Science Foundation (January 2011). "Significant Changes to the GPG". GPG Subject Index. National Science Foundation. http://www.nsf.gov/pubs/policydocs/pappguide/nsf11001/gpg_sigchanges.jsp. 
  3. National Science Foundation (October 2012). "Chapter VI - Other Post Award Requirements and Considerations, section D.4.b". Proposal and Award Policies and Procedures Guide: Part II - Award & Administration Guide. National Science Foundation. http://www.nsf.gov/pubs/policydocs/pappguide/nsf13001/aag_6.jsp#VID4. 
  4. National Science Foundation (17 August 1995). "Grant Proposal Guide". National Science Foundation. http://www.nsf.gov/publications/pub_summ.jsp?ods_key=nsf9527&org=NSF. 
  5. "Memorandum for the heads of executive departments and agencies: Increasing access to the results of federally funded scientific research" (PDF). Executive Office of the President, Office of Science and Technology Policy. 22 February 2013. https://www.whitehouse.gov/sites/default/files/microsites/ostp/ostp_public_access_memo_2013.pdf. 
  6. Sturges, P.; Bamkin, M.; Anders, J.H.S. et al. (2015). "Research data sharing: Developing a stakeholder-driven model for journal policies". Journal of the Association for Information Science and Technology 66 (12): 2445–2455. doi:10.1002/asi.23336. 
  7. The Editorial Board (1 June 2015). "Scientists Who Cheat". The New York Times. The New York Times Company. http://www.nytimes.com/2015/06/01/opinion/scientists-who-cheat.html. 
  8. Martone, M.E. (2014). "Brain and Behavior: We want you to share your data". Brain and Behavior 4 (1): 1–3. doi:10.1002/brb3.192. PMC 3937699. PMID 24653948. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3937699. 
  9. Kratz, J.; Strasser, C. (2014). "Data publication consensus and controversies". F1000Research 3: 94. doi:10.12688/f1000research.3979.3. PMC 4097345. PMID 25075301. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4097345. 
  10. McNutt, M. (2015). "Data, eternal". Science 347 (6217): 7. doi:10.1126/science.aaa5057. PMID 25554763. 
  11. Bloom, T.; Ganley, E.; Winker, M. (2014). "Data Access for the Open Access Literature: PLOS's Data Policy". PLOS Medicine 11 (2): e1001607. doi:10.1371/journal.pmed.1001607. PMC 3934818. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3934818. 
  12. National Science Foundation (30 November 2010). "Data Management & Sharing Frequently Asked Questions (FAQs)". National Science Foundation. http://www.nsf.gov/bfa/dias/policy/dmpfaqs.jsp. 
  13. White, E.P.; Baldridge, E.; Brym, Z.T. et al. (2013). "Nine simple ways to make it easier to (re)use your data". Ideas in Ecology and Evolution 6 (2): 1–10. doi:10.4033/iee.2013.6b.6.f. 
  14. Kervin, K.E.; Michener, W.K.; Cook, R.B. (2013). "Common Errors in Ecological Data Sharing". Journal of eScience Librarianship 2 (2): e1024. doi:10.7191/jeslib.2013.1024. 
  15. "The FAIR Data Principles - For Comment". Force11. https://www.force11.org/group/fairgroup/fairprinciples. Retrieved 10 July 2015. 
  16. Ferguson, L. (3 November 2014). "How and why researchers share data (and why they don't)". Wiley Exchanges. John Wiley & Sons, Inc. https://hub.wiley.com/community/exchanges/discover/blog/2014/11/03/how-and-why-researchers-share-data-and-why-they-dont. Retrieved 7 July 2015. 

Notes

This version is faithful to the original, with only a few minor changes to presentation. In some cases important information was missing from the references, and that information was added.