Unmet needs for analyzing biological big data: A survey of 704 NSF principal investigators
Full article title | Unmet needs for analyzing biological big data: A survey of 704 NSF principal investigators |
---|---|
Journal | PLOS Computational Biology |
Author(s) | Barone, Lindsay; Williams, Jason; Micklos, David |
Author affiliation(s) | Cold Spring Harbor Laboratory |
Primary contact | Email: lbarone at cshl dot edu |
Editors | Ouellette, Francis |
Year published | 2017 |
Volume and issue | 13(11) |
Page(s) | e1005858 |
DOI | 10.1371/journal.pcbi.1005755 |
ISSN | 1553-7358 |
Distribution license | Creative Commons Attribution 4.0 International |
Website | http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005755 |
Download | http://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1005755&type=printable (PDF) |
Abstract
In a 2016 survey of 704 National Science Foundation (NSF) Biological Sciences Directorate principal investigators (BIO PIs), nearly 90% indicated they are currently or will soon be analyzing large data sets. BIO PIs considered a range of computational needs important to their work, including high-performance computing (HPC), bioinformatics support, multistep workflows, updated analysis software, and the ability to store, share, and publish data. Previous studies in the United States and Canada emphasized infrastructure needs. However, BIO PIs said the most pressing unmet needs are training in data integration, data management, and scaling analyses for HPC, acknowledging that data science skills will be required to build a deeper understanding of life. This portends a growing data knowledge gap in biology and challenges institutions and funding agencies to redouble their support for computational training in biology.
Author summary
Our computational needs assessment of 704 principal investigators (PIs), with grants from the National Science Foundation (NSF) Biological Sciences Directorate (BIO), confirmed that biology is awash with big data. Nearly 90% of BIO PIs said they are currently or will soon be analyzing large data sets. They considered a range of computational needs important to their work, including high-performance computing (HPC), bioinformatics support, multistep workflows, updated analysis software, and the ability to store, share, and publish data. However, a majority of PIs—across bioinformatics and other disciplines, large and small research groups, and four NSF BIO programs—said their institutions are not meeting nine of 13 needs. Training on integration of multiple data types (89%), on data management and metadata (78%), and on scaling analysis to cloud/HPC (71%) were the three greatest unmet needs. Hardware is not the problem; data storage and HPC ranked lowest on their list of unmet needs. The problem is the growing gap between the accumulation of big data and researchers’ knowledge about how to use it effectively.
Declarations
Acknowledgements
The authors wish to thank Bob Freeman and Christina Koch of the ACI-REF project for helpful discussions and references during the development of the survey.
Funding
This study is an Education, Outreach and Training (EOT) activity of CyVerse, an NSF-funded project to develop a “cyber universe” to support life sciences research (DBI-0735191 and DBI-1265383). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests
The authors have declared that no competing interests exist.
Notes
This version is faithful to the original, with only a few minor changes to presentation. Where important information was missing from the references, it was added.