Journal:Developing a bioinformatics program and supporting infrastructure in a biomedical library

From LIMSWiki
Full article title: Developing a bioinformatics program and supporting infrastructure in a biomedical library
Journal: Journal of eScience Librarianship
Author(s): Hosburgh, Nathan
Author affiliation(s): National Institutes of Health
Primary contact (email): Nathan dot Hosburgh at nih dot gov
Year published: 2018
Volume and issue: 7(2)
Page(s): 2
DOI: 10.7191/jeslib.2018.1129
ISSN: 2161-3974
Distribution license: Creative Commons Attribution 4.0 International
Website: https://escholarship.umassmed.edu/jeslib/vol7/iss2/2/
Download: https://escholarship.umassmed.edu/cgi/viewcontent.cgi?article=1129&context=jeslib (PDF)

Abstract

Background: Over the last couple of decades, the field of bioinformatics has helped spur medical discoveries that offer a better understanding of the genetic basis of disease, which in turn improve public health and save lives. Concomitantly, support requirements for molecular biology researchers have grown in scope and complexity, incorporating specialized resources, technologies, and techniques.

Case presentation: To address this specific need among National Institutes of Health (NIH) intramural researchers, the NIH Library hired an expert bioinformatics trainer and consultant with a PhD in biochemistry to implement a bioinformatics support program. This study traces the program from its inception in 2009 to its present form. Discussion involves the particular skills of program staff, development of content, collection of resources, associated technology, assessment, and the impact of the program on the NIH community.

Conclusion: Based on quantitative and qualitative data, the bioinformatics support program has been heavily used and appreciated by researchers. Continued success will depend on filling key staff positions, building on the existing program infrastructure, and keeping abreast of developments within the field to remain relevant and in touch with the medical research community utilizing bioinformatics services.

Keywords: bioinformatics, bioinformatics support program, biomedical library

Introduction and background

In the context of an ever-expanding information landscape, those involved in biomedical research have become increasingly reliant on the use of bioinformatics to analyze large amounts of complex data. Bioinformatics is an interdisciplinary field involving molecular biology and genetics, computer science, mathematics, and statistics. Large-scale biological problems, such as modeling biological processes, are addressed from a computational point of view so that inferences can be made from aggregate data.[1] As stated by Rein[2], “Bioinformatics research advances in such areas as gene therapy, personalized medicine, drug discovery, the inherited basis of complex diseases influenced by multiple gene/environmental interactions, and the identification of the molecular targets for environmental mutagens and carcinogens have wide ranging implications for the medical and consumer health sectors.” The field of bioinformatics has seen explosive growth since the mid-1990s, spurred by the Human Genome Project and rapid advances in DNA sequencing technology.

Despite the importance of bioinformatics in advancing scientific research, it has been observed that most researchers in the life sciences lack the training needed to take advantage of the array of bioinformatics tools and resources available to them, largely because the field is interdisciplinary and evolves rapidly.[3] Extensive technological changes, new databases and software, and changes in the types and quantity of data combine to pose formidable challenges to the uninitiated. Likewise, few biomedical librarians have the training, experience, or subject expertise required to provide robust bioinformatics services such as interpretation of molecular sequence database search results, pathway analysis, and analysis of data from the latest biotechnology advances. Therefore, some institutions have recruited individuals with advanced degrees in biology or biochemistry and a strong background in bioinformatics to assess the molecular biological information needs of researchers and design strategies to enhance library resources and services in the areas of consultation, education, and resource development.[2][4][5]

As library involvement in bioinformatics has grown, particularly across research and clinical settings, the role of the health information professional as “informationist” has become more prominent. Specifically, in the “bioinformaticist” role, the information professional possesses advanced subject knowledge in information science as well as applied technical and biological skills.[6][7] Those responsible for building library bioinformatics programs must discern user needs and skills, identify existing services, develop plans for new services, recruit and train specialized staff, establish collaborations with other centers at their institutions, and assess the success of such programs.[8][9] If executed effectively, library involvement in bioinformatics support services has the potential to contribute to the process of scientific discovery and save the research community valuable time and money.

Study purpose

The purpose of this case study is to outline the process of creating, developing, and assessing a bioinformatics support program at the National Institutes of Health in Bethesda, Maryland.

Case presentation

The National Institutes of Health (NIH), a part of the U.S. Department of Health and Human Services, is the nation’s medical research agency. Located in the Clinical Research Center at the heart of campus, the NIH Library supports the clinical care and research of the intramural community, which leads to discoveries that improve public health and save lives. In addition to bioinformatics, the NIH Library provides services in bibliometrics, custom information solutions, data management and analysis, document delivery, editing, literature searching, research assistance, systematic reviews, training, and translations.[10]

In 2008, the National Center for Biotechnology Information (NCBI) scaled back its bioinformatics training program, creating a need for other groups to offer the training previously provided by the NCBI. The NIH Library, in keeping with its objective to support intramural research in genetics and bioinformatics more comprehensively, stepped in to fill that void by offering training specifically geared towards NIH investigators.

In February 2009, the NIH Library hired an expert bioinformatics trainer and consultant, Dr. Medha Bhagwat, to support bioinformatics research at NIH. Up to this point, the library had not offered bioinformatics support services. Dr. Bhagwat arrived from NCBI with 11 years of bioinformatics experience as well as diverse expertise in biochemistry and structural biology.

During her tenure at NCBI, Dr. Bhagwat developed and taught several two-hour mini-courses dealing with the effective use of specialized bioinformatics tools. These included “quick start” courses on analyzing microbial genomes, structural analysis, identification of disease genes, correlating disease genes and phenotypes, understanding DNA and protein sequences, and utilizing tools such as BLAST, Entrez Gene, MapViewer, and GenBank. Leveraging the courses and training she had previously developed at NCBI, Dr. Bhagwat was able to create classes tailored to the specific bioinformatics needs of the NIH intramural research community.[11] Previous work as a bench scientist endowed her with an understanding of the needs and terminology particular to biomedical researchers. The fact that Dr. Bhagwat had been employed on the NIH campus since 1994 meant that she had also generated a strong internal network and was able to feel the pulse of the research community. These qualities combined to immediately make Dr. Bhagwat a valuable resource in her new role at the NIH Library.

Although Dr. Bhagwat had the expertise, experience, and training as a bioinformaticist, preliminary work was necessary to build a comprehensive bioinformatics support program. She began by researching bioinformatics support programs at prominent medical libraries and found that such programs include one or more of the following: instruction, licensing, computing software, collections, resource development such as online tutorials, and frameworks for collaborations among researchers. She then sought to identify the requirements of the NIH research community via a three-pronged approach: interviews with bioinformatics specialists at several NIH institutes, direct interaction with researchers during early training and consultation sessions, and a formal survey of NIH scientists. An initial bioinformatics support program was established, consisting of classroom training, one-on-one tutorials and consultation, online tutorials, software and database licenses, high-performance computers, and a collection of books, journals, and other literature.

Classroom training is taught by NIH Library staff as well as outside speakers, including subject and product experts supplied by bioinformatics software vendors. Most of the classroom instruction is provided in the library training room, with additional live streaming over WebEx in some cases. Dr. Bhagwat formed strategic partnerships with several institutes to teach on-site training programs offered to extramural scientists, medical professionals, educators, and students at other facilities. These partnerships have helped expand the reach of the NIH Library’s bioinformatics support program and have fostered a network of bioinformatics experts across campus. Examples include the National Institute of Nursing Research (NINR) Precision Health Boot Camp[12] and the Summer Genetics Institute for nurses[13]; the National Human Genome Research Institute (NHGRI) Short Course in Genomics[14] for middle- and high-school teachers and for community-college and tribal-college faculty; and the National Library of Medicine’s (NLM) remote hands-on classes hosted by university libraries for academic researchers.[15][16] Dr. Bhagwat taught a two-credit course, “Practical Bioinformatics,” at the Foundation for Advanced Education in the Sciences (FAES) at NIH annually during the fall semester[17], gave lectures at Georgetown University as adjunct faculty, and provided continuing education courses at both the Medical Library Association[18] and Special Libraries Association conferences.[19] The annual NIH Library Bioinformatics Research Symposium is a good example of a collaborative endeavor: the Library organizes a two-day event featuring a series of scientific presentations highlighting practical applications of the analysis tools and databases licensed by the NIH Library for NIH researchers. The presenters are all scientists from NIH or from the companies that offer these bioinformatics tools.[20]

Examples of bioinformatics classes led by Dr. Bhagwat at NIH include: Making Sense of DNA and Protein Sequences; Gene Resources: From Transcription Factor Binding Sites to Function; Sequence Similarity Search: BLAST-Like Alignment Tool (BLAT); Protein Structural Analysis: Binding Sites to Distant Homologs; Genome Browsers; Identification of Disease Genes; Correlation of Disease Genes to Phenotypes; Microbial Genome Analysis; Gene Expression Microarray Data Analysis; Next Gen Sequence Analysis; Gene Expression Omnibus; and Introduction to Clinical Genomics.

In addition, vendor-provided experts give training on the following proprietary bioinformatics software: CLC Biomedical Genomics Workbench, DNASTAR Lasergene, ArrayStar Qseq, SeqMan NGen, MetaCore, MetaGeneMark, GeneIndexer, GeneSpring, Genomatix Genome Analyzer, Golden Helix SVS and VarSeq, Human Gene Mutation Database Professional, Ingenuity Pathway Analysis, Partek Genomics Suite, Pathway Studio, and ProteinLounge.

Depending on the software, the library provides online access via floating licenses or directly on three specialized bioinformatics workstations, two of which have identical specifications for typical high-throughput analysis: Windows 7 64-bit, 8 cores, 48 GB RAM, and 2 TB disk space. The third workstation is designed specifically to run CLC Genomics Workbench, an application for analyzing and visualizing next-generation sequencing (NGS) data. The specifications of this computer are more robust due to the demanding requirements of this sort of data analysis: Red Hat Enterprise Linux 6 64-bit, 28 cores, 512 GB RAM, and 24 TB disk space. However, even with these computing capabilities, the workstations often run overnight in order to complete such analyses.

To bolster support for the burgeoning bioinformatics program, a second staff member was hired in August 2010. Dr. Lynn Young has a PhD in physics, with computer programming experience as well as expertise in microarray and next-generation sequencing data analysis. Drawing on years of teaching experience, Dr. Bhagwat provides classroom instruction and organizes vendor-led instruction, while Dr. Young devotes more time to individual and small group consultations, either on the bioinformatics workstations or in her office. Due to her background in computer science and bioinformatics, Dr. Young is uniquely positioned to collaborate with NIH researchers by assisting with using software, writing scripts, and interpreting the results of complex analyses. When a researcher needs a tutorial before Dr. Young is available, she can refer them to a short video tutorial outlining the analysis of next-generation sequencing data using specific software and follow up later with an in-person meeting. Examples of tutorial and consultation topics include: downloading upstream gene sequences and identifying transcription factor binding sites; gene set enrichment/pathway analysis from microarray experiments; and next-gen sequence analysis using RNA-Seq, ChIP-Seq, and miRNA-Seq.

In response to the heavy demands of instruction and consultation, the Bioinformatics Workgroup was formed to handle some of the administrative functions of the program. This workgroup consists of library staff members who are not bioinformaticists but support the program in various ways; these support roles were realized by reallocating resources among existing NIH Library staff. Support activities include communicating with vendors; scheduling and keeping an up-to-date training calendar; organizing qualitative and quantitative data from testimonials and evaluation forms; and compiling statistics on classes, tutorials, off-site presentations, workstation reservations, software usage, and other metrics that feed into assessment of the program.

The most comprehensive formal assessment covers the 2016 calendar year, in which 50 training sessions were provided to a total of 1,475 participants. The Bioinformatics Workgroup adjusts strategies for advertising and works with the library's Communication Workgroup to make such training available to as many attendees as possible. For example, the group decided to raise the cap on registrants for each class and to tell people on the waiting list that, if they arrive early to class and sign in, they will be given any empty seat once the class begins.

Figure 1 shows a list of vendor-led training during 2016. This training for fee-based resources is typically provided as part of the library’s subscription. It gives vendors an opportunity to promote their resources and enables the user community to gain targeted experience with specialized tools.

References

  1. Can, T. (2014). "Introduction to Bioinformatics". In Yousef, M.; Allmer, J.. miRNomics: MicroRNA Biology and Computational Analysis. Methods in Molecular Biology. 1107. Humana Press. doi:10.1007/978-1-62703-748-8_4. ISBN 9781627037488. 
  2. Rein, D.C. (2006). "Developing library bioinformatics services in context: The Purdue University Libraries bioinformationist program". Journal of the Medical Library Association 94 (3): 314–20. PMC PMC1525331. PMID 16888666. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1525331. 
  3. Schneider, M.V.; Watson, J.; Attwood, T. et al. (2010). "Bioinformatics training: A review of challenges, actions and support requirements". Briefings in Bioinformatics 11 (6): 544–51. doi:10.1093/bib/bbq021. PMID 20562256. 
  4. Li, M.; Chen, Y.B.; Clintworth, W.A. (2013). "Expanding roles in a library-based bioinformatics service program: A case study". Journal of the Medical Library Association 101 (4): 303–9. doi:10.3163/1536-5050.101.4.012. PMC PMC3794686. PMID 24163602. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3794686. 
  5. Yarfitz, S.; Ketchell, D.S. (2000). "A library-based bioinformatics services program". Bulletin of the Medical Library Association 88 (1): 36–48. PMC PMC35196. PMID 10658962. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC35196. 
  6. Davidoff, F.; Florance, V. (2000). "The informationist: A new health profession?". Annals of Internal Medicine 132 (12): 996–8. doi:10.7326/0003-4819-132-12-200006200-00012. PMID 10858185. 
  7. Helms, A.J.; Bradford, K.D.; Warren, N.J.; Schwartz, D.G. (2004). "Bioinformatics opportunities for health sciences librarians and information professionals". Journal of the Medical Library Association 92 (4): 489–93. PMC PMC521520. PMID 15494764. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC521520. 
  8. Helms, A.J.; Bradford, K.D.; Warren, N.J.; Schwartz, D.G. (2004). "Bioinformatics opportunities for health sciences librarians and information professionals". Journal of the Medical Library Association 92 (4): 489–93. PMC PMC521520. PMID 15494764. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC521520. 
  9. Lyon, J.A.; Tennant, M.R.; Messner, K.R.; Osterbur, D.L. (2006). "Carving a niche: Establishing bioinformatics collaborations". Journal of the Medical Library Association 94 (3): 330–5. PMC PMC1525329. PMID 16888668. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1525329. 
  10. "About Us". NIH Library. National Institutes of Health. https://nihlibrary.nih.gov/about-us. Retrieved 09 March 2018. 
  11. Bhagwat, M.; Wheeler, D.; Valjavec-Gratian, M. (2006). "Mini Courses". National Center for Biotechnology Information. https://www.ncbi.nlm.nih.gov/Class/minicourses/. 
  12. "NINR 'Precision Health: Smart Technologies, Smart Health' Boot Camp". National Institute of Nursing Research. National Institutes of Health. https://www.ninr.nih.gov/training/trainingopportunitiesintramural/bootcamp. Retrieved 09 March 2018. 
  13. "Summer Genetics Institute (SGI)". National Institute of Nursing Research. National Institutes of Health. https://www.ninr.nih.gov/training/trainingopportunitiesintramural/summergeneticsinstitute. Retrieved 09 March 2018. 
  14. "National Human Genome Research Institute Short Course in Genomics". National Human Genome Research Institute. National Institutes of Health. https://www.genome.gov/10000217/nhgri-short-course-in-genomics/. Retrieved 09 March 2018. 
  15. "Bioinformatics: Clinical Genomics Subject of Mini Course". CDU Newsletter. Charles R. Drew University of Medicine and Science. 1 April 2016. https://www.cdrewu.edu/CDUNewsletters/activenews_view.asp?articleID=719. 
  16. University of Maryland Health Sciences and Human Services Library (2013). "PubChem Training from NLM". Connective Issues 7 (4). http://www2.hshsl.umaryland.edu/newsletter/?p=1434. 
  17. "2015–2016 Catalog of Courses and Student Handbook" (PDF). Foundation for Advanced Education in the Sciences. 2015. p. 24. https://faes.org/sites/default/files/files/FAES%20Catalog%202015-16%20FINAL.pdf. 
  18. "MLA '12 Preliminary Program" (PDF). Medical Library Association. 2012. p. 15. http://www.mlanet.org/d/do/1854. 
  19. Hooper-Lane, C. (2010). "2010 Conference Program Preview" (PDF). Biofeedback 35 (2): 2. http://dbiosla.org/publications/pubs/biofeedback/Spring2010.pdf. 
  20. "Bioinformatics Support Program". NIH Library. National Institutes of Health. https://nihlibrary.nih.gov/services/bioinformatics-support. Retrieved 09 March 2018. 

Notes

This presentation is faithful to the original, with only a few minor changes to presentation. In some cases important information was missing from the references, and that information was added. The original article lists references in alphabetical order, by author; this version lists them in order of appearance, by design.