Difference between revisions of "Journal:Development of an informatics system for accelerating biomedical research"

From LIMSWiki


==Abstract==
The Biomedical Research Informatics Computing System (BRICS) was developed to support multiple disease-focused research programs. Seven service modules are integrated together to provide a collaborative and extensible web-based environment. The modules—Data Dictionary, Account Management, Query Tool, Protocol and Form Research Management System, Meta Study, Data Repository, and Globally Unique Identifier—facilitate the management of research protocols, including the submission, processing, curation, access, and storage of clinical, imaging, and derived [[genomics]] data within the associated data repositories. Multiple instances of BRICS are deployed to support various biomedical research communities focused on accelerating discoveries for rare diseases, traumatic brain injuries, Parkinson’s disease, inherited eye diseases, and symptom science research. No personally identifiable [[information]] is stored within the data repositories. Digital object identifiers (DOIs) are associated with the research studies. Reusability of biomedical data is enhanced by common data elements (CDEs), which enable systematic collection, [[Data analysis|analysis]], and sharing of data. The use of CDEs with a service-oriented [[Informatics (academic field)|informatics]] architecture enabled the development of disease-specific repositories that support hypothesis-based biomedical research.


'''Keywords''': informatics system, biomedical repository, translational research, FAIR


An overview of the current [[Informatics (academic field)|informatics]] system architecture is provided in Figure 1. The architecture is defined by its three layers: the (a) Presentation Layer, (b) Application Layer, and (c) Data Layer.  


[[File:Fig1 Navale F1000Research2020 8.gif|800px]]


Researchers can use the GUID tool (shown as a client in Figure 1) to support the de-identification of data and assign a unique identifier for each study participant. The GUID is a random alphanumeric unique subject identifier that is not directly generated from personally identifiable information (PII).<ref name="JohnsonUsing10">{{cite journal |title=Using global unique identifiers to link autism collections |journal=JAMIA |author=Johnson, S.B.; Whitney, G.; Mcauliffe, M. et al. |volume=17 |issue=6 |pages=689–95 |year=2010 |doi=10.1136/jamia.2009.002063 |pmid=20962132 |pmc=PMC3000750}}</ref> Generating a GUID involves inputting a required set of reproducible and invariant subject information, typically found on the subject’s birth certificate, into a client application. The PII fields include complete legal given (first) name of subject at birth, middle name (if available), complete legal family (last) name of subject at birth, day of birth, month of birth, year of birth, name of city/municipality in which the subject was born, and country of birth. The PII data is not sent to the GUID server; rather, one-way encrypted hash codes are created and sent from the GUID client to the server (represented as a service module, Figure 1), allowing the PII to reside only on the researcher’s site. A random number for each research participant is generated by the server and is returned to the researcher. The same GUID is provided if the participant is enrolled in multiple studies. The GUID server can be configured to support multi-center clinical trials and investigations that enroll research participants across various programs.
==Data submission and processing==
Institutional grants (e.g., from the Department of Defense [DOD] or NIH) that support disease-specific research require data submission to a specific BRICS instance. For example, TBI researchers receiving grants from the DOD and NIH are required to submit data to FITBIR. A concerted approach of submitting study data to a BRICS instance facilitates data reuse, validation, and aggregation with other studies, thereby supporting meta-analysis of clinical studies. Currently, BRICS instance repositories contain patient assessment (form) data, imaging, electroencephalogram (EEG), magnetoencephalography (MEG), and derived [[genomics]] data. Researchers are responsible for data submission activities, which include form structure approval, eForms review, curation, data element mapping, and study documentation development that describes data collected in the study. However, review and approval for using form structures are carried out by the data curator and the disease area program lead.
For clinical research work, the ProFoRMS tool can be used for scheduling subject visits, collecting data, adding new data, modifying previously collected data entries, and correcting discrepancies (Figure 2, Stage 1). Using ProFoRMS provides for automatic validation with data dictionaries associated with each of the BRICS instance(s). The data dictionaries were developed by collaborative efforts of disease area experts, including the NINDS, DOD, and National Library of Medicine.<ref name="NINDSCommon" /><ref name="NIHCommon13" />
[[File:Fig2 Navale F1000Research2020 8.gif|900px]]
{{clear}}
{|
| STYLE="vertical-align:top;"|
{| border="0" cellpadding="5" cellspacing="0" width="900px"
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"| <blockquote>'''Figure 2.''' Schematic representation of '''1.''' Submission Information Package (SIP), '''2.''' Archival Information Package (AIP) preparation, '''3.''' storage of AIPs, and '''4.''' Dissemination Information Packages (DIP) access</blockquote>
|-
|}
|}
Researchers have the option to collect data with a generic system (e.g., REDCap); however, the output file from the generic system must be validated against the data dictionaries of the specific BRICS instance before being uploaded to the data repository (Figure 2, Stage 2).
The data submission file format uses comma-separated values (CSV) and is structured so that the data is consistent with CDE variable names and data values. The validation tool supports the data repository and ProFoRMS modules by validating data against CDEs, which have defined ranges or permissible values. If the data contains errors, the user must correct them before a submission package can be generated and the data can be submitted. This validation, as part of the data submission process, is a major step towards making data reusable.
Once the data has been validated, it is uploaded via the submission upload tool. An original copy of the user-submitted data (raw data) is maintained in the repository. Nightly, the raw data is loaded into the query tool’s database (Figure 2, Stage 3). Study-specific clinical, imaging, and derived genomics data are available for search and retrieval.
User support is provided for data stewardship activities that include training and assistance to authorized users for CDE implementation, data validation, and submission to the repositories. Access is controlled by a Data Access Committee (DAC) that reviews user applications to a specified BRICS instance (defined by the biomedical program). In addition, access to the system is role-based, and specific permissions are associated with roles such as principal investigator (PI), data manager, and data submitter.
During packaging of data in the CSV file, the users responsible for storing PII data locally within their institutional systems assign GUIDs to research subjects (patients) using the GUID client. Data curation is carried out by identifying the available standard forms and CDEs in the data dictionary. If no corresponding CDEs are available, the user can define unique data elements and obtain approval during the submission process.
The data repository module serves as a central hub, providing functionality for defining and managing study information and storing the research data associated with each study (Figure 2, Stage 3). Authorized investigators can submit data to a BRICS instance and organize one or many datasets into a single entity called a "Study." In general terms, a Study is a container for the data to be submitted, allowing an investigator to describe, in detail, any data collected, and the methods used to collect the data, which makes data accessible to users. By using the repository user interface, researchers can generate digital object identifiers (DOIs) for a study, which can be referenced in research articles. Since the NIH has a DataCite membership, users can directly use the repository interface for producing the DOIs.
Study metadata is entered within a BRICS instance manually through graphical user interfaces. The metadata fields include title, organization, PI, data, funding source and IDs, study type(s), and keywords that enable users to search for detailed information (e.g., clinical trial Grant ID(s), start and end dates for grants, therapeutic agents, sample size, publications and/or forms used).
Each of the BRICS instances exposes metadata and summary consistent with their respective program goals. For example, FITBIR provides a metadata [[Data visualization|visualization]] tool that graphically supports searching study identification ([https://fitbir.nih.gov/visualization available here]).
Depending on the BRICS instance, investigators can download summary statistics for specific studies. BRICS-based repositories (e.g., eyeGene) host high-throughput gene expression, [[wikipedia:RNA-Seq|RNA-Seq]], [[wikipedia:Single-nucleotide polymorphism|single-nucleotide polymorphism]] (SNPs), and sequence variation datasets (Figure 2, Stage 3).
==Data sharing and access==


==References==

Revision as of 14:40, 13 September 2020

Full article title Development of an informatics system for accelerating biomedical research (Version 2)
Journal F1000Research
Author(s) Navale, Vivek; Ji, Michele; Vovk, Olga; Misquitta, Leonie; Gebremichael, Tsega; Garcia, Alison; Fann, Yang; McAuliffe, Matthew
Author affiliation(s) National Institutes of Health; General Dynamics Information Technology, Inc.; Sapient Government Services
Primary contact Email: Vivek dot Navale at nih dot gov
Year published 2020
Volume and issue 8
Article # 1430
DOI 10.12688/f1000research.19161.2
ISSN 2046-1402
Distribution license Creative Commons Attribution 4.0 International
Website https://f1000research.com/articles/8-1430/v2
Download https://f1000research.com/articles/8-1430/v2/pdf (PDF)

Abstract

The Biomedical Research Informatics Computing System (BRICS) was developed to support multiple disease-focused research programs. Seven service modules are integrated together to provide a collaborative and extensible web-based environment. The modules—Data Dictionary, Account Management, Query Tool, Protocol and Form Research Management System, Meta Study, Data Repository, and Globally Unique Identifier—facilitate the management of research protocols, including the submission, processing, curation, access, and storage of clinical, imaging, and derived genomics data within the associated data repositories. Multiple instances of BRICS are deployed to support various biomedical research communities focused on accelerating discoveries for rare diseases, traumatic brain injuries, Parkinson’s disease, inherited eye diseases, and symptom science research. No personally identifiable information is stored within the data repositories. Digital object identifiers (DOIs) are associated with the research studies. Reusability of biomedical data is enhanced by common data elements (CDEs), which enable systematic collection, analysis, and sharing of data. The use of CDEs with a service-oriented informatics architecture enabled the development of disease-specific repositories that support hypothesis-based biomedical research.

Keywords: informatics system, biomedical repository, translational research, FAIR

Introduction

Biomedical informatics systems can be used for the management of heterogeneous data, testing of data analysis methods, dissemination of translational research, and the generation of high-throughput hypotheses.[1][2] In the past, many disease-focused research programs have collected data in dissimilar ways, which has resulted in difficulties for data aggregation and comparative analyses. For example, non-standard methods of data collection in traumatic brain injury (TBI) research have led to many different types of injuries being classified within the same class of injury. To overcome this problem, in October 2007, the National Institute of Neurological Disorders and Stroke (NINDS), National Institute on Disability and Rehabilitation Research (NIDRR), the Defense and Veterans Brain Injury Center, and the Brain Injury Association of America sponsored a workshop to examine barriers to TBI clinical trial effectiveness. The workshop recommendation of improving data discoverability and integration in TBI research resulted in the development and implementation of common data elements (CDEs) and the Federal Interagency Traumatic Brain Injury Research (FITBIR) informatics system.[3]

A CDE is defined as a fixed representation of a variable collected within a specified clinical domain, interpretable unambiguously in human and machine-computable terms.[4] It consists of a precisely defined question with a set of permissible values as responses. Typically, CDE development for biomedical disease programs involves multiple steps: identifying a need for a CDE or group of CDEs, bringing together stakeholders and expert groups for selection, implementing various iterations and updates to initial CDE development based on ongoing input from the broader community, and finally endorsing the CDEs for widespread usage and adoption by the stakeholder community.[5] Use of CDEs enhances data quality and consistency, which facilitates data reuse for clinical and translational research.

CDEs are used in various programs of clinical research, including in neuroscience[6], rare diseases research[7], and management of chronic conditions.[8] For clinical data lifecycle management, the use of CDEs provides a structured data collection process, which enhances the likelihood for data to be pooled and combined for meta-analyses, modelling, and post-hoc construction of synthetic cohorts for exploratory analyses.[9] Investigators working to develop protocols for data collection can also consult the NIH Common Data Element Resource Portal for using established CDEs for disease programs.[10]

In 2010, the Department of Defense and the NINDS initiated the development of FITBIR. The goal was to develop a centralized repository for TBI research, in order to foster collaboration between researchers working in the field. Additionally, the design of FITBIR called for the use of CDEs during TBI data collection.

Prior to the development of FITBIR, the National Database for Autism Research (NDAR) system had demonstrated the use of CDEs for autism research.[11] Certain design features such as the use of a globally unique identifier (GUID) scheme were adopted from NDAR for FITBIR. However, the NDAR model was dedicated for access and submission to federated databases for autism research. FITBIR, on the other hand, required development of a multi-program centralized repository.

The Biomedical Research Informatics Computing System (BRICS) was designed to address the wide-ranging needs of several biomedical research programs. The overall concept was to develop services that could be integrated together and deployed as instances for individual research programs. FITBIR was the first BRICS instance and was leveraged to develop other instances (e.g., the Parkinson’s disease program). A BRICS instance supports electronic data capture and use of data dictionaries for processing and storing data within disease-specific digital repositories.

Data dictionaries comprise data elements, form structures, and electronic forms (eForms). A data element has a name, precise definition, and clear permissible values, if applicable. A data element also directly relates to a question on a paper, eForm, and/or field(s) in a database record. Form structures serve as the containers for data elements, and eForms are developed using form structures as their foundation. The data dictionary provides defined CDEs, as well as unique data elements (UDEs), for specific BRICS instance implementation. Reuse of CDEs is significantly encouraged, and in the case of FITBIR’s data dictionary, it incorporates and extends the CDE definitions developed by the National Institute of Neurological Disorders and Stroke (NINDS) CDE project.[6]

This paper discusses the overall system design and an architecture that supports the various BRICS instances. The functionalities developed to use the CDEs for electronic data submission, processing, validation, and storage within designated repositories are presented. System access is highlighted for searching across research studies within a BRICS instance. An example is provided for BRICS implementation within an area of disease research (Parkinson’s disease). Also shown is the role of individual system components that enable data to be findable, accessible, interoperable, and reusable (FAIR).

BRICS system design and architecture

The system design was predicated on the adoption of a CDE-based data collection method. To satisfy this requirement, an electronic data collection tool (ProFoRMS) was developed to interface with data dictionaries, which enabled deployment of multiple instances of the system to disease area programs. This method of using CDEs early in the data life cycle facilitated data harmonization and minimized the need for elaborate post processing and curation work. Services were developed to support the various stages in the data life cycle. De-identification of each patient within a research study is supported by the use of a GUID. A de-identification tool was developed for researchers to use prior to submission of data to a specific BRICS instance. No personally identifiable information could be retained in the BRICS repositories.

Since BRICS development started in 2011, the Java Web Start technology has been used for deploying the tools shown in the presentation layer of the architecture (Figure 1, below). Although Java Web Start was deprecated in editions of Java after Oracle Java SE 8, free public updates and auto updates to Java SE 8 are provided by Oracle until at least the end of December 2020. The GUID and download tools that initially used the Web Start technology have been migrated to a JavaScript client. The Submission tool will also be migrated to a JavaScript client by the end of 2020. During the transition period, users continue to maintain Oracle Java SE 8 on their local computers.

An open-source database, PostgreSQL, was preferred over Oracle database during BRICS development, primarily to minimize individual licensing costs when deploying instances of the system to various biomedical programs. However, three separate PostgreSQL databases were used, one for data dictionaries and the other two for ProFoRMS' data repository and meta-analysis functionalities, respectively. Separate databases were needed because the data dictionary is shared and ProFoRMS was developed as an application that was integrated with the system.

The Virtuoso database uses the World Wide Web Consortium's Resource Description Framework (RDF) for accessing data that comes from data dictionary, data repository, and meta-analysis modules. Virtuoso contains data that are linked together in RDF, to support the query tool. The repository data is linked to metadata (studies and datasets) and the data dictionary, which is processed and stored in Virtuoso for querying. An advantage of using the RDF triple model is its flexibility to adapt to user-driven data requirement changes that can be made in the study repository or query tool. Once the data is added to the RDF graph as triples, regardless of where the data is stored, it can easily be retrieved and processed by the query tool.
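The linking described above can be illustrated with a minimal sketch. It is not Virtuoso or SPARQL; it is a tiny in-memory triple store written only to show how a study, its datasets, form structures, and data elements can be connected as subject-predicate-object triples and then traversed. All identifiers and predicate names (`study:001`, `hasDataset`, and so on) are hypothetical and do not come from the BRICS schema.

```python
from collections import namedtuple

Triple = namedtuple("Triple", ["subject", "predicate", "obj"])


class TripleStore:
    """Tiny in-memory stand-in for an RDF graph (illustration only)."""

    def __init__(self):
        self._triples = set()

    def add(self, subject, predicate, obj):
        self._triples.add(Triple(subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        """Return matching triples in sorted order; None is a wildcard."""
        return sorted(
            t for t in self._triples
            if (subject is None or t.subject == subject)
            and (predicate is None or t.predicate == predicate)
            and (obj is None or t.obj == obj)
        )


def data_elements_for_study(store, study):
    """Walk study -> dataset -> form structure -> data element."""
    elements = []
    for ds in store.match(subject=study, predicate="hasDataset"):
        for fs in store.match(subject=ds.obj, predicate="usesFormStructure"):
            for de in store.match(subject=fs.obj, predicate="hasDataElement"):
                elements.append(de.obj)
    return elements


# Hypothetical identifiers, invented for this example.
store = TripleStore()
store.add("study:001", "hasDataset", "dataset:A")
store.add("dataset:A", "usesFormStructure", "form:Demographics")
store.add("form:Demographics", "hasDataElement", "cde:AgeYrs")

# A new kind of fact can be added later without changing any schema.
store.add("study:001", "fundedBy", "grant:EXAMPLE")

print(data_elements_for_study(store, "study:001"))
```

The last `add` call hints at the flexibility mentioned above: a new relationship is just another triple, so no table definition has to change before it can be stored or queried.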

Since the initial release of the BRICS platform, we have initiated a migration to the MongoDB database to take advantage of schema-free development. Currently, the GUID module has been migrated to use MongoDB. Other BRICS functionalities will eventually be migrated to also use MongoDB, thereby eliminating the need for using PostgreSQL in the BRICS architecture.

An overview of the current informatics system architecture is provided in Figure 1. The architecture is defined by its three layers: the (a) Presentation Layer, (b) Application Layer, and (c) Data Layer.



Figure 1. A schematic representation of the informatics system architecture

The Presentation Layer provides a secure entry point through the BRICS portal. A login page is used to enter valid credentials with a central authentication system (CAS) to support single sign-on for users to access all the BRICS modules. Role-based access has also been implemented by using Spring Security (a Java/Java Enterprise Edition framework that provides authentication and authorization features) throughout the system to provide an additional level of controlled access to each of the modules. The GUID client, validation/upload, download, and image submission tools are accessible via the BRICS portal.

The Image Submission Package Creation Tool, a plugin to the Medical Image Processing Analysis and Visualization (MIPAV) application[12][13], leverages medical image file readers found in the MIPAV software application (version 8.0.2) to support the semi-automated submission of image data into the data repository. The plugin supports more than 35 file formats commonly used in medical imaging, including DICOM, NIfTI, Analyze, AFNI, and more. The Image Submission Tool extracts available image header metadata from the image and attempts to map that metadata onto the CDEs in the selected imaging form structure. The quality and amount of image header metadata that can be extracted out of an image volume will depend on the medical image file format, the scanner on which the images were acquired, and the de-identification process performed.

The Application Layer is responsible for the logic that determines the capabilities of the BRICS modules and tools. Seven service modules within the Application Layer are integrated together to provide a collaborative and extensible web-based environment. This includes the data dictionary, account management, query tool, Protocol and Form Research Management System (ProFoRMS), GUID, data repository, and meta-analysis modules. To communicate and exchange information between the modules, a representational state transfer (RESTful) interface for web services is used.[14] Additional information about the various service modules is available from the BRICS site.

The Data Layer consists of open-source databases including PostgreSQL, Virtuoso, and MongoDB. Since a typical query use case requires data from the data repository, data dictionary, and meta-analysis modules, it is much more efficient to store and access data in a single Virtuoso database. Instead of using resource-intensive joins in PostgreSQL, data can be accessed in Virtuoso by traversing the RDF graph. Having related data linked together in one place allows the query tool to quickly query repository data, an otherwise slow process. RDF is also used to support searching of studies, form structures, and data elements.

Also utilized are open-source libraries such as Hibernate and Apache Jena for storing and retrieving data from databases. Hibernate is an object-relational mapping framework used to map PostgreSQL data into Java objects. Using Hibernate reduces the amount of software code that would otherwise be required to translate tabular data from SQL into Java objects. Jena is a Java framework that enables interaction with semantic web applications; it is the Hibernate equivalent for the semantic web, mapping the Virtuoso data into Java objects. Both of these frameworks support users’ requests for retrieving and storing data. Because no single library was available to support data persistence across both stores, Hibernate was used with PostgreSQL, and Jena was used to support Virtuoso’s RDF structure.

The Data Layer is supported by the physical infrastructure located within the National Institutes of Health (NIH). It is certified to operate at the Federal Information Security Modernization Act's (FISMA) moderate level.[15][16][17] In accordance with FISMA moderate systems, the BRICS system adheres to the National Institute of Standards and Technology's (NIST) Special Publication 800-53 and its cybersecurity standards and guidelines. The BRICS system is also certified to 21 CFR Part 11, and as part of its requirements, a stringent audit trail has been implemented within the BRICS system to verify that digital objects have not been altered or corrupted.

Researchers can use the GUID tool (shown as a client in Figure 1) to support the de-identification of data and assign a unique identifier for each study participant. The GUID is a random alphanumeric unique subject identifier that is not directly generated from personally identifiable information (PII).[18] Generating a GUID involves inputting a required set of reproducible and invariant subject information, typically found on the subject’s birth certificate, into a client application. The PII fields include complete legal given (first) name of subject at birth, middle name (if available), complete legal family (last) name of subject at birth, day of birth, month of birth, year of birth, name of city/municipality in which the subject was born, and country of birth. The PII data is not sent to the GUID server; rather, one-way encrypted hash codes are created and sent from the GUID client to the server (represented as a service module, Figure 1), allowing the PII to reside only on the researcher’s site. A random number for each research participant is generated by the server and is returned to the researcher. The same GUID is provided if the participant is enrolled in multiple studies. The GUID server can be configured to support multi-center clinical trials and investigations that enroll research participants across various programs.
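The client/server split described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the BRICS GUID implementation: it uses a single SHA-256 hash over normalized PII fields, whereas the text speaks of multiple hash codes, and the field names, the normalization rules, and the `GUID` prefix are all hypothetical. A plain unsalted hash of low-entropy PII would also be open to dictionary attacks, so a production system would need stronger protection than shown here.

```python
import hashlib
import secrets
import unicodedata

# Hypothetical field names for the PII items listed in the text.
PII_FIELDS = (
    "first_name", "middle_name", "last_name",
    "birth_day", "birth_month", "birth_year",
    "birth_city", "birth_country",
)


def normalize(value):
    """Uppercase and keep only letters and digits so that trivial
    formatting differences do not change the hash (an assumption)."""
    text = "" if value is None else str(value)
    text = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in text if ch.isalnum()).upper()


def client_hash(pii):
    """Runs on the researcher's site; only the digest leaves the client."""
    joined = "|".join(normalize(pii.get(field)) for field in PII_FIELDS)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


class GuidServer:
    """Toy server: maps a hash code to a random identifier."""

    def __init__(self):
        self._by_hash = {}

    def get_or_create(self, hash_code):
        # The identifier is random, not derived from the hash or the PII.
        if hash_code not in self._by_hash:
            self._by_hash[hash_code] = "GUID" + secrets.token_hex(6).upper()
        return self._by_hash[hash_code]


# Fictional participant entered at two sites with different formatting.
site_a = {"first_name": "Jane", "middle_name": None, "last_name": "Doe",
          "birth_day": 1, "birth_month": 2, "birth_year": 1980,
          "birth_city": "Springfield", "birth_country": "Exampleland"}
site_b = dict(site_a, first_name=" jane ", birth_city="SPRINGFIELD")

server = GuidServer()
print(server.get_or_create(client_hash(site_a)))
print(server.get_or_create(client_hash(site_b)))
```

Both calls print the same identifier, which mirrors the statement that a participant enrolled in multiple studies receives the same GUID, while the server never sees the underlying PII.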

Data submission and processing

Institutional grants (e.g., from the Department of Defense [DOD] or NIH) that support disease-specific research require data submission to a specific BRICS instance. For example, TBI researchers receiving grants from the DOD and NIH are required to submit data to FITBIR. A concerted approach of submitting study data to a BRICS instance facilitates data reuse, validation, and aggregation with other studies, thereby supporting meta-analysis of clinical studies. Currently, BRICS instance repositories contain patient assessment (form) data, imaging, electroencephalogram (EEG), magnetoencephalography (MEG), and derived genomics data. Researchers are responsible for data submission activities, which include form structure approval, eForms review, curation, data element mapping, and study documentation development that describes data collected in the study. However, review and approval for using form structures are carried out by the data curator and the disease area program lead.

For clinical research work, the ProFoRMS tool can be used for scheduling subject visits, collecting data, adding new data, modifying previously collected data entries, and correcting discrepancies (Figure 2, Stage 1). Using ProFoRMS provides for automatic validation with data dictionaries associated with each of the BRICS instance(s). The data dictionaries were developed by collaborative efforts of disease area experts, including the NINDS, DOD, and National Library of Medicine.[6][10]



Figure 2. Schematic representation of 1. Submission Information Package (SIP), 2. Archival Information Package (AIP) preparation, 3. storage of AIPs, and 4. Dissemination Information Packages (DIP) access

Researchers have the option to collect data with a generic system (e.g., REDCap); however, the output file from the generic system must be validated against the data dictionaries of the specific BRICS instance before being uploaded to the data repository (Figure 2, Stage 2).

The data submission file format uses comma-separated values (CSV) and is structured so that the data is consistent with CDE variable names and data values. The validation tool supports the data repository and ProFoRMS modules by validating data against CDEs, which have defined ranges or permissible values. If the data contains errors, the user must correct them before a submission package can be generated and the data can be submitted. This validation, as part of the data submission process, is a major step towards making data reusable.
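As a rough illustration of this kind of check, the sketch below validates a CSV submission against a small data dictionary. It is not the BRICS validation tool. The element names (`GUID`, `AgeYrs`, `Sex`), the rule format, and the error messages are assumptions made for the example; only the general idea of checking values against defined ranges or permissible values comes from the text.

```python
import csv
import io

# Hypothetical data dictionary: element name -> validation rule.
DATA_DICTIONARY = {
    "GUID": {"required": True},
    "AgeYrs": {"required": True, "range": (0, 120)},
    "Sex": {"required": False, "permissible": {"Male", "Female", "Unknown"}},
}


def validate_csv(text, dictionary):
    """Return a list of error messages; an empty list means the file passes."""
    errors = []
    reader = csv.DictReader(io.StringIO(text))
    for column in reader.fieldnames or []:
        if column not in dictionary:
            errors.append(f"header: '{column}' is not a defined data element")
    for row_num, row in enumerate(reader, start=2):  # row 1 is the header
        for name, rule in dictionary.items():
            value = (row.get(name) or "").strip()
            if not value:
                if rule.get("required"):
                    errors.append(f"row {row_num}: '{name}' is required")
                continue
            if "permissible" in rule and value not in rule["permissible"]:
                errors.append(
                    f"row {row_num}: '{name}' value '{value}' is not permissible")
            if "range" in rule:
                low, high = rule["range"]
                try:
                    number = float(value)
                except ValueError:
                    errors.append(
                        f"row {row_num}: '{name}' value '{value}' is not numeric")
                    continue
                if not low <= number <= high:
                    errors.append(
                        f"row {row_num}: '{name}' value {value} "
                        f"is outside {low}-{high}")
    return errors


good = "GUID,AgeYrs,Sex\nTBIAB123,34,Female\n"
bad = "GUID,AgeYrs,Sex\nTBIAB123,134,F\n,20,Male\n"

print(validate_csv(good, DATA_DICTIONARY))
for message in validate_csv(bad, DATA_DICTIONARY):
    print(message)
```

Collecting every error rather than stopping at the first one matches the workflow described above, in which the user corrects all reported errors before a submission package can be generated.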

Once the data has been validated, it is uploaded via the submission upload tool. An original copy of the user-submitted data (raw data) is maintained in the repository. Nightly, the raw data is loaded into the query tool’s database (Figure 2, Stage 3). Study-specific clinical, imaging, and derived genomics data are available for search and retrieval.

User support is provided for data stewardship activities that include training and assistance to authorized users for CDE implementation, data validation, and submission to the repositories. Access is controlled by a Data Access Committee (DAC) that reviews user applications to a specified BRICS instance (defined by the biomedical program). In addition, access to the system is role-based, and specific permissions are associated with roles such as principal investigator (PI), data manager, and data submitter.
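
As a rough illustration of the two layers of access control described above (DAC approval for an instance, then role-based permissions), the following Python sketch may help. The role names come from the text, but the permission names and the way they are assigned to roles are assumptions, not the actual BRICS configuration.

```python
from enum import Enum


class Role(Enum):
    PRINCIPAL_INVESTIGATOR = "principal investigator"
    DATA_MANAGER = "data manager"
    DATA_SUBMITTER = "data submitter"


# Hypothetical permission sets (assumed for illustration only).
ROLE_PERMISSIONS = {
    Role.PRINCIPAL_INVESTIGATOR: frozenset(
        {"create_study", "submit_data", "query_data", "download_data"}
    ),
    Role.DATA_MANAGER: frozenset({"submit_data", "query_data", "download_data"}),
    Role.DATA_SUBMITTER: frozenset({"submit_data"}),
}


def is_allowed(role, permission, dac_approved):
    """Grant a permission only if the DAC approved the user for this
    instance and the user's role carries that permission."""
    if not dac_approved:
        return False
    return permission in ROLE_PERMISSIONS.get(role, frozenset())


if __name__ == "__main__":
    print(is_allowed(Role.DATA_SUBMITTER, "submit_data", dac_approved=True))
    print(is_allowed(Role.DATA_SUBMITTER, "download_data", dac_approved=True))
```

Keeping the DAC decision separate from the role table reflects the text's distinction between who may use an instance at all and what an approved user may do within it.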

During packaging of data in the CSV file, the users responsible for storing PII data locally within their institutional systems assign GUIDs to research subjects (patients) using the GUID client. Data curation is carried out by identifying the available standard forms and CDEs in the data dictionary. If no corresponding CDEs are available, the user can define unique data elements and obtain approval for them during the submission process.
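
The principle behind GUID assignment, that PII stays in the local institutional system while only a non-identifying identifier travels with the research data, can be sketched in Python as follows. This is not the algorithm used by the BRICS GUID client: the choice of PII fields, the use of SHA-256, the "GUID-" format, and the in-memory `GuidRegistry` class are all assumptions made for illustration.

```python
import hashlib
import secrets


def pii_hash(first_name, last_name, birth_date, birth_city):
    """Compute a one-way hash of normalized PII on the local machine.
    Only this hash, never the PII itself, would leave the institution."""
    parts = (first_name, last_name, birth_date, birth_city)
    normalized = "|".join(part.strip().upper() for part in parts)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class GuidRegistry:
    """Hypothetical stand-in for a GUID service: it maps a hash code to a
    random identifier and returns the same identifier for the same hash."""

    def __init__(self):
        self._by_hash = {}

    def get_or_create(self, hash_code):
        if hash_code not in self._by_hash:
            # The identifier is random, so it carries no information about
            # the subject and cannot be reversed into PII.
            self._by_hash[hash_code] = "GUID-" + secrets.token_hex(6).upper()
        return self._by_hash[hash_code]


if __name__ == "__main__":
    registry = GuidRegistry()
    first = registry.get_or_create(pii_hash("Jane", "Doe", "1980-01-31", "Springfield"))
    again = registry.get_or_create(pii_hash(" jane", "DOE ", "1980-01-31", "springfield"))
    print(first == again)
```

Because the same subject yields the same hash at any site, data submitted by different studies could be linked to one identifier without any site disclosing who the subject is.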

The data repository module serves as a central hub, providing functionality for defining and managing study information and storing the research data associated with each study (Figure 2, Stage 3). Authorized investigators can submit data to a BRICS instance and organize one or many datasets into a single entity called a "Study." In general terms, a Study is a container for the data to be submitted, allowing an investigator to describe, in detail, any data collected, and the methods used to collect the data, which makes data accessible to users. By using the repository user interface, researchers can generate digital object identifiers (DOIs) for a study, which can be referenced in research articles. Since the NIH has a DataCite membership, users can directly use the repository interface for producing the DOIs.

Study metadata is entered within a BRICS instance manually through graphical user interfaces. The metadata fields include title, organization, PI, data, funding source and IDs, study type(s), and keywords that enable users to search for detailed information (e.g., clinical trial Grant ID(s), start and end dates for grants, therapeutic agents, sample size, publications and/or forms used).
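
To make the role of these metadata fields in searching more concrete, here is a small Python sketch of a study record and a keyword search over it. The class name, field names, and matching rule are assumptions chosen for illustration; they cover only a subset of the fields listed above and do not represent the BRICS data model.

```python
from dataclasses import dataclass, field


@dataclass
class StudyMetadata:
    """Hypothetical study record holding some of the fields named in the text."""
    title: str
    organization: str
    principal_investigator: str
    funding_source: str
    grant_ids: list = field(default_factory=list)
    study_types: list = field(default_factory=list)
    keywords: list = field(default_factory=list)

    def matches(self, term):
        """Case-insensitive substring match across all text fields."""
        needle = term.lower()
        haystack = [
            self.title,
            self.organization,
            self.principal_investigator,
            self.funding_source,
            *self.grant_ids,
            *self.study_types,
            *self.keywords,
        ]
        return any(needle in text.lower() for text in haystack)


def search(studies, term):
    """Return the studies whose metadata mention the search term."""
    return [study for study in studies if study.matches(term)]


if __name__ == "__main__":
    studies = [
        StudyMetadata(
            title="Example TBI imaging study",
            organization="Example University",
            principal_investigator="A. Researcher",
            funding_source="Example Agency",
            keywords=["MRI"],
        )
    ]
    print([study.title for study in search(studies, "mri")])
```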

Each BRICS instance exposes metadata and summary information consistent with its program goals. For example, FITBIR provides a metadata visualization tool that supports graphical searching for studies (available here).

Depending on the BRICS instance, investigators can download summary statistics for specific studies. BRICS-based repositories (e.g., eyeGene) host high-throughput gene expression, RNA-Seq, single-nucleotide polymorphism (SNP), and sequence variation datasets (Figure 2, Stage 3).

Data sharing and access

References

  1. Sarkar, I.N. (2010). "Biomedical informatics and translational medicine". Journal of Translational Medicine 8: 22. doi:10.1186/1479-5876-8-22. PMC PMC2837642. PMID 20187952. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2837642. 
  2. Payne, P.R.O. (2012). "Chapter 1: Biomedical knowledge integration". PLoS Computational Biology 8 (12): e1002826. doi:10.1371/journal.pcbi.1002826. PMC PMC3531314. PMID 23300416. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3531314. 
  3. Thompson, H.J.; Vavilala, M.S.; Rivara, F.P. (2015). "Chapter 1: Common Data Elements and Federal Interagency Traumatic Brain Injury Research Informatics System for TBI Research". Annual Review of Nursing Research 33 (1): 1–11. doi:10.1891/0739-6686.33.1. PMC PMC4704986. PMID 25946381. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4704986. 
  4. Silva, J.; Wittes, R. (1999). "Role of clinical trials informatics in the NCI's cancer informatics infrastructure". Proceedings AMIA Symposium: 950–4. PMC PMC2232686. PMID 10566501. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2232686. 
  5. Zentzis, B. (15 May 2017). "Common Data Element (CDE)". Clinfowiki. https://clinfowiki.org/wiki/index.php/Common_Data_Element_(CDE). Retrieved 03 April 2018. 
  6. National Institutes of Health. "NINDS Common Data Elements". National Institutes of Health. https://www.commondataelements.ninds.nih.gov/. Retrieved 03 April 2018. 
  7. Rubinstein, Y.R.; McInnes, P. (2015). "NIH/NCATS/GRDR Common Data Elements: A leading force for standardized data collection". Contemporary Clinical Trials 42: 78–80. doi:10.1016/j.cct.2015.03.003. PMC PMC4450118. PMID 25797358. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4450118. 
  8. Moore, S.M.; Schiffman, R.; Waldrop-Valverde, D. et al. (2016). "Recommendations of Common Data Elements to Advance the Science of Self-Management of Chronic Conditions". Journal of Nursing Scholarship 48 (5): 437–47. doi:10.1111/jnu.12233. PMC PMC5490657. PMID 27486851. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5490657. 
  9. Sheehan, J.; Hirschfeld, S.; Foster, E. et al. (2016). "Improving the value of clinical research through the use of Common Data Elements". Clinical Trials 13 (6): 671–76. doi:10.1177/1740774516653238. PMC PMC5133155. PMID 27311638. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5133155. 
  10. "Common Data Element (CDE) Resource Portal". National Library of Medicine. National Institutes of Health. 3 January 2013. https://www.nlm.nih.gov/cde/glossary.html. Retrieved 03 April 2018. 
  11. Hall, D.; Huerta, M.F.; Mcauliffe, M.J. et al. (2012). "Sharing heterogeneous data: the national database for autism research". Neuroinformatics 10 (4): 331–9. doi:10.1007/s12021-012-9151-4. PMC PMC4219200. PMID 22622767. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4219200. 
  12. Haak, D.; Page, C.-E.; Deserno, T.M. (2016). "A Survey of DICOM Viewer Software to Integrate Clinical Research and Medical Imaging". Journal of Digital Imaging 29 (2): 206–15. doi:10.1007/s10278-015-9833-1. PMC PMC4788610. PMID 26482912. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4788610. 
  13. Shah, J. "MIPAV". NIH Center for Information Technology. https://mipav.cit.nih.gov/. Retrieved 06 November 2017. 
  14. Fielding, R.T. (2000). "Architectural Styles and the Design of Network-based Software Architectures" (PDF). University of California, Irvine. https://www.ics.uci.edu/~fielding/pubs/dissertation/fielding_dissertation.pdf. 
  15. National Institute of Standards and Technology (19 March 2018). "Federal Information Security Management Act (FISMA) Implementation Project". https://www.nist.gov/programs-projects/federal-information-security-management-act-fisma-implementation-project. 
  16. National Institute of Standards and Technology (March 2006). "Minimum Security Requirements for Federal Information and Information Systems". https://csrc.nist.gov/publications/detail/fips/200/final. 
  17. National Institute of Standards and Technology (1 May 2010). "SP 800-53 Rev. 3 - Recommended Security Controls for Federal Information Systems and Organizations". https://csrc.nist.gov/publications/detail/sp/800-53/rev-3/archive/2010-05-01. 
  18. Johnson, S.B.; Whitney, G.; Mcauliffe, M. et al. (2010). "Using global unique identifiers to link autism collections". JAMIA 17 (6): 689–95. doi:10.1136/jamia.2009.002063. PMC PMC3000750. PMID 20962132. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3000750. 

Notes

This presentation is faithful to the original, with only a few minor changes to presentation. In some cases important information was missing from the references, and that information was added.