Journal:Transferring exome sequencing data from clinical laboratories to healthcare providers: Lessons learned at a pediatric hospital
|Full article title||Transferring exome sequencing data from clinical laboratories to healthcare providers: Lessons learned at a pediatric hospital|
|Journal||Frontiers in Genetics|
|Author(s)||Swaminathan, Rajeswari; Huang, Yungui; Miller, Katherine; Pastore, Matthew; Hashimoto, Sayaka; Jacobson, Theodora; Mouhlas, Danielle; Lin, Simon|
|Author affiliation(s)||Nationwide Children's Hospital|
|Primary contact||Email: simon dot lin at nationwidechildrens dot org|
|Editors||Patrinos, George P.|
|Volume and issue||9|
|Distribution license||Creative Commons Attribution 4.0 International|
The adoption rate of genome sequencing for clinical diagnostics has been steadily increasing, leading to the possibility of improvement in diagnostic yields. Although laboratories generate a summary clinical report, sharing raw genomic data with healthcare providers is equally important, both for secondary research studies and for a deeper analysis of the data itself, as seen in the efforts of organizations such as the American College of Medical Genetics and Genomics and the Global Alliance for Genomics and Health. Here, we describe the existing protocol for genomic data sharing between a certified clinical laboratory and a healthcare provider and highlight some of the lessons learned. This study tracked and subsequently evaluated the data transfer workflow for 19 patients, all of whom consented to be part of this research study and visited the genetics clinic at a tertiary pediatric hospital between April 2016 and December 2016. Two of the most noticeable elements observed through this study are the manual validation steps and the discrepancies between the patient identifiers used by the clinical laboratory and those used by the healthcare provider. Both add complexity to the transfer process and make it more susceptible to errors. The results from this study highlight some of the critical changes that need to be made in order to improve genomic data sharing workflows between healthcare providers and clinical sequencing laboratories.
Keywords: genomic data sharing, genomic data transfer, whole exome sequencing, clinical genomics, interoperability, laboratory workflows
The rate of genome sequencing is rising sharply, leading to the generation of substantial volumes of data. Despite this surge, use of the knowledge embedded in that data to improve clinical outcomes still lags behind. Additional research is required to better associate genes and variants with diseases. Currently, clinical laboratories return a summary report to the ordering physician. However, depending on the complexity of the disease, as well as the availability of information within knowledge bases, not every report ends up with a diagnosis. In many cases, when a sequencing test is unable to detect the underlying genetic cause, clinicians may choose to obtain the raw sequencing data (available as FASTQ, VCF, or BAM files) and perform a more detailed research study or analysis on it, in hopes of untangling some of the complex details associated with the case. However, the decision to share data ultimately rests in the hands of the patient or participant. Sharing sequencing data directly with patients can also be beneficial, especially when a researcher does not have adequate resources to return any clinically actionable information to them. Sharing data directly with individuals makes them feel empowered and gives them better control over the further flow of their confidential information. There are currently several initiatives, such as GenomeConnect and My Research Legacy by the American Heart Association, that are involved in sharing biomedical information for research and health purposes. Although there are several challenges associated with patient-controlled sharing of genomic data, they are not within the scope of the current study.
At present, clinical laboratories either load the data onto hard drives or Universal Serial Bus (USB) drives and ship them to the providers, or transfer the data directly over a secure network. There is currently no standard protocol for transferring sequencing data from laboratories to healthcare providers. Through this study, we aim to describe the current state of the genomic data transfer process between sequencing laboratories and healthcare providers, specifically for data obtained from whole-exome sequencing (WES), and to highlight some of the key lessons learned.
Materials and methods
During the observation period of this study, from April 2016 to December 2016, samples from 122 patients who were admitted to a tertiary pediatric hospital and for whom WES testing was ordered were sent to a genetic laboratory accredited by the College of American Pathologists (CAP) and certified under the Clinical Laboratory Improvement Amendments (CLIA). Since genomic data is considered private and confidential, explicit consent had to be obtained from the patients in order to use their data for research purposes. Nineteen of the 122 patients provided consent to have their WES data transferred from the laboratory to the researchers associated with the provider institution. There are many reasons why patient consent may not be obtained, ranging from a complete lack of interest in research to fear of discriminatory treatment in the event of being diagnosed with a high-risk disease mutation. The workflow, as shown in Figure 1, describes the steps involved, from consenting the patient to receiving data back from the laboratory. For all 19 patients, the consent for WES and the consent for raw data release were obtained on the same day by the same provider. Turnaround time for WES report release is approximately 12 weeks. Once the report is released, the raw data is independently released by the laboratory.
For securely transferring large volumes of health data, the laboratory in this study uses a “Managed File Transfer System” (MFTS), a service that provides finer-grained access and control features than simple Secure File Transfer Protocol (SFTP) clients. The MFTS service uses both SFTP and HyperText Transfer Protocol Secure (HTTPS) underneath for performing data transfers, and users can download the data through a client for either protocol. The FASTQ files are deposited on a laboratory server, where they stay for up to 90 days from the date of upload. The laboratory sends a notification to the provider email address listed on the data release consent form. Validation is performed by comparing the identifier on the notification with the identifier listed on the data release form, to ensure the integrity of the data being downloaded. Healthcare providers are given secure, login-based access to a restricted section of the server containing only the data from their consented patients.
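The two checks described above (the 90-day retention window and the comparison of the notification identifier against the data release form) can be sketched in a few lines of Python. This is only an illustrative sketch, not the laboratory's or the provider's actual procedure: the function names, the normalization rule, and the identifier format are assumptions introduced here.

```python
from datetime import date, timedelta

# From the text: files stay on the laboratory server up to 90 days from upload.
RETENTION_DAYS = 90


def download_deadline(upload_date: date) -> date:
    """Last day on which the files are expected to still be on the server."""
    return upload_date + timedelta(days=RETENTION_DAYS)


def normalize(identifier: str) -> str:
    """Assumed normalization: drop whitespace and ignore letter case."""
    return "".join(identifier.split()).upper()


def may_download(notification_id: str, release_form_id: str,
                 upload_date: date, today: date) -> bool:
    """True only if both identifiers match and the retention window is open."""
    ids_match = normalize(notification_id) == normalize(release_form_id)
    return ids_match and today <= download_deadline(upload_date)
```

A check like this only works when both documents carry the same kind of identifier, which, as discussed later in the article, was not always the case in practice.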
As seen in Table 1, the time taken by the laboratory to process each of the data release requests varied considerably. The “–” in some places is due to missing information on some of the WES report release dates. The average turnaround time from the time of test report release to having the raw data ready for download was around 9.7 weeks, with a maximum of 26 weeks, minimum of one week, and standard deviation of 8.5 weeks. The huge difference in processing times in the early cases compared to those toward the end can be attributed to the improvement in process workflow along the course of this study. When the study began, there was no standardized process in place for sending files over from the laboratory to the healthcare provider. Further, there were no protocols in place for creating specific users for the healthcare provider to access and download the data. However, as the process was repeatedly applied on subsequent cases, there was an iterative improvement to the workflow, as can be seen by the significant decrease in processing times.
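For readers who want to compute the same kind of summary for their own transfer log, the following Python sketch shows one way to do it. The dates are hypothetical placeholders and are not the study's data; cases with a missing report release date (the “–” entries) are skipped, and the sample standard deviation is used, which is an assumption because the text does not say which estimator was applied.

```python
from datetime import date
from statistics import mean, stdev
from typing import Optional

# Hypothetical (report_released, raw_data_ready) pairs; NOT the study's data.
CASES: list[tuple[Optional[date], date]] = [
    (date(2016, 5, 2), date(2016, 10, 31)),    # 26 weeks
    (date(2016, 8, 1), date(2016, 9, 12)),     # 6 weeks
    (None, date(2016, 11, 7)),                 # release date not recorded
    (date(2016, 11, 14), date(2016, 11, 21)),  # 1 week
]


def turnaround_weeks(cases: list[tuple[Optional[date], date]]) -> list[float]:
    """Weeks from report release to data availability, skipping missing dates."""
    return [(ready - released).days / 7
            for released, ready in cases if released is not None]


def summarize(weeks: list[float]) -> dict[str, float]:
    """Mean, min, max, and sample standard deviation (needs >= 2 values)."""
    return {
        "n": len(weeks),
        "mean": mean(weeks),
        "min": min(weeks),
        "max": max(weeks),
        "stdev": stdev(weeks),
    }
```

Calling `summarize(turnaround_weeks(CASES))` returns the four statistics reported in the paragraph above, computed for the placeholder dates.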
Paper-based patient consents obtained by the genetic counselors are physically sent to the genetic laboratory along with the blood or DNA sample, printed medical records, and other appropriate information. We observed challenges in consistently providing all of the required information to the laboratory. First, this manual process is prone to missing information. There were two cases in the current study where patient consent forms were missing yet the data was available for download. Conversely, there was a single case of a patient who provided consent but for whom no data was available for download. Second, each time data becomes available for download, laboratory personnel need to send a manual notification to the provider, which can lead to unnecessary wait times. Third, there are discrepancies between the provider and the laboratory in how a sample is uniquely identified. In this study, the provider's consent forms used patient name and date of birth (DOB), but the sequencing lab assigned a DNA sample number to uniquely identify each patient. One of the data release forms did mention the DNA sample number, but the others used a combination of patient name and DOB. The email notifications sent by the sequencing lab to tell the healthcare provider that the FASTQ files are ready for download also use the DNA sample number as the identifier. It is therefore necessary to verify that the DNA sample number in the data download notification matches the identifier on the consent forms, so that only data with appropriate consent is transferred, and this introduces an additional mapping step. Although the workflow became more robust and the processing times fell significantly toward the end of the study, the process is not completely free of manual intervention.
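The additional mapping step can be made explicit with a small reconciliation routine. The Python sketch below is an assumption-laden illustration, not the workflow used in the study: it presumes that a crosswalk from DNA sample number to patient name and DOB is available, and `PatientKey`, `reconcile`, and the result labels are hypothetical names. It flags the two kinds of mismatch reported above, data without a consent form and consent without data.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PatientKey:
    """Identifier style used on the provider's consent forms (name + DOB)."""
    name: str
    dob: date


def reconcile(consents: set[PatientKey],
              crosswalk: dict[str, PatientKey],
              notified_samples: list[str]) -> dict[str, list]:
    """Sort each notified DNA sample number into one of three bins and
    list consented patients for whom no data has been announced."""
    cleared, no_consent, unmapped = [], [], []
    for sample in notified_samples:
        key = crosswalk.get(sample)
        if key is None:
            unmapped.append(sample)      # cannot be tied to any patient
        elif key in consents:
            cleared.append(sample)       # consent on file: safe to download
        else:
            no_consent.append(sample)    # data announced without consent
    with_data = {crosswalk[s] for s in notified_samples if s in crosswalk}
    waiting = sorted(consents - with_data, key=lambda k: (k.name, k.dob))
    return {
        "cleared": cleared,
        "no_consent": no_consent,
        "unmapped": unmapped,
        "consent_without_data": waiting,
    }
```

Only the samples in `cleared` would be downloaded; everything else would be referred back for manual review.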
The results from this study highlight an urgent need to implement automated systems to improve information exchange between healthcare providers and clinical genetic laboratories. As stated by the American College of Medical Genetics and Genomics (ACMG), genomic data sharing is extremely important for the development of new diagnostic techniques and therapeutics that will ultimately lead to an improvement of patient care and understanding of disease. Patients are also becoming more aware of the importance of genomic data and its impact on health outcomes. Since the ultimate owners of the data are the patients themselves, it is important that they recognize this need so that they can provide the required consent. Repeated sessions of genetic counseling and the widespread information available on the internet have helped educate patients to a considerable extent. Keeping manual control over a process that is likely to be used frequently in the future can lead to unwanted errors. Using the electronic health record (EHR) system to store all this data comes with the advantage that triggers could be set in place to validate all of the incoming and outgoing data as well as send automated notifications. On a related note, since patients often see multiple healthcare providers during their lifetime and have their data shared across multiple provider institutions, an interoperable application programming interface (API) connecting the different systems would also be required in the future. This will eliminate the hassle of writing individual programs for each of the data access requests. To support access to genomic data across multiple systems, existing consortiums such as the Global Alliance for Genomics and Health (GA4GH) provide an interoperable genomics framework that can be accessed through an API.
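As a rough illustration of the automated notification mentioned above, the Python sketch below composes (but does not send) an availability notice using the standard library. The article does not describe any such implementation; the sender address, the message wording, and the function name are hypothetical, and the message deliberately carries only the DNA sample number rather than patient name or DOB.

```python
from email.message import EmailMessage


def build_availability_notice(sample_number: str, provider_email: str,
                              deadline_iso: str) -> EmailMessage:
    """Compose a data-availability notice; sending is left to the caller."""
    msg = EmailMessage()
    msg["From"] = "data-release@lab.example.org"  # hypothetical sender
    msg["To"] = provider_email
    msg["Subject"] = f"Raw sequencing data ready: sample {sample_number}"
    msg.set_content(
        f"FASTQ files for DNA sample {sample_number} are available for download.\n"
        f"Files are expected to be removed after {deadline_iso}.\n"
    )
    return msg
```

In an automated setting, a function like this would be triggered by the upload event itself, removing the wait for laboratory personnel to send the notice by hand.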
Additionally, the Office of the National Coordinator for Health Information Technology (ONC) encourages those involved in health IT to contribute to the development of a defined, shared roadmap leveraging health IT interoperability to ultimately protect and advance healthcare for all.
Similar to how research sequencing data is stored in the centralized repository dbGaP, sequencing laboratories could deposit clinical sequencing data into a similar centralized location and later provide appropriate access to researchers. The genomics community is also looking into the possibility of using a blockchain framework for the seamless sharing of sensitive genomic information. Instead of sharing data with the healthcare providers, who would eventually pass it on to the research community, the sequencing laboratories could also consider sharing the data directly with the patients themselves, who own that data. This way, even if the data needs to be shared with multiple researchers, the patients can take care of that themselves.
The current methods of secure data transfer, mainly by shipping hard drives, can be costly to providers (~150–200 USD). One prospective option is to store data in a centralized cloud and provide access to interested parties in a secure manner. Although the concept of the Health Insurance Portability and Accountability Act (HIPAA)-compliant cloud is slowly coming into existence, maintaining security and privacy of genomic data in the cloud still remains an outstanding question for many organizations.
In conclusion, there is massive potential to leverage genomic data to advance human health overall. The medical community needs to be able to share genomic data to achieve better patient outcomes. Our study highlights some of the hurdles that can be encountered, and some potential ways to address them, on the path to secure and efficient genomic data transfer and sharing.
ACMG, American College of Medical Genetics and Genomics
API, application programming interface
CAP, College of American Pathologists
CLIA, Clinical Laboratory Improvement Amendments
EHR, electronic health record
GA4GH, Global Alliance for Genomics and Health
HIPAA, Health Insurance Portability and Accountability Act
HTTPS, HyperText Transfer Protocol Secure
MFTS, managed file transfer system
MRN, medical record number
ONC, Office of the National Coordinator for Health Information Technology
SFTP, Secure File Transfer Protocol
WES, whole-exome sequencing
The authors wish to thank Ashley Kubatko for project management.
RS, YH, and SL conceived and designed the study. MP, SH, TJ, and DM obtained consent from patients and worked on obtaining the required data for the study. RS, YH, KM, and SL drafted the manuscript. All authors read, edited, and approved the final manuscript as written.
Funding for this study was provided by SL institutional faculty start-up funding at the Research Institute of Nationwide Children's Hospital.
All 19 patients whose data has been used as part of this study consented for their data to be used for research studies. Since this is just a Quality Improvement (QI) project, there was no requirement to pass through the ethics committee. There was no analysis or manipulation done to data from any of the patients.
Conflict of interest statement
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
- Ginsburg, G. (2014). "Medical genomics: Gather and use genetic data in health care". Nature 508 (7497): 451–3. doi:10.1038/508451a. PMID 24765668.
- Middleton, A.; Wright, C.F.; Morley, K.I. et al. (2015). "Potential research participants support the return of raw sequence data". Journal of Medical Genetics 52 (8): 571–4. doi:10.1136/jmedgenet-2015-103119. PMC4518751. PMID 25995218. http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pmcentrez&artid=PMC4518751.
- Shabani, M.; Vears, D.; Borry, P. (2018). "Raw Genomic Data: Storage, Access, and Sharing". Trends in Genetics 34 (1): 8–10. doi:10.1016/j.tig.2017.10.004. PMID 29132689.
- Miller, K.E.; Lin, S.M. (2017). "Addressing a patient-controlled approach for genomic data sharing". Genetics in Medicine 19 (11): 1280–1. doi:10.1038/gim.2017.36. PMID 28425983.
- "Explanation of the FTP and SFTP protocols". Know-how - Wise-FTP. AceBIT GmbH. https://www.wise-ftp.com/know-how/ftp_and_sftp.htm.
- ACMG Board of Directors (2017). "Laboratory and clinical genomic data sharing is crucial to improving genetic health care: a position statement of the American College of Medical Genetics and Genomics". Genetics in Medicine 19 (7): 721–2. doi:10.1038/gim.2016.196. PMID 28055021.
- Morgan, T.; Schmidt, J.; Haakonsen, C. et al. (2014). "Using the internet to seek information about genetic and rare diseases: A case study comparing data from 2006 and 2011". JMIR Research Protocols 3 (1): e10. doi:10.2196/resprot.2916. PMC3961701. PMID 24565858. http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pmcentrez&artid=PMC3961701.
- Global Alliance for Genomics and Health (2016). "GENOMICS. A federated ecosystem for sharing genomic, clinical data". Science 352 (6291): 1278–80. doi:10.1126/science.aaf6162. PMID 27284183.
- Office of the National Coordinator for Health Information Technology (October 2015). "Connecting Health and Care for the Nation: A Shared Nationwide Interoperability Roadmap" (PDF). https://www.healthit.gov/sites/default/files/hie-interoperability/nationwide-interoperability-roadmap-final-version-1.0.pdf.
- Tryka, K.A.; Hao, L.; Sturcke, A. et al. (2014). "NCBI's Database of Genotypes and Phenotypes: dbGaP". Nucleic Acids Research 42 (DB1): D975–9. doi:10.1093/nar/gkt1211. PMC3965052. PMID 24297256. http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pmcentrez&artid=PMC3965052.
This presentation is faithful to the original, with only a few minor changes to presentation. In some cases important information was missing from the references, and that information was added. The original paper listed references alphabetically; this wiki lists them by order of appearance, by design. The sole footnote was turned into an inline reference for convenience.