Journal:Design, implementation and operation of a multimodality research imaging informatics repository
|Full article title||Design, implementation and operation of a multimodality research imaging informatics repository|
|Journal||Health Information Science and Systems|
|Author(s)||Nguyen, Toan D.; Raniga, Parnesh; Barnes, David G.; Egan, Gary F.|
|Author affiliation(s)||Monash University, CSIRO|
|Primary contact||Email: firstname.lastname@example.org|
|Volume and issue||3 (Suppl 1)|
|Distribution license||Creative Commons Attribution 4.0 International|
Background: Biomedical imaging research increasingly involves acquiring, managing and processing large amounts of distributed imaging data. Integrated systems that combine data, meta-data and workflows are crucial for realising the opportunities presented by advances in imaging facilities.
Methods: This paper describes the design, implementation and operation of a multi-modality research imaging data management system that manages imaging data obtained from biomedical imaging scanners operated at Monash Biomedical Imaging (MBI), Monash University in Melbourne, Australia. In addition to Digital Imaging and Communications in Medicine (DICOM) images, raw data and non-DICOM biomedical data can be archived and distributed by the system. Imaging data are annotated with meta-data according to a study-centric data model and, therefore, scientific users can find, download and process data easily.
Results: The research imaging data management system ensures long-term usability, integrity, inter-operability and integration of large imaging data. Research users can securely browse and download stored images and data, and upload processed data via subject-oriented informatics frameworks including the Distributed and Reflective Informatics System (DaRIS) and the Extensible Neuroimaging Archive Toolkit (XNAT).
Modern clinical and biomedical research is increasingly reliant on imaging across a range of electromagnetic and acoustic wavelengths. Contemporary studies now routinely collect images from more than one type of instrumentation - multi-modal studies - and strive to obtain high spatial and/or temporal resolution data. Multi-modal datasets provide complementary information and enable sophisticated, multivariate analysis, while high-resolution datasets provide insight that was not possible only a few years ago. Extremely large multi-modal imaging studies can result in terabyte (TB) size data collections, although most research studies generate data in the megabyte (MB) to gigabyte (GB) range per subject.
The data volume per subject is multiplied by the increasing number of subjects per study. Many of today's high profile biomedical imaging studies have hundreds to thousands of participants. Furthermore, many of these studies are longitudinal in nature and thus collect imaging data at multiple time points per subject. This multiplier effect results in a large collection of data that must be recorded per subject. Along with the imaging data, non-imaging data and meta-data may also be collected and should be stored and directly associated with the image data, especially if the data will be mined and/or shared.
Clinical informatics systems such as clinical picture archiving and communication systems (PACS) are commonplace, but they are designed specifically for clinical settings, which precludes their effective use in a research environment. For example, the majority of PACS store data only in the Digital Imaging and Communications in Medicine (DICOM) format. The DICOM format consists of a binary header of tag/value pairs. Each tag (a pair of 2-byte group and element numbers) is a key, but the descriptions of tags are stored independently in DICOM dictionaries and not in the data itself. The type of the value is contained in the tag/value pair, which enables the accurate reading of the data and meta-data. Binary data, such as the image itself, is also stored as a tag/value pair.
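The tag/value layout described above can be illustrated with a short Python sketch. It assumes the explicit VR little-endian encoding, in which each tag is a pair of 16-bit group and element numbers followed by a two-character value type and a length, and it handles only value types that use a 2-byte length field. The helper names and the two-entry dictionary are hypothetical, for illustration only; a real reader would rely on an established DICOM library.

```python
import struct

# Tiny stand-in for a DICOM data dictionary: the description of a tag lives
# here, not in the file. Real dictionaries hold thousands of entries.
DICTIONARY = {
    (0x0008, 0x0060): "Modality",
    (0x0010, 0x0010): "Patient's Name",
}


def encode_element(group, element, vr, value):
    """Encode one element whose value type (VR) uses a 2-byte length field."""
    if len(value) % 2:
        value += b" "  # text values are padded to an even length
    header = struct.pack("<HH2sH", group, element, vr.encode("ascii"), len(value))
    return header + value


def decode_elements(buffer):
    """Yield ((group, element), vr, value) for each element in the buffer."""
    offset = 0
    while offset < len(buffer):
        group, element, vr, length = struct.unpack_from("<HH2sH", buffer, offset)
        offset += 8  # 2 + 2 bytes of tag, 2 bytes of VR, 2 bytes of length
        value = buffer[offset:offset + length]
        offset += length
        yield (group, element), vr.decode("ascii"), value


# Build a two-element header and read it back, looking up each description
# in the separate dictionary.
header = (encode_element(0x0008, 0x0060, "CS", b"MR")
          + encode_element(0x0010, 0x0010, "PN", b"DOE^JANE"))
for tag, vr, value in decode_elements(header):
    print(DICTIONARY.get(tag, "unknown tag"), vr, value)
```

The file carries only the numeric tag, the value type and the value; without the dictionary the reader can still parse each element but cannot name it.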
Neuroimaging processing and analysis are, however, typically conducted using a myriad of proprietary formats such as MINC, the MRtrix image format (mif) and the FreeSurfer file format (mgh). Recently, the Neuroimaging Informatics Technology Initiative (NIfTI) has provided a reference file format that is starting to become universally accepted and utilised. Non-DICOM formats came into use because, traditionally, DICOM data for imaging modalities were stored as a single 2D image per file. For large 3D datasets, this means a lot of repetition of meta-data and slow reading of the data. The newer DICOM 3.0 format has alleviated some of these performance issues but at the expense of simplicity. Moreover, the DICOM standards define a set of required meta-data based on the acquisition modality. Many of these required fields do not make sense for processed data, and other relevant meta-data would need to be stored as DICOM tags which may not be understandable by all software.
The limitations of a single supported image format notwithstanding, it is not possible to keep track of and provide provenance for processed image or sensor data, which is usually in non-DICOM formats such as the NIfTI format. While some proprietary formats include support for meta-data by storing key/value pairs, no such ability is present in NIfTI, for example. Examples of such meta-data include the diffusion direction table that was used to acquire diffusion magnetic resonance imaging (MRI) data, control/tag/reference flags for arterial spin labelling images, and various reference images and parameters for magnetic resonance (MR) spectroscopy data. Most of this meta-data is encoded as tag/value pairs in the DICOM header but is lost on conversion to other formats. Other types of meta-data include descriptions of the type of data (e.g. brain gray matter segmentation) and of the tools and/or pipelines that generated the data. Typically the latter is done by utilising common naming conventions. However, this can lead to ambiguity if not all users and tools implement the convention. Moreover, only a limited amount of information can be stored in this way.
Furthermore, the most commonly used DICOM data model is a subject (patient)-centric model. While the DICOM standard allows for a data model that is project-centric, such as the clinical trial information entity, in practice PACS usually do not support this feature. The patient- or subject-centric model in DICOM has been developed with the clinic in mind. Each subject/patient is assumed to be independent of the others, with little in common, and it is not possible to group subjects together. Moreover, the DICOM model does not inherently support the idea of longitudinal studies in which the same patient is scanned repeatedly, some time interval apart. The ability to organise and quickly access data based on a project-centric data model is essential to research applications, which are project-centric by nature.
Apart from the need to store acquired data, research projects require the storage of post-processed data. The raw data is put through various automated and semi-automated algorithms to produce images as well as other data types and statistics. A description of all the processing steps and parameters needs to be stored with the data in order to keep track of how the final data was obtained. This provenance information is also crucial for keeping track of changes that may have occurred between processing runs, as well as for searching across projects for data that may have been similarly acquired and processed.
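As a minimal sketch of what such a provenance description might look like, the following Python example records each processing step with its tool, version and parameters, ties the record to the raw data through a checksum, and compares two processing runs. The class names, field names and the `brain_extraction` tool are hypothetical; they are not taken from any particular informatics platform.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ProcessingStep:
    tool: str          # name of the algorithm or pipeline stage (hypothetical)
    version: str       # version of the tool that was run
    parameters: dict   # parameters passed to the tool


@dataclass
class ProvenanceRecord:
    input_sha256: str                       # checksum identifying the raw data
    steps: list = field(default_factory=list)

    def add_step(self, tool, version, **parameters):
        self.steps.append(ProcessingStep(tool, version, parameters))

    def to_json(self):
        """Serialise the record so it can be stored alongside the data."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)


def describe_input(raw_bytes):
    """Return a checksum that ties the record to one specific input."""
    return hashlib.sha256(raw_bytes).hexdigest()


def changed_steps(first, second):
    """Return the indices at which two runs differ in tool, version or parameters."""
    length = max(len(first.steps), len(second.steps))
    return [i for i in range(length)
            if i >= len(first.steps)
            or i >= len(second.steps)
            or first.steps[i] != second.steps[i]]


run_a = ProvenanceRecord(describe_input(b"raw image bytes"))
run_a.add_step("brain_extraction", "1.0", threshold=0.5)
run_b = ProvenanceRecord(describe_input(b"raw image bytes"))
run_b.add_step("brain_extraction", "1.0", threshold=0.6)
print(run_a.to_json())
print(changed_steps(run_a, run_b))
```

Because every step is stored in a structured form rather than in a file name, two runs over the same input can be compared step by step, and records can be searched by tool or parameter.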
The need for raw data collection and management, as well as the accurate recording of the provenance of processed data, in large biomedical imaging research studies has resulted in the recent development of software packages that, unlike clinical picture archiving and communication systems (PACS), are designed specifically for research studies. Along with the collection and storage of the primary data, these systems have been designed to store processed data as well as provenance information regarding the processing steps, although tight integration of the provenance information within the informatics platform is still under active research and development. Currently, in many such systems, provenance information is just another piece of optional meta-data. Its formatting and contents are up to the users. With tight integration, the provenance information would be required, would follow a known format and would be ingestible by the system. The difficulty is that no universal standard for provenance in medical imaging exists. Processed data storage and access is a critically important area since processed datasets can be many tens of times larger than the original dataset and, in many cases, are expensive to recompute.
While informatics platforms for medical imaging are available, implementing an informatics strategy at a research-focused imaging facility is a challenging task. It depends on integrating acquisition systems (modalities) with good imaging informatics practice realised as a data model-based system, underpinned by archival-grade data storage infrastructure, and complete with functional and practical user interfaces. Most of the informatics platforms are oriented around the Project-Subject-Study-Data (PSSD) model but differ slightly in their implementation details and access methods. In this paper we describe the implementation of the informatics systems and data flows at the Monash Biomedical Imaging (MBI) facility at Monash University. Moreover, we describe how we developed a set of tools and standard practices that have enabled the efficient storage and access of biomedical imaging data.
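The Project-Subject-Study-Data hierarchy can be sketched as a set of nested Python data classes. This is an illustrative layout only: the class and field names are assumptions and do not reproduce the schema of DaRIS, XNAT or any other platform. It shows how a project groups its subjects and how repeated studies of one subject form a longitudinal timeline.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class DataSet:
    name: str
    file_format: str  # e.g. "DICOM" or "NIfTI"


@dataclass
class Study:
    modality: str
    acquired_on: date
    datasets: list = field(default_factory=list)


@dataclass
class Subject:
    subject_id: str
    studies: list = field(default_factory=list)

    def timeline(self):
        """Return this subject's studies in order of acquisition date."""
        return sorted(self.studies, key=lambda study: study.acquired_on)


@dataclass
class Project:
    name: str
    subjects: dict = field(default_factory=dict)

    def add_study(self, subject_id, study):
        """Attach a study to a subject, creating the subject if needed."""
        subject = self.subjects.setdefault(subject_id, Subject(subject_id))
        subject.studies.append(study)
        return subject

    def studies_by_modality(self, modality):
        """Return (subject_id, study) pairs across the whole project."""
        return [(subject_id, study)
                for subject_id, subject in self.subjects.items()
                for study in subject.studies
                if study.modality == modality]


project = Project("example project")
project.add_study("S001", Study("MR", date(2014, 6, 1), [DataSet("t1", "DICOM")]))
project.add_study("S001", Study("MR", date(2013, 6, 1), [DataSet("t1", "DICOM")]))
project.add_study("S002", Study("PET", date(2014, 1, 1)))
print([study.acquired_on for study in project.subjects["S001"].timeline()])
print(len(project.studies_by_modality("MR")))
```

Placing the project at the top of the hierarchy makes cross-subject queries a simple traversal, which a purely patient-centric model does not offer.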
- Lauterbur, P.C. (1973). "Image Formation by Induced Local Interactions: Examples Employing Nuclear Magnetic Resonance". Nature 242 (5394): 190-1. doi:10.1038/242190a0.
- Wang, X.; Pang, Y.; Ku, G.; Xie, X.; Stoica, G.; Wang, L.V. (2003). "Noninvasive laser-induced photoacoustic tomography for structural and functional in vivo imaging of the brain". Nature Biotechnology 21 (7): 803-6. doi:10.1038/nbt839. PMID 12808463.
- Ledley, R.S.; Di Chiro, G.; Luessenhop, A.J.; Twigg, H.L. (1974). "Computerized transaxial x-ray tomography of the human body". Science 186 (4160): 207-12. doi:10.1126/science.186.4160.207. PMID 4606376.
- Pichler, B.J.; Kolb, A.; Nägele, T.; Schlemmer, H.-P. (2010). "PET/MRI: Paving the Way for the Next Generation of Clinical Multimodality Imaging Applications". Journal of Nuclear Medicine 51 (3): 333-6. doi:10.2967/jnumed.109.061853. PMID 20150252.
- Cherry, S.R. (2009). "Multimodality Imaging: Beyond PET/CT and SPECT/CT". Seminars in Nuclear Medicine 39 (5): 348-53. doi:10.1053/j.semnuclmed.2009.03.001. PMC PMC2735449. PMID 19646559. http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pmcentrez&artid=PMC2735449.
- Amunts, K.; Lepage, C.; Borgeat, L. et al. (2013). "BigBrain: An Ultrahigh-Resolution 3D Human Brain Model". Science 340 (6139): 1472-5. doi:10.1126/science.1235381. PMID 23788795.
- Ellis, K.A.; Bush, A.I.; Darby, D. et al. (2009). "The Australian Imaging, Biomarkers and Lifestyle (AIBL) study of aging: methodology and baseline characteristics of 1112 individuals recruited for a longitudinal study of Alzheimer's disease". International Psychogeriatrics 21 (4): 672-87. doi:10.1017/S1041610209009405. PMID 19470201.
- Mueller, S.G.; Weiner, M.W.; Thal, L.J. et al. (2005). "Ways toward an early diagnosis in Alzheimer's disease: the Alzheimer's Disease Neuroimaging Initiative (ADNI)". Alzheimer's & Dementia 1 (1): 55–66. doi:10.1016/j.jalz.2005.06.003. PMC PMC1864941. PMID 17476317. http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pmcentrez&artid=PMC1864941.
- Hofman, A.; Breteler, M.M.B.; van Duijn, C.M. et al. (2009). "The Rotterdam Study: 2010 objectives and design update". European Journal of Epidemiology 24 (9): 553-72. doi:10.1007/s10654-009-9386-z. PMC PMC2744826. PMID 19728115. http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pmcentrez&artid=PMC2744826.
- Linkert, M.; Rueden, C.T.; Allan, C. et al. (2010). "Metadata matters: access to image data in the real world". Journal of Cell Biology 189 (5): 777-82. doi:10.1083/jcb.201004104. PMC PMC2878938. PMID 20513764. http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pmcentrez&artid=PMC2878938.
- Bryan, S.; Weatherburn, G.C.; Watkins, J.R.; Buxton, M.J. et al. (1999). "The benefits of hospital-wide picture archiving and communication systems: a survey of clinical users of radiology services". British Journal of Radiology 72 (857): 469-78. doi:10.1259/bjr.72.857.10505012. PMID 10505012.
- "BIC - The McConnell Brain Imaging Centre: Home Page". McConnell Brain Imaging Centre. http://www.bic.mni.mcgill.ca/ServicesSoftware/HomePage. Retrieved 17 June 2014.
- Tournier, J.-D.; Calamante, F.; Connelly, A. et al. (2012). "MRtrix: Diffusion tractography in crossing fiber regions". International Journal of Imaging Systems and Technology 22 (1): 53–66. doi:10.1002/ima.22005.
- "FreeSurfer". Harvard University. http://surfer.nmr.mgh.harvard.edu/. Retrieved 17 June 2014.
- "NIfTI: Neuroimaging Informatics Technology Initiative". Data Format Working Group. http://nifti.nimh.nih.gov/. Retrieved 17 June 2014.
- Marcus, D.S.; Olsen, T.R.; Ramaratnam, M.; Buckner, R.L. et al. (2007). "The Extensible Neuroimaging Archive Toolkit: an informatics platform for managing, exploring, and sharing neuroimaging data". Neuroinformatics 5 (1): 11-34. doi:10.1385/NI:5:1:11. PMID 17426351.
- Lohrey, J.M.; Killeen, N.E.B.; Egan, G.F. et al. (2009). "An integrated object model and method framework for subject-centric e-Research applications". Frontiers in Neuroinformatics 3: 19. doi:10.3389/neuro.11.019.2009. PMC PMC2715266. PMID 19636389. http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pmcentrez&artid=PMC2715266.
- Book, G.A.; Anderson, B.M.; Stevens, M.C.; Glahn, D.C.; Assaf, M.; Pearlson, G.D. (2013). "Neuroinformatics Database (NiDB)--a modular, portable database for the storage, analysis, and sharing of neuroimaging data". Neuroinformatics 11 (4): 495-505. doi:10.1007/s12021-013-9194-1. PMC PMC3864015. PMID 23912507. http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pmcentrez&artid=PMC3864015.
- Van Horn, J.D.; Toga, A.W. (2009). "Is it Time to Re-Prioritize Neuroimaging Databases and Digital Repositories?". NeuroImage 47 (4): 1720-34. doi:10.1016/j.neuroimage.2009.03.086. PMC PMC2754579. PMID 19371790. http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pmcentrez&artid=PMC2754579.
- MacKenzie-Graham, A.J.; Van Horn, J.D.; Woods, R.P.; Crawford, K.L.; Toga, A.W. (2008). "Provenance in neuroimaging". NeuroImage 42 (1): 178-95. doi:10.1016/j.neuroimage.2008.04.186. PMC PMC2664747. PMID 18519166. http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pmcentrez&artid=PMC2664747.
This presentation is faithful to the original, with only a few minor changes to presentation. In some cases important information was missing from the references, and that information was added. In one case a direct reference to a citation number was changed to reference the author names instead.