Difference between revisions of "User:Shawndouglas/Sandbox"

From LIMSWiki
{{ombox
| type      = notice
| style    = width: 960px;
| text      = This is my primary sandbox page, where I play with features and test MediaWiki code. If you wish to leave a comment for me, please see [[User_talk:Shawndouglas|my discussion page]] instead.<p></p>
}}


==Sandbox begins below==
{{Infobox journal article
|name        =
|image        =
|alt          = <!-- Alternative text for images -->
|caption      =
|title_full  = Developing a file system structure to solve healthcare big data storage and archiving problems using a distributed file system
|journal      = ''Applied Sciences''
|authors      = Ergüzen, Atilla; Ünver, Mahmut
|affiliations = Kırıkkale University
|contact      = Email: munver at kku dot edu dot tr
|editors      =
|pub_year    = 2018
|vol_iss      = '''8'''(6)
|pages        = 913
|doi          = [https://doi.org/10.3390/app8060913 10.3390/app8060913]
|issn        = 2076-3417
|license      = [http://creativecommons.org/licenses/by/4.0/ Creative Commons Attribution 4.0 International]
|website      = [http://www.mdpi.com/2076-3417/8/6/913/htm http://www.mdpi.com/2076-3417/8/6/913/htm]
|download    = [http://www.mdpi.com/2076-3417/8/6/913/pdf http://www.mdpi.com/2076-3417/8/6/913/pdf] (PDF)
}}
{{ombox
| type      = content
| style    = width: 500px;
| text      = This article should not be considered complete until this message box has been removed. This is a work in progress.
}}
==Abstract==
Recently, the use of the internet has become widespread, increasing the use of mobile phones, tablets, computers, internet of things (IoT) devices, and other digital sources. In the healthcare sector, with the help of next-generation digital medical equipment, this digital world has also tended to grow unpredictably, such that nearly 10 percent of global data is healthcare-related, and that data continues to grow faster than data in other sectors. This progress has greatly enlarged the amount of data produced, which can no longer be managed with conventional methods. In this work, an efficient model for the storage of medical images using a distributed file system structure has been developed. The result is a robust, available, scalable, and serverless solution structure, especially suited to storing large amounts of data in the medical field. Furthermore, the system achieves a high level of security through the use of static Internet Protocol (IP) addresses, user credentials, and synchronously encrypted file contents. Among the most important features of the system are high performance and easy scalability. As a result, the system can work with fewer hardware elements and be more robust than others that use name node architecture. According to the test results, the designed system performs 97% better than a Not Only Structured Query Language (NoSQL) system, 80% better than a relational database management system (RDBMS), and 74% better than an operating system (OS).
'''Keywords''': big data, distributed file system, health data, medical imaging
==Introduction==
In recent years, advances in information technology have increased worldwide; internet usage has exponentially accelerated the amount of data generated in all fields. The number of internet users was 16 million in 1995. This number reached 304 million in 2000, 888 million in 2005, 1.996 billion in 2010, 3.270 billion in 2015, and 3.885 billion in 2017.<ref name="ILS">{{cite web |url=http://www.internetlivestats.com/ |title=Internet Live Stats |work=InternetLiveStats.com |accessdate=16 July 2016}}</ref><ref name="KempDigital16">{{cite web |url=https://wearesocial.com/uk/special-reports/digital-in-2016 |title=Digital in 2016 |author=Kemp, S. |work=We Are Social |publisher=We Are Social Ltd |date=27 January 2016 |accessdate=27 June 2016}}</ref><ref name="IWS">{{cite web |url=https://www.internetworldstats.com/emarketing.htm |title=Internet Growth Statistics |work=Internet World Stats |publisher=Miniwatts Marketing Group |accessdate=21 May 2018}}</ref> Every day, 2.5 exabytes (EB) of data are produced worldwide. Also, 90% of globally generated data has been produced since 2015. The data generated are in many different fields such as aviation, meteorology, IoT applications, health, and energy sectors. Likewise, the data produced through social media has reached enormous volumes. Not only did Facebook.com store 600 terabytes (TB) of data a day in 2014, but Google also processed hundreds of petabytes (PB) of data per day in the same year.<ref name="VagataScaling14">{{cite web |url=https://code.facebook.com/posts/229861827208629/scaling-the-facebook-data-warehouse-to-300-pb/ |title=Scaling the Facebook data warehouse to 300 PB |author=Vagata, P.; Wilfong, K. |work=Facebook Code |publisher=Facebook |date=10 April 2014 |accessdate=27 June 2016}}</ref><ref name="DhavalchandraBig16">{{cite journal |title=Big data—A survey of big data technologies |journal=International Journal Of Science Research and Technology |author=Dhavalchandra, P.; Jignasu, M.; Amit, R. 
|volume=2 |issue=1 |pages=45–50 |year=2016 |url=http://www.ijsrt.us/vol2issue1.aspx}}</ref> Data production has also increased at a remarkable rate in the healthcare sector, triggered by the widespread use of digital medical imaging peripherals. The data generated in the healthcare sector has reached such a point that it cannot be managed easily with traditional data management tools and hardware. Healthcare has accumulated a large volume of data by keeping patients’ records, creating medical imaging that helps doctors with diagnoses, outputting digital files from various devices, and creating and storing the results of different surveys. Different types of data sources produce data in various structured and unstructured formats; examples include patient information, laboratory results, X-ray devices, computed tomography (CT) devices, and magnetic resonance imaging (MRI). The world population and average human lifespan are increasing continuously, which means an exponential increase in the number of patients to be served. As the number of patients increases, the amount of collected data also increases dramatically. Additionally, extensively used digital healthcare devices produce higher-density graphical outputs that add readily to the growing body of data. In 2011, the amount of data in the healthcare sector in the U.S. reached 150 EB. By 2013, it appeared to have reached 153 EB. It is estimated that this number will reach 2.3 ZB by 2020. For example, [[electronic medical record]] (EMR) use increased 31% from 2001 to 2005 and more than 50% from 2005 to 2008.<ref name="DeanReview09">{{cite journal |title=Review: Use of electronic medical records for health outcomes research: A literature review |journal=Medical Care Research and Review |author=Dean, B.B.; Lam, J.; Natoli, J.L. et al. 
|volume=66 |issue=6 |pages=611–38 |year=2009 |doi=10.1177/1077558709332440 |pmid=19279318}}</ref><ref name="ErgüzenMedical17">{{cite journal |title=Medical Image Archiving System Implementation with Lossless Region of Interest and Optical Character Recognition |journal=Journal of Medical Imaging and Health Informatics |author=Ergüzen, A.; Erdal, E. |volume=7 |issue=6 |pages=1246–1252 |year=2017 |doi=10.1166/jmihi.2017.2156}}</ref> While neuroimaging data volumes were approximately 200 GB per year between 1985 and 1989, they rose to 5 PB annually between 2010 and 2014, yet another indicator of the increase in data in the healthcare sector.<ref name="DinovVolume16">{{cite journal |title=Volume and Value of Big Healthcare Data |journal=Journal of Medical Statistics and Informatics |author=Dinov, I.D. |volume=4 |page=3 |year=2016 |doi=10.7243/2053-7662-4-3 |pmid=26998309 |pmc=PMC4795481}}</ref>
Consequently, new problems have emerged from the increasing volume of data generated in all fields at the global level. Storing and analyzing the data now pose substantial challenges, and storing data has become costlier than gathering it.<ref name="ElgendyBig14">{{cite book |chapter=Big Data Analytics: A Literature Review Paper |title=Advances in Data Mining. Applications and Theoretical Aspects |series=Lecture Notes in Computer Science |author=Elgendy, N.; Elragal, A. |editor=Perner, P. |publisher=Springer |volume=8557 |year=2014 |isbn=9783319089768 |doi=10.1007/978-3-319-08976-8_16}}</ref> Thus, the amount of data that is produced, stored, and manipulated has increased dramatically, and because of this increase, the fields of big data and data science have begun to develop.<ref name="GürsakalBüyük14">{{cite book |title=Büyük Veri |author=Gürsakal, N. |publisher=Dora Yayıncılık |year=2014 |page=2}}</ref> Big data refers to the variety, velocity, and volume of data; for healthcare records, finding an acceptable approach that addresses all three is particularly difficult.
Based on the preceding discussion, the big data problems in healthcare that this study addresses are as follows:
:1. Increasing number of patients: The global population and average human lifespan are increasing. For example, in Turkey, the number of visits to a physician has increased by about 4% per year since 2012.<ref name="BaşaraSağlık16">{{cite web |url=http://www.deik.org.tr/contents-fileaction-15401 |title=Sağlık İstatistikleri Yıllığı 2016 Haber Bülteni |author=Başara, B.B.; Güler, C. |publisher=Republic of Turkey Ministry of Health General Directorate for Health Research |date=2017}}</ref> Moreover, the number of per capita visits to a physician in healthcare facilities was 8.6 in 2016, up from 8.2 in 2012. As the number of patients increases, the amount of collected data that must be managed also increases dramatically.
:2. Technological devices: Extensively used digital healthcare devices create high-resolution graphical outputs, which means huge amounts of data to be stored.
:3. Expert personnel needs: Managing big data in institutions with software platforms such as Hadoop, Spark, Kubernetes, and Elasticsearch requires qualified information technology specialists to deploy and manage these solutions.<ref name="RaghupathiBig14">{{cite journal |title=Big data analytics in healthcare: Promise and potential |journal=Health Information Science and Systems |author=Raghupathi, W.; Raghupathi, V. |volume=2 |page=3 |year=2014 |doi=10.1186/2047-2501-2-3 |pmid=25825667 |pmc=PMC4341817}}</ref>
:4. Small file size problem: Current solutions for healthcare, including Hadoop-based solutions, have a block size of 64 MB (detailed in the next section). When files are much smaller than a block, this leads to poor performance and unnecessary storage usage, called "internal fragmentation," that is difficult to resolve.
:5. [[Hospital information system]]s (HIS): These systems represent comprehensive software and related tools that help healthcare providers produce, store, fetch, and exchange patient [[information]] more efficiently and enable better patient tracking and care. The HIS must have essential non-functional properties like (a) robustness, (b) performance, (c) scalability, and (d) availability. These properties depend fundamentally on the data management architecture, which includes configured hardware devices and installed software tools. A HIS is expected to solve big data problems on its own, although it is much more than an IT project or a traditional application. As such, third-party software tools are needed to achieve the objectives of healthcare providers.
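The internal fragmentation described in item 4 can be illustrated with a little arithmetic. The following is a minimal sketch, not the authors' method: it assumes a storage layer that reserves whole fixed-size blocks for every file, and the workload (1,000 images of 2 MiB each) and the function names are hypothetical values chosen only for illustration.

```python
MIB = 1024 * 1024


def allocated_bytes(file_size: int, block_size: int) -> int:
    """Bytes reserved when a file must occupy whole fixed-size blocks (assumption)."""
    blocks = (file_size + block_size - 1) // block_size  # integer ceiling division
    return blocks * block_size


def internal_fragmentation(file_sizes, block_size):
    """Return (used bytes, allocated bytes, wasted fraction) for a set of files."""
    used = sum(file_sizes)
    allocated = sum(allocated_bytes(size, block_size) for size in file_sizes)
    wasted = (allocated - used) / allocated if allocated else 0.0
    return used, allocated, wasted


# Hypothetical workload: 1,000 medical images of 2 MiB each, 64 MiB blocks.
images = [2 * MIB] * 1000
used, allocated, wasted = internal_fragmentation(images, 64 * MIB)
print(f"used={used // MIB} MiB, allocated={allocated // MIB} MiB, wasted={wasted:.1%}")
```

Under these assumptions, almost all of the reserved space is padding, which is why a block size tuned for very large files fits a workload of many small images poorly.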
This study seeks to provide a mid-layer software platform that helps address these healthcare gaps. Specifically, we have implemented a server-cluster platform to store and retrieve digital health image data. It acts as a bridge between the HIS and various hardware resources located on the network. This study has five primary aims:
:1. to overcome growing data problems by implementing a distributed data layer between the HIS and server-cluster platform;
:2. to reduce operational costs, with no need to employ IT specialists to install and to deploy popular big data solutions;
:3. to implement a new distributed file system architecture to achieve non-functional properties like performance, security, and scalability, which are of crucial importance for a HIS;
:4. to show and prove that there can be different successful big data solutions; and, especially,
:5. to close these gaps efficiently for our university HIS.
The first section of this study describes general data processing methods. The second discusses related work and literature on the subject, while the third, the materials and methods section, describes the implemented approach. The last section presents the conclusions drawn from evaluating our work.
==Big data architecture in medicine==
Big data solutions in healthcare worldwide fall primarily into three categories.
The first is a database system, which has two popular application architectures: relational database management systems (RDBMS) and NoSQL database systems. RDBMSs, as the most widely known and used systems for this purpose, store data in a structured format. The data to be processed must be of the appropriate type and format. In these systems, a single database can serve multiple users and applications. Since these systems are built to grow vertically, the data structure must be defined in advance. They also enforce constraints such as atomicity, consistency, isolation, and durability. The strict rules that made these systems indispensable are beginning to be questioned today. Moreover, due to the hardware and software used, the initial installation costs are high. Especially when the volume of data increases, horizontal scaling becomes quite unsatisfactory and difficult to manage, which is a major reason they are not part of an overall big data solution. These systems are also more complex than file systems, which makes them poorly suited to big data. Because RDBMSs are deficient at managing big data, NoSQL database systems have emerged as an alternative. The main purpose of these systems is to store the increasing volume of unstructured data associated with the internet and to respond to the needs of high-traffic systems via unstructured or semi-structured formats. NoSQL databases provide higher accessibility than RDBMSs, and their data are easily scaled horizontally.<ref name="KleinApp15">{{cite journal |title=Application-Specific Evaluation of No SQL Databases |journal=Proceedings of the 2015 IEEE International Congress on Big Data |author=Klein, J.; Gorton, I.; Ernst, N. et al. |page=83 |year=2015 |doi=10.1109/BigDataCongress.2015.83}}</ref> Their read and write performance may also be better than that of RDBMSs. 
One of their most important features is that they are horizontally expandable. Thousands of servers can work together as a cluster and operate on big data. They are easy to program and manage due to their flexible structures. Another feature of these systems is that they perform grid computing in clusters that consist of many machines connected to a network; in this way, data processing speeds increase. However, NoSQL does not yet have data security features as advanced as those of RDBMSs. Some NoSQL projects also lack documentation and professional technical support. Finally, the concept of "transactions" is not available in NoSQL database systems, meaning loss of data may occur, so they are not suitable for use in banking and financial systems.<ref name="DavazNoSQL14">{{cite web |url=https://blog.kodcu.com/2014/03/nosql-nedir-avantajlari-ve-dezavantajlari-hakkinda-bilgi/ |title=NoSQL Nedir Avantajları ve Dezavantajları Hakkında Bilgi |author=Davaz, S. |work=Kodcu Blog |publisher=Kodcu |date=28 March 2014 |accessdate=13 June 2017}}</ref>
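To make the role of transactions concrete, here is a minimal sketch of the atomicity guarantee attributed above to RDBMSs. It uses Python's built-in <code>sqlite3</code> module as a stand-in for a full RDBMS; the <code>accounts</code> table, the <code>transfer</code> function, and the balances are hypothetical and are not taken from the original study.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` between two rows atomically; undo everything on failure."""
    try:
        with conn:  # the connection commits on success and rolls back on an exception
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src)
            )
            row = conn.execute(
                "SELECT balance FROM accounts WHERE id = ?", (src,)
            ).fetchone()
            if row is None or row[0] < 0:
                raise ValueError("insufficient funds")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
        return True
    except ValueError:
        return False


# Hypothetical data: two accounts in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

ok = transfer(conn, 1, 2, 30)       # succeeds and is committed
failed = transfer(conn, 1, 2, 500)  # fails midway and is rolled back
balances = dict(conn.execute("SELECT id, balance FROM accounts"))
print(ok, failed, balances)
```

The failed transfer leaves no partial debit behind, which is the property the paragraph treats as essential for banking and financial systems.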
==References==
{{Reflist|colwidth=30em}}
==Notes==
This presentation is faithful to the original, with only a few minor changes to presentation. In some cases important information was missing from the references, and that information was added.
<!--Place all category tags here-->
[[Category:LIMSwiki journal articles (added in 2018)]]
[[Category:LIMSwiki journal articles (all)]]
[[Category:LIMSwiki journal articles on health informatics]]

Latest revision as of 17:47, 1 February 2022
