Journal:The development of data science: Implications for education, employment, research, and the data revolution for sustainable development

From LIMSWiki
Revision as of 20:07, 16 July 2018 by Shawndouglas
Full article title: The development of data science: Implications for education, employment, research, and the data revolution for sustainable development
Journal: Big Data and Cognitive Computing
Author(s): Murtagh, Fionn; Devlin, Keith
Author affiliation(s): University of Huddersfield, Stanford University
Primary contact: Email: fmurtagh at acm dot org
Year published: 2018
Volume and issue: 2(2)
Page(s): 14
DOI: 10.3390/bdcc2020014
ISSN: 2504-2289
Distribution license: Creative Commons Attribution 4.0 International
Website: http://www.mdpi.com/2504-2289/2/2/14/htm
Download: http://www.mdpi.com/2504-2289/2/2/14/pdf (PDF)

Abstract

In data science, we are concerned with the integration of relevant sciences in observed and empirical contexts. This results in the unification of analytical methodologies, and of observed and empirical data contexts. Given the dynamic nature of convergence, the origins and many evolutions of the data science theme are described. The following are covered in this article: the rapidly growing post-graduate university course provisioning for data science; a preliminary study of employability requirements; and how past eminent work in the social sciences and other areas, certainly mathematics, can be of immediate and direct relevance and benefit for innovative methodology, and for facing and addressing the ethical aspect of big data analytics, relating to data aggregation and scale effects. The direct and indirect outcomes and consequences of data science also include decision support and policy making, and both qualitative and quantitative outcomes. For such reasons, we note the importance of how data science builds collaboratively on other domains, potentially with innovative methodologies and practice. Further sections point towards some of the major current research issues.

Keywords: big data training and learning, company and business requirements, ethics, impact, decision support, data engineering, open data, smart homes, smart cities, IoT

1. Introduction: Data science as the convergence and bridging of disciplines

The context of our problem solving and analytics will always be quite fundamental, very specific, and particularly oriented. (Section 4 of this article draws some interesting and relevant implications of this.) This article is oriented towards the commonality and mutual influence of methodologies, and of analytical processes and procedures. A nice example of the parallel nature of such things is how "big data analytics" is often considered a synonym of "data science." In Section 2.2, it is mentioned how public transport may well use smartphone and mobile phone wireless connection data to observe the locations of individuals. This close association, or perhaps even identity, of big data analytics and data science will have growing importance with the internet of things (IoT), smart cities, smart homes, and so on (as noted in Section 8). The McKinsey Global Institute provided an outstanding perspective on this idea in their report "The age of analytics: Competing in a data-driven world."[1]

Section 8 and Section 9 of this article address very important developments, encompassing newly oriented and pursued methodologies, and the integration of research domains. Section 7 notes how important all of the content here is to sustainable development. The phrase "data revolution" is based here on ongoing work by the United Nations and by many of us in this domain, and on discussions of these issues by national authorities in Africa and the Middle East at the most recent (2017) World Statistics Congress.

This converging and bridging of disciplines is increasingly important. For example, Mahabal et al.[2] discuss the parallels between astronomy and Earth science data, as well as methodology transfer, and they characterize metadata and ontologies as crucial. They claim the convergence or bridging of disciplines must address “non-homogeneous observables, and varied spatial, temporal coverage at different resolutions.”[2] This requirement is very familiar to us from how NoSQL databases are now widely used alongside traditional relational databases. Another example is how text mining, social media, and many other domains have become so important in many contexts. Then, given computational support, “it is the complexity more than the data volume that proves to be a bigger challenge.”[2] Further benefits of this data science convergence are termed here "tractability" and "reproducibility." Mahabal et al.[2] also discuss the complexity relating to resolution and distributions. In a separate work, Murtagh[3] characterized this in terms of data encoding. Plenty of work now emphasizes the importance of p-adic data encoding (binary or ternary when p = 2 or 3), compared with real-valued encoding (m-adic, especially when m = 10).
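As a small illustration of the encodings mentioned above, the following Python sketch expands a non-negative integer into its digits in base p. This is only an assumption-laden toy: it treats "encoding" as the ordinary base-p digit expansion of an integer, which is the simplest case of a p-adic expansion, and it is not taken from Murtagh[3] or from the original article. The function names to_base_p and from_base_p are hypothetical.

```python
from typing import List


def to_base_p(n: int, p: int) -> List[int]:
    """Return the base-p digits of a non-negative integer, least significant first.

    Illustrative assumption: for a non-negative integer, the p-adic expansion
    coincides with the ordinary base-p digit expansion.
    """
    if p < 2:
        raise ValueError("p must be at least 2")
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return [0]
    digits = []
    while n > 0:
        n, remainder = divmod(n, p)  # peel off the lowest-order digit
        digits.append(remainder)
    return digits


def from_base_p(digits: List[int], p: int) -> int:
    """Rebuild the integer from its base-p digits (least significant first)."""
    return sum(d * p ** i for i, d in enumerate(digits))


if __name__ == "__main__":
    value = 10
    for base in (2, 3, 10):  # binary, ternary, decimal
        encoded = to_base_p(value, base)
        print(base, encoded, from_base_p(encoded, base))
```

The same value thus has a different digit sequence for each choice of base, while the round trip through from_base_p recovers it unchanged.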

Mahabal et al. emphasize this convergence and bridging of disciplines as follows[2]:

Methodology transfer can almost never be unidirectional. Diverse fields grow by learning tricks employed by other disciplines. The important thing is to abstract data—described by meaningful metadata—and the metadata in turn connected by a good ontology.

They go on to describe collaboration in data science[2]:

We have described here a few techniques from astroinformatics that are finding use in geoinformatics. There would be many from earth science that space science would do well to emulate. Even other disciplines like bioinformatics provide ample opportunities for methodology transfer and collaboration. With growing data volumes, and more importantly the increasing complexity, data science is our only refuge. Collaboration in data science will be beneficial to all sciences.

2. Historical development of data science and some contemporary examples of cross-disciplinarity

The short historical perspective that follows refers to such disciplines as computer and information sciences, mathematics and statistics, physics, and, implicitly, social sciences. In concluding this description, a key point will be how data science encompasses and embraces all of the following: cross-disciplinarity, interdisciplinarity, and multidisciplinarity.

2.1 Historical prominence of data science in recent times

The origins of data science are largely due to Chikio Hayashi and others. Hayashi[4] writes “I will present 'data science' as a new concept,” and follows this with a relevant introduction to the science of data: “Data Science consists of three phases: design for data, collection of data and analysis on data.”[4] The abstract of Ohsumi[5] states: “In 1992, the author argued the urgency of the need to grasp the concept 'data science'. Despite the emergence of concepts such as data mining, this issue has not been addressed.”


References

  1. Henke, N.; Bughin, J.; Chui, M. et al. (December 2016). "The age of analytics: Competing in a data-driven world". McKinsey & Company. pp. 136. https://www.mckinsey.com/business-functions/mckinsey-analytics/our-insights/the-age-of-analytics-competing-in-a-data-driven-world. Retrieved 18 June 2018. 
  2. Mahabal, A.A.; Crichton, D.; Djorgovski, S.G. et al. (2017). "From Sky to Earth: Data Science Methodology Transfer". Proceedings of the International Astronomical Union: 1–10. doi:10.1017/S1743921317000060.
  3. Murtagh, F. (2017). Data Science Foundations: Geometry and Topology of Complex Hierarchic Systems and Big Data Analytics. CRC Press. pp. 206. ISBN 9781498763936. 
  4. Hayashi, C. (1998). "What is Data Science? Fundamental concepts and a heuristic example". In Hayashi, C.; Yajima, K.; Bock, H.H. et al. Data Science, Classification, and Related Methods. Springer. pp. 40–51. ISBN 9784431702085.
  5. Ohsumi, N. (2000). "From data analysis to data science". In Kiers, H.A.L.; Rasson, J.-P.; Groenen, P.J.F. et al. Data Science, Classification, and Related Methods. Springer. pp. 329–34. ISBN 9783540675211.

Notes

This presentation is faithful to the original, with only a few minor changes to grammar, spelling, and presentation, including the addition of PMCID and DOI when they were missing from the original reference. The original inline citation method was unorthodox; these inline citations have been made clearer with the addition of the author of the citation.