Revision as of 18:37, 24 July 2018

Full article title Data science as an innovation challenge: From big data to value proposition
Journal Technology Innovation Management Review
Author(s) Kayser, Victoria; Nehrke, Bastian; Zubovic, Damir
Author affiliation(s) Ernst & Young
Year published 2018
Volume and issue 8(3)
Page(s) 16–25
DOI 10.22215/timreview/1143
ISSN 1927-0321
Distribution license Creative Commons Attribution 3.0 Unported
Website https://timreview.ca/article/1143
Download https://timreview.ca/sites/default/files/article_PDF/Kayser_et_al_TIMReview_March2018.pdf (PDF)

Abstract

Analyzing “big data” holds huge potential for generating business value. The ongoing advancement of tools and technology over recent years has created a new ecosystem full of opportunities for data-driven innovation. However, as the amount of available data rises to new heights, so too does complexity. Organizations are challenged to create the right contexts by shaping interfaces and processes and by asking the right questions to guide the data analysis. Realizing this innovation potential requires teamwork and focus to efficiently assign available resources to the most promising initiatives. With reference to the innovation process, this article will concentrate on establishing a process for analytics projects from first ideas to realization (in most cases, a running application). The question we tackle is: what can the practical discourse on big data and analytics learn from innovation management? The insights presented in this article are built on our practical experience in working with various clients. We will classify analytics projects as well as discuss common innovation barriers along this process.

Keywords: analytics, big data, digital innovation, idea generation, innovation process

Introduction

Understandably, much effort is being expended on analyzing “big data” to unleash its potentially enormous business value.[1][2] New data sources are evolving, and new techniques for storing and analyzing large data sets are enabling many new applications, but the exact business value of any one big data application is often unclear. From a practical viewpoint, organizations still struggle to use data meaningfully or they lack the right competencies. Different types of analytics problems arise in an organizational context, depending on whether the starting point is a precise request from a department that only lacks the required skills or capabilities (e.g., machine learning) or a principal interest in working with big data (e.g., no infrastructure of their own, no methodical experience). So far, clear strategies and processes for value generation from data are often missing.

Much literature addresses the technical and methodical implementation, the transformative strength of big data[3], the enhancement of firm performance by building analytics capability[4], or other managerial issues.[5][1] Little work covers the transformation process from first ideas to ready analytics applications or the building of analytics competence. This article seeks to address this gap.

Analytics initiatives have several unique features. First, they require an exploratory approach—the analysis does not start with specific requirements as in other projects but rather with an idea or data set. To assess the contribution, ideation techniques and rapid prototyping are applied. This exploration plays a key role in developing a shared understanding and giving a big data initiative a strategic direction. Second, analytics projects in their early phase are bound to a complex interplay between different stakeholder interests, competencies, and viewpoints. Learning is an integral part of these projects to build experience and competence with analytics. Third, analytics projects run in parallel to the existing information technology (IT) infrastructure and deliver short scripts or strategic insights, which are then installed in larger IT projects. Due to a missing end-to-end target, data is not only to be extracted, transformed, and loaded, but also needs to be identified, classified, and partly structured. So, a general process for value generation needs to be established to guide analytics projects and address these issues.

Here, we propose an exact configuration and series of steps to guide a big data analytics project. The lack of specified requirements and defined project goals in a big data analytics project (compared to a classic analytics project) makes it challenging to structure the analytics process. Therefore, the linear innovation process serves as reference and orientation.[6] As Braganza and colleagues[7] describe, for big data to be successfully integrated and implemented in an organization, clear and repeatable processes are required. Nevertheless, each analytics initiative is different, and the process needs to be flexible. Unfortunately, the literature rarely combines challenges in the analytics process with concepts from innovation management. Yet integrating concepts from innovation management could guide the analytics work of formulating digital strategies, anchoring analytics units and their functions in the organization, and designing the analytics portfolio, as well as the underlying working principles (e.g., rapid prototyping, ideation techniques).

Thus, in this article, we will concentrate on the question of what the practical discourse and work on analytics, and on implementing big data in organizations, can learn from innovation management. A process for analytics innovation is introduced to guide the path from ideation to value generation. Emphasis is put on challenges during this process as well as on different entry points. We build on experience and insights from a number of analytics projects across different sectors and domains to derive recommendations for successfully implementing analytics solutions.

We begin with a definition of big data and analytics. Next, we propose a process for a structured approach to retrieving value from data. Finally, we discuss the results and outline directions for future research.

Big data and analytics

In this section, we address the elementary angles from which the analytics value chain should be viewed (Figure 1): data, infrastructure, and analytics, with the business need as the driver. According to our understanding, value is generated by analyzing data within a certain context, with a problem statement related to a business requirement driving the need for innovation. Besides expertise in conducting data and analytics projects, this process requires a working infrastructure, especially when the volume, velocity, or variety of the data to be analyzed exceeds certain limits. Below, we describe the three technical angles in more detail.


Fig1 Kayser TechInnoManRev2018 8-3.png

Figure 1. Framework of data, infrastructure, analytics and business need

Data

Big data is often defined by volume (how much data), velocity (the speed of data generation), and variety (the diversity of data types).[8][9] Big data describes data collections of a size difficult to process with traditional data management techniques. While many definitions of big data concentrate on the aspect of volume, referring to the scale of data available, big data in particular brings heterogeneous formats and a broad spectrum of possible data sources. Examples are structured numeric data or unstructured data such as text, images, or videos. This variety and broad landscape of data sources offer many opportunities for generating insights. Moreover, the speed of data creation enables rapid insight into ongoing developments.

Recent technical improvements (e.g., cloud computing, big data architectures) enable data to be analyzed and stored on a large scale. For many (new) types of data, the exact business value is still unclear and requires systematic exploration. Available data is often messy, and even when cleaned up it can be overwhelming and too complex to be easily understood, even by professional data scientists. The contribution of data is, of course, context specific and varies among business cases and applications. One key challenge is to identify the data that best meets the business requirement.

Analytics

Data science is concerned with generating knowledge from data. Analytics, or data science, addresses the exploration of data sets with different quantitative methods drawn from statistical modelling[10] or machine learning.[11] Methods from disciplines such as statistics, economics, and computer science are applied to identify patterns, influencing factors, or dependencies. In contrast to business intelligence, analytics reaches further than descriptive analytics (based on SQL) and often has a predictive component. Which method to apply depends on the exact business case. Analyzing data is restricted, for example, by a company’s internal policies as well as by legal restrictions and guidelines that vary among countries. Data quality and reliability are further issues. Data understanding and domain knowledge are key prerequisites in the analysis process (e.g., Waller & Fawcett[12]), especially when model assumptions are made.

Concerning data analysis, there are primarily the following opportunities for organizations:

  • Improved analysis of internal data: One example is forecasting methods that enhance expert-based planning approaches with additional figures. These methods build on existing databases such as business intelligence systems, and they contribute new or further insights to internal firm processes.
  • Putting data together in new ways: New combinations of data sets offer new insights, for example, through the combination of sensor data and user profiles.
  • Opening up new or (so far) unused data sources (e.g., websites, open data) to identify potential for generating new insights: However, a context or application is necessary to use the data. One example is social media data used for market observation.
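The first opportunity above, enhancing expert-based planning with a forecast over internal figures, can be sketched in a few lines. This is a minimal illustration; the sales data and the least-squares trend model are assumptions for the sketch, not a method from the article.

```python
# Minimal sketch: augment expert-based planning with a simple forecast
# over internal figures. Data and model choice are illustrative.

def linear_trend_forecast(history, steps_ahead=1):
    """Fit a least-squares line y = a + b*t to the history and
    extrapolate it steps_ahead periods into the future."""
    n = len(history)
    ts = range(n)
    mean_t = sum(ts) / n
    mean_y = sum(history) / n
    # Slope and intercept of the ordinary least-squares fit.
    b = sum((t - mean_t) * (y - mean_y) for t, y in zip(ts, history)) \
        / sum((t - mean_t) ** 2 for t in ts)
    a = mean_y - b * mean_t
    return a + b * (n - 1 + steps_ahead)

# Hypothetical monthly sales figures from an internal database.
sales = [100, 104, 108, 112, 116, 120]
print(linear_trend_forecast(sales, steps_ahead=1))  # → 124.0
```

In practice, such a script would read from the existing business intelligence database and its output would sit next to the expert's plan as an additional figure, rather than replace it.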

However, the core problem of analytics is to work out the guiding question and achieve a match between business need, data source, and analysis as discussed later in the article.

IT infrastructure

Successful implementation of analytics depends on adapting the IT infrastructure to embed analytics solutions and integrate different data sources. The core layers of an IT infrastructure are the:

  1. Data ingestion layer: This layer covers the data transfer from a source system to an analytics environment. Therefore, a toolset and a corresponding process need to be defined. Traditional extract, transform, load (ETL) tools and relational databases are combined with Hadoop/big data setups covering, in particular, scenarios caused by less structured, high-volume, or streamed data. Analytics use cases build on data from data warehouses to fully unstructured data. This breadth challenges classic architectures and requires adaptable schemes. Which data sources to integrate depends on the specific application.
  2. Data value exploration layer: Based on the business need and corresponding use case, data is investigated, tested, and sampled in this layer. Depending on the complexity and the business question, an appropriate analytics scheme is developed. Business and explorative analysis based on online analytical processing (OLAP) models and in-memory technologies is supplemented or expanded by advanced analytics methods and by integrating plugins (e.g., for R or Python).
  3. Data consumption layer: Here, the results are used for visualization, for example. The end user can consume the data or service without deep technical understanding (e.g., for self-service business intelligence).

Modern approaches require structures that are adaptable and scalable to different requirements and data sources. Factors such as system performance, cost efficiency, and overall enterprise infrastructure strategy must be taken into consideration.
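The three layers can be sketched as a toy pipeline. The function names, the CSV input, and the mean-revenue analysis below are illustrative assumptions chosen for the sketch, not a real architecture or framework API.

```python
# Toy pipeline mirroring the three core layers: ingestion -> value
# exploration -> consumption. All names and data are illustrative.
import csv
import io
import statistics

def ingest(raw_csv):
    """Data ingestion layer: move data from a source system into the
    analytics environment (here: parse CSV text into typed records)."""
    reader = csv.DictReader(io.StringIO(raw_csv))
    return [{"region": r["region"], "revenue": float(r["revenue"])} for r in reader]

def explore(records):
    """Data value exploration layer: investigate the data and derive an
    analytical result (here: mean revenue per region)."""
    by_region = {}
    for r in records:
        by_region.setdefault(r["region"], []).append(r["revenue"])
    return {region: statistics.mean(vals) for region, vals in by_region.items()}

def consume(result):
    """Data consumption layer: present results so the end user needs no
    deep technical understanding (here: a plain-text report)."""
    return "\n".join(f"{region}: {avg:.1f}" for region, avg in sorted(result.items()))

raw = "region,revenue\nnorth,10\nnorth,14\nsouth,8\n"
print(consume(explore(ingest(raw))))
```

In a production setting, `ingest` would be an ETL or streaming job, `explore` a notebook or OLAP/advanced-analytics workload, and `consume` a dashboard or self-service business intelligence front end.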

From data to value: Turning ideas into applications

Organizations still struggle to use data meaningfully or lack the right competencies. One of the key challenges in analytics projects is identifying the business need and the guiding questions. In principle, different types of analytics problems arise in an organizational context, ranging from precise requests that only lack specific capabilities to a principal interest in working with big data (e.g., no infrastructure of their own, expert-based approaches). This range implies different starting points for the analytics process and different innovation pathways, both of which are described later in this article.

What is the starting point?

The starting point for each analytics initiative varies. According to the four points mentioned above, the “state of the art” for each one needs to be assessed individually to estimate the analytics maturity:

  1. Business need: From case to case, the precision of the problem description and scope varies. For some cases, the leading question and scope guiding the analysis phase are formulated very precisely, and for other cases it needs to be worked out and refined during the process.
  2. Data: The data to be used in the project can be defined or an appropriate source is not yet clear. The size and quality of the data essentially determine the progress of the further process. Parameters are, for example, structure (i.e., pre-processing effort) or the size of the data set (e.g., one CSV file or a large database).
  3. Analytics: Which methods to apply differs from case to case and must be tested and explored.
  4. Infrastructure: The current (technical) state of the business unit (e.g., own data warehouse, reporting system) or own (human) resources and competencies is a further important aspect in classifying requests.

These four angles can be rated differently with reference to the maturity level of the analytics request. Based on our experience, three scenarios, representing different maturity levels, can be distinguished from these four angles (illustrated in Figure 2).


Fig2 Kayser TechInnoManRev2018 8-3.png

Figure 2. Classifying analytics requests: Three maturity levels
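One way to operationalize this rating is to score each of the four angles and classify the request's overall maturity. The 0–2 scale and the thresholds below are our illustrative assumptions for the sketch, not a scheme from the article.

```python
# Illustrative sketch: score each of the four angles from 0 (fully open)
# to 2 (settled) and map the total to a coarse maturity level comparable
# to the three scenarios of Figure 2. Scale and thresholds are assumptions.

ANGLES = ("business_need", "data", "analytics", "infrastructure")

def maturity_level(scores):
    """Map per-angle scores (0-2 each) to one of three maturity levels."""
    assert set(scores) == set(ANGLES)
    total = sum(scores[a] for a in ANGLES)  # ranges over 0..8
    if total <= 3:
        return "scenario 1: early-stage idea, data source and analysis open"
    elif total <= 6:
        return "scenario 2: data and infrastructure set, questions open"
    return "scenario 3: running draft, ready for professionalization"

# A request with a clear business need but no data, methods, or setup yet.
request = {"business_need": 2, "data": 0, "analytics": 0, "infrastructure": 0}
print(maturity_level(request))  # classifies the request as scenario 1
```

Such a checklist can make the intake discussion with a business unit repeatable, even though the actual assessment in a project remains a judgment call.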

In scenario 1, the data analysis is motivated by a defined requirement, such as market observation during the rollout of a new product. The appropriate data source still needs to be identified. Because the data is missing so far, the precise analysis cannot yet be defined, and there is no existing infrastructure. Ideas need to be developed as to which data sources could be relevant and which issues could be resolved on this basis. Then, different methods from data analysis are applied to generate new insights.

In scenario 2, the data source and infrastructure are clearly defined, and the specific questions need to be identified. One application is assessing the contribution of a specific data source that has not been professionally analyzed so far, for example, by means of machine learning. For instance, the business unit has an internal database, considers new methods, and wants to further develop a business intelligence system by adding a forecasting component. In this case, the scope is clearer than in the first scenario, and straight away an exploratory data analysis can be started.

In scenario 3, there is a precise analytical problem that needs to be professionalized. A first draft shows promising results, and the solution can, as a next step, be scaled up. Guidance in making architectural decisions is needed.

These three scenarios are exemplary starting points for analytics projects. The following section describes the implications for the innovation process and outlines different challenges and barriers.

The analytics process

To succeed with analytics, the process from data to value must be structured to be integrated in the existing organization. For example, Braganza and colleagues[7] examine the management of organizational resources in big data initiatives. They stress the importance of systematic approaches and processes to operationalize big data.

Related work on analytics processes has a focus on service design[13] or concentrates on the methodical part of analyzing data.[14] The process, as introduced by Braganza and colleagues[7], is too linear and does not address the systemic complexity of data analysis and necessary stakeholder discourse. To cover these issues, structuring the analytics process can be linked to the classic linear innovation process.[6][15]

In our work, to guide the analytics process from ideation, scoping, and identifying a data set to value generation, a process with four phases is introduced. Taking the classic innovation funnel as a starting point, this concept is transferred to the context of analytics. The process is divided into four parts: i) the generation of ideas, ii) the development of proofs of concept (PoCs) to test these ideas, iii) the implementation and testing of successful PoCs, and, finally, iv) making them available as a product or service. Based on a first idea or requirement, the process is initialized, and the number of ideas or projects is reduced within each phase. Each phase has tasks, as well as barriers or filters, that need to be successfully addressed to continue in the process chain.
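The funnel can be sketched as a chain of filters that thins out the set of ideas phase by phase. The phase names follow the text; the barrier predicates and the example ideas are illustrative assumptions.

```python
# Sketch of the four-phase funnel: each phase applies a barrier, and only
# ideas passing every barrier so far continue. Predicates are illustrative.

PHASES = [
    ("idea generation",        lambda idea: idea["has_business_need"]),
    ("proof of concept",       lambda idea: idea["poc_successful"]),
    ("implementation/testing", lambda idea: idea["tests_passed"]),
    ("product or service",     lambda idea: idea["operational"]),
]

def run_funnel(ideas):
    """Push ideas through the funnel; report survivors after each phase."""
    survivors = list(ideas)
    history = []
    for name, barrier in PHASES:
        survivors = [i for i in survivors if barrier(i)]
        history.append((name, len(survivors)))
    return history

# Hypothetical portfolio: A reaches production, B fails testing, C never
# finds a business need.
ideas = [
    {"name": "A", "has_business_need": True,  "poc_successful": True,
     "tests_passed": True,  "operational": True},
    {"name": "B", "has_business_need": True,  "poc_successful": True,
     "tests_passed": False, "operational": False},
    {"name": "C", "has_business_need": False, "poc_successful": False,
     "tests_passed": False, "operational": False},
]
print(run_funnel(ideas))
```

The shrinking counts per phase are exactly the funnel shape: many ideas enter, few become a product or service.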

The three scenarios described above are assessed differently concerning their maturity, as illustrated in the process in Figure 3. Scenario 1 is in a very early stage of idea generation and many open questions need to be addressed. Scenario 2 is more concrete and many more issues are resolved than in scenario 1. However, initiating questions need to be developed before a PoC can be conducted. Scenario 3 builds on a running system, so it is located in the phase of testing and operationalization (phase three).


Fig3 Kayser TechInnoManRev2018 8-3.png

Figure 3. Phases of the analytics process

For each phase, different challenges arise. While related work emphasizes data-related challenges such as data acquisition, cleansing, or aggregation[16], this work focuses on process challenges.

References

  1. McAfee, A.; Brynjolfsson, E. (2012). "Big data: The management revolution". Harvard Business Review 90 (10): 60–8. PMID 23074865.
  2. Wamba, S.F.; Gunasekaran, A.; Akter, S. et al. (2017). "Big data analytics and firm performance: Effects of dynamic capabilities". Journal of Business Research 70: 356–65. doi:10.1016/j.jbusres.2016.08.009.
  3. Wamba, S.F.; Akter, S.; Edwards, A. et al. (2015). "How ‘big data’ can make big impact: Findings from a systematic review and a longitudinal case study". International Journal of Production Economics 165: 234–46. doi:10.1016/j.ijpe.2014.12.031.
  4. Akter, S.; Wamba, S.F.; Gunasekaran, A. et al. (2016). "How to improve firm performance using big data analytics capability and business strategy alignment?". International Journal of Production Economics 182: 113–31. doi:10.1016/j.ijpe.2016.08.018.
  5. Davenport, T.H.; Harris, J.G. (2007). Competing on Analytics: The New Science of Winning. Harvard Business School Press. pp. 240. ISBN 9781422103326.
  6. Cooper, R.G. (1990). "Stage-gate systems: A new tool for managing new products". Business Horizons 33 (3): 44–54. doi:10.1016/0007-6813(90)90040-I.
  7. Braganza, A.; Brooks, L.; Nepelski, D. et al. (2017). "Resource management in big data initiatives: Processes and dynamic capabilities". Journal of Business Research 70: 328–37. doi:10.1016/j.jbusres.2016.08.006.
  8. Philip Chen, C.L.; Zhang, C.Y. (2014). "Data-intensive applications, challenges, techniques and technologies: A survey on Big Data". Information Sciences 275: 314–47. doi:10.1016/j.ins.2014.01.015.
  9. Gandomi, A.; Haider, M. (2015). "Beyond the hype: Big data concepts, methods, and analytics". International Journal of Information Management 35 (2): 137–44. doi:10.1016/j.ijinfomgt.2014.10.007.
  10. James, G.; Witten, D.; Hastie, T.; Tibshirani, R. (2015). An Introduction to Statistical Learning with Applications in R (6th ed.). Springer. ISBN 9781461471387.
  11. Mitchell, T.M. (1997). Machine Learning (1st ed.). McGraw-Hill Education. ISBN 9780070428072.
  12. Waller, M.A.; Fawcett, S.E. (2013). "Data Science, Predictive Analytics, and Big Data: A Revolution That Will Transform Supply Chain Design and Management". Journal of Business Logistics 34 (2): 77–84. doi:10.1111/jbl.12010.
  13. Meierhofer, J.; Meier, K. (2017). "From Data Science to Value Creation". Proceedings from IESS 2017: Exploring Services Science: 173–81. doi:10.1007/978-3-319-56925-3_14.
  14. Cielen, D.; Meysman, A.; Ali, M. (2016). Introducing Data Science: Big Data, Machine Learning, and more, using Python tools. Manning Publications. pp. 320. ISBN 9781633430037.
  15. Salerno, M.S.; de Vasconcelos Gomes, L.A.; da Silva, D.E. et al. (2015). "Innovation processes: Which process for which project?". Technovation 35: 59–70. doi:10.1016/j.technovation.2014.07.012.
  16. Sivarajah, U.; Kamal, M.M.; Irani, Z. et al. (2017). "Critical analysis of Big Data challenges and analytical methods". Journal of Business Research 70: 263–86. doi:10.1016/j.jbusres.2016.08.001.

Notes

This presentation is faithful to the original, with only a few minor changes to presentation and grammar. In some cases important information was missing from the references, and that information was added. The original article lists references alphabetically, but this version — by design — lists them in order of appearance.