
Full article title: Explainability for artificial intelligence in healthcare: A multidisciplinary perspective
Journal: BMC Medical Informatics and Decision Making
Author(s): Amann, Julia; Blasimme, Alessandro; Vayena, Effy; Frey, Dietmar; Madai, Vince I.; Precise4Q Consortium
Author affiliation(s): ETH Zürich, Charité – Universitätsmedizin Berlin, Birmingham City University
Primary contact: Online contact form
Year published: 2020
Volume and issue: 20
Page(s): 310
DOI: 10.1186/s12911-020-01332-6
ISSN: 1472-6947
Distribution license: Creative Commons Attribution 4.0 International
Website: https://bmcmedinformdecismak.biomedcentral.com/articles/10.1186/s12911-020-01332-6
Download: https://bmcmedinformdecismak.biomedcentral.com/track/pdf/10.1186/s12911-020-01332-6.pdf (PDF)

Abstract

Background: Explainability is one of the most heavily debated topics when it comes to the application of artificial intelligence (AI) in healthcare. Even though AI-driven systems have been shown to outperform humans in certain analytical tasks, the lack of explainability continues to spark criticism. Yet, explainability is not a purely technological issue; instead, it invokes a host of medical, legal, ethical, and societal questions that require thorough exploration. This paper provides a comprehensive assessment of the role of explainability in medical AI and makes an ethical evaluation of what explainability means for the adoption of AI-driven tools into clinical practice.

Methods: Taking AI-based clinical decision support systems as a case in point, we adopted a multidisciplinary approach to analyze the relevance of explainability for medical AI from the technological, legal, medical, and patient perspectives. Drawing on the findings of this conceptual analysis, we then conducted an ethical assessment using Beauchamp and Childress' Principles of Biomedical Ethics (autonomy, beneficence, nonmaleficence, and justice) as an analytical framework to determine the need for explainability in medical AI.

Results: Each of the domains highlights a different set of core considerations and values that are relevant for understanding the role of explainability in clinical practice. From the technological point of view, explainability has to be considered both in terms of how it can be achieved and what is beneficial from a development perspective. When looking at the legal perspective, we identified informed consent, certification and approval as medical devices, and liability as core touchpoints for explainability. Both the medical and patient perspectives emphasize the importance of considering the interplay between human actors and medical AI. We conclude that omitting explainability in clinical decision support systems poses a threat to core ethical values in medicine and may have detrimental consequences for individual and public health.

Conclusions: To ensure that medical AI lives up to its promises, there is a need to sensitize developers, healthcare professionals, and legislators to the challenges and limitations of opaque algorithms in medical AI and to foster multidisciplinary collaboration moving forward.

Background

All over the world, healthcare costs are skyrocketing. Increasing life expectancy, soaring rates of chronic diseases, and the continuous development of costly new therapies contribute to this trend. Thus, it comes as no surprise that scholars predict a grim future for the sustainability of healthcare systems throughout the world. Artificial intelligence (AI) promises to alleviate the impact of these developments by improving healthcare and making it more cost-effective.[1] In clinical practice, AI often comes in the form of clinical decision support systems (CDSSs), assisting clinicians in diagnosis of disease and treatment decisions. Where conventional CDSSs match the characteristics of individual patients to an existing knowledge base, AI-based CDSSs apply artificial intelligence models trained on data from patients matching the use-case at hand. Yet, despite its undeniable potential, AI is not a universal solution. As history has shown, technological progress always goes hand in hand with novel questions and significant challenges. Some of these challenges are tied to the technical properties of AI, while others relate to the legal, medical, and patient perspectives, making it necessary to adopt a multidisciplinary perspective.

In this paper, we take such a multidisciplinary view on a major medical AI challenge: explainability. In its essence, explainability can be understood as a characteristic of an AI-driven system allowing a person to reconstruct why a certain AI came up with the predictions it offered. An important point to note here is that explainability has many facets and, unfortunately, the terminology of explainability is not well defined. Other terms such as interpretability and/or transparency are often used synonymously.[2][3] We thus simply refer to explainability or explainable AI throughout the manuscript and add the necessary context for understanding.

Explainability is a heavily debated topic with far-reaching implications that extend beyond the technical properties of AI. Even though research indicates that AI algorithms can outperform humans in certain analytical tasks (e.g., pattern recognition in imaging), the lack of explainability for AI in the medical domain has been criticized.[4] Legal and ethical uncertainties surrounding this issue may impede progress and prevent novel technologies from fulfilling their potential to improve patient and population health. Yet, without thorough consideration of the role of explainability in medical AI, these technologies may forgo core ethical and professional principles, disregard regulatory issues, and cause considerable harm.[5]

To contribute to the discourse on explainable AI in medicine, this paper seeks to draw attention to the interdisciplinary nature of explainability and its implications for the future of healthcare. In particular, our work focuses on the relevance of explainability for a CDSS. The originality of our work lies in the fact that we look at explainability from multiple perspectives that are often regarded as independent and separable from each other. This paper has two central aims: (1) to provide a comprehensive assessment of the role of explainability in CDSSs for use in clinical practice; and (2) to make an ethical evaluation of what explainability means for the adoption of AI-driven tools into clinical practice.

Methods

Taking AI-based CDSSs as a case in point, we discuss the relevance of explainability for medical AI from the technological, legal, medical, and patient perspective. To this end, we performed a conceptual analysis of the pertinent literature on explainable AI in these domains. In our analysis, we aimed to identify aspects relevant to determining the necessity and role of explainability for each domain, respectively. Drawing on these different perspectives, we then conclude by distilling the ethical implications of explainability for the future use of AI in the healthcare setting. We do the latter by examining explainability against the four ethical principles of autonomy, beneficence, non-maleficence, and justice.

Results

Technological perspective

From the technological perspective, we will explore two issues. First, what explainability methods are, and second, where they are applied in medical AI development.

With regard to methodology, explainability can either be an inherent characteristic of an algorithm or can be approximated by other methods.[2] The latter is highly important for methods that have until recently been labeled “black-box models,” such as artificial neural network (ANN) models; numerous methods now exist to explain their predictions.[6] Importantly, however, inherent explainability will, in general, be more accurate than methods that only approximate explainability.[2] This can be attributed to the complex characteristics of many modern machine learning methods. In ANNs, for example, the inner workings of sometimes millions of weights between artificial neurons need to be interpreted in a way that humans can understand. By contrast, methods with inherent explainability have a crucial advantage here. However, these are usually traditional methods, such as linear or logistic regression, and for many use cases they are inferior in performance to modern state-of-the-art methods such as ANNs.[7] There is thus a trade-off between performance and explainability, and this trade-off is a major challenge for the developers of CDSSs. It should be noted that some argue this trade-off does not exist in reality but is merely an artifact of suboptimal modelling approaches, as pointed out by Rudin.[2] While Rudin's work is important for drawing attention to the shortcomings of approximating explainability methods, it is likely that some approximating methods, contrary to that view,[2] do have value given the complex nature of explaining machine learning models. Additionally, while we can make the qualitative assessment that inherent explainability is likely better than approximated explainability, there exist only initial exploratory attempts to rank explainability methods quantitatively.[8] Notwithstanding, for many applications—and generally in AI product development—there is a de facto preference for modern algorithms such as ANNs. Moreover, it cannot be ruled out that for some applications such modern methods do exhibit genuinely higher performance. This necessitates further critical assessment of explainability methods, both with regard to technical development (e.g., ranking methods and optimizing them for certain inputs) and with regard to the role of explainability from a multi-stakeholder view, as done in the current work.
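
To make the trade-off concrete, the following Python sketch (a minimal example using scikit-learn on synthetic data; the dataset, model sizes, and choice of permutation importance are illustrative assumptions, not taken from the paper) contrasts a logistic regression, whose fitted coefficients are themselves the explanation, with an ANN whose behavior can only be approximated post hoc by an attached explainability method.

```python
# A minimal sketch contrasting inherent and approximated explainability.
# All data and modelling choices here are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for a clinical risk dataset (hypothetical features).
X, y = make_classification(n_samples=1000, n_features=6, n_informative=3,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Inherent explainability: the fitted coefficients of a logistic regression
# directly quantify each feature's contribution to the predicted log-odds.
logreg = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("Logistic regression coefficients:", logreg.coef_[0])

# Approximated explainability: an ANN ("black box") explained post hoc,
# here via permutation importance, one of many possible approximation methods.
ann = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=2000,
                    random_state=0).fit(X_train, y_train)
result = permutation_importance(ann, X_test, y_test, n_repeats=10,
                                random_state=0)
print("ANN permutation importances:", result.importances_mean)
```

The coefficients are exact properties of the simpler model, whereas the permutation importances only estimate how the ANN uses its inputs, which is one reason approximated explanations can be less faithful.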

From the development point of view, explainability will regularly help developers to sanity-check their AI models beyond mere performance. For example, it is highly beneficial to rule out that prediction performance is based on metadata rather than the data itself. A famous non-medical example was the classification task of discerning huskies from wolves, where the prediction was driven solely by the identification of a snowy background rather than real differences between huskies and wolves.[9] This is also called a “Clever Hans” phenomenon.[10] Clever Hans phenomena are also found in medicine. An example is the model developed by researchers from Mount Sinai Health System, which performed very well in distinguishing high-risk patients from non-high-risk patients based on x-ray imaging. However, when the tool was applied outside of Mount Sinai, the performance plummeted. As it turned out, the AI model had not learned clinically relevant information from the images. In analogy to the snowy background in the example above, the prediction was based on hardware-related metadata tied to the specific x-ray machine that was used to image the high-risk ICU patients exclusively at Mount Sinai.[11] Thus, the system was able to distinguish only which machine was used for imaging, not the risk of the patients. Explainability methods allow developers to identify these types of errors before AI tools enter clinical validation and the certification process, as the Clever Hans predictors (snowy background, hardware information) would be identified as prediction-relevant by the explainability methods rather than meaningful features from a domain perspective. This saves time and development costs. It should be noted that explainability methods aimed at giving developers insight into their models have different prerequisites than systems aimed at technologically unsavvy end users such as clinical doctors and patients. For developers, these methods can be more complex in their approach and visualization.
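
As an illustration only (not a reconstruction of the Mount Sinai model), the hypothetical Python sketch below simulates a dataset in which a hardware-related feature leaks the outcome and shows how even a simple post-hoc importance measure would flag the Clever Hans predictor to a developer; the feature names and the 95% leakage rate are assumptions.

```python
# Hypothetical "Clever Hans" check: a spurious machine/site feature encodes
# the label, and an explainability method exposes it. Not from the original study.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000

# Hypothetical clinical features with only a weak relation to the outcome.
clinical = rng.normal(size=(n, 4))
risk = (clinical[:, 0] + rng.normal(scale=2.0, size=n) > 0).astype(int)

# Confounder: high-risk patients are imaged almost exclusively on one
# x-ray machine, so the machine identifier leaks the label (the Clever Hans cue).
machine_id = np.where(rng.random(n) < 0.95, risk, 1 - risk)
X = np.column_stack([clinical, machine_id])
X_train, X_test, y_train, y_test = train_test_split(X, risk, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)
imp = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
for name, score in zip(["clin_1", "clin_2", "clin_3", "clin_4", "machine_id"],
                       imp.importances_mean):
    print(f"{name}: {score:.3f}")
# A developer reviewing this output would see "machine_id" dominate and conclude
# that the model has learned the acquisition hardware, not patient risk.
```

Spotting such a dominant but clinically meaningless feature before validation is exactly the kind of sanity check described above.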

Legal perspective

Medical perspective

Patient perspective

Ethical implications

Conclusion

References

  1. Higgins, D.; Madai, V.I. (2020). "From Bit to Bedside: A Practical Framework for Artificial Intelligence Product Development in Healthcare". Advanced Intelligent Systems 2 (10): 2000052. doi:10.1002/aisy.202000052. 
  2. Rudin, C. (2019). "Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead". Nature Machine Intelligence 1: 206–15. doi:10.1038/s42256-019-0048-x. 
  3. Doran, D.; Schulz, S.; Besold, T.R. (2017). "What Does Explainable AI Really Mean? A New Conceptualization of Perspectives". arXiv. https://arxiv.org/abs/1710.00794v1. 
  4. Shortliffe, E.H.; Sepúlveda, M.J. (2018). "Clinical Decision Support in the Era of Artificial Intelligence". JAMA 320 (21): 2199–2200. doi:10.1001/jama.2018.17163. 
  5. Obermeyer, Z.; Powers, B.; Vogeli, C. et al. (2019). "Dissecting racial bias in an algorithm used to manage the health of populations". Science 366 (6464): 447–453. doi:10.1126/science.aax2342. 
  6. Samek, W.; Montavon, G.; Vedaldi, A. et al., ed. (2019). Explainable AI: Interpreting, Explaining and Visualizing Deep Learning. Springer Nature. doi:10.1007/978-3-030-28954-6. ISBN 9783030289546. 
  7. Esteva, A.; Robicquet, A.; Ramsundar, B. et al. (2019). "A guide to deep learning in healthcare". Nature Medicine 25 (1): 24–29. doi:10.1038/s41591-018-0316-z. PMID 30617335. 
  8. Islam, S.R.; Eberle, W.; Ghafoor, S.K. (2019). "Towards Quantification of Explainability in Explainable Artificial Intelligence Methods". arXiv. https://arxiv.org/abs/1911.10104v1. 
  9. Samek, W.; Montavon, G.; Lapuschkin, S. et al. (2020). "Toward Interpretable Machine Learning: Transparent Deep Neural Networks and Beyond". arXiv. https://arxiv.org/abs/2003.07631v1. 
  10. Lapuschkin, S.; Wäldchen, S.; Binder, A. et al. (2019). "Unmasking Clever Hans predictors and assessing what machines really learn". Nature Communications 10 (1): 1096. doi:10.1038/s41467-019-08987-4. PMC6411769. PMID 30858366. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6411769. 
  11. Zech, J.R.; Badgeley, M.A.; Liu, M. et al. (2018). "Variable generalization performance of a deep learning model to detect pneumonia in chest radiographs: A cross-sectional study". PLoS Medicine 15 (11): e1002683. doi:10.1371/journal.pmed.1002683. PMC6219764. PMID 30399157. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6219764. 

Notes

This presentation is faithful to the original, with only a few minor changes to presentation. In some cases important information was missing from the references, and that information was added.