Journal:Judgements of research co-created by generative AI: Experimental evidence

Full article title: Judgements of research co-created by generative AI: Experimental evidence
Journal: Economics and Business Review
Author(s): Niszczota, Paweł; Conway, Paul
Author affiliation(s): Poznań University of Economics and Business, University of Southampton
Primary contact: Email: pawel dot niszczota at ue dot poznan dot pl
Year published: 2023
Volume and issue: 9(2)
Page(s): 101–114
DOI: 10.18559/ebr.2023.2.744
ISSN: 2450-0097
Distribution license: Creative Commons Attribution 4.0 International
Website: https://journals.ue.poznan.pl/ebr/article/view/744
Download: https://journals.ue.poznan.pl/ebr/article/view/744/569 (PDF)

Abstract

The introduction of ChatGPT has fuelled a public debate on the appropriateness of using generative artificial intelligence (AI) (large language models or LLMs) in work, including a debate on how they might be used (and abused) by researchers. In the current work, we test whether delegating parts of the research process to LLMs leads people to distrust researchers and devalues their scientific work. Participants (N = 402) considered a researcher who delegates elements of the research process to a PhD student or LLM and rated three aspects of such delegation. Firstly, they rated whether it is morally appropriate to do so. Secondly, they judged whether—after deciding to delegate the research process—they would trust the scientist (who decided to delegate) to oversee future projects. Thirdly, they rated the expected accuracy and quality of the output from the delegated research process. Our results show that people judged delegating to an LLM as less morally acceptable than delegating to a human (d = –0.78). Delegation to an LLM also decreased trust to oversee future research projects (d = –0.80), and people thought the results would be less accurate and of lower quality (d = –0.85). We discuss how this devaluation might transfer into the underreporting of generative AI use.
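The effect sizes reported above are Cohen's d values comparing the LLM and human (PhD student) delegation conditions. For reference, a common two-group formulation (the exact estimator used by the authors is not restated here) is:

d = \frac{\bar{x}_{\mathrm{LLM}} - \bar{x}_{\mathrm{human}}}{s_{\mathrm{pooled}}}, \qquad s_{\mathrm{pooled}} = \sqrt{\frac{(n_{\mathrm{LLM}} - 1)\,s_{\mathrm{LLM}}^{2} + (n_{\mathrm{human}} - 1)\,s_{\mathrm{human}}^{2}}{n_{\mathrm{LLM}} + n_{\mathrm{human}} - 2}}

Under this reading, the negative values (e.g., d = –0.78) indicate lower ratings in the LLM condition than in the human condition.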

Keywords: trust in science, metascience, ChatGPT, GPT, large language models, generative AI, experiment

Introduction

The introduction of ChatGPT appears to have become a tipping point for large language models (LLMs). It is expected that LLMs—such as those released by OpenAI (i.e., ChatGPT and GPT-4) [OpenAI, 2022, 2023], but also by major technology firms such as Google and Meta—will impact the work of many white-collar professions. [Alper & Yilmaz, 2020; Eloundou et al., 2023; Korzynski et al., 2023] This impact extends to top academic journals such as Nature and Science, which have already acknowledged the impact artificial intelligence (AI) has on the scientific profession and have started setting out guidelines on how to use LLMs. [Thorp, 2023; ‘Tools Such as ChatGPT Threaten Transparent Science; Here Are Our Ground Rules for Their Use’, 2023] For example, listing ChatGPT as a co-author was deemed inappropriate. [Stokel-Walker, 2023; Thorp, 2023] However, the use of such models is not explicitly forbidden; rather, it is suggested that researchers report which parts of the research process were assisted by ChatGPT.

Important questions remain regarding how scientists employing LLMs in their work are perceived by society. [Dwivedi et al., 2023] Do people view the use of LLMs as diminishing the importance, value, and worth of scientific efforts, and if so, which elements of the scientific process does LLM usage most impact? We examine these questions with a study on the perceptions of scientists who rely on an LLM for various aspects of the scientific process.

We anticipated that, overall, people would view the delegation of aspects of the research process to an LLM as morally worse than delegating to a human, and that doing so would reduce trust in the delegating scientist. Moreover, insofar as people view creativity as a core human trait, especially in comparison to AI [Cha et al., 2020], and some aspects of the research process may entail more creativity than others (e.g., idea generation and synthesis of the prior literature [King, 2023], compared with data identification and preparation, determination and implementation of the testing framework, or analysis of results), we tested the exploratory prediction that the effect of delegating to an AI versus a human on moral ratings and trust might differ across these aspects.
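The exploratory prediction above amounts to a test of the interaction between the delegation target (human vs. LLM) and the aspect of the research process being delegated. A minimal sketch of how such an interaction could be examined is given below, assuming a long-format dataset with hypothetical file and column names; it is not the authors' analysis code.

# Illustrative sketch only; not the authors' analysis.
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Assumed long format, one row per participant and delegated aspect:
#   moral_rating : moral acceptability rating of the delegation
#   delegate     : "human" (PhD student) or "llm"
#   aspect       : which element of the research process was delegated
df = pd.read_csv("ratings.csv")  # hypothetical file name

# Two-way ANOVA: main effects of delegation target and aspect, plus
# their interaction (does the human-vs-LLM gap differ across aspects?).
model = smf.ols("moral_rating ~ C(delegate) * C(aspect)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))

# If each participant rated several aspects, a mixed model with a
# participant random intercept would account for the repeated measures:
# smf.mixedlm("moral_rating ~ C(delegate) * C(aspect)", df,
#             groups=df["participant_id"]).fit()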

We contribute to an emerging literature exploring how large language models can assist research in economics and financial economics. The reader can find a valuable discussion of the use of LLMs in economic research in Korinek [2023] and Wach et al. [2023]. A noteworthy empirical study is that of Dowling and Lucey [2023], who asked financial academics to rate research ideas on cryptocurrency; the academics judged the output to be of fair quality.

Research questions

Acknowledgements

Funding

This research was supported by grant 2021/42/E/HS4/00289 from the National Science Centre, Poland.

Conflict of interest

None stated.

References

Notes

This presentation is faithful to the original, with only a few minor changes to presentation, grammar, and punctuation. In some cases important information was missing from the references, and that information was added. The original lists references in alphabetical order; this version lists them in order of appearance, by design.