When combinations of humans and AI are useful - A systematic review and meta-analysis

nature.com/articles/s41562-024-02024-1

Abstract

Inspired by the increasing use of artificial intelligence (AI) to augment humans, researchers have studied human–AI systems involving different tasks, systems and populations. Despite such a large body of work, we lack a broad conceptual understanding of when combinations of humans and AI are better than either alone. Here we addressed this question by conducting a preregistered systematic review and meta-analysis of 106 experimental studies reporting 370 effect sizes. We searched an interdisciplinary set of databases (the Association for Computing Machinery Digital Library, the Web of Science and the Association for Information Systems eLibrary) for studies published between 1 January 2020 and 30 June 2023. Each study was required to include an original human-participants experiment that evaluated the performance of humans alone, AI alone and human–AI combinations. First, we found that, on average, human–AI combinations performed significantly worse than the best of humans or AI alone (Hedges’ g = −0.23; 95% confidence interval, −0.39 to −0.07). Second, we found performance losses in tasks that involved making decisions and significantly greater gains in tasks that involved creating content. Finally, when humans outperformed AI alone, we found performance gains in the combination, but when AI outperformed humans alone, we found losses. Limitations of the evidence assessed here include possible publication bias and variations in the study designs analysed. Overall, these findings highlight the heterogeneity of the effects of human–AI collaboration and point to promising avenues for improving human–AI systems.

Main

When do humans and AI complement each other, and by how much?

A large body of work suggests that integrating human creativity, intuition and contextual understanding with AI’s speed, scalability and analytical power can lead to innovative solutions and improved decision-making in areas such as health care [1], customer service [2] and scientific research. However, a growing number of studies reveal that human–AI systems do not necessarily achieve better results than the best of humans or AI alone. …

These seemingly contradictory results raise important questions: when do humans and AI complement each other, and by how much?

Results

Human-AI synergy: Human-AI system vs. max(human, AI)

We found that the human–AI systems performed significantly worse overall than the better of humans alone and AI alone. The overall pooled effect was negative (g = −0.23; t(92) = −2.89; two-tailed P = 0.005; 95% confidence interval (CI), −0.39 to −0.07) and is considered small according to conventional interpretations.
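As a quick sanity check on how the reported numbers fit together, the sketch below backs out the standard error implied by the t statistic and compares it with the width of the confidence interval. It assumes the usual relations t = g / SE and CI = g ± t_crit × SE, which the excerpt does not state explicitly; the variable names are illustrative only.

```python
# Reported synergy statistics from the Results section.
g = -0.23                       # pooled Hedges' g
t_stat = -2.89                  # t statistic with 92 degrees of freedom
ci_low, ci_high = -0.39, -0.07  # 95% confidence interval

# Assumption: t = g / SE, so the implied standard error is g / t.
se = g / t_stat

# Assumption: the CI is symmetric around g, i.e. g +/- t_crit * SE.
half_width = (ci_high - ci_low) / 2
midpoint = (ci_low + ci_high) / 2

# Multiplier implied by the reported CI; for 92 degrees of freedom the
# two-sided 95% critical value is roughly 1.99, so this should be close to 2.
implied_multiplier = half_width / se

print(f"implied SE         = {se:.3f}")
print(f"CI midpoint        = {midpoint:.2f}")
print(f"implied multiplier = {implied_multiplier:.2f}")
```

The interval is centred on the reported g and the implied multiplier lands near 2, so the figures are mutually consistent up to rounding.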

Human augmentation: Human-AI system vs. human alone

The human–AI systems performed significantly better than humans alone, and this pooled effect size was positive (g = 0.64; t(98) = 11.87; two-tailed P < 0.001; 95% CI, 0.53 to 0.74) and medium to large.
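The two comparisons differ only in the baseline: synergy compares the human–AI system with whichever of humans alone or AI alone did better, while augmentation compares it with humans alone. The following is a minimal sketch of both using the textbook Hedges' g formula (Cohen's d with a small-sample correction). The scores are made up for illustration and are not from the paper, and the function and variable names are hypothetical; the paper's actual effect-size pipeline may differ.

```python
import math
from statistics import mean, stdev


def hedges_g(treatment, control):
    """Standardized mean difference with the small-sample correction."""
    n1, n2 = len(treatment), len(control)
    s1, s2 = stdev(treatment), stdev(control)
    # Pooled standard deviation of the two groups.
    pooled_sd = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    cohens_d = (mean(treatment) - mean(control)) / pooled_sd
    # Approximate correction factor J = 1 - 3 / (4 * (n1 + n2) - 9).
    correction = 1 - 3 / (4 * (n1 + n2) - 9)
    return correction * cohens_d


# Hypothetical task accuracies per condition (illustrative only).
human_alone = [0.60, 0.65, 0.58, 0.62, 0.66]
ai_alone = [0.80, 0.78, 0.83, 0.79, 0.81]
human_ai = [0.74, 0.70, 0.77, 0.72, 0.75]

# Synergy baseline: the better of the two solo conditions, by mean score.
best_baseline = max((human_alone, ai_alone), key=mean)

synergy_g = hedges_g(human_ai, best_baseline)      # vs. max(human, AI)
augmentation_g = hedges_g(human_ai, human_alone)   # vs. human alone

print(f"synergy g      = {synergy_g:.2f}")
print(f"augmentation g = {augmentation_g:.2f}")
```

In this toy data the AI is the stronger solo condition, so the same human–AI scores yield a negative synergy effect and a positive augmentation effect, which mirrors the direction of the pooled results above.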

Discussion

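Interesting points:

  • When AI is added to a task that humans do better than AI, the result is better than humans working alone
  • When a human is added to a task that AI does better than humans, the result is worse than AI working alone

The authors speculate that this is because humans can judge well whether to accept the AI's suggestions only when they are better than the AI overall. In other words, when humans are worse than the AI, they either fail to doubt the AI when they should (overreliance) or dig in their heels when they should accept its suggestion (underreliance).

Another interesting point: whether humans were better or worse than the AI at a task, working with AI was generally better than humans working alone. This was especially true in open-ended settings (content creation and the like) rather than tasks that involve choosing among a finite set of options.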

2024 © ak