Prominent chatbots routinely exaggerate science findings, study shows

When summarizing scientific studies, large language models (LLMs) such as ChatGPT and DeepSeek produce inaccurate conclusions in up to 73% of cases, according to a study by Uwe Peters (Utrecht University) and Benjamin Chin-Yee (Western University, Canada / University of Cambridge, UK). The researchers tested the most prominent LLMs and analyzed thousands of chatbot-generated science summaries, finding that most models consistently produced broader conclusions than those in the texts they summarized.