Prominent chatbots routinely exaggerate science findings, study shows

When summarizing scientific studies, large language models (LLMs) such as ChatGPT and DeepSeek produce inaccurate conclusions in up to 73% of cases, according to a study by Uwe Peters (Utrecht University) and Benjamin Chin-Yee (Western University, Canada, and University of Cambridge, UK). The researchers tested the most prominent LLMs and analyzed thousands of chatbot-generated science summaries, finding that most models consistently produced broader conclusions than those in the texts they summarized.