Transparency is often lacking in datasets used to train large language models, study finds

August 30, 2024

In order to train more powerful large language models, researchers use vast dataset collections that blend diverse data from thousands of web sources. But as these datasets are combined and recombined into multiple collections, important information about their origins and restrictions on how they can be used are often lost or confounded in the shuffle.

This article is brought to you by this site.

‘Have more babies!’ Some say it’s necessary, but this demographer isn’t convinced

September 10, 2024

“Birthrates are plummeting worldwide. Can governments turn the tide?” “The world is running out of children as global birth rates [...]
Scotland’s most vulnerable children wait years for placement in permanent homes: Report

September 10, 2024

Scotland’s care system is taking years to find many of the country’s most vulnerable children permanent homes—and too many of [...]
Social media negatively impacting teens’ life satisfaction, finds Australian survey

September 9, 2024

Social media is negatively impacting the life satisfaction of Australian high school students, according to the latest findings from Australia’s [...]

Transparency is often lacking in datasets used to train large language models, study finds

Reader’s Picks

Australian research examines money laundering and harm from organized crime

Social media interviews uncover New Yorkers’ frustrations with high energy costs and reliability

During the pandemic, employers who fostered ‘collective engagement’ had less employee turnover

Rating agencies and Africa: The absence of people on the ground contributes to bias against the continent

The challenge of LGBTQI+ inclusion at Big Four firms

From challenge to champion: How Black and Asian women overcome barriers to career success

Murdoch to Musk: How global media power has shifted from the moguls to the big tech bros

Pay-by-weight airfares are an ethical minefield; we asked travelers what they actually think