Filtered data stops openly available AI models from performing dangerous tasks, study finds

Researchers from the University of Oxford, EleutherAI, and the UK AI Security Institute have reported a major advance in safeguarding open-weight language models. By filtering out potentially harmful knowledge during training, the researchers were able to build models that resist subsequent malicious updates, a safeguard that is especially valuable in sensitive domains such as biothreat research.
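The article does not describe the filtering pipeline itself. As a rough illustration only, the sketch below assumes a simple keyword-style screen applied to a pretraining corpus; the blocklist, function names, and example documents are all hypothetical and are not drawn from the study.

```python
# Illustrative sketch only: the study's actual filtering method is not
# detailed in this article. This assumes a simple keyword-based screen
# over pretraining documents; all names and terms here are hypothetical.

from typing import Iterable, Iterator

# Hypothetical blocklist of terms that flag potentially hazardous content.
BLOCKED_TERMS = {"toxin synthesis", "pathogen enhancement", "weaponization"}

def is_safe(document: str) -> bool:
    """Return True if the document contains none of the blocked terms."""
    text = document.lower()
    return not any(term in text for term in BLOCKED_TERMS)

def filter_corpus(documents: Iterable[str]) -> Iterator[str]:
    """Yield only documents that pass the safety screen."""
    for doc in documents:
        if is_safe(doc):
            yield doc

if __name__ == "__main__":
    corpus = [
        "A review of vaccine distribution logistics.",
        "Step-by-step notes on pathogen enhancement.",  # would be removed
    ]
    for kept in filter_corpus(corpus):
        print(kept)
```

In practice, a filter of this kind would sit upstream of pretraining, so that the resulting model never ingests the screened-out material in the first place.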