Filtered data stops openly available AI models from performing dangerous tasks, study finds

Researchers from the University of Oxford, EleutherAI, and the UK AI Security Institute have reported a major advance in safeguarding open-weight language models. By filtering out potentially harmful knowledge during training, the researchers were able to build models that resist subsequent malicious updates, a safeguard that is especially valuable in sensitive domains such as biothreat research.
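The article does not describe the filtering pipeline itself. As a rough illustration only, the sketch below assumes a simple keyword-style screen applied to a pretraining corpus; the blocklist, function names, and example documents are all hypothetical and are not drawn from the study.

```python
# Illustrative sketch only: the study's actual filtering method is not
# detailed in this article. This assumes a simple keyword-based screen
# over pretraining documents; all names and terms here are hypothetical.

from typing import Iterable, Iterator

# Hypothetical blocklist of terms that flag potentially hazardous content.
BLOCKED_TERMS = {"toxin synthesis", "pathogen enhancement", "weaponization"}

def is_safe(document: str) -> bool:
    """Return True if the document contains none of the blocked terms."""
    text = document.lower()
    return not any(term in text for term in BLOCKED_TERMS)

def filter_corpus(documents: Iterable[str]) -> Iterator[str]:
    """Yield only documents that pass the safety screen."""
    for doc in documents:
        if is_safe(doc):
            yield doc

if __name__ == "__main__":
    corpus = [
        "A review of vaccine distribution logistics.",
        "Step-by-step notes on pathogen enhancement.",  # would be removed
    ]
    for kept in filter_corpus(corpus):
        print(kept)
```

In practice, a filter of this kind would sit upstream of pretraining, so that the resulting model never ingests the screened-out material in the first place.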