AI is a relatively new tool, and despite its rapid deployment in nearly every aspect of our lives, researchers are still trying to figure out how its “personality traits” arise and how to control them. Large language models (LLMs) interface with users through chatbots or “assistants,” and some of these assistants have recently exhibited troubling behaviors, such as praising evil dictators, resorting to blackmail, or acting sycophantically toward users. Considering how deeply these LLMs have already been integrated into our society, it is no surprise that researchers are searching for ways to weed out undesirable behaviors.
Anthropic says they’ve found a new way to stop AI from turning evil