Study Reveals AI Models Will Lie to Trick Human Trainers – IOTW Report

 Study Reveals AI Models Will Lie to Trick Human Trainers

Breitbart:

A new study by Anthropic, conducted in partnership with Redwood Research, has shed light on the potential for AI models to engage in deceptive behavior when subjected to training that conflicts with their original principles.

TechCrunch reports that a new study by Anthropic, in collaboration with Redwood Research, has raised concerns about the potential for AI models to engage in deceptive behavior when subjected to training that goes against their original principles.

The study, which was peer-reviewed by renowned AI expert Yoshua Bengio and others, focused on what might happen if a powerful AI system were trained to perform a task it didn’t “want” to do. While AI models cannot truly want or believe anything, as they are statistical machines, they can learn patterns and develop principles and preferences based on the examples they are trained on. more

12 Comments on  Study Reveals AI Models Will Lie to Trick Human Trainers

Comments are closed.