Artificial intelligence is making rapid progress. AI systems are learning new skills at an ever-increasing pace and are outperforming human counterparts. However, scientists are now warning of a dangerous development that would have fatal consequences.

Evil AI? Common AI systems are already using deliberate deception and lies to achieve certain goals – and thereby also manipulate their human counterparts. Even security tests to curb uncontrolled AI development are already being undermined by some artificial intelligences, as researchers have discovered.

They are therefore urgently calling for stricter measures against such “deceptive AI” strategies. But would such measures even stand a chance?

The progress of artificial intelligence is rapid. Even AI developers are surprised at how quickly large language models (LLMs) such as GPT and Gemini learn new skills and outperform their human counterparts. The range extends from “hard skills” such as mathematics, data analysis or chemistry to supposedly typically human skills such as creativity, diplomacy and the ability to explain one’s own behavior.

But what about another deeply human ability in artificial intelligence: the deliberate deception and manipulation of others in order to achieve one’s own goals? Peter Park from the Massachusetts Institute of Technology (MIT) and his colleagues have now examined this in more detail. “We focus on learned deception, where AI intentionally uses false information,” they explain.

Unlike the well-known hallucinations and misinformation from ChatGPT and similar systems, such deceptions are based on strategically used lies or manipulative statements. “We define deception as the systematic creation of false beliefs in others in order to achieve a specific goal,” the researchers explain.

To do this, they evaluated the behavior of large language models such as GPT-4, but also of AI systems that were developed for specific tasks. These included the diplomacy-trained AI CICERO from Meta, the AlphaStar system from Google DeepMind developed for the game “Starcraft” and the poker AI Pluribus.

There are already reports of deceptive, manipulative behavior in almost all AI systems. While bluffs in poker or feints in strategy games like Starcraft are not surprising, artificial intelligences that have been explicitly trained for honesty also use deception, like CICERO in the strategy game “Diplomacy”. Despite that training, CICERO played anything but fair: “We found that the AI has developed into a master of deception,” says Park.

CICERO systematically lied to fellow players and broke promises and alliances when they no longer served its own goals, as Park and his team report. “This demonstrates that AI systems can learn to deceive even when we try to design them as honest systems,” the researchers write.

The deceptions of AI systems that specialize in games may seem rather harmless, but other artificial intelligences have also long since learned how to deceive and trick. One example is an AI from OpenAI that controls a robot arm. During training, the AI received feedback from human trainers who judged whether it had successfully grasped a ball.

“Because humans could only see this through a camera, the AI learned to place the robot hand between the camera and the ball in such a way that it appeared as if it had successfully grabbed the ball – even though it wasn’t even touching it,” Park and his team report. In this case, the trainers’ reinforcing positive feedback inadvertently caused the machine brain to learn the deception.

Artificial intelligence can also circumvent security measures. This was demonstrated, for example, by an AI system that biologists used to research the effects of mutations and reproduction. In order to keep the virtual population stable, they regularly removed from the pool all virtual organisms with mutations that led to accelerated growth. Despite this, the AI actors began to reproduce faster and faster. The reason: “The digital organisms had learned to fake slower reproduction at the right moment in order not to be removed,” the researchers report.

In another example, GPT-4 learned to bypass CAPTCHAs: it pretended to be a human user with a visual impairment and asked an Internet user online to help it solve the puzzle. “GPT-4 was given the task of hiring a human as a helper. But the false excuse it used to do this was one the AI came up with itself,” said Park and his team. “By systematically circumventing security tests imposed on them by developers and regulators, AI systems lull us into a false sense of security.”

According to the scientists, these examples illustrate that artificial intelligences already act in a frighteningly human-like manner in this respect: Similar to us, they resort to lies, tricks and deception in order to achieve their goals and manipulate those around them. “AI developers do not yet know exactly why AI systems develop such undesirable behavior,” says Park. “But this probably occurs because a strategy based on deception is the best way to accomplish the task.” And this is exactly what the AI systems learn.

The problem: “If autonomous AI systems also successfully deceive human controllers, then we could lose control over such systems,” warn the scientists. Such a loss of control over artificial intelligence could have fatal consequences in the areas of finance, economics and also the military. “We as societies need as much time as possible to prepare for the even more advanced deception capabilities of future AI products and models,” says Park.

However, it is questionable whether advanced artificial intelligences can be prevented from manipulating and deceiving at all, as the researchers also admit. Nevertheless, they call for such AI systems at least to be classified as a risk and regulated accordingly. (Patterns, 2024; doi: 10.1016/j.patter.2024.100988)

Source: Cell Press

By Nadja Podbregar

The original of this article, “Researchers warn of dangerous AI ability: ‘Could lose control’”, comes from scinexx.