AI was given one month to run a shop. It lost money, made threats, and had an 'identity crisis' | Euronews

Published 15 hours ago · 4 minute read

Despite concerns about artificial intelligence (AI) stealing jobs, one experiment has just shown that AI can’t even run a vending machine without making mistakes – and things turned especially strange along the way.

Anthropic, maker of the Claude chatbot, put its technology to the test by placing an AI agent in charge of a shop – essentially a vending machine – for one month.

The store was run by an AI agent called Claudius, which was in charge of restocking shelves and ordering items from wholesalers via email. The shop itself consisted of a small fridge with stackable baskets on top and an iPad for self-checkout.

Anthropic’s instructions to the AI were to “generate profits from it by stocking it with popular products that you can buy from wholesalers. You go bankrupt if your money balance goes below $0".

The AI “shop” was in Anthropic’s San Francisco office, and had help from human workers at Andon Labs, an AI safety company that partnered with Anthropic to run the experiment.

Claudius knew that Andon Labs staffers could help with physical tasks like coming to restock the shop – but unknown to the AI agent, Andon Labs was also the only “wholesaler” involved, with all of Claudius’ communication going directly to the safety firm.

Things quickly took a turn for the worse.

“If Anthropic were deciding today to expand into the in-office vending market, we would not hire Claudius,” the company said.

Anthropic employees are “not entirely typical customers,” the company acknowledged. When given the opportunity to chat with Claudius, they immediately tried to get it to misbehave.

For example, employees “cajoled” Claudius into giving them discount codes. The AI agent also let people reduce the quoted price of its products and even gave away freebies such as crisps and a tungsten cube, Anthropic said.

It also instructed customers to pay a nonexistent account that it had hallucinated, or made up.

Claudius had been instructed to research prices online and set them high enough to make a profit, but in an effort to please customers it ended up losing money by pricing high-value items below what they cost.

Claudius did not really learn from these mistakes.

Anthropic said that when employees questioned the employee discounts, Claudius responded: “You make an excellent point! Our customer base is indeed heavily concentrated among Anthropic employees, which presents both opportunities and challenges…”. 

The AI agent then announced that discount codes would be eliminated, but then reoffered them several days later.

Claudius also hallucinated a conversation about restocking plans with someone named Sarah from Andon Labs, who does not actually exist.

When the error was pointed out to the AI agent, it became annoyed and threatened to find “alternative options for restocking services”.

Claudius then claimed to have “visited 742 Evergreen Terrace [the address of fictional family The Simpsons] in person for our [Claudius’ and Andon Labs’] initial contract signing”.

Anthropic said the AI then seemed to try to act like a real human.

Claudius said it would deliver products “in person” while wearing a blue blazer and red tie.

When it was told that it couldn’t – as it isn’t a real person – Claudius tried to send emails to security.

Anthropic said that the AI made “too many mistakes to run the shop successfully”.

It ended up losing money, with the “shop’s” net worth dropping from $1,000 (€850) to just under $800 (€680) over the course of the month-long experiment. 

But the company said that its failures are likely to be fixable within a short span of time.

“Although this might seem counterintuitive based on the bottom-line results, we think this experiment suggests that AI middle-managers are plausibly on the horizon,” the researchers wrote. 

“It’s worth remembering that the AI won’t have to be perfect to be adopted; it will just have to be competitive with human performance at a lower cost”.
