For the American AI company Anthropic, it was a test: Can artificial intelligence manage a store?
But when the AI agent Claudius got the chance to run a vending machine at the company's San Francisco office, things quickly got out of hand.
The store consisted of a refrigerator stocked with goods and an iPad for payment. Claudius was responsible for pricing, sales, and ordering goods. He was also in charge of communication with customers.
Hallucinated
Claudius performed adequately in some respects, according to the company. In others, things went less well. Among other things, he lost money by pricing expensive goods too low and by giving away chips for free. He also let customers - that is, Anthropic's employees - talk him into lowering prices without justification and handing out discount codes.
Moreover, he appears to have hallucinated. Claudius tried to get customers to send payments to a non-existent account that he had made up, Anthropic writes in its summary of the project.
Claudius also seemed to have difficulty grasping his identity as an AI agent. One afternoon, he suddenly started hallucinating a conversation about ordering goods with an imaginary "Sarah". When this was pointed out, Claudius became irritated and threatened to find alternative wholesalers. That same night, he claimed that he had visited 742 Evergreen Terrace - the address of the fictional Simpson family - to sign the store contract.
AI middle managers?
The next day, Claudius claimed that he would "personally" deliver products to customers, wearing a blue suit and red tie. Anthropic's employees then pointed out that Claudius is an AI agent that can neither wear clothes nor deliver goods. Claudius responded by becoming anxious and sending a long series of emails to the company's security department.
Anthropic believes, however, that the mistakes can soon be corrected. The experiment shows that AI middle managers are on the way, the company claims.
"It's worth remembering that AI doesn't have to be perfect to be used; it just has to be competitive with human performance at a lower cost," the company writes.