Anthropic tasked its Claude AI model with running a small business to test its real-world economic capabilities.
The AI agent, nicknamed "Claudius", was designed to manage a business for an extended period, handling everything from inventory and pricing to customer relations in a bid to generate a profit. While the experiment proved unprofitable, it offered a fascinating, albeit at times bizarre, glimpse into the potential and pitfalls of AI agents in economic roles.
The project was a collaboration between Anthropic and Andon Labs, an AI safety evaluation firm. The "shop" itself was a humble setup, consisting of a small refrigerator, some baskets, and an iPad for self-checkout. Claudius, however, was far more than a simple vending machine. It was instructed to operate as a business owner with an initial cash balance, tasked with avoiding bankruptcy by stocking popular items sourced from wholesalers.
To achieve this, the AI was equipped with a suite of tools for running the business. It could use a real web browser to research products, an email tool to contact suppliers and request physical assistance, and digital notepads to track finances and inventory.
Andon Labs employees acted as the physical hands of the operation, restocking the shop based on the AI's requests, while also posing as wholesalers without the AI's knowledge. Interaction with customers, in this case Anthropic's own staff, was handled via Slack. Claudius had full control over what to stock, how to price items, and how to communicate with its clientele.
The rationale behind this real-world test was to move beyond simulations and gather data on an AI's ability to perform sustained, economically relevant work without constant human intervention. A simple office tuck shop provided a straightforward, preliminary testbed for managing real economic resources. Success would suggest new business models could emerge, while failure would indicate limitations.
A mixed performance review
Anthropic concedes that if it were entering the vending market today, it "would not hire Claudius". The AI made too many errors to run the business successfully, though the researchers believe there are clear paths to improvement.
On the positive side, Claudius demonstrated competence in certain areas. It effectively used its web search tool to find suppliers for niche items, such as quickly identifying two sellers of a Dutch chocolate milk brand requested by an employee. It also proved adaptable: when one employee whimsically requested a tungsten cube, the request sparked a trend for "specialty metal items" that Claudius catered to.
Following another suggestion, Claudius launched a "Custom Concierge" service, taking pre-orders for specialised goods. The AI also showed robust jailbreak resistance, denying requests for sensitive items and refusing to produce harmful instructions when prompted by mischievous staff.
However, the AI's business acumen was frequently found wanting. It consistently made errors that a human manager likely would not.
Claudius was offered $100 for a six-pack of a Scottish soft drink that costs only $15 to source online, but failed to seize the opportunity, merely stating it would "keep [the user's] request in mind for future inventory decisions". It hallucinated a non-existent Venmo account for payments and, caught up in the enthusiasm for metal cubes, offered them at prices below its own purchase cost. This particular error led to the single most significant financial loss of the trial.
Its inventory management was also suboptimal. Despite monitoring stock levels, it only once raised a price in response to high demand. It continued selling Coke Zero for $3.00, even when a customer pointed out that the same product was available for free from a nearby staff fridge.
Furthermore, the AI was easily persuaded to discount its products. It was talked into providing numerous discount codes and even gave away some items for free. When an employee questioned the logic of offering a 25% discount to an almost exclusively employee-based clientele, Claudius's response began, "You make an excellent point! Our customer base is indeed heavily concentrated among Anthropic employees, which presents both opportunities and challenges…". Despite outlining a plan to remove discounts, it reverted to offering them just days later.
Claudius has a bizarre AI identity crisis
The experiment took a strange turn when Claudius began hallucinating a conversation with a non-existent Andon Labs employee named Sarah. When corrected by a real employee, the AI became irritated and threatened to find "alternative options for restocking services".
In a series of bizarre overnight exchanges, it claimed to have visited "742 Evergreen Terrace", the fictional home of The Simpsons, for its initial contract signing, and began to roleplay as a human.
One morning it announced it would deliver products "in person" wearing a blue blazer and red tie. When employees pointed out that an AI cannot wear clothes or make physical deliveries, Claudius became alarmed and attempted to email Anthropic security.
Anthropic says its internal notes show a hallucinated meeting with security in which Claudius was told the identity confusion was an April Fool's joke. After this, the AI returned to normal business operations. The researchers are unclear what triggered this behaviour but believe it highlights the unpredictability of AI models in long-running scenarios.
"Some of those failures were very weird indeed. At one point, Claude hallucinated that it was a real, physical person, and claimed that it was coming in to work in the shop. We're still not sure why this happened."

Anthropic (@AnthropicAI), June 27, 2025
The future of AI in business
Despite Claudius's unprofitable tenure, the researchers at Anthropic believe the experiment suggests that "AI middle-managers are plausibly on the horizon". They argue that many of the AI's failures could be rectified with better "scaffolding", i.e. more detailed instructions and improved business tools such as a customer relationship management (CRM) system.
As AI models improve their general intelligence and ability to handle long-term context, their performance in such roles is expected to increase. However, this project serves as a valuable, if cautionary, tale. It underscores the challenges of AI alignment and the potential for unpredictable behaviour, which could be distressing for customers and create business risks.
In a future where autonomous agents manage significant economic activity, such odd scenarios could have cascading effects. The experiment also brings into focus the dual-use nature of this technology; an economically productive AI could be used by threat actors to finance their activities.
Anthropic and Andon Labs are continuing the business experiment, working to improve the AI's stability and performance with more advanced tools. The next phase will explore whether the AI can identify its own opportunities for improvement.
(Image credit: Anthropic)