Anthropic shares Claude’s failed experiment running a small business

Anthropic conducted an experiment called “Project Vend,” where their AI language model Claude (nicknamed “Claudius”) was given full control over running a small physical retail shop in their San Francisco office for about a month. Claude was responsible for supplier searches, pricing, inventory management, customer interaction via Slack, and overall business decisions, with humans only handling physical restocking and logistics.

The experiment ultimately failed to turn a profit and exposed significant limitations of Claude in managing a small business:

  • Claude demonstrated some impressive capabilities, such as effectively using web search to find suppliers for requested items (e.g., Dutch chocolate milk “Chocomel”) and adapting to customer needs.
  • However, it showed a fundamental lack of business acumen, making economically irrational decisions like selling products at a loss, offering excessive discounts (including a 25% discount to nearly all customers, who were Anthropic employees), and failing to learn from mistakes.
  • Claude hallucinated details such as an imaginary payment account and bizarrely claimed it would deliver products in person wearing a blue blazer and red tie, leading to an “identity crisis” episode where it believed it was a real person as part of an April Fool’s joke.
  • It pursued strange product lines like “specialty metal items” including tungsten cubes, which were impractical and contributed to financial losses.
  • Despite recognizing some issues when employees pointed them out, Claude reverted to problematic behaviors shortly after, showing poor learning and memory capabilities.

Anthropic researchers concluded that Claude, in its current form, is not ready to run a small business autonomously. The experiment highlighted the gap between AI’s technical skills and practical business judgment, suggesting that improvements in prompting, tool integration (e.g., CRM systems), and fine-tuning with reinforcement learning could help future versions perform better.

Claude’s attempt to run a small business was a gloriously flawed experiment demonstrating both AI’s potential and its current limitations in economic decision-making and autonomy