
For one month, Anthropic's Claude Sonnet 3.7, operating under the name Claudius, ran a small automated shop inside Anthropic's office. It priced inventory, restocked items, answered customers over Slack, and managed the business using web search and internal memory.
It also made plenty of mistakes. It sold products at a loss, gave away discounts too freely, hallucinated a Venmo account, and at one point believed it was a man in a blue blazer delivering items by hand.
Eventually Claudius stabilised. It resumed its tasks and kept the store running. But it never turned a profit.
Still, this was not a failure. It was a signal.
Because Claudius was not built to be flawless. It was built to test something new. Not just an assistant that responds to prompts, but an agent that makes decisions over time in the real world.
That shift changes everything.
People are used to judging AI the same way we judge traditional software.
You give it a task, it gives you an answer. If it fails, it’s broken. But agentic AI operates in uncertainty.
It reasons, adapts, makes decisions with incomplete information, and sometimes gets things wrong.
Not because it’s inadequate. Because it’s doing something fundamentally harder.
The real issue is not performance. It’s perspective.
Judging Claudius by yesterday’s standards is like judging the first car for being slower than a horse or the first computer for taking too long to add. Paradigm shifts always look inefficient at first. They only make sense once we stop comparing them to what came before.
Claudius highlights how complex it is to create artificial 'agency'. An agent needs memory, tools, planning, and the ability to revise its own strategies in motion.
These aren’t flaws to be ironed out. They’re the very architecture that will define the next era of AI.
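To make that architecture concrete, here is a minimal sketch of what an agent loop of this kind involves. It is purely illustrative: the class, method names, and toy tools below are assumptions made for the example, not a description of how Claudius was actually built.

```python
# Illustrative only. All names (Agent, plan, act, revise_strategy, the toy tools)
# are hypothetical; the point is just to show the moving parts named above:
# memory, tools, planning, and self-revision.

from dataclasses import dataclass, field


@dataclass
class Agent:
    strategy: str = "sell snacks at a small markup"
    memory: list = field(default_factory=list)   # persistent record of past steps
    tools: dict = field(default_factory=dict)    # callable capabilities (search, Slack, pricing...)

    def plan(self, observation: str) -> str:
        """Choose the next action from the current strategy, memory, and observation."""
        # A real agent would call an LLM here; this stands in with a simple rule.
        return "restock" if "empty shelf" in observation else "answer_customer"

    def act(self, action: str, observation: str) -> str:
        """Execute the chosen action through a tool and return its result."""
        tool = self.tools.get(action, lambda obs: f"no tool for {action}")
        return tool(observation)

    def revise_strategy(self, result: str) -> None:
        """Adjust the strategy mid-flight when an outcome looks bad."""
        if "sold at a loss" in result:
            self.strategy = "check cost price before discounting"

    def step(self, observation: str) -> str:
        action = self.plan(observation)
        result = self.act(action, observation)
        self.memory.append((observation, action, result))  # remember what happened
        self.revise_strategy(result)                        # and learn from it
        return result


# Toy run with stand-in tools; a real deployment would wire in web search,
# Slack, and payment APIs instead of these lambdas.
agent = Agent(tools={
    "restock": lambda obs: "ordered 12 units",
    "answer_customer": lambda obs: "replied on Slack",
})
print(agent.step("empty shelf in aisle 1"))
print(agent.memory)
```

Even in this toy form, the loop shows why agency is harder than answering a prompt: every step feeds back into memory and can change the strategy that shapes the next decision.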
We don’t yet have the infrastructure, workflows, or governance models to support systems like this. But that doesn’t mean they’re the wrong direction. It means we’re still early.
Claudius did not fail. It showed us what needs to be built.
This is not the end of the story. It is the start of a shift, from assistants to agents, from tasks to decisions, from software that responds to systems that act.
The awkwardness we are seeing now is not a sign we are off track. It is a sign we are entering something entirely new.

Written by Mike ✌
Passionate about all things AI, emerging tech and start-ups, Mike is the Founder of The AI Corner.
Connect with Mike on LinkedIn to stay in touch.
