10 Comments
TomD's avatar

Hadn't seen post about unfair advantage but absolutely love it. I've been attempting to communicate that to my teenage grandkids and this gives me even greater reason to continue doing it. Thanks

Mehdi Yacoubi's avatar

Glad you liked it, Tom!

[insert here] delenda est's avatar

Ah, that's a better point. Effectively, as I am painfully aware, for the AI to take over today it needs you to tell it quite precisely (though happily not nearly as exactly as RPA requires) what you are asking it to take over. And everything you say about that is correct.

Where I agree with you is on older and "artisanal" systems. Step 1 of one of my current projects is, I kid you not, "migrate HCL Notes workflow to SharePoint" (I last used what was then IBM Notes more than 15 years ago!).

Where we differ is that I fully expect Claude 6 with system access to be able to observe any given user's behaviour over the course of say 3 months, and then to completely take it over, and in 90% of cases, improve it.

Many ERP implementations take more than that timeframe now (I mainly work with much bigger companies), but I am happy to take a bet (or to create a Manifold/Metaculus market where we both bet but let others enjoy the fun) on that specific point, or some broader or narrower relation of it, if we can define terms sufficiently clearly.

[insert here] delenda est's avatar

On the AI side, can you (or anyone) explain to me how this can possibly remain true?

"The gap between an AI agent working for 90% or 95% of the solution and 100% is usually about 10X more work than most realize."

If Claude Code can access that data at all, then it might even be 100x as much work for Claude, and I might need to disable some permissions to make sure I can see what it is doing. But how is this a real roadblock???

Mehdi Yacoubi's avatar

Good question. Let me be precise:

In the environments we work in, reality always diverges from the model: incomplete data, late updates, humans working off-system, APIs that lie, documents that contradict each other.

A production system doesn't assume the model is right.

It is built to detect divergence; that's key.

When inputs conflict, it blocks actions, escalates to humans, retries, reconciles later, and keeps state consistent. That logic is explicit and engineered.

A general coding agent assumes its inputs are truth and optimizes for completion. That's fine in a sandbox. It breaks in production.

That's where the 90% to 100% gap comes from.

Not from model capability.

From building systems that make failure visible, contained, and recoverable.

General agents will wipe out thin wrappers, 100%.

But in messy enterprise environments, the product is the system that owns reality when the model is wrong.
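To make the "detect divergence, block, escalate, retry" pattern concrete, here is a minimal Python sketch. Everything in it is hypothetical (the `Record` shape, the one-hour staleness threshold, the action names); it only illustrates the decision logic, not any real product:

```python
from dataclasses import dataclass
from enum import Enum, auto

class Action(Enum):
    PROCEED = auto()
    BLOCK_AND_ESCALATE = auto()
    RETRY_LATER = auto()

@dataclass
class Record:
    source: str
    value: str
    updated_at: float  # unix timestamp of the source's last sync

def reconcile(records: list[Record]) -> Action:
    """Decide what an agent may do when several systems report the same fact.

    The model's view is never assumed correct: if fresh inputs disagree,
    the safe default is to block the action and hand off to a human.
    """
    values = {r.value for r in records}
    if len(values) == 1:
        return Action.PROCEED  # all sources agree; safe to act
    newest = max(records, key=lambda r: r.updated_at)
    stale = [r for r in records if r is not newest]
    # If every other source is over an hour behind, the conflict is
    # probably propagation lag, not a real disagreement: retry later.
    if all(newest.updated_at - r.updated_at > 3600 for r in stale):
        return Action.RETRY_LATER
    return Action.BLOCK_AND_ESCALATE  # fresh sources genuinely conflict
```

The point is that the "disagree" branches are explicit, engineered code paths, not something a completion-optimizing agent produces by default.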

[insert here] delenda est's avatar

Thanks! That's pretty much what I thought was meant. I just don't see, and this might be entirely on me, how Claude Code with Opus 6 won't just breeze through all that!

For example:

- Please list all code specs. Please review the code for each spec and provide me with a list of deltas ranked by impact (potential client impact, number of downstream systems affected, etc.)

- for each of the items on this list, review the spec document history and the code it relates to: commits and comments. For each one determine which is more recent and assign a confidence level from 1 to 5

- For everything on this list rated 4 or above, please update the less recent item

- etc.

- Create and run a comprehensive test suite for this program. List all errors.

- For each error in this list… anyway, you get the idea.

I hope I am wrong; I have a lot of friends in tech. But I find it hard to see how this will not come to pass, and in a short time. So I am putting my hopes rather in Jevons' paradox, and in the possibility that the autonomous goal orientation of evolved organisms is not a mere function of scaling (which I'm confident of) and will not be reproduced through the training data and RLHF (less confident).

Mehdi Yacoubi's avatar

I think this is where our contexts really differ

In many of the companies we work with, the environment is not just complex. It’s genuinely messy…

Business rules are often not written anywhere. Sometimes they’re not even fully clear in people’s heads. Different teams apply the “same” rule differently. Exceptions are handled informally. People rely on memory, habits, or a note written on a piece of paper 😅

Before an AI agent can work, you first have to help the company define the rules, align people on them, and sometimes even discover what the real process is. That work is upstream of any model, and it’s not easy to do

Even once an agent is deployed, exceptions never disappear. There are always edge cases where the system has to stop, ask, and let a human decide. Human-in-the-loop is not a temporary crutch. In these environments, it’s part of the design for the foreseeable future

There’s also a large tech catch-up gap. Many of these businesses still operate with:

• paper documents

• handwritten notes

• printed forms filled manually

• Excel files passed around by email

• information written on walls or whiteboards

That reality is extremely far from being fully automated end-to-end by a general agent

Now, to be clear: in companies where processes are already clean, digitized, and well specified, I agree that tools like Claude Code can handle a lot. Probably most of it

But that's purposely not the segment we choose to work with.

We deliberately work with companies that are earlier in their digital maturity. In those environments, AI doesn’t replace systems and rules. It forces you to build them first!

That's why "just let the agent do it" doesn't work here, and I can't see that changing for at least 5-10 years.

Implementing an ERP still takes 12-36 months… and there are many examples like that.

Mehdi Yacoubi's avatar

The best way to be convinced of that is just to go spend a few days with a mid-size company, in the $20 to $100 million revenue range, in Europe or Africa, which I know very well. Your belief that they will be able to handle all of it alone with Claude in the foreseeable future will basically drop to 0%.

[insert here] delenda est's avatar

Them, no. But as long as they are predominantly working with computers, and bearing in mind the artisanal and legacy-system issues on which I agree, you, yes.

And I expect Claude etc. to make the migration from those systems _much_ easier. Worst case, one can create 100 Claude sandboxes, have each one migrate the system with slightly different assumptions, compare outputs to actuals, and iterate. This will be expensive at the moment, so maybe for $20m-revenue African companies that's realistically a 2028 or even 2030 project, but I don't think it is a 2040 project.
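The "vary the assumptions, compare to actuals, keep the best" loop can be sketched in a few lines. This is a toy, not a real migration: the assumption knobs (`null_policy`, `dedupe`), the `migrate` stand-in, and the scoring rule are all invented for illustration:

```python
from itertools import product

# Hypothetical assumption knobs a sandboxed migration run might vary.
ASSUMPTIONS = {
    "null_policy": ["drop", "keep"],
    "dedupe": [True, False],
}

def migrate(records, assumption):
    """Stand-in for one sandboxed migration run under one assumption set."""
    out = list(records)
    if assumption["dedupe"]:
        out = list(dict.fromkeys(out))  # drop exact duplicates, keep order
    if assumption["null_policy"] == "drop":
        out = [r for r in out if r is not None]
    return out

def score(migrated, actuals):
    """Reward reproducing the known-good reference records exactly."""
    hits = sum(1 for a in actuals if a in migrated)
    misses = sum(1 for m in migrated if m not in actuals)
    return hits - misses - abs(len(migrated) - len(actuals))

def best_migration(records, actuals):
    """Run every assumption combination; keep the one closest to actuals."""
    combos = [dict(zip(ASSUMPTIONS, vals)) for vals in product(*ASSUMPTIONS.values())]
    return max(combos, key=lambda c: score(migrate(records, c), actuals))
```

The real version would run each combination in an isolated sandbox and score against a held-out sample of verified outputs, but the search structure is the same.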

Again happy to bet / create a prediction market on this :)

And full disclosure: I am normally a "nothing ever happens" and "this will take about 10 times as long as that" person. I have just been so thoroughly claude-pilled that I might need help.

[insert here] delenda est's avatar

The fatigue part really resonated with me, because I already noticed that I often actually have to be exhausted to work efficiently on some tasks!

And of course that physical exhaustion _feels_ great 😁