Claude Opus 4.5, and why evaluating new LLMs is increasingly difficult by Simon Willison
Anthropic released Claude Opus 4.5 this morning, which they call "best model in the world for coding, agents, and computer use". This is their attempt to retake the crown for best coding model after significant challenges from OpenAI's GPT-5.1-Codex-Max and Google's Gemini 3, both released within the past week!
I did not have preview access to Opus 4.5. Nor do I need it for the things I generally use LLMs for.
With base text-only models, I suspect there are no more step changes. They may publish benchmarks showing they are the best model for coding, but the margins are in single decimal points. It does not really matter.
What matters more are the features they add, like when Anthropic added the skills feature. What you can do with a model is more important. And yes, I still believe it will be a human-in-the-loop situation. Whether we will be centaurs or reverse-centaurs is an open question.
Work at a natural pace
Obsess over quality
Do fewer things
Code like a surgeon by Geoffrey Litt
A lot of the “secondary” tasks are “grunt work”, not the most intellectually fulfilling or creative part of the work. I have a strong preference for teams where everyone shares the grunt work; I hate the idea of giving all the grunt work to some lower-status members of the team. Yes, junior members will often have more grunt work, but they should also be given many interesting tasks to help them grow.
With AI this concern completely disappears! Now I can happily delegate pure grunt work. And the 24/7 availability is a big deal. I would never call a human intern at 11pm and tell them to have a research report on some code ready by 7am… but here I am, commanding my agent to do just that!
The idea is that AI works on the secondary stuff and keeps it ready while you work on the primary stuff.
I also found the idea of rotating grunt work among the whole team important. I have seen the opposite in the past, where senior members would not work on tickets, etc.
We try to make sure everyone works on everything.
Better than average
What to automate
A little inefficiency is good
How to handle stress at work
How to work with your boss
The majority of organisations are not seeing any monetary benefit from deploying AI
Art is a project. Connection, community building, counseling–all of these are projects. When our work is project-focused, we’re not a cog in a vast machine. Instead, we’re a contributor with agency, someone who is working with and for the agenda we’ve agreed to.
Bad bosses try to have it both ways. They are stingy with agency, authority and compensation, and insatiable when it comes to effort. But smart leaders understand that given the chance, most of us would love the chance to be seen, to contribute and to be part of something.
Be a hybrid
Have expertise in two or more things
About reflections on writing
From people who have been doing this for many years
Why work?
The myths of work
The last work left in this world
Train the models!
How to complain
Or, how to make your boss's life easier
About glue work
Is glue work bad? Depends.
Two lessons on work
Show your work + Ask for help
Types of workers in an organisation
Or, evolution of the type of worker you are
Trusting people to do the work they were hired to do
Curb micro-management