LLMS

10 items tagged with LLMS in Micro.

Anthropic accuses DeepSeek and other Chinese firms of using Claude to train their AI by Emma Roth

DeepSeek, which caused a stir in the AI industry for its powerful but more efficient models, held over 150,000 exchanges with Claude and targeted its reasoning capabilities, according to Anthropic. It’s also accused of using Claude to generate “censorship-safe alternatives to politically sensitive questions about dissidents, party leaders, or authoritarianism.” In a letter to lawmakers last week, OpenAI similarly accused DeepSeek of “ongoing efforts to free-ride on the capabilities developed by OpenAI and other U.S. frontier labs.”

I remember similar comments being made when DeepSeek first came out. But hey, you did not ask for permission when you trained on the world’s data.

All this fear-mongering, and for what?

Also, I’m in a weird position re: Anthropic. I use the Pro plan and am their customer. With things as they are, you’re bound to feel some sense of loyalty toward the company. You may feel the need to defend them. They’re better than OpenAI!

Not really.

The way these companies have built their tools is generally shitty. The products are useful, though. Make of that what you will. I recently read a post by Cory Doctorow that talks about this.

Refusing to use a technology because the people who developed it were indefensible creeps is a self-owning dead-end. You know what's better than refusing to use a technology because you hate its creators? Seizing that technology and making it your own. Don't like the fact that a convicted monopolist has a death-grip on networking? Steal its protocol, release a free software version of it, and leave it in your dust:

That’s where I stand. My dream is to be able to run these tools locally. I don’t want to send my data out to these companies.

Micro

Write-Only Code | Heavybit

Much as humans no longer shell into individual production servers, I believe we will develop similar practices around unread code. Over time, we will treat “humans had to read this to be comfortable” as a smell in our code generation pipeline, or as an explicit, expensive trade-off reserved for truly mission-critical subsystems. A natural outcome of this shift is a “code reading coverage” metric, tracked much like test coverage. What fraction of production code has actually been read by humans, partly as a safety signal, and partly as a metric teams deliberately and safely work to drive downward toward an asymptote.
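The “code reading coverage” metric described above could be tracked much like test coverage. A minimal sketch of the idea, where the file names, the per-file line counts, and the read-log format are all hypothetical, not from the post:

```python
# Sketch of a "code reading coverage" metric: the fraction of production
# lines a human has actually read. The repo contents and read log below
# are illustrative placeholders.

def reading_coverage(line_counts: dict[str, int], read_lines: dict[str, int]) -> float:
    """Return the fraction of total lines marked as human-read."""
    total = sum(line_counts.values())
    # Cap per-file read counts at the file's size, then sum.
    read = sum(min(read_lines.get(f, 0), n) for f, n in line_counts.items())
    return read / total if total else 1.0

repo = {"api.py": 200, "generated_client.py": 800}
read = {"api.py": 200}  # only the hand-maintained module was reviewed

print(f"{reading_coverage(repo, read):.0%}")  # 20%
```

A team following the post’s logic would watch this number, and, for non-critical subsystems, deliberately let it drift downward.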

Micro

The Limits of AI by Hugh Howey

The limit holding us back will never be the limits of AI, but rather the limits of our biology. Can we stop hurting ourselves and others? Can we expand our circles of empathy until they include every living thing and even most non-living things. Can we be satisfied with less than our neighbors if it means we all have the basic necessities of life? I’m an atheist, and the 10 commandments start off with some very weak sauce about fearing no other god and what not to believe, but even I can see that most of our problems would be solved if we lived by the rest of what’s there. No lying. No jealousy. No killing. We’ve had all the answers for thousands of years. We still can’t abide by them.

Micro

Giving University Exams in the Age of Chatbots

Like every generation of students, there are good students, bad students and very brilliant students. It will always be the case, people evolve (I was, myself, not a very good student). Chatbots don’t change anything regarding that. Like every new technology, smart young people are very critical and, by definition, smart about how they use it.

Interesting read.

Micro

A Software Library with No Code by Drew Breunig

the whenwords library contains no code. Instead, whenwords contains specs and tests, specifically:

  • SPEC.md: A detailed description of how the library should behave and how it should be implemented.
  • tests.yaml: A list of language-agnostic test cases, defined as input/output pairs, that any implementation must pass.
  • INSTALL.md: Instructions for building whenwords, for you, the human.

Drew goes on to list a bunch of scenarios where this won’t be useful. But it’s an interesting way to look at libraries.
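The structure is easy to picture: any implementation, in any language, just has to pass the input/output pairs from tests.yaml. A toy runner in that spirit, where the `timeago` function and the test cases are my own illustrations, not taken from the actual whenwords spec:

```python
# Toy version of a language-agnostic test harness: each case is an
# input/output pair that any implementation must pass. Both the cases
# (standing in for parsed tests.yaml entries) and the implementation
# under test are illustrative, not from whenwords itself.

def timeago(seconds: int) -> str:
    """A toy implementation under test."""
    if seconds < 60:
        return "just now"
    if seconds < 3600:
        return f"{seconds // 60} minutes ago"
    return f"{seconds // 3600} hours ago"

# Stand-in for entries loaded from a tests.yaml file.
CASES = [
    {"input": 30, "expected": "just now"},
    {"input": 120, "expected": "2 minutes ago"},
    {"input": 7200, "expected": "2 hours ago"},
]

def run(cases, fn):
    """Return (passed, failed) counts for fn against the cases."""
    failures = [c for c in cases if fn(c["input"]) != c["expected"]]
    return len(cases) - len(failures), len(failures)

passed, failed = run(CASES, timeago)
print(f"{passed} passed, {failed} failed")  # 3 passed, 0 failed
```

The point of the pattern is that the cases, not the code, are the durable artifact: you could hand the spec and this table to an LLM (or a human) in any language and check the result the same way.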

Micro

2025: The year in LLMs by Simon Willison

My tools.simonwillison.net collection of HTML+JavaScript tools was mostly built this way: I would have an idea for a small project, prompt Claude Artifacts or ChatGPT or (more recently) Claude Code via their respective iPhone apps, then either copy the result and paste it into GitHub's web editor or wait for a PR to be created that I could then review and merge in Mobile Safari.

I have been doing this a lot this past year as well. Most of the site was done this way. First using Cursor, then Codex, and finally Claude.

Micro

2025 LLM Year in Review by Andrej Karpathy

LLMs are emerging as a new kind of intelligence, simultaneously a lot smarter than I expected and a lot dumber than I expected. In any case they are extremely useful and I don't think the industry has realized anywhere near 10% of their potential even at present capability. Meanwhile, there are so many ideas to try and conceptually the field feels wide open.

A nice read, if a little long. Perhaps that’s why I hadn’t gotten to it yet.

Micro

Your job is to deliver code you have proven to work by Simon Willison

A computer can never be held accountable. That's your job as the human in the loop.

Almost anyone can prompt an LLM to generate a thousand-line patch and submit it for code review. That's no longer valuable. What's valuable is contributing code that is proven to work.

I liked the way Simon put it: your job is to deliver code you have proven to work.

Micro

Three Years from GPT-3 to Gemini 3 by Ethan Mollick

Three years ago, we were impressed that a machine could write a poem about otters. Less than 1,000 days later, I am debating statistical methodology with an agent that built its own research environment. The era of the chatbot is turning into the era of the digital coworker. To be very clear, Gemini 3 isn’t perfect, and it still needs a manager who can guide and check it. But it suggests that “human in the loop” is evolving from “human who fixes AI mistakes” to “human who directs AI work.” And that may be the biggest change since the release of ChatGPT.

Google announced Gemini 3.0, which brings it closer to the state of the art relative to other models. They claim it’s better than the rest; in this field, that’s a little subjective.

It has given me an interesting headache, though. I was planning to get a yearly Claude subscription. I will test this out instead now.

Micro

DeepSeek may have found a new way to improve AI’s ability to remember by Caiwei Chen

Instead of storing words as tokens, its system packs written information into image form, almost as if it’s taking a picture of pages from a book. This allows the model to retain nearly the same information while using far fewer tokens, the researchers found.

It also stores older or less critical information as slightly blurrier images.

A picture is worth a thousand words after all.

Micro