Thoughts on AI


I have been thinking for quite a while now about AIs and how they might affect, or already are affecting, how I work. These past couple of weeks there have been a few announcements that put the topic more in the foreground. I wrote about them in the past two NordLetters:

  • #8 about OpenAI and Gemini's models, and
  • #9 about Microsoft launching Copilot + PCs

Additionally,

  • I came across this excellent podcast in Recommendo #411. It is a must-listen. It left me with some thoughts.
  • Then, I read this post after coming across it on HN. The essay is, again, an excellent read. It solidified some of the thoughts and experiences I've had with ChatGPT and others.

And so, here we are.


A little disclaimer before we begin. I have simplified things a lot here. That is by choice. You can and should first read/listen to the articles/podcasts above for more context and nuance.

And now that that's out of the way, let's begin.


/what are LLMs

LLMs are Large Language Models. Think ChatGPT, Gemini, Copilot (which is built on OpenAI's GPT models), Claude from Anthropic, etc.

They are AI models trained on the vast dataset of the internet. This allows them to predict, based on the words that came before, what is the likeliest next word. This might seem like intelligence. It might seem like you are having an intelligent conversation with the agent. But it is not.

Based on what was said earlier, they guess what should come next.

Most of the time they guess wrong.
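The idea, drastically simplified, looks something like this in Python. To be clear, the words and probabilities below are invented for illustration; real models work on tokens and billions of learned weights, not a lookup table.

```python
# A toy illustration (nothing like a real LLM): given the previous word,
# pick the likeliest next word from a hand-made probability table.
# All words and probabilities here are made up for the example.
next_word_probs = {
    "the": {"cat": 0.5, "dog": 0.3, "end": 0.2},
    "cat": {"sat": 0.6, "ran": 0.4},
    "sat": {"on": 0.9, "down": 0.1},
}

def guess_next(word):
    """Return the most likely next word, or None if we've never seen it."""
    candidates = next_word_probs.get(word)
    if not candidates:
        return None
    return max(candidates, key=candidates.get)

print(guess_next("the"))  # cat
print(guess_next("cat"))  # sat
```

That's the whole trick: a guess, weighted by what usually comes next. Nothing in there "knows" what a cat is.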


/how does training work

You take the entire corpus of knowledge available to you. You scrape the web. You scan the books, newspapers, whatever. You caption the videos. Then you take vast amounts of compute and train your models on all of it. This allows the model to form relationships between words. Finally, you put this model out into the world.

More on training here.
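A toy sketch of what "forming relationships between words" could mean, a million miles from real training (which adjusts billions of neural-network weights over terabytes of text): just count, in a tiny made-up corpus, which word tends to follow which.

```python
from collections import Counter, defaultdict

# A made-up mini "corpus". Real training data is terabytes of text.
corpus = "the cat sat on the mat . the dog sat on the rug ."

# "Training": count which word follows which.
pair_counts = defaultdict(Counter)
words = corpus.split()
for prev, nxt in zip(words, words[1:]):
    pair_counts[prev][nxt] += 1

# After "training", the model "knows" what tends to follow "sat":
print(pair_counts["sat"].most_common(1))  # [('on', 2)]
```

Scale that basic idea up by a dozen orders of magnitude, swap the counting for a neural network, and you are in the right neighbourhood.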


/how training does not work

LLMs have something called a context window. GPT-4, for example, can handle around 25,000 words of context. This also limits how much material you can give an LLM to summarise.

Whenever you are talking to an LLM, the entire history of the chat is sent to the LLM with every message, and the LLM responds based on that history. The context window defines how long you can hold the conversation for. Once you reach the limit, you need to start a new conversation.

LLMs are stateless. They don't remember anything. So once an LLM is released into the world, it cannot learn anything new.
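Here is a rough sketch of what that looks like from the application's side. `fake_model` is a made-up stand-in for a real LLM API, and the 25-word budget stands in for a real token limit; the point is that the model function itself keeps no state, so the app has to resend everything every turn.

```python
CONTEXT_LIMIT = 25  # words; a stand-in for a real token limit

def fake_model(history):
    """Stateless stand-in for an LLM API: its only 'memory' is what we pass in."""
    return f"(reply based on {len(history)} earlier messages)"

def chat_turn(history, user_message):
    history = history + [user_message]
    # If the whole conversation no longer fits the window, we're stuck:
    total_words = sum(len(m.split()) for m in history)
    if total_words > CONTEXT_LIMIT:
        raise RuntimeError("context window full -- start a new conversation")
    # The ENTIRE history goes to the model, every single turn.
    reply = fake_model(history)
    return history + [reply]

history = []
history = chat_turn(history, "hello there")
history = chat_turn(history, "what did I just say?")
# The model only "remembers" turn one because we sent it again in turn two.
```

Strip away the chat UI and that resend-everything loop is all the "memory" there is.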


/a bit of philosophy

The current generation of AIs/LLMs is based on neural networks. Neural networks mimic, in a simplified way, how the human brain works. You train them on some data.

Either you teach it the relationships in the data.

Or, it figures out the relationships in the data itself.

Then, when you ask it something, it predicts the outcome based on the model it has (is?).
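At the risk of over-simplifying even further, here is that whole loop (train on data, then predict) as a single artificial neuron in plain Python. It learns the logical OR of two inputs by nudging its weights to shrink the error. The big models do, very roughly, this same thing with billions of weights.

```python
import math

# Training data: the logical OR of two inputs.
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]

w1, w2, b = 0.0, 0.0, 0.0  # the entire "model": two weights and a bias

def predict(x1, x2):
    # Weighted sum squashed through a sigmoid, giving a value in (0, 1).
    return 1 / (1 + math.exp(-(w1 * x1 + w2 * x2 + b)))

for _ in range(5000):           # training: show examples, adjust weights
    for (x1, x2), target in data:
        error = predict(x1, x2) - target
        w1 -= 0.5 * error * x1  # move each weight a little,
        w2 -= 0.5 * error * x2  # in the direction that
        b  -= 0.5 * error       # shrinks the error

print(round(predict(0, 0)))  # 0
print(round(predict(1, 0)))  # 1
```

Nobody told the neuron what OR means. It only saw examples and adjusted itself until its predictions matched them.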

This has worked brilliantly in the case of language.

Language is a patently human construct. We created language. All language. So, these AI models can solve for human language.

But it turns out the same principles cannot be applied to systems that were not designed by the human mind. So these models cannot figure out how physical systems work, for example.


/how has it affected my work

I use Copilot at work. I use it mostly the way I used Google/Stack Overflow earlier: as a thing I can reference. Copilot, handily, shows a list of links from which it generated the output. Seldom has it given me anything I can use directly. You know, copy-paste. Most of the time I click on a link and read further at the source.

It feels better to use than Googling, because it understands complex queries and can give me a first draft to work from.

Additionally, it is very good at generating first drafts for emails.

But again, just first drafts. Scaffolding. And then you have to read and edit and do the work.


/to summarise

The dream is to have an assistant that knows everything but is customised for you.

I am not sure if the Transformer based models that are in use today will go all the way to AGI. The models have improved a lot since ChatGPT came out and took the industry by storm. But maybe there's a limit to how good these models can be. Maybe someone will invent something new. Or there will be a breakthrough somewhere.

At present, these models hallucinate a lot and lie with confidence. There needs to be a human present. Someone with experience and know-how. To guide the AI. To filter the output. To know what is right.

Let's see how long it takes to get to her.