Things We Learned about LLMs in 2024


Simon Willison has a detailed overview of major changes in large language models from 2024 that I took time to read today. I'm a skeptic, especially because of the copyright and environmental issues that come with creating and running these services at scale. Some items that jumped out:

The really impressive thing about DeepSeek v3 is the training cost. The model was trained on 2,788,000 H800 GPU hours at an estimated cost of $5,576,000. Llama 3.1 405B trained 30,840,000 GPU hours—11x that used by DeepSeek v3, for a model that benchmarks slightly worse.
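The arithmetic behind those quoted figures is easy to verify; a quick sketch in Python, using only the numbers from the quote above:

```python
# Figures quoted from Simon's post
deepseek_gpu_hours = 2_788_000   # H800 GPU hours for DeepSeek v3
deepseek_cost_usd = 5_576_000    # estimated training cost
llama_gpu_hours = 30_840_000     # GPU hours for Llama 3.1 405B

# Implied price per GPU hour, and the ratio between the two training runs
cost_per_gpu_hour = deepseek_cost_usd / deepseek_gpu_hours
ratio = llama_gpu_hours / deepseek_gpu_hours

print(f"Estimated cost per H800 GPU hour: ${cost_per_gpu_hour:.2f}")       # $2.00
print(f"Llama 3.1 405B used {ratio:.1f}x the GPU hours of DeepSeek v3")    # 11.1x
```

So the cost estimate works out to an even $2 per GPU hour, and the "11x" claim holds up.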

I don't pretend to understand the complexities of these models and the relationships they're trained to form, but the fact that powerful models can be trained for a reasonable amount (compared to OpenAI raising 6.6 billion dollars to do some of the same work) is interesting. Costs are down, which means electricity use is also going down, and that's good. Simon makes the same observation in his section on environmental impacts:

A welcome result of the increased efficiency of the models—both the hosted ones and the ones I can run locally—is that the energy usage and environmental impact of running a prompt has dropped enormously over the past couple of years.

An interesting point of comparison here could be the way railways rolled out around the world in the 1800s. Constructing these required enormous investments and had a massive environmental impact, and many of the lines that were built turned out to be unnecessary—sometimes multiple lines from different companies serving the exact same routes!

The resulting bubbles contributed to several financial crashes, see Wikipedia for Panic of 1873, Panic of 1893, Panic of 1901 and the UK’s Railway Mania. They left us with a lot of useful infrastructure and a great deal of bankruptcies and environmental damage.

I think the last paragraph is where I'm still stuck. Companies are plowing ahead to put AI in things we don't want it in, at the cost of A) an economic bubble and B) massive environmental damage. I'm seeing the economic impacts close to home, with datacenters being built at massive tax discounts that benefit the corporations at the expense of residents. There will be bills to pay, and right now it doesn't look like the corporations will be the ones paying them.

I'm not going to start using an LLM daily, but reading Simon over the last year has helped me think critically. I dabbled with self-hosted models, which was interesting but ultimately not worth the effort on my lower-end machine. Maybe that will change as systems become more optimized for general use.

If you have time, I would encourage you to go read the entire post.

[Original link]

Published: 2025-01-05 | Category: Links | Tags: llm, ai, simon willison
