Model intelligence is no longer the constraint for automation

(latentintent.substack.com)

42 points | by drivian 13 hours ago

9 comments

  • mrlongroots 2 hours ago
    I very much disagree. To attempt a proof by contradiction:

    Let us assume that the author's premise is correct, and LLMs are plenty powerful given the right context. Can an LLM recognize the context deficit and frame the right questions to ask?

    They cannot: LLMs have no ability to recognize when to stop and ask for directions. They routinely produce contradictions, fail simple tasks like counting the letters in a word, and so on. They cannot even reliably follow my "ok modify this text in canvas" vs. "leave canvas alone, provide suggestions in chat, apply an edit once approved" instructions.

    • themanmaran 4 minutes ago
      This depends on whether you mean LLMs in the sense of a single shot, or LLMs plus the software built around them. I think a lot of people conflate the two.

      In our application we use a multi-step check_knowledge_base workflow before and after each LLM request. Pretty much: make a separate LLM request to check the query against the existing context to see if more info is needed, and a second check after generation to see whether the output text exceeded its knowledge base.

      And the results are really good. Now coding agents in your example are definitely stepwise more complex, but the same guardrails can apply.
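
      A minimal sketch of that pre/post check pattern (the generic llm() helper and the function names are illustrative, not the actual implementation):

        def llm(prompt: str) -> str:
            """Stand-in for whatever chat-completion call the application makes."""
            raise NotImplementedError

        def answer_with_checks(query: str, knowledge_base: str) -> str:
            # Pre-check: a separate LLM request decides whether the existing
            # context is enough to answer, or whether more info is needed.
            verdict = llm(f"Context:\n{knowledge_base}\n\nQuestion:\n{query}\n\n"
                          "Is the context sufficient to answer? Reply YES or NO.")
            if "NO" in verdict.upper():
                return "Need more information before answering."

            answer = llm(f"Context:\n{knowledge_base}\n\nAnswer this:\n{query}")

            # Post-check: a second request verifies the answer stays inside the
            # knowledge base instead of going beyond it.
            grounded = llm(f"Context:\n{knowledge_base}\n\nAnswer:\n{answer}\n\n"
                           "Does the answer rely only on the context? Reply YES or NO.")
            return answer if "YES" in grounded.upper() else "Answer rejected: not grounded."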

    • bobbylarrybobby 1 hour ago
      Claude routinely stops and asks me clarifying questions before continuing, especially when given extended thinking or doing research.
    • beering 1 hour ago
      It feels crazy to keep arguing about whether LLMs can do this or that without mentioning the specific model. The post author only mentions the IMO gold-medal model, and your post could be about anything. Am I to believe that the two of you are talking about the same thing? This discussion is not useful if that’s not the case.
  • thorum 4 hours ago
    This article is insightful, but I blinked when I saw the headline “Reducing the human bottleneck” used without any apparent irony.

    At some point we should probably take a step back and ask “Why do we want to solve this problem?” Is a world where AI systems are highly intelligent tools, but humans are needed to manage the high level complexity of the real world… supposed to be a disappointing outcome?

    • fnordpiglet 1 hour ago
      Assuming you buy the idea of a post scarcity society and assuming we can separate our long ingrained notion that spending your existence in toil to survive is a moral imperative and not working is deserving of punishment if not death, I personally look forward to a time we can get off the hamster wheel. Most buttons that get pushed by people are buttons not worth spending your existence pushing. This includes an awful lot of “knowledge work,” which is often better paid but more insidious in that it requires not just your presence but capturing your entire attention and mind inside and outside work. I would also be hopeful that fertility rates would decline and there would simply be far fewer humans.

      In Asimov’s robot stories the Spacers are long-lived and low-population because robots do most everything. He presents this as a dead end that stops us from conquering the galaxy. This to me sounds like a feature, not a bug. I think human existence could be quite good with large-scale automation, fewer people, and less suffering due to the necessity for everyone to be employed.

      Note I recognize you’re not saying exactly the same thing as I’m saying. I think humans will never cede full executive control by choice at some level. But I suspect, sadly, power will be confined to those few who do get to manage the high level complexity of the real world.

      • nradov 1 hour ago
        We will never have a post scarcity society. Automation can make certain foodstuffs and manufactured goods somewhat cheaper but the things that people really want will always be in short supply, for example real estate in geographically favorable areas.
        • lll-o-lll 1 hour ago
          With a stable population, post scarcity is surely possible technically. Just invest resources into improving everything that already exists.

          I also agree that we will never have a post scarcity society; but this is more about humanity than technology.

          • parineum 53 minutes ago
            There will always be scarcity for goods whose value is derived from their scarcity.

            Maybe food won't be scarce (we are actually very close to that) and shelter may not be scarce but, even if you invent the replicator, there will still be things that are bespoke.

        • alanbernstein 57 minutes ago
          I have never understood "post scarcity" to mean the end of ALL scarcity, which is essentially impossible by definition.

          Relative to 500 years ago, we have already nearly achieved post-scarcity for a few types of items, like basic clothing.

          It seems this is yet another concept for which we need to adjust our understanding from a binary to a spectrum, as we find our society advancing along it, in at least some aspects.

        • GauntletWizard 43 minutes ago
          We can automate plenty in physiological needs, and in fact already have. There's plenty of food and housing for everyone, but a bunch of people will immediately destroy them if provided with such. I don't think "Dispose of a full house every 3 months" will ever be practical, but we might be able to "solve" physiological needs.

          Safety needs might be possible to solve. Totalitarian states with ubiquitous panopticons can leave you "safe" in a crime sense, and AI gaslighting and happy pills will make you "feel" safe.

          Love and belonging we have "Plenty" of already - If you're looking for your people, you can find them. Plenty aren't willing to look.

          But once you get up to Esteem, it all falls apart. Reputation and Respect are not scalable. There will always be a limited quantity of being "The Best" at anything, and many are not willing to be "The Best" within tight constraints; There's always competition. You can plausibly say that this category is inherently competitive. There's no respect without disrespect. There's no best if there's no second best, and second best is first loser. So long as humans interact with each other - So long as we're not each locked in our own private shards of reality - There will be competition, and there will be those that fall short.

          Self Actualization is almost irrelevant at this point. It falls into exactly the same trap as the above. You can simulate a reality where someone is always the best at whatever they decide to do, but I think it will inherently feel hollow. Agent Smith said it best: https://youtu.be/9Qs3GlNZMhY?t=23

      • djrj477dhsnv 54 minutes ago
        Do you really want to live in this "post scarcity" world? With no effort required to meet your needs and desires, what motivation will you have to do anything?

        Kaczynski's warnings seem more apt with every year that passes.

  • stephc_int13 27 minutes ago
    This is because we tend to use a human-centric reference to evaluate the difficulty of a task: playing chess at grandmaster level seems a lot harder than folding laundry, except that it is the opposite. This weird bias is well known as Moravec’s Paradox.

    Intelligence is the bottleneck, but not the kind of intelligence you need to solve puzzles.

  • threecheese 5 hours ago
    Author IMO correctly recognizes that access to context needs to scale (“latent intent”, which I love), but I’m not sure I’m convinced that current models will be effective even if given access to all the priors needed for a complex task. The ability to discriminate valuable from extraneous context will need to scale with the size of the available context; it will be pulling needles from haystacks that aren’t found by straightforward similarity. I think we will need to steer these things.
    • jondwillis 3 hours ago
      We’re already steering, both during training (e.g. RLHF for reasoning) and at test time (structured outputs, tool calls, agents…)
  • Kuinox 4 hours ago
    That's a specific model that was run for maths. GPT-5 and Gemini 2.5 still cannot compute an arbitrary-length sum of whole numbers without a calculator. I have a procedurally generated benchmark of basic operations; LLMs get better at it over time, but they still can't solve basic maths or logic problems.

    BTW I'm open to selling it, my email is on my hn profile.

    • HappMacDonald 3 hours ago
      Have you ever seen what these arbitrary length whole numbers look like once they are tokenized? They don't break down to one-digit-per-token, and the same long number has no guarantee of breaking down into tokens the same way every time it is encountered.

      But the algorithms they teach humans in school to do long-hand arithmetic (which are liable to be the only algorithms demonstrated in the training data) require a single unique numeral for every digit.

      This is the same source as the problem of counting "R"'s in "Strawberry".
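
      You can see this with OpenAI's open-source tiktoken tokenizer, for example (other models' tokenizers chunk numbers differently, but the point is the same):

        import tiktoken

        enc = tiktoken.get_encoding("cl100k_base")
        for text in ("1234567890123456789", "sum of 1234567890123456789"):
            tokens = enc.encode(text)
            print(text, "->", [enc.decode([t]) for t in tokens])
        # The long number comes back as multi-digit chunks ("123", "456", ...),
        # not one numeral per digit, and the chunk boundaries can shift
        # depending on the surrounding text.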

      • Kuinox 3 hours ago
        That was the initial thinking of everyone I explained this to, and it was also my speculation, but when you look at its reasoning where it makes the mistake, it correctly extracts the digits out of the input tokens. As I say in another comment, most of the mistakes happen when it recopies the answer it calculated from the summation table. You can rule out tokenization issues in extracting the answer by making it output an array of the answer's digits; it will still fail at simply recopying the correct digit.
    • gjm11 3 hours ago
      > GPT-5 and Gemini 2.5 still cannot compute an arbitrary-length sum of whole numbers without a calculator.

      Neither can many humans, including some very smart ones. Even those who can will usually choose to use a calculator (or spreadsheet or whatever) rather than doing the arithmetic themselves.

      • mathiaspoint 3 hours ago
        Right but most (competent) humans will reliably use a calculator. It's difficult to get these to reliably make lots of tool calls like that.
        • Kuinox 3 hours ago
          I do think that competent humans can solve any arbitrary sum of 2 whole numbers given pen, paper, and time. LLMs can't do that.
          • rileymat2 28 minutes ago
            That’s interesting, you added a tool. You did not just leave it to the human alone.
      • simoncion 42 minutes ago
        > Neither can many humans...

        1) GPT-5 is advertised as "PhD-level intelligence". So, I take OpenAI (and anyone else who advertises their bots with language like this) at their word about the bot's capabilities and constrain the set of humans I use for comparison to those who also have PhD-level intelligence.

        2) Any human who has been introduced to long addition will absolutely be able to compute the sum of two whole numbers of arbitrary length. You may have to provide them a sufficiently strong incentive to actually do it long-hand, but they absolutely are capable because the method is not difficult. I'm fairly certain that most adult humans [0] (regardless of whether or not they have PhD-level intelligence) find the method to be trivial, if tedious.

        [0] And many human children!
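
        The method in question is just schoolbook digit-by-digit addition with a carry; it fits in a few lines of Python:

          def long_add(a: str, b: str) -> str:
              """Schoolbook addition of two arbitrary-length whole numbers given as digit strings."""
              a, b = a.zfill(len(b)), b.zfill(len(a))      # pad to equal length
              carry, digits = 0, []
              for x, y in zip(reversed(a), reversed(b)):   # rightmost column first
                  carry, d = divmod(int(x) + int(y) + carry, 10)
                  digits.append(str(d))
              if carry:
                  digits.append(str(carry))
              return "".join(reversed(digits))

          assert long_add("987654321987654321", "123456789123456789") == "1111111111111111110"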

    • bt1a 3 hours ago
      i'd wager your benchmark problems require cumbersome arithmetic or are poorly worded / inadequately described. or, you're mislabeling them as basic math and logic (a domain within which LLMs have proven their strengths!)

      i only call this out because you're selling it and don't hypothesize* on why they fail your simple problems. i suppose an easily aced bench wouldn't be very marketable

      • Kuinox 3 hours ago
        This is a simple sum of 2 whole numbers; the numbers are simply big.

        Most of the time they build a correct summation table but fail to correctly copy the sum into the final result. That is not a tokenisation problem (you can change the output format to make sure of it). I have a separate benchmark that tests specifically this: when the input is too large, the LLMs fail to accurately copy the correct token. I suppose the positional embeddings are not perfectly learned and that sometimes causes a mistake.

        The prompt is quite short, it uses structured output, and I can generate a nice graph of the % of good responses across the difficulty of the question (which is just the total digit count of the input numbers).

        LLMs have a 100% success rate on these sums until they reach a frontier; past that, their accuracy collapses at varying speeds depending on the model.
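
        A sketch of how such a benchmark can be generated procedurally (this only mimics the setup described above, not the actual benchmark; ask_model is a stub for whatever API is being tested):

          import json, random

          def make_case(total_digits: int) -> dict:
              """One item: two whole numbers whose digit counts add up to total_digits."""
              d1 = random.randint(1, total_digits - 1)
              a = random.randint(10 ** (d1 - 1), 10 ** d1 - 1)
              b = random.randint(10 ** (total_digits - d1 - 1), 10 ** (total_digits - d1) - 1)
              return {"prompt": f'Compute {a} + {b}. Reply as JSON: {{"digits": [...]}}',
                      "expected": [int(c) for c in str(a + b)]}

          def accuracy_by_difficulty(ask_model, difficulties=range(4, 80, 4), n=20) -> dict:
              """% of exact answers per difficulty (total digit count of the two inputs)."""
              results = {}
              for d in difficulties:
                  cases = [make_case(d) for _ in range(n)]
                  ok = sum(json.loads(ask_model(c["prompt"]))["digits"] == c["expected"]
                           for c in cases)
                  results[d] = 100 * ok / n
              return results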

        • energy123 1 hour ago
          Have you tried greedy decoding (temp 0) in aistudio?

          The temp 0.7-1.0 defaults are not designed for reconstructing context with perfect accuracy.
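
          For example, with the google-generativeai Python SDK (the model name here is just an example):

            import google.generativeai as genai

            genai.configure(api_key="...")
            model = genai.GenerativeModel("gemini-2.5-pro")
            resp = model.generate_content(
                "123456789123456789 + 987654321987654321 = ?",
                generation_config={"temperature": 0.0},  # temp 0, per the suggestion above
            )
            print(resp.text)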

  • neom 5 hours ago
    Same same human problems. Regardless of their inherent intelligence... humans perform well only when given decent context and clear specifications/data. If you place a brilliant executive into a scenario without meaningful context... an unfamiliar board meeting where they have no idea of the company’s history, prior strategic discussions, current issues, personnel dynamics, expectations, etc., they will surely struggle just as a model does. They may still manage something reasonably insightful, leveraging general priors, common sense, and inferential reasoning... but their performance will never match their potential had they been fully informed of all context and given clear data/objectives. I think context is the primary primitive property of intelligent systems in general?
    • mrlongroots 2 hours ago
      > they will surely struggle just as a model does

      A human will struggle, but they will recognize the things they need to know, and seek out people who may have the relevant information. If asked "how are things going" they will reliably be able to say "badly, I don't have anything I need".

    • getnormality 2 hours ago
      This comparison may make sense on short-horizon tasks for which there is no possibility of preparation. Given some weeks to prepare, a good human executive will get the context, while today's best AI systems will completely fail to do so.
    • simoncion 30 minutes ago
      > I think context is the primary primitive property of intelligent systems in general?

      What do you mean by 'context' in this context? As written, I believe that I could knock down your claim by pointing out that there exist humans who would do catastrophically poorly at a task that other humans would excel at, even if both humans have been fully informed of all of the same context.

      • simoncion 17 minutes ago
        To clarify what I'm thinking here by analogy...

        Imagine that someone said:

        > I think wood is the primary primitive property of sawmills in general.

        An obvious observation would be that it is dreadfully difficult to produce the expected product of a sawmill without tools to cut or sand or otherwise shape the wood into the desired shapes.

        One might also notice that while a sawmill with no wood to work on will not produce any output, a sawmill with wood but without woodworking tools is vanishingly unlikely to produce any output... and any it does manage to produce is not going to be good enough for any real industrial purpose.

  • etler 3 hours ago
    I think the framing of these models as being "intelligent" is not the right way to go. They've gotten better at recall and association.

    They can recall prior reasoning from text they are trained on which allows them to handle complex tasks that have been solved before, but when working on complex, novel, or nuanced tasks there is no high quality relevant training data to recall.

    Intelligence has always been a fraught word to define and I don't think what LLMs do is the right attribute for defining it.

    I agree with a good deal of the article, but because it keeps using loaded words like "intelligent" and "smarter", it has a hard time explaining what's missing.

  • jefftitan 4 hours ago
    Providing more context is difficult for a number of reasons. If you do it RAG style you need to know which context is relevant. LLMs are notorious for knowing that a factor is relevant if directly asked about that factor, but not bringing it up if it's implicit. In business things like people's feelings on things, historical business dealings, relevance to trending news can all be factors. If you fine tune... well... there have been articles recently about fine tuning on specific domains causing overall misalignment. The more you fine tune, the riskier.
  • miller24 4 hours ago
    It 100% is still intelligence. GPT-5 with Thinking still can't win at tic-tac-toe.
    • storus 2 hours ago
      What if it's the desired outcome? Become more human-like (i.e. dumb) to make us feel better about ourselves? NI beats AI again!
      • simoncion 26 minutes ago
        > What if it's the desired outcome?

        To be able to reason about the rules of a game so trivial that it has been solved for ages, so that it can figure out enough strategy to always bring the game to at least a draw (against an opponent playing not to lose), or to a win (against someone who leaves the bot an opening to win), as mentioned in [0] and probably a squillion other places?

        Duh?

        [0] <https://news.ycombinator.com/item?id=44919138>

    • dismalaf 4 hours ago
      Tic-tac-toe is solved and a draw can be forced 100% of the time...
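
      A few lines of minimax are enough to verify that; with perfect play on both sides, the value of the empty board is a draw:

        from functools import lru_cache

        LINES = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),(1,4,7),(2,5,8),(0,4,8),(2,4,6)]

        def winner(board):
            for a, b, c in LINES:
                if board[a] != "." and board[a] == board[b] == board[c]:
                    return board[a]
            return None

        @lru_cache(maxsize=None)
        def value(board, player):
            """+1 if X can force a win, -1 if O can, 0 if best play is a draw."""
            w = winner(board)
            if w:
                return 1 if w == "X" else -1
            if "." not in board:
                return 0
            nxt = "O" if player == "X" else "X"
            scores = [value(board[:i] + player + board[i+1:], nxt)
                      for i, c in enumerate(board) if c == "."]
            return max(scores) if player == "X" else min(scores)

        print(value("." * 9, "X"))  # prints 0: a draw can always be forced
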
      • miller24 3 hours ago
        That's exactly why it's so crazy that GPT-5 with Thinking still loses...
        • dismalaf 3 hours ago
          Ah, your first comment said "can't win". Which is different than "always loses".
          • miller24 2 hours ago
            Ah okay, well it will still lose some of the time, which is surprising. And it will lose in surprising ways, e.g., thinking for 14 seconds and then making an extremely basic mistake like not seeing it already has two in a row and could just win.
      • HappMacDonald 3 hours ago
        .. and you can "program" a neural network — so simple it can be implemented by boxes full of marbles and simple rules about how to interact with the boxes — to learn by playing tictactoe until it always plays perfect games. This is frequently chosen as a lesson in how neural network training even works.

        But I have a different challenge for you: train a human to play tictactoe, but never allow them to see the game visually, even in examples. You have to train them to play only by spoken words.

        Point being that tictactoe is a visual game and when you're only teaching a model to learn from the vast sea of stream-of-tokens (similar to stream-of-phonemes) language, visual games like this aren't going to be well covered in the training set, nor is it going to be easy to generalize to playing them.
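
        A toy sketch of that matchbox ("MENACE"-style) learner, here trained against a random opponent rather than physical boxes of marbles:

          import random

          LINES = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),(1,4,7),(2,5,8),(0,4,8),(2,4,6)]

          def winner(b):
              for i, j, k in LINES:
                  if b[i] != "." and b[i] == b[j] == b[k]:
                      return b[i]
              return None

          boxes = {}  # one "matchbox" of weighted moves (marbles) per board state X has seen

          def pick_move(board):
              box = boxes.setdefault(board, {i: 3 for i, c in enumerate(board) if c == "."})
              moves = list(box)
              return box, random.choices(moves, weights=[box[m] for m in moves])[0]

          def play_one():
              board, history, player = "." * 9, [], "X"
              while True:
                  if player == "X":
                      box, move = pick_move(board)
                      history.append((box, move))
                  else:
                      move = random.choice([i for i, c in enumerate(board) if c == "."])
                  board = board[:move] + player + board[move + 1:]
                  w = winner(board)
                  if w or "." not in board:
                      return history, w
                  player = "O" if player == "X" else "X"

          for _ in range(50_000):
              history, w = play_one()
              for box, move in history:              # reinforce every move X made this game
                  if w == "X":
                      box[move] += 3                 # win: add marbles
                  elif w == "O":
                      box[move] = max(1, box[move] - 1)  # loss: take a marble away
                  else:
                      box[move] += 1                 # draw: small reward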

        • miller24 3 hours ago
          Well whatever your story is, I know with near certainty that no amount of scaffolding is going to get you from an LLM that can't figure out tic-tac-toe (but will confidently make bad moves) to something that can replace a human in an economically important job.