This is exactly the argument in Brooks' No Silver Bullet. I still believe that it holds. However, my observation is that many people don't really need that level of details. When one prompts an AI to "write me a to-do list app", what they really mean is that "write me a to-do list app that is better that I have imagined so far", which does not really require detailed spec.
> When one prompts an AI to "write me a to-do list app", what they really mean is that "write me a to-do list app that is better that I have imagined so far", which does not really require detailed spec.
If someone was making a serious request for a to-do list app, they presumably want it to do something different from or better than the dozens of to-do list apps that are already out there. Which would require them to somehow explain what that something was, assuming it's even possible.
The cognitive dissonance comes from the tension between the-spec-as-management-artifact vs the-spec-as-engineering-artifact. Author is right that advocates are selling the first but second is the only one which works.
For a manager, the spec exists in order to create a delgation ticket, something you assign to someone and done.
But for a builder, it exists as a thinking tool that evolves with the code to sharpen the understanding/thinking.
I also think, that some builders are being fooled into thinking like managers because ease, but they figure it out pretty quickly.
I think it's only a matter of time before people start trying to optimize model performance and token usage by creating their own more technical dialect of English (LLMSpeak or something). It will reduce both ambiguity and token usage by using a highly compressed vocabulary, where very precise concepts are packed into single words (monads are just monoids in the category of endofunctors, what's the problem?). Grammatically, expect things like the Oxford comma to emerge that reduce ambiguity and rounds of back-and-forth clarification with the agent.
The uninitiated can continue trying to clumsily refer to the same concepts, but with 100x the tokens, as they lack the same level of precision in their prompting. Anyone wanting to maximize their LLM productivity will start speaking in this unambiguous, highly information-dense dialect that optimizes their token usage and LLM spend...
Unless you're training your own model, wouldn't you have to send this dialect in your context all the time? Since the model is trained on all the human language text of the internet, not on your specialized one? At which point you need to use human language to define it anyway? So perhaps you could express certain things with less ambiguity once you define that, but it seems like your token usage will have to carry around that spec.
Codex already has such a language. The specs it’s been writing for me are full of “dedupe”, “catch-up”, and I often need to feedback that it should use more verbose language. Some of that has been creeping into my lingo already. A colleague of mine suddenly says the word “today” all the time, and I suspect that’s because he uses Claude a lot. Today, as in, current state of the code.
It was mentioned somewhere else on hn today, but why do I care about token usage? I prompt AI day and night for coding and other stuff via claude code max 200 and mistral; haven't had issues for many months now.
E.g., those SKILL.md files that are tens of kilobytes long, as if being exhaustively verbose and rambling will somehow make the LLM smarter. (It won't, it will just dilute the context with irrelevant stuff.)
The reference spec is a terrible spec. There are also many alternative definitions as to what a spec actually is. You show me the 70K starred'd GitHub spec library and all the jargon that comes with it + 30 different commands it installs. I'll say, "Yeah, that's probably not the way." However, owning a process of constructing proper context steering, creating a tactical plan, and saying "go" is a different thing.
There are two ways to do AI today.
1. You dog walk it, you throw a ball exactly where you want it, knowing the AI will fetch and come back. (You own the code. You tell the AI to go do very explicit tasks. You can rely on this)
2. You whisper to the dog to go walk itself and cross your fingers hoping it comes back; if you can do this for one dog, you can do this for ten and you can quickly become the greatest dog walker overnight. (You're trying different spec libraries. You're trying GasTown. You're failing, you're faltering); however, you are continuously optimizing every day. - a consistent narrative is leaning into putting thoughtful effort into what's about to happen (some call this spec-driven development, others are anchoring onto "intent-driven", some just "plan mode").
This Deepmind paper[1] has aged well. Paradoxically, the more capable AI gets, the more important specification becomes (or costly lack of it becomes), and the more time you spend planning and iterating on intent before you let an agent act. Tokens and code might be cheap, but we are moving closer and closer to where we let AI operate overnight or where we want agents to operate within our actual environments in real time. The tokens or actions in these instances, have higher costs. It's idealistic to think AI will be omniscient and therefore able to turn a five-word prompt into a rocket ship.
Thus, an increasingly important question around this is still worth finding answers to: can a clear statement of intent, combined with growing agent capability and tighter feedback loops, outperform the traditional hands-on coding cycle? It's easy to hypothesize yes, given model capabilities - I don't think it's quite yet clear what exactly this looks like.
Natural language is fluid and ambiguous while code is rigid and deterministic. Spec-driven development appears to be the best of both worlds. But really, it is the worst of both. LLMs are language models - their breakthrough capability is handling natural language. Code is meant to be unambiguous and deterministic. A spec is neither fluid nor deterministic.
A corollary of this statement is that code without a spec is not code. No /s, I think that is true - code without a spec certainly does something, but it is, by the absence of a detailed spec, undefined behavior.
I am developing my own programming language, but I have no specification written for it. When people tell me that I need a specification, I reply that I already have one - the source code of the language compiler.
You are not wrong. But, they are not wrong either.
I feel like if you’re designing a language, the activity of producing the spec, which involves the grammar etc., would allow you to design unencumbered by whether your design is easy to implement. Or whether it’s a good fit for the language you are implementing the compiler with.
The OP also correctly identifies that thoughtful design takes a back seat in favor of action when we start writing the code.
Such amazing writing. And clear articulation of what I’ve been struggling to put into words - almost having to endure a mental mute state. I keep thinking it’s obvious, but it’s not, and this article explains it very elegantly.
I also enjoyed the writing style so much that I felt bad for myself for not getting to read this kind of writing enough. We are drowning in slop. We all deserve better!
I tried myself to make a language over an agent's prompt. This programing language is interpreted in real time, and parts of it are deterministic and parts are processed by an LLM. It's possible, but I think that it's hard to code anything in such a language. This is because when we think of code we make associations that the LLM doesn't make and we handle data that the LLM might ignore entirely. Worse, the LLM understands certain words differently than us and the LLM has limited expressions because of it's limits in true reasoning (LLMs can only express a limited number of ideas, thus a limited number of correct outputs).
I agree to this, with the caveat that a standard is not a spec. E.g.: The C or C++ standards, they're somewhat detailed, but even if they were to be a lot more detailed, becoming 'code' would defeat the purpose (if 'code' means a deterministic turing machine?), because it won't allow for logic that is dependent on the implementer ("implementation defined behavior" and "undefined behavior" in C parlance). whereas a specification's whole point is to enforce conformance of implementations to specific parameters.
Even programs are just specifications by that standard. When you write a program in C, you are describing what an abstract C machine can do. When the C compiler turns that into a program it is free to do so in any way that is consistent with the abstract C machine.
If you look at what implementions modern compilers actually come up with, they often look quite different from what you would expect from the source code
Compilers are converters. There’s the abstract machine specified by the standard and there’s the real machine where the program will run (and there can be some layer in between). So compilers takes your program (which assumes the abstract machine) and builds the link between the abstract and the real.
If your program was a DSL for steering, the abstract machine will be the idea of steering wheel, while the machine could be a car without one. So a compiler would build the steering wheel, optionally adding power steering (optimization), and then tack the apparatus to steer for the given route.
IMHO, LLMs are better at Python and SQL than Haskell because Python and SQL syntax mirrors more aspects of human language. Whereas Haskell syntax reads more like a math equation. These are Large _Language_ Models so naturally intelligence learned from non-code sources transfers better to more human like programming languages. Math equations assume the reader has context not included in the written down part for what the symbols mean.
They are heavily post-trained on code and math these days. I don‘t think we can infer that much about their behavior from just the pre-training dataset anymore
I suspect your probably right, but just for completeness, one could also make the argument that LLMs are better at writing Haskell because they are overfit to natural language and Haskell would avoid a lot of the overfit spaces and thus would generalize better. In other words, less baggage.
I would guess they’re better at python and SQL than Haskell because the available training data for python and SQL is orders of magnitude more than Haskell.
Is that true though? If I define a category or range in formal language, I’m still ambiguous on the exact value. Dealing with randomness is even worse (eg input in random order), and can’t be prevented in real world programs.
I have a lot of fun making requirements documents for Claude. I use an iterative process until Claude can not suggest any more improvements or clarifications.
I agree with the overall structure of the argument but I like to think of specifications like polynomial equations defining some set of zeroes. Specifications are not really code but a good specification will cut out a definable subset of expected behaviors that can then be further refined with an executable implementation. For example, if a specification calls for a lock-free queue then there are any number of potential implementations w/ different trade-offs that I would not expect to be in the specification.
I kind of feel like the specification would call for an idealized lock free queue. Whereas the code would generate a good enough approximation of one that can be run on real hardware.
To invert your polynomial analogy, the specification might call for a sine wave, your code will generate a Taylor series approximation that is computable.
This articles ignores that AI agents have intelligence which means that they can figure out unspecified parts of the spec on their own. There is a lot of the design of software that I don't care about and I'm fine letting AI pick a reasonable approach.
These algorithms don't have intelligence, they just regurgitate human intelligence that was in their training data. That also goes the other way - they can't produce intelligence that wasn't represented in their training input.
Exactly. The real speed up from AI will come when we can under specify a system and the AI uses its intelligence to make good choices on the parts we left out. If you have to spec something out with zero ambiguity you’re basically just coding in English. I suspect current ideas around formal/detailed spec driven development will be abandoned in a couple years when models are significantly better.
This is humans have traditionally done with greenfield systems. No choices have been made yet, they're all cheap decisions.
The difficulty has always arisen when the lines of code pile up AND users start requesting other things AND it is important not to break the "unintended behavior" parts of the system that arose from those initial decisions.
It would take either a sea-change in how agents work (think absorbing the whole codebase in the context window and understanding it at the level required to anticipate any surprising edge case consequences of a change, instead of doing think-search-read-think-search-read loops) or several more orders of magnitude of speed (to exhaustively chase down the huge number of combinations of logic paths+state that systems end up playing with) to get around that problem.
So yeah, hobby projects are a million times easier, as is bootstrapping larger projects. But for business works, deterministic behaviors and consistent specs are important.
> in a couple years when models are significantly better.
They aren't significantly better now than a couple of years ago. So it doesn't seem likely they will be significantly better in a couple of years than they are now.
For now I would be happy if it just explored the problem space and identify the choices to be made and filtered down to the non-obvious and/or more opinionated ones. Bundle these together and ask the me all at once and then it is off to the races.
Exactly, I find that type of article too dismissive. Like, we know we don't have to write the full syntax of a loop when we write the spec "find the object in the list", and we might even not write this spec because that part is obvious to any human (hence to an LLM too)
This is exactly the argument in Brooks' No Silver Bullet. I still believe that it holds. However, my observation is that many people don't really need that level of details. When one prompts an AI to "write me a to-do list app", what they really mean is that "write me a to-do list app that is better that I have imagined so far", which does not really require detailed spec.
If someone was making a serious request for a to-do list app, they presumably want it to do something different from or better than the dozens of to-do list apps that are already out there. Which would require them to somehow explain what that something was, assuming it's even possible.
I guess it depends on whether or not we want to make money, or otherwise, compete against others.
For a manager, the spec exists in order to create a delgation ticket, something you assign to someone and done. But for a builder, it exists as a thinking tool that evolves with the code to sharpen the understanding/thinking.
I also think, that some builders are being fooled into thinking like managers because ease, but they figure it out pretty quickly.
The uninitiated can continue trying to clumsily refer to the same concepts, but with 100x the tokens, as they lack the same level of precision in their prompting. Anyone wanting to maximize their LLM productivity will start speaking in this unambiguous, highly information-dense dialect that optimizes their token usage and LLM spend...
[1] https://en.wikipedia.org/wiki/Lojban
[2] Someone speaking it: https://www.youtube.com/watch?v=lxQjwbUiM9w
Context pollution is a bigger problem.
E.g., those SKILL.md files that are tens of kilobytes long, as if being exhaustively verbose and rambling will somehow make the LLM smarter. (It won't, it will just dilute the context with irrelevant stuff.)
Ah, the Lisp curse. Here we go again.
coincidently, the 80s AI bubble crashed partly because Lisp dialetcts aren't inter-changable.
But some random in-house DSL? Doubt it.
There are two ways to do AI today.
1. You dog walk it, you throw a ball exactly where you want it, knowing the AI will fetch and come back. (You own the code. You tell the AI to go do very explicit tasks. You can rely on this)
2. You whisper to the dog to go walk itself and cross your fingers hoping it comes back; if you can do this for one dog, you can do this for ten and you can quickly become the greatest dog walker overnight. (You're trying different spec libraries. You're trying GasTown. You're failing, you're faltering); however, you are continuously optimizing every day. - a consistent narrative is leaning into putting thoughtful effort into what's about to happen (some call this spec-driven development, others are anchoring onto "intent-driven", some just "plan mode").
This Deepmind paper[1] has aged well. Paradoxically, the more capable AI gets, the more important specification becomes (or costly lack of it becomes), and the more time you spend planning and iterating on intent before you let an agent act. Tokens and code might be cheap, but we are moving closer and closer to where we let AI operate overnight or where we want agents to operate within our actual environments in real time. The tokens or actions in these instances, have higher costs. It's idealistic to think AI will be omniscient and therefore able to turn a five-word prompt into a rocket ship.
Thus, an increasingly important question around this is still worth finding answers to: can a clear statement of intent, combined with growing agent capability and tighter feedback loops, outperform the traditional hands-on coding cycle? It's easy to hypothesize yes, given model capabilities - I don't think it's quite yet clear what exactly this looks like.
[1] https://deepmind.google/blog/specification-gaming-the-flip-s...
I feel like if you’re designing a language, the activity of producing the spec, which involves the grammar etc., would allow you to design unencumbered by whether your design is easy to implement. Or whether it’s a good fit for the language you are implementing the compiler with.
The OP also correctly identifies that thoughtful design takes a back seat in favor of action when we start writing the code.
If you want a specification from source code, you need to reverse engineer it.
That is the great insight I can offer
I also enjoyed the writing style so much that I felt bad for myself for not getting to read this kind of writing enough. We are drowning in slop. We all deserve better!
If you look at what implementions modern compilers actually come up with, they often look quite different from what you would expect from the source code
If your program was a DSL for steering, the abstract machine will be the idea of steering wheel, while the machine could be a car without one. So a compiler would build the steering wheel, optionally adding power steering (optimization), and then tack the apparatus to steer for the given route.
Simply put: Formal language = No ambiguities.
Once you remove all ambiguous information from an informal spec, that, whatever remains, automatically becomes a formal description.
"Is this it?" "NOPE"
https://www.youtube.com/watch?v=TYM4QKMg12o
To invert your polynomial analogy, the specification might call for a sine wave, your code will generate a Taylor series approximation that is computable.
The difficulty has always arisen when the lines of code pile up AND users start requesting other things AND it is important not to break the "unintended behavior" parts of the system that arose from those initial decisions.
It would take either a sea-change in how agents work (think absorbing the whole codebase in the context window and understanding it at the level required to anticipate any surprising edge case consequences of a change, instead of doing think-search-read-think-search-read loops) or several more orders of magnitude of speed (to exhaustively chase down the huge number of combinations of logic paths+state that systems end up playing with) to get around that problem.
So yeah, hobby projects are a million times easier, as is bootstrapping larger projects. But for business works, deterministic behaviors and consistent specs are important.
They aren't significantly better now than a couple of years ago. So it doesn't seem likely they will be significantly better in a couple of years than they are now.