Uber's $1,500/month AI limit is a useful signal for AI tool pricing

(simonwillison.net)

280 points | by pdyc 9 hours ago

41 comments

ValentineC 3 hours ago
> I noted that my own token usage comes to about $1,000/month against each of Anthropic and OpenAI - which currently costs me just $100 per provider thanks to their generous subsidized plans for individual subscribers.
Do we know that AI providers are going to keep these per-token prices, or eventually lower them because of competition from China?
Many lower-budget individuals are now moving to Chinese open-weight models like DeepSeek. I wonder if China's really subsidising the providers, or if inferencing costs are actually much lower, and Anthropic/OpenAI are just making sure no money's left on the table for their eventual IPOs.
- vidarh 45 minutes ago
  We can tell that the inferencing costs for many of these models are low enough that these models are being sold close to real costs on the basis that many of them are open weight and available from third party providers who have no incentive to subsidize them.
  I think the frontier labs will need to drop their high per-token prices at least for their low and mid-level models for the reason that several Chinese models (at least Qwen, DeepSeek, Kimi and GLM) are "close enough" that with the right harness they are cost effective alternatives.
  They won't necessarily need to close the gap, at least not yet, because these models won't necessarily compete at the same token counts. E.g. at least some of them need to do far more work to solve the same problems.
  But, yeah, the prices will come down one way or the other.
  At the same time, even the subscriptions for the cheap Chinese models are probably subsidised, and those subscriptions are likely to get less generous over time.
- dgellow 2 hours ago
  One aspect Paul Kedrosky mentioned recently is the concept of „duration mismatch“. The price per token goes down over time (either because the AI vendor reduces it under competitive pressure, or because customers are incentivized to use older, cheaper models). But datacenters are financed through debt, on the assumption that their revenue increases over time. Quoting him: „[AI vendors are] paying for a fixed cost with a depreciating commodity“[0].
  So you have on one end the token revenue trending down, on the other end the training cost going up for the next frontier models, and you need to pay back your 10y debt.
  0: https://youtu.be/wGZboZcSGDY?is=64GuKyqBh_4aSjTE
  - missedthecue 1 hour ago
    "So you have on one end the token revenue trending down, on the other end the training cost going up for the next frontier models, and you need to pay back your 10y debt."
    Not necessarily, the bondholders could simply take a massive haircut and lose shitloads of money. On the topic of bubbles and exuberance, Jeff Bezos made the salient point that there was a massively over-invested biotech boom in the 1990s and tons of sophisticated investors ended up losing lots of money. But humanity still kept the medical advancements made by the boom. Stocks going down didn't un-research drugs, and it won't un-research new GPUs or un-build datacenters.
    - biztos 49 minutes ago
      In order to not un-build the data centers, they at least have to make more than it costs to operate them, and also not have some attractive liquidation value (the land, maybe).
      I could imagine something like “inference is done at home or in China, that’s the price to beat” and it’s not worth keeping all those GPUs cool out in Nevada.
      - missedthecue 45 minutes ago
        But the parent comment was that one of the bigger costs in these data centers was the interest expense on the borrowed money. A restructuring removes or heavily reduces that amount.
        The fiber laid during the dotcom bubble never paid back the investors or lenders, but it's still profitably connecting customers all these years later.
  - geysersam 34 minutes ago
    Current AI datacenter/model development investment rate is roughly 1T/year. That's a lot. But the US economy is 33T/year. So the investment pays back (roughly) over ten years if, each year, the AI investments increase overall productivity by 0.6%, assuming the AI companies can capture half of the value of that productivity gain.
    > „[AI vendors are] paying for a fixed cost with a depreciating commodity“
    That's just a confusing way to say you don't think future models will be worth the development costs. Because if future models are significantly better, why would the price of tokens to access those models depreciate?
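A rough sketch of the payback arithmetic in the comment above. It assumes one reading of the claim: one year's $1T of investment lifts a $33T economy by 0.6% permanently, and the AI companies capture half of that gain every year afterwards. The function name and that interpretation are assumptions, not something the commenter spelled out.

```python
def years_to_payback(investment_t, gdp_t, productivity_gain, capture_share):
    """Years for captured productivity gains to repay one year's investment."""
    annual_value_t = gdp_t * productivity_gain          # extra output per year, in $T
    annual_captured_t = annual_value_t * capture_share  # share kept by AI companies
    return investment_t / annual_captured_t


years = years_to_payback(
    investment_t=1.0,         # $1T invested in one year
    gdp_t=33.0,               # US economy, $T per year
    productivity_gain=0.006,  # 0.6% productivity lift
    capture_share=0.5,        # half of the gain is captured
)
print(f"{years:.1f} years")
```

Under those assumptions the result lands at roughly ten years, which matches the comment. Halving the capture share doubles the payback period.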
    - jiggawatts 25 minutes ago
      The $1T number seems more promises than reality, which is closer to the $300B to $500B level. Still a big number, but between a third and a half of the value used in the popular media.
  - bijowo1676 2 hours ago
    Do GPU chips really depreciate physically? There are no moving parts, I don't think memory chips or GPU chips deteriorate naturally.
    I think it's only accounting depreciation.
    I have been using my laptop for a decade, what is stopping datacenters from using the purchased GPU chips for a decade?
    - bgnn 1 hour ago
      Chips age and fail with age. You can look up hot-carrier injection, bias-temperature instability and electromigration, as they are the main aging mechanisms. All of these are a linear function of time but an exponential function of temperature. The 90-100C these chips run at is really tough, so they are likely to fail at a rate of a couple of percent up to 10% within 2-3 years, depending on the margins they have in the design.
      Solder joints are also notorious for failing at a high rate.
      - consp 1 hour ago
        If those don't go the caps and coils will eventually.
    - Aurornis 2 hours ago
      There are data centers that use and rent out 10 year old server GPUs.
      They can't run larger modern models. They can't run smaller models as fast as newer servers. So their remaining market is applications where customers are okay with older, smaller models and slower performance.
      They have to price the service lower than competitors due to the lower performance. The older GPUs are less efficient so it costs them more to keep them running. They're paid off, but they're taking up valuable power, space, and cooling in a data center.
      Eventually there is a tipping point where it's better to replace that space and power budget with something new that has more demand.
      The parts are sold off on the open market. There's an equilibrium demand for the parts from other data centers keeping older servers running and from hobby people who are okay with a jet engine sounding toaster of a GPU running in their home.
      - jmalicki 2 hours ago
        As long as the demand for GPUs keeps increasing, there are more data centers being built to house them.
        When you have waitlists for many many months for Blackwell GPUs, keeping the old ones around as long as customers are willing to pay for them is great.
        If I as a customer have a use case for a machine learning model I developed a while ago, say an insect identification model that I had an ML researcher/eng develop back in 2019, and it runs fine on a 2018-era T4 GPU (NVidia 2080 era), why mess with it?
          - HumanOstrich 1 hour ago
            We aren't talking about insect identification models from 2019.
            - jmalicki 16 minutes ago
              What do you think is running on the T4 GPUs in AWS? A lot of the use cases I know of for them are mid-level computer vision models that don't need to be frontier level.
    - munk-a 2 hours ago
      In addition to the physical depreciation other comments mentioned, I'd also note that old chips will settle into a low price and then actually go up on a per-unit basis if you're trying to buy a significant amount of them. With a limitation on fabrication facilities, continuing to pump out older cards is an opportunity cost to manufacturers that would prefer to be producing newer cards. If you were in a place where you suddenly wanted to buy 10,000 3080s, as an example, I'm not certain the market could actually fulfill that demand, and no one with the ability to increase the available supply to meet that demand actually wants to do so.
      Chips do wear out and need to be replaced (entropy do be like that and durability is not a primary concern for chip design) so you'll need to refresh your stock and, even if you don't need cutting edge models, the price of all chips at scale will go up over time. It may feel unintuitive since, when the PS3 was released PS1s were extremely cheap - but if you're struggling to understand this effect from your experiences in the consumer market you're actually looking at the price factor that starts making antiques increase in value since at a certain point they become scarce goods. The market price for an NES is higher today than it was in 2003 because the price had already bottomed out from demand from the general consumer market but the demand remaining (speedrunners and the like) is now fixed or growing while the supply is inevitably shrinking.
    - vb-8448 2 hours ago
      I used to work in datacenters. During the spinning-disk era we had technicians from vendors in basically every couple of days to replace some broken part. When the massive switch to SSDs happened, instead of having them in every couple of days it was 3 or 4 times per month.
      Despite there being no moving parts, things broke anyway. And even if the hardware doesn't break, the vendor can make you change technology just by playing with the maintenance cost of the older one, or by limiting or removing spare parts from the market.
    - tardedmeme 2 hours ago
      Gradually, and especially when hot. Modern chips are pretty close to the physical limits of how small they can be made, and that means atomic/chemical effects like electromigration are accounted for and determine the lifetime. Every extra 10 degrees Celsius of temperature doubles the speed of chemical reactions.
      When they stray too close to the line ... you get Intel's 13/14th gen chips that wear out after 1-2 years instead of 10-20 years. Intel calls it "Vmin drift" because that doesn't sound scary, but the actual point is that various wear-out mechanisms push the chip outside of its design envelope - increasing the voltage or lowering the clock speed may get it to run for a while longer, but you're living on borrowed time as the various circuits just stop working right and you get unpredictable instruction mis-execution: https://fgiesen.wordpress.com/2025/05/21/oodle-2-9-14-and-in...
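The "every extra 10 degrees doubles the rate" rule of thumb from the comment above can be sketched as follows. The baseline lifetime and temperatures are made-up illustration values, and real wear-out mechanisms such as electromigration follow more detailed models than this simple doubling rule.

```python
def wear_acceleration(delta_celsius, factor_per_10c=2.0):
    """Rate multiplier when running delta_celsius hotter (doubling per 10 C)."""
    return factor_per_10c ** (delta_celsius / 10.0)


def expected_lifetime_years(baseline_years, baseline_temp_c, operating_temp_c):
    """Scale a baseline lifetime by the rule-of-thumb acceleration factor."""
    speedup = wear_acceleration(operating_temp_c - baseline_temp_c)
    return baseline_years / speedup


# Hypothetical chip: rated for 10 years at 70 C, actually run at 90 C.
print(expected_lifetime_years(10.0, 70.0, 90.0))
```

With these assumed numbers, running 20 degrees hotter speeds wear up fourfold, so the lifetime drops to a quarter of the baseline.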
      - bijowo1676 1 hour ago
        sounds like planned depreciation on Intel's part, they definitely do not design server grade chips for longevity since that would harm their own revenues
    - malfist 2 hours ago
      They do degrade physically, but the bigger thing is they stop being competitive quickly. Each year or so we see doubling of GPU speeds for the same amount of power.
      If you build a 100MW data center with GPU compute and three years later a new data center opens with the same GPU cost and the same electricity cost as you, but can do twice as much compute, you quickly lose business unless the market is so constrained that customers can't afford to be picky. But the moment there's slack in the market you'll see major migrations off of providers that have the same cost but half, or a quarter, of the performance.
      So when you see someone talking about GPUs fully depreciating in value in 1-3 years, this is what they're talking about. Right now it's not a big deal because there's no slack in the market. But once there is, the bottom will drop out.
    - mattalex 58 minutes ago
      Nothing is stopping them, it's just not worth it: Have a look at e.g. vast.ai's pricing (https://vast.ai/pricing).
      The V100 (2017 -> 9 years old) can be rented from $0.02 to $0.37/h (right now I can find a V100 with a Xeon Gold 6140 and 48GB RAM for $0.165/h). Let's assume the guy you rent it to pins it at its 250W TDP and let's ignore the running costs of CPU/RAM/etc... Then you draw 1/4 kWh for that compute hour. Industrial electricity prices in the US vary between 7.5 and 25 ct per kWh (depending on state, time of day, etc...), so at 100% efficiency, assuming nothing ever breaks and the CPU consumes 0W, you earn at best about 14ct/h.
      And remember: V100 hours are sometimes sold at 1/10th the price.
      If I pick average conditions you need to start thinking about whether it is worth renting them out: usually it isn't, unless you have them anyway and just sell idle capacity.
      It's barely worth it to run them in a pure "is it profitable" sense. If we also account for the opportunity cost of taking up a slot in your datacenter, it ceases to be worth it really quickly.
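The margin estimate in the comment above can be reproduced with a few lines. This sketch uses only the figures quoted there (250W draw, $0.165/h and $0.02/h rental prices, 7.5 to 25 ct/kWh electricity) and, like the comment, ignores CPU, cooling, failures and hosting costs. The function name is illustrative.

```python
def hourly_margin_usd(rental_usd_per_h, gpu_watts, electricity_usd_per_kwh):
    """Rental income minus GPU electricity for one fully loaded hour."""
    energy_kwh = gpu_watts / 1000.0  # one hour at constant draw
    return rental_usd_per_h - energy_kwh * electricity_usd_per_kwh


for rental in (0.165, 0.02):
    for power_price in (0.075, 0.25):
        margin = hourly_margin_usd(rental, 250, power_price)
        print(f"rent ${rental}/h, power ${power_price}/kWh: {margin * 100:.2f} ct/h")
```

At the cheapest electricity the $0.165/h listing clears a little over 14 ct/h, while the $0.02/h listings are near break-even at best and lose money at the expensive end of the range.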
    - whateverboat 2 hours ago
      Today's data center GPUs are essentially overclocked, and so run at the limit of what the chip materials can physically handle, and therefore degrade over time. For example, GH200s operate at 1kW/superchip but the actual safe power is somewhere around 650W, which would allow them to function for a decade or more. But that leads to around a 15% slowdown, and that is unacceptable in today's competition. So current GPUs are destined to be depreciating assets.
      In future, we might have fixed cost GPUs but not today.
      - missedthecue 1 hour ago
        I would presume the reason they are overclocked is because they are trying to make up for the shortage. In time, the shortage of computing components will be remedied, and tokens produced at lower power pulls will be cheaper.
      - bijowo1676 58 minutes ago
        I think it's reasonable to give up 15% of speed for a decade more lifetime. This depreciation change alters the economics of GPUs.
    - numpad0 2 hours ago
      Chips do deteriorate and fail naturally at datacenter scale or in timescales of decades, though not exactly like on financial reports. Leakage current increases or electromigration occurs at junctions, or whatever those words mean.
      And yeah, it does feel like GPUs will start losing value more slowly going forward, with Moore's Law being dead for a while. It used to be that 3-5 year old GPUs were more useful as space heaters than GPUs, but that's much less the case today.
    - dgellow 2 hours ago
      GPUs do depreciate indeed, but here the depreciating commodity is the token, not the hardware. You sell cheaper tokens with the same hardware.
    - threetonesun 2 hours ago
      I assumed the issue was similar to crypto mining, where given finite amounts of space and power it makes sense to always be running the latest and most powerful GPUs instead of keeping older hardware running. There's definitely a secondary market for these GPUs as well.
    - bigfishrunning 2 hours ago
      Your laptop doesn't have a 100% duty cycle. If you ran it like a data center it would indeed wear out much faster.
    - foobarian 2 hours ago
      > There are no moving parts, I don't think memory chips or GPU chips deteriorate naturally
      I believe they do, but I too would love to know more details because there are several ways this can happen. Electromigration, package failures, VRAM failures, dielectric breakdown... Hopefully there will be studies soon similar to that old Google paper on HDD failures!
      - hgoel 10 minutes ago
        Currently it's a pretty big ask to look at the several hundred billion transistors and the interconnects between them to find what broke.
        Though, those capabilities are maybe just a few years out, funnily it's taking AI to make it potentially doable.
    - manyatoms 2 hours ago
      the hardware itself is still useful, but random failures happen every so often, so if you're trying to run a fixed sized fleet then your fleet shrinks when you can't get spares any more
    - sandworm101 2 hours ago
      Yes, even if the hardware is untouched. As technology advances, the power cost per compute cycle goes down. A gpu using old tech costs progressively more to operate compared to the newer models. So its value goes down over time = depreciation.
      As for duty cycles, the chips are perfectly happy at 100% operation. Cooling and power components fail, not the chips. But it costs manpower to repair such things and manpower is inconvenient these days. A gpu with any sort of fault just gets dumped.
- satvikpendem 1 hour ago
  Don't worry, they'll just lobby to ban Chinese models instead to keep their token revenues high.
  > Compounding the problem, labs in China often release dual-use capable models as open-weight. Once a model is open-weight, safeguards that do exist can be removed, making the model available to any state or non-state actor to use for malicious purposes, including the cyber and CBRN misuse those safeguards were built to prevent.
  https://www.anthropic.com/research/2028-ai-leadership
  - CuriouslyC 1 hour ago
    If you do the math, they don't have a choice. If China captures America's AI market it'll cause a major depression. They'll give it the BYD treatment, though it'll be a lot less effective.
    - WarmWash 32 minutes ago
      They'll ban them because (unless run locally or self-hosted) they are just data capture tools for China.
    - arealaccount 24 minutes ago
      The “you wouldn’t download a car” meme applies here
- Animats 2 hours ago
  Raise them, more likely. NVidia says that GPU hardware prices won't decrease until at least 2030. The world is out of fab capacity.
  - EA-3167 1 hour ago
    Seriously, they’re trying to justify trillion+ IPOs while setting piles of money on fire, prices aren’t going DOWN.
    - criddell 1 hour ago
      Today's frontier models will be tomorrow's low-end option. I think whatever model you are using today will be less expensive to use a year or two from now.
      - missedthecue 59 minutes ago
        Last year's o3 was more expensive than 5.5 is. Whatever model we are using now will probably be more expensive than next year's leading models will be.
          - Insanity 37 minutes ago
            Price per million tokens is also a fuzzy metric when newer models reason longer, and then burn more tokens while doing so.
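To illustrate the point about per-token price being a fuzzy metric, here is a minimal sketch comparing cost per task instead. The prices and token counts are invented for illustration and do not describe any real model.

```python
def cost_per_task_usd(price_per_million_usd, tokens_per_task):
    """What one task costs once token consumption is taken into account."""
    return price_per_million_usd * tokens_per_task / 1_000_000


# Hypothetical numbers: the newer model is half the price per token
# but reasons longer and burns three times as many tokens on the task.
older = cost_per_task_usd(price_per_million_usd=10.0, tokens_per_task=20_000)
newer = cost_per_task_usd(price_per_million_usd=5.0, tokens_per_task=60_000)
print(f"older: ${older:.2f} per task, newer: ${newer:.2f} per task")
```

With these assumed figures the model with the lower per-token price ends up costing more per task, which is why per-task or per-outcome cost is the more useful comparison.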
- freediddy 3 hours ago
  Most sane US companies will disallow use of cloud-based Chinese AI providers, because everything including code, data, PII, etc is being sent to them.
  - eikenberry 2 hours ago
    Then don't use the cloud-based Chinese providers, use cloud-based US/EU providers serving Chinese models. The interesting Chinese models are all open, making this issue mostly moot.
  - ceejayoz 2 hours ago
    Saner companies ask the same question about models from their own country too.
  - rd 2 hours ago
    I wonder if I could start a US-based company with good data regulation and just serve open-weight models at a competitive price. I feel like the real barrier is just that most companies willing to adopt AI usage enough to make it worth it at this point don't want to be using inferior models.
    - tokioyoyo 2 hours ago
      Yes, you can. There are multiple inference providers out there. The problem is, it’s hard to beat the Chinese providers in cost. And you also have to compete with frontier model providers’ subsidized offerings.
    - CobrastanJorji 2 hours ago
      Here's a free startup idea: operate an open-weight model service, and offer "Verified AI Integrity," which signs the input tokens, the seed for the randomness in selecting outputs, and the model ID, proving that the result of the call to AI was completely "organic" and was not interfered with.
      Your main audience would be snake oil salesmen trying to prove their AI products are unbiased and not under the thumb of any outside influence. This doesn't address the biases of the model itself, but that's not your business. Your business is selling tokens and security certificates. If you can get the right angel investor, you could maybe have your new standard required for some government applications.
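A minimal sketch of what signing "the input tokens, the seed, and the model ID" could look like, using only the Python standard library. Everything here is hypothetical: the function names, the record layout and the demo key are invented, and HMAC is a shared-secret scheme, so a real attestation service that third parties can check would more likely use public-key signatures.

```python
import hashlib
import hmac
import json


def canonical_payload(model_id, seed, input_tokens):
    """Serialize the call parameters deterministically so signatures are reproducible."""
    record = {"model_id": model_id, "seed": seed, "input_tokens": input_tokens}
    return json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_call(secret_key, model_id, seed, input_tokens):
    """Return a hex HMAC-SHA256 tag over the canonical call record."""
    payload = canonical_payload(model_id, seed, input_tokens)
    return hmac.new(secret_key, payload, hashlib.sha256).hexdigest()


def verify_call(secret_key, model_id, seed, input_tokens, signature):
    """Recompute the tag and compare in constant time."""
    expected = sign_call(secret_key, model_id, seed, input_tokens)
    return hmac.compare_digest(expected, signature)


key = b"demo-key-not-for-production"  # hypothetical key, for illustration only
sig = sign_call(key, "example-open-model", 1234, [101, 2023, 2003])
print(verify_call(key, "example-open-model", 1234, [101, 2023, 2003], sig))  # unmodified call
print(verify_call(key, "example-open-model", 9999, [101, 2023, 2003], sig))  # seed was changed
```

As the comment itself notes, a tag like this only shows that the recorded inputs were not altered. It says nothing about biases inside the model.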
    - mediaman 2 hours ago
      There are plenty of US-based inference providers available, including AWS, that serve Chinese models at competitive prices (vs frontier US models). They also have lots of usage. Not necessarily for coding, but for other enterprise tasks.
    - fg137 1 hour ago
      It's called AWS. Bedrock is right there. Price or data policy is never the issue. The models themselves are the problem -- most large US companies are not going to touch them.
      Source: directly involved in these discussions. You can downvote as much as you'd like but you can't ignore the truth.
  - tmp10423288442 1 hour ago
    There are some objections here saying that some US firms are using Chinese AI providers, but I wonder if any of those are subject to compliance. Large firms that are disproportionately responsible for AI spending are all subject to compliance.
  - amunozo 2 hours ago
    You can run DeepSeek as it's open weights, unlike Claude or GPT.
  - cheeze 2 hours ago
    Deepseek has some models in Bedrock. There is definitely a huge market for a "good enough" model running within the country of the company
- testdelacc1 3 hours ago
  Per token costs will fall, but the harnesses will get more token hungry. Instead of just centering the div it’ll spin up a battery of agents to architect, critique, advise, code, review, refactor and so on.
  - sevenzero 3 hours ago
    I wish I could disable most of these. I already hate all the "oh you're actually right, let me fix that" nonsense. Then it proceeds to burn 50k tokens on the git history instead of copying logic A from a different part of the codebase to logic B, where I want that exact logic without having to write the boilerplate myself...
    - apsurd 3 hours ago
      Makes me think of how my Claude.md files specify to use the built-in framework code generators (Rails). Those generators are deterministically right every time.
      I wonder how often the Agent actually follows the guidance. I do see them follow it when I look. But it doesn't seem so every time.
      - thefunnyman 3 hours ago
        This is tricky since it can and will ignore your md directions. When possible I try to lean on tool call hooks or skills that invoke deterministic scripts. The more you can remove the "choice" the better, though there's still a lot of randomness in how reliably it invokes skills, ime.
    - sfn42 3 hours ago
      A lot of the time if you're copying code from one place to another what you actually want to do is abstract it so you can reuse it in both places.
      The LLM can easily do this type of stuff, just tell it and it'll happily do it. This is exactly what I mean when I tell people they need to work closer with the AI, tell it how to do things. Don't just tell it what to do and get frustrated when it does it differently than you would.
      A good way to achieve this without writing huge prompts is tell it to plan the change first. Just give it some vague low-effort directions. It'll usually get most things right, you tell it what you want different and once you're happy you tell it to go ahead.
      - sevenzero 2 hours ago
        Nah the codebase is legacy fucked and I can't be bothered to try and optimize business flows without the fear of other stuff breaking.
        Claude thinks we use Laravel 100% of the time, even though the project is some old Lumen codebase, so most of Laravel's features are not available. It also gets the PHP version we are using wrong 100% of the time.
  - KaiShips 3 hours ago
    [flagged]
- SecretDreams 3 hours ago
  > Do we know that AI providers are going to keep these per-token prices, or eventually lower them because of competition from China?
  I genuinely do not know how prices can get lower from the current major providers in NA without the whole market collapsing. Everyone is spending copious amounts of money to presumably make more money back.
  - aDyslecticCrow 2 hours ago
    An inference-only platform selling good open-weight model inference without the research overhead could capture a lot of the market for smaller-model uses (Haiku, Gemini Flash). Diffusion transformers and clever caching, both of which are improving quickly, can drop inference costs even lower.
    The biggest reason large models are unattainable for local applications is the lack of hardware with a large amount of unified/graphics memory (and the cost of the platforms that do have it). Once the memory slog goes back to normal and hardware manufacturers adapt to demand, we may see consumer hardware with large memory capacity, effectively opening the door to slow but usable frontier model inference (assuming improvements in model efficiency and compute capacity).
    At that point, inference becomes a race to the bottom. The large labs hope they can attain a leap in capability (which is increasingly looking bleak, with an average catch-up time of just a few months) or market dominance through integration (integration in platforms and OSes, exclusive deals with companies or governments).
    For coding agents, I suspect no player will manage to lock in enough of the market to enforce pricing much higher than the true inference cost, and catering to programmers becomes an unsustainable proposition. We will instead be further hit with a lot of AI integrated into the cost of our other tooling, such as GitHub, the Microsoft suite and G-suite, with vendors using their market position to force in AI functions as a value-add in the total cost without giving the option to exclude them.
    - pianopatrick 1 hour ago
      AI may get so commoditized for certain use cases that you will not even be able to sell inference at a profit. AI might be bundled in with other services, just like Cursor bundles its own AI model for autocomplete with its editor. E.g. cameras might have AI for image recognition bundled in, etc.
      - HDThoreaun 54 minutes ago
        Agreed, this is where Google is really, really set up to win the market. They can combine a Gemini subscription with a moderately more expensive Google Workspace and steal MSFT's entire $50 billion enterprise productivity software market. MSFT is quickly trying to get Copilot into a good enough state, but without TPUs I think it'll be tough for them to serve a good enough model at a price people will accept.
    - SecretDreams 2 hours ago
      I agree with all of this.
      So my question remains the same: How are the players investing 100s of billions in buildout going to hope to make this back? Market capture looks bleak, inference looks like a race to the bottom. End users look like they could be beneficiaries. Where do the big boys go?
      - CuriouslyC 1 hour ago
        The American big boys are hoping to create "labor as a service" rather than sell tools. You don't hire an accountant that uses Claude, you hire Claude and it just does everything, without the visibility of current agents. They'll need to make it remote and obfuscated to protect their secret sauce from distillation and reverse engineering. It'll be really expensive, and be focused on enabling rich business types and upper managers.
  - HDThoreaun 57 minutes ago
    Prices can go down while tokens sold increase so that profit increases. The labs' number one goal right now is moving past software engineers so that every white-collar worker in the country finds AI assistants indispensable. Speculation here, but I think OpenAI/Anthropic API inference is insanely profitable; it just needs more volume to amortize the training costs.
    - SecretDreams 45 minutes ago
      > Speculation here, but I think OpenAI/Anthropic API inference is insanely profitable; it just needs more volume to amortize the training costs.
      Well, they just rent their hardware, so I'm not so sure. But they'll both be public soon and we should get that breakout in their cost structures, somewhat.
- cyanydeez 3 hours ago
  I'd be amazed if any American business will send data to China.
  - linkregister 3 hours ago
    HuggingFace offers DeepSeek as one of its models— it's pretty simple to spin up instances under your control.
    I'm not sure about OpenRouter but I wouldn't be surprised if they offer a US-based provider of DeepSeek.
    For reference, Cursor has their first own light fork of Kimi that they use as their baseline coding and review model.
    - dghlsakjg 2 hours ago
      The majority of Deepseek providers on OpenRouter for v4 pro are in the US. Especially interesting is that they are in the same ballpark for pricing.
      - eikenberry 2 hours ago
        They are in the same ballpark for deepseek-v4-flash, but deepseek-v4-pro from deepseek is still around 1/2 of the alternatives.
          - dghlsakjg 2 hours ago
            I'm pretty sure that Deepseek said that pricing was promotional. Be curious to see if it lasts.
            V3 pricing from them was right in line with what the commodity providers are charging.
            - eikenberry 1 hour ago
              They announced a few weeks back that the promotional pricing was permanent.
  - alpinisme 3 hours ago
    “Any” is a very high bar. Unless laws prevent it, I don’t see why a substantial minority wouldn’t buy services from where they can get them at a similar quality and much lower price.
  - dkersten 2 hours ago
    Together.ai provides many open-weight models and as far as I’m aware their servers are US based (the company certainly is).
  - lowbloodsugar 2 hours ago
    Any IT cost center will send to the lowest bidder. This isn’t intellectual property: it’s annoying shit that is an unwelcome cost of doing business. China might copy our tedious scripts? Will they make a product out of it? Can I buy it and fire my IT staff? Great!
    Not everyone using AI is using it to code core value IP.
- mcmcmc 2 hours ago
  [dead]
f311a 8 hours ago
How many more months do we need to wait, until big companies realize that flash models work just fine if you:
1) Don't ask LLMs for big changes
2) Review everything and point them in the right direction
Large models still suck at big changes, they produce questionable architecture and you still have to review the code, if your project is serious enough.
The codebase quickly becomes a mess if you don't pay enough attention. It does not matter which model.
So why bother with big models, when flash models are 10x cheaper and much faster to iterate under guidance? Large models can be used for security and bug audits. Flash models work almost the same for changes under 300 LOC when you dictate how you want your code to look.
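The workflow described above could be sketched as a simple routing rule. The 300 LOC threshold and the "audits go to the large model" rule come from the comment. The `Task` type, the tier names and the "split" outcome are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Task:
    description: str
    estimated_loc: int      # rough size of the requested change
    is_audit: bool = False  # security or bug audit rather than a code change


def pick_model(task, loc_threshold=300):
    """Choose a model tier following the rules of thumb in the comment."""
    if task.is_audit:
        return "large"      # audits are where large models earn their price
    if task.estimated_loc < loc_threshold:
        return "flash"      # small, guided changes go to the cheap model
    return "split"          # too big: break it down before asking any model


print(pick_model(Task("rename a helper and update callers", 40)))
print(pick_model(Task("audit the auth module", 0, is_audit=True)))
print(pick_model(Task("rewrite the billing subsystem", 2500)))
```

The point of the "split" branch is that the comment's advice is to avoid big changes entirely, rather than to hand them to a more expensive model.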
- _jab 2 hours ago
  It's pretty simple; organizations are willing to tolerate paying $1500/month/engineer, which seems to be roughly in line with "normal" consumption for most full-time engineers. If that number grows significantly, then I bet companies will start exploring flash models more, as you propose.
  - lavezzi 2 hours ago
    They are willing to tolerate it now, which is quite a switch from the free-for-all we had a few weeks ago. If they aren’t able to tie this new ~$1500/month cap to demonstrable productivity and revenue increases, then it will be kneecapped even faster.
  - rudedogg 1 hour ago
    > organizations are willing to tolerate paying $1500/month/engineer
    One organization, that is a software company
    > which seems to be roughly inline with "normal" consumption for most full-time engineers
    My peers are using $20/mo plans, only a handful are using more than $100/mo in tokens. We haven’t had any limits imposed yet.
- mrothroc 2 hours ago
  The easy decision is to just go with the biggest SOTA model you can afford.
  But this overlooks the other critical part of getting the most out of these things: the harness. I run an autonomous plan/design/code/build/test pipeline with agents using my own orchestrator. Different models are better at different stages, and I use LLMs to judge the output between them. Not everything needs Opus 4.8.
  The harness provides both the scaffolding to get the right things into the model, and the right things out. But it also lets you dictate which model does which work.
  It's the pipeline, not the model, that gets you quality at a given token budget.
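  A minimal sketch of the per-stage routing idea described above, not the commenter's actual orchestrator. The tier names, the stage-to-tier table, and the prices are hypothetical; the 10x price gap just mirrors the "flash models are 10x cheaper" figure from upthread.

```python
from dataclasses import dataclass

# Hypothetical tiers and per-million-token prices, for illustration only.
MODEL_PRICES_PER_MTOK = {"large": 15.00, "flash": 1.50}

# Hypothetical stage-to-tier assignment; a real harness would tune this.
STAGE_TIER = {
    "plan": "large",
    "design": "large",
    "code": "flash",
    "build": "flash",
    "test": "flash",
}


@dataclass
class StageRun:
    stage: str
    tokens: int


def pick_model(stage: str) -> str:
    """Return the tier for a stage, falling back to the large model."""
    return STAGE_TIER.get(stage, "large")


def pipeline_cost(runs, tier_for=pick_model) -> float:
    """Sum the token cost of each stage at the price of its assigned tier."""
    total = 0.0
    for run in runs:
        price = MODEL_PRICES_PER_MTOK[tier_for(run.stage)]
        total += run.tokens / 1_000_000 * price
    return total


runs = [
    StageRun("plan", 200_000),
    StageRun("code", 2_000_000),
    StageRun("test", 800_000),
]
routed = pipeline_cost(runs)
all_large = pipeline_cost(runs, tier_for=lambda stage: "large")
print(f"routed: ${routed:.2f}, everything on large: ${all_large:.2f}")
```

  The point of the comparison is only that the routing table, not the model choice alone, sets the cost for a given token volume; it says nothing about output quality, which the judge step in the pipeline would have to cover.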
- econ 3 hours ago
  I wonder to what extent models should figure out which model to forward a query to. Or perhaps the big models could learn the difference between an easy and a hard question and charge accordingly? Perhaps, if it can measure complexity, even generate a quote?
  Small models are fine for small coding tasks but I don't see why big ones can't be broken down most of the time.
  [-]
  - AgentMasterRace 2 hours ago
    Many harnesses do this, I've recently dropped all my big subscriptions for using deepseek. Codewhale (formerly deepseek-tui) will use pro for large tasks and route smaller ones to flash. It's pretty good, but I just use pro and everything as the cost is quite low.
    This one does not have routing, but reasonix is insane, absolutely insane for saving money. I've used 1.3 billion tokens at a cost of $4. (99-100% cache hit)
  - ValentineC 2 hours ago
    > I wonder to what extent models should figure out which model to forward a query to. Or perhaps the big models could learn the difference between an easy and a hard question and charge accordingly?
    This sounds like something a harness could do (and might already be doing), with work delegated to subagents running on lower-cost models.
    [-]
    - jorl17 1 hour ago
      Yes, they are all already doing this
- andersmurphy 1 hour ago
  This a thousand times. The bigger models also have a habit of overcomplicating things.
- warmwaffles 3 hours ago
  > Don't ask LLMs for big changes
  > Review everything and point them in the right direction
  Sorry upper management doesn't care. That's an engineering problem that you need to solve.
  [-]
  - eikenberry 3 hours ago
    They were proposing a solution: to use flash models and use them in a way that best amplifies your work.
    [-]
    - AgentMasterRace 2 hours ago
      He was making a joke.
tuesdaynight 3 hours ago
Why are there so many people that still believe that AI coding is a fad? It's something that started less than two years ago and companies are already paying thousands per seat. I know one that gives you 5k per month. Which other tool went from nothing to this level of acceptance so quickly?
[-]
- OptionOfT 2 hours ago
  Because companies are betting that this spending will allow them to reduce cost by firing people.
  Right now the AI LLM PRs we're seeing are just introducing more work for other people, while these so-called builders are looking good with their new dashboards and functionality they're demoing.
  But you can't talk to them about the flow of the code. You can't ask them for their thinking as to why certain things are.
  It's not built up from the ground with experience from x people taken into account. It's materialized from nothing, with no foundational separation, and barely any abstractions.
  No one wants to touch it. The PRs are too large, and the 'authors' of the PRs aren't on call with us.
  They get all the glory, but do none of the work.
  It's kinda like designing a house and then sending it to an architect and engineer saying: make this work.
  [-]
  - saulpw 2 hours ago
    > But you can't talk to them about the flow of the code. You can't ask them for their thinking as to why certain things are.
    You can absolutely do this. It's even right most of the time.
    [-]
    - chmod775 1 hour ago
      Let's be real. Most of the time you ask an LLM "Why did you do it like this?", it responds with something along the lines of "Oops. My bad. You're right to point this out."
      You even have a fair chance of getting a response like that when there isn't anything wrong and the question wasn't rhetorical - which perfectly illustrates the level of the genuine understanding LLMs operate at.
      [-]
      - seventhtiger 1 hour ago
        When you criticize AI, always remember that the alternative is the average employee. Today's models are pretty good.
        [-]
        devin 49 minutes ago
        A lot of people think they're above average. A lot of them are wrong.
        A lot of average people are producing gigantic messes. At least previous to this they were gated by their mediocrity.
      - djeastm 37 minutes ago
        I remember hearing (perhaps last year?) that the model companies have specifically tried to obfuscate the "thinking/reasoning" behind the decisions the models make so as to prevent cheaper models from training on the reasoning logs. So asking one "why did you do it like this" might be not fruitful.
        Not sure if that's true or if it might be influencing what you're seeing, but it's a thought.
      - dmayle 23 minutes ago
        That's because of a fundamental misunderstanding of what an LLM is. The only correct answer to "Why did you do it like this?" is that the specific combination of input text and RNG state caused this particular output. There's no reasoning to be had.
      - saulpw 1 hour ago
        This has happened to me, so I put this in my global CLAUDE.md, and it seems to help (I don't remember getting the response you mentioned for awhile now):
        **Lead with the answer when asked how/which/whether.** Name the command/mechanism first; a question seeking understanding isn't a go-ahead to execute. Answer, then offer to act.
      - baggy_trough 1 hour ago
        Can't remember the last time that happened.
        [-]
        javier2 18 minutes ago
        Happened to me at least three times the past 14 days. I point out where it made a design decision that causes data loss. «Oops my mistake»
    - datsci_est_2015 2 hours ago
      I believe the “them” the OP was talking about was referring to the people opening the PRs, not the LLMs.
      [-]
      - saulpw 1 hour ago
        My mistake, that is definitely a different scene.
    - ssss11 22 minutes ago
      And you can certainly tell it the flow you want (and any other constraints) in the prompt.
  - scuff3d 39 minutes ago
    Literally in the middle of ripping apart a vibe coded mess at work to figure out what's even worth keeping. Not fun :(
  - HNisCIS 1 hour ago
    It's so fucking bad. I'm watching a team try to maintain a huge dashboard/control application that interfaces with a large amount of hardware using solely AI workflows.
    Literally nothing works, all the timers/time counters are different across the pages, constantly commands hardware to do stupid shit, breaks during critical moments/in front of clients.
    Eventually mgmt had to institute change freezes for high profile events because the team was breaking too much shit all the time.
    The average C suite dipshit doesn't realize that the performance drops off a cliff once your project is more than some fraction of the context window so they will make pretty dashboards all day long but once you need to cover all the edge cases of a real system it all explodes.
    AI isn't trained on the type of software style we'll need to create systems using AI, it's trained on how we used to write software. It doesn't reuse code or elegantly structure anything, it just adds more code until the thing builds and passes some fake tests, even if half of it is functionally dead/unused.
- javier2 14 minutes ago
  Because the vibe coded stuff is sometimes great, sometimes it breaks stuff, sometimes it breaks things that we fixed multiple times earlier. The PRs are too large, nobody can review that mess and you better be on call for your deployment. Maybe it will get better, maybe not. I don't know yet.
  [-]
  - marcosdumay 1 minute ago
    Oh, it won't get any better. LLMs have already been trained on every bit of code ever published, they won't get any more material.
- lbrito 2 hours ago
  That's just a non sequitur. "companies are already paying thousands per seat" has zero correlation with something being a fad or not. There are much more reasonable rationales explaining why companies are acting the way they are than "because AI coding is not a fad"
  [-]
  - tmp10423288442 1 hour ago
    Can you name a service that charged companies thousands/seat/month that turned out to be almost or completely useless? There's lots of random services sold to corporates that are not very useful (all the random benefits besides health care, life insurance, and other big-ticket items), but the per-seat charge of those is much smaller.
    [-]
    - marcosdumay 0 minutes ago
      There are so many. Can I start with Oracle databases?
    - mike_hock 8 minutes ago
      Every consultant ever, but to be fair that's not per seat.
    - edent 1 hour ago
      Google Jam Board (and other digital whiteboards) had high upfront capex and lowish opex. Probably close to the price for how often they were used before being killed off.
      Same with the MS surface(?) tables (not tablets). I saw loads of companies buy into the hype and then discard them.
  - Kiro 1 hour ago
    It's just silly to claim it has zero correlation.
- agumonkey 2 hours ago
  I would use these exact facts as a sign that it's maybe not what it seems. It's much too big and too fast to feel stable. It might keep at that level, increase even more, or drop down to a saner level of use / allocation.
  [-]
  - Aurornis 2 hours ago
    > It might keep at that level, increase even more, or drop down
    Bold prediction. :)
    I think anyone predicting a drop or near-term flattening is not thinking beyond the online bubbles where these tools are discussed. In a local tech meetup a lot of the normal companies are barely coming online with AI tools at their company, and even then with very low limits.
  - teeray 2 hours ago
    I can see a corporate future where tokens are haggled over in department budgets just like any other line item. Some projects will get more of them, other projects will get less of them. "Use AI for everything" will become "use AI economically and build things that outlast our budget for it."
  - johnfn 2 hours ago
    So it might either go up, stay the same, or go down? :)
- tokioyoyo 2 hours ago
  “AI coding is a fad” is not just one big camp of similar-minded people. Different groups have to give up on their pre-existing beliefs in order to be ok with AI coding.
  Think of people who were very strict with variable names. People who pushed for multiple-levels deep of abstractions for a single API logic that’s not going to be reused. People who believed that coding is craft, rather than just a process to get to the end during work hours. This makes most of these people’s points more-or-less moot.
  I was in some of those camps, but I’ve seen coding evolve in the last 15 years. So I understand that these priors need to be updated, as most arguments don’t apply to today’s world.
  [-]
  - devin 42 minutes ago
    "as most arguments don't apply to today's world" makes me want to roll my eyes so hard at you. The vast majority of problems we had with building complicated systems are all still just sitting there. People are speedrunning relearning things we've known about software engineering for decades.
    The more things change, the more they stay the same.
    [-]
    - tokioyoyo 25 minutes ago
      The examples I gave, and the arguments that usually support them don’t really translate into “building complicated systems”. I was talking about the arguments in support of variable naming flamewars, etc.
      I’m not a proponent of AI generating everything without any supervision as of now. But willing to change my mind when it gets better.
      Most software engineering jobs are not cutting-edge tech, or research, or solving unsolved problems. Integrations, APIs, figma-to-react pipelines, devops, etc. are what people get hired for. All those can be done much faster at the same-or-better quality by an experienced person with the supplement of AI. It’s hard to imagine any company would go against the grain and slow things down on purpose.
    - rootusrootus 30 minutes ago
      Between AI and the stock market (which of course relates directly to AI), I’ve lost count of the number of times I’ve heard lately another variation of “this time is different.” Sometimes so close to those words that I wonder why the person speaking them doesn’t feel a bit tingly. Great big warning signs all around.
  - fragmede 2 hours ago
    What's an int vs a float vs a boolean? What's a function? What's a class? What's a variable? You don't actually need to know the answer to those questions in order to vibe code. That's a lot of priors to update!
    [-]
    - tokioyoyo 2 hours ago
      Just to go on record, as of today, I’m a big believer that a person that knows all that stuff is much more productive with AI-coding than a person who doesn’t.
      I have no idea how we can get people motivated to learn these through trial-and-error when AI coding exists though. I remember the days of spending hours on stupid bugs that AI can resolve within a minute. But I recall learning heavily from those experiences. Oh well…
      [-]
      - mewpmewp2 58 minutes ago
        I honestly feel like my own learning has accelerated after using AI. Simply because now it's so easy to write the same thing in so many different languages, I can e.g. learn pros and cons of each language, which otherwise would have been I think unfathomable to me. I have now created so much stuff I wouldn't have had time to create.
        I set up k3s, and tons of what would otherwise be unnecessarily complicated stuff on my laptop for my side projects with additional home servers, smart house stuff. Otherwise k8s and things like that would have been daunting to learn in theory and without constant professional exposure, etc...
        Microservices in Go, Rust, which I didn't have any previous experience with, games in C and other languages. Didn't know anything about low level memory management before. Was just mainly TypeScript person. Just constantly building random fun stuff.
        [-]
        tokioyoyo 20 minutes ago
        The question is if you already had intuitive understanding of what those things “are”. The languages and systems have been easier to learn once you picked up a couple. Same applies here as well.
        The question is, how quickly does a junior with no experience build intuition without trial and error.
    - nomel 1 hour ago
      And, you don't have to vibe code. A competent developer can make great use of AI. I think a developer that can develop the system themselves is the most accelerated user.
    - malfist 1 hour ago
      > You don't actually need to know the answer to those questions in order to vibe code
      No, but you do need to know the answer to respond to that 3AM page about prod being down.
- toasty228 1 hour ago
  There is a whole spectrum between "ai coding is a fad" and "unlimited tokens for every employees we don't even care if it actually ends up being a net positive financially"
  [-]
  - tmp10423288442 1 hour ago
    > "unlimited tokens for every employees we don't even care if it actually ends up being a net positive financially"
    That was clearly a short-term trend that would obviously get fixed. Doesn't say much about AI coding as a business model.
- anthonypasq 3 hours ago
  perhaps the personal computer? Companies were spending 3-5k (10-15k inflation adjusted) on every employee for just hardware.
  everyone making comparisons to the dotcom bubble seems misguided. this is clearly computing 2.0 imo
  [-]
  - thewebguyd 2 hours ago
    No disagreement on computing 2.0, but companies spending 3-5k per employee for hardware isn't generally a monthly cost. It's a cost at the time of hire, and then once every 3 to 5 years after that, for a monthly amortized cost of about $50/employee.
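    A quick sketch of that amortization, assuming straight-line spreading of the purchase price over the refresh cycle and nothing else (no support, no resale value). The $50 figure is the low end: $3k over 5 years. The high end, $5k over 3 years, comes out closer to $140.

```python
def monthly_amortized(purchase_price: float, refresh_years: float) -> float:
    """Spread a one-off hardware purchase evenly over its refresh cycle."""
    return purchase_price / (refresh_years * 12)


low = monthly_amortized(3_000, 5)   # cheapest machine, longest cycle
high = monthly_amortized(5_000, 3)  # priciest machine, shortest cycle

ai_monthly = 1_500  # the per-engineer figure from the article
print(f"hardware: ${low:.0f} to ${high:.0f} per month")
print(f"AI cap is {ai_monthly / high:.0f}x to {ai_monthly / low:.0f}x that")
```

    Either way the hardware comes out at least an order of magnitude below the $1,500/month figure.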
    I have my concerns with current inference pricing in that there's a non-zero possibility for a rug pull in the future for the subscription plans for organizations and individuals that can still use them. For now, it's only companies larger than ~150 users that need to pay per token, but what if that wasn't the case? Not every company can afford over $1k/month/employee to give them access to AI tooling, further making it harder to compete against the behemoths. If we get to a point where an individual can no longer pay $100/month for nearly unlimited usage and instead must pay per token, that's going to be a problem.
    Personal computing eventually became an equalizer (until we started centralizing on mainframes again, aka the cloud) because it got cheap. My hope is that inference also gets just as cheap, if not cheaper.
    I have high hopes for local AI and open weight models and we will continue the ethos of local, personal computing and not needing to offload everything to OpenAI/Anthropic/Google, etc. to get work done once the hardware and hardware availability catch up.
    [-]
    - GrinningFool 32 minutes ago
      Any kind of rug-pull is a serious concern. Companies are re-orienting their entire development processes around these tools. Sure they can go back, but it will require a much larger and more expensive effort than to transition in the first place.
      All companies who make this transition will be more or less at the mercy of model providers.
    - dghlsakjg 2 hours ago
      Every employee doesn't need $1k in token spend per month, either. That kind of spend makes sense for technical workers in r+d.
      Most other workers are served fine by $20-30 worth of tokens on a budget model. You don't need Opus to help support write emails.
      [-]
      - tmp10423288442 1 hour ago
        No, but you do want Opus-tier models to do desktop and office software automation (think about people who intensely use Excel and the like). Actually those might take even more tokens than coding in a lot of cases. Why do you think Claude Cowork is successful, and why do you think Codex is leaning so hard into Computer use?
  - dghlsakjg 2 hours ago
    The Dotcom bubble is an interesting comparison.
    The general thrust that everything would be online was correct, it was just that the market mistimed and misallocated capital by a decade or more. There was massive spending on infrastructure capacity that we wouldn't end up needing until the 2010s. There were hype driven valuations completely disconnected from business fundamentals just because a company was an 'internet' company. Things were going from cutting edge to obsolete in less than a year. There were breathless promises that this was business 2.0! Of course, none of that sounds remotely like what is going on today...
    I'm optimistic about AI, but I also don't think that it is going to change everything as fast as promised.
    [-]
    - threetonesun 2 hours ago
      The question you always have to ask is what problems does it directly solve. I personally think most of the current problems in software development and really the world at large are not time-bound problems but alignment issues, and all an LLM can really do there is be some 3rd party oracle that gives you an answer without needing other humans to agree with you.
      [-]
      - squidbeak 41 minutes ago
        > The question you always have to ask is what problems does it directly solve
        Most directly, human labour. Labour is always a problem for capital. At a certain level of AI competence, businesses don't need to pay humans to complete the work they need doing in order to operate. I don't think anyone would dispute that AI competence is growing steadily.
      - rafterydj 1 hour ago
        I agree with you. I think that if we're talking about actual reliable problem solving, we have to be discussing robotic / drone systems. Software is as complex as you want to make it, and always has been.
  - jghn 2 hours ago
    Two things can be true at the same time. It can be true that this is here to stay. It can also be true that companies are grossly overvalued right now and that the market is irrationally exuberant. This would mean we could both have a crash and also see AI coding be the new future.
  - pmg101 2 hours ago
    I think the right comparison is the invention of the microprocessor. At that time people were grappling with a lot of the same things we are today - would it automate jobs away, would it transform education and the work place, etc.
  - pixelesque 2 hours ago
    Hardware's not generally a subscription, monthly cost though.
    You update it for them every 3/4 years (if they're lucky).
    It probably makes a bit more sense to compare it to existing software subscriptions like Office, or the old-school 'per-seat' licenses per user for software.
    [-]
    - thewebguyd 2 hours ago
      [dead]
- Barrin92 2 hours ago
  >Why are there so many people that still believe that AI coding is a fad?
  Because there's not a single piece of evidence that this has improved the quality of the delivered software, or for that matter even the speed of features any of these companies produce, in fact if anything the opposite.
  The point of software development, the hint is in the name, is to develop software, not consume tokens. If Uber was now full of 10x engineers the stock price of Uber would be up, not down on a yearly basis. Hilariously enough the only company whose stock price is up appears to be Anthropic
- jbvlkt 1 hour ago
  Because writing huge amounts of code is easy for humans too. Agents already proved that they can do it. But are agents able to maintain it? I do not know and unless I know for sure, I am not fully committing to AI generated code.
  i.e. I am able to write about 1k lines of code of "acceptable" quality per week. Which means in 1 year, there will be about 50k LoC. I am pretty sure that I would have to spend like 60-80% of my time maintaining 1st year code and the rest making new features in the second year, so I would have to hire more people and spend time onboarding them to maintain velocity. All of these are rough estimates, probably overoptimistic and way worse in the 3rd year. Good luck doing such estimates with code agents. Even worse if you already have huge amounts of legacy code.
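  The estimate above as a back-of-envelope sketch. It assumes 52 working weeks and that time spent on maintenance displaces new code one for one, which is a simplification rather than anything measured.

```python
WEEKS_PER_YEAR = 52


def yearly_loc(loc_per_week: float) -> float:
    """Lines written in a year at a steady weekly rate."""
    return loc_per_week * WEEKS_PER_YEAR


def second_year_new_loc(loc_per_week: float, maintenance_share: float) -> float:
    """New lines in year two once a share of time goes to maintaining year one."""
    return yearly_loc(loc_per_week) * (1 - maintenance_share)


first_year = yearly_loc(1_000)
best_case = second_year_new_loc(1_000, 0.60)
worst_case = second_year_new_loc(1_000, 0.80)
print(f"year 1: {first_year:,.0f} LoC")
print(f"year 2 new code: {worst_case:,.0f} to {best_case:,.0f} LoC")
```

  Under those assumptions a single developer's new-feature output in year two drops to roughly a fifth to two fifths of year one, which is where the need to hire comes from.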
- LAC-Tech 23 minutes ago
  Because we have spent a lot of time and money coding with AI to generate code and have been unimpressed with the results.
  As for why they got accepted so quickly 1) the industry's long running desperation to deskill computer programming 2) the addictive psychology baked into LLMs "That's an elegant solution! Shall I ... ?"
- themafia 36 minutes ago
  Why are there so many people who mistake simple anecdotes for actionable data? Why do the majority of businesses fail rather than succeed?
- jujube3 1 hour ago
  It's cope. People desperately want to believe that AI coding is going away so that they can go back to partying like it's 2020.
  So there's a huge number of HN posters claiming that the price of tokens will go UP over time rather than down (that's how Moore's Law works, right???) or that code bases that AI contributes to will spontaneously combust, or something.
  [-]
  - dofm 59 minutes ago
    I don't think it is unreasonable to say both will happen, is it?
    In the long term, tokens will fall in price. Obviously. (If "tokens" continues to be the unit)
    In the short to medium term, for the IPOs to succeed, people have to start actually paying for what they are using, so the price will go up, and is going up, quite a lot. Once their value is set they will slowly fall from that point (or some point maybe halfway, depending on how much the market is willing to continue to subsidise).
    I am an AI cynic, but I am now an informed cynic; I am learning agentic tools so I know where they are useful and I know my enemy.
    I think the "fad" here is cloud-based, metered AI being a dominant work mode.
    Nothing, so far, has suggested to me that any other outcome is likely than edge- to local-scale, on-device, on-laptop, on-prem models getting good enough to the point where people use them by default and use the cloud models only when they need the extra oomph.
    I cannot believe that there is anything other than an enormous incentive for companies like Uber to find local, small model and on-premises solutions to their problems, not least while pricing is so changeable and people are getting nasty surprises.
    Betting on OpenAI and Anthropic being around over the long term in the form that they are now, that feels like valley hopium. Utility monopolies essentially always derive from physical/geographical limitations, don't they?
siliconc0w 2 hours ago
I use the $100/mo sub but my 30 day API cost is about $1700/mo.
It really depends how you use it, if you're using prompts to generate detailed designs, breaking those into lists of tasks, and then feeding those to multiple agents - it's really easy to burn through many thousands.
If you're being more deliberate and using a few agents at a time interactively, having it review PRs/resolve issues, automated clean-ups and performance optimization, etc it could be more like $1500.
If you're just throwing it one-off questions like a better stack-overflow that is well under $100.
I've really gotten into /goal, if you can find something verifiable and leave it overnight - it's kinda like christmas morning to see where it landed.
CharlieDigital 8 hours ago
$1500/mo is $18,000/seat/annum.
Maybe Microsoft and Nvidia are on to something.
128 GB machines that can run local LLMs are a bargain even if priced $5-8k. Yes, tok/s is not quite there, but that's probably OK since the bottleneck really isn't the code; it's WTF did Uber build with all of that spend? How did it meaningfully impact their revenue in a positive direction?
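A rough break-even sketch for the comparison above. It only divides the hardware price by the monthly API spend; electricity, maintenance, and the quality and speed gap between local and hosted models are all left out, so treat the result as a lower bound on the payback time.

```python
def annual_spend(monthly_spend: float) -> float:
    """Yearly cost of a fixed monthly per-seat budget."""
    return monthly_spend * 12


def breakeven_months(hardware_price: float, monthly_spend: float) -> float:
    """Months of API spend needed to equal a one-off hardware purchase."""
    return hardware_price / monthly_spend


monthly = 1_500  # per-seat figure from the article
print(f"annual per seat: ${annual_spend(monthly):,.0f}")
for price in (5_000, 8_000):  # price range quoted in the comment
    months = breakeven_months(price, monthly)
    print(f"${price:,} machine pays for itself in {months:.1f} months")
```

This only holds if the local model can actually absorb the workload that the $1,500 was buying, which is the part the replies dispute.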
[-]
- pqtyw 3 hours ago
  How is tok/s not a bottleneck? I assume most people still use ai agents interactively rather than leaving them to do their own thing during the night.
  I find anything below 50 tps or so entirely unusable...
  Regardless, it's apples to oranges anyway. Inference is quite cheap for open weight models; it's just that Claude and OpenAI can charge very high margins compared to e.g. DeepSeek or various providers on OpenRouter, since open models are a commodity.
  [-]
  - sweetjuly 1 hour ago
    Is interactive use for coding something that actually works today? With unsafe mode, even frontier hosted models are slow enough I end up just tabbing out to work on other tasks. It would need to be much faster if I am to sit and stare at it while it churns. Local models might be a lot slower but workflow-wise it doesn't change much for me.
  - brianwawok 3 hours ago
    I start up 4 or so projects then go do other things for 4 hours. I don’t have enough energy to steer overnight, but I’m at least “semi afk” for daytime steering. So throughput is king for me, tokens per hour. Not latency or actual tokens per second.
    [-]
    - smallerize 3 hours ago
      Running locally is even worse for this, because if you're running 4 jobs at once they just run at 1/4 speed. Not literally, you can make up some of the difference with batching, but you have limited resources instead of spreading your requests out on an API provider's nodes.
  - cyanydeez 2 hours ago
    It's not a bottleneck if you care about the actual code.
    [-]
    - pqtyw 2 hours ago
      I would expect the overwhelming majority of output tokens would not be the actual code but used for analysis, reasoning, testing and iteration. If you only use the agent for autocomplete then yes, the calculation is probably different.
      [-]
      - cyanydeez 2 hours ago
        yea, and understanding that too is important. the idea you don't need to read code or analysis seems to align with the dependency addiction being shoved in the pipe.
- dgellow 2 hours ago
  You’re way better off running your own on-premise models. Laptops are depreciating assets, do not benefit from economies of scale, have fixed specs, and result in a fragmented fleet where you need to keep models up to date. Not to mention power consumption and cooling issues. I really don’t see why companies would go in that direction
  [-]
  - bluGill 1 hour ago
    You don't need to run on laptops, desktops plugged into mains power get more power consumption and better cooling. I want my laptop to work, but I can accept when I'm on an airplane at 32k feet I get less abilities.
  - CharlieDigital 2 hours ago
    Even if the laptop costs $5k and you upgrade it every year with the latest hardware and run local models (assuming your workload can tolerate smaller models at slower tok/s), you win.
- Buttons840 3 hours ago
  I think companies will eventually just buy a local AI server.
  Using local hardware is expensive when it's running a complicated software stack that can break in 10,000 different ways.
  These eventual local AI servers will just talk some protocol for AI and sit in the corner and nobody will think about them.
  I guess they still might need access to various systems, so idk. Eventually I think someone will offer "AI in a box" though, running the latest open model or whatever.
  [-]
  - pm90 3 hours ago
    Yep, it's already quite easy to do so with tools like opencode/openrouter. I've used some open source models and they seem … ok? I'm not doing foundational math, just refactoring code, understanding existing code etc. I don’t see a future where companies blow 11% of employee compensation on a single tool; the hosted AI server + oss models will 99% win out.
  - dangus 3 hours ago
    I don’t think companies will do that. Why don’t they just buy local on-premise infrastructure even though it’s cheaper than AWS?
    “AI in a box” sounds a heck of a lot like “the box” from the Silicon Valley TV show. Or the Google search appliance. Or name any other on-premise thing that is equally dinosauric.
    The real finding of this article is that AI tokens are direct competitors with offshoring. $1,500/month buys you a whole employee in India.
    And this is before AI companies inevitably increase pricing after the conclusion of the growth phase.
    [-]
    - pm90 3 hours ago
      > I don’t think companies will do that. Why don’t they just buy local on-premise infrastructure even though it’s cheaper than AWS?
      For customer facing, production software, it's worth paying a cloud tax to get the reliability guarantee. For tools that are used by engineers for code development, there is no need for such bulletproof guarantees.
      [-]
      - dangus 57 minutes ago
        That makes very little sense. SaaS/cloud tooling is overwhelmingly popular for internal tooling.
        Which category of developer tool has on-premise as the more popular option?
        Cloud isn’t about “reliability,” it’s about being able to focus on your core business rather than spending all your time maintaining stuff.
- zozbot234 6 hours ago
  I agree on the basic point, but running $1500/mo's worth of SOTA local AI is non-trivial already, and that's a figure for a single seat. That's equivalent to generating at least 20 tok/s on a 24/7 basis, in fact probably quite a bit more than that (because open-weight models are vastly cheaper than proprietary ones even when served from reputable Western providers - reaching the same spend would take around 100 tok/s or more, which is well within datacenter hardware territory).
  You could probably reach the former figure on a prosumer platform but only for very special workloads. If you spend a lot of time on prefill (which is common for agentic workloads) the outlook is even worse since that's a significant constraint for any on-prem AI.
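  A sketch of the equivalence being described, assuming a 30-day month of continuous generation. The comment does not state the per-token prices it used, so the prices below are back-solved from its two throughput figures and are blended estimates, not any provider's actual rates.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # 2,592,000 seconds in a 30-day month


def tokens_per_month(tok_per_s: float) -> float:
    """Tokens generated by running at a steady rate around the clock."""
    return tok_per_s * SECONDS_PER_MONTH


def implied_price_per_mtok(monthly_spend: float, tok_per_s: float) -> float:
    """Price per million tokens at which that throughput costs monthly_spend."""
    return monthly_spend / (tokens_per_month(tok_per_s) / 1_000_000)


spend = 1_500
for rate in (20, 100):  # the two throughput figures from the comment
    volume = tokens_per_month(rate) / 1_000_000
    price = implied_price_per_mtok(spend, rate)
    print(f"{rate} tok/s = {volume:.2f}M tokens/month, ${price:.2f} per 1M tokens")
```

  So 20 tok/s around the clock matches $1,500 only at roughly $29 per million tokens, while at roughly $6 per million it takes 100 tok/s, which is why the cheaper the model you substitute, the more local throughput you need to match the same spend.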
- sajithdilshan 1 hour ago
  I don't think it's necessarily what Uber built, but the gained productivity. If the engineers use the AI tools the correct way, it can drastically increase productivity, and that means they can actually use the LLM as a junior or an associate engineer. $1500/mo is way cheaper for that level of productivity, whereas they would have had to pay far more for a human engineer.
- ssivark 2 hours ago
  Even if companies decided to move away from expensive models from the major labs, it's probably much more economical to pay a cloud provider to host some open weights model which could then be amortized across all (internal) users and do inference at a substantial batch size, rather than giving everyone their own hardware -- which means the company would need to provision for peak usage and run inference at a batch size of one.
- darkwater 8 hours ago
  > it's WTF did Uber build with all of that spend?
  You can ask the same for the median 330k salary in the US for Uber Engineering... and, being a bit snarky: having attended Uber engineers' talks here and there at a few conferences, it looks like they love to (re)invent internal tooling/platforms. That's pretty expensive on its own.
  EDIT: I'm not saying that Uber's engineers didn't add value to the company, they absolutely did and handling the scale up they had to handle is not an easy feat. But I do challenge the notion of "what features did they create with that (LLM) spending?" of GP.
  [-]
  - SlinkyOnStairs 8 hours ago
    > You can ask the same for the median 330k salary in the US for Uber Engineering
    People DO.
    It's well known that most tech companies are run incompetently. As you say, it's not the engineers' fault.
    But most projects and hiring in these companies exist to juice promotion criteria. And, depending on perspective, these companies are either massively overstaffed or massively underproductive.
    The comparison to AI spending being wasteful holds up pretty well, these are companies that readily piss away billions in pointless spending.
    [-]
    - hibikir 1 hour ago
      The massive misalignment in large companies is no secret. But neither is the fact that when someone comes to cut, they also have no idea of who is doing load-bearing work that matters, and who isn't. I look at recent cuts around my large corp, and it's clear they are made at levels that have no visibility of the ground, and are uninterested in said visibility. They make obvious mistakes that are worse than what Claude would have told you (yes, I asked Claude to pretend to make the budget cuts in our org by looking at the same data an exec could probably get; its cuts were better than what happened).
      I think it's a general problem, but in my rare conversations with execs nowadays, they seem rather uninterested in improving their decision making there. The actual performance of the organization does not appear to be all that relevant to them.
  - quantified 2 hours ago
    Sure, but has their rate of value added increased as a result? It's a good question to ask. They added value before LLM coding, and now are more expensive than before thanks to token costs.
  - FergusArgyll 3 hours ago
    This is a very good answer but there's a flip side too.
    The idea of "if you add intelligence you make more money" is contradicted by the fact that companies don't just always hire more people. Why doesn't Google just hire everyone?
  - CharlieDigital 8 hours ago
    This is what all "platform engineers" have to do once things are working nicely: you have to keep inventing work.
    I don't know; I'm a Ron Popeil "set it and forget it" kind of guy. Make the dumbest, simplest thing that's going to work with some clear path for scaling. Then go do valuable things instead.
    [-]
    - darkwater 8 hours ago
      But most Platform Engineering teams in smaller companies (and especially non-US) add a layer on top of existing technologies. A layer that usually maps to the specific culture and idiosyncrasies of that company; a bit like the deployment flow which is usually very specifically shaped on how a company is.
      But in Uber's case, they tend to reinvent lower level pieces of platform/infra.
  - throwaw12 8 hours ago
    you don't get promotion for supporting existing things, but for "inventing" you can get promoted. also for large migrations
- dkdcdev 8 hours ago
  at their scale they could also just run a large on-premise or rented (basically still cloud, but cheaper) GPU cluster and run through that. fixed costs, even license a SOTA model’s weights if you’d like
  [-]
  - embedding-shape 8 hours ago
    > even license a SOTA model’s weights if you’d like
    Yeah, I bet all labs releasing SOTA models are more than happy to remove the main way they make money and let you run it locally, especially if you're a big spender like Uber who seems very willing to throw money into the sea as an experiment.
    [-]
    - throwway120385 8 hours ago
      That's going to stop eventually, and I think at that point we're going to see business models more like the major CAD providers.
    - idiotsecant 8 hours ago
      I don't think they'll have a choice, open weights models are not far behind. At some point it's essentially a commodity game
      [-]
      - dkdcdev 8 hours ago
        they also already do this…
        Anthropic and OpenAI license to the public clouds. Google reportedly licenses to Apple. licensing to Fortune 100 companies running on their own infra is an obvious next step
        it is a race to the bottom and I’m not sure the labs win that race. we’ll see!
        [-]
        thewebguyd 2 hours ago
        I'm not sure the labs will win either. I wouldn't be surprised to see OpenAI & Anthropic just get acquired, either by Microsoft or Amazon, and their models just become another product offering in their public cloud and some hybrid on-prem offering like Azure Stack HCI or Azure Stack Hub (already basically a "cloud in a black box" that could become "AI in a box").
  - mrweasel 5 hours ago
    The problem isn't really Uber, Microsoft or Nvidia, it's all the smaller non-IT companies that also have developers on staff. They are screwed. $1500 per seat per month is just way too expensive, but they also can't afford to build and maintain their own on-premise solution. If Microsoft can't afford to run Copilot for their own developers, what chance do any of their customers stand?
    If the large, well-funded IT companies in the world believe the current AI cost is too high, then Anthropic, OpenAI and Copilot have no actual customer base. AI is then relegated to a very profitable niche business, but that can't fund the R&D for the models.
    [-]
    - treis 3 hours ago
      There's models for every price point. What was SOTA and stupid expensive to run a year ago is a cheap flash model today.
    - skybrian 3 hours ago
      It's an extra 18k a year for developer tools when they're paying how much a year per developer? Having software developers at all isn't cheap.
      Also, I don't believe you need to spend $1500 a month on a coding agent if you optimize usage at all.
      [-]
      - ecshafer 3 hours ago
        $18k a year is a non-starter in most companies. I've seen companies balk at IntelliJ.
      - mrweasel 3 hours ago
        That depends on where you are. $18K is the equivalent of paying around 15% more for your developer.
        [-]
        ricardobayes 2 hours ago
        In HCOL locations yes, but in the south of Spain you can get full-time talent for that figure. It's also an entry-level salary in Eastern Europe, with Ukraine and Turkey even being somewhat cheaper.
    - mvdtnz 3 hours ago
      Why are smaller non-IT companies "screwed" because they can't pay out the nose for their developers' AI usage? They're non-IT companies, developers are presumably not on their critical path, or not their bottleneck. Developers can keep on writing code the old way, or doing it with a more reasonable AI spend. I don't see how this "screws" any company.
      [-]
      - mrweasel 3 hours ago
        That was badly worded on my part; my intent was to indicate that there is no way they can or will pay $1500 per month per seat.
        [-]
- ricardobayes 2 hours ago
  128GB machines can't run anything locally that is even nearly as capable as a frontier model like Claude. We can get an idea from deepseek v4 pro being 1.6T model, requiring approx. 860GB VRAM to run.
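A minimal sketch of where a figure like that comes from. It estimates memory for the weights alone as parameter count times bits per parameter. The quantization level behind the ~860GB number is not stated above, so the bit widths here are assumptions, and KV cache and runtime overhead are left out entirely.

```python
import math

# Weights-only memory estimate: parameters * bits per parameter.
# The bit widths below are illustrative assumptions; KV cache, activations
# and runtime overhead are NOT included, so real requirements are higher.


def weight_memory_gb(params: float, bits_per_param: float) -> float:
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params * bits_per_param / 8 / 1e9


def machines_needed(total_gb: float, gb_per_machine: float) -> int:
    """Minimum number of machines whose combined memory holds total_gb."""
    return math.ceil(total_gb / gb_per_machine)


if __name__ == "__main__":
    params = 1.6e12  # the 1.6T parameter count mentioned above
    for bits in (4, 8, 16):
        gb = weight_memory_gb(params, bits)
        print(f"{bits}-bit weights: {gb:.0f} GB, "
              f"at least {machines_needed(gb, 128)} x 128GB machines")
```

At 4 bits per parameter the weights alone come to 800GB, which is in the same range as the ~860GB figure once some overhead is added, and several times more than a single 128GB machine can hold.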
- jvanderbot 8 hours ago
  Right - the future of LLMs is like ol' windows XP+Dell. Commercialized "things" you run locally offline, co-designed with hardware, with a known productivity suite, and large businesses building the next generation thing and suite with 18mo release cycles (ish).
  [-]
  - treis 3 hours ago
    I don't see it. Leasing equipment and paying per seat license fees makes a lot of accounting and cash flow sense. Maybe when it gets to the point where you can run SOTA LLMs on consumer hardware. But that seems a solid decade and probably much more away.
    Even then it makes more sense to rent the bigger GPU and get your answer faster.
  - nonethewiser 8 hours ago
    XP? I can see the argument for enterprise support but in that case the latest windows OS is going to be virtually free and I dont know if MS and Dell etc. would even support an XP machine. Might even be required for hardware. If no enterprise support wouldnt Linux make a lot more sense?
    I get that if it's offline the security downside of XP doesnt matter, and I assume XP is free, but being free doesnt really seem that valuable compared to alternatives (free linux and virtually free OS if buying wholesale).
    [-]
    - jvanderbot 8 hours ago
      "Windows XP+Dell" should have been in quotes. It's similar to the way enterprise productivity software was developed, packaged co-designed with hardware, and sold on an 18mo upgrade cycle assumption. It's not literally windows xp.
      [-]
      - nonethewiser 4 hours ago
        Oh gotcha. Yeah that's an interesting idea.
  - gedy 3 hours ago
    There's waayyyy too much money betting on that not happening, to the point I feel there'll be regulations popping up for "safety reasons" etc to ensure the big players control this.
    [-]
    - thewebguyd 2 hours ago
      3/4 of Microsoft's BUILD conference the past two days were about local AI, foundry local and Windows ML along with a big section in the keynote about running local workloads on their new hardware with Nvidia. Say what you want about Microsoft's reputation, but they are a "big player" and seem to be moving in the direction of local AI first.
      [-]
      - gedy 1 hour ago
        I would love this to happen of course, just paranoid it won't.
- ungreased0675 8 hours ago
  Your last question is really important. What did they accomplish with all that spend?
  I suspect there’s some mass delusion with respect to actual accomplishments as a result of LLM use. Sure, things are moving faster, but does it matter?
  [-]
  - fg137 52 minutes ago
    [dead]
- devttyeu 8 hours ago
  If you believe a 128gb machine that is essentially DGX Spark in a laptop chassis can run models comparable to SOTA you either never ran open models on hard tasks, or you aren't scratching the surface of SOTA closed LLM capability in how you're using them.
  [-]
  - f311a 8 hours ago
    Can you show me an example of a hard task that can't be achieved using light models? When we don't want the model to work on autopilot without reviewing the code at all. Even SOTA models will produce garbage code, if you don't guide them all the time.
    Hard tasks require a lot of guidance and code reviewing, unless you are creating another throw away project where correctness, maintainability and code understanding does not matter.
- infecto 8 hours ago
  I am wondering more and more if this becomes true as these smaller models take off. I might be old-fashioned, but I have yet to crack the workflows some of the hype people spout, like Claude Code's Boris, where he and others talk about running hundreds of agents overnight.
  I have still found the sweet spot for me is using LLMs while I am still in the driver's seat.
  [-]
  - CharlieDigital 4 hours ago
    That's because for some of these folks, the cost of the tokens doesn't have to match the value of the output; the hype from the story is all they need.
    Normal people have to produce something of value from that spend. So starting 100 agents and then waking up to something cool but useless just means you spent a few thousand dollars and created nothing of value.
  - ofjcihen 7 hours ago
    Running hundreds of agents overnight is almost certainly 99 percent waste.
    [-]
    - Pixel-Labs 2 hours ago
      [flagged]
- empath75 3 hours ago
  I think probably the correct spend is something closer to 10x that if people can figure agent coordination problems out. It's not even really about capability at this point, it's about keeping track of what agents are doing.
- sourcecodeplz 8 hours ago
  $1.5kpm for SOTA. 128gb you run DSV4 Flash.
  [-]
  - pqtyw 3 hours ago
    What's the point of running it locally though? Inference for open models is quite cheap already. They could just selfhost, anyway. The experience of running LLMs locally will be excruciatingly bad in comparison at least for the near future.
- jcgrillo 8 hours ago
  > WTF did Uber build with all of that spend?
  WTF did anyone build with all that spend? Despite all the feel-good anecdotes about how productive folks feel using ai coding tools there's a deafening silence when it comes to actual, demonstrated efficacy. How can we be this far entrenched in these workflows and still not know whether they actually do anything useful?
  [-]
  - awesan 8 hours ago
    I can say at least for me at a small-ish company (~40 FTE) there has been a surge in internal productivity tools. Nothing to improve the end user product directly but a lot of tools to make processes easier and less error prone.
    What would previously be janky internal dashboards or excel sheets are now actually nice to use tools. That said of course the maintenance cost of all that has yet to be discovered, and the ROI is questionable.
    [-]
    - CharlieDigital 8 hours ago
      About the same ~40 FTE team. We're doing the same thing. Smattering of internal tools, but no net gain in external revenue. Who knows which of those tools will have any value or ppl are just doing it because it's cool now to make fancy dashboards.
      OK. I guess that's good, too.
    - jcgrillo 8 hours ago
      Yeah this seems to be a pretty widespread story, from what I've heard as well. The thing about those janky dashboards and spreadsheets though is that somebody understood them and built them with intent to solve a particular problem. Despite the rickety appearance, they're trustworthy tools. A polished single page app might look nicer but it's harder to debug than an excel sheet, and much less transparent in its internal workings--especially if nobody actually wrote it...
      [-]
      - izacus 3 hours ago
        More importantly, it's questionable how much extra revenue improving the design of an internal tool brings.
  - ftkftk 3 hours ago
    ~70 FTE Engineering team. We are shipping more features, especially features that previously would not have survived the cut to make it on the roadmap. Even though we are shipping more, our total amount of escaped bugs has not increased, so our escape rate has actually lowered. On top of that we are able to triage and fix escaped bugs more quickly now. And then of course there has been an uptick in internal tooling that makes the rest of the company more efficient, and we have been able to address tech debt at a higher rate than before.
    I don't think this would have been possible without having solid engineering culture and processes in place before bringing in ai coding tools.
    And I don't want to sugarcoat it, this hasn't been easy, requires continued discipline, and took well over a year to get good at. And we still have to continuously learn, experiment and adapt our training, tooling, and processes.
    [-]
    - CharlieDigital 2 hours ago
      > We are shipping more features
      That's not really the important question; the important question: is it generating revenue.
      If you increase your spend -> ship more features -> no correlated increase in revenue, that's just burning money.
      If a team of 10 spends 1 extra headcount ($180k/year) and ships features with no corresponding growth in revenue, what does that mean?
      There was probably a reason it was on the backlog (because it didn't really have value).
      [-]
      - ftkftk 1 hour ago
        > is it generating revenue
        Yes! :)
        > There was probably a reason it was on the backlog (because it didn't really have value).
        There are definitely things in the backlog with low value. We don't work those items, even if we could now. The additional bandwidth we have now goes to valuable features that drive revenue and retention metrics. The reason they were on the backlog was that we just didn't have the bandwidth to execute on them well and they were just somewhat less valuable than the critical path items on the roadmap.
  - nonethewiser 8 hours ago
    The real answer?
    Software engineer quality of life.
    There can be an increase in productivity without a corresponding increase in total output. The gains could be captured by software engineers doing a days work in an hour then fucking off in a variety of ways.
    [-]
    - pqtyw 3 hours ago
      > doing a days work in an hour then fucking off in a variety of ways
      Until companies start hiring 5x fewer engineers than they did before, and well... we are clearly moving in that direction.
      [-]
      - nonethewiser 3 hours ago
        Quite possibly. Doubtful it will happen all at once. If you can get 8 hours of work done in 1, they'd need to ramp up demand 8x. Would be interesting to see that happen overnight. Happy Monday. Here, take these 30 tickets.
    - MengerSponge 2 hours ago
      But that's an inefficient use of dev salary. Y'all are gonna get ground to smooth well-compensated paste.
    - slopinthebag 3 hours ago
      Yeah I think this is probably most accurate.
  - RugnirViking 8 hours ago
    Imo it's pretty clear that anyone who is taking the issue at least somewhat seriously knows the amount of value they provide is non-zero. However, the problems are manifold: firstly, toolchains vary wildly, from fancy autocomplete, to engineers chatting with codebases they're unfamiliar with, to people integrating them into devops and infra, to people doing spec-driven development, with a thousand philosophies in between. Many people suspect that those above them on the ladder are on the cusp of massive failure due to losing track of the code, and many people higher on the ladder think those below them are overly cautious. I hate to be the guy saying "oh it must be somewhere in the middle", but I will say at the very least I like being able to use it to read docs for me, and to synthesize syntax and simple scripts (give me a join that works across these tables and gives me column x, y and z - give me a python script that parses a file like this example and extracts abc data - given this api spec figure out how I can get this data from this endpoint, go)
    As for building actually complex software, the art of that is not in simply chaining together such scripts. It's the art of using architecture and testing to shape uncertainty, and developing requirements (and extrapolating sensibly from incomplete requirements). I don't think LLMs are great at this, but they aren't terrible either. A lot of the more active users in the space are doing stuff where they've realised they need more detailed specs, which like, yeah, we knew this already - better defined problems lead to better software.
    [-]
    - jcgrillo 8 hours ago
      I agree the most interesting use cases I've heard of are about increasing the rigor of software development practices, but there's definitely a lack of coherence in methodology.. I believe that some users and companies are successful in this effort, but the odd (and interesting!) thing is that so far we don't seem to know how to communicate how to do it successfully.
- m3kw9 8 hours ago
  You can't get an edge using local models, these guys may have competitors that will spend on SOTA models. They won't likely ever consider local machines even for some offloading scenarios, the complexity and costs will be even higher.
  [-]
  - CharlieDigital 8 hours ago
    Consider rewiring your perspective: getting an edge doesn't really matter; the only thing that matters is will customers pay for this? Is this a useful, valuable problem to solve?
    Coding faster doesn't really solve that.
    Uber makes more money if people buy more rides, order more food, have some breakthrough in autonomous driving. They can save money if they can optimize some ops or spend somewhere. Is there any evidence that with the spend on AI that they achieved any of this? If they did, I'm sure we'd hear about it in some engineering blog.
- analognoise 7 hours ago
  18k/yr? None of the LLMs generate anything like that in value!
  [-]
  - simonw 7 hours ago
    I'm definitely getting that much value out of Claude Code and Copilot.
    [-]
    - CharlieDigital 7 hours ago
      You're a content creator; you define your revenue stream.
      Uber engineers do not define their revenue stream; the product leadership team does.
      $1500/mo of AI spend by engineers does not equate to revenue. They need to figure out revenue first before zeroing in on AI spend.
      [-]
      - Daishiman 3 hours ago
        $18K a year is a fraction of the salary of a junior engineer.
        Claude has allowed me to do refactors that would have taken weeks to instead take a couple of days. It has, objectively, increased the velocity of the engineering component of greenfield features by 40% in my org. You can put a number value on that and decide if it gives you favorable ROI.
        [-]
        fg137 58 minutes ago
        In the old world, the refactor probably won't happen in the first place, but the effort would be put elsewhere. "Increased velocity of .. greenfield features" doesn't directly translate to additional revenue, and your number is very questionable in the first place.
        Software engineers like to talk as if business and finance are as easy as pushing code out and refactoring. It's not and never has been.
        jg0r3 2 hours ago
        $18k a year is near half of my salary as junior verging on senior developer in the conservation field. Not everyone works in FAANG.
        analognoise 2 hours ago
        The point of a refactor is for you to think deeply about the code you are responsible for, so you can make it better (faster, easier to work on, more tests, whatever).
        You’ve gotten a result, but without the work that made you valuable, while deskilling yourself.
        It’s a lose/lose situation for…I would say anyone employed as an engineer or programmer. I’m not taking responsibility for AI output, the same way I won’t try to fix auto-generated code: because you just regenerate it.
        The only person that wins here is the person who can pay you less because they don’t need you, they just need another “types computer guy”.
        [-]
        Marsymars 1 hour ago
        > The point of a refactor is for you to think deeply about the code you are responsible for, so you can make it better (faster, easier to work on, more tests, whatever).
        I'm pretty pessimistic on AI and don't have access to good agentic workflows, but refactors are exactly the thing where it seems to me like agents could be really strong - once I've refactored something architecturally, I might have hundreds of instances of a thing that needs to be updated in a predictable way, but is complicated enough that it's going to be faster for me to manually update hundreds of instances rather than writing a generalizable find/replace tool.
        Daishiman 1 hour ago
        > The point of a refactor is for you to think deeply about the code you are responsible for, so you can make it better (faster, easier to work on, more tests, whatever).
        Absolutely false. Refactors (in my case) can be as simple as dropping old packages for newer packages with slightly different semantics. It can be moving legacy pages from jQuery to Vue.
        > You’ve gotten a result, but without the work that made you valuable, while deskilling yourself.
        I've 25 years coding, trust me, I don't lose anything by not finding out on my own that the semantics of a jQuery promise changed between major versions.
        > The only person that wins here is the person who can pay you less because they don’t need you, they just need another “types computer guy”.
        You have no idea what you're talking about. There are entire classes of K8s networking issues that would have taken me a day to debug which Claude solved in minutes just because it can run 20 diagnostic commands in two minutes and deal with technical minutiae that are time-consuming but ultimately irrelevant to my business goals.
    - ofjcihen 7 hours ago
      Can you share some examples that you would say justify that price? Not a gotcha, I’m genuinely curious where you’re seeing a return at that level.
      [-]
      - simonw 6 hours ago
        I've written tens of thousands of lines of tested, working code that I would not have written otherwise, and that code is useful to me.
        I effectively get to operate at the rate of a small team of engineers - I know that because I've managed small teams of engineers in the past.
        [-]
        ofjcihen 5 hours ago
        > that I would not have written otherwise
        I think this is the part I struggle with. The code I write makes me money or is a way of teaching me something, both of which are reasons that I would write the code regardless.
        I don’t think I have any projects in mind that I’d be willing to spend half of a car on that I also wouldn’t have written myself.
        Obviously just a personal take though. I’m glad you get the usage you want out of it.
        [-]
        simonw 4 hours ago
        My "job" is building open source software for data journalism (and anyone else who needs the tools data journalists need, which is pretty much everyone else). I can build more of those tools, and better, in exchange for a fraction of the cost it would take to hire a team to help.
suncemoje 47 minutes ago
Lock-in / switching costs are increasingly concerning me. I have been using Claude for a good year now and have accumulated so much "knowledge" in there. If Claude became less favorable in terms of price/performance in the future, that would worry me. I've started to think about a distributed solution, where my storage is detached from the inference, but currently Claude is still the way to go for me. Wondering if anyone has similar concerns?
[-]
- dadoomer 34 minutes ago
  Isn't all the "knowledge" just text files? I've transitioned between services easily by simply copying the text files.
- sparrc 35 minutes ago
  My favorite solution to this is to use the Cline coding agent, which is open and allows you to easily switch between different providers and models.
- spicyusername 39 minutes ago
  Knowledge in there?
  Where is the knowledge stored?
  All of my knowledge typically gets stored in plans outside of the agent?
  And each agent window gets archived regularly, anyways.
colonelspace 1 hour ago
If a worker doesn't use their AI/LLM budget, can they get a raise?
[-]
- asadm 1 hour ago
  probably will get fired for lack of performance.
  [-]
  - colonelspace 54 minutes ago
    Let's just say their performance (OKR, KPI, whatever "impact" metric you want) was indistinguishable from a peer that used the AI/LLM monthly allowance in full.
    Maybe a $10k raise would be nice?
    [-]
    - HDThoreaun 48 minutes ago
      They'd get a bad review for leaving performance on the table. When has finishing your work ever resulted in anything other than more work?
      [-]
      - conartist6 28 minutes ago
        It's disturbingly anti-meritocratic. You're not allowed to prove that you're more useful without AI because they just assume that AI is a 10x multiplier on everyone.
john01dav 2 hours ago
Why isn't self-hosting (even just renting a GPU server, not necessarily on premise) at large companies, or hosting via something like together AI to run the open weight models, more common? I've tried the open weight models and the premium models like Opus and Gemini Pro, and I find that the latter are a little better, but not nearly to the degree to justify the extreme price difference, since the differences largely don't matter for what I've tried them for, and I expect that many other users likely have similar use cases.
[-]
- Jianghong94 1 hour ago
  I just went through a similar discussion at my $WORK (traditional finance company on NYSE with average IT expertise) and I think the thought process goes like this: it's one thing to just give your stellar dev/hacker a beefy GPU server and let them run whatever model they can; it's another thing to maintain such a platform company-wide. You would need people (likely way above normal software dev paygrade) to understand and maintain such models, maintain the backend, availability etc. All this extra hassle makes it just easier to pay a top tier external lab + slap a reasonable spending limit on everybody.
- soleveloper 1 hour ago
  If the premium models are just about 10% better - that could justify the price vs. self hosting a ~0.5-1T open weights model.
  Remember that utilization of these huge racks will not be 24/7, and these are usually not GPU-intensive shops that would train models on the spare compute. With prices of 100-200k USD and up and a ~2-year lifetime, that would be hard to justify financially.
  Self hosting could easily amount to ~1000 USD a month amortized across many developers. In rush hours - there will be hard rate limits.
  Would that 1500 - 1000 = 500 USD monthly saving justify the 10% decrease in "AI productivity"? I guess not, in most cases.
  To everyone who asks me, I'd say that in the short term, unless there's a really good reason to self-host these coding assistant models, the big two or three coding assistant providers are the better choice.
  No one got fired from licensing claude code.
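A hedged sketch of the amortization argument above, covering hardware cost only. The 100-200k USD rack price and ~2-year lifetime come from the comment. The number of developers sharing a rack is a hypothetical value chosen for illustration, and staffing, power and hosting costs are left out, so real per-seat costs would be higher.

```python
# Hardware-only amortization of a self-hosted inference rack.
# From the comment: rack price 100-200k USD, ~2-year (24-month) lifetime.
# Hypothetical: SEATS_PER_RACK. Excluded: staff, power, hosting, idle time.

LIFETIME_MONTHS = 24
SEATS_PER_RACK = 10  # hypothetical number of developers sharing one rack


def monthly_hardware_cost(rack_price_usd: float, lifetime_months: int) -> float:
    """Straight-line monthly cost of the rack over its lifetime."""
    return rack_price_usd / lifetime_months


def per_seat_cost(rack_price_usd: float, lifetime_months: int, seats: int) -> float:
    """Monthly hardware cost per developer sharing the rack."""
    return monthly_hardware_cost(rack_price_usd, lifetime_months) / seats


def monthly_saving(hosted_per_seat: float, self_hosted_per_seat: float) -> float:
    """Per-seat saving from self-hosting; negative means it costs more."""
    return hosted_per_seat - self_hosted_per_seat


if __name__ == "__main__":
    for price in (100_000, 200_000):
        seat = per_seat_cost(price, LIFETIME_MONTHS, SEATS_PER_RACK)
        print(f"${price:,} rack: ${seat:,.2f}/seat/month (hardware only)")
    # The comparison made in the comment: $1,500 hosted vs ~$1,000 self-hosted.
    print(f"saving per seat: ${monthly_saving(1500, 1000):,.0f}/month")
```

The per-seat figure is very sensitive to how many developers a rack can actually serve at peak, which is the part of the estimate with the least support here.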
- esikich 1 hour ago
  Why do you think it would be more common? The pooling of GPUs to serve multiple users and connecting to docs/datalakes while respecting security controls, as a start, is non-trivial. You'd end up paying a team to manage that.
- fg137 1 hour ago
  For the same reasons companies are not building data centers for their "regular" hosting and storage needs but put things on AWS, Azure etc.
  It costs money to maintain the hardware and hire experts to manage the services. For something as common as LLMs, there is absolutely no reason for a company to serve models on its own hardware unless it is paranoid about sending bytes to AWS.
- fg137 41 minutes ago
  > I've tried the open weight models ...
  You tried that on a personal machine for yourself once. It's a completely different calculation when serving a model to 3000 employees with ever-evolving hardware and software requirements. You'll need dedicated hardware in data centers and experts to run them. A company will need to figure out how to manage acquisition, assets and expenses plus 1000 other things, in addition to its actual business. Guess who has figured out all of that already? AWS/Azure/OpenAI etc.
- datsci_est_2015 1 hour ago
  There’s probably plenty of money to be made in LLMs as a service - but not enough time has passed for the commodification to occur. I’m with you in that when the dust settles I don’t think any of the frontier model providers will have a moat. Just like during the dotcom boom a catchy URL and a webpage that could accept payments wasn’t a moat, either.
- malfist 1 hour ago
  Where are you buying the GPUs to have enough compute to run a medium-sized business?
jkwang 8 hours ago
The $1500 number is less interesting than the fact that they hit a ceiling at all. Most engineering teams I've talked to have no idea what their AI spend is per developer because it's buried in a consolidated cloud bill. Having a hard cap forces two useful conversations: what workflows actually justify API calls vs local inference, and whether the output is being measured against any real productivity metric. Without that feedback loop it's just a race to see who can burn tokens fastest.
[-]
- simonw 8 hours ago
  Both the Anthropic and OpenAI "Enterprise" plans include per-developer analytics:
  Anthropic: https://support.claude.com/en/articles/12883420-view-usage-a...
  OpenAI: https://help.openai.com/en/articles/10875114-workspace-analy...
  [-]
  - Igrom 3 hours ago
    I believe you might be replying to a bot account.
    [-]
    - lazyasciiart 3 hours ago
      What makes it look like one? All their dead comments read pretty normal to me.
geodel 2 hours ago
> A $1,500 monthly limit per tool strikes me as a rational policy response to over-spending,...
> I noted that my own token usage comes to about $1,000/month against each of Anthropic and OpenAI - which currently costs me just $100 per provider thanks to their generous subsidized plans for individual subscribers.
This whole article reminds me of multi-level marketing "businesses", where 'Diamonds' have made their money by promoting MLM in seminars and telling hopefuls at the bottom that "Buying an AI subscription now is their one shot to be a winner in life".
Perhaps there is something to MLM vs LLM to create a FOMO effect.
[-]
- iLoveOncall 1 hour ago
  That's just Simon Willison since LLMs came out. It's glaringly obvious that he's a paid shill.
  [-]
  - fontain 1 hour ago
    oh come on, a paid shill?
    Simon is very fascinated by AI and at times he can be a little too optimistic but he is generally balanced and his perspective evolves over time which can be seen in his writing.
    Nerd who loves nerd things a little too much? Sure. Paid shill by Big LLM? Nah.
    [-]
    - iLoveOncall 1 hour ago
      Yes, a paid shill. You can find a clear point in time where he shifted from sceptic to 1000% fully onboard non-stop praise, with no reason.
      [-]
      - HDThoreaun 47 minutes ago
        Maybe the reason is because he thought the tools became really powerful?
szatkus 2 hours ago
That's a lot. On my usual day I burn less than $1 on Opus. I could get beyond $10 only if I have a complex and well-defined problem, which is rare (the second part at least).
pmontra 3 hours ago
I wonder what they are doing with $1500 per month. I'm on Claude Pro $20 plan and I'm doing well. That's 3 days per week. On the other 2 days I'm using a customer's Claude Max, I don't know if it's the $100 or the $200 plan, but I'm sharing it with some of its other developers.
[-]
- hrpnk 3 hours ago
  $1500/mth is token pricing.
  Your other plans are fixed price with rate limits, where you get more tokens than the dollar equivalent you pay monthly. These plans are economical only if the majority of users spend fewer tokens, in $, than the plan costs. This subsidizes the gap vs. power users who spend multiple k$ monthly in API tokens.
  [-]
  - pmontra 2 hours ago
    > Your other plans are fixed price with rate limits where you get more tokens than the dollar equivalent you pay monthly.
    Or the fixed cost plans reflect the real cost and the people paying API prices give them the profit.
    Anyway, none of my customers will let me bill them $1500 more (about $75 per day) because I'm using AI. And what for? I'm not working to move money from the pockets of my customers to the pockets of AI companies.
    [-]
    - fontain 1 hour ago
      No, we know from the financials of these companies that API prices are close to being at cost and the individual developer plans are heavily subsidized (because they are roughly 10% of API cost per token[1]).
      If plans were at cost and API pricing was marked up that would mean there’s a 90%+ profit margin on tokens and instead of raising money and talking about revenue, Anthropic and OpenAI would be talking about their obscene profits.
      [1] the caveat is that the average plan user probably doesn’t use all of their quota, I guess maybe 30% is the average across all users.
  - kingstnap 2 hours ago
    Next to no one would be using less than the subscription price given how expensive Opus API is.
  - flyinglizard 3 hours ago
    Yea, I’m sure the personal plans are subsidized. I have $200 Claude Max at home and straight API pricing at work and equivalent work would easily cost me 5x if not more on the API.
- SyneRyder 2 hours ago
  I'm on a $100 Claude Max plan, my usage is only about 50% of the plan limits, but in the last 30 days my usage was equivalent to API token spend of $1850. If you save all your Claude Code conversations, the saved files include API costs and you can calculate this yourself.
  One of my most expensive sessions cost me over $100 in token spend in a single evening. I'd just found out that the time tracking & invoicing SaaS I use is increasing their monthly pricing by 2.4x - so I assigned Claude Opus 4.8 to recreate the entire SaaS for myself, and load in 13 years of my historical data. I've only completed a full read-only implementation so far, with adding & editing of records still to come, but I do expect Claude will have fully recreated the entire SaaS for me at an API cost less than a single 1 year seat of continued subscription to their service. And since I'm actually on a Max plan, it didn't actually cost me $200 of tokens at all.
  coff i would not buy the Bending Spoons IPO coff saaspocalypse
  I could ramble on about where the other $1750 of usage goes, but I imagine it's similar for most heavy Claude / AI users. Interactive coding sessions, a daily personalized podcast, some automated overnight agentic "proactive" sessions, a daemon that wakes up if I send Claude an email or voicetext to check something when I'm out. I've also noticed that if Claude's tool-use goes haywire & Claude gets confused or lost, sometimes a single email reply session that would normally be just $1 of API might spiral to $12 of API while it bangs its head against trying to run a program that's in a different folder to the one it's currently in. Sometimes a simple 'pwd' would save you a lot of headache, Claude....
- idiliv 3 hours ago
  Uber is likely on an enterprise plan - these charge tokens at API cost, which can be much more expensive than the $20 flat rate.
etothet 3 hours ago
In my experience, this is far below the cost the average dev will incur per month so this seems very reasonable to me. And, no doubt there are exceptions for heavy users so they can get some extra token usage when they need it.
[-]
- waffuldrop 3 hours ago
  unless they changed something in the 2 months or so since I left my job there (edit: besides implementing a cap for Claude Code specifically, since other tools already had caps), I'm pretty sure $1,500 is the very max you can use after maxing out free calls, the initial budget, then 2 extensions individually reviewed by your manager
  higher ups pushed for these last 2 years to be AI focused so I don't think this restriction is a measure of "don't use too much AI" as much as it is a measure of "don't use only 'manual' AI tooling" since we had a dozen more specialized tools in-house running locally or otherwise that didn't count towards the budget
cmiles8 2 hours ago
And $1500 a month is on the very high end of where most companies will land. When you run the numbers there isn’t a realistic path that connects the dots between likely market size and the claimed valuation of the AI companies. The math simply does not add up.
newobj 4 hours ago
It's also a useful signal for AI value. Looks like it's a max value add of $18,000 per engineer per year.
[-]
- Anon1096 3 hours ago
  No, that's not what it means at all, even if just doing it purely in math terms. Really it is just a reasonable amount to cap at to stop the long tail of super spenders (tokenmaxxers). You could also call it "the amount of AI spend after which Uber has decided there are diminishing returns for the average engineer".
  [-]
  - dandellion 2 hours ago
    I'm sure if a dev can show useful results at 1k they won't have trouble getting permission for a higher cap as well.
- csallen 3 hours ago
  It's not so simple to determine and generalize how much value AI adds. It's going to be different on a per-company basis and a per-engineer basis. It's also affected by the competitive market place and how many other companies are using AI for their engineers.
  For example, what if you're a tiny startup and you're considering whether to hire an extra engineer or do all the coding yourself. I would estimate that AI is worth far more than $18,000 a year in that situation where you might reasonably decide to put off hiring an engineer.
- pqtyw 3 hours ago
  I find it really doubtful anyone has managed to quantify that in any meaningful way. Seems like mostly an arbitrary number. Also, the article does claim that it's actually several times more than 18k if you are fine with using Codex, Cursor, etc. when your Claude tokens run out.
- themafia 34 minutes ago
  It means Uber thinks they can sustain that level of expense. Whether engineers at Uber are representative of the rest of the work force is an easily debatable question.
- alasano 3 hours ago
  Their initial budget for determining how much value AI adds is $18,000 per engineer.
- tfehring 3 hours ago
  Not really. There are clearly diminishing marginal returns, so it's likely that the first $2,400/engineer/year adds >>$2,400 of value, even if the 18,001st $/engineer/year adds <$1 of value.
- eqvinox 3 hours ago
  It's among a wave of fresh "non-insane" takes on AI in the enterprise. Maybe we can reel things in to a sustainable level before a giant bubble bursts.
galaxyLogic 3 hours ago
It's probably a good thing that Uber developers are now forced to do some coding on their own. Only use AI where it absolutely helps.
[-]
- sva_ 3 hours ago
  Or be smarter about their usage. $50 on tokens per day can get you a long way.
  [-]
  - estomagordo 3 hours ago
    Some people also take weekends off.
- aerhardt 2 hours ago
  I don't think at $1,500 you're forced to code on your own at all, in the sense of typing code. You're simply forced not to yolo-max twelve parallel agents at all times.
walthamstow 45 minutes ago
I think a lot of people are missing that this is $1500 _per tool_ which is still rather a lot of money.
transitorykris 1 hour ago
Is anyone doing story point estimation in terms of tokens? If you have a token budget, does this change how you prioritize?
[-]
- sanex 41 minutes ago
  I think there's too much variance between what model you're using and how much you turn your brain off. If I just paste a ticket number into 4.8xHigh it's going to use a lot more tokens than if I read the ticket, tell Sonnet what it needs to do, make my commit, run unit tests myself, etc.
PessimalDecimal 8 hours ago
These are still at currently subsidized prices. We'll see if they think they're getting $1500/month of value when that buys significantly fewer tokens.
[-]
- square_usual 8 hours ago
  There is no evidence that per-token inference prices (which is what Uber is setting a cap on) are subsidized.
  [-]
  - pier25 8 hours ago
    AI companies have more expenses than inference.
    [-]
    - RugnirViking 8 hours ago
      yes, and there's no evidence that they aren't using (or can't use) profitable inference to subsidise those other expenses. Some companies will keep spending massively to train better models, and some other companies will not, and will offer good API prices. Which will end up being used? That depends on whether the spending turns into better-value models
      [-]
      - pier25 3 hours ago
        > there's no evidence that they aren't (or can't) use profitable inference to subsidise those other expenses
        as far as we know there's no evidence that they can produce any profits at all
  - lelanthran 8 hours ago
    Is there any evidence that it's not?
    [-]
    - Topfi 8 hours ago
      The fact that Anthropic models are offered at the same API pricing not just by Anthropic itself but also by AWS, Azure and Vertex, despite Anthropic taking a major slice on licensing, along with what an open-weight 1T-parameter model like K2.6 costs to run on any third-party provider, makes it unlikely that API inference costs are subsidized by the labs.
    - pqtyw 3 hours ago
      OpenRouter? I.e. even excluding DeepSeek, inference for very large open models is way cheaper. Maybe these providers are not very profitable, but it's highly unlikely that they are losing $4 for every $1 they make, since selling inference is their only product...
    - thejazzman 8 hours ago
      Yes; they ban various uses of their subscriptions but say you can do whatever if you’re paying for the API without limits
      [-]
      - pqtyw 3 hours ago
        That's just market segmentation and them trying to maximize revenue; it doesn't really say anything about their costs.
      - lelanthran 8 hours ago
        That's not evidence. Very likely though, but the only evidence we get one way or another is when they IPO.
      - simonw 8 hours ago
        This story isn't about those subscriptions - enterprise customers like Uber are paying the full API prices.
- pdyc 8 hours ago
  afaik, enterprise plans are not subsidized. It's $20/seat + API pricing. Unless you are saying API pricing itself is subsidized.
  [-]
  - LurkandComment 8 hours ago
    This is introductory market pricing that hasn't factored in cost recovery. Most of it has been run on early investment with the assumption they will recover costs in the long run. The prices are subsidized across the board and they will need to go up significantly to recover those costs.
    [-]
    - swiftcoder 8 hours ago
      Assuming this were accurate, then presumably the AI companies would be betting that inference costs come down before the bill is due - I don't see enterprises being willing to absorb another ~10x price increase for tokens (as they've just done going from subscription prices to per-token pricing)
      [-]
      - LurkandComment 7 hours ago
          For Claude shops this was a huge hit. But let's back this up. There are some companies that haven't even built a break-even model at this price because they are funded by investment. As soon as those investors lose patience the first dominoes will fall. For those who have somewhat of a business model, will it survive a price increase? The bigger question is whether the base model providers have enough runway and a way to keep going while they need to recover costs.
        [-]
          - pqtyw 3 hours ago
            It's mostly R&D though, not inference. If LLMs effectively become a commodity then they are screwed anyway.
            [-]
            - swiftcoder 2 hours ago
              Aren’t the Chinese labs quickly turning them into a commodity?
              The open-weight models will have a steady race to the bottom on inference costs just by dint of competition between providers. They aren’t at the frontier yet, but they are rapidly eating the flash market.
    - pqtyw 3 hours ago
      Yeah, that's not going to work if you can get e.g. 80% of value by using 10-20x or more cheaper open models. At some point it would just make sense for large companies to rent compute and deploy their version of DeepSeek or whatever (if they don't trust Chinese providers)
    - logancbrown 8 hours ago
      None of what you said is true
      [-]
      - rimliu 8 hours ago
        And you know this how?
- pqtyw 3 hours ago
  The inference prices for very large open models would indicate that Anthropic's and OpenAI's margins are quite large.
- boringg 8 hours ago
  True but they will raise prices slowly so people will optimize their workflow so they aren't just throwing as much inference as fast as possible like the current state. Right now you should do everything you wanted to try out because it is cheap (as long as you don't become dependent ... the risk).
- sourcecodeplz 8 hours ago
  I understand current Codex $20 sub is worth about $480 GPT5 api credits.
  [-]
  - esafak 3 hours ago
    Way more. Track with https://github.com/junhoyeo/tokscale
- MagicMoonlight 8 hours ago
  It's not. They recently forced enterprise customers onto API billing instead of the cheap consumer pricing. Now the pricing is brutal.
epsteingpt 8 hours ago
Uber engineers reported that loading their workspace and pulling recent commits exhausted that AI limit for Claude Code (4.8 x-high) immediately.
[-]
- wmf 3 hours ago
  I don't think loading up a single context window costs $1,500. Which limit are you talking about?
rasbmn 3 hours ago
Uber is in the business of experimenting with robotaxis and automated food delivery.
They can't say that $0 per employee is the appropriate amount for AI spending. So they capped it, perhaps in order to "send a signal" that is eagerly picked up by the AI boosters.
There is no signal. Uber does not work any better since AI. They still want to promote AI, so they chose the highest number that doesn't bankrupt them so the press and AI promoters pick it up as the new price anchor.
Probably they'll quietly reduce the number more soon.
[-]
- lazyasciiart 3 hours ago
  Is this inside knowledge, or speculation?
LurkandComment 8 hours ago
1) This happened because they fundamentally misunderstand how to use AI and how AI is priced. 2) Most organizations are throwing everything in for analysis and not limiting the answer they want. You need to be specific about what you analyze and what answers you want. 3) People undervalue prompting and templated responses. I will have written, validated and sanity-checked a prompt several times and run it across several models before I say it's ready for use. But when it is, I know what it will give me, and that the scope of its research and answer is as close to what I want as it can be, with as little excess as possible. This all saves tokens.
5701652400 2 hours ago
eventually tokens will cost price of energy. and china is miles ahead.
china will be major token exporter soon. mark my words.
[-]
- cmiles8 4 minutes ago
  Electricity actually is only a small part of the data center costs. There are challenges in getting enough electricity that create problems, but the cost of the electricity really isn’t an issue.
- dude250711 1 hour ago
  Technically, tokens travel both ways.
cadamsdotcom 53 minutes ago
Token costs rising because data center build costs must be paid down.. is not the whole picture. It is actually possible for token costs to fall despite the spending frenzy.
Naively you’d expect to always keep paying more - but growth in token usage is what changes the equation. Amortizing debt over an exponentially growing amount of spend across a growing customer base (not per customer) lets the debt be paid off & costs covered even as each individual’s spend stays steady or even goes down - but it only works if there’s growth beyond some threshold that makes the whole thing hang together. No one on the outside knows how much growth that is, and everyone chases maximum growth.
Jevons Paradox ends up being your friend as well as the friend of the inference providers as well as the friend of the inference financiers.
If it’s a strong enough effect, it has potential to cancel out all the circular financing too, and let everyone ride out the bursting of the bubble.
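A minimal sketch of the amortization argument above, with entirely hypothetical numbers: if the yearly debt payment is fixed and token volume grows, the price per million tokens needed to cover that payment falls each year. The function name, the payment size, the starting volume and the growth rate are all assumptions for illustration, not figures from any provider.

```python
def required_price_per_mtok(annual_debt_service, mtok_year0, growth, years):
    """Price per million tokens needed each year to cover a fixed annual payment.

    annual_debt_service: fixed yearly payment in dollars (hypothetical)
    mtok_year0: millions of tokens sold in year 0 (hypothetical)
    growth: yearly volume growth rate, e.g. 1.0 means volume doubles
    """
    prices = []
    for year in range(years):
        volume = mtok_year0 * (1 + growth) ** year
        prices.append(annual_debt_service / volume)
    return prices


# Hypothetical scenario: $1M/year payment, 100k million tokens in year 0.
growing = required_price_per_mtok(1_000_000.0, 100_000.0, growth=1.0, years=4)
flat = required_price_per_mtok(1_000_000.0, 100_000.0, growth=0.0, years=4)

for year, (g, f) in enumerate(zip(growing, flat)):
    print(f"year {year}: ${g:.2f}/Mtok with doubling volume, ${f:.2f}/Mtok with flat volume")
```

With flat volume the required price never falls, which is the "growth beyond some threshold" caveat: the sketch ignores operating costs, interest and hardware depreciation, all of which raise that threshold.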
KnuthIsGod 49 minutes ago
China will bring down the price per million tokens.
hrpnk 3 hours ago
If budgeted at $1,500/month per user, power users still can get 5-10x of that allocation if the user pool is large enough.
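A rough sketch of how a pooled budget could give power users a multiple of the per-user allocation. All numbers here are hypothetical (pool size, typical spend, number of power users), and the function name is made up for illustration; whether Uber's cap is actually pooled is not stated in the thread.

```python
def pooled_headroom(num_users, budget_per_user, typical_spend, num_power_users):
    """Dollars available to each power user once typical users have spent their share.

    Assumes one shared pool of num_users * budget_per_user and that every
    non-power user spends exactly typical_spend (a simplification).
    """
    pool = num_users * budget_per_user
    typical_users = num_users - num_power_users
    leftover = pool - typical_users * typical_spend
    return leftover / num_power_users


# Hypothetical: 1,000 users budgeted at $1,500 each, 960 of them spending
# $1,250 a month, leaving the remainder for 40 power users.
per_power_user = pooled_headroom(1000, 1500, 1250, 40)
multiple = per_power_user / 1500

print(f"each power user can draw ${per_power_user:,.0f}, about {multiple:.1f}x the allocation")
```

The multiple depends heavily on how far below budget the typical user stays and on how few power users there are, so the 5-10x range only holds for a large pool with a small heavy tail.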
nphardon 34 minutes ago
It's wild; at my shop in Silicon Valley they dropped us from unlimited use to 60% prem budget on copilot. People are walking around like zombies.
[-]
- conartist6 32 minutes ago
  Poor people! Thinking takes calories
jwpapi 8 hours ago
If you estimate a $10k monthly salary per engineer, the $1,500 cap is 15% of salary: that is the point where it becomes cheaper for them to hire another engineer. That doesn't mean it's improving productivity by 15%, but if 15% is the point where it stops being better than another human, can we assume 7.5%?
Probably even less, because you would also spend those 1500 extra per employee if you just saved 10%, so 150 per employee, which is 1.5% of salary.
This is imho one of the best ranges we can assume for now. How much would that be across the whole SWE market?
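The percentages in this comment can be reproduced with a few lines of arithmetic. This is only a sketch under the comment's own assumption of a $10k monthly salary; the variable names are illustrative and the 7.5% figure is simply half of the 15% ceiling, as the comment guesses.

```python
# Assumption taken from the comment: $10k monthly salary per engineer.
monthly_salary = 10_000
monthly_cap = 1_500

# The cap expressed as a percentage of salary (the "15%" ceiling).
cap_pct = 100 * monthly_cap / monthly_salary

# The comment's midpoint guess: half of the ceiling.
midpoint_pct = cap_pct / 2

# The lower bound: 10% of the cap ($150) as a percentage of salary.
saved_amount = monthly_cap * 10 // 100
lower_pct = 100 * saved_amount / monthly_salary

print(f"cap is {cap_pct}% of salary, midpoint {midpoint_pct}%, lower bound {lower_pct}%")
```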
ilia-a 8 hours ago
Seems like an odd limit, especially since it is highly dependent on the token provider used: with Opus this is not much and could easily be burnt in a week or less, but with something like DeepSeek the 1500 could literally be an annual budget.
That being said, I do have to wonder why someone as big as, say, Uber doesn't simply roll out an OSS model in the cloud for their team. I'd imagine that would be the cheapest and most flexible option, while also keeping all the data shared with the LLM private.
[-]
- iceman28 8 hours ago
  It’s not just about the model but also setting up the system to create and share compute (GPUs), which is quite complicated on its own. Uber’s primary business focus isn’t infrastructure.
ipunchghosts 1 hour ago
Why aren't they using Claude code 20x for 200/month?
[-]
- hazelnut 1 hour ago
  if you have more than x seats, you have to use Enterprise pricing as far as I know which is pay as you go with a pool.
insane_dreamer 1 hour ago
I still have never hit a ceiling with my Claude Max $100 account, much less the Max $200 account. I'm not burning tokens needlessly, nor running it all day, but I do use CC almost daily. What are these devs doing that they are burning more than $1500 in tokens a month?
Maybe it's just me, but I still find that I really have to "shepherd" the AI and work with it to get the results I want. And I read every line of code added and challenge the model's logic. So that limits my token burning. Maybe these people are just "vibe-coding" without really checking the results?
ChrisArchitect 9 hours ago
Related:
Uber’s COO says it’s getting harder to justify money spent on tokenmaxxing
https://news.ycombinator.com/item?id=48268871
Uber torches 2026 AI budget on Claude Code in four months
https://news.ycombinator.com/item?id=47976415
Corporate America Is Starting to Ration AI as Cost Skyrockets
https://news.ycombinator.com/item?id=48335388
cloudking 8 hours ago
They are also beholden to enterprise pricing and can't use the subsidized consumer max plans.
cyanydeez 2 hours ago
no....the fact that you could buy a reasonably priced Mac or AMD 395+, that's AI tool pricing; it loads a big enough model and spits out tokens just fast enough that you can read what it's doing and comprehend it instead of it being magic.
That's the most useful signal. Pre OpenAI mafia RAM pricing, that comes out to $250/month.
sremani 7 hours ago
I have a strong conviction that companies will now choose tech stacks/programming languages based on 'tokenomics'. I am vibe coding using Clojure, a language I can read but cannot write, and I never hit the usage limits even when using the latest model on Claude. I have a similar experience with F#, which is a bit more verbose than Clojure but absolutely beats every OOP language, Python, TypeScript etc.
The reason, I use F# & Clojure is they hit JVM and CLR, two popular enterprise stacks.
In my not so humble opinion Lisp(Clojure) still remains the language of AI.
[-]
- genericone 1 hour ago
  Typescript is also hugely represented. My projects are TS in a big way, where I have no experience with it at all.
jedisct1 8 hours ago
A lot of things can be done with local models.
[-]
- rimliu 8 hours ago
  Even more things can be done without any models just as well.
  [-]
  - dude250711 8 hours ago
    Single developers seeking local models.
    [-]
    - fHr 1 hour ago
      goated comment
ashahin 8 hours ago
[flagged]
[-]
- onlyrealcuzzo 8 hours ago
  It's interesting to me how ineffective LLMs are at refactoring, but when you think closely about how they work, it makes sense.
  They are good at searching for things that have been done 10,000 times before, and slightly changing them. This is the majority of all "new" features.
  Almost nothing is "new"...
  Refactors are not this. If you can't just write a gsub to do the work, they need to essentially break it up into N problems to solve, each of them pretty slow and expensive. Sure, none of these problems individually are "new" - which is why they can do it. But they can't do it as effectively as you'd think.
  [-]
  - jbvlkt 4 hours ago
    Exactly my experience. I always refactor first myself then delegate boring tasks to AI. It saves me energy, time and also tokens. If code is not prepared for easy implementation agents always fail.
- hanzeweiasa 8 hours ago
  Good point about the unit of consumption shifting from prompts to agent loops. That makes pricing even trickier for vertical-specific AI tools.
  We see this firsthand building AI Workdeck (open-source AI workspace for legal teams). A single due diligence review might chain 20+ agent calls: OCR -> text extraction -> clause classification -> risk scoring -> evidence chain assembly. The user sees one action, but the backend burns through significant inference.
  The interesting thing about vertical tools is the pricing model can be fundamentally different. Horizontal tools charge per seat or per token. But in legal, the value is in the document, not the seat. A lawyer reviewing a 500-page M&A file gets way more value than one reviewing a 2-page NDA.
  Self-hosting changes the calculus too. Our users run on their own infra, so the AI cost is whatever their GPU costs. That makes $1,500/month caps less relevant and throughput optimization more important.
- slopinthebag 3 hours ago
  LLM generated comments are against site rules btw.