Nvidia Launches Vera CPU, Purpose-Built for Agentic AI

(nvidianews.nvidia.com)

161 points | by lewismenelaws 17 hours ago

26 comments

WhitneyLand 13 hours ago
Agentic AI CPU? No.
It’s a CPU designed for an AI cluster. Their last CPU Grace was the same thing and no one called it agentic.
Vera now just has more performance/more bandwidth. It’s cool, I’d like to have one of these clusters, but this is not new.
It’s marketed as agentic AI because that’s fashionable in 2026.
[-]
- storus 13 hours ago
  They significantly lowered latency compared to EPYC/Xeon, which is critical for streaming agents (e.g. text/audio/video agents).
  [-]
  - stingraycharles 11 hours ago
    What latency? How much is it compared to LLM inference speed?
    [-]
    - storus 12 minutes ago
      See the Redpanda comment/link here.
PeterCorless 16 hours ago
This is the related benchmark blog from Redpanda [disclosure: I work for Redpanda and I helped write this. Credit to Travis Downs & others at Redpanda for the heavy lifting on the testing and analysis.]
https://www.redpanda.com/blog/nvidia-vera-cpu-performance-be...
baal80spam 16 hours ago
Say what you want about NVIDIA (to me they are just doing what every company would do in their place), but they create engineering marvels.
[-]
- szmarczak 13 hours ago
  [flagged]
  [-]
  - jazzpush2 13 hours ago
    Who is NVIDIA killing, exactly?
    Is Apple complicit in killings because operators planned missions on Macbooks? Dell? Microsoft?
    [-]
    - penguin_booze 6 hours ago
      Whom, not who.
    - szmarczak 6 hours ago
      [flagged]
      [-]
      - tomhow 5 hours ago
        > You're lacking all the brain cells.
        You can't comment like this on Hacker News, no matter what you're replying to.
        Please take a moment to read the guidelines and make an effort to observe them in future.
        https://news.ycombinator.com/newsguidelines.html
        [-]
        szmarczak 5 hours ago
        Immoral behavior needs to be called out. See https://news.ycombinator.com/item?id=47274075
        [-]
        tomhow 4 hours ago
        The guidelines don’t get disregarded just because the topic is important to you. People have been trying that trick since the beginning. The guidelines are only worth having if they enable us to discuss difficult topics without burning this place to the ground. If you’re not able to discuss a topic without making swipes about someone’s “brain cells”, you need to find a different discussion forum. This site is only here for you to post that kind of trash because other people make the effort to raise the standards rather than drag them down.
        [-]
        szmarczak 3 hours ago
        > The guidelines are only worth having
        I'm not saying they aren't.
        [1] This question is trash as well. I admit I got baited. My bad.
        [1] https://news.ycombinator.com/item?id=47406857
  - zoklet-enjoyer 13 hours ago
    GE, Samsung, Microsoft, Google, IBM, and so many others
d_silin 17 hours ago
It is a 88-core ARM v9 chip, for somewhat more detailed spec.
[-]
- PeterCorless 16 hours ago
  Vera does what NVIDIA calls Spatial Multithreading, "physically partitioning each core’s resources rather than time slicing them, allowing the system to optimize for performance or density at runtime." A kind of static hyperthreading; you get two threads per core.
  It's somewhat different from how x86 chips do simultaneous multithreading (SMT),
- mixmastamyk 16 hours ago
  Hmm, the 128-core Ampere Altra CPU is already available, and in a case from System76. I wonder what else differentiates it.
  If they're going to build CPUs I wish they had used Risc-V instead. They are using it somewhat already.
  [-]
  - OneDeuxTriSeiGo 15 hours ago
    You can see here[1] what the specs are for the CPU (listed as "NVIDIA Vera Rubin Superchip").
    The CPU is integrated with two Rubin GPUs but each of the CPU cores has dedicated FP8 acceleration as well.
    1. https://www.nvidia.com/en-us/data-center/vera-rubin-nvl72/
  - ibgeek 9 hours ago
    I own one of these systems. My interpretation is the Ampere systems are targeted at lower cost scale out. The Ampere Altra CPUs are limited to DDR4. The raw single core performance doesn’t match Intel or AMD offerings. You get a lot of cores for a lower hardware cost and at lower energy usage.
    The Nvidia CPUs are designed for a very specific use case. They are designed for high performance with less concern about cost control.
    The newer AmpereOne CPUs use DDR5 with the AmpereOne M supporting even higher memory bandwidth. Even then, I doubt the AmpereOne CPUs will match the performance of the Nvidia Rubin CPUs. But the Ampere processors are available for general use. I am guessing that Nvidia is only going to sell the complete rack system and only to high-volume customers.
gcanyon 16 hours ago
Anyone know how this compares to Apple’s M5 chips? Or is that comparison <takes off sunglasses> apples to oranges.
[-]
- pdpi 16 hours ago
  Features like hardware FP8 support definitely make it apples-to-oranges.
  [-]
  - philjohn 14 hours ago
    But doesn't the Apple M series NPU support FP8, and as it's a monolithic die (except for the GPU in the M5 Pro and Max) it could be argued it has hardware FP8 support, no?
    [-]
    - pdpi 12 hours ago
      By that logic, on the M4 (which still has the GPU on the same die as the CPU), CPU cores have hardware accelerated raytracing, which is obviously nonsense.
    - llm_nerd 2 hours ago
      Apple's hardware does not support FP8 (neither the ANE NPU, or the new "neural accelerator" tensor cores), though the most recent variant supports INT8.
    - badc0ffee 14 hours ago
      I thought the M5 had FP16 support, and not FP8.
- pjmlp 5 hours ago
  It doesn't matter, because you will never find M5 chips on cloud offerings, or server racks.
  It is kind of rediculous that the only server option with Apple hardware has been to stack up mac minis.
  They got rid of the server and workstation market, focusing on consumers only.
- storus 16 hours ago
  Grace GB10, Vera's predecessor, had a single core performance comparable to M3 so I guess we can expect at least M4 level performance now.
  [-]
  - porphyra 16 hours ago
    Isn't the GB10 a Mediatek chip and not directly related to the Grace datacenter CPU?
    [-]
    - wtallis 15 hours ago
      More fair to say it's completely unrelated to the Grace data center CPU.
    - llm_nerd 15 hours ago
      The DGX Spark (and the white box variants of it) run on the Grace Blackwell GB10 "superchip".
      [-]
- d_silin 16 hours ago
  M5 are 9-18 cores and optimized for power-efficiency, those are more like Xeons, with 200-300W TDP, I'd bet.
  [-]
  - kllrnohj 16 hours ago
    If M5 has 9-18 cores and takes ~20w, then that's ~1-2w per CPU core. If these are 200-300W, and have ~100-200 CPU cores, then guess what? That's also ~1-2w per CPU core.
    Xeons, Epycs, whatever this is - they are all also typically optimized for power efficiency. That's how they can fit so many CPU cores in 200-300W.
tencentshill 17 hours ago
So does this cut out Intel/x86 from all the massive new datacenter buildouts entirely? They've already lost Apple as a customer and are not competitive in the consumer space. I don't see how they can realistically grow at all with x86.
[-]
- alecco 17 hours ago
  Even Apple hardware looks inexpensive compared to Nvidia's huge premium. And never mind the order backlog.
  x86 and Apple already sell CPUs with integrated memory and high bandwidth interconnects. And I bet eventually Intel's beancounter board will wake up and allow engineering to make one, too.
  But competition is good for the market.
  [-]
  - storus 16 hours ago
    Apple went from a high-end PC to a low-end AI provider due to blocking Nvidia on their platform.
    [-]
    - pjmlp 35 minutes ago
      Worse, because since they no longer care about workstation market, there are pluggable cards, and no update to the chese grater, which the Studio is not comparable to.
      They also dropped the ball on the data center, having left OS X Server behind.
      Those markets are now served by Windows or Linux based configurations.
  - bigyabai 14 hours ago
    Even with those advantages, Apple can't even sell datacenter hardware to themselves: https://9to5mac.com/2026/03/02/some-apple-ai-servers-are-rep...
    [-]
    - MoonWalk 14 hours ago
      "And as the initial crop of Apple Intelligence features hasn’t been used as much as Apple expected"
      Nah, as so-called "analysts" expected. The no-effort crybabies deriding Apple for being "behind on AI" have turned out to be, shocker of shockers, wrong. Anyone who even put a few minutes of thought into Apple's business realized that it (and its customers) didn't stand to benefit much from "AI."
      It's sad that Apple hurried to pander to these clowns, only to be derided further... and to encounter the appropriate apathy from customers, who were and are doing just fine without asinine "AI" gimmicks.
      [-]
      - bigyabai 13 hours ago
        Apple wouldn't have built the server capacity if they thought it wouldn't be used. It's indeed their own analysis.
        In any case, that article is also looking forward to next-gen models like the sparse Gemini model Google trained for Siri. Apple Silicon simply isn't powerful enough to compete for that inference.
- mikrl 16 hours ago
  >are not competitive in the consumer space
  AFAIK they still dominate on clock rate, which I was surprised to see when doing some back of the envelope calculations regarding core counts.
  I felt my 8 core i9 9900K was inadequate, so shopped around for something AMD, and IIRC the core multiplier of the chip I found was dominated by the clock rate multiplier so it’s possible that at full utilization my i9 is still towards the best I can get at the price.
  Not sure if I’m the typical consumer in this case however.
  [-]
  - kllrnohj 16 hours ago
    Your 9900k at 5ghz does work slower than a Ryzen 9800X3D at 5ghz. A lot slower (1700 single core geekbench vs 3300, and just about any benchmark will tell the same story). Clock speed alone doesn't mean anything.
    [-]
    - mikrl 15 hours ago
      From the newegg listing:
      >8 Cores and 16 processing threads, based on AMD "Zen 5" architecture
      which is the same thread geometry as my 9900K.
      My main concerns at the time were:
      1. More cores for running large workloads on k8s since I had just upgraded to 128G RAM
      2. More thread level parallelism for my C++ code
      Naively I thought that, ceteris paribus and assuming good L1 cache utilization, having more physical cores with a higher clock rate would be the ticket for 2.
      Does the 9800X3D have a wider pipeline or is it some other microarchitectural feature that makes it faster?
      [-]
      - Aurornis 12 hours ago
        Comparing CPUs by clock speed doesn’t work. New CPUs are do more work per clock cycle.
        A 9800X3D is twice as fast as your 9900K in benchmarks like GeekBench, despite having similar clock speed and the same core count.
        If you could downclock the AMD part to 2.5GHz as an experiment it would still beat your 5GHz 9900K.
      - joefourier 15 hours ago
        You don't even need to go into the pipeline details. The 9800X3D has 8x more L2 cache, 6x more L3 cache, 2x the memory bandwidth than the now 8 years old i9 9900K. 3D V-cache is pretty cool.
      - kllrnohj 14 hours ago
        I purposely picked a CPU with the same thread geometry as your 9900K to avoid calls of "apples & oranges" or whatever. If you want more threads, the 9950X is right there in the same socket. Or Core Ultra 9 285k. Either of which will run circles around a 9900K in code compilation.
        You can research microarchitecture differences if you want, it's a fascinating world, or you can just skip to looking at benchmarks/reviews. Little hard to compare against quite that large of a generation gap, but eg https://gamersnexus.net/cpus/rip-intel-amd-ryzen-7-9800x3d-c... or https://www.phoronix.com/review/amd-ryzen-7-9800x3d-linux/2
      - zzzoom 11 hours ago
        The 9800X3D has wider everything. Decoder, execution ports, vectors, cache, memory bandwidth...
        [-]
        mikrl 10 hours ago
        I think my i9 was released right after the Spectre and Meltdown mitigations in 2019, but I seem to remember even more recent vulns in that family… so that could also be a factor.
  - wmf 16 hours ago
    A 9700X is twice the performance of a 9900K and M5 Max is almost 3X the performance. The megahertz myth is a myth.
    [-]
    - mikrl 15 hours ago
      I replied to the sibling comment: I was making simplifying assumptions for two specific use cases and naively treated physical cores and clock rate as my variables.
      [-]
      - joefourier 46 minutes ago
        But why? That's like trying to determine which car is faster by looking at only at the rpm.
      - Aurornis 12 hours ago
        Yes, but core count and clock speed of a nearly 10 year old CPU are meaningless when comparing to current processors.
RantyDave 15 hours ago
Ahhh, so is this a chip "more optimised" for connecting GPU's to reality ... or are they skipping the GPU step entirely? Are GPU's only for training now?
[-]
- cyanydeez 15 hours ago
  have you seen this: https://chatjimmy.ai/
  It's quite impressive what purpose build inference can/will do once everyone stops trying to become kind of the best model.
  [-]
  - redwood 15 hours ago
    Wow impressive. What's the story with this?
    [-]
    - jffry 14 hours ago
      It's a tech demonstrator for a company that turns models into custom silicon for fast inference. In this case llama3.1-8b https://taalas.com/products/
      [-]
      - gizajob 11 hours ago
        Is this an ASIC? Or FPGA? Or something even more exotic?
        I’m guessing it’s some form of ASIC because I can’t imagine crafting the logic of Llama on silicon is a very quick or easy job. Not that doing it on an ASIC is a piece of cake either.
        [-]
        jffry 32 minutes ago
        An ASIC is custom silicon, no?
        Anyways, I found this article discussing it a bit more: https://www.eetimes.com/taalas-specializes-to-extremes-for-e...
        "Taalas is borrowing some ideas from the structured ASICs of the early 2000s to make its hardwired model-specific chips. Structured ASICs used gate arrays and hardened IP blocks, changing only the interconnect layers to adapt the chip to a specific workload. At the time, this was seen as a more cost-effective alternative to a full-custom ASIC that was more performant than an FPGA."
        "Taalas changes only two masks to customize a chip for a specific model, but the two masks can change both model weights and dataflow through the chip. On the HC1, the model and its weights are stored on the chip using a mask-ROM-based recall fabric paired with a (programmable) SRAM, which can be used to hold fine-tuned weights and/or the KV cache. Future generations of chips may split the SRAM onto a separate chip, meaning they could be denser than the HC1."
    - hmartin 14 hours ago
      Taalas hardware implementation of Llama 3.1 8B They claim 16k tok/s vs Cerbras at 2k. https://taalas.com/products/
yalogin 16 hours ago
This is yet not the grok acquisition, so there is another update coming with that claiming more improvements?
[-]
- nilstycho 15 hours ago
  https://developer.nvidia.com/blog/inside-nvidia-groq-3-lpx-t...
recvonline 16 hours ago
Does this mean their gaming GPUs are becoming less in demand, and therefore cheaper/more available again?
[-]
- Teknoman117 13 hours ago
  Absolutely not, unfortunately.
  The problem is not that gaming GPUs are in demand, it’s that selling silicon to AI center buildouts is so absurdly profitable right now they just aren’t making many gaming GPUs.
  If you can only get so many mm^2 of dies from TSMC, might as well make 50x selling to AI providers.
  [-]
  - pjmlp 33 minutes ago
    Check the GTC 2026 agenda, there are hardly any graphics programming talks.
    At least there are a few cool ones about programming CUDA directly in Python.
- TheRoque 16 hours ago
  It means it will be profitable to mine crypto again
- wmf 16 hours ago
  No.
rishabhaiover 17 hours ago
I'm assuming this is for tool call and orchestration. I didn't know we needed higher exploitable parallelism from the hardware, we had software bottlenecks (you're not running 10,000 agents concurrently or downstream tool calls)
Can someone explain what is Vera CPU doing that a traditional CPU doesn't?
[-]
- kibibu 16 hours ago
  > you're not running 10,000 agents concurrently or downstream tool calls
  Cursor seem to be doing exactly that though
- urig 17 hours ago
  Lots and lots of CPUs pooled. Faster more efficient power RAM accessible to both GPU and CPU. IIUC.
  [-]
  - rishabhaiover 16 hours ago
    But at what stage are we asking for that RAM? if it's the inference stage then doesn't that belong to the GPU<>Memory which has nothing to do with the CPU?
    I did see they have the unified CPU/GPU memory which may reduce the cost of host/kernel transactions especially now that we're probably lifting more and more memory with longer context tasks.
rka128 16 hours ago
"democratize access to AI and accelerating innovation."
So they make inference cheaper and the models get even worse. Or Jensen Huang has AI psychosis. Or both.
Here is a new business idea for Nvidia: Give me $3000 in a circular deal which I will then spend on a graphics card.
[-]
- kwertyoowiyop 15 hours ago
  Me too plz. To quote (more or less) Harvey Pekar: “I’m trying to sell out, but nobody’s buying!”
kibibu 16 hours ago
Am I crazy, or is Jensen's statement a copy-paste from ChatGPT?
(Could be both)
[-]
- wmf 16 hours ago
  If AI is so great why should he not use it?
  [-]
  - magackame 14 hours ago
    Should work on building the AI Jensen. Maybe it's already the AI Jensen
jauntywundrkind 17 hours ago
Given the price of these systems the ridiculously expensive network cards isn't such a huge huge deal, but I can't help but wonder at the absurdly amazing bandwidth hanging off Vera, the amazing brags about "7x more bandwidth than pcie gen 6" (amazing), but then having to go to pcie to network to chat with anyone else. It might be 800Gbe but it's still so many hops, pcie is weighty.
I keep expecting we see fabric gains, see something where the host chip has a better way to talk to other host chips.
It's hard to deny the advantages of central switching as something easy & effective to build, but reciprocally the amazing high radix systems Google has been building have just been amazing. Microsoft Mia 200 did a gobsmacking amount of Ethernet on chip 2.8Tbps, but it's still feels so little, like such a bare start. For reference pcie6 x16 is a bit shy of 1Tbps, vaguely ~45 ish lanes of that.
It will be interesting to see what other bandwidth massive workloads evolve over time. Or if this throughout era all really ends up serving AI alone. Hoping CXL or someone else slims down the overhead and latency of attachment, soon-ish.
Maia 200: https://www.techpowerup.com/345639/microsoft-introduces-its-...
[-]
- bob1029 17 hours ago
  > It might be 800Gbe but it's still so many hops, pcie is weighty.
  Once you need to reach beyond L2/L3 it is often the case that perfectly viable experiments cannot be executed in reasonable timeframes anymore. The current machine learning paradigm isn't that latency sensitive, but there are other paradigms that can't be parallelized in the same way and are very sensitive to latency.
- babelfish 17 hours ago
  Most of the big AI/HPC clusters these systems are aimed at aren’t running regular PCIe Ethernet between nodes, they’re usually wired up with InfiniBand fabrics (HDR/NDR now, XDR soon)
  [-]
  - tryauuum 9 hours ago
    Infiniband cards are connected to the rest of the machine with the PCIe as well
dmitrygr 17 hours ago
> Purpose-Built for Agentic AI
From the "fridge purpose-built for storing only yellow tomatoes" and "car only built for people whose last name contains the letter W" series.
When can this insanity end? It is a completely normal garden-variety ARM SoC, it'll run Linux, same as every other ARM SoC does. It is as related to "Agentic $whatever" as your toaster is related to it
[-]
- pdpi 16 hours ago
  > It is as related to "Agentic $whatever" as your toaster is related to it
  These things have hardware FP8 support, and a 1.8TB/s full mesh interconnect between CPUs and GPUs. We can argue about the "agentic" bit, but those are features that don't really matter for any workload other than AI.
  [-]
  - pezezin 15 hours ago
    The huge interconnect would also useful be for HPC tasks. The FP8 not so much, HPC still loves FP64.
  - kibibu 16 hours ago
    Would cloud gaming platforms benefit from the interconnect?
    [-]
    - pdpi 16 hours ago
      Don't think they would. Games aren't nearly as hungry for memory bandwidth as LLMs are. Also, I expect that the VRAM/GPU/CPU balance would be completely out of whack. Something would be twiddling its thumbs waiting for the rest of the hardware.
  - dmitrygr 16 hours ago
    mem bw between cores matters for .... literally all workloads that are not single-core (read: all). And FP8 matters not at all cause inference on cpu is too slow to be of any use whatsoever in the days of proper accelerators
- dpe82 17 hours ago
  The power and importance of marketing is deeply underappreciated by us technical types.
  [-]
  - LogicFailsMe 16 hours ago
    And yet more than a little Gavin Belson "Box III" vibes here. Fortunately, no signature edition.
  - dwb 16 hours ago
    I don’t underappreciate it, but I do despise it.
- pwg 16 hours ago
  > It is a completely normal garden-variety ARM SoC
  To mis-quote the politician quip:
  How can you tell a marketer is lying?
  Answer: His/her mouth is moving.
FridgeSeal 16 hours ago
Are we rapidly careening towards a world where _only_ AI “computing” is possible?
Wanted to do general purpose stuff? Too bad, we watched the price of everything up, and then started producing only chips designed to run “ai” workloads.
Oh you wanted a local machine? Too bad, we priced you out, but you can rent time with an ai!
Feels like another ratchet on the “war on general purpose computing” but from a rather different direction.
_s_a_m_ 5 hours ago
what a bizarre title
simulator5g 9 hours ago
The World's First Central Sloppressing Unit
akomtu 16 hours ago
They should've called it Vega: https://doom.fandom.com/wiki/VEGA
[-]
- pohuing 13 hours ago
  Perhaps, but consider the existence of the AMD Vega GPU line https://en.wikipedia.org/wiki/Radeon_RX_Vega_series
dude250711 15 hours ago
A GPU purpose-built for Slop.
felixsells 4 hours ago
[dead]
gpubridge 13 hours ago
[dead]
urig 17 hours ago
What the heck is agentic inference and how is it supposed to be different from LLM inference? That's a rhetorical question. Screw marketing and screw hype.
BoredPositron 17 hours ago
Who wants general computing anyways?
anesxvito 16 hours ago
The philosophy of knowing exactly what's on your system translates directly to how you think about software you build. Local-first, no telemetry, minimal dependencies. FreeBSD instilled that mindset in a generation of developers that now pushes back hard against cloud-everything SaaS. Tauri over Electron is the same argument applied to desktop apps.
[-]
- brazukadev 16 hours ago
  > Tauri over Electron is the same argument applied to desktop apps.
  you lost me here but still got my upvote. Tauri and Electron are pretty much the same, compared to local-first vs cloud SaaS.
KnuthIsGod 16 hours ago
China will beat this....
Seems like a triumph of hype over reality.
China can do breathless hype just as well as Nvidia.