Linux 7.0 Broke PostgreSQL: The Preemption Regression Explained

(read.thecoder.cafe)

67 points | by 0xKelsey 2 hours ago

9 comments

  • ameliaquining 14 minutes ago
    This post comes uncomfortably close to plagiarizing https://thebuild.com/blog/2026/04/23/preempt_none-is-dead-yo..., which it cites as a source; almost all the technical explanation is in there and some of the wording is extremely similar. Compare, e.g., "What Linux 7.0 actually changed" in Pettus's post to "What Is Preemption?" in this one. I think this link should have been to Pettus's post instead.
  • buster 21 minutes ago
    I'd rather like to know whether any real-world usage broke before concluding that an edge-case synthetic benchmark is worth changing the kernel back (or wherever), given that the change that broke the benchmark supposedly had real-world benefits.

    Since we will never know, it might be a good idea to feature-gate the change, flip the default, and let users change it back. That might give some feedback on the LKML or elsewhere for deciding whether the change is worthwhile.

    • nijave 16 minutes ago
      "synthetic benchmark" is doing some heavy lifting here. Pgbench just runs a bunch of SQL statements against a real Postgres instance.

      It's very close to a real-world simulation of a production workload.

  • ozgrakkurt 9 minutes ago
    It is a crime that postgres isn't able to allocate with 1GB huge pages by changing a config parameter in 2026

    Also a crime that people are still running databases with 4kb pages.

    To put it in perspective, this means you will have more than 30 million pages on a server with 128GB of RAM. As an example, if there are 16 bytes of metadata per memory page, the metadata alone would take more than half a gigabyte.
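
    The arithmetic above can be sketched quickly (the 16 bytes per page is the figure assumed in the comment, not an exact kernel number; the real per-page bookkeeping in Linux is larger):

    ```python
    GiB = 1024 ** 3
    MiB = 1024 ** 2

    ram = 128 * GiB
    page_small = 4 * 1024        # classic 4 KiB page
    page_huge = 1 * GiB          # 1 GiB huge page
    meta_per_page = 16           # assumed metadata bytes per page (from the comment)

    pages_small = ram // page_small   # number of 4 KiB pages
    pages_huge = ram // page_huge     # number of 1 GiB huge pages

    meta_small = pages_small * meta_per_page  # metadata for 4 KiB pages
    meta_huge = pages_huge * meta_per_page    # metadata for 1 GiB pages

    print(f"4 KiB pages: {pages_small:,} pages, {meta_small / MiB:.0f} MiB metadata")
    print(f"1 GiB pages: {pages_huge:,} pages, {meta_huge} bytes metadata")
    ```

    That works out to about 33.5 million 4 KiB pages and roughly half a gigabyte of metadata, versus 128 huge pages and a couple of kilobytes.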

  • MBCook 22 minutes ago
    This only happened under a very odd configuration. Yeah it wasn’t great but it was not the normal case.

    The headline implies it broke PG everywhere. It didn’t.

  • nijave 30 minutes ago
    Right on the heels of 6.19 breaking tcmalloc and Mongo
  • baq 52 minutes ago
    TLDR of the LKML thread: 120GB-RAM Postgres with hugepages=off, lock contention went from terrible to abysmal. Nothing to see here except that Amazon for whatever reason runs DB tests with huge pages disabled. (Hope I'm not paying for RDS and Auroras like that in production!)
    • Twirrim 36 minutes ago
      Huge pages have had a spotty history, which led to people being paranoid about them, and no doubt a whole bunch of folks just disable them "because that's what we've always done". They have been stable and reliable for quite a while now; I'd really hope folks could move away from that perspective.
      • nijave 31 minutes ago
        I tested it once about 2 years ago on Azure VM and got a nice 10-15% perf boost on pgbench (I want to say at least 64GB shared mem)
    • nijave 23 minutes ago
      In fairness, AWS could be (and almost certainly is) using their own kernel build that does who-knows-what
    • dist-epoch 25 minutes ago
      Many people have desktops with 128 GB RAM. Should they enable hugepages? I've never heard this recommendation for a desktop.
      • nijave 22 minutes ago
        Huge pages are good when a single process reserves a giant block of memory, which I think isn't that common.

        You might have transparent huge pages on by default depending on the distro
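
      To check the transparent huge pages default on a given box, the active mode is exposed in sysfs as a bracketed word (a minimal sketch; the path is the standard Linux location, but availability depends on the kernel config):

      ```python
      from pathlib import Path

      # On Linux, this sysfs file reads like: "always [madvise] never",
      # where the bracketed entry is the currently active THP mode.
      THP_PATH = Path("/sys/kernel/mm/transparent_hugepage/enabled")

      def thp_mode(text: str) -> str:
          """Extract the active THP mode from the sysfs file contents."""
          for word in text.split():
              if word.startswith("[") and word.endswith("]"):
                  return word[1:-1]
          return "unknown"

      if THP_PATH.exists():
          print("THP mode:", thp_mode(THP_PATH.read_text()))
      else:
          print("THP not available on this kernel")
      ```

      Note this is transparent huge pages, which the kernel applies on its own; explicit huge pages (what Postgres's huge_pages setting uses) are reserved separately.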

    • mplanchard 30 minutes ago
      Also was only on ARM, wasn’t it?
  • dataflow 47 minutes ago
    An X% performance regression is basically a (100 - X)% feature breakage, so whatever that implies in terms of breaking userspace...
  • PunchyHamster 1 hour ago
    Seems Linus needs to yell at someone again.

    Especially with containers around, you might very well hit the case of running a new kernel with an older version of PostgreSQL that has no code mitigation for the problem

    • nobleach 45 minutes ago
      I get that folks love a good Linus rant. But as someone who's been on the receiving end of that style of "feedback", nothing can be more humiliating or demotivating. Certainly there are contributors making "rookie mistakes". There are folks who aren't willing to ingest the entire context of what was tried back in 2.0.36, 2.2, 2.4... etc. And perhaps it's wise to simply stay away until you're completely certain you've got the chops to contribute. More than half the folks who enjoy that sort of abuse don't have those chops.

      I can defend someone who is unwilling to yield on quality. After all, this truly is his baby. But issuing scathing rebukes to well-intentioned contributors is like slapping my kid when he brings me the wrong type of screwdriver.

      • ecshafer 29 minutes ago
        I don't think a Linus rant ever hit anyone who was a rookie; they are always, AFAIK, aimed at people "who should know better": veteran developers with multiple commits merged.
      • colechristensen 22 minutes ago
        If you're at the level of delivering to Linus, I'm sorry but humiliation and demotivation are earned.

        You don't talk like this to junior or even senior engineers, but you do reach a level at which being told gently isn't necessary.

        If you don't like it go fork Linux and try being the nice benevolent dictator and we'll applaud your success.