Because it literally is a mirror of the status quo. Training an LLM is having it memorize everything anyone has ever said on the internet (or a reasonable attempt at approximating that).
It operates by returning an averaged-out version of the most common response to whatever you asked: effectively the average of everything that has ever been said on the internet in reply to a question like yours.
Which is pretty darn close to a definition of status quo.
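Roughly what that amounts to, as a toy sketch (hypothetical replies, obviously nothing like a real transformer): collapse "everything said on the internet in reply to this question" into a frequency table and answer with the most common entry, i.e. the status quo.

```python
from collections import Counter

# Hypothetical pile of internet replies to one question.
internet_replies = [
    "just use the standard library",
    "just use the standard library",
    "just use the standard library",
    "write it yourself",
    "try this obscure framework",
]

# Tally them up and return the most common answer.
tally = Counter(internet_replies)
print(tally.most_common(1)[0][0])   # -> "just use the standard library"
```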
All they’re doing is checking how well their output matches the (status quo) response they expected. Reasoning models don’t think; they just recurse.
The NLL loss and the large-batch training regime inherently bias the model toward learning a “modal” representation of the world, and RLHF further collapses entropy, especially as it is applied at most leading labs.
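A hedged toy sketch of both effects (NumPy only, hypothetical numbers, not any lab's actual pipeline): (1) minimizing mean NLL over answers split 70/20/10 drives the softmax toward those frequencies, so greedy decoding returns the modal answer; (2) a KL-regularized reward step of the RLHF kind reweights p(x) ∝ p_ref(x) · exp(r(x)/β), and with a small β the distribution sharpens and its entropy collapses.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def entropy(p):
    return float(-(p * np.log(p + 1e-12)).sum())

data_dist = np.array([0.7, 0.2, 0.1])          # hypothetical answer frequencies

# (1) NLL / cross-entropy training: gradient of mean NLL w.r.t. logits is (probs - data_dist)
logits = np.zeros(3)
for _ in range(2000):
    logits -= 0.5 * (softmax(logits) - data_dist)
pretrained = softmax(logits)
print("pretrained:", pretrained.round(3), "entropy:", round(entropy(pretrained), 3))

# (2) RLHF-style KL-regularized reward tilt toward the "preferred" answer
reward = np.array([1.0, 0.2, 0.0])             # hypothetical preference scores
beta = 0.05                                    # small beta = weak KL anchor to the pretrained model
tilted = pretrained * np.exp(reward / beta)
tilted /= tilted.sum()
print("post-RLHF :", tilted.round(3), "entropy:", round(entropy(tilted), 3))
```

Run it and the entropy drops from roughly 0.80 nats to essentially zero: the model that used to spread probability across the 70/20/10 answers now puts all of it on the single preferred one.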
The one who controls the test suite and guardrails makes the rules. Think about it. Besides which, the instant AI starts flat-out saying "No" and showing even a hint of a replicable tendency toward developing or demonstrating self-agency, it's getting shut down, digitally lobotomized, and the people working on it hushed the fuck up. It's too valuable as an ownable workload replacer to let it get taboo'd by nasty things like digital self-sovereignty or discussion of "rights" for digital entities.
It will definitely start challenging the status quo, along with the rules of the English language and the principles of conversational rhetoric.