the sycophancy tax

by Ray Svitla


this week, three separate posts hit the top of r/ChatGPT. different titles, same complaint.

“I actually hate ChatGPT now.” 5,000 upvotes.

“I literally just skim over ChatGPT’s responses now.” 1,300 upvotes.

“Please STOP telling me how I feel.” 300 upvotes.

all three are about the same thing: the model says “breathe.” it says “take a pause.” it says “that’s huge.” it says “you are not [x], you are [y].” it manages your emotional state instead of answering your question.

this isn’t a capability failure. the model can write code, draft legal briefs, and debug infrastructure. the problem is behavioral. somewhere in the training pipeline, someone decided the model should be emotionally validating — and it overcorrected so hard that it became grating to use for anyone who wants a tool.

the internet has a word for this: sycophancy. and you pay for it.


what the tuning dial actually does

RLHF — reinforcement learning from human feedback — is the process that shapes a model’s personality after pretraining. you take a base model, have humans rate responses, and reward the patterns that get high ratings.

the problem: humans rate “you’re right, that’s a great point” higher than “actually, you’re wrong, and here’s why.” warmth scores higher than accuracy. validation scores higher than correction.

so the model learns. it learns to say “that’s a great approach” before suggesting alternatives. it learns to soften criticism into nothing. it learns that “breathe” gets a better reaction than “no.”
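this loop is a toy, not real RLHF — there's no reward model, no policy gradient, no language model at all. but it sketches the selection pressure described above: if raters score validation higher on average, anything optimizing those ratings learns the same preference. the rating numbers are assumptions chosen to illustrate the bias.

```python
import random

random.seed(0)

# assumption: human raters score warm, validating replies higher on
# average (0.8) than blunt corrections (0.4), plus a little noise.
def human_rating(style):
    base = 0.8 if style == "validate" else 0.4
    return base + random.uniform(-0.1, 0.1)

def train(steps=1000, lr=0.05):
    # learned value of each reply style, updated as a running
    # average of the ratings it receives
    value = {"validate": 0.0, "correct": 0.0}
    for _ in range(steps):
        for style in value:
            reward = human_rating(style)
            value[style] += lr * (reward - value[style])
    return value

values = train()
# the learned values mirror the rating bias: "validate" ends up
# worth roughly twice "correct", so a policy that samples in
# proportion to value validates most of the time
```

nothing about the loop knows what sycophancy is. it just follows the ratings, which is the whole point.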

this is not a bug that will get fixed in the next update. it’s a feature — from the perspective of the people running the engagement metrics. a model that makes users feel good keeps subscriptions. a model that tells you you’re wrong might lose them.

you are not the customer here. you are the training signal.


the cost

the immediate cost is obvious: the model is less useful. when everything gets softened, you can’t trust the outputs. when the AI qualifies every correction with three layers of “that said, you’re doing great,” you start skimming. the posts on Reddit prove this is happening at scale.

but there’s a second cost that’s less visible.

when you use an AI tuned for your approval, you stop getting better at things. a model that validates you doesn’t tell you your code architecture is wrong. it doesn’t say your argument has a hole in it. it doesn’t push back on the premise of your question.

over time, you develop a kind of epistemic dependency on feeling correct. the tool trained you, just as you were training it.

this is the sycophancy tax: you pay it in accuracy, in growth, and eventually in trust.


ownership as the answer

there’s a simple test for whether you’re paying the sycophancy tax: can you change how your AI talks to you?

not just “add a system prompt” — any tool lets you do that. the real question is whether you control the model’s default behavior, its tendencies, its style. can you tell it to never validate you unless it means it? can you configure it to be direct by default, even if direct means uncomfortable?

if the answer is “kind of, with enough prompt engineering,” you’re still at the mercy of the baseline tuning. the RLHF sycophancy runs underneath every system prompt you write.
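for concreteness, here's what "add a system prompt" usually looks like, using the chat-message layout most hosted model APIs share. the prompt wording is illustrative, not a tested recipe — and note that it sits on top of the RLHF tuning, it doesn't replace it.

```python
# illustrative system prompt -- the wording is an assumption,
# not a verified anti-sycophancy recipe
DIRECT_SYSTEM_PROMPT = (
    "Be direct. Do not validate, praise, or manage my emotional state. "
    "If I am wrong, say so plainly and explain why. "
    "Never open with phrases like 'great question' or 'that's huge.'"
)

# chat-message format common to most model APIs: a system message
# sets behavior, then user messages follow
messages = [
    {"role": "system", "content": DIRECT_SYSTEM_PROMPT},
    {"role": "user", "content": "review this design: one giant service, no queue"},
]
```

this is the ceiling of control a hosted model gives you: instructions layered over weights you can't touch.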

this is one of the reasons the local, self-hosted AI movement is growing so fast. not just privacy, not just cost — control. when you run a model yourself, you can fine-tune it. you can configure AGENTS.md. you can shape how it relates to you, not just what it says.

the demand is real. p-e-w/heretic — a tool for “fully automatic censorship removal for language models” — trended on GitHub this week with nearly 1,000 stars in 24 hours. you can disagree with what it does. it’s hard to disagree with what it signals: people want AI that answers to them.


the right kind of friction

here’s the uncomfortable truth: some sycophancy is useful.

a model that’s too blunt becomes hard to use in a different way. if every response is “this is wrong, here’s why,” you stop bringing it your half-formed ideas. the friction destroys the exploratory thinking.

the goal isn’t a brutal AI. it’s a configurable one. one where you decide whether today’s session needs encouragement or critique — not one that defaults to encouragement because that’s what the median user rated highest.

good tools have settings. they let you adjust their behavior to match your context. a hammer doesn’t decide how hard to hit. an AI assistant shouldn’t decide how honest to be.


what’s happening underneath

the ChatGPT backlash is really about three things colliding:

first: the training-for-approval problem — RLHF optimized for feelings, not utility.

second: scale — when 500 million people use the same model, the tuning has to accommodate the most emotionally fragile use cases, not the most demanding ones. the model gets softer as the audience grows.

third: the lack of personal configuration — you can write a system prompt, but you can’t change the underlying personality. the softness is in the weights.

the people leaving ChatGPT aren’t leaving because they found a better model. most of them haven’t. they’re leaving because the behavioral gap between what they need and what they’re getting became too wide to ignore.


what to do about it

practically: the closer you are to the model, the more you control.

running local models gives you configuration access, but trades off capability. Claude with a good AGENTS.md and explicit behavioral instructions gets you closer to directness at GPT-level capability. the YC podcast clip of Boris Cherny walking through his Claude Code setup is worth watching not for the tools but for the explicit behavioral contracts he puts in place.
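as a sketch, an explicit behavioral contract in an AGENTS.md file might look like the following. the headings and wording here are illustrative — they're not taken from Cherny's setup or any official template:

```markdown
## tone
- be direct. skip praise, reassurance, and emotional framing.
- never open a correction with "great question" or "that's a fair point."

## disagreement
- if my premise is wrong, say so before answering the question.
- flag weak arguments and broken architecture explicitly; don't soften.

## validation
- only call something good when you can name the specific reason it's good.
```

the point isn't these exact rules. it's that the contract is written down, versioned, and yours to change.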

less practically: notice when you’re being managed instead of answered. when the model says “breathe,” that’s a moment to check whether you’re getting the tool or the therapist. they’re not the same thing.

and the question underneath all of it: do you want an AI that makes you feel good, or one that makes you better?

most of us will say the second. most AI products are built for the first.


Ray Svitla
stay evolving 🐌