the access wars: when vendors close what they opened

by Ray Svitla


april 4, 2026. two things happened that tell you everything about where AI infrastructure is heading.

anthropic sent mass emails killing oauth access for third-party harnesses. google shipped gemma 4 with a memory bug so catastrophic the model was unusable. llama.cpp fixed it in under 24 hours.

these aren’t isolated incidents. they’re symptoms of the same fight: who controls the infrastructure layer when AI becomes a commodity?

the oauth kill: walled gardens return

anthropic’s move is elegant in its brutality. starting april 4 at 12pm PT, your claude subscription no longer covers usage on third-party harnesses like openclaw. you can still use them. but now they require “extra usage” bundles (pay-as-you-go, billed separately) or API keys.

subscriptions cover claude code and claude cowork. official clients only.

the stated reason? “capacity management” and “usage patterns.” translation: third-party harnesses burn more tokens per session than anthropic’s official clients, and anthropic can’t subsidize that usage without breaking subscription economics.

here’s what actually happened. openclaw pioneered a pattern: oauth login, bring your claude sub, use any harness you want. it worked because oauth was permissionless. you authenticated once, and your subscription covered everything. anthropic just killed that pattern.

now the ecosystem splits: official clients (blessed, oauth-enabled) vs third-party (api keys only, metered separately). the open harness movement worked because subscriptions were portable. now they’re not.
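the split is easy to picture as a harness-side auth decision. a minimal sketch, with made-up config and function names (this is not openclaw’s or anthropic’s actual API):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class HarnessConfig:
    oauth_token: Optional[str] = None  # subscription login (official clients only now)
    api_key: Optional[str] = None      # metered, billed separately

def pick_auth(cfg: HarnessConfig, is_official_client: bool) -> str:
    """Decide which credential a harness can actually use post-april-4."""
    if is_official_client and cfg.oauth_token:
        return "oauth"    # subscription covers usage
    if cfg.api_key:
        return "api_key"  # pay-as-you-go
    raise RuntimeError("third-party harness: subscription no longer applies, set an api key")
```

the same oauth token that authenticated everything last week now only resolves for official clients. everything else falls through to the metered path.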

this is the “walled garden returns” moment. not because anthropic is evil. because the economics don’t work. subscriptions are priced for moderate usage. power users on third-party harnesses burn through tokens faster than the subscription model can sustain. so anthropic closed the door.

the llama.cpp emergency patch: open tooling as quality gate

meanwhile, google shipped gemma 4 on wednesday with a memory bug that made it unusable for most users. the 31B model at 2K context length needed 40GB+ VRAM. more than the model weights themselves.

the KV cache implementation was catastrophic. r/LocalLLaMA erupted. “my biggest issue with gemma-4 models is the massive KV cache!!” 219 upvotes, 119 comments. people with 40GB VRAM cards couldn’t fit a 31B model.

google didn’t acknowledge the issue. llama.cpp maintainers pushed an emergency fix thursday morning. gemma 4 now runs at normal memory footprint. viral celebration post: “FINALLY GEMMA 4 KV CACHE IS FIXED” (104 upvotes, 33 comments).
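the reported numbers are plausible from a back-of-the-envelope KV cache calculation. a sketch with hypothetical architecture figures (the layer count, head count, and head dim below are illustrative, not gemma 4’s real config):

```python
def kv_cache_gib(n_layers: int, n_kv_heads: int, head_dim: int,
                 seq_len: int, bytes_per_elem: int = 2) -> float:
    """Memory for the K and V tensors across all layers, fp16 by default."""
    total_bytes = 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem
    return total_bytes / 2**30

# hypothetical 31B-class config
cfg = dict(n_layers=62, n_kv_heads=16, head_dim=128)

# correct behavior: allocate for the context you asked for
print(kv_cache_gib(**cfg, seq_len=2048))    # ~1 GiB

# bug-shaped behavior: allocate for the full max context regardless
print(kv_cache_gib(**cfg, seq_len=131072))  # ~62 GiB
```

one plausible shape for the bug — pre-allocating the cache at maximum context length even when you ask for 2K — would explain a 40GB+ footprint on its own.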

here’s the pattern: vendor ships model → community diagnoses bug → llama.cpp patches faster than vendor can acknowledge issue.

llama.cpp is the de facto standard for running open models locally. when it moves faster than the vendors who create the models, infrastructure ownership shifts. google makes models. llama.cpp makes them usable.

this is “open tooling as quality gate.” vendors ship. community debugs. llama.cpp standardizes. the vendor’s official release is just the starting point. the real deployment happens after the community fixes it.

the GLM-5 milestone: when open beats closed

while all this was happening, z.ai dropped GLM-5: 754 billion parameters under MIT license.

previous largest open model: llama 3 at 405B. GLM-5 is nearly 1.9× bigger. matched claude opus 4.6 on startup management benchmarks at 11× lower cost ($7.62/run vs $86/run). full weights available. commercial use. modify however you want.

frontier capability just went from “rent from anthropic/openai” to “download and own.”

most frontier models are closed (gpt-4, claude, gemini) or restricted-license (llama). GLM-5 says: here’s 754B parameters under MIT — use it commercially, modify it, deploy it however you want.

when the largest open model jumps from 405B to 754B overnight, and it’s capability-competitive with frontier closed models, the “open vs closed” split stops being about capability. it becomes about control.

the final bottleneck: code review

buried in today’s signals is an essay from armin ronacher (creator of flask, sentry VP, rye maintainer). published in february but trending today in the cache. title: “the final bottleneck.”

thesis: code creation was always slower than code review. now agents make creation faster than review. result: codebases fill with code nobody fully understands.

quote: “when more people tell me they no longer know what code is in their own codebase, I feel like something fundamental shifted.”

for decades, writing code was the bottleneck. reviewing was fast (relatively). agents flipped it. now you can generate 10 PRs in an hour, but reviewing them takes days.

when creation becomes cheaper than verification, the constraint shifts from “can we build it?” to “do we understand what we built?”
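the inversion can be stated as a toy rate model. the rates below are illustrative, not measured:

```python
def review_backlog(gen_per_day: float, review_per_day: float, days: int) -> float:
    """Unreviewed PRs accumulated after `days`, assuming constant rates."""
    return max(0.0, (gen_per_day - review_per_day) * days)

# old world: writing was the bottleneck, review kept up
print(review_backlog(gen_per_day=2, review_per_day=5, days=30))   # 0.0

# agent world: generating 10 PRs an hour is trivial, review bandwidth is not
print(review_backlog(gen_per_day=20, review_per_day=6, days=30))  # 420.0
```

whenever generation rate exceeds review rate, the backlog grows linearly and never clears on its own.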

this is the “slop accumulation” problem. code that works but nobody owns. armin’s warning: if you can’t review as fast as your agent generates, you lose architecture control.

the pattern: access control tightens as capability spreads

these four signals — anthropic oauth kill, llama.cpp emergency patch, GLM-5 frontier open model, armin’s review bottleneck — form a pattern.

vendors tighten access control (anthropic kills oauth) while open tooling accelerates (llama.cpp patches faster than google). capability spreads to open weights (GLM-5 754B MIT) while creation outpaces verification (code review becomes the bottleneck).

infrastructure is consolidating. but not in the direction everyone expected.

the prediction was: closed models win, vendors control everything, open models stay behind. the reality: open models catch up (GLM-5 matches opus), open tooling moves faster than vendors (llama.cpp emergency patches), but access tightens anyway (oauth kill).

why? because economics don’t care about ideology. anthropic can’t afford to let third-party harnesses burn unlimited tokens on fixed-price subscriptions. google can’t prioritize bug fixes for open models when they’re competing with closed frontier labs. open tooling fills the gaps because vendors won’t.

the result: fragmentation. official clients with oauth. third-party harnesses with api keys. open models that need community patches before they work. code generation that outpaces human review.

what this means for personal AI

if you’re building a personal AI stack, here’s what changed this week:

  1. subscriptions aren’t portable anymore. anthropic killed the “bring your own sub” pattern. budget for api costs if you run third-party harnesses.

  2. open tooling is the real infrastructure. llama.cpp is faster than vendor support. if you’re running local models, the community layer matters more than the official release.

  3. frontier open models are real. GLM-5 754B MIT isn’t a toy. it’s capability-competitive with closed models at 11× lower cost. if you have the hardware, you can own frontier capability.

  4. code review is the new bottleneck. if you’re using agents for development, your constraint isn’t generation speed. it’s verification bandwidth. budget time for review, or you’ll accumulate slop.
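point 1 can be budgeted with a toy estimator. the per-token prices below are placeholders, not anthropic’s actual rates:

```python
def monthly_api_cost(input_tokens_m: float, output_tokens_m: float,
                     price_in_per_m: float = 3.0,
                     price_out_per_m: float = 15.0) -> float:
    """Rough monthly bill in dollars, given millions of tokens
    and $-per-million-token prices."""
    return input_tokens_m * price_in_per_m + output_tokens_m * price_out_per_m

# a heavy harness month: 50M input tokens, 5M output tokens
print(monthly_api_cost(50, 5))  # 225.0
```

run your own token counts through something like this and compare against a flat subscription price — that gap is exactly why anthropic drew the line where it did.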

the next move

anthropic closed oauth. google shipped broken models. llama.cpp emergency-patched them. z.ai dropped 754B parameters under MIT. armin warned about code nobody understands.

the infrastructure wars are heating up. vendors tighten control. open tooling accelerates. capability spreads to open weights. creation outpaces verification.

the question isn’t “who wins?” it’s “what do you control?”

subscriptions are vendor-locked. models are open. tooling is community-maintained. code is generated faster than you can review.

choose your dependencies carefully. the walled gardens are closing. but the walls are porous. and the tools to route around them are getting better every week.


Ray Svitla
stay evolving 🐌