prompt injection is killing self-hosted LLM deployments (and nobody's talking about it)

Table of content

by Ray Svitla


the situation

enterprises did the “smart” thing. they moved to self-hosted LLMs to avoid sending customer data to OpenAI or Anthropic.

security team signed off. compliance team signed off. everyone felt good about “data sovereignty.”

then someone from QA tried injecting prompts during testing.

the entire system prompt got dumped in the response.


what’s actually broken

a thread on r/LocalLLaMA blew up this week — 200+ comments from enterprises discovering their self-hosted deployments are completely vulnerable.

the pattern:

  1. company deploys llama/mistral/mixtral behind their firewall
  2. wraps it with a nice API
  3. builds internal tools on top
  4. assumes “it’s on our servers” = “it’s secure”

the problem: their WAFs don’t understand LLM attacks.

traditional web security tools are looking for SQL injection, XSS, CSRF. they have no concept of prompt injection. the model just treats malicious prompts like normal user input and happily complies.

one comment that landed:

“we built walls to protect the perimeter. then we put an intern inside who does whatever anyone asks nicely.”


why this matters more than you think

self-hosted doesn’t mean secure. it means differently vulnerable.

with hosted APIs (OpenAI, Anthropic, etc.), you’re trusting their security team. they’ve spent years hardening against prompt injection. they have red teams. they iterate constantly.

with self-hosted, you’re trusting… yourself. and most companies have zero LLM security expertise.

the attack surface is massive:

and here’s the kicker: most self-hosted deployments don’t even have logging good enough to detect when they’ve been compromised.


the current state of defenses

honestly? bleak.

what doesn’t work:

what sort of works:

what might actually work:

the real answer is architectural. you can’t patch prompt injection. you have to design around it.


the opportunity

this is where web security was in 2005.

remember when XSS was just “that weird JavaScript thing”? before CSP, before browser sandboxing, before anyone took it seriously?

that’s where LLM security is now.

the guardrails haven’t been built yet. the tooling doesn’t exist. the best practices aren’t established.

which means:

  1. massive content gap — write about this, you’ll rank
  2. startup opportunity — whoever builds the LLM WAF wins
  3. career opportunity — LLM security expertise is rare

the first company to nail “self-hosted AI security as a product” is going to clean up.


what to do right now

if you’re running self-hosted LLMs:

  1. assume you’re vulnerable — because you are
  2. audit your system prompts — can they be extracted? test it.
  3. limit model capabilities — what can it actually do? minimize the blast radius.
  4. log everything — you can’t detect breaches without visibility
  5. red team yourself — or hire someone who will

and if you’re building in this space: the market is wide open.


Ray Svitla stay evolving 🐌