Local LLM
8 practitioners working with Local LLM:
failure-derived: AGENTS.md science, invisible configs, and who owns your model's behavior
the first study of whether AGENTS.md files actually work, a silent A/B test reshaping Claude Code users' outcomes, a Pi Zero AI agent, and the sovereignty question hiding inside heretic's 891-star week
infrastructure, sovereignty, and a $2B validation
qmd for search, Dawarich for location, AltStack for self-hosting, M5 for speed, LMCache for optimization, Cursor for proof
Local LLM Runtimes: When to Use Ollama vs vLLM
Ollama excels for single-user development with simple setup. vLLM delivers 20x higher throughput for production multi-user deployments. Choose based on your workload.
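Both runtimes expose an OpenAI-compatible chat endpoint, so the choice between them rarely changes client code, only the base URL. A minimal sketch, assuming the common default ports (11434 for Ollama, 8000 for vLLM) and a hypothetical model name:

```python
import json

# Assumed default endpoints; both are OpenAI-compatible, so the same
# request body works against either runtime.
OLLAMA_URL = "http://localhost:11434/v1/chat/completions"  # single-user dev
VLLM_URL = "http://localhost:8000/v1/chat/completions"     # batched serving

def chat_request(model: str, prompt: str) -> bytes:
    """Build the JSON body accepted by either runtime's /v1 endpoint."""
    return json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.2,
    }).encode("utf-8")

# The same body can be POSTed to OLLAMA_URL during development and to
# VLLM_URL in production; only the URL (and model tag) changes.
body = chat_request("llama3.1:8b", "Summarize GGUF in one sentence.")
```

Because the wire format is shared, switching runtimes as a workload grows is a deployment change, not a rewrite.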
Model Quantization: Running 70B Models on a Laptop
Reduce model precision from 32-bit to 4-bit to run large language models locally. Covers k-quants, GGUF, and choosing the right quantization level.
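The memory arithmetic behind that claim is simple: weight memory is roughly parameter count times bits per weight. A back-of-the-envelope sketch (weights only; the KV cache and runtime overhead add several GB on top, and k-quants such as Q4_K_M actually average closer to 4.8 bits per weight):

```python
def quantized_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Rough weight-only memory estimate: params * bits / 8, in GB.

    Ignores KV cache and runtime overhead, which add several GB more.
    """
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9

# A 70B model at full 32-bit precision vs. an idealized 4-bit quant:
fp32 = quantized_size_gb(70, 32)  # 280.0 GB -- far beyond any laptop
q4 = quantized_size_gb(70, 4)     # 35.0 GB -- fits a 48-64 GB machine
```

That 8x reduction is why 4-bit is the usual sweet spot: it is the coarsest level that keeps a 70B model inside high-end laptop RAM without severe quality loss.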
Running LLMs on Your Hardware with llama.cpp
Build llama.cpp from source, download GGUF models, pick the right quantization, and run a local AI server on Mac, Linux, or Windows.
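The "pick the right quantization" step reduces to choosing the largest GGUF file that still leaves headroom for the KV cache. A hypothetical helper illustrating that rule; the quant names are real GGUF conventions, but the file sizes are illustrative (roughly an 8B model) and vary per model:

```python
# Illustrative GGUF file sizes for an ~8B model; not measured values.
QUANT_SIZES_GB = {
    "Q8_0": 8.5,
    "Q6_K": 6.6,
    "Q5_K_M": 5.7,
    "Q4_K_M": 4.9,
    "Q3_K_M": 4.0,
}

def pick_quant(ram_gb: float, headroom_gb: float = 2.0) -> str:
    """Largest quant whose file size plus KV-cache headroom fits in RAM."""
    for name, size in sorted(QUANT_SIZES_GB.items(), key=lambda kv: -kv[1]):
        if size + headroom_gb <= ram_gb:
            return name
    raise ValueError("no quant level fits; try a smaller model")
```

For example, `pick_quant(8.0)` skips Q8_0 and Q6_K and lands on Q5_K_M, while a 16 GB machine can take the full Q8_0 file.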

Steve Korshakov's Build-Your-Own-AI-Stack Approach
The Telegram engineer who runs LLMs locally, trains voice models in his basement, and wears an AI device to capture his thoughts. His rule: if you need an AI tool, build it yourself.
personal AI infrastructure is real now
sovereignty tools, local AI acceleration, and $2B market validation → the personal AI OS graduated from concept to product category this week
Voice-First Note Capture: Whisper to Structured Markdown
Use whisper.cpp for local transcription, then LLM post-processing to convert rambling voice memos into structured notes with headers, bullet points, and action items.
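The post-processing half of that pipeline is just prompt construction: wrap the raw Whisper transcript in an instruction describing the target Markdown shape, then send it to any local LLM. A minimal sketch, assuming whisper.cpp has already produced the transcript; the prompt wording is an illustration, not the article's exact prompt:

```python
# Assumes a raw transcript already exists (produced by whisper.cpp).
STRUCTURE_PROMPT = """Rewrite the voice memo transcript below as structured \
Markdown: a short title header, bullet points for the main ideas, and a \
final "## Action items" section with checkboxes.

Transcript:
{transcript}
"""

def build_structuring_prompt(transcript: str) -> str:
    """Wrap a raw Whisper transcript in the restructuring instruction."""
    return STRUCTURE_PROMPT.format(transcript=transcript.strip())

# The prompt is then sent to a local LLM (llama.cpp, Ollama, etc.); the
# model's reply is the finished note with headers, bullets, and action items.
prompt = build_structuring_prompt(
    "uh so remind me to email the vendor about the quote tomorrow"
)
```

Keeping both steps local (whisper.cpp for speech-to-text, a local model for restructuring) means the voice memos never leave the machine.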