Local-Llm

5 articles on Local-Llm:

Local LLM Runtimes: When to Use Ollama vs vLLM
Ollama excels for single-user development with simple setup. vLLM delivers 20x higher throughput for production multi-user deployments. Choose based on your workload.
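
The workflow difference shows up first in how you talk to each runtime, since both expose a local HTTP API. A minimal sketch, assuming Ollama on its default port 11434 and a vLLM server launched with `vllm serve` on its default port 8000; the model names are placeholders for whatever you have pulled or served:

```python
import requests

# Query a local Ollama server (default port 11434).
# "llama3" is illustrative; use whatever `ollama pull` fetched.
ollama_resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3", "prompt": "Why is the sky blue?", "stream": False},
)
print(ollama_resp.json()["response"])

# Query a local vLLM server, which speaks the OpenAI-compatible API.
vllm_resp = requests.post(
    "http://localhost:8000/v1/completions",
    json={
        "model": "meta-llama/Llama-3.1-8B-Instruct",  # must match the served model
        "prompt": "Why is the sky blue?",
        "max_tokens": 128,
    },
)
print(vllm_resp.json()["choices"][0]["text"])
```
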
Model Quantization: Running 70B Models on a Laptop
Reduce model precision from 32-bit to 4-bit to run large language models locally. Covers k-quants, GGUF, and choosing the right quantization level.
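
The arithmetic behind the title is worth seeing once. A back-of-envelope sketch, my own illustration rather than anything from the article; the bits-per-weight figures for the llama.cpp quants are approximate effective values:

```python
# Rough weight-memory estimates for a 70B-parameter model.
# Effective bits-per-weight for the quants are approximate, since
# block scales add overhead beyond the raw 4 or 8 bits.

def approx_weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Weight memory in GB: parameters * bits / 8. Ignores KV cache
    and activations, so treat it as a lower bound on total RAM."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

for label, bpw in [("fp32", 32.0), ("fp16", 16.0),
                   ("q8_0", 8.5), ("q4_K_M", 4.85)]:
    print(f"70B @ {label:7} ~ {approx_weight_gb(70, bpw):6.1f} GB")

# fp32 needs ~280 GB and fp16 ~140 GB, hopeless on a laptop,
# while q4_K_M lands near 42 GB: within reach of a 64 GB machine.
```
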
Running LLMs on Your Hardware with llama.cpp
Build llama.cpp from source, download GGUF models, pick the right quantization, and run a local AI server on Mac, Linux, or Windows.
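
If you would rather drive llama.cpp from a script than from its CLI, the llama-cpp-python bindings wrap the same engine. A minimal sketch, assuming `pip install llama-cpp-python` and a GGUF file already on disk; the model path is a placeholder:

```python
from llama_cpp import Llama

# Load a quantized GGUF model; the path stands in for whatever
# you downloaded (e.g. a q4_K_M file from Hugging Face).
llm = Llama(
    model_path="./models/model-q4_K_M.gguf",
    n_ctx=4096,        # context window
    n_gpu_layers=-1,   # offload all layers to GPU/Metal if available
)

out = llm(
    "Q: Name the planets in the solar system. A:",
    max_tokens=128,
    stop=["Q:"],       # stop before the model invents a new question
)
print(out["choices"][0]["text"])
```
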
Steve Korshakov's Build-Your-Own-AI-Stack Approach
The Telegram engineer who runs LLMs locally, trains voice models in his basement, and wears an AI device to capture his thoughts. His rule: if you need an AI tool, build it yourself.

Voice-First Note Capture: Whisper to Structured Markdown
Use whisper.cpp for local transcription, then LLM post-processing to convert rambling voice memos into structured notes with headers, bullet points, and action items.
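
The two stages wire together with little glue. A hedged sketch, assuming a whisper.cpp binary built locally (named `whisper-cli` in recent builds) and an Ollama server handling the post-processing; all paths and model names are placeholders:

```python
import subprocess
import requests

# Stage 1: transcribe with whisper.cpp. Binary and model paths stand in
# for wherever you built and downloaded them.
subprocess.run(
    ["./whisper-cli", "-m", "models/ggml-base.en.bin",
     "-f", "memo.wav", "-otxt", "-of", "memo"],
    check=True,
)
with open("memo.txt", encoding="utf-8") as f:
    transcript = f.read()

# Stage 2: ask a local LLM (here via Ollama's API) to impose structure.
prompt = (
    "Rewrite this rambling voice memo as structured Markdown with "
    "headers, bullet points, and a final '## Action Items' section:\n\n"
    + transcript
)
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3", "prompt": prompt, "stream": False},
)
print(resp.json()["response"])
```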
