Ettore Di Giacinto's LocalAI Platform

Ettore Di Giacinto is a software developer based in Italy who serves as Head of Open Source at Spectro Cloud. He has spent 16+ years in open-source communities as a Gentoo developer, Sabayon Linux lead, and SUSE/Rancher engineer. In 2023, he created LocalAI, a self-hosted alternative to OpenAI that now has over 42,000 GitHub stars.
Di Giacinto also built Kairos, an immutable Linux meta-distribution for edge Kubernetes deployments, and donated it to the CNCF.
Background
Di Giacinto’s work spans infrastructure, containers, and now AI:
- Sabayon Linux - Led this Gentoo-based distribution
- SUSE/Rancher - Led the Elemental team for edge computing
- Kairos - Immutable Linux for Kubernetes (CNCF project)
- EdgeVPN - Decentralized P2P VPN without central servers
- LocalAGI - Agent platform built on LocalAI
- LocalRecall - Knowledge base and memory system for AI
His GitHub profile lists 289 repositories, and he has ranked in the top 3% of most active speakers on Sessionize for two consecutive years.
The LocalAI Approach
LocalAI started as weekend hacking sessions. Di Giacinto wanted to run AI models locally without cloud dependencies, GPU requirements, or complex setup processes.
The core principle: OpenAI API compatibility. Any application that works with OpenAI’s REST API works with LocalAI. Change one environment variable and your existing code runs locally.
# Run LocalAI with Docker
docker run -p 8080:8080 --name localai \
  -v $PWD/models:/build/models \
  localai/localai:latest-cpu

# Use exactly like OpenAI
curl http://localhost:8080/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-3.5-turbo",
    "prompt": "Explain edge computing in one sentence"
  }'
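The same swap works from the official OpenAI SDKs. Here's a minimal sketch with the openai Python package (v1+), assuming a model is installed in LocalAI under the name gpt-3.5-turbo:

# Point the official OpenAI Python SDK at LocalAI. Setting
# OPENAI_BASE_URL=http://localhost:8080/v1 in the environment achieves
# the same thing with zero code changes.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="gpt-3.5-turbo",  # must match a model name installed in LocalAI
    messages=[{"role": "user", "content": "Explain edge computing in one sentence"}],
)
print(resp.choices[0].message.content)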
Multi-Backend Architecture
Unlike single-framework tools, LocalAI supports multiple inference backends:
| Backend | Use Case | Hardware |
|---|---|---|
| llama.cpp | GGUF models, general LLM | CPU, any GPU |
| vLLM | High-throughput serving | NVIDIA GPU |
| transformers | Hugging Face models | CPU/GPU |
| MLX | Apple Silicon optimized | Mac M-series |
| diffusers | Image generation | CPU/GPU |
| whisper.cpp | Speech-to-text | CPU |
| piper | Text-to-speech | CPU |
This lets you pick the right backend for each task. Text generation with llama.cpp, images with diffusers, voice with piper.
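Backend selection happens per model: a YAML file dropped into the models directory binds a model name to a backend. Here's a sketch of such a definition; the field names follow LocalAI's model YAML format, but the backend identifier and GGUF filename are illustrative, so check the docs for your release:

# models/mistral.yaml -- illustrative model definition
name: mistral
backend: llama-cpp
parameters:
  model: mistral-7b-instruct.Q4_K_M.gguf
context_size: 4096

A request specifying "model": "mistral" is then routed to llama.cpp, while another file in the same directory can bind an image model to diffusers.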
Beyond Text
LocalAI handles more than chat completions:
- Text generation - Chat, completions, function calling
- Vision - Image understanding via multimodal models
- Embeddings - Semantic search and RAG
- Image generation - Stable Diffusion, DALL-E API compatible
- Audio - Text-to-speech, speech-to-text, voice cloning
- Video - Generation capabilities
- Object detection - YOLO integration
- MCP support - Model Context Protocol for agentic tools
The December 2025 release added dynamic memory management and multi-GPU distribution for large models.
Distributed Inference
LocalAI includes P2P capabilities for splitting work across machines:
# docker-compose for distributed setup
services:
  localai-main:
    image: localai/localai:latest
    environment:
      - LOCALAI_GALLERIES=...
      - LOCALAI_P2P=true
    ports:
      - "8080:8080"
  localai-worker:
    image: localai/localai:latest
    environment:
      - LOCALAI_P2P=true
      - LOCALAI_P2P_TOKEN=${P2P_TOKEN}
Multiple machines share the inference load. No central coordinator required.
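Here, ${P2P_TOKEN} is the shared network token LocalAI generates when P2P mode is enabled; any worker started with the same token joins the swarm. Recent releases surface the token in the logs and web UI, though the details vary by version.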
Practical Workflow: Slack Documentation Bot
Di Giacinto wrote about building a Slack bot that answers questions from project documentation, running entirely on local hardware; a sketch of the pipeline follows the steps below:
- Index documentation with embeddings
- Store vectors in local ChromaDB
- Query with LangChain + LocalAI
- Return answers without external API calls
From his blog post:
“The Slack bot is aware of the Kairos project documentation and can provide answers to questions based on it. It operates locally, without requiring an OpenAI API key.”
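Here's a minimal sketch of that pipeline using current Python tooling (langchain-openai and langchain-chroma), not his exact code; the model names and index path are placeholders, and it assumes the documentation was already embedded into the local Chroma index:

# RAG sketch against LocalAI; model names and paths are placeholders.
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_chroma import Chroma

BASE = "http://localhost:8080/v1"

# Steps 1-2: vectors live in a local ChromaDB index, embedded via LocalAI.
embeddings = OpenAIEmbeddings(base_url=BASE, api_key="local", model="bert-embeddings")
db = Chroma(persist_directory="./kairos-docs", embedding_function=embeddings)

# Step 3: retrieve relevant chunks and ask the local model.
question = "How do I upgrade a Kairos node?"
context = "\n\n".join(d.page_content for d in db.similarity_search(question, k=4))

llm = ChatOpenAI(base_url=BASE, api_key="local", model="gpt-3.5-turbo")
answer = llm.invoke(
    f"Answer from this documentation only:\n\n{context}\n\nQuestion: {question}"
)
print(answer.content)  # Step 4: hand this back to the Slack client

No request in this loop leaves the machine: embedding, retrieval, and generation all hit the local endpoint.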
Key Takeaways
| Principle | Implementation |
|---|---|
| API compatibility | Drop-in OpenAI replacement |
| No GPU required | Runs on consumer CPU |
| Multi-backend | llama.cpp, vLLM, MLX, more |
| Full-stack AI | Text, image, audio, video |
| Distributed | P2P inference across machines |
| Privacy first | Data never leaves your network |