Ettore Di Giacinto's LocalAI Platform

Ettore Di Giacinto is a software developer based in Italy who serves as Head of Open Source at Spectro Cloud. He has spent more than 16 years in open-source communities as a Gentoo developer, Sabayon Linux lead, and SUSE/Rancher engineer. In 2023, he created LocalAI, a self-hosted alternative to OpenAI that now has over 42,000 GitHub stars.

Di Giacinto also built Kairos, an immutable Linux meta-distribution for edge Kubernetes deployments, and donated it to the CNCF.

Background

Di Giacinto’s work spans infrastructure, containers, and now AI.

His GitHub profile lists 289 repositories, and he has been recognized among the top 3% of most active speakers on Sessionize for two consecutive years.

The LocalAI Approach

LocalAI started as weekend hacking sessions. Di Giacinto wanted to run AI models locally without cloud dependencies, GPU requirements, or complex setup processes.

The core principle: OpenAI API compatibility. Any application that works with OpenAI’s REST API works with LocalAI. Change one environment variable and your existing code runs locally.

# Run LocalAI with Docker
docker run -p 8080:8080 --name localai \
  -v $PWD/models:/build/models \
  localai/localai:latest-cpu

# Use exactly like OpenAI
curl http://localhost:8080/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-3.5-turbo",
    "prompt": "Explain edge computing in one sentence"
  }'
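
The same swap works from the OpenAI Python SDK: point the client’s base URL at LocalAI (or set the OPENAI_BASE_URL environment variable) and leave the rest of the code unchanged. A minimal sketch, assuming a model is registered locally under the gpt-3.5-turbo alias:

# Point the standard OpenAI client at LocalAI instead of api.openai.com
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",  # LocalAI endpoint
    api_key="not-needed",                 # LocalAI needs no real key by default
)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # resolves to whatever local model carries this name
    messages=[{"role": "user", "content": "Explain edge computing in one sentence"}],
)
print(response.choices[0].message.content)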

Multi-Backend Architecture

Unlike single-framework tools, LocalAI supports multiple inference backends:

| Backend | Use Case | Hardware |
| --- | --- | --- |
| llama.cpp | GGUF models, general LLM | CPU, any GPU |
| vLLM | High-throughput serving | NVIDIA GPU |
| transformers | Hugging Face models | CPU/GPU |
| MLX | Apple Silicon optimized | Mac M-series |
| diffusers | Image generation | CPU/GPU |
| whisper.cpp | Speech-to-text | CPU |
| piper | Text-to-speech | CPU |

This lets you pick the right backend for each task. Text generation with llama.cpp, images with diffusers, voice with piper.
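
Backend selection is per model, declared in LocalAI’s YAML model configuration. A minimal sketch, assuming a GGUF file already sits in the models directory; the field names follow LocalAI conventions, but exact backend identifiers vary by version:

# models/gpt-3.5-turbo.yaml — register a local GGUF model under an OpenAI-style name
name: gpt-3.5-turbo
backend: llama-cpp          # swap for vllm, mlx, diffusers, etc. depending on the task
parameters:
  model: mistral-7b-instruct-v0.2.Q4_K_M.gguf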

Beyond Text

LocalAI handles more than chat completions: image generation, speech-to-text, text-to-speech, and embeddings all run behind the same API.

The December 2025 release added dynamic memory management and multi-GPU distribution for large models.
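
The non-text endpoints keep the same OpenAI-compatible shape. A sketch of image generation through the same client, assuming a diffusers-backed model installed under the name stablediffusion:

# Generate an image through LocalAI's OpenAI-compatible images endpoint
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

image = client.images.generate(
    model="stablediffusion",  # assumed name of a locally installed diffusers model
    prompt="isometric illustration of a small edge datacenter",
    size="512x512",
)
print(image.data[0].url)  # LocalAI returns a link to the generated file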

Distributed Inference

LocalAI includes P2P capabilities for splitting work across machines:

# docker-compose for distributed setup
services:
  localai-main:
    image: localai/localai:latest
    environment:
      - LOCALAI_GALLERIES=...
      - LOCALAI_P2P=true
      - LOCALAI_P2P_TOKEN=${P2P_TOKEN}   # nodes sharing this token join the same P2P network
    ports:
      - "8080:8080"

  localai-worker:
    image: localai/localai:latest
    environment:
      - LOCALAI_P2P=true
      - LOCALAI_P2P_TOKEN=${P2P_TOKEN}

Multiple machines share the inference load. No central coordinator required.

Practical Workflow: Slack Documentation Bot

Di Giacinto wrote about building a Slack bot that answers questions from project documentation, running entirely on local hardware:

  1. Index documentation with embeddings
  2. Store vectors in local ChromaDB
  3. Query with LangChain + LocalAI
  4. Return answers without external API calls
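
A minimal sketch of those four steps, assuming LocalAI serves both an embedding model and a chat model (the blog post uses LangChain; this version calls the OpenAI SDK and ChromaDB directly, and the model names are placeholders for whatever is installed locally):

# Steps 1-4: embed docs, store vectors locally, retrieve, answer without external APIs
import chromadb
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

def embed(texts):
    # Step 1: embeddings come from the local LocalAI instance
    result = client.embeddings.create(model="text-embedding-ada-002", input=texts)
    return [item.embedding for item in result.data]

docs = ["Kairos builds immutable OS images.", "LocalAI exposes an OpenAI-compatible API."]

# Step 2: store the vectors in a local ChromaDB collection
collection = chromadb.Client().create_collection("kairos-docs")
collection.add(ids=[str(i) for i in range(len(docs))], documents=docs, embeddings=embed(docs))

# Step 3: retrieve the chunks most relevant to the question
question = "What does Kairos build?"
hits = collection.query(query_embeddings=embed([question]), n_results=2)
context = "\n".join(hits["documents"][0])

# Step 4: answer with a local chat model
answer = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"}],
)
print(answer.choices[0].message.content)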

From his blog post:

“The Slack bot is aware of the Kairos project documentation and can provide answers to questions based on it. It operates locally, without requiring an OpenAI API key.”

Key Takeaways

| Principle | Implementation |
| --- | --- |
| API compatibility | Drop-in OpenAI replacement |
| No GPU required | Runs on consumer CPU |
| Multi-backend | llama.cpp, vLLM, MLX, more |
| Full-stack AI | Text, image, audio, video |
| Distributed | P2P inference across machines |
| Privacy first | Data never leaves your network |

Next: Jesse Vincent’s Superpowers Framework

Topics: open-source local-first ai-coding privacy