Ettore Di Giacinto's LocalAI Platform

Ettore Di Giacinto is a software developer based in Italy who serves as Head of Open Source at Spectro Cloud. He has spent more than 16 years in open-source communities as a Gentoo developer, Sabayon Linux lead, and SUSE/Rancher engineer. In 2023, he created LocalAI, a self-hosted alternative to OpenAI that now has over 42,000 GitHub stars.

Di Giacinto also built Kairos, an immutable Linux meta-distribution for edge Kubernetes deployments, and donated it to the CNCF.

Background

Di Giacinto’s work spans infrastructure, containers, and now AI.

His GitHub profile lists 289 repositories, and he has been recognized among the top 3% of most active speakers on Sessionize for two consecutive years.

The LocalAI Approach

LocalAI started as weekend hacking sessions. Di Giacinto wanted to run AI models locally without cloud dependencies, GPU requirements, or complex setup processes.

The core principle: OpenAI API compatibility. Any application that works with OpenAI’s REST API works with LocalAI. Change one environment variable and your existing code runs locally.

# Run LocalAI with Docker
docker run -p 8080:8080 --name localai \
  -v $PWD/models:/build/models \
  localai/localai:latest-cpu

# Use exactly like OpenAI
curl http://localhost:8080/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-3.5-turbo",
    "prompt": "Explain edge computing in one sentence"
  }'
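
The same swap works from the OpenAI Python SDK: point the client’s base URL at LocalAI (or set the OPENAI_BASE_URL environment variable) and leave the rest of the code unchanged. A minimal sketch, assuming a model is registered locally under the gpt-3.5-turbo alias:

# Point the standard OpenAI client at LocalAI instead of api.openai.com
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",  # LocalAI endpoint
    api_key="not-needed",                 # LocalAI needs no real key by default
)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # resolves to whatever local model carries this name
    messages=[{"role": "user", "content": "Explain edge computing in one sentence"}],
)
print(response.choices[0].message.content)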

Multi-Backend Architecture

Unlike single-framework tools, LocalAI supports multiple inference backends:

| Backend | Use Case | Hardware |
| --- | --- | --- |
| llama.cpp | GGUF models, general LLM | CPU, any GPU |
| vLLM | High-throughput serving | NVIDIA GPU |
| transformers | Hugging Face models | CPU/GPU |
| MLX | Apple Silicon optimized | Mac M-series |
| diffusers | Image generation | CPU/GPU |
| whisper.cpp | Speech-to-text | CPU |
| piper | Text-to-speech | CPU |

This lets you pick the right backend for each task. Text generation with llama.cpp, images with diffusers, voice with piper.
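
Backend selection is per model, declared in LocalAI’s YAML model configuration. A minimal sketch, assuming a GGUF file already sits in the models directory; the field names follow LocalAI conventions, but exact backend identifiers vary by version:

# models/gpt-3.5-turbo.yaml — register a local GGUF model under an OpenAI-style name
name: gpt-3.5-turbo
backend: llama-cpp          # swap for vllm, mlx, diffusers, etc. depending on the task
parameters:
  model: mistral-7b-instruct-v0.2.Q4_K_M.gguf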

Beyond Text

LocalAI handles more than chat completions: image generation, speech-to-text, text-to-speech, and embeddings all run behind the same API.

The December 2025 release added dynamic memory management and multi-GPU distribution for large models.
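
The non-text endpoints keep the same OpenAI-compatible shape. A sketch of image generation through the same client, assuming a diffusers-backed model installed under the name stablediffusion:

# Generate an image through LocalAI's OpenAI-compatible images endpoint
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

image = client.images.generate(
    model="stablediffusion",  # assumed name of a locally installed diffusers model
    prompt="isometric illustration of a small edge datacenter",
    size="512x512",
)
print(image.data[0].url)  # LocalAI returns a link to the generated file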

Distributed Inference

LocalAI includes P2P capabilities for splitting work across machines:

# docker-compose for distributed setup
services:
  localai-main:
    image: localai/localai:latest
    environment:
      - LOCALAI_GALLERIES=...
      - LOCALAI_P2P=true
      - LOCALAI_P2P_TOKEN=${P2P_TOKEN}   # nodes sharing this token join the same P2P network
    ports:
      - "8080:8080"

  localai-worker:
    image: localai/localai:latest
    environment:
      - LOCALAI_P2P=true
      - LOCALAI_P2P_TOKEN=${P2P_TOKEN}

Multiple machines share the inference load. No central coordinator required.

Practical Workflow: Slack Documentation Bot

Di Giacinto wrote about building a Slack bot that answers questions from project documentation, running entirely on local hardware:

  1. Index documentation with embeddings
  2. Store vectors in local ChromaDB
  3. Query with LangChain + LocalAI
  4. Return answers without external API calls
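
A minimal sketch of those four steps, assuming LocalAI serves both an embedding model and a chat model (the blog post uses LangChain; this version calls the OpenAI SDK and ChromaDB directly, and the model names are placeholders for whatever is installed locally):

# Steps 1-4: embed docs, store vectors locally, retrieve, answer without external APIs
import chromadb
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

def embed(texts):
    # Step 1: embeddings come from the local LocalAI instance
    result = client.embeddings.create(model="text-embedding-ada-002", input=texts)
    return [item.embedding for item in result.data]

docs = ["Kairos builds immutable OS images.", "LocalAI exposes an OpenAI-compatible API."]

# Step 2: store the vectors in a local ChromaDB collection
collection = chromadb.Client().create_collection("kairos-docs")
collection.add(ids=[str(i) for i in range(len(docs))], documents=docs, embeddings=embed(docs))

# Step 3: retrieve the chunks most relevant to the question
question = "What does Kairos build?"
hits = collection.query(query_embeddings=embed([question]), n_results=2)
context = "\n".join(hits["documents"][0])

# Step 4: answer with a local chat model
answer = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"}],
)
print(answer.choices[0].message.content)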

From his blog post:

“The Slack bot is aware of the Kairos project documentation and can provide answers to questions based on it. It operates locally, without requiring an OpenAI API key.”

Key Takeaways

| Principle | Implementation |
| --- | --- |
| API compatibility | Drop-in OpenAI replacement |
| No GPU required | Runs on consumer CPU |
| Multi-backend | llama.cpp, vLLM, MLX, more |
| Full-stack AI | Text, image, audio, video |
| Distributed | P2P inference across machines |
| Privacy first | Data never leaves your network |

Next: Jesse Vincent’s Superpowers Framework

Topics: open-source local-first ai-coding privacy