file over app: why ai should work with files, not databases
the principle
steph ango (CEO of obsidian) articulated the file over app philosophy:
"if you want to create digital artifacts that last, they must be files you can control, in formats that are easy to retrieve and read. use tools that give you this freedom."
the idea: apps are temporary, files are forever.
apps get acquired, shut down, paywalled, enshittified. files persist. if your data lives in a proprietary app database, you’re one corporate decision away from losing access.
if your data lives in plain text files on your disk, you own it. forever.
why this matters for ai
most AI tools store your data in their cloud. you interact through their app. you don’t have files, you have “workspaces” or “projects” or “conversations.”
when the tool shuts down (or raises prices, or gets acquired), your data is:
→ trapped in a walled garden
→ exportable only in formats the company chooses
→ lost entirely if the company doesn’t offer export
this is the opposite of file over app. it's app over file.
what file-first ai looks like
→ plain text prompts and outputs
you write a prompt in a .txt or .md file. the AI reads it, generates a response, saves it to another file. no app, no database. just files.
→ markdown-based workflows
your knowledge base is a folder of markdown files. the AI reads them for context, generates new files, updates existing ones. all version-controlled with git.
→ local embeddings
instead of uploading docs to a vector database in the cloud, you generate embeddings locally and store them alongside your files. the AI queries them locally.
→ shell-scriptable agents
you can pipe text to an agent, get text back. cat prompt.txt | ai-agent > output.txt. no API key, no login, no vendor lock-in.
this is how unix tools work. compose them. chain them. swap them out when better ones appear.
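a minimal, runnable sketch of the pattern. `tr` stands in for the agent here, since any command that reads stdin and writes stdout slots into the same pipeline; swap in a real model CLI and nothing else changes.

```shell
# write a prompt to a plain text file
printf 'summarize this document\n' > prompt.txt

# the "agent" is any stdin-to-stdout command; tr is a stand-in
# you would replace with a real model CLI
cat prompt.txt | tr 'a-z' 'A-Z' > output.txt

cat output.txt   # SUMMARIZE THIS DOCUMENT
```

the prompt and the output both survive as files, whatever tool sat in the middle.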
the obsidian model
obsidian is the poster child for file-over-app. your notes are markdown files in a folder. obsidian is just a UI. if obsidian disappeared tomorrow, your notes would still be readable.
AI tools should follow this model:
→ your prompts are files — saved as .md or .txt
→ your outputs are files — generated into the same folder
→ your context is files — the AI reads from a folder you control
→ the tool is optional — you can switch to a different AI tool without migrating data
right now, almost no AI tools work this way. your “chats” live in a database. your “agents” live in a platform. your “workflows” live in a SaaS product.
the export trap
some tools offer export. “download all your data as JSON.”
this is not the same as file-first. because:
→ a raw JSON dump is barely human-readable — deeply nested, hard to grep, edit, or skim without a parser
→ structure is opaque — nested objects, arbitrary schemas, no guarantees
→ import is unsupported — you can export, but you can’t import into another tool
real portability means: your data is in a format anyone can read, edit, and reuse. markdown, CSV, plain text, JSON-lines (newline-delimited JSON).
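the difference is concrete. a hypothetical chat export in JSON-lines (file name and fields invented for illustration) is just text, one record per line, so ordinary unix tools apply directly:

```shell
# a tiny chat export in json-lines: one self-contained record per line
cat > chats.jsonl <<'EOF'
{"role":"user","text":"what is file over app?"}
{"role":"assistant","text":"apps are temporary, files are forever"}
{"role":"user","text":"why does it matter?"}
EOF

# count user messages with plain grep, no parser required
grep -c '"role":"user"' chats.jsonl
```

a single nested JSON blob would need a parser for the same question; newline-delimited records don't.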
the git-based workflow
if your AI data is files, you can version control it.
git add prompts/
git commit -m "added research prompts for Q1"
git push
now your AI workflows are:
→ versioned — you can see what changed and when
→ collaborative — others can clone, fork, and contribute
→ backed up — stored in git, not just one company’s server
→ auditable — full history of prompts and outputs
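a concrete run of that workflow, with the repo and file names invented for illustration (the inline `-c user.*` flags just let the commit succeed in a fresh environment with no git identity configured):

```shell
# a folder of prompts, version-controlled like code
mkdir -p ai-work/prompts
git init -q ai-work
echo 'summarize the Q1 research notes' > ai-work/prompts/q1-summary.md

git -C ai-work add prompts/
git -C ai-work -c user.name=me -c user.email=me@example.com \
    commit -q -m "added research prompts for Q1"

# auditable: full history of what changed and when, per path
git -C ai-work log --oneline -- prompts/
```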
this is how code works. why not AI workflows?
the problem with saas ai
most AI tools are SaaS:
→ you log in to their app
→ you type prompts in their UI
→ they store everything in their database
→ you pay a subscription to access your own data
this model works for them (recurring revenue, vendor lock-in). it’s bad for you (no ownership, no portability, no control).
the alternative: local-first AI.
you run a model on your machine. your data never leaves. you interact via files or a local server. if you want to switch models, you switch models. your data doesn’t move.
the “but it’s easier” argument
yes, SaaS AI is easier. no setup, no config, just sign up and go.
file-first AI requires:
→ understanding file systems
→ maybe running a local server
→ managing your own backups
→ dealing with compatibility issues
this is a real cost. for most people, “easier” wins.
but easier now often means harder later. when the tool shuts down, when the price 10xs, when the company pivots, you're stuck.
the file-over-app philosophy is long-term thinking. optimize for durability, not convenience.
the interoperability win
if your AI data is files, you can mix tools.
use tool A for embeddings, tool B for generation, tool C for fine-tuning. they all work with the same files. no vendor lock-in, no API translation layer, no data migration.
this is how unix tools work. grep, awk, sed — all operate on text streams. they compose cleanly because the data format is universal.
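the same composition property, shown with the tools the paragraph names: each stage reads text and writes text, so each can be swapped independently (the model names are just sample lines).

```shell
# filter with grep, reformat with sed; either stage is replaceable
printf 'llama\nmistral\nphi\n' \
  | grep 'l' \
  | sed 's/^/model: /'
```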
AI tools should be the same. but right now, every tool has its own format, its own API, its own walled garden.
the local-first stack
a file-first, local-first AI stack looks like:
→ models: local LLMs
Llama, Mistral, Phi — run them with ollama, llama.cpp, or Hugging Face transformers.
→ storage: plain text + git
prompts, outputs, embeddings — all stored as files, versioned with git.
→ search: local vector DB
sqlite-vec, chroma (local mode), or just embeddings stored as numpy arrays.
→ interface: shell scripts + markdown
write prompts in markdown, run them with a script, save outputs to files.
this is more work to set up than “sign up for ChatGPT.” but it’s also:
→ free (no subscription)
→ private (no cloud)
→ portable (files, not databases)
→ durable (works forever, no vendor dependency)
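the interface layer of that stack can be a few lines of shell. this sketch uses `cat` as a stand-in model command so it runs anywhere; locally you would substitute something like `ollama run <model>` (the file names and the `MODEL_CMD` variable are illustrative, not a real tool's API):

```shell
mkdir -p prompts outputs

# a markdown prompt is just a file you control
echo 'summarize the attached notes' > prompts/task.md

# MODEL_CMD is a placeholder; swap in your local model CLI
MODEL_CMD="cat"
$MODEL_CMD < prompts/task.md > outputs/task-response.md

# the output is a file too: greppable, versionable, portable
cat outputs/task-response.md
```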
the command-line agent pattern
instead of a GUI, what if your AI agent was a CLI tool?
# generate a summary
ai summarize input.txt > summary.md
# embed a folder of docs
ai embed docs/ --output embeddings/
# query with context
ai query "what is X?" --context docs/
this composes with unix pipes:
cat article.md | ai summarize | ai translate --to=french > article_fr.md
no app, no login, no vendor. just a tool that reads files and writes files.
some tools are moving in this direction (llm by simon willison, aider, continue.dev). but most AI products are still GUI-first, file-hostile.
the mobile problem
file-over-app works great on desktops. on mobile, not so much.
mobile OSes hide the file system. apps are sandboxed. you can’t just “open a folder” and edit files.
so mobile AI tools end up being app-first by necessity. your chats live in the app, not in files.
this is a platform limitation, not an AI problem. but it makes file-first AI less practical for phone users.
the workaround: sync to desktop. use the AI on your phone, but the files live on your desktop (synced via icloud, dropbox, syncthing). mobile is a view, desktop is the source of truth.
the open format argument
even if you can’t avoid apps, you can demand open formats.
your AI tool should export:
→ prompts as markdown — readable, editable, git-friendly
→ embeddings as parquet or JSON-lines — standard, portable
→ metadata as YAML or TOML — human-editable config
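what that looks like in practice: a prompt file with YAML front matter for metadata (the field names here are invented for illustration). because it's plain text, you can grep and edit it like anything else.

```shell
mkdir -p prompts

# a prompt with human-editable metadata up top
cat > prompts/translate.md <<'EOF'
---
model: local-llama
temperature: 0.2
---
translate the text below to french
EOF

# queryable with standard tools, no app required
grep '^model:' prompts/translate.md
```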
if a tool locks you into a proprietary format, it’s a red flag. you’re betting your data on their long-term survival.
the illich lens
ivan illich distinguished between tools (which extend human capability) and institutions (which create dependency).
apps are institutions. they require:
→ accounts, logins, subscriptions
→ vendor infrastructure
→ permission to access your own data
files are tools. they require:
→ a file system (which you already have)
→ basic literacy (how to open, edit, move files)
→ no permission from anyone
file-first AI is convivial technology — it empowers you without creating dependency. app-first AI is institutional — it makes you dependent on the vendor.
choose accordingly.
the long game
in 10 years, most of today’s AI tools will be gone. acquired, pivoted, shut down, or irrelevant.
if your data is in their app, it’s gone too.
if your data is in files on your disk, it’s still there. readable, searchable, usable.
file over app is not about nostalgia. it’s about durability. it’s about owning your work. it’s about not trusting companies to preserve your data when preserving your data doesn’t make them money.
it’s about building for the long term, not the next funding round.
questions worth asking
- if the AI tool you use shut down tomorrow, could you access your data? in what format?
- do you know where your AI conversations, prompts, and outputs are stored — on your machine, or someone else’s server?
- if you wanted to switch AI tools, how hard would it be to migrate your data?
- are you optimizing for convenience now, or durability later?