Yohei Nakajima's Build-in-Public AI Agent System

Yohei Nakajima didn’t set out to create one of the most influential AI projects of 2023. He was prototyping an “autonomous founder”—an AI that could run a startup—when he realized the task-planning loop he’d built could be useful for more general purposes. He shared the code on Twitter, named it BabyAGI, and watched it go viral: millions of impressions, more than 22,000 GitHub stars, coverage in Fortune, Forbes, and Fast Company, and a TED AI talk in San Francisco.

But BabyAGI isn’t the system. It’s an artifact of the system. Nakajima’s real methodology is “build-in-public”—a workflow where he treats building AI experiments as a way to learn, meet founders, and contribute to the ecosystem. He’s a venture capitalist by day (general partner at Untapped Capital), builder by night, and his personal AI operating system runs on this dual identity.

Background

Personal site | GitHub | Twitter: @yoheinakajima | Build Log

The System: Build-in-Public for AI

Nakajima’s core insight is simple: building publicly is a superior learning strategy. Instead of reading papers or taking courses, he builds prototypes, shares them on Twitter, and learns from the feedback loop. This compounds in three ways:

  1. Learn faster: Shipping forces you to understand something deeply enough to make it work
  2. Meet founders: Building AI tools attracts AI founders—exactly who a VC wants to meet
  3. Contribute to the ecosystem: Open-source experiments help others learn and build

The build cycle:

| Phase | What Happens |
| --- | --- |
| Spark | See an interesting AI capability or problem |
| Prototype | Build a minimal working version (often in a single file) |
| Share | Post on Twitter with an explanation thread |
| Iterate | Respond to feedback, build variants (BabyBeeAGI, BabyCatAGI, BabyFoxAGI) |
| Document | Write a blog post or “paper” explaining the architecture |
| Archive | Move to the archive when superseded, start fresh |

This is why BabyAGI has multiple versions: each iteration is a learning experiment, not a product. The original BabyAGI from March 2023 is now archived. The current BabyAGI is a self-building framework—an agent that can write its own functions.

The BabyAGI Architecture

BabyAGI introduced a pattern that became foundational for autonomous agents: the task-planning loop. The original system had three components:

  1. Execution Agent: Completes the current task using GPT-4
  2. Task Creation Agent: Generates new tasks based on results
  3. Prioritization Agent: Reorders the task queue based on objectives

The loop runs continuously: execute → create new tasks → prioritize → repeat. Vector storage (Pinecone) provides memory across iterations.
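
To make the pattern concrete, here is a stripped-down sketch of that loop in Python. It assumes the OpenAI Python SDK, uses illustrative prompts and a placeholder model name rather than Nakajima’s originals, and omits the Pinecone memory layer entirely:

# Simplified BabyAGI-style task-planning loop (illustrative sketch, no vector memory)
from collections import deque
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment
OBJECTIVE = "Write a short market overview of open-source AI agents"

def llm(prompt: str) -> str:
    # One chat-completion call; stands in for the GPT-4 calls in the original
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

tasks = deque(["Develop an initial task list for the objective"])

for _ in range(5):  # cap iterations; the original loops until stopped
    if not tasks:
        break
    task = tasks.popleft()

    # 1. Execution agent: complete the current task
    result = llm(f"Objective: {OBJECTIVE}\nComplete this task: {task}")
    print(f"TASK: {task}\nRESULT: {result[:200]}\n")

    # 2. Task creation agent: propose follow-up tasks based on the result
    new_tasks = llm(
        f"Objective: {OBJECTIVE}\nLast result: {result}\n"
        f"Existing tasks: {list(tasks)}\n"
        "Return new tasks, one per line, that move the objective forward."
    ).splitlines()
    tasks.extend(t.strip("- ").strip() for t in new_tasks if t.strip())

    # 3. Prioritization agent: reorder the queue against the objective
    reordered = llm(
        f"Objective: {OBJECTIVE}\nReorder these tasks by priority, one per line:\n"
        + "\n".join(tasks)
    ).splitlines()
    tasks = deque(t.strip("- ").strip() for t in reordered if t.strip())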

What made this significant wasn’t the complexity—the original was ~100 lines of Python. It was the demonstration that LLMs could plan, execute, and adapt without human intervention. Nakajima released it with a “paper” (actually GPT-4-generated based on the codebase) explaining the architecture, which helped researchers and builders understand the pattern.

Key variants:

| Version | Innovation |
| --- | --- |
| BabyBeeAGI | Task management + function expansion |
| BabyCatAGI | Faster execution through parallelization |
| BabyFoxAGI | Skill library + more sophisticated task chaining |
| BabyAGI 2.0 | Self-building: the agent writes its own functions via the “functionz” framework |

AI-Augmented VC Work

Nakajima doesn’t just build AI experiments—he uses AI to run his VC firm. Untapped Capital is known for leveraging AI and automation across its investment process:

GPT VC Associate: A custom GPT that founders can use to practice their startup pitch. It asks questions, expands on answers, and generates a downloadable investment memo. It’s publicly available to any founder.

AI-assisted startup research: Automated pipelines for sourcing, screening, and analyzing startups. He’s presented on “How to run your VC firm using AI” at events like the a16z GP-LP Mixer.

Build log: A public Softr-powered dashboard tracking every experiment he builds, creating a living portfolio of work.

The Self-Improving Agent Philosophy

Nakajima’s recent work (December 2025) focuses on self-improving agents—systems that get better through their own experience rather than human labels. In his synthesis of NeurIPS 2025 research, he identifies six mechanisms:

  1. Self-reflection: Agents critique their own outputs and try again (Reflexion, Self-Refine); a minimal sketch follows this list
  2. Self-generated curricula: Agents create the tasks they learn from
  3. Self-adapting models: Agents fine-tune themselves based on feedback
  4. Self-improving code agents: Agents modify their own source code
  5. Embodied self-improvement: Agents learn by acting in environments
  6. Verification and safety: Keeping self-improvement from going off the rails
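
To ground the first mechanism, here is a minimal generate-critique-revise loop in the spirit of Reflexion and Self-Refine. The prompts, the helper function, and the three-round cap are illustrative choices, not taken from either paper:

# Minimal self-reflection loop: generate, critique, revise (illustrative sketch)
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

def llm(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def self_refine(task: str, rounds: int = 3) -> str:
    draft = llm(f"Task: {task}\nProduce your best attempt.")
    for _ in range(rounds):
        critique = llm(
            f"Task: {task}\nAttempt:\n{draft}\n"
            "Critique this attempt. If it fully satisfies the task, reply with only OK."
        )
        if critique.strip() == "OK":
            break  # the agent judges its own output good enough
        draft = llm(
            f"Task: {task}\nAttempt:\n{draft}\nCritique:\n{critique}\n"
            "Rewrite the attempt so it addresses the critique."
        )
    return draft

print(self_refine("Summarize the BabyAGI task-planning loop in three sentences."))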

The current BabyAGI implements self-improvement through “functionz”—a framework where functions are stored in a database with metadata about dependencies, imports, and relationships. The agent can register new functions, manage them via a dashboard, and build on its own capabilities over time.

import babyagi

# Register a new capability; functionz stores the function together with its
# metadata (imports, dependencies, description) in a function database.
@babyagi.register_function(
    imports=["math"],
    dependencies=["circle_area"],
    metadata={"description": "Calculates cylinder volume"}
)
def cylinder_volume(radius, height):
    import math
    # circle_area is assumed to be registered separately; declaring it as a
    # dependency lets the framework track and resolve the relationship.
    area = circle_area(radius)
    return area * height

Knowledge Graphs + LLMs

Beyond task planning, Nakajima builds tools for knowledge representation: projects that use LLMs to turn unstructured text into structured graphs of entities and relationships.

These tools reflect a consistent philosophy: AI should help structure and retrieve information, not just generate text.
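
As a rough illustration of that idea (not code from any specific Nakajima project), the sketch below asks a model to return subject-relation-object triples as JSON, assuming the OpenAI SDK and its JSON-mode response format:

# Extract (subject, relation, object) triples from free text (illustrative sketch)
import json
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

def extract_triples(text: str) -> list:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        response_format={"type": "json_object"},  # constrain the reply to valid JSON
        messages=[{
            "role": "user",
            "content": (
                "Extract entity relationships from the text below as JSON with a "
                '"triples" key holding a list of {"subject", "relation", "object"} objects.\n\n'
                + text
            ),
        }],
    )
    return json.loads(resp.choices[0].message.content).get("triples", [])

triples = extract_triples(
    "Yohei Nakajima, a general partner at Untapped Capital, released BabyAGI in March 2023."
)
for t in triples:
    print(t["subject"], "->", t["relation"], "->", t["object"])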

Why This Matters

Nakajima represents a specific archetype: the practitioner-theorist who builds first and explains second. His influence on autonomous agents comes not from papers or credentials (he’s quick to note he’s never held a job as a developer), but from shipping working code that others can learn from.

The build-in-public methodology has broader implications:

For VCs: Building publicly creates deal flow. Founders working on AI seek out investors who understand the technology deeply enough to build it themselves.

For learning: Building beats reading. A working BabyAGI teaches more about autonomous agents than any number of blog posts about autonomous agents.

For the ecosystem: Open-source experiments compound. BabyAGI influenced dozens of other agent frameworks, got cited in research papers, and appeared in Sequoia and a16z’s AI Canon—not because it was the most sophisticated, but because it was available to learn from.

The warning signs are also instructive. Nakajima’s BabyAGI README includes disclaimers about paperclip maximizers and AI safety risks. He’s not naive about the dangers of self-improving systems—the code comes with notes about “worst case scenarios” and why you shouldn’t run autonomous agents without constraints.

What You Can Steal

| Technique | How to Apply |
| --- | --- |
| Task-planning loop | Implement: execute task → generate new tasks from the result → prioritize → repeat |
| Build-in-public workflow | Share prototypes early, iterate publicly, document what you learn |
| Self-generated curricula | Let your agents create their own training tasks with verifiable outcomes |
| Function registry pattern | Store agent capabilities as database entries with metadata |
| Knowledge graph extraction | Use LLMs to convert unstructured text into structured graph representations |
| GPT as VC Associate | Build domain-specific GPTs that surface information while providing value |

Quick start with BabyAGI:

# Install first: pip install babyagi
import babyagi

if __name__ == "__main__":
    # Serve the functionz dashboard (function registry and logs) on port 8080
    app = babyagi.create_app('/dashboard')
    app.run(host='0.0.0.0', port=8080)

Navigate to http://localhost:8080/dashboard to see the function registry and logs.

Next: Steve Krshakov’s Local AI System

Topics: autonomous-agents babyagi build-in-public vc open-source