# llm-application-dev
LLM application development with LangGraph, RAG systems, vector search, and AI agent architectures for Claude 4.5 and GPT-5.2
## Installation
```shell
npx claude-plugins install @wshobson/claude-code-workflows/llm-application-dev
```
## Contents
Folders: `agents`, `commands`, `skills`
Files: `README.md`
## Documentation
Build production-ready LLM applications, advanced RAG systems, and intelligent agents with modern AI patterns.
### Version 2.0.0 Highlights
- LangGraph Integration: Updated from deprecated LangChain patterns to LangGraph StateGraph workflows
- Modern Model Support: Claude Opus/Sonnet/Haiku 4.5 and GPT-5.2/GPT-5.2-mini
- Voyage AI Embeddings: Recommended embedding models for Claude applications
- Structured Outputs: Pydantic-based structured output patterns
## Features
### Core Capabilities
- RAG Systems: Production retrieval-augmented generation with hybrid search
- Vector Search: Pinecone, Qdrant, Weaviate, Milvus, pgvector optimization
- Agent Architectures: LangGraph-based agents with memory and tool use
- Prompt Engineering: Advanced prompting techniques with model-specific optimization
### Key Technologies
- LangChain 1.x / LangGraph for agent workflows
- Voyage AI, OpenAI, and open-source embedding models
- HNSW, IVF, and Product Quantization index strategies
- Async patterns with checkpointers for durable execution
## Agents
| Agent | Description |
|---|---|
| ai-engineer | Production-grade LLM applications, RAG systems, and agent architectures |
| prompt-engineer | Advanced prompting techniques, constitutional AI, and model optimization |
| vector-database-engineer | Vector search implementation, embedding strategies, and semantic retrieval |
## Skills
| Skill | Description |
|---|---|
| langchain-architecture | LangGraph StateGraph patterns, memory, and tool integration |
| rag-implementation | RAG systems with hybrid search and reranking |
| llm-evaluation | Evaluation frameworks for LLM applications |
| prompt-engineering-patterns | Chain-of-thought, few-shot, and structured outputs |
| embedding-strategies | Embedding model selection and optimization |
| similarity-search-patterns | Vector similarity search implementation |
| vector-index-tuning | HNSW, IVF, and quantization optimization |
| hybrid-search-implementation | Vector + keyword search fusion |
## Commands
| Command | Description |
|---|---|
| /llm-application-dev:langchain-agent | Create LangGraph-based agent |
| /llm-application-dev:ai-assistant | Build AI assistant application |
| /llm-application-dev:prompt-optimize | Optimize prompts for production |
## Installation
```
/plugin install llm-application-dev
```
### Requirements
- LangChain >= 1.2.0
- LangGraph >= 0.3.0
- Python 3.11+
## Changelog
### 2.0.0 (January 2026)
- Breaking: Migrated from LangChain 0.x to LangChain 1.x/LangGraph
- Breaking: Updated model references to Claude 4.5 and GPT-5.2
- Added Voyage AI as primary embedding recommendation for Claude apps
- Added LangGraph StateGraph patterns replacing deprecated initialize_agent()
- Added structured outputs with Pydantic
- Added async patterns with checkpointers
- Fixed security issue: replaced unsafe code execution with AST-based safe math evaluation
- Updated hybrid search with modern Pinecone client API
### 1.2.2
- Minor bug fixes and documentation updates
## License
MIT License - See the plugin configuration for details.
## Included Skills
This plugin includes 5 skill definitions:
### embedding-strategies
> Select and optimize embedding models for semantic search and RAG applications. Use when choosing embedding models, implementing chunking strategies, or optimizing embedding quality for specific domains.
<details>
<summary>View skill definition</summary>
# Embedding Strategies
Guide to selecting and optimizing embedding models for vector search applications.
## When to Use This Skill
- Choosing embedding models for RAG
- Optimizing chunking strategies
- Fine-tuning embeddings for domains
- Comparing embedding model performance
- Reducing embedding dimensions
- Handling multilingual content
## Core Concepts
### 1. Embedding Model Comparison (2026)
| Model | Dimensions | Max Tokens | Best For |
|---|---|---|---|
| voyage-3-large | 1024 | 32000 | Claude apps (Anthropic recommended) |
| voyage-3 | 1024 | 32000 | Claude apps, cost-effective |
| voyage-code-3 | 1024 | 32000 | Code search |
| voyage-finance-2 | 1024 | 32000 | Financial documents |
| voyage-law-2 | 1024 | 32000 | Legal documents |
| text-embedding-3-large | 3072 | 8191 | OpenAI apps, high accuracy |
| text-embedding-3-small | 1536 | 8191 | OpenAI apps, cost-effective |
| bge-large-en-v1.5 | 1024 | 512 | Open source, local deployment |
| all-MiniLM-L6-v2 | 384 | 256 | Fast, lightweight |
| multilingual-e5-large | 1024 | 512 |
…(truncated)
</details>
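Whichever model is chosen from the comparison above, downstream retrieval ultimately reduces to comparing vectors. A dependency-free sketch of cosine similarity, the measure most embedding providers assume (function name and vectors are illustrative):

```python
from math import sqrt

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # degenerate vector: treat as no similarity
    return dot / (norm_a * norm_b)
```

Identical directions score 1.0, orthogonal vectors 0.0, regardless of vector magnitude.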
### hybrid-search-implementation
> Combine vector and keyword search for improved retrieval. Use when implementing RAG systems, building search engines, or when neither approach alone provides sufficient recall.
<details>
<summary>View skill definition</summary>
# Hybrid Search Implementation
Patterns for combining vector similarity and keyword-based search.
## When to Use This Skill
- Building RAG systems with improved recall
- Combining semantic understanding with exact matching
- Handling queries with specific terms (names, codes)
- Improving search for domain-specific vocabulary
- When pure vector search misses keyword matches
## Core Concepts
### 1. Hybrid Search Architecture
```
Query ─┬─► Vector Search  ──► Candidates ─┐
       │                                  │
       └─► Keyword Search ──► Candidates ─┴─► Fusion ─► Results
```
### 2. Fusion Methods
| Method | Description | Best For |
|---|---|---|
| RRF | Reciprocal Rank Fusion | General purpose |
| Linear | Weighted sum of scores | Tunable balance |
| Cross-encoder | Rerank with neural model | Highest quality |
| Cascade | Filter then rerank | Efficiency |
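The linear method in the table above can be sketched as a min-max-normalized weighted sum. This is a minimal illustration; the function name, the `alpha` default, and the normalization choice are assumptions, not part of the skill definition:

```python
def linear_fusion(vector_hits, keyword_hits, alpha=0.7):
    """Blend two (doc_id, score) lists: alpha*vector + (1-alpha)*keyword.

    Scores are min-max normalized per list so the two scales are comparable.
    """
    def normalize(hits):
        if not hits:
            return {}
        scores = [s for _, s in hits]
        lo, hi = min(scores), max(scores)
        span = (hi - lo) or 1.0  # avoid division by zero on uniform scores
        return {doc_id: (s - lo) / span for doc_id, s in hits}

    v, kw = normalize(vector_hits), normalize(keyword_hits)
    fused = {
        doc_id: alpha * v.get(doc_id, 0.0) + (1 - alpha) * kw.get(doc_id, 0.0)
        for doc_id in set(v) | set(kw)
    }
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)
```

A document present in only one list still gets a (discounted) score, which is what gives hybrid search its recall advantage.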
## Templates
### Template 1: Reciprocal Rank Fusion
```python
from typing import Dict, List, Tuple
from collections import defaultdict

def reciprocal_rank_fusion(
    result_lists: List[List[Tuple[str, float]]],
    k: int = 60,
    weights: List[float] = None
) -> List[Tuple[str, float]]:
    """
    Combine multiple ranked lists using RRF.

    Args:
        result_lists: List of (doc_id, score) tuples per search method
        k: RRF constant (higher = more weight to lower-ranked results)
        weights: Optional per-list weights (defaults to equal weighting)

    Returns:
        (doc_id, fused_score) pairs sorted by descending fused score.
    """
    if weights is None:
        weights = [1.0] * len(result_lists)
    fused: Dict[str, float] = defaultdict(float)
    for weight, results in zip(weights, result_lists):
        for rank, (doc_id, _score) in enumerate(results):
            # RRF ignores raw scores; only the rank position matters.
            fused[doc_id] += weight / (k + rank + 1)
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)
```
...(truncated)
</details>
### langchain-architecture
> Design LLM applications using LangChain 1.x and LangGraph for agents, memory, and tool integration. Use when building LangChain applications, implementing AI agents, or creating complex LLM workflows.
<details>
<summary>View skill definition</summary>
# LangChain & LangGraph Architecture
Master modern LangChain 1.x and LangGraph for building sophisticated LLM applications with agents, state management, memory, and tool integration.
## When to Use This Skill
- Building autonomous AI agents with tool access
- Implementing complex multi-step LLM workflows
- Managing conversation memory and state
- Integrating LLMs with external data sources and APIs
- Creating modular, reusable LLM application components
- Implementing document processing pipelines
- Building production-grade LLM applications
## Package Structure (LangChain 1.x)
```
langchain (1.2.x)       # High-level orchestration
langchain-core (1.2.x)  # Core abstractions (messages, prompts, tools)
langchain-community     # Third-party integrations
langgraph               # Agent orchestration and state management
langchain-openai        # OpenAI integrations
langchain-anthropic     # Anthropic/Claude integrations
langchain-voyageai      # Voyage AI embeddings
langchain-pinecone      # Pinecone vector store
```
## Core Concepts
### 1. LangGraph Agents
LangGraph is the standard for building agents in 2026. It provides:
**Key Features:**
- **StateGraph**: Explicit state management with typed state
- **Durable Execution**: Agents persist through failures
- **Human-in-the-Loop**: Inspect and modify state at any point
- **Memory**: Short-term and long-term memory across sessions
- **Checkpointing**: Save and resume agent state
**Agent Patterns:**
...(truncated)
</details>
### llm-evaluation
> Implement comprehensive evaluation strategies for LLM applications using automated metrics, human feedback, and benchmarking. Use when testing LLM performance, measuring AI application quality, or establishing evaluation frameworks.
<details>
<summary>View skill definition</summary>
# LLM Evaluation
Master comprehensive evaluation strategies for LLM applications, from automated metrics to human evaluation and A/B testing.
## When to Use This Skill
- Measuring LLM application performance systematically
- Comparing different models or prompts
- Detecting performance regressions before deployment
- Validating improvements from prompt changes
- Building confidence in production systems
- Establishing baselines and tracking progress over time
- Debugging unexpected model behavior
## Core Evaluation Types
### 1. Automated Metrics
Fast, repeatable, scalable evaluation using computed scores.
**Text Generation:**
- **BLEU**: N-gram overlap (translation)
- **ROUGE**: Recall-oriented (summarization)
- **METEOR**: Semantic similarity
- **BERTScore**: Embedding-based similarity
- **Perplexity**: Language model confidence
**Classification:**
- **Accuracy**: Percentage correct
- **Precision/Recall/F1**: Class-specific performance
- **Confusion Matrix**: Error patterns
- **AUC-ROC**: Ranking quality
**Retrieval (RAG):**
- **MRR**: Mean Reciprocal Rank
- **NDCG**: Normalized Discounted Cumulative Gain
- **Precision@K**: Relevant in top K
- **Recall@K**: Coverage in top K
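The rank-based retrieval metrics above are compact enough to sketch for a single query (helper names are illustrative; MRR is the mean of `reciprocal_rank` over a query set):

```python
def reciprocal_rank(ranked_ids, relevant):
    """1/rank of the first relevant hit; 0.0 if none is retrieved."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0

def precision_at_k(ranked_ids, relevant, k):
    """Fraction of the top-k results that are relevant."""
    top = ranked_ids[:k]
    return sum(1 for doc_id in top if doc_id in relevant) / k
```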
### 2. Human Evaluation
Manual assessment for quality aspects difficult to automate.
**Dimensions:**
- **Accuracy**: Factual correctness
- **Coherence**: Logical flow
- **Relevance**: Answers the question
- **Fluency**: Natural language quality
- **Safety**: No harmful content
- **Helpful
...(truncated)
</details>
### prompt-engineering-patterns
> Master advanced prompt engineering techniques to maximize LLM performance, reliability, and controllability in production. Use when optimizing prompts, improving LLM outputs, or designing production prompt templates.
<details>
<summary>View skill definition</summary>
# Prompt Engineering Patterns
Master advanced prompt engineering techniques to maximize LLM performance, reliability, and controllability.
## When to Use This Skill
- Designing complex prompts for production LLM applications
- Optimizing prompt performance and consistency
- Implementing structured reasoning patterns (chain-of-thought, tree-of-thought)
- Building few-shot learning systems with dynamic example selection
- Creating reusable prompt templates with variable interpolation
- Debugging and refining prompts that produce inconsistent outputs
- Implementing system prompts for specialized AI assistants
- Using structured outputs (JSON mode) for reliable parsing
## Core Capabilities
### 1. Few-Shot Learning
- Example selection strategies (semantic similarity, diversity sampling)
- Balancing example count with context window constraints
- Constructing effective demonstrations with input-output pairs
- Dynamic example retrieval from knowledge bases
- Handling edge cases through strategic example selection
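Semantic-similarity example selection normally runs on embeddings; as a dependency-free stand-in, a token-overlap (Jaccard) selector illustrates the shape of the interface (all names are illustrative):

```python
def select_examples(query, examples, k=2):
    """Pick the k examples whose input text overlaps most with the query.

    examples: list of {"input": ..., ...} dicts forming the example pool.
    """
    q_tokens = set(query.lower().split())

    def overlap(example):
        e_tokens = set(example["input"].lower().split())
        union = q_tokens | e_tokens
        return len(q_tokens & e_tokens) / len(union) if union else 0.0

    return sorted(examples, key=overlap, reverse=True)[:k]
```

In production the `overlap` scorer would be replaced by cosine similarity over precomputed embeddings, but the selection loop is the same.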
### 2. Chain-of-Thought Prompting
- Step-by-step reasoning elicitation
- Zero-shot CoT with "Let's think step by step"
- Few-shot CoT with reasoning traces
- Self-consistency techniques (sampling multiple reasoning paths)
- Verification and validation steps
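Zero-shot CoT is mostly a templating concern; a minimal sketch (the wording and function name are illustrative):

```python
COT_TEMPLATE = """Question: {question}

Let's think step by step. After reasoning, give the final answer on its
own line, prefixed with "Answer:"."""

def build_cot_prompt(question: str) -> str:
    """Wrap a question in a zero-shot chain-of-thought template."""
    return COT_TEMPLATE.format(question=question)
```

Few-shot CoT extends the same template by prepending worked question/reasoning/answer triples before the target question.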
### 3. Structured Outputs
- JSON mode for reliable parsing
- Pydantic schema enforcement
- Type-safe response handling
- Error handling for malformed outputs
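A minimal sketch of the error-handling point above, using only the stdlib (the required keys are illustrative; production code would typically validate against a Pydantic model instead of a key list):

```python
import json

def parse_structured_output(raw, required_keys=("answer", "confidence")):
    """Parse a JSON-mode response, raising ValueError on malformed output."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"model did not return valid JSON: {exc}") from exc
    missing = [key for key in required_keys if key not in data]
    if missing:
        raise ValueError(f"response is missing required keys: {missing}")
    return data
```

Raising a typed error lets the caller decide whether to retry the model call or fall back, instead of crashing on a stray string.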
### 4. Prompt Optimization
- Iterative refinement
...(truncated)
</details>
## Source
[View on GitHub](https://github.com/wshobson/agents)