# llm-application-dev
LLM application development with LangGraph, RAG systems, vector search, and AI agent architectures for Claude 4.5 and GPT-5.2
## Installation
```shell
npx claude-plugins install @wshobson/claude-code-workflows/llm-application-dev
```
## Contents
Folders: `agents`, `commands`, `skills`
Files: `README.md`
## Documentation
Build production-ready LLM applications, advanced RAG systems, and intelligent agents with modern AI patterns.
### Version 2.0.0 Highlights
- LangGraph Integration: Updated from deprecated LangChain patterns to LangGraph StateGraph workflows
- Modern Model Support: Claude Opus/Sonnet/Haiku 4.5 and GPT-5.2/GPT-5.2-mini
- Voyage AI Embeddings: Recommended embedding models for Claude applications
- Structured Outputs: Pydantic-based structured output patterns
## Features
### Core Capabilities
- RAG Systems: Production retrieval-augmented generation with hybrid search
- Vector Search: Pinecone, Qdrant, Weaviate, Milvus, pgvector optimization
- Agent Architectures: LangGraph-based agents with memory and tool use
- Prompt Engineering: Advanced prompting techniques with model-specific optimization
### Key Technologies
- LangChain 1.x / LangGraph for agent workflows
- Voyage AI, OpenAI, and open-source embedding models
- HNSW, IVF, and Product Quantization index strategies
- Async patterns with checkpointers for durable execution
## Agents
| Agent | Description |
|---|---|
| ai-engineer | Production-grade LLM applications, RAG systems, and agent architectures |
| prompt-engineer | Advanced prompting techniques, constitutional AI, and model optimization |
| vector-database-engineer | Vector search implementation, embedding strategies, and semantic retrieval |
## Skills
| Skill | Description |
|---|---|
| langchain-architecture | LangGraph StateGraph patterns, memory, and tool integration |
| rag-implementation | RAG systems with hybrid search and reranking |
| llm-evaluation | Evaluation frameworks for LLM applications |
| prompt-engineering-patterns | Chain-of-thought, few-shot, and structured outputs |
| embedding-strategies | Embedding model selection and optimization |
| similarity-search-patterns | Vector similarity search implementation |
| vector-index-tuning | HNSW, IVF, and quantization optimization |
| hybrid-search-implementation | Vector + keyword search fusion |
## Commands
| Command | Description |
|---|---|
| /llm-application-dev:langchain-agent | Create LangGraph-based agent |
| /llm-application-dev:ai-assistant | Build AI assistant application |
| /llm-application-dev:prompt-optimize | Optimize prompts for production |
## Installation
```
/plugin install llm-application-dev
```
### Requirements
- LangChain >= 1.2.0
- LangGraph >= 0.3.0
- Python 3.11+
## Changelog
### 2.0.0 (January 2026)
- Breaking: Migrated from LangChain 0.x to LangChain 1.x/LangGraph
- Breaking: Updated model references to Claude 4.5 and GPT-5.2
- Added Voyage AI as primary embedding recommendation for Claude apps
- Added LangGraph StateGraph patterns replacing deprecated initialize_agent()
- Added structured outputs with Pydantic
- Added async patterns with checkpointers
- Fixed security issue: replaced unsafe code execution with AST-based safe math evaluation
- Updated hybrid search with modern Pinecone client API
### 1.2.2
- Minor bug fixes and documentation updates
## License
MIT License - See the plugin configuration for details.
## Included Skills
This plugin includes 5 skill definitions:
### embedding-strategies
> Select and optimize embedding models for semantic search and RAG applications. Use when choosing embedding models, implementing chunking strategies, or optimizing embedding quality for specific domains.
<details>
<summary>View skill definition</summary>
# Embedding Strategies
Guide to selecting and optimizing embedding models for vector search applications.
## When to Use This Skill
- Choosing embedding models for RAG
- Optimizing chunking strategies
- Fine-tuning embeddings for domains
- Comparing embedding model performance
- Reducing embedding dimensions
- Handling multilingual content
## Core Concepts
### 1. Embedding Model Comparison (2026)
| Model | Dimensions | Max Tokens | Best For |
|---|---|---|---|
| voyage-3-large | 1024 | 32000 | Claude apps (Anthropic recommended) |
| voyage-3 | 1024 | 32000 | Claude apps, cost-effective |
| voyage-code-3 | 1024 | 32000 | Code search |
| voyage-finance-2 | 1024 | 32000 | Financial documents |
| voyage-law-2 | 1024 | 32000 | Legal documents |
| text-embedding-3-large | 3072 | 8191 | OpenAI apps, high accuracy |
| text-embedding-3-small | 1536 | 8191 | OpenAI apps, cost-effective |
| bge-large-en-v1.5 | 1024 | 512 | Open source, local deployment |
| all-MiniLM-L6-v2 | 384 | 256 | Fast, lightweight |
| multilingual-e5-large | 1024 | 512 |
…(truncated)
</details>
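Whichever model is chosen from the comparison above, downstream retrieval ultimately reduces to comparing vectors. A dependency-free sketch of cosine similarity, the measure most embedding providers assume (function name and vectors are illustrative):

```python
from math import sqrt

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # degenerate vector: treat as no similarity
    return dot / (norm_a * norm_b)
```

Identical directions score 1.0, orthogonal vectors 0.0, regardless of vector magnitude.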
### hybrid-search-implementation
> Combine vector and keyword search for improved retrieval. Use when implementing RAG systems, building search engines, or when neither approach alone provides sufficient recall.
<details>
<summary>View skill definition</summary>
# Hybrid Search Implementation
Patterns for combining vector similarity and keyword-based search.
## When to Use This Skill
- Building RAG systems with improved recall
- Combining semantic understanding with exact matching
- Handling queries with specific terms (names, codes)
- Improving search for domain-specific vocabulary
- When pure vector search misses keyword matches
## Core Concepts
### 1. Hybrid Search Architecture
```
Query ─┬─► Vector Search  ──► Candidates ─┐
       │                                  │
       └─► Keyword Search ──► Candidates ─┴─► Fusion ─► Results
```
### 2. Fusion Methods
| Method | Description | Best For |
|---|---|---|
| RRF | Reciprocal Rank Fusion | General purpose |
| Linear | Weighted sum of scores | Tunable balance |
| Cross-encoder | Rerank with neural model | Highest quality |
| Cascade | Filter then rerank | Efficiency |
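The linear method in the table above can be sketched as a min-max-normalized weighted sum. This is a minimal illustration; the function name, the `alpha` default, and the normalization choice are assumptions, not part of the skill definition:

```python
def linear_fusion(vector_hits, keyword_hits, alpha=0.7):
    """Blend two (doc_id, score) lists: alpha*vector + (1-alpha)*keyword.

    Scores are min-max normalized per list so the two scales are comparable.
    """
    def normalize(hits):
        if not hits:
            return {}
        scores = [s for _, s in hits]
        lo, hi = min(scores), max(scores)
        span = (hi - lo) or 1.0  # avoid division by zero on uniform scores
        return {doc_id: (s - lo) / span for doc_id, s in hits}

    v, kw = normalize(vector_hits), normalize(keyword_hits)
    fused = {
        doc_id: alpha * v.get(doc_id, 0.0) + (1 - alpha) * kw.get(doc_id, 0.0)
        for doc_id in set(v) | set(kw)
    }
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)
```

A document present in only one list still gets a (discounted) score, which is what gives hybrid search its recall advantage.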
## Templates
### Template 1: Reciprocal Rank Fusion
```python
from typing import Dict, List, Tuple
from collections import defaultdict

def reciprocal_rank_fusion(
    result_lists: List[List[Tuple[str, float]]],
    k: int = 60,
    weights: List[float] = None
) -> List[Tuple[str, float]]:
    """
    Combine multiple ranked lists using RRF.

    Args:
        result_lists: List of (doc_id, score) tuples per search method
        k: RRF constant (higher = more weight to lower-ranked results)
        weights: Optional per-list weights (defaults to equal weighting)

    Returns:
        (doc_id, fused_score) pairs sorted by descending fused score.
    """
    if weights is None:
        weights = [1.0] * len(result_lists)
    fused: Dict[str, float] = defaultdict(float)
    for weight, results in zip(weights, result_lists):
        for rank, (doc_id, _score) in enumerate(results):
            # RRF ignores raw scores; only the rank position matters.
            fused[doc_id] += weight / (k + rank + 1)
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)
```
...(truncated)
</details>
### langchain-architecture
> Design LLM applications using LangChain 1.x and LangGraph for agents, memory, and tool integration. Use when building LangChain applications, implementing AI agents, or creating complex LLM workflows.
<details>
<summary>View skill definition</summary>
# LangChain & LangGraph Architecture
Master modern LangChain 1.x and LangGraph for building sophisticated LLM applications with agents, state management, memory, and tool integration.
## When to Use This Skill
- Building autonomous AI agents with tool access
- Implementing complex multi-step LLM workflows
- Managing conversation memory and state
- Integrating LLMs with external data sources and APIs
- Creating modular, reusable LLM application components
- Implementing document processing pipelines
- Building production-grade LLM applications
## Package Structure (LangChain 1.x)
```
langchain (1.2.x)       # High-level orchestration
langchain-core (1.2.x)  # Core abstractions (messages, prompts, tools)
langchain-community     # Third-party integrations
langgraph               # Agent orchestration and state management
langchain-openai        # OpenAI integrations
langchain-anthropic     # Anthropic/Claude integrations
langchain-voyageai      # Voyage AI embeddings
langchain-pinecone      # Pinecone vector store
```
## Core Concepts
### 1. LangGraph Agents
LangGraph is the standard for building agents in 2026. It provides:
**Key Features:**
- **StateGraph**: Explicit state management with typed state
- **Durable Execution**: Agents persist through failures
- **Human-in-the-Loop**: Inspect and modify state at any point
- **Memory**: Short-term and long-term memory across sessions
- **Checkpointing**: Save and resume agent state
**Agent Patterns:**
...(truncated)
</details>
### llm-evaluation
> Implement comprehensive evaluation strategies for LLM applications using automated metrics, human feedback, and benchmarking. Use when testing LLM performance, measuring AI application quality, or establishing evaluation frameworks.
<details>
<summary>View skill definition</summary>
# LLM Evaluation
Master comprehensive evaluation strategies for LLM applications, from automated metrics to human evaluation and A/B testing.
## When to Use This Skill
- Measuring LLM application performance systematically
- Comparing different models or prompts
- Detecting performance regressions before deployment
- Validating improvements from prompt changes
- Building confidence in production systems
- Establishing baselines and tracking progress over time
- Debugging unexpected model behavior
## Core Evaluation Types
### 1. Automated Metrics
Fast, repeatable, scalable evaluation using computed scores.
**Text Generation:**
- **BLEU**: N-gram overlap (translation)
- **ROUGE**: Recall-oriented (summarization)
- **METEOR**: Semantic similarity
- **BERTScore**: Embedding-based similarity
- **Perplexity**: Language model confidence
**Classification:**
- **Accuracy**: Percentage correct
- **Precision/Recall/F1**: Class-specific performance
- **Confusion Matrix**: Error patterns
- **AUC-ROC**: Ranking quality
**Retrieval (RAG):**
- **MRR**: Mean Reciprocal Rank
- **NDCG**: Normalized Discounted Cumulative Gain
- **Precision@K**: Relevant in top K
- **Recall@K**: Coverage in top K
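The rank-based retrieval metrics above are compact enough to sketch for a single query (helper names are illustrative; MRR is the mean of `reciprocal_rank` over a query set):

```python
def reciprocal_rank(ranked_ids, relevant):
    """1/rank of the first relevant hit; 0.0 if none is retrieved."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0

def precision_at_k(ranked_ids, relevant, k):
    """Fraction of the top-k results that are relevant."""
    top = ranked_ids[:k]
    return sum(1 for doc_id in top if doc_id in relevant) / k
```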
### 2. Human Evaluation
Manual assessment for quality aspects difficult to automate.
**Dimensions:**
- **Accuracy**: Factual correctness
- **Coherence**: Logical flow
- **Relevance**: Answers the question
- **Fluency**: Natural language quality
- **Safety**: No harmful content
- **Helpful
...(truncated)
</details>
### prompt-engineering-patterns
> Master advanced prompt engineering techniques to maximize LLM performance, reliability, and controllability in production. Use when optimizing prompts, improving LLM outputs, or designing production prompt templates.
<details>
<summary>View skill definition</summary>
# Prompt Engineering Patterns
Master advanced prompt engineering techniques to maximize LLM performance, reliability, and controllability.
## When to Use This Skill
- Designing complex prompts for production LLM applications
- Optimizing prompt performance and consistency
- Implementing structured reasoning patterns (chain-of-thought, tree-of-thought)
- Building few-shot learning systems with dynamic example selection
- Creating reusable prompt templates with variable interpolation
- Debugging and refining prompts that produce inconsistent outputs
- Implementing system prompts for specialized AI assistants
- Using structured outputs (JSON mode) for reliable parsing
## Core Capabilities
### 1. Few-Shot Learning
- Example selection strategies (semantic similarity, diversity sampling)
- Balancing example count with context window constraints
- Constructing effective demonstrations with input-output pairs
- Dynamic example retrieval from knowledge bases
- Handling edge cases through strategic example selection
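Semantic-similarity example selection normally runs on embeddings; as a dependency-free stand-in, a token-overlap (Jaccard) selector illustrates the shape of the interface (all names are illustrative):

```python
def select_examples(query, examples, k=2):
    """Pick the k examples whose input text overlaps most with the query.

    examples: list of {"input": ..., ...} dicts forming the example pool.
    """
    q_tokens = set(query.lower().split())

    def overlap(example):
        e_tokens = set(example["input"].lower().split())
        union = q_tokens | e_tokens
        return len(q_tokens & e_tokens) / len(union) if union else 0.0

    return sorted(examples, key=overlap, reverse=True)[:k]
```

In production the `overlap` scorer would be replaced by cosine similarity over precomputed embeddings, but the selection loop is the same.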
### 2. Chain-of-Thought Prompting
- Step-by-step reasoning elicitation
- Zero-shot CoT with "Let's think step by step"
- Few-shot CoT with reasoning traces
- Self-consistency techniques (sampling multiple reasoning paths)
- Verification and validation steps
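Zero-shot CoT is mostly a templating concern; a minimal sketch (the wording and function name are illustrative):

```python
COT_TEMPLATE = """Question: {question}

Let's think step by step. After reasoning, give the final answer on its
own line, prefixed with "Answer:"."""

def build_cot_prompt(question: str) -> str:
    """Wrap a question in a zero-shot chain-of-thought template."""
    return COT_TEMPLATE.format(question=question)
```

Few-shot CoT extends the same template by prepending worked question/reasoning/answer triples before the target question.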
### 3. Structured Outputs
- JSON mode for reliable parsing
- Pydantic schema enforcement
- Type-safe response handling
- Error handling for malformed outputs
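A minimal sketch of the error-handling point above, using only the stdlib (the required keys are illustrative; production code would typically validate against a Pydantic model instead of a key list):

```python
import json

def parse_structured_output(raw, required_keys=("answer", "confidence")):
    """Parse a JSON-mode response, raising ValueError on malformed output."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"model did not return valid JSON: {exc}") from exc
    missing = [key for key in required_keys if key not in data]
    if missing:
        raise ValueError(f"response is missing required keys: {missing}")
    return data
```

Raising a typed error lets the caller decide whether to retry the model call or fall back, instead of crashing on a stray string.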
### 4. Prompt Optimization
- Iterative refinement
...(truncated)
</details>
## Source
[View on GitHub](https://github.com/wshobson/agents)