# llm-application-dev

LLM application development with LangGraph, RAG systems, vector search, and AI agent architectures for Claude 4.5 and GPT-5.2

- **Author**: Seth Hobson
- **Namespace**: @wshobson/claude-code-workflows
- **Category**: ai-ml
- **Version**: 2.0.2

## Installation

```shell
npx claude-plugins install @wshobson/claude-code-workflows/llm-application-dev
```

## Contents

- Folders: `agents`, `commands`, `skills`
- Files: `README.md`

## Documentation

Build production-ready LLM applications, advanced RAG systems, and intelligent agents with modern AI patterns.

### Version 2.0.0 Highlights

### Features

### Core Capabilities

### Key Technologies

### Agents

| Agent | Description |
| --- | --- |
| ai-engineer | Production-grade LLM applications, RAG systems, and agent architectures |
| prompt-engineer | Advanced prompting techniques, constitutional AI, and model optimization |
| vector-database-engineer | Vector search implementation, embedding strategies, and semantic retrieval |

### Skills

| Skill | Description |
| --- | --- |
| langchain-architecture | LangGraph StateGraph patterns, memory, and tool integration |
| rag-implementation | RAG systems with hybrid search and reranking |
| llm-evaluation | Evaluation frameworks for LLM applications |
| prompt-engineering-patterns | Chain-of-thought, few-shot, and structured outputs |
| embedding-strategies | Embedding model selection and optimization |
| similarity-search-patterns | Vector similarity search implementation |
| vector-index-tuning | HNSW, IVF, and quantization optimization |
| hybrid-search-implementation | Vector + keyword search fusion |

### Commands

| Command | Description |
| --- | --- |
| /llm-application-dev:langchain-agent | Create LangGraph-based agent |
| /llm-application-dev:ai-assistant | Build AI assistant application |
| /llm-application-dev:prompt-optimize | Optimize prompts for production |

## Installation

```shell
/plugin install llm-application-dev
```

## Requirements

## Changelog

### 2.0.0 (January 2026)

### 1.2.2
## License

MIT License - see the plugin configuration for details.

## Included Skills

This plugin includes 5 skill definitions:

### embedding-strategies

> Select and optimize embedding models for semantic search and RAG applications. Use when choosing embedding models, implementing chunking strategies, or optimizing embedding quality for specific domains.

<details>
<summary>View skill definition</summary>

# Embedding Strategies

Guide to selecting and optimizing embedding models for vector search applications.

## When to Use This Skill

## Core Concepts

### 1. Embedding Model Comparison (2026)

| Model | Dimensions | Max Tokens | Best For |
| --- | --- | --- | --- |
| voyage-3-large | 1024 | 32000 | Claude apps (Anthropic recommended) |
| voyage-3 | 1024 | 32000 | Claude apps, cost-effective |
| voyage-code-3 | 1024 | 32000 | Code search |
| voyage-finance-2 | 1024 | 32000 | Financial documents |
| voyage-law-2 | 1024 | 32000 | Legal documents |
| text-embedding-3-large | 3072 | 8191 | OpenAI apps, high accuracy |
| text-embedding-3-small | 1536 | 8191 | OpenAI apps, cost-effective |
| bge-large-en-v1.5 | 1024 | 512 | Open source, local deployment |
| all-MiniLM-L6-v2 | 384 | 256 | Fast, lightweight |
| multilingual-e5-large | 1024 | 512 | Multilingual retrieval |

…(truncated)

</details>
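Alongside model selection, chunking determines what each embedding actually represents. A minimal sliding-window chunker (a generic sketch, not code from this plugin; names are illustrative) might look like:

```python
from typing import List

def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> List[str]:
    """Split text into overlapping word-window chunks.

    Overlap preserves context across chunk boundaries so that a
    sentence straddling two chunks is embedded at least once intact.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break
    return chunks
```

Whatever the chunk unit (words here, tokens in production), the chunk size must stay well under the chosen model's max-token limit, e.g. 512 tokens for bge-large-en-v1.5.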

### hybrid-search-implementation

> Combine vector and keyword search for improved retrieval. Use when implementing RAG systems, building search engines, or when neither approach alone provides sufficient recall.

<details>
<summary>View skill definition</summary>

# Hybrid Search Implementation

Patterns for combining vector similarity and keyword-based search.

## When to Use This Skill

## Core Concepts

### 1. Hybrid Search Architecture

```
Query → ┬─► Vector Search ──► Candidates ─┐
        │                                  │
        └─► Keyword Search ─► Candidates ─┴─► Fusion ─► Results
```

### 2. Fusion Methods

| Method | Description | Best For |
| --- | --- | --- |
| RRF | Reciprocal Rank Fusion | General purpose |
| Linear | Weighted sum of scores | Tunable balance |
| Cross-encoder | Rerank with neural model | Highest quality |
| Cascade | Filter then rerank | Efficiency |
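For the Linear method, vector and keyword scores live on different scales, so they are typically min-max normalized before the weighted sum. A generic sketch (function and parameter names are illustrative, not part of this plugin):

```python
from typing import Dict, List, Tuple

def normalize(results: List[Tuple[str, float]]) -> Dict[str, float]:
    """Min-max normalize scores to [0, 1] so two retrievers are comparable."""
    scores = [s for _, s in results]
    lo, hi = min(scores), max(scores)
    span = (hi - lo) or 1.0  # avoid division by zero when all scores tie
    return {doc_id: (s - lo) / span for doc_id, s in results}

def linear_fusion(
    vector_results: List[Tuple[str, float]],
    keyword_results: List[Tuple[str, float]],
    alpha: float = 0.5,
) -> List[Tuple[str, float]]:
    """Weighted sum: alpha * vector score + (1 - alpha) * keyword score."""
    vec = normalize(vector_results)
    kw = normalize(keyword_results)
    fused = {
        doc_id: alpha * vec.get(doc_id, 0.0) + (1 - alpha) * kw.get(doc_id, 0.0)
        for doc_id in set(vec) | set(kw)
    }
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)
```

`alpha` is the tunable balance from the table: 1.0 is pure vector search, 0.0 pure keyword.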

## Templates

### Template 1: Reciprocal Rank Fusion

```python
from collections import defaultdict
from typing import List, Optional, Tuple

def reciprocal_rank_fusion(
    result_lists: List[List[Tuple[str, float]]],
    k: int = 60,
    weights: Optional[List[float]] = None,
) -> List[Tuple[str, float]]:
    """
    Combine multiple ranked lists using RRF.

    Args:
        result_lists: List of (doc_id, score) tuples per search method
        k: RRF constant (higher = more weight to lower-ranked results)
        weights: Optional per-list weights (defaults to equal weighting)

    Returns:
        Fused (doc_id, score) pairs sorted by descending RRF score.
    """
    if weights is None:
        weights = [1.0] * len(result_lists)

    fused = defaultdict(float)
    for results, weight in zip(result_lists, weights):
        for rank, (doc_id, _score) in enumerate(results):
            # RRF uses only rank positions; raw scores are ignored.
            fused[doc_id] += weight / (k + rank + 1)

    return sorted(fused.items(), key=lambda item: item[1], reverse=True)
```

...(truncated)

</details>

### langchain-architecture

> Design LLM applications using LangChain 1.x and LangGraph for agents, memory, and tool integration. Use when building LangChain applications, implementing AI agents, or creating complex LLM workflows.

<details>
<summary>View skill definition</summary>

# LangChain & LangGraph Architecture

Master modern LangChain 1.x and LangGraph for building sophisticated LLM applications with agents, state management, memory, and tool integration.

## When to Use This Skill

- Building autonomous AI agents with tool access
- Implementing complex multi-step LLM workflows
- Managing conversation memory and state
- Integrating LLMs with external data sources and APIs
- Creating modular, reusable LLM application components
- Implementing document processing pipelines
- Building production-grade LLM applications

## Package Structure (LangChain 1.x)

```
langchain (1.2.x)       # High-level orchestration
langchain-core (1.2.x)  # Core abstractions (messages, prompts, tools)
langchain-community     # Third-party integrations
langgraph               # Agent orchestration and state management
langchain-openai        # OpenAI integrations
langchain-anthropic     # Anthropic/Claude integrations
langchain-voyageai      # Voyage AI embeddings
langchain-pinecone      # Pinecone vector store
```


## Core Concepts

### 1. LangGraph Agents

LangGraph is the standard framework for building agents in 2026.

**Key Features:**

- **StateGraph**: Explicit state management with typed state
- **Durable Execution**: Agents persist through failures
- **Human-in-the-Loop**: Inspect and modify state at any point
- **Memory**: Short-term and long-term memory across sessions
- **Checkpointing**: Save and resume agent state
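The StateGraph idea — typed state flowing through node functions, with a checkpoint saved after every step so execution can resume — can be illustrated with a toy stdlib-only sketch (this is a conceptual illustration, not the LangGraph API):

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, List

@dataclass(frozen=True)
class State:
    """Typed agent state; frozen so each step yields a new checkpoint."""
    messages: tuple = ()
    step: int = 0

Node = Callable[[State], State]

def run_graph(nodes: Dict[str, Node], edges: Dict[str, str],
              entry: str, state: State) -> List[State]:
    """Run nodes along edges, checkpointing state after every node."""
    checkpoints = [state]
    current = entry
    while current != "END":
        state = nodes[current](state)
        checkpoints.append(state)  # resumable checkpoint
        current = edges[current]
    return checkpoints

def plan(s: State) -> State:
    return replace(s, messages=s.messages + ("plan",), step=s.step + 1)

def act(s: State) -> State:
    return replace(s, messages=s.messages + ("act",), step=s.step + 1)

checkpoints = run_graph(
    nodes={"plan": plan, "act": act},
    edges={"plan": "act", "act": "END"},
    entry="plan",
    state=State(),
)
```

In LangGraph itself the same ideas appear as a typed state schema, node functions registered on a `StateGraph`, and a checkpointer that persists state between steps.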

**Agent Patterns:**


...(truncated)

</details>

### llm-evaluation

> Implement comprehensive evaluation strategies for LLM applications using automated metrics, human feedback, and benchmarking. Use when testing LLM performance, measuring AI application quality, or establishing evaluation frameworks.

<details>
<summary>View skill definition</summary>

# LLM Evaluation

Master comprehensive evaluation strategies for LLM applications, from automated metrics to human evaluation and A/B testing.

## When to Use This Skill

- Measuring LLM application performance systematically
- Comparing different models or prompts
- Detecting performance regressions before deployment
- Validating improvements from prompt changes
- Building confidence in production systems
- Establishing baselines and tracking progress over time
- Debugging unexpected model behavior

## Core Evaluation Types

### 1. Automated Metrics

Fast, repeatable, scalable evaluation using computed scores.

**Text Generation:**

- **BLEU**: N-gram overlap (translation)
- **ROUGE**: Recall-oriented (summarization)
- **METEOR**: Semantic similarity
- **BERTScore**: Embedding-based similarity
- **Perplexity**: Language model confidence

**Classification:**

- **Accuracy**: Percentage correct
- **Precision/Recall/F1**: Class-specific performance
- **Confusion Matrix**: Error patterns
- **AUC-ROC**: Ranking quality

**Retrieval (RAG):**

- **MRR**: Mean Reciprocal Rank
- **NDCG**: Normalized Discounted Cumulative Gain
- **Precision@K**: Relevant in top K
- **Recall@K**: Coverage in top K
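The retrieval metrics above are simple to compute directly. A stdlib sketch of MRR and Recall@K (function names here are illustrative, not from an evaluation library):

```python
from typing import List, Set

def mean_reciprocal_rank(ranked_lists: List[List[str]],
                         relevant: List[Set[str]]) -> float:
    """MRR: average over queries of 1 / rank of the first relevant doc."""
    total = 0.0
    for docs, rel in zip(ranked_lists, relevant):
        for rank, doc_id in enumerate(docs, start=1):
            if doc_id in rel:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(ranked_lists)

def recall_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Recall@K: fraction of all relevant documents found in the top K."""
    return len(set(ranked[:k]) & relevant) / len(relevant)
```

A query whose first relevant document sits at rank 2 contributes 0.5 to MRR; one with a relevant document at rank 1 contributes 1.0.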

### 2. Human Evaluation

Manual assessment for quality aspects difficult to automate.

**Dimensions:**

- **Accuracy**: Factual correctness
- **Coherence**: Logical flow
- **Relevance**: Answers the question
- **Fluency**: Natural language quality
- **Safety**: No harmful content
- **Helpful

...(truncated)

</details>

### prompt-engineering-patterns

> Master advanced prompt engineering techniques to maximize LLM performance, reliability, and controllability in production. Use when optimizing prompts, improving LLM outputs, or designing production prompt templates.

<details>
<summary>View skill definition</summary>

# Prompt Engineering Patterns

Master advanced prompt engineering techniques to maximize LLM performance, reliability, and controllability.

## When to Use This Skill

- Designing complex prompts for production LLM applications
- Optimizing prompt performance and consistency
- Implementing structured reasoning patterns (chain-of-thought, tree-of-thought)
- Building few-shot learning systems with dynamic example selection
- Creating reusable prompt templates with variable interpolation
- Debugging and refining prompts that produce inconsistent outputs
- Implementing system prompts for specialized AI assistants
- Using structured outputs (JSON mode) for reliable parsing

## Core Capabilities

### 1. Few-Shot Learning

- Example selection strategies (semantic similarity, diversity sampling)
- Balancing example count with context window constraints
- Constructing effective demonstrations with input-output pairs
- Dynamic example retrieval from knowledge bases
- Handling edge cases through strategic example selection
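Semantic-similarity example selection can be sketched end to end; here Jaccard token overlap stands in for the embedding similarity a production system would use (the helper names are illustrative):

```python
from typing import List, Tuple

def select_examples(query: str, pool: List[Tuple[str, str]],
                    n: int = 2) -> List[Tuple[str, str]]:
    """Pick the n (input, output) pairs most similar to the query.

    Jaccard overlap of lowercase tokens is a cheap stand-in for the
    embedding-based similarity used in real few-shot retrieval.
    """
    q = set(query.lower().split())

    def score(example: Tuple[str, str]) -> float:
        tokens = set(example[0].lower().split())
        union = q | tokens
        return len(q & tokens) / len(union) if union else 0.0

    return sorted(pool, key=score, reverse=True)[:n]

pool = [
    ("refund my order", "route: billing"),
    ("reset my password", "route: account"),
    ("cancel my order today", "route: billing"),
]
picked = select_examples("how do I cancel an order", pool, n=1)
```

The selected pairs are then formatted into the prompt as demonstrations, most similar last so they sit closest to the query.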

### 2. Chain-of-Thought Prompting

- Step-by-step reasoning elicitation
- Zero-shot CoT with "Let's think step by step"
- Few-shot CoT with reasoning traces
- Self-consistency techniques (sampling multiple reasoning paths)
- Verification and validation steps
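Self-consistency reduces to: sample several reasoning paths, extract each final answer, and majority-vote. A sketch where the sampler is stubbed (in practice each call hits the LLM at temperature > 0):

```python
from collections import Counter
from typing import Callable, List

def self_consistency(sample_answer: Callable[[], str],
                     n_samples: int = 5) -> str:
    """Sample n reasoning paths and return the majority final answer."""
    answers: List[str] = [sample_answer() for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]

# Stub standing in for an LLM sampled with nonzero temperature.
_samples = iter(["42", "41", "42", "42", "40"])
answer = self_consistency(lambda: next(_samples), n_samples=5)
```

The vote is over extracted final answers, not full reasoning traces, so divergent chains of thought that reach the same conclusion reinforce each other.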

### 3. Structured Outputs

- JSON mode for reliable parsing
- Pydantic schema enforcement
- Type-safe response handling
- Error handling for malformed outputs
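Error handling for structured outputs usually means: parse the JSON, validate required fields, and fall back or retry on failure. A stdlib sketch without Pydantic (the schema here is a made-up example):

```python
import json
from typing import Any, Dict, Optional

# Hypothetical schema for a sentiment-classification response.
REQUIRED_FIELDS = {"sentiment": str, "confidence": float}

def parse_structured_output(raw: str) -> Optional[Dict[str, Any]]:
    """Parse an LLM's JSON response; return None if malformed or incomplete.

    A production system would retry the call or attempt repair
    instead of returning None.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    for field, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(data.get(field), expected_type):
            return None
    return data
```

Pydantic replaces the manual field loop with a declared model and raises a structured `ValidationError`, but the control flow — parse, validate, fall back — is the same.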

### 4. Prompt Optimization

- Iterative refinement 

...(truncated)

</details>

## Source

[View on GitHub](https://github.com/wshobson/agents)
Tags: ai-ml