llm-application-dev

Name: llm-application-dev
Rating: 4.5 (5 reviews)
Author: Seth Hobson

LLM application development, prompt engineering, and AI assistant optimization

Namespace: @kivilaid/ando-marketplace
Category: ai-ml
Version: 1.2.1
Stars: 8
Downloads: 5
self.md verified

Installation

npx claude-plugins install @kivilaid/ando-marketplace/llm-application-dev

Folders: agents, commands, skills

Included Skills

This plugin includes 4 skill definitions:

langchain-architecture

Design LLM applications using the LangChain framework with agents, memory, and tool integration patterns. Use when building LangChain applications, implementing AI agents, or creating complex LLM workflows.

LangChain Architecture

Master the LangChain framework for building sophisticated LLM applications with agents, chains, memory, and tool integration.

When to Use This Skill

Building autonomous AI agents with tool access
Implementing complex multi-step LLM workflows
Managing conversation memory and state
Integrating LLMs with external data sources and APIs
Creating modular, reusable LLM application components
Implementing document processing pipelines
Building production-grade LLM applications

Core Concepts

1. Agents

Autonomous systems that use LLMs to decide which actions to take.

Agent Types:

ReAct: Interleaves reasoning steps with tool actions
OpenAI Functions: Leverages function calling API
Structured Chat: Handles multi-input tools
Conversational: Optimized for chat interfaces
Self-Ask with Search: Decomposes complex queries
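
The ReAct pattern listed above can be sketched in plain Python, without LangChain's own agent classes. Everything here is an assumption made for illustration: `scripted_llm` stands in for a real model, `calculator` is a hypothetical tool, and the `Action: tool[input]` text format is just one possible convention.

```python
from typing import Callable, Dict, List


def calculator(expression: str) -> str:
    """Hypothetical tool: evaluates 'a op b' for +, - and *."""
    a, op, b = expression.split()
    x, y = float(a), float(b)
    results = {"+": x + y, "-": x - y, "*": x * y}
    return str(results[op])


TOOLS: Dict[str, Callable[[str], str]] = {"calculator": calculator}


def scripted_llm(transcript: List[str]) -> str:
    """Stand-in for a model: call the tool once, then answer from the observation."""
    observations = [t for t in transcript if t.startswith("Observation:")]
    if not observations:
        return "Thought: I need to compute this.\nAction: calculator[6 * 7]"
    result = observations[-1].split(": ", 1)[1]
    return f"Thought: I have the result.\nFinal Answer: {result}"


def react_loop(question: str, llm: Callable[[List[str]], str], max_steps: int = 5) -> str:
    """Alternate model output (thought + action) with tool observations."""
    transcript = [f"Question: {question}"]
    for _ in range(max_steps):
        reply = llm(transcript)
        transcript.append(reply)
        last_line = reply.strip().splitlines()[-1]
        if last_line.startswith("Final Answer:"):
            return last_line.split(":", 1)[1].strip()
        if last_line.startswith("Action:"):
            # Parse "Action: tool_name[tool input]" and run the named tool.
            call = last_line.split(":", 1)[1].strip()
            name, _, arg = call.partition("[")
            observation = TOOLS[name](arg.rstrip("]"))
            transcript.append(f"Observation: {observation}")
    raise RuntimeError("agent did not finish within max_steps")
```

The `max_steps` cap is a design choice worth keeping in any real agent loop: without it, a model that never emits a final answer would loop indefinitely.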

2. Chains

Sequences of calls to LLMs or other utilities.

Chain Types:

LLMChain: Basic prompt + LLM combination
SequentialChain: Multiple chains in sequence
RouterChain: Routes inputs to specialized chains
TransformChain: Data transformations between steps
MapReduceChain: Parallel processing with aggregation
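
To make the idea of chaining concrete, here is a framework-free sketch of a sequential chain with a transform step in between. It does not use LangChain's classes; `fake_llm`, `make_llm_step`, and `run_sequential` are hypothetical names, and the "model" simply upper-cases its prompt so the data flow is easy to follow.

```python
from typing import Callable, Dict, List

State = Dict[str, str]
Step = Callable[[State], State]


def fake_llm(prompt: str) -> str:
    """Placeholder for a model call: returns the prompt upper-cased."""
    return prompt.upper()


def make_llm_step(template: str, output_key: str) -> Step:
    """Prompt + LLM step: fill the template from state, store the reply."""
    def step(state: State) -> State:
        return {output_key: fake_llm(template.format(**state))}
    return step


def strip_whitespace(state: State) -> State:
    """Transform step: plain data cleanup, no model call."""
    return {"text": state["text"].strip()}


def run_sequential(steps: List[Step], inputs: State) -> State:
    """Run steps in order; each step sees all keys produced so far."""
    state = dict(inputs)
    for step in steps:
        state.update(step(state))
    return state


pipeline = [
    strip_whitespace,
    make_llm_step("summarize: {text}", "summary"),
    make_llm_step("translate: {summary}", "translation"),
]
```

Passing a shared dictionary between steps mirrors the way sequential chains expose named inputs and outputs, so a later step can read any earlier result.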

3. Memory

Systems for maintaining context across interactions.

Memory Types:

ConversationBufferMemory: Stores all messages
ConversationSummaryMemory: Summarizes older messages
ConversationBufferWindowMemory: Keeps last N m

…(truncated)
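
The windowed-buffer idea from the list above (retain only the most recent entries) can be sketched with a bounded deque. `WindowMemory` is a hypothetical class for illustration, not LangChain's implementation.

```python
from collections import deque
from typing import Deque, Tuple


class WindowMemory:
    """Keeps only the k most recent (role, content) pairs."""

    def __init__(self, k: int) -> None:
        # A deque with maxlen silently drops the oldest item when full.
        self._messages: Deque[Tuple[str, str]] = deque(maxlen=k)

    def add(self, role: str, content: str) -> None:
        self._messages.append((role, content))

    def as_prompt(self) -> str:
        """Render the retained window as text to prepend to the next prompt."""
        return "\n".join(f"{role}: {content}" for role, content in self._messages)
```

A fixed window bounds prompt size at the cost of forgetting early context, which is the trade-off the summary-based variant is meant to soften.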

llm-evaluation

Implement comprehensive evaluation strategies for LLM applications using automated metrics, human feedback, and benchmarking. Use when testing LLM performance, measuring AI application quality, or establishing evaluation frameworks.

LLM Evaluation

Master comprehensive evaluation strategies for LLM applications, from automated metrics to human evaluation and A/B testing.

When to Use This Skill

Measuring LLM application performance systematically
Comparing different models or prompts
Detecting performance regressions before deployment
Validating improvements from prompt changes
Building confidence in production systems
Establishing baselines and tracking progress over time
Debugging unexpected model behavior

Core Evaluation Types

1. Automated Metrics

Fast, repeatable, scalable evaluation using computed scores.

Text Generation:

BLEU: N-gram overlap (translation)
ROUGE: Recall-oriented (summarization)
METEOR: Unigram matching with stemming and synonym support
BERTScore: Embedding-based similarity
Perplexity: How well a language model predicts the text (lower is better)
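
As a rough illustration of two of these metrics, the sketch below computes clipped unigram precision and recall (the building blocks behind BLEU-1 and ROUGE-1, without brevity penalty or higher-order n-grams) and perplexity from per-token probabilities. The function names and whitespace tokenization are simplifying assumptions; established evaluation libraries are the better choice for reported results.

```python
import math
from collections import Counter
from typing import List, Tuple


def unigram_overlap(candidate: str, reference: str) -> Tuple[float, float]:
    """Return (precision, recall) over clipped unigram matches."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Counter intersection clips each word's count to the smaller side.
    overlap = sum((cand & ref).values())
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    return precision, recall


def perplexity(token_probs: List[float]) -> float:
    """exp of the mean negative log-probability per token."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)
```

Precision penalizes extra words in the candidate, while recall penalizes missing reference words, which is why translation and summarization metrics weight them differently.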

Classification:

Accuracy: Percentage correct
Precision/Recall/F1: Class-specific performance
Confusion Matrix: Error patterns
AUC-ROC: Ranking quality
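
For the classification metrics above, a minimal from-scratch sketch for the binary case follows. The function name and the convention of returning 0.0 when a denominator is empty are assumptions made here.

```python
from typing import Sequence, Tuple


def precision_recall_f1(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Tuple[float, float, float]:
    """Compute precision, recall and F1 for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1
```

With labels `[1, 0, 1, 1, 0]` and predictions `[1, 1, 0, 1, 0]` there are two true positives, one false positive, and one false negative, so all three scores come out to 2/3.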

Retrieval (RAG):

MRR: Mean Reciprocal Rank
NDCG: Normalized Discounted Cumulative Gain
Precision@K: Relevant in top K
Recall@K: Coverage in top K
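
The retrieval metrics above can be sketched for a single query as follows, assuming binary relevance (a document is either relevant or not). Function names are illustrative; MRR is the mean of `reciprocal_rank` across queries.

```python
import math
from typing import List, Sequence, Set


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant document, or 0.0 if none is found."""
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs: List[Sequence[str]], relevants: List[Set[str]]) -> float:
    """Average reciprocal rank over several queries."""
    scores = [reciprocal_rank(r, rel) for r, rel in zip(runs, relevants)]
    return sum(scores) / len(scores) if scores else 0.0


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    return sum(doc in relevant for doc in ranked[:k]) / k


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    if not relevant:
        return 0.0
    return sum(doc in relevant for doc in ranked[:k]) / len(relevant)


def ndcg_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """DCG with binary gains, normalized by the best achievable DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc in enumerate(ranked[:k], start=1)
        if doc in relevant
    )
    ideal = sum(1.0 / math.log2(rank + 1) for rank in range(1, min(len(relevant), k) + 1))
    return dcg / ideal if ideal else 0.0
```

Precision@K and Recall@K ignore ordering within the top K, whereas MRR and NDCG reward placing relevant documents earlier.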

2. Human Evaluation

Manual assessment for quality aspects difficult to automate.

Dimensions:

Accuracy: Factual correctness
Coherence: Logical flow
Relevance: Answers the question
Fluency: Natural language quality
Safety: No harmful content
Helpfulness

…(truncated)

prompt-engineering-patterns

Master advanced prompt engineering techniques to maximize LLM performance, reliability, and controllability in production. Use when optimizing prompts, improving LLM outputs, or designing production prompt templates.

Prompt Engineering Patterns

Master advanced prompt engineering techniques to maximize LLM performance, reliability, and controllability.

When to Use This Skill

Designing complex prompts for production LLM applications
Optimizing prompt performance and consistency
Implementing structured reasoning patterns (chain-of-thought, tree-of-thought)
Building few-shot learning systems with dynamic example selection
Creating reusable prompt templates with variable interpolation
Debugging and refining prompts that produce inconsistent outputs
Implementing system prompts for specialized AI assistants

Core Capabilities

1. Few-Shot Learning

Example selection strategies (semantic similarity, diversity sampling)
Balancing example count with context window constraints
Constructing effective demonstrations with input-output pairs
Dynamic example retrieval from knowledge bases
Handling edge cases through strategic example selection
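
Dynamic example selection can be sketched as "rank the example pool by similarity to the query, then keep the top k". The sketch below uses word-set Jaccard overlap purely as a stand-in for embedding similarity; that substitution, the function names, and the `Input:`/`Output:` layout are all assumptions for illustration.

```python
from typing import List, Tuple

Example = Tuple[str, str]  # (input text, expected output)


def jaccard(a: str, b: str) -> float:
    """Word-set overlap; a cheap proxy for semantic similarity."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def select_examples(query: str, pool: List[Example], k: int) -> List[Example]:
    """Pick the k pool examples whose inputs are most similar to the query."""
    return sorted(pool, key=lambda ex: jaccard(query, ex[0]), reverse=True)[:k]


def build_prompt(query: str, pool: List[Example], k: int = 2) -> str:
    """Assemble demonstrations followed by the unanswered query."""
    shots = select_examples(query, pool, k)
    blocks = [f"Input: {text}\nOutput: {label}" for text, label in shots]
    blocks.append(f"Input: {query}\nOutput:")
    return "\n\n".join(blocks)
```

The `k` parameter is where example count is traded against context window budget: each extra demonstration costs tokens on every call.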

2. Chain-of-Thought Prompting

Step-by-step reasoning elicitation
Zero-shot CoT with “Let’s think step by step”
Few-shot CoT with reasoning traces
Self-consistency techniques (sampling multiple reasoning paths)
Verification and validation steps
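
Self-consistency, as listed above, samples several reasoning paths and keeps the most common final answer. A minimal sketch follows; `sample_fn` is a hypothetical callable that would wrap a model call with non-zero temperature, and the assumption that each trace ends with `Answer: <value>` is a convention chosen here.

```python
from collections import Counter
from typing import Callable, Tuple


def extract_answer(reasoning: str) -> str:
    """Take whatever follows the last 'Answer:' marker in a reasoning trace."""
    return reasoning.rsplit("Answer:", 1)[-1].strip()


def self_consistent_answer(
    sample_fn: Callable[[str], str], prompt: str, n: int = 5
) -> Tuple[str, float]:
    """Sample n traces and return (majority answer, share of votes)."""
    answers = [extract_answer(sample_fn(prompt)) for _ in range(n)]
    answer, votes = Counter(answers).most_common(1)[0]
    return answer, votes / n
```

The vote share doubles as a rough confidence signal: a low share suggests the prompt or the question is ambiguous enough to merit a verification step.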

3. Prompt Optimization

Iterative refinement workflows
A/B testing prompt variations
Measuring prompt performance metrics (accuracy, consistency, latency)
Reducing token usage while maintaining quality
Handling edge cases and failure modes

4. Template Sys

…(truncated)

rag-implementation

Build Retrieval-Augmented Generation (RAG) systems for LLM applications with vector databases and semantic search. Use when implementing knowledge-grounded AI, building document Q&A systems, or integrating LLMs with external knowledge bases.

RAG Implementation

Master Retrieval-Augmented Generation (RAG) to build LLM applications that provide accurate, grounded responses using external knowledge sources.

When to Use This Skill

Building Q&A systems over proprietary documents
Creating chatbots with current, factual information
Implementing semantic search with natural language queries
Reducing hallucinations with grounded responses
Enabling LLMs to access domain-specific knowledge
Building documentation assistants
Creating research tools with source citation

Core Components

1. Vector Databases

Purpose: Store and retrieve document embeddings efficiently

Options:

Pinecone: Managed, scalable, fast queries
Weaviate: Open-source, hybrid search
Milvus: High performance, on-premise
Chroma: Lightweight, easy to use
Qdrant: Fast, filtered search
FAISS: Meta’s library, local deployment
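
What all of these stores do at their core is nearest-neighbor lookup over vectors. The toy sketch below shows that operation with brute-force cosine similarity; `InMemoryVectorStore` is a hypothetical class for illustration and does not reflect the API of any product listed above, which add indexing, persistence, and filtering on top.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemoryVectorStore:
    """Brute-force store: O(n) scan per query, fine for small collections."""

    def __init__(self) -> None:
        self._items: List[Tuple[str, List[float]]] = []

    def add(self, doc_id: str, vector: Sequence[float]) -> None:
        self._items.append((doc_id, list(vector)))

    def query(self, vector: Sequence[float], top_k: int = 3) -> List[Tuple[str, float]]:
        """Return the top_k (doc_id, similarity) pairs, best first."""
        scored = [(doc_id, cosine(vector, v)) for doc_id, v in self._items]
        return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]
```

Dedicated vector databases replace the linear scan with approximate nearest-neighbor indexes, which is what makes them practical at millions of documents.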

2. Embeddings

Purpose: Convert text to numerical vectors for similarity search

Models:

text-embedding-ada-002 (OpenAI): General purpose, 1536 dims
all-MiniLM-L6-v2 (Sentence Transformers): Fast, lightweight
e5-large-v2: High quality, English (multilingual-e5-large covers multilingual text)
Instructor: Task-specific instructions
bge-large-en-v1.5: Strong results on English retrieval benchmarks

3. Retrieval Strategies

Approaches:

Dense Retrieval: Semantic similarity via embeddings
Sparse Retrieval: Keyword matching (BM25, TF-IDF)
Hybrid Search: Combine dense + sparse
**Mul

…(truncated)
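
One common way to combine dense and sparse results in hybrid search is reciprocal rank fusion (RRF), which merges ranked lists without needing their scores to be comparable. The source does not say which fusion method the skill uses, so treat this as one possible approach; the function name and the conventional constant `k = 60` are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse ranked lists: each list contributes 1 / (k + rank) per document."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)
```

For example, fusing a dense ranking `["d1", "d2", "d3"]` with a sparse ranking `["d3", "d1", "d4"]` puts `d1` first, since it ranks highly in both lists, followed by `d3`.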

Tags: ai-ml llm ai prompt-engineering langchain gpt claude
