sglang

Fast structured generation and serving for LLMs with RadixAttention prefix caching. Use it for JSON/regex-constrained outputs, constrained decoding, agentic workflows with tool calls, or prefix-sharing workloads where it can run up to 5x faster than vLLM. Powers 300,000+ GPUs at xAI, AMD, NVIDIA, and LinkedIn.
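RadixAttention keeps the KV cache of already-processed token sequences in a radix tree. A new request that shares a prefix with earlier ones, such as the same system prompt, can then reuse that prefix and only prefill the remaining tokens. The sketch below is a simplified, hypothetical illustration of that lookup in plain Python, assuming integer token ids. It uses an uncompressed trie, stores no real KV tensors and performs no eviction. It is not SGLang's implementation or API, and `PrefixCache` is a made-up name.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Children keyed by the next token id. A true radix tree would merge
    # single-child chains into one edge; this sketch skips that compression.
    children: dict[int, "Node"] = field(default_factory=dict)


class PrefixCache:
    """Hypothetical toy cache: tracks which token prefixes were already seen,
    standing in for the KV-cache entries RadixAttention would reuse."""

    def __init__(self) -> None:
        self.root = Node()

    def match_prefix(self, tokens: list[int]) -> int:
        """Return how many leading tokens are already cached."""
        node, matched = self.root, 0
        for tok in tokens:
            child = node.children.get(tok)
            if child is None:
                break
            node, matched = child, matched + 1
        return matched

    def insert(self, tokens: list[int]) -> None:
        """Record every prefix of `tokens` as cached."""
        node = self.root
        for tok in tokens:
            node = node.children.setdefault(tok, Node())


cache = PrefixCache()
system = [1, 2, 3, 4]  # shared system prompt (made-up token ids)
req_a = system + [10, 11]
req_b = system + [20]

for req in (req_a, req_b):
    reused = cache.match_prefix(req)
    # Only the uncached suffix would need a new prefill forward pass.
    print(f"reuse {reused} tokens, prefill {len(req) - reused}")
    cache.insert(req)
```

Running it prints `reuse 0 tokens, prefill 6` for the first request and `reuse 4 tokens, prefill 1` for the second, because the second request only adds one token after the shared system prompt. A production cache also has to bound memory, for example by evicting least-recently-used leaves, which this sketch leaves out.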

View on GitHub
Author: Orchestra Research
Namespace: @zechenzhangAGI/ai-research-skills
Category: general
Version: 1.0.0
Stars: 735
Downloads: 2


Installation

npx claude-plugins install @zechenzhangAGI/ai-research-skills/sglang

Contents

Folders: references

Files: SKILL.md


Tags: general