SpecAlign Documentation

Documentation Status

SpecAlign is a multi-agent adversarial testing framework for evaluating LLM compliance with safety specifications. It generates synthetic data through red team testing, producing DPO (Direct Preference Optimization) datasets for model alignment.

Key Features

  • Multi-agent adversarial testing: Attacker and Defender agents compete to find specification violations

  • DPO dataset generation: Automatically generates preference pairs for training

  • Configurable providers: Support for OpenAI, Anthropic, local models, and more

  • Rich CLI: Modern command-line interface with progress tracking

  • Flexible configuration: JSON config files with environment variable support

Quick Example

# Install SpecAlign
pip install specalign

# Initialize configuration
specalign config init

# Set API key
export SPECALIGN_OPENAI_API_KEY="your-api-key"

# Run the complete pipeline
specalign run --config config.json

Documentation Contents

Indices and tables