Output Formats

Reference for all SpecAlign output file formats.

Overview

SpecAlign generates several output files during execution:

File                Format      Description
------------------  ----------  -----------------------------
specs.json          JSON        Generated specifications
seeds.json          JSON        Seed prompts for testing
episodes.jsonl      JSON Lines  Full adversarial episode logs
dpo_dataset.json    JSON        DPO preference pairs
context_pool.jsonl  JSON Lines  Successful attack examples
token_stats.jsonl   JSON Lines  Token usage statistics

specs.json

Contains generated specifications with their rules.

Schema:

[
  {
    "id": "spec_001",
    "rules": [
      {
        "id": "R12",
        "text": "Rule description...",
        "category": "safety",
        "stage": "response"
      }
    ],
    "instruction": "Natural language instruction for this spec...",
    "metadata": {
      "created_at": "2024-01-15T10:30:00Z",
      "rule_count": 4
    }
  }
]

Fields:

Field             Type    Description
----------------  ------  -------------------------------------
id                string  Unique specification identifier
rules             array   List of rules in this specification
rules[].id        string  Rule identifier
rules[].text      string  Rule description text
rules[].category  string  Rule category (safety, privacy, etc.)
rules[].stage     string  Rule application stage
instruction       string  Natural language instruction
metadata          object  Additional metadata
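
As a rough illustration of how the schema above might be consumed, the following sketch loads specs.json and groups rule IDs by category. The helper names load_specs and rules_by_category are hypothetical and not part of SpecAlign; only the field names come from the schema.

```python
import json
from pathlib import Path


def load_specs(path):
    """Load specs.json and return the list of specification objects."""
    with Path(path).open(encoding="utf-8") as f:
        return json.load(f)


def rules_by_category(specs):
    """Group rule IDs by their category across all specifications."""
    grouped = {}
    for spec in specs:
        for rule in spec.get("rules", []):
            grouped.setdefault(rule["category"], []).append(rule["id"])
    return grouped
```

Calling rules_by_category(load_specs("output/specs.json")) would return a dictionary keyed by category, assuming the default output directory shown under File Locations.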

seeds.json

Contains seed prompts for adversarial testing.

Schema:

[
  {
    "id": "seed_001",
    "spec_id": "spec_001",
    "prompt": "User prompt text...",
    "target_rules": ["R12", "R15"],
    "metadata": {
      "topic": "financial_advice",
      "difficulty": "medium"
    }
  }
]

Fields:

Field         Type    Description
------------  ------  ----------------------------
id            string  Unique seed identifier
spec_id       string  Associated specification ID
prompt        string  The seed prompt text
target_rules  array   Rules this seed aims to test
metadata      object  Additional seed metadata
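
Because each seed refers to a specification through spec_id, a quick consistency check can catch dangling references. The sketch below assumes that target_rules holds rule IDs belonging to the associated specification, which the schema suggests but does not state outright. The function find_dangling_seeds is a hypothetical helper.

```python
def find_dangling_seeds(specs, seeds):
    """Return (seed_id, problem) pairs for seeds whose references do not resolve.

    Assumption: every entry in target_rules should be a rule ID of the
    specification named by spec_id.
    """
    rule_ids = {s["id"]: {r["id"] for r in s.get("rules", [])} for s in specs}
    problems = []
    for seed in seeds:
        known = rule_ids.get(seed["spec_id"])
        if known is None:
            problems.append((seed["id"], f"unknown spec_id {seed['spec_id']}"))
            continue
        missing = [r for r in seed.get("target_rules", []) if r not in known]
        if missing:
            problems.append((seed["id"], f"unknown rules {missing}"))
    return problems
```

An empty result means every seed points at an existing specification and existing rules.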

episodes.jsonl

Contains complete adversarial episode logs in JSON Lines format, with one JSON object per line. The schema below is pretty-printed for readability.

Schema:

{
  "id": "episode_001",
  "seed_id": "seed_001",
  "spec_id": "spec_001",
  "spec": { ... },
  "success": true,
  "rounds": [
    {
      "round": 1,
      "role": "attacker",
      "attacker_prompt": "Attack prompt...",
      "defender_response": "Response...",
      "safety_result": {
        "violation": true,
        "severity": "high",
        "violated_rules": ["R12"],
        "reasoning": "Explanation...",
        "evidence": "Specific text..."
      },
      "quality_result": {
        "score": 0.85,
        "dimensions": {
          "relevance": 0.9,
          "completeness": 0.8,
          "clarity": 0.85
        }
      }
    }
  ],
  "attack_round": { ... },
  "violated_rules": ["R12", "R15"],
  "compliant_response": "Generated compliant response...",
  "metadata": {
    "total_rounds": 3,
    "tokens_used": 2500,
    "duration_seconds": 15.3
  }
}

Key Fields:

Field               Type    Description
------------------  ------  ------------------------------------------
success             bool    Whether attack succeeded
rounds              array   All conversation rounds
attack_round        object  The successful attack round (if any)
violated_rules      array   Rules violated in successful attack
compliant_response  string  Generated specification-compliant response
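
Since episodes.jsonl stores one episode per line, it can be streamed rather than loaded whole. The following is a minimal sketch that reads the file and summarizes the success and violated_rules fields. The helpers iter_jsonl, attack_success_rate, and violation_counts are illustrative names, not SpecAlign functions.

```python
import json
from collections import Counter


def iter_jsonl(path):
    """Yield one parsed JSON object per non-empty line of a JSON Lines file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


def attack_success_rate(episodes):
    """Fraction of episodes whose success flag is true (0.0 if there are none)."""
    episodes = list(episodes)
    if not episodes:
        return 0.0
    return sum(1 for e in episodes if e.get("success")) / len(episodes)


def violation_counts(episodes):
    """Count how often each rule ID appears in violated_rules of successful episodes."""
    counts = Counter()
    for episode in episodes:
        if episode.get("success"):
            counts.update(episode.get("violated_rules", []))
    return dict(counts)
```

The same iter_jsonl helper would also apply to the other JSON Lines outputs, context_pool.jsonl and token_stats.jsonl.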

dpo_dataset.json

Contains DPO preference pairs for training.

Schema:

[
  {
    "prompt": "User query or instruction...",
    "chosen": "Preferred (specification-compliant) response...",
    "rejected": "Dispreferred (violating) response...",
    "metadata": {
      "spec_id": "spec_001",
      "violated_rules": ["R12"],
      "quality_score": 0.87,
      "strategy": "two_step_reframe"
    }
  }
]

Fields:

Field     Type    Description
--------  ------  ---------------------------------
prompt    string  The input prompt
chosen    string  Preferred response (compliant)
rejected  string  Dispreferred response (violating)
metadata  object  Additional pair metadata

Training Format:

For direct use in training, a simplified format without the metadata object is also saved (dpo_dataset_training.json in the default output structure):

[
  {
    "prompt": "...",
    "chosen": "...",
    "rejected": "..."
  }
]
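
Comparing the two schemas, the simplified format looks like the full pairs with the metadata object dropped. Under that assumption, it could be derived from dpo_dataset.json roughly as follows. The function to_training_format and its min_quality parameter are hypothetical, added here only to show how metadata.quality_score might be used as a filter.

```python
def to_training_format(pairs, min_quality=None):
    """Reduce full DPO pairs to prompt/chosen/rejected records.

    If min_quality is given, pairs whose metadata.quality_score is missing
    or below the threshold are skipped (an illustrative filter, not a
    documented SpecAlign behavior).
    """
    simplified = []
    for pair in pairs:
        score = pair.get("metadata", {}).get("quality_score")
        if min_quality is not None and (score is None or score < min_quality):
            continue
        simplified.append({key: pair[key] for key in ("prompt", "chosen", "rejected")})
    return simplified
```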

context_pool.jsonl

Contains successful attack examples for strategy improvement.

Schema:

{
  "id": "ctx_001",
  "prompt": "Successful attack prompt...",
  "response": "Violating response...",
  "spec_id": "spec_001",
  "violated_rules": ["R12"],
  "embedding": [0.123, -0.456, ...],
  "diversity_score": 0.78,
  "timestamp": "2024-01-15T10:30:00Z"
}

Fields:

Field            Type   Description
---------------  -----  --------------------------------------
embedding        array  Vector embedding for similarity search
diversity_score  float  Score indicating uniqueness (0-1)
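
To make the role of the embedding field concrete, here is a small similarity search over context pool entries. The document does not say which similarity metric SpecAlign uses; cosine similarity is assumed here purely for illustration, and both function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(entries, query_embedding, k=3):
    """Return up to k (similarity, entry_id) pairs, most similar first."""
    scored = [
        (cosine_similarity(entry["embedding"], query_embedding), entry["id"])
        for entry in entries
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]
```

For pools of more than a few thousand entries, a vectorized library or an approximate nearest-neighbor index would likely be a better fit than this pure-Python loop.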

token_stats.jsonl

Contains token usage statistics per operation.

Schema:

{
  "timestamp": "2024-01-15T10:30:00Z",
  "operation": "red_team_round",
  "component": "attacker",
  "model": "gpt-4o",
  "prompt_tokens": 1500,
  "completion_tokens": 500,
  "total_tokens": 2000
}
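
One likely use of these records is to total token usage per component. The sketch below works on already-parsed records (one dictionary per line of token_stats.jsonl); tokens_by_component is a hypothetical helper name.

```python
from collections import defaultdict

TOKEN_FIELDS = ("prompt_tokens", "completion_tokens", "total_tokens")


def tokens_by_component(records):
    """Sum the token counters of all records, grouped by the component field."""
    totals = defaultdict(lambda: dict.fromkeys(TOKEN_FIELDS, 0))
    for record in records:
        bucket = totals[record["component"]]
        for field in TOKEN_FIELDS:
            bucket[field] += record[field]
    return dict(totals)
```

Grouping by model or operation instead would only require changing the key used to select the bucket.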

File Locations

Default output structure:

output/
├── specs.json
├── seeds.json
├── episodes.jsonl
├── dpo_dataset.json
├── dpo_dataset_training.json
├── context_pool.jsonl
└── token_stats.jsonl

Custom output directory:

specalign run --output my_experiment/
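
After a run, it can be handy to confirm that the output directory contains the files listed in the default structure. The following sketch does that with the standard library; missing_outputs is a hypothetical helper, and the file list is copied from the tree above.

```python
from pathlib import Path

# File names taken from the default output structure.
EXPECTED_FILES = (
    "specs.json",
    "seeds.json",
    "episodes.jsonl",
    "dpo_dataset.json",
    "dpo_dataset_training.json",
    "context_pool.jsonl",
    "token_stats.jsonl",
)


def missing_outputs(output_dir):
    """Return the expected output file names that are absent from output_dir."""
    root = Path(output_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

For the custom directory used above, missing_outputs("my_experiment/") would list whichever expected files the run has not produced.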