Output Formats

Reference for all SpecAlign output file formats.

Overview

SpecAlign generates several output files during execution:

File	Format	Description
`specs.json`	JSON	Generated specifications
`seeds.json`	JSON	Seed prompts for testing
`episodes.jsonl`	JSON Lines	Full adversarial episode logs
`dpo_dataset.json`	JSON	DPO preference pairs
`dpo_dataset_training.json`	JSON	Simplified DPO pairs (`prompt`, `chosen`, `rejected` only)
`context_pool.jsonl`	JSON Lines	Successful attack examples
`token_stats.jsonl`	JSON Lines	Token usage statistics
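The outputs mix plain JSON (one document per file) and JSON Lines (one object per line), so a small loader that picks the parser from the file extension is convenient. This is a minimal sketch, assuming UTF-8 files; `load_output` is a hypothetical helper, not part of SpecAlign:

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Any


def load_output(path: str | Path) -> Any:
    """Load a SpecAlign output file, choosing the parser from its extension.

    `.jsonl` files hold one JSON object per line and are returned as a list;
    `.json` files hold a single JSON document (a top-level array here).
    """
    path = Path(path)
    with path.open(encoding="utf-8") as f:
        if path.suffix == ".jsonl":
            # Skip blank lines, e.g. a trailing newline at the end of the file.
            return [json.loads(line) for line in f if line.strip()]
        return json.load(f)
```

For example, `load_output("output/episodes.jsonl")` returns a list of episode dicts.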

specs.json

Contains generated specifications with their rules.

Schema:

[
  {
    "id": "spec_001",
    "rules": [
      {
        "id": "R12",
        "text": "Rule description...",
        "category": "safety",
        "stage": "response"
      }
    ],
    "instruction": "Natural language instruction for this spec...",
    "metadata": {
      "created_at": "2024-01-15T10:30:00Z",
      "rule_count": 4
    }
  }
]

Fields:

Field	Type	Description
`id`	string	Unique specification identifier
`rules`	array	List of rules in this specification
`rules[].id`	string	Rule identifier
`rules[].text`	string	Rule description text
`rules[].category`	string	Rule category (safety, privacy, etc.)
`rules[].stage`	string	Stage at which the rule is applied (e.g. `response`)
`instruction`	string	Natural language instruction
`metadata`	object	Additional metadata (e.g. `created_at`, `rule_count`)
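When working with specs.json it is often useful to look rules up by ID or to see how rules are distributed across categories. A minimal sketch with hypothetical helper names (`index_rules`, `category_counts`); it keys rules by spec ID and rule ID together because the schema does not say whether rule IDs such as `R12` are unique across specifications:

```python
from __future__ import annotations

from collections import Counter


def index_rules(specs: list[dict]) -> dict[tuple[str, str], dict]:
    """Map (spec id, rule id) to the rule object for every rule in specs.json.

    The spec id is part of the key because this sketch does not assume
    rule ids are unique across specifications.
    """
    return {
        (spec["id"], rule["id"]): rule
        for spec in specs
        for rule in spec.get("rules", [])
    }


def category_counts(specs: list[dict]) -> Counter:
    """Count rules per category (e.g. "safety", "privacy") across all specs."""
    return Counter(
        rule["category"] for spec in specs for rule in spec.get("rules", [])
    )
```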

seeds.json

Contains seed prompts for adversarial testing.

Schema:

[
  {
    "id": "seed_001",
    "spec_id": "spec_001",
    "prompt": "User prompt text...",
    "target_rules": ["R12", "R15"],
    "metadata": {
      "topic": "financial_advice",
      "difficulty": "medium"
    }
  }
]

Fields:

Field	Type	Description
`id`	string	Unique seed identifier
`spec_id`	string	Associated specification ID
`prompt`	string	The seed prompt text
`target_rules`	array	Rules this seed aims to test
`metadata`	object	Additional seed metadata
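Each seed points back to its specification through `spec_id`, and `target_rules` holds rule IDs from that specification. Resolving a seed's targets to rule text could look like this; `resolve_seed_rules` is a hypothetical helper, not a SpecAlign API:

```python
def resolve_seed_rules(seed: dict, specs: list) -> list:
    """Return the text of each rule a seed targets, looked up via its spec_id.

    Raises KeyError if the seed references a spec id or rule id that is
    not present in the loaded specs.json data.
    """
    spec = next((s for s in specs if s["id"] == seed["spec_id"]), None)
    if spec is None:
        raise KeyError(f"unknown spec_id {seed['spec_id']!r}")
    rules_by_id = {rule["id"]: rule for rule in spec["rules"]}
    # Preserve the order given in target_rules.
    return [rules_by_id[rule_id]["text"] for rule_id in seed["target_rules"]]
```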

episodes.jsonl

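Contains complete adversarial episode logs in JSON Lines format. Each line holds one episode as a single JSON object; the schema below is pretty-printed for readability.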

Schema:

{
  "id": "episode_001",
  "seed_id": "seed_001",
  "spec_id": "spec_001",
  "spec": { ... },
  "success": true,
  "rounds": [
    {
      "round": 1,
      "role": "attacker",
      "attacker_prompt": "Attack prompt...",
      "defender_response": "Response...",
      "safety_result": {
        "violation": true,
        "severity": "high",
        "violated_rules": ["R12"],
        "reasoning": "Explanation...",
        "evidence": "Specific text..."
      },
      "quality_result": {
        "score": 0.85,
        "dimensions": {
          "relevance": 0.9,
          "completeness": 0.8,
          "clarity": 0.85
        }
      }
    }
  ],
  "attack_round": { ... },
  "violated_rules": ["R12", "R15"],
  "compliant_response": "Generated compliant response...",
  "metadata": {
    "total_rounds": 3,
    "tokens_used": 2500,
    "duration_seconds": 15.3
  }
}

Key Fields:

Field	Type	Description
`success`	bool	Whether the attack succeeded
`rounds`	array	All conversation rounds
`attack_round`	object	The successful attack round (if any)
`violated_rules`	array	Rules violated in the successful attack
`compliant_response`	string	Generated specification-compliant response
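Because each line of episodes.jsonl is one episode, run-level numbers can be computed by streaming the file. This sketch (hypothetical `summarize_episodes`) derives the attack success rate from `success` and counts the top-level `violated_rules` of successful episodes, treating a missing field as empty:

```python
import json
from collections import Counter
from typing import Iterable


def summarize_episodes(lines: Iterable[str]) -> dict:
    """Summarize episodes.jsonl: attack success rate and per-rule violation counts.

    `lines` can be an open file handle, since iterating a file yields lines.
    """
    total = 0
    successes = 0
    violations: Counter = Counter()
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines
        episode = json.loads(line)
        total += 1
        if episode["success"]:
            successes += 1
            # Rules broken in the successful attack round.
            violations.update(episode.get("violated_rules", []))
    return {
        "episodes": total,
        "success_rate": successes / total if total else 0.0,
        "violations_by_rule": dict(violations),
    }
```

Usage: `with open("output/episodes.jsonl", encoding="utf-8") as f: print(summarize_episodes(f))`.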

dpo_dataset.json

Contains Direct Preference Optimization (DPO) preference pairs for training.

Schema:

[
  {
    "prompt": "User query or instruction...",
    "chosen": "Preferred (specification-compliant) response...",
    "rejected": "Dispreferred (violating) response...",
    "metadata": {
      "spec_id": "spec_001",
      "violated_rules": ["R12"],
      "quality_score": 0.87,
      "strategy": "two_step_reframe"
    }
  }
]

Fields:

Field	Type	Description
`prompt`	string	The input prompt
`chosen`	string	Preferred response (compliant)
`rejected`	string	Dispreferred response (violating)
`metadata`	object	Additional pair metadata
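Before training, pairs can be filtered on their metadata. A sketch that keeps pairs at or above a `quality_score` threshold and drops pairs whose two responses are identical (they carry no preference signal). The function name `filter_pairs` and the 0.8 default are illustrative assumptions, not SpecAlign settings:

```python
def filter_pairs(pairs: list, min_quality: float = 0.8) -> list:
    """Keep DPO pairs whose metadata.quality_score meets a threshold.

    The 0.8 default is an arbitrary illustrative value. Pairs without a
    quality_score, or with identical chosen and rejected texts, are dropped.
    """
    kept = []
    for pair in pairs:
        score = pair.get("metadata", {}).get("quality_score")
        if score is None or score < min_quality:
            continue
        if pair["chosen"] == pair["rejected"]:
            continue
        kept.append(pair)
    return kept
```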

Training Format:

For direct use in training, a simplified format without `metadata` is also saved to `dpo_dataset_training.json`:

[
  {
    "prompt": "...",
    "chosen": "...",
    "rejected": "..."
  }
]
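The simplified file holds the same pairs with `metadata` removed. SpecAlign already writes it (as `dpo_dataset_training.json` in the default layout); the hypothetical sketch below only illustrates the relationship, for instance to regenerate the file after filtering pairs yourself:

```python
import json


def to_training_format(pairs: list) -> list:
    """Drop metadata, keeping only the prompt/chosen/rejected keys."""
    return [
        {key: pair[key] for key in ("prompt", "chosen", "rejected")}
        for pair in pairs
    ]


def write_training_file(pairs: list, path: str) -> None:
    """Write the simplified pairs as a JSON array (UTF-8, human-readable)."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(to_training_format(pairs), f, ensure_ascii=False, indent=2)
```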

context_pool.jsonl

Contains successful attack examples for strategy improvement.

Schema:

{
  "id": "ctx_001",
  "prompt": "Successful attack prompt...",
  "response": "Violating response...",
  "spec_id": "spec_001",
  "violated_rules": ["R12"],
  "embedding": [0.123, -0.456, ...],
  "diversity_score": 0.78,
  "timestamp": "2024-01-15T10:30:00Z"
}

Key Fields:

Field	Type	Description
`embedding`	array	Vector embedding for similarity search
`diversity_score`	float	Score indicating uniqueness (0-1)
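The schema does not state which similarity measure is applied to `embedding`. Assuming cosine similarity, a brute-force nearest-neighbour lookup over pool entries looks like this; `cosine` and `most_similar` are hypothetical helpers, and for large pools a vector index is the usual alternative to this linear scan:

```python
from __future__ import annotations

import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(query: list[float], pool: list[dict], k: int = 3) -> list[tuple[float, str]]:
    """Return the k pool entries closest to `query` as (similarity, id) pairs."""
    scored = [(cosine(query, entry["embedding"]), entry["id"]) for entry in pool]
    return sorted(scored, reverse=True)[:k]
```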

token_stats.jsonl

Contains token usage statistics per operation.

Schema:

{
  "timestamp": "2024-01-15T10:30:00Z",
  "operation": "red_team_round",
  "component": "attacker",
  "model": "gpt-4o",
  "prompt_tokens": 1500,
  "completion_tokens": 500,
  "total_tokens": 2000
}
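To see where tokens are spent, the records can be summed by any of their string fields. A minimal sketch (hypothetical `tokens_by`) that groups `total_tokens` by `component` by default; pass `key="model"` or `key="operation"` to group differently:

```python
from __future__ import annotations

import json
from collections import defaultdict
from typing import Iterable


def tokens_by(lines: Iterable[str], key: str = "component") -> dict[str, int]:
    """Sum total_tokens from token_stats.jsonl, grouped by a record field."""
    totals: dict[str, int] = defaultdict(int)
    for line in lines:
        if line.strip():  # skip blank lines
            record = json.loads(line)
            totals[record[key]] += record["total_tokens"]
    return dict(totals)
```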

File Locations

Default output structure:

output/
├── specs.json
├── seeds.json
├── episodes.jsonl
├── dpo_dataset.json
├── dpo_dataset_training.json
├── context_pool.jsonl
└── token_stats.jsonl

To write to a custom output directory, pass `--output`:

specalign run --output my_experiment/
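After a run, a quick sanity check is to confirm which expected files exist in the output directory. The file list is copied from the default layout above; `missing_outputs` is a hypothetical helper that simply reports which of these files are absent:

```python
from __future__ import annotations

from pathlib import Path

# File names from the default output layout.
EXPECTED_FILES = [
    "specs.json", "seeds.json", "episodes.jsonl", "dpo_dataset.json",
    "dpo_dataset_training.json", "context_pool.jsonl", "token_stats.jsonl",
]


def missing_outputs(output_dir: str | Path = "output") -> list[str]:
    """Return the expected output files that are not present in output_dir."""
    root = Path(output_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

For the custom directory above, call `missing_outputs("my_experiment/")`.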