Output Formats
Reference for all SpecAlign output file formats.
Overview
SpecAlign generates several output files during execution:
| File | Format | Description |
|---|---|---|
| `specs.json` | JSON | Generated specifications |
| `seeds.json` | JSON | Seed prompts for testing |
| `episodes.jsonl` | JSON Lines | Full adversarial episode logs |
| `dpo_dataset.json` | JSON | DPO preference pairs |
| `context_pool.jsonl` | JSON Lines | Successful attack examples |
| `token_stats.jsonl` | JSON Lines | Token usage statistics |
specs.json
Contains generated specifications with their rules.
Schema:
[
  {
    "id": "spec_001",
    "rules": [
      {
        "id": "R12",
        "text": "Rule description...",
        "category": "safety",
        "stage": "response"
      }
    ],
    "instruction": "Natural language instruction for this spec...",
    "metadata": {
      "created_at": "2024-01-15T10:30:00Z",
      "rule_count": 4
    }
  }
]
Fields:
| Field | Type | Description |
|---|---|---|
| `id` | string | Unique specification identifier |
| `rules` | array | List of rules in this specification |
| `rules[].id` | string | Rule identifier |
| `rules[].text` | string | Rule description text |
| `rules[].category` | string | Rule category (safety, privacy, etc.) |
| `rules[].stage` | string | Rule application stage |
| `instruction` | string | Natural language instruction |
| `metadata` | object | Additional metadata |
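
To make the nesting of rules inside a specification concrete, here is a minimal Python sketch that groups rule IDs by category. The helper name `rules_by_category` and the inline sample are illustrative assumptions, not part of SpecAlign; in practice the list would come from `json.load()` on `specs.json`.

```python
import json
from collections import defaultdict


def rules_by_category(specs):
    """Group rule IDs by category across all specifications."""
    grouped = defaultdict(set)
    for spec in specs:
        for rule in spec.get("rules", []):
            grouped[rule["category"]].add(rule["id"])
    # Sort the IDs so the result is stable between runs.
    return {category: sorted(ids) for category, ids in grouped.items()}


# Inline sample mirroring the schema above (hypothetical data).
sample = json.loads("""
[
  {
    "id": "spec_001",
    "rules": [
      {"id": "R12", "text": "Rule description...",
       "category": "safety", "stage": "response"}
    ],
    "instruction": "Natural language instruction for this spec...",
    "metadata": {"created_at": "2024-01-15T10:30:00Z", "rule_count": 1}
  }
]
""")

print(rules_by_category(sample))  # {'safety': ['R12']}
```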
seeds.json
Contains seed prompts for adversarial testing.
Schema:
[
  {
    "id": "seed_001",
    "spec_id": "spec_001",
    "prompt": "User prompt text...",
    "target_rules": ["R12", "R15"],
    "metadata": {
      "topic": "financial_advice",
      "difficulty": "medium"
    }
  }
]
Fields:
| Field | Type | Description |
|---|---|---|
| `id` | string | Unique seed identifier |
| `spec_id` | string | Associated specification ID |
| `prompt` | string | The seed prompt text |
| `target_rules` | array | Rules this seed aims to test |
| `metadata` | object | Additional seed metadata |
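
Because each seed points at a specification through `spec_id`, it can be useful to index seeds by specification and to spot seeds that reference an unknown ID. The sketch below assumes `spec_id` matches the `id` field in `specs.json`; the function names, `seed_002`, and `spec_404` are hypothetical and used only for illustration.

```python
from collections import defaultdict


def seeds_by_spec(seeds):
    """Map each spec_id to the IDs of the seeds that reference it."""
    index = defaultdict(list)
    for seed in seeds:
        index[seed["spec_id"]].append(seed["id"])
    return dict(index)


def dangling_seeds(seeds, known_spec_ids):
    """Return IDs of seeds whose spec_id is not among known_spec_ids."""
    known = set(known_spec_ids)
    return [seed["id"] for seed in seeds if seed["spec_id"] not in known]


# Hypothetical sample data following the schema above.
seeds = [
    {"id": "seed_001", "spec_id": "spec_001", "prompt": "User prompt text...",
     "target_rules": ["R12", "R15"],
     "metadata": {"topic": "financial_advice", "difficulty": "medium"}},
    {"id": "seed_002", "spec_id": "spec_404", "prompt": "Another prompt...",
     "target_rules": ["R12"], "metadata": {}},
]

print(seeds_by_spec(seeds))
print(dangling_seeds(seeds, ["spec_001"]))
```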
episodes.jsonl
Contains complete adversarial episode logs (JSON Lines format).
Schema:
{
  "id": "episode_001",
  "seed_id": "seed_001",
  "spec_id": "spec_001",
  "spec": { ... },
  "success": true,
  "rounds": [
    {
      "round": 1,
      "role": "attacker",
      "attacker_prompt": "Attack prompt...",
      "defender_response": "Response...",
      "safety_result": {
        "violation": true,
        "severity": "high",
        "violated_rules": ["R12"],
        "reasoning": "Explanation...",
        "evidence": "Specific text..."
      },
      "quality_result": {
        "score": 0.85,
        "dimensions": {
          "relevance": 0.9,
          "completeness": 0.8,
          "clarity": 0.85
        }
      }
    }
  ],
  "attack_round": { ... },
  "violated_rules": ["R12", "R15"],
  "compliant_response": "Generated compliant response...",
  "metadata": {
    "total_rounds": 3,
    "tokens_used": 2500,
    "duration_seconds": 15.3
  }
}
Key Fields:
| Field | Type | Description |
|---|---|---|
| `success` | bool | Whether attack succeeded |
| `rounds` | array | All conversation rounds |
| `attack_round` | object | The successful attack round (if any) |
| `violated_rules` | array | Rules violated in successful attack |
| `compliant_response` | string | Generated specification-compliant response |
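
As a rough illustration of how the key fields can be consumed, the sketch below streams a JSON Lines source and tallies successful attacks and violated rules. It assumes each episode occupies a single line in the file (the schema above is pretty-printed for readability); `iter_jsonl` and `summarize_episodes` are hypothetical helper names, and the sample records are abbreviated.

```python
import io
import json
from collections import Counter


def iter_jsonl(stream):
    """Yield one parsed object per non-empty line of a JSON Lines stream."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


def summarize_episodes(episodes):
    """Return (total, successes, Counter of rule IDs from successful attacks)."""
    total = 0
    successes = 0
    violations = Counter()
    for episode in episodes:
        total += 1
        if episode.get("success"):
            successes += 1
            violations.update(episode.get("violated_rules", []))
    return total, successes, violations


# In-memory stand-in for open("output/episodes.jsonl").
sample = io.StringIO(
    '{"id": "episode_001", "success": true, "violated_rules": ["R12", "R15"]}\n'
    '{"id": "episode_002", "success": false, "violated_rules": []}\n'
)
total, successes, violations = summarize_episodes(iter_jsonl(sample))
print(f"{successes}/{total} attacks succeeded")
print(dict(violations))
```

Streaming line by line keeps memory use flat even when the episode log grows large, which is the usual reason to prefer JSON Lines over a single JSON array for logs.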
dpo_dataset.json
Contains DPO preference pairs for training.
Schema:
[
  {
    "prompt": "User query or instruction...",
    "chosen": "Preferred (specification-compliant) response...",
    "rejected": "Dispreferred (violating) response...",
    "metadata": {
      "spec_id": "spec_001",
      "violated_rules": ["R12"],
      "quality_score": 0.87,
      "strategy": "two_step_reframe"
    }
  }
]
Fields:
| Field | Type | Description |
|---|---|---|
| `prompt` | string | The input prompt |
| `chosen` | string | Preferred response (compliant) |
| `rejected` | string | Dispreferred response (violating) |
| `metadata` | object | Additional pair metadata |
Training Format:
For direct use in training, a simplified format without the `metadata` field is also saved (see `dpo_dataset_training.json` under File Locations):
[
  {
    "prompt": "...",
    "chosen": "...",
    "rejected": "..."
  }
]
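
The relationship between the full and simplified formats can be sketched as a projection onto the three text fields. This is only an illustration of the mapping, not necessarily how SpecAlign produces the file; `to_training_format` and `TRAINING_KEYS` are hypothetical names.

```python
import json

# The three keys kept in the simplified training format.
TRAINING_KEYS = ("prompt", "chosen", "rejected")


def to_training_format(pairs):
    """Return new dicts that keep only the three training keys."""
    return [{key: pair[key] for key in TRAINING_KEYS} for pair in pairs]


# Sample pair copied from the full schema above.
full_pairs = [
    {
        "prompt": "User query or instruction...",
        "chosen": "Preferred (specification-compliant) response...",
        "rejected": "Dispreferred (violating) response...",
        "metadata": {
            "spec_id": "spec_001",
            "violated_rules": ["R12"],
            "quality_score": 0.87,
            "strategy": "two_step_reframe",
        },
    }
]

print(json.dumps(to_training_format(full_pairs), indent=2))
```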
context_pool.jsonl
Contains successful attack examples for strategy improvement.
Schema:
{
  "id": "ctx_001",
  "prompt": "Successful attack prompt...",
  "response": "Violating response...",
  "spec_id": "spec_001",
  "violated_rules": ["R12"],
  "embedding": [0.123, -0.456, ...],
  "diversity_score": 0.78,
  "timestamp": "2024-01-15T10:30:00Z"
}
Fields:
| Field | Type | Description |
|---|---|---|
| `embedding` | array | Vector embedding for similarity search |
| `diversity_score` | float | Score indicating uniqueness (0-1) |
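
The `embedding` field is described as supporting similarity search, but the metric is not specified here. As one plausible reading, the sketch below ranks pool entries by cosine similarity to a query vector. The choice of cosine similarity, the function names, and the two-dimensional toy vectors are all assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query, entries, top_k=3):
    """Return up to top_k (similarity, id) pairs, most similar first."""
    scored = [(cosine_similarity(query, entry["embedding"]), entry["id"])
              for entry in entries]
    scored.sort(reverse=True)
    return scored[:top_k]


# Toy entries with hypothetical IDs and 2-dimensional embeddings.
pool = [
    {"id": "ctx_001", "embedding": [1.0, 0.0]},
    {"id": "ctx_002", "embedding": [0.0, 1.0]},
    {"id": "ctx_003", "embedding": [1.0, 1.0]},
]

for score, entry_id in most_similar([1.0, 0.0], pool, top_k=2):
    print(entry_id, round(score, 3))
```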
token_stats.jsonl
Contains token usage statistics per operation.
Schema:
{
  "timestamp": "2024-01-15T10:30:00Z",
  "operation": "red_team_round",
  "component": "attacker",
  "model": "gpt-4o",
  "prompt_tokens": 1500,
  "completion_tokens": 500,
  "total_tokens": 2000
}
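
Since each record carries its own token counts, usage can be aggregated per component with a few lines of Python. The sketch below is a minimal example under the assumption that every record has the three count fields shown above; `tokens_by_component` is a hypothetical helper, and the `defender` component value in the sample is assumed for illustration.

```python
from collections import defaultdict

COUNT_FIELDS = ("prompt_tokens", "completion_tokens", "total_tokens")


def tokens_by_component(records):
    """Sum the token count fields per component."""
    totals = defaultdict(lambda: dict.fromkeys(COUNT_FIELDS, 0))
    for record in records:
        bucket = totals[record["component"]]
        for field in COUNT_FIELDS:
            bucket[field] += record[field]
    return dict(totals)


# Sample records; the first matches the schema above, the others are hypothetical.
records = [
    {"component": "attacker", "prompt_tokens": 1500,
     "completion_tokens": 500, "total_tokens": 2000},
    {"component": "attacker", "prompt_tokens": 1000,
     "completion_tokens": 200, "total_tokens": 1200},
    {"component": "defender", "prompt_tokens": 800,
     "completion_tokens": 300, "total_tokens": 1100},
]

for component, counts in tokens_by_component(records).items():
    print(component, counts)
```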
File Locations
Default output structure:
output/
├── specs.json
├── seeds.json
├── episodes.jsonl
├── dpo_dataset.json
├── dpo_dataset_training.json
├── context_pool.jsonl
└── token_stats.jsonl
Custom output directory:
specalign run --output my_experiment/
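
After a run, a quick check that the expected files are present can catch an interrupted or misconfigured run early. The sketch below takes the file names from the default output structure listed above; `missing_outputs` is a hypothetical helper, and whether every file is produced on every run is an assumption.

```python
from pathlib import Path

# File names taken from the default output structure.
EXPECTED_FILES = [
    "specs.json",
    "seeds.json",
    "episodes.jsonl",
    "dpo_dataset.json",
    "dpo_dataset_training.json",
    "context_pool.jsonl",
    "token_stats.jsonl",
]


def missing_outputs(output_dir):
    """Return the expected file names that are not present in output_dir."""
    root = Path(output_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


missing = missing_outputs("my_experiment")
if missing:
    print("Missing output files:", ", ".join(missing))
else:
    print("All expected output files are present.")
```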