
3.6 Synthetic Task Protocol

Audience: Hackathon judges, auditors, and protocol researchers who want to reproduce or verify how C-SWON scores synthetic tasks.

Synthetic tasks are the mechanism by which validators maintain a secret ground truth that miners cannot pre-compute. This document describes the full lifecycle — derivation, ground truth inheritance, injection, scoring, and verification — precisely enough that a judge holding the salt can independently reproduce every step.


Why Synthetic Tasks Exist

Because the benchmark task corpus (benchmarks/v1.json) is public, a sophisticated miner could memorise the expected outputs for every task. Synthetic injection prevents this by mutating a real task into a variant whose modified constraints require genuinely re-planning the workflow, while keeping the reference answer (ground truth) from the original task.

The design goal is:

  • Miners cannot identify synthetic tasks — the task ID and constraints change; the description is unchanged
  • Validators can always score them — the reference answer is derived deterministically and never leaves the validator
  • Judges can reproduce the derivation — the algorithm is fully documented here; only the salt value is secret

Step 1 — Derivation Algorithm

The derivation happens inside _generate_synthetic_task() in cswon/validator/forward.py.

import copy
import hashlib

# CSWON_SYNTHETIC_SALT is the validator's secret salt, defined elsewhere in the module.

def _generate_synthetic_task(validator_hotkey: str, block: int, real_task: dict) -> dict:
    seed = f"{CSWON_SYNTHETIC_SALT}:{validator_hotkey}:{block}".encode()
    h = hashlib.sha256(seed).hexdigest()[:8]  # 8-hex suffix for the task_id

    synthetic = copy.deepcopy(real_task)  # deep copy, reference included
    synthetic["task_id"] = f"syn_{h}"  # new opaque ID
    synthetic["type"] = "synthetic"

    # Tighten constraints by 15% to stress-test miners
    constraints = synthetic.get("constraints", {})
    synthetic["constraints"] = {
        **constraints,
        "max_budget_tao": constraints.get("max_budget_tao", 0.02) * 0.85,
        "max_latency_seconds": constraints.get("max_latency_seconds", 15) * 0.85,
    }
    return synthetic

What changes vs. the real task

| Field | Real task | Synthetic variant |
|---|---|---|
| task_id | "code_001" | "syn_739d3947" (salt-keyed) |
| type | "code" | "synthetic" |
| description | unchanged | unchanged |
| max_budget_tao | e.g. 0.02 | 0.02 × 0.85 = 0.017 |
| max_latency_seconds | e.g. 15 | 15 × 0.85 = 12.75 |
| reference | unchanged (full ground truth retained) | unchanged |

Key point: The reference field (containing test_suite, reference_answer, goal_checklist, or expected_output) is copied verbatim from the real task. The ground truth never changes — only the operational constraints are tightened.
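The arithmetic in the table and the inheritance of the reference can be checked with a small standalone sketch. The salt, hotkey, and task below are hypothetical placeholders, and `derive_synthetic` is an illustrative restatement of the derivation that takes the salt as a parameter; it is not the validator's actual code.

```python
import copy
import hashlib
import math


def derive_synthetic(salt: str, validator_hotkey: str, block: int, real_task: dict) -> dict:
    """Illustrative restatement of the derivation; the salt is passed explicitly."""
    seed = f"{salt}:{validator_hotkey}:{block}".encode()
    suffix = hashlib.sha256(seed).hexdigest()[:8]

    synthetic = copy.deepcopy(real_task)
    synthetic["task_id"] = f"syn_{suffix}"
    synthetic["type"] = "synthetic"
    constraints = synthetic.get("constraints", {})
    synthetic["constraints"] = {
        **constraints,
        "max_budget_tao": constraints.get("max_budget_tao", 0.02) * 0.85,
        "max_latency_seconds": constraints.get("max_latency_seconds", 15) * 0.85,
    }
    return synthetic


# Hypothetical task, shaped like the rows of the table (not taken from v1.json).
real_task = {
    "task_id": "code_001",
    "type": "code",
    "description": "Merge two sorted lists.",
    "constraints": {"max_budget_tao": 0.02, "max_latency_seconds": 15},
    "reference": {"test_suite": "assert merge_sorted([1], [2]) == [1, 2]"},
}

syn = derive_synthetic("example-salt", "example-hotkey", 100, real_task)
again = derive_synthetic("example-salt", "example-hotkey", 100, real_task)

print(syn["task_id"] == again["task_id"])             # same inputs, same ID
print(syn["reference"] == real_task["reference"])     # ground truth inherited
print(math.isclose(syn["constraints"]["max_budget_tao"], 0.017))
print(math.isclose(syn["constraints"]["max_latency_seconds"], 12.75))
```

The comparisons use `math.isclose` because multiplying by 0.85 in floating point may not produce exactly 0.017. Because the reference is deep-copied, later changes to the synthetic task cannot alter the real task's ground truth.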


Step 2 — Injection Decision

Once the real task is selected via VRF (§2 of the forward pass), the validator decides whether to inject:

import random

SYNTHETIC_INJECTION_RATE = 0.175  # 17.5% midpoint of the 15–20% design range

def _maybe_inject_synthetic(task, validator_hotkey, block):
    # Unseeded draw: each validator decides independently at each block
    if random.random() < SYNTHETIC_INJECTION_RATE:
        return _generate_synthetic_task(validator_hotkey, block, task)
    return task

The draw uses Python's random.random(), which is not seeded per block, so each validator decides independently whether to inject. At any given block, some validators may run a synthetic task while others do not. Over a long window, approximately 17.5% of all evaluations will be synthetic.
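How quickly the observed fraction settles near 17.5% can be illustrated with a short simulation. This is a sketch only: it models each evaluation as an independent draw against the same threshold, and it uses a fixed seed purely so the output is repeatable, whereas the validator's own draw is unseeded. The function name is hypothetical.

```python
import random

SYNTHETIC_INJECTION_RATE = 0.175


def simulate_injection_fraction(n_evaluations: int, rng: random.Random) -> float:
    """Return the fraction of simulated evaluations that would be injected."""
    injected = sum(rng.random() < SYNTHETIC_INJECTION_RATE for _ in range(n_evaluations))
    return injected / n_evaluations


rng = random.Random(0)  # fixed seed only to make this sketch repeatable
for n in (20, 1_000, 100_000):
    print(n, round(simulate_injection_fraction(n, rng), 4))
```

Small windows can deviate a long way from 17.5%, so a handful of log lines says little about the rate. Only a long window gives a tight estimate.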


Step 3 — Ground Truth and Scoring

Because the reference dict is a deep copy of the real task's reference, scoring is identical to the non-synthetic path for three of the four task types:

| Task type | Scoring function | Ground truth source |
|---|---|---|
| code | pytest pass rate + PEP8 lint | reference["test_suite"] (unchanged) |
| rag | ROUGE-L F1 | reference["reference_answer"] (unchanged) |
| agent | checklist pass rate | reference["goal_checklist"] (unchanged) |
| data_transform | schema + exact match | reference["expected_output"] (unchanged) |

Structural blend (synthetic-only)

If the real task's reference contains an "optimal_workflow" key (a pre-computed ideal DAG), the validator blends the standard quality score with a structural DAG similarity score:

if is_synthetic and "optimal_workflow" in reference:
    optimal = reference["optimal_workflow"]
    miner_nodes = {n["subnet"] for n in response.workflow_plan.get("nodes", [])}
    optimal_nodes = {n["subnet"] for n in optimal.get("nodes", [])}

    struct_score = len(miner_nodes & optimal_nodes) / max(1, len(optimal_nodes))

    # 50% standard quality, 50% structural DAG match
    output_quality = 0.5 * output_quality + 0.5 * struct_score

Tasks in v1.json that do not carry an "optimal_workflow" key receive no structural component — pure quality scoring only.
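A worked example makes the blend concrete. The sketch below wraps the same two formulas in standalone functions; the function names, the subnet labels, and the 0.9 quality score are hypothetical values chosen for illustration.

```python
def structural_score(miner_plan: dict, optimal_plan: dict) -> float:
    """Fraction of the optimal plan's subnets that also appear in the miner's plan."""
    miner_nodes = {n["subnet"] for n in miner_plan.get("nodes", [])}
    optimal_nodes = {n["subnet"] for n in optimal_plan.get("nodes", [])}
    return len(miner_nodes & optimal_nodes) / max(1, len(optimal_nodes))


def blended_quality(output_quality: float, struct_score: float) -> float:
    """Equal-weight blend of standard quality and structural match."""
    return 0.5 * output_quality + 0.5 * struct_score


# Hypothetical plans; the subnet labels are placeholders.
optimal = {"nodes": [{"subnet": "sn1"}, {"subnet": "sn4"}, {"subnet": "sn13"}]}
miner = {"nodes": [{"subnet": "sn1"}, {"subnet": "sn4"}, {"subnet": "sn22"}]}

s = structural_score(miner, optimal)  # 2 of the 3 optimal subnets are matched
q = blended_quality(0.9, s)
print(f"struct={s:.4f} blended={q:.4f}")  # struct=0.6667 blended=0.7833
```

As written, the structural score compares only the set of subnets used. Edges and ordering are ignored, extra miner nodes are not penalised, and an empty optimal plan yields a score of 0.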


Step 4 — Verification Without the Salt

Judges who do not hold the salt can still verify the mechanism is sound:

4a. Verify the injection rate from logs

docs/7.2-validator-logs.md shows three sequential log lines from the testnet run:

Selected task: code_001 type=code synthetic=False at block 6804728
Selected task: code_012 type=code synthetic=False at block 6804729
Selected task: syn_739d3947 type=code synthetic=True at block 6804744

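These lines show that synthetic injection is active and that the task ID obfuscation (syn_<8-hex>) is working. Three lines are too few to confirm the 17.5% rate itself; that requires counting synthetic=True entries over a much longer stretch of the log.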
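A judge could estimate the rate from a longer log with a short script along these lines. It is a sketch that assumes every relevant line follows exactly the format of the three lines quoted above; the regular expressions and the `summarize` helper are illustrative, not part of the C-SWON codebase.

```python
import re

# Assumed line format, inferred from the quoted log lines.
LOG_PATTERN = re.compile(
    r"Selected task:\s+(?P<task_id>\S+)\s+type=(?P<type>\S+)\s+"
    r"synthetic=(?P<synthetic>True|False)\s+at block\s+(?P<block>\d+)"
)
SYN_ID = re.compile(r"syn_[0-9a-f]{8}")


def summarize(log_text: str) -> dict:
    """Count evaluations and synthetic injections, and flag malformed synthetic IDs."""
    total = 0
    synthetic = 0
    malformed = []
    for m in LOG_PATTERN.finditer(log_text):
        total += 1
        if m["synthetic"] == "True":
            synthetic += 1
            if not SYN_ID.fullmatch(m["task_id"]):
                malformed.append(m["task_id"])
    rate = synthetic / total if total else 0.0
    return {"total": total, "synthetic": synthetic, "rate": rate, "malformed": malformed}


sample = """\
Selected task: code_001 type=code synthetic=False at block 6804728
Selected task: code_012 type=code synthetic=False at block 6804729
Selected task: syn_739d3947 type=code synthetic=True at block 6804744
"""
summary = summarize(sample)
print(summary)
```

On the three-line sample the observed rate is 1 in 3, which illustrates why a short excerpt cannot pin down the rate. With n evaluations the standard error of the estimate is roughly sqrt(0.175 × 0.825 / n), or about 0.012 at n = 1000.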

4b. Verify task_id derivation is reproducible

Unlike the other checks in this step, this one requires the salt. Given the salt, any party can reproduce the syn_739d3947 ID:

import hashlib

salt = "<CSWON_SYNTHETIC_SALT>" # provided to judges under NDA or post-hackathon
validator_hk = "5GYi8aRkGCqQH8YScK4yYDkfZx6DtLVz3G5WJigwwbennZz8" # UID 1 hotkey
block = 6804744

seed = f"{salt}:{validator_hk}:{block}".encode()
h = hashlib.sha256(seed).hexdigest()[:8]
print(f"syn_{h}") # → must print syn_739d3947

If this matches the log, the derivation is confirmed end-to-end.

4c. Verify ground truth is unchanged

The synthetic task's reference is a deep copy of the real task. Since code_001 is public in benchmarks/v1.json, a judge can confirm that the scoring reference for syn_739d3947 is exactly the reference block of code_001 — only max_budget_tao and max_latency_seconds change.
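If a judge has both task dicts available, the "only these fields change" claim reduces to a field-by-field comparison. The sketch below uses hypothetical dicts shaped like the tables in this document; `changed_fields` is an illustrative helper, and the reference content is a placeholder rather than the real code_001 entry.

```python
def changed_fields(before: dict, after: dict) -> set:
    """Return the keys whose values differ between two dicts."""
    keys = before.keys() | after.keys()
    return {k for k in keys if before.get(k) != after.get(k)}


# Hypothetical real/synthetic pair; values are placeholders.
real_task = {
    "task_id": "code_001",
    "type": "code",
    "description": "Merge two sorted lists.",
    "constraints": {"max_budget_tao": 0.02, "max_latency_seconds": 15},
    "reference": {"test_suite": "assert merge_sorted([1], [2]) == [1, 2]"},
}
synthetic_task = {
    "task_id": "syn_739d3947",
    "type": "synthetic",
    "description": "Merge two sorted lists.",
    "constraints": {"max_budget_tao": 0.017, "max_latency_seconds": 12.75},
    "reference": {"test_suite": "assert merge_sorted([1], [2]) == [1, 2]"},
}

top_level = changed_fields(real_task, synthetic_task)
constraint_level = changed_fields(real_task["constraints"], synthetic_task["constraints"])

print(sorted(top_level))         # ['constraints', 'task_id', 'type']
print(sorted(constraint_level))  # ['max_budget_tao', 'max_latency_seconds']
```

Neither "reference" nor "description" appears in the diff, which is the property this check is meant to establish.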

4d. Verify scoring output

With the confirmed reference, a judge can run scoring directly:

import json
import os

os.environ["CSWON_MOCK_EXEC"] = "false"

from cswon.validator.reward import score_output_quality

# Reference from benchmarks/v1.json → code_001 (assumed to be the first entry)
with open("benchmarks/v1.json") as f:
    reference = json.load(f)[0]["reference"]

# Hypothetical miner output; the embedded code keeps its own indentation
output = {
    "text": "",
    "artifacts": {
        "code": (
            "def merge_sorted(a, b):\n"
            "    result = []\n"
            "    i = j = 0\n"
            "    while i < len(a) and j < len(b):\n"
            "        if a[i] <= b[j]: result.append(a[i]); i += 1\n"
            "        else: result.append(b[j]); j += 1\n"
            "    result.extend(a[i:] or b[j:])\n"
            "    return result\n"
        )
    }
}

score = score_output_quality("code", output, reference)
print(f"quality score: {score:.4f}")  # expected: ≥ 0.8

Design Decisions — Frequently Asked Questions

Q: Why is the salt secret rather than public?

If the salt were public, miners could compute SHA256(salt : hotkey : block) for any validator hotkey and any future block, and so enumerate every possible synthetic task ID in advance. The injection draw itself is unseeded (Step 2), so the salt would not reveal which blocks inject, but miners could precompute the ID each validator would assign at a given block and route those evaluations to pre-planned optimal responses.
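The cost of that enumeration is small, as the following sketch suggests. It builds a lookup from synthetic ID back to block for a range of blocks; the salt, hotkey, and block range are hypothetical, and the helper names are illustrative.

```python
import hashlib


def synthetic_id(salt: str, hotkey: str, block: int) -> str:
    """Same ID derivation as in Step 1, with the salt passed explicitly."""
    seed = f"{salt}:{hotkey}:{block}".encode()
    return f"syn_{hashlib.sha256(seed).hexdigest()[:8]}"


def enumerate_ids(salt: str, hotkey: str, start_block: int, n_blocks: int) -> dict:
    """Map each candidate synthetic ID to the block that would produce it."""
    return {
        synthetic_id(salt, hotkey, block): block
        for block in range(start_block, start_block + n_blocks)
    }


# Hypothetical leaked salt and hotkey.
table = enumerate_ids("leaked-salt", "example-hotkey", 6_804_000, 1_000)
probe = synthetic_id("leaked-salt", "example-hotkey", 6_804_744)
print(probe in table)  # True: the ID is known before the block arrives
```

One SHA-256 call per (hotkey, block) pair is all that is needed, so keeping the salt secret is the only barrier to this precomputation.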

Q: Could validators collude to leak the salt?

Yes. This is an acknowledged limitation documented in the audit (§2 Incentive Mechanism). On testnet, all three validators share one coldkey. On mainnet, independent validators hold independent salts — a miner would need to compromise a majority to learn the salt.

Q: What if a validator restarts and generates a new salt?

Using an ephemeral salt (not persisted) means the validator's historical synthetic task IDs cannot be reproduced after restart. The CSWON_SYNTHETIC_SALT must be saved and restored for consistent cross-restart derivation. This is enforced at startup: the validator raises RuntimeError on testnet/mainnet if the salt is not set.
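The startup check described here could look roughly like the sketch below. It assumes the salt is read from an environment variable named CSWON_SYNTHETIC_SALT and that the validator knows its network name; the function name, the `network` parameter, and the ephemeral fallback for local runs are assumptions, not the validator's actual code.

```python
import os
import secrets


def load_synthetic_salt(network: str, env=os.environ) -> str:
    """Return the persisted salt, refusing to start on public networks without one."""
    salt = env.get("CSWON_SYNTHETIC_SALT", "")
    if salt:
        return salt
    if network in ("testnet", "mainnet"):
        raise RuntimeError("CSWON_SYNTHETIC_SALT must be set on testnet/mainnet")
    # Assumed local/dev behaviour: an ephemeral salt that is lost on restart,
    # so synthetic task IDs from this run cannot be reproduced later.
    return secrets.token_hex(32)


# A local run without a configured salt gets a fresh 64-hex-character value.
local_salt = load_synthetic_salt("local", env={})
print(len(local_salt))  # 64
```

Failing at startup is cheaper than discovering after a restart that historical synthetic IDs can no longer be re-derived.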

Q: Does the tightened budget/latency actually change the ground truth?

No. Only the operational constraints tighten. The scoring reference (test suite, reference answer, checklist, or expected output) is identical to the real task. A miner who generates correct output for the real task also generates correct output for the synthetic variant — the synthetic is harder to run cheaply and quickly, not harder to solve correctly.

