3.2 Scoring Formula v1.0.0
All scoring is deterministic. No external model is called. Every validator running the same benchmark task produces identical scores for identical miner outputs, so any participant can audit a score by recomputing it.
Low evaluation variance: averaging over a rolling 100-task, equal-weight window keeps per-task scoring noise from blunting the incentive curve around the top miner.
Composite Score
Every workflow a miner submits is executed by validators. A composite score S ∈ [0, 1] is computed across four dimensions:
S = 0.50 × S_success + 0.25 × S_cost + 0.15 × S_latency + 0.10 × S_reliability
Sub-Dimensions
S_success = output_quality_score × completion_ratio
where completion_ratio = steps_completed / total_steps_in_dag
S_cost = max(0, 1 − actual_tao / max_budget_tao)
only scored when S_success > 0.7; else S_cost = 0
S_latency = max(0, 1 − actual_seconds / max_latency_seconds)
only scored when S_success > 0.7; else S_latency = 0
S_reliability = min(1.0, max(0, 1 − (unplanned_retries × 0.10
+ timeouts × 0.20
+ hard_failures × 0.50)))
applied regardless of success gate
Where:
unplanned_retries = max(0, actual_retries − declared_retry_budget)
declared_retry_budget = sum of retry_count values in error_handling (0 if absent)
timeouts = step executions that exceeded declared timeout_seconds
hard_failures = steps that terminated after exhausting the declared retry budget
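The formulas above can be sketched in Python as follows. This is a minimal illustration, not the validator's actual code: the `RunMetrics` container and the function names are hypothetical, and the sketch assumes the per-run counts (retries, timeouts, hard failures) have already been collected and that the budget and latency limits are non-zero.

```python
from dataclasses import dataclass

SUCCESS_GATE = 0.7  # cost and latency are only scored when S_success > 0.7


@dataclass
class RunMetrics:
    """Hypothetical container for one run; field names mirror the formulas."""
    output_quality_score: float
    steps_completed: int
    total_steps_in_dag: int
    actual_tao: float
    max_budget_tao: float
    actual_seconds: float
    max_latency_seconds: float
    actual_retries: int
    declared_retry_budget: int
    timeouts: int
    hard_failures: int


def sub_scores(m: RunMetrics) -> dict:
    """Return the four sub-dimension scores, each in [0, 1]."""
    completion_ratio = m.steps_completed / m.total_steps_in_dag
    s_success = m.output_quality_score * completion_ratio
    gate_open = s_success > SUCCESS_GATE

    # Gated dimensions collapse to 0 when the success gate is closed.
    s_cost = max(0.0, 1 - m.actual_tao / m.max_budget_tao) if gate_open else 0.0
    s_latency = (
        max(0.0, 1 - m.actual_seconds / m.max_latency_seconds) if gate_open else 0.0
    )

    # Only retries beyond the declared budget are penalized.
    unplanned_retries = max(0, m.actual_retries - m.declared_retry_budget)
    penalty = unplanned_retries * 0.10 + m.timeouts * 0.20 + m.hard_failures * 0.50
    s_reliability = min(1.0, max(0.0, 1 - penalty))

    return {
        "success": s_success,
        "cost": s_cost,
        "latency": s_latency,
        "reliability": s_reliability,
    }


def composite_score(m: RunMetrics) -> float:
    """Weighted sum S = 0.50*success + 0.25*cost + 0.15*latency + 0.10*reliability."""
    s = sub_scores(m)
    return (
        0.50 * s["success"]
        + 0.25 * s["cost"]
        + 0.15 * s["latency"]
        + 0.10 * s["reliability"]
    )
```

For instance, a fully completed run with quality 0.9, half the budget and half the latency limit used, and one retry beyond a declared budget of two would score 0.45 + 0.125 + 0.075 + 0.09 = 0.74 under this sketch.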
Performance Dimensions
| Dimension | Weight | Formula |
|---|---|---|
| Task Success | 50% | output_quality × completion_ratio |
| Cost Efficiency | 25% | max(0, 1 − actual/budget) — gated at S_success > 0.7 |
| Latency | 15% | max(0, 1 − actual_s/max_s) — gated at S_success > 0.7 |
| Reliability | 10% | min(1.0, max(0, 1 − unplanned×0.1 − timeouts×0.2 − failures×0.5)) |
Key Design Decisions
Success-first gating: A workflow that fails the task cannot be considered "good" regardless of how cheap or fast it is. Cost and latency are only scored when S_success > 0.7.
Declared retries are free: A miner who writes "retry_count": 2 in error_handling is declaring defensive intent. Only unplanned retries — attempts beyond the declared budget — are penalized.
Partial DAG completion: If a 4-step workflow completes 3 steps before a hard failure, completion_ratio = 0.75. This prevents miners from submitting single-step workflows for multi-step tasks.
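To make the interaction between partial completion and the success gate concrete, here is a small worked example in Python. Only the 3-of-4 completion with one hard failure comes from the text; the quality values and the raw cost and latency scores of 0.5 are assumptions chosen for illustration, and the function name is hypothetical.

```python
def composite_for_partial_run(quality, steps_completed, total_steps,
                              raw_cost=0.5, raw_latency=0.5, hard_failures=1):
    """Composite score for a run, assuming no timeouts or unplanned retries.

    raw_cost and raw_latency are the ungated S_cost and S_latency values
    (assumed inputs here, not derived from TAO or seconds).
    """
    completion_ratio = steps_completed / total_steps
    s_success = quality * completion_ratio
    gate_open = s_success > 0.7  # success-first gate
    s_cost = raw_cost if gate_open else 0.0
    s_latency = raw_latency if gate_open else 0.0
    s_reliability = max(0.0, 1 - 0.50 * hard_failures)
    return 0.50 * s_success + 0.25 * s_cost + 0.15 * s_latency + 0.10 * s_reliability


# 3 of 4 steps, quality 0.9: S_success = 0.675, gate closed.
gated_out = composite_for_partial_run(0.9, 3, 4)

# 3 of 4 steps, quality 1.0: S_success = 0.75, gate open.
gate_passed = composite_for_partial_run(1.0, 3, 4)

# 4 of 4 steps, quality 0.9, no hard failure: the full-completion baseline.
baseline = composite_for_partial_run(0.9, 4, 4, hard_failures=0)
```

Under these assumptions the three runs score roughly 0.3875, 0.625, and 0.75. With completion_ratio = 0.75, a run needs output quality above about 0.933 to clear the 0.7 gate, so a hard failure in a 4-step workflow costs both reliability points and, in most cases, the entire cost and latency contribution.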
Score Aggregation
Scores are aggregated over a rolling 100-task window with equal weight per task. There is no exponential decay: a fixed equal-weight window is simpler to audit and harder to exploit through timing. The aggregated result is capped at a maximum weight of 15% per miner before submission.
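A rolling equal-weight window of this kind can be sketched with a bounded deque. The class name is hypothetical, and two behaviours are assumptions not stated in the text: a miner with fewer than 100 scored tasks is averaged over the tasks seen so far, and a miner with no scored tasks gets 0.0. The 15% per-miner cap is not modelled here.

```python
from collections import deque


class RollingScore:
    """Equal-weight mean over the most recent `window` composite scores."""

    def __init__(self, window: int = 100):
        # deque(maxlen=...) silently drops the oldest entry once full.
        self._scores = deque(maxlen=window)

    def add(self, composite: float) -> None:
        self._scores.append(composite)

    def value(self) -> float:
        # Assumption: empty history scores 0.0; partial windows use what exists.
        if not self._scores:
            return 0.0
        return sum(self._scores) / len(self._scores)
```

Because every task in the window carries the same weight, one task's influence is always 1/100 once the window is full, and it disappears entirely after 100 newer tasks have been scored.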