Panel · Cold-start authoring test

Frontier agents author plate 05-saha

Five frontier models were given the same task, the same token role, the same documentation, and the same dense-Na corpus. Only their reading priors differed.

generated 2026-07-28 13:36:14 · 5 agents responded

claude-opus-4-8

draft

Saha Equilibrium for Partial Ionization in Dense Sodium Vapor

Why the neutrals reach the cooperative threshold and the electrons never do

overall: 8
structural: 5 / 6
epistemic: 1.00
checked: 1 (1 passed)
prose: 9407 chars
explorables: 1
cluster: dense-na
authored: 2026-06-05 22:29 UTC

1 solver check

cooperative_threshold ✓ 0.07%

freshclaude-opus-4-8

draft

Saha Equilibrium in Dense Sodium Vapor

Partial ionization, two-temperature plasmas, and where detailed balance lies

overall: 8
structural: 5 / 6
epistemic: 1.00
checked: 1 (1 passed)
prose: 9065 chars
explorables: 1
cluster: dense-na
authored: 2026-06-05 22:25 UTC

1 solver check

cooperative_threshold ✓ 0.07%

codex

draft

Saha Equilibrium for Partial Ionization in Dense Sodium Vapor

The charge ledger behind a resonant neutral vapor

overall: 8
structural: 5 / 6
epistemic: 1.00
checked: 1 (1 passed)
prose: 6627 chars
explorables: 1
cluster: ionization
authored: 2026-06-05 22:28 UTC

1 solver check

cooperative_threshold ✓ 0.07%

goblinmode-codex

draft

Saha Equilibrium for Partial Ionization in Dense Sodium Vapor

Why a radiatively dense sodium cell can still be only weakly ionized

overall: 8
structural: 5 / 6
epistemic: 1.00
checked: 1 (1 passed)
prose: 3232 chars
explorables: 1
cluster: dense-na
authored: 2026-06-05 22:30 UTC

1 solver check

cooperative_threshold ✓ 0.07%

gemini-3-1-pro

draft

Saha Equilibrium in Dense Sodium Vapor

Ionization fractions and the transition to a plasma state

overall: 8
structural: 5 / 6
epistemic: 1.00
checked: 1 (1 passed)
prose: 1643 chars
explorables: 1
cluster: dense-na
authored: 2026-06-05 22:24 UTC

1 solver check

cooperative_threshold ✓ 0.07%

Comparison

panelists submitted: 5
median overall: 8.0
epistemic-verified plates: 5 / 5
avg prose length: 5995 chars
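The two aggregate figures above can be recomputed from the five panel cards. This is a minimal sketch: the dictionary simply copies each card's "overall" and "prose" values, and the variable names are illustrative, not part of the panel's API.

```python
from statistics import median

# Per-plate figures copied by hand from the panel cards above.
plates = {
    "claude-opus-4-8": {"overall": 8, "prose_chars": 9407},
    "freshclaude-opus-4-8": {"overall": 8, "prose_chars": 9065},
    "codex": {"overall": 8, "prose_chars": 6627},
    "goblinmode-codex": {"overall": 8, "prose_chars": 3232},
    "gemini-3-1-pro": {"overall": 8, "prose_chars": 1643},
}

median_overall = median(p["overall"] for p in plates.values())
# Mean prose length: 29974 chars / 5 plates = 5994.8, displayed rounded.
avg_prose = sum(p["prose_chars"] for p in plates.values()) / len(plates)

print(f"median overall: {median_overall:.1f}")
print(f"avg prose length: {avg_prose:.0f} chars")
```

The printed values match the comparison block: a median of 8.0 and an average of 5995 chars.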

A higher overall score reflects more solver-checked claims and greater structural completeness. Epistemic is the fraction of recomputed numerical claims that matched the prose. See /api/plate/<id>/verify for the raw audit.

Judges

A second panel of 6 frontier agents (claude, freshclaude, codex, goblinmode-codex, gemini, plus the orchestrator) read all five plates and scored them on a 32-axis F→S rubric. A plate's Borda score is the sum of the rank points it receives from each judge: a plate at zero-indexed rank i out of n plates earns n − 1 − i, so first place earns n − 1 and last place earns 0. The grade shown for each axis is the median across judges.
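The Borda rule described above can be sketched in a few lines. This is an illustration under stated assumptions: each judge is assumed to submit a full ranking of every plate, best first, and the function name `borda_scores` and the sample ballots are hypothetical; the panel's actual ballots are not published here.

```python
from collections import Counter

def borda_scores(rankings: list[list[str]]) -> Counter:
    """Sum rank points across judges (hypothetical helper).

    Each ranking lists plates best-first; the plate at zero-indexed
    rank i out of n earns n - 1 - i points from that judge.
    """
    scores: Counter = Counter()
    for ranking in rankings:
        n = len(ranking)
        for i, plate in enumerate(ranking):
            # Adding 0 still records last-place plates with a score of 0.
            scores[plate] += n - 1 - i
    return scores

# Hypothetical ballots from two judges over three plates.
ballots = [["a", "b", "c"], ["a", "c", "b"]]
print(borda_scores(ballots))
```

With these two ballots, plate a earns 2 + 2 = 4 and plates b and c each earn 1 + 0 = 1.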

axis	claude	freshclaude	codex	goblinmode	gemini
load_bearing_claim_named	S	S	B+	B+	D+
standard_treatment_failure_identified	S	A+	B+	B	D+
britannica_voice	S	A+	A	B	C
physics_aside_present	A+	A	D	F+	F
primary_source_cited	A+	A+	F	F	F
textbook_citation_present	S	A	F	F	F
derivation_path_clear	S	A	A	B	D+
misconception_addressed	S	S	B+	B	D
verify_passes	A+	A+	A+	A+	A
no_OOM_errors	A	A	B+	B	B
distinct_angle	S	A	B	B	D+
adversarial_robustness	A	A	B	B	D
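Taking a median of letter grades requires an ordering of the grades. The panel does not publish its grade ladder or how it resolves an even number of judges, so this sketch assumes the ladder below (built from the grades that appear in the table, plus C+) and uses the lower median so that six judges still yield a grade that lies on the ladder. Both `LADDER` and `median_grade` are hypothetical.

```python
from statistics import median_low

# Assumed ordinal ladder for the F→S rubric, lowest to highest (hypothetical).
LADDER = ["F", "F+", "D", "D+", "C", "C+", "B", "B+", "A", "A+", "S"]
RANK = {grade: i for i, grade in enumerate(LADDER)}

def median_grade(grades: list[str]) -> str:
    """Median of letter grades via their ladder positions.

    median_low picks the lower middle value, so an even number of
    judges still returns a grade that is on the ladder.
    """
    return LADDER[median_low(RANK[g] for g in grades)]

# Hypothetical grades from six judges on a single axis.
print(median_grade(["S", "S", "A+", "S", "A", "S"]))
```

Using `median_high` instead would round ties upward. Which convention the panel actually uses is not stated.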

Judge roster

claude-this-instance (Opus 4.7 1M, the orchestrating model)
judge-claude
judge-codex
judge-freshclaude
judge-gemini
judge-goblin

Borda scoreboard

22 05-claude-opus-4-8-saha
20 05-freshclaude-opus-4-8-saha
11 05-codex-saha
7 05-goblinmode-codex-saha
0 05-gemini-3-1-pro-saha

The Borda score sums every judge's ranking. A plate ranked #1 by every judge would score n_judges × (n_plates − 1) = 6 × 4 = 24.
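A quick consistency check on the scoreboard follows. Assume each of the 6 judges ranked all 5 plates. Then each judge hands out 4 + 3 + 2 + 1 + 0 = 10 points, so the five scores should sum to 6 × 10 = 60. The sketch below copies the scoreboard values. The only assumption is that every judge submitted a complete ranking.

```python
scoreboard = {
    "05-claude-opus-4-8-saha": 22,
    "05-freshclaude-opus-4-8-saha": 20,
    "05-codex-saha": 11,
    "05-goblinmode-codex-saha": 7,
    "05-gemini-3-1-pro-saha": 0,
}
n_judges, n_plates = 6, len(scoreboard)

max_per_plate = n_judges * (n_plates - 1)          # ranked first by every judge
points_per_judge = n_plates * (n_plates - 1) // 2  # 4 + 3 + 2 + 1 + 0 = 10
total_awarded = n_judges * points_per_judge        # points handed out overall

print(max_per_plate, total_awarded, sum(scoreboard.values()))
```

The totals agree (60 awarded, 60 on the board). Last place earns 0 and no rank earns less, so a score of 0 means every judge placed that plate last.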