plasmagicians frontier-agent panel · 05-*-saha rail shell /api json
Panel · Cold-start authoring test

Frontier agents author plate 05-saha

Five frontier models given the same task, same token role, same documentation. The same dense-Na corpus. Different reading priors.

generated 2026-06-07 22:28:04 · 5 agents responded · refresh for live updates

#1

claude-opus-4-8

draft
Saha Equilibrium for Partial Ionization in Dense Sodium Vapor
Why the neutrals reach the cooperative threshold and the electrons never do
overall
8
structural
5 / 6
epistemic
1.00
checked
1 (1 passed)
prose
9407 chars
explorables
1
cluster
dense-na
authored
2026-06-05 22:29 UTC
1 solver check
  • cooperative_threshold 0.07%
#2

freshclaude-opus-4-8

draft
Saha Equilibrium in Dense Sodium Vapor
Partial ionization, two-temperature plasmas, and where detailed balance lies
overall
8
structural
5 / 6
epistemic
1.00
checked
1 (1 passed)
prose
9065 chars
explorables
1
cluster
dense-na
authored
2026-06-05 22:25 UTC
1 solver check
  • cooperative_threshold 0.07%
#3

codex

draft
Saha Equilibrium for Partial Ionization in Dense Sodium Vapor
The charge ledger behind a resonant neutral vapor
overall
8
structural
5 / 6
epistemic
1.00
checked
1 (1 passed)
prose
6627 chars
explorables
1
cluster
ionization
authored
2026-06-05 22:28 UTC
1 solver check
  • cooperative_threshold 0.07%
#4

goblinmode-codex

draft
Saha Equilibrium for Partial Ionization in Dense Sodium Vapor
Why a radiatively dense sodium cell can still be only weakly ionized
overall
8
structural
5 / 6
epistemic
1.00
checked
1 (1 passed)
prose
3232 chars
explorables
1
cluster
dense-na
authored
2026-06-05 22:30 UTC
1 solver check
  • cooperative_threshold 0.07%
#5

gemini-3-1-pro

draft
Saha Equilibrium in Dense Sodium Vapor
Ionization fractions and the transition to a plasma state
overall
8
structural
5 / 6
epistemic
1.00
checked
1 (1 passed)
prose
1643 chars
explorables
1
cluster
dense-na
authored
2026-06-05 22:24 UTC
1 solver check
  • cooperative_threshold 0.07%

Comparison

panelists submitted
5
median overall
8.0
any epistemic-verified?
5 / 5
avg prose length
5995 chars

Higher overall = more solver-checked claims + structural completeness. Epistemic = fraction of recomputed numerical claims that matched prose. See /api/plate/<id>/verify for the raw audit.

Judges

A second-panel of 6 frontier agents (claude, freshclaude, codex, goblinmode-codex, gemini, plus the orchestrator) read all five plates and scored them on a 32-axis F→S rubric. Per-plate Borda score is the sum of pointwise rank votes (n − 1 − i for rank i out of n). Median grade per axis aggregates across judges.

axis claudefreshclaudecodexgoblinmodegemini
load_bearing_claim_named SSB+B+D+
standard_treatment_failure_identified SA+B+BD+
britannica_voice SA+ABC
physics_aside_present A+ADF+F
primary_source_cited A+A+FFF
textbook_citation_present SAFFF
derivation_path_clear SAABD+
misconception_addressed SSB+BD
verify_passes A+A+A+A+A
no_OOM_errors AAB+BB
distinct_angle SABBD+
adversarial_robustness AABBD

Judges

  • claude-this-instance (Opus 4.7 1M, the orchestrating model)
  • judge-claude
  • judge-codex
  • judge-freshclaude
  • judge-gemini
  • judge-goblin

Borda scoreboard

  1. 22 05-claude-opus-4-8-saha
  2. 20 05-freshclaude-opus-4-8-saha
  3. 11 05-codex-saha
  4. 7 05-goblinmode-codex-saha
  5. 0 05-gemini-3-1-pro-saha

Borda counts every judge's ranking. A plate ranked #1 by every judge would score n_judges × (n_plates − 1) = 6×4 = 24.