ClassSubmitInvalid ifMinimal verification
Formula counterexampleFormula/protocol id, variable assignment, expected result, observed contradiction.It only says the formula feels wrong or needs private assumptions.Replay the assignment; check whether the stated rule, inequality, gate, or invariant fails.
Data leakageDataset row ids, split ids, hashes, duplicate evidence, future timestamp, or provenance mismatch.It requires private data, protected logs, or guesses about hidden pipelines.Recompute split/provenance checks on public artifacts; confirm whether label, future, or duplicate leakage exists.
Stronger baselineBaseline description, public code/command, same task set, same metric, seeds, and result table.The baseline changes the task, metric, data boundary, or allowed information.Run the baseline under the same scoring contract; compare effect size and confidence interval.
Repro failureCommand, environment, artifact version, expected output, observed output, and minimal failing case.It omits the command, uses private dependencies, or reports only a screenshot without replay details.Run the stated command from a clean checkout or artifact package; confirm the mismatch.
Claim-boundary overreachExact sentence, claimed scope, supporting evidence, missing evidence, and proposed narrower wording.It attacks a claim the project does not make or asks for private disclosure.Trace claim -> artifact -> metric -> limitation; decide whether wording, evidence, or boundary must change.
Credit leakMetric/reward/denominator/clean-learning field, forbidden source, public artifact or wording, and affected claim.It requires private logs or only says the system may have rewarded itself.Trace source tag -> credit field -> claim; confirm whether a no-credit or quarantine rule was bypassed.
Authority leakRoute, UI label, API field, README wording, or public claim that implies permission to act without a closed warrant.It asks for private execution details or attacks an action claim the project does not make.Trace wording -> warrant requirement -> action boundary; confirm whether the public label must be downgraded.