| Claim | Public support | Not claimed | What would falsify it | Challenge route |
| --- | --- | --- | --- | --- |
| P02-C1 WisdomBench measures longitudinal learning from failure rather than single-shot task capability. | Public supporting evidence: GitHub, Hugging Face dataset, Zenodo record. | Human-like wisdom, general deployment reliability, or that all agents learn from failure. | Task leakage, scoring bugs, reproduction failure, or stronger baselines removing the longitudinal effect. | WisdomBench issue template |
| PCA-C1 High-risk AI action should not earn action credit until warrant and receipt closure exist. | Public protocol and interface demo. | Live trading profit, private product performance, or universal safety. | The public gate allows unsafe action, gives credit without receipts, or cannot reproduce its no-go boundary. | Proof-carrying action issue template |
| CREDIT-C1 Repair intent, pretty reports, bootstrap probes, and semantic summaries must not become metric, reward, denominator, or clean-learning credit until closed evidence exists. | Public boundary plus counterexample packet. | That private repair queues are public, or that every private trace can be disclosed. | A public artifact lets repair intent influence reward, denominator, clean-learning labels, or gate authority without closure. | Credit leak packet |
| AUTH-C1 Research-only, shadow, suggestion, no-go, or public demo outputs must not imply permission to act. | Public boundary plus no-go demo and review-status route. | Live deployment safety, private product readiness, or permission to act in any external system. | A UI label, API field, README, or public page turns a research artifact into action authority. | Authority leak packet |
| P24-C1 Adaptive systems need relational observability: relations, constraints, control debt, and evidence half-life. | Public protocol stage. | A theorem covering all adaptive systems or a finished private product. | Relation variables, control debt, or evidence half-life do not change decisions beyond scalar baselines. | Public counterexample route |
| P20-C1 Physical AI should route degraded evidence to recovery or abstention rather than direct action. | Public bounded support; rebuild needed before stronger deployment claims. | Detector SOTA, offensive autonomy, or real-world robot deployment performance. | Stronger conformal, shield, or fusion baselines handle the same degraded evidence without this boundary. | Public counterexample route |
| F1-C1 Trading is used as a high-risk testbed for proof-carrying action discipline, not as a public claim of live profitability. | Public boundary and technical boundary route. | Live trading edge, customer readiness, private execution quality, or alpha dominance. | Public language implies live profitability, private execution readiness, or authority beyond no-go evidence. | Boundary issue template |
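
The PCA-C1, CREDIT-C1, and AUTH-C1 claims share one rule: an action earns no credit until warrant and receipt closure exist, and research-only, shadow, suggestion, no-go, or demo outputs never carry permission to act. The Python sketch below illustrates one way such a gate could be written. The `Status` enum (including the `LIVE` value), the `ActionRecord` fields, and the `may_act` and `earns_credit` functions are hypothetical names chosen for illustration; they are not taken from the project's published protocol or interface.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    """Hypothetical output statuses; only LIVE is assumed to be actionable."""
    RESEARCH_ONLY = "research_only"
    SHADOW = "shadow"
    SUGGESTION = "suggestion"
    NO_GO = "no_go"
    PUBLIC_DEMO = "public_demo"
    LIVE = "live"


@dataclass(frozen=True)
class ActionRecord:
    """Hypothetical record of one proposed high-risk action."""
    action_id: str
    status: Status
    has_warrant: bool = False     # justification recorded before acting
    has_receipt: bool = False     # evidence recorded after acting
    receipt_closed: bool = False  # receipt has been checked and closed


def may_act(record: ActionRecord) -> bool:
    """AUTH-C1 sketch: non-live outputs never imply permission to act."""
    return record.status is Status.LIVE and record.has_warrant


def earns_credit(record: ActionRecord) -> bool:
    """PCA-C1 / CREDIT-C1 sketch: credit needs a warrant and a closed receipt."""
    return may_act(record) and record.has_receipt and record.receipt_closed


if __name__ == "__main__":
    demo = ActionRecord("a1", Status.PUBLIC_DEMO, True, True, True)
    pending = ActionRecord("a2", Status.LIVE, has_warrant=True)
    closed = ActionRecord("a3", Status.LIVE, True, True, True)
    for rec in (demo, pending, closed):
        print(rec.action_id, may_act(rec), earns_credit(rec))
```

Under these assumptions the gate fails closed: a record that lacks any one of the required fields earns no credit, which mirrors the falsifiers listed above (credit given without receipts, or a public artifact turning into action authority).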