Most evaluations ask whether an agent succeeds on the first attempt. My work asks a different question: after failure, feedback, or environmental shift, does the system become more reliable, more bounded, and more capable of choosing the right next action?
The portfolio formalizes this question across cognitive agents, embodied benchmarks, world-model evidence, social calibration, cognitive immunity, and robust perception under adverse conditions.