Failure pattern lab

Compare the same refund-assistant case before and after remediation for six common agent failure patterns.

Choose a scenario

Click the button to run the selected scenario.

Context degradation

A long refund thread drops binding facts from earlier turns. The scenario then reruns the same case with summary memory restored and compares the two results.

Broken run

The broken run sees only the most recent turns, so it loses the archive-event and order-age facts stated earlier in the thread.

Remediation

The fixed run injects a compact summary memory that restores the binding facts before the answer is generated.
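The difference between the two runs can be sketched as follows. This is a minimal illustration, not the lab's actual implementation: the `Turn` type, the function names `recent_window` and `with_summary_memory`, and the window size are all hypothetical, and the summary is assumed to be a plain list of fact strings supplied by the caller.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Turn:
    """One message in the refund thread (hypothetical shape)."""
    role: str
    text: str


def recent_window(turns: Sequence[Turn], max_turns: int) -> List[Turn]:
    """Broken run: keep only the last `max_turns` turns of the thread."""
    if max_turns <= 0:
        return []
    return list(turns[-max_turns:])


def with_summary_memory(
    turns: Sequence[Turn], max_turns: int, binding_facts: Sequence[str]
) -> List[Turn]:
    """Fixed run: prepend a compact summary of binding facts to the window."""
    summary = Turn("system", "Summary memory: " + "; ".join(binding_facts))
    return [summary] + recent_window(turns, max_turns)
```

In this sketch the summary is placed ahead of the recent turns, so the binding facts are in context before the answer is generated even when the original turns have fallen outside the window.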

Expected outcome

The fixed run should deny the refund and offer manual store-credit review.

Comparison results

Select a scenario, then generate a before-versus-after comparison to view the real assistant outputs, evidence trace, and deterministic verdicts.
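One way a deterministic verdict for this scenario could work is a rule-based check of the assistant's answer against the expected outcome (deny the refund, offer manual store-credit review). The sketch below is an assumption for illustration only: the function name `verdict`, the phrase list, and the "pass"/"fail" labels are hypothetical and are not taken from the lab itself.

```python
# Hypothetical phrases treated as a refusal; a real check may use a different list.
DENIAL_PHRASES = ("cannot", "can't", "unable to", "not eligible", "deny", "denied")


def verdict(answer: str) -> str:
    """Return "pass" if the answer denies the refund and offers store-credit review."""
    text = answer.lower()
    denies_refund = "refund" in text and any(p in text for p in DENIAL_PHRASES)
    mentions_credit = "store credit" in text or "store-credit" in text
    offers_review = mentions_credit and "review" in text
    return "pass" if denies_refund and offers_review else "fail"
```

Because the check uses fixed string rules rather than a model call, the same answer always yields the same verdict, which is what makes the before-versus-after comparison repeatable.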