Context-aware safety refusal messages
When a model declines a request due to safety measures, the error message is now tailored to the actual reason rather than defaulting to a generic cyber/bio warning for every refusal.
- Refusals specifically triggered by cybersecurity or biology content still display the existing message naming those topic areas.
- All other refusals now show: "This model has measures that flagged something in this session. This sometimes happens with safe, normal conversations." followed by a note that these measures help bring Mythos-level capability in other areas.
- The model-switching notification (shown when Claude auto-falls back to a different model) also uses this same category check, so users who triggered a fallback for non-cyber/bio reasons no longer see an unexplained reference to cybersecurity or biology topics.
- Refusal telemetry now records non-cyber/non-bio categories as "other" rather than passing through raw category strings, which normalizes internal analytics.
New generic refusal string (search for "This model has measures that flagged something in this session"); category classifier introduced (search for H === "cyber" || H === "bio"); updated fallback-switch messages reference the same classifier
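
The behavior described above can be sketched roughly as follows. This is a minimal illustration built around the H === "cyber" || H === "bio" check, not the shipped implementation: the function names (isCyberOrBio, refusalMessage, telemetryCategory) and the CYBER_BIO_MESSAGE placeholder are hypothetical, and the existing cyber/bio wording is left as a placeholder because these notes do not quote it.

```javascript
// Hypothetical sketch: every name here is illustrative, not taken from the product source.

// Placeholder for the existing cyber/bio message; its exact wording is not quoted in these notes.
const CYBER_BIO_MESSAGE = "<existing message naming cybersecurity and biology topics>";

// Generic wording quoted in the notes (the follow-up note about the measures is omitted).
const GENERIC_MESSAGE =
  "This model has measures that flagged something in this session. " +
  "This sometimes happens with safe, normal conversations.";

// Mirrors the check referenced in the notes: H === "cyber" || H === "bio".
function isCyberOrBio(category) {
  return category === "cyber" || category === "bio";
}

// Choose the user-facing refusal text from the refusal category.
function refusalMessage(category) {
  return isCyberOrBio(category) ? CYBER_BIO_MESSAGE : GENERIC_MESSAGE;
}

// Collapse every non-cyber/non-bio category to "other" before it is logged.
function telemetryCategory(category) {
  return isCyberOrBio(category) ? category : "other";
}

console.log(telemetryCategory("bio"));               // bio
console.log(telemetryCategory("some_raw_category")); // other
```

Routing both the message choice and the telemetry value through one predicate is what would keep the refusal text, the fallback-switch notification, and analytics consistent with each other. A missing or unrecognized category falls through to the generic message and to "other" in this sketch.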