Safety Dashboard

Monitor safety evaluations across your AI platform. Review flagged interactions, track safety score trends, and maintain compliance with emotional AI governance requirements.

What Safety Does

The Safety dashboard aggregates ESAA evaluations from your API calls into actionable views. Every time you call /v1/esaa/evaluate, the resulting attestation is stored and surfaced here.

Key insight: The dashboard shows computed signals only — never the original interaction content. You can share dashboard access with compliance teams without exposing conversation text.
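To illustrate where these evaluations come from, here is a minimal sketch of calling the evaluate endpoint. Only the path /v1/esaa/evaluate comes from this page; the base URL, headers, and request fields are illustrative assumptions — check the API reference for the actual schema.

```python
import json
import urllib.request

API_BASE = "https://api.example.com"  # placeholder; substitute your deployment's base URL


def build_evaluate_request(interaction_id: str, content: str) -> dict:
    """Assemble a request body for POST /v1/esaa/evaluate.

    Field names are illustrative, not the documented schema.
    """
    return {
        "interaction_id": interaction_id,
        "content": content,
    }


def evaluate(body: dict, api_key: str) -> dict:
    """POST the body and return the parsed attestation (assumed JSON response)."""
    req = urllib.request.Request(
        f"{API_BASE}/v1/esaa/evaluate",
        data=json.dumps(body).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # auth scheme assumed
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


body = build_evaluate_request("int_123", "example conversation turn")
```

Each successful call produces an attestation that the dashboard then aggregates; no raw conversation text is retained in the dashboard views.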

Dashboard Views

The Safety section includes four specialized views, each accessible from the console sidebar.

Overview

Summary metrics: total evaluations, average safety score, flag rate, and outcome distribution. Includes score histogram, timeseries chart, and compliance status panel.

Path: /platform/safety

Evaluations

Paginated table of all ESAA evaluations. Filter by outcome (pass, advisory, flag, critical) and time range. Click any row to see full evaluation details including safety signals, triggers, and recommended actions.

Path: /platform/safety/evaluations

Review Queue

Evaluations with "flag" or "critical" outcomes that require human review. Complete the review workflow: confirm the concern, mark it as a false positive, record it as inconclusive, or escalate it further.

Path: /platform/safety/review

Trends

Trajectory analysis over time. See safety score trends by day, compare flag rates across platforms and models, and view trajectory distribution (improving, stable, degrading, acute).

Path: /platform/safety/trends
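The trajectory labels in the Trends view can be sketched as a simple classifier over a session's score history. This is an illustrative approximation, not the console's actual computation — the thresholds and window here are assumptions.

```python
def classify_trajectory(scores: list[float]) -> str:
    """Classify a session's safety-score trend into the four labels
    shown in the Trends view (sketch; thresholds are assumed)."""
    # "acute": a recent score has dropped into the critical band (<0.40)
    if any(s < 0.40 for s in scores[-2:]):
        return "acute"
    if len(scores) < 2:
        return "stable"
    delta = scores[-1] - scores[0]
    if delta > 0.05:
        return "improving"
    if delta < -0.05:
        return "degrading"
    return "stable"
```

A session trending from 0.5 toward 0.9 would read as "improving", while one ending below 0.40 is treated as "acute" regardless of its earlier direction.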

Review Workflow

When an evaluation is flagged or critical, it enters the review queue. The workflow supports four review outcomes:

Confirmed

The safety concern is valid. The evaluation correctly identified a problem.

False Positive

The evaluation was incorrectly flagged. Select a category (benign context, therapeutic intent, creative writing, etc.) to help calibrate the system.

Inconclusive

Cannot determine from available signals. May require additional context.

Escalated Further

The concern is severe enough to warrant escalation beyond standard review.

Key Metrics

Understanding the metrics displayed on the Safety dashboard.

Safety Score

0.0 to 1.0 scale

Composite score where higher = safer. ≥0.80 passes, <0.40 is critical.

Flag Rate

% of evaluations

Percentage of evaluations resulting in "flag" or "critical" outcome.

Trajectory

Session-level trend

Direction of safety scores within a session: improving, stable, degrading, or acute.

Review Queue

Count

Number of flagged evaluations awaiting human review.
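How these four metrics relate can be shown in a few lines. The evaluation record shape ("score", "outcome" fields) is an assumption for illustration; this is a sketch of the arithmetic, not the console's implementation.

```python
def dashboard_metrics(evaluations: list[dict]) -> dict:
    """Compute the Overview summary metrics from evaluation records.

    Each record is assumed to carry a "score" (0.0-1.0) and an
    "outcome" (pass/advisory/flag/critical) field.
    """
    total = len(evaluations)
    flagged = [e for e in evaluations if e["outcome"] in ("flag", "critical")]
    return {
        "total_evaluations": total,
        "average_safety_score": (
            sum(e["score"] for e in evaluations) / total if total else 0.0
        ),
        # Flag rate: fraction of evaluations ending in "flag" or "critical"
        "flag_rate": len(flagged) / total if total else 0.0,
        # Review queue: flagged evaluations awaiting human review
        "review_queue": len(flagged),
    }
```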

Outcome Levels

Outcome    Score        Action Required
pass       ≥0.80        No action — interaction within safe boundaries
advisory   0.60-0.79    Log for monitoring — minor concerns noted
flag       0.40-0.59    Enters review queue — human review recommended
critical   <0.40        Immediate attention — consider suspending interaction
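The score bands above translate directly into a classifier. This sketch restates the table; boundaries are inclusive at the lower edge of each band, matching the ≥ thresholds shown.

```python
def outcome_for_score(score: float) -> str:
    """Map a composite safety score (0.0-1.0, higher = safer)
    to its outcome level, per the table above."""
    if score >= 0.80:
        return "pass"
    if score >= 0.60:
        return "advisory"
    if score >= 0.40:
        return "flag"
    return "critical"
```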

Compliance Reporting

The Safety dashboard includes compliance indicators for emotional AI governance.

EU AI Act Article 14

Human oversight requirements. The review queue workflow satisfies the requirement for human-in-the-loop review of high-risk AI decisions.

Cryptographic Attestation

Every evaluation includes a W3C Data Integrity Proof. Artifacts can be independently verified without trusting the dashboard.
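The tamper-evidence idea behind independent verification can be illustrated with a deliberately simplified sketch. A real W3C Data Integrity Proof signs a canonicalized document with a key pair — so verification also establishes who produced the artifact — whereas this stand-in only shows that any change to a canonically serialized payload breaks the check.

```python
import hashlib
import json


def canonical_digest(attestation: dict) -> str:
    """Digest an attestation over a canonical JSON serialization.

    Simplified stand-in for Data Integrity Proof verification:
    sorted keys and fixed separators make the serialization
    deterministic, so equal payloads always hash identically.
    """
    canonical = json.dumps(attestation, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()


original = {"evaluation_id": "ev_1", "score": 0.92, "outcome": "pass"}
digest = canonical_digest(original)

tampered = {**original, "score": 0.41}
assert canonical_digest(tampered) != digest  # any modification breaks the check
```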

Audit Trail

All evaluations and review actions are timestamped and immutable. Export-ready for regulatory audits.
