Ontinue’s Practical SOC Metrics Library: Measuring What Actually Matters

In security operations, you can’t improve what you don’t measure. Yet many security teams struggle with metrics that are either too abstract for decision-making or too tactical to demonstrate business value.

Ontinue has developed a comprehensive metrics library that bridges this gap. It provides security leaders with a practical framework to measure performance across the entire detection and response lifecycle—and, crucially, to quantify the impact of AI on SOC efficacy.

This is not about AI theater. These metrics aren’t designed to create impressive dashboards or justify technology purchases with vanity numbers. They’re built to measure real operational outcomes: faster detection, higher-quality investigations, reduced customer burden, and demonstrable risk reduction. Every metric ties back to one of four fundamentals of security operations:

  • Speed
  • Quality
  • Governance
  • Business impact

A Measurement Framework, Not Just Definitions

What makes Ontinue’s library uniquely practical is that it doesn’t just define what to measure—it explains how to measure it.

Each metric includes:

  • Specific calculation methods
  • The exact measurement points needed from existing systems

This transforms abstract concepts like Mean Time to Investigate into concrete data collection requirements (for example: alert_created_time, validated_time, and the timestamps in between).

The library distinguishes between:

  • Baseline measurement points – what traditional SOCs can track today
  • AI-instrumented measurement points – additional telemetry needed to understand AI’s contribution

This dual-track approach lets organizations measure AI impact through direct comparison, rather than assumption.
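
To make the dual-track idea concrete, the sketch below shows how both sets of measurement points might live on a single case record. The baseline and AI field names follow the library's terminology, but the record class itself is a hypothetical illustration, not Ontinue's schema.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class CaseRecord:
    # Baseline measurement points: what a traditional SOC can track today
    alert_created_time: datetime
    updated_time: Optional[datetime] = None      # first investigative action
    validated_time: Optional[datetime] = None    # disposition confirmed

    # AI-instrumented measurement points: extra telemetry for AI attribution
    ai_assist_flag: bool = False
    ai_enrichment_start: Optional[datetime] = None
    ai_enrichment_end: Optional[datetime] = None
    ai_confidence: Optional[float] = None        # model's self-assessed confidence
```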

Built for Every Stakeholder

The Ontinue metrics library recognizes a fundamental truth: different audiences need different views of security performance.

  • Board members care about strategic outcomes and risk reduction.
  • CISOs need operational insights into service quality and continuous improvement.
  • Security managers require tactical metrics to optimize workflows and resource allocation.

The library addresses this by mapping 50+ metrics across three “altitude levels” and clearly identifying which metrics matter to which audience:

  1. Strategic
  2. Operational
  3. Tactical

This ensures boards aren’t drowning in tactical details while frontline managers get the operational visibility they need.

The Core: Detection and Response Fundamentals

At the foundation are time-based metrics that form the common language of modern security operations.

Mean Time to Detect (MTTD)

Definition: How quickly threats are identified relative to attacker activity.
Calculation: detection_time – incident_start_time

Because incident_start_time is often unknown, the library provides practical proxies (for example, file creation time for malware incidents).

AI-instrumented extension:

  • Adds ai_correlation_start and ai_correlation_end to track when AI accelerates the correlation process.

Mean Time to Investigate (MTTI)

Definition: The speed from alert creation to investigation start.
Calculation: updated_time – alert_created_time, where updated_time marks the first investigative action on the alert

AI-instrumented extension:

  • Adds timestamps such as ai_enrichment_start/end, ai_summary_start/end, and ai_confidence scores to measure whether AI enrichment actually speeds the investigator’s ability to act.

Mean Time to Qualified Response (MTTQR)

Definition: The full cycle from alert creation to a validated conclusion (the closing state change).
Calculation: validated_time – alert_created_time

This is often where AI’s impact is most visible, as automated enrichment and summarization can compress the validation phase from hours to minutes.
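
A minimal sketch of the three time calculations, assuming case records expose the timestamps defined above; the sample values and the mean_seconds helper are illustrative, not part of the library.

```python
from datetime import datetime
from statistics import mean

def mean_seconds(pairs):
    """Average (end - start) across cases, in seconds."""
    return mean((end - start).total_seconds() for start, end in pairs)

cases = [
    {
        "incident_start_time": datetime(2024, 5, 1, 9, 0),   # or a proxy such as file creation time
        "detection_time":      datetime(2024, 5, 1, 9, 20),
        "alert_created_time":  datetime(2024, 5, 1, 9, 21),
        "updated_time":        datetime(2024, 5, 1, 9, 40),  # first investigative action
        "validated_time":      datetime(2024, 5, 1, 10, 5),
    },
]

mttd  = mean_seconds((c["incident_start_time"], c["detection_time"]) for c in cases)
mtti  = mean_seconds((c["alert_created_time"], c["updated_time"]) for c in cases)
mttqr = mean_seconds((c["alert_created_time"], c["validated_time"]) for c in cases)
print(f"MTTD {mttd:.0f}s, MTTI {mtti:.0f}s, MTTQR {mttqr:.0f}s")
```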

Quality Metrics: Speed Without Noise

Speed without quality is just noise. The library includes robust quality metrics with clear calculation methods.

Benign Positive Rate

Calculation: benign_positives ÷ total_alerts

AI-instrumented extension:

  • Flags whether AI recommended the disposition.
  • Tracks ai_disposition and ai_confidence alongside human validation to understand where AI excels and where it struggles.

Alert-to-Incident Conversion Rate

Definition: Measures signal quality.
Calculation: validated_incidents ÷ total_alerts

A low rate suggests noise; a high rate indicates effective filtering.

Measurement points:

  • alert_id
  • incident_linked flag
  • validated status

Precision by Severity

Definition: Ensures accuracy where it matters most.
Calculation (for high severity): true_high ÷ all_high_flagged_alerts

Measurement points:

  • initial_severity
  • final_validated_outcome

This creates a feedback loop that can tune both human and AI severity assessments.

Reopen Rate

Definition: Tracks quality of closures.
Calculation: reopened_cases ÷ closed_cases

Measurement points:

  • case_closed_time
  • case_reopen_time
  • reopen_reason

AI-instrumented extension:

  • Splits reopens between AI-assisted and non‑AI‑assisted cases to identify whether automation is creating incomplete resolutions.
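
A minimal sketch of the four quality ratios in this section, using hypothetical tallies in place of a real case-management export:

```python
def rate(numerator, denominator):
    """Guarded ratio; returns 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0

# Hypothetical tallies pulled from a case-management system
total_alerts        = 1200
benign_positives    = 780
validated_incidents = 310
high_flagged        = 90
true_high           = 72
closed_cases        = 400
reopened_cases      = 12

print("Benign Positive Rate:  ", rate(benign_positives, total_alerts))
print("Alert-to-Incident Rate:", rate(validated_incidents, total_alerts))
print("Precision (High):      ", rate(true_high, high_flagged))
print("Reopen Rate:           ", rate(reopened_cases, closed_cases))
```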

Customer-Facing Metrics

The framework also rigorously addresses often-overlooked customer-facing metrics.

Time to Notify (TTN)

Definition: Communication speed once an incident is validated.
Calculation: customer_notified_time – validated_time

AI-instrumented extension:

  • Adds ai_draft_start/end timestamps to quantify whether AI-generated notifications actually accelerate customer communication.

Time to Actionable Summary (TTAS)

Definition: How fast customers receive usable narratives.
Calculation: actionable_summary_sent – validated_time

Measurement points:

  • ai_summary_start/end for AI-generated drafts
  • human_edit_time to capture human revisions

This reveals whether AI produces drafts that need heavy revision or are close to “ready to send.”

Customer Involvement Rate

Definition: Tracks workload shift to the customer.
Calculation: Percentage of incidents requiring customer action.

Measurement points:

  • customer_action_required flag
  • action_count

AI-instrumented extension:

  • Shows whether AI can reduce customer burden by autonomously handling more resolution steps.

The Game-Changer: AI SOC Efficacy Metrics That Matter

What distinguishes Ontinue’s metrics library is its sophisticated approach to measuring AI’s impact on security operations.

These are not “bot optimization” metrics. The library deliberately avoids measuring AI activity for its own sake (for example, number of API calls, tokens consumed, or how “busy” the AI appears).

Instead, it focuses ruthlessly on operational outcomes:

  • Did the investigation complete faster?
  • Was the analysis more complete?
  • Did the customer receive better service?

AI Utilization Rate

Definition: Measures AI’s meaningful contribution.
Calculation: Percentage of cases where ai_action_type != "none".

Measurement points:

  • ai_assist_flag
  • ai_action_type taxonomy (enrich / summarize / disposition / recommend)

This prevents gaming the metric with trivial AI touches—only material contributions count.
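
A short sketch of how that taxonomy filter might work; the case records are hypothetical, and the MATERIAL_ACTIONS set simply mirrors the taxonomy listed above:

```python
MATERIAL_ACTIONS = {"enrich", "summarize", "disposition", "recommend"}

cases = [
    {"case_id": 1, "ai_action_type": "enrich"},
    {"case_id": 2, "ai_action_type": "none"},      # trivial or no AI touch: excluded
    {"case_id": 3, "ai_action_type": "summarize"},
]

utilized = sum(1 for c in cases if c["ai_action_type"] in MATERIAL_ACTIONS)
print(f"AI Utilization Rate: {utilized / len(cases):.0%}")  # 67%
```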

AI-Attributed MTTI Reduction

Definition: The reduction in MTTI attributable to AI assistance; the “crown jewel” for proving ROI.
Calculation: Compare MTTI for non‑AI cases vs. AI‑assisted cases within matched cohorts.

The library specifies what “matched” means:

  • Same alert types
  • Same severity levels
  • Comparable complexity

Measurement points:

  • All baseline timestamps
  • ai_assist_flag
  • Case-type matching fields

This rigor ensures any speed improvement is real, not an artifact of AI being applied only to easier cases.
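
A sketch of the matched-cohort comparison, assuming each case carries the matching fields plus a precomputed mtti_seconds value; the cohort_key below is one illustrative reading of "matched," not Ontinue's exact matching logic.

```python
from collections import defaultdict
from statistics import mean

def cohort_key(case):
    # Matching fields per the library: alert type, severity, comparable complexity
    return (case["alert_type"], case["severity"], case["complexity_band"])

def mtti_reduction_by_cohort(cases):
    """Per matched cohort: non-AI mean MTTI minus AI-assisted mean MTTI (seconds)."""
    cohorts = defaultdict(lambda: {"ai": [], "baseline": []})
    for c in cases:
        bucket = "ai" if c["ai_assist_flag"] else "baseline"
        cohorts[cohort_key(c)][bucket].append(c["mtti_seconds"])
    return {
        key: mean(groups["baseline"]) - mean(groups["ai"])
        for key, groups in cohorts.items()
        if groups["ai"] and groups["baseline"]  # require both populations
    }
```

Because a cohort only contributes when it contains both AI-assisted and baseline cases, a speed improvement cannot come from AI simply being routed to easier work.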

AI Summary Latency & Human Edit Time

These two metrics work as a pair.

  • AI Summary Latency: ai_summary_end – ai_summary_start
  • Human Edit Time: final_sent – ai_draft_end or explicit edit_start/end timestamps

Together, they show whether AI drafts save time overall:

  • If AI takes 30 seconds but humans spend 15 minutes editing, the net benefit is questionable.
  • If AI takes 30 seconds and humans spend 2 minutes reviewing, that’s genuine productivity.
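
The break-even arithmetic is simple enough to state directly; in the sketch below, manual_draft_minutes is a hypothetical baseline for writing the summary from scratch:

```python
def net_draft_benefit(manual_draft_minutes, ai_latency_minutes, human_edit_minutes):
    """Positive when AI drafting plus human review beats writing from scratch."""
    return manual_draft_minutes - (ai_latency_minutes + human_edit_minutes)

print(net_draft_benefit(manual_draft_minutes=20, ai_latency_minutes=0.5, human_edit_minutes=15))  # 4.5: questionable
print(net_draft_benefit(manual_draft_minutes=20, ai_latency_minutes=0.5, human_edit_minutes=2))   # 17.5: genuine win
```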

Human Validation Rate

Definition: Measures governance and human-in-the-loop oversight.
Calculation: Percentage of AI suggestions requiring human approval.

Measurement points:

  • ai_action_type
  • approval_required flag
  • approval_time
  • approved/rejected status

This creates transparency about where humans remain in the loop and whether those approval gates are calibrated appropriately.

AI Suggestion Acceptance Rate

Definition: Measures usefulness of AI recommendations.
Calculation: accepted_suggestions ÷ total_suggestions

Measurement points:

  • ai_suggestion_id
  • accepted flag

A high acceptance rate indicates that AI recommendations align with human judgment. A low rate signals the need for model tuning or highlights nuances AI is missing.
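
A sketch of acceptance tracking split by ai_action_type, which surfaces exactly the enrichment-versus-disposition gap discussed later in this piece; the suggestion records are hypothetical:

```python
from collections import defaultdict

suggestions = [
    {"ai_suggestion_id": "s1", "ai_action_type": "enrich",      "accepted": True},
    {"ai_suggestion_id": "s2", "ai_action_type": "enrich",      "accepted": True},
    {"ai_suggestion_id": "s3", "ai_action_type": "disposition", "accepted": False},
    {"ai_suggestion_id": "s4", "ai_action_type": "disposition", "accepted": True},
]

tallies = defaultdict(lambda: [0, 0])  # action_type -> [accepted, total]
for s in suggestions:
    tallies[s["ai_action_type"]][0] += s["accepted"]
    tallies[s["ai_action_type"]][1] += 1

for action, (accepted, total) in tallies.items():
    print(f"{action}: {accepted / total:.0%} accepted")
```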

AI Error Escape Rate

Definition: The rate at which material AI mistakes slip past review; a critical safety metric.
Calculation: material_QC_failures_in_AI_cases ÷ AI_cases_sampled

Measurement points:

  • Quality-control sampling results
  • Linked ai_assist_flag
  • ai_action_type

This directly measures whether AI is making mistakes that slip through and provides data to set appropriate confidence thresholds and approval gates.

AI Confidence Distribution

Definition: Measures calibration of AI self‑assessment.

Approach:

  • Build a histogram of ai_confidence_score values.
  • Compare confidence scores against actual accuracy.

An overconfident AI (high confidence, low accuracy) is dangerous. A well‑calibrated AI (confidence aligned with accuracy) can be trusted with greater autonomy.
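
A sketch of one way to check calibration, assuming each case records an ai_confidence_score and a hypothetical ai_disposition_correct flag derived from human validation; the bin edges are an arbitrary choice:

```python
def calibration_table(cases, bins=(0.0, 0.5, 0.7, 0.9, 1.01)):
    """Per confidence bin: mean stated confidence vs. observed accuracy."""
    rows = []
    for lo, hi in zip(bins, bins[1:]):
        bucket = [c for c in cases if lo <= c["ai_confidence_score"] < hi]
        if not bucket:
            continue
        stated   = sum(c["ai_confidence_score"] for c in bucket) / len(bucket)
        observed = sum(c["ai_disposition_correct"] for c in bucket) / len(bucket)
        rows.append((f"[{lo:.1f}, {hi:.1f})", stated, observed))
    return rows  # overconfidence shows up as stated >> observed
```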

AI Fallback Rate

Definition: Tracks reliability of AI execution.
Calculation: Percentage of AI runs where ai_run_status equals fail or fallback.

Measurement points:

  • ai_run_status
  • fallback_reason

A high fallback rate suggests infrastructure issues or model limitations that require attention.

Customer Involvement for AI-Resolved Cases

Two important metrics extend Customer Involvement analysis:

  • Customer Involvement Rate for AI‑Resolved Cases
  • Self-Contained Resolution Rate (AI‑Assisted)

Both split the standard Customer Involvement Rate by ai_assist_flag, comparing baseline case outcomes against AI‑assisted outcomes. This reveals whether AI enables the SOC to resolve more incidents without customer data or actions—a key value proposition for managed security services.
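
A sketch of that split, assuming each case carries the ai_assist_flag and customer_action_required fields named earlier; the sample cases are hypothetical:

```python
def involvement_rate(cases):
    """Share of cases where the customer had to act."""
    return sum(c["customer_action_required"] for c in cases) / len(cases) if cases else 0.0

cases = [
    {"ai_assist_flag": True,  "customer_action_required": False},
    {"ai_assist_flag": True,  "customer_action_required": True},
    {"ai_assist_flag": False, "customer_action_required": True},
    {"ai_assist_flag": False, "customer_action_required": True},
]

ai_cases       = [c for c in cases if c["ai_assist_flag"]]
baseline_cases = [c for c in cases if not c["ai_assist_flag"]]
print("AI-assisted involvement:", involvement_rate(ai_cases))        # 0.5
print("Baseline involvement:   ", involvement_rate(baseline_cases))  # 1.0
```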

Measuring Operations, Not Theater

The library also includes operational health metrics with the same level of rigor.

SLA Attainment for High/Critical Cases

Definition: Percentage of high/critical cases meeting SLA targets.

Measurement points:

  • sla_target values
  • Case timestamps (open, notify, contain)

AI-instrumented extension:

  • Splits metrics by AI‑assisted vs. non‑AI‑assisted cases to show whether AI helps cases meet SLAs more reliably.

Peak-Hour Performance

Definition: Compares performance during peak windows vs. baseline.

Approach:

  • Calculate 90th percentile metrics across relevant timestamps.
  • Use a peak_window_id marker.

AI-instrumented extension:

  • Adds “AI assist rate during peak” to understand whether AI helps maintain quality when volume spikes.
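
A sketch of the 90th-percentile comparison using Python's statistics.quantiles; the MTTI samples and peak_window_id values are hypothetical:

```python
from statistics import quantiles

def p90(values):
    """90th percentile via statistics.quantiles (needs at least two values)."""
    return quantiles(values, n=10)[-1]

# Hypothetical MTTI samples (minutes), tagged with a peak_window_id when applicable
samples = [
    {"mtti_minutes": 18, "peak_window_id": None},
    {"mtti_minutes": 22, "peak_window_id": None},
    {"mtti_minutes": 25, "peak_window_id": None},
    {"mtti_minutes": 35, "peak_window_id": "mon_am"},
    {"mtti_minutes": 41, "peak_window_id": "mon_am"},
    {"mtti_minutes": 58, "peak_window_id": "mon_am"},
]

peak     = [s["mtti_minutes"] for s in samples if s["peak_window_id"]]
baseline = [s["mtti_minutes"] for s in samples if not s["peak_window_id"]]
print(f"p90 MTTI: {p90(peak):.0f} min (peak) vs {p90(baseline):.0f} min (baseline)")
```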

Containment Time

Definition: Time to containment, split by who leads the effort.

  • Provider-led containment:
    containment_complete – validated_time
  • Customer-led containment:
    handoff_sent – validated_time, plus tracking customer_complete_time separately

AI-instrumented extension:

  • automation_exec_start/end
  • human_approval_time
  • ai_recommendation_start/end
  • ai_confidence
  • handoff_quality_score

This granularity reveals where AI speeds containment and where human judgment or customer action remains the bottleneck.

Evidence Completeness Rate

Definition: Percentage of cases meeting an evidence checklist.

Measurement points (examples):

  • logs_attached
  • timeline_present
  • ioc_list

AI-instrumented extension:

  • Flags ai_evidence_summary_generated to track whether AI helps ensure investigations are thorough and well documented.

Automation “Hours Returned”

Definition: Quantifies effort reduction.

Approach:

  • Compare measured workflow durations against baseline task timing estimates.

AI-instrumented extension:

  • Uses ai_time_saved_estimate derived from actual timestamps rather than assumptions, providing defensible ROI calculations.
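
A sketch of the comparison, assuming per-task baseline estimates agreed up front; the task types and timings below are hypothetical:

```python
# Hypothetical baseline timings (minutes) for performing each task manually
BASELINE_MINUTES = {"triage": 20, "enrichment": 15, "summary": 25}

# Measured workflow durations derived from actual timestamps
completed_tasks = [
    {"task_type": "triage",     "measured_minutes": 6},
    {"task_type": "enrichment", "measured_minutes": 2},
    {"task_type": "summary",    "measured_minutes": 8},
]

saved_minutes = sum(
    max(0, BASELINE_MINUTES[t["task_type"]] - t["measured_minutes"])
    for t in completed_tasks
)
print(f"Hours returned: {saved_minutes / 60:.1f}")
```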

From Measurement to Improvement

Metrics exist to drive decisions, not just to populate dashboards. The library includes forward‑looking metrics with actionable calculations.

Detection Coverage Growth

Definition: Tracks how coverage expands over time.

Measurement points:

  • detection_added_time
  • detection_scope

This quantifies whether the SOC is expanding visibility or merely treading water.

Time to Improve

Definition: Measures continuous-improvement velocity.
Calculation: fix_deployed_time – gap_logged_time

This creates accountability for how quickly the SOC addresses identified weaknesses.

Root-Cause Recurrence

Definition: Tracks whether fixes “stick.”
Calculation: repeat_incidents ÷ total_incidents within 30/60/90‑day windows.

Measurement points:

  • root_cause_tag
  • Incident dates

This identifies chronic issues that keep resurfacing.
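
A sketch of windowed recurrence counting by root_cause_tag; the incidents are hypothetical and assumed sorted by date:

```python
from datetime import datetime, timedelta

incidents = [  # sorted by date
    {"root_cause_tag": "unpatched-vpn", "date": datetime(2024, 6, 1)},
    {"root_cause_tag": "phishing",      "date": datetime(2024, 6, 5)},
    {"root_cause_tag": "unpatched-vpn", "date": datetime(2024, 6, 20)},  # repeat within 30 days
]

def recurrence_rate(incidents, window_days):
    """Share of incidents whose root cause already appeared within the window."""
    repeats = 0
    for i, inc in enumerate(incidents):
        for prior in incidents[:i]:
            if (prior["root_cause_tag"] == inc["root_cause_tag"]
                    and inc["date"] - prior["date"] <= timedelta(days=window_days)):
                repeats += 1
                break
    return repeats / len(incidents)

for days in (30, 60, 90):
    print(f"{days}-day recurrence: {recurrence_rate(incidents, days):.0%}")
```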

Noise Growth Rate

Definition: Monitors alert fatigue.
Calculation: Month‑over‑month change in low_value_alert_count.

This acts as an early warning system to catch detection drift before it overwhelms analysts.

Making AI Transparent and Trustworthy

Ontinue’s approach makes AI measurable, not magical. Every traditional metric has AI‑instrumented measurement points that track:

  • When AI contributed
  • How long it took
  • What confidence level it assigned
  • Whether human approval was required

This transparency builds trust.

  • A CISO can show the board that AI‑assisted cases resolve 40% faster while maintaining equivalent Investigation Quality Scores (measured via qc_sample_id and qc_checklist_scores with an AI‑assisted attribute).
  • A security manager can demonstrate that AI suggestions are accepted 85% of the time for enrichment but only 60% for disposition recommendations—actionable data for tuning confidence thresholds and approval workflows.
  • When quality control reveals AI Error Escape Rate trending up as confidence thresholds increase, teams gain the feedback loop needed to balance speed and safety.

A Framework for Operational Excellence

Ontinue’s metrics library represents a maturation of security operations measurement. It doesn’t just tell you to measure “AI impact”; it specifies:

  • Exactly which timestamps to capture
  • Which flags to track
  • How to calculate meaningful comparisons

The framework explicitly rejects vanity metrics. It doesn’t measure “number of AI enrichments performed” or “percentage of alerts that touched AI,” because those numbers can look impressive while delivering zero operational value.

Instead, every metric ties to:

  • Speed
  • Quality
  • Governance
  • Business impact

For security leaders evaluating AI investments, this library provides the measurement blueprint to demand operational proof from vendors. For teams already using AI, it offers the instrumentation needed to identify:

  • What’s working
  • What needs tuning
  • Where human expertise remains essential

In an era where security teams face mounting pressure to do more with less, having the right metrics isn’t just helpful—it’s essential. And having honest, measurable metrics that reveal real value rather than activity is what separates genuine operational excellence from expensive theater.
