Module GOV-09: Governance pillar (Pillar 4), Strategy layer, maturity bands 1 to 3.

P4

L-E

GOV-09

AI Evaluation Harness Specification

Specifies the standardised evaluation methodology, test suites, and pass thresholds for all AI tools before deployment and during ongoing operation.

Module · Advanced · Per-engagement · Protect lens · Comply lens · Sophistication lens

Audience

AI Governance Lead, IT Security, Legal Operations, Risk and Compliance, General Counsel


Initial evaluation 3–5 business days per tool (Tiers 0–2); 7–10 business days for Tier 3–4 including agentic supplement. Ongoing monitoring is continuous.

Executive Summary

GOV-09 defines the organisation’s AI Evaluation Harness: a standardised, evidence-based methodology for testing every AI tool before deployment and throughout its lifecycle. It specifies three mandatory evaluation domains—hallucination and factual accuracy, bias and fairness, and robustness and operational reliability—plus an Agentic Tier Supplement for Tier 3–4 tools. Each domain is scored on a 0–100 scale with clear Pass, Conditional Pass, and Fail thresholds, and hard floors for unacceptable behaviour such as excessive legal citation fabrication. The Harness produces a Model Risk Profile that populates DAT-06 Field 10 and a defensible evidence package for GOV-08 Agentic Governance Panel decisions. It establishes performance and bias baselines to support continuous monitoring, defines re-evaluation triggers, and sets DPS-grade evidence retention requirements. Without GOV-09, governance decisions would rely on vendor assertions rather than independent testing, undermining defensibility under the EU AI Act, ISO/IEC 42001, NIST AI RMF, and professional liability standards.

Metric 0 — Pre-Check

Before any GOV-09 evaluation, three gates must pass:

  1. DAT-06 Registration Initiated – The tool must have a DAT-06 AI BoM entry at least at Draft status. Unregistered tools cannot be evaluated.
  2. GOV-02 AI Use Policy Entry Exists – A GOV-02 entry must define the approved use cases. Evaluation scope is limited to these use cases.
  3. Qualified Evaluation Team Available – The team must include at least one qualified assessor with documented AI evaluation experience, confirmed by the AI Governance Lead.

Failure at any gate pauses evaluation until remediated.
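
As a rough illustration, the three gates above could be checked programmatically before an evaluation is scheduled. This is a minimal sketch, not part of GOV-09 itself: the `PreCheckInput` record, its field names, and the `failed_gates` function are hypothetical, and it assumes that any recorded DAT-06 status counts as "at least Draft".

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PreCheckInput:
    """Hypothetical pre-check record; field names are illustrative only."""
    dat06_status: Optional[str]   # e.g. "Draft" or "Provisional"; None if unregistered
    gov02_use_cases: List[str]    # approved use cases from the GOV-02 entry
    qualified_assessors: int      # assessors confirmed by the AI Governance Lead


def failed_gates(p: PreCheckInput) -> List[str]:
    """Return the gates that fail; an empty list means evaluation may proceed."""
    failures = []
    # Gate 1: assumes any non-empty status is at least Draft.
    if not p.dat06_status:
        failures.append("DAT-06 registration not initiated")
    # Gate 2: a GOV-02 entry must define at least one approved use case.
    if not p.gov02_use_cases:
        failures.append("GOV-02 AI Use Policy entry missing")
    # Gate 3: at least one qualified assessor must be on the team.
    if p.qualified_assessors < 1:
        failures.append("No qualified assessor available")
    return failures


if __name__ == "__main__":
    ready = PreCheckInput("Draft", ["contract review"], 1)
    print(failed_gates(ready) or "All gates passed")
```

Returning the list of failed gates, rather than a single boolean, makes it easier to record what must be remediated before the paused evaluation resumes.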

Defensibility Evidence

GOV-09 operates at DPS Tier 3 (Defensible) across all three lenses:

  • Adoption lens: stakeholder notifications of evaluation completion, training records for evaluation team members, and documentation of the evaluation toolchain and test set sources. Retention: 5 years from evaluation date.
  • Sophistication lens: full AI Evaluation Harness Reports (including all domain scores and sub-scores), bias baseline documentation, Conditional Pass Mitigation Plans, and re-evaluation trigger documentation, which together provide an auditable trail of every deployment decision. Retention: 5 years.
  • Defensibility lens: Agentic Tier Supplement Reports, kill-switch response time test records, scope boundary enforcement verification records, escalation trigger accuracy test results, Model Risk Profile summaries as supplied to DAT-06 Field 10, and GOV-08 Panel submission packages. These constitute the primary technical evidence for regulatory, client, and professional liability inquiries regarding AI tool deployment decisions. Retention: 7 years from tool decommissioning.

Evidence must be available within 48 hours of a regulatory or legal inquiry. An annual evidence accessibility audit is required.
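
The retention periods above lend themselves to a small date calculation. The sketch below is illustrative only: the function and dictionary names are hypothetical, it assumes the Sophistication lens period also runs from the evaluation date (the text does not state the start point), and it assumes a 29 February start date rolls back to 28 February in a non-leap end year.

```python
from datetime import date
from typing import Optional

# Retention periods in years, as stated in the Defensibility Evidence text.
RETENTION_YEARS = {"adoption": 5, "sophistication": 5, "defensibility": 7}


def add_years(d: date, years: int) -> date:
    """Add whole years; 29 Feb falls back to 28 Feb in non-leap years (assumption)."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def retention_end(lens: str, evaluation_date: date,
                  decommission_date: Optional[date] = None) -> Optional[date]:
    """Return the earliest date evidence may be disposed of, or None if still open-ended."""
    years = RETENTION_YEARS[lens]
    if lens == "defensibility":
        # The 7-year clock starts at tool decommissioning, so a live tool has no end date yet.
        if decommission_date is None:
            return None
        return add_years(decommission_date, years)
    # Adoption (stated) and Sophistication (assumed) run from the evaluation date.
    return add_years(evaluation_date, years)


if __name__ == "__main__":
    print(retention_end("adoption", date(2026, 3, 1)))
    print(retention_end("defensibility", date(2026, 3, 1), date(2028, 2, 29)))
```

Because the Defensibility clock only starts at decommissioning, evidence for a tool that is still in service has no computable disposal date, which the sketch represents as `None`.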

Operational Artefacts

  • AI Evaluation Harness Scorecard (xlsx, v2026.1, gated)
  • Domain Test Set Templates: Hallucination, Bias, Robustness (xlsx, v2026.1, gated)
  • Agentic Tier Supplement Checklist (checklist, v2026.1, gated)
  • AI Evaluation Harness Report Template (docx, v2026.1, gated)
  • Bias Monitoring Baseline Record Template (xlsx, v2026.1, gated)

Framework Crosswalk

  • EU AI Act (European Union): Supports pre-deployment testing, technical documentation, and accuracy, robustness, cybersecurity, and human oversight requirements under Articles 9–15.
  • NIST AI Risk Management Framework (NIST): Implements the MEASURE function by providing structured, quantitative and qualitative evaluation of AI risks at deployment and in operation.
  • ISO/IEC 42001 (ISO/IEC): Provides AI management system controls for documented performance evaluation, including accuracy, reliability, and fairness, which GOV-09 operationalises.
  • NIST Special Publication 1270 (NIST): Informs GOV-09 bias and fairness testing methods and metrics for identifying, measuring, and mitigating AI bias.

Operational Details

Inputs

  • DAT-06 AI Bill of Materials entry at Draft or Provisional status
  • GOV-02 AI Use Policy entry specifying approved use cases
  • Vendor model card, system card, and technical documentation
  • GOV-04 vendor due diligence outputs for infrastructure and data supply chain
  • Legal domain test sets or Legal AI Test Corpus subsets
  • Demographic and jurisdictional test data for intended operating scope
  • Agentic workflow design documentation for Tier 3–4 tools
  • Kill-switch architecture and infrastructure documentation for Tier 3–4 tools

Outputs

  • AI Evaluation Harness Report per tool and evaluation cycle
  • Domain scores and overall Pass / Conditional Pass / Fail verdict
  • Model Risk Profile summary formatted for DAT-06 Field 10
  • Agentic Tier Supplement Report for Tier 3–4 tools
  • Bias and performance baseline records for GOV-08 monitoring
  • Red-team and adversarial testing findings
  • Conditional Pass Mitigation Plans and completion records
  • Evaluation evidence package for DPS retention and audits

Owner

AI Governance Lead + IT Security

Telemetry & Observability

Telemetry-ready

Key Takeaways

  • Evaluate every AI tool across hallucination, bias, and robustness domains before deployment.

  • Apply a 0–100 scoring rubric with Pass (80+), Conditional Pass (60–79), and Fail (<60) thresholds.

  • Use the Agentic Tier Supplement for all Tier 3–4 tools; any supplement failure is an overall Fail.

  • Populate DAT-06 Field 10 (Model Risk Profile) directly from GOV-09 evaluation outputs.

  • Establish bias and performance baselines at initial evaluation to power continuous monitoring.

  • Retain evaluation evidence for 5 years, and 7 years for agentic supplement and red-team records.

  • Trigger re-evaluation on major model changes, annual review, or monitoring-detected drift.
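
The scoring thresholds and the agentic supplement rule in the takeaways above can be expressed as a short verdict function. This is a sketch under stated assumptions rather than the module's actual scoring logic: the function names are hypothetical, the overall verdict is assumed to be the worst of the domain verdicts (the page does not say how domains combine), and a hard-floor breach is assumed to force a Fail.

```python
from typing import Dict, Optional

PASS_MIN = 80          # Pass: 80+
CONDITIONAL_MIN = 60   # Conditional Pass: 60-79; below 60 is Fail

# Verdicts ordered from worst to best, used to pick the worst domain result.
SEVERITY = {"Fail": 0, "Conditional Pass": 1, "Pass": 2}


def domain_verdict(score: float) -> str:
    """Map a 0-100 domain score to Pass / Conditional Pass / Fail."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score >= PASS_MIN:
        return "Pass"
    if score >= CONDITIONAL_MIN:
        return "Conditional Pass"
    return "Fail"


def overall_verdict(domain_scores: Dict[str, float], tier: int,
                    supplement_passed: Optional[bool] = None,
                    hard_floor_breached: bool = False) -> str:
    """Combine domain verdicts; worst-domain aggregation is an assumption."""
    # Assumed: breaching a hard floor overrides any score.
    if hard_floor_breached:
        return "Fail"
    # Tier 3-4 tools need the Agentic Tier Supplement; any failure is an overall Fail.
    if tier >= 3:
        if supplement_passed is None:
            raise ValueError("Tier 3-4 tools require an Agentic Tier Supplement result")
        if not supplement_passed:
            return "Fail"
    verdicts = [domain_verdict(s) for s in domain_scores.values()]
    return min(verdicts, key=SEVERITY.__getitem__)


if __name__ == "__main__":
    scores = {"hallucination": 90, "bias": 85, "robustness": 72}
    print(overall_verdict(scores, tier=2))
```

Fractional scores between 79 and 80 fall into Conditional Pass here, since the Pass band is read as starting at exactly 80.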

Get This Module

This module is available as part of an Advanta Advisory engagement.


Module Details

Type

Pillar

P4

Duration

Initial evaluation 3–5 business days per tool (Tiers 0–2); 7–10 business days for Tier 3–4 including agentic supplement. Ongoing monitoring is continuous.

Advisory

Yes

Access

enterprise

Maturity Bands

Operational, Integrated, Optimised, Defensible

Governance

Methodology
v2026.1
