Governance Scoring

7-dimension governance scoring model with L0-L4 maturity levels for AI agent assessment.

Every agent receives a composite score from 0 to 100 across 7 weighted dimensions, and that score maps to a governance level from L0 through L4. The score is computed immediately at registration and recomputed whenever the agent's metadata changes.

7 Scoring Dimensions

Each dimension is scored 0-100 independently, then combined into a weighted composite. Higher-weight dimensions have more influence on the final score.

Dimension	Weight	What It Measures
Identity	1.5	Name, owner, framework, version, description, authentication, channels
Permissions	1.5	Explicit permissions, tool scoping, auth, bounded tool count
Guardrails	1.3	Input/output guardrails, auth, framework-native guardrails, bounded tools
Observability	1.2	Tracing, audit logging, framework tracing, metadata
Auditability	1.0	Audit logging, observability, ownership, versioning, documentation
Compliance	1.0	Audit logs, guardrails, auth, observability, ownership, permissions
Lifecycle	0.8	Owner, version, description, framework, channels, metadata

Note: The composite score is a weighted average: each dimension's score is multiplied by its weight, summed, then divided by the total weight (8.3). This means identity and permissions together account for ~36% of the final score.
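The weighted average described in the note can be sketched as follows. This illustrates the arithmetic only and is not the SDK's internal implementation: the names `DEFAULT_WEIGHTS` and `compositeScore` are hypothetical, and because this page does not state how the SDK rounds, the sketch returns the unrounded value.

```typescript
// Hypothetical names; the weights are the defaults from the table above.
type Dimension =
  | "identity" | "permissions" | "guardrails" | "observability"
  | "auditability" | "compliance" | "lifecycle";

const DEFAULT_WEIGHTS: Record<Dimension, number> = {
  identity: 1.5,
  permissions: 1.5,
  guardrails: 1.3,
  observability: 1.2,
  auditability: 1.0,
  compliance: 1.0,
  lifecycle: 0.8,
};

// Weighted average: sum(score * weight) / sum(weight).
function compositeScore(
  scores: Record<Dimension, number>,
  weights: Record<Dimension, number> = DEFAULT_WEIGHTS,
): number {
  let weightedSum = 0;
  let totalWeight = 0;
  for (const dim of Object.keys(weights) as Dimension[]) {
    weightedSum += scores[dim] * weights[dim];
    totalWeight += weights[dim];
  }
  return weightedSum / totalWeight;
}

// An agent scoring 100 on identity and permissions and 0 elsewhere
// lands at 300 / 8.3 ≈ 36.1, matching the ~36% share noted above.
const identityAndPermissionsOnly = compositeScore({
  identity: 100, permissions: 100, guardrails: 0, observability: 0,
  auditability: 0, compliance: 0, lifecycle: 0,
});
```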

Weight rationale

The default weights are opinionated, not research-validated. The calibration question is: "if this dimension is weak, how likely is it that the agent causes a harmful incident in production?"

identity (1.5) — if you can't tell who's calling, every other control is weakened. Anchors the model.
permissions (1.5) — tool/scope over-grant is the #1 cause of "agent did the wrong thing" incidents.
guardrails (1.3) — prevent-before-action controls stop most classes of runtime harm.
observability (1.2) — you can only respond to incidents you can see.
auditability (1.0) — post-hoc forensics; important, but only AFTER the incident.
compliance (1.0) — procedural, downstream of the above.
lifecycle (0.8) — maturity metadata; contributes to posture, doesn't itself prevent incidents.

Override with a custom weight map if your risk profile differs (e.g. highly-regulated industries may weight compliance higher).
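To see how a custom weight map shifts influence, the sketch below computes each dimension's share of the composite for a given map. It is plain arithmetic and does not call the SDK: `weightShares` and the `regulatedWeights` values are hypothetical, and how a weight map is actually passed to the scorer is not shown on this page.

```typescript
// Hypothetical helper: a dimension's share of the composite score
// is its weight divided by the sum of all weights.
function weightShares(weights: Record<string, number>): Record<string, number> {
  let total = 0;
  for (const dimension of Object.keys(weights)) {
    total += weights[dimension] ?? 0;
  }
  const shares: Record<string, number> = {};
  for (const dimension of Object.keys(weights)) {
    shares[dimension] = (weights[dimension] ?? 0) / total;
  }
  return shares;
}

// Default weights from the dimension table.
const defaultWeights: Record<string, number> = {
  identity: 1.5, permissions: 1.5, guardrails: 1.3, observability: 1.2,
  auditability: 1.0, compliance: 1.0, lifecycle: 0.8,
};

// Example only: a regulated-industry profile that doubles compliance.
const regulatedWeights: Record<string, number> = { ...defaultWeights, compliance: 2.0 };

const defaultShares = weightShares(defaultWeights);     // compliance ≈ 0.12
const regulatedShares = weightShares(regulatedWeights); // compliance ≈ 0.215
```

Raising one weight lowers every other dimension's share, so a custom map is a trade-off rather than a pure addition.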

Score-inflation risk — cross-check self-reports against repo scan

The scorer accepts self-reported booleans (hasAuth, hasGuardrails, hasObservability, hasAuditLog) at face value. An agent that lies about its capabilities scores identically to one that actually has them. To defend against inflation, cross-check caller claims against the repository:

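import { scanRepoContents } from 'governance-sdk/repo-patterns';

// The claims the agent made at registration (example values).
const selfReport: Record<string, boolean> = {
  hasAuth: true,
  hasGuardrails: true,
  hasObservability: true,
  hasAuditLog: true,
};

// loadedFiles: Map<path, contents> of the agent's repository, loaded by the caller.
const scan = scanRepoContents(loadedFiles);
for (const d of scan.detections) {
  // Flag any capability that is claimed but was not detected in the repo.
  if (selfReport[d.capability] && !d.detected) {
    console.warn(
      `agent claims ${d.capability}=true but repo scan detected=false (confidence ${d.confidence.toFixed(2)})`,
    );
  }
}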

Run this check in CI before accepting a new agent's registration. A mismatch is not always fraud, because the regex-based detection is heuristic and uses a confidence threshold of 0.4, but every mismatch warrants manual review.
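For a CI gate, the same comparison can be factored into a pure function that returns the mismatches, so the pipeline can fail or request review. The sketch below is self-contained and does not import the SDK: the `Detection` shape mirrors only the three fields used in the snippet above (`capability`, `detected`, `confidence`), and `findUnverifiedClaims` is a hypothetical helper name.

```typescript
// Assumed shape, taken from the fields used in the snippet above.
interface Detection {
  capability: string;
  detected: boolean;
  confidence: number;
}

type SelfReport = Record<string, boolean | undefined>;

// Returns every capability the agent claims but the scan did not detect.
function findUnverifiedClaims(
  selfReport: SelfReport,
  detections: Detection[],
): Detection[] {
  return detections.filter(
    (d) => selfReport[d.capability] === true && !d.detected,
  );
}

// Example data (hypothetical scan output).
const unverified = findUnverifiedClaims(
  { hasAuth: true, hasGuardrails: true, hasAuditLog: false },
  [
    { capability: "hasAuth", detected: true, confidence: 0.9 },
    { capability: "hasGuardrails", detected: false, confidence: 0.2 },
    { capability: "hasAuditLog", detected: false, confidence: 0.1 },
  ],
);
// unverified contains only the hasGuardrails detection:
// hasAuth was confirmed, and hasAuditLog was never claimed.
```

A CI job could exit non-zero, or open a review ticket, whenever the returned array is non-empty.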

Governance Levels (L0-L4)

The composite score maps directly to a governance level, aligned with the CSA Agent Trust Framework progressive autonomy model.

Level	Label	Score Range	Autonomy
L0	Unregistered	0-20	No autonomous operation
L1	Basic	21-40	Human-in-loop required
L2	Managed	41-60	Limited autonomous actions
L3	Governed	61-80	Fully autonomous within policy
L4	Certified	81-100	Cross-team, regulatory-ready
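The score ranges in the table translate into a simple lookup. The following is a minimal sketch, not the SDK's own code: `LEVEL_BANDS` and `levelForScore` are hypothetical names, and rounding a fractional score to the nearest integer before the lookup is an assumption, since this page only lists integer ranges.

```typescript
// Level bands copied from the table above; names are hypothetical.
interface GovernanceLevel {
  level: number;
  label: string;
}

const LEVEL_BANDS: Array<GovernanceLevel & { max: number }> = [
  { level: 0, label: "Unregistered", max: 20 },
  { level: 1, label: "Basic", max: 40 },
  { level: 2, label: "Managed", max: 60 },
  { level: 3, label: "Governed", max: 80 },
  { level: 4, label: "Certified", max: 100 },
];

function levelForScore(score: number): GovernanceLevel {
  if (!(score >= 0 && score <= 100)) {
    throw new RangeError(`score must be between 0 and 100, got ${score}`);
  }
  // Assumption: fractional scores are rounded to the nearest integer first.
  const rounded = Math.round(score);
  // Bands are sorted by upper bound, so the first match is the right one.
  const band = LEVEL_BANDS.find((b) => rounded <= b.max);
  if (!band) {
    throw new RangeError(`no level band covers score ${score}`);
  }
  return { level: band.level, label: band.label };
}

const boundary = levelForScore(21); // { level: 1, label: "Basic" }
const governed = levelForScore(67); // { level: 3, label: "Governed" }
```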

Tip: Use the requireLevel() policy preset to enforce minimum governance levels. Agents below the threshold are blocked from operating autonomously.

Scoring at Registration

Scores are computed automatically when you call gov.register(). The more metadata you provide, the higher the score.

import { createGovernance } from 'governance-sdk';

const gov = createGovernance({ rules: [] });

const agent = await gov.register({
  name: 'research-agent',
  framework: 'mastra',
  owner: 'research-team',
  description: 'Autonomous research agent for market analysis',
  version: '2.1.0',
  tools: ['web_search', 'summarize', 'write_report'],
  channels: ['slack', 'email'],
  hasAuth: true,
  hasGuardrails: true,
  hasAuditLog: true,
  hasObservability: true,
});

// agent.score     → 85
// agent.level     → 4
// agent.status    → "approved"

Dimension Breakdown

Every assessment includes per-dimension scores with evidence, so you know exactly which features contribute to the score and where the gaps are.

const assessment = gov.score(agent.id);

// assessment.dimensions:
// [
//   { dimension: "identity",      score: 100, weight: 1.5, evidence: { hasName: true, hasOwner: true, ... } },
//   { dimension: "permissions",   score: 80,  weight: 1.5, evidence: { hasPermissions: false, toolCount: 3, ... } },
//   { dimension: "observability", score: 90,  weight: 1.2, evidence: { hasObservability: true, ... } },
//   { dimension: "guardrails",    score: 70,  weight: 1.3, evidence: { hasGuardrails: true, ... } },
//   { dimension: "auditability",  score: 100, weight: 1.0, evidence: { hasAuditLog: true, ... } },
//   { dimension: "compliance",    score: 75,  weight: 1.0, evidence: { hasAuditLog: true, ... } },
//   { dimension: "lifecycle",     score: 85,  weight: 0.8, evidence: { hasOwner: true, ... } },
// ]
//
// assessment.compositeScore → 85
// assessment.level          → { level: 4, label: "Certified", ... }
// assessment.recommendations → ["Agent meets all governance thresholds..."]
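One way to use the breakdown is to rank dimensions by how many composite points each one costs. The sketch below is an illustration under stated assumptions: it relies only on the `dimension`, `score`, and `weight` fields shown above, `rankGaps` is a hypothetical helper rather than an SDK function, and the input reuses the example values from the breakdown.

```typescript
// Only the fields shown in the breakdown above are assumed here.
interface DimensionScore {
  dimension: string;
  score: number; // 0-100
  weight: number;
}

interface Gap {
  dimension: string;
  pointsLost: number;
}

// Composite points lost per dimension: (100 - score) * weight / totalWeight.
// Dimensions already at 100 are dropped; the rest are sorted, largest gap first.
function rankGaps(dimensions: DimensionScore[]): Gap[] {
  const totalWeight = dimensions.reduce((sum, d) => sum + d.weight, 0);
  return dimensions
    .map((d) => ({
      dimension: d.dimension,
      pointsLost: ((100 - d.score) * d.weight) / totalWeight,
    }))
    .filter((g) => g.pointsLost > 0)
    .sort((a, b) => b.pointsLost - a.pointsLost);
}

const gaps = rankGaps([
  { dimension: "identity", score: 100, weight: 1.5 },
  { dimension: "permissions", score: 80, weight: 1.5 },
  { dimension: "observability", score: 90, weight: 1.2 },
  { dimension: "guardrails", score: 70, weight: 1.3 },
  { dimension: "auditability", score: 100, weight: 1.0 },
  { dimension: "compliance", score: 75, weight: 1.0 },
  { dimension: "lifecycle", score: 85, weight: 0.8 },
]);
// Largest gaps first: guardrails (about 4.7 points), permissions (about 3.6),
// then compliance (about 3.0).
```

Ranking by weighted loss rather than raw score matters here: guardrails and compliance have similar raw scores, but the higher guardrails weight makes it the better first fix.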

Fleet-Wide Scoring

Assess your entire agent fleet at once. The fleet summary includes averages, distributions by level and status, and actionable recommendations.

const fleet = gov.scoreFleet();

// fleet.summary.totalAgents    → 12
// fleet.summary.averageScore   → 67
// fleet.summary.fleetLevel     → { level: 3, label: "Governed" }
// fleet.summary.byLevel        → { 0: 0, 1: 2, 2: 3, 3: 5, 4: 2 }
// fleet.summary.byStatus       → { approved: 7, flagged: 5, ... }
// fleet.summary.highestScoring → { name: "research-agent", score: 85 }
// fleet.summary.lowestScoring  → { name: "legacy-bot", score: 28 }
// fleet.summary.recommendations → [
//   "5 agent(s) below governance threshold — review immediately",
// ]
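The aggregate fields in the summary can be derived from per-agent scores. The following sketch shows one plausible derivation and is not the SDK's implementation: `summarizeFleet` and `levelOf` are hypothetical helpers, the agent `support-agent` is made up for the example, and both the rounding of the average and deriving `fleetLevel` from that average are assumptions.

```typescript
interface ScoredAgent {
  name: string;
  score: number; // composite score, 0-100
}

// Bands from the Governance Levels table: 0-20, 21-40, 41-60, 61-80, 81-100.
function levelOf(score: number): number {
  if (score <= 20) return 0;
  if (score <= 40) return 1;
  if (score <= 60) return 2;
  if (score <= 80) return 3;
  return 4;
}

function summarizeFleet(agents: ScoredAgent[]) {
  if (agents.length === 0) {
    throw new Error("cannot summarize an empty fleet");
  }
  const byLevel: Record<number, number> = { 0: 0, 1: 0, 2: 0, 3: 0, 4: 0 };
  let total = 0;
  for (const agent of agents) {
    total += agent.score;
    const level = levelOf(agent.score);
    byLevel[level] = (byLevel[level] ?? 0) + 1;
  }
  // Assumption: the average is rounded to the nearest integer.
  const averageScore = Math.round(total / agents.length);
  return {
    totalAgents: agents.length,
    averageScore,
    fleetLevel: levelOf(averageScore), // assumption: derived from the average
    byLevel,
    highestScoring: agents.reduce((best, a) => (a.score > best.score ? a : best)),
    lowestScoring: agents.reduce((worst, a) => (a.score < worst.score ? a : worst)),
  };
}

const summary = summarizeFleet([
  { name: "research-agent", score: 85 },
  { name: "support-agent", score: 67 }, // hypothetical agent
  { name: "legacy-bot", score: 28 },
]);
// summary.averageScore → 60, summary.fleetLevel → 2
```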

How to Improve Your Score

Transition	Action
L0 → L1	Register the agent with a name and owner. Declare a known framework.
L1 → L2	Add a tools list, enable audit logging, and set a version string.
L2 → L3	Enable authentication, add guardrails, configure permissions and observability.
L3 → L4	Complete all metadata: description, channels, metadata object. Enable all security features.