AISA
Methodology

Published. Validated. Auditable.

A conversational AI assessment built on behavioural science, validated by independent research, and designed for predictive validity — not multiple choice.

Why conversational assessment

Multiple-choice AI tests have three fundamental validity problems. AISA solves all of them architecturally.

01

AI can answer its own questions

Conversational evidence can’t be looked up

02

Recognition ≠ proficiency

We score demonstrated behaviour, not recall

03

No evidence trail

Every score is tied to a verbatim quote

Full analysis: How conversational evidence prevents cheating

Independently validated

In 2026, Anthropic published the AI Fluency Index — the largest empirical study of AI fluency to date, analysing 9,830 conversations. AISA's rubric was developed independently.

93%

of Anthropic's observable fluency markers covered by AISA

9,830

conversations analysed in Anthropic's study

+4

additional dimensions AISA measures that Anthropic couldn't

The four dimensions Anthropic couldn't measure through chat logs alone (AI Fundamentals, Tool Landscape, Domain Application, and Safety, which correspond to criteria U1, U2, W3 and S1 in the rubric below) require a structured conversational assessment to surface. AISA's dual-track architecture makes this possible.

The published rubric

5 dimensions, 11 criteria, published and auditable. Every score has a behavioural anchor — you can see exactly what a 5 looks like versus a 7 versus a 9.

Dimension weights

Workflow & Application: 25%
Prompting & Comms: 23%
Critical Thinking: 22%
Technical Understanding: 20%
Safety: 10%
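The weights combine the five dimension scores into one composite as a weighted average. This is a minimal sketch that assumes each dimension has already been scored on the 1–10 scale. The `composite` helper and the example profile are hypothetical, and this page does not describe how AISA rescales the composite for reporting.

```python
# Dimension weights as published above (they sum to 100%).
WEIGHTS = {
    "Workflow & Application": 0.25,
    "Prompting & Comms": 0.23,
    "Critical Thinking": 0.22,
    "Technical Understanding": 0.20,
    "Safety": 0.10,
}


def composite(dimension_scores: dict[str, float]) -> float:
    """Weighted average of 1-10 dimension scores (hypothetical helper)."""
    missing = WEIGHTS.keys() - dimension_scores.keys()
    if missing:
        raise ValueError(f"missing dimension scores: {sorted(missing)}")
    total_weight = sum(WEIGHTS.values())
    return sum(WEIGHTS[d] * dimension_scores[d] for d in WEIGHTS) / total_weight


# Illustrative profile, not real candidate data.
profile = {
    "Workflow & Application": 7,
    "Prompting & Comms": 6,
    "Critical Thinking": 5,
    "Technical Understanding": 4,
    "Safety": 8,
}
print(round(composite(profile), 2))  # 5.83
```

Because the weights sum to 100%, the composite stays on the same 1–10 scale as its inputs. Dividing by the total weight keeps that true even if a weight is later adjusted.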

Score scale (1–10)

1 · Novice: Unaware this is a skill; no intentional practice.
3 · Developing: Aware but inconsistent; reactive, not deliberate.
5 · Competent: Functional approach with repeatable techniques; not yet internalised.
7 · Proficient: Consistent and intentional; understands why, not just how.
10 · Expert: Principle-level mastery; pushes the craft forward, influences others.

5 dimensions · 11 criteria

Prompting & Comms
  • P1 Prompt Design
  • P2 Iterative Dialogue
  • P3 Context & Memory Management
Critical Thinking
  • T1 Output Evaluation
  • T2 Limitation Awareness
Technical Understanding
  • U1 AI Fundamentals
  • U2 Tool Landscape
Workflow & Application
  • W1 Workflow Integration
  • W2 Task Decomposition
  • W3 Domain Application
Safety & Responsibility
  • S1 AI Safety & Responsibility

Full rubric with behavioural anchors: The AISA Rubric — 5 Dimensions of AI Proficiency

How scoring works

Three layers of scoring, each adding reliability. No single model has the final say.

1

Evidence extraction

Track B, the silent evaluator, scores every candidate message against the rubric and extracts verbatim quotes with confidence levels. Evidence is ranked by type: demonstrated > described > managed.

2

Confidence-weighted aggregation

Multiple evidence pieces per criterion are blended by confidence (high/medium/low). Peak performance is weighted alongside sustained performance — a single brilliant answer counts, but consistency matters more.

3

Holistic calibration pass

A more capable model (Claude Opus) reviews the full transcript and adjusts scores that the per-turn evaluator got wrong. It must provide disconfirming evidence for every adjustment.
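The three layers can be sketched end to end for a single criterion. This is a minimal illustration under stated assumptions, not AISA's implementation. The `Evidence` record, the numeric confidence weights (1.0 / 0.6 / 0.3), the 30/70 split between peak and sustained performance, and the rule that an adjustment is accepted only when its counter-evidence appears verbatim in the transcript are all hypothetical choices made to match the descriptions above. Layer 3 is shown as a plain function here, whereas in AISA it is a model reviewing the full transcript.

```python
from dataclasses import dataclass

# Assumed numeric weights for the three confidence levels (not published values).
CONFIDENCE_WEIGHT = {"high": 1.0, "medium": 0.6, "low": 0.3}
PEAK_SHARE = 0.3  # assumed: the peak counts, but sustained performance counts more


@dataclass(frozen=True)
class Evidence:
    """Layer 1 output: one verbatim quote scored against one rubric criterion."""
    criterion: str   # rubric code, e.g. "T1" (Output Evaluation)
    quote: str       # verbatim text from the candidate's message
    score: float     # 1-10 behavioural-anchor score
    kind: str        # "demonstrated" | "described" | "managed"
    confidence: str  # "high" | "medium" | "low"


def aggregate(evidence: list[Evidence]) -> float:
    """Layer 2: blend confidence-weighted sustained performance with the peak."""
    if not evidence:
        raise ValueError("no evidence for this criterion")
    weights = [CONFIDENCE_WEIGHT[e.confidence] for e in evidence]
    sustained = sum(w * e.score for w, e in zip(weights, evidence)) / sum(weights)
    peak = max(e.score for e in evidence)
    return PEAK_SHARE * peak + (1 - PEAK_SHARE) * sustained


def calibrate(score: float, proposed: float, disconfirming_quote: str,
              transcript: str) -> float:
    """Layer 3: accept an adjustment only if its counter-evidence is in the transcript."""
    quote = disconfirming_quote.strip()
    if quote and quote in transcript:
        return proposed
    return score  # unsupported adjustment is rejected; the per-turn score stands


transcript = (
    "I always ask it to list its assumptions first. "
    "Then I check the numbers against the source. "
    "Honestly, if it sounds right I paste it straight into the report."
)
t1 = [
    Evidence("T1", "Then I check the numbers against the source.", 8,
             "described", "high"),
    Evidence("T1", "I always ask it to list its assumptions first.", 5,
             "described", "low"),
]
per_turn = aggregate(t1)
final = calibrate(per_turn, 5,
                  "if it sounds right I paste it straight into the report",
                  transcript)
print(round(per_turn, 2), final)  # 7.52 5
```

In this example the per-turn evidence for T1 blends to about 7.5, and the calibration pass lowers it to 5 because the transcript contains a quote that contradicts careful verification. An adjustment citing a quote that does not appear in the transcript would be rejected.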

See expert scoring in practice: What a Score 9 looks like · Full validity audit

The dual-track architecture

Separating conversation from evaluation eliminates the bias that occurs when a single system asks questions and judges answers.

Track A

Conversationalist

The only AI the candidate sees. Warm, adaptive, peer-level. Gets steering notes from the evaluator but never sees scores. Natural dialogue, not a checklist.

Track B

Evaluator

Runs silently on every message. Evidence, scores, steering notes — structured data only. Behavioural anchors (1–10) keep scoring consistent and explainable.

Message → Integrity flags → Track B → Track A → Persist

Track B evaluates before Track A replies — the next response already reflects the latest steering.
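The per-message order can be sketched as a single handler. Everything in it is hypothetical: `handle_message`, the steering-note strings, and the stub functions stand in for model calls and real integrity checks. What it illustrates is the ordering and the information boundary. Track B runs first and writes scores to the session, while Track A receives only the transcript and that turn's steering note.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    transcript: list[str] = field(default_factory=list)
    scores: dict[str, int] = field(default_factory=dict)  # read by Track B only
    flags: list[str] = field(default_factory=list)


def integrity_check(message: str) -> list[str]:
    # Placeholder rule; real checks would use keystroke timing and style signals.
    return ["very long message"] if len(message) > 2000 else []


def track_b_evaluate(message: str) -> tuple[dict[str, int], str]:
    # Stub evaluator: returns criterion scores plus a steering note.
    scores = {"P2": 6} if "follow-up" in message else {}
    note = "probe verification habits" if scores else "ask for a concrete example"
    return scores, note


def track_a_reply(history: list[str], steering_note: str) -> str:
    # Stub conversationalist: sees the transcript and the note, never the scores.
    return f"(reply steered by: {steering_note})"


def handle_message(session: Session, message: str) -> str:
    session.flags += integrity_check(message)
    session.transcript.append(message)
    scores, note = track_b_evaluate(message)          # Track B runs first...
    session.scores.update(scores)
    reply = track_a_reply(session.transcript, note)   # ...so Track A uses this turn's note
    session.transcript.append(reply)                  # persist both sides of the turn
    return reply


s = Session()
print(handle_message(s, "I usually send a follow-up asking it to check its sources."))
# (reply steered by: probe verification habits)
```

In this sequential sketch, running Track B first adds its latency to every turn. The benefit is the one described above: the next question reflects what the candidate just said rather than lagging one message behind.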

Technical deep-dive: Inside AISA's assessment architecture

What the scores reveal

Scores only have predictive value if they differentiate between people and produce real insight, not just a number. Here's what 400+ assessments have surfaced.

412

completed assessments

52

average fluency score

0–98

observed score range

1.7%

reach Expert tier

Why scores can't be faked

If scores can be gamed, they have no predictive validity. AISA solves this at the architecture level, not with proctoring.

Burst detection

Characters arriving less than 50 ms apart signal pasting, not typing. Human keystrokes typically land 50–300 ms apart.
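Burst detection can be sketched over per-character arrival times. This minimal sketch assumes the client records one timestamp, in milliseconds, for each character that appears in the input box. The 50 ms threshold comes from the text above. `find_bursts` and the 5-character minimum run are hypothetical choices.

```python
BURST_GAP_MS = 50     # gaps below this are faster than human typing (from the text)
MIN_BURST_CHARS = 5   # assumed: ignore very short runs, e.g. two keys rolled together


def find_bursts(timestamps_ms: list[float]) -> list[tuple[int, int]]:
    """Return inclusive (start, end) index ranges of characters arriving < 50 ms apart."""
    bursts = []
    start = 0
    for i in range(1, len(timestamps_ms) + 1):
        continues_run = (i < len(timestamps_ms)
                         and timestamps_ms[i] - timestamps_ms[i - 1] < BURST_GAP_MS)
        if not continues_run:
            if i - start >= MIN_BURST_CHARS:
                bursts.append((start, i - 1))
            start = i
    return bursts


# Four typed characters (120-140 ms apart), then eight characters pasted at once.
typed = [0, 120, 260, 380]
pasted = [1000] * 8
print(find_bursts(typed + pasted))  # [(4, 11)]
```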

Style analysis

Shifts in vocabulary and formality away from the candidate's own baseline are tracked through the session. Sudden corporate prose after casual answers is flagged.
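Style analysis compares each message with the candidate's own baseline rather than with a population norm. The sketch below is a stand-in under stated assumptions, because the text does not say how vocabulary and formality are measured. The `formality` proxy is hypothetical: it combines average word length with a weighted share of words from a small, made-up list of formal terms. Using the first three messages as the baseline and a 3-standard-deviation threshold are also illustrative choices.

```python
import re
from statistics import mean, stdev

# Hypothetical list of "corporate" terms used only for this illustration.
FORMAL_WORDS = {"furthermore", "moreover", "leverage", "additionally",
                "consequently", "utilize", "robust", "comprehensive"}


def formality(message: str) -> float:
    """Crude proxy: average word length plus a weighted share of formal words."""
    words = re.findall(r"[a-z']+", message.lower())
    if not words:
        return 0.0
    avg_len = mean(len(w) for w in words)
    formal_share = sum(w in FORMAL_WORDS for w in words) / len(words)
    return avg_len + 10 * formal_share


def style_shift_flags(messages: list[str], baseline_n: int = 3,
                      z_threshold: float = 3.0) -> list[int]:
    """Indices of messages whose formality departs sharply from the baseline."""
    baseline = [formality(m) for m in messages[:baseline_n]]
    mu = mean(baseline)
    sigma = max(stdev(baseline), 0.25)  # avoid dividing by a near-zero spread
    return [i for i, m in enumerate(messages[baseline_n:], start=baseline_n)
            if abs(formality(m) - mu) / sigma > z_threshold]


messages = [
    "yeah i mostly just use it for emails tbh",
    "i ask it to fix my code and then i test it",
    "sometimes it gets stuff wrong so i check",
    "ok so i paste the error in and see what it says",
    "Furthermore, I leverage comprehensive, robust frameworks to utilize "
    "generative capabilities consequently.",
]
print(style_shift_flags(messages))  # [4]
```

Because the comparison is against the candidate's own baseline, a naturally formal writer is not penalised. Only a shift within the session raises a flag.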

AI fingerprinting

Five-metric system detecting AI-generated text: correction rate, edit density, message length, formality, uniformity.

Consistency verification

The same topic probed from multiple angles across the session. Rehearsed frameworks crumble under varied questioning.

Typing metrics carry 70% of the integrity weighting; style and AI signals carry 30%. Flags appear in the report with full transparency: integrity is an architectural property, not a policing function.
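The 70/30 weighting can be written as a simple blend of two sub-scores. This sketch assumes both sub-scores are normalised to 0–1, with higher meaning more suspicious. The 0.5 flagging threshold and the function names are hypothetical; the text above specifies only the weights.

```python
TYPING_WEIGHT = 0.7    # from the methodology: typing metrics carry 70%
STYLE_AI_WEIGHT = 0.3  # style and AI-fingerprint signals carry 30%
FLAG_THRESHOLD = 0.5   # assumed cut-off for surfacing a flag in the report


def integrity_risk(typing_signal: float, style_ai_signal: float) -> float:
    """Blend two 0-1 suspicion signals with the published 70/30 weighting."""
    for value in (typing_signal, style_ai_signal):
        if not 0.0 <= value <= 1.0:
            raise ValueError("signals must be normalised to 0-1")
    return TYPING_WEIGHT * typing_signal + STYLE_AI_WEIGHT * style_ai_signal


def should_flag(typing_signal: float, style_ai_signal: float) -> bool:
    return integrity_risk(typing_signal, style_ai_signal) >= FLAG_THRESHOLD


# Strong paste evidence alone is enough to flag; style signals alone are not.
print(should_flag(0.8, 0.0), should_flag(0.0, 1.0))  # True False
```

Under this sketch's threshold, one consequence of the heavier typing weight is that style and AI signals alone can never trigger a flag. They can only reinforce behavioural evidence from the keystrokes.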

The 10 AI personas

Beyond the score: a profile of how someone interacts with AI, based on the shape of their dimension scores — not just the composite number. Two people can score identically and receive different personas.
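A quick check shows how identical composites can hide different shapes. The two profiles below are invented, and `top_dimensions` is a hypothetical helper. AISA's persona-assignment rules are not published on this page, so the sketch stops at the profile shape and does not assign personas.

```python
# Published dimension weights (sum to 1.0).
WEIGHTS = {
    "Workflow & Application": 0.25,
    "Prompting & Comms": 0.23,
    "Critical Thinking": 0.22,
    "Technical Understanding": 0.20,
    "Safety": 0.10,
}


def composite(scores: dict[str, float]) -> float:
    """Weighted average using the published dimension weights."""
    return sum(WEIGHTS[d] * scores[d] for d in WEIGHTS)


def top_dimensions(scores: dict[str, float], n: int = 2) -> list[str]:
    """The n highest-scoring dimensions; ties keep the order of WEIGHTS."""
    return sorted(WEIGHTS, key=lambda d: scores[d], reverse=True)[:n]


# Two invented profiles with the same composite but a different shape.
verifier = {"Workflow & Application": 4, "Prompting & Comms": 4,
            "Critical Thinking": 9, "Technical Understanding": 6, "Safety": 9}
operator = {"Workflow & Application": 8, "Prompting & Comms": 8,
            "Critical Thinking": 3, "Technical Understanding": 5, "Safety": 5}

for name, profile in [("verifier", verifier), ("operator", operator)]:
    print(name, round(composite(profile), 2), top_dimensions(profile))
# verifier 6.0 ['Critical Thinking', 'Safety']
# operator 6.0 ['Workflow & Application', 'Prompting & Comms']
```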

The Oracle: Understands AI at its core, not just how to use it.

Deep technical mastery of AI itself. Understands or builds AI models, works with ML and LLMs at a technical level. Elite critical analysis comes from understanding the technology at its foundation, not just from using it.

The Architect: Builds highly complex integrated systems using AI.

Designs and builds sophisticated multi-system AI integrations at scale. Goes beyond creating individual tools to engineering production-grade architectures where AI components interact with each other and non-AI systems.

The Builder: Has actually built something with AI.

Personally created complex, useful tools, workflows, or products using AI — whether for their own use, their company, or commercially. Developed deep practical understanding through hands-on building that goes beyond secondhand knowledge.

The Conductor: Orchestrates AI across the workflow, not just within it.

Uses AI heavily across complex workflows, automations, and multi-tool pipelines. Understands AI limitations well and knows which tool integrates with which. Orchestrates and configures sophisticated setups, but typically works with what's available rather than building novel tools from scratch.

The Tactician: Gets things done with AI, fast and reliably.

Productive with mainstream AI tools and uses them well within established workflows. Communicates clearly with AI and consistently gets quality output, but typically hasn't pushed into the cutting edge of AI tooling or complex integrations.

The Enthusiast: Curious, capable, and picking up speed.

Actively building AI skill across multiple dimensions. Tries new tools, refines prompts, and is beginning to develop repeatable patterns — the trajectory is strong.

The Sceptic: Questions everything (the output, the tool, the hype).

Approaches AI with critical caution. May under-use AI in practice, but the verification instinct and risk awareness form a strong foundation that many frequent users lack.

The Copy-Paster: Uses AI regularly but takes the output at face value.

Relies on AI for day-to-day output but with limited iteration or verification. Gets value, but leaves quality and safety gains on the table by accepting first-pass results.

The Dabbler: Tries things out but hasn't locked in a rhythm yet.

Experiments with AI intermittently: a prompt here, a quick question there. Nothing sustained, but a willingness to explore that many skip entirely.

The Bystander: AI is on the radar, but not in the routine.

Has heard of AI tools but hasn't meaningfully engaged — the assessment itself may be the most direct interaction to date. Awareness exists; habit does not.

Personas reflect interaction style — usage patterns, habits, and mindset. They correlate with the score but don't directly map to it.

Full profiles: The 10 AI Persona Types

Built on assessment science

AISA is built to the same standards that govern clinical and occupational assessments worldwide. We published a transparent self-audit.

Validity

5/5

Scores measure what they claim to measure. Conversational evidence, not recall. Published rubric, not a black box.

Reliability

4.5/5

Consistent results across sessions. Three-layer scoring, confidence weighting, and calibration pass reduce noise.

Fairness

4.5/5

Role-adaptive questioning, multilingual support, no demographic proxies in scoring.

Transparency

5/5

Published rubric, evidence-linked scores, auditable methodology. Every score comes with the quote that produced it.

Standards referenced: AERA/APA/NCME (2014) · ISO 10667 · Schmidt & Hunter (1998) · Messick (1989) · Sackett et al. (2022)

Full self-audit with ratings and evidence: Inside AISA's Assessment Framework

See what the report looks like

Every assessment produces a detailed report with per-criterion scores, evidence quotes, a persona profile, and personalised development guidance.

The Science Behind AISA

Metropolitan Police · Harvard University · Crowdbotics · European School of Economics

In 2026, Anthropic published the AI Fluency Index, the largest empirical study of AI fluency to date, analysing 9,830 conversations. AISA covers 93% of the behaviours Anthropic identified as markers of AI fluency and goes further with 4 additional dimensions. Read our white paper: Anthropic's AI Fluency Study & AISA

AISA's framework was developed by a team with deep roots in tech, behavioural science, and AI product leadership. The rubric is informed by backgrounds spanning the Metropolitan Police, Harvard, Crowdbotics (Silicon Valley), and the European School of Economics.
