Methodology

Published. Validated. Auditable.

A conversational AI assessment built on behavioural science, validated by independent research, and designed for predictive validity — not multiple choice.

Why conversational assessment

Multiple-choice AI tests have three fundamental validity problems. AISA solves all of them architecturally.

Adaptive to role and depth

The conversation meets each candidate where they are: a developer and a PM get the same rubric but different questions.

Keeps pace with the field

AI changes weekly. Question banks go stale. A conversation about how you actually use AI is always current.

Every score has an evidence trail

A verbatim quote justifying every score. An MCQ gives you a letter choice with zero context.

Better candidate experience

A conversation feels like talking to a smart colleague, not sitting a test. Higher engagement produces richer signal.

Full analysis: How conversational evidence prevents cheating

Independently validated

AISA's rubric was developed independently and has been cross-referenced against two major external frameworks — Anthropic's empirical research and the U.S. Department of Labor's official AI Literacy Framework.

Anthropic AI Fluency Index

93%

of Anthropic's observable fluency markers covered by AISA

~10,000

conversations analysed in Anthropic's study

4

additional dimensions AISA measures that Anthropic couldn't

The four dimensions Anthropic couldn't measure through chat logs alone — AI Fundamentals, Tool Landscape, Domain Application, and Safety — require a structured conversational assessment to surface.

U.S. Department of Labor AI Literacy Framework

100%

of DOL sub-competencies covered by AISA

25/25

sub-competencies across 5 content areas

7/7

delivery principles embodied by AISA's methodology

TEN 07-25 defines what every American worker needs to know about AI. AISA decomposes the DOL's 5 content areas into 11 calibrated criteria, each scored 1–10 — the framework defines the standard, the assessment measures it.

The published rubric

5 dimensions, 11 criteria, published and auditable. Every score has a behavioural anchor — you can see exactly what a 5 looks like versus a 7 versus a 9.

Score scale (1–10)

1 · Novice · Unaware this is a skill; no intentional practice.

3 · Developing · Aware but inconsistent; reactive, not deliberate.

5 · Competent · Functional approach with repeatable techniques; not yet internalized.

7 · Proficient · Consistent and intentional; understands why, not just how.

10 · Expert · Principle-level mastery; pushes the craft forward, influences others.

5 dimensions · 11 criteria

Prompting & Comms

P1 — Prompt Design
P2 — Iterative Dialogue
P3 — Context & Memory Management

Critical Thinking

T1 — Output Evaluation
T2 — Limitation Awareness

Technical Understanding

U1 — AI Fundamentals
U2 — Tool Landscape

Workflow & Application

W1 — Workflow Integration
W2 — Task Decomposition
W3 — Domain Application

Safety & Responsibility

S1 — AI Safety & Responsibility
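The five dimensions and eleven criteria listed above can be written as a small lookup structure. This is a minimal sketch: the criterion codes and names come from the list, while the `RUBRIC` name and the `dimension_of` helper are illustrative assumptions, not part of AISA's published tooling.

```python
# Illustrative only: codes and names are taken from the published list;
# the RUBRIC name and dimension_of helper are hypothetical.
RUBRIC = {
    "Prompting & Comms": {
        "P1": "Prompt Design",
        "P2": "Iterative Dialogue",
        "P3": "Context & Memory Management",
    },
    "Critical Thinking": {
        "T1": "Output Evaluation",
        "T2": "Limitation Awareness",
    },
    "Technical Understanding": {
        "U1": "AI Fundamentals",
        "U2": "Tool Landscape",
    },
    "Workflow & Application": {
        "W1": "Workflow Integration",
        "W2": "Task Decomposition",
        "W3": "Domain Application",
    },
    "Safety & Responsibility": {
        "S1": "AI Safety & Responsibility",
    },
}


def dimension_of(criterion_code: str) -> str:
    """Return the dimension that owns a criterion code such as 'W2'."""
    for dimension, criteria in RUBRIC.items():
        if criterion_code in criteria:
            return dimension
    raise KeyError(f"unknown criterion: {criterion_code}")
```

Keeping the criteria nested under their dimension makes the "5 dimensions, 11 criteria" claim easy to check mechanically.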

Full rubric with behavioural anchors: The AISA Rubric — 5 Dimensions of AI Proficiency

How scoring works

Three layers of scoring, each adding reliability. No single model has the final say.

Evidence extraction

Track B scores every candidate message against the rubric, extracting verbatim quotes with confidence levels. Evidence is classified: demonstrated > described > managed.

Confidence-weighted aggregation

Multiple evidence pieces per criterion are blended by confidence (high/medium/low). Peak performance is weighted alongside sustained performance — a single brilliant answer counts, but consistency matters more.
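One way to read this step is as a confidence-weighted mean blended with the best single score. The sketch below is illustrative only: the weight values (1.0, 0.6, 0.3), the 30/70 peak-to-sustained split, and all names are assumptions chosen for the example, not AISA's published parameters.

```python
from dataclasses import dataclass

# Hypothetical weights: the source names three confidence levels but not their values.
CONFIDENCE_WEIGHT = {"high": 1.0, "medium": 0.6, "low": 0.3}
# Assumed split: the source says only that consistency matters more than a single peak.
PEAK_SHARE = 0.3


@dataclass
class Evidence:
    score: float      # 1-10 score for one piece of evidence
    confidence: str   # "high", "medium", or "low"


def aggregate_criterion(evidence: list[Evidence]) -> float:
    """Blend sustained (confidence-weighted mean) and peak (best single) performance."""
    if not evidence:
        raise ValueError("at least one piece of evidence is required")
    total_weight = sum(CONFIDENCE_WEIGHT[e.confidence] for e in evidence)
    sustained = sum(e.score * CONFIDENCE_WEIGHT[e.confidence] for e in evidence) / total_weight
    peak = max(e.score for e in evidence)
    return PEAK_SHARE * peak + (1 - PEAK_SHARE) * sustained
```

With `PEAK_SHARE` below 0.5, one strong answer lifts the result but cannot outweigh a run of weaker ones, which matches the stated intent.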

Holistic calibration pass

A more capable model (Claude Opus) reviews the full transcript and adjusts scores that the per-turn evaluator got wrong. It must provide disconfirming evidence for every adjustment.
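The requirement that every adjustment carries disconfirming evidence can be enforced as a simple guard. This is a sketch under assumptions: the `Adjustment` shape and the `apply_adjustments` function are hypothetical, and the real calibration pass may validate evidence in other ways.

```python
from dataclasses import dataclass


@dataclass
class Adjustment:
    criterion: str            # e.g. "T1"
    new_score: int            # proposed 1-10 score
    disconfirming_quote: str  # verbatim quote that contradicts the per-turn score


def apply_adjustments(scores: dict[str, int], adjustments: list[Adjustment]) -> dict[str, int]:
    """Return calibrated scores, ignoring any adjustment that lacks evidence."""
    calibrated = dict(scores)
    for adj in adjustments:
        if not adj.disconfirming_quote.strip():
            continue  # no evidence supplied, so the per-turn score stands
        if adj.criterion not in calibrated or not 1 <= adj.new_score <= 10:
            continue  # unknown criterion or out-of-range score
        calibrated[adj.criterion] = adj.new_score
    return calibrated
```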

See expert scoring in practice: What a Score 9 looks like · Full validity audit

The dual-track architecture

Separating conversation from evaluation eliminates the bias that occurs when a single system asks questions and judges answers.

Track A

Conversationalist

The only AI the candidate sees. Warm, adaptive, peer-level. Gets steering notes from the evaluator but never sees scores. Natural dialogue, not a checklist.

Track B

Evaluator

Runs silently on every message. Evidence, scores, steering notes — structured data only. Behavioural anchors (1–10) keep scoring consistent and explainable.

Message → Integrity flags → Track B → Track A → Persist

Track B evaluates before Track A replies — the next response already reflects the latest steering.
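The ordering described here can be sketched as a single per-message handler. Every function name is hypothetical and the stubs stand in for model calls; the only thing taken from the source is the order of the steps and the rule that Track A receives steering notes but never scores.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    log: list[dict] = field(default_factory=list)  # persisted turns


def integrity_flags(message: str) -> list[str]:
    """Stub: a real system would inspect typing and style signals."""
    return []


def track_b_evaluate(message: str, flags: list[str]) -> dict:
    """Stub evaluator: returns structured data only (evidence, scores, steering)."""
    return {"evidence": [], "scores": {}, "steering": "probe for a concrete example"}


def track_a_reply(message: str, steering: str) -> str:
    """Stub conversationalist: receives the steering note but never the scores."""
    return f"(reply guided by: {steering})"


def handle_message(session: Session, message: str) -> str:
    flags = integrity_flags(message)                        # 1. integrity flags
    evaluation = track_b_evaluate(message, flags)           # 2. Track B runs first
    reply = track_a_reply(message, evaluation["steering"])  # 3. Track A sees steering only
    session.log.append(                                     # 4. persist the full turn
        {"message": message, "flags": flags, "evaluation": evaluation, "reply": reply}
    )
    return reply
```

Passing only `evaluation["steering"]` into `track_a_reply` keeps the separation visible in the function signature.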

Technical deep-dive: Inside AISA's assessment architecture

What the scores reveal

Predictive validity means scores produce real, differentiating insights — not just a number. Here's what 1,000+ assessments have surfaced.

1,017

completed assessments

average fluency score

0–98

real score range

1.7%

reach Expert tier

Workflow Teardown

How High Scorers Navigate Multi-Step Workflows

What a 7+ looks like in practice vs a 4-5.

Workflow Teardown

Recovering from Bad AI Output

How proficient candidates course-correct vs. mediocre ones.

Role Insights

What 400 Assessments Reveal About Developers

Strong on prompting, weaker than expected on technical understanding.

Dimension Deep-Dive

Safety: The 10% That Reveals the Most

Why the lowest-weighted dimension is the most diagnostic.

Dimension Deep-Dive

Workflow: Separating Talkers from Operators

The highest-weighted dimension and what it measures.

Workflow Teardown

Why Task Decomposition Separates Experts from Novices

W2 scoring applied to real candidate sessions.

Why scores can't be faked

If scores can be gamed, they have no predictive validity. AISA solves this at the architecture level, not with proctoring.

Burst detection

Characters arriving less than 50 ms apart signal a paste, not typing. Human keystrokes are 50–300 ms apart.
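The 50 ms rule can be illustrated with a few lines of code. This is a minimal sketch that assumes per-character timestamps in milliseconds are available; the `burst_fraction` name and the idea of reporting a fraction are assumptions, not a description of AISA's detector.

```python
def burst_fraction(timestamps_ms: list[float], threshold_ms: float = 50.0) -> float:
    """Fraction of inter-character gaps shorter than threshold_ms.

    The 50 ms default comes from the text; everything else is illustrative.
    """
    if len(timestamps_ms) < 2:
        return 0.0
    gaps = [later - earlier for earlier, later in zip(timestamps_ms, timestamps_ms[1:])]
    return sum(1 for gap in gaps if gap < threshold_ms) / len(gaps)
```

A value near 0 is consistent with ordinary typing, while a value near 1 suggests the characters arrived in a single paste.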

Style analysis

Baseline vocabulary and formality shift mid-session. Sudden corporate prose after casual answers = flagged.

AI fingerprinting

Five-metric system detecting AI-generated text: correction rate, edit density, message length, formality, uniformity.

Consistency verification

The same topic probed from multiple angles across the session. Rehearsed frameworks crumble under varied questioning.

Typing metrics carry 70% of the weight; style and AI signals carry 30%. Flags appear in the report with full transparency: integrity is an architectural property, not a policing function.
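The 70/30 weighting amounts to a weighted sum. The sketch below assumes each signal group has already been reduced to a value between 0 and 1; that normalisation, the function name, and the way style and AI signals are merged into one value are assumptions, since the source states only the weights.

```python
TYPING_WEIGHT = 0.7    # from the text: typing metrics carry 70%
STYLE_AI_WEIGHT = 0.3  # from the text: style and AI signals carry 30%


def integrity_risk(typing_signal: float, style_ai_signal: float) -> float:
    """Combine two signals in [0, 1] into one risk value in [0, 1]."""
    for value in (typing_signal, style_ai_signal):
        if not 0.0 <= value <= 1.0:
            raise ValueError("signals must be between 0 and 1")
    return TYPING_WEIGHT * typing_signal + STYLE_AI_WEIGHT * style_ai_signal
```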

The 10 AI personas

Beyond the score: a profile of how someone interacts with AI, based on the shape of their dimension scores — not just the composite number. Two people can score identically and receive different personas.

The Oracle · Understands AI at its core, not just how to use it.

Deep technical mastery of AI itself. Understands or builds AI models, works with ML and LLMs at a technical level. Elite critical analysis comes from understanding the technology at its foundation, not just from using it.

The Architect · Builds highly complex integrated systems using AI.

Designs and builds sophisticated multi-system AI integrations at scale. Goes beyond creating individual tools to engineering production-grade architectures where AI components interact with each other and non-AI systems.

The Builder · Has actually built something with AI.

Personally created complex, useful tools, workflows, or products using AI — whether for their own use, their company, or commercially. Developed deep practical understanding through hands-on building that goes beyond secondhand knowledge.

The Conductor · Orchestrates AI across the workflow, not just within it.

Uses AI heavily across complex workflows, automations, and multi-tool pipelines. Understands AI limitations well and knows which tool integrates with which. Orchestrates and configures sophisticated setups, but typically works with what's available rather than building novel tools from scratch.

The Tactician · Gets things done with AI, fast and reliably.

Productive with mainstream AI tools and uses them well within established workflows. Communicates clearly with AI and consistently gets quality output, but typically hasn't pushed into the cutting edge of AI tooling or complex integrations.

The Enthusiast · Curious, capable, and picking up speed.

Actively building AI skill across multiple dimensions. Tries new tools, refines prompts, and is beginning to develop repeatable patterns — the trajectory is strong.

The Sceptic · Questions everything: the output, the tool, the hype.

Approaches AI with critical caution. May under-use AI in practice, but the verification instinct and risk awareness form a strong foundation that many frequent users lack.

The Copy-Paster · Uses AI regularly; takes the output at face value.

Relies on AI for day-to-day output but with limited iteration or verification. Gets value, but leaves quality and safety gains on the table by accepting first-pass results.

The Dabbler · Tries things out; hasn't locked in a rhythm yet.

Experiments with AI intermittently: a prompt here, a quick question there. Nothing sustained, but a willingness to explore that many skip entirely.

The Bystander · AI is on the radar, but not in the routine.

Has heard of AI tools but hasn't meaningfully engaged — the assessment itself may be the most direct interaction to date. Awareness exists; habit does not.

Personas reflect interaction style — usage patterns, habits, and mindset. They correlate with the score but don't directly map to it.

Full profiles: The 10 AI Persona Types

Built on assessment science

AISA is built against the same standards that govern clinical and occupational assessments worldwide. We published a transparent self-audit.

Validity

5/5

Scores measure what they claim to measure. Conversational evidence, not recall. Published rubric, not a black box.

Reliability

4.5/5

Consistent results across sessions. Three-layer scoring, confidence weighting, and calibration pass reduce noise.

Fairness

4.5/5

Role-adaptive questioning, multilingual support, no demographic proxies in scoring.

Transparency

5/5

Published rubric, evidence-linked scores, auditable methodology. Every score comes with the quote that produced it.

Standards referenced: AERA/APA/NCME (2014) · ISO 10667 · Schmidt & Hunter (1998) · Messick (1989) · Sackett et al. (2022)

Full self-audit with ratings and evidence: Inside AISA's Assessment Framework

See what the report looks like

Every assessment produces a detailed report with per-criterion scores, evidence quotes, a persona profile, and personalised development guidance.

Read the full rubric · Hiring guide · L&D implementation guide

The Science Behind AISA

In 2026, Anthropic published the AI Fluency Index, the largest empirical study of AI fluency to date, analysing nearly 10,000 conversations. AISA covers 93% of the behaviours Anthropic identified as markers of AI fluency and measures 4 additional dimensions. The U.S. Department of Labor's AI Literacy Framework (TEN 07-25) defines what every worker needs to know about AI; AISA covers 100% of its 25 sub-competencies.

Read our analysis: Anthropic's AI Fluency Study & AISA · DOL AI Literacy Framework & AISA
