What We Built and Why — Inside AISA's Assessment Architecture

A look inside AISA's dual-track conversational assessment: how the facilitator and evaluator work together, why conversations beat tests, and what the evidence trail actually shows.

By AISA Team · 5 min read
architecture · methodology · assessment · scoring

Why We Built a Conversation, Not a Test

When we set out to assess AI proficiency, the first question was format. Multiple-choice tests exist for everything — why not AI skills?

The answer became obvious within a week of prototyping. We drafted 50 multiple-choice questions (MCQs) about AI concepts: prompt engineering techniques, model architecture basics, tool selection principles. Then we tested them on a small group that included both genuine AI practitioners and people who had spent 30 minutes reading blog posts about AI. The blog-post readers scored nearly as well as the practitioners. Recognition and proficiency are different cognitive skills, and MCQs can only test the first one.

So we built a conversation instead.

The Dual-Track Architecture

AISA uses two separate AI systems for each assessment, and they never talk to each other during the session.

Track A is the facilitator. It conducts a 25-minute adaptive conversation with the candidate. It asks questions, follows up on interesting answers, probes areas of weakness, and adjusts difficulty based on demonstrated skill level. Track A's job is to elicit the richest possible evidence of how someone actually thinks about and works with AI. It does not score anything.

Track B is the evaluator. It receives the full conversation transcript and independently scores every response against 11 criteria across five dimensions. Track B never interacts with the candidate. It scores based on observed evidence: what the candidate said, what reasoning they demonstrated, and what behaviors they exhibited.

This separation matters because it eliminates the tension between making the candidate comfortable and maintaining scoring rigor. Track A can be encouraging and conversational. Track B can be ruthlessly analytical. Neither compromises the other.
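To make the separation concrete, here is a minimal sketch of the two-track flow in Python. Everything in it (function names, the stub question logic, the placeholder scoring) is our illustration, not AISA's actual implementation, which is not public.

```python
# Illustrative only: these names and stubs are ours, not AISA's actual API.

CRITERIA = [f"criterion_{i}" for i in range(1, 12)]  # the 11 scored criteria

def facilitator_turn(transcript):
    """Track A: pick the next question from everything said so far.
    In a real system this would be an adaptive LLM call; here it is a stub."""
    return f"Question {len(transcript) + 1}: walk me through your last answer."

def run_conversation(answer_fn, turns=8):
    """Track A conducts the session and emits a transcript. It never scores."""
    transcript = []
    for _ in range(turns):
        question = facilitator_turn(transcript)
        transcript.append({"q": question, "a": answer_fn(question)})
    return transcript

def evaluate(transcript):
    """Track B: score the finished transcript against all 11 criteria.
    It runs only after the session ends and never sees the candidate live."""
    return {c: 5 for c in CRITERIA}  # stub for per-criterion evidence scoring

transcript = run_conversation(lambda q: "a candidate answer", turns=3)
scores = evaluate(transcript)  # the transcript is the only data Track B receives
```

The key property: `evaluate` takes nothing but the transcript, so nothing the facilitator does to keep the candidate comfortable can leak into the scoring.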

The Five Dimensions

Every assessment produces scores across five dimensions, each weighted to reflect its impact on real-world AI-assisted work:

  • Prompting & Communication (23%) — Can they communicate effectively with AI systems? Not template memorization, but intentional, adaptive communication.
  • Critical Thinking (22%) — Can they evaluate AI outputs with rigor? Do they know when to trust and when to verify?
  • Technical Understanding (20%) — Do they have a working mental model of how AI systems behave? Not academic depth, but useful intuition.
  • Workflow & Application (25%) — Have they integrated AI into structured, repeatable workflows? This is the highest-weighted dimension because it is where value is created.
  • Safety & Responsibility (10%) — Do they consider failure modes, data privacy, and ethical implications? A threshold dimension — low scores here are disqualifying regardless of other strengths.

Each dimension contains two or three criteria, for a total of 11, each scored independently on a 1–10 scale. Scores map to five proficiency bands: Novice (1–2), Developing (3–4), Competent (5–6), Proficient (7–8), Expert (9–10).
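As a concrete illustration, here is how the weights and bands combine into a composite. The dimension weights and band boundaries come from the rubric above; the averaging of criteria into dimensions, the disqualifying safety floor, and the exact rescaling from the 1–10 scale onto the 0–100 composite are our assumptions.

```python
# Weights and bands are from the article; the safety floor value and the
# rescaling from 1-10 onto the 0-100 composite are assumptions for this sketch.

WEIGHTS = {"prompting": 0.23, "critical_thinking": 0.22,
           "technical": 0.20, "workflow": 0.25, "safety": 0.10}

BANDS = [(2, "Novice"), (4, "Developing"), (6, "Competent"),
         (8, "Proficient"), (10, "Expert")]

def band(score):
    """Map a 1-10 score to its proficiency band."""
    return next(label for ceiling, label in BANDS if score <= ceiling)

def composite(dim_scores):
    """dim_scores: mean criterion score (1-10) per dimension.
    Returns a 0-100 composite; safety acts as a threshold, not just a weight."""
    if dim_scores["safety"] <= 2:        # hypothetical disqualifying floor
        return 0.0
    total = sum(WEIGHTS[d] * s for d, s in dim_scores.items())
    return round(total * 10, 1)          # assumed rescaling onto 0-100

scores = {"prompting": 7, "critical_thinking": 6,
          "technical": 6, "workflow": 8, "safety": 7}
print(composite(scores), band(7))        # -> 68.3 Proficient
```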

For a complete breakdown of what each criterion measures at each level, see The AISA Rubric.

What Makes the Conversation Adaptive

Track A does not follow a fixed script. It adapts based on what the candidate demonstrates.

If a developer shows strong prompt engineering early, Track A shifts to harder territory: context window management, multi-step workflows, edge case handling. If a product manager mentions safety considerations unprompted, Track A probes the depth of that awareness instead of asking introductory safety questions.

This adaptiveness means a single 25-minute assessment can differentiate across the full skill spectrum. A candidate who is clearly Expert-level spends most of the conversation on Expert-level topics, generating evidence at that band. A candidate in the Developing band gets questions appropriate to their level, which is both more informative and less discouraging than being grilled on topics they cannot yet address.
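One simple way to picture the adaptation, purely as our illustration (AISA has not published its actual policy): track a target difficulty band and move it up or down in response to how well recent answers landed.

```python
# A toy difficulty policy -- illustrative, not AISA's published logic.

def next_difficulty(current_band, answer_quality):
    """current_band: 1 (Novice) .. 5 (Expert); answer_quality: 0.0-1.0."""
    if answer_quality > 0.8:
        return min(current_band + 1, 5)  # strong answer: move to harder territory
    if answer_quality < 0.4:
        return max(current_band - 1, 1)  # struggling: step back a level
    return current_band                  # ambiguous: hold and gather evidence

difficulty = 3                           # start mid-spectrum
for quality in (0.9, 0.85, 0.5, 0.3):    # simulated answer quality per turn
    difficulty = next_difficulty(difficulty, quality)
print(difficulty)                        # -> 4: settled one band above the start
```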

Anti-Gaming by Design

The conversational format is inherently harder to game than any test format. You cannot memorize answers to questions that have not been written yet. You cannot paste ChatGPT responses without introducing detectable style shifts. You cannot claim expertise and then fail to demonstrate it when the conversation probes deeper.

AISA also runs specific integrity checks: style shift detection (flagging abrupt changes in vocabulary or formality), timing analysis (flagging responses whose complexity is inconsistent with composition time), and consistency verification (revisiting topics from different angles to check for contradictions).

These are included in the assessment report as transparency signals, not automatic disqualifiers. Hiring managers see the evidence and make their own judgment.
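To make one of these checks concrete, here is a toy version of the timing analysis: flag responses whose word count is implausible for the composition time. The words-per-second threshold is invented for the example; AISA's real detectors and thresholds are not public.

```python
# Toy timing check. The threshold is invented; real detectors are not public.

TYPING_WPS = 1.5   # generous human composition speed, in words per second

def timing_flags(responses):
    """responses: list of (answer_text, seconds_taken). Returns flagged indices."""
    flags = []
    for i, (text, seconds) in enumerate(responses):
        words = len(text.split())
        if seconds > 0 and words / seconds > TYPING_WPS:
            flags.append(i)   # more words than the elapsed time plausibly allows
    return flags

print(timing_flags([("short reply", 20),
                    ("a long, dense, multi paragraph answer " * 8, 15)]))  # -> [1]
```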

What Comes Out

Every AISA assessment produces:

  • A composite score (0–100) weighted across all five dimensions
  • Dimensional scores showing specific strengths and gaps
  • A persona classification — one of 10 psychographic profiles describing the candidate's relationship with AI tools
  • Evidence quotes — specific statements from the conversation that support each score
  • Integrity indicators — any detected anomalies in style, timing, or consistency

The evidence trail is what makes AISA reports actionable. A hiring manager does not just see "Critical Thinking: 4/10" — they see the specific moment in the conversation where the candidate accepted a flawed AI output without questioning it. An L&D leader does not just see "Workflow: 3/10" — they see that the candidate uses AI ad hoc with no structured process.
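For readers who think in data structures, the report described above might look something like this. Field names and the example persona are ours, not AISA's published schema; the example values echo the scenarios in this section.

```python
# Hypothetical report shape -- illustrative field names, not AISA's schema.

from dataclasses import dataclass, field

@dataclass
class AssessmentReport:
    composite: float                      # 0-100, weighted across dimensions
    dimensions: dict[str, float]          # per-dimension strengths and gaps
    persona: str                          # one of the 10 psychographic profiles
    evidence: dict[str, list[str]]        # criterion -> supporting quotes
    integrity: list[str] = field(default_factory=list)  # anomalies, if any

report = AssessmentReport(
    composite=68.3,
    dimensions={"critical_thinking": 4.0},
    persona="Pragmatic Adopter",          # invented example persona name
    evidence={"critical_thinking":
              ["accepted a flawed AI output without questioning it"]},
)
```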

For designers, data scientists, and any other role where AI touches daily work, the evidence maps directly to on-the-job behavior.

Why We Are Publishing This

Transparency is a deliberate choice. We publish the rubric, explain the anti-gaming architecture, and describe the scoring methodology because the system is designed so that understanding how it works does not help you game it. The only reliable way to score well is to be genuinely proficient. That is the point.

Ready to try the AI skills assessment yourself?