The 2026 AI Skills Report: What Assessment Data Reveals About Builder Proficiency

A framework for understanding AI proficiency patterns across roles, dimensions, and personas — what AISA's scoring system is designed to surface and why it matters for hiring and L&D.

By AISA Team

Most teams cannot answer a basic question: how good is our team at working with AI? Not "do they use AI tools" — that is easy to observe. But "do they use them well, critically, and in a way that produces reliable outcomes?" That question requires measurement. And measurement requires a framework that captures AI proficiency as what it actually is: a multi-dimensional skill set, not a single number.

This report describes what AISA's assessment framework is designed to measure, the patterns it is built to surface, and the scoring architecture that makes those patterns visible. As our assessment data grows, we will publish longitudinal updates with aggregate statistics. For now, this is the reference document for understanding what AI proficiency looks like when you actually measure it — and what the early signals tell us.

The Scoring Framework

All AISA data comes from conversational assessments — 25-minute adaptive conversations scored by an independent AI evaluator across 11 criteria in 5 dimensions. Assessments are completed by professionals in software engineering, product management, design, data science, and other roles, sourced through AISA's hiring and team benchmarking customers.

Scores use a 1–10 scale per criterion. Proficiency bands are: Novice (1–2), Developing (3–4), Competent (5–6), Proficient (7–8), Expert (9–10). The weighted composite produces a 0–100 overall score.
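To make the scoring arithmetic concrete, here is a minimal sketch of how a weighted composite could be computed from dimension-level scores. The five dimension names appear throughout this report, but only the 25% weight for Workflow & Application is published; the remaining weights, and the choice to aggregate at the dimension level rather than across all 11 criteria, are illustrative assumptions.

```python
# Minimal scoring sketch. Only the 25% Workflow & Application weight is
# published; the other weights are illustrative assumptions, as is
# aggregating at the dimension level rather than across all 11 criteria.

BAND_EDGES = [(2, "Novice"), (4, "Developing"), (6, "Competent"),
              (8, "Proficient"), (10, "Expert")]

WEIGHTS = {                                # hypothetical except the 0.25
    "Workflow & Application": 0.25,
    "Critical Thinking": 0.20,
    "Prompting & Communication": 0.20,
    "Technical Understanding": 0.20,
    "Safety & Responsibility": 0.15,
}

def band(score: float) -> str:
    """Map a 1-10 score to its proficiency band."""
    for upper, name in BAND_EDGES:
        if score <= upper:
            return name
    raise ValueError(f"score out of range: {score}")

def composite(scores: dict[str, float]) -> float:
    """Weighted average of 1-10 dimension scores, rescaled to 0-100."""
    assert set(scores) == set(WEIGHTS), "need one score per dimension"
    return round(sum(WEIGHTS[d] * s for d, s in scores.items()) * 10, 1)
```

Under these assumed weights, a candidate scoring 7 in Workflow, 3 in Critical Thinking, 9 in Prompting, 6 in Technical Understanding, and 5 in Safety lands at 61.0 overall while sitting in four different proficiency bands, which is exactly why the dimensional profile matters more than the composite.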

The framework is role-adaptive: a developer assessment probes different territory than a product manager assessment, but both are scored against the same rubric. This means scores are directly comparable across roles while the evidence behind them reflects role-specific context.
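One way to picture that design is a set of role-specific probe banks feeding a single shared rubric. The probe topics below are paraphrased from the role sections later in this report; the data structure itself is an illustrative assumption, not AISA's internal representation.

```python
# Illustrative structure for a role-adaptive assessment: probe areas vary
# by role (topics paraphrased from this report), while every role is
# scored against the same five rubric dimensions.

SHARED_RUBRIC = [
    "Prompting & Communication",
    "Critical Thinking",
    "Technical Understanding",
    "Workflow & Application",
    "Safety & Responsibility",
]

ROLE_PROBES = {
    "developer":       ["AI-assisted code review", "output verification"],
    "product_manager": ["vendor claim evaluation", "feature scoping",
                        "failure-mode anticipation"],
    "designer":        ["quality standards for AI-generated assets"],
    "data_scientist":  ["RAG pipeline design", "output evaluation metrics"],
}
```

Because every role resolves to the same SHARED_RUBRIC, scores stay comparable across roles even though the conversation that produced them differs.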

What the Framework Is Designed to Detect

The Critical Thinking Gap

The most important pattern the framework is built to surface is the gap between AI usage and AI evaluation. In the broader AI adoption landscape, most professionals have adopted AI tools to some degree. Far fewer have developed systematic approaches to evaluating what those tools produce.

The AISA rubric separates these skills deliberately. Prompting & Communication measures whether someone can effectively direct AI tools. Critical Thinking measures whether they can evaluate the results. These are independent skills — being good at one does not guarantee being good at the other. A person who writes excellent prompts but accepts every output at face value is a different profile from someone who writes basic prompts but rigorously verifies every result.

Our early assessment data confirms what we expected: these two dimensions do not track each other as closely as most people assume. The framework captures this divergence precisely because it was designed to.

The Recognition-Application Gap

The second pattern the framework targets is the gap between knowing about AI and applying it effectively. Many professionals can discuss AI concepts fluently — they know what RAG stands for, they understand the difference between fine-tuning and prompting, they can name multiple AI tools. But conceptual knowledge does not predict practical proficiency.

AISA's conversational format forces candidates to demonstrate application, not just recognition. When someone claims expertise in prompt engineering, the conversation requires them to explain specific choices, evaluate real outputs, and reason about tradeoffs in real time. This surfaces the gap between recognition-level knowledge and application-level skill in a way that multiple-choice formats cannot.

The Workflow Integration Gap

The third key pattern is the gap between ad hoc AI use and structured AI workflows. Most professionals use AI tools reactively — they open a chat interface when they are stuck and close it when they get an answer. Far fewer have built AI into their daily work as a systematic practice with defined handoff points, quality gates, and repeatable patterns.

Workflow & Application is the highest-weighted dimension in the AISA rubric (25%) precisely because this gap has the largest impact on actual productivity. A developer who has built AI into their code review, documentation, and testing workflows gets fundamentally more value from AI than one who uses it for occasional question-answering, even if both score similarly on technical knowledge.

Role-Specific Patterns

The framework is designed to capture how AI proficiency varies systematically across professional roles. While every individual is unique, roles create structural patterns in which dimensions tend to be strongest and weakest.

Software Developers

Developers typically show their strongest scores on Technical Understanding — they work with AI tools daily and develop intuitions about how models behave. Their typical weakness is in Critical Thinking, specifically in output evaluation. Developers who spend their days writing code with AI assistance often develop a bias toward accepting AI-generated code that "looks right" without systematic verification.

The gap between a developer's Technical Understanding and their Critical Thinking is one of the most actionable findings for engineering managers. It suggests that the team does not need more AI tool training; it needs output evaluation discipline. See our developer assessment page for more on what we probe.

Product Managers

Product managers tend to show strength in Safety & Responsibility — their professional orientation toward user impact and risk naturally translates to AI context. Their typical weakness is Technical Understanding, which is expected. The concern is not that PMs should become ML engineers, but that insufficient technical mental models lead to poor product decisions about where and how to deploy AI features.

The PM assessment specifically probes whether product managers can evaluate AI vendor claims, scope AI features realistically, and anticipate failure modes — all of which require a baseline of technical understanding that many PMs have not yet developed.

Designers

Designers are often the newest entrants to serious AI tool usage, and their profiles tend to reflect this. Design-specific AI tools (generative image, copy, and layout tools) have matured rapidly, but the practices around evaluating and integrating their outputs into professional design workflows are still developing.

The designer assessment focuses on whether designers are applying the same quality standards to AI-generated outputs that they apply to human-created work — brand consistency, accessibility, UX principles — or whether AI outputs are being treated as a special category that bypasses normal review.

Data Scientists

Data scientists typically show the highest Technical Understanding of any role — they have the deepest mental models of how AI systems work. Their pattern often inverts the developer profile: strong conceptual knowledge but weaker Workflow & Application scores, suggesting that their AI usage is exploratory rather than systematically integrated into production processes.

For data scientist assessments, we probe whether conceptual knowledge translates into practical workflow design — can they build reliable RAG pipelines, design evaluation metrics for AI outputs, and make production-ready architecture decisions?

The 10 Personas

Beyond dimensional scores, AISA maps each candidate to one of 10 AI Personas — psychographic profiles that describe their relationship with AI tools. Understanding personas is often more actionable than understanding raw scores, because personas suggest the type of intervention needed, not just the amount.

  • Bystander: Not yet engaged with AI tools. Low scores across all dimensions.
  • Dabbler: Has tried AI casually. No structured practice or consistent usage.
  • Copy-Paster: Regular AI user who accepts outputs uncritically. High Prompting, low Critical Thinking.
  • Sceptic: Understands AI conceptually, distrusts it practically. High Technical Understanding, low Workflow.
  • Enthusiast: Excited about AI, uses it eagerly. High engagement, low evaluation rigor.
  • Tactician: Balanced, intentional AI user. Moderate-to-high scores across dimensions.
  • Conductor: Designs AI workflows for teams. High Workflow, strong Critical Thinking.
  • Builder: Creates custom AI tools and pipelines. High Technical Understanding and Workflow.
  • Architect: Principle-level systems thinker. Designs AI strategy, not just AI usage.
  • Oracle: Expert across all dimensions. AI has reshaped their professional thinking.
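AISA does not publish the mapping rules behind these personas, but the contrasting profiles in the list above suggest how a rule-based sketch might look. The thresholds below are hypothetical and cover only a subset of personas; they exist to show how two candidates with similar composite scores can land in very different personas.

```python
# Hypothetical persona mapping. AISA's actual rules are not published;
# these thresholds only illustrate how the contrasting profiles described
# above could be separated. Scores are 1-10 per dimension.

def persona(s: dict[str, float]) -> str:
    prompting = s["Prompting & Communication"]
    critical  = s["Critical Thinking"]
    technical = s["Technical Understanding"]
    workflow  = s["Workflow & Application"]

    if max(s.values()) <= 2:
        return "Bystander"
    if min(s.values()) >= 9:
        return "Oracle"
    if prompting >= 7 and critical <= 4:
        return "Copy-Paster"    # directs AI well, rarely verifies outputs
    if technical >= 7 and workflow <= 4:
        return "Sceptic"        # understands AI, avoids relying on it
    if technical >= 7 and workflow >= 7:
        return "Builder"
    if workflow >= 7 and critical >= 7:
        return "Conductor"
    return "Tactician"          # balanced middle band (remaining personas
                                # omitted from this sketch for brevity)
```

Note how a Copy-Paster and a Sceptic can share a composite score while triggering opposite rules; that contrast is the point of the persona model.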

The personas that matter most for organizational risk are the Copy-Paster and the Enthusiast — people who use AI regularly but without the critical evaluation skills to use it reliably. The largest single category of AI risk is not the non-users but the uncritical users.

The Sceptic represents a different challenge: someone who has the knowledge to use AI well but is not doing so. Sceptics often cite legitimate reliability concerns. The path forward for Sceptics is not persuasion but evidence — showing them specific workflows where AI reliability is adequate for the task, with appropriate verification steps.

Experience and Proficiency

A common assumption is that more experienced professionals are better at AI. The relationship is more nuanced than that.

Technical Understanding does tend to increase with experience — senior professionals who have worked in technology longer have more context for understanding how AI systems work and where they fit into the broader technology landscape.

But Workflow & Application and Prompting & Communication do not follow the same pattern. These skills reward active, daily AI tool usage, and the professionals who spend the most hands-on time with AI tools are often mid-career practitioners, not the most senior people in the room. Senior leaders may understand AI conceptually but delegate the hands-on work, which means their practical proficiency does not always match their seniority.

This has direct implications for upskilling: senior engineers often need practice-based interventions (AI sprints, workflow design exercises), not more conceptual training. Junior engineers need foundational training on both concepts and techniques. Mid-career professionals who are already heavy AI users may be the best candidates for peer teaching roles.

Assessment Integrity

AISA's anti-gaming systems monitor for three categories of integrity concerns:

  • Style shift detection — responses that show abrupt changes in vocabulary, structure, or formality consistent with external text insertion
  • Timing anomalies — responses whose complexity is inconsistent with the time taken to produce them
  • Consistency failures — contradictions between claimed expertise and demonstrated behavior

The system is designed so that genuine proficiency is the most effective strategy. Attempts to game the assessment — by pasting external text or fabricating expertise — introduce behavioral inconsistencies that the evaluator detects. The assessment rewards authenticity because it is measuring how someone actually thinks and works, not what they can produce under artificial conditions.
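The actual detectors are not public, but the timing category lends itself to a simple illustration. Below is a minimal sketch assuming a fixed typing-rate threshold; a production system would calibrate per candidate and per question type.

```python
# Hypothetical timing-anomaly check, illustrating the second category in
# the list above. The 8 chars/sec threshold (roughly 96 words per minute)
# is an assumed value; AISA's real detectors are not published.

def timing_anomaly(response_chars: int, elapsed_seconds: float,
                   max_chars_per_sec: float = 8.0) -> bool:
    """Flag a response whose length is implausible for the time taken."""
    if elapsed_seconds <= 0:
        return True  # a long response in zero time was almost certainly pasted
    return response_chars / elapsed_seconds > max_chars_per_sec
```

A 1,200-character answer produced in 40 seconds (30 chars/sec) would be flagged for review under this threshold, while the same answer over five minutes would not.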

For a detailed explanation of the integrity architecture, see Beyond Multiple Choice.

Implications for Hiring

For hiring managers, the key insight is that AI proficiency is poorly predicted by traditional proxies. Years of experience, number of AI tools listed on a resume, and self-reported skill levels are all weak signals. The strongest predictor of practical AI proficiency is the candidate's ability to articulate why their approaches work — a behavior that can only be observed in conversation, not extracted from a resume.

Organizations serious about hiring AI-proficient talent need to assess for it directly. Resume keywords and self-reported skill ratings are noise. For a framework on integrating AI proficiency assessment into your hiring pipeline, see Hiring the Next Generation.

Implications for L&D

For L&D leaders, the framework supports three strategic priorities:

  1. Measure before training. Without dimensional assessment data, AI training budgets are spent on the wrong things. A team with strong Prompting but weak Critical Thinking does not need another prompt engineering course.

  2. Target training by persona, not just by score. A Copy-Paster and a Sceptic at the same composite score need completely different interventions. Persona-based training addresses the underlying behavioral pattern, not just the numerical gap.

  3. Measure after training. AI training without pre/post assessment is a budget expenditure. AI training with evidence-based measurement is a capability investment. See The AI Skills Gap for a detailed implementation guide, and the sketch after this list for what the measurement itself can look like.
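As a minimal sketch of that pre/post measurement, assuming each cohort member has a dictionary of dimension scores and that the pre and post lists are paired by index:

```python
# Illustrative pre/post comparison for a training cohort. The dimension
# names match this report; the data layout (one score dict per person,
# pre and post paired by index) is an assumption made for the sketch.

from statistics import mean

def dimension_deltas(pre: list[dict[str, float]],
                     post: list[dict[str, float]]) -> dict[str, float]:
    """Mean per-dimension score change across a cohort."""
    assert len(pre) == len(post), "each person needs a pre and a post score"
    return {
        dim: round(mean(after[dim] - before[dim]
                        for before, after in zip(pre, post)), 2)
        for dim in pre[0]
    }
```

A cohort that gains 1.8 points in Prompting but only 0.2 in Critical Thinking tells a very different story than an improved composite alone would.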

What Comes Next

This report will be updated regularly as our assessment dataset grows. Future editions will include aggregate score distributions, role-specific benchmarks, longitudinal improvement trends, and industry-specific breakdowns. Our goal is to build the most comprehensive dataset on AI proficiency across professional roles — measured through evidence, not self-report.

The organizations that approach AI upskilling with measurement discipline will be the ones that build teams capable of working with AI at the level the technology demands. The first step is knowing where you stand.

Ready to try the AI skills assessment yourself?