What 723 Assessments Reveal About How Developers Actually Use AI
Assessment data from 723 candidates shows where developers excel with AI tools and where consistent blind spots emerge.
The Pattern Nobody Expected
When we designed AISA's developer assessment track, we assumed the strongest performers would be the ones who could write the most sophisticated prompts. After 723 completed assessments — 523 in the last 30 days alone — the data tells a different story. The developers who score highest aren't the best prompters. They're the best editors.
The gap between developers who treat AI output as a first draft and those who treat it as a final answer is the single clearest signal in our developer assessment data. And it maps to real-world consequences that engineering managers should care about.
Where Developers Consistently Score Well
Technical Understanding Is the Baseline, Not the Differentiator
Developers tend to score above the population mean on Technical Understanding — the dimension that covers how models work, their limitations, and when to apply them. This makes intuitive sense. If you've spent time with Claude Code, Cursor, or Codex, you've built a working mental model of what these tools can and can't do. You've hit the context window wall. You've watched a model confidently generate code that doesn't compile.
But Technical Understanding accounts for only 20% of the AISA rubric. Scoring well on it is necessary but nowhere near sufficient. We observe developers who score 7-8 on Technical Understanding but land at 4-5 overall because the other dimensions pull them down hard.
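To make that arithmetic concrete, here is a minimal sketch. It assumes the overall score is a plain weighted average of dimension scores on a shared scale, which the article does not confirm; the function name `implied_rest_average` is hypothetical. The only figure taken from the rubric is the 20% weight on Technical Understanding.

```python
def implied_rest_average(tech_score: float, overall: float,
                         tech_weight: float = 0.20) -> float:
    """Weighted average the remaining dimensions must have.

    Assumption (not confirmed by the source): overall is linear in the
    dimension scores, i.e. overall = w * tech + (1 - w) * rest.
    """
    return (overall - tech_weight * tech_score) / (1.0 - tech_weight)


# Score pairs taken from the ranges quoted above (7-8 technical, 4-5 overall).
for tech, overall in [(7, 4), (7, 5), (8, 4), (8, 5)]:
    rest = implied_rest_average(tech, overall)
    print(f"technical={tech}, overall={overall} -> other dimensions average {rest:.2f}")
```

Under that assumption, a developer with an 8 on Technical Understanding and a 5 overall is averaging about 4.25 across everything else, so a strong technical score cannot carry the total.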
Prompting Mechanics Are Strong, Strategy Is Weaker
Most developers can construct a clear, well-scoped prompt. They know to specify language, framework, constraints. The mechanical skill is there. Where we see scores drop within the Prompting & Communication dimension is on the strategic side — knowing when to break a problem into multiple prompts versus handling it in one pass, when to provide examples versus instructions, and how to recover when the conversation goes sideways.
This tracks with how the tooling landscape has evolved. Claude Code, Cursor, and Antigravity 2.0 have converged on similar agentic patterns — but each rewards slightly different interaction styles. Developers who've only used one tool tend to develop one prompting strategy. That works until it doesn't.
The Three Blind Spots
1. Critical Evaluation of AI Output
This is the big one. Critical Thinking is the dimension where we see the widest variance among developers, and where the most consistent blind spot lives.
The pattern: a developer receives AI-generated code that looks syntactically correct and structurally reasonable. They accept it. During the assessment conversation, when the AI facilitator probes on edge cases, error handling, or architectural implications, the developer's reasoning thins out. They trusted the output because it looked right, not because they verified it was right.
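A small, hypothetical illustration of that pattern, not drawn from any actual assessment: the first helper below is the kind of output that reads cleanly and passes the obvious case, while the second shows what a single edge-case question ("what if the batch size is not positive?") would surface. Both function names are invented for this sketch.

```python
def split_into_batches(items, batch_size):
    # Hypothetical AI-generated helper: reads cleanly and passes the obvious case.
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def split_into_batches_reviewed(items, batch_size):
    # Same logic after review: reject sizes that would crash or silently drop data.
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


# The happy path looks fine.
assert split_into_batches([1, 2, 3, 4, 5], 2) == [[1, 2], [3, 4], [5]]

# Edge case: a negative size makes range() empty, so every item is silently dropped.
assert split_into_batches([1, 2, 3], -1) == []

# The reviewed version fails loudly instead of losing data.
try:
    split_into_batches_reviewed([1, 2, 3], -1)
except ValueError as err:
    print(f"rejected: {err}")
```

Nothing about the first version looks wrong on a skim, which is the point: the defect only appears when someone asks what happens outside the expected input.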
With models like GLM-5.2 scoring 62.1% on SWE-bench Pro and Fable 5 hitting 80.3% before the export control pullback, AI-generated code is getting better fast. That makes this blind spot more dangerous, not less. The better the output looks, the harder it is to catch the 20% that's wrong.
Engineering managers: if your team is using AI coding tools heavily, this is the skill gap to probe. Not "can they use Copilot" but "can they catch what Copilot gets wrong."
2. Safety and Responsibility Gets Deprioritized
The Safety & Responsibility dimension is weighted at only 10% of the total score, but it's where developers most consistently underperform relative to other roles. We observe a pattern where developers treat safety considerations — data handling, bias awareness, appropriate use boundaries — as someone else's problem. A product manager's problem. A compliance team's problem.
With the Colorado AI Act enforcement starting June 30 and the EU AI Act bulk application hitting August 2, this isn't theoretical anymore. The developer who pushes AI-generated content to production without considering provenance, licensing, or data leakage is creating concrete legal exposure. Our assessment surfaces whether candidates think about these issues proactively or only when prompted.
3. Workflow Integration Beyond Code Generation
Developers tend to think of AI tools as code generators. Write a prompt, get code back. But Workflow & Application — the largest dimension at 25% — measures something broader: how you integrate AI into planning, debugging, documentation, code review, and architectural decisions.
The strongest developers we assess use AI across the full development lifecycle. They use it to rubber-duck architectural decisions, generate test cases, draft ADRs, review their own PRs before submitting. The weaker performers have a single use case: "I describe what I want, it writes code." That's the difference between a Tactician and a Conductor — between someone who uses AI effectively for specific tasks and someone who orchestrates it across an entire workflow.
The current multi-tool reality makes this even more relevant. Teams are increasingly using Claude Code for architecture and planning, Codex for implementation loops, and Antigravity for browser testing. The developer who can only operate in one of these modes is leaving capability on the table.
What This Means for Hiring
If you're an engineering manager hiring developers right now, the traditional AI screening question — "Do you use AI tools?" — tells you almost nothing. Nearly every developer uses AI tools. The question is how.
Our assessment data suggests three questions that actually differentiate:
- Can they critically evaluate AI output? Not just "does it compile" but "is this the right approach, does it handle edge cases, will it scale."
- Do they use AI beyond code generation? Planning, debugging, documentation, review — or just autocomplete?
- Do they think about safety without being prompted? Data handling, licensing, appropriate use — or do they assume someone else owns that?
A developer who scores well on all three consistently lands in the Proficient-to-Expert range on our free AI skills assessment. A developer who only scores well on the first one — which is already rare — typically lands at Competent.
The Concrete Takeaway
The developer AI skill gap isn't about prompt engineering. It's about critical evaluation, workflow breadth, and safety awareness. If you're building an AI-native hiring process, screen for these three things specifically. If you're a developer looking to level up, the highest-leverage move isn't learning more prompt tricks — it's building the habit of questioning every AI output and expanding where in your workflow you apply these tools.
Run your engineering team through the developer assessment. The scores won't surprise you on Technical Understanding. They'll surprise you everywhere else.

Ozan Dagdeviren
Founder of AISA, the AI skills assessment platform used by professionals worldwide to measure, certify, and develop their AI fluency.