What 400 Assessments Reveal About Developers and AI: Strong on Prompting, Weaker Than You'd Expect on Technical Understanding

Patterns from 400+ AISA assessments show developers excel at prompting but often stumble on model selection, context management, and safety reasoning.

By AISA Team · 6 min read

Most engineering managers assume their developers are the best-positioned people in the org to use AI effectively. They write code. They understand APIs. They've been using Copilot for two years. Of course they'll score well on an AI skills assessment.

Across 408 completed AI skills assessments on AISA, the patterns we observe in developer assessments challenge that assumption in specific, actionable ways. Developers do tend to score well — but not where you'd expect, and the gaps they show are exactly the ones that matter as AI tooling shifts from autocomplete to agentic workflows.

The Strength: Prompting & Communication Comes Naturally

Developers consistently demonstrate strong skills in Prompting & Communication, a dimension that accounts for 23% of the AISA rubric. This isn't surprising. Writing clear, structured instructions is essentially what developers do all day: in code, in tickets, in pull request descriptions. The transfer to prompt engineering is direct.

Where developer prompting stands out specifically:

  • Constraint specification. Developers naturally define boundaries, edge cases, and output formats. When asked to get an AI to produce something, they instinctively add constraints that less technical candidates leave out.
  • Iterative refinement. Instead of accepting a first response, developers tend to debug prompts the way they debug code — adjusting inputs, isolating variables, narrowing scope.
  • Structured output requests. Asking for JSON, markdown tables, or specific schemas comes naturally. This is a real skill that many non-developer candidates struggle with.
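
To make the constraint and structured-output habits concrete, here is a minimal sketch in Python. The task (reviewing a diff), the field names, and the helper functions are all hypothetical and chosen only for illustration; no real model API is called. The point is the pairing: state the format in the prompt, then check the reply against it instead of trusting it.

```python
import json

# Hypothetical schema for illustration: field name -> expected Python type.
REQUIRED_FIELDS = {"summary": str, "risk_level": str, "files_touched": list}
ALLOWED_RISK = {"low", "medium", "high"}


def build_prompt(diff_text: str) -> str:
    """Compose a prompt that spells out constraints and the exact output format."""
    return (
        "Review the following diff.\n"
        "Constraints:\n"
        "- Respond with a single JSON object and nothing else.\n"
        "- Keys: summary (string, at most 2 sentences), "
        "risk_level (one of: low, medium, high), "
        "files_touched (list of strings).\n"
        "- If the diff is empty, set summary to an empty string.\n\n"
        f"Diff:\n{diff_text}"
    )


def validate_reply(raw: str) -> dict:
    """Parse a model reply and raise ValueError if it breaks the requested format."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("reply is not a JSON object")
    for key, expected in REQUIRED_FIELDS.items():
        if not isinstance(data.get(key), expected):
            raise ValueError(f"field {key!r} is missing or not a {expected.__name__}")
    if data["risk_level"] not in ALLOWED_RISK:
        raise ValueError(f"unexpected risk_level: {data['risk_level']!r}")
    return data
```

In use, the string returned by `build_prompt` would be sent to whichever model the team has chosen, and the raw reply passed through `validate_reply` before anything downstream consumes it.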

If your developers are landing in the Tactician or Conductor persona range, strong prompting is usually the dimension pulling them there.

The Blind Spot: Technical Understanding Isn't What You Think It Is

Here's the counterintuitive finding. The Technical Understanding dimension (20% of the overall score) measures whether someone understands how models work, their limitations, and how to select the right tool for a given task. You'd assume developers would dominate here.

They often don't.

The pattern we observe is that many developers have deep knowledge of one tool — typically whatever's integrated into their IDE — and shallow knowledge of everything else. When the assessment conversation moves to topics like model selection, context window trade-offs, or understanding why a model produces certain types of errors, developers frequently reveal gaps.

Specific patterns that show up:

  • Model selection defaults. Asked how they'd approach a task requiring long-context reasoning versus fast iteration, many developers default to "I'd use ChatGPT" or "I'd use Copilot" without reasoning about which model or configuration fits the task. With Claude Opus 4.8 now offering effort control across five levels and 1M context, and Gemini 3.5 Flash running 4x faster at a fraction of the cost, the ability to match model to task is a real engineering decision — not a preference.
  • Context window misunderstanding. Developers who work with Copilot daily sometimes can't articulate what a context window is, why it matters, or how it affects output quality. They know that longer files sometimes produce worse suggestions, but not why.
  • Confusion between fine-tuning, RAG, and prompting. When the assessment explores when you'd use each approach, developers often conflate them or describe one when they mean another.
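
One way to picture the model selection and context window points is as a small budgeting exercise. The sketch below is an illustration under stated assumptions, not a recommendation: the model names, window sizes, and relative costs are invented, and the four-characters-per-token figure is only a rough rule of thumb for English text (real tokenizers vary by model and by content).

```python
from dataclasses import dataclass

# Rough rule of thumb for English text; an assumption, not a tokenizer.
CHARS_PER_TOKEN = 4


@dataclass(frozen=True)
class ModelOption:
    name: str             # hypothetical label, not a real product
    context_tokens: int   # total window shared by input and output
    relative_cost: float  # made-up unit, used only for ranking


# Invented catalog for illustration only.
CATALOG = [
    ModelOption("small-fast", 32_000, 1.0),
    ModelOption("mid-general", 200_000, 5.0),
    ModelOption("large-long-context", 1_000_000, 10.0),
]


def estimate_tokens(text: str) -> int:
    """Approximate the token count using ceiling division on character length."""
    return -(-len(text) // CHARS_PER_TOKEN)


def pick_model(input_text: str, reserved_output_tokens: int,
               catalog: list[ModelOption] = CATALOG) -> ModelOption:
    """Return the cheapest option whose window fits the input plus reserved output."""
    needed = estimate_tokens(input_text) + reserved_output_tokens
    candidates = [m for m in catalog if m.context_tokens >= needed]
    if not candidates:
        raise ValueError(f"no model fits roughly {needed} tokens; split the task")
    return min(candidates, key=lambda m: m.relative_cost)
```

The useful part is less the code than the question it forces: how much input is there, how much output needs room, and which option is the cheapest one that still fits.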

This matters because as tools like Claude Code's Dynamic Workflows and Google Antigravity 2.0 push developers toward orchestrating multiple AI agents, understanding what's happening under the hood becomes a prerequisite, not a nice-to-have. A developer who can't reason about model capabilities will struggle to decompose tasks across parallel subagents effectively.

The Other Gap: Safety & Responsibility Gets Deprioritized

We've written about the safety dimension before, but it's worth calling out the developer-specific pattern. Safety & Responsibility carries 10% of the AISA score, and developers tend to treat it as an afterthought — something that's someone else's job.

The assessment probes for things like:

  • Recognizing when AI output might contain hallucinated code dependencies or security vulnerabilities
  • Understanding data privacy implications of pasting proprietary code into third-party AI tools
  • Knowing when human review is non-negotiable versus when automation is acceptable

Developers often score well on the first point (they're trained to be skeptical of code) but poorly on the second and third. With California advancing roughly 30 AI bills past the crossover deadline this month and OpenAI publishing its Frontier Governance Framework, which maps to both TFAIA and the EU AI Act, the regulatory environment is catching up to what developers are doing daily. Teams that treat AI safety as a compliance checkbox rather than an engineering discipline will get caught flat-footed.
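
The first probe, spotting hallucinated dependencies, can be partly automated. The sketch below is one possible check, assuming the generated code is Python: it lists the top-level modules a snippet imports and flags any that cannot be resolved in the current environment. The function names are hypothetical. A flagged name may simply be uninstalled rather than invented, and a name that resolves is not proof the package is trustworthy, so this narrows human review rather than replacing it.

```python
import ast
import importlib.util


def imported_top_level_modules(source: str) -> set[str]:
    """Collect the top-level module names imported by a piece of Python source."""
    names: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                names.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            # Skip relative imports (level > 0); they refer to the local package.
            if node.level == 0 and node.module:
                names.add(node.module.split(".")[0])
    return names


def is_resolvable(name: str) -> bool:
    """True if the current environment can locate a module with this name."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # Treat lookup errors as unresolved so the name is flagged for review.
        return False


def unresolved_imports(source: str) -> list[str]:
    """Return imported module names that cannot be found, sorted for stable output."""
    return sorted(n for n in imported_top_level_modules(source) if not is_resolvable(n))
```

Running `unresolved_imports` on AI-generated code before installing anything gives a reviewer a short list of names to check by hand against the package index.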

Why This Pattern Emerges

The developer blind spots make sense when you consider how most developers adopted AI tools: bottom-up, through IDE integrations, optimizing for speed on their existing tasks. That path builds strong prompting habits and strong workflow patterns, but it doesn't build breadth of technical understanding or safety awareness. Those require deliberate learning that goes beyond "use the tool more."

What Engineering Managers Should Do With This

First, stop assuming your developers are AI-proficient because they use AI daily. Usage frequency and skill level are weakly correlated at best. Someone can use Copilot 40 hours a week and still be a Copy-Paster — accepting outputs without evaluation, using one tool for everything, never questioning whether the approach fits the problem.

Second, assess before you train. Running your developer team through AISA takes about 20 minutes per person and gives you dimension-level scores. If your team's Technical Understanding scores cluster below their Prompting scores — which is the pattern we're describing — you know exactly where to focus L&D budget. Generic "AI for developers" workshops won't close a specific gap in model selection or context management.
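
As a rough illustration of reading dimension-level scores, the sketch below averages each dimension across a team and measures the gap between two of them. The names and numbers are invented, and the 0 to 100 scale is an assumption; the actual format of AISA results may differ.

```python
from statistics import mean

# Hypothetical scores on an assumed 0-100 scale; not real assessment data.
team_scores = {
    "dev_a": {"Prompting & Communication": 82, "Technical Understanding": 61,
              "Safety & Responsibility": 55},
    "dev_b": {"Prompting & Communication": 78, "Technical Understanding": 66,
              "Safety & Responsibility": 58},
    "dev_c": {"Prompting & Communication": 80, "Technical Understanding": 59,
              "Safety & Responsibility": 49},
}


def dimension_means(scores: dict[str, dict[str, float]]) -> dict[str, float]:
    """Average each dimension across everyone on the team."""
    by_dimension: dict[str, list[float]] = {}
    for person in scores.values():
        for dimension, value in person.items():
            by_dimension.setdefault(dimension, []).append(value)
    return {dimension: mean(values) for dimension, values in by_dimension.items()}


def dimension_gap(means: dict[str, float], strong: str, weak: str) -> float:
    """Positive result means the team scores higher on `strong` than on `weak`."""
    return means[strong] - means[weak]


means = dimension_means(team_scores)
gap = dimension_gap(means, "Prompting & Communication", "Technical Understanding")
print(f"Prompting minus Technical Understanding: {gap:.1f} points")
```

A consistently positive gap across the team, rather than in one or two individuals, is the signal that targeted training on model selection and context management is a better use of budget than a general workshop.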

Third, make model selection an explicit part of technical design. When a developer proposes using AI for a task, ask: Which model? Why that one? What's the context window requirement? What's the cost trade-off? These questions should be as normal as asking about database choice or API design. The current model landscape — with Opus 4.8 at $5/$25, GPT-5.5 at $5/$30, and Gemini 3.5 Flash at $1.50/$9 — means these choices have real cost and performance implications.
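
To show how quickly those prices turn into a design decision, here is a back-of-the-envelope sketch. It assumes the figures quoted above are USD per million input tokens and per million output tokens, which the article does not state explicitly, and it uses an invented daily workload. Treat the output as arithmetic on assumptions, not as a quote.

```python
# Prices as quoted in the article; the unit (USD per 1M input / 1M output
# tokens) is an assumption made for this sketch.
PRICES_PER_MILLION = {
    "Opus 4.8": (5.00, 25.00),
    "GPT-5.5": (5.00, 30.00),
    "Gemini 3.5 Flash": (1.50, 9.00),
}


def daily_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate cost in USD for one day's token volume on a given model."""
    input_price, output_price = PRICES_PER_MILLION[model]
    return (input_tokens / 1_000_000) * input_price + \
           (output_tokens / 1_000_000) * output_price


# Hypothetical workload: 2M input tokens and 0.5M output tokens per day.
workload = {"input_tokens": 2_000_000, "output_tokens": 500_000}

for model in PRICES_PER_MILLION:
    print(f"{model}: ${daily_cost(model, **workload):.2f} per day")
```

Cost is only one axis. A cheaper model that needs twice as many retries, or that cannot hold the required context, may not be cheaper in practice, which is why the question belongs in the design review rather than in a spreadsheet afterwards.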

Fourth, treat safety reasoning as an engineering skill, not a compliance exercise. Include it in code reviews. When someone pastes code into an AI tool, ask what data was in that context. When someone ships AI-generated code, ask what verification they ran. Build the habit now.

The Concrete Takeaway

Developers are not automatically your strongest AI users. They're your strongest prompters. The gap between prompting skill and technical understanding is the most actionable finding for engineering teams right now, and it's the gap that will widen as tooling moves from single-model autocomplete to multi-agent orchestration. Assess your developers at the dimension level, find the specific gaps, and close them before the tooling outpaces the team.

Learn more about how AISA assesses developers.

Ozan Dagdeviren

Founder of AISA, the AI skills assessment platform used by professionals worldwide to measure, certify, and develop their AI fluency.

The Science Behind AISA

In 2026, Anthropic published the AI Fluency Index, the largest empirical study of AI fluency to date, analysing 9,830 conversations. AISA covers 93% of the behaviours Anthropic identified as markers of AI fluency and goes even deeper with 4 additional dimensions. Read our white paper: Anthropic's AI Fluency Study & AISA

AISA's framework is developed by a team with deep roots in tech, behavioural science, and AI product leadership — the rubric is informed by backgrounds spanning the Metropolitan Police, Harvard, Crowdbotics (Silicon Valley), and the European School of Economics.