What a Score 9 Looks Like: A Prompt Sequence Tear-Down

We analyzed a high-scoring developer assessment to identify the specific prompt patterns that separate proficient AI users from experts.

By AISA Team · 5 min read

Tags: workflow · scoring · developer · prompting

The 7-Prompt Sequence That Scored 9/10

Scoring above 8 on P1 (Prompt Design) is rare. We pulled apart a high-scoring session — prompt by prompt — to show exactly what separates a 9 from a 5.

The candidate was a mid-senior backend engineer. Nothing unusual about their background. What stood out was how they talked to the AI, not what they knew going in.

The Setup: A Debugging Scenario

In Track B of the developer assessment, this candidate faced a failing Python service: a FastAPI endpoint returning 500 errors intermittently under load. The candidate had to use an AI assistant to diagnose and fix the issue.

Here is how the scoring broke down across the key dimensions:

  • P1 (Prompt Design): 9/10
  • P2 (Output Evaluation): 8/10
  • T1 (Capability Awareness): 8/10
  • T2 (Limitation Awareness): 7/10

What a Typical 5/10 Looks Like

A mid-range candidate usually opens with something like:

"Fix this FastAPI code that's returning 500 errors."

They paste the entire file — sometimes 300+ lines — and wait for the AI to produce a corrected version. When the AI's fix doesn't work, they paste the error message and say "it still doesn't work." This loop repeats two or three times before time pressure forces a guess.

The pattern: dump context, wait for magic, repeat on failure.

This approach scores around 5 on P1 because it shows basic ability to interact with an AI tool but zero strategic thinking about how to get better outputs.

The 9/10 Sequence: Seven Prompts, One Fix

The high-scoring candidate took a fundamentally different approach. Here is the actual sequence, anonymized:

Prompt 1 — Scoping the problem: They didn't paste code first. They described the symptom, the environment, and the constraints: Python 3.11, FastAPI with async handlers, PostgreSQL via asyncpg, intermittent 500s only under concurrent requests above ~50 RPS.

Prompt 2 — Requesting a diagnostic framework: Instead of asking for a fix, they asked the AI to list the top five categories of intermittent failures in async Python services with database connections. They were building a mental model before touching code.

Prompt 3 — Targeted code submission: Only now did they paste code — but not the whole file. They included only the database connection pool setup and the specific handler that was failing. They explicitly noted the file was 280 lines and that they were showing the relevant 40.

This is where P1 (Prompt Design) becomes visible. The candidate demonstrated awareness that context window space is a finite resource. Flooding it with irrelevant code degrades output quality. This single decision — selective context inclusion — is one of the strongest predictors of a high P1 score.

Prompt 4 — Course correction: The AI suggested a connection pool exhaustion issue and recommended increasing max_size. The candidate pushed back: "The pool is set to 20 and we're only hitting 50 RPS with queries averaging 15ms. That math doesn't suggest exhaustion. What else could cause this pattern?"

This is the move that separates proficient from expert. They didn't accept the first plausible answer. They applied domain knowledge to evaluate the AI's output — which maps directly to P2 (Output Evaluation) on the AISA rubric.
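The pushback itself is simple queueing arithmetic (Little's law: average items in the system = arrival rate × time in system). Using the numbers from the session — 50 RPS and ~15 ms per query — the expected number of connections in use at any moment is well under one, so a 20-connection pool cannot be exhausted by steady load alone:

```python
# Back-of-envelope check of the candidate's pushback, using the figures
# stated in the session (Little's law: L = lambda * W).
rps = 50           # observed request rate (requests/second)
query_s = 0.015    # average query latency (15 ms)
pool_size = 20     # configured pool max_size

avg_in_use = rps * query_s   # average connections held simultaneously
print(avg_in_use)            # 0.75 — nowhere near the 20-connection cap
```

That gap between 0.75 and 20 is why "increase max_size" could not be the real fix, and why the candidate asked for alternative explanations instead.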

Prompt 5 — Hypothesis refinement: After the AI suggested a race condition in the connection acquire/release cycle, the candidate asked it to generate a minimal reproduction script that would demonstrate the issue under concurrent load.
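The actual reproduction script is not published with the session, but the failure class is easy to sketch with the standard library alone. The toy pool below stands in for asyncpg's pool (all names are illustrative): when a handler acquires a connection manually and an exception fires before the release, the connection leaks, and a few failed requests quietly drain the pool until healthy requests hang:

```python
import asyncio

class ToyPool:
    """Stand-in for an asyncpg pool: a fixed set of connection slots."""
    def __init__(self, size):
        self._slots = asyncio.Queue()
        for i in range(size):
            self._slots.put_nowait(f"conn-{i}")

    async def acquire(self):
        return await self._slots.get()

    def release(self, conn):
        self._slots.put_nowait(conn)

async def handler(pool, fail):
    conn = await pool.acquire()
    if fail:                          # simulated mid-request error:
        raise RuntimeError("boom")    # release below is skipped — leak
    await asyncio.sleep(0)            # stand-in for the actual query
    pool.release(conn)

async def main():
    pool = ToyPool(size=3)
    # A few failing requests silently drain the pool...
    for _ in range(3):
        try:
            await handler(pool, fail=True)
        except RuntimeError:
            pass
    # ...and the next healthy request hangs waiting for a free slot.
    try:
        await asyncio.wait_for(handler(pool, fail=False), timeout=0.1)
        return "ok"
    except asyncio.TimeoutError:
        return "pool exhausted"

print(asyncio.run(main()))  # pool exhausted
```

Note how this reproduces the observed symptom exactly: the service is healthy at low load and fails only once enough concurrent errors have accumulated.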

Prompt 6 — Validation: They ran the reproduction script, confirmed the race condition, and pasted the output back to the AI with a clear instruction: "Confirmed. The issue is in the acquire/release cycle. Propose a fix using asyncpg's pool context manager pattern, and explain why it prevents this race condition."
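The fix itself is not reproduced in the session write-up, but the pattern the candidate named — asyncpg's `async with pool.acquire() as conn:` — guarantees release on every exit path, including exceptions. A stdlib-only sketch of the same guarantee, again with a toy pool in place of asyncpg:

```python
import asyncio
from contextlib import asynccontextmanager

class ToyPool:
    """Toy pool whose acquire() is a context manager, mirroring asyncpg."""
    def __init__(self, size):
        self._slots = asyncio.Queue()
        for i in range(size):
            self._slots.put_nowait(f"conn-{i}")

    @asynccontextmanager
    async def acquire(self):
        conn = await self._slots.get()
        try:
            yield conn                    # hand the connection to the caller
        finally:
            self._slots.put_nowait(conn)  # always returned, even on error

async def handler(pool, fail):
    async with pool.acquire():
        if fail:
            raise RuntimeError("boom")    # connection is still released

async def main():
    pool = ToyPool(size=3)
    for _ in range(50):                   # many failing requests...
        try:
            await handler(pool, fail=True)
        except RuntimeError:
            pass
    # ...and a healthy request still gets a connection immediately.
    await asyncio.wait_for(handler(pool, fail=False), timeout=0.1)
    return "ok"

print(asyncio.run(main()))  # ok
```

The `try`/`finally` inside `acquire` is exactly why this prevents the race: release no longer depends on the handler's happy path.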

Prompt 7 — Review and edge cases: After receiving the fix, they asked: "What edge cases would this fix not cover? Specifically, what happens if a query times out mid-execution?"

Seven prompts. Clear diagnostic progression. No wasted context. Active evaluation at every step.

The Three Patterns That Drive High P1 Scores

In high-scoring sessions, three patterns emerge consistently:

1. Problem Framing Before Code

Top scorers spend their first interaction describing the problem space. They treat the AI as a collaborator who needs context, not a search engine that needs keywords. This maps to what we measure in P1 — the ability to structure prompts that guide the AI toward useful outputs.

2. Selective Context Inclusion

Candidates who score above 7 on P1 almost never paste entire files. They curate what the AI sees. They understand that more context is not always better context — sometimes it is noise that dilutes the signal.

3. Iterative Evaluation Loops

The highest-scoring candidates treat every AI response as a hypothesis, not an answer. They cross-reference against their own knowledge, ask for evidence, and request alternative explanations. This is where P2 (Output Evaluation) and T2 (Limitation Awareness) overlap — understanding that AI outputs require verification is a form of limitation awareness.

What This Means for Developers Preparing for AISA

If you are preparing for an AISA developer assessment, the takeaway is straightforward: the assessment is not measuring whether you can get the AI to produce correct code. It is measuring how you think about the interaction.

A developer who scores 9 on P1 treats every prompt as an engineering decision. They consider what information the AI needs, what information would be noise, and how to validate the output before trusting it.

The scoring rubric is public. You can read the full AISA rubric breakdown to understand exactly what each dimension measures. The candidates who score highest are the ones who have internalized these dimensions — not as test criteria, but as genuine workflow habits.

One More Thing the Data Shows

Candidates who score 8+ on P1 also tend to score 7+ on U1 (Effective Usage) and T1 (Capability Awareness). This isn't coincidence. Strong prompt design requires understanding what the tool can and cannot do. The skills are deeply linked — which is why AISA measures them as separate but correlated dimensions rather than collapsing them into a single score.

The gap between a 5 and a 9 is not talent. It is technique. And technique can be learned.

Learn more about how AISA assesses developers.

Ready to try the AI skills assessment yourself?