AI Failure Modes: A Complete Taxonomy

From AISApedia, the AI skills & terms encyclopedia

A failure mode taxonomy classifies AI outputs into five categories: correct, partially correct, hallucinated, refused, and misunderstood. Each of the four failure categories requires a different remediation strategy. By diagnosing which failure type has occurred before attempting a fix, practitioners avoid applying the wrong corrective action, which is the most common reason that prompt debugging takes longer than necessary.

What are the five failure categories and how do you distinguish them?

A correct output matches the intended result — no intervention needed. A partially correct output has the right structure, approach, and general direction but contains errors in specific details: the right analytical method applied with wrong numbers, the correct framework with missing edge cases, or an accurate answer that omits important caveats. A hallucinated output presents fabricated information as fact — invented citations, made-up statistics, or plausible-sounding claims with no factual basis. A refused output occurs when the model declines to attempt the task, typically citing safety guidelines, capability limitations, or policy restrictions. A misunderstood output answers a different question than the one asked — it may be coherent, detailed, and well-structured, but it addresses the wrong topic or interprets the request in an unintended way.

Identification requires comparing the output against both the prompt intent and the ground truth. A misunderstood output and a hallucinated output can look similar at first glance — both are wrong — but the diagnosis is entirely different. The misunderstood output addressed the wrong topic; the hallucinated output addressed the right topic with fabricated details. This distinction matters because the remediation path differs completely between the two, and applying the wrong fix wastes time while potentially making the output worse.
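The comparison against prompt intent and ground truth can be sketched as a small decision function. This is a minimal illustration, not a definitive classifier: the four yes/no judgements, the names `FailureMode` and `classify`, and the order in which the questions are asked are all assumptions made for this example, and the judgements themselves still come from a human reviewer.

```python
from enum import Enum


class FailureMode(Enum):
    CORRECT = "correct"
    PARTIALLY_CORRECT = "partially correct"
    HALLUCINATED = "hallucinated"
    REFUSED = "refused"
    MISUNDERSTOOD = "misunderstood"


def classify(attempted: bool, on_topic: bool,
             fabricated: bool, detail_errors: bool) -> FailureMode:
    """Map a reviewer's four yes/no judgements to one primary category.

    Assumed question order (hypothetical): check prompt intent first
    (did the model attempt the task, did it answer the question asked),
    then ground truth (is anything invented, is any detail wrong).
    """
    if not attempted:
        return FailureMode.REFUSED
    if not on_topic:
        return FailureMode.MISUNDERSTOOD
    if fabricated:
        return FailureMode.HALLUCINATED
    if detail_errors:
        return FailureMode.PARTIALLY_CORRECT
    return FailureMode.CORRECT


# An off-topic answer is classified by intent before its facts are checked.
print(classify(attempted=True, on_topic=False,
               fabricated=True, detail_errors=False).value)  # prints "misunderstood"
```

Checking intent before facts reflects the distinction drawn above: an answer to the wrong question is misunderstood regardless of how accurate its details are.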

Why does the remediation strategy differ by failure type?

The fixes are not interchangeable. If a model misunderstood the question and you respond by asking it to "be more accurate" or "check your facts" (the fix for hallucination), you will get a more carefully researched answer to the wrong question. If a model hallucinated a statistic and you respond by rephrasing the question more clearly (the fix for misunderstanding), you will get a different hallucinated statistic in response to a better-phrased prompt.

The correct remediation for each type is as follows. Partial correctness requires pointing the model to the specific errors (the wrong numbers, the missing cases, the omitted caveats) and asking for targeted corrections. Hallucination requires either asking the model to verify its claims against sources it can cite, a practice aligned with source triangulation, or providing the correct facts and asking it to regenerate using only verified information. Refusal requires understanding the specific concern and reframing the request to address it, or acknowledging a genuine capability limit and using a different tool. Misunderstanding requires restating the question with explicit constraints about what you are and are not asking about.
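The pairing of failure type and remedy can be kept as a simple lookup table so that the diagnosis, not habit, selects the fix. The sketch below is illustrative only: the names `REMEDIATION` and `remediation_for` are hypothetical, and the wording of each entry paraphrases the strategies described above.

```python
# Hypothetical lookup table: one remediation per diagnosed category.
REMEDIATION = {
    "correct": "No intervention needed.",
    "partially correct": "Point to the specific errors and ask for targeted corrections.",
    "hallucinated": ("Ask for verification against citable sources, or supply "
                     "the correct facts and regenerate from them."),
    "refused": ("Identify the specific concern and reframe the request, or use "
                "a different tool if the capability limit is genuine."),
    "misunderstood": ("Restate the question with explicit constraints on what "
                      "is and is not being asked."),
}


def remediation_for(category: str) -> str:
    """Return the remedy for a diagnosed category; reject unknown labels."""
    key = category.strip().lower()
    if key not in REMEDIATION:
        raise ValueError(f"unknown failure category: {category!r}")
    return REMEDIATION[key]


print(remediation_for("Misunderstood"))
```

Raising on an unknown label, rather than returning a default, forces the diagnosis step to happen before any fix is chosen.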

This diagnostic discipline connects to <a href="/aisapedia/prompt-debugging">prompt debugging</a>, which is the broader practice of systematically diagnosing and resolving prompt failures. The failure mode taxonomy provides the classification framework that prompt debugging operates on — it is the diagnostic step that must precede the fix.

How does tracking failure patterns improve your AI practice over time?

Logging your AI output failures by category over a period of weeks reveals your personal failure distribution — the specific pattern of errors you encounter most frequently given your tasks, your prompting style, and your choice of models. You might discover that most of your issues are misunderstandings (suggesting your prompts need clearer scoping and explicit constraints), or mostly hallucinations (suggesting you are asking about topics at the edge of the model's reliable knowledge), or mostly partial correctness (suggesting the model understands your tasks but needs more precision in execution details).
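A personal failure distribution can be computed from a plain list of category labels. The following is a minimal sketch; the function name `failure_distribution`, the choice to exclude correct outputs from the denominator, and the sample log are all assumptions made for illustration, not recorded data.

```python
from collections import Counter


def failure_distribution(log: list[str]) -> dict[str, float]:
    """Return each failure category's share of all logged failures.

    Correct outputs are excluded so the shares describe failures only
    (an assumption of this sketch). Most frequent category comes first.
    """
    failures = [category for category in log if category != "correct"]
    if not failures:
        return {}
    counts = Counter(failures)
    total = len(failures)
    return {category: count / total for category, count in counts.most_common()}


# Hypothetical log of six outputs, one label per output.
example_log = ["misunderstood", "correct", "hallucinated",
               "misunderstood", "partially correct", "misunderstood"]
print(failure_distribution(example_log))
# prints {'misunderstood': 0.6, 'hallucinated': 0.2, 'partially correct': 0.2}
```

In this invented log, misunderstandings account for three of the five failures, which would point toward clearer scoping rather than more fact-checking.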

This distribution data drives targeted improvement through systematic feedback loops, which is far more efficient than following generic "better prompting" advice. If misunderstandings dominate, invest in clearer prompt structure, explicit scope constraints, and examples of what you are not asking for. If hallucinations dominate, add verification steps, switch to models with better factual grounding for those specific tasks, or provide the factual base in the prompt. If partial correctness dominates, provide more detailed examples and tighter output specifications. Each failure type has a specific remedy; knowing which remedy to apply requires knowing which failure type is most prevalent.

Teams can aggregate these logs to identify organisation-wide patterns. If multiple team members experience the same failure type on the same task category, the fix might be a shared <a href="/aisapedia/domain-prompt-templates">domain prompt template</a> rather than individual prompt improvements — a systemic solution to a systemic problem.
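One way to spot an organisation-wide pattern is to group log entries by task category and failure type, then count how many distinct people reported each pair. This is a sketch under stated assumptions: the entry format `(member, task_category, failure_mode)`, the function name `shared_patterns`, and the threshold of two members are hypothetical choices, and the sample entries are invented.

```python
from collections import defaultdict


def shared_patterns(entries, min_members=2):
    """Find (task_category, failure_mode) pairs reported by several members.

    entries: iterable of (member, task_category, failure_mode) tuples.
    min_members: how many distinct members must report a pair before it
    counts as shared (an assumed threshold, tune it to team size).
    """
    members_by_pattern = defaultdict(set)
    for member, task_category, failure_mode in entries:
        members_by_pattern[(task_category, failure_mode)].add(member)
    return sorted(
        pattern
        for pattern, members in members_by_pattern.items()
        if len(members) >= min_members
    )


# Hypothetical team log.
team_log = [
    ("ana", "contract summary", "misunderstood"),
    ("ben", "contract summary", "misunderstood"),
    ("ana", "contract summary", "misunderstood"),  # repeat by the same person
    ("cy", "market sizing", "hallucinated"),
]
print(shared_patterns(team_log))  # prints [('contract summary', 'misunderstood')]
```

Counting distinct members rather than raw entries keeps one person's repeated failures from being mistaken for a team-wide problem.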

What happens when a failure falls between categories?

Real-world failures do not always fit neatly into a single category. A common hybrid is partial-hallucination: the output addresses the right topic with mostly correct information but includes one or two fabricated details — a real framework described with an invented feature, or a correct historical timeline with a wrong date. These hybrids require combining remediation strategies: verify the suspicious details specifically while accepting the overall structure.

Another hybrid is misunderstanding-plus-hallucination: the model interpreted the question differently than intended and then hallucinated details within that misinterpretation. The fix requires two steps — first redirect to the correct topic, then verify the factual claims in the redirected response. Attempting only one step (only correcting the misunderstanding, or only fact-checking without redirecting) will leave the other error in place.
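For hybrid failures, the order of the remediation steps matters as much as the steps themselves. The sketch below builds an ordered plan from the set of failure types detected in one output. The names `STEP_ORDER`, `STEPS`, and `remediation_plan` are hypothetical, and placing targeted corrections last is an assumption of this example; only the redirect-then-verify order comes from the text above.

```python
# Assumed ordering: fix the topic first, then the facts, then the details.
STEP_ORDER = ["misunderstood", "hallucinated", "partially correct"]

STEPS = {
    "misunderstood": "Redirect to the correct topic.",
    "hallucinated": "Verify the factual claims.",
    "partially correct": "Request targeted corrections to specific details.",
}


def remediation_plan(detected: set[str]) -> list[str]:
    """Return remediation steps for every detected failure, in a fixed order."""
    return [STEPS[mode] for mode in STEP_ORDER if mode in detected]


print(remediation_plan({"hallucinated", "misunderstood"}))
# prints ['Redirect to the correct topic.', 'Verify the factual claims.']
```

Because the plan is built from the full set of detected failures, neither error in a hybrid is silently dropped.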

When classification is ambiguous, err toward the more severe diagnosis. If you are unsure whether an output is partially correct or hallucinated, treat it as hallucinated and verify the facts. The cost of unnecessary verification on a partially correct output is much lower than the cost of trusting a claim that turns out to be fabricated.
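The tie-breaking rule can be written as a severity ranking in which the highest-ranked candidate wins. This is a minimal sketch: the numeric ranks and the names `SEVERITY` and `resolve_ambiguous` are assumptions, and the ranking covers only the three categories whose relative severity the text addresses.

```python
# Assumed ranks: a higher number means a more severe diagnosis.
SEVERITY = {"correct": 0, "partially correct": 1, "hallucinated": 2}


def resolve_ambiguous(candidates: list[str]) -> str:
    """Pick the most severe diagnosis among the plausible candidates.

    Raises ValueError for an empty list and KeyError for a category
    outside the assumed ranking.
    """
    return max(candidates, key=SEVERITY.__getitem__)


print(resolve_ambiguous(["partially correct", "hallucinated"]))  # prints "hallucinated"
```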

How can failure mode tracking help you evaluate and choose between AI models?

When you track failure modes consistently across different models for the same task types, you build an empirical comparison, a form of model benchmarking, that can be more useful for your own work than public benchmark scores. A model that produces fewer hallucinations but more refusals on your specific tasks may be preferable to one that attempts everything but fabricates details more frequently. The reverse may also hold, depending on whether your workflow tolerates false information or task avoidance more easily.
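The trade-off between hallucinations and refusals can be made explicit by weighting each failure type by how costly it is in a given workflow. The sketch below is illustrative: the function names, the cost weights, and the outcome lists for the two models are all hypothetical values chosen for the example, not measurements.

```python
from collections import Counter


def failure_rates(outcomes: list[str]) -> dict[str, float]:
    """Share of runs that fell into each category."""
    if not outcomes:
        raise ValueError("no outcomes logged")
    counts = Counter(outcomes)
    return {mode: n / len(outcomes) for mode, n in counts.items()}


def weighted_cost(outcomes: list[str], cost_per_mode: dict[str, float]) -> float:
    """Average cost per run; categories without a weight cost nothing."""
    rates = failure_rates(outcomes)
    return sum(rate * cost_per_mode.get(mode, 0.0) for mode, rate in rates.items())


# Hypothetical logs: ten runs of the same task type on two models.
model_a = ["correct"] * 7 + ["hallucinated"] * 1 + ["refused"] * 2
model_b = ["correct"] * 7 + ["hallucinated"] * 3

# Hypothetical weights for a workflow where false information is costly.
strict_on_facts = {"hallucinated": 5.0, "refused": 1.0}
# Hypothetical weights for a workflow where task avoidance is costly.
strict_on_refusals = {"hallucinated": 1.0, "refused": 5.0}

for name, costs in [("facts", strict_on_facts), ("refusals", strict_on_refusals)]:
    a = weighted_cost(model_a, costs)
    b = weighted_cost(model_b, costs)
    print(name, "model_a" if a < b else "model_b")
```

With these invented numbers the preferred model flips when the weights flip, which is the point: the same failure logs support different choices depending on which failure a workflow tolerates.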

This data also reveals model-specific patterns that inform prompt design. If one model consistently misunderstands a particular question structure while another handles it correctly, the issue may be a mismatch between your prompt style and the model's training, not a fundamental quality difference. Adjusting the prompt for the model's interpretation tendencies — rather than switching models — may be the more efficient fix. Failure mode tracking gives you the diagnostic data to make this distinction.

Try this yourself

Track your next 10 AI outputs using these categories. When you get a bad response, classify it before fixing: partial correctness needs targeted corrections, hallucination needs fact-checking, refusal needs reframing, and misunderstanding needs clarification.

Real-world example

A marketing manager asks for 'customer acquisition costs by channel' and gets beautiful charts with fake numbers: that's hallucination, and it needs real data sources. The same manager asks for 'GDPR-compliant email templates' and gets generic templates: that's partial correctness, and it needs specific compliance points. Same symptom (bad output), different diseases, different cures.