For the technically curious. (You know who you are.)
How It Works
One conversation. Two AI tracks. Zero black box.
Candidates talk to one AI. A separate AI evaluates every response in real time. They never mix. That’s how we keep the dialogue natural and the scoring rigorous.
Anti-gaming
We detect when responses don’t come from the candidate. No proctoring—just instrumentation.
Typing speed
Human typing runs at roughly 4–10 characters per second. Text that appears faster (12+ chars/sec) or arrives in large chunks is flagged as likely pasted.
Paste events
We track paste events directly. No guesswork—if they pasted, we know.
Style shifts
A sudden shift in vocabulary or formality (e.g. ChatGPT-style phrasing) relative to the candidate's own baseline.
Response timing
Long pauses before answering, or implausibly fast replies. Both are flagged in the report.
Typing metrics are weighted at 70%, other signals at 30%. Flags appear in the report with an appeal option: we're confident, never punitive.
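To make the weighting concrete, here is a minimal sketch of how these signals could combine. The names, types, and borderline handling are illustrative; only the ~4–10 chars/sec baseline, the 12+ chars/sec threshold, and the 70/30 split come from the description above.

```ts
interface MessageMetrics {
  charCount: number;          // characters in the submitted answer
  activeTypingMs: number;     // time spent actively typing in the input
  pasteEvents: number;        // paste events captured on the input field
  styleShiftScore: number;    // 0..1 drift from the candidate's baseline style
  timingAnomalyScore: number; // 0..1 implausible pauses or instant replies
}

// 1 = strong typing-based evidence of outside text, 0 = looks human-typed.
function typingSignal(m: MessageMetrics): number {
  if (m.pasteEvents > 0) return 1; // direct evidence, no guesswork
  const charsPerSec = m.charCount / Math.max(m.activeTypingMs / 1000, 0.001);
  if (charsPerSec >= 12) return 1;  // faster than plausible typing
  if (charsPerSec > 10) return 0.5; // above the ~4-10 chars/sec human range
  return 0;
}

// Typing metrics carry 70% of the flag score, other signals 30%.
function flagScore(m: MessageMetrics): number {
  const otherSignals = Math.max(m.styleShiftScore, m.timingAnomalyScore);
  return 0.7 * typingSignal(m) + 0.3 * otherSignals;
}
```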
End-to-end flow
Invite
You send a link. They get a professional invite and open the assessment.
Conversation
A 20–40 minute chat covering workflows, tools, and verification. It feels like a colleague, not an exam.
Evaluation
Track B scores every message before Track A replies, producing evidence and steering notes the candidate never sees.
Report
Scores across 5 dimensions, each tied to quotes. Plus follow-up questions where evidence was thin.
The two tracks
Two distinct AIs so conversation quality and evaluation rigor don’t compromise each other.
Track A
Conversationalist
The only AI the candidate sees. Warmth, natural flow, adaptive depth. Gets steering notes from Track B but never sees scores. Prioritizes a natural dialogue over checklist coverage.
Track B
Evaluator
Runs silently. For every message it produces evidence items, scores, and steering notes: structured data only, no candidate-facing text. Behavioral anchors (1–10 scale) keep scores consistent and explainable.
Track B runs before Track A on each turn, so the next reply already reflects the latest steering.
Orchestrator
Stateless pipeline per message. Instrumentation is explicit—we measure it, we don’t ask the model to guess.
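Here is a minimal sketch of that per-message ordering, with hypothetical type and function names. The only constraints taken from above are that Track B evaluates first, and that Track A receives steering notes but never scores.

```ts
interface Evaluation {
  evidence: string[];             // quotes supporting each score
  scores: Record<string, number>; // per-criterion, 1-10
  steeringNotes: string[];        // topic nudges for the conversationalist
}

// Stub: write scores to durable storage; the pipeline itself keeps no state.
async function persistEvaluation(sessionId: string, evaluation: Evaluation): Promise<void> {}

async function handleMessage(
  sessionId: string,
  candidateMessage: string,
  evaluate: (msg: string) => Promise<Evaluation>,                // Track B
  converse: (msg: string, steering: string[]) => Promise<string> // Track A
): Promise<string> {
  // Track B first: score the message and produce steering notes.
  const evaluation = await evaluate(candidateMessage);
  await persistEvaluation(sessionId, evaluation); // scores stay server-side
  // Track A receives steering notes only, never the scores.
  return converse(candidateMessage, evaluation.steeringNotes);
}
```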
Behavioral rubric
5 skill dimensions, 11 underlying criteria. Job-relevant, learnable AI skills (not personality), selected via multi-frame analysis. We assess skills, not communication style. A sketch of the rubric's data shape follows the criteria list below.
Score scale (1–10)
5 dimensions · 11 criteria
- P1 — Prompt Design
- P2 — Iterative Dialogue
- P3 — Context & Memory Management
- T1 — Output Evaluation
- T2 — Limitation Awareness
- U1 — AI Fundamentals
- U2 — Tool Landscape
- W1 — Workflow Integration
- W2 — Task Decomposition
- W3 — Domain Application
- S1 — AI Safety & Responsibility
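One hypothetical way to represent that rubric is shown below. The dimension name, anchor wording, and field names are placeholders; the 1–10 anchor scale, the criterion IDs, and the 5-dimension, 11-criterion structure come from above.

```ts
interface Criterion {
  id: string;                      // e.g. "P1"
  name: string;                    // e.g. "Prompt Design"
  anchors: Record<number, string>; // score level -> observable behavior
}

const rubric: Record<string, Criterion[]> = {
  Prompting: [ // dimension name is a placeholder
    {
      id: "P1",
      name: "Prompt Design",
      anchors: {
        2: "One-line asks with no context or constraints",
        5: "Supplies context and output format, rarely iterates",
        9: "Deliberately structures role, context, constraints, and examples",
      },
    },
    // ...the remaining ten criteria follow the same shape
  ],
};
```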
Want the full breakdown? Read our deep dive into all 5 dimensions or learn how conversational evidence prevents cheating.
Evidence & scoring
Every score is tied to specific quotes. When a candidate merely agrees with something the AI itself explained, we flag it and cap the score that evidence can justify. Chaptering keeps context manageable over long sessions.
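As a rough sketch of that agreement cap (the field names and ceiling value are invented for illustration):

```ts
interface EvidenceItem {
  quote: string;                // exact candidate quote
  proposedScore: number;        // 1-10 from the evaluator
  candidateOriginated: boolean; // false if they only agreed with our explanation
}

const AGREEMENT_CAP = 6; // illustrative ceiling, not the real value

function cappedScore(item: EvidenceItem): number {
  return item.candidateOriginated
    ? item.proposedScore
    : Math.min(item.proposedScore, AGREEMENT_CAP);
}
```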
What you get
- ✓ Overall score & recommendation
- ✓ Per-criterion scores with confidence
- ✓ Exact quotes justifying each score
- ✓ Strengths & gaps
- ✓ Follow-up interview questions
- ✓ Instrumentation notes when relevant
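In data terms, the report could look something like this. The field names and recommendation values are hypothetical; the contents mirror the checklist above.

```ts
interface CriterionResult {
  id: string;         // e.g. "T1"
  score: number;      // 1-10
  confidence: number; // 0-1
  quotes: string[];   // exact quotes justifying the score
}

interface Report {
  overallScore: number;
  recommendation: "advance" | "borderline" | "pass"; // illustrative values
  criteria: CriterionResult[];     // all 11 criteria
  strengths: string[];
  gaps: string[];
  followUpQuestions: string[];     // asked where evidence was thin
  instrumentationNotes?: string[]; // present only when signals fired
}
```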
The 10 AI Personas
Beyond the score, every candidate gets an AI persona — a profile of how they interact with AI, not just how well they understand it. Two people can score identically and receive different personas.
- Deep technical mastery of AI itself. Understands or builds AI models, works with ML and LLMs at a technical level. Elite critical analysis comes from understanding the technology at its foundation, not just from using it.
- Designs and builds sophisticated multi-system AI integrations at scale. Goes beyond creating individual tools to engineering production-grade architectures where AI components interact with each other and non-AI systems.
- Personally created complex, useful tools, workflows, or products using AI — whether for their own use, their company, or commercially. Developed deep practical understanding through hands-on building that goes beyond secondhand knowledge.
- Uses AI heavily across complex workflows, automations, and multi-tool pipelines. Understands AI limitations well and knows which tool integrates with which. Orchestrates and configures sophisticated setups, but typically works with what's available rather than building novel tools from scratch.
- Productive with mainstream AI tools and uses them well within established workflows. Communicates clearly with AI and consistently gets quality output, but typically hasn't pushed into the cutting edge of AI tooling or complex integrations.
- Actively building AI skill across multiple dimensions. Tries new tools, refines prompts, and is beginning to develop repeatable patterns — the trajectory is strong.
- Approaches AI with critical caution. May under-use AI in practice, but the verification instinct and risk awareness form a strong foundation that many frequent users lack.
- Relies on AI for day-to-day output but with limited iteration or verification. Gets value, but leaves quality and safety gains on the table by accepting first-pass results.
- Experiments with AI intermittently: a prompt here, a quick question there. Nothing sustained, but a willingness to explore that many skip entirely.
- Has heard of AI tools but hasn't meaningfully engaged — the assessment itself may be the most direct interaction to date. Awareness exists; habit does not.
Personas reflect AI interaction style — usage patterns, habits, and mindset. They correlate with the score but don't directly map to it.