SPEC-DRIVEN. AI-AUGMENTED. 2026

Specification is the design skill no one talks about. It's also what makes AI actually useful.

Most design tools are built for the world we already understand. This project was about the next one, where the spec is the deliverable, AI is the build partner, and the designer's job is to ask the questions that make both work.

I reframed a fraction practice tool as a shame problem. Wrote a behavioural specification instead of wireframes. Used Claude to build a working prototype directly from it and applied my judgment wherever a competent default wasn't a good enough answer.

The model: designer authors the contract. AI compresses the build. Human judgment determines what ships.

Designing for shame
a spec-driven approach to adaptive learning

The brief was to design an adaptive fraction practice experience for high school students. But the real design problem had almost nothing to do with fractions. It was about teenagers who have spent years being told they’re bad at maths — sitting in intervention programs they didn’t choose, using tools that have already failed them. The complexity wasn’t the content. It was the audience.

I started by reframing the constraint. The stakeholders were split: the executive wanted a simple drill that maximised completion rate; the product leader wanted adaptive learning science; classroom guides wanted visual maturity. These weren’t contradictions to smooth over — they were a map of the problem. I synthesised them into a structured user understanding document and arrived at the central insight: motivation isn’t the design constraint. Shame is. Every interaction in the product would either reinforce a student’s belief that they’re bad at maths, or create a small opening to update it.

This shaped every decision: no red, no “Incorrect,” no grade-level labels, no confetti. Instead, a dark, confident interface, strategy cards that name the problem type rather than the failure, and a tiered error handling system that responds to how a student is doing, not just what they got wrong. I delivered the full MVP design: a behavioural specification document that replaced traditional wireframe handoff, and a working HTML/CSS/JS prototype built from that spec using AI as a build partner, with my judgment determining what shipped.

Deliverables:

  • User Understanding Document — structured synthesis of audience research, design principles, and non-obvious considerations, written to be usable by engineers and AI agents as well as designers

  • Three UX Approaches — meaningfully different strategies mapped to design philosophies (Behaviorist, Constructivist, Human-Centred), with honest tradeoff analysis and a documented recommendation

  • Spec-Driven Design Document — full behavioural specification with research-grounded user stories, design rationale, tiered error handling system, adaptive algorithm logic, and accessibility requirements

  • Working Interactive Prototype — self-contained HTML/CSS/JS, fully functional in-browser, built directly from the spec

  • Design Process Documentation — AI workflow evidence showing where human judgment shaped the outcome

Role:

Senior Product Designer (Independent)

Client

US learning technology company

Platform

Web-based learning application

Methods

Spec-Driven Development, AI-Augmented Design Workflow, User Research Synthesis, Interaction Design, Functional Prototyping

The Challenge: Designing for an Audience That Has Already Given Up

A learning technology company building software for US schools needed an adaptive practice experience for high school students struggling with fraction comparison — a foundational skill that, when missing, blocks progress in algebra and beyond.

The complexity wasn’t the maths. It was the audience: teenagers who carry years of shame about a skill they know is supposed to be elementary, sitting in intervention groups they didn’t choose, using tools that have failed them before. I was brought in to design the MVP of this experience — from user understanding through to a working interactive prototype.

Navigating Contradictory Stakeholder Input

Real product work means real contradictions. I received input from four different sources, each with a different mental model of what this product should be.

  • The executive wanted a simple drill optimised for completion rate — the metric he reports to partner schools. No gamification, no complexity. “Ship something simple.”

  • The product leader wanted adaptive practice grounded in learning science — embedded diagnostic assessment, strategy instruction on errors, and a progressive difficulty model. She saw the product as “a drill that teaches.”

  • Classroom guides described students who physically shrink when they see fraction work, who leave the room to avoid it, whose progress is destroyed when a peer sees them doing “kid stuff.” Their message: if it looks like it was made for a ten-year-old, they will refuse to use it.

  • Students ranged from defiant (“I already know fractions”) to defeated (“nothing’s helped before”) to guarded (“I couldn’t tell if the app was just giving me easy ones”).

Rather than resolving the contradictions by defaulting to the highest-authority voice, I treated them as complementary constraints. The synthesis pointed toward something that could satisfy all four: an experience that felt simple and mature on the surface, that taught through errors without judging them, and that gave students specific proof of progress rather than empty encouragement.

The design had to hold all of these truths simultaneously.

Simple enough for the executive. Rigorous enough for the product leader. Mature enough for the classroom. Safe enough for the student.

User Understanding: Shame as the Primary Design Constraint

I synthesised the stakeholder inputs into a structured user understanding document — not a persona template, but a design-ready analysis that an engineer or AI agent could use to make better decisions. The central insight that emerged:

The target audience isn’t “high schoolers who need fraction practice.”

It’s teenagers who’ve built identity-level defences around a skill they associate with failure.

This reframed every design decision. Motivation isn’t the central problem — shame is. These students don’t lack capability; they have specific gaps from earlier grades that nobody caught, compounded by years of being told they’re “bad at maths.” Every interaction in the product either reinforces that belief or creates a small opening to update it.

From this understanding I derived five non-negotiable design principles:

  • Never expose the grade level of the content. No “Grade 4” labels anywhere in the experience.

  • Wrong answers teach, they don’t judge. No red, no X — show the strategy, not the verdict.

  • Progress equals proof, not celebration. Specific improvement by problem type, not confetti.

  • Visual maturity is non-negotiable. Dark, confident UI that a teenager wouldn’t minimise if someone walked by.

  • Low social risk by default. Nothing on screen identifiable as remedial to a passerby.

Exploring the Design Space: Three Approaches

I developed three meaningfully different UX approaches, each rooted in a distinct design philosophy — not as visual variations, but as genuinely different bets on what this audience needed.

A: The Quiet Drill

Behaviorist / Functional. Pure function — problem after problem, wrong answers show a brief correction, progress bar fills. Optimises for build simplicity and completion rate. Aligns with the executive’s vision.

The tradeoff: students don’t quit because the drill is too complicated. They quit because they disengage emotionally. A simpler drill doesn’t fix that.

B: Strategy Reveal

Constructivist / Evidence-Based. Feels like a drill on the surface, but wrong answers trigger strategy instruction specific to the problem type — visual models, reasoning breakdowns. Pedagogically grounded.

The tradeoff: strong on learning science but doesn’t address the identity and shame layer. Without reframing the experience, it risks feeling like school.

C: Prove What You Know ✓

Human-Centred / Ethical. Recommended. Reframes the entire experience around what students can do, not what they can’t. Strategy instruction from B is fully incorporated. Progress tracked by type with honest, specific labels.

It satisfies the executive’s completion metric better than A: students stay longer because they see evidence of capability. It’s the only approach that treats user research as a first-class design input.

The recommendation of Approach C wasn’t a gut call. I mapped each approach to established design philosophies to ground the recommendation in theory rather than preference — making the case to stakeholders on structural grounds, not personal conviction.

Spec-Driven Development: From Design Thinking to Buildable Contract

This is where my approach diverged from a traditional design process. Instead of going from wireframes to mockups to developer handoff, I wrote a Spec-Driven Design Document: a formal behavioural specification that served as both design documentation and build instructions.

Why spec-driven?

The traditional handoff model — Figma files, annotated screens, a Jira ticket — leaves too much to interpretation. Developers make hundreds of micro-decisions the mockup doesn’t address: What happens on the third wrong answer? How does the difficulty adjust? What’s the emotional tone of the error state?

A behavioural spec answers those questions before anyone writes code. It defines what the system does, why it does it, and what it must not do — with enough precision that both a human developer and an AI tool can build from it without ambiguity.

The spec structure

Every screen and feature in the document followed the same structure (sketched in code after the list):

  • Research-grounded user stories — not generic “As a user, I want...” statements, but stories that reflect actual emotional states from the research. For example: “As a student who just got a fraction comparison wrong, the next 3 seconds determine whether I learn something or shut down. If I see a red X and ‘Incorrect,’ I’ll tell myself I just wasn’t paying attention. But if I see why my thinking was off — not that I’m wrong, but that there’s a different way to approach this type — I might actually update my understanding.”

  • Design rationale — why each decision works this way for this audience, tied to specific research evidence.

  • Explicit constraints — what the feature must NOT do. No red anywhere. No “Incorrect.” No celebration. No grade-level labels. The constraints are often more important than the requirements.

  • Acceptance feel — one sentence capturing the emotional target. Not acceptance criteria in the engineering sense, but what the student should feel at this moment.
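
To make the structure concrete, here is a minimal sketch of what one entry might look like if captured as structured data. The field names and contents are hypothetical, invented for illustration; the actual spec is a prose document.

```js
// Hypothetical shape of a single spec entry, mirroring the four-part
// structure above. All names and copy here are illustrative only.
const strategyRevealEntry = {
  feature: "Strategy reveal (first wrong answer)",
  userStory:
    "As a student who just got a fraction comparison wrong, the next " +
    "3 seconds determine whether I learn something or shut down.",
  rationale:
    "Naming the problem type, not the failure, keeps the moment " +
    "instructional rather than evaluative.",
  mustNot: ["use red", "say 'Incorrect'", "celebrate", "show grade levels"],
  acceptanceFeel:
    "The student feels shown a different strategy, not caught in a mistake.",
};
```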

The Tiered Error Handling System

The most detailed part of the spec was the error handling — because the wrong-answer moment is where the entire product succeeds or fails with this audience. I designed a system that classifies errors across three dimensions simultaneously.

Misconception detection

The system identifies why the student likely got it wrong — did they compare numerators directly? Think bigger denominators mean bigger fractions? This runs silently and informs the system’s response, without exposing the classification to the student.
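
A minimal sketch of how such a classifier might work, assuming fractions are stored as { num, den } objects, the task asks for the larger fraction, and the misconception labels are illustrative rather than the production taxonomy:

```js
// Silently classify a likely misconception after a wrong answer.
// Assumes the task was "pick the larger fraction"; labels are illustrative.
function classifyMisconception(picked, other) {
  const value = (f) => f.num / f.den;
  if (value(picked) >= value(other)) return null; // answer was correct

  if (picked.num > other.num) {
    // Chose the larger numerator despite the smaller value:
    // likely comparing numerators as whole numbers.
    return "numerator-comparison";
  }
  if (picked.den > other.den) {
    // Chose the larger denominator despite the smaller value:
    // likely believes a bigger denominator means a bigger fraction.
    return "denominator-size";
  }
  return "unclassified"; // still informs tone; never shown to the student
}
```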

Error frequency

The system tracks how many times a student has gotten each problem type wrong in the current session. First wrong: full strategy explanation with visual model. Second wrong: simpler wording, same visual. Third wrong: just a quick reminder, and difficulty drops automatically. The system doesn’t repeat itself louder — it makes the problem easier so the strategy can land at the right level.
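
A sketch of that escalation logic, assuming a per-session counter of wrong answers by problem type; the tier names and the difficulty floor are assumptions, not the spec's actual values:

```js
// Escalate the response as the same problem type keeps going wrong.
// wrongCounts and the tier names are illustrative.
const wrongCounts = new Map(); // problemType -> wrong answers this session

function respondToError(problemType, session) {
  const count = (wrongCounts.get(problemType) ?? 0) + 1;
  wrongCounts.set(problemType, count);

  if (count === 1) return { card: "full-strategy", visual: true };       // full explanation + visual model
  if (count === 2) return { card: "simplified-strategy", visual: true }; // simpler wording, same visual

  // Third wrong and beyond: quick reminder, and drop the difficulty
  // so the strategy can land at the right level.
  session.difficulty = Math.max(1, session.difficulty - 1);
  return { card: "quick-reminder", visual: false };
}
```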

Behavioural signal

Response time tracking distinguishes between a student who’s trying but confused (slow response) and one who’s checked out and guessing (fast response). If the system detects guessing — three fast wrong answers in the last five problems — it gently slows the pacing without announcing it. The next strategy card includes a single quiet italic line: “Take a moment with this one.” No timer, no warning, no surveillance.
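
A sketch of the guessing heuristic described above. The 4-second threshold is an assumption for illustration; the spec would define the real cut-off:

```js
// Flag probable guessing: three fast wrong answers in the last five problems.
const FAST_MS = 4000; // assumed threshold for a "fast" (likely unconsidered) answer

function isLikelyGuessing(recentAttempts) {
  // recentAttempts: oldest-first [{ correct, responseMs }, ...]
  const fastWrong = recentAttempts
    .slice(-5)
    .filter((a) => !a.correct && a.responseMs < FAST_MS).length;
  return fastWrong >= 3;
}

// The only visible effect: one quiet italic line on the next strategy card.
function nextCardHint(recentAttempts) {
  return isLikelyGuessing(recentAttempts) ? "Take a moment with this one." : null;
}
```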

None of this is visible to the student. They just feel that the app is patient, honest, and responsive to how they’re doing.

The system doesn’t repeat itself louder.

It makes the problem easier so the strategy can land.

AI-Augmented Prototyping: Spec to Working Product

With the spec complete, I used AI (Claude) as a build partner to generate a fully interactive prototype directly from the specification. The spec was precise enough that AI produced structurally correct output, but the design value was entirely in the overrides.

What AI generated

A self-contained HTML/CSS/JS prototype implementing the full experience: calibrated opening sequence, adaptive problem selection, strategy cards with SVG visual models for each problem type, tiered error handling, type-based progress tracking, session summary, keyboard accessibility, and responsive design.

Where I overrode AI output

AI tools generate competent defaults. The design value is in knowing which defaults to challenge:

  • Scroll behaviour: the prototype didn’t scroll to the problem area after “Try one more”, the kind of usability issue that only surfaces by actually using the product, not just generating it.

  • The “Streak” label: I questioned whether a counter that resets to zero on wrong answers creates a visible failure moment for an audience already primed for shame.

  • Tiered error handling: the initial prototype treated all wrong answers the same. I asked “what happens when a student gets the same type wrong three times?” and designed the escalating response system.

  • The progress bar pattern: I challenged the design logic behind it, then decided type-based progress tracking served this audience better than a traditional completion bar.

The AI compressed weeks of build time into hours. My judgment ensured that every decision served the audience the research revealed.

The Result

A working interactive prototype, built from a single behavioural specification document, demonstrating:

  • Zero-onboarding entry — straight to the first problem, no remedial framing, no instructional chrome

  • Emotionally safe error handling — no red, no judgment language, strategy instruction that teaches without shaming

  • Adaptive intelligence — silent misconception detection, tiered responses, guessing detection, difficulty adjustment

  • Honest progress tracking — type-based, specific status labels, no inflated scores or percentage bars

  • Visual maturity — dark, confident interface that respects the student’s age and intelligence

  • Accessible by default — keyboard navigation, screen-reader-friendly fraction display, WCAG 2.1 AA colour contrast

What each screen delivers

Problem display: Two fractions side by side. No instructions, no chrome, just a question. The dark monospace interface feels like a tool, not a classroom app. The top bar shows a minimal streak counter and four type-status pills — compact enough that a glancing classmate sees nothing identifiable as remedial.

Correct answer: A subtle highlight on the chosen fraction. Streak increments. Next problem slides in. No praise, no fanfare — just momentum. If a type status advances, the corresponding pill pulses quietly.

Strategy reveal (first wrong): No red. No “Incorrect.” The correct fraction highlights softly and a strategy card slides up. The header names the problem type — not the error. The card explains the reasoning strategy with a visual model.

Session summary: “Session Complete.” What moved this session. Your types at a glance. No score. No grade. No “Great job.” Just: here’s what you proved today.

What This Work Taught Me

Spec-driven design changes the conversation. Writing a behavioural spec forced me to resolve ambiguities that a mockup can hide. “Show a strategy card on wrong answers” became a multi-dimensional system with error classification, tiered responses, and guessing detection — because the spec demanded I answer “what exactly happens, and why, in every scenario?” A good spec doesn’t constrain the build. It protects the user research from getting lost in implementation.

AI amplifies design judgment — it doesn’t replace it. The most impactful decisions in this project were human ones: recognising shame as the primary constraint, reframing remediation as progression, designing error states that teach instead of judge, and pushing back on stakeholder assumptions with user evidence. AI built what I specified. I specified what the research demanded.

The designer’s role is evolving, not shrinking. This project convinced me that the designers who thrive in the AI era will be the ones who can define system behaviour precisely enough that both humans and AI can build from it. Not just drawing interfaces, but authoring the contracts that determine what gets built and why. The spec is the new deliverable. Judgment is the durable skill.
