How We Built a Self-Enforcing Design System with Claude Code
We build products really fast at Kasava, thanks to a flat hierarchy and a constant focus on leveraging AI in our workflows. That isn't unique to us; it's not hard these days to find a blog post about using AI for productivity. What I see far less of is discussion of what this pace does to product design. When you build products with AI this quickly, there's a real risk of introducing inconsistencies into the user experience. Here's how we try to address that.
Design is the area I struggle with most. I'm not being falsely modest. I've spent my career building systems, writing code, architecting backends, working with customers. But put me in front of a Figma file and it might as well be Microsoft Paint. It's incredibly frustrating.
This wouldn't be such a problem except that design profoundly matters. Users don't experience your elegant database schema or your caching strategy. They experience your interface. And if that interface feels off -- even in ways they can't articulate -- they'll leave.
I've lived both extremes of the design system spectrum, and neither works.
On previous projects, we had no design system at all. We'd approximate colors. We'd hardcode padding. The result was death by a thousand paper cuts -- an interface that felt inconsistent without anyone being able to pinpoint exactly why.
At another company, we had a design system so rigid and sprawling that it became actively counterproductive. Every decision required consulting The Document. The Document spawned sub-documents. Clarifications required meetings with the design committee. Interpretations varied between team members. We'd spend weeks in limbo, unable to start projects because we couldn't get clear answers on which guidelines applied to our use case -- but we also couldn't proceed without guidelines, because that was against policy.
The paralysis was real. "Wait, is this a 'dialog' or a 'sheet' per section 4.3.2?"; "I asked the design team and got three different answers..." Projects would stall not because we didn't know what to build, but because we didn't know if we were allowed to build it the way we wanted. That's truly insane.
At Kasava, I wanted to see if we could take the mindset of applying AI to product management and engineering and apply it to design as well. What if we could build a system where Claude Code actively enforces those principles during development -- giving me (the design non-expert) guardrails to work within?
The Design Principles That Actually Get Used
Here's what makes this work for someone like me: our design principles document is only about 300 lines long. That's it.
Why so short? Because every principle we include has to pass one test: can Claude Code actually evaluate whether we're following it?
If a principle isn't specific enough for an AI to evaluate, it's too vague to be useful anyway. The principles that remain are clear, concrete, and actionable -- exactly what a design non-expert needs. The document lives in our codebase and gets treated like any other configuration file, versioning and all.
The Core Philosophy and Principles We Codified
Here's our core philosophy in one line:
Focus over features. Every screen should answer one question clearly. Remove anything that doesn't directly serve the user's immediate goal.
That's not just a nice sentiment -- it's a constraint that Claude can operationalize. When reviewing UI code, it checks: Is there one primary action per view? Are there 2-3 supporting elements or 15? Is the information hierarchy clear?
We take this philosophy and organize a design system around ten key areas. Here's a glimpse:
1. Information Hierarchy (The 80/20 Rule)
Show 20% of data that drives 80% of decisions. Everything else should be accessible but hidden by default.
Primary: 1 main action or piece of information per view
Secondary: 2-3 supporting elements
Tertiary: Everything else collapsed or on separate pages
2. Progressive Disclosure
Default to less. Show 3-5 items by default, use "View all (X)" links for complete lists.
Good: Show 3 insights with "+4 more" link
Bad: Show all 7 insights at once
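In component terms, progressive disclosure is just a small amount of state. Here's a minimal sketch in React + TypeScript -- the names (`Insight`, `InsightList`) are made up for illustration, not our actual components:

```tsx
// Minimal sketch of progressive disclosure: render a capped list by default,
// with a "View all (X)" control to expand. Names are illustrative.
import { useState } from "react";

type Insight = { id: string; summary: string };

const DEFAULT_VISIBLE = 3;

export function InsightList({ insights }: { insights: Insight[] }) {
  const [expanded, setExpanded] = useState(false);
  const visible = expanded ? insights : insights.slice(0, DEFAULT_VISIBLE);
  const hiddenCount = insights.length - DEFAULT_VISIBLE;

  return (
    <ul className="flex flex-col gap-4">
      {visible.map((insight) => (
        <li key={insight.id}>{insight.summary}</li>
      ))}
      {!expanded && hiddenCount > 0 && (
        <li>
          <button onClick={() => setExpanded(true)}>
            View all ({insights.length})
          </button>
        </li>
      )}
    </ul>
  );
}
```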
3. Whitespace & Density
Target 40-60% content, with the remainder whitespace. Our dashboards were hitting 70%+ content before we codified this.
| Element | Spacing |
|---|---|
| Between major sections | 32px (gap-8) |
| Between cards/items | 16px (gap-4) |
| Within cards | 12-16px padding |
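One thing that helps keep these numbers honest is putting the scale in code rather than only in the document. A hypothetical sketch of what that could look like (the names and exact shape are illustrative, not our actual setup):

```ts
// Hypothetical shared spacing scale mirroring the table above.
// Keeping it in one module lets components and review checks reference
// the same values (Tailwind class plus pixel equivalent).
export const SPACING = {
  sectionGap:  { px: 32, className: "gap-8" }, // between major sections
  itemGap:     { px: 16, className: "gap-4" }, // between cards/items
  cardPadding: { px: 16, className: "p-4" },   // within cards (12-16px)
} as const;

export type SpacingToken = keyof typeof SPACING;
```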
4. Component Limits
Dashboard cards: 6-8 maximum, ideally 3-4 focused sections. List items: 5 max by default. AI suggestions: Top 1-2 prominent, rest behind "show more."
5. Layout Patterns
Single-column focus for task-oriented dashboards reduces cognitive load. Multi-column layouts are reserved for reference/comparison views and dense data tables.
When choosing between dialogs and sheets:
- Use dialogs for informationally dense content (wide reports, multi-section previews)
- Use sheets for quick previews that maintain context, as well as for settings and forms
6. Color & Visual Indicators
Use color sparingly and consistently:
- Red: Critical, error, destructive
- Yellow/Amber: Warning, needs attention
- Green: Success, healthy, complete
- Blue: Info, links, interactive
- Gray: Neutral, muted, secondary
Never use color alone to convey meaning -- always provide text labels or icons alongside color for accessibility.
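As a concrete illustration, a status indicator under this rule pairs color with an icon and a text label. A minimal sketch -- the component name and icon choices are illustrative, not our actual code:

```tsx
// Sketch of a status indicator that never relies on color alone:
// each status pairs a color with an icon and a text label.
type Status = "error" | "warning" | "success" | "info";

const STATUS_STYLES: Record<Status, { className: string; icon: string; label: string }> = {
  error:   { className: "text-red-600",   icon: "✕", label: "Error" },
  warning: { className: "text-amber-600", icon: "!", label: "Needs attention" },
  success: { className: "text-green-600", icon: "✓", label: "Healthy" },
  info:    { className: "text-blue-600",  icon: "i", label: "Info" },
};

export function StatusBadge({ status }: { status: Status }) {
  const { className, icon, label } = STATUS_STYLES[status];
  return (
    <span className={className}>
      <span aria-hidden="true">{icon} </span>
      {label}
    </span>
  );
}
```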
7. Empty States
Be helpful, not decorative. Explain what would appear here, provide clear action to populate, or hide empty sections entirely.
Good: "No items need attention" [Create your first item →]
Bad: 🎉 Big illustration "Nothing here yet!"
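The "good" example above translates into a trivially small component. A sketch, with hypothetical names:

```tsx
// Sketch of an actionable empty state: explain what belongs here and offer
// the action that populates it. Names are illustrative.
export function EmptyInsights({ onCreate }: { onCreate: () => void }) {
  return (
    <div className="flex flex-col items-center gap-4 py-8">
      <p className="text-gray-500">No items need attention</p>
      <button onClick={onCreate}>Create your first item →</button>
    </div>
  );
}
```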
8. Loading States
Use skeleton screens that match content shape instead of generic spinners. Show available content immediately and load heavy/secondary content after.
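A skeleton that mirrors the card it replaces is only a few lines. A rough sketch (dimensions are illustrative):

```tsx
// Sketch of a skeleton matching the shape of the content it stands in for,
// rather than a generic spinner.
export function InsightCardSkeleton() {
  return (
    <div className="flex flex-col gap-4 p-4 animate-pulse" aria-busy="true">
      <div className="h-4 w-1/3 rounded bg-gray-200" /> {/* title line */}
      <div className="h-3 w-full rounded bg-gray-200" /> {/* body line */}
      <div className="h-3 w-2/3 rounded bg-gray-200" /> {/* body line */}
    </div>
  );
}
```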
9. Navigation Principles
Don't duplicate -- if something is in the sidebar navigation, it doesn't need a dashboard card. Reduce clicks with direct links to specific items and deep linking to exact state.
10. Design Checklist
Before shipping any UI, verify:
- Can I remove any element without losing core value?
- Is the primary action obvious within 2 seconds?
- Are lists limited to 5 items or fewer by default?
- Is there enough whitespace between sections?
- Does it work without color (accessibility)?
These aren't arbitrary numbers -- they're constraints derived from studying Linear, Vercel, Stripe, and Shopify. And they're specific enough that an AI can evaluate them.
The Anti-Patterns We Avoid
Our principles document doesn't just say what to do -- it explicitly calls out what not to do:
| Anti-Pattern | Better Approach |
|---|---|
| Show everything at once | Progressive disclosure |
| Heavy card borders/shadows | Subtle backgrounds |
| Multiple CTAs competing | Single primary action |
| Decorative empty states | Actionable guidance |
| Color-only status | Color + icon/text |
| Spinner for all loading | Skeleton screens |
| Duplicate nav items | Single source of truth |
The agent knows these too. When it sees a page with three competing call-to-action buttons, it flags it. When it detects a status indicator using only color (accessibility violation), it reports it.
How Claude Code Interprets Design Principles
This is where it gets interesting. We have a dedicated design-review agent that uses Playwright to actually interact with our UI and evaluate it against our principles.
Here's how the workflow operates:
Phase 0: Preparation
The agent analyzes the PR description, reviews the code diff, and spins up a live preview environment. It sets the viewport to 1440x900 (our desktop baseline) and prepares for systematic evaluation.
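Under the hood this phase is ordinary Playwright. Here's a rough sketch of the setup, assuming a PREVIEW_URL environment variable pointing at the preview environment (the agent's real orchestration differs in detail):

```ts
// Rough sketch of the preparation step using Playwright's Node API.
import { chromium } from "playwright";

const PREVIEW_URL = process.env.PREVIEW_URL ?? "http://localhost:3000";

async function prepare() {
  const browser = await chromium.launch();
  const context = await browser.newContext({
    viewport: { width: 1440, height: 900 }, // our desktop baseline
  });
  const page = await context.newPage();
  await page.goto(PREVIEW_URL, { waitUntil: "networkidle" });
  return { browser, page };
}
```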
Phase 1: Interaction and User Flow
This isn't static analysis -- the agent actually clicks buttons, fills forms, and tests hover states. It executes the primary user flow and assesses:
- Are destructive actions properly confirmed?
- How does perceived performance feel?
- Do interactive states (hover, active, disabled) work correctly?
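For a flavor of what one of these checks looks like in Playwright, here's a sketch of confirming that a destructive action asks before it acts (the selectors are illustrative; the real agent derives them from the PR under review):

```ts
// Sketch: a destructive action should surface a confirmation dialog.
import type { Page } from "playwright";

async function checkDestructiveConfirmation(page: Page) {
  await page.getByRole("button", { name: "Delete" }).click();

  // A confirmation dialog should appear before anything is destroyed.
  const dialog = page.getByRole("dialog");
  if ((await dialog.count()) === 0) {
    console.warn("Destructive action executed without confirmation");
  }
}
```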
Phase 2: Responsiveness Testing
The agent resizes the browser to three viewports:
- Desktop (1440px)
- Tablet (768px)
- Mobile (375px)
At each breakpoint, it captures screenshots and verifies: no horizontal scrolling, no element overlap, layouts adapt appropriately.
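The sweep itself is a simple loop. A sketch, reusing a Playwright page like the one created in the setup above:

```ts
// Sketch of the breakpoint sweep: resize, screenshot, flag horizontal overflow.
import type { Page } from "playwright";

const BREAKPOINTS = [
  { name: "desktop", width: 1440 },
  { name: "tablet", width: 768 },
  { name: "mobile", width: 375 },
];

async function checkResponsiveness(page: Page) {
  for (const { name, width } of BREAKPOINTS) {
    await page.setViewportSize({ width, height: 900 });
    await page.screenshot({ path: `review-${name}.png`, fullPage: true });

    // Horizontal scrolling means something overflows the viewport.
    const overflows = await page.evaluate(
      () => document.documentElement.scrollWidth > document.documentElement.clientWidth
    );
    if (overflows) {
      console.warn(`Horizontal overflow at ${name} (${width}px)`);
    }
  }
}
```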
Phase 3: Visual Polish
Here's where our principles get evaluated directly:
- Layout alignment and spacing consistency (checking against our spacing scale)
- Typography hierarchy (is there clear visual priority?)
- Color palette consistency (are we using design tokens?)
- Visual hierarchy (does it guide the user's attention to the right place?)
Phase 4: Accessibility (WCAG 2.1 AA)
The agent tests:
- Complete keyboard navigation (Tab order)
- Visible focus states on all interactive elements
- Semantic HTML usage
- Form labels and associations
- Color contrast ratios (4.5:1 minimum)
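The automated portion of this pass can lean on axe. Here's a sketch using @axe-core/playwright -- treat the package choice and wiring as an assumption about one reasonable setup, not a description of our exact agent:

```ts
// Sketch of the automated accessibility slice: axe rules plus a basic keyboard check.
import type { Page } from "playwright";
import AxeBuilder from "@axe-core/playwright";

async function checkAccessibility(page: Page) {
  // WCAG 2.1 AA rules, including the 4.5:1 contrast minimum.
  const results = await new AxeBuilder({ page })
    .withTags(["wcag2a", "wcag2aa", "wcag21aa"])
    .analyze();

  for (const violation of results.violations) {
    console.warn(`${violation.id}: ${violation.help} (${violation.nodes.length} nodes)`);
  }

  // Keyboard pass: Tab should move focus onto a visible interactive element.
  await page.keyboard.press("Tab");
  const focusedTag = await page.evaluate(() => document.activeElement?.tagName ?? null);
  if (!focusedTag || focusedTag === "BODY") {
    console.warn("Tab did not move focus to an interactive element");
  }
}
```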
This catches things humans miss. I can't tell you how many times it's flagged a focus state that looks fine visually but doesn't meet contrast requirements.
Phase 5: Robustness Testing
Forms get filled with invalid inputs. Content gets stressed with overflow scenarios. Loading, empty, and error states are verified.
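A sketch of one such stress check in Playwright (selectors are illustrative; the real agent derives them from the PR under review):

```ts
// Sketch of the robustness pass: stress a form with invalid and oversized
// input and confirm the UI surfaces an error state rather than failing silently.
import type { Page } from "playwright";

async function stressForm(page: Page) {
  await page.getByLabel("Email").fill("not-an-email");
  await page.getByLabel("Name").fill("x".repeat(500)); // overflow scenario

  await page.getByRole("button", { name: "Save" }).click();

  // Expect a visible validation message.
  const error = page.getByRole("alert");
  if ((await error.count()) === 0) {
    console.warn("Invalid input was not surfaced to the user");
  }
}
```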
The Report
Everything gets categorized into:
- [Blocker]: Critical failures requiring immediate fix
- [High-Priority]: Significant issues to fix before merge
- [Medium-Priority]: Improvements for follow-up
- [Nitpick]: Minor aesthetic details
The agent provides screenshots as evidence. It starts with what works well. And crucially, it describes problems and their impact, not prescriptive solutions.
Instead of: "Change margin to 16px"
It says: "The spacing feels inconsistent with adjacent elements, creating visual clutter that violates the 32px section spacing guideline."
The difference matters. We retain agency over how to fix something while being clear about what the problem is.
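For concreteness, here's roughly the shape a finding takes. Field names are illustrative; the essentials are a severity, evidence, and an impact-oriented description rather than a prescribed fix:

```ts
// Sketch of the report structure. Field names are illustrative.
type Severity = "blocker" | "high-priority" | "medium-priority" | "nitpick";

interface Finding {
  severity: Severity;
  principle: string;      // e.g. "Whitespace & Density"
  description: string;    // the problem and its impact, not the fix
  screenshotPath: string; // evidence captured during the review
}

interface DesignReviewReport {
  worksWell: string[];    // the report leads with what's working
  findings: Finding[];
}
```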
The Research Feedback Loop
Here's the part that really changed our workflow: we can iterate on our design principles based on real-world research, and those changes propagate immediately.
We built a /mobbin-research command that uses AI-powered browser automation to scrape design patterns from Mobbin. When someone says "hey, I saw this interesting pattern in Notion's empty states," we can:
- Research it systematically across 10-20 examples
- Extract common patterns and platform distributions
- Generate insights and recommendations
- Propose updates to our design principles
The Human Element
I want to be clear: this system augments human judgment, it doesn't replace it.
Our design review agent produces a report. Sometimes we override its recommendations. Sometimes a "violation" is actually intentional -- a deliberate design choice that breaks convention for good reason.
The agent categorizes issues as blockers, high-priority, medium-priority, and nitpicks. It provides evidence. It explains impact. But the final call remains human.
It's not that the system makes me a better designer. It's that it makes design decisions tractable for someone who isn't a designer. That's a meaningful difference.
Getting Started
If you want to implement something similar, here's my advice:
1. Start with principles you can evaluate
"Design should be beautiful" is not enforceable. "Dashboard cards should be limited to 6-8 maximum" is. Write principles specific enough that you could explain the evaluation criteria to a junior developer.
2. Codify your principles in a single file
We use DESIGN_PRINCIPLES.md at the root of our repo. It's in version control. Changes are reviewed like code changes. This is your source of truth.
3. Build the review workflow
Create a design review agent or command that:
- Ingests your principles
- Spins up a live preview
- Tests against specific criteria
- Reports violations with evidence
4. Close the feedback loop
Research trends. Test new approaches. Update your principles. Revert what doesn't work. The system should evolve as you learn. Treat it like any other engineering system: if a change doesn't improve things, revert it.
5. Keep humans in the loop
The agent reports. Humans decide. Override when appropriate. Trust the process but trust judgment more.
What's Next
The industry is moving toward AI-powered, self-enforcing design systems. Design token coverage is skyrocketing. Living design systems are becoming the norm. The tools exist today to build systems that maintain themselves.
But the real innovation isn't the technology. It's the workflow. By treating design principles as code -- versioned, testable, deployable, revertable -- we've created a system that learns and evolves with us.
Here's what I've learned: you don't need to be a design expert to build a good-looking product. You need clear, specific principles that can be evaluated. You need a system that enforces them without bureaucracy. And you need the humility to accept feedback from an automated reviewer that catches what you miss.
I'll never have the intuition of a trained designer. I'll probably always spend too long picking colors. But with this system, I can ship interfaces that are consistent, accessible, and aligned with modern design patterns -- without waiting for approval from a design committee or guessing what section 4.3.2 of our design bible really means.
The 47-page design document nobody reads? That's the past. A living, breathing set of principles that enforce themselves while staying out of your way? That's the present. And it's more achievable than you might think.
Building something similar? Want to compare notes? Find me on LinkedIn at benjamin-gregory. Always happy to talk shop about design systems, AI-powered tooling, or why despite all this automation, picking the right shade of blue is still somehow the hardest part of the job.