
The Complete Edge Architecture Guide (Part 2): How We Fit an AI Platform in 128MB

Ben Gregory · Sep 5, 2025

This is Part 2 of our four-part series on building AI-powered applications on the edge. [Part 1](./part-1-architecture.md) covered the foundational Cloudflare architecture. In this post, we'll dive into our framework choice and memory optimization strategies. Part 3 explores [our sub-10ms AI pipeline implementation](./part-3-ai-pipeline.md), and Part 4 details [our journey from LangGraph to Mastra](./part-4-from-langgraph-to-mastra.md) for AI orchestration.

---

So you've decided to go all-in on Cloudflare Workers. You've got your R2 buckets, your queues, your Durable Objects. But now you need to actually build an API. And if you're like us, your first instinct is to reach for Express.

Here's the thing about running on the edge -- everything you know about Node.js frameworks goes out the window. That comfortable Express setup with its 572KB bundle size? It eats a substantial slice of your memory budget before you've written a single route. Fastify with its plugin ecosystem? Those plugins assume Node.js APIs that don't exist in Workers.

Why Framework Choice Matters on the Edge

When you're running in a traditional Node.js environment, framework overhead is a rounding error. Your server has gigabytes of RAM, persistent connections, and all the time in the world to initialize. Bundle size? Who cares when you have a 100Mbps connection to your CDN.

But on the edge, every kilobyte counts. Every millisecond of initialization time is multiplied across thousands of cold starts. Every dependency is code that has to be parsed, compiled, and held in memory -- memory you desperately need for your actual application.

Consider this: Cloudflare Workers gives you 128MB of memory. Total. For everything. Your framework, your application code, your in-flight requests, your temporary data structures. Everything.

Now consider that Express alone -- just the framework, no middleware, no actual application -- uses 120MB of memory when running. See the problem?

The Express Problem: When 572KB Is Too Much

Let's be specific about why Express (and Fastify, and Koa, and Hapi) don't work on the edge:

Bundle Size Reality Check

The Polyfill Tax

Since Workers aren't Node.js, you need polyfills for Node.js-specific APIs. But here's the thing -- those polyfills aren't free:
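
To make the tax concrete, here's a sample of the Node.js built-ins that Express and its dependencies reach for (an illustrative list, not an exhaustive audit):

```typescript
// None of these modules exist in the Workers runtime, so each one means a
// polyfill you bundle, parse, and hold in memory on every cold start.
import http from "node:http";                // the server primitives Express is built around
import { EventEmitter } from "node:events";  // app, request, and response are all emitters
import { Buffer } from "node:buffer";        // body parsing
import stream from "node:stream";            // request/response piping
import path from "node:path";                // static file and view resolution
```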

By the time you've polyfilled enough to make Express happy, you've added 200KB+ to your bundle. For functionality you don't even need.

The Memory Problem

Express was designed for long-running servers. It caches routes, maintains middleware state, and builds up internal data structures over time. In a serverless environment where every request might be a cold start, this is pure waste:
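
To illustrate (a sketch, not Kasava's code): every call below adds a layer or a cache entry to Express's in-memory router, state that a long-lived server amortizes over millions of requests but a Worker rebuilds from scratch on every cold start.

```typescript
import express from "express";

const handler = (req: express.Request, res: express.Response) => {
  res.json({ ok: true });
};

const app = express();
app.use(express.json());          // middleware layer held in the router stack
app.set("etag", "strong");        // settings cache
app.get("/users/:id", handler);   // compiled route regex cached internally

// On a traditional server this setup cost is paid once.
// In a Worker it is paid again on every cold start, then thrown away.
```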

Enter Hono: Built for the Edge

Hono takes a completely different approach. Instead of trying to be Node.js-compatible, it embraces web standards. Instead of bundling everything, it's modular. Instead of assuming persistent memory, it's stateless.

|              | Express          | Fastify          | Koa              | Hono      |
| ------------ | ---------------- | ---------------- | ---------------- | --------- |
| Bundle Size  | 572KB            | 189KB            | 90KB             | <14KB     |
| Memory Usage | 120MB            | 85MB             | 60MB             | 18MB      |
| Cold Start   | 450ms            | 380ms            | 210ms            | 120ms     |
| Dependencies | 50+              | 30+              | 20+              | 0         |
| Edge Workers | ❌ Needs adapter | ❌ Needs adapter | ❌ Needs adapter | ✅ Native |

But it's not just about being smaller. It's about being designed for this environment.

Web Standards First

Hono is built entirely on Web Standards APIs -- the same APIs that Cloudflare Workers implements natively:
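
Here's a minimal sketch of what that looks like in practice (illustrative routes, not our actual API):

```typescript
import { Hono } from "hono";

const app = new Hono();

// c.req wraps a standard Request; c.json() returns a standard Response.
app.get("/health", (c) => c.json({ status: "ok" }));

app.post("/echo", async (c) => {
  const body = await c.req.json(); // web-standard body parsing, no body-parser needed
  return new Response(JSON.stringify(body), {
    headers: { "content-type": "application/json" },
  });
});

// Workers invokes app.fetch(request, env, ctx) directly -- no adapter layer.
export default app;
```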

Zero Dependencies

Everything Hono needs is either part of the web standards (which Workers provides) or bundled in that tiny 14KB package. No dependency hell. No security vulnerabilities from nested dependencies. No surprises.

TypeScript Native

While Express requires @types/express (and prayers that they match the actual version), Hono is written in TypeScript:
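
A small example of what that buys you (illustrative route, not our code): params, context, and responses are typed without installing a separate types package.

```typescript
import { Hono } from "hono";

const app = new Hono();

app.get("/repos/:owner/:name", (c) => {
  const owner = c.req.param("owner"); // typed as string, no casting
  const name = c.req.param("name");
  return c.json({ owner, name });     // the JSON payload type is inferred end to end
});
```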

Cloudflare Bindings Integration

As if the efficiency benefits weren't enough: direct, type-safe access to all Cloudflare services.
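
A hedged sketch of what that looks like; the binding names (BUCKET, JOBS) are placeholders for illustration, not our actual configuration:

```typescript
import { Hono } from "hono";

// R2Bucket and Queue are ambient types from @cloudflare/workers-types.
type Bindings = {
  BUCKET: R2Bucket; // R2 object storage
  JOBS: Queue;      // Cloudflare Queues producer
};

const app = new Hono<{ Bindings: Bindings }>();

app.post("/uploads/:key", async (c) => {
  const key = c.req.param("key");
  await c.env.BUCKET.put(key, c.req.raw.body); // type-checked binding access
  await c.env.JOBS.send({ key });              // enqueue follow-up processing
  return c.json({ queued: true }, 202);
});
```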

Middleware Composition

Hono's middleware system is both powerful and efficient:
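
For example (an illustrative sketch): middleware is just an async function wrapped around next(), so a route only pays for the middleware mounted on its path.

```typescript
import { Hono } from "hono";

const app = new Hono();

// Timing middleware, applied to every matched route.
app.use("*", async (c, next) => {
  const start = Date.now();
  await next();
  c.res.headers.set("x-response-time", `${Date.now() - start}ms`);
});

// Scoped middleware: only routes under /admin/* ever execute this.
app.use("/admin/*", async (c, next) => {
  if (!c.req.header("authorization")) {
    return c.json({ error: "unauthorized" }, 401); // short-circuit before the handler
  }
  await next();
});
```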

Dynamic Loading: Managing Memory in a 128MB World

Here's a reality check -- Cloudflare Workers gives you 128MB of memory per execution. That's it. When you're running an AI orchestration framework (Mastra), handling 50+ route modules, and processing real-time data, that 128MB disappears fast.

So how do we make it work? Dynamic loading.

The Problem: Everything Everywhere All at Once

In a traditional Node.js app with Express, you'd import everything at startup:
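
Something like this (module names are illustrative):

```typescript
import express from "express";
import { chatRouter } from "./routes/chat";
import { billingRouter } from "./routes/billing";
import { analyticsRouter } from "./routes/analytics";
// ...dozens more route modules, plus the AI framework, all parsed at boot

const app = express();
app.use("/chat", chatRouter);
app.use("/billing", billingRouter);
app.use("/analytics", analyticsRouter);

app.listen(3000); // fine with gigabytes of RAM and one long-lived process
```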

Do this in Workers and you'll blow through your memory limit before handling a single request.

Our Solution: Load Only What You Need, When You Need It

We implemented a dynamic loading system that treats memory as the precious resource it is:

Per-Request AI Initialization

Here's the clever bit -- Mastra (our AI framework) isn't initialized globally. It's created on-demand, only for routes that actually need it:
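
Here's a hedged sketch of the pattern rather than our exact code; the module path and the createMastraForRequest helper are hypothetical names:

```typescript
import { Hono } from "hono";
import type { MiddlewareHandler } from "hono";

type AIEnv = { Variables: { mastra: unknown } };

// Middleware that AI routes opt into. The dynamic import means Mastra's code
// isn't parsed or held in memory unless a request actually reaches an AI route.
const withMastra: MiddlewareHandler<AIEnv> = async (c, next) => {
  const { createMastraForRequest } = await import("./ai/mastra"); // hypothetical module
  c.set("mastra", await createMastraForRequest(c.env));
  await next();
  // No teardown: the instance is garbage-collected along with the request.
};

const chat = new Hono<AIEnv>();
chat.use("*", withMastra);

chat.post("/messages", async (c) => {
  const mastra = c.get("mastra"); // request-scoped, never global
  // ...run the agent/workflow with the request payload here...
  return c.json({ ok: true });
});
```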

The Route Module System

Our route module system is configuration-driven and optimized for edge constraints:

Route Configuration
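
An illustrative configuration map; the shape and flags are assumptions based on the route list below, not our exact schema:

```typescript
import type { Hono } from "hono";

type RouteConfig = {
  path: string;                            // mount prefix
  load: () => Promise<{ default: Hono }>;  // lazy module loader
  needsAI: boolean;                        // whether this group initializes Mastra
};

export const routes: RouteConfig[] = [
  { path: "/auth", load: () => import("./routes/auth"), needsAI: false },
  { path: "/billing", load: () => import("./routes/billing"), needsAI: false },
  { path: "/chat", load: () => import("./routes/chat"), needsAI: true },
  { path: "/bug-analysis", load: () => import("./routes/bug-analysis"), needsAI: true },
  // ...the remaining modules follow the same shape
];
```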

The Loading Strategy
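
A sketch of the strategy under that configuration (hypothetical code): a module is only imported the first time a request matches its prefix, and each lazily loaded sub-app is assumed to declare its routes under the same prefix (for example via basePath).

```typescript
import { Hono } from "hono";
import { routes } from "./route-config"; // the RouteConfig[] from the sketch above

const app = new Hono();

for (const route of routes) {
  app.all(`${route.path}/*`, async (c) => {
    // Dynamic import on first hit; subsequent requests reuse the cached module.
    const mod = await route.load();
    return mod.default.fetch(c.req.raw, c.env, c.executionCtx);
  });
}

export default app;
```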

Routes That Skip AI Initialization

Most routes don't need Mastra at all:

  • Authentication (`/auth/*`): JWT validation, session management

  • Organizations (`/organizations/*`): CRUD operations

  • Repositories (`/repositories/*`): GitHub configuration

  • Billing (`/billing/*`): Stripe integration

  • API Keys (`/api-keys/*`): Key management

  • Notifications (`/notifications/*`): Preference management

  • Health Checks (`/health/*`): System status

Only these routes initialize Mastra:

  • Chat (`/chat/*`): AI-powered conversations

  • Bug Analysis (`/bug-analysis/*`): Chrome extension analysis

  • Analytics Enrichment (`/v1/analytics-enrichment/*`): AI insights

  • GitHub Workflow (queue consumers): Event processing

Production Patterns That Emerged

After six months in production, here are the patterns that actually work:

1. Lazy Everything

If a dependency isn't needed on the hot path, it gets a dynamic import. Heavy modules are loaded the first time a request actually needs them, never at startup.

2. Request-Scoped Initialization

Expensive objects like the Mastra instance live for a single request and are garbage-collected with it; nothing heavyweight sits in global scope.

3. Granular Module Boundaries

Each route group is its own small module with its own imports, so loading one endpoint never drags in another endpoint's dependencies.

4. Memory-Aware Caching

Anything cached at module scope stays small and bounded -- unbounded caches are how you meet the 128MB limit the hard way. (A combined sketch of these patterns follows below.)
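
A combined sketch of the lazy-loading and bounded-caching patterns; the module path, names, and limits are illustrative, not production values:

```typescript
// Pattern 1: the heavy dependency is created on first use, not at startup.
let tokenizerPromise: Promise<unknown> | null = null;

export function getTokenizer() {
  tokenizerPromise ??= import("./lib/tokenizer").then((m) => m.create()); // hypothetical module
  return tokenizerPromise;
}

// Pattern 4: module-scope caches are small and bounded so memory stays predictable.
const MAX_ENTRIES = 100;
const cache = new Map<string, string>();

export function remember(key: string, value: string) {
  if (cache.size >= MAX_ENTRIES) {
    cache.delete(cache.keys().next().value as string); // evict oldest (Map keeps insertion order)
  }
  cache.set(key, value);
}
```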

Lessons Learned

After building a production AI platform with Hono and dynamic loading, here's what we learned:

1. Constraints Drive Innovation

The 128MB limit seemed impossible at first. But it forced us to build a better architecture than we would have with unlimited memory. Every constraint became an opportunity to optimize.

2. Not All Frameworks Are Created Equal 

Framework choice matters exponentially more on the edge than on traditional servers. The difference between Hono and Express isn't incremental -- it's the difference between working and not working.

3. Dynamic Loading Is a Superpower

Loading code on-demand isn't just about saving memory. It's about:

  • Faster cold starts (less to parse)

  • Better isolation (routes don't interfere)

  • Easier debugging (smaller surface area)

  • Natural code splitting (enforced boundaries)

4. Type Safety Enables Velocity

Hono's TypeScript-first design with CloudflareEnv bindings caught so many bugs at compile time. When your bindings are typed, your routes are typed, and your responses are typed, you ship with confidence.

5. Web Standards Are the Future

By building on Request/Response instead of Node.js APIs, our code is portable. We could move to Deno, Bun, or any future runtime that implements web standards. No lock-in.

The Bottom Line

Hono + dynamic loading let us fit an entire AI platform -- 50+ endpoints, real-time chat, workflow orchestration, semantic search -- into 128MB of memory with sub-50ms response times globally.

Could we have built this with Express? No. The memory constraints alone would have killed the project.

Could we have built it with another edge framework? Maybe, but none match Hono's combination of performance, size, and developer experience.

The future of web development isn't just about moving to the edge. It's about embracing the constraints of the edge and using frameworks designed for this new world. For us, that framework is Hono.

---

Next in the Series


**[Part 3: How We Built a Sub-10ms AI Pipeline on the Edge](./part-3-ai-pipeline.md)** - Dive deep into our AI infrastructure implementation, featuring Voyage AI embeddings, pgvector for semantic search, and the economic implications of edge computing for AI startups.

**[Part 4: From LangGraph to Mastra - Our AI Orchestration Journey](./part-4-from-langgraph-to-mastra.md)** - Learn why we migrated from LangGraph to Mastra for AI workflow orchestration, and how this TypeScript-first framework transformed our development velocity.


---

_Want to see Hono in action? Check out our [open-source code](https://github.com/kasava/kasava) or read more about [our parallel indexing architecture](/blog/parallel-indexing-architecture) that processes 10,000+ files in under 5 minutes._