The Complete Edge Architecture Guide (Part 2): How We Fit an AI Platform in 128MB
Jane Cooper • Sep 5, 2025
This is Part 2 of our four-part series on building AI-powered applications on the edge. [Part 1](./part-1-architecture.md) covered the foundational Cloudflare architecture. In this post, we'll dive into our framework choice and memory optimization strategies. Part 3 explores [our sub-10ms AI pipeline implementation](./part-3-ai-pipeline.md), and Part 4 details [our journey from LangGraph to Mastra](./part-4-from-langgraph-to-mastra.md) for AI orchestration.
---
So you've decided to go all-in on Cloudflare Workers. You've got your R2 buckets, your queues, your Durable Objects. But now you need to actually build an API. And if you're like us, your first instinct is to reach for Express.
Here's the thing about running on the edge -- everything you know about Node.js frameworks goes out the window. That comfortable Express setup with its 572KB bundle size? It's going to cause a large memory overhead once you add your actual application code. Fastify with its plugin ecosystem? Those plugins assume Node.js APIs that don't exist in Workers.
Why Framework Choice Matters on the Edge
When you're running in a traditional Node.js environment, framework overhead is a rounding error. Your server has gigabytes of RAM, persistent connections, and all the time in the world to initialize. Bundle size? Who cares when you have a 100Mbps connection to your CDN.
But on the edge, every kilobyte counts. Every millisecond of initialization time is multiplied across thousands of cold starts. Every dependency is code that has to be parsed, compiled, and held in memory -- memory you desperately need for your actual application.
Consider this: Cloudflare Workers gives you 128MB of memory. Total. For everything. Your framework, your application code, your in-flight requests, your temporary data structures. Everything.
Now consider that Express alone -- just the framework, no middleware, no actual application -- uses 120MB of memory when running. See the problem?
The Express Problem: When 572KB Is Too Much
Let's be specific about why Express (and Fastify, and Koa, and Hapi) don't work on the edge:
Bundle Size Reality Check
// What you think you're importing
import express from 'express';
// What you're actually getting
// - 572KB of minified JavaScript
// - 50+ dependencies
// - Node.js-specific code that needs polyfills
// - Middleware system that assumes persistent memory
// - Router that wasn't built for cold starts
The Polyfill Tax
Since Workers aren't Node.js, you need polyfills for Node.js-specific APIs. But here's the thing -- those polyfills aren't free:
// Express needs these Node.js APIs
import { Buffer } from 'buffer'; // +45KB
import { Stream } from 'stream'; // +30KB
import { EventEmitter } from 'events'; // +12KB
import process from 'process'; // +25KB
// ... and 20 more
By the time you've polyfilled enough to make Express happy, you've added 200KB+ to your bundle. For functionality you don't even need.
The Memory Problem
Express was designed for long-running servers. It caches routes, maintains middleware state, and builds up internal data structures over time. In a serverless environment where every request might be a cold start, this is pure waste:
// Express internals (simplified)
class Express {
  constructor() {
    this._router = new Router(); // Complex routing table
    this.cache = {};             // Route cache
    this.engines = {};           // Template engines
    this.settings = {};          // App settings
    this.locals = {};            // App-level variables
    // ... lots more state
  }
}

// Memory usage after initialization: ~120MB
// Memory actually needed for a single request: ~5MB
Enter Hono: Built for the Edge
Hono takes a completely different approach. Instead of trying to be Node.js-compatible, it embraces web standards. Instead of bundling everything, it's modular. Instead of assuming persistent memory, it's stateless.
| | Express | Fastify | Koa | Hono |
|---|---|---|---|---|
| Bundle Size | 572KB | 189KB | 90KB | <14KB |
| Memory Usage | 120MB | 85MB | 60MB | 18MB |
| Cold Start | 450ms | 380ms | 210ms | 120ms |
| Dependencies | 50+ | 30+ | 20+ | 0 |
| Edge Workers | ❌ Needs adapter | ❌ Needs adapter | ❌ Needs adapter | ✅ Native |
But it's not just about being smaller. It's about being designed for this environment.
Web Standards First
Hono is built entirely on Web Standards APIs -- the same APIs that Cloudflare Workers implements natively:
// Express way (needs polyfills)
app.get('/users/:id', (req, res) => {
const id = req.params.id;
res.json({ id });
});
// Hono way (pure web standards)
app.get('/users/:id', (c) => {
const id = c.req.param('id');
return c.json({ id });
});
Zero Dependencies
Everything Hono needs is either part of the web standards (which Workers provides) or bundled in that tiny 14KB package. No dependency hell. No security vulnerabilities from nested dependencies. No surprises.
TypeScript Native
While Express requires `@types/express` (and prayers that they match the actual version), Hono is written in TypeScript:
// Full type safety out of the box
import { Hono } from 'hono';
import type { CloudflareEnv } from './types';

const app = new Hono<{
  Bindings: CloudflareEnv;
  Variables: {
    user: AuthUser;
    correlationId: string;
  };
}>();

// TypeScript knows about your bindings!
app.get('/data', async (c) => {
  const data = await c.env.MY_KV.get('key'); // ✅ Fully typed
  const user = c.get('user'); // ✅ Type: AuthUser
  return c.json({ data, user }); // ✅ Response type inferred
});
Cloudflare Bindings Integration
As if the efficiency benefits weren't enough: direct, type-safe access to all Cloudflare services.
// Type definitions for our bindings
interface CloudflareEnv {
// Databases
DB: D1Database;
// Storage
KASAVA_DOCUMENTS: R2Bucket;
KASAVA_RECORDINGS: R2Bucket;
// Queues
GITHUB_EVENT_QUEUE: Queue;
INDEXING_QUEUE: Queue;
// KV Namespaces
SESSION_CACHE: KVNamespace;
EMBEDDING_CACHE: KVNamespace;
// Durable Objects
CHAT_SESSIONS: DurableObjectNamespace;
// Secrets
ANTHROPIC_API_KEY: string;
VOYAGE_API_KEY: string;
}
// Using bindings in routes
app.post('/documents/upload', async (c) => {
const file = await c.req.blob();
// Direct R2 access - no configuration needed!
await c.env.KASAVA_DOCUMENTS.put(
`docs/${crypto.randomUUID()}`,
file,
{
httpMetadata: {
contentType: file.type,
}
}
);
return c.json({ success: true });
});
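For a quick end-to-end check, here's a minimal client-side sketch for exercising that endpoint. The URL assumes a local `wrangler dev` session on its default port; adjust for your deployment:

```typescript
// Hedged sketch: uploading a document to the route above.
// http://localhost:8787 is wrangler dev's default address (an assumption here).
const file = new Blob(['hello, edge'], { type: 'text/plain' });

const res = await fetch('http://localhost:8787/documents/upload', {
  method: 'POST',
  body: file, // fetch sets Content-Type from the Blob's type
});

console.log(await res.json()); // { success: true }
```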
Middleware Composition
Hono's middleware system is both powerful and efficient:
// Custom middleware for API key tracking
const apiKeyTracking = (): MiddlewareHandler => {
return async (c, next) => {
const apiKey = c.req.header('X-API-Key');
    if (apiKey) {
      // Track usage in KV
      const key = `usage:${apiKey}:${new Date().toISOString().split('T')[0]}`;
      const count = await c.env.API_USAGE.get(key);
      await c.env.API_USAGE.put(key, String(parseInt(count || '0', 10) + 1));
}
await next();
};
};
// Compose middleware for specific routes
api.use('/v1/*', apiKeyTracking());
api.use('/v1/*', validateApiKey());
api.use('/v1/*', checkRateLimit());
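`validateApiKey()` and `checkRateLimit()` are our own middleware. As one illustration, here's a minimal sketch of a fixed-window rate limiter backed by KV; the `RATE_LIMIT` namespace and the 60-requests-per-minute default are assumptions for the example, not our production values:

```typescript
import type { MiddlewareHandler } from 'hono';

// Hedged sketch: fixed-window rate limiting keyed on API key + minute.
// RATE_LIMIT is a hypothetical KVNamespace binding.
const checkRateLimit = (
  limit = 60
): MiddlewareHandler<{ Bindings: { RATE_LIMIT: KVNamespace } }> => {
  return async (c, next) => {
    const apiKey = c.req.header('X-API-Key') ?? 'anonymous';
    const minute = Math.floor(Date.now() / 60_000);
    const key = `rate:${apiKey}:${minute}`;

    const current = parseInt((await c.env.RATE_LIMIT.get(key)) || '0', 10);
    if (current >= limit) {
      return c.json({ error: 'Too Many Requests' }, 429);
    }

    // expirationTtl lets KV clean the counter up after the window passes
    await c.env.RATE_LIMIT.put(key, String(current + 1), { expirationTtl: 120 });
    await next();
  };
};
```

Because KV is eventually consistent across locations, a counter like this is approximate; for hard guarantees you'd move the counter into a Durable Object.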
Dynamic Loading: Managing Memory in a 128MB World
Here's a reality check -- Cloudflare Workers gives you 128MB of memory per execution. That's it. When you're running an AI orchestration framework (Mastra), handling 50+ route modules, and processing real-time data, that 128MB disappears fast.
So how do we make it work? Dynamic loading.
The Problem: Everything Everywhere All at Once
In a traditional Node.js app with Express, you'd import everything at startup:
// Traditional approach - loads EVERYTHING
import { auth } from './routes/auth';
import { billing } from './routes/billing';
import { chat } from './routes/chat';
import { analytics } from './routes/analytics';
import { repositories } from './routes/repositories';
import { organizations } from './routes/organizations';
import { Mastra } from '@mastra/core';
// ... 50 more imports

const app = express();
const mastra = new Mastra({ /* config */ });

// Mount all routes
app.use('/auth', auth);
app.use('/billing', billing);
// ... etc

// Memory usage: 80MB+ just from imports!
Do this in Workers and you'll blow through your memory limit before handling a single request.
Our Solution: Load Only What You Need, When You Need It
We implemented a dynamic loading system that treats memory as the precious resource it is:
// Route configuration with metadata
export const API_ROUTES: RouteConfig[] = [
  {
    path: '/auth',
    module: '@/routes/api/auth/auth.route',
    export: 'auth',
    requiresAI: false, // No Mastra needed
  },
  {
    path: '/chat',
    module: '@/routes/api/chat/chat.route',
    export: 'chat',
    requiresAI: true, // Initialize Mastra
  },
  {
    path: '/repositories',
    module: '@/routes/api/repositories/repositories.route',
    export: 'repositories',
    requiresAI: false,
  },
  // ... 50+ routes, only 4 need AI
];
// Dynamic route loader
export async function loadRouteModule(modulePath: string): Promise<any> {
  // Check cache first
  if (routeCache.has(modulePath)) {
    return routeCache.get(modulePath);
  }

  // Static imports for Workers compatibility
  // (Dynamic string imports don't work in Workers)
  let module;
  switch (modulePath) {
    case '@/routes/api/auth/auth.route':
      module = await import('@/routes/api/auth/auth.route');
      break;
    case '@/routes/api/chat/chat.route':
      module = await import('@/routes/api/chat/chat.route');
      break;
    // ... other routes
  }

  // Cache for this request lifecycle
  if (module) {
    routeCache.set(modulePath, module);
  }
  return module;
}
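The loader above assumes a `routeCache` declared elsewhere; a minimal sketch is just a Map. Declared at module scope in Workers, it survives for the isolate's lifetime, so warm requests to the same route skip the import entirely:

```typescript
// Hedged sketch: cache of already-imported route modules.
// Module scope in Workers means this persists per isolate,
// not per request, so each route module is imported at most once.
const routeCache = new Map<string, unknown>();
```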
Per-Request AI Initialization
Here's the clever bit -- Mastra (our AI framework) isn't initialized globally. It's created on-demand, only for routes that actually need it:
// Mastra instance management
let mastraInstance: Mastra | undefined;
export async function ensureMastraInitialized(env: Env): Promise<Mastra> {
  if (!mastraInstance) {
    // Lightweight instance creation
    mastraInstance = new Mastra({
      executionEngine: 'event-based', // 40% performance boost
      logger: false, // Disable heavy logging
      systemHostURL: env.MASTRA_SYSTEM_HOST_URL,
      // Only initialize what we need
      workflows: {
        chat: () => import('./workflows/chat'),
        github: () => import('./workflows/github'),
      },
    });
  }
  return mastraInstance;
}
// In routes that DON'T need AI (most of them!)
// auth.route.ts
export const auth = new Hono()
.get('/session', async (c) => {
// No Mastra initialization here!
const user = await getUser(c.env);
    return c.json({ user });
  });
// In routes that DO need AI
// chat.route.ts
export const chat = new Hono()
.post('/stream', async (c) => {
// Initialize Mastra only when needed
const mastra = await ensureMastraInitialized(c.env);
const workflow = mastra.getWorkflow('chat');
// ... use AI features
});
The Route Module System
Our route module system is configuration-driven and optimized for edge constraints:
Route Configuration
// routes.config.ts
interface RouteConfig {
  path: string; // URL path prefix
module: string; // Module to import
export: string; // Named export to use
requiresAI?: boolean; // Needs Mastra?
requiresAuth?: boolean; // Needs authentication?
rateLimit?: number; // Requests per minute
}
export const API_ROUTES: RouteConfig[] = [
// Authentication & User Management
{ path: '/auth', module: 'auth/auth.route', export: 'auth' },
{ path: '/users', module: 'users/users.route', export: 'users' },
// Core Platform Features (no AI needed)
{ path: '/organizations', module: 'organizations/organizations.route', export: 'organizations' },
{ path: '/repositories', module: 'repositories/repositories.route', export: 'repositories' },
{ path: '/billing', module: 'billing/billing.route', export: 'billing' },
{ path: '/api-keys', module: 'api-keys/api-keys.route', export: 'apiKeys' },
// AI-Powered Features (initialize Mastra)
{ path: '/chat', module: 'chat/chat.route', export: 'chat', requiresAI: true },
{ path: '/bug-analysis', module: 'bug-analysis/bug-analysis.route', export: 'bugAnalysis', requiresAI: true },
{ path: '/analytics-enrichment', module: 'analytics/enrichment.route', export: 'enrichment', requiresAI: true },
// Webhook Handlers
{ path: '/webhooks/github', module: 'webhooks/github.route', export: 'githubWebhooks' },
{ path: '/webhooks/stripe', module: 'webhooks/stripe.route', export: 'stripeWebhooks' },
];
The Loading Strategy
// Route handler with intelligent loading
app.all('/*', async (c) => {
const startTime = Date.now();
const path = c.req.path;
// Find matching route configuration
const routeConfig = API_ROUTES.find(r => path.startsWith(r.path));
if (!routeConfig) { return c.notFound(); }
try {
// Load only the specific route module needed
const module = await loadRouteModule(routeConfig.module);
const router = module[routeConfig.export];
// Initialize AI only if needed
if (routeConfig.requiresAI) {
await ensureMastraInitialized(c.env);
}
// Apply route-specific middleware
if (routeConfig.requiresAuth) {
const authResult = await authenticate(c);
if (!authResult.success) {
return c.json({ error: 'Unauthorized' }, 401);
}
}
// Execute the route handler
const response = await router.fetch(c.req.raw, c.env, c.executionCtx);
// Log performance metrics
const duration = Date.now() - startTime;
console.log(`[route:${path}] completed in ${duration}ms`);
return response;
} catch (error) {
console.error(`[route:${path}] error:`, error);
return c.json({ error: 'Internal Server Error' }, 500);
}
});
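The `authenticate()` helper is out of scope for this series, but a minimal sketch looks something like the following. `verifyJwt` is a hypothetical helper, and the `SESSION_CACHE` binding matches the interface from earlier:

```typescript
import type { Context } from 'hono';

interface AuthResult {
  success: boolean;
  userId?: string;
}

// Hedged sketch: bearer-token check with a short-lived KV session cache.
async function authenticate(
  c: Context<{ Bindings: CloudflareEnv }>
): Promise<AuthResult> {
  const header = c.req.header('Authorization');
  if (!header?.startsWith('Bearer ')) {
    return { success: false };
  }
  const token = header.slice('Bearer '.length);

  // Cache hit: skip signature verification entirely
  const cached = await c.env.SESSION_CACHE.get(`session:${token}`);
  if (cached) {
    return { success: true, userId: cached };
  }

  const payload = await verifyJwt(token, c.env); // hypothetical JWT helper
  if (!payload) {
    return { success: false };
  }

  await c.env.SESSION_CACHE.put(`session:${token}`, payload.sub, {
    expirationTtl: 300, // re-verify at most every five minutes
  });
  return { success: true, userId: payload.sub };
}
```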
Routes That Skip AI Initialization
Most routes don't need Mastra at all:
- Authentication (`/auth/*`): JWT validation, session management
- Organizations (`/organizations/*`): CRUD operations
- Repositories (`/repositories/*`): GitHub configuration
- Billing (`/billing/*`): Stripe integration
- API Keys (`/api-keys/*`): Key management
- Notifications (`/notifications/*`): Preference management
- Health Checks (`/health/*`): System status
Only these routes initialize Mastra:
- Chat (`/chat/*`): AI-powered conversations
- Bug Analysis (`/bug-analysis/*`): Chrome extension analysis
- Analytics Enrichment (`/v1/analytics-enrichment/*`): AI insights
- GitHub Workflow (queue consumers): Event processing (see the sketch below)
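The same lazy pattern works outside HTTP, too: a queue consumer can initialize Mastra on its first batch. Here's a sketch, assuming the `ensureMastraInitialized` helper from earlier; the workflow invocation shown is illustrative, not Mastra's exact API surface:

```typescript
// Hedged sketch: queue consumer that initializes Mastra lazily.
export default {
  async queue(batch: MessageBatch, env: Env): Promise<void> {
    // Pay the Mastra initialization cost only when a batch arrives
    const mastra = await ensureMastraInitialized(env);
    const workflow = mastra.getWorkflow('github');

    for (const message of batch.messages) {
      // Illustrative call; the real invocation depends on your Mastra version
      await workflow.execute({ event: message.body });
      message.ack();
    }
  },
};
```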
Production Patterns That Emerged
After six months in production, here are the patterns that actually work:
1. Lazy Everything
// Don't do this
import { heavyLibrary } from 'heavy-library';
// Do this
const getHeavyLibrary = async () => {
const { heavyLibrary } = await import('heavy-library');
return heavyLibrary;
};
// Use only when needed
app.post('/process', async (c) => {
const lib = await getHeavyLibrary();
// Now use it
});
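One subtlety worth knowing: ES module imports are cached per isolate, so `getHeavyLibrary()` pays the parse-and-compile cost once per isolate, not once per request. You get lazy loading on cold paths without re-import overhead on warm ones.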
2. Request-Scoped Initialization
// Not global state
let globalMastra: Mastra; // ❌ Persists across requests
// Request-scoped state
app.use('*', async (c, next) => {
// Fresh for each request
c.set('requestId', crypto.randomUUID());
c.set('startTime', Date.now());
await next();
});
3. Granular Module Boundaries
// Instead of one giant module
// ❌ repositories.route.ts (500 lines)
// Break it down
// ✅ repositories
// list.route.ts (50 lines)
// get.route.ts (30 lines)
// create.route.ts (60 lines)
// update.route.ts (40 lines)
// delete.route.ts (30 lines)
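The split files then compose back into a single router. A sketch, with hypothetical file and export names matching the layout above:

```typescript
// repositories/index.route.ts -- hypothetical composition of the split files
import { Hono } from 'hono';
import { list } from './list.route';
import { get } from './get.route';
import { create } from './create.route';
import { update } from './update.route';
import { del } from './delete.route';

// Each sub-router mounts at the same base path; Hono merges their routes
export const repositories = new Hono()
  .route('/', list)
  .route('/', get)
  .route('/', create)
  .route('/', update)
  .route('/', del);
```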
4. Memory-Aware Caching
// Simple in-memory cache with size limits
class BoundedCache<T> {
  private cache = new Map<string, T>();
  private maxSize: number;

  constructor(maxSize = 100) {
    this.maxSize = maxSize;
  }

  set(key: string, value: T) {
    // Evict oldest if at capacity
    if (this.cache.size >= this.maxSize) {
      const firstKey = this.cache.keys().next().value;
      if (firstKey !== undefined) {
        this.cache.delete(firstKey);
      }
    }
    this.cache.set(key, value);
  }

  get(key: string): T | undefined {
    return this.cache.get(key);
  }
}
// Use bounded caches for request-scoped data
const embeddingCache = new BoundedCache<Float32Array>(50);
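Note the eviction here is FIFO, not LRU: a JavaScript Map iterates in insertion order, so the first key inserted is the first evicted. If you want LRU behavior, delete and re-insert the key inside `get()` so recently read entries move to the back of the eviction line.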
Lessons Learned
After building a production AI platform with Hono and dynamic loading, here's what we learned:
1. Constraints Drive Innovation
The 128MB limit seemed impossible at first. But it forced us to build a better architecture than we would have with unlimited memory. Every constraint became an opportunity to optimize.
2. Not All Frameworks Are Created Equal
Framework choice matters exponentially more on the edge than on traditional servers. The difference between Hono and Express isn't incremental -- it's the difference between working and not working.
3. Dynamic Loading Is a Superpower
Loading code on-demand isn't just about saving memory. It's about:
- Faster cold starts (less to parse)
- Better isolation (routes don't interfere)
- Easier debugging (smaller surface area)
- Natural code splitting (enforced boundaries)
4. Type Safety Enables Velocity
Hono's TypeScript-first design with CloudflareEnv bindings caught so many bugs at compile time. When your bindings are typed, your routes are typed, and your responses are typed, you ship with confidence.
5. Web Standards Are the Future
By building on Request/Response instead of Node.js APIs, our code is portable. We could move to Deno, Bun, or any future runtime that implements web standards. No lock-in.
The Bottom Line
Hono + dynamic loading let us fit an entire AI platform -- 50+ endpoints, real-time chat, workflow orchestration, semantic search -- into 128MB of memory with sub-50ms response times globally.
Could we have built this with Express? No. The memory constraints alone would have killed the project.
Could we have built it with another edge framework? Maybe, but none match Hono's combination of performance, size, and developer experience.
The future of web development isn't just about moving to the edge. It's about embracing the constraints of the edge and using frameworks designed for this new world. For us, that framework is Hono.
---
Next in the Series
**[Part 3: How We Built a Sub-10ms AI Pipeline on the Edge](./part-3-ai-pipeline.md)** - Dive deep into our AI infrastructure implementation, featuring Voyage AI embeddings, pgvector for semantic search, and the economic implications of edge computing for AI startups.
**[Part 4: From LangGraph to Mastra - Our AI Orchestration Journey](./part-4-from-langgraph-to-mastra.md)** - Learn why we migrated from LangGraph to Mastra for AI workflow orchestration, and how this TypeScript-first framework transformed our development velocity.
---
_Want to see Hono in action? Check out our [open-source code](https://github.com/kasava/kasava) or read more about [our parallel indexing architecture](/blog/parallel-indexing-architecture) that processes 10,000+ files in under 5 minutes._