The Complete Edge Architecture Guide (Part 2): How We Fit an AI Platform in 128MB

Sep 5, 2025

This is Part 2 of our four-part series on building AI-powered applications on the edge. [Part 1](./part-1-architecture.md) covered the foundational Cloudflare architecture. In this post, we'll dive into our framework choice and memory optimization strategies. Part 3 explores [our sub-10ms AI pipeline implementation](./part-3-ai-pipeline.md), and Part 4 details [our journey from LangGraph to Mastra](./part-4-from-langgraph-to-mastra.md) for AI orchestration.

——

So you've decided to go all-in on Cloudflare Workers. You've got your R2 buckets, your queues, your Durable Objects. But now you need to actually build an API. And if you're like us, your first instinct is to reach for Express.

Here's the thing about running on the edge -- everything you know about Node.js frameworks goes out the window. That comfortable Express setup with its 572KB bundle size? It's going to cause a large memory overhead once you add your actual application code. Fastify with its plugin ecosystem? Those plugins assume Node.js APIs that don't exist in Workers.

Why Framework Choice Matters on the Edge

When you're running in a traditional Node.js environment, framework overhead is a rounding error. Your server has gigabytes of RAM, persistent connections, and all the time in the world to initialize. Bundle size? Who cares when you have a 100Mbps connection to your CDN.

But on the edge, every kilobyte counts. Every millisecond of initialization time is multiplied across thousands of cold starts. Every dependency is code that has to be parsed, compiled, and held in memory -- memory you desperately need for your actual application.

Consider this: Cloudflare Workers gives you 128MB of memory. Total. For everything. Your framework, your application code, your in-flight requests, your temporary data structures. Everything.
Now consider that Express alone -- just the framework, no middleware, no actual application -- uses 120MB of memory when running. See the problem?

The Express Problem: When 572KB Is Too Much

Let's be specific about why Express (and Fastify, and Koa, and Hapi) don't work on the edge:

Bundle Size Reality Check

```typescript
// What you think you're importing
import express from 'express';

// What you're actually getting
// - 572KB of minified JavaScript
// - 50+ dependencies
// - Node.js-specific code that needs polyfills
// - Middleware system that assumes persistent memory
// - Router that wasn't built for cold starts
```


The Polyfill Tax

Since Workers aren't Node.js, you need polyfills for Node.js-specific APIs. But here's the thing -- those polyfills aren't free:

```typescript
// Express needs these Node.js APIs
import { Buffer } from 'buffer';       // +45KB
import { Stream } from 'stream';       // +30KB
import { EventEmitter } from 'events'; // +12KB
import process from 'process';         // +25KB
// ... and 20 more
```

By the time you've polyfilled enough to make Express happy, you've added 200KB+ to your bundle. For functionality you don't even need.

The Memory Problem

Express was designed for long-running servers. It caches routes, maintains middleware state, and builds up internal data structures over time. In a serverless environment where every request might be a cold start, this is pure waste:

```javascript
// Express internals (simplified)
class Express {
  constructor() {
    this._router = new Router(); // Complex routing table
    this.cache = {};             // Route cache
    this.engines = {};           // Template engines
    this.settings = {};          // App settings
    this.locals = {};            // App-level variables
    // ... lots more state
  }
}

// Memory usage after initialization: ~120MB
// Memory actually needed for a single request: ~5MB
```

Enter Hono: Built for the Edge

Hono takes a completely different approach. Instead of trying to be Node.js-compatible, it embraces web standards. Instead of bundling everything, it's modular. Instead of assuming persistent memory, it's stateless.


|  | Express | Fastify | Koa | Hono |
|---|---|---|---|---|
| Bundle Size | 572KB | 189KB | 90KB | <14KB |
| Memory Usage | 120MB | 85MB | 60MB | 18MB |
| Cold Start | 450ms | 380ms | 210ms | 120ms |
| Dependencies | 50+ | 30+ | 20+ | 0 |
| Edge Workers | ❌ Needs adapter | ❌ Needs adapter | ❌ Needs adapter | Native |

But it's not just about being smaller. It's about being designed for this environment.

Web Standards First

Hono is built entirely on Web Standards APIs -- the same APIs that Cloudflare Workers implements natively:

```typescript
// Express way (needs polyfills)
app.get('/users/:id', (req, res) => {
  const id = req.params.id;
  res.json({ id });
});

// Hono way (pure web standards)
app.get('/users/:id', (c) => {
  const id = c.req.param('id');
  return c.json({ id });
});
```
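Stripped of the framework, the Hono handler above is a function from a standard `Request` to a standard `Response`. The sketch below shows that contract using only Fetch API globals, with no Hono import; `handleGetUser` and its regex path match are hypothetical and only approximate what `c.req.param('id')` and `c.json()` do, not how Hono implements them.

```typescript
// A framework-free handler: standard Request in, standard Response out.
// Hypothetical illustration; Hono's router and Context do this (and more) for you.
function handleGetUser(request: Request): Response {
  const { pathname } = new URL(request.url);

  // Roughly what a '/users/:id' pattern matches: exactly one segment after /users/
  const match = /^\/users\/([^\/]+)$/.exec(pathname);
  if (request.method !== "GET" || match === null) {
    return new Response("Not Found", { status: 404 });
  }

  const id = match[1];

  // Roughly what c.json({ id }) produces: a JSON body with a JSON content type
  return new Response(JSON.stringify({ id }), {
    status: 200,
    headers: { "content-type": "application/json; charset=UTF-8" },
  });
}
```

Because the contract is plain `Request`/`Response`, a handler like this can be exercised in a test by constructing a `Request` directly, with no server or adapter in between.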


Zero Dependencies

Everything Hono needs is either part of the web standards (which Workers provides) or bundled in that tiny 14KB package. No dependency hell. No security vulnerabilities from nested dependencies. No surprises.

TypeScript Native

While Express requires @types/express (and prayers that they match the actual version), Hono is written in TypeScript:

```typescript
// Full type safety out of the box
import { Hono } from 'hono';
import type { CloudflareEnv } from './types';

const app = new Hono<{
  Bindings: CloudflareEnv;
  Variables: {
    user: AuthUser;
    correlationId: string;
  };
}>();

// TypeScript knows about your bindings!
app.get('/data', async (c) => {
  const data = await c.env.MY_KV.get('key'); // ✅ Fully typed
  const user = c.get('user');                // ✅ Type: AuthUser
  return c.json({ data, user });             // ✅ Response type inferred
});
```
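The typing works by threading the `Bindings` and `Variables` maps through generics. Below is a simplified model of that idea, not Hono's actual `Context` implementation; `TypedContext`, `AppVariables`, the `REGION` binding and the `AuthUser` fields are all hypothetical.

```typescript
// Hypothetical shapes standing in for the ones in the example above.
interface AuthUser {
  id: string;
  email: string;
}

interface AppVariables {
  user: AuthUser;
  correlationId: string;
}

// A context whose get/set are typed by a key-to-type map.
class TypedContext<Bindings, Variables> {
  readonly env: Bindings;
  private readonly vars = new Map<keyof Variables, unknown>();

  constructor(env: Bindings) {
    this.env = env;
  }

  set<K extends keyof Variables>(key: K, value: Variables[K]): void {
    this.vars.set(key, value);
  }

  get<K extends keyof Variables>(key: K): Variables[K] | undefined {
    // Safe cast: set() only ever stores a Variables[K] under key K.
    return this.vars.get(key) as Variables[K] | undefined;
  }
}

const ctx = new TypedContext<{ REGION: string }, AppVariables>({ REGION: "auto" });
ctx.set("user", { id: "u_1", email: "dev@example.com" });
ctx.set("correlationId", "req-123");

const user = ctx.get("user"); // inferred as AuthUser | undefined
// ctx.set("user", "not a user"); // would not compile: string is not an AuthUser
// ctx.get("missing");            // would not compile: not a key of AppVariables
```

The compile-time checks come entirely from the `K extends keyof Variables` constraint; at runtime this is just a `Map`.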


Cloudflare Bindings Integration

As if the efficiency benefits weren't enough: direct, type-safe access to all Cloudflare services.

```typescript
// Type definitions for our bindings
interface CloudflareEnv {
  // Databases
  DB: D1Database;

  // Storage
  KASAVA_DOCUMENTS: R2Bucket;
  KASAVA_RECORDINGS: R2Bucket;

  // Queues
  GITHUB_EVENT_QUEUE: Queue;
  INDEXING_QUEUE: Queue;

  // KV Namespaces
  SESSION_CACHE: KVNamespace;
  EMBEDDING_CACHE: KVNamespace;

  // Durable Objects
  CHAT_SESSIONS: DurableObjectNamespace;

  // Secrets
  ANTHROPIC_API_KEY: string;
  VOYAGE_API_KEY: string;
}

// Using bindings in routes
app.post('/documents/upload', async (c) => {
  const file = await c.req.blob();
  // Direct R2 access through the binding: no SDK client or credentials in code
  await c.env.KASAVA_DOCUMENTS.put(
    `docs/${crypto.randomUUID()}`,
    file,
    {
      httpMetadata: {
        contentType: file.type,
      },
    },
  );
  return c.json({ success: true });
});
```
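Because a binding is just a typed property on `env`, route logic can be unit-tested by passing in a stand-in object. This sketch assumes a hypothetical `DocumentStore` interface covering only the slice of `R2Bucket.put` the upload route uses (key, body, `httpMetadata.contentType`); it is not the full R2 API, and `storeDocument` is an illustrative helper.

```typescript
// Hypothetical minimal slice of the storage API the upload route relies on.
interface PutOptions {
  httpMetadata?: { contentType?: string };
}

interface DocumentStore {
  put(key: string, value: Blob, options?: PutOptions): Promise<unknown>;
}

// In-memory stand-in for tests: records each body and its content type.
class InMemoryDocumentStore implements DocumentStore {
  readonly objects = new Map<string, { body: Blob; contentType?: string }>();

  async put(key: string, value: Blob, options?: PutOptions): Promise<unknown> {
    this.objects.set(key, { body: value, contentType: options?.httpMetadata?.contentType });
    return { key };
  }
}

// Illustrative helper mirroring the route body: generate a key, store the blob.
async function storeDocument(store: DocumentStore, file: Blob): Promise<string> {
  const key = `docs/${crypto.randomUUID()}`;
  await store.put(key, file, { httpMetadata: { contentType: file.type } });
  return key;
}
```

In production the real `c.env.KASAVA_DOCUMENTS` would be passed where the stub goes; the route code does not need to know which one it received.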

Middleware Composition

Hono middleware is just an async function that receives the context and a `next` callback, so custom middleware stays small and composes cleanly:

```typescript
import type { MiddlewareHandler } from 'hono';

// Custom middleware for API key tracking
const apiKeyTracking = (): MiddlewareHandler => {
  return async (c, next) => {
    const apiKey = c.req.header('X-API-Key');
    if (apiKey) {
      // Track daily usage in KV (one key per API key per UTC day).
      // KV has no atomic increment, so concurrent requests can drop counts.
      const key = `usage:${apiKey}:${new Date().toISOString().split('T')[0]}`;
      const count = await c.env.API_USAGE.get(key);
      await c.env.API_USAGE.put(key, String(parseInt(count ?? '0', 10) + 1));
    }
    await next();
  };
};

// Compose middleware for specific routes
api.use('/v1/*', apiKeyTracking());
api.use('/v1/*', validateApiKey());
api.use('/v1/*', checkRateLimit());
```
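Hono runs middleware in onion order: code before `await next()` runs on the way in, and code after it runs on the way out, in reverse registration order. The `compose` function below is a minimal sketch of that ordering over a plain context object, not Hono's own implementation; `Middleware` and `tracer` are illustrative names.

```typescript
type Next = () => Promise<void>;
type Middleware<C> = (ctx: C, next: Next) => Promise<void>;

// Minimal onion-style composition: middleware i calls next() to run i + 1.
function compose<C>(middleware: Middleware<C>[]): (ctx: C) => Promise<void> {
  return (ctx) => {
    let lastIndex = -1;
    const dispatch = async (i: number): Promise<void> => {
      // Guard against a middleware calling next() more than once.
      if (i <= lastIndex) throw new Error("next() called multiple times");
      lastIndex = i;
      const handler = middleware[i];
      if (handler === undefined) return; // end of the chain
      await handler(ctx, () => dispatch(i + 1));
    };
    return dispatch(0);
  };
}

// Records when each middleware runs before and after calling next().
const tracer = (label: string): Middleware<{ trace: string[] }> =>
  async (ctx, next) => {
    ctx.trace.push(`${label}:before`);
    await next();
    ctx.trace.push(`${label}:after`);
  };
```

Applied to the three `api.use('/v1/*', ...)` registrations above, this ordering means `apiKeyTracking` runs before `validateApiKey`, so requests carrying an invalid key are still counted; registering validation first would avoid that.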

Dynamic Loading: Managing Memory in a 128MB World

Here's a reality check -- Cloudflare Workers gives you 128MB of memory per isolate, shared by every request that isolate is handling. That's it. When you're running an AI orchestration framework (Mastra), handling 50+ route modules, and processing real-time data, that 128MB disappears fast.

So how do we make it work? Dynamic loading.

The Problem: Everything Everywhere All at Once

In a traditional Node.js app with Express, you'd import everything at startup:

```typescript
// Traditional approach - loads EVERYTHING
import express from 'express';
import { auth } from './routes/auth';
import { billing } from './routes/billing';
import { chat } from './routes/chat';
import { analytics } from './routes/analytics';
import { repositories } from './routes/repositories';
import { organizations } from './routes/organizations';
import { Mastra } from '@mastra/core';
// ... 50 more imports

const app = express();
const mastra = new Mastra({ /* config */ });

// Mount all routes
app.use('/auth', auth);
app.use('/billing', billing);
// ... etc

// Memory usage: 80MB+ just from imports!
```

Do this in Workers and you'll blow through your memory limit before handling a single request.

Our Solution: Load Only What You Need, When You Need It

We implemented a dynamic loading system that treats memory as the precious resource it is:

```typescript
// Route configuration with metadata
export const API_ROUTES: RouteConfig[] = [
  {
    path: '/auth',
    module: '@/routes/api/auth/auth.route',
    export: 'auth',
    requiresAI: false, // No Mastra needed
  },
  {
    path: '/chat',
    module: '@/routes/api/chat/chat.route',
    export: 'chat',
    requiresAI: true, // Initialize Mastra
  },
  {
    path: '/repositories',
    module: '@/routes/api/repositories/repositories.route',
    export: 'repositories',
    requiresAI: false,
  },
  // ... 50+ routes, only 4 need AI
];

// Module-level cache: survives across requests served by the same isolate
const routeCache = new Map<string, any>();

// Dynamic route loader
export async function loadRouteModule(modulePath: string): Promise<any> {
  // Check cache first
  if (routeCache.has(modulePath)) {
    return routeCache.get(modulePath);
  }

  // Each import() takes a string literal so the bundler can include the module
  // (an import() with a computed specifier can't be resolved in Workers)
  let module;
  switch (modulePath) {
    case '@/routes/api/auth/auth.route':
      module = await import('@/routes/api/auth/auth.route');
      break;
    case '@/routes/api/chat/chat.route':
      module = await import('@/routes/api/chat/chat.route');
      break;
    // ... other routes
  }

  // Cache the loaded module for later requests
  if (module) {
    routeCache.set(modulePath, module);
  }

  return module;
}
```
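The `switch` exists because each `import()` needs a string literal the bundler can resolve. The same constraint can be met with a lookup table of loader thunks, and caching the promise rather than the resolved module means two concurrent requests for a cold route share a single load. A sketch under assumptions: `ROUTE_LOADERS` is hypothetical, and its stand-in loaders return plain objects where real code would call `import()` with a literal path.

```typescript
type RouteModule = Record<string, unknown>;
type Loader = () => Promise<RouteModule>;

// Counts real loads so the caching behaviour is observable.
let loadCount = 0;

// Hypothetical table. In a Worker each value would be `() => import('<literal path>')`
// so the bundler can include the module; stand-ins keep this sketch runnable.
const ROUTE_LOADERS = new Map<string, Loader>([
  ["auth/auth.route", async () => { loadCount++; return { auth: "auth-router" }; }],
  ["chat/chat.route", async () => { loadCount++; return { chat: "chat-router" }; }],
]);

// Cache the promise, not the resolved module, so concurrent callers share one load.
const moduleCache = new Map<string, Promise<RouteModule>>();

function loadRouteModule(modulePath: string): Promise<RouteModule> {
  const cached = moduleCache.get(modulePath);
  if (cached) return cached;

  const loader = ROUTE_LOADERS.get(modulePath);
  if (!loader) {
    // Unknown paths fail loudly instead of resolving to undefined.
    return Promise.reject(new Error(`Unknown route module: ${modulePath}`));
  }

  const pending = loader().catch((error: unknown) => {
    moduleCache.delete(modulePath); // a failed load can be retried by a later request
    throw error;
  });
  moduleCache.set(modulePath, pending);
  return pending;
}
```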

On-Demand AI Initialization

Here's the clever bit -- Mastra (our AI framework) isn't initialized at startup. It's created lazily, the first time a route that actually needs it runs, and then reused by later requests served by the same isolate:

```typescript
// Mastra instance management
let mastraInstance: Mastra | undefined;

export async function ensureMastraInitialized(env: Env): Promise<Mastra> {
  if (!mastraInstance) {
    // Lightweight instance creation
    mastraInstance = new Mastra({
      executionEngine: 'event-based', // 40% performance boost
      logger: false,                  // Disable heavy logging
      systemHostURL: env.MASTRA_SYSTEM_HOST_URL,
      // Only initialize what we need
      workflows: {
        chat: () => import('./workflows/chat'),
        github: () => import('./workflows/github'),
      },
    });
  }
  return mastraInstance;
}

// In routes that DON'T need AI (most of them!)

// auth.route.ts
export const auth = new Hono()
  .get('/session', async (c) => {
    // No Mastra initialization here!
    const user = await getUser(c.env);
    return c.json({ user });
  });

// In routes that DO need AI

// chat.route.ts
export const chat = new Hono()
  .post('/stream', async (c) => {
    // Initialize Mastra only when needed
    const mastra = await ensureMastraInitialized(c.env);
    const workflow = mastra.getWorkflow('chat');
    // ... use AI features
  });
```
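`ensureMastraInitialized` is a lazy singleton scoped to the isolate. That is safe while construction stays synchronous, but if initialization ever awaits something, two concurrent requests can both find the slot empty and build two instances. A generic sketch of the common fix, caching the promise and clearing it on failure; `lazyAsync` is a hypothetical helper, and the factory here builds a plain object rather than a `Mastra` instance.

```typescript
// Wraps an async factory so it runs at most once per isolate (until it fails).
function lazyAsync<T>(factory: () => Promise<T>): () => Promise<T> {
  let pending: Promise<T> | undefined;
  return () => {
    if (!pending) {
      pending = factory().catch((error: unknown) => {
        pending = undefined; // allow a retry on the next call
        throw error;
      });
    }
    return pending;
  };
}

// Illustrative factory: counts how many times it actually runs.
let factoryRuns = 0;
const getEngine = lazyAsync(async () => {
  factoryRuns++;
  return { createdAt: Date.now() };
});
```

Every caller receives the same promise, so concurrent requests await one initialization instead of racing to create their own.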

The Route Module System

Our route module system is configuration-driven and optimized for edge constraints:

Route Configuration

```typescript
// routes.config.ts
interface RouteConfig {
  path: string;           // URL path prefix
  module: string;         // Module to import
  export: string;         // Named export to use
  requiresAI?: boolean;   // Needs Mastra?
  requiresAuth?: boolean; // Needs authentication?
  rateLimit?: number;     // Requests per minute
}

export const API_ROUTES: RouteConfig[] = [
  // Authentication & User Management
  { path: '/auth', module: 'auth/auth.route', export: 'auth' },
  { path: '/users', module: 'users/users.route', export: 'users' },

  // Core Platform Features (no AI needed)
  { path: '/organizations', module: 'organizations/organizations.route', export: 'organizations' },
  { path: '/repositories', module: 'repositories/repositories.route', export: 'repositories' },
  { path: '/billing', module: 'billing/billing.route', export: 'billing' },
  { path: '/api-keys', module: 'api-keys/api-keys.route', export: 'apiKeys' },

  // AI-Powered Features (initialize Mastra)
  { path: '/chat', module: 'chat/chat.route', export: 'chat', requiresAI: true },
  { path: '/bug-analysis', module: 'bug-analysis/bug-analysis.route', export: 'bugAnalysis', requiresAI: true },
  { path: '/analytics-enrichment', module: 'analytics/enrichment.route', export: 'enrichment', requiresAI: true },

  // Webhook Handlers
  { path: '/webhooks/github', module: 'webhooks/github.route', export: 'githubWebhooks' },
  { path: '/webhooks/stripe', module: 'webhooks/stripe.route', export: 'stripeWebhooks' },
];
```

The Loading Strategy

```typescript
// Route handler with intelligent loading
app.all('/*', async (c) => {
  const startTime = Date.now();
  const path = c.req.path;

  // Find the matching route configuration on a segment boundary
  // (so '/auth' matches '/auth/session' but not '/authors')
  const routeConfig = API_ROUTES.find(
    (r) => path === r.path || path.startsWith(`${r.path}/`),
  );
  if (!routeConfig) {
    return c.notFound();
  }

  try {
    // Load only the specific route module needed
    const module = await loadRouteModule(routeConfig.module);
    const router = module[routeConfig.export];

    // Initialize AI only if needed
    if (routeConfig.requiresAI) {
      await ensureMastraInitialized(c.env);
    }

    // Apply route-specific middleware
    if (routeConfig.requiresAuth) {
      const authResult = await authenticate(c);
      if (!authResult.success) {
        return c.json({ error: 'Unauthorized' }, 401);
      }
    }

    // Execute the route handler
    const response = await router.fetch(c.req.raw, c.env, c.executionCtx);

    // Log performance metrics
    const duration = Date.now() - startTime;
    console.log(`[route:${path}] completed in ${duration}ms`);
    return response;
  } catch (error) {
    console.error(`[route:${path}] error:`, error);
    return c.json({ error: 'Internal Server Error' }, 500);
  }
});
```
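`API_ROUTES.find` returns the first entry whose prefix matches, so once prefixes nest (say, a hypothetical `/webhooks` entry listed ahead of `/webhooks/github`) the result depends on array order. Here is an order-independent sketch that picks the longest prefix matching on whole path segments; `matchRoute` and the trimmed `RouteEntry` type are illustrative.

```typescript
interface RouteEntry {
  path: string;
  export: string;
}

// True when `prefix` matches `path` on a segment boundary:
// "/auth" matches "/auth" and "/auth/session", but not "/authors".
function matchesPrefix(path: string, prefix: string): boolean {
  return path === prefix || path.startsWith(`${prefix}/`);
}

// Pick the most specific (longest) matching prefix, regardless of array order.
function matchRoute<T extends RouteEntry>(routes: readonly T[], path: string): T | undefined {
  let best: T | undefined;
  for (const route of routes) {
    if (matchesPrefix(path, route.path) && (!best || route.path.length > best.path.length)) {
      best = route;
    }
  }
  return best;
}
```

A linear scan is fine at 50-odd routes; the point is that adding a route can no longer silently shadow a more specific one.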

Routes That Skip AI Initialization

Most routes don't need Mastra at all:

  • Authentication (`/auth/*`): JWT validation, session management

  • Organizations (`/organizations/*`): CRUD operations

  • Repositories (`/repositories/*`): GitHub configuration

  • Billing (`/billing/*`): Stripe integration

  • API Keys (`/api-keys/*`): Key management

  • Notifications (`/notifications/*`): Preference management

  • Health Checks (`/health/*`): System status

Only these routes initialize Mastra:

  • Chat (`/chat/*`): AI-powered conversations

  • Bug Analysis (`/bug-analysis/*`): Chrome extension analysis

  • Analytics Enrichment (`/v1/analytics-enrichment/*`): AI insights

  • GitHub Workflow (queue consumers): Event processing


Production Patterns That Emerged

After six months in production, here are the patterns that actually work:

1. Lazy Everything

```typescript
// Don't do this
import { heavyLibrary } from 'heavy-library';

// Do this
const getHeavyLibrary = async () => {
  const { heavyLibrary } = await import('heavy-library');
  return heavyLibrary;
};

// Use only when needed
app.post('/process', async (c) => {
  const lib = await getHeavyLibrary();
  // Now use it
});
```

2. Request-Scoped Initialization

```typescript
// Not global state
let globalMastra: Mastra; // ❌ Module scope persists across requests in the same isolate

// Request-scoped state
app.use('*', async (c, next) => {
  // Fresh for each request
  c.set('requestId', crypto.randomUUID());
  c.set('startTime', Date.now());
  await next();
});
```

3. Granular Module Boundaries

```
// Instead of one giant module
// ❌ repositories.route.ts (500 lines)

// Break it down
// ✅ repositories/
//     list.route.ts    (50 lines)
//     get.route.ts     (30 lines)
//     create.route.ts  (60 lines)
//     update.route.ts  (40 lines)
//     delete.route.ts  (30 lines)
```

4. Memory-Aware Caching

```typescript
// Simple in-memory cache with a size limit (evicts the oldest inserted entry)
class BoundedCache<T> {
  private cache = new Map<string, T>();
  private maxSize: number;

  constructor(maxSize = 100) {
    this.maxSize = maxSize;
  }

  set(key: string, value: T) {
    // Evict the oldest entry only when adding a new key at capacity
    if (!this.cache.has(key) && this.cache.size >= this.maxSize) {
      const firstKey = this.cache.keys().next().value;
      if (firstKey !== undefined) {
        this.cache.delete(firstKey);
      }
    }
    this.cache.set(key, value);
  }

  get(key: string): T | undefined {
    return this.cache.get(key);
  }
}

// Module-level caches live as long as the isolate, so keep them bounded
const embeddingCache = new BoundedCache<Float32Array>(50);
```
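`BoundedCache` evicts in insertion order (FIFO), so a frequently read entry is evicted as readily as a cold one. If hot entries should survive, a `Map` can double as an LRU list: it iterates keys in insertion order, and deleting then re-inserting a key moves it to the end. A sketch; `LruCache` is a hypothetical name.

```typescript
// Least-recently-used cache built on Map's insertion-order iteration.
class LruCache<T> {
  private readonly entries = new Map<string, T>();
  private readonly maxSize: number;

  constructor(maxSize = 100) {
    if (maxSize < 1) throw new RangeError("maxSize must be at least 1");
    this.maxSize = maxSize;
  }

  get(key: string): T | undefined {
    if (!this.entries.has(key)) return undefined;
    const value = this.entries.get(key) as T;
    // Re-insert to mark this key as most recently used.
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  set(key: string, value: T): void {
    // Updating an existing key refreshes it instead of evicting another entry.
    this.entries.delete(key);
    if (this.entries.size >= this.maxSize) {
      // The first key in iteration order is the least recently used one.
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
    this.entries.set(key, value);
  }

  get size(): number {
    return this.entries.size;
  }
}
```

The trade-off is one extra delete and insert per read; for a 50-entry embedding cache that cost is negligible next to a network round trip.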


Lessons Learned

After building a production AI platform with Hono and dynamic loading, here's what we learned:

1. Constraints Drive Innovation

The 128MB limit seemed impossible at first. But it forced us to build a better architecture than we would have with unlimited memory. Every constraint became an opportunity to optimize.

2. Not All Frameworks Are Created Equal

Framework choice matters exponentially more on the edge than on traditional servers. The difference between Hono and Express isn't incremental -- it's the difference between working and not working.

3. Dynamic Loading Is a Superpower

Loading code on-demand isn't just about saving memory. It's about:

  • Faster cold starts (less to parse)

  • Better isolation (routes don't interfere)

  • Easier debugging (smaller surface area)

  • Natural code splitting (enforced boundaries)

4. Type Safety Enables Velocity

Hono's TypeScript-first design with CloudflareEnv bindings caught so many bugs at compile time. When your bindings are typed, your routes are typed, and your responses are typed, you ship with confidence.

5. Web Standards Are the Future

By building on Request/Response instead of Node.js APIs, our HTTP layer is portable. It could move to Deno, Bun, or any future runtime that implements web standards; only the Cloudflare-specific bindings (R2, KV, D1, Queues) would need replacing.


The Bottom Line

Hono + dynamic loading let us fit an entire AI platform -- 50+ endpoints, real-time chat, workflow orchestration, semantic search -- into 128MB of memory with sub-50ms response times globally.

Could we have built this with Express? No. The memory constraints alone would have killed the project.

Could we have built it with another edge framework? Maybe, but none match Hono's combination of performance, size, and developer experience.

The future of web development isn't just about moving to the edge. It's about embracing the constraints of the edge and using frameworks designed for this new world. For us, that framework is Hono.

——

Next in the Series


**[Part 3: How We Built a Sub-10ms AI Pipeline on the Edge](./part-3-ai-pipeline.md)** - Dive deep into our AI infrastructure implementation, featuring Voyage AI embeddings, pgvector for semantic search, and the economic implications of edge computing for AI startups.
**[Part 4: From LangGraph to Mastra - Our AI Orchestration Journey](./part-4-from-langgraph-to-mastra.md)** - Learn why we migrated from LangGraph to Mastra for AI workflow orchestration, and how this TypeScript-first framework transformed our development velocity.


——

_Want to see Hono in action? Check out our [open-source code](https://github.com/kasava/kasava) or read more about [our parallel indexing architecture](/blog/parallel-indexing-architecture) that processes 10,000+ files in under 5 minutes._

This is Part 2 of our four-part series on building AI-powered applications on the edge. [Part 1](./part-1-architecture.md) covered the foundational Cloudflare architecture. In this post, we'll dive into our framework choice and memory optimization strategies. Part 3 explores [our sub-10ms AI pipeline implementation](./part-3-ai-pipeline.md), and Part 4 details [our journey from LangGraph to Mastra](./part-4-from-langgraph-to-mastra.md) for AI orchestration.

——

So you've decided to go all-in on Cloudflare Workers. You've got your R2 buckets, your queues, your Durable Objects. But now you need to actually build an API. And if you're like us, your first instinct is to reach for Express.

Here's the thing about running on the edge -- everything you know about Node.js frameworks goes out the window. That comfortable Express setup with its 572KB bundle size? It's going to cause a large memory overhead once you add your actual application code. Fastify with its plugin ecosystem? Those plugins assume Node.js APIs that don't exist in Workers.

Why Framework Choice Matters on the Edge

When you're running in a traditional Node.js environment, framework overhead is a rounding error. Your server has gigabytes of RAM, persistent connections, and all the time in the world to initialize. Bundle size? Who cares when you have a 100Mbps connection to your CDN.

But on the edge, every kilobyte counts. Every millisecond of initialization time is multiplied across thousands of cold starts. Every dependency is code that has to be parsed, compiled, and held in memory -- memory you desperately need for your actual application.

Consider this: Cloudflare Workers gives you 128MB of memory. Total. For everything. Your framework, your application code, your in-flight requests, your temporary data structures. Everything.
Now consider that Express alone -- just the framework, no middleware, no actual application -- uses 120MB of memory when running. See the problem?

The Express Problem: When 572KB Is Too Much

Let's be specific about why Express (and Fastify, and Koa, and Hapi) don't work on the edge:

Bundle Size Reality Check

// What you think you're importing
import express from 'express';

// What you're actually getting
// - 572KB of minified JavaScript
// - 50+ dependencies
// - Node.js-specific code that needs polyfills
// - Middleware system that assumes persistent memory
// - Router that wasn't built for cold starts


The Polyfill Tax

Since Workers aren't Node.js, you need polyfills for Node.js-specific APIs. But here's the thing -- those polyfills aren't free:

// Express needs these Node.js APIs
import { Buffer } from 'buffer';  // +45KB
import { Stream } from 'stream';   // +30KB  
import { EventEmitter } from 'events'; // +12KB
import process from 'process';     // +25KB// ... and 20 more

By the time you've polyfilled enough to make Express happy, you've added 200KB+ to your bundle. For functionality you don't even need.

The Memory Problem

Express was designed for long-running servers. It caches routes, maintains middleware state, and builds up internal data structures over time. In a serverless environment where every request might be a cold start, this is pure waste:

// Express internals (simplified)
class Express {  constructor() {    
  this._router = new Router();  // Complex routing table    this.cache = {};              // Route cache    this.engines = {};            // Template engines    this.settings = {};           // App settings    this.locals = {};             // App-level variables    // ... lots more state  }}

// Memory usage after initialization: ~120MB// Memory actually needed for a single request: ~5MB```

Enter Hono: Built for the Edge

Hono takes a completely different approach. Instead of trying to be Node.js-compatible, it embraces web standards. Instead of bundling everything, it's modular. Instead of assuming persistent memory, it's stateless.


Express

Fastify

Koa

Hono

Bundle Size

572KB

189KB

90KB

<14KB

Memory Usage

120MB

85MB

60MB

18MB

Cold Start

450ms

380ms

210ms

120ms

Dependencies

50+

30+

20+

0

Edge Workers

❌ Needs adapter

❌ Needs adapter

❌ Needs adapter

Native

But it's not just about being smaller. It's about being designed for this environment.

Web Standards First

Hono is built entirely on Web Standards APIs -- the same APIs that Cloudflare Workers implements natively:

// Express way (needs polyfills)
app.get('/users/:id', (req, res) => {  
  const id = req.params.id;  
  res.json({ id });
});

// Hono way (pure web standards)
app.get('/users/:id', (c) => { 
  const id = c.req.param('id');  
  return c.json({ id });
});


Zero Dependencies

Everything Hono needs is either part of the web standards (which Workers provides) or bundled in that tiny 14KB package. No dependency hell. No security vulnerabilities from nested dependencies. No surprises.

TypeScript Native

While Express requires @types/express (and prayers that they match the actual version), Hono is written in TypeScript:

// Full type safety out of the boximport { Hono } from 'hono';import type { CloudflareEnv } from './types';
const app = new Hono<{  Bindings: CloudflareEnv;  Variables: {    user: AuthUser;    correlationId: string;  };}>();
// TypeScript knows about your bindings!app.get('/data', async (c) => {  const data = await c.env.MY_KV.get('key'); // ✅ Fully typed  const user = c.get('user');                // ✅ Type: AuthUser  return c.json({ data, user });             // ✅ Response type inferred});


Cloudflare Bindings Integration

As if the efficiency benefits weren't enough: direct, type-safe access to all Cloudflare services.

// Type definitions for our bindings
interface CloudflareEnv {  

// Databases  
DB: D1Database;   

// Storage  
KASAVA_DOCUMENTS: R2Bucket; 
KASAVA_RECORDINGS: R2Bucket;   

// Queues  
GITHUB_EVENT_QUEUE: Queue; 
INDEXING_QUEUE: Queue; 

// KV Namespaces  
SESSION_CACHE: KVNamespace; 
EMBEDDING_CACHE: KVNamespace; 

// Durable Objects  
CHAT_SESSIONS: DurableObjectNamespace;  
  
// Secrets  
ANTHROPIC_API_KEY: string; 
VOYAGE_API_KEY: string;
}

// Using bindings in routes
app.post('/documents/upload', async (c) => {
  const file = await c.req.blob(); 
  // Direct R2 access - no configuration needed!
  await c.env.KASAVA_DOCUMENTS.put( 
    `docs/${crypto.randomUUID()}`,  
    file,   
    {  
      httpMetadata: {  
        contentType: file.type,
      }  
    } 
  );  
  return c.json({ success: true });
});

Middleware Composition

Hono's middleware system is both powerful and efficient:

// Custom middleware for API key tracking
const apiKeyTracking = (): MiddlewareHandler => {
  return async (c, next) => {    
    const apiKey = c.req.header('X-API-Key');    
    if (apiKey) { // Track usage in KV     
      const key = `usage:${apiKey}:${new Date().toISOString().split('T')[0]}`; 
      const count = await c.env.API_USAGE.get(key);     
      await c.env.API_USAGE.put(key, String((parseInt(count || '0') + 1)));   
    }      
    await next(); 
  };
};

// Compose middleware for specific routes
api.use('/v1/*', apiKeyTracking());
api.use('/v1/*', validateApiKey());
api.use('/v1/*', checkRateLimit());

Dynamic Loading: Managing Memory in a 128MB World

Here's a reality check -- Cloudflare Workers gives you 128MB of memory per execution. That's it. When you're running an AI orchestration framework (Mastra), handling 50+ route modules, and processing real-time data, that 128MB disappears fast.

So how do we make it work? Dynamic loading.

The Problem: Everything Everywhere All at Once

In a traditional Node.js app with Express, you'd import everything at startup:

// Traditional approach - loads EVERYTHING
import { auth } from './routes/auth';
import { billing } from './routes/billing';
import { chat } from './routes/chat';
import { analytics } from './routes/analytics';
import { repositories } from './routes/repositories';
import { organizations } from './routes/organizations';
import { Mastra } from '@mastra/core'; // ... 50 more imports
const app = express();const mastra = new Mastra({ /* config */ });

// Mount all routes
app.use('/auth', auth);app.use('/billing', billing); // ... etc

// Memory usage: 80MB+ just from imports!

Do this in Workers and you'll blow through your memory limit before handling a single request.

Our Solution: Load Only What You Need, When You Need It

We implemented a dynamic loading system that treats memory as the precious resource it is:

// Route configuration with metadata
export const API_ROUTES: RouteConfig[] = [ 
  {  
    path: '/auth', 
    module: '@/routes/api/auth/auth.route',   
    export: 'auth',    requiresAI: false,  // No Mastra needed  },  {    path: '/chat',    module: '@/routes/api/chat/chat.route',     export: 'chat',    requiresAI: true,   // Initialize Mastra  },  {    path: '/repositories',    module: '@/routes/api/repositories/repositories.route',    export: 'repositories',    requiresAI: false,  },  // ... 50+ routes, only 4 need AI];
// Dynamic route loaderexport async function loadRouteModule(modulePath: string): Promise<any> {  // Check cache first  if (routeCache.has(modulePath)) {    return routeCache.get(modulePath);  }    // Static imports for Workers compatibility  // (Dynamic string imports don't work in Workers)  let module;  switch (modulePath) {    case '@/routes/api/auth/auth.route':      module = await import('@/routes/api/auth/auth.route');      break;    case '@/routes/api/chat/chat.route':      module = await import('@/routes/api/chat/chat.route');      break;    // ... other routes  }    // Cache for this request lifecycle  if (module) {    routeCache.set(modulePath, module);  }    return module;}

Per-Request AI Initialization

Here's the clever bit -- Mastra (our AI framework) isn't initialized globally. It's created on-demand, only for routes that actually need it:

// Mastra instance management

let mastraInstance: Mastra | undefined;

export async function ensureMastraInitialized(env: Env): Promise<Mastra> { 
  if (!mastraInstance) {  
    // Lightweight instance creation   
    mastraInstance = new Mastra({     
      executionEngine: 'event-based', 
      // 40% performance boost  
      logger: false,           
      // Disable heavy logging   
      systemHostURL: env.MASTRA_SYSTEM_HOST_URL,       
      // Only initialize what we need    
      workflows: {  
        chat: () => import('./workflows/chat'),   
        github: () => import('./workflows/github'),    
      },  
    })
  }

// In routes that DON'T need AI (most of them!)

// auth.route.ts  
export const auth = new Hono()  
  .get('/session', async (c) => {   
  // No Mastra initialization here!  
    const user = await getUser(c.env);  
    return c.json({ user });  });

// In routes that DO need AI

// chat.route.ts  
export const chat = new Hono()  
  .post('/stream', async (c) => {   
    // Initialize Mastra only when needed    
    const mastra = await ensureMastraInitialized(c.env);  
    const workflow = mastra.getWorkflow('chat');  
    // ... use AI features  
  });

The Route Module System

Our route module system is configuration-driven and optimized for edge constraints:

Route Configuration

// routes.config.ts
interface RouteConfig {
  path: string;           // URL path prefix
  module: string;         // Module to import
  export: string;         // Named export to use
  requiresAI?: boolean;   // Needs Mastra?
  requiresAuth?: boolean; // Needs authentication?
  rateLimit?: number;     // Requests per minute
}

export const API_ROUTES: RouteConfig[] = [
  // Authentication & User Management
  { path: '/auth', module: 'auth/auth.route', export: 'auth' },
  { path: '/users', module: 'users/users.route', export: 'users' },

  // Core Platform Features (no AI needed)
  { path: '/organizations', module: 'organizations/organizations.route', export: 'organizations' },
  { path: '/repositories', module: 'repositories/repositories.route', export: 'repositories' },
  { path: '/billing', module: 'billing/billing.route', export: 'billing' },
  { path: '/api-keys', module: 'api-keys/api-keys.route', export: 'apiKeys' },

  // AI-Powered Features (initialize Mastra)
  { path: '/chat', module: 'chat/chat.route', export: 'chat', requiresAI: true },
  { path: '/bug-analysis', module: 'bug-analysis/bug-analysis.route', export: 'bugAnalysis', requiresAI: true },
  { path: '/analytics-enrichment', module: 'analytics/enrichment.route', export: 'enrichment', requiresAI: true },

  // Webhook Handlers
  { path: '/webhooks/github', module: 'webhooks/github.route', export: 'githubWebhooks' },
  { path: '/webhooks/stripe', module: 'webhooks/stripe.route', export: 'stripeWebhooks' },
];
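Because `Array.prototype.find` returns the first entry whose prefix matches, a bare `startsWith` check lets `/auth` match `/authorize`, and array order decides between overlapping prefixes. A minimal sketch of a stricter resolver, assuming the same path semantics as `RouteConfig`; `findRoute` and `RouteEntry` are hypothetical names, not part of our codebase:

```typescript
// Hypothetical subset of RouteConfig that the resolver needs.
interface RouteEntry {
  path: string;
  module: string;
  export: string;
  requiresAI?: boolean;
}

// Hypothetical resolver: matches only at a path-segment boundary and prefers
// the longest matching prefix, so more specific entries win regardless of order.
function findRoute<T extends RouteEntry>(routes: readonly T[], pathname: string): T | undefined {
  let best: T | undefined;
  for (const route of routes) {
    const matches = pathname === route.path || pathname.startsWith(`${route.path}/`);
    if (matches && (best === undefined || route.path.length > best.path.length)) {
      best = route;
    }
  }
  return best;
}

// Usage with a trimmed-down copy of API_ROUTES.
const sampleRoutes: RouteEntry[] = [
  { path: '/auth', module: 'auth/auth.route', export: 'auth' },
  { path: '/chat', module: 'chat/chat.route', export: 'chat', requiresAI: true },
  { path: '/webhooks/github', module: 'webhooks/github.route', export: 'githubWebhooks' },
];

const match = findRoute(sampleRoutes, '/chat/stream'); // the '/chat' entry, requiresAI: true
```

Longest-prefix matching costs one pass over the list per request, which is negligible at a few dozen routes.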

The Loading Strategy

// Route handler with on-demand loading
app.all('/*', async (c) => {
  // Note: in Workers, Date.now() only advances across I/O, so this measures
  // time spent awaiting, not pure CPU time
  const startTime = Date.now();
  const path = c.req.path;

  // Find matching route configuration on a segment boundary,
  // so '/auth' doesn't also match '/authorize'
  const routeConfig = API_ROUTES.find(
    (r) => path === r.path || path.startsWith(`${r.path}/`)
  );
  if (!routeConfig) {
    return c.notFound();
  }

  try {
    // Authenticate first, so rejected requests never load route code or AI
    if (routeConfig.requiresAuth) {
      const authResult = await authenticate(c);
      if (!authResult.success) {
        return c.json({ error: 'Unauthorized' }, 401);
      }
    }

    // Load only the specific route module needed
    const module = await loadRouteModule(routeConfig.module);
    const router = module[routeConfig.export];

    // Initialize AI only if needed
    if (routeConfig.requiresAI) {
      await ensureMastraInitialized(c.env);
    }

    // Sub-routers declare paths relative to their prefix (e.g. '/session'),
    // so mount the router at the configured prefix before dispatching
    const mounted = new Hono().route(routeConfig.path, router);
    const response = await mounted.fetch(c.req.raw, c.env, c.executionCtx);

    // Log performance metrics
    const duration = Date.now() - startTime;
    console.log(`[route:${path}] completed in ${duration}ms`);
    return response;
  } catch (error) {
    console.error(`[route:${path}] error:`, error);
    return c.json({ error: 'Internal Server Error' }, 500);
  }
});
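`loadRouteModule` isn't shown above. One way it could be written, sketched here as a hypothetical helper rather than our actual implementation: bundlers such as esbuild (which Wrangler uses) generally need string-literal specifiers to include a file, so instead of `import(\`./${name}\`)` the sketch maps each `RouteConfig.module` name to a literal import. ES modules are evaluated once per isolate, so repeated calls for the same name reuse the already-loaded module without a manual cache.

```typescript
// A route module's named exports (routers), kept generic for this sketch.
type RouteModule = Record<string, unknown>;
type ModuleLoader = () => Promise<RouteModule>;

// Hypothetical factory: builds loadRouteModule from a static name -> loader map.
// In a Worker each loader would be a literal import, for example
//   'auth/auth.route': () => import('./auth/auth.route'),
// so the bundler can see every module it has to include.
function createRouteModuleLoader(loaders: Record<string, ModuleLoader>) {
  // A Map avoids accidental hits on Object.prototype keys such as 'toString'.
  const registry = new Map(Object.entries(loaders));

  return function loadRouteModule(name: string): Promise<RouteModule> {
    const loader = registry.get(name);
    if (!loader) {
      return Promise.reject(new Error(`Unknown route module: ${name}`));
    }
    return loader();
  };
}

// Usage with stand-in loaders (real ones would call import()).
const loadRouteModule = createRouteModuleLoader({
  'auth/auth.route': async () => ({ auth: 'auth router' }),
  'chat/chat.route': async () => ({ chat: 'chat router' }),
});
```

Keeping the registry next to `API_ROUTES` means adding a route touches two adjacent lists, which is easy to check in review.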

Routes That Skip AI Initialization

Most routes don't need Mastra at all:

  • Authentication (`/auth/*`): JWT validation, session management

  • Organizations (`/organizations/*`): CRUD operations

  • Repositories (`/repositories/*`): GitHub configuration

  • Billing (`/billing/*`): Stripe integration

  • API Keys (`/api-keys/*`): Key management

  • Notifications (`/notifications/*`): Preference management

  • Health Checks (`/health/*`): System status

Only these routes initialize Mastra:

  • Chat (`/chat/*`): AI-powered conversations

  • Bug Analysis (`/bug-analysis/*`): Chrome extension analysis

  • Analytics Enrichment (`/v1/analytics-enrichment/*`): AI insights

  • GitHub Workflow (queue consumers): Event processing


Production Patterns That Emerged

After six months in production, here are the patterns that actually work:

1. Lazy Everything

// Don't do this
import { heavyLibrary } from 'heavy-library';

// Do this
const getHeavyLibrary = async () => {  
  const { heavyLibrary } = await import('heavy-library'); 
  return heavyLibrary;
};

// Use only when needed
app.post('/process', async (c) => {  
  const lib = await getHeavyLibrary(); 
  // Now use it
});
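Dynamic `import()` is already cached by the module system, but setup that runs after the import (constructing a client, loading configuration) is not. A minimal sketch of a hypothetical `lazy()` helper that runs such setup at most once per isolate, shares the in-flight promise between concurrent requests, and forgets a failed attempt so the next request can retry:

```typescript
// Hypothetical helper: wraps an async factory so it runs on first use only.
// Concurrent callers receive the same in-flight promise; if the factory
// rejects, the cached promise is cleared so a later call tries again.
function lazy<T>(factory: () => Promise<T>): () => Promise<T> {
  let pending: Promise<T> | undefined;
  return () => {
    if (!pending) {
      pending = factory().catch((err: unknown) => {
        pending = undefined;
        throw err;
      });
    }
    return pending;
  };
}
```

With it, `getHeavyLibrary` could be written as `lazy(async () => (await import('heavy-library')).heavyLibrary)`. Caching the promise rather than the resolved value is what keeps two simultaneous first requests from both running the factory.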

2. Request-Scoped Initialization

// ❌ Module-level state: shared by every request the isolate serves
let globalMastra: Mastra;

// ✅ Request-scoped state: fresh for each request
app.use('*', async (c, next) => {
  c.set('requestId', crypto.randomUUID());
  c.set('startTime', Date.now());
  await next();
});

3. Granular Module Boundaries

// Instead of one giant module
// ❌ repositories.route.ts (500 lines)

// Break it down
// ✅ repositories
//     list.route.ts    (50 lines)
//     get.route.ts     (30 lines)
//     create.route.ts  (60 lines)
//     update.route.ts  (40 lines)
//     delete.route.ts  (30 lines)

4. Memory-Aware Caching

// Simple in-memory cache with a size limit (FIFO eviction)
class BoundedCache<T> {
  private cache = new Map<string, T>();
  private maxSize: number;

  constructor(maxSize = 100) {
    this.maxSize = maxSize;
  }

  set(key: string, value: T) {
    // Evict the oldest insertion only when adding a new key at capacity
    if (!this.cache.has(key) && this.cache.size >= this.maxSize) {
      const firstKey = this.cache.keys().next().value;
      if (firstKey !== undefined) {
        this.cache.delete(firstKey);
      }
    }
    this.cache.set(key, value);
  }

  get(key: string): T | undefined {
    return this.cache.get(key);
  }
}

// Lives at module scope, so it is shared by requests in the same isolate;
// the bound keeps it from growing unchecked
const embeddingCache = new BoundedCache<Float32Array>(50);
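Counting entries says little about memory when entries vary in size. A variation worth considering, sketched here rather than taken from our codebase: bound the cache by total bytes and evict the least recently used entry instead of the oldest insertion. `ByteBudgetLruCache` is a hypothetical name; the sketch relies on `Map` iterating in insertion order, so re-inserting a key on access moves it to the "most recent" end.

```typescript
// Hypothetical LRU cache for Float32Array values, bounded by total bytes.
class ByteBudgetLruCache {
  private entries = new Map<string, Float32Array>();
  private bytes = 0;
  private readonly maxBytes: number;

  constructor(maxBytes: number) {
    this.maxBytes = maxBytes;
  }

  get(key: string): Float32Array | undefined {
    const value = this.entries.get(key);
    if (value !== undefined) {
      // Refresh recency: delete and re-insert moves the key to the end.
      this.entries.delete(key);
      this.entries.set(key, value);
    }
    return value;
  }

  set(key: string, value: Float32Array): void {
    // A single value larger than the whole budget is never cached.
    if (value.byteLength > this.maxBytes) return;

    const existing = this.entries.get(key);
    if (existing !== undefined) {
      this.bytes -= existing.byteLength;
      this.entries.delete(key);
    }

    // Evict from the front (least recently used) until the new value fits.
    while (this.bytes + value.byteLength > this.maxBytes) {
      const oldest = this.entries.keys().next();
      if (oldest.done) break;
      this.bytes -= this.entries.get(oldest.value)!.byteLength;
      this.entries.delete(oldest.value);
    }

    this.entries.set(key, value);
    this.bytes += value.byteLength;
  }

  get sizeInBytes(): number {
    return this.bytes;
  }
}
```

For example, `new ByteBudgetLruCache(2 * 1024 * 1024)` caps the cache at 2MB, which holds 512 hypothetical 1024-dimension `Float32Array` embeddings at 4KB each.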


Lessons Learned

After building a production AI platform with Hono and dynamic loading, here's what we learned:

1. Constraints Drive Innovation

The 128MB limit seemed impossible at first. But it forced us to build a better architecture than we would have with unlimited memory. Every constraint became an opportunity to optimize.

2. Not All Frameworks Are Created Equal

Framework choice matters far more on the edge than on traditional servers. The difference between Hono and Express isn't incremental -- it's the difference between working and not working.

3. Dynamic Loading Is a Superpower

Loading code on demand isn't just about saving memory. It's about:

  • Faster cold starts (less to parse)

  • Better isolation (routes don't interfere)

  • Easier debugging (smaller surface area)

  • Natural code splitting (enforced boundaries)

4. Type Safety Enables Velocity

Hono's TypeScript-first design with CloudflareEnv bindings caught so many bugs at compile time. When your bindings are typed, your routes are typed, and your responses are typed, you ship with confidence.

5. Web Standards Are the Future

By building on Request/Response instead of Node.js APIs, our code is portable. We could move to Deno, Bun, or any future runtime that implements web standards. No lock-in.
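As a small illustration of that portability (a sketch, not our production code), here is a handler written only against the standard `Request`, `Response`, and `URL` globals, which Workers, Deno, Bun, and Node 18+ all provide. The `/health` route is just an example:

```typescript
// Depends only on web-standard globals, so the same function can back a
// Worker, a Bun server, or a test running under Node 18+.
async function handle(request: Request): Promise<Response> {
  const url = new URL(request.url);

  if (request.method === 'GET' && url.pathname === '/health') {
    return new Response(JSON.stringify({ status: 'ok' }), {
      headers: { 'content-type': 'application/json' },
    });
  }

  return new Response('Not Found', { status: 404 });
}

// In a Worker, the module entry point would be:
// export default { fetch: handle };
```

Because the handler takes a `Request` and returns a `Response`, it can be unit-tested by constructing requests directly, with no server or runtime-specific mocks.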


The Bottom Line

Hono + dynamic loading let us fit an entire AI platform -- 50+ endpoints, real-time chat, workflow orchestration, semantic search -- into 128MB of memory with sub-50ms response times globally.

Could we have built this with Express? No. The memory constraints alone would have killed the project.

Could we have built it with another edge framework? Maybe, but none match Hono's combination of performance, size, and developer experience.

The future of web development isn't just about moving to the edge. It's about embracing the constraints of the edge and using frameworks designed for this new world. For us, that framework is Hono.

——

Next in the Series


**[Part 3: How We Built a Sub-10ms AI Pipeline on the Edge](./part-3-ai-pipeline.md)** - Dive deep into our AI infrastructure implementation, featuring Voyage AI embeddings, pgvector for semantic search, and the economic implications of edge computing for AI startups.

**[Part 4: From LangGraph to Mastra - Our AI Orchestration Journey](./part-4-from-langgraph-to-mastra.md)** - Learn why we migrated from LangGraph to Mastra for AI workflow orchestration, and how this TypeScript-first framework transformed our development velocity.


——

_Want to see Hono in action? Check out our [open-source code](https://github.com/kasava/kasava) or read more about [our parallel indexing architecture](/blog/parallel-indexing-architecture) that processes 10,000+ files in under 5 minutes._

Kasava

Kasava. All right reserved. © 2025
