The Complete Edge Architecture Guide (Part 1): Why We Went All-In on Cloudflare

Jane Cooper

Sep 3, 2025

This is Part 1 of our four-part series on building AI-powered applications on the edge. In this post, we'll cover the foundational Cloudflare architecture. Part 2 explores [how we use Hono framework and dynamic loading to fit everything in 128MB](./part-2-hono-framework.md). Part 3 dives into [our sub-10ms AI pipeline implementation](./part-3-ai-pipeline.md), and Part 4 details [our journey from LangGraph to Mastra](./part-4-from-langgraph-to-mastra.md) for AI orchestration.

Table of Contents

  • [The Workers Runtime](#the-workers-runtime-where-it-all-started)

  • [Bindings: Zero-Latency Service Connections](#bindings-the-secret-sauce-nobody-talks-about)

  • [R2: Object Storage with Zero Egress](#r2-the-s3-killer-with-zero-egress-fees)

  • [Queues: Async Processing That Works](#queues-async-processing-that-actually-works)

  • [Durable Objects: Distributed Coordination](#durable-objects-the-distributed-systems-cheat-code)

  • [KV: Simple Global Caching](#kv-the-cache-thats-actually-simple)

  • [Edge Positioning Advantages](#the-edge-positioning-advantage)

  • [Economics & Cost Analysis](#the-economics-of-it-all)

  • [Developer Experience](#the-developer-experience)

  • [Lessons Learned](#what-we-learned)

---

When we started building Kasava, we had a choice to make. We could go the traditional route with AWS Lambda, maybe some EC2 instances, deal with VPCs, cold starts, and the inevitable "why is our AWS bill $10k this month?" conversation. Or we could try something different.

We chose different. We chose Cloudflare Workers. And honestly? It's been one of the best architectural decisions we've made.

The Workers Runtime

When you're building an AI-powered platform that needs to process GitHub webhooks, run semantic search across millions of lines of code, and respond to natural language queries in real-time, you need speed. Cloudflare Workers gave us that. With V8 isolates instead of containers, we're talking sub-5ms cold starts globally. Not 5 seconds like Lambda -- 5 _milliseconds_. When a GitHub webhook hits our API, we're already processing it before Lambda would even wake up.

V8 Isolates vs Node.js

Here's where it gets interesting. Most serverless platforms, including AWS Lambda, run your code in containers or virtual machines. Each function gets its own isolated environment, which sounds great until you realize the overhead involved.

Traditional Node.js serverless (like Lambda):

  • Each function runs in its own container/VM

  • Full Node.js runtime per function instance

  • Cold starts measured in seconds

  • Memory overhead of ~50-100MB per instance

  • Process-level isolation

Cloudflare Workers with V8 Isolates:

  • Hundreds of isolates run within a single V8 runtime

  • Shared JavaScript engine across all functions

  • Cold starts measured in milliseconds

  • Memory overhead of ~5-10MB per isolate

  • V8-level isolation (same security model as Chrome tabs)

Think of it like the difference between giving each person their own house (containers) versus having hundreds of secure apartments in the same building (isolates). You get effectively the same security and isolation but with dramatically less overhead.

A single Workers runtime instance can handle thousands of isolates simultaneously, seamlessly switching between them in microseconds. When your code needs to run, it's not spinning up a new container or booting a Node.js process -- it's just creating a lightweight context within an already-running V8 engine.

This is why we can process GitHub webhooks in under 10ms globally while traditional serverless is still waking up.

But here's where it gets interesting...

Bindings: The Secret Sauce Nobody Talks About

You know what kills performance in serverless? Network calls. Every time you need to hit a database, call another service, or fetch from storage, you're adding latency. AWS makes you jump through VPC hoops, configure security groups, and pray to the networking gods. If you know, you know.

Enter bindings. These are zero-latency connections between Workers and other Cloudflare services. No network overhead. No authentication dance. Just pure syntactic sugar and automagical, instant access.
```typescript
// The AWS way:
const s3 = new AWS.S3({ region: 'us-west-2', credentials: {...} });
await s3.getObject({ Bucket: 'my-bucket', Key: 'file.txt' }).promise();

// We just do this:
const file = await env.KASAVA_PROFILE_DOCUMENTS_BUCKET.get('file.txt');
```
This is a simplified example, but the real value becomes apparent when you have bindings for everything in your stack and can treat them as just services within your existing codebase:

- **R2 Buckets**: 5 different buckets for documents, recordings, archives
- **KV Namespaces**: Session storage and embedding cache
- **Queues**: 18 queues (9 main + 9 DLQ) for async processing
- **Durable Objects**: Real-time chat sessions and distributed coordination
- **Vectorize**: Experimental vector indexes for semantic search
Each binding is just there, available instantly, no configuration needed in code.
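
In TypeScript, bindings simply show up as typed properties on `env`. Here's a rough sketch of what an Env interface for this stack could look like; the binding names are the ones mentioned in this post, except the Durable Object binding name, which is illustrative:

```typescript
// Types come from @cloudflare/workers-types.
interface Env {
  // R2 buckets
  KASAVA_PROFILE_DOCUMENTS_BUCKET: R2Bucket;
  RECORDING_STORAGE: R2Bucket;
  // KV namespaces
  EMBEDDING_CACHE: KVNamespace;
  // Queues
  GITHUB_EVENT_QUEUE: Queue<unknown>;
  // Durable Objects (binding name is hypothetical)
  INDEXING_COORDINATOR: DurableObjectNamespace;
}
```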

R2: The S3 Killer with Zero Egress Fees

Let's talk about the elephant in the room -- egress fees. AWS charges $90 per TB for data transfer. If you're serving videos, documents, or any kind of media, this adds up fast. Really fast.
Cloudflare R2? Zero. Zilch. Nada. Free egress.
We store everything in R2:
- Chrome extension recordings (up to 100MB per bug report)
- Organization documents
- GitHub comment archives
- Profile documents
- Indexing artifacts from our code analysis

```typescript
// Storing a 50MB screen recording
await env.RECORDING_STORAGE.put(
  `recordings/${bugReportId}/screen.webm`,
  recording,
  {
    httpMetadata: { contentType: "video/webm" },
    customMetadata: { userId, timestamp: Date.now().toString() },
  }
);

// Cost on AWS S3: Storage + egress fees every time someone views it
// Cost on R2: Just storage ($15/TB/month), zero egress
```


One of our users downloaded 3.2TB of historical data last month. On AWS? That would've been ~$300 just in egress. On Cloudflare? Free.
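
Serving those files back out is just as direct. Here's a minimal sketch, assuming the same `RECORDING_STORAGE` binding and a hypothetical `serveRecording` helper called from a fetch handler:

```typescript
// Stream a stored recording straight back to the user from R2 -- zero egress fees.
async function serveRecording(bugReportId: string, env: Env): Promise<Response> {
  const object = await env.RECORDING_STORAGE.get(
    `recordings/${bugReportId}/screen.webm`
  );
  if (!object) return new Response("Not found", { status: 404 });

  const headers = new Headers();
  object.writeHttpMetadata(headers); // restores the contentType we set on upload
  headers.set("etag", object.httpEtag);

  return new Response(object.body, { headers });
}
```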

Queues: Async Processing That Actually Works

Here's something nobody tells you about serverless -- handling async work is a pain. With Lambda, you're either chaining functions together (and paying for wait time), using SQS (more complexity), or giving up and running ECS tasks.
Cloudflare Queues integrate directly with Workers. No external services, no additional authentication, just push to a queue and consume from it.

```typescript
export default {
  // Producer side - instant response to the webhook
  async fetch(request: Request, env: Env) {
    const event = await request.json();
    // Queue it and return immediately
    await env.GITHUB_EVENT_QUEUE.send(event);
    return new Response("OK", { status: 200 });
  },

  // Consumer side - process in the background
  async queue(batch: MessageBatch, env: Env) {
    for (const message of batch.messages) {
      await processGitHubEvent(message.body);
      message.ack(); // Mark as processed
    }
  },
};
```

We run 9 different queues for different workloads:

  • repository-indexing: Orchestrates parallel code analysis (processes 10,000+ files in under 5 minutes!)

  • file-indexing: 50-file batches with 100 concurrent workers

  • embedding-generation: 128-text batches for Voyage AI

  • github-events: Primary webhook processing

  • Plus 9 matching DLQs for failed message handling


Each queue has its own configuration -- batch sizes, timeouts, retry policies. All managed through `wrangler.jsonc`, deployed with a single command.
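
To make that concrete, here's roughly what one producer/consumer pair looks like in `wrangler.jsonc`. The binding and queue names come from this post; the batch size, timeout, retry count, and DLQ name are illustrative, not our exact settings:

```jsonc
{
  "queues": {
    "producers": [
      { "binding": "GITHUB_EVENT_QUEUE", "queue": "github-events" }
    ],
    "consumers": [
      {
        "queue": "github-events",
        "max_batch_size": 50,       // illustrative values
        "max_batch_timeout": 5,
        "max_retries": 3,
        "dead_letter_queue": "github-events-dlq"
      }
    ]
  }
}
```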

Durable Objects: The Distributed Systems Cheat Code

This is where things get really interesting. We needed to coordinate indexing jobs across 100+ parallel workers without database contention. Traditional approach? Distributed locks, Redis, maybe Zookeeper if you hate yourself.
Cloudflare's answer? Durable Objects.


```typescript
export class IndexingCoordinator {
  private state: DurableObjectState;
  private jobs: Map<string, JobState> = new Map();

  constructor(state: DurableObjectState) {
    this.state = state;
  }

  async claimJob(workerId: string): Promise<Job | null> {
    // This runs in exactly one place globally
    // No race conditions, no distributed locks
    const availableJob = this.findAvailableJob();
    if (availableJob) {
      availableJob.workerId = workerId;
      availableJob.claimedAt = Date.now();
      await this.state.storage.put(`job:${availableJob.id}`, availableJob);
    }
    return availableJob;
  }
}
```


Each Durable Object is a single-threaded JavaScript environment that's globally unique. Perfect for:

  • Coordinating our parallel indexing workers

  • Managing WebSocket connections for real-time chat

  • Maintaining session state without a database


No locks. No race conditions. Just JavaScript running in exactly one place.
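
For completeness, here's roughly how a worker gets hold of that coordinator. The binding name `INDEXING_COORDINATOR` and the fetch-based dispatch are our illustration, not an exact copy of production code:

```typescript
// Every worker that wants a job talks to the *same* coordinator instance.
// idFromName() maps a stable name (e.g. the repository ID) to one globally
// unique Durable Object.
const id = env.INDEXING_COORDINATOR.idFromName(repositoryId);
const coordinator = env.INDEXING_COORDINATOR.get(id);

// One way to invoke it: route a request to the object and let its fetch
// handler dispatch to claimJob() internally.
const response = await coordinator.fetch("https://coordinator/claim-job", {
  method: "POST",
  body: JSON.stringify({ workerId }),
});
const job = await response.json();
```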

KV: The Cache That's Actually Simple

Redis is great. Until you have to manage it. Provision it. Scale it. Deal with connection pools. Handle failover.
Cloudflare KV is just... there. A globally distributed key-value store with no configuration.

```typescript
// Caching embedding results
const cacheKey = `embedding:${model}:${hashText(content)}`;
const cached = await env.EMBEDDING_CACHE.get(cacheKey);

if (cached) {
  return JSON.parse(cached); // Sub-10ms for hot keys
}

const embedding = await generateEmbedding(content);

await env.EMBEDDING_CACHE.put(
  cacheKey,
  JSON.stringify(embedding),
  { expirationTtl: 3600 } // 1 hour TTL
);
```

We cache everything:

  • Session data (15-minute TTL)

  • Embedding results (1-hour TTL)

  • API responses

  • Search results
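
Most of that goes through a tiny read-through cache helper along these lines. This is a sketch, not our exact code, and `env.SESSIONS` in the usage comment is a placeholder binding name:

```typescript
// Generic read-through cache: return the cached value if present,
// otherwise compute it, store it with a TTL, and return it.
async function cached<T>(
  kv: KVNamespace,
  key: string,
  ttlSeconds: number,
  compute: () => Promise<T>
): Promise<T> {
  const hit = await kv.get<T>(key, "json");
  if (hit !== null) return hit;

  const value = await compute();
  await kv.put(key, JSON.stringify(value), { expirationTtl: ttlSeconds });
  return value;
}

// e.g. session data with a 15-minute TTL:
// const session = await cached(env.SESSIONS, `session:${id}`, 900, () => loadSession(id));
```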

The Edge Positioning Advantage

Here's the thing about running on the edge -- your code runs where your users are. Not in us-east-1. Not in three regions you carefully selected. Everywhere.

When someone in Tokyo hits our API, they're hitting a worker in Tokyo. Someone in São Paulo? Worker in São Paulo. No CDN configuration, no geo-routing rules. It just works.

This matters more than you think. We've seen:

  • 200ms faster response times compared to centralized deployments

  • Consistent performance regardless of user location

  • Natural resilience (if one location has issues, traffic routes elsewhere)

The Economics of It All

Let's talk money. Because at the end of the day, this stuff has to make business sense.

Traditional AWS Setup:
  • Lambda: Pay for execution time + cold starts

  • S3: $23/TB storage + $90/TB egress

  • SQS: $0.40 per million messages

  • ElastiCache: Starting at $15/month

  • CloudFront: Complex pricing, egress fees

  • NAT Gateway: $45/month + data processing

Our Cloudflare Setup:
  • Workers: 100k requests/day free, then $0.15 per million

  • R2: $15/TB storage, **zero egress**

  • Queues: Included with Workers

  • KV: 100k reads/day free

  • Durable Objects: $0.15 per million requests

  • Everything runs on the edge

We're saving 60-80% compared to an equivalent AWS setup. But more importantly, we're not managing infrastructure. No VPCs, no security groups, no capacity planning.

The Developer Experience

You know what's underrated? Being able to test everything locally. Wrangler (Cloudflare's CLI) lets us run the entire stack:

```bash
npm run dev
# That's it. Workers, KV, R2, Queues, all running locally
```

Deployment? One command:

```bash
npm run deploy  # Code is live globally in under 30 seconds
```

Compare that to setting up LocalStack, configuring AWS SAM, dealing with Docker containers... yeah, no thanks.

What We Learned

After six months of running everything on Cloudflare, here's what we've learned:

The Good:
  • Performance is incredible (sub-5ms cold starts still blow my mind)

  • Zero egress fees change how you think about architecture

  • Bindings eliminate entire categories of problems

  • Global by default is powerful

  • The simplicity is addictive

The Tradeoffs:
  • 128MB memory limit per worker (but you'd be surprised what fits)

  • 30-second CPU time limit (queues handle long-running tasks)

  • Different mental model from traditional serverless

  • Some services still experimental (Vectorize)

The Unexpected:
  • Durable Objects solved problems we didn't know we had

  • Queue coordination is smoother than any message broker we've used

  • The platform keeps getting better (recent 10x Queue performance improvement!)

Why This Matters

We're processing millions of GitHub events, running semantic search across gigabytes of code, generating embeddings with Voyage AI, coordinating 100+ parallel workers, and serving it all globally with sub-100ms latency. On a platform that costs us less than a decent coffee machine per month. But here's the real thing -- we're a small team. We don't have a DevOps person. We don't need one. Cloudflare handles the infrastructure so we can focus on building the product.

Is it perfect? No. Would I build Kasava on AWS if I had to start over? Also no. Sometimes the best architectural decision isn't about choosing the most popular option or the one with the most features. Sometimes it's about choosing the one that lets you ship fast, iterate quickly, and sleep at night knowing your infrastructure just works.

For us, that's Cloudflare. All in on the edge, and not looking back.

The Node.js Compatibility Tradeoff

Let's be honest about the elephant in the room -- Cloudflare Workers aren't Node.js. They run JavaScript in V8 isolates, which means you're giving up some Node.js compatibility for those incredible performance gains.

What you lose:
  • Full Node.js standard library (no `fs`, limited `os`, no native modules)

  • Some npm packages that rely heavily on Node.js internals (this became an issue when trying to implement tree-sitter for code parsing -- more on that later...)

  • Direct file system access (everything goes through bindings)

  • Long-running processes (30-second CPU time limit)

What Cloudflare provides:
  • Native support for most common Node.js APIs (crypto, buffer, streams, HTTP)

  • Polyfills for unsupported APIs via the `nodejs_compat` flag

  • Web standard APIs that often work better than Node.js equivalents

  • Automatic bundling that handles most compatibility issues

In practice? It's rarely a problem. The nodejs_compat flag handles most edge cases, and when it doesn't, there's usually a better web-standard alternative.
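
Enabling it is a one-liner in `wrangler.jsonc` (the date shown here is illustrative; use a recent one):

```jsonc
{
  "compatibility_date": "2024-09-23",        // illustrative date
  "compatibility_flags": ["nodejs_compat"]
}
```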

```typescript
// This works fine in Workers
import { createHash } from "crypto";
import { Buffer } from "buffer";

// This doesn't (but you don't need it on the edge)
import fs from "fs"; // ❌ No file system access
import os from "os"; // ❌ Limited OS APIs

// Use bindings instead
const file = await env.BUCKET.get("data.json"); // ✅ Better than fs
```

The mental shift is worth it. Instead of thinking "how do I make this Node.js code work?", you start thinking "how do I build this for the edge?" The result is cleaner, more performant code that scales globally.

We've found that 95% of what we wanted to do "just works." The remaining 5% usually led us to better architectural decisions anyway.

---

Next in the Series:

**[Part 2: Hono + Dynamic Loading - How We Fit an AI Platform in 128MB](./part-2-hono-framework.md)** - Discover why we chose Hono over Express/Fastify and how dynamic loading lets us run 50+ endpoints in Workers' memory constraints.

**[Part 3: How We Built a Sub-10ms AI Pipeline on the Edge](./part-3-ai-pipeline.md)** - Dive deep into our AI infrastructure implementation, featuring Voyage AI embeddings, pgvector for semantic search, and the economic implications of edge computing for AI startups.

**[Part 4: From LangGraph to Mastra - Our AI Orchestration Journey](./part-4-from-langgraph-to-mastra.md)** - Learn why we migrated from LangGraph to Mastra for AI workflow orchestration, and how this TypeScript-first framework transformed our development velocity.
