The Complete Edge Architecture Guide (Part 1): Why We Went All-In on Cloudflare

Sep 3, 2025

When we started building Kasava, we had a choice to make. We could go the traditional route with AWS Lambda, maybe some EC2 instances, deal with VPCs, cold starts, and the inevitable "why is our AWS bill $10k this month?" conversation. Or we could try something different.

We chose different. We chose Cloudflare Workers. And honestly? It's been one of the best architectural decisions we've made.

The Workers Runtime

When you're building an AI-powered platform that needs to process GitHub webhooks, run semantic search across millions of lines of code, and respond to natural language queries in real-time, you need speed. Cloudflare Workers gave us that. With V8 isolates instead of containers, we're talking sub-5ms cold starts globally. Not 5 seconds like Lambda -- 5 milliseconds. When a GitHub webhook hits our API, we're already processing it before Lambda would even wake up.

V8 Isolates vs Node.js

Here's where it gets interesting. Most serverless platforms, including AWS Lambda, run your code in containers or virtual machines. Each function gets its own isolated environment, which sounds great until you realize the overhead involved.

Traditional Node.js serverless (like Lambda):

  • Each function runs in its own container/VM

  • Full Node.js runtime per function instance

  • Cold starts measured in seconds

  • Memory overhead of ~50-100MB per instance

  • Process-level isolation

Cloudflare Workers with V8 Isolates:

  • Hundreds of isolates run within a single V8 runtime

  • Shared JavaScript engine across all functions

  • Cold starts measured in milliseconds

  • Memory overhead of ~5-10MB per isolate

  • V8-level isolation (same security model as Chrome tabs)

Think of it like the difference between giving each person their own house (containers) versus having hundreds of secure apartments in the same building (isolates). You get effectively the same security and isolation but with dramatically less overhead.

A single Workers runtime instance can handle thousands of isolates simultaneously, seamlessly switching between them in microseconds. When your code needs to run, it's not spinning up a new container or booting a Node.js process -- it's just creating a lightweight context within an already-running V8 engine.

This is why we can process GitHub webhooks in under 10ms globally while traditional serverless is still waking up.

What you lose:
  • Full Node.js standard library (no `fs`, limited `os`, no native modules)

  • Some npm packages that rely heavily on Node.js internals (this became an issue when trying to implement tree-sitter for code parsing -- more on that later...)

  • Direct file system access (everything goes through bindings)

  • Long-running processes (30-second CPU time limit)

What Cloudflare provides:
  • Native support for most common Node.js APIs (crypto, buffer, streams, HTTP)

  • Polyfills for unsupported APIs via the `nodejs_compat` flag

  • Web standard APIs that often work better than Node.js equivalents

  • Automatic bundling that handles most compatibility issues

In practice? It's rarely a problem. The nodejs_compat flag handles most edge cases, and when it doesn't, there's usually a better web-standard alternative.

// This works fine in Workers
import { createHash } from "crypto";
import { Buffer } from "buffer";

// This doesn't (but you don't need it on the edge)
import fs from "fs"; // ❌ No file system access
import os from "os"; // ❌ Limited OS APIs

// Use bindings instead
const file = await env.BUCKET.get("data.json"); // ✅ Better than fs

We've found that 95% of what we wanted to do "just works." The remaining 5% usually led us to better architectural decisions anyway.


Bindings: The Secret Sauce Nobody Talks About

You know what kills performance in serverless? Network calls. Every time you need to hit a database, call another service, or fetch from storage, you're adding latency. AWS makes you jump through VPC hoops, configure security groups, and pray to the networking gods. If you know, you know.

Enter bindings: zero-latency connections between Workers and other Cloudflare services. No network overhead. No authentication dance. Just syntactic sugar over instant, automagical access.

// The AWS way: configure a client, then call it
const s3 = new AWS.S3({ region: 'us-west-2', credentials: {...} });
await s3.getObject({ Bucket: 'my-bucket', Key: 'file.txt' }).promise();

// We just do this:
const file = await env.KASAVA_PROFILE_DOCUMENTS_BUCKET.get('file.txt');

This is a simplified example, but the real value shows up when everything in your stack is a binding you can treat as just another service in your existing codebase:

  • R2 Buckets: 5 different buckets for documents, recordings, archives

  • KV Namespaces: Session storage and embedding cache

  • Queues: 18 queues (9 main + 9 DLQ) for async processing

  • Durable Objects: Real-time chat sessions and distributed coordination

  • Vectorize: Experimental vector indexes for semantic search

Each binding is just there, available instantly, no configuration needed in code.
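
In TypeScript, each binding surfaces as a typed property on env (the types come from @cloudflare/workers-types). A rough sketch of what that interface can look like -- the binding names mirror the examples in this post, and INDEXING_COORDINATOR is purely illustrative:

interface Env {
  // R2 bucket for profile documents
  KASAVA_PROFILE_DOCUMENTS_BUCKET: R2Bucket;
  // KV namespace backing the embedding cache
  EMBEDDING_CACHE: KVNamespace;
  // Queue producer for GitHub webhook events
  GITHUB_EVENT_QUEUE: Queue;
  // Durable Object namespace for indexing coordination (illustrative name)
  INDEXING_COORDINATOR: DurableObjectNamespace;
}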

R2: The S3 Killer with Zero Egress Fees

Let's talk about the elephant in the room -- egress fees. AWS charges $90 per TB for data transfer. If you're serving videos, documents, or any kind of media, this adds up fast. Really fast.

Cloudflare R2? Zero. Zilch. Nada. Free egress.


We store everything in R2:

  • Chrome extension recordings (up to 100MB per bug report)

  • Organization documents

  • GitHub comment archives

  • Profile documents

  • Indexing artifacts from our code analysis

// Storing a 50MB screen recording
await env.RECORDING_STORAGE.put(
  `recordings/${bugReportId}/screen.webm`,
  recording,
  {
    httpMetadata: { contentType: "video/webm" },
    customMetadata: {
      userId,
      timestamp: Date.now().toString(),
    },
  }
);

// Cost on AWS S3: Storage + egress fees every time someone views it
// Cost on R2: Just storage ($15/TB/month), zero egress
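
Serving it back out is just as direct. A minimal sketch of a handler that streams a stored recording to the client (the key and fallback content type here are illustrative):

const object = await env.RECORDING_STORAGE.get(`recordings/${bugReportId}/screen.webm`);
if (!object) {
  return new Response("Not found", { status: 404 });
}
return new Response(object.body, {
  headers: {
    "Content-Type": object.httpMetadata?.contentType ?? "video/webm",
  },
});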

Queues: Async Processing That Actually Works

Here's something nobody tells you about serverless -- handling async work is a pain. With Lambda, you're either chaining functions together (and paying for wait time), using SQS (more complexity), or giving up and running ECS tasks.

Cloudflare Queues integrate directly with Workers. No external services, no additional authentication, just push to a queue and consume from it.

// Producer side - instant response to webhook
export default {  
  async fetch(request: Request, env: Env) {    
    const event = await request.json();
    // Queue it and return immediately                 
    await env.GITHUB_EVENT_QUEUE.send(event);
    return new Response("OK", { status: 200 });  
  },
};

// Consumer side - process in background
export default { 
  async queue(batch: MessageBatch, env: Env) {
    for (const message of batch.messages) { 
      await processGitHubEvent(message.body);    
      message.ack(); // Mark as processed  
    } 
  },
};
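
Failure handling is per message. A sketch of the error path for the consumer above -- message.retry() puts the message back on the queue, and once the configured retry limit is exhausted it lands in the matching dead-letter queue:

for (const message of batch.messages) {
  try {
    await processGitHubEvent(message.body);
    message.ack(); // Success: remove from the queue
  } catch (err) {
    message.retry(); // Requeue; routed to the DLQ after max retries
  }
}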

We run 4 different queues for different workloads:

  • repository-indexing: Orchestrates parallel code analysis (processes 10,000+ files in under 5 minutes!)

  • file-indexing: 50-file batches with 100 concurrent workers

  • embedding-generation: 128-text batches for Voyage AI

  • github-events: Primary webhook processing

  • Plus 4 matching DLQs for failed message handling

Each queue has its own configuration -- batch sizes, timeouts, retry policies. All managed through wrangler.jsonc, deployed with a single command.
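
For reference, a stripped-down sketch of what that looks like in wrangler.jsonc -- the queue names follow the list above, and the specific numbers are illustrative:

{
  "queues": {
    "producers": [
      { "binding": "GITHUB_EVENT_QUEUE", "queue": "github-events" }
    ],
    "consumers": [
      {
        "queue": "github-events",
        "max_batch_size": 10,
        "max_batch_timeout": 5,
        "max_retries": 3,
        "dead_letter_queue": "github-events-dlq"
      }
    ]
  }
}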

Durable Objects: The Distributed Systems Cheat Code

This is where things get really interesting. We needed to coordinate indexing jobs across 100+ parallel workers without database contention. Traditional approach? Distributed locks, Redis, maybe ZooKeeper if you hate yourself.

Cloudflare's answer? Durable Objects.

export class IndexingCoordinator {
  private state: DurableObjectState;
  private jobs: Map<string, JobState> = new Map();

  constructor(state: DurableObjectState) {
    this.state = state;
  }

  async claimJob(workerId: string): Promise<Job | null> {
    // This runs in exactly one place globally
    // No race conditions, no distributed locks
    const availableJob = this.findAvailableJob();
    if (availableJob) {
      availableJob.workerId = workerId;
      availableJob.claimedAt = Date.now();
      await this.state.storage.put(`job:${availableJob.id}`, availableJob);
    }
    return availableJob;
  }
}

Each Durable Object is a single-threaded JavaScript environment that's globally unique. Perfect for:

  • Coordinating our parallel indexing workers

  • Managing WebSocket connections for real-time chat

  • Maintaining session state without a database

No locks. No race conditions. Just JavaScript running in exactly one place -- as the sketch below shows.
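
A minimal sketch of how a worker might reach the coordinator and claim work (the INDEXING_COORDINATOR binding name, route, and variables are illustrative, not our actual config):

// Every request for the same name routes to the same globally unique object
const id = env.INDEXING_COORDINATOR.idFromName(`repo:${repositoryId}`);
const coordinator = env.INDEXING_COORDINATOR.get(id);

// Calls into the object are serialized, so claiming a job can't race
const response = await coordinator.fetch("https://coordinator/claim-job", {
  method: "POST",
  body: JSON.stringify({ workerId }),
});
const job: Job | null = await response.json();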

KV: The Cache That's Actually Simple

Redis is great. Until you have to manage it. Provision it. Scale it. Deal with connection pools. Handle failover.

Cloudflare KV is just... there. A globally distributed key-value store with no configuration.

// Caching embedding results
const cacheKey = `embedding:${model}:${hashText(content)}`;
const cached = await env.EMBEDDING_CACHE.get(cacheKey);

if (cached) {  
  return JSON.parse(cached); // Sub-10ms for hot keys
}

const embedding = await generateEmbedding(content);

await env.EMBEDDING_CACHE.put(  
  cacheKey,  
  JSON.stringify(embedding),  
  { expirationTtl: 3600 } // 1 hour TTL
);

We cache everything:

  • Session data (15-minute TTL)

  • Embedding results (1-hour TTL)

  • API responses

  • Search results

The Edge Positioning Advantage

Here's the thing about running on the edge -- your code runs where your users are. Not in us-east-1. Not in three regions you carefully selected. Everywhere.

When someone in Tokyo hits our API, they're hitting a worker in Tokyo. Someone in São Paulo? Worker in São Paulo. No CDN configuration, no geo-routing rules. It just works.

This matters more than you think. We've seen:

  • 200ms faster response times compared to centralized deployments

  • Consistent performance regardless of user location

  • Natural resilience (if one location has issues, traffic routes elsewhere)

The Economics of It All

Let's talk money. Because at the end of the day, this stuff has to make business sense.

Traditional AWS Setup:
  • Lambda: Pay for execution time + cold starts

  • S3: $23/TB storage + $90/TB egress

  • SQS: $0.40 per million messages

  • ElastiCache: Starting at $15/month

  • CloudFront: Complex pricing, egress fees

  • NAT Gateway: $45/month + data processing

Our Cloudflare Setup:
  • Workers: 100k requests/day free, then $0.15 per million

  • R2: $15/TB storage, zero egress

  • Queues: Included with Workers

  • KV: 100k reads/day free

  • Durable Objects: $0.15 per million requests

  • Everything runs on the edge

We're saving 60-80% compared to an equivalent AWS setup. But more importantly, we're not managing infrastructure. No VPCs, no security groups, no capacity planning.
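
A quick back-of-envelope using the numbers above, for a hypothetical month with 10 TB stored and 10 TB served:

// S3:  10 TB × $23 storage + 10 TB × $90 egress ≈ $1,130/month
// R2:  10 TB × $15 storage + $0 egress          ≈ $150/month

That's the egress line item alone doing most of the damage -- before Lambda, SQS, or NAT Gateway costs even enter the picture.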


What We Learned

After six months of running everything on Cloudflare, here's what we've learned:

The Good:
  • Performance is incredible (sub-5ms cold starts still blow my mind)

  • Zero egress fees change how you think about architecture

  • Bindings eliminate entire categories of problems

  • Global by default is powerful

  • The simplicity is addictive

The Tradeoffs:
  • 128MB memory limit per worker (but you'd be surprised what fits)

  • 30-second CPU time limit (queues handle long-running tasks)

  • Different mental model from traditional serverless

  • Some services still experimental (Vectorize)

The Unexpected:
  • Durable Objects solved problems we didn't know we had

  • Queue coordination is smoother than any message broker we've used

  • The platform keeps getting better (recent 10x Queue performance improvement!)

Why This Matters

Is it perfect? No. Would I build Kasava on AWS if I had to start over? Also no. Sometimes the best architectural decision isn't about choosing the most popular option or the one with the most features. Sometimes it's about choosing the one that lets you ship fast, iterate quickly, and sleep at night knowing your infrastructure just works.

For us, that's Cloudflare. All in on the edge, and not looking back.

——

Next in the Series:

Part 2: Hono + Dynamic Loading - How We Fit an AI Platform in 128MB - Discover why we chose Hono over Express/Fastify and how dynamic loading lets us run 50+ endpoints in Workers' memory constraints.
Part 3: How We Built an AI Pipeline on the Edge - Dive deep into our AI infrastructure implementation, featuring Voyage AI embeddings, pgvector for semantic search, and the economic implications of edge computing for AI startups.
Part 4: From LangGraph to Mastra - Our AI Orchestration Journey - Learn why we migrated from LangGraph to Mastra for AI workflow orchestration, and how this TypeScript-first framework transformed our development velocity.
