The Complete Edge Architecture Guide (Part 1): Why We Went All-In on Cloudflare
Sep 3, 2025
When we started building Kasava, we had a choice to make. We could go the traditional route with AWS Lambda, maybe some EC2 instances, deal with VPCs, cold starts, and the inevitable "why is our AWS bill $10k this month?" conversation. Or we could try something different.
We chose different. We chose Cloudflare Workers. And honestly? It's been one of the best architectural decisions we've made.
The Workers Runtime
When you're building an AI-powered platform that needs to process GitHub webhooks, run semantic search across millions of lines of code, and respond to natural language queries in real time, you need speed. Cloudflare Workers gave us that. With V8 isolates instead of containers, we're talking sub-5ms cold starts globally. Not the seconds a Lambda cold start can take -- 5 _milliseconds_. When a GitHub webhook hits our API, we're already processing it before Lambda would even wake up.
V8 Isolates vs Node.js
Here's where it gets interesting. Most serverless platforms, including AWS Lambda, run your code in containers or virtual machines. Each function gets its own isolated environment, which sounds great until you realize the overhead involved.
Traditional Node.js serverless (like Lambda):
Each function runs in its own container/VM
Full Node.js runtime per function instance
Cold starts measured in seconds
Memory overhead of ~50-100MB per instance
Process-level isolation
Cloudflare Workers with V8 Isolates:
Hundreds of isolates run within a single V8 runtime
Shared JavaScript engine across all functions
Cold starts measured in milliseconds
Memory overhead of ~5-10MB per isolate
V8-level isolation (same security model as Chrome tabs)
Think of it like the difference between giving each person their own house (containers) versus having hundreds of secure apartments in the same building (isolates). You get effectively the same security and isolation but with dramatically less overhead.
A single Workers runtime instance can handle thousands of isolates simultaneously, seamlessly switching between them in microseconds. When your code needs to run, it's not spinning up a new container or booting a Node.js process -- it's just creating a lightweight context within an already-running V8 engine.
This is why we can process GitHub webhooks in under 10ms globally while traditional serverless is still waking up.
What you lose:
Full Node.js standard library (no `fs`, limited `os`, no native modules)
Some npm packages that rely heavily on Node.js internals (this became an issue when trying to implement tree-sitter for code parsing -- more on that later...)
Direct file system access (everything goes through bindings)
Long-running processes (30-second CPU time limit)
What Cloudflare provides:
Native support for most common Node.js APIs (crypto, buffer, streams, HTTP)
Polyfills for unsupported APIs via the `nodejs_compat` flag
Web standard APIs that often work better than Node.js equivalents
Automatic bundling that handles most compatibility issues
In practice? It's rarely a problem. The `nodejs_compat` flag handles most edge cases, and when it doesn't, there's usually a better web-standard alternative.
```typescript
// ✅ This works fine in Workers (with the nodejs_compat flag)
import { createHash } from "crypto";
import { Buffer } from "buffer";

// ❌ This doesn't (but you don't need it on the edge)
import fs from "fs"; // No file system access
import os from "os"; // Limited OS APIs

// ✅ Use bindings instead (inside a handler, where `env` holds your bindings)
const file = await env.BUCKET.get("data.json"); // Better than fs
```
We've found that 95% of what we wanted to do "just works." The remaining 5% usually led us to better architectural decisions anyway.
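As one example of the web-standard route: instead of Node's `createHash`, the Web Crypto API (`crypto.subtle.digest`) is built into Workers with no compatibility flag at all. Here's a minimal sketch of a SHA-256 hex helper; the name `sha256Hex` is our own, and the same code also runs in modern Node.js (v19+, where `crypto` is global).

```typescript
// Hash a string with the Web Crypto API and return lowercase hex.
// Works in Workers without nodejs_compat; in Node.js it needs v19+ for global `crypto`.
async function sha256Hex(text: string): Promise<string> {
  const bytes = new TextEncoder().encode(text);
  const digest = await crypto.subtle.digest("SHA-256", bytes);
  // Convert each byte of the resulting ArrayBuffer to two hex characters
  return Array.from(new Uint8Array(digest), (b) => b.toString(16).padStart(2, "0")).join("");
}

// Usage (inside an async handler):
// const cacheKey = `doc:${await sha256Hex(contents)}`;
```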
Bindings: The Secret Sauce Nobody Talks About
You know what kills performance in serverless? Network calls. Every time you need to hit a database, call another service, or fetch from storage, you're adding latency. AWS makes you jump through VPC hoops, configure security groups, and pray to the networking gods. If you know, you know.
Enter bindings. These are pre-configured, pre-authenticated connections between a Worker and other Cloudflare services, exposed as objects on `env`. No connection strings. No credentials. No authentication dance. Just the purest form of syntactic sugar and automagical, instant access.
```typescript
// On AWS (v2 SDK), you'd do this:
const s3 = new AWS.S3({ region: 'us-west-2', credentials: {...} });
await s3.getObject({ Bucket: 'my-bucket', Key: 'file.txt' }).promise();

// We just do this:
const file = await env.KASAVA_PROFILE_DOCUMENTS_BUCKET.get('file.txt');
```
This is a simplified example, but the real value becomes apparent when you have bindings for everything in your stack and can treat them as ordinary services within your existing codebase:
R2 Buckets: 5 different buckets for documents, recordings, archives
KV Namespaces: Session storage and embedding cache
Queues: 18 queues (9 main + 9 DLQ) for async processing
Durable Objects: Real-time chat sessions and distributed coordination
Vectorize: Experimental vector indexes for semantic search
Each binding is declared once in the Wrangler config and is then just there on `env`, available instantly, with no setup code.
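Because bindings arrive as plain objects on `env`, handler code can be written against a small interface and exercised with an in-memory fake in tests. A minimal sketch, assuming only the `get(key)` method of an R2 bucket binding (which resolves to an object body with `.text()`, or `null` when the key doesn't exist); `BucketLike`, `handleGetDocument`, and `memoryBucket` are hypothetical names, not Cloudflare APIs.

```typescript
// Stand-in types for the subset of the R2 bucket binding used here (hypothetical names).
interface ObjectBodyLike {
  text(): Promise<string>;
}
interface BucketLike {
  get(key: string): Promise<ObjectBodyLike | null>;
}
interface Env {
  KASAVA_PROFILE_DOCUMENTS_BUCKET: BucketLike;
}

// Handler logic depends only on what's in `env`, not on how the binding is provided.
async function handleGetDocument(key: string, env: Env): Promise<Response> {
  const object = await env.KASAVA_PROFILE_DOCUMENTS_BUCKET.get(key);
  if (object === null) return new Response("Not found", { status: 404 });
  return new Response(await object.text(), { status: 200 });
}

// In-memory fake for unit tests: same shape as the binding, backed by a Map.
function memoryBucket(data: Record<string, string>): BucketLike {
  const store = new Map(Object.entries(data));
  return {
    async get(key) {
      const value = store.get(key);
      return value === undefined ? null : { text: async () => value };
    },
  };
}
```

In production the runtime supplies the real bucket on `env`; in tests you pass `{ KASAVA_PROFILE_DOCUMENTS_BUCKET: memoryBucket({...}) }` instead, with no network or credentials involved either way.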
R2: The S3 Killer with Zero Egress Fees
Let's talk about the elephant in the room -- egress fees. AWS charges roughly $90 per TB for data transferred out of S3 to the internet. If you're serving videos, documents, or any kind of media, this adds up fast. Really fast.
Cloudflare R2? Zero. Zilch. Nada. Free egress.
We store everything in R2:
Chrome extension recordings (up to 100MB per bug report)
Organization documents
GitHub comment archives
Profile documents
Indexing artifacts from our code analysis
```typescript
// Storing a 50MB screen recording
await env.RECORDING_STORAGE.put(`recordings/${bugReportId}/screen.webm`, recording, {
  httpMetadata: { contentType: "video/webm" },
  customMetadata: {
    userId,
    timestamp: Date.now().toString(),
  },
});

// Cost on AWS S3: Storage + egress fees every time someone views it
// Cost on R2: Just storage ($15/TB/month), zero egress
```
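To make the egress difference concrete, here's the arithmetic using the list prices quoted in this post ($23/TB/month storage plus $90/TB egress on S3, $15/TB/month storage on R2). It's a rough sketch that ignores request charges, free tiers, and volume discounts; `monthlyStorageCost` is a hypothetical helper.

```typescript
// Rough monthly cost in USD using the per-TB prices quoted in this post.
// Ignores request fees, free tiers, and tiered egress discounts.
const PRICES = {
  s3: { storagePerTB: 23, egressPerTB: 90 },
  r2: { storagePerTB: 15, egressPerTB: 0 },
} as const;

type Provider = keyof typeof PRICES;

function monthlyStorageCost(provider: Provider, storedTB: number, egressTB: number): number {
  const p = PRICES[provider];
  return storedTB * p.storagePerTB + egressTB * p.egressPerTB;
}

// Example: 2 TB of recordings, each viewed ~5 times a month (10 TB egress)
console.log(monthlyStorageCost("s3", 2, 10)); // 946
console.log(monthlyStorageCost("r2", 2, 10)); // 30
```

The storage line items are close; it's the egress term, which grows with every view, that dominates the S3 bill.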
Queues: Async Processing That Actually Works
Here's something nobody tells you about serverless -- handling async work is a pain. With Lambda, you're either chaining functions together (and paying for wait time), using SQS (more complexity), or giving up and running ECS tasks.
Cloudflare Queues integrate directly with Workers. No external services, no additional authentication, just push to a queue and consume from it.
```typescript
// One Worker can export both handlers: `fetch` produces, `queue` consumes.
export default {
  // Producer side - instant response to webhook
  async fetch(request: Request, env: Env): Promise<Response> {
    const event = await request.json();
    // Queue it and return immediately
    await env.GITHUB_EVENT_QUEUE.send(event);
    return new Response("OK", { status: 200 });
  },

  // Consumer side - process in background
  async queue(batch: MessageBatch, env: Env): Promise<void> {
    for (const message of batch.messages) {
      await processGitHubEvent(message.body);
      message.ack(); // Mark as processed
    }
  },
};
```
Four of the main queues carry the core workloads:
repository-indexing: Orchestrates parallel code analysis (processes 10,000+ files in under 5 minutes!)
file-indexing: 50-file batches with 100 concurrent workers
embedding-generation: 128-text batches for Voyage AI
github-events: Primary webhook processing
Each has a matching DLQ for failed message handling.
Each queue has its own configuration -- batch sizes, timeouts, retry policies. All managed through `wrangler.jsonc`, deployed with a single command.
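One refinement worth considering on the consumer side is acknowledging messages individually, so a single failing event doesn't cause the rest of its batch to be redelivered. Queue messages in Workers expose `ack()` and `retry()` for exactly this; retried messages count against the queue's retry limit and then land in the DLQ if one is configured. The sketch below is one way to structure it (not necessarily how our consumers are written), using minimal stand-in types (`QueueMessage`, `QueueBatch`) and a hypothetical `consumeBatch` helper so it runs outside Workers. It handles a batch's messages concurrently.

```typescript
// Minimal stand-ins for the Workers queue message/batch types (hypothetical names).
interface QueueMessage<T> {
  readonly body: T;
  ack(): void;
  retry(): void;
}
interface QueueBatch<T> {
  readonly messages: readonly QueueMessage<T>[];
}

// Handle every message in the batch concurrently, then ack successes and
// retry only the failures, so one bad event doesn't redeliver its neighbors.
async function consumeBatch<T>(
  batch: QueueBatch<T>,
  handle: (body: T) => Promise<void>,
): Promise<void> {
  const results = await Promise.allSettled(batch.messages.map((m) => handle(m.body)));
  // allSettled preserves input order, so index i maps back to message i
  results.forEach((result, i) => {
    const message = batch.messages[i];
    if (result.status === "fulfilled") message.ack();
    else message.retry();
  });
}
```

Inside a Worker's `queue` handler this would be called as something like `await consumeBatch(batch, (body) => processGitHubEvent(body))`. If per-message work must stay ordered, swap `Promise.allSettled` for a sequential loop with a `try`/`catch` around each message.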
Durable Objects: The Distributed Systems Cheat Code
This is where things get really interesting. We needed to coordinate indexing jobs across 100+ parallel workers without database contention. Traditional approach? Distributed locks, Redis, maybe ZooKeeper if you hate yourself.
Cloudflare's answer? Durable Objects.
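```typescript
interface Job { id: string; workerId?: string; claimedAt?: number }

export class IndexingCoordinator {
  private state: DurableObjectState;
  private jobs = new Map<string, Job>();

  constructor(state: DurableObjectState, _env: Env) {
    this.state = state;
  }

  async claimJob(workerId: string): Promise<Job | null> {
    // This runs in exactly one place globally
    // No race conditions, no distributed locks
    const availableJob = this.findAvailableJob();
    if (availableJob) {
      availableJob.workerId = workerId;
      availableJob.claimedAt = Date.now();
      await this.state.storage.put(`job:${availableJob.id}`, availableJob);
    }
    return availableJob;
  }

  private findAvailableJob(): Job | null {
    // First job that no worker has claimed yet
    for (const job of this.jobs.values()) {
      if (job.workerId === undefined) return job;
    }
    return null;
  }
}
```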
Each Durable Object is a single-threaded JavaScript environment that's globally unique. Perfect for:
Coordinating our parallel indexing workers
Managing WebSocket connections for real-time chat
Maintaining session state without a database
No distributed locks. No cross-instance races. Just JavaScript running in exactly one place.
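The claim logic deserves a closer look, because "single-threaded" only helps if the state change happens before the code yields at an `await`. Here's a plain-TypeScript model of that pattern (not the Durable Objects API itself): `JobBoard` and its injected `persist` function are hypothetical stand-ins for the coordinator and `state.storage.put`. The Durable Objects runtime adds its own safeguards around storage calls, which this model doesn't simulate.

```typescript
interface Job {
  id: string;
  workerId?: string;
  claimedAt?: number;
}

// Stand-in for `state.storage.put` so the model runs outside Workers (hypothetical).
type Persist = (key: string, job: Job) => Promise<void>;

class JobBoard {
  private jobs = new Map<string, Job>();
  private persist: Persist;

  constructor(persist: Persist) {
    this.persist = persist;
  }

  add(id: string): void {
    this.jobs.set(id, { id });
  }

  async claim(workerId: string): Promise<Job | null> {
    // Find and mark the job synchronously, before the first `await`...
    const job = this.findUnclaimed();
    if (job === undefined) return null;
    job.workerId = workerId;
    job.claimedAt = Date.now();
    // ...so any claim() that runs while this write is pending already sees it as taken.
    await this.persist(`job:${job.id}`, job);
    return job;
  }

  private findUnclaimed(): Job | undefined {
    for (const job of this.jobs.values()) {
      if (job.workerId === undefined) return job;
    }
    return undefined;
  }
}
```

Three concurrent claims against two jobs yield two distinct jobs and one `null`. If the marking happened after the `await` instead, two workers could walk away with the same job.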
KV: The Cache That's Actually Simple
Redis is great. Until you have to manage it. Provision it. Scale it. Deal with connection pools. Handle failover.
Cloudflare KV is just... there. A globally distributed key-value store with no configuration.
```typescript
// Caching embedding results (hashText and generateEmbedding are app-specific helpers)
async function getEmbedding(content: string, model: string, env: Env) {
  const cacheKey = `embedding:${model}:${hashText(content)}`;
  const cached = await env.EMBEDDING_CACHE.get(cacheKey);
  if (cached) {
    return JSON.parse(cached); // Sub-10ms for hot keys
  }
  const embedding = await generateEmbedding(content);
  await env.EMBEDDING_CACHE.put(
    cacheKey,
    JSON.stringify(embedding),
    { expirationTtl: 3600 } // 1 hour TTL
  );
  return embedding;
}
```
We cache everything:
Session data (15-minute TTL)
Embedding results (1-hour TTL)
API responses
Search results
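Since the same get-or-compute shape repeats across every one of these caches, it factors naturally into one helper. A minimal sketch, assuming a KV-like interface with just `get` and `put` (`KVLike`, `TTL`, and `getOrCompute` are hypothetical names); the TTL values come from the list above. Keep in mind KV is eventually consistent, so a freshly written value can take a while to appear in other locations, which is fine for caches like these.

```typescript
// Minimal subset of the KV namespace binding used here (stand-in type so this runs anywhere).
interface KVLike {
  get(key: string): Promise<string | null>;
  put(key: string, value: string, options?: { expirationTtl?: number }): Promise<void>;
}

// TTLs from this post, in seconds (KV's expirationTtl is expressed in seconds).
const TTL = { session: 15 * 60, embedding: 60 * 60 } as const;

// Cache-aside: return the cached JSON value if present; otherwise compute it,
// store it with a TTL, and return it.
async function getOrCompute<T>(
  kv: KVLike,
  key: string,
  ttlSeconds: number,
  compute: () => Promise<T>,
): Promise<T> {
  const hit = await kv.get(key);
  if (hit !== null) return JSON.parse(hit) as T;
  const value = await compute();
  // KV rejects TTLs under 60 seconds, so clamp to that floor.
  await kv.put(key, JSON.stringify(value), { expirationTtl: Math.max(60, ttlSeconds) });
  return value;
}
```

With a real binding, `env.EMBEDDING_CACHE` can be passed as the `kv` argument, since its `get(key)` and `put(key, value, { expirationTtl })` match this shape.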
The Edge Positioning Advantage
Here's the thing about running on the edge -- your code runs where your users are. Not in us-east-1. Not in three regions you carefully selected. Everywhere.
When someone in Tokyo hits our API, they're hitting a worker in Tokyo. Someone in São Paulo? Worker in São Paulo. No CDN configuration, no geo-routing rules. It just works.
This matters more than you think. We've seen:
200ms faster response times compared to centralized deployments
Consistent performance regardless of user location
Natural resilience (if one location has issues, traffic routes elsewhere)
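A quick way to see edge placement for yourself: the Workers runtime attaches a `cf` object to incoming requests, and `request.cf.colo` holds the airport-style code of the Cloudflare data center that handled the request. A minimal sketch that echoes it back in a response header; the `x-edge-colo` header name and the `withColoHeader` helper are hypothetical, and `cf` is typed loosely here so the code also runs outside Workers.

```typescript
// `cf` is populated by the Workers runtime; typed loosely so this sketch runs anywhere.
type EdgeRequest = Request & { cf?: { colo?: string } };

// Copy the response and add a header naming the data center that served it.
function withColoHeader(request: EdgeRequest, response: Response): Response {
  const headers = new Headers(response.headers);
  headers.set("x-edge-colo", request.cf?.colo ?? "unknown");
  return new Response(response.body, {
    status: response.status,
    statusText: response.statusText,
    headers,
  });
}
```

Hitting an endpoint wrapped this way from different regions is a cheap sanity check that requests really are being served close to the caller.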
The Economics of It All
Let's talk money. Because at the end of the day, this stuff has to make business sense.
Traditional AWS Setup:
Lambda: Pay for execution time + cold starts
S3: $23/TB storage + $90/TB egress
SQS: $0.40 per million messages
ElastiCache: Starting at $15/month
CloudFront: Complex pricing, egress fees
NAT Gateway: $45/month + data processing
Our Cloudflare Setup:
Workers: 100k requests/day free, then $0.15 per million
R2: $15/TB storage, **zero egress**
Queues: Included with Workers
KV: 100k reads/day free
Durable Objects: $0.15 per million requests
Everything runs on the edge
We're saving 60-80% compared to an equivalent AWS setup. But more importantly, we're not managing infrastructure. No VPCs, no security groups, no capacity planning.
What We Learned
After six months of running everything on Cloudflare, here's what we've learned:
The Good:
Performance is incredible (sub-5ms cold starts still blow my mind)
Zero egress fees change how you think about architecture
Bindings eliminate entire categories of problems
Global by default is powerful
The simplicity is addictive
The Tradeoffs:
128MB memory limit per worker (but you'd be surprised what fits)
30-second CPU time limit (queues handle long-running tasks)
Different mental model from traditional serverless
Some services still experimental (Vectorize)
The Unexpected:
Durable Objects solved problems we didn't know we had
Queue coordination is smoother than any message broker we've used
The platform keeps getting better (recent 10x Queue performance improvement!)
Why This Matters
Is it perfect? No. Would I build Kasava on AWS if I had to start over? Also no. Sometimes the best architectural decision isn't about choosing the most popular option or the one with the most features. Sometimes it's about choosing the one that lets you ship fast, iterate quickly, and sleep at night knowing your infrastructure just works.
For us, that's Cloudflare. All in on the edge, and not looking back.
——
Next in the Series:
Part 2: Hono + Dynamic Loading - How We Fit an AI Platform in 128MB - Discover why we chose Hono over Express/Fastify and how dynamic loading lets us run 50+ endpoints in Workers' memory constraints.
Part 3: How We Built an AI Pipeline on the Edge - Dive deep into our AI infrastructure implementation, featuring Voyage AI embeddings, pgvector for semantic search, and the economic implications of edge computing for AI startups.
Part 4: From LangGraph to Mastra - Our AI Orchestration Journey - Learn why we migrated from LangGraph to Mastra for AI workflow orchestration, and how this TypeScript-first framework transformed our development velocity.
Kasava
Product
Company
Kasava. All rights reserved. © 2025