
Documentation Index

Fetch the complete documentation index at: https://docs.strait.dev/llms.txt

Use this file to discover all available pages before exploring further.

Reference Benchmarks

These benchmarks are from our internal load tests. Your results will vary based on hardware, network, and workload mix. Run the load testing suite on your own infrastructure for accurate numbers.

Local Development Benchmarks

Tested on a MacBook Pro (M-series, shared CPU) with Strait running at DB_MAX_CONNS=50 and WORKER_CONCURRENCY=25, and with PostgreSQL and Redis co-located.
| Test | Result | Bottleneck |
|---|---|---|
| Throughput Ceiling | 70 jobs/sec sustained, breaks at 80 | Queue depth > 10K |
| Concurrency Ceiling | 350 concurrent, breaks at 400+ | Test context timeout |
| Quick Validation | 80 jobs/sec sustained | P99 latency > 5s at 90/sec |
| Total Operations (Throughput) | 21,594 with 0 errors | - |
| Total Operations (Concurrency) | 350,863 with 0 errors | - |
These numbers represent a floor, not a ceiling: production deployments on dedicated hardware with DB_MAX_CONNS=100 should achieve significantly higher throughput.

Hardware Sizing Guide

| Customers | Runs/Day | Recommended Hardware | PostgreSQL | Redis |
|---|---|---|---|---|
| 1-100 | < 50K | 1 vCPU, 1GB RAM | Shared instance | Shared instance |
| 100-500 | 50-500K | 2 vCPU, 4GB RAM | Dedicated, 2 vCPU | Dedicated, 1GB |
| 500-2,000 | 500K-5M | 4 vCPU, 8GB RAM | Dedicated + read replica | Dedicated, 2GB |
| 2,000+ | 5M+ | Horizontal workers | Dedicated + read replicas | Cluster mode |
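The tiers above can be encoded as a quick sizing lookup. A minimal sketch in Go; the `sizingTier` helper and its return strings are illustrative, not part of Strait:

```go
package main

import "fmt"

// sizingTier mirrors the sizing table: it picks a recommended tier
// from expected runs per day.
func sizingTier(runsPerDay int) string {
	switch {
	case runsPerDay < 50_000:
		return "1 vCPU, 1GB RAM (shared PostgreSQL/Redis)"
	case runsPerDay < 500_000:
		return "2 vCPU, 4GB RAM (dedicated PostgreSQL, 2 vCPU; Redis 1GB)"
	case runsPerDay < 5_000_000:
		return "4 vCPU, 8GB RAM (PostgreSQL + read replica; Redis 2GB)"
	default:
		return "horizontal workers (PostgreSQL read replicas; Redis cluster mode)"
	}
}

func main() {
	fmt.Println(sizingTier(120_000))
}
```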

PostgreSQL Tuning

Connection Pool

The connection pool is the most common bottleneck. Strait uses pgx/v5 for connection pooling.
| Setting | Default | Recommended (< 500 tenants) | Recommended (> 500 tenants) |
|---|---|---|---|
| DB_MAX_CONNS | 50 | 50 | 50-100 |
| DB_MIN_CONNS | 10 | 10 | 10-25 |
| DB_MAX_CONN_LIFETIME | 30m | 30m | 15m |
| DB_MAX_CONN_IDLE_TIME | 5m | 5m | 2m |
| DB_STATEMENT_TIMEOUT | 30s | 30s | 15s |

Connection Budget

When running multiple Fly machines, the total number of database connections is DB_MAX_CONNS * number_of_machines. Ensure your PostgreSQL max_connections exceeds this sum. For example, with DB_MAX_CONNS=100 and 4 machines, you need at least 400 connections on the primary. PlanetScale PostgreSQL supports 1000 connections per primary and per replica by default.
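The budget check is simple arithmetic; a minimal sketch (the `connectionBudget` function name is illustrative):

```go
package main

import "fmt"

// connectionBudget computes the total PostgreSQL connections a deployment
// will open: DB_MAX_CONNS per machine, summed across all machines.
func connectionBudget(dbMaxConns, machines int) int {
	return dbMaxConns * machines
}

func main() {
	// The example from the text: DB_MAX_CONNS=100 across 4 machines.
	need := connectionBudget(100, 4)
	fmt.Printf("need %d connections; max_connections must exceed this\n", need)
}
```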

PostgreSQL Server

```
# postgresql.conf recommendations for Strait workloads
shared_buffers = 256MB          # 25% of available RAM
effective_cache_size = 768MB    # 75% of available RAM
work_mem = 16MB                 # Per-operation memory
maintenance_work_mem = 128MB    # For VACUUM, CREATE INDEX

# WAL settings
wal_buffers = 16MB
checkpoint_completion_target = 0.9
max_wal_size = 2GB

# Connection limits
max_connections = 200           # Must exceed DB_MAX_CONNS * worker_count
```
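The 25% / 75% rule of thumb in the config comments can be computed mechanically for any instance size. A small sketch (the `pgMemorySettings` helper is illustrative):

```go
package main

import "fmt"

// pgMemorySettings derives the two headline postgresql.conf values from
// available RAM, following the 25% / 75% rule of thumb above.
func pgMemorySettings(totalRAMMB int) (sharedBuffersMB, effectiveCacheMB int) {
	return totalRAMMB * 25 / 100, totalRAMMB * 75 / 100
}

func main() {
	// A 1GB instance, as in the example config above.
	sb, ec := pgMemorySettings(1024)
	fmt.Printf("shared_buffers = %dMB\neffective_cache_size = %dMB\n", sb, ec)
}
```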

Redis Tuning

| Setting | Default | Recommended |
|---|---|---|
| maxmemory | No limit | 512MB-2GB |
| maxmemory-policy | noeviction | noeviction (Strait manages TTLs) |
| timeout | 0 | 300 |
| tcp-keepalive | 300 | 60 |
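Translated into redis.conf syntax, the recommendations look like this; the 1gb cap is an example value within the 512MB-2GB range, not a Strait default:

```
# redis.conf recommendations for Strait workloads
maxmemory 1gb                  # cap memory; 512MB-2GB depending on instance
maxmemory-policy noeviction    # Strait manages TTLs; never evict silently
timeout 300                    # close idle client connections after 300s
tcp-keepalive 60               # detect dead peers faster than the 300s default
```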

Worker Configuration

| Setting | Default | Effect |
|---|---|---|
| WORKER_CONCURRENCY | 25 | Starting parallel job executions per worker (auto-scales to ADAPTIVE_CONCURRENCY_MAX) |
| MAX_DEQUEUE_BATCH_SIZE | 10 | Jobs claimed per dequeue cycle |
| DEQUEUE_STRATEGY | priority | priority or fifo |
| DEFAULT_JOB_TIMEOUT_SECS | 300 | Default job timeout |
| DEFAULT_JOB_MAX_ATTEMPTS | 3 | Default retry count |

Scaling Workers

For horizontal scaling, run multiple worker processes:
```shell
# Worker 1
strait --mode worker

# Worker 2
strait --mode worker

# API (separate)
strait --mode api
```
Each worker independently dequeues from PostgreSQL using SELECT ... FOR UPDATE SKIP LOCKED, so they naturally load-balance.
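The claim-a-batch pattern can be sketched with database/sql. This is an illustration of the SKIP LOCKED technique, not Strait's actual schema: the table and column names (jobs, status, priority, claimed_at) are assumptions.

```go
package main

import (
	"database/sql"
	"fmt"
	"strings"
)

// claimSQL claims a batch atomically: because locked rows are skipped
// rather than waited on, concurrent workers never block on or
// double-claim the same rows.
const claimSQL = `
UPDATE jobs
SET status = 'running', claimed_at = now()
WHERE id IN (
    SELECT id FROM jobs
    WHERE status = 'pending'
    ORDER BY priority DESC, created_at
    FOR UPDATE SKIP LOCKED
    LIMIT $1
)
RETURNING id`

// queryUsesSkipLocked is a trivial sanity check used in main.
func queryUsesSkipLocked() bool {
	return strings.Contains(claimSQL, "FOR UPDATE SKIP LOCKED")
}

// claimJobs runs the claim query and returns the claimed job IDs.
func claimJobs(db *sql.DB, batch int) ([]int64, error) {
	rows, err := db.Query(claimSQL, batch)
	if err != nil {
		return nil, err
	}
	defer rows.Close()
	var ids []int64
	for rows.Next() {
		var id int64
		if err := rows.Scan(&id); err != nil {
			return nil, err
		}
		ids = append(ids, id)
	}
	return ids, rows.Err()
}

func main() {
	fmt.Println("skip-locked claim query ready:", queryUsesSkipLocked())
}
```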

Cost Estimation

Compute Cost per 1M Runs

| Component | HTTP-Mode Jobs | Managed (Docker) Jobs |
|---|---|---|
| Strait worker CPU | ~0.5 vCPU-hours | ~2 vCPU-hours |
| PostgreSQL IOPS | ~50K reads, ~20K writes | ~80K reads, ~30K writes |
| Redis operations | ~100K commands | ~200K commands |
| Network egress | ~1GB | ~5GB |
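These per-1M figures scale linearly with run volume, so estimating worker CPU for a given month is a one-liner. A sketch (the `estimateWorkerCPU` helper is illustrative):

```go
package main

import "fmt"

// estimateWorkerCPU scales the per-1M-runs figures from the table above:
// ~0.5 vCPU-hours per 1M HTTP-mode runs, ~2 for managed (Docker) runs.
func estimateWorkerCPU(runs int, managed bool) float64 {
	perMillion := 0.5
	if managed {
		perMillion = 2.0
	}
	return float64(runs) / 1_000_000 * perMillion
}

func main() {
	fmt.Printf("10M HTTP-mode runs: ~%.1f vCPU-hours\n", estimateWorkerCPU(10_000_000, false))
}
```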

Monitoring Key Metrics

Track these metrics to predict when you need to scale:
  1. Queue depth - If consistently > 0, add workers
  2. DB connection wait count - If increasing, raise DB_MAX_CONNS
  3. Worker CPU utilization - If > 70%, add worker instances
  4. P99 latency trend - If increasing over days, investigate query performance
  5. Memory RSS - If growing linearly, check for leaks
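The thresholds above can be turned into an automated check, e.g. in an alerting sidecar. A minimal sketch; the Metrics struct and its field names are assumptions, not Strait's metrics API:

```go
package main

import "fmt"

// Metrics holds a snapshot of the signals listed above.
type Metrics struct {
	QueueDepth     int
	ConnWaitRising bool
	WorkerCPUPct   float64
	P99Rising      bool
}

// scalingSignals maps each metric to the scaling action from the list.
func scalingSignals(m Metrics) []string {
	var actions []string
	if m.QueueDepth > 0 {
		actions = append(actions, "add workers (queue depth consistently > 0)")
	}
	if m.ConnWaitRising {
		actions = append(actions, "raise DB_MAX_CONNS (connection wait count increasing)")
	}
	if m.WorkerCPUPct > 70 {
		actions = append(actions, "add worker instances (CPU > 70%)")
	}
	if m.P99Rising {
		actions = append(actions, "investigate query performance (P99 trending up)")
	}
	return actions
}

func main() {
	fmt.Println(scalingSignals(Metrics{QueueDepth: 500, WorkerCPUPct: 85}))
}
```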

Fly.io Deployment Sizing

For Fly.io deployments:
| Scale | Machine Size | Count | Region |
|---|---|---|---|
| Starter | shared-cpu-2x (1024MB) | 1 combined | Single |
| Growth | shared-cpu-2x (2048MB) | 1 API + 1 worker | Single |
| Scale | performance-2x (4GB) | 1 API + 2 workers | Multi-region |
| Enterprise | performance-4x (8GB) | 2 API + 4 workers | Multi-region |

Running Your Own Benchmarks

Use the included load testing framework with Grafana dashboards for real-time visualization:
```shell
# Start the load test environment (Postgres, Redis, Prometheus, Grafana)
cd apps/strait
docker compose -f docker-compose.loadtest.yml up -d

# Start Strait
DATABASE_URL="postgres://strait:strait@localhost:5432/strait?sslmode=disable" \
REDIS_URL="redis://localhost:6379" go run ./cmd/strait

# Run the quick validation
LOADTEST_QUICK=true go test -tags=loadtest -run TestQuickValidation \
  -timeout 15m ./internal/loadtest/...

# View results in Grafana
open http://localhost:3001
```