Prerequisites

  • Docker running (for test job containers)
  • Go 1.26+
  • 8GB+ RAM free
  • The Strait source code

Quick Start (15 minutes)

Build the test job images and run a quick validation:
# Build test job images (run from the repository root; cd ../.. returns there afterwards)
cd packages/load-tests && make build && cd ../..

# Run quick validation (finds approximate throughput ceiling)
cd apps/strait
LOADTEST_QUICK=true go test -tags=loadtest -run TestQuickValidation \
  -timeout 15m ./internal/loadtest/...

Test Job Images

The framework includes real workloads in Python, TypeScript, and Go:

| Image | Language | What It Does |
|---|---|---|
| strait-loadtest-python | Python 3.12 | Fast processing, CPU-intensive work, AI agent simulation |
| strait-loadtest-ts | TypeScript (Node 22) | Data pipeline with a 10K-record transform |
| strait-loadtest-go | Go 1.26 | Memory allocation for OOM testing |
| strait-loadtest-errors | Python 3.12 | 12 failure scenarios (OOM, segfault, infinite loop, etc.) |

Build all images:
cd packages/load-tests && make build

Full Test Suite

Run the commands in this section from apps/strait.

Tier 1: Throughput Ceiling

Finds the maximum sustained throughput in jobs/sec. Starts at 10 jobs/sec and adds 10 jobs/sec every 60 seconds until a stop condition is hit.
go test -tags=loadtest -run TestThroughputCeiling -timeout 2h ./internal/loadtest/...
Stop conditions: queue depth > 10K, P99 latency > 5s, or error rate > 1%.

Tier 2: Concurrency Ceiling

Finds the maximum number of concurrent connections the system sustains. Starts at 50 concurrent connections and adds 50 every 2 minutes.
go test -tags=loadtest -run TestConcurrencyCeiling -timeout 1h ./internal/loadtest/...

Tier 3: Multi-Tenant Simulation

Simulates production-like load with hundreds of tenants, mixed plans, and varied traffic patterns.
# 500 tenants, 4 hours
go test -tags=loadtest -run TestProductionSimulation -timeout 6h ./internal/loadtest/...

# 2,000 tenants, 8 hours
LOADTEST_TENANTS=2000 LOADTEST_DURATION=8h \
  go test -tags=loadtest -run TestProductionSimulation -timeout 10h ./internal/loadtest/...

Tier 3: Breaking Point

Adds 100 tenants every 30 minutes until performance degrades.
go test -tags=loadtest -run TestBreakingPoint -timeout 12h ./internal/loadtest/...

Tier 4: Endurance (24 hours)

Runs at 70% of the measured throughput ceiling for 24 hours. Detects memory leaks, goroutine leaks, and performance drift.
LOADTEST_DURATION=24h go test -tags=loadtest -run TestEndurance -timeout 26h ./internal/loadtest/...
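Leak detection in an endurance run usually comes down to whether a resource metric trends upward over time. A minimal sketch, assuming heap samples in MB taken every few hours and a hypothetical tolerance of 1 MB/hour, fits a least-squares slope to the series. The harness's actual method and thresholds may differ.

```go
package main

import "fmt"

// slope returns the least-squares slope of ys against xs (units of y per unit of x).
func slope(xs, ys []float64) float64 {
	n := float64(len(xs))
	var sx, sy, sxy, sxx float64
	for i := range xs {
		sx += xs[i]
		sy += ys[i]
		sxy += xs[i] * ys[i]
		sxx += xs[i] * xs[i]
	}
	den := n*sxx - sx*sx
	if den == 0 {
		return 0 // all samples at the same x: no trend can be fitted
	}
	return (n*sxy - sx*sy) / den
}

// looksLikeLeak flags a series whose fitted growth exceeds maxPerHour.
func looksLikeLeak(hours, heapMB []float64, maxPerHour float64) bool {
	return slope(hours, heapMB) > maxPerHour
}

func main() {
	hours := []float64{0, 4, 8, 12, 16, 20, 24}
	flat := []float64{210, 214, 209, 212, 211, 213, 210}    // noisy but flat
	growing := []float64{210, 250, 290, 330, 370, 410, 450} // +10 MB/hour

	fmt.Println(looksLikeLeak(hours, flat, 1.0))    // false
	fmt.Println(looksLikeLeak(hours, growing, 1.0)) // true
}
```

The same check applies to goroutine counts or connection counts; fitting a slope is less sensitive to GC-driven noise than comparing the first and last samples.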

Tier 5: Chaos Engineering

Deliberately injects failures while the system is under production-level load. It covers 8 scenarios: worker kill, database failover, Redis failure, Docker restart, connection pool exhaustion, disk pressure, clock skew, and cascading failure.
go test -tags=loadtest -run TestChaosAll -timeout 4h ./internal/loadtest/...

Error Scenarios

Tests the 12 failure scenarios in the strait-loadtest-errors image, including clean exit, exit codes, OOM, segfault, infinite loop, slow death, checkpoint recovery, SDK timeout, fork bomb, disk fill, and network abuse.
go test -tags=loadtest -run TestErrorScenarios -timeout 1h ./internal/loadtest/...

Generating Reports

After running tests, generate HTML and JSON reports:
go run -tags=loadtest ./internal/loadtest/cmd/report \
  -input loadtest-results/latest/ \
  -html report.html -json report.json
The HTML report includes:
  • Executive summary with key metrics
  • Throughput and concurrency ramp tables
  • Multi-tenant simulation results
  • Chaos engineering verdicts
  • Error scenario pass/fail matrix

Grafana Dashboard

The load test environment includes a pre-configured Grafana dashboard for real-time monitoring.

Setup

# Start the full load test stack
cd apps/strait
docker compose -f docker-compose.loadtest.yml up -d

# Open Grafana (default login: admin/admin). On Linux, use xdg-open instead of open.
open http://localhost:3001
The dashboard shows:
  • Queue depth and active workers
  • Throughput and dispatch latency (P50/P95/P99)
  • Error rates and worker pool utilization
  • Database connection pool breakdown
  • Webhook delivery metrics
  • Go runtime (goroutines, heap memory, GC pauses)
Prometheus scrapes Strait’s /metrics endpoint every 5 seconds, so panels update in near real-time during load tests.

Understanding Your Results

What each metric means

| Metric | Good | Warning | Action |
|---|---|---|---|
| Max throughput | > 1,000/sec | < 500/sec | Check DB connection pool, query optimization |
| P99 latency | < 500ms | > 2s | Profile hot paths, check indexes |
| Error rate | < 0.01% | > 0.1% | Check logs for root cause |
| Memory trend | Flat over 24h | Linear growth | Check for goroutine or connection leaks |
| Queue depth | Returns to 0 | Growing | Worker count too low or dequeue too slow |

Environment variables

| Variable | Default | Description |
|---|---|---|
| LOADTEST_STRAIT_URL | http://localhost:8080 | Strait API URL |
| LOADTEST_INTERNAL_SECRET | $INTERNAL_SECRET | API authentication secret |
| LOADTEST_DATABASE_URL | $DATABASE_URL | PostgreSQL connection for metrics |
| LOADTEST_REDIS_URL | $REDIS_URL | Redis connection for metrics |
| LOADTEST_QUICK | (unset) | Set to true to run the 15-minute quick validation |
| LOADTEST_TENANTS | 500 | Tenant count for the production simulation |
| LOADTEST_DURATION | 4h | Duration for simulation and endurance tests |
| LOADTEST_TARGET_RATE | auto | Override the target rate for endurance tests |

Tuning Based on Results

| Bottleneck | Symptom | Fix |
|---|---|---|
| Queue throughput | Queue depth growing, dequeue rate flat | Increase DB_MAX_CONNS (default: 50), optimize the dequeue query |
| Concurrency limit | Errors spike at N concurrent | Increase WORKER_CONCURRENCY |
| Memory growth | RSS increases linearly over 24h | Check for leaked goroutines and unclosed connections |
| Webhook delivery | Webhook latency spiking | Increase WEBHOOK_CONCURRENCY, check endpoint health |
| Database connections | wait_count increasing | Increase DB_MAX_CONNS (default: 50), add a connection pooler |
| Redis memory | Used memory > maxmemory | Increase maxmemory, review eviction policy |

The load test harness uses HTTP keep-alives with connection pooling, so measurements match production client behavior.