Mission & Concepts
Why Sluice exists, design philosophy, and key concepts behind SDK-first cooperative rate limiting.
The problem
Multiple Ontopix products share the same vendor API quotas. When audit-service and stats-service both call OpenAI, they draw from the same rate limit pool. Without coordination, one product can exhaust the quota for all others, causing unpredictable 429 errors across the platform.
Why not a proxy, queue, or gateway?
| Approach | Why we rejected it |
|---|---|
| Proxy | Adds a network hop to every vendor call. Increases latency, becomes a single point of failure, and requires its own scaling and monitoring. |
| Queue | Vendor calls are latency-sensitive. Queuing adds delay and makes request-response patterns awkward. Products already have their own SQS queues for retries. |
| Gateway | Correct long-term answer (see ADR-005), but too heavy for v0.1.0. A gateway needs routing, auth, and observability. Sluice solves the immediate coordination problem without that overhead. |
Sluice is SDK-first. Products import a library, call acquire() or slot(), and make vendor calls directly. The only shared infrastructure is a DynamoDB table.
Design philosophy
Cooperative rate limiting. Sluice does not block or reject calls. It tells the caller how many tokens are available and, if none remain, exactly how long to wait. The caller decides whether to sleep inline or requeue.
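The caller-side decision can be sketched as follows. This is a minimal illustration, not the SDK's API. `AcquireResult`, its `granted` field, `decide()`, and the `max_inline_wait` threshold are hypothetical names. Only `retry_in` comes from this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AcquireResult:
    """Hypothetical shape of an acquire() result (not the real SDK type)."""
    granted: bool    # True if tokens were consumed for this call
    retry_in: float  # seconds until capacity is available; 0.0 when granted


def decide(result: AcquireResult, max_inline_wait: float) -> str:
    """Choose what a cooperative caller does with an acquire() result.

    - "proceed": tokens were granted, make the vendor call now.
    - "sleep":   the wait is short, so sleep retry_in seconds and try again.
    - "requeue": the wait is long, so hand the work back to the product's
                 own queue with a delay of roughly retry_in seconds.
    """
    if result.granted:
        return "proceed"
    if result.retry_in <= max_inline_wait:
        return "sleep"
    return "requeue"
```

Sluice itself never makes this choice. The threshold belongs to the product, because only the product knows whether it is serving a user request or draining a background queue.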
Deterministic retry delays. When quota is exhausted, acquire() returns a retry_in value computed from the refill rate. No guessing, no exponential backoff against the vendor. The caller knows precisely when capacity will be available.
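The arithmetic behind a deterministic retry_in can be sketched like this. The example assumes linear refill, expressed here as a hypothetical `refill_per_second` parameter. The page only says the delay is computed from the bucket's refill rate.

```python
def retry_in(tokens: float, cost_per_call: float, refill_per_second: float) -> float:
    """Seconds until the bucket holds enough tokens for one call.

    Assumes tokens refill linearly at refill_per_second. Returns 0.0 when
    the call could proceed immediately.
    """
    if refill_per_second <= 0:
        raise ValueError("refill_per_second must be positive")
    deficit = cost_per_call - tokens  # tokens still missing for one call
    if deficit <= 0:
        return 0.0
    return deficit / refill_per_second
```

Take a hypothetical bucket seeded at 500 requests per minute. It refills at 500/60 ≈ 8.33 tokens per second, so an empty bucket with cost_per_call = 1 yields a retry_in of 0.12 seconds.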
SDK-first. No sidecar, no proxy, no agent. Products add a dependency and configure environment variables. The SDK handles DynamoDB transactions, optimistic locking, and lease management internally.
Key concepts
Dimension
A vendor + metric pair that identifies a specific rate limit. Format: {vendor}#{metric}.
Examples:
- openai#rpm: OpenAI requests per minute
- openai#tpm: OpenAI tokens per minute
- elevenlabs#characters: ElevenLabs character quota
Each dimension is a separate row in DynamoDB with its own capacity and refill rate.
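A small helper shows how a {vendor}#{metric} key splits into its parts. `Dimension`, `parse_dimension()`, and `dimension_key()` are hypothetical. Rejecting a second `#` inside the metric is an assumption, not documented behavior.

```python
from typing import NamedTuple


class Dimension(NamedTuple):
    vendor: str
    metric: str


def parse_dimension(key: str) -> Dimension:
    """Split a '{vendor}#{metric}' key such as 'openai#rpm'."""
    vendor, sep, metric = key.partition("#")
    # Assumption: both parts are non-empty and the metric has no further '#'.
    if not sep or not vendor or not metric or "#" in metric:
        raise ValueError(f"invalid dimension key: {key!r}")
    return Dimension(vendor, metric)


def dimension_key(vendor: str, metric: str) -> str:
    """Build the key used to address a bucket row."""
    return f"{vendor}#{metric}"
```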
Bucket
The DynamoDB row that tracks a dimension's quota state. Contains capacity, current token count, refill rate, and a version counter for optimistic locking. Buckets are seeded by Terraform -- adding a new vendor dimension is a terraform apply, not a code change.
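Optimistic locking on the version counter can be sketched in memory. A plain dict stands in for the DynamoDB table. In DynamoDB itself, this check is typically expressed as a ConditionExpression on the write. The field names and `conditional_put()` are hypothetical and do not reflect the real row schema.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Bucket:
    """Hypothetical in-memory model of one bucket row."""
    dimension: str           # e.g. "openai#rpm"
    capacity: float          # maximum tokens the bucket can hold
    tokens: float            # current token count
    refill_per_second: float
    cost_per_call: float
    version: int             # bumped on every successful write


class VersionConflict(Exception):
    """Raised when another writer updated the row first."""


def conditional_put(table: dict, new: Bucket, expected_version: int) -> None:
    """Write `new` only if the stored row still has `expected_version`.

    A stale writer gets VersionConflict and must re-read and retry,
    instead of silently overwriting a concurrent update.
    """
    current = table[new.dimension]
    if current.version != expected_version:
        raise VersionConflict(new.dimension)
    table[new.dimension] = replace(new, version=expected_version + 1)
```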
Tokens
Available capacity units in a bucket. Tokens are consumed on acquire() and lazily refilled over time using elapsed-time arithmetic (no background refill job). The number of tokens consumed per call is configured as cost_per_call on the bucket.
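Lazy refill with elapsed-time arithmetic works roughly as follows. Tokens are topped up only when the bucket is touched, and never beyond capacity. `TokenState` is hypothetical, and its `updated_at` timestamp is an assumed field, since refilling from elapsed time needs to know when the bucket was last updated.

```python
from dataclasses import dataclass


@dataclass
class TokenState:
    """Hypothetical mutable bucket state; updated_at is an assumed field."""
    capacity: float
    tokens: float
    refill_per_second: float
    updated_at: float  # time (seconds) of the last refill computation


def refill(state: TokenState, now: float) -> None:
    """Add tokens for the time elapsed since the last update, capped at capacity."""
    elapsed = max(0.0, now - state.updated_at)
    state.tokens = min(state.capacity, state.tokens + elapsed * state.refill_per_second)
    state.updated_at = now


def try_consume(state: TokenState, cost_per_call: float, now: float) -> bool:
    """Refill lazily, then take cost_per_call tokens if enough are available."""
    refill(state, now)
    if state.tokens >= cost_per_call:
        state.tokens -= cost_per_call
        return True
    return False
```

Because refill is computed on demand from timestamps, no background job has to run on a schedule to keep buckets current.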
Lease
A temporary DynamoDB record written when tokens are acquired. Leases have a TTL and serve two purposes:
- Crash recovery. If a caller acquires tokens but crashes before releasing, the reconciler finds the expired lease and restores the tokens.
- Concurrent limit tracking. For concurrent-type dimensions, active leases represent held slots.
Lease records are deleted on release. Expired leases are cleaned up by the reconciler Lambda every 5 minutes.
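The two roles of a lease can be sketched with plain dicts. One function performs reconciler-style recovery of expired leases. The other counts held slots for a concurrent dimension. `Lease`, `reconcile()`, and `held_slots()` are hypothetical. The reconciler here does not cap restored tokens at capacity, purely for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lease:
    """Hypothetical lease record written on acquire()."""
    lease_id: str
    dimension: str
    tokens: float
    expires_at: float  # TTL expressed as a timestamp in seconds


def reconcile(leases: dict, buckets: dict, now: float) -> list:
    """Restore tokens held by expired leases, then delete those leases.

    leases maps lease_id -> Lease; buckets maps dimension -> token count.
    Returns the ids of reclaimed leases. No capacity cap, for brevity.
    """
    reclaimed = []
    for lease_id, lease in list(leases.items()):
        if lease.expires_at <= now:
            buckets[lease.dimension] += lease.tokens
            del leases[lease_id]
            reclaimed.append(lease_id)
    return reclaimed


def held_slots(leases: dict, dimension: str, now: float) -> int:
    """Count unexpired leases on a dimension, i.e. slots currently held."""
    return sum(
        1 for lease in leases.values()
        if lease.dimension == dimension and lease.expires_at > now
    )
```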
Slot
The high-level API that most product code should use. A slot wraps acquire + hold + release into a safe pattern:
Python:

```python
async with slot("openai#rpm", timeout=30) as s:
    response = await openai.chat.completions.create(...)
# release() called automatically
```

TypeScript:

```typescript
await slot("openai#rpm", 30, async () => {
  const response = await openai.chat.completions.create(...);
});
// release() called automatically
```

Go:

```go
err := sluice.WithSlot(ctx, "openai#rpm", 30*time.Second, func(ctx context.Context) error {
	// vendor call -- release is guaranteed on return
	return nil
})
```
Using slot() / WithSlot() guarantees that tokens are released even if the vendor call fails. The lower-level acquire() API is available when you need explicit control, but requires manual release() in a finally block.
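The guarantee slot() provides comes down to wrapping acquire() and release() in try/finally. The sketch below builds that pattern with contextlib.asynccontextmanager around a fake limiter. It illustrates the pattern, not the SDK's implementation. `FakeLimiter`, its method signatures, the lease-id format, and the extra `limiter` argument are all hypothetical. The real slot() takes only the dimension and timeout.

```python
import asyncio
from contextlib import asynccontextmanager


class FakeLimiter:
    """Hypothetical stand-in for the SDK's low-level acquire()/release()."""

    def __init__(self) -> None:
        self.active_leases: set[str] = set()
        self._next_id = 0

    async def acquire(self, dimension: str, timeout: float) -> str:
        # A real client would run the DynamoDB transaction here and wait
        # (up to `timeout`) according to retry_in when the bucket is empty.
        self._next_id += 1
        lease_id = f"{dimension}/{self._next_id}"
        self.active_leases.add(lease_id)
        return lease_id

    async def release(self, lease_id: str) -> None:
        # Mirrors "lease records are deleted on release".
        self.active_leases.discard(lease_id)


@asynccontextmanager
async def slot(limiter: FakeLimiter, dimension: str, timeout: float):
    """acquire + hold + release; the finally clause runs even on errors."""
    lease_id = await limiter.acquire(dimension, timeout)
    try:
        yield lease_id
    finally:
        await limiter.release(lease_id)


async def demo() -> None:
    limiter = FakeLimiter()
    try:
        async with slot(limiter, "openai#rpm", timeout=30):
            raise RuntimeError("vendor call failed")  # simulated vendor error
    except RuntimeError:
        pass
    print(limiter.active_leases)  # empty set: the lease was released anyway


if __name__ == "__main__":
    asyncio.run(demo())
```

Code that calls acquire() directly has to reproduce the same try/finally by hand, which is why slot() is the recommended default.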