Architecture
Token bucket algorithm, DynamoDB schema, lease lifecycle, and reconciler design.
System context
┌─────────────────────────────────────────────────────────────┐
│ Ontopix Products (audit-service, stats-service, ...) │
│ │
│ ┌──────────────┐ acquire("openai#rpm") ┌──────────────┐ │
│ │ Lambda │ ──────────────────────► │ Sluice SDK │ │
│ │ (handler) │ ◄────────────────────── │ (py/ts/go) │ │
│ └──────────────┘ GRANTED / RETRY_IN └──────┬───────┘ │
│ │ │ │
│ │ vendor call (if GRANTED) │ DynamoDB │
│ ▼ ▼ │
│ ┌──────────────┐ ┌──────────────────┐ │
│ │ OpenAI API │ │ ontopix-vendor- │ │
│ │ ElevenLabs │ │ buckets-{env} │ │
│ │ Anthropic │ │ (per-env table) │ │
│ └──────────────┘ └──────────────────┘ │
└─────────────────────────────────────────────────────────────┘
Products import the Sluice SDK as a library dependency. The SDK communicates with a shared DynamoDB table to coordinate rate limits. Vendor API calls go directly from the product's Lambda to the vendor -- Sluice is not in the request path.
Token bucket algorithm
Sluice uses a lazy-refill token bucket. There is no background process that tops up tokens. Instead, the current token count is computed on every acquisition using elapsed-time arithmetic.
Bucket state
Each dimension (e.g. openai#rpm) is a single DynamoDB row with these fields:
| Field | Example | Description |
|---|---|---|
vendor_dimension | openai#rpm | Partition key. Format: {vendor}#{metric} |
capacity | 100 | Maximum tokens (matches vendor rate limit) |
tokens | 47.0 | Token count at time of last write |
refill_rate | 1.667 | Tokens restored per second (capacity / window) |
last_refill_at | 1709550002 | Unix timestamp of last acquisition |
cost_per_call | 1 | Tokens consumed per acquisition |
limit_type | requests | One of: requests, tokens, concurrent |
version | 42 | Monotonic counter for optimistic locking |
Acquisition sequence
When a caller runs acquire("openai#rpm"):
Step 1: Pre-read. GetItem fetches the current bucket state. This is a consistent read outside the transaction, used to compute whether tokens are available before paying the cost of a transaction.
Step 2: Compute current tokens. Tokens accumulate continuously based on elapsed time:
elapsed = now - last_refill_at
tokens_now = min(capacity, stored_tokens + elapsed * refill_rate)
The min(capacity, ...) cap prevents tokens from accumulating beyond the bucket's maximum.
Step 3: Check availability. If tokens_now < cost_per_call, there are not enough tokens. The SDK returns RETRY_IN with a computed delay:
retry_in = (cost_per_call - tokens_now) / refill_rate
This tells the caller exactly how many seconds to wait before capacity is available. No guessing.
Step 4: Atomic transaction. If tokens are available, the SDK sends a TransactWriteItems with two operations:
- Update bucket -- decrement tokens, update
last_refill_at, incrementversion. The transaction condition isversion = :expected_version, ensuring no other caller modified the bucket between the pre-read and the write. - Write lease -- create a lease record with
TTL = now + SLUICE_LEASE_TTL.
Both operations succeed or both fail. There is no partial state.
Step 5: Handle contention. If the transaction fails with TransactionCanceledException (another caller wrote first), the SDK retries with exponential backoff and jitter (base 25ms, max 200ms, up to SLUICE_MAX_RETRIES attempts). If all retries fail, it returns RETRY_IN.
Why this works
- No background refill job means no race conditions from a separate writer.
- Optimistic locking via an integer
versioncounter avoids floating-point comparison issues. - The pre-read avoids unnecessary transactions when quota is clearly exhausted.
- Atomic multi-item transactions support acquiring multiple dimensions at once (e.g.
openai#rpm+openai#tpm), all-or-nothing.
DynamoDB schema
All data lives in a single table per environment. Bucket rows and lease rows share the table, distinguished by key format.
Bucket item (example)
{
"vendor_dimension": {"S": "openai#rpm"},
"capacity": {"N": "100"},
"tokens": {"N": "47"},
"refill_rate": {"N": "1.667"},
"last_refill_at": {"N": "1709550002"},
"cost_per_call": {"N": "1"},
"limit_type": {"S": "requests"},
"version": {"N": "42"}
}
Lease item (example)
{
"vendor_dimension": {"S": "lease#openai#rpm#a3f8c910"},
"dimension": {"S": "openai#rpm"},
"cost": {"N": "1"},
"created_at": {"N": "1709550100"},
"ttl": {"N": "1709550160"},
"caller": {"S": "audit-service"}
}
The lease PK includes a unique suffix to allow multiple concurrent leases for the same dimension. The ttl field is a DynamoDB TTL attribute -- DynamoDB will eventually delete expired items, but the reconciler proactively processes them to restore tokens.
Lease lifecycle
acquire() hold release()
│ │ │
▼ ▼ ▼
┌──────┐ ┌──────────┐ ┌──────────┐
│ Write │───────│ Active │──────│ Delete │
│ lease │ │ (caller │ │ lease │
│ + dec │ │ working) │ │ + done │
│ tokens│ └──────────┘ └──────────┘
└──────┘ │
│ caller crashes
│ or timeout
▼
┌──────────┐ ┌──────────┐
│ Expired │──────│Reconciler │
│ lease │ │ restores │
│ (orphan) │ │ tokens + │
└──────────┘ │ deletes │
│ lease │
└──────────┘
- Acquire. Tokens are decremented and a lease record is written atomically.
- Hold. The caller makes its vendor API call while the lease is active.
- Release. On success or failure, the caller (or
slot()/WithSlot()) deletes the lease. Tokens are not restored on release forrequestsandtokenslimit types -- they were consumed. Forconcurrentlimit types, tokens are restored on release. - Expire. If the caller crashes or times out without releasing, the lease stays in DynamoDB until its TTL.
- Reconcile. The reconciler Lambda finds expired leases, restores tokens to the bucket (capped at capacity), and deletes the lease records.
Reconciler
The reconciler is a Lambda function (sluice-reconciler-{env}) triggered by an EventBridge rule every 5 minutes.
What it does
- Scans the DynamoDB table for lease items where
ttl < now. - For each expired lease:
- Reads the parent bucket.
- Restores tokens:
new_tokens = min(capacity, current_tokens + lease_cost). Themincap ensures tokens never exceed capacity even if multiple leases expire simultaneously. - Deletes the lease record.
- All operations are idempotent. Running the reconciler twice on the same expired lease has no effect (the lease is already deleted after the first run).
Why it exists
The lazy refill algorithm handles normal token replenishment. The reconciler handles the abnormal case: a caller acquired tokens, crashed, and never released. Without the reconciler, those tokens would be permanently lost until the bucket is manually reset.
Environment strategy
| Environment | DynamoDB | How to run | Purpose |
|---|---|---|---|
sandbox | LocalStack (Docker) | task sandbox:start | Local development and testing |
pre | Real AWS | task infra:apply ENV=pre | Integration tests, load tests |
prod | Real AWS | task infra:apply ENV=prod | Live traffic |
Each environment has its own table (ontopix-vendor-buckets-sandbox, ontopix-vendor-buckets-pre, ontopix-vendor-buckets-prod). There is no environment prefix in partition keys -- isolation is at the table level. The SLUICE_TABLE_NAME environment variable points to the correct table.
For local development, task sandbox:start launches LocalStack and seeds vendor bucket configurations. SDKs connect to LocalStack via SLUICE_ENDPOINT_URL=http://localhost:4566.