Sluice

Architecture

Token bucket algorithm, DynamoDB schema, lease lifecycle, and reconciler design.

System context

┌─────────────────────────────────────────────────────────────┐
│  Ontopix Products (audit-service, stats-service, ...)        │
│                                                              │
│  ┌──────────────┐  acquire("openai#rpm")  ┌──────────────┐  │
│  │    Lambda    │ ──────────────────────► │  Sluice SDK  │  │
│  │  (handler)   │ ◄────────────────────── │  (py/ts/go)  │  │
│  └──────────────┘   GRANTED / RETRY_IN    └──────┬───────┘  │
│         │                                        │          │
│         │ vendor call (if GRANTED)               │ DynamoDB │
│         ▼                                        ▼          │
│  ┌──────────────┐                    ┌──────────────────┐   │
│  │ OpenAI API   │                    │  ontopix-vendor- │   │
│  │ ElevenLabs   │                    │  buckets-{env}   │   │
│  │ Anthropic    │                    │  (per-env table) │   │
│  └──────────────┘                    └──────────────────┘   │
└─────────────────────────────────────────────────────────────┘

Products import the Sluice SDK as a library dependency. The SDK communicates with a shared DynamoDB table to coordinate rate limits. Vendor API calls go directly from the product's Lambda to the vendor -- Sluice is not in the request path.

Token bucket algorithm

Sluice uses a lazy-refill token bucket. There is no background process that tops up tokens. Instead, the current token count is computed on every acquisition using elapsed-time arithmetic.

Bucket state

Each dimension (e.g. openai#rpm) is a single DynamoDB row with these fields:

FieldExampleDescription
vendor_dimensionopenai#rpmPartition key. Format: {vendor}#{metric}
capacity100Maximum tokens (matches vendor rate limit)
tokens47.0Token count at time of last write
refill_rate1.667Tokens restored per second (capacity / window)
last_refill_at1709550002Unix timestamp of last acquisition
cost_per_call1Tokens consumed per acquisition
limit_typerequestsOne of: requests, tokens, concurrent
version42Monotonic counter for optimistic locking

Acquisition sequence

When a caller runs acquire("openai#rpm"):

Step 1: Pre-read. GetItem fetches the current bucket state. This is a consistent read outside the transaction, used to compute whether tokens are available before paying the cost of a transaction.

Step 2: Compute current tokens. Tokens accumulate continuously based on elapsed time:

elapsed = now - last_refill_at
tokens_now = min(capacity, stored_tokens + elapsed * refill_rate)

The min(capacity, ...) cap prevents tokens from accumulating beyond the bucket's maximum.

Step 3: Check availability. If tokens_now < cost_per_call, there are not enough tokens. The SDK returns RETRY_IN with a computed delay:

retry_in = (cost_per_call - tokens_now) / refill_rate

This tells the caller exactly how many seconds to wait before capacity is available. No guessing.

Step 4: Atomic transaction. If tokens are available, the SDK sends a TransactWriteItems with two operations:

  1. Update bucket -- decrement tokens, update last_refill_at, increment version. The transaction condition is version = :expected_version, ensuring no other caller modified the bucket between the pre-read and the write.
  2. Write lease -- create a lease record with TTL = now + SLUICE_LEASE_TTL.

Both operations succeed or both fail. There is no partial state.

Step 5: Handle contention. If the transaction fails with TransactionCanceledException (another caller wrote first), the SDK retries with exponential backoff and jitter (base 25ms, max 200ms, up to SLUICE_MAX_RETRIES attempts). If all retries fail, it returns RETRY_IN.

Why this works

  • No background refill job means no race conditions from a separate writer.
  • Optimistic locking via an integer version counter avoids floating-point comparison issues.
  • The pre-read avoids unnecessary transactions when quota is clearly exhausted.
  • Atomic multi-item transactions support acquiring multiple dimensions at once (e.g. openai#rpm + openai#tpm), all-or-nothing.

DynamoDB schema

All data lives in a single table per environment. Bucket rows and lease rows share the table, distinguished by key format.

Bucket item (example)

{
  "vendor_dimension": {"S": "openai#rpm"},
  "capacity": {"N": "100"},
  "tokens": {"N": "47"},
  "refill_rate": {"N": "1.667"},
  "last_refill_at": {"N": "1709550002"},
  "cost_per_call": {"N": "1"},
  "limit_type": {"S": "requests"},
  "version": {"N": "42"}
}

Lease item (example)

{
  "vendor_dimension": {"S": "lease#openai#rpm#a3f8c910"},
  "dimension": {"S": "openai#rpm"},
  "cost": {"N": "1"},
  "created_at": {"N": "1709550100"},
  "ttl": {"N": "1709550160"},
  "caller": {"S": "audit-service"}
}

The lease PK includes a unique suffix to allow multiple concurrent leases for the same dimension. The ttl field is a DynamoDB TTL attribute -- DynamoDB will eventually delete expired items, but the reconciler proactively processes them to restore tokens.

Lease lifecycle

acquire()          hold              release()
   │                 │                   │
   ▼                 ▼                   ▼
┌──────┐        ┌──────────┐       ┌──────────┐
│ Write │───────│  Active   │──────│  Delete   │
│ lease │       │  (caller  │      │  lease    │
│ + dec │       │  working) │      │  + done   │
│ tokens│       └──────────┘      └──────────┘
└──────┘             │
                     │ caller crashes
                     │ or timeout
                     ▼
                ┌──────────┐       ┌──────────┐
                │ Expired  │──────│Reconciler │
                │  lease   │      │ restores  │
                │ (orphan) │      │ tokens +  │
                └──────────┘      │ deletes   │
                                  │ lease     │
                                  └──────────┘
  1. Acquire. Tokens are decremented and a lease record is written atomically.
  2. Hold. The caller makes its vendor API call while the lease is active.
  3. Release. On success or failure, the caller (or slot() / WithSlot()) deletes the lease. Tokens are not restored on release for requests and tokens limit types -- they were consumed. For concurrent limit types, tokens are restored on release.
  4. Expire. If the caller crashes or times out without releasing, the lease stays in DynamoDB until its TTL.
  5. Reconcile. The reconciler Lambda finds expired leases, restores tokens to the bucket (capped at capacity), and deletes the lease records.

Reconciler

The reconciler is a Lambda function (sluice-reconciler-{env}) triggered by an EventBridge rule every 5 minutes.

What it does

  1. Scans the DynamoDB table for lease items where ttl < now.
  2. For each expired lease:
    • Reads the parent bucket.
    • Restores tokens: new_tokens = min(capacity, current_tokens + lease_cost). The min cap ensures tokens never exceed capacity even if multiple leases expire simultaneously.
    • Deletes the lease record.
  3. All operations are idempotent. Running the reconciler twice on the same expired lease has no effect (the lease is already deleted after the first run).

Why it exists

The lazy refill algorithm handles normal token replenishment. The reconciler handles the abnormal case: a caller acquired tokens, crashed, and never released. Without the reconciler, those tokens would be permanently lost until the bucket is manually reset.

Environment strategy

EnvironmentDynamoDBHow to runPurpose
sandboxLocalStack (Docker)task sandbox:startLocal development and testing
preReal AWStask infra:apply ENV=preIntegration tests, load tests
prodReal AWStask infra:apply ENV=prodLive traffic

Each environment has its own table (ontopix-vendor-buckets-sandbox, ontopix-vendor-buckets-pre, ontopix-vendor-buckets-prod). There is no environment prefix in partition keys -- isolation is at the table level. The SLUICE_TABLE_NAME environment variable points to the correct table.

For local development, task sandbox:start launches LocalStack and seeds vendor bucket configurations. SDKs connect to LocalStack via SLUICE_ENDPOINT_URL=http://localhost:4566.