# ADR-0011: Lambda Deploy Strategy
Decision to use direct zip upload as the default Lambda code deployment method from CI/CD.
- **Status:** Accepted
- **Date:** 2026-03-14
- **Deciders:** Engineering Leadership
## Context
Ontopix runs multiple services on AWS Lambda (audit-service with 7 Lambdas, maxcolchon with 3). As we add CI/CD pipelines, we need a standard method for deploying Lambda code from GitHub Actions.
Two approaches exist:

- **Direct zip upload** — GitHub Actions builds the zip, then calls `aws lambda update-function-code --zip-file fileb://dist/lambda.zip` to push it directly to the Lambda function.
- **S3 artifact upload** — GitHub Actions builds the zip, uploads it to an S3 bucket under a versioned key (`lambdas/{component}/{version}-{sha}.zip`), then calls `aws lambda update-function-code --s3-bucket ... --s3-key ...`.
The maxcolchon project implemented the S3 approach, introducing a dedicated `{project}-{env}-lambda-artifacts` bucket with versioning, lifecycle policies (90-day expiration, 30 noncurrent versions retained), and a dual-mode Terraform configuration (S3 for CI/CD, local file for bootstrap). This works but adds infrastructure complexity.
Key observations:
- Current Lambda bundles are small: 1-10 MB (TypeScript/esbuild) and 10-30 MB (Python/uv). All well under the 50 MB direct upload limit.
- Rollback via S3 (pointing to a previous key) saves ~2 minutes vs rollback via git revert + CI rebuild. In practice, both require human intervention.
- Git commit SHAs already provide full traceability of what code is deployed.
- The `GitHubActions-Lambda-DeployRole` (ADR-003) already has `lambda:UpdateFunctionCode` permission, which supports both methods.
We need a standard default that balances simplicity with operational needs.
## Decision

We adopt direct zip upload as the default Lambda code deployment method from CI/CD.

### Core Rules
- **Default method:** `aws lambda update-function-code --zip-file fileb://<path>` from the GitHub Actions runner
- **Terraform separation:** Lambda functions MUST use `lifecycle { ignore_changes = [filename, source_code_hash] }` to decouple code deployments from infrastructure management
- **Workflow structure:** two workflows per service — `ci.yaml` (PRs: validate) and `deploy.yaml` (master: build + deploy)
- **Rollback:** revert the commit on master; CI/CD rebuilds and redeploys automatically
- **S3 escalation:** services MAY adopt S3 artifact upload when escalation criteria are met (see below)
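The rules above can be sketched as a minimal `deploy.yaml`. The build command, function name, region, and the secret holding the deploy-role ARN are illustrative assumptions, not values prescribed by this ADR:

```yaml
# Hedged sketch of deploy.yaml — names marked "illustrative" are assumptions.
name: deploy
on:
  push:
    branches: [master]

permissions:
  id-token: write    # OIDC token to assume the Lambda-DeployRole (ADR-003)
  contents: read

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      # Illustrative build step; the real build produces dist/<function>.zip
      - run: npm ci && npm run build

      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.LAMBDA_DEPLOY_ROLE_ARN }}  # illustrative secret name
          aws-region: eu-west-1                                  # illustrative region

      # Direct zip upload — the default method from the Core Rules
      - run: |
          aws lambda update-function-code \
            --function-name myproject-pro-my-function \
            --zip-file fileb://dist/my-function.zip
```

The matching `ci.yaml` would run only the validate steps (lint, test, build) on pull requests, with no AWS credentials at all.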
### Escalation to S3 Artifacts
A service SHOULD switch to S3 artifact upload when any of these criteria apply:
- Bundle size exceeds 40 MB zipped (approaching the 50 MB direct upload limit)
- Regulatory or compliance requirements mandate persistent artifact retention with audit trail
- Operational requirements demand instant rollback without rebuild (sub-30-second recovery SLA)
When escalating, follow the `{project}-{env}-lambda-artifacts` bucket naming convention established by `Lambda-DeployRole`.
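The escalated flow looks like this in shell. `PROJECT`, `ENV`, `COMPONENT`, `VERSION`, and `SHA` are illustrative placeholders, not values prescribed by this ADR:

```shell
# Hedged sketch of the S3 escalation flow; all values below are placeholders.
PROJECT=myproject
ENV=pro
COMPONENT=audit-service
VERSION=1.4.0
SHA=abc1234

BUCKET="${PROJECT}-${ENV}-lambda-artifacts"       # bucket naming convention
KEY="lambdas/${COMPONENT}/${VERSION}-${SHA}.zip"  # versioned artifact key
echo "$KEY"   # → lambdas/audit-service/1.4.0-abc1234.zip

# Upload the artifact, then point the function at it (requires AWS credentials):
# aws s3 cp dist/lambda.zip "s3://${BUCKET}/${KEY}"
# aws lambda update-function-code \
#   --function-name "${PROJECT}-${ENV}-${COMPONENT}" \
#   --s3-bucket "$BUCKET" --s3-key "$KEY"
```

Rollback under this model is the same `update-function-code` call pointed at a previous key, which is why it needs no rebuild.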
### Terraform Configuration

Lambda functions use a local file for the initial bootstrap and `ignore_changes` to prevent Terraform from reverting CI/CD deployments:

```hcl
resource "aws_lambda_function" "my_function" {
  function_name = "${var.project}-${var.environment}-my-function"
  role          = aws_iam_role.my_function.arn
  handler       = "index.handler"
  runtime       = "nodejs20.x"

  # Used only for the initial bootstrap apply; CI/CD owns the code afterwards.
  filename         = "${path.module}/../dist/my-function.zip"
  source_code_hash = filebase64sha256("${path.module}/../dist/my-function.zip")

  lifecycle {
    ignore_changes = [filename, source_code_hash]
  }
}
```
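For the bootstrap `apply` to succeed, the local zip must already exist. A minimal sketch of producing it, assuming a Node.js handler; the paths mirror the configuration above, the handler body is illustrative, and `python3 -m zipfile` is used only so the sketch does not depend on a separate `zip` binary (the real build step — esbuild, uv, etc. — is service-specific):

```shell
# Hedged sketch: create the placeholder bundle that the first terraform apply
# uploads. Paths and the handler body are illustrative.
mkdir -p dist build
printf 'exports.handler = async () => ({ statusCode: 200 });\n' > build/index.js
# Zip it as index.js at the archive root so the "index.handler" setting resolves.
cd build && python3 -m zipfile -c ../dist/my-function.zip index.js && cd ..
python3 -m zipfile -l dist/my-function.zip
```

After the first apply, CI/CD overwrites this code via direct upload and Terraform never touches it again.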
## Rationale

### Why Direct Upload by Default?
**Simplicity:**

- No additional infrastructure (no S3 bucket, lifecycle policies, versioning configuration)
- One command per Lambda: `aws lambda update-function-code --zip-file fileb://...`
- No intermediate state to manage or clean up
**Sufficient for current scale:**
- All Ontopix Lambda bundles are 1-30 MB, well under the 50 MB limit
- 10 total Lambdas across all services — not at a scale where artifact management adds value
- Deploy cycles are fast (~2 minutes end-to-end)
**Git as the source of truth:**

- Commit SHA identifies exactly what code is deployed
- `git log` provides the full deployment history
- Rollback = `git revert` + automatic CI/CD redeploy
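The rollback mechanics can be illustrated in a throwaway repository — everything below is a self-contained demo, not a prescribed procedure:

```shell
# Demo only: git revert creates a new commit restoring the previous state;
# pushing that commit to master is what triggers the CI/CD redeploy.
repo=$(mktemp -d)
cd "$repo"
git init -q
git -c user.email=ci@example.com -c user.name=ci commit -q --allow-empty -m "good deploy"
printf 'throw new Error("bug");\n' > handler.js
git add handler.js
git -c user.email=ci@example.com -c user.name=ci commit -q -m "bad deploy"
# Roll back the bad commit; in CI/CD this would be pushed to master.
git -c user.email=ci@example.com -c user.name=ci revert --no-edit HEAD >/dev/null
test ! -e handler.js && echo "rolled back"   # → rolled back
```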
**Alignment with existing infrastructure:**

- `Lambda-DeployRole` (ADR-003) already supports `lambda:UpdateFunctionCode`
- No IAM changes needed
- No new S3 bucket permissions required
### Why Keep S3 as an Escalation Path?
Direct upload has real limits:
- 50 MB zip limit — large Python dependencies or bundled assets can exceed this
- No persistent artifacts — if compliance requires knowing exactly which binary was running at a given time, git alone may not suffice
- Rebuild-dependent rollback — a broken build system prevents rollback
The S3 approach is a valid escalation, not a wrong choice. The `{project}-{env}-lambda-artifacts` convention in `Lambda-DeployRole` ensures any service can adopt it without IAM changes.
### Why Decouple Terraform from Code Deploys?

Without `lifecycle { ignore_changes }`, running `terraform apply` would revert the Lambda code to whatever local zip was built during the apply. This creates two problems:

- A developer running `terraform apply` to change an env var accidentally downgrades the Lambda code
- `terraform plan` always shows code drift (noisy, and it masks real infrastructure changes)
The separation matches the Code vs Infrastructure boundary from ADR-003: GitHub Actions deploys code, developers manage infrastructure via Terraform.
## Limitations
Direct zip upload is deliberately simple. That simplicity comes with hard constraints:
| Limitation | Impact | Threshold |
|---|---|---|
| 50 MB zip size limit | AWS API rejects the upload | Bundle exceeds 50 MB compressed |
| No persistent artifacts | Cannot inspect deployed binary without rebuilding from source | Always (inherent to the model) |
| Rebuild-dependent rollback | Rollback requires CI to build from a previous commit; broken CI = no rollback | CI pipeline failure during incident |
| No atomic multi-Lambda deploy | Each Lambda updates independently; brief window where Lambdas run different versions | Services where Lambdas share a contract that changes simultaneously |
| No deployment history in AWS | CloudWatch logs show function updates but not which artifact was deployed | Post-incident forensics requiring binary-level traceability |
## When to Escalate to S3 Artifacts
A service MUST evaluate switching to S3 artifact upload when any of these triggers fire:
| Trigger | Why it matters | Action |
|---|---|---|
| Bundle exceeds 40 MB zipped | Approaching the 50 MB hard limit with no margin for growth | Switch to S3 upload (`--s3-bucket`/`--s3-key`) |
| Compliance or audit requires persistent artifact retention | Regulators or customers require proof of exactly which binary ran at a given time | Add S3 bucket with versioning; retain artifacts per retention policy |
| Instant rollback SLA (sub-30 seconds) | Rebuild takes ~2 minutes; S3 rollback is ~5 seconds (`update-function-code` pointed at the previous key) | Switch to S3 with versioned keys for instant repoint |
| CI pipeline unreliable for rollback | If CI has frequent failures, rebuild-based rollback becomes risky during incidents | S3 artifacts decouple rollback from CI health |
| Multi-Lambda atomic deploy needed | Services where Lambdas share a versioned contract and must update together | S3 + CodeDeploy or custom orchestration |
## Proactive Monitoring
To prevent hitting limitations reactively:
- CI bundle size check: Add a step that fails the build (or warns) when any zip exceeds 40 MB
- Quarterly review: Check if any service's bundles are growing toward the threshold
- Incident retro: After any rollback, evaluate if rebuild-based rollback was fast enough
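The bundle size check can be sketched as a portable shell step; the `dist/` location and the 40 MB threshold mirror this section's assumptions:

```shell
# Hedged sketch of a CI gate: fail when any zip in a directory meets or
# exceeds the proactive 40 MB threshold (AWS rejects direct uploads > 50 MB).
check_bundle_sizes() {
  dir=$1 limit_mb=$2 status=0
  for zip in "$dir"/*.zip; do
    [ -e "$zip" ] || continue                       # directory has no zips
    size_mb=$(( $(wc -c < "$zip") / 1024 / 1024 ))  # integer MB, rounded down
    if [ "$size_mb" -ge "$limit_mb" ]; then
      echo "FAIL: $zip is ${size_mb} MB (limit ${limit_mb} MB)"
      status=1
    else
      echo "ok: $zip is ${size_mb} MB"
    fi
  done
  return "$status"
}
```

In `deploy.yaml` this would run right after the build step, e.g. `check_bundle_sizes dist 40 || exit 1`, so growth past the threshold fails loudly instead of surfacing as a rejected upload during an incident.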
## Consequences
- Zero additional infrastructure for Lambda CI/CD — no S3 buckets, lifecycle policies, or extra IAM
- Fast adoption — any new service can add CI/CD without infrastructure changes
- Clear escalation path — S3 artifacts are documented and IAM-ready when needed
- See Limitations above for constraints and escalation triggers
## Alternatives Considered

### Alternative 1: S3 Artifact Upload as Default

Rejected as default because:

- Adds infrastructure per service (S3 bucket, lifecycle, versioning)
- Adds workflow complexity (upload to S3, then update Lambda)
- Marginal benefit when bundles are small and deploys are fast
- The maxcolchon implementation works but is more complex than needed for current scale

Not rejected entirely — it remains the documented escalation path.
### Alternative 2: AWS SAM / Serverless Framework

Rejected because:

- Introduces a new abstraction layer over Terraform
- Inconsistent with the `.infra/` Terraform convention (ADR-0004)
- Vendor-specific tooling that doesn't align with our IaC approach
### Alternative 3: Container Image Lambdas
Rejected as default because:
- Adds ECR dependency and Docker build complexity
- Overkill for small Node.js/Python functions
- May be appropriate for specific use cases (large ML models, custom runtimes)
## References
- Lambda Deploy Pattern
- GitHub Actions Workflows Pattern
- Infrastructure Layout Pattern
- ADR-003: GitHub Actions OIDC Trust Tier Model (in ontopix/infra)
- ADR-0004: Infrastructure Layout
## Success Criteria
This decision is successful if:
- New Lambda services adopt direct upload CI/CD without needing infrastructure changes
- Existing services (maxcolchon) can migrate to direct upload, simplifying their setup
- Bundle sizes remain under 40 MB (the proactive warning threshold)
- Rollback time via git revert + CI rebuild stays under 5 minutes
- Services that genuinely need S3 artifacts can escalate using the documented criteria