Architecture Decisions

ADR-003: GitHub Actions OIDC Trust Tier Model

Three-tier OIDC trust model for GitHub Actions roles — CI, Deploy, and Release.


Status: Accepted
Date: 2026-02-11
Issues: #18, #19

Context

Ontopix uses GitHub Actions OIDC to authenticate CI/CD workflows to AWS without long-lived credentials. As CI needs grow beyond pulling images and reading packages, we need a consistent model for scoping trust across different workflow types.

The initial OIDC roles were designed for two use cases:

CI builds — any branch, read-only operations (pull images, read packages)
Releases — tag-based, publish operations (push images, publish packages)

This left a gap: deploy workflows that run on branch merges (not tags) and need write access — such as deploying Lambda code or pushing container images for dev/pre environments. This gap blocked maxcolchon deploy-dev workflows (#19) and per-project Terraform plan CI (#18).

Decision

We adopt a three-tier trust model for all GitHub Actions OIDC roles:

Tier	Trust Scope	Purpose	Workflow Type
CI (read)	`repo:ontopix/*:*`	Read-only operations from any branch or PR	`ci.yaml`
Deploy (write)	`repo:ontopix/*:ref:refs/heads/{master,pre,dev}`	Write operations from deploy branches	`deploy-*.yaml`
Release (publish)	`repo:ontopix/*:ref:refs/tags/*`	Publish operations from version tags	`release.yaml`

Deploy branches

Three branches may trigger deploy workflows: master, pre, and dev. Trust policies for Deploy-tier roles include all three:

"StringLike": {
  "token.actions.githubusercontent.com:sub": [
    "repo:ontopix/*:ref:refs/heads/master",
    "repo:ontopix/*:ref:refs/heads/pre",
    "repo:ontopix/*:ref:refs/heads/dev"
  ]
}
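To make the effect of these patterns concrete, the following is a minimal sketch of how a GitHub OIDC `sub` claim maps onto tiers. It only approximates IAM `StringLike` semantics (`*` matches any run of characters, `?` matches exactly one, comparison is case-sensitive). The Deploy patterns are copied from the policy above; the CI and Release patterns (`repo:ontopix/*:*` and `repo:ontopix/*:ref:refs/tags/*`) are an assumed reading of the tier table, and the function names and the `example` repository are hypothetical.

```python
import re

# Deploy patterns come from the trust policy above. The CI and Release
# patterns are an ASSUMED reading of the tier table, not confirmed policy text.
TIER_PATTERNS = {
    "ci": ["repo:ontopix/*:*"],
    "deploy": [
        "repo:ontopix/*:ref:refs/heads/master",
        "repo:ontopix/*:ref:refs/heads/pre",
        "repo:ontopix/*:ref:refs/heads/dev",
    ],
    "release": ["repo:ontopix/*:ref:refs/tags/*"],
}


def string_like(pattern: str, value: str) -> bool:
    """Approximate IAM StringLike: '*' = any run of characters, '?' = one character."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.fullmatch("".join(parts), value, flags=re.DOTALL) is not None


def tiers_for_subject(sub: str) -> list[str]:
    """Return every tier whose trust patterns admit the given subject claim."""
    return [
        tier
        for tier, patterns in TIER_PATTERNS.items()
        if any(string_like(p, sub) for p in patterns)
    ]
```

Under these assumptions, a push to `dev` is admitted by the CI and Deploy tiers, a feature branch or pull request by CI only, and a subject from another organization by none.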

Role inventory after this decision

Role	Tier	Module	Permissions
`GitHubActions-ECR-PullRole`	CI	`ecr/`	ECR read
`GitHubActions-ECR-PushRole`	Deploy + Release	`ecr/`	ECR read + write
`GitHubActions-CodeArtifact-ReadRole`	CI	`codeartifact/`	CodeArtifact read
`GitHubActions-CodeArtifact-PublishRole`	Deploy + Release	`codeartifact/`	CodeArtifact read + write
`GitHubActions-Terraform-PlanRole`	CI	`iam/`	TF state read + AWS ReadOnlyAccess
`GitHubActions-Lambda-DeployRole`	Deploy	`iam/`	Lambda deploy + invoke, S3 artifacts, SSM read

Naming convention

GitHubActions-{Service}-{Access}Role

All OIDC roles are org-scoped (repo:ontopix/*), not per-repository. Per-repo scoping can be added later if needed for sensitive repositories.
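The naming pattern can be checked mechanically, for example in a lint step for new roles. The sketch below is illustrative only: the allowed character set for `{Service}` and `{Access}` is an assumption (letters and digits), and `parse_role_name` is a hypothetical helper rather than anything in the infra repository.

```python
import re

# GitHubActions-{Service}-{Access}Role
# ASSUMPTION: Service and Access consist of ASCII letters and digits only.
ROLE_NAME = re.compile(
    r"GitHubActions-(?P<service>[A-Za-z0-9]+)-(?P<access>[A-Za-z0-9]+)Role"
)


def parse_role_name(name: str):
    """Return (service, access) if the name follows the convention, else None."""
    match = ROLE_NAME.fullmatch(name)
    if match is None:
        return None
    return match.group("service"), match.group("access")
```

Every role in the inventory above parses under this pattern; `GitHubActions-Lambda-DeployRole`, for instance, yields the pair `("Lambda", "Deploy")`.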

Terraform Apply role — deferred

A GitHubActions-Terraform-ApplyRole is intentionally not provisioned yet. Terraform apply requires write permissions across multiple service types, and the permission scope depends on what each project manages. Apply is currently done locally with task infra:apply. When CI-driven apply is needed, a separate ADR should define the permission model (per-project vs org-wide, which services, approval gates).

Rationale

Aligns with workflow types: The engineering handbook defines three workflow types (CI, deploy, release). One trust tier per workflow type is a natural mapping.
Least privilege: Read operations are open to any ref in ontopix/* repositories, write operations are restricted to known deploy branches, and publish operations are restricted to tags.
Org-wide by default: Follows the established pattern from CodeArtifact and ECR. Avoids per-repo maintenance overhead.
Deploy branches are explicit: Rather than a wildcard like refs/heads/* (which would allow any branch to deploy), we list the specific branches that represent deployment environments.

Consequences

Positive:

Clear, documented model for all future OIDC roles
Unblocks per-project Terraform plan CI and Lambda deploy workflows
ECR push now works from deploy branches, not just tags
Adding new roles follows an established pattern

Negative:

Adding a new deploy branch requires updating all Deploy-tier trust policies
Org-wide scoping means any ontopix/* repo can assume deploy roles from the named branches

Mitigations:

Deploy branches are expected to be stable (master/pre/dev); changes are rare
Branch protection rules on deploy branches prevent unauthorized pushes
Per-repo scoping can be layered on for sensitive operations

Future Considerations

Terraform Apply role: When CI-driven apply is needed, define permission model in a follow-up ADR
Per-repo scoping: For high-sensitivity operations, restrict trust to specific repositories
Environment-based trust: GitHub Environments with deployment protection rules as an additional approval gate (evaluated and deferred — ref-based subjects remain the sole trust model until org-level environment policies are in place)
Engineering handbook pattern: A "Terraform + GitHub Actions" pattern page should document how projects consume these roles

Philosophy: Code vs Infrastructure Boundary

The trust tier model enforces a fundamental CI/CD permission boundary:

Layer	Who executes	What it covers
Infrastructure (shape)	Developer via `terraform apply`	Create, delete, configure resources — Lambda config (memory, timeout, env vars, IAM role), IAM policies, DynamoDB tables, S3 buckets, AgentCore runtimes, networking
Code (content)	GHA via OIDC roles	Build, test, publish artifacts, update code in existing resources — deploy a new ZIP to an existing Lambda, push an image to an existing ECR repo

Why this boundary matters:

Safety: GHA cannot accidentally create, delete, or misconfigure infrastructure
Auditability: All infrastructure changes flow through Terraform state with developer review
Reversibility: Code deployments are trivially reversible (redeploy previous artifact); infrastructure changes may not be

The Lambda Code vs Config Distinction

This boundary is most visible in Lambda operations:

Operation	Who	Rationale
`UpdateFunctionCode`	GHA (Deploy tier)	Updates what runs inside the Lambda. Analogous to deploying a new Docker image. Rollback = redeploy previous ZIP.
`UpdateFunctionConfiguration`	Developer (`terraform apply`)	Changes IAM role, env vars, memory, timeout, VPC. Can break security boundaries. Must go through Terraform.
`CreateFunction` / `DeleteFunction`	Developer (`terraform apply`)	Infrastructure lifecycle — never automated in CI/CD.

This matches AWS CodeDeploy, SAM, and Serverless Framework defaults: CI updates code, IaC controls configuration.
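The code/config split can be expressed as a permissions policy. The sketch below is not the actual policy attached to `GitHubActions-Lambda-DeployRole`: the explicit `Deny` statement, the statement IDs, and the `"*"` resource are assumptions made for illustration (a real policy could rely on implicit deny alone). The action names are the ones this ADR mentions, and `is_allowed` is a deliberately simplified evaluator that compares exact action names only.

```python
# Actions this ADR assigns to GitHub Actions (Deploy tier).
GHA_DEPLOY_ACTIONS = ["lambda:UpdateFunctionCode", "lambda:InvokeFunction"]

# Actions this ADR reserves for the developer running terraform apply.
TERRAFORM_ONLY_ACTIONS = [
    "lambda:UpdateFunctionConfiguration",
    "lambda:CreateFunction",
    "lambda:DeleteFunction",
]


def lambda_deploy_policy() -> dict:
    """Build an illustrative IAM policy document for the Deploy tier."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowCodeDeploy",
                "Effect": "Allow",
                "Action": GHA_DEPLOY_ACTIONS,
                "Resource": "*",
            },
            {
                # ASSUMPTION: an explicit Deny is added as a guard rail.
                "Sid": "DenyConfigAndLifecycle",
                "Effect": "Deny",
                "Action": TERRAFORM_ONLY_ACTIONS,
                "Resource": "*",
            },
        ],
    }


def is_allowed(policy: dict, action: str) -> bool:
    """Simplified evaluation: explicit Deny wins, then Allow, else default deny."""
    statements = policy["Statement"]
    if any(s["Effect"] == "Deny" and action in s["Action"] for s in statements):
        return False
    return any(s["Effect"] == "Allow" and action in s["Action"] for s in statements)
```

With this policy, `is_allowed(lambda_deploy_policy(), "lambda:UpdateFunctionCode")` is true, while configuration and lifecycle actions are rejected, as is any action the policy does not list.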

Why 3 Tiers, Not 4

A stricter model (proposed in #33) separates Publish (put artifacts in registries) from Deploy (make artifacts live). Under that model:

Tier	Purpose	Example
1. CI	Read-only checks	`terraform plan`, lint, test
2. Publish	Store artifacts	S3 upload, ECR push, CodeArtifact publish
3. Deploy	Activate artifacts	`lambda:UpdateFunctionCode`, `lambda:InvokeFunction`
4. Infrastructure	Resource lifecycle	`terraform apply` (developer only)

Our pragmatic choice: We merge Publish and Deploy into a single Deploy tier. The tradeoffs:

Factor	Split (4-tier)	Merged (3-tier, current)
Blast radius	A compromised Publish job can't activate code	A compromised Deploy job can both publish and activate
Complexity	Two role assumptions per deploy workflow	One role assumption
Maintenance	Two policies to maintain per service	One policy
Real-world risk	Meaningful for large orgs with untrusted contributors	Low for a small org where all deploy branches are protected

When to revisit: If the org grows to include external contributors, or if compliance requirements demand separation of duties between artifact publishing and deployment activation, split the Deploy tier into Publish + Deploy with separate roles.

Convention Over Configuration

Rather than per-project IAM policies, the Deploy tier uses naming conventions to grant access:

S3 artifacts: Any bucket matching *-*-lambda-artifacts* is accessible under the lambdas/ key prefix
Lambda functions: All functions in the account (access is gated by the OIDC trust policy, which admits deploy branches only)
SSM parameters: All parameters in the account (read-only)

This means new projects get deploy permissions by following the convention — no infra PR needed for IAM changes. The trust boundary is the OIDC subject (only master/pre/dev branches from ontopix/* repos), not per-resource ARN scoping.
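As a rough illustration of the S3 convention, the sketch below checks whether a bucket and key fall inside the pattern described above. It uses Python's `fnmatchcase` as a stand-in for IAM wildcard matching, which is an approximation (`fnmatch` also interprets `[...]`, which this pattern does not use). The bucket names in the examples and the function name are hypothetical.

```python
from fnmatch import fnmatchcase

# Convention from this ADR: bucket glob plus key prefix.
BUCKET_PATTERN = "*-*-lambda-artifacts*"
KEY_PREFIX = "lambdas/"


def artifact_access_allowed(bucket: str, key: str) -> bool:
    """True if the object sits in a conventionally named bucket under lambdas/."""
    return fnmatchcase(bucket, BUCKET_PATTERN) and key.startswith(KEY_PREFIX)
```

A bucket such as `maxcolchon-dev-lambda-artifacts` qualifies for keys under `lambdas/` and for nothing else. Because the pattern ends in `*`, a name with a suffix (for example `maxcolchon-dev-lambda-artifacts-backup`) also matches, which is worth keeping in mind when naming unrelated buckets.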

Amendments

2026-02-16: CodeArtifact PublishRole promoted to Deploy + Release

The GitHubActions-CodeArtifact-PublishRole was originally Release-only (tag refs). This was inconsistent with GitHubActions-ECR-PushRole, which already supported both deploy branches and tags. Pre-release package publishing from deploy branches (e.g. @ontopix/stylebook@0.5.0-dev.1 from dev) is a valid workflow, and the two write-tier roles should trust the same subjects. The publish role now trusts deploy branches (master, pre, dev) in addition to tags. Resolves #22.
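A trust policy that admits both deploy branches and tags could be assembled as sketched below. This is an illustration, not the Terraform in the infra repository: the account ID passed in is a placeholder, the tag pattern `repo:ontopix/*:ref:refs/tags/*` is an assumed form of the Release scope, the `aud` condition is an assumption based on the usual GitHub-to-AWS OIDC setup, and both function names are hypothetical.

```python
OIDC_HOST = "token.actions.githubusercontent.com"
DEPLOY_BRANCHES = ("master", "pre", "dev")


def write_tier_subjects(org: str = "ontopix", include_tags: bool = True) -> list[str]:
    """Subjects trusted by a write-tier role: deploy branches, optionally tags."""
    subjects = [f"repo:{org}/*:ref:refs/heads/{branch}" for branch in DEPLOY_BRANCHES]
    if include_tags:
        # ASSUMPTION: Release scope is expressed as a tag wildcard.
        subjects.append(f"repo:{org}/*:ref:refs/tags/*")
    return subjects


def build_trust_policy(account_id: str, subjects: list[str]) -> dict:
    """Build an illustrative assume-role (trust) policy for GitHub Actions OIDC."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{OIDC_HOST}"
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    # ASSUMPTION: audience check as in the common OIDC setup.
                    "StringEquals": {f"{OIDC_HOST}:aud": "sts.amazonaws.com"},
                    "StringLike": {f"{OIDC_HOST}:sub": subjects},
                },
            }
        ],
    }
```

Calling `build_trust_policy("123456789012", write_tier_subjects())` yields a policy with four subject patterns, while `write_tier_subjects(include_tags=False)` gives the three-branch list used by a Deploy-only role.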

2026-02-24: Lambda DeployRole expanded, AgentCore roles removed

The GitHubActions-Lambda-DeployRole was expanded to support the full Lambda deploy lifecycle: S3 artifact upload/read (convention-based bucket pattern *-*-lambda-artifacts*), post-deploy smoke tests (lambda:InvokeFunction), and SSM drift detection (ssm:GetParameter*). This enables any project following the {project}-{env}-lambda-artifacts bucket naming convention to publish Lambda code via S3 without additional infra changes.

The GitHubActions-AgentCore-DeployRole and GitHubActions-AgentCore-ReadRole were removed. Per the maxcolchon v9 spec P1 principle ("Infrastructure stays manual"), AgentCore runtime updates are performed by the developer via terraform apply, not by GitHub Actions workflows. The roles were provisioned in #25 and iteratively fixed in #28 and #30, but no consuming workflow was ever deployed. Resolves #32.

ADR-002: Terraform AWS Provider v6 Migration

Options for migrating from AWS Terraform provider v5 to v6 to support Bedrock AgentCore resources.