# ADR-0007: AWS Bedrock Inference via Application Inference Profiles
Decision to require application inference profiles for all AWS Bedrock model invocations to enable cost attribution and tagging.
| Field | Value |
|---|---|
| Status | Approved |
| Date | 2026-03-12 |
| Authors | Engineering |
| Depends | ADR-0006 — AWS Resource Tagging Taxonomy |
| Replaces | — |
## Context
Ontopix services invoke foundation models on Amazon Bedrock across multiple products and client engagements. When a model is invoked directly — by passing a foundation model ID or system-defined cross-region inference profile ID as `modelId` — no Ontopix-level attribution is possible: Bedrock records the invocation against the account, but there is no mechanism to associate that cost or usage with a product, a team, or a client.
Amazon Bedrock provides application inference profiles as the designated mechanism for cost allocation and usage tracking at the workload level. An application inference profile is a named resource that wraps a foundation model (or a system-defined cross-region profile), accepts standard AWS resource tags, and appears as a distinct dimension in AWS Cost Explorer and CloudWatch metrics. Bedrock charges flow through the profile, and because the profile carries tags, those charges are attributed exactly like any other tagged AWS resource.
This ADR establishes that all Bedrock model invocations from Ontopix services MUST go through an explicitly defined application inference profile, and that those profiles MUST follow the tagging taxonomy defined in ADR-0006.
## Decision

All AWS Bedrock model invocations MUST use an application inference profile as the `modelId`.

Direct invocation of foundation model IDs or system-defined cross-region inference profile IDs is prohibited in production code. The only permitted `modelId` value in a Bedrock API call is the ARN of an application inference profile provisioned and tagged in Terraform.
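The prohibition can also be enforced defensively in application code before every Bedrock call. A minimal sketch of such a guard (the helper name and ARN regex are illustrative, not part of any SDK; the ARN shape is the standard `application-inference-profile` format):

```python
import re

# Accept only application inference profile ARNs; reject raw foundation
# model IDs and system-defined cross-region profile IDs.
# (Regex is illustrative; account ID and profile ID below are made up.)
_PROFILE_ARN = re.compile(
    r"^arn:aws:bedrock:[a-z0-9-]+:\d{12}:application-inference-profile/[a-z0-9]+$"
)


def assert_application_profile(model_id: str) -> str:
    """Raise if model_id is not an application inference profile ARN."""
    if not _PROFILE_ARN.match(model_id):
        raise ValueError(
            f"modelId must be an application inference profile ARN, got: {model_id!r}"
        )
    return model_id


# Accepted: an application inference profile ARN.
assert_application_profile(
    "arn:aws:bedrock:eu-central-1:123456789012:application-inference-profile/abc123xyz"
)
# Rejected (raises ValueError): a system-defined cross-region profile ID, e.g.
# assert_application_profile("eu.anthropic.claude-3-5-sonnet-20240620-v1:0")
```

A service can call this guard once at configuration load, so a misconfigured direct model ID fails fast at startup rather than silently producing unattributed costs.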
Application inference profiles MUST:
- Be provisioned in Terraform using the `aws_bedrock_inference_profile` resource.
- Carry the full Tier 1 tag set from ADR-0006 (`product`, `billing-mode`, `env`, `team`, `owner`, `source`, `managed-by`).
- Carry Tier 2 tags when `billing-mode = client` (`client`, `project`, `cost-center`, `component`).
- Have a name that identifies the product and model unambiguously (see the naming convention in the pattern).
- Be defined in the `.infra/` module of the repository that owns the workload, or in the central `infra` repository for shared profiles.
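Under these rules, a profile definition might look like the following Terraform sketch. The resource name, model source ARN, and tag values are illustrative; the tag keys are the Tier 1 set from ADR-0006:

```terraform
# Sketch only — "acme" product, account ID, and tag values are hypothetical.
resource "aws_bedrock_inference_profile" "acme_chat" {
  name        = "acme-chat-claude-sonnet"
  description = "Chat inference for the acme product"

  model_source {
    # Prefer an eu. system-defined cross-region profile as the source.
    copy_from = "arn:aws:bedrock:eu-central-1:123456789012:inference-profile/eu.anthropic.claude-3-5-sonnet-20240620-v1:0"
  }

  tags = {
    product      = "acme"
    billing-mode = "internal"
    env          = "prod"
    team         = "platform"
    owner        = "platform@ontopix.ai"
    source       = "github.com/ontopix/acme"
    managed-by   = "terraform"
  }
}
```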
The `eu.` cross-region inference profiles are the preferred `copy_from` source for all Ontopix workloads. They route within EU regions, satisfy data residency requirements, and increase throughput by distributing load across `eu-central-1`, `eu-west-1`, and `eu-west-3` without leaving the EU.
## Rationale

### Why application inference profiles specifically
Bedrock has two types of inference profiles: system-defined (cross-region) and application. System-defined profiles are predefined by AWS and cannot carry custom tags. Application profiles are user-created, accept all standard AWS resource tags, and produce tagged cost allocation entries in Cost Explorer. They are the only mechanism that connects Bedrock token consumption to the ADR-0006 tagging taxonomy.
### Why this must be a hard rule, not a recommendation
Without enforcement, services will inevitably invoke models directly — it is the path of least resistance during development. Direct invocations produce unattributed Bedrock costs that are invisible in per-client or per-product cost reports. Given that AI inference is typically the dominant cost driver in Ontopix workloads, unattributed invocations undermine the entire tagging strategy.
### Why profiles are defined in Terraform, not at runtime
Application inference profiles can be created via API at runtime, but doing so bypasses the standard Terraform resource lifecycle, tag enforcement, and code review. Defining profiles in Terraform ensures every profile is reviewed, tagged consistently, and tracked in version control.
## Scope and Exceptions
In scope:
- All services invoking `InvokeModel`, `InvokeModelWithResponseStream`, `Converse`, or `ConverseStream`
- Bedrock Agents and Knowledge Bases that specify a model ID
- Any SDK or framework that abstracts Bedrock invocations (LangChain, Strands, AgentCore, etc.)
Out of scope:
- AWS-managed services that call Bedrock internally on your behalf (e.g. Kendra, some Bedrock-native managed features). These cannot be controlled at the invocation level.
- Sandbox and local development environments where `BEDROCK_INFERENCE_PROFILE_ARN` is not yet set — these may fall back to direct model IDs with explicit documentation in `AGENTS.md`.
No exceptions in production. If a service requires a model for which no application inference profile exists yet, the profile must be created before the service is deployed to prod or pre.
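The sandbox fallback is safest when made explicit rather than implicit. A sketch of configuration-resolution logic that honors this ADR (the function name and fallback model ID are illustrative; the environment variable is the one named above):

```python
import os


def resolve_model_id(environment: str) -> str:
    """Return the modelId to use for Bedrock calls.

    prod/pre MUST use the application inference profile ARN; only
    sandbox/local may fall back to a direct model ID, as documented
    in AGENTS.md. Fallback model ID below is illustrative.
    """
    profile_arn = os.environ.get("BEDROCK_INFERENCE_PROFILE_ARN")
    if profile_arn:
        return profile_arn
    if environment in ("sandbox", "local"):
        # Permitted fallback per this ADR's out-of-scope clause.
        return "eu.anthropic.claude-3-5-sonnet-20240620-v1:0"
    raise RuntimeError(
        f"BEDROCK_INFERENCE_PROFILE_ARN is required in environment {environment!r}"
    )
```

Failing loudly in `prod` and `pre` keeps the "no exceptions in production" rule from degrading into a silent fallback.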
## Alternatives Considered

### Use system-defined cross-region inference profiles directly
System-defined profiles (e.g. `eu.anthropic.claude-3-5-sonnet-20240620-v1:0`) provide cross-region routing but accept no custom tags. This yields throughput benefits but zero attribution at the Ontopix level. Rejected.
### Tag invocations using CloudWatch dimensions or custom logging
Usage data could theoretically be reconstructed from CloudWatch Logs and CloudTrail by correlating request metadata. This is complex, brittle, and produces attribution data that does not flow into Cost Explorer. Rejected.
### Create inference profiles via application code at startup
Avoids Terraform dependency and allows profiles to be created dynamically per deployment. However, it bypasses IaC review, produces inconsistently tagged resources, and cannot be enforced via AWS Config. Rejected.
### One global shared profile per model
Simpler to manage. Rejected because a single profile cannot carry different client, product, or billing-mode tags simultaneously. Attribution requires one profile per cost allocation unit.
## Consequences

### Positive
- Bedrock token costs flow into Cost Explorer under the same tag dimensions as all other AWS resources, making AI inference a first-class line item in per-client and per-product reports.
- Profiles appear as named, reviewable Terraform resources — any change to which model a workload uses goes through a PR.
- The `eu.` cross-region routing improves throughput resilience without leaving the EU, satisfying data residency requirements implicitly.
- CloudWatch metrics are emitted per inference profile, enabling per-workload latency and token usage dashboards without additional instrumentation.
### Negative / Trade-offs
- Every workload that uses Bedrock must have at least one profile provisioned before deployment. This adds a Terraform step to the initial setup of any AI-capable service.
- Profile ARNs must be threaded through application configuration (environment variables, SSM parameters). Services cannot hardcode model IDs.
- Profile proliferation is possible if every combination of tags generates a new profile. The naming convention and governance rules in the pattern mitigate this.
- Application inference profiles are not currently visible in the Bedrock console — only via the CLI and API (`aws bedrock list-inference-profiles --type-equals APPLICATION`).
## Open Questions (RFC)
- SSM vs environment variable for profile ARN delivery — this is an implementation decision each service/component should make based on its deployment model. Environment variables are simpler but require redeployment to change the profile ARN (e.g. when upgrading a model). SSM parameters can be hot-updated and read at cold start, decoupling model changes from application deployments. The pattern documents both options; neither is mandated at the ADR level.
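For the SSM option, a cold-start read might look like the following sketch (hypothetical helper: the parameter name and the injectable client are illustrative per-service choices, not mandated by this ADR):

```python
import functools


@functools.lru_cache(maxsize=1)  # read once per cold start, then reuse
def profile_arn_from_ssm(param_name: str, ssm=None) -> str:
    """Fetch the application inference profile ARN from an SSM parameter."""
    if ssm is None:
        import boto3  # imported lazily so the helper is testable without AWS
        ssm = boto3.client("ssm")
    return ssm.get_parameter(Name=param_name)["Parameter"]["Value"]
```

Because the value is read at cold start rather than baked into the deployment, a `terraform apply` that replaces the profile propagates the new ARN without an application release.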
## Resolved Questions
- Shared profiles for internal / R&D workloads — the central `infra` repository SHOULD maintain a set of shared profiles for `billing-mode=internal` and `billing-mode=rd` workloads, covering all available models (not just a subset). This reduces the likelihood of developers reaching for untagged system profiles during R&D — using an Ontopix R&D profile requires the same effort as using a system profile, removing the path-of-least-resistance problem. Trade-off acknowledged: R&D profiles provisioned globally are technically available in production, but this is acceptable given that the `billing-mode=rd` tag makes the cost attribution explicit.
- Enforcement mechanism — alerting-only first, consistent with ADR-0006. Monitor for direct model invocations (calls where `modelId` is not an `application-inference-profile` ARN) and alert via the same channels as tag compliance (owner + `infra@ontopix.ai`). SCP-level blocking deferred until the team has built the habit of using profiles and the shared R&D profiles are in place.
- Model upgrade lifecycle — changing `copy_from` in the `aws_bedrock_inference_profile` resource is a force-replacement: Terraform destroys the old profile and creates a new one with a new ARN. When the ARN is delivered via SSM, the parameter updates automatically and the application picks up the new ARN on its next cold start — no code changes or redeployment needed. The lifecycle for both model deprecation and new model adoption is: (1) update `bedrock_model_source` in `{env}.tfvars`, (2) `terraform apply` → old profile destroyed, new one created, SSM updated, (3) next cold start reads new ARN. To avoid a brief window where the old profile is gone but the new one isn't yet created, use `create_before_destroy` lifecycle on the profile resource in production.
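The `create_before_destroy` guidance translates to a `lifecycle` block on the profile resource (sketch; the resource name is illustrative):

```terraform
resource "aws_bedrock_inference_profile" "acme_chat" {
  # ... name, model_source, and tags as defined in the module ...

  lifecycle {
    # On force-replacement, Terraform creates the replacement profile
    # (new ARN) before destroying the old one, so there is no window
    # in which no valid profile exists.
    create_before_destroy = true
  }
}
```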
## Related
- ADR-0006 — AWS Resource Tagging Taxonomy — the tag taxonomy all profiles must follow
- Pattern: AWS Bedrock Inference Profiles — implementation guide for this decision (published alongside this ADR)
- Infrastructure Layout pattern — where `.infra/` modules live
- Core Engineering Principles — Evidence Over Assumptions, Ownership & Responsibility