# ADR-0007: AWS Bedrock Inference via Application Inference Profiles
Decision to require application inference profiles for all AWS Bedrock model invocations to enable cost attribution and tagging.
| Field | Value |
|---|---|
| Status | Approved |
| Date | 2026-03-12 |
| Authors | Engineering |
| Depends | ADR-0006 — AWS Resource Tagging Taxonomy |
| Replaces | — |
## Context
Ontopix services invoke foundation models on Amazon Bedrock across multiple products and client engagements. When a model is invoked directly — by passing a foundation model ID or system-defined cross-region inference profile ID as `modelId` — no Ontopix-level attribution is possible: Bedrock records the invocation against the account, but there is no mechanism to associate that cost or usage with a product, a team, or a client.
Amazon Bedrock provides application inference profiles as the designated mechanism for cost allocation and usage tracking at the workload level. An application inference profile is a named resource that wraps a foundation model (or a system-defined cross-region profile), accepts standard AWS resource tags, and appears as a distinct dimension in AWS Cost Explorer and CloudWatch metrics. Bedrock charges flow through the profile, and because the profile carries tags, those charges are attributed exactly like any other tagged AWS resource.
This ADR establishes that all Bedrock model invocations from Ontopix services MUST go through an explicitly defined application inference profile, and that those profiles MUST follow the tagging taxonomy defined in ADR-0006.
## Decision

All AWS Bedrock model invocations MUST use an application inference profile as the `modelId`.

Direct invocation of foundation model IDs or system-defined cross-region inference profile IDs is prohibited in production code. The only permitted `modelId` value in a Bedrock API call is the ARN of an application inference profile provisioned and tagged in Terraform.
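The prohibition can also be enforced defensively in application code before every Bedrock call. A minimal sketch of such a guard (the helper name and ARN regex are illustrative, not part of any SDK; the ARN shape is the standard `application-inference-profile` format):

```python
import re

# Accept only application inference profile ARNs; reject raw foundation
# model IDs and system-defined cross-region profile IDs.
# (Regex is illustrative; account ID and profile ID below are made up.)
_PROFILE_ARN = re.compile(
    r"^arn:aws:bedrock:[a-z0-9-]+:\d{12}:application-inference-profile/[a-z0-9]+$"
)


def assert_application_profile(model_id: str) -> str:
    """Raise if model_id is not an application inference profile ARN."""
    if not _PROFILE_ARN.match(model_id):
        raise ValueError(
            f"modelId must be an application inference profile ARN, got: {model_id!r}"
        )
    return model_id


# Accepted: an application inference profile ARN.
assert_application_profile(
    "arn:aws:bedrock:eu-central-1:123456789012:application-inference-profile/abc123xyz"
)
# Rejected (raises ValueError): a system-defined cross-region profile ID, e.g.
# assert_application_profile("eu.anthropic.claude-3-5-sonnet-20240620-v1:0")
```

A service can call this guard once at configuration load, so a misconfigured direct model ID fails fast at startup rather than silently producing unattributed costs.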
Application inference profiles MUST:
- Be provisioned in Terraform using the `aws_bedrock_inference_profile` resource.
- Carry the full Tier 1 tag set from ADR-0006 (`product`, `billing-mode`, `env`, `team`, `owner`, `source`, `managed-by`).
- Carry Tier 2 tags when `billing-mode = client` (`client`, `project`, `cost-center`, `component`).
- Have a name that identifies the product and model unambiguously (see the naming convention in the pattern).
- Be defined in the `.infra/` module of the repository that owns the workload, or in the central `infra` repository for shared profiles.
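Under these rules, a profile definition might look like the following Terraform sketch. The resource name, model source ARN, and tag values are illustrative; the tag keys are the Tier 1 set from ADR-0006:

```terraform
# Sketch only — "acme" product, account ID, and tag values are hypothetical.
resource "aws_bedrock_inference_profile" "acme_chat" {
  name        = "acme-chat-claude-sonnet"
  description = "Chat inference for the acme product"

  model_source {
    # Prefer an eu. system-defined cross-region profile as the source.
    copy_from = "arn:aws:bedrock:eu-central-1:123456789012:inference-profile/eu.anthropic.claude-3-5-sonnet-20240620-v1:0"
  }

  tags = {
    product      = "acme"
    billing-mode = "internal"
    env          = "prod"
    team         = "platform"
    owner        = "platform@ontopix.ai"
    source       = "github.com/ontopix/acme"
    managed-by   = "terraform"
  }
}
```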
The `eu.` cross-region inference profiles are the preferred `copy_from` source for all Ontopix workloads. They route within EU regions, satisfy data residency requirements, and increase throughput by distributing load across `eu-central-1`, `eu-west-1`, and `eu-west-3` without leaving the EU.
## Rationale

### Why application inference profiles specifically
Bedrock has two types of inference profiles: system-defined (cross-region) and application. System-defined profiles are predefined by AWS and cannot carry custom tags. Application profiles are user-created, accept all standard AWS resource tags, and produce tagged cost allocation entries in Cost Explorer. They are the only mechanism that connects Bedrock token consumption to the ADR-0006 tagging taxonomy.
### Why this must be a hard rule, not a recommendation
Without enforcement, services will inevitably invoke models directly — it is the path of least resistance during development. Direct invocations produce unattributed Bedrock costs that are invisible in per-client or per-product cost reports. Given that AI inference is typically the dominant cost driver in Ontopix workloads, unattributed invocations undermine the entire tagging strategy.
### Why profiles are defined in Terraform, not at runtime
Application inference profiles can be created via API at runtime, but doing so bypasses the standard Terraform resource lifecycle, tag enforcement, and code review. Defining profiles in Terraform ensures every profile is reviewed, tagged consistently, and tracked in version control.
## Scope and Exceptions
In scope:
- All services invoking `InvokeModel`, `InvokeModelWithResponseStream`, `Converse`, or `ConverseStream`
- Bedrock Agents and Knowledge Bases that specify a model ID
- Any SDK or framework that abstracts Bedrock invocations (LangChain, Strands, AgentCore, etc.)
Out of scope:
- AWS-managed services that call Bedrock internally on your behalf (e.g. Kendra, some Bedrock-native managed features). These cannot be controlled at the invocation level.
- Sandbox and local development environments where `BEDROCK_INFERENCE_PROFILE_ARN` is not yet set — these may fall back to direct model IDs with explicit documentation in `AGENTS.md`.
No exceptions in production. If a service requires a model for which no application inference profile exists yet, the profile must be created before the service is deployed to prod or pre.
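The sandbox fallback is safest when made explicit rather than implicit. A sketch of configuration-resolution logic that honors this ADR (the function name and fallback model ID are illustrative; the environment variable is the one named above):

```python
import os


def resolve_model_id(environment: str) -> str:
    """Return the modelId to use for Bedrock calls.

    prod/pre MUST use the application inference profile ARN; only
    sandbox/local may fall back to a direct model ID, as documented
    in AGENTS.md. Fallback model ID below is illustrative.
    """
    profile_arn = os.environ.get("BEDROCK_INFERENCE_PROFILE_ARN")
    if profile_arn:
        return profile_arn
    if environment in ("sandbox", "local"):
        # Permitted fallback per this ADR's out-of-scope clause.
        return "eu.anthropic.claude-3-5-sonnet-20240620-v1:0"
    raise RuntimeError(
        f"BEDROCK_INFERENCE_PROFILE_ARN is required in environment {environment!r}"
    )
```

Failing loudly in `prod` and `pre` keeps the "no exceptions in production" rule from degrading into a silent fallback.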
## Alternatives Considered

### Use system-defined cross-region inference profiles directly
System-defined profiles (e.g. `eu.anthropic.claude-3-5-sonnet-20240620-v1:0`) provide cross-region routing but accept no custom tags. This yields throughput benefits but zero attribution at the Ontopix level. Rejected.
### Tag invocations using CloudWatch dimensions or custom logging
Usage data could theoretically be reconstructed from CloudWatch Logs and CloudTrail by correlating request metadata. This is complex, brittle, and produces attribution data that does not flow into Cost Explorer. Rejected.
### Create inference profiles via application code at startup
Avoids Terraform dependency and allows profiles to be created dynamically per deployment. However, it bypasses IaC review, produces inconsistently tagged resources, and cannot be enforced via AWS Config. Rejected.
### One global shared profile per model
Simpler to manage. Rejected because a single profile cannot carry different client, product, or billing-mode tags simultaneously. Attribution requires one profile per cost allocation unit.
## Consequences

### Positive
- Bedrock token costs flow into Cost Explorer under the same tag dimensions as all other AWS resources, making AI inference a first-class line item in per-client and per-product reports.
- Profiles appear as named, reviewable Terraform resources — any change to which model a workload uses goes through a PR.
- The `eu.` cross-region routing improves throughput resilience without leaving the EU, satisfying data residency requirements implicitly.
- CloudWatch metrics are emitted per inference profile, enabling per-workload latency and token usage dashboards without additional instrumentation.
### Negative / Trade-offs
- Every workload that uses Bedrock must have at least one profile provisioned before deployment. This adds a Terraform step to the initial setup of any AI-capable service.
- Profile ARNs must be threaded through application configuration (environment variables, SSM parameters). Services cannot hardcode model IDs.
- Profile proliferation is possible if every combination of tags generates a new profile. The naming convention and governance rules in the pattern mitigate this.
- Application inference profiles are not currently visible in the Bedrock console — only via the CLI and API (`aws bedrock list-inference-profiles --type-equals APPLICATION`).
## Open Questions (RFC)
- SSM vs environment variable for profile ARN delivery — this is an implementation decision each service/component should make based on its deployment model. Environment variables are simpler but require redeployment to change the profile ARN (e.g. when upgrading a model). SSM parameters can be hot-updated and read at cold start, decoupling model changes from application deployments. The pattern documents both options; neither is mandated at the ADR level.
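For the SSM option, a cold-start read might look like the following sketch (hypothetical helper: the parameter name and the injectable client are illustrative per-service choices, not mandated by this ADR):

```python
import functools


@functools.lru_cache(maxsize=1)  # read once per cold start, then reuse
def profile_arn_from_ssm(param_name: str, ssm=None) -> str:
    """Fetch the application inference profile ARN from an SSM parameter."""
    if ssm is None:
        import boto3  # imported lazily so the helper is testable without AWS
        ssm = boto3.client("ssm")
    return ssm.get_parameter(Name=param_name)["Parameter"]["Value"]
```

Because the value is read at cold start rather than baked into the deployment, a `terraform apply` that replaces the profile propagates the new ARN without an application release.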
## Resolved Questions
- Shared profiles for internal / R&D workloads — the central `infra` repository SHOULD maintain a set of shared profiles for `billing-mode=internal` and `billing-mode=rd` workloads, covering all available models (not just a subset). This reduces the likelihood of developers reaching for untagged system profiles during R&D — using an Ontopix R&D profile requires the same effort as using a system profile, removing the path-of-least-resistance problem. Trade-off acknowledged: R&D profiles provisioned globally are technically available in production, but this is acceptable given that the `billing-mode=rd` tag makes the cost attribution explicit.
- Enforcement mechanism — alerting-only first, consistent with ADR-0006. Monitor for direct model invocations (calls where `modelId` is not an `application-inference-profile` ARN) and alert via the same channels as tag compliance (owner + `infra@ontopix.ai`). SCP-level blocking deferred until the team has built the habit of using profiles and the shared R&D profiles are in place.
- Model upgrade lifecycle — changing `copy_from` in the `aws_bedrock_inference_profile` resource is a force-replacement: Terraform destroys the old profile and creates a new one with a new ARN. When the ARN is delivered via SSM, the parameter updates automatically and the application picks up the new ARN on its next cold start — no code changes or redeployment needed. The lifecycle for both model deprecation and new model adoption is: (1) update `bedrock_model_source` in `{env}.tfvars`, (2) `terraform apply` → old profile destroyed, new one created, SSM updated, (3) next cold start reads new ARN. To avoid a brief window where the old profile is gone but the new one isn't yet created, use `create_before_destroy` lifecycle on the profile resource in production.
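The `create_before_destroy` guidance translates to a `lifecycle` block on the profile resource (sketch; the resource name is illustrative):

```terraform
resource "aws_bedrock_inference_profile" "acme_chat" {
  # ... name, model_source, and tags as defined in the module ...

  lifecycle {
    # On force-replacement, Terraform creates the replacement profile
    # (new ARN) before destroying the old one, so there is no window
    # in which no valid profile exists.
    create_before_destroy = true
  }
}
```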
## Related
- ADR-0006 — AWS Resource Tagging Taxonomy — the tag taxonomy all profiles must follow
- Pattern: AWS Bedrock Inference Profiles — implementation guide for this decision (published alongside this ADR)
- Infrastructure Layout pattern — where `.infra/` modules live
- Core Engineering Principles — Evidence Over Assumptions, Ownership & Responsibility