Organizational

AWS Bedrock Inference Profiles

Pattern for provisioning and consuming AWS Bedrock application inference profiles for cost-attributed, tagged model invocation.

Production

Status: Approved
Type: Organizational (Required*)
ADR: ADR-0007 — AWS Bedrock Inference via Application Inference Profiles

*Required for all services that invoke AWS Bedrock models.


Problem

Bedrock model invocations made directly against a foundation model ID produce no workload-level attribution. Token costs are recorded at the account level with no way to associate them with a product, client, or team in Cost Explorer. This makes AI inference — typically the dominant cost driver in Ontopix workloads — invisible in per-client and per-product reports.


Context

When to Use This Pattern

  • Any service that calls InvokeModel, Converse, ConverseStream, or InvokeModelWithResponseStream
  • Any SDK or framework that abstracts Bedrock calls (Strands, AgentCore, LangChain, Boto3 wrappers)
  • Bedrock Agents or Knowledge Bases where a modelId is configurable

When NOT to Use This Pattern

  • AWS-managed services that invoke Bedrock internally without an exposed modelId parameter
  • Local development with BEDROCK_INFERENCE_PROFILE_ARN not yet set — document the fallback in AGENTS.md
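
The local-development fallback mentioned above can be sketched as a small helper. This is only an illustration: the fallback foundation model ID and the function name are assumptions, and the actual fallback behaviour should match whatever the repository's AGENTS.md documents.

```python
import os

# Hypothetical local-dev fallback: prefer the profile ARN, and fall back to
# a plain foundation model ID only when the variable is unset (local dev).
# The fallback ID below is illustrative; use whatever AGENTS.md documents.
LOCAL_DEV_FALLBACK = "anthropic.claude-3-5-haiku-20241022-v1:0"


def resolve_model_id() -> str:
    # Deployed environments always set BEDROCK_INFERENCE_PROFILE_ARN,
    # so the fallback is only ever reached on a developer machine.
    return os.environ.get("BEDROCK_INFERENCE_PROFILE_ARN", LOCAL_DEV_FALLBACK)
```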

Solution

Provision an aws_bedrock_inference_profile Terraform resource for each cost allocation unit. Pass the profile ARN as modelId in all Bedrock API calls. The profile inherits the ADR-0006 tag taxonomy via default_tags and produces tagged cost entries in Cost Explorer automatically.


Concepts

Two types of Bedrock inference profiles

| Type | Created by | Tags | Purpose |
| --- | --- | --- | --- |
| System-defined (cross-region) | AWS | Cannot carry custom tags | Cross-region throughput routing |
| Application | You, in Terraform | Full custom tag support | Cost attribution + optional cross-region routing |

Ontopix uses application inference profiles exclusively. They can wrap either a single-region foundation model or a system-defined cross-region profile as their source.

Source model selection

Ontopix operates from eu-central-1. The recommended copy_from sources are the EU system-defined cross-region profiles — they distribute load across eu-central-1, eu-west-1, and eu-west-3 without leaving the EU, satisfying data residency requirements while improving throughput.

| Model | copy_from (EU cross-region) |
| --- | --- |
| Claude Sonnet 4.5 | arn:aws:bedrock:eu-central-1::inference-profile/eu.anthropic.claude-sonnet-4-5-20250929-v1:0 |
| Claude Sonnet 3.7 | arn:aws:bedrock:eu-central-1::inference-profile/eu.anthropic.claude-3-7-sonnet-20250219-v1:0 |
| Claude Haiku 3.5 | arn:aws:bedrock:eu-central-1::inference-profile/eu.anthropic.claude-haiku-3-5-20241022-v1:0 |
| Nova Pro | arn:aws:bedrock:eu-central-1::inference-profile/eu.amazon.nova-pro-v1:0 |
| Nova Lite | arn:aws:bedrock:eu-central-1::inference-profile/eu.amazon.nova-lite-v1:0 |

To use a single-region model (strict eu-central-1 only), pass the foundation model ARN directly:

arn:aws:bedrock:eu-central-1::foundation-model/anthropic.claude-3-5-sonnet-20241022-v2:0

Prefer EU cross-region profiles for production workloads. Use single-region only when a specific compliance requirement mandates it.

Naming convention

{product}-{model-short}-{env}

Examples:

  • agents-claude-sonnet-45-prod
  • audits-claude-haiku-35-dev
  • platform-nova-pro-pre

The name must be unique within the account and should identify the workload and model at a glance.
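
As a sketch, the convention can be checked with a small helper. The helper name, the regex, and the fixed set of environment suffixes (prod/pre/dev, mirroring the examples above) are assumptions, not part of the pattern itself.

```python
import re

# {product}-{model-short}-{env}, e.g. agents-claude-sonnet-45-prod.
# The allowed env suffixes mirror the examples above; adjust as needed.
NAME_RE = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)+-(prod|pre|dev)$")


def profile_name(product: str, model_short: str, env: str) -> str:
    # Assemble the name and reject anything outside the convention.
    name = f"{product}-{model_short}-{env}"
    if not NAME_RE.match(name):
        raise ValueError(f"invalid inference profile name: {name}")
    return name
```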


Structure

my-service/
└── .infra/
    ├── locals.tf          # ADR-0006 tags
    ├── providers.tf       # default_tags injection
    ├── variables.tf       # environment validation
    ├── bedrock.tf         # inference profile resource(s)
    ├── outputs.tf         # profile ARN output
    └── ssm.tf             # optional: ARN stored in SSM

Implementation

Step 1 — Define the inference profile in Terraform

# .infra/bedrock.tf

data "aws_caller_identity" "current" {}

resource "aws_bedrock_inference_profile" "main" {
  name        = "${local.tags["product"]}-claude-sonnet-45-${var.environment}"
  description = "Inference profile for ${local.tags["product"]} — ${var.environment}"

  model_source {
    # EU cross-region profile: routes across eu-central-1, eu-west-1, eu-west-3
    copy_from = "arn:aws:bedrock:eu-central-1::inference-profile/eu.anthropic.claude-sonnet-4-5-20250929-v1:0"
  }

  # Tags inherited from default_tags — no explicit tags block needed
  # unless this profile needs to override a tag from the module default

  # Avoids downtime during model upgrades: new profile is created before
  # the old one is destroyed, so SSM/env var consumers never see a gap.
  lifecycle {
    create_before_destroy = true
  }
}

The profile inherits all tags from default_tags on the provider, including the full ADR-0006 Tier 1 and Tier 2 sets. No explicit tags block is needed unless this specific profile requires a tag value that differs from the module's locals.tags.
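
For reference, the default_tags wiring lives in providers.tf per the structure above. A minimal sketch, assuming the locals.tags map from this repo layout:

```hcl
# .infra/providers.tf — sketch; tag values come from locals.tf (ADR-0006)
provider "aws" {
  region = "eu-central-1"

  default_tags {
    tags = local.tags
  }
}
```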

Step 2 — Export the profile ARN

# .infra/outputs.tf

output "bedrock_inference_profile_arn" {
  description = "ARN of the Bedrock application inference profile"
  value       = aws_bedrock_inference_profile.main.inference_profile_arn
}

Step 3 — Deliver the ARN to the application

Option A — Environment variable in Lambda/ECS task definition (preferred for simplicity)

# .infra/lambda.tf (example)

resource "aws_lambda_function" "worker" {
  # ...
  environment {
    variables = {
      BEDROCK_INFERENCE_PROFILE_ARN = aws_bedrock_inference_profile.main.inference_profile_arn
    }
  }
}

Option B — SSM Parameter Store (preferred when ARN must be shared across multiple resources or repos)

# .infra/ssm.tf

resource "aws_ssm_parameter" "bedrock_profile_arn" {
  name  = "/${var.environment}/${local.tags["product"]}/bedrock/inference-profile-arn"
  type  = "String"
  value = aws_bedrock_inference_profile.main.inference_profile_arn
}

Application reads at startup:

import boto3, os

ssm = boto3.client("ssm", region_name="eu-central-1")
PROFILE_ARN = ssm.get_parameter(
    Name=f"/{os.environ['ENV']}/{os.environ['PRODUCT']}/bedrock/inference-profile-arn"
)["Parameter"]["Value"]

Step 4 — Use the profile ARN as modelId in all Bedrock calls

The profile ARN is a drop-in replacement for any model ID in the Bedrock API.

Boto3 (Converse API):

import boto3, os

bedrock = boto3.client("bedrock-runtime", region_name="eu-central-1")
PROFILE_ARN = os.environ["BEDROCK_INFERENCE_PROFILE_ARN"]

response = bedrock.converse(
    modelId=PROFILE_ARN,
    messages=[{"role": "user", "content": [{"text": "Hello"}]}],
)

Boto3 (InvokeModel — ARN or profile ID both work):

import boto3, json, os

bedrock = boto3.client("bedrock-runtime", region_name="eu-central-1")
PROFILE_ARN = os.environ["BEDROCK_INFERENCE_PROFILE_ARN"]

response = bedrock.invoke_model(
    modelId=PROFILE_ARN,
    # The body schema is model-specific; Claude models on Bedrock use the
    # Anthropic Messages format shown here.
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 256,
        "messages": [{"role": "user", "content": "Hello"}],
    }),
)

Strands agent:

from strands import Agent
from strands.models.bedrock import BedrockModel
import os

model = BedrockModel(
    model_id=os.environ["BEDROCK_INFERENCE_PROFILE_ARN"],
    region_name="eu-central-1",
)
agent = Agent(model=model)

Never hardcode a model ID or system-defined profile ID in application code. Always read the profile ARN from configuration.
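
The two delivery options from Step 3 can be combined into a single lookup helper. A sketch, assuming the env var name and SSM path convention shown above; the function name is illustrative.

```python
import os


def resolve_profile_arn() -> str:
    """Prefer the env var (Option A); fall back to SSM (Option B)."""
    arn = os.environ.get("BEDROCK_INFERENCE_PROFILE_ARN")
    if arn:
        return arn
    # Lazy import: boto3 is only needed on the SSM fallback path, which
    # also requires ENV and PRODUCT to be set (see the Step 3 SSM path).
    import boto3

    ssm = boto3.client("ssm", region_name="eu-central-1")
    name = f"/{os.environ['ENV']}/{os.environ['PRODUCT']}/bedrock/inference-profile-arn"
    return ssm.get_parameter(Name=name)["Parameter"]["Value"]
```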


Multiple profiles per repository

A workload may need multiple profiles when:

  • Different tasks use different models (e.g. a fast haiku for classification, a capable sonnet for generation)
  • Different billing-mode or client values apply to different invocations within the same service

In this case, define a profile per model+context combination and expose each ARN separately:

# .infra/bedrock.tf

resource "aws_bedrock_inference_profile" "classify" {
  name        = "${local.tags["product"]}-claude-haiku-35-${var.environment}"
  description = "Fast classification profile"
  model_source {
    copy_from = "arn:aws:bedrock:eu-central-1::inference-profile/eu.anthropic.claude-haiku-3-5-20241022-v1:0"
  }
}

resource "aws_bedrock_inference_profile" "generate" {
  name        = "${local.tags["product"]}-claude-sonnet-45-${var.environment}"
  description = "Generation profile"
  model_source {
    copy_from = "arn:aws:bedrock:eu-central-1::inference-profile/eu.anthropic.claude-sonnet-4-5-20250929-v1:0"
  }
}

# .infra/outputs.tf

output "bedrock_classify_profile_arn" {
  value = aws_bedrock_inference_profile.classify.inference_profile_arn
}

output "bedrock_generate_profile_arn" {
  value = aws_bedrock_inference_profile.generate.inference_profile_arn
}

Model Upgrade Lifecycle

When a new model version is released or an existing model is deprecated, the upgrade process is:

  1. Update copy_from in the profile's model_source block (e.g. in {env}.tfvars or directly in bedrock.tf).
  2. terraform apply — changing copy_from is a force-replacement: Terraform destroys the old profile and creates a new one with a new ARN. With create_before_destroy (see Step 1), the new profile is created first, eliminating the gap.
  3. ARN propagation — if the ARN is delivered via SSM, the parameter updates automatically as part of the apply. If via environment variable, redeploy the application.
  4. Next cold start picks up the new ARN — no application code changes needed.

The profile name also changes to reflect the new model (e.g. agents-claude-sonnet-45-prod → agents-claude-sonnet-46-prod), keeping it unambiguous in the console and CLI output.

Who approves model upgrades? Model version changes in prod or pre go through a standard PR review. No separate approval process beyond the existing Terraform PR workflow.


Applies Principles

  • Evidence Over Assumptions — inference profiles make token costs a tagged, queryable line item; without them, AI spend is a black box.
  • Ownership & Responsibility — the owner and team tags on the profile route cost anomalies to the right contact.
  • Automation Over Manual Work — profiles are provisioned once in Terraform; applications receive the ARN via configuration and never manage the resource lifecycle.
  • Security by Design — profiles are IAM-controllable resources; access to invoke a profile can be granted or denied independently of foundation model access.

Consequences

  • ✅ Bedrock token costs appear in Cost Explorer under the same tag dimensions as all other AWS resources
  • ✅ CloudWatch emits per-profile metrics for invocation count, latency, and token usage — no extra instrumentation
  • ✅ EU cross-region routing improves throughput resilience without leaving the EU
  • ✅ Model IDs are never hardcoded in application code — upgrading a model requires only a Terraform change
  • ⚠️ Every AI-capable service needs at least one profile provisioned before first deployment
  • ⚠️ Application inference profiles are not visible in the Bedrock console — use CLI: aws bedrock list-inference-profiles --type-equals APPLICATION --region eu-central-1
  • ⚠️ Profile ARN must be threaded through application configuration — services cannot rely on a static model string

Examples

Client project (agents product, billing-mode=client)

# .infra/locals.tf

locals {
  tags = {
    product      = "agents"
    billing-mode = "client"
    env          = var.environment
    team         = "engineering"
    owner        = "engineering@ontopix.ai"
    source       = "ontopix/my-agent-service/.infra"
    managed-by   = "terraform"
    client       = "acme"
    project      = "support-agent-v1"
    cost-center  = "tbd"
    component    = "ml"
  }
}

# .infra/bedrock.tf

resource "aws_bedrock_inference_profile" "main" {
  name        = "agents-claude-sonnet-45-${var.environment}"
  description = "Agent inference for support-agent-v1 (${var.environment})"
  model_source {
    copy_from = "arn:aws:bedrock:eu-central-1::inference-profile/eu.anthropic.claude-sonnet-4-5-20250929-v1:0"
  }
}

Shared SaaS platform profile (billing-mode=saas)

# .infra/locals.tf

locals {
  tags = {
    product      = "platform"
    billing-mode = "saas"
    env          = var.environment
    team         = "engineering"
    owner        = "engineering@ontopix.ai"
    source       = "ontopix/platform-core/.infra"
    managed-by   = "terraform"
    component    = "ml"
  }
}

# .infra/bedrock.tf

resource "aws_bedrock_inference_profile" "main" {
  name        = "platform-claude-sonnet-45-${var.environment}"
  description = "Shared platform inference profile (${var.environment})"
  model_source {
    copy_from = "arn:aws:bedrock:eu-central-1::inference-profile/eu.anthropic.claude-sonnet-4-5-20250929-v1:0"
  }
}

AI Agent Rules

When working in a repository that invokes AWS Bedrock, agents MUST:

  • Verify that an aws_bedrock_inference_profile resource exists in .infra/ before writing any code that calls Bedrock.
  • If no profile exists, create one following this pattern — including the full ADR-0006 tag set — before writing application code that references it.
  • Never write application code that passes a raw model ID (e.g. anthropic.claude-3-5-sonnet-20241022-v2:0) or a system-defined cross-region profile ID as modelId.
  • Always read the profile ARN from an environment variable (BEDROCK_INFERENCE_PROFILE_ARN) or SSM parameter — never hardcode it.
  • If multiple models are needed, create one profile per model and expose each ARN separately.
  • Request human approval before applying Terraform changes that create or modify inference profiles in prod or pre.

Inspecting profiles via CLI

Application inference profiles are not visible in the Bedrock console. Use the CLI to inspect them:

# List all application inference profiles
aws bedrock list-inference-profiles \
  --type-equals APPLICATION \
  --region eu-central-1

# Get details for a specific profile
aws bedrock get-inference-profile \
  --inference-profile-identifier <arn-or-id> \
  --region eu-central-1

# View tags on a profile
aws bedrock list-tags-for-resource \
  --resource-arn <profile-arn> \
  --region eu-central-1

References