# Data Models

Complete reference for audit-utils data models and their structure.

## Overview

audit-utils uses Pydantic v2 models with strict type validation. All models are generated from JSON schemas and extended with business methods.

Model Architecture:

- Base classes in `models/_generated/` (auto-generated from schemas)
- Wrapper classes in `models/` (manual, with business methods)
- Safe regeneration: `task models:generate`
## CustomerInteraction

Represents a customer service interaction (chat, call, email, etc.).

### Basic Structure

```python
from audit_utils.models import CustomerInteraction

interaction = CustomerInteraction.model_validate({
    "schema_type": "customer-interaction",
    "schema_version": "v1.0-alpha3",
    "interaction_id": "12345",
    "source": {...},
    "participants": [...],
    "events": [...],
    "metadata": {...}
})
```
### Fields

#### Core Fields

| Field | Type | Required | Description |
|---|---|---|---|
| `schema_type` | `str` | ✅ | Always `"customer-interaction"` |
| `schema_version` | `str` | ✅ | Schema version (e.g., `"v1.0-alpha3"`) |
| `interaction_id` | `str` | ✅ | Unique interaction identifier |
| `source` | `Source` | ✅ | Interaction source/channel info |
| `participants` | `list[Participant]` | ✅ | List of participants (min 1) |
| `events` | `list[Event]` | ✅ | List of interaction events (min 1) |
| `summary` | `Summary` | ❌ | Optional interaction summary |
| `metadata` | `Metadata` | ❌ | Optional enrichment metadata |
### Source

Information about interaction origin.

```
{
  "type": "messaging",          # messaging | voice | video | email
  "channel": "whatsapp",        # whatsapp | teams | phone | zoom | email
  "platform": "zendesk",        # zendesk | salesforce | twilio | custom
  "url": "https://...",         # Optional: Source URL
  "s3_uri": "s3://bucket/path"  # Optional: S3 location
}
```
### Participant

Person or agent in the interaction.

```
{
  "id": "agent-1",
  "name": "John Doe",            # Optional
  "role": "agent",               # agent | customer
  "email": "john@company.com",   # Optional
  "phone": "+1234567890",        # Optional
  "metadata": {},                # Optional: Custom data
  "analysis": {...},             # Optional: LLM analysis (from enrichment)
  "metrics": {...}               # Optional: Computed metrics (from enrichment)
}
```
ParticipantAnalysis (from enrichment):

```
{
  "traits": ["polite", "helpful", "patient"],
  "communication_style": "Professional and empathetic...",
  "formality_level": "formal"    # formal | informal | mixed
}
```
Metrics (from enrichment):

```
{
  "word_count": 150,
  "message_count": 5,
  "avg_message_length": 30.0,
  "avg_response_time_seconds": 45.5,
  "question_ratio": 0.4
}
```
### Event

Single message or action in the interaction.

```
{
  "event_id": "evt-1",
  "timestamp": "2025-12-10T12:00:00Z",  # ISO 8601 format
  "participant_id": "agent-1",
  "content": {...},                     # Content object
  "sequence_number": 1,                 # Optional: Event order
  "metadata": {},                       # Optional: Custom data
  "analysis": {...},                    # Optional: LLM analysis (from enrichment)
  "metrics": [...]                      # Optional: Computed metrics (from enrichment)
}
```
Content:

```
{
  "type": "text",                       # text | audio | video | image | file
  "text": "Hello, how can I help?",     # For text
  "transcript": "...",                  # For audio/video
  "url": "https://...",                 # For media/files
  "duration_seconds": 120.5,            # For audio/video
  "mime_type": "audio/wav"              # For files
}
```
EventAnalysis (from enrichment):

```
{
  "sentiment": "positive",              # positive | neutral | negative
  "emotion": "happy",                   # happy | sad | angry | neutral | frustrated
  "tone": "polite",                     # polite | assertive | aggressive | neutral
  "topics": ["billing", "refund"],      # Detected topics
  "intent": "request_information",      # Intent classification
  "message_type": "question",           # question | response | statement
  "key_points": ["Customer wants refund", "Order #12345"]
}
```
Metric (event-level metrics from enrichment):

```
{
  "name": "word_count",
  "value": 25.0,
  "unit": "count"
}
```
### Metadata

Interaction-level metadata and statistics (from enrichment).

```
{
  "start_time": "2025-12-10T12:00:00Z",
  "end_time": "2025-12-10T12:15:00Z",
  "duration_seconds": 900.0,
  "statistics": {...}
}
```
Statistics (from enrichment):

```
{
  "total_events": 10,
  "total_participants": 2,
  "total_words": 250,
  "sentiment_distribution": {
    "positive": 0.7,
    "neutral": 0.2,
    "negative": 0.1
  },
  "topic_distribution": {
    "billing": 5,
    "refund": 3,
    "support": 2
  }
}
```
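A sentiment distribution like the one above can be computed from per-event analysis. A minimal sketch (illustrative only, not the library's enrichment code):

```python
from collections import Counter

# Sketch: aggregate event-level sentiment into a distribution of fractions.
# Hypothetical helper -- not part of audit_utils.

def sentiment_distribution(events: list[dict]) -> dict[str, float]:
    sentiments = [e["analysis"]["sentiment"] for e in events if e.get("analysis")]
    total = len(sentiments)
    if not total:
        return {}
    counts = Counter(sentiments)
    return {s: counts.get(s, 0) / total for s in ("positive", "neutral", "negative")}

events = [{"analysis": {"sentiment": s}}
          for s in ["positive", "positive", "neutral", "negative", "positive"]]
print(sentiment_distribution(events))
# {'positive': 0.6, 'neutral': 0.2, 'negative': 0.2}
```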
### Validation Rules

- At least 1 participant required
- At least 1 event required
- All `participant_id` values in events must exist in `participants`
- All timestamps must be valid ISO 8601 format
- Enum values must match defined options
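The cross-reference rule can be checked on a raw dict before model validation. A minimal sketch, using a hypothetical helper that is not part of audit_utils:

```python
# Sketch: verify every event's participant_id exists in participants.
# Hypothetical helper -- the library enforces this during model validation.

def check_participant_refs(data: dict) -> list[str]:
    known = {p["id"] for p in data.get("participants", [])}
    return [
        f"event {e['event_id']} references unknown participant {e['participant_id']!r}"
        for e in data.get("events", [])
        if e["participant_id"] not in known
    ]

data = {
    "participants": [{"id": "agent-1"}],
    "events": [
        {"event_id": "evt-1", "participant_id": "agent-1"},
        {"event_id": "evt-2", "participant_id": "customer-9"},
    ],
}
print(check_participant_refs(data))
```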
### Pydantic Operations

```python
from pydantic import ValidationError

# Load from dict
interaction = CustomerInteraction.model_validate(data)

# Load from JSON
with open("interaction.json") as f:
    interaction = CustomerInteraction.model_validate_json(f.read())

# Export to dict
data = interaction.model_dump()

# Export to JSON
json_str = interaction.model_dump_json(indent=2)

# Copy (deep)
copy = interaction.model_copy(deep=True)

# Validation
try:
    interaction = CustomerInteraction.model_validate(bad_data)
except ValidationError as e:
    print(e)
```
## AuditCriteria

Evaluation criteria definition with groups and indicators.

### Basic Structure

```python
from audit_utils.models import AuditCriteria

criteria = AuditCriteria.model_validate({
    "schema_type": "audit-criteria",
    "schema_version": "v1.0-alpha3",
    "id": "quality_2025",
    "name": "Service Quality Evaluation",
    "description": "Criteria for evaluating service quality",
    "source_types": ["customer-interaction"],
    "criteria": [...],
    "groups": [...],
    "metadata": {}
})
```
### Fields

#### Core Fields

| Field | Type | Required | Description |
|---|---|---|---|
| `schema_type` | `str` | ✅ | Always `"audit-criteria"` |
| `schema_version` | `str` | ✅ | Schema version (e.g., `"v1.0-alpha3"`) |
| `id` | `str` | ✅ | Unique criteria set identifier |
| `name` | `str` | ✅ | Human-readable name |
| `description` | `str` | ❌ | Optional description |
| `source_types` | `list[str]` | ✅ | Supported source types (e.g., `["customer-interaction"]`) |
| `criteria` | `list[Criterion]` | ✅ | List of criteria (min 1) |
| `groups` | `list[CriterionGroup]` | ❌ | Optional logical grouping |
| `metadata` | `dict` | ❌ | Optional custom metadata |
### Criterion

Individual evaluation criterion.

```
{
  "id": "C1",
  "name": "Response Time",
  "description": "Agent responds within acceptable timeframe",
  "weight": 0.3,        # 0.0-1.0, how important this criterion is
  "enabled": True,      # Can disable without removing
  "indicators": [...]   # List of indicators (min 1)
}
```
### Indicator Types

Indicators are the measurable components of a criterion. Three types exist:

1. Boolean Indicator - Yes/No evaluation:

```
{
  "id": "C1-I1",
  "type": "boolean",
  "description": "Response within 2 minutes",
  "polarity": 1,        # 1 (positive) or -1 (negative)
  "enabled": True
}
```

2. Metrics Indicator - Numeric value with ranges:

```
{
  "id": "C2-I1",
  "type": "metrics",
  "description": "Average response time",
  "unit": "seconds",
  "polarity": -1,       # -1 means lower is better
  "ranges": [
    {"min": 0, "max": 60, "score": 1.0, "label": "excellent"},
    {"min": 60, "max": 120, "score": 0.7, "label": "good"},
    {"min": 120, "max": 180, "score": 0.4, "label": "fair"},
    {"min": 180, "score": 0.0, "label": "poor"}
  ],
  "enabled": True
}
```

3. Enum Indicator - Categorical selection:

```
{
  "id": "C3-I1",
  "type": "enum",
  "description": "Tone of interaction",
  "options": [
    {"value": "professional", "score": 1.0},
    {"value": "neutral", "score": 0.5},
    {"value": "unprofessional", "score": 0.0}
  ],
  "enabled": True
}
```
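One plausible way these three indicator types map a raw value to a 0.0-1.0 score is sketched below; this is an illustration, and the library's actual scoring logic may differ:

```python
# Sketch: score a value against a boolean, metrics, or enum indicator.
# Hypothetical helper -- not part of audit_utils.

def score_indicator(indicator: dict, value) -> float:
    kind = indicator["type"]
    if kind == "boolean":
        # polarity 1: True is good; polarity -1: True is bad
        hit = bool(value) if indicator["polarity"] == 1 else not value
        return 1.0 if hit else 0.0
    if kind == "metrics":
        # Half-open ranges: min inclusive, max exclusive; missing bounds are open
        for r in indicator["ranges"]:
            if r.get("min", float("-inf")) <= value < r.get("max", float("inf")):
                return r["score"]
        return 0.0
    if kind == "enum":
        return next(o["score"] for o in indicator["options"] if o["value"] == value)
    raise ValueError(f"unknown indicator type: {kind}")

response_time = {
    "type": "metrics", "polarity": -1,
    "ranges": [
        {"min": 0, "max": 60, "score": 1.0},
        {"min": 60, "max": 120, "score": 0.7},
        {"min": 120, "max": 180, "score": 0.4},
        {"min": 180, "score": 0.0},
    ],
}
print(score_indicator(response_time, 90))                         # 0.7
print(score_indicator({"type": "boolean", "polarity": 1}, True))  # 1.0
```

Treating range bounds as half-open intervals (min inclusive, max exclusive) is one assumption that avoids ambiguity at shared boundaries like 60 and 120.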
### CriterionGroup

Logical grouping for criteria (used with `strategy="grouped"`).

```
{
  "id": "G1",
  "name": "Timeliness",
  "description": "Time-related criteria",
  "criteria_ids": ["C1", "C2"]
}
```
### Validation Rules

- At least 1 criterion required
- Each criterion must have at least 1 indicator
- All `criteria_ids` in groups must exist in `criteria`
- Weights should sum to 1.0 (warning if not)
- Score ranges must not overlap (for metrics indicators)
- All indicator IDs must be unique
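Two of these rules, the weight sum and non-overlapping ranges, can be linted up front on raw data. A sketch with a hypothetical `lint_criteria` helper, not part of audit_utils:

```python
import math

# Sketch: warn when criterion weights do not sum to 1.0 and when a metrics
# indicator's score ranges overlap. Hypothetical helper.

def lint_criteria(criteria: list[dict]) -> list[str]:
    warnings = []
    total = sum(c.get("weight", 0.0) for c in criteria)
    if not math.isclose(total, 1.0, abs_tol=1e-9):
        warnings.append(f"criterion weights sum to {total}, expected 1.0")
    for c in criteria:
        for ind in c.get("indicators", []):
            if ind.get("type") != "metrics":
                continue
            # Sort ranges by lower bound, then check each adjacent pair
            ranges = sorted(ind["ranges"], key=lambda r: r.get("min", float("-inf")))
            for lo, hi in zip(ranges, ranges[1:]):
                if lo.get("max", float("inf")) > hi.get("min", float("-inf")):
                    warnings.append(f"{ind['id']}: overlapping score ranges")
    return warnings

print(lint_criteria([{"id": "C1", "weight": 0.3, "indicators": []},
                     {"id": "C2", "weight": 0.3, "indicators": []}]))
```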
## AuditResult

Evaluation results from LLM processing.

### Basic Structure

```python
from audit_utils.models import AuditResult

result = AuditResult.model_validate({
    "schema_type": "audit-result",
    "schema_version": "v1.0-alpha3",
    "id": "audit_12345_2025-12-10",
    "timestamp": "2025-12-10T12:00:00Z",
    "customer_interaction": {...},
    "audit_criteria": {...},
    "criteria_results": [...],
    "metadata": {}
})
```
### Fields

#### Core Fields

| Field | Type | Required | Description |
|---|---|---|---|
| `schema_type` | `str` | ✅ | Always `"audit-result"` |
| `schema_version` | `str` | ✅ | Schema version (e.g., `"v1.0-alpha3"`) |
| `id` | `str` | ✅ | Unique result identifier |
| `timestamp` | `datetime` | ✅ | Evaluation timestamp |
| `customer_interaction` | `InteractionRef` | ✅ | Reference to interaction |
| `audit_criteria` | `CriteriaRef` | ✅ | Reference to criteria |
| `criteria_results` | `list[CriterionResult]` | ✅ | Evaluation results (min 1) |
| `metadata` | `dict` | ❌ | Execution metadata |
### InteractionRef / CriteriaRef

Reference to source data.

```
{
  "customer_interaction_id": "12345",  # or "audit_criteria_id"
  "s3_uri": "s3://bucket/path",        # Optional
  "version": "v1.0"                    # Optional
}
```
### CriterionResult

Result for a single criterion.

```
{
  "id": "C1",
  "name": "Response Time",
  "score": 0.85,              # 0.0-1.0
  "indicator_results": [...]  # Results for each indicator
}
```
### IndicatorResult

Result for a single indicator.

```
{
  "id": "C1-I1",
  "value": True,       # bool | float | str (depending on type)
  "score": 1.0,        # 0.0-1.0
  "evidence": [        # Optional: Supporting evidence
    "Agent responded in 1:30 minutes",
    "First message timestamp: 12:00:00"
  ]
}
```
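How indicator scores roll up into the criterion-level `score` field is not specified here; one plausible aggregation (a plain average per criterion, then a weighted sum across criteria) is sketched below as an assumption, not the library's actual method:

```python
# Sketch: aggregate indicator scores into criterion and overall scores.
# The averaging and weighting scheme is an assumption for illustration.

def criterion_score(indicator_results: list[dict]) -> float:
    scores = [r["score"] for r in indicator_results]
    return sum(scores) / len(scores) if scores else 0.0

def overall_score(criteria: list[dict], results: list[dict]) -> float:
    # Weight each criterion's score by its weight from the AuditCriteria
    weights = {c["id"]: c["weight"] for c in criteria}
    return sum(weights[r["id"]] * r["score"] for r in results)

results = [
    {"id": "C1", "score": criterion_score([{"score": 1.0}, {"score": 0.7}])},
    {"id": "C2", "score": 0.5},
]
print(overall_score([{"id": "C1", "weight": 0.6}, {"id": "C2", "weight": 0.4}], results))
```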
### Metadata

Execution information.

```
{
  "timestamp": "2025-12-10T16:04:08.066049+00:00",
  "llm_model": "gpt-4",
  "llm_input_tokens": 13852,
  "llm_output_tokens": 8990,
  "llm_processing_time": 156.35,
  "processing_strategy": "full",  # full | grouped | individual
  "reduced": False,               # True if combined from multiple results
  "original_result_count": 1      # Number of original results (if reduced)
}
```
## Task

Internal coordination model (not typically used directly).

### Structure

```python
from audit_utils.models import Task

task = Task(
    interaction=interaction.model_dump(),  # Full interaction dict
    criteria=criteria.model_dump(),        # Full criteria dict
    metadata={
        "processing_strategy": "full",
        "group_id": "G1",                  # Optional
        "group_name": "Timeliness"         # Optional
    }
)
```
### Fields

| Field | Type | Required | Description |
|---|---|---|---|
| `interaction` | `dict` | ✅ | CustomerInteraction as dict |
| `criteria` | `dict` | ✅ | AuditCriteria as dict |
| `metadata` | `dict` | ❌ | Processing metadata |

Usage: Tasks are created internally by `map_tasks()` and consumed by `evaluate_task()`, coordinating between the map and evaluation phases.
## Schema Versioning

All models include schema versioning for compatibility tracking:

- `v1.0-alpha3` - All schemas (CustomerInteraction, AuditCriteria, AuditResult)

Migration: When schemas change, version numbers increment. The library validates schema versions at load time.
## Type Safety

All models use Pydantic v2 for runtime validation:

```python
from pydantic import ValidationError

try:
    interaction = CustomerInteraction.model_validate(data)
except ValidationError as e:
    # Handle validation errors
    for error in e.errors():
        print(f"Field: {error['loc']}")
        print(f"Error: {error['msg']}")
```
## JSON Schema

Source JSON schemas are in the `schemas/` directory:

- `customer-interaction-v1.0-alpha3.json`
- `audit-criteria-v1.0-alpha3.json`
- `audit-result-v1.0-alpha3.json`

Regenerate models: `task models:generate`