Schema Design Principles

Comprehensive design principles for creating high-quality, maintainable JSON schemas in the Ontopix ecosystem.

Core Philosophy

Ontopix schemas follow three fundamental principles:

  1. Strictness: Explicit over implicit, constrained over open
  2. Self-Description: Schemas document themselves completely
  3. Immutability: Published schemas never change; version instead

These principles ensure data integrity, clear contracts, and smooth evolution across all consuming services.

1. Strictness and Explicitness

No Implicit Properties

Rule: Set additionalProperties: false at ALL object levels.

Why: Prevent unexpected fields, catch typos, enforce data contracts.

Example:

{
  "type": "object",
  "properties": {
    "user": {
      "type": "object",
      "properties": {
        "id": {"type": "string"},
        "name": {"type": "string"}
      },
      "required": ["id", "name"],
      "additionalProperties": false  // ← Required at nested level too
    }
  },
  "required": ["user"],
  "additionalProperties": false  // ← Required at root level
}

Violation Detection:

task test:structure  # Checks all objects have additionalProperties: false
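The kind of check this rule implies can be sketched in a few lines of Python. This is an illustrative sketch only, not the implementation behind `task test:structure`; the function name `find_open_objects` and the path notation are assumptions made for the example.

```python
def find_open_objects(schema, path="#"):
    """Return paths of object schemas that lack additionalProperties: false."""
    problems = []
    if not isinstance(schema, dict):
        return problems
    # Treat a schema as an object schema if it says so or declares properties.
    is_object = schema.get("type") == "object" or "properties" in schema
    if is_object and schema.get("additionalProperties") is not False:
        problems.append(path)
    # Keywords whose values map names to subschemas.
    for keyword in ("properties", "$defs", "definitions"):
        for name, sub in schema.get(keyword, {}).items():
            problems += find_open_objects(sub, f"{path}/{keyword}/{name}")
    # Keywords whose values are lists of subschemas.
    for keyword in ("oneOf", "anyOf", "allOf"):
        for index, sub in enumerate(schema.get(keyword, [])):
            problems += find_open_objects(sub, f"{path}/{keyword}/{index}")
    # Only the single-schema form of "items" is handled in this sketch.
    if isinstance(schema.get("items"), dict):
        problems += find_open_objects(schema["items"], f"{path}/items")
    return problems


schema = {
    "type": "object",
    "properties": {
        "user": {
            "type": "object",
            "properties": {"id": {"type": "string"}},
            "required": ["id"],
        }
    },
    "required": ["user"],
    "additionalProperties": False,
}
print(find_open_objects(schema))  # ['#/properties/user']
```

The root object passes, while the nested `user` object is reported because it omits the keyword.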

Explicit Types

Rule: Every field must have an explicit type declaration.

Why: Prevent ambiguity, enable type-safe code generation.

Good:

{
  "age": {
    "type": "integer",
    "minimum": 0,
    "maximum": 150
  }
}

Bad:

{
  "age": {
    "minimum": 0  // ❌ Missing type declaration
  }
}

Required vs Optional

Rule: Clearly define which fields are required.

Why: Explicit contracts prevent confusion and errors.

Good:

{
  "type": "object",
  "properties": {
    "id": {"type": "string"},
    "name": {"type": "string"},
    "description": {"type": "string"}
  },
  "required": ["id", "name"]  // description is optional
}

Bad:

{
  "type": "object",
  "properties": {
    "id": {"type": "string"},
    "name": {"type": "string"}
  }
  // ❌ Missing required field declaration
}

Enum Constraints

Rule: Use enum for fields with limited valid values.

Why: Prevent invalid data, document allowed values.

Example:

{
  "status": {
    "type": "string",
    "enum": ["pending", "active", "completed", "failed"],
    "description": "Current status of the process"
  }
}

Avoid Ambiguity

Rule: Prefer explicit structures over oneOf/anyOf when possible.

Why: Simpler validation, clearer contracts, better error messages.

Good (explicit):

{
  "source_type": {
    "type": "string",
    "enum": ["file", "api", "database"]
  },
  "file_path": {
    "type": "string",
    "description": "Required when source_type is 'file'"
  }
}

Acceptable (when truly needed):

{
  "contact": {
    "oneOf": [
      {
        "type": "object",
        "properties": {
          "email": {"type": "string", "format": "email"}
        },
        "required": ["email"]
      },
      {
        "type": "object",
        "properties": {
          "phone": {"type": "string", "pattern": "^\\+?[0-9]+$"}
        },
        "required": ["phone"]
      }
    ]
  }
}

2. Self-Description

Schema Identification

Rule: Every schema must include schema_type and schema_version.

Why: Self-identifying data, version tracking, validation routing.

Required fields:

{
  "schema_type": {
    "type": "string",
    "const": "customer-interaction",
    "description": "Schema type identifier"
  },
  "schema_version": {
    "type": "string",
    "pattern": "^v\\d+\\.\\d+(-(alpha|beta|rc)\\d+)?$",
    "description": "Schema version in vMAJOR.MINOR[-stage] format"
  }
}
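To see which version strings the `schema_version` pattern accepts, it can be exercised directly. The sketch below uses Python's `re` module as a stand-in for a JSON Schema validator; the helper name `is_valid_version` and the sample strings are assumptions for illustration.

```python
import re

# Pattern copied from the schema_version property, with JSON escaping removed.
SCHEMA_VERSION = re.compile(r"^v\d+\.\d+(-(alpha|beta|rc)\d+)?$")


def is_valid_version(value: str) -> bool:
    """Check a version string against the vMAJOR.MINOR[-stage] pattern."""
    return SCHEMA_VERSION.search(value) is not None


for candidate in ["v1.0", "v1.0-beta1", "v2.13-rc2", "1.0", "v1.0.0", "v1.0-beta"]:
    print(candidate, is_valid_version(candidate))
# v1.0 True
# v1.0-beta1 True
# v2.13-rc2 True
# 1.0 False        (missing the leading "v")
# v1.0.0 False     (patch component is not part of the format)
# v1.0-beta False  (stage needs a number)
```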

Enforcement:

task test:structure  # Validates schema_type and schema_version presence

Metadata Fields

Rule: Include $schema, $id, title, and description at root level.

Why: JSON Schema compliance, documentation, tooling support.

Example:

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "$id": "https://schemas.platform.ontopix.ai/customer-interaction/v1.0-beta1.json",
  "title": "Customer Interaction Schema",
  "description": "Unified schema for multi-channel customer interactions with detailed metrics, semantic analysis, and event-based structure"
}
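A root-level metadata check could look roughly like the following. It is a sketch under stated assumptions: the helper names are hypothetical, and the idea that `$id` ends in `<schema_type>/<schema_version>.json` is inferred from the example URL above rather than from a documented rule.

```python
REQUIRED_ROOT_KEYS = ("$schema", "$id", "title", "description")


def missing_root_metadata(schema: dict) -> list:
    """List the required root-level keys that are absent or empty."""
    return [key for key in REQUIRED_ROOT_KEYS if not schema.get(key)]


def id_matches(schema_id: str, schema_type: str, schema_version: str) -> bool:
    """Assumed convention: $id ends with '/<schema_type>/<schema_version>.json'."""
    return schema_id.endswith(f"/{schema_type}/{schema_version}.json")


root = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "$id": "https://schemas.platform.ontopix.ai/customer-interaction/v1.0-beta1.json",
    "title": "Customer Interaction Schema",
    "description": "Unified schema for multi-channel customer interactions",
}
print(missing_root_metadata(root))                                      # []
print(id_matches(root["$id"], "customer-interaction", "v1.0-beta1"))    # True
```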

Field Documentation

Rule: Every property must have a clear description.

Why: Self-documenting schemas, reduced need for external docs.

Good:

{
  "response_time": {
    "type": "number",
    "minimum": 0,
    "description": "Time in seconds from customer message to agent response"
  }
}

Bad:

{
  "response_time": {
    "type": "number"  // ❌ Missing description
  }
}
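Both the explicit-type rule and the description rule lend themselves to a mechanical check. The walker below is a minimal sketch, not an existing Ontopix tool; `undocumented_properties` is a hypothetical name, and the choice to exempt `$ref` properties from the type check is an assumption (properties defined only through `oneOf`/`anyOf` would be reported).

```python
def undocumented_properties(schema: dict, path: str = "#") -> list:
    """Return (path, missing_keyword) pairs for properties lacking type or description."""
    problems = []
    for name, prop in schema.get("properties", {}).items():
        prop_path = f"{path}/properties/{name}"
        # Assumption: a $ref property takes its type from the referenced definition.
        if "type" not in prop and "$ref" not in prop:
            problems.append((prop_path, "type"))
        if "description" not in prop:
            problems.append((prop_path, "description"))
        # Descend into nested objects and array item schemas.
        problems += undocumented_properties(prop, prop_path)
        if isinstance(prop.get("items"), dict):
            problems += undocumented_properties(prop["items"], f"{prop_path}/items")
    return problems


schema = {
    "type": "object",
    "properties": {
        "response_time": {"type": "number"},
        "age": {"minimum": 0, "description": "Age in years"},
    },
}
print(undocumented_properties(schema))
# [('#/properties/response_time', 'description'), ('#/properties/age', 'type')]
```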

3. Evolution and Compatibility

Immutability

Rule: Published schemas are immutable; create new versions for changes.

Why: Stability for consumers, clear version history, safe deployments.

Workflow:

# ❌ NEVER do this
edit audit-criteria/v1.0-beta1.json

# ✅ ALWAYS do this
cp audit-criteria/v1.0-beta1.json audit-criteria/v1.0.json
# Edit v1.0.json with your changes
# Update schema_version and $id

Backward Compatibility by Default

Rule: Prefer MINOR version bumps (backward-compatible) over MAJOR (breaking).

Why: Minimize disruption, allow gradual migration.

MINOR changes (backward-compatible):

  • Adding optional fields
  • Adding enum values
  • Relaxing constraints (e.g., increasing maxLength)
  • Adding definitions to $defs

MAJOR changes (breaking):

  • Removing required fields
  • Changing field types
  • Removing enum values
  • Tightening constraints
  • Restructuring object hierarchy
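A rough way to apply these lists is to compare two schema versions and flag anything breaking. The sketch below covers only top-level properties and a subset of the rules (newly required fields, removed fields, changed types, removed enum values); `classify_change` is a hypothetical helper, not part of any Ontopix tooling.

```python
def classify_change(old: dict, new: dict) -> str:
    """Return "MAJOR" if a breaking change is detected, otherwise "MINOR"."""
    # A newly required field tightens the contract for existing documents.
    if set(new.get("required", [])) - set(old.get("required", [])):
        return "MAJOR"
    new_props = new.get("properties", {})
    for name, old_prop in old.get("properties", {}).items():
        new_prop = new_props.get(name)
        if new_prop is None:  # field removed
            return "MAJOR"
        if old_prop.get("type") != new_prop.get("type"):  # type changed
            return "MAJOR"
        new_enum = new_prop.get("enum")
        if new_enum is not None and set(old_prop.get("enum", [])) - set(new_enum):
            return "MAJOR"  # enum value removed
    return "MINOR"


v1 = {
    "properties": {"status": {"type": "string", "enum": ["pending", "active"]}},
    "required": ["status"],
}
v1_plus_optional = {
    "properties": {
        "status": {"type": "string", "enum": ["pending", "active", "failed"]},
        "note": {"type": "string"},
    },
    "required": ["status"],
}
v2_fewer_values = {
    "properties": {"status": {"type": "string", "enum": ["pending"]}},
    "required": ["status"],
}
print(classify_change(v1, v1_plus_optional))  # MINOR
print(classify_change(v1, v2_fewer_values))   # MAJOR
```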

Deprecation Path

Rule: When introducing breaking changes, document migration paths.

Why: Help consumers migrate, maintain service continuity.

Example (in schema .md file):

## Migration from v1.0 to v2.0

### Breaking Changes
1. **Field Rename**: `customer_interaction_id` → `id`
   - Update all references to use new field name

2. **Type Change**: `metadata` from object → flexible bag
   - Wrap existing metadata in appropriate structure

### Migration Script
See `scripts/migrate_v1_to_v2.py` for automated conversion.

Version Coexistence

Rule: Multiple versions can coexist; don't remove old versions.

Why: Support gradual migration, avoid forced upgrades.

Directory structure:

audit-criteria/
├── v1.0-alpha1.json  # Still available
├── v1.0-beta1.json   # Still available
└── v1.0.json         # Latest stable
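When several versions sit side by side, consumers often need to pick the latest stable one. A possible sketch follows; the ordering alpha < beta < rc < stable is an assumed convention, and the helper names are hypothetical.

```python
import re

VERSION_FILE = re.compile(r"^v(\d+)\.(\d+)(?:-(alpha|beta|rc)(\d+))?\.json$")
# Assumed ordering: pre-release stages sort before the unsuffixed stable release.
STAGE_ORDER = {"alpha": 0, "beta": 1, "rc": 2, None: 3}


def sort_key(filename: str):
    """Turn 'v1.0-beta1.json' into a tuple that sorts in release order."""
    match = VERSION_FILE.match(filename)
    if match is None:
        raise ValueError(f"not a schema version file: {filename}")
    major, minor, stage, stage_number = match.groups()
    return (int(major), int(minor), STAGE_ORDER[stage], int(stage_number or 0))


def latest_stable(filenames):
    """Return the highest version without a stage suffix, or None."""
    stable = []
    for name in filenames:
        match = VERSION_FILE.match(name)
        if match and match.group(3) is None:  # group 3 is the stage name
            stable.append(name)
    return max(stable, key=sort_key) if stable else None


files = ["v1.0.json", "v1.0-alpha1.json", "v1.0-beta1.json"]
print(sorted(files, key=sort_key))
# ['v1.0-alpha1.json', 'v1.0-beta1.json', 'v1.0.json']
print(latest_stable(files))  # v1.0.json
```

Comparing numeric tuples rather than raw strings keeps `v1.10` ahead of `v1.9`.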

4. Validation Rigor

Comprehensive Constraints

Rule: Use validation keywords where applicable.

Available constraints:

  • Numbers: minimum, maximum, multipleOf
  • Strings: minLength, maxLength, pattern
  • Arrays: minItems, maxItems, uniqueItems
  • Objects: minProperties, maxProperties

Example:

{
  "email": {
    "type": "string",
    "format": "email",
    "minLength": 5,
    "maxLength": 254,
    "description": "Valid email address per RFC 5322"
  },
  "age": {
    "type": "integer",
    "minimum": 0,
    "maximum": 150,
    "description": "Age in years"
  },
  "tags": {
    "type": "array",
    "items": {"type": "string"},
    "minItems": 1,
    "maxItems": 10,
    "uniqueItems": true,
    "description": "Between 1 and 10 unique tags"
  }
}

Format Validation

Rule: Use pattern for structured strings.

Why: Prevent malformed data (IDs, dates, codes).

Examples:

{
  "id": {
    "type": "string",
    "pattern": "^int-[a-zA-Z0-9]+$",
    "description": "Interaction ID in format 'int-{alphanumeric}'"
  },
  "date": {
    "type": "string",
    "pattern": "^\\d{4}-\\d{2}-\\d{2}$",
    "description": "Date in YYYY-MM-DD format"
  },
  "uuid": {
    "type": "string",
    "pattern": "^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$",
    "description": "Lowercase UUID in canonical 8-4-4-4-12 hexadecimal form (any version)"
  }
}
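A pattern constrains shape, not meaning. The date pattern above, for instance, accepts `2026-13-45`. The sketch below shows one way an application might add a semantic check on top; `is_real_date` is a hypothetical helper. Draft-07 also defines a `date` format that validators may enforce when format checking is enabled.

```python
import re
from datetime import date

# Same pattern as the "date" property above, with JSON escaping removed.
DATE_PATTERN = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def is_real_date(value: str) -> bool:
    """True only if value matches the pattern and names an actual calendar day."""
    if DATE_PATTERN.search(value) is None:
        return False
    try:
        date.fromisoformat(value)
    except ValueError:
        return False
    return True


print(DATE_PATTERN.search("2026-13-45") is not None)  # True: shape is fine
print(is_real_date("2026-13-45"))                     # False: no month 13
print(is_real_date("2026-01-28"))                     # True
```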

Range Constraints

Rule: Specify valid ranges for numeric fields.

Why: Prevent nonsensical values, document expectations.

Example:

{
  "score": {
    "type": "number",
    "minimum": 0.0,
    "maximum": 1.0,
    "description": "Normalized score between 0.0 and 1.0"
  },
  "percentage": {
    "type": "integer",
    "minimum": 0,
    "maximum": 100,
    "description": "Percentage value from 0 to 100"
  }
}

Array Constraints

Rule: Define size and uniqueness constraints for arrays.

Why: Prevent empty arrays, huge lists, duplicate entries.

Example:

{
  "participants": {
    "type": "array",
    "items": {"$ref": "#/$defs/Participant"},
    "minItems": 1,
    "description": "At least one participant required"
  },
  "tags": {
    "type": "array",
    "items": {"type": "string"},
    "maxItems": 20,
    "uniqueItems": true,
    "description": "Up to 20 unique tags"
  }
}

5. Reusability and Maintainability

DRY Principle

Rule: Use $ref and $defs for repeated structures.

Why: Reduce duplication, maintain consistency, simplify updates.

Example:

{
  "$defs": {
    "Participant": {
      "type": "object",
      "properties": {
        "id": {"type": "string"},
        "role": {
          "type": "string",
          "enum": ["customer", "agent", "supervisor"]
        }
      },
      "required": ["id", "role"],
      "additionalProperties": false
    }
  },
  "properties": {
    "participants": {
      "type": "array",
      "items": {"$ref": "#/$defs/Participant"}
    }
  }
}
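To make the `$ref` mechanics concrete, here is a minimal sketch of how a same-document reference can be followed. It handles only local references through nested objects; `resolve_ref` is a hypothetical helper, and real validators do considerably more (remote references, array indices, base URIs).

```python
def resolve_ref(root: dict, ref: str):
    """Resolve a same-document reference such as '#/$defs/Participant'."""
    if not ref.startswith("#/"):
        raise ValueError(f"only local references are supported: {ref}")
    node = root
    for token in ref[2:].split("/"):
        # JSON Pointer escapes (RFC 6901): "~1" means "/", "~0" means "~".
        token = token.replace("~1", "/").replace("~0", "~")
        node = node[token]
    return node


schema = {
    "$defs": {
        "Participant": {
            "type": "object",
            "properties": {
                "id": {"type": "string"},
                "role": {"type": "string", "enum": ["customer", "agent", "supervisor"]},
            },
            "required": ["id", "role"],
            "additionalProperties": False,
        }
    },
    "properties": {
        "participants": {"type": "array", "items": {"$ref": "#/$defs/Participant"}}
    },
}
item_ref = schema["properties"]["participants"]["items"]["$ref"]
print(resolve_ref(schema, item_ref)["required"])  # ['id', 'role']
```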

Semantic Naming

Rule: Use descriptive, domain-specific names.

Why: Clear intent, easier understanding, better documentation.

Good:

{
  "customer_interaction_id": {"type": "string"},
  "evaluation_timestamp": {"type": "string"},
  "compliance_score": {"type": "number"}
}

Bad:

{
  "id": {"type": "string"},  // Which ID?
  "timestamp": {"type": "string"},  // Timestamp of what?
  "score": {"type": "number"}  // Score for what?
}

Consistent Naming

Rule: Follow a consistent naming convention.

Why: Predictability, reduced cognitive load.

Standard: snake_case for JSON field names (Ontopix convention)

Example:

{
  "customer_interaction_id": "...",
  "evaluation_timestamp": "...",
  "participant_count": 5
}
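The naming convention is easy to verify automatically. The sketch below collects property names that are not snake_case; the regular expression is one reasonable reading of "snake_case" (lowercase letters and digits, single underscores), and `non_snake_case_names` is a hypothetical helper.

```python
import re

# Assumed definition of snake_case: lowercase start, single underscores between parts.
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")


def non_snake_case_names(schema: dict) -> list:
    """Collect property names, at any nesting depth, that are not snake_case."""
    bad = []
    for name, prop in schema.get("properties", {}).items():
        if SNAKE_CASE.match(name) is None:
            bad.append(name)
        if isinstance(prop, dict):
            bad += non_snake_case_names(prop)
    return bad


schema = {
    "properties": {
        "customer_interaction_id": {"type": "string"},
        "participantCount": {"type": "integer"},
        "window": {
            "type": "object",
            "properties": {"Evaluation-Time": {"type": "string"}},
        },
    }
}
print(non_snake_case_names(schema))  # ['participantCount', 'Evaluation-Time']
```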

Modular Definitions

Rule: Define reusable types in $defs.

Why: Centralized definitions, easier updates, consistency.

Example:

{
  "$defs": {
    "TimeRange": {
      "type": "object",
      "properties": {
        "start": {"type": "string", "format": "date-time"},
        "end": {"type": "string", "format": "date-time"}
      },
      "required": ["start"],
      "additionalProperties": false
    }
  },
  "properties": {
    "interaction_time": {"$ref": "#/$defs/TimeRange"},
    "evaluation_time": {"$ref": "#/$defs/TimeRange"}
  }
}

6. Documentation Completeness

Paired Documentation

Rule: Every .json schema has a corresponding .md file.

Why: Detailed explanations, examples, migration guides.

Structure:

customer-interaction/
├── v1.0-beta1.json   # Schema definition
└── v1.0-beta1.md     # Documentation

Documentation contents:

  • Overview and purpose
  • Field descriptions
  • Complete examples
  • Validation rules
  • Edge cases
  • Migration guides (for new versions)

Examples Included

Rule: Provide complete, realistic examples.

Why: Understanding, testing, reference implementations.

Example (in .md file):

## Complete Example

```json
{
  "schema_type": "customer-interaction",
  "schema_version": "v1.0-beta1",
  "id": "int-12345",
  "source": {
    "media": "text",
    "channel": "email",
    "synchronicity": "asynchronous"
  },
  "time": {
    "start": "2026-01-28T10:30:00Z"
  },
  "participants": [
    {"id": "customer-001", "role": "customer"},
    {"id": "agent-001", "role": "agent"}
  ],
  "events": [...]
}
```

Edge Cases Documented

Rule: Explain special cases and constraints.

Why: Prevent misuse, clarify intentions.

Example (in .md file):

## Edge Cases

### Empty Participant Lists
- **Not Allowed**: `participants` must have at least one item
- **Validation**: `minItems: 1` enforces this constraint

### Overlapping Time Ranges
- **Allowed**: Events can have overlapping time ranges
- **Use Case**: Concurrent speakers in multi-party calls

### Missing End Time
- **Allowed**: `time.end` is optional for ongoing interactions
- **Interpretation**: Interaction is still in progress

Usage Guidance

Rule: Include common use cases and patterns.

Why: Help developers integrate schemas correctly.

Example (in .md file):

## Common Use Cases

### Use Case 1: Email Interaction
```json
{
  "source": {
    "media": "text",
    "channel": "email",
    "synchronicity": "asynchronous"
  }
}
```

### Use Case 2: Live Phone Call
```json
{
  "source": {
    "media": "audio",
    "channel": "phone",
    "synchronicity": "synchronous"
  }
}
```

7. Standards Compliance

JSON Schema Draft-07

Rule: All schemas comply with Draft-07 specification.

Why: Tool compatibility, validation consistency.

Required declaration:

{
  "$schema": "http://json-schema.org/draft-07/schema#"
}

Validation:

task test:syntax  # Validates JSON Schema compliance

Industry Standards

Rule: Follow domain-specific standards.

Why: Interoperability, familiarity, correctness.

Examples:

  • ISO 8601: Date-time formats
  • RFC 5322: Email addresses
  • ISO 639: Language codes
  • ISO 3166: Country codes

Consistent Types

Rule: Use standard types consistently.

Why: Predictability, tooling support.

Standard types:

{
  "timestamp": {"type": "string", "format": "date-time"},
  "email": {"type": "string", "format": "email"},
  "uri": {"type": "string", "format": "uri"},
  "uuid": {"type": "string", "format": "uuid"}
}

8. Cross-Schema Consistency

Shared Vocabularies

Rule: Reuse enum values and field names across related schemas.

Why: Consistency, predictability, reduced learning curve.

Example:

// Used in multiple schemas
"severity": {
  "type": "string",
  "enum": ["critical", "major", "minor", "warning", "info"]
}

Consistent Patterns

Rule: Use similar structures for similar concepts.

Why: Familiarity, code reuse, reduced cognitive load.

Example (participant structure used consistently):

"participants": {
  "type": "array",
  "items": {
    "type": "object",
    "properties": {
      "id": {"type": "string"},
      "role": {"type": "string"}
    },
    "required": ["id", "role"],
    "additionalProperties": false
  }
}

Dependency Documentation

Rule: Clearly document relationships between schemas.

Why: Understanding data flows, validation chains.

Example (in .md file):

## Schema Dependencies

This schema references:
- `customer-interaction@v1.0-beta1` - Source data for evaluation
- `audit-criteria@v1.0-beta1` - Criteria definitions

Referenced by:
- Reporting dashboards
- Compliance systems

Aligned Constraints

Rule: When schemas reference each other, ensure constraint alignment.

Why: Prevent validation mismatches, ensure referential integrity.

Example:

// In AuditCriteria
"target": {
  "schema": {
    "type": "string",
    "pattern": "^[a-z-]+@v\\d+\\.\\d+(-(alpha|beta|rc)\\d+)?$"
  }
}

// In AuditResult (must match)
"evaluated_objects": {
  "items": {
    "schema": {
      "type": "string",
      "pattern": "^[a-z-]+@v\\d+\\.\\d+(-(alpha|beta|rc)\\d+)?$"
    }
  }
}
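Alignment between two schemas can be checked by comparing the constraint at both locations. The sketch below uses plain string equality, which is deliberately conservative: two equivalent patterns written differently would be flagged. The `lookup` helper and the dictionary layout mirror the fragments above and are assumptions for illustration.

```python
import re


def lookup(node, *keys):
    """Walk nested dictionaries along the given keys."""
    for key in keys:
        node = node[key]
    return node


REF_PATTERN = r"^[a-z-]+@v\d+\.\d+(-(alpha|beta|rc)\d+)?$"

audit_criteria = {"target": {"schema": {"type": "string", "pattern": REF_PATTERN}}}
audit_result = {
    "evaluated_objects": {
        "items": {"schema": {"type": "string", "pattern": REF_PATTERN}}
    }
}

criteria_pattern = lookup(audit_criteria, "target", "schema", "pattern")
result_pattern = lookup(audit_result, "evaluated_objects", "items", "schema", "pattern")

patterns_aligned = criteria_pattern == result_pattern
accepts_reference = re.search(criteria_pattern, "customer-interaction@v1.0-beta1") is not None
print(patterns_aligned, accepts_reference)  # True True
```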

Compliance Checklist

Before publishing a schema, verify:

  • additionalProperties: false on ALL object levels
  • schema_type field with const value
  • schema_version field with pattern validation
  • $schema, $id, title, description at root
  • All fields have description
  • All fields have explicit type
  • required fields clearly defined
  • Validation constraints where applicable
  • Reusable definitions in $defs
  • Consistent naming convention
  • Paired .md documentation file
  • Complete, realistic examples
  • Migration guide (for new versions)
  • Edge cases documented

Automated checks:

task test:syntax      # JSON syntax validation
task test:structure   # Schema structure validation
task lint:check       # Formatting consistency


Last Updated: 2026-01-28 | Principles Count: 8 | Status: Stable