Schema Transformers¶
Schema transformers are simple functions that modify schemas after generation. This guide covers transformer patterns, composition strategies, and best practices for post-processing schemas.
What are Schema Transformers?¶
Transformers are functions that take a JSON schema and return a modified version. They run after Pydantic generates the base schema, making them perfect for simple, model-specific customizations:
from pydantic import BaseModel
from pyrmute import ModelManager, JsonSchema

manager = ModelManager()

@manager.model("User", "1.0.0")
class UserV1(BaseModel):
    name: str
    email: str

@manager.schema_transformer("User", "1.0.0")
def add_examples(schema: JsonSchema) -> JsonSchema:
    """Add example data to User schema."""
    schema["examples"] = [
        {"name": "Alice Smith", "email": "alice@example.com"},
        {"name": "Bob Jones", "email": "bob@example.com"}
    ]
    return schema

# Transformer runs automatically
schema = manager.get_schema("User", "1.0.0")
print(schema["examples"])
# [{"name": "Alice Smith", "email": "alice@example.com"}, ...]
Key characteristics:
- Simple functions: (JsonSchema) -> JsonSchema
- Run after schema generation
- Model-specific customization
- Easy to compose multiple transformers
- No need to subclass anything
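Because a transformer is only a function from schema to schema, it can be exercised on a plain dictionary without a manager. The sketch below assumes a stand-in `JsonSchema` alias and a hypothetical `add_owner` transformer; neither is part of pyrmute.

```python
from typing import Any

# Stand-in alias for illustration only; pyrmute ships its own JsonSchema type.
JsonSchema = dict[str, Any]


def add_owner(schema: JsonSchema) -> JsonSchema:
    """Return a copy of the schema with a hypothetical x-owner field."""
    return {**schema, "x-owner": "user-team"}


base: JsonSchema = {"type": "object", "properties": {"name": {"type": "string"}}}
result = add_owner(base)
print(result["x-owner"])  # user-team
```

Registering such a function with `@manager.schema_transformer(...)` is then the only step that ties it to a specific model and version.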
Basic Transformers¶
Adding Metadata¶
Add custom fields to schemas:
@manager.schema_transformer("User", "1.0.0")
def add_metadata(schema: JsonSchema) -> JsonSchema:
    """Add custom metadata fields."""
    schema["x-version"] = "1.0.0"
    schema["x-deprecated"] = False
    schema["x-owner"] = "user-team"
    schema["x-last-updated"] = "2024-01-15"
    return schema
Adding Examples¶
Provide example data for documentation:
@manager.schema_transformer("Product", "1.0.0")
def add_product_examples(schema: JsonSchema) -> JsonSchema:
    """Add realistic product examples."""
    schema["examples"] = [
        {
            "id": 1,
            "name": "Widget",
            "price": 9.99,
            "in_stock": True
        },
        {
            "id": 2,
            "name": "Gadget",
            "price": 19.99,
            "in_stock": False
        }
    ]
    return schema
Adding Descriptions¶
Enhance field descriptions:
@manager.schema_transformer("Order", "1.0.0")
def enhance_descriptions(schema: JsonSchema) -> JsonSchema:
    """Add detailed field descriptions."""
    properties = schema.get("properties", {})
    if "order_id" in properties:
        properties["order_id"]["description"] = (
            "Unique identifier for the order. "
            "Format: ORD-{timestamp}-{random}"
        )
    if "status" in properties:
        properties["status"]["description"] = (
            "Current order status. "
            "Possible values: pending, processing, shipped, delivered, cancelled"
        )
    return schema
Adding Validation Rules¶
Add additional constraints:
@manager.schema_transformer("Email", "1.0.0")
def add_email_validation(schema: JsonSchema) -> JsonSchema:
    """Add email validation pattern."""
    properties = schema.get("properties", {})
    for field_name, field_schema in properties.items():
        if "email" in field_name.lower():
            field_schema["format"] = "email"
            field_schema["pattern"] = r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$"
    return schema
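To see what the `pattern` constraint accepts, the same regular expression can be tried with Python's `re` module. This is only an approximation of how a JSON Schema validator would apply it, since validators use the ECMA-262 regex dialect, and `matches_email_pattern` is a hypothetical helper introduced for illustration.

```python
import re

# Same pattern the transformer above writes into the schema.
EMAIL_PATTERN = r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$"


def matches_email_pattern(value: str) -> bool:
    """Return True when the value satisfies EMAIL_PATTERN."""
    return re.search(EMAIL_PATTERN, value) is not None


samples = [
    "alice@example.com",
    "bob.jones+tag@mail.example.org",
    "alice@example",  # no top-level domain
    "a@b.c",          # top-level domain shorter than two letters
]
results = {sample: matches_email_pattern(sample) for sample in samples}
print(results)
```

Many validators treat `format` as an annotation unless format checking is switched on, so the explicit `pattern` is typically the part that gets enforced.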
Multiple Transformers¶
Register multiple transformers for the same model:
@manager.model("User", "1.0.0")
class UserV1(BaseModel):
    name: str
    email: str
    age: int

@manager.schema_transformer("User", "1.0.0")
def add_examples(schema: JsonSchema) -> JsonSchema:
    """First transformer: add examples."""
    schema["examples"] = [
        {"name": "Alice", "email": "alice@example.com", "age": 30}
    ]
    return schema

@manager.schema_transformer("User", "1.0.0")
def add_metadata(schema: JsonSchema) -> JsonSchema:
    """Second transformer: add metadata."""
    schema["x-requires-auth"] = True
    schema["x-rate-limit"] = 100
    return schema

@manager.schema_transformer("User", "1.0.0")
def add_constraints(schema: JsonSchema) -> JsonSchema:
    """Third transformer: add constraints."""
    schema["additionalProperties"] = False
    return schema

# All three transformers run in registration order
schema = manager.get_schema("User", "1.0.0")
# Has examples, metadata, and constraints
Execution order: Transformers run in the order they were registered.
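Order matters as soon as one transformer reads what another writes. The following sketch imitates registration-order execution with a plain list; `run_in_order` is a hypothetical helper and says nothing about how pyrmute stores transformers internally.

```python
from typing import Any, Callable

JsonSchema = dict[str, Any]  # stand-in alias for illustration
Transformer = Callable[[JsonSchema], JsonSchema]


def set_title(schema: JsonSchema) -> JsonSchema:
    return {**schema, "title": "User"}


def describe_from_title(schema: JsonSchema) -> JsonSchema:
    # Reads the title, so it depends on set_title having run first.
    title = schema.get("title", "Unknown")
    return {**schema, "description": f"Schema for {title}"}


def run_in_order(schema: JsonSchema, transformers: list[Transformer]) -> JsonSchema:
    """Apply transformers first to last, mimicking registration order."""
    for transformer in transformers:
        schema = transformer(schema)
    return schema


forward = run_in_order({"type": "object"}, [set_title, describe_from_title])
backward = run_in_order({"type": "object"}, [describe_from_title, set_title])
print(forward["description"])   # Schema for User
print(backward["description"])  # Schema for Unknown
```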
Advanced Patterns¶
Conditional Transformations¶
Apply transformations based on schema content:
@manager.schema_transformer("Payment", "1.0.0")
def add_security_requirements(schema: JsonSchema) -> JsonSchema:
    """Add security metadata for sensitive fields."""
    properties = schema.get("properties", {})
    sensitive_fields = []
    # Identify sensitive fields
    for field_name in properties.keys():
        if any(keyword in field_name.lower()
               for keyword in ["password", "token", "secret", "card", "ssn"]):
            sensitive_fields.append(field_name)
    # Add security metadata if sensitive fields found
    if sensitive_fields:
        schema["x-sensitive-fields"] = sensitive_fields
        schema["x-requires-encryption"] = True
        schema["x-pii"] = True
    return schema
Recursive Field Processing¶
Process all fields recursively:
@manager.schema_transformer("Document", "1.0.0")
def add_field_metadata(schema: JsonSchema) -> JsonSchema:
    """Recursively add metadata to all fields."""
    def process_schema(s: dict, path: str = "") -> None:
        """Recursively process schema at all levels."""
        if "properties" in s:
            for field_name, field_schema in s["properties"].items():
                field_path = f"{path}.{field_name}" if path else field_name
                # Add field path
                field_schema["x-path"] = field_path
                # Add field type category
                field_type = field_schema.get("type")
                if field_type in ["string", "integer", "number", "boolean"]:
                    field_schema["x-category"] = "primitive"
                elif field_type == "array":
                    field_schema["x-category"] = "collection"
                elif field_type == "object":
                    field_schema["x-category"] = "nested"
                # Recursively process nested objects
                if field_type == "object":
                    process_schema(field_schema, field_path)
        # Process definitions
        if "$defs" in s:
            for def_name, def_schema in s["$defs"].items():
                process_schema(def_schema, f"$defs.{def_name}")

    process_schema(schema)
    return schema
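The transformer above descends into inline objects and `$defs`, but not into the `items` of an array. A minimal sketch of a walker that also visits array items is shown here; `iter_fields` and the `[]` path suffix are illustrative choices, and fields expressed through `$ref` would need resolving first, which this sketch does not attempt.

```python
from typing import Any, Iterator

JsonSchema = dict[str, Any]  # stand-in alias for illustration


def iter_fields(schema: JsonSchema, path: str = "") -> Iterator[tuple[str, JsonSchema]]:
    """Yield (dotted_path, field_schema) for each property, descending into
    nested objects and into the items of arrays."""
    for name, field in schema.get("properties", {}).items():
        field_path = f"{path}.{name}" if path else name
        yield field_path, field
        if field.get("type") == "object":
            yield from iter_fields(field, field_path)
        elif field.get("type") == "array" and isinstance(field.get("items"), dict):
            yield from iter_fields(field["items"], f"{field_path}[]")


doc: JsonSchema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "author": {"type": "object", "properties": {"name": {"type": "string"}}},
        "tags": {
            "type": "array",
            "items": {"type": "object", "properties": {"label": {"type": "string"}}},
        },
    },
}
paths = [field_path for field_path, _ in iter_fields(doc)]
print(paths)  # ['title', 'author', 'author.name', 'tags', 'tags[].label']
```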
Schema Validation¶
Validate and fix schema issues:
@manager.schema_transformer("Config", "1.0.0")
def validate_and_fix_schema(schema: JsonSchema) -> JsonSchema:
    """Ensure schema has required fields and fix common issues."""
    # Ensure title exists
    if "title" not in schema:
        schema["title"] = "Config"
    # Ensure description exists
    if "description" not in schema:
        schema["description"] = f"Schema for {schema['title']}"
    # Fix missing descriptions in properties
    if "properties" in schema:
        for field_name, field_schema in schema["properties"].items():
            if "description" not in field_schema:
                field_schema["description"] = (
                    field_name.replace("_", " ").title()
                )
    # Ensure required fields is a list
    if "required" in schema and not isinstance(schema["required"], list):
        schema["required"] = list(schema["required"])
    return schema
Merging with External Metadata¶
Combine schema with external metadata:
# External metadata (could come from config file)
FIELD_METADATA = {
    "User": {
        "1.0.0": {
            "name": {
                "ui_widget": "text_input",
                "placeholder": "Enter your full name",
                "help_text": "First and last name"
            },
            "email": {
                "ui_widget": "email_input",
                "placeholder": "user@example.com",
                "help_text": "We'll never share your email"
            }
        }
    }
}

@manager.schema_transformer("User", "1.0.0")
def add_ui_metadata(schema: JsonSchema) -> JsonSchema:
    """Merge external UI metadata into schema."""
    metadata = FIELD_METADATA.get("User", {}).get("1.0.0", {})
    properties = schema.get("properties", {})
    for field_name, field_schema in properties.items():
        if field_name in metadata:
            field_meta = metadata[field_name]
            field_schema["x-ui"] = {
                "widget": field_meta.get("ui_widget", "text_input"),
                "placeholder": field_meta.get("placeholder", ""),
                "helpText": field_meta.get("help_text", "")
            }
    return schema
OpenAPI Enhancement¶
Add OpenAPI-specific fields:
@manager.schema_transformer("User", "1.0.0")
def add_openapi_metadata(schema: JsonSchema) -> JsonSchema:
    """Add OpenAPI-specific metadata."""
    # Add tags
    schema["x-tags"] = ["users", "authentication"]
    # Add response codes
    schema["x-responses"] = {
        "200": "Successful operation",
        "400": "Invalid input",
        "404": "User not found"
    }
    # Add security requirements
    schema["x-security"] = [{"bearerAuth": []}]
    # Add operation IDs
    schema["x-operation-id"] = "getUser"
    return schema
Reusable Transformers¶
Create generic transformers that work with any model:
from datetime import datetime, timezone
from typing import Callable

def add_timestamp_metadata(
    timestamp_field: str = "x-generated-at"
) -> Callable[[JsonSchema], JsonSchema]:
    """Factory function for timestamp transformers."""
    def transformer(schema: JsonSchema) -> JsonSchema:
        # Timezone-aware UTC timestamp; datetime.utcnow() is deprecated since Python 3.12
        schema[timestamp_field] = datetime.now(timezone.utc).isoformat()
        return schema
    return transformer

def add_version_metadata(version: str) -> Callable[[JsonSchema], JsonSchema]:
    """Factory function for version transformers."""
    def transformer(schema: JsonSchema) -> JsonSchema:
        schema["x-version"] = version
        schema["x-api-version"] = f"v{version.split('.')[0]}"
        return schema
    return transformer

def add_deprecation_notice(
    deprecated_in: str, removed_in: str, replacement: str
) -> Callable[[JsonSchema], JsonSchema]:
    """Factory function for deprecation transformers."""
    def transformer(schema: JsonSchema) -> JsonSchema:
        schema["deprecated"] = True
        schema["x-deprecation"] = {
            "deprecated_in": deprecated_in,
            "removed_in": removed_in,
            "replacement": replacement,
            "message": f"Deprecated in {deprecated_in}. Use {replacement} instead."
        }
        return schema
    return transformer

# Use reusable transformers
@manager.schema_transformer("User", "1.0.0")
def _(schema: JsonSchema) -> JsonSchema:
    return add_timestamp_metadata()(schema)

@manager.schema_transformer("User", "1.0.0")
def _(schema: JsonSchema) -> JsonSchema:
    return add_version_metadata("1.0.0")(schema)

# Or apply directly
transformer = add_deprecation_notice("2.0.0", "3.0.0", "UserV2")
manager.schema_transformer("OldUser", "1.0.0")(transformer)
Transformer Composition¶
Compose multiple transformers:
from typing import Callable

def compose_transformers(
    *transformers: Callable[[JsonSchema], JsonSchema]
) -> Callable[[JsonSchema], JsonSchema]:
    """Compose multiple transformers into one."""
    def composed(schema: JsonSchema) -> JsonSchema:
        result = schema
        for transformer in transformers:
            result = transformer(result)
        return result
    return composed

# Individual transformers
def add_examples(schema: JsonSchema) -> JsonSchema:
    schema["examples"] = [{"name": "Alice"}]
    return schema

def add_metadata(schema: JsonSchema) -> JsonSchema:
    schema["x-version"] = "1.0.0"
    return schema

def add_security(schema: JsonSchema) -> JsonSchema:
    schema["x-requires-auth"] = True
    return schema

# Compose them
combined = compose_transformers(
    add_examples,
    add_metadata,
    add_security
)

# Register composed transformer
@manager.schema_transformer("User", "1.0.0")
def _(schema: JsonSchema) -> JsonSchema:
    return combined(schema)
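The same left-to-right composition can also be written as a fold. The sketch below is an alternative using `functools.reduce`, with a stand-in `JsonSchema` alias and hypothetical example transformers; it is not a pyrmute API.

```python
from functools import reduce
from typing import Any, Callable

JsonSchema = dict[str, Any]  # stand-in alias for illustration
Transformer = Callable[[JsonSchema], JsonSchema]


def compose(*transformers: Transformer) -> Transformer:
    """Left-to-right composition: compose(f, g)(s) == g(f(s))."""
    def composed(schema: JsonSchema) -> JsonSchema:
        return reduce(lambda acc, fn: fn(acc), transformers, schema)
    return composed


def add_version(schema: JsonSchema) -> JsonSchema:
    return {**schema, "x-version": "1.0.0"}


def require_auth(schema: JsonSchema) -> JsonSchema:
    return {**schema, "x-requires-auth": True}


pipeline = compose(add_version, require_auth)
result = pipeline({"type": "object"})
print(result)
```

With no arguments, `compose()` returns the identity transformer, which gives a convenient default when the list of transformers is built dynamically.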
Conditional Application¶
Apply transformers only in certain conditions:
import os
from functools import wraps
from typing import Callable

Transformer = Callable[[JsonSchema], JsonSchema]

def when_environment(env: str) -> Callable[[Transformer], Transformer]:
    """Decorator to apply transformer only in specific environment."""
    def decorator(func: Transformer) -> Transformer:
        @wraps(func)  # keep the wrapped transformer's name and docstring
        def wrapper(schema: JsonSchema) -> JsonSchema:
            current_env = os.getenv("ENVIRONMENT", "production")
            if current_env == env:
                return func(schema)
            return schema
        return wrapper
    return decorator

def when_field_exists(field_name: str) -> Callable[[Transformer], Transformer]:
    """Decorator to apply transformer only if field exists."""
    def decorator(func: Transformer) -> Transformer:
        @wraps(func)
        def wrapper(schema: JsonSchema) -> JsonSchema:
            if field_name in schema.get("properties", {}):
                return func(schema)
            return schema
        return wrapper
    return decorator

# Use conditional decorators
@manager.schema_transformer("User", "1.0.0")
@when_environment("development")
def add_debug_info(schema: JsonSchema) -> JsonSchema:
    """Only adds debug info in development."""
    schema["x-debug"] = True
    schema["x-dev-notes"] = "This is a development build"
    return schema

@manager.schema_transformer("User", "1.0.0")
@when_field_exists("email")
def add_email_validation(schema: JsonSchema) -> JsonSchema:
    """Only adds validation if email field exists."""
    schema["properties"]["email"]["format"] = "email"
    return schema
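Both decorators above follow the same shape: check a condition, then either call the transformer or pass the schema through. One possible generalization is a single `when(predicate)` decorator. The names `when`, `has_field`, and `mark_email` are hypothetical and introduced only for this sketch.

```python
from functools import wraps
from typing import Any, Callable

JsonSchema = dict[str, Any]  # stand-in alias for illustration
Transformer = Callable[[JsonSchema], JsonSchema]
Predicate = Callable[[JsonSchema], bool]


def when(predicate: Predicate) -> Callable[[Transformer], Transformer]:
    """Apply the decorated transformer only when predicate(schema) is true."""
    def decorator(func: Transformer) -> Transformer:
        @wraps(func)
        def wrapper(schema: JsonSchema) -> JsonSchema:
            return func(schema) if predicate(schema) else schema
        return wrapper
    return decorator


def has_field(name: str) -> Predicate:
    """Build a predicate that checks for a top-level property."""
    return lambda schema: name in schema.get("properties", {})


@when(has_field("email"))
def mark_email(schema: JsonSchema) -> JsonSchema:
    # Copy the touched levels so the input schema is left unchanged.
    properties = {**schema["properties"]}
    properties["email"] = {**properties["email"], "format": "email"}
    return {**schema, "properties": properties}


with_email = mark_email({"properties": {"email": {"type": "string"}}})
without_email = mark_email({"properties": {"name": {"type": "string"}}})
print(with_email["properties"]["email"])  # {'type': 'string', 'format': 'email'}
```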
Testing Transformers¶
Test transformers in isolation:
import copy

def test_add_examples_transformer() -> None:
    """Test that examples are added correctly."""
    # Create a simple schema
    schema = {
        "type": "object",
        "properties": {
            "name": {"type": "string"}
        }
    }
    # Apply transformer
    result = add_examples(schema)
    # Verify examples added
    assert "examples" in result
    assert len(result["examples"]) > 0
    assert "name" in result["examples"][0]

def test_transformer_doesnt_mutate_input() -> None:
    """Test that a copy-based transformer doesn't mutate the input schema."""
    # Copy-based variant; the in-place add_examples shown earlier
    # would fail this test because it returns the same dict it was given
    def add_examples_copy(schema: JsonSchema) -> JsonSchema:
        return {**schema, "examples": [{"name": "Alice"}]}

    original = {
        "type": "object",
        "properties": {"name": {"type": "string"}}
    }
    # Make a copy to compare
    original_copy = copy.deepcopy(original)
    # Apply transformer
    result = add_examples_copy(original)
    # Original should be unchanged
    assert original == original_copy
    # Result should be different
    assert result != original
    assert "examples" in result

def test_transformer_with_manager() -> None:
    """Test transformer integration with ModelManager."""
    manager = ModelManager()

    @manager.model("Test", "1.0.0")
    class TestV1(BaseModel):
        value: str

    @manager.schema_transformer("Test", "1.0.0")
    def add_test_metadata(schema: JsonSchema) -> JsonSchema:
        schema["x-test"] = True
        return schema

    schema = manager.get_schema("Test", "1.0.0")
    assert "x-test" in schema
    assert schema["x-test"] is True
Transformers vs Custom Generators¶
When to Use Transformers¶
✅ Use transformers when:
- Adding metadata or examples
- Model-specific customizations
- Simple field modifications
- You don't need to change core generation logic
- You want composable, reusable functions
- Changes are independent of Pydantic's generation
Example scenarios:
- Adding x-* extension fields
- Adding examples
- Enhancing descriptions
- Adding UI hints
- Environment-specific modifications
When to Use Custom Generators¶
✅ Use custom generators when:
- Need to change how Pydantic generates schemas
- Global changes affecting all schemas
- Deep integration with schema generation
- Need access to Pydantic's internal schema structure
- Changing core schema structure or dialect
Example scenarios:
- Using different JSON Schema dialect
- Changing how $defs are structured
- Global metadata for all schemas
- Deep format changes
- Integration with schema generation process
Combining Both¶
Use generators for global changes and transformers for model-specific ones:
from typing import Any, Mapping

from pydantic.json_schema import GenerateJsonSchema, JsonSchemaMode
# SchemaConfig is assumed here to be importable from the package root
from pyrmute import JsonSchema, ModelManager, SchemaConfig

class GlobalGenerator(GenerateJsonSchema):
    """Global changes for all schemas."""
    def generate(
        self, schema: Mapping[str, Any], mode: JsonSchemaMode = "validation"
    ) -> JsonSchema:
        json_schema = super().generate(schema, mode=mode)
        json_schema["$schema"] = self.schema_dialect
        json_schema["x-generated-by"] = "pyrmute"
        return json_schema

manager = ModelManager(
    default_schema_config=SchemaConfig(schema_generator=GlobalGenerator)
)

# Model-specific transformers
@manager.schema_transformer("User", "1.0.0")
def add_user_metadata(schema: JsonSchema) -> JsonSchema:
    """User-specific metadata."""
    schema["x-requires-auth"] = True
    return schema

@manager.schema_transformer("Product", "1.0.0")
def add_product_metadata(schema: JsonSchema) -> JsonSchema:
    """Product-specific metadata."""
    schema["x-cacheable"] = True
    return schema
Common Patterns Library¶
Add Field Constraints¶
def add_string_constraints(
    min_length: int = 1,
    max_length: int = 1000,
    pattern: str | None = None
) -> Callable[[JsonSchema], JsonSchema]:
    """Add constraints to all string fields."""
    def transformer(schema: JsonSchema) -> JsonSchema:
        if "properties" in schema:
            for field_schema in schema["properties"].values():
                if field_schema.get("type") == "string":
                    if "minLength" not in field_schema:
                        field_schema["minLength"] = min_length
                    if "maxLength" not in field_schema:
                        field_schema["maxLength"] = max_length
                    if pattern and "pattern" not in field_schema:
                        field_schema["pattern"] = pattern
        return schema
    return transformer
Add Documentation Links¶
def add_documentation_links(base_url: str) -> Callable[[JsonSchema], JsonSchema]:
    """Add documentation links to schemas."""
    def transformer(schema: JsonSchema) -> JsonSchema:
        title = schema.get("title", "Unknown")
        schema["x-docs"] = f"{base_url}/models/{title.lower()}"
        if "properties" in schema:
            for field_name in schema["properties"]:
                schema["properties"][field_name]["x-docs"] = (
                    f"{base_url}/models/{title.lower()}#{field_name}"
                )
        return schema
    return transformer
Add Localization Support¶
from typing import Any, Callable

def add_i18n_support(
    translations: dict[str, Any]
) -> Callable[[JsonSchema], JsonSchema]:
    """Add internationalization metadata."""
    def transformer(schema: JsonSchema) -> JsonSchema:
        model_name = schema.get("title", "")
        if model_name in translations:
            schema["x-i18n"] = translations[model_name]
        if "properties" in schema:
            for field_name, field_schema in schema["properties"].items():
                if field_name in translations.get(model_name, {}):
                    field_schema["x-i18n"] = translations[model_name][field_name]
        return schema
    return transformer

# Usage
translations = {
    "User": {
        "en": {"title": "User", "description": "User account"},
        "es": {"title": "Usuario", "description": "Cuenta de usuario"},
        "name": {
            "en": {"label": "Name", "placeholder": "Enter name"},
            "es": {"label": "Nombre", "placeholder": "Ingrese nombre"}
        }
    }
}

@manager.schema_transformer("User", "1.0.0")
def _(schema: JsonSchema) -> JsonSchema:
    return add_i18n_support(translations)(schema)
Best Practices¶
- Don't mutate input - Create modified copies, don't alter input schema
- Keep transformers simple - One responsibility per transformer
- Make them reusable - Use factory functions for flexibility
- Test independently - Each transformer should be testable in isolation
- Document what changes - Clear docstrings explaining modifications
- Consider order - Transformers run in registration order
- Use for model-specific changes - Global changes belong in custom generators
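One way to follow the first practice without rewriting every transformer is to wrap in-place transformers in a decorator that hands them a deep copy. The `non_mutating` decorator and `label_name` transformer below are hypothetical names used for illustration, not part of pyrmute.

```python
import copy
from functools import wraps
from typing import Any, Callable

JsonSchema = dict[str, Any]  # stand-in alias for illustration
Transformer = Callable[[JsonSchema], JsonSchema]


def non_mutating(func: Transformer) -> Transformer:
    """Run the transformer on a deep copy so the caller's schema is untouched."""
    @wraps(func)
    def wrapper(schema: JsonSchema) -> JsonSchema:
        return func(copy.deepcopy(schema))
    return wrapper


@non_mutating
def label_name(schema: JsonSchema) -> JsonSchema:
    # Written in the in-place style; the decorator makes it safe.
    if "name" in schema.get("properties", {}):
        schema["properties"]["name"]["x-label"] = "Name"
    return schema


original = {"type": "object", "properties": {"name": {"type": "string"}}}
result = label_name(original)
print(original["properties"]["name"])  # {'type': 'string'}
print(result["properties"]["name"])    # {'type': 'string', 'x-label': 'Name'}
```

The trade-off is one deep copy per transformer call, which is usually negligible for schemas of ordinary size.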
Common Pitfalls¶
Mutating Input Schema¶
# ❌ BAD - Mutates input
def bad_transformer(schema: JsonSchema) -> JsonSchema:
    schema["x-custom"] = "value"  # Mutates input!
    return schema

# ✅ GOOD - Creates new dict
def good_transformer(schema: JsonSchema) -> JsonSchema:
    return {**schema, "x-custom": "value"}

# ✅ ALSO GOOD - Modifies safely
def also_good_transformer(schema: JsonSchema) -> JsonSchema:
    # Shallow copy is enough for top-level keys;
    # use copy.deepcopy(schema) before changing nested fields
    schema = schema.copy()
    schema["x-custom"] = "value"
    return schema
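A shallow copy protects only the top level of the schema. The short sketch below shows a nested change leaking through `dict.copy()` while `copy.deepcopy` keeps the original intact; `make_schema` is a hypothetical helper that builds a fresh schema for each case.

```python
import copy
from typing import Any


def make_schema() -> dict[str, Any]:
    """Build a fresh schema so the two cases do not share state."""
    return {"type": "object", "properties": {"name": {"type": "string"}}}


# Shallow copy: the nested "properties" dict is shared with the original.
first = make_schema()
shallow = first.copy()
shallow["properties"]["name"]["x-label"] = "Name"
shallow_leaked = "x-label" in first["properties"]["name"]

# Deep copy: nested dicts are duplicated, so the original stays unchanged.
second = make_schema()
deep = copy.deepcopy(second)
deep["properties"]["name"]["x-label"] = "Name"
deep_leaked = "x-label" in second["properties"]["name"]

print(shallow_leaked, deep_leaked)  # True False
```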
Assuming Schema Structure¶
# ❌ BAD - Assumes properties exists
def bad_transformer(schema: JsonSchema) -> JsonSchema:
    schema["properties"]["name"]["x-label"] = "Name"  # KeyError if missing!
    return schema

# ✅ GOOD - Checks first
def good_transformer(schema: JsonSchema) -> JsonSchema:
    if "properties" in schema and "name" in schema["properties"]:
        schema["properties"]["name"]["x-label"] = "Name"
    return schema
Overwriting Existing Fields¶
# ❌ BAD - Overwrites examples
def bad_transformer(schema: JsonSchema) -> JsonSchema:
    schema["examples"] = [{"new": "example"}]  # Loses existing!
    return schema

# ✅ GOOD - Preserves existing
def good_transformer(schema: JsonSchema) -> JsonSchema:
    existing_examples = schema.get("examples", [])
    new_example = {"new": "example"}
    schema["examples"] = existing_examples + [new_example]
    return schema
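If the same transformer might run more than once on a schema, plain appending will duplicate examples. A possible refinement, sketched here with a hypothetical `append_examples` helper, skips examples that are already present and returns a new dict instead of editing the input.

```python
from typing import Any

JsonSchema = dict[str, Any]  # stand-in alias for illustration


def append_examples(schema: JsonSchema, new_examples: list[dict[str, Any]]) -> JsonSchema:
    """Return a copy of the schema with new examples appended, skipping duplicates."""
    merged = list(schema.get("examples", []))
    for example in new_examples:
        if example not in merged:
            merged.append(example)
    return {**schema, "examples": merged}


base: JsonSchema = {"type": "object", "examples": [{"name": "Alice"}]}
once = append_examples(base, [{"name": "Bob"}])
twice = append_examples(once, [{"name": "Bob"}])
print(twice["examples"])  # [{'name': 'Alice'}, {'name': 'Bob'}]
```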
Next Steps¶
Now that you understand schema transformers, the pages below cover deeper customization and the configuration API.
Related topics:
- Custom Schema Generators - For deeper customization
- Schema Generation - Using transformers
API Reference:
- SchemaConfig API - Complete SchemaConfig details