Pydantic: Simplifying Data Validation in Python
Pydantic is a powerful Python library that uses type annotations to validate data structures. Learn about the powerful features of Pydantic with code examples.
While exploring AI agents, I came across two interesting libraries: Pydantic and Logfire. In this article, you will learn about Pydantic through code examples and see what it brings to data validation for Python developers.
Pydantic is a powerful Python library that uses type annotations to validate data structures. It's become an essential tool for many Python developers, especially those working on web applications, APIs, and data-intensive projects.
Why You Need Pydantic
- Robust data validation. Pydantic ensures your data matches expected types and formats.
- Improved code readability. Type hints make your code's intent clearer.
- Automatic error handling. Get detailed error messages for invalid data.
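- High performance. Pydantic V2's core validation logic is written in Rust (the pydantic-core package), which keeps validation fast.
- Easy integration. Works well with popular frameworks such as FastAPI, which uses Pydantic models to validate request and response data, and with Django through third-party packages such as Django Ninja.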
Core Features and Examples
1. Basic Model Definition
Pydantic's core functionality is data validation. It uses Python type hints to automatically validate the structure and types of data. When you define a Pydantic model, each field is annotated with its expected type, and Pydantic ensures that any data assigned to these fields conforms to the specified types.
Let's start with a simple example:
from pydantic import BaseModel
from typing import List, Optional
import logfire

class User(BaseModel):
    username: str
    age: int
    email: str
    is_active: bool = True
    tags: List[str] = []
    profile_picture: Optional[str] = None

logfire.configure()
user = User(username="johndoe", age=30, email="john@example.com", tags=["python", "developer"])
logfire.info(f"{user}")
In this case, Pydantic ensures that username and email are strings, age is an integer, and tags is a list of strings. If you try to create a User with data that cannot be converted to these types, Pydantic raises a ValidationError.
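To see that ValidationError in practice, here is a minimal sketch using a trimmed-down copy of the User model (Logfire omitted). It assumes Pydantic V2, where the exception exposes error_count() and errors(); each error records the field path ("loc"), a machine-readable type, and a human-readable message.

```python
from pydantic import BaseModel, ValidationError

class User(BaseModel):
    username: str
    age: int
    email: str

errors = []
try:
    # "thirty" cannot be parsed as an int, and None is not a valid str
    User(username="johndoe", age="thirty", email=None)
except ValidationError as exc:
    print(exc.error_count())  # 2
    errors = exc.errors()

for error in errors:
    # loc is a tuple path to the failing field; type is a stable error code
    print(error["loc"], error["type"], error["msg"])
```

Pydantic reports every failing field in one exception instead of stopping at the first problem, which is what makes its error messages useful for API responses.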
You may have noticed that the first example uses Logfire; I will cover it in a separate article. In short, Logfire is an observability platform from the team behind Pydantic.
Pydantic can also coerce data into the expected types. For example, if you pass age="30", Pydantic does not raise a ValidationError; it converts the string to the integer 30. If you need strict type checking without coercion, refer to the strict mode section of the Pydantic documentation.
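A minimal sketch of the difference between the default (lax) behavior and strict mode, assuming Pydantic V2. LaxUser and StrictUser are hypothetical names used only for this comparison; strict mode is switched on for the whole model through ConfigDict(strict=True).

```python
from pydantic import BaseModel, ConfigDict, ValidationError

class LaxUser(BaseModel):
    age: int  # default lax mode: compatible inputs such as "30" are coerced

class StrictUser(BaseModel):
    model_config = ConfigDict(strict=True)  # disable coercion for every field
    age: int

lax_user = LaxUser(age="30")
print(repr(lax_user.age))  # 30 (an int, not the string "30")

strict_rejected = False
try:
    StrictUser(age="30")  # a str is not accepted for an int field in strict mode
except ValidationError:
    strict_rejected = True
print(strict_rejected)  # True
```

Strict mode can also be applied to a single field with Field(strict=True) when only some inputs need exact types.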
This is how the code looks in an IDE with Pydantic installed using pip install pydantic.
Another simple example:
from typing import List, Optional
from pydantic import BaseModel

class Book(BaseModel):
    title: str
    author: str
    publication_year: int
    isbn: str
    genres: List[str]
    description: Optional[str] = None

# Creating a valid book instance
book = Book(
    title="The Hitchhiker's Guide to the Galaxy",
    author="Douglas Adams",
    publication_year=1979,
    isbn="0-330-25864-8",
    genres=["Science Fiction", "Comedy"],
)
print(book)
This example defines a Book model with various fields. Pydantic will ensure that all required fields are provided and that they match the specified types.
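A short sketch of what "required" means here, assuming Pydantic V2. In V2, Optional[str] only says that None is an allowed value; it is the = None default that lets description be omitted. Leaving out a field with no default, such as isbn, produces an error of type "missing". The titles and authors in the failing call are placeholders.

```python
from typing import List, Optional
from pydantic import BaseModel, ValidationError

class Book(BaseModel):
    title: str
    author: str
    publication_year: int
    isbn: str
    genres: List[str]
    description: Optional[str] = None  # the "= None" default is what makes it optional

# description is omitted, so it falls back to its default
book = Book(
    title="The Hitchhiker's Guide to the Galaxy",
    author="Douglas Adams",
    publication_year=1979,
    isbn="0-330-25864-8",
    genres=["Science Fiction", "Comedy"],
)
print(book.description)  # None

# isbn has no default, so leaving it out raises a ValidationError
missing = []
try:
    Book(title="Untitled", author="Anonymous", publication_year=2024, genres=[])
except ValidationError as exc:
    missing = [err["loc"] for err in exc.errors() if err["type"] == "missing"]
print(missing)  # [('isbn',)]
```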
2. Nested Models
Pydantic supports a wide range of field types, including lists, dictionaries, and other models, so you can build data structures that mirror your application's needs. Nesting models inside each other is especially useful for hierarchical data:
from pydantic import BaseModel
from typing import List

class Address(BaseModel):
    street: str
    city: str
    country: str
    postal_code: str

class Author(BaseModel):
    name: str
    age: int
    address: Address

class Book(BaseModel):
    title: str
    author: Author
    genres: List[str]

# Creating a book with nested author and address
book = Book(
    title="1984",
    author=Author(
        name="George Orwell",
        age=46,
        address=Address(
            street="50 Lawford Road",
            city="London",
            country="United Kingdom",
            postal_code="N1 5BJ"
        )
    ),
    genres=["Dystopian", "Political Fiction"]
)
print(book)
This example shows how you can nest models within each other, allowing for complex, hierarchical data structures.
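Nested models do not have to be built by hand. A minimal sketch, assuming Pydantic V2: model_validate() accepts plain nested dictionaries (for example, parsed JSON) and builds the Author and Address instances itself. When a nested field is missing, the error location is the full path through the hierarchy.

```python
from typing import List
from pydantic import BaseModel, ValidationError

class Address(BaseModel):
    street: str
    city: str
    country: str
    postal_code: str

class Author(BaseModel):
    name: str
    age: int
    address: Address

class Book(BaseModel):
    title: str
    author: Author
    genres: List[str]

raw = {
    "title": "1984",
    "author": {
        "name": "George Orwell",
        "age": 46,
        "address": {"street": "50 Lawford Road", "city": "London",
                    "country": "United Kingdom", "postal_code": "N1 5BJ"},
    },
    "genres": ["Dystopian", "Political Fiction"],
}

# Inner dicts are validated and converted into Author and Address instances
book = Book.model_validate(raw)
print(type(book.author.address).__name__, book.author.address.city)  # Address London

# Remove a deeply nested field and validate again
del raw["author"]["address"]["postal_code"]
error_locs = []
try:
    Book.model_validate(raw)
except ValidationError as exc:
    error_locs = [err["loc"] for err in exc.errors()]
print(error_locs)  # [('author', 'address', 'postal_code')]
```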
3. Custom Validators
Pydantic allows you to define custom validation logic using decorators. This is useful for implementing business rules or complex validation scenarios:
import re
from pydantic import BaseModel, ValidationError, field_validator

class User(BaseModel):
    username: str
    email: str
    password: str

    # Validators raise ValueError instead of using assert,
    # because assert statements are skipped when Python runs with -O
    @field_validator('username')
    @classmethod
    def username_alphanumeric(cls, v):
        if not v.isalnum():
            raise ValueError('Username must be alphanumeric')
        return v

    @field_validator('email')
    @classmethod
    def email_valid(cls, v):
        regex = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
        if not re.match(regex, v):
            raise ValueError('Invalid email format')
        return v

    @field_validator('password')
    @classmethod
    def password_strength(cls, v):
        if len(v) < 8:
            raise ValueError('Password must be at least 8 characters')
        if not any(c.isupper() for c in v):
            raise ValueError('Password must contain an uppercase letter')
        if not any(c.islower() for c in v):
            raise ValueError('Password must contain a lowercase letter')
        if not any(c.isdigit() for c in v):
            raise ValueError('Password must contain a digit')
        return v

# Try creating users
try:
    # "johndoe" passes the alphanumeric check ("john_doe" would fail because of the underscore)
    user1 = User(username="johndoe", email="john@example.com", password="StrongPass1")
    print("Valid user:", user1)
except ValidationError as e:
    print("Validation error:", e)

try:
    user2 = User(username="alice!", email="invalid-email", password="weak")
except ValidationError as e:
    print("Validation error:", e)
This example demonstrates custom validators for username, email, and password fields, ensuring that the data meets specific criteria.
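Field validators see one field at a time. For rules that involve several fields, Pydantic V2 also provides model_validator. The sketch below uses a hypothetical SignUp model whose password and password_confirm must match; with mode="after", the validator runs once all fields have been validated, so it can read them from self.

```python
from pydantic import BaseModel, ValidationError, model_validator

class SignUp(BaseModel):
    username: str
    password: str
    password_confirm: str

    @model_validator(mode="after")
    def passwords_match(self) -> "SignUp":
        # Every field is already validated here, so comparing them is safe
        if self.password != self.password_confirm:
            raise ValueError("Passwords do not match")
        return self

ok = SignUp(username="johndoe", password="StrongPass1", password_confirm="StrongPass1")

error_types = []
try:
    SignUp(username="johndoe", password="StrongPass1", password_confirm="StrongPass2")
except ValidationError as exc:
    # A ValueError raised inside a validator is reported as a "value_error"
    error_types = [err["type"] for err in exc.errors()]
print(error_types)  # ['value_error']
```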
4. Config and Field Constraints
Pydantic models can be configured with options that control their behavior, and individual fields can carry their own constraints:
from typing import List
from pydantic import BaseModel, ConfigDict, Field, ValidationError

class Product(BaseModel):
    # Pydantic V2 configuration: frozen=True replaces V1's allow_mutation=False
    model_config = ConfigDict(frozen=True, extra="forbid")

    id: int = Field(..., gt=0)
    name: str = Field(..., min_length=3, max_length=50)
    price: float = Field(..., ge=0)
    tags: List[str] = Field(default_factory=list, max_length=5)

# Creating a valid product
product = Product(id=1, name="Laptop", price=999.99, tags=["electronics", "computer"])
print(product)

# Attempting to create an invalid product (every field breaks a constraint)
try:
    invalid_product = Product(
        id=0, name="TV", price=-100, tags=["a", "b", "c", "d", "e", "f"]
    )
except ValidationError as e:
    print("Validation error:", e)

# Attempting to modify the product (not allowed because frozen=True)
try:
    product.price = 899.99
except ValidationError as e:
    print("Modification error:", e)
This example shows how to use Field to add constraints to individual fields and how to configure model behavior with model_config. In Pydantic V2, the nested Config class is deprecated and allow_mutation has been renamed to frozen; assigning to a field of a frozen model raises a ValidationError rather than an AttributeError.
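The configuration options only show their effect when something violates them. A minimal sketch, assuming Pydantic V2 and a cut-down, hypothetical Product model: extra="forbid" rejects unknown keyword arguments, and frozen=True rejects assignment after creation. The small error_types helper is also hypothetical, written only to collect error codes.

```python
from pydantic import BaseModel, ConfigDict, ValidationError

class Product(BaseModel):
    model_config = ConfigDict(frozen=True, extra="forbid")
    id: int
    name: str

def error_types(func):
    """Call func and return the Pydantic error types it raised (empty list if none)."""
    try:
        func()
    except ValidationError as exc:
        return [err["type"] for err in exc.errors()]
    return []

product = Product(id=1, name="Laptop")

# "colour" is not a declared field, so extra="forbid" rejects it
print(error_types(lambda: Product(id=2, name="Phone", colour="black")))  # ['extra_forbidden']

# Assigning to a frozen instance is rejected and the value stays unchanged
print(error_types(lambda: setattr(product, "name", "Desktop")))  # ['frozen_instance']
print(product.name)  # Laptop
```

A side benefit of frozen=True is that instances become hashable, so they can be used as dictionary keys or set members.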
5. Working With JSON
Pydantic seamlessly integrates with JSON data. You can easily parse JSON into Pydantic models and convert models back to JSON:
from typing import List
from pydantic import BaseModel

class Comment(BaseModel):
    id: int
    text: str

class Post(BaseModel):
    id: int
    title: str
    content: str
    comments: List[Comment]

# JSON data
json_data = """
{
    "id": 1,
    "title": "Hello, Pydantic!",
    "content": "This is a post about Pydantic.",
    "comments": [
        {"id": 1, "text": "Great post!"},
        {"id": 2, "text": "Thanks for sharing."}
    ]
}
"""

# Parse JSON data into a Pydantic model
post = Post.model_validate_json(json_data)
print(post)

# Convert Pydantic model back to JSON
post_json = post.model_dump_json()
print(post_json)
This example demonstrates how Pydantic can easily parse JSON data into Python objects and vice versa.
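A minimal sketch of the round trip and of malformed input, assuming Pydantic V2. model_dump() returns plain Python data, model_dump_json() returns a JSON string, and model_validate_json() reports unparseable JSON as a ValidationError of type "json_invalid" rather than a json.JSONDecodeError.

```python
import json
from typing import List
from pydantic import BaseModel, ValidationError

class Comment(BaseModel):
    id: int
    text: str

class Post(BaseModel):
    id: int
    title: str
    content: str
    comments: List[Comment]

post = Post(id=1, title="Hello, Pydantic!", content="This is a post about Pydantic.",
            comments=[Comment(id=1, text="Great post!")])

as_dict = post.model_dump()               # nested dicts and lists
as_json = post.model_dump_json(indent=2)  # pretty-printed JSON string
print(json.loads(as_json) == as_dict)     # True

# Parsing the JSON back yields an equal model
round_tripped = Post.model_validate_json(as_json)
print(round_tripped == post)  # True

json_error_types = []
try:
    Post.model_validate_json('{"id": 1, "title": "truncated"')  # malformed JSON
except ValidationError as exc:
    json_error_types = [err["type"] for err in exc.errors()]
print(json_error_types)  # ['json_invalid']
```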
Additional Features
Settings Management
Pydantic can be used to manage application settings, providing type-safe configuration handling:
from pydantic_settings import BaseSettings
class Settings(BaseSettings):
    database_url: str
    api_key: str
    debug_mode: bool = False
settings = Settings()
print(settings)
This allows you to load settings from environment variables or configuration files while ensuring type safety and providing default values.
Schema Generation
Pydantic can automatically generate a JSON Schema from your models, which is useful for API documentation. Here is the schema for the User model from the first example:
import json
print(json.dumps(User.model_json_schema(), indent=2))
Output:
{
  "properties": {
    "username": {
      "title": "Username",
      "type": "string"
    },
    "age": {
      "title": "Age",
      "type": "integer"
    },
    "email": {
      "title": "Email",
      "type": "string"
    },
    "is_active": {
      "default": true,
      "title": "Is Active",
      "type": "boolean"
    },
    "tags": {
      "default": [],
      "items": {
        "type": "string"
      },
      "title": "Tags",
      "type": "array"
    },
    "profile_picture": {
      "anyOf": [
        {
          "type": "string"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "title": "Profile Picture"
    }
  },
  "required": [
    "username",
    "age",
    "email"
  ],
  "title": "User",
  "type": "object"
}
This feature is particularly valuable when working with OpenAPI (Swagger) specifications.
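Field constraints and metadata are carried into the generated schema, which is what makes it useful in OpenAPI documents. A minimal sketch, assuming Pydantic V2 and a hypothetical Product model: description and examples passed to Field appear in the field's schema, and gt / min_length become exclusiveMinimum / minLength.

```python
from pydantic import BaseModel, Field

class Product(BaseModel):
    id: int = Field(gt=0, description="Unique product identifier")
    name: str = Field(min_length=3, description="Display name", examples=["Laptop"])

schema = Product.model_json_schema()

# Constraints and metadata end up in each property's schema
print(schema["properties"]["id"])
print(schema["properties"]["name"])
print(schema["required"])  # ['id', 'name']
```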
You can find the Jupyter notebook with the Pydantic code here.
Conclusion
Pydantic is a versatile and powerful tool for data validation in Python. Its use of type annotations makes it intuitive for Python developers, while its extensive features allow for complex validation scenarios. By using Pydantic in your projects, you can ensure data integrity, improve code readability, and catch errors early in the development process. Whether you are building APIs, working with configuration files, or processing data from various sources, Pydantic can significantly simplify your data-handling tasks and make your code more robust and maintainable.
Opinions expressed by DZone contributors are their own.