Streamline Data Validation in Python with Cerberus
Introduction to Data Validation in Python
In Python programming, utilizing dictionaries to manage data objects is quite common, particularly in web development. When creating backend services, verifying JSON payloads from the front end is crucial. Similarly, data science applications often require data validation.
Traditionally, many developers resort to using extensive if-else statements for validation. While this method may work for a couple of straightforward attributes, it is not scalable. Object-oriented solutions might seem advanced, yet they can lead to unnecessary complexity in simpler applications.
This article introduces Cerberus, a third-party library that greatly simplifies validation. It offers reusable, flexible validation rules that cover a wide range of complex scenarios.
Getting Started with Cerberus
Installation
Installing third-party libraries in Python is straightforward. Simply execute the following pip command:
pip install cerberus
After installation, we can move on to practical usage.
Basic Usage
To begin, we import the Validator class from the cerberus package:
from cerberus import Validator
Next, we define a "schema," which contains all the rules for validating our data dictionary:
schema = {
    'name': {'type': 'string'},
    'age': {'type': 'integer'}
}
This schema specifies that the dictionary should have two fields: "name" (a string) and "age" (an integer).
Now, we initialize our validator using this schema:
profile_validator = Validator(schema)
Let's create a sample dictionary to validate:
my_profile = {'name': 'Christopher Tao', 'age': 34}
profile_validator.validate(my_profile)
The validation returns True, indicating the dictionary meets the criteria. If we alter the "age" value to a string, the validation will fail.
Understanding Validation Errors
To identify the reasons for validation failure, we can access the error messages stored within the validator:
profile_validator.errors
More Complex Validation Rules
Cerberus supports more intricate validation rules beyond basic data types. For instance, if we want to ensure users are at least 18 years old, we can add a minimum age requirement to our schema:
profile_validator = Validator()
my_profile = {'name': 'Alice', 'age': 16}
profile_validator.validate(document=my_profile, schema={
    'name': {'type': 'string'},
    'age': {'type': 'integer', 'min': 18}
})
In this instance, the validator is initialized without a schema and instead receives a schema at the time of validation, allowing for more dynamic rule adjustments.
Validating Nested Dictionaries
Cerberus also accommodates nested dictionaries. If our profile includes an address with street number and name, we can define the schema as follows:
profile_validator.validate({
    'name': 'Chris',
    'address': {
        'street_no': '1',
        'street_name': 'One St'
    }
}, {
    'name': {'type': 'string'},
    'address': {
        'type': 'dict',
        'schema': {
            'street_no': {'type': 'integer'},
            'street_name': {'type': 'string'}
        }
    }
})
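Here, the "address" sub-document is specified as a dictionary with its own schema. Note that this particular document fails validation, because street_no is the string '1' while the nested schema expects an integer.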
Handling Unknown Fields
A common challenge in data validation is dealing with unknown fields. By default, if our validator is set up to expect only a "name" field, any additional fields will cause validation to fail. Cerberus offers several ways to relax this behavior.
Allowing Unknown Fields
To permit unknown fields to pass validation, we can set the allow_unknown attribute to True:
profile_validator.allow_unknown = True
profile_validator.validate({'name': 'Chris', 'age': 34})
Ignoring Specific Data Types
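Alternatively, we may want to accept unknown fields only when they satisfy certain rules, such as being strings, while rejecting others such as integers. We can accomplish this by setting allow_unknown to a rule set (for example {'type': 'string'}) instead of True; unknown fields are then validated against those rules.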
Setting Allow Unknown at Initialization
If we anticipate needing to accept unknown fields, we can set the flag during validator initialization:
profile_validator = Validator({}, allow_unknown=True)
Allow Unknown for Sub-documents
We can also specify the allow_unknown setting at a sub-document level, maintaining strict validation at the root level while allowing flexibility in nested dictionaries.
Mandatory Fields
Cerberus enables us to enforce mandatory fields during validation. By default, every field in a schema is optional, so a missing field raises no error. To make all fields defined in the schema mandatory, we can set the require_all flag to True.
Customizing Required Fields
For specific fields, we can define them as required within the schema.
Normalizing Data
One of Cerberus's standout features is its ability to normalize data types. For instance, if we receive user profiles with varying representations of age (e.g., integer in one source and string in another), Cerberus can unify these types:
profile_validator = Validator({
'name': {'type': 'string'},
'age': {'coerce': int}
})
Normalizing the Dictionary
To normalize a dictionary, we call the normalized method:
my_profile_normalized = profile_validator.normalized(my_profile)
This will convert the age to an integer, ensuring consistency across our data.
Additional Validation Rules and Customization
Cerberus offers around 30 built-in validation rules, including those for regex matching, dependencies, and more. If the existing rules do not meet a specific need, custom rules can be defined, for example as functions attached with the check_with rule or as methods on a Validator subclass.
Conclusion
In this article, we've explored the Cerberus library for Python, which provides a streamlined approach to validating dictionaries. Its flexibility, support for handling unknown fields, and ability to define custom validation rules make it an invaluable tool for any developer.