Image by Author | Leonardo AI and Canva
Data serialization is a basic programming concept with great value in everyday programs. It refers to converting complex data objects into an intermediate format that can be easily saved and later converted back to the original form. However, Python's standard serialization modules, json and pickle, offer limited functionality: json handles only built-in types, and neither module validates the data it loads. With structured programs and object-oriented programming, we need more robust support for handling classes of data.
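To make that limitation concrete, here is a minimal sketch that uses only the standard library. The Product class below is a hypothetical stand-in written for this illustration, not part of any library.

```python
import json


class Product:
    """Hypothetical stand-in class for this illustration."""

    def __init__(self, name: str, price: float):
        self.name = name
        self.price = price


product = Product("Test Product", 10.6)

# json only understands built-in types, so a custom object is rejected.
try:
    json.dumps(product)
    json_failed = False
except TypeError:
    json_failed = True

# A manual workaround: serialize the instance's attribute dictionary.
payload = json.dumps(vars(product))

# Loading gives back a plain dict, and nothing checks the field types.
loaded = json.loads('{"name": 5, "price": "ABCD"}')
```

The workaround does produce JSON, but rebuilding the object and checking its fields is left entirely to us.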
Marshmallow is one of the most famous data handling libraries, widely used by Python developers to develop robust software applications. It supports data serialization and provides a robust abstract solution to handle data validation in an object-oriented paradigm.
In this article, we use the running example shown below to understand how to use Marshmallow in existing projects. The code shows three classes that represent a simple e-commerce model: Product, Customer, and Order. Each class minimally defines its parameters. We will see how to save an instance of an object and make sure that it is correct when we load it again in our code.
from typing import List

class Product:
    def __init__(self, _id: int, name: str, price: float):
        self._id = _id
        self.name = name
        self.price = price

class Customer:
    def __init__(self, _id: int, name: str):
        self._id = _id
        self.name = name

class Order:
    # Type hints use square brackets: List[Product], not List(Product)
    def __init__(self, _id: int, customer: Customer, products: List[Product]):
        self._id = _id
        self.customer = customer
        self.products = products
Introduction to Marshmallow
Installation
Marshmallow is available as a Python library on PyPI and can be easily installed using pip. To install or update the Marshmallow dependency, run the following command:
pip install -U marshmallow
This installs the latest stable version of Marshmallow into the active environment. If you want the development version of the library with all the latest features, you can install it with the following command:
pip install -U git+https://github.com/marshmallow-code/marshmallow.git@dev
Creating Schemas
Let's start by adding Marshmallow functionality to the Product class. We need to create a new class that represents the schema an instance of the Product class should follow. Think of a schema as a blueprint that defines the variables in the Product class and the data types they belong to.
Let's break down and understand the basic code below:
from marshmallow import Schema, fields

class ProductSchema(Schema):
    _id = fields.Int(required=True)
    name = fields.Str(required=True)
    price = fields.Float(required=True)
We create a new class that inherits from the Schema class in Marshmallow. Then, we declare the same variable names as in our Product class and define their field types. The fields module in Marshmallow supports several data types; here, we use the primitive types Int, Str, and Float.
Serialization
Now that we have a schema defined for our object, we can convert a Python class instance into a JSON string or Python dictionary for serialization. Here is the basic implementation:
product = Product(_id=4, name="Test Product", price=10.6)
schema = ProductSchema()
# For Python Dictionary object
result = schema.dump(product)
# type(dict) -> {'_id': 4, 'name': 'Test Product', 'price': 10.6}
# For JSON-serializable string
result = schema.dumps(product)
# type(str) -> {"_id": 4, "name": "Test Product", "price": 10.6}
We create an instance of our ProductSchema, which converts a Product object to a serializable format such as JSON or a dictionary.
Note the difference between the results of the dump and dumps functions. The first returns a Python dictionary, which can be saved using pickle, and the second returns a string that follows the JSON format.
Deserialization
To reverse the serialization process, we use deserialization. An object is saved so that it can be loaded and accessed later, and Marshmallow helps with that.
A Python dictionary can be validated using the load function, which checks the fields and their associated data types. The following code shows how this works:
product_data = {
    "_id": 4,
    "name": "Test Product",
    "price": 50.4,
}
result = schema.load(product_data)
print(result)
# type(dict) -> {'_id': 4, 'name': 'Test Product', 'price': 50.4}

faulty_data = {
    "_id": 5,
    "name": "Test Product",
    "price": "ABCD"  # Wrong input datatype
}
result = schema.load(faulty_data)
# Raises ValidationError
The schema validates that the dictionary has the correct fields and data types. If the validation fails, a ValidationError is raised, so it is essential to wrap the load call in a try-except block. If validation succeeds, the result is still a dictionary, just like the input. Not that useful, is it? What we usually want is to validate the dictionary and convert it back to the original object it was serialized from.
To achieve this, we use the post_load decorator provided by Marshmallow:
from marshmallow import Schema, fields, post_load

class ProductSchema(Schema):
    _id = fields.Int(required=True)
    name = fields.Str(required=True)
    price = fields.Float(required=True)

    @post_load
    def create_product(self, data, **kwargs):
        return Product(**data)
We add a method to the schema class and mark it with the post_load decorator. This method takes the validated dictionary and converts it back into a Product object. The **kwargs parameter is important because Marshmallow passes additional keyword arguments, such as many and partial, to the decorated method.
With this change, the validated Python dictionary is passed to the post_load method, which creates a Product object. This allows an object to be deserialized using Marshmallow.
Validation
Often, we need additional validation specific to our use case. While data type validation is essential, it does not cover all the validation we might need. Even in this simple example, additional validation is needed for our Product object. We need to make sure that the price is greater than 0. We can also define more rules, such as making sure that our product name is between 3 and 128 characters long. These rules help ensure that our codebase conforms to a defined database schema.
Let's now see how we can implement this validation using Marshmallow:
from marshmallow import Schema, fields, validates, ValidationError, post_load

class ProductSchema(Schema):
    _id = fields.Int(required=True)
    name = fields.Str(required=True)
    price = fields.Float(required=True)

    @post_load
    def create_product(self, data, **kwargs):
        return Product(**data)

    # **kwargs absorbs any extra keyword arguments that some
    # Marshmallow versions pass to validator methods.
    @validates('price')
    def validate_price(self, value, **kwargs):
        if value <= 0:
            raise ValidationError('Price must be greater than zero.')

    @validates('name')
    def validate_name(self, value, **kwargs):
        if len(value) < 3 or len(value) > 128:
            raise ValidationError('Name of Product must be between 3 and 128 characters.')
We modified the ProductSchema class to add two new methods. One validates the price field and the other validates the name field. We use the validates decorator and pass it the name of the field that the method is supposed to validate. The implementation of these methods is simple: if the value is incorrect, we raise a ValidationError.
Nested Schemas
With validation for the Product class in place, we have covered the basic functionality that the Marshmallow library offers. Now, let's build up the complexity and see how the other two classes are validated.
The Customer class is quite simple, as it contains only basic attributes with primitive data types.
class CustomerSchema(Schema):
    _id = fields.Int(required=True)
    name = fields.Str(required=True)  # name is a string, matching Customer.name
However, defining the schema for the Order class forces us to learn a new and necessary concept: nested schemas. An order is associated with a specific customer, and the customer can order any number of products. This is defined in the class definition, and when we validate the Order schema, we also need to validate the Product and Customer objects that are passed to it.
Instead of redefining everything in the OrderSchema, we avoid repetition and use nested schemas. The order schema is defined as follows:
class OrderSchema(Schema):
    _id = fields.Int(required=True)
    customer = fields.Nested(CustomerSchema, required=True)
    products = fields.List(fields.Nested(ProductSchema), required=True)
Within the Order schema, we include the ProductSchema and CustomerSchema definitions. This ensures that validations defined for these schemas are applied automatically, following the DRY (Don't Repeat Yourself) principle in programming, which encourages the reuse of existing code.
Conclusion
In this article, we covered a quick start and the basic use cases of the Marshmallow library, one of the most popular data validation and serialization libraries in Python. Although it is similar to Pydantic, many developers prefer Marshmallow because of its schema definition method, which resembles validation libraries in other languages such as JavaScript.
Marshmallow is easy to integrate with Python backend frameworks like FastAPI and Flask and with ORMs like SQLAlchemy, making it a popular choice for web applications and data validation tasks.
Kanwal Mehreen is a machine learning engineer and technical writer with a deep passion for data science and the intersection of AI with medicine. She is the co-author of the eBook “Maximizing Productivity with ChatGPT.” As a Google Generation Scholar 2022 for APAC, she champions diversity and academic excellence. She is also recognized as a Teradata Diversity in Tech Scholar, Mitacs Globalink Research Scholar, and Harvard WeCode Scholar. Kanwal is an ardent advocate for change and founded FEMCodes to empower women in STEM fields.