Marshmallow: The most complete Python library for data serialization and validation

Author's image | Leonardo ai and Canva

Data serialization is a fundamental programming concept with real value in everyday programs. It refers to converting complex data objects into an intermediate format that can be easily saved and later converted back to the original form. However, Python's standard serialization modules, such as json and pickle, offer limited functionality: json handles only built-in types out of the box, and neither module validates the data it loads. With structured programs and object-oriented programming, we need more robust support for handling classes of data.

Marshmallow is one of the most popular data handling libraries, widely used by Python developers to build reliable software applications. It supports data serialization and provides an abstract, object-oriented way to handle data validation.

In this article, we use the running example shown below to understand how to use Marshmallow in existing projects. The code shows three classes that represent a simple e-commerce model: Product, Customer, and Order. Each class minimally defines its parameters. We will see how to save an instance of an object and make sure that it is correct when we try to load it again in our code.

from typing import List

class Product:
    def __init__(self, _id: int, name: str, price: float):
        self._id = _id
        self.name = name
        self.price = price

class Customer:
    def __init__(self, _id: int, name: str):
        self._id = _id
        self.name = name

class Order:
    # List[Product] (square brackets) is the correct type-hint syntax
    def __init__(self, _id: int, customer: Customer, products: List[Product]):
        self._id = _id
        self.customer = customer
        self.products = products
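To make the limitation of the standard library concrete, here is a minimal sketch using only the json module. The Product class is repeated so the snippet runs on its own; the variable names json_error and manual are illustrative and not part of the original example.

```python
import json


class Product:
    def __init__(self, _id: int, name: str, price: float):
        self._id = _id
        self.name = name
        self.price = price


product = Product(_id=4, name="Test Product", price=10.6)

# json.dumps only understands built-in types, so a custom instance fails.
try:
    json.dumps(product)
    json_error = None
except TypeError as err:
    json_error = err

print(type(json_error).__name__)

# A manual workaround: serialize the instance attributes instead.
manual = json.dumps(vars(product))

# Loading gives back a plain dict, not a Product, and nothing checks
# that the keys or value types are what the class expects.
restored = json.loads(manual)
print(type(restored).__name__)
```

The workaround serializes the data, but type checking and rebuilding the object are left entirely to us, which is the gap a schema library is meant to fill.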

Introduction to Marshmallow

Installation

Marshmallow is available as a Python library on PyPI and can be easily installed using pip. To install or update the Marshmallow dependency, run the following command:

pip install -U marshmallow

This installs the latest stable version of Marshmallow into the active environment. If you want the development version of the library with all the latest features, you can install it with the following command:

pip install -U git+https://github.com/marshmallow-code/marshmallow.git@dev
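After installing, it can be useful to confirm which version ended up in the environment. The sketch below relies only on the standard library's importlib.metadata; the variable name installed is illustrative.

```python
from importlib.metadata import PackageNotFoundError, version

# Look up the installed distribution's version without importing the package.
try:
    installed = version("marshmallow")
except PackageNotFoundError:
    installed = None

print(installed or "marshmallow is not installed in this environment")
```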

Creating Schemas

Let's start by adding Marshmallow functionality to the Product class. We need to create a new class that represents the schema an instance of the Product class should follow. Think of a schema as a blueprint that defines the variables in the Product class and the data type each one belongs to.

Let's break down and understand the basic code below:

from marshmallow import Schema, fields

class ProductSchema(Schema):
    _id = fields.Int(required=True)
    name = fields.Str(required=True)
    price = fields.Float(required=True)

We create a new class that inherits from the Schema class in Marshmallow. Then, we declare the same variable names as in our Product class and define their field types. The fields module in Marshmallow supports many data types; here, we use the primitive types Int, Str, and Float.

Serialization

Now that we have a schema defined for our object, we can convert a Python class instance into a JSON string or Python dictionary for serialization. Here is the basic implementation:

product = Product(_id=4, name="Test Product", price=10.6)
schema = ProductSchema()
    
# For Python Dictionary object
result = schema.dump(product)

# type(dict) -> {'_id': 4, 'name': 'Test Product', 'price': 10.6}

# For JSON-serializable string
result = schema.dumps(product)

# type(str) -> {"_id": 4, "name": "Test Product", "price": 10.6}

We create an instance of our ProductSchema, which converts a Product object to a serializable format such as a JSON string or a dictionary.

Note the difference between the results of the dump and dumps methods. dump returns a Python dictionary, which can be saved using pickle, while dumps returns a string in JSON format.

Deserialization

To reverse the serialization process, we use deserialization. An object is saved so that it can be loaded and accessed later, and Marshmallow helps with that.

A Python dictionary can be validated using the load method, which checks the keys and their associated data types. The following code shows how this works:

product_data = {
    "_id": 4,
    "name": "Test Product",
    "price": 50.4,
}
result = schema.load(product_data)
print(result)  	

# type(dict) -> {'_id': 4, 'name': 'Test Product', 'price': 50.4}

faulty_data = {
    "_id": 5,
    "name": "Test Product",
    "price": "ABCD" # Wrong input datatype
}
result = schema.load(faulty_data) 

# Raises validation error

The schema validates that the dictionary has the correct parameters and data types. If the validation fails, a ValidationError is raised, so it is essential to wrap the load call in a try-except block. If validation succeeds, the result is still a dictionary, just like the original argument. Not that useful, is it? What we usually want is to validate the dictionary and convert it back to the original object it was serialized from.

To achieve this, we use the post_load decorator provided by Marshmallow:

from marshmallow import Schema, fields, post_load

class ProductSchema(Schema):
    _id = fields.Int(required=True)
    name = fields.Str(required=True)
    price = fields.Float(required=True)

    # The hook must be a method of the schema class, so it is indented
    @post_load
    def create_product(self, data, **kwargs):
        return Product(**data)

We create a method in the schema class with the post_load decorator. This method takes the validated dictionary and converts it back into a Product object. It also accepts **kwargs, which is important because Marshmallow may pass additional arguments (such as many and partial) to the decorated method.

With this change to the loading logic, the validated Python dictionary is passed to the post_load method, which creates a Product object. This allows an object to be fully deserialized using Marshmallow.

Validation

Often, we need additional validation specific to our use case. While data type validation is essential, it does not cover all the validation we might need. Even in this simple example, the Product object needs additional validation. We need to make sure that the price is greater than 0. We can also define more rules, such as making sure that the product name is between 3 and 128 characters long. These rules help ensure that our codebase conforms to a defined database schema.

Let's now see how we can implement this validation using Marshmallow:

from marshmallow import Schema, fields, validates, ValidationError, post_load

class ProductSchema(Schema):
    _id = fields.Int(required=True)
    name = fields.Str(required=True)
    price = fields.Float(required=True)

    @post_load
    def create_product(self, data, **kwargs):
        return Product(**data)

    # **kwargs absorbs extra keyword arguments that newer Marshmallow
    # versions may pass to validator methods
    @validates('price')
    def validate_price(self, value, **kwargs):
        if value <= 0:
            raise ValidationError('Price must be greater than zero.')

    @validates('name')
    def validate_name(self, value, **kwargs):
        if len(value) < 3 or len(value) > 128:
            raise ValidationError('Name of Product must be between 3 and 128 characters.')

We modified the ProductSchema class to add two new methods. One validates the price parameter and the other validates the name parameter. We use the validates decorator and pass it the name of the field that the method is supposed to validate. The implementation of these methods is simple: if the value is incorrect, we raise a ValidationError.

Nested Schemas

Now, with validation in place for the Product class, we have covered the basic functionality that the Marshmallow library offers. Let's build up the complexity and see how the other two classes are validated.

The Customer class is quite simple, as it contains only basic attributes with primitive data types.

class CustomerSchema(Schema):
    _id = fields.Int(required=True)
    name = fields.Str(required=True)  # name is a string, matching the Customer class

However, defining the schema for the Order class requires a new and necessary concept: nested schemas. An order is associated with a specific customer, and the customer can order any number of products. This is reflected in the class definition, so when we validate the Order schema, we also need to validate the Product and Customer objects that are passed to it.

Instead of redefining everything in the OrderSchema, we avoid repetition and use nested schemas. The order schema is defined as follows:

class OrderSchema(Schema):
    _id = fields.Int(required=True)  # the keyword is "required", not "require"
    customer = fields.Nested(CustomerSchema, required=True)
    products = fields.List(fields.Nested(ProductSchema), required=True)

Within the Order schema, we include the ProductSchema and CustomerSchema definitions. This ensures that the validations defined for these schemas are applied automatically, following the DRY (Don't Repeat Yourself) principle, which encourages the reuse of existing code.

Conclusion

In this article, we covered a quick start and a practical use case for the Marshmallow library, one of the most popular data validation and serialization libraries in Python. Although it is similar to Pydantic, many developers prefer Marshmallow because of its schema definition method, which resembles validation libraries in other languages such as JavaScript.

Marshmallow is easy to integrate with Python web frameworks like Flask and with ORMs like SQLAlchemy, making it a popular choice for web backends and data validation tasks. It can also be used alongside FastAPI, although FastAPI's built-in validation relies on Pydantic.

Kanwal Mehreen is a machine learning engineer and technical writer with a deep passion for data science and the intersection of AI with medicine. She is the co-author of the eBook “Maximizing Productivity with ChatGPT.” As a Google Generation Scholar 2022 for APAC, she champions diversity and academic excellence. She is also recognized as a Teradata Diversity in Tech Scholar, Mitacs Globalink Research Scholar, and Harvard WeCode Scholar. Kanwal is an ardent advocate for change and founded FEMCodes to empower women in STEM fields.
