Have you ever compared your Python code to that of experienced developers and felt a stark difference? Despite learning Python from online resources, there’s often a gap between beginner and expert-level code. That’s because experienced developers adhere to best practices established by the community. These practices are often overlooked in online tutorials but are crucial for large-scale applications. In this article, I will be sharing 7 tips that I use in my production code for clearer and more organized code.
1. Type Hinting and Annotations
Python is a dynamically typed programming language, where the variable types are inferred at runtime. While it allows for flexibility, it significantly reduces code readability and understanding in a collaborative setting.
Python provides support for type hinting in function declarations that serve as an annotation of the function argument types and the return types. Even though Python doesn’t enforce these types during runtime, it’s still helpful because it makes your code easier to understand for other people (and yourself!).
Starting with a basic example, here is a simple function declaration with type hinting:
def sum(a: int, b: int) -> int:
    return a + b
Here, even though the function is fairly self-explanatory, we see that the function parameters and return values are denoted as int type. The function body could be a single line, as here, or several hundred lines. Yet, we can understand the pre-conditions and return types just by looking at the function declaration.
It’s important to know that these annotations are just for clarity and guidance; they don’t enforce the types during execution. So, even if you pass in values of different types, like strings instead of integers, the function will still run. But be cautious: if you don’t provide the expected types, it might lead to unexpected behavior or errors during runtime. For instance, in the provided example, the function sum() expects two integers as arguments. But if you try to add a string and an integer, Python will throw a runtime error. Why? Because it doesn’t know how to add a string and an integer together! It’s like trying to add apples and oranges – it just doesn’t make sense. However, if both arguments are strings, it will concatenate them without any issue.
Here’s the clarified version with test cases:
print(sum(2,5)) # 7
# print(sum('hello', 2)) # TypeError: can only concatenate str (not "int") to str
# print(sum(3,'world')) # TypeError: unsupported operand type(s) for +: 'int' and 'str'
print(sum('hello', 'world')) # helloworld
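To make the point that annotations are metadata rather than runtime checks, here is a minimal sketch that inspects them directly. The function name `add_numbers` is an illustrative stand-in (it is not part of the example above) chosen to avoid shadowing the built-in `sum`.

```python
from typing import get_type_hints


def add_numbers(a: int, b: int) -> int:
    """Add two values; the annotations document intent but are not enforced."""
    return a + b


# Annotations are stored on the function object as plain metadata.
hints = get_type_hints(add_numbers)
print(hints)

# Python does not check them at call time, so floats pass through unchanged.
result = add_numbers(1.5, 2.5)
print(result)  # 4.0
```

A static type checker such as mypy reads the same annotations and would flag the second call before the code ever runs, which is where most of the practical benefit comes from.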
Typing Library for Advanced Type Hinting
For advanced annotations, Python includes the typing standard library. Let us see its use in a more interesting approach.
from typing import Union, List
import numpy as np

# Union takes its options in square brackets, not parentheses
def sum(variable: Union[np.ndarray, List]) -> float:
    total = 0
    # function body to calculate the sum of values in iterable
    return total
Here, we alter the same summation function that now accepts a numpy array or list iterable. It computes and returns their sum as a floating-point value. We utilize the Union annotation from the typing library to specify the possible types that the variable parameter can accept.
Let us further change the function declaration to show that the list members should also be of type float.
def sum(variable: Union[np.ndarray, List[float]]) -> float:
    total = 0
    # function body to calculate the sum of values in iterable
    return total
These are just some beginner examples to help understand type hinting in Python. As projects grow, and codebases become more modular, type annotations significantly enhance readability and maintainability. The typing library offers a rich set of features including Optional, various iterables, Generics, and support for custom-defined types, empowering developers to express complex data structures and relationships with precision and clarity.
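As a rough illustration of two of the features mentioned above, the sketch below uses Optional for a value that may be missing and a TypeVar for a simple generic function. The function names and the sample data are assumptions made up for this example.

```python
from typing import Dict, Optional, Sequence, TypeVar

T = TypeVar("T")


def first_item(items: Sequence[T]) -> Optional[T]:
    """Return the first element, or None when the sequence is empty."""
    return items[0] if items else None


def find_username(users: Dict[int, str], user_id: int) -> Optional[str]:
    """Look up a username; Optional[str] signals that None is a valid result."""
    return users.get(user_id)


users = {1: "alice", 2: "bob"}
print(find_username(users, 1))   # alice
print(find_username(users, 99))  # None
print(first_item([3.5, 4.0]))    # 3.5
print(first_item([]))            # None
```

The generic signature tells a reader (and a type checker) that whatever element type goes in is the type that comes out, without tying the function to one concrete type.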
2. Writing Defensive Functions and Input Validation
Even though type-hinting seems helpful, it is still error-prone as the annotations are not enforced. These are just extra documentation for the developers but the function will still be executed if different argument types are used. Therefore, there is a need to enforce the pre-conditions for a function and code in a defensive manner. Hence, we manually check these types and raise appropriate errors if the conditions are violated.
The below function shows how interest is calculated using the input parameters.
def calculate_interest(principal, rate, years):
    return principal * rate * years
It is a simple operation, yet will this function work for every possible input? No, not for the edge cases where invalid values are passed in. We need to ensure that the input values fall within a valid range for the function to execute correctly. In essence, some pre-conditions must be satisfied for the function implementation to be correct.
We do this as follows:
from typing import Union

def calculate_interest(
    principal: Union[int, float],
    rate: float,
    years: int
) -> Union[int, float]:
    # type checks: reject arguments of the wrong type
    if not isinstance(principal, (int, float)):
        raise TypeError("Principal must be an integer or float")
    if not isinstance(rate, float):
        raise TypeError("Rate must be a float")
    if not isinstance(years, int):
        raise TypeError("Years must be an integer")
    # value checks: reject arguments outside the valid range
    if principal <= 0:
        raise ValueError("Principal must be positive")
    if rate <= 0:
        raise ValueError("Rate must be positive")
    if years <= 0:
        raise ValueError("Years must be positive")
    interest = principal * rate * years
    return interest
Note that we use conditional statements for input validation. Python also has assertion statements that are sometimes used for this purpose. However, using assertions for input validation is not a best practice, as they can be disabled easily (for example, by running Python with the -O flag), which leads to unexpected behaviour in production. Explicit conditional statements are preferable for input validation and for enforcing pre-conditions, post-conditions, and code invariants.
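To see why assertions are fragile as a validation mechanism, the following sketch runs the same one-line program twice in a child interpreter, once normally and once with the -O flag. The program string is a made-up example, and the sketch assumes the PYTHONOPTIMIZE environment variable is not set.

```python
import subprocess
import sys

# A one-line program whose assertion always fails.
program = "assert False, 'validation failed'; print('reached')"

# Normal mode: the assertion raises AssertionError and the process exits non-zero.
normal = subprocess.run(
    [sys.executable, "-c", program], capture_output=True, text=True
)

# Optimized mode (-O): assert statements are stripped, so execution continues.
optimized = subprocess.run(
    [sys.executable, "-O", "-c", program], capture_output=True, text=True
)

print(normal.returncode != 0)    # True
print(optimized.stdout.strip())  # reached
```

An explicit `if ...: raise ValueError(...)` check behaves the same way in both modes, which is why it is the safer choice for validating input.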
3. Lazy Loading with Generators and Yield Statements
Consider a scenario, where you are provided with a large dataset of documents. You need to process the documents and perform certain operations on each document. However, due to the large size, you can not load all the documents in memory and pre-process them simultaneously.
A possible solution is to only load a document in memory when required and process only a single document at a time, also called lazy loading. Even though we know what documents we will need, we do not load a resource until it is required. There is no need to retain the bulk of documents in memory when they are not in active use in our code. This is exactly how generators and yield statements approach the problem.
Generators allow lazy loading, which improves the memory efficiency of Python code. Values are produced on the fly as they are needed, which reduces the memory footprint and lets processing start before the whole dataset has been read.
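import os

def load_documents(directory):
    for document_name in os.listdir(directory):
        # os.listdir returns bare file names, so join them with the directory
        document_path = os.path.join(directory, document_name)
        with open(document_path) as _file:
            # the file stays open only while the caller handles this document
            yield _file

def preprocess_document(document):
    filtered_document = None
    # preprocessing code for the document stored in filtered_document
    return filtered_document

directory = "docs/"
for doc in load_documents(directory):
    preprocess_document(doc)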
In the above code, the load_documents function uses the yield keyword. Calling it returns an object of type generator. When we iterate over this object, execution resumes from where the last yield statement left off. Therefore, only a single document is loaded and processed at a time, improving memory efficiency.
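The pause-and-resume behaviour can be observed without touching the file system. The sketch below is an assumption-laden toy: `read_items` and the `events` list are hypothetical names that record the order in which loading and processing happen.

```python
from typing import Iterator, List

events: List[str] = []


def read_items(names: List[str]) -> Iterator[str]:
    """Yield one item at a time, recording when each one is produced."""
    for name in names:
        events.append(f"load {name}")
        yield name


gen = read_items(["a.txt", "b.txt"])
print(type(gen).__name__)  # generator

# Nothing has run yet: the function body only starts on the first next() call.
print(events)  # []

first = next(gen)
events.append(f"process {first}")
second = next(gen)
events.append(f"process {second}")

print(events)
```

The final list interleaves loading and processing ("load a.txt", "process a.txt", "load b.txt", "process b.txt"), showing that the second item is not loaded until the first has been handled.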
4. Preventing Memory Leaks using Context Managers
For any language, efficient use of resources is of primary importance. We only load something in memory when required as explained above through the use of generators. However, it is equally important to close a resource when it is no longer needed by our program. We need to prevent memory leaks and perform proper resource teardown to save memory.
Context managers simplify the common use case of resource setup and teardown. It is important to release resources when they are not required anymore, even in case of exceptions and failures. Context managers reduce the risk of memory leaks using automatic cleanup while keeping the code concise and readable.
Resources come in many forms, such as database connections, locks, threads, network connections, memory access, and file handles. Let’s focus on the simplest case: file handles. The challenge here is ensuring that every file we open is closed, even when an error occurs. Failing to close a file leaks the underlying file descriptor, while using a handle after it has been closed raises a ValueError. To address this, file handling can be wrapped inside a try-finally block. This ensures that the file is closed properly, regardless of whether an error occurs during execution. Here’s how the implementation might look:
file_path = "example.txt"
file = None
try:
    file = open(file_path, 'r')
    contents = file.read()
    print("File contents:", contents)
finally:
    if file is not None:
        file.close()
However, Python provides a more elegant solution using context managers, which handle resource management automatically. Here’s how we can simplify the above code using the file context manager:
file_path = "example.txt"
with open(file_path, 'r') as file:
    contents = file.read()
    print("File contents:", contents)
In this version, we don’t need to explicitly close the file. The context manager takes care of it, preventing potential memory leaks.
While Python offers built-in context managers for file handling, we can also create our own for custom classes and functions. For class-based implementation, we define __enter__ and __exit__ dunder methods. Here’s a basic example:
class CustomContextManager:
    def __enter__(self):
        # Code to create instance of resource
        return self
    def __exit__(self, exc_type, exc_value, traceback):
        # Teardown code to close resource
        return None
Now, we can use this custom context manager within ‘with’ blocks:
with CustomContextManager() as _cm:
    print("Custom Context Manager Resource can be accessed here")
This approach maintains the clean and concise syntax of context managers while allowing us to handle resources as needed.
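For the function-based variant, the standard library's contextlib.contextmanager decorator turns a generator into a context manager. The sketch below is a minimal illustration; `managed_resource` and the `log` list are hypothetical names that stand in for a real resource and its setup and teardown.

```python
from contextlib import contextmanager
from typing import Iterator, List

log: List[str] = []


@contextmanager
def managed_resource(name: str) -> Iterator[str]:
    """Set up a (hypothetical) resource, hand it to the caller, then tear it down."""
    log.append(f"open {name}")
    try:
        yield name
    finally:
        # Runs on normal exit and when the with-body raises.
        log.append(f"close {name}")


with managed_resource("db") as resource:
    log.append(f"use {resource}")

try:
    with managed_resource("cache"):
        raise RuntimeError("failure inside the with-block")
except RuntimeError:
    log.append("error handled")

print(log)
```

Everything before the yield plays the role of `__enter__`, and the finally clause plays the role of `__exit__`. The log shows "close cache" recorded before "error handled", so teardown runs even when the body fails.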
5. Separation of Concerns with Decorators
We often see multiple functions with the same logic implemented explicitly. This is a prevalent code smell, and excessive code duplication makes the code difficult to maintain and unscalable. Decorators are used to encapsulate similar functionality in a single place. When a similar functionality is to be used by multiple other functions, we can reduce code duplication by implementing common functionality within a decorator. It follows Aspect-Oriented Programming (AOP) and the Single Responsibility principle.
Decorators are heavily used in Python web frameworks such as Django, Flask, and FastAPI. Let me illustrate their effectiveness by using one as middleware for logging. In a production setting, we need to know how long it takes to service a request. It is a common use case and will be shared across all endpoints. So, let us implement a simple decorator-based middleware that will log the time taken to service a request.
The dummy function below is used to service a user request.
def service_request():
    # Function body representing complex computation
    return True
Now, we need to log the time it takes for this function to execute. One way is to add logging within this function as follows:
import time
def service_request():
    start_time = time.time()
    # Function body representing complex computation
    print(f"Time Taken: {time.time() - start_time}s")
    return True
While this approach works, it leads to code duplication: every new route would need the same timing code repeated in its function body. Decorators let us remove this duplication.
The logging middleware will be implemented as below:
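import time
from functools import wraps

def request_logger(func):
    @wraps(func)  # preserve the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        start_time = time.time()
        # forward all arguments to the wrapped function
        res = func(*args, **kwargs)
        print(f"Time Taken: {time.time() - start_time}s")
        return res
    return wrapper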
In this implementation, the outer function is the decorator, which accepts a function as input. The inner function implements the logging functionality, and the input function is called within the wrapper.
Now, we simply decorate the original service_request function with our request_logger decorator:
@request_logger
def service_request():
    # Function body representing complex computation
    return True
Using the @ symbol passes the service_request function to the request_logger decorator. It logs the time taken and calls the original function without modifying its code. This separation of concerns allows us to easily add logging to other service methods in a similar manner like this:
@request_logger
def service_request():
    # Function body representing complex computation
    return True

@request_logger
def service_another_request():
    # Function body
    return True
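Real endpoints usually take arguments, so a timing decorator needs to forward them. The sketch below is one way this might look; `timed` and `get_user` are hypothetical names, and `time.perf_counter` is swapped in for `time.time` because it is intended for measuring short durations.

```python
import time
from functools import wraps
from typing import Any, Callable


def timed(func: Callable[..., Any]) -> Callable[..., Any]:
    """Log how long the wrapped function takes, forwarding all arguments."""

    @wraps(func)  # keep the original __name__ and docstring on the wrapper
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        start_time = time.perf_counter()
        result = func(*args, **kwargs)
        print(f"{func.__name__} took {time.perf_counter() - start_time:.6f}s")
        return result

    return wrapper


@timed
def get_user(user_id: int, verbose: bool = False) -> dict:
    """Hypothetical endpoint that takes positional and keyword arguments."""
    return {"id": user_id, "verbose": verbose}


response = get_user(42, verbose=True)
print(response)
print(get_user.__name__)  # get_user
```

Without `functools.wraps`, the last line would print `wrapper`, which makes logs and debugging output harder to read once many functions share the same decorator.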
6. Match Case Statements
Match statements were introduced in Python 3.10, so they are a fairly new addition to the Python syntax. They allow for simpler and more readable pattern matching, avoiding the boilerplate and branching of typical if-elif-else chains.
For pattern matching, match case statements are the more natural notation: each case describes the shape of the data it accepts, rather than a boolean expression as in a conditional statement. The following example from the Python documentation shows how match case statements offer flexibility over conditional statements.
def make_point_3d(pt):
    match pt:
        case (x, y):
            return Point3d(x, y, 0)
        case (x, y, z):
            return Point3d(x, y, z)
        case Point2d(x, y):
            return Point3d(x, y, 0)
        case Point3d(_, _, _):
            return pt
        case _:
            raise TypeError("not a point we support")
As per the documentation, without pattern matching, this function’s implementation would require several isinstance() checks, one or two len() calls, and a more convoluted control flow. Under the hood, the match example and the traditional Python version translate into similar code. However, with familiarity with pattern matching, the match case approach is likely to be preferred as it provides a clearer and more natural syntax.
Overall, match case statements offer an improved alternative for pattern matching, which will likely become more prevalent in newer codebases.
7. External Configuration Files
In production, the majority of our code relies on external configuration parameters like API keys, passwords, and various settings. Hardcoding these values directly into the code is considered poor practice for scalability and security reasons. Instead, it’s crucial to keep configurations separate from the code itself. We commonly achieve this using configuration files such as JSON or YAML to store these parameters, ensuring they’re easily accessible to the code without being directly embedded within it.
An everyday use case is database connections that have multiple connection parameters. We can keep these parameters in a separate YAML file.
# config.yaml
database:
  host: localhost
  port: 5432
  username: myuser
  password: mypassword
  dbname: mydatabase
To handle this configuration, we define a class called DatabaseConfig:
class DatabaseConfig:
    def __init__(self, host, port, username, password, dbname):
        self.host = host
        self.port = port
        self.username = username
        self.password = password
        self.dbname = dbname

    @classmethod
    def from_dict(cls, config_dict):
        return cls(**config_dict)
Here, the from_dict class method serves as a builder method for the DatabaseConfig class, allowing us to create a database configuration instance from a dictionary.
In our main code, we can employ parameter hydration and the builder method to create a database configuration. By reading the external YAML file, we extract the database dictionary and use it to instantiate the config class:
import yaml

def load_config(filename):
    with open(filename, "r") as file:
        return yaml.safe_load(file)

config = load_config("config.yaml")
# the loaded config is a dictionary, so index it with square brackets
db_config = DatabaseConfig.from_dict(config["database"])
This approach eliminates the need for hardcoding database configuration parameters directly into the code. It also offers an improvement over using argument parsers, as we no longer need to pass multiple parameters every time we run our code. Moreover, by accessing the config file path through an argument parser, we can ensure that the code remains flexible and doesn’t rely on hardcoded paths. This method facilitates easier management of configuration parameters, which can be modified at any time without requiring changes to the codebase.
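Passing the config file path through an argument parser could be sketched as follows. The `--config` flag name is an assumption, and JSON is used here instead of YAML only so that the example runs with the standard library alone; the demo writes a throwaway file so it is self-contained.

```python
import argparse
import json
import tempfile
from pathlib import Path
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Read the config file location from a (hypothetical) --config flag."""
    parser = argparse.ArgumentParser(description="Load settings from a file")
    parser.add_argument(
        "--config", default="config.json", help="path to the config file"
    )
    return parser.parse_args(argv)


def load_config(filename: str) -> dict:
    """Parse a JSON config file into a dictionary."""
    with open(filename, "r") as file:
        return json.load(file)


# Demo: write a throwaway config file, then load it through the flag.
with tempfile.TemporaryDirectory() as tmp_dir:
    config_path = Path(tmp_dir) / "config.json"
    config_path.write_text(
        json.dumps({"database": {"host": "localhost", "port": 5432}})
    )

    args = parse_args(["--config", str(config_path)])
    config = load_config(args.config)

print(config["database"]["host"], config["database"]["port"])  # localhost 5432
```

In a real script, calling `parse_args()` with no argument reads the flag from the command line, so the same code can be pointed at a different file per environment without any edits.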
Ending Notes
In this article, we discussed some of the best practices used in the industry for production-ready code. These are common industry practices that alleviate multiple problems one can face in real-life situations.
Nonetheless, it is worth noting that despite all such best practices, documentation, docstrings, and test-driven development are by far the most essential practices. It is important to think about what a function is supposed to do and then document all design decisions and implementations for the future as people working on a codebase change over time. If you have any insights or practices you swear by, please do not hesitate to let us know in the comment section below.
Kanwal Mehreen Kanwal is a machine learning engineer and a technical writer with a profound passion for data science and the intersection of AI with medicine. She co-authored the ebook “Maximizing Productivity with ChatGPT”. As a Google Generation Scholar 2022 for APAC, she champions diversity and academic excellence. She’s also recognized as a Teradata Diversity in Tech Scholar, Mitacs Globalink Research Scholar, and Harvard WeCode Scholar. Kanwal is an ardent advocate for change, having founded FEMCodes to empower women in STEM fields.