Writing efficient Python code is important for optimizing performance and resource usage, whether you're working on data science projects, building web applications, or tackling other programming tasks.
By using Python's powerful features and best practices, you can reduce computational time and improve the responsiveness and maintainability of your applications.
In this tutorial, we'll explore five essential tips for writing more efficient Python code, with a coding example for each. Let's get started.
1. Use list comprehensions instead of loops
You can use list comprehensions to create lists from existing lists and other iterables, such as strings and tuples. They are generally more concise and faster than the usual loops for list operations.
Let's say we have a dataset of user information and we want to extract the names of users who have a score greater than 85.
Using a loop
First, we'll do this using a for loop and an if statement:
data = [{'name': 'Alice', 'age': 25, 'score': 90},
        {'name': 'Bob', 'age': 30, 'score': 85},
        {'name': 'Charlie', 'age': 22, 'score': 95}]

# Using a loop
result = []
for row in data:
    if row['score'] > 85:
        result.append(row['name'])

print(result)
You should get the following result:
Output >>> ['Alice', 'Charlie']
Using a list comprehension
Now, let's rewrite it using a list comprehension. You can use the generic syntax [output for input in iterable if condition] like so:
data = [{'name': 'Alice', 'age': 25, 'score': 90},
        {'name': 'Bob', 'age': 30, 'score': 85},
        {'name': 'Charlie', 'age': 22, 'score': 95}]

# Using a list comprehension
result = [row['name'] for row in data if row['score'] > 85]

print(result)
This should give you the same result:
Output >>> ['Alice', 'Charlie']
As you can see, the list comprehension version is more concise and easier to maintain. You can try other examples and profile your code with timeit to compare the execution times of loops versus list comprehensions.
Therefore, list comprehensions allow you to write more readable and efficient Python code, especially in list transformation and filtering operations. But be careful not to overuse them. Read Why You Shouldn't Overuse List Comprehensions in Python to learn why overusing them can become too much of a good thing.
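To act on the profiling suggestion above, here is a minimal sketch that times both versions with timeit. The synthetic dataset, its size, and the helper names with_loop and with_comprehension are assumptions made for illustration; actual timings will vary with your machine and Python version.

```python
import timeit

# Hypothetical synthetic dataset: 10,000 user records with scores cycling 50..99
data = [{'name': f'user{i}', 'score': 50 + i % 50} for i in range(10_000)]

def with_loop():
    result = []
    for row in data:
        if row['score'] > 85:
            result.append(row['name'])
    return result

def with_comprehension():
    return [row['name'] for row in data if row['score'] > 85]

# Both versions must produce the same list before comparing their timings
assert with_loop() == with_comprehension()

loop_time = timeit.timeit(with_loop, number=100)
comp_time = timeit.timeit(with_comprehension, number=100)
print(f"Loop: {loop_time:.4f} s")
print(f"List comprehension: {comp_time:.4f} s")
```

Wrapping each version in a function keeps the comparison fair, since both then pay the same call overhead.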
2. Use generators for efficient data processing
You can use generators in Python to iterate over large data sets and sequences without having to store them all in memory in advance. This is particularly useful in applications where memory efficiency is important.
Unlike regular Python functions that use the return keyword to return the entire sequence at once, generator functions use yield and return a generator object, which you can then loop through to get individual elements on demand, one at a time.
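As a small illustration of on-demand evaluation, the sketch below uses a hypothetical countdown generator function (it is not part of this tutorial's examples). Calling it returns a generator object without running the body; each next() call resumes the function until the next yield.

```python
from typing import Iterator

def countdown(n: int) -> Iterator[int]:
    """Yield n, n-1, ..., 1, one value at a time."""
    while n > 0:
        yield n
        n -= 1

gen = countdown(3)       # nothing runs yet; gen is a generator object
first = next(gen)        # runs the body up to the first yield
remaining = list(gen)    # consumes the values that are left
print(first, remaining)  # 3 [2, 1]
```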
Suppose we have a large CSV file with user data and we want to process each row, one at a time, without loading the entire file into memory at once.
Here is the generator function for this:
import csv
from typing import Dict, Generator

def read_large_csv_with_generator(file_path: str) -> Generator[Dict[str, str], None, None]:
    # newline='' lets the csv module handle line endings itself
    with open(file_path, 'r', newline='') as file:
        reader = csv.DictReader(file)
        for row in reader:
            # Yield one row at a time instead of building a full list
            yield row

# Path to a sample CSV file
file_path = "large_data.csv"

for row in read_large_csv_with_generator(file_path):
    print(row)
Note: Remember to replace 'large_data.csv' with the path to your file in the above snippet.
As you can see, using generators is especially useful when working with streaming data or when the size of the dataset exceeds the available memory.
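To get a rough sense of the memory difference, the sketch below compares a fully built list with an equivalent generator expression using sys.getsizeof. Note that getsizeof reports only the size of the container object itself, not of the elements it refers to, so treat the numbers as indicative. The range size is an arbitrary choice.

```python
import sys

n = 1_000_000  # arbitrary size chosen for illustration

squares_list = [i * i for i in range(n)]  # all values stored at once
squares_gen = (i * i for i in range(n))   # values produced on demand

list_size = sys.getsizeof(squares_list)
gen_size = sys.getsizeof(squares_gen)
print(f"List: {list_size} bytes")
print(f"Generator: {gen_size} bytes")

# Both produce the same values, so aggregate results match
assert sum(squares_gen) == sum(squares_list)
```

Keep in mind that a generator can be consumed only once; after the sum above, squares_gen is exhausted.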
For a more detailed review of generators, read Introduction to Python Generators.
3. Cache expensive function calls
Caching can significantly improve performance by storing the results of expensive function calls and reusing them when the function is called again with the same inputs.
Suppose you are coding a k-means clustering algorithm from scratch and you want to cache the computed Euclidean distances. Here is how you can cache function calls with the @cache decorator:
from functools import cache
from typing import Tuple
import numpy as np

@cache
def euclidean_distance(pt1: Tuple[float, float], pt2: Tuple[float, float]) -> float:
    # Arguments are tuples (hashable), so results can be cached
    return np.sqrt((pt1[0] - pt2[0]) ** 2 + (pt1[1] - pt2[1]) ** 2)

def assign_clusters(data: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    clusters = np.zeros(data.shape[0])
    for i, point in enumerate(data):
        # Convert NumPy rows to tuples so they can be used as cache keys
        distances = [euclidean_distance(tuple(point), tuple(centroid)) for centroid in centroids]
        clusters[i] = np.argmin(distances)
    return clusters
Let's take the following function call example:
data = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 4.0], [8.0, 9.0], [9.0, 10.0]])
centroids = np.array([[2.0, 3.0], [8.0, 9.0]])

print(assign_clusters(data, centroids))
This outputs:
Output >>> [0. 0. 0. 1. 1.]
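To check that the cache is actually being hit, you can inspect the statistics that functools.cache (available from Python 3.9) attaches to the decorated function. The sketch below uses a hypothetical slow_square function with an artificial delay, an assumption for illustration that is unrelated to the k-means example, and reads its cache_info().

```python
import time
from functools import cache

@cache
def slow_square(x: int) -> int:
    """Hypothetical expensive function; the sleep stands in for real work."""
    time.sleep(0.1)
    return x * x

start = time.perf_counter()
slow_square(4)                       # first call: computed and stored
first_call = time.perf_counter() - start

start = time.perf_counter()
slow_square(4)                       # second call: served from the cache
second_call = time.perf_counter() - start

print(f"First call: {first_call:.4f} s, second call: {second_call:.6f} s")
print(slow_square.cache_info())      # hits=1, misses=1 at this point
```

Cached arguments must be hashable, which is why the k-means example converts each NumPy row to a tuple before calling the cached function.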
For more information, read How to Speed Up Python Code with Caching.
4. Use context managers for resource management
In Python, context managers ensure that resources (such as files, database connections, and threads) are set up and released properly, even if an error occurs while they are in use.
Let's say you need to query a database and you want to make sure the connection is closed properly after use:
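import sqlite3
from contextlib import closing

def query_database(db_path, query):
    # A sqlite3 connection used directly in a with statement only commits or
    # rolls back the transaction, so closing() is used to actually close it
    with closing(sqlite3.connect(db_path)) as conn:
        cursor = conn.cursor()
        cursor.execute(query)
        for row in cursor.fetchall():
            yield row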
Now you can try to run queries on the database:
query = "SELECT * FROM users"

for row in query_database('people.db', query):
    print(row)
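The following self-contained sketch shows the cleanup guarantee in action with an in-memory SQLite database, so it runs without an existing file. The users table and its rows are made up for illustration. It relies on contextlib.closing because, in the standard library, using a sqlite3 connection directly in a with statement manages the transaction (commit or rollback) rather than closing the connection.

```python
import sqlite3
from contextlib import closing

with closing(sqlite3.connect(":memory:")) as conn:
    # Hypothetical table and data, created only for this demonstration
    conn.execute("CREATE TABLE users (name TEXT, score INTEGER)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("Alice", 90), ("Bob", 85)],
    )
    rows = conn.execute("SELECT name, score FROM users ORDER BY name").fetchall()

print(rows)  # [('Alice', 90), ('Bob', 85)]

# The connection is closed once the with block exits
try:
    conn.execute("SELECT 1")
    connection_closed = False
except sqlite3.ProgrammingError:
    connection_closed = True
print(connection_closed)  # True
```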
To learn more about the uses of context managers, read 3 Interesting Uses of Python Context Managers.
5. Vectorize operations using NumPy
NumPy allows you to perform element-by-element operations on arrays (like operations on vectors) without the need for explicit loops. This is often much faster than loops because NumPy uses C in the background.
Let's say we have two large arrays representing the scores of two different tests and we want to calculate the average score for each student. Let's do this using a loop:
import numpy as np

# Sample data
scores_test1 = np.random.randint(0, 100, size=1000000)
scores_test2 = np.random.randint(0, 100, size=1000000)

# Using a loop
average_scores_loop = []
for i in range(len(scores_test1)):
    average_scores_loop.append((scores_test1[i] + scores_test2[i]) / 2)

print(average_scores_loop[:10])
Here's how you can rewrite this with NumPy's vectorized operations:
# Using NumPy vectorized operations
average_scores_vectorized = (scores_test1 + scores_test2) / 2

print(average_scores_vectorized[:10])
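As a quick sanity check, the sketch below confirms on a small seeded sample that the element-by-element and vectorized versions agree. The seed and array size are arbitrary choices, and np.mean over the stacked arrays is shown as one alternative way to express the same average.

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # seeded for reproducibility
scores_test1 = rng.integers(0, 100, size=1_000)
scores_test2 = rng.integers(0, 100, size=1_000)

# Element-by-element version in pure Python
average_loop = [(a + b) / 2 for a, b in zip(scores_test1, scores_test2)]

# Vectorized version
average_vectorized = (scores_test1 + scores_test2) / 2

# Alternative: stack the two arrays and average across tests (axis 0)
average_stacked = np.mean([scores_test1, scores_test2], axis=0)

print(np.allclose(average_loop, average_vectorized))     # True
print(np.allclose(average_vectorized, average_stacked))  # True
```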
Loops vs. Vectorized Operations
Let's measure the execution times of the loop and NumPy versions using timeit:
import timeit

setup = """
import numpy as np
scores_test1 = np.random.randint(0, 100, size=1000000)
scores_test2 = np.random.randint(0, 100, size=1000000)
"""

loop_code = """
average_scores_loop = []
for i in range(len(scores_test1)):
    average_scores_loop.append((scores_test1[i] + scores_test2[i]) / 2)
"""

vectorized_code = """
average_scores_vectorized = (scores_test1 + scores_test2) / 2
"""

loop_time = timeit.timeit(stmt=loop_code, setup=setup, number=10)
vectorized_time = timeit.timeit(stmt=vectorized_code, setup=setup, number=10)

print(f"Loop time: {loop_time:.6f} seconds")
print(f"Vectorized time: {vectorized_time:.6f} seconds")
As you can see, vectorized operations with NumPy are much faster than the loop version:
Output >>>
Loop time: 4.212010 seconds
Vectorized time: 0.047994 seconds
Wrapping up
That's all for this tutorial!
We reviewed five tips (using list comprehensions instead of loops, leveraging generators for efficient processing, caching expensive function calls, managing resources with context managers, and vectorizing operations with NumPy) that can help optimize the performance of your code.
If you're looking for specific tips for data science projects, read 5 Python Best Practices for Data Science.
Bala Priya C (twitter.com/balawc27) is a developer and technical writer from India. She enjoys working at the intersection of mathematics, programming, data science, and content creation. Her areas of interest and expertise include DevOps, data science, and natural language processing. She enjoys reading, writing, programming, and drinking coffee! Currently, she is working on learning and sharing her knowledge with the developer community by creating tutorials, how-to guides, opinion pieces, and more. Bala also creates interesting resource overviews and programming tutorials.