Data scientists are in an exciting position: although their modern role requires them to program, their work must still keep many business considerations in mind. That's why the Python code data scientists write typically reflects a narrative about how to solve a business problem. Our working environment is also notable: we often use Jupyter Notebook, which offers a great way to experiment with data manipulation and model development.
Because data scientists code differently, they also approach parts of programming differently, including commenting, which is the practice of explaining your code. For data scientists, whose requirements change constantly and who often work collaboratively, explaining code properly through comments is crucial.
This article discusses how to comment Python code as a data scientist. We will cover various points that improve your commenting and add value for anyone who reads your code. Let's get into it.
Before we continue, let's briefly review the two types of comments. The first is the single-line comment, which starts with the '#' character and is generally used for short explanations of the code. The following code shows a single-line comment.
# Import the pandas package under the alias pd
import pandas as pd
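The other way to comment is the multi-line method, which uses triple quotes. Technically, these are not comments but string literals; when such a string is not assigned to anything, Python evaluates it and discards the result, so it has no effect on the program. The exception is a triple-quoted string placed as the first statement of a module, function, or class, which becomes that object's docstring. We can see a multi-line comment in action in the following example.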
"""
The code below imports the pandas package, which we refer to as pd throughout the whole working environment.
"""
import pandas as pd
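To make the difference between these forms concrete, here is a small sketch with a hypothetical scale function. The first triple-quoted string becomes the function's docstring (readable through __doc__ or help()), the second unassigned string is evaluated and discarded, and the '#' comment never reaches the running program at all.

```python
def scale(values, factor):
    """Multiply every value by factor and return a new list (this is the docstring)."""
    """This second triple-quoted string is not assigned, so Python discards it."""
    # A '#' comment is dropped when the code is tokenized; it never becomes an object.
    return [v * factor for v in values]


# The docstring is attached to the function and shows up in help(scale).
print(scale.__doc__)
print(scale([1, 2, 3], 2))  # [2, 4, 6]
```

This is why a multi-line "comment" at the top of a function is usually better written as a proper docstring: tools such as help() can then display it.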
In this section, we will discuss some general tips for commenting. They are not specific to data scientists, as they are best practices for all programmers, but they are worth remembering. The tips are:
- Place the comment on a separate line directly above the code it explains to improve readability.
- Keep a consistent comment style throughout the code you are working on.
- Avoid jargon and hard-to-understand technical terms if you know your audience wouldn't understand them.
- Only comment when it adds value; avoid explaining something obvious.
- Keep comments up to date, and update or remove them when they are no longer relevant.
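The sketch below contrasts a redundant comment with a useful one. The survey answers and the convention that 99 means "did not respond" are hypothetical, chosen only to show a comment placed above the code that explains why the code does something rather than restating what it does.

```python
# Redundant: this comment only restates the code.
count = 0  # set count to 0

# Useful: explains why the filter exists, on its own line above the code.
# Hypothetical convention: the survey stores 99 when a respondent did not answer,
# so those entries must be excluded before averaging.
answers = [4, 5, 99, 3, 99, 4]
valid_answers = [a for a in answers if a != 99]
average = sum(valid_answers) / len(valid_answers)
print(average)  # 4.0
```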
These are general guidelines for writing better comments. Now, let's move on to tips that are more specific to data scientists.
For data scientists, coding differs from the work of a software engineer or web developer, so commenting differs as well. Here are some tips specific to us data scientists.
1. Use comments to clarify complex processes or activities.
Data science work involves many experimental processes that could confuse readers, or our future selves, if we do not explain them. Comments help us explain our intent, especially when many steps are involved. For example, the following code normalizes the data and then removes outliers, with a comment explaining each step.
import numpy as np
from scipy import stats
# `data` is assumed to be an existing 1-D NumPy array of numeric values
# Perform data normalization (Min-Max scaling)
normalized_data = (data - np.min(data)) / (np.max(data) - np.min(data))
# Remove outliers using the three-sigma rule (keep points whose |z-score| is below 3)
removed_outlier_data = normalized_data[np.abs(stats.zscore(normalized_data)) < 3]
The comments above explain what each step does and the concept behind it. Naming the concepts we use in the code is essential for understanding what we have done.
This practice is not limited to preprocessing; any data science step can benefit from comments. From data retrieval to model monitoring, explaining each step so anyone can understand it is good practice. Remember that, as data scientists, our comments can become the bridge between code and analytical knowledge.
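The normalization-and-outlier snippet relies on NumPy, SciPy, and an existing data array. As a self-contained sketch of the same two steps, the version below uses only Python's standard library (statistics.fmean and statistics.pstdev) on a small hypothetical sample, with one comment per step so each concept is named. It uses the population standard deviation, which matches the default (ddof=0) of scipy.stats.zscore; the helper names are invented for this example.

```python
from statistics import fmean, pstdev


def minmax_scale(values):
    """Rescale values to the [0, 1] range (Min-Max scaling)."""
    lo, hi = min(values), max(values)
    # Assumes hi > lo; a constant column would cause a division by zero.
    return [(v - lo) / (hi - lo) for v in values]


def drop_outliers_3sigma(values):
    """Keep only points whose absolute z-score is below 3 (three-sigma rule)."""
    mean = fmean(values)
    # Population standard deviation (ddof=0), like scipy.stats.zscore's default.
    # Assumes the values are not all identical (std > 0).
    std = pstdev(values)
    return [v for v in values if abs((v - mean) / std) < 3]


# Hypothetical sample: 19 ordinary readings plus one extreme value.
data = [10.0] * 10 + [12.0] * 9 + [100.0]
normalized = minmax_scale(data)
cleaned = drop_outliers_3sigma(normalized)
print(len(cleaned))  # 19
```

Because z-scores do not change under Min-Max scaling (a shift plus a positive rescale), the normalization step does not affect which points are flagged; it only changes the units of the values that remain. With very small samples, a single outlier cannot reach a large z-score, so the rule can end up flagging nothing.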
2. Have a standard for commenting
Data science is a collaborative process, so it helps to have a standard comment structure that everyone understands. A standard is useful even if you work alone, because you always know where to find each piece of information. For example, you could standardize the comment for each function you write.
# Function: name of the function
# Usage: description of how to use the function
# Parameters: list the parameters and explain them
# Output: explain the output
The above is just one example; you can create your own standard. Whatever you choose, use the same style, language, and abbreviations consistently.
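As an illustration, here is the Function/Usage/Parameters/Output template applied to a small helper that fills missing values with the median. The function name and its behavior are hypothetical, invented for this example; statistics.median comes from the standard library.

```python
from statistics import median


# Function: fill_missing_with_median
# Usage: call fill_missing_with_median(values) before computing summary statistics
# Parameters: values -- list of numbers in which None marks a missing entry
# Output: a new list where every None is replaced by the median of the known values
def fill_missing_with_median(values):
    known = [v for v in values if v is not None]
    fill_value = median(known)
    return [fill_value if v is None else v for v in values]


print(fill_missing_with_median([3, None, 1, 2, None]))  # [3, 2, 1, 2, 2]
```

The same four fields could also live in a docstring; what matters is that every function in the project follows one agreed layout.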
3. Use comments to help workflow
In a collaborative environment, feedback is essential to help the team follow the workflow. Comments can flag new code updates or what needs to be done next. For example, if an update to another feature causes errors in our process, we can mark the affected code so that fixing it becomes the next task.
# TODO: Fix this function ASAP
some_function_to_fix()
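TODO comments are most useful when the team can find them. One possible approach, sketched below with the standard library's tokenize module, lists every comment that starts with a chosen tag; the find_todos helper and the TODO/FIXME tag list are assumptions for illustration, not part of any established team standard. Working with tokens rather than raw text means a "TODO" inside a string literal is not reported.

```python
import io
import tokenize


def find_todos(source: str, tags=("TODO", "FIXME")):
    """Return (line_number, comment_text) for every comment that starts with a tag."""
    found = []
    # tokenize yields COMMENT tokens only for real '#' comments, so a "TODO"
    # inside a string literal is not picked up.
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT:
            text = tok.string.lstrip("#").strip()
            if text.startswith(tags):
                found.append((tok.start[0], text))
    return found


# Hypothetical source code to scan; it is only tokenized, never executed.
example = '''
import pandas as pd

# TODO: Fix this function after the feature update broke the date parsing
def some_function_to_fix():
    message = "TODO inside a string is ignored"
    return message
'''

for line_no, text in find_todos(example):
    print(f"line {line_no}: {text}")
```

A descriptive TODO (what broke and why) also helps whoever picks up the task more than a bare "fix ASAP".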
4. Implement Markdown notebook cells
The data scientist's IDE is notable because we use notebooks for experimenting. Notebook cells let us isolate each piece of code so it can run on its own without running the entire notebook. A cell is not limited to code; it can also be turned into a Markdown cell.
Markdown is a lightweight markup language that describes how text should look. In a Markdown cell, we can explain the code that follows in more detail than standard comments allow. You can even add tables, images, LaTeX equations, and much more.
For example, the image below shows how we used Markdown to explain our project, goal, and steps.
You can read more about Jupyter Markdown cells in the Jupyter documentation to better understand what they can do.
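To show what a Markdown cell actually is, the sketch below builds a minimal notebook with the standard json module, following the nbformat version 4 layout: a Markdown cell describing a hypothetical project, its goal, and its steps, followed by a code cell. The project text and the make_notebook helper are invented for illustration; in practice you would simply type the Markdown into a cell in Jupyter.

```python
import json


def make_notebook(markdown_text: str, code_text: str) -> dict:
    """Build a minimal nbformat v4 notebook with one Markdown cell and one code cell."""
    markdown_cell = {
        "cell_type": "markdown",
        "metadata": {},
        # Jupyter renders this source as formatted text instead of running it.
        "source": markdown_text,
    }
    code_cell = {
        "cell_type": "code",
        "metadata": {},
        "execution_count": None,
        "outputs": [],
        "source": code_text,
    }
    return {
        "cells": [markdown_cell, code_cell],
        "metadata": {},
        "nbformat": 4,
        # Minor version 4 does not require per-cell "id" fields (added in 4.5).
        "nbformat_minor": 4,
    }


# Hypothetical project description following a "project, goal, steps" layout.
intro = """# Customer Churn Analysis
**Goal:** estimate which customers are likely to cancel next quarter.

**Steps**
1. Load and clean the data
2. Remove outliers with the three-sigma rule
3. Train and evaluate a baseline model
"""

notebook_json = json.dumps(make_notebook(intro, "import pandas as pd"), indent=1)
```

Saving notebook_json to a file with the .ipynb extension should give a notebook that Jupyter opens with the formatted description rendered above the code cell.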
Comments are an integral part of a data scientist's work, as they help readers understand what the code does. For data scientists, commenting differs slightly from how software engineers or web developers do it, because our work process is different. That's why this article provides some tips you can use when commenting as a data scientist. The tips are:
- Use comments to clarify complex processes or activities
- Have a comment standard
- Use comments to help workflow
- Implement Markdown notebook cells
I hope that helps.
Cornellius Yudha Wijaya is an assistant data science manager and data writer. While working full-time at Allianz Indonesia, he loves sharing Python and data tips through social media and print media.