3-Minute Pandas
How can we print an entire data frame when running a Python script?
Getting a Python script to run without errors is only part of debugging. We also need to make sure that its functions behave as expected, and in exploratory data analysis that typically means checking how the data looks before and after a specific processing step.
So we print key data frames or variables during script execution to check that they are correct. However, a plain print call may display only the top and bottom rows of a data frame (as shown in the example below), which makes verification unnecessarily difficult.
Typically, a data frame is a pandas.DataFrame object, and if you print one directly you get something like this:
import pandas as pd
import numpy as np
data = np.random.randn(5000, 5)
df = pd.DataFrame(data, columns=['A', 'B', 'C', 'D', 'E'])
print(df.head(100))
You may have already noticed that the central part of the data frame is hidden by three dots. What if we really need to check what the top 100 rows are? For example, we want to check the output of a specific step in the middle of a large Python script, to make sure that the functions are executed as expected.
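As a quick check of why this happens, the sketch below reads the display.max_rows option and counts the lines pandas produces for the 100-row slice. It assumes a default pandas configuration (where display.max_rows is 60 in recent versions); the variable names are only illustrative.

```python
import numpy as np
import pandas as pd

# Same kind of random frame as in the example above.
df = pd.DataFrame(np.random.randn(5000, 5), columns=["A", "B", "C", "D", "E"])

# The limit that decides when a printed frame gets truncated.
max_rows = pd.get_option("display.max_rows")

# str(df) is the text that print(df) writes to the console.
text = str(df.head(100))
n_lines = len(text.splitlines())

print(f"display.max_rows = {max_rows}")
print(f"lines printed for 100 rows = {n_lines}")
```

Because 100 rows exceed the limit, the printed text is much shorter than 100 lines and ends with a summary of the frame's dimensions.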
set_option()
One of the easiest solutions is to change the maximum number of rows pandas displays:
pd.set_option('display.max_rows', 500)
print(df.head(100))
Here set_option is a pandas function that controls how pandas behaves, including the maximum number of rows or columns to display, as we did above. The first argument, display.max_rows, names the option that limits the number of rows displayed, and 500 is the value we assign to it.
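A minimal sketch of the full round trip, assuming a default pandas configuration at the start: read the option with pd.get_option, change it with pd.set_option, and restore the default with pd.reset_option. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(5000, 5), columns=["A", "B", "C", "D", "E"])

# Remember the value in effect before we change anything.
previous = pd.get_option("display.max_rows")

# Raise the limit so that a 100-row slice is no longer truncated.
pd.set_option("display.max_rows", 500)
full_text = str(df.head(100))  # the text print() would show

# Put the option back to its default value.
pd.reset_option("display.max_rows")
restored = pd.get_option("display.max_rows")

print(len(full_text.splitlines()))  # header line plus one line per row
print(previous, restored)
```

Note that the option is global: it affects every data frame printed after the call, not just the next one.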
Although this method is widely used, it is not ideal inside a Python script, especially if you have multiple data frames to print and want each of them to display a different number of rows.
For example, suppose I have a script structured like this:
## Code Block 1 ##
...
print(df1.head(20))
...
## Code Block 2 ##
...
print(df2.head(100))
...
## Code Block N ##
...
print(df_n)
...
We display a different number of top rows at each point in the script. Sometimes we want to see the full data frame printed, and sometimes we only care about its dimensions and structure without needing to see all of the data.
In such a case, we probably need to call pd.set_option() to set the desired display, or pd.reset_option() to restore the default options, every time we print a data frame, which is complicated and tedious.
## Code Block 1 ##
...
pd.set_option('display.max_rows', 20)
print(df1.head(20))
...
## Code Block 2 ##
...
pd.set_option('display.max_rows', 100)
print(df2.head(100))
...
## Code Block N ##
...
pd.reset_option('display.max_rows')
print(df_n)
...
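To illustrate why this bookkeeping is error-prone, here is a small runnable sketch. The frames df1 and df2 are hypothetical stand-ins for the data in each code block, and a default pandas configuration is assumed. It shows that a limit set in one block silently carries over to the next until it is reset.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the frames used in each code block.
df1 = pd.DataFrame(np.random.randn(200, 3), columns=["A", "B", "C"])
df2 = pd.DataFrame(np.random.randn(200, 3), columns=["A", "B", "C"])

# Block 1: we want all 100 rows, so we raise the limit.
pd.set_option("display.max_rows", 100)
block1 = str(df1.head(100))

# Block 2: we forgot to touch the option, so the limit from block 1
# still applies and all 80 rows are printed.
block2 = str(df2.head(80))

# Only after an explicit reset does the default truncation return.
pd.reset_option("display.max_rows")
block3 = str(df2.head(80))

for name, text in [("block1", block1), ("block2", block2), ("block3", block3)]:
    print(name, len(text.splitlines()), "lines")
```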
There is actually a more flexible and effective way to display the entire data frame without specifying display options for Pandas.
to_string()
to_string() converts the pd.DataFrame object directly to a string object, and when we print that string, the pandas display limit no longer applies.
pd.set_option('display.max_rows', 10)
print(df.head(100).to_string())
We can see above that even though I set the maximum number of rows to display to 10, to_string() lets us print the entire data frame of 100 rows.
The to_string() function converts an entire data frame to string format, so all the values and indices in the data frame are kept in the print step. Since set_option() only affects pandas objects, our printed string is not limited by the maximum number of rows set above.
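The difference can be checked by counting lines, as in this minimal sketch (variable names are illustrative, and the option is reset at the end so the change does not leak into later code):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(5000, 5), columns=["A", "B", "C", "D", "E"])

# Deliberately set a very small display limit.
pd.set_option("display.max_rows", 10)

limited = str(df.head(100))      # what print(df.head(100)) would show
full = df.head(100).to_string()  # a plain Python string with every row

# Restore the default so later prints are unaffected.
pd.reset_option("display.max_rows")

print(len(limited.splitlines()))  # only a handful of lines
print(len(full.splitlines()))     # one header line plus 100 rows
```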
So the strategy is that you don't need to configure anything via set_option(); you just use to_string() to see the entire data frame. This saves you from thinking about which option to set where in the script.
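One way to apply this across the code blocks of a script is a small helper like the one sketched here. The name preview and the frames df1 and df2 are hypothetical and not part of pandas; the helper simply combines head() with to_string().

```python
import numpy as np
import pandas as pd


def preview(frame: pd.DataFrame, n_rows: int) -> str:
    """Return the first n_rows of frame as text, ignoring display options."""
    return frame.head(n_rows).to_string()


# Hypothetical frames standing in for the data in each code block.
df1 = pd.DataFrame(np.random.randn(500, 3), columns=["A", "B", "C"])
df2 = pd.DataFrame(np.random.randn(500, 3), columns=["A", "B", "C"])

# Each call chooses its own row count; no global option is changed.
text_20 = preview(df1, 20)
text_100 = preview(df2, 100)

print(text_20)
print(len(text_100.splitlines()))
```

Each call site states how many rows it wants, so there is no global state to set or reset between blocks.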
Takeaway
- Use set_option('display.max_rows') when you have a constant number of rows to display throughout the script.
- Use to_string() if you want to print the entire pandas data frame no matter what pandas options have been set.
Thank you for reading! I hope you enjoy using the Pandas hack in your work!