What is natural sorting and why do we need it?
When working with Python iterables, such as lists, sorting is a common operation you will perform. To sort a list, you can use the list method sort() to sort it in place, or the sorted() function, which returns a new sorted list.
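As a quick illustration of the difference, here is a minimal sketch using a hypothetical list named numbers. It shows that sorted() leaves the original list untouched and returns a new one, while sort() reorders the list in place and returns None.

```python
# Hypothetical sample data, used only for illustration
numbers = [3, 1, 2]

# sorted() returns a new list and leaves the original unchanged
ordered_copy = sorted(numbers)
print(ordered_copy)  # [1, 2, 3]
print(numbers)       # [3, 1, 2]

# list.sort() sorts in place and returns None
result = numbers.sort()
print(result)   # None
print(numbers)  # [1, 2, 3]
```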
The sorted() function works well when you have a list of numbers or strings containing only letters. But what about strings that contain alphanumeric characters, such as file names, directory names, version numbers, and more? The sorted() function performs lexicographic sorting.
Look at this simple example:
# List of filenames
filenames = ["file10.txt", "file2.txt", "file1.txt"]
sorted_filenames = sorted(filenames)
print(sorted_filenames)
You will get the following result:
Output >>> ['file1.txt', 'file10.txt', 'file2.txt']
Well, 'file10.txt' appears before 'file2.txt' in the result. It's not the intuitive sort order we expected. This is because the sorted() function compares strings character by character using the ASCII values of the characters, not the numeric values of the digits: '1' comes before '2', so 'file10.txt' sorts ahead of 'file2.txt'. Enter natural sorting.
Natural sorting is a sorting technique that arranges elements in a way that reflects their natural order, particularly for alphanumeric data. Unlike lexicographic sorting, natural sorting interprets the numerical value of digits within strings and orders them accordingly, resulting in a more meaningful and expected sequence.
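To make the idea concrete, here is a rough standard-library sketch of a natural sort key. The helper name natural_key is hypothetical, and this is a simplified illustration of the concept, not how natsort is implemented: it splits each string into digit and non-digit chunks and converts the digit chunks to integers so they compare numerically.

```python
import re

def natural_key(text):
    """Split text into chunks and turn the digit runs into integers."""
    # The capturing group keeps the digit runs in the split result,
    # so chunks alternate: text, digits, text, digits, ...
    chunks = re.split(r"(\d+)", text)
    # Odd positions hold the captured digit runs; convert those to int
    return [int(chunk) if index % 2 else chunk
            for index, chunk in enumerate(chunks)]

filenames = ["file10.txt", "file2.txt", "file1.txt"]

# Plain string comparison looks at '1' versus '2' and stops there
print("file10.txt" < "file2.txt")  # True

print(natural_key("file10.txt"))  # ['file', 10, '.txt']
print(sorted(filenames, key=natural_key))
# ['file1.txt', 'file2.txt', 'file10.txt']
```

A sketch like this covers only unsigned integers; handling floats, signs, case, or locale takes more work, which is where a dedicated library helps.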
In this tutorial, we will explore natural sorting with the Python library natsort.
Getting started
To get started, you can install the natsort library using pip:
pip install natsort
As a best practice, install the required package in a virtual environment for the project. Because natsort requires Python 3.7 or later, be sure to use a recent version of Python, preferably Python 3.11 or later. To learn how to manage different versions of Python, read Too Many Python Versions to Manage? Pyenv to the Rescue.
Basic examples of natural sorting
We'll start with simple use cases where natural sorting is beneficial:
- File name sorting: When working with file names that contain digits, natural sorting ensures that files are arranged in the natural, intuitive order.
- Version sorting: Natural sorting is also useful for sorting strings of version numbers, ensuring that versions are ordered by their numeric values rather than their ASCII values, which might not reflect the desired release sequence.
Now let's proceed to code these examples.
Sort file names
Now that we have installed the natsort library, we can import it into our Python script and use the different functions that the library offers.
Let's go back to the first file name sorting example (the one we saw at the beginning of the tutorial), where lexicographic sorting with the sorted() function was not what we wanted.
Now let's sort the same list using the natsorted() function like this:
import natsort
# List of filenames
filenames = ["file10.txt", "file2.txt", "file1.txt"]
# Sort filenames naturally
sorted_filenames = natsort.natsorted(filenames)
print(sorted_filenames)
In this example, the natsorted() function from the natsort library is used to sort the list of file names naturally. As a result, the file names are arranged in the expected numerical order:
Output >>> ['file1.txt', 'file2.txt', 'file10.txt']
Sort version numbers
Let's take another similar example where we have strings indicating versions:
import natsort
# List of version numbers
versions = ["v-1.10", "v-1.2", "v-1.5"]
# Sort versions naturally
sorted_versions = natsort.natsorted(versions)
print(sorted_versions)
Here, the natsorted() function is applied to sort the list of version numbers naturally. The resulting sorted list maintains the correct numerical order of the versions:
Output >>> ['v-1.2', 'v-1.5', 'v-1.10']
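For comparison, here is a rough standard-library sketch of the same idea: turn each version string into a tuple of integers and sort on that. The helper name version_key is hypothetical, and it assumes every string has the form "v-" followed by dot-separated integers, an assumption natsort does not need.

```python
def version_key(version):
    """Turn 'v-1.10' into (1, 10) so the parts compare numerically."""
    # Assumes the 'v-' prefix used in the example above
    number_part = version.split("-", 1)[1]
    return tuple(int(part) for part in number_part.split("."))

versions = ["v-1.10", "v-1.2", "v-1.5"]

# Lexicographic order puts 1.10 first because '1' < '2'
print(sorted(versions))                   # ['v-1.10', 'v-1.2', 'v-1.5']

# Comparing integer tuples gives the release order
print(sorted(versions, key=version_key))  # ['v-1.2', 'v-1.5', 'v-1.10']
```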
Customize sorting with a key
When using the built-in sorted() function, you may have used the key parameter to customize the sort. Similarly, the natsorted() function also takes an optional key parameter that you can use to sort based on specific criteria.
Let's take an example: we have file_data, which is a list of tuples. The first element of each tuple (at index 0) is the file name and the second element (at index 1) is the file size.
Let's say we want to sort by file size in ascending order. So we set the key parameter to lambda x: x[1] so that the file size at index 1 is used as the sort key:
import natsort
# List of tuples containing filename and size
file_data = [
    ("data_20230101_080000.csv", 100),
    ("data_20221231_235959.csv", 150),
    ("data_20230201_120000.csv", 120),
    ("data_20230115_093000.csv", 80),
]
# Sort file data based on file size
sorted_file_data = natsort.natsorted(file_data, key=lambda x: x[1])
# Print sorted file data
for filename, size in sorted_file_data:
    print(filename, size)
Here is the result:
data_20230115_093000.csv 80
data_20230101_080000.csv 100
data_20230201_120000.csv 120
data_20221231_235959.csv 150
Case-insensitive string sorting
Another use case where natural sorting is useful is when you need case-insensitive sorting of strings. Again, lexicographic sorting based on ASCII values will not give the desired results, because uppercase letters have smaller ASCII values than lowercase letters.
To perform case-insensitive sorting, we can set alg to natsort.ns.IGNORECASE, which will ignore case when ordering. The alg parameter controls the algorithm that natsorted() applies:
import natsort
# List of strings with mixed case
words = ["apple", "Banana", "cat", "Dog", "Elephant"]
# Sort words naturally with case-insensitivity
sorted_words = natsort.natsorted(words, alg=natsort.ns.IGNORECASE)
print(sorted_words)
Here, the list of words with mixed case is sorted naturally in a case-insensitive manner:
Output >>> ['apple', 'Banana', 'cat', 'Dog', 'Elephant']
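To see why the case-insensitive option matters, here is a small standard-library sketch on the same words. It shows the default case-sensitive order next to a case-insensitive order obtained with str.casefold as the key. This is the built-in counterpart of the idea, not a description of how natsort handles case internally.

```python
words = ["apple", "Banana", "cat", "Dog", "Elephant"]

# Default comparison: uppercase letters have smaller code points than
# lowercase ones, so capitalized words come first
print(sorted(words))
# ['Banana', 'Dog', 'Elephant', 'apple', 'cat']

# Case-insensitive comparison using the standard library
print(sorted(words, key=str.casefold))
# ['apple', 'Banana', 'cat', 'Dog', 'Elephant']
```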
Wrapping up
And that's a wrap! In this tutorial, we reviewed the limitations of lexicographic sorting and how natural sorting can be a good alternative when working with alphanumeric strings. You can find all the code on GitHub.
We started with simple examples and also discussed sorting based on custom keys and handling case-insensitive sorting in Python. You can then explore other capabilities of the natsort library. I'll see you all soon in another Python tutorial. Until then, keep coding!
Bala Priya C. is a developer and technical writer from India. He enjoys working at the intersection of mathematics, programming, data science, and content creation. His areas of interest and expertise include DevOps, data science, and natural language processing. He likes to read, write, code and drink coffee! Currently, he is working to learn and share his knowledge with the developer community by creating tutorials, how-to guides, opinion pieces, and more. Bala also creates engaging resource overviews and coding tutorials.