Python's built-in pathlib module makes working with file system paths a breeze. In "How to Navigate the File System with Python's Pathlib", we covered the basics of working with path objects and navigating the file system. It's time to go further.
In this tutorial, we will go over three specific file management tasks using the capabilities of the pathlib module:
- Organize files by extension
- Search for specific files
- Back up important files
By the end of this tutorial, you will have learned how to use pathlib for file management tasks. Let's get started!
1. Organize files by extension
When you're researching and working on a project, you'll often create ad hoc files and dump related documents into your working directory until it becomes a mess and you need to organize it.
Let's take a simple example where the project directory contains requirements.txt, configuration files, and Python scripts. We would like to organize the files into subdirectories, one for each extension. For convenience, let's name each subdirectory after its extension.
Here is a Python script that scans a directory, identifies files by their extensions, and moves them to the respective subdirectories:
# organize.py
from pathlib import Path

def organize_files_by_extension(path_to_dir):
    path = Path(path_to_dir).expanduser().resolve()
    print(f"Resolved path: {path}")

    if path.exists() and path.is_dir():
        print(f"The directory {path} exists. Proceeding with file organization...")

        for item in path.iterdir():
            print(f"Found item: {item}")
            if item.is_file():
                extension = item.suffix.lower()
                if not extension:
                    continue  # Skip files that have no extension
                target_dir = path / extension[1:]  # Remove the leading dot

                # Ensure the target directory exists
                target_dir.mkdir(exist_ok=True)
                new_path = target_dir / item.name

                # Move the file
                item.rename(new_path)

                # Check if the file has been moved
                if new_path.exists():
                    print(f"Successfully moved {item} to {new_path}")
                else:
                    print(f"Failed to move {item} to {new_path}")
    else:
        print(f"Error: {path} does not exist or is not a directory.")

organize_files_by_extension('new_project')
The organize_files_by_extension() function takes a directory path as input, resolves it to an absolute path, and organizes the files within that directory by their file extensions. It first makes sure that the specified path exists and is a directory.
It then loops through all the items in the directory. For each file, it retrieves the file extension, creates a new directory with the extension name (if it doesn't already exist), and moves the file to this new directory.
After each file is moved, it confirms the success of the operation by checking for the existence of the file in the new location. If the specified path does not exist or is not a directory, it prints an error message.
Here is the result of the example function call (organizing the files in the new_project directory):
Now try this in a project directory in your working environment. I used if-else to account for errors, but you can also use try-except blocks to improve this version.
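As a rough sketch of that suggestion, the move step could be wrapped in a try-except block. Note that move_file_safely is a hypothetical helper, not part of the original script, and refusing to overwrite an existing file is an assumption about the desired behavior:

```python
from pathlib import Path

def move_file_safely(item: Path, target_dir: Path) -> bool:
    """Move item into target_dir; return True on success, False on failure."""
    try:
        target_dir.mkdir(parents=True, exist_ok=True)
        new_path = target_dir / item.name
        # Assumption: an existing file at the destination should not be overwritten
        if new_path.exists():
            raise FileExistsError(f"{new_path} already exists")
        item.rename(new_path)
    except OSError as error:
        # FileExistsError, PermissionError, and similar errors are all OSError subclasses
        print(f"Failed to move {item}: {error}")
        return False
    print(f"Successfully moved {item} to {new_path}")
    return True
```

Inside the loop of organize_files_by_extension(), a call such as move_file_safely(item, target_dir) could then stand in for the mkdir(), rename(), and exists() steps.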
2. Search for specific files
Sometimes you may not want to organize files into subdirectories by extension as in the example above. Instead, you may just want to find all files with a specific extension (such as all image files). For that, you can use globbing.
Let's say we want to find the requirements.txt file to see the project's dependencies. Let's use the same example, but after grouping the files into subdirectories by extension.
If you use the glob() method on the path object as shown to search for all text files (defined by the pattern '*.txt'), you will see that it does not find the text file:
# search.py
from pathlib import Path

def search_and_process_text_files(directory):
    path = Path(directory)
    path = path.resolve()
    for text_file in path.glob('*.txt'):
        # process text files as needed
        print(f'Processing {text_file}...')
        print(text_file.read_text())

search_and_process_text_files('new_project')
This is because glob() with the pattern '*.txt' only searches the top level of the given directory, which does not contain the requirements.txt file. The requirements.txt file is in the txt subdirectory. Therefore, you must use recursive globbing with the rglob() method instead.
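To see the difference in isolation, here is a small sketch that builds a throwaway directory tree with tempfile and compares the two methods. The helper name compare_glob_and_rglob and the file notes.txt are made up for illustration:

```python
import tempfile
from pathlib import Path

def compare_glob_and_rglob(directory):
    """Return the .txt file names found by glob() and by rglob()."""
    path = Path(directory)
    top_level = sorted(p.name for p in path.glob('*.txt'))
    recursive = sorted(p.name for p in path.rglob('*.txt'))
    return top_level, recursive

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / 'txt').mkdir()
    (root / 'notes.txt').write_text('top level')
    (root / 'txt' / 'requirements.txt').write_text('nested')

    top_level, recursive = compare_glob_and_rglob(root)
    print(top_level)   # ['notes.txt']
    print(recursive)   # ['notes.txt', 'requirements.txt']
```

Calling rglob('*.txt') is equivalent to calling glob('**/*.txt'), so either spelling can be used for a recursive search.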
Here is the code to find the text files and print their contents:
from pathlib import Path

def search_and_process_text_files(directory):
    path = Path(directory)
    path = path.resolve()
    for text_file in path.rglob('*.txt'):
        # process text files as needed
        print(f'Processing {text_file}...')
        print(text_file.read_text())

search_and_process_text_files('new_project')
The search_and_process_text_files() function takes a directory path as input, resolves it to an absolute path, and searches for all .txt files inside that directory and its subdirectories using the rglob() method.
For each text file found, it prints the file path and then reads and prints the file's contents. This function is useful for recursively locating and processing all text files within a specific directory.
Since requirements.txt is the only text file in our example, we get the following output:
Output >>>
Processing /home/balapriya/new_project/txt/requirements.txt...
psycopg2==2.9.0
scikit-learn==1.5.0
Now that you know how to use globbing and recursive globbing, try redoing the first task (organizing files by extension) by using globbing to find and group the files and then move them to the destination subdirectory.
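If you would like a starting point, one possible approach is sketched below. It assumes you pass in the extensions you care about. The function name organize_with_glob and its extensions parameter are hypothetical and not part of the original scripts:

```python
from pathlib import Path

def organize_with_glob(path_to_dir, extensions):
    """Move files matching each extension into a subdirectory named after it."""
    path = Path(path_to_dir).expanduser().resolve()
    moved = []
    for extension in extensions:
        # Collect the matches first so the directory is not modified mid-iteration
        matches = [p for p in path.glob(f'*.{extension}') if p.is_file()]
        if not matches:
            continue
        target_dir = path / extension
        target_dir.mkdir(exist_ok=True)
        for item in matches:
            new_path = target_dir / item.name
            item.rename(new_path)
            moved.append(new_path)
    return moved
```

For the example project, a call might look like organize_with_glob('new_project', ['py', 'txt']). Unlike the iterdir() version, this only touches the extensions you list.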
3. Back up important files
So far we have seen how to organize files by extension and search for specific files. But what about backing up certain important files?
Here we would like to copy files from the project directory to a backup directory rather than move them to another location. In addition to pathlib, we will also use the copy function from the shutil module.
Let's create a function that copies all files with a specific extension (all .py files) to a backup directory:
# back_up.py
import shutil
from pathlib import Path

def back_up_files(directory, backup_directory):
    path = Path(directory)
    backup_path = Path(backup_directory)
    backup_path.mkdir(parents=True, exist_ok=True)

    for important_file in path.rglob('*.py'):
        shutil.copy(important_file, backup_path / important_file.name)
        print(f'Backed up {important_file} to {backup_path}')

back_up_files('new_project', 'backup')
The back_up_files() function takes a source directory path and a backup directory path, and backs up all Python files in the source directory and its subdirectories to the backup directory.
It creates path objects for both the source directory and the backup directory, and ensures that the backup directory exists by creating it, along with any necessary parent directories, if it does not already exist.
The function then iterates through all .py files in the source directory using the rglob() method. For each Python file found, it copies the file to the backup directory and retains the original file name. Basically, this function helps to create a backup of all the Python files within a project directory.
After running the script, you can verify the result by checking the contents of the backup directory.
For your own directory, you can use back_up_files('/path/to/directory', '/path/to/backup/directory') to make backup copies of files of interest.
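One caveat of back_up_files() as written: every copy lands directly in the backup directory, so two files with the same name in different subdirectories would overwrite each other. A possible variation, sketched here, recreates each file's relative path under the backup directory. The function name back_up_files_preserving_structure and the pattern parameter are hypothetical, and the use of shutil.copy2 (which also tries to preserve file metadata) is a design choice rather than something the original script does:

```python
import shutil
from pathlib import Path

def back_up_files_preserving_structure(directory, backup_directory, pattern='*.py'):
    """Copy matching files to backup_directory, keeping their relative paths."""
    path = Path(directory).resolve()
    backup_path = Path(backup_directory).resolve()
    copied = []
    # Materialize the matches first in case the backup lives inside the source
    for source_file in list(path.rglob(pattern)):
        if not source_file.is_file():
            continue
        if backup_path in source_file.parents:
            continue  # Skip files that are already inside the backup directory
        destination = backup_path / source_file.relative_to(path)
        destination.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source_file, destination)
        copied.append(destination)
    return copied
```

With this version, a file at new_project/py/main.py would be copied to backup/py/main.py instead of backup/main.py.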
Wrapping Up
In this tutorial, we've explored practical examples of using Python's pathlib module to organize files by extension, search for specific files, and back up important files. You can find all the code used in this tutorial here on GitHub.
As you can see, the pathlib module makes working with file paths and file management tasks easier and more efficient. Now, go ahead and apply these concepts in your own projects to better handle your file management tasks. Happy coding!
Bala Priya C is a developer and technical writer from India. She enjoys working at the intersection of mathematics, programming, data science, and content creation. Her areas of interest and expertise include DevOps, data science, and natural language processing. She enjoys reading, writing, programming, and drinking coffee! Currently, she is working on learning and sharing her knowledge with the developer community by creating tutorials, how-to guides, opinion pieces, and more. Bala also creates interesting resource overviews and coding tutorials.