Introduction
Recently I have been fine-tuning various LLMs on specific domains. The first, and perhaps most important, part of this task is to collect, extract, and clean the textual data that will be fed to the LLM. I realized that my code was getting confusing and repetitive: for each new source I was writing a script from scratch, even though it had a lot in common with other scripts in my codebase. I was not following the “Don't Repeat Yourself” (DRY) principle at all. That is why I decided to implement the template design pattern and make my codebase more elegant and efficient.
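To make the idea concrete, here is a minimal sketch of how the collect, extract, and clean steps could be written once and reused across sources. It is not the code from my project: the names `TextSourcePipeline`, `InMemoryLinesSource`, and `run` are hypothetical and chosen only for illustration, and the whitespace-collapsing `clean` step is an assumed default.

```python
from abc import ABC, abstractmethod


class TextSourcePipeline(ABC):
    """Hypothetical base class that fixes the order of the steps once."""

    def run(self) -> list[str]:
        # The template method: the same skeleton for every source.
        raw = self.collect()
        texts = self.extract(raw)
        # Drop blank records, then clean the remaining ones.
        return [self.clean(t) for t in texts if t.strip()]

    @abstractmethod
    def collect(self) -> str:
        """Fetch the raw content for this source."""

    @abstractmethod
    def extract(self, raw: str) -> list[str]:
        """Split the raw content into individual text records."""

    def clean(self, text: str) -> str:
        # Assumed shared default: collapse runs of whitespace.
        # Subclasses may override this step if a source needs more.
        return " ".join(text.split())


class InMemoryLinesSource(TextSourcePipeline):
    """Hypothetical source whose records are the lines of a string."""

    def __init__(self, payload: str) -> None:
        self.payload = payload

    def collect(self) -> str:
        return self.payload

    def extract(self, raw: str) -> list[str]:
        return raw.splitlines()


records = InMemoryLinesSource("  first   record \n\nsecond\trecord\n").run()
print(records)  # ['first record', 'second record']
```

With this shape, a new source only has to say how to collect and extract its data; the order of the steps and the shared cleaning logic live in one place.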
Template design pattern
I will not repeat here what a design pattern is or how design patterns are classified by functionality, as I have written many articles on the topic. If you are interested in reading those earlier articles, I will leave some references at the end.
In this article I will show you an example related to data processing. Let's say that in our project we have to deal with different types of data that we want to analyze. Some of this data is…