- Introduction: What is Tablib?
- Working with data sets
- Importing data
- Exporting data
- Dynamic columns
- Formatters
- Conclusion
For many years I have been working with tools like Pandas and PySpark in Python for data ingestion, processing, and export. These tools are great for complex data transformations and large data sets (Pandas, as long as the data fits in memory). However, I have often reached for these tools even when the following conditions apply:
- The data size is relatively small. Think well below 100,000 rows of data.
- Performance is not a concern. Think of a one-off job, or a job that runs every night at midnight where it doesn't matter whether it takes 20 seconds or 5 minutes.
- No complex transformations are needed. Think of importing 20 JSON files with the same format, stacking them on top of each other, and exporting the result as a single CSV file.