Having worked in data science and analytics for seven years, I have created and queried many tables. I often find myself asking questions like “What does this column mean?”, “Why are there two columns with the same name in table A and table B? Which one should I use?”, and “What is the granularity of this table?”
If you have faced the same frustration, this article is for you!
In this article, I'll share five principles that will help you create tables that your colleagues will appreciate. Please note that this is written from the perspective of a data scientist. Therefore, it will not cover traditional database design best practices, but will instead focus on strategies for creating easy-to-use tables.
Maintaining a single source of truth for each key data point or metric is very important for reporting and analysis. There shouldn't be any repeated logic across multiple tables.
For convenience, we sometimes calculate the same metric in multiple tables. Take Gross Merchandise Value (GMV), for example.
The calculation can exist in the customers table, in the monthly financial reports table, in the merchants table…
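To make the idea concrete, here is a minimal sketch in Python using the standard-library sqlite3 module. Everything in it is an assumption for illustration: the orders table, its columns, the order_gmv view, the gmv_by helper, and especially the GMV rule itself (here, the amount of every order that was not cancelled). Your own definition will likely differ. The point is only that the rule is written once, and both the customer-level and merchant-level numbers read from that one place.

```python
import sqlite3

# In-memory database with a hypothetical orders table (illustrative data only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER,
    merchant_id INTEGER,
    amount      REAL,
    status      TEXT
);
INSERT INTO orders VALUES
    (1, 10, 100, 50.0, 'completed'),
    (2, 10, 101, 30.0, 'cancelled'),
    (3, 11, 100, 20.0, 'completed');

-- The GMV rule lives in exactly one place (assumed definition:
-- cancelled orders do not count toward GMV).
CREATE VIEW order_gmv AS
SELECT order_id, customer_id, merchant_id, amount AS gmv
FROM orders
WHERE status != 'cancelled';
""")


def gmv_by(key):
    """Aggregate GMV by the given column, always reading from the shared view."""
    # Only allow known column names, since the key is interpolated into SQL.
    if key not in {"customer_id", "merchant_id"}:
        raise ValueError(f"unsupported grouping key: {key}")
    rows = conn.execute(
        f"SELECT {key}, SUM(gmv) FROM order_gmv GROUP BY {key} ORDER BY {key}"
    ).fetchall()
    return dict(rows)


# Both reports derive from order_gmv instead of re-implementing the rule.
customer_gmv = gmv_by("customer_id")
merchant_gmv = gmv_by("merchant_id")
print(customer_gmv)
print(merchant_gmv)
```

Because both aggregates come from the same view, their totals agree by construction. If the definition of GMV changes, only the view needs to be updated.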