The microservice-versus-monolith debate still rages in software, but in the world of data it has cooled to a gentle simmer.
I don't need to spend many words to convince you that choosing tools in the data space is difficult. There are hundreds, if not thousands, of ways to skin this cat.
What people often overlook is how architecture shapes these decisions.
About 20 years ago, applications lived on computers owned by the companies that needed them; this is called “on-premise”. Owning these computers is an architectural decision. Consequently, cloud software providers were scarce: there was little demand for cloud software, because it was fundamentally incompatible with the architecture of the time.
Fast forward to 2024 and the opposite is true: most companies are entirely in the cloud. Still, some of us run our own servers, and others use hybrid models. That makes it more important than ever to understand how your architecture shapes the solutions you choose. In this article, we'll dig into how a microservices approach versus a monolithic approach to data architecture affects the tools you buy.
In data, the microservice-versus-monolith debate is playing out all over again. To understand what that means:
A monolithic application is built as a single unified unit, while a microservices architecture is a collection of smaller services that can be deployed independently.
Within data, monolithic applications were quite common 10 years ago. One example is a large Airflow repository containing your data ingestion code, data transformation code, and business automations.
Business automations could include things like updating dashboards, sending reports, or sending alerts when tasks fail.
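To make the monolithic pattern concrete, here is a minimal sketch in plain Python rather than Airflow, so it runs without any dependencies. All function names and the sample data are hypothetical; in a real Airflow repository each function would typically be a task in a DAG. The point is that ingestion, transformation, and the business automations share one codebase and one deployment.

```python
"""A hypothetical monolithic data pipeline: ingestion, transformation and
business automations all live in one codebase and ship as one unit."""

from dataclasses import dataclass, field


@dataclass
class PipelineRun:
    # Collects the side effects of one run so they can be inspected together.
    dashboards_refreshed: list = field(default_factory=list)
    reports_sent: list = field(default_factory=list)
    alerts: list = field(default_factory=list)


def ingest() -> list[dict]:
    # Stand-in for pulling raw records from a source system (hypothetical data).
    return [
        {"order_id": 1, "amount_cents": 1250},
        {"order_id": 2, "amount_cents": 800},
    ]


def transform(rows: list[dict]) -> dict:
    # Stand-in for a transformation step: aggregate raw rows into metrics.
    total_cents = sum(r["amount_cents"] for r in rows)
    return {"order_count": len(rows), "revenue": total_cents / 100}


def refresh_dashboard(metrics: dict, run: PipelineRun) -> None:
    # Business automation: update a (hypothetical) sales dashboard.
    run.dashboards_refreshed.append(f"sales dashboard: {metrics['revenue']:.2f}")


def send_report(metrics: dict, run: PipelineRun) -> None:
    # Business automation: send a (hypothetical) daily report.
    run.reports_sent.append(f"{metrics['order_count']} orders today")


def alert_on_failure(task_name: str, error: Exception, run: PipelineRun) -> None:
    # Business automation: record an alert when a task fails.
    run.alerts.append(f"{task_name} failed: {error}")


def run_pipeline() -> PipelineRun:
    """Run every stage in one process. Because everything ships as one unit,
    changing any single stage means redeploying the whole application."""
    run = PipelineRun()
    try:
        metrics = transform(ingest())
    except Exception as exc:
        alert_on_failure("ingest/transform", exc, run)
        return run
    refresh_dashboard(metrics, run)
    send_report(metrics, run)
    return run


if __name__ == "__main__":
    print(run_pipeline())
```

Calling `run_pipeline()` executes every stage in order inside one application. Splitting this file into independently deployed pieces, one for ingestion, one for transformation, and one for the automations, is essentially the microservice approach.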
By contrast, the cloud explosion, partly driven by venture capital funding (see this report), has given rise to data architectures that look a lot like microservices. Here, the “microservices” are separate applications that manage batch data movement, data transformation, and storage…