Often, we need to collect data over a certain period of time. It could be IoT sensor data, social media statistics, or something else. As an example, the YouTube Data API allows us to get the current number of views and subscribers of any channel, but analytics and historical data are available only to the channel owner. Therefore, if we want weekly or monthly summaries for these channels, we need to collect this data ourselves. In the case of an IoT sensor, there may not be any API at all, and we also need to collect and save the data on our own. In this article, I will show how to configure Apache Airflow on a Raspberry Pi, allowing you to run tasks over a long period of time without involving any cloud provider.
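To make the "collect it ourselves" idea concrete, here is a minimal sketch of taking one timestamped snapshot of a channel's public counters. It assumes the YouTube Data API v3 `channels` endpoint with `part=statistics`. The function names, the row layout, and the fallback for a hidden subscriber count are my own illustrative choices, not something prescribed by the API.

```python
import json
import urllib.parse
import urllib.request
from datetime import datetime, timezone

# YouTube Data API v3 endpoint for channel resources
API_URL = "https://www.googleapis.com/youtube/v3/channels"


def build_request_url(channel_id: str, api_key: str) -> str:
    """Build a request that asks only for the 'statistics' part of a channel."""
    query = urllib.parse.urlencode(
        {"part": "statistics", "id": channel_id, "key": api_key}
    )
    return f"{API_URL}?{query}"


def parse_snapshot(payload: dict, taken_at: datetime) -> dict:
    """Turn one API response into a timestamped row that we can store ourselves."""
    item = payload["items"][0]
    stats = item["statistics"]
    return {
        "channel_id": item["id"],
        "taken_at": taken_at.isoformat(),
        # The API returns the counters as strings, so convert them to integers.
        "views": int(stats["viewCount"]),
        # Assumption: fall back to 0 if the owner hides the subscriber count.
        "subscribers": int(stats.get("subscriberCount", 0)),
    }


def fetch_snapshot(channel_id: str, api_key: str) -> dict:
    """Request the current counters (needs network access and a valid API key)."""
    url = build_request_url(channel_id, api_key)
    with urllib.request.urlopen(url, timeout=30) as response:
        payload = json.load(response)
    return parse_snapshot(payload, datetime.now(timezone.utc))
```

Calling `fetch_snapshot` on a schedule and appending each row to a file or a database is enough to build the history that the API itself does not expose.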
Obviously, if you work for a large company, you probably won’t need a Raspberry Pi. In that case, if you need an additional cloud instance, just create a Jira ticket for your MLOps department 😉 But for a pet project or a low-budget startup, it can be an interesting solution.
Let’s see how it works.
Raspberry Pi
What really is a Raspberry Pi? For those readers who have not been interested in hardware over the past 10 years (the first Raspberry Pi model was introduced in 2012), I can briefly explain that it is a single-board computer running full Linux. Typically, a Raspberry Pi has a 2 to 4 core 1 GHz ARM CPU and 1 to 8 GB of RAM. It’s small, cheap, and quiet; it has no fans or disk drive (the operating system runs from a Micro SD card). A Raspberry Pi only needs a standard USB power supply; it can connect to a network via Wi-Fi or Ethernet and run different tasks for months or even years.
For my pet data science project, I wanted to collect YouTube channel statistics over 2 weeks. For a task that requires only 30 to 60 seconds twice a day, a serverless architecture can be a perfect solution, and we can use something like a Google Cloud Function for that. But all Google tutorials started with the phrase “enable billing for your project.” Google does provide free starting credits and free tiers, but I didn’t want to have another headache controlling how much money…