When working with time series data, it is often important to apply filters that remove noise. This story shows how to implement a low-pass filter in SQL/BigQuery, which can help improve downstream machine learning models.
Filtering time series data is one of the most useful preprocessing tools in data science. In practice, data is almost always a combination of signal and noise, where noise is defined not only by a lack of periodicity but also by not representing the information of interest. For example, consider daily visits to a retail store. If you are interested in how seasonal changes affect visitation, you may not care about short-term day-of-week patterns (there may be more visits on Saturdays than on Mondays, but that is not what interests you).
Although this may seem like a small blip in the data, noise or irrelevant information (such as short-term visitation patterns) increases the complexity of your features and therefore affects your model. If such noise is not removed, the model complexity and the volume of training data must be adjusted accordingly to avoid overfitting.
This is where filtering comes to the rescue. Similar to how you would filter outliers from a training set or less important metrics from a feature set, time series filtering removes noise from a time series feature. Bottom line: time series filtering is a cleaning tool for your data. Applying it restricts your data to the frequencies (or temporal patterns) you are interested in, and therefore yields a cleaner signal that will improve your subsequent statistical or machine learning model (see Figure 1 for a synthetic example).
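To make the retail example concrete, here is a minimal sketch in plain Python (not the SQL implementation this story is about). It uses one of the simplest low-pass filters, a centered 7-day moving average, which is an illustrative choice and not necessarily the filter used later. The data is synthetic and every name in it (`seasonal`, `weekly`, `visits`, `moving_average`) is hypothetical.

```python
import math


def moving_average(values, window):
    """Centered moving average over an odd window.

    Positions where the full window does not fit are dropped,
    so the result is shorter than the input by window - 1.
    """
    half = window // 2
    return [
        sum(values[i - half : i + half + 1]) / window
        for i in range(half, len(values) - half)
    ]


# Synthetic data (an assumption for illustration): 52 weeks of daily visits.
n_days = 364

# Slow seasonal signal: the pattern we want to keep.
seasonal = [100 + 20 * math.sin(2 * math.pi * d / n_days) for d in range(n_days)]

# Day-of-week pattern: a Saturday bump, slightly fewer visits on other days.
# It sums to zero over any 7 consecutive days.
weekly = [12 if d % 7 == 5 else -2 for d in range(n_days)]

# Observed series = signal of interest + short-term pattern we treat as noise.
visits = [s + w for s, w in zip(seasonal, weekly)]

# A 7-day window averages over exactly one weekly cycle.
smoothed = moving_average(visits, 7)

# Compare both series against the seasonal signal (smoothed is offset by 3 days).
raw_error = max(abs(v - s) for v, s in zip(visits, seasonal))
smoothed_error = max(abs(v - seasonal[i + 3]) for i, v in enumerate(smoothed))
print(f"max deviation from seasonal signal, raw:      {raw_error:.3f}")
print(f"max deviation from seasonal signal, smoothed: {smoothed_error:.3f}")
```

Because the window length matches the weekly period, the day-of-week component averages out to zero, while the slow seasonal component passes through almost unchanged. Real data will not be this clean, so the window length and the type of filter are choices to tune.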
A detailed walkthrough of what a filter is and how it works is beyond the scope of this story (and is a complex topic in general). However, at a high level, filtering can be viewed as modifying an input signal by applying another signal (also called a kernel or filter…