Many of today's most competitive tech markets involve moving dots on a map: ride-hailing services (Uber, Lyft, Grab), micromobility services (Lime, Bird), food delivery services (Delivery Hero, Postmates, DoorDash), and more. Additionally, many services that don't put customer locations at the center of their product use cases still want to know where their customers are, so they can better personalize the experience based on location and what's happening nearby.
What all this means for data scientists is that there are many latitudes and longitudes floating around our data lakes (pun intended), and buried deep within these two variables is a wealth of information!
Creative and effective use of latitude and longitude can bring immense predictive power to our machine learning applications and greater dimensionality to our analytics efforts, helping us data scientists bring more value to our businesses and our clients.
The objective of this article is to demonstrate some feature engineering techniques that use only latitude and longitude, comparing their predictive power on a home sale price prediction problem in Miami. The article is structured as follows:
- Setting up the Miami home sale price prediction problem
- Feature engineering experiments
2.1. Raw latitude and longitude
2.2. Spatial density
2.3. Geohash target encoding
2.4. Combination of all features
- Discussion
- Conclusion
Since this post focuses on feature engineering, model evaluation will be fairly straightforward for the sake of brevity and clarity (i.e., no cross-validation or hyperparameter optimization).
Additionally, this post uses Polars as its data manipulation library instead of pandas. If you, dear reader, are unfamiliar with Polars or are still stuck in pandas-land, feel free to check out my previous post first: “The 3 reasons why I have permanently switched from pandas to polars”.