5. Univariate sequential coding
It's time to build a sequential mechanism to track user choices over time. The mechanism I devised works on two separate vectors (which, at the end of the process, merge into one, hence univariate): a historical vector and a caching vector.
The historical vector is the one used to perform knn on the existing clusters. Once a session has concluded, we update the historical vector with the user's new choices. At the same time, we adjust the existing values with a decay function that decreases their weights over time. In doing so, we ensure that we stay up to date with customer trends and give more weight to new choices rather than old ones.
Instead of updating the vector every time the user makes a choice (which is not computationally efficient; moreover, we would risk older choices decaying too quickly, since each user interaction would trigger the decay mechanism), we can store a temporary vector that is only valid for the current session. Each user interaction, converted into a vector using tag frequency as a one-hot weight, will be added to the existing cached vector.
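To make the caching step concrete, here is a minimal sketch of how a single interaction could be accumulated into the session cache. It assumes a fixed tag-to-dimension mapping and a precomputed tag_frequency table; the helper and variable names are illustrative, not taken from the reference notebook.

import numpy as np

# hypothetical tag -> dimension mapping and frequency table
tag_index = {'Shooter': 0, 'Fantasy': 1, 'Puzzle': 2}
tag_frequency = {'Shooter': 0.9, 'Fantasy': 0.7, 'Puzzle': 0.4}

def add_choice_to_cache(cached_vector, chosen_tags):
    """Encode one interaction and accumulate it into the session cache."""
    choice_vector = np.zeros_like(cached_vector)
    for tag in chosen_tags:
        # one-hot position, weighted by the tag's frequency
        choice_vector[tag_index[tag]] = tag_frequency[tag]
    return cached_vector + choice_vector

# two interactions within the same session, no database access needed
cached_vector = np.zeros(len(tag_index))
cached_vector = add_choice_to_cache(cached_vector, ['Shooter', 'Fantasy'])
cached_vector = add_choice_to_cache(cached_vector, ['Puzzle'])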
Once the session is closed, we retrieve the historical vector from the database, merge it with the cached vector, and apply adjustment mechanisms (such as the decay function and pruning, as we will see later). Once the historical vector has been updated, it is stored in the database, replacing the previous one.
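A possible implementation of this end-of-session merge is sketched below. It assumes that the two positional parameters passed to update_vector later in the article (1 and 0.8) are, respectively, the weight of the new session and the decay factor applied to the old weights; the actual notebook implementation may differ.

def update_vector(historical_vector, cached_vector, session_weight=1.0, decay=0.8):
    """Merge a closed session into the history, decaying older weights first."""
    # decay is applied once per session, not once per interaction,
    # so older choices do not fade too quickly
    return historical_vector * decay + session_weight * cached_vector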
The two reasons for following this approach are to minimize the weight difference between older and newer interactions and to make the entire process scalable and computationally efficient.
6. Pruning mechanism
The system is now complete. However, there is an additional problem: covariate encoding has a flaw, namely that its basis vector scales proportionally with the number of encoded tags. For example, if our database reached 100,000 tags, the vector would have an equal number of dimensions.
The original covariate encoding architecture already takes this problem into account and proposes a PCA compression mechanism as a solution. Applied to our recommender, however, PCA causes problems when vectors are added iteratively, resulting in information loss: because each user choice sums a new vector onto the existing ones, this solution is not recommended.
However, if we cannot compress the vector, we can prune the dimensions with the lowest scores. The system will then execute knn using only the most relevant scores of the vector; this crude method of feature engineering will not negatively (or, more precisely, not excessively) affect the results of the final recommendation.
By pruning our vector, we can arbitrarily set a maximum number of dimensions for our vectors. Without altering the tag indices, we can begin to operate on sparse vectors instead of dense ones: a data structure that only stores the active indices of our vectors and can therefore scale indefinitely. We can then compare the recommendations obtained from a full vector (dense vector) with those from a pruned vector (sparse vector).
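As a rough sketch of the idea, assuming a simple dict-based sparse representation (the helper name is illustrative), pruning keeps only the k highest-scoring dimensions while preserving the original tag indices:

import numpy as np

def prune_to_sparse(dense_vector, k):
    """Keep the k highest scores as a sparse {tag_index: score} mapping."""
    top_idx = np.argsort(dense_vector)[-k:]  # indices of the k largest scores
    return {int(i): float(dense_vector[i]) for i in top_idx if dense_vector[i] > 0}

dense = np.array([0.05, 1.3, 0.0, 0.7, 0.02, 2.1])
print(prune_to_sparse(dense, k=3))  # {3: 0.7, 1: 1.3, 5: 2.1}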
As we can see, there are minor differences, but the overall integrity of the vector has been maintained, in exchange for scalability. A very intuitive alternative to this process is to perform clustering at the tag level, keeping the size of the vector fixed. In that case, a new tag would be semantically assigned to the nearest existing tag, rather than occupying its own dedicated dimension.
7. Exemplar estimation
Now that you have fully understood the theory behind this new approach, we can compare the two approaches more clearly. In a multivariate approach, the first step was to identify users' top preferences through clustering. As we can see, this process required us to store as many vectors as there were exemplars found.
In a univariate approach, however, because covariate encoding works on a transposed version of the encoded data, we can use sections of our historical vector to store user preferences, so a single vector is used for the entire process. Using the historical vector as a query to search through the encoded tags, the top-k knn results will be equivalent to the top-k preference clusters.
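A minimal sketch of this search, assuming encoded_tags is the transposed covariate encoding (one row per tag) and that the historical vector shares its dimensionality; the names and shapes below are illustrative stand-ins, not the notebook's actual data:

import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
encoded_tags = rng.random((1000, 64))   # stand-in for the encoded tag matrix
historical_vector = rng.random(64)      # stand-in for the user's history vector

# the history vector itself is the query; no per-exemplar vectors are stored
knn = NearestNeighbors(n_neighbors=5, metric='cosine').fit(encoded_tags)
_, cluster_ids = knn.kneighbors(historical_vector.reshape(1, -1))
# cluster_ids[0] holds the user's top-5 preference clusters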
8. Recommendation approaches
Now that we've captured more than one preference, how do we plan to recommend items? This is the main difference between the two systems. The traditional multivariate recommender uses the exemplars to recommend k items to a user. Our system, instead, has assigned our customer a supercluster and the top subclusters below it (depending on our tag segmentation level, we may increase the number of levels). We will not recommend the top k items, but the top k subgroups.
Using groupby instead of vector search
Until now we have used a vector to store data, but that doesn't mean we should rely on vector search to make recommendations: it would be much slower than an equivalent SQL operation. Note that it is still possible to obtain exactly the same results using vector search on the users array.
If you're wondering why you would switch from a vector-based system to a counting-based one, that's a legitimate question. The simple answer is that the counting approach is the most faithful replica of the multivariate system (as shown in the reference images), but much more scalable (it can reach up to 3,000 recommendations/s on 16 CPU cores using pandas). Originally, the univariate recommender was designed to employ vector search, but, as shown, there are better and simpler search algorithms.
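To give an idea of what the counting-based version looks like, here is a hedged pandas sketch (the catalog columns and tag names are invented for illustration): the tags surviving the pruning of the historical vector are matched against the catalog, and groups are ranked with a simple groupby count instead of a vector search.

import pandas as pd

# toy catalog: one row per (game, tag) pair, each game assigned to a label_1 group
games = pd.DataFrame({
    'game':    ['A', 'B', 'C', 'D', 'E', 'A'],
    'tag':     ['Shooter', 'Fantasy', 'Puzzle', 'Fantasy', 'Shooter', 'Fantasy'],
    'label_1': ['action', 'rpg', 'logic', 'rpg', 'action', 'action'],
})

# top dimensions surviving the pruning of the historical vector
user_tags = {'Shooter', 'Fantasy'}

top_groups = (
    games[games['tag'].isin(user_tags)]   # keep only rows matching the user's tags
    .groupby('label_1')['game']
    .count()                              # rank groups by match count
    .sort_values(ascending=False)
    .head(3)
)
print(top_groups)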
Let's run a full test that we can monitor. We can use the code from the sample notebook: in our simple example, the user selects at least one game per session, each labeled with its corresponding tags.
# if no vector exists, the first choices initialize the historical vector
historical_vector = user_choices(5, tag_lists=[['Shooter', 'Fantasy']], tag_frequency=tag_frequency, display_tags=False)

# day1
cached_vector = user_choices(3, tag_lists=[['Puzzle-Platformer'], ['Dark Fantasy'], ['Fantasy']], tag_frequency=tag_frequency, display_tags=False)
historical_vector = update_vector(historical_vector, cached_vector, 1, 0.8)

# day2
cached_vector = user_choices(3, tag_lists=[['Puzzle'], ['Puzzle-Platformer']], tag_frequency=tag_frequency, display_tags=False)
historical_vector = update_vector(historical_vector, cached_vector, 1, 0.8)

# day3
cached_vector = user_choices(3, tag_lists=[['Adventure'], ['2D', 'Turn-Based']], tag_frequency=tag_frequency, display_tags=False)
historical_vector = update_vector(historical_vector, cached_vector, 1, 0.8)

compute_recommendation(historical_vector, label_1_max=3)
At the end of the 3 sessions, these are the top 3 exemplars (label_1) extracted by our recommender:
In the notebook, you'll find the option to run Monte Carlo simulations, but there wouldn't be an easy way to validate them (mainly because Steam games aren't labeled very accurately, and I noticed that most small games list too many unrelated or uncommon tags).
The architectures of the most popular recommender systems do not yet take session history into account, but with the development of new algorithms and the increase in computing power, it is now possible to address a higher level of complexity.
This new approach should offer a comprehensive alternative to the sequential recommender systems available on the market, but I am convinced that there is always room for improvement. To further improve this architecture, it would be possible to switch from a clustering-based to a network-based approach.
It is important to note that this recommender system can only excel when applied to a limited number of domains, but has the potential to shine in conditions of scarce computational resources or extremely high demand.