Where the assumptions behind the two-tower architecture break down, and how to go beyond them
Two-tower models are among the most common architectural design choices in modern recommender systems: the key idea is to have one tower that learns relevance and a second, shallow tower that learns observation biases, such as position bias.
In this post, we'll take a closer look at two assumptions behind two-tower models, in particular:
- the *factorization assumption*, that is, the hypothesis that we can simply multiply the probabilities calculated by the two towers or add their logits (see the sketch after this list), and
- the *positional independence assumption*, that is, the hypothesis that the only variable determining position bias is the position of the item itself, not the context in which it is displayed.
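To make the factorization assumption concrete, here is a minimal sketch of the two ways the towers' outputs are typically combined. The tensor names are illustrative assumptions; note that adding logits multiplies odds rather than probabilities, so the two variants share the same factorization idea but are not numerically identical:

```python
import torch

# Hypothetical per-impression logits from each tower.
relevance_logits = torch.randn(8)  # main tower: relevance
bias_logits = torch.randn(8)       # shallow tower: position bias

# Variant 1: multiply the towers' probabilities,
# i.e. P(click) = P(relevant) * P(observed).
p_click_mult = torch.sigmoid(relevance_logits) * torch.sigmoid(bias_logits)

# Variant 2: add the towers' logits and apply a single sigmoid.
p_click_add = torch.sigmoid(relevance_logits + bias_logits)
```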
We will see where both assumptions break down and how to go beyond these limitations with newer algorithms such as the MixEM model, the Dot Product model, and XPA.
Let's start with a brief reminder.
Two-tower models: the story so far
The main learning goal of classification models in recommender systems is relevance: we want the model to predict the best possible content given the context. Here, context simply means everything we have learned about the user, for example from their previous interactions or their search history, depending on the application.
However, classification models often suffer from certain observation biases, that is, the tendency of users to interact more or less with an impression depending on how it was presented to them. The most prominent observation bias is position bias: the tendency for users to interact more with items that are displayed first.
The key idea in two-tower models is to train two “towers”, i.e. neural networks, in parallel: the main tower learns relevance, and a second, shallow tower learns observation biases such as position bias.
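As a rough illustration, a two-tower model of this kind might look like the following PyTorch sketch. The layer sizes, names, and the logit-addition combination are assumptions for illustration, not the exact architecture of any particular production system:

```python
import torch
import torch.nn as nn

class TwoTowerModel(nn.Module):
    """Minimal two-tower sketch: a deep relevance tower plus a
    shallow position-bias tower, combined by adding logits."""

    def __init__(self, feature_dim: int = 64, num_positions: int = 50):
        super().__init__()
        # Main tower: learns relevance from the context features
        # (everything we know about the user and the candidate item).
        self.relevance_tower = nn.Sequential(
            nn.Linear(feature_dim, 128),
            nn.ReLU(),
            nn.Linear(128, 1),
        )
        # Shallow tower: learns position bias from the position alone,
        # which is precisely the positional independence assumption.
        self.bias_tower = nn.Embedding(num_positions, 1)

    def forward(self, features: torch.Tensor, positions: torch.Tensor) -> torch.Tensor:
        relevance_logit = self.relevance_tower(features).squeeze(-1)
        bias_logit = self.bias_tower(positions).squeeze(-1)
        # Factorization assumption: the towers' contributions combine additively.
        return relevance_logit + bias_logit

# Usage on a toy batch:
model = TwoTowerModel()
features = torch.randn(4, 64)              # hypothetical context features
positions = torch.randint(0, 50, (4,))     # displayed positions
click_logits = model(features, positions)  # train with a BCE-with-logits loss
```

At serving time the bias tower is typically dropped and items are ranked by the relevance logit alone: the whole point of the shallow tower is to absorb observation bias during training so that the main tower learns unbiased relevance.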