Keyphrase recommendation in e-commerce advertising faces significant challenges, particularly in the trade-off between relevance and effectiveness for sellers and advertisers. The main problem lies in recommending keyphrases that are relevant to items and represent real user queries, which is crucial for targeted advertising. This problem has been addressed as an extreme multi-label classification (XMC) task, which uses search logs to map items to multiple queries. However, current XMC models have limitations in addressing the full spectrum of keyphrases. They tend to focus on tail keyphrases, which are searched less frequently, while overlooking head keyphrases that generate higher revenue due to their popularity. Furthermore, training data derived from search logs is highly skewed, with 90% of items being associated with a single query in terms of engagement. This skew biases models towards popular items, neglecting the vast majority of inventory that could benefit from advertising. The challenge is further compounded by the biased presentation of items in search results, where ranking significantly influences buyer engagement, potentially distorting the relevance of less popular items for certain queries.
Previous attempts to mitigate the challenges of keyphrase recommendation have employed various methods, each with its own limitations. Open vocabulary models such as GROOV, One2Seq, and One2One often suggest keyphrases outside the tag space, reducing their practical applicability. Keyphrase extraction methods such as keyBERT treat the problem as a two-step process: generation and ranking. However, this approach is limited by token adjacency and presence in the item text and does not guarantee that the suggested keyphrases align with shoppers' actual search queries. Other implemented models include fastText, a basic linear neural network using word vectors and hierarchical softmax, and Graphite, a state-of-the-art XMC model using bipartite graphs for efficient mapping. Proprietary models such as variants of Rules Engine (RE) and Similar Listing (SL), which focus on historical co-occurrences and item similarities respectively, have also been implemented. While these methods offer some improvements, they still struggle to provide comprehensive keyphrase recommendations, especially for new or less popular items, and often fail to effectively balance head and tail keyphrases.
Researchers from eBay Inc. USA and Pennsylvania State University have presented GraphEx, a graph-based approach to keyphrase recommendation that addresses the limitations of previous methods. The technique extracts permutations of tokens from item titles to suggest relevant keyphrases to sellers. The researchers highlight the inadequacy of traditional metrics such as precision and recall for assessing real-world performance, proposing a more comprehensive set of metrics that assess both keyphrase relevance and potential buyer reach. GraphEx demonstrates superior performance compared to existing production models at eBay, effectively balancing the dual goals of relevance and reach. The method is designed for scalability and is capable of handling billions of items while supporting near real-time inference in resource-constrained production environments. This approach represents a significant advancement in keyphrase recommendation, offering a more nuanced and practical solution to the challenges facing e-commerce advertising.
GraphEx employs a unique approach to keyphrase recommendation by formulating it as a permutation problem that combines title strings with a set of predefined keyphrases. The method consists of two main phases: Construction and Inference.
In the construction phase, GraphEx creates a series of bipartite graphs, one for each leaf category within a metacategory. These graphs represent the relationship between words in keyphrases and the keyphrases themselves. The set of vertices in each graph is split into two subsets: X, which contains all unique words in the keyphrases, and Y, which contains the unique keyphrases. Edges are created between words and the keyphrases they belong to, and both words and keyphrases are represented as non-negative integers for efficient processing.
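The construction step described above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `build_graph` and `build_category_graphs`, whitespace tokenization, and the dictionary-based storage are all assumptions made for illustration.

```python
from collections import defaultdict


def build_graph(keyphrases):
    """Build one word-to-keyphrase bipartite graph (hypothetical helper).

    Returns:
      word_ids   -- X side: unique word -> non-negative integer id
      phrase_ids -- Y side: unique keyphrase -> non-negative integer id
      edges      -- word id -> set of ids of keyphrases containing that word
    """
    word_ids = {}
    phrase_ids = {}
    edges = defaultdict(set)
    for phrase in keyphrases:
        # Assign the next free integer id the first time a keyphrase is seen.
        pid = phrase_ids.setdefault(phrase, len(phrase_ids))
        for word in phrase.split():  # assumption: whitespace tokenization
            wid = word_ids.setdefault(word, len(word_ids))
            edges[wid].add(pid)
    return word_ids, phrase_ids, dict(edges)


def build_category_graphs(keyphrases_by_leaf):
    """Build one graph per leaf category (input mapping is an assumption)."""
    return {leaf: build_graph(kps) for leaf, kps in keyphrases_by_leaf.items()}


words, phrases, edges = build_graph(["iphone case", "iphone charger"])
print(words)    # {'iphone': 0, 'case': 1, 'charger': 2}
print(phrases)  # {'iphone case': 0, 'iphone charger': 1}
print(edges)    # {0: {0, 1}, 1: {0}, 2: {1}}
```

Storing integers rather than strings on both sides keeps the edge sets compact, which is in line with the small model sizes the article reports.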
The inference phase, while not fully detailed in the provided text, likely involves using these bipartite graphs to generate keyphrase recommendations for new item titles. This approach allows GraphEx to overcome the limitations of adjacency and the presence of tokens in the item text, potentially leading to more relevant and diverse keyphrase suggestions.
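Since the inference procedure is not spelled out here, the following Python sketch shows only one plausible reading of it, not the authors' algorithm. It assumes that a keyphrase is a candidate when every one of its words occurs somewhere in the title, regardless of order or adjacency, and it ranks candidates by length as a stand-in for whatever ranking GraphEx really uses. The names `build_index` and `recommend` are hypothetical.

```python
from collections import Counter, defaultdict


def build_index(keyphrases):
    """Map each word to the set of keyphrases that contain it."""
    index = defaultdict(set)
    for phrase in keyphrases:
        for word in phrase.split():
            index[word].add(phrase)
    return index


def recommend(title, index):
    """Return keyphrases whose words all occur in the title (assumed rule)."""
    title_words = set(title.lower().split())
    hits = Counter()
    for word in title_words:
        for phrase in index.get(word, ()):
            hits[phrase] += 1
    # Keep a keyphrase only if every one of its distinct words was matched.
    covered = [p for p, n in hits.items() if n == len(set(p.split()))]
    # Assumed ranking: longer keyphrases first, then alphabetical order.
    return sorted(covered, key=lambda p: (-len(p.split()), p))


index = build_index(
    ["case iphone 13", "iphone 13", "samsung case", "iphone charger"]
)
print(recommend("Apple iPhone 13 Silicone Case", index))
# ['case iphone 13', 'iphone 13']
```

Note that "case iphone 13" is recommended even though its words are neither adjacent nor in the same order in the title, which is the kind of flexibility extraction methods bound by token adjacency lack.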
GraphEx’s design enables efficient scaling for billions of items and supports near real-time inference in resource-constrained environments, addressing key challenges in large-scale e-commerce platforms.
GraphEx demonstrates superior performance compared to other models in recommending keyphrases across multiple metrics and categories. The evaluation focuses on relevance, popularity (head vs. tail), and diversity of the recommended keyphrases. In terms of relevant proportion (RP) and head proportion (HP), GraphEx shows balanced performance. While some models such as RE and RE-trank have higher RP due to their limited predictions, GraphEx outperforms most models in HP, especially in larger categories. GraphEx consistently outperforms other models in relative relevant ratio (RRR) and relative head ratio (RHR), indicating its ability to recommend more relevant and popular keyphrases.
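The article does not give formulas for these metrics, so the Python sketch below rests entirely on assumed definitions: RP is taken as the share of a model's recommendations judged relevant, HP as the share that are both relevant and head keyphrases, and RRR/RHR as the ratio of one model's relevant (or relevant head) counts to another's. The function names are hypothetical, and the paper should be consulted for the exact definitions.

```python
def proportions(recommended, relevant, head):
    """Return (RP, HP) under the assumed definitions described above."""
    total = len(recommended)
    if total == 0:
        return 0.0, 0.0
    relevant_recs = [k for k in recommended if k in relevant]
    rp = len(relevant_recs) / total
    hp = sum(1 for k in relevant_recs if k in head) / total
    return rp, hp


def relative_ratio(count_model, count_baseline):
    """Assumed RRR/RHR: one model's count divided by a baseline model's."""
    return count_model / count_baseline


rp, hp = proportions(
    recommended=["a", "b", "c", "d"],
    relevant={"a", "b", "c"},
    head={"a", "d"},
)
print(rp, hp)                  # 0.75 0.25
print(relative_ratio(30, 10))  # 3.0
```

Under these assumptions, a model that makes only a few safe predictions can score a high RP while still surfacing few relevant keyphrases in absolute terms, which is why a relative count-based ratio is a useful complement.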
GraphEx excels at recommending diverse head keyphrases, outperforming other models by factors ranging from 1.11x to 23.9x across different categories. This diversity is crucial to increasing engagement with potential buyers. GraphEx’s execution performance is also strong. For inference latency, it achieves up to a 17x speedup over fastText and 13x over Graphite in the largest category (CAT_1). GraphEx also requires the least storage space for its models, even after building graphs for multiple leaf categories. Training time for GraphEx is significantly shorter, taking less than 1 minute across all categories, compared to hours or days for other models.
GraphEx’s engineering architecture for delivering keyphrase recommendations to sellers on the eBay platform demonstrates its efficiency and scalability in real-world applications. The system is designed to handle batch and near real-time (NRT) inference, adapting to different scenarios of item updates and additions. The batch inference process takes place in two parts: a one-time run over all items on eBay and a daily differential update for new or revised items. This approach ensures that the system maintains up-to-date recommendations while optimizing resource usage. NRT inference, crucial for newly created or revised items, is implemented using Python code hosted on eBay’s internal ML inference service, Darwin.
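The two-part batch flow can be pictured with a small Python sketch. It is only a schematic under stated assumptions: a plain dictionary stands in for the key-value store, `recommend_fn` is a placeholder for the model's inference call, and the function names `full_run` and `daily_differential` are hypothetical, not eBay's actual pipeline code.

```python
def full_run(items, recommend_fn, store):
    """One-time pass: compute and store recommendations for every item."""
    for item_id, title in items.items():
        store[item_id] = recommend_fn(title)


def daily_differential(items, changed_ids, recommend_fn, store):
    """Daily pass: recompute only items that are new or were revised."""
    for item_id in changed_ids:
        store[item_id] = recommend_fn(items[item_id])
    return len(changed_ids)


def toy_recommend(title):
    """Placeholder model: returns the lowercased title tokens."""
    return title.lower().split()


items = {1: "Red Shoes", 2: "Blue Hat"}
store = {}  # stand-in for a key-value store
full_run(items, toy_recommend, store)

# Next day: item 2 is revised and item 3 is newly listed.
items[2] = "Navy Hat"
items[3] = "Green Scarf"
updated = daily_differential(items, [2, 3], toy_recommend, store)
print(updated)   # 2
print(store[2])  # ['navy', 'hat']
```

The point of the split is that the daily job touches only the changed items, so its cost scales with daily churn rather than with the full catalogue.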
GraphEx’s performance on batch inference is particularly notable. Running on eBay’s Krylov machine learning platform, it processes 200 million items in just 1.5 hours, a significant improvement over fastText and Graphite, which take 1.75 and 1.5 days, respectively. The architecture uses eBay’s existing infrastructure, including Spark for data processing and a key-value store (NuKV) to serve recommendations. This integration allows GraphEx to scale effectively, handling billions of items and hundreds of billions of keywords across the eBay platform. GraphEx’s fast training time, comparable to Graphite but far superior to fastText, enables daily model updates. This frequent update cycle ensures that the system can quickly incorporate new keywords and trends, maintaining relevance in the dynamic e-commerce environment.
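To put the batch figures in perspective, the short Python calculation below converts them into throughput and speedup. It uses only the numbers quoted above; the variable names are illustrative, and it assumes the fastText and Graphite timings refer to the same 200 million item workload.

```python
ITEMS = 200_000_000

GRAPHEX_HOURS = 1.5
FASTTEXT_HOURS = 1.75 * 24  # 1.75 days expressed in hours
GRAPHITE_HOURS = 1.5 * 24   # 1.5 days expressed in hours

# Average throughput of the GraphEx batch run.
items_per_second = ITEMS / (GRAPHEX_HOURS * 3600)

# Wall-clock speedup of GraphEx relative to each baseline.
speedup_vs_fasttext = FASTTEXT_HOURS / GRAPHEX_HOURS
speedup_vs_graphite = GRAPHITE_HOURS / GRAPHEX_HOURS

print(round(items_per_second))  # 37037
print(speedup_vs_fasttext)      # 28.0
print(speedup_vs_graphite)      # 24.0
```

In other words, the reported batch run averages roughly 37,000 items per second and finishes about 28 times faster than fastText and 24 times faster than Graphite.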
GraphEx represents a significant advancement in keyphrase recommendation for e-commerce advertising. This robust graph-based extraction method effectively addresses the challenges of mapping item titles to relevant keyphrases without being limited by the item's vocabulary or token order. It is specifically designed for the online advertising sector on e-commerce platforms.
The main advantages of GraphEx include:
1. Improved relevance: Generates keyphrases that are more relevant to the item, which improves the accuracy of recommendations.
2. Focus on head keyphrases: By targeting the popular keyphrases that advertisers prefer, GraphEx helps generate more sales.
3. Scalability: Successfully implemented at eBay, it handles billions of items daily, demonstrating its ability to operate at scale.
4. Comprehensive evaluation: The researchers employed a combination of AI metrics and evaluations, recognizing that traditional metrics cannot accurately compare model performance.
5. Superior performance: When evaluated against existing production models at eBay, GraphEx demonstrated superior results across several metrics.
6. Efficient cold-start recommendations: Provides the most profitable keyphrase suggestions for new items or advertisers.
7. Low latency: GraphEx achieves the lowest inference latency in eBay’s current system, enabling fast, real-time recommendations.
8. Frequent Updates: The model allows for daily updates, ensuring that it remains responsive to the rapidly changing query space in e-commerce.
Simply put, GraphEx addresses critical challenges in recommending keyphrases for e-commerce advertising, offering a solution that balances relevance, popularity, and efficiency while demonstrating superior performance in a large-scale, real-world application.
Take a look at the Paper. All credit for this research goes to the researchers of this project.
Asjad is a consulting intern at Marktechpost. He is pursuing a Bachelor's degree in Mechanical Engineering at the Indian Institute of Technology, Kharagpur. Asjad is a Machine Learning and Deep Learning enthusiast who is always researching the applications of Machine Learning in the healthcare domain.