Motivated by the phenomenon of strategic agents gaming a recommender system to maximize the number of times they are recommended to users, we study a strategic variant of the linear contextual bandit problem in which the arms can strategically misreport privately observed contexts to the learner. We treat the algorithm design problem as one of \emph{mechanism design} under uncertainty and propose the Optimistic Grim Trigger Mechanism (OptGTM), which minimizes regret while incentivizing the agents to be approximately truthful. We show that OptGTM achieves sublinear regret even though the agents are unconstrained in their ability to fool the learner by misreporting contexts. We then show that ignoring the strategic nature of the agents results in linear regret. However, a trade-off between incentive compatibility and regret minimization is shown to be unavoidable. More generally, this work provides insight into the intersection of online learning and mechanism design.