Oxford University Researchers Develop Deep Double Dueling Q-Learning to Translate Trading Signals into SOTA Trading Strategies

Successful quantitative trading strategies commonly work by generating trading signals that have a statistically significant correlation with future prices. These signals are then translated into actions that take positions to profit from the anticipated price movements. The higher the signal frequency and the turnover of the strategy, the more crucial execution performance becomes.

A limit order is an order to buy or sell a security at a specified price or better. Limit order books (LOBs) are a popular financial market mechanism and are used by exchanges around the world. A limit order book is the record of outstanding limit orders for a security, traditionally kept by a security specialist. The specialist who manages the book ensures that the highest-priority order is executed before other orders in the book and before orders held or submitted by other traders at the same or a worse price. The introduction of AI has had a significant impact on trading systems. Although studies have shown that LOB prices can be predicted over short time horizons, it remains a challenge to develop an optimal trading strategy that is fast enough to turn this predictability into trading profit.
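The priority rule described above can be illustrated with a toy order book. This is a minimal sketch, not any exchange's matching engine: the class name `LimitOrderBook`, the `submit` method, and all prices and quantities are assumptions made for the example. It fills the best-priced resting order first and, among orders at the same price, the one that arrived earliest.

```python
import heapq
from itertools import count


class LimitOrderBook:
    """Toy limit order book with price-time priority (illustrative only)."""

    def __init__(self):
        self._bids = []      # heap keyed on negated price, so the highest bid is on top
        self._asks = []      # heap keyed on price, so the lowest ask is on top
        self._seq = count()  # arrival counter: earlier orders win ties on price

    def submit(self, side, price, qty):
        """Match an incoming limit order, rest any remainder, and return the fills.

        Each fill is (arrival number of the resting order, trade price, quantity).
        """
        own, opposite = (self._bids, self._asks) if side == "buy" else (self._asks, self._bids)
        fills = []
        while qty > 0 and opposite:
            key, seq, rest_price, rest_qty = opposite[0]
            # A buy trades only at its limit price or lower, a sell at its limit or higher.
            crosses = price >= rest_price if side == "buy" else price <= rest_price
            if not crosses:
                break
            traded = min(qty, rest_qty)
            fills.append((seq, rest_price, traded))
            qty -= traded
            if traded == rest_qty:
                heapq.heappop(opposite)
            else:
                # Same key and arrival number, so the heap ordering is unaffected.
                opposite[0] = (key, seq, rest_price, rest_qty - traded)
        if qty > 0:
            key = -price if side == "buy" else price
            heapq.heappush(own, (key, next(self._seq), price, qty))
        return fills


book = LimitOrderBook()
book.submit("sell", 101, 5)   # arrival 0, rests in the book
book.submit("sell", 100, 5)   # arrival 1, better price
book.submit("sell", 100, 5)   # arrival 2, same price but later
fills = book.submit("buy", 101, 12)
print(fills)
```

In this example the incoming buy order is filled first against the two sell orders at 100, in the order they arrived, and only then against the order at 101.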

A research team at Oxford University proposes Deep Dueling Double Q-Learning with the APEX (asynchronous prioritized experience replay) architecture in its new paper, "Asynchronous Deep Double Dueling Q-Learning for Execution of Trading Signals in Limit Order Book Markets." The approach uses deep reinforcement learning (RL) to translate predictive signals into optimal limit order trading strategies. RL has already been used to learn a variety of tasks in limit order book markets, including trading, portfolio optimization, market making, and optimal trade execution.
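For readers unfamiliar with the two ideas in the method's name, the following is a minimal sketch of the standard dueling aggregation and the standard double Q-learning target. It is not the paper's code: the function names, the three-action setup, and all numbers are assumptions for illustration, and plain lists stand in for neural network outputs.

```python
def dueling_q_values(state_value, advantages):
    """Dueling head: Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a')."""
    mean_advantage = sum(advantages) / len(advantages)
    return [state_value + adv - mean_advantage for adv in advantages]


def double_q_target(reward, gamma, done, q_online_next, q_target_next):
    """Double Q-learning: the online network picks the next action,
    the target network supplies the value of that action."""
    if done:
        return reward
    best_action = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    return reward + gamma * q_target_next[best_action]


# Hypothetical network outputs for the next state, with three actions.
q_online_next = dueling_q_values(1.0, [0.5, -0.5, 0.0])
q_target_next = dueling_q_values(2.0, [0.25, 0.75, -1.0])

target = double_q_target(
    reward=0.5, gamma=0.99, done=False,
    q_online_next=q_online_next, q_target_next=q_target_next,
)
# A single-network target would bootstrap from max(q_target_next) instead.
naive_target = 0.5 + 0.99 * max(q_target_next)
print(target, naive_target)
```

Here the online estimate prefers the first action, so the double Q-learning target bootstraps from the target network's value for that action rather than from the target network's own maximum. Decoupling action selection from evaluation in this way is intended to reduce overestimation of action values.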

The setup allows limit orders to be placed at different prices in a LOB trading environment by defining a dedicated action and state space. The RL agent learns to manage its inventory using limit orders for single units of stock, holding long or short positions of varying size over time, and it also learns the timing and price-level placement of those orders.
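The article does not give the exact action and state definitions, so the following is only a hypothetical sketch of what such a design could look like. Every name and value in it (`PRICE_LEVELS`, `MAX_INVENTORY`, `allowed_actions`, `build_state`) is an assumption, not the researchers' specification.

```python
from itertools import product

# All names and values below are hypothetical, chosen only for illustration.
PRICE_LEVELS = (0, 1, 2)  # ticks behind the best quote on the order's own side
MAX_INVENTORY = 3         # position limit, in units of stock

# One "wait" action plus a single-unit buy or sell order at each price level.
ACTIONS = [("wait", None)] + list(product(("buy", "sell"), PRICE_LEVELS))


def allowed_actions(inventory):
    """Drop orders that could push the position past the inventory limit."""
    allowed = []
    for side, level in ACTIONS:
        if side == "buy" and inventory >= MAX_INVENTORY:
            continue
        if side == "sell" and inventory <= -MAX_INVENTORY:
            continue
        allowed.append((side, level))
    return allowed


def build_state(signal, inventory, best_bid, best_ask):
    """Pack the predictive signal, the scaled position, and the spread into one observation."""
    return (signal, inventory / MAX_INVENTORY, best_ask - best_bid)


print(len(ACTIONS), allowed_actions(MAX_INVENTORY))
print(build_state(signal=0.2, inventory=-1, best_bid=100, best_ask=101))
```

The point of the sketch is that one discrete action covers both the side and the price level of a single-unit order, while the agent's own inventory is part of what it observes.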

More generally, the work demonstrates a practical application of RL to the creation of limit order trading strategies, which are still typically hand-crafted as part of a trading system.

Because of high portfolio turnover, transaction costs can have a substantial and prohibitive impact on returns, which makes it difficult to turn high-frequency forecasts into profitable, tradable strategies. The researchers propose that RL can be a valuable tool for performing this translation and for learning the best strategy for a particular combination of signal and market. Customizing the strategy in this way has been found to improve performance significantly, and it removes the need to tune execution strategies by hand for different markets and signals. In practical applications, several different signals can be combined into a single observation space. The RL problem could then directly incorporate the task of integrating multiple forecasts into one coherent trading strategy.
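A small back-of-the-envelope calculation shows why turnover makes costs so important. The numbers are invented for illustration and do not come from the paper, and the helper `daily_pnl_bps` is a hypothetical name.

```python
def daily_pnl_bps(edge_per_trade_bps, cost_per_trade_bps, trades_per_day):
    """Return (gross, cost, net) daily P&L in basis points of traded notional."""
    gross = edge_per_trade_bps * trades_per_day
    cost = cost_per_trade_bps * trades_per_day
    return gross, cost, gross - cost


# Hypothetical strategies: the same 1 bp cost per trade, very different turnover.
slow = daily_pnl_bps(edge_per_trade_bps=20.0, cost_per_trade_bps=1.0, trades_per_day=2)
fast = daily_pnl_bps(edge_per_trade_bps=1.5, cost_per_trade_bps=1.0, trades_per_day=200)

for name, (gross, cost, net) in (("slow", slow), ("fast", fast)):
    share = cost / gross
    print(f"{name}: gross={gross:.0f} bps, cost={cost:.0f} bps ({share:.0%} of gross), net={net:.0f} bps")
```

With these made-up figures, costs take 5% of the slow strategy's gross P&L but about two thirds of the fast strategy's, so a small change in execution cost per trade decides whether the high-frequency signal is profitable at all.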

Check out the Paper. All credit for this research goes to the researchers of this project.

Niharika is a technical consulting intern at Marktechpost. She is a third-year student currently pursuing her B.Tech at the Indian Institute of Technology (IIT), Kharagpur. She is an enthusiastic individual with a strong interest in machine learning, data science, and artificial intelligence, and an avid reader of the latest developments in these fields.