As the Internet and digital economy grow, online commerce, video apps, and web ads generate enormous numbers of clicks. The number of click samples in a typical industry dataset has reached hundreds of billions and continues to grow daily. Click-through rate (CTR) prediction estimates whether a user will click on a suggested item. It is a core task in recommendation and advertising systems, where accurate CTR prediction directly improves user experience and ad revenue.
Because CTR prediction is time-sensitive (e.g., it must capture recent topics and the interests of new users), keeping a model up to date requires reducing the time needed to retrain on a large dataset. Shorter training also lowers training costs, yielding a better return on investment under a fixed IT budget. In recent years, GPU processing power has grown rapidly, and as GPU memory and FLOPS increase, larger batch sizes can better exploit GPU parallelism. Figure 1(a) shows that a single forward and backward step takes roughly the same time when the batch size is scaled eight times, indicating that GPUs are severely underutilized at small batch sizes.
The researchers focus on an accuracy-preserving approach to increasing the batch size on a single GPU, which extends easily to multi-node training, avoiding system-level optimization and reducing communication costs. Because the number of training epochs stays constant, large-batch training decreases the number of steps and thus greatly reduces overall training time (Figure 1(b)). A large batch helps even more in a multi-GPU environment, where the gradients of the large embedding layer must be communicated across GPUs and machines at high cost.
The problem with large-batch training is the loss of accuracy incurred by naively increasing the batch size, and CTR prediction is a delicate task that cannot tolerate such loss. Hyperparameter scaling rules and optimization techniques designed for CV and NLP tasks are not well suited to CTR prediction, because embedding layers dominate the network-wide parameters (e.g., 99.9%; see Table 1) and the inputs are sparser and frequency-imbalanced. In this study, the researchers explain why previously proposed scaling rules fail for CTR prediction and provide an effective algorithm and scaling rule for large-batch training.
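As a rough illustration of why embedding parameters dominate, the following back-of-the-envelope Python snippet compares an embedding table against the dense layers of a Criteo-like model. The vocabulary size, embedding dimension, and MLP widths here are hypothetical values chosen for illustration, not figures from the paper.

```python
# Back-of-the-envelope illustration (hypothetical sizes, not the paper's
# exact configuration) of why embedding tables dominate CTR model parameters.

vocab_size = 33_000_000   # total number of categorical feature IDs (assumed)
embed_dim = 16            # embedding dimension (assumed)
mlp_units = [1024, 512, 256, 1]

embed_params = vocab_size * embed_dim

# Dense MLP on top of 39 concatenated feature embeddings (Criteo-like input).
mlp_params, prev = 0, 39 * embed_dim
for units in mlp_units:
    mlp_params += prev * units + units  # weights + biases
    prev = units

total = embed_params + mlp_params
print(f"embedding share: {embed_params / total:.4%}")  # roughly 99.8%
```

Even with generous MLP widths, the dense layers contribute only about a million parameters against hundreds of millions in the embedding table, which is why scaling rules tuned for dense CV/NLP networks transfer poorly.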
The paper's key contributions are as follows:
• To the best of their knowledge, the researchers are the first to study the stability of training CTR prediction models at very large batch sizes.
• Through careful mathematical analysis, they show that the learning rate of infrequent features should not be scaled up as the batch size increases.
• They attribute the difficulty of scaling the batch size to the imbalance in ID frequencies, and show that with CowClip the batch size can be increased using a simple and effective scaling rule.
• To stabilize the training process for CTR prediction, they propose an effective optimization technique, adaptive Column-wise Clipping (CowClip); a sketch of the idea follows below. With it, they successfully scale four models to 128 times the original batch size on two public datasets. On the Criteo dataset, they train the DeepFM model with a 72x speedup and a 0.1% AUC improvement.
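The article does not spell out the clipping rule itself, so below is a minimal NumPy sketch of column-wise adaptive clipping in the spirit of CowClip: each ID's embedding-row gradient is clipped to a threshold proportional to that row's weight norm (floored by a small constant) and to the ID's in-batch count. The function name `cowclip`, the default hyperparameter values, and the exact way the count enters the threshold are our assumptions for illustration; consult the paper and its open-source code for the precise rule.

```python
import numpy as np

def cowclip(embed_weights, embed_grads, ids_in_batch, r=1e-5, zeta=1e-8):
    """Column-wise adaptive clipping (illustrative sketch, not the
    authors' implementation).

    Clips the gradient of each ID's embedding row so that its norm is at
    most proportional to the norm of that row's current weights. `r` and
    `zeta` are illustrative defaults, not the paper's tuned values.
    """
    unique_ids, counts = np.unique(ids_in_batch, return_counts=True)
    clipped = embed_grads.copy()
    for idx, count in zip(unique_ids, counts):
        w_norm = np.linalg.norm(embed_weights[idx])
        g_norm = np.linalg.norm(embed_grads[idx])
        # The threshold scales with the row's weight norm (floored by zeta)
        # and, following the paper's intuition, with how often the ID occurs
        # in the batch, since its gradient accumulates over occurrences.
        threshold = max(r * w_norm, zeta) * count
        if g_norm > threshold:
            clipped[idx] = embed_grads[idx] * (threshold / g_norm)
    return clipped

# Hypothetical usage with a tiny embedding table:
rng = np.random.default_rng(0)
W = rng.normal(size=(10, 4))          # 10 IDs, 4-dim embeddings
G = rng.normal(size=(10, 4))          # accumulated batch gradient
ids = np.array([0, 0, 3, 3, 3, 7])    # IDs appearing in the batch
G_clipped = cowclip(W, G, ids, r=0.1)
```

Because the threshold for a rarely seen ID stays small while its weights stay near initialization, this per-column rule keeps infrequent features from taking destabilizing gradient steps at large batch sizes, which matches the scaling analysis summarized above.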
The project's complete codebase is open-sourced on GitHub.
Check out the Paper and GitHub. All credit for this research goes to the researchers of this project.
Aneesh Tickoo is a consulting intern at MarktechPost. She is currently pursuing her bachelor's degree in Data Science and Artificial Intelligence at the Indian Institute of Technology (IIT), Bhilai. She spends most of her time working on projects aimed at harnessing the power of machine learning. Her research interest is image processing, and she is passionate about building solutions around it. She loves connecting with people and collaborating on interesting projects.