At Meta, AI workloads are ubiquitous, serving as the foundation for applications such as content understanding, feeds, generative AI, and ad ranking. With its tight Python integration, eager-mode programming, and straightforward APIs, PyTorch can run these workloads effectively. In particular, deep learning recommendation models (DLRMs) are vital to improving user experiences across all of Meta's products and offerings. As these models grow in size and complexity, the underlying hardware must supply more and more memory and compute, all without sacrificing efficiency.
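To make the DLRM workload concrete, here is a minimal, hypothetical sketch of the model family's shape (the sizes and layer choices below are illustrative assumptions, not Meta's production configuration): sparse categorical features are looked up in embedding tables, dense features pass through an MLP, and the combined features feed a final predictor.

```python
import torch
import torch.nn as nn

class TinyDLRM(nn.Module):
    """Toy DLRM-style model: embeddings for sparse features + MLPs."""

    def __init__(self, num_embeddings=1000, dim=16, num_sparse=3, num_dense=4):
        super().__init__()
        # One embedding table per sparse (categorical) feature.
        self.tables = nn.ModuleList(
            nn.Embedding(num_embeddings, dim) for _ in range(num_sparse)
        )
        # Bottom MLP projects dense features into the embedding space.
        self.bottom_mlp = nn.Sequential(nn.Linear(num_dense, dim), nn.ReLU())
        # Top MLP produces a click-through-style probability.
        self.top_mlp = nn.Sequential(
            nn.Linear(dim * (num_sparse + 1), 1), nn.Sigmoid()
        )

    def forward(self, dense, sparse):
        parts = [self.bottom_mlp(dense)]
        parts += [table(sparse[:, i]) for i, table in enumerate(self.tables)]
        return self.top_mlp(torch.cat(parts, dim=1))

model = TinyDLRM()
dense = torch.randn(2, 4)                      # batch of 2, 4 dense features
sparse = torch.randint(0, 1000, (2, 3))        # 3 categorical feature ids
print(model(dense, sparse).shape)              # torch.Size([2, 1])
```

The embedding tables are what make DLRMs memory-hungry: in production they can hold billions of rows, which is why accelerator memory capacity and bandwidth matter so much here.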
GPUs are not always the best choice for highly efficient processing of Meta's unique recommendation workloads at scale. To address this gap, the Meta team developed a family of application-specific integrated circuits (ASICs) called the Meta Training and Inference Accelerator (MTIA). Designed with next-generation recommendation models in mind, the first-generation ASIC is integrated with PyTorch to form a fully optimized ranking system. The team also maintains compatibility with PyTorch 2.0, whose compiler-level improvements dramatically boost performance, keeping developers productive.
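The PyTorch 2.0 compiler path lets an accelerator plug in as a backend that receives a captured graph and returns an executable. The sketch below shows the general mechanism only; `toy_backend` is a hypothetical stand-in, and MTIA's real backend and its registration are internal to Meta, not public API.

```python
import torch

def toy_backend(gm, example_inputs):
    # A real accelerator backend would lower `gm` (an FX GraphModule)
    # to device kernels here. This stand-in just runs the captured
    # graph's Python code, to show where lowering would happen.
    return gm.forward

model = torch.nn.Sequential(torch.nn.Linear(8, 4), torch.nn.ReLU())

# torch.compile accepts a callable backend with this (gm, inputs) signature.
compiled = torch.compile(model, backend=toy_backend)

out = compiled(torch.randn(2, 8))
print(out.shape)  # torch.Size([2, 4])
```

Because the compiler hands the backend whole graphs rather than single ops, an ASIC backend can fuse operations and plan memory movement across the graph, which is where much of the PyTorch 2.0 performance gain comes from.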
In 2020, the team created the original MTIA ASIC to handle Meta's internal inference needs. Co-designed with PyTorch and Meta's recommendation models, this inference accelerator is part of a full-stack solution. Fabricated on TSMC's 7nm process and running at 800 MHz, the accelerator achieves 102.4 TOPS at INT8 precision and 51.2 TFLOPS at FP16 precision, within a thermal design power (TDP) of 25 W.
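These headline figures are internally consistent, which a back-of-the-envelope check makes visible (the per-PE breakdown below is our own inference from the published totals, not a figure from Meta's spec):

```python
# Published figures from the article above.
PES = 64              # 8x8 grid of processing elements
CLOCK_HZ = 800e6      # 800 MHz
INT8_OPS = 102.4e12   # 102.4 TOPS at INT8
FP16_OPS = 51.2e12    # 51.2 TFLOPS at FP16

# If the 64 PEs together deliver 102.4 TOPS at 800 MHz, each PE must
# sustain 2000 INT8 ops per cycle -- i.e. 1000 multiply-accumulates,
# counting each MAC as two ops.
ops_per_pe_per_cycle = INT8_OPS / (PES * CLOCK_HZ)
print(ops_per_pe_per_cycle)   # 2000.0

# INT8 peak is exactly twice the FP16 peak, a common ratio when the
# same datapath processes two INT8 values per FP16 lane.
print(INT8_OPS / FP16_OPS)    # 2.0
```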
The accelerator comprises processing elements (PEs), on-chip and off-chip memory resources, and interconnects arranged in a grid structure. A separate control subsystem within the accelerator runs the system firmware, which coordinates the execution of jobs on the accelerator, manages the available compute and memory resources, and communicates with the host through a dedicated host interface. The memory subsystem uses LPDDR5 for off-chip DRAM, allowing expansion up to 128 GB. The 128 MB of on-chip SRAM, shared among all the PEs, provides higher bandwidth and much lower latency for frequently accessed data and instructions.
The 64 PEs in the grid are arranged in an 8-by-8 matrix, and each PE has 128 KB of local SRAM for fast data storage and processing. A mesh network links the PEs to one another and to the memory banks. The grid can be used as a whole to run a single job, or divided into multiple sub-grids, each of which can run an independent job. Each PE contains two processor cores and multiple fixed-function units optimized for key tasks such as matrix multiplication, accumulation, data movement, and nonlinear function computation. The RISC-V ISA-based processor cores have been extensively customized to perform the required compute and control operations. The architecture was designed to take full advantage of two elements essential to effective workload handling: parallelism and data reuse.
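One way such a grid exploits parallelism is output tiling: each PE owns one tile of a matrix product and computes it independently. The pure-Python sketch below illustrates that partitioning scheme only; it is not MTIA's actual scheduler or dataflow.

```python
GRID = 8  # 8x8 grid of PEs, as in MTIA

def pe_tile(a, b, pe_row, pe_col, tile_m, tile_n):
    """Compute the (pe_row, pe_col) output tile of A @ B.

    In hardware, each PE would run this on its own slice of the
    operands, reusing data held in its 128 KB local SRAM.
    """
    k = len(b)
    rows = range(pe_row * tile_m, (pe_row + 1) * tile_m)
    cols = range(pe_col * tile_n, (pe_col + 1) * tile_n)
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in cols]
            for i in rows]

# 16x16 matrices: each of the 64 PEs owns a 2x2 output tile.
n = 16
a = [[(i + j) % 3 for j in range(n)] for i in range(n)]
b = [[(i * j) % 5 for j in range(n)] for i in range(n)]
tile = n // GRID

# "Run" all 64 PEs (sequentially here) and stitch their tiles into C.
c = [[0] * n for _ in range(n)]
for pr in range(GRID):
    for pc in range(GRID):
        t = pe_tile(a, b, pr, pc, tile, tile)
        for i in range(tile):
            for j in range(tile):
                c[pr * tile + i][pc * tile + j] = t[i][j]
```

Because every PE reads a full row-band of A and column-band of B, the shared 128 MB SRAM keeps those reused operands close to the grid instead of refetching them from DRAM.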
The researchers compared MTIA with an NNPI accelerator and a GPU. The results show that MTIA excels at the small shapes and batch sizes of low-complexity models, and the team is actively optimizing its software stack to reach similar performance on other workloads. Medium- and high-complexity models, which use larger shapes, currently run more efficiently on the GPU, whose software stack is significantly more mature.
To optimize performance for Meta's workloads, the team is now focused on striking a balance between compute power, memory capacity, and interconnect bandwidth to develop a better and more efficient solution.
Tanushree Shenwai is a consulting intern at MarktechPost. She is currently pursuing her B.Tech at the Indian Institute of Technology (IIT), Bhubaneswar. She is a data science enthusiast with a strong interest in the applications of artificial intelligence across various fields, and is passionate about exploring new advances in technology and their real-life applications.