The demand for optimized inference workloads has never been more critical in deep learning. Meet Hidet, an open-source deep learning compiler developed by a dedicated team at CentML Inc. This Python-based compiler aims to streamline the compilation process, offering end-to-end support for DNN models from PyTorch and ONNX down to efficient CUDA kernels, with a focus on NVIDIA GPUs.
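For readers who want to try it, Hidet plugs into PyTorch as a torch.compile backend. The snippet below is a minimal sketch, assuming a CUDA-capable machine with torch and hidet installed; exact behavior may vary across versions:

```python
import torch
import hidet  # noqa: F401  (importing hidet registers the 'hidet' torch.compile backend)

# A small stand-in model; any PyTorch module would do.
model = torch.nn.Sequential(
    torch.nn.Linear(256, 512),
    torch.nn.ReLU(),
    torch.nn.Linear(512, 128),
).cuda().eval()

x = torch.randn(16, 256, device='cuda')

with torch.no_grad():
    compiled = torch.compile(model, backend='hidet')  # route compilation through Hidet
    y = compiled(x)  # first call triggers kernel generation/tuning; later calls reuse the tuned CUDA kernels

print(y.shape)  # torch.Size([16, 128])
```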
Hidet emerged from the research presented in the paper “Hidet: Task-Mapping Programming Paradigm for Deep Learning Tensor Programs.” The compiler addresses the challenge of reducing the inference latency of deep learning models, which is vital for serving models efficiently on a variety of platforms, from cloud services to edge devices.
The development of Hidet is driven by the recognition that writing efficient tensor programs for deep learning operators is difficult, given the complexity of modern accelerators such as NVIDIA GPUs and Google TPUs and the rapidly growing number of operator types. While existing deep learning compilers, such as Apache TVM, rely on declarative scheduling primitives, Hidet takes a different approach.
The compiler embeds the scheduling process into tensor programs through dedicated mappings known as task mappings. Task mappings let developers define the assignment and ordering of computations directly within tensor programs, enlarging the space of expressible optimizations by allowing fine-grained manipulation at the program statement level. This approach is called the task-mapping programming paradigm.
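To make the paradigm concrete, here is a plain-Python sketch of what a task mapping expresses. The names spatial, repeat, and compose are illustrative stand-ins for the concepts described in the paper, not Hidet's actual syntax or API:

```python
# Conceptual sketch: a task mapping assigns each worker (e.g., a CUDA thread)
# an ordered list of (i, j) tasks; composing mappings builds tilings of the
# iteration space. Illustration only, not Hidet's real implementation.

def spatial(m, n):
    """m*n parallel workers, each assigned the single task matching its index."""
    def tasks(w):
        return [(w // n, w % n)]
    tasks.num_workers, tasks.task_shape = m * n, (m, n)
    return tasks

def repeat(m, n):
    """One worker that sequentially iterates over all m*n tasks."""
    def tasks(w):
        return [(i, j) for i in range(m) for j in range(n)]
    tasks.num_workers, tasks.task_shape = 1, (m, n)
    return tasks

def compose(outer, inner):
    """Refine each outer task into a tile of inner tasks (an outer-by-inner tiling)."""
    def tasks(w):
        ow, iw = w // inner.num_workers, w % inner.num_workers
        return [(oi * inner.task_shape[0] + ii, oj * inner.task_shape[1] + ij)
                for oi, oj in outer(ow) for ii, ij in inner(iw)]
    tasks.num_workers = outer.num_workers * inner.num_workers
    tasks.task_shape = (outer.task_shape[0] * inner.task_shape[0],
                        outer.task_shape[1] * inner.task_shape[1])
    return tasks

# 4x8 threads, each covering a 2x2 tile, together cover an 8x16 iteration space.
mapping = compose(spatial(4, 8), repeat(2, 2))
print(mapping.num_workers)  # 32
print(mapping(0))           # tasks for worker 0: [(0, 0), (0, 1), (1, 0), (1, 1)]
```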
Additionally, Hidet introduces a post-scheduling fusion optimization, which automates operator fusion after scheduling. This lets developers focus on writing individual operators while significantly reducing the engineering effort required for operator fusion. The paradigm also constructs an efficient, hardware-centric schedule space that is independent of the program input size, substantially reducing tuning time.
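The following sketch, written in plain NumPy rather than Hidet's internals, illustrates the intent of post-scheduling fusion: the scheduled anchor operator stays untouched while neighboring elementwise operations are folded into its epilogue:

```python
import numpy as np

# Conceptual sketch of post-scheduling fusion (not Hidet's actual machinery):
# the developer writes and schedules only the anchor operator (here, matmul);
# the compiler injects surrounding elementwise ops into the kernel's epilogue.

def scheduled_matmul(a, b, epilogue=lambda c: c):
    c = a @ b            # the hand-scheduled anchor operator
    return epilogue(c)   # fused elementwise ops run where the output is produced

# The subgraph matmul -> bias-add -> relu executes as one fused "kernel":
a, b = np.random.rand(64, 32), np.random.rand(32, 16)
bias = np.random.rand(16)
y = scheduled_matmul(a, b, epilogue=lambda c: np.maximum(c + bias, 0.0))
print(y.shape)  # (64, 16)
```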
Extensive experiments on modern convolution and transformer models demonstrate Hidet's strength: it outperforms state-of-the-art DNN inference frameworks such as ONNX Runtime and the TVM compiler equipped with the AutoTVM and Ansor schedulers. On average, Hidet achieves a 1.22x speedup, with a maximum performance gain of 1.48x.
In addition to its superior performance, Hidet is far cheaper to tune: compared to AutoTVM and Ansor, it reduces tuning times by 20x and 11x, respectively.
As Hidet continues to evolve, it sets new standards for efficiency and performance in deep learning compilation. With its task-mapping paradigm and fusion optimization, Hidet has the potential to become a cornerstone in the toolkit of developers looking to push the boundaries of deep learning model serving.
Niharika is a Technical Consulting Intern at Marktechpost. She is a third-year student currently pursuing her B.Tech degree at the Indian Institute of Technology (IIT) Kharagpur. She is a very enthusiastic person with a keen interest in machine learning, data science, and artificial intelligence, and an avid reader of the latest developments in these fields.