Optimizing machine learning models with dynamic shapes is crucial for achieving better performance and flexibility. Dynamic shapes refer to a model's ability to handle input data with different dimensions at runtime. Frameworks that support dynamic computation graphs, such as TensorFlow eager mode or PyTorch eager execution, let users build models that adapt to varying input sizes at runtime.
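To make this concrete, here is a minimal PyTorch sketch (not from the paper) showing how eager execution naturally handles inputs whose batch dimension changes from call to call:

```python
import torch
import torch.nn as nn

# A plain PyTorch module; eager execution re-evaluates shapes on every call,
# so the same weights serve inputs with different batch sizes.
model = nn.Linear(in_features=64, out_features=32)

for batch in (1, 8, 13):  # varying batch size at runtime
    x = torch.randn(batch, 64)
    y = model(x)
    print(y.shape)  # torch.Size([1, 32]), ([8, 32]), ([13, 32])
```

The flexibility comes for free in eager mode because shapes are resolved on each call; the difficulty, as discussed next, is recovering this flexibility in an ahead-of-time compiler.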
There are many challenges in optimizing machine learning models with dynamic shapes, since many traditional optimizations rely on static shape analysis. Missing information about dynamic dimensions significantly limits the optimizations that can be performed across operators and functions. Models with dynamic shapes must also handle batches of different sizes, and optimizing for varying batch sizes is harder than optimizing for a fixed one, particularly in production environments.
Current machine learning (ML) compilers typically lower programs to hardware in a traditional single-shot lowering pipeline, applying one optimization after another and rewriting the program into progressively lower-level representations. This approach often loses shape and other information between abstraction layers, making it difficult to perform incremental optimizations across layer boundaries.
Researchers present Relax, a compiler abstraction for optimizing end-to-end dynamic machine learning workloads. Relax provides first-class symbolic shape annotations to track dynamic shape computations globally across the program. It also provides a cross-level abstraction that encapsulates computational graphs, loop-level tensor programs, and library calls in a single representation, enabling cross-level optimizations. The result is an end-to-end compilation framework for optimizing dynamic shape models.
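The sketch below illustrates the idea of first-class symbolic shapes using TVM's open-source Relax implementation and its TVMScript syntax (an assumption on our part; the paper's prototype may differ in details). The parameter annotation introduces a symbolic dimension `n`, which the compiler tracks through subsequent operations:

```python
import tvm
from tvm.script import ir as I, relax as R

# A Relax function whose input carries a symbolic dimension "n".
# The annotation R.Tensor(("n", 4), "float32") is tracked through the program,
# so the compiler knows the output of R.add shares the same symbolic shape.
@I.ir_module
class Module:
    @R.function
    def main(x: R.Tensor(("n", 4), "float32")) -> R.Tensor(("n", 4), "float32"):
        y: R.Tensor(("n", 4), "float32") = R.add(x, x)
        return y
```

Because `n` is a program-level symbol rather than an unknown, later passes can reason about relationships between shapes (for example, that input and output rows match) without fixing a concrete batch size.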
The researchers adopt a forward deduction method that deduces the annotation of an expression from its input components. Forward deduction is simple and local, and annotations for temporary variables can be obtained during compiler passes. Additionally, when shapes cannot be deduced automatically, forward deduction can use the results of a user-inserted match cast to bind symbolic variables and continue inferring subsequent annotations.
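A hedged sketch of this mechanism, again in TVM's TVMScript syntax (the exact API is our assumption): when a tensor arrives with an unknown shape, a match cast binds fresh symbolic variables at runtime, after which forward deduction can annotate everything downstream.

```python
import tvm
from tvm.script import ir as I, relax as R, tir as T

@I.ir_module
class DynModule:
    @R.function
    def main(
        x: R.Tensor(dtype="float32", ndim=2)
    ) -> R.Tensor(dtype="float32", ndim=2):
        n = T.int64()
        m = T.int64()
        # x's shape is unknown; match_cast binds symbolic vars n and m at
        # runtime, letting forward deduction annotate later bindings.
        y = R.match_cast(x, R.Tensor((n, m), "float32"))
        z: R.Tensor((n, m), "float32") = R.add(y, y)
        return z
```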
The researchers note that all optimizations in Relax are performed as composable, shape-aware transformations. This allows parts of the computation to be incrementally optimized or partially lowered using different approaches, taking analysis from other levels into account and incorporating further optimizations that assume dynamic shape relationships.
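In TVM's Relax implementation, such transformations are expressed as passes that can be composed into a pipeline. The following is a minimal sketch; the pass names come from TVM's codebase, but this particular selection and ordering is our illustrative assumption, not the paper's full pipeline:

```python
import tvm
from tvm import relax

# A composable, shape-aware pass pipeline applied to the earlier IRModule.
seq = tvm.transform.Sequential(
    [
        relax.transform.LegalizeOps(),           # lower graph-level ops to TensorIR loops
        relax.transform.AnnotateTIROpPattern(),  # classify loop nests for fusion analysis
        relax.transform.FuseOps(),               # shape-aware operator fusion at graph level
        relax.transform.FuseTIR(),               # merge fused groups into single TensorIR functions
    ]
)
optimized_mod = seq(Module)  # Module: the IRModule from the earlier sketch
```

Because each pass operates on the same cross-level representation, graph-level fusion decisions can see loop-level structure and symbolic shape relationships rather than working on an opaque lowered form.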
Experimental results show that Relax compiles and optimizes emerging LLMs on various hardware backends, delivering performance competitive with heavily optimized platform-specific solutions. In addition, Relax supports LLMs on a broad set of devices and environments, including mobile phones, embedded devices, and web browsers (through WebAssembly and WebGPU).
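As a final hedged sketch of the end-to-end flow (target strings and runtime details are assumptions based on TVM's public API), a compiled module can be built for a backend and executed with the Relax virtual machine; swapping the target string is the mechanism behind the multi-device support described above:

```python
import numpy as np
import tvm
from tvm import relax

# Compile the optimized module for a target ("llvm" for CPU here; GPU, mobile,
# and "webgpu" targets follow the same pattern) and run it on the Relax VM.
ex = relax.build(optimized_mod, target="llvm")
vm = relax.VirtualMachine(ex, tvm.cpu())

x = tvm.nd.array(np.random.rand(7, 4).astype("float32"))  # n = 7 at runtime
out = vm["main"](x)
print(out.numpy().shape)  # (7, 4)
```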
Check out the Paper. All credit for this research goes to the researchers of this project.
Arshad is an intern at MarktechPost. He is currently pursuing his integrated Master's degree in Physics at the Indian Institute of Technology Kharagpur. He believes that understanding things at the fundamental level leads to new discoveries, which in turn advance technology. He is passionate about understanding nature fundamentally with the help of tools such as mathematical models, machine learning models, and artificial intelligence.