Image by author
In today's tech-savvy world, we are surrounded by mind-blowing wonders powered by AI: voice assistants that answer our questions, smart cameras that identify faces, and self-driving cars that navigate the roads. They are like the superheroes of our digital age! However, making these technological wonders run seamlessly on our everyday devices is harder than it seems. These AI superheroes have a special need: significant computing power and memory. It's like trying to fit an entire library into a tiny backpack. And guess what? Most of our usual devices, such as phones and smartwatches, simply don't have the 'intellectual capacity' to handle these AI superheroes. This is a major obstacle to the widespread deployment of AI technology.
Therefore, it is essential to make these large AI models more efficient so that they become accessible. The course "TinyML and Efficient Deep Learning Computing" by the MIT HAN Lab addresses exactly this obstacle: it introduces methods to optimize AI models and make them viable in real-world scenarios. Let's take a detailed look at what it offers:
Course structure:
Duration: Fall 2023
Time: Tuesday/Thursday 3:35-5:00 pm Eastern Time
Instructor: Professor Song Han
Teaching assistants: Han Cai and Ji Lin
As this is an ongoing course, you can watch the live stream at this link.
Course focus:
Theoretical foundation: It starts with fundamental concepts of deep learning and then progresses to sophisticated methods for efficient AI computing.
Practical experience: It provides hands-on experience by allowing students to implement and work with large language models like LLaMA 2 on their laptops.
1. Efficient inference
This module primarily focuses on improving the efficiency of AI inference. It delves into techniques such as pruning, sparsity, and quantization, aimed at making inference operations faster and more resource-efficient. Key topics covered include:
- Pruning and sparsity (Part I and II): These lectures explore methods to reduce model size by removing unnecessary parts without compromising performance.
- Quantization (Part I and II): Techniques for representing data and models using fewer bits, saving memory and computational resources.
- Neural architecture search (Part I and II): These lectures explore automated techniques to discover the best neural network architectures for specific tasks. They demonstrate practical uses in various areas, such as NLP, GAN, point cloud analysis, and pose estimation.
- Knowledge Distillation: This session focuses on knowledge distillation, a process in which a compact model is trained to mimic the behavior of a larger, more complex model. Its objective is to transfer knowledge from one model to another.
- MCUNet: TinyML on microcontrollers: This lecture introduces MCUNet, which focuses on deploying TinyML models on microcontrollers, allowing AI to run efficiently on low-power devices. It covers the essence of TinyML, its challenges, building compact neural networks, and its various applications.
- TinyEngine and parallel processing: This part discusses TinyEngine and explores methods for efficient deployment and parallel processing strategies such as loop optimization, multithreading, and memory layout for AI models on constrained devices.
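To make the first two techniques above concrete, here is a minimal pure-Python sketch of magnitude pruning and symmetric 8-bit quantization. This is an illustration of the general ideas only, not the course's actual implementation; in practice these operate on framework tensors (e.g. in PyTorch) rather than Python lists.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction of weights."""
    k = int(len(weights) * sparsity)  # number of weights to remove
    threshold = sorted(abs(w) for w in weights)[k - 1] if k > 0 else -1.0
    return [0.0 if abs(w) <= threshold else w for w in weights]

def quantize_int8(weights):
    """Map floats to int8 values with a shared symmetric scale."""
    scale = max(abs(w) for w in weights) / 127 or 1.0
    q = [round(w / scale) for w in weights]  # integers in [-127, 127]
    return q, scale

def dequantize(q, scale):
    return [x * scale for x in q]

w = [0.02, -1.5, 0.7, -0.03, 0.9, 2.1]
pruned = magnitude_prune(w, sparsity=0.5)  # half the weights become zero
q, s = quantize_int8(w)                    # 8-bit codes plus one float scale
```

Pruning shrinks the model by discarding weights that contribute little, while quantization keeps every weight but stores it in fewer bits; the two are often combined.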
2. Domain-specific optimization
In the Domain-Specific Optimization segment, the course covers several advanced topics aimed at optimizing AI models for specific domains:
- Transformer and LLM (Part I and II): These lectures cover Transformer basics and design variants, then move on to advanced topics: efficient inference algorithms for LLMs, efficient inference systems, and fine-tuning methods for LLMs.
- Vision transformer: This section introduces the basic concepts of the Vision Transformer, efficient ViT strategies, and various acceleration techniques. It also explores self-supervised learning methods and multimodal large language models (LLMs) to improve AI capabilities in vision-related tasks.
- GAN, video and point cloud: This lecture focuses on improving Generative Adversarial Networks (GANs) by exploring efficient GAN compression techniques (using NAS + distillation), AnyCost GAN for dynamic costs, and differentiable augmentation for data-efficient GAN training. These approaches aim to optimize models for GANs, video recognition, and point cloud analysis.
- Diffusion model: This lecture covers the structure, training, domain-specific optimization, and fast sampling strategies of diffusion models.
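The Transformer lectures build on scaled dot-product attention, the operation every efficiency trick in this segment ultimately targets. Here is a minimal pure-Python sketch of it (toy dimensions, no batching, masking, or multiple heads); the numbers are made up for illustration.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = len(Q[0])
    out = []
    for q in Q:
        # Similarity of this query to every key, scaled by sqrt(d)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        weights = softmax(scores)
        # Weighted average of the value vectors
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out

Q = [[1.0, 0.0]]                    # one query vector
K = [[1.0, 0.0], [0.0, 1.0]]        # two keys
V = [[1.0, 2.0], [3.0, 4.0]]        # two values
out = attention(Q, K, V)            # query attends over both key/value pairs
```

Because this computation is quadratic in sequence length, it is the main cost that efficient-inference systems for LLMs and ViTs try to reduce.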
3. Efficient training
Efficient training refers to the application of methodologies to optimize the training process of machine learning models. This chapter covers the following key areas:
- Distributed training (Part I and II): These lectures explore strategies for distributing training across multiple devices or systems. They cover ways to overcome bandwidth and latency bottlenecks, optimize memory consumption, and implement efficient parallelization methods to improve the training of large-scale machine learning models in distributed computing environments.
- On-device training and transfer learning: This session primarily focuses on training models directly on edge devices, handling memory constraints, and employing transfer learning methods for efficient adaptation to new domains.
- Efficient fine-tuning and prompt engineering: This section focuses on fine-tuning large language models (LLMs) using parameter-efficient techniques such as BitFit, Adapter, and Prompt-Tuning. Additionally, it introduces prompt engineering and illustrates how it can improve model performance and adaptability.
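The core idea behind a technique like BitFit is easy to sketch: during fine-tuning, update only the bias parameters and freeze everything else, so very little memory is needed for optimizer state. The parameter names, gradients, and learning rate below are invented for illustration; in practice this runs over a real model's named parameters.

```python
# Hypothetical model parameters and gradients (one SGD step, made-up values)
params = {
    "layer1.weight": [0.5, -0.2],
    "layer1.bias":   [0.1],
    "layer2.weight": [0.8],
    "layer2.bias":   [-0.3],
}
grads = {name: [1.0] * len(vals) for name, vals in params.items()}

# BitFit rule: only bias terms are trainable
trainable = {name for name in params if name.endswith("bias")}
lr = 0.1

for name in params:
    if name in trainable:  # SGD step on biases only; weights stay frozen
        params[name] = [p - lr * g for p, g in zip(params[name], grads[name])]
```

Adapter and prompt-tuning follow the same principle with different trainable subsets: small inserted modules and learned input embeddings, respectively.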
4. Advanced topics
This module covers topics in an emerging field of quantum machine learning. While detailed lectures for this segment are not yet available, topics planned for coverage include:
- Basics of quantum computing
- Quantum machine learning
- Noise-robust quantum ML
These topics will provide a fundamental understanding of quantum principles in computing and explore how these principles are applied to improve machine learning methods while addressing the challenges posed by noise in quantum systems.
If you are interested in diving deeper into this course, check out the playlist below:
https://www.youtube.com/watch?v=videoseries
This course has received fantastic feedback, especially from AI enthusiasts and professionals. Although the course is ongoing and scheduled to conclude in December 2023, I highly recommend joining! If you are taking this course or intend to do so, please share your experiences. Let's chat and learn together about TinyML and how to make AI smarter on small devices. Your input and ideas would be valuable!
Kanwal Mehreen is an aspiring software developer with a strong interest in data science and AI applications in medicine. Kanwal was selected as a Google Generation Scholar 2022 for the APAC region. Kanwal loves sharing technical knowledge by writing articles on trending topics and is passionate about improving the representation of women in the tech industry.