Recently, artificial intelligence (AI) models have shown remarkable improvement. The open source movement has made it easy for programmers to combine different open source models to create novel applications.
Stable Diffusion models can generate photorealistic images, as well as images in other styles, from text input. Because these models are usually large and computationally intensive, web applications that use them typically send all of the computation to GPU servers. On top of that, most workloads need a specific GPU family on which popular deep learning frameworks can run.
The Machine Learning Compilation (MLC) team presents a project intended to change this situation and bring more diversity to the ecosystem. The team believes there are numerous benefits to moving computation to the client, such as lower costs for service providers, better personalization, and stronger privacy protection.
According to the team, this requires bringing ML models to an environment that lacks the usual GPU-accelerated Python frameworks. AI frameworks often rely heavily on optimized compute libraries maintained by hardware vendors, and those libraries are not available in the browser, so the team has to start from scratch. To get the most out of each device, specialized variants must be generated for each client's environment.
The proposed Web Stable Diffusion project places the regular Stable Diffusion model directly in the browser and runs it on the client GPU of the user's laptop. Everything is handled locally within the browser and never touches a server. According to the team, this is the world's first browser-based Stable Diffusion.
Machine learning compilation (MLC) technology plays the central role here. PyTorch, Hugging Face diffusers and tokenizers, Rust, wasm, and WebGPU are some of the open source technologies on which the proposed solution rests. The main flow is built on Apache TVM Unity, an exciting work in progress within Apache TVM.
The team used Runway's Stable Diffusion v1-5 model from the Hugging Face diffusers library.
Key model components are captured into an IRModule in TVM using TorchDynamo and Torch FX. Each function in the TVM IRModule can be further transformed and compiled to executable code, allowing the functions to be deployed in any environment that can run at least the minimal TVM runtime (JavaScript being one of them).
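To make the capture step more concrete, here is a toy sketch of symbolic tracing, the general idea behind tools such as Torch FX: the function is run once on placeholder values that record each operation instead of computing it. This is not the torch.fx or TVM API. The `Graph`, `Proxy`, and `trace` names and the `scale_and_shift` function are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    """Ordered list of recorded operations as (output, op, inputs) tuples."""
    nodes: list = field(default_factory=list)

    def add(self, op, inputs):
        name = f"v{len(self.nodes)}"
        self.nodes.append((name, op, tuple(inputs)))
        return Proxy(self, name)


class Proxy:
    """Stand-in value that records arithmetic instead of computing it."""

    def __init__(self, graph, name):
        self.graph = graph
        self.name = name

    def __add__(self, other):
        return self.graph.add("add", [self.name, _name(other)])

    def __mul__(self, other):
        return self.graph.add("mul", [self.name, _name(other)])


def _name(value):
    # Proxies are referenced by name; plain Python numbers become constants.
    return value.name if isinstance(value, Proxy) else repr(value)


def trace(fn, arg_names):
    """Run fn once on proxies and return the captured graph and output name."""
    graph = Graph()
    args = [Proxy(graph, n) for n in arg_names]
    out = fn(*args)
    return graph, out.name


def scale_and_shift(x, w):
    # Hypothetical stand-in for a model component.
    return x * w + 1


graph, output = trace(scale_and_shift, ["x", "w"])
for node in graph.nodes:
    print(node)
```

The captured graph is plain data, which is what lets a compiler transform it and generate code for a target that has no Python at all.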
They use TensorIR and MetaSchedule to build automated solutions that generate optimized programs. These transformations are tuned on a specific device through its native GPU runtime and then used to generate optimized GPU shaders. The team provides a database that records these transformations, so future builds can be done without tuning.
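The idea of recording tuning results so that later builds skip the search can be sketched as a small keyed cache. This is an illustration under assumptions, not MetaSchedule's actual database format or API: the `TuningDatabase` class, the key layout, and the `expensive_tune` stand-in are all hypothetical.

```python
import json


class TuningDatabase:
    """Maps a workload key to the best schedule found for it (hypothetical format)."""

    def __init__(self):
        self.records = {}

    @staticmethod
    def key(op, shape, target):
        return f"{op}|{'x'.join(map(str, shape))}|{target}"

    def lookup_or_tune(self, op, shape, target, tune_fn):
        # Only run the expensive search when no record exists for this workload.
        k = self.key(op, shape, target)
        if k not in self.records:
            self.records[k] = tune_fn(op, shape)
        return self.records[k]

    def dumps(self):
        return json.dumps(self.records, sort_keys=True)

    @classmethod
    def loads(cls, text):
        db = cls()
        db.records = json.loads(text)
        return db


tune_calls = []


def expensive_tune(op, shape):
    # Stand-in for a real search: logs that it ran and picks a tile size.
    tune_calls.append((op, tuple(shape)))
    return {"tile": 8 if shape[-1] % 8 == 0 else 1}


# First build: the workload is tuned and the result is recorded.
db = TuningDatabase()
first = db.lookup_or_tune("matmul", [64, 64], "webgpu", expensive_tune)
shipped = db.dumps()

# Later build: the shipped records are loaded, so no tuning happens.
restored = TuningDatabase.loads(shipped)
second = restored.lookup_or_tune("matmul", [64, 64], "webgpu", expensive_tune)
print(second, "tuning runs:", len(tune_calls))
```

Shipping the serialized records alongside the code means a fresh build can reproduce the optimized result without access to the device used for tuning.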
They build static memory planning optimizations to reuse memory across multiple layers. The TVM web runtime uses Emscripten and TypeScript to facilitate deployment of the generated modules.
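As a rough illustration of static memory planning, the sketch below assigns intermediate tensors to shared buffers ahead of time, based on when each tensor is first and last used. It uses a simple greedy strategy chosen for brevity, not the algorithm TVM implements, and the tensor names, sizes, and lifetimes are made up.

```python
def plan_memory(tensors):
    """Assign tensors to reusable buffers.

    tensors: list of (name, size, first_use, last_use), with steps inclusive.
    Returns (assignment of name -> buffer index, list of buffer sizes).
    """
    buffers = []      # each entry: {"size": int, "free_at": int}
    assignment = {}
    for name, size, first, last in sorted(tensors, key=lambda t: t[2]):
        chosen = None
        for i, buf in enumerate(buffers):
            # A buffer can be reused once its previous tenant is dead
            # and it is large enough for the new tensor.
            if buf["free_at"] < first and buf["size"] >= size:
                chosen = i
                break
        if chosen is None:
            buffers.append({"size": size, "free_at": last})
            chosen = len(buffers) - 1
        else:
            buffers[chosen]["free_at"] = last
        assignment[name] = chosen
    return assignment, [b["size"] for b in buffers]


# Hypothetical activations of a four-layer chain: each output is
# consumed only by the next layer.
layers = [
    ("act0", 1024, 0, 1),
    ("act1", 1024, 1, 2),
    ("act2", 512, 2, 3),
    ("act3", 1024, 3, 4),
]
assignment, sizes = plan_memory(layers)
naive_total = sum(size for _, size, _, _ in layers)
print(assignment, "planned:", sum(sizes), "naive:", naive_total)
```

Because the plan is computed before execution, the runtime can allocate the shared buffers once instead of allocating per layer, which matters on memory-constrained client devices.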
They also use the wasm port of Hugging Face's Rust tokenizers library.
Except for the final step, which builds a 400-line JavaScript app to tie everything together, the entire workflow is done in Python. According to the team, this kind of interactive development is also an enjoyable way to bring in new models.
The open source community is what makes all of this possible. In particular, the team builds on TVM Unity, the most recent and exciting addition to the TVM project, which provides a Python-first interactive MLC development experience. This lets them compose further optimizations in Python and incrementally bring the app to the web. TVM Unity also makes it easy to quickly compose new solutions within the ecosystem.
All credit for this research goes to the researchers of this project.
Tanushree Shenwai is a consulting intern at MarktechPost. She is currently pursuing her B.Tech at the Indian Institute of Technology (IIT), Bhubaneswar. She is a data science enthusiast with a strong interest in the applications of artificial intelligence across various fields. She is passionate about exploring new advances in technology and their real-life applications.