Sponsored content
By Rajan Mistry, Sr. Application Engineer at Qualcomm Developer Network
Today, you can’t help but read the media headlines about AI and the increasing sophistication of generative AI models like Stable Diffusion. A great example of a generative AI use case on Windows is Microsoft 365 Copilot. This AI assistant can perform tasks like analyzing your spreadsheets, generating content, and organizing your meetings.
And while that intelligence may seem magical, it does not arise by magic. It is built on powerful machine learning models that have evolved rapidly, and the key enabler of those models is the rich set of modeling frameworks that let ML developers experiment and collaborate.
One of these emerging machine learning frameworks is ONNX Runtime (ONNX RT). The open-source framework’s underlying ONNX format allows ML developers to exchange models, while ONNX RT can run them from a variety of languages (e.g., Python, C++, C#) and on a variety of hardware platforms.
Our Qualcomm AI Stack now supports ONNX RT, enabling hardware-accelerated AI in Windows on Snapdragon applications. In case you haven’t heard, Windows on Snapdragon is the next-generation Windows platform, built on years of evolution in mobile computing. Its key features include heterogeneous computing, up to all-day battery life, and the Qualcomm Hexagon NPU.
Let’s take a closer look at how you can use the Qualcomm AI Stack with ONNX RT for bare-metal, hardware-accelerated AI in your Windows apps on Snapdragon.
ONNX Runtime support in the Qualcomm AI Stack
The Qualcomm AI Stack, shown in Figure 1 below, provides the tools and runtimes to take advantage of the NPU at the edge:
Figure 1: The Qualcomm AI Stack provides hardware and software components for AI at the edge on all Snapdragon platforms.
At the top of the stack are popular AI frameworks for generating models. These models can then be run on various AI runtimes, including ONNX RT. ONNX RT includes an execution provider that uses the Qualcomm AI Engine Direct SDK for bare-metal inference on several Snapdragon cores, including the Hexagon NPU. Figure 2 shows a more detailed view of the Qualcomm AI Stack components:
Figure 2: Overview of the Qualcomm AI Stack, including its support for the runtime framework and backend libraries.
Application-level integration
At the application level, developers can compile their applications against an ONNX Runtime build that includes support for the Qualcomm AI Engine Direct SDK. The ONNX RT execution provider constructs a graph from an ONNX model and executes it in a supported backend library.
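A quick sanity check is to list the execution providers compiled into your ONNX Runtime build before attempting NPU-accelerated inference. This is a minimal sketch; it assumes a QNN-enabled ONNX Runtime build, where the Qualcomm provider is registered under the name QNNExecutionProvider:

```python
# Confirm that this ONNX Runtime build was compiled with the
# Qualcomm (QNN) execution provider.
import onnxruntime as ort

print(ort.get_available_providers())
# A QNN-enabled build is expected to list "QNNExecutionProvider"
# alongside the default "CPUExecutionProvider".
```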
Developers can use the ONNX Runtime APIs, which provide a consistent interface across all execution providers. The runtime is also designed to support various programming languages such as Python, C/C++, C#, Java, and Node.js.
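For example, here is a minimal Python sketch of creating a session that targets the Hexagon NPU. The model filename is a placeholder, and the provider option shown (backend_path pointing at the QnnHtp.dll backend shipped with the SDK) reflects the QNN execution provider’s documented options; check the SDK documentation for the exact setup on your platform:

```python
# Sketch: running an ONNX model on the Hexagon NPU via the QNN
# execution provider, with CPU as a fallback for unsupported nodes.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession(
    "model.onnx",  # placeholder model path
    providers=[
        ("QNNExecutionProvider", {"backend_path": "QnnHtp.dll"}),  # NPU backend
        "CPUExecutionProvider",  # fallback execution provider
    ],
)

# Build a dummy input matching the model's declared shape and type,
# substituting 1 for any symbolic (dynamic) dimensions.
input_meta = session.get_inputs()[0]
shape = [d if isinstance(d, int) else 1 for d in input_meta.shape]
dummy = np.random.rand(*shape).astype(np.float32)

outputs = session.run(None, {input_meta.name: dummy})
print(outputs[0].shape)
```

Because the API surface is identical across execution providers, switching between NPU, GPU, and CPU execution is just a change to the providers list.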
We offer two options for generating context binaries. One way is to use the Qualcomm AI Engine Direct toolchain. Alternatively, developers can generate the binary using the ONNX RT execution provider, which in turn uses the Qualcomm AI Engine Direct API. Context binaries help applications reduce the time spent compiling the network: the binary is created the first time the application runs, and on subsequent runs the model is loaded from the cached context binary file.
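Below is a hedged sketch of the second option, enabling context caching through session configuration. The config entry names shown follow recent ONNX Runtime releases that expose EP context caching; option names have varied across releases, so treat them as assumptions and consult the QNN execution provider documentation for your version:

```python
# Sketch: caching the compiled context so subsequent runs skip
# recompilation. The entry names "ep.context_enable" and
# "ep.context_file_path" are assumptions based on recent ONNX
# Runtime releases; verify them against your version's docs.
import onnxruntime as ort

so = ort.SessionOptions()
so.add_session_config_entry("ep.context_enable", "1")
so.add_session_config_entry("ep.context_file_path", "model_ctx.onnx")

session = ort.InferenceSession(
    "model.onnx",  # placeholder model path
    sess_options=so,
    providers=[("QNNExecutionProvider", {"backend_path": "QnnHtp.dll"})],
)
# The first run compiles the graph and writes model_ctx.onnx; later
# runs can load the cached context model instead of recompiling.
```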
Getting started
When you’re ready to get started, visit the Qualcomm AI Engine Direct SDK page, where you can download the SDK and access the documentation.
Snapdragon and Qualcomm branded products are products of Qualcomm Technologies, Inc. and/or its subsidiaries.