Sponsored content
By Rajan Mistry, Sr. Application Engineer at Qualcomm Developer Network
Today, you can’t help but read the media headlines about AI and the increasing sophistication of generative AI models like Stable Diffusion. A great example of a generative AI use case on Windows is Microsoft 365 Copilot. This AI assistant can perform tasks like analyzing your spreadsheets, generating content, and organizing your meetings.
And while that intelligence may seem magical, it does not arise by magic. It is built on powerful machine learning models that have evolved rapidly, and the key enabler of those models is the rich set of modeling frameworks that let ML developers experiment and collaborate.
One of these emerging machine learning frameworks is ONNX Runtime (ONNX RT). The open-source framework’s underlying ONNX format allows ML developers to exchange models, while ONNX RT can run them from a variety of languages (e.g., Python, C++, C#) and on a variety of hardware platforms.
Our Qualcomm AI Stack now supports ONNX RT, enabling hardware-accelerated AI in Windows on Snapdragon applications. In case you haven’t heard, Windows on Snapdragon is the next-generation Windows platform, built on years of evolution in mobile computing. Its key features include heterogeneous computing, up to all-day battery life, and the Qualcomm Hexagon NPU.
Let’s take a closer look at how you can use the Qualcomm AI Stack with ONNX RT for bare-metal, hardware-accelerated AI in your Windows apps on Snapdragon.
ONNX Runtime support in the Qualcomm AI Stack
The Qualcomm AI Stack, shown in Figure 1 below, provides the tools and runtimes to take advantage of the NPU at the edge:
Figure 1: The Qualcomm AI Stack provides hardware and software components for AI at the edge on all Snapdragon platforms.
At the top of the stack are popular AI frameworks for generating models. These models can then be run on various AI runtimes, including ONNX RT. ONNX RT includes an execution provider that uses the Qualcomm AI Engine Direct SDK for bare-metal inference on several Snapdragon cores, including the Hexagon NPU. Figure 2 shows a more detailed view of the Qualcomm AI Stack components:
Figure 2: Overview of the Qualcomm AI Stack, including its support for the runtime framework and backend libraries.
Application-level integration
At the application level, developers can compile their applications against an ONNX Runtime build that includes support for the Qualcomm AI Engine Direct SDK. The ONNX RT execution provider constructs a graph from an ONNX model and executes it in a supported backend library.
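A quick sanity check is to list the execution providers compiled into your ONNX Runtime build before attempting NPU-accelerated inference. This is a minimal sketch; it assumes a QNN-enabled ONNX Runtime build, where the Qualcomm provider is registered under the name QNNExecutionProvider:

```python
# Confirm that this ONNX Runtime build was compiled with the
# Qualcomm (QNN) execution provider.
import onnxruntime as ort

print(ort.get_available_providers())
# A QNN-enabled build is expected to list "QNNExecutionProvider"
# alongside the default "CPUExecutionProvider".
```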
Developers can use the ONNX Runtime APIs, which provide a consistent interface across all execution providers. The runtime is also designed to support various programming languages such as Python, C/C++, C#, Java, and Node.js.
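For example, here is a minimal Python sketch of creating a session that targets the Hexagon NPU. The model filename is a placeholder, and the provider option shown (backend_path pointing at the QnnHtp.dll backend shipped with the SDK) reflects the QNN execution provider’s documented options; check the SDK documentation for the exact setup on your platform:

```python
# Sketch: running an ONNX model on the Hexagon NPU via the QNN
# execution provider, with CPU as a fallback for unsupported nodes.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession(
    "model.onnx",  # placeholder model path
    providers=[
        ("QNNExecutionProvider", {"backend_path": "QnnHtp.dll"}),  # NPU backend
        "CPUExecutionProvider",  # fallback execution provider
    ],
)

# Build a dummy input matching the model's declared shape and type,
# substituting 1 for any symbolic (dynamic) dimensions.
input_meta = session.get_inputs()[0]
shape = [d if isinstance(d, int) else 1 for d in input_meta.shape]
dummy = np.random.rand(*shape).astype(np.float32)

outputs = session.run(None, {input_meta.name: dummy})
print(outputs[0].shape)
```

Because the API surface is identical across execution providers, switching between NPU, GPU, and CPU execution is just a change to the providers list.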
We offer two options for generating context binaries. One way is to use the Qualcomm AI Engine Direct toolchain. Alternatively, developers can generate the binary using the ONNX RT execution provider, which in turn uses the Qualcomm AI Engine Direct API. Context binaries help applications reduce the time spent compiling the network: the binary is created the first time the application runs, and on subsequent runs the model is loaded from the cached context binary file.
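Below is a hedged sketch of the second option, enabling context caching through session configuration. The config entry names shown follow recent ONNX Runtime releases that expose EP context caching; option names have varied across releases, so treat them as assumptions and consult the QNN execution provider documentation for your version:

```python
# Sketch: caching the compiled context so subsequent runs skip
# recompilation. The entry names "ep.context_enable" and
# "ep.context_file_path" are assumptions based on recent ONNX
# Runtime releases; verify them against your version's docs.
import onnxruntime as ort

so = ort.SessionOptions()
so.add_session_config_entry("ep.context_enable", "1")
so.add_session_config_entry("ep.context_file_path", "model_ctx.onnx")

session = ort.InferenceSession(
    "model.onnx",  # placeholder model path
    sess_options=so,
    providers=[("QNNExecutionProvider", {"backend_path": "QnnHtp.dll"})],
)
# The first run compiles the graph and writes model_ctx.onnx; later
# runs can load the cached context model instead of recompiling.
```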
Getting started
When you’re ready to get started, visit the Qualcomm AI Engine Direct SDK page, where you can download the SDK and access the documentation.
Snapdragon and Qualcomm branded products are products of Qualcomm Technologies, Inc. and/or its subsidiaries.