Recent advances in generative AI, including large multimodal, language, and vision models, can provide the foundation for open-domain knowledge, inference, and generation capabilities, enabling open-ended task-assistance scenarios. The ability to produce relevant instructions and content, however, is only the beginning of what is needed to build AI systems that work with humans in the real world, including mixed reality task assistants, interactive robots, smart manufacturing plants, autonomous vehicles, and many more.
To work seamlessly with humans in the real world, AI systems must continually perceive and reason multimodally, in a streaming fashion, over their environment. This requirement extends beyond object detection and tracking: for physical teamwork to succeed, the system must understand the potential functions of objects, their relationships to one another, spatial constraints, and how all of these change over time.
These systems must be able to reason not only about the physical world but also about people. That reasoning spans lower-level judgments about body posture, voice, and actions as well as higher-level judgments about cognitive states and the social norms of real-time collaborative behavior.
Combining mixed reality with artificial intelligence technologies such as large language and vision models, Microsoft Research introduces SIGMA, an interactive application that uses HoloLens 2 to guide users through procedural tasks. Tasks can be defined manually as a set of steps in a task library or generated dynamically by a large language model such as GPT-4. When a user asks SIGMA an open-ended question during the interaction, the system can answer using the underlying language model. In addition, SIGMA can locate and highlight task-relevant objects in the user's field of view using vision models such as Detic and SEEM.
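The dual sourcing of task steps described above, a predefined task library with an LLM fallback, can be sketched roughly as follows. This is an illustrative sketch only; the function and library names (`get_task_steps`, `TASK_LIBRARY`, the `llm_generate` callable) are hypothetical and not part of SIGMA's published API.

```python
import json

# Hypothetical task library: a task name mapped to manually defined steps.
TASK_LIBRARY = {
    "make pour-over coffee": [
        "Boil water to roughly 95 degrees Celsius",
        "Place a filter in the dripper and rinse it",
        "Add 20 g of ground coffee",
        "Pour water slowly in circular motions",
    ],
}

def get_task_steps(task_name: str, llm_generate=None) -> list[str]:
    """Return predefined steps if available; otherwise fall back to an LLM."""
    if task_name in TASK_LIBRARY:
        return TASK_LIBRARY[task_name]
    if llm_generate is not None:
        # Ask the model to produce steps dynamically, as a JSON array.
        prompt = (
            f"List the steps to perform the task '{task_name}' "
            "as a JSON array of short instructions."
        )
        return json.loads(llm_generate(prompt))
    raise KeyError(f"No steps available for task: {task_name}")
```

In practice, the LLM call would go to a hosted model such as GPT-4; here it is abstracted as a plain callable so the routing logic stands on its own.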
Several design choices support these research objectives. The system is implemented with a client-server architecture: a lightweight client application runs on the HoloLens 2 device and transmits multiple multimodal data streams, including RGB (red, green, and blue) video, depth, audio, and head, hand, and gaze tracking, to a more powerful desktop server. The server executes the application's core functionality and sends data and instructions back to the client on what content to display on the device. This design lets researchers push beyond the headset's current compute limits and opens the door to extending the application to additional mixed reality devices.
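SIGMA's actual client-server protocol is not described in detail here, but the general pattern of a thin headset client packaging timestamped multimodal frames for a desktop server can be sketched as below. The `SensorFrame` type and the length-prefixed wire format are assumptions for illustration, not the real protocol.

```python
import json
import time
from dataclasses import dataclass, field

@dataclass
class SensorFrame:
    """One message on a multimodal stream sent from headset to server."""
    stream: str                                   # e.g. "rgb", "depth", "audio", "gaze"
    timestamp: float = field(default_factory=time.time)
    payload: bytes = b""                          # raw sensor data

def encode_frame(frame: SensorFrame) -> bytes:
    """Serialize as a 4-byte header length, a JSON header, then the payload."""
    header = json.dumps({
        "stream": frame.stream,
        "timestamp": frame.timestamp,
        "length": len(frame.payload),
    }).encode()
    return len(header).to_bytes(4, "big") + header + frame.payload

def decode_frame(data: bytes) -> SensorFrame:
    """Inverse of encode_frame, as the server side would apply it."""
    hlen = int.from_bytes(data[:4], "big")
    header = json.loads(data[4:4 + hlen])
    payload = data[4 + hlen:4 + hlen + header["length"]]
    return SensorFrame(stream=header["stream"],
                       timestamp=header["timestamp"],
                       payload=payload)
```

Keeping per-stream timestamps on every frame is what allows the server to fuse RGB, depth, audio, and gaze data that arrive at different rates.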
SIGMA is built on Platform for Situated Intelligence (\psi), an open-source framework that supports the development and study of integrated multimodal AI systems. The underlying \psi framework provides a high-performance streaming and logging infrastructure that enables rapid prototyping, and its data-replay infrastructure makes data-driven development and tuning possible at the application level. Finally, Platform for Situated Intelligence Studio offers extensive support for visualization, debugging, tuning, and maintenance.
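\psi itself is a .NET framework; as a language-agnostic illustration of one of its core ideas, the fragment below sketches how attaching an originating timestamp to every logged message lets streams be replayed and aligned after the fact. The class and method names here are hypothetical, not \psi's API.

```python
import bisect
from dataclasses import dataclass

@dataclass(frozen=True)
class Message:
    """A logged value paired with the time it originated at the sensor."""
    originating_time: float
    data: object

class LoggedStream:
    """Minimal stand-in for a persisted, replayable message stream."""

    def __init__(self):
        self._messages: list[Message] = []

    def post(self, originating_time: float, data: object) -> None:
        self._messages.append(Message(originating_time, data))

    def nearest(self, query_time: float) -> Message:
        """Return the logged message closest in time to a query (for replay
        and cross-stream alignment). Assumes posts arrive in time order."""
        times = [m.originating_time for m in self._messages]
        i = bisect.bisect_left(times, query_time)
        candidates = self._messages[max(0, i - 1):i + 1]
        return min(candidates,
                   key=lambda m: abs(m.originating_time - query_time))
```

Because every message keeps its originating time rather than its arrival time, a tuning pass can replay a recorded session and join, say, gaze and RGB streams exactly as they co-occurred live.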
While SIGMA's current functionality is relatively simple, it serves as a foundation for future research at the intersection of mixed reality and artificial intelligence. Data sets collected with the system can support, and have already supported, research on many perception problems, ranging from computer vision to speech recognition.
SIGMA is a research platform, representative of Microsoft's continued efforts to investigate new artificial intelligence and mixed reality technologies. On the enterprise side, Microsoft offers Dynamics 365 Guides, a mixed reality solution for frontline workers. With Copilot in Dynamics 365 Guides, which customers are currently using in private preview, AI and mixed reality together give frontline employees step-by-step procedural assistance and relevant workflow insights, making it a feature-rich tool for frontline workers carrying out complex operations.
By making the system publicly available, the researchers hope to alleviate the burdens on other researchers associated with the fundamental engineering tasks of creating a complete interactive application so that they can move directly toward the exciting new frontiers of their field.
Dhanshree Shenwai is a computer science engineer with solid experience at FinTech companies spanning finance, cards and payments, and banking, and a keen interest in AI applications. Dhanshree is excited to explore new technologies and advancements that make life easier for everyone in today's evolving world.