It would be easy to think that Apple is late to the AI game. Since late 2022, when ChatGPT took the world by storm, most of Apple's competitors have been scrambling to catch up. While Apple has certainly talked about AI, and has even launched some products with AI in mind, it seemed to be dipping a toe in rather than diving in head-first.
But in recent months, rumors and reports have suggested that Apple has, in fact, simply been biding its time, waiting to make its move. In recent weeks, there have been reports that Apple is talking to both OpenAI and Google about potentially powering some of its AI features, and the company has also been working on its own model, called Ajax.
If you look at the AI research Apple has published, a picture starts to develop of how Apple's approach to AI could come to life. Now, obviously, making product assumptions based on research papers is a deeply inexact science: the line from research to store shelves is windy and bumpy. But you can at least get a sense of what the company is thinking about, and of how its AI features could work when Apple starts talking about them at its annual developer conference, WWDC, in June.
Smaller and more efficient models
I suspect you and I are hoping for the same thing here: better Siri. And it looks like Better Siri is coming! In much of Apple's research (and in much of the tech industry, for that matter), the assumption is that large language models will immediately make virtual assistants better and smarter. For Apple, getting to Better Siri means making those models as fast as possible and making sure they're everywhere.
Apple plans to have all of its AI features in iOS 18 running on an on-device, completely offline model, Bloomberg recently reported. It's difficult to build a good multipurpose model even when you have a network of data centers and thousands of cutting-edge GPUs; it's much, much harder to do it with just the guts inside your smartphone. So Apple has to get creative.
In a paper called "LLM in a flash: Efficient Large Language Model Inference with Limited Memory" (all of these papers have really boring titles but are genuinely interesting, I promise!), researchers devised a system for storing a model's data, which is usually kept in your device's RAM, on the SSD instead. "We have demonstrated the ability to run LLMs up to twice the size of the available DRAM [on the SSD]," the researchers wrote, "achieving a 4-5x speedup in inference speed compared to traditional loading methods on the CPU, and 20-25x on the GPU." They found that by taking advantage of the cheapest and most plentiful storage on the device, the models can run faster and more efficiently.
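To make that mechanism a little more concrete, here is a minimal sketch of the general idea, not Apple's implementation: the weights live in a file on flash storage and are memory-mapped, so only the layer currently being computed gets pulled into DRAM. The file name, layer count, and dimensions below are made up for illustration.

```python
import numpy as np

N_LAYERS, DIM = 4, 1024

def export_weights(path="weights.npy"):
    # Pretend these are pretrained weights; write them once to "flash" (disk).
    weights = np.random.randn(N_LAYERS, DIM, DIM).astype(np.float32)
    np.save(path, weights)

def run_inference(x, path="weights.npy"):
    # mmap_mode="r" keeps the array on disk; the OS pages in only what we
    # touch, so the whole model never has to sit in DRAM at once.
    weights = np.load(path, mmap_mode="r")
    for layer in range(N_LAYERS):
        w = np.array(weights[layer])   # copy just this layer's block into RAM
        x = np.maximum(w @ x, 0.0)     # toy feed-forward layer with ReLU
    return x

export_weights()
print(run_inference(np.ones(DIM, dtype=np.float32)).shape)
```

The real paper goes much further (sparsity-aware loading, reusing previously loaded weights, and so on), but the core trade is the same: lean on the bigger, cheaper storage tier and pay only for the pieces you actually need.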
Apple researchers also created a system called EELBERT that can essentially compress an LLM into a much smaller size without making it meaningfully worse. Their compressed version of Google's BERT model was 15 times smaller (just 1.2 megabytes) and saw only a 4 percent reduction in quality. It did come with some latency trade-offs, though.
Overall, Apple is pushing to resolve a central tension in the model world: the bigger a model gets, the better and more useful it can be, but also the more unwieldy, power-hungry, and slow it becomes. Like so many others, the company is trying to find the right balance between all of those things while also looking for a way to have it all.
Siri, but good
A lot of what we talk about when we talk about AI products is virtual assistants: assistants that know things, that can remind us of things, that can answer questions and do things on our behalf. So it's not exactly surprising that much of Apple's AI research boils down to a single question: what if Siri was really, really, really good?
A group of Apple researchers has been working on a way to use Siri without needing a wake word at all; instead of listening for "Hey Siri" or "Siri," the device might simply be able to sense whether you're talking to it. "This problem is significantly more challenging than detecting a voice trigger," the researchers acknowledged, "as there may be no initial trigger phrase that marks the beginning of a voice command." Perhaps that's why another group of researchers developed a system to detect wake words more accurately, and another paper trained a model to better understand rare words, which assistants often mishear.
In both cases, the appeal of an LLM is that it can, in theory, process much more information much more quickly. In the wake-word paper, for example, the researchers found that by not trying to discard all the unnecessary sound, but instead feeding it all to the model and letting it sort out what does and doesn't matter, the wake word worked far more reliably.
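As a rough illustration of that pipeline shape (and emphatically not Apple's model), here is a sketch in which every utterance goes straight to a single scorer that decides whether the speech was directed at the device. The scoring function is a toy stand-in for a trained classifier; the feature names and threshold are invented.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Utterance:
    audio_features: List[float]   # e.g., summarized acoustic features
    asr_text: str                 # rough transcript from the speech recognizer

def device_directed_score(utt: Utterance) -> float:
    # Stand-in for a learned model that fuses acoustic and lexical cues.
    # A real system would be a trained neural network, not these heuristics.
    score = 0.0
    if utt.asr_text.lower().startswith(("set", "call", "play", "remind")):
        score += 0.6  # imperative phrasing is a weak cue it's meant for the device
    score += min(sum(utt.audio_features) / max(len(utt.audio_features), 1), 0.4)
    return score

def handle(utt: Utterance, threshold: float = 0.7) -> str:
    # No pre-filtering: every utterance reaches the scorer, and only the
    # score decides whether the assistant responds.
    return "respond" if device_directed_score(utt) >= threshold else "ignore"

print(handle(Utterance([0.5, 0.4, 0.6], "Set a timer for ten minutes")))  # respond
print(handle(Utterance([0.1, 0.2, 0.1], "I think we should leave soon")))  # ignore
```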
Once Siri hears you, Apple is doing plenty of work to make sure it understands and communicates better. In one paper, researchers developed a system called STEER (which stands for Semantic Turn Extension-Expansion Recognition, so we'll go with STEER) that aims to improve your back-and-forth with an assistant by trying to figure out when you're asking a follow-up question and when you're asking a new one. In another, they use LLMs to better understand "ambiguous queries" and figure out what you mean no matter how you say it. "In uncertain circumstances," they wrote, "intelligent conversational agents may need to take the initiative to reduce their uncertainty by proactively asking good questions, thereby solving problems more effectively." Another paper is meant to help with that, too: researchers used LLMs to make assistants less verbose and more understandable when they generate answers.
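Here is a hedged sketch of what that kind of follow-up routing could look like in practice. The classifier is a toy stand-in, and the labels and function names are mine, not STEER's.

```python
def classify_turn(history: list[str], new_turn: str) -> str:
    # Stand-in for a trained classifier or LLM prompt. Toy heuristic:
    # continuation phrasing like "what about..." suggests a follow-up.
    followup_cues = ("what about", "and ", "how about", "it ", "that ", "there")
    if history and new_turn.lower().startswith(followup_cues):
        return "follow_up"
    return "new_query"

def route(history: list[str], new_turn: str) -> str:
    if classify_turn(history, new_turn) == "follow_up":
        # Expand the follow-up with the earlier context before answering.
        return " ".join(history[-1:] + [new_turn])
    return new_turn  # treat as a brand-new request and drop stale context

history = ["What's the weather in Cupertino today?"]
print(route(history, "What about tomorrow?"))        # merged with the prior question
print(route(history, "Set a timer for 10 minutes"))  # handled on its own
```

The point isn't the heuristic; it's the routing decision. Get that decision right and the assistant can carry context when it should and forget it when it shouldn't.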
AI in health, image editors, in your Memojis
Whenever Apple talks publicly about AI, it tends to focus less on raw technological might and more on the day-to-day things AI can actually do for you. So while there's a lot of attention on Siri, especially as Apple looks to compete with devices like the Humane AI Pin and the Rabbit R1, and with Google continuing to push Gemini across Android, there are plenty of other ways Apple seems to see AI being useful.
One obvious place for Apple to focus is on health: LLMs could, in theory, help you wade through the oceans of biometric data collected by your various devices and make sense of it all. So Apple has been researching how to collect and collate all of your motion data, how to use gait recognition and your headphones to identify you, and how to track and understand your heart rate data. Apple also created and released "the largest multi-device, multi-location sensor-based human activity dataset" available, after collecting data from 50 participants wearing multiple on-body sensors.
Apple also seems to envision AI as a creative tool. For one paper, researchers interviewed a group of animators, designers, and engineers and built a system called Keyframer that "allows users to iteratively build and refine generated designs." Instead of typing a prompt and getting an image, then typing another prompt to get another image, you start with a prompt but then get a set of tools to tweak and refine parts of the image to your liking. You could imagine this kind of back-and-forth creative process showing up anywhere from the Memoji creator to some of Apple's more professional artistic tools.
In another paper, Apple describes a tool called MGIE that lets you edit an image simply by describing the edits you want to make. ("Make the sky bluer," "make my face less weird," "add some rocks," that kind of thing.) "Instead of brief but ambiguous guidance, MGIE derives explicit visual intent and leads to reasonable image editing," the researchers wrote. Its initial experiments weren't perfect, but they were impressive.
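MGIE itself isn't something you can call from here, but as an illustration of what instruction-guided image editing looks like in code, the open InstructPix2Pix pipeline in Hugging Face's diffusers library does roughly the same trick. The model name and parameters below are example values, not anything from Apple's system; it assumes a GPU is available.

```python
import torch
from diffusers import StableDiffusionInstructPix2PixPipeline
from PIL import Image

# Load an open instruction-following image editor (a stand-in for MGIE).
pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    "timbrooks/instruct-pix2pix", torch_dtype=torch.float16
).to("cuda")

image = Image.open("photo.jpg").convert("RGB")

# The edit is just a plain-language instruction, in the spirit of
# "make the sky bluer" from the paragraph above.
edited = pipe(
    "make the sky bluer",
    image=image,
    num_inference_steps=20,
    image_guidance_scale=1.5,
).images[0]

edited.save("photo_edited.jpg")
```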
We could even get some AI in Apple Music: for a paper called "Resource-constrained stereo singing voice cancellation," researchers explored ways to separate vocals from instruments in songs, which could be useful if Apple wants to give people tools to, say, remix songs the way you can on TikTok or Instagram.
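The paper describes a learned, resource-constrained approach; as a much simpler illustration of the underlying idea, here is the decades-old stereo "karaoke" trick in a few lines of Python. Vocals are usually mixed to the center of a stereo track, so subtracting one channel from the other largely cancels them and leaves the instruments. (The file name is a placeholder, and this baseline is far cruder than what the researchers built.)

```python
import wave
import numpy as np

# Read a 16-bit stereo WAV file (interleaved left/right samples).
with wave.open("song_stereo.wav", "rb") as wav:
    assert wav.getnchannels() == 2 and wav.getsampwidth() == 2
    rate = wav.getframerate()
    frames = np.frombuffer(wav.readframes(wav.getnframes()), dtype=np.int16)

left = frames[0::2].astype(np.int32)
right = frames[1::2].astype(np.int32)

# Center-panned content (typically the lead vocal) cancels in the difference.
instrumental = ((left - right) // 2).astype(np.int16)

with wave.open("instrumental_mono.wav", "wb") as out:
    out.setnchannels(1)
    out.setsampwidth(2)
    out.setframerate(rate)
    out.writeframes(instrumental.tobytes())
```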
Over time, I'd bet, this is the kind of thing Apple will lean into, especially on iOS. Some of it Apple will build into its own apps; some it will offer to third-party developers as APIs. (The recent Journaling Suggestions feature is probably a good guide to how that might work.) Apple has always touted its hardware capabilities, particularly compared to your average Android device; pairing all that horsepower with on-device, privacy-focused AI could be a big differentiator.
But if you want to see the biggest, most ambitious AI effort at Apple, you need to know about Ferret. Ferret is a multimodal large language model that can take instructions, focus on something specific you've circled or otherwise selected, and understand the world around it. It's designed for the now-normal AI use case of asking a device about the world around you, but it might also be able to understand what's on your screen. In the Ferret paper, researchers show that it could help you navigate apps, answer questions about App Store ratings, describe what you're looking at, and more. This has really exciting implications for accessibility, but it could also completely change the way you use your phone, and one day your Vision Pro and/or smart glasses.
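To picture what a Ferret-style interaction might look like, here is a purely hypothetical sketch: the model gets a screenshot, the region you circled, and a question about that region. The interface, function names, and coordinates are invented for illustration; this is not Ferret's actual API.

```python
from dataclasses import dataclass

@dataclass
class Region:
    # Normalized bounding box for whatever the user circled or tapped.
    x0: float
    y0: float
    x1: float
    y1: float

def answer_about_region(screenshot_path: str, region: Region, question: str) -> str:
    # A real system would run a multimodal LLM over image + region + text.
    # Here we just echo the request to show the shape of the interaction.
    return (f"[model would answer '{question}' about the area "
            f"({region.x0:.2f}, {region.y0:.2f})-({region.x1:.2f}, {region.y1:.2f}) "
            f"of {screenshot_path}]")

# e.g., the user circles an app's rating on an App Store page and asks about it.
print(answer_about_region("app_store_screenshot.png",
                          Region(0.12, 0.55, 0.48, 0.62),
                          "Is this app well reviewed?"))
```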
We're getting way ahead of ourselves here, but you can imagine how this might work with some of the other things Apple is working on. A Siri that can understand what you want, paired with a device that can see and understand everything happening on your screen, is a phone that can practically use itself. Apple wouldn't need deep integrations with everything; it could simply run the apps and tap the right buttons automatically.
Again, this is all just research, and for all of it to work well starting this spring would be a legitimately unheard-of technical achievement. (I mean, you've tried chatbots; you know they're not great.) But I'd bet you anything we get some big AI announcements at WWDC. Apple CEO Tim Cook even teased as much in February, and basically promised it on this week's earnings call. And two things are very clear: Apple is very much in the AI race, and it could mean a total overhaul of the iPhone. Heck, you might even start using Siri voluntarily! And that would be quite an accomplishment.