Apple Intelligence

Why Siri sucks, why the recent wave of AI products misses the mark, and why I think Apple is one of the only companies positioned to emerge on top after the AI frenzy.

June 5, 2024

I have a lot of Apple products and, in one way or another, I mostly love them. It’s something that started with the iPod a long time ago, and from there I rapidly got sucked into an ecosystem starting with the first-gen MacBook Pro, which landed me a novel career path in Mac OS X icon design. It was the beginning of a long obsession. From software to hardware, Apple products just delighted me.

There was always one exception.

Siri is terrible.

I’ve had a group chat or two where we have extended sessions of sharing some absolutely confounding Siri screw-ups.

It’s mostly funny, but at times when you think you can rely on it to do a task and it fails — say, quickly getting directions to that place with a baby in one arm and a bag of groceries in the other only to have it direct you to a town 3,090 miles away — it’s infuriating.

The thing about artificial intelligence is that it actually has to be kind of smart; if it’s not, it’s not just ‘dumb’ in the way a computer that deterministically executes calculations is dumb, but it’s actively stupid. It makes a computer, the most predictable machine ever, entirely unpredictable, and therein lies tremendous frustration. It won’t just return an error — it will return unpredictable nonsense.

It’s the exact thing that made Microsoft’s Clippit Office assistant (aka ‘Clippy’) so useless in its day: it was so unpredictable in its attempts to be helpful that it was comically stupid and unhelpful.

Siri is exactly that. When it works, it works. When it doesn’t, you have no idea what it will decide to do this time. I still don’t, really. Asking for directions to my local pub ends up with it navigating to a town of the same name 4,300 miles away about half of the time.

A lot of people therefore conclude that because Siri is bad (and hasn’t, to them, felt like it improved as rapidly as Apple’s other products), Apple has entirely missed the boat on AI.

That’s where I think people are wrong.

Apple and AI now

Seemingly unbeknownst to many, Apple packs a lot of AI into everyday interactions, though before this year it was always labeled as ‘machine learning’ in keynotes. Certainly more than most give Apple credit for.

If you use an iPhone and never touch Siri, you’re still almost always using some kind of AI-based feature during your interactions with apps on the iPhone. And those features... are good.

Object detection in images means iPhone is semantically aware of what you are photographing in real time to adjust its capture settings, and determines how it processes your shots. It lets you lift entire subjects out of a photo, map your room while detecting various types of furniture and measuring highly accurate dimensions, take usable photos on moonless nights and even do high fidelity 3D scans with the camera array.

Every keystroke on the iPhone keyboard uses a highly optimized transformer language model to get your semi-random fat-fingered presses on the glass screen corrected into usable and correct words and sentences.

Even something as simple as unlocking your iPhone is infused with AI for facial recognition.

Uniquely, these AI applications work in extremely well-defined contexts where occasional unpredictable outcomes are limited by their usage patterns. It’s the context that makes them usable. More than that, it makes them incredibly powerful and enables some pretty magical features.

The Near Future

What makes Siri so poor in everyday utility, then, is a lack of context. It’s true that given the same prompt, modern LLMs absolutely beat it, but if I were to sketch out an integration of iOS with a somewhat modern LLM, you’d see how much more compelling it can be than even the most up-to-date product from OpenAI:

a day with siri pic.twitter.com/xTuiOrmrLH
— saint laurent del rey (@laurentdelrey) June 4, 2024

System-wide, suggested actions are surfaced contextually. Data that is disparate between apps can be tied together in one interface: the operating system. Finally, it is aware of context and acts accordingly. With modern AI ingesting information about a user’s context window, even Siri can seem... smart.

What’s so powerful is that this hypothetical feature would extend beyond Apple’s own apps. It doesn’t rely on the kind of deals OpenAI has to sign, or ChatGPT getting extra features. It just relies on access. How? Well...

He Who Controls the Screens

I use ChatGPT, and I’ve tried Gemini and Claude and other highly rated AI models and products. What remains obvious to me, though, is that these products are a handy compartmentalized assistant at best, or a novelty at worst. Much like how Copilot for code finds actual utility in an IDE, AI in a website-based assistant is a nice sidekick but ultimately falls short of the truly transformative potential AI maximalists keep hinting at.

One solution is, of course, hardware. To let AI escape its containment, companies are encapsulating AI assistants into physical products. Humane’s Ai Pin and Rabbit’s R1 are touted as ways of bringing it into a usable, somewhat novel form factor so that it can ingest information in contexts where it can both see and respond to what is relevant to you. The Ai Pin has been nearly universally panned: while I applaud its attempt at doing something truly new, it is willfully ignorant of the reality that people need and love their phones, and that shunning them isn’t a solution. The Rabbit R1 is a literal scam. Meanwhile it feels like every week a new AI wearable is announced.

Ultimately, though, these startups have a point: it is indeed hardware that is lacking, but the solution is not a standalone device. It has to come from those who own the computer. Google and Apple, being the only creators of widely used operating systems and hardware, can directly have models act on everything that is relevant to you. They can tap into the data in your apps, and see exactly what is on your screen and what you are seeing or hearing through your devices’ sensors. Information awareness of what you are doing and how you’re doing it.

To some extent, Microsoft is already making a play for it with Recall. They recognize the simple truth: if you own the screen, you win.

Not only can you act on every bit of data a user sees and ingests (and preferably go beyond that by ingesting data from apps on the system the user might not see), you can now start modeling interactions themselves for a future true ‘large action model’. There’s no reason a proper intelligent system can’t learn how to do things based on the way humans perform them.

That is saying nothing of the issue of scale: Apple and Google both have the ability and expertise to develop chipsets and models in tandem, covering everything from simple tasks executed locally (like, say, setting a timer while knowing some user context) to complex high-token queries that have to be sent to powerful GPU-rich server farms.

I think, however, that Apple is the most likely to succeed at this, or rather, this is basically Apple’s game to lose.

Why Apple?

First off, there’s the reaction to Microsoft Recall: this is a company writing checks that its privacy balance can’t reasonably cover. People are, rightly, suspicious of its motivations and handling of user data.

Apple has not just heavily invested in marketing its privacy stance, but deploys almost unreasonable levels of engineering to enforce privacy on a systemic level.

Face ID data is so secure that even if it is somehow extracted from a Secure Enclave, it is useless for reconstructing facial data: not only is it a hash of a 3D dot pattern, that 3D dot pattern is also randomized per device.

Apple could, hypothetically, create a local AI panopticon that still maintains user privacy and, crucially, could be trusted and accepted by users.

Second, this gets back to the question of why Siri was so bad.

Perhaps it’s not an entirely accurate criterion, but it rings true more often than not:

If Apple is early to market with a product, it tends to be less successful than when it enters a more established market late, beating its competition in execution.

Apple wasn’t first to touchscreen smartphones. It wasn’t first to MP3 players, and it won’t be first to LLM-powered, 2020s-era AI features. But its culture of relentlessly polishing a product and reinventing key interfaces (from the mouse to the click wheel and multi-touch) for a superior user experience has given it an edge, and the interfaces of AI products are anything but polished and perfected in their current state.

We’re in the era of boundless experimentation and rapid iteration, and the next great user interface for unparalleled computer intelligence is simply waiting to be designed.
