Siri AI at WWDC 2026: Apple's Pragmatic Pivot with Gemini, Vision LLMs, and Developer Openness

Apple's new Siri AI at WWDC 2026 adopts a more pragmatic approach by integrating a custom Gemini model, vision LLMs, and the Core AI library. Simon Willison remains cautiously optimistic, stressing he'll believe it when he sees it.

Apple · Siri · Gemini · vision language models · on-device AI · privacy protection

KEY POINTS

Apple ditches its pure in-house approach, bringing a custom Gemini model to Siri AI and running it on both Apple silicon and Google Cloud via Private Cloud Compute (PCC).
The new Siri uses vision LLMs to read screen content, bypassing the need for individual app integrations and reducing adoption friction.
The Core AI library, with PyTorch integration, lets developers use Apple hardware for both AI model inference and training.
To balance privacy with performance, PCC extends to Google Cloud and NVIDIA GPUs while retaining a strict security architecture and binary transparency.

ANALYSIS

The catalyst: from Apple Intelligence to Apple pragmatism

Simon Willison has been a vocal critic of Apple's AI ambitions, especially after the WWDC 2024 promises that mostly evaporated. This year his skepticism remains ('I'll believe it when I see it'), but his tone is notably softer. The new Siri AI, he admits, at least looks technically feasible.

That shift points to a deeper story: Apple is moving from idealistic self-reliance and on-device purism toward pragmatic engineering. And it's all distilled into three key changes.

Breaking it down: three pivots that make Siri feel more real

Model strategy: swallowing pride and embracing Gemini

Apple has long touted its own models and on-device privacy. Now Siri AI is powered by a custom Gemini model. Why Gemini? Likely because its strengths in multimodal reasoning and agentic tool use match what Siri needs: reading flight info on screen, booking restaurants, completing tasks across apps.

Even more telling, those Gemini models run not only on Apple silicon but also on Google Cloud's NVIDIA GPUs via Private Cloud Compute. Apple's security blog details how they worked with Google and NVIDIA to extend PCC infrastructure while maintaining Apple-level protections. It's a realistic tradeoff between privacy and capability.

Interaction revolution: vision LLMs bypass the integration bottleneck

Siri's perennial frustration has been that every app needs explicit developer integration to work with it. Apple's clever fix: let Siri literally watch the screen.

Using vision language models, Siri can extract text, buttons, and images from the display, understand the context, and simulate user actions. This sidesteps the long wait for app developers to adopt Apple Intelligence APIs. If it works well, it could reshape human-computer interaction as profoundly as the graphical interface once did.
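To make the extract, understand, act loop concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the `ScreenElement` and `TapAction` types and the `choose_action` function are hypothetical names, and the word-overlap matcher is only a stand-in for the reasoning a vision language model would do. It is not Apple's implementation or API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ScreenElement:
    """One item a vision model might report from a screenshot (hypothetical schema)."""
    kind: str   # "text", "button", or "image"
    label: str  # visible text or a short description
    x: int      # centre of the element, in screen points
    y: int


@dataclass(frozen=True)
class TapAction:
    """A simulated tap at a screen position, with a short justification."""
    x: int
    y: int
    reason: str


def choose_action(elements: list[ScreenElement], goal: str) -> Optional[TapAction]:
    """Stand-in for the model's reasoning step: pick the button whose label
    shares the most words with the goal. Returns None if nothing matches."""
    goal_words = set(goal.lower().split())
    best, best_score = None, 0
    for el in elements:
        if el.kind != "button":
            continue  # only buttons are tappable in this toy model
        score = len(goal_words & set(el.label.lower().split()))
        if score > best_score:
            best, best_score = el, score
    if best is None:
        return None
    return TapAction(best.x, best.y, f"button '{best.label}' matches goal")


# Elements as they might be extracted from a flight-details screen.
screen = [
    ScreenElement("text", "Flight UA 123 departs 09:40", 200, 120),
    ScreenElement("button", "Add to calendar", 200, 400),
    ScreenElement("button", "Share", 200, 460),
]
action = choose_action(screen, "add flight to calendar")
print(action)
```

The point of the sketch is the shape of the data: the app contributes nothing beyond what it already draws on screen, which is why no per-app integration is needed.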

Developer ecosystem: Core AI finally bows to PyTorch

Apple has long insisted on its own ML frameworks, such as Core ML. But the new Core AI library bridges directly with PyTorch, letting developers bring their trained models to Apple hardware. That means you can fine-tune on a Mac cluster and run inference efficiently on an iPhone, even leveraging unified memory for larger models.
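Why unified memory matters for larger models comes down to simple arithmetic: the weights have to fit in memory the accelerator can reach. The sketch below estimates weight size only, ignoring activations and caches. The 8-billion-parameter figure is an illustrative assumption, not a number from Apple or from the source, and `weight_memory_gib` is a hypothetical helper name.

```python
def weight_memory_gib(num_params: float, bits_per_param: int) -> float:
    """Approximate memory needed for model weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 2**30


# Illustrative model size (assumption): 8 billion parameters.
for bits in (16, 8, 4):
    size = weight_memory_gib(8e9, bits)
    print(f"8B params at {bits}-bit: {size:.1f} GiB")
```

Halving the precision halves the footprint, which is the usual lever for fitting a model onto a phone-class device.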

This isn't just a tooling improvement; it's a critical U-turn in Apple's battle for AI developers.

Bigger picture: the OS-level war for AI adoption

What WWDC 2026 really reveals is that AI competition is shifting from raw model capability to system integration. Whoever can seamlessly embed AI into the operating system, lowering the barriers for developers and users, will capture the next computing platform.

Vision LLMs are becoming the eyes for this kind of system-level AI, while hybrid cloud architectures address the performance-privacy tension. Apple may be late, but this time it's arriving with a pragmatic yet differentiated play.

Why it matters: what to watch now

If you're an Apple developer: get comfortable with Core AI and its PyTorch bridge; it could change how you deploy on-device AI. Also, start understanding vision LLM interaction paradigms – your app might be used in ways you never designed for.
If you're a product manager or founder: study how Apple balances privacy with cloud offloading. The PCC extension model is a blueprint for trusted cloud AI. And the combination of screen reading + task automation will spawn a wave of 'automated assistant' apps.
If you're a user: keep expectations in check, but don't dismiss the new Siri out of hand. An AI that can see your screen is fundamentally different from one that just listens to your voice.

Counterintuitive take: the strategic clarity behind Apple's 'capitulations'

Many will see the use of Gemini and Google Cloud as Apple caving to competitors. In reality, it shows rare clarity: Apple is refusing to let a 'closed' ideology cripple capabilities, or to let perfect privacy become the enemy of shipping products.

Perhaps the biggest surprise is Apple allowing NVIDIA GPUs and Google Cloud inside its most sacred security perimeter – PCC – while still claiming to guarantee user privacy. The architectural details (isolated namespaces, short-lived instances, auditable binaries) are worth studying for anyone building trusted infrastructure.
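The "auditable binaries" idea can be sketched in a few lines: a client only trusts server software whose cryptographic measurement appears in a published log. The Python below is a simplified illustration under stated assumptions. The function names and the in-memory set standing in for the log are hypothetical, and a production transparency system would add an append-only log structure and hardware attestation that this sketch omits.

```python
import hashlib


def measure(binary: bytes) -> str:
    """Return the SHA-256 digest of a software image as a hex string."""
    return hashlib.sha256(binary).hexdigest()


def is_publicly_logged(binary: bytes, published_digests: set[str]) -> bool:
    """Accept a binary only if its measurement appears in the public log."""
    return measure(binary) in published_digests


# Hypothetical release and log, for illustration only.
released = b"inference-server build 1"
log = {measure(released)}

print(is_publicly_logged(released, log))
print(is_publicly_logged(b"inference-server build 1 (modified)", log))
```

Because any change to the binary changes its digest, a modified build fails the check unless it is published too, which is what makes the software open to outside inspection.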

Simon's 'I'll believe it when I see it' is the right attitude. But this time, Apple isn't just showing slides; it's delivering a verifiable engineering package. The real test begins now, with developer betas in the wild.

Originally from Simon Willison · Analyzed by BitByAI