The race to control the AI frontier just got another plot twist, and this time, it talks back, looks at you, and maybe even listens with feeling.
OpenAI released its brand-new "o" series of models today, introducing GPT-4o and its lightweight cousin, GPT-4o mini (aka o4 and o3). These new models aren't just tuned-up chatbots: they're omnimodal, meaning they can understand and generate text, image, audio, and video natively. No Frankenstein modules stitched together to fake visual literacy.
This is effectively AI with eyes, ears, and a mouth.
One model to rule them all?
OpenAI says the "o" stands for "omni," and the implications are exactly what you'd expect: a unified model that can take in a screenshot, hear your voice crack, and spit out an emotionally calibrated reply, all in real time. It's the first real hint of a future where AI assistants aren't just in your phone: they are your phone.
The o3 (mini) version is built for speed and cost, with performance closer to Claude Haiku or a well-oiled Mistral, while still keeping the full multimodal superpower set. Meanwhile, o4 (full-fat GPT-4o) is gunning straight for the major leagues, matching GPT-4 Turbo in power but zipping through images and audio like it's playing a casual round of charades.
And it's not just speed. These models are cheaper to run, more efficient to deploy, and, here's the kicker, might run natively on devices. That's right: real-time, multimodal AI without the latency of the cloud. Think personal assistants that don't just listen to commands, but respond like companions.
Beyond chatbots: Enter the agentic age
With this release, OpenAI is laying the groundwork for the agentic layer of AI: those smarter-than-smart assistants that not only talk and write but observe, act, and autonomously handle tasks.
Want your AI to parse a Twitter thread, generate a chart, draft a tweet, and post it on Discord with a smug meme? That's not just within reach. It's practically on your desk, wearing a monocle, sipping espresso, and correcting your grammar in a delightful baritone.
The o series models are meant to power everything from real-time voice bots to AR glasses, hinting at the "AI-first" hardware movement that has tech's old guard (and new) on edge. In the same way the iPhone redefined mobile, these models are the start of AI's native interface age.
OpenAI vs. the field
This isn't happening in a vacuum. Google's Gemini is evolving. Anthropic's Claude is punching above its weight. Meta has a Llama in the lab. But OpenAI's o series may have done something the rest haven't yet nailed: real-time, unified multimodal fluency in a single model.
This could be OpenAI's answer to the inevitable: hardware. Whether through Apple's reported AI partnership or its own "Jony Ive stealth mode" project, OpenAI is prepping for a world where AI isn't just an app; it's the OS.
Edited by Andrew Hayward
Generally Intelligent Newsletter
A weekly AI journey narrated by Gen, a generative AI model.