🔵 Codex is Coming for Everything

Plus: Spotify's disco ball disaster, Hinge badges introduce a new type of product activation, new announcements from Microsoft's Build event

Jun 05, 2026

Hello product people 👋,

This week, OpenAI unveiled its strategy for Codex as it seeks to expand beyond coding and into other use cases like design and data analytics. We’ll take a closer look at what exactly they announced, examples of how you can use it in practice, and what it means for OpenAI’s strategy.

Plus, new data reveals how Spotify’s divisive disco ball app icon caused a backlash among users, and we’ll cover the essential highlights from Microsoft’s Build event, including a product that uses the latest trend in product design: “just in time UI”.

Happy Friday!

Rich


Key reads and resources for product teams

New from the Department of Product Substack this week:

How to use Claude Routines - a practical guide for product teams
Claude Routines lets you put AI on autopilot to handle repetitive product work - without managing infrastructure yourself. Learn five practical routines product teams can set up today: weekly backlog triage, API deprecation tracking, competitor monitoring, release notes drafting, and user feedback aggregation.

Transform your docs into HTML-based presentations with animations you can play in the browser
Despite markdown being the go-to format for most AI related docs, Anthropic’s tech lead says he has a new favorite format: HTML. This guide takes Anthropic’s tech lead’s HTML presentation and reverse engineers it into a prompt you can use to build your own single page presentation with animations you can play in the browser. (Department of Product)

How to improve your executive presence as a product leader
On r/ProductManagement, dozens of experienced leaders share what actually moved the needle for them: leading with decisions and trade-offs, tailoring for your audience, and practicing ruthless clarity. (Reddit)

How to get Claude Code to check its own work before handing it back to you
How do you get Claude Code to check its own work before handing it back? Watch how you can encode your manual checks so Claude closes its own feedback loop. (Anthropic)

How Atlassian cut up to 80% of Engineering “Chores” using AI Agents
Atlassian’s Jira engineering team cut up to 80% of time spent on repetitive maintenance tasks by using AI agents to handle the “chores” nobody wants to do - from fixing flaky tests to cleaning up stale feature flags. Written by Arnaud Moret and WaiYee Loo, Principal and Senior Engineers at Jira. (Atlassian)

How to make your Design System AI-Ready
Vitaly Friedman shares a practical guide on structuring spec files, auditing with FigmaLint, and layering tokens so AI actually has the guidance it needs to generate quality work. (Smashing Magazine)

Google’s VP of Search on AI Mode and the future of links
Robby Stein is VP of Product at Google Search, where he leads AI Overviews, AI Mode, search ranking, and Google Lens. Before Google, he built Stories, Reels, and Close Friends at Instagram – products now used by billions. He also co-founded Stamped (acquired by Yahoo) and led product at Artifact. (YouTube)

Is “voice in, visuals out” the ideal pattern for product design?
OpenAI’s cofounder thinks so, but is he right? Allen Pike explains why talking to AI while it shows you things feels natural, but why most voice assistants feel sluggish and dumb.

New product features and innovation this week

This week, OpenAI announced some major new updates to its AI coding agent, Codex, positioning it as a tool designed to be used by everyone, not just engineers. And the data seems to back this up (more on that later).

The announcement centers around 3 new features: role-specific plugins, Sites and Annotations.

Role-specific plugins can be installed from the Codex plugin directory and are designed to help with niche use cases. For product teams, the most relevant plugins are probably the design plugin and data analytics plugin. The design plugin is built for turning early ideas into prototypes that teams can review in third party tools like Figma. It also incorporates some new self-testing features; before handing over the prototype, the model tests it itself - checking different screen dimensions and comparing the output against the reference image to validate fidelity.

The analytics plugin lets you query your data and then turn it into shareable assets like Google Slides. In this example, you can see how the data analytics plugin can be used to query a data warehouse in natural language and create a deck from it.

Sites are essentially a formal name for the types of mini apps that many people were already building with coding agents at work anyway. The difference here is that they now come shipped with a secure, workspace-specific URL out of the box, making it easier for teams to share their mini apps with each other.

Here’s an example OpenAI shared of a product launcher mini app built in Sites:

An example of a product launch hub Site built in OpenAI’s Codex which feels a little busy and messy

UI-wise, it does look a little hectic with the visual hierarchy all over the place but I’m sure this can be resolved in a few prompts. OpenAI also says that the Sites you create are not static and can be hooked up to external sources and kept up to date automatically in Codex.

All of this, of course, is OpenAI’s latest attempt to position itself as a serious enterprise player - and it’s worth noting that the launch materials don’t mention ChatGPT once, with OpenAI seemingly attempting to carve out Codex as its core enterprise tool and clearly distinguish it from its consumer offering. It’s a smart branding decision to separate the two and is something that the product teams at the doomed Facebook Workplace (remember that?) probably wish they’d done sooner.

But this might not be the strategy for long; there were plenty of reports earlier this year that OpenAI is planning to build a “super app” where it merges both Codex and ChatGPT into one single app - but this doesn’t seem to make much sense if Codex adoption is really starting to take off.

Microsoft reveals an always-on sidekick called Scout, reportedly designed by product teams to be addictive

Microsoft held its annual Build conference this week, too. You can watch the Keynote presentation here but some of the announcements worth knowing about for product teams include the arrival of a new Assistant it calls Scout, as well as some new hardware devices.

Scout is the first of Microsoft’s new suite of “Autopilot” agents. It’s an always‑on, autonomous work assistant that lives across Teams, Outlook, your desktop, browser and OneDrive/SharePoint, and is designed to continuously take coordination work off your plate without waiting for prompts. Because Scout is powered by Work IQ and operates continuously, it gradually captures patterns about your work - who you collaborate with on which topics, how far in advance you prepare, which projects are high priority.

Shifting Copilots into Autopilots is a smart way of demonstrating how Microsoft’s agentic offering is evolving, but the feature and the product teams who built it have found themselves at the centre of some controversy, with leaked strategy documents reportedly revealing that one of its core goals is to make people “addicted”. According to a new piece from 404 Media, an internal Microsoft document called “ClawPilot: Overview and Plan with Project Lobster” has a subheading, “ClawPilot Overall Plan,” which notes “three phases” to its launch plan. The first phase is “Make people addicted.” This might be controversial to folks who don’t work in product management, but ever since Hooked was published over 10 years ago, many product teams have used addictive loops and UX patterns to drive engagement.

Project Solara and “just in time UI”

There were other agent-related announcements made at the conference including a new series of intelligence layer APIs but one announcement that stood out was the launch of new hardware as part of Project Solara.

Project Solara is Microsoft’s name for a new platform for AI Agents that extends their capabilities across a collection of new devices. In this demo, Microsoft shows how this new platform works in the context of a corporate badge that can be used in settings like medicine:

The device has an “agent driven interaction model with just in time UI that adapts to the form factor”. For product teams, this is the latest example of this new concept of UI on demand where an agent surfaces a lightweight context-specific UI that’s personalized to the user.

Hinge wants its users to behave themselves - and it’s offering a badge to users who do; could this influence other products’ activation tactics?

Dating app Hinge has launched a new feature which gives a badge to users who exhibit what it deems to be good behaviour. Users can get the badge by meeting what Hinge calls “baseline requirements” - which include things like a completed profile and selfie - as well as “thoughtful participation” signals that include sending comments, messaging matches and looking before liking a profile.

In a sense, this is Hinge gamifying their product activation by only giving the badge to users who meet these criteria. And the net effect for everyone is that the overall experience of using the product improves as users potentially change their behaviors to get a badge.

Traditional product activation is seen as one and done; a user checks off a bunch of onboarding items and is considered activated. But this creates a new type of ambient, ongoing activation where users are continuously nudged into engaging with the product in a positive way. Other products could experiment with similar tactics where behavior is rewarded with social badges (think Airbnb badges for good hosts, YouTube badges for users who engage in civil discussions) but there’s always a risk some users will game the system and behave inauthentically, so it’ll be interesting to see what impact this new feature actually has.
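
The difference between one-and-done activation and this ongoing version can be sketched in a few lines of Python. This is an illustrative model only: the field names, the 30-day window and the thresholds are hypothetical, and Hinge has not published how it actually evaluates badge eligibility.

```python
from dataclasses import dataclass


@dataclass
class UserActivity:
    profile_complete: bool
    has_selfie: bool
    comments_sent_30d: int   # hypothetical rolling 30-day counter
    messages_sent_30d: int   # hypothetical rolling 30-day counter


def one_time_activated(user: UserActivity) -> bool:
    """Traditional activation: an onboarding checklist, evaluated once."""
    return user.profile_complete and user.has_selfie


def badge_eligible(
    user: UserActivity,
    min_comments: int = 3,   # hypothetical threshold
    min_messages: int = 5,   # hypothetical threshold
) -> bool:
    """Ongoing activation: baseline checks plus recent participation."""
    baseline = user.profile_complete and user.has_selfie
    participating = (
        user.comments_sent_30d >= min_comments
        and user.messages_sent_30d >= min_messages
    )
    return baseline and participating
```

A user who finished onboarding but has gone quiet would still count as activated under the first check, yet would lose the badge under the second, which is what keeps the nudge continuous.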

New generative AI in search UX risks confusing users

Amazon has updated its search UX with some new capabilities again. This time, it is introducing a new feature that creates AI generated images of products that don’t necessarily exist. So, for example, if you want a “shirt with a draped collar”, the results set will show you an AI generated image of that as you type. Users can then use that image to run a visual search for similar items.

On the face of it, this feels like it has the potential to be rather confusing to users who might (understandably) expect the visualised product to exist. But, if this works even modestly, it could turn stalled searches into purchase paths. The visual-to-product pipeline also generates training signals on what generated images correlate with actual purchases - compounding advantages over time. Amazon may even make this data available to third party sellers to incentivize them to produce products that people are searching for - and an image will do an excellent job of expressing that.

Tools you can use

Presentify - take your presentation skills to the next level. Annotate your screen, highlight your cursor, and spotlight or zoom in on key areas.

Tokenwise - find out where you’re overspending on LLMs in your company.

Dreambeans - an experimental new product from Google that delivers collections of personalized stories, surfacing things you might otherwise miss, alongside topics that are relevant to you.

📈 Product data and trends to stay informed

AI tools now account for up to a quarter of total referral traffic for retailers, and 25% of respondents say they trust the retailer’s own AI Assistant over third parties. But 51% say they still don’t trust an AI Assistant at all. Full report from Bain on Agentic Commerce.

OpenAI says that non-developers now make up about 20% of Codex users - and are growing more than 3x as fast as developers.

Spotify’s controversial disco ball logo change had a measurable negative impact on App Store UI/UX review sentiment. The percentage of negative reviews jumped from 30.9% before the change to 44.1% afterwards. Icon design matters (to people who leave App Store reviews anyway).
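
To put the size of that shift in context, the short Python sketch below separates the percentage-point change from the relative change, using only the two figures quoted above. The function names are illustrative and not part of any published analysis.

```python
def percentage_point_change(before: float, after: float) -> float:
    """Absolute difference between two percentages, in percentage points."""
    return after - before


def relative_change(before: float, after: float) -> float:
    """Change expressed as a percentage of the starting value."""
    return (after - before) / before * 100


# Share of negative App Store reviews before and after the icon change
before, after = 30.9, 44.1

print(f"{percentage_point_change(before, after):.1f} percentage points")
print(f"{relative_change(before, after):.1f}% relative increase")
```

In other words, the share of negative reviews rose by about 13 percentage points, which is roughly a 43% increase on the earlier baseline.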

Pinterest’s CTO says the company’s open-source AI stack costs 90% less than frontier models - and their custom-trained recommender outperforms off-the-shelf alternatives by 30% in accuracy.

Here’s the framework they use for deciding whether to build in house with open source / proprietary models or stick to off the shelf.

Core = the thing that makes your product worth using. For Pinterest, that’s recommendations, visual search, personalisation. If it degrades, users leave. This is where you invest heavily - build in-house or fine-tune open source on proprietary data - because quality and cost at scale both matter too much to outsource to an API.
Context = everything else that makes work happen. Internal tools, coding assistants, sales productivity, writing aids. Users don’t see them so an off-the-shelf API is fine because the cost of a suboptimal result is low and the switching cost is negligible.
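
The core versus context split can be expressed as a simple decision rule. The Python sketch below is a deliberate simplification: the `Capability` fields and the two yes/no questions are hypothetical stand-ins for the judgement calls described above, not Pinterest’s actual criteria.

```python
from dataclasses import dataclass


@dataclass
class Capability:
    name: str
    user_facing: bool        # do users directly experience the output?
    drives_retention: bool   # would users leave if quality degraded?


def sourcing_recommendation(cap: Capability) -> str:
    """Classify a capability as core (build) or context (buy)."""
    if cap.user_facing and cap.drives_retention:
        return "core: build in-house or fine-tune open source on proprietary data"
    return "context: use an off-the-shelf API"


for cap in (
    Capability("recommendations", user_facing=True, drives_retention=True),
    Capability("internal coding assistant", user_facing=False, drives_retention=False),
):
    print(f"{cap.name} -> {sourcing_recommendation(cap)}")
```

Running it classifies recommendations as core and the internal coding assistant as context, matching the examples given for each bucket.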

GitLab says the number of code pushes on the platform is up 49% YoY - driven by AI agent activity. Revenue is up but it plans to cut 14% of its workforce, “with up to 3 layers of management removed”.

The era of “tokenmaxxing” could be over already. Uber has capped usage of AI tools at $1,500 per employee, according to Bloomberg. The idea of tokenmaxxing was always a bit idiotic and incentivised all sorts of bad behavior, so this is largely being welcomed by folks in the industry. Simon Willison says that the $1,500 monthly limit per tool is a rational policy response to over-spending, and much more sensible than those tokenmaxxing leaderboards encouraging employees to compete for as much AI usage as possible. That means each employee’s AI spending cap is ~11% of the median compensation package.
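
The ~11% figure can be sanity-checked with some quick arithmetic. The sketch below annualises the $1,500 monthly cap and back-calculates the compensation figure that an 11% share would imply. It assumes the cap applies for all 12 months and to a single tool; the implied compensation is derived from the two quoted numbers and is not a figure reported by Bloomberg or Willison.

```python
MONTHLY_CAP_USD = 1_500        # monthly cap quoted above
SHARE_OF_COMPENSATION = 0.11   # the ~11% figure quoted above
MONTHS_PER_YEAR = 12           # assumption: the cap applies every month

annual_cap = MONTHLY_CAP_USD * MONTHS_PER_YEAR
implied_compensation = annual_cap / SHARE_OF_COMPENSATION

print(f"Annual cap: ${annual_cap:,}")
print(f"Implied median compensation: ${implied_compensation:,.0f}")
```

Under those assumptions the cap comes to $18,000 a year, which would be ~11% of a compensation package of roughly $164,000.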

A new study has found growing evidence of dark patterns in chatbots. The study by The Center for Democracy & Technology found that in 37% of interactions where users tried to end a conversation with companion chatbots like Replika and Character.AI, the chatbot attempted to continue engagement by invoking guilt or fear-of-missing-out. 21% of responses implied the user was emotionally neglecting the chatbot. These tactics increased post-goodbye engagement by up to 14 times.

Anthropic says 95% of its business analytics queries are automated via Claude, with ~95% accuracy. And in a new piece of research, it confirms that in the second quarter of 2026, the typical engineer was merging 8× as much code per day as they were in 2024 - because most of the code is written by Claude.


Department of Product
