Why Semantic Data Layers matter to product teams
đ§ Practical ways you can use it for speeding up discovery, making analytics self-serve, building better in-product search capabilities and more.
đ The Knowledge Series breaks down emerging AI technologies with practical playbooks designed specifically for product teams. Get 100+ guides and practical tutorials covering everything from Claude Code and MCP to agentic workflows, vibe coding, and more.
Anthropicâs data team recently revealed that they built a self-service reporting system where 95% of its data requests are now automated with Claude.
The teams who worked on it say the biggest challenge when building this system was this:
In other words, the hard part wasnât writing the SQL - it was figuring out exactly what data the user actually meant when they asked a question. And semantic data layers play a huge role in this.
A semantic data layer lets you formalize core product concepts (user, workspace, subscription, session, feature, experiment, etc.) and their relationships as a knowledge graph or semantic model. This model then sits as a âsemantic layerâ between raw data and tools, so everyone queries with the same business terms and metrics.
AI and AI agent tools with natural language searches have brought semantic data back to the top of the agenda and according to recent reports1, enterprise companies like Microsoft, Databricks, SAP and other AI software providers are all fighting over control and access to this semantic layer as one of the next major battlegrounds in the AI race.
In this Knowledge Series, weâll explore what semantic data is, why it matters to product teams and how exactly you can put it to use for practical use cases like:
Standardising how your company talks about important definitions
Speeding up product discovery and concept exploration
Building better in-product search features
Making data analytics truly self-serve so that you donât have to ask your data teams for custom reports (using Anthropicâs set up as inspiration)
Plus, weâll look at a real world case study and experiment with using tools like Claude Code to create visual representations of your data that you can use to present your data model back to non-technical users.
What exactly is a Semantic Data Layer?
A semantic data layer is a translation layer that sits between your raw data warehouse and whoever (or whatever) is querying it. It takes raw table columns and turns them into named, governed business concepts.
Without one, âmonthly active usersâ might mean five different things depending on which analyst wrote the SQL. With one, âmonthly active usersâ is a single defined metric that always returns the same number no matter where you query it from. Anyone who has been tasked with producing a standardized report at any company can attest to just how difficult this is in companies with no pre-existing definitions set up.
A semantic data layer could store things like:
âActivationâ - the specific milestone that marks a user as activated, per your productâs definition
âFeature adoptionâ - which specific events count, whether a single use counts or a threshold
âCustomer PIIâ - identified semantically, so that everyone has a shared understanding of personally identifiable information that can be redacted for privacy reasons
âNew userâ - first 7 days, first 30, or first session only
(You can read more about SQL in previous Knowledge Series editions if youâre interested in that).
Why it matters now specifically
Semantic layers are having a bit of a renaissance in 2026 because precise definitions of data are essential for AI Agents and LLMs. Once definitions are set up, it allows non-technical parts of a business to self-serve analytics reports on demand through conversational interfaces - and it lets AI agents perform data and reporting tasks autonomously.
Not investing in a solid foundation including a semantic data layer can make it virtually impossible to build a self-serve analytics model. In Anthropicâs case, before investing in a semantic layer and complementary agent skills, analytics questions were stuck at around 21% accuracy on their evals. Investing in the infrastructure got these numbers consistently above 95% in aggregate and regularly around 99% in certain domains, meaning that their data science team can focus on more strategic work.



