August 11, 2025 · 10 min read

The Semantic Layer: Your AI Agent's New Best Friend (or Why They Keep Asking About "Revenue")

Week after week, new AI agents for data emerge. The data community, always searching for the next silver bullet, has enthusiastically embraced these digital assistants. They promise to be your SQL whisperer and data-driven oracle, effortlessly extracting insights from the depths of your data warehouse. Yet, when you ask one for something as basic as "What was our total revenue last quarter?" and it confidently returns a figure that suspiciously resembles the sum of every numerical column in your fct_sales_transaction table, you're left with that familiar, unsettling realization: Oh, dear. It's hallucinating our financials.

This isn’t a knock on AI agents, mind you. They’re powerful, truly. They can parse natural language, formulate complex SQL queries, and even execute them with impressive speed. The problem isn’t their intelligence; it’s their context. Or, more accurately, their profound lack thereof. An AI agent, left to its own devices in a sprawling data lake, simply lacks the inherent business understanding that a human analyst develops over years. It can see all the data points, but it has no inherent understanding of their business relevance, what the metrics really mean, or how to combine them to answer a nuanced business question without explicit, precise guidance.

This is where the semantic layer enters the picture—a concept that has existed for years but has suddenly found its spotlight in the AI era. Ironically, these layers, which originated in the 90s when slow databases required pre-aggregated data for performance reasons, have unexpectedly become the ideal context engine for Large Language Models. Consider what they offer: a system specifically designed to define business metrics, relationships, and meaningful calculations—precisely what an AI needs to produce accurate queries rather than results that are technically correct but fundamentally wrong.

What We Talk About When We Talk About Semantic Layers

Let me explain what a semantic layer actually is. Think of it as a translator that sits on top of your raw data. It takes all those complex database tables and turns them into everyday business terms that anyone can understand. This way, everyone in your company, from executives to new hires to AI tools, speaks the same language when talking about data.
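To make the translator idea concrete, here is a minimal sketch in Python: a toy dictionary of metric definitions and a tiny compiler that turns a business term into the one SQL statement every consumer should run. All metric, table, and column names here are illustrative assumptions, not any vendor's actual schema.

```python
# Toy semantic layer: business terms mapped to precise definitions over raw
# tables. Names like "total_revenue" and "fct_sales_transaction" are examples.
SEMANTIC_LAYER = {
    "total_revenue": {
        "description": "Gross sales value across completed orders",
        "table": "fct_sales_transaction",
        "column": "gross_sales_value",
        "aggregation": "SUM",
        "filters": ["status = 'completed'"],
    },
    "active_users": {
        "description": "Distinct users with at least one session in the period",
        "table": "fct_sessions",
        "column": "user_id",
        "aggregation": "COUNT(DISTINCT)",
        "filters": [],
    },
}

def compile_metric(name: str) -> str:
    """Translate a business term into the SQL any consumer should run."""
    m = SEMANTIC_LAYER[name]
    if m["aggregation"] == "COUNT(DISTINCT)":
        expr = f'COUNT(DISTINCT {m["column"]})'
    else:
        expr = f'{m["aggregation"]}({m["column"]})'
    where = f" WHERE {' AND '.join(m['filters'])}" if m["filters"] else ""
    return f'SELECT {expr} AS {name} FROM {m["table"]}{where}'

print(compile_metric("total_revenue"))
# SELECT SUM(gross_sales_value) AS total_revenue FROM fct_sales_transaction WHERE status = 'completed'
```

The point of the sketch is the shape, not the code: the definition lives in exactly one place, and every consumer, human or AI, gets the same SQL out of it.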

In the past, most companies defined their business metrics (like "active users" or "customer lifetime value") directly in their BI tools. But as data stacks have become more fragmented and complex, and as the demand for self-service analytics has grown, the need for a centralized, reusable semantic layer has become painfully obvious. The goal is making data understandable and trustworthy at scale.

Enter the modern contenders. dbt's Semantic Layer, powered by MetricFlow, aims to push metric definitions upstream, closer to the data transformation layer. The idea is simple: define your metrics once in dbt, and then expose them consistently to various downstream tools, be it a BI dashboard, a spreadsheet, or an AI agent.

Then there's Snowflake's Semantic Views, a newer entrant that brings semantic modeling directly into the database. By storing semantic information natively within Snowflake, it offers a way to govern and expose data definitions where the data lives. This approach makes sense in a world where data platforms keep expanding their scope.

And, of course, Databricks has its own semantic layer implementation as part of Unity Catalog called UC Metrics. This feature allows organizations to define metrics and business logic directly within the Databricks platform, ensuring consistency across all data applications.

At its heart, the semantic layer is more than just a technical thing; it's a philosophical one. It's a commitment to a single source of truth for your business metrics, rather than hoping every analyst remembers the right way to calculate "active users."

The Quiet Problem with AI Agents: Context at Scale

We’ve talked about semantic layers, but let’s pivot to the other half of this equation: the AI agent. Forget the sci-fi fantasies of sentient robots (for now). In the context of Agentic Analysis, an AI agent is essentially a sophisticated Large Language Model (LLM) equipped with a set of tools—like the ability to query a SQL database, interact with APIs, or even run Python scripts. It operates in a loop: understanding a natural language request, breaking it down into actionable steps, selecting and using the appropriate tools, and then synthesizing the results. It continues this process, iterating and refining, until it achieves its goal or determines it cannot proceed. Agents like this are not just glorified chatbots; they're increasingly capable of doing things, not just telling you things. And what they're doing, more and more, is interacting directly with your data, often by executing SQL.
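The loop just described can be sketched in a few lines of Python. The LLM call is stubbed out as a plain function, and the tool names and the toy warehouse are assumptions for illustration only.

```python
from dataclasses import dataclass, field

@dataclass
class Agent:
    tools: dict                      # tool name -> callable
    max_steps: int = 5
    history: list = field(default_factory=list)

    def run(self, request, llm_plan):
        """Plan the next action, execute a tool, feed the result back, repeat."""
        for _ in range(self.max_steps):
            action = llm_plan(request, self.history)
            if action["tool"] == "finish":
                return action["args"]["answer"]
            result = self.tools[action["tool"]](**action["args"])
            self.history.append((action, result))  # iterate with new context
        return "Could not complete the request."

def fake_plan(request, history):
    """Stand-in for the LLM: first run a query, then report the result."""
    if not history:
        return {"tool": "run_sql",
                "args": {"query": "SELECT SUM(gross_sales_value) FROM fct_sales_transaction"}}
    return {"tool": "finish", "args": {"answer": f"Total: {history[-1][1]}"}}

agent = Agent(tools={"run_sql": lambda query: 42})  # toy warehouse returning 42
print(agent.run("What was our total revenue last quarter?", fake_plan))
# Total: 42
```

Notice what is missing from this loop: nothing tells `fake_plan` what `gross_sales_value` means. The loop is mechanically sound, and that is exactly why the context problem below is so dangerous.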

For the business user, who often doesn't speak SQL as a first language, the output of an AI agent can feel like a magic trick. They can see the numbers, but they can't easily audit the underlying logic. And while you can look at the SQL queries an AI agent executes, and you can ask it to explain its logic (and it will, often quite eloquently), the deeper challenge lies not in its ability to generate SQL, but in its understanding of business context. You’re left with a gnawing uncertainty: did it actually understand my business question, did it just get lucky with a plausible-sounding query, or, worse, did it miss a crucial nuance that a human analyst would instinctively grasp?

Then there’s the context problem. Your raw data tables are, by design, devoid of business context. A column named amount could be sales_amount, discount_amount, refund_amount, or the_amount_of_coffee_I_drank_this_morning. A human analyst knows the difference, often intuitively, based on years of experience and tribal knowledge. An AI agent, however, sees only amount. Without a semantic layer to provide that crucial business context—to tell it that amount in sales_transactions refers to gross_sales_value and should be aggregated as a sum for revenue—it’s operating in a vacuum.
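A toy illustration of the difference that context makes: three tables each expose an `amount` column, and a small table of semantic annotations is what lets an agent pick the right one instead of guessing. The annotation structure and all names are assumptions for the sketch.

```python
# Semantic annotations attached to otherwise ambiguous raw columns.
# (table, column) -> business meaning and default aggregation; names assumed.
COLUMN_SEMANTICS = {
    ("sales_transactions", "amount"): {"meaning": "gross_sales_value", "default_agg": "SUM"},
    ("refunds", "amount"):            {"meaning": "refund_amount",     "default_agg": "SUM"},
    ("discounts", "amount"):          {"meaning": "discount_amount",   "default_agg": "SUM"},
}

def resolve(term: str) -> str:
    """Find the (table, column) whose business meaning matches the request."""
    for (table, column), meta in COLUMN_SEMANTICS.items():
        if meta["meaning"] == term:
            return f'{meta["default_agg"]}({table}.{column})'
    # No definition: refuse to guess rather than hallucinate a plausible column.
    raise LookupError(f"No semantic definition for {term!r} -- ask a human.")

print(resolve("gross_sales_value"))
# SUM(sales_transactions.amount)
```

The `LookupError` branch is the important design choice: an agent operating in a vacuum fills the gap with a guess, while an agent backed by a semantic layer can tell the difference between "I know this" and "I don't."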

Finally, there's the scale problem. AI agents are fast. They can execute hundreds of queries in a fraction of the time it would take a human. This is, ostensibly, a good thing. More queries, more insights, right? Not if those queries are based on flawed assumptions, incorrect definitions, or a fundamental misunderstanding of your business. A single bad query from a human analyst is a problem; hundreds of bad queries from an AI agent is a catastrophe. Without the guardrails and consistent definitions provided by a semantic layer, the sheer volume of AI-driven data interaction becomes a liability, not an asset. It's a problem that grows exponentially with every new agent deployed, every new query executed.

This scaling challenge presents an opportunity, however. Rather than viewing AI agents as mere consumers of data definitions, what if they could actively participate in improving them?

Bottom-Up Governance: When AI Agents Become Your Data Stewards

For years, data governance has felt like a four-letter word. It conjures images of endless meetings, rigid rules, and top-down mandates handed down from on high, often with little understanding of the messy realities on the ground. The traditional approach—define everything upfront, lock it down, and pray—has, more often than not, failed to keep pace with the velocity and complexity of modern data.

But what if the agents themselves, the very entities consuming and interacting with the data, could become part of the governance solution? This is the tantalizing promise of bottom-up governance in the age of AI agents. Instead of a static, centrally imposed set of rules, imagine a dynamic system where AI agents, through their interactions and occasional missteps, actively contribute to the refinement and enrichment of the semantic layer.

Here's how this bottom-up governance model works in practice: when an AI agent encounters something it doesn't understand in your data, perhaps an ambiguous metric definition or an unfamiliar data field, it actively flags this knowledge gap instead of making potentially incorrect assumptions. The agent then initiates a structured clarification request with a human expert (a data steward, analyst, or subject matter expert).

What makes this approach powerful is what happens next. After receiving human clarification, the AI agent doesn't just use this knowledge for the current query and forget it. Instead, it proposes specific updates to the semantic layer itself, suggesting new metric definitions or business rules. A human verifies these proposals, and once approved, they become permanent additions to your semantic layer.

This creates a flywheel effect: each interaction makes the semantic layer more comprehensive, and each improvement makes future AI interactions more accurate. It's a big shift from traditional top-down governance, where definitions are imposed by committee, to an organic system where the semantic layer evolves based on actual usage patterns and real business needs.

Consider two practical scenarios:

  • Unknown Contract Status: An AI agent encounters unfamiliar status codes while analyzing contract data (e.g., "ST," "PND," "CXL"). Rather than guessing their meanings, it asks a business user for clarification. The user explains these represent "Started," "Pending," and "Cancelled" statuses. The agent not only applies this knowledge to the current query but also saves these definitions to the semantic layer as metadata for the contract_status dimension, ensuring all future queries will correctly interpret these codes.
  • Query to Metric: After carefully crafting a complex SQL query to calculate "revenue per active user by customer segment," the AI agent recognizes this metric might be valuable for future analysis. It proposes saving this calculation as a formal metric in the semantic layer. Once approved by a data steward, this calculation becomes a standardized metric that any user or AI can reference without needing to rebuild the complex logic, ensuring consistency across all analyses.
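The first scenario can be sketched as a short Python flow: the agent checks the semantic layer for each status code, asks a human only about the codes it doesn't know, routes the answer through an approval step, and persists the result as dimension metadata. Every name and structure here is hypothetical.

```python
# Hypothetical metadata store for the contract_status dimension.
semantic_layer = {"contract_status": {"values": {}}}

def review(proposal) -> bool:
    """Stand-in for a data steward approving the proposed update."""
    return True

def interpret_statuses(codes, ask_human):
    """Map status codes to meanings, asking a human only about unknown ones."""
    known = semantic_layer["contract_status"]["values"]
    unknown = [c for c in codes if c not in known]
    if unknown:
        # Structured clarification request instead of a silent guess.
        clarified = ask_human(f"What do these contract statuses mean? {unknown}")
        proposal = {c: clarified[c] for c in unknown}
        if review(proposal):        # human-in-the-loop approval
            known.update(proposal)  # approved answers become permanent metadata
    return {c: known[c] for c in codes}

answers = {"ST": "Started", "PND": "Pending", "CXL": "Cancelled"}
print(interpret_statuses(["ST", "PND", "CXL"], lambda question: answers))
# {'ST': 'Started', 'PND': 'Pending', 'CXL': 'Cancelled'}

def no_human(question):
    raise RuntimeError("the gap is already closed; no one should be asked")

# Second call needs no human input: the knowledge now lives in the layer.
print(interpret_statuses(["ST"], no_human))
# {'ST': 'Started'}
```

The second call is the flywheel in miniature: the clarification cost is paid once, and every subsequent query, by any agent or any user, inherits the answer for free.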

The focus is continuous improvement. It's about leveraging the sheer volume of AI-driven data interaction as a force for good, turning every query into a potential opportunity to refine and strengthen your data definitions. It’s a recognition that the semantic layer isn’t a static artifact, but a living, breathing representation of your business, constantly evolving, constantly learning, much like the AI agents themselves. And in a world where data is constantly changing, a static semantic layer is, by definition, a broken one.

The Catch-22 (Because There’s Always a Catch-22)

Now, before we all get carried away with visions of perfectly governed data ecosystems and AI agents that anticipate our every analytical whim, let’s pump the brakes. Because, as with all things in data, there’s a catch. Or, more accurately, a Catch-22. You need a robust semantic layer for your AI agents to be truly effective, to move beyond the realm of glorified search engines and into genuine data partners. But here’s the rub: building a good semantic layer—one that’s comprehensive, accurate, and truly reflects your business logic—requires a level of data maturity and thoughtful data modeling that many organizations are still striving for. It’s a bit of a chicken-and-egg problem, isn’t it? You need the semantic layer to make your AI smart, but you need smart data practices to build the semantic layer.

It’s not impossible, but the persistent, unavoidable truth remains: the human element is paramount. AI agents can be incredible tools, amplifiers of human intelligence, but they are not a substitute for it. The “bottom-up governance” I waxed poetic about in the previous section still requires humans to be thoughtful, to be precise, and to be willing to engage with the messy details of data. An AI agent can ask for clarification, but a human still has to provide it. And that clarification needs to be accurate, consistent, and ultimately, codified into the semantic layer in a way that makes sense. If your human data stewards are still debating the definition of active_user in a Slack channel, your AI agent isn’t going to magically solve that for you.

So, where does this approach work, and where does it fall short? It works best in environments where there’s already a foundational commitment to data quality and a culture of collaboration between data producers and consumers. It thrives where there’s a willingness to iterate, to learn from mistakes, and to view data governance as an ongoing process, not a one-time project. It falls short when organizations view the semantic layer as a magic bullet, a technical solution to a fundamentally organizational problem. It also struggles when the underlying data itself is so chaotic that even the most sophisticated semantic layer can’t untangle the mess. You can’t build a cathedral on quicksand, and you can’t expect a pristine semantic layer to magically fix a data swamp.

What AI Agents Demand of Your Data Stack

If AI agents are to be more than just glorified SQL generators, if they are to truly become partners in data analysis, then the semantic layer isn't just a nice-to-have; it's the essential operating system for their very existence. That shift creates new responsibilities for data teams. What does all this semantic layer and AI agent talk really mean for the people doing the work: the data engineers, the analysts, the data caretakers, and the leaders making decisions? Quite a lot, as it turns out.

For Data Engineers, your job is changing. Beyond data pipelines (which are still important), you'll now build and maintain the structures that give data meaning. You'll need to focus on good data modeling, clear notes, and using feedback from AI agents. Your job grows from fixing data flows to creating the language that helps AI understand your business.

For Analysts, AI agents won't take your job—they'll make it better. You'll spend less time writing basic queries and more time on deeper analysis and telling data stories. You'll teach and guide these AI tools, giving them the business knowledge they need, making sure their insights are right and useful.

For Data Caretakers, your know-how is key. You'll guide AI tools with your understanding of business terms, where data comes from, and data quality. You'll clear up confusion and check what the AI learns, changing from rule-enforcers to helpers who make your company's data work better.

For Leaders, putting money into a semantic layer is a smart move. It builds a strong data base that uses AI while making sure everyone understands the business the same way. The payoff comes through better accuracy, faster work, and an edge over competitors in a world full of data.

The Future We’re Building

So, what does this future look like? In the optimistic scenario, AI agents with robust semantic layers will understand your business like veteran analysts—but with unmatched speed and scale. They'll anticipate needs, surface insights proactively, and suggest new analytical paths. Data will flow smoothly, insights will be widely accessible, and the elusive "single source of truth" will finally be achieved.

Realistically, we'll see gradual improvements: fewer data discrepancies, AI agents that hallucinate less and seek clarification more often. It's an iterative process of building trust and refining definitions—a human-AI partnership rather than a replacement.

The future of agentic data analysis isn't just about waiting for bigger LLMs; it's about smarter, contextualized understanding. Context engineering, the practice of supplying models with the right business context at the right time, will be the defining discipline of the near future. The semantic layer serves as the essential operating system for trustworthy AI data agents, providing the business context they need to function effectively.

This isn't theoretical futurism. Wobby, as a startup in this space, is actively building AI data analysts that leverage semantic layers exactly as described. Their approach, developed through close collaboration with customers, makes them among the first to implement AI agents that interact with semantic layers in this dynamic way. Their real-world experience with customers has directly informed this article and proven these concepts in practice. The question isn't whether this future will arrive; it's whether your organization will be ready when it does.

The semantic layer may well be the key to unlocking data-driven decision making's full potential in the age of AI.
