Research · 4 min read

What We Learned About Prompting LLMs for Text-to-SQL

At Wobby, we’re building AI agents that turn natural language into SQL — automatically. It's one of those things that sounds magical, but takes a lot of careful engineering to get right. One of the biggest things we've learned: how you prompt the model makes a huge difference.

Recently, we studied a bunch of different prompting strategies. Zero-shot, few-shot, single-domain, cross-domain. Prompt formats, data examples, relationships between tables — we tried it all. And let me tell you, we walked away with some strong opinions on what works and what doesn’t.

This blog shares some of those learnings. Whether you're building your own text-to-SQL tool or just curious about how we do it at Wobby, read on.

Prompting 101 (Quick Refresher)

When we talk about prompting here, we mean this: what do we show the language model before we ask it to generate SQL?

A basic text-to-SQL prompt usually includes:

  • A short instruction ("Write SQL using the tables below")
  • A description of the database (table names, columns, relationships)
  • Sometimes: examples of questions + their matching SQL
  • And then: the question we want answered
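To make that concrete, here's a minimal sketch of how such a prompt might be assembled. The function and field names are our own illustration, not Wobby's actual code:

```python
# Minimal sketch of assembling a basic text-to-SQL prompt.
# All names here are illustrative, not Wobby's production code.

def build_prompt(schema: str, question: str,
                 examples: list[tuple[str, str]] | None = None) -> str:
    parts = ["Write SQL using the tables below.", "", schema, ""]
    for q, sql in examples or []:  # optional few-shot examples
        parts += [f"Question: {q}", f"SQL: {sql}", ""]
    parts += [f"Question: {question}", "SQL:"]
    return "\n".join(parts)
```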

There are two common setups:

  • Zero-shot: the prompt contains only the instruction, the database description, and the question itself — no examples.
  • Few-shot: the prompt also includes a handful of example question + SQL pairs, either from the same database (single-domain) or from other databases (cross-domain).

We use both at Wobby, depending on what the user is doing and what data is available.

Schema Alone Isn’t Enough

In zero-shot prompts, just listing the tables and columns is not enough. We found that adding table relationships (e.g. foreign keys) improves accuracy significantly.

Even better: include sample data values. This helps the model understand how column values look in practice — which is critical when it’s trying to write WHERE clauses with the right formatting (e.g. 'USA' vs 'The United States of America').

We tested different ways of showing table content. The winner? A format that shows distinct column values (what the paper calls SelectCol). This gave better results than showing raw rows (InsertRow) or just running SELECT * LIMIT 3.

So now, in Wobby’s zero-shot setup, we always include:

  • The full schema (CREATE TABLE statements)
  • Foreign key relationships
  • A few distinct values per column

(And yeah, we normalize everything — lowercase, consistent spacing, no weird formatting.)
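Put together, a zero-shot prompt built this way might look roughly like the following. The database, values, and exact layout are invented for illustration:

```python
# Illustrative zero-shot prompt: schema, foreign keys, and a few distinct
# values per column. The database and layout are made up, not Wobby's format.

schema = """CREATE TABLE customers (id INT, name TEXT, country TEXT);
CREATE TABLE orders (id INT, customer_id INT, total REAL);
-- foreign key: orders.customer_id references customers.id"""

# A few distinct values per column, normalized to lowercase.
sample_values = {
    "customers.country": ["usa", "belgium", "germany"],
    "orders.total": ["19.99", "250.00", "7.50"],
}

lines = ["Write SQL using the tables below.", "", schema, ""]
for col, values in sample_values.items():
    lines.append(f"-- {col} example values: {', '.join(values)}")
lines += ["", "Question: How many customers are in the usa?", "SQL:"]
print("\n".join(lines))
```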

Few-shot: The More, The Merrier (Mostly)

When you can include examples, do it. We saw consistent gains even with 1-2 examples. And if you can go up to 4-8 good examples from the same domain? That’s often enough to get most of the benefits.

But here’s the twist:

  • Table content still matters even with examples.
  • How you show that content becomes less important — the model becomes less picky once it’s seen a few examples.
  • Relationships between tables (like joins) can be learned from the examples themselves.

In single-domain mode, you can get away with a simpler schema format, as long as you show real, useful examples — but we still recommend including table content if you can.
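Here's a rough sketch of what a single-domain few-shot prompt could look like, with the example pairs placed before the new question. The questions, SQL, and schema placeholder are made up:

```python
# Illustrative single-domain few-shot prompt: example pairs from the same
# (hypothetical) database, followed by the new question.

examples = [
    ("How many orders were placed in 2023?",
     "SELECT COUNT(*) FROM orders WHERE order_year = 2023;"),
    ("What is the total revenue per customer?",
     "SELECT customer_id, SUM(total) FROM orders GROUP BY customer_id;"),
]

parts = ["Write SQL using the tables below.", "",
         "<schema and table content here>", ""]
for q, sql in examples:
    parts += [f"Question: {q}", f"SQL: {sql}", ""]
parts += ["Question: Which customer spent the most in total?", "SQL:"]
print("\n".join(parts))
```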

Sample Data Improves Accuracy

If you also show sample values from the table — like a few example rows — results get better. This helps the model understand what kind of values are stored, which matters for writing filters (e.g. WHERE clauses).

We tested a few ways to show this:

  • Raw insert statements
  • SELECT * LIMIT 3
  • Distinct values per column

The last one (distinct values per column) worked best. It exposes more variety and avoids redundancy.
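If your data lives in SQLite, for instance, a SelectCol-style summary could be built with something like this sketch (the helper name and the five-value limit are our own choices):

```python
# Sketch: build a "distinct values per column" (SelectCol-style) summary
# for one table in SQLite. Helper name and limits are illustrative.
import sqlite3

def distinct_column_values(db_path: str, table: str, per_column: int = 5) -> str:
    conn = sqlite3.connect(db_path)
    # PRAGMA table_info returns (cid, name, type, notnull, dflt_value, pk).
    cols = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    lines = []
    for col in cols:
        # Table/column names come from the trusted schema above, not user input.
        rows = conn.execute(
            f'SELECT DISTINCT "{col}" FROM {table} LIMIT {per_column}'
        ).fetchall()
        lines.append(f"{table}.{col}: " + ", ".join(str(v) for (v,) in rows))
    conn.close()
    return "\n".join(lines)
```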

But Prompting Isn’t Everything

Here’s the thing — no matter how well you prompt an LLM, expecting it to always write perfect SQL from scratch isn’t realistic.

At Wobby, we’ve learned that you need more than just prompting. You need a framework around the agent:

  • The ability to store previously generated SQL queries — and re-use them when a user asks a similar question later.
  • A way to define important terms (like what “Churn” or “Active User” actually means), and inject that definition into the prompt when it’s relevant.

This kind of infrastructure makes a huge difference in reliability and performance. Prompting is a starting point — but robust context handling and reuse is where things get scalable and smart.
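As a rough illustration of that framework layer (purely a sketch, not Wobby's implementation — the definitions are invented, and a production system would use embedding search rather than keyword matching):

```python
# Rough sketch of a query store plus a business-term glossary that gets
# injected into prompts when relevant. All definitions are made up.

GLOSSARY = {
    "churn": "A customer is churned if they placed no order in the last 90 days.",
    "active user": "A user with at least one session in the last 30 days.",
}

query_store: list[tuple[str, str]] = []  # (question, generated SQL)

def relevant_definitions(question: str) -> list[str]:
    """Find glossary definitions to inject into the prompt."""
    q = question.lower()
    return [defn for term, defn in GLOSSARY.items() if term in q]

def remember(question: str, sql: str) -> None:
    """Store a generated query so similar questions can reuse it later."""
    query_store.append((question, sql))

def similar_queries(question: str, k: int = 2) -> list[tuple[str, str]]:
    """Naive retrieval: rank past queries by shared words with the question."""
    words = set(question.lower().split())
    return sorted(query_store,
                  key=lambda qs: -len(words & set(qs[0].lower().split())))[:k]
```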

And yes — there are a few other things we do at Wobby behind the scenes too. We won’t spill all the beans just yet… some stuff is patent pending. 🍝

We’re obsessed with making text-to-SQL work better — more accurate, more reliable, more helpful for data teams. Prompting is just one piece of that puzzle, but it’s a big one.

If you're building your own LLM-powered data tools, we hope this gives you a few ideas. And if you want to see what this looks like in production, you know where to find us :)
