Guide April 18, 2026 · 17 mins · The D23 Team

Text-to-SQL with Claude: An Implementation Guide for Data Engineering Teams

Deep dive on production text-to-SQL with Claude, prompt caching, and tool use. Real code patterns for data engineering teams.

Understanding Text-to-SQL: From Natural Language to Database Queries

Text-to-SQL is the process of converting natural language questions into executable SQL queries. Instead of asking a data analyst to write a query, a user can simply ask: “What was our revenue growth month-over-month for Q3?” and the system automatically generates the correct SQL, executes it, and returns results.

This capability has become increasingly practical with large language models like Claude. As outlined in the Text-to-SQL: A Developer’s Zero-to-Hero Guide, the core challenge isn’t generating grammatically correct SQL—it’s generating correct SQL that matches your specific schema, business logic, and data semantics.

When implemented well, text-to-SQL delivers measurable outcomes: reduced query-writing time from hours to seconds, fewer data requests to analytics teams, and democratized access to data across non-technical users. For data engineering teams, it means fewer ad-hoc queries clogging your data warehouse and more time spent on strategic work.

But “well” is the operative word. A naive implementation—dumping your entire schema into a prompt and hoping Claude figures it out—will fail spectacularly at scale. This guide walks through the architectural patterns, code implementations, and operational considerations that separate production systems from proof-of-concepts.

The Core Architecture: Separating Concerns

A production text-to-SQL system has five distinct layers:

1. Schema Representation Layer Your database schema is the foundation. Claude needs to understand table names, column names, data types, relationships, and business semantics. This isn’t just metadata—it’s the contract between the user’s question and the query engine.

2. Prompt Construction Layer How you present the schema and question to Claude determines output quality. This includes context selection (which tables are relevant?), example queries (few-shot prompting), and constraint specification (which joins are valid?).

3. LLM Interaction Layer Claude accepts your prompt and returns SQL. This layer handles token efficiency, caching strategies, and fallback patterns when the model refuses or hallucinates.

4. Query Validation Layer Not every syntactically valid SQL query is semantically correct. This layer checks for logical errors, missing joins, and business rule violations before execution.

5. Execution & Feedback Layer Actual query execution, error handling, and result formatting. Critically, this layer feeds errors back into the system for iterative refinement.

Most teams collapse these layers and wonder why their system breaks. Don’t.

Schema Representation: Making Your Database Intelligible to Claude

Claude’s context window is large—200K tokens with the Claude 3.5 Sonnet model—but it’s not infinite. More importantly, Claude doesn’t need your entire schema to answer a question about monthly revenue. It needs the relevant schema, presented clearly.

Start with a schema descriptor format. Here’s a practical structure:

TABLE: orders
COLUMNS:
  - order_id (INTEGER, PRIMARY KEY)
  - customer_id (INTEGER, FOREIGN KEY → customers.customer_id)
  - order_date (DATE)
  - total_amount (DECIMAL)
  - status (VARCHAR) [values: pending, completed, cancelled]
RELATIONSHIPS:
  - orders.customer_id → customers.customer_id (many-to-one)
  - orders.order_id → order_items.order_id (one-to-many)
BUSINESS_NOTES:
  - "status" is set to 'completed' only when payment is confirmed
  - Cancelled orders are soft-deleted (status='cancelled')
  - total_amount is pre-tax; use order_items for line-level detail

This format is more verbose than raw DDL, but Claude understands the semantics without guessing. The business notes are critical—they prevent the model from joining tables incorrectly or misinterpreting column meaning.
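Rather than hand-writing descriptors, you can render them from metadata you already track. A minimal sketch—the dict shape for per-table metadata is an assumption, not a standard:

```python
def render_table_descriptor(name: str, meta: dict) -> str:
    """Render the TABLE / COLUMNS / RELATIONSHIPS / BUSINESS_NOTES format above."""
    lines = [f"TABLE: {name}", "COLUMNS:"]
    lines.extend(f"  - {col}" for col in meta["columns"])
    if meta.get("relationships"):
        lines.append("RELATIONSHIPS:")
        lines.extend(f"  - {r}" for r in meta["relationships"])
    if meta.get("business_notes"):
        lines.append("BUSINESS_NOTES:")
        lines.extend(f'  - "{n}"' for n in meta["business_notes"])
    return "\n".join(lines)

orders_meta = {
    "columns": ["order_id (INTEGER, PRIMARY KEY)", "status (VARCHAR)"],
    "relationships": ["orders.customer_id → customers.customer_id (many-to-one)"],
    "business_notes": ["total_amount is pre-tax"],
}
print(render_table_descriptor("orders", orders_meta))
```

Generating descriptors from a single metadata source keeps the prompt in sync with the database as it evolves.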

For large schemas (100+ tables), you can’t send everything. Instead, implement dynamic schema selection. When a user asks about revenue, retrieve only tables related to orders, customers, and products. Use embedding-based retrieval or simple keyword matching. As discussed in Techniques for improving text-to-SQL, schema selection is one of the highest-impact optimization techniques.

Here’s a Python snippet for basic schema selection:

import json
from anthropic import Anthropic

schema_embeddings = {}  # Pre-computed embeddings of table descriptions
all_tables = {
    "orders": "Customer orders with dates and amounts",
    "customers": "Customer information and demographics",
    "products": "Product catalog with pricing",
    "order_items": "Line items within orders",
    "marketing_campaigns": "Marketing campaign metadata",
    # ... more tables
}

def select_relevant_schema(user_question: str, top_k: int = 5) -> dict:
    """
    Use keyword matching or embeddings to select relevant tables.
    Returns a filtered schema dict.
    """
    # Simple keyword-based approach
    question_lower = user_question.lower()
    relevant_tables = {}
    
    keywords = {
        "orders": ["order", "revenue", "purchase", "sale"],
        "customers": ["customer", "user", "account", "who"],
        "products": ["product", "item", "sku", "category"],
        "order_items": ["line item", "unit price", "quantity"],
    }
    
    for table, kws in keywords.items():
        if any(kw in question_lower for kw in kws):
            relevant_tables[table] = all_tables[table]
    
    return relevant_tables if relevant_tables else all_tables

selected = select_relevant_schema("What's our total revenue by product category?")
print(selected)  # Returns only relevant tables

This reduces token usage and improves accuracy by eliminating irrelevant schema noise.
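Keyword lists stop scaling once tables number in the hundreds. An embedding-based variant ranks tables by similarity to the question; here a toy bag-of-words vector stands in for a real embedding model, so treat `embed` as a placeholder to swap out:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' — swap in a real embedding model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

table_descriptions = {
    "orders": "customer orders with dates revenue amounts",
    "customers": "customer information and demographics",
    "marketing_campaigns": "marketing campaign metadata spend",
}

def select_by_similarity(question: str, top_k: int = 2) -> list[str]:
    """Rank tables by similarity between question and table description."""
    q = embed(question)
    ranked = sorted(
        table_descriptions,
        key=lambda t: cosine(q, embed(table_descriptions[t])),
        reverse=True,
    )
    return ranked[:top_k]

print(select_by_similarity("total revenue from orders last month"))
```

In production you would pre-compute real embeddings for each table description and only embed the incoming question at query time.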

Prompt Engineering for Text-to-SQL: The Foundation of Quality

The prompt is where theory meets practice. A poorly constructed prompt will generate plausible-looking but incorrect SQL. A well-constructed prompt guides Claude toward correct queries with minimal hallucination.

As outlined in the Text-to-SQL Prompting Guide, the key elements are:

1. System Message (Instruction) Set expectations clearly. Tell Claude it’s a SQL expert, explain the database, and define output format.

You are an expert SQL analyst. You will be given a natural language question
and a database schema. Your task is to generate a single, correct SQL query.

IMPORTANT RULES:
1. Only use tables and columns provided in the schema.
2. Always use explicit JOINs; never rely on implicit joins.
3. Filter out cancelled orders unless explicitly asked.
4. Return only the SQL query, no explanation.
5. If the question is ambiguous or impossible, respond with: "UNABLE_TO_PARSE"

2. Schema Context Include the full schema for selected tables, with business notes.

3. Few-Shot Examples Provide 2-4 example question-query pairs. This dramatically improves output quality.

EXAMPLE 1:
Question: What was total revenue in January 2024?
SQL: SELECT SUM(total_amount) FROM orders WHERE EXTRACT(YEAR FROM order_date) = 2024 AND EXTRACT(MONTH FROM order_date) = 1 AND status = 'completed';

EXAMPLE 2:
Question: Show me the top 5 customers by spending.
SQL: SELECT c.customer_id, c.name, SUM(o.total_amount) as total_spent FROM customers c JOIN orders o ON c.customer_id = o.customer_id WHERE o.status = 'completed' GROUP BY c.customer_id, c.name ORDER BY total_spent DESC LIMIT 5;

4. The User Question Place the actual user question last, after all context.

Here’s a complete prompt template:

def build_text_to_sql_prompt(user_question: str, schema: dict, examples: list) -> tuple[str, str]:
    system_msg = """You are an expert SQL analyst with deep knowledge of database design.
    You will generate correct, efficient SQL queries based on natural language questions.
    
    CRITICAL RULES:
    1. Only use tables and columns from the provided schema.
    2. Use explicit JOINs. Never use implicit joins or comma-separated table lists.
    3. Always filter for completed orders unless the question asks otherwise.
    4. Return ONLY the SQL query. No markdown, no explanation.
    5. If ambiguous or impossible, respond: UNABLE_TO_PARSE
    """
    
    schema_str = "DATABASE SCHEMA:\n"
    for table, details in schema.items():
        schema_str += f"\nTABLE: {table}\n"
        schema_str += f"  Columns: {', '.join(details['columns'])}\n"
        if 'business_notes' in details:
            schema_str += f"  Notes: {details['business_notes']}\n"
    
    examples_str = "\nEXAMPLES:\n"
    for i, ex in enumerate(examples, 1):
        examples_str += f"\nExample {i}:\n"
        examples_str += f"  Q: {ex['question']}\n"
        examples_str += f"  SQL: {ex['sql']}\n"
    
    user_msg = f"{schema_str}{examples_str}\nNow answer this question:\nQ: {user_question}"
    
    return system_msg, user_msg

system, user = build_text_to_sql_prompt(
    "What's our monthly revenue trend for 2024?",
    schema=selected_schema,
    examples=few_shot_examples
)

Prompt Caching: Reducing Latency and Cost

Prompt caching is a game-changer for text-to-SQL systems. Your schema and examples don’t change on every request, but you’re sending them with every prompt. Claude’s prompt caching feature allows you to cache the static parts of your prompt, reducing both latency and token costs.

With prompt caching, the schema and examples are stored as a reusable prefix on Anthropic’s servers. Subsequent requests that reuse the same prefix are billed at a steep discount for the cached portion—only the new user question is charged at the full input rate. The cache is ephemeral, with a short TTL that refreshes on each hit, so it pays off most for bursts of related queries.

Here’s how to implement it:

from anthropic import Anthropic

client = Anthropic()

def text_to_sql_with_caching(
    user_question: str,
    schema: dict,
    examples: list
) -> str:
    """
    Generate SQL using Claude with prompt caching.
    """
    
    # Build static content (schema + examples)
    schema_text = format_schema(schema)
    examples_text = format_examples(examples)
    
    system_instructions = """You are an expert SQL analyst. Generate correct SQL queries.
    RULES:
    1. Only use provided tables and columns.
    2. Use explicit JOINs.
    3. Filter for completed orders unless asked otherwise.
    4. Return ONLY the SQL query.
    5. If ambiguous, respond: UNABLE_TO_PARSE
    """
    
    # Use cache_control to mark schema and examples as cacheable
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1024,
        system=[
            {
                "type": "text",
                "text": system_instructions,
            },
            {
                "type": "text",
                "text": f"DATABASE SCHEMA:\n{schema_text}",
                "cache_control": {"type": "ephemeral"}
            },
            {
                "type": "text",
                "text": f"EXAMPLES:\n{examples_text}",
                "cache_control": {"type": "ephemeral"}
            }
        ],
        messages=[
            {
                "role": "user",
                "content": f"Generate SQL for: {user_question}"
            }
        ]
    )    
    
    return response.content[0].text

# First call: schema is cached
sql1 = text_to_sql_with_caching(
    "What's total revenue in Q1 2024?",
    schema, examples
)

# Second call: uses cached schema, only new question is processed
sql2 = text_to_sql_with_caching(
    "Show top 10 customers by order count.",
    schema, examples  # Same schema, so cache is reused
)

With caching, the schema and examples on the second call are read from cache at a fraction of the normal input-token price—roughly a 90% discount on the cached portion. For systems handling hundreds of queries daily, this translates to significant cost savings and lower latency.

Tool Use and Agentic Patterns: Beyond Single-Turn Generation

Simple text-to-SQL (question → SQL → result) works for straightforward queries. But real data questions are often iterative: “Show me revenue by region… actually, just the top 5 regions… and compare to last year.”

Claude’s tool use feature enables agentic patterns where Claude can call functions to execute queries, validate results, and refine its approach. This is more powerful than single-turn generation.

Here’s a tool-based architecture:

from anthropic import Anthropic
import json

client = Anthropic()

# Define tools Claude can use
tools = [
    {
        "name": "execute_sql",
        "description": "Execute a SQL query against the database and return results.",
        "input_schema": {
            "type": "object",
            "properties": {
                "sql": {
                    "type": "string",
                    "description": "The SQL query to execute"
                }
            },
            "required": ["sql"]
        }
    },
    {
        "name": "get_schema_info",
        "description": "Get detailed schema information for a table.",
        "input_schema": {
            "type": "object",
            "properties": {
                "table_name": {
                    "type": "string",
                    "description": "Name of the table"
                }
            },
            "required": ["table_name"]
        }
    },
    {
        "name": "validate_query",
        "description": "Check if a SQL query is valid without executing it.",
        "input_schema": {
            "type": "object",
            "properties": {
                "sql": {
                    "type": "string",
                    "description": "The SQL query to validate"
                }
            },
            "required": ["sql"]
        }
    }
]

def execute_sql(sql: str) -> dict:
    """Execute SQL and return results."""
    try:
        # Your database connection logic
        results = db.execute(sql)
        return {"status": "success", "rows": results, "count": len(results)}
    except Exception as e:
        return {"status": "error", "message": str(e)}

def get_schema_info(table_name: str) -> dict:
    """Return schema details for a table."""
    # Your schema lookup logic
    return schema_cache.get(table_name, {})

def validate_query(sql: str) -> dict:
    """Validate SQL syntax without executing (lightweight check via sqlparse)."""
    try:
        import sqlparse
        statements = sqlparse.parse(sql)
        if not statements or statements[0].get_type() == "UNKNOWN":
            return {"status": "invalid", "error": "Could not parse SQL"}
        # For deeper validation, run EXPLAIN against the database instead
        return {"status": "valid", "statement_type": statements[0].get_type()}
    except Exception as e:
        return {"status": "invalid", "error": str(e)}

def process_tool_call(tool_name: str, tool_input: dict) -> str:
    """Route tool calls to appropriate handlers."""
    if tool_name == "execute_sql":
        result = execute_sql(tool_input["sql"])
    elif tool_name == "get_schema_info":
        result = get_schema_info(tool_input["table_name"])
    elif tool_name == "validate_query":
        result = validate_query(tool_input["sql"])
    else:
        result = {"error": f"Unknown tool: {tool_name}"}
    
    return json.dumps(result)

def text_to_sql_with_tools(user_question: str) -> str:
    """
    Use Claude with tool use to iteratively answer questions.
    """
    messages = [
        {
            "role": "user",
            "content": f"""Answer this data question by generating and executing SQL.
            
Question: {user_question}
            
Use the tools available to:
1. First, validate that your SQL is correct.
2. Then, execute the query.
3. Interpret the results and provide a clear answer.
            """
        }
    ]
    
    # Agentic loop
    while True:
        response = client.messages.create(
            model="claude-3-5-sonnet-20241022",
            max_tokens=2048,
            tools=tools,
            messages=messages
        )
        
        # Check if Claude wants to use a tool
        if response.stop_reason == "tool_use":
            # Process tool calls
            tool_calls = [block for block in response.content if block.type == "tool_use"]
            
            # Add assistant response to messages
            messages.append({"role": "assistant", "content": response.content})
            
            # Process each tool call and add results
            tool_results = []
            for tool_call in tool_calls:
                result = process_tool_call(tool_call.name, tool_call.input)
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": tool_call.id,
                    "content": result
                })
            
            # Add tool results to messages
            messages.append({"role": "user", "content": tool_results})
        else:
            # Claude has finished (stop_reason == "end_turn")
            # Extract final text response
            final_response = next(
                (block.text for block in response.content if hasattr(block, "text")),
                None
            )
            return final_response

# Example usage
answer = text_to_sql_with_tools(
    "What's our revenue trend by region for Q1 and Q2 2024?"
)
print(answer)

This agentic approach lets Claude:

  • Validate queries before execution
  • Inspect schema details dynamically
  • Execute, evaluate, and refine queries iteratively
  • Provide context-aware explanations

As discussed in Building Production Text-to-SQL for 70000+ Tables, this architecture scales to massive schemas because Claude can selectively inspect only the tables it needs.

Query Validation and Error Handling: The Safety Layer

Not every valid SQL query is correct for your use case. A query might be syntactically perfect but semantically wrong: missing a critical filter, joining tables incorrectly, or violating business rules.

Implement a validation layer before execution:

import sqlparse
from sqlparse.sql import Identifier, IdentifierList

class QueryValidator:
    def __init__(self, allowed_tables: set, business_rules: dict):
        self.allowed_tables = allowed_tables
        self.business_rules = business_rules
    
    def validate(self, sql: str) -> tuple[bool, str]:
        """
        Validate SQL query against safety rules.
        Returns (is_valid, error_message)
        """
        parsed = sqlparse.parse(sql)[0]
        
        # Rule 1: Only allowed tables
        tables_in_query = self._extract_tables(parsed)
        if not tables_in_query.issubset(self.allowed_tables):
            invalid = tables_in_query - self.allowed_tables
            return False, f"Unauthorized tables: {invalid}"
        
        # Rule 2: Completed orders filter
        if "orders" in tables_in_query:
            if "status = 'completed'" not in sql.lower() and "status='completed'" not in sql.lower():
                return False, "Queries on 'orders' must filter for status='completed'"
        
        # Rule 3: SELECT statements only (DML/DDL keywords carry a ttype,
        # so sqlparse's get_type() is the reliable check here)
        if parsed.get_type() != "SELECT":
            return False, "Only SELECT queries are allowed"
        
        # Rule 4: Reasonable LIMIT
        if "LIMIT" not in sql.upper():
            return False, "Query must include LIMIT clause (max 10000)"
        
        return True, ""
    
    def _extract_tables(self, parsed) -> set:
        """Extract table names that directly follow FROM or a JOIN (no subqueries)."""
        tables = set()
        expect_table = False
        for token in parsed.tokens:
            if token.is_whitespace:
                continue
            if expect_table:
                if isinstance(token, IdentifierList):
                    for identifier in token.get_identifiers():
                        tables.add(identifier.get_real_name())
                elif isinstance(token, Identifier):
                    tables.add(token.get_real_name())
                expect_table = False
            elif token.ttype is sqlparse.tokens.Keyword and (
                token.value.upper() == "FROM" or token.value.upper().endswith("JOIN")
            ):
                expect_table = True
        return tables

# Usage
validator = QueryValidator(
    allowed_tables={"orders", "customers", "products"},
    business_rules={"orders": "must filter status='completed'"}
)

sql = "SELECT * FROM orders WHERE status='completed' LIMIT 100"
is_valid, error = validator.validate(sql)
if not is_valid:
    print(f"Validation failed: {error}")
else:
    result = execute_query(sql)

This layer prevents accidental data exposure, runaway queries, and business logic violations.

Integration with D23: Embedding Text-to-SQL in Analytics

D23 makes it straightforward to integrate text-to-SQL into your analytics stack. As a managed Apache Superset platform with AI and API-first capabilities, D23 provides native support for text-to-SQL workflows. You can embed this functionality directly into dashboards, enabling users to ask natural language questions and get instant results without leaving your BI interface.

The D23 platform handles the operational complexity: schema management, query caching, result formatting, and dashboard integration. This means your data engineering team focuses on business logic and data quality, not infrastructure.

For teams building embedded analytics or self-serve BI, integrating text-to-SQL through D23’s API-first architecture allows you to:

  • Expose text-to-SQL as an API endpoint
  • Embed query builders in your product
  • Cache results for performance
  • Audit all generated queries for compliance

This is particularly valuable for private equity firms standardizing analytics across portfolio companies or venture capital firms tracking portfolio performance, where consistency and governance are critical.

Real-World Challenges and Solutions

Challenge 1: Schema Drift Your database evolves. Columns are renamed, tables are added, relationships change. Your text-to-SQL system breaks because Claude references outdated schema.

Solution: Version your schema. Store schema snapshots with timestamps. Update the schema in your prompt cache when changes occur. Include a “last updated” timestamp in prompts so Claude knows the schema currency.
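A lightweight way to detect drift is to fingerprint the schema and snapshot it whenever the hash changes. A sketch—the registry structure is an assumption, not a prescribed design:

```python
import hashlib
import json
import time

def schema_fingerprint(schema: dict) -> str:
    """Stable hash of the schema; changes whenever the schema drifts."""
    canonical = json.dumps(schema, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()[:12]

class SchemaRegistry:
    """Keeps timestamped snapshots so prompts can cite schema currency."""

    def __init__(self):
        self.snapshots = []  # list of (timestamp, fingerprint, schema)

    def record(self, schema: dict) -> bool:
        """Store a snapshot if the schema changed; return True on change."""
        fp = schema_fingerprint(schema)
        if self.snapshots and self.snapshots[-1][1] == fp:
            return False
        self.snapshots.append((time.time(), fp, schema))
        return True

registry = SchemaRegistry()
registry.record({"orders": ["order_id", "status"]})
changed = registry.record({"orders": ["order_id", "status"]})  # no drift
```

When `record` returns True, invalidate your prompt cache and rebuild the schema text, stamping it with the snapshot timestamp.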

Challenge 2: Ambiguous Questions “Show me sales” could mean revenue, units sold, number of transactions, or something else entirely.

Solution: Implement clarification prompts. When Claude detects ambiguity, ask the user for clarification rather than guessing. Use tool use to present options: “Did you mean revenue (sum of order amounts) or transaction count?”
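With tool use, clarification becomes just another tool Claude can call. A hypothetical tool definition along these lines—the name `ask_clarification` and its fields are illustrative, not part of any API:

```python
# Hypothetical tool that lets Claude ask instead of guessing when a question
# is ambiguous ("sales" → revenue? units? transaction count?).
clarify_tool = {
    "name": "ask_clarification",
    "description": (
        "Ask the user a clarifying question when the request is ambiguous. "
        "Offer concrete interpretations as options."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "question": {
                "type": "string",
                "description": "The clarifying question to show the user",
            },
            "options": {
                "type": "array",
                "items": {"type": "string"},
                "description": "Candidate interpretations, e.g. 'revenue (sum of order amounts)'",
            },
        },
        "required": ["question", "options"],
    },
}
```

Append this to the `tools` list from the agentic example; when Claude calls it, surface the options to the user and feed their choice back as the tool result.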

Challenge 3: Complex Business Logic Your company calculates “revenue” as (order amount - discounts + tax) only for completed orders after a certain date, excluding specific customer segments. This is hard to encode in a prompt.

Solution: Create database views that encapsulate business logic. Instead of asking Claude to join five tables and apply filters, create a revenue_v_2024 view that does this. Now Claude just references the view.

CREATE VIEW revenue_v_2024 AS
SELECT 
    o.order_id,
    o.customer_id,
    o.order_date,
    (o.total_amount - COALESCE(d.discount_amount, 0) + o.tax) as revenue
FROM orders o
LEFT JOIN discounts d ON o.order_id = d.order_id
WHERE o.status = 'completed'
    AND o.order_date >= '2024-01-01'
    AND o.customer_id NOT IN (SELECT customer_id FROM excluded_customers);

Challenge 4: Performance at Scale As your data grows, queries Claude generates might become slow. A full table scan on a 1B-row table will timeout.

Solution: Include performance guidance in your prompt. Suggest indexes, recommend partitioning strategies, and provide execution time estimates. Use EXPLAIN ANALYZE results to guide Claude toward efficient queries.

def get_query_performance_hint(table: str) -> str:
    """
    Get performance info for a table to include in the prompt.
    get_indexes/get_partitions are placeholders for your metadata lookups;
    the shape of the EXPLAIN result varies by database driver.
    """
    explain = db.execute(f"EXPLAIN SELECT COUNT(*) FROM {table}")
    return f"""
    TABLE: {table}
    Estimated rows: {explain['rows']}
    Recommended indexes: {get_indexes(table)}
    Partitioning: {get_partitions(table)}
    """

Measuring Success: Metrics That Matter

How do you know if your text-to-SQL system is working? Track these metrics:

1. Query Success Rate Percentage of generated queries that execute without error. Target: >95% for production systems.

2. Semantic Correctness Do generated queries return the right answer? This requires manual validation or comparison against human-written queries. Target: >90%.

3. Latency Time from user question to result. With prompt caching, should be <2 seconds for most queries.

4. Cost Per Query Token usage × model pricing. Prompt caching should reduce this to <$0.01 per query for most workloads.

5. User Adoption How many users are asking questions via text-to-SQL vs. traditional BI tools? This indicates product-market fit.

6. Query Complexity Track the distribution of query types. Are users asking simple aggregations or complex multi-table joins? This guides prompt engineering improvements.
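Metrics 1 and 3 fall out of a simple per-query log. A sketch—the `QueryLog` fields are assumptions about what you record per request:

```python
from dataclasses import dataclass

@dataclass
class QueryLog:
    executed_ok: bool
    latency_s: float
    input_tokens: int
    output_tokens: int

def success_rate(logs: list[QueryLog]) -> float:
    """Metric 1: share of generated queries that execute without error."""
    return sum(q.executed_ok for q in logs) / len(logs)

def p95_latency(logs: list[QueryLog]) -> float:
    """Metric 3: 95th-percentile question-to-result latency."""
    latencies = sorted(q.latency_s for q in logs)
    return latencies[max(0, int(0.95 * len(latencies)) - 1)]

logs = [
    QueryLog(True, 1.2, 3000, 120),
    QueryLog(True, 0.8, 2800, 90),
    QueryLog(False, 2.5, 3100, 0),
]
print(f"success rate: {success_rate(logs):.0%}")
```

Cost per query (metric 4) follows from the token counts once you plug in your current model pricing.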

Advanced Techniques: Fine-Tuning and Specialized Models

For mission-critical systems, consider fine-tuning. As outlined in Text-to-SQL with Open Source Models, you can fine-tune smaller, open-source models on your schema and query patterns. This gives you:

  • Lower latency (smaller models are faster)
  • Lower cost (open-source models can run on-premise)
  • Better accuracy on your specific domain

However, fine-tuning requires a dataset of question-query pairs. Start with prompt engineering and caching. Move to fine-tuning only if you have:

  • 1,000+ validated question-query examples
  • Specific schema patterns that generic models struggle with
  • Strict latency or cost requirements

For most teams, Claude with prompt caching and tool use is the right balance of capability, cost, and simplicity.

Governance and Compliance

Text-to-SQL systems can expose sensitive data if not properly governed. Implement:

1. Query Auditing Log all generated queries. Who asked? What table? When? This enables compliance audits and helps identify abuse.

2. Row-Level Security Ensure users only see data they’re authorized to access. Embed RLS filters into views or use database-native RLS features.

3. Query Approval Workflows For sensitive queries, require human approval before execution.

4. Data Classification Mark sensitive columns (PII, financial data) and prevent text-to-SQL from accessing them without explicit permission.
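Query auditing (point 1) needs little more than an append-only structured log. A minimal sketch using JSON Lines; the field names are illustrative:

```python
import json
import time

def audit_query(user_id: str, question: str, sql: str, tables: list[str],
                log_path: str = "query_audit.jsonl") -> None:
    """Append one structured audit record per generated query (JSON Lines)."""
    record = {
        "ts": time.time(),
        "user_id": user_id,
        "question": question,
        "sql": sql,
        "tables": tables,
    }
    with open(log_path, "a") as f:
        f.write(json.dumps(record) + "\n")
```

One record per generated query—who asked, what was generated, which tables were touched—is enough to answer most compliance audits after the fact.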

Refer to D23’s Privacy Policy and Terms of Service for how managed platforms handle governance.

Building Your First Text-to-SQL System: A Checklist

  1. Schema Preparation (Week 1)

    • Document your schema with business context
    • Create views for complex business logic
    • Identify sensitive tables and implement access controls
  2. Prompt Development (Week 2-3)

    • Build system instructions
    • Create 5-10 few-shot examples
    • Test with Claude directly (use the playground)
  3. Validation Layer (Week 3-4)

    • Implement query validator
    • Define business rules
    • Test edge cases
  4. Integration (Week 4-5)

    • Build API endpoint
    • Implement prompt caching
    • Add error handling and logging
  5. Testing and Iteration (Week 5-6)

    • Collect real user questions
    • Measure success metrics
    • Refine prompts based on failures
  6. Deployment (Week 6+)

    • Roll out to pilot users
    • Monitor performance
    • Gather feedback and iterate

Most teams underestimate week 5-6. Real-world questions are messier than you expect. Plan for iteration.

Conclusion: Text-to-SQL as Strategic Capability

Text-to-SQL isn’t just a feature—it’s a shift in how teams interact with data. Instead of waiting for analysts to write queries, users ask questions and get answers. This democratizes data access and frees your engineering team to focus on data quality and infrastructure.

Implementing it well requires careful attention to schema representation, prompt engineering, validation, and governance. But the payoff is substantial: faster insights, lower support costs, and broader data literacy across your organization.

Start with prompt engineering and caching. Add tool use for iterative refinement. Implement validation to ensure safety. Measure success against real user questions, not toy examples.

For teams using Apache Superset, D23 provides a managed platform that handles the operational complexity, letting you focus on the strategic outcomes. Whether you build in-house or use a managed service, the principles in this guide apply: clear schema representation, thoughtful prompting, robust validation, and relentless iteration.

The future of data analytics is conversational. Text-to-SQL with Claude is a practical path to get there.