Guide April 18, 2026 · 21 mins · The D23 Team

Amazon Bedrock for Text-to-SQL: A Practical Implementation Guide

Learn how to build text-to-SQL with Amazon Bedrock and Claude. Step-by-step guide for engineers implementing natural language queries.

Understanding Text-to-SQL and Why It Matters

Text-to-SQL is the bridge between natural language and database queries. Instead of forcing users to learn SQL syntax, they ask questions in plain English—“How many customers signed up last month?” or “What’s our churn rate by region?”—and the system translates that into executable SQL automatically.

This capability has become central to modern analytics. Data teams spend less time writing custom queries for stakeholders. Product teams can embed AI-powered analytics directly into their applications without maintaining a separate analytics platform. And non-technical users finally get genuine self-serve access to data, not just pre-built dashboards.

Amazon Bedrock makes this practical. It’s a managed service that gives you access to foundation models—including Claude, which excels at code generation and reasoning—without running your own infrastructure. When you combine Bedrock with Claude’s strong understanding of SQL semantics and database logic, you get a system that can reliably translate natural language into accurate queries.

The challenge isn’t whether text-to-SQL is possible anymore. It’s building it reliably, handling edge cases, validating outputs before they hit your database, and integrating it into a production system where latency and cost matter. This guide walks you through exactly how to do that.

The Architecture: How Text-to-SQL Actually Works

Before you write code, you need to understand the flow. Text-to-SQL isn’t magic—it’s a sequence of well-defined steps, each one critical to accuracy and safety.

The Core Pipeline

The typical text-to-SQL pipeline has five stages:

1. Intent Recognition and Context Gathering When a user submits a natural language query, the system first needs to understand what they’re asking about. This means identifying which tables and columns are relevant, understanding temporal context (“last month” vs. “this quarter”), and recognizing any implicit business logic (churn rate is a calculated metric, not a raw column).

In a Bedrock implementation, this happens through prompt engineering and schema injection. You send Claude the user’s question along with your database schema—but you don’t send the entire schema. You send only the tables and columns relevant to answering that question. This reduces token usage, improves accuracy, and keeps the model focused.
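One practical way to handle temporal context is to resolve relative phrases server-side and inject concrete dates into the prompt, so the model never has to guess what “last month” means. A minimal sketch (the helper name and the set of phrases are assumptions, not a Bedrock feature):

```python
from datetime import date, timedelta
from typing import Optional

def resolve_temporal_context(today: Optional[date] = None) -> dict:
    """Map common relative phrases to concrete ISO dates for prompt injection."""
    today = today or date.today()
    first_of_month = today.replace(day=1)
    # The day before the first of this month is the last day of last month
    last_month_end = first_of_month - timedelta(days=1)
    quarter_start_month = 3 * ((today.month - 1) // 3) + 1
    return {
        'today': today.isoformat(),
        'last_month_start': last_month_end.replace(day=1).isoformat(),
        'last_month_end': last_month_end.isoformat(),
        'this_quarter_start': today.replace(month=quarter_start_month, day=1).isoformat(),
    }
```

Appending this dict to the prompt (“Today is 2026-04-18; last month is 2026-03-01 to 2026-03-31”) anchors relative phrases to your server’s clock instead of the model’s guess.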

2. SQL Generation Given the schema and the user’s question, Claude generates SQL. Because Claude understands programming languages deeply, it can reason about joins, aggregations, filtering, and window functions. It’s not pattern-matching; it’s actually understanding the structure of the query it’s building.

This is where Bedrock’s managed inference shines. You’re not managing model weights or inference servers. You call an API, pass your prompt, and get back generated SQL, typically in well under a couple of seconds for a query-sized completion.

3. Validation and Safety Checks Generated SQL isn’t automatically safe. It might reference non-existent columns, use incorrect join logic, or attempt to scan your entire customer table when the user only asked about one region. Validation happens in two phases:

  • Syntactic validation: Does the SQL parse correctly? Can your database even execute it?
  • Semantic validation: Does it make sense? Does it match the user’s intent? Does it access only tables the user has permission to query?
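For the syntactic check, one cheap approach is to build an empty in-memory SQLite copy of the schema and ask the engine to plan the query without running it. This is a sketch, not a substitute for validating against your real dialect: SQLite will reject Postgres-only syntax such as INTERVAL '30 days', so treat a failure here as a signal to inspect, not an automatic rejection.

```python
import sqlite3

def syntactically_valid(sql: str, schema: dict) -> tuple:
    """Cheap syntactic check: create an empty in-memory SQLite copy of the
    schema, then ask SQLite to plan (not execute) the query via EXPLAIN.
    Catches bad syntax, unknown tables, and unknown columns."""
    conn = sqlite3.connect(':memory:')
    for table, columns in schema.items():
        cols = ', '.join(f"{c['name']} {c['type']}" for c in columns)
        conn.execute(f"CREATE TABLE {table} ({cols})")
    try:
        conn.execute(f"EXPLAIN {sql}")  # plans the query, touches no data
        return True, ''
    except sqlite3.Error as e:
        return False, str(e)
    finally:
        conn.close()
```

Because the tables are empty, EXPLAIN costs microseconds and never scans real data.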

Validation typically happens with a second Bedrock call. You send Claude the generated SQL along with the original question and ask it to verify the SQL matches the intent. This catches hallucinations and logical errors before they reach your database.

4. Query Execution Once validated, the SQL executes against your actual database. This is straightforward—it’s just a database connection—but it’s where you apply row-level security, query timeouts, and resource limits. You don’t want a user’s innocent question to accidentally scan 10 billion rows.

5. Result Formatting and Explanation Raw query results aren’t always useful. A user asks “How many customers churned?” and gets back a table with 47 rows and 12 columns. Bedrock can help here too. You can ask Claude to summarize the results, highlight anomalies, or format them for presentation. This is optional but powerful for embedding analytics into products.

Each of these stages has failure modes. The model might misunderstand the schema. It might generate syntactically valid but semantically wrong SQL. It might hallucinate column names. A robust implementation anticipates these and handles them gracefully.

Setting Up Amazon Bedrock and Claude

Getting started with Bedrock is straightforward, but there are setup steps you can’t skip.

Prerequisites and Access

First, you need an AWS account with access to Bedrock. Not all AWS regions support Bedrock yet, so confirm yours does. As of this writing, Bedrock is available in us-east-1, us-west-2, eu-west-1, and several others; check the AWS Bedrock documentation for the current list.

You also need to request access to specific models. Bedrock uses a model access request system. You go to the AWS console, navigate to Bedrock, and request access to Claude. This typically takes a few minutes to a few hours. Once approved, you can use Claude’s API.

Choose your Claude model version carefully. Claude 3.5 Sonnet is the best balance of cost and capability for text-to-SQL work. It’s fast enough for sub-second latency and smart enough to handle complex schema reasoning. Claude 3 Opus is more capable but slower and more expensive; use it only if you’re hitting accuracy limits with Sonnet.

IAM Permissions

Your application needs IAM permissions to call Bedrock. The minimum permissions are:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel"
      ],
      "Resource": "arn:aws:bedrock:*::foundation-model/anthropic.claude-*"
    }
  ]
}

If you’re using Bedrock Agents (a higher-level abstraction for multi-step workflows), you’ll also need bedrock:InvokeAgent permissions. For basic text-to-SQL, InvokeModel is sufficient.

Cost Considerations

Bedrock pricing is per-token—input tokens and output tokens are billed separately. Claude 3.5 Sonnet costs roughly $0.003 per 1,000 input tokens and $0.015 per 1,000 output tokens. A typical text-to-SQL call sends 1,000–3,000 input tokens (your schema plus the question) and gets back 100–300 output tokens (the SQL). That works out to roughly $0.005–$0.014 per query.

At scale, this adds up. If you’re running 10,000 queries per day, you’re looking at roughly $45–$135 daily. For comparison, a Looker or Tableau license costs $500–$2,000 per user per year, but those are fixed costs. Bedrock is variable—more queries, more cost.

Optimize by caching schemas, reusing prompts, and filtering schema before sending it to Bedrock. Only include tables and columns relevant to the user’s question. If your database has 500 tables, don’t send all 500 to Claude. Send the 5 that matter.
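For capacity planning, the per-query arithmetic is easy to fold into a helper. A minimal sketch using the Claude 3.5 Sonnet rates quoted above (pricing changes over time, so treat the defaults as illustrative):

```python
def estimate_query_cost(input_tokens: int, output_tokens: int,
                        input_rate: float = 0.003,
                        output_rate: float = 0.015) -> float:
    """Estimate per-query Bedrock cost in USD.

    Rates are per 1,000 tokens, defaulting to the Claude 3.5 Sonnet
    prices quoted above. Check current Bedrock pricing before relying
    on the defaults.
    """
    return input_tokens / 1000 * input_rate + output_tokens / 1000 * output_rate
```

A typical call (2,000 input tokens, 200 output tokens) comes out to about a cent; multiply by daily query volume to budget.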

Building Your First Text-to-SQL Implementation

Now let’s build something concrete. Here’s a minimal implementation in Python using the Bedrock API directly.

The Basic Flow

import boto3
import json
from datetime import datetime

bedrock = boto3.client('bedrock-runtime', region_name='us-east-1')

def generate_sql(user_question: str, schema: dict) -> str:
    """
    Convert a natural language question to SQL using Bedrock + Claude.
    
    Args:
        user_question: The user's natural language query
        schema: A dict describing tables and columns
    
    Returns:
        Generated SQL string
    """
    
    schema_text = format_schema_for_prompt(schema)
    
    prompt = f"""You are a SQL expert. Given the following database schema and a user question, generate accurate SQL.

Database Schema:
{schema_text}

User Question: {user_question}

Generate only the SQL query, nothing else. No markdown, no explanation, just the SQL."""
    
    response = bedrock.invoke_model(
        modelId='anthropic.claude-3-5-sonnet-20241022-v2:0',
        body=json.dumps({
            'anthropic_version': 'bedrock-2023-05-31',
            'max_tokens': 1024,
            'messages': [
                {
                    'role': 'user',
                    'content': prompt
                }
            ]
        })
    )
    
    result = json.loads(response['body'].read())
    generated_sql = result['content'][0]['text'].strip()
    
    return generated_sql

def format_schema_for_prompt(schema: dict) -> str:
    """
    Format database schema into readable text for the prompt.
    """
    output = []
    for table_name, columns in schema.items():
        col_defs = ', '.join([f"{col['name']} ({col['type']})" for col in columns])
        output.append(f"Table: {table_name}\n  Columns: {col_defs}")
    return '\n'.join(output)

# Example usage
schema = {
    'customers': [
        {'name': 'id', 'type': 'INT'},
        {'name': 'name', 'type': 'VARCHAR'},
        {'name': 'email', 'type': 'VARCHAR'},
        {'name': 'signup_date', 'type': 'DATE'},
        {'name': 'status', 'type': 'VARCHAR'}
    ],
    'orders': [
        {'name': 'id', 'type': 'INT'},
        {'name': 'customer_id', 'type': 'INT'},
        {'name': 'amount', 'type': 'DECIMAL'},
        {'name': 'order_date', 'type': 'DATE'}
    ]
}

question = "How many customers signed up in the last 30 days?"
sql = generate_sql(question, schema)
print(sql)

This generates something like:

SELECT COUNT(*) as new_customers
FROM customers
WHERE signup_date >= CURRENT_DATE - INTERVAL '30 days'

It’s not perfect every time, but it works. The key insight: Claude understands SQL well enough to generate reasonable queries from schema and intent.

Adding Validation

Generated SQL needs validation. Here’s how to add a second Bedrock call that validates the output:

def validate_sql(user_question: str, generated_sql: str, schema: dict) -> dict:
    """
    Validate that generated SQL matches user intent.
    Returns {'valid': bool, 'explanation': str}
    """
    
    schema_text = format_schema_for_prompt(schema)
    
    prompt = f"""You are a SQL validator. Review the following:

Original Question: {user_question}

Generated SQL: {generated_sql}

Database Schema:
{schema_text}

Does the SQL correctly answer the question? Respond with only:
VALID: yes/no
REASON: brief explanation"""
    
    response = bedrock.invoke_model(
        modelId='anthropic.claude-3-5-sonnet-20241022-v2:0',
        body=json.dumps({
            'anthropic_version': 'bedrock-2023-05-31',
            'max_tokens': 256,
            'messages': [
                {
                    'role': 'user',
                    'content': prompt
                }
            ]
        })
    )
    
    result = json.loads(response['body'].read())
    validation_text = result['content'][0]['text'].strip()
    
    # Parse the response
    lines = validation_text.split('\n')
    is_valid = 'yes' in lines[0].lower()
    reason = lines[1] if len(lines) > 1 else ''
    
    return {'valid': is_valid, 'reason': reason}

This catches hallucinations and logical errors before they reach your database. If validation fails, you can either regenerate the SQL or return an error to the user.

Advanced Patterns and Optimization

Once you have the basics working, there are several patterns that make text-to-SQL production-grade.

Schema Pruning and Relevance

Don’t send your entire database schema to Bedrock. Instead, use embeddings or a classifier to identify which tables are relevant to the user’s question. This reduces token usage and improves accuracy.

One approach: embed your table and column descriptions using a small embedding model (like Titan Embeddings, also available through Bedrock). When a user asks a question, embed the question and find the most similar tables. Send only those.

from sentence_transformers import SentenceTransformer
import numpy as np

def find_relevant_tables(question: str, schema: dict, embeddings_model) -> dict:
    """
    Find tables relevant to the user's question using embeddings.
    (This example uses sentence-transformers locally; Titan Embeddings
    via Bedrock works the same way.)
    """
    # Normalize so the dot product is cosine similarity in [-1, 1];
    # a fixed threshold is meaningless on raw, unnormalized embeddings
    question_embedding = embeddings_model.encode(question, normalize_embeddings=True)
    
    relevant = {}
    for table_name, columns in schema.items():
        table_description = f"{table_name}: {', '.join(c['name'] for c in columns)}"
        table_embedding = embeddings_model.encode(table_description, normalize_embeddings=True)
        
        similarity = float(np.dot(question_embedding, table_embedding))
        if similarity > 0.3:  # Threshold; tune against your own schema
            relevant[table_name] = columns
    
    return relevant

This reduces token usage by 50–80% on large schemas and makes Claude’s SQL generation more accurate because it’s not distracted by irrelevant tables.

Handling Ambiguity with Clarification

Sometimes a question is genuinely ambiguous. “Show me sales by region”—does that mean total sales, average sales, or sales trend? Claude can help here too.

def clarify_ambiguous_question(question: str, schema: dict) -> str:
    """
    If a question is ambiguous, ask Claude to clarify.
    """
    schema_text = format_schema_for_prompt(schema)
    
    prompt = f"""Is this question unambiguous enough to generate SQL from?

Question: {question}

Schema:
{schema_text}

If ambiguous, respond with:
AMBIGUOUS: yes/no
CLARIFICATION: what you need to know

If unambiguous, respond:
AMBIGUOUS: no"""
    
    response = bedrock.invoke_model(
        modelId='anthropic.claude-3-5-sonnet-20241022-v2:0',
        body=json.dumps({
            'anthropic_version': 'bedrock-2023-05-31',
            'max_tokens': 256,
            'messages': [{'role': 'user', 'content': prompt}]
        })
    )
    
    result = json.loads(response['body'].read())
    return result['content'][0]['text'].strip()

If Claude identifies ambiguity, you can prompt the user for clarification before generating SQL. This prevents bad queries and improves user experience.
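Because the model is only asked to follow the AMBIGUOUS/CLARIFICATION format, the response still needs defensive parsing. A small sketch that defaults to treating unparseable output as ambiguous (the safe direction):

```python
def parse_ambiguity_response(text: str) -> dict:
    """Parse the AMBIGUOUS/CLARIFICATION format requested in the prompt.

    Model output is not guaranteed to follow the format, so anything
    unparseable defaults to ambiguous rather than silently proceeding.
    """
    result = {'ambiguous': True, 'clarification': ''}
    for line in text.strip().splitlines():
        if line.upper().startswith('AMBIGUOUS:'):
            result['ambiguous'] = 'yes' in line.lower()
        elif line.upper().startswith('CLARIFICATION:'):
            result['clarification'] = line.split(':', 1)[1].strip()
    return result
```

If `ambiguous` comes back True, surface `clarification` to the user instead of generating SQL.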

Caching for Performance

Bedrock supports prompt caching. If you’re running the same schema against many questions, cache the schema in your prompt. This reduces latency and cost.

def generate_sql_with_cache(question: str, schema: dict) -> str:
    """
    Generate SQL with prompt caching for schema.
    """
    schema_text = format_schema_for_prompt(schema)
    
    response = bedrock.invoke_model(
        modelId='anthropic.claude-3-5-sonnet-20241022-v2:0',
        body=json.dumps({
            'anthropic_version': 'bedrock-2023-05-31',
            'max_tokens': 1024,
            'system': [
                {
                    'type': 'text',
                    'text': f"You are a SQL expert. Here is the database schema:\n\n{schema_text}",
                    'cache_control': {'type': 'ephemeral'}
                }
            ],
            'messages': [
                {
                    'role': 'user',
                    'content': f"Generate SQL for: {question}"
                }
            ]
        })
    )
    
    result = json.loads(response['body'].read())
    return result['content'][0]['text'].strip()

With prompt caching, the schema is cached on Bedrock’s servers for about five minutes. Subsequent requests reuse the cached prefix, cutting input token cost for the cached portion by roughly 90% and noticeably reducing latency.

Integrating Text-to-SQL with Analytics Platforms

Text-to-SQL is most powerful when integrated into a larger analytics system. This is where D23’s managed Apache Superset becomes relevant. Superset is built for embedding analytics and self-serve BI, and text-to-SQL fits naturally into that workflow.

When you’re building text-to-SQL, you’re solving the “query generation” problem. But you still need to handle visualization, caching, permissions, and multi-step exploration. That’s where a proper BI platform comes in.

Architecture: Text-to-SQL + Superset

A typical integration looks like:

  1. User asks a natural language question in your application or Superset’s interface
  2. Your text-to-SQL service (running Bedrock + Claude) generates SQL
  3. The SQL executes against your database
  4. Results flow to Superset for visualization and caching
  5. Users explore further using Superset’s native query builder or ask another natural language question

Superset handles the heavy lifting: caching query results, managing permissions, formatting visualizations, and letting users drill down without regenerating SQL. Your Bedrock implementation handles the natural language translation.
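The five-step flow above can be sketched as a thin orchestrator. Every callable here is a placeholder you would wire to your own Bedrock client, database connection, and Superset API; the names are illustrative, not a real SDK:

```python
from typing import Callable

def answer_question(question: str,
                    generate: Callable[[str], str],
                    validate: Callable[[str, str], bool],
                    execute: Callable[[str], list],
                    publish: Callable[[str, list], str]) -> dict:
    """Thin orchestrator for the integration flow above.

    Each step is injected so the Bedrock, database, and Superset pieces
    stay independently testable and swappable.
    """
    sql = generate(question)                # step 2: Bedrock + Claude
    if not validate(question, sql):         # step 3: validation pass
        return {'status': 'rejected', 'sql': sql}
    rows = execute(sql)                     # step 4: your database
    chart_url = publish(sql, rows)          # step 5: hand off for visualization
    return {'status': 'ok', 'sql': sql, 'rows': rows, 'chart_url': chart_url}
```

Dependency injection here is deliberate: in tests you stub each step with a lambda, and in production you plug in the real clients without changing the flow.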

API-First Design

To integrate text-to-SQL with Superset or any analytics platform, expose it via API. Here’s a minimal Flask example:

from flask import Flask, request, jsonify
import json

app = Flask(__name__)

@app.route('/api/text-to-sql', methods=['POST'])
def text_to_sql():
    data = request.json
    question = data.get('question')
    schema = data.get('schema')  # Or fetch from database
    
    if not question:
        return jsonify({'error': 'No question provided'}), 400
    
    try:
        sql = generate_sql(question, schema)
        validation = validate_sql(question, sql, schema)
        
        if not validation['valid']:
            return jsonify({
                'error': 'Generated SQL is invalid',
                'reason': validation['reason']
            }), 400
        
        return jsonify({
            'sql': sql,
            'question': question,
            'status': 'success'
        })
    except Exception as e:
        return jsonify({'error': str(e)}), 500

if __name__ == '__main__':
    app.run(debug=False, port=5000)

This API can be called from Superset, your application, or any other system. You could also integrate it with D23’s API-first approach to embed text-to-SQL directly into dashboards.

Handling Edge Cases and Failure Modes

Production text-to-SQL systems fail in predictable ways. Here’s how to handle them.

Hallucinated Columns

Claude sometimes invents column names that don’t exist. Prevent this by validating column references against your actual schema:

import re

# Keywords and functions that appear in SQL but aren't schema identifiers
SQL_KEYWORDS = {
    'select', 'from', 'where', 'group', 'by', 'order', 'having', 'as',
    'and', 'or', 'not', 'in', 'on', 'join', 'inner', 'left', 'right',
    'outer', 'count', 'sum', 'avg', 'min', 'max', 'distinct', 'limit',
    'between', 'like', 'is', 'null', 'asc', 'desc', 'case', 'when',
    'then', 'else', 'end', 'interval', 'current_date', 'current_timestamp'
}

def validate_columns_exist(sql: str, schema: dict) -> tuple[bool, list]:
    """
    Check that every identifier in the SQL is a known table, column,
    or SQL keyword. Returns (is_valid, unknown_identifiers).
    Note: column aliases will also be flagged; a real SQL parser is
    more robust than this regex approach.
    """
    # Extract all identifiers from the SQL (simple regex approach)
    identifiers = re.findall(r'\b([a-zA-Z_][a-zA-Z0-9_]*)\b', sql)
    
    known = {col['name'] for table_columns in schema.values() for col in table_columns}
    known.update(schema.keys())  # table names are valid identifiers too
    
    unknown = [ident for ident in identifiers
               if ident.lower() not in SQL_KEYWORDS and ident not in known]
    
    return len(unknown) == 0, unknown

If missing columns are found, regenerate the SQL or ask Claude to fix it.
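One way to “ask Claude to fix it” is to feed the unknown identifiers back in a follow-up prompt. A minimal sketch of the prompt builder (the wording and format are assumptions; tune them for your setup):

```python
def build_repair_prompt(sql: str, missing: list, schema_text: str) -> str:
    """Build a follow-up prompt asking the model to rewrite its SQL
    using only columns that actually exist in the schema."""
    return (
        "The following SQL references identifiers that do not exist: "
        + ", ".join(missing) + "\n\n"
        + f"SQL:\n{sql}\n\n"
        + f"Valid schema:\n{schema_text}\n\n"
        + "Rewrite the query using only tables and columns from the schema. "
        + "Return only the corrected SQL."
    )
```

Send the result as a second user message in the same conversation so the model sees its original attempt alongside the correction request.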

Query Timeout and Resource Limits

A user might ask an innocent question that results in a table scan of billions of rows. Protect your database:

import time
from contextlib import contextmanager

@contextmanager
def limited_query_execution(connection, timeout_seconds=30):
    """
    Execute a query with a server-side timeout (PostgreSQL syntax shown;
    other databases use different settings).
    """
    cursor = connection.cursor()
    try:
        cursor.execute(f"SET statement_timeout = {timeout_seconds * 1000}")
        yield connection
    finally:
        cursor.execute("RESET statement_timeout")
        cursor.close()

def execute_generated_sql(sql: str, connection, timeout=30) -> dict:
    """
    Safely execute generated SQL with protections.
    """
    try:
        with limited_query_execution(connection, timeout):
            cursor = connection.cursor()
            start = time.time()
            cursor.execute(sql)
            rows = cursor.fetchall()
            elapsed = time.time() - start
            
            return {
                'success': True,
                'rows': rows,
                'elapsed_seconds': elapsed,
                'row_count': len(rows)
            }
    except Exception as e:
        return {
            'success': False,
            'error': str(e)
        }

Set reasonable timeouts (30 seconds is typical) and limits on result rows (1000 rows is a good default).
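Row limits can be enforced mechanically before execution rather than trusting the model to include them. A naive sketch that appends a LIMIT when one isn’t already present (a real SQL parser handles subqueries and CTEs better than this regex):

```python
import re

def enforce_row_limit(sql: str, max_rows: int = 1000) -> str:
    """Append a LIMIT clause if the generated SQL doesn't end with one.

    Naive regex check: adequate for single-statement SELECTs, but a
    proper SQL parser is more robust for complex queries.
    """
    stripped = sql.rstrip().rstrip(';')
    if re.search(r'\bLIMIT\s+\d+\s*$', stripped, re.IGNORECASE):
        return stripped  # already limited; leave it alone
    return f"{stripped} LIMIT {max_rows}"
```

Run this after validation and immediately before execution so the cap applies regardless of what the model generated.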

Permission and Row-Level Security

You can’t let every user query every table. Implement row-level security by filtering the schema based on user permissions:

def get_user_schema(user_id: str, schema: dict) -> dict:
    """
    Return only the schema the user has permission to access.
    """
    # Fetch user permissions from your system
    user_permissions = get_user_permissions(user_id)
    
    filtered_schema = {}
    for table_name, columns in schema.items():
        if user_permissions.get(f'table:{table_name}', False):
            filtered_schema[table_name] = columns
    
    return filtered_schema

When generating SQL for a user, use their filtered schema. This ensures they can only query what they’re allowed to see.

Real-World Implementation: A Complete Example

Let’s build a more complete example that ties everything together. This is a minimal but production-ready text-to-SQL service.

import boto3
import json
import logging
from typing import Optional, Dict, List
from datetime import datetime

logger = logging.getLogger(__name__)

class TextToSQLService:
    def __init__(self, region='us-east-1', model_id='anthropic.claude-3-5-sonnet-20241022-v2:0'):
        self.bedrock = boto3.client('bedrock-runtime', region_name=region)
        self.model_id = model_id
    
    def generate_and_validate_sql(
        self,
        question: str,
        schema: Dict,
        max_retries: int = 2
    ) -> Dict:
        """
        Generate SQL and validate it. Retry on validation failure.
        """
        for attempt in range(max_retries):
            try:
                # Generate SQL
                sql = self._generate_sql(question, schema)
                logger.info(f"Generated SQL (attempt {attempt + 1}): {sql}")
                
                # Validate
                validation = self._validate_sql(question, sql, schema)
                if validation['valid']:
                    return {
                        'success': True,
                        'sql': sql,
                        'question': question,
                        'attempt': attempt + 1
                    }
                else:
                    logger.warning(f"Validation failed: {validation['reason']}")
                    if attempt < max_retries - 1:
                        # Try again with more explicit instructions
                        schema = self._get_relevant_schema(question, schema)
            except Exception as e:
                logger.error(f"Error on attempt {attempt + 1}: {str(e)}")
                if attempt == max_retries - 1:
                    return {'success': False, 'error': str(e)}
        
        return {
            'success': False,
            'error': 'Failed to generate valid SQL after retries'
        }
    
    def _generate_sql(self, question: str, schema: Dict) -> str:
        schema_text = self._format_schema(schema)
        prompt = f"""You are a SQL expert. Generate a SQL query that answers the following question.

Database Schema:
{schema_text}

Question: {question}

Return ONLY the SQL query, no explanation or markdown."""
        
        response = self.bedrock.invoke_model(
            modelId=self.model_id,
            body=json.dumps({
                'anthropic_version': 'bedrock-2023-05-31',
                'max_tokens': 1024,
                'messages': [{'role': 'user', 'content': prompt}]
            })
        )
        
        result = json.loads(response['body'].read())
        return result['content'][0]['text'].strip()
    
    def _validate_sql(self, question: str, sql: str, schema: Dict) -> Dict:
        schema_text = self._format_schema(schema)
        prompt = f"""Validate this SQL query.

Original Question: {question}
Generated SQL: {sql}

Database Schema:
{schema_text}

Respond with:
VALID: yes or no
REASON: explanation"""
        
        response = self.bedrock.invoke_model(
            modelId=self.model_id,
            body=json.dumps({
                'anthropic_version': 'bedrock-2023-05-31',
                'max_tokens': 256,
                'messages': [{'role': 'user', 'content': prompt}]
            })
        )
        
        result = json.loads(response['body'].read())
        text = result['content'][0]['text'].strip()
        
        is_valid = 'yes' in text.lower().split('\n')[0]
        reason = text.split('\n')[1] if '\n' in text else ''
        
        return {'valid': is_valid, 'reason': reason}
    
    def _format_schema(self, schema: Dict) -> str:
        lines = []
        for table, columns in schema.items():
            col_str = ', '.join([f"{c['name']} ({c['type']})" for c in columns])
            lines.append(f"Table: {table}\n  Columns: {col_str}")
        return '\n'.join(lines)
    
    def _get_relevant_schema(self, question: str, schema: Dict) -> Dict:
        """Filter schema to only relevant tables."""
        # Simple keyword matching (could use embeddings for better results)
        keywords = set(question.lower().split())
        relevant = {}
        for table, columns in schema.items():
            if any(keyword in table.lower() for keyword in keywords):
                relevant[table] = columns
        return relevant if relevant else schema

# Usage
service = TextToSQLService()

schema = {
    'customers': [
        {'name': 'id', 'type': 'INT'},
        {'name': 'name', 'type': 'VARCHAR'},
        {'name': 'signup_date', 'type': 'DATE'},
        {'name': 'status', 'type': 'VARCHAR'}
    ],
    'orders': [
        {'name': 'id', 'type': 'INT'},
        {'name': 'customer_id', 'type': 'INT'},
        {'name': 'amount', 'type': 'DECIMAL'},
        {'name': 'order_date', 'type': 'DATE'}
    ]
}

result = service.generate_and_validate_sql(
    "How many customers signed up last month?",
    schema
)

if result['success']:
    print(f"Generated SQL: {result['sql']}")
else:
    print(f"Error: {result['error']}")

This service handles generation, validation, schema filtering, and retry logic. It’s the foundation for a production text-to-SQL system.

Deploying and Monitoring Text-to-SQL

Once your text-to-SQL service works locally, you need to deploy it and monitor its behavior.

Deployment Options

AWS Lambda: For low-volume, serverless deployment. Bedrock calls are fast, so Lambda’s cold start isn’t a major issue. You can wrap your service in a Lambda handler and deploy it via API Gateway.

ECS/Fargate: For higher volume or if you need persistent connections to your database. Container-based deployment gives you more control over resource allocation and scaling.

EC2 with Auto Scaling: For very high volume. You can run multiple instances of your text-to-SQL service behind a load balancer.

For most teams, Lambda or Fargate is the right choice. Bedrock handles the scaling, so you only need to scale your application layer.

Monitoring and Observability

Track these metrics:

  • Query generation latency: How long does it take to generate SQL? Target: <1 second
  • Validation success rate: What percentage of generated SQL passes validation? Target: >95%
  • First-pass success rate: What percentage of SQL works on the first attempt without retry? Target: >90%
  • Token usage: How many tokens per query? This directly impacts cost
  • Database query latency: How long do generated queries take to execute?
  • Error rates: How often does the service fail completely?

Log all of these to CloudWatch or your monitoring system. Set up alarms for error rates and latency spikes.

import json
import time
import logging
from datetime import datetime

class MonitoredTextToSQLService(TextToSQLService):
    def generate_and_validate_sql(self, question, schema, max_retries=2):
        start_time = time.time()
        result = super().generate_and_validate_sql(question, schema, max_retries)
        elapsed = time.time() - start_time
        
        # Log metrics
        logger.info(json.dumps({
            'timestamp': datetime.utcnow().isoformat(),
            'question': question,
            'success': result.get('success'),
            'elapsed_seconds': elapsed,
            'attempt': result.get('attempt', -1),
            'error': result.get('error')
        }))
        
        return result

These logs become your debugging tool. When a user reports bad SQL, you can trace exactly what happened.

Comparison with Alternatives

Text-to-SQL isn’t the only way to enable natural language analytics. How does it compare?

Text-to-SQL vs. Pre-Built Dashboards

Pre-built dashboards (what Looker, Tableau, and Power BI focus on) are static. They answer specific questions well but can’t handle ad-hoc exploration. Text-to-SQL enables true self-serve: any question, answered instantly.

The tradeoff: text-to-SQL requires more setup and carries risk (bad SQL, security issues). Dashboards are safer but less flexible.

Text-to-SQL vs. Search-Based Analytics

Some platforms (like Metabase) use search to find pre-computed metrics and dashboards. This is safer than text-to-SQL but less powerful. It only works if someone has already built the metric you’re looking for.

Text-to-SQL generates new queries on the fly, enabling truly novel questions.

Text-to-SQL vs. Natural Language Interfaces in BI Tools

Tableau and Looker have added natural language features. But these are built on top of their proprietary platforms. You’re locked into their ecosystem.

With Bedrock + Claude, you control the entire stack. You can customize prompts, integrate with your own databases, and avoid vendor lock-in. This is especially valuable for teams building embedded analytics where flexibility matters.

Best Practices and Lessons Learned

After building several text-to-SQL systems, here’s what actually works:

1. Start Simple, Add Complexity Gradually

Begin with basic SQL generation. Add validation, schema pruning, and caching only when you hit specific problems. Premature optimization wastes time.

2. Invest in Prompt Engineering

Your prompts matter more than your code. Spend time crafting clear, specific instructions for Claude. Include examples of good SQL. Specify the exact format you want (just SQL, no markdown). Small prompt changes often have big impact on quality.

3. Validate Aggressively

Don’t trust generated SQL. Validate syntax, validate that it matches intent, validate that it doesn’t access tables the user shouldn’t see. Validation catches 90% of problems before they reach your database.

4. Cache Schema, Not Results

Cache your database schema in Bedrock prompts (using prompt caching). Don’t cache query results at the text-to-SQL layer—that’s Superset’s job. This keeps your system simple and avoids stale data issues.

5. Limit Result Sets

Always limit the number of rows returned (e.g., 1000 max). Users don’t need 10 million rows; they need the answer to their question. Limiting rows also improves performance.

6. Handle Ambiguity Explicitly

If a question is ambiguous, ask for clarification rather than guessing. It’s better to ask “Do you mean revenue or profit?” than to return the wrong answer.

7. Monitor and Iterate

Track which questions fail and why. Use that data to improve your prompts, schema filtering, and validation logic. Text-to-SQL is an iterative process.

Integrating with D23 and Apache Superset

If you’re building text-to-SQL, you’re probably also thinking about how to surface those results to users. This is where a proper analytics platform comes in.

D23 is built on Apache Superset, the open-source BI platform that excels at self-serve analytics and embedded dashboards. When you integrate text-to-SQL with Superset, you get:

  • Query caching: Superset caches results, so repeated questions are instant
  • Visualization: Results are automatically formatted into charts and tables
  • Exploration: Users can drill down, filter, and explore without writing more SQL
  • Permissions: Row-level security and table-level access control
  • Sharing: Users can save queries as dashboards and share them

Superset’s API-first architecture makes integration straightforward. You can call text-to-SQL from Superset’s SQL Lab, or embed it in your own application and use Superset’s visualization layer.

For teams at scale-ups and mid-market companies, this combination—Bedrock for text-to-SQL, Superset for visualization and caching—provides the flexibility of custom analytics without the overhead of maintaining a full platform.

Conclusion: Text-to-SQL Is Practical Now

Text-to-SQL used to be research. Now it’s practical engineering. Amazon Bedrock and Claude make it accessible: no model training, no infrastructure, just an API call.

Building a production system still requires work. You need validation, schema management, permission handling, and monitoring. But that work is straightforward engineering, not research.

Start with the basic implementation in this guide. Get it working for your schema. Add validation and schema filtering. Deploy it. Monitor it. Iterate based on what you learn.

The result is a system where your users can ask any question in plain English and get an answer instantly. That’s powerful. That’s what modern analytics should be.

For teams embedding analytics into products or building self-serve BI platforms, text-to-SQL is a game-changer. Combined with D23’s managed Superset platform, you can build analytics experiences that rival Looker or Tableau without the cost or complexity.

The technical foundation is here. The patterns are proven. The tools are available. The only thing left is to build.