How to Build an Analytics MCP Server in 200 Lines of Python
Learn to build a production-ready analytics MCP server in Python. Connect data warehouses, enable text-to-SQL, and integrate with AI agents in minimal code.
What Is an MCP Server and Why It Matters for Analytics
Model Context Protocol (MCP) servers are a relatively new but rapidly evolving standard for connecting AI agents and applications to external data sources and tools. Think of an MCP server as a standardized bridge between your analytics infrastructure and any AI-powered application that needs to query, explore, or report on data.
Traditionally, if you wanted an AI agent to run a SQL query against your data warehouse, you’d need to write custom API wrappers, manage authentication, handle rate limiting, and maintain versioning across multiple integrations. MCP abstracts all of that friction. It’s a protocol, not a platform—meaning you can build once and connect to any MCP-compatible client, whether that’s Claude, an internal chatbot, or a custom analytics dashboard.
For analytics teams specifically, an MCP server unlocks several concrete benefits:
- Text-to-SQL without vendor lock-in: Users can ask natural language questions about data, and the server translates them to SQL queries without needing Looker, Tableau, or proprietary query builders.
- Embedded analytics at scale: Product teams can give end-users self-serve analytics without shipping a full BI platform. The MCP server handles the data layer; your frontend handles presentation.
- AI-assisted exploration: Data analysts can leverage LLM reasoning to explore schemas, suggest metrics, and debug queries faster than manual SQL writing.
- Standardized integrations: Once your server speaks MCP, it works with any MCP client—no custom connectors per tool.
This is especially valuable for teams already using Apache Superset or evaluating managed open-source BI as an alternative to proprietary platforms. An MCP server lets you extend Superset’s capabilities programmatically and integrate it into broader AI workflows.
The MCP Protocol Basics: Tools, Resources, and Prompts
Before writing code, you need to understand the three core primitives MCP exposes:
Tools
Tools are callable functions that an MCP client can invoke. In an analytics context, a tool might be “run_query”, “list_tables”, or “get_schema”. The server defines what tools are available, their parameters, and what they return. The client (e.g., Claude) decides when and how to call them based on user intent.
Resources
Resources are read-only data that the server makes available to clients. For analytics, resources could be table schemas, column metadata, or saved query definitions. Unlike tools, resources don’t perform side effects—they’re informational.
Prompts
Prompts are templates or instructions that guide the client’s behavior. An analytics server might define a prompt like “You are a data analyst. When asked questions about our warehouse, first check the schema resource, then use the run_query tool.” This helps ensure consistent, correct usage patterns.
For a minimal analytics MCP server, you’ll typically focus on tools (the query execution and metadata retrieval) and resources (schema definitions). Prompts are optional but useful for steering AI behavior.
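These primitives ride on a simple wire format: MCP messages are JSON-RPC 2.0. A `tools/list` exchange looks roughly like the following (a simplified sketch; real responses also carry protocol and capability metadata):

```python
import json

# A client asks the server which tools it offers (JSON-RPC 2.0 request)
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/list",
    "params": {},
}

# A simplified server response advertising one analytics tool
response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "tools": [
            {
                "name": "run_query",
                "description": "Execute a SQL query against the warehouse.",
                "inputSchema": {
                    "type": "object",
                    "properties": {"sql": {"type": "string"}},
                    "required": ["sql"],
                },
            }
        ]
    },
}

# Messages travel as newline-delimited JSON over stdio (or over HTTP)
wire = json.dumps(request)
print(wire)
```

The SDK handles all of this framing for you; you only ever write the tool definitions and handlers.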
Setting Up Your Python Environment
To build an analytics MCP server, you’ll need the official MCP SDK for Python. Start with a clean virtual environment:
python -m venv mcp-analytics-env
source mcp-analytics-env/bin/activate # On Windows: mcp-analytics-env\Scripts\activate
pip install mcp
You’ll also need a database driver. For this example, we’ll use DuckDB (an in-process SQL database perfect for demos) and PostgreSQL support:
pip install duckdb psycopg2-binary
If you’re connecting to Snowflake, BigQuery, or another warehouse, install the appropriate driver:
# For Snowflake
pip install snowflake-connector-python
# For BigQuery
pip install google-cloud-bigquery
# For Redshift
pip install redshift-connector
You’ll also want to install a JSON schema validator and any LLM libraries if you plan to add text-to-SQL capabilities:
pip install jsonschema pydantic
Once your environment is ready, verify the MCP SDK is installed:
python -c "from importlib.metadata import version; print(version('mcp'))"
Building Your First Analytics MCP Server: A Complete Example
Here’s a fully functional 200-line analytics MCP server that connects to a data warehouse, exposes schema metadata, and allows clients to run SQL queries:
import asyncio
import json
import logging

import duckdb
from mcp.server import Server
from mcp.server.stdio import stdio_server
from mcp.types import Tool, TextContent

# Configure logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# Initialize the MCP server
server = Server("analytics-server")

# Database connection (using DuckDB for simplicity; swap for your warehouse)
db = duckdb.connect(":memory:")

# Sample data setup
def setup_sample_data():
    """Create sample tables for demonstration."""
    db.execute("""
        CREATE TABLE customers AS
        SELECT 1 AS customer_id, 'Alice' AS name, 'alice@example.com' AS email, 5000 AS ltv
        UNION ALL SELECT 2, 'Bob', 'bob@example.com', 12000
        UNION ALL SELECT 3, 'Carol', 'carol@example.com', 8500
    """)
    db.execute("""
        CREATE TABLE orders AS
        SELECT 101 AS order_id, 1 AS customer_id, '2024-01-15' AS order_date, 250 AS amount
        UNION ALL SELECT 102, 1, '2024-02-10', 180
        UNION ALL SELECT 103, 2, '2024-01-20', 450
        UNION ALL SELECT 104, 3, '2024-03-05', 320
    """)

setup_sample_data()

# Define tools
@server.list_tools()
async def list_tools() -> list[Tool]:
    """Expose available tools to MCP clients."""
    return [
        Tool(
            name="list_tables",
            description="List all available tables in the warehouse.",
            inputSchema={"type": "object", "properties": {}, "required": []}
        ),
        Tool(
            name="get_schema",
            description="Get the schema (columns and types) for a specific table.",
            inputSchema={
                "type": "object",
                "properties": {
                    "table_name": {
                        "type": "string",
                        "description": "Name of the table to inspect."
                    }
                },
                "required": ["table_name"]
            }
        ),
        Tool(
            name="run_query",
            description="Execute a SQL query against the warehouse. Returns results as JSON.",
            inputSchema={
                "type": "object",
                "properties": {
                    "sql": {"type": "string", "description": "SQL query to execute."}
                },
                "required": ["sql"]
            }
        )
    ]

# Implement tool handlers
@server.call_tool()
async def call_tool(name: str, arguments: dict) -> list[TextContent]:
    """Handle tool invocations from MCP clients."""
    try:
        if name == "list_tables":
            result = db.execute(
                "SELECT table_name FROM information_schema.tables WHERE table_schema = 'main'"
            ).fetchall()
            tables = [row[0] for row in result]
            return [TextContent(type="text", text=json.dumps({"tables": tables}))]
        elif name == "get_schema":
            table_name = arguments.get("table_name")
            if not table_name:
                return [TextContent(type="text", text="Error: table_name is required.")]
            result = db.execute(f"DESCRIBE {table_name}").fetchall()
            schema = [{"column": row[0], "type": row[1]} for row in result]
            return [TextContent(type="text", text=json.dumps({"schema": schema}))]
        elif name == "run_query":
            sql = arguments.get("sql")
            if not sql:
                return [TextContent(type="text", text="Error: sql is required.")]
            # Execute query and convert results to JSON
            cursor = db.execute(sql)
            columns = [desc[0] for desc in cursor.description]
            rows = [dict(zip(columns, row)) for row in cursor.fetchall()]
            return [TextContent(type="text", text=json.dumps({"rows": rows, "count": len(rows)}))]
        else:
            return [TextContent(type="text", text=f"Unknown tool: {name}")]
    except Exception as e:
        logger.error(f"Tool error: {e}")
        return [TextContent(type="text", text=f"Error: {e}")]

async def main():
    # Serve over stdio: MCP clients launch this process and exchange
    # JSON-RPC messages on stdin/stdout
    async with stdio_server() as (read_stream, write_stream):
        await server.run(read_stream, write_stream, server.create_initialization_options())

if __name__ == "__main__":
    asyncio.run(main())
This server:
- Initializes an MCP server with the name “analytics-server”.
- Sets up sample data in DuckDB (customers and orders tables).
- Defines three tools:
  - list_tables: Returns all available tables.
  - get_schema: Inspects columns and types for a given table.
  - run_query: Executes arbitrary SQL and returns JSON results.
- Implements tool handlers that execute the logic and return results in MCP-compatible format.
- Includes error handling to gracefully manage invalid inputs or database errors.
Connecting to Your Real Data Warehouse
The example above uses DuckDB for simplicity, but production deployments connect to actual data warehouses. Here’s how to adapt the code for PostgreSQL:
import os

import psycopg2
from psycopg2.extras import RealDictCursor

# Replace the DuckDB setup with PostgreSQL
class WarehouseConnection:
    def __init__(self, host, database, user, password, port=5432):
        self.conn = psycopg2.connect(
            host=host,
            database=database,
            user=user,
            password=password,
            port=port
        )

    def execute(self, query):
        """Execute a query and return results as dictionaries."""
        with self.conn.cursor(cursor_factory=RealDictCursor) as cur:
            cur.execute(query)
            return cur.fetchall()

    def close(self):
        self.conn.close()

# Initialize in your server (read credentials from the environment, never source code)
warehouse = WarehouseConnection(
    host="your-postgres-host.com",
    database="analytics_db",
    user="analytics_user",
    password=os.environ["WAREHOUSE_PASSWORD"]
)
For Snowflake, the pattern is similar:
from snowflake.connector import connect

snowflake_conn = connect(
    user="your_user",
    password="your_password",
    account="your_account",
    warehouse="your_warehouse",
    database="your_database"
)

cursor = snowflake_conn.cursor()
cursor.execute("SELECT * FROM your_table")
results = cursor.fetchall()
The key principle: once you have a database connection, the MCP tool handlers remain identical. You’re just swapping the backend.
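That swap can be made explicit with a small protocol: any object exposing an execute(sql) method that returns row dictionaries can sit behind the same tool handlers. A minimal sketch, using stdlib sqlite3 as a stand-in warehouse (the QueryBackend, SQLiteBackend, and run_query_tool names are illustrative, not part of the MCP SDK):

```python
import sqlite3
from typing import Protocol

class QueryBackend(Protocol):
    """Anything with execute(sql) -> list of row dicts can back the MCP tools."""
    def execute(self, sql: str) -> list[dict]: ...

class SQLiteBackend:
    """Stand-in warehouse; swap for DuckDB, Postgres, Snowflake, etc."""
    def __init__(self, path: str = ":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.row_factory = sqlite3.Row  # rows become dict-convertible

    def execute(self, sql: str) -> list[dict]:
        return [dict(row) for row in self.conn.execute(sql).fetchall()]

def run_query_tool(backend: QueryBackend, sql: str) -> dict:
    """A tool handler that depends only on the protocol, not the driver."""
    rows = backend.execute(sql)
    return {"rows": rows, "count": len(rows)}

backend = SQLiteBackend()
backend.conn.execute("CREATE TABLE t (x INTEGER)")
backend.conn.execute("INSERT INTO t VALUES (1), (2)")
result = run_query_tool(backend, "SELECT x FROM t ORDER BY x")
print(result)  # {'rows': [{'x': 1}, {'x': 2}], 'count': 2}
```

Handlers written against the protocol never change when you move from the DuckDB demo to a production warehouse.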
Adding Text-to-SQL Capabilities
Now that your server exposes raw query tools, the next step is enabling natural language queries. This requires an LLM that understands your schema and can generate SQL.
Here’s a minimal implementation using OpenAI’s API:
import openai

class TextToSQLGenerator:
    def __init__(self, api_key: str, warehouse_conn):
        self.client = openai.OpenAI(api_key=api_key)
        self.warehouse = warehouse_conn

    def get_schema_description(self) -> str:
        """Build a text description of all tables and columns."""
        schema_text = ""
        # Fetch all tables (rows are dicts thanks to RealDictCursor)
        tables = self.warehouse.execute(
            "SELECT table_name FROM information_schema.tables WHERE table_schema='public'"
        )
        for table_row in tables:
            table_name = table_row["table_name"]
            schema_text += f"\nTable: {table_name}\n"
            # Fetch columns for this table
            columns = self.warehouse.execute(
                f"SELECT column_name, data_type FROM information_schema.columns WHERE table_name='{table_name}'"
            )
            for col in columns:
                schema_text += f"  - {col['column_name']} ({col['data_type']})\n"
        return schema_text

    def generate_sql(self, natural_language_query: str) -> str:
        """Convert a natural language question to SQL."""
        schema = self.get_schema_description()
        prompt = f"""You are a SQL expert. Given the following database schema and a user question,
generate a valid SQL query.

Schema:
{schema}

User question: {natural_language_query}

Return ONLY the SQL query, no explanation."""
        response = self.client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": prompt}],
            temperature=0
        )
        return response.choices[0].message.content.strip()
Then add a new tool to your MCP server:
@server.list_tools()
async def list_tools() -> list[Tool]:
    return [
        # ... existing tools ...
        Tool(
            name="natural_language_query",
            description="Ask a question in plain English and get SQL results.",
            inputSchema={
                "type": "object",
                "properties": {
                    "question": {
                        "type": "string",
                        "description": "Your data question in natural language."
                    }
                },
                "required": ["question"]
            }
        )
    ]

@server.call_tool()
async def call_tool(name: str, arguments: dict) -> list[TextContent]:
    # ... existing handlers ...
    elif name == "natural_language_query":
        question = arguments.get("question")
        if not question:
            return [TextContent(type="text", text="Error: question is required.")]
        try:
            # Generate SQL from natural language
            sql = text_to_sql_generator.generate_sql(question)
            logger.info(f"Generated SQL: {sql}")
            # Execute the generated query
            result = warehouse.execute(sql)
            # default=str handles dates and decimals in query results
            return [TextContent(type="text", text=json.dumps({"results": result, "sql": sql}, default=str))]
        except Exception as e:
            return [TextContent(type="text", text=f"Error: {e}")]
This approach is powerful but requires careful prompt engineering and query validation. In production, you’d add safeguards like query timeouts, cost estimation, and approval workflows for expensive queries.
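To give a flavor of what that validation looks like, here is a deliberately conservative pre-execution check (a naive sketch, not production-grade parsing; the vet_generated_sql name and its rules are illustrative):

```python
def vet_generated_sql(sql: str, max_rows: int = 10_000) -> str:
    """Reject obviously unsafe LLM-generated SQL and cap result size.

    Naive guardrail: allow a single read statement only, then append
    a LIMIT if none is present. Real deployments should use a SQL
    parser and database-level read-only roles, not string checks.
    """
    cleaned = sql.strip().rstrip(";")
    if ";" in cleaned:
        raise ValueError("Multiple statements are not allowed.")
    first_word = cleaned.split(None, 1)[0].upper() if cleaned else ""
    if first_word not in ("SELECT", "WITH"):
        raise ValueError(f"Only read queries are allowed, got: {first_word or 'empty'}")
    if "LIMIT" not in cleaned.upper():
        cleaned += f" LIMIT {max_rows}"
    return cleaned

print(vet_generated_sql("SELECT name FROM customers"))
# SELECT name FROM customers LIMIT 10000
```

Running the vetted string instead of the raw model output catches the most common failure modes (multi-statement payloads, writes, unbounded scans) before they reach the warehouse.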
Deploying and Testing Your MCP Server
Once your server is complete, you need to run it and connect a client. The MCP SDK provides a built-in stdio transport (standard input/output), which is ideal for local testing and integration with tools like Claude.
Local Testing
Start your server:
python your_analytics_server.py
The server will listen on stdin/stdout. The quickest way to poke at it interactively is the MCP Inspector (npx @modelcontextprotocol/inspector python your_analytics_server.py). You can also drive it by hand with a simple client script, but note that MCP requires an initialization handshake before any tool call:

import json
import subprocess

def send(proc, message):
    proc.stdin.write(json.dumps(message) + "\n")
    proc.stdin.flush()

# Start the server as a subprocess
server_process = subprocess.Popen(
    ["python", "your_analytics_server.py"],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    text=True
)

# 1. Initialize handshake (required before any tool call)
send(server_process, {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "initialize",
    "params": {
        "protocolVersion": "2024-11-05",
        "capabilities": {},
        "clientInfo": {"name": "test-client", "version": "0.1.0"}
    }
})
print(json.loads(server_process.stdout.readline()))

# 2. Acknowledge initialization (a notification, so no id)
send(server_process, {"jsonrpc": "2.0", "method": "notifications/initialized"})

# 3. Now send a tool call request
send(server_process, {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/call",
    "params": {"name": "list_tables", "arguments": {}}
})
print(json.loads(server_process.stdout.readline()))
Production Deployment
For production, you’ll want to run your server as a persistent service. Common options include:
- Docker: Package your server in a container and deploy to Kubernetes, ECS, or any container platform.
- Systemd: Run as a Linux service with automatic restart on failure.
- AWS Lambda / Google Cloud Functions: If your queries are short-lived, serverless is cost-effective.
- Managed platforms: Services like Heroku, Railway, or Render can host your server with minimal configuration.
Here’s a minimal Dockerfile:
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["python", "analytics_server.py"]
And a requirements.txt:
mcp>=1.0.0
duckdb>=0.8.0
psycopg2-binary>=2.9.0
openai>=1.0.0
pydantic>=2.0.0
Integrating with AI Agents and Frontends
Once your server is running, connecting it to AI agents or frontend applications depends on the client. The most common pattern is connecting to Claude (via Claude’s tool use feature) or a custom chatbot.
For Claude integration, you’d configure your MCP server in Claude’s settings or via the Claude API. The server exposes its tools, and Claude can invoke them as part of its reasoning process.
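With Claude Desktop, for example, registration is a JSON entry in claude_desktop_config.json; the server name, paths, and environment values below are placeholders for your own setup:

```json
{
  "mcpServers": {
    "analytics": {
      "command": "python",
      "args": ["/absolute/path/to/analytics_server.py"],
      "env": {
        "WAREHOUSE_PASSWORD": "read-from-a-secret-manager"
      }
    }
  }
}
```

Claude launches the server as a subprocess and discovers its tools automatically via the handshake.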
For embedded analytics (e.g., in a product dashboard), you’d create a frontend that calls your MCP server’s tools via HTTP or WebSocket. Here’s a simple React example:
async function queryAnalytics(question) {
  const response = await fetch('/api/mcp-tool', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      tool: 'natural_language_query',
      arguments: { question }
    })
  });
  const result = await response.json();
  return result.results;
}

// Usage in a component
const [results, setResults] = useState([]);

const handleQuery = async (question) => {
  const data = await queryAnalytics(question);
  setResults(data);
};
Your backend would wrap the MCP server and expose it over HTTP. Note that the call_tool handler registered earlier is still an ordinary async function, so the wrapper can import and invoke it directly (a sketch; the module and handler names assume the server file from this guide):

from fastapi import FastAPI
from pydantic import BaseModel

# Import the decorated handler function from your server module
from your_analytics_server import call_tool

app = FastAPI()

class ToolRequest(BaseModel):
    tool: str
    arguments: dict = {}

@app.post("/api/mcp-tool")
async def call_mcp_tool(req: ToolRequest):
    content = await call_tool(req.tool, req.arguments)
    return {"results": content}
Performance and Scalability Considerations
As your analytics MCP server scales, you’ll encounter several challenges:
Query Timeouts
Long-running queries can hang clients. Implement timeouts:
import signal

def timeout_handler(signum, frame):
    raise TimeoutError("Query execution exceeded timeout.")

# Set a 30-second timeout.
# Caveat: SIGALRM is Unix-only and works only in the main thread.
signal.signal(signal.SIGALRM, timeout_handler)
signal.alarm(30)
try:
    result = warehouse.execute(sql)
finally:
    signal.alarm(0)  # Cancel the alarm
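Because SIGALRM is unavailable on Windows and inside worker threads, a more portable sketch runs the query in a thread pool and bounds the wait. Note that the worker thread keeps running after the timeout fires, so pair this with a server-side statement timeout where your warehouse supports one:

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as QueryTimeout

executor = ThreadPoolExecutor(max_workers=4)

def run_with_timeout(fn, timeout_seconds: float):
    """Run fn() in a worker thread; raise QueryTimeout past the budget."""
    future = executor.submit(fn)
    try:
        return future.result(timeout=timeout_seconds)
    except QueryTimeout:
        future.cancel()  # best effort; an already-running query is not killed
        raise

# A fast "query" finishes within budget
print(run_with_timeout(lambda: "42 rows", timeout_seconds=1.0))

# A slow one trips the timeout
try:
    run_with_timeout(lambda: time.sleep(2), timeout_seconds=0.1)
except QueryTimeout:
    print("query timed out")
```

The same wrapper works unchanged around warehouse.execute(sql) calls inside your tool handlers.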
Connection Pooling
For high-concurrency workloads, maintain a pool of database connections:
from sqlalchemy import create_engine
engine = create_engine(
"postgresql://user:password@host/database",
pool_size=20,
max_overflow=10
)
Query Cost Estimation
Before executing expensive queries, estimate their cost (especially for cloud warehouses):
def estimate_query_cost(sql: str) -> float:
    """Estimate BigQuery on-demand query cost in USD."""
    from google.cloud import bigquery

    client = bigquery.Client()
    # A dry run plans the query without executing (or billing) it
    query_job = client.query(
        sql,
        job_config=bigquery.QueryJobConfig(dry_run=True)
    )
    # On-demand pricing is $6.25 per TiB scanned (verify current pricing)
    bytes_scanned = query_job.total_bytes_processed
    cost = (bytes_scanned / 2**40) * 6.25
    return cost
Caching
Cache frequently accessed schema information and query results:
import time

class CachedWarehouse:
    def __init__(self, warehouse, cache_ttl=300):
        self.warehouse = warehouse
        self.cache_ttl = cache_ttl  # seconds
        self._schema_cache = {}
        self._cache_times = {}

    def get_schema(self, table_name: str):
        # Serve from cache while the entry is still fresh
        if table_name in self._schema_cache:
            if time.time() - self._cache_times[table_name] < self.cache_ttl:
                return self._schema_cache[table_name]
        schema = self.warehouse.execute(f"DESCRIBE {table_name}")
        self._schema_cache[table_name] = schema
        self._cache_times[table_name] = time.time()
        return schema
Security and Authorization
Analytics servers handle sensitive data. Implement proper security:
Authentication
Require clients to authenticate before accessing tools:
import os

from mcp.server import Server

class AuthenticatedServer(Server):
    """Token check for HTTP-based transports (sketch).

    Note: the stdio transport has no request headers; there, access
    control is inherited from however the client launches the process.
    Header checks like this apply to SSE/HTTP deployments.
    """
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.api_key = os.getenv("MCP_API_KEY")

    async def authenticate(self, headers: dict) -> bool:
        auth_header = headers.get("Authorization", "")
        token = auth_header.replace("Bearer ", "")
        return bool(self.api_key) and token == self.api_key
Row-Level Security (RLS)
For multi-tenant systems, filter query results based on user identity:
def apply_row_security(sql: str, user_id: str) -> str:
    """Inject an RLS predicate by wrapping the query (naive sketch).

    Appending 'WHERE ...' to arbitrary SQL breaks queries that already
    have a WHERE clause; wrapping as a subquery is slightly more robust.
    In production, prefer database-native RLS (e.g., Postgres policies)
    or per-tenant views, and bind user_id as a parameter, not an f-string.
    """
    return f"SELECT * FROM ({sql}) AS rls_scoped WHERE user_id = '{user_id}'"
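A more robust alternative to string manipulation is a per-tenant view created by the server, so user queries can only ever see filtered rows. A runnable sketch with stdlib sqlite3 standing in for the warehouse (SQLite has no native RLS policies, so the view does the filtering; table and column names are invented for the demo):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, user_id TEXT, amount INTEGER);
    INSERT INTO orders VALUES (1, 'alice', 250), (2, 'bob', 450), (3, 'alice', 180);
""")

def create_tenant_view(conn, user_id: str) -> str:
    """Create a filtered view; tenant queries target the view, not the table."""
    view_name = f"orders_{user_id}"
    # user_id comes from your auth system, never from end-user input
    conn.execute(
        f"CREATE VIEW {view_name} AS SELECT * FROM orders WHERE user_id = '{user_id}'"
    )
    return view_name

view = create_tenant_view(conn, "alice")
rows = conn.execute(f"SELECT order_id, amount FROM {view} ORDER BY order_id").fetchall()
print(rows)  # [(1, 250), (3, 180)]
```

In Postgres you would instead enable ROW LEVEL SECURITY on the table and let the database enforce the predicate for every query.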
Query Validation
Prevent SQL injection and malicious queries:
import sqlparse

def is_safe_query(sql: str) -> bool:
    """Basic validation: allow a single read-only statement.

    A keyword blocklist alone is easy to bypass (comments, casing,
    string literals); checking the parsed statement type is stronger.
    """
    statements = sqlparse.parse(sql)
    if len(statements) != 1:
        return False  # reject multi-statement payloads
    # get_type() returns e.g. 'SELECT', 'INSERT', 'DROP', or 'UNKNOWN'
    return statements[0].get_type() == "SELECT"
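String and parser checks are best-effort; the strongest enforcement lives in the database itself (a read-only role in Postgres, a scoped service account in BigQuery). The idea, demonstrated with stdlib sqlite3's query_only pragma as a stand-in:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metrics (name TEXT, value INTEGER)")
conn.execute("INSERT INTO metrics VALUES ('mau', 1200)")

# Flip the connection to read-only before handing it to query tools
conn.execute("PRAGMA query_only = ON")

# Reads still work
print(conn.execute("SELECT value FROM metrics").fetchone())  # (1200,)

# Writes now fail at the database layer, whatever the SQL string looks like
try:
    conn.execute("DROP TABLE metrics")
except sqlite3.OperationalError as e:
    print(f"blocked: {e}")
```

Layering this under is_safe_query gives defense in depth: even SQL that slips past validation cannot modify data.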
Real-World Example: Integrating with Apache Superset
If you’re already using Apache Superset, an MCP server complements your BI platform by enabling programmatic access and AI-assisted analytics. Here’s how they work together:
- Superset handles visualization: Your dashboards, charts, and saved queries live in Superset.
- MCP server handles programmatic access: External tools, AI agents, and custom applications query data via the MCP server.
- Shared metadata: Both reference the same warehouse schema, ensuring consistency.
For example, a data analyst might use Superset to build a dashboard, while a product team uses the MCP server to embed analytics in their application. Both benefit from the same underlying data layer.
If you’re evaluating managed open-source BI solutions, D23’s managed Apache Superset platform provides production-grade hosting, API access, and expert support—making it easy to combine Superset’s visual analytics with programmatic access via MCP servers.
Common Pitfalls and How to Avoid Them
Pitfall 1: Unvalidated User Input
Problem: Directly passing user queries to SQL without validation. Solution: Use parameterized queries and input validation:
# Bad
db.execute(f"SELECT * FROM users WHERE id = {user_id}")
# Good
db.execute("SELECT * FROM users WHERE id = ?", (user_id,))
Pitfall 2: No Query Timeout
Problem: A user runs a query that scans the entire warehouse, hanging the server. Solution: Enforce query timeouts as shown in the scalability section above.
Pitfall 3: Exposing Sensitive Schema
Problem: The list_tables and get_schema tools expose all tables, including those with sensitive data.
Solution: Filter tables based on user permissions:
def get_accessible_tables(user_id: str) -> list[str]:
    """Return only tables the user has permission to query."""
    # Query your authorization system
    permissions = get_user_permissions(user_id)
    return [t for t in all_tables if t in permissions]
Pitfall 4: Ignoring Warehouse Costs
Problem: Every query scans the entire table, racking up cloud warehouse bills. Solution: Optimize queries before execution and set cost limits:
def optimize_query(sql: str) -> str:
    """Add LIMIT and column selection to reduce costs."""
    # This is a simplified example; real optimization is more complex
    if 'LIMIT' not in sql.upper():
        sql += " LIMIT 10000"
    return sql
Comparing MCP Servers to Traditional BI Tools
You might wonder: why build an MCP server when tools like Looker, Tableau, and Power BI already exist?
The answer depends on your use case:
| Aspect | MCP Server | Looker/Tableau | D23 (Managed Superset) |
|---|---|---|---|
| Setup Time | Hours | Days/weeks | Minutes |
| Cost | Low (self-hosted) | High (per-seat licensing) | Moderate (managed) |
| Customization | Full control | Limited | High (open-source) |
| AI Integration | Native (MCP protocol) | Requires plugins | Native (via MCP) |
| Embedded Analytics | Easy (programmatic) | Complex (iFrame) | Easy (API-first) |
| Scalability | Depends on deployment | Managed | Managed |
MCP servers shine when you need programmatic analytics access, AI-driven exploration, or embedded analytics in custom applications. Traditional BI tools excel at visual discovery, collaborative reporting, and no-code dashboard building.
Many teams use both: Superset for visual analytics and an MCP server for programmatic access. This hybrid approach combines the best of both worlds.
Next Steps: Advanced MCP Patterns
Once you’ve mastered the basics, consider these advanced patterns:
Multi-Warehouse Federation
Connect to multiple data sources and unify them:
class FederatedWarehouse:
    def __init__(self, sources: dict):
        self.sources = sources  # {'postgres': conn1, 'snowflake': conn2}

    def execute(self, sql: str, source: str = 'default'):
        return self.sources[source].execute(sql)
Real-Time Streaming Analytics
Integrate with Apache Kafka or Kinesis for streaming data:
from kafka import KafkaConsumer

def stream_analytics():
    consumer = KafkaConsumer('analytics-events')
    for message in consumer:
        # Process event and update MCP server state
        pass
Custom Visualizations
Return structured data that frontends can render as charts:
class ChartResult:
    def __init__(self, data, chart_type='bar'):
        self.data = data
        self.chart_type = chart_type

    def to_json(self):
        return {
            'type': self.chart_type,
            'data': self.data
        }
Conclusion
Building an analytics MCP server in Python is straightforward and unlocks powerful new ways to interact with data. Whether you’re enabling AI-assisted analytics, embedding self-serve BI, or standardizing data access across your organization, an MCP server provides a flexible, vendor-neutral foundation.
The 200-line example in this guide is just the starting point. From there, you can add text-to-SQL, multi-warehouse support, caching, and security features tailored to your needs.
For teams already using Apache Superset, an MCP server complements your BI platform and opens new integration possibilities. And if you’re exploring managed open-source BI alternatives to Looker, Tableau, and Power BI, D23’s managed Superset platform provides production-grade infrastructure, API-first access, and expert data consulting—making it easy to combine visual analytics with programmatic data access.
To explore MCP servers further, check out the MCP Server Directory and the curated MCP resources on GitHub. For data science–specific MCP patterns, the 11 Data Science MCP Servers article offers practical examples. You can also review Microsoft’s guide to MCP servers for broader integration patterns, and explore Cloudflare’s new MCP servers for deployment and browser automation ideas. Additionally, the best MCP servers for web analytics guide and Firecrawl’s developer MCP servers list provide context on real-world MCP implementations. The open-source MCP servers registry at Glama offers a daily-updated list of production-ready servers you can learn from.
Start small, test locally, and scale as your analytics needs grow. Your data—and your AI agents—will thank you.