Connecting Apache Superset to Microsoft Fabric Lakehouses
Microsoft Fabric lakehouses represent a modern approach to data architecture—combining the flexibility of data lakes with the queryability of data warehouses. If you’re running Apache Superset as your analytics platform, connecting it directly to a Fabric lakehouse unlocks a powerful workflow: ingest raw data into OneLake, transform it with Delta tables, and surface insights through Superset dashboards without moving data between systems.
This guide walks through the technical setup, authentication patterns, and best practices for integrating Superset with Fabric lakehouses. Whether you’re building embedded analytics for your product or centralizing BI across your organization, this integration eliminates the need for intermediate data exports and keeps your analytics layer lightweight and responsive.
Understanding the Architecture: Superset, Fabric, and OneLake
Before diving into connection details, it helps to understand what each component does and how they fit together.
Apache Superset is an open-source business intelligence platform that sits on top of your data layer. It handles visualization, dashboard creation, SQL querying, and—when configured with AI—text-to-SQL query generation. Superset doesn’t store data itself; it connects to external databases and data warehouses via database drivers and executes queries against them. This architecture makes Superset lightweight and agnostic to your underlying data platform.
Microsoft Fabric is an integrated analytics platform that combines data engineering, data science, and business analytics under one SaaS umbrella. Within Fabric, a lakehouse is a unified data store that supports both structured and unstructured data. Lakehouses in Fabric are built on Delta Lake format (the same open standard used by Databricks) and are accessible through a SQL analytics endpoint—a managed SQL interface that lets external tools query lakehouse tables as if they were in a traditional data warehouse.
OneLake is the underlying cloud storage layer in Microsoft Fabric. When you create a lakehouse, its data is stored in OneLake, Microsoft’s tenant-wide, multi-cloud storage abstraction. OneLake uses Delta format by default, which provides ACID transactions, schema enforcement, and time-travel capabilities—all valuable for analytics workloads.
The integration pattern is straightforward: Superset connects to the SQL analytics endpoint of a Fabric lakehouse, executes queries against Delta tables, and renders results in dashboards. This approach avoids data duplication and keeps your single source of truth in Fabric.
Why Connect Superset to Fabric Lakehouses?
Several scenarios make this integration compelling:
Cost Efficiency: Fabric’s consumption-based pricing model (measured in Fabric Capacity Units) can be more predictable than per-seat BI tool licensing. By using Superset—an open-source tool—alongside Fabric, you avoid per-user seat costs from Looker, Tableau, or Power BI while maintaining enterprise-grade analytics infrastructure.
Embedded Analytics: If you’re building analytics features into your product, Superset’s lightweight architecture and API-first design make it ideal for embedding. Connecting it to Fabric lakehouses lets you scale analytics without managing separate databases or ETL pipelines. D23’s managed Superset platform, for instance, specializes in exactly this use case—providing hosted Superset with AI integration and API endpoints for embedding dashboards and self-serve BI into applications.
Data Lakehouse Flexibility: Fabric lakehouses support both structured (via SQL endpoint) and unstructured data (via OneLake). Superset queries the structured layer, but you maintain the flexibility to run Apache Spark notebooks, Python scripts, or other data engineering workloads against the same data.
Reduced Data Movement: Without this integration, you might export data from Fabric to a separate analytics database. Direct connection eliminates that step, reducing latency, storage costs, and the risk of stale data.
Open-Source Advantage: Unlike Power BI (which is tightly integrated with Fabric), Superset remains vendor-neutral. You can query Fabric lakehouses, Snowflake, BigQuery, or any supported database from the same platform. This flexibility matters for organizations with heterogeneous data stacks or those evaluating multiple analytics tools.
Prerequisites and Setup Requirements
Before connecting Superset to a Fabric lakehouse, ensure you have:
Fabric Environment:
- A Microsoft Fabric workspace with appropriate licensing (Fabric capacity or trial)
- At least one lakehouse created in that workspace
- Delta tables populated with data (via Dataflows, pipelines, notebooks, or direct ingestion)
- Access to the lakehouse’s SQL analytics endpoint URL
Superset Deployment:
- A running instance of Apache Superset (self-hosted or managed via a provider like D23)
- Admin access to add database connections
- Network connectivity to the Fabric SQL analytics endpoint (ensure firewall rules permit outbound HTTPS to Microsoft Fabric endpoints)
Authentication Credentials:
- A Microsoft Entra ID (Azure AD) service principal or user account with permission to query the lakehouse
- Service principal credentials (client ID, client secret, and tenant ID) if using service principal authentication
- Or, username and password if using SQL authentication (less common for Fabric)
SQL Driver:
- The Microsoft ODBC Driver for SQL Server together with the Python pyodbc library (Superset's default for SQL Server connections)
- Alternatively, the pymssql library
For detailed guidance on setting up Fabric lakehouse connections, refer to Microsoft Fabric’s official lakehouse connection documentation and the Matillion authentication guide for Fabric Lakehouses, which covers SQL analytics endpoint authentication patterns applicable to any SQL client.
Step 1: Locate Your Fabric Lakehouse SQL Analytics Endpoint
The SQL analytics endpoint is the gateway for external tools to query a Fabric lakehouse. Here’s how to find it:
In Microsoft Fabric:
- Navigate to your workspace and open the lakehouse
- In the top-right corner, click “SQL analytics endpoint”
- In the SQL analytics endpoint view, look for the connection string or server name. It typically follows this pattern: <workspace-name>-<lakehouse-name>.<region>.fabric.microsoft.com
- Copy the full connection string or note the server name and database name (which is the lakehouse name)
Example Connection Details:
- Server: contoso-analytics-lakehouse.westus2.fabric.microsoft.com
- Database: contoso_analytics_lakehouse
- Port: 1433 (default SQL Server port)
Keep these details handy—you’ll need them when configuring Superset.
Step 2: Authenticate to Fabric Lakehouses
Fabric supports multiple authentication methods for SQL analytics endpoints. The most common for external tools like Superset are:
Service Principal Authentication (Recommended for Production)
A service principal is a non-human identity in Microsoft Entra ID that can be granted permissions to access Fabric resources. This approach is ideal for automated, unattended scenarios like analytics dashboards.
To set up a service principal:
- In Azure AD, create a new app registration (or use an existing one)
- Generate a client secret and note the client ID and tenant ID
- In your Fabric workspace, grant the service principal access:
- Go to Workspace Settings → Manage Access
- Add the service principal with “Member” or “Admin” role
- In the SQL analytics endpoint, ensure the service principal has query permissions (this is inherited from workspace access)
For detailed steps, consult the Fabric OneLake ingestion documentation, which covers service principal setup in the context of Fabric data access.
User Account Authentication
If you’re connecting from a single user’s machine or a development environment, you can use your Microsoft Entra ID credentials directly. Superset will prompt for username and password or use integrated Windows authentication if available.
Token-Based Authentication
For advanced scenarios, you can obtain an access token from Microsoft Entra ID and pass it to the SQL connection. This is more complex but useful if you’re building custom integrations.
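With pyodbc, for example, an Entra access token can be passed through the SQL_COPT_SS_ACCESS_TOKEN pre-connect attribute (attribute id 1256). The driver expects the token as a length-prefixed UTF-16-LE byte structure. A minimal sketch of that packing step — in practice you would obtain the token itself from a library such as MSAL or azure-identity:

```python
import struct

# pyodbc pre-connect attribute id for Entra ID access tokens
SQL_COPT_SS_ACCESS_TOKEN = 1256

def to_odbc_token_struct(access_token: str) -> bytes:
    """Pack an Entra ID access token into the length-prefixed
    UTF-16-LE structure the ODBC driver expects."""
    token_bytes = access_token.encode("utf-16-le")
    return struct.pack("<i", len(token_bytes)) + token_bytes

# Usage (sketch): pass the struct via attrs_before when connecting.
# conn = pyodbc.connect(connection_string,
#                       attrs_before={SQL_COPT_SS_ACCESS_TOKEN: to_odbc_token_struct(token)})
```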
For production deployments, service principal authentication is strongly recommended because it doesn’t rely on individual user credentials and can be rotated independently.
Step 3: Configure the Database Connection in Superset
Once you have your SQL analytics endpoint details and authentication credentials, add the connection to Superset.
Access the Superset Database Configuration:
- Log into Superset as an admin user
- Navigate to Settings → Database Connections (or Data → Databases, depending on your Superset version)
- Click ”+ Database” to add a new connection
Select the Database Type:
Look for “Microsoft SQL Server” or “SQL Server” in the list of supported databases. Superset uses the pyodbc or pymssql driver for SQL Server connections.
Enter Connection Details:
The connection form will ask for:
- Database Name: The name of your lakehouse (e.g., contoso_analytics_lakehouse)
- Host: The SQL analytics endpoint server name (e.g., contoso-analytics-lakehouse.westus2.fabric.microsoft.com)
- Port: 1433 (default)
- Username: For service principal auth, use the client ID; for user auth, use your email or UPN
- Password: The client secret (for service principal) or your password (for user auth)
Connection String (Advanced):
If Superset’s UI doesn’t provide all necessary fields, you can enter a raw connection string. For service principal authentication with pyodbc (ODBC Driver 17.4 or later, which supports the ActiveDirectoryServicePrincipal authentication mode), use:
mssql+pyodbc://CLIENT_ID:CLIENT_SECRET@SERVER_NAME:1433/DATABASE_NAME?driver=ODBC+Driver+17+for+SQL+Server&Authentication=ActiveDirectoryServicePrincipal&Encrypt=yes&TrustServerCertificate=no&Connection+Timeout=30
Replace:
- CLIENT_ID with your service principal’s client ID
- CLIENT_SECRET with the client secret
- SERVER_NAME with your Fabric SQL analytics endpoint
- DATABASE_NAME with your lakehouse name
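Client secrets frequently contain characters (slashes, plus signs) that must be URL-encoded before they go into a SQLAlchemy URI. A small helper that assembles the connection string safely — a sketch, with placeholder credential values:

```python
from urllib.parse import quote_plus

def fabric_sqlalchemy_uri(client_id: str, client_secret: str,
                          server: str, database: str) -> str:
    """Build a SQLAlchemy URI for a Fabric SQL analytics endpoint,
    URL-encoding the secret so special characters don't break parsing."""
    return (
        f"mssql+pyodbc://{quote_plus(client_id)}:{quote_plus(client_secret)}"
        f"@{server}:1433/{database}"
        "?driver=ODBC+Driver+17+for+SQL+Server"
        "&Authentication=ActiveDirectoryServicePrincipal"
        "&Encrypt=yes"
    )

uri = fabric_sqlalchemy_uri(
    "11111111-2222-3333-4444-555555555555",  # placeholder client ID
    "s3cret/value+chars",                    # placeholder secret
    "contoso-analytics-lakehouse.westus2.fabric.microsoft.com",
    "contoso_analytics_lakehouse",
)
```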
For detailed guidance on Superset database connections, refer to the official Superset database documentation, which covers connection string formats and driver configuration for SQL-based databases.
Enable Additional Options:
- Allow DML: Uncheck this unless you need to run INSERT, UPDATE, or DELETE queries (usually not needed for analytics)
- Allow CTAS: Leave unchecked to prevent accidental table creation
- Allow CVAS: Leave unchecked as well, to prevent accidental view creation
- Expose in SQL Lab: Check this to allow users to write custom SQL queries against the lakehouse
Test the Connection:
Before saving, click “Test Connection” to verify that Superset can reach the SQL analytics endpoint and authenticate successfully. If the test fails, check:
- Network connectivity (firewall rules, VPN, IP allowlisting)
- Authentication credentials (client ID, secret, or password)
- Service principal workspace permissions
- Lakehouse SQL analytics endpoint status (ensure it’s not paused)
Once the test passes, save the connection.
Step 4: Import Tables and Create Datasets
After the database connection is established, Superset needs to know which tables to expose for dashboard building.
Refresh the Table List:
- Go to Data → Databases
- Click on your Fabric lakehouse connection
- Click “Refresh” or “Sync Tables” to import all available tables from the lakehouse
Superset will query the SQL analytics endpoint’s metadata and list all Delta tables in your lakehouse.
Create Datasets (Optional but Recommended)
Datasets are Superset’s abstraction layer over raw tables. They allow you to:
- Define virtual columns (computed fields)
- Set default aggregations
- Create metrics and dimensions
- Control which columns users can see
- Simplify the dashboard-building experience
To create a dataset:
- Go to Data → Datasets
- Click ”+ Dataset”
- Select your Fabric lakehouse database and choose a table
- Define columns, metrics, and filters as needed
- Save the dataset
Datasets are optional—you can build dashboards directly against raw tables—but they provide governance and consistency, especially in self-serve BI environments.
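If you prefer automation over the UI, datasets can also be created through Superset's REST API (POST /api/v1/dataset/). A sketch of building the request body — the database ID and table name are placeholders, and in practice you would first authenticate against /api/v1/security/login to obtain a bearer token:

```python
import json

def dataset_payload(database_id: int, table_name: str, schema: str = "dbo") -> dict:
    """Request body for Superset's POST /api/v1/dataset/ endpoint."""
    return {
        "database": database_id,   # Superset's internal ID for the Fabric connection
        "schema": schema,          # lakehouse tables typically surface under dbo
        "table_name": table_name,
    }

payload = dataset_payload(3, "sales_fact")
body = json.dumps(payload)
# Sketch: POST this body to {SUPERSET_URL}/api/v1/dataset/ with an
# Authorization: Bearer <token> header.
```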
Step 5: Build Dashboards Against Fabric Data
With tables imported and datasets created, you can now build dashboards using Fabric lakehouse data.
Create a New Dashboard:
- Go to Dashboards → + Dashboard
- Give it a name and description
- Click “Create”
Add Charts:
- Click ”+ Chart” to add a visualization
- Select your Fabric dataset or table
- Choose a chart type (bar, line, table, map, etc.)
- Define dimensions (grouping columns) and metrics (aggregations like SUM, COUNT, AVG)
- Configure filters, sorting, and drill-down behavior
- Save the chart to the dashboard
Use SQL Lab for Complex Queries:
If you need to write custom SQL against Fabric Delta tables, use SQL Lab:
- Go to SQL → SQL Lab
- Select your Fabric lakehouse database
- Write SQL queries against the lakehouse tables
- Execute and visualize results
- Save as a chart or dataset for reuse
Example query against a Fabric lakehouse (the SQL analytics endpoint speaks T-SQL, so use DATETRUNC rather than PostgreSQL’s DATE_TRUNC):
SELECT
    DATETRUNC(month, order_date) AS month,
    product_category,
    SUM(revenue) AS total_revenue,
    COUNT(DISTINCT order_id) AS order_count
FROM sales_fact
WHERE order_date >= '2024-01-01'
GROUP BY DATETRUNC(month, order_date), product_category
ORDER BY month DESC, total_revenue DESC
This query groups sales by month and product category, pulling data directly from Delta tables in your Fabric lakehouse.
Performance Optimization and Best Practices
Connecting Superset to Fabric lakehouses introduces some performance considerations. Here are strategies to optimize:
Index and Partition Delta Tables
Fabric’s SQL analytics endpoint benefits from well-designed Delta tables. Partition large tables by date or region to reduce query scans:
# Example: Partitioned Delta table written from a Fabric notebook
# (workspace and lakehouse names below are placeholders)
df.write.format("delta") \
    .mode("overwrite") \
    .partitionBy("year", "month") \
    .save("abfss://<workspace>@onelake.dfs.fabric.microsoft.com/<lakehouse>.Lakehouse/Tables/sales_fact")
Partitioning allows the SQL analytics endpoint to prune partitions during queries, significantly reducing latency.
Caching and Materialized Views
For frequently accessed aggregations, consider creating pre-aggregated tables in Fabric. Note that a lakehouse’s SQL analytics endpoint is read-only, so build these in a notebook or a Fabric warehouse, for example:
CREATE TABLE revenue_by_month AS
SELECT
    DATETRUNC(month, order_date) AS month,
    SUM(revenue) AS total_revenue
FROM sales_fact
GROUP BY DATETRUNC(month, order_date)
Then query this aggregated table in Superset instead of the raw fact table. This reduces query time and Fabric capacity consumption.
Use Superset’s Caching
Superset includes built-in query result caching, configured in superset_config.py rather than through the UI:
- Define a cache backend (for example Redis) via the DATA_CACHE_CONFIG setting
- Set a default cache timeout (e.g., 1 hour)
- Optionally override the timeout per database, dataset, or chart
This prevents redundant queries to Fabric when multiple users view the same dashboard.
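Concretely, result caching lives in superset_config.py. A sketch assuming a Redis backend at a placeholder URL (flask-caching backends other than Redis work the same way):

```python
# superset_config.py -- cache chart/dashboard query results
# (sketch: the Redis URL is a placeholder for your environment)
DATA_CACHE_CONFIG = {
    "CACHE_TYPE": "RedisCache",           # flask-caching backend
    "CACHE_DEFAULT_TIMEOUT": 3600,        # cache results for one hour
    "CACHE_KEY_PREFIX": "superset_data_",
    "CACHE_REDIS_URL": "redis://localhost:6379/0",
}
```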
Limit Query Complexity
Avoid overly complex queries with many joins or subqueries. If a query takes more than 30 seconds to execute, consider:
- Breaking it into multiple smaller charts
- Pre-aggregating data in Fabric
- Using Superset’s drill-down filters to reduce result sets
Monitor Fabric Capacity
Fabric charges for query execution and data movement. Monitor your capacity usage:
- In Fabric, review usage in the Admin Portal, or install the Microsoft Fabric Capacity Metrics app for detailed per-item consumption
- Identify heavy queries and optimize them
- Consider increasing capacity if usage is consistently high
For architectural guidance on optimizing Fabric lakehouses, see the comprehensive guide to building lakehouse architecture with Microsoft Fabric.
AI-Powered Analytics: Text-to-SQL with Superset
One of the most powerful features of modern Superset deployments is AI-assisted query generation—text-to-SQL. This allows users to ask questions in natural language, and the system generates SQL automatically.
If you’re using a managed Superset platform like D23, text-to-SQL is often pre-configured. For self-hosted Superset, you can enable this feature by integrating with an LLM provider (OpenAI, Anthropic, etc.).
Example Text-to-SQL Workflow:
User: “Show me total revenue by product category for the last quarter”
Superset’s AI layer generates:
SELECT
    product_category,
    SUM(revenue) AS total_revenue
FROM sales_fact
WHERE order_date >= DATEADD(quarter, -1, DATETRUNC(quarter, GETDATE()))
  AND order_date < DATETRUNC(quarter, GETDATE())
GROUP BY product_category
ORDER BY total_revenue DESC
This query is then executed against your Fabric lakehouse and results are visualized. Text-to-SQL dramatically accelerates self-serve BI adoption because users don’t need SQL skills to explore data.
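Under the hood, text-to-SQL integrations typically send the user's question to the LLM together with the table schema so the model generates SQL against known columns. A simplified sketch of the prompt-construction step — the schema here is a placeholder, and real integrations vary by provider:

```python
def build_sql_prompt(question: str, tables: dict[str, list[str]]) -> str:
    """Compose an LLM prompt that embeds the lakehouse schema so the
    model can generate T-SQL against known tables and columns."""
    schema_lines = [
        f"- {table}({', '.join(columns)})" for table, columns in tables.items()
    ]
    return (
        "You are a T-SQL generator for a Microsoft Fabric lakehouse.\n"
        "Tables:\n" + "\n".join(schema_lines) + "\n"
        f"Question: {question}\n"
        "Return a single SELECT statement."
    )

prompt = build_sql_prompt(
    "Total revenue by product category for the last quarter",
    {"sales_fact": ["order_id", "order_date", "product_category", "revenue"]},
)
```

The generated SQL would then be executed through the same Fabric connection as any hand-written query.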
For more on how Superset compares to other BI tools in the context of AI integration, see the Preset blog comparing Apache Superset to Power BI, which covers Superset’s strengths in open-source flexibility and extensibility.
Security and Governance Considerations
When connecting Superset to Fabric lakehouses, security is paramount.
Service Principal Isolation
Create a dedicated service principal for Superset’s Fabric connection. This allows you to:
- Control exactly which lakehouse tables Superset can access
- Rotate credentials independently
- Audit queries to Fabric (all queries appear as coming from the service principal)
- Revoke access without affecting other applications
Row-Level Security (RLS)
Fabric supports row-level security on Delta tables. If you have sensitive data (e.g., sales data for specific regions), you can restrict access at the table level:
- In Fabric, configure RLS on the Delta table
- Map Superset users to Fabric security roles
- Queries automatically filter results based on the authenticated user
Note: This requires additional configuration and is more commonly used in Power BI. For Superset, consider applying filters at the dataset level instead.
Encryption and Network Security
- Ensure SSL/TLS encryption for all connections between Superset and Fabric (HTTPS for API calls, encrypted SQL connections)
- Use private endpoints if connecting from a private network
- Enable IP allowlisting on the Fabric SQL analytics endpoint if possible
Audit Logging
Both Superset and Fabric maintain audit logs:
- Superset logs all dashboard views, query executions, and user actions
- Fabric logs all data access and queries
Review these logs periodically to identify unusual activity or unauthorized access attempts.
Troubleshooting Common Connection Issues
“Connection Timeout” Error
This usually indicates a network issue:
- Verify the SQL analytics endpoint URL is correct
- Check that your firewall allows outbound HTTPS to Microsoft Fabric endpoints
- Ensure the lakehouse SQL analytics endpoint is not paused (it may auto-pause after inactivity)
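To separate network problems from authentication problems, first confirm that TCP port 1433 on the endpoint is reachable at all. A quick sketch (the hostname in the usage comment is a placeholder):

```python
import socket

def port_reachable(host: str, port: int = 1433, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Usage (sketch):
# port_reachable("contoso-analytics-lakehouse.westus2.fabric.microsoft.com")
```

If this returns False, fix firewall or DNS issues before debugging credentials.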
“Authentication Failed” Error
Double-check credentials:
- For service principal: Verify client ID, secret, and tenant ID
- For user auth: Confirm username (email or UPN) and password
- Ensure the service principal or user has workspace access in Fabric
“Table Not Found” Error
This means Superset can connect but can’t see tables:
- Click “Refresh” on the database connection to re-sync tables
- Verify tables exist in the lakehouse SQL analytics endpoint (you can query them directly in Fabric)
- Check that the service principal or user has SELECT permissions on the tables
Slow Query Performance
If dashboards load slowly:
- Check Fabric capacity usage (may be at limit)
- Optimize the underlying Delta table (add partitioning, remove unnecessary columns)
- Enable Superset query caching
- Break complex queries into multiple simpler queries
Integration with D23’s Managed Superset Platform
If you’re evaluating Superset for production use, D23 provides a managed platform that simplifies deployment and adds enterprise features.
D23 offers:
- Pre-configured Superset with AI/LLM integration for text-to-SQL and natural language dashboards
- API-first architecture for embedding analytics into products
- Managed hosting with automatic scaling, backups, and security updates
- Expert data consulting to help design schemas, optimize queries, and architect self-serve BI
- MCP server integration for connecting Superset to various data sources and LLMs
When using D23 with Fabric lakehouses, the Fabric connection is managed through D23’s database configuration interface, but the underlying process is identical to self-hosted Superset. D23 handles scaling, security, and performance optimization, freeing your team to focus on analytics rather than infrastructure.
For teams at scale-ups and mid-market companies, D23 eliminates the operational overhead of managing Superset while providing the flexibility of open-source BI without the per-seat costs of Looker, Tableau, or Power BI. Visit D23’s homepage to learn more about managed Superset and embedded analytics capabilities.
Real-World Example: Building a Sales Dashboard
Let’s walk through a concrete example: building a sales dashboard that pulls data from a Fabric lakehouse.
Scenario: A B2B SaaS company stores customer transactions in a Fabric lakehouse (via daily Dataflows from their transactional database). They want to build a dashboard showing revenue trends, top customers, and regional performance.
Step 1: Prepare Data in Fabric
In Fabric, they have Delta tables:
- customers (customer_id, customer_name, region, industry)
- orders (order_id, customer_id, order_date, amount, product_category)
Step 2: Connect Superset to Fabric
Following the steps above, they add their Fabric lakehouse as a database connection in Superset.
Step 3: Create Datasets
They create a dataset combining orders and customers via a join:
SELECT
o.order_id,
o.order_date,
o.amount,
o.product_category,
c.customer_name,
c.region,
c.industry
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
They define metrics: total_revenue (SUM of amount), order_count (COUNT of order_id), avg_order_value (AVG of amount).
Step 4: Build Charts
- Revenue by Month: Line chart, X-axis = order_date (truncated to month), Y-axis = total_revenue
- Top 10 Customers: Bar chart, X-axis = customer_name, Y-axis = total_revenue, sorted descending
- Revenue by Region: Pie chart, showing proportion of revenue by region
- Product Category Performance: Table showing product_category, total_revenue, order_count, avg_order_value
Step 5: Assemble Dashboard
They create a dashboard called “Sales Overview” and add all four charts. They add filters at the dashboard level:
- Date range filter (order_date)
- Region filter (multi-select)
- Industry filter (multi-select)
Users can now slice the data dynamically without touching SQL.
Step 6: Enable Embedding (Optional)
If the company wants to embed this dashboard in their product (e.g., showing customer-specific revenue in their customer portal), they use Superset’s embedding API. With D23’s managed platform, this is simplified—the platform provides pre-built embedding SDKs and handles authentication.
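Embedding relies on Superset's guest-token flow: the company's backend requests a short-lived token from POST /api/v1/security/guest_token/, scoped to one dashboard and optional row-level filters, and the frontend SDK exchanges it when rendering the embedded view. A sketch of the payload the backend would send — the dashboard UUID and RLS clause are placeholders:

```python
def guest_token_payload(dashboard_uuid: str, username: str, rls_clause: str) -> dict:
    """Request body for Superset's POST /api/v1/security/guest_token/ endpoint."""
    return {
        "user": {"username": username},
        "resources": [{"type": "dashboard", "id": dashboard_uuid}],
        "rls": [{"clause": rls_clause}],  # row-level filter applied to every query
    }

payload = guest_token_payload(
    "abc123-placeholder-uuid",   # embedded dashboard's UUID
    "portal-user-42",            # identity shown in Superset's audit logs
    "customer_id = 42",          # restrict data to the viewing customer
)
```

Scoping the RLS clause per viewer is what makes one dashboard safely serve many tenants.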
Advanced: Using Apache Spark with Superset and Fabric
For data engineering teams, Fabric notebooks allow you to run Apache Spark code directly against lakehouse data. While Superset can’t directly execute Spark code, you can use notebooks to create pre-aggregated tables that Superset queries:
Example Spark Notebook in Fabric:
# Read Delta table from lakehouse
df = spark.read.table("sales_fact")
# Complex transformation
result = df.groupBy("product_category", "region") \
.agg({"revenue": "sum", "order_id": "count"}) \
.withColumnRenamed("sum(revenue)", "total_revenue") \
.withColumnRenamed("count(order_id)", "order_count")
# Write aggregated result back to lakehouse
result.write.format("delta").mode("overwrite").option("mergeSchema", "true").saveAsTable("sales_summary")
Then in Superset, query sales_summary directly. This pattern allows data engineers to build complex transformations while analytics teams use Superset for visualization.
For more on Apache Superset’s capabilities and setup, see the dbt blog post explaining Apache Superset features and database connectivity.
Conclusion
Connecting Apache Superset to Microsoft Fabric lakehouses creates a modern, cost-effective analytics stack. Superset’s lightweight architecture and open-source flexibility pair well with Fabric’s lakehouse model, giving you a single source of truth for data while maintaining the agility to build dashboards, embed analytics, and empower self-serve BI.
The integration is straightforward: configure SQL analytics endpoint access, add the connection to Superset, import tables, and build dashboards. Performance and security follow standard practices—partition your Delta tables, use service principals for authentication, enable caching, and monitor Fabric capacity.
For teams looking to avoid the per-seat costs of Looker, Tableau, or Power BI while maintaining enterprise-grade analytics, this approach delivers significant value. If managing Superset infrastructure feels like overhead, managed platforms like D23 handle deployment, scaling, and AI integration, letting you focus on analytics rather than operations.
Whether you’re a data leader at a scale-up, an engineering team embedding analytics into your product, or a CTO evaluating open-source BI alternatives, Superset + Fabric lakehouses is a compelling, flexible, and cost-effective choice.