Multi-Tenant Apache Superset: Patterns That Work at Scale
Running Apache Superset at scale for multiple customers, business units, or product teams means one thing: isolation. Not just logical separation in a dashboard folder, but genuine data isolation where one tenant’s query can’t leak into another’s, where performance degradation in one tenant doesn’t cascade across your infrastructure, and where compliance requirements—GDPR, HIPAA, SOC 2—don’t become architectural nightmares.
This is the reality D23 encounters with every managed Superset deployment. We’ve helped data teams at scale-ups, mid-market companies, and PE portfolio firms build multi-tenant analytics platforms that don’t collapse under the weight of their own success. The patterns that work aren’t always obvious, and the wrong choices early on can force painful refactors later.
This guide walks through the architectural decisions that matter: how to isolate data, which tenancy patterns scale, where RLS fits, and when to add a semantic layer. We’ll skip the marketing speak and focus on what actually works in production.
Understanding Multi-Tenancy in Analytics Platforms
Multi-tenancy in Apache Superset means multiple customers, business units, or product teams share the same Superset instance while remaining completely isolated from one another. This differs sharply from single-tenant deployments, where each customer gets their own Superset instance.
The appeal is straightforward: one Superset cluster serves dozens or hundreds of tenants, reducing operational overhead, infrastructure costs, and complexity. But the tradeoff is real: you accept architectural complexity in exchange for that operational simplicity. When done poorly, multi-tenancy becomes a liability. When done right, it’s the only way to scale.
At its core, multi-tenancy in Superset involves three layers of isolation:
Application-level isolation happens inside Superset itself. Users, roles, and permissions determine who sees what dashboard, chart, or dataset. This is native to Superset and works well for small teams but breaks down when you need hard data isolation.
Database-level isolation physically separates data at the database layer using separate databases, schemas, or tables per tenant. This is where most of the heavy lifting happens.
Query-level isolation ensures that even if a tenant somehow accesses the underlying database, they can only query their own data. This is where row-level security (RLS) and query rewriting come in.
Each layer has trade-offs. The more isolation you add, the more operational complexity you inherit. The less isolation you add, the greater the risk of data leakage or performance issues.
The Three Tenancy Patterns and When to Use Them
There are three primary patterns for isolating tenant data in Superset. Your choice depends on your tenant count, data volume, compliance requirements, and operational tolerance.
Pattern 1: Separate Databases Per Tenant
This is the nuclear option for isolation. Each tenant gets their own complete database. Tenant A connects to postgres-tenant-a, Tenant B to postgres-tenant-b, and so on.
Advantages:
- Complete isolation. One tenant’s database can’t interfere with another’s.
- Compliance is straightforward. You can encrypt, backup, and audit each database independently.
- Performance is predictable. Tenant A’s query spike doesn’t affect Tenant B.
- Scaling is linear. Add a new tenant, add a new database.
Disadvantages:
- Operational overhead explodes. You’re now managing dozens or hundreds of databases instead of one.
- Cost per tenant is high. Each database needs its own compute, storage, and backup.
- Superset connection management becomes complex. You need dynamic connection pooling and routing logic.
- Data sharing across tenants becomes impossible without ETL.
When to use it: Separate databases work best when you have fewer than 20 tenants, each with significant data volume, high compliance requirements, or a need for complete independence. Private equity firms standardizing analytics across portfolio companies often use this pattern because each portfolio company needs genuine separation.
Implementing this requires Superset to dynamically select the correct database connection based on the logged-in user. You’ll use Superset’s database connection management to register each tenant’s database, then write custom middleware or use Superset’s role-based access control to route queries to the correct connection.
User logs in as tenant_a_user
↓
Superset middleware identifies tenant_a
↓
Queries route to postgres-tenant-a connection
↓
Data isolation is guaranteed
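The routing step in that flow can start as a simple lookup. A minimal sketch, assuming a static tenant registry and the postgres-tenant-&lt;id&gt; connection naming above (a production version would resolve the tenant from the authenticated user's attributes, not a username prefix):

```python
# Hypothetical tenant registry: tenant ID -> Superset database connection name.
TENANT_CONNECTIONS = {
    "tenant_a": "postgres-tenant-a",
    "tenant_b": "postgres-tenant-b",
}

def resolve_connection(username: str) -> str:
    """Map a logged-in user to their tenant's database connection.

    Assumes usernames are prefixed with the tenant ID, e.g. 'tenant_a_user'.
    """
    for tenant_id, connection in TENANT_CONNECTIONS.items():
        if username.startswith(tenant_id + "_"):
            return connection
    # Fail closed: no registered tenant means no data access.
    raise PermissionError(f"No tenant connection registered for {username!r}")
```

Failing closed matters here: an unmapped user should get an error, never a default connection.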
The operational cost is real. You need infrastructure-as-code (Terraform, CloudFormation) to spin up databases, monitoring for each database, backup strategies per database, and security group management. But if your tenants are large enough to justify it, the isolation is bulletproof.
Pattern 2: Shared Database with Separate Schemas
This is the middle ground. All tenants share one database, but each gets their own schema. Tenant A’s data lives in schema_a, Tenant B’s in schema_b.
Advantages:
- Much simpler than separate databases. One database to manage, monitor, and back up.
- Still provides strong isolation. Schema-level permissions prevent cross-tenant access.
- Cheaper than separate databases. One database cluster instead of many.
- Easier to add new tenants. Create a schema, grant permissions, done.
- Data sharing is possible. You can create views that aggregate across schemas if needed.
Disadvantages:
- Single database is a shared resource. A runaway query in one tenant’s schema can affect others.
- Scaling is harder. Eventually, one database becomes a bottleneck.
- Backup and recovery are less granular. You back up the entire database, not per-tenant.
- Schema management becomes a DevOps task. You need automation to create and manage schemas.
When to use it: Schema-per-tenant works best when you have 10-100 tenants, each with moderate data volume, and you can tolerate some shared resource contention. Most SaaS analytics platforms that aren’t massive use this pattern.
Implementation is cleaner than separate databases. In Superset, you register one database connection, but your datasets and tables are scoped to specific schemas. You then use Superset’s RLS features to ensure users can only access their schema.
Database: analytics_prod
├── schema_tenant_a (tables: users, orders, events)
├── schema_tenant_b (tables: users, orders, events)
└── schema_tenant_c (tables: users, orders, events)
Each tenant sees identical table structures but different data. This is where Superset’s dataset management shines. You can create a template dataset for users, then duplicate it per schema, or use dynamic dataset creation.
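Dynamic dataset creation is scriptable against Superset's REST API. A sketch that builds the request bodies for POST /api/v1/dataset/, one per table in a tenant's schema (the table list mirrors the example above; send each payload with any HTTP client, authenticated against /api/v1/security/login):

```python
def dataset_payloads(database_id: int, schema: str,
                     tables=("users", "orders", "events")) -> list[dict]:
    """Request bodies for Superset's POST /api/v1/dataset/ endpoint.

    One payload per table; the schema field is what scopes each dataset
    to a single tenant.
    """
    return [
        {"database": database_id, "schema": schema, "table_name": table}
        for table in tables
    ]
```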
Pattern 3: Shared Database and Schema with Row-Level Security
This is the most resource-efficient pattern. All tenants share one database and one schema. Isolation happens at the row level using RLS.
Advantages:
- Minimal operational overhead. One database, one schema, one set of tables.
- Cheapest per tenant. Maximum density.
- Simplest to add new tenants. Just add new rows to your data.
- Easiest to share data across tenants. Aggregate views work naturally.
Disadvantages:
- RLS is complex to implement correctly. One mistake leaks data.
- Performance requires careful query optimization. RLS filters can slow queries significantly.
- Debugging is harder. You need to trace which RLS filters applied to which query.
- Compliance is trickier. You can’t easily isolate one tenant’s data for audit or backup.
When to use it: RLS-based tenancy works best when you have many tenants (100+), each with small to moderate data volumes, and you can invest in proper RLS implementation and testing. SaaS analytics tools with thousands of small customers use this pattern.
In Superset, RLS is implemented through the Row Level Security feature, which filters rows based on user attributes. When Tenant A’s user queries the orders table, Superset automatically adds a WHERE tenant_id = 'A' filter. This happens transparently to the user.
Table: orders
├── tenant_id | user_id | order_id | amount
├── A | 1 | 101 | 500
├── A | 2 | 102 | 300
├── B | 3 | 103 | 1000
└── B | 4 | 104 | 200
Tenant A user queries "SELECT * FROM orders"
↓
Superset applies RLS: "SELECT * FROM orders WHERE tenant_id = 'A'"
↓
User sees only rows 101 and 102
The risk is high if RLS is misconfigured. A missing filter, a poorly written condition, or a cached query can leak data. This is why RLS requires rigorous testing and monitoring.
Database and Schema Isolation in Practice
Assuming you’re using the schema-per-tenant pattern (the most common for mid-market), here’s how to structure your databases and schemas for multi-tenancy.
Schema Naming Conventions
Consistency matters. Use a naming convention that makes tenant identity obvious and enables automation.
schema_tenant_{tenant_id}
schema_acme_corp
schema_customer_12345
Or if you prefer to separate data from metadata:
data.tenant_a
data.tenant_b
metadata.tenant_a
metadata.tenant_b
The convention you choose should be automatable. Your provisioning scripts need to create schemas, grant permissions, and set up backups based on this naming pattern.
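As a sketch of what that automation looks like, here is a generator that turns a tenant ID into provisioning DDL under the schema_tenant_{tenant_id} convention (the role name and the exact grant set are illustrative, not prescriptive):

```python
def provisioning_sql(tenant_id: str) -> list[str]:
    """DDL to provision one tenant: schema, role, and least-privilege grants."""
    schema = f"schema_tenant_{tenant_id}"
    role = f"superset_tenant_{tenant_id}"
    return [
        f"CREATE SCHEMA {schema};",
        f"CREATE ROLE {role} LOGIN;",
        f"GRANT USAGE ON SCHEMA {schema} TO {role};",
        # Tables don't exist yet at provisioning time, so grant on future
        # tables instead of ALL TABLES IN SCHEMA.
        f"ALTER DEFAULT PRIVILEGES IN SCHEMA {schema} GRANT SELECT ON TABLES TO {role};",
    ]
```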
Permission and Grant Strategy
Once schemas are created, you need to grant permissions carefully. The principle is least privilege: each Superset connection should only access its own schema.
Create a database user per tenant:
CREATE USER superset_tenant_a WITH PASSWORD 'secure_password';
GRANT CONNECT ON DATABASE analytics_prod TO superset_tenant_a;
GRANT USAGE ON SCHEMA schema_tenant_a TO superset_tenant_a;
GRANT SELECT ON ALL TABLES IN SCHEMA schema_tenant_a TO superset_tenant_a;
Then in Superset, register a separate database connection for each tenant using that user. This way, even if a malicious user somehow gains access to the connection string, they can only query their own schema.
Alternatively, if you want to minimize the number of database connections, use a shared user but apply RLS at the table level:
ALTER TABLE schema_tenant_a.orders ENABLE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation ON schema_tenant_a.orders
FOR SELECT USING (tenant_id = current_setting('app.tenant_id'));
This is more complex but reduces connection overhead. Note that PostgreSQL has no built-in notion of a "current tenant": the application must set the app.tenant_id session variable on each connection before any query runs.
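One concrete way to wire the shared-user approach is a PostgreSQL session variable that the policy reads via current_setting. This is an illustration, not built-in Superset behavior; the variable name app.tenant_id and the helper names are assumptions:

```python
def session_setup_sql(tenant_id: str) -> str:
    """Statement to run when a connection is handed to a tenant's session.

    set_config(..., false) scopes the value to the session; behind a
    transaction-pooling proxy, pass true to scope it per transaction.
    """
    safe = tenant_id.replace("'", "''")  # minimal quoting for the literal
    return f"SELECT set_config('app.tenant_id', '{safe}', false);"

def policy_ddl(schema: str, table: str) -> str:
    """Matching policy: every SELECT is filtered on the session's tenant."""
    return (
        f"ALTER TABLE {schema}.{table} ENABLE ROW LEVEL SECURITY;\n"
        f"CREATE POLICY tenant_isolation ON {schema}.{table}\n"
        f"    FOR SELECT USING (tenant_id = current_setting('app.tenant_id'));"
    )
```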
Data Model Consistency
When using schema-per-tenant, each schema should have identical table structures. This lets you use Superset datasets as templates.
Create a master schema with your canonical data model:
schema_template/
├── users
├── orders
├── events
└── products
Then provision new tenants by cloning this schema. PostgreSQL has no CREATE SCHEMA ... LIKE, so clone it table by table:
CREATE SCHEMA schema_tenant_new;
CREATE TABLE schema_tenant_new.users (LIKE schema_template.users INCLUDING ALL);
-- repeat for orders, events, and products (or script the loop)
Or use a migration tool like Flyway or Liquibase to version-control your schema and apply migrations consistently across all tenant schemas.
Row-Level Security: The Right Way
RLS is powerful but dangerous. When implemented carelessly, it’s a data leakage vector. When implemented well, it’s transparent to users and bulletproof.
How RLS Works in Superset
Superset’s RLS feature filters query results based on user attributes. You define rules that say “this user can only see rows where tenant_id = their_tenant_id.”
In Superset’s UI, you navigate to Security → Row Level Security and create rules:
Table: orders
Clause: tenant_id = {TENANT_ID}
Role: tenant_user
Here {TENANT_ID} is pseudo-syntax for a value Superset substitutes at query time; in a real deployment the clause is written with Superset’s Jinja templating (for example, a custom macro that reads the logged-in user’s tenant attribute). The user’s tenant ID comes from their user attributes in Superset.
Setting User Attributes for RLS
For RLS to work, each user needs attributes that identify their tenant. This is typically done via LDAP, SAML, or custom authentication.
When a user logs in via SAML, their tenant ID is extracted from the SAML assertion:
<saml:Attribute Name="tenant_id" NameFormat="urn:oasis:names:tc:SAML:2.0:attrname-format:basic">
<saml:AttributeValue>tenant_a</saml:AttributeValue>
</saml:Attribute>
Superset’s authentication backend parses this and stores it as a user attribute. Then, when the user queries a table with RLS enabled, Superset injects the filter.
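The parsing step is small but worth getting right, because a missing attribute must fail the login rather than default to some tenant. A sketch, assuming your SAML library has already parsed the assertion into the usual {name: [values]} dict (as python3-saml does):

```python
def tenant_from_saml(attributes: dict) -> str:
    """Extract the tenant ID from parsed SAML assertion attributes.

    Raises instead of defaulting: a user with no tenant_id attribute
    should never be mapped to any tenant's data.
    """
    values = attributes.get("tenant_id", [])
    if not values:
        raise ValueError("SAML assertion is missing the tenant_id attribute")
    return values[0]
```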
RLS Filter Patterns
Simple RLS:
WHERE tenant_id = {TENANT_ID}
Multi-level tenancy (e.g., enterprise customer with multiple departments):
WHERE tenant_id = {TENANT_ID} AND department_id = {DEPARTMENT_ID}
Hierarchical access (managers see their team’s data):
WHERE tenant_id = {TENANT_ID} AND (owner_id = {USER_ID} OR manager_id = {USER_ID})
Time-based access (view only recent data):
WHERE tenant_id = {TENANT_ID} AND created_at > NOW() - INTERVAL '90 days'
The key is that RLS filters are applied transparently. The user doesn’t write them; Superset does.
RLS Testing and Validation
This is non-negotiable. RLS misconfiguration is a critical security issue.
Test matrix:
- User A queries Tenant A data → should see all rows
- User A queries Tenant B data → should see zero rows
- User A queries aggregated data → should only aggregate Tenant A’s rows
- User A queries via API → should only see Tenant A’s data
- User A exports data → should only export Tenant A’s data
Write automated tests that verify each scenario. Use Superset’s API to programmatically query as different users and validate results.
# Test: a user from Tenant A cannot see Tenant B data
response = superset_client.query(
    dataset_id=orders_dataset,
    user='tenant_a_user',
    sql="SELECT COUNT(*) FROM orders WHERE tenant_id = 'tenant_b'",
)
assert response.count == 0, "RLS filter failed"
Scaling Considerations: From Dozens to Thousands of Tenants
As your tenant count grows, new challenges emerge. The patterns that work for 10 tenants break at 100. The patterns that work at 100 break at 1,000.
Connection Pooling and Management
If you’re using separate database users per tenant (separate databases or schema-per-tenant with per-user access), you need robust connection pooling.
Each Superset worker maintains a connection pool to the database. With 100 tenants, that’s potentially 100 connections per worker. With 10 workers, that’s 1,000 connections to your database. Most databases have connection limits (PostgreSQL defaults to 100).
Use PgBouncer or similar connection pooling middleware to multiplex connections:
Superset Worker 1 → PgBouncer → PostgreSQL (1 connection)
Superset Worker 2 → PgBouncer → PostgreSQL (1 connection)
Superset Worker 3 → PgBouncer → PostgreSQL (1 connection)
PgBouncer maintains a pool of connections to PostgreSQL and reuses them across Superset requests. This reduces the load on your database.
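A minimal pgbouncer.ini for this topology might look like the following; the pool sizes are illustrative and should be tuned against your PostgreSQL max_connections:

```ini
; [databases] maps the name Superset connects to onto the real server.
[databases]
analytics_prod = host=db.internal port=5432 dbname=analytics_prod

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
; transaction pooling gives the best multiplexing, but it does not
; preserve session state (SET variables, prepared statements) between
; transactions -- relevant if your RLS relies on session variables.
pool_mode = transaction
default_pool_size = 20
max_client_conn = 1000
```

Point Superset's SQLAlchemy URI at port 6432 instead of PostgreSQL directly.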
Query Caching and Invalidation
Caching is critical for multi-tenant performance. But cached data must be tenant-aware.
Superset’s caching layer (Redis by default) caches query results. But if Tenant A’s query and Tenant B’s query are structurally identical, they might hit the same cache key.
Ensure your cache keys include the tenant ID:
cache_key = f"query_{dataset_id}_{query_hash}_{tenant_id}"
Also, when a tenant’s data changes, you need to invalidate their cache entries without affecting other tenants. This is typically done via ETL hooks or Superset’s cache invalidation API.
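Both concerns fit in a few lines. A sketch, assuming redis-py and the cache key shape above (the query_ prefix and the trailing tenant ID are this article's convention, not Superset's default):

```python
import hashlib

def tenant_cache_key(dataset_id: int, sql: str, tenant_id: str) -> str:
    """Cache key that cannot collide across tenants, even for identical SQL."""
    query_hash = hashlib.sha256(sql.encode()).hexdigest()[:16]
    return f"query_{dataset_id}_{query_hash}_{tenant_id}"

def invalidate_tenant(redis_client, tenant_id: str) -> int:
    """Drop only one tenant's cached results, e.g. after their ETL load."""
    deleted = 0
    for key in redis_client.scan_iter(match=f"query_*_{tenant_id}"):
        redis_client.delete(key)
        deleted += 1
    return deleted
```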
Monitoring and Observability
With multiple tenants, you need visibility into which tenant is consuming resources.
Instrument your Superset instance to track:
- Queries per tenant
- Query latency per tenant
- Cache hit rate per tenant
- Data volume per tenant
- API calls per tenant
Use these metrics to identify noisy neighbors (one tenant consuming disproportionate resources) and enforce rate limits or quota.
@app.before_request
def check_tenant_quota():
    tenant_id = get_current_tenant()
    # Redis returns bytes or None; normalize before comparing
    queries_today = int(redis.get(f"queries_{tenant_id}_{date.today()}") or 0)
    if queries_today > TENANT_QUERY_LIMIT:
        return {"error": "Query quota exceeded"}, 429
Semantic Layers and Multi-Tenancy
As your Superset instance grows, maintaining consistent metrics and dimensions across tenants becomes difficult. This is where semantic layers come in.
A semantic layer is a centralized definition of metrics, dimensions, and business logic that sits between your raw data and your BI tool. Instead of each dashboard defining “revenue” differently, the semantic layer defines it once.
Tools like dbt provide semantic layer functionality through dbt Semantic Layer, which integrates with Superset via the MCP (Model Context Protocol) server. This allows Superset to query your dbt models directly, ensuring consistency.
In a multi-tenant context, your semantic layer should include tenant-aware models:
-- dbt model: fct_orders.sql
SELECT
order_id,
tenant_id,
user_id,
amount,
created_at
FROM {{ source('raw', 'orders') }}
WHERE tenant_id = '{{ env_var("TENANT_ID") }}'
When you query this model from Superset, dbt injects the correct tenant ID automatically.
API-First Tenancy and Embedded Analytics
If you’re embedding Superset dashboards into your product (a common pattern for product teams and SaaS companies), you need API-first tenancy.
Superset’s API allows you to programmatically create dashboards, charts, and datasets. In a multi-tenant context, you can use the API to provision each tenant’s analytics environment automatically.
When a new customer signs up:
- Create a schema for them in your database
- Use Superset’s API to create datasets pointing to their schema
- Use Superset’s API to create template dashboards
- Generate a guest token for embedded access
# Provision a new tenant
response = superset_client.create_database(
    database_name=f"tenant_{tenant_id}",
    sqlalchemy_uri="postgresql://user:pass@localhost/analytics_prod",
)
# Create a dataset scoped to the tenant's schema
# (schema selection belongs on the dataset, not in the connection URI)
dataset = superset_client.create_dataset(
    database_id=response['id'],
    table_name='orders',
    schema=f'schema_{tenant_id}',
)
# Generate a guest token for embedding
token = superset_client.create_guest_token(
    user={
        "username": f"guest_{tenant_id}",
        "attributes": {"tenant_id": tenant_id},
    }
)
Guest tokens allow temporary, limited access to Superset dashboards without requiring user accounts. This is how you embed dashboards in your product without exposing your Superset instance.
AI and Text-to-SQL in Multi-Tenant Contexts
AI-powered analytics (text-to-SQL, natural language queries) adds complexity to multi-tenancy. When a user asks “show me revenue by region,” the AI needs to understand which tables are relevant, which columns exist, and which tenant owns the data.
D23’s MCP server for analytics solves this by providing tenant-aware context to AI models. The MCP server exposes your Superset instance’s metadata (databases, schemas, tables, columns) in a way that LLMs can understand, while automatically filtering for the current tenant.
When Tenant A’s user asks a natural language question, the MCP server only exposes Tenant A’s tables and columns to the AI model. The AI generates SQL, and Superset executes it with RLS filters applied.
This requires:
- Tenant-aware metadata exposure (MCP server filters metadata by tenant)
- Context injection (AI model knows the current tenant)
- RLS enforcement (even AI-generated queries are filtered)
Compliance and Multi-Tenancy
Multi-tenancy introduces compliance challenges. GDPR, HIPAA, and SOC 2 all require data isolation and audit trails.
GDPR and Data Residency
If your tenants are in different regions, you might need to store their data in region-specific databases. This is especially true for EU customers (GDPR) or healthcare organizations (HIPAA).
You can extend the schema-per-tenant pattern to include region-per-database:
PostgreSQL (US East)
├── schema_tenant_a_us
├── schema_tenant_b_us
PostgreSQL (EU Frankfurt)
├── schema_tenant_c_eu
├── schema_tenant_d_eu
Superset’s database connection management handles this. Register region-specific connections and route queries based on tenant location.
Audit Logging
Multi-tenancy requires comprehensive audit logging. Every query, export, and dashboard view should be logged with the tenant ID.
Superset has built-in query logging. Enable it and ensure logs include the tenant ID:
{
"timestamp": "2024-01-15T10:30:00Z",
"tenant_id": "tenant_a",
"user_id": "user_123",
"query": "SELECT * FROM orders WHERE...",
"latency_ms": 245,
"rows_returned": 1000
}
Store these logs in a separate, immutable data store (e.g., S3 with Object Lock enabled, or a write-once audit database).
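Emitting records in that shape is straightforward; a sketch (the field names follow the example above):

```python
import json
from datetime import datetime, timezone

def audit_record(tenant_id: str, user_id: str, query: str,
                 latency_ms: int, rows_returned: int) -> str:
    """Serialize one query event in the audit log shape shown above."""
    return json.dumps({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "tenant_id": tenant_id,
        "user_id": user_id,
        "query": query,
        "latency_ms": latency_ms,
        "rows_returned": rows_returned,
    })
```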
Encryption and Data Protection
With multiple tenants, encryption strategies matter.
Encryption in transit: Use TLS for all connections between Superset and databases, and between clients and Superset. This is standard and non-negotiable.
Encryption at rest: If your database supports it (PostgreSQL with pgcrypto, AWS RDS with encryption), enable it. This protects data if someone gains unauthorized access to storage.
Application-level encryption: For highly sensitive data, encrypt at the application level before storing. This adds complexity but provides an additional layer of protection.
Key management: Use a key management service (AWS KMS, HashiCorp Vault) to manage encryption keys. Never hardcode keys in configuration.
Common Pitfalls and How to Avoid Them
Pitfall 1: Insufficient RLS Testing
Problem: RLS is configured but not thoroughly tested. A configuration error leaks data.
Solution: Write automated tests for every RLS rule. Test positive cases (user sees their data), negative cases (user can’t see other tenants’ data), and edge cases (users with multiple roles, hierarchical access).
Pitfall 2: Shared Cache Keys Across Tenants
Problem: Tenant A’s cached query is returned to Tenant B because cache keys don’t include tenant ID.
Solution: Always include tenant ID in cache keys. Review Superset’s caching configuration and customize cache key generation if needed.
Pitfall 3: Connection Pool Exhaustion
Problem: With many tenants and many Superset workers, database connections are exhausted, causing queries to hang.
Solution: Use connection pooling middleware (PgBouncer, Pgpool). Monitor connection usage and set appropriate pool sizes. Implement connection timeouts and circuit breakers.
Pitfall 4: Noisy Neighbor Problem
Problem: One tenant runs expensive queries that slow down the entire Superset instance, affecting all other tenants.
Solution: Implement query timeouts, resource quotas per tenant, and query queue management. Use Superset’s query execution limits. Monitor slow queries and identify problematic tenants.
Pitfall 5: Inconsistent Data Models Across Tenants
Problem: Each tenant’s schema has slightly different table structures, making it impossible to use shared datasets or templates.
Solution: Use version-controlled schema definitions (Flyway, Liquibase) to ensure all tenant schemas are identical. Automate schema provisioning.
Operational Patterns for Multi-Tenant Superset
Beyond architecture, operations matter. Here’s how successful teams run multi-tenant Superset in production.
Provisioning New Tenants
Automation is essential. When a new tenant signs up, you need to:
- Create database schema or database
- Set up permissions and grants
- Create Superset datasets
- Create template dashboards
- Configure RLS rules
- Set up monitoring and alerting
Write Terraform or CloudFormation templates that automate all of this:
resource "aws_db_instance" "tenant_db" {
  identifier = "tenant-${var.tenant_id}"
  engine     = "postgres"
  # ...
}
# Assumes a community Superset provider; there is no official one.
resource "superset_database" "tenant" {
  database_name  = "tenant-${var.tenant_id}"
  sqlalchemy_uri = "postgresql://.../${aws_db_instance.tenant_db.endpoint}"
}
Backup and Disaster Recovery
With multiple tenants, backup strategy is critical. You need to back up data per tenant and be able to restore individual tenants without affecting others.
If using separate databases per tenant, this is straightforward. Each database has its own backup schedule.
If using schema-per-tenant, you need to back up individual schemas:
pg_dump --schema=schema_tenant_a analytics_prod > tenant_a_backup.sql
pg_dump --schema=schema_tenant_b analytics_prod > tenant_b_backup.sql
Test restore procedures regularly. A backup is only useful if you can restore from it.
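The per-schema dumps above are easy to script; a sketch assuming pg_dump is on PATH and connection settings come from the standard PG* environment variables:

```python
import subprocess

def dump_command(schema: str, database: str = "analytics_prod") -> list[str]:
    """Build the pg_dump invocation for one tenant schema."""
    return ["pg_dump", "--schema", schema,
            "--file", f"{schema}_backup.sql", database]

def backup_all(schemas: list[str]) -> None:
    """Dump each tenant schema to its own file, sequentially."""
    for schema in schemas:
        subprocess.run(dump_command(schema), check=True)
```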
Updates and Migrations
Updating Superset or changing your database schema affects all tenants. Plan carefully.
For Superset upgrades, test in a staging environment with a subset of tenants. Use blue-green deployments to minimize downtime.
For database schema changes (adding columns, new tables), apply changes to all tenant schemas in a controlled manner. Use migration tools to ensure consistency.
flyway migrate -schemas=schema_tenant_a,schema_tenant_b,schema_tenant_c
Choosing Your Tenancy Pattern: A Decision Framework
Which pattern should you choose? Here’s a framework:
Choose separate databases if:
- Fewer than 20 tenants
- Each tenant has significant data volume (>100GB)
- Strong compliance requirements (HIPAA, PCI-DSS)
- Tenants need complete independence
- You can afford the operational overhead
Examples: Private equity firms managing portfolio companies, large enterprise customers with their own infrastructure requirements.
Choose schema-per-tenant if:
- 10-100 tenants
- Moderate data volume per tenant (1GB-100GB)
- Standard compliance requirements (GDPR, SOC 2)
- Some data sharing across tenants is acceptable
- You want to balance isolation and operational simplicity
Examples: Most SaaS analytics platforms, mid-market companies with multiple business units, venture capital firms tracking portfolio companies.
Choose RLS if:
- 100+ tenants
- Small to moderate data volume per tenant (<1GB)
- Minimal compliance requirements
- Maximum resource efficiency is critical
- You have strong engineering practices around RLS testing
Examples: High-volume SaaS platforms with thousands of small customers, consumer analytics platforms.
In practice, many organizations use a hybrid approach. Core tenants get separate databases. Mid-tier tenants share a database with schemas. Small tenants use RLS. This maximizes both isolation and efficiency.
Real-World Example: Multi-Tenant Analytics for a PE Firm
Consider a private equity firm managing 15 portfolio companies. Each company needs its own analytics platform but the firm wants a single Superset instance for operational simplicity.
The team chooses schema-per-tenant because:
- 15 tenants is manageable
- Each portfolio company has moderate data volume (5-50GB)
- SOC 2 compliance is required
- Some cross-company analysis is valuable
Architecture:
PostgreSQL analytics_prod
├── schema_portfolio_company_1
│ ├── tables: users, orders, events, products
│ └── RLS: tenant_id = 'company_1'
├── schema_portfolio_company_2
│ ├── tables: users, orders, events, products
│ └── RLS: tenant_id = 'company_2'
└── ... (13 more schemas)
Superset Instance
├── Database connection: analytics_prod
├── Datasets: users, orders, events, products (per schema)
├── Dashboards: KPI Dashboard, Cohort Analysis, etc.
└── RLS Rules: Filter by tenant_id
Authentication
├── SAML (company's identity provider)
├── User attributes: tenant_id, department
└── RLS filters applied at query time
Provisioning a new portfolio company:
- Create schema_portfolio_company_16 in PostgreSQL
- Run migration scripts to create tables
- Load sample data
- Create Superset datasets pointing to new schema
- Assign RLS rules to new datasets
- Create dashboard from template
- Add company’s users to Superset with correct SAML attributes
Monitoring:
- Query count per company (identify heavy users)
- Query latency per company (identify slow queries)
- Data volume per company (track growth)
- Cache hit rate (optimize performance)
Compliance:
- Audit logs include company ID
- Backups per schema (can restore individual companies)
- Encryption at rest (PostgreSQL)
- TLS for all connections
This setup scales to 50+ portfolio companies before needing significant changes. At that point, the team might move to separate database clusters per region or implement more sophisticated query routing.
Conclusion: Multi-Tenancy as a Strategic Choice
Multi-tenant Apache Superset isn’t an afterthought. It’s a strategic architectural decision that affects everything: your database design, your security model, your operational overhead, and your ability to scale.
The patterns that work—separate databases for isolation, schemas for balance, RLS for density—each have clear trade-offs. There’s no one-size-fits-all answer. Your choice depends on your tenant count, data volume, compliance requirements, and operational maturity.
What matters most is making the choice intentionally and implementing it consistently. A well-designed multi-tenant Superset instance can serve dozens of tenants with minimal operational overhead. A poorly designed one becomes a security and performance liability.
If you’re building multi-tenant analytics at scale, start with D23, which handles the architectural complexity for you. D23’s managed Superset platform includes multi-tenancy by design, with built-in isolation, RLS management, and AI-powered analytics through MCP integration.
But whether you build it yourself or use a managed service, understand the patterns. Test thoroughly. Monitor relentlessly. And never assume your RLS configuration is secure until you’ve proven it.