Migrating from Looker on AWS to Apache Superset
Complete guide to migrating from Looker on AWS to Apache Superset. Learn architecture, data mapping, and cost savings without vendor lock-in.
Understanding the Migration Landscape
Moving from Looker to Apache Superset represents a significant shift in how your organization approaches business intelligence. Unlike a simple platform upgrade, this migration involves rethinking your BI architecture, reconnecting data sources, and redefining user workflows. For teams running Looker on AWS, the good news is that Superset’s architecture aligns well with cloud-native deployments, and the underlying data warehouse connections remain largely unchanged.
The primary motivation for this migration typically centers on three factors: cost reduction, operational independence, and architectural flexibility. Looker’s licensing model charges per user seat and enforces platform overhead that scales with your organization. Apache Superset, by contrast, operates on an open-source model where you control deployment, scaling, and licensing entirely. When you host Superset on AWS using the same warehouse as your current Looker instance, you’re essentially replacing the visualization and querying layer while keeping your data infrastructure intact.
This guide walks through the complete migration process, from assessment through cutover, with practical guidance specific to AWS deployments. We’ll cover data mapping, dashboard recreation, user onboarding, and how to leverage D23’s managed Apache Superset platform to eliminate operational burden if you prefer not to manage infrastructure directly.
Assessing Your Current Looker Setup
Before moving a single dashboard, you need a complete inventory of your Looker environment. This assessment determines migration complexity, timeline, and resource requirements.
Documenting Your Looker Instance
Start by cataloging everything in your Looker instance:
- Total dashboard count and average complexity (number of tiles, filters, drill-down paths)
- Explore definitions and their underlying views and dimensions
- Scheduled reports and alerts that run on automation
- User counts broken down by role (viewers, explorers, developers)
- Custom code including derived tables, liquid parameters, and custom fields
- Access controls and row-level security (RLS) rules
- Connected data sources including databases, APIs, and external systems
- Custom visualizations and any Looker marketplace extensions
This inventory serves two purposes: it quantifies the work ahead, and it identifies which assets you’ll need to recreate versus which you can retire. Many organizations discover that 20-30% of their Looker dashboards see minimal traffic and can be archived rather than migrated.
Evaluating Data Warehouse Compatibility
Your data warehouse connection is the foundation of both Looker and Superset. Verify that Apache Superset supports your warehouse natively. Superset maintains official drivers for PostgreSQL, MySQL, Snowflake, BigQuery, Redshift, Athena, and dozens of others. If you’re running Looker against Redshift, PostgreSQL, or Snowflake on AWS, you’re in excellent shape—Superset has battle-tested drivers for all three.
Test connectivity from a test Superset instance to your warehouse using the same credentials and network configuration. This validates that your AWS security groups, VPC routing, and IAM policies will work with Superset before you commit to the migration.
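Before standing up a full Superset instance, a quick network-level reachability check can confirm that security groups and VPC routing allow traffic to the warehouse at all. Here is a minimal stdlib sketch; the Redshift endpoint in the comment is a placeholder, not a real host:

```python
import socket

def port_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout.

    This only validates network routing and security-group rules; a full
    driver-level test (credentials, TLS) should still follow in Superset itself.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example against a hypothetical Redshift endpoint (replace with your own):
# port_reachable("my-cluster.abc123.us-east-1.redshift.amazonaws.com", 5439)
```

If this returns False from the subnet where Superset will run, fix the security group or routing issue before debugging anything at the Superset layer.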
Understanding Your LookML Layer
LookML is Looker’s semantic modeling language. It defines how raw tables map to business dimensions, measures, and explores. Superset doesn’t have a direct LookML equivalent, but it offers multiple paths to replicate this functionality:
- Direct SQL in Superset charts: Write SQL queries directly, including CTEs and complex joins
- Database views: Create materialized or standard views in your warehouse that encapsulate LookML logic
- Semantic layer integration: Use Cube as an open-source semantic layer between Superset and your warehouse, which provides a middle ground between raw SQL and full LookML
- Superset’s native dataset abstraction: Build Superset datasets that function similarly to LookML explores
For most migrations, a combination of database views and Superset datasets provides the best balance of maintainability and performance. You’re not rewriting LookML one-to-one; you’re translating business logic into SQL views and Superset dataset definitions.
Designing Your Superset Architecture on AWS
Superset’s architecture differs from Looker’s, particularly in how it separates the metadata database from the application server. Understanding this design helps you plan a deployment that scales with your needs.
Core Architecture Components
A production Superset deployment on AWS consists of:
- Metadata database: A PostgreSQL or MySQL instance (typically RDS) that stores dashboard definitions, user accounts, dataset configurations, saved queries, and query history (cached query results themselves live in a separate cache backend such as Redis)
- Application servers: Stateless Superset containers running on ECS, EKS, or similar, handling the web UI and API requests
- Asynchronous task queue: Celery workers (with Redis backend) that execute long-running queries in the background
- Data warehouse connection: Your existing Redshift, RDS PostgreSQL, Snowflake, or Athena instance
- Object storage: S3 for storing exported reports, cached query results, and backup metadata
This architecture is more distributed than Looker’s, but it’s also more flexible. You can scale application servers independently from query processing, and you can deploy Superset across multiple availability zones for high availability.
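The components above come together in Superset's `superset_config.py`. The sketch below shows the two core settings, the metadata database URI and the Celery/Redis task queue; the endpoints and password are placeholders, and a real deployment would load credentials from a secrets store rather than hardcoding them:

```python
# superset_config.py -- minimal sketch; hostnames and password are placeholders.

# Metadata database (RDS PostgreSQL in this sketch)
SQLALCHEMY_DATABASE_URI = (
    "postgresql://superset_user:CHANGE_ME"
    "@metadata-db.example.us-east-1.rds.amazonaws.com:5432/superset"
)

# Celery task queue backed by Redis (e.g. ElastiCache), used by the
# asynchronous workers that run long queries in the background
class CeleryConfig:
    broker_url = "redis://redis.example.cache.amazonaws.com:6379/0"
    result_backend = "redis://redis.example.cache.amazonaws.com:6379/1"

CELERY_CONFIG = CeleryConfig
```

The same file is where caching, authentication, and feature flags are configured; later sections sketch those pieces.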
Choosing a Deployment Model
You have three main options for running Superset on AWS:
Self-managed on ECS/EKS: You build and maintain Docker images, manage RDS instances, configure Celery workers, and handle scaling policies. This gives you complete control but requires DevOps expertise. The official guide for deploying Superset on AWS ECS with Terraform provides a solid starting point, including custom image building and ECR integration.
Self-managed on EC2: You run Superset directly on EC2 instances with systemd or similar process managers. This is simpler than containerized deployment but less cloud-native and harder to scale horizontally.
Managed Superset platform: Services like D23 handle infrastructure, scaling, security patching, and backups, letting your team focus on data work rather than platform operations. This is particularly valuable if your team lacks DevOps bandwidth or wants to avoid the operational overhead of managing a BI platform.
For this migration guide, we’ll assume a self-managed ECS deployment, as it represents the most common choice for teams with existing AWS infrastructure. However, the data migration and dashboard recreation steps apply regardless of deployment model.
Networking and Security Considerations
When deploying Superset on AWS, ensure:
- VPC placement: Run Superset in the same VPC as your data warehouse (or use VPC peering) to minimize latency and avoid data exfiltration concerns
- Security groups: Allow inbound HTTPS traffic on port 443 from your users’ networks and outbound database access on your warehouse’s port (typically 5432 for PostgreSQL, 5439 for Redshift, 3306 for MySQL)
- IAM roles: Grant Superset’s ECS task role permissions to read from RDS (metadata database) and your data warehouse
- Encryption in transit: Use TLS for all connections, including database connections and internal service communication
- Secrets management: Store database credentials in AWS Secrets Manager, not in environment variables or config files
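One way to honor the last point is to resolve the metadata database credentials from Secrets Manager at startup. The sketch below separates the pure URI-building logic from the AWS call; the secret name is hypothetical, and the secret is assumed to use the key layout AWS generates for RDS secrets:

```python
import json

def build_sqlalchemy_uri(secret: dict) -> str:
    """Build a SQLAlchemy URI from a parsed Secrets Manager payload.

    Assumes the keys AWS commonly stores for RDS secrets:
    username, password, host, port, dbname.
    """
    return (
        f"postgresql://{secret['username']}:{secret['password']}"
        f"@{secret['host']}:{secret['port']}/{secret['dbname']}"
    )

def fetch_secret(secret_name: str, region: str = "us-east-1") -> dict:
    """Fetch and parse a JSON secret. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the URI helper works without AWS access
    client = boto3.client("secretsmanager", region_name=region)
    response = client.get_secret_value(SecretId=secret_name)
    return json.loads(response["SecretString"])

# In superset_config.py you might then write (secret name is hypothetical):
# SQLALCHEMY_DATABASE_URI = build_sqlalchemy_uri(fetch_secret("superset/metadata-db"))
```

The ECS task role needs `secretsmanager:GetSecretValue` on that secret for the lookup to succeed.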
Preparing Your Data Sources and Datasets
The transition from Looker’s semantic layer to Superset’s dataset model requires careful planning. This is where you decide how much of your LookML logic to preserve versus simplify.
Mapping Looker Explores to Superset Datasets
In Looker, an “explore” is a starting point for data analysis, combining a base view with related dimensions and measures. In Superset, the equivalent is a dataset—a SQL query (or table reference) plus a set of defined columns with metadata like data type, aggregation options, and formatting.
For each Looker explore you’re migrating:
- Identify the base view: This is typically a table or materialized view in your warehouse
- List all dimensions and measures: Document their names, data types, and any custom SQL or formatting
- Note relationships and joins: Looker explores often combine multiple views; you’ll need to express these as SQL joins in your Superset dataset query
- Check for derived tables and custom fields: These may need to be recreated as database views or Superset-level calculated columns
For example, if Looker has an explore called “orders” based on a view that joins orders, customers, and products tables, you’d create a Superset dataset with a SQL query like:
```sql
SELECT
  o.order_id,
  o.order_date,
  o.total_amount,
  c.customer_name,
  c.customer_segment,
  p.product_category,
  p.product_name
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
JOIN products p ON o.product_id = p.product_id;
```
Then, in Superset, you’d define columns for each field, set appropriate data types, and configure which columns are filterable, groupable, or aggregatable.
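If you are migrating many explores, creating each dataset by hand in the UI gets tedious. Superset's REST API exposes a `POST /api/v1/dataset/` endpoint; a sketch of the request payload for a SQL-based (virtual) dataset follows. The `sql` field for virtual datasets is supported in recent Superset versions, so check the API docs for the version you deploy, and note that the dataset name and database ID here are examples:

```python
def virtual_dataset_payload(database_id: int, name: str, sql: str) -> dict:
    """Build the body for POST /api/v1/dataset/ creating a SQL-based dataset.

    database_id is the numeric ID Superset assigned to your warehouse
    connection; name becomes the dataset's display name.
    """
    return {"database": database_id, "table_name": name, "sql": sql}

ORDERS_SQL = """
SELECT o.order_id, o.order_date, o.total_amount,
       c.customer_name, c.customer_segment,
       p.product_category, p.product_name
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
JOIN products p ON o.product_id = p.product_id
"""

payload = virtual_dataset_payload(database_id=1, name="orders_enriched", sql=ORDERS_SQL)
# POST this payload to {SUPERSET_URL}/api/v1/dataset/ with a bearer token;
# authentication against the API is covered in the API section of this guide.
```

Column metadata (data types, groupable/filterable flags) can then be refined in the UI or via follow-up API calls.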
Creating Database Views for Reusable Logic
If your LookML includes complex derived tables or frequently-reused calculations, create database views in your warehouse rather than embedding them in every Superset query. This approach:
- Centralizes business logic in one place
- Makes it easier to update calculations across multiple dashboards
- Improves query performance by pushing computation to the warehouse
- Simplifies Superset dataset definitions
For instance, if Looker has a derived table that calculates monthly cohort retention, create a materialized view in Redshift or PostgreSQL:
```sql
CREATE MATERIALIZED VIEW cohort_retention AS
SELECT
  DATE_TRUNC('month', first_order_date) AS cohort_month,
  DATE_TRUNC('month', order_date) AS order_month,
  COUNT(DISTINCT customer_id) AS customers
FROM customer_orders
GROUP BY 1, 2;
```
Then reference this view directly in your Superset dataset, keeping the definition simple and maintainable.
Handling Row-Level Security (RLS)
Looker’s RLS capabilities let you restrict data access based on user attributes. Superset supports RLS through a combination of:
- Row-level security rules: Define SQL predicates that filter data based on logged-in user properties
- Dataset-level filters: Apply automatic filters to specific datasets for certain user roles
- Database-level security: Leverage your warehouse’s native RLS if available (Snowflake, PostgreSQL, etc.)
When migrating RLS rules from Looker, map each rule to a Superset RLS configuration. For example, if Looker restricts regional managers to their own region’s data and your warehouse has a user-to-region mapping table (here called user_regions, an assumed name), create a Superset RLS rule with a clause such as:
Clause: "region" IN (SELECT region FROM user_regions WHERE username = '{{ current_username() }}')
Superset appends this predicate to every query against the protected dataset, so each logged-in user automatically sees only their region’s rows. Here current_username() is one of Superset’s built-in Jinja macros; current_user_id() and current_user_email() are also available.
Migrating Dashboards and Charts
Dashboard recreation is the most time-intensive part of migration. However, the process is straightforward once you understand Superset’s chart types and configuration options.
Assessing Dashboard Complexity
Not all dashboards are worth migrating. Evaluate each dashboard based on:
- Usage frequency: Check Looker’s usage logs to identify dashboards viewed at least monthly
- Business criticality: Prioritize dashboards used for decision-making, reporting, or operational monitoring
- Complexity: Estimate the effort to recreate, considering number of charts, filters, and drill-down interactions
- Audience size: Focus on dashboards used by multiple teams or departments
Create a migration backlog, prioritizing high-usage, critical dashboards. Plan to recreate roughly 70-80% of your Looker dashboards; the remaining 20-30% are often lightly used or outdated and can be archived.
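To make the backlog ordering less subjective, some teams score each dashboard from its usage stats and estimated effort. The helper below is purely illustrative; the weights and the example dashboards are invented, so tune them to your own inventory:

```python
def migration_priority(monthly_views: int, critical: bool, chart_count: int) -> float:
    """Illustrative triage score: usage and criticality push priority up,
    recreation effort (approximated by chart count) pushes it down.
    The weights are arbitrary assumptions, not a standard formula."""
    effort = max(chart_count, 1)  # rough proxy for recreation time
    score = monthly_views * (2.0 if critical else 1.0) / effort
    return round(score, 2)

# Hypothetical backlog entries scored and ranked
dashboards = [
    ("Exec revenue overview", migration_priority(400, True, 8)),
    ("Ops on-call monitor", migration_priority(900, True, 12)),
    ("Legacy campaign report", migration_priority(5, False, 20)),
]
ranked = sorted(dashboards, key=lambda d: d[1], reverse=True)
```

Dashboards at the bottom of the ranking are your archive candidates.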
Understanding Superset’s Chart Types
Superset offers a rich set of visualization types, though not all map directly to Looker’s visualizations. Here’s a quick mapping:
| Looker Visualization | Superset Equivalent |
|---|---|
| Table | Table |
| Number/Single Value | Big Number |
| Bar Chart | Bar Chart |
| Line Chart | Line Chart |
| Scatter Plot | Scatter Plot |
| Map | Map (Deck.gl) |
| Funnel | Funnel Chart |
| Pivot Table | Pivot Table |
| Gauge | Gauge Chart |
| Custom Visualization | Custom Plugin (requires development) |
Most standard charts translate directly. Custom Looker visualizations require either finding a Superset equivalent or developing a custom Superset plugin.
Dashboard Recreation Workflow
For each dashboard you’re migrating:
- Create a new Superset dashboard and note its name and purpose
- Recreate each chart by:
- Selecting the appropriate dataset or writing a custom SQL query
- Choosing the visualization type
- Configuring dimensions (row/grouping columns) and measures (aggregated columns)
- Setting filters and drill-down interactions
- Add dashboard-level filters that apply to multiple charts
- Configure drill-down and cross-filtering interactions
- Test all filters and interactions to ensure they work as expected
- Compare visually with the original Looker dashboard to verify accuracy
This process typically takes 30-60 minutes per moderately complex dashboard. Simple dashboards with 3-5 charts may take 15 minutes; complex dashboards with 15+ charts and intricate filtering may take 2+ hours.
Leveraging Superset’s Advanced Features
While recreating dashboards, take advantage of Superset capabilities that may exceed Looker’s functionality:
- Text-to-SQL with AI: D23 integrates AI-powered text-to-SQL capabilities, allowing users to generate queries by typing natural language questions. This can reduce the need for pre-built charts and empower self-serve analysis
- Dashboard parameters: Use Superset’s native parameters to create flexible, reusable dashboard templates
- Alerts and reports: Configure automatic alerts when metrics cross thresholds, and schedule dashboard exports to email
- Custom CSS and themes: Style dashboards to match your organization’s branding
Handling Metadata Migration
Metadata includes user accounts, permissions, dashboard definitions, and dataset configurations. Superset stores all metadata in its backend database (RDS PostgreSQL or MySQL).
Backing Up and Transferring Metadata
Before migrating, back up your Superset metadata database following official best practices. This is critical before any migration or upgrade:
- Snapshot your RDS instance (or create a manual backup)
- Export database contents using `pg_dump` or `mysqldump`, for example: `pg_dump -h your-rds-endpoint.us-east-1.rds.amazonaws.com -U superset_user superset_db > superset_backup.sql`
- Store the backup in S3 with versioning enabled
For a fresh Superset deployment, you won’t import Looker’s metadata directly. Instead, you’ll recreate dashboards in Superset using the recreation workflow above. However, if you’re upgrading an existing Superset instance, use the official upgrade documentation to migrate metadata safely.
User and Permission Management
Superset uses role-based access control (RBAC) with predefined roles:
- Admin: Full access to all dashboards, datasets, and configuration
- Alpha: Can create and edit dashboards and datasets
- Gamma: Can view dashboards and explore data through existing datasets
- Public: Can access public dashboards without logging in
Map your Looker user roles to Superset roles:
- Looker admins → Superset admins
- Looker developers → Superset alphas
- Looker viewers → Superset gammas
For user provisioning, integrate Superset with your identity provider (Okta, Azure AD, Google Workspace) using SAML or OAuth. This allows users to log in with existing credentials and automatically assigns roles based on group membership.
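With OAuth or SAML in place, the role mapping above can be automated in `superset_config.py` via Flask-AppBuilder's role-mapping settings, so IdP group membership assigns Superset roles at login. The group names below are placeholders for whatever your identity provider sends:

```python
# superset_config.py -- SSO role-mapping sketch; group names are placeholders.
# In a real config you would also enable OAuth itself, e.g.:
#   from flask_appbuilder.security.manager import AUTH_OAUTH
#   AUTH_TYPE = AUTH_OAUTH
#   OAUTH_PROVIDERS = [...]

AUTH_USER_REGISTRATION = True          # auto-create users on first login
AUTH_USER_REGISTRATION_ROLE = "Gamma"  # default role for unmapped users

# Map IdP groups to Superset roles (Flask-AppBuilder's AUTH_ROLES_MAPPING)
AUTH_ROLES_MAPPING = {
    "looker-admins": ["Admin"],
    "looker-developers": ["Alpha"],
    "looker-viewers": ["Gamma"],
}
AUTH_ROLES_SYNC_AT_LOGIN = True        # re-sync roles on every login
```

Keeping the default registration role at Gamma means a misconfigured group mapping fails safe, granting view-only access rather than edit rights.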
Managing the Cutover
The transition from Looker to Superset requires careful planning to minimize disruption and maintain data access during the switch.
Parallel Running Period
Run both systems in parallel for 2-4 weeks before decommissioning Looker. This allows:
- Users to familiarize themselves with Superset’s interface
- Validation that Superset dashboards match Looker’s in accuracy and performance
- Time to address issues and refine dashboards
- Confidence that critical reports are working correctly
During this period, clearly communicate which dashboards are available in Superset and which remain in Looker. Update any documentation, bookmarks, or embedded links to point to Superset.
Testing and Validation
Before full cutover, validate that:
- All critical dashboards are available in Superset with correct data
- Query performance is acceptable: Compare query times between Looker and Superset for the same underlying queries
- Filters and interactions work correctly: Test all dashboard filters, drill-downs, and cross-filtering
- RLS is enforced properly: Verify that users see only data they’re authorized to access
- Scheduled reports and alerts execute on schedule
- API integrations (if any) work with Superset’s API
Create a test plan document and have a representative from each user group sign off on validation before proceeding to cutover.
Decommissioning Looker
Once you’re confident in Superset, schedule Looker decommissioning:
- Set a cutover date and communicate it broadly
- Disable Looker access for all non-admin users 1-2 weeks before cutover
- Export any remaining reports or analyses from Looker for archival
- Cancel Looker licenses and cloud resources
- Document the migration for future reference
Keep Looker available for admins for 1-2 weeks post-cutover as a safety net, in case you need to reference old dashboards or troubleshoot issues.
Optimizing Performance and Costs
One of the primary benefits of migrating to Superset is cost reduction. However, achieving those savings requires thoughtful optimization.
Query Performance Tuning
Superset’s performance depends on your warehouse’s performance. Optimize queries by:
- Using appropriate aggregation levels: Avoid selecting all rows when you need only daily or hourly aggregates
- Creating indexes on frequently-filtered columns (date ranges, customer IDs, regions)
- Leveraging materialized views for complex calculations used in multiple dashboards
- Configuring query caching: Superset caches query results; set appropriate TTLs based on data freshness requirements
- Using result backends: Store large query results in Redis or S3 to avoid re-running expensive queries
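The caching and TTL points above map to a pair of dictionaries in `superset_config.py`. Superset uses Flask-Caching, so the dicts follow its configuration format; the Redis endpoint and TTL values below are placeholders to adapt to your freshness requirements:

```python
# superset_config.py -- caching sketch; the Redis endpoint is a placeholder.

ONE_HOUR = 60 * 60

# Cache for metadata and chart-rendering state
CACHE_CONFIG = {
    "CACHE_TYPE": "RedisCache",
    "CACHE_DEFAULT_TIMEOUT": ONE_HOUR,
    "CACHE_KEY_PREFIX": "superset_meta_",
    "CACHE_REDIS_URL": "redis://redis.example.cache.amazonaws.com:6379/2",
}

# Cache for query results -- tune the TTL to how fresh the data must be
DATA_CACHE_CONFIG = {
    **CACHE_CONFIG,
    "CACHE_DEFAULT_TIMEOUT": 6 * ONE_HOUR,
    "CACHE_KEY_PREFIX": "superset_data_",
}
```

Distinct key prefixes let you flush one cache (say, stale query results) without evicting the other.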
Cost Comparison: Looker vs. Superset
For a mid-market organization with 100 users, here’s a typical cost breakdown:
Looker on AWS:
- Licenses: 100 users × $2,000/user/year = $200,000
- AWS infrastructure (hosted Looker): ~$5,000-10,000/month = $60,000-120,000/year
- Total: $260,000-320,000/year
Superset on AWS:
- RDS PostgreSQL (metadata): ~$500-1,000/month = $6,000-12,000/year
- ECS/EKS infrastructure (application servers): ~$1,000-2,000/month = $12,000-24,000/year
- Redis (Celery backend): ~$200-500/month = $2,400-6,000/year
- Data warehouse (unchanged): Same as before
- Total: $20,400-42,000/year for BI platform (excluding warehouse)
For teams without in-house DevOps expertise, D23’s managed Superset service typically costs $5,000-15,000/month depending on scale, which still represents 50-70% savings versus Looker while eliminating operational overhead.
Avoiding Common Performance Pitfalls
- Don’t query raw tables: Always aggregate in the warehouse or use materialized views
- Avoid SELECT * queries: Explicitly select needed columns
- Don’t cache indefinitely: Set appropriate TTLs for cached results
- Monitor query execution: Use Superset’s query logs to identify slow queries
Leveraging AI and Advanced Features in Superset
Unlike Looker, Superset’s open-source nature and modern architecture make it easier to integrate AI and advanced analytics capabilities.
Text-to-SQL and Natural Language Queries
D23 integrates AI-powered text-to-SQL capabilities that let users ask questions in plain English and automatically generate SQL queries. This feature:
- Reduces dependency on pre-built dashboards
- Enables ad-hoc analysis without SQL knowledge
- Accelerates time-to-insight for exploratory questions
- Complements your dataset definitions with intelligent query generation
Text-to-SQL works best when your datasets are well-documented with clear column names and descriptions. During migration, invest time in creating meaningful dataset metadata.
API-First Architecture
Superset’s comprehensive REST API enables:
- Embedded analytics: Embed dashboards and charts directly in your product or internal applications
- Programmatic dashboard creation: Build dashboards via API rather than UI
- Third-party integrations: Connect Superset with Slack, Teams, or other tools for automated reporting
- Custom applications: Build data applications on top of Superset’s data layer
If you were embedding Looker dashboards in your product, Superset’s API provides equivalent (and often superior) flexibility.
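As a concrete starting point, authenticating against the API means POSTing credentials to `/api/v1/security/login` to obtain a bearer token, then sending that token on subsequent calls. The sketch below separates the reusable payload/header builders from the network call; the base URL is a placeholder for your own deployment:

```python
import json
from urllib import request

def login_payload(username: str, password: str) -> dict:
    """Body for POST /api/v1/security/login on a Superset instance."""
    return {"username": username, "password": password,
            "provider": "db", "refresh": True}

def auth_header(access_token: str) -> dict:
    """Headers for authenticated Superset API calls."""
    return {"Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json"}

def list_dashboards(base_url: str, token: str) -> list:
    """GET /api/v1/dashboard/ -- requires network access to your instance."""
    req = request.Request(f"{base_url}/api/v1/dashboard/",
                          headers=auth_header(token))
    with request.urlopen(req) as resp:
        return json.load(resp)["result"]

# Usage against a hypothetical deployment:
# 1. POST login_payload("admin", "...") to
#    https://superset.example.com/api/v1/security/login to get access_token
# 2. dashboards = list_dashboards("https://superset.example.com", access_token)
```

For embedded analytics specifically, Superset also supports guest tokens scoped to individual dashboards, which avoids handing full API credentials to your product's frontend.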
MCP Server Integration
Superset can be integrated with Model Context Protocol (MCP) servers, enabling:
- Semantic layer connections: Link Superset to semantic layer tools like Cube or dbt
- Custom data connectors: Build integrations with proprietary data sources
- Workflow automation: Trigger external systems based on dashboard interactions
User Training and Adoption
A successful migration requires more than technical preparation; your users need to understand and embrace Superset.
Creating Training Materials
Develop documentation covering:
- Getting started: How to log in, navigate dashboards, and run basic filters
- Dashboard-specific guides: For critical dashboards, document what each chart shows and how to interpret it
- Self-serve analysis: How to create new charts and dashboards (for alpha users)
- Common tasks: Exporting data, scheduling reports, sharing dashboards
- Troubleshooting: What to do if a dashboard isn’t loading or a filter isn’t working
Provide both written guides and video walkthroughs. Record screen captures showing common workflows.
Conducting Training Sessions
Hold live training sessions for different user groups:
- Admins and alphas: Deep dive into dataset creation, RLS configuration, and advanced features
- Business users: Focus on navigating dashboards, using filters, and interpreting results
- Executives: Brief overview of available reports and how to access them
Schedule sessions at times convenient for different time zones and departments. Record sessions for asynchronous viewing.
Establishing Support Channels
Set up clear channels for users to ask questions and report issues:
- Slack channel: For quick questions and peer support
- Email support: For detailed issues requiring investigation
- Office hours: Regular sessions where your team is available to help
- Feedback form: Allow users to suggest improvements
Respond quickly to issues during the parallel running period to build confidence in Superset.
Post-Migration Operations
After cutover, your focus shifts to maintaining and optimizing Superset.
Monitoring and Alerting
Set up monitoring for:
- Application health: Monitor ECS task health, error rates, and response times
- Database health: Monitor RDS CPU, connections, and storage
- Query performance: Track slow queries and set alerts for queries exceeding thresholds
- Cache hit rates: Monitor Superset’s cache effectiveness
Use CloudWatch for AWS-native monitoring and consider tools like Datadog or New Relic for comprehensive observability.
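For the ECS health item above, a CloudWatch alarm on service CPU is a typical first alert. The helper below builds the keyword arguments for boto3's `put_metric_alarm`; the cluster and service names are placeholders, and the period/threshold choices are assumptions to tune:

```python
def ecs_cpu_alarm(cluster: str, service: str, threshold: float = 80.0) -> dict:
    """Kwargs for boto3 cloudwatch.put_metric_alarm() on ECS service CPU.

    Names and thresholds here are illustrative; adjust them for your
    deployment, and add an AlarmActions SNS topic to actually get notified.
    """
    return {
        "AlarmName": f"{service}-cpu-high",
        "Namespace": "AWS/ECS",
        "MetricName": "CPUUtilization",
        "Dimensions": [
            {"Name": "ClusterName", "Value": cluster},
            {"Name": "ServiceName", "Value": service},
        ],
        "Statistic": "Average",
        "Period": 300,            # 5-minute evaluation windows
        "EvaluationPeriods": 3,   # alarm after 15 minutes over threshold
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
    }

# To create the alarm (requires boto3 and AWS credentials):
# import boto3
# boto3.client("cloudwatch").put_metric_alarm(
#     **ecs_cpu_alarm("superset-cluster", "superset-web"))
```

Analogous alarms on RDS `CPUUtilization` and `DatabaseConnections` cover the metadata database.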
Regular Maintenance
- Update Superset regularly: Follow the official upgrade documentation to stay current with security patches and new features
- Archive old dashboards: Periodically review and archive dashboards with low usage
- Optimize datasets: Review dataset queries and optimize those with poor performance
- Clean up cache: Periodically clear expired cache entries to maintain performance
Continuous Improvement
- Gather user feedback: Regularly ask users what’s working well and what could improve
- Monitor usage patterns: Identify which dashboards are most valuable and which are unused
- Iterate on dashboards: Refine dashboards based on user feedback and changing business needs
- Explore new features: As Superset evolves, evaluate new capabilities that could benefit your organization
Conclusion: Charting Your Path Forward
Migrating from Looker on AWS to Apache Superset is a significant undertaking, but it’s entirely achievable with proper planning and execution. The migration path is clear: assess your current setup, design your Superset architecture, recreate dashboards, validate thoroughly, and cut over systematically.
The benefits are substantial. You’ll reduce BI platform costs by 50-70%, eliminate vendor lock-in, gain architectural flexibility, and access modern capabilities like AI-powered text-to-SQL and embedded analytics. Your data warehouse connection remains stable throughout—you’re replacing the visualization layer, not your data infrastructure.
For organizations seeking to minimize operational burden, D23’s managed Apache Superset platform provides a middle ground: all the benefits of Superset with the operational simplicity of a managed service. Whether you choose self-managed or managed deployment, the core migration process remains the same.
Start with your assessment, build your migration backlog, and tackle dashboards in priority order. Involve your users early, train thoroughly, and run parallel systems long enough to build confidence. With this approach, you’ll successfully transition to Superset while maintaining data access and user satisfaction throughout the process.
The migration is an opportunity to reassess your BI strategy, eliminate unused dashboards, and establish better data governance practices. Use this moment to build a more efficient, flexible, and cost-effective analytics platform for your organization.