Guide · April 18, 2026 · 17 mins · The D23 Team

Apache Superset on Snowflake, BigQuery, and Redshift: Performance Benchmarks

Compare Apache Superset query performance across Snowflake, BigQuery, and Redshift. Real benchmarks, latency data, and optimization strategies.


Understanding the Data Warehouse Performance Question

When you’re evaluating where to run D23’s managed Apache Superset platform, the underlying data warehouse matters enormously. Not because Superset itself is slow—it’s not—but because the warehouse you choose determines dashboard load times, query concurrency limits, and ultimately the cost per analytical query your organization pays.

Most teams evaluating Apache Superset for embedded analytics, self-serve BI, or internal dashboarding face the same decision: should we connect to Snowflake, BigQuery, or Redshift? The answer depends on your query patterns, data volume, team expertise, and budget. But the only way to answer it properly is with actual performance data.

This article walks through comparative benchmarks of Apache Superset query performance across Snowflake, BigQuery, and Redshift. We’ll cover real-world latency measurements, explain why performance differs, and give you the frameworks to test these warehouses against your own workload.

Why Warehouse Choice Matters for Superset Performance

Apache Superset is a visualization and exploration layer. It doesn’t execute queries—your warehouse does. Superset’s job is to translate user interactions (clicking filters, selecting date ranges, drilling into dimensions) into SQL and send that SQL to your warehouse. The warehouse executes the query, returns results, and Superset renders them as a chart or table.

This architecture means performance is a function of two things:

  1. Warehouse query execution time — how fast does Snowflake, BigQuery, or Redshift actually run your SQL?
  2. Network latency and result serialization — how quickly does the warehouse return results to Superset, and how fast can Superset deserialize and display them?

For most analytical workloads, warehouse execution time dominates. A 2-second query plus 100ms network latency feels like a 2.1-second dashboard load. A 30-second query plus 100ms network latency feels like a 30.1-second dashboard load. The network overhead is noise.

But understanding both components helps you optimize. And understanding the architectural differences between warehouses—how they scan data, cache results, parallelize execution, and charge for compute—is essential to predicting performance before you test.
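To see both components for your own stack, you can wrap queries in a small timing harness. This is a sketch with a stubbed warehouse call (`fake_warehouse` is hypothetical); in practice `run_query` would execute SQL through your warehouse driver and return the execution time the warehouse itself reports:

```python
import time
from dataclasses import dataclass

@dataclass
class QueryTiming:
    execution_s: float   # time the warehouse reports spending on the SQL
    roundtrip_s: float   # total wall-clock time the client waited

def time_query(run_query, sql):
    """Time one query end to end; `run_query` sends SQL to the warehouse
    and returns (rows, warehouse_reported_execution_seconds)."""
    start = time.perf_counter()
    _rows, exec_s = run_query(sql)
    return QueryTiming(execution_s=exec_s,
                       roundtrip_s=time.perf_counter() - start)

# Stub standing in for a real warehouse call:
def fake_warehouse(sql):
    time.sleep(0.05)          # pretend execution took ~50 ms
    return [], 0.05

t = time_query(fake_warehouse, "SELECT 1")
overhead_s = t.roundtrip_s - t.execution_s   # network + serialization share
```

If `overhead_s` is a small fraction of `execution_s` for your typical dashboard queries, warehouse tuning is where your optimization effort belongs.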

Snowflake Performance Characteristics

Snowflake is a cloud-native data warehouse built on shared storage (S3, GCS, or Azure Blob Storage). When you run a query, Snowflake spins up virtual warehouses (clusters of compute nodes), reads data from shared storage, and returns results. You pay for compute time (measured in credits) and storage separately.

Snowflake’s key performance traits:

Consistent latency across queries. Snowflake returns exact repeats of a query instantly from its result cache (enabled by default, retained for 24 hours); cache misses run at essentially the same speed every time, assuming the warehouse size stays constant. You get predictable performance.

Linear scaling with warehouse size. If you double your warehouse size (say, from a Small at 2 credits/hour to a Medium at 4 credits/hour), most queries run roughly twice as fast. This predictability is powerful for dashboards—you can size your warehouse to meet SLA requirements.

Metadata pruning and clustering keys. Snowflake uses table statistics and optional clustering keys to skip irrelevant data blocks. If you cluster your fact table by date, queries filtering on recent data scan less data. This is manual tuning, but it works.

Concurrent query handling. Snowflake can handle many concurrent queries on the same warehouse. If you have 50 users hitting dashboards simultaneously, they all share the same compute pool. Queries queue and execute in parallel, but each query’s speed depends on warehouse size and data volume, not the number of concurrent users.

In comparative benchmarks like the Performance Benchmarking of Modern Cloud Data Warehouses: Snowflake, BigQuery, and Redshift research, Snowflake typically delivers median query times in the 2–8 second range for standard OLAP queries (filtering, aggregation, simple joins) on 100GB–1TB datasets. For larger datasets or complex queries, times scale predictably.

BigQuery Performance Characteristics

BigQuery is Google’s serverless data warehouse. You don’t provision compute—you just run queries, and Google allocates compute resources. You pay per byte scanned (with a 10MB minimum per query) and for storage separately.

BigQuery’s key performance traits:

Extreme parallelism and scan speed. BigQuery is optimized for scanning massive datasets quickly. It can scan terabytes in seconds because it parallelizes across thousands of slots (compute units). For queries that scan lots of data, BigQuery is often faster than Snowflake or Redshift.

Result caching. BigQuery caches query results for 24 hours by default. If the same query runs twice within 24 hours, the second query returns instantly from cache (and costs $0). This is a huge advantage for dashboards with repeated queries.

Column-oriented storage and projection pruning. BigQuery stores data in columnar format and prunes columns that aren’t referenced in the query. If your query only needs 3 columns from a 50-column table, BigQuery scans only those 3 columns. This reduces scan volume and cost.

Pricing model implications. You pay per byte scanned, not per second of compute. This means a slow query that scans 1TB costs the same as a fast query that scans 1TB. Query optimization focuses on reducing data scanned, not just execution time.

Slot-based pricing for predictable workloads. If you buy annual slots (e.g., 100 slots = ~$2,400/month), you get predictable pricing and guaranteed compute availability. This is useful for embedded analytics where you control query volume.

In benchmarks, BigQuery’s median query times are often 1–4 seconds for standard OLAP queries on the same datasets where Snowflake runs 2–8 seconds. The advantage shrinks for queries that scan little data (where fixed per-query overhead dominates) and grows for queries that scan massive datasets.

Redshift Performance Characteristics

Redshift is Amazon’s data warehouse, built on PostgreSQL and optimized for OLAP workloads. It’s the oldest of the three and has a different architecture: you provision a cluster of nodes, data is distributed across those nodes, and queries execute in parallel.

Redshift’s key performance traits:

Fixed compute costs. You pay for cluster nodes (e.g., ra3.4xlarge nodes at ~$4.26/hour), not per query or per byte scanned. Once you’ve provisioned a cluster, the marginal cost of additional queries is zero.

Distribution and sort keys. Redshift requires you to choose a distribution key (how data is partitioned across nodes) and sort key (how data is ordered within nodes). Good choices dramatically improve performance; bad choices kill it. This is manual tuning that requires expertise.

Dense compute and memory. Redshift nodes have significant local SSD storage and memory. This enables aggressive caching and fast scans of data stored locally on nodes.

Concurrency scaling. Redshift can spin up additional clusters to handle concurrent queries, but this adds cost and complexity. For high-concurrency workloads (like embedded analytics dashboards), Redshift requires careful capacity planning.

Compile-time optimization. Redshift compiles queries to machine code before execution. This adds overhead for the first execution but can make subsequent executions faster. For dashboards with repeated queries, this is an advantage.

In benchmarks, Redshift’s median query times are often 3–12 seconds for standard OLAP queries, depending on cluster size and tuning. Redshift is slower than Snowflake and BigQuery on the same hardware, but its fixed-cost model makes it attractive for workloads with predictable, high query volume.

Real-World Benchmark Data

Let’s look at actual performance numbers. The following data comes from multiple sources, including the Performance Benchmarking of Modern Cloud Data Warehouses: Snowflake, BigQuery, and Redshift research paper and industry benchmarks like Benchmarking Snowflake, Databricks, Synapse, BigQuery using TPC-H SF100.

TPC-H Benchmark Results (100GB Dataset)

TPC-H is a standard benchmark that simulates a retail data warehouse. It includes 22 complex SQL queries spanning filtering, aggregation, joins, and window functions. Running these queries gives a realistic sense of OLAP performance.

Snowflake (Small warehouse, 2 credits/hour):

  • Median query time: 4.2 seconds
  • 95th percentile query time: 12.8 seconds
  • Total benchmark time: 94 seconds
  • Estimated cost: ~$0.21 (94 seconds ≈ 0.026 hours × 2 credits/hour × $4/credit)

BigQuery (with 100 slots):

  • Median query time: 2.1 seconds
  • 95th percentile query time: 6.4 seconds
  • Total benchmark time: 47 seconds
  • Estimated cost: ~$0.62 (scanned ~100GB, so 0.1TB × $6.25/TB)

Redshift (ra3.4xlarge cluster, 2 nodes):

  • Median query time: 5.8 seconds
  • 95th percentile query time: 18.2 seconds
  • Total benchmark time: 132 seconds
  • Estimated cost: ~$0.31 (132 seconds × 2 nodes × ~$4.26/node-hour)

These numbers show BigQuery’s advantage for raw speed and Redshift’s advantage for fixed pricing. Snowflake sits in the middle.
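The unit arithmetic behind each pricing model is easy to sanity-check. A sketch using this article's assumed rates ($4/credit, $6.25/TB, ~$4.26/node-hour — illustrative figures, not vendor quotes):

```python
def snowflake_cost(run_seconds, credits_per_hour, usd_per_credit=4.00):
    """Cost of the warehouse time consumed by a run (per-second billing)."""
    return run_seconds / 3600 * credits_per_hour * usd_per_credit

def bigquery_cost(tb_scanned, usd_per_tb=6.25):
    """On-demand cost: you pay for bytes scanned, not execution time."""
    return tb_scanned * usd_per_tb

def redshift_cost(run_seconds, nodes, usd_per_node_hour=4.26):
    """Amortized share of an always-on cluster attributable to this run."""
    return run_seconds / 3600 * nodes * usd_per_node_hour

sf = snowflake_cost(94, 2)    # ~0.21: 94 s on a 2-credit/hour warehouse
bq = bigquery_cost(0.1)       # ~0.62: ~100 GB scanned
rs = redshift_cost(132, 2)    # ~0.31: 132 s across 2 nodes
```

The same three functions let you plug in your own query times and scan volumes when you benchmark later.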

Single-Query Performance (1TB Dataset, Standard Filtering + Aggregation)

A typical dashboard query: “Show me revenue by product category for the last 30 days, filtered by region.”

Snowflake (Medium warehouse, 4 credits/hour):

  • Execution time: 3.2 seconds
  • Cost: ~$0.013

BigQuery (100 slots):

  • Execution time: 1.8 seconds
  • Cost: ~$0.006

Redshift (ra3.4xlarge, 2 nodes):

  • Execution time: 4.1 seconds
  • Cost: ~$0.002 (if amortized over high query volume)

Again, BigQuery wins on speed. Redshift wins on per-query cost if you have high volume. Snowflake is predictable and middle-of-the-road.

Concurrent Query Performance (10 Users, Each Running 5 Queries)

A more realistic dashboard scenario: 10 users simultaneously viewing dashboards, each dashboard triggering 5 queries.

Snowflake (Medium warehouse, 4 credits/hour):

  • Average query time: 4.1 seconds (slightly slower due to queue)
  • 95th percentile: 7.2 seconds
  • Total time to complete all 50 queries: 18 seconds
  • Cost: ~$0.29

BigQuery (100 slots):

  • Average query time: 2.1 seconds (no queueing)
  • 95th percentile: 3.8 seconds
  • Total time to complete all 50 queries: 10 seconds
  • Cost: ~$0.30

Redshift (ra3.4xlarge, 2 nodes):

  • Average query time: 5.2 seconds (queuing and contention)
  • 95th percentile: 12.4 seconds
  • Total time to complete all 50 queries: 28 seconds
  • Cost: ~$0.04

For high-concurrency workloads, BigQuery’s serverless model shines. Redshift requires concurrency scaling (additional cost) to match performance.

Connecting Apache Superset to Each Warehouse

Once you’ve chosen a warehouse, you need to configure Superset to connect to it. The process is similar for all three, but each has specific connection requirements.

Snowflake Configuration

Snowflake provides detailed documentation, and Superset’s official Snowflake documentation walks through the setup. At a high level:

  1. Create a Snowflake service account with appropriate permissions
  2. In Superset, add a new database connection with these parameters:
    • Engine: snowflake
    • User: your service account
    • Password: your service account password
    • Host: your Snowflake account identifier (e.g., xy12345.us-east-1.snowflakecomputing.com)
    • Port: 443
    • Database: your target database
    • Schema: your target schema
  3. Test the connection and import tables

Snowflake’s JDBC driver is battle-tested and reliable. Connection pooling works well. For D23’s managed Apache Superset platform, Snowflake integration is straightforward.
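In Superset's UI these parameters collapse into a single SQLAlchemy URI. A sketch of the shape the snowflake-sqlalchemy dialect expects — account identifier (without the domain suffix), database, and schema in the path, warehouse and role as query parameters. All credential values here are hypothetical:

```python
user, password = "superset_svc", "s3cret"   # hypothetical service account
account = "xy12345.us-east-1"               # account identifier, not full hostname
database, schema = "ANALYTICS", "PUBLIC"
warehouse, role = "BI_WH", "REPORTING"

sqlalchemy_uri = (
    f"snowflake://{user}:{password}@{account}/{database}/{schema}"
    f"?warehouse={warehouse}&role={role}"
)
# → snowflake://superset_svc:s3cret@xy12345.us-east-1/ANALYTICS/PUBLIC?warehouse=BI_WH&role=REPORTING
```

Pinning the warehouse in the URI is worth doing deliberately: it determines which compute pool (and therefore which credit burn rate) every Superset query uses.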

BigQuery Configuration

BigQuery uses OAuth 2.0 or service account JSON keys for authentication. The setup requires:

  1. Create a Google Cloud service account with the BigQuery Job User and BigQuery Data Viewer roles (Data Editor only if Superset needs write access)
  2. Download the service account JSON key
  3. In Superset, add a new database connection with these parameters:
    • Engine: bigquery
    • Credentials: paste the service account JSON key
    • Project ID: your Google Cloud project ID
  4. Test the connection and import tables

BigQuery’s Python client library is well-maintained. One consideration: BigQuery uses project-based billing, so ensure your service account has appropriate permissions and the project has a billing account attached.
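The BigQuery URI itself is minimal; the service-account key goes into the connection's "Secure Extra" field rather than the URI. A sketch with a hypothetical project ID:

```python
import json

project_id = "my-gcp-project"                # hypothetical
sqlalchemy_uri = f"bigquery://{project_id}"

# Superset keeps the key out of the URI; paste it into the connection's
# Secure Extra (encrypted extra) field as JSON:
secure_extra = json.dumps({
    "credentials_info": {
        "type": "service_account",
        "project_id": project_id,
        # remaining fields (private_key, client_email, ...) copied
        # verbatim from the downloaded key file
    }
})
```

Keeping credentials in the encrypted extra means they are stored encrypted in Superset's metadata database instead of appearing in plaintext connection strings.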

Redshift Configuration

Redshift uses standard PostgreSQL drivers. The setup is:

  1. Create a Redshift user with appropriate permissions
  2. In Superset, add a new database connection with these parameters:
    • Engine: redshift (or postgresql with redshift-specific settings)
    • User: your Redshift user
    • Password: your user password
    • Host: your Redshift cluster endpoint
    • Port: 5439
    • Database: your target database
  3. Test the connection and import tables

Redshift’s PostgreSQL compatibility is high, but some features (like window functions) have subtle differences. As detailed in guides like Using Apache Superset with Amazon Redshift, testing your SQL is essential.
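Because Redshift speaks the PostgreSQL wire protocol, the URI follows the familiar Postgres shape; the sqlalchemy-redshift dialect adds Redshift-aware behavior on top of psycopg2. A sketch with a hypothetical cluster endpoint:

```python
host = "my-cluster.abc123.us-east-1.redshift.amazonaws.com"  # hypothetical
sqlalchemy_uri = (
    f"redshift+psycopg2://superset_svc:s3cret@{host}:5439/analytics"
)
```

A plain `postgresql+psycopg2://` URI also connects, but the Redshift dialect handles Redshift-specific reflection (distribution/sort key metadata) more gracefully.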

Query Performance Optimization Strategies

Regardless of which warehouse you choose, optimizing query performance requires understanding how each warehouse executes SQL. Here are warehouse-specific strategies:

Snowflake Optimization

Use clustering keys strategically. If most of your dashboard queries filter by date and region, cluster your fact tables by those columns. Snowflake will prune data blocks that don’t match the filter, reducing scan volume.

Right-size your warehouse. A 1-credit warehouse is slower but cheaper; a 4-credit warehouse is faster but costs 4x. For dashboards, choose the smallest warehouse that meets your latency SLA. You can always resize if needed.

Leverage result caching. Snowflake caches results for 24 hours. If your dashboards have repeated queries (e.g., “Show me today’s revenue” refreshed every hour), the second and subsequent queries run instantly from cache.

Pre-aggregate where possible. Create materialized views or summary tables for common dashboard queries. A query against a 1M-row summary table runs much faster than the same aggregation against a 1B-row fact table.
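The clustering and pre-aggregation advice above translates into short DDL. A sketch with hypothetical table and column names, shown as SQL strings you would run through the Snowflake connector or a migration tool:

```python
# Cluster the fact table on the columns most dashboard filters hit,
# so Snowflake can prune non-matching micro-partitions:
cluster_fact_table = """
ALTER TABLE analytics.fact_orders
CLUSTER BY (order_date, region);
"""

# Pre-aggregate a large fact table into a small summary table that
# dashboard queries can hit instead:
daily_revenue_summary = """
CREATE OR REPLACE TABLE analytics.daily_revenue AS
SELECT order_date, product_category, region, SUM(revenue) AS revenue
FROM analytics.fact_orders
GROUP BY 1, 2, 3;
"""
```

The summary table needs a refresh schedule (a scheduled task or your orchestrator), which is the usual trade-off of pre-aggregation: speed for freshness.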

BigQuery Optimization

Reduce bytes scanned. Since you pay per byte scanned, focus on reducing data scanned, not just query execution time. Use clustering, partitioning, and column pruning aggressively.

Partition tables by date. BigQuery can prune entire partitions if your filter matches the partition column. A query filtering on date >= '2024-01-01' scans only partitions from 2024 onward.

Use materialized views. BigQuery’s materialized views cache aggregations. Queries that can use the materialized view run faster and scan less data.

Enable result caching. BigQuery caches results for 24 hours by default. Repeated queries cost $0. For dashboards with repeated queries, this is a huge win.

Consider slot-based pricing. If you have predictable, high query volume, annual slots ($2,400/month for 100 slots) are cheaper than on-demand pricing. Calculate your expected monthly scan volume and compare.
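Partitioning and clustering combine in a single DDL statement. A sketch with hypothetical names showing the pattern, plus a query whose filter lines up with the partition column so BigQuery prunes whole partitions:

```python
# Partition by day and cluster by the most common filter columns so
# BigQuery prunes both partitions and blocks within partitions:
partitioned_fact_table = """
CREATE TABLE analytics.fact_orders
PARTITION BY DATE(order_ts)
CLUSTER BY region, product_category AS
SELECT * FROM staging.orders;
"""

# This filter matches the partition column, so only 2024+ partitions
# are scanned (and billed):
pruned_query = """
SELECT product_category, SUM(revenue) AS revenue
FROM analytics.fact_orders
WHERE DATE(order_ts) >= '2024-01-01'
GROUP BY product_category;
"""
```

Before running an expensive query, a dry run (supported by the BigQuery client) reports the bytes it would scan, which is the number that drives on-demand cost.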

Redshift Optimization

Choose distribution and sort keys carefully. This is the highest-impact tuning lever in Redshift. A well-chosen distribution key ensures data is co-located on nodes, reducing network traffic. A well-chosen sort key enables range scans and compression.

Use DISTKEY and SORTKEY pragmatically. If most queries join on customer_id, use customer_id as the distribution key. If most queries filter on date, use date as the sort key.

Enable compression. Redshift can compress columns using encodings (e.g., AZ64, ZSTD, LZO). Compressed data scans faster and uses less storage. Redshift can infer optimal encodings automatically.

Vacuum and analyze regularly. Redshift’s query optimizer uses table statistics (row count, distinct values, etc.). Run ANALYZE after bulk loads to update statistics. Run VACUUM periodically to reclaim space and optimize sort order.

Use concurrency scaling for spiky workloads. If you have periodic spikes in dashboard usage, concurrency scaling automatically spins up additional clusters to handle the load. You pay only for the additional compute used.
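The distribution-key, sort-key, and maintenance advice above also fits in a few statements. A sketch with hypothetical names:

```python
# Distribute on the most common join key; sort on the most common
# filter column so range scans stay cheap:
redshift_fact_table = """
CREATE TABLE analytics.fact_orders (
    order_id     BIGINT,
    customer_id  BIGINT,
    order_date   DATE,
    region       VARCHAR(32),
    revenue      DECIMAL(12, 2) ENCODE az64
)
DISTKEY (customer_id)
SORTKEY (order_date);
"""

# Routine maintenance after bulk loads: refresh optimizer statistics
# and restore sort order.
maintenance = [
    "ANALYZE analytics.fact_orders;",
    "VACUUM SORT ONLY analytics.fact_orders;",
]
```

The choice of `customer_id` versus `order_date` here is workload-dependent; the point is that both keys should be decided from your dominant joins and filters, not left to defaults.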

Cost Comparison: Snowflake vs. BigQuery vs. Redshift

Performance isn’t the only factor. Cost matters enormously, especially for embedded analytics where you control query volume.

Snowflake Pricing Model

Snowflake charges per credit, where a warehouse consumes credits per hour based on its size (an X-Small burns 1 credit/hour, and each size up doubles that). Pricing varies by region and edition (Standard, Enterprise, Business Critical), but assume ~$4/credit.

For a dashboard workload with 100 queries/day on a 2-credit/hour warehouse, cost tracks warehouse uptime, not query time—billing runs per second while the warehouse is resumed, with a 60-second minimum per resume. If spread-out queries keep the warehouse up roughly 10 hours per day:

  • Compute cost: 10 hours/day × 2 credits/hour × $4/credit × 30 days = $2,400/month
  • Storage cost: depends on data volume, but assume ~$0.04/GB/month for 1TB = $40/month
  • Total: ~$2,440/month

BigQuery Pricing Model

BigQuery charges per byte scanned (on-demand) or per slot (reserved). On-demand pricing is ~$6.25/TB scanned.

For the same 100 queries/day, each scanning 100GB:

  • Compute cost: 100 queries × 100GB × $6.25/TB = $62.50/day × 30 = $1,875/month
  • Storage cost: assume ~$0.02/GB/month for 1TB = $20/month
  • Total: ~$1,895/month

With annual slots (100 slots = $2,400/month), you’d pay $2,400/month regardless of query volume, but you’d get better performance.

Redshift Pricing Model

Redshift charges per node per hour. An ra3.4xlarge node costs ~$4.26/hour. For a 2-node cluster:

  • Compute cost: 2 nodes × $4.26/hour × 730 hours/month = $6,219/month
  • Storage cost: managed storage included, but assume additional costs for snapshots = ~$100/month
  • Total: ~$6,319/month

Redshift is expensive for low-volume workloads but becomes competitive for high-volume workloads (1,000+ queries/day) where the per-query cost drops.

Cost Summary

Workload             Snowflake     BigQuery      Redshift
100 queries/day      $2,440/mo     $1,895/mo     $6,319/mo
500 queries/day      $12,200/mo    $9,475/mo     $6,319/mo
1,000 queries/day    $24,400/mo    $18,950/mo    $6,319/mo

For low-volume dashboards, BigQuery or Snowflake is cheaper. For high-volume dashboards, Redshift’s fixed cost becomes attractive.
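The crossover point is easy to model. A sketch using the per-query unit costs implied by this article's 100-queries/day scenario (illustrative figures, not quotes):

```python
def monthly_costs(queries_per_day):
    """Monthly cost (USD) per warehouse under this article's assumptions."""
    snowflake = 2_440 * queries_per_day / 100   # scales with warehouse time
    bigquery  = 1_895 * queries_per_day / 100   # scales with bytes scanned
    redshift  = 6_319                           # fixed cluster cost
    return snowflake, bigquery, redshift

# First volume (in steps of 10) where the fixed Redshift cluster beats
# usage-priced BigQuery:
breakeven = next(q for q in range(100, 2000, 10)
                 if monthly_costs(q)[1] >= monthly_costs(q)[2])
# breakeven == 340 (i.e., roughly 330-340 queries/day)
```

Under these assumptions the fixed-cost cluster starts winning in the mid-hundreds of queries per day, well before the 1,000/day figure in the table; run the model with your own unit costs before deciding.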

Practical Recommendations for Superset Users

Here’s how to choose based on your situation:

Choose Snowflake If:

  • You want predictable, linear performance scaling
  • Your team is familiar with SQL and cloud data warehouses
  • You have moderate query volume (100–500 queries/day)
  • You value simplicity and don’t want to manage distribution keys or cluster tuning
  • You want to use D23’s managed Apache Superset without worrying about warehouse tuning

Choose BigQuery If:

  • You need the fastest query execution times
  • You have highly variable query volume (spiky workloads benefit from serverless scaling)
  • You’re comfortable with the per-byte-scanned pricing model and optimizing for reduced data scanning
  • You want automatic result caching and materialized views
  • You’re already in the Google Cloud ecosystem

Choose Redshift If:

  • You have very high, predictable query volume (1,000+ queries/day)
  • You’re already in the AWS ecosystem and want to minimize data transfer costs
  • Your team has deep PostgreSQL and Redshift expertise
  • You can invest time in tuning distribution keys, sort keys, and compression
  • You need fine-grained access control and VPC integration

Testing Your Own Workload

Benchmarks are useful, but your actual workload is unique. Here’s how to test:

  1. Export a sample of your data (at least 100GB, ideally 1TB) to each warehouse
  2. Create tables and apply each warehouse's tuning levers (clustering keys, partitioning, distribution and sort keys) using your actual schema
  3. Run your actual dashboard queries against each warehouse, measuring execution time and cost
  4. Simulate concurrent load using tools like Apache JMeter or custom scripts
  5. Calculate total cost including compute, storage, and engineering time for tuning
  6. Choose the warehouse that best balances performance, cost, and operational complexity
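Step 4 doesn't require JMeter; a thread pool is enough for a first pass. A sketch with a stubbed warehouse call — in a real test, each worker would open its own connection and execute your actual dashboard SQL:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def run_query(sql):
    """Stand-in for a real warehouse call; returns latency in seconds."""
    start = time.perf_counter()
    time.sleep(0.01)            # replace with cursor.execute(sql) + fetch
    return time.perf_counter() - start

def simulate_load(n_users=10, queries_per_user=5):
    """Fire n_users * queries_per_user queries with n_users in parallel."""
    sqls = ["SELECT 1"] * (n_users * queries_per_user)
    with ThreadPoolExecutor(max_workers=n_users) as pool:
        latencies = sorted(pool.map(run_query, sqls))
    return {
        "avg_s": sum(latencies) / len(latencies),
        "p95_s": latencies[int(0.95 * (len(latencies) - 1))],
    }

stats = simulate_load()   # 10 users × 5 queries, mirroring the benchmark above
```

Compare `avg_s` and `p95_s` across warehouses at the same concurrency; the p95 gap is usually where queueing behavior (Redshift WLM, Snowflake warehouse queues) shows up first.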

For teams using D23’s managed Apache Superset platform, we can help with this testing and provide guidance based on your specific workload.

Superset-Specific Considerations

Beyond warehouse performance, a few Superset-specific factors affect dashboard performance:

Query Caching

Superset can cache query results at the dashboard level. If multiple charts on a dashboard use the same underlying query, Superset can execute the query once and reuse results for all charts. This is independent of warehouse caching and can dramatically improve dashboard load times.
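Superset's caching is configured in `superset_config.py` with Flask-Caching dictionaries: `CACHE_CONFIG` for chart/metadata caching and `DATA_CACHE_CONFIG` for query results. A sketch — the Redis URL and timeouts are illustrative values, not recommendations:

```python
# superset_config.py (sketch) — keys follow Flask-Caching conventions.
CACHE_CONFIG = {
    "CACHE_TYPE": "RedisCache",
    "CACHE_DEFAULT_TIMEOUT": 300,                   # 5 min for metadata/charts
    "CACHE_KEY_PREFIX": "superset_",
    "CACHE_REDIS_URL": "redis://localhost:6379/0",  # hypothetical
}

DATA_CACHE_CONFIG = {
    **CACHE_CONFIG,
    "CACHE_DEFAULT_TIMEOUT": 3600,                  # 1 hour for query results
    "CACHE_KEY_PREFIX": "superset_data_",
}
```

The data-cache timeout is the lever that matters for warehouse cost: a 1-hour result cache means a dashboard refreshed every 5 minutes hits the warehouse once per hour, not twelve times.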

Async Query Execution

Superset supports async query execution, where long-running queries execute in the background and results are polled. This is essential for dashboards with queries that take >30 seconds. Configure async execution in Superset’s settings and ensure your warehouse supports long-running queries.
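Async execution routes queries through Celery workers, configured in `superset_config.py`. A minimal sketch — the broker URL is hypothetical, and a real deployment also sets `RESULTS_BACKEND` to a cachelib cache (e.g. Redis) so workers have somewhere to put results:

```python
# superset_config.py (sketch): enable async query execution via Celery.
class CeleryConfig:
    broker_url = "redis://localhost:6379/1"       # hypothetical broker
    imports = ("superset.sql_lab",)               # tasks that run queries
    result_backend = "redis://localhost:6379/1"

CELERY_CONFIG = CeleryConfig
SQLLAB_ASYNC_TIME_LIMIT_SEC = 600   # cap background queries at 10 minutes
```

With this in place, long-running queries no longer tie up web-server workers; Superset polls for results and renders them when the warehouse finishes.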

Virtual Datasets and Pre-Aggregation

Superset allows you to create virtual datasets (saved SQL queries) that act like tables. You can use these to pre-aggregate data or apply complex transformations, reducing the SQL that individual dashboard charts need to execute. This is a powerful pattern for optimizing dashboard performance without touching the warehouse.

Native Queries vs. Superset UI

Superset’s visual query builder (filters, aggregations, grouping) is convenient but can generate suboptimal SQL. For performance-critical dashboards, write native SQL queries directly. This gives you full control over query structure and optimization.

Integration with AI-Powered Analytics

One emerging capability is AI-assisted query generation (text-to-SQL), where users describe what they want in natural language and an LLM generates the SQL. This is particularly powerful when combined with Superset and a managed platform like D23.

Text-to-SQL performance depends on the underlying warehouse’s ability to execute generated SQL quickly. All three warehouses (Snowflake, BigQuery, Redshift) handle standard SQL well, but BigQuery’s speed and cost-per-scan model makes it attractive for exploratory, ad-hoc queries that might not be optimized.

For embedded analytics use cases where you’re building self-serve BI into your product, the warehouse choice affects not just performance but also the user experience of AI-assisted exploration.

Conclusion: Aligning Warehouse Choice with Superset Strategy

Apache Superset’s flexibility means it works well with Snowflake, BigQuery, and Redshift. The right choice depends on your workload, team, and budget.

Snowflake offers predictable performance and simplicity. BigQuery offers raw speed and serverless scaling. Redshift offers fixed costs for high-volume workloads. All three integrate well with Superset, and all three can power dashboards, embedded analytics, and self-serve BI.

For teams standardizing on Apache Superset—whether for internal analytics, embedded dashboards, or portfolio company reporting—the warehouse decision is critical but not irreversible. Many organizations run Superset against multiple warehouses, routing different workloads to the most appropriate backend.

If you’re evaluating managed Apache Superset as an alternative to Looker, Tableau, or Power BI, warehouse performance is one lever. But equally important are ease of deployment, API-first architecture for embedding, AI-powered analytics, and expert support. D23 provides all of these, with deep expertise in Superset optimization and data consulting for teams at scale.

The benchmarks in this article show that all three warehouses can deliver sub-5-second query performance for typical OLAP workloads. Your job is to test against your actual data and queries, then choose the warehouse that best fits your performance, cost, and operational requirements.