Apache Superset Caching with Redis: Tuning for High Concurrency
Why Caching Matters in Production Superset Deployments
When you’re running Apache Superset at scale—handling hundreds of concurrent dashboard viewers, dozens of scheduled queries, and real-time data refreshes—database load becomes your bottleneck faster than you’d expect. A single poorly-cached dashboard query hitting your data warehouse can cascade into latency spikes that degrade the experience for everyone on the platform.
Caching isn’t optional in production. It’s the difference between a dashboard that loads in 200ms and one that takes 8 seconds. It’s the difference between your data warehouse handling 10 concurrent queries and 100. For teams at mid-market and scale-up companies who’ve chosen Apache Superset as their embedded analytics foundation, caching strategy directly impacts whether self-serve BI feels responsive or frustrating.
Redis has become the de facto standard for Superset caching because it’s fast, reliable, and purpose-built for this problem. But most teams configure it with defaults and wonder why they’re still seeing performance issues under load. This guide walks through the tuning decisions that matter: memory allocation, eviction policies, connection pooling, and the subtle interaction between query caching, results caching, and Celery task queues.
Understanding Superset’s Caching Architecture
Before you tune anything, you need to understand what’s actually being cached and why. Apache Superset uses caching at multiple layers, and they serve different purposes.
Query result caching stores the output of database queries—the actual data that powers your charts. When a user views a dashboard, Superset checks the cache first. If the result exists and hasn’t expired, it returns instantly. If not, it queries the database, stores the result, and serves it. This is where you see the biggest performance wins.
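The check-then-fill flow just described is the classic cache-aside pattern. Here is a minimal sketch of the idea (the function, dict-backed cache, and TTL handling are illustrative, not Superset's actual internals):

```python
import time

def get_chart_data(cache: dict, key: str, run_query, ttl: int = 300):
    """Cache-aside: return a cached result if still fresh, else query and store."""
    entry = cache.get(key)
    now = time.monotonic()
    if entry is not None and now - entry["stored_at"] < ttl:
        return entry["result"]           # cache hit: no database round-trip
    result = run_query()                 # cache miss: hit the database
    cache[key] = {"result": result, "stored_at": now}
    return result
```

Superset derives the cache key from the query and its parameters, so identical dashboard views collapse into a single database query per TTL window.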
Metadata caching stores information about your databases, tables, columns, and schemas. This is lighter-weight but critical for dashboard load times. When you open a dashboard editor, Superset needs to know what tables are available; without caching, this becomes a repeated database scan.
Filter and chart state caching preserves the state of filters and chart parameters, reducing the need to recalculate these on every page load.
The official Apache Superset caching documentation details how Flask-Caching powers all of this. Under the hood, Superset uses Flask-Caching as its abstraction layer, which supports multiple backends: simple in-memory caching (dangerous at scale), filesystem caching (slow and not distributed), and Redis (fast, distributed, and what you should use in production).
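These layers map onto separate Flask-Caching config blocks in recent Superset versions: chart data, metadata, filter state, and Explore form data each get their own. A hedged sketch (config key names follow the Superset docs; the host, password, and TTL values are placeholders to adapt):

```python
# Each cache layer can point at its own Redis database, TTL, and key prefix.
# Placeholders: replace host and password with your own values.
def redis_cache(db: int, timeout: int, prefix: str) -> dict:
    return {
        "CACHE_TYPE": "RedisCache",
        "CACHE_REDIS_URL": f"redis://:password@host:6379/{db}",
        "CACHE_DEFAULT_TIMEOUT": timeout,
        "CACHE_KEY_PREFIX": prefix,
    }

CACHE_CONFIG = redis_cache(db=0, timeout=300, prefix="superset_meta_")        # metadata
DATA_CACHE_CONFIG = redis_cache(db=0, timeout=3600, prefix="superset_data_")  # chart/query data
FILTER_STATE_CACHE_CONFIG = redis_cache(db=0, timeout=86400, prefix="superset_filter_")
EXPLORE_FORM_DATA_CACHE_CONFIG = redis_cache(db=0, timeout=86400, prefix="superset_form_")
```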
Redis acts as a centralized cache store that all Superset instances can read from and write to. If you run three Superset application servers behind a load balancer, they all hit the same Redis instance, ensuring consistency and eliminating duplicate queries.
Setting Up Redis for Superset: The Foundation
You need Redis running and accessible to your Superset instances. If you’re on Kubernetes, this is a Helm chart. If you’re on AWS, ElastiCache. If you’re self-hosted, a dedicated Redis server or cluster.
The minimum viable Redis setup for Superset is straightforward:
REDIS_HOST=your-redis-endpoint.redis.cache.amazonaws.com
REDIS_PORT=6379
REDIS_DB=0
REDIS_PASSWORD=your-secure-password
In your Superset superset_config.py, you’d configure the cache backend like this:
CACHE_CONFIG = {
"CACHE_TYPE": "RedisCache",
"CACHE_REDIS_URL": "redis://:password@host:6379/0",
"CACHE_DEFAULT_TIMEOUT": 300, # 5 minutes default
}
But this is where most teams stop, and where performance problems begin. The default timeout is 5 minutes, which might be fine for light usage but becomes a liability under concurrency. Default connection pooling doesn’t account for bursty traffic. Memory eviction isn’t configured, so Redis can run out of space.
Let’s talk about what actually matters for high-concurrency deployments.
Tuning Redis Memory and Eviction Policies
Redis stores everything in RAM. This makes it fast but means you need to be deliberate about memory allocation and what happens when you run out of space.
Memory allocation depends on your query result size and cache TTL (time-to-live). A rough estimate: if you have 50 active dashboards, each with 10 charts, and each chart result is 500KB, you’re looking at 250MB of cache data. Add 50% overhead for Redis internals, and you need at least 400MB. In production, allocate 2-3x your expected working set. If your dashboards are heavy (large result sets, many charts), this grows quickly.
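That arithmetic is easy to parameterize. A quick sizing helper (the 1.5x overhead and 2-3x headroom factors are the rules of thumb from this section, not measured values):

```python
def redis_sizing_mb(dashboards: int, charts_per_dashboard: int,
                    avg_result_mb: float, overhead: float = 1.5,
                    headroom: tuple = (2, 3)) -> dict:
    """Estimate Redis memory needs for Superset query-result caching."""
    working_set = dashboards * charts_per_dashboard * avg_result_mb
    minimum = working_set * overhead      # add overhead for Redis internals
    return {
        "working_set_mb": working_set,
        "minimum_mb": minimum,
        "recommended_mb": (working_set * headroom[0], working_set * headroom[1]),
    }

print(redis_sizing_mb(dashboards=50, charts_per_dashboard=10, avg_result_mb=0.5))
```

Rerun the estimate whenever dashboard count or average result size changes; heavy dashboards move the answer quickly.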
The harder decision is eviction policy. When Redis hits its memory limit, it needs to decide what to delete. The default policy (noeviction) just refuses new writes, which breaks caching. You want an eviction policy that’s intelligent about what gets removed.
maxmemory 2gb
maxmemory-policy allkeys-lru
This tells Redis: “When you hit 2GB, remove the least-recently-used keys.” LRU (Least Recently Used) works because frequently-accessed dashboard results stay cached, while old or rarely-used results get evicted. This is the right policy for Superset in almost all cases.
Alternatively, allkeys-lfu (Least Frequently Used) evicts based on access frequency rather than recency. This is slightly better if you have a few “power user” dashboards that should always be cached, but it’s more computationally expensive and the difference is marginal in practice.
Avoid volatile-lru or volatile-ttl—these only evict keys with an explicit TTL, which isn’t how Superset’s cache works by default. You’ll end up evicting nothing and hitting the memory limit anyway.
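To build intuition for what allkeys-lru does, here is a toy LRU cache in Python (a teaching sketch only; Redis actually uses an approximated, sampling-based LRU rather than an exact one):

```python
from collections import OrderedDict

class ToyLRUCache:
    """Exact LRU eviction: drop the least-recently-used key at capacity."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)         # mark as most recently used
        return self.data[key]

    def set(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict least recently used
```

A hot dashboard result that is read constantly keeps moving to the "recent" end and survives; a stale one drifts toward the front and is evicted first.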
Query Result Caching: The Core Performance Lever
Query result caching is where you get the biggest wins. A cached result that returns in 50ms instead of 2 seconds is a 40x improvement. When you have 100 concurrent users viewing the same dashboard, that one cached result serves all of them.
The key decision is cache TTL (time-to-live). How long should a query result stay in cache before being discarded?
For dashboards that display historical or slowly-changing data, a TTL of 1-2 hours is reasonable. Users see consistent data, and your database gets queried maybe once per hour instead of once per minute per user.
For real-time dashboards (operational metrics, live KPIs), you might use 30-60 seconds. You’re still getting 90% of the benefit—each query result serves dozens of users in that minute—but the data feels fresh.
For scheduled reports or overnight ETL dashboards, 24 hours is fine. The data doesn’t change until the next load.
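These tiers can be codified so dashboard owners don't pick TTLs ad hoc. A sketch (the tier names and values mirror the guidance above; adjust to your own freshness requirements):

```python
# Illustrative TTL policy, in seconds
TTL_POLICY = {
    "historical": 2 * 3600,  # slowly-changing data: 1-2 hours
    "realtime": 60,          # live operational metrics: 30-60 seconds
    "batch": 24 * 3600,      # overnight ETL output: daily
}

def cache_ttl(tier: str, default: int = 1800) -> int:
    """Return the cache timeout for a dashboard tier, falling back to 30 minutes."""
    return TTL_POLICY.get(tier, default)
```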
You configure this per-dataset or globally:
CACHE_CONFIG = {
"CACHE_TYPE": "RedisCache",
"CACHE_REDIS_URL": "redis://:password@host:6379/0",
"CACHE_DEFAULT_TIMEOUT": 3600, # 1 hour default
}
SQLLAB_TIMEOUT = 30 # SQL Lab synchronous query timeout, in seconds
But there’s a subtlety: you can also set cache TTL at the dashboard or chart level. A dashboard editor can override the global default and say “this chart should cache for 5 minutes” or “this chart should never cache.” This flexibility is powerful but requires discipline. You need a caching policy that your team understands and follows.
A common mistake is setting TTL too low (30 seconds) because you think you need “fresh” data. In reality, you’re defeating the purpose of caching. If a query result is valid for 5 minutes, cache it for 5 minutes. Let multiple users benefit. Your database will thank you.
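The load reduction is easy to quantify: with caching, the database sees roughly one query per chart per TTL window, regardless of viewer count. A back-of-envelope calculation (idealized: it assumes every view within the TTL window is a cache hit):

```python
def db_queries_per_hour(viewers_per_hour: int, ttl_seconds: int) -> dict:
    """Compare database load with and without result caching (idealized)."""
    uncached = viewers_per_hour                              # every view runs the query
    cached = min(viewers_per_hour, 3600 // ttl_seconds or 1) # one query per TTL window
    return {"uncached": uncached, "cached": cached}

print(db_queries_per_hour(viewers_per_hour=600, ttl_seconds=300))
```

With a 5-minute TTL, 600 hourly viewers translate into at most 12 database queries per hour instead of 600.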
Results Backend Caching vs. Query Caching
Here’s where it gets confusing. Superset has two different caching mechanisms that often get conflated:
Query caching stores the raw SQL result set. When a user requests a chart, Superset checks: “Have I executed this exact query recently?” If yes, it returns the cached result. This is what we’ve been discussing.
Results backend caching (also called “async query caching”) stores the result of asynchronous queries executed by Celery workers. This is relevant when you have long-running queries that can’t complete in a single HTTP request. The worker executes the query, stores the result in the results backend (Redis, S3, database), and the frontend polls for completion.
For high-concurrency deployments, you want both configured, but they serve different purposes.
Query caching handles synchronous requests—the common case. Results backend caching handles async requests—the long-tail of expensive queries.
In your superset_config.py:
# Query caching (synchronous)
CACHE_CONFIG = {
"CACHE_TYPE": "RedisCache",
"CACHE_REDIS_URL": "redis://:password@host:6379/0",
"CACHE_DEFAULT_TIMEOUT": 3600,
}
# Results backend (async query results)
from cachelib.redis import RedisCache as CachelibRedisCache
RESULTS_BACKEND = CachelibRedisCache(
host="your-redis-endpoint.redis.cache.amazonaws.com",
port=6379,
db=1,
password="your-secure-password",
key_prefix="superset_results_",
)
RESULTS_BACKEND_USE_MSGPACK = True
Note that we’re using different Redis databases (db=0 for cache, db=1 for results). This is a best practice—it keeps the namespaces separate and makes debugging easier. If you need to flush cache without wiping async results, you can.
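Keeping the db-number convention in one place avoids drift as the config grows. A small helper (the db assignments follow this guide's convention; they are not mandated by Superset):

```python
def redis_url(host: str, password: str, db: int, port: int = 6379) -> str:
    """Build a Redis URL for one of Superset's logical databases."""
    return f"redis://:{password}@{host}:{port}/{db}"

# Convention used throughout this guide
REDIS_DBS = {"cache": 0, "results": 1, "celery_broker": 2, "celery_results": 3}

cache_url = redis_url("redis.internal", "s3cret", REDIS_DBS["cache"])
```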
The guide to Redis caching for Superset dashboards covers this in depth, including how Celery task queues interact with the results backend.
Connection Pooling: Handling Concurrency Without Bottlenecks
Here’s a production gotcha that catches teams off guard: by default, Superset doesn’t pool connections to Redis efficiently. Under high concurrency, you can exhaust Redis connections and start seeing timeout errors, even though Redis itself is healthy.
Connection pooling means Superset maintains a pool of open connections to Redis and reuses them instead of opening a new connection for every cache operation. This is essential when you have 100+ concurrent dashboard requests.
You configure this in your cache URL:
CACHE_REDIS_URL = "redis://:password@host:6379/0?max_connections=50"
But max_connections is just the client-side limit. You also need to configure Redis itself to accept enough connections:
tcp-backlog 511
maxclients 10000
A rule of thumb: set max_connections to 2-3x the number of Superset application workers. If you have 10 workers, use max_connections=30. This accounts for bursty traffic without wasting resources.
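That sizing rule can be written down directly (the 2-3x burst factor is this guide's rule of thumb, not a redis-py default; the cap against the server's maxclients is an added safety margin so Celery workers and other clients still fit):

```python
def pool_size(app_workers: int, burst_factor: int = 3,
              redis_maxclients: int = 10000) -> int:
    """Client-side max_connections: 2-3x app workers, capped well below
    the server's maxclients to leave room for other clients."""
    return min(app_workers * burst_factor, redis_maxclients // 4)
```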
You can verify connection usage by checking Redis:
redis-cli INFO clients
Look at connected_clients. If it’s consistently near your maxclients limit, you’re starved for connections. Increase the limit.
Celery Task Queue Caching and Rate Limiting
Superset uses Celery to handle background tasks: scheduled queries, email reports, cache warming. Celery also needs Redis to coordinate work.
You can use the same Redis instance for both caching and Celery, but it's cleaner to use separate databases. Superset picks up Celery settings from a CELERY_CONFIG class in superset_config.py:
class CeleryConfig:
    broker_url = "redis://:password@host:6379/2"
    result_backend = "redis://:password@host:6379/3"
CELERY_CONFIG = CeleryConfig
This keeps cache traffic (database 0), results (database 1), broker tasks (database 2), and task results (database 3) separate. It makes debugging easier and prevents one workload from starving another.
For high-concurrency scenarios, you also want rate limiting to prevent dashboard queries from overwhelming your database. Superset supports Redis-based rate limiting:
RATELIMIT_ENABLED = True
RATELIMIT_APPLICATION = "100 per hour" # global limit (Flask-Limiter syntax)
RATELIMIT_STORAGE_URI = "redis://:password@host:6379/4" # shared store across app servers
A global limit is crude. A more granular approach rate-limits by user, dashboard, or API key at the API gateway layer.
Monitoring and Debugging Redis Cache Performance
You can’t tune what you don’t measure. Set up monitoring on your Redis instance to understand what’s actually happening under load.
Key metrics to watch:
Hit rate: The percentage of cache requests that return a cached result. In a well-tuned system, this should be 70-90%. If it’s below 50%, your TTL is too short or your memory is too small.
redis-cli INFO stats | grep keyspace_hits
redis-cli INFO stats | grep keyspace_misses
Calculate hit rate: hits / (hits + misses). If you’re seeing 40% hit rate on a dashboard that hasn’t changed in hours, your cache TTL is misconfigured.
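Parsing the INFO output and computing the hit rate is easy to script (a sketch that parses redis-cli-style "key:value" lines; the field names match Redis's INFO stats section):

```python
def hit_rate(info_stats: str) -> float:
    """Compute cache hit rate from the text of Redis `INFO stats` output."""
    fields = {}
    for line in info_stats.splitlines():
        if ":" in line and not line.startswith("#"):
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
    hits = int(fields.get("keyspace_hits", 0))
    misses = int(fields.get("keyspace_misses", 0))
    total = hits + misses
    return hits / total if total else 0.0

sample = "# Stats\nkeyspace_hits:8000\nkeyspace_misses:2000\n"
print(hit_rate(sample))  # 0.8
```

Wire this into your monitoring so the hit rate is a dashboard, not a manual check.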
Memory usage: Monitor used_memory and used_memory_human to see how much of your allocated memory is actually in use.
redis-cli INFO memory
If used memory is consistently low (less than 30% of maxmemory), you’re over-provisioned. If it’s consistently high (above 90%), you’re under-provisioned and evicting too aggressively.
Eviction rate: How often Redis is evicting keys due to memory pressure.
redis-cli INFO stats | grep evicted_keys
If this is increasing rapidly, your memory allocation is too small or your TTL is too long.
Command latency: How long Redis operations take. In a healthy system, this should be under 1ms for cache operations.
You can use Redis benchmarking tools to simulate load and measure latency:
redis-benchmark -h your-redis-host -n 100000 -c 50
This simulates 50 concurrent clients executing 100,000 operations. You’ll see throughput and latency numbers. In a well-configured system, you should see 50,000+ operations per second with sub-millisecond latency.
For Superset-specific monitoring, raise the log verbosity for cache operations. In your superset_config.py, apply the logging config explicitly; defining a bare dict does nothing by itself. (The superset.models.cache logger name may vary across Superset versions; check your version's source.)
import logging.config
LOGGING_CONFIG = {
"version": 1,
"disable_existing_loggers": False,
"handlers": {
"console": {
"class": "logging.StreamHandler",
"level": "DEBUG",
},
},
"loggers": {
"superset.models.cache": {
"handlers": ["console"],
"level": "DEBUG",
},
},
}
logging.config.dictConfig(LOGGING_CONFIG)
This logs cache hits and misses, so you can see in real-time what’s being cached and what’s not.
Advanced Tuning: Cluster Mode and Replication
As you scale beyond a single Redis instance, you need to think about availability. A single Redis server is a single point of failure. Under load, a single instance might not have enough throughput.
Redis Replication is the simpler approach: one primary Redis instance, multiple read replicas. Superset writes to the primary, reads from either primary or replicas. If the primary fails, you promote a replica. This is straightforward to set up and works well for most teams.
Redis Cluster Mode shards data across multiple Redis instances. Each instance holds a subset of the keyspace. This gives you horizontal scalability but adds complexity. For Superset caching, replication is usually sufficient unless you’re running thousands of concurrent users.
If you go the replication route, note that Flask-Caching's Redis backend has no built-in read/write split, so Superset cannot simply be pointed at a separate replica URL. Two practical options: point CACHE_REDIS_URL at an endpoint that handles routing and failover for you (a proxy, or a managed primary endpoint that follows promotion), or use Redis Sentinel via Flask-Caching's Sentinel backend:
CACHE_CONFIG = {
"CACHE_TYPE": "RedisSentinelCache",
"CACHE_REDIS_SENTINELS": [("sentinel-1", 26379), ("sentinel-2", 26379)],
"CACHE_REDIS_SENTINEL_MASTER": "mymaster",
"CACHE_REDIS_PASSWORD": "password",
"CACHE_REDIS_DB": 0,
}
For cluster mode, Flask-Caching ships a cluster-aware backend built on redis-py's RedisCluster client:
CACHE_CONFIG = {
"CACHE_TYPE": "RedisClusterCache",
"CACHE_REDIS_CLUSTER": "node1:6379,node2:6379,node3:6379",
"CACHE_REDIS_PASSWORD": "password",
}
But be aware that cluster mode has limitations with Superset's caching stack (for example, multi-key operations spanning hash slots). Test thoroughly before deploying to production.
Real-World Tuning Example: A Scale-Up’s Journey
Let’s walk through a concrete example. You’re a data analytics team at a 200-person scale-up. You’ve deployed Superset for embedded analytics, and your customers are using self-serve dashboards. You start with a single Redis instance (t3.medium on AWS ElastiCache), 1GB memory, default configuration.
Week 1: Everything works fine. Cache hit rate is 80%. Latency is sub-100ms.
Week 4: You’ve onboarded 10 customers. Dashboard load time is creeping up. Cache hit rate drops to 40%. You check Redis: memory is nearly full, and you’re evicting 1000s of keys per minute.
Problem: Your memory allocation is too small, and your TTL is too aggressive.
Fix: Increase memory to 4GB. Increase default cache TTL from 300 seconds to 1800 seconds (30 minutes). Confirm that your data freshness requirements allow this.
Week 6: Hit rate recovers to 75%. Latency is back to 100-150ms. But you’re now seeing occasional “connection timeout” errors in Superset logs.
Problem: Connection pooling is exhausted under peak load.
Fix: Raise max_connections (for example, from 25 to 75). Run 5 application workers instead of 2.
Week 8: Smooth sailing. But you’re paranoid about availability. You set up Redis replication: primary + replica. Configure Superset to read from replica when possible.
Week 12: You’re now at 50 customers. A single Redis instance is handling 1000 concurrent dashboard requests. Memory is at 80% utilization, hit rate is 70%, latency is still sub-200ms. You’re at the ceiling for a single instance.
Decision: Move to Redis Cluster Mode with 3 shards. This gives you 3x throughput headroom and distributes load.
This isn’t a theoretical example—it’s the typical journey for teams scaling Superset. The key insight is that tuning is iterative. You start with reasonable defaults, monitor, and adjust based on what you observe.
The D23 Approach: Managed Caching Without the Headache
If you’re evaluating whether to self-manage Superset’s caching or use a managed platform, understand the operational burden. Redis tuning, monitoring, failover, capacity planning—these are non-trivial tasks that distract from your core business.
D23 handles this for you. We manage the Redis infrastructure, tune caching policies based on your workload, monitor hit rates and latency, and scale automatically as your concurrency grows. You get production-grade analytics performance without the platform overhead.
For teams that choose to self-manage, this guide gives you the roadmap. For teams that want to focus on dashboards instead of infrastructure, there’s a reason managed Superset is becoming the standard.
Configuration Checklist for Production Deployments
Here’s a complete, production-ready Redis configuration for Superset handling 500+ concurrent users:
# superset_config.py
import os
from cachelib.redis import RedisCache as CachelibRedisCache
REDIS_HOST = os.environ["REDIS_HOST"]
REDIS_PASSWORD = os.environ["REDIS_PASSWORD"]
# Primary cache (query results)
CACHE_CONFIG = {
"CACHE_TYPE": "RedisCache",
"CACHE_REDIS_URL": f"redis://:{REDIS_PASSWORD}@{REDIS_HOST}:6379/0?max_connections=75",
"CACHE_DEFAULT_TIMEOUT": 1800, # 30 minutes
"CACHE_KEY_PREFIX": "superset_cache_",
}
# Results backend (async query results)
RESULTS_BACKEND = CachelibRedisCache(
host=REDIS_HOST,
port=6379,
db=1,
password=REDIS_PASSWORD,
key_prefix="superset_results_",
)
RESULTS_BACKEND_USE_MSGPACK = True
# Celery configuration
class CeleryConfig:
    broker_url = f"redis://:{REDIS_PASSWORD}@{REDIS_HOST}:6379/2"
    result_backend = f"redis://:{REDIS_PASSWORD}@{REDIS_HOST}:6379/3"
CELERY_CONFIG = CeleryConfig
# Query timeout for synchronous SQL Lab queries, in seconds
SQLLAB_TIMEOUT = 30
# Rate limiting (Flask-Limiter)
RATELIMIT_ENABLED = True
RATELIMIT_APPLICATION = "1000 per hour"
RATELIMIT_STORAGE_URI = f"redis://:{REDIS_PASSWORD}@{REDIS_HOST}:6379/4"
And the corresponding Redis configuration:
# redis.conf
maxmemory 4gb
maxmemory-policy allkeys-lru
tcp-backlog 511
maxclients 10000
timeout 0  # keep idle pooled client connections open (a nonzero timeout closes them)
tcp-keepalive 60
Deploy this, monitor your hit rates and latency, and adjust based on your actual usage patterns. This is the foundation. Everything else is tuning.
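One deployment note: since the config above reads from the environment, it's worth failing fast at startup if a variable is missing rather than discovering a malformed Redis URL under load. A small guard (the variable names match the checklist above; the helper itself is illustrative):

```python
import os

def require_env(*names: str) -> dict:
    """Return required environment variables, raising early if any are missing."""
    missing = [n for n in names if not os.environ.get(n)]
    if missing:
        raise RuntimeError(f"Missing required env vars: {', '.join(missing)}")
    return {n: os.environ[n] for n in names}

# Example: settings = require_env("REDIS_HOST", "REDIS_PASSWORD")
```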
Common Pitfalls and How to Avoid Them
Pitfall 1: Using filesystem cache in production. Filesystem caching is slow, doesn’t scale across multiple application servers, and creates debugging nightmares. Use Redis. Period.
Pitfall 2: Setting TTL too short. A 30-second TTL sounds conservative but defeats the purpose of caching. If your data is valid for 5 minutes, cache it for 5 minutes. Let multiple users benefit.
Pitfall 3: Ignoring memory pressure. Redis eviction is silent. You don’t get an error; you just get lower hit rates. Monitor evicted_keys and adjust memory allocation before it becomes a problem.
Pitfall 4: Not separating Celery and cache databases. Using the same Redis database for both causes one workload to starve the other. Use separate databases.
Pitfall 5: Misconfiguring connection pooling. Too few connections and you get timeouts. Too many and you waste resources. Start with max_connections = 2 * num_workers and adjust based on monitoring.
Benchmarking Your Setup
After tuning, benchmark your setup to understand the actual performance gains. The data engineer’s guide to Lightning-Fast Apache Superset dashboards documents real-world improvements: uncached queries taking 5+ seconds, cached queries returning in 50-100ms. That’s a 50-100x improvement.
You can replicate this with a load test. Use Apache JMeter or k6 to simulate 100 concurrent users viewing the same dashboard. Measure latency with and without Redis caching. You should see:
- Without caching: 2-5 second response times, high database load
- With caching (properly tuned): 100-300ms response times, minimal database load
This isn’t theoretical—this is what you should expect in production.
Conclusion: Caching as a Competitive Advantage
Redis caching in Apache Superset isn’t a nice-to-have optimization. It’s the difference between a platform that feels responsive and one that feels sluggish. It’s the difference between supporting 100 concurrent users and 1000.
The tuning decisions matter: memory allocation, eviction policy, TTL, connection pooling. But the good news is that the defaults are conservative. Start with the configuration in this guide, monitor your metrics, and adjust. You’ll quickly find the sweet spot for your workload.
For teams running Superset at scale, caching is non-negotiable. For teams evaluating whether to self-manage or use a managed platform, understand that this operational complexity is real. Whether you choose to own it or delegate it, understand what’s happening under the hood. That’s how you build analytics platforms that actually work at scale.