Guide · April 18, 2026 · 15 mins · The D23 Team

Real-Time Dashboards in Apache Superset Without Streaming Infrastructure

Build real-time dashboards in Apache Superset using cache TTLs and incremental loads—no Kafka or streaming infrastructure required.

Understanding Real-Time Dashboards Without Streaming

When most teams think about real-time dashboards, they imagine complex streaming architectures—events flowing through Kafka topics, Flink jobs transforming them in flight, and dedicated infrastructure teams managing failover. That mental model is expensive and operationally heavy. It’s also unnecessary for most use cases.

Real-time dashboards in Apache Superset don’t require streaming infrastructure. Instead, they rely on pragmatic patterns: aggressive caching with short time-to-live (TTL) values, intelligent incremental data loads, and strategic database design. This approach trades some latency (typically 10-60 seconds rather than sub-second) for dramatic simplicity and cost savings.

The distinction matters because “real-time” is context-dependent. A financial trading desk needs sub-millisecond latency. A product analytics team tracking daily active users can tolerate 30 seconds. A sales leader reviewing pipeline metrics can live with a 5-minute refresh. By calibrating your caching strategy to actual business requirements, you avoid over-engineering while still delivering dashboards that feel responsive and current.

This guide walks through the concrete patterns that make this work: how to configure cache layers, design incremental loads, optimize database queries, and integrate with tools like ClickHouse and Apache Pinot when you need sub-second query performance. We’ll focus on what actually ships in production at scale-ups and mid-market companies, not theoretical ideals.

How Cache TTL Drives Real-Time Perception

Cache time-to-live (TTL) is your primary lever for controlling dashboard freshness without streaming infrastructure. When a user opens a dashboard in Superset, the platform checks whether cached results exist and whether they’ve expired. If fresh cache exists, the dashboard renders instantly. If cache has expired, Superset queries the underlying database and caches the new results.

The key insight: users don’t perceive latency if results appear immediately. A 45-second-old cached result served in 100ms feels more real-time than a 5-second-old result that takes 12 seconds to compute.

Superset gives you multiple layers of caching control:

Query-level cache TTL: Set directly on individual charts. A metric showing “revenue today” might have a 30-second TTL, while a cohort analysis chart has a 5-minute TTL. This granular control lets you match cache duration to the sensitivity of the metric.

Dashboard-level cache: Configure across all charts on a dashboard simultaneously. Useful when you want consistent freshness across related metrics.

Database-level cache: Superset can cache at the database connection level, reducing redundant queries to the same underlying table. This is especially powerful when multiple charts query the same fact table with different filters or aggregations.

Result set cache: The most aggressive layer. When two users run identical queries, the second user gets cached results from the first. This compounds savings in high-traffic dashboards.

The practical pattern: start conservative (60-second TTL) and tighten based on user feedback and query performance. If a dashboard takes 8 seconds to refresh and users are checking it every 2 minutes, a 30-second TTL feels snappy without hammering your database.
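Superset’s cache-then-serve behavior is easy to picture as a minimal sketch—this TTLCache class and its get_or_compute helper are illustrative, not Superset’s actual implementation:

```python
import time

class TTLCache:
    """Minimal time-to-live cache: serve stored results until they expire."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.store = {}  # key -> (value, stored_at)

    def get_or_compute(self, key, compute):
        entry = self.store.get(key)
        now = time.monotonic()
        if entry is not None and now - entry[1] < self.ttl:
            return entry[0], "hit"   # fresh cache: instant response
        value = compute()            # expired or missing: run the query
        self.store[key] = (value, now)
        return value, "miss"

cache = TTLCache(ttl_seconds=30)
# First request computes; a second request inside the TTL is served from cache,
# even if the underlying "query" would now return something newer.
result, status = cache.get_or_compute("revenue_today", lambda: 42)
result2, status2 = cache.get_or_compute("revenue_today", lambda: 99)
```

Note that the second caller gets the 42 computed moments earlier—slightly stale, but instant, which is exactly the trade described above.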

Incremental Loads and Delta Patterns

Incremental loading is the second pillar of fast refresh cycles without streaming. Instead of recomputing an entire dataset from scratch, you load only the rows that changed since the last update.

This requires discipline in your data warehouse design. Your fact tables need a reliable updated_at or created_at timestamp. Your ETL pipelines must partition data by date and maintain this timestamp accurately. When these conditions are met, incremental loads can reduce query time from minutes to seconds.

Here’s the pattern:

Materialized views with incremental refresh: Create a materialized view in your database (PostgreSQL, Snowflake, BigQuery—all support this) that pre-aggregates data. Instead of Superset querying raw fact tables every time, it queries the lightweight materialized view. Your ETL refreshes the materialized view incrementally, touching only rows where the timestamp is recent.

Example: a dashboard showing “orders by region, last 7 days” might query a raw fact table with billions of rows. Instead, create a materialized view that pre-aggregates to daily granularity by region. Your ETL refreshes only the last 3 days of this view every 5 minutes. Superset queries the tiny materialized view (thousands of rows) instead of scanning billions.
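A runnable sketch of that windowed refresh, using SQLite as a stand-in for the warehouse and an ordinary summary table in place of a true materialized view (all table and column names are illustrative):

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table orders (id integer, region text, order_date text, amount real);
    create table daily_orders_by_region (order_date text, region text, revenue real);
    insert into orders values
        (1, 'EU', '2026-04-16', 100.0),
        (2, 'EU', '2026-04-17', 50.0),
        (3, 'US', '2026-04-18', 75.0);
""")

def refresh_recent(conn, as_of, days_back=3):
    """Rebuild only the trailing window of the aggregate, leaving older days untouched."""
    cutoff = (date.fromisoformat(as_of) - timedelta(days=days_back)).isoformat()
    # Delete-then-reinsert the recent window so the refresh is idempotent.
    conn.execute("delete from daily_orders_by_region where order_date >= ?", (cutoff,))
    conn.execute(
        """insert into daily_orders_by_region
           select order_date, region, sum(amount)
           from orders
           where order_date >= ?
           group by order_date, region""",
        (cutoff,))

refresh_recent(conn, as_of='2026-04-18')
```

Dashboards then query daily_orders_by_region (tiny) instead of orders (huge); re-running refresh_recent produces the same rows, so scheduling it every few minutes is safe.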

Virtual datasets with SQL-based incremental logic: Superset’s virtual dataset feature lets you define a SQL query as a logical table. You can embed incremental logic directly in this SQL—for example, “select all rows from events where created_at > now() - interval ‘1 hour’, then union with cached results from older data.” This approach works well when your database doesn’t support materialized views or when you need more flexibility.

Time-series-specific optimizations: For dashboards showing metrics over time (revenue trend, user growth, etc.), partition your data by time. Most analytical databases (ClickHouse, Pinot, Snowflake) support time-based partitioning natively. Superset can then query only recent partitions, dramatically reducing scan time.

When you combine incremental loads with short cache TTLs, the effect compounds. A query that normally takes 30 seconds, when incremental-loaded and cached for 45 seconds, creates the illusion of real-time responsiveness.

Database Selection and Query Optimization

Your underlying database choice shapes what’s possible. A traditional OLTP database (PostgreSQL, MySQL) can support real-time dashboards, but you’ll need aggressive caching and careful query optimization. Analytical databases designed for OLAP workloads (ClickHouse, Pinot, Snowflake) give you more headroom.

PostgreSQL and traditional SQL databases: These work fine for dashboards under moderate load (10-100 concurrent users). The constraint is query performance, not streaming capability. Optimize by indexing on filter columns, pre-aggregating in materialized views, and using Superset’s virtual dataset feature to push aggregation logic into the database rather than fetching raw data to Superset.

ClickHouse: Purpose-built for analytical queries. It compresses data heavily, supports sub-second queries on billion-row tables, and handles incremental updates efficiently. Visualizing real-time data with ClickHouse and Superset is a natural pairing—ClickHouse’s columnar format and aggressive compression mean your 7-day rolling window of events stays small enough to query in milliseconds.

Apache Pinot: Optimized for real-time OLAP. Pinot ingests data in real-time, maintains both real-time and offline segments, and automatically merges them for queries. When you connect Superset to Pinot, you get sub-second query latency on fresh data without custom streaming code. Pinot handles the streaming complexity internally.

Snowflake, BigQuery, Redshift: These cloud data warehouses are increasingly viable for real-time dashboards. Snowflake’s clustering and result caching can deliver sub-second query times. BigQuery’s columnar storage and query optimization handle large scans efficiently. If you’re already invested in one of these platforms, Superset integrates cleanly.

The decision tree:

  • Under 100 concurrent users, moderate data volume: PostgreSQL with aggressive caching and materialized views.
  • High concurrency or very large datasets: ClickHouse or Pinot for sub-second baseline query times, reducing reliance on cache.
  • Already in cloud ecosystem: Snowflake or BigQuery with Superset, leveraging their native optimization.

Configuring Superset for Real-Time Performance

Once you’ve chosen your database and optimized your data model, Superset configuration determines whether you actually achieve real-time dashboards or end up with slow, flaky dashboards that users avoid.

Cache backend selection: Superset supports Redis, Memcached, and file-based caching. Redis is standard for production. It’s fast, supports TTL natively, and integrates cleanly with Superset’s cache invalidation logic. Configure Redis with sufficient memory—a modest deployment serving 50 dashboards might need 2-4GB.

Query timeout settings: Set query timeouts to force fast-fail behavior. If a query takes longer than 30 seconds, it times out and returns an error rather than hanging. This prevents cascading failures where slow queries back up and block subsequent requests. In superset_config.py, set SQLLAB_TIMEOUT (and the per-database query timeout in each database connection’s settings) to match your SLA.

Async query execution: Enable Superset’s async query feature. When a user opens a dashboard, Superset doesn’t wait for all charts to load. Instead, it renders the dashboard immediately and loads charts asynchronously as results arrive. This makes dashboards feel responsive even if some charts take 5-10 seconds.

Connection pooling: Configure your database connection pool to handle concurrent queries without exhausting connections. A pool size of 5-10 per Superset worker is typical. Too small and you get connection timeouts; too large and you overwhelm the database.

Celery task queue for background refresh: Use Celery to refresh cache in the background rather than on-demand. Schedule a Celery task to refresh key dashboards every 5 minutes, updating cache proactively. When users open the dashboard, cache is already warm. This trades storage (keeping cache warm) for latency (instant results).
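A sketch of that proactive warm-up in superset_config.py, using the cache-warmup Celery task described in the Superset docs (the task name, strategy kwargs, and Redis URLs should be verified against your Superset version):

```python
from datetime import timedelta

class CeleryConfig:
    broker_url = "redis://localhost:6379/2"
    result_backend = "redis://localhost:6379/3"
    beat_schedule = {
        # Re-run the queries behind the busiest dashboards every 5 minutes
        # so users always land on warm cache.
        "cache-warmup": {
            "task": "cache-warmup",
            "schedule": timedelta(minutes=5),
            "kwargs": {
                "strategy_name": "top_n_dashboards",
                "top_n": 10,
                "since": "7 days ago",
            },
        },
    }

CELERY_CONFIG = CeleryConfig
```

With this in place, cache expiry and cache warming run on independent clocks: TTL governs staleness, the beat schedule governs how often the cache is rebuilt.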

Here’s a concrete configuration pattern:

# Metadata cache (dashboard and chart state)
CACHE_CONFIG = {
    'CACHE_TYPE': 'RedisCache',
    'CACHE_REDIS_URL': 'redis://localhost:6379/0',
    'CACHE_DEFAULT_TIMEOUT': 300,  # 5-minute default
}

# Chart data cache (query results); individual charts can override this TTL
DATA_CACHE_CONFIG = {
    'CACHE_TYPE': 'RedisCache',
    'CACHE_REDIS_URL': 'redis://localhost:6379/1',
    'CACHE_DEFAULT_TIMEOUT': 60,
}

SQLLAB_TIMEOUT = 30  # fail synchronous queries after 30 seconds

# Used by report/thumbnail rendering, not by caching
SUPERSET_WEBDRIVER_BASEURL = "http://localhost:8088"

# Feature flag names vary between Superset releases; check your version's docs
FEATURE_FLAGS = {
    'SCHEDULED_QUERIES': True,
    'GLOBAL_ASYNC_QUERIES': True,  # async chart loading; needs extra Redis/JWT config
}

Adjust cache TTLs per dashboard based on freshness requirements. A KPI dashboard refreshes every 30 seconds; a detailed analysis dashboard every 5 minutes.

Designing Incremental ETL for Superset Dashboards

Your ETL pipeline determines whether incremental loading actually works. A poorly designed pipeline defeats the entire strategy.

Timestamp discipline: Every fact table must have reliable created_at and updated_at timestamps. If your application doesn’t capture these, add them. This is the foundation of incremental loads.

Partition by time: Organize your data warehouse by date or hour. Most analytical databases support this natively. Superset queries only recent partitions, avoiding full table scans.

Incremental merge logic: Your ETL should:

  1. Identify rows changed since last run (using updated_at > last_run_time)
  2. Delete those rows from the target table
  3. Re-insert the updated rows

This “delete-then-insert” pattern is one way to implement an upsert (warehouses like Snowflake and BigQuery can also express it as a single MERGE statement), and it stays correct even if a row was updated multiple times since the last ETL run.

Example ETL pattern, written as a raw SQL script you might run from an Airflow task (dbt expresses the same logic declaratively in an incremental model):

-- Stage only the rows updated since the last run
create temporary table incremental_data as
select *
from source_events
where updated_at > {{ var('last_run_time') }};

-- Delete-then-insert: replace any row that changed
delete from fact_orders
where id in (select id from incremental_data);

insert into fact_orders
select * from incremental_data;

Run this ETL every 5 minutes. With Superset’s 30-second cache TTL on top, users see new rows within about 30 seconds of them landing in the warehouse—and within roughly five and a half minutes of the source event in the worst case.
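The same merge, runnable end-to-end with SQLite standing in for the warehouse (table names mirror the snippet above; a real pipeline would persist last_run_time between runs):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table source_events (id integer, status text, updated_at text);
    create table fact_orders (id integer, status text, updated_at text);
    insert into fact_orders values
        (1, 'pending', '2026-04-18T09:00'),
        (2, 'pending', '2026-04-18T09:00');
    -- Row 1 changed since the last run; row 3 is brand new.
    insert into source_events values
        (1, 'shipped', '2026-04-18T10:30'),
        (3, 'created', '2026-04-18T10:45');
""")

def incremental_merge(conn, last_run_time):
    """Delete-then-insert: replace every row that changed since last_run_time."""
    conn.execute(
        """delete from fact_orders where id in (
               select id from source_events where updated_at > ?)""",
        (last_run_time,))
    conn.execute(
        "insert into fact_orders select * from source_events where updated_at > ?",
        (last_run_time,))

incremental_merge(conn, "2026-04-18T10:00")
```

After the merge, fact_orders holds the untouched row 2, the updated row 1, and the new row 3—without ever scanning rows older than the cutoff.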

Real-World Example: E-Commerce Dashboard

Let’s walk through a concrete example: a real-time e-commerce dashboard showing orders, revenue, and top products.

Baseline requirements:

  • 50 concurrent users
  • Dashboard with 8 charts
  • Data freshness: 30 seconds acceptable
  • Data volume: 500M orders, growing 5M/day

Architecture:

  1. Database: ClickHouse, partitioned by day. Fact table contains orders with created_at and updated_at timestamps.
  2. Materialized view: Pre-aggregates to hourly granularity by product and region. Refreshed incrementally every 5 minutes.
  3. Superset cache: 30-second TTL on all charts. Celery task refreshes cache every 5 minutes.
  4. Query optimization: Each chart queries the materialized view (10K rows) instead of the fact table (500M rows).

Result: Dashboard loads in under 500ms (from cache) with data 30-60 seconds fresh. Query latency (when cache misses) is 2-3 seconds. Database CPU stays at 15% even during peak traffic.

Without this approach: same dashboard would query the raw 500M-row fact table, taking 45+ seconds per query, requiring 8x the database resources, and creating a poor user experience.

Handling Cache Invalidation

Cache invalidation is notoriously hard (“There are only two hard things in Computer Science: cache invalidation and naming things”). Superset provides several patterns:

Time-based expiration: The simplest and most robust. Set a TTL and let cache expire automatically. Trade some staleness for simplicity and reliability.

Event-based invalidation: When data changes in your source system, explicitly invalidate relevant Superset caches. This requires integration between your application and Superset (via API) and works well for high-value dashboards where staleness is costly.

Scheduled refresh: Proactively refresh cache on a schedule (every 5 minutes) rather than waiting for users to request it. Requires Celery and additional infrastructure but delivers instant results to users.

Manual invalidation: Superset UI allows admins to manually clear cache. Use this for one-off scenarios (data corrections, backfills) but don’t rely on it for operational freshness.

The recommended pattern: combine time-based expiration (primary) with scheduled refresh (secondary). Set TTL to 30-60 seconds. Schedule Celery tasks to refresh cache every 5 minutes. If a user opens a dashboard 5 seconds after refresh, they get instant cached results. If they open 65 seconds after refresh, cache has expired but Celery is already refreshing in the background.

Monitoring and Observability

You can’t operate real-time dashboards blindly. Set up monitoring for:

Query latency: Track p50, p95, p99 query times. If p95 latency exceeds your TTL, you have a problem—users will see stale cache or slow loads. Alert if latency degrades.

Cache hit rate: Monitor the percentage of queries served from cache. A healthy real-time dashboard should have 80%+ cache hit rate. Low hit rate indicates insufficient TTL or cache size.

Database resource usage: CPU, memory, disk I/O. Real-time dashboards should not spike resource usage. If they do, your queries aren’t optimized or your cache isn’t working.

Dashboard load time: Measure end-to-end time from user click to rendered dashboard. Superset’s built-in metrics help here. Track p50 and p95.

Celery task execution: If using scheduled refresh, monitor task duration and success rate. Failing refresh tasks mean cache goes stale.

Set up a simple Prometheus/Grafana stack to track these metrics. Even basic monitoring prevents surprises.
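Even before a Prometheus stack is wired up, the two core numbers are easy to compute from query logs—a sketch, with a production deployment exporting these as Prometheus metrics instead:

```python
import statistics

def latency_percentiles(latencies_ms):
    """p50/p95/p99 from a list of per-query latencies in milliseconds."""
    qs = statistics.quantiles(latencies_ms, n=100, method="inclusive")
    return {"p50": qs[49], "p95": qs[94], "p99": qs[98]}

def cache_hit_rate(hits, misses):
    """Fraction of requests served from cache; target 0.8+ for real-time dashboards."""
    total = hits + misses
    return hits / total if total else 0.0

# 90 fast cached responses and 10 slow cache-miss queries
stats = latency_percentiles([100] * 90 + [2000] * 10)
rate = cache_hit_rate(hits=850, misses=150)
```

If p95 creeps above your cache TTL, as discussed above, users start hitting slow loads instead of warm cache—alert on that crossover, not just on averages.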

Comparing Approaches: Streaming vs. Non-Streaming

When should you actually build streaming infrastructure instead of using the patterns described here?

Use non-streaming (Superset cache + incremental loads) when:

  • Latency tolerance is 10+ seconds
  • Data volume is under 10B rows
  • Concurrent users are under 500
  • You want minimal operational overhead
  • Cost is a constraint

Consider streaming (Kafka + Pinot or Flink) when:

  • Sub-second latency is mandatory
  • Data arrives in high-velocity streams (millions of events/second)
  • You need to detect anomalies in real-time
  • You already have streaming infrastructure
  • You have dedicated platform/data engineering teams

Most companies fall into the first category. Building real-time dashboards with Apache Superset using cache and incremental loads is the pragmatic default. Streaming is a specialized tool for specialized problems.

Integration with D23 Managed Superset

D23 simplifies this entire stack by managing Superset infrastructure, caching, and optimization for you. Instead of configuring Redis, tuning Celery, and debugging cache invalidation, you define cache policies and D23 handles the rest.

D23’s approach:

  • Pre-configured caching: Sensible defaults for TTL, Redis sizing, and connection pooling. Adjust per dashboard without touching infrastructure.
  • Incremental load optimization: D23’s data consulting team helps design materialized views and ETL patterns that work with Superset’s caching.
  • Database integration: D23 connects Superset to your ClickHouse, Pinot, Snowflake, or PostgreSQL instance, handling connection pooling and optimization.
  • Monitoring: Built-in observability for query latency, cache hit rate, and dashboard performance.
  • API-first design: D23 exposes Superset’s caching and refresh APIs, letting you integrate with your application workflow.

This matters because cache configuration and incremental load design are where most teams stumble. D23’s expertise accelerates the path from “we want real-time dashboards” to “our dashboards are fast and our database isn’t melting.”

Common Pitfalls and How to Avoid Them

Pitfall 1: Cache TTL too long: You configure 5-minute TTL to reduce database load. Users complain that dashboards show stale data. Solution: Start with 30-60 second TTL and increase only if you hit database limits. Cache is cheap; staleness is expensive.

Pitfall 2: Queries that can’t be optimized: A dashboard chart runs a complex query joining 5 tables, taking 20 seconds. You can’t cache away bad query design. Solution: Optimize queries before caching. Use virtual datasets and pre-aggregated views. If a query takes 20 seconds, cache won’t fix it—redesign it.

Pitfall 3: Cache size too small: Redis runs out of memory, evicting old cache entries. Cache hit rate drops. Solution: Monitor Redis memory usage. Size Redis to hold 2-3 hours of cache at peak load. Use an eviction policy such as allkeys-lru so Redis automatically removes the least-recently-used entries.

Pitfall 4: Incremental loads with stale timestamps: Your ETL loads only rows where updated_at > last_run_time. But a row’s updated_at is never updated (it’s set once at creation). You miss corrections and backfills. Solution: Use a separate updated_at column that is actually maintained on every write, or use change data capture (CDC) to track all changes, not just new rows.
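One way to guarantee updated_at actually moves is a database trigger—sketched here in SQLite syntax for portability; PostgreSQL would use a BEFORE UPDATE trigger function, and warehouses typically rely on ETL or CDC instead:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table orders (
        id integer primary key,
        status text,
        updated_at text default (datetime('now'))
    );
    -- Bump updated_at on every UPDATE so incremental loads see the change.
    -- (Safe from self-recursion: SQLite's recursive_triggers pragma is off by default.)
    create trigger orders_touch after update on orders
    for each row
    begin
        update orders set updated_at = datetime('now') where id = new.id;
    end;
    insert into orders (id, status, updated_at)
        values (1, 'pending', '2000-01-01 00:00:00');
""")

# Any status change now refreshes updated_at automatically.
conn.execute("update orders set status = 'shipped' where id = 1")
status, updated_at = conn.execute(
    "select status, updated_at from orders where id = 1").fetchone()
```

With the trigger in place, the `updated_at > last_run_time` predicate in your incremental load catches corrections as well as inserts.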

Pitfall 5: No monitoring: You launch real-time dashboards. Everything works for a week. Then a slow query appears, database load spikes, and users complain. You have no data on what changed. Solution: Set up monitoring from day one. Track query latency, cache hit rate, and database resources. Alert on anomalies.

Advanced: Text-to-SQL and AI-Assisted Real-Time Queries

AI is changing what’s possible with real-time dashboards. Large language models can translate natural language to SQL, letting non-technical users ask questions of real-time data.

D23 integrates text-to-SQL capabilities, allowing users to ask “what was revenue yesterday by region?” and receive a real-time dashboard chart. The LLM generates SQL, Superset executes it against your ClickHouse or PostgreSQL instance, and results are cached and served.

This pattern combines:

  • LLM-generated SQL: Fast, accurate translation of natural language to queries
  • Superset execution: Optimized query execution with caching
  • Real-time freshness: Short TTLs ensure users get current answers

The result is a dramatically better user experience—users don’t need to know SQL or dashboard design. They ask questions and get answers.
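The flow can be sketched end-to-end with a stubbed model call—fake_llm_to_sql below is a hypothetical stand-in; a real system would prompt an LLM with the database schema and validate the generated SQL before executing it:

```python
import sqlite3

def fake_llm_to_sql(question):
    """Hypothetical stand-in for an LLM call that translates a question to SQL."""
    if "revenue" in question and "region" in question:
        return "select region, sum(amount) from orders group by region order by region"
    raise ValueError("unsupported question")

_cache = {}  # question -> result rows (a real system would attach a short TTL)

def ask(conn, question):
    """Generate SQL, execute it, and cache the result for repeat askers."""
    if question not in _cache:
        _cache[question] = conn.execute(fake_llm_to_sql(question)).fetchall()
    return _cache[question]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table orders (region text, amount real);
    insert into orders values ('EU', 100.0), ('EU', 50.0), ('US', 75.0);
""")
rows = ask(conn, "what was revenue yesterday by region?")
```

The caching layer matters here just as much as for hand-built charts: two users asking the same question share one query, and short TTLs keep the answer current.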

Conclusion: Real-Time Without Complexity

Real-time dashboards in Apache Superset don’t require streaming infrastructure, Kafka, or dedicated platform teams. They require:

  1. Short cache TTLs (30-60 seconds) to ensure freshness
  2. Incremental ETL to keep query times fast
  3. Database optimization through materialized views and partitioning
  4. Proper Superset configuration for caching, timeouts, and async execution
  5. Monitoring to catch problems early

Start with these fundamentals. If you hit performance limits, optimize queries and increase cache size. Only if latency requirements drop below 5 seconds should you consider streaming infrastructure.

Most companies never get there. Most dashboards work beautifully with 30-60 second freshness, aggressive caching, and a well-designed data warehouse. The result is real-time dashboards that feel responsive, cost a fraction of traditional BI platforms, and require minimal operational overhead.

If you’re evaluating Apache Superset for real-time dashboards, focus on database design and caching strategy. That’s where the wins happen. If you need guidance designing materialized views, configuring cache policies, or integrating with tools like ClickHouse or Pinot, D23’s data consulting and managed Superset platform provide the expertise and infrastructure to make it work at scale.