Apache Superset Audit Logging: Compliance-Ready Patterns
Learn how to implement audit logging in Apache Superset for SOC 2, HIPAA, and PCI DSS compliance. Real patterns, code examples, and best practices for production deployments.
Why Audit Logging Matters in Apache Superset
Audit logging is not a nice-to-have feature in modern analytics platforms—it’s a requirement. When you’re running Apache Superset at scale across your organization, you need to answer critical questions: Who accessed which dashboard? When did that SQL query run? Did anyone modify a sensitive data source? What changed in that alert configuration?
These aren’t just operational questions. They’re compliance questions. If you’re building toward SOC 2 Type II certification, HIPAA compliance, or PCI DSS standards, audit logging is your evidence layer. It’s the difference between “we think we’re compliant” and “we can prove we’re compliant.”
Apache Superset’s audit logging capabilities have evolved significantly, and when configured properly, they provide production-grade compliance-ready patterns that rival expensive enterprise platforms. The challenge is that the default logging configuration is minimal, and compliance-grade patterns aren’t always obvious to engineering teams new to Superset.
This guide walks through the complete picture: why audit logging matters for compliance, how Superset’s audit system works under the hood, practical implementation patterns, and real-world examples you can deploy today.
Understanding Superset’s Audit Event System
Apache Superset tracks user actions through an event-based architecture. Every significant action—logging in, viewing a dashboard, executing a query, modifying a chart, accessing a data source—generates an event. These events flow through Superset’s logging pipeline and can be captured, stored, and analyzed.
The core mechanism is DBEventLogger, Superset’s default event logging class. It writes events to the logs table in Superset’s metadata database, recording:
- User identity: who performed the action (user_id, joinable to the ab_user table for the username)
- Action type: what they did (action names vary across Superset versions)
- Resource identifiers: which dashboard (dashboard_id) or chart (slice_id) was involved
- Timestamp: when it happened (the dttm column, in UTC)
- Request context: referrer and request duration
- Event payload: a serialized json column carrying action-specific details such as the executed SQL, database and schema names, and error or status information
This is fundamentally different from application logs. Application logs capture system behavior and errors. Audit logs capture user behavior and data access patterns. For compliance, you need both, but audit logs are the compliance-critical layer.
When you’re evaluating Superset audit logging patterns against competitors like Looker or Tableau, the key difference is that Superset gives you direct control over where logs are stored and how they’re processed. You’re not locked into a vendor’s logging infrastructure. This is critical for organizations that need to integrate audit logs with existing SIEM systems or compliance platforms.
Enabling Event Logging in Superset Configuration
Event logging in Superset is controlled through the superset_config.py file (or environment variables in containerized deployments). Here’s the fundamental configuration:
# Enable event logging (EVENT_LOGGER takes a single logger instance;
# DBEventLogger is the default in recent Superset versions)
from superset.utils.log import DBEventLogger

EVENT_LOGGER = DBEventLogger()

# Optional: emit metrics to StatsD (in addition to database logging)
from superset.stats_logger import StatsdStatsLogger

STATS_LOGGER = StatsdStatsLogger(host="localhost", port=8125, prefix="superset")
The DBEventLogger is the most common choice for compliance because it writes directly to your Superset metadata database, making logs queryable through SQL. The logs table itself is created by Superset’s database migrations (superset db upgrade), so it exists in any standard installation.
For organizations running D23’s managed Superset platform, this configuration is pre-enabled and optimized for compliance workloads. The logging infrastructure is isolated, encrypted at rest, and backed by automated retention policies.
But if you’re self-hosting or running Superset on your own infrastructure, you need to ensure the logging database has adequate capacity. Audit logs grow quickly. A mid-market organization with 500 active users and 200 dashboards can generate 10,000-50,000 events per day. Over a year, that’s 3.6-18 million log records. Your database needs to handle this volume without impacting query performance.
This is where many teams stumble. They enable logging, but don’t provision separate storage or implement log rotation. After six months, the logs table has 10 million rows, and dashboard queries start timing out because the logging infrastructure is competing for I/O.
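A quick PostgreSQL check on table size and growth can catch this early before queries start timing out (dttm is the event timestamp column in the logs table):

```sql
-- How big is the audit table, and how fast is it growing?
SELECT pg_size_pretty(pg_total_relation_size('logs')) AS logs_size;

SELECT COUNT(*) AS events_last_7_days
FROM logs
WHERE dttm > NOW() - INTERVAL '7 days';
```

Run these on a schedule and alert when growth outpaces your provisioned storage.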
Database Schema and Event Storage
Understanding what Superset actually stores in the audit log table is critical for compliance reporting. The logs table is deliberately narrow; most action-specific detail lands in a serialized json column. The schema (PostgreSQL flavor; exact types vary by Superset version and backing database):
CREATE TABLE logs (
    id SERIAL PRIMARY KEY,
    action VARCHAR(512),
    user_id INTEGER REFERENCES ab_user (id),
    dashboard_id INTEGER,
    slice_id INTEGER,
    json TEXT,              -- serialized event payload (SQL text, schema, errors, ...)
    dttm TIMESTAMP,         -- event time (UTC)
    duration_ms INTEGER,    -- request duration
    referrer VARCHAR(1024)
);
Key fields for compliance:
- action: the event type; names vary across Superset versions, so run SELECT DISTINCT action FROM logs to see what your deployment emits
- user_id: user attribution (join to ab_user for the username), critical for SOC 2 evidence
- dashboard_id and slice_id (chart): resource identification
- json: the serialized payload; for SQL Lab events it typically carries the executed SQL and database/schema names, important for data access audits. Error details from failed attempts also surface here, which matters for security monitoring
- duration_ms: a query performance baseline
- dttm: event timing for incident investigation
For HIPAA or PII-sensitive workloads, the SQL text captured in the json payload deserves special attention. If your queries contain patient identifiers, account numbers, or other protected data, you need to decide whether to:
- Store full SQL: Easier for debugging but creates a secondary data store of sensitive information
- Redact sensitive values: Parse and hash parameter values before logging
- Store query hashes only: Log a hash of the query for deduplication but not the full SQL
A common middle ground is the first option with hardened controls: store full queries, but restrict access to the audit log table itself. This way, your compliance team can review what queries were executed, while only authorized personnel can see the actual query text.
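If you go the redaction route instead, here is a minimal sketch. It is regex-based; a real deployment might prefer a SQL parser such as sqlparse to handle quoting edge cases:

```python
import hashlib
import re


def redact_sql(sql: str) -> str:
    """Replace string literals and long numbers with placeholders before logging."""
    redacted = re.sub(r"'(?:[^']|'')*'", "'<redacted>'", sql)  # string literals
    redacted = re.sub(r"\b\d{4,}\b", "<num>", redacted)        # long numeric literals
    return redacted


def query_fingerprint(sql: str) -> str:
    """Stable hash of the redacted query, useful for the hash-only option."""
    return hashlib.sha256(redact_sql(sql).encode("utf-8")).hexdigest()
```

For example, redact_sql("SELECT * FROM patients WHERE mrn = '12345'") yields "SELECT * FROM patients WHERE mrn = '&lt;redacted&gt;'", so the audit trail records the query shape without the identifier.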
Compliance-Ready Audit Logging Patterns
Now let’s move from configuration to patterns. A compliance-ready audit logging setup has several layers:
Pattern 1: Immutable Audit Log Storage
Audit logs must be immutable. If an attacker (or a careless administrator) can delete or modify log entries, your audit trail is worthless. Here’s how to implement immutable logging:
Use a separate database for audit logs. Don’t store audit logs in the same database as your Superset metadata. This creates a separation of duties. Your Superset application database can be restored, migrated, or modified without touching audit logs.
# In superset_config.py

# Point a custom logger at a dedicated audit database.
# (AUDIT_DATABASE_URI is a custom setting, not a built-in Superset option.)
AUDIT_DATABASE_URI = "postgresql://audit_user:password@audit-db.internal:5432/superset_audit"

from superset.utils.log import AbstractEventLogger

# Custom event logger that writes to the separate database
class ImmutableAuditLogger(AbstractEventLogger):
    def log(self, user_id, action, *args, **kwargs):
        # Connect to AUDIT_DATABASE_URI, not the main metadata database.
        # Use parameterized queries to prevent SQL injection.
        # Enforce append-only semantics (no UPDATE/DELETE grants).
        ...
Enable database-level immutability. PostgreSQL supports immutable tables through constraints and triggers. AWS RDS offers S3 export for immutable backup storage. This ensures that even if someone gains database access, they can’t modify historical logs.
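A minimal sketch of the trigger approach in PostgreSQL, making the logs table effectively append-only at the database level:

```sql
CREATE OR REPLACE FUNCTION reject_log_changes() RETURNS trigger AS $$
BEGIN
    RAISE EXCEPTION 'audit logs are append-only';
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER logs_append_only
    BEFORE UPDATE OR DELETE ON logs
    FOR EACH ROW EXECUTE FUNCTION reject_log_changes();
```

A superuser can still drop the trigger, so pair this with restricted roles and external, versioned backups.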
Pattern 2: Real-Time Log Streaming for SIEM Integration
Compliance teams often need to monitor logs in real-time. Rather than querying the audit table periodically, stream logs to your SIEM (Security Information and Event Management) system as they’re generated.
import json

from kafka import KafkaProducer  # kafka-python
from superset.utils.log import DBEventLogger

class SIEMEventLogger(DBEventLogger):
    def __init__(self):
        super().__init__()
        self.kafka_producer = KafkaProducer(
            bootstrap_servers=['kafka.internal:9092'],
            value_serializer=lambda v: json.dumps(v).encode('utf-8'),
        )

    def log(self, user_id, action, *args, **kwargs):
        # First, log to the database (parent class)
        super().log(user_id, action, *args, **kwargs)

        # Then stream to the SIEM. The payload keys below are illustrative;
        # the exact kwargs passed to log() vary by Superset version.
        event = {
            'timestamp': kwargs.get('timestamp'),
            'user_id': user_id,
            'action': action,
            'resource_id': kwargs.get('resource_id'),
            'ip_address': kwargs.get('ip_address'),
            'status': kwargs.get('status'),
        }
        self.kafka_producer.send('superset-audit', value=event)
This pattern integrates with tools like Splunk, Datadog, or open-source SIEM platforms. Your compliance team gets real-time visibility into who’s accessing what, and you have a centralized audit trail.
Pattern 3: Automated Log Retention and Archival
Most compliance frameworks require audit logs to be retained for 1-3 years. Keeping 3 years of logs in hot storage is expensive. Implement a tiered approach:
-- Nightly job: archive rows older than 90 days.
-- (logs_archive is a hypothetical table with the same columns as logs;
-- compression, encryption, and the push to S3/GCS happen out of band.)
BEGIN;

INSERT INTO logs_archive
SELECT * FROM logs
WHERE dttm < NOW() - INTERVAL '90 days';

DELETE FROM logs
WHERE dttm < NOW() - INTERVAL '90 days';

COMMIT;
This keeps your audit table lean while maintaining compliance retention requirements. Cold storage (S3, GCS) is 10-100x cheaper than hot database storage.
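The compression and encryption step typically runs out of band. A hedged sketch using psql and the AWS CLI (the bucket name and paths are illustrative, and the logs_archive table is the hypothetical archive target from the pattern above):

```sh
#!/bin/sh
# Export the archive table to CSV, compress, and push to S3 with server-side encryption
psql "$AUDIT_DATABASE_URI" -c "\copy (SELECT * FROM logs_archive) TO '/tmp/logs_archive.csv' CSV HEADER"
gzip -f /tmp/logs_archive.csv
aws s3 cp /tmp/logs_archive.csv.gz "s3://my-audit-archive/$(date +%F).csv.gz" --sse aws:kms
```

Pair this with an S3 lifecycle rule that transitions objects to Glacier-class storage after your hot-retention window.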
Pattern 4: Audit Log Querying and Reporting
Compliance audits require specific queries. Build a set of canned reports that your compliance team can run:
-- Who accessed sensitive dashboards in the last 30 days?
-- (is_sensitive is a hypothetical flag; use whatever tagging you have.
-- Action names vary by Superset version; check SELECT DISTINCT action FROM logs.)
SELECT u.username, COUNT(*) AS access_count, MAX(l.dttm) AS last_access
FROM logs l
JOIN ab_user u ON u.id = l.user_id
WHERE l.action = 'view_dashboard'
  AND l.dashboard_id IN (SELECT id FROM dashboards WHERE is_sensitive = true)
  AND l.dttm > NOW() - INTERVAL '30 days'
GROUP BY u.username
ORDER BY access_count DESC;
-- What SQL queries ran against the production database in the last 7 days?
-- (The executed SQL and database name live in the serialized json payload;
-- inspect sample rows to build a precise filter.)
SELECT u.username, l.json, l.duration_ms, l.dttm
FROM logs l
JOIN ab_user u ON u.id = l.user_id
WHERE l.action = 'sql_json'
  AND l.json LIKE '%production%'
  AND l.dttm > NOW() - INTERVAL '7 days'
ORDER BY l.dttm DESC;
-- Users with repeated failed actions in the last 24 hours (security monitoring)
SELECT u.username, l.action, COUNT(*) AS failure_count, MAX(l.dttm) AS last_failure
FROM logs l
JOIN ab_user u ON u.id = l.user_id
WHERE l.json LIKE '%error%'   -- crude filter; refine against your payloads
  AND l.dttm > NOW() - INTERVAL '24 hours'
GROUP BY u.username, l.action
HAVING COUNT(*) > 5;
These queries become your compliance evidence. When an auditor asks “Who accessed patient data dashboards in Q3?” you have a query ready to go.
SOC 2 Type II Compliance Patterns
SOC 2 Type II certification requires evidence that your controls are operating effectively over time. For Superset, this means:
CC6.1 - Logical Access Control: Evidence that user access is controlled and monitored.
- Query: Show all users with dashboard access, their roles, and when access was granted
- Audit log evidence: logs table filtered by action = 'update_user_permissions' or action = 'view_dashboard'
CC7.2 - System Monitoring: Evidence that system activity is monitored and recorded.
- Query: Show all SQL queries executed in the last 90 days, who ran them, and their execution time
- Audit log evidence: logs table filtered by action = 'execute_sql', with the full SQL text from the event payload
A1.1 - Change Management: Evidence that changes to dashboards, charts, and data sources are tracked.
- Query: Show all modifications to production dashboards, who made them, and when
- Audit log evidence: logs table filtered by action IN ('edit_dashboard', 'edit_chart', 'delete_dashboard')
When you’re preparing for a SOC 2 audit, your auditor will ask to see these logs. If you don’t have them, you fail. If you have them but they’re incomplete or tampered with, you fail. Audit logging is the foundation of your compliance posture.
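As one concrete example, the CC6.1 evidence pull might look like this (action names and the quarter boundaries are illustrative; verify action names on your Superset version):

```sql
-- Sample CC6.1 evidence query: dashboard access during Q3
SELECT u.username, l.action, l.dashboard_id, l.dttm
FROM logs l
JOIN ab_user u ON u.id = l.user_id
WHERE l.action = 'view_dashboard'
  AND l.dttm >= DATE '2024-07-01'
  AND l.dttm <  DATE '2024-10-01'
ORDER BY l.dttm;
```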
HIPAA-Specific Audit Logging Requirements
HIPAA (Health Insurance Portability and Accountability Act) has stricter audit logging requirements than SOC 2. The HIPAA Security Rule (§164.312(b)) requires:
Audit controls: Implement hardware, software, and procedural mechanisms that record and examine access and activity in information systems containing ePHI (electronic Protected Health Information).
For Superset deployments handling healthcare data, this means:
- Log all access to ePHI dashboards: Every view, every query, every export
- Capture user identity and context: username, employee ID, timestamp, and where available the source IP address
- Track modifications: Who changed what, when, and why
- Implement access controls: Only authorized personnel can view audit logs
- Retain logs for 6 years: HIPAA’s standard retention period
Here’s a HIPAA-focused audit logging implementation:
import hashlib

from superset.utils.log import DBEventLogger

class HIPAACompliantAuditLogger(DBEventLogger):
    HIPAA_SENSITIVE_ACTIONS = [
        'view_dashboard',    # Any ePHI dashboard view
        'execute_sql',       # Any query against patient data
        'export_data',       # Any data export
        'edit_dashboard',    # Modifications to ePHI dashboards
        'delete_dashboard',  # Deletion of ePHI dashboards
    ]

    def log(self, user_id, action, *args, **kwargs):
        super().log(user_id, action, *args, **kwargs)

        # For HIPAA-sensitive actions, also log to a separate immutable store
        if action in self.HIPAA_SENSITIVE_ACTIONS:
            self.log_to_hipaa_audit_store({
                'user_id': user_id,
                'action': action,
                'resource_id': kwargs.get('resource_id'),
                'timestamp': kwargs.get('timestamp'),
                'ip_address': kwargs.get('ip_address'),
                'user_agent': kwargs.get('user_agent'),
                'query_hash': self.hash_query(kwargs.get('sql_text')),
                'status': kwargs.get('status'),
            })

    @staticmethod
    def hash_query(sql_text):
        # Fingerprint the query instead of storing ePHI-bearing SQL verbatim
        return hashlib.sha256((sql_text or '').encode('utf-8')).hexdigest()

    def log_to_hipaa_audit_store(self, event):
        # Write to immutable, encrypted storage.
        # Implement 6-year retention.
        # Restrict access to the compliance team only.
        ...
The key HIPAA differences from SOC 2: longer retention (six years, versus the roughly one-year audit window most SOC 2 programs cover), stricter access controls on the audit logs themselves, and explicit documentation of your audit logging procedures.
Monitoring and Alerting on Audit Logs
Audit logs are only useful if you’re actively monitoring them. Set up alerts for suspicious patterns:
# Pseudocode sketches: logs_in_last_hour is assumed to be a queryable
# collection of recent log rows, and alert() a notification hook.

# Alert: unusual access patterns (possible data exfiltration)
for user_id, views in logs_in_last_hour.filter(action='view_dashboard').count_by('user_id'):
    if views > 100:
        alert(f"User {user_id} accessed 100+ dashboards in 1 hour")

# Alert: unauthorized access attempts
if logs_in_last_hour.filter(has_error=True).count() > 10:
    alert("10+ failed access attempts in 1 hour")

# Alert: modifications outside business hours
for event in logs_in_last_hour.filter(action__in=['edit_dashboard', 'delete_dashboard']):
    if event.dttm.hour < 9 or event.dttm.hour >= 17:
        alert(f"Dashboard modified outside business hours by user {event.user_id}")

# Alert: sensitive data access
for event in logs_in_last_hour.filter(dashboard_id__in=SENSITIVE_DASHBOARDS):
    alert(f"Access to sensitive dashboard {event.dashboard_id} by user {event.user_id}")
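The same checks can run directly in SQL on a schedule. For example, the access-volume alert translates to (PostgreSQL syntax; the action name is illustrative and varies by Superset version):

```sql
-- Users who viewed 100+ dashboards in the last hour
SELECT user_id, COUNT(*) AS views
FROM logs
WHERE action = 'view_dashboard'
  AND dttm > NOW() - INTERVAL '1 hour'
GROUP BY user_id
HAVING COUNT(*) > 100;
```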
These alerts give you real-time visibility into potential compliance violations. Community discussions on Superset’s monitoring capabilities show that teams are increasingly using these patterns for security monitoring.
Practical Implementation: Step-by-Step
Here’s how to implement audit logging in your Superset deployment:
Step 1: Update superset_config.py
import os

from superset.stats_logger import StatsdStatsLogger
from superset.utils.log import DBEventLogger

# Enable event logging (EVENT_LOGGER takes a single logger instance)
EVENT_LOGGER = DBEventLogger()

# Optional: emit metrics to StatsD as well
STATS_LOGGER = StatsdStatsLogger(
    host=os.getenv("STATSD_HOST", "localhost"),
    port=int(os.getenv("STATSD_PORT", "8125")),
    prefix="superset",
)
Step 2: Provision Database Storage
Ensure your database has adequate space for audit logs. For a mid-market deployment:
-- Optional: a separate tablespace so audit I/O doesn't compete with application data
CREATE TABLESPACE audit_space LOCATION '/var/lib/postgresql/audit';

-- The logs table is created by Superset's migrations; move it afterwards
ALTER TABLE logs SET TABLESPACE audit_space;

-- Add indexes for common queries
CREATE INDEX idx_logs_dttm ON logs (dttm);
CREATE INDEX idx_logs_user_id ON logs (user_id);
CREATE INDEX idx_logs_action ON logs (action);
CREATE INDEX idx_logs_dashboard_id ON logs (dashboard_id);
Step 3: Restart Superset
# Docker
docker-compose restart superset
# Kubernetes
kubectl rollout restart deployment/superset
Step 4: Verify Logging is Working
-- Check for recent log entries
SELECT COUNT(*) FROM logs WHERE dttm > NOW() - INTERVAL '1 hour';
-- View recent actions by volume
SELECT action, COUNT(*) FROM logs GROUP BY action ORDER BY COUNT(*) DESC;
Step 5: Configure Log Retention and Archival
Set up automated archival (see Pattern 3 above) to manage storage costs.
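Assuming the archival SQL is saved as a script, a nightly cron entry could drive it (the path and connection-string variable are illustrative):

```
# crontab: run audit-log archival nightly at 02:00
0 2 * * * psql "$AUDIT_DATABASE_URI" -f /opt/superset/archive_old_logs.sql
```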
Comparing Superset Audit Logging to Competitors
How does Superset’s audit logging stack up against Looker, Tableau, and Power BI?
Looker: Looker has built-in audit logging, but it’s a black box. You can’t see exactly what’s being logged or modify the schema. For Superset, you have full control.
Tableau: Tableau’s audit logs are available through their REST API, but they’re limited to 1 year of history. Superset lets you retain as long as you want.
Power BI: Power BI logging requires Azure audit logs, which adds complexity and cost. Superset logs directly to your database.
Metabase: Metabase has basic audit logging, but it’s not designed for compliance workloads. The logs are stored in the same database as application data, making them vulnerable to tampering.
Superset’s advantage: You own the audit infrastructure. You can integrate with your existing SIEM, implement immutability, and maintain full compliance control. This is why organizations evaluating managed Superset solutions like D23 often choose it over enterprise alternatives—better compliance, lower cost, and more flexibility.
Advanced: Custom Event Loggers
For specialized compliance needs, you can build custom event loggers. Here’s an example that forwards events to a webhook, useful for feeding a compliance or GRC platform:
import logging
import os
from datetime import datetime, timezone

import requests
from superset.utils.log import AbstractEventLogger

logger = logging.getLogger(__name__)

class WebhookEventLogger(AbstractEventLogger):
    def __init__(self):
        self.webhook_url = os.getenv("COMPLIANCE_WEBHOOK_URL")

    def log(self, user_id, action, *args, **kwargs):
        # Payload keys are illustrative; extend with whatever kwargs
        # your Superset version passes through.
        event = {
            'timestamp': datetime.now(timezone.utc).isoformat(),
            'user_id': user_id,
            'action': action,
            'resource_id': kwargs.get('resource_id'),
            'ip_address': kwargs.get('ip_address'),
            'status': kwargs.get('status'),
        }
        try:
            requests.post(
                self.webhook_url,
                json=event,
                timeout=5,
                headers={'Authorization': f'Bearer {os.getenv("COMPLIANCE_API_KEY")}'},
            )
        except Exception:
            # Don't let logging failures break the application
            logger.exception("Failed to send audit event")
Superset’s EVENT_LOGGER setting takes a single logger instance, so to fan events out to multiple destinations, wrap your loggers in a small composite:
class CompositeEventLogger(AbstractEventLogger):
    def __init__(self, *loggers):
        self.loggers = loggers

    def log(self, *args, **kwargs):
        for event_logger in self.loggers:
            event_logger.log(*args, **kwargs)

EVENT_LOGGER = CompositeEventLogger(
    DBEventLogger(),        # Database
    WebhookEventLogger(),   # Webhook
    SIEMEventLogger(),      # SIEM
)
Each logger receives the same event and can process it independently. This gives you flexibility to route different events to different systems.
Troubleshooting Common Audit Logging Issues
Issue: Logs table is growing too fast and impacting performance
Solution: Implement log archival (Pattern 3) and add indexes on frequently queried columns.
Issue: Sensitive data appearing in SQL text
Solution: Implement query redaction or hash the SQL before logging. Store full queries only in an access-controlled audit database.
Issue: Audit logs not being generated for certain actions
Solution: Check that EVENT_LOGGER is properly configured and Superset has been restarted. Not all actions generate events—only user-initiated actions. System background jobs don’t generate audit logs.
Issue: Compliance team can’t access audit logs
Solution: Create a read-only database user for the compliance team. Use row-level security (RLS) to limit visibility to specific dashboards or time periods if needed.
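In PostgreSQL, that read-only user might look like this (the role name and password are placeholders):

```sql
-- Read-only access to audit logs for the compliance team
CREATE ROLE compliance_ro LOGIN PASSWORD 'change-me';
GRANT CONNECT ON DATABASE superset TO compliance_ro;
GRANT USAGE ON SCHEMA public TO compliance_ro;
GRANT SELECT ON logs, ab_user TO compliance_ro;
```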
Integrating Audit Logs with D23
If you’re running D23’s managed Superset platform, audit logging is pre-configured and optimized for compliance. Your audit logs are:
- Automatically stored in an immutable, encrypted database
- Backed up daily with 90-day retention
- Accessible through a compliance-grade API
- Integrated with common SIEM platforms
- Monitored for suspicious patterns in real-time
You get the compliance benefits of Superset without the operational overhead of managing the logging infrastructure yourself. This is particularly valuable for organizations with strict compliance requirements but limited internal infrastructure expertise.
Best Practices Summary
- Enable audit logging immediately: Don’t wait until you need it for compliance. Make it a day-one configuration.
- Use a separate database for logs: Keep audit logs separate from application data to ensure immutability and performance isolation.
- Implement log retention policies: Archive logs older than 90 days to cold storage. Retain for 1-3 years depending on compliance requirements.
- Monitor logs in real-time: Set up alerts for suspicious patterns. Audit logs are only useful if you’re actively monitoring them.
- Integrate with your SIEM: Stream logs to your centralized security monitoring system for enterprise-wide visibility.
- Document your audit procedures: Compliance auditors will ask how you’re logging, retaining, and protecting audit data. Have documentation ready.
- Test your audit logging: Periodically verify that logs are being generated correctly. Run test queries and confirm they appear in the audit table.
- Restrict access to audit logs: Only authorized personnel (compliance team, security team, executives) should be able to query audit logs.
Conclusion
Audit logging is the foundation of compliance-ready analytics. Apache Superset’s event logging capabilities, when properly configured, provide production-grade compliance evidence that rivals expensive enterprise platforms. The key is understanding how the logging infrastructure works, implementing immutable storage patterns, and actively monitoring for compliance violations.
Whether you’re building toward SOC 2, HIPAA, or PCI DSS certification, Apache Superset audit logging patterns give you the tools to prove compliance. And if managing the compliance infrastructure is a burden, D23’s managed Superset service handles the operational complexity while you focus on analytics.
Start with the basics: enable DBEventLogger, provision adequate database storage, and build a few compliance queries. From there, you can layer in advanced patterns like SIEM integration, real-time alerting, and custom event loggers. Your compliance team will thank you, and your auditors will have the evidence they need.