Guide April 18, 2026 · 19 mins · The D23 Team

Portfolio Operations Playbook: One Source of Truth Across 20 Companies

Build a unified analytics platform across your PE portfolio. Learn how to standardize reporting, reduce data silos, and drive value creation with managed Apache Superset.

The Portfolio Analytics Problem: Why Most PE Firms Still Use Spreadsheets

You’re running a portfolio of 20 companies. Each one has a different tech stack, different data warehouse, different reporting cadence. Your CFO needs consolidated EBITDA by vertical. Your operations team needs to compare SaaS churn rates across portfolio companies. Your board needs clean KPI dashboards by Friday.

Instead, you get seventeen different Excel files, three Tableau instances you’re not sure who owns, and a Slack channel where someone asks “can someone send me the Q3 numbers?” every month.

This is the state of portfolio operations at most private equity firms. And it’s expensive—not just in time, but in decision velocity, audit readiness, and the ability to actually see where value is being created or destroyed across your portfolio.

The solution isn’t another enterprise BI tool or a sprawling data lake that takes eighteen months to build. It’s a single, standardized analytics platform—one that your portfolio companies can plug into, that your operating partners can use without training, and that your data team can actually maintain without burning out.

This playbook walks you through building exactly that: a unified source of truth for portfolio operations using managed Apache Superset, API-first architecture, and a repeatable playbook for data integration across companies with wildly different starting points.

Why Apache Superset for Portfolio Analytics

Before we get into the mechanics, let’s establish why Superset is the right foundation for this problem.

Superset is open-source, lightweight, and designed to run on your infrastructure or in a managed environment. Unlike Looker or Tableau—which charge per user, per seat, and lock you into their data modeling layer—Superset is flexible. You define your data model once, and every portfolio company connects to the same source of truth. No licensing surprises. No vendor lock-in.

More importantly, Superset has become the de facto standard for embedded analytics. If you’re building dashboards that your portfolio companies’ teams will use internally, or if you want to embed KPI dashboards into your own portfolio management platform, Superset’s REST API and embedding capabilities make that straightforward.

For a PE firm managing 20+ companies, this matters because you’re not just building dashboards for yourself. You’re building a system that portfolio companies can adopt, that your operating partners can use to monitor performance, and that your CFO can trust as the single source of truth for reporting.

Using D23—a managed Apache Superset platform—removes the operational overhead of running Superset yourself. You get a hosted, production-grade instance with AI-powered analytics, API integration, and expert consulting built in. Your data team focuses on the data model, not infrastructure.

The Three Pillars of a Portfolio Analytics Platform

Building a unified analytics system across a portfolio isn’t just a technical problem. It’s an organizational and data governance problem. There are three things that have to work together:

Data Integration and Standardization

Your portfolio companies have different databases, different schemas, different definitions of “revenue” and “customer.” The first pillar is getting all that data into one place with consistent definitions.

This typically means:

  • Centralized data warehouse: A single Snowflake, BigQuery, or Redshift instance that all portfolio companies feed data into. This is the source of truth.
  • ETL/ELT pipelines: Automated processes that extract data from each company’s operational systems (Fivetran, Airbyte) and transform it into a standardized schema (dbt).
  • Data dictionary and governance: A single definition of what “ARR” means, what counts as “active customer,” what the fiscal calendar is. This sounds boring but it’s where most portfolio operations fail.

The goal is that when your CFO asks “what’s our blended NRR across SaaS companies?”, the answer comes from one place, not five.
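To make that concrete, here’s a minimal sketch of what “one place” looks like, using an in-memory SQLite database as a stand-in for the warehouse. The mrr_rollup table and its column names are illustrative assumptions, not part of any fixed schema; NRR is computed here in its simplest form (beginning MRR plus expansion, minus contraction and churn, over beginning MRR).

```python
import sqlite3

# Hypothetical rollup table; names and columns are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE mrr_rollup (
        company_id TEXT, month TEXT,
        beginning_mrr REAL, expansion REAL, contraction REAL, churned REAL
    )
""")
conn.executemany(
    "INSERT INTO mrr_rollup VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("saas_a", "2026-03", 100_000.0, 8_000.0, 1_000.0, 2_000.0),
        ("saas_b", "2026-03", 250_000.0, 12_000.0, 3_000.0, 9_000.0),
    ],
)

# Blended NRR across all SaaS companies: one query over one table,
# instead of one spreadsheet per company.
row = conn.execute("""
    SELECT SUM(beginning_mrr + expansion - contraction - churned)
           / SUM(beginning_mrr) AS blended_nrr
    FROM mrr_rollup
    WHERE month = '2026-03'
""").fetchone()
print(round(row[0], 4))  # 1.0143
```

The point of the sketch is the shape of the question, not the numbers: because every company feeds the same rollup, “blended NRR” is a single aggregate rather than a manual consolidation.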

Dashboarding and Visualization Layer

The second pillar is making that data accessible. This is where Superset comes in.

You build a set of standard dashboards:

  • Consolidated KPI dashboard: Monthly EBITDA, growth rates, unit economics by vertical
  • Company-level operational dashboards: Revenue, churn, customer acquisition cost, burn rate—tailored to each company’s business model
  • Cohort analysis dashboards: How are 2023 acquisitions performing vs. 2024? Which verticals have the best margins?
  • Risk and alerts dashboard: Which companies are trending below plan? Where are we seeing red flags?

These dashboards live in a single Superset instance. Your CFO logs in and sees everything. Your operating partner for the SaaS vertical logs in and sees only SaaS companies (via role-based access control). A portfolio company’s CEO can view their own KPIs via an embedded dashboard in your portfolio management platform.

AI-Powered Analytics and Self-Service

The third pillar—and this is where you get leverage—is making the system self-service. Not every question needs a data analyst. Not every ad-hoc request should take a week.

This is where AI comes in. Text-to-SQL capabilities let business users ask questions in plain English: “Show me cohort retention for customers acquired in Q2” or “What’s our revenue by geography for the last four quarters?” The system translates that into SQL, runs it against your data warehouse, and returns a result.

For portfolio operations specifically, this means your CFO can ask “which portfolio companies are tracking above plan?” and get an answer without involving your data team. Your operating partners can run their own analyses. Your board gets clean, accurate answers faster.

AI also powers anomaly detection and alerting. If one of your portfolio companies’ unit economics suddenly shifts, you know about it.

Building the Data Foundation: The Standardized Schema

Let’s get concrete. Here’s what the data foundation actually looks like.

You need a central data warehouse (let’s assume Snowflake, but BigQuery or Redshift work the same way). Into that warehouse, you’re loading data from each portfolio company’s operational systems: their accounting software, their CRM, their product database, their HR system.

The key is that you transform all of that into a standardized schema. Here’s a simplified example:

Core Fact Tables:

  • fact_revenue: transaction-level revenue data, standardized across all companies (revenue date, company ID, customer ID, amount, revenue type, currency)
  • fact_expenses: operating expenses by category and company
  • fact_customers: customer acquisition date, cohort, LTV, churn status
  • fact_headcount: headcount by role, department, company, date

Core Dimension Tables:

  • dim_company: portfolio company metadata (name, acquisition date, vertical, geography, target EBITDA)
  • dim_customer: customer attributes (industry, geography, company size, contract value)
  • dim_date: standard calendar table (fiscal vs. calendar, quarters, fiscal year)

This schema is intentionally simple. The point is consistency. Every company’s revenue flows into the same table with the same columns. Every company’s expenses use the same category taxonomy.
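As a rough sketch of what that consistency buys you, here are two of the tables above expressed as DDL, again using SQLite as a stand-in for Snowflake or BigQuery. The column list follows the fact_revenue and dim_company descriptions above; types are simplified and the sample values are invented.

```python
import sqlite3

# Minimal version of the standardized schema; types simplified for SQLite.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_company (
        company_id TEXT PRIMARY KEY,
        name TEXT, vertical TEXT, acquisition_date TEXT
    );
    CREATE TABLE fact_revenue (
        revenue_date TEXT, company_id TEXT, customer_id TEXT,
        amount REAL, revenue_type TEXT, currency TEXT,
        FOREIGN KEY (company_id) REFERENCES dim_company (company_id)
    );
""")
conn.execute(
    "INSERT INTO dim_company VALUES ('co_01', 'Acme SaaS', 'saas', '2023-06-01')")
conn.execute(
    "INSERT INTO fact_revenue VALUES "
    "('2026-03-31', 'co_01', 'cust_9', 12500.0, 'recurring', 'USD')")

# Because every company's revenue lands in the same table, a
# cross-portfolio rollup is a plain join, not a spreadsheet merge.
rows = conn.execute("""
    SELECT c.vertical, SUM(f.amount)
    FROM fact_revenue f JOIN dim_company c USING (company_id)
    GROUP BY c.vertical
""").fetchall()
print(rows)  # [('saas', 12500.0)]
```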

Data integration tools like Fivetran or Airbyte handle the heavy lifting of extracting from each company’s source systems. Your data team configures the transformations once (usually in dbt), and then it runs automatically every day.

The whole thing should be automated. If you’re manually pulling Excel files and uploading them, you’ve already lost.

Setting Up Superset: The Dashboard Layer

Once your data warehouse is built and your ETL pipelines are running, you connect Superset to it.

Superset connects to your data warehouse via standard database drivers (Snowflake, BigQuery, Redshift, Postgres, etc.). You point it at your warehouse, and Superset introspects your schema.

From there, you define your datasets. A dataset in Superset is a SQL query (or a table) that you want to make available for visualization. For portfolio operations, you might define datasets like:

  • portfolio_kpis: A query that pulls consolidated EBITDA, revenue, headcount, and other key metrics for all companies
  • saas_unit_economics: A query that pulls NRR, churn, CAC, and LTV specifically for SaaS portfolio companies
  • quarterly_performance: A query that compares actual vs. plan for each company

Once you’ve defined your datasets, building dashboards is straightforward. You create a new dashboard, add charts and tables, configure filters (by company, by quarter, by vertical), and save it.

The power of Superset is that you’re not locked into a proprietary data model. You write SQL. You have full control. And if you need to change a calculation or add a new metric, you just update the query—no rebuilding the entire data model.

For a PE portfolio, you typically end up with:

  • One consolidated dashboard that your CFO and board use
  • Vertical-specific dashboards (SaaS, healthcare, B2B services, etc.)
  • Company-specific dashboards for each portfolio company
  • Operating partner dashboards with alerts and trend analysis

All of these live in the same Superset instance. Access is controlled via role-based permissions. A portfolio company’s CFO sees their own company’s data. Your CFO sees everything.

API-First Architecture: Connecting Your Portfolio Management Platform

Here’s where most PE firms miss the opportunity: they build beautiful dashboards that live in a BI tool, but they don’t connect them to their actual portfolio management workflow.

Instead, you want your portfolio analytics to be API-first. This means every dashboard, every metric, every data point is accessible via an API that your other systems can call.

Superset’s REST API lets you do this. You can:

  • Query data programmatically (no need to log into Superset)
  • Embed dashboards directly into your portfolio management platform
  • Trigger alerts when metrics fall outside thresholds
  • Export data for further analysis or reporting
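As a sketch of what programmatic access looks like, the snippet below builds the two requests involved in authenticating and listing dashboards. The endpoint paths follow Superset’s REST API (POST /api/v1/security/login returns an access_token; GET /api/v1/dashboard/ is a Bearer-authenticated list call); the base URL and credentials are placeholders, and the actual urlopen calls are left to you.

```python
import json
import urllib.request

SUPERSET_URL = "https://analytics.example.com"  # placeholder; your Superset host

def login_request(username: str, password: str) -> urllib.request.Request:
    # Superset's /api/v1/security/login expects this JSON body and
    # returns {"access_token": "..."} on success.
    body = json.dumps({"username": username, "password": password,
                       "provider": "db", "refresh": True}).encode()
    return urllib.request.Request(
        f"{SUPERSET_URL}/api/v1/security/login", data=body,
        headers={"Content-Type": "application/json"}, method="POST")

def dashboards_request(token: str) -> urllib.request.Request:
    # Authenticated GET against /api/v1/dashboard/ lists dashboards.
    return urllib.request.Request(
        f"{SUPERSET_URL}/api/v1/dashboard/",
        headers={"Authorization": f"Bearer {token}"})

# To execute for real: urllib.request.urlopen(login_request(...)),
# parse out access_token, then urlopen(dashboards_request(token)).
req = login_request("ops_bot", "secret")
print(req.get_method(), req.full_url)
```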

For a PE firm, this looks like:

Portfolio Management Platform Integration: Your internal portfolio management tool (whether that’s a custom app or something like Carta or Altvia) can pull KPIs directly from Superset. When you open a company profile, you see real-time dashboards embedded right there—no separate login, no context switching.

Automated Reporting: Instead of manually pulling data and building monthly reports, your system generates them automatically. KPI reports go out to board members via email every month, with data pulled live from Superset.

Alert Systems: If a portfolio company’s revenue dips below plan by more than 10%, an alert fires automatically. Your operating partner gets notified. No more surprises at board meetings.
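The alert logic itself is simple once the data is standardized. Here’s a minimal sketch of the “more than 10% below plan” check, with invented company IDs and numbers; in practice the actuals would come from a Superset or warehouse query rather than a literal dict.

```python
def off_plan_alerts(actuals: dict, plan: dict, threshold: float = 0.10) -> list:
    """Return company IDs whose actual revenue trails plan by more
    than `threshold` (10% by default)."""
    alerts = []
    for company_id, planned in plan.items():
        actual = actuals.get(company_id, 0.0)
        if planned > 0 and (planned - actual) / planned > threshold:
            alerts.append(company_id)
    return alerts

plan = {"co_01": 1_000_000, "co_02": 500_000, "co_03": 750_000}
actuals = {"co_01": 980_000, "co_02": 430_000, "co_03": 760_000}
print(off_plan_alerts(actuals, plan))  # ['co_02']  (14% below plan)
```

Wire the returned list into email or Slack notifications and the “no surprises at board meetings” promise becomes a scheduled job.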

Portfolio Company Self-Service: Each portfolio company can embed their own dashboards into their internal systems. They see their KPIs in real time, without needing to log into Superset or ask your team for reports.

This is where the leverage really shows up. Once the system is built, it runs itself. Your data team maintains the data model and ETL pipelines. Everything else is automated.

Text-to-SQL and AI-Powered Analytics for Portfolio Operations

Now let’s talk about the AI piece. This is relatively new, but it’s transformative for portfolio operations.

Text-to-SQL (also called natural language to SQL) lets users ask questions in plain English and get SQL queries back. Instead of your CFO asking an analyst “can you show me the top 5 portfolio companies by EBITDA margin,” she just types that question into Superset and gets the answer.

This works because large language models (like GPT-4) have learned to translate natural language into SQL. You give the model your database schema, some examples of good queries, and it can generate accurate SQL for new questions.

For portfolio operations, this is powerful because:

Speed: Ad-hoc questions get answered in seconds, not days. Your CFO can explore “what if” scenarios in real time.

Democratization: Non-technical users can ask complex questions without learning SQL. Your operating partners, your board members, your portfolio company CEOs can all self-serve.

Consistency: Every query runs against the same data model, so everyone’s working from the same numbers.

The catch is that text-to-SQL isn’t magic. It works best when your data model is clean and well-documented. If your schema is a mess, or if your metric definitions are ambiguous, the AI will struggle.

This is why the data foundation (step 1) matters so much. Clean schema + clear definitions = AI that works.
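To illustrate why the schema and definitions matter, here’s a sketch of the context you hand the model before any question is asked. The schema text and the metric note are the load-bearing parts; the actual LLM call is provider-specific and omitted, and all names here are illustrative.

```python
# The schema plus metric definitions are the context the model needs;
# a messy or undocumented schema is exactly what makes this step fail.
SCHEMA_CONTEXT = """\
fact_revenue(revenue_date, company_id, customer_id, amount, revenue_type, currency)
dim_company(company_id, name, vertical, acquisition_date)
-- "ARR" means annualized recurring revenue from active customers.
"""

def build_prompt(question: str) -> str:
    return (
        "You translate questions into SQL for the warehouse below.\n"
        f"Schema:\n{SCHEMA_CONTEXT}\n"
        f"Question: {question}\nSQL:"
    )

prompt = build_prompt("Show revenue by geography for the last four quarters")
# Pass `prompt` to your LLM of choice; validate the returned SQL
# (read-only, known tables) before executing it against the warehouse.
```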

Governance and Access Control Across a Portfolio

Here’s a critical point: you can’t just give everyone access to everything. You need governance.

In Superset, this means:

Role-Based Access Control (RBAC): Different users have different permissions. Your CFO sees all companies. A portfolio company’s CEO sees only their company. An operating partner sees their vertical.

Dataset-Level Permissions: You can restrict which datasets certain users can access. Maybe your portfolio company CEOs can see their own KPIs, but not their unit economics or cost structure.

Row-Level Security (RLS): For sensitive data, you can filter results at the database level. When a portfolio company’s CFO queries the fact_revenue table, the database automatically filters to show only their company’s data.

Audit Logging: Every query, every export, every dashboard view is logged. If there’s a question about who accessed what data and when, you have a record.
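To show what row-level security does at query time, here’s a minimal sketch of the effect: the same fact_revenue query, with a per-user company filter applied automatically. In Superset itself this is configured as an RLS filter clause on the dataset (not hand-written into every query); the user-to-company mapping and values below are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact_revenue (company_id TEXT, amount REAL)")
conn.executemany("INSERT INTO fact_revenue VALUES (?, ?)",
                 [("co_01", 100.0), ("co_02", 250.0)])

# Illustrative mapping of user -> the one company whose rows they may see.
USER_COMPANY = {"ceo_co_01": "co_01", "ceo_co_02": "co_02"}

def query_revenue(username: str) -> list:
    # The effective RLS clause is "company_id = <their company>";
    # the user never sees other companies' rows, whatever they query.
    company = USER_COMPANY[username]
    return conn.execute(
        "SELECT company_id, SUM(amount) FROM fact_revenue "
        "WHERE company_id = ? GROUP BY company_id", (company,)).fetchall()

print(query_revenue("ceo_co_01"))  # [('co_01', 100.0)]
```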

For a PE firm, governance is non-negotiable. You have fiduciary duties. You have audit requirements. You need to know exactly who accessed what data.

The right approach is to build governance into the system from day one, not bolt it on later. D23’s managed platform includes role-based access control, audit logging, and compliance features out of the box.

The Implementation Playbook: From Zero to One Source of Truth

Let’s talk about the actual implementation. How do you go from spreadsheets to a unified analytics platform?

Here’s a realistic timeline and approach:

Phase 1: Foundation (Weeks 1-8)

Weeks 1-2: Data Audit and Schema Design

You start by understanding what data you have. What are your portfolio companies’ source systems? Where does financial data live? Where’s operational data? What’s the current state of reporting?

You interview your CFO, your operating partners, your portfolio company CEOs. What metrics do they care about? What questions do they ask repeatedly?

Then you design your standardized schema. This is the most important step. Get it right, and everything else is easy. Get it wrong, and you’re rebuilding later.

Weeks 3-8: Data Warehouse Setup and ETL Development

You provision a Snowflake (or BigQuery) instance. You set up your core fact and dimension tables. You build pipelines that pull data from each company’s source systems with Airbyte or Fivetran, then transform it into the standardized schema with dbt.

This is the heavy lifting. You’re configuring connectors, writing transformation logic, handling edge cases (different fiscal calendars, different revenue recognition policies, etc.).

By week 8, you should have clean, consistent data flowing into your warehouse on a daily basis.

Phase 2: Dashboarding (Weeks 9-14)

Weeks 9-10: Superset Setup and Dataset Definition

You set up your Superset instance (or sign up for D23). You connect it to your data warehouse. You define your core datasets.

Weeks 11-14: Dashboard Development

You build your core dashboards:

  • Consolidated portfolio KPI dashboard
  • Vertical-specific dashboards
  • Company-specific dashboards
  • Operating partner dashboards

You test with actual users. Your CFO logs in and sees what she needs. Your operating partners can navigate to their vertical. Portfolio company CEOs can see their own data.

Phase 3: API Integration and Automation (Weeks 15-20)

Weeks 15-16: API Integration

You build API connectors so your portfolio management platform can pull data from Superset. You set up embedded dashboards in your internal tools.

Weeks 17-20: Automation and Alerting

You build automated reporting. Monthly board reports generate themselves. Alerts fire when KPIs go off-plan. Your system is now self-serve and automated.

Phase 4: Rollout and Adoption (Weeks 21-24)

You train your users. You document the system. You handle questions and edge cases. By week 24, everyone’s using it.

This timeline assumes you have a dedicated data engineer and a data analyst. If you’re starting from scratch, you might want to partner with a consulting firm that specializes in this (like the team at D23) to accelerate.

Real-World Challenges and How to Handle Them

Let’s be honest about what actually goes wrong:

Challenge 1: Portfolio Companies Won’t Clean Their Data

You set up your standardized schema, and then a portfolio company comes back and says “our revenue data is messy.” Their customers are in one system, their invoices are in another, and nobody’s really sure what the single source of truth is.

Solution: Build a data quality framework into your ETL. Validate data as it comes in. Flag issues. Make it the portfolio company’s responsibility to fix their source data, not your problem to work around.

For critical metrics (revenue, headcount, cash), you might need to audit the data yourself. Have your CFO or a finance manager spot-check the numbers quarterly.
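A “data quality framework” can start very small. Here’s a sketch of the kind of row-level validation that runs before data lands in the warehouse; the checks and field names are illustrative, and real pipelines would typically express these as dbt tests or Airbyte validation rules instead.

```python
def validate_revenue_rows(rows: list) -> list:
    """Flag rows that fail basic quality checks before they reach
    the warehouse, so bad source data surfaces as an issue list
    rather than a silently wrong dashboard."""
    issues = []
    for i, row in enumerate(rows):
        if row.get("amount") is None or row["amount"] < 0:
            issues.append(f"row {i}: bad amount {row.get('amount')!r}")
        if not row.get("company_id"):
            issues.append(f"row {i}: missing company_id")
        if not row.get("revenue_date"):
            issues.append(f"row {i}: missing revenue_date")
    return issues

rows = [
    {"company_id": "co_01", "revenue_date": "2026-03-31", "amount": 1200.0},
    {"company_id": "", "revenue_date": "2026-03-31", "amount": -50.0},
]
print(validate_revenue_rows(rows))  # flags both problems on the second row
```

The flagged issues go back to the portfolio company as their fix list, which keeps the responsibility where the text above puts it.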

Challenge 2: Metric Definitions Keep Changing

Your CFO agrees that “ARR” means “annual recurring revenue from active customers.” Then a portfolio company says “but we count this differently because of our contract structure.” Then another company has a different definition.

Solution: Document everything. Create a data dictionary that defines every metric, every calculation, every assumption. Make it a single source of truth that everyone references.

When there are legitimate differences (because some companies do have different contract structures), create separate metrics. Call them arr_standard and arr_company_specific. Make it clear which is which.

Challenge 3: Performance Degrades as You Scale

Your Superset instance is fast with 5 companies’ data. But as you add the 15th, 20th company, queries slow down. Dashboards take 30 seconds to load.

Solution: This is why you need a managed platform. D23 handles performance optimization, caching, query optimization, and scaling. You don’t have to.

If you’re running Superset yourself, you’ll need to invest in infrastructure: better hardware, caching layers (Redis), query optimization, data warehouse tuning.

Challenge 4: Portfolio Companies Don’t Trust the Data

You roll out your dashboards, and a portfolio company’s CEO says “these numbers don’t match what I see in my accounting system.” Now nobody trusts your platform.

Solution: This usually means there’s a reconciliation issue. Maybe you’re pulling data at different times. Maybe there’s a timezone issue. Maybe you’re counting transactions differently.

Before you roll out to the entire portfolio, do a detailed reconciliation with the first few companies. Make sure your numbers match their source systems exactly. Once you’ve done that, you have proof that the system works.
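The reconciliation check itself can be a one-liner. Here’s a sketch comparing a warehouse total against the company’s source-system total; the 0.5% tolerance is an invented example (to absorb rounding and FX timing), and you would tighten or loosen it per metric.

```python
def reconcile(warehouse_total: float, source_total: float,
              tolerance: float = 0.005) -> bool:
    """True when the warehouse and source-system totals agree within
    `tolerance` (0.5% by default); False means investigate before rollout."""
    if source_total == 0:
        return warehouse_total == 0
    return abs(warehouse_total - source_total) / abs(source_total) <= tolerance

# Warehouse says $1,003,000; the company's accounting system says $1,000,000.
print(reconcile(1_003_000, 1_000_000))  # True  (0.3% gap, within tolerance)
print(reconcile(1_050_000, 1_000_000))  # False (5% gap, investigate)
```

Run this per metric, per company, before anyone outside the data team sees a dashboard; a passing reconciliation report is your proof when a CEO says the numbers look wrong.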

Extending the Platform: MCP Servers and Advanced Integration

Once you have a working analytics platform, you can extend it further using MCP (Model Context Protocol) servers.

MCP is a protocol that lets AI agents (like Claude or ChatGPT) interact with external systems. You can build an MCP server that gives an AI agent access to your portfolio data.

What does this mean in practice? Your CFO could say to an AI assistant: “Generate a board presentation with the latest KPIs, include a section on portfolio company performance vs. plan, and flag any red flags.” The AI agent would query your Superset instance, pull the data, and generate the presentation, all automatically.

This is still emerging, but it’s the future of portfolio analytics. The system doesn’t just answer questions—it proactively surfaces insights.

The Business Case: Why This Matters

Let’s talk about ROI. Why should you invest in a unified analytics platform?

Time Savings: Instead of your CFO spending 20 hours per month pulling data and building reports, it’s automated. That’s 240 hours per year. At a CFO’s salary, that’s easily $100K+ in saved time.

Better Decision Making: When you can see your portfolio’s performance in real time, you make better decisions. You catch problems earlier. You identify opportunities faster. This translates directly to value creation.

Operational Leverage: Once the system is built, adding a new portfolio company is easy. You just plug in their data. You don’t rebuild everything from scratch.

Audit and Compliance: When you need to report to your LPs or to auditors, you have a single source of truth. You can generate reports in minutes, not weeks.

Portfolio Company Efficiency: Your portfolio companies can use the system to run their own operations. They don’t need to wait for reports from your team. This improves their decision velocity and operational efficiency.

For a PE firm managing a 20-company portfolio, a unified analytics platform typically pays for itself in the first year through time savings alone. The strategic benefits (better decision making, faster value creation) are worth multiples of that.

Comparing Approaches: Superset vs. Looker vs. Tableau vs. Power BI

You might be wondering: why Superset instead of Looker, Tableau, or Power BI?

Each has trade-offs:

Looker: Powerful, flexible, but expensive. You pay per user, and it adds up fast with 20 portfolio companies. Looker locks you into its data model (LookML), which is great for some use cases but inflexible if you need to customize heavily.

Tableau: Beautiful visualizations, good for presentations. But similar pricing model to Looker—expensive at scale. And Tableau is more of a presentation layer than an operational analytics system.

Power BI: Good if you’re already in the Microsoft ecosystem. But it’s not as flexible as Superset for embedded analytics or API-first workflows.

Superset: Open-source, flexible, API-first, designed for embedded analytics. You can customize it to your exact needs. And you’re not paying per user—you’re paying for infrastructure, which scales much more efficiently.

For a PE firm, Superset (especially via a managed platform like D23) is the sweet spot. You get flexibility, cost efficiency, and the ability to build exactly what you need.

Key Takeaways: Your Portfolio Operations Playbook

Building a single source of truth across a portfolio of 20+ companies is a big project, but it’s doable. Here’s the summary:

  1. Start with a clean data foundation: A standardized schema in a central data warehouse is everything. Get this right, and everything else is easy.

  2. Use Superset as your analytics layer: It’s flexible, open-source, API-first, and designed for exactly this use case.

  3. Automate everything: Your ETL pipelines should run automatically. Your reports should generate automatically. Your system should require minimal manual intervention.

  4. Make it self-serve: With text-to-SQL and good documentation, your CFO and operating partners should be able to ask questions without involving your data team.

  5. Build governance from day one: Role-based access control, audit logging, and data quality frameworks aren’t optional. They’re foundational.

  6. Plan for 20 companies, build for 50: Design your system to scale. You’ll add more portfolio companies. You don’t want to rebuild.

  7. Partner with experts: If you don’t have deep data engineering expertise in-house, bring in a partner. The investment in getting this right pays for itself quickly.

The firms that have a single source of truth for portfolio operations move faster, make better decisions, and create more value. It’s not magic—it’s just good data infrastructure and disciplined execution.

If you’re building this for your portfolio, D23 can help. We specialize in exactly this: managed Apache Superset, API-first architecture, and expert consulting for portfolio analytics. We’ve built this playbook with dozens of PE firms.

The question isn’t whether to build a unified analytics platform. The question is how fast you can get it running. The longer you wait, the more value you’re leaving on the table.