Guide April 18, 2026 · 19 mins · The D23 Team

Multi-Agent Document Extraction for Financial Reports

Learn how multi-agent systems extract tables, narratives, and footnotes from financial documents. Technical guide for data teams building scalable extraction pipelines.


Understanding Multi-Agent Document Extraction

Financial documents—10-Ks, earnings reports, regulatory filings—contain critical business intelligence locked in unstructured text, tables, and footnotes. Extracting this data at scale has historically required either manual labor or brittle, single-purpose extraction scripts that break when document layouts change.

Multi-agent document extraction flips this problem on its head. Instead of one monolithic parser trying to handle every element of a financial report, you deploy specialized agents—each optimized for a specific extraction task. One agent pulls structured tables. Another extracts narrative insights from management discussion sections. A third surfaces regulatory footnotes and risk disclosures. These agents work in parallel, coordinate results, and feed clean, normalized data into your analytics pipeline.

This architecture solves a real pain point: financial documents are intentionally complex. Companies format 10-Ks differently. Tables nest inside prose. Footnotes reference back to line items. A single extraction engine either oversimplifies (missing nuance) or becomes so complex it’s unmaintainable. Multi-agent systems handle this complexity by dividing labor according to document structure and semantic meaning.

The payoff is significant. Teams using multi-agent extraction report 3–5× faster time-to-insight on financial data, fewer manual validation steps, and extraction pipelines that scale to thousands of documents without retraining. For data leaders building embedded analytics or self-serve BI platforms—especially those using Apache Superset for analytics dashboards—this approach enables you to ingest financial data at the speed and quality your business demands.

Why Single-Agent Extraction Falls Short

Before diving into multi-agent architecture, it’s worth understanding why traditional extraction approaches struggle with financial documents.

A single-agent extractor typically works like this: you feed it a document, it applies a general-purpose language model or rule-based parser, and it outputs extracted fields. This works fine for simple, repetitive documents—invoices, receipts, standardized forms. Financial reports are different.

Structural Complexity

A 10-K contains multiple document types within a single file: cover pages with metadata, narrative sections (Item 1: Business), structured financial statements (balance sheets, income statements), tables of financial data, and dense footnotes explaining accounting policies. A single agent trying to extract “revenue” must recognize revenue in a narrative paragraph and in a structured table and in a footnote cross-reference. Each context requires different parsing logic.

Semantic Ambiguity

Financial language is intentionally precise but contextual. “Operating income” means different things in different sections of a 10-K. A single agent must maintain state across the entire document to correctly disambiguate. As documents grow longer, this state management becomes a bottleneck.

Layout Variability

Companies format financial reports differently. One company’s balance sheet is a clean HTML table; another’s is a scanned PDF with OCR artifacts. A single extraction pipeline must handle all variations, leading to either low recall (missing real data) or low precision (extracting noise). Specialist agents can be tuned for specific layouts.

Error Cascading

In a monolithic pipeline, an error early in extraction (e.g., misidentifying a table header) cascades downstream, corrupting dependent extractions. Multi-agent systems isolate failures. If the table-extraction agent makes a mistake, the narrative-extraction agent can still succeed.

Research on multi-agent systems for extracting financial KPIs demonstrates that specialized agents significantly outperform single-engine approaches on precision and recall for financial documents. The key insight: financial documents have structure. Exploit that structure by matching agent specialization to document components.

Core Architecture: Three Specialist Agents

A production multi-agent extraction system for financial reports typically deploys three core agents, each optimized for a specific document layer:

The Table Extraction Agent

This agent identifies and extracts structured financial data—balance sheets, income statements, cash flow statements, and supporting schedules. Its job is straightforward in concept but complex in execution: find tables, parse their structure, normalize values, and output clean, queryable data.

Why it’s specialist: Tables have consistent structure within a document but variable formatting across documents. Some are HTML tables (easy to parse). Others are PDF tables with merged cells (harder). Some are text-based tables with whitespace alignment. A table-specialized agent can apply document-type detection and format-specific parsing logic that a general-purpose agent would struggle with.

Typical workflow:

  • Detect table boundaries (via OCR confidence, layout analysis, or heuristics).
  • Identify headers and data rows.
  • Normalize units (e.g., “in thousands” notation in financial statements).
  • Validate data types (dates, numbers, currency).
  • Output structured JSON with table metadata (source page, confidence score, table caption).
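The unit-normalization step above can be sketched in a few lines of Python; the function name and the thousands-scaling convention here are illustrative assumptions, not part of any specific library:

```python
import re

def parse_financial_value(raw: str, scale: int = 1):
    """Normalize a raw table cell such as "$(1,234)" or "5,000".

    `scale` reflects the statement's units note, e.g. 1_000 for
    "(in thousands)". Returns None when the cell holds no number.
    """
    text = raw.strip()
    negative = text.startswith("(") and text.endswith(")")  # accounting-style negative
    digits = re.sub(r"[^\d.]", "", text)  # drop $, commas, parentheses, dashes
    if not digits:
        return None
    value = float(digits) * scale
    return -value if negative else value
```

Applied to a balance sheet reported “(in thousands),” `parse_financial_value("$5,000", scale=1_000)` yields the 5,000,000 seen in the structured output below.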

Real-world example: A 10-K’s consolidated balance sheet lists assets, liabilities, and equity across two fiscal years. The table agent must recognize that columns represent different time periods, extract row labels correctly (distinguishing “Current assets” from “Total current assets”), and flag ambiguous entries for human review. It outputs:

{
  "table_type": "balance_sheet",
  "fiscal_periods": ["2024", "2023"],
  "rows": [
    {"label": "Current assets", "2024": 5000000, "2023": 4500000},
    {"label": "Total assets", "2024": 12000000, "2023": 11000000}
  ],
  "source_page": 42,
  "confidence": 0.98
}

This structured output feeds directly into your analytics pipeline or embedded analytics dashboards, enabling immediate visualization and querying.

The Narrative Extraction Agent

Narrative sections of financial documents—management’s discussion and analysis (MD&A), risk factors, business description—contain strategic insights that don’t fit into tables. This agent extracts key themes, forward-looking statements, and contextual information that explains why financial numbers changed.

Why it’s specialist: Narrative extraction requires semantic understanding, not just structural parsing. You’re looking for concepts (“market expansion,” “supply chain disruption,” “regulatory risk”) scattered across paragraphs. A table-extraction agent is useless here. A narrative agent applies natural language understanding to identify, summarize, and tag relevant passages.

Typical workflow:

  • Identify narrative sections (MD&A, risk factors, business overview).
  • Segment text into logical chunks (paragraphs or topic boundaries).
  • Apply entity and concept recognition (companies, markets, risks, opportunities).
  • Extract key statements (forward guidance, management commentary, strategic pivots).
  • Generate summaries and tag content by business impact.

Real-world example: An MD&A section discusses a 15% revenue increase. The narrative agent extracts:

{
  "section": "MD&A",
  "key_drivers": [
    {"driver": "geographic expansion", "impact": "positive", "magnitude": "high"},
    {"driver": "product mix shift", "impact": "positive", "magnitude": "medium"}
  ],
  "forward_statements": [
    "We expect continued growth in the Asia-Pacific region",
    "Margin expansion will be challenged by rising labor costs"
  ],
  "risks_mentioned": ["currency exposure", "competitive pressure"],
  "confidence": 0.85
}

This output enriches financial dashboards with context, enabling analysts to understand not just what changed, but why. For teams building self-serve BI platforms, this narrative data becomes searchable, filterable dimensions in your analytics layer.

The Footnote Extraction Agent

Footnotes in financial documents are where the complexity lives. They explain accounting policies, detail contingent liabilities, describe related-party transactions, and provide reconciliations between reported and non-GAAP metrics. Extracting footnotes correctly is critical for accurate financial analysis.

Why it’s specialist: Footnotes have unique structure and semantics. They’re often numbered, cross-referenced, and dense with technical accounting language. They may reference tables or other footnotes. A general-purpose agent struggles because footnotes require:

  • Anchor identification (which line item does this footnote explain?).
  • Cross-reference resolution (footnote 3 references footnote 1; parse both).
  • Accounting-domain knowledge (understanding what “deferred tax asset” means).
  • Reconciliation extraction (footnotes often provide detailed breakdowns of consolidated figures).

Typical workflow:

  • Identify footnote anchors (superscript numbers, links, or markers in main text).
  • Extract footnote text and associate with anchors.
  • Resolve cross-references between footnotes.
  • Parse structured content within footnotes (sub-tables, numbered lists).
  • Extract quantitative reconciliations (e.g., “non-GAAP operating income reconciliation”).
  • Tag by accounting domain (revenue recognition, debt, equity, tax).
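Anchor identification from the workflow above can be sketched with a simple pattern match; the “Note 5” marker format and the function name are illustrative assumptions—real filings also use superscripts and hyperlinks, which need layout-aware detection:

```python
import re

def find_footnote_anchors(text: str) -> dict:
    """Map footnote numbers to the sentences that reference them.

    Matches textual markers like "(Note 5)" or "see Note 3".
    """
    anchors = {}
    for sentence in re.split(r"(?<=\.)\s+", text):
        for num in re.findall(r"[Nn]ote\s+(\d+)", sentence):
            anchors.setdefault(int(num), []).append(sentence.strip())
    return anchors

mdna = ("Net income was $100M (Note 5). "
        "Deferred tax assets are described in Note 3.")
anchors = find_footnote_anchors(mdna)
```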

Real-world example: A footnote explains that reported net income of $100M differs from non-GAAP net income of $110M due to stock-based compensation ($5M) and acquisition-related costs ($5M). The footnote agent extracts:

{
  "footnote_number": 5,
  "anchor_line_item": "Net income",
  "accounting_domain": "non_GAAP_reconciliation",
  "reconciliation": [
    {"item": "Reported net income", "value": 100000000},
    {"item": "Add: Stock-based compensation", "value": 5000000},
    {"item": "Add: Acquisition costs", "value": 5000000},
    {"item": "Non-GAAP net income", "value": 110000000}
  ],
  "cross_references": [2, 8],
  "confidence": 0.92
}

This enables your analytics platform to track both GAAP and non-GAAP metrics, crucial for investors and financial analysts.

Agent Coordination and Orchestration

Deploying three specialist agents is only half the battle. They must coordinate, share context, and resolve conflicts. This is where orchestration becomes critical.

Sequential vs. Parallel Execution

In a simple orchestration, agents run sequentially: table agent first, then narrative, then footnote. This is safe but slow. A better approach runs agents in parallel where possible, with dependencies managed explicitly. For example:

  • Table and narrative agents can run in parallel (they don’t depend on each other).
  • The footnote agent can start immediately but may need to wait for table results to resolve cross-references.
  • A final validation agent runs after all three, checking for inconsistencies (e.g., a footnote references a table that wasn’t extracted).
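The dependency-aware schedule above might look like this with asyncio; the agent bodies are stubs standing in for real LLM-backed extractors:

```python
import asyncio

# Stub agents standing in for real LLM-backed extractors.
async def run_table_agent(doc: str) -> dict:
    return {"tables": ["balance_sheet", "income_statement"]}

async def run_narrative_agent(doc: str) -> dict:
    return {"drivers": ["geographic expansion"]}

async def run_footnote_agent(doc: str, tables: dict) -> dict:
    # Footnotes may cross-reference tables, so table output is a dependency.
    return {"footnotes": [], "resolved_against": tables["tables"]}

async def extract(doc: str) -> dict:
    # Tables and narrative are independent: launch them concurrently.
    table_task = asyncio.create_task(run_table_agent(doc))
    narrative_task = asyncio.create_task(run_narrative_agent(doc))
    tables = await table_task
    # The footnote agent starts only once table results exist.
    footnotes = await run_footnote_agent(doc, tables)
    narrative = await narrative_task
    return {"tables": tables, "narrative": narrative, "footnotes": footnotes}

result = asyncio.run(extract("10-K text ..."))
```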

Context Sharing

Agents need shared context to coordinate. A typical context object includes:

  • Document metadata: Title, filing type, fiscal period, company name.
  • Page structure: Which pages contain which sections (tables, narratives, footnotes).
  • Extraction results: As each agent completes, results are added to shared context.
  • Confidence scores: Each agent assigns confidence to its outputs; orchestration uses these to decide whether to retry, escalate, or proceed.
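A minimal shared-context object along these lines could be a plain dataclass; the field names are illustrative:

```python
from dataclasses import dataclass, field

@dataclass
class ExtractionContext:
    """Shared state handed to every agent."""
    company: str
    filing_type: str
    fiscal_period: str
    page_map: dict = field(default_factory=dict)      # section name -> page numbers
    results: dict = field(default_factory=dict)       # agent name -> output
    confidences: dict = field(default_factory=dict)   # agent name -> score

    def record(self, agent: str, output: dict, confidence: float) -> None:
        """Called by the orchestrator as each agent finishes."""
        self.results[agent] = output
        self.confidences[agent] = confidence

ctx = ExtractionContext("Acme Corp", "10-K", "FY2024")
ctx.record("tables", {"total_assets": 12_000_000}, confidence=0.98)
```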

Conflict Resolution

When agents disagree—e.g., the table agent extracts a revenue figure that conflicts with a narrative statement—the orchestrator must decide. Strategies include:

  • Confidence-weighted voting: Trust the agent with higher confidence.
  • Source prioritization: Prefer structured data (tables) over narrative claims.
  • Human escalation: Flag conflicts for manual review if confidence is low.
  • Reconciliation: Ask a specialized agent to investigate the conflict and propose resolution.
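A resolver combining the first three strategies might be sketched as follows; the policy and threshold are assumptions, not a standard algorithm:

```python
def resolve_conflict(candidates: dict, review_threshold: float = 0.7) -> dict:
    """Choose a winning value from competing agent extractions.

    `candidates` maps agent name -> (value, confidence). Higher confidence
    wins; structured sources ("tables") break ties; low overall confidence
    escalates to human review.
    """
    ranked = sorted(
        candidates.items(),
        key=lambda kv: (kv[1][1], kv[0] == "tables"),  # confidence, then source priority
        reverse=True,
    )
    agent, (value, confidence) = ranked[0]
    if confidence < review_threshold:
        return {"status": "needs_review", "candidates": candidates}
    return {"status": "resolved", "value": value, "source": agent}

decision = resolve_conflict({
    "tables": (5_000_000, 0.98),
    "narrative": (5_200_000, 0.80),
})
```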

Research on agentic document extraction techniques shows that well-designed orchestration can reduce errors by 20–30% compared to independent agent execution.

Implementation: From LLMs to Production Pipelines

Building a multi-agent extraction system requires choices about technology, tooling, and integration patterns.

Foundation: Large Language Models

Modern multi-agent extraction relies on large language models (LLMs) as the reasoning engine. An LLM—whether OpenAI’s GPT-4, Anthropic’s Claude, or an open-source model like Llama—can understand financial language, extract information from unstructured text, and follow complex instructions.

But raw LLMs aren’t enough. They hallucinate, they’re slow on long documents, and they’re expensive at scale. Production systems layer additional techniques:

Prompt Engineering

Each agent’s behavior is shaped by its system prompt. A table-extraction agent’s prompt might be:

You are a financial table extraction specialist. Your job is to:
1. Identify all tables in the provided document section.
2. Extract table structure (headers, rows, columns).
3. Normalize numeric values (remove currency symbols, convert to standard units).
4. Assign confidence scores based on clarity and consistency.
5. Output JSON with the structure: {"tables": [{"headers": [...], "rows": [...], "confidence": 0.95}]}

Be precise. If you're uncertain about a value, mark it with low confidence rather than guessing.

A narrative agent’s prompt is different, emphasizing semantic extraction over structural parsing:

You are a financial narrative analysis specialist. Extract key business drivers, risks, and forward-looking statements from the provided text. Output JSON with: {"drivers": [...], "risks": [...], "forward_statements": [...]}

Focus on material information that explains financial performance.

Retrieval-Augmented Generation (RAG)

For long documents, RAG improves accuracy. Instead of feeding the entire 10-K to an agent, you:

  1. Chunk the document into sections.
  2. Embed chunks into a vector database.
  3. When an agent needs context, retrieve relevant chunks via semantic search.
  4. Feed only relevant chunks to the LLM.

This reduces token usage, improves latency, and often improves accuracy (the agent focuses on relevant information rather than getting lost in a 100-page document).
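The retrieval step can be illustrated with a deliberately toy bag-of-words similarity; a production system would use a real embedding model and a vector database instead:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words vector; stands in for a real embedding model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query: str, chunks: list, k: int = 2) -> list:
    # Rank chunks by similarity to the query; feed only the top-k to the agent.
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

chunks = [
    "Item 7. Management's discussion of revenue growth in Asia-Pacific.",
    "Note 5. Reconciliation of non-GAAP operating income.",
    "Item 1A. Risk factors including currency exposure.",
]
top = retrieve("non-GAAP operating income reconciliation", chunks, k=1)
```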

Structured Output Enforcement

LLMs naturally output free-form text. To ensure agents produce valid JSON or structured formats, use:

  • JSON schema validation: Define expected output schema; reject or retry if LLM output doesn’t match.
  • Output parsing libraries: Tools like Pydantic (Python) or Zod (JavaScript) enforce structure.
  • Few-shot prompting: Show the agent examples of correct output format.
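A stdlib-only sketch of schema validation with retry; the expected schema and helper names are assumptions, and libraries like Pydantic give you the same guarantee with less code:

```python
import json

EXPECTED = {"tables": list}  # minimal top-level schema (illustrative)

def validate_output(raw: str):
    """Parse an LLM response and check it against the expected shape.
    Returns the parsed dict, or None to signal a retry."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    for key, typ in EXPECTED.items():
        if not isinstance(data.get(key), typ):
            return None
    return data

def extract_with_retry(call_llm, max_attempts: int = 3) -> dict:
    """Re-prompt until the output validates; `call_llm` is any callable
    returning the model's raw text."""
    for _ in range(max_attempts):
        parsed = validate_output(call_llm())
        if parsed is not None:
            return parsed
    raise ValueError("model never produced schema-compliant output")

# Simulated model that fails once, then produces valid JSON:
responses = iter(['not json at all', '{"tables": []}'])
result = extract_with_retry(lambda: next(responses))
```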

Integration with Document Processing Pipelines

Multi-agent extraction doesn’t start with LLMs. It starts with document ingestion and preprocessing.

OCR and Document Parsing

If documents are PDFs or scans, you need OCR (optical character recognition) to convert images to text. Modern OCR tools like LandingAI’s document extraction APIs handle layout analysis, table detection, and text extraction in one step, outputting structured representations that agents can consume directly.

Document Type Detection

Not all financial documents are 10-Ks. You might process earnings transcripts, proxy statements, or investor presentations. A preprocessing step classifies document type and routes to appropriate agent configurations.

Quality Control and Validation

After extraction, validate:

  • Schema compliance: Does output match expected structure?
  • Consistency: Do extracted values align across tables and narrative?
  • Completeness: Are all required fields present?
  • Reasonableness: Are numeric values in expected ranges (e.g., revenue > 0)?

Failed validations trigger human review or re-extraction with different agent configurations.
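These checks translate directly into a validation function; the field names and thresholds here are illustrative:

```python
def validate_extraction(record: dict) -> list:
    """Return a list of validation failures; an empty list means the
    record passes."""
    errors = []
    for required in ("company_name", "fiscal_period", "revenue", "total_assets"):
        if required not in record:
            errors.append(f"missing field: {required}")
    if record.get("revenue", 0) < 0:
        errors.append("revenue must be non-negative")
    assets, liabilities = record.get("total_assets"), record.get("total_liabilities")
    if assets is not None and liabilities is not None and liabilities > assets * 10:
        errors.append("liabilities implausibly large relative to assets")
    return errors

clean = {"company_name": "Acme", "fiscal_period": "FY2024",
         "revenue": 5_000_000, "total_assets": 12_000_000}
broken = dict(clean, revenue=-1)
```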

Real-World Implementation Example

Consider building an extraction pipeline for a portfolio company’s 10-K filing. Here’s a typical flow:

  1. Ingest: Upload 10-K PDF to your pipeline.
  2. Preprocess: OCR to text, detect document structure, identify major sections.
  3. Parallel extraction:
    • Table agent extracts financial statements (balance sheet, income statement, cash flow).
    • Narrative agent extracts MD&A insights and risk factors.
    • Footnote agent extracts accounting policies and reconciliations.
  4. Orchestrate: Combine results, resolve conflicts, assign confidence scores.
  5. Validate: Check for inconsistencies; flag low-confidence extractions for review.
  6. Load: Push extracted data to your data warehouse or analytics platform.
  7. Visualize: Create dashboards showing financial metrics, trends, and narrative context.

The entire pipeline runs in minutes. A human reviewer spends 10–15 minutes checking flagged items. Compare this to manual extraction, which takes hours per document.

Advanced Patterns: MCP Servers and API-First Integration

For teams building embedded analytics or integrating extraction into larger platforms, advanced patterns emerge.

Model Context Protocol (MCP) for Agent Communication

MCP is a protocol for structured communication between AI agents and external tools. In a multi-agent extraction system, MCP enables:

  • Tool integration: Agents call tools (database lookups, API calls, calculation engines) without hard-coded dependencies.
  • Composability: Build complex extraction workflows by composing simple agent-tool combinations.
  • Standardization: Multiple agents use the same tool interface, reducing duplication.

Example: A footnote agent needs to resolve a cross-reference to a table. Instead of re-parsing the table, it calls a “lookup_table” tool via MCP, which retrieves the already-extracted table from shared context. This is faster and ensures consistency.

API-First Architecture

Production systems expose extraction as an API:

POST /api/v1/extract
Content-Type: application/json

{
  "document_url": "https://sec.gov/cgi-bin/browse-edgar?...",
  "document_type": "10-K",
  "extraction_agents": ["tables", "narrative", "footnotes"],
  "output_format": "json"
}

Response:
{
  "extraction_id": "ext_123abc",
  "status": "complete",
  "results": {
    "tables": [...],
    "narrative": [...],
    "footnotes": [...]
  },
  "confidence": 0.91,
  "processing_time_ms": 45000
}

This API can be called from your data pipeline, embedded analytics application, or BI tool. For teams using D23’s managed Apache Superset platform, this API integrates seamlessly via custom connectors, enabling you to build dashboards that automatically ingest extracted financial data.

Feedback Loops and Continuous Improvement

A production extraction system learns from feedback. When a human corrects an agent’s extraction, that correction is:

  1. Logged as a training example.
  2. Used to fine-tune agent prompts or retrain underlying models.
  3. Analyzed to identify systematic failures (e.g., “agent struggles with footnotes in italics”).

Over time, extraction quality improves. Research on financial document extraction systems shows that systems with feedback loops achieve 95%+ accuracy after processing 100–200 documents in a domain.

Use Cases: Where Multi-Agent Extraction Delivers Value

Multi-agent document extraction is most valuable in specific scenarios where financial documents are both mission-critical and high-volume.

Portfolio Analytics for Private Equity

PE firms manage dozens to hundreds of portfolio companies. Each company files 10-Ks, quarterly reports, and investor updates. Extracting KPIs (revenue, EBITDA, customer count, churn) from these documents at scale is essential for tracking value creation and managing LP reporting.

Traditional approach: hire analysts to manually extract KPIs from each document (2–4 hours per company per quarter). With multi-agent extraction: automated extraction in 5 minutes, with a 10-minute human review. Across a 50-company portfolio, this saves 300+ analyst hours per year.

The extracted data feeds into a centralized analytics dashboard where PE investors track portfolio performance, benchmark companies against peers, and identify value-creation opportunities.

Embedded Analytics in Fintech Products

Fintech platforms (wealth management, corporate treasury, investor relations software) often need to ingest client financial documents and surface insights. A wealth manager, for example, might want to analyze a client’s 10-K to assess investment risk. Manual extraction is impractical; the client expects instant insights.

Multi-agent extraction enables this. When a client uploads a 10-K, the platform automatically extracts key metrics, risk factors, and financial trends, populating a dashboard within minutes. The user sees a structured view of the company’s financial health without waiting for manual processing.

Regulatory Compliance and Risk Monitoring

Regulatory bodies and compliance teams track financial disclosures for red flags: accounting changes, going-concern warnings, related-party transactions. Multi-agent extraction, particularly the footnote agent, surfaces these signals automatically. A compliance system can ingest all SEC filings for a sector and flag companies with unusual disclosures for investigation.

Venture Capital Portfolio Tracking

VC firms need to track portfolio company performance, often relying on financial updates, cap tables, and occasional 10-Ks (as companies mature). Multi-agent extraction standardizes this data, enabling VCs to build dashboards showing portfolio growth, burn rates, and valuation trends. For fund reporting to LPs, extracted data is compiled into performance summaries and benchmarking analyses.

Challenges and Limitations

Multi-agent extraction is powerful but not a silver bullet. Real-world deployments encounter challenges.

Hallucination and Confidence Calibration

LLMs sometimes invent data (hallucination). A narrative agent might extract a forward statement the company never made. Confidence scores help, but they’re imperfect. Production systems use multiple validation techniques:

  • Re-extraction with different prompts or models (if results differ, confidence drops).
  • Semantic consistency checks (extracted data shouldn’t contradict other extractions).
  • Domain-specific validation (extracted revenue shouldn’t be negative).
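The re-extraction check can be sketched as a simple agreement score across runs; the median-based tolerance policy is an assumption, not a standard:

```python
def agreement_confidence(values: list, tolerance: float = 0.01) -> float:
    """Fraction of re-extraction runs that agree with the median value
    within a relative tolerance. Low agreement suggests hallucination."""
    ordered = sorted(values)
    median = ordered[len(ordered) // 2]
    agreeing = sum(1 for v in values if abs(v - median) <= abs(median) * tolerance)
    return agreeing / len(values)

# Three runs of the same revenue extraction with different prompts:
consistent = agreement_confidence([5_000_000, 5_000_000, 5_010_000])
suspect = agreement_confidence([5_000_000, 5_000_000, 7_500_000])
```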

Cost at Scale

LLM API calls are expensive. Even a cheap pass at $0.10 per document means $100 for 1,000 10-Ks, and long filings processed by multiple agents can cost far more per document. For high-volume use cases, this adds up. Strategies to reduce cost:

  • Use smaller, cheaper models where possible (e.g., a smaller model for footnote extraction).
  • Implement caching: if you’ve already extracted a company’s 10-K, don’t re-extract it.
  • Batch processing: send multiple documents in a single request.
  • Fine-tune open-source models on your domain, reducing reliance on expensive APIs.
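The caching strategy can be sketched with a content-hash key; the in-memory dict is for illustration, where production would use Redis or object storage:

```python
import hashlib

_CACHE = {}  # in-memory for illustration only

def cache_key(document_bytes: bytes, agent: str) -> str:
    # Key on document content plus agent name, so each specialist caches separately.
    return hashlib.sha256(document_bytes + agent.encode()).hexdigest()

def extract_cached(document_bytes: bytes, agent: str, run_extraction) -> dict:
    """Skip the LLM call when this (document, agent) pair was already
    processed; `run_extraction` is any callable doing the real work."""
    key = cache_key(document_bytes, agent)
    if key not in _CACHE:
        _CACHE[key] = run_extraction(document_bytes)
    return _CACHE[key]

calls = []
def fake_table_agent(doc: bytes) -> dict:
    calls.append(doc)          # track how many real extractions ran
    return {"tables": []}

doc = b"10-K contents"
first = extract_cached(doc, "tables", fake_table_agent)
second = extract_cached(doc, "tables", fake_table_agent)  # cache hit, no LLM call
```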

Document Variability

Even within a single document type (10-K), formatting varies. One company’s balance sheet is a clean table; another’s is embedded in prose. Agents must be robust to these variations. This requires:

  • Testing on diverse document samples before production deployment.
  • Monitoring extraction quality over time; if a new document format appears, retrain or adjust prompts.
  • Maintaining fallback strategies (e.g., if table extraction fails, try narrative extraction for the same metric).

Regulatory and Compliance Considerations

Financial data is sensitive. Extraction systems must:

  • Comply with data protection regulations (GDPR, CCPA, etc.).
  • Maintain audit trails (who extracted what, when).
  • Ensure extracted data isn’t used for unauthorized purposes.
  • Handle confidential information (private financial data) securely.

For teams deploying extraction in regulated environments, these considerations are non-negotiable.

Building Your Extraction Pipeline: Practical Steps

If you’re ready to implement multi-agent extraction, here’s a practical roadmap.

Phase 1: Proof of Concept

  1. Define scope: Which document types? Which fields to extract? What’s your target accuracy?
  2. Gather samples: Collect 10–20 representative documents.
  3. Design agents: Sketch out your three specialist agents (or adapt to your use case).
  4. Prototype: Use an LLM API (GPT-4, Claude) to test extraction on samples. Don’t worry about optimization yet.
  5. Measure: How accurate is extraction? Where do agents fail? What’s the cost per document?

Phase 2: Refinement

  1. Improve prompts: Based on Phase 1 results, refine agent prompts. Use few-shot examples.
  2. Add validation: Build checks to catch hallucinations and inconsistencies.
  3. Implement orchestration: Coordinate agents, manage context, resolve conflicts.
  4. Test at scale: Run on 100+ documents. Monitor quality and cost.
  5. Optimize: Switch to cheaper models where possible. Implement caching. Batch requests.

Phase 3: Production Deployment

  1. Build API: Expose extraction as a service.
  2. Integrate with downstream systems: Connect to your data warehouse, BI platform, or application.
  3. Implement feedback loops: Log corrections; use them to improve extraction.
  4. Monitor: Track extraction quality, latency, and cost in production.
  5. Scale: Deploy to handle your full document volume.

Choosing Tools and Platforms

You have options:

  • LLM APIs: OpenAI, Anthropic, Google Cloud, AWS Bedrock. Easy to get started; expensive at scale.
  • Open-source models: Llama, Mistral, others. Requires infrastructure; cheaper at scale.
  • Document processing platforms: LandingAI’s agentic extraction is purpose-built for financial documents. Combines OCR, layout analysis, and LLM-based extraction.
  • Workflow orchestration: Airflow, Prefect, Temporal. Manage multi-agent execution, retries, and error handling.
  • Data platforms: Once extracted, push data to your data warehouse (Snowflake, BigQuery) or analytics platform. For teams building self-serve BI dashboards, extracted financial data becomes a queryable data source.

Connecting Extraction to Analytics: The Complete Picture

Extraction is only valuable if extracted data fuels insights. The final step is integrating extraction with your analytics stack.

Imagine you’ve extracted financial data from 50 portfolio companies’ 10-Ks. The extracted tables (balance sheets, income statements) are now in your data warehouse. Your narrative extractions (key business drivers, risks) are in a separate table. Footnote extractions (accounting policies, reconciliations) are indexed and searchable.

Now, a PE investor wants to understand which portfolio companies have high debt levels and what risks they face. They open your analytics dashboard and create a query:

SELECT 
  company_name,
  total_debt,
  debt_to_equity_ratio,
  key_risks
FROM extracted_financials
WHERE debt_to_equity_ratio > 2.0
ORDER BY debt_to_equity_ratio DESC

The dashboard shows a table of high-leverage companies, their risk factors (extracted from narratives), and links to source documents. The investor can drill into any company, see the original 10-K, and understand both the quantitative and qualitative picture.

This is the power of multi-agent extraction: it transforms unstructured financial documents into structured, queryable data that drives decision-making. For data leaders building analytics platforms or embedded BI, multi-agent extraction is the bridge between document sources and analytical insights.

Conclusion: The Future of Financial Data Extraction

Multi-agent document extraction is rapidly becoming the standard for processing financial documents at scale. The architecture—specialist agents for tables, narratives, and footnotes, coordinated by intelligent orchestration—mirrors how humans read financial documents: different experts focusing on different aspects, then sharing findings.

The technology is mature. LLMs are capable. Tools like LandingAI’s agentic extraction APIs and open-source frameworks make implementation accessible. The business case is clear: extract financial data 10–100× faster than manual processes, with comparable or better accuracy.

For data and engineering leaders evaluating how to ingest financial data—whether for embedded analytics, portfolio tracking, or regulatory compliance—multi-agent extraction should be on your roadmap. Start with a proof of concept on a small document set. Measure accuracy and cost. Integrate with your analytics platform. Scale from there.

The future of financial intelligence is automated, specialized, and intelligent. Multi-agent extraction is how you get there.