AI Document Analysis API — Multi-Step Reasoning and Code Execution Over Any Document

Extraction Is Not Analysis

Most document APIs extract data — they pull text, tables, and fields from a PDF and hand them to you as JSON. That works when you need raw values. It does not work when you need answers.

"What is the gross margin trend over the last three quarters?" is not an extraction problem. It requires reading a financial report, locating the right tables across multiple pages, computing gross margins from revenue and COGS figures, comparing them, and identifying the trend. That is analysis.

The standard approach is to extract the data with one tool, then write application code to compute the answer. That works until the document format changes, the table layout shifts, or the calculation requires context that is only available in the surrounding text.
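
The application code in that standard approach tends to look like the sketch below. The quarterly revenue and COGS values are invented for illustration, and the helper names are ours. Note that the arithmetic is the easy part; the code silently assumes the extraction step found the right rows and units.

```python
# Hypothetical quarterly figures in $M: (quarter, revenue, COGS).
# These numbers are made up for illustration, not taken from a real report.
quarters = [
    ("Q1", 40.0, 18.0),
    ("Q2", 44.0, 19.0),
    ("Q3", 50.0, 20.0),
]


def gross_margin(revenue: float, cogs: float) -> float:
    """Gross margin as a percentage of revenue."""
    return (revenue - cogs) / revenue * 100


def trend(values: list[float]) -> str:
    """Label a series as rising, falling, or mixed from consecutive changes."""
    deltas = [later - earlier for earlier, later in zip(values, values[1:])]
    if all(d > 0 for d in deltas):
        return "rising"
    if all(d < 0 for d in deltas):
        return "falling"
    return "mixed"


margins = [(name, gross_margin(rev, cogs)) for name, rev, cogs in quarters]

for name, margin in margins:
    print(f"{name}: {margin:.1f}%")
print("Trend:", trend([margin for _, margin in margins]))
```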

We built an API that does both — reads documents and reasons over them, with sandboxed code execution for precise calculations.

How It Works

The Drive AI Analyze API takes a document and a question, then returns a computed answer with a full reasoning trace showing every step.

POST https://dev.thedrive.ai/api/v1/analyze

Ask a Question

{
  "url": "https://storage.example.com/reports/annual-report-2024.pdf",
  "question": "What was the year-over-year revenue growth rate, and how did operating expenses change as a percentage of revenue compared to the prior year?"
}

Get a Reasoned Answer

{
  "answer": "Revenue grew 23.5% year-over-year from $142M to $175.3M. Operating expenses as a percentage of revenue decreased from 71.2% to 67.8%, indicating improving operational efficiency. The largest contributor to this improvement was a 4.1 percentage point reduction in sales and marketing spend relative to revenue.",
  "reasoning": [
    {
      "step": 1,
      "action": "Located financial summary table on pages 12-13",
      "detail": "Identified revenue, COGS, and operating expense line items for FY2023 and FY2024"
    },
    {
      "step": 2,
      "action": "Extracted key figures",
      "detail": "FY2024 Revenue: $175.3M, FY2023 Revenue: $142.0M, FY2024 OpEx: $118.9M, FY2023 OpEx: $101.1M"
    },
    {
      "step": 3,
      "action": "Computed growth rate",
      "code": "growth = (175.3 - 142.0) / 142.0 * 100  # 23.45%",
      "result": "23.45%"
    },
    {
      "step": 4,
      "action": "Computed OpEx as percentage of revenue",
      "code": "opex_2024 = 118.9 / 175.3 * 100  # 67.83%\nopex_2023 = 101.1 / 142.0 * 100  # 71.20%",
      "result": "FY2024: 67.83%, FY2023: 71.20%"
    },
    {
      "step": 5,
      "action": "Identified primary driver of OpEx improvement",
      "detail": "S&M spend decreased from 38.1% to 34.0% of revenue per the expense breakdown on page 14"
    }
  ]
}

Every derived figure is computed by executed code rather than generated by the model. The reasoning trace shows which pages and tables each input came from and how each calculation was performed.
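
Because the trace exposes both the inputs and the results, you can re-check it on your side. The sketch below recomputes steps 3 and 4 from the figures listed in step 2 of the sample response. The 0.005 tolerance is our assumption, chosen to match results reported to two decimal places.

```python
# Figures copied from step 2 of the sample response (in $M).
revenue_2024, revenue_2023 = 175.3, 142.0
opex_2024, opex_2023 = 118.9, 101.1

recomputed = {
    "growth": (revenue_2024 - revenue_2023) / revenue_2023 * 100,
    "opex_2024": opex_2024 / revenue_2024 * 100,
    "opex_2023": opex_2023 / revenue_2023 * 100,
}

# Results reported in steps 3 and 4 of the sample trace.
reported = {"growth": 23.45, "opex_2024": 67.83, "opex_2023": 71.20}

# Assumed tolerance: half a unit in the second decimal place.
TOLERANCE = 0.005

mismatches = [
    key for key in reported
    if abs(recomputed[key] - reported[key]) > TOLERANCE
]
print("mismatches:", mismatches)
```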

Authentication

curl -X POST https://dev.thedrive.ai/api/v1/analyze \
  -H "X-API-Key: tda_live_..." \
  -H "Content-Type: application/json" \
  -d '{ "url": "...", "question": "..." }'
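
If you would rather not shell out to cURL, the same request can be built with the Python standard library. This is a minimal sketch: the endpoint, header name, and body fields come from the example above, while the function names, the 60-second timeout, and the assumption that the response body is JSON are ours.

```python
import json
import urllib.request

API_URL = "https://dev.thedrive.ai/api/v1/analyze"


def build_analyze_request(api_key: str, document_url: str, question: str) -> urllib.request.Request:
    """Build the POST request without sending it."""
    body = json.dumps({"url": document_url, "question": question}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def analyze(api_key: str, document_url: str, question: str, timeout: float = 60.0) -> dict:
    """Send the request and decode the response (assumed to be JSON)."""
    request = build_analyze_request(api_key, document_url, question)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling `analyze(your_key, document_url, question)` performs the network call; `build_analyze_request` on its own is handy for logging or testing what will be sent.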

Why This Matters

Sandboxed Python Execution

When the API encounters a question that requires computation — sums, averages, ratios, growth rates, comparisons — it writes and executes Python code in a sandboxed environment. The code appears in the reasoning trace so you can verify exactly what was computed.

This eliminates the class of errors where an LLM "calculates" by generating plausible-looking numbers. Every arithmetic operation runs as actual code.
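
Since the executed code is returned as text, you can also replay it locally. The sketch below is one cautious way to do that: instead of calling `exec` on code from a response, it parses the snippet and evaluates only simple assignments built from numbers and basic arithmetic. It assumes trace code is as simple as in the sample response; anything more complex is rejected. This is our illustration, not how the API's sandbox works.

```python
import ast
import operator

# Only these binary operators are evaluated; everything else is rejected.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def _eval(node: ast.AST, env: dict) -> float:
    """Evaluate a numeric expression made of constants, known names, and + - * /."""
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.Name) and node.id in env:
        return env[node.id]
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval(node.left, env), _eval(node.right, env))
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return -_eval(node.operand, env)
    raise ValueError(f"unsupported syntax: {ast.dump(node)}")


def run_arithmetic(code: str) -> dict:
    """Replay a snippet of `name = expression` lines and return the variables."""
    env = {}
    for stmt in ast.parse(code).body:
        is_simple_assign = (
            isinstance(stmt, ast.Assign)
            and len(stmt.targets) == 1
            and isinstance(stmt.targets[0], ast.Name)
        )
        if not is_simple_assign:
            raise ValueError("only simple assignments are supported")
        env[stmt.targets[0].id] = _eval(stmt.value, env)
    return env


# The "code" field from step 4 of the sample response.
step4_code = "opex_2024 = 118.9 / 175.3 * 100  # 67.83%\nopex_2023 = 101.1 / 142.0 * 100  # 71.20%"
values = run_arithmetic(step4_code)
print({name: round(value, 2) for name, value in values.items()})
```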

Progressive Document Reading

Financial reports, legal filings, and technical manuals can run to hundreds or thousands of pages. The Analyze API uses progressive reading — it navigates the document structure, reads relevant sections, and skips irrelevant ones. A 1,000-page filing does not get stuffed into a context window. It gets read strategically.
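
To give a feel for the idea, here is a toy version of "read the outline first, then only the sections that matter". It is a deliberately naive keyword match over a hypothetical outline, not the API's actual navigation logic; the `Section` type, the scoring rule, and the sample outline are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Section:
    title: str
    first_page: int
    last_page: int


def relevant_sections(outline: list[Section], question: str, limit: int = 2) -> list[Section]:
    """Rank outline entries by how many question words appear in the title."""
    question_words = {word.strip("?,.").lower() for word in question.split()}
    scored = []
    for section in outline:
        score = len(question_words & set(section.title.lower().split()))
        if score:
            scored.append((score, section))
    # Sort on the score only, so Section objects are never compared.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [section for _, section in scored[:limit]]


# Hypothetical outline of a 52-page filing.
outline = [
    Section("Letter to Shareholders", 1, 4),
    Section("Risk Factors", 5, 40),
    Section("Consolidated Income Statement", 41, 43),
    Section("Notes on Revenue Recognition", 44, 52),
]

picked = relevant_sections(outline, "How is revenue recognized in the income statement?")
pages_to_read = sum(s.last_page - s.first_page + 1 for s in picked)
print([s.title for s in picked], f"{pages_to_read} of 52 pages")
```

A real implementation would need something far more robust than word overlap, but the shape is the same: decide what to read before reading it.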

Table-Aware Parsing

Tables in PDFs are notoriously difficult to parse: merged cells, spanning headers, footnotes, and tables that continue across pages all break naive extraction. The API preserves table structure during extraction so that computations on tabular data are accurate.
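
As a small illustration of why structure matters, consider a two-level header where "FY2024" spans two columns. If the span is lost, the "Q3" columns become indistinguishable. The sketch below shows one common repair, filling each spanning header across the empty cells it covers. The representation (a merged cell's covered positions as `None`) is our assumption, not the API's internal format.

```python
from typing import Optional


def fill_spanning_headers(header_row: list[Optional[str]]) -> list[Optional[str]]:
    """Copy each spanning header rightward over the empty cells it covers."""
    filled, current = [], None
    for cell in header_row:
        if cell is not None:
            current = cell
        filled.append(current)
    return filled


def flatten_headers(top_row: list[Optional[str]], sub_row: list[str]) -> list[str]:
    """Combine a two-level header into one unambiguous label per column."""
    top = fill_spanning_headers(top_row)
    return [f"{t} {s}" if t else s for t, s in zip(top, sub_row)]


# Hypothetical header rows as they might come out of a PDF table.
top = [None, "FY2024", None, "FY2023", None]
sub = ["Line item", "Q3", "Q4", "Q3", "Q4"]

columns = flatten_headers(top, sub)
print(columns)
```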

Cross-Section Context

The reasoning engine uses surrounding text to resolve ambiguities. "Revenue" in the executive summary might refer to gross revenue, while the same word in the income statement refers to net revenue. Context from headings, footnotes, and section structure disambiguates automatically.

Use Cases

Financial Report Analysis

Ask questions about financial documents and get computed answers:

from thedriveai import TheDriveAI

client = TheDriveAI(api_key="tda_live_...")

result = client.analyze(
    url="https://storage.example.com/sec-filings/10-K-2024.pdf",
    question="Calculate the current ratio, quick ratio, and debt-to-equity ratio from the balance sheet. Compare each to industry benchmarks for SaaS companies."
)

print(result.answer)

# Inspect the computation steps
for step in result.reasoning:
    print(f"Step {step['step']}: {step['action']}")
    if 'code' in step:
        print(f"  Code: {step['code']}")
        print(f"  Result: {step['result']}")
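
For reference, these are the formulas you would expect to see in the trace for that question. The sketch uses the common textbook definitions; definitions vary (some analysts use total debt rather than total liabilities for debt-to-equity, and stricter quick ratios also exclude prepaid expenses). The balance sheet values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BalanceSheet:
    current_assets: float
    inventory: float
    current_liabilities: float
    total_liabilities: float
    shareholders_equity: float


def liquidity_ratios(bs: BalanceSheet) -> dict[str, float]:
    """Compute three standard balance sheet ratios."""
    return {
        # Current assets available to cover short-term obligations.
        "current_ratio": bs.current_assets / bs.current_liabilities,
        # Same, but excluding inventory, which is slower to convert to cash.
        "quick_ratio": (bs.current_assets - bs.inventory) / bs.current_liabilities,
        # Leverage: liabilities relative to owners' equity.
        "debt_to_equity": bs.total_liabilities / bs.shareholders_equity,
    }


# Hypothetical figures in $M, invented for illustration.
sheet = BalanceSheet(
    current_assets=120.0,
    inventory=20.0,
    current_liabilities=80.0,
    total_liabilities=150.0,
    shareholders_equity=100.0,
)

ratios = liquidity_ratios(sheet)
for name, value in ratios.items():
    print(f"{name}: {value:.2f}")
```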

Contract Risk Assessment

Go beyond extraction to actual analysis of legal risk:

import { TheDriveAI } from '@thedriveai/sdk';

const client = new TheDriveAI({ apiKey: 'tda_live_...' });

const result = await client.analyze({
  url: 'https://storage.example.com/contracts/vendor-msa.pdf',
  question: 'Identify all clauses that could expose us to unlimited liability, automatic renewal without opt-out, or unilateral termination by the vendor. For each, quote the relevant language and explain the risk.',
});

Insurance Claims Processing

Analyze claim documents to compute coverage and flag discrepancies:

result = client.analyze(
    url="https://storage.example.com/claims/CLM-2024-8891.pdf",
    question="Calculate the total claimed amount, compare it against the policy coverage limits listed in the attached policy schedule, and identify any line items that exceed per-incident caps."
)
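
The computation behind a question like this is simple once the line items and limits are located. Here is a sketch with invented line items, an invented per-incident cap, and an invented policy limit. It assumes the cap applies to each line item individually, which real policies may define differently.

```python
from decimal import Decimal

# Hypothetical policy terms, invented for illustration.
PER_INCIDENT_CAP = Decimal("5000.00")
POLICY_LIMIT = Decimal("12000.00")

# Hypothetical claim line items: (description, claimed amount).
line_items = [
    ("Emergency room visit", Decimal("3200.00")),
    ("Surgery", Decimal("7400.00")),
    ("Physiotherapy", Decimal("900.00")),
]

total_claimed = sum((amount for _, amount in line_items), Decimal("0"))

# Line items above the cap, with the amount by which each exceeds it.
over_cap = [
    (description, amount - PER_INCIDENT_CAP)
    for description, amount in line_items
    if amount > PER_INCIDENT_CAP
]

# Cap each item first, then apply the overall policy limit.
capped_total = sum((min(amount, PER_INCIDENT_CAP) for _, amount in line_items), Decimal("0"))
payable = min(capped_total, POLICY_LIMIT)

print(f"Claimed: {total_claimed}, payable: {payable}")
for description, excess in over_cap:
    print(f"  {description} exceeds the per-incident cap by {excess}")
```

Decimal is used instead of float so that currency amounts add up exactly.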

Due Diligence

Analyze target company financials during M&A due diligence:

result = client.analyze(
    url="https://storage.example.com/dd/target-financials-2024.pdf",
    question="Compute the EBITDA margin for each quarter, identify any quarter-over-quarter margin compression, and flag revenue concentration if any single customer exceeds 20% of total revenue."
)
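
Spelled out, that question breaks down into three small computations. The sketch below uses invented quarterly and customer figures; the 20% threshold comes from the question itself.

```python
# Hypothetical quarterly figures in $M: quarter -> (revenue, EBITDA).
quarters = {
    "Q1": (50.0, 12.0),
    "Q2": (52.0, 13.0),
    "Q3": (55.0, 11.0),
    "Q4": (60.0, 15.0),
}

# Hypothetical full-year revenue by customer in $M (largest customers only).
customer_revenue = {"Customer A": 65.1, "Customer B": 30.0, "Customer C": 21.7}

CONCENTRATION_THRESHOLD = 0.20  # flag customers above 20% of total revenue

# 1. EBITDA margin per quarter, as a percentage of revenue.
margins = {q: ebitda / revenue * 100 for q, (revenue, ebitda) in quarters.items()}

# 2. Quarters whose margin fell relative to the previous quarter.
names = list(margins)
compressed = [cur for prev, cur in zip(names, names[1:]) if margins[cur] < margins[prev]]

# 3. Customers whose share of total revenue exceeds the threshold.
total_revenue = sum(revenue for revenue, _ in quarters.values())
concentrated = {
    name: revenue / total_revenue
    for name, revenue in customer_revenue.items()
    if revenue / total_revenue > CONCENTRATION_THRESHOLD
}

for q, margin in margins.items():
    print(f"{q}: {margin:.1f}%")
print("Margin compression in:", compressed)
print("Concentration flags:", {name: f"{share:.0%}" for name, share in concentrated.items()})
```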

Medical and Scientific Documents

Analyze research papers, lab results, or clinical trial data:

result = client.analyze(
    url="https://example.com/papers/clinical-trial-results.pdf",
    question="What was the primary endpoint result? Calculate the relative risk reduction and number needed to treat from the reported event rates in the treatment and control groups."
)
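
The two statistics in that question follow directly from the event rates. With control event rate CER and treatment (experimental) event rate EER, the absolute risk reduction is ARR = CER - EER, the relative risk reduction is RRR = ARR / CER, and the number needed to treat is NNT = 1 / ARR, conventionally rounded up. A short sketch with invented trial counts:

```python
import math
from fractions import Fraction


def trial_effect(control_events: int, control_n: int,
                 treatment_events: int, treatment_n: int) -> dict:
    """Compute ARR, RRR, and NNT from raw event counts using exact fractions."""
    cer = Fraction(control_events, control_n)      # control event rate
    eer = Fraction(treatment_events, treatment_n)  # treatment event rate
    arr = cer - eer                                # absolute risk reduction
    if arr <= 0:
        raise ValueError("treatment did not reduce the event rate; RRR and NNT do not apply")
    return {
        "arr": arr,
        "rrr": arr / cer,            # relative risk reduction
        "nnt": math.ceil(1 / arr),   # number needed to treat, rounded up
    }


# Hypothetical trial: 200 of 1,000 events in control, 150 of 1,000 in treatment.
effect = trial_effect(200, 1000, 150, 1000)
print(f"ARR {float(effect['arr']):.1%}, RRR {float(effect['rrr']):.1%}, NNT {effect['nnt']}")
```

Fractions avoid the floating-point edge case where 1 / ARR lands a hair above a whole number and rounds up one patient too many.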

How It Compares

| Capability | Drive AI Analyze | ChatGPT / Claude (direct) | AWS Textract + custom code | Google Document AI |
|---|---|---|---|---|
| Multi-step reasoning | Built-in | Manual prompting | You build it | No |
| Code execution for math | Sandboxed Python | ChatGPT only (Code Interpreter) | You build it | No |
| Reasoning traces | Full trace returned | Not structured | N/A | N/A |
| Large document handling | Progressive reading | Context window limited | Page-by-page | Page-by-page |
| Table-aware parsing | Yes | Depends on format | Yes | Yes |
| API-first | Yes | Chat-first | Yes | Yes |
| File formats | 107+ types | Upload-based | PDF, images | PDF, images |

The difference: this is an API built for programmatic document analysis, not a chat interface repurposed for it. You get structured output with reasoning traces that your application code can parse, route, and act on.

Pricing

| Plan | Credits | Cost |
|---|---|---|
| Free | 100/month | $0 |
| Pro | Pay as you go | $0.01/credit |
| Enterprise | Custom volume | Contact us |

Analysis costs 2 credits per page for documents and 10 credits per site for websites. A 5-page financial report costs 10 credits, which is $0.10 at the Pro rate. The free tier covers roughly 50 document pages of analysis per month.
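
To budget a pipeline, the pricing above reduces to a few lines of arithmetic. The rates in this sketch are the ones listed here; the function names are ours, and it assumes every credit on the Pro plan is billed at $0.01, since whether Pro includes any free credits is not stated.

```python
CREDITS_PER_PAGE = 2          # documents
CREDITS_PER_SITE = 10         # websites
FREE_CREDITS_PER_MONTH = 100  # Free plan
PRO_CENTS_PER_CREDIT = 1      # Pro plan: $0.01 per credit


def credits_needed(pages: int = 0, sites: int = 0) -> int:
    """Credits consumed by analyzing the given number of pages and sites."""
    return pages * CREDITS_PER_PAGE + sites * CREDITS_PER_SITE


def pro_cost_usd(credits: int) -> float:
    """Cost on the Pro plan, assuming all credits are billed."""
    return credits * PRO_CENTS_PER_CREDIT / 100


print(credits_needed(pages=5))                               # the 5-page report
print(FREE_CREDITS_PER_MONTH // CREDITS_PER_PAGE)            # pages covered by the free tier
print(f"${pro_cost_usd(credits_needed(pages=1000)):.2f}")    # a 1,000-page filing on Pro
```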

Get Started

Install the SDK:

npm install @thedriveai/sdk
pip install thedriveai

Or use cURL:

curl -X POST https://dev.thedrive.ai/api/v1/analyze \
  -H "X-API-Key: tda_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/report.pdf",
    "question": "Summarize the key financial metrics and flag any concerns."
  }'

Get your API key at dev.thedrive.ai and start analyzing documents programmatically.


Have a complex analysis use case? Reach out at contact@thedrive.ai — we are happy to help you design the right queries for your pipeline.
