Automate Invoice Processing with AI — No Templates Required
Your accounts payable team processes invoices from 200 vendors. Each vendor uses a different format. Some send clean PDFs generated from their billing system. Others send scanned copies with handwritten notes in the margins. A few still fax theirs, and someone on your team photographs them with a phone before uploading.
Every month, someone manually keys in vendor names, invoice numbers, line items, totals, and tax amounts. They make mistakes. They fall behind. And when a new vendor shows up with yet another layout, the whole process slows down again.
Template-based extraction was supposed to fix this. It didn't.
The problem with template-based invoice processing
Template-based systems work by mapping fixed coordinates on a page to specific fields. "The invoice number is always at position (x: 312, y: 85)" or "The total is in the bottom-right cell of the last table." You define these rules for each vendor layout, and the system follows them.
This approach has three fundamental problems.
Every new vendor requires a new template. If you onboard 20 new vendors this quarter, someone needs to build 20 new templates. Each one requires sample invoices, manual field mapping, and testing. That's hours of configuration work per vendor.
Layouts change without warning. A vendor updates their billing software. The invoice number moves from the top-right to below the vendor address. Your template breaks silently. The system either extracts the wrong value or returns nothing, and you don't find out until reconciliation fails downstream.
Templates can't handle real-world messiness. Scanned invoices arrive slightly rotated. Photographed invoices have shadows and uneven lighting. Handwritten purchase order numbers sit in the margins. Stamps overlap printed text. Multi-page invoices split line items across pages. Template-based systems fail on all of these because they depend on pixel-perfect positioning that messy documents don't provide.
The result is a system that works well for your top 10 vendors and fails unpredictably for everyone else.
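To make the failure mode concrete, here is a toy sketch of coordinate-based extraction. The Word class, the TEMPLATE layout, and the bounding-box numbers are all hypothetical and do not reflect any real product's template format; the point is only that the same invoice number silently disappears once it moves outside the configured box.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """A piece of text with the page position where it was found."""
    text: str
    x: float
    y: float


# Hypothetical template: each field is a fixed bounding box (x0, y0, x1, y1).
TEMPLATE = {"invoice_number": (300, 80, 420, 95)}


def extract_with_template(words, template):
    """Return the text found inside each field's box, or None if the box is empty."""
    result = {}
    for field, (x0, y0, x1, y1) in template.items():
        hits = [w.text for w in words if x0 <= w.x <= x1 and y0 <= w.y <= y1]
        result[field] = " ".join(hits) or None
    return result


# The layout the template was built for.
original_layout = [Word("INV-2026-0412", 312, 85)]
# The same invoice after the vendor moves the number below the address block.
updated_layout = [Word("INV-2026-0412", 60, 140)]

print(extract_with_template(original_layout, TEMPLATE))
print(extract_with_template(updated_layout, TEMPLATE))
```

The second call returns None for invoice_number without raising any error, which is why broken templates tend to surface only at reconciliation time.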
How AI-powered extraction works differently
Instead of mapping coordinates, AI-powered extraction reads the document the way a human would. It understands what an invoice number looks like, where totals typically appear, and how line item tables are structured, regardless of the specific layout.
The approach is called LLM-grounded extraction. A large language model reads the full document text (obtained via native PDF parsing or OCR for scanned documents), then extracts the fields you define in a schema. The model understands context: it knows that "Total Due" and "Amount Payable" and "Balance" all refer to the same concept. It handles variations in formatting, language, and structure without any template configuration.
Here is what changes:
- Zero template maintenance. Define your schema once. It works across every vendor format.
- New vendors work immediately. No configuration needed. Upload the invoice, get structured data back.
- Scanned and photographed invoices work. OCR extracts the text, and a vision model proofreads the result to catch OCR errors.
- Confidence scores tell you what to trust. Each extracted field comes with a confidence level, so you can auto-process high-confidence results and flag uncertain ones for human review.
Setting up the Drive AI Extract API
The Drive AI Extract API provides schema-based document extraction. You define the fields you want, send a document, and get structured JSON back with extracted values, confidence scores, and source citations.
Install the SDK:
# Node.js
npm install @thedriveai/sdk
# Python
pip install thedriveai
Get your API key from dev.thedrive.ai. Keys start with tda_live_. You get 100 free credits per month (1 credit per page), and additional credits cost $0.01 each.
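As a rough way to budget credits, the pricing above can be turned into a small estimator. This is a sketch under two assumptions that the pricing sentence does not spell out: the free credits are consumed before any paid credits each month, and there are no volume discounts. The function name is made up for illustration.

```python
def monthly_credit_cost_cents(pages, free_credits=100, cents_per_credit=1):
    """Estimate the monthly bill in whole cents at 1 credit per page.

    Assumes the free allowance is used up first (an assumption, not a
    documented billing rule).
    """
    billable = max(0, pages - free_credits)
    return billable * cents_per_credit


print(monthly_credit_cost_cents(80))    # stays inside the free tier
print(monthly_credit_cost_cents(3000))  # 2,900 billable pages
```

Working in whole cents avoids floating-point rounding surprises when the numbers get larger.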
Defining an invoice extraction schema
The schema tells the API exactly what fields to extract and what types to expect. Here is a complete schema for invoice processing:
{
  "schema": [
    {
      "name": "vendor_name",
      "type": "string",
      "description": "The company or individual that issued the invoice",
      "required": true
    },
    {
      "name": "vendor_address",
      "type": "string",
      "description": "Full mailing address of the vendor"
    },
    {
      "name": "invoice_number",
      "type": "string",
      "description": "Unique invoice identifier assigned by the vendor",
      "required": true
    },
    {
      "name": "invoice_date",
      "type": "string",
      "description": "Date the invoice was issued, in YYYY-MM-DD format",
      "required": true
    },
    {
      "name": "due_date",
      "type": "string",
      "description": "Payment due date, in YYYY-MM-DD format"
    },
    {
      "name": "purchase_order_number",
      "type": "string",
      "description": "PO number referenced on the invoice, if any"
    },
    {
      "name": "currency",
      "type": "string",
      "description": "Three-letter currency code (e.g., USD, EUR, GBP)"
    },
    {
      "name": "line_items",
      "type": "array",
      "description": "Each line item on the invoice with description, quantity, unit price, and total",
      "required": true
    },
    {
      "name": "subtotal",
      "type": "number",
      "description": "Sum of all line items before tax and discounts"
    },
    {
      "name": "tax_amount",
      "type": "number",
      "description": "Total tax amount applied to the invoice"
    },
    {
      "name": "tax_rate",
      "type": "string",
      "description": "Tax rate as a percentage, e.g. '8.25%'"
    },
    {
      "name": "discount_amount",
      "type": "number",
      "description": "Any discount applied to the invoice total"
    },
    {
      "name": "total_amount",
      "type": "number",
      "description": "Final amount due including tax and discounts",
      "required": true
    },
    {
      "name": "payment_terms",
      "type": "string",
      "description": "Payment terms such as 'Net 30', 'Due on Receipt', etc."
    },
    {
      "name": "bank_details",
      "type": "string",
      "description": "Bank account or payment routing information, if listed"
    }
  ]
}
A few things to note about the schema design:
- Use descriptive "description" fields. The LLM uses these to understand what to look for. "Date the invoice was issued, in YYYY-MM-DD format" is much better than just "invoice date" because it tells the model the expected format.
- Mark critical fields as "required". If the API cannot extract a required field, you'll know immediately rather than discovering missing data later.
- Use "array" for line items. The API returns structured arrays, so each line item comes back as an object with its own fields.
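These guidelines are easy to check mechanically before a schema ever reaches the API. The helper below is a local sketch, not part of the SDK: lint_schema is a made-up name, the allowed types are simply the three that appear in this post (the API may accept others), and the three-word minimum for descriptions is an arbitrary threshold.

```python
# Only the types that appear in this post; the API may support more.
ALLOWED_TYPES = {"string", "number", "array"}


def lint_schema(schema):
    """Return a list of human-readable problems found in a schema definition."""
    problems = []
    seen = set()
    for index, field in enumerate(schema):
        name = field.get("name")
        if not name:
            problems.append(f"field {index}: missing name")
            continue
        if name in seen:
            problems.append(f"{name}: duplicate field name")
        seen.add(name)
        if field.get("type") not in ALLOWED_TYPES:
            problems.append(f"{name}: unexpected type {field.get('type')!r}")
        # Arbitrary heuristic: very short descriptions give the model little to go on.
        if len((field.get("description") or "").split()) < 3:
            problems.append(f"{name}: description is too short to guide the model")
    return problems


schema = [
    {"name": "invoice_date", "type": "string",
     "description": "Date the invoice was issued, in YYYY-MM-DD format", "required": True},
    {"name": "total", "type": "money", "description": "total"},
]
for problem in lint_schema(schema):
    print(problem)
```

Running a check like this in CI catches typos in field definitions before they turn into missing data in production.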
Extracting data from an invoice
Here is a complete example using the Node.js SDK:
import TheDriveAI from '@thedriveai/sdk';
import fs from 'fs';
import path from 'path';

const client = new TheDriveAI({ apiKey: 'tda_live_...' });

const invoiceSchema = [
  { name: "vendor_name", type: "string", description: "Company that issued the invoice", required: true },
  { name: "invoice_number", type: "string", description: "Unique invoice identifier", required: true },
  { name: "invoice_date", type: "string", description: "Date issued in YYYY-MM-DD format", required: true },
  { name: "due_date", type: "string", description: "Payment due date in YYYY-MM-DD format" },
  { name: "line_items", type: "array", description: "Line items with description, quantity, unit_price, and amount" },
  { name: "subtotal", type: "number", description: "Sum before tax" },
  { name: "tax_amount", type: "number", description: "Total tax" },
  { name: "total_amount", type: "number", description: "Final amount due", required: true },
  { name: "currency", type: "string", description: "Three-letter currency code" },
  { name: "payment_terms", type: "string", description: "e.g. Net 30, Due on Receipt" },
];

// Read the invoice from disk and send it with the schema in a single request.
async function extractInvoice(filePath) {
  const file = fs.readFileSync(filePath);
  const result = await client.extract({
    file,
    fileName: path.basename(filePath), // uses the platform's path separator
    schema: invoiceSchema,
  });
  return result;
}

const result = await extractInvoice('./invoices/acme-corp-2026-0412.pdf');
console.log(JSON.stringify(result, null, 2));
The response looks like this:
{
  "data": {
    "vendor_name": "Acme Industrial Supply Co.",
    "invoice_number": "INV-2026-0412",
    "invoice_date": "2026-05-28",
    "due_date": "2026-06-27",
    "line_items": [
      { "description": "Steel bolts M8x40 (box of 500)", "quantity": 3, "unit_price": 42.50, "amount": 127.50 },
      { "description": "Flat washers M8 (box of 1000)", "quantity": 2, "unit_price": 18.00, "amount": 36.00 },
      { "description": "Hex nuts M8 (box of 500)", "quantity": 3, "unit_price": 29.75, "amount": 89.25 }
    ],
    "subtotal": 252.75,
    "tax_amount": 20.85,
    "total_amount": 273.60,
    "currency": "USD",
    "payment_terms": "Net 30"
  },
  "confidence": {
    "vendor_name": "high",
    "invoice_number": "high",
    "invoice_date": "high",
    "due_date": "high",
    "line_items": "high",
    "subtotal": "high",
    "tax_amount": "medium",
    "total_amount": "high",
    "currency": "high",
    "payment_terms": "high"
  },
  "citations": {
    "vendor_name": "Acme Industrial Supply Co.\n1234 Commerce Blvd, Suite 200",
    "invoice_number": "Invoice #: INV-2026-0412",
    "total_amount": "Total Due: $273.60"
  }
}
Three things come back with every extraction:
- data contains the extracted values, structured exactly as your schema defines.
- confidence rates each field as high, medium, or low. This is what makes automated routing possible.
- citations shows the source text from the document that each value was extracted from. This is useful for audit trails and debugging.
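For an audit trail, it can help to flatten those three parts into one row per field. The sketch below works on a plain dict shaped like the JSON response shown above; audit_rows is a hypothetical helper, not an SDK function, and the "unknown" fallback for a missing confidence entry is an assumption.

```python
def audit_rows(response):
    """Combine value, confidence, and citation into one record per extracted field."""
    data = response.get("data", {})
    confidence = response.get("confidence", {})
    citations = response.get("citations", {})
    rows = []
    for field, value in data.items():
        rows.append({
            "field": field,
            "value": value,
            # Assumed fallback when the response carries no rating for a field.
            "confidence": confidence.get(field, "unknown"),
            # Not every field has a citation, so this may be None.
            "citation": citations.get(field),
        })
    return rows


response = {
    "data": {"invoice_number": "INV-2026-0412", "tax_amount": 20.85},
    "confidence": {"invoice_number": "high", "tax_amount": "medium"},
    "citations": {"invoice_number": "Invoice #: INV-2026-0412"},
}
for row in audit_rows(response):
    print(row)
```

Storing rows like these next to the posted invoice lets a reviewer see where each value came from without reopening the original document.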
Handling different invoice types
The same schema and code work across all common invoice formats. Here is what happens under the hood for each type.
Digital PDFs (generated from billing software like QuickBooks, Xero, FreshBooks): The API parses the native PDF text layer directly. This is the fastest and most accurate path. Text is clean, tables are structured, and confidence is almost always high across all fields.
Scanned invoices (paper invoices run through a flatbed or document scanner): OCR extracts the text, then a vision model proofreads the OCR output against the original image. This catches common OCR errors like confusing 0 with O, 1 with l, or 5 with S. Scanned invoices at 200+ DPI typically produce high-confidence results.
Photographed invoices (taken with a phone camera): These are the hardest. Uneven lighting, perspective distortion, shadows, and low resolution all introduce noise. The API's vision model handles most of these, but you'll see more medium confidence scores. For best results, advise your team to photograph invoices flat, in good lighting, without shadows crossing the text.
Multi-page invoices: The API processes all pages together as a single document. Line items that span page breaks are merged correctly. The cost is 1 credit per page, so a 3-page invoice uses 3 credits.
Invoices with handwritten annotations: Purchase order numbers written in pen, approval signatures, or margin notes are all handled by the vision model. The API distinguishes between printed invoice content and handwritten additions.
The Python SDK works the same way:
from thedriveai import TheDriveAI

client = TheDriveAI(api_key="tda_live_...")

schema = [
    {"name": "vendor_name", "type": "string", "description": "Company that issued the invoice", "required": True},
    {"name": "invoice_number", "type": "string", "description": "Unique invoice identifier", "required": True},
    {"name": "invoice_date", "type": "string", "description": "Date issued in YYYY-MM-DD format", "required": True},
    {"name": "line_items", "type": "array", "description": "Line items with description, quantity, unit_price, amount"},
    {"name": "subtotal", "type": "number", "description": "Sum before tax"},
    {"name": "tax_amount", "type": "number", "description": "Total tax"},
    {"name": "total_amount", "type": "number", "description": "Final amount due", "required": True},
    {"name": "currency", "type": "string", "description": "Three-letter currency code"},
]

with open("invoice.pdf", "rb") as f:
    result = client.extract(
        file=f.read(),
        file_name="invoice.pdf",
        schema=schema,
    )

print(result.data)
print(result.confidence)
Confidence-based routing
Raw extraction is only half the problem. The other half is deciding what to do with the results. Confidence scores make this decision straightforward.
function routeInvoice(extractionResult) {
  const { data, confidence } = extractionResult;

  // Check if all required fields have high confidence
  const requiredFields = ['vendor_name', 'invoice_number', 'invoice_date', 'total_amount'];
  const allRequiredHighConfidence = requiredFields.every(
    field => confidence[field] === 'high'
  );

  // Basic arithmetic check: subtotal (or the sum of line items when no
  // subtotal was extracted) plus tax should equal the invoice total.
  const lineItemSum = data.line_items?.reduce((sum, item) => sum + item.amount, 0) || 0;
  const totalMatchesLineItems = Math.abs(
    (data.subtotal || lineItemSum) + (data.tax_amount || 0) - data.total_amount
  ) < 0.02; // Allow for rounding

  if (allRequiredHighConfidence && totalMatchesLineItems) {
    return { action: 'auto_process', data };
  }

  if (allRequiredHighConfidence && !totalMatchesLineItems) {
    return {
      action: 'review',
      reason: 'Line item total does not match invoice total',
      data,
    };
  }

  // Identify which fields need human review
  const lowConfidenceFields = Object.entries(confidence)
    .filter(([, level]) => level !== 'high')
    .map(([field, level]) => ({ field, level }));

  return {
    action: 'review',
    reason: 'Low confidence on one or more fields',
    flaggedFields: lowConfidenceFields,
    data,
  };
}
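In practice, invoices fall into three tiers. The function above returns a single review action for the last two, which you can split by how many fields are flagged: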
- Auto-process (typically 70-85% of invoices): All required fields extracted with high confidence, and the math checks out. These go straight into your accounting system.
- Light review (10-20%): Most fields are high confidence, but one or two need a human glance. Maybe the tax amount is medium confidence because the invoice lists multiple tax rates. A reviewer confirms or corrects the flagged fields in seconds.
- Manual entry (5-10%): Poor quality scans, unusual formats, or heavily annotated documents. These get routed to a human for full manual processing. Even here, the extraction result serves as a starting point rather than a blank form.
The key insight is that you don't need 100% accuracy to get massive value from automation. Processing 80% of invoices automatically and flagging 20% for quick review is dramatically faster than manually processing 100%.
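The routeInvoice function above returns only two actions, auto_process and review, while the text describes three tiers. One possible way to split review into light review and manual entry is sketched below in Python. The thresholds are assumptions chosen for illustration: any field rated low, or more than two flagged fields, sends the invoice to manual entry. The arithmetic check also subtracts discount_amount when present, which the JavaScript version does not do, and it uses Decimal to avoid floating-point drift.

```python
from decimal import Decimal

REQUIRED_FIELDS = ("vendor_name", "invoice_number", "invoice_date", "total_amount")


def _dec(value):
    """Convert a number (or None) to Decimal via its string form."""
    return Decimal(str(value or 0))


def totals_reconcile(data, tolerance=Decimal("0.02")):
    """Check that subtotal + tax - discount matches the invoice total."""
    items_sum = sum((_dec(item.get("amount")) for item in data.get("line_items") or []), Decimal(0))
    # Fall back to the line item sum when no subtotal was extracted.
    subtotal = _dec(data["subtotal"]) if data.get("subtotal") is not None else items_sum
    expected = subtotal + _dec(data.get("tax_amount")) - _dec(data.get("discount_amount"))
    return abs(expected - _dec(data.get("total_amount"))) < tolerance


def route_invoice(data, confidence, max_flagged=2):
    """Return the tier for an extraction plus the fields that were not rated high."""
    flagged = sorted(field for field, level in confidence.items() if level != "high")
    required_ok = all(confidence.get(field) == "high" for field in REQUIRED_FIELDS)
    if required_ok and totals_reconcile(data):
        return {"tier": "auto_process", "flagged": flagged}
    # Assumed thresholds: a low rating or many flagged fields means full manual entry.
    if any(level == "low" for level in confidence.values()) or len(flagged) > max_flagged:
        return {"tier": "manual_entry", "flagged": flagged}
    return {"tier": "light_review", "flagged": flagged}
```

With this split, the review queue can show light-review invoices with only the flagged fields highlighted, and reserve the full entry form for the manual tier.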
Building a complete invoice pipeline
A production invoice pipeline needs to handle batch processing, error recovery, and integration with your accounting system. Here is a practical implementation:
import TheDriveAI from '@thedriveai/sdk';
import fs from 'fs';
import path from 'path';

const client = new TheDriveAI({ apiKey: process.env.DRIVE_AI_API_KEY });

const invoiceSchema = [
  { name: "vendor_name", type: "string", description: "Company that issued the invoice", required: true },
  { name: "invoice_number", type: "string", description: "Unique invoice identifier", required: true },
  { name: "invoice_date", type: "string", description: "Date issued in YYYY-MM-DD format", required: true },
  { name: "due_date", type: "string", description: "Payment due date in YYYY-MM-DD format" },
  { name: "purchase_order_number", type: "string", description: "PO number if referenced" },
  { name: "line_items", type: "array", description: "Line items with description, quantity, unit_price, amount" },
  { name: "subtotal", type: "number", description: "Sum before tax" },
  { name: "tax_amount", type: "number", description: "Total tax" },
  { name: "total_amount", type: "number", description: "Final amount due", required: true },
  { name: "currency", type: "string", description: "Three-letter currency code" },
  { name: "payment_terms", type: "string", description: "e.g. Net 30, Due on Receipt" },
];

async function processInvoiceBatch(invoiceDir) {
  // Only pick up file types the pipeline is expected to handle
  const files = fs.readdirSync(invoiceDir).filter(f =>
    ['.pdf', '.jpg', '.jpeg', '.png', '.tiff', '.docx'].includes(path.extname(f).toLowerCase())
  );

  const results = {
    autoProcessed: [],
    needsReview: [],
    failed: [],
  };

  // Process one file at a time; a failure on one invoice does not stop the batch
  for (const fileName of files) {
    try {
      const filePath = path.join(invoiceDir, fileName);
      const file = fs.readFileSync(filePath);
      const extraction = await client.extract({
        file,
        fileName,
        schema: invoiceSchema,
      });

      const routing = routeInvoice(extraction);
      if (routing.action === 'auto_process') {
        results.autoProcessed.push({ fileName, data: routing.data });
        // Send to accounting system
        // await postToAccountingSystem(routing.data);
      } else {
        results.needsReview.push({
          fileName,
          data: routing.data,
          reason: routing.reason,
          flaggedFields: routing.flaggedFields,
        });
      }
    } catch (error) {
      results.failed.push({ fileName, error: error.message });
    }
  }

  return results;
}

// Process a folder of invoices
const results = await processInvoiceBatch('./invoices/june-2026');
console.log(`Auto-processed: ${results.autoProcessed.length}`);
console.log(`Needs review: ${results.needsReview.length}`);
console.log(`Failed: ${results.failed.length}`);
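The batch function above records a failure and moves on, which is a reasonable default. For transient problems such as network timeouts, you may also want to retry before giving up. Here is a minimal retry-with-exponential-backoff sketch in Python. with_retries is a hypothetical helper, and because this post does not document the SDK's exception types, it retries on any exception; in real code you would narrow that to errors you know are temporary.

```python
import time


def with_retries(fn, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fn(); on failure wait base_delay * 2**attempt seconds and try again.

    Re-raises the last exception once all attempts are used up.
    The sleep parameter exists so tests can avoid real waiting.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:  # assumption: every error is worth retrying
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

A call would then look like with_retries(lambda: client.extract(file=data, file_name=name, schema=schema)), with the defaults giving waits of 1 and 2 seconds between the three attempts.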
For higher throughput, you can run extractions in parallel with concurrency control:
async function processInvoiceBatchParallel(invoiceDir, concurrency = 5) {
  const files = fs.readdirSync(invoiceDir).filter(f =>
    ['.pdf', '.jpg', '.jpeg', '.png', '.tiff', '.docx'].includes(path.extname(f).toLowerCase())
  );

  const results = { autoProcessed: [], needsReview: [], failed: [] };

  // Process in batches of `concurrency`; each batch finishes before the next starts
  for (let i = 0; i < files.length; i += concurrency) {
    const batch = files.slice(i, i + concurrency);

    // allSettled keeps one failed extraction from rejecting the whole batch
    const batchResults = await Promise.allSettled(
      batch.map(async (fileName) => {
        const file = fs.readFileSync(path.join(invoiceDir, fileName));
        const extraction = await client.extract({ file, fileName, schema: invoiceSchema });
        return { fileName, extraction };
      })
    );

    // Results come back in the same order as `batch`, so the index maps to the file name
    batchResults.forEach((result, index) => {
      if (result.status === 'fulfilled') {
        const { fileName, extraction } = result.value;
        const routing = routeInvoice(extraction);
        if (routing.action === 'auto_process') {
          results.autoProcessed.push({ fileName, data: routing.data });
        } else {
          results.needsReview.push({ fileName, ...routing });
        }
      } else {
        results.failed.push({
          fileName: batch[index],
          error: result.reason?.message ?? String(result.reason),
        });
      }
    });
  }

  return results;
}
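If you are on the Python SDK, a thread pool gives a similar effect. The sketch below is deliberately decoupled from the SDK: extract_many and its extract_fn parameter are hypothetical names, and you would pass in a function that wraps client.extract for one file. Whether the client object is safe to share across threads is not stated in this post, so treat that as something to verify. Unlike the fixed batches above, a pool starts the next file as soon as any worker is free.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def extract_many(file_names, extract_fn, max_workers=5):
    """Run extract_fn(name) for every file with at most max_workers in flight.

    Returns (done, failed): done maps file name to result, failed maps
    file name to the error message.
    """
    done, failed = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(extract_fn, name): name for name in file_names}
        for future in as_completed(futures):
            name = futures[future]
            try:
                done[name] = future.result()
            except Exception as exc:  # keep going when a single file fails
                failed[name] = str(exc)
    return done, failed
```

The done results can then be passed through your routing function one by one, exactly as in the sequential version.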
Cost comparison: manual vs. template vs. AI extraction
Here is a realistic comparison for a company processing 2,000 invoices per month, averaging 1.5 pages per invoice (3,000 pages total).
Manual data entry
- Time: ~4 minutes per invoice (read, key in fields, verify)
- Monthly hours: ~133 hours
- Cost at $25/hr: $3,325/month
- Error rate: 2-4% (keystroke errors, transposition, missed fields)
- New vendor setup: None, but every invoice takes the same time
Template-based extraction
- Software cost: $500-2,000/month (depending on vendor)
- Template creation: 2-4 hours per new vendor layout
- Ongoing maintenance: 10-20 hours/month fixing broken templates
- Maintenance cost at $40/hr: $400-800/month
- Accuracy on templated vendors: 95%+
- Accuracy on new/changed layouts: 0% until template is built
AI extraction with Drive AI
- 3,000 pages at $0.01/credit: $30/month
- Template creation: None
- Maintenance: None
- Review time for flagged invoices (~20%): ~13 hours/month
- Review cost at $25/hr: $325/month
- Total monthly cost: ~$355/month
- Accuracy on auto-processed invoices: 98%+
The difference is stark. AI extraction costs roughly one-tenth of manual processing and eliminates the template maintenance burden entirely. The accuracy on auto-processed invoices is higher than manual entry because there are no keystroke errors, and the confidence routing ensures uncertain extractions get human review.
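To rerun this comparison with your own volumes, the arithmetic fits in a few lines. The sketch below uses the figures from this section; the one number not stated above is the 2 minutes of review per flagged invoice, which is inferred from roughly 13 hours spread over 400 flagged invoices. Because it does not round hours to whole numbers first, the results land slightly above the $3,325 and $355 quoted above.

```python
def manual_cost(invoices, minutes_each=4, hourly_rate=25):
    """Monthly cost of keying every invoice by hand."""
    return invoices * minutes_each / 60 * hourly_rate


def ai_cost(invoices, pages_per_invoice=1.5, price_per_page=0.01,
            review_share=0.20, review_minutes=2, hourly_rate=25):
    """Monthly extraction credits plus human review of flagged invoices.

    review_minutes is an inferred assumption, not a figure from the post.
    """
    extraction = invoices * pages_per_invoice * price_per_page
    review = invoices * review_share * review_minutes / 60 * hourly_rate
    return extraction + review


manual = manual_cost(2000)
ai = ai_cost(2000)
print(f"manual: ${manual:,.0f}  ai: ${ai:,.0f}  ratio: {ai / manual:.2f}")
```

The ratio comes out at about 0.11, consistent with the one-tenth figure, and it is driven almost entirely by review time rather than by the per-page price.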
Getting started
To start extracting data from invoices with the Drive AI Extract API:
- Sign up at dev.thedrive.ai and get your API key.
- Install the SDK (npm install @thedriveai/sdk or pip install thedriveai).
- Copy the invoice schema from this post and adjust the fields for your use case.
- Send a few test invoices from different vendors to validate the extraction quality.
- Implement confidence-based routing to separate auto-processable invoices from those needing review.
- Connect the output to your accounting system or ERP.
The free tier gives you 100 credits per month, enough to test with real invoices before committing. Supported formats include PDF, JPG, PNG, TIFF, HEIC, and DOCX.
The API reference and full SDK documentation are available at dev.thedrive.ai.
Have questions? Reach out at contact@thedrive.ai.
