Free URL to Markdown API — Clean, LLM-Ready Markdown from Any URL or Document
The Problem Every AI Developer Hits
You are building a RAG pipeline, an AI agent, or a research tool. You need to feed a webpage into your LLM. So you fetch the HTML and discover that a 500-word article is buried inside 47,000 tokens of navigation bars, ad scripts, cookie banners, and tracking pixels.
Raw HTML is not LLM-ready. It wastes tokens, confuses models, and makes retrieval unreliable.
The standard fix is to run a headless browser, strip boilerplate, parse the DOM, and convert to markdown yourself. That means managing Puppeteer, handling JavaScript-rendered SPAs, dealing with anti-bot measures, and maintaining the infrastructure to do it at scale.
We built a simpler option.
One GET Request. Clean Markdown Back.
The Drive AI Markdown API converts any URL or document into clean, structured markdown with a single request:
```
GET https://dev.thedrive.ai/md/{url}
```
Pass a URL. Get markdown. That is the entire integration.
```bash
curl https://dev.thedrive.ai/md/https://openai.com/index/gpt-4o
```
The response is clean markdown — headings preserved, tables intact, code blocks formatted, boilerplate stripped. Ready to drop into a vector database, an LLM prompt, or a document store.
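Because headings survive the conversion, the returned markdown can be navigated structurally, for example to check what a page contains before indexing it. The sketch below assumes the response uses ATX-style `#` headings (the post does not specify the heading syntax). `outline` is a hypothetical helper that lists each heading and its level, and it skips `#` lines inside fenced code blocks.

```python
import re

# ATX heading: 1-6 '#', whitespace, text, optional closing '#' run preceded by whitespace.
HEADING = re.compile(r"^(#{1,6})\s+(.*?)(?:\s+#+)?\s*$")
FENCE_MARKERS = ("`" * 3, "~" * 3)


def outline(markdown: str) -> list[tuple[int, str]]:
    """Return (level, text) for each ATX heading outside fenced code blocks."""
    headings = []
    in_fence = False
    for line in markdown.splitlines():
        if line.strip().startswith(FENCE_MARKERS):
            in_fence = not in_fence  # entering or leaving a code block
            continue
        if in_fence:
            continue  # '# comments' inside code are not headings
        match = HEADING.match(line)
        if match:
            headings.append((len(match.group(1)), match.group(2)))
    return headings
```

A trailing `#` that is part of the text, as in `### C#`, is kept. Only a space-separated closing run such as `## Usage ##` is stripped.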
Authentication
Include your API key in the `X-API-Key` request header:
```bash
curl -H "X-API-Key: tda_live_..." \
  https://dev.thedrive.ai/md/https://openai.com/index/gpt-4o
```
Get your free API key at dev.thedrive.ai. The free tier includes 100 credits per month — each markdown conversion costs 1 credit.
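If you are not using an SDK, you can build the same authenticated request with Python's standard library. The endpoint and the `X-API-Key` header come from the examples above. The function names are hypothetical, UTF-8 decoding of the body is an assumption, and the target URL is appended verbatim as in the curl examples. The post does not say whether URLs with query strings need percent-encoding.

```python
import urllib.request

BASE = "https://dev.thedrive.ai/md/"


def build_request(target_url: str, api_key: str) -> urllib.request.Request:
    """Build the GET request; the target URL is appended verbatim, as in the curl examples."""
    return urllib.request.Request(BASE + target_url, headers={"X-API-Key": api_key})


def fetch_markdown(target_url: str, api_key: str, timeout: float = 60.0) -> str:
    """Send the request and decode the body (UTF-8 is an assumption)."""
    with urllib.request.urlopen(build_request(target_url, api_key), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Usage: `md = fetch_markdown("https://example.com", "tda_live_...")`. A non-2xx response raises `urllib.error.HTTPError`, which you can catch to handle bad keys or exhausted credits.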
What Makes This Different
There are other URL-to-markdown tools. Here is why we built another one.
JavaScript Rendering
Most converters fetch raw HTML and parse it. That works for static sites. It fails on SPAs, client-rendered dashboards, and any page that loads content via JavaScript.
The Drive AI Markdown API renders JavaScript before extraction. React apps, Next.js pages, dynamically loaded content — it all gets captured.
Boilerplate Removal
Navigation, footers, sidebars, cookie consent modals, ad blocks — all stripped automatically. What remains is the primary content of the page, structured as markdown with proper heading hierarchy.
Document Support
This is not just a webpage converter. The same endpoint handles:
- PDFs — including scanned documents via OCR
- DOCX, PPTX, XLSX — Office formats converted to structured markdown
- Google Docs, Sheets, Slides — public links work directly
- Any publicly accessible file URL
One endpoint, 107+ file types.
Use Cases
RAG Pipelines
Feed clean content into your vector database without writing your own HTML cleanup:
```python
from thedriveai import TheDriveAI

client = TheDriveAI(api_key="tda_live_...")

# Convert webpage to markdown
md = client.markdown("https://docs.python.org/3/tutorial/classes.html")

# Chunk and embed into your vector store.
# split_into_chunks, embed, and vector_db are placeholders for your own
# chunker, embedding model, and vector store client.
chunks = split_into_chunks(md, max_tokens=512)
embeddings = embed(chunks)
vector_db.upsert(chunks, embeddings)
```
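`split_into_chunks` in the pipeline above is a placeholder for your own chunker. The sketch below shows one possible version. It uses whitespace-separated words as a rough stand-in for tokens, so a real pipeline should count with the embedding model's tokenizer. It starts a new chunk at each markdown heading outside fenced code and falls back to fixed-size word windows only when a section exceeds `max_tokens`.

```python
FENCE_MARKERS = ("`" * 3, "~" * 3)


def split_into_chunks(markdown: str, max_tokens: int = 512) -> list[str]:
    """Split markdown into heading-delimited chunks of at most max_tokens words.

    Whitespace-separated words approximate tokens; use your embedding
    model's tokenizer for exact counts.
    """
    sections: list[list[str]] = []
    current: list[str] = []
    in_fence = False
    for line in markdown.splitlines():
        if line.strip().startswith(FENCE_MARKERS):
            in_fence = not in_fence
        is_heading = (
            not in_fence
            and line.startswith("#")
            and line.lstrip("#").startswith(" ")
        )
        # Each heading outside a code block starts a new section.
        if is_heading and current:
            sections.append(current)
            current = []
        current.append(line)
    if current:
        sections.append(current)

    chunks: list[str] = []
    for lines in sections:
        text = "\n".join(lines).strip()
        if not text:
            continue  # skip blank leading sections
        words = text.split()
        if len(words) <= max_tokens:
            chunks.append(text)  # fits: keep original line breaks
            continue
        # Oversized section: fall back to fixed windows (line breaks are lost).
        for start in range(0, len(words), max_tokens):
            chunks.append(" ".join(words[start:start + max_tokens]))
    return chunks
```

Splitting on headings keeps each chunk about a single topic, which usually helps retrieval more than cutting at arbitrary offsets.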
AI Agents and Tool Use
Give your agent the ability to read any webpage:
```python
from thedriveai import TheDriveAI

client = TheDriveAI(api_key="tda_live_...")

def read_url(url: str) -> str:
    """Tool: Read the content of a URL and return it as markdown."""
    return client.markdown(url)
```
Your agent calls read_url, gets clean text, and reasons over it — no HTML parsing logic needed in your agent code.
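To expose `read_url` to a model, you usually have to declare it with a JSON Schema for its arguments. The sketch below assumes an OpenAI-style function-calling format; other providers use similar but not identical shapes. `TOOLS` and `dispatch_tool_call` are hypothetical names. The fetch function is passed in as an argument, so you can test the routing without network access.

```python
import json
from typing import Callable

# Declaration in the OpenAI-style "function" tool format (an assumption; adapt to your provider).
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "read_url",
            "description": "Read the content of a URL and return it as markdown.",
            "parameters": {
                "type": "object",
                "properties": {"url": {"type": "string", "description": "Absolute http(s) URL"}},
                "required": ["url"],
            },
        },
    }
]


def dispatch_tool_call(name: str, arguments: str, read_url: Callable[[str], str]) -> str:
    """Route a model's tool call to read_url; arguments arrive as a JSON string."""
    if name != "read_url":
        return f"Unknown tool: {name}"
    args = json.loads(arguments)
    return read_url(args["url"])
```

In a real agent loop you would pass the `read_url` function above as the third argument and return its markdown to the model as the tool result.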
Research and Monitoring
Pull structured content from pages on a schedule for competitive monitoring, price tracking, or content aggregation:
```javascript
// Top-level await: run this as an ES module (e.g. a .mjs file).
import { TheDriveAI } from '@thedriveai/sdk';

const client = new TheDriveAI({ apiKey: 'tda_live_...' });

const sources = [
  'https://competitor.com/pricing',
  'https://competitor.com/changelog',
  'https://competitor.com/blog',
];

// Convert all pages concurrently; each conversion costs 1 credit.
const results = await Promise.all(
  sources.map(url => client.markdown(url))
);
```
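For monitoring, the useful signal is usually whether a page changed since the last run. The sketch below assumes you keep the previous run's markdown per URL, here as a plain dict. `detect_changes` is a hypothetical helper that returns a unified diff only for pages whose markdown differs.

```python
import difflib


def detect_changes(previous: dict[str, str], current: dict[str, str]) -> dict[str, str]:
    """Return {url: unified diff} for each URL whose markdown changed since the last run."""
    changes: dict[str, str] = {}
    for url, new_md in current.items():
        old_md = previous.get(url, "")  # a URL seen for the first time diffs against empty
        if old_md == new_md:
            continue
        diff = difflib.unified_diff(
            old_md.splitlines(),
            new_md.splitlines(),
            fromfile=f"{url} (previous)",
            tofile=f"{url} (current)",
            lineterm="",
        )
        changes[url] = "\n".join(diff)
    return changes
```

The comparison is exact. Any dynamic element that survives extraction, such as a date or a counter, will show up as a change, so you may need to filter those lines before comparing.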
Documentation Ingestion
Convert technical documentation into a format your LLM can consume:
```bash
# Convert an entire docs page to markdown
curl -H "X-API-Key: tda_live_..." \
  https://dev.thedrive.ai/md/https://react.dev/reference/react/useState \
  -o usestate-docs.md
```
Feed the output into fine-tuning datasets, knowledge bases, or context windows.
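When you ingest many pages, a predictable local filename per URL keeps the output files traceable to their sources. `output_path` and `save_all` below are hypothetical helpers. The conversion step is passed in as a function, for example the Python SDK's `client.markdown`. Query strings are ignored when building names, so two URLs that differ only in their query will map to the same file.

```python
import re
from pathlib import Path
from typing import Callable
from urllib.parse import urlsplit


def output_path(url: str, out_dir: str = "docs-md") -> Path:
    """Map a URL to a .md path built from its host and path (query strings are ignored)."""
    parts = urlsplit(url)
    raw = parts.netloc + parts.path.rstrip("/")
    # Collapse every run of characters other than letters, digits, '.' and '-' into '_'.
    name = re.sub(r"[^A-Za-z0-9.-]+", "_", raw).strip("_") or "index"
    return Path(out_dir) / f"{name}.md"


def save_all(urls: list[str], convert: Callable[[str], str], out_dir: str = "docs-md") -> list[Path]:
    """Convert each URL with the supplied function and write the markdown to disk."""
    written = []
    for url in urls:
        path = output_path(url, out_dir)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(convert(url), encoding="utf-8")
        written.append(path)
    return written
```

For example, `https://react.dev/reference/react/useState` is written to `docs-md/react.dev_reference_react_useState.md`.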
How It Compares
| Feature | Drive AI Markdown | Jina Reader | Firecrawl | Crawl4AI |
|---|---|---|---|---|
| JS rendering | Yes | Limited | Yes | Yes |
| Document support (PDF, DOCX) | 107+ formats | No | No | No |
| OCR for scanned docs | Yes | No | No | No |
| Free tier | 100/month | 1,000/month | 500/month | Self-host |
| Setup | One GET request | One GET request | API key + SDK | Docker |
| Boilerplate removal | Automatic | Automatic | Automatic | Configurable |
The key differentiator: this is not just a web scraper that outputs markdown. It is a document conversion engine that handles PDFs, Office files, and scanned documents through the same endpoint. If your pipeline ingests more than just webpages, one API can replace separate tools for web extraction, PDF and OCR processing, and Office conversion.
Pricing
| Plan | Credits | Cost |
|---|---|---|
| Free | 100/month | $0 |
| Pro | Pay as you go | $0.01/credit |
| Enterprise | Custom volume | Contact us |
Each markdown conversion costs 1 credit. The free tier is enough to build and test your integration. Pro pricing scales linearly with no tiers or overages.
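The sketch below makes the pricing arithmetic explicit. It uses the table's figures: 1 credit per conversion, 100 free credits per month, and $0.01 per Pro credit. It also assumes something the table does not state: that Pro bills every credit rather than only the credits beyond the first 100. `monthly_cost` is a hypothetical helper.

```python
from decimal import Decimal

PRO_PRICE_PER_CREDIT = Decimal("0.01")  # from the pricing table
FREE_CREDITS_PER_MONTH = 100            # from the pricing table
CREDITS_PER_CONVERSION = 1


def monthly_cost(conversions: int) -> tuple[str, Decimal]:
    """Return (plan, USD cost) for a month's conversions."""
    credits = conversions * CREDITS_PER_CONVERSION
    if credits <= FREE_CREDITS_PER_MONTH:
        return "Free", Decimal("0.00")
    # Assumption: Pro bills every credit; free credits are not deducted.
    return "Pro", (credits * PRO_PRICE_PER_CREDIT).quantize(Decimal("0.01"))
```

Under these assumptions, 25,000 conversions a month come to $250.00. `Decimal` keeps the cents exact, which floating-point arithmetic would not guarantee.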
Get Started
Install the SDK in your language:
```bash
npm install @thedriveai/sdk   # JavaScript / TypeScript
pip install thedriveai        # Python
```
Or skip the SDK entirely and use a GET request:
```bash
curl -H "X-API-Key: tda_live_..." \
  https://dev.thedrive.ai/md/https://example.com
```
Get your API key at dev.thedrive.ai and start converting URLs to markdown in under a minute.
Have questions or want to see a specific format supported? Reach out at contact@thedrive.ai.
