Free URL to Markdown API — Clean, LLM-Ready Markdown from Any URL or Document

The Problem Every AI Developer Hits

You are building a RAG pipeline, an AI agent, or a research tool. You need to feed a webpage into your LLM. So you fetch the HTML and discover that a 500-word article is buried inside 47,000 tokens of navigation bars, ad scripts, cookie banners, and tracking pixels.

Raw HTML is not LLM-ready. It wastes tokens, confuses models, and makes retrieval unreliable.

The standard fix is to run a headless browser, parse the DOM, strip the boilerplate, and convert what is left to markdown yourself. That means managing Puppeteer, handling JavaScript-rendered SPAs, dealing with anti-bot measures, and maintaining the infrastructure to do it at scale.

We built a simpler option.

One GET Request. Clean Markdown Back.

The Drive AI Markdown API converts any URL or document into clean, structured markdown with a single request:

GET https://dev.thedrive.ai/md/{url}

Pass a URL. Get markdown. That is the entire integration.

curl https://dev.thedrive.ai/md/https://openai.com/index/gpt-4o

The response is clean markdown — headings preserved, tables intact, code blocks formatted, boilerplate stripped. Ready to drop into a vector database, an LLM prompt, or a document store.

Authentication

Include your API key in the header:

curl -H "X-API-Key: tda_live_..." \
  https://dev.thedrive.ai/md/https://openai.com/index/gpt-4o

Get your free API key at dev.thedrive.ai. The free tier includes 100 credits per month — each markdown conversion costs 1 credit.
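
If you would rather not use curl or the SDK, the same authenticated request can be sketched in Python with only the standard library. The endpoint and the X-API-Key header come from the curl example above. The function names, the timeout, and the assumption that the response body is UTF-8 text are illustrative and not documented behavior.

```python
import urllib.request

API_BASE = "https://dev.thedrive.ai/md/"  # endpoint shown in the curl example


def build_markdown_request(target_url: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for the markdown endpoint."""
    # The target URL is appended to the path as-is, mirroring the curl example.
    # Whether URLs with query strings need percent-encoding is not stated in
    # this article, so check the API docs before relying on this for them.
    return urllib.request.Request(
        API_BASE + target_url,
        headers={"X-API-Key": api_key},
        method="GET",
    )


def fetch_markdown(target_url: str, api_key: str, timeout: float = 30.0) -> str:
    """Send the request and return the body, assumed to be UTF-8 markdown."""
    request = build_markdown_request(target_url, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Splitting request construction from sending keeps the URL and header logic easy to inspect or unit-test without spending a credit.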

What Makes This Different

There are other URL-to-markdown tools. Here is why we built another one.

JavaScript Rendering

Most converters fetch raw HTML and parse it. That works for static sites. It fails on SPAs, client-rendered dashboards, and any page that loads content via JavaScript.

The Drive AI Markdown API renders JavaScript before extraction. React apps, Next.js pages, dynamically loaded content — it all gets captured.

Boilerplate Removal

Navigation, footers, sidebars, cookie consent modals, ad blocks — all stripped automatically. What remains is the primary content of the page, structured as markdown with proper heading hierarchy.

Document Support

This is not just a webpage converter. The same endpoint handles:

PDFs — including scanned documents via OCR
DOCX, PPTX, XLSX — Office formats converted to structured markdown
Google Docs, Sheets, Slides — public links work directly
Any publicly accessible file URL

One endpoint, 107+ file types.

Use Cases

RAG Pipelines

Feed clean content into your vector database without writing your own HTML cleanup step:

from thedriveai import TheDriveAI

client = TheDriveAI(api_key="tda_live_...")

# Convert webpage to markdown
md = client.markdown("https://docs.python.org/3/tutorial/classes.html")

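# Chunk and embed into your vector store.
# split_into_chunks, embed, and vector_db are placeholders for your own
# chunking function, embedding model, and vector store client.
chunks = split_into_chunks(md, max_tokens=512)
embeddings = embed(chunks)
vector_db.upsert(chunks, embeddings)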
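
The split_into_chunks helper in the example above is not part of the SDK and is left undefined. One possible version is sketched below: it splits at markdown headings and then packs paragraphs up to a token budget. The four-characters-per-token estimate is a rough heuristic chosen for illustration, not a real tokenizer.

```python
import re


def estimate_tokens(text: str) -> int:
    """Rough token estimate: about 4 characters per token (a heuristic)."""
    return max(1, len(text) // 4)


def split_into_chunks(markdown: str, max_tokens: int = 512) -> list[str]:
    """Split markdown at headings, then pack paragraphs up to max_tokens."""
    # Split before every ATX heading line so a chunk never spans two sections.
    sections = re.split(r"(?m)^(?=#{1,6} )", markdown)
    chunks: list[str] = []
    for section in sections:
        current = ""
        for paragraph in section.split("\n\n"):
            paragraph = paragraph.strip()
            if not paragraph:
                continue
            candidate = f"{current}\n\n{paragraph}" if current else paragraph
            if current and estimate_tokens(candidate) > max_tokens:
                # Budget exceeded: close the current chunk and start a new one.
                chunks.append(current)
                current = paragraph
            else:
                current = candidate
        if current:
            chunks.append(current)
    return chunks
```

Two limitations of this sketch: a single paragraph larger than max_tokens is kept whole rather than split, and a line starting with "# " inside a code block is treated as a heading.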

AI Agents and Tool Use

Give your agent the ability to read any webpage:

# Reuses the `client` created in the RAG example above
def read_url(url: str) -> str:
    """Tool: Read the content of a URL and return it as markdown."""
    return client.markdown(url)

Your agent calls read_url, gets clean text, and reasons over it — no HTML parsing logic needed in your agent code.
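
Agents sometimes pass malformed URLs or request very long pages, so it can help to wrap the tool before exposing it. The sketch below is one way to do that. The make_read_url_tool name, the 20,000-character cap, and the error-message format are assumptions made for illustration and are not part of the SDK.

```python
from typing import Callable
from urllib.parse import urlparse


def make_read_url_tool(
    convert: Callable[[str], str], max_chars: int = 20_000
) -> Callable[[str], str]:
    """Wrap a URL-to-markdown function so it is safer to expose to an agent."""

    def read_url(url: str) -> str:
        parsed = urlparse(url)
        # Reject anything that is not an absolute http(s) URL.
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            return f"Error: not a valid http(s) URL: {url!r}"
        try:
            markdown = convert(url)
        except Exception as exc:  # report failures as text the agent can read
            return f"Error: could not read {url}: {exc}"
        # Truncate very long pages so one call cannot fill the context window.
        if len(markdown) > max_chars:
            return markdown[:max_chars] + "\n\n[truncated]"
        return markdown

    return read_url
```

With the SDK client from the earlier example, the tool would be created as read_url = make_read_url_tool(client.markdown). Returning errors as strings rather than raising lets the agent see the failure and try a different URL.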

Research and Monitoring

Pull structured content from pages on a schedule for competitive monitoring, price tracking, or content aggregation:

import { TheDriveAI } from '@thedriveai/sdk';

const client = new TheDriveAI({ apiKey: 'tda_live_...' });

const sources = [
  'https://competitor.com/pricing',
  'https://competitor.com/changelog',
  'https://competitor.com/blog',
];

const results = await Promise.all(
  sources.map(url => client.markdown(url))
);
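
For monitoring, the useful signal is usually whether a page changed since the last run. A minimal way to detect that is to store a hash of each page's markdown and compare it on the next fetch. The sketch below is in Python to match the RAG example; content_fingerprint and detect_changes are hypothetical helper names, and the in-memory dict stands in for whatever storage you use.

```python
import hashlib


def content_fingerprint(markdown: str) -> str:
    """Return a SHA-256 hex digest of the markdown, ignoring outer whitespace."""
    return hashlib.sha256(markdown.strip().encode("utf-8")).hexdigest()


def detect_changes(previous: dict[str, str], current_pages: dict[str, str]) -> list[str]:
    """Return URLs whose content is new or differs from the last run.

    previous maps URL -> fingerprint from the last run and is updated in place.
    current_pages maps URL -> freshly converted markdown.
    """
    changed = []
    for url, markdown in current_pages.items():
        fingerprint = content_fingerprint(markdown)
        if previous.get(url) != fingerprint:
            changed.append(url)
        previous[url] = fingerprint
    return changed
```

Because boilerplate is stripped before hashing, rotating ads or banners are less likely to trigger false positives than they would be with a hash of the raw HTML, though dynamic content inside the main body can still do so.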

Documentation Ingestion

Convert technical documentation into a format your LLM can consume:

# Convert an entire docs page to markdown
curl -H "X-API-Key: tda_live_..." \
  https://dev.thedrive.ai/md/https://react.dev/reference/react/useState \
  -o usestate-docs.md

Feed the output into fine-tuning datasets, knowledge bases, or context windows.

How It Compares

Feature	Drive AI Markdown	Jina Reader	Firecrawl	Crawl4AI
JS rendering	Yes	Limited	Yes	Yes
Document support (PDF, DOCX)	107+ formats	No	No	No
OCR for scanned docs	Yes	No	No	No
Free tier	100/month	1,000/month	500/month	Self-host
Setup	One GET request	One GET request	API key + SDK	Docker
Boilerplate removal	Automatic	Automatic	Automatic	Configurable

The key differentiator is that this is more than a web scraper that outputs markdown. It is a document conversion engine that handles PDFs, Office files, and scanned documents through the same endpoint. If your pipeline ingests more than webpages, one API can stand in for a separate web scraper, document parser, and OCR service.

Pricing

Plan	Credits	Cost
Free	100/month	$0
Pro	Pay as you go	$0.01/credit
Enterprise	Custom volume	Contact us

Each markdown conversion costs 1 credit. The free tier is enough to build and test your integration. Pro pricing scales linearly with no tiers or overages.
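
To budget for a workload, the linear pricing can be turned into a small estimator. The sketch below uses the figures from the table (1 credit per conversion, $0.01 per credit on Pro) and works in whole cents to avoid floating-point rounding. The included_credits parameter is an assumption: this article does not say whether Pro accounts keep the 100 free credits, so it defaults to 0.

```python
PRICE_PER_CREDIT_CENTS = 1    # Pro: $0.01 per credit (from the pricing table)
CREDITS_PER_CONVERSION = 1    # each markdown conversion costs 1 credit


def monthly_cost_cents(conversions: int, included_credits: int = 0) -> int:
    """Estimate the monthly Pro bill in cents for a number of conversions.

    included_credits is an assumption knob for any free allowance that is
    deducted before billing; it is not a documented feature of the plan.
    """
    credits = conversions * CREDITS_PER_CONVERSION
    billable = max(0, credits - included_credits)
    return billable * PRICE_PER_CREDIT_CENTS
```

Under these assumptions, monthly_cost_cents(25_000) gives 25,000 cents, which is $250 for 25,000 conversions in a month.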

Get Started

Install the SDK in your language:

npm install @thedriveai/sdk

pip install thedriveai

Or skip the SDK entirely and use a GET request:

curl -H "X-API-Key: tda_live_..." \
  https://dev.thedrive.ai/md/https://example.com

Get your API key at dev.thedrive.ai and start converting URLs to markdown in under a minute.

Have questions or want to see a specific format supported? Reach out at contact@thedrive.ai.