Free URL to Markdown API — Clean, LLM-Ready Markdown from Any URL or Document

The Problem Every AI Developer Hits

You are building a RAG pipeline, an AI agent, or a research tool. You need to feed a webpage into your LLM. So you fetch the HTML and discover that a 500-word article is buried inside 47,000 tokens of navigation bars, ad scripts, cookie banners, and tracking pixels.

Raw HTML is not LLM-ready. It wastes tokens, confuses models, and makes retrieval unreliable.

The standard fix is to run a headless browser, strip boilerplate, parse the DOM, and convert to markdown yourself. That means managing Puppeteer, handling JavaScript-rendered SPAs, dealing with anti-bot measures, and maintaining the infrastructure to do it at scale.

We built a simpler option.

One GET Request. Clean Markdown Back.

The Drive AI Markdown API converts any URL or document into clean, structured markdown with a single request:

GET https://dev.thedrive.ai/md/{url}

Pass a URL. Get markdown. That is the entire integration.

curl https://dev.thedrive.ai/md/https://openai.com/index/gpt-4o

The response is clean markdown — headings preserved, tables intact, code blocks formatted, boilerplate stripped. Ready to drop into a vector database, an LLM prompt, or a document store.

Authentication

Include your API key in the header:

curl -H "X-API-Key: tda_live_..." \
  https://dev.thedrive.ai/md/https://openai.com/index/gpt-4o

Get your free API key at dev.thedrive.ai. The free tier includes 100 credits per month — each markdown conversion costs 1 credit.
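For readers who would rather not shell out to curl, the same authenticated call can be sketched with Python's standard library. The endpoint and the X-API-Key header come from the examples above. The environment variable name TDA_API_KEY, the helper names, and the UTF-8 decoding of the response are assumptions made for illustration. The target URL is appended as-is, mirroring the curl example; whether URLs with query strings need percent-encoding is not stated here.

```python
import os
import urllib.request

BASE = "https://dev.thedrive.ai/md/"  # endpoint prefix from the examples above


def build_request(target_url: str, api_key: str) -> urllib.request.Request:
    """Build the GET request: endpoint prefix + target URL, plus the API key header."""
    # Assumption: the target URL is appended unmodified, as in the curl example.
    return urllib.request.Request(BASE + target_url, headers={"X-API-Key": api_key})


def fetch_markdown(target_url: str, timeout: float = 30.0) -> str:
    """Send the request and return the response body as text."""
    # Assumption: the key lives in a TDA_API_KEY environment variable (hypothetical name).
    request = build_request(target_url, os.environ["TDA_API_KEY"])
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Assumption: the markdown body is UTF-8 encoded.
        return response.read().decode("utf-8")
```

Calling fetch_markdown("https://example.com") would spend one credit per call, so it is worth caching results during development.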

What Makes This Different

There are other URL-to-markdown tools. Here is why we built another one.

JavaScript Rendering

Most converters fetch raw HTML and parse it. That works for static sites. It fails on SPAs, client-rendered dashboards, and any page that loads content via JavaScript.

The Drive AI Markdown API renders JavaScript before extraction. React apps, Next.js pages, dynamically loaded content — it all gets captured.

Boilerplate Removal

Navigation, footers, sidebars, cookie consent modals, ad blocks — all stripped automatically. What remains is the primary content of the page, structured as markdown with proper heading hierarchy.

Document Support

This is not just a webpage converter. The same endpoint handles:

  • PDFs — including scanned documents via OCR
  • DOCX, PPTX, XLSX — Office formats converted to structured markdown
  • Google Docs, Sheets, Slides — public links work directly
  • Any publicly accessible file URL

One endpoint, 107+ file types.

Use Cases

RAG Pipelines

Feed clean markdown straight into your chunking and embedding steps, with no HTML preprocessing:

from thedriveai import TheDriveAI

client = TheDriveAI(api_key="tda_live_...")

# Convert webpage to markdown
md = client.markdown("https://docs.python.org/3/tutorial/classes.html")

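# Chunk and embed into your vector store.
# split_into_chunks, embed, and vector_db are placeholders for your own
# chunking function, embedding model, and vector store client.
chunks = split_into_chunks(md, max_tokens=512)
embeddings = embed(chunks)
vector_db.upsert(chunks, embeddings)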
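The snippet above calls split_into_chunks without defining it. One possible version is sketched below: it splits at markdown headings, then packs paragraphs into chunks up to a rough budget. It is not part of the SDK, and it approximates a token as one whitespace-separated word; a production pipeline would count tokens with the embedding model's own tokenizer.

```python
import re


def split_into_chunks(markdown: str, max_tokens: int = 512) -> list[str]:
    """Split markdown at headings, then pack paragraphs up to a rough token budget."""
    # Zero-width split: each section starts at a line beginning with 1-6 '#' and a space.
    sections = re.split(r"^(?=#{1,6} )", markdown, flags=re.MULTILINE)
    chunks: list[str] = []
    for section in sections:
        current: list[str] = []
        count = 0
        for paragraph in section.split("\n\n"):
            paragraph = paragraph.strip()
            if not paragraph:
                continue
            # Assumption: one whitespace-separated word is roughly one token.
            size = len(paragraph.split())
            if current and count + size > max_tokens:
                chunks.append("\n\n".join(current))
                current, count = [], 0
            # A single paragraph larger than the budget is kept whole.
            current.append(paragraph)
            count += size
        if current:
            chunks.append("\n\n".join(current))
    return chunks
```

Chunks never cross a heading, so each one stays on a single topic. One known limitation of this sketch: a line such as "# comment" inside a code block is treated as a heading.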

AI Agents and Tool Use

Give your agent the ability to read any webpage:

# Reuses the client created in the RAG example above.
def read_url(url: str) -> str:
    """Tool: Read the content of a URL and return it as markdown."""
    return client.markdown(url)

Your agent calls read_url, gets clean text, and reasons over it — no HTML parsing logic needed in your agent code.
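In practice an agent tool usually needs two guards that the short version leaves out: a cap on how much text goes back into the context window, and failures reported as text rather than raised. The wrapper below is a sketch of that idea. The names make_read_url_tool and max_chars are hypothetical, and convert stands for any function that maps a URL to markdown, such as client.markdown.

```python
from typing import Callable


def make_read_url_tool(
    convert: Callable[[str], str], max_chars: int = 20_000
) -> Callable[[str], str]:
    """Wrap a URL-to-markdown function so it is safer to expose as an agent tool."""

    def read_url(url: str) -> str:
        """Tool: Read the content of a URL and return it as markdown."""
        # Reject non-web schemes before spending a credit on them.
        if not url.startswith(("http://", "https://")):
            return "Error: only http and https URLs are supported."
        try:
            text = convert(url)
        except Exception as exc:
            # Return the failure as text so the agent can react to it.
            return f"Error: could not read {url}: {exc}"
        if len(text) > max_chars:
            # Cap the output so one long page cannot fill the context window.
            return text[:max_chars] + "\n\n[truncated]"
        return text

    return read_url
```

With the SDK client from the earlier example, the tool would be created as read_url = make_read_url_tool(client.markdown).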

Research and Monitoring

Pull structured content from pages on a schedule for competitive monitoring, price tracking, or content aggregation:

import { TheDriveAI } from '@thedriveai/sdk';

const client = new TheDriveAI({ apiKey: 'tda_live_...' });

const sources = [
  'https://competitor.com/pricing',
  'https://competitor.com/changelog',
  'https://competitor.com/blog',
];

const results = await Promise.all(
  sources.map(url => client.markdown(url))
);
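Monitoring only pays off if you can tell which pages changed between runs. One simple approach, sketched here in Python to match the SDK example earlier, is to store a hash of each page's markdown and compare it on the next run. The function names and the dictionary-based storage are assumptions for illustration; how you persist the fingerprints is up to you.

```python
import hashlib


def content_fingerprint(markdown: str) -> str:
    """Hash the markdown after trimming trailing whitespace on each line."""
    normalised = "\n".join(line.rstrip() for line in markdown.strip().splitlines())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def changed_sources(previous: dict[str, str], current_pages: dict[str, str]) -> list[str]:
    """Return URLs whose fingerprint differs from the stored one.

    previous maps URL -> fingerprint from the last run.
    current_pages maps URL -> freshly converted markdown.
    URLs missing from previous count as changed.
    """
    return [
        url
        for url, markdown in current_pages.items()
        if previous.get(url) != content_fingerprint(markdown)
    ]
```

Because boilerplate is stripped before hashing, rotating ads or banners are less likely to trigger false positives than they would be with raw HTML.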

Documentation Ingestion

Convert technical documentation into a format your LLM can consume:

# Convert an entire docs page to markdown
curl -H "X-API-Key: tda_live_..." \
  https://dev.thedrive.ai/md/https://react.dev/reference/react/useState \
  -o usestate-docs.md

Feed the output into fine-tuning datasets, knowledge bases, or context windows.

How It Compares

| Feature | Drive AI Markdown | Jina Reader | Firecrawl | Crawl4AI |
| --- | --- | --- | --- | --- |
| JS rendering | Yes | Limited | Yes | Yes |
| Document support (PDF, DOCX) | 107+ formats | No | No | No |
| OCR for scanned docs | Yes | No | No | No |
| Free tier | 100/month | 1,000/month | 500/month | Self-host |
| Setup | One GET request | One GET request | API key + SDK | Docker |
| Boilerplate removal | Automatic | Automatic | Automatic | Configurable |

The key differentiator: this is not just a web scraper that outputs markdown. It is a document conversion engine that handles PDFs, Office files, and scanned documents through the same endpoint. If your pipeline ingests more than just webpages, one API can stand in for a separate web scraper, document converter, and OCR service.

Pricing

| Plan | Credits | Cost |
| --- | --- | --- |
| Free | 100/month | $0 |
| Pro | Pay as you go | $0.01/credit |
| Enterprise | Custom volume | Contact us |

Each markdown conversion costs 1 credit. The free tier is enough to build and test your integration. Pro pricing scales linearly with no tiers or overages.
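To make the linear pricing concrete, here is a small estimate helper built only from the numbers in the table. It assumes every Pro conversion is billed at $0.01; whether the Pro plan also includes the 100 free credits is not stated above, so the sketch does not subtract them. The function names are hypothetical.

```python
FREE_CREDITS_PER_MONTH = 100  # free tier, from the pricing table
PRO_CENTS_PER_CREDIT = 1      # $0.01 per credit, from the pricing table


def fits_free_tier(conversions_per_month: int) -> bool:
    """True if the monthly volume stays within the free credits (1 credit each)."""
    return conversions_per_month <= FREE_CREDITS_PER_MONTH


def pro_cost_dollars(conversions_per_month: int) -> float:
    """Estimated Pro bill in dollars for a month's conversions."""
    # Assumption: every conversion is billed; free credits are not deducted.
    return conversions_per_month * PRO_CENTS_PER_CREDIT / 100
```

For example, polling three pages every hour for a 30-day month is 3 × 24 × 30 = 2,160 conversions, which this estimate prices at $21.60.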

Get Started

Install the SDK in your language:

npm install @thedriveai/sdk
pip install thedriveai

Or skip the SDK entirely and use a GET request:

curl -H "X-API-Key: tda_live_..." \
  https://dev.thedrive.ai/md/https://example.com

Get your API key at dev.thedrive.ai and start converting URLs to markdown in under a minute.


Have questions or want to see a specific format supported? Reach out at contact@thedrive.ai.
