linkprobe logo

linkprobe

v1.2.1 Python 3.10+ MIT

Find broken links before your users do.

A CLI tool that recursively crawls your site, checks every URL in parallel, and exports a clean CSV report — with optional email alerts.

bash
$ python src/checker.py https://example.com --workers 20
[Crawling] https://example.com/
[Crawling] https://example.com/about/
[Crawling] https://example.com/blog/
[Crawling] https://example.com/contact/
5 pages crawled · 47 links found. Checking statuses…
[200] https://example.com/about/
[404] https://example.com/old-page/ ← broken
[301] https://example.com/redirect
[200] https://external.com/resource
[500] https://api.example.com/data ← broken
Results → scans/example.com/2026-04-18T19-54-54/results.csv
44 OK · 2 broken · 1 redirect

Everything you need

Built on the Python standard library. Zero dependencies for the core tool — just clone and scan.

Recursive crawl

BFS traversal from the start URL through all internal pages. Visited-URL tracking prevents infinite loops on any site structure.

External links checked

External URLs are verified for their HTTP status without being crawled — full coverage with no recursion noise.

Parallel requests

Thread pool with configurable workers (default: 10). Scans large sites fast without hammering your server.

CSV + Markdown output

Each scan writes a machine-readable CSV and a human-readable Markdown summary, organised by timestamp under scans/.

Status-code filtering

--keep-status-codes 404,500 trims the report to only the codes you care about — no noise, straight to the problems.

Email notifications

Receive a post-scan alert via the Resend API when broken links are found. Optional 3xx inclusion for redirect awareness.

How it works

A three-stage pipeline — crawl, check, report.

1

Crawl

BFS from the start URL. Internal links are followed; external links are collected but not recursed into. Fragments are stripped and URL normalisation prevents duplicates.

crawler.py → parser.py → normaliser.py
2

Check

Every discovered URL is status-checked in parallel. HEAD is attempted first; 405 responses fall back to GET. 3xx codes are recorded as-is — redirects are not followed.

fetcher.check_url() · ThreadPoolExecutor
3

Report

Results are sorted by (referrer, link) and written to CSV. A Markdown summary highlights all non-200 links. Optionally, a Resend email is dispatched.

reporter.py · emailer.py

Quick start

Python 3.10+ required. Core tool needs no packages.

1 — Clone

git clone https://github.com/JeremieLitzler/linkprobe.git
cd linkprobe

# Optional: Resend SDK (only needed for --notify-email)
pip install -r requirements.txt

2 — Run a scan

# Auto-saves to scans/[WEBSITE]/[TIMESTAMP]/
python src/checker.py https://example.com

# Custom output file, more workers
python src/checker.py https://example.com \
  --output report.csv \
  --workers 20

# Show only broken links
python src/checker.py https://example.com \
  --keep-status-codes 404,500

3 — Email alerts (optional)

export RESEND_API_KEY=re_xxxx
export RESEND_FROM_ADDRESS=scanner@yourdomain.com

python src/checker.py https://example.com \
  --notify-email you@example.com

CLI options

All flags are optional except start_url.

Option Default Description
start_url required The URL to crawl from — must use http or https
--output, -o scans/[site]/[ts]/ Path to the output CSV file
--workers, -w 10 Number of parallel threads for status checking
--timeout, -t 10 Per-request timeout in seconds
--user-agent linkprobe/1.0 User-Agent header sent with every request
--keep-status-codes all Comma-separated codes to retain in the report, e.g. 404,500
--notify-email off Recipient address for a post-scan email (requires Resend env vars)
--include-3xx-status-code off Include 3xx redirects in email notifications

Output format

Results sorted by referrer then link — broken links are immediately actionable.

results.csv

link,referrer,http_status_code
https://example.com/about/,https://example.com/,404
https://example.com/blog/,https://example.com/,200
https://example.com/contact/,https://example.com/,200
https://external.com/missing,https://example.com/contact/,404
https://external.com/page,https://example.com/contact/,200
link

The URL that was checked

referrer

The page containing the link — tells you exactly where to fix it

http_status_code

HTTP code or ERROR:<ExceptionName> for network failures

Email notifications

Get a summary email whenever broken links are detected. Powered by Resend — set two environment variables and you're done.

Variable Description
RESEND_API_KEY API key from your Resend account
RESEND_FROM_ADDRESS Verified sender address in Resend
export RESEND_API_KEY=re_xxxx
export RESEND_FROM_ADDRESS=scanner@yourdomain.com

python src/checker.py https://example.com \
  --notify-email you@example.com \
  --include-3xx-status-code

Need help setting it up?

Contact Me