sine-io/web-clipper

Web Clipper

Save the main content of a web page to local Markdown, with optional image downloading for offline reading and long-term note taking.

Features

  • Clip a single URL or multiple URLs
  • Batch import URLs from a text file (one URL per line; blank lines and # comments are allowed)
  • Extract the main content and convert it to Markdown (works on static HTML; see Known limitations for JS-rendered pages)
  • Optionally download images and rewrite them as relative paths
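
The image rewriting step can be pictured with a small sketch. This is an illustration under stated assumptions, not the tool's actual code: the function name `rewrite_image_links`, the `local_paths` mapping, and the file name `assets/img_001.png` are hypothetical, and only plain `![alt](url)` links without a title are handled.

```python
import re

# Matches plain Markdown images of the form ![alt](url); links that carry
# a title, e.g. ![alt](url "title"), do not match and are left untouched.
IMAGE_PATTERN = re.compile(r"!\[([^\]]*)\]\(([^)\s]+)\)")


def rewrite_image_links(markdown, local_paths):
    """Replace remote image URLs with relative paths to downloaded copies.

    local_paths (hypothetical) maps a remote URL to the relative path of
    its local copy, for example "assets/img_001.png".
    """
    def replace(match):
        alt, url = match.group(1), match.group(2)
        # Keep the original URL when no local copy exists for it.
        return f"![{alt}]({local_paths.get(url, url)})"

    return IMAGE_PATTERN.sub(replace, markdown)
```

Images that failed to download keep their remote URL, so the Markdown stays usable even when only some assets are available offline.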

Installation

python -m pip install -r requirements-web-clipper.txt

Usage

Single URL:

python web_clipper.py "https://example.com/"

Multiple URLs:

python web_clipper.py "https://example.com/" "https://www.python.org/"

Batch import:

python web_clipper.py --input urls.txt
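
The file format described under Features (one URL per line, blank lines and # comments allowed) could be read roughly as follows. This is a minimal sketch; the helper name `read_url_list` is hypothetical, and it assumes a comment occupies a whole line, since a `#` inside a URL may be a legitimate fragment.

```python
from pathlib import Path


def read_url_list(path):
    """Return the URLs listed in a text file, one per line."""
    urls = []
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines and whole-line comments starting with "#".
        if not line or line.startswith("#"):
            continue
        urls.append(line)
    return urls
```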

Output layout

By default, outputs go to clippings/, one directory per page:

clippings/
  <title__hash>/
    index.md
    assets/
      img_...
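
One way a per-page directory name of the form `<title__hash>` might be derived is sketched below. The README does not say how the title is sanitized or which hash is used, so everything here is an assumption: the helper name `page_dir_name`, the slug rules, and the choice of a truncated SHA-1 of the URL.

```python
import hashlib
import re


def page_dir_name(title, url, max_len=60):
    """Build a filesystem-friendly directory name from a title and URL."""
    # Collapse anything that is not a word character or hyphen into "-".
    slug = re.sub(r"[^\w-]+", "-", title).lower()[:max_len].strip("-")
    if not slug:
        slug = "untitled"
    # Assumed scheme: a short, stable digest of the URL keeps pages with
    # identical titles in separate directories.
    digest = hashlib.sha1(url.encode("utf-8")).hexdigest()[:8]
    return f"{slug}__{digest}"
```

Deriving the suffix from the URL rather than from the clip time means clipping the same page twice maps to the same directory name under this scheme.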

Common options

  • --out-dir <dir>: output directory (default: clippings)
  • --no-images: do not download images (images are downloaded by default)
  • --timeout <sec>: request timeout in seconds (default: 25)
  • --fail-fast: stop on the first error (by default, processing continues with the remaining URLs)
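
For readers who want to see how these options fit together, here is a sketch of an equivalent `argparse` setup. It mirrors the flags and defaults documented above but is not taken from `web_clipper.py`; the function name `build_parser` and the help strings are assumptions.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the documented command-line options."""
    parser = argparse.ArgumentParser(prog="web_clipper.py")
    parser.add_argument("urls", nargs="*", help="one or more URLs to clip")
    parser.add_argument("--input", help="text file with one URL per line")
    parser.add_argument("--out-dir", default="clippings", help="output directory")
    parser.add_argument("--no-images", action="store_true", help="do not download images")
    parser.add_argument("--timeout", type=float, default=25, help="request timeout in seconds")
    parser.add_argument("--fail-fast", action="store_true", help="stop on the first error")
    return parser


# Example: batch import without image downloads; other options keep their defaults.
args = build_parser().parse_args(["--input", "urls.txt", "--no-images"])
print(args.input, args.out_dir, args.timeout, args.no_images, args.fail_fast)
```

Both opt-out flags use `store_true`, so omitting them leaves image downloading on and fail-fast off, which matches the defaults listed above.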

Known limitations

  • Limited support for heavily JS-rendered pages (this tool focuses on static HTML extraction)
  • Anti-bot protections / auth walls / paywalls may prevent full content extraction

License

Licensed under GNU AGPLv3. See LICENSE.
