Batch-download pages from MediaWiki sites (all pages, or only the pages of a category) as printable PDFs.
pip install mwpdfify
...or clone the repo and run pip install .
...or download src/mwpdfify.py and run it directly
There are two PDF rendering backends to choose from: pdfkit (installed as a dependency by default) or weasyprint. Use pip install -r requirements.txt to install both, or install the one you want yourself. If you use pdfkit, remember to also install wkhtmltopdf on your system.
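As a rough illustration of what choosing a backend amounts to, the sketch below picks a renderer from a boolean flag. The names pick_renderer, render_with_pdfkit and render_with_weasyprint are hypothetical and not taken from mwpdfify; only pdfkit.from_url and weasyprint's HTML(...).write_pdf are the libraries' own calls. Imports are deferred so that only the selected backend needs to be installed.

```python
from typing import Callable


def render_with_pdfkit(url: str, out_path: str) -> None:
    # pdfkit drives the wkhtmltopdf binary, which must be installed separately.
    import pdfkit

    pdfkit.from_url(url, out_path)


def render_with_weasyprint(url: str, out_path: str) -> None:
    # weasyprint renders the page itself and needs no external binary.
    from weasyprint import HTML

    HTML(url=url).write_pdf(out_path)


def pick_renderer(use_weasyprint: bool = False) -> Callable[[str, str], None]:
    """Return the rendering function for the chosen backend (hypothetical helper)."""
    return render_with_weasyprint if use_weasyprint else render_with_pdfkit
```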
- Get the address of the root of your wiki, where its api.php and index.php reside. Typically it's identical to the site's root (/). For Wikipedia it's at /w/; tell me if there are other exceptions ;)
- (optional) If you want only a specific category, get its title (in the form of Category:XXX)
- Run the script, e.g.:
  - mwpdfify https://lycoris-recoil.fandom.com - Download all pages (as in Special:AllPages) from Lycoris Recoil Fandom Wiki as PDF
  - mwpdfify wiki.archlinux.org -c Category:Installation_process - Download all pages under Category:Installation_process from ArchWiki as PDF
  - mwpdfify https://en.wikipedia.org/w/ -c Category:Guangzhou_Metro_stations -l 10 -t 4 - Download all pages under Category:Guangzhou_Metro_stations (except subcategories) from Wikipedia, with 4 download threads and a one-time query limit of 10
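To make the role of the wiki root more concrete, here is a minimal sketch of how a listing request against api.php could be assembled from it. The function list_query_url is hypothetical and is not mwpdfify's actual code; the query parameters (list=allpages with aplimit, list=categorymembers with cmtitle and cmlimit) are standard MediaWiki Action API parameters. Treating a limit of 0 as "max" is an assumption based on the help text.

```python
from typing import Optional
from urllib.parse import urlencode


def list_query_url(root: str, category: Optional[str] = None, limit: int = 0) -> str:
    """Build an api.php URL listing all pages, or the members of one category."""
    if not root.endswith("/"):
        root += "/"
    # Assumption: a limit of 0 means "as many as the wiki allows", sent as "max".
    per_request = "max" if limit == 0 else str(limit)
    params = {"action": "query", "format": "json"}
    if category is None:
        params.update({"list": "allpages", "aplimit": per_request})
    else:
        params.update(
            {"list": "categorymembers", "cmtitle": category, "cmlimit": per_request}
        )
    return root + "api.php?" + urlencode(params)


if __name__ == "__main__":
    print(list_query_url("https://en.wikipedia.org/w/",
                         "Category:Guangzhou_Metro_stations", limit=10))
```

With a root such as https://en.wikipedia.org/w/ the request goes to /w/api.php, which is why the root has to point at the directory that holds api.php rather than at the article path.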
The downloaded PDFs should be available in a folder named after the site's domain name in the current directory.
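One way to derive such a folder name from the URL argument is sketched below. The helper output_dir_for is hypothetical, and defaulting to https when no scheme is given (as in the wiki.archlinux.org example) is an assumption, not confirmed behaviour of the tool.

```python
from pathlib import Path
from urllib.parse import urlsplit


def output_dir_for(url: str, base: Path = Path(".")) -> Path:
    """Return base/<domain> for the given wiki URL (hypothetical helper)."""
    if "://" not in url:
        # Assumption: bare host names are treated as HTTPS URLs.
        url = "https://" + url
    host = urlsplit(url).hostname
    if not host:
        raise ValueError(f"cannot determine a domain name from {url!r}")
    return base / host
```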
See below for other parameters:
usage: mwpdfify [-h] [-c CATEGORY] [-p] [-t THREADS] [-l LIMIT] [-w] url
positional arguments:
url site root of destination site
options:
-h, --help show this help message and exit
-c CATEGORY, --category CATEGORY
Download only a specified category
-p, --no-printable Force normal instead of printable version of pages
-t THREADS, --threads THREADS
Number of download threads, defaults to 8
-l LIMIT, --limit LIMIT
Limit of JSON info returned at once, defaults to maximum
(0)
-w, --use-weasyprint Use weasyprint as PDF rendering backend
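For readers who want to script around the tool, the sketch below declares an argparse parser that accepts the same arguments as the help text above. It is reconstructed from that help text alone and is not the project's actual source; the function name build_parser is hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare a parser mirroring the documented options (reconstruction, not the original)."""
    parser = argparse.ArgumentParser(prog="mwpdfify")
    parser.add_argument("url", help="site root of destination site")
    parser.add_argument("-c", "--category",
                        help="Download only a specified category")
    parser.add_argument("-p", "--no-printable", action="store_true",
                        help="Force normal instead of printable version of pages")
    parser.add_argument("-t", "--threads", type=int, default=8,
                        help="Number of download threads, defaults to 8")
    parser.add_argument("-l", "--limit", type=int, default=0,
                        help="Limit of JSON info returned at once, defaults to maximum (0)")
    parser.add_argument("-w", "--use-weasyprint", action="store_true",
                        help="Use weasyprint as PDF rendering backend")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```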
- &printable=yes is deprecated in recent versions of MediaWiki (and no substitute API is provided), so there may be layout issues with certain wikis, especially Fandom wikis, which also contain ads.
- Recursively downloading pages from subcategories of a category is currently not supported.
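For context, the printable version of a page is requested by adding printable=yes to an index.php URL. The sketch below shows how such a URL could be built with and without that parameter, which is roughly the difference the -p flag describes. The helper page_url is hypothetical and not taken from mwpdfify.

```python
from urllib.parse import urlencode


def page_url(root: str, title: str, printable: bool = True) -> str:
    """Build an index.php URL for one page, optionally in printable form."""
    if not root.endswith("/"):
        root += "/"
    params = {"title": title}
    if printable:
        # Deprecated in recent MediaWiki versions; layout may differ between wikis.
        params["printable"] = "yes"
    return root + "index.php?" + urlencode(params)
```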
- v1.1.2 (2022/09/30):
- Set pdfkit as required dependency
- v1.1 (2022/09/04):
- Changed address handling logic
- Bug fixes
- v1.0 (2022/09/03):
- Initial release
LGPLv3