mwicat/html2xml
This script can crunch through very dirty HTML documents. It first passes the document to the WebKit layout engine and then runs the result through HTML Tidy. It reads HTML on standard input and writes XML to standard output. Usage: `python html2xml.py < document.html > document.xml`
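
As a rough illustration of the second stage only, the sketch below pipes an HTML string through the `tidy` command-line tool. It is not the code from `html2xml.py`: the function names `build_tidy_command`, `tidy_to_xml`, and `main` are hypothetical, the choice of flags is an assumption, and the WebKit pass is left out because it depends on whichever WebKit binding is installed.

```python
import shutil
import subprocess
import sys


def build_tidy_command():
    """Return an argument list for the HTML Tidy command-line tool.

    The flag selection is an assumption made for this sketch, not
    necessarily what html2xml.py itself passes to Tidy.
    """
    return [
        "tidy",
        "-q",                     # suppress non-essential output
        "-asxml",                 # emit well-formed XHTML
        "-utf8",                  # treat input and output as UTF-8
        "--force-output", "yes",  # write output even if errors are reported
    ]


def tidy_to_xml(html):
    """Pipe an HTML string through tidy and return the cleaned markup."""
    if shutil.which("tidy") is None:
        raise RuntimeError("the 'tidy' executable was not found on PATH")
    result = subprocess.run(
        build_tidy_command(),
        input=html.encode("utf-8"),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    # tidy exits with 1 for warnings and 2 for errors, so a non-zero
    # status alone does not mean that no output was produced.
    return result.stdout.decode("utf-8", errors="replace")


def main():
    """Read HTML from stdin and write the tidied markup to stdout."""
    sys.stdout.write(tidy_to_xml(sys.stdin.read()))
```

Calling `main()` from a script would give the same stdin-to-stdout shape as the usage line above, minus the WebKit pass, which is what lets the original script cope with markup that Tidy alone cannot repair.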