html2rss is a Ruby gem that generates RSS 2.0 feeds from websites by scraping HTML or JSON content with CSS selectors or auto-detection.
This gem is the core of the html2rss-web application.
Most people looking for a first working feed should start with html2rss-web, run it with Docker, and open one of the included feeds from their own instance before moving to custom configs or the gem APIs.
Detailed usage guides, reference docs, and the feed directory live on the project website:
- Ruby gem documentation
- Web application
- Feed directory
- Contributing guide
- GitHub Discussions
- Sponsor on GitHub
You can develop html2rss directly in your browser using GitHub Codespaces.
The Codespace comes pre-configured with Ruby 3.4 (compatible with Ruby 4.0), all dependencies, and VS Code extensions ready to go!
Please see the contributing guide for details on how to contribute.
- Config - Loads and validates configuration (YAML/hash)
- RequestService - Fetches pages using Faraday or Browserless
- Selectors - Extracts content via CSS selectors with extractors/post-processors
- AutoSource - Auto-detects content using Schema.org, JSON state blobs, semantic HTML, and structural patterns
- RssBuilder - Assembles Article objects and renders RSS 2.0
Config -> Request -> Extraction -> Processing -> Building -> Output
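The flow above can be pictured with a small, standard-library-only sketch. This is not html2rss code: every name here (`xml_escape`, `request`, `extract`, `process`, `build`) is hypothetical, the HTTP fetch is replaced by a canned string, and a regular expression stands in for CSS-selector extraction. It only illustrates how data might move from one stage to the next.

```ruby
require 'uri'

# Illustrative only: none of these names come from html2rss.
XML_ESCAPES = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;' }.freeze

# Escape text so it is safe inside an XML element.
def xml_escape(text)
  text.to_s.gsub(/[&<>"]/, XML_ESCAPES)
end

# Config: a plain hash standing in for a loaded and validated configuration.
config = { url: 'https://example.com/blog', title: 'Example blog', description: 'Demo feed' }

# Request: a canned response replaces the real page fetch.
request = lambda do |_url|
  '<article><a href="/a">First & foremost</a></article>' \
    '<article><a href="/b">Second</a></article>'
end

# Extraction: a regex stands in for CSS-selector matching.
extract = lambda do |html|
  html.scan(%r{<a href="([^"]+)">([^<]+)</a>}).map { |href, text| { link: href, title: text } }
end

# Processing: resolve relative links against the channel URL.
process = lambda do |items, base_url|
  items.map { |item| item.merge(link: URI.join(base_url, item[:link]).to_s) }
end

# Building: render a minimal RSS 2.0 document as a string.
build = lambda do |channel, items|
  item_xml = items.map do |item|
    "    <item><title>#{xml_escape(item[:title])}</title><link>#{xml_escape(item[:link])}</link></item>"
  end.join("\n")

  <<~XML
    <?xml version="1.0" encoding="UTF-8"?>
    <rss version="2.0">
      <channel>
        <title>#{xml_escape(channel[:title])}</title>
        <link>#{xml_escape(channel[:url])}</link>
        <description>#{xml_escape(channel[:description])}</description>
    #{item_xml}
      </channel>
    </rss>
  XML
end

# Output: run the stages in order and print the feed.
html  = request.call(config[:url])
items = process.call(extract.call(html), config[:url])
feed  = build.call(config, items)
puts feed
```

Keeping each stage as a separate callable mirrors the separation of responsibilities listed above, so one stage can be swapped (for example, a different fetcher) without touching the others.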
The config schema is generated from the runtime dry-validation contracts and exported for client-side tooling.
- Ruby API: Html2rss::Config.json_schema
- CLI: html2rss schema
- CLI options: html2rss schema --write tmp/html2rss-config.schema.json and html2rss schema --no-pretty
- Runtime validation API: Html2rss::Config.validate(config_hash)
- Runtime validation CLI: html2rss validate config.yml
- Packaged JSON file: schema/html2rss-config.schema.json
If you are an editor integration, automation script, or AI tool, prefer these stable discovery points:
- call html2rss schema to read the current exported schema
- read schema/html2rss-config.schema.json when working from the repository or installed gem
- use Html2rss::Config.schema_path if you already have Ruby loaded
- use Html2rss::Config.validate or html2rss validate config.yml when you need authoritative runtime validation of selector references
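As a rough illustration of the second discovery point, a tool could read the packaged schema file with Ruby's standard JSON library. The helper name `summarize_schema` is hypothetical, and the sketch assumes the exported document is an object schema using the usual JSON Schema keywords `type`, `properties`, and `required`; check the actual file before relying on that.

```ruby
require 'json'

# Hypothetical helper (not part of html2rss): parse an exported schema file
# and summarise the keywords a tool would typically inspect first.
def summarize_schema(path)
  schema = JSON.parse(File.read(path))
  {
    type: schema['type'],
    properties: schema.fetch('properties', {}).keys.sort,
    required: Array(schema['required']).sort
  }
end

# Path as given in this README, relative to the repository root.
schema_path = 'schema/html2rss-config.schema.json'

if File.exist?(schema_path)
  puts summarize_schema(schema_path)
else
  warn "#{schema_path} not found; export it with `html2rss schema --write <path>`"
end
```

If Ruby and the gem are already loaded, the value of Html2rss::Config.schema_path could presumably be passed to the helper instead of the hard-coded relative path.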
Run bundle exec rake config:schema before committing to regenerate schema/html2rss-config.schema.json and keep the checked-in JSON Schema in sync with the validators. The exported schema covers client-side validation, while runtime validation remains authoritative for dynamic cross-field checks such as selector-key references.
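To make the distinction concrete, here is a sketch of the kind of cross-field rule that is awkward to express in a static JSON Schema. The config shape (a selectors hash in which `guid` lists the names of other selector keys) and the helper `dangling_selector_references` are assumptions for illustration only, not the gem's contract. For authoritative results, use Html2rss::Config.validate or html2rss validate config.yml.

```ruby
# Hypothetical check, not html2rss code: report selector names that are
# referenced by another key but never defined in the same hash.
def dangling_selector_references(selectors, referencing_keys: %w[guid])
  referencing_keys.flat_map do |key|
    Array(selectors[key])
      .reject { |name| selectors.key?(name) }
      .map { |name| "#{key} references undefined selector '#{name}'" }
  end
end

# Assumed shape for illustration: 'guid' names other selector keys.
selectors = {
  'title' => { 'selector' => 'h2' },
  'guid' => %w[title url] # 'url' is referenced but not defined
}

puts dangling_selector_references(selectors)
```

The rule depends on which keys happen to exist in the same document, which is why a check like this belongs to runtime validation rather than to the exported schema.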
This project is licensed under the MIT License - see the LICENSE file for details.
If you find html2rss useful, please consider sponsoring the project.

