clemmy/proxyfarm


Proxy Farm

proxyfarm is a Node.js script that scrapes proxy lists from websites without depending on each site's underlying HTML structure. This allows proxy lists to be harvested from a large number of sources without implementing custom scraping logic for each one. It does this by using a PhantomJS driver along with the JavaScript Selection API, which yields the page's text with all HTML tags stripped away and makes regex matching trivial. The resulting proxy lists can be used with tools like scrapy-proxies to bypass IP restrictions and improve web crawling speed.
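The approach described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names `selectedPageText` and `extractProxies` and the regular expression are assumptions made for this example. The first function is meant to run inside the page (for instance through a headless browser driver), and the second runs on the plain text it returns.

```javascript
// Sketch only: names and the pattern below are illustrative assumptions,
// not taken from the proxyfarm source.

// Intended to run in a browser page context. Selects the whole body with the
// Selection API and returns the selection as plain text (no HTML tags).
function selectedPageText(win) {
  const selection = win.getSelection();
  selection.selectAllChildren(win.document.body);
  return selection.toString();
}

// An IPv4-looking host followed by a port, separated by a colon or by
// whitespace (table cells usually turn into tabs or newlines in the text).
// The whitespace form can produce false positives on arbitrary prose.
const PROXY_PATTERN = /\b(\d{1,3}(?:\.\d{1,3}){3})(?::|\s+)(\d{2,5})\b/;

// Returns unique "<host>:<port>" strings found in the given plain text.
function extractProxies(text) {
  const re = new RegExp(PROXY_PATTERN.source, 'g'); // fresh state per call
  const found = new Set();
  let match;
  while ((match = re.exec(text)) !== null) {
    const host = match[1];
    const port = Number(match[2]);
    const octetsOk = host.split('.').every((octet) => Number(octet) <= 255);
    const portOk = port >= 1 && port <= 65535;
    if (octetsOk && portOk) {
      found.add(`${host}:${port}`);
    }
  }
  return Array.from(found);
}
```

Because the input is plain text rather than markup, the same pattern works whether a source lays its proxies out in a table, a list, or a preformatted block.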

Getting Started

Clone the repository, run npm install, and then run node --harmony proxyfarm --in sources.txt --out proxies.txt

NPM module coming soon!

Arguments

Parameter   Description
--in        A text file of newline-delimited URLs to scrape proxies from. See defaults/sources.txt for an example.
--out       The path to save the scraped proxy list to, in the format <host>:<port>.
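To make the two file formats concrete, here is a small sketch of how the arguments and files could be handled. The helper names (`parseArgs`, `parseSources`, `formatProxies`, `main`) are hypothetical and not taken from proxyfarm, and writing one proxy per line is an assumption; the README only specifies the <host>:<port> format.

```javascript
// Sketch only: helper names are hypothetical, not proxyfarm's actual code.

// Reads flags of the form `--name value` into a plain object.
function parseArgs(argv) {
  const args = {};
  for (let i = 0; i < argv.length - 1; i += 1) {
    if (argv[i].startsWith('--')) {
      args[argv[i].slice(2)] = argv[i + 1];
      i += 1; // skip the value that was just consumed
    }
  }
  return args;
}

// Splits a newline-delimited sources file into trimmed, non-empty URLs.
function parseSources(contents) {
  return contents
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
}

// Serializes proxies as <host>:<port>, one per line (an assumption).
function formatProxies(proxies) {
  return proxies.map((p) => `${p.host}:${p.port}`).join('\n') + '\n';
}

// Example wiring; `scrape` stands in for whatever collects the proxies.
function main(argv, scrape) {
  const fs = require('fs');
  const args = parseArgs(argv);
  const urls = parseSources(fs.readFileSync(args.in, 'utf8'));
  fs.writeFileSync(args.out, formatProxies(scrape(urls)));
}
```

A call such as main(process.argv.slice(2), scrape) would then read the file given by --in and write the file given by --out.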

Prerequisites

  • Node.js v6.x or later

Running the tests

Coming soon!

Contributing

There are many ways that you can contribute:

  • Improving documentation - Submit a pull request with the fixes.
  • Requesting a feature - Create a new issue describing the feature.
  • Suggesting a proxy list source - Create a new issue mentioning the new source.
  • Reporting a bug - Found a problem? Create an issue with your environment, a screenshot of the error, and reproduction steps.
  • Fixing a bug - All help is appreciated!

Future Roadmap

  • Validating the scraped proxy list
  • Detecting the anonymity, speed, and country of each proxy
  • Automatically crawling websites rather than manually specifying every proxy list
  • Handling AJAX pages

License

This project is licensed under the MIT License. See the LICENSE.md file for details.

About

Command line utility that intelligently scrapes proxy lists from various sources.
