roostico/scooby

PPS-22-Scooby 🔍

Team:

👨‍💻 Giovanni Antonioni - giovanni.antonioni2@studio.unibo.it

👨‍💻 Valerio Di Zio - valerio.dizio@studio.unibo.it

👨‍💻 Francesco Magnani - francesco.magnani14@studio.unibo.it

👨‍💻 Luca Rubboli - luca.rubboli2@studio.unibo.it

Technologies:

🔄 Scrum

🛠 SBT

🔗 Git

🎯 YouTrack

🚀 GitHub Actions

Overview:

PPS-22-Scooby is a web scraping and crawling application. It enables users to extract data from web pages by crawling through links and scraping specific content according to predefined rules.

Features:

🕷 Crawling: The application navigates web pages, follows links, and retrieves content.

🔍 Scraping: Relevant data is extracted from HTML/XML pages using XPath, CSS selectors, or regular expressions.

🛠 Customization: Users can define custom scraping and crawling rules to suit their specific needs.

⚙️ Parallel Processing: Aspects of parallel programming are integrated for efficient execution.

📤 Export: Users can export extracted data in various formats according to their preferences.
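As a rough illustration of the crawling and scraping features listed above, here is a minimal sketch that uses only the Scala standard library. It does not use Scooby's own DSL or API: the object name `ScrapeSketch`, the two regex rules, and the sample page are all hypothetical, and regular expressions are just one of the three extraction options mentioned.

```scala
import scala.util.matching.Regex

// Hypothetical sketch, not Scooby's API: regex-based link discovery and scraping.
object ScrapeSketch {
  // Assumed crawling rule: capture the href value of every anchor tag.
  val LinkPattern: Regex = """<a\s[^>]*href=["']([^"']+)["']""".r

  // Assumed scraping rule: capture the text of every <h2> heading.
  val HeadingPattern: Regex = """<h2[^>]*>([^<]+)</h2>""".r

  // Crawling step: collect the links a crawler would visit next.
  def extractLinks(html: String): List[String] =
    LinkPattern.findAllMatchIn(html).map(_.group(1)).toList

  // Scraping step: apply a rule and keep the first capture group of each match.
  def scrape(html: String, rule: Regex): List[String] =
    rule.findAllMatchIn(html).map(_.group(1).trim).toList

  def main(args: Array[String]): Unit = {
    val page =
      """<html><body>
        |<h2>First post</h2><a href="/posts/1">read</a>
        |<h2>Second post</h2><a class="more" href="/posts/2">read</a>
        |</body></html>""".stripMargin

    println(extractLinks(page))           // List(/posts/1, /posts/2)
    println(scrape(page, HeadingPattern)) // List(First post, Second post)
  }
}
```

Regular expressions are fragile on real-world HTML, so XPath or CSS selectors are usually a better fit for anything beyond simple patterns.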

Implementation:

PPS-22-Scooby is built in Scala and uses actor libraries to manage concurrency. The project uses Git for version control, YouTrack for project management, and GitHub Actions for continuous integration.
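To give a feel for concurrent scraping without pulling in an actor library, the sketch below swaps in standard-library `Future`s as a simpler stand-in. It is not how Scooby itself is implemented. The in-memory `pages` map replaces real network access, and every name here (`ParallelSketch`, `scrapeTitle`, `scrapeAll`) is hypothetical.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Hypothetical sketch: Futures stand in for the actor-based concurrency the project uses.
object ParallelSketch {
  // Assumed in-memory "web" so the example runs without network access.
  private val pages: Map[String, String] = Map(
    "/a" -> "<title>A</title>",
    "/b" -> "<title>B</title>"
  )

  private val Title = """<title>([^<]*)</title>""".r

  // Scrape one page asynchronously; None if the page or the title is missing.
  def scrapeTitle(url: String): Future[Option[String]] = Future {
    pages.get(url).flatMap(html => Title.findFirstMatchIn(html).map(_.group(1)))
  }

  // Start all scrapes concurrently and gather the results in input order.
  def scrapeAll(urls: List[String]): Future[List[(String, Option[String])]] =
    Future.traverse(urls)(url => scrapeTitle(url).map(title => url -> title))

  def main(args: Array[String]): Unit = {
    val results = Await.result(scrapeAll(List("/a", "/b", "/missing")), 5.seconds)
    results.foreach(println)
  }
}
```

Actors add message-based coordination and supervision on top of this kind of task parallelism, which matters once crawlers need to share state such as the set of visited links.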

Get Started:

To start using PPS-22-Scooby, see the Get Started section at https://pps-22-scooby.github.io/

About

A Scala application that crawls and scrapes the web pages given as input, following rules supplied through a DSL.

Languages

  • Scala 92.9%
  • Gherkin 5.7%
  • HTML 1.4%