Web-based frontend application for the E-commerce Site Scraper & Analyzer. This is a user-friendly interface that connects to the scraper-backend API service to perform web scraping, product detection, category analysis, and content extraction from e-commerce websites.
This frontend application provides a modern, interactive web interface for:
- Taking screenshots of web pages
- Extracting links from websites
- Detecting products on e-commerce sites
- Analyzing category structures
- Extracting full page content
- Configuring Puppeteer scraping options
┌─────────────────────────────────┐
│ Frontend (this directory) │
│ - index.html │
│ - app.js │
│ - style.css │
└──────────────┬──────────────────┘
│ HTTP/HTTPS
│ API calls
▼
┌─────────────────────────────────┐
│ scraper-backend API │
│ agents-api.humaticai.com:3100 │
│ (Node.js service) │
└─────────────────────────────────┘
- Screenshot Capture: Take screenshots of web pages with configurable viewport settings
- Link Extraction: Extract and analyze all links from a webpage
- Product Detection: Automatically detect products on e-commerce sites
- Category Analysis: Analyze and extract category structures from e-commerce sites
- Full Page Content: Extract complete page content including HTML, text, and metadata
- Modern, responsive design
- Real-time status updates
- JSON result viewer with syntax highlighting
- Configurable Puppeteer options
- URL autocomplete/combobox
- Per-domain option persistence
The frontend connects to the scraper-backend service. The API endpoint is configured in app.js:
```javascript
const PROXY_BASE = (window.location.protocol === 'https:' ? 'https://' : 'http://') + 'agents-api.humaticai.com:3100';
```

To change the backend URL, modify this constant in `app.js`.
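As an illustration of how a base URL like this can be combined with an endpoint path and query parameters, here is a minimal sketch (the `buildUrl` helper is hypothetical and not part of `app.js`):

```javascript
// Hypothetical helper (not in app.js) showing how an endpoint path and
// query parameters could be combined with the configured base URL.
const PROXY_BASE = 'http://agents-api.humaticai.com:3100';

function buildUrl(endpoint, params = {}) {
  // URL resolves the endpoint path against the base origin.
  const url = new URL(endpoint, PROXY_BASE);
  for (const [key, value] of Object.entries(params)) {
    url.searchParams.set(key, String(value));
  }
  return url.toString();
}
```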
The application supports various Puppeteer configuration options:
- Viewport dimensions (width, height)
- Wait strategies (waitUntil, waitAfter)
- Scroll behavior
- Language settings
- Initial wait times
- Asset handling (inline vs external)
Options can be configured per-domain and are persisted in browser localStorage.
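A minimal sketch of how per-domain persistence could work. The storage key format (`scraperOptions:<hostname>`) and function names are assumptions for illustration; `app.js` may use different names:

```javascript
// Illustrative per-domain option persistence. In the browser, `storage`
// defaults to localStorage; the key naming scheme is an assumption.
function saveOptions(siteUrl, options, storage = globalThis.localStorage) {
  const domain = new URL(siteUrl).hostname;
  storage.setItem(`scraperOptions:${domain}`, JSON.stringify(options));
}

function loadOptions(siteUrl, storage = globalThis.localStorage) {
  const domain = new URL(siteUrl).hostname;
  const raw = storage.getItem(`scraperOptions:${domain}`);
  return raw ? JSON.parse(raw) : null;
}
```

Because the key is derived from the hostname, options saved on one page of a site are reused on every other page of that site.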
scraper/
├── index.html # Main HTML interface
├── app.js # Frontend JavaScript application
├── style.css # Main stylesheet
├── styles/ # Additional stylesheets
├── backend/ # Backend components
│ ├── server.js # Backend server
│ ├── Harvester.js # Harvesting logic
│ ├── package.json # Backend dependencies
│ └── ...
├── scraper-backend/ # Scraper backend instance
│ ├── server.js # Scraper server
│ ├── lib/ # Core modules
│ └── ...
└── README.md # This file
- Deploy the files to a web server (Apache, Nginx, etc.)
- Ensure the scraper-backend service is running on port 3100
- Open `index.html` in a web browser or access it via the web server
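Before opening the UI, you can check that the backend is reachable. This sketch uses a simple GET probe; the probed path and response handling are illustrative assumptions, not the documented API contract:

```javascript
// Illustrative reachability probe for the backend service. Returns true
// only if the service answers with a successful HTTP status.
async function backendIsUp(base = 'http://localhost:3100', fetchImpl = fetch) {
  try {
    const response = await fetchImpl(base + '/');
    return response.ok;
  } catch {
    return false; // network error: service not reachable
  }
}
```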
- Enter Site URL: Input the e-commerce site URL you want to analyze
- Optional Search Phrase: Add a search phrase if needed
- Configure Options: Adjust Puppeteer settings as needed
- Select Action: Choose what you want to extract:
- Screenshot
- Extract Links
- Detect Products
- Detect Categories
- Get Full Page
- View Results: Results are displayed in JSON format with syntax highlighting
The frontend communicates with the scraper-backend API using the following endpoints:
- `/content` - Extract page content
- `/screenshot` - Capture page screenshots
- `/links` - Extract links from page
- `/products` - Detect products
- `/categories` - Analyze categories
- `/fullpage` - Get full page content
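A hedged sketch of calling one of these endpoints from the frontend. The HTTP method, body shape (`{ url }`), and response format are assumptions; consult `app.js` for the exact contract each endpoint expects:

```javascript
// Illustrative client call for the /products endpoint. Request and
// response shapes are assumptions, not the verified API contract.
const PROXY_BASE = 'http://agents-api.humaticai.com:3100';

async function detectProducts(siteUrl, fetchImpl = fetch) {
  const response = await fetchImpl(`${PROXY_BASE}/products`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ url: siteUrl }),
  });
  if (!response.ok) {
    throw new Error(`Backend error: HTTP ${response.status}`);
  }
  return response.json();
}
```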
- No build process required - pure HTML/CSS/JavaScript
- Uses Highlight.js for JSON syntax highlighting (CDN)
- Google Fonts (Inter) for typography
See backend/package.json and scraper-backend/package.json for backend dependencies.
- Modern browsers (Chrome, Firefox, Safari, Edge)
- Requires JavaScript enabled
- HTTPS recommended for production (to avoid mixed-content issues)
- Ensure scraper-backend is running locally on port 3100
- Update `PROXY_BASE` in `app.js` to point to your local backend: `const PROXY_BASE = 'http://localhost:3100';`
- Serve the files using a local web server or open directly in browser
- Styling: Edit `style.css` or files in `styles/`
- Functionality: Edit `app.js`
- Layout: Edit `index.html`
- scraper-backend: The backend API service that powers this frontend
- Location: `/home/bitnami/scraper-backend`
- GitHub: https://github.com/humatic-ai/scraper-backend
Proprietary - All rights reserved. See LICENSE file for details.
© Humatic AI. All rights reserved.