E-commerce Site Scraper & Analyzer - Frontend

Web-based frontend for the E-commerce Site Scraper & Analyzer: a user-friendly interface that connects to the scraper-backend API service to perform web scraping, product detection, category analysis, and content extraction on e-commerce websites.

Overview

This frontend application provides a modern, interactive web interface for:

  • Taking screenshots of web pages
  • Extracting links from websites
  • Detecting products on e-commerce sites
  • Analyzing category structures
  • Extracting full page content
  • Configuring Puppeteer scraping options

Architecture

┌─────────────────────────────────┐
│   Frontend (this directory)     │
│   - index.html                  │
│   - app.js                      │
│   - style.css                   │
└──────────────┬──────────────────┘
               │ HTTP/HTTPS
               │ API calls
               ▼
┌─────────────────────────────────┐
│   scraper-backend API           │
│   agents-api.humaticai.com:3100 │
│   (Node.js service)             │
└─────────────────────────────────┘

Features

Core Functionality

  • Screenshot Capture: Take screenshots of web pages with configurable viewport settings
  • Link Extraction: Extract and analyze all links from a webpage
  • Product Detection: Automatically detect products on e-commerce sites
  • Category Analysis: Analyze and extract category structures from e-commerce sites
  • Full Page Content: Extract complete page content including HTML, text, and metadata

User Interface

  • Modern, responsive design
  • Real-time status updates
  • JSON result viewer with syntax highlighting
  • Configurable Puppeteer options
  • URL autocomplete/combobox
  • Per-domain option persistence

Configuration

Backend API Connection

The frontend connects to the scraper-backend service. The API endpoint is configured in app.js:

const PROXY_BASE = (window.location.protocol === 'https:' ? 'https://' : 'http://') + 'agents-api.humaticai.com:3100';

To change the backend URL, modify this constant in app.js.

Puppeteer Options

The application supports various Puppeteer configuration options:

  • Viewport dimensions (width, height)
  • Wait strategies (waitUntil, waitAfter)
  • Scroll behavior
  • Language settings
  • Initial wait times
  • Asset handling (inline vs external)

Options can be configured per-domain and are persisted in browser localStorage.
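A minimal sketch of how per-domain persistence can work with localStorage. The key scheme (`scraperOptions:<hostname>`) and option field names here are assumptions for illustration; the actual keys and fields live in app.js and may differ:

```javascript
// Hypothetical per-domain option persistence. The storage key prefix
// and option shape are illustrative, not the exact scheme in app.js.
const STORAGE_PREFIX = 'scraperOptions:';

function optionsKey(siteUrl) {
  // One options record per domain, keyed by hostname.
  return STORAGE_PREFIX + new URL(siteUrl).hostname;
}

function saveOptions(siteUrl, options, storage = localStorage) {
  storage.setItem(optionsKey(siteUrl), JSON.stringify(options));
}

function loadOptions(siteUrl, storage = localStorage) {
  const raw = storage.getItem(optionsKey(siteUrl));
  return raw ? JSON.parse(raw) : null;
}
```

Keying by hostname (rather than the full URL) means any page on the same site reuses the saved viewport and wait settings.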

Project Structure

scraper/
├── index.html              # Main HTML interface
├── app.js                  # Frontend JavaScript application
├── style.css               # Main stylesheet
├── styles/                 # Additional stylesheets
├── backend/                # Backend components
│   ├── server.js          # Backend server
│   ├── Harvester.js       # Harvesting logic
│   ├── package.json       # Backend dependencies
│   └── ...
├── scraper-backend/        # Scraper backend instance
│   ├── server.js          # Scraper server
│   ├── lib/               # Core modules
│   └── ...
└── README.md              # This file

Usage

Accessing the Application

  1. Deploy the files to a web server (Apache, Nginx, etc.)
  2. Ensure the scraper-backend service is running on port 3100
  3. Open index.html in a web browser or access via web server

Using the Interface

  1. Enter Site URL: Input the e-commerce site URL you want to analyze
  2. Optional Search Phrase: Add a search phrase if needed
  3. Configure Options: Adjust Puppeteer settings as needed
  4. Select Action: Choose what you want to extract:
    • Screenshot
    • Extract Links
    • Detect Products
    • Detect Categories
    • Get Full Page
  5. View Results: Results are displayed in JSON format with syntax highlighting

API Endpoints Used

The frontend communicates with the scraper-backend API using the following endpoints:

  • /content - Extract page content
  • /screenshot - Capture page screenshots
  • /links - Extract links from page
  • /products - Detect products
  • /categories - Analyze categories
  • /fullpage - Get full page content
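A call to one of these endpoints might look like the sketch below. The HTTP method, JSON body shape, and option names are assumptions; check app.js for the exact request format the backend expects:

```javascript
// Hypothetical request helper -- method, headers, and body shape are
// assumptions; see the real call sites in app.js.
const PROXY_BASE = 'http://agents-api.humaticai.com:3100';

function buildRequest(endpoint, siteUrl, options = {}) {
  return {
    url: PROXY_BASE + endpoint,
    init: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ url: siteUrl, ...options }),
    },
  };
}

// In the browser, the result would be sent with:
//   const req = buildRequest('/products', siteUrl, { width: 1280 });
//   fetch(req.url, req.init).then(r => r.json());
```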

Dependencies

Frontend

  • No build process required - pure HTML/CSS/JavaScript
  • Uses Highlight.js for JSON syntax highlighting (CDN)
  • Google Fonts (Inter) for typography

Backend

See backend/package.json and scraper-backend/package.json for backend dependencies.

Browser Compatibility

  • Modern browsers (Chrome, Firefox, Safari, Edge)
  • Requires JavaScript enabled
  • HTTPS recommended for production (to avoid mixed-content issues)

Development

Local Development

  1. Ensure scraper-backend is running locally on port 3100
  2. Update PROXY_BASE in app.js to point to your local backend:
    const PROXY_BASE = 'http://localhost:3100';
  3. Serve the files using a local web server or open directly in browser
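One way to avoid hand-editing PROXY_BASE for each environment is to derive it from the page's own location. This is a suggested sketch, not what app.js currently does (app.js hard-codes the host):

```javascript
// Sketch: pick the backend base URL from where the page is served.
// The localhost check is an assumption about the local dev setup.
function backendBase(hostname, protocol) {
  if (hostname === 'localhost' || hostname === '127.0.0.1') {
    return 'http://localhost:3100';
  }
  const scheme = protocol === 'https:' ? 'https://' : 'http://';
  return scheme + 'agents-api.humaticai.com:3100';
}

// In app.js this would replace the hard-coded constant:
//   const PROXY_BASE = backendBase(window.location.hostname, window.location.protocol);
```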

Modifying the Interface

  • Styling: Edit style.css or files in styles/
  • Functionality: Edit app.js
  • Layout: Edit index.html

Related Projects

License

Proprietary - All rights reserved. See LICENSE file for details.

Copyright

© Humatic AI. All rights reserved.
