E-commerce Site Scraper & Analyzer - Frontend

Web-based frontend for the E-commerce Site Scraper & Analyzer: a user-friendly interface that connects to the scraper-backend API service to perform web scraping, product detection, category analysis, and content extraction on e-commerce websites.

Overview

This frontend application provides a modern, interactive web interface for:

  • Taking screenshots of web pages
  • Extracting links from websites
  • Detecting products on e-commerce sites
  • Analyzing category structures
  • Extracting full page content
  • Configuring Puppeteer scraping options

Architecture

┌─────────────────────────────────┐
│   Frontend (this directory)     │
│   - index.html                  │
│   - app.js                      │
│   - style.css                   │
└──────────────┬──────────────────┘
               │ HTTP/HTTPS
               │ API calls
               ▼
┌─────────────────────────────────┐
│   scraper-backend API           │
│   agents-api.humaticai.com:3100 │
│   (Node.js service)             │
└─────────────────────────────────┘

Features

Core Functionality

  • Screenshot Capture: Take screenshots of web pages with configurable viewport settings
  • Link Extraction: Extract and analyze all links from a webpage
  • Product Detection: Automatically detect products on e-commerce sites
  • Category Analysis: Analyze and extract category structures from e-commerce sites
  • Full Page Content: Extract complete page content including HTML, text, and metadata

User Interface

  • Modern, responsive design
  • Real-time status updates
  • JSON result viewer with syntax highlighting
  • Configurable Puppeteer options
  • URL autocomplete/combobox
  • Per-domain option persistence

Configuration

Backend API Connection

The frontend connects to the scraper-backend service. The API endpoint is configured in app.js:

const PROXY_BASE = (window.location.protocol === 'https:' ? 'https://' : 'http://') + 'agents-api.humaticai.com:3100';

To change the backend URL, modify this constant in app.js.

Puppeteer Options

The application supports various Puppeteer configuration options:

  • Viewport dimensions (width, height)
  • Wait strategies (waitUntil, waitAfter)
  • Scroll behavior
  • Language settings
  • Initial wait times
  • Asset handling (inline vs external)

Options can be configured per-domain and are persisted in browser localStorage.
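A minimal sketch of how per-domain persistence can work with localStorage. The key scheme (`scraperOptions:<hostname>`) and option field names here are assumptions for illustration; the actual keys and fields live in app.js and may differ:

```javascript
// Hypothetical per-domain option persistence. The storage key prefix
// and option shape are illustrative, not the exact scheme in app.js.
const STORAGE_PREFIX = 'scraperOptions:';

function optionsKey(siteUrl) {
  // One options record per domain, keyed by hostname.
  return STORAGE_PREFIX + new URL(siteUrl).hostname;
}

function saveOptions(siteUrl, options, storage = localStorage) {
  storage.setItem(optionsKey(siteUrl), JSON.stringify(options));
}

function loadOptions(siteUrl, storage = localStorage) {
  const raw = storage.getItem(optionsKey(siteUrl));
  return raw ? JSON.parse(raw) : null;
}
```

Keying by hostname (rather than the full URL) means any page on the same site reuses the saved viewport and wait settings.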

Project Structure

scraper/
├── index.html              # Main HTML interface
├── app.js                  # Frontend JavaScript application
├── style.css               # Main stylesheet
├── styles/                 # Additional stylesheets
├── backend/                # Backend components
│   ├── server.js          # Backend server
│   ├── Harvester.js       # Harvesting logic
│   ├── package.json       # Backend dependencies
│   └── ...
├── scraper-backend/        # Scraper backend instance
│   ├── server.js          # Scraper server
│   ├── lib/               # Core modules
│   └── ...
└── README.md              # This file

Usage

Accessing the Application

  1. Deploy the files to a web server (Apache, Nginx, etc.)
  2. Ensure the scraper-backend service is running on port 3100
  3. Open index.html in a web browser or access via web server

Using the Interface

  1. Enter Site URL: Input the e-commerce site URL you want to analyze
  2. Optional Search Phrase: Add a search phrase if needed
  3. Configure Options: Adjust Puppeteer settings as needed
  4. Select Action: Choose what you want to extract:
    • Screenshot
    • Extract Links
    • Detect Products
    • Detect Categories
    • Get Full Page
  5. View Results: Results are displayed in JSON format with syntax highlighting

API Endpoints Used

The frontend communicates with the scraper-backend API using the following endpoints:

  • /content - Extract page content
  • /screenshot - Capture page screenshots
  • /links - Extract links from page
  • /products - Detect products
  • /categories - Analyze categories
  • /fullpage - Get full page content
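A call to one of these endpoints might look like the sketch below. The HTTP method, JSON body shape, and option names are assumptions; check app.js for the exact request format the backend expects:

```javascript
// Hypothetical request helper -- method, headers, and body shape are
// assumptions; see the real call sites in app.js.
const PROXY_BASE = 'http://agents-api.humaticai.com:3100';

function buildRequest(endpoint, siteUrl, options = {}) {
  return {
    url: PROXY_BASE + endpoint,
    init: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ url: siteUrl, ...options }),
    },
  };
}

// In the browser, the result would be sent with:
//   const req = buildRequest('/products', siteUrl, { width: 1280 });
//   fetch(req.url, req.init).then(r => r.json());
```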

Dependencies

Frontend

  • No build process required - pure HTML/CSS/JavaScript
  • Uses Highlight.js for JSON syntax highlighting (CDN)
  • Google Fonts (Inter) for typography

Backend

See backend/package.json and scraper-backend/package.json for backend dependencies.

Browser Compatibility

  • Modern browsers (Chrome, Firefox, Safari, Edge)
  • Requires JavaScript enabled
  • HTTPS recommended for production (to avoid mixed-content issues)

Development

Local Development

  1. Ensure scraper-backend is running locally on port 3100
  2. Update PROXY_BASE in app.js to point to your local backend:
    const PROXY_BASE = 'http://localhost:3100';
  3. Serve the files using a local web server or open directly in browser
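One way to avoid hand-editing PROXY_BASE for each environment is to derive it from the page's own location. This is a suggested sketch, not what app.js currently does (app.js hard-codes the host):

```javascript
// Sketch: pick the backend base URL from where the page is served.
// The localhost check is an assumption about the local dev setup.
function backendBase(hostname, protocol) {
  if (hostname === 'localhost' || hostname === '127.0.0.1') {
    return 'http://localhost:3100';
  }
  const scheme = protocol === 'https:' ? 'https://' : 'http://';
  return scheme + 'agents-api.humaticai.com:3100';
}

// In app.js this would replace the hard-coded constant:
//   const PROXY_BASE = backendBase(window.location.hostname, window.location.protocol);
```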

Modifying the Interface

  • Styling: Edit style.css or files in styles/
  • Functionality: Edit app.js
  • Layout: Edit index.html

Related Projects

License

Proprietary - All rights reserved. See LICENSE file for details.

Copyright

© Humatic AI. All rights reserved.
