
Cerebrock/scraprop


Property Scraper

Scrapes property listings from Zonaprop, Argenprop, and MercadoLibre, saving all details to a CSV and sending new listings via Telegram.

Repo Structure

  • src/ — All main code modules (scraprop.py, scraper.py, utils.py)
  • tests/ — For future test scripts
  • urls_to_scrap.txt — List of search URLs
  • outputs/scraped_properties.csv — All scraped property data
  • outputs/seen.txt — Tracks already-notified properties
  • .env — Environment variables (Telegram, Gemini API key, optional Google Sheets settings)
  • outputs/ — All output files (CSV, logs)
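The duplicate check that outputs/seen.txt enables can be sketched as follows. This is a minimal sketch assuming the file holds one listing URL per line; the helper names `load_seen`, `filter_new`, and `mark_seen` are hypothetical and not taken from src/.

```python
from __future__ import annotations

from pathlib import Path

# Location taken from the repo structure above; the helpers below are hypothetical.
SEEN_PATH = Path("outputs/seen.txt")


def load_seen(path: Path = SEEN_PATH) -> set[str]:
    """Return the set of listing URLs already notified (one URL per line)."""
    if not path.exists():
        return set()
    lines = path.read_text(encoding="utf-8").splitlines()
    return {line.strip() for line in lines if line.strip()}


def filter_new(urls: list[str], seen: set[str]) -> list[str]:
    """Return URLs not in `seen`, in order; also adds them to `seen` so repeats within one batch are dropped."""
    new: list[str] = []
    for url in urls:
        if url not in seen:
            seen.add(url)
            new.append(url)
    return new


def mark_seen(urls: list[str], path: Path = SEEN_PATH) -> None:
    """Append notified URLs so the next run skips them."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as f:
        f.writelines(url + "\n" for url in urls)
```

Calling `mark_seen` only after a Telegram message has been sent successfully would mean a failed send is retried on the next run.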

Features

  • Multi-source scraping: Zonaprop, Argenprop, and MercadoLibre
  • LLM Analysis: AI-powered property scoring and analysis using Google Gemini
  • Smart filtering: Deducts 15 points from commercial listings (keywords "local" and "deposito", i.e. storefronts and warehouses)
  • Live Google Sheets: Updates a shareable Google Sheet on every run (optional)
  • CSV Export: All scraped data saved to outputs/scraped_properties.csv
  • Telegram notifications: New high-scoring properties sent via Telegram
  • Duplicate prevention: Tracks seen properties to avoid spam
  • Detailed extraction: Price, expenses, neighbourhood, surface, rooms, descriptions, and more
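The commercial-property penalty could look roughly like the sketch below. The keywords and the -15 value come from the list above; the whole-word, accent-insensitive matching (so "Depósito" counts but "localidad" does not) is an assumption and may differ from the project's real logic, and `commercial_penalty` is a hypothetical name.

```python
import re
import unicodedata

# Keywords and penalty from the README; whole-word, accent-insensitive matching is an assumption.
COMMERCIAL_KEYWORDS = ("local", "deposito")
COMMERCIAL_PENALTY = -15


def _normalize(text: str) -> str:
    """Lowercase and strip accents, so 'Depósito' becomes 'deposito'."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def commercial_penalty(*texts: str) -> int:
    """Return -15 if any keyword appears as a whole word in the given texts, else 0."""
    normalized = _normalize(" ".join(texts))
    for keyword in COMMERCIAL_KEYWORDS:
        if re.search(rf"\b{re.escape(keyword)}\b", normalized):
            return COMMERCIAL_PENALTY
    return 0
```

Whole-word matching avoids false positives on words such as "localidad", at the cost of missing plurals such as "locales".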

Setup

  1. Clone the repo
  2. Install dependencies (preferably in a conda or venv):
    pip install -r requirements.txt
  3. Set up environment variables in a .env file:
    # Required for Telegram notifications
    TELEGRAM_BOT_ID=your_bot_id
    TELEGRAM_ID=your_telegram_user_id
    
    # Required for LLM analysis
    GEMINI_API_KEY=your_gemini_api_key
    
    # Optional: Google Sheets integration (see setup guide)
    GOOGLE_SHEETS_CREDENTIALS_FILE=credentials.json
    GOOGLE_SHEETS_SHARE_EMAIL=your-email@gmail.com
    # GOOGLE_SHEET_ID=your_existing_sheet_id
  4. Add search URLs to urls_to_scrap.txt (one per line)
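Before a first run, a quick sanity check of .env and urls_to_scrap.txt might look like this. It is a stdlib-only sketch assuming plain `KEY=value` lines; the project itself may load .env differently (for example with python-dotenv), skipping `#` lines in urls_to_scrap.txt is an assumption, and all helper names are hypothetical.

```python
from __future__ import annotations

import os
from pathlib import Path

# Required keys as listed in the Setup section; the Google Sheets keys are optional.
REQUIRED_VARS = ("TELEGRAM_BOT_ID", "TELEGRAM_ID", "GEMINI_API_KEY")


def load_env_file(path: str | Path = ".env") -> dict[str, str]:
    """Parse KEY=value lines, skipping blanks and # comments (no quote or escape handling)."""
    p = Path(path)
    if not p.exists():
        return {}
    values: dict[str, str] = {}
    for raw in p.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first '=' only, so values may themselves contain '='.
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_required(env: dict[str, str]) -> list[str]:
    """Required names that are empty or unset in both `env` and the process environment."""
    return [name for name in REQUIRED_VARS if not (env.get(name) or os.environ.get(name))]


def load_search_urls(path: str | Path = "urls_to_scrap.txt") -> list[str]:
    """One URL per line; blank lines and lines starting with # are skipped."""
    urls = []
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if line and not line.startswith("#"):
            urls.append(line)
    return urls
```

Running `missing_required(load_env_file())` and printing the result catches a missing key before any scraping starts.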

Google Sheets Integration (Optional)

Optionally, keep a shareable Google Sheet updated with the scraped property data:

  1. Follow the detailed setup guide in setup_google_sheets.md
  2. Test the integration by running python test_google_sheets.py
  3. Features:
    • Updated automatically on every scraper run
    • Data sorted by LLM score (best properties first)
    • Shareable with colleagues or family
    • Includes timestamps and all analysis data
    • Shared history: seen URLs are tracked in the Google Sheet, preventing duplicate notifications
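The "sorted by score, with timestamps" behaviour can be sketched as a function that builds the header and rows to upload. This assumes each property is a dict with an optional `score` key; the `updated_at` column and `build_sheet_rows` name are hypothetical, and the actual upload call is left out because it depends on the Sheets client library in use.

```python
from __future__ import annotations

from datetime import datetime, timezone


def build_sheet_rows(properties: list[dict], columns: list[str],
                     now: datetime | None = None) -> list[list]:
    """Header plus one row per property, best score first, each stamped with the update time."""
    stamp = (now or datetime.now(timezone.utc)).strftime("%Y-%m-%d %H:%M:%S %Z")

    def score_key(prop: dict) -> float:
        # Unscored properties (missing or empty score) sort last.
        score = prop.get("score")
        if score is None or score == "":
            return float("-inf")
        return float(score)

    ordered = sorted(properties, key=score_key, reverse=True)
    header = [*columns, "updated_at"]
    rows = [[prop.get(col, "") for col in columns] + [stamp] for prop in ordered]
    return [header, *rows]
```

Rebuilding the full list on each run keeps the sheet ordering consistent, at the cost of rewriting every row rather than appending.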

Running

  • To run the main workflow:
    python src/scraprop.py
  • To test scrapers for each source:
    python src/scraper.py

Output

CSV Export

All scraped properties are saved to outputs/scraped_properties.csv with columns:

  • Basic: url, price, expenses, neighbourhood, surface, rooms, description
  • Analysis: score, score_breakdown, llm_neighbourhood, llm_surface_m2, etc.
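Appending to outputs/scraped_properties.csv can be sketched with the standard `csv` module. The column list below only covers the names given above (the real file has more, per "etc."), and `append_properties` is a hypothetical helper, not necessarily how src/scraprop.py writes the file.

```python
from __future__ import annotations

import csv
from pathlib import Path

# Columns named in this README; the real file has more analysis columns ("etc.").
COLUMNS = [
    "url", "price", "expenses", "neighbourhood", "surface", "rooms", "description",
    "score", "score_breakdown", "llm_neighbourhood", "llm_surface_m2",
]


def append_properties(rows: list[dict],
                      path: Path = Path("outputs/scraped_properties.csv")) -> None:
    """Append rows, writing the header only when the file is new or empty."""
    path.parent.mkdir(parents=True, exist_ok=True)
    write_header = not path.exists() or path.stat().st_size == 0
    # newline="" lets the csv module handle quoting of multi-line descriptions.
    with path.open("a", newline="", encoding="utf-8") as f:
        # Missing keys become "", unknown keys are ignored.
        writer = csv.DictWriter(f, fieldnames=COLUMNS, extrasaction="ignore")
        if write_header:
            writer.writeheader()
        writer.writerows(rows)
```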

Google Sheets (if configured)

  • Live updates: Refreshed on every run and accessible from anywhere
  • Sorted by score: Best properties appear at the top
  • Shareable: Easy to share with others
  • Timestamped: Last update time included

Telegram Notifications

High-scoring new properties are sent with their LLM analysis. Example message:

⭐ SCORE: 23

📊 Score Breakdown:
  • Location (Belgrano): +10
  • Ground Floor: +10
  • Outdoor Space: +3

🏠 Analysis:
  📍 Neighborhood: Belgrano
  🌳 Ground Floor: Yes
  🌿 Outdoor Space: Yes
  📏 Surface: 80m²
  💰 Price: $500,000

📍 Zona: Belgrano
💰 Precio: $500.000
📏 Sup.: 80 m²
🏠 Ambientes: 3

https://www.zonaprop.com.ar/propiedades/...
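A message like this one can be composed and sent through the Telegram Bot API `sendMessage` method. This sketch assumes TELEGRAM_BOT_ID holds the bot token and TELEGRAM_ID the target chat ID; `format_message` and `send_telegram` are hypothetical helpers, and the layout is trimmed (no "Analysis" block) compared with the real message.

```python
from __future__ import annotations

import json
import urllib.parse
import urllib.request


def format_message(url: str, score: int, breakdown: dict[str, int], zona: str,
                   precio: str, sup_m2: float, ambientes: int) -> str:
    """Build a trimmed version of the notification (score breakdown plus listing summary)."""
    lines = [f"⭐ SCORE: {score}", "", "📊 Score Breakdown:"]
    # :+d always shows the sign, e.g. +10 or -15.
    lines += [f"  • {label}: {points:+d}" for label, points in breakdown.items()]
    lines += [
        "",
        f"📍 Zona: {zona}",
        f"💰 Precio: {precio}",
        f"📏 Sup.: {sup_m2:g} m²",
        f"🏠 Ambientes: {ambientes}",
        "",
        url,
    ]
    return "\n".join(lines)


def send_telegram(token: str, chat_id: str, text: str, timeout: float = 10.0) -> dict:
    """POST to the Bot API sendMessage method and return the decoded JSON reply."""
    endpoint = f"https://api.telegram.org/bot{token}/sendMessage"
    payload = urllib.parse.urlencode({"chat_id": chat_id, "text": text}).encode("utf-8")
    # Passing `data` makes urlopen send a form-encoded POST.
    with urllib.request.urlopen(endpoint, data=payload, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Under those assumptions, a call would be `send_telegram(os.environ["TELEGRAM_BOT_ID"], os.environ["TELEGRAM_ID"], msg)`; the reply's `ok` field indicates whether Telegram accepted the message.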

Customization

  • Add or remove search URLs in urls_to_scrap.txt
  • Adjust scraping logic in src/scraper.py for new fields
  • Change CSV filename in src/scraprop.py if needed

Cron Example

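To run every 6 hours (at minute 30 of hours 0, 6, 12, and 18) and append both output and errors to a log file:

30 */6 * * * /path/to/python /path/to/scraprop/src/scraprop.py >> /path/to/scraprop/outputs/logs/scraprop-cron.log 2>&1

The outputs/logs/ directory must already exist, since the shell redirect does not create missing directories.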

Tests

  • Place test scripts in the tests/ directory.
  • (Coming soon: example test scripts for scrapers and utilities)
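As a starting point, a file such as tests/test_price.py (hypothetical name) could exercise a price parser. `parse_price` is a hypothetical helper defined inline here; in a real test it would be imported from src/. It assumes Argentine number formatting, with "." as the thousands separator and "," as the decimal separator, as in "$500.000" in the Telegram example.

```python
from __future__ import annotations

import re
import unittest


def parse_price(text: str) -> int | None:
    """Hypothetical helper: '$500.000' -> 500000, dropping any ',' decimal part."""
    integer_part = text.split(",")[0]
    digits = re.sub(r"\D", "", integer_part)
    return int(digits) if digits else None


class ParsePriceTest(unittest.TestCase):
    def test_thousands_separator(self):
        self.assertEqual(parse_price("$500.000"), 500000)

    def test_currency_prefix(self):
        self.assertEqual(parse_price("USD 120.000"), 120000)

    def test_decimal_part_dropped(self):
        self.assertEqual(parse_price("$1.250,50"), 1250)

    def test_no_price(self):
        self.assertIsNone(parse_price("Consultar precio"))
```

Test files named test*.py in tests/ can then be run with python -m unittest discover -s tests.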

For questions or improvements, open an issue or PR.

About

Property listings scraper.
