omarha96/screen-translator

ScreenSnap

A screen capture and translation tool built with React, Vite, and Tailwind CSS v4. Uses Windows OCR for text extraction and Ollama for translation.

Architecture

┌─────────────┐      ┌──────────────┐      ┌─────────────┐
│   Browser   │──────▶ Express      │──────▶ Windows     │
│  (React UI) │◀─────│ Server       │◀─────│ OCR API     │
└─────────────┘      └──────────────┘      └─────────────┘
                            │
                            ▼
                     ┌──────────────┐
                     │ Ollama       │
                     │ (Translation)│
                     └──────────────┘

Features

  • 🖥️ Screen Sharing - Share your screen and capture frames
  • 🔤 Windows OCR - Extract text using native Windows OCR (requires Windows 10/11)
  • 🌐 Translation - Translate extracted text using local Ollama models
  • 🌙 Dark Mode - Toggle between light and dark themes
  • ⌨️ Keyboard Shortcuts - Quick actions with hotkeys
  • 📸 Capture Gallery - View all captured frames with translations
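The screen sharing and capture features map onto standard browser APIs. The following is a minimal sketch, assuming the app uses getDisplayMedia and a canvas; startScreenShare and captureFrame are hypothetical names, not functions taken from this repository.

```javascript
// Sketch only: startScreenShare and captureFrame are hypothetical helpers.

// Ask the browser for a screen-sharing stream (browser-only API).
async function startScreenShare() {
  return navigator.mediaDevices.getDisplayMedia({ video: true });
}

// Draw the current frame of a playing <video> onto a canvas
// and return it as a PNG data URL.
function captureFrame(video, canvas) {
  canvas.width = video.videoWidth;
  canvas.height = video.videoHeight;
  const ctx = canvas.getContext("2d");
  ctx.drawImage(video, 0, 0, canvas.width, canvas.height);
  return canvas.toDataURL("image/png");
}
```

In a page, the stream returned by startScreenShare would be assigned to a video element's srcObject, and captureFrame would be called once that video is playing.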

Prerequisites

  • Node.js 18+
  • Windows 10/11 with OCR language pack installed
  • Ollama running locally with a text model (e.g., gemma3:4b)

Install Windows OCR Language Pack

  1. Open Windows Settings → Time & Language → Language & Region
  2. Add a language (e.g., German, French)
  3. Click the language → Language options → Install "Optical Character Recognition"

Installation

# Clone the repository
git clone https://github.com/omarha96/screen-translator.git
cd screen-translator

# Install dependencies
npm install

Running the App

Option 1: Run both client and server (recommended for development)

npm run dev

This starts both the Express server and the Vite client.

Option 2: Run separately

Terminal 1 - Server:

npm run server

Terminal 2 - Client:

npm run client

Ollama Setup

Make sure Ollama is running locally on port 11434. Install a text model:

ollama pull gemma3:4b

Or use a dedicated translation model:

ollama pull translate-gemma
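Once a model is pulled, it can be queried through Ollama's HTTP API. The sketch below posts to Ollama's /api/generate endpoint with streaming disabled. The prompt wording and the names buildTranslationRequest and translateWithOllama are assumptions for illustration, not this project's actual prompt or code.

```javascript
// Build the JSON body for Ollama's POST /api/generate endpoint.
// The prompt text is an assumption, not the prompt this project uses.
function buildTranslationRequest(text, sourceLang, targetLang, model = "gemma3:4b") {
  return {
    model,
    prompt:
      `Translate the following text from ${sourceLang} to ${targetLang}. ` +
      `Reply with the translation only.\n\n${text}`,
    stream: false, // ask for a single JSON object instead of a token stream
  };
}

// Send the request to a local Ollama instance and return the trimmed reply.
// fetchImpl is injectable so the function can be tried without Ollama running.
async function translateWithOllama(text, sourceLang, targetLang, fetchImpl = fetch) {
  const res = await fetchImpl("http://localhost:11434/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildTranslationRequest(text, sourceLang, targetLang)),
  });
  if (!res.ok) throw new Error(`Ollama responded with HTTP ${res.status}`);
  const data = await res.json();
  return data.response.trim();
}
```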

API Endpoints

The Express server provides these endpoints:

Endpoint                     Method  Description
/api/health                  GET     Health check
/api/ocr                     POST    Extract text from an image using Windows OCR
/api/translate               POST    Translate text using Ollama
/api/extract-and-translate   POST    OCR and translation in one call
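The combined endpoint can be thought of as the OCR step followed by the translation step. The following is a rough sketch of that composition, not the server's actual handler; runOcr and translate are hypothetical injected functions standing in for the Windows OCR call and the Ollama call.

```javascript
// Hypothetical composition of the two steps behind /api/extract-and-translate.
// runOcr(image, sourceLang) and translate(text, sourceLang, targetLang)
// are placeholders supplied by the caller.
async function extractThenTranslate({ image, sourceLang, targetLang }, runOcr, translate) {
  const extractedText = await runOcr(image, sourceLang);

  // Skip the model call when OCR found nothing to translate.
  if (!extractedText.trim()) {
    return { extractedText: "", translation: "" };
  }

  const translation = await translate(extractedText, sourceLang, targetLang);
  return { extractedText, translation };
}
```

Passing the two steps in as arguments keeps the composition easy to exercise without Windows OCR or Ollama available.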

Example: Extract and Translate

curl -X POST http://localhost:3001/api/extract-and-translate \
  -H "Content-Type: application/json" \
  -d '{
    "image": "data:image/png;base64,iVBORw0...",
    "sourceLang": "German",
    "targetLang": "English"
  }'

Response:

{
  "extractedText": "Hallo Welt",
  "translation": "Hello World",
  "rawOutput": "Hello World"
}
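The same request can be made from JavaScript. This is a minimal sketch using fetch (global in browsers and in Node.js 18+); the URL and field names follow the curl example and response above, while callExtractAndTranslate is a hypothetical helper name.

```javascript
// Hypothetical client helper for POST /api/extract-and-translate.
// fetchImpl is injectable so the helper can be tried without the server running.
async function callExtractAndTranslate(imageDataUrl, sourceLang, targetLang, fetchImpl = fetch) {
  const res = await fetchImpl("http://localhost:3001/api/extract-and-translate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ image: imageDataUrl, sourceLang, targetLang }),
  });
  if (!res.ok) throw new Error(`Request failed with HTTP ${res.status}`);

  // Keep only the two fields most callers need.
  const { extractedText, translation } = await res.json();
  return { extractedText, translation };
}
```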

Keyboard Shortcuts

Shortcut         Action
Shift + Enter    Start/Stop screen sharing
Alt + S          Swap source/target languages
Ctrl + L         Clear all captures
Alt + C          Copy selected translation
Ctrl + Alt + S   Capture current frame
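One way to dispatch shortcuts like these is to match each keydown event against a small table. The sketch below is an illustration, not the app's actual handler; the action labels and the matchShortcut name are hypothetical.

```javascript
// Shortcut table mirroring the list above; action names are hypothetical labels.
const SHORTCUTS = [
  { key: "Enter", shift: true, action: "toggleSharing" },
  { key: "s", alt: true, action: "swapLanguages" },
  { key: "l", ctrl: true, action: "clearCaptures" },
  { key: "c", alt: true, action: "copyTranslation" },
  { key: "s", ctrl: true, alt: true, action: "captureFrame" },
];

// Return the action for a keyboard event, or null if nothing matches.
// Modifiers must match exactly, so Ctrl + Alt + S does not trigger Alt + S.
function matchShortcut(event) {
  const key = event.key.length === 1 ? event.key.toLowerCase() : event.key;
  const hit = SHORTCUTS.find(
    (s) =>
      s.key === key &&
      !!s.ctrl === !!event.ctrlKey &&
      !!s.alt === !!event.altKey &&
      !!s.shift === !!event.shiftKey
  );
  return hit ? hit.action : null;
}
```

In the browser this would typically be wired up with window.addEventListener("keydown", ...) and a call to event.preventDefault() when an action is found.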

Build for Production

npm run build

The built files will be in the dist/ directory.

Project Structure

screen-translator/
├── server/                 # Express server
│   ├── index.js            # Main server file
│   ├── ocr.ps1             # Windows OCR PowerShell script
│   └── WindowsOcr.cs       # C# OCR reference
├── src/                    # React frontend
│   ├── components/         # React components
│   ├── hooks/              # Custom hooks
│   └── utils/              # Utilities
├── package.json
├── vite.config.js
└── README.md

Tech Stack

  • Frontend: React, Vite, Tailwind CSS v4
  • Backend: Express.js, Node.js
  • OCR: Windows.Media.Ocr (Windows 10/11 native API)
  • Translation: Ollama (local AI)

Troubleshooting

"No OCR language pack installed"

Install an OCR language pack in Windows Settings:

  1. Settings → Time & Language → Language & Region
  2. Add the source language you want to OCR
  3. Install "Optical Character Recognition" for that language

"Ollama connection refused"

Make sure Ollama is running:

ollama serve

Or start Ollama from the application menu.
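To tell a stopped Ollama apart from a missing model, a quick check against Ollama's /api/tags endpoint (which lists locally installed models) can help. This is a small diagnostic sketch; checkOllama is a hypothetical helper, not part of this project.

```javascript
// Report whether Ollama is reachable and whether a given model is installed.
// fetchImpl is injectable so the check can be tried without Ollama running.
async function checkOllama(model, baseUrl = "http://localhost:11434", fetchImpl = fetch) {
  let res;
  try {
    res = await fetchImpl(`${baseUrl}/api/tags`);
  } catch {
    // fetch rejects on network errors such as "connection refused".
    return { reachable: false, hasModel: false };
  }
  if (!res.ok) return { reachable: true, hasModel: false };

  const { models = [] } = await res.json();
  // Accept an exact name ("gemma3:4b") or a name without its tag ("gemma3").
  const hasModel = models.some((m) => m.name === model || m.name.startsWith(`${model}:`));
  return { reachable: true, hasModel };
}
```

A result of reachable: false points at the Ollama service itself, while reachable: true with hasModel: false suggests the model still needs to be pulled.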

Server won't start on Windows

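Open PowerShell and set the execution policy so that local scripts such as server/ocr.ps1 can run (the CurrentUser scope does not require Administrator rights):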

Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser

License

MIT
