A screen capture and translation tool built with React, Vite, and Tailwind CSS v4. Uses Windows OCR for text extraction and Ollama for translation.
```
┌─────────────┐      ┌──────────────┐      ┌─────────────┐
│   Browser   │─────▶│   Express    │─────▶│   Windows   │
│  (React UI) │◀─────│    Server    │◀─────│   OCR API   │
└─────────────┘      └──────────────┘      └─────────────┘
                            │
                            ▼
                     ┌──────────────┐
                     │    Ollama    │
                     │ (Translation)│
                     └──────────────┘
```
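The request flow above can be sketched as a small pipeline. The helper below is a simplified, synchronous sketch (the real server calls are asynchronous, and `ocr`/`translate` are hypothetical stand-ins for the PowerShell OCR script and the Ollama client):

```javascript
// Simplified sketch of the server-side pipeline behind
// /api/extract-and-translate. `ocr` and `translate` are injected so the
// OCR engine and translation backend stay swappable.
function extractAndTranslate(image, { ocr, translate }) {
  const extractedText = ocr(image);             // Windows OCR step
  const translation = translate(extractedText); // Ollama step
  return { extractedText, translation };
}

// Example with stub implementations:
const result = extractAndTranslate("fake-image-data", {
  ocr: () => "Hallo Welt",
  translate: (text) => (text === "Hallo Welt" ? "Hello World" : text),
});
// result: { extractedText: "Hallo Welt", translation: "Hello World" }
```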
- 🖥️ Screen Sharing - Share your screen and capture frames
- 🔤 Windows OCR - Extract text using native Windows OCR (requires Windows 10/11)
- 🌐 Translation - Translate extracted text using local Ollama models
- 🌙 Dark Mode - Toggle between light and dark themes
- ⌨️ Keyboard Shortcuts - Quick actions with hotkeys
- 📸 Capture Gallery - View all captured frames with translations
- Node.js 18+
- Windows 10/11 with OCR language pack installed
- Ollama running locally with a text model (e.g., `gemma3:4b`)
- Open Windows Settings → Time & Language → Language & Region
- Add a language (e.g., German, French)
- Click the language → Language options → Install "Optical Character Recognition"
```bash
# Clone the repository
git clone https://github.com/yourusername/screen-translator.git
cd screen-translator

# Install dependencies
npm install
```

Start both servers in development mode:

```bash
npm run dev
```

This starts:
- Express server on http://localhost:3001
- Vite dev server on http://localhost:5173
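If you are wondering how one command starts both processes: a plausible `scripts` layout is shown below. This is an assumption for illustration (using `concurrently`), not verified against this repo's `package.json`:

```json
{
  "scripts": {
    "server": "node server/index.js",
    "client": "vite",
    "dev": "concurrently \"npm run server\" \"npm run client\"",
    "build": "vite build"
  }
}
```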
Terminal 1 - Server:

```bash
npm run server
```

Terminal 2 - Client:

```bash
npm run client
```

Make sure Ollama is running locally on port 11434. Install a text model:

```bash
ollama pull gemma3:4b
```

Or use a dedicated translation model:

```bash
ollama pull translate-gemma
```

The Express server provides these endpoints:
| Endpoint | Method | Description |
|---|---|---|
| `/api/health` | GET | Health check |
| `/api/ocr` | POST | Extract text from an image using Windows OCR |
| `/api/translate` | POST | Translate text using Ollama |
| `/api/extract-and-translate` | POST | OCR + translate in one call |
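From JavaScript, the request body uses the same fields shown in the curl example below. A small hypothetical helper for building it (the helper name is not part of the project):

```javascript
// Hypothetical helper: builds the JSON body expected by
// POST /api/extract-and-translate (field names taken from the curl example).
function buildExtractAndTranslateBody(imageDataUrl, sourceLang, targetLang) {
  return JSON.stringify({
    image: imageDataUrl, // data URL of the captured frame
    sourceLang,          // language to OCR, e.g. "German"
    targetLang,          // language to translate into, e.g. "English"
  });
}

const body = buildExtractAndTranslateBody(
  "data:image/png;base64,iVBORw0...",
  "German",
  "English"
);
```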
```bash
curl -X POST http://localhost:3001/api/extract-and-translate \
  -H "Content-Type: application/json" \
  -d '{
    "image": "data:image/png;base64,iVBORw0...",
    "sourceLang": "German",
    "targetLang": "English"
  }'
```

Response:

```json
{
  "extractedText": "Hallo Welt",
  "translation": "Hello World",
  "rawOutput": "Hello World"
}
```

| Shortcut | Action |
|---|---|
| `Shift + Enter` | Start/stop screen sharing |
| `Alt + S` | Swap source/target languages |
| `Ctrl + L` | Clear all captures |
| `Alt + C` | Copy selected translation |
| `Ctrl + Alt + S` | Capture current frame |
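One way these shortcuts could be dispatched client-side is a small matcher over keyboard events. This is a sketch with hypothetical action names; the project's actual hook may be wired differently:

```javascript
// Hypothetical shortcut dispatcher mirroring the table above.
// Takes a KeyboardEvent-like object and returns an action name or null.
// Note: Ctrl+Alt+S must be checked before Alt+S, or the more specific
// combination would never match.
function matchShortcut(e) {
  if (e.shiftKey && e.key === "Enter") return "toggle-sharing";
  if (e.ctrlKey && e.altKey && e.key.toLowerCase() === "s") return "capture-frame";
  if (e.altKey && e.key.toLowerCase() === "s") return "swap-languages";
  if (e.ctrlKey && e.key.toLowerCase() === "l") return "clear-captures";
  if (e.altKey && e.key.toLowerCase() === "c") return "copy-translation";
  return null;
}
```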
```bash
npm run build
```

The built files will be in the `dist/` directory.
```
screen-translator/
├── server/              # Express server
│   ├── index.js         # Main server file
│   ├── ocr.ps1          # Windows OCR PowerShell script
│   └── WindowsOcr.cs    # C# OCR reference
├── src/                 # React frontend
│   ├── components/      # React components
│   ├── hooks/           # Custom hooks
│   └── utils/           # Utilities
├── package.json
├── vite.config.js
└── README.md
```
- Frontend: React, Vite, Tailwind CSS v4
- Backend: Express.js, Node.js
- OCR: Windows.Media.Ocr (Windows 10/11 native API)
- Translation: Ollama (local AI)
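Under the hood, translation requests to Ollama typically go to its `/api/generate` endpoint. A sketch of how the server might build that request (the prompt wording and default model here are assumptions, not the project's actual prompt):

```javascript
// Hypothetical builder for an Ollama /api/generate request body.
function buildOllamaRequest(text, sourceLang, targetLang, model = "gemma3:4b") {
  return {
    model,
    prompt:
      `Translate the following text from ${sourceLang} to ${targetLang}. ` +
      `Reply with only the translation:\n\n${text}`,
    stream: false, // ask Ollama for a single JSON response, not a stream
  };
}

const req = buildOllamaRequest("Hallo Welt", "German", "English");
```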
Install an OCR language pack in Windows Settings:
- Settings → Time & Language → Language & Region
- Add the source language you want to OCR
- Install "Optical Character Recognition" for that language
Make sure Ollama is running:
```bash
ollama serve
```

Or start Ollama from the application menu.
Run PowerShell as Administrator and set the execution policy:

```powershell
Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser
```

MIT