CodeTrans — AI-Powered Code Translation

An AI-powered full-stack application that translates source code between programming languages. Paste code (or upload a PDF), pick your source and target languages, and get idiomatic translated output in seconds — powered by any OpenAI-compatible LLM endpoint or a locally running Ollama model.
CodeTrans demonstrates how code-specialized large language models can be used to translate source code between programming languages. It supports six languages — Java, C, C++, Python, Rust, and Go — and works with any OpenAI-compatible inference endpoint or a locally running Ollama instance.
This makes CodeTrans suitable for:
- Enterprise deployments — connect to a GenAI Gateway or any managed LLM API
- Air-gapped environments — run fully offline with Ollama and a locally hosted model
- Local experimentation — quick setup on a laptop with GPU-accelerated inference
- Hardware benchmarking — measure SLM throughput on Apple Silicon, CUDA, or Intel Gaudi hardware
- The user pastes code or uploads a PDF in the browser.
- The React frontend sends the source code and language selection to the FastAPI backend.
- If a PDF was uploaded, a text extraction service pulls the code out of the document.
- The backend constructs a structured prompt and calls the configured LLM endpoint (remote API or local Ollama).
- The LLM returns the translated code, which is displayed in the output panel.
- The user copies the result with one click.
All inference logic is abstracted behind a single INFERENCE_PROVIDER environment variable — switching between providers requires only a .env change and a container restart.
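The switch described above can be sketched as a small config factory. This is a minimal illustration under assumed names — `build_client_config` is not a function in the repo; the real dispatch lives in `services/api_client.py`:

```python
import os

# Minimal sketch of the INFERENCE_PROVIDER switch. All names here are
# illustrative; the actual logic lives in services/api_client.py.
def build_client_config() -> dict:
    provider = os.getenv("INFERENCE_PROVIDER", "remote")
    endpoint = os.getenv("INFERENCE_API_ENDPOINT", "")
    if provider == "ollama":
        # Ollama needs no token and is called via its chat-completions path.
        return {"base_url": endpoint, "api_key": None, "mode": "chat"}
    # Any OpenAI-compatible remote endpoint uses bearer-token auth
    # and the text-completions path.
    return {
        "base_url": endpoint,
        "api_key": os.getenv("INFERENCE_API_TOKEN"),
        "mode": "completions",
    }
```

Because only environment variables feed this decision, flipping providers really is just a `.env` edit plus a container restart.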
The application follows a modular two-service architecture with a React frontend and a FastAPI backend. The backend handles all inference orchestration, PDF extraction, and optional LLM observability tracing. The inference layer is fully pluggable — any OpenAI-compatible remote endpoint or a locally running Ollama instance can be used without any code changes.
```mermaid
graph TB
    subgraph "User Interface (port 3000)"
        A[React Frontend]
        A1[Code Input]
        A2[PDF Upload]
        A3[Language Selection]
    end
    subgraph "FastAPI Backend (port 5001)"
        B[API Server]
        C[PDF Service]
        D[API Client]
    end
    subgraph "Inference - Option A: Remote"
        E[OpenAI / Groq / OpenRouter<br/>Enterprise Gateway]
    end
    subgraph "Inference - Option B: Local"
        F[Ollama on Host<br/>host.docker.internal:11434]
    end
    A1 --> B
    A2 --> B
    A3 --> B
    B --> C
    C -->|Extracted Code| B
    B --> D
    D -->|INFERENCE_PROVIDER=remote| E
    D -->|INFERENCE_PROVIDER=ollama| F
    E -->|Translated Code| D
    F -->|Translated Code| D
    D --> B
    B --> A
    style A fill:#e1f5ff,color:#000
    style B fill:#fff4e1,color:#000
    style E fill:#e1ffe1,color:#000
    style F fill:#f3e5f5,color:#000
```
Frontend (React + Vite)
- Side-by-side code editor with language pill selectors for source and target
- PDF drag-and-drop upload that populates the source panel automatically
- Real-time character counter and live status indicator
- Dark mode (default) with `localStorage` persistence and flash prevention
- One-click copy of translated output
- Nginx serves the production build and proxies all `/api/` requests to the backend
Backend Services
- API Server (`server.py`): FastAPI application with CORS middleware, request validation, and routing
- API Client (`services/api_client.py`): handles both inference paths — text completions for remote endpoints and chat completions for Ollama — with token-based auth support
- PDF Service (`services/pdf_service.py`): extracts code from uploaded PDF files using pattern recognition
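As a rough illustration of the pattern-recognition step the PDF service performs, a line-level heuristic might look like the sketch below. The regex and function name are assumptions for illustration, not the actual rules in `services/pdf_service.py`:

```python
import re

# Crude code-vs-prose classifier: keep lines that start like code in one of
# the supported languages. Purely illustrative heuristics.
CODE_HINTS = re.compile(
    r"^\s*(def |class |import |from |return |#include|public |fn |func |package |\{|\})"
)

def extract_code_lines(page_text: str) -> str:
    """Keep lines that look like code; drop prose paragraphs."""
    kept = [line for line in page_text.splitlines() if CODE_HINTS.search(line)]
    return "\n".join(kept)
```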
External Integration
- Remote inference: Any OpenAI-compatible API (OpenAI, Groq, OpenRouter, GenAI Gateway)
- Local inference: Ollama running natively on the host machine, accessed from the container via `host.docker.internal:11434`
| Service | Container | Host Port | Description |
|---|---|---|---|
| `transpiler-api` | `transpiler-api` | 5001 | FastAPI backend — input validation, PDF extraction, inference orchestration |
| `transpiler-ui` | `transpiler-ui` | 3000 | React frontend — served by Nginx, proxies `/api/` to the backend |
Ollama is intentionally not a Docker service. On macOS (Apple Silicon), running Ollama in Docker bypasses Metal (MPS) GPU acceleration, resulting in CPU-only inference. Ollama must run natively on the host so the backend container can reach it via `host.docker.internal:11434`.
- User enters code or uploads a PDF in the web UI.
- The backend validates the input; PDF text is extracted if needed.
- The backend calls the configured inference endpoint (remote API or Ollama).
- The model returns translated code, which is displayed in the right panel.
- User copies the result with one click.
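The round trip above can also be driven without the UI. Below is a hypothetical request body — the route and field names are assumptions; the authoritative schema is the FastAPI-generated Swagger page at http://localhost:5001/docs:

```python
import json

# Hypothetical payload for the translate endpoint; field names are
# illustrative and may differ from the real Pydantic schemas in api/models.py.
payload = {
    "source_language": "python",
    "target_language": "go",
    "code": "def add(a, b):\n    return a + b",
}

# POST this as JSON to e.g. http://localhost:5001/api/translate
# with Content-Type: application/json.
body = json.dumps(payload).encode("utf-8")
```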
Before you begin, ensure you have the following installed and configured:
- Docker and Docker Compose (v2)
- An inference endpoint — one of:
- A remote OpenAI-compatible API key (OpenAI, Groq, OpenRouter, or enterprise gateway)
- Ollama installed natively on the host machine
```shell
docker --version
docker compose version
docker ps
```

```shell
git clone https://github.com/cld2labs/CodeTrans.git
cd CodeTrans
cp .env.example .env
```

Open `.env` and set `INFERENCE_PROVIDER` plus the corresponding variables for your chosen provider. See LLM Provider Configuration for per-provider instructions.
```shell
# Standard (attached)
docker compose up --build

# Detached (background)
docker compose up -d --build
```

Once containers are running:
- Frontend UI: http://localhost:3000
- Backend API: http://localhost:5001
- API Docs (Swagger): http://localhost:5001/docs
```shell
# Health check
curl http://localhost:5001/health

# View running containers
docker compose ps
```

View logs:

```shell
# All services
docker compose logs -f

# Backend only
docker compose logs -f transpiler-api

# Frontend only
docker compose logs -f transpiler-ui
```

Stop the stack:

```shell
docker compose down
```

Run the backend and frontend directly on the host without Docker.
Backend (Python / FastAPI)

```shell
cd api
python -m venv .venv
source .venv/bin/activate   # Windows: .venv\Scripts\activate
pip install -r requirements.txt
cp ../.env.example ../.env  # configure your .env at the repo root
uvicorn server:app --reload --port 5001
```

Frontend (Node / Vite)
```shell
cd ui
npm install
npm run dev
```

The Vite dev server proxies `/api/` to http://localhost:5001. Open http://localhost:5173.
```
CodeTrans/
├── api/                      # FastAPI backend
│   ├── config.py             # All environment-driven settings
│   ├── models.py             # Pydantic request/response schemas
│   ├── server.py             # FastAPI app, routes, and middleware
│   ├── services/
│   │   ├── api_client.py     # LLM inference client (remote + Ollama)
│   │   └── pdf_service.py    # PDF text and code extraction
│   ├── Dockerfile
│   └── requirements.txt
├── ui/                       # React frontend
│   ├── src/
│   │   ├── App.jsx
│   │   ├── components/
│   │   │   ├── CodeTranslator.jsx  # Main editor panel
│   │   │   ├── Header.jsx
│   │   │   ├── PDFUploader.jsx
│   │   │   └── StatusBar.jsx
│   │   └── main.jsx
│   ├── Dockerfile
│   └── vite.config.js
├── docs/
│   └── assets/               # Documentation images
├── docker-compose.yaml       # Main orchestration file
├── .env.example              # Environment variable reference
└── README.md
```
Translate code:
- Open the application at http://localhost:3000.
- Select the source language using the pill buttons at the top-left.
- Select the target language using the pill buttons at the top-right.
- Paste or type your code in the left panel.
- Click Translate Code.
- View the result in the right panel and click Copy to copy it to the clipboard.
Upload a PDF:
- Scroll to the Upload PDF section below the code panels.
- Drag and drop a PDF file, or click to browse.
- Code is extracted automatically and placed in the source panel.
- Select your languages and translate as normal.
Dark mode:
The app defaults to dark mode. Click the theme toggle in the header to switch to light mode. Your preference is saved in localStorage.
- Use the largest model your hardware can sustain. `codellama:34b` produces the best translation quality; `codellama:7b` is faster and good for benchmarking.
- Lower `LLM_TEMPERATURE` (e.g., `0.1`) for more deterministic, literal translations. Raise it slightly (e.g., `0.3`–`0.5`) if you want more idiomatic rewrites.
- Keep inputs under `MAX_CODE_LENGTH`. Shorter, focused snippets translate more accurately than entire files. Split large files by class or function.
- On Apple Silicon, always run Ollama natively — never inside Docker. The MPS (Metal) GPU backend delivers 5–10x the throughput of CPU-only inference.
- On Linux with an NVIDIA GPU, set `CUDA_VISIBLE_DEVICES` before starting Ollama to target a specific GPU.
- For enterprise remote APIs, choose a model with a large context window (≥16k tokens) to avoid truncation on longer inputs.
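The "split large files by class or function" tip can be automated for Python inputs with the standard-library `ast` module. The function below is a sketch of ours, not part of CodeTrans:

```python
import ast

# Split a Python source file at top-level functions and classes so each
# chunk stays well under MAX_CODE_LENGTH when submitted separately.
def split_by_top_level_defs(source: str) -> list[str]:
    tree = ast.parse(source)
    return [
        ast.get_source_segment(source, node)
        for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
    ]
```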
The table below compares inference performance across different providers, deployment modes, and hardware profiles using a standardized code-translation workload (averaged over 3 runs).
| Provider | Model | Deployment | Context Window | Avg Input Tokens | Avg Output Tokens | Avg Tokens / Request | P50 Latency (ms) | P95 Latency (ms) | Throughput (req/s) | Hardware |
|---|---|---|---|---|---|---|---|---|---|---|
| Ollama | `qwen3:4b-instruct` | Local | 8K | 218 | 210.3 | 428.3 | 10,361 | 10,521 | 0.1186 | Apple Silicon (Metal), MacBook Pro M4 |
| vLLM | `Qwen3-4B-Instruct-2507` | Local | 4K | 218 | 211.3 | 429.3 | 11,965 | 18,806 | 0.0706 | Apple Silicon (Metal), MacBook Pro M4 |
| Intel OPEA EI | `Qwen/Qwen3-4B-Instruct-2507` | Enterprise (on-prem) | 8.1K | 218 | 211.7 | 429.7 | 12,732 | 13,277 | 0.1036 | CPU-only (Xeon) |
| OpenAI (Cloud) | `gpt-4o-mini` | API (cloud) | 128K | 216.7 | 204.7 | 421.3 | 4,563 | 6,969 | 0.2126 | N/A |
Notes:
- Context Window for Ollama (8K) and vLLM (4K) reflects the `LLM_MAX_TOKENS` / `--max-model-len` used during benchmarking, not the model's native 262K context. vLLM shares its 4K context between input and output tokens.
- All benchmarks use the same CodeTrans translation prompt and identical inputs (3 runs: small python→java, medium python→rust, large python→go). Token counts may vary slightly per run due to non-deterministic model output.
- Ollama on Apple Silicon uses Metal (MPS) GPU acceleration — running it inside Docker would fall back to CPU-only inference. The `qwen3:4b-instruct` tag must be used (not `qwen3:4b`) to disable the default thinking mode.
- vLLM on Apple Silicon uses vllm-metal — the standard `pip install vllm` does not support macOS.
- Intel OPEA Enterprise Inference runs on Intel Xeon CPUs without GPU acceleration.
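For reference, the P50/P95 latency columns above can be reproduced from raw per-request timings with a simple nearest-rank percentile. This is a generic sketch, not the exact script used for the benchmark:

```python
# Nearest-rank percentile over raw latency samples (milliseconds).
def percentile(samples: list[float], p: float) -> float:
    ordered = sorted(samples)
    k = round(p / 100 * (len(ordered) - 1))
    return ordered[max(0, min(len(ordered) - 1, k))]

# Example with made-up timings; a real run collects one value per request.
latencies_ms = [9800, 10361, 10450, 10521, 11002]
p50 = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)
```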
A 4-billion-parameter open-weight code model from Alibaba's Qwen team (July 2025 release), designed for on-prem and edge deployment.
| Attribute | Details |
|---|---|
| Parameters | 4.0B total (3.6B non-embedding) |
| Architecture | Transformer with Grouped Query Attention (GQA) — 36 layers, 32 Q-heads / 8 KV-heads |
| Context Window | 262,144 tokens (256K) native |
| Reasoning Mode | Non-thinking only (Instruct-2507 variant). Separate Thinking-2507 variant available with always-on chain-of-thought |
| Tool / Function Calling | Supported; MCP (Model Context Protocol) compatible |
| Structured Output | JSON-structured responses supported |
| Multilingual | 100+ languages and dialects |
| Code Benchmarks | MultiPL-E: 76.8%, LiveCodeBench v6: 35.1%, BFCL-v3 (tool use): 61.9 |
| Quantization Formats | GGUF (Q4_K_M ~2.5 GB, Q8_0 ~4.3 GB), AWQ (int4), GPTQ (int4), MLX (4-bit ~2.3 GB) |
| Inference Runtimes | Ollama, vLLM, llama.cpp, LMStudio, SGLang, KTransformers |
| Fine-Tuning | Full fine-tuning and adapter-based (LoRA); 5,000+ community adapters on HuggingFace |
| License | Apache 2.0 |
| Deployment | Local, on-prem, air-gapped, cloud — full data sovereignty |
OpenAI's cost-efficient multimodal model, accessible exclusively via cloud API.
| Attribute | Details |
|---|---|
| Parameters | Not publicly disclosed |
| Architecture | Multimodal Transformer (text + image input, text output) |
| Context Window | 128,000 tokens input / 16,384 tokens max output |
| Reasoning Mode | Standard inference (no explicit chain-of-thought toggle) |
| Tool / Function Calling | Supported; parallel function calling |
| Structured Output | JSON mode and strict JSON schema adherence supported |
| Multilingual | Broad multilingual support |
| Code Benchmarks | MMMLU: ~87%, strong HumanEval and MBPP scores |
| Pricing | $0.15 / 1M input tokens, $0.60 / 1M output tokens (Batch API: 50% discount) |
| Fine-Tuning | Supervised fine-tuning via OpenAI API |
| License | Proprietary (OpenAI Terms of Use) |
| Deployment | Cloud-only — OpenAI API or Azure OpenAI Service. No self-hosted or on-prem option |
| Knowledge Cutoff | October 2023 |
| Capability | Qwen3-4B-Instruct-2507 | GPT-4o-mini |
|---|---|---|
| Code translation | Yes | Yes |
| Function / tool calling | Yes | Yes |
| JSON structured output | Yes | Yes |
| On-prem / air-gapped deployment | Yes | No |
| Data sovereignty | Full (weights run locally) | No (data sent to cloud API) |
| Open weights | Yes (Apache 2.0) | No (proprietary) |
| Custom fine-tuning | Full fine-tuning + LoRA adapters | Supervised fine-tuning (API only) |
| Quantization for edge devices | GGUF / AWQ / GPTQ / MLX | N/A |
| Multimodal (image input) | No | Yes |
| Native context window | 256K | 128K |
Both models support code translation, function calling, and JSON-structured output. However, only Qwen3-4B offers open weights, data sovereignty, and local deployment flexibility — making it suitable for air-gapped, regulated, or cost-sensitive environments. GPT-4o-mini offers lower latency and higher throughput via OpenAI's cloud infrastructure, with added multimodal capabilities.
All providers are configured via the `.env` file. Set `INFERENCE_PROVIDER=remote` for any cloud or API-based provider, and `INFERENCE_PROVIDER=ollama` for local inference.
```shell
INFERENCE_PROVIDER=remote
INFERENCE_API_ENDPOINT=https://api.openai.com
INFERENCE_API_TOKEN=sk-...
INFERENCE_MODEL_NAME=gpt-4o
```

Recommended models: `gpt-4o`, `gpt-4o-mini`, `gpt-4-turbo`.
Groq provides OpenAI-compatible endpoints with extremely fast inference (LPU hardware).
```shell
INFERENCE_PROVIDER=remote
INFERENCE_API_ENDPOINT=https://api.groq.com/openai
INFERENCE_API_TOKEN=gsk_...
INFERENCE_MODEL_NAME=llama3-70b-8192
```

Recommended models: `llama3-70b-8192`, `mixtral-8x7b-32768`, `llama-3.1-8b-instant`.
Runs inference locally on the host machine with full GPU acceleration.
- Install Ollama: https://ollama.com/download
- Pull a model:
```shell
# Production — best translation quality (~20 GB)
ollama pull codellama:34b

# Testing / SLM benchmarking (~4 GB, fast)
ollama pull codellama:7b

# Other strong code models
ollama pull deepseek-coder:6.7b
ollama pull qwen2.5-coder:7b
ollama pull codellama:13b
```

- Confirm Ollama is running:

```shell
curl http://localhost:11434/api/tags
```

- Configure `.env`:

```shell
INFERENCE_PROVIDER=ollama
INFERENCE_API_ENDPOINT=http://host.docker.internal:11434
INFERENCE_MODEL_NAME=codellama:7b
# INFERENCE_API_TOKEN is not required for Ollama
```

OpenRouter provides a unified API across hundreds of models from different providers.
```shell
INFERENCE_PROVIDER=remote
INFERENCE_API_ENDPOINT=https://openrouter.ai/api
INFERENCE_API_TOKEN=sk-or-...
INFERENCE_MODEL_NAME=meta-llama/llama-3.1-70b-instruct
```

Recommended models: `meta-llama/llama-3.1-70b-instruct`, `deepseek/deepseek-coder`, `qwen/qwen-2.5-coder-32b-instruct`.
Any enterprise gateway that exposes an OpenAI-compatible /v1/completions or /v1/chat/completions endpoint works without code changes.
GenAI Gateway (LiteLLM-backed):
```shell
INFERENCE_PROVIDER=remote
INFERENCE_API_ENDPOINT=https://genai-gateway.example.com
INFERENCE_API_TOKEN=your-litellm-master-key
INFERENCE_MODEL_NAME=codellama/CodeLlama-34b-Instruct-hf
```

If the endpoint uses a private domain mapped in /etc/hosts, also set:

```shell
LOCAL_URL_ENDPOINT=your-private-domain.internal
```

To switch providers:

- Edit `.env` with the new provider's values.
- Restart the backend container: `docker compose restart transpiler-api`

No rebuild is needed — all settings are injected at runtime via environment variables.
All variables are defined in .env (copied from .env.example). The backend reads them at startup via python-dotenv.
| Variable | Description | Default | Type |
|---|---|---|---|
| `INFERENCE_PROVIDER` | `remote` for any OpenAI-compatible API; `ollama` for local inference | `remote` | string |
| `INFERENCE_API_ENDPOINT` | Base URL of the inference service (no `/v1` suffix) | — | string |
| `INFERENCE_API_TOKEN` | Bearer token / API key. Not required for Ollama | — | string |
| `INFERENCE_MODEL_NAME` | Model identifier passed to the API | `codellama/CodeLlama-34b-Instruct-hf` | string |
| Variable | Description | Default | Type |
|---|---|---|---|
| `LLM_TEMPERATURE` | Sampling temperature. Lower = more deterministic output (0.0–2.0) | `0.2` | float |
| `LLM_MAX_TOKENS` | Maximum tokens in the translated output | `4096` | integer |
| `MAX_CODE_LENGTH` | Maximum input code length in characters | `4000` | integer |
| Variable | Description | Default | Type |
|---|---|---|---|
| `MAX_FILE_SIZE` | Maximum PDF upload size in bytes (default: 10 MB) | `10485760` | integer |
| Variable | Description | Default | Type |
|---|---|---|---|
| `CORS_ALLOW_ORIGINS` | Allowed CORS origins (comma-separated or `*`). Restrict in production | `["*"]` | string |
| Variable | Description | Default | Type |
|---|---|---|---|
| `BACKEND_PORT` | Port the FastAPI server listens on | `5001` | integer |
| `LOCAL_URL_ENDPOINT` | Private domain in /etc/hosts the container must resolve. Leave as `not-needed` if not applicable | `not-needed` | string |
| `VERIFY_SSL` | Set `false` only for environments with self-signed certificates | `true` | boolean |
- Framework: FastAPI (Python 3.11+) with Uvicorn ASGI server
- LLM Integration: `openai` Python SDK — works with any OpenAI-compatible endpoint (remote or Ollama)
- Local Inference: Ollama — runs natively on the host with full Metal (MPS) or CUDA GPU acceleration
- PDF Processing: PyMuPDF (`fitz`) for text and code extraction from uploaded documents
- Config Management: `python-dotenv` for environment variable injection at startup
- Data Validation: Pydantic v2 for request/response schema enforcement
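A minimal Pydantic v2 schema in the spirit of `api/models.py` — the field names here are assumptions for illustration, not the repo's actual schema:

```python
from pydantic import BaseModel, field_validator

class TranslateRequest(BaseModel):
    """Illustrative request schema; the real one lives in api/models.py."""
    source_language: str
    target_language: str
    code: str

    @field_validator("code")
    @classmethod
    def code_not_empty(cls, value: str) -> str:
        # Reject whitespace-only submissions before they reach the LLM.
        if not value.strip():
            raise ValueError("code must not be empty")
        return value
```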
- Framework: React 18 with Vite (fast HMR and production bundler)
- Styling: Tailwind CSS v3 with a custom `surface-*` dark-mode color palette
- Production Server: Nginx — serves the built assets and proxies `/api/` to the backend container
- UI Features: language pill selectors, side-by-side code editor, drag-and-drop PDF upload, real-time character counter, one-click copy, dark/light theme toggle
For common issues and solutions, see TROUBLESHOOTING.md.
Issue: Backend returns 500 on translate
```shell
# Check backend logs for error details
docker compose logs transpiler-api

# Verify the inference endpoint and token are set correctly
grep INFERENCE .env
```

- Confirm `INFERENCE_API_ENDPOINT` is reachable from your machine.
- Verify `INFERENCE_API_TOKEN` is valid and has the correct permissions.
Issue: Ollama connection refused
```shell
# Confirm Ollama is running on the host
curl http://localhost:11434/api/tags

# If not running, start it
ollama serve
```

Issue: Ollama is slow / appears to be CPU-only
- Ensure Ollama is running natively on the host, not inside Docker.
- On macOS, verify the Ollama app is using MPS in Activity Monitor (GPU History).
- See the Ollama section for correct setup.
Issue: SSL certificate errors
```shell
# In .env
VERIFY_SSL=false

# Restart the backend
docker compose restart transpiler-api
```

Issue: PDF upload fails or returns no code
- Max file size: 10 MB (`MAX_FILE_SIZE`)
- Supported format: PDF only (text-based; scanned image PDFs are not supported)
- Ensure the file is not corrupted or password-protected
Issue: Frontend cannot connect to API
```shell
# Verify both containers are running
docker compose ps

# Check CORS settings
grep CORS .env
```

Ensure `CORS_ALLOW_ORIGINS` includes the frontend origin (e.g., http://localhost:3000).
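One common pitfall is how the comma-separated `CORS_ALLOW_ORIGINS` value becomes the origin list the middleware expects. A plausible parsing sketch — the helper name is ours, not necessarily how `config.py` does it:

```python
import os

def parse_origins(raw: str) -> list[str]:
    """Turn 'a, b, c' or '*' into the list FastAPI's CORSMiddleware expects."""
    if raw.strip() == "*":
        return ["*"]
    return [origin.strip() for origin in raw.split(",") if origin.strip()]

# Read the variable with a permissive default, as the backend's defaults suggest.
origins = parse_origins(os.getenv("CORS_ALLOW_ORIGINS", "*"))
```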
Issue: Private domain not resolving inside container
Set LOCAL_URL_ENDPOINT=your-private-domain.internal in .env — this adds the host-gateway mapping for the container.
Enable verbose logging for deeper inspection:
```shell
# Not a built-in env var — increase the FastAPI log level via Uvicorn.
# Edit the docker-compose.yaml command, or run locally:
uvicorn server:app --reload --port 5001 --log-level debug
```

Or view real-time container logs:

```shell
docker compose logs -f transpiler-api
```

This project is licensed under the terms in the LICENSE file — see that file for details.
CodeTrans is provided as-is for demonstration and educational purposes. While we strive for accuracy:
- Translated code should be reviewed by a qualified engineer before use in production systems
- Do not rely solely on AI-generated translations without testing and validation
- Do not submit confidential or proprietary code to third-party API providers without reviewing their data handling policies
- The quality of translation depends on the underlying model and may vary across language pairs and code complexity
For full disclaimer details, see DISCLAIMER.md.
