
cld2labs/CodeTrans

CodeTrans — AI-Powered Code Translation

An AI-powered full-stack application that translates source code between programming languages. Paste code (or upload a PDF), pick your source and target languages, and get idiomatic translated output in seconds — powered by any OpenAI-compatible LLM endpoint or a locally running Ollama model.


Project Overview

CodeTrans demonstrates how code-specialized large language models can be used to translate source code between programming languages. It supports six languages — Java, C, C++, Python, Rust, and Go — and works with any OpenAI-compatible inference endpoint or a locally running Ollama instance.

This makes CodeTrans suitable for:

  • Enterprise deployments — connect to a GenAI Gateway or any managed LLM API
  • Air-gapped environments — run fully offline with Ollama and a locally hosted model
  • Local experimentation — quick setup on a laptop with GPU-accelerated inference
  • Hardware benchmarking — measure SLM throughput on Apple Silicon, CUDA, or Intel Gaudi hardware

How It Works

  1. The user pastes code or uploads a PDF in the browser.
  2. The React frontend sends the source code and language selection to the FastAPI backend.
  3. If a PDF was uploaded, a text extraction service pulls the code out of the document.
  4. The backend constructs a structured prompt and calls the configured LLM endpoint (remote API or local Ollama).
  5. The LLM returns the translated code, which is displayed in the output panel.
  6. The user copies the result with one click.

All inference logic is abstracted behind a single INFERENCE_PROVIDER environment variable — switching between providers requires only a .env change and a container restart.
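A minimal sketch of what such env-driven dispatch can look like (illustrative only — the real logic lives in services/api_client.py, and the helper name and defaults here are assumptions):

```python
import os

# Hypothetical helper: pick the inference base URL from INFERENCE_PROVIDER.
def resolve_endpoint() -> str:
    """Return the base URL for the configured inference provider."""
    provider = os.environ.get("INFERENCE_PROVIDER", "remote")
    if provider == "ollama":
        # Local Ollama, reached from the container via host.docker.internal
        return os.environ.get("INFERENCE_API_ENDPOINT",
                              "http://host.docker.internal:11434")
    # Any OpenAI-compatible remote API
    return os.environ["INFERENCE_API_ENDPOINT"]

os.environ["INFERENCE_PROVIDER"] = "ollama"
print(resolve_endpoint())
```

Because the provider is resolved at startup from the environment, no code path needs to change when switching between remote and local inference.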


Architecture

The application follows a modular two-service architecture with a React frontend and a FastAPI backend. The backend handles all inference orchestration, PDF extraction, and optional LLM observability tracing. The inference layer is fully pluggable — any OpenAI-compatible remote endpoint or a locally running Ollama instance can be used without any code changes.

Architecture Diagram

graph TB
    subgraph "User Interface (port 3000)"
        A[React Frontend]
        A1[Code Input]
        A2[PDF Upload]
        A3[Language Selection]
    end

    subgraph "FastAPI Backend (port 5001)"
        B[API Server]
        C[PDF Service]
        D[API Client]
    end

    subgraph "Inference - Option A: Remote"
        E[OpenAI / Groq / OpenRouter<br/>Enterprise Gateway]
    end

    subgraph "Inference - Option B: Local"
        F[Ollama on Host<br/>host.docker.internal:11434]
    end

    A1 --> B
    A2 --> B
    A3 --> B
    B --> C
    C -->|Extracted Code| B
    B --> D
    D -->|INFERENCE_PROVIDER=remote| E
    D -->|INFERENCE_PROVIDER=ollama| F
    E -->|Translated Code| D
    F -->|Translated Code| D
    D --> B
    B --> A

    style A fill:#e1f5ff,color:#000
    style B fill:#fff4e1,color:#000
    style E fill:#e1ffe1,color:#000
    style F fill:#f3e5f5,color:#000

Architecture Components

Frontend (React + Vite)

  • Side-by-side code editor with language pill selectors for source and target
  • PDF drag-and-drop upload that populates the source panel automatically
  • Real-time character counter and live status indicator
  • Dark mode (default) with localStorage persistence and flash prevention
  • One-click copy of translated output
  • Nginx serves the production build and proxies all /api/ requests to the backend
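The proxy rule is conceptually a single location block. This is a hedged sketch — the actual nginx config ships inside the ui/ image and its directives may differ:

```nginx
# Illustrative only — serve the built SPA and forward API calls to the backend
location / {
    root /usr/share/nginx/html;
    try_files $uri /index.html;          # SPA fallback for client-side routes
}

location /api/ {
    proxy_pass http://transpiler-api:5001;   # backend container (URI preserved)
    proxy_set_header Host $host;
}
```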

Backend Services

  • API Server (server.py): FastAPI application with CORS middleware, request validation, and routing
  • API Client (services/api_client.py): Handles both inference paths — text completions for remote endpoints and chat completions for Ollama — with token-based auth support
  • PDF Service (services/pdf_service.py): Extracts code from uploaded PDF files using pattern recognition
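As an illustration of the kind of pattern recognition such a service can apply (a heuristic sketch, not the actual pdf_service.py logic), code lines can be separated from surrounding prose by symbol density and keyword cues:

```python
import re

# Hypothetical heuristic: keep lines that look like code rather than prose.
CODE_HINTS = re.compile(
    r"[{}();=<>\[\]]"                       # symbols rare in prose
    r"|^\s{4,}"                             # deep indentation
    r"|^(def|class|import|for|while|if)\b"  # common code keywords
)

def extract_code_lines(text: str) -> str:
    """Return only the lines of `text` that resemble source code."""
    kept = [line for line in text.splitlines() if CODE_HINTS.search(line)]
    return "\n".join(kept)

sample = "Figure 1 shows the loop.\nfor i in range(3):\n    print(i)"
print(extract_code_lines(sample))   # prose line is dropped, code survives
```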

External Integration

  • Remote inference: Any OpenAI-compatible API (OpenAI, Groq, OpenRouter, GenAI Gateway)
  • Local inference: Ollama running natively on the host machine, accessed from the container via host.docker.internal:11434

Service Components

| Service | Container | Host Port | Description |
|---|---|---|---|
| transpiler-api | transpiler-api | 5001 | FastAPI backend — input validation, PDF extraction, inference orchestration |
| transpiler-ui | transpiler-ui | 3000 | React frontend — served by Nginx, proxies /api/ to the backend |

Ollama is intentionally not a Docker service. On macOS (Apple Silicon), running Ollama in Docker bypasses Metal GPU (MPS) acceleration, resulting in CPU-only inference. Ollama must run natively on the host so the backend container can reach it via host.docker.internal:11434.
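On Linux, host.docker.internal is not defined by default; a common way to provide it is a host-gateway mapping on the backend service. This is an illustrative compose fragment — check docker-compose.yaml for the project's actual configuration:

```yaml
# Illustrative fragment — maps host.docker.internal to the Docker host on Linux
services:
  transpiler-api:
    extra_hosts:
      - "host.docker.internal:host-gateway"
```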

Typical Flow

  1. User enters code or uploads a PDF in the web UI.
  2. The backend validates the input; PDF text is extracted if needed.
  3. The backend calls the configured inference endpoint (remote API or Ollama).
  4. The model returns translated code, which is displayed in the right panel.
  5. User copies the result with one click.

Get Started

Prerequisites

Before you begin, ensure you have the following installed and configured:

  • Docker and Docker Compose (v2)
  • An inference endpoint — one of:
    • A remote OpenAI-compatible API key (OpenAI, Groq, OpenRouter, or enterprise gateway)
    • Ollama installed natively on the host machine

Verify Installation

docker --version
docker compose version
docker ps

Quick Start (Docker Deployment)

1. Clone the Repository

git clone https://github.com/cld2labs/CodeTrans.git
cd CodeTrans

2. Configure the Environment

cp .env.example .env

Open .env and set INFERENCE_PROVIDER plus the corresponding variables for your chosen provider. See LLM Provider Configuration for per-provider instructions.

3. Build and Start the Application

# Standard (attached)
docker compose up --build

# Detached (background)
docker compose up -d --build

4. Access the Application

Once the containers are running, open:

  • Web UI: http://localhost:3000
  • Backend API (health check): http://localhost:5001/health

5. Verify Services

# Health check
curl http://localhost:5001/health

# View running containers
docker compose ps

View logs:

# All services
docker compose logs -f

# Backend only
docker compose logs -f transpiler-api

# Frontend only
docker compose logs -f transpiler-ui

6. Stop the Application

docker compose down

Local Development Setup

Run the backend and frontend directly on the host without Docker.

Backend (Python / FastAPI)

cd api
python -m venv .venv
source .venv/bin/activate        # Windows: .venv\Scripts\activate
pip install -r requirements.txt
cp ../.env.example ../.env       # configure your .env at the repo root
uvicorn server:app --reload --port 5001

Frontend (Node / Vite)

cd ui
npm install
npm run dev

The Vite dev server proxies /api/ to http://localhost:5001. Open http://localhost:5173.


Project Structure

CodeTrans/
├── api/                        # FastAPI backend
│   ├── config.py               # All environment-driven settings
│   ├── models.py               # Pydantic request/response schemas
│   ├── server.py               # FastAPI app, routes, and middleware
│   ├── services/
│   │   ├── api_client.py       # LLM inference client (remote + Ollama)
│   │   └── pdf_service.py      # PDF text and code extraction
│   ├── Dockerfile
│   └── requirements.txt
├── ui/                         # React frontend
│   ├── src/
│   │   ├── App.jsx
│   │   ├── components/
│   │   │   ├── CodeTranslator.jsx   # Main editor panel
│   │   │   ├── Header.jsx
│   │   │   ├── PDFUploader.jsx
│   │   │   └── StatusBar.jsx
│   │   └── main.jsx
│   ├── Dockerfile
│   └── vite.config.js
├── docs/
│   └── assets/                 # Documentation images
├── docker-compose.yaml         # Main orchestration file
├── .env.example                # Environment variable reference
└── README.md

Usage Guide

Translate code:

  1. Open the application at http://localhost:3000.
  2. Select the source language using the pill buttons at the top-left.
  3. Select the target language using the pill buttons at the top-right.
  4. Paste or type your code in the left panel.
  5. Click Translate Code.
  6. View the result in the right panel and click Copy to copy it to the clipboard.

Upload a PDF:

  1. Scroll to the Upload PDF section below the code panels.
  2. Drag and drop a PDF file, or click to browse.
  3. Code is extracted automatically and placed in the source panel.
  4. Select your languages and translate as normal.

Dark mode:

The app defaults to dark mode. Click the theme toggle in the header to switch to light mode. Your preference is saved in localStorage.


Performance Tips

  • Use the largest model your hardware can sustain. codellama:34b produces the best translation quality; codellama:7b is faster and good for benchmarking.
  • Lower LLM_TEMPERATURE (e.g., 0.1) for more deterministic, literal translations. Raise it slightly (e.g., 0.3–0.5) if you want more idiomatic rewrites.
  • Keep inputs under MAX_CODE_LENGTH. Shorter, focused snippets translate more accurately than entire files. Split large files by class or function.
  • On Apple Silicon, always run Ollama natively — never inside Docker. The MPS (Metal) GPU backend delivers 5–10x the throughput of CPU-only inference.
  • On Linux with an NVIDIA GPU, set CUDA_VISIBLE_DEVICES before starting Ollama to target a specific GPU.
  • For enterprise remote APIs, choose a model with a large context window (≥16k tokens) to avoid truncation on longer inputs.
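For example, a large Python file can be split into per-function chunks before translation. This is a hedged sketch using the standard ast module; CodeTrans itself does not do this splitting automatically:

```python
import ast

def split_by_function(source: str) -> list[str]:
    """Split a Python module into one snippet per top-level function/class."""
    tree = ast.parse(source)
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # get_source_segment recovers the exact source text of the node
            chunks.append(ast.get_source_segment(source, node))
    return chunks

src = "def a():\n    return 1\n\ndef b():\n    return 2\n"
print(len(split_by_function(src)))  # each chunk can be translated separately
```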

Inference Benchmarks

The table below compares inference performance across different providers, deployment modes, and hardware profiles using a standardized code-translation workload (averaged over 3 runs).

| Provider | Model | Deployment | Context Window | Avg Input Tokens | Avg Output Tokens | Avg Tokens / Request | P50 Latency (ms) | P95 Latency (ms) | Throughput (req/s) | Hardware |
|---|---|---|---|---|---|---|---|---|---|---|
| Ollama | qwen3:4b-instruct | Local | 8K | 218 | 210.3 | 428.3 | 10,361 | 10,521 | 0.1186 | Apple Silicon (Metal), MacBook Pro M4 |
| vLLM | Qwen3-4B-Instruct-2507 | Local | 4K | 218 | 211.3 | 429.3 | 11,965 | 18,806 | 0.0706 | Apple Silicon (Metal), MacBook Pro M4 |
| Intel OPEA EI | Qwen/Qwen3-4B-Instruct-2507 | Enterprise (On-Prem) | 8.1K | 218 | 211.7 | 429.7 | 12,732 | 13,277 | 0.1036 | CPU-only (Xeon) |
| OpenAI (Cloud) | gpt-4o-mini | API (Cloud) | 128K | 216.7 | 204.7 | 421.3 | 4,563 | 6,969 | 0.2126 | N/A |

Notes:

  • Context Window for Ollama (8K) and vLLM (4K) reflects the LLM_MAX_TOKENS / --max-model-len used during benchmarking, not the model's native 262K context. vLLM shares its 4K context between input and output tokens.
  • All benchmarks use the same CodeTrans translation prompt and identical inputs (3 runs: small python→java, medium python→rust, large python→go). Token counts may vary slightly per run due to non-deterministic model output.
  • Ollama on Apple Silicon uses Metal (MPS) GPU acceleration — running it inside Docker would fall back to CPU-only inference. The qwen3:4b-instruct tag must be used (not qwen3:4b) to disable the default thinking mode.
  • vLLM on Apple Silicon uses vllm-metal — the standard pip install vllm does not support macOS.
  • Intel OPEA Enterprise Inference runs on Intel Xeon CPUs without GPU acceleration.

Model Capabilities

Qwen3-4B-Instruct-2507

A 4-billion-parameter open-weight code model from Alibaba's Qwen team (July 2025 release), designed for on-prem and edge deployment.

| Attribute | Details |
|---|---|
| Parameters | 4.0B total (3.6B non-embedding) |
| Architecture | Transformer with Grouped Query Attention (GQA) — 36 layers, 32 Q-heads / 8 KV-heads |
| Context Window | 262,144 tokens (256K) native |
| Reasoning Mode | Non-thinking only (Instruct-2507 variant). A separate Thinking-2507 variant is available with always-on chain-of-thought |
| Tool / Function Calling | Supported; MCP (Model Context Protocol) compatible |
| Structured Output | JSON-structured responses supported |
| Multilingual | 100+ languages and dialects |
| Code Benchmarks | MultiPL-E: 76.8%, LiveCodeBench v6: 35.1%, BFCL-v3 (tool use): 61.9 |
| Quantization Formats | GGUF (Q4_K_M ~2.5 GB, Q8_0 ~4.3 GB), AWQ (int4), GPTQ (int4), MLX (4-bit ~2.3 GB) |
| Inference Runtimes | Ollama, vLLM, llama.cpp, LM Studio, SGLang, KTransformers |
| Fine-Tuning | Full fine-tuning and adapter-based (LoRA); 5,000+ community adapters on Hugging Face |
| License | Apache 2.0 |
| Deployment | Local, on-prem, air-gapped, cloud — full data sovereignty |

GPT-4o-mini

OpenAI's cost-efficient multimodal model, accessible exclusively via cloud API.

| Attribute | Details |
|---|---|
| Parameters | Not publicly disclosed |
| Architecture | Multimodal Transformer (text + image input, text output) |
| Context Window | 128,000 tokens input / 16,384 tokens max output |
| Reasoning Mode | Standard inference (no explicit chain-of-thought toggle) |
| Tool / Function Calling | Supported; parallel function calling |
| Structured Output | JSON mode and strict JSON schema adherence supported |
| Multilingual | Broad multilingual support |
| Code Benchmarks | MMMLU: ~87%, strong HumanEval and MBPP scores |
| Pricing | $0.15 / 1M input tokens, $0.60 / 1M output tokens (Batch API: 50% discount) |
| Fine-Tuning | Supervised fine-tuning via OpenAI API |
| License | Proprietary (OpenAI Terms of Use) |
| Deployment | Cloud-only — OpenAI API or Azure OpenAI Service. No self-hosted or on-prem option |
| Knowledge Cutoff | October 2023 |

Comparison Summary

| Capability | Qwen3-4B-Instruct-2507 | GPT-4o-mini |
|---|---|---|
| Code translation | Yes | Yes |
| Function / tool calling | Yes | Yes |
| JSON structured output | Yes | Yes |
| On-prem / air-gapped deployment | Yes | No |
| Data sovereignty | Full (weights run locally) | No (data sent to cloud API) |
| Open weights | Yes (Apache 2.0) | No (proprietary) |
| Custom fine-tuning | Full fine-tuning + LoRA adapters | Supervised fine-tuning (API only) |
| Quantization for edge devices | GGUF / AWQ / GPTQ / MLX | N/A |
| Multimodal (image input) | No | Yes |
| Native context window | 256K | 128K |

Both models support code translation, function calling, and JSON-structured output. However, only Qwen3-4B offers open weights, data sovereignty, and local deployment flexibility — making it suitable for air-gapped, regulated, or cost-sensitive environments. GPT-4o-mini offers lower latency and higher throughput via OpenAI's cloud infrastructure, with added multimodal capabilities.


LLM Provider Configuration

All providers are configured via the .env file. Set INFERENCE_PROVIDER=remote for any cloud or API-based provider, and INFERENCE_PROVIDER=ollama for local inference.

OpenAI

INFERENCE_PROVIDER=remote
INFERENCE_API_ENDPOINT=https://api.openai.com
INFERENCE_API_TOKEN=sk-...
INFERENCE_MODEL_NAME=gpt-4o

Recommended models: gpt-4o, gpt-4o-mini, gpt-4-turbo.

Groq

Groq provides OpenAI-compatible endpoints with extremely fast inference (LPU hardware).

INFERENCE_PROVIDER=remote
INFERENCE_API_ENDPOINT=https://api.groq.com/openai
INFERENCE_API_TOKEN=gsk_...
INFERENCE_MODEL_NAME=llama3-70b-8192

Recommended models: llama3-70b-8192, mixtral-8x7b-32768, llama-3.1-8b-instant.

Ollama

Runs inference locally on the host machine with full GPU acceleration.

  1. Install Ollama: https://ollama.com/download
  2. Pull a model:
 # Production — best translation quality (~20 GB)
 ollama pull codellama:34b

 # Testing / SLM benchmarking (~4 GB, fast)
 ollama pull codellama:7b

 # Other strong code models
 ollama pull deepseek-coder:6.7b
 ollama pull qwen2.5-coder:7b
 ollama pull codellama:13b
  3. Confirm Ollama is running:
 curl http://localhost:11434/api/tags
  4. Configure .env:
 INFERENCE_PROVIDER=ollama
 INFERENCE_API_ENDPOINT=http://host.docker.internal:11434
 INFERENCE_MODEL_NAME=codellama:7b
 # INFERENCE_API_TOKEN is not required for Ollama

OpenRouter

OpenRouter provides a unified API across hundreds of models from different providers.

INFERENCE_PROVIDER=remote
INFERENCE_API_ENDPOINT=https://openrouter.ai/api
INFERENCE_API_TOKEN=sk-or-...
INFERENCE_MODEL_NAME=meta-llama/llama-3.1-70b-instruct

Recommended models: meta-llama/llama-3.1-70b-instruct, deepseek/deepseek-coder, qwen/qwen-2.5-coder-32b-instruct.

Custom OpenAI-Compatible API

Any enterprise gateway that exposes an OpenAI-compatible /v1/completions or /v1/chat/completions endpoint works without code changes.
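A request against such a gateway is just the standard chat-completions payload. The sketch below shapes one; the endpoint path and field names follow the OpenAI API, but the model name, temperature, and prompt wording are placeholders, not the prompt CodeTrans actually sends:

```python
import json

def build_translation_request(code: str, source_lang: str, target_lang: str,
                              model: str) -> dict:
    """Shape an OpenAI-compatible /v1/chat/completions request body."""
    return {
        "model": model,
        "temperature": 0.2,
        "messages": [
            {"role": "system",
             "content": f"Translate {source_lang} code to idiomatic "
                        f"{target_lang}. Return only code."},
            {"role": "user", "content": code},
        ],
    }

body = build_translation_request("print('hi')", "Python", "Go", "gpt-4o-mini")
print(json.dumps(body)[:60])
```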

GenAI Gateway (LiteLLM-backed):

INFERENCE_PROVIDER=remote
INFERENCE_API_ENDPOINT=https://genai-gateway.example.com
INFERENCE_API_TOKEN=your-litellm-master-key
INFERENCE_MODEL_NAME=codellama/CodeLlama-34b-Instruct-hf

If the endpoint uses a private domain mapped in /etc/hosts, also set:

LOCAL_URL_ENDPOINT=your-private-domain.internal

Switching Providers

  1. Edit .env with the new provider's values.
  2. Restart the backend container:
 docker compose restart transpiler-api

No rebuild is needed — all settings are injected at runtime via environment variables.


Environment Variables

All variables are defined in .env (copied from .env.example). The backend reads them at startup via python-dotenv.

Core LLM Configuration

| Variable | Description | Default | Type |
|---|---|---|---|
| INFERENCE_PROVIDER | remote for any OpenAI-compatible API; ollama for local inference | remote | string |
| INFERENCE_API_ENDPOINT | Base URL of the inference service (no /v1 suffix) | — | string |
| INFERENCE_API_TOKEN | Bearer token / API key. Not required for Ollama | — | string |
| INFERENCE_MODEL_NAME | Model identifier passed to the API | codellama/CodeLlama-34b-Instruct-hf | string |

Generation Parameters

| Variable | Description | Default | Type |
|---|---|---|---|
| LLM_TEMPERATURE | Sampling temperature. Lower = more deterministic output (0.0–2.0) | 0.2 | float |
| LLM_MAX_TOKENS | Maximum tokens in the translated output | 4096 | integer |
| MAX_CODE_LENGTH | Maximum input code length in characters | 4000 | integer |

File Upload Limits

| Variable | Description | Default | Type |
|---|---|---|---|
| MAX_FILE_SIZE | Maximum PDF upload size in bytes (default: 10 MB) | 10485760 | integer |

CORS Configuration

| Variable | Description | Default | Type |
|---|---|---|---|
| CORS_ALLOW_ORIGINS | Allowed CORS origins (comma-separated or *). Restrict in production | ["*"] | string |
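Parsing the comma-separated value into the list that FastAPI's CORSMiddleware expects might look like this (an illustrative helper — the real parsing lives in config.py and may differ):

```python
# Hypothetical helper: turn the CORS_ALLOW_ORIGINS env string into a list.
def parse_cors_origins(raw: str) -> list[str]:
    """Turn 'http://a, http://b' or '*' into a list for CORSMiddleware."""
    return [origin.strip() for origin in raw.split(",") if origin.strip()]

print(parse_cors_origins("*"))  # ['*']
```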

Server Configuration

Variable Description Default Type
BACKEND_PORT Port the FastAPI server listens on 5001 integer
LOCAL_URL_ENDPOINT Private domain in /etc/hosts the container must resolve. Leave as not-needed if not applicable not-needed string
VERIFY_SSL Set false only for environments with self-signed certificates true boolean

Technology Stack

Backend

  • Framework: FastAPI (Python 3.11+) with Uvicorn ASGI server
  • LLM Integration: openai Python SDK — works with any OpenAI-compatible endpoint (remote or Ollama)
  • Local Inference: Ollama — runs natively on host with full Metal (MPS) or CUDA GPU acceleration
  • PDF Processing: PyMuPDF (fitz) for text and code extraction from uploaded documents
  • Config Management: python-dotenv for environment variable injection at startup
  • Data Validation: Pydantic v2 for request/response schema enforcement

Frontend

  • Framework: React 18 with Vite (fast HMR and production bundler)
  • Styling: Tailwind CSS v3 with custom surface-* dark mode color palette
  • Production Server: Nginx — serves the built assets and proxies /api/ to the backend container
  • UI Features: Language pill selectors, side-by-side code editor, drag-and-drop PDF upload, real-time character counter, one-click copy, dark/light theme toggle

Troubleshooting

For common issues and solutions, see TROUBLESHOOTING.md.

Common Issues

Issue: Backend returns 500 on translate

# Check backend logs for error details
docker compose logs transpiler-api

# Verify the inference endpoint and token are set correctly
grep INFERENCE .env
  • Confirm INFERENCE_API_ENDPOINT is reachable from your machine.
  • Verify INFERENCE_API_TOKEN is valid and has the correct permissions.

Issue: Ollama connection refused

# Confirm Ollama is running on the host
curl http://localhost:11434/api/tags

# If not running, start it
ollama serve

Issue: Ollama is slow / appears to be CPU-only

  • Ensure Ollama is running natively on the host, not inside Docker.
  • On macOS, verify the Ollama app is using MPS in Activity Monitor (GPU History).
  • See the Ollama section for correct setup.

Issue: SSL certificate errors

# In .env
VERIFY_SSL=false

# Restart the backend
docker compose restart transpiler-api

Issue: PDF upload fails or returns no code

  • Max file size: 10 MB (MAX_FILE_SIZE)
  • Supported format: PDF only (text-based; scanned image PDFs are not supported)
  • Ensure the file is not corrupted or password-protected

Issue: Frontend cannot connect to API

# Verify both containers are running
docker compose ps

# Check CORS settings
grep CORS .env

Ensure CORS_ALLOW_ORIGINS includes the frontend origin (e.g., http://localhost:3000).

Issue: Private domain not resolving inside container

Set LOCAL_URL_ENDPOINT=your-private-domain.internal in .env — this adds the host-gateway mapping for the container.

Debug Mode

Enable verbose logging for deeper inspection:

# Not a built-in env var — increase FastAPI log level via Uvicorn
# Edit docker-compose.yaml command or run locally:
uvicorn server:app --reload --port 5001 --log-level debug

Or view real-time container logs:

docker compose logs -f transpiler-api

License

This project is licensed under the terms described in the LICENSE file. See LICENSE for details.


Disclaimer

CodeTrans is provided as-is for demonstration and educational purposes. While we strive for accuracy:

  • Translated code should be reviewed by a qualified engineer before use in production systems
  • Do not rely solely on AI-generated translations without testing and validation
  • Do not submit confidential or proprietary code to third-party API providers without reviewing their data handling policies
  • The quality of translation depends on the underlying model and may vary across language pairs and code complexity

For full disclaimer details, see DISCLAIMER.md.

About

AI-powered code translation tool that converts source code between programming languages using LLMs. Built with a FastAPI backend and React frontend, supports remote OpenAI-compatible APIs and local Ollama inference. Containerized with Docker Compose.
