
Charpup/game-localization-mvr


Game Localization MVR (Minimum Viable Rules) v2.1

A robust, automated workflow system for game localization with strict validation, AI translation/repair, glossary management, and multi-format export.

Core Principle: Input rows == Output rows ALWAYS. No silent data loss.


🤖 For AI Coding Agents

Quick Commands for Agents:

# 1. Verify LLM connectivity (MUST run first)
python scripts/llm_ping.py

# 2. Validate workflow configuration (dry-run)
python scripts/translate_llm.py --input input.csv --output output.csv --style workflow/style_guide.md --glossary glossary/compiled.yaml --style-profile workflow/style_profile.generated.yaml --dry-run

# 3. Run E2E test
python scripts/test_e2e_workflow.py

Environment Variables (REQUIRED):

LLM_BASE_URL=https://api.example.com/v1
LLM_API_KEY=sk-your-key
LLM_MODEL=gpt-4.1-mini
LLM_TRACE_PATH=data/llm_trace.jsonl

Key Rules for Agents:

  1. Never hardcode API keys - Use environment variables only
  2. Run llm_ping.py first - Fail-fast if LLM unavailable
  3. Check WORKSPACE_RULES.md - See docs/WORKSPACE_RULES.md for hard constraints
  4. Row preservation is P0 - Empty source rows must be preserved with status=skipped_empty
  5. Glossary is mandatory - glossary/compiled.yaml must exist before translation
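Rule 4 can be illustrated with a minimal sketch. The column names `source_zh` and `target`, the `ok` status value, and the `translate` callback are assumptions made for illustration; only `status=skipped_empty` comes from the rules above, and this is not the repository's actual implementation.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]


def translate_rows(rows: List[Row], translate: Callable[[str], str]) -> List[Row]:
    """Return exactly one output row per input row; empty sources are kept."""
    out: List[Row] = []
    for row in rows:
        source = row.get("source_zh", "")  # hypothetical column name
        if not source.strip():
            # Preserve the row and mark it instead of silently dropping it.
            out.append({**row, "target": "", "status": "skipped_empty"})
        else:
            # "ok" is a placeholder status chosen for this sketch.
            out.append({**row, "target": translate(source), "status": "ok"})
    # Core principle: input rows == output rows.
    assert len(out) == len(rows), "row count drifted"
    return out
```

Calling `translate_rows(rows, my_translate)` on a list that contains blank sources returns a list of the same length, with the blank rows marked rather than removed.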

🚀 Pipeline Overview

Input CSV → Normalize → Translate → QA_Hard → Repair → Export
                ↓
            Glossary (required)
| Step | Script | Purpose | Blocking? |
| --- | --- | --- | --- |
| 0 | llm_ping.py | 🔌 LLM connectivity check | YES |
| 1 | normalize_guard.py | 🧊 Freeze placeholders → tokens | YES |
| 2-4 | extract_terms.py → glossary_compile.py | 📖 Build glossary | YES |
| 5 | translate_llm.py | 🤖 AI Translation | YES |
| 6 | qa_hard.py | 🛡️ Validate tokens/patterns | YES |
| 7 | repair_loop.py | 🔧 Auto-repair hard errors | - |
| 8 | soft_qa_llm.py | 🧠 Quality review | - |
| 10 | rehydrate_export.py | 💧 Restore tokens → placeholders | YES |
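The freeze, validate, and rehydrate steps in the table above can be sketched roughly as follows. This is a simplified illustration, not the code in normalize_guard.py, qa_hard.py, or rehydrate_export.py: the placeholder syntax (`{name}`, `%s`, `%d`) and the `__PH_n__` token format are assumptions, whereas the real scripts read their rules from workflow/placeholder_schema.yaml.

```python
import re
from typing import Dict, Tuple

# Assumed placeholder syntax for this sketch: {name} or printf-style %s / %d.
PLACEHOLDER_RE = re.compile(r"\{[A-Za-z_][A-Za-z0-9_]*\}|%[sd]")


def freeze(text: str) -> Tuple[str, Dict[str, str]]:
    """Replace each placeholder with an opaque token and return the mapping."""
    mapping: Dict[str, str] = {}

    def _swap(match: re.Match) -> str:
        token = f"__PH_{len(mapping)}__"  # hypothetical token format
        mapping[token] = match.group(0)
        return token

    return PLACEHOLDER_RE.sub(_swap, text), mapping


def tokens_intact(frozen_source: str, translated: str, mapping: Dict[str, str]) -> bool:
    """Hard-QA style check: each token occurs as often as in the frozen source."""
    return all(frozen_source.count(t) == translated.count(t) for t in mapping)


def rehydrate(text: str, mapping: Dict[str, str]) -> str:
    """Restore the original placeholders from their tokens."""
    for token, original in mapping.items():
        text = text.replace(token, original)
    return text
```

Freezing before translation means the model only ever sees opaque tokens, so a hard check can compare token counts mechanically before the export step restores the original placeholders.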

📁 Project Structure

loc-mvr/
├── config/
│   ├── llm_routing.yaml    # Model routing per step
│   └── pricing.yaml        # Cost calculation
├── glossary/
│   ├── compiled.yaml       # Active glossary (generated)
│   └── generic_terms_zh.txt # Blacklist for extraction
├── scripts/
│   ├── llm_ping.py         # ★ Run first - connectivity check
│   ├── normalize_guard.py  # Step 1: Placeholder freezing
│   ├── translate_llm.py    # Step 5: Translation
│   ├── qa_hard.py          # Step 6: Hard validation
│   ├── repair_loop.py      # Step 7: Auto-repair
│   └── runtime_adapter.py  # LLM client with routing
├── workflow/
│   ├── style_guide.md      # Translation style rules
│   ├── forbidden_patterns.txt
│   └── placeholder_schema.yaml
└── docs/
    └── WORKSPACE_RULES.md  # ★ Hard constraints for agents

🔧 Quick Start (Human)

1. Setup

git clone https://github.com/Charpup/game-localization-mvr.git
cd game-localization-mvr
pip install pyyaml requests numpy pandas jieba

2. Configure LLM (persistent configuration recommended)

# Windows PowerShell
$env:LLM_BASE_URL="https://api.apiyi.com/v1"
$env:LLM_API_KEY="sk-your-key"
$env:LLM_MODEL="gpt-4.1-mini"

Alternatively, store the settings in a local credentials file, which is read automatically and takes precedence over the environment variables:

# Create this file at main_worktree/.llm_credentials
LLM_BASE_URL=https://api.apiyi.com/v1
LLM_API_KEY=sk-your-key

Current loading order: LLM_API_KEY_FILE -> ./.llm_credentials, ./.llm_env, ./config/llm_credentials.env, or ~/.game-localization-mvr/.llm_credentials -> LLM_API_KEY
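The loading order can be sketched as a small resolver. This is an illustrative reading of that order, not the project's loader: it assumes that `LLM_API_KEY_FILE` points to a `KEY=VALUE` file, that the candidate files are tried in the order listed, and that the function names `parse_api_key` and `resolve_api_key` are hypothetical.

```python
import os
from pathlib import Path
from typing import Mapping, Optional, Sequence

# Candidate files from the loading order; their relative priority is an assumption.
DEFAULT_FILES = (
    Path(".llm_credentials"),
    Path(".llm_env"),
    Path("config/llm_credentials.env"),
    Path.home() / ".game-localization-mvr" / ".llm_credentials",
)


def parse_api_key(text: str) -> Optional[str]:
    """Return the LLM_API_KEY value from KEY=VALUE lines, ignoring comments."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        if name.strip() == "LLM_API_KEY":
            return value.strip().strip('"').strip("'") or None
    return None


def resolve_api_key(
    env: Mapping[str, str] = os.environ,
    files: Sequence[Path] = DEFAULT_FILES,
) -> Optional[str]:
    """Explicit file first, then local files, then the environment variable."""
    explicit = env.get("LLM_API_KEY_FILE")
    candidates = ([Path(explicit)] if explicit else []) + list(files)
    for path in candidates:
        if path.is_file():
            key = parse_api_key(path.read_text(encoding="utf-8"))
            if key:
                return key
    return env.get("LLM_API_KEY") or None
```

Passing `env` and `files` as parameters keeps the lookup testable without touching real credentials on disk.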

4. Dependency + Environment Quick Check (before every smoke run)

python - <<'PY'
import os
import importlib

# jieba is included because it is part of the pip install list in step 1
for pkg in ["requests", "numpy", "yaml", "pandas", "jieba"]:
    try:
        importlib.import_module(pkg)
        print(f"[OK] {pkg}")
    except Exception:
        print(f"[MISSING] {pkg}")

for key in ["LLM_BASE_URL", "LLM_API_KEY", "LLM_MODEL"]:
    print(f"{key}={'SET' if os.getenv(key) else 'MISSING'}")
PY

If any dependency or environment variable shows MISSING, do not start the smoke run yet.

PowerShell quick check:

$missing = @()
foreach ($m in @("requests","numpy","yaml","pandas","jieba")) {
  # find_spec returns None for a missing module; report that through the exit code
  python -c "import importlib.util, sys; sys.exit(0 if importlib.util.find_spec('$m') else 1)"
  if ($LASTEXITCODE -eq 0) {
    Write-Host "[OK] $m"
  } else {
    $missing += $m
    Write-Host "[MISSING] $m"
  }
}
Write-Host "LLM_BASE_URL=$([bool]$env:LLM_BASE_URL)"
Write-Host "LLM_API_KEY=$([bool]$env:LLM_API_KEY)"
Write-Host "LLM_MODEL=$([bool]$env:LLM_MODEL)"

3. Run Pipeline

# Bootstrap tracked style assets once per clean worktree
python scripts/style_guide_bootstrap.py --dry-run

# Verify LLM
python scripts/llm_ping.py

# Normalize โ†’ Translate โ†’ QA โ†’ Export
python scripts/normalize_guard.py input.csv normalized.csv map.json workflow/placeholder_schema.yaml
python scripts/translate_llm.py --input normalized.csv --output translated.csv --style workflow/style_guide.md --glossary glossary/compiled.yaml --style-profile workflow/style_profile.generated.yaml
python scripts/qa_hard.py translated.csv qa_report.json map.json
python scripts/rehydrate_export.py translated.csv map.json final.csv
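For scripted runs, the same sequence can be wrapped in a small fail-fast runner. This is a sketch, not part of the repository: `build_steps` and `run_steps` are hypothetical helpers, and it assumes each script signals failure with a non-zero exit code, which the source does not state explicitly.

```python
import subprocess
import sys
from typing import List, Sequence


def build_steps(input_csv: str, final_csv: str) -> List[List[str]]:
    """Return the commands from the block above, in pipeline order."""
    py = sys.executable
    return [
        [py, "scripts/llm_ping.py"],
        [py, "scripts/normalize_guard.py", input_csv, "normalized.csv",
         "map.json", "workflow/placeholder_schema.yaml"],
        [py, "scripts/translate_llm.py", "--input", "normalized.csv",
         "--output", "translated.csv", "--style", "workflow/style_guide.md",
         "--glossary", "glossary/compiled.yaml",
         "--style-profile", "workflow/style_profile.generated.yaml"],
        [py, "scripts/qa_hard.py", "translated.csv", "qa_report.json", "map.json"],
        [py, "scripts/rehydrate_export.py", "translated.csv", "map.json", final_csv],
    ]


def run_steps(steps: Sequence[Sequence[str]], runner=subprocess.run) -> int:
    """Run each step in order and stop at the first non-zero exit code."""
    for index, cmd in enumerate(steps):
        result = runner(list(cmd))
        if result.returncode != 0:  # assumed failure signal
            print(f"step {index} failed: {' '.join(cmd)}", file=sys.stderr)
            return result.returncode
    return 0


if __name__ == "__main__":
    sys.exit(run_steps(build_steps("input.csv", "final.csv")))
```

The `runner` parameter exists only so the control flow can be exercised without invoking the real scripts.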

3.1 Smoke Pipeline (Manifest + Issue Record)

# Full smoke pass with manifest output + issue recording
python scripts/run_smoke_pipeline.py --input "D:\\Dev_Env\\loc-mvr 测试文档\\test_input_200-row.csv" --target-lang en-US
# Optional: run the preflight check only
python scripts/run_smoke_pipeline.py --input "D:\\Dev_Env\\loc-mvr 测试文档\\test_input_200-row.csv" --target-lang en-US --verify-mode preflight

This command:

  • auto-bootstraps workflow/style_profile.generated.yaml if the clean worktree does not have one yet
  • runs llm_ping -> normalize_guard -> translate_llm -> qa_hard -> rehydrate_export
  • generates a run manifest: data/smoke_run_<timestamp>/run_manifest.json
  • runs smoke_verify --manifest ...
  • records issues to reports/smoke_issues_<run-id>.json and .jsonl
  • emits manifest.stage_artifacts with:
    • connectivity_log
    • normalize_log
    • translate_log
    • qa_hard_report
    • final_csv
    • smoke_verify_log
  • verify_mode supports preflight|full; the default is full (includes row-count and QA statistics)

After every smoke run, check the following artifacts:

  • D:\Dev_Env\GPT_Codex_Workspace\game-localization-mvr\main_worktree\data\smoke_runs\<run>\run_manifest.json
  • D:\Dev_Env\GPT_Codex_Workspace\game-localization-mvr\main_worktree\reports\smoke_issues_<run_id>.json
  • D:\Dev_Env\GPT_Codex_Workspace\game-localization-mvr\main_worktree\reports\smoke_verify_<run_id>.json
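Part of that check can be automated by confirming that the manifest lists every stage artifact named earlier. The sketch below assumes `stage_artifacts` is a JSON object that maps each artifact name to a non-empty path; the manifest schema is not documented here, so treat that shape and the helper names as hypothetical.

```python
import json
from pathlib import Path
from typing import Any, Dict, List

# Artifact names as listed under manifest.stage_artifacts.
EXPECTED_ARTIFACTS = (
    "connectivity_log",
    "normalize_log",
    "translate_log",
    "qa_hard_report",
    "final_csv",
    "smoke_verify_log",
)


def missing_artifacts(manifest: Dict[str, Any]) -> List[str]:
    """Return expected artifact names that are absent or empty in the manifest."""
    artifacts = manifest.get("stage_artifacts") or {}  # assumed to be a dict
    return [name for name in EXPECTED_ARTIFACTS if not artifacts.get(name)]


def check_manifest_file(path: str) -> List[str]:
    """Load run_manifest.json from disk and report missing artifacts."""
    manifest = json.loads(Path(path).read_text(encoding="utf-8"))
    return missing_artifacts(manifest)
```

An empty list from `check_manifest_file(".../run_manifest.json")` means every expected entry is present; it does not verify that the referenced files exist.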

⚡ Key Features

  • Row Preservation: Empty rows kept with status=skipped_empty
  • Drift Guard: Refresh stage blocks non-placeholder text changes
  • Progress Reporting: --progress_every N for translation progress
  • Router-based Models: Configure per-step models in llm_routing.yaml
  • LLM Tracing: All calls logged to LLM_TRACE_PATH for billing
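As an illustration of the tracing feature, the sketch below appends one JSON object per call to the file named by `LLM_TRACE_PATH` and sums token usage afterwards. The record fields (`ts`, `step`, `total_tokens`) and both helper names are assumptions; the actual trace schema written by the project is not documented here.

```python
import json
import os
import time
from pathlib import Path
from typing import Any, Dict


def append_trace(record: Dict[str, Any], path: str = "") -> Path:
    """Append one JSON line to the trace file, creating parent folders if needed."""
    target = Path(path or os.getenv("LLM_TRACE_PATH", "data/llm_trace.jsonl"))
    target.parent.mkdir(parents=True, exist_ok=True)
    line = json.dumps({"ts": time.time(), **record}, ensure_ascii=False)
    with target.open("a", encoding="utf-8") as fh:
        fh.write(line + "\n")
    return target


def total_tokens(path: str) -> int:
    """Sum the hypothetical total_tokens field over all trace lines."""
    total = 0
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                total += int(json.loads(line).get("total_tokens", 0))
    return total
```

An append-only JSONL file keeps each call on its own line, so a billing summary can be computed by streaming the file without loading it whole.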

📋 Testing

# Unit tests
python scripts/test_normalize.py
python scripts/test_qa_hard.py
python scripts/test_rehydrate.py

# E2E test (small dataset)
python scripts/test_e2e_workflow.py

# Dry-run validation
python scripts/translate_llm.py --input input.csv --output out.csv --style workflow/style_guide.md --glossary glossary/compiled.yaml --style-profile workflow/style_profile.generated.yaml --dry-run

📄 License

MIT License. Built for game localization automation.


🔗 Links
