A standalone translation utility for multilingual web projects. This tool can be integrated into any project to manage the automatic translation of content using AI-powered translation services.
- Clone this repository into your project:
git clone https://github.com/your-org/translator.git
cd translator
chmod +x bin/*
The translator uses three main configuration files, all located in your project root:
| File | Purpose | Default Template |
|---|---|---|
| translator.config.yaml | Project-specific settings | translator/config/translator.config.template.yaml |
| translator.models.yaml | Translation model configuration | translator/config/translator.models.template.yaml |
| translator.role.tpl | Translation role template | translator/config/translator.role.template.tpl |
If any of these files are not found in your project root, the default templates from the translator directory will be used.
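The fallback rule above can be sketched in a few lines. This is only an illustration, not the tool's own code: the function name `resolve_config` is hypothetical, and it assumes the translator directory sits at `<project_root>/translator`.

```python
from pathlib import Path

# Project-level file names and their default templates, as listed above
# (template paths are taken relative to the project root, which is an assumption).
DEFAULT_TEMPLATES = {
    "translator.config.yaml": "translator/config/translator.config.template.yaml",
    "translator.models.yaml": "translator/config/translator.models.template.yaml",
    "translator.role.tpl": "translator/config/translator.role.template.tpl",
}


def resolve_config(project_root: str, name: str) -> Path:
    """Return the project-level file if it exists, otherwise the default template."""
    root = Path(project_root)
    candidate = root / name
    if candidate.is_file():
        return candidate
    return root / DEFAULT_TEMPLATES[name]
```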
- Setup: Clone the translator repository into your project
- Configure: Create the configuration files in your project root
- Run: Execute the translation scripts
The translator will automatically:
- Load project configuration
- Load model definitions
- Find or create translation roles
- Process your content
Create a translator.config.yaml file in your project root directory:
# Project translation configuration
# Source and target directories
source_directory: content/english # Path to your source content
target_directory: content # Parent directory containing all languages
# Translation parameters
translation_chunk_size: 6144 # Maximum size of text chunks for translation
workers: 1 # Parallel worker count for files/languages/chunks
openrouter_timeout: 30 # Seconds before an OpenRouter request times out
# Cache directory for translations (per-document cache structure)
cache_directory: .translation-cache # Directory for per-document cache files
# Role template location (project-specific)
role_template: translator.role.tpl # Template for translation roles
# YAML front matter validation
check_yaml_keys: false # If true, validates that YAML keys contain only valid characters (a-z, A-Z, 0-9, _, ., -)
# Rejects translations that would corrupt YAML structure. Set to false for Hugo sites
# where front matter keys should remain in English.
# YAML keys whose values should never be translated
yaml_keys_to_skip:
- code
- layout
A template file is available at translator/config/translator.config.template.yaml.
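To make the check_yaml_keys rule concrete, here is a minimal sketch of the character check it describes. The helper name `invalid_keys` is hypothetical, and how the translator actually reacts to an invalid key (beyond rejecting the translation) is not shown here.

```python
import re

# Allowed key characters per the check_yaml_keys comment: a-z, A-Z, 0-9, _, ., -
VALID_KEY = re.compile(r"[A-Za-z0-9_.-]+")


def invalid_keys(keys):
    """Return the keys that contain characters outside the allowed set."""
    return [key for key in keys if not VALID_KEY.fullmatch(key)]
```

A translated front matter block whose keys came back as, say, `título` would fail this check, which is the kind of corruption the option is meant to catch.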
Create a translator.models.yaml file in your project root directory to configure the AI models used for translation:
# Translation models configuration
models:
  default:
    - openai:gpt-4o-mini
    - claude:claude-3-5-haiku-latest
  # Optional per-language override
  chinese:
    - qwen/qwen-2.5-7b-instruct
    - openai:gpt-4o-mini
A template file is available at translator/config/translator.models.template.yaml.
If either of these configuration files is not found, default values will be used.
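The per-language override can be read as a simple lookup with a fallback to the default list. The sketch below assumes the YAML has already been parsed into a dictionary and that an override replaces the default list rather than being merged with it; both points are assumptions, and `models_for` is a hypothetical name.

```python
def models_for(language: str, config: dict) -> list:
    """Return the ordered model list for a language, falling back to `default`."""
    models = config.get("models", {})
    # An empty or missing per-language entry falls through to the default list.
    return models.get(language) or models.get("default", [])
```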
The translation role template (translator.role.tpl) defines the instructions given to the AI model for translation tasks. This is a critical component that affects translation quality and accuracy.
- Primary location: <project_root>/translator.role.tpl
- Default template: translator/config/translator.role.template.tpl
Create a translator.role.tpl file in your project root with instructions for the translation model. Here's a template:
You are a professional translator to $LANGUAGE language. You are specialized in translating markdown files with precise line-by-line translation. Your task is to:
1. Translate the entire document provided
2. Preserve the original document's structure exactly: all original formatting, spacing, empty lines and special characters
3. DO NOT translate:
- comment blocks in <!-- ... -->
- Any code blocks
- File paths
- Variables
- HTML tags
4. Do not ask questions or request continuations
5. ENSURE each line in the original corresponds to the same line in the translated version, even EMPTY line follows EMPTY line, very important to make translation LINE perfect same as original
6. Do not translate lines with the following strings: CODE_BLOCK_0 where 0 can be any number, this is a special string that indicates the start of a code block
Translate the following document exactly as instructed. Reply with just the translation WITHOUT adding anything additional from your side.
The template uses the following environment variables for substitution:
$LANGUAGE: Replaced with the target language name during role creation
You can customize the role template to provide more specific instructions for your content types. For example:
- Add specific terminology to maintain across translations
- Include special instructions for domain-specific content
- Define rules for handling particular content elements
During translation:
- The translator loads translator.role.tpl (or the default template) and substitutes $LANGUAGE.
- The resulting prompt is sent as the system/role instruction for each OpenRouter request.
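The substitution step can be illustrated with Python's standard `string.Template`, which uses the same `$NAME` placeholder syntax. This is a sketch of the idea rather than the translator's actual implementation, and `render_role` is a hypothetical name.

```python
from string import Template


def render_role(template_text: str, language: str) -> str:
    """Fill in $LANGUAGE and leave any other $-placeholders untouched."""
    return Template(template_text).safe_substitute(LANGUAGE=language)
```

Using `safe_substitute` rather than `substitute` means a stray `$` elsewhere in a customized template does not raise an error.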
To automatically translate all content:
./translator/bin/auto-translate [project_directory]
If project_directory is not specified, the current directory will be used as the project directory.
The auto-translate script will:
- Detect which files need translation by comparing line counts and checking the cache
- Use cached translations when available to avoid retranslating unchanged content
- Preserve code blocks and HTML comments exactly as they appear
- Maintain the exact line structure between source and translation
- Automatically clean up deleted source files from translations
Use -f/--force to re-render files from cached chunks even if they appear up-to-date (useful after changing merging/validation logic).
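The line-count part of the detection step can be sketched as below. The cache check is deliberately left out, the function names are hypothetical, and treating `force` as "always process" is a simplification of the -f/--force behaviour described above.

```python
from pathlib import Path


def line_count(path: Path) -> int:
    """Count the lines of a UTF-8 text file."""
    return len(path.read_text(encoding="utf-8").splitlines())


def needs_translation(source: Path, target: Path, force: bool = False) -> bool:
    """True if the target is missing, its line count differs, or force is set."""
    if force or not target.is_file():
        return True
    return line_count(source) != line_count(target)
```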
You can verify OpenRouter connectivity and credentials with:
./translator/bin/openrouter-healthcheck
If a .env file exists in the project root, the CLI loads it at startup using vlucas/phpdotenv. This is the recommended place to store OPENROUTER_TRANSLATOR_API_KEY for local runs. Environment variables in your shell still take precedence.
- OPENROUTER_TRANSLATOR_API_KEY: OpenRouter API key (required).
- OPENROUTER_BASE_URL: Override OpenRouter base URL.
- OPENROUTER_TIMEOUT: Request timeout in seconds.
- OPENROUTER_RETRIES: Retry count for non-timeout errors.
- TRANSLATION_CHUNK_SIZE: Override translation_chunk_size.
- TRANSLATOR_LANGUAGES: Comma-separated language override.
- TRANSLATOR_MODEL: Single model override.
- TRANSLATOR_MODELS: Comma-separated model list override.
- DEBUG=1: Enable verbose logs and dump helpers.
- PROMPT=1: Dump prompts without calling models.
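As an illustration of how such overrides are typically layered on top of file-based settings, consider the sketch below. The `languages` and `models` keys, the function name, and the choice to let TRANSLATOR_MODEL win over TRANSLATOR_MODELS when both are set are all assumptions made for the example.

```python
import os


def effective_settings(config: dict, env=os.environ) -> dict:
    """Apply environment overrides on top of settings loaded from the config file."""
    settings = dict(config)
    if env.get("TRANSLATION_CHUNK_SIZE"):
        settings["translation_chunk_size"] = int(env["TRANSLATION_CHUNK_SIZE"])
    if env.get("TRANSLATOR_LANGUAGES"):
        # Comma-separated list; surrounding whitespace and empty items are dropped.
        settings["languages"] = [
            item.strip() for item in env["TRANSLATOR_LANGUAGES"].split(",") if item.strip()
        ]
    if env.get("TRANSLATOR_MODEL"):
        settings["models"] = [env["TRANSLATOR_MODEL"]]
    elif env.get("TRANSLATOR_MODELS"):
        settings["models"] = [
            item.strip() for item in env["TRANSLATOR_MODELS"].split(",") if item.strip()
        ]
    return settings
```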
The translator is designed to be called the same way in CI as in local usage. In workflows like check_docs.yml, the invocation should remain:
./translator/bin/auto-translate
Example workflow step:
- name: Run auto-translate
  env:
    OPENROUTER_TRANSLATOR_API_KEY: ${{ secrets.OPENROUTER_TRANSLATOR_API_KEY }}
  run: |
    set -e
    apt-get update -y
    apt-get install -y php-cli php-curl
    ./translator/bin/auto-translate
The translator tool requires the following dependencies:
- PHP (8.2+ recommended; requires curl extension)
- OpenRouter API key (OPENROUTER_TRANSLATOR_API_KEY)
To add a new language, create a directory with the language name inside your target_directory. The auto-translate tool will automatically detect and translate content for all language directories found.
The translator expects the following directory structure:
project/
├── content/                  # target_directory
│   ├── english/              # source_directory
│   │   ├── file1.md
│   │   └── folder/
│   │       └── file2.md
│   ├── spanish/              # target language
│   └── french/               # target language
└── translator/               # translator directory
    ├── bin/
    └── config/
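Language detection from this layout amounts to listing the subdirectories of target_directory other than the source directory. The sketch below assumes that hidden directories are ignored and that the directory name is used as the language name; the function name is hypothetical.

```python
from pathlib import Path


def target_languages(target_directory: str, source_directory: str) -> list[str]:
    """List language directories under target_directory, excluding the source."""
    source = Path(source_directory).resolve()
    return sorted(
        entry.name
        for entry in Path(target_directory).iterdir()
        if entry.is_dir()
        and entry.resolve() != source
        and not entry.name.startswith(".")
    )
```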
The translation process works as follows:
- Scan for markdown files in the source directory
- Check if translation is needed by:
  - Comparing line counts between source and target files
  - Checking if all content chunks are already in the cache
- Extract code blocks and comments - These are preserved exactly as-is and cached separately
- Chunk the content - Split the document into manageable chunks (default: 6144 bytes)
- Check cache first - For each chunk, check if a translation already exists in the cache
- Translate missing chunks - Only translate chunks that aren't in the cache, using AI models in listed order
- Preserve structure - Ensure the translated file has the exact same line structure as the source (same line numbers for code blocks, comments, and empty lines)
- Validate output - Reject chunks that look untranslated and ensure link URLs stay unchanged
- Update cache - Store all translated chunks in the cache for future use
- Clean up - Remove translation files for deleted source files
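The chunking step can be sketched as follows. The README only states a size limit in bytes; splitting on line boundaries is an assumption made here because it keeps the line structure intact, and `chunk_lines` is a hypothetical name.

```python
def chunk_lines(text: str, max_bytes: int = 6144) -> list[str]:
    """Split text into chunks of whole lines, each at most max_bytes when possible.

    A single line longer than max_bytes is kept as its own oversized chunk.
    """
    chunks, current, size = [], [], 0
    for line in text.splitlines(keepends=True):
        line_size = len(line.encode("utf-8"))
        if current and size + line_size > max_bytes:
            chunks.append("".join(current))
            current, size = [], 0
        current.append(line)
        size += line_size
    if current:
        chunks.append("".join(current))
    return chunks
```

Joining the chunks back together reproduces the input exactly, which is what makes per-chunk caching and line-perfect merging possible.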
For YAML-only files (front matter only), the translator:
- Extracts values into a one-value-per-line list (keys/indentation preserved)
- Skips URLs and HTML tag-only values
- Preserves quoted scalars (quotes are stripped before translation and restored on merge)
- Chunks the values list using the same translation_chunk_size
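The skip and quote-handling rules for YAML values might look roughly like this. The regular expressions and helper names are illustrative assumptions, not the patterns the translator actually uses.

```python
import re

# Hypothetical patterns: a value that is a single URL, or made of HTML tags only.
URL_RE = re.compile(r"^[a-z][a-z0-9+.-]*://\S+$", re.IGNORECASE)
TAG_ONLY_RE = re.compile(r"^(\s*<[^>]+>\s*)+$")


def should_skip(value: str) -> bool:
    """True for values that are only a URL or only HTML tags."""
    stripped = value.strip()
    return bool(URL_RE.match(stripped) or TAG_ONLY_RE.match(stripped))


def split_quotes(value: str) -> tuple[str, str]:
    """Return (quote_char, inner_text) for a quoted scalar, or ("", value)."""
    if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
        return value[0], value[1:-1]
    return "", value


def restore_quotes(quote: str, translated: str) -> str:
    """Wrap the translated text in the quote character that was stripped."""
    return f"{quote}{translated}{quote}"
```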
The translator uses a cache-based approach to optimize performance:
- Cache directory: Defined by cache_directory in config (default: .translation-cache)
- Per-document cache: Each source document has its own cache file, following the same directory structure as the source files
- Uncompressed storage: Cache files are stored as plain JSON files (not compressed) for easy inspection and debugging
- Block-level caching: Each content block is hashed and cached independently
- Code block preservation: Code blocks and HTML comments are cached with their original content (not translated)
- Incremental updates: Only new or changed chunks are translated; existing cached translations are reused
- Multi-language support: The cache stores translations for each language separately
Cache Structure Example:
project/
├── content/
│   └── english/
│       ├── docs/
│       │   └── guide.md
│       └── api.md
└── .translation-cache/
    ├── docs/
    │   └── guide.md.json
    └── api.md.json
This approach ensures:
- Efficiency: Unchanged content is never retranslated
- Consistency: Code blocks and comments are always preserved exactly
- Speed: Large documents with small changes translate quickly
- Cost savings: Reduces API calls to translation services
- Maintainability: Per-document cache files are easy to inspect, debug, and manage
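The cache layout in the example above implies a simple path mapping, and block-level caching implies a content hash per block. Both are sketched below; the use of SHA-256 is an assumption (the README only says blocks are hashed), and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def cache_path(source_file: str, source_directory: str,
               cache_directory: str = ".translation-cache") -> Path:
    """Map a source document to its per-document cache file."""
    relative = Path(source_file).relative_to(source_directory)
    return Path(cache_directory) / relative.parent / (relative.name + ".json")


def block_key(block: str) -> str:
    """Hash a content block so unchanged blocks map to the same cache key."""
    # SHA-256 is an assumption; any stable content hash would serve the same purpose.
    return hashlib.sha256(block.encode("utf-8")).hexdigest()
```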
A comprehensive test suite is available to verify the translation system:
./run-all-tests.sh
This script tests:
- New document translation
- Line changes (single and multiple)
- Empty line handling (addition, removal, at various positions)
- Whitespace-only line preservation
- HTML comment preservation
- Code block preservation
- File deletion handling
- Cache reuse
- Line structure matching (line counts, code block positions, comment positions, empty line positions)
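A line-structure check of the kind listed above can be approximated with a comparison of line counts and empty-line positions. This sketch covers only those two properties (code block and comment positions are omitted), and the function names are hypothetical.

```python
def empty_line_positions(text: str) -> list[int]:
    """Return 1-based line numbers of empty or whitespace-only lines."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if not line.strip()
    ]


def same_line_structure(source: str, translation: str) -> bool:
    """True if both texts have the same line count and empty-line positions."""
    return (
        len(source.splitlines()) == len(translation.splitlines())
        and empty_line_positions(source) == empty_line_positions(translation)
    )
```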
- Translation failures: Check the output for specific error messages. The tool will attempt multiple models in the listed order until a good translation is found.
- API keys: Ensure OPENROUTER_TRANSLATOR_API_KEY is set for OpenRouter access.
- Line count mismatches: The tool automatically retries with different models if line counts don't match. Check that your role template emphasizes line-by-line preservation.
- Cache issues: If translations seem stale, you can delete the cache directory (.translation-cache) or specific cache files to force a full retranslation. Cache files are stored as plain JSON for easy inspection.
- Structure preservation: The system validates that code blocks, HTML comments, and empty lines appear on the same line numbers in source and translation. If this fails, the translation is retried.
- Timeouts: OpenRouter timeouts skip retries and immediately fall back to the next model. Increase openrouter_timeout if needed.
- Prompt inspection: Use PROMPT=1 to dump prompts without calling models.
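The retry and fallback behaviour described in these notes can be summarized in a short sketch. The `call_model` callable stands in for the real OpenRouter request and is purely hypothetical, as are the function name and the default retry count.

```python
def translate_with_fallback(chunk, models, call_model, retries=2):
    """Try each model in order; retry non-timeout errors, skip ahead on timeouts."""
    for model in models:
        for _attempt in range(retries + 1):
            try:
                return call_model(model, chunk)
            except TimeoutError:
                break  # timeouts skip retries and move on to the next model
            except Exception:
                continue  # other errors: try the same model again
    raise RuntimeError("all models failed")
```

With this shape, raising the timeout makes a slow model more likely to succeed, while the retry count only affects non-timeout failures.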
This project is licensed under the MIT License - see the LICENSE file for details.