Translator Tool

A standalone translation utility for multilingual web projects. This tool can be integrated into any project to manage the automatic translation of content using AI-powered translation services.

Installation

Clone this repository into your project:

git clone https://github.com/your-org/translator.git
cd translator
chmod +x bin/*

Configuration Overview

The translator uses three main configuration files, all located in your project root:

File	Purpose	Default Template
`translator.config.yaml`	Project-specific settings	`translator/config/translator.config.template.yaml`
`translator.models.yaml`	Translation model configuration	`translator/config/translator.models.template.yaml`
`translator.role.tpl`	Translation role template	`translator/config/translator.role.template.tpl`

If any of these files are not found in your project root, the default templates from the translator directory will be used.
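
The fallback rule above can be sketched as a small lookup. This is an illustrative Python sketch, not the tool's own code: the helper name `resolve_config_file` and the `DEFAULT_TEMPLATES` mapping are hypothetical, and it assumes the translator was cloned into `<project_root>/translator`.

```python
from pathlib import Path

# Hypothetical mapping built from the table above; not part of the tool's API.
DEFAULT_TEMPLATES = {
    "translator.config.yaml": "translator/config/translator.config.template.yaml",
    "translator.models.yaml": "translator/config/translator.models.template.yaml",
    "translator.role.tpl": "translator/config/translator.role.template.tpl",
}


def resolve_config_file(project_root: Path, name: str) -> Path:
    """Return the project-level file if it exists, else the bundled template."""
    candidate = project_root / name
    if candidate.is_file():
        return candidate
    # Assumption: the translator directory sits directly under the project root.
    return project_root / DEFAULT_TEMPLATES[name]
```

Calling `resolve_config_file(Path("."), "translator.role.tpl")` returns the project file when it exists and the template path otherwise.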

Configuration Workflow

1. Setup: Clone the translator repository into your project
2. Configure: Create the configuration files in your project root
3. Run: Execute the translation scripts

The translator will automatically:

Load project configuration
Load model definitions
Find or create translation roles
Process your content

Project Configuration

Create a translator.config.yaml file in your project root directory:

# Project translation configuration

# Source and target directories
source_directory: content/english  # Path to your source content
target_directory: content         # Parent directory containing all languages

# Translation parameters
translation_chunk_size: 6144      # Maximum size of text chunks for translation
workers: 1                        # Parallel worker count for files/languages/chunks
openrouter_timeout: 30            # Seconds before an OpenRouter request times out

# Cache directory for translations (per-document cache structure)
cache_directory: .translation-cache  # Directory for per-document cache files

# Role template location (project-specific)
role_template: translator.role.tpl  # Template for translation roles

# YAML front matter validation
check_yaml_keys: false  # If true, validates that YAML keys contain only valid characters (a-z, A-Z, 0-9, _, ., -)
                       # Rejects translations that would corrupt YAML structure. Set to false for Hugo sites
                       # where front matter keys should remain in English.

# YAML keys whose values should never be translated
yaml_keys_to_skip:
  - code
  - layout

A template file is available at translator/config/translator.config.template.yaml.
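
To make the `check_yaml_keys` and `yaml_keys_to_skip` options more concrete, here is a minimal Python sketch of the two checks they describe. The function names are hypothetical, and the character set is copied from the comment in the configuration above; the tool's real validation may differ.

```python
import re

# Allowed key characters as listed in the check_yaml_keys comment:
# a-z, A-Z, 0-9, underscore, dot, hyphen.
VALID_KEY = re.compile(r"[A-Za-z0-9_.-]+")


def invalid_yaml_keys(keys):
    """Return the keys that contain characters outside the allowed set."""
    return [key for key in keys if not VALID_KEY.fullmatch(key)]


def should_translate_value(key, keys_to_skip):
    """Values under keys listed in yaml_keys_to_skip are left untranslated."""
    return key not in keys_to_skip
```

For example, `invalid_yaml_keys(["title", "título"])` flags only the second key, which is what a translated (and therefore corrupted) front matter key would look like.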

Translation Models Configuration

Create a translator.models.yaml file in your project root directory to configure the AI models used for translation:

# Translation models configuration
models:
  default:
    - openai:gpt-4o-mini
    - claude:claude-3-5-haiku-latest

  # Optional per-language override
  chinese:
    - qwen/qwen-2.5-7b-instruct
    - openai:gpt-4o-mini

A template file is available at translator/config/translator.models.template.yaml.

If either of these configuration files is not found, default values will be used.
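
The per-language override can be read as a simple lookup with a fallback. The Python sketch below assumes that a language entry replaces the `default` list entirely rather than extending it; that behavior is an assumption, and `models_for_language` is a hypothetical name.

```python
def models_for_language(models: dict, language: str) -> list:
    """Return the language-specific model list, or the default list if none exists."""
    # Assumption: an override replaces the default list instead of being merged with it.
    return list(models.get(language) or models.get("default", []))


# The same structure as the YAML example above, expressed as a dict.
MODELS = {
    "default": ["openai:gpt-4o-mini", "claude:claude-3-5-haiku-latest"],
    "chinese": ["qwen/qwen-2.5-7b-instruct", "openai:gpt-4o-mini"],
}
```

With this data, `models_for_language(MODELS, "chinese")` yields the Qwen-first list, while any other language gets the `default` list.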

Translation Role Configuration

The translation role template (translator.role.tpl) defines the instructions given to the AI model for translation tasks. This is a critical component that affects translation quality and accuracy.

File Location

Primary location: <project_root>/translator.role.tpl
Default template: translator/config/translator.role.template.tpl

Role Template Format

Create a translator.role.tpl file in your project root with instructions for the translation model. Here's a template:

You are a professional translator to $LANGUAGE language. You are specialized in translating markdown files with precise line-by-line translation. Your task is to:

1. Translate the entire document provided
2. Preserve the original document's structure exactly: all original formatting, spacing, empty lines and special characters
3. DO NOT translate:
   - comment blocks in <!-- ... -->
   - Any code blocks
   - File paths
   - Variables
   - HTML tags
4. Do not ask questions or request continuations
5. ENSURE each line in the original corresponds to the same line in the translated version, even EMPTY line follows EMPTY line, very important to make translation LINE perfect same as original
6. Do not translate lines with the following strings: CODE_BLOCK_0 where 0 can be any number, this is a special string that indicates the start of a code block

Translate the following document exactly as instructed. Reply with just the translation WITHOUT adding anything additional from your side.

Environment Variables

The template uses the following environment variables for substitution:

$LANGUAGE: Replaced with the target language name during role creation

Custom Roles

You can customize the role template to provide more specific instructions for your content types. For example:

Add specific terminology to maintain across translations
Include special instructions for domain-specific content
Define rules for handling particular content elements

How Roles are Used

During translation:

The translator loads translator.role.tpl (or the default template) and substitutes $LANGUAGE.
The resulting prompt is sent as the system/role instruction for each OpenRouter request.
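
The `$LANGUAGE` substitution can be illustrated with Python's standard `string.Template`, which happens to use the same `$NAME` placeholder syntax. This is only a sketch of the idea; the tool performs the substitution in its own code, and `render_role` is a hypothetical name.

```python
from string import Template


def render_role(template_text: str, language: str) -> str:
    """Replace $LANGUAGE in the role template with the target language name."""
    # safe_substitute leaves any other $-prefixed text untouched instead of raising.
    return Template(template_text).safe_substitute(LANGUAGE=language)
```

For instance, rendering the first line of the template above for Spanish produces "You are a professional translator to Spanish language."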

Usage

Auto-Translate

To automatically translate all content:

./translator/bin/auto-translate [project_directory]

If project_directory is not specified, the current directory will be used as the project directory.

The auto-translate script will:

Detect which files need translation by comparing line counts and checking the cache
Use cached translations when available to avoid retranslating unchanged content
Preserve code blocks and HTML comments exactly as they appear
Maintain the exact line structure between source and translation
Automatically clean up deleted source files from translations

Use -f/--force to re-render files from cached chunks even if they appear up-to-date (useful after changing merging/validation logic).

OpenRouter Health Check

You can verify OpenRouter connectivity and credentials with:

./translator/bin/openrouter-healthcheck

Environment File (.env)

If a .env file exists in the project root, the CLI loads it at startup using vlucas/phpdotenv. This is the recommended place to store OPENROUTER_TRANSLATOR_API_KEY for local runs. Environment variables in your shell still take precedence.
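
The precedence rule (shell variables win over `.env` entries) can be sketched in a few lines of Python. This is a simplified stand-in for what vlucas/phpdotenv does, with much cruder parsing, and `load_env_file` is a hypothetical helper.

```python
import os


def load_env_file(path, environ=None):
    """Load KEY=VALUE lines without overriding variables that are already set."""
    environ = os.environ if environ is None else environ
    try:
        with open(path, encoding="utf-8") as handle:
            lines = handle.read().splitlines()
    except FileNotFoundError:
        return environ  # A missing .env file is not an error.
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # setdefault keeps a value that the shell already exported.
        environ.setdefault(key.strip(), value.strip().strip('"').strip("'"))
    return environ
```

If `OPENROUTER_TRANSLATOR_API_KEY` is already exported in the shell, the value in `.env` is ignored; otherwise the `.env` value is used.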

Supported Environment Variables

OPENROUTER_TRANSLATOR_API_KEY: OpenRouter API key (required).
OPENROUTER_BASE_URL: Override OpenRouter base URL.
OPENROUTER_TIMEOUT: Request timeout in seconds.
OPENROUTER_RETRIES: Retry count for non-timeout errors.
TRANSLATION_CHUNK_SIZE: Override translation_chunk_size.
TRANSLATOR_LANGUAGES: Comma-separated language override.
TRANSLATOR_MODEL: Single model override.
TRANSLATOR_MODELS: Comma-separated model list override.
DEBUG=1: Enable verbose logs and dump helpers.
PROMPT=1: Dump prompts without calling models.
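
As a rough illustration of how such overrides might be layered on top of `translator.config.yaml`, the Python sketch below applies three of the variables to a settings dict. The function name and the `languages` key are hypothetical, and treating `OPENROUTER_TIMEOUT` as an override of `openrouter_timeout` is an assumption.

```python
import os


def effective_settings(config: dict, environ=None) -> dict:
    """Return a copy of the config with environment overrides applied."""
    environ = os.environ if environ is None else environ
    settings = dict(config)
    if environ.get("TRANSLATION_CHUNK_SIZE"):
        settings["translation_chunk_size"] = int(environ["TRANSLATION_CHUNK_SIZE"])
    if environ.get("OPENROUTER_TIMEOUT"):
        # Assumption: this variable maps onto the openrouter_timeout config key.
        settings["openrouter_timeout"] = int(environ["OPENROUTER_TIMEOUT"])
    if environ.get("TRANSLATOR_LANGUAGES"):
        # Comma-separated list; surrounding spaces and empty items are dropped.
        settings["languages"] = [
            item.strip()
            for item in environ["TRANSLATOR_LANGUAGES"].split(",")
            if item.strip()
        ]
    return settings
```

With `TRANSLATOR_LANGUAGES="spanish, french"` set, the sketch yields a `languages` list of `spanish` and `french` and leaves the other settings at their configured values.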

CI / GitHub Actions

The translator is designed to be called the same way in CI as in local usage. In workflows like check_docs.yml, the invocation should remain:

./translator/bin/auto-translate

Example workflow step:

- name: Run auto-translate
  env:
    OPENROUTER_TRANSLATOR_API_KEY: ${{ secrets.OPENROUTER_TRANSLATOR_API_KEY }}
  run: |
    set -e
    apt-get update -y
    apt-get install -y php-cli php-curl
    ./translator/bin/auto-translate

Prerequisites

The translator tool requires the following dependencies:

PHP (8.2+ recommended; requires curl extension)
OpenRouter API key (OPENROUTER_TRANSLATOR_API_KEY)

Advanced Configuration

Adding New Languages

To add a new language, create a directory with the language name inside your target_directory. The auto-translate tool will automatically detect and translate content for all language directories found.
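
Directory-based language detection could look roughly like the following Python sketch. It assumes that every subdirectory of `target_directory` other than the source directory is a target language, and that hidden directories are skipped; both points are assumptions, and `detect_target_languages` is a hypothetical name.

```python
from pathlib import Path


def detect_target_languages(target_directory: Path, source_directory: Path) -> list:
    """List language directory names under target_directory, excluding the source."""
    source = source_directory.resolve()
    return sorted(
        entry.name
        for entry in target_directory.iterdir()
        if entry.is_dir()
        and entry.resolve() != source
        and not entry.name.startswith(".")  # Assumption: hidden directories are ignored.
    )
```

Given `content/english`, `content/spanish`, and `content/french`, with `content/english` as the source, the sketch returns `french` and `spanish`.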

Content Structure

The translator expects the following directory structure:

project/
├── content/               # target_directory
│   ├── english/          # source_directory
│   │   ├── file1.md
│   │   └── folder/
│   │       └── file2.md
│   ├── spanish/          # target language
│   └── french/           # target language
└── translator/           # translator directory
    ├── bin/
    └── config/

Translation Process

The translation process works as follows:

1. Scan for markdown files in the source directory
2. Check if translation is needed by:
   - Comparing line counts between source and target files
   - Checking if all content chunks are already in the cache
3. Extract code blocks and comments - These are preserved exactly as-is and cached separately
4. Chunk the content - Split the document into manageable chunks (default: 6144 bytes)
5. Check cache first - For each chunk, check if a translation already exists in the cache
6. Translate missing chunks - Only translate chunks that aren't in the cache, using AI models in listed order
7. Preserve structure - Ensure the translated file has the exact same line structure as the source (same line numbers for code blocks, comments, and empty lines)
8. Validate output - Reject chunks that look untranslated and ensure link URLs stay unchanged
9. Update cache - Store all translated chunks in the cache for future use
10. Clean up - Remove translation files for deleted source files
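
The chunking step can be pictured with a short Python sketch. It assumes chunks are split on line boundaries and measured in UTF-8 bytes, which fits the line-preserving design described above but is not confirmed by the source; `chunk_lines` is a hypothetical name.

```python
def chunk_lines(text: str, max_bytes: int = 6144) -> list:
    """Split text into chunks of whole lines, each at most max_bytes when possible."""
    chunks, current, size = [], [], 0
    for line in text.splitlines(keepends=True):
        line_size = len(line.encode("utf-8"))
        # Start a new chunk when adding this line would exceed the limit.
        if current and size + line_size > max_bytes:
            chunks.append("".join(current))
            current, size = [], 0
        # A single line longer than max_bytes still becomes its own chunk.
        current.append(line)
        size += line_size
    if current:
        chunks.append("".join(current))
    return chunks
```

Because lines are never split, joining the chunks back together reproduces the original text exactly, which keeps line numbers aligned between source and translation.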

For YAML-only files (front matter only), the translator:

Extracts values into a one-value-per-line list (keys/indentation preserved)
Skips URLs and HTML tag-only values
Preserves quoted scalars (quotes are stripped before translation and restored on merge)
Chunks the values list using the same translation_chunk_size

Caching System

The translator uses a cache-based approach to optimize performance:

Cache directory: Defined by cache_directory in config (default: .translation-cache)
Per-document cache: Each source document has its own cache file, following the same directory structure as the source files
Uncompressed storage: Cache files are stored as plain JSON files (not compressed) for easy inspection and debugging
Block-level caching: Each content block is hashed and cached independently
Code block preservation: Code blocks and HTML comments are cached with their original content (not translated)
Incremental updates: Only new or changed chunks are translated; existing cached translations are reused
Multi-language support: The cache stores translations for each language separately

Cache Structure Example:

project/
├── content/
│   └── english/
│       ├── docs/
│       │   └── guide.md
│       └── api.md
└── .translation-cache/
    ├── docs/
    │   └── guide.md.json
    └── api.md.json

This approach ensures:

Efficiency: Unchanged content is never retranslated
Consistency: Code blocks and comments are always preserved exactly
Speed: Large documents with small changes translate quickly
Cost savings: Reduces API calls to translation services
Maintainability: Per-document cache files are easy to inspect, debug, and manage
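
The cache layout and hash-keyed lookup described in this section can be sketched as follows. The path mapping follows the Cache Structure Example; the JSON layout (language, then chunk hash, then translation) and the use of SHA-256 are assumptions, and all function names are hypothetical.

```python
import hashlib
from pathlib import Path


def cache_path(cache_directory: Path, source_directory: Path, source_file: Path) -> Path:
    """Mirror the source tree under the cache directory and append .json."""
    relative = source_file.relative_to(source_directory)
    return cache_directory / relative.parent / (relative.name + ".json")


def chunk_key(chunk: str) -> str:
    """Hash a chunk so unchanged content maps to the same cache entry."""
    # Assumption: the real tool may use a different hash function.
    return hashlib.sha256(chunk.encode("utf-8")).hexdigest()


def lookup(cache: dict, language: str, chunk: str):
    """Return the cached translation for this language, or None on a miss."""
    return cache.get(language, {}).get(chunk_key(chunk))


def store(cache: dict, language: str, chunk: str, translation: str) -> None:
    """Record a translation under its language and chunk hash."""
    cache.setdefault(language, {})[chunk_key(chunk)] = translation
```

Under these assumptions, `content/english/docs/guide.md` maps to `.translation-cache/docs/guide.md.json`, and editing one chunk changes only that chunk's hash, so the remaining entries are still reused.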

Testing

A comprehensive test suite is available to verify the translation system:

./run-all-tests.sh

This script tests:

New document translation
Line changes (single and multiple)
Empty line handling (addition, removal, at various positions)
Whitespace-only line preservation
HTML comment preservation
Code block preservation
File deletion handling
Cache reuse
Line structure matching (line counts, code block positions, comment positions, empty line positions)

Troubleshooting

Translation failures: Check the output for specific error messages. The tool tries the models in the listed order until one produces a translation that passes validation.
API keys: Ensure OPENROUTER_TRANSLATOR_API_KEY is set for OpenRouter access.
Line count mismatches: The tool automatically retries with different models if line counts don't match. Check that your role template emphasizes line-by-line preservation.
Cache issues: If translations seem stale, you can delete the cache directory (.translation-cache) or specific cache files to force a full retranslation. Cache files are stored as plain JSON for easy inspection.
Structure preservation: The system validates that code blocks, HTML comments, and empty lines appear on the same line numbers in source and translation. If this fails, the translation is retried.
Timeouts: OpenRouter timeouts skip retries and immediately fall back to the next model. Increase openrouter_timeout if needed.
Prompt inspection: Use PROMPT=1 to dump prompts without calling models.
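
The retry behavior mentioned in several of the items above (line count mismatches, structure preservation, falling back to the next model) can be summarized in a simplified Python sketch. It checks only line counts and empty-line positions, treats whitespace-only lines as empty, and takes the model call as a caller-supplied function; all names are hypothetical and the real validation is more thorough.

```python
def same_line_structure(source: str, translation: str) -> bool:
    """Check that line counts match and empty lines sit at the same positions."""
    src_lines = source.split("\n")
    dst_lines = translation.split("\n")
    if len(src_lines) != len(dst_lines):
        return False
    return all(
        (src.strip() == "") == (dst.strip() == "")
        for src, dst in zip(src_lines, dst_lines)
    )


def translate_with_fallback(chunk: str, models: list, translate):
    """Try each model in order; return the first structurally valid translation."""
    for model in models:
        try:
            candidate = translate(model, chunk)  # Hypothetical callable: (model, text) -> text
        except Exception:
            continue  # For example a timeout: move on to the next model.
        if same_line_structure(chunk, candidate):
            return candidate
    return None  # Every model failed or returned a mismatched structure.
```

A `None` result here corresponds to the case where the output should be checked for error messages from each model.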

License

This project is licensed under the MIT License - see the LICENSE file for details.
