lloyd1515/AutomatedDocxTranslator

AutomatedDocxTranslator

AI-powered document translator that preserves Word formatting, images, and layout. Built with Flask and Google Gemini API.

Features

  • Translates .docx files while preserving:
    • Bold and italic formatting
    • Images and graphics
    • Tables and headers/footers
    • Tab spacing and alignment
  • Web-based interface with drag-and-drop upload
  • Automatic file cleanup
  • Error recovery with checkpoint system
  • Progress tracking support

Setup

1. Install Dependencies

pip install -r requirements.txt

2. Configure Environment

Copy the example env file and add your API key:

cp .env.example .env

Then edit .env and set your Gemini API key:

GEMINI_API_KEY=your_google_gemini_api_key_here
TARGET_LANGUAGE=German
BATCH_SIZE=25
BATCH_DELAY=0.5
MAX_FILE_SIZE_MB=10
FILE_CLEANUP_HOURS=1

Important: Get your Gemini API key from https://aistudio.google.com/app/apikey
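
As an illustration of how these variables might be read at startup, here is a minimal sketch. The `Settings` class and `load_settings` helper are hypothetical names and are not taken from the project's config.py; the defaults and the 1-100 range for BATCH_SIZE follow the Configuration Options section.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the values documented in .env."""
    gemini_api_key: str
    target_language: str
    batch_size: int
    batch_delay: float
    max_file_size_mb: int
    file_cleanup_hours: int


def load_settings(env=None):
    """Build Settings from a mapping of environment variables.

    Passing a plain dict instead of os.environ makes the function easy to test.
    """
    env = os.environ if env is None else env

    api_key = env.get("GEMINI_API_KEY", "").strip()
    if not api_key:
        # Fail early instead of at the first API call.
        raise RuntimeError("GEMINI_API_KEY not found")

    batch_size = int(env.get("BATCH_SIZE", "25"))
    if not 1 <= batch_size <= 100:
        raise ValueError("BATCH_SIZE must be between 1 and 100")

    return Settings(
        gemini_api_key=api_key,
        target_language=env.get("TARGET_LANGUAGE", "German"),
        batch_size=batch_size,
        batch_delay=float(env.get("BATCH_DELAY", "0.5")),
        max_file_size_mb=int(env.get("MAX_FILE_SIZE_MB", "10")),
        file_cleanup_hours=int(env.get("FILE_CLEANUP_HOURS", "1")),
    )
```

Calling `load_settings()` with no argument reads from the process environment, so the values from .env must already have been loaded into it by whatever mechanism the application uses.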

3. Run the Application

Option A: Python directly

python app.py

Option B: Windows batch file

StartTranslator.bat

The server will start at http://127.0.0.1:5000

Usage

  1. Open http://127.0.0.1:5000 in your browser
  2. Drag and drop a .docx file (or click to select)
  3. Click "Translate & Download"
  4. Wait 10-30 seconds (depending on document size)
  5. The translated document downloads automatically

Configuration Options

Edit the .env file to customize:

| Variable | Description | Default |
|---|---|---|
| GEMINI_API_KEY | Your Google Gemini API key | (required) |
| TARGET_LANGUAGE | Translation target language | German |
| BATCH_SIZE | Segments per API call (1-100) | 25 |
| BATCH_DELAY | Delay between batches (seconds) | 0.5 |
| MAX_FILE_SIZE_MB | Maximum upload size (MB) | 10 |
| FILE_CLEANUP_HOURS | Auto-delete files after this many hours | 1 |
| FLASK_DEBUG | Enable Flask debug mode | True |
| FLASK_PORT | Server port | 5000 |
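
To show how BATCH_SIZE and BATCH_DELAY interact, here is a rough sketch of a batching loop. The function names are hypothetical, and `translate_batch` stands in for whatever call the project makes to the Gemini API; this is not the code in translator_core.py.

```python
import time


def iter_batches(segments, batch_size=25):
    """Yield consecutive slices of at most batch_size segments."""
    for start in range(0, len(segments), batch_size):
        yield segments[start:start + batch_size]


def translate_all(segments, translate_batch, batch_size=25,
                  batch_delay=0.5, sleep=time.sleep):
    """Translate segments batch by batch, pausing between API calls.

    translate_batch is a placeholder callable: it takes a list of strings
    and returns a list of translated strings in the same order.
    """
    translated = []
    batches = list(iter_batches(segments, batch_size))
    for index, batch in enumerate(batches):
        translated.extend(translate_batch(batch))
        # Pause between batches, but not after the last one.
        if index < len(batches) - 1:
            sleep(batch_delay)
    return translated
```

In this sketch, a larger BATCH_SIZE means fewer API calls per document, while BATCH_DELAY spaces the calls out, which can help if the API rate-limits requests.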

Supported Languages

Currently configured for German, but you can change TARGET_LANGUAGE to:

  • French
  • Spanish
  • Italian
  • Portuguese
  • Dutch
  • Any language supported by Google Gemini

Error Recovery

If translation fails mid-process:

  1. Checkpoint files (.checkpoint, .tmp) are created automatically
  2. Re-running translation will resume from the last successful batch
  3. Checkpoint files are cleaned up after successful completion
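
The steps above can be sketched roughly as follows. This is an assumption about how such a checkpoint system could work, not the project's actual implementation: the JSON file layout, the function names, and the use of a .tmp file for atomic writes are all hypothetical.

```python
import json
import os


def load_checkpoint(path):
    """Return the translations saved so far, or an empty list."""
    if not os.path.exists(path):
        return []
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def save_checkpoint(path, translated):
    """Write to a .tmp file first, then swap it in atomically."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as handle:
        json.dump(translated, handle)
    os.replace(tmp_path, path)


def translate_with_resume(segments, translate_batch, checkpoint_path,
                          batch_size=25):
    """Translate segments, resuming after the last checkpointed batch."""
    translated = load_checkpoint(checkpoint_path)
    # Skip the segments that were already translated in a previous run.
    for start in range(len(translated), len(segments), batch_size):
        translated.extend(translate_batch(segments[start:start + batch_size]))
        save_checkpoint(checkpoint_path, translated)
    # Success: the checkpoint is no longer needed.
    if os.path.exists(checkpoint_path):
        os.remove(checkpoint_path)
    return translated
```

If `translate_batch` raises partway through, the checkpoint file keeps every batch completed before the failure, and calling `translate_with_resume` again with the same path continues from there.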

Project Structure

AutomatedDocxTranslator/
├── app.py                  # Flask web application
├── translator_core.py      # Core translation logic
├── config.py               # Centralized configuration
├── .env                    # Environment variables (not in git)
├── requirements.txt        # Python dependencies
├── templates/
│   └── index.html         # Web UI
├── validators.py          # Document validation
├── tests/                 # Unit tests
├── uploads/               # Temporary uploaded files
└── downloads/             # Temporary translated files

Security Notes

  • Never commit the .env file to version control (it is already listed in .gitignore)
  • API keys are loaded from environment variables only
  • Uploaded files are automatically deleted after configured hours
  • Maximum file size is enforced (default 10MB)
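
As a sketch of what upload validation could look like, the following checks the extension, the size limit, and that the file is really a .docx package (a ZIP archive containing word/document.xml). The `validate_docx` function is hypothetical and is not taken from the project's validators.py.

```python
import os
import zipfile


def validate_docx(path, max_file_size_mb=10):
    """Raise ValueError if path does not look like an acceptable .docx file."""
    if not path.lower().endswith(".docx"):
        raise ValueError("only .docx files are accepted")

    size = os.path.getsize(path)
    if size > max_file_size_mb * 1024 * 1024:
        raise ValueError(f"file exceeds {max_file_size_mb} MB limit")

    # A .docx file is a ZIP archive; reject anything that is not.
    if not zipfile.is_zipfile(path):
        raise ValueError("file is not a valid .docx archive")

    with zipfile.ZipFile(path) as archive:
        # The main document body lives in word/document.xml.
        if "word/document.xml" not in archive.namelist():
            raise ValueError("archive has no word/document.xml part")
```

Checking the archive contents rather than only the file extension guards against renamed files that would otherwise fail later during translation.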

Troubleshooting

"GEMINI_API_KEY not found" error:

  • Make sure the .env file exists in the project root
  • Check that GEMINI_API_KEY is set correctly

Translation fails mid-process:

  • Check your internet connection
  • Verify API key is valid
  • Look for checkpoint files (.checkpoint); if they exist, re-running the translation resumes from the last successful batch

Files not cleaning up:

  • Verify FILE_CLEANUP_HOURS in .env
  • Check console for [CLEANUP] messages
  • The cleanup scheduler runs every N hours, so files are not deleted immediately
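
To make the cleanup behavior concrete, here is a minimal sketch of an age-based cleanup pass. The function name, the log message wording after the [CLEANUP] prefix, and the use of file modification time are assumptions for illustration, not the project's actual scheduler code.

```python
import os
import time


def cleanup_old_files(folder, max_age_hours=1, now=None):
    """Delete files in folder older than max_age_hours; return their names."""
    now = time.time() if now is None else now
    cutoff = now - max_age_hours * 3600
    removed = []
    for name in os.listdir(folder):
        path = os.path.join(folder, name)
        # Only plain files are considered; subdirectories are left alone.
        if os.path.isfile(path) and os.path.getmtime(path) < cutoff:
            os.remove(path)
            removed.append(name)
            print(f"[CLEANUP] removed {path}")
    return sorted(removed)
```

A pass like this only removes files that are already older than the limit when it runs, which is one reason a file can outlive FILE_CLEANUP_HOURS for a while before it disappears.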

Testing

pytest tests/ -v

License

MIT License
