- Added a modern, user-friendly GUI built with customtkinter
- Features include:
  - Theme switching (Light/Dark/System)
  - Real-time preview of processed pages
  - Live scanning animation during OCR
  - Resource monitoring (CPU, Memory, Threads)
  - Progress tracking with time estimates
  - Detailed processing log
  - Pause/Cancel functionality
- PDF file support with automatic image conversion
- Direct PDF file processing support
- Automatic conversion of PDF pages to images
- Maintains original PDF structure in output
- Creates corpus documents with metadata
- Intelligent page naming and organization
- Real-time system resource monitoring
- Automatic cleanup of temporary files
- Image caching for better performance
- Memory-efficient processing of large files
- Added comprehensive type hints
- Enhanced documentation with detailed docstrings
- Removed dead code and unused functions
- Improved error handling and recovery
- Real-time processing feedback
- Progress bars for both PDF conversion and OCR
- Estimated time remaining calculations
- File information display
- Status messages with emoji indicators
- Modular design with clear separation of concerns
- New classes:
  - ResourceMonitor: System resource tracking
  - ScanAnimation: Visual processing feedback
  - OCRApp: Main GUI application
- Better state management and cleanup
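The "Estimated time remaining calculations" item above can be illustrated with a small sketch. This is not taken from the project's code: the function names `estimate_remaining_seconds` and `format_eta` are hypothetical, and the average-time-per-page approach is an assumption about how such an estimate might be computed.

```python
from __future__ import annotations


def estimate_remaining_seconds(
    pages_done: int, pages_total: int, elapsed_seconds: float
) -> float | None:
    """Estimate the seconds left from the average time per finished page.

    Returns None while no page has finished, because there is nothing
    to average yet.
    """
    if pages_done <= 0 or pages_total <= 0:
        return None
    remaining_pages = max(pages_total - pages_done, 0)
    return elapsed_seconds / pages_done * remaining_pages


def format_eta(seconds: float | None) -> str:
    """Render an estimate as 'Xm YYs' for a status label (hypothetical format)."""
    if seconds is None:
        return "estimating..."
    minutes, secs = divmod(int(round(seconds)), 60)
    return f"{minutes}m {secs:02d}s"
```

A progress callback could call `estimate_remaining_seconds` after each page and pass the result to `format_eta`. A simple average reacts slowly when page complexity varies, so a moving average may be a better fit for long documents.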
OCR/
├── images/ # Input images directory
├── raw_texts/ # Raw OCR output directory
├── texts/ # Cleaned OCR output directory
├── __init__.py # Package initialization
├── auth.py # Google Drive authentication and service setup
├── cli.py # Command line interface and argument parsing
├── config.py # Configuration classes and constants
├── credentials.json # Google API credentials (user-provided)
├── logger.py # Logging utilities with colored output
├── main.py # Main entry point
├── ocr_processor.py # Core OCR processing logic
├── PROJECT_STRUCTURE.md # Architecture documentation
├── README.md # User documentation
├── text_processor.py # Text cleaning and combination utilities
├── gui.py # New GUI interface
├── token.json # OAuth token (auto-generated)
└── output/ # Organized output directory
    └── {pdf_name}/ # Separate directory for each PDF
        ├── page_*.jpg # Extracted pages
        └── corpus.txt # Combined output with metadata
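
The per-PDF output layout shown in the tree above can be sketched with `pathlib`. The helper names below are hypothetical, and the zero-padded page numbering is an assumption; the tree only specifies the `page_*.jpg` pattern and `corpus.txt`.

```python
from pathlib import Path


def page_image_path(output_root: Path, pdf_path: Path, page_number: int) -> Path:
    """Return output/{pdf_name}/page_NNN.jpg for a 1-based page number.

    Zero padding (page_001.jpg) is an assumption; it keeps files in page
    order when they are sorted by name.
    """
    return output_root / pdf_path.stem / f"page_{page_number:03d}.jpg"


def corpus_path(output_root: Path, pdf_path: Path) -> Path:
    """Return output/{pdf_name}/corpus.txt for the combined text."""
    return output_root / pdf_path.stem / "corpus.txt"
```

For example, `page_image_path(Path("output"), Path("scans/report.pdf"), 7)` resolves to `output/report/page_007.jpg`.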
- Processing Flow
  - Original: Single image files → OCR → Text files
  - New: PDF files → Image conversion → OCR → Corpus creation
- Output Organization
  - Original: Flat directory structure
  - New: Hierarchical organization by PDF source
- User Interaction
  - Original: Command-line only
  - New: Both GUI and CLI interfaces
- Progress Tracking
  - Original: Basic console output
  - New: Visual progress bars, time estimates, preview
- Fixed-size preview panel (400x500 pixels)
- Responsive layout with grid system
- Theme-aware interface elements
- Resource-efficient image handling
- Automatic aspect ratio maintenance
- Smooth scanning animation
- Parallel processing capabilities
- Automatic error recovery
- Session persistence
- Improved Google API authentication flow
- Better memory management
- Theme selection
- Output directory customization
- Processing pause/resume
- Preview options
- Resource monitoring toggles
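
The fixed-size preview panel (400x500 pixels) and automatic aspect ratio maintenance listed above come down to a small piece of arithmetic. The sketch below is illustrative only: `fit_within` is a hypothetical helper, not a function from `gui.py`.

```python
def fit_within(
    width: int, height: int, max_width: int = 400, max_height: int = 500
) -> tuple[int, int]:
    """Scale (width, height) to fit inside the panel, keeping aspect ratio.

    The smaller of the two scale factors is used so neither dimension
    overflows. Images smaller than the panel are scaled up as well.
    """
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    scale = min(max_width / width, max_height / height)
    return max(1, round(width * scale)), max(1, round(height * scale))
```

An A4 page scanned at 300 DPI (2480x3508 pixels) would be shown at 353x500, while a landscape 800x500 image would be shown at 400x250.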
The new version maintains full compatibility with the original command-line interface while adding the GUI capabilities. All existing scripts and workflows will continue to work as before.
- PDF Processing
  - Large PDFs may require significant memory
  - Processing time increases with page count
- GUI Performance
  - Preview generation may slow with very large images
  - Resource monitoring adds minimal overhead
- Planned Enhancements
  - Drag-and-drop support
  - Batch processing improvements
  - Advanced PDF handling options
  - Custom OCR region selection
  - Export format options
- Under Consideration
  - Multi-language support
  - Cloud storage integration
  - Automated testing suite
  - Plugin system for extensions
Initial release with:
- Command-line interface
- Basic OCR processing
- Text cleaning and combination
- Google Drive API integration
- Basic error handling
- Configuration options
- OCR Processing: Extract text from images using Google Drive API
- Multiple Format Support: JPG, JPEG, PNG, GIF, BMP, TIFF
- Text Cleaning: Automatic removal of metadata and cleaning
- Text Combination: Flexible combining with or without headers
- Batch Processing: Process multiple images efficiently
- Modular Architecture: Clean separation of concerns across 8 modules
- Enhanced CLI: Comprehensive command-line interface with help
- Smart Logging: Colored output with configurable verbosity
- Error Handling: Robust error handling with detailed reporting
- Progress Tracking: Real-time progress indicators
- Clean Output: No duplicate logging messages
- Verbose Control: --verbose flag for detailed information
- File Logging: Optional persistent logging to file
- OAuth Integration: Seamless Google Drive authentication
- Duplicate Detection: Automatic filtering of duplicate files
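
The format support and duplicate detection items above could be combined roughly as follows. This is a sketch under assumptions: the release notes do not say how duplicates are detected, so comparing SHA-256 content hashes is a stand-in technique, and `find_unique_images` is a hypothetical name.

```python
import hashlib
from pathlib import Path

# Extensions taken from the "Multiple Format Support" item.
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".bmp", ".tiff"}


def find_unique_images(folder: Path) -> list[Path]:
    """Return supported image files in name order, skipping duplicates.

    Two files count as duplicates when their bytes hash to the same
    SHA-256 digest (an assumption, not the project's documented rule).
    """
    seen: set[str] = set()
    unique: list[Path] = []
    for path in sorted(folder.iterdir()):
        if not path.is_file() or path.suffix.lower() not in SUPPORTED_EXTENSIONS:
            continue
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        if digest in seen:
            continue  # same content as an earlier file
        seen.add(digest)
        unique.append(path)
    return unique
```

Reading each file fully into memory is acceptable for typical scans; very large images would be better hashed in chunks.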
OCR/
├── images/ # Input images directory
├── raw_texts/ # Raw OCR output directory
├── texts/ # Cleaned OCR output directory
├── __init__.py # Package initialization
├── auth.py # Google Drive authentication and service setup
├── cli.py # Command line interface and argument parsing
├── config.py # Configuration classes and constants
├── credentials.json # Google API credentials (user-provided)
├── logger.py # Logging utilities with colored output
├── main.py # Main entry point
├── ocr_processor.py # Core OCR processing logic
├── PROJECT_STRUCTURE.md # Architecture documentation
├── README.md # User documentation
├── text_processor.py # Text cleaning and combination utilities
└── token.json # OAuth token (auto-generated)
- --credentials PATH: Custom credentials file path
- --no-combine-texts: Skip text combination
- --combine-raw: Include raw text combination
- --include-headers: Add file headers to combined output
- --extensions LIST: Specify supported image formats
- --verbose: Enable detailed logging
- --enable-file-logging: Create persistent log files
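
The options above map naturally onto `argparse`. The sketch below is not the contents of `cli.py`: the flag names come from the list, but the defaults, the help strings, and the comma-separated format for `--extensions` are assumptions.

```python
import argparse


def parse_extensions(value: str) -> list[str]:
    """Split 'jpg,png' into ['jpg', 'png'] (comma-separated format is assumed)."""
    return [part.strip().lower().lstrip(".") for part in value.split(",") if part.strip()]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="OCR images via the Google Drive API")
    parser.add_argument("--credentials", metavar="PATH", default="credentials.json",
                        help="custom credentials file path (default is assumed)")
    parser.add_argument("--no-combine-texts", action="store_true",
                        help="skip text combination")
    parser.add_argument("--combine-raw", action="store_true",
                        help="include raw text combination")
    parser.add_argument("--include-headers", action="store_true",
                        help="add file headers to combined output")
    parser.add_argument("--extensions", metavar="LIST", type=parse_extensions,
                        default=["jpg", "jpeg", "png", "gif", "bmp", "tiff"],
                        help="supported image formats, comma-separated")
    parser.add_argument("--verbose", action="store_true",
                        help="enable detailed logging")
    parser.add_argument("--enable-file-logging", action="store_true",
                        help="create persistent log files")
    return parser
```

With this sketch, `build_parser().parse_args(["--verbose", "--extensions", "jpg,png"])` yields `verbose=True` and `extensions=["jpg", "png"]`, with every other flag left at its default.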
- Install Python dependencies:

      uv python install 3.11
      uv venv --python 3.11
      uv sync

- Set up Google Drive API credentials:
  - Follow the Google Drive API Quickstart
  - Download credentials.json to the project directory

- Run the application:

      uv run main.py

- Success Rate: High accuracy OCR processing
- Error Recovery: Graceful handling of API failures
- Memory Efficient: Streaming file processing
- Scalable: Handles multiple files efficiently
- None reported in this initial release
- Python 3.11+
- Google Drive API credentials
- Internet connection for API access
For issues, questions, or contributions, refer to the README.md and PROJECT_STRUCTURE.md documentation.
Release Date: June 17, 2025
Version: 1.0.0
Build: Stable
License: MIT (if applicable)