Skip to content

Latest commit

 

History

History
265 lines (208 loc) · 8.71 KB

File metadata and controls

265 lines (208 loc) · 8.71 KB

OCR Text Extractor - Changelog

Version 1.1.0 (Latest)

Major New Features

1. Graphical User Interface (GUI)

  • Added a modern, user-friendly GUI interface using customtkinter
  • Features include:
    • Theme switching (Light/Dark/System)
    • Real-time preview of processed pages
    • Live scanning animation during OCR
    • Resource monitoring (CPU, Memory, Threads)
    • Progress tracking with time estimates
    • Detailed processing log
    • Pause/Cancel functionality
    • PDF file support with automatic image conversion

2. PDF Processing Capabilities

  • Direct PDF file processing support
  • Automatic conversion of PDF pages to images
  • Maintains original PDF structure in output
  • Creates corpus documents with metadata
  • Intelligent page naming and organization

3. Enhanced Resource Management

  • Real-time system resource monitoring
  • Automatic cleanup of temporary files
  • Image caching for better performance
  • Memory-efficient processing of large files

Technical Improvements

1. Code Quality

  • Added comprehensive type hints
  • Enhanced documentation with detailed docstrings
  • Removed dead code and unused functions
  • Improved error handling and recovery

2. User Experience

  • Real-time processing feedback
  • Progress bars for both PDF conversion and OCR
  • Estimated time remaining calculations
  • File information display
  • Status messages with emoji indicators

3. Architecture Enhancements

  • Modular design with clear separation of concerns
  • New classes:
    • ResourceMonitor: System resource tracking
    • ScanAnimation: Visual processing feedback
    • OCRApp: Main GUI application
  • Better state management and cleanup

File Structure Updates

OCR/
├── images/               # Input images directory
├── raw_texts/            # Raw OCR output directory
├── texts/                # Cleaned OCR output directory
├── __init__.py            # Package initialization
├── auth.py                # Google Drive authentication and service setup
├── cli.py                 # Command line interface and argument parsing
├── config.py              # Configuration classes and constants
├── credentials.json       # Google API credentials (user-provided)
├── logger.py              # Logging utilities with colored output
├── main.py                 # Main entry point
├── ocr_processor.py       # Core OCR processing logic
├── PROJECT_STRUCTURE.md   # Architecture documentation
├── README.md              # User documentation
├── text_processor.py      # Text cleaning and combination utilities
├── gui.py                # New GUI interface
└── token.json            # OAuth token (auto-generated)
└── output/              # Organized output directory
    └── {pdf_name}/     # Separate directory for each PDF
        ├── page_*.jpg  # Extracted pages
        └── corpus.txt  # Combined output with metadata

Changes from Original Design

  1. Processing Flow

    • Original: Single image files → OCR → Text files
    • New: PDF files → Image conversion → OCR → Corpus creation
  2. Output Organization

    • Original: Flat directory structure
    • New: Hierarchical organization by PDF source
  3. User Interaction

    • Original: Command-line only
    • New: Both GUI and CLI interfaces
  4. Progress Tracking

    • Original: Basic console output
    • New: Visual progress bars, time estimates, preview

Technical Details

GUI Features

  • Fixed-size preview panel (400x500 pixels)
  • Responsive layout with grid system
  • Theme-aware interface elements
  • Resource-efficient image handling
  • Automatic aspect ratio maintenance
  • Smooth scanning animation

Processing Enhancements

  • Parallel processing capabilities
  • Automatic error recovery
  • Session persistence
  • Improved Google API authentication flow
  • Better memory management

New Configuration Options

  • Theme selection
  • Output directory customization
  • Processing pause/resume
  • Preview options
  • Resource monitoring toggles

Migration Notes

The new version maintains full compatibility with the original command-line interface while adding the GUI capabilities. All existing scripts and workflows will continue to work as before.

Known Limitations

  1. PDF Processing

    • Large PDFs may require significant memory
    • Processing time increases with page count
  2. GUI Performance

    • Preview generation may slow with very large images
    • Resource monitoring adds minimal overhead

Future Plans

  1. Planned Enhancements

    • Drag-and-drop support
    • Batch processing improvements
    • Advanced PDF handling options
    • Custom OCR region selection
    • Export format options
  2. Under Consideration

    • Multi-language support
    • Cloud storage integration
    • Automated testing suite
    • Plugin system for extensions

Version 1.0.0 (Previous)

Initial release with:

  • Command-line interface
  • Basic OCR processing
  • Text cleaning and combination
  • Google Drive API integration
  • Basic error handling
  • Configuration options

🚀 Key Features

Core Functionality

  • OCR Processing: Extract text from images using Google Drive API
  • Multiple Format Support: JPG, JPEG, PNG, GIF, BMP, TIFF
  • Text Cleaning: Automatic removal of metadata and cleaning
  • Text Combination: Flexible combining with or without headers
  • Batch Processing: Process multiple images efficiently

Technical Features

  • Modular Architecture: Clean separation of concerns across 8 modules
  • Enhanced CLI: Comprehensive command-line interface with help
  • Smart Logging: Colored output with configurable verbosity
  • Error Handling: Robust error handling with detailed reporting
  • Progress Tracking: Real-time progress indicators

User Experience

  • Clean Output: No duplicate logging messages
  • Verbose Control: --verbose flag for detailed information
  • File Logging: Optional persistent logging to file
  • OAuth Integration: Seamless Google Drive authentication
  • Duplicate Detection: Automatic filtering of duplicate files

📁 Project Structure

OCR/
├── images/               # Input images directory
├── raw_texts/            # Raw OCR output directory
├── texts/                # Cleaned OCR output directory
├── __init__.py            # Package initialization
├── auth.py                # Google Drive authentication and service setup
├── cli.py                 # Command line interface and argument parsing
├── config.py              # Configuration classes and constants
├── credentials.json       # Google API credentials (user-provided)
├── logger.py              # Logging utilities with colored output
├── main.py                 # Main entry point
├── ocr_processor.py       # Core OCR processing logic
├── PROJECT_STRUCTURE.md   # Architecture documentation
├── README.md              # User documentation
├── text_processor.py      # Text cleaning and combination utilities
└── token.json            # OAuth token (auto-generated)

🎯 Command Line Options

  • --credentials PATH: Custom credentials file path
  • --no-combine-texts: Skip text combination
  • --combine-raw: Include raw text combination
  • --include-headers: Add file headers to combined output
  • --extensions LIST: Specify supported image formats
  • --verbose: Enable detailed logging
  • --enable-file-logging: Create persistent log files

🔧 Installation & Setup

  1. Install Python dependencies:

    uv python install 3.11
    uv venv --python 3.11
    uv sync
  2. Set up Google Drive API credentials:

  3. Run the application:

    uv run main.py

📊 Performance & Reliability

  • Success Rate: High accuracy OCR processing
  • Error Recovery: Graceful handling of API failures
  • Memory Efficient: Streaming file processing
  • Scalable: Handles multiple files efficiently

🐛 Known Issues

  • None reported in this initial release

📋 Requirements

  • Python 3.11+
  • Google Drive API credentials
  • Internet connection for API access

📞 Support

For issues, questions, or contributions, refer to the README.md and PROJECT_STRUCTURE.md documentation.


Release Date: June 17, 2025
Version: 1.0.0
Build: Stable
License: MIT (if applicable)