Receipt OCR test #184
Pull request overview
This PR implements a Receipt OCR prototype that extracts and structures receipt information from images and PDFs using Google Cloud Vision OCR and Gemini AI. The implementation includes data models for organizing parsed receipt data and a Jupyter notebook demonstrating the functionality.
Changes:
- Added Jupyter notebook with OCR text detection and AI-powered receipt parsing
- Defined Pydantic models for structured receipt data (items, summary, metadata)
- Included sample DigiKey PDF receipt for testing
Reviewed changes
Copilot reviewed 1 out of 2 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| tests/ocr_demo.ipynb | Jupyter notebook implementing OCR detection and Gemini AI parsing with data models |
| tests/1DigiKey.pdf | Sample PDF receipt file for testing the OCR functionality |
| "\n", | ||
| "import re\n", | ||
| "import json\n", | ||
| "import pdfplumber\n", |
The import 'pdfplumber' on line 21 is unused throughout the notebook. Consider removing it to keep dependencies minimal.
| "import pdfplumber\n", |
| "class ReceiptData(BaseModel):\n", | ||
| " store: str\n", | ||
| " order_number: Optional[str]\n", | ||
| " date: Optional[str]\n", | ||
| " currency: str\n", | ||
| " items: List[ReceiptItem]\n", | ||
| " summary: ReceiptSummary\n", | ||
| "\n", | ||
| "class ReceiptSummary(BaseModel):\n", | ||
| " number_of_items: int\n", | ||
| " subtotal: float\n", | ||
| " discount: float\n", | ||
| " delivery_fee: float\n", | ||
| " service_fee: float\n", | ||
| " tax: float\n", | ||
| " tip: float\n", | ||
| " total: float" |
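The class ReceiptSummary is referenced before it is defined. ReceiptData uses ReceiptSummary on line 48, but ReceiptSummary is not defined until lines 50-58. Unless annotation evaluation is deferred (for example with `from __future__ import annotations`), Python evaluates the annotation while the ReceiptData class body runs, so this raises a NameError at class definition time, before any instance is created. Define ReceiptSummary before ReceiptData.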
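The reordering this comment asks for can be sketched as follows. This is a minimal illustration, not the notebook's code: it uses the standard-library `dataclasses` module as a stand-in for Pydantic's `BaseModel` so it runs without third-party packages, and the `ReceiptItem` fields are hypothetical because that class is not shown in the quoted diff.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ReceiptItem:
    # Hypothetical fields: ReceiptItem is not shown in the quoted diff.
    name: str
    price: float


@dataclass
class ReceiptSummary:
    # Defined first so the name exists when ReceiptData refers to it.
    number_of_items: int
    subtotal: float
    discount: float
    delivery_fee: float
    service_fee: float
    tax: float
    tip: float
    total: float


@dataclass
class ReceiptData:
    store: str
    order_number: Optional[str]
    date: Optional[str]
    currency: str
    items: List[ReceiptItem]
    summary: ReceiptSummary  # resolves because ReceiptSummary is already defined
```

The same ordering applies unchanged if the classes inherit from `BaseModel` instead of using `@dataclass`.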
| "outputs": [], | ||
| "source": [ | ||
| "load_dotenv()\n", | ||
| "os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'ocr_demo_key.json' # generate this from Google OCR GCP service\n", |
The credentials file path 'ocr_demo_key.json' is hardcoded and assigned directly to the GOOGLE_APPLICATION_CREDENTIALS environment variable, which overrides any value already set in the environment or loaded from .env. Document the expected path in a README or configuration guide, and add the file itself to .gitignore to prevent accidentally committing credentials. Consider a more flexible configuration approach that supports different environments.
| "os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'ocr_demo_key.json' # generate this from Google OCR GCP service\n", | |
| "google_creds = os.getenv('GOOGLE_APPLICATION_CREDENTIALS')\n", | |
| "if not google_creds:\n", | |
| " raise RuntimeError(\n", | |
| " 'GOOGLE_APPLICATION_CREDENTIALS is not set. Please set it to your Google Cloud '\n", | |
| " 'credentials JSON path via environment variable or .env file.'\n", | |
| " )\n", |
| "metadata": {}, | ||
| "outputs": [], | ||
| "source": [ | ||
| "api_key = os.getenv('GEMINI_API_KEY') # generate gemini api key\n", |
The GEMINI_API_KEY is loaded from environment variables without validation. If the key is missing or invalid, the code will fail later during the API call with a less clear error message. Add validation to check if the API key exists and provide a helpful error message if it's missing.
| "api_key = os.getenv('GEMINI_API_KEY') # generate gemini api key\n", | |
| "api_key = os.getenv('GEMINI_API_KEY') # generate gemini api key\n", | |
| "if not api_key:\n", | |
| " raise RuntimeError(\n", | |
| " \"GEMINI_API_KEY environment variable is not set. \"\n", | |
| " \"Please set it (for example in your environment or .env file) before running this notebook.\"\n", | |
| " )\n", |
| "os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'ocr_demo_key.json' # generate this from Google OCR GCP service\n", | ||
| "WORD = re.compile(r\"\\w+\")" |
The variable WORD is defined but never used in the notebook. Remove unused imports and variables to keep the code clean.
| "os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'ocr_demo_key.json' # generate this from Google OCR GCP service\n", | |
| "WORD = re.compile(r\"\\w+\")" | |
| "os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'ocr_demo_key.json' # generate this from Google OCR GCP service\n" |
User description
What is this issue for and how does it solve it
Link to the GitHub Issue
#94
PR Type
Enhancement, Tests
Description
- Implemented Receipt OCR prototype using Google Vision API
- Integrated Gemini AI for structured receipt data extraction
- Created Pydantic models for receipt data organization
- Developed Jupyter notebook demonstrating end-to-end OCR workflow
File Walkthrough
ocr_demo.ipynb
Complete Receipt OCR implementation notebook (tests/ocr_demo.ipynb)
- ReceiptItem, ReceiptData, and ReceiptSummary Pydantic models for structured receipt parsing
- detect_text() function to extract text from images and multi-page PDFs using Google Cloud Vision API
- parse_result() function to process extracted text through Gemini 2.5 Flash with JSON schema validation
- with performance timing
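As a rough illustration of the validation and timing steps listed for the notebook, the sketch below decodes a JSON reply, checks its top-level keys against the ReceiptData fields quoted in the review, and measures elapsed time. It does not call Gemini or reproduce the notebook's `parse_result()`; `parse_json_reply`, `timed`, and `REQUIRED_KEYS` are hypothetical names, and the key check is a simplified stand-in for Pydantic schema validation.

```python
import json
import time
from typing import Any, Callable, Dict, Tuple

# Top-level keys taken from the ReceiptData model quoted in the review.
REQUIRED_KEYS = {"store", "order_number", "date", "currency", "items", "summary"}


def parse_json_reply(reply_text: str) -> Dict[str, Any]:
    """Decode a model reply and check that the expected top-level keys exist."""
    data = json.loads(reply_text)
    if not isinstance(data, dict):
        raise ValueError("Expected a JSON object at the top level.")
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"Reply is missing keys: {sorted(missing)}")
    return data


def timed(func: Callable[..., Any], *args: Any) -> Tuple[Any, float]:
    """Run func(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start
```

For example, `data, seconds = timed(parse_json_reply, reply_text)` returns the decoded receipt together with how long the parsing step took.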