Receipt OCR test #184

Open
ian-yeh wants to merge 3 commits into main from add-ocr-test

Conversation

ian-yeh commented Jan 22, 2026

Collaborator

User description

What is this issue for and how does it solve it

  • Implemented Receipt OCR prototype that parses text from receipt images (.png, .jpg, .pdf)
  • Integrated Google OCR and Gemini AI to extract and structure receipt information
  • Created ReceiptData class to organize parsed data in OOP/JSON format

Link to the Github Issue

#94


PR Type

Enhancement, Tests


Description

  • Implemented Receipt OCR prototype using Google Vision API

  • Integrated Gemini AI for structured receipt data extraction

  • Created Pydantic models for receipt data organization

  • Developed Jupyter notebook demonstrating end-to-end OCR workflow


Diagram Walkthrough

flowchart LR
  A["Receipt Image/PDF"] -- "Google Vision OCR" --> B["Extracted Text"]
  B -- "Gemini 2.5 Flash" --> C["Structured JSON"]
  C -- "Pydantic Models" --> D["ReceiptData Object"]

File Walkthrough

Relevant files
Tests
ocr_demo.ipynb
Complete Receipt OCR implementation notebook                         

tests/ocr_demo.ipynb

  • Defined ReceiptItem, ReceiptData, and ReceiptSummary Pydantic models
    for structured receipt parsing
  • Implemented detect_text() function to extract text from images and
    multi-page PDFs using Google Cloud Vision API
  • Created parse_result() function to process extracted text through
    Gemini 2.5 Flash with JSON schema validation
  • Provided demonstration cells for testing OCR workflow on receipt files
    with performance timing
+309/-0 

ian-yeh requested a review from rajpandya737 as a code owner January 22, 2026 02:48
Copilot AI review requested due to automatic review settings January 22, 2026 02:48
qodo-code-review Bot commented Jan 22, 2026

Contributor

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance
Credential handling

Description: The notebook sets os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "ocr_demo_key.json" which
can lead to accidental committing/usage of a local service account key file and encourages
insecure secret handling if ocr_demo_key.json is present in the repo or shared artifacts.
ocr_demo.ipynb [68-71]

Referred Code
 "load_dotenv()\n",
 "os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'ocr_demo_key.json' # generate this from Google OCR GCP service\n",
 "WORD = re.compile(r\"\\w+\")"
]
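
One way to address the credential-handling finding is to read the key path from the environment and fail early when it is missing. The sketch below is a minimal illustration; the helper name resolve_credentials_path is hypothetical and does not appear in the notebook.

```python
import os
from pathlib import Path


def resolve_credentials_path(env_var: str = "GOOGLE_APPLICATION_CREDENTIALS") -> Path:
    """Return the credentials path from the environment, or fail early.

    Hypothetical helper: the notebook under review does not define it.
    """
    raw_value = os.environ.get(env_var)
    if not raw_value:
        raise RuntimeError(f"{env_var} is not set; export it before starting the notebook.")
    credentials_path = Path(raw_value).expanduser()
    if not credentials_path.is_file():
        # Report the path only; never read or echo the key file contents.
        raise FileNotFoundError(f"{env_var} points to a missing file: {credentials_path}")
    return credentials_path
```

Calling the helper in the first notebook cell keeps the key location out of the committed source while still surfacing a clear error when the variable is absent.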
Sensitive data exposure

Description: The notebook prints raw OCR output and structured parsed results (print(result) /
print(output)), which can expose sensitive receipt contents (names, addresses, order
numbers, payment details) in CI logs, shared notebook outputs, or screenshots.
ocr_demo.ipynb [210-243]

Referred Code
  "result = detect_text(path)\n",
  "print(result)"
 ]
},
{
 "cell_type": "code",
 "execution_count": null,
 "id": "8fd066ce-04a5-4785-99ce-545490eacb1d",
 "metadata": {},
 "outputs": [],
 "source": [
  "import time\n",
  "\n",
  "start = time.perf_counter()\n",
  "\n",
  "output = parse_result(result)\n",
  "\n",
  "end = time.perf_counter()\n",
  "print(f\"Parsing took {end - start:.4f} seconds\")"
 ]
},


 ... (clipped 13 lines)
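
If the demonstration cells still need to show some output, one option is to mask obvious identifiers before printing. The sketch below is illustrative only: the redact function and its two patterns are assumptions, not part of the notebook, and they do not amount to a complete PII detector (for example, card numbers written with spaces are not caught).

```python
import re

# Illustrative patterns (assumptions); not an exhaustive PII detector.
_EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
_LONG_DIGITS = re.compile(r"\d{6,}")


def _mask_digits(match: re.Match) -> str:
    """Keep the last four digits and replace the rest with asterisks."""
    digits = match.group()
    return "*" * (len(digits) - 4) + digits[-4:]


def redact(text: str) -> str:
    """Mask email addresses and long digit runs such as order numbers."""
    text = _EMAIL.sub("[EMAIL]", text)
    return _LONG_DIGITS.sub(_mask_digits, text)


print(redact("Order 123456789 for jane@example.com, total 42.50"))
```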
Ticket Compliance
🟡
🎫 #94
🟢 Explore a combination of OCR + an LLM to interpret OCR results into desired structured
outputs.
Create Jupyter notebooks demonstrating the models’ capabilities.
🔴 Consider whether receipt parsing is appropriate for the application, including potential
taxation implications.
Investigate using OCR and computer vision models to parse receipts and extract information
(as a primary source instead of relying on user input).
Achieve very high accuracy by identifying the best possible solution and combination of
tools.
Codebase Duplication Compliance
Codebase context is not defined

Follow the guide to enable codebase context checks.

Custom Compliance
🔴
Generic: Meaningful Naming and Self-Documenting Code

Objective: Ensure all identifiers clearly express their purpose and intent, making code
self-documenting

Status:
Generic identifiers: Added variables like path, result, and output are overly generic for receipt OCR/parse
outputs and reduce code readability/self-documentation.

Referred Code
  "path = \"./1DigiKey.pdf\""
 ]
},
{
 "cell_type": "code",
 "execution_count": null,
 "id": "480eaff3-615b-4410-a32c-5e3725823d2c",
 "metadata": {
  "collapsed": true,
  "jupyter": {
   "outputs_hidden": true
  }
 },
 "outputs": [],
 "source": [
  "result = detect_text(path)\n",
  "print(result)"
 ]
},
{
 "cell_type": "code",


 ... (clipped 11 lines)

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Robust Error Handling and Edge Case Management

Objective: Ensure comprehensive error handling that provides meaningful context and graceful
degradation

Status:
Missing validations: External inputs and dependencies are not validated (e.g., missing checks for path
existence/type and GEMINI_API_KEY presence) and OCR errors are raised as generic Exception
without structured handling.

Referred Code
  "api_key = os.getenv('GEMINI_API_KEY') # generate gemini api key\n",
  "client = genai.Client(api_key=api_key)"
 ]
},
{
 "cell_type": "code",
 "execution_count": 15,
 "id": "de95443d-6931-4612-9f37-da8defebab01",
 "metadata": {
  "jupyter": {
   "source_hidden": true
  }
 },
 "outputs": [],
 "source": [
  "def detect_text(path):\n",
  "    \"\"\"\n",
  "    Detects text in a file using Google Cloud Vision OCR. \n",
  "    Handles images and multi-page PDFs by converting PDF pages to images.\n",
  "    \"\"\"\n",
  "    client = vision.ImageAnnotatorClient()\n",


 ... (clipped 86 lines)
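
The missing path checks could be handled before any OCR call is made. The following is a sketch under stated assumptions: validate_receipt_path and UnsupportedReceiptFormat are hypothetical names, and the extension set is taken from the formats listed in the PR description.

```python
from pathlib import Path

# Extensions named in the PR description; extend if the pipeline supports more.
SUPPORTED_EXTENSIONS = {".png", ".jpg", ".pdf"}


class UnsupportedReceiptFormat(ValueError):
    """Raised when a receipt file has an extension the pipeline does not handle."""


def validate_receipt_path(path: str) -> Path:
    """Return the path as a Path object after checking existence and extension."""
    receipt_path = Path(path)
    if not receipt_path.is_file():
        raise FileNotFoundError(f"Receipt file not found: {receipt_path}")
    extension = receipt_path.suffix.lower()
    if extension not in SUPPORTED_EXTENSIONS:
        raise UnsupportedReceiptFormat(
            f"Unsupported receipt format: {extension or '(none)'}"
        )
    return receipt_path
```

Raising a dedicated exception type lets callers distinguish a bad input file from an OCR service failure, which a bare Exception does not allow.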

Generic: Secure Error Handling

Objective: To prevent the leakage of sensitive system information through error messages while
providing sufficient detail for internal debugging.

Status:
Verbose error output: The exception handler prints detailed exception information (type, str, and repr) which
can expose internal details to the notebook user instead of keeping detail in secure logs.

Referred Code
 "def parse_result(receipt_text):    \n",
 "    try:\n",
 "        response = client.models.generate_content(\n",
 "            model=\"gemini-2.5-flash\",\n",
 "            contents=receipt_text,\n",
 "            config={\n",
 "                \"system_instruction\": RECEIPT_PARSER_PROMPT,\n",
 "                \"response_mime_type\": \"application/json\",\n",
 "                \"response_schema\": ReceiptData.model_json_schema(),\n",
 "            }\n",
 "        )\n",
 "        return ReceiptData.model_validate_json(response.text)\n",
 "    except Exception as e:\n",
 "        print(f\"Error type: {type(e).__name__}\")\n",
 "        print(f\"Error message: {str(e)}\")\n",
 "        print(f\"Full error: {repr(e)}\")\n",
 "        raise"
]
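
A possible alternative to printing the exception is to send the detail to a logger and raise a generic error for the notebook user. This is a sketch, not the notebook's code: parse_safely and ReceiptParsingError are hypothetical names, and it assumes the receipt_ocr logger is configured elsewhere to write to a protected destination (with no handler configured, Python's logging falls back to stderr).

```python
import logging

logger = logging.getLogger("receipt_ocr")


class ReceiptParsingError(RuntimeError):
    """User-facing error that carries no internal detail."""


def parse_safely(parse_fn, receipt_text):
    """Call parse_fn; log the full failure internally and raise a generic error."""
    try:
        return parse_fn(receipt_text)
    except Exception as exc:
        # logger.exception records the traceback through the logging handlers.
        logger.exception("Receipt parsing failed (%s)", type(exc).__name__)
        # "from None" keeps the original exception out of the displayed traceback.
        raise ReceiptParsingError(
            "Could not parse the receipt; see logs for details."
        ) from None
```

In the notebook this would wrap the existing function, for example parse_safely(parse_result, result).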

Generic: Secure Logging Practices

Objective: To ensure logs are useful for debugging and auditing without exposing sensitive
information like PII, PHI, or cardholder data.

Status:
Sensitive data printed: The notebook prints raw OCR text and parsed receipt fields to stdout, which may include
sensitive receipt/PII data and is not structured/redacted logging.

Referred Code
  "result = detect_text(path)\n",
  "print(result)"
 ]
},
{
 "cell_type": "code",
 "execution_count": null,
 "id": "8fd066ce-04a5-4785-99ce-545490eacb1d",
 "metadata": {},
 "outputs": [],
 "source": [
  "import time\n",
  "\n",
  "start = time.perf_counter()\n",
  "\n",
  "output = parse_result(result)\n",
  "\n",
  "end = time.perf_counter()\n",
  "print(f\"Parsing took {end - start:.4f} seconds\")"
 ]
},


 ... (clipped 44 lines)
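
When the goal is only to confirm that OCR produced something, logging counts instead of content avoids echoing receipt data. A minimal sketch follows; describe_ocr_text is a hypothetical helper, not a function in the notebook.

```python
def describe_ocr_text(ocr_text: str) -> dict:
    """Return counts that describe OCR output without echoing its content."""
    lines = ocr_text.splitlines()
    return {
        "characters": len(ocr_text),
        "lines": len(lines),
        "blank_lines": sum(1 for line in lines if not line.strip()),
    }


print(describe_ocr_text("DigiKey\n\nTotal 42.50"))
```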

Generic: Security-First Input Validation and Data Handling

Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent
vulnerabilities

Status:
Hardcoded credentials path: The code hardcodes GOOGLE_APPLICATION_CREDENTIALS to ocr_demo_key.json and does not
validate/safeguard credential handling, increasing risk of insecure secret management and
accidental exposure.

Referred Code
 "load_dotenv()\n",
 "os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'ocr_demo_key.json' # generate this from Google OCR GCP service\n",
 "WORD = re.compile(r\"\\w+\")"
]

Generic: Comprehensive Audit Trails

Objective: To create a detailed and reliable record of critical system actions for security analysis
and compliance.

Status:
No audit trail: The notebook performs external OCR/LLM processing and prints outputs but does not
implement any structured logging/audit trail with user identity, timestamp, action, and
outcome.

Referred Code
"def detect_text(path):\n",
"    \"\"\"\n",
"    Detects text in a file using Google Cloud Vision OCR. \n",
"    Handles images and multi-page PDFs by converting PDF pages to images.\n",
"    \"\"\"\n",
"    client = vision.ImageAnnotatorClient()\n",
"    file_ext = Path(path).suffix.lower()\n",
"    all_text = []\n",
"\n",
"    image_contents = []\n",
"\n",
"    if file_ext == '.pdf':\n",
"        # opening PDF and iterating through all pages\n",
"        pdf_document = fitz.open(path)\n",
"        for page_num in range(len(pdf_document)):\n",
"            page = pdf_document[page_num]\n",
"\n",
"            # convert each page to an image\n",
"            matrix = fitz.Matrix(2, 2)\n",
"            pix = page.get_pixmap(matrix=matrix)\n",
"            image_contents.append(pix.tobytes(\"png\"))\n",


 ... (clipped 114 lines)
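
The audit fields named in this check (user identity, timestamp, action, outcome) could be captured as one JSON line per pipeline step. The sketch below assumes a hypothetical audit_record helper; where the line is written (file, log service) is left open.

```python
import json
from datetime import datetime, timezone


def audit_record(user: str, action: str, outcome: str, **details) -> str:
    """Build one JSON line with the fields the compliance check lists."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "action": action,
        "outcome": outcome,
        # Keep details to non-sensitive metadata such as page counts.
        "details": details,
    }
    return json.dumps(record, sort_keys=True)


print(audit_record("demo-user", "detect_text", "success", pages=2))
```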

Compliance status legend
🟢 - Fully Compliant
🟡 - Partially Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label

qodo-code-review Bot commented Jan 22, 2026

Contributor

PR Code Suggestions ✨

Explore these optional code suggestions:

Category | Suggestion | Impact
High-level
Prototype requires cost and accuracy analysis

Before this prototype is productionized, it is important to perform a
cost-benefit analysis of the paid APIs being used. Additionally, a plan to
systematically evaluate and ensure high accuracy, as required by the project
ticket, should be established.

Examples:

Solution Walkthrough:

Before:

# The current prototype defines a two-step process using paid APIs.

def detect_text(path):
    # ...
    # For each page in a PDF/image...
    response = vision_client.document_text_detection(...) # API call 1..N
    # ...
    return extracted_text

def parse_result(receipt_text):
    # ...
    response = gemini_client.generate_content(...) # API call 2
    # ...
    return structured_data

# Executed in notebook
text = detect_text("receipt.pdf")
output = parse_result(text)

After:

# The suggestion is about adding analysis, not changing code.
# The proposed next steps would involve:

# 1. Cost Analysis
# - Estimate cost per receipt (Vision calls + Gemini call).
# - Project costs based on expected volume.
# - Compare with alternative solutions.

# 2. Accuracy Evaluation
# - Create a validation dataset of receipts with ground truth.
# - for receipt in validation_dataset:
#       parsed_data = run_ocr_pipeline(receipt)
#       compare(parsed_data, ground_truth)
# - Calculate accuracy metrics for key fields (total, date, store).
# - Iterate on prompts and models to meet the "VERY accurate" requirement.

Suggestion importance[1-10]: 8

Why: The suggestion addresses critical non-functional requirements of cost and accuracy, which are essential for moving this prototype to production and are directly tied to the project's goals.
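
The accuracy-evaluation step outlined above could start from a per-field exact-match score. This is a sketch under assumptions: field_accuracy is a hypothetical function, each receipt is represented as a plain dict, and exact equality is used as the comparison even though a real evaluation may need tolerance for amounts or date formats.

```python
from collections import Counter

# Key fields named in the suggestion; the dict layout is an assumption.
KEY_FIELDS = ("total", "date", "store")


def field_accuracy(predictions: list, ground_truth: list) -> dict:
    """Return the fraction of receipts where each key field matches exactly."""
    if len(predictions) != len(ground_truth):
        raise ValueError("predictions and ground_truth must have the same length")
    if not predictions:
        raise ValueError("need at least one receipt to evaluate")
    correct = Counter()
    for predicted, expected in zip(predictions, ground_truth):
        for field in KEY_FIELDS:
            if predicted.get(field) == expected.get(field):
                correct[field] += 1
    return {field: correct[field] / len(predictions) for field in KEY_FIELDS}
```

Reporting accuracy per field rather than per receipt shows which part of the prompt or schema needs work, for instance when totals are reliable but store names are not.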

Medium
Security
Avoid hardcoding credential file paths

Remove the hardcoded GOOGLE_APPLICATION_CREDENTIALS path and add a comment
instructing users to set it as an environment variable instead.

tests/ocr_demo.ipynb [69]

-"os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'ocr_demo_key.json' # generate this from Google OCR GCP service\\n",
+"# Make sure to set the GOOGLE_APPLICATION_CREDENTIALS environment variable in your shell before running this notebook.\\n",
+"# Example: export GOOGLE_APPLICATION_CREDENTIALS=\"/path/to/your/ocr_demo_key.json\"\\n",
Suggestion importance[1-10]: 7

Why: The suggestion correctly identifies a security anti-pattern of hardcoding a credential path and proposes the standard best practice, which improves the notebook's security and portability.

Medium
Possible issue
Prevent resource leaks with context managers

Use a with statement for opening the PDF with fitz.open() to ensure the file is
closed automatically, preventing potential resource leaks.

tests/ocr_demo.ipynb [106-116]

 "    if file_ext == '.pdf':\n",
 "        # opening PDF and iterating through all pages\n",
-"        pdf_document = fitz.open(path)\n",
-"        for page_num in range(len(pdf_document)):\n",
-"            page = pdf_document[page_num]\n",
+"        with fitz.open(path) as pdf_document:\n",
+"            for page_num in range(len(pdf_document)):\n",
+"                page = pdf_document[page_num]\n",
 "\n",
-"            # convert each page to an image\n",
-"            matrix = fitz.Matrix(2, 2)\n",
-"            pix = page.get_pixmap(matrix=matrix)\n",
-"            image_contents.append(pix.tobytes(\"png\"))\n",
-"        pdf_document.close()\n",
+"                # convert each page to an image\n",
+"                matrix = fitz.Matrix(2, 2)\n",
+"                pix = page.get_pixmap(matrix=matrix)\n",
+"                image_contents.append(pix.tobytes(\"png\"))\n",
Suggestion importance[1-10]: 6

Why: The suggestion correctly identifies a potential resource leak and proposes using a with statement, which is the idiomatic and safer way to handle file resources in Python.
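
To illustrate why the with statement matters here, the sketch below uses a stand-in document class rather than the real PDF library, so FakeDocument and render_pages are hypothetical names. It shows that the document is closed even when a page fails partway through the loop.

```python
class FakeDocument:
    """Stand-in for a PDF document object; records whether it was closed."""

    def __init__(self, pages):
        self.pages = pages
        self.closed = False

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.closed = True
        return False  # do not swallow exceptions raised inside the block

    def __len__(self):
        return len(self.pages)


def render_pages(document):
    """Collect page contents; the with block closes the document on any exit."""
    rendered = []
    with document as opened:
        for page_num in range(len(opened)):
            page = opened.pages[page_num]
            if page is None:
                raise ValueError(f"page {page_num} could not be rendered")
            rendered.append(page)
    return rendered
```

With the original open/close pair, an exception inside the loop would skip the close call; the context manager removes that path.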

Low
General
Improve clarity when printing model data

Replace the loop that iterates over the output.summary object with a direct
print(output.summary) call for a more readable output.

tests/ocr_demo.ipynb [283-284]

-"for item in output.summary:\n",
-"    print(item)"
+"print(output.summary)"
Suggestion importance[1-10]: 5

Why: Iterating over a Pydantic model is possible but yields (field name, value) pairs one at a time; a single print(output.summary) call is simpler and more readable.

Low

Copilot AI left a comment

Pull request overview

This PR implements a Receipt OCR prototype that extracts and structures receipt information from images and PDFs using Google Cloud Vision OCR and Gemini AI. The implementation includes data models for organizing parsed receipt data and a Jupyter notebook demonstrating the functionality.

Changes:

  • Added Jupyter notebook with OCR text detection and AI-powered receipt parsing
  • Defined Pydantic models for structured receipt data (items, summary, metadata)
  • Included sample DigiKey PDF receipt for testing

Reviewed changes

Copilot reviewed 1 out of 2 changed files in this pull request and generated 5 comments.

File | Description
tests/ocr_demo.ipynb | Jupyter notebook implementing OCR detection and Gemini AI parsing with data models
tests/1DigiKey.pdf | Sample PDF receipt file for testing the OCR functionality

Comment thread tests/ocr_demo.ipynb
"\n",
"import re\n",
"import json\n",
"import pdfplumber\n",

Copilot AI Jan 22, 2026

The import 'pdfplumber' on line 21 is unused throughout the notebook. Consider removing it to keep dependencies minimal.

Suggested change
-"import pdfplumber\n",

Comment thread tests/ocr_demo.ipynb
Comment on lines +42 to +58
"class ReceiptData(BaseModel):\n",
"    store: str\n",
"    order_number: Optional[str]\n",
"    date: Optional[str]\n",
"    currency: str\n",
"    items: List[ReceiptItem]\n",
"    summary: ReceiptSummary\n",
"\n",
"class ReceiptSummary(BaseModel):\n",
"    number_of_items: int\n",
"    subtotal: float\n",
"    discount: float\n",
"    delivery_fee: float\n",
"    service_fee: float\n",
"    tax: float\n",
"    tip: float\n",
"    total: float"

Copilot AI Jan 22, 2026

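The class ReceiptSummary is referenced before it is defined. ReceiptData uses ReceiptSummary on line 48, but ReceiptSummary is not defined until lines 50-58. Unless annotation evaluation is deferred (for example with from __future__ import annotations), Python evaluates the annotation while executing the ReceiptData class body, so this raises a NameError when the class is defined, not when it is instantiated. Defining ReceiptSummary before ReceiptData avoids the problem.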
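
The reordering can be sketched as follows. To stay free of third-party imports, the sketch uses standard-library dataclasses as a stand-in for the notebook's Pydantic BaseModel classes, keeps only a subset of the ReceiptSummary fields, and omits the items field because the ReceiptItem definition is not shown in this thread.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReceiptSummary:  # defined first so ReceiptData can reference it
    number_of_items: int
    subtotal: float
    tax: float
    total: float


@dataclass
class ReceiptData:
    store: str
    order_number: Optional[str]
    date: Optional[str]
    currency: str
    summary: ReceiptSummary  # the name is already bound at this point
```

The same ordering applies to the Pydantic version: the class used in an annotation comes first, the class that uses it comes second.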

Comment thread tests/ocr_demo.ipynb Outdated
"outputs": [],
"source": [
"load_dotenv()\n",
"os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'ocr_demo_key.json' # generate this from Google OCR GCP service\n",

Copilot AI Jan 22, 2026

The credentials file path 'ocr_demo_key.json' is hardcoded into the environment variable. This file path should be documented in a README or configuration guide, and the file itself should be added to .gitignore to prevent accidental commit of credentials. Consider using a more flexible configuration approach that allows for different environments.

Suggested change
-"os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'ocr_demo_key.json' # generate this from Google OCR GCP service\n",
+"google_creds = os.getenv('GOOGLE_APPLICATION_CREDENTIALS')\n",
+"if not google_creds:\n",
+"    raise RuntimeError(\n",
+"        'GOOGLE_APPLICATION_CREDENTIALS is not set. Please set it to your Google Cloud '\n",
+"        'credentials JSON path via environment variable or .env file.'\n",
+"    )\n",

Comment thread tests/ocr_demo.ipynb Outdated
"metadata": {},
"outputs": [],
"source": [
"api_key = os.getenv('GEMINI_API_KEY') # generate gemini api key\n",

Copilot AI Jan 22, 2026

The GEMINI_API_KEY is loaded from environment variables without validation. If the key is missing or invalid, the code will fail later during the API call with a less clear error message. Add validation to check if the API key exists and provide a helpful error message if it's missing.

Suggested change
-"api_key = os.getenv('GEMINI_API_KEY') # generate gemini api key\n",
+"api_key = os.getenv('GEMINI_API_KEY') # generate gemini api key\n",
+"if not api_key:\n",
+"    raise RuntimeError(\n",
+"        \"GEMINI_API_KEY environment variable is not set. \"\n",
+"        \"Please set it (for example in your environment or .env file) before running this notebook.\"\n",
+"    )\n",

Comment thread tests/ocr_demo.ipynb Outdated
Comment on lines +69 to +70
"os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'ocr_demo_key.json' # generate this from Google OCR GCP service\n",
"WORD = re.compile(r\"\\w+\")"

Copilot AI Jan 22, 2026

The variable WORD is defined but never used in the notebook. Remove unused imports and variables to keep the code clean.

Suggested change
-"os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'ocr_demo_key.json' # generate this from Google OCR GCP service\n",
-"WORD = re.compile(r\"\\w+\")"
+"os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'ocr_demo_key.json' # generate this from Google OCR GCP service\n"


3 participants