Receipt OCR test by ian-yeh · Pull Request #184 · McMaster-Solar-Car-Project/purchase-request-site

ian-yeh · 2026-01-22T02:48:26Z

User description

What is this issue for and how does it solve it

Implemented a Receipt OCR prototype that extracts text from receipt images (.png, .jpg) and PDFs (.pdf)
Integrated Google Cloud Vision OCR and Gemini AI to extract and structure receipt information
Created a ReceiptData class to organize the parsed data as Python objects (OOP) that serialize to JSON

Link to the Github Issue

#94

PR Type

Enhancement, Tests

Description

Implemented a Receipt OCR prototype using the Google Cloud Vision API
Integrated Gemini AI for structured receipt data extraction
Created Pydantic models to organize receipt data
Developed a Jupyter notebook demonstrating the end-to-end OCR workflow

Diagram Walkthrough

flowchart LR
  A["Receipt Image/PDF"] -- "Google Vision OCR" --> B["Extracted Text"]
  B -- "Gemini 2.5 Flash" --> C["Structured JSON"]
  C -- "Pydantic Models" --> D["ReceiptData Object"]
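
The three stages in the diagram can be sketched as a small driver that receives the OCR and parsing steps as plain callables. This is a hypothetical structure rather than the notebook's code: `run_receipt_pipeline`, `ocr_fn` and `parse_fn` are illustrative names, and in the notebook the two callables would be `detect_text()` and `parse_result()`. Passing them in keeps the driver testable without calling Google Cloud Vision or Gemini.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def run_receipt_pipeline(
    receipt_path: str,
    ocr_fn: Callable[[str], str],
    parse_fn: Callable[[str], T],
) -> T:
    """Receipt image/PDF -> extracted text -> structured result.

    ocr_fn and parse_fn are hypothetical stand-ins for the notebook's
    detect_text() and parse_result().
    """
    extracted_text = ocr_fn(receipt_path)  # stage 1: OCR
    if not extracted_text.strip():
        # Fail early rather than sending an empty prompt to the LLM.
        raise ValueError(f"OCR returned no text for {receipt_path!r}")
    return parse_fn(extracted_text)  # stage 2: LLM -> structured object
```

In the notebook this might be called as `run_receipt_pipeline("./1DigiKey.pdf", detect_text, parse_result)`; in tests, lambdas returning canned text can stand in for both services.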

File Walkthrough

Relevant files

Tests

tests/ocr_demo.ipynb: Complete Receipt OCR implementation notebook (+309/-0)
Defined `ReceiptItem`, `ReceiptData`, and `ReceiptSummary` Pydantic models for structured receipt parsing
Implemented a `detect_text()` function to extract text from images and multi-page PDFs using the Google Cloud Vision API
Created a `parse_result()` function to process extracted text through Gemini 2.5 Flash with JSON schema validation
Provided demonstration cells for testing the OCR workflow on receipt files with performance timing
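
The demonstration cells time each step with paired `time.perf_counter()` calls. A context manager is one way to keep that bookkeeping in one place; `timed` and the `timings` dictionary are hypothetical helpers, not part of the PR.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator


@contextmanager
def timed(label: str, timings: Dict[str, float]) -> Iterator[None]:
    """Record the wall-clock duration of the enclosed block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Runs even if the block raises, so failed API calls are timed too.
        timings[label] = time.perf_counter() - start
```

Usage might look like `with timed("parse_result", timings): output = parse_result(result)`, followed by `print(f"Parsing took {timings['parse_result']:.4f} seconds")`.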

qodo-code-review · 2026-01-22T02:48:52Z

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance

⚪

Credential handling

Description: The notebook sets os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "ocr_demo_key.json", which encourages keeping a local service account key file next to the notebook. If ocr_demo_key.json is present in the repo or in shared artifacts, it can be committed or used by accident, which is insecure secret handling.
ocr_demo.ipynb [68-71]

Referred Code

 "load_dotenv()\n",
 "os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'ocr_demo_key.json' # generate this from Google OCR GCP service\n",
 "WORD = re.compile(r\"\\w+\")"
]

Sensitive data exposure

Description: The notebook prints raw OCR output and structured parsed results (print(result) / print(output)), which can expose sensitive receipt contents (names, addresses, order numbers, payment details) in CI logs, shared notebook outputs, or screenshots.
ocr_demo.ipynb [210-243]

Referred Code

  "result = detect_text(path)\n",
  "print(result)"
 ]
},
{
 "cell_type": "code",
 "execution_count": null,
 "id": "8fd066ce-04a5-4785-99ce-545490eacb1d",
 "metadata": {},
 "outputs": [],
 "source": [
  "import time\n",
  "\n",
  "start = time.perf_counter()\n",
  "\n",
  "output = parse_result(result)\n",
  "\n",
  "end = time.perf_counter()\n",
  "print(f\"Parsing took {end - start:.4f} seconds\")"
 ]
},


 ... (clipped 13 lines)
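
One mitigation for the exposure described above is to mask obvious identifiers before anything is printed. A minimal sketch, assuming regex masking is acceptable for a demo notebook: `redact_receipt_text` and both patterns are hypothetical, and they will miss formats they were not written for (names and street addresses, for example, pass through untouched).

```python
import re

# Hypothetical patterns: e-mail addresses, and runs of 8+ digits optionally
# separated by single spaces (card, order, or phone numbers).
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
LONG_NUMBER_PATTERN = re.compile(r"\b\d(?: ?\d){7,}\b")


def redact_receipt_text(text: str) -> str:
    """Return a copy of OCR text that is safer to print or share."""
    text = EMAIL_PATTERN.sub("[EMAIL]", text)
    return LONG_NUMBER_PATTERN.sub("[NUMBER]", text)
```

In the notebook, `print(redact_receipt_text(result))` would replace `print(result)`.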

Ticket Compliance

🟡

🎫 #94

🟢	Explore a combination of OCR + an LLM to interpret OCR results into desired structured outputs.
🟢	Create Jupyter notebooks demonstrating the models’ capabilities.
🔴	Consider whether receipt parsing is appropriate for the application, including potential taxation implications.
⚪	Investigate using OCR and computer vision models to parse receipts and extract information (as a primary source instead of relying on user input).
⚪	Achieve very high accuracy by identifying the best possible solution and combination of tools.

Codebase Duplication Compliance

⚪

Codebase context is not defined

Follow the guide to enable codebase context checks.

Custom Compliance

🔴

Generic: Meaningful Naming and Self-Documenting Code

Objective: Ensure all identifiers clearly express their purpose and intent, making code
self-documenting

Status:
Generic identifiers: The added variables path, result, and output are too generic for receipt OCR and parsing outputs, which makes the code less readable and less self-documenting.

Referred Code

  "path = \"./1DigiKey.pdf\""
 ]
},
{
 "cell_type": "code",
 "execution_count": null,
 "id": "480eaff3-615b-4410-a32c-5e3725823d2c",
 "metadata": {
  "collapsed": true,
  "jupyter": {
   "outputs_hidden": true
  }
 },
 "outputs": [],
 "source": [
  "result = detect_text(path)\n",
  "print(result)"
 ]
},
{
 "cell_type": "code",


 ... (clipped 11 lines)

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Robust Error Handling and Edge Case Management

Objective: Ensure comprehensive error handling that provides meaningful context and graceful
degradation

Status:
Missing validations: External inputs and dependencies are not validated (e.g., missing checks for path
existence/type and GEMINI_API_KEY presence) and OCR errors are raised as generic Exception
without structured handling.

Referred Code

  "api_key = os.getenv('GEMINI_API_KEY') # generate gemini api key\n",
  "client = genai.Client(api_key=api_key)"
 ]
},
{
 "cell_type": "code",
 "execution_count": 15,
 "id": "de95443d-6931-4612-9f37-da8defebab01",
 "metadata": {
  "jupyter": {
   "source_hidden": true
  }
 },
 "outputs": [],
 "source": [
  "def detect_text(path):\n",
  "    \"\"\"\n",
  "    Detects text in a file using Google Cloud Vision OCR. \n",
  "    Handles images and multi-page PDFs by converting PDF pages to images.\n",
  "    \"\"\"\n",
  "    client = vision.ImageAnnotatorClient()\n",


 ... (clipped 86 lines)
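
The missing path checks could be handled before `detect_text()` touches the file. A minimal sketch, assuming only the extensions named in the PR description should be accepted (`.jpeg` is an added assumption); `validate_receipt_path` and `ALLOWED_SUFFIXES` are hypothetical names.

```python
from pathlib import Path

# Extensions from the PR description; ".jpeg" is an assumption.
ALLOWED_SUFFIXES = {".png", ".jpg", ".jpeg", ".pdf"}


def validate_receipt_path(receipt_path: str) -> Path:
    """Return the receipt path as a Path, or raise an error naming the problem."""
    path = Path(receipt_path)
    if not path.is_file():
        raise FileNotFoundError(f"Receipt file not found: {path}")
    suffix = path.suffix.lower()  # ".PDF" and ".pdf" are treated alike
    if suffix not in ALLOWED_SUFFIXES:
        raise ValueError(
            f"Unsupported receipt type {suffix!r}; expected one of {sorted(ALLOWED_SUFFIXES)}"
        )
    return path
```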


Generic: Secure Error Handling

Objective: To prevent the leakage of sensitive system information through error messages while
providing sufficient detail for internal debugging.

Status:
Verbose error output: The exception handler prints detailed exception information (type, str, and repr) which
can expose internal details to the notebook user instead of keeping detail in secure logs.

Referred Code

 "def parse_result(receipt_text):    \n",
 "    try:\n",
 "        response = client.models.generate_content(\n",
 "            model=\"gemini-2.5-flash\",\n",
 "            contents=receipt_text,\n",
 "            config={\n",
 "                \"system_instruction\": RECEIPT_PARSER_PROMPT,\n",
 "                \"response_mime_type\": \"application/json\",\n",
 "                \"response_schema\": ReceiptData.model_json_schema(),\n",
 "            }\n",
 "        )\n",
 "        return ReceiptData.model_validate_json(response.text)\n",
 "    except Exception as e:\n",
 "        print(f\"Error type: {type(e).__name__}\")\n",
 "        print(f\"Error message: {str(e)}\")\n",
 "        print(f\"Full error: {repr(e)}\")\n",
 "        raise"
]
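
One way to address this finding is to send the full traceback to a logger and surface only a short, generic message in the notebook. A sketch under that assumption; `call_with_logged_errors` and the `receipt_ocr` logger name are hypothetical. Without a configured handler, Python's last-resort handler still writes the record (including the traceback) to stderr, so a file or secure-log handler needs to be attached for this to help.

```python
import logging
from typing import Any, Callable, TypeVar

logger = logging.getLogger("receipt_ocr")
T = TypeVar("T")


def call_with_logged_errors(step: str, fn: Callable[..., T], *args: Any) -> T:
    """Run fn(*args); on failure, log the details and raise a short error."""
    try:
        return fn(*args)
    except Exception as exc:
        # logger.exception records the message and full traceback. Attach a
        # handler that writes to a protected log; with no handler configured,
        # Python's last-resort handler prints the record to stderr instead.
        logger.exception("%s failed", step)
        # "from None" keeps the original exception's message out of the
        # traceback shown in the notebook; the details live in the log record.
        raise RuntimeError(f"{step} failed ({type(exc).__name__}); see logs") from None
```

In the notebook this could wrap the existing call as `output = call_with_logged_errors("parse_result", parse_result, result)`.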


Generic: Secure Logging Practices

Objective: To ensure logs are useful for debugging and auditing without exposing sensitive
information like PII, PHI, or cardholder data.

Status:
Sensitive data printed: The notebook prints raw OCR text and parsed receipt fields to stdout. These may include sensitive receipt data or PII, and the output is neither structured nor redacted.

Referred Code

  "result = detect_text(path)\n",
  "print(result)"
 ]
},
{
 "cell_type": "code",
 "execution_count": null,
 "id": "8fd066ce-04a5-4785-99ce-545490eacb1d",
 "metadata": {},
 "outputs": [],
 "source": [
  "import time\n",
  "\n",
  "start = time.perf_counter()\n",
  "\n",
  "output = parse_result(result)\n",
  "\n",
  "end = time.perf_counter()\n",
  "print(f\"Parsing took {end - start:.4f} seconds\")"
 ]
},


 ... (clipped 44 lines)
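
For the logging finding, one option is to log the shape of the OCR output rather than its contents. A minimal sketch; `log_ocr_result` and the fields it records are hypothetical choices, not part of the PR.

```python
import logging
from pathlib import Path
from typing import Dict, Union

logger = logging.getLogger("receipt_ocr")


def log_ocr_result(receipt_path: str, extracted_text: str) -> Dict[str, Union[str, int]]:
    """Log the size of the OCR output, never the receipt text itself."""
    info: Dict[str, Union[str, int]] = {
        "file": Path(receipt_path).name,  # file name only, no directory
        "characters": len(extracted_text),
        "lines": len(extracted_text.splitlines()),
    }
    logger.info("OCR complete: %s", info)
    return info
```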


Generic: Security-First Input Validation and Data Handling

Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent
vulnerabilities

Status:
Hardcoded credentials path: The code hardcodes GOOGLE_APPLICATION_CREDENTIALS to ocr_demo_key.json and does not
validate/safeguard credential handling, increasing risk of insecure secret management and
accidental exposure.

Referred Code

 "load_dotenv()\n",
 "os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'ocr_demo_key.json' # generate this from Google OCR GCP service\n",
 "WORD = re.compile(r\"\\w+\")"
]


⚪

Generic: Comprehensive Audit Trails

Objective: To create a detailed and reliable record of critical system actions for security analysis
and compliance.

Status:
No audit trail: The notebook performs external OCR/LLM processing and prints outputs but does not
implement any structured logging/audit trail with user identity, timestamp, action, and
outcome.

Referred Code

"def detect_text(path):\n",
"    \"\"\"\n",
"    Detects text in a file using Google Cloud Vision OCR. \n",
"    Handles images and multi-page PDFs by converting PDF pages to images.\n",
"    \"\"\"\n",
"    client = vision.ImageAnnotatorClient()\n",
"    file_ext = Path(path).suffix.lower()\n",
"    all_text = []\n",
"\n",
"    image_contents = []\n",
"\n",
"    if file_ext == '.pdf':\n",
"        # opening PDF and iterating through all pages\n",
"        pdf_document = fitz.open(path)\n",
"        for page_num in range(len(pdf_document)):\n",
"            page = pdf_document[page_num]\n",
"\n",
"            # convert each page to an image\n",
"            matrix = fitz.Matrix(2, 2)\n",
"            pix = page.get_pixmap(matrix=matrix)\n",
"            image_contents.append(pix.tobytes(\"png\"))\n",


 ... (clipped 114 lines)
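
The audit-trail finding lists user identity, timestamp, action, and outcome. A hypothetical JSON-lines record covering those fields might look like this; the field names and the `audit_record` helper are assumptions, and where the lines are stored (a file, a log service) is left open.

```python
import json
from datetime import datetime, timezone


def audit_record(user: str, action: str, target: str, outcome: str) -> str:
    """Build one JSON audit line: who did what, to which file, when, and the result."""
    return json.dumps(
        {
            "timestamp": datetime.now(timezone.utc).isoformat(),  # UTC, ISO 8601
            "user": user,
            "action": action,
            "target": target,
            "outcome": outcome,
        },
        sort_keys=True,
    )
```

Each call returns one line (for example `audit_record("ian-yeh", "detect_text", "1DigiKey.pdf", "success")`), which could be appended to a log file or passed to a `logging` handler.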


Compliance status legend

🟢 - Fully Compliant
🟡 - Partially Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label

qodo-code-review · 2026-01-22T02:49:58Z

PR Code Suggestions ✨

Explore these optional code suggestions:

High-level (Impact: Medium): Prototype requires cost and accuracy analysis
Before this prototype is productionized, it is important to perform a cost-benefit analysis of the paid APIs being used. Additionally, a plan to systematically evaluate and ensure high accuracy, as required by the project ticket, should be established.

Solution Walkthrough:

Before:

    # The current prototype defines a two-step process using paid APIs.
    def detect_text(path):
        # ...
        # For each page in a PDF/image...
        response = vision_client.document_text_detection(...)  # API call 1..N
        # ...
        return extracted_text

    def parse_result(receipt_text):
        # ...
        response = gemini_client.generate_content(...)  # API call 2
        # ...
        return structured_data

    # Executed in notebook
    text = detect_text("receipt.pdf")
    output = parse_result(text)

After:

    # The suggestion is about adding analysis, not changing code.
    # The proposed next steps would involve:
    # 1. Cost Analysis
    #    - Estimate cost per receipt (Vision calls + Gemini call).
    #    - Project costs based on expected volume.
    #    - Compare with alternative solutions.
    # 2. Accuracy Evaluation
    #    - Create a validation dataset of receipts with ground truth.
    #    - for receipt in validation_dataset:
    #          parsed_data = run_ocr_pipeline(receipt)
    #          compare(parsed_data, ground_truth)
    #    - Calculate accuracy metrics for key fields (total, date, store).
    #    - Iterate on prompts and models to meet the "VERY accurate" requirement.

Suggestion importance [1-10]: 8
Why: The suggestion addresses critical non-functional requirements of cost and accuracy, which are essential for moving this prototype to production and are directly tied to the project's goals.
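
The accuracy part of that suggestion can start as a per-field comparison against a hand-labelled set of receipts. A minimal sketch, assuming parsed results are converted to plain dicts first (for example with Pydantic's `model_dump()`); `field_accuracy` and the 0.005 tolerance are illustrative choices, not project requirements. Cost estimation is not covered here because it depends on current API pricing.

```python
import math
from typing import Dict, Iterable, Mapping, Tuple


def field_accuracy(
    pairs: Iterable[Tuple[Mapping[str, object], Mapping[str, object]]],
    fields: Iterable[str],
    abs_tol: float = 0.005,
) -> Dict[str, float]:
    """Fraction of receipts where each field matches the ground truth.

    `pairs` yields (parsed, expected) dicts. Float fields are compared with
    an absolute tolerance (to absorb rounding); everything else uses ==.
    """
    field_list = list(fields)
    hits = {name: 0 for name in field_list}
    total = 0
    for parsed, expected in pairs:
        total += 1
        for name in field_list:
            got, want = parsed.get(name), expected.get(name)
            if isinstance(want, float) and isinstance(got, (int, float)):
                ok = math.isclose(got, want, abs_tol=abs_tol)
            else:
                ok = got == want
            hits[name] += int(ok)
    return {name: (hits[name] / total if total else 0.0) for name in field_list}
```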
Security (Impact: Medium): Avoid hardcoding credential file paths
Remove the hardcoded `GOOGLE_APPLICATION_CREDENTIALS` path and add a comment instructing users to set it as an environment variable instead.
tests/ocr_demo.ipynb [69]

    -"os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'ocr_demo_key.json' # generate this from Google OCR GCP service\n",
    +"# Make sure to set the GOOGLE_APPLICATION_CREDENTIALS environment variable in your shell before running this notebook.\n",
    +"# Example: export GOOGLE_APPLICATION_CREDENTIALS=\"/path/to/your/ocr_demo_key.json\"\n",

Suggestion importance [1-10]: 7
Why: The suggestion correctly identifies a security anti-pattern of hardcoding a credential path and proposes the standard best practice, which improves the notebook's security and portability.
Possible issue (Impact: Low): Prevent resource leaks with context managers
Use a `with` statement for opening the PDF with `fitz.open()` to ensure the file is closed automatically, preventing potential resource leaks.
tests/ocr_demo.ipynb [106-116]

     "    if file_ext == '.pdf':\n",
     "        # opening PDF and iterating through all pages\n",
    -"        pdf_document = fitz.open(path)\n",
    -"        for page_num in range(len(pdf_document)):\n",
    -"            page = pdf_document[page_num]\n",
    +"        with fitz.open(path) as pdf_document:\n",
    +"            for page_num in range(len(pdf_document)):\n",
    +"                page = pdf_document[page_num]\n",
     "\n",
    -"            # convert each page to an image\n",
    -"            matrix = fitz.Matrix(2, 2)\n",
    -"            pix = page.get_pixmap(matrix=matrix)\n",
    -"            image_contents.append(pix.tobytes(\"png\"))\n",
    -"        pdf_document.close()\n",
    +"                # convert each page to an image\n",
    +"                matrix = fitz.Matrix(2, 2)\n",
    +"                pix = page.get_pixmap(matrix=matrix)\n",
    +"                image_contents.append(pix.tobytes(\"png\"))\n",

Suggestion importance [1-10]: 6
Why: The suggestion correctly identifies a potential resource leak and proposes using a `with` statement, which is the idiomatic and safer way to handle file resources in Python.
General (Impact: Low): Improve clarity when printing model data
Replace the loop that iterates over the `output.summary` object with a direct `print(output.summary)` call for a more readable output.
tests/ocr_demo.ipynb [283-284]

    -"for item in output.summary:\n",
    -"    print(item)"
    +"print(output.summary)"

Suggestion importance [1-10]: 5
Why: This suggestion correctly identifies that iterating a Pydantic model is possible and proposes a simpler, more readable way to print the object, which has a smaller code footprint than the alternative suggestion.

Copilot

Pull request overview

This PR implements a Receipt OCR prototype that extracts and structures receipt information from images and PDFs using Google Cloud Vision OCR and Gemini AI. The implementation includes data models for organizing parsed receipt data and a Jupyter notebook demonstrating the functionality.

Changes:

Added Jupyter notebook with OCR text detection and AI-powered receipt parsing
Defined Pydantic models for structured receipt data (items, summary, metadata)
Included sample DigiKey PDF receipt for testing

Reviewed changes

Copilot reviewed 1 out of 2 changed files in this pull request and generated 5 comments.

File	Description
tests/ocr_demo.ipynb	Jupyter notebook implementing OCR detection and Gemini AI parsing with data models
tests/1DigiKey.pdf	Sample PDF receipt file for testing the OCR functionality


Copilot · 2026-01-22T02:51:29Z

+    "\n",
+    "import re\n",
+    "import json\n",
+    "import pdfplumber\n",


The import 'pdfplumber' on line 21 is unused throughout the notebook. Consider removing it to keep dependencies minimal.

Suggested change (remove this line):

"import pdfplumber\n",

Copilot · 2026-01-22T02:51:30Z

+    "class ReceiptData(BaseModel):\n",
+    "    store: str\n",
+    "    order_number: Optional[str]\n",
+    "    date: Optional[str]\n",
+    "    currency: str\n",
+    "    items: List[ReceiptItem]\n",
+    "    summary: ReceiptSummary\n",
+    "\n",
+    "class ReceiptSummary(BaseModel):\n",
+    "    number_of_items: int\n",
+    "    subtotal: float\n",
+    "    discount: float\n",
+    "    delivery_fee: float\n",
+    "    service_fee: float\n",
+    "    tax: float\n",
+    "    tip: float\n",
+    "    total: float"


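The class ReceiptSummary is referenced before it is defined: ReceiptData uses ReceiptSummary on line 48, but ReceiptSummary is not defined until lines 50-58. On Python versions that evaluate annotations eagerly (everything before 3.14, unless `from __future__ import annotations` is used), the annotation is evaluated while the class body executes, so running this cell raises a NameError as soon as ReceiptData is defined, not when it is instantiated. Defining ReceiptSummary before ReceiptData avoids the error.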
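
A sketch of the fix is to declare the nested models first so every annotation refers to an already-defined class. The `ReceiptSummary` and `ReceiptData` fields are taken from the snippet above; the `ReceiptItem` fields (`description`, `quantity`, `unit_price`) are hypothetical, since that model's definition is not shown here. Assumes Pydantic v2.

```python
from typing import List, Optional

from pydantic import BaseModel


class ReceiptItem(BaseModel):
    # Hypothetical fields: the PR's ReceiptItem definition is not shown here.
    description: str
    quantity: int
    unit_price: float


class ReceiptSummary(BaseModel):
    # Defined before ReceiptData, so the annotation below resolves.
    number_of_items: int
    subtotal: float
    discount: float
    delivery_fee: float
    service_fee: float
    tax: float
    tip: float
    total: float


class ReceiptData(BaseModel):
    store: str
    order_number: Optional[str]  # required key, but may be null
    date: Optional[str]
    currency: str
    items: List[ReceiptItem]
    summary: ReceiptSummary
```

With this ordering, `ReceiptData.model_json_schema()` and `ReceiptData.model_validate_json(response.text)` in `parse_result()` can be used unchanged.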

Copilot · 2026-01-22T02:51:30Z

+   "outputs": [],
+   "source": [
+    "load_dotenv()\n",
+    "os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'ocr_demo_key.json' # generate this from Google OCR GCP service\n",


Hardcoded credentials file path: 'ocr_demo_key.json' is written directly into the GOOGLE_APPLICATION_CREDENTIALS environment variable. This file path should be documented in a README or configuration guide, and the file itself should be added to .gitignore to prevent accidental commits of credentials. Consider a more flexible configuration approach that supports different environments.

Suggested change (replace):

"os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'ocr_demo_key.json' # generate this from Google OCR GCP service\n",

with:

"google_creds = os.getenv('GOOGLE_APPLICATION_CREDENTIALS')\n",
"if not google_creds:\n",
"    raise RuntimeError(\n",
"        'GOOGLE_APPLICATION_CREDENTIALS is not set. Please set it to your Google Cloud '\n",
"        'credentials JSON path via environment variable or .env file.'\n",
"    )\n",

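Copilot's check catches an unset variable but not a path that points nowhere. A sketch that also verifies the file exists, assuming `load_dotenv()` has already run if the value lives in a `.env` file; `require_credentials_file` is a hypothetical helper.

```python
import os
from pathlib import Path


def require_credentials_file(var: str = "GOOGLE_APPLICATION_CREDENTIALS") -> Path:
    """Check that `var` is set and points at an existing file; return that path."""
    value = os.environ.get(var)
    if not value:
        raise RuntimeError(f"{var} is not set; point it at your service account JSON key.")
    path = Path(value).expanduser()  # allow values such as ~/keys/ocr.json
    if not path.is_file():
        raise RuntimeError(f"{var} points to {path}, which does not exist.")
    return path
```
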
Copilot · 2026-01-22T02:51:30Z

+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "api_key = os.getenv('GEMINI_API_KEY') # generate gemini api key\n",


The GEMINI_API_KEY is loaded from environment variables without validation. If the key is missing or invalid, the code will fail later during the API call with a less clear error message. Add validation to check if the API key exists and provide a helpful error message if it's missing.

Suggested change (replace):

"api_key = os.getenv('GEMINI_API_KEY') # generate gemini api key\n",

with:

"api_key = os.getenv('GEMINI_API_KEY') # generate gemini api key\n",
"if not api_key:\n",
"    raise RuntimeError(\n",
"        \"GEMINI_API_KEY environment variable is not set. \"\n",
"        \"Please set it (for example in your environment or .env file) before running this notebook.\"\n",
"    )\n",

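The same pattern generalizes to any required setting. A hypothetical `require_env` helper that also treats a whitespace-only value as missing:

```python
import os


def require_env(name: str, hint: str = "") -> str:
    """Return a non-blank environment variable, or fail with an actionable message."""
    value = os.environ.get(name, "").strip()  # whitespace-only counts as missing
    if not value:
        raise RuntimeError(f"{name} is not set or is empty. {hint}".strip())
    return value
```

In the notebook cell this would become `api_key = require_env("GEMINI_API_KEY", "Set it in your environment or .env file.")` before `client = genai.Client(api_key=api_key)`.
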
Copilot · 2026-01-22T02:51:30Z

+    "os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'ocr_demo_key.json' # generate this from Google OCR GCP service\n",
+    "WORD = re.compile(r\"\\w+\")"


The variable WORD is defined but never used in the notebook. Remove unused imports and variables to keep the code clean.

Suggested change (replace):

"os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'ocr_demo_key.json' # generate this from Google OCR GCP service\n",
"WORD = re.compile(r\"\\w+\")"

with:

"os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'ocr_demo_key.json' # generate this from Google OCR GCP service\n"

added files (ee6791c)

ian-yeh requested a review from rajpandya737 as a code owner January 22, 2026 02:48

Copilot AI review requested due to automatic review settings January 22, 2026 02:48

Copilot started reviewing on behalf of ian-yeh January 22, 2026 02:48

qodo-code-review Bot added the Review effort 3/5 label Jan 22, 2026

Copilot AI reviewed Jan 22, 2026

rajpandya737 and others added 2 commits April 3, 2026 19:05

Merge branch 'main' into add-ocr-test (3f62ef4)

style: format code with ruff and yamlfmt (b1c5cd4)

ian-yeh mentioned this pull request Apr 4, 2026

feat: add receipt_parser library #211

Open

ian-yeh wants to merge 3 commits into main from add-ocr-test
