Skip to content

CSV import #118

@joaopcm

Description

@joaopcm

CSV Import Implementation Plan

Overview

Add CSV import alongside existing PDF pipeline. For CSV: dialog opens immediately after upload, AI generates questions synchronously, user answers inline, then processing starts. Handles unhappy path where user closes browser mid-flow.

Flow Diagram

PDF Upload → process mutation → upload-breakdown task → ...existing flow

CSV Upload → upload to Supabase
                ↓
    createCsvUpload mutation (creates record with waiting_for_csv_answers status)
                ↓
    ┌─────────── dialog opens immediately ───────────┐
    │                                                 │
    │   analyzeCsv mutation → AI generates questions │
    │                    ↓                           │
    │           User answers in dialog               │
    │                    ↓                           │
    │   submitCsvAnswers mutation → csv-breakdown    │
    └─────────────────────────────────────────────────┘

If user closes browser before answering:
    → Upload stays at waiting_for_csv_answers
    → User sees "Answer" button on /uploads page
    → Clicking opens same dialog to continue

Key difference: CSV analysis happens synchronously in tRPC, not as a background task.


1. Database Changes

1.1 New upload status

src/db/schema.ts - Add to uploadStatusEnum:

"waiting_for_csv_answers"

1.2 New file type enum + column

src/db/schema.ts:

export const fileTypeEnum = pgEnum("file_type", ["pdf", "csv"])

// In upload table:
fileType: fileTypeEnum("file_type").notNull().default("pdf")

1.3 Extend UploadMetadata

src/db/schema.ts:

export interface CsvQuestion {
  id: string
  type: 'text' | 'select' | 'date' | 'boolean'
  label: string
  description?: string
  options?: string[]
  required: boolean
  defaultValue?: string
}

export interface CsvConfig {
  questions?: CsvQuestion[]  // stored after analyzeCsv
  answers?: Record<string, string>  // stored after user submits
  inferredMapping?: {
    dateColumn?: string
    merchantColumn?: string
    amountColumn?: string
    descriptionColumn?: string
  }
}

export interface UploadMetadata {
  // existing fields...
  csvConfig?: CsvConfig  // only populated for CSV uploads
}

1.4 Update constants

src/constants/uploads.ts:

export const CANCELLABLE_STATUSES: Status[] = [
  "queued",
  "processing",
  "waiting_for_password",
  "waiting_for_csv_answers",  // NEW
]

2. tRPC Router Changes

2.1 src/server/routers/uploads.ts

New: createCsvUpload mutation
Create upload record immediately after file upload to Supabase.

createCsvUpload: protectedProcedure
  .input(z.object({
    fileName: z.string(),
    filePath: z.string(),
    fileSize: z.number(),
  }))
  .mutation(async ({ input, ctx }) => {
    // Create upload record with:
    //   - status: waiting_for_csv_answers
    //   - metadata: { fileType: 'csv' }
    // Return uploadId
  })

New: analyzeCsv mutation
Synchronous AI call to generate questions.

analyzeCsv: protectedProcedure
  .input(z.object({
    uploadId: z.string().uuid(),
  }))
  .mutation(async ({ input, ctx }) => {
    // 1. Get upload record (verify status = waiting_for_csv_answers)
    // 2. Download CSV from Supabase (first 50 rows)
    // 3. Call AI (generateObject) to analyze and generate questions
    // 4. Store questions + inferredMapping in metadata.csvConfig
    // 5. Return { questions, preview, inferredMapping }
  })

Returns:

{
  questions: CsvQuestion[]
  preview: {
    headers: string[]
    sampleRows: string[][]
  }
  inferredMapping: {
    dateColumn?: string
    merchantColumn?: string
    amountColumn?: string
    descriptionColumn?: string
  }
}

New: submitCsvAnswers mutation
Store answers and trigger processing.

submitCsvAnswers: protectedProcedure
  .input(z.object({
    uploadId: z.string().uuid(),
    answers: z.record(z.string(), z.string()),
  }))
  .mutation(async ({ input, ctx }) => {
    // 1. Get upload record (verify status = waiting_for_csv_answers)
    // 2. Update metadata.csvConfig.answers
    // 3. Set status to "queued"
    // 4. Trigger csv-breakdown task
  })

3. Trigger Tasks

3.1 New: src/trigger/ai/csv-breakdown.ts

Orchestrate CSV processing.

export const csvBreakdownTask = task({
  id: "csv-breakdown",
  run: async (payload: { uploadId: string }) => {
    // 1. Set status to "processing"
    // 2. Call extractTransactionsCsvTask.triggerAndWait()
    // 3. Call categorizeAndImportTransactionsTask.trigger() (reused)
    // 4. Delete CSV file from storage
  },
  catchError: // same pattern as upload-breakdown
})

3.2 New: src/trigger/ai/extract-transactions-csv.ts

Extract transactions from CSV using AI.

export const extractTransactionsCsvTask = task({
  id: "extract-transactions-csv",
  run: async (payload: { uploadId: string }) => {
    // 1. Get upload with csvConfig (answers + mapping)
    // 2. Download full CSV from Supabase
    // 3. Build prompt with user answers as context
    // 4. Call AI with full CSV text
    // 5. Return same ExtractedTransaction[] format as PDF
  },
})

AI receives:

  • Full CSV content as text
  • User answers (bank, currency, document type, date format)
  • Column mapping

Output: Same ExtractedTransaction[] schema as extract-transactions.ts


4. Frontend Changes

4.1 src/components/logged-in/nav-main.tsx

Change file input:

accept=".pdf,.csv"

Modify upload flow:

// After successful upload to Supabase
if (isCsvFile(file)) {
  // 1. Call createCsvUpload to create record
  // 2. Open CSV config dialog with uploadId
  setCsvDialogOpen(true)
  setCsvUploadId(uploadId)
} else {
  // Existing PDF flow
  processUploads({ files: [...] })
}

4.2 New: src/components/logged-in/uploads/csv-config-dialog.tsx

Props:

interface CsvConfigDialogProps {
  uploadId: string
  fileName: string
  onSuccess?: () => void
  children?: React.ReactNode  // trigger button for upload-item usage
}

Dialog flow:

  1. Opens (from nav-main after upload OR from upload-item button)
  2. Checks if questions already exist in metadata (for retry case)
  3. If no questions: calls analyzeCsv mutation (shows loading)
  4. Renders dynamic form (react-hook-form + zod)
  5. Shows CSV preview table (headers + 3-5 rows)
  6. User fills answers
  7. Submit calls submitCsvAnswers mutation
  8. Success → close dialog, invalidate uploads

Form field rendering by question type:

  • text → Input
  • select → Select with options
  • date → Input (text pattern for format like "MM/DD/YYYY")
  • boolean → Switch/Checkbox

4.3 src/components/logged-in/uploads/upload-item.tsx

Add handling for waiting_for_csv_answers status:

{upload.status === "waiting_for_csv_answers" && (
  <CsvConfigDialog uploadId={upload.id} fileName={upload.fileName}>
    <Button size="sm">Answer</Button>
  </CsvConfigDialog>
)}

4.4 src/components/logged-in/uploads/status-badge.tsx

Add new status config:

waiting_for_csv_answers: {
  label: "Awaiting answers",
  variant: "outline",
  icon: IconQuestionMark,  // or similar
}

5. AI Prompts

5.1 Analyze CSV Prompt (tRPC mutation)

Analyze this CSV bank statement export and:

1. INFER COLUMN MAPPING: Identify which columns contain:
   - Date (and likely format)
   - Merchant/Description
   - Amount
   - Optional: separate credit/debit columns

2. GENERATE QUESTIONS: Create minimal questions for context you cannot infer.
   Only ask what's truly needed for accurate extraction.

   Important context for extraction:
   - Bank name (helps identify merchant patterns)
   - Currency (critical if not in CSV)
   - Document type (checking/savings/credit card)
   - Date format (only if ambiguous between MM/DD and DD/MM)

## CSV Headers
{headers}

## Sample Rows (first 20)
{rows}

5.2 Extract Transactions CSV Prompt

Same as PDF extract-transactions.ts but:

  • Input is CSV text instead of images
  • User answers injected as context header
  • Column mapping provided

Files to Create

File Purpose
src/trigger/ai/csv-breakdown.ts Orchestrate CSV processing
src/trigger/ai/extract-transactions-csv.ts AI extraction from CSV
src/components/logged-in/uploads/csv-config-dialog.tsx Dynamic form dialog for answers

Files to Modify

File Changes
src/db/schema.ts New status enum + CsvConfig/CsvQuestion types
src/server/routers/uploads.ts New createCsvUpload + analyzeCsv + submitCsvAnswers mutations
src/components/logged-in/nav-main.tsx Accept .csv, open dialog for CSV
src/components/logged-in/uploads/upload-item.tsx "Answer" button for waiting_for_csv_answers
src/components/logged-in/uploads/status-badge.tsx New status badge
src/constants/uploads.ts Add to CANCELLABLE_STATUSES

Verification

  1. Happy path: Upload CSV → dialog opens → answer questions → submit → transactions extracted
  2. Unhappy path: Upload CSV → close browser → reopen → see "Answer" button → click → complete flow
  3. Cancel works for waiting_for_csv_answers status
  4. Same categorization pipeline as PDF
  5. CSV deleted after extraction

Constraints

  • CSV file size limit: 1MiB (same validation as PDF)
  • Multiple currencies: Each transaction has its own currency field (same as PDF)

Metadata

Metadata

Assignees

Labels

No labels
No labels

Projects

Status

In Progress

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions