
feat(optimizer): [3/N] Analyzer app #533

Draft
mkuchenbecker wants to merge 2 commits into mkuchenb/optimizer-2 from mkuchenb/optimizer-3

mkuchenbecker commented Apr 7, 2026

Optimizer Stack

PR      Content
#527    Data Model
#530    Database Repos
#531    REST service
#533    (this) Analyzer app
#534    Scheduler app
#tbd    Spark BatchedOFD app
#tbd    Infra, docker-compose, smoke test

Summary

PR 3 of N in the optimizer stack.
Related docs: Overall Project; Service Design doc.

Introduces apps/optimizer-analyzer, a Spring Boot CommandLineRunner that evaluates every table in table_stats against pluggable OperationAnalyzer strategies. The first strategy, OrphanFilesDeletionAnalyzer, schedules orphan files deletion (OFD) operations on a cadence of 24h after a success and 1h after a failure, with a 6h timeout for operations left in SCHEDULED and a circuit breaker that trips after 5 strikes.

Key design choices:

  • Bulk-loads operations and history into maps (one query per type), then iterates the stats list — O(types) queries, not O(tables).
  • Uses the existing generic find() repository methods with null params.
  • Pure unit tests with Mockito — no Spring context needed.
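
The bulk-load pattern can be sketched in plain Java. This is a minimal sketch, not the PR's code: `StatsRow`, `OpRow`, `HistoryRow`, and `BulkLoadSketch.evaluate` are hypothetical stand-ins, and in-memory lists take the place of the repository find() calls. Each small side of the join is indexed once by table UUID, so the per-table loop does map lookups instead of issuing a query per table.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical row types standing in for the JPA entities.
record StatsRow(String tableUuid, String db, String table) {}
record OpRow(String tableUuid, String status) {}
record HistoryRow(String tableUuid, long finishedAtMillis) {}

final class BulkLoadSketch {
  // Returns one line per table describing what an analyzer would be given.
  static List<String> evaluate(List<StatsRow> stats, List<OpRow> ops, List<HistoryRow> history) {
    // One "query" per type: index current operations by table UUID (first wins on duplicates).
    Map<String, OpRow> opsByTable =
        ops.stream().collect(Collectors.toMap(OpRow::tableUuid, Function.identity(), (a, b) -> a));
    // Keep only the most recent history entry per table.
    Map<String, HistoryRow> latestHistory =
        history.stream()
            .collect(Collectors.toMap(
                HistoryRow::tableUuid,
                Function.identity(),
                (a, b) -> a.finishedAtMillis() >= b.finishedAtMillis() ? a : b));

    List<String> out = new ArrayList<>();
    for (StatsRow s : stats) {
      // Constant-time lookups per table; no queries inside the loop.
      OpRow current = opsByTable.get(s.tableUuid());
      HistoryRow last = latestHistory.get(s.tableUuid());
      out.add(s.db() + "." + s.table()
          + " op=" + (current == null ? "none" : current.status())
          + " lastRun=" + (last == null ? "never" : String.valueOf(last.finishedAtMillis())));
    }
    return out;
  }
}
```

The number of loads depends only on how many row types there are, while the loop over tables stays in memory.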

Changes

Core: AnalyzerRunner loads table_stats, pre-loads operations and history into maps, evaluates each table against all analyzers, and applies the circuit breaker logic.

Strategy interface: OperationAnalyzer, with isEnabled(table), shouldSchedule(table, currentOp, latestHistory), and getCircuitBreakerThreshold().
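
A minimal sketch of the strategy interface, assuming simplified hypothetical `TableInfo`, `CurrentOp`, and `LastRun` types in place of the PR's own domain types. The `isCircuitBroken` default method is also an assumption, modeled on the later commit that moves the circuit breaker check into OperationAnalyzer as an overridable default; here it is assumed to trip once consecutive failures reach the threshold.

```java
import java.util.Map;

// Hypothetical, simplified stand-ins for the PR's domain types.
record TableInfo(String uuid, Map<String, String> properties) {}
record CurrentOp(String status) {}
record LastRun(boolean succeeded, int consecutiveFailures) {}

interface OperationAnalyzer {
  // Whether this operation type applies to the table at all.
  boolean isEnabled(TableInfo table);

  // Whether to schedule a new operation; currentOp and latestHistory may be null.
  boolean shouldSchedule(TableInfo table, CurrentOp currentOp, LastRun latestHistory);

  // Failure count at which the breaker trips for this operation type.
  int getCircuitBreakerThreshold();

  // Assumed default: trip once consecutive failures reach the threshold.
  // Individual analyzers may override this with their own policy.
  default boolean isCircuitBroken(LastRun latestHistory) {
    return latestHistory != null
        && latestHistory.consecutiveFailures() >= getCircuitBreakerThreshold();
  }
}

// Toy implementation used only to exercise the interface.
final class AlwaysScheduleAnalyzer implements OperationAnalyzer {
  @Override public boolean isEnabled(TableInfo table) { return true; }
  @Override public boolean shouldSchedule(TableInfo t, CurrentOp op, LastRun h) { return op == null; }
  @Override public int getCircuitBreakerThreshold() { return 5; }
}
```

Keeping the threshold on the interface lets each operation type pick its own tolerance; the OFD analyzer in this PR uses 5.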

Cadence policy: CadencePolicy — encapsulates time-based retry logic shared across operation types.
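
One way to express the cadence rules from the summary (24h after a success, 1h after a failure, 6h timeout for SCHEDULED operations) is a small value class over java.time. The class, field, and method names below are hypothetical, as are the boundary choices (due at exactly the interval, timed out only strictly past 6h); only the three durations come from this PR.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical sketch of a time-based cadence policy; only the durations come from the PR text.
final class CadencePolicySketch {
  final Duration successRetryInterval; // 24h for OFD: wait a day after a successful run
  final Duration failureRetryInterval; // 1h for OFD: retry an hour after a failed run
  final Duration scheduledTimeout;     // 6h for OFD: a SCHEDULED op older than this is stale

  CadencePolicySketch(Duration success, Duration failure, Duration timeout) {
    this.successRetryInterval = success;
    this.failureRetryInterval = failure;
    this.scheduledTimeout = timeout;
  }

  static CadencePolicySketch ofdDefaults() {
    return new CadencePolicySketch(Duration.ofHours(24), Duration.ofHours(1), Duration.ofHours(6));
  }

  // True if there is no prior run, or enough time has passed since the last one.
  boolean isDue(Instant lastFinishedAt, boolean lastSucceeded, Instant now) {
    if (lastFinishedAt == null) {
      return true;
    }
    Duration wait = lastSucceeded ? successRetryInterval : failureRetryInterval;
    return !now.isBefore(lastFinishedAt.plus(wait));
  }

  // True if an operation has stayed in SCHEDULED strictly longer than the timeout.
  boolean isScheduledTimedOut(Instant scheduledAt, Instant now) {
    return Duration.between(scheduledAt, now).compareTo(scheduledTimeout) > 0;
  }
}
```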

OFD analyzer: OrphanFilesDeletionAnalyzer, enabled per table via the maintenance.optimizer.ofd.enabled table property.
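
A sketch of the property check, assuming table properties arrive as a Map<String, String> and that only the string "true" (case-insensitive) enables OFD; the PR's exact parsing rules may differ. `OfdEnablement` is a hypothetical name.

```java
import java.util.Map;

final class OfdEnablement {
  // Property key taken from the PR description.
  static final String OFD_ENABLED_KEY = "maintenance.optimizer.ofd.enabled";

  // Missing map, missing key, or any value other than "true" (ignoring case) means disabled.
  static boolean isEnabled(Map<String, String> tableProperties) {
    if (tableProperties == null) {
      return false;
    }
    // Boolean.parseBoolean returns false for null and for anything but "true".
    return Boolean.parseBoolean(tableProperties.get(OFD_ENABLED_KEY));
  }
}
```

Under this sketch tables opt in explicitly, which is the conservative default for an operation that deletes files.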

Testing Done

25 unit tests:

  • AnalyzerRunnerTest (7 tests) — eligible table insertion, cadence skip, disabled table, shouldSchedule=false, null UUID, circuit breaker trip, below-threshold pass
  • OrphanFilesDeletionAnalyzerTest (18 tests) — isEnabled variants, shouldSchedule for no-op/PENDING/SCHEDULING/SCHEDULED with history combinations

Run with:

    ./gradlew :apps:optimizer-analyzer:test
    # BUILD SUCCESSFUL — 25 tests pass

…ling

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Rename TableSummary to Table, TableOperationRecord to TableOperation
- Add Table.from(TableStatsRow) and TableOperation.from(TableOperationRow)
- Add TableOperation.pending(Table, type) factory and toRow() for JPA
- Move circuit breaker check into OperationAnalyzer as overridable default
- Parameterize analyze() with optional filters (optype, db, table, uuid)
- Inline loadOpsMap and loadHistoryMap; remove the standalone converter methods
- Expand CadencePolicy field javadoc with plain-English examples
- Add TODOs: per-db iteration, benchmarking, query builder, circuit breaker reset

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
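
The optional filters on analyze() (optype, db, table, uuid) can be modeled as nullable fields where null means "match anything", the same convention as passing null params to the generic find() methods. This is a hypothetical in-memory sketch; `AnalyzeFilter`, `Candidate`, `matches`, and `apply` are illustrative names, not the PR's API.

```java
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

// Hypothetical candidate row: an operation type paired with a table.
record Candidate(String opType, String db, String table, String uuid) {}

// Each field is optional: null means "do not filter on this field".
record AnalyzeFilter(String opType, String db, String table, String uuid) {
  static final AnalyzeFilter NONE = new AnalyzeFilter(null, null, null, null);

  boolean matches(Candidate c) {
    return fieldMatches(opType, c.opType())
        && fieldMatches(db, c.db())
        && fieldMatches(table, c.table())
        && fieldMatches(uuid, c.uuid());
  }

  private static boolean fieldMatches(String wanted, String actual) {
    return wanted == null || Objects.equals(wanted, actual);
  }

  // Keeps only the candidates that pass every non-null filter.
  List<Candidate> apply(List<Candidate> all) {
    return all.stream().filter(this::matches).collect(Collectors.toList());
  }
}
```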
    .collect(Collectors.toList());

    // Pre-load the small sides of the joins — one query per analyzer type.
    // TODO: Move to a query builder (Criteria API or jOOQ) as filter count grows.

mkuchenbecker commented Apr 7, 2026

Remove this TODO, or move it to where we pass null, null, null when querying the history repo.
