
Clean up metrics: dead code, flat maxDepth fix, remove CSV fallback#277

Open
alexskr wants to merge 8 commits into `develop` from `fix/metrics-cleanup`

Conversation

@alexskr (Member) commented Apr 11, 2026

Summary

  • Remove dead metrics_for_submission left behind after refactoring to SubmissionMetricsCalculator
  • Remove unused require 'csv' from metrics/metrics.rb
  • Mark recursive_depth as potentially unused (TODO comment)
  • Add unit test for class_count returning -1 when metrics are absent
  • Remove error-swallowing rescue from metrics_for_submission in SubmissionMetricsCalculator — the inner rescue masked root cause errors (e.g., SPARQL failures surfaced as NoMethodError on nil instead of the real error)
  • Fix maxDepth returning non-zero for flat ontologies — max_depth_fn was reading from CSV without checking the flat flag
  • Remove metrics.csv fallback from OntologySubmission#class_count — the CSV file should only be read during ontology processing (ncbo_cron), not by the API at query time
  • Skip query_groupby_classes SPARQL call for flat ontologies — was called with nil predicate, producing invalid SPARQL that failed on GraphDB

Closes #276

@alexskr alexskr marked this pull request as draft April 11, 2026 04:30
@alexskr alexskr force-pushed the fix/metrics-cleanup branch from a8fbe74 to c45993b Compare April 11, 2026 05:24
alexskr added 8 commits April 10, 2026 22:26
This method was moved to SubmissionMetricsCalculator during a prior
refactoring but the original copy was left behind. No callers exist
in this repo or dependent projects (ontologies_api, ncbo_cron,
ncbo_annotator).
No CSV usage remains in this file after removal of metrics_for_submission.
The csv library is still required by ontology_submission.rb and
submission_metrics_calculator.rb, where it is needed.
No external callers found in this repo or dependent projects.
Keeping the method for now pending further validation.
Verifies that class_count returns -1 gracefully when no metrics
exist in the triplestore and no CSV fallback is available.
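A minimal sketch of such a test — `FakeSubmission` is an illustrative stand-in for the real `OntologySubmission` model, not the repo's actual fixture setup:

```ruby
require "minitest/autorun"

# Stand-in for OntologySubmission#class_count (illustrative): read the
# Metric object from the triplestore; return -1 when it is absent.
class FakeSubmission
  def initialize(metrics)
    @metrics = metrics
  end

  def class_count
    @metrics ? @metrics[:classes] : -1
  end
end

class ClassCountTest < Minitest::Test
  def test_returns_minus_one_when_metrics_absent
    assert_equal(-1, FakeSubmission.new(nil).class_count)
  end

  def test_returns_real_count_when_metrics_present
    assert_equal(42, FakeSubmission.new({ classes: 42 }).class_count)
  end
end
```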
The inner rescue in metrics_for_submission caught errors, logged a
minimal message, and returned nil. This masked the real error —
the caller (compute_metrics) would then fail with NoMethodError on
nil, and the outer rescue in process_metrics would log that
misleading error instead of the root cause.

process_metrics already handles errors properly: logs the real
exception with full backtrace and sets the METRICS error status.
The inner rescue was redundant and harmful.
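The failure mode can be reproduced in isolation. The method names mirror the PR's description, but the bodies here are illustrative stubs:

```ruby
require "logger"

logger = Logger.new($stdout)

# Old shape: the inner rescue swallows the real error and returns nil.
def metrics_with_inner_rescue(logger)
  raise "SPARQL endpoint unreachable"        # the root cause
rescue StandardError
  logger.error("Unable to retrieve metrics") # minimal message, cause lost
  nil
end

# The caller (compute_metrics in the PR) then blows up on nil instead.
def compute_metrics_old(logger)
  metrics = metrics_with_inner_rescue(logger)
  metrics.fetch(:classes)                    # NoMethodError on nil
end

begin
  compute_metrics_old(logger)
rescue NoMethodError => e
  # The outer handler logs this misleading error, not the SPARQL failure.
  logger.error(e.class)
end

# New shape: no inner rescue, so the outer handler sees the root cause.
def metrics_without_rescue
  raise "SPARQL endpoint unreachable"
end

begin
  metrics_without_rescue
rescue StandardError => e
  logger.error(e.message)                    # the real error reaches the log
end
```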
max_depth_fn was reading maxDepth from the CSV file generated by
owlapi_wrapper regardless of the flat flag. owlapi_wrapper has no
knowledge of BioPortal's flat designation, so it reports the real
tree depth. Now we short-circuit and return 0 for flat ontologies
before any CSV or SPARQL calculation.
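The intended ordering can be sketched as follows; the CSV and SPARQL details are stubbed, only the short-circuit matters:

```ruby
# Illustrative stand-in for max_depth_fn: check the flat flag before
# consulting the owlapi_wrapper CSV or falling back to SPARQL.
def max_depth(submission)
  return 0 if submission[:flat]            # flat: skip all depth calculation

  csv_depth = submission[:csv_max_depth]   # depth as reported by owlapi_wrapper
  return csv_depth if csv_depth            # CSV-first path

  sparql_max_depth(submission)             # SPARQL fallback
end

def sparql_max_depth(_submission)
  0  # placeholder for the expensive triplestore traversal
end
```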
class_count was falling back to reading metrics.csv from disk when
triplestore metrics were absent. This caused errors on API nodes
where the file does not exist or is missing for older submissions.
The API should always read metrics from the triplestore. The CSV
file should only be consumed during ontology parsing in ncbo_cron.
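The before/after shape of the change, with hash-based stand-ins rather than the real models (the CSV header name is illustrative):

```ruby
require "csv"

# Old shape (removed): fall back to metrics.csv on disk when the
# triplestore Metric object is missing — noisy on API nodes without the file.
def class_count_with_csv_fallback(submission)
  return submission[:metrics][:classes] if submission[:metrics]

  path = submission[:csv_path]
  if path && File.exist?(path)
    row = CSV.read(path, headers: true).first
    return row["Class Count"].to_i if row  # header name is illustrative
  end
  -1
end

# New shape: the API reads metrics only from the triplestore; absence
# means "reprocess the submission", not "look for a file".
def class_count(submission)
  submission[:metrics] ? submission[:metrics][:classes] : -1
end
```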
query_groupby_classes was called with rdfsSC=nil for flat ontologies,
producing invalid SPARQL (<> predicate). This was silently tolerated
by 4store but caused a SPARQL::Client::MalformedQuery error on
GraphDB, preventing the metrics status from being set.

The groupby_children results were already unused for flat ontologies
(the loop body was guarded by `unless is_flat`), so the query was
wasteful even when it didn't error. Moved the entire block inside
the `unless is_flat` guard.
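The restructuring can be sketched like this — the query and loop bodies are stubs, but the control flow matches the description above:

```ruby
# Old shape: the query ran unconditionally, with rdfsSC = nil for flat
# ontologies, producing an invalid predicate on strict SPARQL stores.
def child_metrics_old(is_flat, rdfsSC)
  groups = query_groupby_classes(rdfsSC)  # raised MalformedQuery on GraphDB
  result = {}
  groups.each do |klass, count|
    next if is_flat                       # results were discarded anyway
    result[klass] = count
  end
  result
end

# New shape: the whole block lives inside the flat guard.
def child_metrics(is_flat, rdfsSC)
  return {} if is_flat                    # no query, no loop
  query_groupby_classes(rdfsSC).to_h
end

# Stub standing in for the real SPARQL call.
def query_groupby_classes(predicate)
  raise ArgumentError, "nil predicate produces invalid SPARQL" if predicate.nil?
  [["A", 2], ["B", 30]]
end
```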
@alexskr alexskr force-pushed the fix/metrics-cleanup branch from c45993b to 3ffa994 Compare April 11, 2026 05:28
@alexskr alexskr marked this pull request as ready for review April 13, 2026 04:39
@alexskr alexskr requested a review from mdorf April 13, 2026 04:39
@jvendetti (Member)

I disagree with this change specifically:

Fix maxDepth returning non-zero for flat ontologies

I don’t think non-zero maxDepth for a flat ontology is inherently wrong, because flat and maxDepth are different concepts in the codebase.

flat is a boolean flag used to indicate that an ontology should be treated as flat for browsing/UI purposes. We've set that manually on a case-by-case basis for ontologies where displaying a tree control wasn't performant. It's not a statement that the ontology literally has no hierarchy.

maxDepth is a hierarchy metric that describes the actual maximum depth of the ontology and is displayed alongside other metrics like class count.

Because of that, setting maxDepth = 0 for flat ontologies is incorrect. It changes the meaning of the metric. A depth of 0 means there's no hierarchy depth, which is not what flat means in our system. Marking an ontology as flat should affect presentation behavior, not redefine the metric itself.

@mdorf (Member) left a comment
Looks good. Just one comment: if there are submissions in prod where the metric model doesn't exist in the triplestore but a metrics.csv file does, it could cause the API to return -1 for class counts that previously worked.

@alexskr (Member, Author) commented Apr 14, 2026

@jvendetti Good point — this deserves a closer look. I traced the history of how flat ontology metrics were calculated across three eras of the codebase.

Pre-owlapi-maxDepth calculation (before 61ee7059)

In the original class_metrics, the entire depth calculation was inside an unless is_flat guard. For flat ontologies:

  • maxDepth = 0 (depth calculation skipped entirely)
  • classesWithOneChild = 0, maxChildCount = 0, classesWithMoreThan25Children = 0, averageChildCount = 0 (child metrics skipped via unless is_flat guard in the groupby_children loop)
  • classes, properties, individuals reported real values

The test asserted `assert_equal 0, metrics.maxDepth` for BROTEST-ISFLAT.
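Schematically, the original structure looked like this — a paraphrase with placeholder values, not the actual class_metrics source:

```ruby
# All hierarchy metrics were computed only when the ontology was not flat;
# a flat ontology fell through with the zero defaults.
def class_metrics_sketch(is_flat, real_depth, real_max_children)
  metrics = { maxDepth: 0, maxChildCount: 0, classesWithOneChild: 0 }
  unless is_flat
    metrics[:maxDepth] = real_depth              # depth traversal
    metrics[:maxChildCount] = real_max_children  # groupby_children loop
  end
  metrics
end
```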

Post-owlapi-maxDepth calculation (commit 61ee7059)

Timothy Redmond extracted max_depth_fn and added CSV-first logic. The CSV is generated by owlapi_wrapper which has no knowledge of BioPortal's flat flag, so it reports the real tree depth. The is_flat guard only protects the SPARQL fallback path, not the CSV path.

The commit message acknowledges this: "It ignores any repute indicating that an ontology is flat and it ignore the roots. In both cases, these values are implicit in the java algorithm."

The test assertion was changed from `assert_equal 0` to `assert_equal 7` to match.

However, all the child metrics (classesWithOneChild, maxChildCount, classesWithMoreThan25Children, averageChildCount) continued to be zeroed out by the existing unless is_flat guard — that was not changed.

The inconsistency

After the owlapi metrics calculation, flat ontology metrics became internally inconsistent:

| Metric | Value | Why |
| --- | --- | --- |
| maxDepth | 7 (real) | read from CSV; `is_flat` not checked |
| classesWithOneChild | 0 | skipped by `unless is_flat` |
| maxChildCount | 0 | skipped by `unless is_flat` |
| classesWithMoreThan25Children | 0 | skipped by `unless is_flat` |
| averageChildCount | 0 | skipped by `unless is_flat` |

If the position is that flat is a UI flag and metrics should reflect reality, then the child metrics should also report real values — removing the unless is_flat guard from the groupby_children loop. If the position is that flat means "report zero for hierarchy metrics," then maxDepth should also be 0.

Either approach is valid, but we should be consistent. Which way should we go?

@alexskr (Member, Author) commented Apr 14, 2026

> Looks good. Just one comment: if there are submissions in prod where the metric model doesn't exist in the triplestore but a metrics.csv file does, it could cause the API to return -1 for class counts that previously worked.

Good observation. This is a deliberate change. The triplestore should be the source of truth for metrics. If a Metric object doesn't exist there, it means the metrics calculation process either wasn't run or failed — and the correct response is to reprocess that submission, not to silently fall back to a CSV file that may be stale or from a partial/failed run.

The motivation for this change was that I started noticing a bunch of metrics.csv file not found exceptions in the API logs, where the API was trying to fall back to reading the CSV. For unparsed ontologies or failed submissions, the old behavior was: try triplestore (fail), try CSV (fail, log error), return -1. Our change is: try triplestore (fail), return -1. Same end result, minus the noisy CSV error in the logs. The ClassesController already handles the -1 case by returning a 403 with a descriptive message.

If there are submissions in production where the CSV exists but the Metric object doesn't, those should be identified and reprocessed.

@jvendetti (Member)

I agree there’s an inconsistency in how hierarchy-related metrics are currently handled for flat ontologies. But I don’t think that means this PR should fix it by setting maxDepth = 0.

My concern is simpler: flat and maxDepth aren't the same thing, so changing maxDepth to 0 isn't the right fix. A value of 0 changes the meaning of the metric. flat is a flag we use for how an ontology is treated for browsing/UI purposes; maxDepth is a metric describing the actual depth of the hierarchy.

I think we should be careful about changing metric semantics, since these metrics are shown in the UI and treated as descriptive properties of an ontology. Some of this logic was moved into owlapi_wrapper because calculating metrics live in the triplestore was too expensive or unreliable at scale. maxDepth in particular was moved when AllegroGraph proved unable to calculate it for large ontologies. So this doesn’t feel like a small cleanup to me; it changes the meaning of a user-facing metric in an area where we’ve already made tradeoffs for performance.

If we want to revisit the bigger question of how all hierarchy-related metrics should behave for flat ontologies, that seems like a separate discussion. I don’t think this PR, which is otherwise a bugfix/refactor, should also change the meaning of maxDepth as a way of resolving a larger inconsistency in how these metrics behave.

I’d rather see this PR scoped to the concrete fixes it already contains, and handle the larger inconsistency separately if we decide it’s worth changing.

