
Clean up metrics: dead code, flat maxDepth fix, remove CSV fallback#277

Open
alexskr wants to merge 8 commits into `develop` from `fix/metrics-cleanup`

Conversation

@alexskr (Member) commented Apr 11, 2026

Summary

  • Remove dead metrics_for_submission left behind after refactoring to SubmissionMetricsCalculator
  • Remove unused require 'csv' from metrics/metrics.rb
  • Mark recursive_depth as potentially unused (TODO comment)
  • Add unit test for class_count returning -1 when metrics are absent
  • Remove error-swallowing rescue from metrics_for_submission in SubmissionMetricsCalculator — the inner rescue masked root cause errors (e.g., SPARQL failures surfaced as NoMethodError on nil instead of the real error)
  • Fix maxDepth returning non-zero for flat ontologies — max_depth_fn was reading from CSV without checking the flat flag
  • Remove metrics.csv fallback from OntologySubmission#class_count — the CSV file should only be read during ontology processing (ncbo_cron), not by the API at query time
  • Skip query_groupby_classes SPARQL call for flat ontologies — was called with nil predicate, producing invalid SPARQL that failed on GraphDB

Closes #276

@alexskr alexskr marked this pull request as draft April 11, 2026 04:30
@alexskr alexskr force-pushed the fix/metrics-cleanup branch from a8fbe74 to c45993b Compare April 11, 2026 05:24
alexskr added 8 commits April 10, 2026 22:26
This method was moved to SubmissionMetricsCalculator during a prior
refactoring but the original copy was left behind. No callers exist
in this repo or dependent projects (ontologies_api, ncbo_cron,
ncbo_annotator).
No CSV usage remains in this file after removal of metrics_for_submission.
The csv library is still required by ontology_submission.rb and
submission_metrics_calculator.rb, where it is needed.
No external callers found in this repo or dependent projects.
Keeping the method for now pending further validation.
Verifies that class_count returns -1 gracefully when no metrics
exist in the triplestore and no CSV fallback is available.
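A minimal sketch of such a test — `FakeSubmission` is an illustrative stand-in for the real `OntologySubmission` model, not the repo's actual fixture setup:

```ruby
require "minitest/autorun"

# Stand-in for OntologySubmission#class_count (illustrative): read the
# Metric object from the triplestore; return -1 when it is absent.
class FakeSubmission
  def initialize(metrics)
    @metrics = metrics
  end

  def class_count
    @metrics ? @metrics[:classes] : -1
  end
end

class ClassCountTest < Minitest::Test
  def test_returns_minus_one_when_metrics_absent
    assert_equal(-1, FakeSubmission.new(nil).class_count)
  end

  def test_returns_real_count_when_metrics_present
    assert_equal(42, FakeSubmission.new({ classes: 42 }).class_count)
  end
end
```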
The inner rescue in metrics_for_submission caught errors, logged a
minimal message, and returned nil. This masked the real error —
the caller (compute_metrics) would then fail with NoMethodError on
nil, and the outer rescue in process_metrics would log that
misleading error instead of the root cause.

process_metrics already handles errors properly: logs the real
exception with full backtrace and sets the METRICS error status.
The inner rescue was redundant and harmful.
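The failure mode can be reproduced in isolation. The method names mirror the PR's description, but the bodies here are illustrative stubs:

```ruby
require "logger"

logger = Logger.new($stdout)

# Old shape: the inner rescue swallows the real error and returns nil.
def metrics_with_inner_rescue(logger)
  raise "SPARQL endpoint unreachable"        # the root cause
rescue StandardError
  logger.error("Unable to retrieve metrics") # minimal message, cause lost
  nil
end

# The caller (compute_metrics in the PR) then blows up on nil instead.
def compute_metrics_old(logger)
  metrics = metrics_with_inner_rescue(logger)
  metrics.fetch(:classes)                    # NoMethodError on nil
end

begin
  compute_metrics_old(logger)
rescue NoMethodError => e
  # The outer handler logs this misleading error, not the SPARQL failure.
  logger.error(e.class)
end

# New shape: no inner rescue, so the outer handler sees the root cause.
def metrics_without_rescue
  raise "SPARQL endpoint unreachable"
end

begin
  metrics_without_rescue
rescue StandardError => e
  logger.error(e.message)                    # the real error reaches the log
end
```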
max_depth_fn was reading maxDepth from the CSV file generated by
owlapi_wrapper regardless of the flat flag. owlapi_wrapper has no
knowledge of BioPortal's flat designation, so it reports the real
tree depth. Now we short-circuit and return 0 for flat ontologies
before any CSV or SPARQL calculation.
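The intended ordering can be sketched as follows; the CSV and SPARQL details are stubbed, only the short-circuit matters:

```ruby
# Illustrative stand-in for max_depth_fn: check the flat flag before
# consulting the owlapi_wrapper CSV or falling back to SPARQL.
def max_depth(submission)
  return 0 if submission[:flat]            # flat: skip all depth calculation

  csv_depth = submission[:csv_max_depth]   # depth as reported by owlapi_wrapper
  return csv_depth if csv_depth            # CSV-first path

  sparql_max_depth(submission)             # SPARQL fallback
end

def sparql_max_depth(_submission)
  0  # placeholder for the expensive triplestore traversal
end
```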
class_count was falling back to reading metrics.csv from disk when
triplestore metrics were absent. This caused errors on API nodes
where the file does not exist or is missing for older submissions.
The API should always read metrics from the triplestore. The CSV
file should only be consumed during ontology parsing in ncbo_cron.
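The before/after shape of the change, with hash-based stand-ins rather than the real models (the CSV header name is illustrative):

```ruby
require "csv"

# Old shape (removed): fall back to metrics.csv on disk when the
# triplestore Metric object is missing — noisy on API nodes without the file.
def class_count_with_csv_fallback(submission)
  return submission[:metrics][:classes] if submission[:metrics]

  path = submission[:csv_path]
  if path && File.exist?(path)
    row = CSV.read(path, headers: true).first
    return row["Class Count"].to_i if row  # header name is illustrative
  end
  -1
end

# New shape: the API reads metrics only from the triplestore; absence
# means "reprocess the submission", not "look for a file".
def class_count(submission)
  submission[:metrics] ? submission[:metrics][:classes] : -1
end
```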
query_groupby_classes was called with rdfsSC=nil for flat ontologies,
producing invalid SPARQL (<> predicate). This was silently tolerated
by 4store but caused a SPARQL::Client::MalformedQuery error on
GraphDB, preventing the metrics status from being set.

The groupby_children results were already unused for flat ontologies
(the loop body was guarded by `unless is_flat`), so the query was
wasteful even when it didn't error. Moved the entire block inside
the `unless is_flat` guard.
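The restructuring can be sketched like this — the query and loop bodies are stubs, but the control flow matches the description above:

```ruby
# Old shape: the query ran unconditionally, with rdfsSC = nil for flat
# ontologies, producing an invalid predicate on strict SPARQL stores.
def child_metrics_old(is_flat, rdfsSC)
  groups = query_groupby_classes(rdfsSC)  # raised MalformedQuery on GraphDB
  result = {}
  groups.each do |klass, count|
    next if is_flat                       # results were discarded anyway
    result[klass] = count
  end
  result
end

# New shape: the whole block lives inside the flat guard.
def child_metrics(is_flat, rdfsSC)
  return {} if is_flat                    # no query, no loop
  query_groupby_classes(rdfsSC).to_h
end

# Stub standing in for the real SPARQL call.
def query_groupby_classes(predicate)
  raise ArgumentError, "nil predicate produces invalid SPARQL" if predicate.nil?
  [["A", 2], ["B", 30]]
end
```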
@alexskr alexskr force-pushed the fix/metrics-cleanup branch from c45993b to 3ffa994 Compare April 11, 2026 05:28
@alexskr alexskr marked this pull request as ready for review April 13, 2026 04:39
@alexskr alexskr requested a review from mdorf April 13, 2026 04:39
@jvendetti (Member)

I disagree with this change specifically:

Fix maxDepth returning non-zero for flat ontologies

I don’t think non-zero maxDepth for a flat ontology is inherently wrong, because flat and maxDepth are different concepts in the codebase.

flat is a boolean flag used to indicate that an ontology should be treated as flat for browsing/UI purposes. We've set that manually on a case-by-case basis for ontologies where displaying a tree control wasn't performant. It's not a statement that the ontology literally has no hierarchy.

maxDepth is a hierarchy metric that describes the actual maximum depth of the ontology and is displayed alongside other metrics like class count.

Because of that, setting maxDepth = 0 for flat ontologies is incorrect. It changes the meaning of the metric. A depth of 0 means there's no hierarchy depth, which is not what flat means in our system. Marking an ontology as flat should affect presentation behavior, not redefine the metric itself.

@mdorf (Member) left a comment
Looks good. Just one comment: if there are submissions in prod where the metric model doesn't exist in the triplestore but a metrics.csv file does, it could cause the API to return -1 for class counts that previously worked.

@alexskr (Member, Author) commented Apr 14, 2026

@jvendetti Good point — this deserves a closer look. I traced the history of how flat ontology metrics were calculated across three eras of the codebase.

Pre-owlapi-maxDepth calculation (before 61ee7059)

In the original class_metrics, the entire depth calculation was inside an unless is_flat guard. For flat ontologies:

  • maxDepth = 0 (depth calculation skipped entirely)
  • classesWithOneChild = 0, maxChildCount = 0, classesWithMoreThan25Children = 0, averageChildCount = 0 (child metrics skipped via unless is_flat guard in the groupby_children loop)
  • classes, properties, individuals reported real values

The test asserted `assert_equal 0, metrics.maxDepth` for BROTEST-ISFLAT.
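Schematically, the original structure looked like this — a paraphrase with placeholder values, not the actual class_metrics source:

```ruby
# All hierarchy metrics were computed only when the ontology was not flat;
# a flat ontology fell through with the zero defaults.
def class_metrics_sketch(is_flat, real_depth, real_max_children)
  metrics = { maxDepth: 0, maxChildCount: 0, classesWithOneChild: 0 }
  unless is_flat
    metrics[:maxDepth] = real_depth              # depth traversal
    metrics[:maxChildCount] = real_max_children  # groupby_children loop
  end
  metrics
end
```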

Post-owlapi-maxDepth calculation (commit 61ee7059)

Timothy Redmond extracted max_depth_fn and added CSV-first logic. The CSV is generated by owlapi_wrapper which has no knowledge of BioPortal's flat flag, so it reports the real tree depth. The is_flat guard only protects the SPARQL fallback path, not the CSV path.

The commit message acknowledges this: "It ignores any repute indicating that an ontology is flat and it ignore the roots. In both cases, these values are implicit in the java algorithm."

The test assertion was changed from `assert_equal 0` to `assert_equal 7` to match.

However, all the child metrics (classesWithOneChild, maxChildCount, classesWithMoreThan25Children, averageChildCount) continued to be zeroed out by the existing unless is_flat guard — that was not changed.

The inconsistency

After the owlapi metrics calculation, flat ontology metrics became internally inconsistent:

| Metric | Value | Why |
| --- | --- | --- |
| maxDepth | 7 (real) | read from CSV; `is_flat` not checked |
| classesWithOneChild | 0 | skipped by `unless is_flat` |
| maxChildCount | 0 | skipped by `unless is_flat` |
| classesWithMoreThan25Children | 0 | skipped by `unless is_flat` |
| averageChildCount | 0 | skipped by `unless is_flat` |

If the position is that flat is a UI flag and metrics should reflect reality, then the child metrics should also report real values — removing the unless is_flat guard from the groupby_children loop. If the position is that flat means "report zero for hierarchy metrics," then maxDepth should also be 0.

Either approach is valid, but we should be consistent. Which way should we go?

@alexskr (Member, Author) commented Apr 14, 2026

> Looks good. Just one comment: if there are submissions in prod where the metric model doesn't exist in the triplestore but a metrics.csv file does, it could cause the API to return -1 for class counts that previously worked.

Good observation. This is a deliberate change. The triplestore should be the source of truth for metrics. If a Metric object doesn't exist there, it means the metrics calculation process either wasn't run or failed — and the correct response is to reprocess that submission, not to silently fall back to a CSV file that may be stale or from a partial/failed run.

The motivation for this change was that I started noticing a bunch of metrics.csv file not found exceptions in the API logs, where the API was trying to fall back to reading the CSV. For unparsed ontologies or failed submissions, the old behavior was: try triplestore (fail), try CSV (fail, log error), return -1. Our change is: try triplestore (fail), return -1. Same end result, minus the noisy CSV error in the logs. The ClassesController already handles the -1 case by returning a 403 with a descriptive message.

If there are submissions in production where the CSV exists but the Metric object doesn't, those should be identified and reprocessed.

@jvendetti (Member)

I agree there’s an inconsistency in how hierarchy-related metrics are currently handled for flat ontologies. But I don’t think that means this PR should fix it by setting maxDepth = 0.

My concern is simpler: flat and maxDepth aren't the same thing, so changing maxDepth to 0 isn't the right fix. A value of 0 changes the meaning of the metric. flat is a flag we use for how an ontology is treated for browsing/UI purposes; maxDepth is a metric describing the actual depth of the hierarchy.

I think we should be careful about changing metric semantics, since these metrics are shown in the UI and treated as descriptive properties of an ontology. Some of this logic was moved into owlapi_wrapper because calculating metrics live in the triplestore was too expensive or unreliable at scale. maxDepth in particular was moved when AllegroGraph proved unable to calculate it for large ontologies. So this doesn’t feel like a small cleanup to me; it changes the meaning of a user-facing metric in an area where we’ve already made tradeoffs for performance.

If we want to revisit the bigger question of how all hierarchy-related metrics should behave for flat ontologies, that seems like a separate discussion. I don’t think this PR, which is otherwise a bugfix/refactor, should also change the meaning of maxDepth as a way of resolving a larger inconsistency in how these metrics behave.

I’d rather see this PR scoped to the concrete fixes it already contains, and handle the larger inconsistency separately if we decide it’s worth changing.

