Clean up metrics: dead code, flat maxDepth fix, remove CSV fallback #277
Conversation
Force-pushed a8fbe74 to c45993b
This method was moved to SubmissionMetricsCalculator during a prior refactoring but the original copy was left behind. No callers exist in this repo or dependent projects (ontologies_api, ncbo_cron, ncbo_annotator).
No CSV usage remains in this file after the removal of metrics_for_submission. The csv library is still required by ontology_submission.rb and submission_metrics_calculator.rb, where it is needed.
No external callers found in this repo or dependent projects. Keeping the method for now pending further validation.
Verifies that class_count returns -1 gracefully when no metrics exist in the triplestore and no CSV fallback is available.
The inner rescue in metrics_for_submission caught errors, logged a minimal message, and returned nil. This masked the real error — the caller (compute_metrics) would then fail with NoMethodError on nil, and the outer rescue in process_metrics would log that misleading error instead of the root cause. process_metrics already handles errors properly: logs the real exception with full backtrace and sets the METRICS error status. The inner rescue was redundant and harmful.
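The masking pattern can be sketched in plain Ruby. This is an illustration only, with hypothetical method names; the real code lives in SubmissionMetricsCalculator and process_metrics:

```ruby
# Sketch of the control flow described above: an inner rescue that logs
# tersely and returns nil pushes the real failure downstream, where it
# resurfaces as a NoMethodError on nil.

def metrics_for_submission_sketch
  raise "SPARQL endpoint unreachable"           # stand-in for the real root cause
rescue StandardError => e
  puts "Error getting metrics: #{e.message}"    # minimal log, backtrace lost
  nil                                           # caller receives nil
end

def compute_metrics_sketch
  metrics = metrics_for_submission_sketch
  metrics[:classes]   # NoMethodError: undefined method `[]' for nil
end

begin
  compute_metrics_sketch
rescue NoMethodError => e
  # This is the misleading error the outer rescue would log instead of the
  # SPARQL failure.
  puts "Outer handler sees: #{e.class}"
end
```

Removing the inner rescue lets the original exception propagate to the outer handler, which already logs the real message and backtrace.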
max_depth_fn was reading maxDepth from the CSV file generated by owlapi_wrapper regardless of the flat flag. owlapi_wrapper has no knowledge of BioPortal's flat designation, so it reports the real tree depth. Now we short-circuit and return 0 for flat ontologies before any CSV or SPARQL calculation.
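A minimal sketch of the short-circuit, with illustrative method and key names rather than the actual metrics.rb API:

```ruby
# Flat ontologies return 0 before any CSV value generated by owlapi_wrapper
# is consulted. The `flat` flag is a BioPortal designation that
# owlapi_wrapper never sees, so its CSV always reports the real tree depth.

def max_depth_for(submission)
  return 0 if submission[:flat]  # short-circuit: flat ontologies have no hierarchy
  submission[:csv_metrics].fetch("maxDepth", 0).to_i  # real depth from the CSV
end

flat_sub    = { flat: true,  csv_metrics: { "maxDepth" => "7" } }
regular_sub = { flat: false, csv_metrics: { "maxDepth" => "7" } }

max_depth_for(flat_sub)     # => 0, even though the CSV reports a depth of 7
max_depth_for(regular_sub)  # => 7
```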
class_count was falling back to reading metrics.csv from disk when triplestore metrics were absent. This caused errors on API nodes where the file does not exist or is missing for older submissions. The API should always read metrics from the triplestore. The CSV file should only be consumed during ontology parsing in ncbo_cron.
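As an illustration of the intended contract (the real OntologySubmission code differs):

```ruby
# class_count consults only the triplestore-backed Metric object and returns
# -1 when it is absent, with no metrics.csv fallback on the API side.

def class_count(metric_record)
  return -1 if metric_record.nil?  # no Metric in the triplestore: reprocess, don't read CSV
  metric_record.fetch(:classes, -1)
end

class_count(nil)              # => -1
class_count({ classes: 512 }) # => 512
```

A -1 here signals that the submission needs reprocessing, which is the behavior the new test covers.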
query_groupby_classes was called with rdfsSC=nil for flat ontologies, producing invalid SPARQL (<> predicate). This was silently tolerated by 4store but caused a SPARQL::Client::MalformedQuery error on GraphDB, preventing the metrics status from being set. The groupby_children results were already unused for flat ontologies (the loop body was guarded by `unless is_flat`), so the query was wasteful even when it didn't error. Moved the entire block inside the `unless is_flat` guard.
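A small sketch of the failure and the guard; the query string is illustrative, not the project's actual SPARQL:

```ruby
# Interpolating a nil predicate into IRI brackets yields "<>", which GraphDB
# rejects as malformed. The fix guards the whole group-by query with
# `unless is_flat`, so no query is issued for flat ontologies.

def groupby_children_query(rdfsSC)
  "SELECT ?parent (COUNT(?c) AS ?n) WHERE { ?c <#{rdfsSC}> ?parent } GROUP BY ?parent"
end

def groupby_queries(is_flat, rdfsSC)
  queries = []
  unless is_flat  # flat ontologies never used these results anyway
    queries << groupby_children_query(rdfsSC)
  end
  queries
end

groupby_children_query(nil).include?("<>")  # => true: the malformed predicate
groupby_queries(true, nil)                  # => [] — no query issued for flat ontologies
```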
Force-pushed c45993b to 3ffa994
I disagree with this change specifically: I don’t think non-zero maxDepth values for flat ontologies are a bug. Because of that, setting maxDepth to 0 here changes the metric's semantics rather than fixing them.
mdorf left a comment:
Looks good. Just one comment: if there are submissions in prod where the metric model doesn't exist in the triplestore but a metrics.csv file does, it could cause the API to return -1 for class counts that previously worked.
@jvendetti Good point — this deserves a closer look. I traced the history of how flat ontology metrics were calculated across three eras of the codebase.

**Pre-owlapi-maxDepth calculation (before 61ee7059):** in the original implementation, maxDepth was computed in this repo, and the test asserted a maxDepth of 0 for flat ontologies.

**Post-owlapi-maxDepth calculation (commit 61ee7059):** maxDepth is read from the CSV generated by owlapi_wrapper, which produces the following for a flat ontology:
| Metric | Value | Why |
|---|---|---|
| `maxDepth` | 7 (real) | Read from CSV, `is_flat` not checked |
| `classesWithOneChild` | 0 | Skipped by `unless is_flat` |
| `maxChildCount` | 0 | Skipped by `unless is_flat` |
| `classesWithMoreThan25Children` | 0 | Skipped by `unless is_flat` |
| `averageChildCount` | 0 | Skipped by `unless is_flat` |
If the position is that flat is a UI flag and metrics should reflect reality, then the child metrics should also report real values — removing the unless is_flat guard from the groupby_children loop. If the position is that flat means "report zero for hierarchy metrics," then maxDepth should also be 0.
Either approach is valid, but we should be consistent. Which way should we go?
Good observation. This is a deliberate change. The triplestore should be the source of truth for metrics. If a Metric object doesn't exist there, it means the metrics calculation process either wasn't run or failed — and the correct response is to reprocess that submission, not to silently fall back to a CSV file that may be stale or from a partial/failed run.

The motivation for this change was that I started noticing errors on API nodes where the CSV file does not exist. If there are submissions in production where the CSV exists but the Metric object doesn't, those should be identified and reprocessed.
I agree there’s an inconsistency in how hierarchy-related metrics are currently handled for flat ontologies. But I don’t think that means this PR should fix it by setting maxDepth to 0.

My concern is simpler: I think we should be careful about changing metric semantics, since these metrics are shown in the UI and treated as descriptive properties of an ontology. Some of this logic was moved into SubmissionMetricsCalculator during a prior refactoring.

If we want to revisit the bigger question of how all hierarchy-related metrics should behave for flat ontologies, that seems like a separate discussion. I don’t think this PR, which is otherwise a bugfix/refactor, should also change the meaning of maxDepth.

I’d rather see this PR scoped to the concrete fixes it already contains, and handle the larger inconsistency separately if we decide it’s worth changing.
Summary
- Removed dead method `metrics_for_submission` left behind after refactoring to `SubmissionMetricsCalculator`
- Removed `require 'csv'` from `metrics/metrics.rb`
- Flagged `recursive_depth` as potentially unused (TODO comment)
- Added a test for `class_count` returning -1 when metrics are absent
- Removed the inner rescue in `metrics_for_submission` in `SubmissionMetricsCalculator` — the inner rescue masked root cause errors (e.g., SPARQL failures surfaced as `NoMethodError` on nil instead of the real error)
- Fixed `maxDepth` returning non-zero for flat ontologies — `max_depth_fn` was reading from CSV without checking the `flat` flag
- Removed the `metrics.csv` fallback from `OntologySubmission#class_count` — the CSV file should only be read during ontology processing (ncbo_cron), not by the API at query time
- Skipped the `query_groupby_classes` SPARQL call for flat ontologies — it was called with a nil predicate, producing invalid SPARQL that failed on GraphDB

Closes #276