Skip to content

Fix missing label retry state leaking across ontology processing#272

Draft
alexskr wants to merge 5 commits intodevelopfrom
fix/missing-labels-state-leak
Draft

Fix missing label retry state leaking across ontology processing#272
alexskr wants to merge 5 commits intodevelopfrom
fix/missing-labels-state-leak

Conversation

@alexskr
Copy link
Copy Markdown
Member

@alexskr alexskr commented Apr 7, 2026

@alexskr alexskr marked this pull request as ready for review April 8, 2026 00:16
@alexskr alexskr requested a review from mdorf April 8, 2026 00:16
Copy link
Copy Markdown
Member

@mdorf mdorf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One minor issue: silent success on page retrieval failure

Suppose paging.page(page, size).all at line 110 raises a StandardError (e.g., a SPARQL timeout, network. error, or triplestore issue).

Step 1 — The error is caught and swallowed (lines 109-117):

begin
  page_classes = paging.page(page, size).all   # raises StandardError
  total_pages = page_classes.total_pages
  page_len = page_classes.length
rescue StandardError => e
  page_classes = []          # line 114
  logger.error(...)          # line 115 — logged, but execution continues
  logger.flush
end

After the rescue: page_classes = [], total_pages = 0 (from line 107 initialization).

Step 2 — The retry block is skipped (line 120):

if total_pages > 0 && page_classes.empty? && ...

Since total_pages = 0, this entire block (lines 120-147) is skipped. The retry logic never fires — the error is treated as "there are simply no pages."

Step 3 — The loop breaks (lines 149-156):

if page_classes.empty?
  if total_pages > 0
    ...
  else
    logger.info("Ontology #{acr} contains #{total_pages} pages...")  # line 153
  end
  break   # line 155
end

The loop exits cleanly with a log message that says the ontology contains 0 pages — as if this is a normal, expected condition rather than an error.

Step 4 — Success status is set (lines 187-191):

# set the status on actions that have completed successfully
callbacks.each do |_, callback|
  if callback[:status]
    @submission.add_submission_status(callback[:status])   # line 189 — marks RDF_LABELS as SUCCESS
    @submission.save
  end
end

The submission is marked with the RDF_LABELS success status even though no labels were actually processed — the page fetch failed with an exception.

Step 5 — handle_missing_labels rescue is never triggered (lines 33-38):

Since loop_classes returned normally (no exception propagated), the rescue Exception => e at line 33 never fires, so the error status is never set.

Suggested fix

Re-raise the exception after logging on the initial page fetch, so it reaches the error status path in handle_missing_labels. The retry-loop rescue (lines 128-138) can remain as-is since the retry block already handles exhausted retries by raising at line 145.

# lines 109-117: re-raise on initial fetch failure
begin
  page_classes = paging.page(page, size).all
  total_pages = page_classes.total_pages
  page_len = page_classes.length
rescue StandardError => e
  page_classes = []
  logger.error("Error retrieving page #{page} for #{acr}: #{e.class}: #{e.message}")
  logger.flush
  raise e  # let handle_missing_labels rescue set the error status
end

@alexskr alexskr marked this pull request as draft April 9, 2026 00:30
@alexskr alexskr marked this pull request as draft April 9, 2026 00:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants