feat(sql): add debug logging for dropped tables and views in SQLAlchemy and Teradata sources#18136

Open
NehaGslab wants to merge 4 commits into datahub-project:master from NehaGslab:feat-teradata-enhance-debug-pattern-filtering
@NehaGslab

Contributor

Summary

Adds logger.debug visibility at the points where tables and views are scanned and
dropped by table_pattern / view_pattern. Previously a Teradata run that filtered
out every table/view still reported as "successful" with no indication of why
nothing was ingested — the dropped entities are only sampled into report.filtered
(a LossyList), so there was no way to see which identifiers were denied or what
identifier string was being matched against the patterns.

This mirrors the per-object debug logging BigQuery already emits, making pattern
misconfiguration (e.g. wrong case, schema.table vs table format) diagnosable
by re-running with debug logging enabled.
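
To make the "schema.table vs table format" failure mode concrete, here is a minimal sketch using only the standard library. The `is_allowed` helper is a hypothetical stand-in for allow/deny regex filtering, not DataHub's actual pattern class, and it assumes the identifier handed to the filter is schema-qualified; the real matching rules (for example case handling) may differ.

```python
import re
from typing import Sequence


def is_allowed(identifier: str, allow: Sequence[str], deny: Sequence[str]) -> bool:
    """Hypothetical allow/deny check: deny wins, then any allow regex must match."""
    if any(re.match(pattern, identifier) for pattern in deny):
        return False
    return any(re.match(pattern, identifier) for pattern in allow)


# The pattern author expected bare table names...
bare_allow = [r"^orders$"]
# ...but the identifier being matched is schema-qualified (assumed format).
identifier = "sales.orders"

print(is_allowed(identifier, bare_allow, deny=[]))              # False
print(is_allowed(identifier, [r"^sales\.orders$"], deny=[]))    # True
```

Without a log line showing the exact identifier string, the first case just looks like an empty but successful run.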

Changes

  • sql_common.py (SQLAlchemySource):
    • loop_tables: log each table scanned and each table dropped by table_pattern.
    • loop_views: log each view scanned and each view dropped by view_pattern.
      (Teradata tables flow through the base loop_tables via cached_loop_tables.)
  • teradata.py:
    • _loop_views_with_connection_pool (multithreaded view path): log views
      scanned and dropped by view_pattern.
    • _process_views_single_threaded (single-threaded view path): same logging.

No behavioral change — only additional debug-level log lines. Filtering logic,
reporting, and emitted work units are unchanged.
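
The shape of the change can be sketched roughly as follows. This is an illustration under assumptions, not the code in this PR: `loop_tables_sketch`, its parameters, and the log message wording are all hypothetical, and a plain list stands in for report.filtered.

```python
import logging
import re
from typing import Iterable, Iterator, List, Sequence

logger = logging.getLogger(__name__)


def loop_tables_sketch(
    schema: str,
    table_names: Iterable[str],
    allow_regexes: Sequence[str],
    filtered: List[str],
) -> Iterator[str]:
    """Yield allowed identifiers; log every scan and every drop at debug level."""
    for table in table_names:
        identifier = f"{schema}.{table}"  # assumed identifier format
        logger.debug("Scanning table %s", identifier)
        if not any(re.match(pattern, identifier) for pattern in allow_regexes):
            # Existing behavior: record the drop. New: also say why, in the log.
            logger.debug("Dropping table %s: not allowed by table_pattern", identifier)
            filtered.append(identifier)
            continue
        yield identifier
```

The filtering branch is untouched apart from the two `logger.debug` calls, which is why the emitted results stay the same.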

Test plan

Added TestFilterDropDebugLogging in metadata-ingestion/tests/unit/test_teradata_source.py,
covering every drop point where table_pattern / view_pattern filtering occurs. Each test
asserts the denied object is recorded in report.filtered and its _process_* handler is
never invoked:

  • test_single_threaded_view_drop_reports_and_skips_processing — a view denied by
    view_pattern on the single-threaded path (_process_views_single_threaded,
    max_workers=1) is recorded in report.filtered and _process_view is not called.
  • test_multi_threaded_view_drop_reports_and_skips_processing — same behavior on the
    multi-threaded path (_loop_views_with_connection_pool, max_workers=2).
  • test_allowed_view_is_processed_and_not_dropped — a view that passes view_pattern
    is handed to _process_view and does not appear in report.filtered.
  • test_table_drop_reports_and_skips_processing — a table denied by table_pattern
    (flowing through cached_loop_tables → base loop_tables) is recorded in
    report.filtered and _process_table is not called.
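
The assertion style used by these tests can be sketched in isolation as below. `FakeSource` is a hypothetical miniature written for this illustration; the real tests exercise the Teradata source and its report object instead.

```python
import re
import unittest
from unittest.mock import MagicMock


class FakeSource:
    """Hypothetical miniature of a source; not the real Teradata source."""

    def __init__(self, deny):
        self.deny = deny
        self.filtered = []  # stands in for report.filtered
        self._process_view = MagicMock()

    def loop_views(self, schema, views):
        for view in views:
            identifier = f"{schema}.{view}"
            if any(re.match(pattern, identifier) for pattern in self.deny):
                self.filtered.append(identifier)
                continue
            self._process_view(identifier)


class TestViewDropSketch(unittest.TestCase):
    def test_denied_view_is_reported_and_skipped(self):
        source = FakeSource(deny=[r"^sales\.secret_.*"])
        source.loop_views("sales", ["secret_v"])
        self.assertIn("sales.secret_v", source.filtered)
        source._process_view.assert_not_called()

    def test_allowed_view_is_processed(self):
        source = FakeSource(deny=[r"^sales\.secret_.*"])
        source.loop_views("sales", ["public_v"])
        source._process_view.assert_called_once_with("sales.public_v")
        self.assertEqual(source.filtered, [])
```

Asserting on both the filtered record and the mocked handler covers the two observable effects of a drop.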

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly PR Title Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable)
  • For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

github-actions Bot added the "ingestion" label (PR or Issue related to the ingestion of metadata) on Jul 2, 2026
@github-actions

github-actions Bot commented Jul 2, 2026

Contributor

Linear: ING-2951

Thanks for your contribution! We have created an internal ticket to track this PR. A member of the core DataHub team will be assigned to review it within the next few business days - you will get a follow-up comment once a reviewer is assigned.

github-actions Bot added the "community-contribution" label (PR or Issue raised by member(s) of DataHub Community) on Jul 2, 2026

Labels

  • community-contribution: PR or Issue raised by member(s) of DataHub Community
  • ingestion: PR or Issue related to the ingestion of metadata
