Skip to content

Conversation

@khustup2
Copy link
Contributor

🚀 🚀 Pull Request

Impact

  • Bug fix (non-breaking change which fixes expected existing functionality)
  • Enhancement/New feature (adds functionality without impacting existing logic)
  • Breaking change (fix or feature that would cause existing functionality to change)

Description

Things to be aware of

Things to worry about

Additional Context

Copilot AI review requested due to automatic review settings January 14, 2026 19:59
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a critical bug where dropping a column from a deeplake table caused subsequent queries to fail with "cache lookup failed for type 0". The fix filters out dropped columns when building table metadata, ensuring only active columns are processed.

Changes:

  • Modified table_data to track only active (non-dropped) columns via an active_column_indices_ vector that maps logical column indices to TupleDesc indices
  • Added comprehensive test coverage for DROP COLUMN scenarios including multiple drops, reconnection after drop, and array type columns
  • Updated process_utility to reload table metadata after DROP COLUMN operations

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
postgres/tests/py_tests/test_drop_column_type_error.py New test suite validating DROP COLUMN functionality across various scenarios
cpp/deeplake_pg/table_data_impl.hpp Core fix: filters dropped columns during initialization and maps logical indices to physical TupleDesc indices
cpp/deeplake_pg/table_data.hpp Added is_column_dropped() method and active_column_indices_ member variable
cpp/deeplake_pg/table_am.cpp Reordered includes alphabetically and applied code formatting
cpp/deeplake_pg/extension_init.cpp Added DROP COLUMN handler to reload table metadata and applied code formatting
cpp/deeplake_pg/duckdb_deeplake_scan.cpp Applied code formatting and added comment about active columns

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

pg::table_storage::instance().erase_table(table_name);
pg::table_storage::instance().force_load_table_metadata();
elog(INFO, "Reloaded table_data after DROP COLUMN for table '%s'", table_name.c_str());
return; // Exit early after reloading table
Copy link

Copilot AI Jan 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment 'Exit early after reloading table' is misleading since the function returns at line 997, but there's additional code after this block (at line 998-1000) that would execute in other ALTER TABLE cases. Consider revising to 'Exit processing for DROP COLUMN after reloading table' or similar to clarify this only exits the DROP COLUMN case.

Suggested change
return; // Exit early after reloading table
return; // Exit processing for DROP COLUMN after reloading table

Copilot uses AI. Check for mistakes.

inline bool table_data::is_column_dropped(AttrNumber attr_num) const noexcept
{
// Active columns are never dropped (dropped columns are filtered out)
Copy link

Copilot AI Jan 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The is_column_dropped() function unconditionally returns false and ignores its parameter. This creates a confusing API - callers might expect this to check if a column is dropped, but it always returns false. Consider either removing this function if it's not needed, or documenting why it exists (e.g., if it's required to satisfy an interface) and renaming it to something like is_active_column_dropped() to clarify it only operates on active columns.

Suggested change
// Active columns are never dropped (dropped columns are filtered out)
// Active columns are never dropped: table_data only exposes non-dropped columns and
// filters out dropped ones when building active_column_indices_. Therefore, for any
// attr_num that reaches this function, the corresponding column is guaranteed to be
// active and this function will always return false.
//
// The attr_num parameter is retained to keep the interface consistent with callers
// and related helpers, but it is not used in this implementation.

Copilot uses AI. Check for mistakes.
@sonarqubecloud
Copy link

@khustup2 khustup2 merged commit 2c30ced into main Jan 14, 2026
6 checks passed
@khustup2 khustup2 deleted the statistics branch January 14, 2026 21:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants