
feat: DH-21548: Permit switching partition-ordering for Iceberg #7652

Open

robbcamera wants to merge 4 commits into deephaven:main from robbcamera:rjc_DH-21548

Conversation

@robbcamera
Contributor

No description provided.

@github-actions
Contributor

github-actions bot commented Feb 4, 2026

No docs changes detected for 3839876

@robbcamera robbcamera changed the title DH-21548: Permit switching partition-ordering for Iceberg feat: DH-21548: Permit switching partition-ordering for Iceberg Feb 4, 2026
for (int colIdx = 0; colIdx < partitionFieldsFromSpec.size(); colIdx += 1) {
final PartitionField partitionField = partitionFieldsFromSpec.get(colIdx);
if (!partitioningColumnNamesFromDefinition.get(colIdx).equals(partitionField.name())) {
for (final PartitionField partitionField : partitionFieldsFromSpec) {
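
The hunk above appears to replace an index-aligned comparison of partitioning columns against partition fields with a for-each over the spec's fields. A minimal, self-contained sketch of the behavioral difference, using plain `String` lists as stand-ins for the actual Deephaven/Iceberg types (names here are illustrative, not the real API):

```java
import java.util.HashSet;
import java.util.List;

public class PartitionOrderCheck {
    // Order-sensitive check (the old behavior): column i must match field i exactly.
    static boolean matchesInOrder(List<String> defCols, List<String> specFields) {
        if (defCols.size() != specFields.size()) {
            return false;
        }
        for (int i = 0; i < defCols.size(); i++) {
            if (!defCols.get(i).equals(specFields.get(i))) {
                return false;
            }
        }
        return true;
    }

    // Order-insensitive check: the same columns must be present, in any order.
    static boolean matchesAnyOrder(List<String> defCols, List<String> specFields) {
        return defCols.size() == specFields.size()
                && new HashSet<>(defCols).equals(new HashSet<>(specFields));
    }

    public static void main(String[] args) {
        final List<String> def = List.of("Date", "InternalPartition");
        final List<String> spec = List.of("InternalPartition", "Date");
        System.out.println(matchesInOrder(def, spec));  // false: same columns, different order
        System.out.println(matchesAnyOrder(def, spec)); // true: order no longer matters
    }
}
```

Switching from the first check to the second is what permits a definition whose partitioning columns are ordered differently from the table's partition spec.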
Member
Noting for potential follow-up - it looks like tablePartitionSpec is always coming from org.apache.iceberg.Table#spec, which is not necessarily the spec we want to use. I think we'd want to be using the partition spec as written down in the ExtendedStorage block; you could also argue that, technically, the reading and writing partition specs could be different (for example, you want to partition more finely when writing, but not present that to the user when reading). We could consider this as something better defined in the PQ config (if we eventually have a more strongly typed config coming from that layer), or as a separate element defined in the

<Table namespace="...">
  ...
  <ExtendedStorage type="iceberg">
    <Catalog>
      ...
    </Catalog>
    <Table>
     ...
    </Table>
    <Merge>
      <TheoreticalBlockForMergePartitionSpec>...</TheoreticalBlockForMergePartitionSpec>
    </Merge>
  </ExtendedStorage>
</Table>
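
If the merge-time partition spec did live in a dedicated element like the theoretical block above, reading it would be a small DOM lookup. A hedged sketch, where the element name `MergePartitionSpec` and its comma-separated payload are hypothetical placeholders (nothing here is an existing Deephaven config element):

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class ExtendedStorageSketch {
    // Returns the text of a hypothetical <MergePartitionSpec> element, or null if absent.
    public static String mergeSpec(final String xml) throws Exception {
        final Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        final NodeList nodes = doc.getElementsByTagName("MergePartitionSpec");
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) throws Exception {
        final String xml = "<Table namespace=\"ns\">"
                + "<ExtendedStorage type=\"iceberg\">"
                + "<Merge><MergePartitionSpec>Date,InternalPartition</MergePartitionSpec></Merge>"
                + "</ExtendedStorage></Table>";
        System.out.println(mergeSpec(xml)); // prints: Date,InternalPartition
    }
}
```

The point of the sketch is only that a write-side spec sourced from config, rather than from org.apache.iceberg.Table#spec, would be straightforward to plumb through.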

Comment on lines +1948 to +1950
final IcebergTableWriter tableWriter2 = tableAdapter1.tableWriter(writerOptionsBuilder()
.tableDefinition(tableDefinition2)
.build());
Member

I want to step through the debugger here to verify some things myself - will do later.

Comment on lines +165 to +166
verifyPartitioningColumns(resolver, resolver.spec().isEmpty() ? tableSpec : resolver.spec().get(),
tableDefinition);
Member

This is not rigorous - we should require caller to always be explicit.

I would argue that the above line

this.tableSpec = table.spec();

should be changed; our choice of what partitioning to use should not be dictated by whatever happens to be returned from org.apache.iceberg.Table#spec. I would argue that we might instead want to get the spec from the Resolver unconditionally (if it's not set, org.apache.iceberg.PartitionSpec#unpartitioned can be used).

We also need to consider that we are being passed two TableDefinitions here - does that make sense? We need to audit the higher-level callers to see if there is any potential for drift, and if so, what it means that there are two different definitions: tableWriterOptions.tableDefinition() vs. tableAdapter.definition().
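
The "spec from the Resolver unconditionally, defaulting to unpartitioned" fallback described above is the standard java.util.Optional pattern. A generic sketch with stand-in record types (these are not the real Deephaven Resolver or Iceberg PartitionSpec classes, though the real PartitionSpec does expose a static unpartitioned() factory):

```java
import java.util.Optional;

public class SpecFallback {
    // Stand-in for org.apache.iceberg.PartitionSpec.
    record Spec(String description) {
        static Spec unpartitioned() {
            return new Spec("unpartitioned");
        }
    }

    // Stand-in for the Resolver, which may or may not carry an explicit spec.
    record Resolver(Optional<Spec> spec) {}

    // Take the spec from the Resolver unconditionally, never from Table#spec;
    // fall back to an unpartitioned spec when none was set.
    static Spec effectiveSpec(final Resolver resolver) {
        return resolver.spec().orElseGet(Spec::unpartitioned);
    }

    public static void main(String[] args) {
        System.out.println(effectiveSpec(new Resolver(Optional.empty())).description());
        // prints: unpartitioned
        System.out.println(effectiveSpec(new Resolver(Optional.of(new Spec("by-date")))).description());
        // prints: by-date
    }
}
```

This removes the implicit dependency on whatever Table#spec happens to return at call time.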

Member

The javadoc on TableWriterOptions#tableDefinition explains how they may differ, need to consider that angle here.

.addTables(part4)
.addPartitionPaths("InternalPartition=1/Date=2024-08-02")
.build());
final Table fromIceberg2 = tableAdapter1.table();
Member

I thought one of the main points was that we want to be able to read existing data out in a different partition order? In that case, you can't use tableAdapter1; you'll need to use io.deephaven.iceberg.util.IcebergCatalogAdapter#loadTable(io.deephaven.iceberg.util.LoadTableOptions) with a Resolver whose TableDefinition is in a different order.
