feat: DH-21548: Permit switching partition-ordering for Iceberg (#7652)
robbcamera wants to merge 4 commits into deephaven:main
Conversation
No docs changes detected for 3839876
extensions/iceberg/src/main/java/io/deephaven/iceberg/base/IcebergUtils.java (outdated; resolved)
```diff
-for (int colIdx = 0; colIdx < partitionFieldsFromSpec.size(); colIdx += 1) {
-    final PartitionField partitionField = partitionFieldsFromSpec.get(colIdx);
-    if (!partitioningColumnNamesFromDefinition.get(colIdx).equals(partitionField.name())) {
+for (final PartitionField partitionField : partitionFieldsFromSpec) {
```
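The diff above drops the positional (index-based) comparison between the definition's partitioning columns and the spec's partition fields. A minimal, self-contained sketch of the idea — note the class and helper names here are hypothetical stand-ins for illustration, not the PR's actual code:

```java
import java.util.HashSet;
import java.util.List;

public class PartitionOrderCheck {
    // Hypothetical helper: accept the same set of partitioning columns
    // regardless of the order in which they appear.
    static boolean matchesIgnoringOrder(List<String> fromDefinition, List<String> fromSpec) {
        return new HashSet<>(fromDefinition).equals(new HashSet<>(fromSpec));
    }

    public static void main(String[] args) {
        // Same columns in a different order: accepted under an order-insensitive check.
        System.out.println(matchesIgnoringOrder(
                List.of("Date", "InternalPartition"),
                List.of("InternalPartition", "Date"))); // prints true
        // A genuinely different set of columns is still rejected.
        System.out.println(matchesIgnoringOrder(
                List.of("Date"),
                List.of("InternalPartition", "Date"))); // prints false
    }
}
```

The old index-based loop would reject a definition whose columns matched the spec but in a different order; a set-based comparison is one way to permit the reordering this PR is about.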
Noting for potential follow-up: it looks like tablePartitionSpec always comes from org.apache.iceberg.Table#spec, which is not necessarily the spec we want to use. I think we'd want to use the partition spec as written down in the ExtendedStorage block. You could also argue that the reading and writing partition specs could legitimately differ (for example, you may want to partition more finely when writing, but not present that to the user when reading). We could consider this as something better defined in the PQ config (if a more strongly typed config eventually comes from that layer), or as a separate element defined in the ExtendedStorage block:
<Table namespace="...">
...
<ExtendedStorage type="iceberg">
<Catalog>
...
</Catalog>
<Table>
...
</Table>
<Merge>
<TheoreticalBlockForMergePartitionSpec>...</TheoreticalBlockForMergePartitionSpec>
</Merge>
</ExtendedStorage>
</Table>
```java
final IcebergTableWriter tableWriter2 = tableAdapter1.tableWriter(writerOptionsBuilder()
        .tableDefinition(tableDefinition2)
        .build());
```
I want to step through the debugger here to verify some things myself - will do later.
extensions/iceberg/src/test/java/io/deephaven/iceberg/junit5/SqliteCatalogBase.java (multiple resolved comment threads)
```java
verifyPartitioningColumns(resolver, resolver.spec().isEmpty() ? tableSpec : resolver.spec().get(),
        tableDefinition);
```
This is not rigorous; we should require the caller to always be explicit.
I would argue that the line above, `this.tableSpec = table.spec();`, should be changed; our choice of what partitioning to use should not be dictated by whatever happens to be returned from org.apache.iceberg.Table#spec. I would argue that we might instead get the spec from the Resolver unconditionally (if it's not set, org.apache.iceberg.PartitionSpec#unpartitioned can be used).
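The suggested fallback can be sketched with java.util.Optional. Note that `PartitionSpec` below is a hypothetical stand-in record for illustration, not the real org.apache.iceberg.PartitionSpec, and `effectiveSpec` is an invented helper name:

```java
import java.util.Optional;

public class SpecFallback {
    // Hypothetical stand-in for org.apache.iceberg.PartitionSpec.
    record PartitionSpec(String description) {
        static PartitionSpec unpartitioned() {
            return new PartitionSpec("unpartitioned");
        }
    }

    // The reviewer's suggestion: take the spec from the Resolver unconditionally,
    // falling back to an unpartitioned spec when none is set, rather than
    // trusting whatever Table#spec happens to return.
    static PartitionSpec effectiveSpec(Optional<PartitionSpec> resolverSpec) {
        return resolverSpec.orElseGet(PartitionSpec::unpartitioned);
    }

    public static void main(String[] args) {
        // No spec configured on the Resolver: fall back to unpartitioned.
        System.out.println(effectiveSpec(Optional.empty()).description()); // prints unpartitioned
        // A spec configured on the Resolver wins unconditionally.
        System.out.println(effectiveSpec(Optional.of(new PartitionSpec("identity(Date)"))).description());
    }
}
```

This makes the choice of partitioning a single explicit decision at the Resolver, rather than an implicit ternary at each call site.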
We also need to consider that we are being passed two TableDefinitions here: tableWriterOptions.tableDefinition() vs. tableAdapter.definition(). Does that make sense? We need to audit the higher-level callers to see if there is any potential for drift, and if so, what it means that there are two different definitions.
The javadoc on TableWriterOptions#tableDefinition explains how they may differ; we need to consider that angle here.
extensions/iceberg/src/test/java/io/deephaven/iceberg/junit5/SqliteCatalogBase.java (resolved)
```java
        .addTables(part4)
        .addPartitionPaths("InternalPartition=1/Date=2024-08-02")
        .build());
final Table fromIceberg2 = tableAdapter1.table();
```
I thought one of the whole points was that we want to try to read out existing data in a different partition order? In that case, you can't use tableAdapter1; you'll need to use io.deephaven.iceberg.util.IcebergCatalogAdapter#loadTable(io.deephaven.iceberg.util.LoadTableOptions) with a Resolver whose TableDefinition is in a different order.