Handle out-of-order partitioned tables with pubviaroot = false #4045

Draft
jgao54 wants to merge 5 commits into main from partitioned-table-ordering-pt-2

@jgao54 jgao54 commented Mar 12, 2026

When a customer provides their own publication without publish_via_partition_root = true, only the child's relationMessage is sent, not the parent's, and insert/update/delete events reference the child table's relID instead of the parent's.

Before this change, each child's relationMessage was mapped to the parent's relID. If different children had different column orderings, the relationMessage of a later-arriving child would overwrite the earlier one, potentially causing decoding failures. In other words, all child partitions shared the same entry in relationMessageMapping.

This fix keys relationMessageMapping by the original relID (i.e. the parent's relID if pubviaroot = true, the child's relID if pubviaroot = false). processInsertMessage/processUpdateMessage/processDeleteMessage then use the change event's original relID to look up the matching relation message, which is the child's when pubviaroot = false. This doesn't affect pubviaroot = true, because there the original relID of the change event is the parent's relID, which is how the lookup already behaved before this PR.
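
A minimal sketch of the keying described above, not the actual PeerDB code: the `decoder` type, the `decodeTuple` helper, and the simplified `RelationMessage` struct are all hypothetical names introduced for illustration. It assumes two child partitions whose columns are declared in different orders and shows that, once the map is keyed by each message's own relID, neither entry overwrites the other.

```go
package main

import "fmt"

// RelationMessage is a simplified, hypothetical stand-in for a logical
// replication relation message: a relation ID plus its ordered column names.
type RelationMessage struct {
	RelationID uint32
	Columns    []string
}

// decoder is a hypothetical type holding one relation message per original relID.
type decoder struct {
	relationMessageMapping map[uint32]*RelationMessage
}

// processRelationMessage stores the message under its own relID, so sibling
// partitions never overwrite each other's column ordering.
func (d *decoder) processRelationMessage(rel *RelationMessage) {
	d.relationMessageMapping[rel.RelationID] = rel
}

// decodeTuple pairs tuple values with the column names recorded for relID,
// the relation ID carried by the change event itself.
func (d *decoder) decodeTuple(relID uint32, values []string) (map[string]string, error) {
	rel, ok := d.relationMessageMapping[relID]
	if !ok {
		return nil, fmt.Errorf("no relation message for relID %d", relID)
	}
	if len(values) != len(rel.Columns) {
		return nil, fmt.Errorf("relID %d: got %d values for %d columns",
			relID, len(values), len(rel.Columns))
	}
	row := make(map[string]string, len(values))
	for i, col := range rel.Columns {
		row[col] = values[i]
	}
	return row, nil
}

func main() {
	d := &decoder{relationMessageMapping: make(map[uint32]*RelationMessage)}
	// Two child partitions of the same parent with different column orderings.
	d.processRelationMessage(&RelationMessage{RelationID: 101, Columns: []string{"id", "name"}})
	d.processRelationMessage(&RelationMessage{RelationID: 102, Columns: []string{"name", "id"}})

	// Each insert is decoded against the ordering of the partition it came from.
	for _, ev := range []struct {
		relID  uint32
		values []string
	}{{101, []string{"1", "alice"}}, {102, []string{"bob", "2"}}} {
		row, err := d.decodeTuple(ev.relID, ev.values)
		if err != nil {
			panic(err)
		}
		fmt.Printf("relID=%d id=%s name=%s\n", ev.relID, row["id"], row["name"])
	}
}
```

Had both children been stored under a single parent relID, the second relation message would have replaced the first, and the tuple from relID 101 would have been decoded with `name` and `id` swapped.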

Todo: test the inherited table case, which should work as well.

@jgao54 jgao54 force-pushed the clear-error-on-partitioned-table branch from 69ceb98 to 4dbe3ee Compare March 17, 2026 22:41
Base automatically changed from clear-error-on-partitioned-table to main March 18, 2026 01:02
jgao54 added a commit that referenced this pull request Mar 18, 2026
Skip child partition relation messages in CDC stream.

With `publish_via_partition_root = true`, PostgreSQL emits _both_ a
parent and child RelationMessage before each partition's first change
event.
Also note that with `publish_via_partition_root = true`, the
insert/update/delete message always uses the parent's relation id.
What I didn't realize was that _the tuple data also uses the
parent's column ordering._

This means the parent's RelationMessage carries the correct column
ordering, while the child's RelationMessage may have a different column
ordering.

Previously, processRelationMessage would remap the parent's relation id
to the child's relation message and store it in `relationMessageMapping`
(`p.relationMessageMapping[currRel.RelationID] = currRel`), overwriting
the parent's column ordering. This caused change events to be decoded
against the child's column order, and if the child's column ordering did
not match the parent's, it would lead to decoding errors.

The fix for the `publish_via_partition_root = true` case turned out to be
quite simple: skip the child's relation message rather than overwriting
the `relationMessageMapping` entry with it.
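
A rough sketch of the skip described above, under stated assumptions: the `processor` type and its `childToParent` lookup are hypothetical, and how the real code recognizes a child partition is not shown in this commit message. The only point illustrated is that a child's relation message is dropped instead of replacing the parent's entry.

```go
package main

import "fmt"

// RelationMessage is a simplified, hypothetical relation message.
type RelationMessage struct {
	RelationID uint32
	Columns    []string
}

// processor is a hypothetical holder for the state used below.
type processor struct {
	// childToParent is an assumed lookup from a child partition's relation ID
	// to its parent's relation ID.
	childToParent          map[uint32]uint32
	relationMessageMapping map[uint32]*RelationMessage
}

// processRelationMessage stores rel unless it belongs to a known child
// partition, and reports whether the message was stored.
func (p *processor) processRelationMessage(rel *RelationMessage) bool {
	if _, isChild := p.childToParent[rel.RelationID]; isChild {
		// The tuple data follows the parent's column ordering, so the
		// child's message must not replace the parent's entry.
		return false
	}
	p.relationMessageMapping[rel.RelationID] = rel
	return true
}

func main() {
	p := &processor{
		childToParent:          map[uint32]uint32{101: 100},
		relationMessageMapping: make(map[uint32]*RelationMessage),
	}
	// The parent's message arrives first, then a child's with a different ordering.
	p.processRelationMessage(&RelationMessage{RelationID: 100, Columns: []string{"id", "name"}})
	stored := p.processRelationMessage(&RelationMessage{RelationID: 101, Columns: []string{"name", "id"}})

	fmt.Println("child stored:", stored)
	fmt.Println("parent columns:", p.relationMessageMapping[100].Columns)
}
```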

Note that inherited tables work a bit differently because only the child
table's RelationMessage is sent, not the parent's, so we need to
rely on the child's RelationMessage. This does mean that inherited tables
where the column order does not match the parent's can also cause decoding
errors. However, this is an existing issue and is out of scope
for this PR.

Fixes: #3544

Testing: e2e test without the change
[fails](https://github.com/PeerDB-io/peerdb/actions/runs/22974964490/job/66701154925?pr=4035#step:28:1683)
but should succeed with it.

Follow up with #4045 to fix
out-of-order columns in partitioned tables when pubviaroot = false.