[fix][io] Fix Solr Sink KeyValue schema unwrapping for Debezium CDC#26
Open
Praveenkumar76 wants to merge 5 commits into
Open
[fix][io] Fix Solr Sink KeyValue schema unwrapping for Debezium CDC#26Praveenkumar76 wants to merge 5 commits into
Praveenkumar76 wants to merge 5 commits into
Conversation
…eric schema API and extracting JSON payloads
…move hardcoded id by dynamically extracting primary key
…d improve CDC detection by validating Debezium envelope (op + before/after)
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixes apache#23763
Motivation
Solr sink indexing remains empty when consuming PostgreSQL CDC events produced by the Debezium connector.
The root cause is a schema and structural incompatibility between Debezium-generated messages and the existing Solr sink implementation:
KeyValueschema wrapper. The existingGenericRecordprocessing could not correctly extract nested payload fields from this structure.before,after,op, andsource. The sink previously lacked logic to unwrap and process the actual row state.Integer,Long), causing silent field drops during indexing.Additionally, while CDC transformation can be handled externally using Pulsar Functions or intermediate processors, native support within the Solr sink simplifies standard CDC-to-search indexing pipelines.
Modifications
Configuration
unwrapDebeziumRecordconfiguration inSolrSinkConfig(default:false) to allow users to explicitly enable Debezium CDC payload handling.SolrAbstractSink
Updated the parent sink lifecycle handling:
record.ack()record.fail()to support retries and DLQ handlingSolrGenericRecordSink
Implemented native Debezium CDC support:
extractValueRecord()usinggetNativeObject()for safe extraction ofKeyValuepayloadsisDebeziumEnvelope()to identify valid Debezium CDC envelopes usingopand state fields (before/after)mapDebeziumPayload()andextractAfterRecord()to recursively unwrap CDC payloadsafter == null)beforedeleteByIdin SolrnormalizeValue()to safely convert numeric values into Strings for Solr schema compatibilityRefactoring
Highlight of changes:
ack()/fail()lifecycle handlingVerifying this change
Verified end-to-end using:
Validation results:
Example CDC operations validated:
"op":"c"→ INSERT"op":"u"→ UPDATE"op":"d"→ DELETETests added
testDebeziumUnwrapEnvelopeValidates handling of standard Debezium envelope payloads
testDebeziumUnwrapFlatValueEnsures compatibility with non-envelope payloads
testDebeziumUnwrapDeleteVerifies DELETE event handling and Solr deletion flow
Updated mocks to support:
KeyValueschema handlingopfield detectionConfirmed no regression for non-CDC Pulsar topics
Does this pull request potentially affect one of the following parts: