Skip to content

Integer Overflow / Data Truncation: SMALLINT (MariaDB) → Int16 (ClickHouse) causes silent data loss #4102

@nareshntr

Description

@nareshntr

Summary
When a MariaDB column is initially defined as SMALLINT (mapped to Int16 in ClickHouse by PeerDB), and the column is later altered to INT in MariaDB, PeerDB does not propagate the schema change to ClickHouse. The destination column remains Int16. When a large INT value like 2147483647 is subsequently inserted in MariaDB, ClickHouse silently truncates it to 0 due to the type mismatch — with no error or warning raised by PeerDB.

Environment

Source DB: MariaDB (version: fill in)
Destination DB: ClickHouse (version: fill in)
PeerDB Version: fill in
Mirror Type: CDC (Change Data Capture)
Replication Mode: Log-based (Binlog)

Steps to Reproduce
Step 1 — Create source table in MariaDB with SMALLINT column:
CREATE TABLE test_overflow (
id INT PRIMARY KEY AUTO_INCREMENT,
value SMALLINT
);
Step 2 — Set up PeerDB CDC mirror from MariaDB to ClickHouse.
At this point, ClickHouse destination column value is correctly mapped to Int16.
Step 3 — Verify initial ClickHouse schema:
DESCRIBE TABLE test_overflow;
-- value column should show Int16
Step 4 — ALTER the column in MariaDB from SMALLINT to INT:
ALTER TABLE test_overflow MODIFY COLUMN value INT;
Step 5 — Insert a large INT value in MariaDB:
INSERT INTO test_overflow (value) VALUES (2147483647);
Step 6 — Query the destination table in ClickHouse:
SELECT * FROM test_overflow FINAL WHERE _peerdb_is_deleted = 0;

Expected Behavior
After the ALTER TABLE in Step 4, PeerDB should detect the MODIFY COLUMN DDL event from the MariaDB binlog and do one of the following:
Option A — Automatically propagate the schema change to ClickHouse by altering the destination column from Int16 to Int32, so the value 2147483647 is replicated correctly.
Option B — Pause the CDC mirror and raise a clear schema drift alert such as: "Source column value was changed from SMALLINT to INT. Destination column in ClickHouse is still Int16. Pipeline paused to prevent data corruption. Please update the destination schema manually."
Option C — At minimum, reject the row and write it to a dead-letter queue with a clear error: "Value 2147483647 overflows destination type Int16 (max: 32767). Row skipped."

Actual Behavior
PeerDB does not detect or propagate the MODIFY COLUMN DDL event. The ClickHouse column remains Int16 after the ALTER. When 2147483647 is inserted in MariaDB and replicated to ClickHouse, the value silently overflows Int16 and is stored as 0. No error, warning, or alert is raised anywhere in the PeerDB pipeline, UI, or logs.

Data Type Reference

MariaDB SMALLINT (signed): range -32,768 to 32,767 — mapped to ClickHouse Int16
MariaDB INT (signed): range -2,147,483,648 to 2,147,483,647 — should map to ClickHouse Int32
Inserted value 2147483647 exceeds Int16 max of 32,767 by over 65,000x
ClickHouse overflow behavior on Int16: wraps/truncates silently to 0

Root Cause
PeerDB partially supports DDL propagation from MariaDB binlog events:

ADD COLUMN: Supported
DROP COLUMN: Supported
MODIFY COLUMN / CHANGE TYPE: Not Supported

Because MODIFY COLUMN events are not handled, the destination schema drifts silently from the source after any column type change. ClickHouse's strict typing then silently coerces any out-of-range value to 0, making the corruption invisible to the end user.

Impact

Severity: High
Data Loss: Yes — original values are permanently corrupted in ClickHouse with no recovery path
Scope: Any CDC pipeline where a MariaDB column type is widened (e.g., SMALLINT to INT, INT to BIGINT, TINYINT to SMALLINT)
Detectability: Very low — no errors in PeerDB UI, no pipeline pause, no log warnings, data appears written successfully but with wrong values

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions