Feat: Add StarRocks engine support #5658
### What

- **Add StarRocks engine support to SQLMesh** via StarRocks’ MySQL-compatible protocol.
- Ship **engine adapter + docs + real integration tests** to ensure generated SQL works on StarRocks.

### Why

- **User demand / adoption**: StarRocks is a common OLAP choice; SQLMesh users want to run the same model lifecycle (build, incremental maintenance, views/MVs) on StarRocks without bespoke SQL.
- **Engine-specific semantics**: StarRocks differs from vanilla MySQL in DDL/DML constraints (e.g., key types, delete behavior, rename caveats). An adapter is needed to produce correct and predictable SQL.
- **Confidence & maintainability**: Documenting config patterns + codifying behavior with integration tests prevents regressions and makes support “real” (not just “it parses”).

### Scope (what’s supported)

- **Connectivity**: Connect through MySQL protocol (e.g., `pymysql`).
- **Table creation / DDL**:
  - Key table types via `physical_properties`: **DUPLICATE KEY (default)**, **PRIMARY KEY (recommended for incremental)**, **UNIQUE KEY**
  - **Partitioning**: simple `partitioned_by` and advanced `partition_by` (complex expression partitioning) + optional initial `partitions`
  - **Distribution**: `distributed_by` structured form or string fallback (HASH / RANDOM; buckets required)
  - **Ordering**: `order_by` / `clustered_by`
  - **Generic PROPERTIES passthrough** (string key/value)
- **Views**:
  - Regular views
  - **Materialized views** via `kind VIEW(materialized true)` with StarRocks-specific notes/constraints
- **DML / maintenance**:
  - Insert/select/update basics
  - Delete behavior handled with StarRocks compatibility constraints (PRIMARY KEY tables recommended for robust deletes)

### Changes

- **Engine adapter**: `sqlmesh/core/engine_adapter/starrocks.py`
- **Docs**: `docs/integrations/engines/starrocks.md`
- **Integration tests**: `tests/core/engine_adapter/integration/test_integration_starrocks.py`, and `tests/core/engine_adapter/test_starrocks.py`

### Verification

- **Integration tests require a running StarRocks** instance.
- Ran:
  - set `STARROCKS_HOST/PORT/USER/PASSWORD`
  - `pytest -m "starrocks and docker" tests/core/engine_adapter/integration/test_integration_starrocks.py`

### Known limitations / caveats

- **No sync MV support (currently)**
- **No tuple IN**: `(c1, c2) IN ((v1, v2), ...)`
- **No `SELECT ... FOR UPDATE`**
- **RENAME caveat**: rename target can’t be qualified with a database name

### Notes on compatibility

- **Changes are StarRocks-scoped** (adapter/docs/tests) and should not impact other engines.

Signed-off-by: jaogoy <jaogoy@gmail.com>
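To make the "No tuple IN" caveat concrete: a predicate such as `(c1, c2) IN ((v1, v2), ...)` can be expressed as an OR of per-row AND comparisons. The sketch below is only an illustration of that rewrite using the standard library; the helper names `sql_literal` and `expand_tuple_in` are hypothetical and are not taken from the adapter, which may handle the limitation differently.

```python
from typing import Sequence


def sql_literal(value: object) -> str:
    """Render a Python value as a basic SQL literal (bools, numbers, strings only)."""
    if isinstance(value, bool):  # bool must be checked before int
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, float)):
        return str(value)
    # Treat everything else as a string and escape single quotes by doubling them.
    return "'" + str(value).replace("'", "''") + "'"


def expand_tuple_in(columns: Sequence[str], rows: Sequence[Sequence[object]]) -> str:
    """Rewrite (c1, c2) IN ((v1, v2), ...) as ((c1 = v1 AND c2 = v2) OR ...)."""
    if not rows:
        # An empty value list can never match; this choice is an assumption of the sketch.
        return "FALSE"
    clauses = []
    for row in rows:
        if len(row) != len(columns):
            raise ValueError("each row must have one value per column")
        pairs = [f"{col} = {sql_literal(val)}" for col, val in zip(columns, row)]
        clauses.append("(" + " AND ".join(pairs) + ")")
    # Outer parentheses keep the OR chain safe to embed in a larger WHERE clause.
    return "(" + " OR ".join(clauses) + ")"
```

For example, `expand_tuple_in(["c1", "c2"], [(1, "a"), (2, "b")])` produces a predicate that could replace the tuple `IN` in a `WHERE` or `DELETE` condition. Column names are emitted as given, so quoting them is left to the caller.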
@erindru Hi Erin, would you be willing to review this PR? It is similar to #5033, but adds StarRocks support to SQLMesh. I'd be very glad to see your comments. I'm working on fixing the CI problem and some test cases.
And optimize some test cases. Signed-off-by: jaogoy <jaogoy@gmail.com>
@erindru Hi Erin, tobymao/sqlglot#6827 in SQLGlot is merged.
@erindru It would be awesome if we could have your final look at this!
@jaogoy Can you take a look at the conflicts that might need to be resolved? Also it looks like |
@jaogoy I have a question about async materialized views though and this part of the docs in the PR
I actually don't want this to happen, and currently SQLMesh does indeed drop and recreate the MV on every EDIT: EDIT 2:
I found another issue with async materialized views (haven't checked other model types) related to audits. It could potentially be achieved with some pre/post statement macros (once this PR points to the newer sqlglot version, because right now it fails), though it would be a bit inconvenient.
OK, I'll take some time later to pass the test cases. For MV to emit |
Are you sure? I'm pretty sure the
There's one more thing that's currently slightly inconvenient and I believe should be handled directly in the engine. but IMO those should be resolved in the engine. Here's the code of the macro:

    import typing as t

    from sqlglot import exp
    from sqlmesh import macro


    @macro()
    def resolve_physical(evaluator, *models: exp.Expression) -> exp.Expression:
        """Emit a single-quoted, comma-separated list of physical table names."""
        names: t.List[str] = []
        for model in models:
            snapshot = evaluator.get_snapshot(model)
            if snapshot is not None:
                # table_name() is the fully-qualified, quoted physical name, e.g.
                # "catalog"."sqlmesh__starrocks"."starrocks__test_1_model__1455206902"
                table = exp.to_table(snapshot.table_name())
                # StarRocks wants db.table only: drop the catalog and the quotes.
                names.append(f"{table.db}.{table.name}" if table.db else table.name)
            else:
                # Not a SQLMesh-managed model (e.g. a raw source) -> keep as written.
                # identify is left at its default (False), so sql() adds no extra
                # quotes to the property string.
                names.append(model.sql(dialect=evaluator.dialect))
        # Returning a string literal makes the property render as '...': a quoted
        # value, exactly like a hand-written excluded_trigger_tables string.
        return exp.Literal.string(",".join(names))
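
The name reduction the macro performs (drop the catalog and the quotes, keep `db.table`) can be checked without a SQLMesh project. The following is a stdlib-only sketch of that step, not the macro itself: `split_qualified_name`, `to_db_table`, and `join_physical_names` are hypothetical helper names, and the parser assumes identifiers are double-quoted and contain no escaped quotes.

```python
from typing import Iterable, List


def split_qualified_name(name: str) -> List[str]:
    """Split a dotted, optionally double-quoted name into unquoted parts.

    Assumption: quotes are plain double quotes with no escaped quotes inside.
    """
    parts: List[str] = []
    current: List[str] = []
    in_quotes = False
    for ch in name:
        if ch == '"':
            in_quotes = not in_quotes  # toggle quoting, drop the quote character
        elif ch == "." and not in_quotes:
            parts.append("".join(current))  # an unquoted dot ends the current part
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


def to_db_table(name: str) -> str:
    """Keep only the last two parts (db.table); a bare table name stays as-is."""
    return ".".join(split_qualified_name(name)[-2:])


def join_physical_names(names: Iterable[str]) -> str:
    """Build the comma-separated value used for a property string."""
    return ",".join(to_db_table(n) for n in names)
```

Passing the example name from the macro's comment through `to_db_table` yields the `db.table` form, and `join_physical_names` mirrors the final `",".join(names)` step.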
One last question to you @jaogoy On the other hand, I'm eager to get it merged soon, ideally within a week, so if you lack capacity for working on it, I might contribute to your PR.
### Acknowledgement
This implementation was largely inspired by #5033 — thanks to @xinge-ji for the solid groundwork.