Skip to content

Conversation

@Nachiket-Roy
Copy link
Contributor

@Nachiket-Roy Nachiket-Roy commented Jan 4, 2026

Which issue does this PR close?

Rationale for this change

DataFusion recently added TableProvider hooks for row-level DML operations such as DELETE and UPDATE, but TRUNCATE TABLE was still unsupported.

What changes are included in this PR?

This PR adds planning and integration support for TRUNCATE TABLE in DataFusion, completing another part of the DML surface alongside existing DELETE and UPDATE support.

Specifically, it includes:

  • SQL parsing support for TRUNCATE TABLE
  • Logical plan support via a new WriteOp::Truncate DML operation
  • Physical planner routing for TRUNCATE statements
  • A new TableProvider::truncate() hook for storage-native implementations
  • Protobuf / DML node support for serializing and deserializing TRUNCATE operations
  • SQL logic tests validating logical and physical planning behavior

The implementation follows the same structure and conventions as the existing DELETE and UPDATE DML support.
Execution semantics are delegated to individual TableProvider implementations via the new hook.

Are these changes tested?

Yes. The PR includes:
SQL logic tests that verify:

  • Parsing of TRUNCATE TABLE
  • Correct logical plan generation
  • Correct physical planner routing
  • Clear and consistent errors for providers that do not yet support TRUNCATE

These tests mirror the existing testing strategy used for unsupported DELETE and UPDATE operations.

Are there any user-facing changes?

Yes. Users can now execute TRUNCATE TABLE statements in DataFusion for tables whose TableProvider supports the new truncate() hook. Tables that do not support TRUNCATE will return a clear NotImplemented error.

@github-actions github-actions bot added sql SQL Planner logical-expr Logical plan and expressions core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) catalog Related to the catalog crate proto Related to proto crate labels Jan 4, 2026
@Nachiket-Roy Nachiket-Roy marked this pull request as draft January 4, 2026 06:52
@Nachiket-Roy Nachiket-Roy marked this pull request as ready for review January 4, 2026 08:01
@Nachiket-Roy Nachiket-Roy changed the title added truncate table support Feat : added truncate table support Jan 4, 2026
@Nachiket-Roy
Copy link
Contributor Author

@ethan-tyler, please review this PR. Thank you!

Copy link
Contributor

@ethan-tyler ethan-tyler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding TRUNCATE @Nachiket-Roy! This threads through WriteOp, TableProvider, physical planner, and protobuf correctly. Looking nicely done.

Couple things I noticed:

Schema: DmlStatement is constructed with an empty schema instead of using DmlStatement::new(), which sets {count: UInt64}. Breaks the DML contract and causes a proto roundtrip mismatch.

Silent option ignoring: Options like CASCADE, RESTART IDENTITY, PARTITION are dropped silently via ... TRUNCATE TABLE t CASCADE would succeed without actually cascading. Worth rejecting explicitly.

Once those are in, adding TRUNCATE to roundtrip_logical_plan_dml and some negative tests for unsupported options would round it out.

Comment on lines 1376 to 1386
Ok(LogicalPlan::Dml(DmlStatement {
table_name: table.clone(),
target: source,
op: WriteOp::Truncate,
input: Arc::new(LogicalPlan::EmptyRelation(EmptyRelation {
produce_one_row: false,
schema: DFSchemaRef::new(DFSchema::empty()),
})),
output_schema: DFSchemaRef::new(DFSchema::empty()),
}))
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should use DmlStatement::new() rather than constructing the struct directly. The constructor sets output_schema: make_count_schema() which gives you the {count: UInt64} that DML ops return.
Also noticed this causes a proto roundtrip mismatch: encode uses the struct directly but decode uses the constructor (at mod.rs:978), so you'd get different plans before/after serialization.

Something like:

Suggested change
Ok(LogicalPlan::Dml(DmlStatement {
table_name: table.clone(),
target: source,
op: WriteOp::Truncate,
input: Arc::new(LogicalPlan::EmptyRelation(EmptyRelation {
produce_one_row: false,
schema: DFSchemaRef::new(DFSchema::empty()),
})),
output_schema: DFSchemaRef::new(DFSchema::empty()),
}))
}
Ok(LogicalPlan::Dml(DmlStatement::new(
table.clone(),
source,
WriteOp::Truncate,
Arc::new(LogicalPlan::EmptyRelation(EmptyRelation {
produce_one_row: false,
schema: DFSchemaRef::new(DFSchema::empty()),
})),
)))

Comment on lines 1365 to 1370
Statement::Truncate { table_names, .. } => {
if table_names.len() != 1 {
return not_impl_err!(
"TRUNCATE with multiple tables is not supported"
);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The .. here silently drops partitions, identity, cascade, on_cluster, and target.only. If someone writes TRUNCATE TABLE t CASCADE expecting cascade behavior, they'd get a normal truncate with no indication the option was ignored.
Worth rejecting these explicitly? Something like:

Suggested change
Statement::Truncate { table_names, .. } => {
if table_names.len() != 1 {
return not_impl_err!(
"TRUNCATE with multiple tables is not supported"
);
}
Statement::Truncate { table_names, partitions, identity, cascade, on_cluster, .. } => {
if table_names.len() != 1 {
return not_impl_err!("TRUNCATE with multiple tables is not supported");
}
let target = &table_names[0];
if target.only {
return not_impl_err!("TRUNCATE with ONLY is not supported");
}
if partitions.is_some() {
return not_impl_err!("TRUNCATE with PARTITION is not supported");
}
if identity.is_some() {
return not_impl_err!("TRUNCATE with RESTART/CONTINUE IDENTITY is not supported");
}
if cascade.is_some() {
return not_impl_err!("TRUNCATE with CASCADE/RESTRICT is not supported");
}
if on_cluster.is_some() {
return not_impl_err!("TRUNCATE with ON CLUSTER is not supported");
}
// ... rest
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you add the option rejection, some negative tests would round it out:

statement error TRUNCATE with CASCADE/RESTRICT is not supported
TRUNCATE TABLE t1 CASCADE;

statement error TRUNCATE with multiple tables is not supported
TRUNCATE TABLE t1, t2;

/// Returns an [`ExecutionPlan`] producing a single row with `count` (UInt64),
/// representing the number of rows removed.
async fn truncate(&self, _state: &dyn Session) -> Result<Arc<dyn ExecutionPlan>> {
not_impl_err!("TRUNCATE not supported for {}", self.table_type())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: DELETE/UPDATE messages say "... for {} table" but this one drops "table"

@Nachiket-Roy
Copy link
Contributor Author

@ethan-tyler Thanks for the thorough review. I've addressed all your comments, Please take another look.

WriteOp::Truncate,
Arc::new(LogicalPlan::EmptyRelation(EmptyRelation {
produce_one_row: false,
schema: DFSchemaRef::new(DFSchema::empty()),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/// The relation that determines the tuples to add/remove/modify the schema must match with table_schema
says the schema must match with table_schema. Is this important here ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unlike DELETE or UPDATE, TRUNCATE is a table-level operation with no filters, assignments. The input relation is only a placeholder to satisfy the DML plan shape and is never inspected or executed for row data. Because no rows or columns are read, the usual “input schema must match table schema” invariant does not apply here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I verified this and you're right that TRUNCATE doesn't use the input.

Physical planner shows:

  • DELETE/UPDATE: extract filters/assignments from input
  • TRUNCATE: ignores input entirely, just calls provider.truncate()

So functionally the empty schema is fine. The doc comment is aspirational. Could either keep as-is and update the DmlStatement docs to note TRUNCATE is special, or use the table schema for consistency (extra code, no functional benefit). Former seems fine.

Copy link
Contributor

@ethan-tyler ethan-tyler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the follow-up @Nachiket-Roy DmlStatement::new() fix and option rejection look good.

Few things on closer look:

  • Test contract: CaptureTruncateProvider returns EmptyExec (zero rows), it checks the hook was called but doesn't validate the "returns one row with count" contract.
  • Proto roundtrip: Adding TRUNCATE to roundtrip_logical_plan_dml would catch regressions on the proto mapping.
  • Negative tests: CASCADE and multi-table are covered, PARTITION, ONLY, and IDENTITY are also rejected but not tested.

Comment on lines +219 to +226
async fn truncate(&self, _state: &dyn Session) -> Result<Arc<dyn ExecutionPlan>> {
*self.truncate_called.lock().unwrap() = true;

Ok(Arc::new(EmptyExec::new(Arc::new(Schema::new(vec![
Field::new("count", DataType::UInt64, false),
])))))
}
}
Copy link
Contributor

@ethan-tyler ethan-tyler Jan 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TableProvider::truncate docs say it returns an ExecutionPlan producing {count: UInt64}. This test returns EmptyExec which produces zero rows, it validates the hook was called, but not the contract.

Could tighten this to return an actual row and assert on it:

Suggested change
async fn truncate(&self, _state: &dyn Session) -> Result<Arc<dyn ExecutionPlan>> {
*self.truncate_called.lock().unwrap() = true;
Ok(Arc::new(EmptyExec::new(Arc::new(Schema::new(vec![
Field::new("count", DataType::UInt64, false),
])))))
}
}
// Add these imports:
use arrow::array::UInt64Array;
use arrow::record_batch::RecordBatch;
use datafusion_physical_plan::test::TestMemoryExec;
// Then update the truncate impl:
async fn truncate(&self, _state: &dyn Session) -> Result<Arc<dyn ExecutionPlan>> {
*self.truncate_called.lock().unwrap() = true;
let schema = Arc::new(Schema::new(vec![
Field::new("count", DataType::UInt64, false),
]));
let batch = RecordBatch::try_new(
Arc::clone(&schema),
vec![Arc::new(UInt64Array::from(vec![0u64]))],
)?;
Ok(Arc::new(TestMemoryExec::try_new(&[vec![batch]], schema, None)?))
}

WriteOp::Delete => protobuf::dml_node::Type::Delete,
WriteOp::Update => protobuf::dml_node::Type::Update,
WriteOp::Ctas => protobuf::dml_node::Type::Ctas,
WriteOp::Truncate => protobuf::dml_node::Type::Truncate,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You added the proto enum and conversions and worth adding a roundtrip test to lock it in.

In datafusion/proto/tests/cases/roundtrip_logical_plan.rs, the roundtrip_logical_plan_dml test covers INSERT, DELETE, UPDATE, CTAS.

Adding TRUNCATE there would catch any future regressions:

// In the queries array:
"TRUNCATE TABLE test_table",


# Test TRUNCATE with multiple tables
statement error TRUNCATE with multiple tables is not supported
TRUNCATE TABLE t1, t2; No newline at end of file
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have negative tests for CASCADE and multi-table. The planner also rejects PARTITION, ONLY, IDENTITY, and ON CLUSTER.

Might be worth covering those too:

statement error TRUNCATE with PARTITION is not supported
TRUNCATE TABLE t1 PARTITION (p1);

statement error TRUNCATE with ONLY is not supported
TRUNCATE ONLY t1;

statement error TRUNCATE with RESTART/CONTINUE IDENTITY is not supported
TRUNCATE TABLE t1 RESTART IDENTITY;

@Nachiket-Roy
Copy link
Contributor Author

@ethan-tyler Thanks again for the review! I’ve tightened the CaptureTruncateProvider test to return a single {count: UInt64} row, added negative tests covering PARTITION, ONLY, and IDENTITY (in addition to CASCADE and multi-table), and added TRUNCATE to roundtrip_logical_plan_dml to lock in the proto roundtrip behavior.

@Nachiket-Roy
Copy link
Contributor Author

Nachiket-Roy commented Jan 6, 2026

@ethan-tyler I tried tightening the test using TestMemoryExec to materialize a {count: UInt64} row as suggested.
However this causes the test to fail:

TestMemoryExec::with_new_children not implemented

This happens because physical optimizer rules run and require ExecutionPlan::with_new_children, which TestMemoryExec does not implement. Could you please look into this?

@ethan-tyler
Copy link
Contributor

@ethan-tyler I tried tightening the test using TestMemoryExec to materialize a {count: UInt64} row as suggested. However this causes the test to fail:

TestMemoryExec::with_new_children not implemented

This happens because physical optimizer rules run and require ExecutionPlan::with_new_children, which TestMemoryExec does not implement. Could you please look into this?

@Nachiket-Roy My bad - should have validated that suggestion first. TestMemoryExec doesn't implement with_new_children, so it panics during filter pushdown.

Better to stick with EmptyExec:

async fn truncate(&self, _state: &dyn Session) -> Result<Arc<dyn ExecutionPlan>> {
    *self.truncate_called.lock().unwrap() = true;

    Ok(Arc::new(EmptyExec::new(Arc::new(Schema::new(vec![
        Field::new("count", DataType::UInt64, false),
    ])))))
}

This matches what CaptureDeleteProvider and CaptureUpdateProvider already do. These tests just verify the hook gets called and validating provider output is the implementor's responsibility.

The "proper" way to return an actual count is DmlResultExec, but it's private to the memory table module. Making it public would be a separate discussion. Sorry for the detour, the original implementation was fine.

Copy link
Contributor

@ethan-tyler ethan-tyler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - approving, thanks for the contribution!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

catalog Related to the catalog crate core Core DataFusion crate logical-expr Logical plan and expressions proto Related to proto crate sql SQL Planner sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants