Skip to content

Conversation

@paleolimbot
Copy link
Member

@paleolimbot paleolimbot commented Dec 28, 2025

This PR adds constructors for DataFusion/SedonaDB (logical) Expr and a framework for constructing the using an R expression. I also added a few function translations to jog my memory on the corner cases with respect to constructing them. This is to support #474 (i.e., a stepping stone to help implement dplyr verbs).

library(sedonadb)

schema <- nanoarrow::na_struct(list(x = nanoarrow::na_int32()))
ctx <- sedonadb:::sd_expr_ctx(schema)
sedonadb:::sd_eval_expr(quote(sum(x + 5)), expr_ctx = ctx)
#> <SedonaDBExpr>
#> sum(x + Float64(5)) RESPECT NULLS

This implementation comes from the work I did in arrow ( https://github.com/apache/arrow/blob/main/r/R/dplyr-funcs.R ) and substrait-r ( https://github.com/voltrondata/substrait-r/blob/e3045d21b1765c15d2b774409f4ca2b1ebb01796/R/compiler.R#L448-L466 ) and the suggestions in #474 with respect to how this is organized in duckdb-r/duckplyr.

In arrow and substrait-r we used a slightly more straightforward approach than this PR, which was basically to load an environment with "override" functions and field references. When those packages evaluate an expression, we arrange the environments such that column reference expressions and our "translated" functions (that accept expressions and return expressions) are resolved first. This mostly works but:

  • It's tricky to determine the output type of an expression. For example, /, which in R is never floor division, we need to check if both arguments are integers (and if so, cast one of them to a float).
  • It's tricky to scope translations. For example, we mostly want things scoped to a SessionContext and not globally (although currently we only have a global session context in R).
  • Aggregations and window functions need some extra context that isn't available when translations are global and stateless.

For this reason, I introduced a sd_expr_ctx() (expression context) here that is faithfully passed down when evaluating translated expressions. This gives each translation access to things like the input schema (so that the output type of the expression can be computed), session options when we have them, and information like grouping expressions for aggregate functions.

This means that evaluation is a bit more "manual", but I think the extra control is worth it (although we can revisit as we implement more translations and I remember why various things are hard). We could accomplish some of this with a global variable (like the global substrait_compiler() in substrait-r) and less complicated evaluation as well.

There is a lot to do after this PR, like add support for more functions, more types of literals, and wire up actual dplyr verbs like filter().

@paleolimbot paleolimbot requested a review from Copilot January 2, 2026 06:43
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR implements expression translation functionality for the SedonaDB R package, enabling R expressions to be converted into DataFusion logical expressions. This is a foundational feature for supporting R-style query syntax in SedonaDB.

Key changes:

  • Added expression infrastructure including SedonaDBExpr and SedonaDBExprFactory types
  • Implemented R expression evaluation system with translation support for common operations
  • Added literal conversion system with support for basic R types and geometry types

Reviewed changes

Copilot reviewed 19 out of 19 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
r/sedonadb/src/rust/src/expression.rs Implements core Rust expression types and factory methods for creating DataFusion expressions
r/sedonadb/src/rust/src/ffi.rs Adds FFI helper functions for importing Arrow arrays and fields from R
r/sedonadb/R/expression.R Implements R-side expression factory, translation system, and operator mappings
r/sedonadb/R/literal.R Provides S3 generic system for converting R objects to SedonaDB literal expressions
r/sedonadb/tests/testthat/test-expression.R Test coverage for expression translation functionality
r/sedonadb/tests/testthat/test-literal.R Test coverage for literal conversion
r/sedonadb/tools/savvy-update.sh Build utility script for updating savvy-generated code
r/sedonadb/src/rust/api.h Generated C API declarations for new expression types
r/sedonadb/src/init.c Generated C initialization code for new expression functions
r/sedonadb/R/000-wrappers.R Generated R wrapper functions for Rust expression types

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@paleolimbot paleolimbot requested a review from Copilot January 2, 2026 07:26
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 20 out of 20 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +41 to +45
# Add license header, put includes on their own lines, and fix a typo in init.c
echo "${LICENSE_C}" > "${init_c}"
sed 's/#include/\n#include/g' "${init_c}.tmp" | \
sed '1s/^\n//' | \
sed 's/initialzation/initialization/g' >> "${init_c}"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yutannihilation I will open an issue and/or try to fix this in savvy when I have a moment, but while I'm thinking about it:

  • Includes have to go on their own lines, or else clang-format will reorder them in a way such that they won't compile
  • There's a typo in one of the comments that causes our pre-commit to fail ("initialzation")

...the other stuff (license files, clang-format) is specific to our setup and I'm not sure that's savvy's issue.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding license files, I found bindgen has option to insert arbitrary strings at the top of the generated file. Will it be useful if I implement this in savvy-cli?

bindgen input.h -o bindings.rs \
    --raw-line "/*" \
    --raw-line " * Copyright (c) 2026 Your Name" \
    --raw-line " * Licensed under Apache-2.0" \
    --raw-line " */"

Comment on lines +44 to +46
sd_expr_column <- function(column_name, qualifier = NULL, factory = sd_expr_factory()) {
factory$column(column_name, qualifier)
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@e-kotov I tried to mirror the DuckDB expression constructors you linked! The names are slightly different because I used DataFusion naming conventions.

Comment on lines +183 to +186
// This seems to require $.ptr from the list() input (can't just
// use list of R SedonaDBExpr objects)
let expr_wrapper: &SedonaDBExpr = item.try_into()?;
Ok(expr_wrapper.inner.clone())
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yutannihilation Is there a trick here that I should be using to get a list of a #[savvy] struct XXX back into Rust land?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to remember if there's anything, but I think there's no trick...

Regarding $.ptr, it's posible to return a bare EXTPTRSXP without a wrapper, but it requres you to add several wrapper R functions (e.g. add S3 class, and print it nicely). So, I'm not sure which is easier.

cf. https://yutannihilation.github.io/savvy/guide/struct.html#external-pointer

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants