-
Notifications
You must be signed in to change notification settings - Fork 1.9k
doc: update udwf docs & add LimitEffect doc #19642
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
| /// # use datafusion_macros::user_doc; | ||
| /// # use std::sync::Arc; | ||
| /// | ||
| /// #[user_doc( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This requires pulling in datafusion-macros as a dev dependency, but to me it is more representative of how you would implement a UDWF
| } | ||
|
|
||
| /// the effect this function will have on the limit pushdown | ||
| /// The effect this function will have on limit pushdowns through a window bound. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@avantgardnerio would you be able to double check the documentation here since you worked on the original implementation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The goal was to move away from boolean APIs for specific optimizer rules, i.e. support_filter_pushdown() -> bool because that means each UDAF needs to know about each optimizer rule.
Instead the idea was to move towards natural mathematical properties such as cardinality_effect() -> CustomEnum. This does not specifically relate to an optimizer rule, it merely exposes information about what the LogicalPlanNode / UDAF / etc does.
By separating these concerns, we no longer need to update everything each time we invent a new optimizer rule (in fact, the window_push_past_limit rule was able to use the pre-existing cardinality_effect() from a previous optimizer rule).
This is a long-winded way to say: I think the update to this documentation is incorrect in that it is falsely overly specific.
I'm sure the world would go on if we merged as it, but I think it helps understanding to leave it correctly vague.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So does that mean we're planning to alter/remove this limit_effect() API, even though it was recently introduced?
I agree with this being too overly specific, but considering we introduced a new API to WindowUDFImpl just for this optimizer rule I felt it would help users understand why this API exists and how it affects things in DataFusion. As the documentation currently is on main, I find it hard to understand its purpose so I assume it may be the same for other users. And the best way I found to understand this was via an example usage (and so far the only usage).
Do you have a suggestion on how we could generalize this documentation in a way that still helps users understand what it is for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we are over thinking it:
| /// The effect this function will have on limit pushdowns through a window bound. | |
| /// the effect this function will have on the limit pushdown (e.g. through a window bound) |
Which issue does this PR close?
Rationale for this change
Was reading through some window related code and decided to fix/update some documentation.
What changes are included in this PR?
Minor fixes to formatting, wording, and links. Also add a doc section on
LimitEffect(introduced by #18029)Are these changes tested?
Doc changes.
Are there any user-facing changes?
Doc changes.