-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Fix: fix spill oom for sort #19041
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Fix: fix spill oom for sort #19041
Conversation
| // track the memory in the newly created Rows. | ||
| let mut rows_reservation = self.reservation.new_empty(); | ||
| rows_reservation.try_grow(rows.size())?; | ||
| rows_reservation.grow(rows.size()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why ?
new_empty() returns a MemoryReservation with size=0 but the registration is shared (Arc::clone()), so the pool may not have enough space
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, however, if an error is thrown here when memory is rapidly occupied elsewhere, it will cause the entire system to fail. Generally speaking, physical memory is larger than the memory pool. If no error is thrown here, the components that support spill-to-disk will detect the memory pool overflow and thus spill data to disk, preventing unlimited memory growth. Therefore, this logic seems acceptable.
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any way to test this behavior?
I'm not sure if there's any convenient testing method available in the DataFusion library. My approach is to create 20,000 left-table files (totaling 200GB with 200 million rows) and 20,000 right-table files (totaling 20GB with 200,000 rows), then execute a query like SELECT * FROM t1 JOIN t2 ON [join_condition]. With sort-merge join enabled, the process will throw an error after running for a while. |
|
Maybe you could set the memory limit to some lower threshold to provide the issue with a smaller dataset? |
Got it, got it. Thanks! By the way, this fix has already been tested with large datasets on my end and it works. |
I think we're just waiting on a reproducible test case per this comment for this PR; feel free to undraft this when it's ready |
Which issue does this PR close?
Rationale for this change
When multiple operators attempt to acquire memory from the same memory pool, memory allocation failures may occur due to mutual resource contention.
What changes are included in this PR?
Make multiple attempts to call try_grow within the sort operator.
Since RowCursorStream cannot perform spill-to-disk operations, the memory pool will directly trigger a grow operation instead of throwing an error when it is full; other operators that support spill-to-disk will perform spill operations afterward (assuming there is a portion of redundant memory available).
Are these changes tested?
yes
Are there any user-facing changes?
no