Skip to content

Conversation

@ShashidharM0118
Copy link
Contributor

Which issue does this PR close?

Closes #19393

Rationale for this change

The upgrade guide contained incorrect API examples that didn't match the current implementation

What changes are included in this PR?

  • Updated ParquetExecBuilder migration example in the upgrade guide to use correct API:
    • Changed ParquetSource::new(parquet_options) to ParquetSource::new(table_schema) with proper TableSchema creation
    • Changed FileScanConfig::new(url, file_schema, source) to FileScanConfigBuilder::new(url, source) (removed schema parameter)
    • Removed non-existent with_table_partition_cols() method call
    • Updated with_projection() to with_projection_indices() with proper error handling
    • Fixed final execution plan creation to use DataSourceExec::from_data_source()

Are these changes tested?

Are there any user-facing changes?

No

Copilot AI review requested due to automatic review settings December 19, 2025 05:55
@github-actions github-actions bot added the documentation Improvements or additions to documentation label Dec 19, 2025
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes incorrect API examples in the upgrade guide documentation for migrating from ParquetExecBuilder to the new FileScanConfigBuilder and DataSourceExec pattern. The examples were updated to reflect the current API implementation, correcting method signatures, parameter types, and builder patterns that had evolved since the original documentation was written.

Key Changes

  • Updated ParquetSource instantiation to use TableSchema parameter instead of parquet options
  • Replaced FileScanConfig::new() with FileScanConfigBuilder pattern and removed the schema parameter
  • Changed with_projection() to with_projection_indices() with proper error handling
  • Updated final execution plan creation to use DataSourceExec::from_data_source()

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@alamb
Copy link
Contributor

alamb commented Dec 20, 2025

This is an upgrade guide for an older version (which had a different API) -- Why would the upgrade match the current version?

@ShashidharM0118
Copy link
Contributor Author

This is an upgrade guide for an older version (which had a different API) -- Why would the upgrade match the current version?

Apologies — I misunderstood the purpose of the upgrade guide itself. 😅.

@mag1c1an1
Copy link
Contributor

This is an upgrade guide for an older version (which had a different API) -- Why would the upgrade match the current version?

When I upgraded from version 50 to 51, this guide left me confused, or perhaps there was something I overlooked.

image image image image image

@alamb
Copy link
Contributor

alamb commented Dec 24, 2025

Hi @mag1c1an1

When I upgraded from version 50 to 51, this guide left me confused, or perhaps there was something I overlooked.

I am not sure where your screen shots come from. This one shows the arguments from main (not the 51 branch)

Screenshot 2025-12-24 at 6 43 22 AM

The actual released 51 docs are here:

https://docs.rs/datafusion/51.0.0/datafusion/datasource/physical_plan/struct.FileScanConfigBuilder.html#method.new

Screenshot 2025-12-24 at 6 44 18 AM

I wonder if you are trying to upgrade to the main branch (what will become 52.0.0) -- it is still labeled (confusingly) as 51.0.0

@mag1c1an1
Copy link
Contributor

mag1c1an1 commented Dec 29, 2025

Hi @alamb . I'm very sorry that I only saw your comment now. I upgraded to version 51 instead of the main branch. What puzzles me is the following description in the upgrading guide:

FileSource constructors now require TableSchema: All built-in file sources now take the schema in their constructor:

- let source = ParquetSource::default();
+ let source = ParquetSource::new(table_schema);

But in api doc , the new function of ParquetSource take TableParquetOptions.

FileScanConfigBuilder no longer takes schema as a parameter: The schema is now passed via the FileSource:

- FileScanConfigBuilder::new(url, schema, source)
+ FileScanConfigBuilder::new(url, source)

But in api doc , the new function of FileScanConfigBuilder still needs a file schema.

In my understanding, the meaning of the above code block is that the API of 51 has changed now. Please make the necessary modifications, such as removing the code starting with '- ' and replacing it with the code starting with '+ '. Am I mistaken in my interpretation? All the code blocks come from the "DataFusion 51" section of the upgrading guide. I hope my poor expression skills can correctly convey my intention.

@Jefffrey
Copy link
Contributor

Jefffrey commented Jan 3, 2026

@mag1c1an1 you're correct that the upgrade guide is incorrect, it seems the PR was originally slated for v51 but was pushed to v52 but I guess we forget to amend the upgrade guide

@ShashidharM0118 Please doublecheck where we're supposed to be fixing the upgrade guide as per what @mag1c1an1 stated

cc @adriangb

@Jefffrey Jefffrey marked this pull request as draft January 3, 2026 02:48
@adriangb
Copy link
Contributor

adriangb commented Jan 3, 2026

Apologies if I put docs in the wrong version and caused confusion. This was a big change and I'm sure I made mistakes. It seems like there's already a lot of discussion underway, and I myself am confused after reading this about what is in what version 😄, I'm happy to help clarify anything if needed otherwise I will allow the work to continue.

@ShashidharM0118
Copy link
Contributor Author

Thanks @mag1c1an1 and @Jefffrey for the clarification! I've updated the docs

Comment on lines +766 to +772
**Key changes:**

1. **FileSource constructors now require TableSchema**: All built-in file sources now take the schema in their constructor:

```diff
- let source = ParquetSource::default();
+ let source = ParquetSource::new(table_schema);
+ let source = ParquetSource::new(TableParquetOptions::default());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still not correct @ShashidharM0118; can you please double check my comment above? We should not be modifying the existing text in-place under v51 as it is not supposed to be in v51, it is meant to be in v52. Moreover these edits don't make sense; the existing text is left to state:

FileSource constructors now require TableSchema:

But the example has been changed so this is not the case here so now it is inconsistent.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @ShashidharM0118, thanks for your contribution to #19393!

Regarding the update, could you please remove the sections I mentioned? Since these API changes haven't been introduced in v51 yet, this part remains the same for users upgrading from v50 or earlier. Keeping it might cause confusion, so it's best to revert those specific examples to match the current v51 implementation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation Improvements or additions to documentation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Upgrade guide doesn't match api doc

5 participants