Is there an existing issue for this?
- I have searched the existing issues
Description of the bug
On Windows, the library uses PathBuf to join paths, which produces paths containing backslashes (\). This breaks interaction with cloud storage APIs and other POSIX-style APIs that expect forward slashes (/) as the path separator: file operations may fail, or the generated paths may be rejected by external systems. There is no documented workaround or environment variable to force forward-slash normalization for cross-platform compatibility.
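The failing key in the logs below can be mimicked in pure Python. This is a rough analogy, not the library's actual Rust code path: `ntpath.join` applies Windows joining rules and inserts a backslash, while `posixpath.join` inserts a forward slash. Both modules ship with the standard library on every OS, so the snippet runs anywhere. The directory and file names are taken from the error log in this issue.

```python
import ntpath     # Windows path rules, importable on any OS
import posixpath  # POSIX path rules, importable on any OS

timeline_dir = "my_table/.hoodie"
commit_file = "20250909105111479.deltacommit"

# Windows-style join: inserts "\" between components, as seen in the S3 error
windows_key = ntpath.join(timeline_dir, commit_file)

# POSIX-style join: inserts "/", which object stores such as S3 expect
posix_key = posixpath.join(timeline_dir, commit_file)

print(windows_key)  # my_table/.hoodie\20250909105111479.deltacommit
print(posix_key)    # my_table/.hoodie/20250909105111479.deltacommit
```

Because S3 treats the key as an opaque string, the backslash variant names a different (nonexistent) object, which matches the 404 NoSuchKey response reported here.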
Steps To Reproduce
- Use the Python hudi package to read a Hudi table stored in S3 from a Windows machine.
- Run the following example code:

```python
from hudi import HudiTableBuilder
import pyarrow as pa

# Replace with the S3 base URI of your Hudi table
hudi_table = HudiTableBuilder.from_base_uri("s3://path/to_your_hudi/").build()
batches = hudi_table.read_snapshot(filters=[("city", "=", "san_francisco")])

# Convert to a PyArrow table
arrow_table = pa.Table.from_batches(batches)
result = arrow_table.select(["rider", "city", "ts", "fare"])
print(result)
```

- Observe the S3 error: the path to the commit file under .hoodie read for the snapshot is joined with a backslash.
Expected behavior
Paths should always use forward slashes (/) when interacting with external APIs, regardless of the underlying OS. There should be a documented way (e.g., environment variable or API option) to normalize paths for cross-platform compatibility.
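As a sketch of the normalization step requested here, the helper below builds object-store keys with "/" regardless of the host OS. The name `to_object_key` and its behavior are hypothetical and not an existing hudi API or option; it is written in Python only for illustration, since the actual fix would belong in the library's Rust path handling. It assumes keys never contain a literal backslash, so every "\" is treated as a separator.

```python
def to_object_key(*parts: str) -> str:
    """Join components into an object-store key using "/" on every OS.

    Hypothetical helper for illustration; not part of the hudi API.
    Assumes backslashes are never legitimate key characters, so each
    one is converted to "/".
    """
    cleaned = []
    for part in parts:
        # Treat "\" as a separator, then trim separators at both ends
        normalized = part.replace("\\", "/").strip("/")
        if normalized:  # skip empty components so no "//" appears
            cleaned.append(normalized)
    return "/".join(cleaned)


print(to_object_key("my_table", ".hoodie", "20250909105111479.deltacommit"))
# Repairs a key that was already joined with a backslash:
print(to_object_key("my_table/.hoodie\\20250909105111479.deltacommit"))
```

Both calls print my_table/.hoodie/20250909105111479.deltacommit. Stripping leading slashes suits S3-style keys, which normally have none; a variant for local POSIX paths would need to preserve a leading "/".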
Screenshots / Logs
```
Storage error: Object at location my_table/.hoodie\20250909105111479.deltacommit not found: Client error with status 404 Not Found:
NoSuchKey
The specified key does not exist.
my_table/.hoodie\20250909105111479.deltacommit
XXXXYYYY
```
Software information
- Operating system: Windows
- Project version: 0.4.0
Additional context
This bug can cause major interoperability problems for users working on Windows, especially with cloud storage (S3, GCS, etc.). Please consider adding a normalization step or a config option/environment variable to always use POSIX-style paths when interacting with external systems.