Avro decoder can't handle a reader schema with no fields #9608

@mzabaluev

Describe the bug
An application that needs to count records in an Avro file without decoding any fields may pass a reader schema to that effect.
In the current implementation, RecordDecoder builds the RecordBatch from the decoded column arrays without setting the row_count option. When the reader schema has no fields, there are no columns to infer the number of rows from, so batch construction fails with an error.
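
The failure mode can be illustrated with a small self-contained model. This is only a sketch: Column, Batch, and flush are hypothetical stand-ins, not arrow-rs or arrow-avro types. In arrow-rs itself, the explicit count would presumably be supplied through RecordBatchOptions::with_row_count and RecordBatch::try_new_with_options; how RecordDecoder tracks the number of decoded rows is an assumption here.

```rust
/// Toy stand-in for a column array (hypothetical): only its length matters.
#[derive(Debug, Clone, PartialEq)]
pub struct Column {
    pub len: usize,
}

/// Toy stand-in for a record batch (hypothetical, not arrow's RecordBatch).
#[derive(Debug, PartialEq)]
pub struct Batch {
    pub columns: Vec<Column>,
    pub num_rows: usize,
}

impl Batch {
    /// Builds a batch. Without an explicit `row_count`, the row count is
    /// inferred from the first column, which is impossible with no columns.
    pub fn try_new(columns: Vec<Column>, row_count: Option<usize>) -> Result<Batch, String> {
        let num_rows = match (row_count, columns.first()) {
            (Some(n), _) => n,
            (None, Some(first)) => first.len,
            (None, None) => {
                return Err("must either specify a row count or at least one column".to_string())
            }
        };
        // Every column must agree with the chosen row count.
        if columns.iter().any(|c| c.len != num_rows) {
            return Err(format!("all columns must have {} rows", num_rows));
        }
        Ok(Batch { columns, num_rows })
    }
}

/// Hypothetical decoder flush: the decoder knows how many records it has
/// consumed, so it always passes that count explicitly.
pub fn flush(columns: Vec<Column>, rows_decoded: usize) -> Result<Batch, String> {
    Batch::try_new(columns, Some(rows_decoded))
}
```

In this model, Batch::try_new(vec![], None) fails with the same message as in the report, while flush(vec![], n) yields a batch with zero columns and n rows, which is the behavior requested below.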

To Reproduce
Create an arrow-avro reader with a reader schema that matches the top-level record of the writer schema of the Avro content (e.g. an OCF file) but lists no fields, for example:

{
    "type": "record",
    "name": "topLevelRecord",
    "fields": []
}

Use the appropriate read API to read batches from the file.
The read fails with the error: "Invalid argument error: must either specify a row count or at least one column"

Expected behavior
The reader returns batches with no columns, with row counts determined by the batch size option and any other flags affecting batch composition (i.e. the row counts should be the same as when reading with the full writer schema).
