Parquet writer rejects values of fusion type #6765

@philrz

$ curl -s -L -O "https://data.gharchive.org/2023-02-08-0.json.gz" &&
  super -version &&
  super -f parquet -o gha-super.parquet -c "fuse" 2023-02-08-0.json.gz

Version: v0.3.0-2-g02bfe41d9
parquetio: not a record: fusion({id:"26939254345",type:"DeleteEvent",...

Details

Repro is with super commit 02bfe41.

We run a GitHub Archive benchmark periodically, and until now one of its data prep steps has been to fuse the JSON into a flat table and dump that to Parquet. When I ran it for the first time since the merge of #6713, I hit the problem above. @mccanne reminded me that the current way to achieve the same result is to use the newer blend operator, and that did get me around the problem. However, he also noted that users would probably benefit if the Parquet writer handled values of fusion type automatically.
