Open
Description
$ curl -s -L -O "https://data.gharchive.org/2023-02-08-0.json.gz" &&
super -version &&
super -f parquet -o gha-super.parquet -c "fuse" 2023-02-08-0.json.gz
Version: v0.3.0-2-g02bfe41d9
parquetio: not a record: fusion({id:"26939254345",type:"DeleteEvent",...
Details
Repro is with super commit 02bfe41.
We have a GitHub Archive benchmark that we run periodically. Until now, one of its data prep steps has been to fuse the JSON into a flat table and dump that to Parquet. When I ran it for the first time since the merge of #6713, I hit the problem above. @mccanne reminded me that the current way to achieve the same result is to use the newer blend operator, and that did get me around the problem. However, he also noted that users would probably benefit if the Parquet writer handled values of fusion type automatically.
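For readers unfamiliar with why this prep step exists, here is a minimal conceptual sketch in Python of widening heterogeneous JSON events to one shared set of fields, which is roughly what a flat table needs before it can be written to a columnar format like Parquet. This is not how super implements fuse or blend: the `fuse_records` helper and the sample events are hypothetical, and the sketch only handles top-level fields, ignoring nested records and type conflicts.

```python
import json


def fuse_records(records):
    """Widen heterogeneous records to one shared set of top-level fields.

    Hypothetical helper for illustration only; not super's fuse operator.
    """
    # Collect field names in first-seen order so the output schema is stable.
    fields = []
    seen = set()
    for rec in records:
        for key in rec:
            if key not in seen:
                seen.add(key)
                fields.append(key)
    # Re-emit every record with the full field list, filling gaps with None.
    return [{key: rec.get(key) for key in fields} for rec in records]


# Made-up sample events; real GitHub Archive events have many more fields.
events = [
    {"id": "1", "type": "DeleteEvent"},
    {"id": "2", "type": "PushEvent", "public": True},
]

fused = fuse_records(events)
for row in fused:
    print(json.dumps(row))
```

Every output row carries the same fields in the same order, so a writer that expects plain records with a single schema can consume them directly.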