CLI to build CRyPTIC data tables from the outputs of GPAS
Philip Fowler, 11 March 2026
Notes
- build-tables should not modify the data fields and should be able to cope with either JSON or CSV files downloaded from GPAS. The glob should be recursive to deal with a sharded file system.
- correct-tables. This is a separate command that takes the output of build-tables and joins mutations to variants to pick up some useful genetics metrics.
- shard-files takes a delimiter (for CRyPTIC this is .) and moves the output files into a sharded file system (not yet implemented).
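A minimal sketch of the recursive glob described for build-tables, assuming the outputs can be recognised by a .json or .csv suffix. The function name find_gpas_outputs and its parameters are hypothetical and are not taken from the actual CLI.

```python
from pathlib import Path


def find_gpas_outputs(root, suffixes=(".json", ".csv")):
    """Return every JSON or CSV file below root, however deeply it is nested.

    Hypothetical helper: the name and the suffix list are assumptions.
    """
    root = Path(root)
    # rglob("*") walks all subdirectories, so sharded layouts are covered
    return sorted(
        path
        for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() in suffixes
    )
```

Because the files are only discovered and never rewritten, this is consistent with the requirement that the data fields are left untouched.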
Issues
- Why does variants.parquet fail to run, i.e. is Killed? Is it a column type?
- Why does genomes have more rows than it should? -> because of a carriage return in pipeline_build.
- There are multiple ENA run accessions for some UNIQUEIDs; how do I know which was used?
- Why does mykrobe report a lineage but then record zero median depth for some samples?
- Why are some samples missing a main_report? Example is site.10.subj.YA00040368.lab.YA00040368.iso.1/98bc5c23-d219-43bb-9aab-e8df1c6a0f7e which I can download via mapping but isn't in the folder, suggesting it failed. Curiously it is the last file in the mapping csv.
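The extra rows in genomes, traced above to a carriage return in pipeline_build, can be illustrated with a small sketch. It assumes a comma-delimited file with no quoted fields, and the helper name find_ragged_rows is hypothetical. A stray carriage return splits one record into two short rows, so any row whose field count differs from the header is a candidate.

```python
def find_ragged_rows(text, delimiter=","):
    """Return (line_number, line) for rows whose field count differs from the header.

    Hypothetical helper: naive splitting, so quoted fields are not handled.
    """
    # splitlines() treats \r, \n and \r\n all as line breaks,
    # which is how a bare carriage return ends up creating an extra row
    lines = text.splitlines()
    if not lines:
        return []
    expected = len(lines[0].split(delimiter))
    return [
        (number, line)
        for number, line in enumerate(lines[1:], start=2)
        if line and len(line.split(delimiter)) != expected
    ]


# Made-up example: the second record has a carriage return inside pipeline_build
example = "id,pipeline_build,n\na,1.0,3\nb,1.0\r,4\n"
ragged = find_ragged_rows(example)
```

Here the single record for b is reported as two ragged rows, which matches the symptom of the table having more rows than it should.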