Characterizing the performance of an antibiotic resistance prediction tool, gnomonicus, using a diverse test set of 2,663 Mycobacterium tuberculosis samples

This repository contains the data and code needed to reproduce the tables and figures in the article below.

Westhead J, Baker CS, Brouard M, Colpus M, Constantinides B, Hall A, Knaggs J, Lopes Alves M, Spies R, Thai H, Surrell S, Govender K, Peto TEA, Crook DW, Omar SV, Turner R, Fowler PW

Characterizing the performance of an antibiotic resistance prediction tool, gnomonicus, using a diverse test set of 2,663 Mycobacterium tuberculosis samples

Microbial Genomics. 11:001592 doi:10.1099/mgen.0.001592

Reproduce AMR predictions from the VCF files using `gnomonicus`

First install gnomonicus; the easiest way to do this is with pip:

pip install gnomonicus

This should automatically place gnomonicus on your $PATH. Next, clone this repository and, alongside it, the catalogues repository, which contains the WHO v2 catalogue and the H37Rv version 3 (NC_000962.3) GenBank file:

git clone git@github.com:fowler-lab/validate-myco-amr.git
git clone git@github.com:oxfordmmm/tuberculosis_amr_catalogues.git
cd validate-myco-amr/
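Because the gnomonicus commands below reach the catalogue through `../../../../tuberculosis_amr_catalogues/`, the two repositories must sit side by side. The following is a minimal preflight sketch, run from the `validate-myco-amr/` root, that checks this layout and that `gnomonicus` is on your `$PATH`. The function name `check_inputs` is hypothetical and is not part of the repository.

```shell
#!/usr/bin/env bash
# Hypothetical preflight check (not part of the repository).
# Expects tuberculosis_amr_catalogues/ to be a sibling of the validate-myco-amr/ root.

check_inputs() {
  local root="${1:-.}"   # repository root; defaults to the current directory
  local ref="$root/../tuberculosis_amr_catalogues/catalogues/NC_000962.3"
  local missing=0
  local f
  # The two reference files passed to gnomonicus in the commands below
  for f in \
    "$ref/NC_000962.3_WHO-UCN-TB-2023.5_v2.1_GARC1_RFUS.csv" \
    "$ref/NC_000962.3.gbk"; do
    if [ ! -f "$f" ]; then
      echo "missing: $f" >&2
      missing=1
    fi
  done
  # pip should have put the gnomonicus entry point on $PATH
  if ! command -v gnomonicus >/dev/null 2>&1; then
    echo "gnomonicus is not on \$PATH" >&2
    missing=1
  fi
  return "$missing"
}

# Usage, from validate-myco-amr/:
#   check_inputs && echo "ready"
```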

Lastly, you will need GNU Parallel installed for the commands below to work; it spreads the jobs across all the cores on your machine to speed up processing. On a Mac the easiest way to install it is via MacPorts or Homebrew. The commands below should take 1-2 hours on a Mac laptop with an M-series CPU.

cd dat/outputs/ukmyc/
find dat/ -name '*vcf' | parallel --bar gnomonicus --vcf_file {} --catalogue_file ../../../../tuberculosis_amr_catalogues/catalogues/NC_000962.3/NC_000962.3_WHO-UCN-TB-2023.5_v2.1_GARC1_RFUS.csv --json --genome_object ../../../../tuberculosis_amr_catalogues/catalogues/NC_000962.3/NC_000962.3.gbk --min_dp 3
cd ../mgit/
find dat/ -name '*vcf' | parallel --bar gnomonicus --vcf_file {} --catalogue_file ../../../../tuberculosis_amr_catalogues/catalogues/NC_000962.3/NC_000962.3_WHO-UCN-TB-2023.5_v2.1_GARC1_RFUS.csv --json --genome_object ../../../../tuberculosis_amr_catalogues/catalogues/NC_000962.3/NC_000962.3.gbk --min_dp 3
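The two invocations above differ only in the sample-set directory. The sketch below loops over both sets with identical gnomonicus flags and, as an assumption of this sketch rather than part of the documented workflow, falls back to `xargs -P` when GNU Parallel is not installed. `run_gnomonicus_sets` is a hypothetical helper; run it from the `validate-myco-amr/` root.

```shell
#!/usr/bin/env bash
# Sketch only: run_gnomonicus_sets is a hypothetical helper, not part of the repository.
# Run from the validate-myco-amr/ root, with tuberculosis_amr_catalogues/ cloned alongside it.

run_gnomonicus_sets() {
  # Same relative reference path as the commands above, as seen from dat/outputs/<set>/
  local ref="../../../../tuberculosis_amr_catalogues/catalogues/NC_000962.3"
  local njobs set
  njobs="$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 4)"
  for set in ukmyc mgit; do
    (
      cd "dat/outputs/$set" || exit 1
      if parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
        find dat/ -name '*vcf' | parallel --bar gnomonicus --vcf_file {} \
          --catalogue_file "$ref/NC_000962.3_WHO-UCN-TB-2023.5_v2.1_GARC1_RFUS.csv" \
          --json --genome_object "$ref/NC_000962.3.gbk" --min_dp 3
      else
        # Fallback when GNU Parallel is absent: xargs -P runs up to $njobs jobs at once
        find dat/ -name '*vcf' -print0 | xargs -0 -P "$njobs" -I{} gnomonicus --vcf_file {} \
          --catalogue_file "$ref/NC_000962.3_WHO-UCN-TB-2023.5_v2.1_GARC1_RFUS.csv" \
          --json --genome_object "$ref/NC_000962.3.gbk" --min_dp 3
      fi
    ) || return 1
  done
}

# Usage: run_gnomonicus_sets
```

The `cd` happens inside a subshell so the caller's working directory is unchanged, and the version check guards against the unrelated moreutils `parallel`, which does not accept `--bar`.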

The gnomonicus commands above create JSON files in the same directories as the VCF files; these JSON files contain the AMR predictions used in the downstream analysis.
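To confirm that every sample was processed, one option is to compare the number of VCF and JSON files in each sample set. This is a minimal sketch that assumes each VCF yields exactly one JSON file and that no other JSON files live under these directories; `count_outputs` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical check (not part of the repository): compare VCF and JSON counts
# under a sample-set directory once gnomonicus has finished.

count_outputs() {
  local dir="$1" nvcf njson
  # Same '*vcf' pattern as the gnomonicus commands; tr strips wc's padding on macOS
  nvcf=$(find "$dir" -name '*vcf' | wc -l | tr -d ' ')
  njson=$(find "$dir" -name '*.json' | wc -l | tr -d ' ')
  echo "$dir: $nvcf VCF, $njson JSON"
  # Non-zero exit status if the counts differ
  [ "$nvcf" -eq "$njson" ]
}

# Usage, from the repository root:
#   count_outputs dat/outputs/ukmyc
#   count_outputs dat/outputs/mgit
```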

Parse the `gnomonicus` output JSON files and create the results tables

A Python script detects the output JSON files and recreates dat/RAW_EFFECTS.csv and dat/RAW_PREDICTIONS.csv. To recreate these tables, run:

python bin/parse_gnomonicus.py

Reproduce AMR predictions from TB-Profiler

The output JSON files from TB-Profiler are included in the repository in dat/outputs/tbprofiler/. If you wish to reproduce these, you will need to download the FASTQ files for all 2,663 samples using these scripts:

bash bin/UKMYC_1000_samples_download.sh
bash bin/MGIT_1663_samples_download.sh

You can then either use the web portal at https://tbdr.lshtm.ac.uk/, install TB-Profiler locally, or contact the TB-Profiler team for assistance. Whichever route you choose, download and keep all the output JSON files (the ones used in the analysis are stored in dat/outputs/tbprofiler/).

Parse the TB-Profiler output JSON files and recreate the results tables

The script below will parse the TB-Profiler output JSON files and recreate dat/tbprofiler_EFFECTS.csv and dat/tbprofiler_PREDICTIONS.csv.

python bin/parse_tbprofiler.py
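Before moving on to the notebooks, it can help to confirm that the four raw tables exist and are non-empty. A small sketch follows; `check_tables` is hypothetical, and its row count assumes a single header line and no line breaks inside quoted CSV fields.

```shell
#!/usr/bin/env bash
# Hypothetical sanity check (not part of the repository) for the parsed CSV tables.

check_tables() {
  local f status=0
  for f in "$@"; do
    if [ -s "$f" ]; then
      # Subtract one line for the assumed single header row
      echo "$f: $(( $(wc -l < "$f") - 1 )) data rows"
    else
      echo "missing or empty: $f" >&2
      status=1
    fi
  done
  return "$status"
}

# Usage, from the repository root:
#   check_tables dat/RAW_EFFECTS.csv dat/RAW_PREDICTIONS.csv \
#                dat/tbprofiler_EFFECTS.csv dat/tbprofiler_PREDICTIONS.csv
```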

Create the results tables used to drive the analysis

The first two Jupyter notebooks, 01-create-phenotypes-table.ipynb and 02-create-results-tables.ipynb, step you through (i) creating the phenotypes table dat/PHENOTYPES.csv, which has one row per sample per pDST phenotype (the notebook also includes Table 1), and (ii) reading in the raw EFFECTS and PREDICTIONS tables from both gnomonicus and TB-Profiler and creating the derived tables, such as the RESULTS table.

The remaining two notebooks then step you through recreating all the figures and tables in the manuscript: 03-main-analysis.ipynb creates the main results figures and tables, and 04-discrepancy-analysis.ipynb creates the discrepancy analysis tables.
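If you prefer to run the notebooks headlessly rather than interactively, `jupyter nbconvert --execute` can run them in order; order matters because the later notebooks work from the tables created by the first two. This sketch assumes Jupyter (with nbconvert) and the Python packages listed in the repository's requirements.txt are installed, for example via `pip install -r requirements.txt`. `run_notebooks` is a hypothetical helper, and `--inplace` overwrites each notebook with its executed outputs.

```shell
#!/usr/bin/env bash
# Sketch only: run_notebooks is a hypothetical helper, not part of the repository.
# Run from the repository root.

run_notebooks() {
  local nb
  # Explicit list keeps the 01 -> 04 order the analysis depends on
  for nb in 01-create-phenotypes-table.ipynb 02-create-results-tables.ipynb \
            03-main-analysis.ipynb 04-discrepancy-analysis.ipynb; do
    echo "running $nb"
    # --execute runs every cell; --inplace writes the outputs back into the same file
    jupyter nbconvert --to notebook --execute --inplace "$nb" || return 1
  done
}

# Usage: run_notebooks
```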
