Releases: ieeta-pt/VCFX

v1.1.4 - GATK-compatible Validator

18 Dec 19:26

New Features

VCFX_validator now includes comprehensive GATK-compatible validation checks:

  • ALT allele observation (ALLELES validation) - warns when ALT alleles are not observed in any sample genotype
  • Empty VCF detection - header-only files now fail by default (use --allow-empty to override)
  • Header Type/Number validation - validates INFO/FORMAT field definitions
  • Variant sorting check - ensures records are sorted by CHROM and POS (disable with -S)
  • AN/AC consistency - validates allele count consistency in strict mode (CHR_COUNTS)
  • REF validation - checks REF alleles against a FASTA reference (-R)
  • dbSNP ID validation - validates variant IDs against dbSNP (-D)
  • GVCF validation - GVCF-specific format checks (-g)
  • New -i/--input flag - explicit mmap file input option
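
For readers unfamiliar with these checks, here is a minimal Python sketch of three of them: sorting, ALT allele observation, and AN/AC consistency. It is an illustration only, not VCFX_validator's actual code. The function name `validate`, the problem labels, and the simplifications (GT is assumed to be the first FORMAT key, and AN/AC are assumed to be well-formed integers) are all assumptions made for this example.

```python
def validate(lines):
    """Return (kind, line_number, message) tuples for each problem found.

    Hypothetical helper for illustration; not part of VCFX.
    """
    problems = []
    seen_chroms = set()
    prev_chrom, prev_pos = None, 0
    for n, line in enumerate(lines, start=1):
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        f = line.rstrip("\n").split("\t")
        chrom, pos = f[0], int(f[1])
        alts = [] if f[4] == "." else f[4].split(",")
        # INFO as a dict; flag entries without "=" map to an empty string.
        info = dict(kv.partition("=")[::2] for kv in f[7].split(";") if kv != ".")
        # Called allele indices across all samples (assumes GT comes first).
        called = [int(a) for s in f[9:]
                  for a in s.split(":")[0].replace("|", "/").split("/")
                  if a.isdigit()]

        # Sorting: a CHROM must not reappear, and POS must not decrease within it.
        if chrom != prev_chrom:
            if chrom in seen_chroms:
                problems.append(("UNSORTED", n, f"{chrom} appears in more than one block"))
            seen_chroms.add(chrom)
        elif pos < prev_pos:
            problems.append(("UNSORTED", n, f"{chrom}:{pos} follows {chrom}:{prev_pos}"))
        prev_chrom, prev_pos = chrom, pos

        # ALT observation: every ALT should be carried by at least one genotype.
        if len(f) > 9:
            for i, alt in enumerate(alts, start=1):
                if i not in called:
                    problems.append(("ALT_UNOBSERVED", n, f"ALT {alt} not seen in any genotype"))

        # AN/AC consistency: compare declared counts with counts from genotypes.
        if len(f) > 9 and "AN" in info and int(info["AN"]) != len(called):
            problems.append(("AN", n, f"AN={info['AN']} but {len(called)} alleles called"))
        if len(f) > 9 and "AC" in info:
            declared = [int(x) for x in info["AC"].split(",")]
            actual = [called.count(i) for i in range(1, len(alts) + 1)]
            if declared != actual:
                problems.append(("AC", n, f"AC={declared} but genotypes give {actual}"))
    return problems


if __name__ == "__main__":
    demo = [
        "##fileformat=VCFv4.2",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2",
        "chr1\t100\t.\tA\tG,T\t.\t.\tAN=4;AC=2,0\tGT\t0/1\t1/1",
        "chr1\t50\t.\tC\tT\t.\t.\tAN=4;AC=1\tGT\t0/1\t0/0",
    ]
    for kind, line_no, message in validate(demo):
        print(f"{kind} (line {line_no}): {message}")
```

The sketch keeps all state in a single pass over the lines, which is the natural shape for a streaming validator; a real implementation would also need to handle malformed fields rather than raising on them.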

Performance

  • Maintained ~110 MB/s throughput with all new validations enabled
  • 1.26x faster than bcftools for validation on the benchmark dataset

Full Changelog

See CHANGELOG.md for details.

v1.1.3

11 Dec 15:49

What's New

Performance Optimizations

  • ancestry_inferrer: Major optimization with mmap, bloom filter for fast variant lookup, SIMD-accelerated processing, and multi-threading support
  • Added benchmark results to README showing performance across 65 tools
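
As background on the bloom filter mentioned above, the following Python sketch shows the general technique: a compact bit array that can quickly rule out variants that are definitely not in a reference set, at the cost of occasional false positives. This is a generic illustration, not the ancestry_inferrer implementation. The class name, the sizing defaults, and the `CHROM:POS:REF:ALT` key format are assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """Generic bloom filter sketch (hypothetical; not VCFX's data structure)."""

    def __init__(self, num_bits=1 << 20, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive several bit positions from one 128-bit digest (double hashing).
        digest = hashlib.blake2b(key.encode(), digest_size=16).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:], "little") | 1  # keep the stride non-zero
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key):
        for p in self._positions(key):
            self.bits[p >> 3] |= 1 << (p & 7)

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[p >> 3] & (1 << (p & 7)) for p in self._positions(key))


if __name__ == "__main__":
    known = BloomFilter()
    # Assumed key format for this example: CHROM:POS:REF:ALT
    known.add("chr21:9411239:G:A")
    known.add("chr21:9411245:C:A")
    print(known.might_contain("chr21:9411239:G:A"))
    print(known.might_contain("chr21:1:T:C"))
```

Because a negative answer is exact, a filter like this lets most non-matching records skip the slower exact lookup; only the "possibly present" cases need to be confirmed against the full table.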

Bug Fixes

  • Fixed missing unordered_map include for Linux builds

Documentation

  • Added comprehensive benchmark results to README
  • Added a performance comparison with bcftools and vcftools

Benchmark Results (4 GB VCF, chr21 from 1000 Genomes)

  • Basic I/O tools: avg 2.2s
  • Filtering tools: avg 7.6s
  • Analysis tools: avg 40.4s
  • VCFX allele_freq_calc: 29s vs vcftools: 86s (3x faster)
  • VCFX missing_detector: 9s vs vcftools: 83s (9x faster)