Releases: ieeta-pt/VCFX
v1.1.4 - GATK-compatible Validator
New Features
VCFX_validator now includes comprehensive GATK-compatible validation checks:
- ALT allele observation (ALLELES validation) - warns when ALT alleles are not observed in any sample genotype
- Empty VCF detection - header-only files now fail by default (use `--allow-empty` to override)
- Header Type/Number validation - validates INFO/FORMAT field definitions
- Variant sorting check - ensures records are sorted by CHROM and POS (disable with `-S`)
- AN/AC consistency - validates allele count consistency in strict mode (CHR_COUNTS)
- REF validation - checks REF alleles against a FASTA reference (`-R`)
- dbSNP ID validation - validates variant IDs against dbSNP (`-D`)
- GVCF validation - GVCF-specific format checks (`-g`)
- New `-i/--input` flag - explicit mmap file input option
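Several of the record-level checks above can be illustrated with a short Python sketch. This is not VCFX_validator's implementation. The `Record` type and every helper name are hypothetical. GT parsing is simplified to allele indices, `.`, and `/` or `|` separators. The sort rule only requires that each contig forms one contiguous block and that POS never decreases within a contig. It does not check contig order against the header.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple
import re


@dataclass
class Record:
    """Hypothetical, simplified VCF record holding only what these checks need."""
    chrom: str
    pos: int
    n_alt: int                 # number of ALT alleles
    info_an: int               # INFO/AN as written in the file
    info_ac: List[int]         # INFO/AC as written (one value per ALT)
    gts: List[str] = field(default_factory=list)  # GT strings, e.g. "0/1", "1|1", "./."


def count_alleles(rec: Record) -> Tuple[int, List[int]]:
    """Recompute AN (called alleles) and AC (per-ALT counts) from the GT strings."""
    an, ac = 0, [0] * rec.n_alt
    for gt in rec.gts:
        for allele in re.split(r"[/|]", gt):
            if allele in (".", ""):
                continue  # missing call: counts toward neither AN nor AC
            idx = int(allele)
            an += 1
            if 1 <= idx <= rec.n_alt:
                ac[idx - 1] += 1
    return an, ac


def an_ac_consistent(rec: Record) -> bool:
    """CHR_COUNTS-style check: INFO/AN and INFO/AC must match the genotypes."""
    an, ac = count_alleles(rec)
    return an == rec.info_an and ac == rec.info_ac


def unobserved_alts(rec: Record) -> List[int]:
    """ALLELES-style check: 1-based ALT indices that no sample genotype carries."""
    _, ac = count_alleles(rec)
    return [i + 1 for i, n in enumerate(ac) if n == 0]


VALID_TYPES = {
    "INFO": {"Integer", "Float", "Flag", "Character", "String"},
    "FORMAT": {"Integer", "Float", "Character", "String"},  # no Flag in FORMAT
}


def header_def_ok(section: str, number: str, typ: str) -> bool:
    """Number must be a non-negative integer, A, R, G or '.'; a Flag needs Number=0."""
    if typ not in VALID_TYPES[section]:
        return False
    if number not in {"A", "R", "G", "."} and not number.isdigit():
        return False
    return typ != "Flag" or number == "0"


class SortChecker:
    """POS must not decrease within a contig, and a contig must not
    reappear once records from another contig have started."""

    def __init__(self) -> None:
        self.seen = set()
        self.prev_chrom: Optional[str] = None
        self.prev_pos = 0

    def accept(self, rec: Record) -> bool:
        if rec.chrom != self.prev_chrom:
            if rec.chrom in self.seen:
                return False  # contig split across non-adjacent blocks
            self.seen.add(rec.chrom)
            self.prev_chrom, self.prev_pos = rec.chrom, rec.pos
            return True
        if rec.pos < self.prev_pos:
            return False  # POS went backwards within the same contig
        self.prev_pos = rec.pos
        return True
```

For example, `Record("chr1", 100, 2, 4, [1, 0], ["0/1", "0/0"])` passes the AN/AC check, but `unobserved_alts` reports ALT 2 as not observed in any genotype.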
Performance
- Maintained ~110 MB/s throughput with all new validations enabled
- 1.26x faster than bcftools for validation on the benchmark dataset
Full Changelog
See CHANGELOG.md for details.
v1.1.3
What's New
Performance Optimizations
- ancestry_inferrer: Major optimization using memory-mapped I/O (mmap), a Bloom filter for fast variant lookup, SIMD-accelerated processing, and multi-threading support
- Added benchmark results to README showing performance across 65 tools
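A Bloom filter answers "is this variant possibly in the set?" using a fixed-size bit array. Most absent variants can therefore be rejected cheaply before any exact lookup. It can return false positives but never false negatives. The Python sketch below shows only the general technique. The class, the SHA-256 double-hashing scheme, and the hypothetical `chrom:pos:ref:alt` key format are assumptions and are not taken from ancestry_inferrer.

```python
import hashlib
import math


class BloomFilter:
    """Minimal Bloom filter: a bit array plus k hash positions per key.
    May report false positives, never false negatives."""

    def __init__(self, n_items: int, fp_rate: float = 0.01) -> None:
        # Standard sizing: m = -n ln(p) / (ln 2)^2 bits, k = (m / n) ln 2 hashes.
        self.m = max(8, math.ceil(-n_items * math.log(fp_rate) / math.log(2) ** 2))
        self.k = max(1, round(self.m / n_items * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, key: str):
        # Double hashing: derive k bit indices from two 64-bit halves of one digest.
        digest = hashlib.sha256(key.encode()).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1  # odd step
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, key: str) -> None:
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key: str) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


def variant_key(chrom: str, pos: int, ref: str, alt: str) -> str:
    """Hypothetical key format for indexing reference-panel variants."""
    return f"{chrom}:{pos}:{ref}:{alt}"
```

A positive answer should still be confirmed with an exact lookup, for example a `dict` keyed by `variant_key`, because the filter can report keys that were never added.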
Bug Fixes
- Fixed missing `#include <unordered_map>` for Linux builds
Documentation
- Added comprehensive benchmark results to README
- Performance comparison with bcftools and vcftools
Benchmark Results (4 GB VCF, chr21 from 1000 Genomes)
- Basic I/O tools: avg 2.2s
- Filtering tools: avg 7.6s
- Analysis tools: avg 40.4s
- VCFX allele_freq_calc: 29s vs vcftools: 86s (3x faster)
- VCFX missing_detector: 9s vs vcftools: 83s (9x faster)
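Assuming the speedups are simple ratios of the wall-clock times listed above, they work out as follows. The "3x" figure is rounded up from about 2.97.

```math
\text{speedup} = \frac{t_{\text{vcftools}}}{t_{\text{VCFX}}}, \qquad
\frac{86\ \text{s}}{29\ \text{s}} \approx 2.97 \approx 3\times, \qquad
\frac{83\ \text{s}}{9\ \text{s}} \approx 9.2 \approx 9\times
```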