Releases: jorgeMFS/PhenoQC
v1.2.1
v1.0.0
PhenoQC v1.0.0 Release
Overview
PhenoQC is a comprehensive toolkit for quality control of phenotypic datasets, providing robust validation, standardization, and harmonization capabilities. This release marks our first stable version, offering a complete suite of tools for researchers working with phenotypic data.
Key Features
Data Validation & Schema Compliance
- JSON schema-based validation for ensuring data structure consistency
- Format validation for various input types (CSV, TSV, JSON)
- Automated checks for data type consistency and required fields
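To make the required-field and type checks concrete, here is a minimal standard-library sketch. It is not PhenoQC's implementation and it is much simpler than full JSON Schema validation; the `FIELD_SPEC` table, the field names, and `check_record` are hypothetical names chosen for illustration.

```python
# Hypothetical field spec: field name -> (expected Python type, required?)
FIELD_SPEC = {
    "sample_id": (str, True),
    "age": (int, True),
    "phenotype": (str, False),
}


def check_record(record, spec=FIELD_SPEC):
    """Return a list of human-readable problems found in one record."""
    problems = []
    for field, (expected_type, required) in spec.items():
        # Treat an absent key and an explicit None as "missing".
        if field not in record or record[field] is None:
            if required:
                problems.append(f"missing required field: {field}")
            continue
        if not isinstance(record[field], expected_type):
            problems.append(
                f"{field}: expected {expected_type.__name__}, "
                f"got {type(record[field]).__name__}"
            )
    return problems


good = {"sample_id": "S1", "age": 42, "phenotype": "seizure"}
bad = {"sample_id": "S2", "age": "forty"}
print(check_record(good))
print(check_record(bad))
```

Returning a list of problems rather than raising on the first one lets a report show every issue in a record at once.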
Ontology Integration & Mapping
- Support for multiple ontologies (HPO, DO, MPO)
- Fuzzy matching for term mapping with configurable thresholds
- Custom mapping capabilities for specialized terminology
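As a rough illustration of threshold-based fuzzy term mapping, the sketch below uses `difflib.SequenceMatcher` from the Python standard library as a stand-in; the matcher PhenoQC actually uses is not stated in these notes. The `TERMS` table, its placeholder IDs, and the `map_term` helper are assumptions made for the example.

```python
from difflib import SequenceMatcher

# Toy label table; the IDs are placeholders, not real ontology identifiers.
TERMS = {
    "seizure": "TERM:0001",
    "short stature": "TERM:0002",
    "hearing impairment": "TERM:0003",
}


def map_term(text, terms=TERMS, threshold=0.8):
    """Return (term_id, score) for the best match, or (None, score) below threshold."""
    query = text.strip().lower()
    best_label, best_score = None, 0.0
    for label in terms:
        # ratio() is a similarity score in [0, 1]; 1.0 means identical strings.
        score = SequenceMatcher(None, query, label).ratio()
        if score > best_score:
            best_label, best_score = label, score
    if best_label is not None and best_score >= threshold:
        return terms[best_label], best_score
    return None, best_score


print(map_term("Seizures"))
print(map_term("Seizures", threshold=0.99))
```

Raising the threshold trades recall for precision: a near miss such as a plural form is accepted at 0.8 but rejected at 0.99.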
Missing Data Management
- Multiple imputation strategies:
  - Basic: mean, median, mode
  - Advanced: KNN, MICE, SVD
- Configurable handling of missing values
- Quality metrics for imputation assessment
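The basic strategies can be sketched with the standard-library `statistics` module. This is an illustrative toy, not PhenoQC's code: `impute` and `STRATEGIES` are hypothetical names, missing values are assumed to be `None`, and the advanced strategies (KNN, MICE, SVD) are not shown.

```python
from statistics import mean, median, mode

STRATEGIES = {"mean": mean, "median": median, "mode": mode}


def impute(values, strategy="mean"):
    """Replace None entries with a statistic computed from the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    fill = STRATEGIES[strategy](observed)
    return [fill if v is None else v for v in values]


heights = [150, None, 180, 160, None, 160]
print(impute(heights, "mean"))    # fills with 162.5
print(impute(heights, "median"))  # fills with 160.0
print(impute(heights, "mode"))    # fills with 160
```

The mean is pulled toward the outlying value 180, while the median and mode are not, which is one reason the choice of strategy is worth making configurable.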
Batch Processing Capabilities
- Parallel processing of multiple files
- Support for various input formats
- Progress tracking and error handling
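One way to picture parallel batch processing with progress tracking and per-file error handling is the `concurrent.futures` sketch below. It assumes a thread pool and a hypothetical per-file job (`count_rows`); how PhenoQC schedules its work internally is not described in these notes.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def process_all(paths, worker, max_workers=4):
    """Run worker(path) for each path in parallel; collect results and errors separately."""
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(worker, p): p for p in paths}
        for done, future in enumerate(as_completed(futures), start=1):
            path = futures[future]
            try:
                results[path] = future.result()
            except Exception as exc:  # one bad file should not stop the batch
                errors[path] = str(exc)
            print(f"[{done}/{len(futures)}] {path}")  # simple progress line
    return results, errors


def count_rows(path):
    """Hypothetical per-file job: count non-empty lines."""
    with open(path, encoding="utf-8") as handle:
        return sum(1 for line in handle if line.strip())
```

Calling `process_all(["a.csv", "b.csv"], count_rows)` returns one dictionary of per-file results and one of per-file error messages, so a failed file is reported without aborting the rest of the batch.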
User Interface Options
- Command-line interface (CLI) for automation and scripting
- Streamlit-based GUI for interactive data exploration
- Comprehensive logging and error reporting
Reporting & Visualization
- Detailed QC reports in multiple formats
- Visual summaries of data quality metrics
- Export capabilities for downstream analysis
Installation
pip install phenoqc
Citation
If you use PhenoQC in your research, please cite:
@software{silva_phenoqc_2024,
  author = {Silva, Jorge Miguel Ferreira},
  title = {PhenoQC: Quality Control for Phenotypic Data},
  year = 2024,
  publisher = {Zenodo},
}
License
This project is licensed under the MIT License - see the LICENSE file for details.
PhenoQC 0.1.0
Changelog
All notable changes to PhenoQC will be documented in this file.
[0.1.0] - 2024-10-18
Added
- Initial release of PhenoQC
- Data validation functionality:
  - Schema validation using JSON schema
  - Format compliance checks
  - Integrity verification
  - Duplicate record detection
  - Conflicting record identification
- Ontology mapping feature:
  - Support for multiple ontologies (HPO, DO, MPO)
  - Custom mapping support
  - Synonym resolution
- Missing data detection and imputation:
  - Multiple imputation strategies (mean, median, mode, KNN, MICE, SVD)
  - Option to flag records with missing data
- Batch processing capability:
  - Support for multiple file processing
  - Parallel execution
- Command-line interface (CLI)
- Streamlit-based graphical user interface (GUI)
- Reporting and visualization:
  - PDF and Markdown report generation
  - Visual summaries using Plotly
- Support for CSV, TSV, and JSON input files
- Recursive directory scanning option
- Comprehensive logging system
- YAML-based configuration for ontology mappings and imputation strategies
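The difference between duplicate record detection and conflicting record identification, both listed above, can be sketched as follows. This is an assumption about what the two checks mean, not PhenoQC's code: records sharing a key are treated as duplicates when they are identical and as conflicts when they differ. The `sample_id` key and the function name are hypothetical.

```python
from collections import defaultdict


def find_duplicates_and_conflicts(records, key="sample_id"):
    """Group records by key; split repeated keys into exact duplicates and conflicts."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record)
    duplicates, conflicts = [], []
    for value, group in groups.items():
        if len(group) < 2:
            continue  # key appears once: nothing to report
        if all(r == group[0] for r in group[1:]):
            duplicates.append(value)  # same key, identical content
        else:
            conflicts.append(value)  # same key, differing content
    return duplicates, conflicts


records = [
    {"sample_id": "S1", "age": 30},
    {"sample_id": "S1", "age": 30},
    {"sample_id": "S2", "age": 41},
    {"sample_id": "S2", "age": 14},
    {"sample_id": "S3", "age": 5},
]
print(find_duplicates_and_conflicts(records))
```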
Changed
- Improved error handling and logging
- Updated documentation and examples
Fixed
- Resolved critical bugs related to data validation and ontology mapping