
Releases: jorgeMFS/PhenoQC

v1.2.1

14 Aug 15:06


v1.0.0

21 Jan 19:51
33cfe8f


PhenoQC v1.0.0 Release

Overview

PhenoQC is a toolkit for quality control of phenotypic datasets, covering validation, standardization, and harmonization. This release is our first stable version and brings these tools together for researchers working with phenotypic data.

Key Features

Data Validation & Schema Compliance

  • JSON schema-based validation for ensuring data structure consistency
  • Format validation for various input types (CSV, TSV, JSON)
  • Automated checks for data type consistency and required fields
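The required-field and type checks above can be illustrated with a small sketch. It assumes records arrive as Python dicts and implements only a tiny subset of JSON Schema (`required` plus per-field `type`); the schema, field names, and `validate_record` helper are hypothetical and are not PhenoQC's API. In practice a library such as `jsonschema` would cover the full specification.

```python
# Minimal sketch of schema-style record validation (hypothetical; not PhenoQC's API).
# Supports only a subset of JSON Schema: "required" fields and per-field "type".

# Map JSON Schema type names to Python types (subset only).
TYPE_MAP = {
    "string": str,
    "number": (int, float),
    "integer": int,
    "boolean": bool,
}

# Hypothetical schema for a phenotype record.
SCHEMA = {
    "required": ["SampleID", "Age"],
    "properties": {
        "SampleID": {"type": "string"},
        "Age": {"type": "number"},
        "Phenotype": {"type": "string"},
    },
}


def validate_record(record, schema):
    """Return a list of human-readable errors (empty if the record passes)."""
    errors = []
    for field in schema.get("required", []):
        if record.get(field) is None:
            errors.append(f"missing required field: {field}")
    for field, rules in schema.get("properties", {}).items():
        value = record.get(field)
        if value is None:
            continue  # absence is already reported by the "required" check
        expected = TYPE_MAP[rules["type"]]
        # bool is a subclass of int in Python, so reject it explicitly for non-boolean types.
        if isinstance(value, bool) and rules["type"] != "boolean":
            errors.append(f"{field}: expected {rules['type']}, got boolean")
        elif not isinstance(value, expected):
            errors.append(f"{field}: expected {rules['type']}, got {type(value).__name__}")
    return errors


print(validate_record({"SampleID": "S1", "Age": 34}, SCHEMA))         # → []
print(validate_record({"SampleID": "S2", "Age": "unknown"}, SCHEMA))  # → ['Age: expected number, got str']
```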

Ontology Integration & Mapping

  • Support for multiple ontologies (HPO, DO, MPO)
  • Fuzzy matching for term mapping with configurable thresholds
  • Custom mapping capabilities for specialized terminology
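Threshold-based fuzzy mapping can be sketched as follows. This is an illustration under stated assumptions, not PhenoQC's matcher: it uses the standard library's `difflib.SequenceMatcher` ratio as a stand-in similarity score, and the term dictionary uses placeholder IDs (`EX:0001`, `EX:0002`) rather than real HPO, DO, or MPO identifiers. The `map_term` helper and its `threshold` parameter are hypothetical.

```python
# Minimal sketch of threshold-based fuzzy term mapping (hypothetical; not PhenoQC's API).
# difflib's similarity ratio stands in for whatever scorer a real pipeline uses.
from difflib import SequenceMatcher

# Hypothetical ontology labels and synonyms -> placeholder term IDs.
TERMS = {
    "seizure": "EX:0001",
    "epileptic seizure": "EX:0001",  # synonym resolving to the same ID
    "hypertension": "EX:0002",
    "high blood pressure": "EX:0002",
}


def map_term(raw, terms, threshold=0.8):
    """Return (term_id, score) for the best match at or above threshold, else (None, best_score)."""
    query = raw.strip().lower()
    best_id, best_score = None, 0.0
    for label, term_id in terms.items():
        score = SequenceMatcher(None, query, label).ratio()
        if score > best_score:
            best_id, best_score = term_id, score
    if best_score >= threshold:
        return best_id, best_score
    return None, best_score  # below threshold: leave unmapped for manual review


print(map_term("Hypertenson", TERMS))  # misspelling still maps to EX:0002
print(map_term("xyz", TERMS))          # no label is similar enough, so term_id is None
```

Raising `threshold` trades recall for precision: fewer misspellings are mapped, but fewer wrong terms slip through.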

Missing Data Management

  • Multiple imputation strategies:
    • Basic: mean, median, mode
    • Advanced: KNN, MICE, SVD
  • Configurable handling of missing values
  • Quality metrics for imputation assessment
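The basic strategies (mean, median, mode) can be sketched with the standard library alone. This assumes missing values are represented as `None` in a plain list per column; the `impute_column` helper is hypothetical and not PhenoQC's implementation. It also returns per-row flags so imputed records can be tracked. The advanced strategies are not shown here; in Python they are commonly provided by scikit-learn's `KNNImputer` and `IterativeImputer` (a MICE-style approach).

```python
# Minimal sketch of basic column imputation (hypothetical helper; not PhenoQC's API).
# Missing values are None; only observed values feed the fill statistic.
import statistics

STRATEGIES = {
    "mean": statistics.mean,
    "median": statistics.median,
    "mode": statistics.mode,
}


def impute_column(values, strategy="mean"):
    """Return (imputed_values, missing_flags) for one column."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    fill = STRATEGIES[strategy](observed)
    imputed = [fill if v is None else v for v in values]
    flags = [v is None for v in values]  # lets callers flag records that were imputed
    return imputed, flags


ages = [34, None, 40, 28, None]
print(impute_column(ages, "median"))
# → ([34, 34, 40, 28, 34], [False, True, False, False, True])
```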

Batch Processing Capabilities

  • Parallel processing of multiple files
  • Support for CSV, TSV, and JSON inputs, with optional recursive directory scanning
  • Progress tracking and error handling
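Parallel batch processing with progress tracking and per-file error handling can be sketched with `concurrent.futures`. This is an assumed design, not PhenoQC's batch runner: `find_files`, `count_rows` (a stand-in QC step), and `process_batch` are hypothetical names. Threads suit I/O-bound work such as reading files.

```python
# Minimal sketch of parallel batch processing (hypothetical; not PhenoQC's API).
import csv
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path


def find_files(root, pattern="*.csv", recursive=False):
    """List matching files, optionally scanning subdirectories."""
    root = Path(root)
    return sorted(root.rglob(pattern) if recursive else root.glob(pattern))


def count_rows(path):
    """Hypothetical per-file QC step: count data rows in a CSV or TSV file."""
    path = Path(path)
    delimiter = "\t" if path.suffix == ".tsv" else ","
    with path.open(newline="") as handle:
        reader = csv.reader(handle, delimiter=delimiter)
        next(reader, None)  # skip the header row
        return sum(1 for _ in reader)


def process_batch(paths, qc_step=count_rows, workers=4):
    """Run qc_step on every path in parallel; return (results, errors) keyed by str(path)."""
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(qc_step, p): p for p in paths}
        for done, future in enumerate(as_completed(futures), start=1):
            key = str(futures[future])
            try:
                results[key] = future.result()
            except Exception as exc:  # one bad file must not stop the batch
                errors[key] = repr(exc)
            print(f"[{done}/{len(futures)}] {key}")  # simple progress tracking
    return results, errors
```

A typical call would be `process_batch(find_files("data", recursive=True))`. For CPU-bound QC steps, `ProcessPoolExecutor` offers the same `submit`/`as_completed` interface, provided the step function is importable at module level.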

User Interface Options

  • Command-line interface (CLI) for automation and scripting
  • Streamlit-based GUI for interactive data exploration
  • Comprehensive logging and error reporting

Reporting & Visualization

  • Detailed QC reports in PDF and Markdown
  • Visual summaries of data quality metrics, built with Plotly
  • Export capabilities for downstream analysis
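A per-column missingness summary rendered as Markdown gives a rough idea of what a text QC report can contain. The layout, the `missing_summary` and `markdown_report` helpers, and the choice to treat `None` and empty strings as missing are assumptions for illustration; they do not reproduce PhenoQC's actual report format.

```python
# Minimal sketch of a Markdown QC summary (hypothetical; not PhenoQC's report format).


def missing_summary(rows, columns):
    """Count missing values (absent, None, or empty string) per column."""
    summary = {}
    for col in columns:
        missing = sum(1 for row in rows if row.get(col) in (None, ""))
        pct = 100.0 * missing / len(rows) if rows else 0.0
        summary[col] = (missing, pct)
    return summary


def markdown_report(title, summary):
    """Render per-column missingness as a Markdown table."""
    lines = [
        f"# {title}",
        "",
        "| Column | Missing | Missing (%) |",
        "| --- | ---: | ---: |",
    ]
    for col, (count, pct) in summary.items():
        lines.append(f"| {col} | {count} | {pct:.1f} |")
    return "\n".join(lines) + "\n"


rows = [
    {"SampleID": "S1", "Age": 34},
    {"SampleID": "S2", "Age": None},
    {"SampleID": "S3", "Age": ""},
    {"SampleID": "S4"},
]
print(markdown_report("QC summary", missing_summary(rows, ["SampleID", "Age"])))
```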

Installation

pip install phenoqc

Citation

If you use PhenoQC in your research, please cite:

@software{silva_phenoqc_2024,
  author       = {Silva, Jorge Miguel Ferreira},
  title        = {PhenoQC: Quality Control for Phenotypic Data},
  year         = 2024,
  publisher    = {Zenodo},
}

License

This project is licensed under the MIT License - see the LICENSE file for details.

PhenoQC 0.1.0

18 Oct 12:22


Changelog

All notable changes to PhenoQC will be documented in this file.

[0.1.0] - 2024-10-18

Added

  • Initial release of PhenoQC
  • Data validation functionality:
    • Schema validation using JSON schema
    • Format compliance checks
    • Integrity verification
    • Duplicate record detection
    • Conflicting record identification
  • Ontology mapping feature:
    • Support for multiple ontologies (HPO, DO, MPO)
    • Custom mapping support
    • Synonym resolution
  • Missing data detection and imputation:
    • Multiple imputation strategies (mean, median, mode, KNN, MICE, SVD)
    • Option to flag records with missing data
  • Batch processing capability:
    • Support for multiple file processing
    • Parallel execution
  • Command-line interface (CLI)
  • Streamlit-based graphical user interface (GUI)
  • Reporting and visualization:
    • PDF and Markdown report generation
    • Visual summaries using Plotly
  • Support for CSV, TSV, and JSON input files
  • Recursive directory scanning option
  • Comprehensive logging system
  • YAML-based configuration for ontology mappings and imputation strategies
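The difference between duplicate and conflicting records can be made concrete with a short sketch. It assumes each record is a dict with hashable values and an identifier column, here called `SampleID` as an illustrative choice. Records sharing an ID count as duplicates when identical and as conflicts when any field differs. The `find_duplicates_and_conflicts` helper is hypothetical and not PhenoQC's implementation.

```python
# Minimal sketch of duplicate and conflict detection (hypothetical; not PhenoQC's API).
# Duplicates: same ID, identical records. Conflicts: same ID, differing field values.
from collections import defaultdict


def find_duplicates_and_conflicts(records, id_field="SampleID"):
    """Return (duplicate_ids, conflicting_ids) as sorted lists."""
    by_id = defaultdict(list)
    for record in records:
        by_id[record[id_field]].append(record)
    duplicates, conflicts = [], []
    for record_id, group in by_id.items():
        if len(group) < 2:
            continue
        # Freeze each record (values must be hashable) to count distinct versions.
        distinct = {tuple(sorted(r.items())) for r in group}
        if len(distinct) == 1:
            duplicates.append(record_id)
        else:
            conflicts.append(record_id)
    return sorted(duplicates), sorted(conflicts)


records = [
    {"SampleID": "S1", "Age": 34},
    {"SampleID": "S1", "Age": 34},  # exact duplicate
    {"SampleID": "S2", "Age": 40},
    {"SampleID": "S2", "Age": 41},  # same ID, conflicting Age
    {"SampleID": "S3", "Age": 28},
]
print(find_duplicates_and_conflicts(records))  # → (['S1'], ['S2'])
```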

Changed

  • Improved error handling and logging
  • Updated documentation and examples

Fixed

  • Resolved critical bugs related to data validation and ontology mapping