Add SCI-oriented sporotrichosis multinomial & ML analysis pipeline #2

Open
pifu-dabai wants to merge 1 commit into main from codex/conduct-statistical-analysis-and-create-visualizations

@pifu-dabai
Collaborator

Motivation

  • Provide a reusable, publication-oriented analysis pipeline for retrospective cutaneous sporotrichosis cohorts that standardizes core variables by lymphatic drainage anatomy and produces SCI-ready English figures and tables.
  • Enable multivariable multinomial logistic modeling of three non-ordinal diagnostic categories and compare machine-learning classifiers to the planned binary analyses.

Description

  • Add analysis/sporotrichosis_analysis.py, which standardizes diagnostic_type and season, excludes records with unknown onset sites, fits a multivariable multinomial logistic regression with statsmodels.MNLogit, and exports odds ratios (OR), 95% CIs, and p-values as CSV.
  • Implement plotting helpers that produce stacked bar, seasonal trend, and forest plots with black text suitable for medical journals, and export each figure via save_figure_all_formats as PDF (vector, 600 dpi), PNG (transparent, 600 dpi), and TIFF (300 dpi).
  • Add ML benchmarking (CatBoost when available, RandomForest, and multinomial LogisticRegression) using stratified cross-validation and export model comparison metrics to CSV.
  • Add analysis/README.md with usage examples, required input columns, and dependency instructions (pandas, numpy, matplotlib, seaborn, scikit-learn, statsmodels, optional catboost).
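
For reviewers, the OR/95% CI/p-value export described above can be sketched as follows. This is a minimal illustration and not the code in analysis/sporotrichosis_analysis.py: the function names `odds_ratio_row` and `write_or_table` and the CSV column names are hypothetical. It assumes the log-odds coefficients and standard errors are already available (in statsmodels they would typically come from the fitted result's `params` and `bse` attributes) and uses a Wald normal approximation.

```python
import csv
import io
import math
from statistics import NormalDist

# Hypothetical column names for the exported table.
FIELDS = ["term", "OR", "CI_lower", "CI_upper", "p_value"]


def odds_ratio_row(term, coef, se, level=0.95):
    """Turn one log-odds coefficient and its standard error into OR, CI, and Wald p-value."""
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    z = coef / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided Wald test
    return {
        "term": term,
        "OR": math.exp(coef),
        "CI_lower": math.exp(coef - z_crit * se),
        "CI_upper": math.exp(coef + z_crit * se),
        "p_value": p_value,
    }


def write_or_table(rows, handle):
    """Write rows produced by odds_ratio_row to an open text handle as CSV."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    for row in rows:
        writer.writerow(row)


# Illustrative input only: a coefficient of log(2) with a made-up standard error.
example = odds_ratio_row("age", math.log(2), 0.25)
buffer = io.StringIO()
write_or_table([example], buffer)
```

In a multinomial model there is one coefficient set per non-reference outcome category, so a table like this would be built once per category against the chosen reference.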

Testing

  • Ran Python byte-compile with python -m py_compile analysis/sporotrichosis_analysis.py, which completed successfully.
  • Created a synthetic analysis/sample.csv to exercise the pipeline and attempted to run python analysis/sporotrichosis_analysis.py --input analysis/sample.csv --output-dir analysis/out --covariates age sex. The run failed at import time because the plotting dependency matplotlib is not installed in the environment.
  • Static checks surfaced no other failures. Full end-to-end execution requires installing the listed plotting and ML dependencies and then re-running the script.
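
Because the end-to-end run stopped on a missing import, a pre-flight dependency check may help before re-running. The sketch below is not part of this PR; the helper names `missing_modules` and `check_dependencies` are hypothetical. It uses only the standard library, treats catboost as optional in line with the description above, and relies on the fact that scikit-learn is imported as `sklearn`.

```python
import importlib.util

# Import names for the dependencies listed in the PR description.
REQUIRED = ("pandas", "numpy", "matplotlib", "seaborn", "sklearn", "statsmodels")
OPTIONAL = ("catboost",)


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def check_dependencies(required=REQUIRED, optional=OPTIONAL):
    """Return (missing_required, missing_optional) without importing anything."""
    return missing_modules(required), missing_modules(optional)


if __name__ == "__main__":
    missing_required, missing_optional = check_dependencies()
    if missing_optional:
        print("Optional packages not found:", ", ".join(missing_optional))
    if missing_required:
        raise SystemExit("Missing required packages: " + ", ".join(missing_required))
    print("All required packages are available.")
```

Running this before the pipeline would report every missing package at once instead of failing on the first import.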
