
Organizations

@UPHL-BioNGS


Hi, I'm Erin Young.

ORCID

Bioinformatician | Regional Technical Lead | Open Source Contributor

I am a Senior Data Scientist and Technical Lead specializing in high-throughput genomic data engineering. Currently, I serve as the Bioinformatics Regional Resource for the Mountain Region, architecting scalable, reproducible workflows for public health surveillance.

My work focuses on workflow orchestration (Nextflow), containerization (Docker/Singularity), and cloud infrastructure (AWS) to turn petabytes of raw sequencing data into actionable epidemiological insights.


Technical Stack

  • Languages: Python (pandas, SciPy, pysam), R, Groovy, Bash
  • Workflow Orchestration: Nextflow (DSL2), Snakemake, WDL
  • Infrastructure: Docker, Singularity, AWS (Batch, S3, HealthOmics), GitHub Actions
  • Data Engineering: ETL pipeline design, algorithmic benchmarking, metadata governance

Featured Projects

Role: Lead Architect & Maintainer

The standard-of-care SARS-CoV-2 sequencing pipeline used by the CDC and public health laboratories across the US.

  • Tech: Nextflow, Docker, Singularity, AWS Batch.
  • Scale: Orchestrates alignment, variant calling, and lineage classification for thousands of concurrent samples.
  • Impact: CLIA-validated and deployed for real-time genomic surveillance.
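The alignment and variant-calling stages end in a consensus genome per sample. As a simplified illustration only (not code from the pipeline, which delegates this step to dedicated tools), the sketch below shows majority-rule consensus calling from a per-position pileup, masking low-coverage or ambiguous positions with `N`. The function name, the pileup dictionary format, and the `min_depth`/`min_freq` defaults are hypothetical.

```python
from collections import Counter


def call_consensus(reference: str, pileup: dict, min_depth: int = 10, min_freq: float = 0.6) -> str:
    """Build a consensus over the reference coordinates from observed read bases.

    pileup maps a 0-based reference position to the list of bases seen there.
    min_depth and min_freq are hypothetical defaults chosen for illustration.
    """
    consensus = []
    for pos in range(len(reference)):
        bases = pileup.get(pos, [])
        if len(bases) < min_depth:
            consensus.append("N")  # too little coverage to make a call
            continue
        base, count = Counter(bases).most_common(1)[0]
        # Accept the majority base only if it is dominant enough; otherwise mask it.
        consensus.append(base if count / len(bases) >= min_freq else "N")
    return "".join(consensus)


# Position 1 carries a supported T variant; position 2 is under-covered;
# position 3 is an even split, so both are masked.
example = {0: ["A"] * 12, 1: ["T"] * 10 + ["C"] * 2, 2: ["G"] * 3, 3: ["T"] * 6 + ["A"] * 6}
print(call_consensus("ACGT", example))  # ATNN
```

Masking rather than falling back to the reference base keeps unsupported positions from silently looking like reference calls, which matters when consensus sequences feed lineage classification.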

Role: Lead Maintainer

A command-line tool for unsupervised machine learning in genomic epidemiology.

  • Tech: Python, scikit-learn (PCA, silhouette analysis), fastcluster.
  • ML Features: Uses Auto-K optimization to identify lineage thresholds directly from the data, and PCA for cluster validation.
  • Performance: Optimized $O(N^2)$ clustering for large-scale distance matrices.
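To make the idea of data-driven thresholds concrete, here is a standard-library sketch of silhouette-guided threshold selection on a pairwise distance matrix. It is not the tool's implementation (which uses scikit-learn and fastcluster); the single-linkage cut, the helper names (`single_linkage_clusters`, `mean_silhouette`, `auto_threshold`), and the choice to try every distinct pairwise distance as a cut height are all assumptions made for illustration. The pairwise loops are $O(N^2)$ in the number of samples, in line with the complexity noted above.

```python
from itertools import combinations


def single_linkage_clusters(dist, threshold):
    """Label points, joining any pair whose distance is <= threshold (single-linkage cut)."""
    n = len(dist)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if dist[i][j] <= threshold:
            parent[find(i)] = find(j)
    roots = {}
    # Relabel roots as 0, 1, 2, ... in order of first appearance.
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


def mean_silhouette(dist, labels):
    """Mean silhouette score; singletons contribute 0. Needs at least two clusters."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue
        a = sum(dist[i][j] for j in own if j != i) / (len(own) - 1)
        b = min(sum(dist[i][j] for j in members) / len(members)
                for other, members in clusters.items() if other != lab)
        denom = max(a, b)
        total += (b - a) / denom if denom > 0 else 0.0
    return total / len(labels)


def auto_threshold(dist):
    """Return (score, threshold, labels) for the cut with the best mean silhouette."""
    n = len(dist)
    best = None
    for t in sorted({dist[i][j] for i, j in combinations(range(n), 2)}):
        labels = single_linkage_clusters(dist, t)
        k = max(labels) + 1
        if not 2 <= k <= n - 1:
            continue  # silhouette is undefined for k = 1 or k = n
        score = mean_silhouette(dist, labels)
        if best is None or score > best[0]:
            best = (score, t, labels)
    return best


# Two tight groups of samples separated by a large distance.
D = [[0, 1, 2, 10, 10], [1, 0, 1, 10, 10], [2, 1, 0, 10, 10],
     [10, 10, 10, 0, 1], [10, 10, 10, 1, 0]]
print(auto_threshold(D))
```

On this toy matrix the best cut separates samples 0 to 2 from samples 3 and 4 with a mean silhouette of about 0.88. Real SNP-distance matrices usually call for a compiled linkage implementation such as fastcluster rather than pure Python.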

Role: Core Maintainer

A community-driven repository for reproducible bioinformatics containers.

  • Tech: Docker, GitHub Actions CI/CD.
  • Impact: Solves the "it works on my machine" problem by providing version-controlled, public-health-grade images.
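Reproducibility with container images depends on referencing a specific version or digest rather than a floating tag. As a minimal sketch (a hypothetical helper, not part of the repository's CI), the function below flags Docker image references that would resolve to `latest`. The image names in the usage lines are made-up examples.

```python
def is_pinned(image_ref: str) -> bool:
    """Return True if a Docker image reference names a specific tag or digest."""
    if "@sha256:" in image_ref:
        return True  # content-addressed digests are immutable
    # Look only at the final path component: a registry host may contain a
    # port (e.g. "localhost:5000/..."), which is not a tag.
    name = image_ref.rsplit("/", 1)[-1]
    if ":" not in name:
        return False  # no tag: Docker falls back to "latest"
    return name.split(":", 1)[1] != "latest"


# Hypothetical image references, for illustration only.
for ref in ["staphb/example-tool:1.2.3", "staphb/example-tool", "staphb/example-tool:latest"]:
    print(ref, is_pinned(ref))
```

Pinning the exact tag (or digest) in a workflow means a rerun months later pulls the same software that produced the original result.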

Connect

Pinned

  1. StaPH-B/docker-builds (Public)

    📦 🐳 Dockerfiles and documentation on tools for public health bioinformatics

    Dockerfile · 221 stars · 137 forks

  2. UPHL-BioNGS/Cecret (Public)

    Reference-based consensus creation

    Nextflow · 59 stars · 29 forks

  3. UPHL-BioNGS/Donut_Falls (Public)

    Assembly of Nanopore Sequencing

    Nextflow · 17 stars · 7 forks

  4. update_mash_dist (Public)

    mash works best when given a mash dist file with a bunch of references.

    3 stars · 1 fork

  5. MinkeMap (Public)

    A Python-based Circular Genome Visualization Tool

    Python · 1 star

  6. heatcluster (Public template)

    Forked from DrB-S/heatcluster

    Creates a heat map with an accompanying cluster map for a SNP matrix

    Python
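The update_mash_dist description notes that Mash works best with a distance file covering many references. As an illustrative sketch only (not the repository's code), the function below reads standard `mash dist` tab-separated output (reference ID, query ID, Mash distance, p-value, matching hashes) and keeps the closest reference for each query. The function name and the file names in the example are hypothetical.

```python
import csv
import io


def closest_references(mash_dist_tsv: str) -> dict:
    """Map each query ID to (reference ID, distance) for its nearest reference."""
    best = {}
    for row in csv.reader(io.StringIO(mash_dist_tsv), delimiter="\t"):
        if not row:
            continue  # skip blank lines
        reference, query, distance = row[0], row[1], float(row[2])
        if query not in best or distance < best[query][1]:
            best[query] = (reference, distance)
    return best


# Hypothetical `mash dist` output with two references and two queries.
tsv = (
    "refA.fna\tsample1.fna\t0.02\t0\t900/1000\n"
    "refB.fna\tsample1.fna\t0.01\t0\t950/1000\n"
    "refA.fna\tsample2.fna\t0.05\t1e-10\t500/1000\n"
)
print(closest_references(tsv))
```

The more references the distance file contains, the more likely each sample's nearest match is a genuinely close relative rather than the best of a few distant options.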