Skip to content

Release/0.3.0#16

Open
qclayssen wants to merge 41 commits intodevfrom
release/0.3.0
Open

Release/0.3.0#16
qclayssen wants to merge 41 commits intodevfrom
release/0.3.0

Conversation

@qclayssen
Copy link
Collaborator

@qclayssen qclayssen commented Sep 15, 2025

Add

Upgrade

  • pcgrr v1.4.1 -> pcgr v2.2.5
  • gpgr v2.1.3 → v2.2.11
  • Improve logging

Removed

  • Removal of COSMIC count and ICGC PCAWG count from rescue logic

* update doc

*  bump  gpgr version
@qclayssen qclayssen self-assigned this Sep 15, 2025
qclayssen and others added 9 commits September 23, 2025 17:00
* Bump version: 0.2.15 → 0.2.16

* Change hrd facultative for cancer report (#17)

* change hrd file facultatif

* fix arg position and assign default value

* fix version (#18)

* Bump gpgr version to 2.2.0
* Bump version: 0.2.16 → 0.2.17

* bump gpgr to v2.2.1 in docker files (#21)
@qclayssen qclayssen force-pushed the release/0.3.0 branch 2 times, most recently from 54672c1 to 9c613d4 Compare October 17, 2025 00:38
@qclayssen qclayssen requested a review from Copilot October 24, 2025 00:46
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR upgrades PCGR from v1.4.1 to v2.2.1 and adds handling for hypermutated samples. The upgrade requires updating field names and annotations to match the new PCGR version's output format, while new functionality filters variants in hypermutated samples to stay within PCGR's variant count limits.

Key changes:

  • Updated PCGR/CPSR field names to match v2.2.1 output (e.g., PCGR_TIERPCGR_ACTIONABILITY_TIER)
  • Added hypermutation handling with variant filtering logic for samples exceeding 450,000 variants
  • Implemented chunked PCGR processing for large variant sets with parallel execution

Reviewed Changes

Copilot reviewed 19 out of 20 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
pyproject.toml Version bump to 0.2.17
docker/Dockerfile.pcgr Updated PCGR conda environment from v1.4.1 to v2.2.1
docker/Dockerfile.gpgr Updated r-gpgr from 2.1.3 to 2.2.11
conda/env/bolt_env.yml Added ncurses dependency
bolt/workflows/sv_somatic/annotate.py Added logging infrastructure
bolt/workflows/smlv_somatic/rescue.py Added logging infrastructure
bolt/workflows/smlv_somatic/report.py Added hypermutation variant filtering and updated PCGR integration
bolt/workflows/smlv_somatic/filter.py Updated field names for PCGR v2.2.1 and added helper function to handle missing values
bolt/workflows/smlv_somatic/annotate.py Refactored PCGR processing to support chunking and updated field names
bolt/workflows/smlv_germline/report.py Added logging and vep_dir parameter
bolt/workflows/other/cancer_report.py Made dragen_hrd_fp optional and added new signature analysis parameters
bolt/util.py Enhanced command execution with logging, added VCF splitting/merging utilities
bolt/logging_config.py New logging configuration module
bolt/common/pcgr.py Major refactor for PCGR v2.2.1 compatibility, chunked processing, and updated field mappings
bolt/common/constants.py Updated field names and added hypermutation filtering constants
tests/test_smlv_somatic_filter.py Updated test data to match new PCGR field names
README.md Updated version references to 0.2.17
CHANGELOG.md Added changelog entries for recent PRs
.bumpversion.cfg Updated version to 0.2.17
Comments suppressed due to low confidence (1)

bolt/common/pcgr.py:1

  • Missing spaces around the == operator. Should be if value == '' or value == '.':
import collections

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

qclayssen and others added 7 commits November 19, 2025 14:25
* update "HIGH_INF_POS" to "HIGH_I_I_POS" in VCF header for latest pcgr version

* update constant for pcgr 2.2.5
* change sage vcf header to match 2024 version

* move check function

* add check in rescue for sage vcf

* change for consistencency with pcgr
…arallelisation because OOM hypermutated samples
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 20 out of 21 changed files in this pull request and generated 26 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +49 to +52
pos='1',
vid='.',
ref='.',
alt='.',
ref='A',
alt='T',
Copy link

Copilot AI Jan 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test data changed from default values to specific values. The change from pos='.' to pos='1', ref='.' to ref='A', and alt='.' to alt='T' looks correct for creating valid test records, but ensure all tests that rely on this function still work correctly with these new default values.

Copilot uses AI. Check for mistakes.
Comment on lines +338 to +342

if (variant_count - filter_sum) <= constants.MAX_SOMATIC_VARIANTS:
break

filter_sum += len(variants_sorted.get(key, []))
Copy link

Copilot AI Jan 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logic error in hypermutation filtering: Line 339 checks if (variant_count - filter_sum) <= constants.MAX_SOMATIC_VARIANTS, but filter_sum represents variants to be filtered out, while the condition should check if remaining variants are under the limit. The correct logic should be checking if we've accumulated enough variants to filter to bring the total under the threshold. This could result in incorrect filtering behavior.

Suggested change
if (variant_count - filter_sum) <= constants.MAX_SOMATIC_VARIANTS:
break
filter_sum += len(variants_sorted.get(key, []))
cat_count = len(variants_sorted.get(key, []))
# Check if filtering the variants in this category as well would bring
# the total number of remaining variants under the PCGR limit
if (variant_count - (filter_sum + cat_count)) <= constants.MAX_SOMATIC_VARIANTS:
filter_categories.append(key)
break
filter_sum += cat_count

Copilot uses AI. Check for mistakes.
chunk_size = kwargs.get('pcgr_variant_chunk_size')
if chunk_size is not None and chunk_size <= 0:
raise click.BadParameter('must be a positive integer', param_hint='--pcgr_variant_chunk_size')
chunk_size = chunk_size or constants.MAX_SOMATIC_VARIANTS
Copy link

Copilot AI Jan 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing validation: Lines 104-105 check if pcgr_variant_chunk_size is negative or zero, but line 106 then allows the value to be None by using chunk_size = chunk_size or constants.MAX_SOMATIC_VARIANTS. However, if a user passes 0, the validation would raise an error. The logic should clarify whether 0 is acceptable (and treated as None) or not. Current implementation is inconsistent.

Suggested change
chunk_size = chunk_size or constants.MAX_SOMATIC_VARIANTS
if chunk_size is None:
chunk_size = constants.MAX_SOMATIC_VARIANTS

Copilot uses AI. Check for mistakes.
--solver libmamba \
--name pcgr \
--file https://raw.githubusercontent.com/sigven/pcgr/v1.4.1/conda/env/lock/pcgr-linux-64.lock
--file https://raw.githubusercontent.com/sigven/pcgr/refs/tags/v2.2.5/conda/env/lock/pcgr-linux-64.lock
Copy link

Copilot AI Jan 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Version mismatch: The PR description states "pcgrr v1.4.1 -> pcgr v2.2.1", but the Dockerfile is being updated to v2.2.5. The version should be consistent with what's documented in the PR description or the description should be updated to reflect v2.2.5.

Copilot uses AI. Check for mistakes.

# Set filter category for each variant
variants_sorted = collections.defaultdict(list)
for variant_count, variant in enumerate(cyvcf2.VCF(fp_annotated_out), 1):
Copy link

Copilot AI Jan 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Variable naming issue: Line 321 uses for variant_count, variant in enumerate(cyvcf2.VCF(fp_annotated_out), 1), which shadows the outer scope where variant_count might be used later (line 339). This can lead to confusion. Consider using a different variable name like idx or i for the enumeration.

Copilot uses AI. Check for mistakes.


import click
import logging
Copy link

Copilot AI Jan 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Import of 'logging' is not used.

Copilot uses AI. Check for mistakes.
qclayssen and others added 2 commits January 21, 2026 14:32
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@umccr umccr deleted a comment from Copilot AI Jan 21, 2026
@umccr umccr deleted a comment from Copilot AI Jan 21, 2026
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 20 out of 21 changed files in this pull request and generated 11 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 15 to +24
conda create \
--solver libmamba \
--name pcgr \
--file https://raw.githubusercontent.com/sigven/pcgr/v1.4.1/conda/env/lock/pcgr-linux-64.lock
--file https://raw.githubusercontent.com/sigven/pcgr/refs/tags/v2.2.5/conda/env/lock/pcgr-linux-64.lock

RUN \
conda create \
--solver libmamba \
--name pcgrr \
--file https://raw.githubusercontent.com/sigven/pcgr/v1.4.1/conda/env/lock/pcgrr-linux-64.lock
--file https://raw.githubusercontent.com/sigven/pcgr/refs/tags/v2.2.5/conda/env/lock/pcgrr-linux-64.lock
Copy link

Copilot AI Feb 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR description says PCGR is upgraded to v2.2.1, but this Dockerfile pulls lock files from tag v2.2.5. Either update the PR description/release notes to match, or change the Dockerfile to the intended PCGR version so the built image matches what’s documented.

Copilot uses AI. Check for mistakes.
Comment on lines 55 to 71
| Name | Docker image URI | Commands |
| --- | --- | --- |
| pcgr | ghcr.io/scwatts/bolt:0.2.14-pcgr | • `bolt smlv_germline report`<br />• `bolt smlv_somatic annotate`<br />• `bolt smlv_somatic report`<br /> |
| gpgr | ghcr.io/scwatts/bolt:0.2.14-gpgr | • `bolt other cancer_report` |
| snpeff | ghcr.io/scwatts/bolt:0.2.14-snpeff | • `bolt sv_somatic annotate` |
| circos | ghcr.io/scwatts/bolt:0.2.14-circos | • `bolt other purple_baf_plot` |
| multiqc | ghcr.io/scwatts/bolt:0.2.14-multiqc | • `bolt other multiqc_report` |
| base | ghcr.io/scwatts/bolt:0.2.14 | • `bolt smlv_germline prepare`<br />• `bolt smlv_somatic rescue`<br />• `bolt smlv_somatic filter`<br />• `bolt sv_somatic prioritise`<br /> |
| pcgr | ghcr.io/scwatts/bolt:0.2.17-pcgr | • `bolt smlv_germline report`<br />• `bolt smlv_somatic annotate`<br />• `bolt smlv_somatic report`<br /> |
| gpgr | ghcr.io/scwatts/bolt:0.2.17-gpgr | • `bolt other cancer_report` |
| snpeff | ghcr.io/scwatts/bolt:0.2.17-snpeff | • `bolt sv_somatic annotate` |
| circos | ghcr.io/scwatts/bolt:0.2.17-circos | • `bolt other purple_baf_plot` |
| multiqc | ghcr.io/scwatts/bolt:0.2.17-multiqc | • `bolt other multiqc_report` |
| base | ghcr.io/scwatts/bolt:0.2.17 | • `bolt smlv_germline prepare`<br />• `bolt smlv_somatic rescue`<br />• `bolt smlv_somatic filter`<br />• `bolt sv_somatic prioritise`<br /> |

## Usage

Given the nature of software dependencies required, it is strongly recommended to run `bolt` commands via the existing
[Docker images](#docker-images):

```bash
docker run -ti -v $(pwd):$(pwd) -w $(pwd) ghcr.io/scwatts/bolt:0.2.14 \
docker run -ti -v $(pwd):$(pwd) -w $(pwd) ghcr.io/scwatts/bolt:0.2.17 \
bolt smlv_somatic filter \
Copy link

Copilot AI Feb 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

README Docker image tags are set to 0.2.17, but the project version in pyproject.toml/.bumpversion.cfg is 0.3.0. Update the README image URIs and example docker run command to reference the same release version to avoid publishing/usage confusion.

Copilot uses AI. Check for mistakes.
CHANGELOG.md Outdated
- [3](https://github.com/scwatts/bolt/pull/3) - Improve PCGR / CPSR argument handling

- [6](https://github.com/umccr/bolt/pull/6) - Change oncoanalyser v2.0.0 uptade, with switch sv caller from GRIPSS to eSVee No newline at end of file
- [6](https://github.com/umccr/bolt/pull/6) - Change oncoanalyser v2.0.0 uptade, with switch sv caller from GRIPSS to eSVee
Copy link

Copilot AI Feb 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo in changelog entry: "uptade" should be "update".

Suggested change
- [6](https://github.com/umccr/bolt/pull/6) - Change oncoanalyser v2.0.0 uptade, with switch sv caller from GRIPSS to eSVee
- [6](https://github.com/umccr/bolt/pull/6) - Change oncoanalyser v2.0.0 update, with switch sv caller from GRIPSS to eSVee

Copilot uses AI. Check for mistakes.
Comment on lines +329 to +331
filter_category = (data['tier'], *variant_filter)
variant_repr = pcgr.get_variant_repr(variant)
variants_sorted[filter_category].append(variant_repr)
Copy link

Copilot AI Feb 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

filter_category prefixes with data['tier'] from PCGR_ACTIONABILITY_TIER (values like '1'..'4'/'N'), but pcgr.get_ordering() iterates tiers from constants.PCGR_TIERS_FILTERING (currently TIER_1…/NONCODING). With this mismatch, variants_sorted.get(key, []) won’t match and hypermutation down-selection won’t actually reduce the VCF below MAX_SOMATIC_VARIANTS. Align tier values (constants and/or mapping) so ordering keys match data['tier'].

Copilot uses AI. Check for mistakes.
Comment on lines 83 to 87
# - PCGR ACMG TIER [INFO/PCGR_TIER]
# - VEP consequence [INFO/PCR_CSQ]
# - Known mutation hotspot [INFO/PCGR_MUTATION_HOTSPOT]
# - ClinVar clinical significant [INFO/PCGR_CLINVAR_CLNSIG]
# - Hits in COSMIC [INFO/PCGR_COSMIC_COUNT]
# - ClinVar clinical significant [INFO/PCGR_CLNSIG]
# - Hits in TCGA [INFO/PCGR_TCGA_PANCANCER_COUNT]
Copy link

Copilot AI Feb 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These comments still reference legacy PCGR output fields (PCGR_TIER, PCGR_CLINVAR_CLNSIG, COSMIC/PCAWG counts), but the code/constants have moved to PCGR_ACTIONABILITY_TIER and PCGR_CLINVAR_CLASSIFICATION and no longer transfer COSMIC/PCAWG counts. Update the comment block to match the current annotations being transferred.

Copilot uses AI. Check for mistakes.
Comment on lines +47 to +53
PCGR_TIERS_FILTERING = (
'TIER_1',
'TIER_2',
}
'TIER_3',
'TIER_4',
'NONCODING',
)
Copy link

Copilot AI Feb 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PCGR_TIERS_FILTERING is defined as TIER_1NONCODING, but the code normalizes/uses actionability tiers as '1'..'4' / 'N' (see bolt/common/pcgr.py where PCGR_ACTIONABILITY_TIER is set). This mismatch breaks tier-based ordering/filtering (e.g., hypermutation down-selection). Align the constants with the actual tier values used in VCF/TSV, or add a mapping layer.

Copilot uses AI. Check for mistakes.
''')

# Set filter category for each variant
variants_sorted = collections.defaultdict(list)
Copy link

Copilot AI Feb 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

variant_count is introduced by the enumerate(..., 1) loop, but it’s later used after the loop to decide whether enough variants have been filtered out. If the annotated VCF has zero records, variant_count will be undefined and raise UnboundLocalError. Initialize variant_count = 0 before the loop (or compute total count explicitly).

Suggested change
variants_sorted = collections.defaultdict(list)
variants_sorted = collections.defaultdict(list)
variant_count = 0

Copilot uses AI. Check for mistakes.
Comment on lines +18 to +24
--file https://raw.githubusercontent.com/sigven/pcgr/refs/tags/v2.2.5/conda/env/lock/pcgr-linux-64.lock

RUN \
conda create \
--solver libmamba \
--name pcgrr \
--file https://raw.githubusercontent.com/sigven/pcgr/v1.4.1/conda/env/lock/pcgrr-linux-64.lock
--file https://raw.githubusercontent.com/sigven/pcgr/refs/tags/v2.2.5/conda/env/lock/pcgrr-linux-64.lock
Copy link

Copilot AI Feb 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Docker build downloads conda lockfiles for pcgr and pcgrr directly from raw.githubusercontent.com using only a tag reference, which can change over time. If the upstream repository or tag is compromised, an attacker could modify the environment definition to pull and execute malicious packages during image build or when the container runs, leading to a supply-chain compromise. To reduce this risk, pin these URLs to immutable commit SHAs or vendor the lockfiles into this repository and verify their integrity before use.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants