Skip to content

fix: Fix wrong start for intersecting CNV events#70

Open
hwalinga wants to merge 1 commit intovplagnol:masterfrom
hwalinga:19-fix-combining-intersecting-cnv-events
Open

fix: Fix wrong start for intersecting CNV events#70
hwalinga wants to merge 1 commit intovplagnol:masterfrom
hwalinga:19-fix-combining-intersecting-cnv-events

Conversation

@hwalinga
Copy link
Copy Markdown

Fixes #19 (duplicate #65)

See also for deeper investigation: Multiplicom#4 (comment)

Basically, when the Viterbi path switches from deletion to duplication or vice versa, the start variable is not updated. Thus eg the Viterbi path is 00000011111122220000, then the switch from 0 -> 1 is recording the start, and 1 -> 2 is saving that data, however not setting the start variable again when saving at 2 -> 0, thus "2" has the same start variable as "1".


Now for a test, you can use this script, modified from the vignette:

library(ExomeDepth)
data(ExomeCount)

# Let's flip some numbers to create a duplication adjacent to a deletion.
ExomeCount[22]$Exome1 <- 22 
ExomeCount[22]$Exome2 <- 48
ExomeCount[22]$Exome3 <- 12
ExomeCount[22]$Exome4 <- 44

ExomeCount[23]$Exome1 <- 68
ExomeCount[23]$Exome2 <- 132
ExomeCount[23]$Exome3 <- 46
ExomeCount[23]$Exome4 <- 136

ExomeCount[24]$Exome1 <- 40
ExomeCount[24]$Exome2 <- 88
ExomeCount[24]$Exome3 <- 26
ExomeCount[24]$Exome4 <- 90

ExomeCount[25]$Exome1 <- 28
ExomeCount[25]$Exome2 <- 60
ExomeCount[25]$Exome3 <- 22
ExomeCount[25]$Exome4 <- 16

ExomeCount.dafr <- as(ExomeCount, 'data.frame')
ExomeCount.dafr$chromosome <- gsub(as.character(ExomeCount.dafr$seqnames), 
                                        pattern = 'chr', 
                                        replacement = '')  ##remove the annoying chr letters

my.test <- ExomeCount$Exome4
my.ref.samples <- c('Exome1', 'Exome2', 'Exome3')
my.reference.set <- as.matrix(ExomeCount.dafr[, my.ref.samples])

# Adjustment from the Vignette because of https://github.com/vplagnol/ExomeDepth/pull/68
my.bin.length <- (ExomeCount.dafr$end - ExomeCount.dafr$start)/1000 
my.bin.length[my.bin.length == 0] <- 0.001

my.choice <- select.reference.set (test.counts = my.test, 
                                   reference.counts = my.reference.set, 
                                   bin.length = my.bin.length,
                                   n.bins.reduced = 10000)

my.matrix <- as.matrix( ExomeCount.dafr[, my.choice$reference.choice, drop = FALSE])
my.reference.selected <- apply(X = my.matrix, 
                               MAR = 1, 
                               FUN = sum)

all.exons <- new('ExomeDepth',
                 test = my.test,
                 reference = my.reference.selected,
                 formula = 'cbind(test, reference) ~ 1')

all.exons <- CallCNVs(x = all.exons, 
                      transition.probability = 10^-4, 
                      chromosome = ExomeCount.dafr$chromosome, 
                      start = ExomeCount.dafr$start, 
                      end = ExomeCount.dafr$end, 
                      name = ExomeCount.dafr$names)
head(all.exons@CNV.calls)

NB. I noticed the vignette was not 100% working anymore out of the box because of a check that #68 introduced.

The current implementation will now output (second CNV has start 22 from preceding duplication):

  start.p end.p        type nexons   start     end chromosome                                  
1      22    24 duplication      3   34556   36082          1                                                                                                                                  
2      22    27    deletion      3   34556   91106          1                                  
3      62    66    deletion      5  461753  523834          1                                                                                                                                  
4     100   103 duplication      4  743956  745551          1                                  
5     575   576    deletion      2 1569583 1570002          1                                                                                                                                  
6     587   591    deletion      5 1592941 1603069          1   

Now install my modified ExomeDepth from this branch:

library(devtools)
install_github("hwalinga/ExomeDepth@19-fix-combining-intersecting-cnv-events")

My fix now correctly outputs (second CNV has now correct start of 25):

  start.p end.p        type nexons   start     end chromosome
1      22    24 duplication      3   34556   36082          1
2      25    27    deletion      3   89553   91106          1
3      62    66    deletion      5  461753  523834          1
4     100   103 duplication      4  743956  745551          1
5     575   576    deletion      2 1569583 1570002          1
6     587   591    deletion      5 1592941 1603069          1

Fixes vplagnol#19 (duplicate vplagnol#65)

See also for deeper investigation: Multiplicom#4 (comment)

Basically, when the Viterbi path switches from deletion to duplication or vice versa,
the start variable is not updated. Thus eg the Viterbi path is 00000011111122220000,
then the switch from 0 -> 1 is recording the start, and 1 -> 2 is saving that data,
however not setting the start variable again when saving at 2 -> 0,
thus "2" has the same start variable as "1".
@hwalinga
Copy link
Copy Markdown
Author

Hi Vincent @vplagnol

Thanks a lot for the ExomeDepth package! And for still occasionally putting out a new release for this!

I looked into a long-standing bug that we occasionally encountered and was able to fix it at the root of where the bug comes from. Adding a reproducible example here as well to show exactly when this happened.

I hope a new release can have this fix, maybe now even release back on CRAN with 096bb38, but we are also very happy to get it from conda.

Regards,
Hielke

@hwalinga
Copy link
Copy Markdown
Author

hwalinga commented Oct 1, 2025

NB. I noticed the vignette was not 100% working anymore out of the box because of a check that #68 introduced.

Someone already opened an issue about it here #71

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Deletions with more reads than expected

1 participant