filterbarcodes creates duplicated bam lines #76

@Enterprise-J

I used the following command to subset the reads of a single cell from a 10x Multiome ATAC BAM file:

sinto filterbarcodes --bam atac_possorted_bam.bam --cells GTGATCAGTTGAGCCG-1.txt --outdir . --nproc 12

However, grepping for the read name LH00328:406:22GHLYLT4:2:2404:37243:13079 in the subset BAM file gives:

LH00328:406:22GHLYLT4:2:2404:37243:13079	65	chr16	58735131	60	150M	chr2	32916362	0	AAACTCGTGTATGTCAAATCCTGATAACTGTTTGTGGAAGGAAATGAATCGGTTTTGAAGGAAATGAATCGGTTTTAAAGGAAAAAGCAAATGAATTTCAGACACATCCACAAGTCAAACGAGTCTTTCTCATGTGTAGTCTATTTCTCT	-II9III-IIIIIII9IIIIIIIIIII9IIIIIIIIIIIIIIII9IIIIIIIII9IIIIIIIIIIIII9II-II9IIIIIIIIIIIIIIIIII-IIIIIII9I99IIII9IIIIIIIIIIIIIIIIIII9IIIII9II9III9IIIII9I	NM:i:0	MD:Z:150	AS:i:150	XS:i:21	CR:Z:TCCTGATCAGCAACTT	CY:Z:9III9I9I--I-9-I-	CB:Z:GTGATCAGTTGAGCCG-1	BC:Z:GGCGGAAT	QT:Z:9IIIII9I	RG:Z:S2_PGT_pool4:MissingLibrary:1:22GHLYLT4:2
LH00328:406:22GHLYLT4:2:2404:37243:13079	65	chr16	58735131	60	150M	chr2	32916362	0	AAACTCGTGTATGTCAAATCCTGATAACTGTTTGTGGAAGGAAATGAATCGGTTTTGAAGGAAATGAATCGGTTTTAAAGGAAAAAGCAAATGAATTTCAGACACATCCACAAGTCAAACGAGTCTTTCTCATGTGTAGTCTATTTCTCT	-II9III-IIIIIII9IIIIIIIIIII9IIIIIIIIIIIIIIII9IIIIIIIII9IIIIIIIIIIIII9II-II9IIIIIIIIIIIIIIIIII-IIIIIII9I99IIII9IIIIIIIIIIIIIIIIIII9IIIII9II9III9IIIII9I	NM:i:0	MD:Z:150	AS:i:150	XS:i:21	CR:Z:TCCTGATCAGCAACTT	CY:Z:9III9I9I--I-9-I-	CB:Z:GTGATCAGTTGAGCCG-1	BC:Z:GGCGGAAT	QT:Z:9IIIII9I	RG:Z:S2_PGT_pool4:MissingLibrary:1:22GHLYLT4:2
LH00328:406:22GHLYLT4:2:2404:37243:13079	129	chr2	32916362	0	51M99S	chr16	58735131	0	GGGGGGGGGGGGGGGGGGGGGGGGGGGAGGGGGGGGGGGAGGGGGGGGGGGAGAGAGAGCGGATAGGTATGGTAACAAGGAGGGAGTGCAGTAGCGCAGTCCTGGCTCACTGCAATGTTTTTTTTTTTTTTTTTTTTTTTTTGGTTGGGG	-9I9II9999IIIII9I99II999II9-99--999-99--999-99--9---9---------9-9-----9-------9---99-9-9-II--9I999-I--9I9-9-9-9--I-9-9--99-99I9---99-99-99III--9I-II-I	NM:i:1	MD:Z:27G23	AS:i:46	XS:i:46	SA:Z:chr16,58735808,-,14S54M82S,0,3;	CR:Z:TCCTGATCAGCAACTT	CY:Z:9III9I9I--I-9-I-	CB:Z:GTGATCAGTTGAGCCG-1	BC:Z:GGCGGAAT	QT:Z:9IIIII9I	RG:Z:S2_PGT_pool4:MissingLibrary:1:22GHLYLT4:2

Lines 1 and 2 are identical. The same grep command on the original atac_possorted_bam.bam returns only:

LH00328:406:22GHLYLT4:2:2404:37243:13079	65	chr16	58735131	60	150M	chr2	32916362	0	AAACTCGTGTATGTCAAATCCTGATAACTGTTTGTGGAAGGAAATGAATCGGTTTTGAAGGAAATGAATCGGTTTTAAAGGAAAAAGCAAATGAATTTCAGACACATCCACAAGTCAAACGAGTCTTTCTCATGTGTAGTCTATTTCTCT	-II9III-IIIIIII9IIIIIIIIIII9IIIIIIIIIIIIIIII9IIIIIIIII9IIIIIIIIIIIII9II-II9IIIIIIIIIIIIIIIIII-IIIIIII9I99IIII9IIIIIIIIIIIIIIIIIII9IIIII9II9III9IIIII9I	NM:i:0	MD:Z:150	AS:i:150	XS:i:21	CR:Z:TCCTGATCAGCAACTT	CY:Z:9III9I9I--I-9-I-	CB:Z:GTGATCAGTTGAGCCG-1	BC:Z:GGCGGAAT	QT:Z:9IIIII9I	RG:Z:S2_PGT_pool4:MissingLibrary:1:22GHLYLT4:2
LH00328:406:22GHLYLT4:2:2404:37243:13079	129	chr2	32916362	0	51M99S	chr16	58735131	0	GGGGGGGGGGGGGGGGGGGGGGGGGGGAGGGGGGGGGGGAGGGGGGGGGGGAGAGAGAGCGGATAGGTATGGTAACAAGGAGGGAGTGCAGTAGCGCAGTCCTGGCTCACTGCAATGTTTTTTTTTTTTTTTTTTTTTTTTTGGTTGGGG	-9I9II9999IIIII9I99II999II9-99--999-99--999-99--9---9---------9-9-----9-------9---99-9-9-II--9I999-I--9I9-9-9-9--I-9-9--99-99I9---99-99-99III--9I-II-I	NM:i:1	MD:Z:27G23	AS:i:46	XS:i:46	SA:Z:chr16,58735808,-,14S54M82S,0,3;	CR:Z:TCCTGATCAGCAACTT	CY:Z:9III9I9I--I-9-I-	CB:Z:GTGATCAGTTGAGCCG-1	BC:Z:GGCGGAAT	QT:Z:9IIIII9I	RG:Z:S2_PGT_pool4:MissingLibrary:1:22GHLYLT4:2

The original BAM file is too large to upload, but I would be happy to provide anything else you need to diagnose this.
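To check whether this affects more than one read, a minimal sketch like the following could count exact duplicate records across the whole subset file. It assumes the records are available as SAM text lines (for example from `samtools view`); the function name `find_duplicate_records` is hypothetical and not part of sinto.

```python
from collections import Counter


def find_duplicate_records(sam_lines):
    """Return {record: count} for SAM records that occur more than once.

    Header lines (starting with '@') and blank lines are ignored.
    Records are compared as whole lines, so only exact duplicates
    (same name, flag, position, sequence, and tags) are reported.
    """
    counts = Counter(
        line.rstrip("\n")
        for line in sam_lines
        if line.strip() and not line.startswith("@")
    )
    return {record: n for record, n in counts.items() if n > 1}
```

One way to use it would be to call `find_duplicate_records(sys.stdin)` in a small script and pipe the output of `samtools view` on the subset BAM into that script. All records are held in memory, which should be fine for a single-cell subset but probably not for the full atac_possorted_bam.bam.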
