How to deal with libraries that do not have UMIs? #75
Description
Hi,
I am trying to run sinto barcode on a FASTQ file from a custom single-cell DNA (not RNA) library. This library does not have UMIs. The cell barcode occupies the first 45 nt of read 2 and is followed by the genomic insert. The structure is:
NNNNNNNNAGGANNNNNNNNACTCNNNNNNNNAAGGNNNNNNNNT-Genomic Insert
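To make the layout above concrete, here is a minimal sketch that splits a read 2 sequence into its four 8-nt variable blocks using the fixed linkers (AGGA, ACTC, AAGG and the closing T). The names `BARCODE_PATTERN` and `split_barcode` are hypothetical helpers for illustration and are not part of sinto; the example sequence is the first whitelist entry followed by a made-up insert.

```python
import re

# Layout from the post: four 8-nt variable blocks separated by the fixed
# linkers AGGA, ACTC and AAGG, closed by a single T (45 nt in total).
BARCODE_PATTERN = re.compile(
    r"^([ACGTN]{8})AGGA([ACGTN]{8})ACTC([ACGTN]{8})AAGG([ACGTN]{8})T"
)


def split_barcode(read_seq):
    """Return the four variable blocks, or None if the linkers do not match."""
    match = BARCODE_PATTERN.match(read_seq)
    return match.groups() if match else None


# First whitelist entry plus a made-up 8-nt "genomic insert".
example = "GTAATGCCAGGATACAGCAGACTCTACAACCGAAGGGTAACCGAT" + "ACGTACGT"
print(split_barcode(example))
```

A read whose linkers are shifted or mutated returns `None`, which gives a quick way to see how many reads actually follow the expected structure.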
I am using sinto barcode simply to add the cell barcode to the read identifiers, with the following command:
sinto barcode --barcode_fastq "$r2" \
--read1 "$r1" \
--read2 "$r2" \
--bases 45 \
--whitelist "$WHITELIST" \
--suffix "$LIB_PREFIX"
Here, $r2 points to the read 2 FASTQ file, $r1 points to the read 1 file, and $WHITELIST points to a text file with the whitelist of known barcodes.
Unfortunately, after some time running I get the following error:
Function run_barcode called with the following arguments:
barcode_fastq /scratch/antwerpen/205/vsc20542/atrandi_scDNA/input_fastq/fastp/scDNA_AT_01_R2_clean.fastq
read1 /scratch/antwerpen/205/vsc20542/atrandi_scDNA/input_fastq/fastp/scDNA_AT_01_R1_clean.fastq
read2 /scratch/antwerpen/205/vsc20542/atrandi_scDNA/input_fastq/fastp/scDNA_AT_01_R2_clean.fastq
bases 45
prefix
suffix scDNA_AT_01
whitelist /scratch/antwerpen/205/vsc20542/atrandi_scDNA/whitelist.tsv
func <function run_barcode at 0x154c53258360>
Traceback (most recent call last):
File "/data/antwerpen/205/vsc20542/python_lib/bin/sinto", line 8, in <module>
sys.exit(main())
^^^^^^
File "/data/antwerpen/205/vsc20542/python_lib/lib/python3.12/site-packages/sinto/arguments.py", line 555, in main
options.func(options)
File "/data/antwerpen/205/vsc20542/python_lib/lib/python3.12/site-packages/sinto/utils.py", line 24, in wrapper
func(args)
File "/data/antwerpen/205/vsc20542/python_lib/lib/python3.12/site-packages/sinto/cli.py", line 105, in run_barcode
addbarcodes.addbarcodes(
File "/data/antwerpen/205/vsc20542/python_lib/lib/python3.12/site-packages/sinto/addbarcodes.py", line 101, in addbarcodes
barcodes = correct_barcodes(barcodes, whitelist)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/data/antwerpen/205/vsc20542/python_lib/lib/python3.12/site-packages/sinto/addbarcodes.py", line 49, in correct_barcodes
for entry in clusterer(counts, threshold=1):
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/data/antwerpen/205/vsc20542/python_lib/lib/python3.12/site-packages/umi_tools/network.py", line 368, in __call__
assert max(len_umis) == min(len_umis), (
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: not all umis are the same length(!): 43 - 56
Is it due to the lack of UMIs after the barcode sequence? If so, is there a way of making Sinto bypass the UMI detection?
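The assertion reports sequences between 43 and 56 nt long, so one possible check is whether every read 2 record really has at least 45 bases. The sketch below tallies the length of the first 45 nt of each read; `prefix_length_counts` is a hypothetical helper, not a sinto function, and it assumes plain four-line FASTQ records. It can only reveal prefixes shorter than 45 nt, so it would not explain the 56 on its own.

```python
from collections import Counter
from itertools import islice


def prefix_length_counts(fastq_lines, bases=45, max_reads=100_000):
    """Tally the length of the first `bases` nucleotides of each read."""
    counts = Counter()
    # In a four-line FASTQ record the sequence is the second line (index 1).
    seq_lines = islice(fastq_lines, 1, None, 4)
    for seq in islice(seq_lines, max_reads):
        counts[len(seq.strip()[:bases])] += 1
    return counts


# Tiny in-memory demonstration; the second read is only 43 nt long.
demo = [
    "@read1", "A" * 60, "+", "I" * 60,
    "@read2", "C" * 43, "+", "I" * 43,
]
print(prefix_length_counts(demo))
```

For a real file, pass an open handle instead of the list, for example `with open(path) as fh: print(prefix_length_counts(fh))`. Any key below 45 would indicate reads too short to carry the full barcode.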
Not sure if this helps, but here is the head of my whitelist.tsv file:
$ head whitelist.tsv
GTAATGCCAGGATACAGCAGACTCTACAACCGAAGGGTAACCGAT
GTAATGCCAGGATACAGCAGACTCTACAACCGAAGGTCCTCAACT
GTAATGCCAGGATACAGCAGACTCTACAACCGAAGGTGGTCTCAT
GTAATGCCAGGATACAGCAGACTCTACAACCGAAGGGTCCGATTT
GTAATGCCAGGATACAGCAGACTCTACAACCGAAGGTTGACCACT
GTAATGCCAGGATACAGCAGACTCTACAACCGAAGGTCCAGGATT
GTAATGCCAGGATACAGCAGACTCTACAACCGAAGGGACAGCATT
GTAATGCCAGGATACAGCAGACTCTACAACCGAAGGGATGGTCTT
GTAATGCCAGGATACAGCAGACTCTACAACCGAAGGCATACCGTT
GTAATGCCAGGATACAGCAGACTCTACAACCGAAGGCGGTTGATT
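The ten entries shown are all 45 nt, but `head` only covers the top of the file. A minimal sketch for checking the whole whitelist is below; `whitelist_length_counts` is a hypothetical helper, and the third demo line carries a made-up extra tab-separated column purely to show what a stray field would look like in the tally.

```python
from collections import Counter


def whitelist_length_counts(lines):
    """Count whitelist entries by length, using each raw line minus its newline."""
    counts = Counter()
    for line in lines:
        entry = line.rstrip("\n")
        if entry:  # skip blank lines
            counts[len(entry)] += 1
    return counts


demo = [
    "GTAATGCCAGGATACAGCAGACTCTACAACCGAAGGGTAACCGAT\n",
    "GTAATGCCAGGATACAGCAGACTCTACAACCGAAGGTCCTCAACT\n",
    # Hypothetical line with an extra tab-separated column.
    "GTAATGCCAGGATACAGCAGACTCTACAACCGAAGGTGGTCTCAT\tlabel\n",
]
print(whitelist_length_counts(demo))
```

Only the newline is stripped, so trailing spaces, tabs or extra columns count towards the length. If the real file reports any length other than 45, those lines are worth inspecting before rerunning sinto.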