ref: /home/nshaikhutdinov/working_directory/genome_hg38/hg38.fa
ref_name: hg38
n_chunks: 1 # split bam file across x chunks
max_t: 4 # use X threads per chunk
manifest: config/config_targeted_project.tbl # table with samples to process
keep_chromosomes: chr4 # only keep chrs matching this regex.
keep_chromosomes: chr7
keep_chromosomes: chr20
## Force a read coverage instead of calculating it genome wide from the bam file.
## This can be useful if only a subset of the genome has reads.
#force_coverage: 50
## Regions to exclude when identifying null regions that should not have REs; below are the defaults used automatically for hg38.
excludes:
- workflow/annotations/hg38.fa.sorted.bed
#- workflow/annotations/hg38.gap.bed.gz
#- workflow/annotations/SDs.merged.hg38.bed.gz
## you can optionally specify a model that is not the default.
# model: models/my-custom-model.dat
##
## only used if training a new model
##
# train: True
# dhs: workflow/annotations/GM12878_DHS.bed.gz # regions of suspected regulatory elements
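A side note on the config above: if the pipeline parses it with a standard YAML loader, the three repeated keep_chromosomes keys collapse so that only the last value (chr20) survives, whereas a single regex value would keep all three. A minimal sketch (plain Python stands in for the YAML loader; the regex form is an assumed alternative we have not tested with FIRE):

```python
import re

# Repeated keys in a mapping collapse to the last value, as in YAML:
cfg = {}
for value in ("chr4", "chr7", "chr20"):
    cfg["keep_chromosomes"] = value
print(cfg["keep_chromosomes"])  # chr20 -- chr4 and chr7 are silently dropped

# A single regex value keeps all three chromosomes (assumed alternative):
pattern = re.compile(r"^(chr4|chr7|chr20)$")
for chrom in ("chr4", "chr7", "chr17", "chr20"):
    print(chrom, bool(pattern.match(chrom)))  # chr17 is correctly rejected
```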
Building DAG of jobs...
Your conda installation is not configured to use strict channel priorities. This is however crucial for having robust and correct environments (for details, see https://conda-forge.org/docs/user/tipsandtricks.html). Please consider to configure strict priorities by executing 'conda config --set channel_priority strict'.
Using shell: /usr/bin/bash
Provided cores: 8
Rules claiming more threads will be scaled down.
Provided resources: mem_mb=204800, mem_mib=195313, disk_mb=4096, disk_mib=3907, time=100440, gpus=0
Select jobs to execute...
[Thu Dec 28 16:04:53 2023]
rule fdr_table:
    input: results/bc2031/fiber-calls/FIRE.bed.gz, results/bc2031/coverage/filtered-for-coverage/fiber-locations.bed.gz, results/bc2031/coverage/filtered-for-coverage/fiber-locations-shuffled.bed.gz, /home/nshaikhutdinov/working_directory/genome_hg38/hg38.fa.fai
    output: results/bc2031/FDR-peaks/FIRE.score.to.FDR.tbl
    jobid: 0
    reason: Forced execution
    wildcards: sm=bc2031
    threads: 8
    resources: mem_mb=204800, mem_mib=195313, disk_mb=4096, disk_mib=3907, tmpdir=/tmp, time=100440, gpus=0

python /home/nshaikhutdinov/.cache/snakemake/snakemake/source-cache/runtime-cache/tmpiwuex449/file/net/seq/pacbio/fiberseq_processing/fiberseq/fire_analysis_v0.0.2/fiberseq-fire/workflow/rules/../scripts/fire-null-distribution.py -v 1 results/bc2031/fiber-calls/FIRE.bed.gz results/bc2031/coverage/filtered-for-coverage/fiber-locations.bed.gz /home/nshaikhutdinov/working_directory/genome_hg38/hg38.fa.fai -s results/bc2031/coverage/filtered-for-coverage/fiber-locations-shuffled.bed.gz -o results/bc2031/FDR-peaks/FIRE.score.to.FDR.tbl
Activating conda environment: ../../../../../../../home/nshaikhutdinov/FIRE/env/72529d38651d38b3fc44b5aae6fe7a22_
[INFO][Time elapsed (ms) 1068]: Reading FIRE file: results/bc2031/fiber-calls/FIRE.bed.gz
/home/nshaikhutdinov/.cache/snakemake/snakemake/source-cache/runtime-cache/tmpiwuex449/file/net/seq/pacbio/fiberseq_processing/fiberseq/fire_analysis_v0.0.2/fiberseq-fire/workflow/rules/../scripts/fire-null-distribution.py:486: DeprecationWarning: `the argument comment_char` for `read_csv` is deprecated. It has been renamed to `comment_prefix`.
fire = pl.read_csv(
[INFO][Time elapsed (ms) 1082]: Reading genome file: /home/nshaikhutdinov/working_directory/genome_hg38/hg38.fa.fai
[INFO][Time elapsed (ms) 1085]: Reading fiber locations file: results/bc2031/coverage/filtered-for-coverage/fiber-locations.bed.gz
[INFO][Time elapsed (ms) 1095]: Reading shuffled fiber locations file: results/bc2031/coverage/filtered-for-coverage/fiber-locations-shuffled.bed.gz
Traceback (most recent call last):
  File "/home/nshaikhutdinov/.cache/snakemake/snakemake/source-cache/runtime-cache/tmpiwuex449/file/net/seq/pacbio/fiberseq_processing/fiberseq/fire_analysis_v0.0.2/fiberseq-fire/workflow/rules/../scripts/fire-null-distribution.py", line 539, in <module>
    defopt.run(main, show_types=True, version="0.0.1")
  File "/home/nshaikhutdinov/.local/lib/python3.11/site-packages/defopt.py", line 356, in run
    return call()
           ^^^^^^
  File "/home/nshaikhutdinov/.cache/snakemake/snakemake/source-cache/runtime-cache/tmpiwuex449/file/net/seq/pacbio/fiberseq_processing/fiberseq/fire_analysis_v0.0.2/fiberseq-fire/workflow/rules/../scripts/fire-null-distribution.py", line 517, in main
    shuffled_locations = pl.read_csv(
                         ^^^^^^^^^^^^
  File "/home/nshaikhutdinov/.local/lib/python3.11/site-packages/polars/utils/deprecation.py", line 100, in wrapper
    return function(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nshaikhutdinov/.local/lib/python3.11/site-packages/polars/io/csv/functions.py", line 369, in read_csv
    df = pl.DataFrame._read_csv(
         ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nshaikhutdinov/.local/lib/python3.11/site-packages/polars/dataframe/frame.py", line 784, in _read_csv
    self._df = PyDataFrame.read_csv(
               ^^^^^^^^^^^^^^^^^^^^^
polars.exceptions.NoDataError: empty CSV
[Thu Dec 28 16:04:54 2023]
Error in rule fdr_table:
    jobid: 0
    input: results/bc2031/fiber-calls/FIRE.bed.gz, results/bc2031/coverage/filtered-for-coverage/fiber-locations.bed.gz, results/bc2031/coverage/filtered-for-coverage/fiber-locations-shuffled.bed.gz, /home/nshaikhutdinov/working_directory/genome_hg38/hg38.fa.fai
    output: results/bc2031/FDR-peaks/FIRE.score.to.FDR.tbl
    conda-env: /home/nshaikhutdinov/FIRE/env/72529d38651d38b3fc44b5aae6fe7a22_
    shell:
        python /home/nshaikhutdinov/.cache/snakemake/snakemake/source-cache/runtime-cache/tmpiwuex449/file/net/seq/pacbio/fiberseq_processing/fiberseq/fire_analysis_v0.0.2/fiberseq-fire/workflow/rules/../scripts/fire-null-distribution.py -v 1 results/bc2031/fiber-calls/FIRE.bed.gz results/bc2031/coverage/filtered-for-coverage/fiber-locations.bed.gz /home/nshaikhutdinov/working_directory/genome_hg38/hg38.fa.fai -s results/bc2031/coverage/filtered-for-coverage/fiber-locations-shuffled.bed.gz -o results/bc2031/FDR-peaks/FIRE.score.to.FDR.tbl
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Index(['bc2029', 'bc2031', 'bc2025', 'bc2027', 'bc2026', 'bc2032', 'bc2030',
'bc2028'],
dtype='object', name='sample')
chr1 1 248956422
chr10 1 133797422
chr11 1 135086622
chr12 1 133275309
chr13 1 114364328
chr14 1 107043718
chr15 1 101991189
chr16 1 90338345
chr17 1 83257441
chr18 1 80373285
chr19 1 58617616
chr2 1 242193529
chr20 1 4680670
chr20 4690391 64444167
chr21 1 46709983
chr22 1 50818468
chr3 1 198295559
chr4 1 3072454
chr4 3077294 190214555
chr5 1 181538259
chr6 1 170805979
chr7 1 140917955
chr7 140927420 159345973
chr8 1 145138636
chr9 1 138394717
chrM 1 16569
chrX 1 156040895
chrY 1 57227415
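For reference, the BED above is the per-chromosome complement of our target intervals; the two chr4 rows, for example, imply a single target at 3072454-3077294. A minimal sketch of that interval arithmetic (our own stand-in for a bedtools-style complement, with coordinates taken from the rows above):

```python
def complement(chrom_len, targets, start=1):
    """Return the intervals of [start, chrom_len] not covered by targets."""
    out, pos = [], start
    for s, e in sorted(targets):
        if s > pos:
            out.append((pos, s))  # gap before this target
        pos = max(pos, e)
    if pos < chrom_len:
        out.append((pos, chrom_len))  # tail after the last target
    return out

# chr4: target inferred from the two complement rows above
print(complement(190214555, [(3072454, 3077294)]))
# [(1, 3072454), (3077294, 190214555)]

# chr1 has no target, so the complement spans the whole chromosome
print(complement(248956422, []))
# [(1, 248956422)]
```

Note that this exclusion set covers almost the entire genome, leaving very little space for shuffled fiber locations to land, which may be why the shuffled file comes out empty.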
Hi Mitchell!
We've tried to run FIRE on targeted sequencing data, and the pipeline fails with "polars.exceptions.NoDataError: empty CSV" because fiber-locations-shuffled.bed.gz is created empty.
The bed file with complement to targeted regions was used for exclusion in filtered_and_shuffled_fiber_locations_chromosome.
What could be the issue in our usage of FIRE? Is FIRE suitable for such a task?
Our config yaml, an example of the error log, and the exclusion bed file are included above.
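To illustrate the failure mode: fiber-locations-shuffled.bed.gz is a valid but empty gzip, and any reader that requires at least one data row fails on it, which matches the NoDataError in the traceback above. A stdlib-only sketch (csv stands in for polars.read_csv; the temporary path is just for the demo):

```python
import csv
import gzip
import os
import tempfile

# Write an empty-but-valid gzip, mimicking fiber-locations-shuffled.bed.gz
path = os.path.join(tempfile.mkdtemp(), "fiber-locations-shuffled.bed.gz")
with gzip.open(path, "wt") as f:
    pass  # zero records

# Reading it back yields no rows at all, so there is nothing to parse;
# polars.read_csv raises NoDataError("empty CSV") in exactly this situation.
with gzip.open(path, "rt") as f:
    rows = list(csv.reader(f, delimiter="\t"))
print(len(rows))  # 0
```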