RepeatModeler stalls at "refining families" step #302
Describe the bug
Hi there! :)
When running the pipeline via the Singularity container on our HPC (slurm scheduler), RepeatModeler will SOMETIMES stall out during Round 1 at the RepeatScout "refining families" step. Confusingly, it does not happen for every genome, despite using the same script and settings.
E.g. I have 4 closely related bird species. The genomes are fairly average for birds: ~1 Gb, not repeat-dense or polyploid. For one species, the black-footed albatross, the pipeline ran successfully in under a day, while the other species all stall at the same point.
To Reproduce
Here is my slurm submission script:
#!/bin/bash
#SBATCH -n 12
#SBATCH -N 1
#SBATCH -t 3-0:00
#SBATCH -p holy-cow
#SBATCH --mem=92G
#SBATCH -o logs/repeat_modeler_%j.out
#SBATCH -e logs/repeat_modeler_%j.err
export BLAST_USAGE_REPORT=false
FASTA="/n/netscratch/informatics/Lab/dkhost/albatross/pacbio_hifi_assembly/workflow/results/assembly/layAlba.p_ctg.fa"
SAMPLE="layAlba"
./dfam-tetools.sh --container dfam-tetools-latest.sif -- BuildDatabase -name $SAMPLE $FASTA
./dfam-tetools.sh --container dfam-tetools-latest.sif -- RepeatModeler -database $SAMPLE -threads 12 -LTRStruct
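In case it helps, this is roughly the sanity check I ran on the assembly beforehand (a sketch, not part of the original run; it just counts sequences/bp and flags any sequence lines with characters outside the IUPAC nucleotide alphabet, using the same $FASTA variable as the script above):

```shell
# Hypothetical FASTA sanity check; reads $FASTA from the environment,
# falls back to /dev/null so it is safe to run standalone.
awk '/^>/ { n++; next }
     { bp += length($0)
       if ($0 ~ /[^ACGTUNacgtunRYSWKMBDHVryswkmbdhv]/) bad++ }
     END { printf "seqs=%d bp=%d lines_with_odd_chars=%d\n", n, bp, bad + 0 }' \
    "${FASTA:-/dev/null}"
```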
Here are the relevant lines of the log file:
...
RepeatModeler Round # 1
========================
Searching for Repeats
-- Sampling from the database...
- Gathering up to 40000000 bp
- Final Sample Size = 40001684 bp ( 40001684 non ambiguous )
- Num Contigs Represented = 103
- Sequence extraction : 00:00:10 (hh:mm:ss) Elapsed Time
-- Running RepeatScout on the sequences...
- RepeatScout: Running build_lmer_table ( l = 14, min = 10 )..
- RepeatScout: Running RepeatScout.. : 77 raw families identified
- RepeatScout: Running filtering stage.. 66 families remaining
- RepeatScout: 00:03:37 (hh:mm:ss) Elapsed Time
- Collecting repeat instances...
- Refining 64 families...
The pipeline runs for multiple days (my submission script requests 3 days, but I have let it run for over a week) and never makes it past this step. All the failing samples appear to stall at the same point.
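While stalled, I poked at the compute node with something like the sketch below (the process names and the RM_* working-directory pattern are guesses from memory, so adjust them to what the run actually produces):

```shell
# Run on the compute node (e.g. `srun --jobid=<jobid> --pty bash`)
# while the job appears stalled.

# 1. Are the refiner processes still accumulating CPU time? If the
#    TIME column keeps growing between checks, the step is slow but
#    alive; if it freezes, the process is blocked (I/O, network, ...).
ps -eo pid,stat,etime,time,args \
  | grep -iE 'RepeatModeler|Refiner|rmblast' \
  | grep -v grep \
  || echo "no matching processes found"

# 2. Is the RM_* working directory (created next to the database)
#    still being written to? A recent modification time means progress.
ls -dlt RM_*/ 2>/dev/null \
  || echo "no RM_* working directory here"
```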
Expected behavior
For the species that ran successfully, the log files show this same step completing in about an hour, and the whole run finished in under a day with the exact same submission script:
...
RepeatModeler Round # 1
========================
Searching for Repeats
-- Sampling from the database...
- Gathering up to 40000000 bp
- Final Sample Size = 40024074 bp ( 40024074 non ambiguous )
- Num Contigs Represented = 120
- Sequence extraction : 00:00:06 (hh:mm:ss) Elapsed Time
-- Running RepeatScout on the sequences...
- RepeatScout: Running build_lmer_table ( l = 14, min = 10 )..
- RepeatScout: Running RepeatScout.. : 98 raw families identified
- RepeatScout: Running filtering stage.. 82 families remaining
- RepeatScout: 00:03:36 (hh:mm:ss) Elapsed Time
- Collecting repeat instances...
- Refining 77 families... 01:07:45 (hh:mm:ss) Elapsed Time
- Redundant Families and Large Satellite Filtering..
: 16 satellite(s), 24 contained, found in 00:00:07 (hh:mm:ss) Elapsed Time
Family Refinement: 00:00:07 (hh:mm:ss) Elapsed Time
Round Time: 01:11:37 (hh:mm:ss) Elapsed Time : 38 families discovered.
Host system (please complete as much of the following information as you can find out):
Our cluster runs Rocky Linux 8.10 with the slurm scheduler.
I am running on a single node using 12 cores.
Singularity version: 4.3.3-3.el8 (installed on the cluster by admins; Docker is not usable on our cluster).
I downloaded the container manually from the latest release (1.94).
Additional context
As far as I can tell, there is nothing special about these genomes that would cause them to fail; the FASTA formatting looks OK, etc. I have also run the pipeline successfully on other organisms, using the same version of the pipeline (if I recall correctly). Could the issue be something with our filesystem, perhaps? I had read somewhere that the container attempts to contact the NCBI servers, which can cause issues on clusters, though I thought I had disabled that option (hence the BLAST_USAGE_REPORT=false line in my submission script)...
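On that env-var point, I'm not actually sure the setting reaches the inside of the container. A sketch of what I could try next, assuming Singularity's documented SINGULARITYENV_ prefix and that dfam-tetools.sh passes the host environment through unmodified:

```shell
# Plain exported variables are not propagated into a Singularity
# container in every configuration; the SINGULARITYENV_ prefix injects
# one explicitly: SINGULARITYENV_FOO on the host appears as FOO inside
# the container.
export SINGULARITYENV_BLAST_USAGE_REPORT=false

# Hypothetical verification that the variable actually arrives
# (needs the .sif in the current directory, so commented out here):
# singularity exec dfam-tetools-latest.sif printenv BLAST_USAGE_REPORT
```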
Any feedback would be appreciated! :)
-Danielle