Demultiplex.vsh

Demultiplex.vsh is a workflow for demultiplexing raw sequencing data. Data from Illumina and Element Biosciences sequencers is currently supported.

Introduction

This workflow is designed to demultiplex raw RNA-seq data from Illumina and Element Biosciences sequencers.

The workflow is built in a modular fashion, where most of the base functionality is provided by components from biobox supplemented by custom base components and workflow components in this package. Each of these components can be used independently as stand-alone modules with a standardized interface.

The full workflow can be run in two ways:

  1. Run the main workflow containing the main functionality.
  2. Run the (opinionated) runner, where a number of choices (input/output structure and location) have already been made.

Workflow Overview

The workflow executes the following steps:

  1. Unpack the input data (when a TAR archive is provided)
  2. Run bclconvert or bases2fastq
  3. Run falco and convert Illumina InterOp information to CSV
  4. Run multiqc to generate a report

Example usage

Two variants of the same workflow are provided, depending on how much flexibility in the output structure is required:

  • The runner workflow provides a predefined output structure. It requires the fewest parameters, at the cost of being less flexible. It is located at target/nextflow/runner/main.nf
  • The demultiplex workflow (target/nextflow/demultiplex/main.nf) allows for more fine-grained tuning, but requires more parameters to be provided.

Test data

We have provided test data at gs://viash-hub-resources/demultiplex/v3/demultiplex_htrnaseq_meta/SingleCell-RNA_P3_2 (Illumina), but please feel free to bring your own. The URL of the test data can be provided as-is to the workflow, or you can download everything and specify a local path.

The input data should follow the structure of either Illumina or Element Biosciences sequencers. The workflow will automatically detect which demultiplexer to use (bclconvert or bases2fastq) based on the presence of either SampleSheet.csv or RunParameters.xml in the input directory. The demultiplexer can also be set explicitly using the --demultiplexer parameter.
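
To work from a local copy of the test data instead of the gs:// URL, something along these lines may help. This is only a sketch: it assumes gsutil (from the Google Cloud SDK) is installed and that the bucket is readable with your credentials. The target directory testdata and the function name download_test_data are hypothetical.

```shell
# Test data location, taken from this README.
src="gs://viash-hub-resources/demultiplex/v3/demultiplex_htrnaseq_meta/SingleCell-RNA_P3_2"

# Hypothetical local target directory.
dest="testdata"

# When the destination directory already exists, gsutil normally places
# the copied folder inside it, so this is the expected local input path.
local_input="$dest/$(basename "$src")"

download_test_data() {
  mkdir -p "$dest"
  # -m parallelises the transfer, -r copies the folder recursively.
  gsutil -m cp -r "$src" "$dest/"
}

echo "After calling download_test_data, pass --input $local_input"
```

Call download_test_data once, then use the printed path as the value of --input.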

Setup

In order to use the workflows in this package, you’ll need to do the following:

  • Install Nextflow
  • Install a Nextflow-compatible executor. This workflow provides a profile for docker.

Run from Viash Hub

  1. Open Viash Hub and browse to the demultiplex component. Press the ‘Launch’ button and follow the instructions.

  2. Start an example run and set the profile to docker.

  3. In the next step, provide the parameters as follows and leave the rest at their defaults:
  • input: gs://viash-hub-resources/demultiplex/v3/demultiplex_htrnaseq_meta/SingleCell-RNA_P3_2

Press the ‘Launch’ button at the end to get instructions on how to run the workflow from the CLI.

Run using NF-Tower / Seqera Cloud

It’s possible to run the workflow directly from Seqera Cloud. The necessary Nextflow schema file has been built and provided with the workflows in order to use the form-based input.

  1. Select the option to run the workflow using Seqera Cloud. You will need to create an API token for your account. Once this token has been entered in the corresponding field, you can select a ‘Workspace’ and a ‘Compute environment’.

  2. Provide the parameters as in the previous section.

  3. In the next screen, pressing the ‘Launch’ button starts the workflow on Seqera Cloud. A message is shown when the submission succeeds.

Setting up SCM

In order to let Nextflow use the viash-hub workflows, you need to set up an SCM file. This only needs to be done once, by creating $HOME/.nextflow/scm and adding the following:

providers {
  vsh {
    platform = 'gitlab'
    server = "packages.viash-hub.com"
  }
}

Alternatively, a custom location for the SCM file can be specified using the NXF_SCM_FILE environment variable.
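
As a minimal sketch of the NXF_SCM_FILE route, the snippet below writes the same provider block to a custom location and points Nextflow at it. The file name viash-hub.scm in the current directory is a hypothetical choice; any readable path should work.

```shell
# Hypothetical custom location for the SCM file.
scm_file="$PWD/viash-hub.scm"

# Write the provider definition shown above to that location.
cat > "$scm_file" <<'EOF'
providers {
  vsh {
    platform = 'gitlab'
    server = "packages.viash-hub.com"
  }
}
EOF

# Nextflow reads this variable instead of $HOME/.nextflow/scm.
export NXF_SCM_FILE="$scm_file"
echo "SCM settings will be read from: $NXF_SCM_FILE"
```

The export only lasts for the current shell session; add it to your shell profile if you want it to persist.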

You can check if everything is working by getting the --help for a workflow:

nextflow run \
vsh/demultiplex \
-r v0.3.11 \
--help

Run from the CLI

Running from the CLI directly, without using Viash Hub, is possible as well. The easiest way to start is to use the integrated help functionality, for instance:

nextflow run vsh/demultiplex \
  -revision v0.3.11 \
  -main-script target/nextflow/runner/main.nf \
  --help

Having this project available locally, you can run the following command:

nextflow run vsh/demultiplex \
  -r v0.3.11 \
  -main-script target/nextflow/runner/main.nf \
  --input "gs://viash-hub-resources/demultiplex/v3/demultiplex_htrnaseq_meta/SingleCell-RNA_P3_2" \
  --demultiplexer bclconvert \
  --skip_copycomplete_check \
  --publish_dir example_output/ \
  -profile docker \
  -c src/config/labels.config

(Optional) Resource usage tuning

Nextflow’s labels can be used to specify the amount of resources a process can use. This workflow uses the following labels for CPU and memory:

  • verylowmem, lowmem, midmem, highmem
  • verylowcpu, lowcpu, midcpu, highcpu

The defaults for these labels can be found at src/config/labels.config. Nextflow checks that the resources specified for a process do not exceed what is available on the machine, and will not start the process if they do. Create your own config file to tune the labels to your needs, for example:

// Resource labels (withLabel selectors belong inside the process scope)
process {
  withLabel: verylowcpu { cpus = 2 }
  withLabel: lowcpu { cpus = 8 }
  withLabel: midcpu { cpus = 16 }
  withLabel: highcpu { cpus = 16 }

  withLabel: verylowmem { memory = 4.GB }
  withLabel: lowmem { memory = 8.GB }
  withLabel: midmem { memory = 8.GB }
  withLabel: highmem { memory = 8.GB }
}

When starting Nextflow from the CLI, you can use -c to provide this file and override the defaults.
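
Putting the two steps together might look like the sketch below. The file name my_labels.config, the function name run_demultiplex, and the specific cpus/memory values are illustrative assumptions; the nextflow options are the ones used in the CLI example earlier in this README.

```shell
# Hypothetical file name for the custom resource settings.
custom_config="my_labels.config"

# Write a small override file; the values are examples only.
cat > "$custom_config" <<'EOF'
// withLabel selectors are placed inside the process scope.
process {
  withLabel: lowcpu { cpus = 4 }
  withLabel: lowmem { memory = 4.GB }
}
EOF

# Wrap the run in a function so the config can be inspected first.
run_demultiplex() {
  nextflow run vsh/demultiplex \
    -r v0.3.11 \
    -main-script target/nextflow/runner/main.nf \
    --input "gs://viash-hub-resources/demultiplex/v3/demultiplex_htrnaseq_meta/SingleCell-RNA_P3_2" \
    --publish_dir example_output/ \
    -profile docker \
    -c "$custom_config"
}

echo "Wrote $custom_config; call run_demultiplex to start the workflow"
```

Labels that are not listed in the custom file are not set by it, so it is usually safest to copy src/config/labels.config and edit the values you want to change.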

Acknowledgements

Developed in collaboration with Data Intuitive and Open Analytics.
