hakyimlab/enformer-predict

Enformer predict

A revamp of the old Enformer pipeline that makes it more user-friendly and allows more flexibility in how predictions are made and outputs are saved.

Date: Fri Sep 20 2023

Main highlights:

  • Improved documentation
  • Use of dataclasses to ensure that data types are appropriate; subsequent use of dictionaries to pass parameters to parsl apps
  • Outputs are now saved in batches of hdf5 files; debug mode lets you print the outputs to the screen or save them as you wish
  • The saved batches of hdf5 files can be collected into one large hdf5 file
  • Fewer modules and sourced scripts
  • Aggregation script is now a part of the src code rather than a standalone module; you can now aggregate on-the-fly
  • Option to combine sub-hdf5 files into a single file; this requires a separate submission
  • Can also prepare the predictions in a format for PredictDB; useful when running this pipeline together with the TFXcan pipeline
  • New feature: checks whether the outputs are already present and, if so, skips the computation
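
Two of the highlights above (type-checked dataclasses that are converted to dictionaries, and skipping outputs that already exist) can be illustrated with a minimal sketch. This is not the pipeline's actual code: the class `PredictionParams`, its fields, the helper `pending_batches`, and the `<batch name>.h5` file naming are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, asdict, fields
from pathlib import Path


@dataclass
class PredictionParams:
    # Hypothetical parameter names; the real pipeline may use different ones.
    output_dir: str
    batch_size: int
    debug: bool = False

    def __post_init__(self):
        # Dataclasses do not enforce annotations at runtime,
        # so each field is checked explicitly here.
        for f in fields(self):
            value = getattr(self, f.name)
            if not isinstance(value, f.type):
                raise TypeError(
                    f"{f.name} must be {f.type.__name__}, "
                    f"got {type(value).__name__}"
                )


def pending_batches(params: dict, batch_names: list) -> list:
    """Return the batches whose output file is not on disk yet.

    Assumes (hypothetically) one file named '<batch name>.h5' per batch.
    """
    out_dir = Path(params["output_dir"])
    return [
        name for name in batch_names
        if not (out_dir / f"{name}.h5").exists()
    ]


# Validate once, then hand a plain dictionary to downstream workers.
params = asdict(PredictionParams(output_dir="predictions", batch_size=8))
print(pending_batches(params, ["batch_0", "batch_1"]))
```

Converting the validated dataclass to a dictionary with `asdict` keeps the object passed to each worker simple to serialize, which is presumably why the pipeline passes dictionaries to its parsl apps.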

Notes

To run very large jobs, it is best to open a screen session and submit the job from there. This lets you disconnect from the server while the job continues to run. To run small-scale jobs, you can modify the sbatch script and submit it directly.

There are 4 main scripts here:

  1. enformer_predict.{py, sbatch}: This is the main script. It takes in the parameters and runs the Enformer pipeline.
  2. enformer_merge.{py, sbatch}: This script merges the sub-hdf5 files into a single hdf5 file.
  3. enformer_process.{py, sbatch}: This script takes the merged file and splits it into a matrix of the predictions and the relevant metadata.
  4. enpact_predict.{py, sbatch}: This script takes Enpact weights and the outputs of enformer_process, and predicts TF binding. The predictions are also saved in a format that can be used with PredictDB to train SNP predictors of TF binding.
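
The four scripts above form a chain in which each stage consumes the output of the previous one. As a rough sketch, the stages could be submitted in order with Slurm job dependencies. The chaining itself is an assumption and not something this repository documents, and the sbatch scripts are assumed to sit in the current directory and to need no extra arguments. The code only builds and prints the commands; it does not submit anything.

```python
# Stage order follows the numbered list above; each name is assumed
# to have a matching .sbatch file in the current directory.
STAGES = [
    "enformer_predict",
    "enformer_merge",
    "enformer_process",
    "enpact_predict",
]


def build_submission_plan(stages):
    """Return one sbatch command per stage, each waiting on the previous."""
    plan = []
    for i, stage in enumerate(stages):
        # --parsable makes sbatch print only the job ID.
        cmd = ["sbatch", "--parsable"]
        if i > 0:
            # Placeholder: replace with the job ID printed when the
            # previous stage was submitted.
            previous = stages[i - 1]
            cmd.append(f"--dependency=afterok:<job_id_of_{previous}>")
        cmd.append(f"{stage}.sbatch")
        plan.append(cmd)
    return plan


# Dry run: show the commands without executing them.
for cmd in build_submission_plan(STAGES):
    print(" ".join(cmd))
```

With `--dependency=afterok`, Slurm starts a stage only if the previous job finished successfully, so a failed prediction run would not trigger a merge of incomplete files.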
