test_directory_parser

Script(s) to parse the test directories.

These test directories have gone through a checking phase using https://github.com/eastgenomics/test_directory_checker

Before running the code

HGNC dump

To generate the HGNC dump, go to: https://www.genenames.org/download/custom/

For the code to work, the following checkboxes must be ticked when you download the dump (they are ticked by default):

  • HGNC ID
  • Approved symbol
  • Alias symbols
  • Previous symbols
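
As a quick sanity check before running the parser, the downloaded dump can be inspected for these four columns. The following is a minimal sketch, not part of this repository: it assumes the dump is a tab-separated text file whose header row uses exactly the column names listed above, and `missing_hgnc_columns` is a hypothetical helper name.

```python
import csv
import io

# Column names as listed above; assumed to match the dump's header row exactly.
REQUIRED_COLUMNS = [
    "HGNC ID", "Approved symbol", "Alias symbols", "Previous symbols"
]


def missing_hgnc_columns(handle):
    """Return the required columns absent from a tab-separated HGNC dump."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader, [])
    present = {name.strip() for name in header}
    return [col for col in REQUIRED_COLUMNS if col not in present]


# Small in-memory example standing in for a downloaded dump file.
example = io.StringIO(
    "HGNC ID\tApproved symbol\tAlias symbols\n"
    "HGNC:5\tA1BG\t\n"
)
print(missing_hgnc_columns(example))  # ['Previous symbols']
```

For a real file, pass an open file handle instead, for example `open(path, newline="")`.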

Test directory

From v1.4.0, the test directory must contain an NGS Technology column. This column should be present in the internal test directory, obtainable here: https://future.nhs.uk/EMEEGL/view?objectID=164193093

Config file

The config file is used to indicate the header line, the name of the sheet of interest and the name of the columns that need to be gathered and processed.

Currently, the columns containing the test code, clinical indication name, test method and targets are processed without needing any additional code.

The code filters for the clinical indications whose values are present in the config's ngs_type and ngs_test_methods fields.

{
    "name": "220421_RD",
    "sheet_of_interest": "R&ID indications",
    "clinical_indication_column_code": "Test ID",
    "clinical_indication_column_name": "Clinical Indication",
    "panel_column": "Target/Genes",
    "test_method_column": "Test Method",
    "ngs_column": "NGS Technology",
    "header_index": 1,
    "ngs_type": ["WES", "CEN"],
    "ngs_test_methods": [
        "Medium panel", "Single gene sequencing <=10 amplicons",
        "Single gene sequencing <10 amplicons",
        "Single gene sequencing >=10 amplicons",
        "Single gene testing (<10 amplicons)", "small panel", "Small panel",
        "WES or Large panel", "WES or Large Panel", "WES or Large penel",
        "WES or Medium panel", "WES or Medium Panel", "WES or Small Panel", "WGS"
    ]
}
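
To illustrate how the ngs_type and ngs_test_methods fields might drive the filtering, here is a minimal sketch. It is not the repository's implementation. It assumes that a row is kept only when both its NGS technology and its test method match an entry in the config exactly. The rows, the `keep_row` helper and the "MLPA" test method are hypothetical.

```python
import json

# Trimmed-down copy of the config shown above (only the keys used here).
config = json.loads("""
{
    "test_method_column": "Test Method",
    "ngs_column": "NGS Technology",
    "ngs_type": ["WES", "CEN"],
    "ngs_test_methods": ["Medium panel", "Small panel", "WGS"]
}
""")


def keep_row(row, config):
    """Assumed rule: keep a row only if both its NGS technology and its
    test method appear in the config lists (exact string match)."""
    return (
        row.get(config["ngs_column"]) in config["ngs_type"]
        and row.get(config["test_method_column"]) in config["ngs_test_methods"]
    )


# Hypothetical rows standing in for lines of the spreadsheet.
rows = [
    {"Test ID": "R1.1", "Test Method": "Small panel", "NGS Technology": "CEN"},
    {"Test ID": "R2.1", "Test Method": "MLPA", "NGS Technology": "CEN"},
    {"Test ID": "R3.1", "Test Method": "WGS", "NGS Technology": None},
]
kept = [row["Test ID"] for row in rows if keep_row(row, config)]
print(kept)  # ['R1.1']
```

Exact string matching of this kind would explain why the config above lists spelling and capitalisation variants such as "WES or Large Panel" and "WES or Large penel" separately.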

Python environment

Set up your environment first:

python3 -m venv ${path_to_env}/${env_name}
source ${path_to_env}/${env_name}/bin/activate
pip install -r requirements.txt

How to run

# outputs a json containing cleaned data from the given test directory
python main.py -c configs/${config} [-o ${output_path}] --hgnc ${hgnc_dump.txt} rare_disease ${test_directory.xlsx} 

Run unit tests

python -m unittest test_directory_parser.tests
# to suppress prints in the code
python -m unittest -b test_directory_parser.tests

Output

The code outputs a JSON file, named ${YYMMDD}_RD_TD.json by default, in the following format:

{
  "td_source": "name_of_td_file_used_at_runtime",
  "config_source": "name_of_config_file_used",
  "date": "date_at_runtime",
  "indications": [
    {
      "name": "CI1",
      "code": "R1.1",
      "gemini_name": "R1.1_CI1_P",
      "test_method": "test_method1",
      "panels": [
        "panelapp_id"
      ],
      "original_targets": "Panel 1 (panelapp_id)",
      "changes": "No changes"
    },
    {
      "name": "CI2",
      "code": "R2.1",
      "gemini_name": "R2.1_CI2_P",
      "test_method": "test_method2",
      "panels": [
        "HGNC_ID"
      ],
      "original_targets": "Gene symbol",
      "changes": "No changes"
    }
  ]
}
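
The sketch below shows one way to work with this output. It is illustrative and not part of the repository. It assumes that HGNC IDs in "panels" carry the usual "HGNC:" prefix and that any other entry is a PanelApp ID. The helper names and the example values "486" and "HGNC:1100" are hypothetical.

```python
import json
from datetime import date


def default_output_name(run_date):
    """Build a file name following the ${YYMMDD}_RD_TD.json pattern."""
    return f"{run_date.strftime('%y%m%d')}_RD_TD.json"


def split_targets(indication):
    """Separate HGNC IDs from other entries (assumed to be PanelApp IDs)."""
    genes = [p for p in indication["panels"] if p.startswith("HGNC:")]
    panels = [p for p in indication["panels"] if not p.startswith("HGNC:")]
    return panels, genes


# Hypothetical output trimmed to the fields used here.
output = json.loads("""
{
  "indications": [
    {"code": "R1.1", "panels": ["486"]},
    {"code": "R2.1", "panels": ["HGNC:1100"]}
  ]
}
""")

print(default_output_name(date(2022, 4, 21)))  # 220421_RD_TD.json
for indication in output["indications"]:
    print(indication["code"], split_targets(indication))
```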

This output is then used to import the data into the panel database using panel_ops (https://github.com/eastgenomics/panel_ops).
