test_directory_parser

Script(s) to parse the test directories.

These test directories have already been checked using https://github.com/eastgenomics/test_directory_checker.

Before running the code

HGNC dump

To generate the HGNC dump, you can go to: https://www.genenames.org/download/custom/

For the code to work, the following columns must be selected when downloading the dump (they are selected by default):

HGNC ID
Approved symbol
Alias symbols
Previous symbols
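To show why these four columns matter, here is a minimal sketch (not the parser's actual code) that turns the tab-separated dump into a symbol-to-HGNC-ID lookup. The helper name `build_symbol_lookup`, the upper-casing of symbols, and the assumption that alias and previous symbols are comma-separated within a cell are all illustrative choices.

```python
import csv
import io
from typing import Dict, Iterable

# Header names of the four required columns in an HGNC custom download
HGNC_ID = "HGNC ID"
APPROVED = "Approved symbol"
ALIASES = "Alias symbols"
PREVIOUS = "Previous symbols"


def build_symbol_lookup(lines: Iterable[str]) -> Dict[str, str]:
    """Map approved, previous and alias symbols to HGNC IDs (hypothetical helper)."""
    reader = csv.DictReader(lines, delimiter="\t")
    missing = {HGNC_ID, APPROVED, ALIASES, PREVIOUS} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"HGNC dump is missing columns: {sorted(missing)}")

    lookup: Dict[str, str] = {}
    for row in reader:
        hgnc_id = row[HGNC_ID]
        # Approved symbols always win, overwriting any alias/previous entry
        lookup[row[APPROVED].upper()] = hgnc_id
        for column in (PREVIOUS, ALIASES):
            # Assumed format: several symbols separated by ", "
            for symbol in (row[column] or "").split(","):
                symbol = symbol.strip().upper()
                if symbol:
                    # First gene seen keeps an ambiguous alias (a simplification)
                    lookup.setdefault(symbol, hgnc_id)
    return lookup


if __name__ == "__main__":
    dump = io.StringIO(
        "HGNC ID\tApproved symbol\tAlias symbols\tPrevious symbols\n"
        "HGNC:0001\tGENE1\tALIAS1, ALIAS2\tOLD1\n"
    )
    print(build_symbol_lookup(dump)["ALIAS2"])  # HGNC:0001
```

Resolving approved symbols before aliases avoids an alias of one gene shadowing another gene's current symbol.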

Test directory

From v1.4.0, the test directory must contain an NGS Technology column. This column is present in the internal test directory, available here: https://future.nhs.uk/EMEEGL/view?objectID=164193093
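A minimal sketch of checking for this column before parsing. It assumes the sheet has already been read into a list of rows of cell values (for example `list(worksheet.iter_rows(values_only=True))` with openpyxl) and that the header row is found by a 0-based index, like the `header_index` config value. `check_header` is a hypothetical helper, not part of this repository.

```python
from typing import Any, List, Sequence


def check_header(
    rows: Sequence[Sequence[Any]],
    header_index: int,
    required: Sequence[str] = ("NGS Technology",),
) -> List[str]:
    """Return the header row as strings, raising if a required column is absent.

    Hypothetical helper: `rows` holds the sheet's cell values row by row and
    `header_index` is assumed to be the 0-based position of the header row.
    """
    header = ["" if cell is None else str(cell).strip() for cell in rows[header_index]]
    missing = [name for name in required if name not in header]
    if missing:
        raise ValueError(
            f"Test directory is missing column(s) {missing}; "
            "from v1.4.0 the internal test directory is required"
        )
    return header


rows = [
    ["Title row"],
    ["Test ID", "Clinical Indication", "Test Method", "NGS Technology"],
    ["R1.1", "CI1", "Small panel", "CEN"],
]
print(check_header(rows, header_index=1))
```

Failing early with a clear message is friendlier than a later `KeyError` when an older test directory is used.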

Config file

The config file indicates the header line, the name of the sheet of interest and the names of the columns to gather and process.

Currently, the columns containing the test code, the clinical indication name, the test method and the targets are processed out of the box; processing any other column requires adding code.

The code keeps only the clinical indications whose values match those listed in the ngs_type and ngs_test_methods fields of the config, for example:

{
    "name": "220421_RD",
    "sheet_of_interest": "R&ID indications",
    "clinical_indication_column_code": "Test ID",
    "clinical_indication_column_name": "Clinical Indication",
    "panel_column": "Target/Genes",
    "test_method_column": "Test Method",
    "ngs_column": "NGS Technology",
    "header_index": 1,
    "ngs_type": ["WES", "CEN"],
    "ngs_test_methods": [
        "Medium panel", "Single gene sequencing <=10 amplicons",
        "Single gene sequencing <10 amplicons",
        "Single gene sequencing >=10 amplicons",
        "Single gene testing (<10 amplicons)", "small panel", "Small panel",
        "WES or Large panel", "WES or Large Panel", "WES or Large penel",
        "WES or Medium panel", "WES or Medium Panel", "WES or Small Panel", "WGS"
    ]
}
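The filtering described above could look roughly like this sketch, which represents rows as dicts keyed by column name. It assumes a clinical indication is kept only when its NGS Technology value is in `ngs_type` AND its test method is in `ngs_test_methods`, using exact string matching (which would presumably explain why the config lists several spellings of the same method). `filter_indications` is hypothetical.

```python
import json
from typing import Dict, List

# Excerpt of the config above; only the keys used by the filter are included
CONFIG = json.loads("""
{
    "test_method_column": "Test Method",
    "ngs_column": "NGS Technology",
    "ngs_type": ["WES", "CEN"],
    "ngs_test_methods": ["Small panel", "WES or Large panel", "WGS"]
}
""")


def filter_indications(rows: List[Dict[str, str]], config: Dict) -> List[Dict[str, str]]:
    """Keep rows whose NGS technology AND test method are both listed (assumed rule)."""
    ngs_types = set(config["ngs_type"])
    test_methods = set(config["ngs_test_methods"])
    return [
        row
        for row in rows
        # Exact, case-sensitive string comparison
        if row.get(config["ngs_column"]) in ngs_types
        and row.get(config["test_method_column"]) in test_methods
    ]


rows = [
    {"Test ID": "R1.1", "Test Method": "Small panel", "NGS Technology": "CEN"},
    {"Test ID": "R2.1", "Test Method": "Karyotype", "NGS Technology": ""},
]
print([row["Test ID"] for row in filter_indications(rows, CONFIG)])  # ['R1.1']
```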

Python environment

Set up your environment first:

python3 -m venv ${path_to_env}/${env_name}
source ${path_to_env}/${env_name}/bin/activate
pip install -r requirements.txt

How to run

# outputs a json containing cleaned data from the given test directory
python main.py -c configs/${config} [-o ${output_path}] --hgnc ${hgnc_dump.txt} rare_disease ${test_directory.xlsx}
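For reference, here is a hypothetical argparse layout that would accept the command above: `-c`, `-o` and `--hgnc` as top-level options, `rare_disease` as a subcommand, and the test directory as its positional argument. The `dest` names and help strings are assumptions; `main.py` holds the real definitions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented command line."""
    parser = argparse.ArgumentParser(description="Parse a test directory into JSON")
    parser.add_argument("-c", dest="config", required=True, help="config file, e.g. configs/<name>.json")
    parser.add_argument("-o", dest="output", default=None, help="optional output path")
    parser.add_argument("--hgnc", required=True, help="HGNC dump (.txt)")
    # The test directory type is given as a subcommand
    subparsers = parser.add_subparsers(dest="td_type", required=True)
    rare_disease = subparsers.add_parser("rare_disease", help="rare disease test directory")
    rare_disease.add_argument("test_directory", help="test directory (.xlsx)")
    return parser


args = build_parser().parse_args(
    ["-c", "configs/example.json", "--hgnc", "hgnc_dump.txt",
     "rare_disease", "test_directory.xlsx"]
)
print(args.config, args.output, args.hgnc, args.td_type, args.test_directory)
# configs/example.json None hgnc_dump.txt rare_disease test_directory.xlsx
```

With this layout the top-level options must come before `rare_disease`, which matches the order in the command above.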

Run unittests

python -m unittest test_directory_parser.tests
# to suppress output printed by the code during the tests
python -m unittest -b test_directory_parser.tests
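The `-b` flag is unittest's buffer option: anything written to stdout/stderr during a test is captured and only shown if that test fails or errors. Here is a small sketch of the same behaviour driven from Python, using a hypothetical test case that prints:

```python
import io
import unittest


class ExampleTest(unittest.TestCase):
    """Hypothetical test whose code under test prints to stdout."""

    def test_prints(self):
        print("noisy output from the code under test")
        self.assertTrue(True)


def run_quietly(test_case: type) -> unittest.TestResult:
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(test_case)
    # buffer=True is the programmatic equivalent of the -b flag: output is
    # captured per test and discarded when the test passes
    runner = unittest.TextTestRunner(stream=io.StringIO(), buffer=True, verbosity=0)
    return runner.run(suite)


result = run_quietly(ExampleTest)
print(result.wasSuccessful(), result.testsRun)  # True 1
```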

Output

The code outputs a JSON file, named ${YYMMDD}_RD_TD.json by default, with the following format:

{
  "td_source": "name_of_td_file_used_at_runtime",
  "config_source": "config_file_name_used",
  "date": "date_at_runtime",
  "indications": [
    {
      "name": "CI1",
      "code": "R1.1",
      "gemini_name": "R1.1_CI1_P",
      "test_method": "test_method1",
      "panels": [
        "panelapp_id"
      ],
      "original_targets": "Panel 1 (panelapp_id)",
      "changes": "No changes"
    },
    {
      "name": "CI2",
      "code": "R2.1",
      "gemini_name": "R2.1_CI2_P",
      "test_method": "test_method2",
      "panels": [
        "HGNC_ID"
      ],
      "original_targets": "Gene symbol",
      "changes": "No changes"
    }
  ]
}

This output is then used to import the data into the panel database using panel_ops (https://github.com/eastgenomics/panel_ops).
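A minimal sketch of assembling output in this shape, for illustration only. The `gemini_name` pattern (`<code>_<name>_P`) is inferred from the example output, and the real code may build it differently. The helper names are hypothetical, and the run date is passed in as a string because its exact format is not specified here.

```python
import datetime
import json
from typing import Dict, List, Optional


def default_output_name(today: Optional[datetime.date] = None) -> str:
    """Build the default file name ${YYMMDD}_RD_TD.json."""
    today = today or datetime.date.today()
    return f"{today.strftime('%y%m%d')}_RD_TD.json"


def make_indication(code: str, name: str, test_method: str, panels: List[str],
                    original_targets: str, changes: str = "No changes") -> Dict:
    """Hypothetical helper building one entry of the "indications" list."""
    return {
        "name": name,
        "code": code,
        # Pattern inferred from the example output: <code>_<name>_P
        "gemini_name": f"{code}_{name}_P",
        "test_method": test_method,
        "panels": panels,
        "original_targets": original_targets,
        "changes": changes,
    }


def build_output(td_source: str, config_source: str, run_date: str,
                 indications: List[Dict]) -> Dict:
    """Top-level structure of the output JSON."""
    return {
        "td_source": td_source,
        "config_source": config_source,
        "date": run_date,
        "indications": indications,
    }


indication = make_indication("R1.1", "CI1", "test_method1", ["panelapp_id"],
                             "Panel 1 (panelapp_id)")
output = build_output("test_directory.xlsx", "configs/example.json", "date_at_runtime",
                      [indication])
print(default_output_name(datetime.date(2024, 1, 31)))  # 240131_RD_TD.json
print(json.dumps(output, indent=2))
```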
