Script(s) to parse the test directories.
These test directories have gone through a checking phase using https://github.com/eastgenomics/test_directory_checker
To generate the HGNC dump, go to: https://www.genenames.org/download/custom/
For the code to work, the following checkboxes must be ticked when you download the dump (they are by default):
- HGNC ID
- Approved symbol
- Alias symbols
- Previous symbols
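
To illustrate why these four columns matter, here is a minimal sketch of turning such a dump into a symbol-to-HGNC-ID lookup. It is not the parser's actual code: the function name `build_symbol_lookup` and the sample rows are hypothetical, and it assumes the dump is tab-separated with comma-separated values in the alias and previous symbol cells.

```python
import csv
import io

# Hypothetical excerpt of an HGNC custom download (made-up genes and IDs).
SAMPLE_DUMP = (
    "HGNC ID\tApproved symbol\tAlias symbols\tPrevious symbols\n"
    "HGNC:1\tGENE1\tG1A, G1B\t\n"
    "HGNC:2\tGENE2\t\tOLD2\n"
)


def build_symbol_lookup(handle):
    """Map approved, alias and previous symbols to their HGNC ID."""
    rows = list(csv.DictReader(handle, delimiter="\t"))
    lookup = {}

    # First pass: approved symbols always win.
    for row in rows:
        lookup[row["Approved symbol"]] = row["HGNC ID"]

    # Second pass: alias and previous symbols only fill gaps, so they
    # never overwrite an approved symbol (assumption: ", " separated).
    for row in rows:
        for column in ("Alias symbols", "Previous symbols"):
            for symbol in (row[column] or "").split(","):
                symbol = symbol.strip()
                if symbol:
                    lookup.setdefault(symbol, row["HGNC ID"])
    return lookup


lookup = build_symbol_lookup(io.StringIO(SAMPLE_DUMP))
print(lookup["G1B"], lookup["OLD2"])
```

With a real dump, the `io.StringIO` object would be replaced by an open file handle on the downloaded text file.
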
From v1.4.0, the test directory needs to contain an NGS Technology column. This column should be present in the internal test directory, available here: https://future.nhs.uk/EMEEGL/view?objectID=164193093
The config file is used to indicate the header line, the name of the sheet of interest and the name of the columns that need to be gathered and processed.
Currently, the test code, clinical indication name, test method and targets columns can be processed without any additional code.
The code keeps only the clinical indications whose values appear in the config's ngs_type and ngs_test_methods fields.
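
The filtering step can be sketched as follows. This is an illustration rather than the parser's implementation: `filter_indications` and the sample rows are hypothetical, and it assumes a row is kept only when both its NGS Technology value is listed in ngs_type and its test method is listed in ngs_test_methods. The config keys are the ones used in this README's example config.

```python
# Subset of the config keys that drive the filtering.
config = {
    "test_method_column": "Test Method",
    "ngs_column": "NGS Technology",
    "ngs_type": ["WES", "CEN"],
    "ngs_test_methods": ["Medium panel", "Small panel", "WGS"],
}

# Hypothetical rows standing in for the parsed spreadsheet.
rows = [
    {"Test ID": "R1.1", "Test Method": "Small panel", "NGS Technology": "CEN"},
    {"Test ID": "R2.1", "Test Method": "Other method", "NGS Technology": "CEN"},
    {"Test ID": "R3.1", "Test Method": "WGS", "NGS Technology": ""},
]


def filter_indications(rows, config):
    """Keep rows whose NGS type and test method are both listed in the config."""
    ngs_types = set(config["ngs_type"])
    test_methods = set(config["ngs_test_methods"])
    return [
        row
        for row in rows
        if row.get(config["ngs_column"]) in ngs_types
        and row.get(config["test_method_column"]) in test_methods
    ]


kept = filter_indications(rows, config)
print([row["Test ID"] for row in kept])
```

Because the match is an exact string comparison in this sketch, spelling variants found in the test directory have to be listed explicitly in ngs_test_methods.
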
{
    "name": "220421_RD",
    "sheet_of_interest": "R&ID indications",
    "clinical_indication_column_code": "Test ID",
    "clinical_indication_column_name": "Clinical Indication",
    "panel_column": "Target/Genes",
    "test_method_column": "Test Method",
    "ngs_column": "NGS Technology",
    "header_index": 1,
    "ngs_type": ["WES", "CEN"],
    "ngs_test_methods": [
        "Medium panel", "Single gene sequencing <=10 amplicons",
        "Single gene sequencing <10 amplicons",
        "Single gene sequencing >=10 amplicons",
        "Single gene testing (<10 amplicons)", "small panel", "Small panel",
        "WES or Large panel", "WES or Large Panel", "WES or Large penel",
        "WES or Medium panel", "WES or Medium Panel", "WES or Small Panel", "WGS"
    ]
}

Set up your environment first:
python3 -m venv ${path_to_env}/${env_name}
source ${path_to_env}/${env_name}/bin/activate
pip install -r requirements.txt

# outputs a json containing cleaned data from the given test directory
python main.py -c configs/${config} [-o ${output_path}] --hgnc ${hgnc_dump.txt} rare_disease ${test_directory.xlsx}

# run the unit tests
python -m unittest test_directory_parser.tests
# to suppress prints in the code
python -m unittest -b test_directory_parser.tests

The code will output a JSON file with the default name ${YYMMDD}_RD_TD.json and the following format:
{
    "td_source": "name_of_td_file_used_at_runtime",
    "config_source": "config_file_named_used",
    "date": "date_at_runtime",
    "indications": [
        {
            "name": "CI1",
            "code": "R1.1",
            "gemini_name": "R1.1_CI1_P",
            "test_method": "test_method1",
            "panels": [
                "panelapp_id"
            ],
            "original_targets": "Panel 1 (panelapp_id)",
            "changes": "No changes"
        },
        {
            "name": "CI2",
            "code": "R2.1",
            "gemini_name": "R2.1_CI2_P",
            "test_method": "test_method2",
            "panels": [
                "HGNC_ID"
            ],
            "original_targets": "Gene symbol",
            "changes": "No changes"
        }
    ]
}

This output is then used to import the data into the panel database using panel_ops (https://github.com/eastgenomics/panel_ops).
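
For anyone consuming the output outside of panel_ops, a minimal sketch of reading it is shown below. It does not reproduce how panel_ops imports the data: `load_indications` is a hypothetical helper, and the only assumption is that the file follows the format described in this README.

```python
import json

# Keys every entry of "indications" carries in the documented format.
REQUIRED_KEYS = {
    "name", "code", "gemini_name", "test_method",
    "panels", "original_targets", "changes",
}


def load_indications(text):
    """Parse the output JSON and return a {test code: panels} mapping."""
    data = json.loads(text)
    panels_by_code = {}
    for indication in data["indications"]:
        missing = REQUIRED_KEYS - indication.keys()
        if missing:
            raise ValueError(f"Indication is missing keys: {sorted(missing)}")
        panels_by_code[indication["code"]] = indication["panels"]
    return panels_by_code


# Sample mirroring the placeholder values of the documented format.
sample = json.dumps({
    "td_source": "name_of_td_file_used_at_runtime",
    "config_source": "config_file_named_used",
    "date": "date_at_runtime",
    "indications": [
        {
            "name": "CI1", "code": "R1.1", "gemini_name": "R1.1_CI1_P",
            "test_method": "test_method1", "panels": ["panelapp_id"],
            "original_targets": "Panel 1 (panelapp_id)",
            "changes": "No changes",
        },
    ],
})

panels_by_code = load_indications(sample)
print(panels_by_code)
```

With a real output file, `text` would come from reading the generated JSON file instead of the inline sample.
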