FILTER support for '*' or '?' constraint for more than 1 variable #37
Open
mitochon wants to merge 2 commits into pcingola:master from mitochon:support_multi_var_filter
Conversation
mitochon
commented
Oct 19, 2016
| <dependency> | ||
| <groupId>org.antlr</groupId> | ||
| <artifactId>antlr</artifactId> | ||
| <artifactId>antlr4</artifactId> |
Author
This is not relevant, but I had to make this change for mvn to run successfully.
mitochon
commented
Oct 19, 2016
| FieldIterator.get().setMax(IteratorType.GENOTYPE_VAR, sub.length - 1); | ||
| FieldIterator.get().setType(index); | ||
| idx = FieldIterator.get().get(IteratorType.VAR); | ||
| idx = FieldIterator.get().get(IteratorType.GENOTYPE_VAR); |
Author
Not sure about this change, but it seems like this should be GENOTYPE_VAR.
Otherwise this needs to be changed to
idx = FieldIterator.get().getVar(name)
I would recommend adding a test case that covers this use case.
mitochon
commented
Oct 19, 2016
| // Filter data | ||
| SnpSiftCmdFilter snpsiftFilter = new SnpSiftCmdFilter(); | ||
| String expression = "(EXAC_AF[*] <= 0.1) & (COSMIC_SITE_COUNT_SOMATIC[*] >= 2)"; |
Author
If you add this test case and undo the other changes above, EXAC_AF[*] always yields 0, so the left-hand side of this expression always evaluates to true.
Problem
A FILTER expression involving '*' or '?' does not give the correct result if more than 1 variable is used.
Example:
query Q1
filter "(BAR[*] < 0.1)" test.vcf
-> yields no results
query Q2
filter "(BAR[*] < 0.1) & (FOO[*] >= 2)" test.vcf
-> yields 3 results
Q2 is more strict than Q1, so its result should always be a subset of Q1's result.
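The subset argument above can be written down as a small check. This is only a sketch: the class, the record type, and the sample values are hypothetical and are not part of SnpSift, and it assumes that `[*]` means "any element of the field matches". Whatever the filter implementation does internally, the ids selected by Q2 should all appear among the ids selected by Q1.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch, not SnpSift code: checks that Q2's result is a subset of Q1's.
class SubsetInvariantSketch {
    // One VCF line reduced to two multi-valued fields (illustrative data only).
    static final class Rec {
        final String id;
        final double[] bar;
        final double[] foo;

        Rec(String id, double[] bar, double[] foo) {
            this.id = id;
            this.bar = bar;
            this.foo = foo;
        }
    }

    // Q1: (BAR[*] < 0.1), assuming '*' means "any element".
    static boolean q1(Rec r) {
        return Arrays.stream(r.bar).anyMatch(v -> v < 0.1);
    }

    // Q2: (BAR[*] < 0.1) & (FOO[*] >= 2); each field is scanned independently.
    static boolean q2(Rec r) {
        return q1(r) && Arrays.stream(r.foo).anyMatch(v -> v >= 2);
    }

    // Collects the ids of the records that pass the given filter.
    static List<String> select(List<Rec> recs, Predicate<Rec> filter) {
        List<String> ids = new ArrayList<>();
        for (Rec r : recs) {
            if (filter.test(r)) {
                ids.add(r.id);
            }
        }
        return ids;
    }

    // Made-up sample records for the check.
    static List<Rec> sample() {
        return Arrays.asList(
            new Rec("v1", new double[] {0.5, 0.05}, new double[] {1, 3}),
            new Rec("v2", new double[] {0.5, 0.2}, new double[] {4}),
            new Rec("v3", new double[] {0.01}, new double[] {0, 1}));
    }

    public static void main(String[] args) {
        List<Rec> recs = sample();
        List<String> r1 = select(recs, SubsetInvariantSketch::q1);
        List<String> r2 = select(recs, SubsetInvariantSketch::q2);
        System.out.println("Q1: " + r1 + ", Q2: " + r2);
        if (!r1.containsAll(r2)) {
            throw new AssertionError("Q2 result is not a subset of Q1 result");
        }
    }
}
```

A check of this shape fails for the behaviour reported above, where Q1 yields nothing and Q2 yields 3 results.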
Cause
After looking at the code, the * or ? query currently supports only 1 variable.
If there are multiple variables FOO, BAR, both having a * or ? predicate, the index n used to evaluate BAR[n] does not get evaluated properly.
This causes incorrect results to be generated. See also TestCasesFilter#test_57 in this pull request for a concrete example.
Proposed Solution
Track the current index for each query variable in FieldIterator.
In this context FOO will track some index m and BAR will track another index n.
Previously both FOO and BAR tracked the same index n even though they are at different stages in the iteration.
Verification
Added the following test files:
TestCasesFilter#test_57
test/test_filter_multiple_var.vcf
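The idea in the Proposed Solution, one index per query variable, can be sketched roughly as follows. This is not the FieldIterator code from the pull request: the class name, the map-based state, and the `anyMatch` helper are all hypothetical, and `[*]` is again assumed to mean "any element matches".

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.DoublePredicate;

// Hypothetical sketch, not the SnpSift FieldIterator: keeps one index per variable name.
class PerVariableIndexSketch {
    private final Map<String, Integer> current = new HashMap<>();
    private final Map<String, Integer> max = new HashMap<>();

    // Registers the largest valid index for a variable.
    void setMax(String name, int maxIndex) {
        max.put(name, maxIndex);
    }

    // Current index of the named variable (0 if it has not been advanced yet).
    int get(String name) {
        return current.getOrDefault(name, 0);
    }

    boolean hasNext(String name) {
        return get(name) < max.getOrDefault(name, 0);
    }

    // Advances only the named variable; other variables keep their own index.
    void next(String name) {
        current.put(name, get(name) + 1);
    }

    void reset(String name) {
        current.put(name, 0);
    }

    // Evaluates NAME[*] under "any element" semantics, using the variable's own index.
    static boolean anyMatch(PerVariableIndexSketch it, String name,
                            double[] values, DoublePredicate test) {
        if (values.length == 0) {
            return false;
        }
        it.setMax(name, values.length - 1);
        it.reset(name);
        while (true) {
            if (test.test(values[it.get(name)])) {
                return true; // index stays at the first matching element
            }
            if (!it.hasNext(name)) {
                return false;
            }
            it.next(name);
        }
    }

    public static void main(String[] args) {
        PerVariableIndexSketch it = new PerVariableIndexSketch();
        double[] bar = {0.5, 0.2, 0.05}; // made-up values
        double[] foo = {3, 1};           // made-up values
        boolean result = anyMatch(it, "BAR", bar, v -> v < 0.1)
                & anyMatch(it, "FOO", foo, v -> v >= 2);
        System.out.println(result + " BAR=" + it.get("BAR") + " FOO=" + it.get("FOO"));
    }
}
```

Because the state is keyed by variable name, advancing BAR to its matching element leaves FOO's position untouched, which is the property a single shared index cannot provide.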