NEW @W-21600165@ - Add pattern_not_regex support to reduce false positives in custom rules #439
Implements an optional pattern_not_regex field for custom rules that allows excluding matches that also match a negative pattern. This reduces false positives by enabling users to specify patterns that should NOT be flagged, such as validatedMessageId or known safe functions in DataWeave email headers.
Changes:
- Added pattern_not_regex field to RegexRule type with validation
- Implemented negative pattern matching in engine scan logic
- Added documentation and examples in config description
- Created test data files for email header injection scenarios
- Added 5 comprehensive tests covering validation and functionality
- Bumped version to 0.34.0-SNAPSHOT
- More intuitive naming: regex_ignore clearly indicates patterns to ignore
- Updated all references in code, tests, and documentation
- All 79 tests pass successfully
- No functional changes, pure refactoring
Changed files:
- config.ts: Type definition and validation logic
- engine.ts: Pattern matching implementation
- messages.ts: Error messages and documentation
- engine.test.ts: Test cases
- plugin.test.ts: Plugin configuration tests
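The exclusion behavior described in these commits can be sketched roughly as follows. This is an illustration, not the code in engine.ts: the SketchRule and SketchMatch types and the findViolations function are hypothetical names, and the sketch assumes the negative pattern is tested against the matched text itself, which may differ from what the engine actually does.

```typescript
// Hypothetical, simplified rule shape; the real RegexRule type has more fields.
interface SketchRule {
  regex: RegExp;        // positive pattern, assumed to carry the 'g' flag
  regexIgnore?: RegExp; // optional negative pattern (regex_ignore)
}

interface SketchMatch {
  index: number;
  text: string;
}

// Returns every match of rule.regex that does NOT also match rule.regexIgnore.
function findViolations(contents: string, rule: SketchRule): SketchMatch[] {
  if (!rule.regex.global) {
    // Without 'g', exec() never advances lastIndex and the loop would not end.
    throw new Error("positive pattern must use the 'g' flag");
  }
  const results: SketchMatch[] = [];
  rule.regex.lastIndex = 0; // start from the beginning of the input
  let m: RegExpExecArray | null;
  while ((m = rule.regex.exec(contents)) !== null) {
    if (m[0].length === 0) {
      rule.regex.lastIndex++; // step past empty matches
      continue;
    }
    if (rule.regexIgnore) {
      rule.regexIgnore.lastIndex = 0; // reset state in case of 'g' or 'y' flags
      if (rule.regexIgnore.test(m[0])) {
        continue; // matched the negative pattern, so it is not reported
      }
    }
    results.push({ index: m.index, text: m[0] });
  }
  return results;
}

// Usage: only the unvalidated interpolation is reported.
const sample = "Message-ID: $(payload.id)\nMessage-ID: $(validatedMessageId)";
const violations = findViolations(sample, {
  regex: /\$\([^)]*\)/g,
  regexIgnore: /validatedMessageId/,
});
```

Resetting lastIndex before each test matters when a pattern is compiled with the g or y flag, because such a RegExp object remembers where its previous search ended.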
const patternNotRegex = this.regexRules[ruleName].regex_ignore
    ? convertToRegex(this.regexRules[ruleName].regex_ignore!)
    : undefined;
What happens if an invalid regex is passed? Does that validation happen before this?
Yes, validation happens before this code executes.
In config.ts (lines 82-84), regex_ignore is validated during configuration parsing:
const rawPatternNotRegex: string | undefined = ruleExtractor.extractString('regex_ignore');
const patternNotRegexString: string | undefined = rawPatternNotRegex
    ? validateRegexString(rawPatternNotRegex, ruleExtractor.getFieldPath('regex_ignore'))
    : undefined;
The validateRegexString() function tests whether the pattern is a valid regex and throws a descriptive error if it is not. This happens during config loading, before any engine execution.
So by the time engine.ts calls convertToRegex(), the pattern has already been validated; an invalid pattern would have prevented the engine from starting.
Flow:
- Config loaded → config.ts validates regex_ignore pattern
- Invalid regex → Error thrown, engine doesn't start
- Valid regex → Engine starts, engine.ts safely calls convertToRegex()
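The validate-at-load flow described in this reply could look something like the sketch below. It is not the project's validateRegexString() or config.ts code: validatePatternSketch and loadRuleSketch are hypothetical stand-ins, the field path is made up, and the sketch assumes the pattern is a bare regex source string, which may not match the format the real config accepts.

```typescript
// Hypothetical stand-in for validateRegexString(): fails fast at config-load time.
function validatePatternSketch(pattern: string, fieldPath: string): string {
  try {
    new RegExp(pattern); // throws a SyntaxError for an invalid pattern
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    throw new Error(`Invalid regex at ${fieldPath}: ${reason}`);
  }
  return pattern;
}

// Hypothetical config-loading step: the optional field is validated only when present.
function loadRuleSketch(raw: { regex_ignore?: string }): { regex_ignore: string | undefined } {
  const validated = raw.regex_ignore
    ? validatePatternSketch(raw.regex_ignore, "rules.example.regex_ignore") // made-up path
    : undefined;
  return { regex_ignore: validated };
}

// Usage: a valid pattern passes through; an invalid one stops loading with an error.
const okRule = loadRuleSketch({ regex_ignore: "validatedMessageId" });
let loadError: string | undefined;
try {
  loadRuleSketch({ regex_ignore: "(" }); // unterminated group
} catch (err) {
  loadError = err instanceof Error ? err.message : String(err);
}
```

Because the error is raised while the rule is loaded, later code that converts the string to a RegExp can assume it is well-formed.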
Summary
Implements an optional pattern_not_regex field for custom regex rules that allows excluding matches that also match a negative pattern. This reduces false positives by enabling users to specify patterns that should NOT be flagged, such as validatedMessageId or known safe functions.
Motivation
Regex rules currently flag all matches, including known safe patterns, resulting in excessive false positives. For example, the DataWeave email header injection rule flags $(validatedMessageId) even though it is already validated and safe to use. This requires manual triage and reduces rule effectiveness.
Changes
Core Implementation
- Added pattern_not_regex?: string field to RegexRule type with validation
- scanFileContents method
- pattern_not_regex to RegExp if defined
- regex.lastIndex to prevent state issues
- ConfigFieldDescription_custom_rules with documentation and DataWeave example
- Bumped version to 0.34.0-SNAPSHOT
Tests
- pattern_not_regex is included in config
- pattern_not_regex
- validatedMessageId matches pattern_not_regex
Example Usage