NEW @W-21600165@ - Add pattern_not_regex support to reduce false positives in custom rules #439

Merged
aruntyagiTutu merged 4 commits into dev from arun.tyagi/add_pattern_not_regex_support
Mar 20, 2026

Conversation

@aruntyagiTutu
Contributor

Summary

Implements an optional pattern_not_regex field for custom regex rules that excludes any match that also matches a negative pattern. This reduces false positives by letting users specify patterns that should NOT be flagged, such as validatedMessageId or known safe functions.

Motivation

Regex rules currently flag every match, including known safe patterns, which produces excessive false positives. For example, the DataWeave email header injection rule flags $(validatedMessageId) even though it is already validated and safe to use. This requires manual triage and reduces rule effectiveness.

Changes

Core Implementation

  • config.ts: Added optional pattern_not_regex?: string field to RegexRule type with validation
  • engine.ts: Implemented negative pattern matching in scanFileContents method
    • Converts pattern_not_regex to RegExp if defined
    • Tests matched text substring against negative pattern
    • Skips (continues) if matches negative pattern
    • Resets regex.lastIndex to prevent state issues
  • messages.ts: Updated ConfigFieldDescription_custom_rules with documentation and DataWeave example
  • package.json: Bumped version to 0.34.0-SNAPSHOT
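
The engine.ts steps listed above can be illustrated with a minimal sketch. It is not the code from this PR: the function name scanWithIgnore, the Match type, and the sample input are hypothetical, and the real scanFileContents method may be structured differently. The sketch assumes the main regex carries the g flag, as in the example configuration.

```typescript
// Hypothetical result shape; the real engine reports richer violation objects.
interface Match {
  text: string;
  index: number;
}

// Sketch only: collect matches of `regex`, skipping any match whose text
// also matches the optional negative pattern `ignore`.
function scanWithIgnore(contents: string, regex: RegExp, ignore?: RegExp): Match[] {
  if (!regex.global) {
    // Without the g flag, exec() would return the same match forever.
    throw new Error('regex must have the g flag');
  }
  const violations: Match[] = [];
  regex.lastIndex = 0;
  let m: RegExpExecArray | null;
  while ((m = regex.exec(contents)) !== null) {
    if (m[0] === '') {
      // Guard against zero-length matches stalling the loop.
      regex.lastIndex++;
      continue;
    }
    if (ignore) {
      // A global regex keeps lastIndex between test() calls, so reset it
      // before testing each matched substring.
      ignore.lastIndex = 0;
      if (ignore.test(m[0])) {
        continue; // matched the negative pattern: not a violation
      }
    }
    violations.push({ text: m[0], index: m.index });
  }
  return violations;
}

// Illustrative input (unquoted header keys, chosen so both lines match the main regex).
const contents = ['To: $(validatedMessageId)', 'To: $(payload.email)'].join('\n');
const found = scanWithIgnore(
  contents,
  /(To|From|Subject):\s*\$\([^)]+\)/gi,
  /\$\(\s*validatedMessageId\s*\)/gi,
);
console.log(found.map((v) => v.text)); // only the payload.email header remains
```

Resetting lastIndex before each test() call matters because a regex with the g flag resumes from its previous position, which could otherwise cause a later match to be missed.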

Tests

  • plugin.test.ts: Added 2 config validation tests
    • Validates pattern_not_regex is included in config
    • Validates error handling for invalid pattern_not_regex
  • engine.test.ts: Added 3 functional tests
    • Tests exclusion of validatedMessageId matches
    • Tests normal behavior without pattern_not_regex
    • Tests multiple exclusion patterns
  • Test data: Created 2 DataWeave files demonstrating use case

Example Usage

engines:
  regex:
    custom_rules:
      "DataWeaveEmailHeaderInjection":
        regex: /(To|From|Subject):\s*\$\([^)]+\)/gi
        pattern_not_regex: /\$\(\s*validatedMessageId\s*\)/gi
        file_extensions: [".dwl"]
        description: "Detects user input in email headers, excluding validated IDs"
        severity: "Critical"
        tags: ["Security"]

Before:
headers: {
  "In-Reply-To": $(validatedMessageId)  // ❌ Flagged (false positive)
  To: $(payload.email)                  // ❌ Flagged (real violation)
}

After:
headers: {
  "In-Reply-To": $(validatedMessageId)  // ✅ Excluded (matches pattern_not_regex)
  To: $(payload.email)                  // ❌ Still flagged (doesn't match pattern_not_regex)
}

aruntyagiTutu and others added 4 commits March 13, 2026 10:46
Implements optional pattern_not_regex field for custom rules that allows
excluding matches that also match a negative pattern. This reduces false
positives by enabling users to specify patterns that should NOT be flagged,
such as validatedMessageId or known safe functions in DataWeave email headers.

Changes:
- Added pattern_not_regex field to RegexRule type with validation
- Implemented negative pattern matching in engine scan logic
- Added documentation and examples in config description
- Created test data files for email header injection scenarios
- Added 5 comprehensive tests covering validation and functionality
- Bumped version to 0.34.0-SNAPSHOT
- Renamed pattern_not_regex to regex_ignore: the new name clearly indicates patterns to ignore
- Updated all references in code, tests, and documentation
- All 79 tests pass successfully
- No functional changes, pure refactoring

Changed files:
- config.ts: Type definition and validation logic
- engine.ts: Pattern matching implementation
- messages.ts: Error messages and documentation
- engine.test.ts: Test cases
- plugin.test.ts: Plugin configuration tests

const patternNotRegex = this.regexRules[ruleName].regex_ignore
    ? convertToRegex(this.regexRules[ruleName].regex_ignore!)
    : undefined;

Contributor

What happens if an invalid regex is passed? Does validation happen before this?

Contributor Author

Yes, validation happens before this code executes.

In config.ts (lines 82-84), regex_ignore is validated during configuration parsing:

const rawPatternNotRegex: string | undefined = ruleExtractor.extractString('regex_ignore');
const patternNotRegexString: string | undefined = rawPatternNotRegex
    ? validateRegexString(rawPatternNotRegex, ruleExtractor.getFieldPath('regex_ignore'))
    : undefined;

The validateRegexString() function tests if the pattern is a valid regex and throws a descriptive error if it's invalid. This happens during config loading, before any engine execution.

So by the time engine.ts calls convertToRegex(), the regex pattern has already been validated and is guaranteed to be valid (or the engine wouldn't have started).

Flow:

  1. Config loaded → config.ts validates regex_ignore pattern
  2. Invalid regex → Error thrown, engine doesn't start
  3. Valid regex → Engine starts, engine.ts safely calls convertToRegex()
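
The validate-at-load flow described above could look roughly like the following. This is a sketch under assumptions: parseRegexString is a hypothetical stand-in, not the actual validateRegexString or convertToRegex from this repository, and the /pattern/flags string format is inferred from the example configuration in the PR description.

```typescript
// Hypothetical stand-in for config-time validation: parse a "/pattern/flags"
// string and fail fast with the offending field path if it is not a valid regex.
function parseRegexString(value: string, fieldPath: string): RegExp {
  const parts = /^\/(.+)\/([a-z]*)$/.exec(value);
  if (parts === null) {
    throw new Error(fieldPath + ": expected the form /pattern/flags but got '" + value + "'");
  }
  try {
    // new RegExp throws a SyntaxError for an invalid pattern or invalid flags.
    return new RegExp(parts[1], parts[2]);
  } catch (e) {
    throw new Error(fieldPath + ": invalid regex '" + value + "': " + (e as Error).message);
  }
}

// Illustrative field path; the real path comes from the config extractor.
const fieldPath = 'engines.regex.custom_rules.DataWeaveEmailHeaderInjection.regex_ignore';

// Valid pattern: parsing succeeds, so the scan step can use the result safely.
const ignoreRegex = parseRegexString('/\\$\\(\\s*validatedMessageId\\s*\\)/gi', fieldPath);
console.log(ignoreRegex.test('$(validatedMessageId)'));

// Invalid pattern: the error surfaces at load time, before any scanning starts.
let loadError = '';
try {
  parseRegexString('/(unclosed/g', fieldPath);
} catch (e) {
  loadError = (e as Error).message;
}
console.log(loadError);
```

Failing during config loading keeps the scan loop free of error handling and points the user at the exact field that needs fixing.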

@aruntyagiTutu aruntyagiTutu merged commit b6c0106 into dev Mar 20, 2026
7 checks passed