DSM-3194 RDBMS | Masking | When certain regex is entered in Regular Expression masking and we generate samples or save, the UI freezes.#7

Merged
Mehak22852 merged 1 commit into master from mn/dsm-3194 on Mar 19, 2026

@Mehak22852 Mehak22852 commented Mar 13, 2026

Issue - The regex used, ^[A-Za-z]+(?:[ '-][A-Za-z]+)*$, is an infinite regex (it matches strings of unbounded length), and it causes prepareRandom() to hang because of exhaustive backtracking in the algorithm.

Cause:
The prepareRandom() method uses exhaustive backtracking to find a string of an exact target length. For infinite regexes (those with cycles like +, *), when the first random path doesn't produce the exact target length, the algorithm tries all remaining transitions at every level. With states that have multiple transitions (e.g., [A-Za-z] has 2 transitions covering 52 characters, looping to the same state), this creates exponential exploration — 2^N paths for depth N. For a target length of 50, that's 2^50 paths, causing the method to hang indefinitely.
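
To make the growth concrete, here is a small counting sketch. It does not use Generex or brics: `PathCountSketch` and `countPaths` are hypothetical names, and it assumes the worst case described above, where every state offers two transitions and no path succeeds early.

```java
// Illustrative only: counts paths in an idealized search tree, not in Generex.
final class PathCountSketch {
    // Number of complete paths of the given depth when every state offers
    // `branching` transitions and the search never terminates early.
    static long countPaths(int depth, int branching) {
        if (depth == 0) {
            return 1;
        }
        long total = 0;
        for (int i = 0; i < branching; i++) {
            total += countPaths(depth - 1, branching);
        }
        return total;
    }

    public static void main(String[] args) {
        // Enumerate small depths to show the doubling per level.
        for (int n = 5; n <= 20; n += 5) {
            System.out.println(n + " -> " + countPaths(n, 2));
        }
        // Depth 50 is computed rather than enumerated: 2^50 = 1125899906842624.
        System.out.println("50 -> " + (1L << 50));
    }
}
```

Each extra level doubles the work, which is why a target length of 50 is effectively unbounded for an exhaustive search.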

Resolution:
Added a global attempt counter (MAX_RANDOM_ATTEMPTS = 1000) that limits the total recursive exploration, applied only to infinite regexes. Finite regexes are unaffected since their search space is naturally bounded. When the budget is exhausted, the algorithm returns the best valid match found so far. The work per call is now bounded by the 1000-attempt budget in the worst case instead of growing as O(2^N).
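
The idea can be sketched roughly as follows. This is not the actual prepareRandom() code: it uses a hand-built toy automaton for ([A-Za-z][A-Za-z])+ instead of brics, the class and method names are hypothetical, and it assumes "best valid match" means the accepted string closest to the target length. Only the MAX_RANDOM_ATTEMPTS value comes from this PR.

```java
import java.util.Random;

// Illustrative only: a hand-built toy automaton, not the Generex/brics code.
final class BoundedBacktrackingSketch {
    // Budget value taken from the PR description.
    static final int MAX_RANDOM_ATTEMPTS = 1000;

    // Each state has two transitions, [A-Z] and [a-z].
    private static final char[][] RANGES = {{'A', 'Z'}, {'a', 'z'}};

    private final int targetLength;
    private final Random random;
    private int attempts;
    private String best; // longest accepted string seen so far (assumed meaning of "best")

    BoundedBacktrackingSketch(int targetLength, Random random) {
        this.targetLength = targetLength;
        this.random = random;
    }

    // Returns an exact-length match if one is found within the budget,
    // otherwise the best fallback (null if no accepted string was seen).
    String generate() {
        attempts = 0;
        best = null;
        search(new StringBuilder(), 0);
        return best;
    }

    int attempts() {
        return attempts;
    }

    // Toy language ([A-Za-z][A-Za-z])+ : state 0 = even length, state 1 = odd length.
    // Returns true once a string of exactly targetLength is accepted.
    private boolean search(StringBuilder sb, int state) {
        if (attempts >= MAX_RANDOM_ATTEMPTS) {
            return false; // budget exhausted: unwind without exploring further
        }
        attempts++;
        boolean accepting = state == 0 && sb.length() > 0;
        if (accepting && (best == null || sb.length() > best.length())) {
            best = sb.toString();
        }
        if (sb.length() == targetLength) {
            return accepting;
        }
        int first = random.nextInt(2); // try the two transitions in random order
        for (int i = 0; i < 2; i++) {
            char[] range = RANGES[(first + i) % 2];
            sb.append((char) (range[0] + random.nextInt(range[1] - range[0] + 1)));
            if (search(sb, 1 - state)) {
                return true;
            }
            sb.setLength(sb.length() - 1); // backtrack
        }
        return false;
    }

    public static void main(String[] args) {
        // 41 is unreachable in this toy language, so an unbounded search would
        // explore on the order of 2^41 paths before giving up.
        BoundedBacktrackingSketch sketch = new BoundedBacktrackingSketch(41, new Random());
        String result = sketch.generate();
        System.out.println(result.length() + " chars after " + sketch.attempts() + " attempts: " + result);
    }
}
```

With an odd target such as 41 the search stops once the budget is spent and falls back to a 40-character match; with a reachable target it returns on the first dive and never touches the limit.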

Additional changes:

  1. Strip ^ and $ anchors: The brics automaton library (that Generex uses) does not support ^ and $ as anchors; it treats them as literal characters. This means a regex like ^[A-Za-z]+$ would generate strings like ^aBcDe$ with literal ^ and $ in the output. The fix strips a leading ^ and a trailing $ (unless the $ is escaped as \$) in createRegExp() before passing the regex to brics.

  2. Convert non-capturing groups (?:...) to (...): The brics library does not support non-capturing group syntax. Without this conversion, (?:abc) would be interpreted as literal ?, :, a, b, c characters inside a group. Since Generex only generates strings and never extracts capture groups, converting (?: to ( is a lossless transformation — the grouping behavior is identical for generation purposes.
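
A minimal sketch of the two preprocessing steps, for illustration. The class and method names below are hypothetical and this is not the code merged in this PR; it also makes two assumptions the description does not state: escaped characters and simple [...] character classes are copied through unchanged, and a character class never begins with a literal ].

```java
// Illustrative only: hypothetical helper, not the merged implementation.
final class BricsRegexPreprocessSketch {

    static String toBricsCompatible(String regex) {
        String r = regex;
        // Step 1: strip a leading ^ and an unescaped trailing $.
        if (r.startsWith("^")) {
            r = r.substring(1);
        }
        if (r.endsWith("$") && !isEscaped(r, r.length() - 1)) {
            r = r.substring(0, r.length() - 1);
        }
        // Step 2: rewrite (?: as ( outside escapes and character classes.
        StringBuilder out = new StringBuilder(r.length());
        boolean inClass = false;
        for (int i = 0; i < r.length(); i++) {
            char c = r.charAt(i);
            if (c == '\\' && i + 1 < r.length()) {
                out.append(c).append(r.charAt(++i)); // copy the escaped pair untouched
            } else if (inClass) {
                if (c == ']') {
                    inClass = false;
                }
                out.append(c);
            } else if (c == '[') {
                inClass = true;
                out.append(c);
            } else if (c == '(' && r.startsWith("?:", i + 1)) {
                out.append('(');
                i += 2; // skip the "?:" marker
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    // A character is escaped when it is preceded by an odd number of backslashes.
    private static boolean isEscaped(String s, int index) {
        int backslashes = 0;
        for (int i = index - 1; i >= 0 && s.charAt(i) == '\\'; i--) {
            backslashes++;
        }
        return backslashes % 2 == 1;
    }

    public static void main(String[] args) {
        System.out.println(toBricsCompatible("^[A-Za-z]+(?:[ '-][A-Za-z]+)*$"));
    }
}
```

For the regex from the issue, this yields [A-Za-z]+([ '-][A-Za-z]+)*, while a pattern ending in \$ keeps its literal dollar sign.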

@Mehak22852 Mehak22852 self-assigned this Mar 13, 2026

@VipinKumar110 VipinKumar110 left a comment

Code looks fine to me.

@Mehak22852 Mehak22852 marked this pull request as ready for review March 16, 2026 10:30
*/
private static RegExp createRegExp(String regex) {
String finalRegex = regex;

I'm not sure if this affects any customers, but we might need a migration script to make these adjustments in their saved policies, so they still get the same behavior as before.

Author

A migration script shouldn't be needed. The changes are in the backend processing of the regex. If such regex patterns are present in their saved policies, they will now produce the correct masked output as the backend code has been corrected to properly handle these regexes.

@Mehak22852
Author

Suggested changes have been addressed. Please review.
@ethan-wrasman-pkware, @RyanLuMaye

@Mehak22852 Mehak22852 force-pushed the mn/dsm-3194 branch 2 times, most recently from ab217da to 7e0a079 Compare March 17, 2026 19:28
…xpression masking and we generate samples or save, the UI freezes.

Issue - The regex used, ^[A-Za-z]+(?:[ '-][A-Za-z]+)*$, is an infinite regex, and it causes prepareRandom() to hang because of exhaustive backtracking in the algorithm.

Cause - The prepareRandom() method uses exhaustive backtracking to find a string of an exact target length. For infinite regexes (those with cycles like +, *), when the first random path doesn't produce the exact target length, the algorithm tries all remaining transitions at every level. With states that have multiple transitions (e.g., [A-Za-z] has 2 transitions covering 52 characters, looping to the same state), this creates exponential exploration: 2^N paths for depth N. For a target length of 50, that's 2^50 paths, causing the method to hang indefinitely.

Resolution:
Added a global attempt counter (MAX_RANDOM_ATTEMPTS = 1000) that limits the total recursive exploration, applied only to infinite regexes. Finite regexes are unaffected since their search space is naturally bounded. When the budget is exhausted, the algorithm returns the best valid match found so far. The work per call is now bounded by the 1000-attempt budget in the worst case instead of growing as O(2^N).

Additional changes:

1. Strip ^ and $ anchors: The brics automaton library (that Generex uses) does not support ^ and $ as anchors; it treats them as literal characters. This means a regex like ^[A-Za-z]+$ would generate strings like ^aBcDe$ with literal ^ and $ in the output. Added convertToBricsRegex preprocessing to strip ^ and $ anchors in createRegExp() before passing the regex to brics.

2. Convert non-capturing groups (?:...) to (...): The brics library does not support non-capturing group syntax. Without this conversion, (?:abc) would be interpreted as literal ?, :, a, b, c characters inside a group. Since Generex only generates strings and never extracts capture groups, converting (?: to ( is a lossless transformation — the grouping behavior is identical for generation purposes.
@Mehak22852 Mehak22852 merged commit a9d337e into master Mar 19, 2026
2 checks passed