DSM-3194 RDBMS | Masking | When certain regex is entered in Regular Expression masking and we generate samples or save, the UI freezes. #7
Merged
Mehak22852 merged 1 commit into master from Mar 19, 2026
Conversation
manojgarg061983
approved these changes
Mar 16, 2026
SimranKA223
approved these changes
Mar 16, 2026
 */
private static RegExp createRegExp(String regex) {
    String finalRegex = regex;
I'm not sure if this affects any customers, but we might need a migration script to make these adjustments in their saved policies, so they still get the same behavior as before.
Author
A migration script shouldn't be needed. The changes are in the backend processing of the regex. If such regex patterns are present in their saved policies, they will now produce the correct masked output as the backend code has been corrected to properly handle these regexes.
RyanLuMaye
suggested changes
Mar 17, 2026
Author
Suggested changes have been addressed. Please review.
ab217da to
7e0a079
ethan-wrasman-pkware
approved these changes
Mar 18, 2026
RyanLuMaye
approved these changes
Mar 18, 2026
…xpression masking and we generate samples or save, the UI freezes.
Issue: The regex used, ^[A-Za-z]+(?:[ '-][A-Za-z]+)*$, is an infinite regex, and it causes prepareRandom() to hang due to exhaustive backtracking in the algorithm.
Cause: The prepareRandom() method uses exhaustive backtracking to find a string of an exact target length. For infinite regexes (those with cycles like +, *), when the first random path doesn't produce the exact target length, the algorithm tries all remaining transitions at every level. With states that have multiple transitions (e.g., [A-Za-z] has 2 transitions covering 52 characters, looping to the same state), this creates exponential exploration: 2^N paths for depth N. For a target length of 50, that's 2^50 paths, causing the method to hang indefinitely.
Resolution: Added a global attempt counter (MAX_RANDOM_ATTEMPTS = 1000) that limits the total recursive exploration, applied only to infinite regexes. Finite regexes are unaffected since their search space is naturally bounded. When the budget is exhausted, the algorithm returns the best valid match found so far. Each call now makes at most 1000 attempts in the worst case instead of on the order of 2^N.
Additional changes:
1. Strip ^ and $ anchors: The brics automaton library (that Generex uses) does not support ^ and $ as anchors; it treats them as literal characters. This means a regex like ^[A-Za-z]+$ would generate strings like ^aBcDe$ with literal ^ and $ in the output. Added convertToBricsRegex preprocessing to strip ^ and $ anchors in createRegExp() before passing the regex to brics.
2. Convert non-capturing groups (?:...) to (...): The brics library does not support non-capturing group syntax. Without this conversion, (?:abc) would be interpreted as literal ?, :, a, b, c characters inside a group. Since Generex only generates strings and never extracts capture groups, converting (?: to ( is a lossless transformation: the grouping behavior is identical for generation purposes.
Issue:
The regex used, ^[A-Za-z]+(?:[ '-][A-Za-z]+)*$, is an infinite regex (it can match strings of unbounded length), and it causes prepareRandom() to hang due to exhaustive backtracking in the algorithm.
Cause:
The prepareRandom() method uses exhaustive backtracking to find a string of an exact target length. For infinite regexes (those with cycles like +, *), when the first random path doesn't produce the exact target length, the algorithm tries all remaining transitions at every level. With states that have multiple transitions (e.g., [A-Za-z] has 2 transitions, A-Z and a-z, covering 52 characters and looping to the same state), this creates exponential exploration: 2^N paths for depth N. For a target length of 50, that's 2^50 paths, causing the method to hang indefinitely.
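For a sense of scale, the arithmetic above can be checked with a few lines of Java. This is only an illustration of the 2^N estimate, assuming exactly two alternative transitions per level as in the description; `PathCount` and `worstCasePaths` are hypothetical names, not part of the PR.

```java
/** Hypothetical helper that reproduces the worst-case path estimate from the description. */
final class PathCount {

    /** Number of distinct paths when every level offers `branching` choices, for `depth` levels. */
    static long worstCasePaths(int branching, int depth) {
        long paths = 1;
        for (int i = 0; i < depth; i++) {
            // multiplyExact throws ArithmeticException instead of silently overflowing
            paths = Math.multiplyExact(paths, (long) branching);
        }
        return paths;
    }

    public static void main(String[] args) {
        // Two transitions per level, target length 50
        System.out.println(worstCasePaths(2, 50));
    }
}
```

Compared with a fixed budget of 1000 attempts, the unbounded search is larger by roughly twelve orders of magnitude.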
Resolution:
Added a global attempt counter (MAX_RANDOM_ATTEMPTS = 1000) that limits the total recursive exploration, applied only to infinite regexes. Finite regexes are unaffected since their search space is naturally bounded. When the budget is exhausted, the algorithm returns the best valid match found so far. Each call now makes at most 1000 attempts in the worst case instead of on the order of 2^N.
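The budgeted search can be sketched roughly as follows. This is not the PR's code: the real prepareRandom() walks a brics automaton, whereas this sketch assumes a tiny hand-built three-state machine for [A-Za-z]+([ '-][A-Za-z]+)*. Every name except MAX_RANDOM_ATTEMPTS is hypothetical, and "best valid match" is assumed here to mean the longest accepted string seen so far.

```java
import java.util.Random;

/** Hypothetical sketch of backtracking with a global attempt budget. */
final class BudgetedGenerator {
    static final int MAX_RANDOM_ATTEMPTS = 1000; // value taken from the PR description

    private static final String LETTERS =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
    private static final String SEPARATORS = " '-";

    private final Random random;
    private final int maxAttempts;
    private int attempts;      // shared by every recursion level, not reset per level
    private String best = "";  // longest accepted string seen so far

    BudgetedGenerator(long seed, int maxAttempts) {
        this.random = new Random(seed);
        this.maxAttempts = maxAttempts;
    }

    int attemptsUsed() {
        return attempts;
    }

    /** Returns a match of exactly targetLength, or the best match found if the budget runs out. */
    String generate(int targetLength) {
        attempts = 0;
        best = "";
        StringBuilder sb = new StringBuilder();
        return search(0, sb, targetLength) ? sb.toString() : best;
    }

    // States: 0 = start (expects a letter), 1 = after a letter (accepting),
    //         2 = after a separator (expects a letter).
    private boolean search(int state, StringBuilder sb, int target) {
        if (attempts >= maxAttempts) {
            return false; // budget exhausted: stop exploring, caller falls back to `best`
        }
        attempts++;
        if (state == 1) {
            if (sb.length() == target) {
                return true;
            }
            if (sb.length() > best.length()) {
                best = sb.toString();
            }
        }
        if (sb.length() >= target) {
            return false;
        }
        // Randomize which transition is tried first; the other is the backtracking alternative.
        int[] order = (state == 1 && random.nextBoolean()) ? new int[] {2, 1} : new int[] {1, 2};
        for (int next : order) {
            if (next == 2 && state != 1) {
                continue; // a separator is only legal directly after a letter
            }
            String alphabet = (next == 1) ? LETTERS : SEPARATORS;
            sb.append(alphabet.charAt(random.nextInt(alphabet.length())));
            if (search(next, sb, target)) {
                return true;
            }
            sb.setLength(sb.length() - 1); // backtrack
        }
        return false;
    }
}
```

The key design point is that the counter is global to one generate() call rather than per recursion level, so the total work is capped no matter how the branching compounds.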
Additional changes:
1. Strip ^ and $ anchors: The brics automaton library (that Generex uses) does not support ^ and $ as anchors; it treats them as literal characters. This means a regex like ^[A-Za-z]+$ would generate strings like ^aBcDe$ with literal ^ and $ in the output. The fix strips a leading ^ and a trailing $ (unless escaped as \$) in createRegExp() before passing the regex to brics.
2. Convert non-capturing groups (?:...) to (...): The brics library does not support non-capturing group syntax. Without this conversion, (?:abc) would be interpreted as literal ?, :, a, b, c characters inside a group. Since Generex only generates strings and never extracts capture groups, converting (?: to ( is a lossless transformation: the grouping behavior is identical for generation purposes.
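The two preprocessing steps could look roughly like the sketch below. The PR's actual implementation is not visible on this page, so the class name and method bodies are assumptions; only the behavior (strip a leading ^ and an unescaped trailing $, rewrite (?: to () comes from the description. The scanner uses a simplified view of character classes and does not handle edge cases such as a literal ] as the first class member.

```java
/** Hypothetical preprocessing helper; bodies are guesses at the described behavior. */
final class BricsRegexPreprocessor {

    /** Applies both rewrites: anchors first, then non-capturing groups. */
    static String convertToBricsRegex(String regex) {
        return toCapturingGroups(stripAnchors(regex));
    }

    /** Removes a leading ^ and a trailing $ unless that $ is escaped. */
    static String stripAnchors(String regex) {
        String result = regex;
        if (result.startsWith("^")) {
            result = result.substring(1);
        }
        if (result.endsWith("$") && !isEscaped(result, result.length() - 1)) {
            result = result.substring(0, result.length() - 1);
        }
        return result;
    }

    /** A character is escaped when an odd number of backslashes precedes it. */
    private static boolean isEscaped(String s, int index) {
        int backslashes = 0;
        for (int i = index - 1; i >= 0 && s.charAt(i) == '\\'; i--) {
            backslashes++;
        }
        return backslashes % 2 == 1;
    }

    /** Rewrites (?: to ( outside escapes and character classes. */
    static String toCapturingGroups(String regex) {
        StringBuilder out = new StringBuilder(regex.length());
        boolean inClass = false;
        for (int i = 0; i < regex.length(); i++) {
            char c = regex.charAt(i);
            if (c == '\\' && i + 1 < regex.length()) {
                out.append(c).append(regex.charAt(++i)); // copy the escaped pair verbatim
            } else if (inClass) {
                if (c == ']') {
                    inClass = false;
                }
                out.append(c);
            } else if (c == '[') {
                inClass = true;
                out.append(c);
            } else if (c == '(' && regex.startsWith("?:", i + 1)) {
                out.append('(');
                i += 2; // skip the "?:" marker
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

With this sketch, the regex from the issue, ^[A-Za-z]+(?:[ '-][A-Za-z]+)*$, becomes [A-Za-z]+([ '-][A-Za-z]+)*, while an escaped \$ or \( and a literal (?: inside a character class are left untouched.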