Unexpected exception and results with unmatched prefix (or suffix) #31
Open
Description
The pure versions of the regex match extraction functions, Text.ICU.prefix, suffix, and (possibly) group, do not correctly handle a capture group that appears in the regex but does not participate in the match, for example "a(b)?c" against "ac" or "(a)|b" against "b". They assume that start_ and end_ return -1 only when the group index is out of range, but in fact they also return -1 when a group does not fire.
> prefix 1 =<< find (regex [] "abc(def)?ghi") "xabcghiy"
*** Exception: Data.Text.Array.new: size overflow
CallStack (from HasCallStack):
error, called at ./Data/Text/Array.hs:129:20 in text-1.2.2.1-FeA6fTH3E2n883cNXIS2Li:Data.Text.Array
> suffix 1 =<< find (regex [] "abc(def)?ghi") "xabcghiy"
Just "\NULxabcghiy"
An out-of-range group index gives the expected results:
> prefix 2 =<< find (regex [] "abc(def)?ghi") "xabcghiy"
Nothing
> suffix 2 =<< find (regex [] "abc(def)?ghi") "xabcghiy"
Nothing
group possibly does the right thing, but not for the right reason (it extracts the range -1 to -1), and perhaps it should return Nothing instead:
> group 1 =<< find (regex [] "abc(def)?ghi") "xabcghiy"
Just ""
One solution would be to use the safe underlying start and end functions instead, returning Nothing whenever either of them returns Nothing. Happy to submit a PR for this approach.
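To make the proposal concrete, here is a minimal sketch of the intended behaviour. It does not use text-icu at all: MatchModel, spanOf, prefixSafe, suffixSafe, and groupSafe are hypothetical names invented for illustration, and the match is modelled as a plain String plus a list of optional group spans. The point is only that a single Maybe-returning lookup covers both the out-of-range case and the group-did-not-fire case.

```haskell
module Main where

-- Hypothetical stand-in for a regex match (not the text-icu Match type):
-- the subject string plus one span per capture group, where index 0 is
-- the whole match. Spans are (start, end) with an exclusive end.
-- A group that did not participate in the match is recorded as Nothing.
data MatchModel = MatchModel
  { subject :: String
  , spans   :: [Maybe (Int, Int)]
  }

-- Safe span lookup: Nothing if the index is out of range OR if the
-- group exists in the regex but did not fire.
spanOf :: Int -> MatchModel -> Maybe (Int, Int)
spanOf n m
  | n < 0 || n >= length (spans m) = Nothing
  | otherwise                      = spans m !! n

-- Text before the start of group n.
prefixSafe :: Int -> MatchModel -> Maybe String
prefixSafe n m = do
  (s, _) <- spanOf n m
  pure (take s (subject m))

-- Text after the end of group n.
suffixSafe :: Int -> MatchModel -> Maybe String
suffixSafe n m = do
  (_, e) <- spanOf n m
  pure (drop e (subject m))

-- Text captured by group n.
groupSafe :: Int -> MatchModel -> Maybe String
groupSafe n m = do
  (s, e) <- spanOf n m
  pure (take (e - s) (drop s (subject m)))

-- Models "abc(def)?ghi" matched against "xabcghiy":
-- group 0 covers "abcghi" (offsets 1 to 7), group 1 did not fire.
example :: MatchModel
example = MatchModel "xabcghiy" [Just (1, 7), Nothing]

main :: IO ()
main = do
  print (prefixSafe 0 example)  -- Just "x"
  print (suffixSafe 0 example)  -- Just "y"
  print (groupSafe 0 example)   -- Just "abcghi"
  print (prefixSafe 1 example)  -- Nothing (group did not fire)
  print (suffixSafe 1 example)  -- Nothing (group did not fire)
  print (groupSafe 1 example)   -- Nothing (group did not fire)
  print (prefixSafe 2 example)  -- Nothing (index out of range)
```

In this model an unmatched group and an out-of-range index are indistinguishable to the caller, which matches the "return Nothing for any underlying Nothing" behaviour suggested above. Whether group should return Nothing or Just "" for a group that did not fire remains a design decision for the library.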