Skip to content

Unexpected exception and results with unmatched prefix (or suffix) #31

@dylex

Description

@dylex

The pure versions of regex match extraction functions, Text.ICU.prefix, suffix, and (possibly) group do not correctly handle the case where a group is in a regex but is not used in a match. For example "a(b)?c" against "ac" or "(a)|b" against "b". They assume that start_ and end_ return -1 only when the grouping is out of range, but in fact they can when a grouping does not fire.

> prefix 1 =<< find (regex [] "abc(def)?ghi") "xabcghiy"
*** Exception: Data.Text.Array.new: size overflow
CallStack (from HasCallStack):
  error, called at ./Data/Text/Array.hs:129:20 in text-1.2.2.1-FeA6fTH3E2n883cNXIS2Li:Data.Text.Array
> suffix 1 =<< find (regex [] "abc(def)?ghi") "xabcghiy"
Just "\NULxabcghiy"

An out of bounds range gives the expected results:

> prefix 2 =<< find (regex [] "abc(def)?ghi") "xabcghiy"
Nothing
> suffix 2 =<< find (regex [] "abc(def)?ghi") "xabcghiy"
Nothing

group possibly does right thing, but not for the right reason (it extracts -1 to -1), and perhaps should return Nothing instead:

> group 1 =<< find (regex [] "abc(def)?ghi") "xabcghiy"
Just ""

One solution would be to use the safe underlying start and end functions instead, returning Nothing for any underlying Nothing. Happy to submit a PR for this approach.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions