Null-deref in RE2 DFA/hash-set teardown triggered by extremely small max_mem option #589

@hgarrereyn

Hi, there is a potential bug in the RE2 teardown pathway triggered by a small max_mem option.

This bug was reproduced on e7aec59.

Description

What crashes

  • The crash is a NULL read in absl::container_internal::raw_hash_set iterator construction (raw_hash_set.h:597) where the iterator stores and dereferences a generation pointer that is NULL.
  • In the original stack, this happens while RE2’s compiler consults a small internal FlatHashMap cache (Compiler::CachedRuneByteSuffix). In the A/B test with a trivial pattern, the same issue appears in DFA::ClearCache during RE2 teardown. Both paths ultimately construct absl flat_hash_* iterators whose generation_ptr is NULL.

Why it happens

  • The fuzzer set RE2::Options::max_mem to an extremely small value (4096 bytes). RE2 logs “DFA out of memory” and later tears down its DFA/cache structures, which are backed by absl::flat_hash_* containers.
  • With the tiny budget, the associated absl containers end up in a state where begin()/end()/find() operate on an uninitialized ‘generation’ pointer. Abseil’s raw_hash_set expects a valid generation pointer even for empty tables; passing NULL causes a NULL dereference when constructing an iterator.
  • This does not appear to depend on the pattern: in our A/B check, even the trivial pattern "A" reproduces under this option setting, so no crafted text or particular regex feature is required.

Notes

  • Two code paths exhibit the same underlying issue: (1) Compiler::CachedRuneByteSuffix using a FlatHashMap, and (2) DFA::ClearCache iterating a FlatHashSet. Both rely on absl::raw_hash_set begin()/end()/find() reading a generation pointer that is NULL when the table has not been properly initialized due to out-of-memory or zero-capacity setup under tiny max_mem. Auditing initialization and empty/capacity==0 handling for these containers should resolve both manifestations.

POC

The following testcase demonstrates the bug:

testcase.cpp

#include <string>
#include "/fuzz/install/include/re2/re2.h"

int main() {
  re2::RE2::Options opts;
  // Tiny DFA memory budget triggers the bug.
  opts.set_max_mem(4096);
  // Any trivial pattern reproduces; even "A" is enough.
  re2::RE2 re("A", opts);
  // Touch the object to ensure construction/destruction occurs.
  bool m = re2::RE2::PartialMatch("fooAbar", re);
  (void)m;
  return 0;  // Crash occurs during construction/teardown under ASan.
}

stdout


stderr

WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1767886920.891102       1 re2.cc:783] DFA out of memory: pattern length 1, program size 5, list count 4, bytemap range 2
AddressSanitizer:DEADLYSIGNAL
=================================================================
==1==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x7f782c2a2a0d bp 0x7ffe4056b510 sp 0x7ffe4056b3e0 T0)
==1==The signal is caused by a READ memory access.
==1==Hint: address points to the zero page.
    #0 0x7f782c2a2a0d in absl::container_internal::raw_hash_set<absl::container_internal::FlatHashSetPolicy<re2::DFA::State*>, re2::DFA::StateHash, re2::DFA::StateEqual, std::allocator<re2::DFA::State*>>::end() (/fuzz/install/lib/libre2.so.11+0x4ea0d) (BuildId: 32ec25d1e89973f2ee3c7888dc08d675ebdafaf0)
    #1 0x7f782c2a2df8 in absl::container_internal::raw_hash_set<absl::container_internal::FlatHashSetPolicy<re2::DFA::State*>, re2::DFA::StateHash, re2::DFA::StateEqual, std::allocator<re2::DFA::State*>>::begin() (/fuzz/install/lib/libre2.so.11+0x4edf8) (BuildId: 32ec25d1e89973f2ee3c7888dc08d675ebdafaf0)
    #2 0x7f782c296ed0 in re2::DFA::ClearCache() (/fuzz/install/lib/libre2.so.11+0x42ed0) (BuildId: 32ec25d1e89973f2ee3c7888dc08d675ebdafaf0)
    #3 0x7f782c296c68 in re2::DFA::~DFA() (/fuzz/install/lib/libre2.so.11+0x42c68) (BuildId: 32ec25d1e89973f2ee3c7888dc08d675ebdafaf0)
    #4 0x7f782c29f010 in re2::Prog::DeleteDFA(re2::DFA*) (/fuzz/install/lib/libre2.so.11+0x4b010) (BuildId: 32ec25d1e89973f2ee3c7888dc08d675ebdafaf0)
    #5 0x7f782c2fef91 in re2::Prog::~Prog() (/fuzz/install/lib/libre2.so.11+0xaaf91) (BuildId: 32ec25d1e89973f2ee3c7888dc08d675ebdafaf0)
    #6 0x7f782c319c42 in re2::RE2::~RE2() (/fuzz/install/lib/libre2.so.11+0xc5c42) (BuildId: 32ec25d1e89973f2ee3c7888dc08d675ebdafaf0)
    #7 0x55798544d6ca in main /fuzz/testcase.cpp:14:1
    #8 0x7f782bd04d8f in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
    #9 0x7f782bd04e3f in __libc_start_main csu/../csu/libc-start.c:392:3
    #10 0x557985372334 in _start (/fuzz/test+0x2c334) (BuildId: 367e7a2f221b119096a3fee9ed226284252ef68e)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV (/fuzz/install/lib/libre2.so.11+0x4ea0d) (BuildId: 32ec25d1e89973f2ee3c7888dc08d675ebdafaf0) in absl::container_internal::raw_hash_set<absl::container_internal::FlatHashSetPolicy<re2::DFA::State*>, re2::DFA::StateHash, re2::DFA::StateEqual, std::allocator<re2::DFA::State*>>::end()
==1==ABORTING

Steps to Reproduce

The crash was triaged with the following Dockerfile:

Dockerfile

# Ubuntu 22.04 with some packages pre-installed
FROM hgarrereyn/stitch_repro_base@sha256:3ae94cdb7bf2660f4941dc523fe48cd2555049f6fb7d17577f5efd32a40fdd2c

RUN git clone https://github.com/google/re2.git /fuzz/src && \
    cd /fuzz/src && \
    git checkout e7aec59 && \
    git submodule update --init --remote --recursive

ENV LD_LIBRARY_PATH=/fuzz/install/lib
ENV ASAN_OPTIONS=hard_rss_limit_mb=1024:detect_leaks=0

RUN echo '#!/bin/bash\nexec clang-17 -fsanitize=address -O0 "$@"' > /usr/local/bin/clang_wrapper && \
    chmod +x /usr/local/bin/clang_wrapper && \
    echo '#!/bin/bash\nexec clang++-17 -fsanitize=address -O0 "$@"' > /usr/local/bin/clang_wrapper++ && \
    chmod +x /usr/local/bin/clang_wrapper++

# Install build tools and pkg-config (Abseil is built from source below)
RUN apt-get update && \
    DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
      make pkg-config ca-certificates cmake ninja-build && \
    rm -rf /var/lib/apt/lists/*

WORKDIR /tmp
RUN git clone https://github.com/abseil/abseil-cpp.git && \
    cd abseil-cpp && \
    git checkout 735c86164a69141f33ccfcb20ecf1b9254be32a7 && \
    cd .. && \
    cmake -S abseil-cpp -B absl-build \
        -DCMAKE_CXX_STANDARD=17 \
        -DCMAKE_CXX_STANDARD_REQUIRED=ON \
        -DCMAKE_BUILD_TYPE=Release \
        -DCMAKE_POSITION_INDEPENDENT_CODE=ON \
        -DCMAKE_INSTALL_PREFIX=/fuzz/install \
        -DBUILD_SHARED_LIBS=ON && \
    cmake --build absl-build -j$(nproc) && \
    cmake --install absl-build && \
    rm -rf /tmp/abseil-cpp /tmp/absl-build

# Configure and build RE2
WORKDIR /fuzz/src
RUN cmake -S . -B build \
    -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_CXX_STANDARD=17 \
    -DABSL_PROPAGATE_CXX_STD=ON \
    -DABSL_ENABLE_INSTALL=ON \
    -DCMAKE_CXX_STANDARD_REQUIRED=ON \
    -DCMAKE_C_COMPILER=clang_wrapper \
    -DCMAKE_CXX_COMPILER=clang_wrapper++ \
    -DCMAKE_INSTALL_PREFIX=/fuzz/install \
    -DCMAKE_PREFIX_PATH=/fuzz/install \
    -DBUILD_SHARED_LIBS=ON \
    -DCMAKE_SKIP_BUILD_RPATH=FALSE \
    -DCMAKE_BUILD_RPATH=/fuzz/install/lib \
    -DCMAKE_INSTALL_RPATH=/fuzz/install/lib \
    -DCMAKE_INSTALL_RPATH_USE_LINK_PATH=TRUE \
    -DRE2_BUILD_TESTING=OFF && \
    cmake --build build -j$(nproc) && \
    cmake --install build

Build Command

clang++-17 -fsanitize=address -g -O0 -o /fuzz/test /fuzz/testcase.cpp -I/fuzz/install/include -L/fuzz/install/lib -lre2 -pthread && /fuzz/test

Reproduce

  1. Copy Dockerfile and testcase.cpp into a local folder.
  2. Build the repro image:
docker build . -t repro --platform=linux/amd64
  3. Compile and run the testcase in the image:
docker run \
    -it --rm \
    --platform linux/amd64 \
    --mount type=bind,source="$(pwd)/testcase.cpp",target=/fuzz/testcase.cpp \
    repro \
    bash -c "clang++-17 -fsanitize=address -g -O0 -o /fuzz/test /fuzz/testcase.cpp -I/fuzz/install/include -L/fuzz/install/lib -lre2 -pthread && /fuzz/test"


Additional Info

This testcase was discovered by STITCH, an autonomous fuzzing system. All reports are reviewed manually (by a human) before submission.
