Japanese | License | Changelog
A C++11 wrapper around the Oniguruma regular expression engine that provides a std::regex-like interface. This library emphasizes ease of use and compatibility.
It offers an API style similar to the standard library's <regex>, while exposing Oniguruma's powerful features.
- Access Oniguruma features with a
std::regex-like interface - Easy build using CMake
- Compatibility tests against the standard library implementation (CI tests multiple configurations)
- C++11-compatible compiler (Visual Studio 2015+, GCC, Clang, etc.)
- CMake >= 3.10 (required by the build scripts)
- git (for submodule support)
This project is intended to work with modern C++11-compatible compilers (MSVC 2015+, GCC 5+, Clang 3.8+). CI tests commonly include Windows, Linux, and recent macOS toolchains. If you rely on a different toolchain, please open an issue or test locally.
CI also tests builds under MSYS2 on Windows with both MINGW64 (x86_64) and MINGW32 (i686) environments using the MinGW-w64 toolchain.
#define USE_ONIGPP // toggle between onigpp and std (for example/demo only)
#ifdef USE_ONIGPP
#include "onigpp.h"
namespace rex = onigpp;
rex::auto_init g_auto_init;
#else
#include <regex>
namespace rex = std;
#endif
#include <iostream>
#include <string>
int main() {
std::string s = "hello 123 world";
rex::regex r(R"(\d+)");
rex::smatch m;
if (rex::regex_search(s, m, r)) {
std::cout << "matched: " << m.str() << std::endl;
}
return 0;
}Note: The
#define USE_ONIGPPtoggle in the example above is only for demonstration purposes to show how code can switch between onigpp andstd::regex. In actual projects, simply includeonigpp.hand link to the onigpp library.
Recommended steps (building from source):
git clone https://github.com/katahiromz/onigpp
cd onigpp
git submodule update --init --recursiveOut-of-source build (portable style):
mkdir build
cd build
cmake ..
cmake --build .CMake shorthand (CMake 3.13+):
cmake -S . -B build
cmake --build buildCommon CMake options:
-DCMAKE_BUILD_TYPE=Releaseor-DCMAKE_BUILD_TYPE=Debug-DUSE_STD_FOR_TESTS=ONor-DUSE_STD_FOR_TESTS=OFF(useful to run the same compatibility checks locally as CI)-DBUILD_SHARED_LIBS=ONor-DBUILD_SHARED_LIBS=OFF
On Windows (MSVC), specify the Visual Studio generator when running cmake.
Run the project's tests:
cmake --build build --target testOr run CTest directly from the build directory:
cd build
ctest --output-on-failureCI runs tests with both USE_STD_FOR_TESTS=ON and OFF to verify compatibility.
The USE_STD_FOR_TESTS CMake option allows you to run the compatibility tests using std::regex instead of onigpp. This is useful for verifying that the test patterns themselves are valid and that onigpp behaves consistently with the standard library.
# Build with USE_STD_FOR_TESTS enabled
cmake -S . -B build -DUSE_STD_FOR_TESTS=ON
cmake --build build
# Run the ecmascript compatibility test
./build/ecmascript_compat_testOnigpp provides an ECMAScript compatibility mode to make migrating from std::regex easier. Here's what you need to know:
ECMAScript is the default grammar when no flags are specified, matching the behavior of std::regex. This means:
#include "onigpp.h"
namespace rex = onigpp;
rex::auto_init g_auto_init;
// Both of these are equivalent - ECMAScript is the default
rex::regex r1(R"(\d+)"); // default: ECMAScript
rex::regex r2(R"(\d+)", rex::regex::ECMAScript); // explicit: ECMAScriptSince ECMAScript is the default, you typically don't need to specify it explicitly:
#include "onigpp.h"
namespace rex = onigpp;
rex::auto_init g_auto_init;
// ECMAScript mode (default, same as std::regex)
rex::regex pattern(R"(\d+)");The ECMAScript mode in onigpp provides:
-
Escape Sequences:
\xHH- Hexadecimal escape (e.g.,\x41for 'A')\uHHHH- Unicode escape (e.g.,\u00E9for 'é')\0- Null character (when not followed by digit)
-
Dot Behavior:
- By default, dot (
.) does NOT match newline characters (same as std::regex)
- By default, dot (
-
Replacement Templates:
$&- Whole match$1,$2, ... - Capture groups- `$`` - Text before match (prefix)
$'- Text after match (suffix)$$- Literal dollar sign${name}- Named capture groups
Some features may behave differently from std::regex:
-
Multiline Mode:
- The
multilineflag emulates ECMAScript semantics when combined withECMAScriptmode - When both
ECMAScriptandmultilineflags are set, onigpp rewrites the pattern at compile time to make^and$match at line boundaries. The recognized line separators are:\n,\r,\r\n, U+2028 (Line Separator), and U+2029 (Paragraph Separator) - This emulation preserves ECMAScript semantics: dot (
.) still does NOT match newlines (controlled separately by a potential future dotall flag) - Performance note: Pattern rewriting adds a small CPU cost at regex construction time, but has no runtime matching overhead
- Limitations: The rewrite handles common cases (unescaped
^and$outside character classes). Complex or unusual patterns may have edge cases; please report issues if you find any
- The
-
Named Captures:
- Oniguruma uses
(?<name>...)syntax - ECMAScript also supports this syntax
- Oniguruma uses
-
Lookahead/Lookbehind:
- Both positive and negative lookahead are supported
- Lookbehind is also available (an extension beyond basic ECMAScript)
onigpp exposes a set of syntax and compilation flags that let you control matching behavior and compilation options. These are available as enum flags on the wrapper (use bitwise OR to combine). Commonly used flags include:
ECMAScript(default): Use ECMAScript grammar (same asstd::regexdefault).icase: Case-insensitive matching (similar tostd::regex_constants::icase).nosubs: Do not store submatch results (can be used when only a boolean match is needed).multiline: Emulate ECMAScript multiline behavior so that^and$match at line boundaries (see "Multiline Mode").oniguruma: Enable Oniguruma's native syntax and behavior. When this flag is set, Oniguruma's default regex syntax is used instead of ECMAScript.optimize/ compilation hints: Some wrappers provide hints to request optimized compilation; behavior may be implementation-specific.
Usage example:
#include "onigpp.h"
namespace rex = onigpp;
rex::auto_init g_auto_init;
// Combine flags with bitwise OR:
rex::regex r(R"(^line\d+)", rex::regex::ECMAScript | rex::regex::multiline | rex::regex::icase);Notes:
- The exact set and names of flags exposed by the onigpp wrapper are defined in
onigpp.h. For advanced, engine-specific Oniguruma options and encoding-related settings, consultonigpp.hand the Oniguruma documentation. - Flags affect how patterns are compiled and matched; some Oniguruma-specific behaviors may not have direct equivalents in
std::regex.
Before (std::regex):
#include <regex>
std::string text = "Price: $100";
std::regex rex(R"(\$(\d+))");
std::string result = std::regex_replace(text, rex, "$$$1.00");
// Result: "Price: $100.00"After (onigpp with ECMAScript mode):
#include "onigpp.h"
onigpp::auto_init init;
std::string text = "Price: $100";
onigpp::regex rex(R"(\$(\d+))", onigpp::regex::ECMAScript);
std::string result = onigpp::regex_replace(text, rex, "$$$1.00");
// Result: "Price: $100.00"With the multiline emulation, ^ and $ match at line boundaries:
#include "onigpp.h"
onigpp::auto_init init;
std::string text = "line1\nline2\nline3";
onigpp::regex rex("^line\\d", onigpp::regex::ECMAScript | onigpp::regex::multiline);
// Find all lines starting with "line" followed by a digit
auto begin = onigpp::sregex_iterator(text.begin(), text.end(), rex);
auto end = onigpp::sregex_iterator();
for (auto it = begin; it != end; ++it) {
std::cout << "Match: " << it->str() << "\n";
}
// Output:
// Match: line1
// Match: line2
// Match: line3Note: Even with multiline mode, the dot (.) still does NOT match newlines, preserving ECMAScript semantics.
The test suite includes comprehensive ECMAScript compatibility tests. You can run them with:
cmake --build build --target ecmascript_compat_test
./build/ecmascript_compat_test- RE.md — Oniguruma++ regular expression reference (English)
- RE_ja.md — Oniguruma++ regular expression reference (Japanese)
- API.md — Oniguruma C API reference (English)
- onigpp.h — main header
Contributions are welcome! Here's how to get started:
-
Run tests before submitting: Ensure all tests pass locally.
cmake -S . -B build cmake --build build cd build && ctest --output-on-failure
-
Run compatibility tests: The project includes a compatibility harness to verify behavior against
std::regex.cmake -S . -B build -DUSE_STD_FOR_TESTS=ON cmake --build build cd build && ctest --output-on-failure
-
General guidelines:
- Keep changes focused and minimal
- Update
CHANGELOG.mdif adding significant features or fixes - Add or update tests for new functionality
- BSD-2-Clause (see LICENSE file for details)
- Maintainer: Katayama Hirofumi MZ katayama.hirofumi.mz@gmail.com
- Issues: https://github.com/katahiromz/onigpp/issues