This repo contains my Python implementation of a Non-deterministic Finite Automaton (NFA) Regex engine. The benefit of an NFA Regex engine is its support for advanced features (such as backreferences, lazy quantifiers, and lookarounds), though it faces the risk of catastrophic backtracking. NFA engines are the standard and are implemented in the default libraries of Python, Perl, JS, JAVA, .NET, PHP, Ruby, and many more languages.
Regex-Engine is a functional NFA regex engine built from scratch without external regex libraries. It parses regular expressions into an Abstract Syntax Tree (AST) and evaluates strings via backtracking.
Below is a render of the Abstract Syntax Tree for "(reg)*-?(ex)+".
- Literals: Matches exact string characters.
- Wildcard (
.): Matches any single character (except newlines). - Concatenation (
abc): Sequential character matching. - Alternation (
a|b): Matches either the expression on the left or the right. - Quantifiers (Greedy):
*: Matches the preceding element zero or more times.+: Matches the preceding element one or more times.?: Matches the preceding element zero or one time.
- Character Classes (
[...]): Matches any single character within the brackets. Supports both discrete characters (e.g.,[abc]) and ranges (e.g.,[a-zA-Z1-5]). - Grouping (
(...)): Overrides default precedence rules to group sub-expressions together (e.g.,(abc|def)+). (Note: Capturing and extracting these groups is currently a work-in-progress). - Escaping (
\): Allows the use of metacharacters as literal characters (e.g.,\.).
The engine is broken down into three main theoretical components:
- Lexer: Reads the raw regex string and tokenizes it into identifiable units (e.g.,
STAR,PIPE,CHAR_CLASS). - Parser: A recursive descent parser that processes the tokens based on regex operator precedence (Parentheses > Quantifiers > Concatenation > Alternation) to build an Abstract Syntax Tree (AST) composed of
Nodeobjects. - Matcher (AST Nodes): Each
Nodesubclass (likeLiteral,Alternation, orQuantifier) implements amatch()method. The engine traverses the text using aStateContextand utilizes backtracking to explore different NFA branches until a match is found or all possibilities are exhausted. - Animations: using manim, there is a helper script (
resources/regex_anim.py) that takes a regex expression and string to match against, and then renders a video of the checking process.
The project includes a built-in Tester class that validates the engine's functionality against a dictionary of predefined unit tests. It utilizes colorama for clean, color-coded terminal outputs indicating PASS/FAIL states alongside the evaluated rule, target string, and boolean result.
Istraživačka stanica Petnica (Petnica Science Center) is a premier independent, non-profit institution in Serbia (near Valjevo) founded in 1982 for extracurricular science education. It provides advanced training, research facilities, and mentorship for high-achieving students and teachers across 15+ STEM and social science disciplines, fostering critical thinking through hands-on research projects.

