Deterministic, production-grade C++ inference engine built around Boost.SML orchestration.
This repository is under active development. APIs, state machines, and formats will change. If you’re evaluating EMEL, expect fast iteration and breaking changes until the core loader, allocator, and execution pipelines stabilize.
This inference engine is being implemented by AI under human engineering and architecture direction.
- Architect first, then scaffold cleanly.
- Port math, instructions, and behavior without mirroring reference control flow.
- Prove parity against llama.cpp.
- Match model/tokenizer intent as defined by their creators (transformers).
- Optimize once correctness is locked.
EMEL exists to make inference behavior explicit and verifiable. Instead of ad-hoc control flow, orchestration is modeled as Boost.SML state machines with deterministic, testable transitions. That enables:
- Clear operational semantics and failure modes.
- Deterministic, reproducible inference paths.
- High-performance, C-compatible boundaries without dynamic dispatch in hot paths.
- Auditable parity work against reference implementations without copying their control flow.
It might look like over-engineering — "I have a hammer so everything looks like a nail." But a state machine with two states has virtually zero overhead, and the goal is explicit behavior modeling, not complexity for its own sake. Stateless functions inevitably accumulate conditional logic as the code evolves: mode flags, error booleans, retry counters, phase enums. Taming that accidental complexity before it starts is the whole point of EMEL. Every actor has a visible state model, every transition is declared, and every unexpected event has a defined handler. That's the trade I'm making.
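As a rough illustration of "every transition is declared, and every unexpected event has a defined handler", here is a minimal sketch that uses only the C++ standard library. It does not use the Boost.SML API and does not reproduce any EMEL machine. The states, events, and the `next_state` function are hypothetical names chosen for this example.

```cpp
#include <cstdint>

// Hypothetical states and events for a tiny loader-like actor.
// None of these names come from EMEL or Boost.SML.
enum class state : std::uint8_t { idle, loaded, failed };
enum class event : std::uint8_t { load_ok, load_error, unload };

// One row per declared transition: in state `from`, on event `on`, go to `to`.
struct transition {
  state from;
  event on;
  state to;
};

// Every legal transition is listed in one place, so the state model is visible.
constexpr transition k_table[] = {
    {state::idle, event::load_ok, state::loaded},
    {state::idle, event::load_error, state::failed},
    {state::loaded, event::unload, state::idle},
};

// Pure transition function: the same (state, event) pair always yields the
// same next state. Any pair that is not declared in the table is routed to
// the explicit `failed` state rather than being silently ignored.
constexpr state next_state(state current, event ev) {
  for (const transition& t : k_table) {
    if (t.from == current && t.on == ev) {
      return t.to;
    }
  }
  return state::failed;
}

// Compile-time checks: a declared transition and an unexpected event.
static_assert(next_state(state::idle, event::load_ok) == state::loaded,
              "declared transition");
static_assert(next_state(state::loaded, event::load_ok) == state::failed,
              "unexpected event has a defined outcome");
```

Because the table is plain data and the lookup is a pure function, each transition can be tested in isolation, which is the property the paragraph above argues for. A real Boost.SML machine expresses the same table declaratively and adds guards and actions.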
End-to-end performance will be inferior to llama.cpp and other engines initially — that's expected and accepted, even though many individual machines will perform comparably or better in isolation. Having explicit actions and states makes it straightforward to find hotspots, and if profiling shows a state machine itself is the bottleneck, it gets removed. Concurrency is intentionally deferred until single-threaded behavior is verified. That doesn't mean there's no plan for it — the actor model makes adding concurrency easier than it looks, and it will be introduced only where measurement says it's necessary.
“EMEL” is pronounced like “ML”. It’s a short, neutral name that doesn’t carry existing assumptions or baggage. It’s intentionally low-ceremony while I iterate on the core design.
Huge thanks to the contributors of llama.cpp and ggml. EMEL’s parity work depends on the quality and clarity of these reference implementations.
Special shout out to Georgi Gerganov, whose work created the foundation that made this ecosystem possible.
scripts/quality_gates.sh

Individual gates live in scripts/build_with_zig.sh, scripts/test_with_coverage.sh, scripts/test_with_sanitizers.sh, scripts/fuzz_smoke.sh, scripts/lint_snapshot.sh, and scripts/bench.sh.
Zig’s C/C++ toolchain gives us consistent, fast, cross-platform builds without forcing a full dependency on any single system compiler or SDK. It keeps the default dev path reproducible, while still allowing native toolchains when needed.
Coverage and CI tooling are already standardized around CMake + CTest + llvm-cov/gcovr in this repo. Using CMake for test/coverage builds keeps gates deterministic and portable across CI environments, while Zig remains the default for day-to-day builds.
- Architecture (generated state-machine docs + Mermaid diagrams)
- Benchmarks (generated benchmark snapshot table)
- SML Conventions (Boost.SML conventions and usage)
- Parity Audit (parity audit status)
- docs/benchmarks.md
- docs/architecture/batch_planner_modes_equal.md
- docs/architecture/batch_planner_modes_sequential.md
- docs/architecture/batch_planner_modes_simple.md
- docs/architecture/batch_planner.md
- docs/architecture/gbnf_rule_parser_definition_parser.md
- docs/architecture/gbnf_rule_parser_expression_parser.md
- docs/architecture/gbnf_rule_parser_lexer.md
- docs/architecture/gbnf_rule_parser_nonterm_parser.md
- docs/architecture/gbnf_rule_parser.md
- docs/architecture/gbnf_rule_parser_term_parser.md
- docs/architecture/gbnf_sampler_accept_parser.md
- docs/architecture/gbnf_sampler_candidate_parser.md
- docs/architecture/gbnf_sampler_matcher_parser.md
- docs/architecture/gbnf_sampler.md
- docs/architecture/gbnf_sampler_token_parser.md
- docs/architecture/generator.md
- docs/architecture/gguf_loader.md
- docs/architecture/graph_allocator_liveness_pass.md
- docs/architecture/graph_allocator_ordering_pass.md
- docs/architecture/graph_allocator_placement_pass.md
- docs/architecture/graph_allocator.md
- docs/architecture/graph_assembler_assemble_alloc_pass.md
- docs/architecture/graph_assembler_assemble_build_pass.md
- docs/architecture/graph_assembler_assemble_validate_pass.md
- docs/architecture/graph_assembler_reserve_alloc_pass.md
- docs/architecture/graph_assembler_reserve_build_pass.md
- docs/architecture/graph_assembler_reserve_validate_pass.md
- docs/architecture/graph_assembler_reuse_decision_pass.md
- docs/architecture/graph_assembler.md
- docs/architecture/graph_processor_alloc_step.md
- docs/architecture/graph_processor_bind_step.md
- docs/architecture/graph_processor_extract_step.md
- docs/architecture/graph_processor_kernel_step.md
- docs/architecture/graph_processor_prepare_step.md
- docs/architecture/graph_processor.md
- docs/architecture/graph_processor_validate_step.md
- docs/architecture/graph.md
- docs/architecture/kernel_aarch64.md
- docs/architecture/kernel_cuda.md
- docs/architecture/kernel_metal.md
- docs/architecture/kernel_vulkan.md
- docs/architecture/kernel_wasm.md
- docs/architecture/kernel_x86_64.md
- docs/architecture/logits_sampler.md
- docs/architecture/logits_validator.md
- docs/architecture/memory_hybrid.md
- docs/architecture/memory_kv.md
- docs/architecture/memory_recurrent.md
- docs/architecture/model_loader.md
- docs/architecture/model_weight_loader.md
- docs/architecture/tensor.md
- docs/architecture/tensor_view.md
- docs/architecture/text_conditioner.md
- docs/architecture/text_detokenizer.md
- docs/architecture/text_encoders_bpe.md
- docs/architecture/text_encoders_fallback.md
- docs/architecture/text_encoders_plamo2.md
- docs/architecture/text_encoders_rwkv.md
- docs/architecture/text_encoders_spm.md
- docs/architecture/text_encoders_ugm.md
- docs/architecture/text_encoders_wpm.md
- docs/architecture/text_formatter.md
- docs/architecture/text_jinja_formatter.md
- docs/architecture/text_jinja_parser_classifier_parser.md
- docs/architecture/text_jinja_parser_lexer.md
- docs/architecture/text_jinja_parser_program_parser_expression_parser.md
- docs/architecture/text_jinja_parser_program_parser.md
- docs/architecture/text_jinja_parser_program_parser_statement_parser.md
- docs/architecture/text_jinja_parser.md
- docs/architecture/text_renderer.md
- docs/architecture/text_tokenizer_preprocessor_bpe.md
- docs/architecture/text_tokenizer_preprocessor_fallback.md
- docs/architecture/text_tokenizer_preprocessor_plamo2.md
- docs/architecture/text_tokenizer_preprocessor_rwkv.md
- docs/architecture/text_tokenizer_preprocessor_spm.md
- docs/architecture/text_tokenizer_preprocessor_ugm.md
- docs/architecture/text_tokenizer_preprocessor_wpm.md
- docs/architecture/text_tokenizer.md
- docs/architecture/token_batcher.md
scripts/generate_docs.sh

Use scripts/generate_docs.sh --check in CI to validate generated artifacts.