Hi TinyEngine folks,
I wanted to share a small but unusual language-runtime project that may still be relevant to the broader system-algorithm co-design question your work represents, even though it targets language-task capability rather than the usual tiny vision or sensor workloads.
We built a public demo line called Engram and deployed it on a commodity ESP32-C3.
Current public numbers:
- Host-side benchmark capability:
  - LogiQA = 0.392523
  - IFEval = 0.780037
- Published board proof:
  - LogiQA 642: 249 / 642 = 0.3878504672897196
  - host_full_match = 642 / 642
  - 1,380,771 bytes
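As a sanity check on the quoted board-proof ratio (249 correct out of 642 LogiQA items), the arithmetic can be verified with a couple of lines; the function names here are purely illustrative:

```c
#include <math.h>

/* accuracy: fraction correct, computed from the figures quoted above
 * (249 correct out of 642 LogiQA items). Illustrative only. */
static double accuracy(int correct, int total) {
    return (double)correct / (double)total;
}

/* matches_published: compare a computed value against a quoted figure
 * within double-precision tolerance. */
static int matches_published(double value, double published) {
    return fabs(value - published) < 1e-12;
}
```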
Important scope note:
This is not presented as unrestricted, open-input, native LLM generation on an MCU.
The board-side path is closer to a flash-resident, table-driven runtime with:
- packed token weights
- hashed lookup structures
- fixed compiled probe batches
- streaming fold / checksum style execution over precompiled structures
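As a rough illustration of what such a table-driven path can look like in C: the names, the hash choice (FNV-1a), and the table layout below are my own hypothetical sketch under stated assumptions, not Engram's actual on-flash format.

```c
#include <stddef.h>
#include <stdint.h>

/* Packed entry: a hashed token key plus a small quantized weight. */
typedef struct {
    uint32_t key_hash;   /* hash of a token / feature id */
    int8_t   weight;     /* packed (quantized) weight */
} entry_t;

/* On an ESP32-C3, a const table like this resides in flash (rodata),
 * costing no RAM beyond the flash cache. Keys here are arbitrary
 * example values, not real token hashes. */
static const entry_t TABLE[] = {
    {0x9e3779b9u,  42}, {0x85ebca6bu, -7}, {0xc2b2ae35u, 19},
};
#define TABLE_LEN (sizeof TABLE / sizeof TABLE[0])

/* FNV-1a: a common tiny hash, plausible for MCU-side token lookup. */
static uint32_t fnv1a(const char *s) {
    uint32_t h = 2166136261u;
    while (*s) { h ^= (uint8_t)*s++; h *= 16777619u; }
    return h;
}

/* Linear probe over the table; a real build might sort the table and
 * binary-search, or use open addressing. */
static int lookup(uint32_t h, int8_t *w) {
    for (size_t i = 0; i < TABLE_LEN; i++)
        if (TABLE[i].key_hash == h) { *w = TABLE[i].weight; return 1; }
    return 0;
}

/* One fixed, precompiled probe batch: stream over the hashes, folding
 * matched weights into a score and all keys into a cheap checksum. */
static int32_t run_batch(const uint32_t *hashes, size_t n, uint32_t *ck) {
    int32_t score = 0;
    *ck = 0;
    for (size_t i = 0; i < n; i++) {
        int8_t w;
        if (lookup(hashes[i], &w)) score += w;
        *ck = (*ck << 1) ^ hashes[i];   /* integrity checksum over the batch */
    }
    return score;
}
```

The point of the shape above is that nothing is computed at runtime beyond hashing, lookup, and folding: all model-derived structure is frozen into the const table at build time.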
So this is not a standard tiny dense model path; it is closer to a task-specialized language runtime whose behavior has been crystallized into a compact executable form under severe physical constraints.
Repo:
https://github.com/Alpha-Guardian/Engram
I’m posting here because TinyEngine and MCUNet are among the clearest public examples of system-algorithm co-design under extreme memory constraints.
I’d be curious whether systems like this should be thought of as:
- completely outside the normal tiny-DNN family
- an extreme endpoint where some language-task capability may require its own tiny co-designed runtime path
- or an early sign that future tiny language systems may split into both very small dense models and highly specialized executable runtime forms
Would be very interested in your thoughts.