Direct access to the Apple Neural Engine from C++ and Python.
libane is a low-level ANE runtime and compiler interface. It exposes a Graph IR for describing full forward passes, compiles them into the minimum number of ANE dispatches via automatic op fusion, and executes them through a stable C ABI. Matmul is implemented as conv1×1, which yields a 3× throughput advantage over MIL matmul on ANE.
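The conv1×1 mapping can be sanity-checked on CPU with NumPy. This sketch only illustrates the layout equivalence (a matmul viewed as a 1×1 convolution over the channel dimension), not libane's actual kernel:

```python
import numpy as np

M, K, N = 4, 8, 3
A = np.random.randn(M, K).astype(np.float32)
B = np.random.randn(K, N).astype(np.float32)

# Plain matmul: [M, K] @ [K, N] -> [M, N]
C_ref = A @ B

# Same result as a 1x1 "convolution": activations laid out channels-first
# as [batch=1, channels=K, height=1, width=M]; weights as N length-K filters.
x = A.T.reshape(1, K, 1, M)
w = B.T  # [N, K]: one filter per output channel
# A 1x1 conv is a per-pixel matmul over the channel dimension:
y = np.einsum('nk,bkhw->bnhw', w, x)  # -> [1, N, 1, M]
C_conv = y.reshape(N, M).T

assert np.allclose(C_ref, C_conv, atol=1e-5)
```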
**Private framework dependency.** libane loads `AppleNeuralEngine.framework` via `dlopen`. It is intended for research use and low-level ANE experimentation, not production deployment.
```shell
pip install libane
```

Requires Apple Silicon (M1 or later) and macOS 14+. The wheel is a compiled extension; no extra build steps are needed.
| Requirement | Minimum |
|---|---|
| Hardware | Apple Silicon (M1 or later) |
| OS | macOS 14 Sonoma or later |
| Toolchain | Xcode 15+, CMake 3.24+, C++17 |
| Python | 3.10+ (optional) |
```python
import ane
import numpy as np

print(ane.available())  # True on Apple Silicon
print(ane.version())    # "0.7.1"

# Single-op matmul
A = np.random.randn(128, 512).astype(np.float16)
B = np.random.randn(512, 256).astype(np.float16)
C = ane.matmul(A, B)  # shape (128, 256), fp16

# Graph API: fused FFN block
D, FFN, SEQ = 512, 2048, 128
W_up = np.random.randn(D, FFN).astype(np.float16)
W_down = np.random.randn(FFN, D).astype(np.float16)
scale = np.ones(D, dtype=np.float16)

g = ane.Graph()
x = g.add_input("x", [1, D, 1, SEQ])
rn = g.add_op(ane.RMSNORM, [x], [1, D, 1, SEQ], weights=scale)
up = g.add_op(ane.MATMUL, [rn], [1, FFN, 1, SEQ], weights=W_up)
act = g.add_op(ane.GELU, [up], [1, FFN, 1, SEQ])
out = g.add_op(ane.MATMUL, [act], [1, D, 1, SEQ], weights=W_down)
g.mark_output(out)

cg = g.compile()
cg.set_output_shapes([[1, D, 1, SEQ]])

x_data = np.random.randn(D, SEQ).astype(np.float16)
result = cg(x_data)
print(result.shape)  # (1, 512, 1, 128)
```

See `examples/ffn_inference.py` for a timed end-to-end example.
```shell
git clone https://github.com/AmiraniLabs/libane
cd libane
cmake -B build -DCMAKE_BUILD_TYPE=RelWithDebInfo
cmake --build build -j$(sysctl -n hw.logicalcpu)
ctest --test-dir build --output-on-failure
```

To build the Python module from source:

```shell
pip install pybind11 numpy scikit-build-core
pip install ./bindings/python
```

Single-op C API:

```c
#include "libane.h"

// Compile a matmul: [1, 512, 1, 128] → [1, 256, 1, 128]
libane_shape_t shape = {.dims = {1, 256, 1, 128}, .ndim = 4};
libane_handle_t h = libane_compile(LIBANE_OP_MATMUL, shape,
                                   fp16_weights, sizeof(fp16_weights));
libane_execute(h, input, output, shape);
libane_release(h);
```

Graph C API:

```c
#include "libane.h"

// Build: RMSNorm → Matmul → GELU
libane_graph_t g = libane_graph_create();
libane_shape_t in_shape  = {.dims = {1, 512, 1, 128}, .ndim = 4};
libane_shape_t out_shape = {.dims = {1, 256, 1, 128}, .ndim = 4};

uint32_t x   = libane_graph_add_input(g, "x", in_shape);
uint32_t rn  = libane_graph_add_op(g, LIBANE_OP_RMSNORM, &x, 1,
                                   in_shape, rn_scale, rn_scale_len);
uint32_t mm  = libane_graph_add_op(g, LIBANE_OP_MATMUL, &rn, 1,
                                   out_shape, weights, weights_len);
uint32_t act = libane_graph_add_op(g, LIBANE_OP_GELU, &mm, 1,
                                   out_shape, NULL, 0);
libane_graph_mark_output(g, act, "out");

libane_compiled_graph_t cg = libane_graph_compile(g);

// Execute
const void* in_ptrs[]   = { input_fp16 };
size_t      in_bytes[]  = { 512 * 128 * 2 };
void*       out_ptrs[]  = { output_fp16 };
size_t      out_bytes[] = { 256 * 128 * 2 };
libane_graph_execute(cg, in_ptrs, in_bytes, 1, out_ptrs, out_bytes, 1);

libane_compiled_graph_release(cg);
libane_graph_release(g);
```

Project layout:

```text
libane
├── include/libane.h        Stable C ABI
├── src/
│   ├── core/
│   │   ├── mil_builder     MIL text program generator
│   │   ├── compile_cache   LRU cache for compiled programs
│   │   └── buffer_manager  IOSurface-backed fp16 buffer pool
│   ├── graph/
│   │   ├── ane_graph       Graph IR (DAG builder)
│   │   ├── graph_validator 7-check validation pass
│   │   ├── fusion_rules    Greedy linear-chain fusion
│   │   ├── graph_compiler  build_plan() + compile()
│   │   └── graph_executor  Per-group ANE dispatch
│   ├── runtime/
│   │   └── ane_runtime.mm  AppleNeuralEngine.framework wrapper
│   └── fallback/           Accelerate BLAS CPU fallback
└── bindings/python/        pybind11 module (pip install libane)
```
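The `compile_cache` entry above implies compiled programs are reused across calls. A minimal sketch of how such an LRU cache might work; the class name, keying scheme, and capacity are assumptions, not libane's actual implementation:

```python
from collections import OrderedDict

class CompileCache:
    """Hypothetical LRU cache: key -> compiled program. A plausible key
    would be (op, shape, weights digest); this is an illustrative guess."""

    def __init__(self, capacity=32):
        self.capacity = capacity
        self._entries = OrderedDict()

    def get_or_compile(self, key, compile_fn):
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            return self._entries[key]
        program = compile_fn()              # cache miss: compile once
        self._entries[key] = program
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict least recently used
        return program

cache = CompileCache(capacity=2)
calls = []
make = lambda k: (lambda: calls.append(k) or f"prog:{k}")
cache.get_or_compile("a", make("a"))
cache.get_or_compile("a", make("a"))  # hit: no recompile
cache.get_or_compile("b", make("b"))
cache.get_or_compile("c", make("c"))  # capacity exceeded: evicts "a"
assert calls == ["a", "b", "c"]
```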
Fusion rules compact linear chains into single ANE programs. A 6-op FFN (RMSNorm → Matmul → SiLU, Matmul → Mul → Matmul) typically becomes 3–4 ANE dispatches instead of 6, eliminating intermediate DRAM round-trips.
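Greedy linear-chain grouping can be sketched like this: walk the topologically ordered ops and extend the current group only while each op consumes exactly the previous op's output and that output has no other readers. The op names, `FUSABLE` set, and data structures are illustrative assumptions, not libane's actual `fusion_rules`:

```python
# Hypothetical fusable-op set (assumption, not libane's real rule table).
FUSABLE = {"RMSNORM", "MATMUL", "GELU", "SILU", "MUL", "ADD"}

def fuse_chains(ops, inputs, consumers):
    """ops: topologically ordered (id, kind) pairs.
    inputs: id -> list of producer ids; consumers: id -> list of reader ids."""
    groups, current = [], []
    for op_id, kind in ops:
        prev = current[-1] if current else None
        # Extend the chain only if this op consumes the previous op's
        # output and that output has no other readers (a linear link).
        chainable = (
            kind in FUSABLE
            and prev is not None
            and inputs[op_id] == [prev]
            and consumers.get(prev, []) == [op_id]
        )
        if current and not chainable:
            groups.append(current)  # close the current dispatch group
            current = []
        current.append(op_id)
    if current:
        groups.append(current)
    return groups

# 6-op gated FFN: RMSNorm feeds two matmuls, so it fans out and
# cannot fuse forward; the SiLU branch and the final Mul->Matmul can.
ops = [("rn", "RMSNORM"), ("mm_up", "MATMUL"), ("silu", "SILU"),
       ("mm_gate", "MATMUL"), ("mul", "MUL"), ("mm_down", "MATMUL")]
inputs = {"rn": ["x"], "mm_up": ["rn"], "silu": ["mm_up"],
          "mm_gate": ["rn"], "mul": ["silu", "mm_gate"], "mm_down": ["mul"]}
consumers = {"rn": ["mm_up", "mm_gate"], "mm_up": ["silu"],
             "silu": ["mul"], "mm_gate": ["mul"], "mul": ["mm_down"]}

groups = fuse_chains(ops, inputs, consumers)
assert groups == [["rn"], ["mm_up", "silu"], ["mm_gate"], ["mul", "mm_down"]]
# 6 ops -> 4 dispatch groups
```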
| Op | Notes |
|---|---|
| `MATMUL` | conv1×1 internally; 3× faster than MIL matmul on ANE |
| `RMSNORM` | rsqrt + mul, scale broadcast |
| `LAYERNORM` | normalise + gamma/beta affine |
| `GELU` | tanh approximation only |
| `SILU` | x × sigmoid(x) |
| `SOFTMAX` | over C (channel) dimension; axis=1 |
| `ADD` / `MUL` | elementwise binary; shapes must match |
| `TRANSPOSE` | `[0,3,2,1]` only: `[1,C,1,S]` → `[1,S,1,C]` |
- **Experimental / research-use only.** Not production-supported.
- **Private Apple framework dependency.** Uses `AppleNeuralEngine.framework` via `dlopen`. Not App Store safe.
- **Constrained tensor layout.** The Graph API requires `[1, C, 1, S]` (NCHW with batch=1, height=1). Arbitrary shapes are not supported.
- **Channel cap.** Graph API validation enforces `C ≤ 16384`. Larger channel counts (e.g. vocabulary projections) require raw MIL emission and are not exposed through the graph API.
- **fp16 only.** No quantization (int8, int4) support. Weights and activations are fp16 throughout.
- **Some ops are compiler-sensitive.** ANE's MIL compiler accepts a strict subset of MIL. Certain op combinations or shapes may require fallback paths. See the fallback module.
- **Not a general-purpose model runner.** libane is a programmable kernel/runtime/compiler interface for ANE-native experimentation, not a drop-in inference engine.
- **macOS 14+ required.** Older systems are not tested and not supported.
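Given the `[1, C, 1, S]` layout constraint above, a row-major `[seq, dim]` activation can be staged into the required shape with a plain transpose/reshape. An illustrative NumPy sketch (libane's bindings may do an equivalent mapping internally; that is an assumption):

```python
import numpy as np

# Map a row-major [seq, dim] activation into [1, C=dim, 1, S=seq].
seq, dim = 128, 512
x = np.random.randn(seq, dim).astype(np.float16)

x_ane = x.T.reshape(1, dim, 1, seq)  # channels = feature dimension
assert x_ane.shape == (1, 512, 1, 128)

# And back to [seq, dim]:
x_back = x_ane.reshape(dim, seq).T
assert np.array_equal(x, x_back)
```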
Apache 2.0. See LICENSE.
libane is not affiliated with or endorsed by Apple Inc.