libane

Direct access to the Apple Neural Engine from C++ and Python.


libane is a low-level ANE runtime and compiler interface. It exposes a Graph IR for describing full forward passes, compiles them into as few ANE dispatches as possible via automatic op fusion, and executes them through a stable C ABI. Matmul is implemented as a 1×1 convolution, which yields roughly 3× the throughput of MIL matmul on ANE.
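
The matmul-as-conv1×1 equivalence is easy to check in plain NumPy: over a [1, C, 1, S] activation, a 1×1 convolution whose kernel is the weight matrix reshaped to [C_out, C_in, 1, 1] computes exactly the same result as a matrix product. This is an illustrative sketch, not libane code:

```python
import numpy as np

C_in, C_out, S = 8, 4, 16
W = np.random.randn(C_out, C_in).astype(np.float32)
x = np.random.randn(1, C_in, 1, S).astype(np.float32)

# Plain matmul on the flattened [C_in, S] channel view
ref = W @ x[0, :, 0, :]                        # shape (C_out, S)

# "conv1x1": with a 1x1 kernel, convolving over the spatial axis reduces
# to the same per-position dot product over channels.
kernel = W.reshape(C_out, C_in, 1, 1)
out = np.einsum('oihw,bihs->bohs', kernel, x)  # shape (1, C_out, 1, S)

assert np.allclose(out[0, :, 0, :], ref, atol=1e-5)
```

The throughput win comes from ANE's convolution units, not from the math being different.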

Private framework dependency. libane loads AppleNeuralEngine.framework via dlopen. It is intended for research use and low-level ANE experimentation, not production deployment.


Install

pip install libane

Requires Apple Silicon (M1 or later) and macOS 14+. The wheel is a compiled extension — no extra build steps needed.


Requirements

Hardware Apple Silicon (M1 or later)
OS macOS 14 Sonoma or later
Toolchain Xcode 15+, CMake 3.24+, C++17
Python 3.10+ (optional)

Python quick-start

import ane
import numpy as np

print(ane.available())   # True on Apple Silicon
print(ane.version())     # "0.7.1"

# Single-op matmul
A = np.random.randn(128, 512).astype(np.float16)
B = np.random.randn(512, 256).astype(np.float16)
C = ane.matmul(A, B)    # shape (128, 256), fp16

# Graph API — fused FFN block
D, FFN, SEQ = 512, 2048, 128
W_up   = np.random.randn(D,   FFN).astype(np.float16)
W_down = np.random.randn(FFN, D  ).astype(np.float16)
scale  = np.ones(D, dtype=np.float16)

g = ane.Graph()
x   = g.add_input("x",  [1, D,   1, SEQ])
rn  = g.add_op(ane.RMSNORM, [x],   [1, D,   1, SEQ], weights=scale)
up  = g.add_op(ane.MATMUL,  [rn],  [1, FFN, 1, SEQ], weights=W_up)
act = g.add_op(ane.GELU,    [up],  [1, FFN, 1, SEQ])
out = g.add_op(ane.MATMUL,  [act], [1, D,   1, SEQ], weights=W_down)
g.mark_output(out)

cg = g.compile()
cg.set_output_shapes([[1, D, 1, SEQ]])

x_data = np.random.randn(D, SEQ).astype(np.float16)
result = cg(x_data)
print(result.shape)   # (1, 512, 1, 128)

See examples/ffn_inference.py for a timed end-to-end example.
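
To sanity-check the graph's output, the same RMSNorm → Matmul → GELU → Matmul chain can be reproduced in pure NumPy. This reference assumes weights are stored (in, out) so each matmul is `W.T @ x`, uses fp32 for clarity, and picks an illustrative epsilon — all assumptions, not libane's exact numerics:

```python
import numpy as np

D, FFN, SEQ = 512, 2048, 128
W_up   = np.random.randn(D, FFN).astype(np.float32)
W_down = np.random.randn(FFN, D).astype(np.float32)
scale  = np.ones(D, dtype=np.float32)
x      = np.random.randn(D, SEQ).astype(np.float32)

def rmsnorm(v, g, eps=1e-6):
    # Normalise each column (token) over the channel axis, then scale.
    return v / np.sqrt(np.mean(v * v, axis=0, keepdims=True) + eps) * g[:, None]

def gelu_tanh(v):
    # Tanh approximation of GELU (the variant the ops table lists).
    return 0.5 * v * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (v + 0.044715 * v**3)))

h = rmsnorm(x, scale)          # [D, SEQ]
h = W_up.T @ h                 # [FFN, SEQ]
h = gelu_tanh(h)
y = W_down.T @ h               # [D, SEQ]

print(y.shape)                 # (512, 128)
```

Compare against `cg(x_data)` squeezed to [D, SEQ]; expect fp16-level differences, not bit-exact agreement.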


Building from source

git clone https://github.com/AmiraniLabs/libane
cd libane
cmake -B build -DCMAKE_BUILD_TYPE=RelWithDebInfo
cmake --build build -j$(sysctl -n hw.logicalcpu)
ctest --test-dir build --output-on-failure

To build the Python module from source:

pip install pybind11 numpy scikit-build-core
pip install ./bindings/python

C quick-start

Single op

#include "libane.h"

// Compile a matmul [1, 512, 1, 128] → [1, 256, 1, 128]
libane_shape_t shape = {.dims = {1, 256, 1, 128}, .ndim = 4};
libane_handle_t h = libane_compile(LIBANE_OP_MATMUL, shape,
                                    fp16_weights, sizeof(fp16_weights));
libane_execute(h, input, output, shape);
libane_release(h);

Graph API (fused multi-op)

#include "libane.h"

// Build: RMSNorm → Matmul → GELU
libane_graph_t g = libane_graph_create();

libane_shape_t in_shape  = {.dims={1,512,1,128}, .ndim=4};
libane_shape_t out_shape = {.dims={1,256,1,128}, .ndim=4};

uint32_t x  = libane_graph_add_input(g, "x", in_shape);
uint32_t rn = libane_graph_add_op(g, LIBANE_OP_RMSNORM, &x, 1,
                                   in_shape, rn_scale, rn_scale_len);
uint32_t mm = libane_graph_add_op(g, LIBANE_OP_MATMUL,  &rn, 1,
                                   out_shape, weights, weights_len);
uint32_t act= libane_graph_add_op(g, LIBANE_OP_GELU,    &mm, 1,
                                   out_shape, NULL, 0);
libane_graph_mark_output(g, act, "out");

libane_compiled_graph_t cg = libane_graph_compile(g);

// Execute
const void* in_ptrs[]   = { input_fp16 };
size_t      in_bytes[]  = { 512 * 128 * 2 };
void*       out_ptrs[]  = { output_fp16 };
size_t      out_bytes[] = { 256 * 128 * 2 };
libane_graph_execute(cg, in_ptrs, in_bytes, 1, out_ptrs, out_bytes, 1);

libane_compiled_graph_release(cg);
libane_graph_release(g);

Architecture

libane
├── include/libane.h          Stable C ABI
├── src/
│   ├── core/
│   │   ├── mil_builder       MIL text program generator
│   │   ├── compile_cache     LRU cache for compiled programs
│   │   └── buffer_manager    IOSurface-backed fp16 buffer pool
│   ├── graph/
│   │   ├── ane_graph         Graph IR (DAG builder)
│   │   ├── graph_validator   7-check validation pass
│   │   ├── fusion_rules      Greedy linear-chain fusion
│   │   ├── graph_compiler    build_plan() + compile()
│   │   └── graph_executor    Per-group ANE dispatch
│   ├── runtime/
│   │   └── ane_runtime.mm    AppleNeuralEngine.framework wrapper
│   └── fallback/             Accelerate BLAS CPU fallback
└── bindings/python/          pybind11 module (pip install libane)

Fusion rules collapse linear chains into single ANE programs. A 6-op FFN (RMSNorm → Matmul → SiLU, Matmul → Mul → Matmul) typically becomes 3–4 ANE dispatches instead of 6, eliminating intermediate DRAM round-trips.
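
Greedy linear-chain fusion can be sketched as a single pass over the op chain that starts a new dispatch group whenever the next op cannot be fused onto the previous one. The fusibility table below is invented for illustration; libane's actual rules live in src/graph/fusion_rules:

```python
# Hypothetical fusibility rules: which ops may follow each op inside one
# fused ANE program. Not libane's real table.
FUSIBLE_AFTER = {
    "RMSNORM": {"MATMUL"},
    "MATMUL":  {"GELU", "SILU"},
    "SILU":    set(),
    "GELU":    set(),
    "MUL":     {"MATMUL"},
}

def plan_groups(chain):
    groups = [[chain[0]]]
    for prev, op in zip(chain, chain[1:]):
        if op in FUSIBLE_AFTER.get(prev, set()):
            groups[-1].append(op)       # fuse into the current dispatch
        else:
            groups.append([op])         # fusion barrier: new dispatch
    return groups

ffn = ["RMSNORM", "MATMUL", "SILU", "MATMUL", "MUL", "MATMUL"]
print(plan_groups(ffn))
# [['RMSNORM', 'MATMUL', 'SILU'], ['MATMUL'], ['MUL', 'MATMUL']]
```

Under these toy rules the 6-op FFN lands in 3 dispatch groups, matching the 3–4 figure above.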


Supported ops

Op Notes
MATMUL conv1×1 internally; 3× faster than MIL matmul on ANE
RMSNORM rsqrt + mul, scale broadcast
LAYERNORM normalise + gamma/beta affine
GELU tanh approximation only
SILU x × sigmoid(x)
SOFTMAX over C (channel) dimension; axis=1
ADD / MUL elementwise binary; shapes must match
TRANSPOSE [0,3,2,1] only: [1,C,1,S] → [1,S,1,C]
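
The two layout-sensitive ops in the table are easy to pin down with NumPy, again as an illustrative sketch rather than libane's implementation:

```python
import numpy as np

C, S = 4, 6
x = np.random.randn(1, C, 1, S).astype(np.float32)

# SOFTMAX over axis=1: each sequence position is normalised independently
# across channels, so every [:, :, 0, s] column sums to 1.
e = np.exp(x - x.max(axis=1, keepdims=True))
sm = e / e.sum(axis=1, keepdims=True)
assert np.allclose(sm.sum(axis=1), 1.0)

# TRANSPOSE [0,3,2,1]: swaps channel and sequence, [1,C,1,S] -> [1,S,1,C].
t = np.transpose(x, (0, 3, 2, 1))
assert t.shape == (1, S, 1, C)
assert t[0, 2, 0, 1] == x[0, 1, 0, 2]
```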

Known Limitations

  • Experimental / research-use only. Not production-supported.
  • Private Apple framework dependency. Uses AppleNeuralEngine.framework via dlopen. Not App Store safe.
  • Constrained tensor layout. Graph API requires [1, C, 1, S] (NCHW with batch=1, height=1). Arbitrary shapes are not supported.
  • Channel cap. Graph API validation enforces C ≤ 16384. Larger channel counts (e.g. vocabulary projections) require raw MIL emission and are not exposed through the graph API.
  • fp16 only. No quantization (int8, int4) support. Weights and activations are fp16 throughout.
  • Some ops are compiler-sensitive. ANE's MIL compiler accepts a strict subset of MIL. Certain op combinations or shapes may require fallback paths. See the fallback module.
  • Not a general-purpose model runner. libane is a programmable kernel/runtime/compiler interface for ANE-native experimentation, not a drop-in inference engine.
  • macOS 14+ required. Older systems are not tested and not supported.
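
Given the constrained [1, C, 1, S] layout, a conventional [seq, d_model] activation has to be packed with channels = model dim and S = sequence. A pair of hypothetical helpers (not part of the libane API) shows the round trip:

```python
import numpy as np

def to_ane_layout(x_seq_d):
    """[seq, d] -> [1, d, 1, seq]: channels = model dim, S = sequence."""
    seq, d = x_seq_d.shape
    return np.ascontiguousarray(x_seq_d.T).reshape(1, d, 1, seq)

def from_ane_layout(x_ane):
    """[1, d, 1, seq] -> [seq, d]."""
    _, d, _, seq = x_ane.shape
    return x_ane.reshape(d, seq).T

x = np.arange(12, dtype=np.float16).reshape(3, 4)   # seq=3, d=4
packed = to_ane_layout(x)
assert packed.shape == (1, 4, 1, 3)
assert np.array_equal(from_ane_layout(packed), x)
```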

License

Apache 2.0. See LICENSE.

libane is not affiliated with or endorsed by Apple Inc.
