Compatibility benchmark for Python scientific/ML packages on ARM64 (aarch64/Graviton).
Tests each package from packages.toml across Python 3.10–3.13 by:
- Checking PyPI for aarch64 wheel availability
- Running pip install + import + smoke test inside a linux/arm64 Docker container (via uv)
Requirements:

- Docker with multi-platform support (buildx)
- Python 3.10+
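Under the hood, each install + import + smoke check runs inside an emulated linux/arm64 container. A minimal sketch of what such an invocation could look like (the image tag and uv flags here are illustrative assumptions, not the tool's actual command):

```python
def build_test_cmd(package: str, python_version: str, smoke: str) -> list[str]:
    """Build a docker command that installs `package` and runs its smoke
    test on linux/arm64. Image tag and uv usage are illustrative guesses."""
    inner = f'uv pip install --system {package} && python -c "{smoke}"'
    return [
        "docker", "run", "--rm",
        "--platform", "linux/arm64",  # emulate ARM64 on x86 hosts via buildx/QEMU
        f"ghcr.io/astral-sh/uv:python{python_version}-bookworm-slim",
        "sh", "-c", inner,
    ]

cmd = build_test_cmd("numpy", "3.12", "import numpy")
print(cmd[:6])
```

The `--platform linux/arm64` flag is what forces the ARM64 image even on an x86 host, which is why buildx/QEMU support is listed as a requirement.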
```bash
# Full run (all packages × all Python versions)
python bench.py

# Test specific packages
python bench.py --packages numpy torch scipy

# Test specific Python versions
python bench.py --python 3.11 3.12

# Adjust parallelism (default: 4)
python bench.py --workers 8

# Custom output path
python bench.py --output results/my-results.json
```

Results are saved to results/results.json and a Markdown report is auto-generated at results/report.md.
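The saved results can be post-processed independently of report.py. A sketch, assuming each record carries `package`, `python`, and `status` fields (these field names are guesses, not the tool's documented schema):

```python
import json
from collections import Counter

def summarize(path: str) -> Counter:
    """Count result statuses across all (package, python) combinations
    in a results file shaped like the hypothetical records below."""
    with open(path) as f:
        results = json.load(f)
    return Counter(r["status"] for r in results)

# Example with an inline record list instead of a file:
sample = [
    {"package": "numpy", "python": "3.12", "status": "pass"},
    {"package": "torch", "python": "3.13", "status": "fail"},
    {"package": "scipy", "python": "3.11", "status": "pass"},
]
print(Counter(r["status"] for r in sample))  # Counter({'pass': 2, 'fail': 1})
```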
To regenerate the report from existing results:

```bash
python report.py
python report.py --input results/results.json --output results/report.md
```

Packages are defined in packages.toml:
```toml
[numpy]
version = "latest"
smoke = "import numpy as np; assert np.dot([1,2],[3,4]) == 11"

[scikit-learn]
version = "1.4.0"  # pin to a specific version
import = "sklearn"  # import name differs from package name
smoke = "from sklearn.linear_model import LinearRegression; LinearRegression()"
```

| Field | Required | Description |
|---|---|---|
| version | yes | "latest" for newest, or a specific version (e.g. "2.1.0") |
| smoke | no | Python expression to run after import (default: basic import) |
| import | no | Python import name if different from the package name |
The generated report includes:
- Summary statistics (pass/partial/fail counts)
- Compatibility matrix across all Python versions
- Wheel type per package (aarch64 binary, noarch, source-only, none)
- Failure details with error messages
- Install times for successful installs
- Recommendations (blockers, source-only builds)
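The wheel type reported per package can be derived offline from the distribution filenames PyPI lists for a release, since PEP 427 encodes the platform tag in the wheel filename. A minimal sketch of such a classifier (`classify_dist` is a hypothetical helper, not part of this tool):

```python
def classify_dist(filename: str) -> str:
    """Classify a PyPI distribution filename by ARM64 relevance.

    Wheel names follow PEP 427: name-version(-build)?-python-abi-platform.whl
    """
    if not filename.endswith(".whl"):
        return "source-only"   # sdist: would need to compile on ARM64
    platform_tag = filename[:-4].split("-")[-1]
    if platform_tag == "any":
        return "noarch"        # pure-Python wheel, runs on any architecture
    if "aarch64" in platform_tag or "arm64" in platform_tag:
        return "aarch64"       # native ARM64 binary wheel
    return "other-arch"        # binary wheel for a different architecture

print(classify_dist("numpy-2.1.0-cp312-cp312-manylinux_2_28_aarch64.whl"))  # aarch64
print(classify_dist("requests-2.32.3-py3-none-any.whl"))                    # noarch
print(classify_dist("somepkg-0.1.tar.gz"))                                  # source-only
```

A release with only sdists would map to "source-only", and one with no matching files at all to "none", matching the categories in the report.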