Algebraic enhancements for GEMM & AI accelerators
Updated Feb 28, 2025 - Python
Minimal TPU implementation with 8x8 systolic array and PyTorch integration
Open-source AI Accelerator Stack integrating compute, memory, and software — from RTL to PyTorch.
SystemVerilog Implementations of CUDA/TensorCore/TPU GEMM Operations
Hardware accelerator for 2D convolution using an 8×8 weight-stationary systolic array with split-kernel support, dual-port SRAM architecture, and DMA-based streaming
AS501 AI Semiconductor Design Basics & Practice Final Project
Modular systolic array with software interface
Parameterized N×N output-stationary systolic array accelerator for INT8 neural network inference. Full RTL-to-GDS flow on ASAP7 7nm using Cadence Genus + Innovus. 667 MHz, 42.7 GOPS peak throughput, 0.33 mW/GOP. SystemVerilog RTL, synthesis, place-and-route and self-checking testbench included.
High-performance systolic array computing framework with AI agents and medical compliance.
Parametric Verilog systolic implementation of Cannon's Matrix Multiplication on an M×M torus mesh.
N-Radix: Open-source wavelength-division ternary optical AI accelerator on lithium niobate
4×4 7-bit matrix multiplication hardware accelerator using a systolic array, with a Python driver for the Basys 3 FPGA and a systolic array UVC using UVM.
Technical Showcase: 22B True-MoE Engine running on 6GB VRAM (GTX 1060). Demonstrates "Surgical" NF4 quantization, dynamic expert swapping, and the custom "Grace Hopper" pipeline.
Course materials for the Hardware Accelerator course, Spring 2025, Shahid Beheshti University
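One entry above describes Cannon's Matrix Multiplication on an M×M torus mesh. As a rough illustration of that dataflow, here is a minimal software sketch in pure Python. It assumes one matrix element per processing element, and the function name `cannon_matmul` is hypothetical; it is not taken from any of the listed repositories and does not model their RTL.

```python
def cannon_matmul(a, b):
    """Multiply two M x M matrices using the Cannon dataflow (software model)."""
    m = len(a)
    if len(b) != m or any(len(row) != m for row in a) or any(len(row) != m for row in b):
        raise ValueError("expected two square M x M matrices of the same size")

    # Initial skew: row i of A is rotated left by i positions,
    # column j of B is rotated up by j positions (with wrap-around, as on a torus).
    a_pe = [[a[i][(j + i) % m] for j in range(m)] for i in range(m)]
    b_pe = [[b[(i + j) % m][j] for j in range(m)] for i in range(m)]

    # Each processing element (i, j) holds one accumulator for C[i][j].
    c = [[0] * m for _ in range(m)]

    for _ in range(m):
        # Every PE multiplies the A and B values it currently holds.
        for i in range(m):
            for j in range(m):
                c[i][j] += a_pe[i][j] * b_pe[i][j]
        # Rotate A one step left along each row.
        a_pe = [row[1:] + row[:1] for row in a_pe]
        # Rotate B one step up along each column.
        b_pe = b_pe[1:] + b_pe[:1]

    return c


if __name__ == "__main__":
    print(cannon_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
    # prints [[19, 22], [43, 50]]
```

After the initial skew, PE (i, j) holds A[i][k] and B[k][j] for the same k = (i + j) mod M, so M multiply-and-rotate steps visit every k exactly once and each accumulator ends up with the full dot product.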