Abstract: Recent SRAM-based in-memory computing (IMC) hardware demonstrates high energy efficiency and throughput for matrix–vector multiplication (MVM), the dominant kernel for deep neural networks ...
Matrix–vector multiplication (MVM) is a computational bottleneck for transformer inference workloads in resource-constrained edge applications. Efficient MVM accelerator design is crucial to optimizing ...