Systolic Array Module
Description
The Systolic Array module (hs_npu_systolic) is the central computational component of the ScaleNPU, designed to perform efficient matrix multiplication using a systolic array structure. This module orchestrates the flow of data through a SIZE x SIZE grid of MAC units. The array processes elements from two input matrices (matrix A and matrix B). Elements of Matrix B propagate vertically through columns, while elements of Matrix A flow horizontally across rows. The systolic array enables synchronized multiplication and accumulation at each node, with each partial result passed to the next row.
The SIZE parameter defines the dimensions of the systolic array, making it scalable for various matrix sizes.
Note
For the NPU, matrix A is the input matrix and matrix B is the weight matrix.
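As a rough functional reference, the arithmetic performed by one pass through the array can be sketched in Python. This is a timing-free sketch, not the module's implementation: it assumes the Matrix B (weight) values are already held in the MAC units and that each column simply chains its multiply-accumulate results from top to bottom. The function and argument names are illustrative and are not taken from the hs_npu_systolic source.

```python
def systolic_reference(a_values, weights, sum_in):
    """Timing-free reference: result[j] = sum_in[j] + sum over i of a_values[i] * weights[i][j].

    a_values[i]   -- Matrix A element fed to row i (hypothetical name)
    weights[i][j] -- Matrix B value held by the MAC at row i, column j
    sum_in[j]     -- initial sum entering the top of column j
    """
    size = len(weights)
    result = list(sum_in)
    for j in range(size):        # one accumulation chain per column
        for i in range(size):    # the partial sum moves down one row per MAC
            result[j] += a_values[i] * weights[i][j]
    return result


example = systolic_reference([1, 2], [[1, 2], [3, 4]], [10, 20])
```

With these inputs, `example` is `[17, 30]`: column 0 gives 10 + 1*1 + 2*3 and column 1 gives 20 + 1*2 + 2*4.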
I/O Table
Input Table
Input Name | Direction | Type | Description |
---|---|---|---|
clk | Input | logic | Clock signal for synchronization across MAC units. |
enable_in | Input | logic | Enable signal to activate the flow of Matrix B values. |
matrixA | Input | short[SIZE] | Input values for Matrix A, with each element feeding one row. |
matrixB | Input | short[SIZE] | Input values for Matrix B, with each element feeding one column. |
sum_in | Input | word[SIZE] | Initial sums for the first row of MAC units. |
Output Table
Output Name | Direction | Type | Description |
---|---|---|---|
result | Output | word[SIZE] | Final computed values from the last row of MAC units. |
Module Behavior and Data Flow
The systolic array can be used in several ways, so this subsection does not prescribe a single "correct" usage; it describes the specific way the array is used in the NPU:
- Matrix B values are injected into the first row of the array. These values propagate downward (one row per cycle) if enable_in is set. If enable_in is not set, the last value is stored and used by each MAC unit as the B operand.
- Matrix A values enter the first row and always propagate from right to left. The immediate value at the input is used as operand A by the MAC unit.
- Each MAC unit performs a multiply-accumulate operation using the inputs from its left (A values) and top (B values). Results are propagated downward and used as the SUM operand by the MAC unit below.
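The behavior listed above can be approximated with a small cycle-level model in Python. This is a sketch under stated assumptions, not a transcription of hs_npu_systolic: it assumes every MAC unit registers its A value, B value, and sum on each clock, that the B operand is the value stored before the clock edge, and that column index 0 is the side where matrixA enters (the model does not say which physical side that is). The class name and register names are hypothetical.

```python
class SystolicModel:
    """Cycle-level behavioral sketch of a SIZE x SIZE grid of MAC units."""

    def __init__(self, size):
        self.size = size
        # One register of each kind per MAC unit, indexed [row][col] (assumed layout).
        self.a = [[0] * size for _ in range(size)]  # A value passed along the row
        self.b = [[0] * size for _ in range(size)]  # stored B (weight) value
        self.s = [[0] * size for _ in range(size)]  # accumulated sum

    def step(self, matrix_a, matrix_b, sum_in, enable_in):
        """Advance one clock cycle and return the last row's sums (result)."""
        n = self.size
        new_a = [[0] * n for _ in range(n)]
        new_b = [row[:] for row in self.b]          # B is held unless enable_in is set
        new_s = [[0] * n for _ in range(n)]
        for i in range(n):
            for j in range(n):
                # Immediate A input: the array port at column 0, else the neighbour's register.
                a_in = matrix_a[i] if j == 0 else self.a[i][j - 1]
                # SUM operand: the array port for the first row, else the MAC above.
                s_in = sum_in[j] if i == 0 else self.s[i - 1][j]
                if enable_in:
                    # B shifts down one row per cycle while enabled.
                    new_b[i][j] = matrix_b[j] if i == 0 else self.b[i - 1][j]
                new_a[i][j] = a_in
                new_s[i][j] = s_in + a_in * self.b[i][j]
        self.a, self.b, self.s = new_a, new_b, new_s
        return list(self.s[n - 1])


model = SystolicModel(2)
zeros = [0, 0]
# Load weights [[1, 2], [3, 4]]: the bottom row is pushed first because B shifts downward.
model.step(zeros, [3, 4], zeros, enable_in=True)
model.step(zeros, [1, 2], zeros, enable_in=True)
# Feed the input vector [1, 2] skewed by one cycle per row, with B held (enable_in=False).
outputs = [model.step(a, zeros, zeros, enable_in=False)
           for a in ([1, 0], [0, 2], [0, 0])]
```

Under these assumptions, column 0 of the result holds 7 (1*1 + 2*3) after the second compute cycle and column 1 holds 10 (1*2 + 2*4) one cycle later. The one-cycle skew between rows of the input and between columns of the output is a consequence of the registering assumed here; the actual latency of the hardware may differ.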
Submodule Diagram
The following diagram illustrates the Systolic Array module, showing the flow of inputs, outputs, and internal signal paths.
Related Files
File Name | Type |
---|---|
hs_npu_systolic | Top |
hs_npu_mac | Submodule - MAC |