Vivado Build
This page documents the Vivado build flow for the pccx v002 NPU core.
All build scripts reside under hw/vivado/; build.sh is the single
entry point. Source files are managed in the
pccxai/pccx-FPGA-NPU-LLM-kv260 repository.
Build Flow
build.sh is a thin wrapper around Vivado batch-mode invocations.
The first argument selects one of four stages.
./hw/vivado/build.sh project # create project only
./hw/vivado/build.sh synth # create project + OOC synthesis
./hw/vivado/build.sh impl # full implementation + bitstream
./hw/vivado/build.sh clean # remove build/ directory
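The stage selection above can be pictured as a simple case dispatch. The following is a dry-run sketch, not the real build.sh: stage_plan is a hypothetical helper that only prints the commands each stage might run. It assumes that the Tcl scripts live under hw/vivado/, that synth runs create_project.tcl followed by synth.tcl, and that impl runs impl.tcl alone; none of this is confirmed by the script itself.

```shell
# Sketch only: stage_plan is a hypothetical helper, not the real build.sh.
# It prints what each stage would do instead of launching Vivado.
stage_plan() {
    local dir="hw/vivado"   # script location, as stated on this page
    case "$1" in
        project)
            echo "vivado -mode batch -source $dir/create_project.tcl"
            ;;
        synth)
            # Assumption: synth recreates the project before synthesising.
            echo "vivado -mode batch -source $dir/create_project.tcl"
            echo "vivado -mode batch -source $dir/synth.tcl"
            ;;
        impl)
            # Assumption: impl runs impl.tcl only and relies on its synth_1 check.
            echo "vivado -mode batch -source $dir/impl.tcl"
            ;;
        clean)
            echo "rm -rf build"
            ;;
        *)
            echo "usage: build.sh {project|synth|impl|clean}" >&2
            return 2
            ;;
    esac
}
```

Calling stage_plan synth prints the two batch-mode invocations in order, which makes it easy to see what a stage would trigger without starting a Vivado session.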
The script first looks for vivado on PATH. If it is not found there,
the script falls back to /tools/Xilinx/2025.2, then 2024.1, then
2023.2. Vivado 2023.2 or later is required.
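The lookup order can be sketched as a small shell function. This is an illustration rather than the code in build.sh: find_vivado and the XILINX_ROOT override are hypothetical names, and the bin/vivado layout below each version directory is an assumption.

```shell
# Sketch only: find_vivado and XILINX_ROOT are hypothetical names.
# Prints the path of a usable vivado executable, or returns 1 if none is found.
find_vivado() {
    local root="${XILINX_ROOT:-/tools/Xilinx}"   # fallback root named in the text
    local ver candidate

    # 1. Prefer whatever vivado is already on PATH.
    if command -v vivado >/dev/null 2>&1; then
        command -v vivado
        return 0
    fi

    # 2. Otherwise try the documented versions, newest first.
    for ver in 2025.2 2024.1 2023.2; do
        candidate="$root/$ver/bin/vivado"        # bin/vivado layout is an assumption
        if [ -x "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done

    return 1
}
```

A caller might use it as `VIVADO=$(find_vivado) || { echo "Vivado 2023.2+ not found" >&2; exit 1; }`. Trying the newest version first means the most recent installed release wins when several are present.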
Stage descriptions:
create_project.tcl: Creates a Vivado project targeting part xck26-sfvc784-2LV-c (KV260 ZU5EV) at build/pccx_v002_kv260/. Parses filelist.f to populate the sources_1 fileset, then adds every *.xdc from hw/constraints/ to constrs_1.
synth.tcl: Runs synth_design -mode out_of_context -flatten_hierarchy rebuilt. On completion, writes utilization_post_synth.rpt, clocks_post_synth.rpt, timing_summary_post_synth.rpt, and drc_post_synth.rpt to build/reports/. Check WNS in the timing summary before proceeding to implementation.
impl.tcl: Verifies that synth_1 progress is 100%, then launches impl_1 -to_step write_bitstream -jobs 4. On success, the bitstream is copied to build/pccx_v002_kv260.bit. Implementation is an hour-scale job; run it only after OOC synthesis is clean.
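The WNS check recommended after synthesis can be scripted. The sketch below is not part of the build scripts: wns_from_report and check_wns are hypothetical helpers, and they assume the usual timing-summary layout in which a header row containing WNS(ns) is followed by a row of dashes and then a row of values with WNS in the first column.

```shell
# Sketch only: wns_from_report and check_wns are hypothetical helpers.
# Assumed layout: header row with "WNS(ns)", a dashed row, then the value row.

# Print the first WNS value found in the report given as $1.
wns_from_report() {
    awk '/WNS\(ns\)/ { hdr = NR }
         hdr && NR == hdr + 2 { print $1; exit }' "$1"
}

# Return 0 if WNS >= 0, 1 if negative, 2 if no WNS could be read.
check_wns() {
    local wns
    wns=$(wns_from_report "$1")
    if [ -z "$wns" ]; then
        echo "WNS not found in $1" >&2
        return 2
    fi
    # awk performs the floating-point comparison.
    if awk -v w="$wns" 'BEGIN { exit !(w >= 0) }'; then
        echo "timing met: WNS = $wns ns"
    else
        echo "timing violated: WNS = $wns ns"
        return 1
    fi
}
```

For example, `check_wns build/reports/timing_summary_post_synth.rpt && ./hw/vivado/build.sh impl` would start the long implementation run only when the post-synthesis slack is non-negative.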
The OOC mode is required because NPU_top uses SystemVerilog interface
ports (axil_if / axis_if). Out-of-context synthesis leaves those
ports unbound, allowing the core to be synthesised and checked in
isolation before a block-design wrapper is available.
SV Interface Wrapper
The board-side SystemVerilog wrapper (npu_core_wrapper.sv) converts
NPU_top’s SystemVerilog interface ports into plain AXI4-Lite and
AXI4-Stream signal bundles so the core can be packaged as a Vivado IP
and placed alongside the Zynq PS in a block design. The wrapper
contains no registers and no CDC logic; it performs one-to-one signal
expansion only.
The wrapper file lives in the KV260 board integration repository
(pccxai/pccx-FPGA-NPU-LLM-kv260), not in the reusable IP-core
package. It stayed in the board integration layer after the v002
extraction, rather than moving to pccx-v002, because it encodes
board-side packaging concerns. The authoritative source is therefore
the board integration repo’s hw/vivado/npu_core_wrapper.sv.
The external interface exposed by the wrapper is as follows.
| Port group | Direction | Width | Description |
|---|---|---|---|
| | Input | 1-bit | Core-domain clock and active-low reset |
| | Input | 1-bit | AXI-domain clock and active-low reset |
| | Slave | 32-bit | AXI4-Lite control channel (CMD_IN / STAT_OUT) |
| | Slave | 128-bit | AXI4-Stream HP ports × 4 (weight streaming) |
| | Slave | 128-bit | AXI4-Stream ACP FMap input |
| | Master | 128-bit | AXI4-Stream ACP result output |
The Vivado IP packager auto-infers AXI interfaces from plain signal
ports, so this wrapper allows NPU_top to be placed directly into a
block design alongside the Zynq PS.
Constraints
hw/constraints/pccx_timing.xdc is a timing-only constraint file.
Pin and IO constraints are absent; they are delegated to the block
design that wraps this core.
Two clock domains are declared.
| Clock name | Period | Frequency | Scope |
|---|---|---|---|
| | 4.000 ns | 250 MHz | AXI-Lite MMIO, CDC FIFO drain sides, DMA path |
| | 2.500 ns | 400 MHz | DSP48E2 array, GEMV lanes, CVO SFU |
The two domains are genuinely asynchronous. set_clock_groups -asynchronous is applied, and every domain crossing uses a CDC FIFO or a
properly staged reset synchroniser.
Additional path exceptions are set.
False paths: Reset bridge first-flop paths and XPM_FIFO_ASYNC gray-coded pointer crossings.
Multicycle path (setup 2, hold 1): From DSP48E2 P-registers inside the GEMM systolic array to mat_result_normalizer. The controller stalls new MACs during accumulator flush, so the drain path tolerates two cycles.
File Manifest
hw/vivado/filelist.f is the source list for both OOC synthesis and
xvlog lint. create_project.tcl parses this file to add sources to
the Vivado project.
Compile order follows the declaration order in the file. Packages and interfaces must appear before the modules that import them.
The full file is organized into the following sections.
| Section | Contents |
|---|---|
| A (comment only) | |
| B | |
| C | |
| D | |
| Library | BF16 math library, algorithms, QUEUE interface |
| ISA | |
| MAT_CORE | GEMM systolic array and result packer |
| VEC_CORE | GEMV core (accumulate, LUT generation, reduction, top) |
| CVO_CORE | CORDIC unit, SFU, CVO top |
| PREPROCESS | BF16↔fixed-point pipeline, fmap cache |
| MEM_control | L2 cache, HP buffer, CVO bridge, dispatcher |
| NPU_Controller | AXI-Lite decoder, dispatcher, front-end, Global Scheduler, top |
| Top level | |
When adding a new .sv file, insert it in dependency order inside
filelist.f.
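After editing filelist.f, a quick sanity check can catch typos before Vivado is launched. The sketch below is not part of the repository: list_sources and check_filelist are hypothetical helpers, and they assume one path per line, "#" or "//" comments, and paths relative to a base directory passed as the second argument. The check verifies only that each listed file exists; it does not validate dependency order, which still has to be confirmed by compiling.

```shell
# Sketch only: list_sources and check_filelist are hypothetical helpers.
# Assumed format: one path per line, "#" or "//" comments, relative paths.

# Print the source paths in declaration order, without comments or blanks.
list_sources() {
    sed -e 's://.*$::' -e 's:#.*$::' \
        -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' "$1" |
        grep -v -e '^$' -e '^[-+]'      # also skip "+incdir"-style option lines
}

# Return 0 if every listed file exists under base ($2, default "."), else 1.
check_filelist() {
    local list="$1" base="${2:-.}" missing=0 f
    while IFS= read -r f; do
        [ -n "$f" ] || continue
        if [ ! -f "$base/$f" ]; then
            echo "missing: $f" >&2
            missing=1
        fi
    done <<EOF
$(list_sources "$list")
EOF
    return "$missing"
}
```

Because list_sources preserves declaration order, its output can also be reviewed by eye to confirm that packages and interfaces precede the modules that import them.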