1 Abstract
This is a short review of SystemVerilog as a hardware description language (HDL). The target is the AMD Vivado Design Suite toolchain.
2 The goal
The goal is to describe correct digital systems which meet a performance target.
SystemVerilog is a large language for hardware description and verification. For the description part, we will use a strict subset of the language. We will use “templates” (patterns, cookbooks) that represent happy paths where the synthesis and place-and-route engines have no trouble inferring the hardware. The end result will be a correct circuit that matches what we intended.
3 Use only a subset of the HDL
- Use the `integer` data type for constants and parameters.
- Use the `tri` data type for `inout` bidirectional ports.
- Use `typedef enum {S0, S1}` user-defined types for Finite State Machine (FSM) states.
- Use `logic` for everything else.
- Separate memory components (registers) into their own `always` block.
- Use `always_ff` and nonblocking (`<=`) assignments for registers and memory.
- Use `always_comb` and blocking (`=`) assignments for combinational circuits.
- Assign each variable (signal or register) in only one `always` block.
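The rules above can be seen working together in a small sketch. The module `edge_detect` and its ports are hypothetical (not from AMD's templates): a two-state FSM that flags a rising edge on `din`, with the state register in its own `always_ff` block and the next-state and output logic in an `always_comb` block.

```systemverilog
// Hypothetical two-state FSM: flags a 0 -> 1 transition on din.
module edge_detect
(
  input  logic clk,
  input  logic rst,    // active-high synchronous reset
  input  logic din,    // assumed synchronous to clk
  output logic pulse   // Mealy output: din is 1 now but was 0 at the last edge
);
  typedef enum {S0, S1} state_t;  // S0: din was low, S1: din was high
  state_t state_reg, state_next;

  // State register: its own always_ff block, nonblocking assignments
  always_ff @(posedge clk) begin
    if (rst)
      state_reg <= S0;
    else
      state_reg <= state_next;
  end

  // Next-state and output logic: always_comb, blocking assignments
  always_comb begin
    state_next = din ? S1 : S0;
    pulse      = (state_reg == S0) && din;
  end
endmodule
```

Each variable has exactly one writer: `state_reg` in the `always_ff` block, `state_next` and `pulse` in the `always_comb` block.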
Vivado guidelines (AMD 2025):
- Do not asynchronously set or reset registers.
- Do not describe flip-flops (FFs) with both a set and a reset.
- Always describe the clock enable, set, and reset inputs of a FF as active-High.
- Vivado infers memory from logic arrays described as single-port, simple dual-port, or true dual-port RAM, with up to two write ports and multiple read ports.
- Describe each module so that it maps cleanly onto supported primitives: start from a template.
Vivado infers four types of FF primitive:
- FDCE (D FF with clock enable and asynchronous clear)
- FDPE (D FF with clock enable and asynchronous preset)
- FDSE (D FF with clock enable and synchronous set)
- FDRE (D FF with clock enable and synchronous reset)
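To see how a description lines up with these primitives, the hypothetical module below (`registers_set`, not an AMD template) describes an 8-bit register with an active-high synchronous set and an active-high clock enable. Following the guidelines above, it should infer one FDSE cell per bit; treat this as a sketch, since the final cell count can still differ.

```systemverilog
// Hypothetical 8-bit register:
// rising-edge clock, active-high synchronous set, active-high clock enable.
// Expected to map onto FDSE cells (clock enable + synchronous set).
module registers_set
(
  input  logic [7:0] d_in,
  input  logic       ce,    // active-high clock enable
  input  logic       clk,
  input  logic       set,   // active-high synchronous set
  output logic [7:0] d_out
);
  always_ff @(posedge clk) begin
    if (set)
      d_out <= 8'hFF;       // set has priority over ce, as in FDSE
    else if (ce)
      d_out <= d_in;
  end
endmodule
```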
The number of flip-flop cells (FDCE, FDRE, etc.) in the final report may differ from the number of registers you described because:
- Fewer FFs: Vivado absorbed registers into DSP or BRAM primitives instead.
- More FFs: Vivado duplicated registers for performance (see the synthesis options).
- Fewer FFs: Vivado removed registers that are logically constant or duplicated elsewhere.
Vivado uses the term “register”; this review uses “flip-flop”.
// 8-bit register with
// rising-edge clock
// active-high synchronous clear
// active-high clock enable
// (this combination maps onto FDRE cells)
module registers_1
(
  input  logic [7:0] d_in,
  input  logic       ce,
  input  logic       clk,
  input  logic       clr,
  output logic [7:0] d_out
);
  logic [7:0] d_reg;

  always_ff @(posedge clk)
  begin
    if (clr)
      d_reg <= 8'b0;
    else if (ce)
      d_reg <= d_in;
  end

  assign d_out = d_reg;
endmodule
4 Know your design
Know what you are trying to achieve with your design, and understand the critical path of your circuit. Then use the synthesis settings described in (AMD 2025) to bring the implemented result as close as possible to the design you intended.
5 Memory patterns
Data flow is the most important consideration when designing an FPGA-based system. The bandwidth of the data passing between components, and the interconnect congestion it causes, is the number one reason why designs do not meet timing. Efficient use of the coarse-grained (hard) elements of the FPGA, including Block RAM (BRAM), Digital Signal Processing (DSP) slices, and hard processor cores, is essential for a large design to fit and perform well.
Thus, ask:
- Where do the inputs of my computation come from?
- What is the speed / bandwidth / bitrate at which inputs are produced?
  - High-speed bursts necessitate large buffers.
- What is the speed at which outputs are required and consumed?
  - Latency requirements inform our design.
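A buffer that absorbs bursts is usually a FIFO. The following is a minimal sketch of a single-clock FIFO, assuming a power-of-two `DEPTH`; the module, port, and parameter names are hypothetical. Its storage follows the memory pattern of a synchronous write plus a registered read, so for large depths Vivado can map it onto block RAM.

```systemverilog
// Hypothetical synchronous FIFO. DEPTH must be a power of two.
module sync_fifo
#(
  parameter integer WIDTH = 8,
  parameter integer DEPTH = 16
)
(
  input  logic             clk,
  input  logic             rst,      // active-high synchronous reset
  input  logic             wr_en,
  input  logic [WIDTH-1:0] wr_data,
  input  logic             rd_en,
  output logic [WIDTH-1:0] rd_data,  // valid one cycle after an accepted rd_en
  output logic             full,
  output logic             empty
);
  localparam integer AW = $clog2(DEPTH);

  logic [WIDTH-1:0] mem [0:DEPTH-1];
  logic [AW:0]      wr_ptr, rd_ptr;  // extra MSB distinguishes full from empty

  assign empty = (wr_ptr == rd_ptr);
  assign full  = (wr_ptr[AW-1:0] == rd_ptr[AW-1:0]) && (wr_ptr[AW] != rd_ptr[AW]);

  // Memory: synchronous write, registered (synchronous) read
  always_ff @(posedge clk) begin
    if (wr_en && !full)
      mem[wr_ptr[AW-1:0]] <= wr_data;
    if (rd_en && !empty)
      rd_data <= mem[rd_ptr[AW-1:0]];
  end

  // Pointer registers
  always_ff @(posedge clk) begin
    if (rst) begin
      wr_ptr <= '0;
      rd_ptr <= '0;
    end else begin
      if (wr_en && !full)  wr_ptr <= wr_ptr + 1'b1;
      if (rd_en && !empty) rd_ptr <= rd_ptr + 1'b1;
    end
  end
endmodule
```

The depth is the design choice the questions above feed into: it must cover the largest burst minus what the consumer drains during that burst.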
An AMD FPGA gives you three kinds of storage:
- Discrete registers built from flip-flops.
- Distributed RAM built from LUTs.
- Block RAM (BRAM).
| Action | Distributed RAM | Dedicated BRAM |
|---|---|---|
| Write | Synchronous | Synchronous |
| Read | Asynchronous | Synchronous |
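The read column is what decides where a memory can go. As a sketch (hypothetical module name and sizes), the RAM below has a synchronous write but a combinational read, so it cannot be mapped onto block RAM, whose reads are synchronous; it fits the distributed RAM (LUT) pattern instead.

```systemverilog
// Hypothetical 64 x 8 RAM: synchronous write, asynchronous read.
// The combinational read path rules out BRAM; expect distributed RAM.
module ram_dist
(
  input  logic       clk,
  input  logic       we,
  input  logic [5:0] addr,
  input  logic [7:0] din,
  output logic [7:0] dout
);
  logic [7:0] mem [0:63];

  always_ff @(posedge clk)
    if (we)
      mem[addr] <= din;      // synchronous write

  assign dout = mem[addr];   // asynchronous read: follows addr without a clock
endmodule
```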
BRAM supports three read/write synchronization modes, which determine what a port outputs during a write:
- Read-first: the old content is read before the new content is written.
- Write-first: the new content is immediately made available for reading; also known as “read-through”.
- No-change: the output does not change while new content is written.
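The mode is inferred from how the output register is written. A minimal sketch, assuming a hypothetical single-port 1024 x 16 RAM (large enough that Vivado would typically choose block RAM), in write-first mode; the comments note how the other two modes differ.

```systemverilog
// Hypothetical single-port RAM in WRITE_FIRST ("read-through") mode.
module ram_sp_write_first
(
  input  logic        clk,
  input  logic        en,
  input  logic        we,
  input  logic [9:0]  addr,
  input  logic [15:0] din,
  output logic [15:0] dout
);
  logic [15:0] mem [0:1023];

  always_ff @(posedge clk) begin
    if (en) begin
      if (we) begin
        mem[addr] <= din;
        dout      <= din;        // write-first: output shows the new data
      end else begin
        dout      <= mem[addr];
      end
    end
  end
  // READ_FIRST: always do dout <= mem[addr]; (output shows the old data)
  // NO_CHANGE:  only update dout when we is low (output holds during writes)
endmodule
```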
(AMD 2025) provides full listings of a diverse set of RAM descriptions that AMD has tested to infer correctly, for example:
// True Dual Port (TDP) Asymmetric RAM, READ_FIRST Mode.
// Assumes WIDTHA > WIDTHB: port A is the wide port, RATIO = WIDTHA / WIDTHB.
module asym_ram_tdp_read_first
#(
  parameter WIDTHB = 4,
  parameter SIZEB  = 1024,
  parameter WIDTHA = 16,
  parameter SIZEA  = 256
)
(
  input  logic                     clkA, clkB,
  input  logic                     enaA, enaB,  // enables
  input  logic                     weA,  weB,   // write-enables
  input  logic [$clog2(SIZEA)-1:0] addrA,
  input  logic [$clog2(SIZEB)-1:0] addrB,
  input  logic [WIDTHA-1:0]        diA,
  input  logic [WIDTHB-1:0]        diB,
  output logic [WIDTHA-1:0]        doA,
  output logic [WIDTHB-1:0]        doB
);
  // Not synthesized into hardware: elaboration-time constants
  `define max(a,b) {(a) > (b) ? (a) : (b)}
  `define min(a,b) {(a) < (b) ? (a) : (b)}
  localparam maxSIZE   = `max(SIZEA, SIZEB);
  localparam maxWIDTH  = `max(WIDTHA, WIDTHB);
  localparam minWIDTH  = `min(WIDTHA, WIDTHB);
  localparam RATIO     = maxWIDTH / minWIDTH;
  localparam log2RATIO = $clog2(RATIO);
  // END: finished calc of constant

  logic [minWIDTH-1:0] RAM [0:maxSIZE-1];
  logic [WIDTHA-1:0]   readA_reg;
  logic [WIDTHB-1:0]   readB_reg;

  // RAM is written by both ports. always_ff forbids a variable from being
  // written by more than one process, so both ports use plain always blocks.
  always @(posedge clkB) begin : portB
    if (enaB) begin
      readB_reg <= RAM[addrB];  // read first: old data
      if (weB)
        RAM[addrB] <= diB;
    end
  end

  always @(posedge clkA) begin : portA
    logic [log2RATIO-1:0] lsbaddr;
    for (int i = 0; i < RATIO; i++) begin
      lsbaddr = i;
      if (enaA) begin
        readA_reg[(i+1)*minWIDTH-1 -: minWIDTH] <= RAM[{addrA, lsbaddr}];
        if (weA)
          RAM[{addrA, lsbaddr}] <= diA[(i+1)*minWIDTH-1 -: minWIDTH];
      end
    end
  end

  assign doA = readA_reg;
  assign doB = readB_reg;
endmodule
5.1 Register file
A register file is an array of addressable registers with two or more ports for reading and writing in parallel. A register file built from LUTs may be used for fast, temporary storage.
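A minimal sketch of such a register file, assuming a hypothetical 32-entry, 32-bit layout with one write port and two read ports. The synchronous write and asynchronous reads match the distributed RAM (LUT) pattern with multiple read ports.

```systemverilog
// Hypothetical register file: 32 x 32-bit, 1 write port, 2 read ports.
module regfile
(
  input  logic        clk,
  input  logic        we,
  input  logic [4:0]  waddr,
  input  logic [31:0] wdata,
  input  logic [4:0]  raddr1,
  input  logic [4:0]  raddr2,
  output logic [31:0] rdata1,
  output logic [31:0] rdata2
);
  logic [31:0] regs [0:31];

  // Single write port: synchronous
  always_ff @(posedge clk)
    if (we)
      regs[waddr] <= wdata;

  // Two read ports: asynchronous, read in parallel
  assign rdata1 = regs[raddr1];
  assign rdata2 = regs[raddr2];
endmodule
```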
5.2 Multipliers
(AMD 2018, 2025) both have notes on DSP versus slice-based generation of multipliers.
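As a hedged sketch of steering that choice, the hypothetical module below registers both inputs and the product, so synthesis can absorb those registers into a DSP slice's internal pipeline, and uses Vivado's `use_dsp` attribute to request DSP mapping (setting it to "no" asks for slice logic instead). Check the attribute's exact name and accepted values for your Vivado release in (AMD 2025).

```systemverilog
// Hypothetical pipelined signed multiplier, 16 x 16 -> 32 bits.
// Latency: 2 clock cycles from a/b to p.
(* use_dsp = "yes" *)
module mult_pipe
(
  input  logic               clk,
  input  logic signed [15:0] a,
  input  logic signed [15:0] b,
  output logic signed [31:0] p
);
  logic signed [15:0] a_reg, b_reg;

  always_ff @(posedge clk) begin
    a_reg <= a;              // input registers
    b_reg <= b;
    p     <= a_reg * b_reg;  // product register
  end
endmodule
```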
6 Use of the Constraints File
Vivado uses Xilinx Design Constraints (XDC) files such as `constraints.xdc`. Constraints are design-level rules that the tools (synthesis, place and route) must satisfy, so they shape the final implementation of your design. A design may therefore be logically correct yet have vastly different performance (timing) depending on its constraints.
XDC files are made up of Tcl commands, which are recommended to be ordered as follows:
- Timing assertions section
  - Primary clocks
  - Virtual clocks
  - Generated clocks
  - Clock groups
  - Bus skew constraints
  - Input and output delay constraints
- Timing exceptions section
  - False paths
  - Max delay / min delay
  - Multicycle paths
  - Case analysis
  - Disable timing
- Physical constraints section (anywhere in the file)
  - May be stored in a separate constraints file.
  - Includes pin count and pin types.
A project flow loads constraints in the constraint set's load order (read sequence). Files read later may reference constraints (such as declared clocks) from files read earlier. A project without any IP has all its constraints in the same set, while IP may have its own constraints that aren’t visible to you. By default, IP XDC files are read before user XDC files, which lets you reference IP clock objects (for example) in your own XDC or override physical constraints set by an IP core.
6.1 What you can do
- Pin assignment (placement of top-level ports)
- Create and edit Pblocks, including their shape and location (floorplanning)