SystemVerilog HDL for FPGA design realization

RFD 10
Authors: Leo Qi
Labels: review
State: prediscussion
Freshness: fresh
Inspired: 2, 3

1 Abstract

This is a short review of SystemVerilog as a hardware description language (HDL). The target is the AMD Vivado Design Suite toolchain.

2 The goal

The goal is to describe correct digital systems which meet a performance target.

SystemVerilog is a large language for hardware description and verification. For description, we will use a strict subset of the language, built from “templates” (patterns, cookbooks): happy paths from which the synthesis and place-and-route engines have no trouble inferring the hardware. The end result should be a correct circuit that matches what we intended.

3 Use only a subset of the HDL

  1. Use integer data type for constants and parameters.
  2. Use tri data type for inout bidirectional ports.
  3. Use typedef enum {S0, S1} user-defined types for Finite State Machine (FSM) states.
  4. Use logic for everything else.
  5. Separate memory components (registers) into their own always block.
  6. Use always_ff and nonblocking (<=) assignments for registers and memory.
  7. Use always_comb and blocking (=) assignments for combinational circuits.
  8. Each variable (signal or register) should be assigned to in only one always block.
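
The rules above can be combined into one small template. The following is a minimal sketch, not taken from (AMD 2025): the module name `fsm_template` and its ports (`rst`, `start`, `done`, `busy`) are hypothetical, and giving the enum an explicit `logic` base type is an assumption made here to fix the state width.

```systemverilog
// Hypothetical two-state FSM illustrating the subset above.
module fsm_template
(
  input  logic clk,
  input  logic rst,    // active-high synchronous reset
  input  logic start,
  input  logic done,
  output logic busy
);

  // User-defined enum type for the FSM state.
  typedef enum logic [0:0] {S0, S1} state_t;
  state_t state, state_next;

  // The state register lives in its own always_ff block
  // and uses nonblocking assignments.
  always_ff @(posedge clk)
  begin
    if (rst)
      state <= S0;
    else
      state <= state_next;
  end

  // Next-state and output logic is combinational and uses blocking
  // assignments. Defaults come first so that no latch is inferred.
  always_comb
  begin
    state_next = state;
    busy       = 1'b0;
    case (state)
      S0: if (start) state_next = S1;
      S1: begin
        busy = 1'b1;
        if (done) state_next = S0;
      end
      default: state_next = S0;
    endcase
  end
endmodule
```

Each variable is written in exactly one block: `state` in the `always_ff` block, `state_next` and `busy` in the `always_comb` block.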

Vivado guidelines (AMD 2025):

  1. Do not asynchronously set or reset registers.
  2. Do not describe flip-flops (FFs) with both a set and a reset.
  3. Always describe the clock enable, set, and reset inputs of a FF as active-High.
  4. Describe memories as logic arrays in a form Vivado can infer: single-port, simple dual-port, or true dual-port, with up to two write ports and multiple read ports.

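Vivado infers four types of FF primitive (AMD 2025):

  1. FDCE: D flip-flop with clock enable and asynchronous clear.
  2. FDPE: D flip-flop with clock enable and asynchronous preset.
  3. FDSE: D flip-flop with clock enable and synchronous set.
  4. FDRE: D flip-flop with clock enable and synchronous reset.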

What the final report tells you is that the number of cells (FDCE et al.) may change because of

Vivado uses the terminology “Register” but I will use “Flip-Flops”.

// 8-bit register with
// rising-edge clock
// active-high synchronous clear
// active-high clock enable
module registers_1
(
  input  logic [7:0] d_in,
  input  logic       ce,
  input  logic       clk,
  input  logic       clr,
  output logic [7:0] d_out
);

  logic [7:0] d_reg;
  always_ff @(posedge clk)
  begin
    if (clr)
      d_reg <= 8'b0;
    else if (ce)
      d_reg <= d_in;
  end
  assign d_out = d_reg;
endmodule

4 Know your design

Know what you are trying to achieve with your design, and understand the critical path of its digital circuits. Then use the synthesis settings described in (AMD 2025) to bring the implemented result as close as possible to the design you intended.

5 Memory patterns

Data flow is the most important consideration when designing an FPGA-based system. The bandwidth of the data signals being passed around, together with interconnect congestion, is the number one reason why designs do not meet timing. Efficient use of the coarse-grained (hard) elements of the FPGA, including Block RAM (BRAM), Digital Signal Processing (DSP) slices, and hard processor cores, is essential for a large design to fit and perform well.

Thus, ask:

  1. Where do the inputs of my computation come from?
  2. What is the speed / bandwidth / bitrate at which inputs are produced?
  3. What is the speed at which outputs are required and consumed?

The AMD FPGA gives you:

  1. Discrete registers using flip-flops.
  2. Distributed RAM (LUTs).
  3. Block RAMs.

Action   Distributed RAM   Dedicated BRAM
Write    Synchronous       Synchronous
Read     Asynchronous      Synchronous
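
The read timing is what separates the two RAM types in HDL: distributed RAM can be read combinationally, while block RAM needs a registered read. A minimal sketch follows, assuming the `ram_style` synthesis attribute described in (AMD 2025). The module names, sizes, and port names are hypothetical.

```systemverilog
// Asynchronous read: maps naturally to distributed (LUT) RAM.
module ram_async_read
(
  input  logic       clk,
  input  logic       we,
  input  logic [5:0] addr,
  input  logic [7:0] di,
  output logic [7:0] dout
);
  (* ram_style = "distributed" *) logic [7:0] ram [0:63];

  always_ff @(posedge clk)
    if (we)
      ram[addr] <= di;          // synchronous write

  assign dout = ram[addr];      // asynchronous (combinational) read
endmodule

// Synchronous read: the registered output lets Vivado use block RAM.
module ram_sync_read
(
  input  logic       clk,
  input  logic       we,
  input  logic [9:0] addr,
  input  logic [7:0] di,
  output logic [7:0] dout
);
  (* ram_style = "block" *) logic [7:0] ram [0:1023];

  always_ff @(posedge clk)
  begin
    if (we)
      ram[addr] <= di;          // synchronous write
    dout <= ram[addr];          // synchronous read, old data on a write
  end
endmodule
```

The port is named `dout` rather than `do` because `do` is a reserved word in SystemVerilog.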

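BRAM may be configured with one of three synchronization modes (AMD 2025): read-first (the old content is read before the new content is written), write-first (the new content is immediately available for reading), and no-change (the output does not change while new content is written).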

(AMD 2025) provides full listings of a diverse set of RAM descriptions that AMD has tested to infer correctly:

// True Dual Port (TDP) Asymmetric RAM, READ_FIRST Mode.
module asym_ram_tdp_read_first
#(
  parameter WIDTHB = 4,
  parameter SIZEB = 1024,
  parameter ADDRWIDTHB = 10,
  parameter WIDTHA = 16,
  parameter SIZEA = 256,
  parameter ADDRWIDTHA = 8
)
(
  input logic clkA, clkB,
  input logic enaA, enaB,  // enables
  input logic weA, weB,    // write-enables
  input logic [ADDRWIDTHA-1:0] addrA,
  input logic [ADDRWIDTHB-1:0] addrB,
  input logic [WIDTHA-1:0] diA,
  input logic [WIDTHB-1:0] diB,
  output logic [WIDTHA-1:0] doA,
  output logic [WIDTHB-1:0] doB
);
  // Not synthesized into hardware: becomes constant
  `define max(a,b) {(a) > (b) ? (a) : (b)}
  `define min(a,b) {(a) < (b) ? (a) : (b)}

  localparam maxSIZE = `max(SIZEA, SIZEB);
  localparam maxWIDTH = `max(WIDTHA, WIDTHB);
  localparam minWIDTH = `min(WIDTHA, WIDTHB);
  localparam RATIO = maxWIDTH / minWIDTH;
  localparam log2RATIO = $clog2(RATIO);
  // END: finished calc of constant

  logic [minWIDTH-1:0] RAM [0:maxSIZE-1];
  logic [WIDTHA-1:0] readA_reg;
  logic [WIDTHB-1:0] readB_reg;

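  // Plain `always` rather than always_ff: both ports write RAM, and
  // always_ff forbids a second process from writing the same variable.
  always @(posedge clkB) begin
    if (enaB) begin
      readB_reg <= RAM[addrB];
      if (weB)
        RAM[addrB] <= diB;
    end
  end

  // Port A is RATIO times wider than the RAM word, so each access
  // touches RATIO consecutive words selected by lsbaddr.
  always @(posedge clkA) begin : portA
    integer i;
    logic [log2RATIO-1:0] lsbaddr;
    for (i=0; i < RATIO; i = i+1) begin
      lsbaddr = i;
      if (enaA) begin
        readA_reg[(i+1)*minWIDTH-1 -: minWIDTH] <= RAM[{addrA, lsbaddr}];
        if (weA)
          RAM[{addrA, lsbaddr}] <= diA[(i+1)*minWIDTH-1 -: minWIDTH];
      end
    end
  end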

  assign doA = readA_reg;
  assign doB = readB_reg;
endmodule

5.1 Register file

A register file is an array of addressable registers with two or more ports for reading and writing in parallel. A register file built from LUTs may be used for fast, temporary storage.
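
As an illustration, a register file with one write port and two asynchronous read ports can be sketched as below. This is not a listing from (AMD 2025): the module name, parameters, and ports are hypothetical, and whether Vivado maps it to LUT RAM or to discrete flip-flops depends on its size and on the synthesis settings.

```systemverilog
// Hypothetical register file: one write port, two read ports.
module regfile_2r1w
#(
  parameter integer WIDTH = 32,
  parameter integer DEPTH = 32
)
(
  input  logic                     clk,
  input  logic                     we,
  input  logic [$clog2(DEPTH)-1:0] waddr,
  input  logic [WIDTH-1:0]         wdata,
  input  logic [$clog2(DEPTH)-1:0] raddr_a,
  input  logic [$clog2(DEPTH)-1:0] raddr_b,
  output logic [WIDTH-1:0]         rdata_a,
  output logic [WIDTH-1:0]         rdata_b
);
  logic [WIDTH-1:0] regs [0:DEPTH-1];

  // Synchronous write through the single write port.
  always_ff @(posedge clk)
    if (we)
      regs[waddr] <= wdata;

  // Two independent asynchronous reads in parallel.
  assign rdata_a = regs[raddr_a];
  assign rdata_b = regs[raddr_b];
endmodule
```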

5.2 Multipliers

(AMD 2018, 2025) both have notes on DSP versus slice-based generation of multipliers.
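
A minimal sketch of a multiplier that steers this choice, assuming the `use_dsp` synthesis attribute documented in (AMD 2025). The module name, operand widths, and pipeline depth are hypothetical. The input and output registers are there because a DSP slice has internal pipeline registers into which the tool can absorb them.

```systemverilog
// Hypothetical pipelined signed multiplier.
// "yes" asks for a DSP slice; "no" would ask for slice (LUT) logic.
(* use_dsp = "yes" *)
module mult_pipelined
(
  input  logic               clk,
  input  logic signed [17:0] a,
  input  logic signed [17:0] b,
  output logic signed [35:0] p
);
  logic signed [17:0] a_reg, b_reg;
  logic signed [35:0] m_reg;

  always_ff @(posedge clk)
  begin
    a_reg <= a;               // stage 1: input registers
    b_reg <= b;
    m_reg <= a_reg * b_reg;   // stage 2: multiply
    p     <= m_reg;           // stage 3: output register
  end
endmodule
```

The product appears on `p` three clock cycles after the operands are presented.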

6 Use of the Constraints File

Vivado uses Xilinx Design Constraints (XDC) files such as constraints.xdc. Constraints are design-level rules that the tooling (synthesis, place and route) must satisfy, and they therefore inform the final shape of your design. This means a logically correct design may have vastly different performance (timing) depending on the constraints used.

XDC files are sequences of Tcl commands. The recommended order is:

  1. Timing assertions section
     - Primary clocks
     - Virtual clocks
     - Generated clocks
     - Clock groups
     - Bus skew constraints
     - Input and output delay constraints
  2. Timing exceptions section
     - False paths
     - Max delay / min delay
     - Multicycle paths
     - Case analysis
     - Disable timing
  3. Physical constraints section (anywhere in file)

A project flow will load constraints in a constraint set load order (read sequence). Files read in later may reference constraints (such as declared clocks) from files read in first. A project without any IP will have all constraints in the same set, while IP may have its own constraints that aren’t visible to you. IP XDC files are read in before user XDC files by default, allowing you to reference IP clock objects (for example) in your own XDC or overwrite physical constraints set by an IP core.

6.1 What you can do

7 References

AMD. 2018. “7 Series DSP48E1 Slice User Guide.” https://docs.amd.com/v/u/en-US/ug479_7Series_DSP48E1.
———. 2025. “Vivado Design Suite User Guide: Synthesis.” https://docs.amd.com/r/en-US/ug901-vivado-synthesis/.