Clock Frequency and Parallelization Techniques in 10Gb/100Gb Ethernet PHY Design



When examining high-speed Ethernet interfaces, the apparent clock frequency requirements seem physically unrealistic at first glance. For 10GbE (10 Gigabit Ethernet), a naive implementation would suggest needing a 10GHz clock to handle one bit per cycle. Modern transceivers employ several architectural optimizations to make this feasible with practical clock speeds.

The fundamental technique is parallelization through multi-lane designs. The MAC-to-PHY interface in a typical 10GbE implementation uses:

// Conceptual XGMII transmit interface
struct xgmii_tx {
    uint8_t txd[4];   // 4 lanes of 8-bit data per transfer
    uint8_t txc;      // 1 control bit per lane (low 4 bits used)
    // Clocked at 156.25 MHz DDR = 312.5 MT/s × 32 bits = 10 Gbps
};

The XGMII (10 Gigabit Media Independent Interface) specification divides the data path into four 8-bit lanes clocked at 156.25 MHz DDR, i.e. 312.5 million transfers per second (312.5 MT/s × 4 lanes × 8 bits = 10 Gbps). The physical implementation uses:

High-speed transceivers serialize/deserialize data streams using:

  • 64b/66b encoding (reduces framing overhead)
  • Clock data recovery (CDR) circuits
  • Differential signaling (NRZ at ~25 Gbps per lane for 100GBASE-CR4/SR4; PAM4 in newer 100G/200G/400G variants)

Example of a modern 100GbE implementation:

// QSFP28 module configuration (100GBASE-CR4/SR4)
void configure_100g_phy() {
    set_modulation(NRZ);           // 1 bit per symbol
    set_lanes(4);                  // 4 physical lanes
    set_symbol_rate(25.78125GBd);  // Baud rate per lane
    // Effective rate: 25.78125 GBd × 4 lanes × 1 bit = 103.125 Gbps
    // (the extra 3.125% over the 100 Gbps payload is 64b/66b overhead)
}

Modern NIC designs implement multiple clock domains:

// Typical clock domains in a 100GbE controller
#define PCIE_CLK     250MHz       // Host interface
#define MAC_CLK      322MHz       // Data processing (322.265625 MHz)
#define PHY_CLK      156.25MHz    // Physical-layer logic
#define SERDES_CLK   25.78125GHz  // Bit rate inside the analog serializer

The highest frequency signals only exist within the analog SerDes blocks, while digital logic operates at more manageable frequencies.

Current generation NICs use these techniques:

Standard        Data Rate  Lanes  Line Rate/Lane  Encoding
10GBASE-R       10 Gbps    1      10.3125 Gbps    64b/66b
XAUI (10GbE)    10 Gbps    4      3.125 Gbps      8b/10b
100GBASE-CR4    100 Gbps   4      25.78125 Gbps   64b/66b + RS-FEC
100GBASE-SR4    100 Gbps   4      25.78125 Gbps   64b/66b + RS-FEC

Modern FPGA implementations demonstrate how this works in practice:

// Xilinx UltraScale+ 100G Ethernet example (illustrative instantiation)
ethernet_100g #(
    .LANES(4),
    .LINE_RATE(25.78125),
    .ENCODING("64B66B")
) phy_inst (
    .clk_322mhz(mac_clk),
    .clk_156m(phy_clk),
    .serdes_clk(refclk_25g)  // identifiers cannot start with a digit
);

Modern high-speed Ethernet adapters use several clever techniques to avoid requiring impractical clock frequencies while maintaining line rate throughput. Let's examine how 10Gb and 100Gb interfaces achieve this.

Rather than processing a single bit stream at 10GHz, Ethernet cards use wide parallel buses internally:

// Typical 10G Ethernet MAC-to-PHY interface (XAUI)
4 lanes × 3.125 Gbps = 12.5 Gbps raw (10 Gbps payload after 8b/10b)
// 100G implementations use:
10 lanes × 10.3125 Gbps = 103.125 Gbps (CAUI-10)
or
4 lanes × 25.78125 Gbps = 103.125 Gbps (CAUI-4)

Line encoding reduces the actual clock requirements:

  • 64B/66B encoding (10GbE): adds a 2-bit sync header to each 64-bit payload, so the serial line rate is 10 × 66/64 = 10.3125 Gbps and the 66-bit block clock is 10.3125 GHz / 66 = 156.25 MHz
  • 256B/257B encoding (100GbE): Further reduces overhead

Here are typical clock frequencies used internally:

Standard       Line Rate      66-bit Block Clock  Lanes
10GBASE-R      10.3125 Gbps   156.25 MHz          1 (serial)
100GBASE-CR4   103.125 Gbps   1.5625 GHz          4 (CAUI-4)

Modern FPGAs implement this using SERDES blocks. Here's a simplified Verilog example:

// Simplified sketch: a real XAUI PCS uses 8b/10b per lane, and a real
// 64b/66b PCS needs a gearbox; this only illustrates the striping idea.
module xaui_mac (
  input wire clk_156mhz,
  input wire [63:0] tx_data,
  output wire [3:0] xaui_tx
);

  // 64b/66b framing: prepend the 2-bit sync header (2'b01 = data block)
  reg [65:0] encoded_data;
  always @(posedge clk_156mhz) begin
    encoded_data <= {2'b01, tx_data};
  end

  // Stripe the 64 payload bits across 4 serializer channels,
  // 16 bits each (the sync header would be handled by the gearbox,
  // which is omitted here)
  genvar i;
  generate
    for (i = 0; i < 4; i = i + 1) begin : serdes
      serializer ser (
        .clk(clk_156mhz),
        .parallel_in(encoded_data[i*16 +: 16]),
        .serial_out(xaui_tx[i])
      );
    end
  endgenerate
endmodule