I was reviewing some code not so long ago, and noticed together with the owner of the code, that we had some timing problems.
Part of the code looked something like that (Verilog):
wire [127:0] a;
wire [127:0] b;
wire [127:0] c;
assign c = select_register ? a : b;
For those not familiar with Verilog syntax, the code describes a MUX construct using the ternary operator. The two data inputs for the MUX are “a” and “b” and the select is “select_register”.
So why was this code translated into a relatively slow design? The answer is in the width of the signals. The code actually synthesizes to 128 parallel MUX structures. The “select_register” has actually 128 loads.
When a construct like this is hidden within a large code, our tendency is to just neglect it by saying it is “only” 2:1 MUX deep, but we have to look more carefully than that – and always remember to consider the load.
Solving this problem is relatively easy by replication. Just creating more versions of the “select_register” helped significantly.