Real World Examples #4 – More on “Thinking Hardware”

January 20, 2009

Not long ago I was reviewing some code together with its owner, and we noticed that it had timing problems.
Part of the code looked something like this (Verilog):

wire [127:0] a;
wire [127:0] b;
wire [127:0] c;
assign c = select_register ? a : b;

For those not familiar with Verilog syntax, the code describes a MUX construct using the ternary operator. The two data inputs for the MUX are “a” and “b” and the select is “select_register”.

So why did this code translate into a relatively slow design? The answer is in the width of the signals. The code actually synthesizes to 128 parallel 2:1 MUX structures, one per bit, so “select_register” has to drive 128 loads.
When a construct like this is hidden within a large body of code, our tendency is to dismiss it as “only” one 2:1 MUX deep. We have to look more carefully than that, and always remember to consider the load.

Solving this problem by replication is relatively easy. Just creating more copies of “select_register”, each one driving only part of the bus, helped significantly.
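As a rough sketch of what such replication might look like in RTL: the module name, the ports (`clk`, `select_next`), and the parameters `WIDTH` and `COPIES` are all hypothetical and not taken from the reviewed design. Each copy of the select flop drives only one slice of the bus, so with 4 copies the fanout per flop drops from 128 to 32.

```verilog
// Sketch only: module, port, and parameter names are hypothetical.
module wide_mux_replicated #(
    parameter WIDTH  = 128,
    parameter COPIES = 4                    // assumed to divide WIDTH evenly
) (
    input  wire             clk,
    input  wire             select_next,    // value to be registered as the select
    input  wire [WIDTH-1:0] a,
    input  wire [WIDTH-1:0] b,
    output wire [WIDTH-1:0] c
);
    localparam SLICE = WIDTH / COPIES;      // bits driven by each select copy

    // COPIES flops, all loaded with the same value every cycle.
    reg [COPIES-1:0] select_register;
    always @(posedge clk)
        select_register <= {COPIES{select_next}};

    // Each select copy controls the MUXes of one slice of the bus only.
    genvar i;
    generate
        for (i = 0; i < COPIES; i = i + 1) begin : g_slice
            assign c[i*SLICE +: SLICE] = select_register[i]
                                       ? a[i*SLICE +: SLICE]
                                       : b[i*SLICE +: SLICE];
        end
    endgenerate
endmodule
```

Note that a synthesis tool may recognize the copies as equivalent and merge them back into one flop. Preventing that usually takes a tool-specific attribute or constraint, which is deliberately left out of this sketch.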


  1. Could somebody explain to me why the synthesis tool is not taking care of this?
    Which tool was this, and what was the target (ASIC or FPGA)?
    Until now, I thought that (for FPGA) the synthesis tool took care of this issue.


  2. This was with Design Compiler.
    You have to distinguish between combinational replication and register replication.
    In my experience the latter is poorly supported (if at all) by the various tools.

    It also seems to me something that should be more or less automatic.

    Some say that by adding registers you change the state space of the design, adding more states than necessary. This might make life harder for formal verification tools later on in the flow. I find this hard to accept, since when replicating you add flops which hold IDENTICAL states, and this could be easily detected. I am sure the software/EDA guys can handle this issue as well.

  3. Automatic replication can be enabled in the synthesis tool. The problem is usually when you want to disable it explicitly (at a clock crossing, for example); then you have to do extra work. It’s really a matter of balance between the number of times you have to replicate manually and the number of times you need to explicitly disable it.

    Also, in my experience, the tools (especially FPGA ones) do a poor job of replication when a very wide data bus is involved. They don’t know how the data flows, so they replicate incorrectly, which causes problems later during P&R.

  4. I know very little about Verilog. What exactly do you mean by “LOAD” here for the SELECT signal? And I didn’t understand what you mean by replicating. Does it mean making SELECT [127:0] as well?

  5. I really don’t understand why the synthesis tool cannot handle this problem. If you specify the constraint “set_max_transition”, the tool is supposed to take care of this, right?

  6. This is not a fun problem. Creating more versions of the select_register would work, but it will also make your RTL hard or impossible to read. 128 in your example is not bad. I’ve seen bus widths in the thousands.

    I see some people asking how come the tool doesn’t handle this. The tool does handle it, by building a buffer tree. Due to the number of loads, the buffer tree can be huge, which makes it impossible to meet timing.

    I believe the only solution is to clone the register that generates select_register. Two options exist. One is to re-code the RTL as suggested, while the other is to let the tool do the cloning. The latter brings up other problems later, but tools today are capable of handling them.

  7. When I have dealt with such overloaded nets, we have always built the buffer tree at place & route, never at synthesis.

    Also, formal verification tools can easily handle replicated registers. You may have to enable that, since it might be disabled by default.

    • Advanced logic synthesis tools are capable of merging registers that hold the same state across the design. Thus, cloning the registers in the RTL code might not be a good idea when using such tools. On the other hand, place and route tools can also clone the registers.

      Cloning brings up formal verification challenges, since the netlist has new states compared to the original design (RTL). Some formal verification tools accept user-defined equivalent states to solve the cloning issues.
