Archive for the ‘Coding Style’ Category


Real World Examples #4 – More on “Thinking Hardware”

January 20, 2009

I was reviewing some code not long ago and noticed, together with the owner of the code, that we had some timing problems.
Part of the code looked something like this (Verilog):

wire [127:0] a;
wire [127:0] b;
wire [127:0] c;
assign c = select_register ? a : b;

For those not familiar with Verilog syntax, the code describes a MUX construct using the ternary operator. The two data inputs of the MUX are “a” and “b”, and the select is “select_register”.

So why did this code translate into a relatively slow design? The answer is in the width of the signals. The code actually synthesizes to 128 parallel MUX structures, so “select_register” drives 128 loads.
When a construct like this is hidden within a large body of code, our tendency is to dismiss it as “only” one 2:1 MUX deep, but we have to look more carefully than that – and always remember to consider the load.

Solving this problem is relatively easy by replication. Simply creating more copies of “select_register”, each driving only a slice of the MUX, helped significantly.
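A hedged sketch of what the fix might look like (the clock and the pre-registered select source, here called “clk” and “select_source”, are my own invented names, not from the original code): four registered copies of the select, each driving a 32-bit slice, cut the load per copy from 128 to 32.

```verilog
wire [127:0] a;
wire [127:0] b;
wire [127:0] c;

// Four copies of the select, all registered from the same source.
// Each copy now drives only 32 MUX loads instead of 128.
reg [3:0] select_register_rep;
always @(posedge clk)
  select_register_rep <= {4{select_source}};

assign c[ 31:  0] = select_register_rep[0] ? a[ 31:  0] : b[ 31:  0];
assign c[ 63: 32] = select_register_rep[1] ? a[ 63: 32] : b[ 63: 32];
assign c[ 95: 64] = select_register_rep[2] ? a[ 95: 64] : b[ 95: 64];
assign c[127: 96] = select_register_rep[3] ? a[127: 96] : b[127: 96];
```

Note that many synthesis tools will happily merge equivalent registers back into one, so a register-duplication attribute or a “don’t touch” on the copies may be needed to keep them apart.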


A Coding Tip for Multi Clock Domain Designs

December 13, 2008

Multi clock domain designs are always interesting, but almost always hide some synchronization problems that are far from trivial. There are tools on the market that claim to identify all(??) clock domain crossings within a design. I personally have no experience with them, so I can’t give an opinion (although I have heard some unflattering remarks from fellow engineers).

It seems each company has its own way of handling this problem. One of the oldest, easiest and IMHO most efficient ways is to keep strict naming guidelines for your signals, whether combinatorial or sequential!

The most common way is to add a prefix to each signal describing its driving clock, e.g. clk_800_mux_32to1_out or clk_666_redge_acknowledge.

If you haven’t used this simple technique, you won’t believe how useful it is. Many synchronization problems are actually discovered during the coding process itself. Moreover, it even makes life easier when doing code reviews.
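As a hypothetical illustration of the convention (all names here are invented), consider a request crossing from an 800 MHz domain into a 666 MHz domain through a two-flop synchronizer:

```verilog
reg clk_666_req_meta;  // first synchronizer stage - may go metastable
reg clk_666_req_sync;  // second stage - the only version clk_666 logic may use

always @(posedge clk_666) begin
  clk_666_req_meta <= clk_800_req;       // the crossing is visible at a glance
  clk_666_req_sync <= clk_666_req_meta;
end
```

A clk_800_* signal showing up on the right-hand side of a clk_666 always block immediately flags the crossing, both while coding and during review.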

If you have more tips on naming convention guidelines for signals in RTL – post them as a comment!


Another Reason to Add Hierarchies to Your Designs

November 30, 2008

We are usually very annoyed when the team leader insists on code restructuring and hierarchical design.
I also know this very well from the other side: trying to convince designers to restructure a design they already know so very well.

Well, here is another small, yet important reason why you might want to do this more often.
Assume your design is more or less mature: you have run some simulations, gone through a few synthesis runs, and you see that you don’t meet timing.
You analyze the timing report only to find a huge timing path, optimized by the tool and made of tons of NANDs, NORs, XORs and what not. You see the start point and the end point very clearly, but you find yourself asking: is this the path that goes through the MUX, or maybe through the adder?

Most logic designs are extremely complicated, and the circuit is not something you can just draw on paper immediately. Moreover, looking at a timing report of optimized logic, it is very hard to interpret the exact path taken through the higher level structures – or in other words, what part of the code am I really looking at here? Adding a hierarchy will add its name to the optimized structures in the timing report, and you can then easily pinpoint your problems.

I even saw an engineer who used this technique as a debugging tool. If he had a very deep logic cloud, he would intentionally build a hierarchy around, say, a simple 2:1 MUX in the design and look for it in the timing report. This enabled him to “feel” how the synthesis tool optimizes the path and where manual optimization needs to be implemented.
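A sketch of the idea (the module and instance names are my own): wrapping the MUX in its own module makes its instance name survive into the timing report.

```verilog
// A trivial 2:1 MUX isolated in its own hierarchy level,
// purely so that its name shows up on the cells of the timing path.
module dbg_mux2 (
  input  wire sel,
  input  wire a,
  input  wire b,
  output wire y
);
  assign y = sel ? a : b;
endmodule
```

After instantiating it as, say, u_dbg_mux2 in the parent block, cells along the path appear in the report as u_dbg_mux2/..., so you immediately know which branch of the logic you are looking at. Depending on the tool, a “keep hierarchy” attribute or constraint may be needed to stop the boundary from being dissolved during optimization.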

Use this on your bigger blocks; it saves a lot of time and effort in the long run.


Fun With Enable Flip-Flops

October 27, 2008

Almost every library includes enable flip-flops. Unfortunately, they are not always used to their full potential. We will explore some of that potential in this post.

An enable flop is nothing but a regular flop that only registers new data when the enable signal is high; otherwise it keeps its old value. We normally implement this using a MUX and a feedback from the flop’s output, as depicted below.

So what is the big deal about it? The nice thing is that the enable flop was already implemented by the guys who built the library in a very optimized way. Implementing this yourself with a MUX before the flop will usually eat away cycle time you could otherwise use for your logic. However, a short glance at your library will prove that this MUX comes almost for free when you use an enable flop (for my current library the cost is 20ps).

So how can we use this to our advantage?

Example #1 – Soft reset coding

In many applications a soft reset is a necessity. It is a signal, usually driven by a register, that will (soft) reset all flip-flops provided a clock is running. Many times an enable “if” is used in conjunction with it.
This is usually coded like this (I use Verilog pseudo syntax and ask the forgiveness of you VHDL people):
always @(posedge clk or negedge hard_rst)
  if (!hard_rst)
    ff <= 1'b0;
  else if (!soft_rst)
    ff <= 1'b0;
  else if (en)
    ff <= D;

The above code usually results in the construction given in the picture below. The red arrow represents the critical timing path through a MUX and the AND gate that was generated for the soft reset.

Now, if we could only exchange the order of the last two “if” clauses, the MUX would sit in front of the AND gate and we could use an enable flop. But if we simply do that, the code is no longer logically equivalent. Thinking about it a bit harder, we can use a trick: exchange the MUX and the AND gate, but during soft reset force the select pin of the MUX to “1”, thus transferring a “0” to the flop! Here’s the result in picture form.

We can now use an enable flop, and we basically got the MUX delay almost for free. This may look like a petty gain, but this trick can save you a few precious tens or hundreds of picoseconds.
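In code, the trick amounts to merging the last two “if” clauses (a sketch in the same pseudo syntax as above; for a multi-bit ff the AND would be with a replicated soft_rst):

```verilog
always @(posedge clk or negedge hard_rst)
  if (!hard_rst)
    ff <= 1'b0;
  else if (en | !soft_rst)  // force the enable during soft reset...
    ff <= D & soft_rst;     // ...while the AND forces a "0" onto the data pin
```

When soft_rst is low, the enable is forced high and a “0” is loaded; otherwise the behavior is the plain enable flop. The truth table is exactly that of the original code, but now the enable-flop’s built-in MUX absorbs the priority logic.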

Example #2 – Toggle Flip Flops

Toggle flops are really neat, and there are many cool ways to use them. The usual implementation requires an XOR gate combining the T input with a feedback of the flop’s own output.

Let’s have a closer look at the logical implementation of an XOR gate and how it relates to a MUX implementation: (a) is a MUX gate equivalent implementation, (b) is an XOR gate equivalent implementation, and (c) is an XOR implemented from a MUX.

Now, let’s try making a T flop using an enable flop. We already saw how to turn the MUX into an XOR gate – all that is left is to put everything together.
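Putting it together (a sketch; the clock and reset names are my own): driving the enable pin with T and the data pin with the inverted output makes the flop’s internal MUX compute T ? ~Q : Q, which is exactly Q ^ T.

```verilog
reg q;
always @(posedge clk or negedge rst_n)
  if (!rst_n)
    q <= 1'b0;
  else if (t)   // the T input maps onto the enable pin
    q <= ~q;    // the feedback inverter maps onto the data pin
```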


Why You Don’t See a Lot of Verilog or VHDL on This Site

August 31, 2008

I get a lot of emails from readers all over the world. Many want me to help them with their latest design or problem. This is OK; after all, this is what this site is all about – giving tips and tricks and helping other designers take their steps through the complex world of ASIC digital design.

Many ask me for solutions directly in Verilog or VHDL. Although these would be pretty simple to give, I try to make sure NOT to do so. The reason is my personal belief that thinking of a design directly in terms of Verilog or VHDL is a mistake and leads to poorer designs. I am aware that some readers may disagree, but I have repeatedly seen this kind of thinking lead to bigger, power hungry and faulty designs.

Moreover, I am of the opinion that for design work it is not necessary to know all the ins and outs of VHDL or Verilog (this is different if you do modeling, especially for a mixed signal project).
Eventually we all have to write code, but if you looked at my code you’d see it is extremely simple. For example, I rarely use any “for” statement and strictly avoid arrays.

Another important point on the subject is for those of you who interview people for a new position in your company. Please don’t ask your candidates to write something in VHDL as an interview question. I see and hear about this mistake over and over again. The candidate should know how to think in hardware terms; it is of far lesser importance whether he or she can generate some sophisticated code.
Candidates who know what they are aiming for in hardware terms will be much better designers than a Verilog or VHDL wiz who doesn’t know what his code will be synthesized into. This is, btw, a very typical problem for people who come from a CS background and go into design.


A Concise Guide to Why and How to Split your State Machines

September 8, 2007

So, why do we really care about state machine partitioning? Why can’t I have my big fatty FSM with 147 states if I want to?

Well, smaller state machines are:

  1. Easier to debug and probably less buggy
  2. More easily modified
  3. Cheaper to decode
  4. More suitable for low power applications
  5. Just nicer…

There is no rule of thumb stating the correct size of an FSM. Moreover, a lot of the time it just doesn’t make sense to split the FSM – so when can we do it? Or when should we do it? Part of the answer lies in a deeper analysis of the FSM itself, its transitions and, most importantly, the probability of occupying specific states.

Look at the diagram below. After some (hypothetical) analysis we recognize that in certain modes of operation, we spend either a lot of time among the states marked in red or among the states marked in blue. Transitions between the red and blue areas are possible but are less frequent.


The trick now is to look at the entire red zone as one state of a new “blue” FSM, and vice versa for a new “red” FSM. We basically split the original FSM into two completely separate FSMs and add to each of them a new state, which we will call a “wait state”. The diagram below depicts our new construction.


Notice how, for the “red” FSM, transitioning in and out of the new “wait state” is exactly equivalent (same conditions) to switching in and out of the red zone of the original FSM. The same goes for the blue FSM, but the conditions for going in and out of its “wait state” are naturally reversed.

OK, so far so good, but what is this good for? For starters, it will probably now be easier to choose state encodings for each separate FSM that reduce switching (check out this post on that subject). However, the sweetest thing is that while we are in the “red wait state” we can gate the clock for the rest of the red FSM and all its dependent logic! This is a significant bonus: such a strategy would have been possible before the split as well, but far more complicated to implement. The price we pay is the additional states, which will sometimes require more flip-flops to hold the current state.
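A sketch of the “red” half after the split (all state and signal names here are hypothetical, since the original diagram isn’t reproduced): the single RED_WAIT state stands in for the entire blue zone, and its entry/exit conditions are the original zone-crossing conditions.

```verilog
localparam RED_WAIT = 2'd0, RED_A = 2'd1, RED_B = 2'd2;

reg [1:0] red_state;

// While parked in RED_WAIT, clk_red can be gated off together with all the
// red-dependent logic; only the wake-up condition (enter_red) must be
// evaluated on a free-running clock.
always @(posedge clk_red or negedge rst_n)
  if (!rst_n)
    red_state <= RED_WAIT;
  else
    case (red_state)
      RED_WAIT: if (enter_red) red_state <= RED_A;    // old blue-to-red condition
      RED_A:    if (go_b)      red_state <= RED_B;
      RED_B:    if (leave_red) red_state <= RED_WAIT; // old red-to-blue condition
      default:                 red_state <= RED_WAIT;
    endcase
```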

As mentioned before, it is not wise to blindly partition your FSMs arbitrarily. It is important to look for patterns and recognize “regions of operation”. Then, try to find transitions in and out of these regions which are relatively simple (ideally one condition to go in and one to go out). This means that sometimes it pays to include one more state in a “region”, just to make transitioning in and out of the “region” simpler.

Use this technique. It will make your FSMs easy to debug, simple to code and hopefully will enable you to introduce low power concepts more easily in your design.


FSM State Encoding – More Switching Reduction Tips

September 4, 2007

I promised before to write some words on reducing switching activity by cleverly assigning the states of an FSM, so here goes…

Look at the example below. The FSM has five states, “A”-“E”. Most naturally, one would just sequentially enumerate them (or use some enumeration scheme given by VHDL or Verilog – which is easier for debugging purposes).
In the diagram the sequential enumeration is marked in red. Now, consider only the topology of the FSM – i.e. without any reference to the probability of state transitions. You will notice that the diagram states (pun intended) in red, near each arc, the number of bits switching for that specific transition. For example, going from state “E” (100) to state “B” (001) toggles two bits.


But could we choose a better enumeration scheme that reduces the amount of switching? It turns out we can (don’t tell anybody, but I forced this example to have a better enumeration 🙂 ). If you look at the green state enumeration you will clearly see that at most one bit toggles on every transition.

If you sum up all transitions (assuming equal probability) you will see that the green implementation toggles exactly half as often as the red one. An interesting point is that we only need to consider states “B”-“E”, because once state “A” is exited it can never be returned to (this is sometimes referred to as a “black hole” or “a pit”).

Choosing the state enumeration more cleverly doesn’t only mean that we reduce switching in the actual flip-flops that hold the state itself; we also reduce glitches/hazards in all the combinational logic that depends on the FSM! The latter point is extremely important, since those combinational clouds can be huge in comparison to the n flops that hold the state of the FSM.
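Since the original diagram isn’t reproduced here, the encodings below are only illustrative, but the coding pattern is the point: the enumeration is just a set of constants, so trying a different, adjacency-friendly assignment costs nothing in the RTL.

```verilog
// A naive sequential ("red") enumeration would be:
// localparam A = 3'b000, B = 3'b001, C = 3'b010, D = 3'b011, E = 3'b100;

// A switching-aware ("green") enumeration, chosen so that states connected
// by an arc in this hypothetical FSM differ in a single bit:
localparam A = 3'b000,
           B = 3'b001,
           C = 3'b011,
           D = 3'b111,
           E = 3'b101;
```

The only change between the two schemes is the constants; the state register and next-state logic are untouched, which is what makes trial-and-error experiments cheap.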

The procedure for choosing the right enumeration deserves more words, but that would make this post too lengthy. For the usually small FSMs that the average designer handles on a daily basis, the most efficient enumeration can easily be reached by trial and error. I am sure there is some clever algorithm somewhere that, given an FSM topology, can spit out the best enumeration. If you are aware of something like that, please send me an email.