Archive for October, 2008


Fun With Enable Flip-Flops

October 27, 2008

Almost each library has enable flip-flops included. Unfortunately, they are not always used to their full potential. We will explore some of their potential in this post.

An enable flop is nothing but a regular flop which only registers new data if the enable signal is high, otherwise it keeps the old value. We normally implement this using a MUX and a feedback from the flop’s output as depicted below.

So what is the big deal about it? The nice thing is that the enable flop is already implemented by the guys who built the library in a very optimized way. Usually implementing this with a MUX before the flop will eat away from the cycle time you could otherwise use for your logic. However, a short glance at your library will prove that this MUX comes almost for free when you use an enable flop (for my current library the cost is 20ps).

So how can we use this to our advantage?

Example #1 – Soft reset coding

In many applications soft reset is a necessity. It is a signal usually driven by a register that will (soft) reset all flip flops given that a clock is running. Many times an enable “if” is also used in conjunction.
This is usually coded in this way (I use Verilog pseudo syntax and ask the forgiveness of you VHDL people):
always @(posedge clk or negedge hard_rst)
if (!hard_rst)
ff <= 1'b0;
else if (!soft_rst)
ff <= 1'b0;
else if (en)
ff <= D;

The above code usually results in the construction given in the picture below. The red arrow represents the critical timing path through a MUX and the AND gate that was generated for the soft reset.

Now, if we could only exchange the order of the last two “if” commands this would put the MUX in front of the AND gate and then we could use an enable flop… well, if we do that, it will not be logically equivalent anymore. Thinking about it a bit harder, we could use a trick – let’s exchange the MUX and the AND gate but during soft reset we could force the select pin of the MUX to be “1”, and thus transferring a “0” to the flop! Here’s the result in a picture form.

We can now use an enable flop and we basically got the MUX delay almost for free. This may look a bit petty to you, but this trick can save you a few extra precious tens or hundreds of pico-seconds.

Example #2 – Toggle Flip Flops

Toggle flops are really neat, and there are many cool ways to use them. The normal implementation requires an XOR gate combining the T input and a feedback of the flop itself.

Let’s have a closer look at the logical implementation of an XOR gate and how it is related to a MUX implementation: (a) is a MUX gate equivalent implementation (b) is an XOR gate equivalent implementation and (c) is an XOR implemented from a MUX.

Now, let’s try making a T flop using an enable flop. We saw already how to change the MUX into an XOR gate – all that is left, is to put everything together.


Challenge #2 – One Hot Detection

October 20, 2008

The last challenge was a big success with many people sending their solutions via email or just posting them as a comments.
Many of you said they were waiting for the next challenge. So, before returning to the usual set of posts about different aspects of digital design, let’s look at another one.

Imagine you have a vector of 8 bits, The vector is supposed to be one hot coded (only a single logic “1” is allowed in the set). Your task if you choose to accept it :-), is to design a combo block to detect if the vector is indeed one-hot encoded.

We are again looking for the block with the shortest delay. As for the solution metrics for this challenge please refer to the previous challenge.

Also try to think how your design scales when the input vector is 16 bits wide, 32 bits wide and the general case of n bits wide.

Good luck!


Challenge #1 – DBI Detect

October 8, 2008

It has been a while since we had a challenge question on the site (last one was the divide by 3 question), and I would like to have more of those in the future. I will basically pose a problem and ask you to solve it under certain conditions – e.g. least hardware or latency, lowest power etc.

This time the challenge is related to a real problem I encountered recently. I reached a certain solution, which I do not claim to be optimal, actually I have the feeling it can be done better – I am therefore very interested in your own way of solving the problem.

Your challenge is to design a combo-block with 8 inputs and 1 output. You receive an 8-bit vector, If the vector contains 4 ‘1’s or more, the output should be high, otherwise low (This kind of calculation is commonly used for data bus inversion detection).

What is the best way to design it with respect to minimizing latency (in term of delay units), meaning the lowest logic depth possible.

Just so we could compare solutions, let’s agree on some metrics. I am aware that your own library might have different delay ratios between the different elements, but we gotta have something to work with.

  • Inverter – 1 delay unit
  • NOR, NAND – 2 delay units
  • AND, OR – 3 delay units
  • 3 or 4 input NOR, NAND – 4 delay units (2 for first stage + 2 for second stage)
  • 3 or 4 input OR, AND – 6 delay units (2 for first stage + 2 for second stage)
  • XOR, MUX – 7 delay units (2 AND/OR + 1 Inverter)
  • Please either post a comment with a detailed solution, or send me an email.

    Take it from here guys…


    Who Said Clock Skew is Only Bad?

    October 2, 2008

    We always have this fear of adding clock skew. Well, seems like this is one of the holy cows of digital design, but sometimes clock skew can be advantageous.

    Take a look at the example below. The capturing flop would normally violate setup requirements due to the deep logic cloud. By intentionally adding delay we could help make the clock arrive later and thus meet the setup condition. Nothing comes for free though, if we have another register just after the capturing one, the timing budget there will be cut.

    This technique can also be implemented on the block level as well. Assume we have two blocks A and B. B’s signals, which are headed towards A, are generated by a deep logic cloud. On the other hand A’s signals, which arrive at B, are generated by a rather small logic cloud. Skewing the clock in the direction of A now, will give more timing budget for the B to A signals but will eat away the budget from A to B’s signals.

    Inserting skew is very much disliked by physical implementation guys although a lot of the modern tools know how to handle it very nicely and even account for the clock re-convergence pessimism (more on this in another post). I have the feeling this dislike is more of a relic of the past, but as we push designs to be more complex, faster, less power hungry etc. we have to consider such techniques.