## Low Power – Clock Gating Is Not The End Of It…

June 12, 2007

A good friend of mine, who works for one of the micro-electronics giants, told me how low power is the buzz word today. They care less about speed/frequency and more about minimizing power consumption.

He exposed me to a technique in logic design I was not familiar with. It is basically described in this paper. Let me just give you the basic idea.

The main observation is that even when not active, logic gates have different leakage current values depending on their inputs. The example given in the article shows that a NAND gate can have its leakage current reduced by almost a factor of 2.5 depending on the inputs!
How is this applied in reality? Assume that a certain part of the design is clock gated, this means all flip-flops are inactive and in turn the logic clouds between them. By “muxing” a different value at the output of the flop, which is logic dependent, we could minimize the leakage through the logic clouds. When waking up, we return to the old stored value.

The article, which is not a recent work by the way, describes a neat and cheap way of implementing a storage element with a “sleep mode” output of either logic “1” or logic “0”. Notice that the “non-sleep mode” or normal operation value is still kept in the storage element. The cool thing is, that this need not really be a true MUX in the output of the flop – after finalizing the design an off-line application analyzes the logic clouds between the storage elements and determines what values are needed to be forced during sleep mode at the output of each flop. Then, the proper flavor of the storage element is instantiated in place (either a “sleep mode” logic “0” or a “sleep mode” logic “1”).

It turns out that the main problem is the analysis of the logic clouds and that the complexity of this problem is rather high. There is also some routing overhead for the “sleep mode” lines and of course a minor area overhead.
I am interested to know how those trade-offs are handled. As usual, emails and comments are welcome.

Bottom line – this is a way cool technique!!!

## Big Chips – Some Low Power Considerations

June 2, 2007

As designers, especially ones who only code in HDL, we don’t normally take into account the physical size of the chip we are working on. There are many effects which surface only past the synthesis stage and when approaching the layout.

As usual, let’s look at an example. Consider the situation described on the diagram below.

Imagine that block A and B are located physically far from one another, and could not be placed closer to one another. If the speeds we are dealing with are relatively high, it may very well be that the flight time of the signals from one side of the chip to another, already becomes too critical and even a flop to flop connection without any logic in between will violate setup requirements!
Now, imagine as depicted that many signals are sent across the chip. If you need to pipeline, you would need to pipeline a lot of parallel lines. This may result in a lot of extra flip-flops. Moreover, your layout tool will have to put in a lot of buffers to keep sharp edged signals. From architectural point of view, decoding globally may sound attractive at first, since you only need to do it once but can lead to a very power hungry architecture.

The alternative is to send as less long lines as possible across the chip, As depicted below.

With this architecture block B decodes the logic locally. If the lines sent to block B, need also to be spread all over the chip, we definitely pay in duplicating the logic for each target block.

There is no strict criteria to decide when to take the former or the latter architectures, as there is no definite crossover point. I believe this is more of a feeling and experience thing. It is just important to have this in mind when working on large designs.

## Do You Think Low Power???

May 20, 2007

There is almost no design today, where low power is not a concern. Reducing power is an issue which can be tackled on many levels, from the system design to the most fundamental implementation techniques.

In digital design, clock gating is the back bone of low power design. It is true that there are many other ways the designer can influence the power consumption, but IMHO clock gating is the easiest and simplest to introduce without a huge overhead or compromise.

Here is a simple example on how to easily implement low power features.

The picture on the right shows a very simple synchronous FIFO. That FIFO is a very common design structure which is easily implementable using a shift register. The data is being pushed to the right with each clock and the tap select decides which register to pick. The problem with this construction is that with each clock all the flip-flops potentially toggle, and a clock is driven to all. This hurts especially in data or packet processing applications where the size of this FIFO can be in the range of thousands of flip-flops!!

The correct approach is instead of moving the entire data around with each clock, to “move” the clock itself. Well not really move, but to keep only one specific cell (or row in the case of vectors) active while all the other flip-flops are gated. This is done by using a simple counter (or a state machine for specific applications) that rotates a “one hot” signal – thus enabling only one cell at a time. Notice that the data_in signal is connected to all the cells in parallel,. When new data arrives only the cell which receives a clock edge in that moment will have a new value stored.