## Puzzle #5 – Binary-Gray

June 12, 2007

Assume you have an n-bit binary counter made of n identical cascaded cells, each holding the corresponding bit value. Each binary cell dissipates P units of power only when it toggles.
You also have an n-bit Gray counter made of n cascaded cells, each of which dissipates 3P units of power when it toggles.

You now let the counters run through an entire cycle (2^n different values) until they return to their starting position. Which counter burns more power?

## Low Power – Clock Gating Is Not The End Of It…

June 12, 2007

A good friend of mine, who works for one of the micro-electronics giants, told me that low power is the buzzword today. They care less about speed/frequency and more about minimizing power consumption.

He exposed me to a technique in logic design I was not familiar with. It is basically described in this paper. Let me just give you the basic idea.

The main observation is that even when not active, logic gates have different leakage current values depending on their inputs. The example given in the article shows that a NAND gate can have its leakage current reduced by almost a factor of 2.5 depending on the inputs!
How is this applied in reality? Assume that a certain part of the design is clock gated. This means all its flip-flops are inactive, and in turn so are the logic clouds between them. By “muxing” a different, logic-dependent value onto the output of each flop, we can minimize the leakage through the logic clouds. When waking up, we return to the old stored value.

The article, which is not a recent work by the way, describes a neat and cheap way of implementing a storage element with a “sleep mode” output of either logic “1” or logic “0”. Notice that the “non-sleep mode” or normal operation value is still kept in the storage element. The cool thing is that this need not really be a true MUX at the output of the flop: after finalizing the design, an off-line application analyzes the logic clouds between the storage elements and determines which value needs to be forced at the output of each flop during sleep mode. Then, the proper flavor of the storage element is instantiated in place (either a “sleep mode” logic “0” or a “sleep mode” logic “1”).
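To get a feel for what such an off-line analysis might do, here is a minimal brute-force sketch in Python. It is not the method from the paper. The leakage table, the toy netlist and all names (`NAND_LEAKAGE`, `CLOUD`, `best_sleep_vector`) are assumptions made up for illustration only.

```python
from itertools import product

# HYPOTHETICAL leakage of a 2-input NAND per input state, in arbitrary units.
# These numbers are invented for illustration; real values come from
# characterizing the cell in a given process.
NAND_LEAKAGE = {(0, 0): 1.0, (0, 1): 1.8, (1, 0): 1.5, (1, 1): 2.5}

# A toy logic cloud made of NAND gates only, in topological order.
# Each entry is (output_net, input_net_a, input_net_b).
# Nets f0..f2 are the flop outputs we are allowed to force in sleep mode.
FLOPS = ["f0", "f1", "f2"]
CLOUD = [
    ("n0", "f0", "f1"),
    ("n1", "f1", "f2"),
    ("n2", "n0", "n1"),
]

def nand(a, b):
    return 1 - (a & b)

def cloud_leakage(flop_values):
    """Sum the leakage of every gate for one set of forced flop outputs."""
    nets = dict(zip(FLOPS, flop_values))
    total = 0.0
    for out, a, b in CLOUD:
        va, vb = nets[a], nets[b]
        total += NAND_LEAKAGE[(va, vb)]
        nets[out] = nand(va, vb)  # propagate so later gates see this value
    return total

def best_sleep_vector():
    """Try all 2^n flop output combinations and keep the least leaky one."""
    return min(product((0, 1), repeat=len(FLOPS)), key=cloud_leakage)
```

Each bit of the returned tuple would select the “sleep mode” logic “0” or logic “1” flavor for the corresponding flop. The search visits 2^n combinations for n flops, so it only works for tiny clouds like this one.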

It turns out that the main problem is the analysis of the logic clouds and that the complexity of this problem is rather high. There is also some routing overhead for the “sleep mode” lines and of course a minor area overhead.
I am interested to know how those trade-offs are handled. As usual, emails and comments are welcome.

Bottom line – this is a way cool technique!!!

## Puzzle #4 – The min-max question

June 8, 2007

Here is a question you are bound to stumble upon in one of your logic design job interviews. Why? I don’t know. I personally think it is pretty obvious, but what do I know…

MinMax2 is a component with 2 inputs – A and B, and 2 outputs – Max and Min. You guessed it, you connect the 2 n-bit numbers at the inputs and the component drives the Max output with the bigger of the two and the Min output with the smaller of the two.

Your job is to design a component, MinMax4, with 4 inputs and 4 outputs, which sorts the 4 numbers using only MinMax2 components. Try to use as few MinMax2 components as possible.

If you made it this far, try making a MinMax6 component from MinMax2 and MinMax4 components.

For bonus points – how many different input sequences are needed to verify the logical behavior of MinMax4?
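If you want to try out candidate designs, a small behavioral model can help. The sketch below is one possible Python model of MinMax2 plus a brute-force checker. The function names and the index-pair way of describing a network are my own assumptions, and the example network is a deliberately naive bubble-sort arrangement, not the answer to the puzzle.

```python
from itertools import permutations

def minmax2(a, b):
    """Behavioral model of MinMax2: returns (Max, Min)."""
    return (a, b) if a >= b else (b, a)

def apply_network(network, values):
    """Pass the values through a list of MinMax2 cells.

    Each cell is a pair (i, j): after it, position i holds the larger
    of the two values and position j holds the smaller.
    """
    v = list(values)
    for i, j in network:
        v[i], v[j] = minmax2(v[i], v[j])
    return v

def sorts_everything(network, width):
    """Check the network on every ordering of `width` distinct numbers."""
    return all(
        apply_network(network, p) == sorted(p, reverse=True)
        for p in permutations(range(width))
    )

# Naive bubble-sort style arrangement with 6 cells (NOT the puzzle answer).
NAIVE_MINMAX4 = [(0, 1), (1, 2), (2, 3), (0, 1), (1, 2), (0, 1)]
```

Calling `sorts_everything(your_network, 4)` on your own list of cells tells you whether it sorts. The checker simply tries every ordering, so it makes no attempt to answer the bonus question.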

## Puzzle #3

June 4, 2007

OK, you seem to like them so here is another puzzle/interview question.

In the diagram below both X and Y are n-bit wide registers. On each clock cycle you can select a basic bit-wise operation between X and Y and load the result into either X or Y, while the other register keeps its value.
The problem is to exchange the contents of X and Y. Describe the values of the “select logic op” and “load XnotY” signals for each clock cycle.
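For experimenting with candidate sequences, here is a rough Python model of the two registers. The set of operations in `OPS`, the `run` function and the default width are assumptions of mine. The original diagram defines what “select logic op” can really choose from.

```python
# Bit-wise operations that "select logic op" might offer.
# This set is an ASSUMPTION; extend or trim it to match the real design.
OPS = {
    "AND": lambda x, y: x & y,
    "OR": lambda x, y: x | y,
    "XOR": lambda x, y: x ^ y,
}

def run(x, y, program, n=8):
    """Apply a list of (op_name, load_x_not_y) steps, one per clock cycle."""
    mask = (1 << n) - 1  # keep results n bits wide
    for op_name, load_x_not_y in program:
        result = OPS[op_name](x, y) & mask
        if load_x_not_y:
            x = result  # X loads the result, Y keeps its value
        else:
            y = result  # Y loads the result, X keeps its value
    return x, y
```

For example, `run(0b1100, 0b1010, [("AND", True)])` returns `(8, 10)`: X takes the AND of both registers and Y is unchanged.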

## Big Chips – Some Low Power Considerations

June 2, 2007

As designers, especially ones who only code in HDL, we don’t normally take into account the physical size of the chip we are working on. There are many effects which surface only past the synthesis stage and when approaching the layout.

As usual, let’s look at an example. Consider the situation described on the diagram below.

Imagine that blocks A and B are located physically far from one another and cannot be placed any closer. If the speeds we are dealing with are relatively high, it may very well be that the flight time of the signals from one side of the chip to the other becomes critical, and even a flop-to-flop connection without any logic in between will violate setup requirements!
Now imagine, as depicted, that many signals are sent across the chip. If you need to pipeline, you will need to pipeline a lot of parallel lines, which may result in a lot of extra flip-flops. Moreover, your layout tool will have to insert a lot of buffers to keep the signal edges sharp. From an architectural point of view, decoding globally may sound attractive at first, since you only need to do it once, but it can lead to a very power-hungry architecture.

The alternative is to send as few long lines as possible across the chip, as depicted below.

With this architecture, block B decodes the logic locally. If the lines sent to block B also need to be spread all over the chip, we definitely pay by duplicating the logic for each target block.

There are no strict criteria for deciding between the former and the latter architecture, as there is no definite crossover point. I believe this is more of a feeling and experience thing. It is just important to have this in mind when working on large designs.

## Synchronization of Buses

June 1, 2007

I know, I know, it is common knowledge that we never synchronize a bus. The reason being the uncertainty of when and how the meta-stability is resolved. You can read more about it in one of my previous posts.

A cool exception, where bus synchronization is safe, is when you guarantee that:

1. On the sender side, one bit only changes at a time – Gray code like behavior
2. On the receiver (synchronized bus) side, the sampling clock is fast enough that at most a single bus change can occur between consecutive samples

Just remember that both conditions must be fulfilled.
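The first condition is easy to check in software. Here is a small Python sketch (the helper names are mine) that converts a binary count to the standard reflected Gray code and measures the largest number of bits that change between consecutive values, including the wrap-around.

```python
def binary_to_gray(value):
    """Standard reflected binary-to-Gray conversion."""
    return value ^ (value >> 1)

def bits_changed(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")

def max_bits_changed_per_step(encode, n):
    """Worst-case number of toggling bits over a full n-bit count cycle."""
    size = 1 << n
    return max(
        bits_changed(encode(i), encode((i + 1) % size))
        for i in range(size)
    )
```

With `binary_to_gray` as the encoder the worst case for a 4-bit counter is 1, while a plain binary count (`lambda v: v`) reaches 4 at the wrap-around. This says nothing about the second condition, which depends on the clocks.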

It is important to note that this can still be dangerous when the sender and receiver have the same frequency but the phase is drifting! Why???

Are there any other esoteric cases where one could synchronize a bus? Comments are welcome!