Archive for the ‘Architecture’ Category

h1

Reordering Nets for Low Power

May 10, 2009

As posts accumulate, you can see that low power design aspects is a big topic on this site. I try to bring more subtle design examples for lower power design that you can control and implement (i.e. in RTL and the micro architectural stage).

Identifying “glitchy” nets is not always easy. Some good candidates are wide parity or CRC calculations (deep and wide XOR trees), complicated arithmetic paths and basically most logic that originates in very wide buses and converges to a single output controlling a specific path (e.g. as a select pin of a MUX for a wide data path).

If you happen to identify a good candidate, it is advisable (when possible) that you feed the “glitchy” nets as late as possible in the calculation path. This way the total amount of toggling in the logic is reduced.

Sounds easy enough? Well the crux of the problem is identifying those opportunities – and it is far from easy. I hope this post at least makes you more aware of that possibility.

To sum up, here are two figures that illustrate the issue visually. The figure below depicts the situation before the transformation.. The nets which are highlighted represent high activity nets.
glitchy_net1

After the transformation – pushing the glitchy net calculation late in the path. The transformation is logically equivalent (otherwise there is no point…) and we see less high activity nets.

glitchy_net2

h1

Who Said Clock Skew is Only Bad?

October 2, 2008

We always have this fear of adding clock skew. Well, seems like this is one of the holy cows of digital design, but sometimes clock skew can be advantageous.

Take a look at the example below. The capturing flop would normally violate setup requirements due to the deep logic cloud. By intentionally adding delay we could help make the clock arrive later and thus meet the setup condition. Nothing comes for free though, if we have another register just after the capturing one, the timing budget there will be cut.

This technique can also be implemented on the block level as well. Assume we have two blocks A and B. B’s signals, which are headed towards A, are generated by a deep logic cloud. On the other hand A’s signals, which arrive at B, are generated by a rather small logic cloud. Skewing the clock in the direction of A now, will give more timing budget for the B to A signals but will eat away the budget from A to B’s signals.

Inserting skew is very much disliked by physical implementation guys although a lot of the modern tools know how to handle it very nicely and even account for the clock re-convergence pessimism (more on this in another post). I have the feeling this dislike is more of a relic of the past, but as we push designs to be more complex, faster, less power hungry etc. we have to consider such techniques.

h1

Arithmetic Tips and Tricks #2 – Another Look at a Slow Adder

August 18, 2008

Do you remember the old serial adder circuit below? A stream of bits comes in (LSB first) on the FA inputs, the present carry-out bit is registered and fed in the next cycle as a carry in. The sum comes in serially on the output (LSB first).

True, it is rather slow – it takes n cycles to add n bits. But hold on, check out the logic depth – one full adder only!! This means the clock can run a lot faster than your typical n-bit adder.
Moreover, it is by far the smallest, cheapest and consumes the least power of all adders known to mankind.

Of course you gotta have this high speed clock available in the system already, and you still gotta know when to stop adding and to sample your result.
Taking all this into consideration, I am sure this old nugget can still be useful somewhere. If you already used it before, or have an idea, place a comment.

h1

Predictive Synchronizers

July 7, 2008

As we discussed many times before, synchronization of signals involves also latency issues. Sometimes these latency issues are quite a mess. This post will go over the principle of operation of predictive synchronizers, which offer a specific solution for a very specific case.

Let’s start by describing the conditions for this specific case. For the sake of explanation let us assume we have two clock domains with different clock periods. On top we have a certain limited or capped jitter component defined by our spec.
Taking the conservative approach, we would always use a full two flip flop synchronizer. However, a closer look at a typical waveform reveals something interesting.

The figure above shows both clocks. The limited jitter as defined by our spec, is shown in gray. Notice how only during specific periods a full synchronizer needs to be used. For the upper clock each 5th cycle is a “dangerous” one, while for the lower clock each 4th is problematic. The time window in which these danger zones occur is predictable.

In general we could count the clock cycles, and then, when the next clock edge occurs in the “danger zone” we could switch and use a full synchronizer circuit, otherwise a single flop is enough.

A circuit which implements this idea can be seen below. The potentially metastable node is blocked by the FSM during the “danger time” and the synchronizer output is taken, otherwise the normal, first flop’s output, is taken. The logic at the output is basically that of a MUX.

h1

Non-Power-of-2 Gray Counter Design

June 9, 2008

So… you want to design a counter with a cycle which is different than a power of 2. You would like to use a Gray counter because of its advantages and just because it is simply beautiful, but alas, your cycle length is not a power of two – what to do?
This post will try to give you a sort of recipe of how to design such a non-power-of-2 Gray counter and the reasoning behind.

First, if your cycle length is an odd number, you are in trouble since this is just not possible to construct a counter with the Gray properties and with an odd cycle length. A simple way to see why it is so, is to notice that a Gray counter changes its parity with each count because only one bit changes at a time.
This naturally means that the parity toggles, but since we have an odd number of states and if we started with even parity – the last state will also have odd parity, and when we wrap around the parity won’t change! Assuming that the first and last states are different, this means that 2 bits must change at a time, thus contradicting the Gray hypothesis.

OK, so we limited ourselves to an even amount of states, is it possible now? It is! We could ask our friend Google and come up with some info and even some patents, but the best discussion on the subject that I found was written by Clive Maxfield here.

When approaching this problem, what (hopefully) should immediately struck us, is that we have to somehow use the reflection property of the Gray code (This method among others is discussed by Clive as well). Let’s take a deeper look at the 4-bit Gray code below.

The pairs of states which have identical distance from the axis of reflection are only different by their MSB. This in turn, means that we could eliminate pairs-at-a-time around the axis of reflection, and arrive to our desired number of states for the counter. Moreover, we notice that the (n-1) LSBs count up to a certain value then change direction and count down again. This property remains true even if we remove any amount of pairs around the axis of reflection.

What we have to do now, is to find this “switching value”, when we reach it on the up-count, toggle the direction bit – which is also our MSB, and block the (n-1) LSBs Gray counter for this direction switch cycle (otherwise 2 bits would change). We now count down to the initial state (all zeros). When we reach it, we again have to switch direction and block the counter and so on ad infinitum.

We can use the modular up/down Gray counter I described here, here and here. for our (n-1) LSBs. We have to find a priori the “switching value”, which is the (n-1) bit Gray value of our number of counter states divided by 2. For Example, if you want a 10 state Gray counter then: 10/2 = 5, therefore we need the 5th Gray value of a normal 3 bit Gray code, which turns out to be 110
The rest of the circuit is depicted in the figure below:

It is important to see that we use the minimal possible memory elements required for the Gray counter (i.e. no extra states to remember or pipeline) and that during “direction switching” we gate the clock for the (n-1) LSBs up/down Gray counter using an ordinary clock gate construct.
If we look carefully we see that the “direction switching” logic is basically a mux structure with the select being the direction bit.

A timing diagram of the above circuit for a 10 state Gray counter is also depicted below for clarity.

h1

Replace Your Ripple Counters

May 23, 2008

I was recently talking to some friends, and they mentioned some problems they encountered after tape out. Turns out that, the suspicious part of the design was done full custom and the designers thought it would be best to save some power and area and use asynchronous ripple counters like the one pictured below. The problem was that those counters were later fed into a semi-custom block – the rest is history.

Asynchronous ripple counters are nice and great but you really have to be careful with them. They are asynchronous because not all bits change at the same time. For the MSB to change the signal has to ripple through all the bits to its right, changing them first. The nice thing about them is that they are cheap in area and power. This is why they are so attractive in fast designs, but this is also why they are very dangerous because the ripple time through the counter can approach the order of magnitude of the clock period. This means that a digital circuit that depends on the asynchronous ripple counter as an input might violate the setup-hold window of the capturing flop behind it. To sum up, just because it is coming from a flop doesn’t mean it has to be synchronous.

If you can, even if you are a full custom designer, I strongly recommend replacing your ripple counters with the following almost identical circuit.

It is based on T-flops (toggle flip flops are just normal flops with an XOR of the current state and the input, which is also called the toggle signal) and from the principle of operation is almost the same, although here instead of generating the clock edge for the next stage, we generate a toggle signal when all previous (LSB) are “1”. Notice that the counter is synchronous since the clock signal (marked red) arrives simultaneously to all flops.

h1

Latch Based Design Robustness

May 19, 2008

Latch based design is usually not given enough attention by digital designers. There are many reasons for that, some are very well based, other reasons are just because latch based design is just unknown and looked at as a strange beast.

I intend to have a serious of posts concerning latch based design issues. I have to admit that most of my design experience was gained doing flip flop based designs, but the latch based designs I did were always interesting and challenging. If any of you readers have some interesting latch based design examples please send them over to my email and I will include them in later posts.

Just to start the ball rolling, here is a very interesting paper on the robustness of latch based design with comparison to flip flop based design. If you don’t have the time to go through the entire paper just look at figure 1 and its description.

I also added this paper to the list of recommended reading list.