Archive for the ‘Low Power’ Category

Reordering Nets for Low Power

May 10, 2009

As posts accumulate, you can see that low power design is a big topic on this site. I try to bring the more subtle examples of low power design that you can control and implement yourself (i.e. in RTL and at the micro-architectural stage).

Identifying “glitchy” nets is not always easy. Some good candidates are wide parity or CRC calculations (deep and wide XOR trees), complicated arithmetic paths and basically most logic that originates in very wide buses and converges to a single output controlling a specific path (e.g. as a select pin of a MUX for a wide data path).

If you happen to identify a good candidate, it is advisable (when possible) that you feed the “glitchy” nets as late as possible in the calculation path. This way the total amount of toggling in the logic is reduced.
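
As a toy illustration of why the entry point matters, consider an n-input AND implemented as a chain of 2-input gates: every gate downstream of the point where the glitchy input enters can re-evaluate on each glitch. The sketch below is a hypothetical model, not a real netlist; it just counts those downstream gates for a given entry position:

```python
def downstream_gates(num_inputs, glitchy_pos):
    """Gates of a 2-input AND chain exposed to the glitchy input's activity.

    A chain over n inputs uses n-1 gates. The input at position p
    (0-based) first enters gate 0 (if p <= 1) or gate p-1 otherwise;
    every gate from there to the output can toggle on each glitch.
    """
    first_gate = 0 if glitchy_pos <= 1 else glitchy_pos - 1
    return (num_inputs - 1) - first_gate

# Feeding the glitchy net first exposes the whole chain to its glitches;
# feeding it last exposes only the final gate.
print(downstream_gates(8, 0))  # 7 gates may re-toggle
print(downstream_gates(8, 7))  # only 1 gate may re-toggle
```

The same count drives the figures below: the later the glitchy net enters, the fewer gate outputs carry its activity.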

Sounds easy enough? Well the crux of the problem is identifying those opportunities – and it is far from easy. I hope this post at least makes you more aware of that possibility.

To sum up, here are two figures that illustrate the issue visually. The figure below depicts the situation before the transformation. The highlighted nets represent high activity nets.
[Figure: glitchy_net1]

After the transformation (pushing the glitchy net calculation late in the path), the circuit is logically equivalent (otherwise there is no point…) and we see fewer high activity nets.

[Figure: glitchy_net2]

Reducing Power Through Retiming

February 23, 2009

Here is an interesting and almost trivial technique for (potential) power reduction, which I have never used myself, nor seen used in others’ designs. Well… maybe I am doing the wrong designs… but I thought it was well worth mentioning. So, if any of my readers use this, please do post a short comment on how exactly you implemented it and whether it really resulted in significant savings.

We usually have many high activity nets in a design. In many cases they toggle more than once per cycle while a calculation settles. Even worse, they often drive long, high capacitance nets. Since in a usual synchronous design (which 99% of us do) we only need the stable result once per cycle – when the calculation is done – we can simply put a register in front of the high capacitance net. The register effectively blocks all toggling on that net (where it hurts) and allows it to change at most once per cycle.

The image below tells the whole story: (a) is before the insertion of the flop, (b) is after.

[Figure: low_power_retiming]

This is all nice, but remember that in real life it can be quite hard to identify those special nets and their high toggling logic clouds. Moreover, most of the time we cannot afford the extra flop for latency reasons. But if you happen to be in the early design phase and know your floor plan more or less, think about moving some of those flops so they reduce the toggling on the high capacitance nets.
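
A quick way to convince yourself of the saving is to count transitions on the driven net with and without the blocking register. The waveforms below are made up for illustration; each inner list holds the values the combinational output runs through in one cycle, ending with the settled result:

```python
def count_toggles(stream):
    # number of value changes along a sequence of samples
    return sum(1 for a, b in zip(stream, stream[1:]) if a != b)

def net_activity(cycle_waveforms, with_flop):
    if with_flop:
        # the flop passes only the settled end-of-cycle value
        stream = [cycle[-1] for cycle in cycle_waveforms]
    else:
        # every intermediate transition propagates onto the long net
        stream = [v for cycle in cycle_waveforms for v in cycle]
    return count_toggles(stream)

# three cycles of a glitchy calculation (made-up values)
cycles = [[0, 1, 0, 1], [1, 0, 1], [1, 0, 0]]
print(net_activity(cycles, with_flop=False))  # 6 toggles on the long net
print(net_activity(cycles, with_flop=True))   # 1 toggle; at most one per cycle
```

The register does not reduce the toggling inside the logic cloud, only on the heavily loaded net behind it – which is exactly where each transition costs the most energy.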

Arithmetic Tips and Tricks #2 – Another Look at a Slow Adder

August 18, 2008

Do you remember the old serial adder circuit below? A stream of bits comes in (LSB first) on the FA inputs, and the present carry-out bit is registered and fed back in the next cycle as the carry-in. The sum comes out serially (LSB first).

True, it is rather slow – it takes n cycles to add n bits. But hold on, check out the logic depth – one full adder only! This means the clock can run a lot faster than with your typical n-bit adder.
Moreover, it is by far the smallest, cheapest and lowest power adder known to mankind.

Of course, you gotta have this high speed clock available in the system already, and you still gotta know when to stop adding and sample your result.
Taking all this into consideration, I am sure this old nugget can still be useful somewhere. If you have already used it, or have an idea, place a comment.
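
For reference, here is a quick behavioral model of the circuit in Python (a sketch, not RTL): one full adder evaluated per cycle, with the carry held in a one-bit register.

```python
def serial_add(a, b, n_bits):
    """Bit-serial addition: one full adder plus one carry flop, n cycles."""
    carry = 0
    result = 0
    for i in range(n_bits):                 # one clock cycle per bit, LSB first
        abit = (a >> i) & 1
        bbit = (b >> i) & 1
        s = abit ^ bbit ^ carry             # full-adder sum output
        carry = (abit & bbit) | (carry & (abit ^ bbit))  # registered carry-out
        result |= s << i                    # sum streams out LSB first
    return result

print(serial_add(13, 9, 8))   # 22
```

Note that `n_bits` is the "know when to stop" part: run one cycle too few and the top sum bit (or the final carry) is simply lost.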

Low Power Methodology Manual

May 24, 2008

I recently got an email from Synopsys telling me I could download a personalized copy of the “Low Power Methodology Manual”. Now, to tell you the truth, I sometimes get overly suspicious of those emails (not necessarily from Synopsys) and I didn’t really expect to find real value in the manual – boy, was I wrong, big time!

Here you get a very nice book (as a PDF file) with extremely practical advice. It does not just spend ink on vague concepts – you get excellent explanations with examples. And mind you, it is all free.

Just rush to their site and download this excellent reference book, which should be on the desk of every digital designer.

The Synopsys link here.

The hard copy version can be bought here or here.

The Principle Behind Multi-Vdd Designs

April 2, 2008

Multi-Vdd design is a bit of a buzzword lately. Many issues must still be resolved before it becomes a truly accepted and supported design methodology, but I wanted to write a few words on the principle behind the multi-Vdd approach.

The basic idea is that by lowering the operating voltage of a logic gate we also cut its dynamic power dissipation, which scales with the square of the supply voltage.
The price we pay is that gates operating at a lower voltage are somewhat slower (the exact slowdown depends on many parameters).
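
To put a rough number on it, the classic dynamic power expression P = a·C·Vdd²·f makes the saving quadratic in the supply. The sketch below uses arbitrary example numbers; it shows that dropping Vdd from 1.0 V to 0.8 V cuts the dynamic power of those gates to 64%:

```python
def dynamic_power(activity, cap_farads, vdd_volts, freq_hz):
    # Classic CMOS dynamic (switching) power: P = a * C * Vdd^2 * f
    return activity * cap_farads * vdd_volts ** 2 * freq_hz

# made-up example: activity factor 0.2, 1 pF switched cap, 500 MHz clock
nominal = dynamic_power(0.2, 1e-12, 1.0, 500e6)
lowered = dynamic_power(0.2, 1e-12, 0.8, 500e6)
print(lowered / nominal)  # ~0.64, i.e. a 36% dynamic power saving
```

The quadratic dependence is why even a modest voltage drop on non-critical gates is worth the bookkeeping.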

The idea, then, is to identify the non-critical paths and power the gates on those paths with a lower voltage. Seen below are two paths; there is obviously less logic along the blue path than along the orange one, so the blue path is a candidate for the lower Vdd.

[Figure: multivdd.png]

The idea looks elegant, but as always the devil is in the details: there are routing overheads for the different power grids, level shifters must be introduced wherever two paths at different Vdd converge into a new logical function, a power source for the new Vdd must be designed and, most important of all, the CAD tools have to support all of this – without that support this technique will be buried.

A Concise Guide to Why and How to Split your State Machines

September 8, 2007

So, why do we really care about state machine partitioning? Why can’t I have my big fatty FSM with 147 states if I want to?

Well, smaller state machines are:

  1. Easier to debug and probably less buggy
  2. More easily modified
  3. Cheaper to decode
  4. More suitable for low power applications
  5. Just nicer…

There is no rule of thumb for the correct size of an FSM. Moreover, a lot of the time it just doesn’t make sense to split the FSM. So when can we do it? When should we do it? Part of the answer lies in a deeper analysis of the FSM itself: its transitions and, most important, the probability of occupying specific states.

Look at the diagram below. After some (hypothetical) analysis we recognize that in certain modes of operation, we spend either a lot of time among the states marked in red or among the states marked in blue. Transitions between the red and blue areas are possible but are less frequent.

[Figure: fsm_partition1.png]

The trick now is to treat the entire red zone as one state of a new “blue” FSM, and vice versa for a new “red” FSM. We basically split the original FSM into two completely separate FSMs and add to each of them a new state, which we will call a “wait state”. The diagram below depicts our new construction.

[Figure: fsm_partition2.png]

Notice how, for the “red” FSM, transitioning in and out of the new “wait state” is exactly equivalent (same conditions) to switching in and out of the red zone of the original FSM. The same goes for the blue FSM, but the conditions for going in and out of its “wait state” are naturally reversed.

OK, so far so good, but what is this good for? For starters, it is now easier to choose state encodings for each separate FSM that reduce switching (check out this post on that subject). The sweetest thing, however, is that while we are in the “red wait state” we can gate the clock of the rest of the red FSM and all its dependent logic! This is a significant bonus: such a strategy was possible before the split, but it would have been far more complicated to implement. The price we pay is the additional states, which will sometimes require more flip-flops to hold the current state.
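
Here is a little behavioral sketch of the construction (hypothetical states and conditions, not the exact ones from the diagram). The key property is that the condition that moves the red FSM out of its wait state simultaneously moves the blue FSM into its wait state, so exactly one half is “asleep”, and clock-gateable, at any time:

```python
class HalfFsm:
    """One half of a split FSM; unknown (state, cond) pairs hold state."""
    def __init__(self, transitions, start):
        self.transitions, self.state = transitions, start
    def step(self, cond):
        self.state = self.transitions.get((self.state, cond), self.state)

# Hypothetical example: 'go_red'/'go_blue' are the zone-crossing conditions.
red  = HalfFsm({("WAIT", "go_red"): "R1", ("R1", "go_blue"): "WAIT"}, "WAIT")
blue = HalfFsm({("B1", "go_red"): "WAIT", ("WAIT", "go_blue"): "B1"}, "B1")

for cond in ["go_red", "go_blue", "go_red"]:
    red.step(cond)
    blue.step(cond)
    # invariant: exactly one half is in its wait state, so the other half
    # (and all its dependent logic) could have its clock gated this cycle
    assert (red.state == "WAIT") != (blue.state == "WAIT")
```

In RTL, the `state == WAIT` comparison would feed a clock-gating cell; only the small wake-up decode stays on the free-running clock.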

As mentioned before, it is not wise to blindly partition your FSMs arbitrarily. It is important to look for patterns and recognize “regions of operation”. Then, try to find transitions in and out of these regions which are relatively simple (ideally one condition to go in and one to go out). This means that sometimes it pays to include one more state in a “region”, just to make transitioning in and out of it simpler.

Use this technique. It will make your FSMs easier to debug and simpler to code, and it will hopefully enable you to introduce low power concepts into your design more easily.

FSM State Encoding – More Switching Reduction Tips

September 4, 2007

I promised before to write some words on reducing switching activity by cleverly assigning the states of an FSM, so here goes…

Look at the example below. The FSM has five states, “A” through “E”. Most naturally, one would just enumerate them sequentially (or use an enumeration scheme given by VHDL or Verilog, which is easier for debugging purposes). In the diagram this sequential enumeration is marked in red. Now, consider only the topology of the FSM, i.e. without any reference to the probability of state transitions. You will notice that the diagram states (pun intended) in red, near each arc, the number of bits switching for that specific transition. For example, going from state “E” (100) to state “B” (001) toggles two bits.

[Figure: red_switching_fsm.png]

But could we choose a better enumeration scheme that reduces the amount of switching? It turns out we can (don’t tell anybody, but I forced this example to have a better enumeration 🙂 ). If you look at the green state enumeration you will clearly see that at most one bit toggles on every transition.

If you sum over all transitions (assuming equal probabilities) you will see that the green encoding toggles exactly half as much as the red one. An interesting point is that we only need to consider states “B” through “E”, because once state “A” is exited it can never be returned to (such a state is sometimes referred to as a “black hole” or “a pit”).

Choosing the state enumeration cleverly doesn’t only reduce switching in the actual flip-flops that hold the state; it also reduces glitches/hazards in all the combinational logic that depends on the FSM! The latter point is extremely important, since those combinational clouds can be huge compared to the n flops holding the FSM state.

The procedure for choosing the right enumeration deserves more words, but that would make this post too lengthy. For the usually small FSMs that the average designer handles on a daily basis, the most efficient enumeration can easily be reached by trial and error. I am sure there is some clever algorithm out there that, given an FSM topology, can spit out the best enumeration. If you are aware of something like that, please send me an email.
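
As for that algorithm: for the small FSMs mentioned above, plain brute force over all code assignments already does the job. The sketch below uses a made-up transition list (not the diagram’s); it scores an encoding by the total Hamming distance over its transitions, assuming equal transition probabilities, and picks the best assignment:

```python
from itertools import permutations

def switching_cost(encoding, edges):
    # total bits toggled over all transitions, equal probabilities assumed
    return sum(bin(encoding[a] ^ encoding[b]).count("1") for a, b in edges)

def best_encoding(states, edges, n_bits):
    codes = range(2 ** n_bits)
    best = min(permutations(codes, len(states)),
               key=lambda assignment:
                   switching_cost(dict(zip(states, assignment)), edges))
    return dict(zip(states, best))

# made-up topology: a ring over the four recurring states
edges = [("B", "C"), ("C", "D"), ("D", "E"), ("E", "B")]
enc = best_encoding(["B", "C", "D", "E"], edges, 2)
print(switching_cost(enc, edges))  # 4, i.e. one bit per transition (Gray-style)
```

Brute force is factorial in the number of states, so it only suits the small FSMs this post is about; anything bigger calls for a heuristic search.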