Archive for June, 2007

h1

Resource Sharing vs. Performance

June 27, 2007

I wanted to spend a few words on the issue of resource sharing vs. performance. I believe it is trivial for most engineers but a few extra words won’t do any harm I guess.
The issue is relevant most evidently when there is a need to perform a “heavy” or “expensive” calculation on several inputs in a repeated way.

The approaches usually in consideration are: building a balanced tree structure, sequencing the operations, or a combination of the two.

A tree structure architecture is depicted below. The logic cloud represents the “heavy” calculation. One can see immediately that the operation on a,b and c,d is done in parallel and thus saves latency on the expense of instantiating the logic cloud twice.

tree_structure.png

The other common solution, depicted below, is to use the logic cloud only once but introducing a state machine which controls a MUX, that determines which values will be calculated on the next cycle. The overhead of designing this FSM is minimal (and even less). The main saving is in using the logic cloud only once. Notice that we pay here in throughput and latency! With some more thinking, one could also save a calculation cycle by introducing another MUX in the feedback path, and using one of the inputs just for the first calculation, thereafter using always the feedback path.

resource_sharing.png

h1

Puzzle #4 – Solution

June 24, 2007

Here are the block diagrams for the solution of the MinMax problem.

minmax4.pngminmax6.png

h1

Puzzle #6 – The Spy – (A real tough one…)

June 22, 2007

This one I heard a while back and saw that a version of it also appears in Peter Winkler’s excellent book Mathematical Puzzles – A Connoisseur’s Collection. Here is the version that appears in the book:

A spy in an enemy country wants to transmit information back to his home country.
The spy wants to utilize the enemy country’s daily morning radio transmission of 15-bits (which is also received in his home country). The spy is able to infiltrate the radio station 5 minutes before transmission time, analyze the transmission that is about to go on air, and can either leave as it is, or flip a single bit somewhere in the transmission (a flip of more than one bit would make the original transmission too corrupt).

how much information can the spy transmit to his operators?

remember:

  • The transmission is most likely a different set of 15-bits each day but can also repeat the last day’s transmission. Best, assume it is random
  • The spy is allowed to change a maximum of 1 bit in any position
  • The spy has agreed on an algorithm/strategy with his operators before he was sent to the enemy country
  • No other information or communication is available. the communication is strictly one way
  • The spy sees for the first time the intended daily transmission 5 minutes before it goes on the air, he does not hold a list of all future transmissions
  • The information on the other end should be extracted in a deterministic way

I believe this one is too tough for an interview question – it took me well over an hour to come up with a solution (well, that actually doesn’t say much…). Anyways, this is definitely one of my favorite puzzles.

h1

Non-Readable Papers

June 19, 2007

I actually enjoy surfing the web and reading technical papers which are somewhat related to my work. A lot of the good stuff appears in books, but if you want to find the coolest techniques and breakthrough ideas, they naturally first appear in technical papers.

I have to admit I don’t like the format used by the standard technical papers, some of them seem to be made non-readable on purpose. Here is a real paper that can compete for the dubious title of being the most non-readable paper around.

Here is one of my papers. Before you continue, stop and try digesting what was written…

If you made through the first page, consider yourself a hero. That “technical paper” was generated automatically using SCIgen.

I bet a lot of people would be impressed if you present a list of papers generated by this service. A sort of a high-tech “emperor’s new cloths” syndrome – no one wants to admit he doesn’t understand a technical paper describing some “major” work in his own field…

h1

Puzzle #3 – Solution

June 19, 2007

This post is written only for completeness reasons. The answer to puzzle #3 was almost immediately given in the comments. I will just repeat it here.
The important observations are that XOR (X,X) = 0 and that XOR(X,0) = X The solution is therefore:

Operation       Result
---------------------------------
X = XOR(,)     X^Y,Y
Y = XOR(,)     X^Y,X^Y^Y = X
X = XOR(,)     X^X^Y = Y,X  done! 
h1

Low Power Techniques – Reducing Switching

June 15, 2007

In one of the previous posts we discussed a cool technique to reduce leakage current. This time we will look at dynamic power consumption due to switching and some common techniques to reduce it.

Usually, with just a little bit of thinking, reduction of switching activity is quite possible. Let’s look at some examples.

  • Bus inversion
  • Bus inversion is an old technique which is used a lot in communication protocols between chip-sets (memories, processors, etc.), but not very often between modules within a chip. The basic idea is to add another line to the bus, which signals whether to invert the entire bus (or not). When more than half of the lines needs to be switched the bus inversion line is asserted. Here is a small example of a hypothetical transaction and the comparison of amount of transitions between the two schemes.

    bu_inversion.png

    If you studied the above example a bit, you could immediately see that I manipulated the values in such a way that a significant difference in the total amount of transitions is evident.

  • Binary Number Representation
  • The two most common binary number representation in applications are 2’s complement and signed magnitude, with the former one usually preferred. However, for some very specific applications signed digit shows advantages in switching. Imagine you have a sort of integrator, which does nothing more than summing up values each clock cycle. Imagine also that the steady state value is around 0, but fluctuations above and below are common. If you would use 2’s complement going from 0 to -1 will result in switching of the entire bit range (-1 in 2’s complement is represented by 111….). If you would use signed digit, only 2 bits will switch when going from 0 to -1.

    reduce_switching_coding.png

  • Disabling/Enabling Logic Clouds
  • When handling a heavy logic cloud (with wide adders, multipliers, etc.) it is wise to enable this logic only when needed.
    Take a look at the diagrams below. On the left implementation, only the flop at the end of the path – flop “B” has an enable signal, since flop “A” could not be gated (its outputs are used someplace else!) the entire logic cloud is toggling and wasting power. On the right (no pun intended) implementation, the enable signal was moved before the logic cloud and just for good measures, the clock for flop “B” was gated.

    reducing_switching_enable1.pngreducing_switching_enable2.png

  • High Activity Nets
  • This trick is usually completely ignored by designers. This is a shame since only power saving tools which can drive input vectors on your design and run an analysis of the active nets, might be able to resolve this.
    The idea here is to identify the nets which have high activity among other very quiet nets, and to try to push them as deep as possible in the logic cloud.

    reducing_switching_high_activity_net1.pngreducing_switching_high_activity_net2.png

    On the left, we see a logic cloud which is a function of X1..Xn,Y. X1..Xn change with very low frequency, while Y is a high activity net. On the implementation on the right, the logic cloud was duplicated, once assuming Y=0 and once for Y=1, and then selecting between the 2 options depending on the value of Y. Often, the two new logic clouds will be reduced in size since Y has a fixed value there.

    h1

    A Short Note on Automatic Clock Gates Insertion

    June 13, 2007

    As we discussed before, clock gating is one of the most solid logic design techniques, which one can use when aiming for low power design.
    It is only natural that most tools on the market support an automatic clock gating insertion option. Here is a quote from a synopsys article describing their power compiler tool

    …Module clock gating can be used at the architectural level to disable the clock to parts of the design that are not in use. Synopsys’ Power Compiler™ helps replace the clock gating logic inserted manually, gating the clock to any module using an Integrated Clock Gating (ICG) cell from the library. The tool automatically identifies such combinational logic…

    But what does it really mean? What is this combinational logic that the tool “recognizes”?

    The answer is relatively simple. Imagine a flip-flop with an enable signal. Implementation wise, this is done with a normal flip-flop and a MUX before with a feedback path to preserve the logical value of the flop when the enable is low. This is equivalent to a flop with the MUX removed and the enable signal controlling the enable of a clock gate cell, which in turn drives the clock for the flip-flop.

    The picture below is better than any verbal explanation.

    auto_clock_gating.png