Archive for August, 2007


Low Power Buses – More Tricks for Switching Reduction

August 31, 2007

Viewing the search-engine keywords people use to get to this blog, I can see that low power tips and tricks are among the topics readers find most interesting. Before I start this post, it is important to mention that although there is almost always something that can be done, the price can be high. The price is not always paid in area or complexity; sometimes it is just your own precious time. You can spend tons of time thinking about a very clever architecture or encoding for a bus, and miss your deadlines altogether.
OK, enough of my rambling about “nonsense”, let’s get into some more switching reduction tricks.

Switching reduction means less dynamic power consumption; it has little to do with reducing static power or leakage current. Still, when thinking up architectures or designing a block in HDL (Verilog or VHDL), dynamic power is the main thing we can tackle. There is much less we can do about static power through HDL tricks alone; that can be left to the system architect, the standard cell library developers or the FPGA vendor.

  • Bus States Re-encoding
  • Buses usually transfer information across a chip, so in many cases they are wide and long. Reducing switching on a wide or long bus is therefore of high importance. Assume you already have a design at a late stage which is already pretty well debugged. Try running some real-life cases and extract the most common transitions that occur on the bus. If we have a 32-bit bus that switches a lot between 00…00 and 11…11, we know that is bad. It is a good idea to re-encode the state 11…11 as, for example, 00…01, and then decode it back on the other side. We would save the switching of 31 bits in this case. This is naturally a very simple case, but analyze your system; these things happen in real life and are relatively easy to solve – even at a late stage of a design! If you have been reading this blog for some time now, you probably know that I prefer visualization. The diagram below summarizes the entire paragraph.
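    The saving can be sanity-checked with a quick behavioural model. Here is a minimal Python sketch (my own naming, not from the post): count bit toggles on the raw trace versus the re-encoded one, for a hypothetical 32-bit bus that ping-pongs between all-zeros and all-ones.

```python
def toggles(a, b):
    """Number of bus bits that switch when the value goes from a to b."""
    return bin(a ^ b).count("1")

ALL_ONES = 0xFFFFFFFF

# Re-encoder: swap the two "hot" values so the common transition costs 1 bit.
# The swap is its own inverse, so the same function decodes on the far side.
def recode(v):
    return {ALL_ONES: 0x00000001, 0x00000001: ALL_ONES}.get(v, v)

trace = [0x00000000, ALL_ONES, 0x00000000, ALL_ONES]   # the bad case above
raw = sum(toggles(a, b) for a, b in zip(trace, trace[1:]))
saved = [recode(v) for v in trace]
recoded = sum(toggles(a, b) for a, b in zip(saved, saved[1:]))

assert all(recode(recode(v)) == v for v in trace)  # mapping is lossless
print(raw, recoded)  # 96 toggles on the raw bus vs 3 after re-encoding
```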


  • Exploiting Special Cases – Identifying Patterns
  • Imagine this: you have a system which uses a memory. During many operation stages you have to dump some contents into or out of the memory element. This is done by addressing the memory address by address in a sequential manner. We probably can’t do much about the data, since it is random by nature, but what about the address bus? We see a pattern that repeats over and over again: an address is followed by the next one. We could add another line which tells the other side to increment the previous address it was given. This way we save all the switching on the bus when sweeping through the entire address range.
    The diagram below gives a qualitative view of how an approach like this would work. If you are really a perfectionist, you could gate the clock to the bus sampling flops which preserve the previous state, because their value only matters when doing the increments. You would just have to be careful with some corner cases.
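    As a behavioural sketch of the increment-line idea (Python rather than HDL, and all names are mine): the sender asserts an extra `inc` line for sequential addresses and leaves the bus frozen, and the receiver reconstructs the address stream.

```python
def sender(addresses):
    """Yield (inc, bus) pairs; the bus keeps its old value during increments."""
    prev, bus = None, 0
    for a in addresses:
        if prev is not None and a == prev + 1:
            yield (1, bus)        # sequential: assert inc, the bus does not switch
        else:
            bus = a
            yield (0, bus)        # non-sequential: drive the full address
        prev = a

def receiver(stream):
    addr = None
    for inc, bus in stream:
        addr = addr + 1 if inc else bus
        yield addr

# Sweep through a range, then jump: only the jump costs bus switching.
seq = list(range(16)) + [100, 101, 102]
assert list(receiver(sender(seq))) == seq   # lossless reconstruction
```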


    Generally speaking, it is always a good idea to recognize patterns and symmetry and exploit them when transmitting information on a bus. Sometimes it is a special numbering system being used, a specific sequence which appears often on the bus, or a million other things. The point is to identify the trade-off between investing a lot of time in investigation and keeping the design simple.

    In one of the future posts we will investigate how we can use the same switching reduction techniques for FSM state assignments, so stay tuned.


    Puzzle #6 – The Spy- Solution

    August 28, 2007

    This puzzle created some interest, but apart from one incomplete solution which demonstrated only the principle, I didn’t receive any other feedback. Here is my own solution, which is different from the one given in the Winkler book. Naturally, I believe my solution is easier to understand, but please get the Winkler book – it really is that good, and you can decide for yourself.

    Now for the reason you are reading this post… the solution… oh, if you don’t remember the puzzle, please take a few moments to re-read it and understand what it is all about.

    I will (try to) prove that any one of 16 different symbols can be transmitted by flipping a single bit of the 15 which are transmitted daily.
    First, for convenience, we define the 15-bit transmission as a vector indexed 14 down to 0.
    We will now define four parity functions P0, P1, P2, P3, as follows:


    Why these specific functions were chosen will become clear in a moment.
    Let’s view them in a more graphical way by marking, above each bit in the vector, the symbols of those “P-functions” (P0..P3) whose calculation that bit affects. For example, bit 11 is included in the formulas for P0 and P3, therefore we mark the column above it with P0 and P3.
    So far so good, but a closer look (diagram below) at the distribution of the “P-functions” reveals why and how they were constructed.


    The “P-functions” were constructed in such a way that there is one bit which affects only the calculation of P0, one bit which affects only P0 and P1, one which affects only P0 and P2, and so on… Observe the columns above each of the bits in the vector – they span all the possible combinations!

    From here the end is very close.
    The operators on the receiving side calculate the P0..P3 functions and assemble them into a 4-bit “word”.
    All the spy has to do is calculate the actual “P-functions” given by today’s random transmission and get a 4-bit “word”. The spy compares this to the 4-bit “word” she wants to transmit and finds the difference – in other words, which “P-functions” need to be toggled in order to get from the actual 4-bit “word” to the desired one. She then looks up in the diagram above the single bit which affects exactly those “P-functions”, and flips it. That single bit flip toggles exactly the corresponding “P-functions”.
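    The post’s exact P-function formulas live in an image, so here is one equivalent Hamming-style construction, sketched in Python: number the 15 bits by positions 1..15 and let bit k of the position decide membership in Pk, so every position corresponds to a unique non-empty combination of P-functions.

```python
def parity_word(v):
    """4-bit word (P3..P0) decoded from a 15-bit transmission v.

    Bit i of v sits at position i+1 (positions 1..15). Flipping the bit at
    position p toggles exactly the P-functions whose index bits are set in p.
    """
    word = 0
    for pos in range(1, 16):
        if (v >> (pos - 1)) & 1:
            word ^= pos
    return word

def spy_flip(v, message):
    """Flip at most one bit of v so that the receivers decode message (0..15)."""
    diff = parity_word(v) ^ message   # which P-functions must toggle
    return v if diff == 0 else v ^ (1 << (diff - 1))
```

    Whatever the random transmission is, the spy always finds a single bit (or none) whose flip moves the decoded word to the desired one.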

    Since the above wording may sound a bit vague, here is a table with some examples:


    I have to say this again: this is really one of the most beautiful and elegant puzzles I have come across. It is definitely going into “the notebook”…


    The Coolest Binary Adder You Have Ever Seen…

    August 26, 2007

    I have to admit, I never thought I would ever link from this blog to YouTube, but given the nature of the following contraption I believe you will agree it was a must…

    This is by far the coolest binary adder you have ever seen – link here.
    It has almost everything inside, a reset “pin”, carry out “pin” etc.
    If you are into wood working you could visit the builder’s site and see exactly how this can be done – visit him here.

    I also saw a mechanical binary adder in the Deutsches Museum, but it was based on water! I might try to get a video of that one running in the future, since the museum is only 400 meters from my house. If you ever visit Munich and don’t go there – shame on you!!!


    The Johnson Counter

    August 20, 2007

    Johnson counters, or Möbius counters (sometimes referred to by that name because of the abstract similarity to the famous Möbius strip), are extremely useful.
    The Johnson counter is made of a simple shift register with an inverted feedback – as can be seen below.


    A Johnson counter has 2N states (where N is the number of flip-flops), compared to 2^N states for a normal binary counter.
    Since only a single bit changes at each step, the Johnson counter states form a sort of Gray code. The next picture shows the 12 states of a 6-bit Johnson counter as an example.
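    Both properties are easy to confirm with a quick Python model (mine, not from the post): a shift register whose serial input is the inverted MSB.

```python
def johnson_states(n):
    """States of an n-flop Johnson counter: shift left, feed back inverted MSB."""
    state, states = 0, []
    for _ in range(2 * n):
        states.append(state)
        msb = (state >> (n - 1)) & 1
        state = ((state << 1) | (1 - msb)) & ((1 << n) - 1)
    return states

s = johnson_states(6)
assert len(set(s)) == 12                       # 2N distinct states, not 2^N
# Gray property: exactly one bit changes per step, including the wrap-around.
assert all(bin(a ^ b).count("1") == 1 for a, b in zip(s, s[1:] + s[:1]))
```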


    Johnson counters are extremely useful in modeling, since by using any of the taps one can generate a clock-like pattern with many different phases. You can easily see this by looking at the columns in the picture above: they all have 6 consecutive “1”s followed by 6 consecutive “0”s, each in a different phase.

    Decoding the state of the counter is extremely easy. A single 2-input gate which detects the border between the “1”s and the “0”s is enough. One need not compare the entire length of the counter to some value.

    One can also generate an odd-length “sort-of” Johnson counter. The easiest way is to use a NOR feedback from the last two stages of the shift register, as shown below.


    The last picture shows the 11 states of the modified 6 flip-flop Johnson counter. Looking at the state sequence, it is immediately noticeable that the 11…11 state is skipped. We also lose the Gray property of the counter this way, since in a single case both the last and first bits change simultaneously. But looking at the columns, which represent the different taps, we see that each column keeps the same behavior (with respect to the signal shape), only the duty cycle is no longer 50% – which is obvious, because we no longer have an even number of states.
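    The odd-length variant can be modeled the same way, with the serial input taken as the NOR of the last two stages (again a Python sketch of my own):

```python
def odd_johnson_states(n):
    """States of the odd-length variant: serial input is NOR of last two stages."""
    state, states = 0, []
    while state not in states:
        states.append(state)
        top_two = (state >> (n - 2)) & 0b11
        fb = 1 if top_two == 0b00 else 0       # NOR feedback from the two MSBs
        state = ((state << 1) | fb) & ((1 << n) - 1)
    return states

s = odd_johnson_states(6)
assert len(s) == 11                  # 2N - 1 states for N = 6
assert (1 << 6) - 1 not in s         # the 11...11 state is skipped
```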


    This post is becoming a bit too long to present all the cool things that can be done with Johnson counters, but a very important issue is robustness. In a future post we will see that serious designers do not use just a simple inverter in the feedback path, but also include some sort of self-correction mechanism. This is necessary because if a forbidden state creeps in (wrong reset behavior, crosstalk, etc.) it will stay in the counter forever – much like the problem we saw in a previous post on the ring counter. There are ways to get around this problem, and I will try to analyze them in a future post. Stay tuned…


    Puzzle #8 – Clock Frequency Driver

    August 13, 2007

    Take the clock frequency circuit I posted about here. As I mentioned, the XOR gate at the output might cause some duty cycle distortion with some libraries, due to the fact that most XOR gates are not built to be symmetrical with respect to transition delay.
    Now, assume your library has a perfectly symmetrical NAND gate. Could you modify the circuit so that the XOR is replaced by a NAND gate and still get a clock-frequency signal at the output? (You are of course allowed to add more logic in other parts of the circuit.)

    If not, give a short explanation why not. If yes, send a circuit description/diagram.


    Driving A Clock Frequency Signal From A Register

    August 9, 2007

    In semi-custom flows it is usually a big no-no to use the clock in the data path. Sometimes it is necessary, though, to drive a signal with the frequency of the clock, to be used in some part of the design or driven onto a pad. Normally, logical changes occur only on the rising edge of the clock, and thus at half the frequency of the clock at most.

    Here is a cool little circuit that drives a signal which toggles at the clock frequency but is still driven from a register. It is very robust: upon wake-up from a reset state it will drive a clock-like signal with the opposite phase of the clock but the same frequency. To get the same phase as the clock itself, replace the XOR at the output with an XNOR.
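    The diagram itself is an image, so the sketch below is an assumed structure inferred from the text and from the noted similarity to the double edge flip-flop: one toggling register clocked on the rising edge, another on the falling edge, XORed at the output. A small Python simulation shows the output then changes on every clock edge:

```python
def simulate(half_periods):
    """Half-period-by-half-period simulation of the assumed circuit."""
    q_pos = q_neg = 0            # both registers wake up at 0 after reset
    clk, out = 0, []
    for _ in range(half_periods):
        clk ^= 1                 # next clock edge
        if clk:
            q_pos ^= 1           # register clocked on the rising edge toggles
        else:
            q_neg ^= 1           # register clocked on the falling edge toggles
        out.append(q_pos ^ q_neg)    # XOR at the output
    return out

out = simulate(16)
# The XOR output changes on every clock edge, i.e. it runs at the clock frequency.
assert all(a != b for a, b in zip(out, out[1:]))
```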



    If this circuit is to be used as a clock for another block, consider that the XOR gate at the output might introduce duty cycle distortion (DCD), since many standard cell library XOR gates do not behave symmetrically when transitioning from logical 0 to logical 1 and vice versa.

    As an afterthought, it might be interesting to look at the similarities between this circuit and the “Double Edge Flip-Flop” I described before.


    Everything You Wanted to Know About Specman Verification and Never Dared to Ask

    August 7, 2007

    My friend Avidan Efody has a site full of advice, tips and tricks concerning verification with Specman. No, this is not a “plug your buddy’s blog” section, but if verification is what you do and you have never been there before – shame on you – you should visit it ASAP.

    You can find it here.


    Puzzle #7 – Transitions – Solution

    August 3, 2007

    This one was solved pretty quickly. Basically, I was trying to trick you. The idea was to create the impression that an infinite amount of memory is necessary to hold all the 0–>1 and 1–>0 transitions. In practice there cannot be two consecutive 0–>1 transitions (or vice versa), since after the input goes from 0 to 1 it must first fall back to 0 – a 1–>0 transition – before the next 0–>1 transition can occur!
    The FSM therefore needs only three states: “exactly one more 0–>1”, “equal number of 0–>1 and 1–>0”, and “exactly one more 1–>0”.
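    A Python sketch of that three-state FSM (state names are mine), exploiting the fact that 0–>1 and 1–>0 transitions must alternate:

```python
EQUAL, ONE_MORE_01, ONE_MORE_10 = "equal", "one more 0->1", "one more 1->0"

def track(bits):
    """Return which transition count is ahead after seeing the bit stream."""
    state, prev = EQUAL, None
    for b in bits:
        if prev == 0 and b == 1:          # a 0->1 transition
            state = ONE_MORE_01 if state == EQUAL else EQUAL
        elif prev == 1 and b == 0:        # a 1->0 transition
            state = ONE_MORE_10 if state == EQUAL else EQUAL
        prev = b
    return state

assert track([0, 1, 0, 1, 0, 1]) == ONE_MORE_01   # one extra 0->1
assert track([0, 1, 1, 0]) == EQUAL               # counts balance out
assert track([1, 0]) == ONE_MORE_10               # one extra 1->0
```

    Because the two transition kinds alternate, the imbalance can never exceed one in either direction, which is exactly why three states suffice.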


    Arithmetic Tips & Tricks #1

    August 1, 2007

    Every single one of us has, at some time or another, had to design a block utilizing some arithmetic operations. Usually we use the necessary operator and forget about it, but since we are “hardware men” (to be said with pride and a full chest) we know there is much more going on under the hood. I intend to have a series of posts dealing specifically with arithmetic implementation tips and tricks. There are plenty of them; I don’t know all of them, probably not even half. So if you have some interesting ones, please send them to me and I will post them with credits.

    Let’s start. This post will explain two of the most obvious and simple ones.

  • Multiplying by a constant
  • Multipliers are extremely area-hungry and thus, when possible, should be eliminated. One of the classic examples is multiplying by a constant.
    Assume you need to multiply the result of register A by a factor, say 5. Instead of instantiating a multiplier, you can “shift and add”. 5 in binary is 101, so just add A to A00 (the two trailing zeros have the effect of multiplying by 4) and you have the equivalent of multiplying by 5, since what you basically did was 4A+A = 5A.
    This is of course very simplistic, but when you write your code, make sure the constant is not passed on as an argument to a function. The synthesis tool may know how to handle it, but why take the risk?
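    In Python terms, a toy model of the “shift and add” (my own helper names):

```python
def times5(a):
    # 5 = 101b, so 5*A = 4*A + A = (A << 2) + A  ("A00" plus "A")
    return (a << 2) + a

def times_const(a, c):
    """Generalization: one shifted copy of a per set bit of the constant c."""
    return sum(a << i for i in range(c.bit_length()) if (c >> i) & 1)

assert all(times5(a) == 5 * a for a in range(1024))
assert all(times_const(a, 10) == 10 * a for a in range(1024))
```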

  • Adding a bounded value
  • Sometimes (or even often) we need to add two values where one is much smaller than the other and bounded – for example, adding a 3-bit value to a 32-bit register. The idea here is not to be neat and pad the 3-bit value with leading zeros, forcibly creating a 32-bit value from it. Why? Adding two 32-bit values instantiates full-adder logic on all 32 bits, while adding 3 bits to 32 infers full-adder logic on the 3 LSBs and increment logic (which is much faster and cheaper) on the rest of the bits. I am quite positive that today’s synthesis tools know how to handle this, but again, it is good practice to always check the synthesis result and see what came up. If you didn’t get what you wanted, it is easy enough to force it by coding it that way.
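    A quick Python model (my own naming) of the structure the synthesizer can infer – a 3-bit full add plus a carry-driven incrementer on the upper 29 bits – shows it matches a plain 32-bit addition:

```python
MASK29 = (1 << 29) - 1

def add_bounded(a32, b3):
    """Add a 3-bit value b3 to a 32-bit value a32, split as hardware would."""
    low = (a32 & 0x7) + (b3 & 0x7)           # full-adder logic on the 3 LSBs
    carry = low >> 3                         # at most one carry out
    high = ((a32 >> 3) + carry) & MASK29     # only an incrementer on bits 31:3
    return (high << 3) | (low & 0x7)

# Matches ordinary modulo-2^32 addition for every bounded operand.
assert all(add_bounded(0xFFFFFFFF, b) == (0xFFFFFFFF + b) & 0xFFFFFFFF
           for b in range(8))
```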