Archive for the 'Real World Examples' Category

Real World Examples #2 - Fast Counters

February 11, 2008

This one will be obvious to the “old school” people, because the technique was used a lot in the past.

A few weeks ago a designer who was working on a very, very small block asked for my advice on the implementation of counters. The problem was that he was using a 7-bit counter, defined in RTL as cntr <= cntr + 1;
The synthesis tool generated a normal binary counter, but unfortunately it could not meet the timing requirement - a few GHz.
(Don’t ask me why this was not done in full-custom to begin with…)

Now, the key to solving this problem was to notice that in this specific design only the terminal count was necessary. None of the intermediate counter states were used anywhere else; the circuit's only purpose was to determine whether a certain number of clock cycles had occurred.

This brings us to the question: “Under these conditions, is there a cheaper/faster counter than the usual binary counter?”
Well, I wouldn’t have written this post if the answer were negative… so obviously the answer is “Yes” - this is our old friend the LFSR!
LFSRs can also be used as counters, and they are used in two very common, specific ways:

  1. As a terminal counter - the counter needs to measure a certain number of clock edges. It counts to a specific value and then wraps around or resets
  2. As a FIFO pointer - where the actual value itself is not of great importance, but the order of increment needs to be deterministic and/or the relationship to another pointer of the same nature needs to be known

Back in the age of prehistoric chip design (the 1970s), when designers really had to think hard about every gate, LFSRs were a very common building block and were often used as counters.

A slight disadvantage is that the counting space of a full-length n-bit LFSR is not 2^n but rather (2^n)-1, because one state (all zeros, for an XOR-based LFSR) locks up and is never visited. This sounds a bit petty on my side, but believe me, it can be annoying. Fear not! There is a very easy way to extend the state space to the full 2^n states. (Can you find how?)

So next time you need a very fast counter, or when you need pointers for your FIFO structure - consider your good old friend the LFSR. Normally with just a single XOR gate as glue logic to your registers, you achieve (almost) the same counting capabilities given to you by the common binary counter.
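To make the terminal-count idea concrete, here is a small behavioral model in Python (not RTL). It is only a sketch under assumptions of mine: a 7-bit Fibonacci LFSR whose feedback is the XOR of its two top bits, a seed of 1, and hypothetical helper names such as `lfsr7_next` and `terminal_value`. The original design's taps and reset value are not given in the post.

```python
def lfsr7_next(state):
    """One clock of a 7-bit LFSR: feedback = bit6 XOR bit5, shifted in at bit 0.

    The tap choice is an assumption for illustration; it gives a
    maximal-length sequence, but other tap pairs work as well.
    """
    feedback = ((state >> 6) ^ (state >> 5)) & 1
    return ((state << 1) | feedback) & 0x7F


def lfsr7_period(seed=1):
    """Count clocks until the LFSR returns to its seed."""
    state = lfsr7_next(seed)
    clocks = 1
    while state != seed:
        state = lfsr7_next(state)
        clocks += 1
    return clocks


def terminal_value(n_cycles, seed=1):
    """State reached after n_cycles clocks; this is the constant to compare against."""
    state = seed
    for _ in range(n_cycles):
        state = lfsr7_next(state)
    return state


def cycles_until_terminal(n_cycles, seed=1):
    """Run the LFSR from the seed until the precomputed terminal value is seen."""
    target = terminal_value(n_cycles, seed)
    state = seed
    elapsed = 0
    while state != target:
        state = lfsr7_next(state)
        elapsed += 1
    return elapsed


if __name__ == "__main__":
    print(lfsr7_period())              # 127, i.e. (2^7)-1 states
    print(cycles_until_terminal(100))  # 100
```

The point of the exercise is that the next-state logic is a single XOR in front of a shift register, with no carry chain, while the terminal value is just a constant computed offline. The all-zero state is never reached from a non-zero seed, which is where the (2^n)-1 comes from.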

Real World Examples #1 - DBI Bug Solution

January 7, 2008

In the previous post I presented the problem. If you haven’t read it, go back to it now, because it will make this entire explanation simpler.

Given the RTL code that was described, the synthesizer will generate something of this sort:

dbi_bug_sol1.png

A straightforward approach to solving the problem would be to generate the MSB of the addition logic and do the comparison on the 4-bit result. This logic cloud would (probably) have been created if we had made the result vector 4 bits wide in the first place. It would look something like this:

dbi_bug_sol2.png

This looks nice on paper, but press the pause button for a second and think - what is really hiding behind the MSB logic? You could probably reuse some of the addition logic already present, but you would have to do some digging in the layout netlist and make sure you got the right nets. On top of that, you would probably need to introduce some logic involving XORs (because of the nature of the addition). This is quite simple if you get to use any gate you wish, but it becomes complex when you have only NANDs and NORs available. It is possible from a logical point of view, but since you need to employ several spare cells, you might run into timing problems, because the spare cells are spread all over the chip and are not necessarily in the vicinity of your logic. Therefore, a solution with the smallest number of gates is recommended!

So let’s rethink the problem. We know that the circuit works for 0-7 “1”s and fails only for the case of 8 “1”s. We also know that in that case the circuit behaves as if there were 0 “1”s. Remember that we have 4-input NANDs and NORs at our disposal. We could take any 4 bits of the vector, AND them together and OR the result with the current DBI output. It’s true that we do not specifically identify 8 “1”s, but in the case of 8 “1”s the AND of any 4 bits will be high, and through the OR it will give the correct result. In the other cases the output of this AND will be low and the correct result will pass through from the old circuit! There is a special case where exactly 4 bits are on and these are the very bits that are fed into our added AND gate, but in this case we have to assert the DBI bit anyway. The same holds whenever the 4 chosen bits are high together with some others: there are then at least 4 “1”s, so asserting DBI is correct.
The above paragraph was relatively complicated, so here is a picture to describe it:

dbi_bug_sol3.png

It is important to notice that with this solution, the newly introduced AND gate is driven directly from the flip-flops of the vector. This makes it much easier to locate in the layout netlist, since flip-flop names are not changed at all (or very slightly changed).
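The argument above can also be checked exhaustively. The following is a Python behavioral model, not the netlist; the function names (`dbi_intended`, `dbi_buggy`, `dbi_fixed`) are mine, and the model assumes the comparison `sum_of_ones > 3` from the problem description. It tries every 8-bit vector and every possible choice of 4 bits for the added AND gate.

```python
from itertools import combinations


def popcount(data):
    """Number of '1' bits in an 8-bit vector."""
    return bin(data & 0xFF).count("1")


def dbi_intended(data):
    """What the circuit should compute: DBI high for 4 or more ones."""
    return popcount(data) > 3


def dbi_buggy(data):
    """The original circuit: the sum is truncated to 3 bits before the compare."""
    return (popcount(data) & 0b111) > 3


def dbi_fixed(data, and_bits=(0, 1, 2, 3)):
    """The metal fix: OR the old result with the AND of any 4 data bits."""
    and4 = all((data >> bit) & 1 for bit in and_bits)
    return dbi_buggy(data) or and4


if __name__ == "__main__":
    ok = all(
        dbi_fixed(data, bits) == dbi_intended(data)
        for bits in combinations(range(8), 4)
        for data in range(256)
    )
    print(ok)  # True
```

Because the check passes for every choice of 4 bits, the ECO is free to pick whichever four flip-flops are physically closest to the available spare cells.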

Here is the above circuit implemented with 4-input NAND gates only (marked in red). This is also the final solution that was implemented.

dbi_bug_sol4_fix.png
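For readers without access to the figure, here is one possible NAND-only decomposition of "OR the old result with a 4-input AND", modeled in Python. This is my own sketch and not necessarily the exact gate arrangement in the picture: it assumes unused NAND inputs are tied high and that no inverted version of the old DBI net happens to be available. The function names are hypothetical.

```python
from itertools import product


def nand4(a, b, c, d):
    """4-input NAND on 0/1 values."""
    return 0 if (a and b and c and d) else 1


def dbi_fix_nand_only(old_dbi, d0, d1, d2, d3):
    """old_dbi OR (d0 AND d1 AND d2 AND d3), built from three 4-input NANDs."""
    n_and = nand4(d0, d1, d2, d3)     # inverted AND of the four data bits
    n_old = nand4(old_dbi, 1, 1, 1)   # NAND used as an inverter, spare inputs tied high
    return nand4(n_and, n_old, 1, 1)  # NAND of two inverted signals = OR of the originals


if __name__ == "__main__":
    ok = all(
        dbi_fix_nand_only(old, d0, d1, d2, d3) == (old | (d0 & d1 & d2 & d3))
        for old, d0, d1, d2, d3 in product((0, 1), repeat=5)
    )
    print(ok)  # True
```

The last step is just De Morgan: NAND(not x, not y) equals x OR y, so the whole fix costs three spare cells in this form.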

Closing words - this example aims to show that when doing ECOs one really has to put in the effort and look for the cheapest and simplest solution. Every gate counts, and a lot of tricks need to be used. This is also the true essence of our work, but let’s not get philosophical…

Real World Examples #1 - The DBI bug

January 3, 2008

OK, back after the long holidays (which were spent mainly in bed due to severe sickness, both of myself and my kids…) with some new ideas.
I thought it would be interesting to pick up some real life examples and blog about them. So far I have mainly concentrated on design guidelines, tricky puzzles and general advice. I guess it would benefit many if we dive into the real world a bit. So - I added a new category called (in a very sophisticated way) “real life examples”, under which all this stuff will be tagged.
Let’s start with the first one.

The circuit under scrutiny was supposed to calculate a DBI (Data Bus Inversion) bit for an 8-bit vector. Basically, in this specific application, if the 8-bit vector had 4 or more “1”s the DBI bit should go high, otherwise it should stay low.
The RTL designer decided to add all the bits up and assert the DBI bit if the result was 4 or higher - this is not a bad approach in itself and is usually superior to a LUT.

The pseudo code looked something like this:

wire [2:0] sum_of_ones;  // declared 3 bits wide
assign sum_of_ones = data[0] + data[1] + data[2] + data[3] + data[4] + data[5] + data[6] + data[7];
assign dbi_bit = (sum_of_ones > 3);

The problem was that the designer accidentally chose “sum_of_ones” to be only 3 bits wide! This meant that if the vector was all “1”s, the adder logic that generates the value of sum_of_ones would wrap around and give a value of “000”, which in turn would not result in the DBI bit being asserted as it should. During verification and simulation the problem was not detected for some reason (a thing to question in itself), and we were now faced with a problem we needed to fix as cheaply as possible. We decided to try a metal fix.
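The wrap-around is easy to reproduce in a few lines of Python. This is a behavioral model of the pseudo code above, not the actual RTL, and the function names are mine. It truncates the sum to 3 bits and lists every input vector for which the result differs from the intended one.

```python
def popcount(data):
    """Number of '1' bits in an 8-bit vector."""
    return bin(data & 0xFF).count("1")


def dbi_intended(data):
    """DBI as intended: high when the vector has 4 or more ones."""
    return popcount(data) > 3


def dbi_buggy(data):
    """DBI as implemented: the sum only has 3 bits, so 8 wraps around to 0."""
    sum_of_ones = popcount(data) & 0b111
    return sum_of_ones > 3


# Every 8-bit vector for which the implemented circuit gives the wrong answer.
mismatches = [data for data in range(256) if dbi_buggy(data) != dbi_intended(data)]

if __name__ == "__main__":
    print(mismatches)  # [255]
```

Only the all-ones vector is affected, which is 1 case out of 256 and may go some way toward explaining why random simulation could miss it.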

The $50K (or whatever the specific mask set cost was) question is: how do you fix this as fast as possible and with as little overhead as possible, assuming you have only 4-input NAND and 4-input NOR gates available?
The answer is in the next post.