## Real World Examples #2 – Fast Counters

February 11, 2008

This is something which will be obvious to the “old school” people, because it was used a lot in the past.

A few weeks ago a designer who was working on a very, very small block asked for my advice on the implementation of counters. The problem was that he was using a 7-bit counter, defined in RTL as cntr <= cntr +1;
The synthesis tool generated a normal binary counter, but unfortunately it could not fulfill the timing requirements – a few GHz.
(Don’t ask me why this was not done in full-custom to begin with…)

Now, the key to solving this problem was to notice that in this specific design only the terminal count was necessary. This meant that all intermediate counter states were not used anywhere else, but the circuits purpose was to determine if certain amount of clock cycles have occured.

This brings us to the question: “Under these conditions, is there a cheaper/faster counter than the usual binary counter?”
Well I wouldn’t write this post if the answer was negative… so obviously the answer is “Yes” – this is our old friend the LFSR!
LFSRs can be also used as counters, and they are being used in two very common, specific ways:

1. As a terminal counter – counter needs to measure a certain amount of clock edges. It counts to a specific value and then cycles over or resets
2. As a FIFO pointer – where the actual value itself is not of great importance but the order of increment needs to be deterministic and/or the relationship to another pointer of the same nature

Back in the age of prehistorical chip design (the 1970s), when designers really had to think hard for every gate, LFSRs were a very common building block and were often used as counters.

A slight disadvantage, is that the counting space of a full length n-bit LFSR is not 2^n but rather (2^n)-1. This sounds a bit petty on my side but believe me it can be annoying. Fear not! There is a very easy way to transform the state space to a full 2^n states. (can you find how???)

So next time you need a very fast counter, or when you need pointers for your FIFO structure – consider your good old friend the LFSR. Normally with just a single XOR gate as glue logic to your registers, you achieve (almost) the same counting capabilities given to you by the common binary counter.

### 6 comments

1. I’d love to see this post expanded with an example. I’m not quite clear on how to implement such an LFSR, even one that I can demonstrate gives me the 2^n – 1 counting space.

2. Thanks for the input. The next post will be about this problem. I will bring it up in 2 days or so…

3. Classic, but there’s a better way!

Knuth suggest in the mmix-doc (in the MMIX distribution) to use “carry-save addition”. Paraphrasing: “One nice embodiment o fthis idea is to represent a binary number x in a redundant form as the difference x’ – x” of two binary numbers. Any two such numbers can be added without carry propagation as follows: Let

f(x,y,z) = (x & ~y) | (x & z) | (~y & z)
g(x,y,z) = x ^ y ^ z

Then it is easy to check that x – y + z = 2*f(x,y,z)-g(x,y,z)
we need only verify this in the eight cases when x, y, and z are 0 or 1. Thus we can subtract 1 from a counter x’ – x” by setting

(x’,x”) <- (f(x’,x”,-1), g(x’,x”,-1))

The result is zero if and only if x’ = x”. We need not actually compute the difference x’-x” until we need to examine the register.”

I’ve used this trick and it works nicely.

Enjoy
Tommy

4. Tommy,

seems like I don’t really understand what you mean.

The counters described have usually a logic depth of a single XOR gate. Just checking if a vector equals another vector means an XOR gate plus a tree of ORs.

I would appreciate if you can send me a reference or a link by email.

Also what about storage –> area?

5. Indeed, but how much area do you need to convert back?

If you have a processor in the loop and plenty of time, the converting the LFSR count back to the nominal numbers is a non-issue and you’re better off using that, otherwise the carry-save representation is cheaper to convert.

Of course, I may have misunderstood your question.

The only reference I have is Knuth’s document and the reference listed within.

6. how about using a ripple counter , and positive edge detection at the last rippling flop output.

since we are not using all the flop outputs
of the ripple counter.

this may not be applicable if the terminal output should be synchronous with the clock of the ripple counter