## De Bruijn and Maximum Length LFSR Sequences

February 15, 2008

In the previous post I mentioned that a maximum length LFSR can be modified quite easily to generate the “all zero” state. The resulting sequence is then called a De Bruijn code. It has many uses in our business but also in remote areas like card tricks!!

The normal maximum length LFSR circuit cannot generate the “all zero” state or the trivial state, because it will get stuck in this state forever. The picture below shows as an example a 4-bit maximum length LFSR circuit.

The trick is to try to insert the “all zero” state in between the “00..01” and the “10..00” states. Normally after the “00..01” state the next value to be pushed in from the left would be a “1”, so the state “10..00” could be generated. If we would like to squeeze the “00..00” state next, we need to flip this to “0”. Then we have to make sure that for the next cycle a “1” will be pushed. This is done by detecting when the entire set of registers – less the rightmost one – are all zero, and using this as another input in the XOR feedback path. The result is a 2^n counting space sequence which is called a De Bruijn sequence. The figure of the completed De Bruijn counter is shown below.

Since it is a bit hard to follow the words, you can convince yourself by following the sequence – the table below will be of some help.

If you plot a single bit of the De Bruijn counter over time (doesn’t matter which bit) you will see something similar to the next figure. Notice how over time, observing 4 consecutive bits in the sequence not only gives a unique result (until it wraps around) but also gives all possible 4-bit combinations! A simple LFSR will only fulfill the first criteria.

If you like LFSRs and want to investigate a bit more, there is an excellent (albeit quite “heavy to digest”) book from Solomon Golomb called shift register sequences. The book is from the 1960s!! who said all good things in our industry are new…

## Null Convention Logic

October 11, 2007

It is extremely rare in our industry that totally new approaches for Logic circuit design are taken. I don’t know the exact reasons and I really don’t want to get into the “fight” between tool vendors and engineers.

Null Convention Logic, is a totally different approach to circuit design. It is asynchronous in its heart (I guess half of the readers of this post just dropped now).
It is not new and being currently pushed by its developers in Theseus Research.

They published a book, which I really recommend reading. It is not very practical with the current mainstream tools and flows but it is a very interesting reading that will open your eyes to new approaches in logic design.
You can get a good introduction to the book’s content by reading this paper. It is fairly technical and would need a few good hours to digest and grasp the meaning behind, especially given the fact that it is so much different than what we are used to – forget about AND, OR and NOT gates…

## Pre-scaled Counters

October 8, 2007

It is obvious that as a normal binary counter increases in width its maximum operation frequency drops. The critical path going through the carry chain up to the last half-adder element is purely combinational and increases with size. But what if our target frequency is fixed (as is usually the case) and we need to build a very wide counter? Here come to the rescue a variant of the normal binary counter – the pre-scaled counter.

Pre-scaled counters are based on the observation (or fact) that in the binary counting sequence, the LSB will toggle in the highest frequency (half that of the clock when working with the rising edge only). The next bit in line will toggle in half that frequency, the next with half of the previous and so on.
In general, the n-th bit will toggle with frequency 2^(n+1) lower than the clock (we assume that bit 0 is the LSB here). A short look at the figure below will convince you.

We can use this to our advantage by “partitioning” the counter we wish to build. In essence we make the target clock frequency of operation independent of the counter size! This means that given that our clock frequency enables us to have a single flop toggling plus minimal levels of logic, one could in theory build am extremely wide counter.

If you really insist, the above statement is not 100% correct (for reasons of clock distribution and skew, carry collect logic of a high number of partition stages, etc.) , but for all practical reasons it is true and useful. Just don’t try to build a counter with googolplex bits.

The basic technique for a 2-partition is shown below. We have an LSB counter which operates at clock frequency. Its width is set so it could still operate with the desired clock frequency. Once this counter rolls over an enable signal is generated for the MSB counter to make a single increment. Notice how we also keep the entire MSB counter clock gated since we know it cannot change its state.

The distance between the filtered clock edges (marked as “X”) of the MSB counter is determined by the width of the LSB counter. This should be constrained as a multi-cycle path with period X when doing synthesis.

The technique could be extended to a higher amount of “partitions” but then we must remember that the enable for each higher order counter is derived from all enable signals of all previous stages.

An interesting variant is trying to generate and up/down counter which is width independent. It is not so complicated and if you have an idea on how to implement it, just comment.

## The Johnson Counter

August 20, 2007

Johnson counters, or Möbius counters (sometimes referred to with that name because of the abstract similarity to the famous Möbius strip) are extremely useful.
The Johnson counter is made of a simple shift register with an inverted feedback – as can be seen below.

Johnson counters have 2N states (where N is the number of flip-flops) compared to 2^N states for a normal binary counter.
Since each time only a single bit is changing – Johnson counter states form a sort of a Gray code. The next picture shows the 12 states of a 6 bit Johnson counter as an example.

Johnson counters are extremely useful in modeling, since by using any of the taps one could generate a clock like pattern with many different phases. You could easily see that by looking at the columns in the picture above, they all have 6 consecutive “1”s followed by 6 consecutive “0”s, but all in a different phase.

Decoding the state of the counter is extremely easy. A single 2 input gate which detects the border between the “1”s and the “0”s is enough. One needs not compare the entire length of the counter to some value.

One can also generate an odd length “sort-of” Johnson counter. The easiest way is by using a NOR feedback from the last two stages of the shift register as shown below.

The last picture shows the 11 states of the modified 6 flip-flop Johnson counter. Looking at the states sequence it is immediately noticeable that the 11…11 stage is skipped. We also lose the Gray property of the counter like that, since on a single case both the last and first bits will change simultaneously. But looking at the columns, which represent the different taps, we see that we kept the same behavior on each column (with respect to the signal shape) but the duty cycle is not 50% anymore – that is obvious because we no longer have an even amount of states.

This post is becoming a bit too long to present all the cool things that can be done with Johnson counters, but a very important issue is robustness. In a future post we will see that serious designers do not use just a simple inverter for the feedback path, but also include some sort of self correction mechanism. This is necessary, because if a forbidden state creeps in (wrong reset behavior, crosstalk, etc) it will stay forever in the counter – sort of like the same problem we had in one of the previous posts on the ring counter. There are ways to get over this problem, and I will try to analyze them in a future post. Stay tuned…

## Driving A Clock Frequency Signal From A Register

August 9, 2007

Usually in semi-custom flows it is a big no-no to use the clock in the data path. Sometimes it is necessary though, to drive a signal with the frequency of the clock to be used in some part of the design or driving it onto a pad. Normally, logical changes occur only on the rising edge of the clock and thus with half the frequency of the clock.

Here is a cool little circuit that will drive a signal which toggles at the clock frequency but is still driven from a register. It is very robust and upon wake up from a reset state will drive a clock like signal with the opposite phase of the clock but with the same frequency. To use the same phase as the clock itself, replace the XOR with an XNOR at the output.

If this circuit should be used as a clock for another block, consider the fact that the XOR gate at the output might introduce duty cycle distortion (DCD) due to the fact that many standard cell library XOR gates do not have a symmetrical behavior when transitioning from a logical 0 to a logical 1 and vice versa.

As an afterthought, it might be interesting to look at the similarities between this circuit and the “Double Edge Flip-Flop” I descried before.

## The Double Edge Flip Flop

July 31, 2007

Sometimes it is necessary to use both the rising and the falling edge of the clock to sample the data. This is sometimes needed in many DDR applications (naturally). The double edge flop is sometimes depicted like that:

The most simple design one can imagine (at least me…), would be to use two flip flops. One sensitive to the rising edge of the clock, the other to the falling edge and to MUX the outputs of both, using the clock itself as the select. This approach is shown below:

What’s wrong with the above approach? Well in an ideal world it is OK, but we have to remember that semi-custom tools/users don’t like to have the clock in the data path. This requirement is justified and can cause a lot of headaches later when doing the clock tree synthesis and when analyzing the timing reports. It is a good idea to avoid such constructions unless they are absolutely necessary. This recommendation applies also for the reset net – try not combining the reset net into your logic clouds.

Here is a cool circuit that can help solve this problem:

I will not take the pleasure from you of drawing the timing diagrams yourself 🙂 and realizing how and why this circuit works, let me just say that IMHO this is a darn cool circuit!

Searching the web a bit I came across a paper which describes practically the same idea by Ralf Hildebrandt. He names it a “Pseudo Dual-Edge Flip Flop”, you can find his short (but more detailed) description, including a VHDL code, here.

## Low Power – Clock Gating Is Not The End Of It…

June 12, 2007

A good friend of mine, who works for one of the micro-electronics giants, told me how low power is the buzz word today. They care less about speed/frequency and more about minimizing power consumption.

He exposed me to a technique in logic design I was not familiar with. It is basically described in this paper. Let me just give you the basic idea.

The main observation is that even when not active, logic gates have different leakage current values depending on their inputs. The example given in the article shows that a NAND gate can have its leakage current reduced by almost a factor of 2.5 depending on the inputs!
How is this applied in reality? Assume that a certain part of the design is clock gated, this means all flip-flops are inactive and in turn the logic clouds between them. By “muxing” a different value at the output of the flop, which is logic dependent, we could minimize the leakage through the logic clouds. When waking up, we return to the old stored value.

The article, which is not a recent work by the way, describes a neat and cheap way of implementing a storage element with a “sleep mode” output of either logic “1” or logic “0”. Notice that the “non-sleep mode” or normal operation value is still kept in the storage element. The cool thing is, that this need not really be a true MUX in the output of the flop – after finalizing the design an off-line application analyzes the logic clouds between the storage elements and determines what values are needed to be forced during sleep mode at the output of each flop. Then, the proper flavor of the storage element is instantiated in place (either a “sleep mode” logic “0” or a “sleep mode” logic “1”).

It turns out that the main problem is the analysis of the logic clouds and that the complexity of this problem is rather high. There is also some routing overhead for the “sleep mode” lines and of course a minor area overhead.
I am interested to know how those trade-offs are handled. As usual, emails and comments are welcome.

Bottom line – this is a way cool technique!!!