h1

Clock Domain Crossing - An Important Problem

April 21, 2008

Sometimes, when crossing clock domains, synchronizers are just not enough.

Imagine sending data serially over a single line and receiving it on the other side from the output of a common synchronizer as shown bellow.

Assuming one clock cycle is enough to recover from metastability under the given operating conditions, what seems to be the main problem is not the integrity of the signal - i.e. making sure it is not propagating metastability through the rest of the circuit - but rather the correctness of the data.

Let’s observe the waveform below. The red vertical lines represent the sampling point of the incoming signal. We see from the waveform that since sometimes we sample during a transition - in effect violating the setup-hold window - the output of the first sampling flop (marked “x“) goes metastable. This metastability does not propagate further into the circuit, it is effectively blocked by the second flop, but since the result of recovery from metastability is not certain (see previous post) the outcome might be a corrupt data.
In this specific example we see that net x goes metastable after sampling the 3rd bit but recovers correctly. In a later sampling, for the 6th bit we see that the recovered outcome is not correct and as a result the output data is wrong.

Another interesting case is when both the send clock and the receive clock are frequency locked but their phase might drift in time or the clock signals might experience occasional jitter.
In that case, a bit might “stretch” or “shrink” and can be accidentally sampled twice or not sampled at all.
The waveform below demonstrates the problem. Notice how bit 2, was stretched and sampled twice.

To sum up, never use a simple synchronizer structure to transfer information serially between clock domains, even if they are frequency locked. You might be in more trouble than you initially thought.

On the next post we will discuss how to solve this problem with ring buffers (sometimes mistakenly called FIFOs).

h1

Another FSM Design Tool

April 17, 2008

For those who don’t read through the comments. Harry the ASIC guy commented on the last post about an FSM design environment from Paul Zimmer. You can find more details here.

h1

Visual FSM Design Tool

April 9, 2008

I am still not convinced visual FSM design tools make such a big difference but this one looks pretty cool.
I haven’t really went through all the features and details, so if anyone has some more details/recommendations/complaints about it, just email me or simply comment on this post.

h1

The Principle Behind Multi-Vdd Designs

April 2, 2008

Multi-Vdd design is a sort of buzz word lately. There are still many issues involved before it could become a real accepted and supported design methodology, but I wanted to write a few words on the principle behind the multi-Vdd approach.

The basic idea is that by lowering the operating voltage of a logic gate we naturally also cut the power dissipation through the gate.
The price we pay is that gates operated by lower voltage are somewhat slower (exact factor is dependent on many factors).

The basic idea is to identify the non-critical paths and to power the gates in those paths with a lower voltage. Seen below are two paths, there is obviously less logic through the blue path than through the orange one and is therefore a candidate for being supplied with lower Vdd.

multivdd.png

The idea looks elegant but as always the devil is in the details. There are routing overheads for the different power grids, level shifters must be introduced when two different Vdd logic paths converge to create a new logical function, new power source for the new Vdd must be designed and most important of all, there has to be support present by the CAD tools - if that doesn’t exist this technique will be buried.

h1

Puzzle #9 - The Snail - Solution

March 27, 2008

I will keep this post short. First make sure you take a look at the original puzzle - link here.

The shortest sequence is 6 bits long and is “100110″ (or its inverse “011001″). The smallest amount of bits needed to determine a direction is 5, i.e. any 5 consecutive bits seen by the snail could help him determine the direction home.

h1

Puzzle #8 - Clock Frequency Driver - Solution

March 26, 2008

It’s been a while since I posted some puzzles or solutions to puzzles. I noticed that I concentrated lately more on tricky circuits and fancy ideas but neglected the puzzle section. Some readers asked me to post some more puzzles. Before I can do that, I have to first clear the list of all unsolved puzzles.

The clock frequency driver puzzle drew little attention compared to the others and I got only one complete and correct solution for it.
What follows is my own solution which I hope will be easily understood.

The requirement was to have a NAND gate as the last output stage with one input driven by a rising edge triggered memory element and the other by a falling edge triggered memory element.
A look at the NAND gate truth table reveals that somehow the inputs have to toggle between “11″ (to generate a logical “0″) and “10″, “00″ or “01″ (to generate the logical “1″) on each and every clock edge!

I will now describe the solution for a certain case while the value in brackets will represent the analogous opposite case.
This in tern means (and without loss of generality) that with each rising[falling] clock edge the output state of both flops should be “11″. On the falling[rising] edge we should have the states “00″, “01″ or “10″.

The state “00″ can be immediately eliminated because the transition “00″ –> “11″ means we have to have both bits change on the rising[falling] edge together.
we are left with the following possible cases for the transitions (”r” marks a rising edge transition, “f” a falling edge transition):

  1. “10″ r–> “11″ f–> “10″
  2. “10″ r–> “11″ f–> “01″

Looking at the first option reveals that the rightmost bit needs to change on the rising edge from “0″ to “1″ and on the falling edge from “1″ to “0″ - this is not possible or in contradiction to the rules.
The second option looks promising - the rightmost bit changes from “0″ to “1″ on the rising edge, the left most from “1″ to “0″ on the falling edge - so far so good… but, let us continue the pattern:

“10″ r–> “11″ f–> “01″ r–> “11″

Each second state has to be “11″. After continuing the sequence for one more step we see that now the rightmost bit changes from “0″ to “1″ on the rising edge, but the immediate previous transition had it change on the falling edge, therefore we get again a contradiction!

We conclude that having a NAND on the output is impossible.

As mentioned before Mark Wachsler sent his own solution long time ago. Here it is in is own words:

I’m assuming the question is, is it possible to do something like this:

always @ (posedge clock) p <= something;
always @ (negedge clock) n <= something else;
assign out = ~ (p & n);

and have out toggle on every transition of the clock?

If so, the answer is no.

Proof by contradiction:
1. Assume it can be done: out toggles on every transition of the clock.
2. We know p and n never change simultaneously, so for out to toggle,
either p or n must be 1.
3. So it may never be the case that p == 0 and n == 0.
3. Since they can’t both be zero, and they never change
simultaneously, at least one of them must always be 1.
4. But if n is always one, out can’t have a transition on negedge.
And if p is always one, out can’t have a transition on posedge.
5. Therefore there are some clock edges on which out doesn’t toggle.

So it can’t be done.

h1

Cyclic Combinational Circuits

March 7, 2008

As one of my strange hobbies, I sometimes try to search the web for interesting PHD thesis works. I came across this one a while back and thought it would be interesting to share.

We always hear how bad combinational cyclic loops are. Design Compiler even generates a reports to help us detect them. In the normal ASIC flow combinational loops are very dangerous, hard to analyze and characterize for timing. But here comes this Dissertation work by Marc Riedel and highlights a special set of cyclic combinational circuits which offer several important advantages.

I will try to explain the basic principle by going through an example, but make sure to read his PHD thesis, it is well written and easily understood.

As an example we will look at the very simple case depicted below:

cyclic_combinational1.png

Notice that it has 5 inputs: X, Y0, Y1, Y2, Y3 and has 6 outputs f0..f5. Notice also the symmetry or duality between the AND/OR gates which have the X input connected into them. The basic principle being, that if X = “0″ the cycle will be broken at the top AND gate and if X = “1″, the cycle would be broken in the middle OR gate. This in turn will “create” two “different” circuits depending on the value of X. In essence we have physically a combinational loop BUT we guarantee that whatever value X has, this loop will be logically broken!

Both cases are shown below.

cyclic_combinational2.png

cyclic_combinational3.png

If we factor in X into the equations we get the following dependencies for all the outputs on all the inputs.

cyclic_combinational4.png

The above example is one of the simplest of all and was just presented to show the principle. In this specific circuit you could also short Y0 and Y2, Y1 and Y3 and get a 3 input circuit where each of the inputs has the same behavior as X in the example (shown in page 12 in the PDF file of the thesis).

The thesis goes on to show how such circuits can be used with different advantages. The thesis is bears the date May 2004 - I hope that significant advances have been made in this area in the last 4 years. This idea is too beautiful to just let it accumulate dust or being discarded by the CAD industry…

h1

Johnson Counter Recovery Circuits

February 25, 2008

In a previous post I discussed the Johnson counter (diagram below). It was mentioned that if a bit accidentally flips in the wrong place in the counter (due to wrong reset behavior, noise, etc.) it will rotate through the counter indefinitely.

johnson_recovery_1.png

In a robust Johnson counter there is a mechanism for self-correction of these errors. This post discussed in detail how to resolve such single bit errors with minimum hardware overhead.

Let’s assume that for some odd reason within a run of “1″s or “0″s a single bit had flipped. If we now take a snapshot of 3 consecutive bits in the counter as the bits are rotating, we will eventually discover two forbidden states: “010″ and “101″. All the other six possible states for 3 consecutive bits are legal - as seen in the table below:

johnson_recovery_2.png

The basic idea, is to try to identify those rogue states and fix them by flipping the middle, erroneous bit and pushing the result to the next state. Naturally, we have to make sure that we keep the normal behavior of the circuit as well.
We will examine two solutions (a) and (b). One more efficient (hardware wise) than the other.

Let’s start with approach (a). With this approach we try to correct both forbidden state. The table bellow shows a snapshot of 3 consecutive bits in the “state” column. One is marked in red the other in orange. The column “next(a)” contains the value to be shifted into the 3rd bit - e.g. if “011″ is encountered then the middle “1″ will be pushed unchanged to the bit to its right, however if the state “010″ will be detected, the middle “1″ will be flipped to a “0″ and pushed into the right thus correcting a forbidden state.

johnson_recovery_3.png

The second approach (b) corrects only the single forbidden state “010″. Then how come this solves the problem? Approach (b) relies on the fact that state “010″ is the inverse state of “101″. It is enough to correct state “010″ since state “101″ will reach the end of the counter, then will be flipped bit by bit and will eventually appear as “010″ in the next cycle through the counter!

The next diagram shows the different hardware implementation for both solutions. While I can be blamed of being petty, solution (b) is definitely cheaper.

johnson_recovery_4.png

The final, self-correcting 4-bit Johnson counter is shown below.

johnson_recovery_5.png

It is important to note that this circuit recovers from a single bit error. If we had a 7-bit Johnson counter and 2 adjacent bits would flip in a middle of a run (unlikely but still possible), we would not detect it with the above circuit. For correcting 2 adjacent flips a wider “snapshot” of 4-bits is needed, and the circuit will naturally become more complex.

It is considered good design practice to have at least a single bit self-correcting circuit, as the one above, for each Johnson counter being used.

h1

De Bruijn and Maximum Length LFSR Sequences

February 15, 2008

In the previous post I mentioned that a maximum length LFSR can be modified quite easily to generate the “all zero” state. The resulting sequence is then called a De Bruijn code. It has many uses in our business but also in remote areas like card tricks!!

The normal maximum length LFSR circuit cannot generate the “all zero” state or the trivial state, because it will get stuck in this state forever. The picture below shows as an example a 4-bit maximum length LFSR circuit.

4-bit_lfsr.png

The trick is to try to insert the “all zero” state in between the “00..01″ and the “10..00″ states. Normally after the “00..01″ state the next value to be pushed in from the left would be a “1″, so the state “10..00″ could be generated. If we would like to squeeze the “00..00″ state next, we need to flip this to “0″. Then we have to make sure that for the next cycle a “1″ will be pushed. This is done by detecting when the entire set of registers - less the rightmost one - are all zero, and using this as another input in the XOR feedback path. The result is a 2^n counting space sequence which is called a De Bruijn sequence. The figure of the completed De Bruijn counter is shown below.

4-bit_debruijn.png

Since it is a bit hard to follow the words, you can convince yourself by following the sequence - the table below will be of some help.

4-bit_debruijn_lfsr_comparison.png

If you plot a single bit of the De Bruijn counter over time (doesn’t matter which bit) you will see something similar to the next figure. Notice how over time, observing 4 consecutive bits in the sequence not only gives a unique result (until it wraps around) but also gives all possible 4-bit combinations! A simple LFSR will only fulfill the first criteria.

4-bit_debruijn_sequence.png

If you like LFSRs and want to investigate a bit more, there is an excellent (albeit quite “heavy to digest”) book from Solomon Golomb called shift register sequences. The book is from the 1960s!! who said all good things in our industry are new…

h1

Real World Examples #2 - Fast Counters

February 11, 2008

This is something which will be obvious to the “old school” people, because it was used a lot in the past.

A few weeks ago a designer who was working on a very, very small block asked for my advice on the implementation of counters. The problem was that he was using a 7-bit counter, defined in RTL as cntr <= cntr +1;
The synthesis tool generated a normal binary counter, but unfortunately it could not fulfill the timing requirements - a few GHz.
(Don’t ask me why this was not done in full-custom to begin with…)

Now, the key to solving this problem was to notice that in this specific design only the terminal count was necessary. This meant that all intermediate counter states were not used anywhere else, but the circuits purpose was to determine if certain amount of clock cycles have occured.

This brings us to the question: “Under these conditions, is there a cheaper/faster counter than the usual binary counter?”
Well I wouldn’t write this post if the answer was negative… so obviously the answer is “Yes” - this is our old friend the LFSR!
LFSRs can be also used as counters, and they are being used in two very common, specific ways:

  1. As a terminal counter - counter needs to measure a certain amount of clock edges. It counts to a specific value and then cycles over or resets
  2. As a FIFO pointer - where the actual value itself is not of great importance but the order of increment needs to be deterministic and/or the relationship to another pointer of the same nature

Back in the age of prehistorical chip design (the 1970s), when designers really had to think hard for every gate, LFSRs were a very common building block and were often used as counters.

A slight disadvantage, is that the counting space of a full length n-bit LFSR is not 2^n but rather (2^n)-1. This sounds a bit petty on my side but believe me it can be annoying. Fear not! There is a very easy way to transform the state space to a full 2^n states. (can you find how???)

So next time you need a very fast counter, or when you need pointers for your FIFO structure - consider your good old friend the LFSR. Normally with just a single XOR gate as glue logic to your registers, you achieve (almost) the same counting capabilities given to you by the common binary counter.