Archive for the 'Architecture' Category

h1

Latch Based Design Robustness

May 19, 2008

Latch based design is usually not given enough attention by digital designers. There are many reasons for that, some are very well based, other reasons are just because latch based design is just unknown and looked at as a strange beast.

I intend to have a serious of posts concerning latch based design issues. I have to admit that most of my design experience was gained doing flip flop based designs, but the latch based designs I did were always interesting and challenging. If any of you readers have some interesting latch based design examples please send them over to my email and I will include them in later posts.

Just to start the ball rolling, here is a very interesting paper on the robustness of latch based design with comparison to flip flop based design. If you don’t have the time to go through the entire paper just look at figure 1 and its description.

I also added this paper to the list of recommended reading list.

h1

Ring Buffers

May 14, 2008

On the last technical post I discussed the problem of transferring information serially between two different clock domains with similar frequency but with drifting phase.
This post will try to explain how this issue is being solved.

When approaching this problem, we have to remember that the phase might drift over time and we have to quantify this drift before the design starts. Modeling the channel beforehand is very helpful here.
Once we know the needed margin, we can approach the design of the ring buffer.

The ring buffer is a FIFO with both ends “tied together” as depicted below. Pointers designate the read and write position and are moved with each respected clock signal in the direction of the arrow (in the figure below - clockwise). Remember, the read and write pointers move at different times but the overall rate of change of both is similar. This means that in some moment one can move ahead of the other, and in another it can lag behind, but over time the the amount of clock edges is the same.

The tolerance of the ring buffer is represented below with the dashed arrows. The read and write clocks can drift in time up to a point just before they meet and cross each other.

The series of images below depicts how the read and write pointers move with time and how the buffer is filled with new information (green) and how it is read (red). Notice how the first two reads will generate garbage because it reads out information that was not written into the buffer.

One of the most complicated issues is the start-up of the ring buffer, because both clock domains are unrelated. A certain “start” signal has to be generated and “tell” both pointers to start to advance. If this is not done carefully enough, one pointer will start to advance ahead of time and thus “bite away” some of the margin we designed for. This problem is even more complicated, when a lot of channels with different ring buffers are operated in parallel.

In one of the next posts we will explore a simple technique that enables us to determine if the ring buffer failed and the information read is actually one which is not updated.

h1

Clock Domain Crossing - An Important Problem

April 21, 2008

Sometimes, when crossing clock domains, synchronizers are just not enough.

Imagine sending data serially over a single line and receiving it on the other side from the output of a common synchronizer as shown bellow.

Assuming one clock cycle is enough to recover from metastability under the given operating conditions, what seems to be the main problem is not the integrity of the signal - i.e. making sure it is not propagating metastability through the rest of the circuit - but rather the correctness of the data.

Let’s observe the waveform below. The red vertical lines represent the sampling point of the incoming signal. We see from the waveform that since sometimes we sample during a transition - in effect violating the setup-hold window - the output of the first sampling flop (marked “x“) goes metastable. This metastability does not propagate further into the circuit, it is effectively blocked by the second flop, but since the result of recovery from metastability is not certain (see previous post) the outcome might be a corrupt data.
In this specific example we see that net x goes metastable after sampling the 3rd bit but recovers correctly. In a later sampling, for the 6th bit we see that the recovered outcome is not correct and as a result the output data is wrong.

Another interesting case is when both the send clock and the receive clock are frequency locked but their phase might drift in time or the clock signals might experience occasional jitter.
In that case, a bit might “stretch” or “shrink” and can be accidentally sampled twice or not sampled at all.
The waveform below demonstrates the problem. Notice how bit 2, was stretched and sampled twice.

To sum up, never use a simple synchronizer structure to transfer information serially between clock domains, even if they are frequency locked. You might be in more trouble than you initially thought.

On the next post we will discuss how to solve this problem with ring buffers (sometimes mistakenly called FIFOs).

h1

Resource Sharing vs. Performance

June 27, 2007

I wanted to spend a few words on the issue of resource sharing vs. performance. I believe it is trivial for most engineers but a few extra words won’t do any harm I guess.
The issue is relevant most evidently when there is a need to perform a “heavy” or “expensive” calculation on several inputs in a repeated way.

The approaches usually in consideration are: building a balanced tree structure, sequencing the operations, or a combination of the two.

A tree structure architecture is depicted below. The logic cloud represents the “heavy” calculation. One can see immediately that the operation on a,b and c,d is done in parallel and thus saves latency on the expense of instantiating the logic cloud twice.

tree_structure.png

The other common solution, depicted below, is to use the logic cloud only once but introducing a state machine which controls a MUX, that determines which values will be calculated on the next cycle. The overhead of designing this FSM is minimal (and even less). The main saving is in using the logic cloud only once. Notice that we pay here in throughput and latency! With some more thinking, one could also save a calculation cycle by introducing another MUX in the feedback path, and using one of the inputs just for the first calculation, thereafter using always the feedback path.

resource_sharing.png

h1

Synchronization, Uncertainty and Latency

May 28, 2007

I noticed that most of the hits coming from search engines to this blog contain the word “synchronization” or “interview questions”. I guess people think this is a tricky subject. Therefore another post on synchronization wouldn’t hurt…

  • Synchronization
    Why do we need to synchronize signals at all? Signals arriving unrelated to the sampling clock might violate setup or hold conditions thus driving the output of the capturing flip-flop into a meta-stable state, or simply put, undefined. This means we could not guarantee the validity of the data at the output of the flip-flop. We do know, that since a flip-flop is a bi-stable device - after some (short) time the output will resolve either a logic “0″ or a logic “1″. The basic idea is to block the undefined (or meta-stable) value during this settling time from propagating into the rest of the circuit and creating havoc in our state machines. The simplest implementation is to use a shift register construction as pictured

    synchro1.png

  • Uncertainty
    We must remember, that regardless of the input transition, a meta-stable signal can resolve to either a logic “0″ or a logic “1″ after the settling time. The picture below is almost identical to the first, but here capture FF1 settled into a logic ‘0″ state. On the next clk B rising edge it will capture a static “1″ value and thus change. Compare the timing of capture FF1 and capture FF2 in both diagrams. We see there is an inherent uncertainty on when capture FF2 assumes the input data. This uncertainty is one clk B period for the given synchronizer circuit.

    synchro1a.png

  • Latency
    Sometimes, the uncertainty described can hurt the performance of a system. A trick which I don’t see used so often, is to use the falling edge triggered flop as one of the capture flops. This reduces the uncertainty from 1-2 capturing clock cycles to 1-1.5 capturing clock cycles. Sometimes though, there is no meaning to this uncertainty, it becomes more meaningful when there is only a phase difference between the 2 clock domains
  • h1

    Another Synchronization Pitfall…

    May 18, 2007

    Many are the headaches of a designer doing multi clock domain designs. The basics that everyone should know when doing multi clock domain designs are presented in this paper. I would like to discuss on this post a lesser known problem, which is overlooked by most designers. Just as a small anecdote, this problem was encountered by a design team led by a friend of mine. The team was offered a 2 day vacation reward for anyone tracking and solving the weird failures that they experienced. I guess this already is a good reason to continue reading…

    OK, we all know that when sending a control signal (better be a single one! - see the paper referenced above) from one clock domain to another, we must synchronize it at the other end by using a two stage shift register (some libraries even have a “sync cell” especially for this purpose).

    Take a look at the hypothetical example below

    sync_pitfall1.png sync_pitfall2.png

    Apparently all is well, the control signal, which is an output of some combinational logic, is being synchronized at the other end.
    So what is wrong?
    In some cases the combinational logic might generate a hazard, depending on the inputs. Regardless whether it is a static one (as depicted in the timing diagram) or a dynamic one, it is possible that exactly that point is being sampled at the other end. Take a close look at the timing diagram, the glitch was recognized as a “0″ on clk_b’s side although it was not intended to be.

    The solution to this problem is relatively easy and involves adding another sampling stage clocked with the sending clock as depicted below. Notice how this time the control signal at the other end was not recognized as a “0″. This is because the glitch had enough time to settle until the next rising edge of clk_a.

    sync_pitfall3.pngsync_pitfall4.png

    In general, the control signal sent between the two clock domains should present a strict behavior during switching- either 1–>0 or a 0–>1. Static hazards (1–>0–>1 or 0–>1–>0) or Dynamic hazards (1–>0–>1–>0 or 0–>1–>0–>1) are a cause for a problem.

    Just a few more lines on synchronization faults. Quite often they might pop up in only some of the designs. You might have 2 identical chips, one will show a problem the other not. This can be due to slight process variations that make some logic faster or slower, and in turn generate a hazard exactly at the wrong moment.