
Another Synchronization Pitfall…
May 18, 2007Many are the headaches of a designer doing multi clock domain designs. The basics that everyone should know when doing multi clock domain designs are presented in this paper. I would like to discuss on this post a lesser known problem, which is overlooked by most designers. Just as a small anecdote, this problem was encountered by a design team led by a friend of mine. The team was offered a 2 day vacation reward for anyone tracking and solving the weird failures that they experienced. I guess this already is a good reason to continue reading…
OK, we all know that when sending a control signal (better be a single one! – see the paper referenced above) from one clock domain to another, we must synchronize it at the other end by using a two stage shift register (some libraries even have a “sync cell” especially for this purpose).
Take a look at the hypothetical example below
Apparently all is well, the control signal, which is an output of some combinational logic, is being synchronized at the other end.
So what is wrong?
In some cases the combinational logic might generate a hazard, depending on the inputs. Regardless whether it is a static one (as depicted in the timing diagram) or a dynamic one, it is possible that exactly that point is being sampled at the other end. Take a close look at the timing diagram, the glitch was recognized as a “0″ on clk_b’s side although it was not intended to be.
The solution to this problem is relatively easy and involves adding another sampling stage clocked with the sending clock as depicted below. Notice how this time the control signal at the other end was not recognized as a “0″. This is because the glitch had enough time to settle until the next rising edge of clk_a.
In general, the control signal sent between the two clock domains should present a strict behavior during switching- either 1–>0 or a 0–>1. Static hazards (1–>0–>1 or 0–>1–>0) or Dynamic hazards (1–>0–>1–>0 or 0–>1–>0–>1) are a cause for a problem.
Just a few more lines on synchronization faults. Quite often they might pop up in only some of the designs. You might have 2 identical chips, one will show a problem the other not. This can be due to slight process variations that make some logic faster or slower, and in turn generate a hazard exactly at the wrong moment.