Real World Examples #5 – Clock Divider by 5

August 26, 2009

Here is a neat little circuit that was used in an actual project a long, long time ago (in a galaxy far, far away…).

The requirement was to build a divide by 5 circuit for the clock with 50% duty cycle. The initial (on reset) behavior was not important – i.e. the circuit could wake up in an undefined state, but should have settled after a given time. The engineer produced the circuit below:


Basically, the circuit is made out of a 3-bit counter, that counts from 000 to 100 and then resets. Signal ‘X’ goes high when the value of the counter is either 000, 001 or 010. Signal ‘Y’ goes high when the counter equals its ‘middle’ state 010. ‘Z’ is a sample on the falling edge of ‘Y’ in order to generate the 50% duty cycle.

So far so good. The general thinking was OK, but there was a major problem with the circuit, can you discover what it was? How would you fix it in RTL? and more important, how would you fix it in an ECO (as it was eventually done)?

No extra flops are allowed!


Dual Edge Binary Counters + Puzzle

June 24, 2009

I lately came across the need to use a dual edge counter, by this I mean a counter which is counting both on the rising and on the falling edge of the clock.
The limitation is that one has to use only normal single edge sensitive flops, the kind you find in each library.

There are several ways to do this, some easier than others. I would like to show you a specific design which is based on the dual edge flop I described in a previous post. This design is just used here to illustrate a point, I do not recommend you use it – there are far better ways. Please refer to the end of the post for more on that.

The figure below depicts the counter:
dual edge counter

The counter is made of 2 n-bit arrays of flops. The one operates on the rising edge, the other on the falling edge. The “+1” logic is calculated from the final XOR output, which is the real output of the counter! The value in each of the n-bit arrays does not represent the true counting value, but is used to calculate the final counter value. Do not make the mistake and use the value directly from either set of flops.

This leads to a small puzzle – given the conditions above, can this counter be done with less flops?


Reordering Nets for Low Power

May 10, 2009

As posts accumulate, you can see that low power design aspects is a big topic on this site. I try to bring more subtle design examples for lower power design that you can control and implement (i.e. in RTL and the micro architectural stage).

Identifying “glitchy” nets is not always easy. Some good candidates are wide parity or CRC calculations (deep and wide XOR trees), complicated arithmetic paths and basically most logic that originates in very wide buses and converges to a single output controlling a specific path (e.g. as a select pin of a MUX for a wide data path).

If you happen to identify a good candidate, it is advisable (when possible) that you feed the “glitchy” nets as late as possible in the calculation path. This way the total amount of toggling in the logic is reduced.

Sounds easy enough? Well the crux of the problem is identifying those opportunities – and it is far from easy. I hope this post at least makes you more aware of that possibility.

To sum up, here are two figures that illustrate the issue visually. The figure below depicts the situation before the transformation.. The nets which are highlighted represent high activity nets.

After the transformation – pushing the glitchy net calculation late in the path. The transformation is logically equivalent (otherwise there is no point…) and we see less high activity nets.



Parametrized Reset Values

April 19, 2009

For some odd reason some designers refuse to use parametrized blocks. I have no idea what are the reasons for such an opinion, but here is a good example why one would want to decide for the usage of parameters.

Imagine you need to design a block, which will be used several times throughout the design. The problem is, that each instance might need to have different reset values to some of its internal flops.
One (wrong) possibility is to define an extra input, which will in turn be connected as the reset value – but this is not something you’d like to do (why??)

The better option is to send the reset value as a parameter, which if it wasn’t clear by now, is the way to go.


Puzzle #14 – Multipliers

April 14, 2009

Here is an interview question that was circulating some of the message boards lately.
Can you create a 4×4 multiplier with only 2×2 multipliers at hand?

post your answers as a comment to this post.


Reducing Power Through Retiming

February 23, 2009

Here is an interesting and almost trivial technique for (potential) power reduction, which I never used myself, nor seen used in others’ designs. Well… maybe I am doing the wrong designs… but I thought it is well worth mentioning. So, if any of my readers use this, please do post a short comment on how exactly did you implement it and if it really resulted in some significant savings.

We usually have many high activity nets in the design. They are in many cases toggling during calculation more than once per cycle. Even worse, they often drive long and high capacitive nets. Since, in a usual synchronous design (which 99% of us do), we only need the stable result once per cycle – when the calculation is done – we can just put a register to drive the high capacitive net. The register effectively blocks all toggling on that net (where it hurts) and allows it to change maximum one time per cycle.

The image below tells the whole story. (a) is before the insertion of the flop, (b) right after.


This is all nice, but just remember that in real life it can be quite hard to identify those special nets and those special high toggling logic clouds. Moreover, most of the time we cannot afford the flop for latency reasons. But if you happen to be in the early design phase and you know more or less your floor plan, think about moving some of those flops so they will reduce the toggling on those high capacitive nets.


Transparent Pipelining

February 15, 2009

The nice thing about posting in such a site, is that one learns quite a bit with time. During the long pause I took, I tried to read quite a bit and look for some interesting papers. Yes, I am aware that most of my readers are not really interested in reading technical papers for fun, but the bunch that I collected are IMHO quite important and teach a lot.

The one for today is about a novel clocking scheme for latch based pipelines. I found it really interesting and important. I am sure that sometime I am going to implement this for a future design. The paper could have had more examples, and I bet that a few animations would only do good for that topic – but you can’t really ask for stuff like that in a technical paper, can you?

OK, enough of my words – you can find the paper here.


New Updates Coming Soon

February 4, 2009

I know it has been a while since I added new posts. There has been a lot going on here lately – new addition to the family, new job and some more smaller things, which keep me relatively busy lately.

Don’t give up on me just yet. I promise to keep the interesting posts coming, although maybe not on a weekly basis as I tried doing before.

Hope you guys understand…


Real World Examples #4 – More on “Thinking Hardware”

January 20, 2009

I was reviewing some code not so long ago, and noticed together with the owner of the code, that we had some timing problems.
Part of the code looked something like that (Verilog):

wire [127:0] a;
wire [127:0] b;
wire [127:0] c;
assign c = select_register ? a : b;

For those not familiar with Verilog syntax, the code describes a MUX construct using the ternary operator. The two data inputs for the MUX are “a” and “b” and the select is “select_register”.

So why was this code translated into a relatively slow design? The answer is in the width of the signals. The code actually synthesizes to 128 parallel MUX structures. The “select_register” has actually 128 loads.
When a construct like this is hidden within a large code, our tendency is to just neglect it by saying it is “only” 2:1 MUX deep, but we have to look more carefully than that – and always remember to consider the load.

Solving this problem is relatively easy by replication. Just creating more versions of the “select_register” helped significantly.


A Message for the New Year

January 10, 2009

Holiday season is gone, the new year is just starting and I am into preaching mood.

I get many, many emails from people asking me to help them with their designs, interview questions or just give advice. Sometimes, if I am not fast enough in replying, I even get complains and emails urging me to supply the answer “ASAP”. This is all OK and nice, but I would like you the reader to stop for a second and think on how much YOU are contributing to our community?

Not everyone can or likes to write a technical blog, but there are other options one can utilize – one of my favorites is posting on a forum. Even if you are a beginner in the field, post your questions, this is already a big help for many. I personally post from time to time on EDA board. Just go through that forum and have a quick look, some questions are very interesting while others can be extremely stupid (sorry) – who cares! What matters in my eyes, is that the forum is building a database of questions and answers that can help you and others.

I assume that most of my readers are on the passive side of things (just a hunch). I hope this post will make you open an account on one of the forums and start posting.

p.s. please use the comments section to recommend your favorite design related forums or groups.