h1

Interview Question – BCD Digit, Multiplied by 5

December 21, 2008

A while back, someone sent me the interview question I am about to describe, asking for help. I think it serves a very good example of observing patterns and not rushing into conclusions.
I will immediately post the answer after describing the problem. However, I urge you to try and solve it on your own and see what you came up with. On we go with the question…

Design a circuit with minimum logic that receives a single digit, coded BCD (4 wires) and as an output gives you the result multiplied by 5 – also BCD coded (8 wires).

So, I hope you got a solution ready at hand and you didn’t cheat ;-) .

Let’s first make some order and present the input and required outputs in a table (always a good habit).

Looking for some patterns we can see that we actually don’t need any logic at all to solve this problem!!

You will be amazed how many people get stuck with a certain solution and believe it is the minimal one. Especially when the outcome is one or two single gates. When you tell them it can be done with less, they will easily find the solution. IMHO there is nothing really clever or sophisticated about this problem, but it demonstrates beautifully how it is sometimes hard for us to escape our initial ideas and approaches about a problem.

Coming to think of it, this post was more about psychology and problem solving than digital design – please forgive…

h1

A Coding Tip for Multi Clock Domain Designs

December 13, 2008

Multi clock domain designs are always interesting, but almost always hide some synchronization problems, which are not that trivial. There are tools on the market that identify all(??) clock domain crossings within a design. I personally had no experience with them, so I can’t give an opinion (although I heard some unflattering remarks from fellow engineers).

Seems like each company has its own ways of handling this problem. One of the oldest, easiest and IMHO one of the most efficient ways, is to keep strict naming guidelines for your signals, whether combinatorial or sequential !!

The most common way is to add a prefix to each signal which describes its driver clock e.g. clk_800_mux_32to1_out or clk_666_redge_acknowledge.

If you don’t use this simple technique, you won’t believe how useful it is. Many of the related problems of synchronization are actually discovered during the coding process itself. Moreover, it even makes life easier when doing the code review.

If you have more tips on naming convention guidelines for signals in RTL – post them as a comment!

h1

Another Reason to Add Hierarchies to Your Designs

November 30, 2008

We are usually very annoyed when the team leader insists on code restructuring and hierarchical design.
I also know this very well from the other side as well. Trying to convince designers to better restructure their own design which they know so very well already.

Well, here is another small, yet important reason why you might want to do this more often.
Assume your design is more or less mature, you ran some simulation, went through some synthesis runs and see that you don’t meet timing.
You analyze the timing report just to find a huge timing path optimized by the tool and made of tons of NANDs, NORs, XORs and what not. Well you see the starting point and the end point very clearly, but you find yourself asking if this is the path that goes through the MUX or through the adder maybe?

Most logic designs are extremely complicated and the circuit is not just something you can draw immediately on paper. Moreover, looking at a timing report of optimized logic, it is very hard to interpret the exact path taken through the higher level structured – or in other words, what part of the code I am really looking at here??!! Adding an hierarchy will also add its name to the optimized structures in the timing report and you could then easily pin point your problems.

I even saw an engineer that uses this technique as a debugging tool. If he has a very deep logic cloud, he will intentionally build an hierarchy around say a simple 2:1 MUX in the design and look for it in the timing report. This enables him to “feel” how the synthesis tool optimizes the path and where manual optimization needs to be implemented .

Use this on your bigger blocks, it saves a lot of time and effort in the long run.

h1

Challenge #3 – Counting the Number of “1″s

November 13, 2008

Time for a new challenge! The last two had some great responses and solutions. If you read through the comments you’d see there were some disagreements on what is the best approach. Some claimed a hand crafted approach is the best, while others said it was more of a theoretical problem and we should use a synthesis tool to solve it.
Both have pros and cons, although for those specific challenges I personally tend to go with the hand crafted approach – you, of course, don’t have to agree with me.

For this time we got a very practical problem that pops up again and again: counting the number of “1″s in a vector.
Use the metrics given in challenge #1 and find the minimal delay circuit for a combo cloud that counts the number of “1″s in an 8-bit vector. You get 8 bits in and supply 4 output bits which give a binary representation of the amount of “1″s in the 8-bit vector.

Oh and don’t forget to mention how your method scales when counting 16-bit and 32-bit vectors.

Ready, set, go!

h1

Closing the Gap Between ASIC and Custom

November 8, 2008

I don’t know why I did not came across this wonderful, wonderful (maybe I should add another “wonderful”…) book before.

First here is a link to the book’s site and an amazon link – and for those who are interested in a short overview, this short summery from DAC should give a hint what it is all about.

The book is mostly about increasing performance of your circuits. It surveys many techniques, problems and ideas (some are not fully supported by major EDA tools). It doesn’t matter really if you use these techniques or not – you will learn a lot about “closing the gap” (at least I did).

This gets my full recommendation and endorsement (if anybody cares about my opinion … :) )

h1

Fun With Enable Flip-Flops

October 27, 2008

Almost each library has enable flip-flops included. Unfortunately, they are not always used to their full potential. We will explore some of their potential in this post.

An enable flop is nothing but a regular flop which only registers new data if the enable signal is high, otherwise it keeps the old value. We normally implement this using a MUX and a feedback from the flop’s output as depicted below.

So what is the big deal about it? The nice thing is that the enable flop is already implemented by the guys who built the library in a very optimized way. Usually implementing this with a MUX before the flop will eat away from the cycle time you could otherwise use for your logic. However, a short glance at your library will prove that this MUX comes almost for free when you use an enable flop (for my current library the cost is 20ps).

So how can we use this to our advantage?

Example #1 – Soft reset coding

In many applications soft reset is a necessity. It is a signal usually driven by a register that will (soft) reset all flip flops given that a clock is running. Many times an enable “if” is also used in conjunction.
This is usually coded in this way (I use Verilog pseudo syntax and ask the forgiveness of you VHDL people):
always @(posedge clk or negedge hard_rst)
if (!hard_rst)
ff <= 1'b0;
else if (!soft_rst)
ff <= 1'b0;
else if (en)
ff <= D;

The above code usually results in the construction given in the picture below. The red arrow represents the critical timing path through a MUX and the AND gate that was generated for the soft reset.

Now, if we could only exchange the order of the last two “if” commands this would put the MUX in front of the AND gate and then we could use an enable flop… well, if we do that, it will not be logically equivalent anymore. Thinking about it a bit harder, we could use a trick – let’s exchange the MUX and the AND gate but during soft reset we could force the select pin of the MUX to be “1″, and thus transferring a “0″ to the flop! Here’s the result in a picture form.

We can now use an enable flop and we basically got the MUX delay almost for free. This may look a bit petty to you, but this trick can save you a few extra precious tens or hundreds of pico-seconds.

Example #2 – Toggle Flip Flops

Toggle flops are really neat, and there are many cool ways to use them. The normal implementation requires an XOR gate combining the T input and a feedback of the flop itself.

Let’s have a closer look at the logical implementation of an XOR gate and how it is related to a MUX implementation: (a) is a MUX gate equivalent implementation (b) is an XOR gate equivalent implementation and (c) is an XOR implemented from a MUX.

Now, let’s try making a T flop using an enable flop. We saw already how to change the MUX into an XOR gate – all that is left, is to put everything together.

h1

Challenge #2 – One Hot Detection

October 20, 2008

The last challenge was a big success with many people sending their solutions via email or just posting them as a comments.
Many of you said they were waiting for the next challenge. So, before returning to the usual set of posts about different aspects of digital design, let’s look at another one.

Imagine you have a vector of 8 bits, The vector is supposed to be one hot coded (only a single logic “1″ is allowed in the set). Your task if you choose to accept it :-) , is to design a combo block to detect if the vector is indeed one-hot encoded.

We are again looking for the block with the shortest delay. As for the solution metrics for this challenge please refer to the previous challenge.

Also try to think how your design scales when the input vector is 16 bits wide, 32 bits wide and the general case of n bits wide.

Good luck!

h1

Challenge #1 – DBI Detect

October 8, 2008

It has been a while since we had a challenge question on the site (last one was the divide by 3 question), and I would like to have more of those in the future. I will basically pose a problem and ask you to solve it under certain conditions – e.g. least hardware or latency, lowest power etc.

This time the challenge is related to a real problem I encountered recently. I reached a certain solution, which I do not claim to be optimal, actually I have the feeling it can be done better – I am therefore very interested in your own way of solving the problem.

Your challenge is to design a combo-block with 8 inputs and 1 output. You receive an 8-bit vector, If the vector contains 4 ‘1’s or more, the output should be high, otherwise low (This kind of calculation is commonly used for data bus inversion detection).

What is the best way to design it with respect to minimizing latency (in term of delay units), meaning the lowest logic depth possible.

Just so we could compare solutions, let’s agree on some metrics. I am aware that your own library might have different delay ratios between the different elements, but we gotta have something to work with.

  • Inverter – 1 delay unit
  • NOR, NAND – 2 delay units
  • AND, OR – 3 delay units
  • 3 or 4 input NOR, NAND – 4 delay units (2 for first stage + 2 for second stage)
  • 3 or 4 input OR, AND – 6 delay units (2 for first stage + 2 for second stage)
  • XOR, MUX – 7 delay units (2 AND/OR + 1 Inverter)
  • Please either post a comment with a detailed solution, or send me an email.

    Take it from here guys…

    h1

    Who Said Clock Skew is Only Bad?

    October 2, 2008

    We always have this fear of adding clock skew. Well, seems like this is one of the holy cows of digital design, but sometimes clock skew can be advantageous.

    Take a look at the example below. The capturing flop would normally violate setup requirements due to the deep logic cloud. By intentionally adding delay we could help make the clock arrive later and thus meet the setup condition. Nothing comes for free though, if we have another register just after the capturing one, the timing budget there will be cut.

    This technique can also be implemented on the block level as well. Assume we have two blocks A and B. B’s signals, which are headed towards A, are generated by a deep logic cloud. On the other hand A’s signals, which arrive at B, are generated by a rather small logic cloud. Skewing the clock in the direction of A now, will give more timing budget for the B to A signals but will eat away the budget from A to B’s signals.

    Inserting skew is very much disliked by physical implementation guys although a lot of the modern tools know how to handle it very nicely and even account for the clock re-convergence pessimism (more on this in another post). I have the feeling this dislike is more of a relic of the past, but as we push designs to be more complex, faster, less power hungry etc. we have to consider such techniques.

    h1

    K-Maps Walks and Gray Codes

    September 26, 2008

    It is this time of year maybe, but I just feel I have to write another post on Gray codes.
    We all remember our good friend the K-map (give yourself a point if you knew how to spell the full name, I’m getting it all wrong each time).

    By nature of its construction – a “walk” through the map will generate a Gray code, since each cell is different from its adjacent one by a single bit only. Moreover, If we return to the point of origin, we just created a cyclic Gray code.

    Draw yourself some 4×4 K-Maps and start playing around with the idea. Remember the K-map is like a toroid, moving off the map to the left pops us back in on the right side and in an analogous way for up/down, right/left and down/up.

    Here for instance is the good old “reflected Gray code” which is usually used in many applications which require “a Gray code”. Notice the different toggling cycles of the columns in the outcome sequence – 2-2-2-2-2-2-2-2-2…,4-4-4-4…,8-8… and 8-8….

    What if we take a slightly different tour through the map? Notice how now the 3 LSB columns have been rotated.

    Let’s try another way to walk the K-map, but maybe this time a little less symmetric (only one axis of symmetry). Look how now the toggling cycles of the columns became rather strange – no more something like 4-4-4-4… but rather 4-2-4-6… and other weird cycles.

    What if we need (for whatever strange reason) a non cyclic one? there is nothing easier than that. The start and the end point are not adjacent! which makes the sequence not a cyclic one.

    As you see there are many, many different Gray codes around. Sometimes it is just nice playing around with some combinations. For practical implementations, the only time I personally needed the non standard Gray code was when using a non power of 2 Gray code counter – a topic which was already discussed here.