Archive for the 'Synthesis' Category


ECO Flow

December 5, 2007

Here is a useful checklist you should use when doing your ECOs.

  1. RTL bug fix
    Correct your bug in the RTL, then run simulations for the specific test cases and for some of your general golden tests. Check that you corrected the problem and, more importantly, didn’t destroy any correct behavior.

  2. Implement ECO in synthesis netlist
    Using your spare cells and/or rewiring, implement the bug fix directly in the synthesis Verilog netlist. Remember that you do not re-synthesize the entire design; you are patching it locally.

  3. Run equivalence check between synthesis and RTL
    Using your favorite or available formal verification tool, run an equivalence check to see if the code you corrected really translates to the netlist you patched. Putting it simply - the formal verification tool runs through the entire state space and looks for an input vector that will create 2 different states in the RTL code and the synthesis netlist. If the two designs are equivalent, you can be sure that your RTL simulations would give the same result (logically speaking) as the synthesis netlist.

  4. Implement ECO in layout netlist
    You will now have to patch your layout netlist as well. Notice that this netlist is very different from the synthesis netlist. It usually has extra buffers inserted for edge shaping or hold violation correction, and its logic may even have been optimized in a totally different way.
    This is the real thing: a change here has to take into account the actual position of the cells, the actual names etc. Try to work in close proximity with the layout expert. Make sure you know and understand the floorplan as well - it is a very common mistake to connect a logic gate which is on the other side of the chip just because it is logically correct, when in reality the connection will violate timing requirements.

  5. Run equivalence check between layout and synthesis
    This is to make sure the changes you made in the layout netlist are logically equivalent to the synthesis netlist. Some tools and company internal flows enable a direct comparison of the layout netlist to the RTL. In many flows this is not possible, and one has to go through the synthesis netlist change as well.

  6. Layout to GDS / gate level simulations / STA runs on layout netlist (all that backend stuff…)
    Let the layout guys do their magic. As a designer you are usually not involved in this step.
    However, depending on your timing closure requirements, run STA on the layout netlist to see if everything is still ok. This step might be the most crucial, since even a very small change might create huge timing violations and you would have to redo your work.
    Gate level simulations are also recommended, depending on your application and internal flow.
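
To make the idea behind the equivalence-check steps concrete, here is a toy Python sketch for a purely combinational patch. The functions `rtl_reference` and `patched_netlist` and the logic they implement are made up for illustration. Real formal verification tools do not enumerate input vectors like this; the brute-force loop only shows what "finding an input vector that distinguishes the two designs" means.

```python
from itertools import product

def nand(*inputs):
    """N-input NAND, the only gate type used by the patch below."""
    return not all(inputs)

def rtl_reference(a, b, c):
    # Hypothetical corrected RTL behavior: out = (a AND NOT c) OR b
    return (a and not c) or b

def patched_netlist(a, b, c):
    # Hypothetical ECO patch wired up from spare NAND gates only
    not_c = nand(c, c)
    n1 = nand(a, not_c)      # NOT (a AND NOT c)
    not_b = nand(b, b)
    return nand(n1, not_b)   # (a AND NOT c) OR b, by De Morgan

def find_mismatch(f, g, num_inputs):
    """Return the first input vector where f and g differ, or None."""
    for vec in product([False, True], repeat=num_inputs):
        if bool(f(*vec)) != bool(g(*vec)):
            return vec
    return None

# None means no distinguishing vector exists: the patch matches the RTL.
print(find_mismatch(rtl_reference, patched_netlist, 3))
```

If the patch were wrong, `find_mismatch` would return a counterexample vector, which is essentially the debugging information one hopes to get from an equivalence checker as well.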


Spare Cells

November 26, 2007

What are spare cells and why the heck do we need them?

Spare cells are basically elements embedded in the design which are not driving anything. The idea is that maybe they will enable an easy (metal) fix without the need of a full redesign.

Sometimes not everything works after tape-out: a counter might not be reset correctly, a control signal needs to be additionally blocked when another signal is high etc. These kinds of problems could be solved easily if “only I would have another AND gate here…”
Spare cells aim to give a chance of solving those kinds of problems. Generally, the layout guys try to embed in the free spaces of the floor-plan some cells which are not driving anything. There is almost always free space around, and adding more cells doesn’t cost us in power (maybe in leakage in newer technologies), area (this space is there anyhow) or design time (the process is 99% automatic).
Having spare cells might mean that we are able to fix a design for a few tens of thousands of dollars (sometimes less) rather than a few hundred thousand.

So which spare cells should we use? It is always a good idea to have a few free memory elements, so I would recommend a few flip-flops. Even a number as low as 100 FFs in a 50K FF design is usually ok. Remember, you are not trying to build a new block, but rather to have a cheap possibility for a solution by rewiring some gates and FFs.
What gates should we throw in? If you remember some basic boolean algebra, you know that NANDs alone, or NORs alone, can create any boolean function! This means that integrating only NANDs or only NORs as spare cells would be sufficient. Usually, both NANDs and NORs are thrown in for more flexibility. 3 input, or even better 4 input NANDs and NORs should be used.
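
As a quick reminder of why one gate type is enough, here is a small Python sketch that builds NOT, AND and OR first from 2-input NANDs only and then from 2-input NORs only. The helper names are mine and purely illustrative.

```python
from itertools import product

def nand(a, b):
    return not (a and b)

def nor(a, b):
    return not (a or b)

# NOT, AND, OR built from NAND gates only
def not_from_nand(a):
    return nand(a, a)

def and_from_nand(a, b):
    return not_from_nand(nand(a, b))

def or_from_nand(a, b):
    return nand(not_from_nand(a), not_from_nand(b))

# NOT, AND, OR built from NOR gates only
def not_from_nor(a):
    return nor(a, a)

def or_from_nor(a, b):
    return not_from_nor(nor(a, b))

def and_from_nor(a, b):
    return nor(not_from_nor(a), not_from_nor(b))

def all_constructions_correct():
    """Check every construction against Python's own boolean operators."""
    for a, b in product([False, True], repeat=2):
        if not_from_nand(a) != (not a) or not_from_nor(a) != (not a):
            return False
        if and_from_nand(a, b) != (a and b) or and_from_nor(a, b) != (a and b):
            return False
        if or_from_nand(a, b) != (a or b) or or_from_nor(a, b) != (a or b):
            return False
    return True

print(all_constructions_correct())
```

Since NOT, AND and OR together can express any boolean function, so can either gate type on its own. In an actual ECO each of these helper gates costs one spare cell, which is one reason to have both types available.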

A small trick is tying the inputs of all NANDs to a logical “1” and all inputs of the NORs to a logical “0”. This way, if you decide to use only 2 of the 4 inputs, the other inputs do not affect the output (check it yourself). This in turn means less layout work when tying and untying the inputs of those spare cells.
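
If you would rather let a script "check it yourself", here is a minimal Python sketch of the tie-off trick. The function names are illustrative only; the point is that a 4-input spare gate with its unused inputs left at the tie-off value behaves exactly like a 2-input gate.

```python
from itertools import product

def nand4(a, b, c, d):
    return not (a and b and c and d)

def nor4(a, b, c, d):
    return not (a or b or c or d)

def nand2_from_spare(a, b):
    # Unused NAND inputs stay tied to logical 1
    return nand4(a, b, True, True)

def nor2_from_spare(a, b):
    # Unused NOR inputs stay tied to logical 0
    return nor4(a, b, False, False)

def tie_offs_are_neutral():
    """True if the tied-off 4-input gates match plain 2-input gates."""
    for a, b in product([False, True], repeat=2):
        if nand2_from_spare(a, b) != (not (a and b)):
            return False
        if nor2_from_spare(a, b) != (not (a or b)):
            return False
    return True

print(tie_offs_are_neutral())
```

Tying the NAND inputs to 0 (or the NOR inputs to 1) instead would force the output to a constant, so every input would have to be rewired before the cell could be used.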

The integration of spare cells is usually done after the synthesis step, and in the Verilog netlist it basically looks like an instantiation of library cells. This should not be done earlier, since the synthesis tool will just optimize all those cells away as they drive nothing. The layout guy has to somehow, by feeling (or black magic), spread the spare cells around evenly.

I believe that when an ECO (Engineering Change Order) is needed and a metal-fix is considered - this is where our real work as digital designers starts. I consider ECOs, and in turn the use of spare cells to solve or patch a problem, as the epitome of our use of skills, experience, knowledge and creativity!

More on ECOs will be written in the future…


A Short Note on Automatic Clock Gates Insertion

June 13, 2007

As we discussed before, clock gating is one of the most solid logic design techniques one can use when aiming for a low power design.
It is only natural that most tools on the market support an automatic clock gating insertion option. Here is a quote from a Synopsys article describing their Power Compiler tool:

…Module clock gating can be used at the architectural level to disable the clock to parts of the design that are not in use. Synopsys’ Power Compiler™ helps replace the clock gating logic inserted manually, gating the clock to any module using an Integrated Clock Gating (ICG) cell from the library. The tool automatically identifies such combinational logic…

But what does it really mean? What is this combinational logic that the tool “recognizes”?

The answer is relatively simple. Imagine a flip-flop with an enable signal. Implementation-wise, this is done with a normal flip-flop and a MUX in front of it, with a feedback path from the flop output back to the MUX to preserve the logical value of the flop when the enable is low. This is equivalent to a flop with the MUX removed and the enable signal controlling the enable of a clock gate cell, which in turn drives the clock for the flip-flop.

The picture below is better than any verbal explanation.

auto_clock_gating.png
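
For those who prefer to see the equivalence in numbers, here is a cycle-level Python sketch of the two structures. It is a simplified model under my own assumptions: one `(enable, data)` pair per clock cycle, an ideal clock gate, and no modeling of the latch inside a real ICG cell or of glitches on the enable. The function names are illustrative.

```python
def simulate_mux_flop(stimulus, q0=False):
    """Enable flop built as MUX + feedback; the flop is clocked every cycle."""
    q = q0
    trace = []
    clock_edges = 0
    for enable, data in stimulus:
        clock_edges += 1                # the flop sees every clock edge
        q = data if enable else q       # MUX: new data or fed-back old value
        trace.append(q)
    return trace, clock_edges

def simulate_gated_flop(stimulus, q0=False):
    """Plain flop whose clock passes through an ideal clock gate."""
    q = q0
    trace = []
    clock_edges = 0
    for enable, data in stimulus:
        if enable:                      # gate lets the edge through
            clock_edges += 1
            q = data                    # flop captures data on that edge
        trace.append(q)                 # otherwise it simply holds its value
    return trace, clock_edges

# One (enable, data) pair per clock cycle
stimulus = [(True, True), (False, False), (False, True),
            (True, False), (True, True)]

print(simulate_mux_flop(stimulus))
print(simulate_gated_flop(stimulus))
```

Both versions produce the same output sequence, but the gated flop only sees a clock edge in the cycles where the enable is high. Those saved clock edges are where the power saving comes from.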


Late Arriving Signals

May 23, 2007

As I mentioned before, it is my personal opinion that many digital designers distance themselves more and more from the physical implementation of digital circuits and concentrate more on the HDL implementation. A relatively simple construction like the one I am about to discuss is already quite hard to debug directly in HDL. With a visual aid showing what the circuit looks like, it is much easier (and faster) to find a solution.

The classic example we will discuss is that of a late arriving signal. Look at the picture below. The critical path through the circuit is along the red arrow. Let’s assume that there is a setup violation on FF6.

late_arriving_signal_1.png

Let’s also assume that in this example the logic cloud marked as “A”, which in turn controls the MUX that chooses between FF3 and FF4, is quite heavy. The combination of cloud “A” and cloud “B” plus the MUXes in sequence is just too much. But we have to use the result of “A” before calculating “B”! What can be done?

The most important observation is that we could duplicate the entire logic that follows “A”. In one copy of the duplicated logic we assume that the result of “A” was a logic “0”, and in the other copy that it was a logic “1”. Later we could choose between the two calculations. Another picture will make it clearer.

late_arriving_signal_2.png

Notice how the MUX that selected between FF3 and FF4 has vanished. There is now a MUX that selects between FF3 and FF5 (“A” result was a “0”) and a MUX in the parallel logic that selects between FF4 and FF5 (“A” result was a “1”).
At the end of the path we introduced a new MUX which selects between the two calculations we made, this time depending on cloud “A”. It is easy to see that although this implementation takes more area due to the duplicated logic, the calculation of the big logic clouds “A” and “B” is done in parallel rather than in series.
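
Without the pictures at hand, the transformation can also be sketched in Python. Everything concrete here is an assumption made for illustration: the contents of `cloud_a` and `cloud_b`, the 2-bit FF values, the extra select signal `s` for the FF5 MUX, and the delay numbers. Only the structure (one late select replaced by two speculative paths plus a final MUX) follows the description above.

```python
from itertools import product

def cloud_a(x):
    # Hypothetical "heavy" logic: parity of three input bits
    return (x[0] ^ x[1] ^ x[2]) == 1

def cloud_b(v):
    # Hypothetical downstream logic on a 2-bit value
    return (3 * v + 1) & 0x3

def mux(sel, when0, when1):
    return when1 if sel else when0

def original(x, ff3, ff4, ff5, s):
    first = mux(cloud_a(x), ff3, ff4)   # late "A" result selects FF3/FF4
    second = mux(s, first, ff5)         # next MUX in the sequence
    return cloud_b(second)              # "B" has to wait for "A"

def restructured(x, ff3, ff4, ff5, s):
    path0 = cloud_b(mux(s, ff3, ff5))   # computed as if "A" gave 0
    path1 = cloud_b(mux(s, ff4, ff5))   # computed as if "A" gave 1
    return mux(cloud_a(x), path0, path1)  # "A" only drives the last MUX

def designs_match():
    """Exhaustively compare both structures over all modeled inputs."""
    for x in product([0, 1], repeat=3):
        for ff3, ff4, ff5 in product(range(4), repeat=3):
            for s in (0, 1):
                if original(x, ff3, ff4, ff5, s) != restructured(x, ff3, ff4, ff5, s):
                    return False
    return True

def series_delay(t_a, t_b, t_mux):
    # A -> MUX -> MUX -> B, all in sequence
    return t_a + t_mux + t_mux + t_b

def parallel_delay(t_a, t_b, t_mux):
    # A runs alongside (MUX -> B); both meet at the final MUX
    return max(t_a, t_mux + t_b) + t_mux

print(designs_match())
print(series_delay(5, 4, 1), parallel_delay(5, 4, 1))
```

With these made-up delays (arbitrary units) the critical path shrinks from 11 to 6, at the price of a second copy of cloud “B” and one more MUX.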

This technique is relatively easy to implement and to spot if you have a circuit diagram of your design. Also do not count on the synthesis tool to do this for you. It might be able to do it with relatively small structures but when those logic clouds get bigger, you should implement this trick on your own - you will see improvements in timing (and often in synthesis run time). What you pay for is area and maybe power - nothing comes for free…