12 Steps to Godlike Layout
by Mark Neidengard
Dense layout begins with an overall philosophy for placing pFETs and nFETs. Most operators, especially conventional static CMOS ones, require many P's and many N's with gate signals in common. The PUN and PDN should be kept close together for ease of connection. To avoid paying well-to-well spacing, cluster FETs of the same flavor. This suggests alternating "stacks" of P's and N's in which all transistors are parallel. Arrange the gates horizontally so that width can be altered without affecting cell height; this is important when designing to a bit-pitch. Arrange these columns in domino fashion (i.e. N-P-P-N-N-P-...) for the tightest spacing.
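To see why the domino ordering is tighter, here is a rough sketch that counts how many neighboring column pairs differ in flavor, on the simplifying assumption that every such pair pays one well-to-well spacing. The function name and the string encoding of columns are made up for illustration.

```python
def well_boundaries(columns):
    """Count adjacent column pairs whose flavors differ.

    Assumption for illustration: every P/N adjacency costs one
    well-to-well spacing, and same-flavor neighbors cost none.
    """
    return sum(1 for a, b in zip(columns, columns[1:]) if a != b)


domino = "NPPNNP"   # ordering suggested in the text
strict = "NPNPNP"   # strictly alternating ordering, for comparison

print(well_boundaries(domino), well_boundaries(strict))
```

For six columns the domino ordering crosses a well boundary three times, while strict alternation crosses it five times.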
Each stack of P or N will consist of one or more contiguous "islands" of diffusion with gates and contacts. Wherever possible, all transistors within an island should be the same width. Progressive sizing of transistors in series ("tapering") can have certain performance benefits, but doing so forces the nodes between transistors to be larger, to satisfy poly-to-active design rules. This increases the internal node parasitic capacitance and bloats the vertical height of the islands compared to islands of homogeneous transistors. My personal experience laying out for a high-performance CPU was that tapering is not essential to high performance design. My view is that tapering for boosted performance is almost never worth pessimizing the density of transistor islands.
The Minimum-Size Fallacy
Many people in the field of VLSI attach immense importance to minimum-size transistors, as though they are the most suitable for performing calculation. This idea is misleading for two reasons:
- Minimum-size devices are slow. Almost any real-world application where switching time matters will require greater current drive than what a puny minimum-size device provides. Device width is the direct means by which VLSI engineers trade power for speed: in the aforementioned CPU I worked on I used device widths from 4-lambda to 500-lambda. All critical circuits should be rigorously sized for performance, which seldom or never results in minimum-size devices: try it for yourself using L-Edit and Tspice.
- Interconnect tyranny dominates every aspect of modern VLSI. The more transistors, the more wires needed to connect them together. The more wires, the longer they get, and hence the larger their share of total parasitics. Contacts occupy a huge percentage of a minimum-sized FET's area, meaning almost none of that area is available for routing. Contact size is roughly constant, so wider devices fit more wires over them. If wiring area is taken as constant, it is often smart to make bigger, stronger transistors to fill that space and hence switch everything as fast as possible.
IDEAL LAYOUT SHOULD BE AREA-LIMITED NEITHER BY WIRING NOR BY TRANSISTORS, at least at the local interconnect level (i.e. within a single cell).
In a column of transistors that is "much larger" than minimum width but uses minimum-size contacts, there is considerable latitude in where precisely to put the contacts. All contacts that represent one node can be put in a straight line and connected with a straight piece of metal. If metal1 is 3-lambda wide, inter-metal1 spacing is 3-lambda, and contacts are 4-lambda on a side, you can fit two minimum-width wires with contacts in a ten-lambda wide chunk of column space. IT IS TO YOUR ADVANTAGE TO ROUTE PARALLEL PIECES OF VERTICAL METAL OVER YOUR TRANSISTOR STACKS.
As a corollary, I use 10-lambda as the minimum width for my nFETs, since it allows running two wires of metal1 over the device and still fits contacts.
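One way to account for the ten-lambda figure is sketched below. It assumes (this is my reading, not something the rules above spell out) that the two contacts are staggered vertically and that each contact sits flush with the outer edge of its wire, so at any given height only one wire is at full contact width. The function and constant names are illustrative.

```python
# Design-rule values quoted in the text, in lambda.
METAL1_WIDTH = 3
METAL1_SPACING = 3
CONTACT_SIZE = 4


def two_wire_column_width(metal_w=METAL1_WIDTH,
                          spacing=METAL1_SPACING,
                          contact=CONTACT_SIZE):
    """Column width needed for two contacted vertical metal1 wires.

    Assumed geometry: contacts are staggered vertically and each is
    flush with the outer edge of its own wire, so the widest cross
    section is one contact, one spacing, and one plain wire.
    """
    return max(contact, metal_w) + spacing + metal_w


def fits_two_wires(device_width):
    """True if a device of this width can carry two contacted wires."""
    return device_width >= two_wire_column_width()


print(two_wire_column_width())   # 4 + 3 + 3 with the values above
```

With the quoted rules this gives 4 + 3 + 3 = 10 lambda, which matches the minimum nFET width chosen above. Other contact arrangements would need a different sum.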
If we are going to use big transistors, we may find ourselves wanting very big transistors. A useful technique is to treat a device of width W as K devices of width W/K. You can then lay those devices out in a "domino" fashion with shared source and drain nodes: not only does this reduce the width of the transistor column, it also reduces the total area of the source and drain nodes (due to sharing). It can also make it easier to form contiguous islands, since the source or drain can be made adjacent to two other devices instead of just one.
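The bookkeeping behind that claim can be sketched as follows. The helper name is hypothetical, and the count assumes the fingers sit side by side in one island so that each interior source or drain strip is shared by the two gates next to it.

```python
def fold_device(width, fingers):
    """Split one device of the given width into parallel fingers.

    Returns (finger_width, shared_regions, unshared_regions), where
    shared_regions counts source/drain strips when neighboring fingers
    share diffusion, and unshared_regions counts them when every finger
    gets its own private source and drain.
    """
    if fingers < 1:
        raise ValueError("need at least one finger")
    finger_width = width / fingers
    shared_regions = fingers + 1      # strips alternate S, D, S, ... between gates
    unshared_regions = 2 * fingers    # one source and one drain per finger
    return finger_width, shared_regions, unshared_regions


# A 500-lambda device laid out as four fingers.
print(fold_device(500, 4))
```

Four fingers of 125 lambda each need five diffusion strips when shared, against eight if each finger were drawn as a separate device.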
It will often be the case that running metal over your transistors will leave little area for contacts. When this happens, if you don't want to stretch the entire stack to accommodate the contact space, you can simply push the contact some or all the way off the rectangle of the stack and use enough diffusion to keep it connected. This technique can improve density substantially, at the cost of increasing diffusion area and hence parasitic capacitance (and to a certain extent, the series resistance seen by the signal being connected).
Since CMOS is ratio-less, we seldom care about the precise width of our devices. In an island of parallel transistors with contacts in between, all diffusion area besides the actual contacts themselves is "wasted". You can bend transistors around the contacts to squeeze some of this extra area out. A side-effect is reduction of overall island height which helps total density. It's almost always a good idea to widen the island a bit if it will permit much better "snaking" of transistors and height reduction. Contact eviction can also help get those extra kinks in.
DON'T USE THIS FOR ANALOG VLSI UNLESS YOU REALLY KNOW WHAT YOU'RE DOING.
In cases where there are multiple transistors in series, the designer has a choice of which to put closer to the output of the operator and which to put closer to the power supply. Compared to a simple series of transistors, a branching series like A&(B|C) will have "extra" diffusion area due to the extra source and drain diffusion. By reordering the chain, you can decide which node to lump that diffusion onto.
- Lump "extra" diffusion onto the power supply nodes wherever possible. The power supplies never switch and therefore don't care about how much parasitic capacitance they see: in fact, capacitance on the power supply actually helps fight certain kinds of circuit misbehavior.
- Failing that, for dynamic circuits lump "extra" diffusion on the output node to minimize worst-case charge sharing. A little switching time is worth added correctness.
- Failing that, lump the diffusion where it will least affect switching time (i.e. next to transistors with early input signals or behind infrequently-on transistors).
Euler paths are the way to figure out which transistors can be gathered together into an island. When composing your transistor networks prior to layout, do so with an eye to what the Euler paths look like. Any time you can have congruent Euler paths for both PUN and PDN, the transistor islands can be trivially connected with straight lines of metal or poly: no ugly routing required. In fact, it is possible to add connections to certain transistor networks to create or rearrange Euler paths for greater congruence. Experience indicates that a small performance hit is worth increased routing elegance.
Picking congruent Euler paths sometimes conflicts with diffusion offloading. I recommend favoring Euler congruency unless there is a huge performance or correctness difference visible with SPICE.
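For small cells, congruent Euler paths can be found by brute force. The sketch below treats each network as a list of (node, node, gate) edges, enumerates every Euler trail by backtracking, and intersects the resulting gate orderings. The function name, the node names, and the example operator (the inverting gate for A&(B|C)) are illustrative choices; exhaustive search like this is only reasonable for a handful of transistors.

```python
def euler_gate_orders(edges):
    """Return the set of gate orderings over all Euler trails.

    edges: list of (node_a, node_b, gate) tuples, one per transistor.
    A trail uses every transistor exactly once, moving through shared
    diffusion nodes, so it corresponds to one contiguous island.
    """
    orders = set()
    nodes = {n for a, b, _ in edges for n in (a, b)}

    def walk(node, used, seq):
        if len(seq) == len(edges):
            orders.add(tuple(seq))
            return
        for i, (a, b, gate) in enumerate(edges):
            if i in used or node not in (a, b):
                continue
            nxt = b if a == node else a
            used.add(i)
            seq.append(gate)
            walk(nxt, used, seq)
            seq.pop()
            used.remove(i)

    for start in nodes:
        walk(start, set(), [])
    return orders


# Pull-down network: A in series with (B parallel C).
pdn = [("out", "x", "A"), ("x", "gnd", "B"), ("x", "gnd", "C")]
# Pull-up network: A in parallel with (B series C).
pun = [("vdd", "out", "A"), ("vdd", "y", "B"), ("y", "out", "C")]

# Gate orderings that work for both islands at once.
common = euler_gate_orders(pdn) & euler_gate_orders(pun)
print(sorted(common))
```

Any ordering in `common` lets the poly gates run straight across both islands. Here A-B-C qualifies, while B-A-C does not, because the pull-down network has no trail that visits the gates in that order.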
When positioning two islands end-to-end, try not to let the contacts at the edges be lined up vertically. If you can avoid doing so, you can push the islands closer together by as much as four lambda. Some contact eviction may also be helpful here.
Poly and metal1 are for local interconnect and tend to run in all directions. All higher layers should run either primarily vertical or primarily horizontal. Doing this promotes better routing density and more orderly layout, and is crucial for highly hierarchical designs.
The uglier the routing problem, the greater the urge to jump to a higher metal layer for a solution. Modern processes have many metal layers, but also have more long-range signals and power supplies to route. Metal2 can probably be used to assist local routing, but assume that the higher routing layers are already full and only use them for local routing when you have a really good reason. 95% of the routing battle is won or lost on poly and metal1 (and for very complex cells, metal2).