An Adder Tree in Lava

Consider the task of adding eight numbers n1, n2, .. n8 use a binary tree of adders as shown below.

In a language like VHDL one can write a recursive function that takes an unconstrained array of numbers as input and returns their sum computed with an adder tree. However, there is no standard way in VHDL of specifying how the adder tree should be laid out. Adder trees are typically pipelined and it takes just one adder to be badly placed to degrade the performance of the whole adder tree. Since Lava can specify how the adders are to be laid out the design engineer can make decisions and calculations about intermediate wire delays. One example layout scheme for a pipelined adder tree is shown below (the clock wires are now shown):

This layout scheme places the adders in a row with the final addition occurring in the middle. Both the left and right hand sides of the final adder also contain adder trees in which the final sum occurs in the middle. This recursive structure continues until the leaf additions are encountered (e.g. the adder for n1 and n2).

The middle Combinator

Note that this combinator produces a two-sided tile which has two inputs on the left and one output on the right.

Tree Circuits in Lava

Using the middle combinator we can define a tree circuit combinator for any kind of two input and one output circuit:

tree circuit [a] = a 
tree circuit [a, b] = circuit (a, b) 
tree circuit xs 
  = middle (tree circuit) circuit (tree circuit)(halve xs) 

A combinational adder tree with bit-growth can be made with the flexibleAdder circuit. This circuit has the type (Bit, Bit) -> (Bit) so it is suitable for use as an element of an adder tree. To define a combinational adder tree one simply writes:

adderTree = tree flexibleAdder

It is possible to rewrite a call to the tree combinator to see how it works for a particular circuit and input list:

tree flexibleAdder [i1, i2, i3, i4]
  = (middle (tree flexibleAdder) circuit (tree flexibleAdder)) (halveList [i1, i2,i3, i4]) 
  = (middle (tree flexibleAdder) circuit (tree flexibleAdder)) [[i1, i2], [i3, i4]] 
  = flexibleAdder (tree flexibleAdder [i1, i1], tree flexibleAdder [i2, i3]) 
  = flexibleAdder (i1+i2, i3+i4) 
  = i1+i2+i3+i4

We can use the flexibleAdderFD circuit to make a pipelined adder tree:

adderTreeFD clk = tree (flexibleAdderFD clk)

The adders in this tree will use FD register components. An example layout of four 96-input pipelined adder trees stacked upon each other is shown below on a XCV300 part. The bit-growth in the adder tree helps to identify the recursive nature of the layout.

The adder trees shown above sum 96 9-bit numbers each and with this layout the circuit operates at 173MHz on a XCV300 at speed grade 6. If the layout information is removed and the same netlist is placed and routed again the maximum speed is 129MHz i.e. 34% slower. The graph below shows that specifying layout had significant performance advantages. The blue bars represent the speed of circuits with layout information and the red bars represented the speed of the same netlist with the layout information (RLOCs) removed. Each pair of bars corresponds to a target timing specification. The leftmost pair of bars is for the case where the place and route tools were asked to meet a timing specification of 180MHz but delivered 173MHz for the RLOC-ed circuit and 129MHz without the RLOCs.

Providing layout information also speeds out the place and route times as shown in the graph below. For the 180MHz timespec case the netlist with layout took less then 400 seconds to route (nothing needs to be placed) but took more than 1800 seconds to place and route without layout information.

The FPGA Editor view of the four placed adder trees is shown below.

Without the RLOCs the following implementation is produced:

The adder tree layout was easy to express in Lava and has produced significant performance advantages. Many other tree layout schemes can be easily explored in Lava. It is also very easy to make trees of other kinds of circuits like multiplier etc. All one has to do is to supply such circuits to the first argument of the tree combinator.

Next section: A Sorter Example in Lava