An analog CMOS chip set for neural networks with arbitrary topologies

Lansner, John; Lehmann, Torsten

Published in:
IEEE Transactions on Neural Networks

Link to article, DOI:
10.1109/72.217186

Publication date:
1993

Document Version
Publisher's PDF, also known as Version of record

An Analog CMOS Chip Set for Neural Networks with Arbitrary Topologies

John A. Lansner and Torsten Lehmann

Abstract: An analog CMOS chip set for implementations of artificial neural networks (ANNs) has been fabricated and tested. The chip set consists of two cascadable chips: a neuron chip and a synapse chip. Neurons on the neuron chips can be interconnected at random via synapses on the synapse chips, thus implementing an ANN with arbitrary topology. The neuron test chip contains an array of 4 neurons with well-defined hyperbolic tangent activation functions, which are implemented by using "parasitic" lateral bipolar transistors. The synapse test chip is a cascadable 4 × 4 matrix-vector multiplier with variable, 10 bit resolution matrix elements. The propagation delay of the test chips was measured to be 2.6 μs per layer.

I. INTRODUCTION

Several approaches to artificial neural network (ANN) implementations in analog VLSI technology have been reported in the literature. Among other things, flexible topology [3], [11], [12], differential capacitive weight storage [4], [10], [13], inner product multipliers [1], [2], [10], and hyperbolic tangent activation functions [9], [10] have been considered. In this paper, we have combined and adapted the existing solutions with our own work to obtain an efficient general purpose ANN in analog VLSI. ANNs are often modeled as

\[ y = g(w^T x) \]

(1)

where \( y \) is the neuron activation vector, \( x \) is the input vector, \( w \) is the connection strength (synapse) matrix and \( g \) is a nonlinear function (a squashing function) [8], [7]. Thus a hardware ANN could consist of a matrix-vector multiplier (a synapse chip) followed by a squashing function vector (a neuron chip); it turns out that this splitting of the synapses and the neurons on separate chips provides easy expandability for fully parallel systems [3], [7], [12]. In this paper, we present such an analog CMOS chip set.
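As a minimal numerical sketch of the model above, the following Python fragment computes one layer as a matrix-vector product (the synapse part) followed by an element-wise squashing function (the neuron part). The function name `layer`, the indexing convention for `w`, the choice of tanh for \( g \), and the example values are illustrative assumptions, not part of the chip set.

```python
import math

def layer(w, x, g=math.tanh):
    """Return y = g(w^T x) for one layer.

    w[i][j] is the (assumed) weight from input i to neuron j,
    so w has len(x) rows and one column per neuron.
    """
    n_neurons = len(w[0])
    # Synapse part: matrix-vector product w^T x (one sum per neuron).
    s = [sum(w[i][j] * x[i] for i in range(len(x))) for j in range(n_neurons)]
    # Neuron part: element-wise squashing function.
    return [g(s_j) for s_j in s]

# Illustrative 2-input, 2-neuron example (values are arbitrary).
w = [[0.5, -1.0],
     [0.25, 2.0]]
x = [1.0, -1.0]
y = layer(w, x)
```

Splitting the computation into these two steps mirrors the division into a synapse chip and a neuron chip described in the text.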

II. THE HARDWARE

The signal representation was chosen to ensure the desired cascadability: the neuron chip has current inputs and voltage outputs and the synapse chip has voltage inputs and current outputs. Using this current-voltage scheme, the outputs from several synapses can be connected to one neuron input, and the output from one neuron can be distributed to several synapse chips. Thus, in principle, any ANN configuration can be made with these chips.
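The expandability that follows from current outputs can be illustrated with an idealised sketch: wiring the outputs of two synapse chips to the same neuron inputs adds their currents, which is equivalent to one wider multiplier. The function names, the 2 × 2 chip size, and the weight values (transconductances in arbitrary units) are assumptions made for illustration only.

```python
def synapse_chip(weights, v_in):
    """Ideal synapse chip: voltage inputs, one output current per row."""
    return [sum(w * v for w, v in zip(row, v_in)) for row in weights]

def sum_currents(*chip_outputs):
    """Wiring chip outputs to the same neuron inputs adds the currents."""
    return [sum(currents) for currents in zip(*chip_outputs)]

# Two hypothetical 2x2 chips share the same two neuron inputs.
chip_a = [[1.0, 2.0], [0.0, -1.0]]   # weights for inputs 0..1
chip_b = [[0.5, 0.0], [3.0, 1.0]]    # weights for inputs 2..3
x = [0.1, 0.2, -0.3, 0.4]

i_total = sum_currents(synapse_chip(chip_a, x[:2]), synapse_chip(chip_b, x[2:]))

# Same result from one 2x4 multiplier built by joining the rows.
wide = [ra + rb for ra, rb in zip(chip_a, chip_b)]
i_wide = synapse_chip(wide, x)
```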

A. The Neuron Chip

We have chosen the hyperbolic tangent, \( \tanh \), as the activation function for two reasons: 1) Due to the exponential nature of bipolar transistors the \( \tanh \) is simple to implement and hence well-defined; 2) it has a convenient gradient function which will make a future implementation of a learning algorithm for the ANN easy (simulations on required accuracy can be found in [7]).
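The convenience of the gradient rests on a standard identity: the derivative of \( \tanh \) can be written in terms of the function value itself, so it can be evaluated from the output alone. For a generic scaled output \( v = V_0 + A \tanh(s) \), where the offset \( V_0 \) and the amplitude \( A \) are placeholder symbols introduced here for illustration, we have

\[ \frac{d}{ds}\tanh(s) = 1 - \tanh^2(s) \quad \Longrightarrow \quad \frac{dv}{ds} = A \left[ 1 - \left( \frac{v - V_0}{A} \right)^2 \right]. \]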

The neuron chip contains an array of neurons. Each neuron has three stages as shown in Fig. 1(a)–(c). Because of the variable number of connected synapses per neuron, the neuron has to have an adjustable gain. The adjusted signal is transferred by a sigmoid function, the hyperbolic tangent.

The input current \( i_{s,j} \) (cf. (6)) is converted to a voltage \( v' \) by an opamp with feedback. The feedback is a controlled differential resistance, \( R_{gain} \), which acts as the gain-term factor. It is implemented with the "Double-MOSFET" method [1], [2], [14], using four NMOS transistors in the non-saturation region. The converted voltage is

\[ v' = R_{gain} i_{s,j}, \quad R_{gain} = \frac{1}{K_N (W_g/L_g) V_{gain}}, \quad V_{gain} = V_{gain1} - V_{gain2}. \]

(2)

\( K_N \), \( W_g \), and \( L_g \) denote the transconductance parameter, the channel width, and the channel length of the four \( M_g \) transistors, respectively. \( V_{gain} \) controls the gain-term factor. To keep the transistors operating in the non-saturation region we have \( V_{gain1}, V_{gain2} \in [1\,\text{V}, 5\,\text{V}] \Rightarrow V_{gain} \in [0\,\text{V}, 4\,\text{V}] \). The voltage \( v' \) is mapped by a hyperbolic tangent function to the voltage \( v_{out} \). The \( \tanh \) function is basically obtained from a differential pair of transistors. Using MOSFETs in the subthreshold mode is one possibility [9], but because of the signal levels we have instead chosen to use the "parasitic" lateral bipolar transistors inherent in a CMOS process (LPNPs) [5], operated in the active region. The difference current is given as a function of the voltage \( v' \),

\[ i_{C1} - i_{C2} = I_{bias} \alpha \tanh(v'/2V_t) \]

(3)

where \( V_t \) is the thermal voltage and \( \alpha = -i_C/i_E \), where \( i_E \) and \( i_C \) are the emitter and lateral collector currents, respectively, for a single LPNP. Because of the (vertical) substrate collector current we have \( \alpha \approx 1/2 \). The difference current is converted to a voltage by an opamp with feedback:

\[ v_{out} = V_{ref} + R_{tanh} I_{bias} \alpha \tanh(v'/2V_t), \]

\[ R_{tanh} = \frac{1}{K_N (W_t/L_t) V_{tanh}}, \quad V_{tanh} = V_{tanh1} - V_{tanh2}. \]

(4)

$W_t$ and $L_t$ are the channel width and length of the $M_t$ transistors. $V_{tanh}$ and $I_{bias}$ control the magnitude of the output range. To keep the transistors working in the non-saturation region we have $V_{tanh} \in [0\,\text{V}, 4\,\text{V}]$. $V_{ref}$ controls the center of the output range.

The transfer function for a neuron is given by (2) and (4),

$$v_{out} = V_{ref} + R_{tanh} I_{bias} \alpha \tanh\left(\frac{R_{gain} i_{s,j}}{2V_{t}}\right)$$

(5)

where $R_{gain}$ and $R_{tanh}$ are controlled by $V_{gain}$ and $V_{tanh}$ as stated in (2) and (4).
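To make the role of the control voltages concrete, the transfer function obtained by combining (2) and (4) can be evaluated numerically. The sketch below is only illustrative: the parameter values (`K_N`, the W/L ratios, `I_BIAS`, `V_REF`) are assumptions chosen for readability and are not taken from the fabricated chip. Only $\alpha \approx 1/2$ and the form of the equations come from the text.

```python
import math

V_T = 0.0259        # thermal voltage at about 300 K, in volts
ALPHA = 0.5         # LPNP current ratio, alpha ~ 1/2 as stated in the text
K_N = 50e-6         # assumed NMOS transconductance parameter, A/V^2
WL_GAIN = 1.0       # assumed W/L of the gain-stage transistors
WL_TANH = 1.0       # assumed W/L of the output-stage transistors
I_BIAS = 20e-6      # assumed bias current, A
V_REF = 0.0         # assumed center of the output range, V

def double_mosfet_resistance(k_n, w_over_l, v_ctrl):
    """Differential resistance R = 1 / (K_N * (W/L) * V_ctrl), as in (2) and (4)."""
    return 1.0 / (k_n * w_over_l * v_ctrl)

def neuron_output(i_in, v_gain, v_tanh):
    """v_out = V_ref + R_tanh * I_bias * alpha * tanh(R_gain * i_in / (2 V_t))."""
    r_gain = double_mosfet_resistance(K_N, WL_GAIN, v_gain)
    r_tanh = double_mosfet_resistance(K_N, WL_TANH, v_tanh)
    v_prime = r_gain * i_in                      # current-to-voltage conversion
    return V_REF + r_tanh * I_BIAS * ALPHA * math.tanh(v_prime / (2.0 * V_T))

# A larger V_gain lowers R_gain and therefore lowers the small-signal gain.
low_gain = neuron_output(1e-6, v_gain=3.0, v_tanh=1.0)
high_gain = neuron_output(1e-6, v_gain=0.1, v_tanh=1.0)
```

With these assumed values the output saturates at $R_{tanh} I_{bias} \alpha$ on either side of $V_{ref}$, which is how $V_{tanh}$ and $I_{bias}$ set the output range.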

**B. The Synapse Chip**

The synapse chip is a parallel, cascadable, analog CMOS matrix-vector multiplier (MVM), intended for use both in implementations of ANNs and, in the future, in implementations of learning algorithms. The synaptic weights are stored as differential voltages on capacitors, which are refreshed from a static RAM via a D/A converter [4], [13].

The $(m \times n)$ MVM consists of $m$ inner product vector multipliers (IPM’s) as shown in Fig. 2 [1], [2], [10]. The MOS transistors are working in the nonsaturation region. It can be shown [1] that the IPM output current ideally is given by

$$i_{s,j} = g_j \left( v_{OA,j} - V_{ref} \right) = \frac{g_j}{(W/L)_C \left( v_{C1} - v_{C2} \right)} \sum_{i=1}^{n} (W/L)_i \left( v_{w,i1} - v_{w,i2} \right) \left( v_{y,i1} - v_{y,i2} \right)$$

(6)

where $g_j$ is the transconductance of the output stage. The $(v_{w,i1} - v_{w,i2})$'s and $(v_{y,i1} - v_{y,i2})$'s are the voltage-represented coordinates of the two input vectors, $v_{C} \equiv (v_{C1} - v_{C2})$ is the control voltage for the “Double-MOSFET” feedback, and $V_{ref}$ is a reference voltage. The $(W/L)_i$'s are the width/length ratios of the $M_i$ transistors, and $(W/L)_C$ is that of the feedback transistors. Setting $v_{y,i1} - v_{y,i2} \propto x_i$ for all the IPM’s and $v_{w,i1} - v_{w,i2} \propto w_{ji}$ for the $j$th IPM gives the matrix-vector multiplier (cf. (1)).

To save pins, single-ended signals were selected on the chip (costing 1 bit of resolution); that is, $v_{w,j1} = v_{C2} = 2V$ and $v_{y,j2} = V_{ref} = -2V$. To ensure good resolution and high noise rejection (at the cost of linearity), large input voltage levels were selected on the synapse chip: $|v_{w,j1}|_{\max} = |v_{y,j1}|_{\max} = 1V$. The transconductor was implemented with $g_j = 100\, \mu S$.
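The multiplier behaviour can be illustrated with an idealised model in which each IPM output current is proportional to the sum of products of differential weight and input voltages, divided by the control voltage. This proportionality is an assumption based on the Double-MOSFET inner product structure cited from [1]. The W/L ratios, the control voltage value, and the function names are hypothetical; only $g_j = 100\, \mu S$ and the 1 V signal limit are taken from the text.

```python
G_J = 100e-6  # transconductance of the output stage, in siemens (from the text)

def ipm_current(v_w, v_y, v_c, wl_in=1.0, wl_fb=1.0, g=G_J):
    """Ideal inner-product multiplier output current.

    v_w, v_y: differential weight and input voltages (volts), one per input.
    v_c: differential control voltage of the feedback (volts).
    wl_in, wl_fb: assumed W/L ratios of input and feedback transistors.
    """
    inner = sum(wl_in * w * y for w, y in zip(v_w, v_y))
    return g * inner / (wl_fb * v_c)

def mvm_currents(weight_rows, v_y, v_c):
    """Matrix-vector multiplier: one IPM per row of weight voltages."""
    return [ipm_current(row, v_y, v_c) for row in weight_rows]

# Hypothetical 2x2 example with |v| <= 1 V and an assumed v_c of 2 V.
weights = [[1.0, -0.5],
           [0.0, 0.25]]
inputs = [0.5, 1.0]
currents = mvm_currents(weights, inputs, v_c=2.0)
```

In this model the control voltage acts as a global scale factor on all weights of a chip, while the individual weight voltages set the matrix elements.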

As the high impedance $v_{w,jk}$ inputs of the IPM’s are used as inputs for the matrix elements, these elements can be stored on the chip as charges on capacitors [4]. A differential sampling scheme [4] is used to write the matrix elements on the capacitors to reduce the effect of charge injection [6] and leakage currents. This way only four transistors and two capacitors are essentially needed for each matrix element, thus making the potential dimensions $(m \times n)_{\max}$ of the matrix large. The matrix unit element (a synapse) is shown in Fig. 3. In addition to the $m$ IPM’s, there is a row- and column-decoder on the synapse chip, which are used to address the synapses.

**III. EXPERIMENTAL RESULTS**

A $k = 4$ input/output neuron chip and an $n = 4$ input, $m = 4$ output synapse chip have been fabricated to illustrate the principle of operation. A neuron chip with 100 neurons and a synapse chip with $\leq 10^6$ synapses should be feasible. The area overhead on the synapse chip caused by opamps, feedbacks, transconductors and address decoders is $224973\, \mu m^2$ (or presently $\approx 6 \times$ the synapse area) per row.
**TABLE I**

<table>
<thead>
<tr>
<th>Property</th>
<th>Value</th>
<th>In LSBs</th>
</tr>
</thead>
<tbody>
<tr><td>Neuron size</td><td>$A_{neuron} = 379309\,\mu m^2$</td><td></td></tr>
<tr><td>Neuron nonlinearity</td><td>$D_{v} \approx 2\%$</td><td>6 LSB</td></tr>
<tr><td>Neuron derivative nonlinearity</td><td>$D_{d} \leq 10\%$</td><td>26 LSB</td></tr>
<tr><td>Neuron input offset</td><td>$|I_{off}| \leq 10\,\mu A$</td><td></td></tr>
<tr><td>Neuron output offset</td><td>$|V_{off}| \leq 5\,mV$</td><td></td></tr>
<tr><td>Neuron propagation delay<sup>1</sup></td><td>$t_{pd} \leq 1.8\,\mu s$</td><td>1/2 LSB</td></tr>
<tr><td></td><td>$t_{pd} \leq 0.8\,\mu s$</td><td>1/2 LSB</td></tr>
<tr><td>LPNP current gain</td><td>$\alpha \approx 0.55$</td><td></td></tr>
<tr><td>Synapse size</td><td>$A_{syn} = 33280\,\mu m^2$</td><td></td></tr>
<tr><td>Matrix offset</td><td>$|V_{off}| \leq 16\,mV$</td><td></td></tr>
<tr><td>Matrix resolution</td><td>$V_{swe} \leq 2\,mV$</td><td>1/4 LSB</td></tr>
<tr><td>Synapse nonlinearity</td><td>$D_{syn} \leq 16\%$</td><td>21 LSB</td></tr>
<tr><td>Synapse output offset</td><td>$|I_{off}| \leq 14\,\mu A$</td><td></td></tr>
<tr><td>Synapse input offset</td><td>$|V_{off}| \leq 6\,mV$</td><td></td></tr>
<tr><td>Synapse propagation delay<sup>1</sup></td><td>$t_{pd} \leq 2.0\,\mu s$</td><td>1/2 LSB</td></tr>
<tr><td>Synapse write time<sup>2</sup></td><td>$t_{write} \leq 150\,ns$</td><td>1/4 LSB</td></tr>
<tr><td>Matrix (weight) drift</td><td>$\leq 0.5\,mV/s$</td><td></td></tr>
<tr><td>Weight range</td><td>$[0.4, 40]$</td><td></td></tr>
<tr><td>Layer propagation delay<sup>1</sup></td><td>$t_{pd} \leq 2.6\,\mu s$</td><td>1/2 LSB</td></tr>
<tr><td></td><td>$t_{pd} \leq 1.1\,\mu s$</td><td>1/2 LSB</td></tr>
</tbody>
</table>

<sup>1</sup> Time from an input change until the output has settled within 1/2 LSB.

<sup>2</sup> Length of write pulse necessary to ensure that the output settles within 1/2 LSB.

A summary of the most important properties of the chips is shown in Table I. 1 LSB$_X$ is one least significant bit for an $X$ bit resolution of the appropriate signal. The nonlinearity, $D$, of a quantity $x$ is defined as the maximum deviation from the desired value $\xi$, normalized to its maximum magnitude: $D = \max |f(x) - \xi| / |\xi|_{\max}$, where $f(\cdot)$ is a nonlinear function. The offset errors and the nonlinearities cited in the table are caused by device mismatch (e.g., threshold voltage variations) and nonideal components (e.g., the field-dependent channel mobility) [14].

A measurement of the neuron transfer characteristics can be seen in Fig. 4(a). The maximum deviation from the desired tanh function, $D_{v}$, is about 2% of the output range. The gain is adjustable with a range of 1:30 (0.1 V $<$ $V_{gain}$ $<$ 3 V). The derivative of $v_{out}$ with respect to $i_{s,j}$ has been compared to $d\tanh(s)/ds$. The deviation ($D_{d}$) is less than 10% of the maximum value of $dv_{out}/di_{s,j}$. The synapse transfer characteristics are shown in Fig. 4(b). The characteristics showed good linearity ($D_{syn} \leq 3\%$, or 5 bits accuracy), with the exception of the case with negative $v_{w,j}$ values and positive $v_{y,i}$ values ($D_{syn} \leq 16\%$). This is because it was necessary to lower $V_{SS}$ to ensure a reasonable output current swing. The problem can be solved by improving the transconductor, and the resulting nonlinearity is estimated to be $D_{syn} \leq 3\%$. The synapse matrix resolution (i.e., the smallest $\Delta v_{w,j}$ distinguishable at the output) was measured to be $V_{swe} \leq 2$ mV, or at least 10 bits for a 2 V range of “matrix voltages” (note that we distinguish between resolution and accuracy). This should be sufficient for a range of ANN applications [7].
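The relation between the measured 2 mV step and the quoted 10 bit resolution is simple arithmetic, sketched below. The helper names are hypothetical, and the refresh-interval estimate is an illustrative calculation, not a figure reported in the paper: it assumes the worst-case drift of 0.5 mV/s listed in Table I and a tolerance of half a 10 bit LSB.

```python
import math

V_RANGE = 2.0        # range of matrix voltages, V (from the text)
V_STEP = 2e-3        # smallest distinguishable weight step, V (from the text)
DRIFT = 0.5e-3       # worst-case weight drift, V/s (Table I)

def resolution_bits(v_range, v_step):
    """Number of bits needed to count v_range / v_step levels."""
    return math.log2(v_range / v_step)

def max_refresh_interval(v_range, bits, drift, lsb_fraction=0.5):
    """Time for drift to consume lsb_fraction of one LSB at the given bit width."""
    lsb = v_range / 2 ** bits
    return lsb_fraction * lsb / drift

bits = resolution_bits(V_RANGE, V_STEP)               # just under 10 bits
t_refresh = max_refresh_interval(V_RANGE, 10, DRIFT)  # about 2 s
```

Under these assumptions a stored weight would need to be rewritten roughly every two seconds to stay within half an LSB, which indicates the order of magnitude of the refresh rate required from the static RAM and D/A converter.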

The output offset currents on the synapse chip and the input offset currents on the neuron chip are quite large. The reason could be that the opamps have low gains (< 60 dB), which together with opamp offset voltages of 2 mV would give the measured current offsets. This, however, is not necessarily a major problem (provided that the network is trained and used on the same chips), as the offset currents just displace the neuron biases [8]. Likewise, the matrix offset voltages could be used as small, random, initial weights when the network is trained. It should be noted that the offset errors are (mostly) nonsystematic.

Finally, measurements on two interconnected chips were made. In Fig. 5(a) the combined transfer characteristics of a synapse followed by a neuron are shown. The step response of the synapse-neuron combination is shown in Fig. 5(b). The delay through one layer of an ANN based on our chips can be measured on this curve: for an 8 bit output accuracy we have \( t_{pd} \leq 2.6 \mu s \). Experimental results on an ANN based on the chip set are not yet available; a PC expansion board is under development and results should be available in the near future.

IV. CONCLUSIONS

In this paper we have presented two cascadable, analog CMOS chips: a neuron chip and a synapse chip. The chips have been tested and have shown excellent properties with respect to ANN applications:

The neuron function is well-defined, and the derivative can be calculated directly from the output voltage. LPNP transistors work well as a differential pair. The adjustable gain ensures that the number of connected synapse inputs can be varied within a wide range.

The synapse matrix resolution is about 10 bits and the leakage currents in the capacitors holding the matrix elements are extremely small. The multiplication nonlinearities are small. Fast, accurate, analog neural networks with arbitrary topologies can be implemented by using full size neuron chips (with 100 neurons) and synapse chips (with 100\(^2\) synapses).

ACKNOWLEDGMENT

This work was performed as part of Ph.D. studies under the supervision of Prof. Erik Bruun. It was supported by the Danish Technical Research Council and the Danish Natural Science Council. Thanks are due to Thomas Kaulberg for the design of the amplifiers. The chips were fabricated through the EUROCHIP initiative.

REFERENCES


