
Analog Integrated Circuits and Signal Processing, 13, 195–209 (1997) © 1997 Kluwer Academic Publishers, Boston. Manufactured in The Netherlands.

Analog VLSI Stochastic Perturbative Learning Architectures∗

GERT CAUWENBERGHS gert@bach.ece.jhu.edu

Electrical and Computer Engineering, Johns Hopkins University, Baltimore, MD 21218

Received May 20, 1996; Accepted October 17, 1996

Abstract. We present analog VLSI neuromorphic architectures for a general class of learning tasks, which include supervised learning, reinforcement learning, and temporal difference learning. The presented architectures are parallel, cellular, sparse in global interconnects, distributed in representation, and robust to noise and mismatches in the implementation. They use a parallel stochastic perturbation technique to estimate the effect of weight changes on network outputs, rather than calculating derivatives based on a model of the network. This “model-free” technique avoids errors due to mismatches in the physical implementation of the network, and more generally allows training of networks whose exact characteristics and structure are not known. With additional mechanisms of reinforcement learning, networks of fairly general structure are trained effectively from an arbitrarily supplied reward signal. No prior assumptions are required on either the structure of the network or the specifics of the desired network response.

Key Words: Neural networks, neuromorphic engineering, reinforcement learning, stochastic approximation

1. Introduction

Carver Mead introduced “neuromorphic engineering” [1] as an interdisciplinary approach to the design of biologically inspired neural information processing systems, whereby neurophysiological models of perception and information processing in living organisms are mapped onto analog VLSI systems that not only emulate their functions but also resemble their structure [2].

Essential to neuromorphic systems are mechanisms of adaptation and learning, modeled after neural “plasticity” in neurobiology [3], [4]. Learning can be broadly defined as a special case of adaptation whereby past experience is used effectively in readjusting the network response to previously unseen, although similar, stimuli. Based on the nature and availability of a training feedback signal, learning algorithms for artificial neural networks fall under three broad categories: unsupervised, supervised, and reward/punishment (reinforcement). Physiological experiments have revealed plasticity mechanisms in biology that correspond to Hebbian unsupervised learning [5], and classical (Pavlovian) conditioning [6], [7] characteristic of reinforcement learning.

∗This work was supported by ARPA/ONR under MURI grant N00014-95-1-0409. Chip fabrication was provided through MOSIS.

Mechanisms of adaptation and learning also provide a means to compensate for analog imperfections in the physical implementation of neuromorphic systems, and fluctuations in the environment in which they operate. Examples of early implementations of analog VLSI neural systems with integrated adaptation and learning functions can be found in [8]. While off-chip learning can be effective as long as training is performed with the chip “in the loop”, chip I/O bandwidth limitations make this approach impractical for networks with a large number of weight parameters. Furthermore, on-chip learning provides autonomous, self-contained systems able to adapt continuously in the environment in which they operate.

On-chip learning in analog VLSI has proven to be a challenging task for several reasons. The least of the problems is the need for local analog storage of the learned parameters, for which adequate solutions exist in various VLSI technologies [34]–[39]. More significantly, learning algorithms that are efficiently implemented on general-purpose digital computers do not necessarily map efficiently onto analog VLSI hardware. Second, even if the learning algorithm supports a parallel and scalable architecture suitable for analog VLSI implementation, inaccuracies in the implementation of the learning functions may significantly affect the performance of the trained system. Third, the learning can only effectively compensate for inaccuracies in the network implementation when their physical sources are contained directly inside the learning feedback loop. Algorithms which assume a particular model for the underlying characteristics of the system being trained are expected to perform more poorly than algorithms which directly probe the response of the system to external and internal stimuli. Finally, typical learning algorithms make assumptions on prior knowledge which place heavy constraints on practical use in hardware. This is particularly the case for physical systems for which neither the characteristics nor the optimization objectives are properly defined.

This paper addresses these challenges by using stochastic perturbative algorithms for model-free estimation of gradient information [9], in a general framework that includes reinforcement learning under delayed and discontinuous rewards [10]–[14]. In particular, we extend earlier work on stochastic error-descent architectures for supervised learning [15] to include computational primitives implementing reinforcement learning. The resulting analog VLSI architecture retains desirable properties of a modular and cellular structure, model-free distributed representation, and robustness to noise and mismatches in the implementation. In addition, the architecture is applicable to the most general of learning tasks, where an unknown “black-box” dynamical system is adapted using an external “black-box” reinforcement-based delayed and possibly discrete reward signal. As a proof of principle, we apply the model-free training-free adaptive techniques to blind optimization of a second-order noise-shaping modulator for oversampled data conversion, controlled by a neural classifier. The only evaluative feedback used in training the classifier is a discrete failure signal which indicates when some of the integrators in the modulation loop saturate.

Section 2 reviews supervised learning and stochastic perturbative techniques, and presents a corresponding architecture for analog VLSI implementation. The following section covers a generalized form of reinforcement learning, and introduces a stochastic perturbative analog VLSI architecture for reinforcement learning. Neuromorphic implementations in analog VLSI and their equivalents in biology are the subject of section 4. Finally, section 5 summarizes the findings.

2. Supervised Learning

In a metaphorical sense, supervised learning assumes the luxury of a committed “teacher”, who constantly evaluates and corrects the network by continuously feeding it target values for all network outputs. Supervised learning can be reformulated as an optimization task, where the network parameters (weights) are adjusted to minimize the distance between the targets and actual network outputs. Generalization and overtraining are important issues in supervised learning, and are beyond the scope of this paper.

Let y(t) be the vector of network outputs with components y_i(t), and correspondingly y^target(t) be the supplied target output vector. The network contains adjustable parameters (or weights) p with components p_k, and state variables x(t) with components x_i(t) (which may contain external inputs). Then the task is to minimize the scalar error index

E(p; t) = ∑_i |y_i^target(t) − y_i(t)|^ν   (1)

in the parameters p_k, using a distance metric with norm ν > 0.
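To make the error index concrete, here is a minimal numerical sketch of (1); the example values and the default choice ν = 2 are illustrative assumptions, not values from the paper:

```python
import numpy as np

def error_index(y_target, y, nu=2.0):
    """Scalar error index E(p; t) of Eq. (1): the sum over all outputs
    of |y_target_i - y_i|^nu, for any norm exponent nu > 0."""
    return float(np.sum(np.abs(y_target - y) ** nu))

# Three network outputs against their targets, quadratic norm (nu = 2):
y_target = np.array([1.0, 0.0, 0.5])
y = np.array([0.8, 0.1, 0.5])
E = error_index(y_target, y)  # E ≈ 0.2**2 + 0.1**2 + 0 = 0.05
```

With ν = 2 this is the familiar sum-of-squares error; ν = 1 gives an absolute-error metric that is less sensitive to outliers.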

2.1. Gradient Descent

Gradient descent is the most common optimization technique for supervised learning in neural networks, which includes the widely used technique of backpropagation (or “dynamic feedback”) [16] for gradient derivation, applicable to general feedforward multilayered networks.

In general terms, gradient descent minimizes the scalar performance index E by specifying incremental updates in the parameter vector p according to the error gradient ∇_p E:

p(t + 1) = p(t) − η ∇_p E(t).   (2)
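In code, one iteration of (2) is a single vector update; the quadratic toy error and learning rate below are illustrative assumptions, not values from the paper:

```python
import numpy as np

def gradient_descent_step(p, grad_E, eta=0.1):
    """One parameter update of Eq. (2): p(t+1) = p(t) - eta * grad_E."""
    return p - eta * grad_E

# Descend on the toy error E(p) = ||p||^2, whose exact gradient is 2p:
p = np.array([1.0, -2.0])
p = gradient_descent_step(p, 2 * p)  # p becomes [0.8, -1.6]
```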

One significant problem with gradient descent and its variants for on-line supervised learning is the complexity of calculating the error gradient components ∂E/∂p_k from a model of the system. This is especially so for complex systems involving internal dynamics in the state variables x_j(t):

∂E/∂p_k = ∑_{i,j} ∂E(t)/∂y_i · ∂y_i(t)/∂x_j · ∂x_j(t)/∂p_k   (3)


where derivation of the dependencies ∂x_j/∂p_k over time constitutes a significant amount of computation that typically scales super-linearly with the dimension of the network [15]. Furthermore, the derivation of the gradient in (3) assumes accurate knowledge of the model of the network (y(t) as a function of x(t), and recurrence relations in the state variables x(t)). Accurate model knowledge cannot be assumed for analog VLSI neural hardware, due to mismatches in the physical implementation which cannot be predicted at the time of fabrication. Finally, a model for the system being optimized may often not be readily available, or may be too complicated for practical (real-time) evaluation. In such cases, a black-box approach to optimization is more effective in every regard. This motivates the use of the well-known technique of stochastic approximation [17] for blind optimization in analog VLSI systems. We apply this technique to supervised learning as well as to more advanced models of “reinforcement” learning.
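The stochastic approximation idea can be sketched as follows: perturb all parameters in parallel with random binary increments, measure the resulting change in E by evaluation alone, and descend along the resulting gradient estimate. This is a minimal illustration in the spirit of the stochastic error-descent technique [15]; the perturbation distribution, step sizes, and quadratic test function are illustrative assumptions, not the paper's circuit-level algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)

def perturbative_step(p, evaluate_E, sigma=1e-3, eta=0.1):
    """One model-free update: estimate the gradient of E from two function
    evaluations under a single random parallel perturbation, with no
    derivatives of the network model required."""
    # Random binary perturbation applied to all parameters at once.
    pi = sigma * rng.choice([-1.0, 1.0], size=p.shape)
    # Two-sided finite difference along the perturbation direction.
    dE = evaluate_E(p + pi) - evaluate_E(p - pi)
    # Simultaneous-perturbation gradient estimate, componentwise.
    grad_est = dE / (2.0 * pi)
    return p - eta * grad_est

# Blind optimization of a function we can only evaluate, never differentiate:
target = np.array([1.0, 2.0])
def E(q):
    return float(np.sum((q - target) ** 2))

p = np.zeros(2)
for _ in range(200):
    p = perturbative_step(p, E)
# p converges toward target without any gradient computation.
```

Only two evaluations of E are needed per update regardless of the number of parameters, which is what makes the approach attractive for parallel analog hardware, where E is measured directly on the physical network rather than computed from a model.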