2016 International Conference on Network and Information Systems for Computers
An Adaptive Statistical Neural Network Model
Peilei LIU
College of Computer, National University of Defense Technology, Changsha 410073, China
Department of Information Resource Management, Academy of National Defense Information, Wuhan 430010, China
E-mail: [email protected]

Jintao TANG, Haichi LIU, Ting WANG
College of Computer, National University of Defense Technology, Changsha 410073, China
E-mail: {tangjintao, liuhaichi, tingwang}@nudt.edu.cn
Abstract—Traditional back-propagation algorithms for artificial neural networks are supervised batch-learning algorithms, and they face challenges from the huge volume of real-time data on the Internet. In this paper, we put forward an unsupervised incremental learning model: an adaptive statistical neural network model. It is built on the foundation of statistical theory rather than gradient descent search. Experiments on classical datasets demonstrate that the clustering algorithm of this model is comparable with traditional clustering algorithms such as K-means. Moreover, it can also execute supervised learning in theory.

Keywords—neural network; statistical learning; incremental learning; clustering algorithm
I. INTRODUCTION
It is well known that neural networks with multiple layers are powerful in representation [1]. However, the most popular learning algorithms, such as the BP (back propagation) algorithm [2] and the Boltzmann machine [3], are supervised batch-learning algorithms. With the development of the Internet and social media, a huge amount of unlabeled real-time data is produced every day. This poses a challenge to these traditional algorithms. Taking the BP algorithm as an instance, it uses the gradient descent method for training the neural network. As a result, its training process is time-consuming compared with other models such as the ELM (extreme learning machine) [4]. And as a supervised algorithm, it can hardly make use of the huge amount of unlabeled data. In addition, the BP algorithm is not good at incremental learning due to the "catastrophic forgetting" problem [5]: when learning new data, a BP network will forget what it has already learned. Therefore, whenever a small amount of new data arrives, the BP algorithm has to retrain the whole model from scratch. This is different from the biological neural network. Finally, deep learning has been successful in applications [6-8], and a fast unsupervised learning algorithm is important to the pretraining phase of a deep network [9].

In contrast to BP networks, statistical machine learning models such as ELM and SVM (support vector machine) use different learning methods. They have a solid theoretical foundation in statistics. For example, ELM is trained through direct computation instead of gradient descent search. Therefore, its learning is very fast. However, the learning mechanism of ELM only fits the specific structure of a single-hidden-layer network, and it is hard to extend to deep networks. As a result, ELM cannot make use of the powerful representative ability of deep networks. In addition, these models are in fact supervised batch-learning algorithms as well.

There have been some self-organized or self-adaptive neural network models which can execute incremental unsupervised learning, such as ART (Adaptive Resonance Theory) [5]. ART is also good at explaining the cognitive mechanisms of the human brain. However, there are some suspicions about ART. For example, it lacks a sufficiently solid theoretical foundation [10], and its learning algorithm and parameter settings depend more on the engineer's experience than on mathematical theory. In addition, the clusters in ART, which are represented by neurons, are unstable: as learning progresses, these clusters can change a lot from their initial state, and the results depend upon the order in which the training data are processed. ART has not given a satisfactory theoretical explanation for this.

The biological neural network is an inspiration: obviously, it can execute incremental unsupervised learning. Considering the slow computing speed of neurons, the learning mechanism of the biological neural network must be very fast. According to evidence from neuroscience, synaptic connections are self-adaptive to outside stimuli, and the neural network is self-organized by neurons. Inspired by the learning mechanism of the biological neural network, we put forward a neural network model called ASNN (adaptive statistical neural network) in this paper. This model is built on the foundation of statistical theory, and it can execute incremental unsupervised learning without inducing the "catastrophic forgetting" problem. ASNN is trained through direct computation rather than gradient descent search. Therefore, its learning process is very fast in theory.
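To make the contrast with gradient-based training concrete, the following minimal sketch shows what ELM-style "training by direct computation" looks like. This is only an illustration of the ELM baseline [4] under common assumptions, not the ASNN algorithm; the data, sizes, and function names are illustrative.

```python
import numpy as np

# ELM-style direct computation: input weights are random and fixed; only the
# output weights are solved in closed form by least squares, so no gradient
# descent iterations are needed. All names and sizes here are illustrative.

def elm_train(X, y, n_hidden=50, seed=0):
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))   # random, untrained input weights
    b = rng.normal(size=n_hidden)                 # random biases
    H = np.tanh(X @ W + b)                        # hidden-layer outputs
    beta = np.linalg.pinv(H) @ y                  # output weights via pseudo-inverse
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta

# Example: fit a noisy 1-D regression target.
X = np.random.default_rng(1).uniform(-1, 1, size=(200, 1))
y = np.sin(3 * X[:, 0]) + 0.05 * np.random.default_rng(2).normal(size=200)
W, b, beta = elm_train(X, y)
print(np.mean((elm_predict(X, W, b, beta) - y) ** 2))  # small training error
```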
II. PRELIMINARIES
Different machine learning models suit different types of data. Before explaining the ASNN model, we first introduce a hypothesis about the distribution of the input data.

Hypothesis 1: the input samples follow the rationale of spatial and temporal localization.

[Fig. 1: Event 1 and Event 2 occupy local ranges on a time line marked Past, Now, and Future.]

In practice, this localization is probabilistic, because the boundaries of events are fuzzy and the input data are noisy. Fig. 1 is merely an ideal case; realistic events could be more complex. For example, they may overlap each other. The rationale of spatial localization is very similar. Since an object can be represented by an input vector, an attribute of the object should be represented by an element or a continuous range of elements in this feature vector according to Hypothesis 1. Moreover, elements in the feature vector are not independent; instead, they are correlated with each other. Therefore, when m elements in the feature vector have been observed, the probability of this object occurring should be (3):
P' = 1 − (1 − q')^m = 1 − exp(−c3m),    (3)

where q' is the probability of the object occurring when a single element has been observed.
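As a small numerical illustration of (3) (a sketch only; the value of q' and the range of m are arbitrary and not taken from the paper):

```python
import numpy as np

# Probability that the object is present after observing m correlated elements,
# following (3): P' = 1 - (1 - q')^m = 1 - exp(-c3*m), which implies c3 = -ln(1 - q').
def object_probability(q_single, m):
    c3 = -np.log(1.0 - q_single)
    return 1.0 - np.exp(-c3 * m)

for m in (1, 2, 5, 10):
    print(m, round(object_probability(0.3, m), 3))
# Output: 0.3, 0.51, 0.832, 0.972 -- confidence in the object saturates quickly
# as more correlated elements are observed.
```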
Definition 1: the spatial range covered by a neuron's dendritic branches is called its receptive field. Similarly, the spatial range covered by a dendrite's branches is called the receptive field of that dendrite.

The spatial localization rationale in Hypothesis 1 implies that the neural network should be organized as a locally connected network when the size of the input vector is larger than the receptive field of a single neuron. This is similar to the structure of the Convolutional Neural Network (CNN) [8]: each neuron connects only to part of the neurons in the previous layer. For simplicity, in this paper we mainly consider the case in which the size of the input vector is equal to the receptive field of a single neuron.

If we do not consider the input order of the data or the decay of an event's confidence in (2), Hypothesis 1 can be loosened into Hypothesis 2, and in this case (1) and (3) are still true.

Hypothesis 2: elements in input vectors are correlated with each other in both the spatial and temporal dimensions.

There are actually two meanings in Hypothesis 2. Firstly, elements in each input vector are not independent; instead, they are bound to each other to some degree. As an example, we can confirm a hand by a single fingerprint independently. But a hand usually has five fingers, and when seeing any one of them, we can infer that the other four fingers most probably exist as well. In other words, the information about the fingers is redundant. Secondly, similar inputs will occur more than once along the time line, and in a dense dataset an input vector distinct from all the others should be viewed as noise. In contrast to Hypothesis 2, if elements in input vectors were independent, (1) would become a linear function p = n/c0 (n