Radial Basis Function Network

Introduction

Radial Basis Function network was formulated by Broomhead and Lowe in 1988. Since Radial basis functions (RBFs) have only one hidden layer, the convergence of optimization objective is much faster, and despite having one hidden layer RBFs are proven to be universal approximators.

RBF networks have many applications like function approximation, interpolation, classification and time series prediction. All these applications serve various industrial interests like stock price prediction, anomaly detection in data, fraud detection in financial transaction etc.

This article provides a detailed introduction to Radial Basis Function Networks along with its mathematical and algorithmic development, then a comparison between RBF and MLP (multi-layer perceptron) is presented. In the last section, RBF netowrk is trained to predict the coustomer’s response to subscription for term deposit.

Architecture of RBF

RBF network is an artificial neural network with an input layer, a hidden layer, and an output layer. The Hidden layer of RBF consists of hidden neurons, and activation function of these neurons is a Gaussian function. Hidden layer generates a signal corresponding to an input vector in the input layer, and corresponding to this signal, network generates a response.

Design and Development

To generate an output, neuron process the input signal through a function called activation function of the neuron. In RBF activation function of hidden neuron is \(\phi (X)\) i.e. for an input vector X output produced by the hidden neuron will be \(\phi (X)\) .

\(\displaystyle\phi (x) = e^{- \frac{(x-\mu)^2}{\sigma^2}}\)
Above figure represents a Gaussian neural activation function for 1-D input x with center (mean) \(\mu\) . Here \(\phi (x)\) represents the output of Gaussian node for given value of x. It can be clearly seen that the signal strength decreases as the input (X) move away from the center. The range of the Gaussian function is determined by ?, and output beyond the range is considered to be negligible.

The basic idea of this model is that the entire feature vector space is partitioned by Gaussian neural nodes, where each node generates a signal corresponding to an input vector, and strength of the signal produced by each neuron depends on the distance between its center and the input vector. Also for inputs lying closer in Euclidian vector space, the output signals that are generated must be similar.

\(\displaystyle \phi (X) = e^{\frac{-||X-\mu||^2}{\sigma^2}}\) (1)

Here, \(\mu\) is center of the neuron and \(\phi (X)\) is response of the neuron corresponding to input X.

In above figure circles represent Gaussian neural nodes and boundary of circles represents the range of the corresponding nodes also known as the receptive field of neurons. Here 2-D vector space is partitioned by 12 Gaussian nodes. Every input vector activates the collective system of neurons to some extent and the combination of these activations enables RBF to decide how to respond. Above configuration of neurons will generate similar output signals for input vectors A and B whereas for C output generated will be quite different.

In RBF architecture, weights connecting input vector to hidden neurons represents the center of the corresponding neuron. These weights are predetermined in such a way that entire space is covered by the receptive field of these neurons, whereas values of weights connecting hidden neuron to output neurons are determined to train the network.

It’s appropriate if vectors lying close in the Euclidian space falls under the receptive field of the same neuron; therefore centers of hidden neurons are determined using K-Means clustering.

k Means Clustering algorithm:

Choose the number of cluster centers “K”.
Randomly choose K points from the dataset and set them as K centroids of the data.
For all the points in the dataset, determine the centroid closest to it.
For all centroids, calculate the average of all the points lying closest to the same centroid.
Change the value of all the centroids to corresponding averages calculated in (4).
Go to (3) until convergence.

The range of receptive fields is chosen such that entire domain of input vector is covered by the receptive field of the neurons. So, the value of sigma is chosen according to maximum distance “d” between two hidden neurons.

\(\sigma = \frac{d}{\sqrt{2M}}\) (2)

Where d is the maximum distance between two hidden neurons and M is the number of hidden neurons.

Mathematical Development

Let \(g_{ij}\) be an element of matrix G representing the output of j^th neuron for i^th input vector and \(W_{ij}\) be an element of matrix W representing weight connecting i^thoutput neuron to j^th hidden neuron. In RBF, the activation function of output neuron is linear i.e. “ g(z)= z “ where z is the weighted summation of signals from hidden layer. Multiplying i^th row of G with j^th columns of W does the weighted summation of signals from the hidden layer which is equal to signal produced by j^th output neuron.

GW= T

Where T is a column vector and i^th row contains the target value (actual desired output) of i^th training vector.

From above equation, by method of pseudo inverse

\(W = (G^{T}G)^{-1}G^{T}T\) (3)

where \(G^{T}\) is the transpose of matrix \(G\).

Algorithm

Define the number of hidden neurons “K”.
set the positions of RBF centers using K-means clustering algorithm.
Calculate \(\sigma\) using equation (2)
Calculate actions of RBF node using equation (1)
Train the output using equation (3)

RBF v/s MLP

MLPs are advantageous over RBFs when the underlying characteristic feature of data is embedded deeply inside very high dimensional data sets. For example, in image recognition, features depicting the key information about the image is hidden inside tens of thousands of pixel. For such training examples, the redundant features are filtered as the information progress through the stack of hidden layers in MLPs, and as a result, better performance is achieved.

Having only one hidden layer RBFs have much faster convergence rate as compared to MLP. For low dimensional data where deep feature extraction is not required and results are directly correlated with the component of input vectors, then RBF network is usually preferred over MLP. RBFs are universal approximators, and unlike most machine learning models RBF is a robust learning model.

Implementation

Anticipating a client’s response from his characteristic details like age, marital status, education, job etc require years of experience and learning. Here an RBF based AI implementation on bank marketing data set is presented. Given the details of a client, the network will try to predict whether or not a client will subscribe for a term deposit.

The dataset contains feature information about different clients like age, job, marital status, loan, housing etc.

Data-Source: http://archive.ics.uci.edu/ml/datasets/Bank+Marketing

Classes: There is two classes “yes” and “no” corresponding to the response of a client.

Step 1: Import necessary packages

import math
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
import numpy as numpy

step 2: Get data from the file and encode the labels using LabelEncoder class.

Data= pd.read_table("bank-full.csv", sep= None, engine= "python")
cols= ["age","balance","day","duration","campaign","pdays","previous"]
data_encode= Data.drop(cols, axis= 1)
data_encode= data_encode.apply(LabelEncoder().fit_transform)
data_rest= Data[cols]
Data= pd.concat([data_rest,data_encode], axis= 1)

step 3: split the data into training and testing set.

data_train, data_test= train_test_split(Data, test_size= 0.33, random_state= 4)
X_train= data_train.drop("y", axis= 1)
Y_train= data_train["y"]
X_test= data_test.drop("y", axis=1)
Y_test= data_test["y"]

step 4: Scale the data using StandardScaler class.

scaler= StandardScaler()
scaler.fit(X_train)
X_train= scaler.transform(X_train)
X_test= scaler.transform(X_test)

step 5: Determine \(\sigma\)

max= 0;
for i in range(K_clust):
 for j in range(column):
  d= numpy.linalg.norm(Cent[i]-Cent[j])
  if (max<d):                            
   max= d
d= max
sigma= d/math.sqrt(2*K_clust)

step 6: Determine centers of the neurons using KMeans.

K_cent= 8
km= KMeans(n_clusters= K_cent, max_iter= 100)
km.fit(X_train)
cent= km.cluster_centers_

step 7: Determine the value of \(\sigma\)

max=0 
for i in range(K_cent):
 for j in range(K_cent):
  d= numpy.linalg.norm(cent[i]-cent[j])
  if(d> max):
   max= d
d= max

sigma= d/math.sqrt(2*K_cent)

step 8: Set up matrix G.

shape= X_train.shape
row= shape[0]
column= K_cent
G= numpy.empty((row,column), dtype= float)
for i in range(row):
    for j in range(column):
        dist= numpy.linalg.norm(X_train[i]-cent[j])
        G[i][j]= math.exp(-math.pow(dist,2)/math.pow(2*sigma,2))

step 9: Find weight matrix W to train the network.

GTG= numpy.dot(G.T,G)
GTG_inv= numpy.linalg.inv(GTG)
fac= numpy.dot(GTG_inv,G.T)
W= numpy.dot(fac,Y_train)

step 10: Set up matrix G for the test set.

row= X_test.shape[0]
column= K_cent
G_test= numpy.empty((row,column), dtype= float)
for i in range(row):
 for j in range(column):
  dist= numpy.linalg.norm(X_test[i]-cent[j])
  G_test[i][j]= math.exp(-math.pow(dist,2)/math.pow(2*sigma,2))

step 11: Analyze the accuracy of prediction on test set

prediction= numpy.dot(G_test,W)
prediction= 0.5*(numpy.sign(prediction-0.5)+1)

score= accuracy_score(prediction,Y_test)
print score.mean()

With an RBF network, a prediction with an accuracy of 88% is achieved and so is the cause with MLP. However, the computational cost of training an MLP is much higher as compared to RBF; hence, here it’s better to use RBF network instead of MLP.

rahulbansal

I am MS student at IISER Mohali, majoring in physics and carrying out a thesis research on Machine Learning. My pedagogical philosophy is, "difficulties fades away at the end of a long day".