In this article, we will discuss “How Neural Networks Extrapolate: From Feedforward to Graph Neural Networks”.

## Introduction to the Feedforward Network

A feedforward network is a type of artificial neural network with no cycles in the connections between its nodes. Feedforward neural networks, also referred to as multi-layered networks of neurons, only allow information to move forward: data enters at the input nodes, travels through the hidden units, and exits at the output nodes. Data never flows backward, however many hidden nodes lie along its path, as shown in figure 1.1. Each neuron applies an activation function to its weighted input, and during training the weights of the network are updated to reduce the error at the output. ReLU is the most frequently used activation function.

### Activation Function

An activation function is a simple mathematical function that converts a neuron's input into its output, often restricting that output to a particular range. As the name implies, it determines whether, and how strongly, the neuron is activated once its input reaches a certain level. In essence, it is in charge of turning the neuron on and off. The neuron receives the sum of its inputs multiplied by their weights, which are randomly initialized, plus a bias term. The activation function is applied to this sum to produce the neuron's output. Activation functions add non-linearity to the system, which helps the network learn complicated patterns in data such as photos, text, videos, or sounds. Without an activation function, the model would behave like a linear regression model, no matter how many layers it has, and its capacity for learning would be limited.
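
The computation described above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the function names (`neuron_output`, `sigmoid`, `identity`) and the example numbers are hypothetical, and sigmoid is used only as one example of an activation with a bounded output range.

```python
import math

def sigmoid(z):
    # Squashes any real number into the open interval (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

def identity(z):
    # "No activation": the neuron stays a purely linear unit.
    return z

def neuron_output(inputs, weights, bias, activation):
    # Weighted sum of the inputs plus the bias, then the activation.
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return activation(z)

# Hypothetical example values, chosen only for illustration.
inputs = [0.5, -1.0, 2.0]
weights = [0.4, 0.3, -0.2]
bias = 0.1

linear_out = neuron_output(inputs, weights, bias, identity)
squashed_out = neuron_output(inputs, weights, bias, sigmoid)
print(linear_out, squashed_out)
```

With `identity` the neuron returns the raw weighted sum, while with `sigmoid` the same sum is mapped into a bounded range.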

#### ReLU (Rectified Linear Unit)

ReLU, also known as the rectified linear activation function, is a piecewise linear function that outputs the input directly if it is positive and outputs zero otherwise. It is the most extensively used activation function in neural networks, especially deep convolutional networks (CNNs) and multilayered perceptrons. Despite its simplicity, it often performs better than older activation functions such as sigmoid or tanh. It is expressed mathematically in equation 1.1 as:

f(x)=max(0,x)

The graph shows that all negative inputs are mapped to zero, while positive inputs are kept exactly as they were. Because the inputs were a series of increasing values, the output for positive inputs is a straight line with a rising slope. At first glance, ReLU therefore looks like a linear transformation. However, it is a non-linear function, and this is what allows the network to learn complicated relationships from the training examples. For positive inputs it behaves as the identity function; for negative inputs it is constant at zero. The kink at zero makes the function as a whole piecewise linear rather than linear.
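
A small sketch can make the behaviour of f(x) = max(0, x) concrete. The input values below are arbitrary and chosen only for illustration. The last lines check additivity, a property every linear function satisfies, to show that ReLU is not linear.

```python
def relu(x):
    # f(x) = max(0, x): negative inputs become zero, positive inputs pass through.
    return max(0.0, x)

# An increasing series of example inputs (illustrative values).
xs = [-3.0, -1.0, 0.0, 2.0, 5.0]
ys = [relu(x) for x in xs]
print(ys)

# A linear function f satisfies f(a + b) == f(a) + f(b) for all a and b.
a, b = -1.0, 2.0
is_additive = relu(a + b) == relu(a) + relu(b)
print(is_additive)
```

Here `relu(a + b)` is 1.0 while `relu(a) + relu(b)` is 2.0, so the additivity check fails even though each half of the function is a straight line.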

A single-layer perceptron is a simple example of a feedforward neural network. In this model, the input values are multiplied by the weights, and the weighted inputs are then summed to produce a single value. If this sum is below the threshold, the model outputs -1; otherwise it typically outputs 1. The threshold is usually set to 0. This model is commonly used for classification tasks. During training, the network compares the output of its nodes with the desired output and uses a gradient descent algorithm to update the weights. In a multi-layered perceptron, backpropagation is used to compute the gradients needed for these weight updates.
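
The thresholding step described above can be sketched as follows. This is a minimal illustration under stated assumptions: the name `perceptron_predict` and the example weights are hypothetical, and a sum exactly equal to the threshold is assumed to produce 1, since the text only specifies the case where the sum is below the threshold.

```python
def perceptron_predict(inputs, weights, threshold=0.0):
    # Multiply each input by its weight and sum the results.
    total = sum(x * w for x, w in zip(inputs, weights))
    # Assumption: a sum equal to the threshold counts as the positive class.
    return 1 if total >= threshold else -1

# Hypothetical weights, chosen only for illustration.
weights = [0.6, -0.4]

positive = perceptron_predict([1.0, 0.5], weights)   # 0.6 - 0.2 = 0.4
negative = perceptron_predict([0.5, 2.0], weights)   # 0.3 - 0.8 = -0.5
print(positive, negative)
```

Only the prediction step is shown. Training would additionally adjust `weights` based on the difference between the predicted and the desired output.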

The simplest algorithm produces a sparse N×N adjacency matrix (which is also a kind of block matrix) for a typical MLP/FFN with a total of N neurons (including input and output neurons), where each neuron n_l^k at layer l has a directed edge into every neuron at layer l+1.

## Algorithm

- Create an N×N matrix G ∈ {0,1}^(N×N) filled with zeros
  - Comment 1: G_ij is the element of the matrix at row i and column j.
  - Comment 2: Indices i and j start at 1 and end at N.
  - Comment 3: If we set G_ij = 1, then there is a directed edge from neuron i to neuron j (but not necessarily vice versa: for that to be true, we would also need G_ji = 1).
  - Comment 4: We need to create a mapping between the indices i and j and the neurons in the neural network; this is done below.
- Let c(l) be the number of neurons at layer l
- For each layer l = 0, …, L−1
  - Comment 5: l = 0 is the input layer and l = L is the output layer.
  - For k = 1, …, c(l)
    - Comment 6: For example, n_l^k = n_1^3 is the third neuron of the first hidden layer.
    - Let M = ∑_(h=0)^(l−1) c(h)
      - Comment 7: M is the number of neurons processed so far in the previous layers (excluding the neurons in the current layer).
    - i = k + M
    - For t = 1, …, c(l+1)
      - j = t + c(l) + M
        - Comment 8: j is the index in the graph G that corresponds to neuron t in the next layer l+1.
      - Set G_ij = 1
- Return the matrix G
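
The steps above can be sketched in plain Python. This is a minimal sketch under stated assumptions: the function name `build_adjacency` and the example layer sizes are hypothetical, the matrix is stored as a dense list of lists rather than a sparse structure, and indices start at 0 (Python convention) instead of 1 as in the algorithm.

```python
def build_adjacency(layer_sizes):
    # layer_sizes[l] plays the role of c(l); the first entry is the input layer.
    n = sum(layer_sizes)                   # total number of neurons N
    G = [[0] * n for _ in range(n)]        # N x N matrix of zeros
    offset = 0                             # M: neurons in all previous layers
    for l in range(len(layer_sizes) - 1):  # every layer except the output layer
        for k in range(layer_sizes[l]):
            i = offset + k                 # global index of neuron k in layer l
            for t in range(layer_sizes[l + 1]):
                # Skip the rest of layer l to reach neuron t in layer l + 1.
                j = offset + layer_sizes[l] + t
                G[i][j] = 1                # directed edge from i to j
        offset += layer_sizes[l]
    return G

# Example: 2 input neurons, 3 hidden neurons, 1 output neuron.
G = build_adjacency([2, 3, 1])
for row in G:
    print(row)
```

For this example the matrix has 2·3 + 3·1 = 9 non-zero entries, all above the diagonal, which reflects the fact that edges only go from one layer to the next.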