Let’s begin with the Perceptron, the most basic neural network. In essence it is a neural network with a single layer containing a single unit with an activation function. A 1 × n vector serves as the input, with a matching vector of weights.

A perceptron represents a function of the form y = σ(w · x + b), where σ is typically an activation such as tanh, sigmoid, or ReLU. Because the Perceptron is a linear classifier, it is well known that it cannot model XOR.
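As a sketch, this computation takes only a few lines of NumPy; the function name `perceptron` and the toy numbers below are illustrative choices, not from the original text.

```python
import numpy as np

def perceptron(x, w, b, activation=np.tanh):
    """Single unit: weighted sum of the inputs plus a bias,
    passed through an activation function."""
    return activation(np.dot(w, x) + b)

# Toy example with a 1 x 3 input vector and matching weights.
x = np.array([0.5, -1.0, 2.0])
w = np.array([0.2, 0.4, -0.1])
b = 0.1
y = perceptron(x, w, b)   # tanh(w . x + b)
```

Swapping `activation` for a sigmoid or ReLU changes the unit’s output range but not the overall structure of the computation.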

According to the universal approximation theorem, any continuous function f : [0, 1] → [0, 1] may be approximated arbitrarily well by a neural network with at least one hidden layer and a finite number of weights. We will demonstrate this in the following subsections.

## Universal Approximation Theorem

Let’s imagine someone gave you a wavy function, f(x).

The ability of a neural network to compute any function, regardless of complexity, is one of its most remarkable properties. For any function f, there is a guarantee that some neural network outputs the value f(x) (or a close approximation) for every possible input x. This conclusion still holds even if the function, f = f(x1, …, xm), has many inputs and many outputs; consider, for example, a network computing a function with m = 3 inputs and n = 2 outputs.
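To make the many-inputs, many-outputs case concrete, here is a minimal sketch of a one-hidden-layer network mapping m = 3 inputs to n = 2 outputs; the layer sizes and random weights are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W1, b1, W2, b2):
    """One hidden layer: hidden = sigmoid(W1 x + b1), output = W2 hidden + b2."""
    hidden = sigmoid(W1 @ x + b1)
    return W2 @ hidden + b2

rng = np.random.default_rng(0)
m, hidden_units, n = 3, 4, 2          # m = 3 inputs, n = 2 outputs
W1, b1 = rng.normal(size=(hidden_units, m)), rng.normal(size=hidden_units)
W2, b2 = rng.normal(size=(n, hidden_units)), rng.normal(size=n)

y = forward(rng.normal(size=m), W1, b1, W2, b2)
# y has shape (2,): one network maps 3 inputs to 2 outputs.
```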


This indicates that neural networks are, in a sense, universal: for each function we wish to compute, there is already a neural network that can do it. Users of neural networks are familiar with the universality theorem; however, the reason why it is true is far less widely understood.

Let’s begin by learning how to build a neural network that approximates a function with only a single input and a single output. Carefully examined, this is the crux of the universality problem: once we have a firm grasp of this special case, extending it to functions with many inputs and outputs is rather simple.

### Step 1: Create a step function with one of the neurons.

Let’s concentrate on the top hidden neuron first. By giving it a large weight, we can use a sigmoid to mimic the step function arbitrarily well, and by modifying the bias, we can place the step anywhere. (As a side note, the same argument works for the tanh activation, but not for ReLU.) Since the weights of the first layer only need to be large enough for our toy example, we won’t be concerned with changing them and will instead treat them as constant. Additionally, to make the plots easier to read, we will show the step’s position rather than the bias; for σ(wx + b), the step sits at s = -b/w.
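A sketch of this step construction: writing the unit as σ(w(x − s)) with a large weight (w = 1000 here is an arbitrary illustrative choice), the sigmoid is essentially 0 to the left of the position s and 1 to the right.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def approx_step(x, s, w=1000.0):
    """Sigmoid with a large weight w approximates a step at position s.
    In the form sigmoid(w*x + b), the step sits at s = -b/w, i.e. b = -w*s."""
    return sigmoid(w * (x - s))

# Nearly 0 before the step at s = 0.5, nearly 1 after it.
print(approx_step(np.linspace(0, 1, 5), s=0.5))
```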

### Step 2: Create a “bin” with a reversed step function.

We can accurately approximate a bin, and control its position, width, and height, by using the other neuron to build a second step function and placing opposing weights in the second layer.

Now that you can presumably see where this is headed, let’s make it even clearer by using a single value, h, to stand for the “bin’s” height, setting the second-layer weights to w1 = h and w2 = -h.
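Continuing the sketch (the steepness w = 1000 and the numbers below are illustrative assumptions), subtracting the two steps with second-layer weights +h and -h yields the bin:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bin_function(x, s1, s2, h, w=1000.0):
    """Two steep sigmoids with second-layer weights +h and -h form a
    "bin": roughly h for s1 <= x < s2 and roughly 0 elsewhere."""
    step_up = sigmoid(w * (x - s1))     # step at s1, weight w1 = +h
    step_down = sigmoid(w * (x - s2))   # step at s2, weight w2 = -h
    return h * step_up - h * step_down

# Roughly [0, 2, 0]: height 2 inside [0.3, 0.7), 0 outside.
print(bin_function(np.array([0.1, 0.5, 0.9]), s1=0.3, s2=0.7, h=2.0))
```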

### Step 3: Discretize the function

In the last stage, we aggregate several “bins” to create a histogram that roughly represents the function. The illustration above uses only 5 bins (10 hidden units), which is a very crude approximation; however, we can sharpen it simply by including more bins.
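The whole construction can be sketched end to end as follows; the target f(x) = sin(2πx), the bin counts, and the steepness w are illustrative choices, not from the original article.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def histogram_approx(x, f, n_bins=5, w=1000.0):
    """Approximate f on [0, 1] by summing n_bins "bins" (2 hidden
    units each); each bin's height is f at the bin's centre."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    total = np.zeros_like(np.asarray(x, dtype=float))
    for s1, s2 in zip(edges[:-1], edges[1:]):
        h = f((s1 + s2) / 2.0)                                   # bin height
        total += h * (sigmoid(w * (x - s1)) - sigmoid(w * (x - s2)))
    return total

f = lambda x: np.sin(2 * np.pi * x)      # a "wavy" target function
x = np.linspace(0.05, 0.95, 10)          # sample away from bin edges
coarse = histogram_approx(x, f, n_bins=5)    # crude, 10 hidden units
fine = histogram_approx(x, f, n_bins=50)     # sharper, 100 hidden units
```

Increasing `n_bins` shrinks the gap between the histogram and f, which is exactly the sharpening described above.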

## FAQs

### What purpose does the universal approximation theorem serve?

The universal approximation theorem states that neural networks can represent, at least approximately, any function. This is a powerful result: it means neural networks are capable of carrying out any task that can be framed as computing a function.

### Is the universal approximation theorem actually true?

According to the universal approximation theorem, neural networks have a certain universality: no matter what f(x) is, there is a network that can approximate the outcome and complete the task. This result is valid for any number of inputs and outputs.

### Can any function be approximated by a single-layer neural network?

The universal approximation theorem asserts that, for inputs falling within a specified range, a neural network with one hidden layer can approximate any continuous function. We won’t be able to approximate a function that bounces around discontinuously or has a lot of gaps.