Discover how a neural network with one hidden layer using ReLU activation can represent (or closely approximate) any continuous nonlinear function.
Activation functions play an integral role in neural networks (NNs) since they introduce nonlinearity and allow the network to learn more complex features and functions than a simple linear regression. One of the most widely used activation functions is the Rectified Linear Unit (ReLU), which has been shown theoretically to allow NNs to approximate a wide range of continuous functions, making them powerful function approximators.
In this post, we study in particular continuous nonlinear (CNL) functions, since approximating them is the main reason to use an NN over a simple linear regression model. More precisely, we investigate two subcategories of CNL functions: Continuous PieceWise Linear (CPWL) functions and Continuous Curve (CC) functions. We will show how these two types of functions can be represented by an NN with a single hidden layer, provided it has enough neurons with ReLU activation.
For illustrative purposes, we consider only single-feature inputs, but the idea also applies to multiple-feature inputs.
ReLU is a piecewise linear function consisting of two linear parts: one that cuts off negative values, mapping them to zero, and one that passes non-negative values through unchanged.
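As a minimal illustration (plain NumPy, not tied to any particular deep learning library), ReLU can be written as:

```python
import numpy as np

def relu(z):
    """ReLU: 0 for negative inputs, identity for non-negative inputs."""
    return np.maximum(0.0, z)

print(relu(np.array([-2.0, 0.0, 3.0])))  # [0. 0. 3.]
```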
CPWL functions are continuous functions made up of multiple linear portions. The slope is constant within each piece but changes abruptly at the transition points, where a new linear function takes over.
In an NN with one hidden layer using ReLU activation and a linear output layer, the activations are summed to form the target CPWL function. Each hidden unit is responsible for one linear piece. At each unit, a new ReLU function corresponding to the slope change is added to produce the new slope (see figure 2). Since this activation function is always non-negative, the output-layer weights of the units that increase the slope are positive and, conversely, the weights of the units that decrease the slope are negative (see figure 3). The new function is added at the transition point but does not contribute to the resulting function before (and sometimes after) that point, thanks to the deactivation range of the ReLU activation function.
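To see this non-interference in action, here is a small sketch (the slopes and the transition point at x = 1 are illustrative choices, not values taken from the figures): a unit with a negative output weight decreases the slope, and it contributes nothing before its transition point.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

x = np.linspace(-1.0, 3.0, 5)      # [-1, 0, 1, 2, 3]

base = 1.0 * x                      # current function: slope +1
bump = relu(x - 1.0)                # new unit: switches on at x = 1
total = base + (-2.0) * bump        # negative output weight: slope drops from +1 to -1

print(bump)   # [0. 0. 0. 1. 2.] -> no contribution before the transition point
print(total)  # [-1. 0. 1. 0. -1.] -> slope +1 before x = 1, slope -1 after
```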
Example
To make this more concrete, we consider an example of a CPWL function consisting of 4 linear segments, defined below.
To represent this target function, we will use an NN with 1 hidden layer of 4 units and a linear output layer that computes the weighted sum of the activations of the previous layer. Let's determine the network parameters so that each unit in the hidden layer represents one segment of the target. For the sake of this example, the output layer bias (b2_0) is set to 0.
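Since the segment definition comes from a figure, the sketch below plugs in illustrative transition points and slopes of our own; the wiring, however, follows the construction described above: one hidden ReLU unit per segment, output weights equal to the slope changes, and b2_0 = 0.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

# Hypothetical target: transition points at x = 0, 1, 2, 3 with segment
# slopes 1, -1, 2, 0 (illustrative numbers, not the ones from the figures).
transitions = np.array([0.0, 1.0, 2.0, 3.0])
slopes = np.array([1.0, -1.0, 2.0, 0.0])

# Hidden layer: unit i computes relu(1 * x + b1_i) with b1_i = -transition_i,
# so it only switches on at its transition point.
W1 = np.ones(4)
b1 = -transitions

# Output layer: each weight is the slope *change* its unit introduces
# (positive to increase the slope, negative to decrease it); b2_0 = 0.
W2 = np.concatenate(([slopes[0]], np.diff(slopes)))
b2_0 = 0.0

def network(x):
    h = relu(np.outer(x, W1) + b1)  # hidden activations, shape (len(x), 4)
    return h @ W2 + b2_0            # linear output layer -> CPWL function

print(network(np.linspace(0.0, 4.0, 9)))
```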
The next type of continuous nonlinear function that we will study is the CC function. There is no standard definition for this subcategory, but an informal way to define CC functions is as continuous nonlinear functions that are nonlinear on every piece, i.e., not made of straight segments. Examples of CC functions include the quadratic function, the exponential function, the sine function, etc.
A CC function can be approximated by a series of small linear pieces, which is called a piecewise linear approximation of the function. The greater the number of linear pieces and the smaller each segment, the better the approximation of the target function. Therefore, the same network architecture as above, with a sufficiently large number of hidden units, can produce a good approximation of a curve function.
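As a rough sketch of this idea, the snippet below reuses the same single-hidden-layer construction to build the piecewise linear interpolant of sin on [0, 2π], with equally spaced transition points (an illustrative choice), and prints how the maximum error shrinks as the number of hidden units grows:

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def relu_approx(f, x, n_units, lo, hi):
    """Piecewise linear interpolant of f on [lo, hi], built as a single
    hidden layer of `n_units` ReLU units plus a linear output layer."""
    knots = np.linspace(lo, hi, n_units + 1)
    piece_slopes = np.diff(f(knots)) / np.diff(knots)                 # slope of each linear piece
    w2 = np.concatenate(([piece_slopes[0]], np.diff(piece_slopes)))   # output weights = slope changes
    h = relu(x[:, None] - knots[:-1])                                 # hidden activations
    return h @ w2 + f(knots[0])                                       # output bias = f(lo)

x = np.linspace(0.0, 2 * np.pi, 200)
for n in (4, 16, 64):
    err = np.max(np.abs(relu_approx(np.sin, x, n, 0.0, 2 * np.pi) - np.sin(x)))
    print(f"{n:3d} hidden units -> max error {err:.4f}")
```

More hidden units mean more (and shorter) linear pieces, so the printed error drops as the unit count increases.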
However, in practice, the network is trained to fit a given data set for which the input-output mapping function is unknown. An architecture with too many neurons is prone to overfitting and high variance, and takes longer to train. Therefore, an appropriate number of hidden units should be large enough to fit the data adequately and, at the same time, small enough to avoid overfitting. Furthermore, with a limited number of neurons, a good low-loss approximation places more transition points in some parts of the domain than in others, rather than spacing them equidistantly as in uniform sampling (as shown in figure 10).
In this post, we have studied how the ReLU activation function allows multiple units to contribute to the resulting function without interfering with one another, enabling the approximation of continuous nonlinear functions. Furthermore, we have discussed the choice of network architecture and the number of hidden units needed to obtain a good approximation.
I hope this post is helpful for your Machine Learning learning process!
More questions to think about:
- How does the approximation ability change if the number of hidden layers with ReLU activation increases?
- How are ReLU activations used for a classification problem?
*Unless otherwise noted, all images are the author's.