Network Activation Functions

peterwashington Nov 02 2021
Neural Networks

We first encountered activation functions when we covered logistic regression. Recall that binary logistic regression simply applies the sigmoid activation function to a linear equation of the inputs. Logistic regression learns the m and b parameters of y = mx + b. To get a probability value for classification, we apply the sigmoid activation to mx + b to get:

sigmoid(mx + b) = 1 / (1 + e^(-(mx + b)))
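The sigmoid step can be sketched in a few lines of Python. This is only an illustration, not a full logistic regression: the values of m and b below are made up rather than learned, and the function names are hypothetical.

```python
import math

def sigmoid(z):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_probability(x, m, b):
    """Binary logistic regression: sigmoid applied to the line m*x + b."""
    return sigmoid(m * x + b)

# Hypothetical parameters, chosen only for illustration (not learned from data).
m, b = 2.0, -1.0

p_mid = predict_probability(0.5, m, b)   # m*x + b = 0, so the probability is 0.5
p_high = predict_probability(3.0, m, b)  # m*x + b = 5, so the probability is close to 1
```

An input that lands exactly on mx + b = 0 gets a probability of 0.5, and the probability moves toward 1 or 0 as mx + b grows more positive or more negative.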

In neural networks, we apply activation functions to every single node in the network. The most popular activation function is the Rectified Linear Unit (ReLU), which is a fancy phrase for the following:

ReLU(x) = max(0, x)

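With the ReLU activation, after summing up the weighted inputs (the mx values) from the incoming nodes, we set the result to 0 if it is negative. For example, suppose three incoming nodes hold the values 8, 3, and 6 and connect to the right-most node with weights 4, -1, and 2:

In this case, the value of the right-most node is max(0, 8 * 4 + 3 * -1 + 6 * 2) = max(0, 41) = 41.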

Let’s change the input values a little, to -8, 3, and -6, while keeping the same weights (4, -1, and 2):

In this case, the value of the right-most node is max(0, -8 * 4 + 3 * -1 + -6 * 2) = max(0, -47) = 0.
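Both calculations can be reproduced with a short Python sketch. It assumes that the first number in each product is the input value and the second is the weight on that connection. The helper names `relu` and `node_value` are hypothetical.

```python
def relu(z):
    """Rectified Linear Unit: keep positive values, replace negatives with 0."""
    return max(0, z)

def node_value(inputs, weights):
    """Sum each input times its weight, then apply ReLU to the total."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return relu(weighted_sum)

# Assumed reading of the examples: the weights stay fixed, only the inputs change.
weights = [4, -1, 2]

first = node_value([8, 3, 6], weights)     # weighted sum is 41, which ReLU leaves alone
second = node_value([-8, 3, -6], weights)  # weighted sum is -47, which ReLU clips to 0
```

Only the inputs differ between the two calls, which is why the same node passes 41 through in one case and outputs 0 in the other.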

ReLU is a non-linear activation function. This nonlinearity is the crucial element that allows neural networks to learn non-linear functions instead of only linear ones. Without it, stacking layers gains nothing, because a linear function of a linear function is still linear. With enough hidden nodes and enough layers with ReLU activation, a network can approximate essentially any function as closely as you like (strictly speaking, any continuous function over a bounded range of inputs), so the neural network has the capacity to learn approximations of very complex functions. These activations are the key ingredient of neural networks: they are the difference between learning a linear function and learning an arbitrarily complex one.
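To see why the activation matters, here is a minimal sketch of a tiny network with one input, one hidden node, and one output. The weights and biases are made-up values, and all names are hypothetical. Without an activation, the two layers are exactly equivalent to a single line. With ReLU in between, the output bends and is no longer a line.

```python
def relu(z):
    """Rectified Linear Unit: keep positive values, replace negatives with 0."""
    return max(0.0, z)

# Hypothetical weights and biases, chosen only for illustration.
W1, B1 = 2.0, 1.0    # input -> hidden
W2, B2 = -3.0, 0.5   # hidden -> output

def two_layers_no_activation(x):
    hidden = W1 * x + B1
    return W2 * hidden + B2

def collapsed_line(x):
    # The same function written as one line: slope W2*W1, intercept W2*B1 + B2.
    return (W2 * W1) * x + (W2 * B1 + B2)

def two_layers_with_relu(x):
    hidden = relu(W1 * x + B1)
    return W2 * hidden + B2

xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
no_activation = [two_layers_no_activation(x) for x in xs]
single_line = [collapsed_line(x) for x in xs]
with_relu = [two_layers_with_relu(x) for x in xs]
```

Here `no_activation` and `single_line` hold identical values, so the extra layer added nothing. The `with_relu` outputs stay flat while the hidden node is clipped to 0 and then start to fall, a shape no single line can produce.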