Akhil Chaudhary
NIT Warangal graduate, Software Engineer at Wipro.

ReLU Activation


Without any activation function, a neural network is just a weighted linear combination of its inputs. An activation function introduces non-linearity into the model and allows it to learn complex patterns in the data.
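To make the point about linearity concrete, here is a minimal sketch with a one-input "network". The weights, biases, and function names are made up for illustration. Two stacked layers without an activation collapse into a single linear layer, while placing a ReLU between them produces a function that no single line can match.

```python
# Illustrative values only; any weights and biases behave the same way.
W1, B1 = 2.0, 1.0
W2, B2 = -3.0, 0.5

def two_linear_layers(x):
    # Layer 2 applied to layer 1, with no activation in between.
    hidden = W1 * x + B1
    return W2 * hidden + B2

def single_linear_layer(x):
    # The same mapping as one layer: w = W2 * W1, b = W2 * B1 + B2.
    return (W2 * W1) * x + (W2 * B1 + B2)

def two_layers_with_relu(x):
    # Same layers, but negative hidden values are clamped to zero.
    hidden = max(0.0, W1 * x + B1)
    return W2 * hidden + B2

for x in (-2.0, 0.0, 1.0):
    print(x, two_linear_layers(x), single_linear_layer(x), two_layers_with_relu(x))
```

The first two columns of output always agree. The third does not lie on a straight line, because its slope changes where the hidden value crosses zero.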

Rectified Linear Unit (ReLU) activation outputs the input directly if it is positive; otherwise, it outputs zero. Mathematically, this can be represented as:

f(x) = max(0, x)

ReLU has become the default activation function for most neural networks. Because neurons with negative inputs output exactly zero, ReLU-activated networks produce sparse activations, which is often cited as one reason for their good performance. Also, unlike sigmoid and tanh, ReLU does not saturate for positive inputs: its gradient there is exactly 1, so it is far less prone to vanishing gradients. As a result, ReLU-activated networks typically train faster and often achieve better model performance.
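The vanishing-gradient contrast can be illustrated with a deliberately simplified sketch. It multiplies only the activation derivatives across layers, ignores the weights, and assumes the same pre-activation value at every layer. The helper names are hypothetical.

```python
import math

def sigmoid_grad(z):
    # Derivative of the sigmoid; its maximum value is 0.25, reached at z = 0.
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1.0 - s)

def relu_grad(z):
    # Derivative of ReLU for non-zero inputs: 1 if positive, 0 if negative.
    return 1.0 if z > 0 else 0.0

def chained_grad(grad_fn, z, depth):
    # Product of the local activation derivatives over `depth` layers.
    g = 1.0
    for _ in range(depth):
        g *= grad_fn(z)
    return g

print(chained_grad(sigmoid_grad, 0.0, 10))  # shrinks as 0.25 ** depth
print(chained_grad(relu_grad, 1.0, 10))     # stays at 1.0 for positive inputs
```

Even in the best case for sigmoid (z = 0), ten layers scale the gradient by 0.25 ** 10, which is below one millionth, whereas the ReLU factor stays at 1 along any path of positive inputs.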

ReLU also has some limitations. The function is not differentiable at zero, so the derivative at that point has to be chosen by convention as either 0 or 1.
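A small sketch of the derivative makes the convention explicit. The `at_zero` parameter is a hypothetical name introduced here for illustration; it defaults to 0, which is a common choice.

```python
def relu_derivative(x, at_zero=0.0):
    # Slope is 1 on the positive side and 0 on the negative side.
    if x > 0:
        return 1.0
    if x < 0:
        return 0.0
    # At exactly zero the derivative is undefined, so return the chosen convention.
    return at_zero

print(relu_derivative(2.5))
print(relu_derivative(-2.5))
print(relu_derivative(0.0), relu_derivative(0.0, at_zero=1.0))
```

In practice the choice rarely matters, since an input of exactly zero is uncommon with floating-point values.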

“Dying ReLU” is a situation in which a large number of neurons always output zero, or in other words are always dead. When a neuron's input is negative for most or all of the data, its gradient is zero, so nothing flows back through it during backpropagation and its weights do not get updated. Learning essentially stops for those neurons. This can be caused by a high learning rate or a large negative bias.
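The effect of a large negative bias can be sketched with a single neuron, a = relu(w * x + b). The inputs, weight, biases, and the `neuron_grads` helper below are all illustrative assumptions, and the upstream gradient is fixed at 1 for simplicity.

```python
def neuron_grads(w, b, inputs, upstream=1.0):
    # Returns (dL/dw, dL/db) for each input, given a fixed upstream gradient.
    grads = []
    for x in inputs:
        z = w * x + b
        local = 1.0 if z > 0 else 0.0  # ReLU derivative at z
        grads.append((upstream * local * x, upstream * local))
    return grads

inputs = [0.5, 1.0, 2.0, 3.0]

# With b = -10 the pre-activation is negative for every input: all gradients are zero.
dead = neuron_grads(w=1.0, b=-10.0, inputs=inputs)

# With b = 0 the pre-activation is positive, so gradients flow.
alive = neuron_grads(w=1.0, b=0.0, inputs=inputs)

print(dead)
print(alive)
```

Since every gradient for the first neuron is zero, a gradient-descent step leaves w and b unchanged, and the neuron stays dead on this data.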

This can be mitigated by using a smaller learning rate or a variant of ReLU. “Leaky ReLU” is a modified function in which, instead of zero, the output for negative inputs is a small quantity proportional to the input (for example, f(x) = 0.01x when x < 0).
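A minimal sketch of Leaky ReLU and its derivative follows, using the 0.01 slope from the example above. The parameter name `negative_slope` is an assumption made for this sketch.

```python
def leaky_relu(x, negative_slope=0.01):
    # Positive inputs pass through; negative inputs are scaled down, not zeroed.
    return x if x > 0 else negative_slope * x

def leaky_relu_derivative(x, negative_slope=0.01):
    # The gradient is small but non-zero for negative inputs,
    # so the neuron can still receive weight updates.
    return 1.0 if x > 0 else negative_slope

print(leaky_relu(3.0), leaky_relu(-5.0))
print(leaky_relu_derivative(3.0), leaky_relu_derivative(-5.0))
```

Because the derivative never becomes exactly zero, a neuron pushed into the negative range can still recover during training.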


def relu(x):
  # Pass positive inputs through unchanged; clamp everything else to zero.
  if x > 0:
    return x
  return 0


The ReLU activation has been a great innovation in the neural networks field. It is the most popular activation function and with good reason. It has allowed models to learn complex, non-linear patterns and has vastly improved the learning capabilities of modern-day neural networks.