
# Deriving Derivative Rules

peterwashington Nov 03 2021
Calculus

We can use the general limit formula for the derivative to derive rules that quickly give the derivative of simple functions. In this section, we will derive the following derivative rule, the power rule for a positive integer $n$:

$$\frac{d}{dx} x^n = n x^{n-1}$$

Before we derive this rule, let’s first look at an example. We want to find the derivative of a function $f(x)$ that is a power of $x$, evaluated at $x = -2$. Using the limit-based derivative formula from the Derivatives Intuition blog post:

$$f'(-2) = \lim_{h \to 0} \frac{f(-2 + h) - f(-2)}{h}$$
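To make the limit concrete, here is a minimal Python sketch that evaluates the difference quotient for shrinking values of $h$. The exponent 2 in `f` is an assumption chosen for illustration, and `difference_quotient` is a hypothetical helper name.

```python
def difference_quotient(f, a, h):
    """Slope of the secant line through (a, f(a)) and (a + h, f(a + h))."""
    return (f(a + h) - f(a)) / h


def f(x):
    # Assumed example function; the exponent 2 is chosen for illustration.
    return x ** 2


# As h shrinks, the quotient settles toward the derivative at x = -2.
for h in (1.0, 0.1, 0.01, 0.001):
    print(h, difference_quotient(f, -2.0, h))
```

For this assumed function the printed values approach $-4$, which is $2 \cdot (-2)$.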

More generally, to find the derivative of $f(x) = x^n$ at $x = a$, we can apply the same process with variables in place of the numbers:

$$f'(a) = \lim_{h \to 0} \frac{(a + h)^n - a^n}{h}$$

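At this point, we can make use of the Binomial Theorem, which is the following:

$$(a + h)^n = \sum_{k=0}^{n} \binom{n}{k} a^{n-k} h^k = a^n + n a^{n-1} h + \binom{n}{2} a^{n-2} h^2 + \cdots + h^n$$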
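As a quick sanity check of the identity $(a + h)^n = \sum_{k=0}^{n} \binom{n}{k} a^{n-k} h^k$, the sketch below compares both sides for one set of integers. The function name `binomial_expansion` and the sample values are illustrative assumptions.

```python
from math import comb


def binomial_expansion(a, h, n):
    """Sum of C(n, k) * a**(n - k) * h**k for k = 0..n."""
    return sum(comb(n, k) * a ** (n - k) * h ** k for k in range(n + 1))


# With integer inputs the two sides agree exactly.
a, h, n = 3, 2, 5
print(binomial_expansion(a, h, n), (a + h) ** n)
```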

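We can substitute this expansion (which is usually taught in Calculus class) back into our derivative calculation:

$$f'(a) = \lim_{h \to 0} \frac{a^n + n a^{n-1} h + \binom{n}{2} a^{n-2} h^2 + \cdots + h^n - a^n}{h}$$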

Canceling out the $a^n - a^n$ in the numerator, we get:

$$f'(a) = \lim_{h \to 0} \frac{n a^{n-1} h + \binom{n}{2} a^{n-2} h^2 + \cdots + h^n}{h}$$

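Factoring $h$ out of the numerator and dividing by the $h$ in the denominator, we get:

$$f'(a) = \lim_{h \to 0} \left( n a^{n-1} + \binom{n}{2} a^{n-2} h + \cdots + h^{n-1} \right)$$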

Since we are taking the limit as $h$ approaches 0, and every term except the first contains a factor of $h$, those terms vanish and the limit is simply the first term:

$$f'(a) = n a^{n-1}$$
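The vanishing of the later terms can be seen numerically. This sketch lists the terms of $\frac{(a + h)^n - a^n}{h}$ after expanding and dividing by $h$, so that term $k$ is $\binom{n}{k} a^{n-k} h^{k-1}$. The name `quotient_terms` and the values of `a` and `n` are assumptions made for illustration.

```python
from math import comb


def quotient_terms(a, n, h):
    """Terms of ((a + h)**n - a**n) / h after expanding and dividing by h.

    Term k (for k = 1..n) is C(n, k) * a**(n - k) * h**(k - 1).
    """
    return [comb(n, k) * a ** (n - k) * h ** (k - 1) for k in range(1, n + 1)]


a, n = 2.0, 4  # illustrative values, not from the original post
for h in (0.1, 0.001, 0.00001):
    first, *rest = quotient_terms(a, n, h)
    # The first term stays at n * a**(n - 1); the rest shrink with h.
    print(h, first, sum(rest))
```

The first term does not depend on $h$ at all, while the sum of the remaining terms shrinks roughly in proportion to $h$.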

And that gives us our final derivative rule:

$$\frac{d}{dx} x^n = n x^{n-1}$$

Using this basic approach, we can derive the following common derivative rules taught in Calculus class:
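A few standard rules can be checked numerically against the limit definition. The selection in `rules` below is an assumption about which rules are meant, and `central_difference` is a hypothetical helper that approximates the limit with a small fixed $h$.

```python
import math


def central_difference(f, x, h=1e-6):
    """Symmetric difference quotient, a numerical stand-in for the limit."""
    return (f(x + h) - f(x - h)) / (2 * h)


# (function, claimed derivative) pairs; this selection is an assumption.
rules = {
    "constant c -> 0": (lambda x: 5.0, lambda x: 0.0),
    "x**3 -> 3*x**2": (lambda x: x ** 3, lambda x: 3 * x ** 2),
    "exp(x) -> exp(x)": (math.exp, math.exp),
    "ln(x) -> 1/x": (math.log, lambda x: 1 / x),
    "sin(x) -> cos(x)": (math.sin, math.cos),
    "cos(x) -> -sin(x)": (math.cos, lambda x: -math.sin(x)),
}

x = 1.3  # arbitrary test point where every function above is defined
for name, (f, df) in rules.items():
    # True when the numerical slope matches the claimed derivative.
    print(name, abs(central_difference(f, x) - df(x)) < 1e-6)
```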

Beyond these basic rules, the derivatives of more complex functions can be thought of as breaking apart the derivative into pieces. In particular:
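The rules meant here are presumably the standard ones for combining functions. As a sketch under that assumption, for differentiable functions $f$ and $g$:

$$
\begin{aligned}
\frac{d}{dx}\bigl[f(x) + g(x)\bigr] &= f'(x) + g'(x) \\
\frac{d}{dx}\bigl[f(x)\,g(x)\bigr] &= f'(x)\,g(x) + f(x)\,g'(x) \\
\frac{d}{dx}\left[\frac{f(x)}{g(x)}\right] &= \frac{f'(x)\,g(x) - f(x)\,g'(x)}{g(x)^2}, \quad g(x) \neq 0
\end{aligned}
$$

Each one splits the derivative of a combined function into derivatives of its simpler parts.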

The rule most relevant to gradient descent, the fundamental process used to train machine learning models, is called the chain rule:

$$\frac{dy}{dx} = \frac{dy}{du} \cdot \frac{du}{dx}$$

In other words, the derivative of y with respect to x is the derivative of y with respect to some other function u(x) times the derivative of u with respect to x.

For example, let’s say we want to find the derivative of $y(x) = \ln(\sin(x))$. We can think of this as $u(x) = \sin(x)$ and $y(u) = \ln(u)$. When written in this way, we can calculate each piece needed for the chain rule:

$$\frac{dy}{du} = \frac{1}{u}, \qquad \frac{du}{dx} = \cos(x)$$

Therefore, the full derivative using the chain rule is:

$$\frac{dy}{dx} = \frac{dy}{du} \cdot \frac{du}{dx} = \frac{1}{\sin(x)} \cdot \cos(x) = \cot(x)$$
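The chain rule result for $y(x) = \ln(\sin(x))$ can be compared with a numerical slope. In this minimal sketch, `central_difference` is a hypothetical helper and the test point `x = 1.0` is an arbitrary choice where $\sin(x) > 0$.

```python
import math


def central_difference(f, x, h=1e-6):
    """Symmetric difference quotient, a numerical stand-in for the limit."""
    return (f(x + h) - f(x - h)) / (2 * h)


def y(x):
    return math.log(math.sin(x))


def dy_dx(x):
    # Chain rule: dy/du = 1/u with u = sin(x), times du/dx = cos(x).
    return (1 / math.sin(x)) * math.cos(x)


x = 1.0  # any point with sin(x) > 0 works
print(central_difference(y, x), dy_dx(x))
```

The two printed numbers should agree to several decimal places, which supports the hand calculation without replacing it.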