In the previous post we looked at limits; this time we’ll take a look at derivatives in calculus (which build on limits). If you aren’t confident with limits yet, don’t worry, you can check out my last post to get up to speed super quick. Derivatives allow us to calculate the slope (also known as the gradient) at any point of a function, even if the function is not a straight line! The process is called differentiation, and it’s super useful in real life because most real-world functions are not straight lines, and we often need to know how fast a function is changing (the slope/gradient) 🙂

## Finding the slope – straight lines

Recall from algebra (or my last post) that the slope of a straight line is just the rise over run, like in the first graph below. These straight lines usually take the slope-intercept form of y = mx + b, where m is the slope and b is the y-intercept (where the line crosses the y-axis).

Finding the slope for this line is easy: for every 1 unit across the x-axis, the y-axis increases by 2. So the slope is \frac{rise}{run} = \frac{2}{1} = 2. This is all well and good, but what do we do if the function is not a straight line? We could resort to the method proposed in my last post, but that works only for finding the slope at a given point. What we want instead is to be able to find the slope at every point of the function, by deriving some new function which gives us the slope of the original function for any value of x. This is where derivatives come in.

## Finding the slope – curves

Remember a slope is just a rate of change, and that is exactly what a derivative gives us: a rate of change of a function f(x) at some point x. Check the graph below for an example.

This is the graph of the function y = x^{2}, which we can see is constantly changing (so it isn’t a straight line). Our approach to this problem uses limits: to see which value the gradient approaches as we make some ‘infinitely small’ change to the input of the function. You can visualise this as us zooming waaaay in on some part of the curve until it looks completely straight, and then drawing a straight tangent line at that point of the graph, which we then use to calculate the slope using our classic rise over run approach. Look at the graph below, I did the visualising for you 🙂

We zoomed in pretty far to the curve (red line) above at the point (1, 1), and the red line almost looks like it’s completely straight, just like the tangent line we drew (blue line). Imagine if we zoomed in as far as we possibly could, those two lines would look the same to us. This is where limits come in, and our familiar rise over run.

First, recall that the rise over run is just the difference in the y values of the function when we plug in two different x values, divided by the difference between those x values. Also recall that the slope/gradient we get out from the rise over run is the average gradient between these two x points of the function. The maths below might look scary at first, but stick with me and I’ll explain everything:

\begin{aligned}
Slope &= \frac{rise}{run} \\
&= \frac{y_2 - y_1}{x_2 - x_1} \\
&= \frac{f(x+h) - f(x)}{h}
\end{aligned}

All we are saying here is that the slope or gradient is the difference of the y values divided by some small change in x (represented by h on the last line), when we plug two different x values in. Think of this as taking some small step h along the x-axis, and recording the change in the height of the curve. The slope is the change in y divided by the small step h. So the last line above is the same thing as the second line, just written slightly differently. Just remember they are the same.

Now remember that by using limits we can shrink this step h along the axis smaller and smaller and see what happens to the output of the function:

\begin{aligned}
f'(x) &= \lim_{h \to 0} \frac{f(x+h)-f(x)}{h}
\end{aligned}

The f'(x) part is read: “f prime”, and it’s just some notation to show that this is not the original function f(x) but a function derived from f(x). So f'(x) just means “the derivative of f(x)”. There’s actually an important point here so let’s make it clearer: the above formula is the general definition of the derivative.

Think for a moment about what will happen in the above formula as h gets closer to 0. Let’s work through an example where we use our curve f(x) = x^2 and plug in x = 2 to see what happens. Stick with me, I’ll explain it all in a moment:

\begin{aligned}
f'(2) &= \lim_{h \to 0} \frac{f(2+h)-f(2)}{h} \\
&= \lim_{h \to 0} \frac{(2+h)^2-2^2}{h} \\
&= \lim_{h \to 0} \frac{(4 + 4h + h^2) - 4}{h} \\
&= \lim_{h \to 0} \frac{4h + h^2}{h} \\
&= \lim_{h \to 0} (4+h) \\
&= 4 + 0 = 4
\end{aligned}

Okay, you might be wondering: “what sort of witchcraft is this?!”. Before you grab your torch and pitchfork, allow me to explain – it’s not witchcraft at all and I guarantee you can do it too! 🙂

First we just substituted our x value of 2 into the formula for the slope, and we add the limit part to show that our little steps h along the x-axis get closer and closer to 0 (smaller and smaller). Then we simplified a bit until we end up with \lim_{h \to 0} (4+h) .
Now this last step is the important part. Since the \lim_{h \to 0} part says that h approaches 0, and the h in the denominator has cancelled out, we can now just substitute h=0 to get our final answer. It’s like we’re imagining what happens if we took a step with exactly 0 distance along the x-axis.

So, after all this, the gradient of the curve f(x) = x^2 at x = 2 is 4. Try the same process for yourself using a different value of x and see what you find.
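If you’d like to see the limit happen with actual numbers, here is a minimal Python sketch. The function names `difference_quotient` and `square` are my own choices for illustration. It computes the rise over run for smaller and smaller steps h at x = 2, and the slope should settle towards 4:

```python
def difference_quotient(f, x, h):
    """Average slope of f between x and x + h (rise over run)."""
    return (f(x + h) - f(x)) / h


def square(x):
    """Our example curve f(x) = x^2."""
    return x ** 2


# Shrink the step h and watch the slope at x = 2 get closer to 4.
for h in [1.0, 0.1, 0.01, 0.001, 0.0001]:
    slope = difference_quotient(square, 2, h)
    print(f"h = {h:<7} slope = {slope:.4f}")
```

Swap the 2 for any other value of x to try the exercise above without doing the algebra by hand.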

We can follow this approach to find the derivative of any function, by replacing f(x) with the function we want to differentiate in the general formula given above. Let’s try using x^2 again, but without substituting any value in for x this time:

\begin{aligned}
f(x) &= x^2 \\
f'(x) &= \lim_{h \to 0} \frac{(x+h)^2 - x^2}{h} \\
&= \lim_{h \to 0} \frac{(x^2 + hx + hx + h^2) - x^2}{h} \\
&= \lim_{h \to 0} \frac{2hx + h^2}{h} \\
&= \lim_{h \to 0} (2x + h) \\
&= 2x + 0 \\
&= 2x
\end{aligned}

Cool, so the derivative of f(x) = x^2 is: f'(x) = 2x. Simple! 🙂 We can even plot this alongside our original function to see what it looks like (black line below):

• As x approaches zero from the left, the original function f(x) = x^2 is decreasing, so the gradient (derivative function: green line) is negative and approaches 0. You can think of this as meaning that the slope of the curve is negative or going down.
• Then, as the function passes 0 and begins to increase, the gradient also becomes positive (because the curve is now going up).
• The derivative function f'(x) = 2x is linear, because the gradient of the original function is increasing at a constant rate as we move from left to right on the x-axis. In other words, the original function is getting steeper and steeper.
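The three observations above can be checked with a short Python sketch. Note that this is my own illustration and it uses a small symmetric step (one step to the left and one to the right of x) instead of the one-sided step from the definition, which is just a convenient assumption for getting clean numbers. The name `slope_of_square` is made up for this example:

```python
def slope_of_square(x, h=1e-6):
    """Approximate slope of f(x) = x^2 using a small symmetric step h."""
    return ((x + h) ** 2 - (x - h) ** 2) / (2 * h)


# Negative slope left of 0, zero slope at 0, positive slope right of 0.
for x in [-3, -1, 0, 1, 3]:
    approx = slope_of_square(x)
    print(f"x = {x:>2}  approx slope = {approx:6.2f}  2x = {2 * x:>2}")
```

The approximate slope should line up with 2x in every row.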

Now I have some good news for you: this is the complicated way of doing differentiation. If you found things difficult to understand so far, you can breathe a sigh of relief knowing that it all gets easier from here. I explained it this way so you could understand what is actually happening “under the hood” when we differentiate, but there are some shortcut rules you can learn to avoid using limits all the time.

## Shortcut rules for differentiation:

Here are eight great shortcut rules you can use to differentiate most functions. You should really memorise these because they’ll save you so much time in future. For each of the examples below I recommend you plot the original function and its derivative using this handy tool. It will help you understand what the derivative of each function looks like.

## 1. Constant rule

This one is easy. When differentiating, any constants in the function simply become 0, so:

\begin{aligned}
&f(x) = c \\
&f'(x) = 0
\end{aligned}

(Be careful, this doesn’t mean coefficients become 0, but we’ll get to that in a second).

## 2. Power rule

This differentiation rule is applicable if the input variable is raised to some power/exponent. For example: f(x) = x^3. For this rule we move the exponent n down in front of the input variable as a multiplier, then we replace the exponent with n-1. The rule looks like this:

\begin{aligned}
f(x) &= x^n \\
f'(x) &= nx^{n-1}\\
\end{aligned}

Here’s an example:

\begin{aligned}
&f(x) = x^3 \\
&f'(x) = 3x^{3-1} = 3x^2
\end{aligned}

When the exponent is a fraction (remember: fractional exponents are another way of writing roots), some more algebraic manipulation might be required to simplify the resulting derivative:

\begin{aligned}
&f(x) = x^{1/3} = \sqrt[3]{x} \\
&f'(x) = \frac{1}{3}x^{\frac{1}{3} -1 }= \frac{1}{3}x^{-\frac{2}{3}} = \frac{1}{3x^{\frac{2}{3}}}
\end{aligned}

Also be careful if the input variable is not raised to any power (it is understood that it is raised to the power of 1): f(x) = x. In these cases the rule still applies, but the outcome is simple: replace the whole variable with 1. Here’s why (remember: anything raised to the power of 0 is one):

\begin{aligned}
&f(x) = x = x^1 \\
&f'(x) = 1x^{1-1} = 1x^0 = 1\times1 = 1
\end{aligned}

The power rule is probably the most common rule you will need to use when differentiating, so learn it well! 🧠

## 3. Constant multiple rule

Here’s another easy one once you know the power rule: if the variable is raised to a power but also has a coefficient, then we multiply the coefficient by that power and continue with the power rule. Like this:

\begin{aligned}
&f(x) = 2x^3 \\
&f'(x) = 3\times2x^{3-1} = 6x^2
\end{aligned}

This one gets a little bit more complicated when fractions are involved, but there’s no need to worry. Just take a deep breath, apply the rules you already know and a bit of algebra, work through it step by step, and you’ll be fine:

\begin{align}
f(x) & = \frac{2}{5}x^{1/3} \\
\space f'(x) &= \frac{2}{5}\times \frac{1}{3}x^{\frac{1}{3} -1 }= \frac{2\times\frac{1}{3}x^{-\frac{2}{3}}}{5} \\
&= \frac{2}{5}\times\frac{1}{3}\times \frac{1}{x^{\frac{2}{3}}} \\
&= \frac{2 \times1 \times 1}{5 \times 3 \times x^{\frac{2}{3}}} \\
&= \frac{2}{15x^{\frac{2}{3}}}
\end{align}

In step (2) we applied the power rule. In step (3) we realised that anything to a negative power can be expressed as a fraction instead (and we switch the sign of the exponent too), e.g.: x^{-2} = \frac{1}{x^2}. In step (3) I also expressed each fractional term separately so you can see how it will all fit together more easily. After that, steps (4) and (5) should look pretty self explanatory to you 🙂
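The power rule and the constant multiple rule boil down to two tiny operations: multiply the coefficient by the exponent, then subtract 1 from the exponent. Here is a hedged sketch in Python. The function `power_rule` is a hypothetical helper of mine, and I use the standard library’s `Fraction` so the fractional example stays exact:

```python
from fractions import Fraction


def power_rule(coefficient, exponent):
    """Differentiate coefficient * x^exponent.

    Returns the (coefficient, exponent) pair of the derivative.
    """
    return coefficient * exponent, exponent - 1


# f(x) = x^3  ->  f'(x) = 3x^2
print(power_rule(1, 3))
# f(x) = 2x^3  ->  f'(x) = 6x^2
print(power_rule(2, 3))
# f(x) = (2/5) x^(1/3)  ->  f'(x) = (2/15) x^(-2/3)
print(power_rule(Fraction(2, 5), Fraction(1, 3)))
```

The last result has a negative exponent, which is the same thing as the fraction form in step (5) once you move x^{2/3} into the denominator.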

## 4. Sum rule

You know I said those other rules were easy? Well this one is super easy. For any function containing a sum of different terms involving the input variable, just treat all of them separately! So if you’re differentiating a function like this: f(x) = x^3 + x^2 + x + 7 you can use the rules we just learned for each term individually like this:

\begin{aligned}
f(x) & =  x^3 + x^2 + x + 7 \\
f'(x) & = 3x^{3-1} + 2x^{2-1} + 1x^{1-1} \\
& = 3x^{2} + 2x + 1 \\
\end{aligned}

## 5. Difference rule

This is exactly the same as the sum rule, but in this case instead of addition there is subtraction in the function. Just do exactly the same thing you did with the sum rule and treat each term separately 🙂
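Because the sum and difference rules let us treat each term separately, differentiating a whole polynomial is just the power rule applied term by term. Here is a small sketch, assuming we store a polynomial as a list of coefficients from the constant term upwards (that representation and the name `differentiate_polynomial` are my own choices):

```python
def differentiate_polynomial(coefficients):
    """Differentiate c0 + c1*x + c2*x^2 + ... given as [c0, c1, c2, ...]."""
    # Each term c * x^power becomes (power * c) * x^(power - 1).
    # The constant term c0 becomes 0 and drops off the front of the list.
    return [power * c for power, c in enumerate(coefficients)][1:]


# Sum rule: f(x) = 7 + x + x^2 + x^3  ->  f'(x) = 1 + 2x + 3x^2
print(differentiate_polynomial([7, 1, 1, 1]))    # [1, 2, 3]

# Difference rule: f(x) = -7 + x - x^2 + x^3  ->  f'(x) = 1 - 2x + 3x^2
print(differentiate_polynomial([-7, 1, -1, 1]))  # [1, -2, 3]
```

Subtraction is handled simply by using negative coefficients, which is why the difference rule needs no extra work.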

## 6. Product rule

Use this rule if you’re differentiating a function that contains the product of two functions with the input variable in them, like this example: f(x) = x^3 \times 2x^4. Here’s a neat way of remembering what to do in these situations (remember: the prime symbol ‘ indicates the derivative):

\begin{aligned}
y & = this \times that \\
y' & = this' \times that+ this \times that' \\
\end{aligned}

Let’s make that more concrete with an example:

\begin{aligned}
f(x) &= x^3 \times 2x^4 \\
f'(x) &= (3x^2 \times2x^4) + (x^3 \times 8x^3) \\
&= 6x^6 + 8x^6 = 14x^6
\end{aligned}
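A quick way to convince yourself the product rule gives the right answer here: x^3 \times 2x^4 is really just 2x^7, and the power rule says the derivative of that is 14x^6. The sketch below (function names are mine, purely for illustration) checks that “this' × that + this × that'” matches 14x^6 for a few whole numbers:

```python
def this(x):
    return x ** 3


def this_prime(x):
    return 3 * x ** 2


def that(x):
    return 2 * x ** 4


def that_prime(x):
    return 8 * x ** 3


def product_rule(x):
    """this' * that + this * that', evaluated at x."""
    return this_prime(x) * that(x) + this(x) * that_prime(x)


# Compare against the power-rule answer 14x^6 at several points.
for x in range(-3, 4):
    print(x, product_rule(x), 14 * x ** 6)
```

Both columns should match on every row.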

## 7. Quotient rule

Use this rule when the function contains a quotient (fraction) with a function of the input variable in the numerator and the denominator, something like this: f(x) = \frac{x^4}{3x^5}. For this rule we apply the following:

\begin{aligned}
y &= \frac{top}{bottom} \\
y' &= \frac{(top' \times bottom) - (top \times bottom') }{bottom^2} \\
\end{aligned}

Make sure you pay close attention to the prime ‘ symbols above, the subtraction, and the extra power of 2 in the denominator. Let’s try an example to see how this works:

\begin{aligned}
f(x) &= \frac{x^4}{3x^5} \\
f'(x) &= \frac{(4x^3 \times 3x^5) - (x^4 \times 15x^4)}{(3x^5)^2} \\
&= \frac{(4x^3 \times 3x^5) - (x^4 \times 15x^4)}{9x^{10}} \\
&= \frac{12x^8 - 15x^8}{9x^{10}} = \frac{-3x^8}{9x^{10}} = -\frac{1}{3x^2}
\end{aligned}

The trick with this rule is making sure you remember the correct order in the numerator and the extra power of 2 in the denominator.
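Here is a small sanity check of the quotient rule example, sketched in Python with exact fractions. The function name `quotient_rule` is made up for this post. Since \frac{x^4}{3x^5} is the same as \frac{1}{3x}, the power rule tells us the answer should be -\frac{1}{3x^2}, so we compare against that:

```python
from fractions import Fraction


def quotient_rule(x):
    """(top' * bottom - top * bottom') / bottom^2 for top = x^4, bottom = 3x^5."""
    top, top_prime = x ** 4, 4 * x ** 3
    bottom, bottom_prime = 3 * x ** 5, 15 * x ** 4
    return Fraction(top_prime * bottom - top * bottom_prime, bottom ** 2)


# x = 0 is left out because the original function is undefined there.
for x in [1, 2, 3, -2]:
    print(x, quotient_rule(x), Fraction(-1, 3 * x ** 2))
```

If you accidentally swap the order of the subtraction in the numerator, the sign flips and the two columns stop matching, which is a handy way to catch that mistake.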

## 8. Chain rule

This is probably the hardest rule for differentiating functions. You use it when there is a function inside another function. The name given to functions-within-a-function is a composite function. Here’s an example of what I mean: f(x) = \sqrt{x^4}. This is because the square root is a function, and the power counts as a function too. Here’s a handy rule for differentiating using the chain rule:

\begin{aligned}
y &= outer(inner) \\
y' &= outer(inner)' \times inner' \\
\end{aligned}

It’s really important to remember that when you differentiate to get outer(inner)' that you don’t differentiate the part inside the function. Let me show you what I mean:

\begin{aligned}
f(x) &= \sqrt{x^4} = (x^4)^{\frac{1}{2}}\\
f'(x) &= \frac{1}{2}(x^4)^{-\frac{1}{2}} \times 4x^3 \\
&=\frac{1}{2} \times \frac{1}{(x^4)^{\frac{1}{2}}} \times 4x^3 \\
&=\frac{1}{2} \times \frac{1}{x^2} \times 4x^3 \\
&=\frac{4x^3}{2x^2} \\
&= 2x
\end{aligned}


Notice in the first step how I differentiated \sqrt{something}. It doesn’t matter what this something is when differentiating the outer function. We don’t differentiate that inner something at the same time we differentiate the outer part. We differentiate it separately and then multiply the two together. I did also simplify the inner part after applying the chain rule, just so the final result was a bit nicer to read.

That’s really all there is to it. Depending on the situation you might have to do some different algebraic wizardry to simplify the result after using the chain rule, but just remembering the first part y' = outer(inner)' \times inner' is enough, and then you can rely on your algebra skills to carry you home 🙂
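To see the “differentiate the outer part, leave the inner part alone, then multiply by the inner derivative” recipe in action, here is a rough Python sketch for f(x) = \sqrt{x^4}. The function names are my own, and the result should match 2x (I avoid x = 0 because the outer derivative divides by zero there):

```python
def inner(x):
    return x ** 4


def inner_prime(x):
    return 4 * x ** 3


def outer_prime(u):
    """Derivative of the square root u^(1/2) with respect to u."""
    return 0.5 * u ** -0.5


def chain_rule(x):
    # Differentiate the outer function with the inner part left untouched,
    # then multiply by the derivative of the inner part.
    return outer_prime(inner(x)) * inner_prime(x)


for x in [-2, 1, 3]:
    print(x, chain_rule(x), 2 * x)
```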

The chain rule is especially cool because deep neural networks use it extensively in the backpropagation algorithm – this lets the neural net figure out which connections it needs to adjust in order to do better at some prediction task. This works because deep neural networks are really just big composite functions with many layers of functions-within-a-function. Like an onion of functions 🧅. I’ll cover this in more detail at a later date.

Anyway, if all else fails when you’re differentiating, and you can’t seem to differentiate some function and you don’t understand why, you can use this simple online tool to help you. If you do use that tool, I really recommend you study the result and figure out where you were making a mistake before. That’s it, now you know most of differentiation.

I wanted to cover the topic of limits in calculus. Calculus is often called the mathematics of change, and handles topics like rates of change. These are super important topics with tons of applications, especially within machine learning, deep learning and AI in general. I’ll build on this topic of limits in the following posts on derivatives and integrals; topics that will be fundamental when I cover Deep Learning from first principles at some later date. Let’s get stuck right in 🙂

## Limits – take it to infinity!

First some background. Limits are really a central idea in calculus. In plain English it goes something like this: think of a number (let’s say the number 4), now realise there are infinitely many decimal numbers either side of this number (e.g. 3.9999999999 and 4.0000000000001). If we keep making these decimals get longer and longer those numbers both edge closer and closer to the whole number (4) without ever quite reaching it. If we could keep going infinitely we’d eventually reach 4 (kinda), right? We would say that 4 is the limit. Okay, it’s not quite like that, but at least that’s the basic idea.

3.999999999 >>> 4 <<< 4.00000001

So why is this useful? Well it means that in any situation where some rate of a function is changing, we can figure out what the exact rate is at a given moment. Okay, that sounds confusing so let’s make it more concrete using an example: let’s say we find a dark, dried up well and we want to know how deep it is. We can’t see the bottom so we decide to drop a stone in and count how long it takes before we hear the stone bounce. We have the formula for how far the stone has fallen after a given number of seconds t:

d(t) = 0.5 \times 9.81 \times t^2

Where d(t) is the distance in metres the stone has fallen after t seconds. After we drop the stone, we hear it hit the bottom after 3.5 seconds. So the stone fell about 60 metres! We can calculate the average speed of the stone by dividing the distance by the time (where y is the distance and t is the time at two observations):

\begin{aligned}
&average \space speed = \frac{distance}{time}= \frac{y_2 - y_1}{t_2 - t_1}
\end{aligned}

After plugging in the numbers, we discover the stone fell at about 17.14 m/s! But this is an average. If we drew a graph for our function d(t)=0.5 \times 9.81 \times t^2 , this approach would be equivalent to drawing a straight line from 0 to 3.5 on the graph and calculating the good old “rise over run” to get the slope. In this instance the slope is our speed. See for yourself below:

The green secant line (intersects the curve in two places) connects the points (0,0) and (3.5, 60). The slope of the green line is the average speed during this time: 60/3.5 ≈ 17.14 m/s! It’s good to keep this visualisation in mind for when we cover derivatives – we are drawing a straight line and using it to approximate the actual observed curved line. We will do almost exactly the same thing with derivatives later on!
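The secant-line idea translates almost word for word into code. This is only a sketch with names I picked myself (`distance`, `average_speed`), and because it doesn’t round the depth to 60 metres the numbers it prints come out slightly higher than the rounded ones used in the text:

```python
G = 9.81  # acceleration due to gravity in m/s^2


def distance(t):
    """Distance in metres the stone has fallen after t seconds."""
    return 0.5 * G * t ** 2


def average_speed(t1, t2):
    """Slope of the secant line between t1 and t2 (rise over run)."""
    return (distance(t2) - distance(t1)) / (t2 - t1)


print(f"depth after 3.5 s: {distance(3.5):.2f} m")
print(f"average speed from 0 s to 3.5 s: {average_speed(0, 3.5):.2f} m/s")
```

You can pass any two times to `average_speed` to get the average speed over that range.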

We can use this approach to find the average speed at any given time range, for example how fast did the stone fall between 1 second and 3.5 seconds? We know that after 3.5 seconds the stone had fallen 60m, and using our formula for distance (or the graph above) we can see that the stone fell 4.9m after 1 second. So between 1 second and 3.5 seconds the stone fell 60 - 4.9 = 55.1 metres, over the course of 2.5 seconds (3.5 – 1):

Average \space speed = \frac{60 - 4.9}{3.5 - 1} = \frac{55.1}{2.5} = 22.04

## Finding the ‘instantaneous’ speed:

Cool, so the average speed between 1 second and 3.5 seconds was about 22 m/s! But can we apply the approach to find the exact speed at 1 second after the stone was dropped? Let’s try it:

Average \space speed = \frac{4.9 - 4.9}{1 - 1} = \frac{0}{0} = ?

Okay so \frac{0}{0} is undefined. Does this mean the speed at 1 second was nothing… ? You must be thinking that cannot be true, and you are right. This is where limits come in. Let’s express this whole thing as a limit, and substitute our original distance equation into the equation for speed:

Instantaneous \space speed =\lim_{t\to1} \frac{(0.5 \times 9.81 \times t^2) - (0.5 \times 9.81 \times 1^2)}{t - 1}

This formula is exactly the same as the one above for the average speed, where we take the difference between two distances at two different times to figure out how fast the stone was moving. We also substituted in some given time t, and also the distance at 1 second (because we want to find the speed of the stone at exactly 1 second, remember?).

You read the \lim_{t\to1} part like this: as t approaches 1. What does this mean? Think back to the start of this post where we pointed out that there are infinitely many numbers on either side of a given number, that’s what this \lim_{t\to1} is saying: that if t takes increasingly tiny steps closer and closer to 1 in the following expression, then what output value do we start to get closer to? You can imagine this visually using the same straight line on the graph of a curve we showed above, with one difference: imagine you are moving the two ends of the straight line (where it intersects the curve) closer and closer to 1. What does the slope of that straight line get closer and closer to as we move the ends closer and closer to 1? If we get the ends of this line as close as possible to 1, and zoom in really close, the curve will also look like a straight line, and the straight line we drew will be tangent to the curve. So we can use the slope of the straight line we drew to approximate the slope (a.k.a gradient) of the curve at this point! 🙂

We have a little more work to do before we can figure out what that speed value is at exactly 1 second though. We already tried substituting 1 for t in the equation Average \space speed = \frac{4.9 - 4.9}{1 - 1} = \frac{0}{0} = ? and it was undefined. So what do we do? The good news is we can do some good old algebra to make this work:

\begin{aligned}
Instantaneous \space speed &=\lim_{t\to1} \frac{(0.5 \times 9.81 \times t^2) - (0.5 \times 9.81 \times 1^2)}{t - 1}\\
&= \lim_{t\to1} \frac{(4.905 \times t^2) - (4.905 \times 1^2)}{t - 1}\\
&= \lim_{t\to1} \frac{4.905(t^2 -1^2)}{t - 1}\\
&= \lim_{t\to1} \frac{4.905(t-1)(t+1)}{t - 1}\\
&= \lim_{t\to1}{4.905(t+1)}\\
\end{aligned}

After a bit of simplification and factoring, now we have a form where we can plug in our value (1) and see what the ‘instantaneous’ speed of the stone is at exactly 1 second after we dropped it into the well:

= 4.905(1+1) = 4.905 \times 2 = 9.81

So all we need to do is simplify our function and that will make it possible for us to substitute the limit value into the function to find out what the output is. This works because our algebraic manipulation will plug the gap in the function where it is undefined.

So we figured it out, at one second after we dropped the stone, the speed of the falling stone is 9.81 metres per second!
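We can also watch this limit happen numerically instead of doing the algebra. The sketch below (helper names are my own) slides the second time point closer and closer to 1 second and prints the average speed each time, which should creep towards 9.81 m/s:

```python
def distance(t):
    """Distance in metres the stone has fallen after t seconds."""
    return 0.5 * 9.81 * t ** 2


def average_speed(t1, t2):
    """Average speed between t1 and t2 (slope of the secant line)."""
    return (distance(t2) - distance(t1)) / (t2 - t1)


# Move the second point towards t = 1 without ever using t = 1 itself,
# because that would give us the undefined 0/0 again.
for t in [3.5, 2.0, 1.5, 1.1, 1.01, 1.001]:
    print(f"from 1 s to {t} s: {average_speed(1, t):.4f} m/s")
```

Approaching from the other side (for example t = 0.999) should land on the same value.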

## Summing it all up:

Let’s take a moment to digest everything we just did. We figured out how deep our wishing well is. We realised that speed is a rate \frac{distance}{time}, and that we can use this fraction to figure out the average speed over any given distance and time. But we can’t use it to figure out the exact speed in a single moment. We realised that we can use limits to figure out how fast the stone is travelling at any moment, by gradually making t closer and closer to the time we wanted the speed for (1 second). We used some algebra to rearrange our formula for speed so that we could plug in the exact value (1 second) and find the exact speed of the stone one second after dropping it into the well!
We realised that we can think of this whole process as drawing a secant line across our function curve, and slowly moving that secant line closer and closer to our point of interest until it becomes a tangent line. The slope of this tangent line will be the same as the slope of the curve at that exact point.

## Finding Limits – step by step process:

All of the above was just for context, this part is really all you need to find limits out in the wild:

1. We tried substituting our value of the limit (1) into the function, and we got \frac{0}{0} which is undefined.
2. We simplified and factored our function.
3. We plugged the value of our limit (1) back into the formula and got the speed of our stone at the exact point we were interested in!

We can follow this general approach to find pretty much any limit. You should really memorise this process. If a function contains a fraction with square roots the approach will be slightly different: we need to rationalise the function by multiplying the numerator and denominator by the conjugate of the expression containing the square root (a conjugate is just the same expression with + and – switched) like this:

\begin{aligned}
&\lim_{x\to4}{\frac{\sqrt{x} -2}{x-4}}\\
&=\lim_{x\to4}{\frac{\sqrt{x} -2}{x-4} \times \frac{\sqrt{x}+2}{\sqrt{x}+2}}\\
&=\lim_{x\to4}{\frac{(\sqrt{x})^2 -2^2}{(x-4)(\sqrt{x} +2)}}\\
&=\lim_{x\to4}{\frac{x - 4}{(x-4)(\sqrt{x} +2)}}\\
&=\lim_{x\to4}{\frac{1}{\sqrt{x} +2}}\\
\end{aligned}

And now the substitution for x = 4 will work: \frac{1}{\sqrt{4}+2} = \frac{1}{4} 🙂
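If you want to double check a limit like this one, a quick numerical sketch helps. The code below (the name `g` is just a placeholder of mine) evaluates the original, unsimplified fraction at values just below and just above 4, and both sides should close in on 0.25:

```python
import math


def g(x):
    """The original fraction (sqrt(x) - 2) / (x - 4), undefined at x = 4."""
    return (math.sqrt(x) - 2) / (x - 4)


# Approach x = 4 from the left and from the right.
for step in [0.1, 0.01, 0.001, 0.0001]:
    left = g(4 - step)
    right = g(4 + step)
    print(f"step = {step:<7} left = {left:.6f}  right = {right:.6f}")
```

This doesn’t replace the algebra, but it is a nice way to catch a slip in the simplification.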

In some other cases you might need to do some other algebraic simplification/manipulation. You can try adding/subtracting/multiplying/dividing fractions, cancelling, etc. In the end, just try to simplify the function and then substitution should work. If all else fails, use this online tool to see a breakdown of how to simplify and find limits!
