C.1 Chain Rule - Developing a Conceptual Understanding

Calculus students often hear that "the Chain Rule" is scary or difficult. Let's break it down, and learn how to find the most complicated derivatives. On this screen we'll develop a conceptual understanding of the Chain Rule, and then on the next screen we'll present the rule and work some initial practice problems.

You might have noticed that, so far in this "Calculating Derivatives" chapter, we have only found the derivative of simple functions like $x^2,$ $\sin x,$ $e^x,$ and the product and quotient of such functions. We have not yet found the derivative of a composite (or compound) function like $\sin(2x),$ or $e^{x^2}.$ The reason is that we need a new tool to be able to do so! That tool is called "the Chain Rule," which we will spend this entire section exploring because it's that important. We'll of course also provide lots of practice so that you can use it confidently and comfortably.

Note: Some students initially find abstract discussion of the Chain Rule difficult to understand. If you're one of them, we encourage you to jump to the Check Questions at the bottom of next screen to see how easy the Chain Rule actually is to use in practice. You can then proceed to the "Basic Practice" screen to develop your problem-solving skills further. Once you see how to use the Chain Rule routinely, you may find the discussion of why it works the way it does easier to follow.

Why we need a new rule

To begin, let's quickly consider two examples to illustrate why we need a new rule at all.

First, consider the function $f(x) = (2x)^3.$ Viewing this function as $f(x) = 8x^3,$ we know from the Power Rule that its derivative is $f'(x) = 24x^2.$

Naively, looking at $f(x) = (2x)^3$ to find the derivative, you might simply bring the 3 down in front of the parentheses and change the power to 2: $f'(x) = [(2x)^3]' \overset{?}{=} 3(2x)^2 = 3(4x^2) = 12x^2 \neq 24x^2.$ Does this naive approach work? No! $[(2x)^3]' \neq 3(2x)^2.$ Ack: our naive approach gives a result that is off by a factor of 2 from the correct answer.

As a second example, consider the function $p(x) = (3x - 1)^2.$ If we view this function as $p(x) = 9x^2 - 6x + 1,$ then we know immediately from the Power Rule that its derivative is $p'(x) = 18x - 6.$

Again thinking naively, you might simply bring the power of 2 down in front of the parentheses: $p'(x) = [(3x - 1)^2]' \overset{?}{=} 2(3x - 1) = 6x - 2 \neq 18x - 6.$ Does this naive approach work? No! $6x - 2 \neq 18x - 6.$ Again the naive approach doesn't work, and this time we're off by a factor of 3. (Hmmm.)
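If you'd like to see those two factors appear numerically, here is a minimal Python sketch. It estimates each true derivative with a central difference and compares it to the naive guess. The helper name `numerical_derivative`, the step size `h`, and the sample point `x0` are our own choices for illustration, not part of the lesson.

```python
def numerical_derivative(func, x, h=1e-6):
    """Central-difference estimate of the derivative of func at x."""
    return (func(x + h) - func(x - h)) / (2 * h)

def f(x):
    return (2 * x) ** 3

def p(x):
    return (3 * x - 1) ** 2

x0 = 1.5  # arbitrary sample point (an assumption; any x where the naive guess is nonzero works)

f_true = numerical_derivative(f, x0)   # estimate of the actual f'(x0)
f_naive = 3 * (2 * x0) ** 2            # the naive guess 3(2x)^2
p_true = numerical_derivative(p, x0)   # estimate of the actual p'(x0)
p_naive = 2 * (3 * x0 - 1)             # the naive guess 2(3x - 1)

print(f_true / f_naive)  # approximately 2
print(p_true / p_naive)  # approximately 3
```

The ratios come out near 2 and 3, matching the "off by a factor of" observations above.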

As you'll see below, the Chain Rule resolves this discrepancy, and will let us — easily, with practice! — find the derivatives of functions that are quite complicated.

In order to understand the Chain Rule, we first need to make sure we're clear about compound functions.

Compound (Composite) Functions Review

Recall that a compound function, also known as a composite function, is a function built by placing one function inside another (possibly several layers deep).

For instance, $(x^2 + 1)^7$ is comprised of the inner function $x^2 + 1$ inside the outer function $(\dots)^7.$

As another example, $e^{\sin x}$ is comprised of the inner function $\sin x$ inside the outer function $e^{\dots}.$

As yet another example, $\ln(t^3 - 2t^2 + 5)$ is comprised of the inner function $t^3 - 2t^2 + 5$ inside the outer function $\ln(\dots).$

"How can I tell what the inner and outer functions are?"

Here's a foolproof method to determine the inner and outer functions: Imagine calculating the numerical output of the function for a particular input value of x and identify the steps you would take, because you'll always automatically start with the inner function and work your way out to the outer function.

For example, imagine computing $(x^2 + 1)^7$ for $x = 3.$ Without thinking about it, you would first calculate $x^2 + 1$ (which equals $3^2 + 1 = 10$), so that's the inner function, guaranteed. Then you would next calculate $10^7,$ and so $(\dots)^7$ is the outer function.

This imaginary computational process works every time to identify correctly what the inner and outer functions are.
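The same inside-first order shows up if you write the computation as code. The short Python sketch below mirrors the $(x^2 + 1)^7$ example; the function names `inner` and `outer` are just illustrative labels we chose.

```python
def inner(x):
    # The step you perform first: x^2 + 1
    return x ** 2 + 1

def outer(u):
    # The step you perform last: raise the result to the 7th power
    return u ** 7

x = 3
u = inner(x)       # computed first, so this is the inner function
result = outer(u)  # computed second, so this is the outer function

print(u)       # 10
print(result)  # 10000000
```

Whatever you are forced to compute first is the inner function, and whatever you apply to that result is the outer function.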

Example 1: Identify Inner and Outer Functions of Compound Functions

Each function below can be thought of as a composition of functions, $f(g(x)),$ where $g(x)$ is the "inside" function, whose output becomes the input of the "outside" function $f.$ In each case, identify an inside and an outside function that, when composed, are equivalent to the given function.

Note: Often there is more than one way to define the inside and outside functions, and even to determine how many "layers deep" the functions go. Our solution below may not be the only correct possibility.

(a) $p(x) = (3x - 1)^2.$
(b) $s(x) = \dfrac{1}{1 + e^{-x}}.$ (We'll view this as being comprised of three functions.)
(c) $f(t) = A \cos(bt).$

Solution.

In the tables below we present three different ways of describing each function's decomposition: verbal description, "box notation," and more common "function notation" using x, u, t and such.

(a) Given $p(x) = (3x - 1)^2,$ if you were to compute $p(4)$ you would first calculate $(3 \cdot 4 - 1) = 11,$ so $3x - 1$ is the inner function. You would then square that value of 11, and so $(\square)^2$ is the outer function.

	Inside	Outside
description	multiply the input by 3, and subtract 1	square the input
boxes	$3\square - 1$	$\square^2$
function notation	$g(x) = 3x - 1$	$f(u) = u^2$

So, composing our outside and inside functions, we get $f(g(x)) = (3x - 1)^2.$ Another way to say it is that we take all of the "stuff" from our inside function, and put it into the input box of the outside function.
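To make "put the inside function into the input box of the outside function" concrete, here is a small Python sketch for part (a). The helper `compose` is a hypothetical utility we define here for illustration; `g` and `f` follow the function-notation row of the table.

```python
def compose(outside, inside):
    """Return the function x -> outside(inside(x))."""
    return lambda x: outside(inside(x))

def g(x):
    # Inside function: multiply the input by 3, and subtract 1
    return 3 * x - 1

def f(u):
    # Outside function: square the input
    return u ** 2

p = compose(f, g)  # p(x) = f(g(x)) = (3x - 1)^2

print(g(4))  # 11
print(p(4))  # 121
```

Evaluating `p(4)` first produces 11 from the inside function and then squares it, exactly as in the verbal description.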

Alternate notation: You may see the compound (or composite) function $f(g(x))$ written instead as $(f \circ g)(x).$ These mean exactly the same thing, and both are said as "f of g of x." Most students prefer the former notation, as do we, and so we'll almost always use it as we did above. However, if you encounter $(f \circ g)(x),$ please know it's simply a different way of writing $f(g(x)).$

Now please set aside this quick review of compound functions. It'll be important again in a bit, but first we're going to develop an intuitive understanding of the Chain Rule before we present it formally.

Developing a Conceptual Understanding of the Chain Rule: A balloon ascends and cools

Before we present the official Chain Rule, let's consider an example situation to illustrate the basic idea.

Imagine a balloon that travels straight upward at a rate given in m/s.

As you may know, as you move upward away from the Earth's surface, the temperature of the air decreases. Specifically, the air around the balloon gets cooler and cooler at a rate given in $^\circ\text{F}/\text{m}.$ For simplicity, let's imagine the balloon's temperature always matches that of the surrounding air.

[Figure: a hot air balloon ascending. Left label, next to an upward-pointing arrow: the balloon ascends at the time-rate d(elevation)/d(time) = dE/dt = 0.004 m/s. Right label: air temperature changes at the elevation-rate d(temperature)/d(elevation) = dT/dE = -0.01 °F/m.]

To keep our focus on the key point here, we're going to pretend that the two rates are constant. Specifically, let's imagine that the balloon ascends such that its elevation, E, changes at a constant rate with respect to time: $\dfrac{d(\text{elevation})}{d(\text{time})} = \dfrac{dE}{dt} = 0.004\ \tfrac{\text{m}}{\text{s}}$ Hence, for instance, in 1 second the balloon ascends 0.004 meters.

Let's also imagine that as the balloon travels upward from the Earth's surface, its temperature, T, changes at a constant rate with respect to elevation: $\dfrac{d(\text{temperature})}{d(\text{elevation})} = \dfrac{dT}{dE} = -0.01\ \tfrac{^\circ\text{F}}{\text{m}}$ For instance, when the balloon gains 1 meter in elevation, its temperature changes by $-0.01\ ^\circ\text{F}.$

Here's the question that gets to the core of the Chain Rule:

What is $\dfrac{d(\text{temperature})}{d(\text{time})}?$
For instance, in 1 second, how much does the balloon's temperature change?

[Do you have an answer in mind? If not, please stop and develop one for yourself. In particular, imagine what happens over 1 second: the balloon travels upward ___ m, which means its temperature changes by . . . . ]

If your instinct was simply to multiply the two rates, then great! Hold onto that intuition, because it is perfectly correct and is at the core of the Chain Rule.

If not, then think about what happens over the course of a 1-second time change. Given $dE/dt = 0.004\ \tfrac{\text{m}}{\text{s}},$ over 1 second the balloon's elevation increases by 0.004 meters.

Now shift focus. When the balloon's elevation increases by 0.004 meters, what temperature change does it undergo? Since the temperature rate of change is $dT/dE = -0.01\ ^\circ\text{F}/\text{m},$ over the elevation increase of 0.004 meters that occurs in 1 second the balloon's temperature changes by $\Delta T = (0.004\ \text{m})\left(-0.01\ \tfrac{^\circ\text{F}}{\text{m}}\right) = -0.00004\ ^\circ\text{F}$ So for a time change of 1 second, the balloon's temperature changes by $-0.00004\ ^\circ\text{F}.$

Having focused on what happens in 1 second, let's return to the time-rate at which the balloon's temperature T changes, $\dfrac{dT}{dt}.$ To find the small change in temperature, dT, relative to a small change in time, dt, we simply multiply the rate at which the temperature changes with elevation (dT/dE) by the rate at which elevation changes with time (dE/dt): $\dfrac{dT}{dt} = \dfrac{dT}{dE} \cdot \dfrac{dE}{dt} = \left(-0.01\ \tfrac{^\circ\text{F}}{\text{m}}\right)\left(0.004\ \tfrac{\text{m}}{\text{s}}\right) = -0.00004\ \tfrac{^\circ\text{F}}{\text{s}}$ Notice that the meters cancel in this calculation, leaving $^\circ\text{F}/\text{s},$ as we would expect.
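The balloon arithmetic can also be checked with a few lines of Python. This is only a sketch under the lesson's constant-rate assumption: the two rates come from the text, while the starting temperature `SURFACE_TEMP_F` and the function names are hypothetical values we introduce so that there is something to evaluate.

```python
DE_DT = 0.004           # m/s, rate of elevation change (from the text)
DT_DE = -0.01           # deg F per m, rate of temperature change (from the text)
SURFACE_TEMP_F = 60.0   # hypothetical starting temperature; not given in the text

def elevation(t):
    """Elevation in meters after t seconds, at a constant ascent rate."""
    return DE_DT * t

def temperature(E):
    """Temperature in deg F at elevation E meters, at a constant cooling rate."""
    return SURFACE_TEMP_F + DT_DE * E

# Multiply the two rates, as in the text.
product_of_rates = DT_DE * DE_DT

# Compare with the temperature change actually observed over 1 second.
t1, t2 = 0.0, 1.0
observed_rate = (temperature(elevation(t2)) - temperature(elevation(t1))) / (t2 - t1)

print(product_of_rates)  # about -0.00004 deg F per second
print(observed_rate)     # agrees up to floating-point rounding
```

The starting temperature drops out of the comparison, so any other value would give the same rate.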

If that all makes sense, you have the fundamental idea behind the Chain Rule.

Recasting the balloon scenario in terms of functions

To see how this scenario relates to compound functions, let's recast the balloon's temperature change in function notation.

We know that variations in time result in corresponding variations in elevation, so elevation is a function of time. We denote this functional dependence by writing $E = E(t).$ (That equation is read as "elevation E is a function of time t," and that's all it means.)

Similarly, variations in elevation result in corresponding variations in temperature, so temperature is a function of elevation. We denote this by writing $T = T(E).$

Putting the pieces together, we can write temperature T as a function of time t as the composition of the functions T and $E,$ written as $T(t) = T(E(t)).$ So really, from the beginning we could have said this scenario considers a composition of the functions T and $E.$ Here $E(t)$ is just a pesky intermediate function that we had to go through to see what the relationship between the balloon's temperature and time is.

Let's now return to the general compound function $f(g(x)),$ which for the moment we'll write as $y(u(x)).$ Your quick calculation above shows that if you are interested in the derivative of y with respect to x, but there's a pesky intermediate function u between y and x, you can still find the derivative (easily!) by taking the derivative of the outside function with respect to the inside function, $\dfrac{dy}{du},$ and then multiplying by the derivative of the inside function with respect to the input variable x, $\dfrac{du}{dx}.$ That is, $\dfrac{dy}{dx} = \dfrac{dy}{du} \cdot \dfrac{du}{dx}$ This is exactly the process you probably landed on intuitively when you multiplied the two rates and computed $\dfrac{dT}{dt} = \dfrac{dT}{dE} \cdot \dfrac{dE}{dt}$ above.
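As a numerical spot-check of this product of rates (not a proof), the Python sketch below uses the earlier function $(2x)^3,$ written as an outside function $y(u) = u^3$ and an inside function $u(x) = 2x.$ All three derivatives are estimated with central differences; the helper `derivative`, the step size `h`, and the sample point `x0` are our own illustrative choices.

```python
def derivative(func, x, h=1e-6):
    """Central-difference estimate of the derivative of func at x."""
    return (func(x + h) - func(x - h)) / (2 * h)

def u(x):
    # Inside function
    return 2 * x

def y(u_val):
    # Outside function
    return u_val ** 3

x0 = 1.5  # arbitrary sample point (an assumption)

dy_du = derivative(y, u(x0))               # rate of the outside function, evaluated at u(x0)
du_dx = derivative(u, x0)                  # rate of the inside function, evaluated at x0
dy_dx = derivative(lambda x: y(u(x)), x0)  # rate of the whole composition

print(dy_du * du_dx)  # approximately 54
print(dy_dx)          # approximately 54, matching 24 * x0**2
```

Note that the outside rate is evaluated at the inside function's output, `u(x0)`, not at `x0` itself.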

The key thing to notice: to find the rate-of-change of the overall function $T(t) = T(E(t))$ with respect to the inner variable t, you automatically multiplied the rate-of-change of the outer function by the rate-of-change of the inner function. That is the Chain Rule, which we will formalize on the next screen. We'll also work some initial practice problems, and easily resolve the discrepancy we saw at the beginning of this screen when finding the derivative of $f(x) = (2x)^3$ and $p(x) = (3x - 1)^2.$

The Upshot

A compound (or composite) function is comprised of an outer function and an inner function.
Typically, without much thought, to find the rate-of-change of a compound function in an everyday scenario you would automatically multiply the rate-of-change of the outer function with the rate-of-change of the inner function. That is the Chain Rule.