Differentiation

Contents · Rules of Differentiation

Differentiability

Motivation: The basic observation that we will use to motivate the multivariable definition of differentiability is the fact that many surfaces look like a plane if you zoom in far enough. We call the plane that a surface looks like when you zoom in a lot a tangent plane. The problem we will task ourselves with is how exactly we should go about computing tangent planes.

It always helps to start with concrete examples. Let's consider the function $f: \mathbf{R}^2 \rightarrow \mathbf{R}$ such that $f(p) = x(p)^2 + y(p)^2$ for all $p \in \mathbf{R}^2$. Suppose we want to compute the plane tangent to the graph of this function at the point $(1, 1, 2)$ (pictured). How should we go about it?

Well, we can start by shifting the graph of the function around so that the point $(1, 1, 2)$ on the surface relocates to the origin. This is useful because the tangent plane to the shifted surface at the origin will be (1) identical to the tangent plane of interest, except for its position, and (2) the graph of some linear form, which is good because we know what those look like.


Example: Let $a = (1, 1)$ and let $h \in \mathbf{R}^2$. Then the surface given by graphing $f(a + h) - f(a)$, where $h$ is the input, is the shifted-over surface we want. Let's expand this expression to see if we can find a linear form.

$$f(a + h) - f(a) = \langle 2 + x(h), 2 + y(h) \rangle \cdot h$$
Show Calculation $$\begin{align*} f(a + h) - f(a) &= x(a + h)^2 + y(a + h)^2 - x(a)^2 - y(a)^2\\ &= (x(a) + x(h))^2 + (y(a)+y(h))^2 - x(a)^2 - y(a)^2\\ &= (1 + x(h))^2 + (1 + y(h))^2 - 2\\ &= 1 + 2x(h) + x(h)^2 + 1 + 2y(h) + y(h)^2 - 2\\ &= 2x(h) + x(h)^2 + 2y(h) + y(h)^2\\ &= (2 + x(h))x(h) + (2 + y(h))y(h)\\ &= \langle 2 + x(h), 2 + y(h) \rangle \cdot h \end{align*}$$

Well, this looks almost like a linear form. We have a vector dotted with the input, after all. The problem is those pesky $x(h)$'s and $y(h)$'s inside the vector. If it weren't for them, the vector on the left would be constant, and we'd be done. But we can't just set them to 0, because then $h$ would be 0, so the whole thing would be 0.

But we have a trick up our sleeve. What if we define a separate function $\phi: \mathbf{R}^2 \rightarrow \mathbf{R}^2$ such that $\phi(h) = \langle 2 + x(h), 2 + y(h) \rangle$? Then we could evaluate $\phi(0) = \langle 2, 2\rangle$ and get a linear form $\lambda(h) = \phi(0)\cdot h$. As it turns out, the graph of this linear form is the tangent plane we wanted! However, now we have another question to answer. Does this trick "work" even in situations where there is no tangent plane? And if not, what changes?
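If you'd like to see this agreement concretely, here is a small numerical sketch (the names `f` and `grad` are mine, not part of the text): at $a = (1, 1)$, the difference $f(a + h) - f(a)$ should match the linear form $\lambda(h) = \langle 2, 2\rangle \cdot h$ up to an error that shrinks faster than $\|h\|$.

```python
# Numerical sanity check: the error of the linear approximation at a = (1, 1),
# divided by ||h||, should shrink to 0 as h does.
import math

def f(x, y):
    return x**2 + y**2

a = (1.0, 1.0)
grad = (2.0, 2.0)  # phi(0) = <2, 2> from the example

for n in (1, 2, 3, 4):
    t = 10.0**-n
    h = (t, t)
    diff = f(a[0] + h[0], a[1] + h[1]) - f(*a)
    lam = grad[0] * h[0] + grad[1] * h[1]
    ratio = abs(diff - lam) / math.hypot(*h)
    print(n, ratio)  # the ratio shrinks linearly with ||h||
```

Here the leftover error is exactly $x(h)^2 + y(h)^2$, which is quadratic in $\|h\|$, so the ratio goes to 0.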


Example: Consider the function $f$ defined by $f(p) = \sqrt{x(p)^2 + y(p)^2}$. The graph of $f$ is a cone opening upward, with its tip exactly at the origin. No matter how close you get to the tip, it will never look like a plane, instead keeping its sharpness. What happens if we try to do what we did last time?

Well, setting $a = (0,0)$ and computing $f(a + h) - f(a)$ gives us the following candidate for $\phi$. Hideous though it may be, this function has the property that $f(a + h) - f(a)$ is equal to $\phi(h)\cdot h$ for all $h$, just like in the last example.

$$ \phi(h) = \begin{cases} \psi(h) & x(h) \neq 0, y(h) \neq 0\\ |x(h)|\hat \imath / x(h) & x(h) \neq 0, y(h) = 0\\ |y(h)|\hat \jmath / y(h) & x(h) = 0, y(h) \neq 0\\ 0 & x(h) = 0, y(h) = 0\\ \end{cases} $$ $$\psi(h) = \left\langle \frac{1}{2}\frac{|x(h)|}{x(h)}\sqrt{1 + \frac{y(h)^2}{x(h)^2}}, \frac{1}{2}\frac{|y(h)|}{y(h)}\sqrt{\frac{x(h)^2}{y(h)^2} + 1}\right\rangle$$
Show Calculation

The last three cases are relatively straightforward, so you should verify them yourself. We show the first case as it is the trickiest.

$$\begin{align*} f(a + h) - f(a) &= \sqrt{x(a + h)^2 + y(a + h)^2} - \sqrt{x(a)^2 + y(a)^2}\\ &= \sqrt{x(h)^2 + y(h)^2}\\ &= \frac{1}{2}(\sqrt{x(h)^2 + y(h)^2} + \sqrt{x(h)^2 + y(h)^2})\\ &= \frac{1}{2}\left(\sqrt{x(h)^2\left(1 + \frac{y(h)^2}{x(h)^2}\right)} + \sqrt{y(h)^2\left(\frac{x(h)^2}{y(h)^2} + 1\right)}\right)\\ &= \frac{1}{2}\left(|x(h)|\sqrt{1 + \frac{y(h)^2}{x(h)^2}} + |y(h)|\sqrt{\frac{x(h)^2}{y(h)^2} + 1}\right)\\ &= \frac{1}{2}\frac{|x(h)|}{x(h)}\sqrt{1 + \frac{y(h)^2}{x(h)^2}}\,x(h) + \frac{1}{2}\frac{|y(h)|}{y(h)}\sqrt{\frac{x(h)^2}{y(h)^2} + 1}\,y(h)\\ &= \left\langle \frac{1}{2}\frac{|x(h)|}{x(h)}\sqrt{1 + \frac{y(h)^2}{x(h)^2}}, \frac{1}{2}\frac{|y(h)|}{y(h)}\sqrt{\frac{x(h)^2}{y(h)^2} + 1}\right\rangle \cdot h \end{align*}$$

Of course, just because $\phi$ is ugly doesn't mean we should judge it for that. It's the personality that counts. Unfortunately, $\phi$'s personality is also not good enough. Consider the sequences of vectors $h_n = \langle 1/n, 1/n \rangle$ and $k_n = -h_n$. These sequences approach 0, and for all $n$, because of the absolute values, we have that $\phi(h_n) = \langle \sqrt 2, \sqrt 2\rangle/2$ and $\phi(k_n) = -\langle \sqrt 2, \sqrt 2\rangle/2$, which you should verify. Since these are constant sequences, these must be their limits. Since neither limit equals $\phi(0) = 0$, $\phi$ is not continuous at 0.

Worse still, because the limits disagree, we cannot even change the value of $\phi(0)$ in order to make $\phi$ continuous at 0. We must therefore conclude that there is no continuous function $\phi$ such that $f(a + h) - f(a) = \phi(h)\cdot h$ in this particular case. Contrast this with the previous example, where the very first $\phi$ we considered was obviously continuous, as both components were polynomials. Having made this observation, we can now state a definition.
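The failure is easy to see numerically. Below is a sketch (helper names are mine) implementing the piecewise $\phi$ from above, checking the identity $f(a + h) - f(a) = \phi(h) \cdot h$, and evaluating $\phi$ along the sequences $h_n$ and $k_n$: the two sequences produce constant, opposite outputs, so $\phi$ has no limit at 0.

```python
# The piecewise phi for the cone (a = 0, so f(a + h) - f(a) = f(h)).
import math

def f(x, y):
    return math.sqrt(x**2 + y**2)

def phi(x, y):
    if x != 0 and y != 0:
        return (0.5 * (abs(x) / x) * math.sqrt(1 + y**2 / x**2),
                0.5 * (abs(y) / y) * math.sqrt(x**2 / y**2 + 1))
    if x != 0:
        return (abs(x) / x, 0.0)
    if y != 0:
        return (0.0, abs(y) / y)
    return (0.0, 0.0)

for n in (1, 10, 100):
    t = 1.0 / n
    px, py = phi(t, t)
    # the identity f(h) = phi(h) . h holds at every h ...
    assert math.isclose(f(t, t), px * t + py * t)
    # ... but phi is constant along h_n and along k_n, with opposite values
    print(phi(t, t), phi(-t, -t))
```

Since the two constant values $\pm\langle \sqrt 2, \sqrt 2\rangle/2$ disagree, no choice of $\phi(0)$ can make $\phi$ continuous at 0, matching the argument above.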


Definition: We say a function $f: \mathbf{R}^n \rightarrow \mathbf{R}$ is differentiable at $a \in \mathbf{R}^n$ if there exists a function $\phi$ continuous at $0$ such that $f(a+h) - f(a) = \phi(h) \cdot h$. We call $\phi(0)$ the gradient $\nabla f(a)$ and the linear map defined by the formula $\nabla f(a)\cdot h$ the differential $df(a)$.


Remark: If you're not familiar with multivariable differentiation, don't worry about this remark. If you are familiar with it, you might realize that this is not how it is usually defined. We will therefore need to show that our definition is equivalent to the usual one, as this is not totally trivial.

Equivalence of the Definitions

Lemma: A function $o$ is given by $o(h) = \eta(h)\|h\|$ for some evanescent scalar-valued $\eta$ if and only if it is given by $o(h) = \mu(h)\cdot h$ for some evanescent vector-valued $\mu$.

Proof of the Lemma

($\Rightarrow$) Define $\mu$ to be 0 at the input 0, and $\eta(h)h/\|h\|$ elsewhere. To see that $\mu$ is continuous at 0, let $\epsilon > 0$ be given. Since $\eta$ is evanescent, there exists $\delta > 0$ such that whenever $0 < \|h\| < \delta$:

$$\|\eta(h)h/\|h\|\| = |\eta(h)|\|h\|/\|h\| = |\eta(h)| < \epsilon$$

We now see that $o(h) = \mu(h) \cdot h$ for all $h$ in the domain of $o$, since when $h = 0$, both sides are equal to 0, and when $h \neq 0$: $$\begin{align*} o(h) &= \eta(h)\|h\|\\ &= \eta(h)\|h\|^2/\|h\|\\ &= \eta(h)(h \cdot h)/\|h\|\\ &= (\eta(h)h/\|h\|)\cdot h\\ &= \mu(h) \cdot h \end{align*}$$

($\Leftarrow$) If $o$ can be expressed by $o(h) = \mu(h) \cdot h$, then $o(h) = \|\mu(h)\|\|h\|\cos(\theta)$, where $\theta$ depends on $h$. The function $\eta$ defined by $\eta(h) = \|\mu(h)\|\cos(\theta)$ then gives us $o(h) = \eta(h)\|h\|$. We just need to show that $\eta$ is evanescent. This can be done by remembering that for all $\theta$, $-1 \le \cos(\theta) \le 1$, so multiplying through by $\|\mu(h)\|$ gives $-\|\mu(h)\| \le \eta(h) \le \|\mu(h)\|$. Since $\mu$ is evanescent, $\mu(0) = 0$, so $\eta(0) = 0$. Furthermore, by the squeeze theorem, $\lim_{h \rightarrow 0} \eta(h) = 0$. $\square$

Theorem: The function $f: \mathbf{R}^n \rightarrow \mathbf{R}$ is differentiable at $a \in \mathbf{R}^n$ if and only if there exist a linear form $\lambda$ and an evanescent function $\eta$ such that $f(a+h) - f(a) - \lambda(h) = \eta(h)\|h\|$. Furthermore, $\lambda = df(a)$.

Proof of Equivalence

($\Leftarrow$) From the hypothesis, we know there is some linear form $\lambda$, together with a vector which we will suggestively call $\phi(0)$ such that $\lambda(h) = \phi(0) \cdot h$ (every linear form arises this way), as well as some scalar-valued evanescent function $\eta$; and from the lemma, a vector-valued evanescent function $\mu$, such that: $$\begin{align*} f(a + h) - f(a) &= \lambda(h) + \eta(h)\|h\|\\ &= \phi(0)\cdot h + \mu(h)\cdot h\\ &= (\phi(0) + \mu(h))\cdot h \end{align*}$$

Since $\phi(0)$ is a constant, and the sum of a constant and a continuous function is continuous, it follows that $\phi$ defined by $\phi(h) = \phi(0) + \mu(h)$ is the desired continuous function satisfying $f(a+h) - f(a) = \phi(h)\cdot h$. Furthermore, since $\lambda(h) = \phi(0) \cdot h$, $\lambda = df(a)$.

($\Rightarrow$) Since $f$ is differentiable, we know there exists a vector-valued function $\phi$, continuous at 0, such that $f(a+h) - f(a) = \phi(h) \cdot h$. Define $\mu$ such that $\mu(h) = \phi(h) - \phi(0)$. Then $\mu$ is evanescent, and by the lemma there must exist a scalar-valued evanescent function $\eta$ such that: $$\begin{align*} f(a + h) - f(a) &= \phi(h) \cdot h\\ &= (\phi(0) + \phi(h) - \phi(0)) \cdot h\\ &= (\phi(0) + \mu(h)) \cdot h\\ &= \phi(0)\cdot h + \mu(h) \cdot h\\ &= \phi(0)\cdot h + \eta(h)\|h\| \end{align*}$$

By definition $\phi(0)\cdot h = df(a)(h)$, as desired. $\square$
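The theorem's alternative characterization is also easy to probe numerically. The sketch below (names are mine) uses the first example, $f(p) = x(p)^2 + y(p)^2$ at $a = (1, 1)$ with $df(a)(h) = \langle 2, 2 \rangle \cdot h$, and checks that $\eta(h) = (f(a+h) - f(a) - df(a)(h))/\|h\|$ vanishes as $h \rightarrow 0$.

```python
# eta(h) as in the theorem: the approximation error measured per unit of ||h||.
import math

def f(x, y):
    return x**2 + y**2

def eta(hx, hy):
    a = (1.0, 1.0)
    diff = f(a[0] + hx, a[1] + hy) - f(*a)
    lam = 2.0 * hx + 2.0 * hy  # df(a)(h) = <2, 2> . h
    return (diff - lam) / math.hypot(hx, hy)

ratios = [abs(eta(10.0**-n, 10.0**-n)) for n in range(1, 6)]
print(ratios)  # strictly decreasing toward 0
assert all(r2 < r1 for r1, r2 in zip(ratios, ratios[1:]))
```

Had we instead used a wrong linear form, say $\langle 3, 2 \rangle \cdot h$, the same ratio would tend to a nonzero limit along $h = \langle t, 0 \rangle$, which is one way to see that the differential is unique.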


Exercises

Don't do these yet. Remember, these pages are still a work in progress, and I still don't know if I will like these exercises.

  1. Show that the gradient of a constant function is always zero. Conclude that the same is true of the differential.
  2. Show that in $\mathbf{R}^n$, at any point $a$, the gradient $\nabla x_i$ of the coordinate function $x_i$ is $\hat e_i$ and that the differential $dx_i(a)$ is $x_i$.
  3. Using the definition of the gradient, compute the following:
    1. $\nabla(x^2 + y^2)$
    2. $\nabla(2x^3 + xy^3)$
    3. $\nabla(xy^2 + 3z)$
  4. Show that the function $f$ defined by $f(p) = x(p)y(p)/(x(p)^2 + y(p)^2)$ and $f(0) = 0$ is not differentiable at $0$. Hint: you don't need to check $\phi$.
  5. Show that the function $f$ defined by $f(p) = x(p)^2y(p)/(x(p)^2 + y(p)^2)$ and $f(0) = 0$ is not differentiable at $0$. Hint: this is somewhat similar to the cone example but less annoying.