Infinitesimal Calculus

← Home

Work in progress. There may be errors or missing information.

Contents

Preface

Growth Rates

  1. Definition and Properties
  2. Growth Rates of Linear Maps

The Differential

  1. Defining the Differential
  2. Rules of Differentiation
  3. Differential Forms

0 - Preface

I am writing this for three reasons. The first is that there seems to be a conception that the infinitesimal calculus as devised by Leibniz and taught in the book Calculus Made Easy, and rigorous real analysis as formulated in the 19th century with things like the epsilon-delta definition of a limit, are two different things, with the former just being a bunch of sloppy heuristics that can perhaps be made rigorous by non-standard analysis. In reality, Leibnizian calculus follows from real analysis, and this is made obvious simply by slightly adjusting the notations and terminology. Consequently, there is no reason not to teach calculus using the simple Leibnizian methods.

The second is that the way multivariable calculus is typically taught gives the impression that it is somehow different from single variable calculus. This is out of a desire to avoid talking about linear maps. It is quite embarrassing that as a result of this, differentiation takes a backseat in the typical Calculus III course in favor of mere partial differentiation. The total differential, if it is ever presented, is usually treated as an afterthought and defined in terms of partial derivatives with no explanation. In reality, there is no essential difference between single variable and multivariable calculus, and this is made obvious, again, simply by slightly adjusting the notations and terminology.

The third is that I wanted to provide a short and simple reference for the whole of multivariable calculus. I found out after I started writing this that Spivak already attempted something similar with his Calculus On Manifolds. This text could be considered a further simplification of the subject matter in the same vein, except without the manifolds.

1 - Growth Rates

1.1 - Definition and Properties

Definition: We say a function $f: \mathbf{R}^n \rightarrow \mathbf{R}^m$ such that $f(0) = 0$ is evanescent if $\lim_{h \rightarrow 0} \|f(h)\| = 0$, and negligible if $\|f(h)\|/\|h\|$ is evanescent with $h$. Intuitively, a function is evanescent if it gets closer to 0 as the input gets closer to 0, and negligible if it shrinks much faster than its input as its input shrinks.

Remark: Whenever the function we are calling negligible or evanescent involves multiple variables, we may specify "with $h$" (or whichever other letter) to clarify which variable is approaching 0.

Remark: Leibniz considered higher degrees of negligibility in his work. We can make this notion rigorous by defining $n^{th}$ degree negligibility using the limit $\lim_{h \rightarrow 0} \|f(h)\|/\|h\|^n = 0$. This notion is useful for dealing with higher order differentiation, and it would allow us to do things like dividing by $dx$; however, we won't worry about it in this text.
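Though it plays no role in the formal development, the distinction between the two definitions is easy to probe numerically: tabulate the ratio $\|f(h)\|/\|h\|$ as $h$ shrinks. The sketch below uses two illustrative functions of my own choosing.

```python
# Numerically probing the definitions for scalar functions of a scalar h.
# f(h) = h^2 is negligible: |f(h)|/|h| = |h|, which shrinks with h.
# g(h) = 3h is evanescent but NOT negligible: |g(h)|/|h| stays at 3.

def ratio(f, h):
    """Return |f(h)| / |h| for a nonzero scalar h."""
    return abs(f(h)) / abs(h)

f = lambda h: h * h
g = lambda h: 3 * h

for h in (0.1, 0.01, 0.001):
    print(f"h={h}: |f(h)|/|h|={ratio(f, h):.4f}, |g(h)|/|h|={ratio(g, h):.4f}")
```

The ratio for $f$ tracks $|h|$ down to 0, while the ratio for $g$ is pinned at 3, which is exactly the difference between shrinking faster than the input and merely shrinking.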


Proposition: Linear combinations of negligible functions are negligible. The same is true for evanescent functions.

Proof: Let $\eta, \mu$ be negligible, and $a, b \in \mathbf{R}$. Then:

$$\begin{align*} 0 &\le \textstyle\lim_{h \rightarrow 0} \|a\,\eta(h) + b\,\mu(h)\|/\|h\|\\ &\le \textstyle\lim_{h \rightarrow 0} (|a|\,\|\eta(h)\|/\|h\| + |b|\,\|\mu(h)\|/\|h\|)\\ &= |a|\,\textstyle\lim_{h \rightarrow 0} (\|\eta(h)\|/\|h\|) + |b|\,\textstyle\lim_{h \rightarrow 0} (\|\mu(h)\|/\|h\|)\\ &= |a|\cdot 0 + |b|\cdot 0 = 0 \end{align*}$$

So the result follows from the squeeze theorem. The case of evanescent functions is similar. $\square$

Proposition: The composition of evanescent functions is evanescent.

Proof: Let $\epsilon > 0$ be given. If $f(h)$ is evanescent, then there exists $\delta_1 > 0$ such that when $\|h\| < \delta_1$, $\|f(h)\| < \epsilon$. Furthermore, if $g(h)$ is evanescent, then there exists $\delta_2 > 0$ such that whenever $\|h\| < \delta_2$, $\|g(h)\| < \delta_1$, which implies that $\|f(g(h))\| < \epsilon$. $\square$


Definition: We say two functions $f$ and $g$ are negligibly close if $f - g$ is negligible, and we denote it as $f \simeq g$. Trivially, whenever $f = g$, we also have $f \simeq g$, but the converse could be false.


Proposition: Negligible closeness is an equivalence relation.

Proof: For reflexivity, note that $f - f = 0$ which is obviously negligible. For symmetry, note that if $f - g$ is negligible then $g - f$ is too since a negligible function multiplied by a constant, in this case -1, is still negligible. For transitivity, let $f - g$ and $g - k$ be negligible. Since the sum of negligible functions is negligible, $f - g + g - k = f - k$ is negligible. $\square$


Exercises

Exercise 1.1.1: Prove that negligible and evanescent functions are closed under multiplication.

Exercise 1.1.2: Prove that the product of an evanescent function and a negligible function is negligible.

Exercise 1.1.3: Let $f, g, k$ be functions and $c$ be a constant. Prove that if $f \simeq g$, then $f + k \simeq g + k$ and $cf \simeq cg$.


1.2 - Growth Rates of Linear Maps

Proposition: Let $T: \mathbf{R}^n \rightarrow \mathbf{R}^m$ be a linear map. Then (i) there exists $C > 0$ such that for all $h$, $\|T(h)\| \le C\|h\|$, and (ii) $T$ is evanescent.

Proof: First, we will prove (i). Let $\mathbf{r}_i$ be the vector whose entries correspond to the $i$th row of $T$'s matrix representation. Let $s$ be the largest of the norms $\|\mathbf{r}_i\|$, and $m$ be the number of rows. Then:

$$\begin{align*} \|T(h)\|^2 &= \textstyle\sum_i (\mathbf{r}_i \cdot h)^2 \\ &\le \textstyle\sum_i \|\mathbf{r}_i\|^2 \|h\|^2 \\ &\le \textstyle\sum_i s^2 \|h\|^2 \\ &= s^2\textstyle\sum_i \|h\|^2 \\ &= s^2m\|h\|^2 \end{align*}$$

Where the first line follows from the definition of matrix-vector multiplication and the second line follows from Cauchy-Schwarz. Thus, $C = s\sqrt{m}$ works. Next, let $\epsilon > 0$. Choose $\delta = \epsilon/C$. It follows that whenever $\|h\| < \delta$, we get $\|T(h)\| \le C\|h\| < \epsilon$, which by definition is (ii). $\square$
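The bound in (i) can be spot-checked numerically. The sketch below builds the constant $C = s\sqrt{m}$ exactly as in the proof, for an arbitrary example matrix of my own choosing, and verifies $\|T(h)\| \le C\|h\|$ on random inputs.

```python
import math
import random

def apply_T(rows, h):
    """Matrix-vector product: each output entry is a dot product r_i . h."""
    return [sum(r_j * h_j for r_j, h_j in zip(r, h)) for r in rows]

def norm(v):
    return math.sqrt(sum(x * x for x in v))

rows = [[1.0, 2.0], [-3.0, 0.5]]   # an arbitrary 2x2 example matrix
s = max(norm(r) for r in rows)     # largest row norm, as in the proof
C = s * math.sqrt(len(rows))       # C = s * sqrt(m)

random.seed(0)
for _ in range(1000):
    h = [random.uniform(-1, 1), random.uniform(-1, 1)]
    assert norm(apply_T(rows, h)) <= C * norm(h) + 1e-12
```

The constant from the proof is generous (the true operator norm is smaller), but it is all that part (ii) needs.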


Proposition (Chain Rule Lemma): If a function $\eta : \mathbf{R}^m \rightarrow \mathbf{R}^k$ is negligible, then the function $\eta \circ s$ is negligible whenever the difference between $s: \mathbf{R}^n \rightarrow \mathbf{R}^m$ and a linear map is negligible.

Proof: Let $\phi$ be the function from $\mathbf{R}^m$ to $\mathbf{R}$ such that $\phi(h) = \|\eta(h)\|/\|h\|$ when $h$ is not zero, and $\phi(0) = 0$. Since $\eta$ is negligible, $\eta(0) = 0$ and $\phi$ is evanescent. Furthermore, for all $h$, $\|\eta(h)\| = \phi(h)\|h\|$.

By our hypothesis, we know that $s$ is the sum of a linear map $T$ and some negligible function $\mu$. Since both are evanescent, $s$ is evanescent, so $\phi \circ s$ is also evanescent. Also observe that for some $C > 0$:

$$\begin{align*} 0 \le \|\eta(s(h))\|/\|h\| &= \phi(s(h))\|s(h)\|/\|h\|\\ &\le \phi(s(h))(\|T(h)\| + \|\mu(h)\|)/\|h\|\\ &\le \phi(s(h))(C\|h\| + \|\mu(h)\|)/\|h\|\\ &\le \phi(s(h))(C + \|\mu(h)\|/\|h\|)\\ \end{align*}$$

It is not hard to show that this last expression is evanescent with $h$ by the evanescence of $\phi \circ s$ and the negligibility of $\mu$ using the limit definition. We leave this for the reader. Thus, $\eta \circ s$ is negligible by the squeeze theorem. $\square$


Exercises

Exercise 1.2.1: Prove that for linear $T$ and negligible $\eta$, $T \circ \eta$ is negligible.

Exercise 1.2.2: Prove that nonzero linear maps are never negligible.


2 - The Differential

2.1 - Defining the Differential

Definition: We say a function $\varphi: \mathbf{R}^n \rightarrow \mathbf{R}^m$ is differentiable at $p \in \mathbf{R}^n$ if there exists a linear map $d\varphi_p: \mathbf{R}^n \rightarrow \mathbf{R}^m$, called the differential, such that $\varphi(p+h) - \varphi(p) - d\varphi_p(h)$ is negligible with $h$, or equivalently:

$$\varphi(p+h) - \varphi(p) \simeq d\varphi_p(h)$$

Remark: The matrix representation of $d\varphi_p$, denoted $\varphi'(p)$, is called the derivative. In the single variable case we identify the derivative with its only entry. We may sometimes write $d\varphi$ without specifying the point in cases where it is clear from context.

Remark: We will suggestively denote the components of $h$ as $dx, dy,$ and so on. The reason is that it reminds us to do the chain rule in computations. This is compatible with viewing $dx, dy$, etc., as covectors of a tangent space. We will talk about this in detail later.
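To make the definition concrete, here is a numeric check on a hypothetical example of my own choosing: for $\varphi(x, y) = (xy, x + y)$ at $p = (2, 3)$, the differential is the linear map whose matrix has rows $(3, 2)$ and $(1, 1)$, and the remainder $\varphi(p+h) - \varphi(p) - d\varphi_p(h)$ should shrink faster than $\|h\|$.

```python
import math

def phi(x, y):
    return (x * y, x + y)

def dphi(hx, hy):
    # Linear map with matrix rows (3, 2) and (1, 1): the differential at p = (2, 3).
    return (3 * hx + 2 * hy, hx + hy)

p = (2.0, 3.0)

def remainder_ratio(t):
    """||phi(p+h) - phi(p) - dphi(h)|| / ||h|| for h = (t, t)."""
    h = (t, t)
    a = phi(p[0] + h[0], p[1] + h[1])
    b = phi(*p)
    d = dphi(*h)
    rem = (a[0] - b[0] - d[0], a[1] - b[1] - d[1])
    return math.hypot(*rem) / math.hypot(*h)

for t in (0.1, 0.01, 0.001):
    print(t, remainder_ratio(t))   # the ratio shrinks linearly with t
```

Here the remainder works out to exactly $(dx\,dy, 0)$, a degree-2 monomial, which foreshadows the theorem below.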


Theorem (Transcendental Law of Homogeneity): Let $\eta: \mathbf{R}^n \rightarrow \mathbf{R}$ be a homogeneous polynomial function of degree at least 2. Then $\eta$ is negligible.

Proof: Since homogeneous polynomial functions are linear combinations of monomial functions, and negligible functions are closed under linear combinations, it will suffice to show that a monomial of degree at least 2 is negligible. We will denote the input vector as $h$, and denote any two of its components (doesn't matter which ones) as $dx, dy$.

Let $\epsilon > 0$ be given. Since $dx$ and $dy$ are components of $h$, and $\|h\|^2$ is the sum of the squares of all of $h$'s components, we know that $|dx|$ and $|dy|$ are both no greater than $\|h\|$, from which it follows that when $\|h\| < \epsilon$:

$$|dx\,dy| = |dx||dy| \le \|h\|^2 < \epsilon \|h\|$$

Which means that any monomial of degree 2 is negligible. If the degree is greater than two, then we have an evanescent function times a negligible function, which is negligible. $\square$
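As a quick illustration (not part of the proof), this is exactly the principle that licenses the classic Leibnizian computation of $d(x^2)$: the term $dx^2$ is a monomial of degree 2, hence negligible, so

$$(x + dx)^2 - x^2 = 2x\,dx + dx^2 \simeq 2x\,dx$$

which gives $d(x^2) = 2x\,dx$, anticipating the rules of the next section.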

Remark: This is really only a special case of the full heuristic principle of the transcendental law of homogeneity used by Leibniz. A more general statement would be: Let $\eta: \mathbf{R}^n \rightarrow \mathbf{R}$ be a homogeneous polynomial function of degree at least $k$. Then $\eta$ is $(k-1)^{th}$ degree negligible.


Exercises


2.2 - Rules of Differentiation

Theorem (Linearity): Let $f, g: \mathbf{R}^n \rightarrow \mathbf{R}^m$ be differentiable, and let $a, b \in \mathbf{R}$ be constants. Then $d(af+bg) = a\,df + b\,dg$.

Proof: This one's pretty boring.

$$\begin{align*} (af+bg)(p + h) &= a\,f(p + h) + b\,g(p+h)\\ &\simeq a\,(f(p) + df) + b\,(g(p) + dg)\\ &= a\,f(p) + a\,df + b\,g(p) + b\,dg\\ &= (af+bg)(p) + a\,df + b\,dg \,\square \end{align*}$$

Theorem (Chain Rule): Let $F: \mathbf{R}^m \rightarrow \mathbf{R}^k$ be differentiable at $\varphi(p)$ and $\varphi: \mathbf{R}^n \rightarrow \mathbf{R}^m$ be differentiable at $p$. Then $d(F \circ \varphi) = dF_{\varphi(p)} \circ d\varphi$ at $p$.

Proof: We know $\eta(h) = F(\varphi(p) + h) - F(\varphi(p)) - F'(\varphi(p))h$ is negligible by the definition of differentiability of $F$. Let $s(h) = d\varphi(h) + \mu(h)$, where $\mu(h) = \varphi(p+h) - \varphi(p) - d\varphi(h)$, which is negligible by the differentiability of $\varphi$. Then by the chain rule lemma, $\eta \circ s$ must also be negligible. Rearranging then gives:

$$F(\varphi(p) + d\varphi(h) + \mu(h)) - F(\varphi(p)) \simeq F'(\varphi(p))(d\varphi(h) + \mu(h))$$

We are now ready to compute $F(\varphi(p + h)) - F(\varphi(p))$. In the first step, we use the definition of differentiability on $\varphi$. The second step is justified by what we did above. The third and final step is justified by the linearity of $dF$ and the fact that a linear map composed with a negligible function is negligible:

$$\begin{align*} F(\varphi(p + h)) - F(\varphi(p)) &= F(\varphi(p) + d\varphi(h) + \mu(h)) - F(\varphi(p))\\ &\simeq F'(\varphi(p))(d\varphi(h) + \mu(h))\\ &\simeq F'(\varphi(p))\,d\varphi(h) \,\square \end{align*}$$
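The theorem can be spot-checked numerically with hypothetical choices of my own: take $F(u) = u^2$ and $\varphi = \sin$, so the chain rule predicts $d(F \circ \varphi)_p(h) = 2\sin(p)\cos(p)\,h$, and the remainder of $F \circ \varphi$ against this linear map should shrink faster than $|h|$.

```python
import math

p = 0.7                          # an arbitrary base point

def composite(x):
    return math.sin(x) ** 2      # F(phi(x)) with F(u) = u^2 and phi = sin

def d_composite(h):
    # dF at phi(p) composed with dphi at p: (2 sin p)(cos p) h
    return 2 * math.sin(p) * math.cos(p) * h

for h in (0.1, 0.01, 0.001):
    rem = composite(p + h) - composite(p) - d_composite(h)
    print(h, abs(rem) / abs(h))  # the ratio shrinks with h
```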

Theorem (Product Rule): Let $u$ and $v$ be differentiable functions from $\mathbf{R}^n$ to $\mathbf{R}$. Then $d(uv) = u\,dv + v\,du$.

Proof: Let $F: \mathbf{R}^2 \rightarrow \mathbf{R}$ be the function such that $F(x, y) = xy$, and let $\varphi: \mathbf{R}^n \rightarrow \mathbf{R}^2$ be the function such that $\varphi(a) = (u(a), v(a))$. Writing $p = (x, y)$ and $h = (dx, dy)$, we can then compute $dF_p$ for some $p \in \mathbf{R}^2$ by finding the difference:

$$\begin{align*} F(p + h) - F(p) &= (x+dx)(y+dy) - xy\\ &= xy + x\,dy + y\,dx + dx\,dy - xy\\ &= x\,dy + y\,dx + dx\,dy\\ &\simeq x\,dy + y\,dx \end{align*}$$

So $dF(h) = x\,dy + y\,dx$ at $p$. Since $d\varphi = (du, dv)$ at $a$, and $F \circ \varphi = uv$, the chain rule tells us that $d(uv) = d(F \circ \varphi) = dF_{\varphi(a)} \circ d\varphi = u\,dv + v\,du\,$. $\square$
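As a numeric spot check with hypothetical functions of my own choosing, take $u(x) = x^2$ and $v(x) = \sin x$: the product rule gives $d(uv) = (x^2\cos x + 2x\sin x)\,dx$, and the remainder against this linear map should shrink faster than $|dx|$.

```python
import math

x = 1.3                          # an arbitrary base point

def uv(t):
    return t ** 2 * math.sin(t)  # the product u(t) v(t)

def d_uv(dx):
    # u dv + v du = x^2 cos(x) dx + 2x sin(x) dx
    return (x ** 2 * math.cos(x) + 2 * x * math.sin(x)) * dx

for dx in (0.1, 0.01, 0.001):
    rem = uv(x + dx) - uv(x) - d_uv(dx)
    print(dx, abs(rem) / abs(dx))   # the ratio shrinks with dx
```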


Theorem (Quotient Rule): Let $u$ and $v$ be differentiable functions from $\mathbf{R}^n$ to $\mathbf{R}$. Then $d(u/v) = (v\,du - u\,dv)/v^2$.

Proof: This proof is based on how Euler derived the quotient rule. Recall the formula for computing a finite geometric sum:

$$1 + h + h^2 + \cdots + h^n = \frac{1 - h^{n+1}}{1 - h} = \frac{1}{1-h} - \frac{h^{n+1}}{1-h}$$

When $h = 0$, $1/(1-h) = 1$, so it is obviously not evanescent, let alone negligible. The term $h^{n+1}/(1-h)$, on the other hand, is negligible for $n \ge 1$: the power $h^{n+1}$ is negligible by the transcendental law of homogeneity, and multiplying a negligible function by a function that is bounded near 0 keeps it negligible. This simplifies the formula to:

$$1 + h \simeq \frac{1}{1-h}$$

Let $y: \mathbf{R}\backslash\{0\} \rightarrow \mathbf{R}$ be given by $y(x) = 1/x$. Then we can compute $dy$:

$$\begin{align*} y(x + dx) - y(x) &= 1/(x + dx) - 1/x\\ &= -\frac{dx}{x^2}\frac{1}{1 + dx/x}\\ &\simeq -\frac{dx}{x^2}\left(1 - \frac{dx}{x}\right)\\ &= -\frac{dx}{x^2} + \frac{dx^2}{x^3}\\ &\simeq -\frac{dx}{x^2} \end{align*}$$

Furthermore, since $1/v = y \circ v$, the chain rule tells us: $$d(1/v) = d(y \circ v) = dy_{v} \circ dv = -dv/v^2$$

Finally, by the product rule:

$$\begin{align*} d(u/v) &= u\,d(1/v) + (du)/v\\ &= -(u\,dv)/v^2 + (v\,du)/v^2\\ &= (v\,du - u\,dv)/v^2\,\square \end{align*}$$
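As with the other rules, the formula can be spot-checked numerically. The sketch below verifies the key intermediate fact, $d(1/x) = -dx/x^2$, at a hypothetical point $x = 2$ of my own choosing.

```python
x = 2.0                          # an arbitrary base point

def recip(t):
    return 1.0 / t

def d_recip(dx):
    return -dx / x ** 2          # the differential -dx/x^2 at x = 2

for dx in (0.1, 0.01, 0.001):
    rem = recip(x + dx) - recip(x) - d_recip(dx)
    print(dx, abs(rem) / abs(dx))   # the ratio shrinks with dx
```

The remainder behaves like $dx^2/x^3$, matching the term discarded in the derivation above.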

Exercises


2.3 - Differential Forms

Definition: Let $p \in \mathbf{R}^n$. Then the tangent space of $\mathbf{R}^n$ at $p$, denoted $T_p \mathbf{R}^n$, is defined as the set of all ordered pairs $v_p = (p, v), v \in \mathbf{R}^n$, which are each called tangent vectors to $\mathbf{R}^n$ at $p$. Addition and scalar multiplication in $T_p \mathbf{R}^n$ are defined as you'd expect: $v_p + w_p = (p, v + w)$ and $cv_p = (p, cv)$.

Remark: It should come as no surprise that every tangent space to $\mathbf{R}^n$ is isomorphic to $\mathbf{R}^n$. We can visualize $T_p \mathbf{R}^n$ as a copy of $\mathbf{R}^n$ centered at $p$ whose coordinate axes are labelled $dx, dy$, etc.