Note.
We will use the following default settings:
- If not otherwise defined, $f$ is a function $f:U\to R^m$ where $U$ is a subset of $R^n$ and $n,m$ are non-zero natural numbers.
- $f_1,\ldots,f_m$ are component functions of $f$ such that $f(\vb x)=(f_1(\vb x),\ldots,f_m(\vb x))$ for all $\vb x\in U$.
By default, component functions share the same domain $U$ with $f$.
- We may implicitly regard $R^1$ as $R$ and $R$ as $R^1$, so that statements about $R^n$ implicitly cover $R$,
and statements about $R$ implicitly cover $R^1$.
Derivative
Suppose $U\subseteq R$ and $f$ is real-valued.
If $p$ is an interior point of $U$, and if the limit
$$\lim_{h\to 0}\frac{f(p+h)-f(p)}{h}$$
exists, with $h$ defined on $\{h\in R|p+h\in U,h\neq0\}$,
then the derivative of $f$ at $p$, denoted $f'(p)$ or $\dv{f}{x}(p)$ if we denote the variable as $x$, is defined as that limit,
and $f$ is said to be differentiable at $p$.
If $f$ is differentiable at every point in $U$, then $f$ is said to be differentiable,
and $f'$, a function from $U$ to $R$, is called the derivative of $f$.
If $f'$ is continuous, then $f$ is called continuously differentiable.
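The limit definition can be probed numerically. The sketch below (an illustration, not part of the formal development) uses the hypothetical example $f(x)=x^2$ at $p=3$, whose derivative is $6$, and shows the difference quotients approaching that value as $h\to0$.

```python
def difference_quotient(f, p, h):
    """Evaluate the difference quotient (f(p + h) - f(p)) / h for nonzero h."""
    return (f(p + h) - f(p)) / h

# Hypothetical example: f(x) = x^2, so f'(3) = 6.
square = lambda x: x * x
quotients = [difference_quotient(square, 3.0, 10.0 ** -k) for k in range(1, 7)]
# The quotients 6.1, 6.01, 6.001, ... approach f'(3) = 6 as h shrinks.
```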
Partial derivative
Suppose $f$ is real-valued.
If $\vb p$ is a point such that for some $r\gt0$, for all $h\in(-r,r)$, $\vb p+h\vb e_i\in U$, and if the limit
$$\lim_{h\to 0}\frac{f(\vb p+h\vb e_i)-f(\vb p)}{h}$$
exists, with $h$ defined on $\{h\in R|\vb p+h\vb e_i\in U,h\neq0\}$,
then the partial derivative of $f$ at $\vb p$ with respect to the $i$th variable, denoted $D_if(\vb p)$ or $\pdv{f}{x_i}(\vb p)$
if we denote the $i$th variable as $x_i$, is defined as that limit.
Total derivative
If $\vb p$ is an interior point of $U$, and if there exists a linear map $L:R^n\to R^m$ such that
$$\lim_{\vb h\to \vb 0}\frac{\Vert f(\vb p+\vb h)-f(\vb p)-L(\vb h)\Vert}{\Vert\vb h\Vert}=0$$
with $\vb h$ defined on $\{\vb h\in R^n|\vb p+\vb h\in U,\vb h\neq\vb0\}$,
then $L$ is called the total derivative of $f$ at $\vb p$, denoted $Df(\vb p)$.
Since $Df(\vb p)$ is a linear map, it has a standard matrix representation $Af(\vb p)$.
Since $Df(\vb p)(\vb h)=Af(\vb p)\vb h$,
we will simply denote $Af(\vb p)$ as $Df(\vb p)$ for convenience.
As a result, $Df(\vb p)$ may represent a linear map from $R^n$ to $R^m$, or an $m\times n$ matrix.
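The defining limit can also be checked numerically. In this sketch (a hypothetical example, not part of the formal development), $f(x,y)=(xy,\ x+y)$ and the candidate matrix for $Df(\vb p)$ has rows $(y,x)$ and $(1,1)$; the remainder quotient shrinks roughly linearly with $\norm{\vb h}$, as the definition requires.

```python
import math

def f(x, y):
    return (x * y, x + y)

def remainder_quotient(p, h):
    """Compute ||f(p + h) - f(p) - L(h)|| / ||h|| for the candidate L at p."""
    px, py = p
    hx, hy = h
    fx, fy = f(px + hx, py + hy)
    gx, gy = f(px, py)
    lx = py * hx + px * hy      # first row of L is (py, px)
    ly = hx + hy                # second row of L is (1, 1)
    return math.hypot(fx - gx - lx, fy - gy - ly) / math.hypot(hx, hy)

quotients = [remainder_quotient((2.0, 3.0), (t, t)) for t in (1e-1, 1e-2, 1e-3)]
# The quotients shrink roughly linearly in ||h||.
```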
Localness of derivative
Due to the localness of limits, derivatives, partial derivatives, and total derivatives are local,
meaning that if $f$ and $g$ agree on some neighborhood of $p$ or $\vb p$ (or, for partial derivatives, on some neighborhood along the $i$th coordinate direction),
then a real number or a linear map is a derivative/partial derivative/total derivative of $f$ at $p$ or $\vb p$
if and only if it is one of $g$ at that point.
Proposition.
If a derivative/partial derivative/total derivative exists, it is unique.
(show proof)
Proof.
Uniqueness of derivative and partial derivative follows directly from uniqueness of limit.
For the total derivative,
suppose $A$ and $B$ are both total derivatives of $f$ at $\vb p$.
Since $\vb p$ is an interior point of $U$, there exists $r\gt0$ such that $B_r(\vb p)\subseteq U$.
Let $B_r(\vb 0)\setminus\{\vb 0\}$ be the domain for $\vb h$, then $\vb p+\vb h\in B_r(\vb p)\subseteq U$
and the limits for $A$ and $B$ are both still $0$ for this domain for $\vb h$, due to localness of limit. Since
$$\frac{\norm{f(\vb p+\vb h)-f(\vb p)-A(\vb h)}}{\norm{\vb h}}+\frac{\norm{f(\vb p+\vb h)-f(\vb p)-B(\vb h)}}{\norm{\vb h}}
\ge\frac{\norm{B(\vb h)-A(\vb h)}}{\norm{\vb h}}
\ge0$$
and
$$\lim_{\vb h\to\vb 0}\p{\frac{\norm{f(\vb p+\vb h)-f(\vb p)-A(\vb h)}}{\norm{\vb h}}+\frac{\norm{f(\vb p+\vb h)-f(\vb p)-B(\vb h)}}{\norm{\vb h}}}=0$$
by squeeze theorem,
$$\lim_{\vb h\to\vb 0}\frac{\norm{(B-A)(\vb h)}}{\norm{\vb h}}=\lim_{\vb h\to\vb 0}\frac{\norm{B(\vb h)-A(\vb h)}}{\norm{\vb h}}=0$$
where $B-A$ is a linear map.
Suppose there exists $\vb x\in R^n\setminus\{\vb 0\}$ such that $(B-A)(\vb x)\neq\vb 0$.
Let $\varepsilon=\frac{\norm{(B-A)(\vb x)}}{\Vert\vb x\Vert}$, then $\varepsilon\gt0$,
and for all $\delta\gt0$,
we have $\frac{\delta\vb x}{2\Vert\vb x\Vert}\in B_\delta(\vb 0)\setminus\{\vb 0\}$,
such that
$$\abs{\frac{\norm{(B-A)(\frac{\delta\vb x}{2\Vert\vb x\Vert})}}{\norm{\frac{\delta\vb x}{2\Vert\vb x\Vert}}}-0}
=\frac{\frac{\delta}{2\Vert\vb x\Vert}\norm{(B-A)(\vb x)}}{\frac{\delta}{2}}
=\frac{\norm{(B-A)(\vb x)}}{\Vert\vb x\Vert}
\ge\varepsilon$$
We have shown that $0$ is not the limit of $\frac{\norm{(B-A)(\vb h)}}{\norm{\vb h}}$ as $\vb h\to\vb 0$,
which is a contradiction.
Therefore, for all $\vb x\in R^n$, $(B-A)(\vb x)=\vb 0$, implying $A=B$.
$\blacksquare$
Proposition.
If $U\subseteq R$ and $f$ is real-valued, differentiability and total differentiability are equivalent.
In particular, given $f'(p)$, $Df(p)(h)=f'(p)h$; given $Df(p)$, $f'(p)=Df(p)(1)$.
(show proof)
Proof.
We will regard $1$-vectors as scalars.
Suppose $f$ is differentiable at $p$, then $g(h)=f'(p)h$ is a linear map.
We have $\lim_{h\to 0}\frac{f(p+h)-f(p)-g(h)}{h}=\lim_{h\to 0}\frac{f(p+h)-f(p)}{h}-\lim_{h\to 0}\frac{g(h)}{h}=f'(p)-f'(p)=0$.
Hence $\lim_{h\to 0}\frac{\abs{f(p+h)-f(p)-g(h)}}{\abs{h}}=\lim_{h\to 0}\abs{\frac{f(p+h)-f(p)-g(h)}{h}}=0$,
implying $f$ is totally differentiable at $p$.
Also, $Df(p)(h)=g(h)=f'(p)h$.
Suppose $f$ is totally differentiable at $p$, then for some linear map $g$,
$\lim_{h\to 0}\abs{\frac{f(p+h)-f(p)-g(h)}{h}}=\lim_{h\to 0}\frac{\abs{f(p+h)-f(p)-g(h)}}{\abs{h}}=0$.
Hence $\lim_{h\to 0}\frac{f(p+h)-f(p)-g(h)}{h}=0$.
Since $g(h)=hg(1)$, $\lim_{h\to 0}\frac{g(h)}{h}=g(1)$,
so $\lim_{h\to 0}\frac{f(p+h)-f(p)}{h}=\lim_{h\to 0}\frac{f(p+h)-f(p)-g(h)}{h}+\lim_{h\to 0}\frac{g(h)}{h}=g(1)$,
implying $f$ is differentiable at $p$.
Also, $f'(p)=g(1)=Df(p)(1)$.
$\blacksquare$
Note.
We will use the term differentiability in place of total differentiability since there is no ambiguity.
Note.
With norm and distance for matrices defined, totally continuous differentiability can be naturally defined.
Trivially, if $U\subseteq R$ and $f$ is real-valued, continuous differentiability and totally continuous differentiability are equivalent,
hence we will use the term continuous differentiability in place of totally continuous differentiability since there is no ambiguity.
Proposition.
Linearity implies differentiability.
(show proof)
Proof.
Let $f$ be linear, then for all $\vb p\in R^n$,
$$\lim_{\vb h\to \vb 0}\frac{\Vert f(\vb p+\vb h)-f(\vb p)-f(\vb h)\Vert}{\Vert\vb h\Vert}
=\lim_{\vb h\to \vb 0}\frac{\Vert \vb 0\Vert}{\Vert\vb h\Vert}=0$$
so $f$ is differentiable.
$\blacksquare$
Lemma.
A linear map is continuous at $\vb 0$.
(show proof)
Proof.
Suppose $f$ is linear and let $A$ be its matrix representation, which is $m\times n$.
Let $M$ be the maximum among absolute values of entries of $A$.
If $M=0$, then $A$ is a zero matrix, so $f$ is zero everywhere, implying it is continuous at $\vb 0$.
Now suppose $M\neq0$.
For every $\varepsilon\gt0$,
let $\delta=\varepsilon/(nM\sqrt m)$, then for every $\vb x\in R^n$ such that $\Vert\vb x\Vert\lt\delta$, by triangle inequality, we have
$$\norm{A\vb x}
=\sqrt{\sum_{j=1}^m\p{\sum_{i=1}^n(a_{ji}x_i)}^2}
=\sqrt{\sum_{j=1}^m\abs{\sum_{i=1}^n(a_{ji}x_i)}^2}
\le\sqrt{\sum_{j=1}^m\p{\sum_{i=1}^n\abs{a_{ji}x_i}}^2}
\le\sqrt{\sum_{j=1}^m\p{\sum_{i=1}^nM\Vert\vb x\Vert}^2}$$
$$=\sqrt{\sum_{j=1}^m\p{nM\Vert\vb x\Vert}^2}
=\sqrt{m\p{nM\Vert\vb x\Vert}^2}
=\sqrt{m}nM\Vert\vb x\Vert
\lt \sqrt{m}nM\delta
=\varepsilon$$
so $\lim_{\vb x\to \vb 0}f(\vb x)=\vb 0=f(\vb 0)$, implying $f$ is continuous at $\vb 0$.
$\blacksquare$
Proposition.
Differentiability implies continuity.
(show proof)
Proof.
Let $f$ be differentiable at $\vb p$, and let $r\gt0$ be such that $B_r(\vb p)\subseteq U$.
Since $\lim_{\vb h\to \vb 0}\frac{\Vert f(\vb p+\vb h)-f(\vb p)-Df(\vb p)(\vb h)\Vert}{\Vert\vb h\Vert}=0$
and $\lim_{\vb h\to \vb 0}\Vert\vb h\Vert=0$,
we have $\lim_{\vb h\to \vb 0}\Vert f(\vb p+\vb h)-f(\vb p)-Df(\vb p)(\vb h)\Vert=0$.
Because $Df(\vb p)$ is a linear map, it is continuous at $\vb 0$,
so $\lim_{\vb h\to \vb 0}Df(\vb p)(\vb h)=Df(\vb p)(\vb 0)=\vb 0$,
and thus $\lim_{\vb h\to \vb 0}\Vert Df(\vb p)(\vb h)\Vert=0$.
For $\vb h\in B_r(\vb 0)\setminus\{\vb 0\}$, let $g(\vb h)=\Vert f(\vb p+\vb h)-f(\vb p)-Df(\vb p)(\vb h)\Vert+\Vert Df(\vb p)(\vb h)\Vert$,
then $\lim_{\vb h\to \vb 0}g(\vb h)=0$ and $\Vert f(\vb p+\vb h)-f(\vb p)\Vert \le g(\vb h)$ by triangle inequality.
For every $\varepsilon\gt 0$, there exists $r\gt\delta\gt 0$ such that $\vb h\in B_\delta(\vb 0)\setminus\{\vb 0\}$ implies $g(\vb h)\in B_\varepsilon(0)$,
then with the same $\delta$, we also have $\vb x\in B_\delta(\vb p)\setminus\{\vb p\}$ implies $\vb x-\vb p\in B_\delta(\vb 0)\setminus\{\vb 0\}$ and
$\Vert f(\vb x)-f(\vb p)\Vert
=\Vert f(\vb p+(\vb x-\vb p))-f(\vb p)\Vert
\le g(\vb x-\vb p)
\in B_\varepsilon(0)$,
or $f(\vb x)\in B_\varepsilon(f(\vb p))$.
So we have $\lim_{\vb x\to \vb p}f(\vb x)=f(\vb p)$, which means $f$ is continuous at $\vb p$.
$\blacksquare$
Lemma.
$\Vert\vb v\Vert\le\sum_i\abs{v_i}$.
(show proof)
Proof.
$$(\Vert\vb v\Vert)^2=\sum_i\abs{v_i}^2\le\sum_i\sum_j\abs{v_i}\abs{v_j}=(\sum_i\abs{v_i})^2$$
therefore $\Vert\vb v\Vert\le\sum_i\abs{v_i}$.
$\blacksquare$
Derivative of component functions
$f$ is differentiable at $\vb p$ if and only if all component functions $f_j$ are differentiable at $\vb p$.
In addition, if $f$ or all $f_j$ are differentiable at $\vb p$, then
$$Df(\vb p)_{j,*}=Df_j(\vb p)$$
(show proof)
Proof.
Suppose $Df(\vb p)$ exists.
Then we have
$$\lim_{\vb h\to \vb 0}\frac{\Vert f(\vb p+\vb h)-f(\vb p)-Df(\vb p)(\vb h)\Vert}{\Vert\vb h\Vert}=0$$
For all $j\in\{1,\ldots,m\}$, by squeeze theorem, we have
$$\lim_{\vb h\to \vb 0}\frac{|f_j(\vb p+\vb h)-f_j(\vb p)-\p{Df(\vb p)(\vb h)}_j|}{\Vert\vb h\Vert}
=\lim_{\vb h\to \vb 0}\frac{|\p{f(\vb p+\vb h)-f(\vb p)-Df(\vb p)(\vb h)}_j|}{\Vert\vb h\Vert}
=0$$
Note that $\vb h\mapsto\p{Df(\vb p)(\vb h)}_j$ is a linear map from $R^n$ to $R$.
This shows that $Df_j(\vb p)$ exists, and $Df_j(\vb p)(\vb h)=\p{Df(\vb p)(\vb h)}_j$.
Now suppose all $Df_j(\vb p)$ exist.
Define an $m\times n$ matrix $L$ by $L_{j,*}=Df_j(\vb p)$,
where $L$ can be regarded as a matrix or a linear map, interchangeably. Then we have
$$\lim_{\vb h\to \vb 0} \sum_j\frac{\abs{(f(\vb p+\vb h)-f(\vb p)-L(\vb h))_j}}{\norm{\vb h}}
=\sum_j\lim_{\vb h\to \vb 0} \frac{\abs{(f(\vb p+\vb h)-f(\vb p)-L(\vb h))_j}}{\norm{\vb h}}
=\sum_j\lim_{\vb h\to \vb 0} \frac{\abs{f_j(\vb p+\vb h)-f_j(\vb p)-Df_j(\vb p)(\vb h)}}{\norm{\vb h}}
=0$$
Let $B_r(\vb p)\subseteq U$.
Then for $\vb h\in B_r(\vb 0)\setminus\{\vb 0\}$, we have
$$0
\le\frac{\norm{f(\vb p+\vb h)-f(\vb p)-L(\vb h)}}{\norm{\vb h}}
\le\frac{\sum_j\abs{(f(\vb p+\vb h)-f(\vb p)-L(\vb h))_j}}{\norm{\vb h}}$$
Then by squeeze theorem,
$$\lim_{\vb h\to \vb 0} \frac{\norm{f(\vb p+\vb h)-f(\vb p)-L(\vb h)}}{\norm{\vb h}}=0$$
Hence $Df(\vb p)=L$.
$\blacksquare$
Proposition.
Existence of total derivative implies existence of all partial derivatives.
In particular, $\pdv{f_j}{x_i}(\vb p)=Df(\vb p)_{ji}$.
(show proof)
Proof.
Suppose $Df(\vb p)$ exists. Then all $Df_j(\vb p)$ exist,
so we have $$\lim_{\vb h\to \vb 0}\frac{|f_j(\vb p+\vb h)-f_j(\vb p)-Df_j(\vb p)(\vb h)|}{\Vert\vb h\Vert}=0$$
Then for all $i\in\{1,\ldots,n\}$, we have
$$\lim_{h\to 0}\abs{\frac{f_j(\vb p+h\vb e_i)-f_j(\vb p)-hDf_j(\vb p)(\vb e_i)}{h}}
=\lim_{h\to 0}\frac{|f_j(\vb p+h\vb e_i)-f_j(\vb p)-Df_j(\vb p)(h\vb e_i)|}{\Vert h\vb e_i\Vert}
=0$$
which implies
$$\lim_{h\to 0}\frac{f_j(\vb p+h\vb e_i)-f_j(\vb p)-hDf_j(\vb p)(\vb e_i)}{h}=0$$
Since $$\lim_{h\to 0}\frac{hDf_j(\vb p)(\vb e_i)}{h}=Df_j(\vb p)(\vb e_i)$$
we conclude that
$$\pdv{f_j}{x_i}(\vb p)
=\lim_{h\to 0}\frac{f_j(\vb p+h\vb e_i)-f_j(\vb p)}{h}
=\lim_{h\to 0}\frac{f_j(\vb p+h\vb e_i)-f_j(\vb p)-hDf_j(\vb p)(\vb e_i)}{h}+\lim_{h\to 0}\frac{hDf_j(\vb p)(\vb e_i)}{h}
=Df_j(\vb p)(\vb e_i)
=(Df(\vb p)(\vb e_i))_j
=Df(\vb p)_{ji}$$
$\blacksquare$
Jacobian matrix
Given differentiable $f:U\to R^m$ where $U\subseteq R^n$,
for all $i\in\{1,\ldots,n\}$ and $j\in\{1,\ldots,m\}$, there exists a unique function $\pdv{f_j}{x_i}:U\to R$
that maps every $\vb p\in U$ to $\pdv{f_j}{x_i}(\vb p)$.
Then we can uniquely define a map $Jf$ for $f$ such that $Jf_{ji}=\pdv{f_j}{x_i}$.
We call $Jf$ the Jacobian matrix of $f$.
And we can consider $J$ an operator that maps every differentiable $f:U\to R^m$ to $Jf$.
Operations on Jacobian matrices, where well-defined, follow the usual rules of matrix operations, applied pointwise.
Note that, for all $\vb p\in U$, $$Jf_{ji}(\vb p)=\pdv{f_j}{x_i}(\vb p)=Df(\vb p)_{ji}$$
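Since $Jf_{ji}=\pdv{f_j}{x_i}$, a Jacobian matrix can be approximated entrywise by partial-derivative difference quotients. A minimal sketch, using the hypothetical map $f(x,y)=(xy,\ x+y)$, whose Jacobian at $(2,3)$ has rows $(3,2)$ and $(1,1)$:

```python
def f(v):
    x, y = v
    return [x * y, x + y]

def jacobian(f, p, h=1e-6):
    """Approximate the m-by-n Jacobian of f at p by one-sided difference quotients."""
    fp = f(p)
    m, n = len(fp), len(p)
    J = [[0.0] * n for _ in range(m)]
    for i in range(n):
        q = list(p)
        q[i] += h                   # step along the i-th coordinate direction
        fq = f(q)
        for j in range(m):
            J[j][i] = (fq[j] - fp[j]) / h
    return J

J = jacobian(f, [2.0, 3.0])
# J is close to [[3, 2], [1, 1]].
```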
Derivative of function operations
Suppose $f:U\to R^m$ and $g:V\to R^m$, where $U,V\subseteq R^n$, are differentiable at $\vb p$, and $c\in R$, then
$$D(cf)(\vb p)=cDf(\vb p)$$
$$D(f\pm g)(\vb p)=Df(\vb p)\pm Dg(\vb p)$$
and for all $\vb h\in R^n$,
$$D(f\cdot g)(\vb p)(\vb h)=Df(\vb p)(\vb h)\cdot g(\vb p)+Dg(\vb p)(\vb h)\cdot f(\vb p)$$
where the last equation is called
product rule
(show proof).
Proof.
For the first equation,
$$\lim_{\vb h\to \vb 0}\frac{\norm{(cf)(\vb p+\vb h)-(cf)(\vb p)-cDf(\vb p)(\vb h)}}{\norm{\vb h}}
=\lim_{\vb h\to \vb 0}\frac{\abs{c}\norm{f(\vb p+\vb h)-f(\vb p)-Df(\vb p)(\vb h)}}{\norm{\vb h}}
=\abs{c}\lim_{\vb h\to \vb 0}\frac{\norm{f(\vb p+\vb h)-f(\vb p)-Df(\vb p)(\vb h)}}{\norm{\vb h}}
=0$$
Hence $D(cf)(\vb p)=cDf(\vb p)$.
Since $\vb p$ is an interior point of both $U$ and $V$, it is an interior point of $U\cap V$.
For the second equation, let $B_r(\vb p)\subseteq U\cap V$, and $\vb h\in B_r(\vb 0)\setminus\{\vb 0\}$, then
$$\frac{\norm{f(\vb p+\vb h)-f(\vb p)-Df(\vb p)(\vb h)}}{\norm{\vb h}}+\frac{\norm{g(\vb p+\vb h)-g(\vb p)-Dg(\vb p)(\vb h)}}{\norm{\vb h}}
\ge\frac{\norm{(f\pm g)(\vb p+\vb h)-(f\pm g)(\vb p)-(Df(\vb p)\pm Dg(\vb p))(\vb h)}}{\norm{\vb h}}
\ge0$$
Since
$$\lim_{\vb h\to \vb 0}\p{\frac{\norm{f(\vb p+\vb h)-f(\vb p)-Df(\vb p)(\vb h)}}{\norm{\vb h}}+\frac{\norm{g(\vb p+\vb h)-g(\vb p)-Dg(\vb p)(\vb h)}}{\norm{\vb h}}}=0$$
by squeeze theorem,
$$\lim_{\vb h\to \vb 0}\frac{\norm{(f\pm g)(\vb p+\vb h)-(f\pm g)(\vb p)-(Df(\vb p)\pm Dg(\vb p))(\vb h)}}{\norm{\vb h}}=0$$
Hence $D(f\pm g)(\vb p)=Df(\vb p)\pm Dg(\vb p)$.
For the third equation, let $j\in\{1,\ldots,m\}$, then both $f_j$ and $g_j$ are differentiable at $\vb p$.
For $\vb h\in B_r(\vb 0)\setminus\{\vb 0\}$, let
$$\varepsilon_j(\vb h)=\cfrac{f_j(\vb p+\vb h)-f_j(\vb p)-Df_j(\vb p)(\vb h)}{\norm{\vb h}}
\quad\text{and}\quad\eta_j(\vb h)=\cfrac{g_j(\vb p+\vb h)-g_j(\vb p)-Dg_j(\vb p)(\vb h)}{\norm{\vb h}}$$
then $$f_j(\vb p+\vb h)-f_j(\vb p)=Df_j(\vb p)(\vb h)+\norm{\vb h}\varepsilon_j(\vb h)
\quad\text{and}\quad g_j(\vb p+\vb h)-g_j(\vb p)=Dg_j(\vb p)(\vb h)+\norm{\vb h}\eta_j(\vb h)$$
Hence
$$\frac{f_j(\vb p+\vb h)g_j(\vb p+\vb h)-f_j(\vb p)g_j(\vb p)}{\norm{\vb h}}
=\frac{f_j(\vb p+\vb h)g_j(\vb p+\vb h)-f_j(\vb p)g_j(\vb p+\vb h)+f_j(\vb p)g_j(\vb p+\vb h)-f_j(\vb p)g_j(\vb p)}{\norm{\vb h}}$$
$$=\frac{\p{f_j(\vb p+\vb h)-f_j(\vb p)}g_j(\vb p+\vb h)+\p{g_j(\vb p+\vb h)-g_j(\vb p)}f_j(\vb p)}{\norm{\vb h}}
=\frac{\p{Df_j(\vb p)(\vb h)+\norm{\vb h}\varepsilon_j(\vb h)}g_j(\vb p+\vb h)+\p{Dg_j(\vb p)(\vb h)+\norm{\vb h}\eta_j(\vb h)}f_j(\vb p)}{\norm{\vb h}}$$
$$=\frac{Df_j(\vb p)(\vb h)\p{g_j(\vb p)+Dg_j(\vb p)(\vb h)+\norm{\vb h}\eta_j(\vb h)}+Dg_j(\vb p)(\vb h)f_j(\vb p)}{\norm{\vb h}}+\varepsilon_j(\vb h)g_j(\vb p+\vb h)+\eta_j(\vb h)f_j(\vb p)$$
$$=\frac{Df_j(\vb p)(\vb h)g_j(\vb p)+Dg_j(\vb p)(\vb h)f_j(\vb p)}{\norm{\vb h}}+\frac{Df_j(\vb p)(\vb h)Dg_j(\vb p)(\vb h)}{\norm{\vb h}}+Df_j(\vb p)(\vb h)\eta_j(\vb h)+\varepsilon_j(\vb h)g_j(\vb p+\vb h)+\eta_j(\vb h)f_j(\vb p)$$
By Cauchy-Schwarz inequality,
$$\abs{\frac{Df_j(\vb p)(\vb h)Dg_j(\vb p)(\vb h)}{\norm{\vb h}}}
=\frac{\abs{(Df_j(\vb p))^T\cdot\vb h}\abs{Dg_j(\vb p)(\vb h)}}{\norm{\vb h}}
\le\frac{\norm{(Df_j(\vb p))^T}\norm{\vb h}\abs{Dg_j(\vb p)(\vb h)}}{\norm{\vb h}}
=\norm{(Df_j(\vb p))^T}\abs{Dg_j(\vb p)(\vb h)}$$
Since $Df_j(\vb p),Dg_j(\vb p)$ are both linear, they are continuous at $\vb0$, so
$$\lim_{\vb h\to \vb 0}Df_j(\vb p)(\vb h)=0 \quad\text{and}\quad \lim_{\vb h\to \vb 0}Dg_j(\vb p)(\vb h)=0$$
implying
$$\lim_{\vb h\to \vb 0}\abs{Dg_j(\vb p)(\vb h)}=0$$
So
$$\lim_{\vb h\to \vb 0}\norm{(Df_j(\vb p))^T}\abs{Dg_j(\vb p)(\vb h)}
=\norm{(Df_j(\vb p))^T}\lim_{\vb h\to \vb 0}\abs{Dg_j(\vb p)(\vb h)}
=0$$
By squeeze theorem,
$$\lim_{\vb h\to \vb 0}\abs{\frac{Df_j(\vb p)(\vb h)Dg_j(\vb p)(\vb h)}{\norm{\vb h}}}=0$$
implying
$$\lim_{\vb h\to \vb 0}\frac{Df_j(\vb p)(\vb h)Dg_j(\vb p)(\vb h)}{\norm{\vb h}}=0$$
Note that
$$\lim_{\vb h\to \vb 0}\abs{\varepsilon_j(\vb h)}=0 \quad\text{and}\quad \lim_{\vb h\to \vb 0}\abs{\eta_j(\vb h)}=0$$
implying
$$\lim_{\vb h\to \vb 0}\varepsilon_j(\vb h)=0 \quad\text{and}\quad \lim_{\vb h\to \vb 0}\eta_j(\vb h)=0$$
Since $g_j$ is differentiable at $\vb p$, it is continuous at $\vb p$, thus
$$\lim_{\vb h\to \vb 0}g_j(\vb p+\vb h)=g_j(\vb p)$$
Now we have
$$\lim_{\vb h\to \vb 0}\p{\frac{Df_j(\vb p)(\vb h)Dg_j(\vb p)(\vb h)}{\norm{\vb h}}+Df_j(\vb p)(\vb h)\eta_j(\vb h)+\varepsilon_j(\vb h)g_j(\vb p+\vb h)+\eta_j(\vb h)f_j(\vb p)}$$
$$=\lim_{\vb h\to \vb 0}\frac{Df_j(\vb p)(\vb h)Dg_j(\vb p)(\vb h)}{\norm{\vb h}}+\lim_{\vb h\to \vb 0}Df_j(\vb p)(\vb h)\eta_j(\vb h)+\lim_{\vb h\to \vb 0}\varepsilon_j(\vb h)g_j(\vb p+\vb h)+\lim_{\vb h\to \vb 0}\eta_j(\vb h)f_j(\vb p)$$
$$=\lim_{\vb h\to \vb 0}Df_j(\vb p)(\vb h)\lim_{\vb h\to \vb 0}\eta_j(\vb h)+\lim_{\vb h\to \vb 0}\varepsilon_j(\vb h)\lim_{\vb h\to \vb 0}g_j(\vb p+\vb h)+f_j(\vb p)\lim_{\vb h\to \vb 0}\eta_j(\vb h)
=0$$
Then we have
$$\lim_{\vb h\to \vb 0}\frac{\abs{f_j(\vb p+\vb h)g_j(\vb p+\vb h)-f_j(\vb p)g_j(\vb p)-(Df_j(\vb p)(\vb h)g_j(\vb p)+Dg_j(\vb p)(\vb h)f_j(\vb p))}}{\norm{\vb h}}$$
$$=\lim_{\vb h\to \vb 0}\abs{\frac{Df_j(\vb p)(\vb h)Dg_j(\vb p)(\vb h)}{\norm{\vb h}}+Df_j(\vb p)(\vb h)\eta_j(\vb h)+\varepsilon_j(\vb h)g_j(\vb p+\vb h)+\eta_j(\vb h)f_j(\vb p)}
=0$$
Since
$$\frac{\sum_{j=1}^m\abs{f_j(\vb p+\vb h)g_j(\vb p+\vb h)-f_j(\vb p)g_j(\vb p)-(Df_j(\vb p)(\vb h)g_j(\vb p)+Dg_j(\vb p)(\vb h)f_j(\vb p))}}{\norm{\vb h}}$$
$$\ge\frac{\abs{\sum_{j=1}^m\p{f_j(\vb p+\vb h)g_j(\vb p+\vb h)-f_j(\vb p)g_j(\vb p)-(Df_j(\vb p)(\vb h)g_j(\vb p)+Dg_j(\vb p)(\vb h)f_j(\vb p))}}}{\norm{\vb h}}$$
$$=\frac{\abs{(f\cdot g)(\vb p+\vb h)-(f\cdot g)(\vb p)-(Df(\vb p)(\vb h)\cdot g(\vb p)+Dg(\vb p)(\vb h)\cdot f(\vb p))}}{\norm{\vb h}}$$
and we know that
$$\lim_{\vb h\to \vb 0}\frac{\sum_{j=1}^m\abs{f_j(\vb p+\vb h)g_j(\vb p+\vb h)-f_j(\vb p)g_j(\vb p)-(Df_j(\vb p)(\vb h)g_j(\vb p)+Dg_j(\vb p)(\vb h)f_j(\vb p))}}{\norm{\vb h}}$$
$$=\sum_{j=1}^m\lim_{\vb h\to \vb 0}\frac{\abs{f_j(\vb p+\vb h)g_j(\vb p+\vb h)-f_j(\vb p)g_j(\vb p)-(Df_j(\vb p)(\vb h)g_j(\vb p)+Dg_j(\vb p)(\vb h)f_j(\vb p))}}{\norm{\vb h}}
=0$$
by squeeze theorem,
$$\lim_{\vb h\to \vb 0}\frac{\abs{(f\cdot g)(\vb p+\vb h)-(f\cdot g)(\vb p)-(Df(\vb p)(\vb h)\cdot g(\vb p)+Dg(\vb p)(\vb h)\cdot f(\vb p))}}{\norm{\vb h}}=0$$
Clearly, $Df(\vb p)(\vb h)\cdot g(\vb p)+Dg(\vb p)(\vb h)\cdot f(\vb p)$, as a function of $\vb h$, is linear.
Hence $D(f\cdot g)(\vb p)$ exists and is defined by
$$D(f\cdot g)(\vb p)(\vb h)=Df(\vb p)(\vb h)\cdot g(\vb p)+Dg(\vb p)(\vb h)\cdot f(\vb p)$$
$\blacksquare$
If $f$ and $g$ are real-valued with real domains, then clearly,
$$(cf)'(p)=cf'(p)$$
$$(f\pm g)'(p)=f'(p)\pm g'(p)$$
$$(fg)'(p)=f'(p)g(p)+g'(p)f(p)$$
Suppose $f$ and $g$ share the same domain and are both differentiable, then
$$J(cf)=cJf$$
$$J(f\pm g)=Jf\pm Jg$$
$$J(f\cdot g)=((Jf)^Tg+(Jg)^Tf)^T$$
(show proof)
Proof.
$$J(cf)_{ji}(\vb p)=D(cf)(\vb p)_{ji}=(cDf(\vb p))_{ji}=cDf(\vb p)_{ji}=cJf_{ji}(\vb p)$$
$$J(f\pm g)_{ji}(\vb p)=D(f\pm g)(\vb p)_{ji}=(Df(\vb p)\pm Dg(\vb p))_{ji}=Df(\vb p)_{ji}\pm Dg(\vb p)_{ji}=Jf_{ji}(\vb p)\pm Jg_{ji}(\vb p)=(Jf_{ji}\pm Jg_{ji})(\vb p)=(Jf\pm Jg)_{ji}(\vb p)$$
$$
J(f\cdot g)^T_i(\vb p)
=J(f\cdot g)_i(\vb p)
=D(f\cdot g)(\vb p)_i
=D(f\cdot g)(\vb p)(\vb e_i)
=Df(\vb p)(\vb e_i)\cdot g(\vb p)+Dg(\vb p)(\vb e_i)\cdot f(\vb p)
=\sum_j(Df(\vb p)(\vb e_i))_jg(\vb p)_j+\sum_j(Dg(\vb p)(\vb e_i))_jf(\vb p)_j
$$ $$
=\sum_jDf(\vb p)_{ji}g(\vb p)_j+\sum_jDg(\vb p)_{ji}f(\vb p)_j
=\sum_j(Jf)^T_{ij}(\vb p)g_j(\vb p)+\sum_j(Jg)^T_{ij}(\vb p)f_j(\vb p)
=((Jf)^Tg+(Jg)^Tf)_i(\vb p)
$$
$\blacksquare$
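The product rule can be sanity-checked numerically. Below is a sketch with the hypothetical maps $f(x,y)=(x,y)$ and $g(x,y)=(y,x)$, so $(f\cdot g)(x,y)=2xy$; the increment of $f\cdot g$ along a small $\vb h$ agrees with $Df(\vb p)(\vb h)\cdot g(\vb p)+Dg(\vb p)(\vb h)\cdot f(\vb p)$ up to second order in $\norm{\vb h}$.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def fg(x, y):
    # (f . g)(x, y) for f(x, y) = (x, y) and g(x, y) = (y, x), i.e. 2*x*y
    return dot((x, y), (y, x))

p = (2.0, 3.0)
h = (1e-6, 2e-6)
increment = fg(p[0] + h[0], p[1] + h[1]) - fg(*p)
# Product rule prediction: Df(p)(h) = h and Dg(p)(h) = (h2, h1) for these linear maps.
predicted = dot(h, (p[1], p[0])) + dot((h[1], h[0]), p)
# increment and predicted agree up to O(||h||^2).
```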
Derivative of composite functions
Also called
chain rule.
Let $V\subseteq R^n$ and $U\subseteq R^m$, and let $g:V\to R^m$ with $g(V)\subseteq U$ and $f:U\to R^l$. Suppose $g$ is differentiable at $\vb p$
and $f$ is differentiable at $g(\vb p)$, then
$$D(f\circ g)(\vb p)=Df(g(\vb p))\circ Dg(\vb p)$$
(show proof).
Proof.
Since $g$ is differentiable at $\vb p$ and $f$ is differentiable at $g(\vb p)$, there exists $s\gt0$
such that $B_s(\vb p)\subseteq V$ and $B_s(g(\vb p))\subseteq U$.
Since $g$ is differentiable at $\vb p$, it is also continuous at $\vb p$.
So there exists $0\lt r\lt s$ such that $\vb x\in B_r(\vb p)$ implies $g(\vb x)\in B_s(g(\vb p))$.
For all $\vb h\in B_r(\vb 0)$ and $\vb k\in B_s(\vb 0)$,
define $$u(\vb h)=g(\vb p+\vb h)-g(\vb p)-Dg(\vb p)(\vb h) \quad\text{and}\quad v(\vb k)=f(g(\vb p)+\vb k)-f(g(\vb p))-Df(g(\vb p))(\vb k)$$
Also define $\varepsilon(\vb h)$ and $\eta(\vb k)$ such that when $\vb h\neq\vb 0$, $\varepsilon(\vb h)=\frac{\norm{u(\vb h)}}{\norm{\vb h}}$,
when $\vb k\neq\vb 0$, $\eta(\vb k)=\frac{\norm{v(\vb k)}}{\norm{\vb k}}$,
and $\varepsilon(\vb 0)=\eta(\vb 0)=0$.
Then $$\lim_{\vb h\to\vb 0}\varepsilon(\vb h)=0 \quad\text{and}\quad \lim_{\vb k\to\vb 0}\eta(\vb k)=0$$
and $$\norm{u(\vb h)}=\varepsilon(\vb h)\norm{\vb h} \quad\text{and}\quad \norm{v(\vb k)}=\eta(\vb k)\norm{\vb k}$$
where $\vb h$ or $\vb k$ may or may not be $\vb 0$.
Given $\vb h\in B_r(\vb 0)$, let $\vb k=g(\vb p+\vb h)-g(\vb p)$, then $\vb k\in B_s(\vb 0)$,
and since $g$ is continuous at $\vb p$, $\lim_{\vb h\to\vb 0}\vb k=\vb 0$.
Now we have
$$\norm{\vb k}=\norm{u(\vb h)+Dg(\vb p)(\vb h)}\le\norm{u(\vb h)}+\norm{Dg(\vb p)(\vb h)}\le(\varepsilon(\vb h)+\norm{Dg(\vb p)})\norm{\vb h}$$
and
$$(f\circ g)(\vb p+\vb h)-(f\circ g)(\vb p)-(Df(g(\vb p))\circ Dg(\vb p))(\vb h)
=f(g(\vb p)+\vb k)-f(g(\vb p))-(Df(g(\vb p))\circ Dg(\vb p))(\vb h)$$
$$=v(\vb k)+Df(g(\vb p))(\vb k)-Df(g(\vb p))(Dg(\vb p)(\vb h))
=v(\vb k)+Df(g(\vb p))(g(\vb p+\vb h)-g(\vb p)-Dg(\vb p)(\vb h))
=v(\vb k)+Df(g(\vb p))(u(\vb h))$$
and hence
$$\norm{(f\circ g)(\vb p+\vb h)-(f\circ g)(\vb p)-(Df(g(\vb p))\circ Dg(\vb p))(\vb h)}
=\norm{v(\vb k)+Df(g(\vb p))(u(\vb h))}$$
$$\le\norm{v(\vb k)}+\norm{Df(g(\vb p))(u(\vb h))}
\le\eta(\vb k)\norm{\vb k}+\norm{Df(g(\vb p))}\norm{u(\vb h)}
\le\eta(\vb k)(\varepsilon(\vb h)+\norm{Dg(\vb p)})\norm{\vb h}+\norm{Df(g(\vb p))}\varepsilon(\vb h)\norm{\vb h}$$
Thus, when $\vb h\neq\vb 0$,
$$\frac{\norm{(f\circ g)(\vb p+\vb h)-(f\circ g)(\vb p)-(Df(g(\vb p))\circ Dg(\vb p))(\vb h)}}{\norm{\vb h}}
\le\eta(\vb k)(\varepsilon(\vb h)+\norm{Dg(\vb p)})+\norm{Df(g(\vb p))}\varepsilon(\vb h)$$
Note that $\lim_{\vb h\to\vb 0}\varepsilon(\vb h)=0$ and, since $\lim_{\vb k\to\vb 0}\eta(\vb k)=0=\eta(\vb 0)$,
by limit of composite functions, $\lim_{\vb h\to\vb 0}\eta(\vb k)=\eta(\lim_{\vb h\to\vb 0}\vb k)=\eta(\vb 0)=0$.
We have $$\lim_{\vb h\to\vb 0}\p{\eta(\vb k)(\varepsilon(\vb h)+\norm{Dg(\vb p)})+\norm{Df(g(\vb p))}\varepsilon(\vb h)}=0$$
Therefore, by squeeze theorem, $$\lim_{\vb h\to\vb 0}\frac{\norm{(f\circ g)(\vb p+\vb h)-(f\circ g)(\vb p)-(Df(g(\vb p))\circ Dg(\vb p))(\vb h)}}{\norm{\vb h}}=0$$
We have shown that $D(f\circ g)(\vb p)$ exists and $D(f\circ g)(\vb p)=Df(g(\vb p))\circ Dg(\vb p)$.
$\blacksquare$
If $U,V\subseteq R$ and $f,g$ are real-valued, then clearly,
$$(f\circ g)'(p)=f'(g(p))g'(p)$$
Suppose $f$ and $g$ are differentiable, and the range of $g$ is a subset of the domain of $f$, then
$$J(f\circ g)=(Jf\circ g)Jg$$
where $Jf\circ g$ is a matrix of functions with $(Jf\circ g)_{kj}=Jf_{kj}\circ g$.
(show proof)
Proof.
$$J(f\circ g)_{ki}(\vb p)=D(f\circ g)(\vb p)_{ki}=(Df(g(\vb p))\circ Dg(\vb p))_{ki}=(Df(g(\vb p))Dg(\vb p))_{ki}$$
$$=\sum_jDf(g(\vb p))_{kj}Dg(\vb p)_{ji}=\sum_j(Jf_{kj}\circ g)(\vb p)Jg_{ji}(\vb p)=(\sum_j(Jf\circ g)_{kj}Jg_{ji})(\vb p)=((Jf\circ g)Jg)_{ki}(\vb p)$$
$\blacksquare$
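As a sketch of the identity $J(f\circ g)=(Jf\circ g)Jg$, take the hypothetical maps $g(x)=(x^2,x^3)$ and $f(u,v)=uv$, so that $(f\circ g)(x)=x^5$; the matrix product of the Jacobians reproduces the direct derivative $5x^4$.

```python
def composite_derivative(x):
    """Evaluate the product Jf(g(x)) Jg(x) for g(x) = (x^2, x^3) and f(u, v) = u*v."""
    u, v = x ** 2, x ** 3        # g(x)
    jf = [v, u]                  # Jf(g(x)) = [v, u], a 1x2 matrix
    jg = [2 * x, 3 * x ** 2]     # Jg(x), a 2x1 matrix
    return jf[0] * jg[0] + jf[1] * jg[1]

# composite_derivative(x) equals the derivative of x^5, namely 5*x^4.
```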
Proposition.
$$\dv{}{x}x^{n+1}=(n+1)x^n$$
(show proof)
Proof.
For $n=0$,
$\lim_{h\to 0}\frac{(p+h)-p}{h}=1$, thus $\dv{}{x}x^{0+1}=1=(0+1)x^0$.
Suppose $\dv{}{x}x^{k+1}=(k+1)x^k$,
then
$$
\dv{}{x}x^{(k+1)+1}
=\p{\dv{}{x}x^{k+1}}x+x^{k+1}\dv{}{x}x
=(k+1)x^{k+1}+x^{k+1}
=((k+1)+1)x^{k+1}
$$
By induction, for all $n\in N$, $$\dv{}{x}x^{n+1}=(n+1)x^n$$
$\blacksquare$
Local extremum
Suppose $f$ is real-valued. If there exists $\vb p\in U$, such that there exists $\delta\gt0$ such that for all $\vb q\in U$ with $d(\vb p,\vb q)\lt\delta$,
$f(\vb q)\le f(\vb p)$, then we say $f$ has a local maximum at $\vb p$.
We define local minimum similarly. We say $f$ has a local extremum at $\vb p$ if
$f$ has a local maximum or a local minimum at $\vb p$.
Proposition.
Suppose $f$ is a real-valued function defined on $(a,b)$ where $a\lt b$. If $f$ has a local extremum at some $x\in(a,b)$ such that $f$ is also differentiable at $x$,
then $f'(x)=0$.
(show proof)
Proof.
Suppose $f$ has a local maximum at $x$.
There exists $\delta\gt0$ such that $(x-\delta,x+\delta)\subseteq(a,b)$ and for all $y\in(x-\delta,x+\delta)$, $f(y)\le f(x)$.
Note that for all $h\in(-\delta,0)$, $\frac{f(x+h)-f(x)}{h}\ge0$, and for all $h\in(0,\delta)$,
$\frac{f(x+h)-f(x)}{h}\le0$. Hence $f'(x)$ cannot take a value greater than $0$ or less than $0$.
Since $f'(x)\in R$, we have $f'(x)=0$.
The case where $f$ has a local minimum at $x$ can be proven similarly.
$\blacksquare$
Proposition.
Let $f:U\to R$ be continuous where $U\subseteq R^n$ is compact, then $f$ is bounded.
(show proof)
Proof.
Suppose $f$ is not bounded above, then for every positive natural number $n$, there exists $x_n\in U$ such that $f(x_n)\gt n$.
By axiom of choice, this defines a sequence $(x_n)$ of $U$ such that $f(x_n)\gt n$.
Since $U$ is bounded, $(x_n)$ is bounded, and by Bolzano-Weierstrass theorem, it has a convergent subsequence $(x_{n_k})$.
Suppose $(x_{n_k})$ converges to $c$, since $U$ is closed, $c\in U$.
Since $f$ is continuous at $c$, $\lim_{x\to c}f(x)=f(c)$, so $\lim_{k\to\infty}f(x_{n_k})=f(c)$, a contradiction to $f(x_{n_k})\gt n_k\ge k$.
Therefore, $f$ is bounded above. By a symmetric argument, $f$ is also bounded below.
$\blacksquare$
Extreme value theorem
Let $f:U\to R$ be continuous where $U\subseteq R^n$ is non-empty and compact, then $f$ attains a maximum and a minimum in $U$.
(show proof)
Proof.
By the above proposition, $f$ is bounded.
By completeness of real numbers, the image $f(U)$ has a supremum $M$.
Let $n$ be a positive natural number. Since $M$ is the supremum of $f(U)$, $M-1/n$ is not an upper bound of $f(U)$.
Therefore, for every positive natural number $n$, there exists $x_n\in U$ such that $M\ge f(x_n)\gt M-1/n$.
By axiom of choice, this defines a sequence $(x_n)$ of $U$ such that $f(x_n)$ converges to $M$.
Since $U$ is bounded, $(x_n)$ is bounded.
By Bolzano-Weierstrass theorem, there exists a convergent subsequence $(x_{n_k})$ of $(x_n)$.
Suppose $(x_{n_k})$ converges to $c$, then $c\in U$ as $U$ is closed.
Since $f$ is continuous at $c$, $\lim_{x\to c}f(x)=f(c)$, so $\lim_{k\to\infty}f(x_{n_k})=f(c)$, which implies $f(c)=M$.
Therefore, $f$ attains a maximum in $U$. By a symmetric argument, $f$ also attains a minimum in $U$.
$\blacksquare$
Intermediate value theorem
Let $f:[a,b]\to R$ be continuous where $a\le b$, then for every real number $y$ between $f(a)$ and $f(b)$, there exists $x\in[a,b]$ such that $f(x)=y$.
(show proof)
Proof.
Since $f(a)$ and $f(b)$ are attained at $a$ and $b$, we only need to consider values strictly between them.
If $f(a)=f(b)$, the theorem is trivially true.
Now suppose $f(a)\lt f(b)$. For any $u\in(f(a),f(b))$,
let $S$ be the set of all $x\in [a,b]$ such that $f(x)\le u$. Then $S$ is non-empty since $a$ is an element of $S$.
Because $S$ is also bounded, by completeness of real numbers, its supremum, denoted $c$, exists, and clearly, $c\in[a,b]$.
Since $f$ is continuous at $c$, $\lim_{x\to c}f(x)=f(c)$. This means that for every $\varepsilon\gt 0$, there exists $\delta\gt 0$ such that $x\in[a,b]\cap(c-\delta,c+\delta)$
implies $f(x)\in (f(c)-\varepsilon,f(c)+\varepsilon)$, that is, $f(c)\in(f(x)-\varepsilon,f(x)+\varepsilon)$.
By definition of supremum, there exists $s\in S$ such that $s\in(c-\delta,c]$, and we also have $s\in[a,b]$, so $f(c)\lt f(s)+\varepsilon\le u+\varepsilon$.
Now if $c\lt b$, take $r\in[a,b]\cap(c,c+\delta)$, then $r\notin S$, so $f(c)\gt f(r)-\varepsilon\gt u-\varepsilon$;
otherwise, we have $c=b$, so $f(c)=f(b)\gt u\gt u-\varepsilon$. So we have $f(c)\in(u-\varepsilon,u+\varepsilon)$.
Since $\varepsilon\gt0$ is arbitrarily chosen, we conclude that $f(c)=u$.
For the case $f(a)\gt f(b)$, take $g=-f$, then $g(a)\lt g(b)$, so for any $u\in(f(b),f(a))$, we have $-u\in(g(a),g(b))$, and there exists $c\in[a,b]$ such that
$g(c)=-u$, hence $f(c)=-g(c)=-(-u)=u$.
$\blacksquare$
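The intermediate value theorem is the correctness argument behind the bisection method. A minimal sketch, assuming $f$ is continuous with $f(a)\le y\le f(b)$ (the decreasing case is symmetric):

```python
def bisect(f, a, b, y, tol=1e-10):
    """Locate x in [a, b] with f(x) = y, assuming f is continuous and f(a) <= y <= f(b)."""
    lo, hi = a, b
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) <= y:
            lo = mid            # invariant f(lo) <= y is preserved
        else:
            hi = mid            # invariant f(hi) >= y is preserved
    return (lo + hi) / 2

root = bisect(lambda x: x ** 3, 0.0, 2.0, 5.0)
# root is close to the cube root of 5.
```

The loop maintains $f(\mathrm{lo})\le u\le f(\mathrm{hi})$, mirroring the set $S$ and its supremum $c$ in the proof above.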
Cauchy's mean value theorem
Let $f:[a,b]\to R$ and $g:[a,b]\to R$ be continuous on $[a,b]$ and differentiable in $(a,b)$ where $a\lt b$, then there exists $x\in(a,b)$ such that
$$(f(b)-f(a))g'(x)=(g(b)-g(a))f'(x)$$
(show proof)
Proof.
Let $h(t)=(f(b)-f(a))g(t)-(g(b)-g(a))f(t)$ for $t\in[a,b]$.
Then $h$ is continuous on $[a,b]$ and differentiable in $(a,b)$,
and $h(a)=f(b)g(a)-f(a)g(b)=h(b)$.
Suppose for all $t\in(a,b)$, $h(t)=h(a)$; then $h$ is constant on $[a,b]$, so for $c=(a+b)/2$ we have $c\in(a,b)$ and $h'(c)=0$.
Now suppose there exists $t\in(a,b)$ such that $h(t)\neq h(a)$.
Consider the case where $h(t)\gt h(a)$. By extreme value theorem, there exists $c\in[a,b]$ such that $h$ attains a maximum at $c$.
Since $h(t)\gt h(a)=h(b)$, $c\in(a,b)$. By the proposition on local extrema, the restriction $h|_{(a,b)}$ of $h$ satisfies $h|_{(a,b)}'(c)=0$, hence $h'(c)=0$.
By a symmetric argument, for the case where $h(t)\lt h(a)$, there exists $c\in(a,b)$ such that $h'(c)=0$.
Therefore, in every case, there exists $c\in(a,b)$ such that $h'(c)=0$. And for that $c$, we have $(f(b)-f(a))g'(c)-(g(b)-g(a))f'(c)=h'(c)=0$,
so $(f(b)-f(a))g'(c)=(g(b)-g(a))f'(c)$.
$\blacksquare$
Mean value theorem
Let $f:[a,b]\to R$ be continuous on $[a,b]$ and differentiable in $(a,b)$ where $a\lt b$, then there exists $x\in(a,b)$ such that
$$(f(b)-f(a))=(b-a)f'(x)$$
(show proof)
Proof.
Take $g(x)=x$. Since $g$ is linear, it is differentiable and hence continuous.
So the restriction $g|_{[a,b]}$ of $g$ is continuous on $[a,b]$ and differentiable in $(a,b)$.
Since $\lim_{h\to 0}\frac{g(x+h)-g(x)}{h}=\lim_{h\to 0}\frac{h}{h}=1$ for all $x\in R$,
we have $g|_{[a,b]}'(x)=1$ for all $x\in(a,b)$.
By Cauchy's mean value theorem, there exists $x\in(a,b)$ such that $(f(b)-f(a))g'(x)=(g(b)-g(a))f'(x)$, implying $(f(b)-f(a))=(b-a)f'(x)$ for that $x$.
$\blacksquare$
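A worked instance (a hypothetical example, not part of the proof): for $f(x)=x^2$ on $[0,2]$, the mean slope is $2$ and $f'(x)=2x$, so the theorem's point is $x=1$.

```python
a, b = 0.0, 2.0
f = lambda x: x ** 2
mean_slope = (f(b) - f(a)) / (b - a)   # (4 - 0) / 2 = 2
x = mean_slope / 2                      # solve f'(x) = 2*x = mean_slope
# x = 1.0 lies in (a, b) and f(b) - f(a) = (b - a) * f'(x) holds exactly.
```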
Lemma.
Let $B_r(\vb p)$ be some open ball in $R^n$, then given $\vb a,\vb b\in B_r(\vb p)$, define $s(t)=(1-t)\vb a+t\vb b$,
then $s(t)\in B_r(\vb p)$ for all $t\in[0,1]$.
(show proof)
Proof.
Let $\vb a,\vb b\in B_r(\vb p)$. Define $s(t)=(1-t)\vb a+t\vb b$.
When $t\in[0,1]$, we have
$$d(\vb p,s(t))=\norm{s(t)-\vb p}=\norm{(1-t)(\vb a-\vb p)+t(\vb b-\vb p)}\le\norm{(1-t)(\vb a-\vb p)}+\norm{t(\vb b-\vb p)}$$
$$=\abs{1-t}\norm{\vb a-\vb p}+\abs{t}\norm{\vb b-\vb p}\le\max(\norm{\vb a-\vb p},\norm{\vb b-\vb p})\lt r$$
so $s(t)\in B_r(\vb p)$.
$\blacksquare$
Lemma.
Given $\vb v\in R^n$, $$\Vert\vb v\Vert\le\sum_i\abs{v_i}\le\sqrt n\Vert\vb v\Vert$$
(show proof)
Proof.
Since $$\sum_i\abs{v_i}^2\le(\sum_i\abs{v_i})^2$$
we have $$\Vert\vb v\Vert=\sqrt{\sum_i\abs{v_i}^2}\le\sum_i\abs{v_i}$$
By the Cauchy-Schwarz inequality,
$$\sum_i\abs{v_i}=\sum_i(\abs{v_i}\cdot1)\le\sqrt{\sum_i\abs{v_i}^2}\sqrt{n}=\sqrt{n}\Vert\vb v\Vert$$
$\blacksquare$
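Note. The two inequalities can be checked on a concrete vector; the following Python sketch uses the sample vector $(3,-4,12)$, whose Euclidean norm is $13$.

```python
import math

# Check ||v|| <= sum_i |v_i| <= sqrt(n) ||v|| on a sample vector.
v = [3.0, -4.0, 12.0]
n = len(v)
norm = math.sqrt(sum(x * x for x in v))   # Euclidean norm: sqrt(169) = 13
abs_sum = sum(abs(x) for x in v)          # 3 + 4 + 12 = 19
assert norm <= abs_sum <= math.sqrt(n) * norm
```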
Note.
Note that the concept of continuous differentiability can be extended to total derivatives, with a metric on $R^{m\times n}$ defined in the "linear transformation" section of the "linear algebra" chapter.
Proposition.
Suppose $U$ is an open set. Then $f$ is continuously differentiable if and only if all partial derivatives $D_if_j$ exist and are continuous on $U$.
(show proof)
Proof.
Suppose $f$ is continuously differentiable. Then all partial derivatives of $f$ exist on $U$.
Given a partial derivative $D_if_j$, for all $\vb x,\vb y\in U$,
$$d(D_if_j(\vb x),D_if_j(\vb y))=\norm{(Df(\vb x)\vb e_i)_j-(Df(\vb y)\vb e_i)_j}
=\norm{((Df(\vb x)-Df(\vb y))\vb e_i)_j}\le\norm{(Df(\vb x)-Df(\vb y))\vb e_i}\le\norm{Df(\vb x)-Df(\vb y)}=d(Df(\vb x),Df(\vb y))$$
Hence for all $\vb p\in U$, for all $\varepsilon\gt0$, there exists $\delta\gt0$ such that $B_\delta(\vb p)\subseteq U$ and for all $\vb q\in B_\delta(\vb p)$,
$Df(\vb q)\in B_\varepsilon(Df(\vb p))$, hence $D_if_j(\vb q)\in B_\varepsilon(D_if_j(\vb p))$, implying $D_if_j$ is continuous in $U$.
Now suppose all partial derivatives $D_if_j$ exist and are continuous on $U$.
Given $\vb p\in U$, for all $\varepsilon\gt0$, let $\eta=\frac{\varepsilon}{2nm}$, and choose $r\gt0$ such that $B_r(\vb p)\subseteq U$ and
for all $\vb q\in B_r(\vb p)$, $D_if_j(\vb q)\in B_{\eta}(D_if_j(\vb p))$.
Let $\vb h\in B_r(\vb 0)\setminus\{\vb0\}$. For some scalars $h_i$, $\vb h=\sum_ih_i\vb e_i$.
Denote $\sum_{k=1}^ih_k\vb e_k$ by $\vb v_i$ for $i$ from $0$ to $n$, then
$$f_j(\vb p+\vb h)-f_j(\vb p)=\sum_i(f_j(\vb p+\vb v_i)-f_j(\vb p+\vb v_{i-1}))$$
Note that for each $\vb v_i$, $d(\vb p+\vb v_i,\vb p)=\norm{(\vb p+\vb v_i)-\vb p}=\norm{\vb v_i}\le\norm{\vb h}\lt r$, so $\vb p+\vb v_i\in B_r(\vb p)$.
If we define $s_i(t)=(1-t)(\vb p+\vb v_{i-1})+t(\vb p+\vb v_i)$, then $s_i(t)\in B_r(\vb p)$ for all $t\in[0,1]$.
Suppose $h_i\neq0$, then for all $t\in[0,1]$,
$$(f_j\circ s_i)'(t)
=\lim_{h\to0}\frac{f_j(s_i(t)+hh_i\vb e_i)-f_j(s_i(t))}{h}
=h_i\lim_{h\to0}\frac{f_j(s_i(t)+hh_i\vb e_i)-f_j(s_i(t))}{hh_i}
=h_iD_if_j(s_i(t))$$
Suppose $h_i=0$, then $f_j\circ s_i$ is a constant function, so we still have $(f_j\circ s_i)'(t)=h_iD_if_j(s_i(t))$ for all $t\in[0,1]$.
By mean value theorem, there exists $c_i\in(0,1)$ such that
$$h_iD_if_j(s_i(c_i))=(f_j\circ s_i)'(c_i)=(f_j\circ s_i)(1)-(f_j\circ s_i)(0)=f_j(\vb p+\vb v_i)-f_j(\vb p+\vb v_{i-1})$$
Hence $$f_j(\vb p+\vb h)-f_j(\vb p)=\sum_i(h_iD_if_j(s_i(c_i)))$$
And we have
$$\norm{f_j(\vb p+\vb h)-f_j(\vb p)-\sum_i(h_iD_if_j(\vb p))}
=\norm{\sum_i(h_iD_if_j(s_i(c_i)))-\sum_i(h_iD_if_j(\vb p))}
=\norm{\sum_i(h_i(D_if_j(s_i(c_i))-D_if_j(\vb p)))}
\le\sum_i\abs{h_i}\norm{D_if_j(s_i(c_i))-D_if_j(\vb p)}
\lt\sum_i\abs{h_i}\eta
\le\sqrt{n}\norm{\vb h}\eta
\lt\norm{\vb h}\varepsilon$$
Therefore, $f_j$ is differentiable at $\vb p$ and $Df_j(\vb p)_i=D_if_j(\vb p)$.
Since each component function is differentiable, $f$ is differentiable at $\vb p$ and $Df(\vb p)_{j,*}=Df_j(\vb p)$.
For continuity of $Df$, with $\vb p$, $\varepsilon$ and $r$ defined the same way,
for all $\vb q\in B_r(\vb p)$, given $\vb x$ such that $\Vert\vb x\Vert\le1$,
we have
$$\norm{(Df(\vb q)-Df(\vb p))\vb x}
\le\sum_j\abs{\sum_i(D_if_j(\vb q)-D_if_j(\vb p))x_i}
\le\sum_j\sum_i\abs{(D_if_j(\vb q)-D_if_j(\vb p))}\abs{x_i}
\le\sum_j\sum_i\frac{\varepsilon}{2nm}\Vert\vb x\Vert
\le\frac{\varepsilon}{2}$$
Hence $d(Df(\vb q),Df(\vb p))=\norm{(Df(\vb q)-Df(\vb p))}\le\frac{\varepsilon}{2}\lt\varepsilon$.
$\blacksquare$
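Note. The proposition identifies the entries of the total derivative with the partial derivatives $D_if_j$. As a numerical sketch (the map $f(x,y)=(x^2y,\sin x+y)$ is an arbitrary choice), the following Python code compares central finite differences against the analytic partials.

```python
import math

# Compare finite-difference partial derivatives of
# f(x, y) = (x^2 y, sin x + y) against the analytic Jacobian entries.
def f(x, y):
    return (x * x * y, math.sin(x) + y)

def partial_fd(j, i, x, y, h=1e-6):
    # central finite difference approximating D_i f_j at (x, y); 0-based i, j
    dx = (h, 0.0) if i == 0 else (0.0, h)
    return (f(x + dx[0], y + dx[1])[j] - f(x - dx[0], y - dx[1])[j]) / (2 * h)

x, y = 0.5, 2.0
analytic = [[2 * x * y, x * x],     # D_1 f_1, D_2 f_1
            [math.cos(x), 1.0]]     # D_1 f_2, D_2 f_2
for j in range(2):
    for i in range(2):
        assert abs(partial_fd(j, i, x, y) - analytic[j][i]) < 1e-5
```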
Proposition.
Let $V\subseteq R^n$ and $U\subseteq R^m$ be open sets, let $g:V\to R^m$ and $f:U\to R^l$ be continuously differentiable.
If for all $\vb x\in V$, $g(\vb x)\in U$, then $f\circ g$ is continuously differentiable.
(show proof)
Proof.
Differentiability of $f\circ g$ follows directly from chain rule.
Note that $J(f\circ g)=(Jf\circ g)Jg$. Thus $$J(f\circ g)_{ki}=\sum_j(Jf\circ g)_{kj}Jg_{ji}=\sum_j(Jf_{kj}\circ g)Jg_{ji}$$
By continuous differentiability of $f$ and $g$, every $Jf_{kj}$ is continuous on $U$ and
every $Jg_{ji}$ is continuous on $V$. By differentiability and hence continuity of $g$, and continuity of composite functions,
every $Jf_{kj}\circ g$ is continuous on $V$. Hence, by continuity of function operations,
every $J(f\circ g)_{ki}$ is continuous on $V$. Therefore, $f\circ g$ is continuously differentiable.
$\blacksquare$
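Note. The identity $J(f\circ g)=(Jf\circ g)Jg$ can be checked numerically in one variable; the following Python sketch (with the arbitrary choices $f(u)=u^2$ and $g(x)=\sin x$) compares a finite-difference derivative of the composite against the chain-rule product.

```python
import math

# Chain rule J(f∘g) = (Jf∘g) Jg in one variable:
# for f(u) = u^2 and g(x) = sin x, (f∘g)'(x) = 2 sin(x) cos(x).
def fd(fun, x, h=1e-6):
    # central finite difference
    return (fun(x + h) - fun(x - h)) / (2 * h)

g = math.sin
f = lambda u: u * u
x = 0.7
lhs = fd(lambda t: f(g(t)), x)          # derivative of the composite
rhs = 2 * math.sin(x) * math.cos(x)     # (Jf∘g) Jg = 2 g(x) · g'(x)
assert abs(lhs - rhs) < 1e-6
```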
L'Hôpital's rule
Suppose we have $f:U\to R$ and $g:V\to R$ where $U,V\subseteq R$, an open interval $I$ with $I\setminus\{c\}\subseteq U\cap V$, and $c\in I$, such that
- $\lim_{x\to c}f(x)=\lim_{x\to c}g(x)=0$ or $\lim_{x\to c}g(x)=\pm\infty$,
- $f$ and $g$ are differentiable in $I\setminus\{c\}$,
- $g'(x)\neq0$ for all $x\in I\setminus\{c\}$, and
- $\lim_{x\to c}\frac{f'(x)}{g'(x)}$ exists in $\overline R$,
then $$\lim_{x\to c}\frac{f(x)}{g(x)}=\lim_{x\to c}\frac{f'(x)}{g'(x)}$$
Note that $c$ can be replaced by $\pm\infty$, with $I$ taken to be an open interval having $c$ as an endpoint.
(show proof)
Proof.
Let $I_1$ and $I_2$ be open intervals obtained by splitting $I$ by $c$.
Note that $g'$ is non-zero in $I_1$ and $I_2$.
Suppose $a,b\in I_1$ with $g(a)=g(b)=0$. If $a\neq b$, then there exists $x$ strictly between $a$ and $b$ such that $g'(x)=0$, a contradiction.
Hence $g$ either has no zero or has a unique zero in $I_1$. Same for $I_2$.
Therefore, we can find an open interval $c\in(a,b)\subseteq I$ such that $g$ is non-zero in $(a,b)\setminus\{c\}$.
Suppose $\lim_{x\to c}\frac{f'(x)}{g'(x)}=L\in R$.
Then for every $\varepsilon\gt0$, there exists $\delta\gt0$ such that $c+\delta\lt b$ and for all $x\in(c,c+\delta)$, $\frac{f'(x)}{g'(x)}\in(L-\varepsilon/2,L+\varepsilon/2)$.
Let $y\in(c,c+\delta)$, then for every $x\in(c,y)$,
by Cauchy's mean value theorem, there exists $t\in(x,y)$ such that $(f(y)-f(x))g'(t)=(g(y)-g(x))f'(t)$.
Again, since $g'$ is non-zero in $(c,b)$, $g(y)-g(x)\neq0$.
Hence $$\frac{f'(t)}{g'(t)}=\frac{f(y)-f(x)}{g(y)-g(x)}$$
Suppose $\lim_{x\to c}f(x)=\lim_{x\to c}g(x)=0$.
Note that we have $$\frac{f'(t)}{g'(t)}=\frac{f(y)-f(x)}{g(y)-g(x)}=\frac{\frac{f(y)}{g(y)}-\frac{f(x)}{g(y)}}{1-\frac{g(x)}{g(y)}}$$
Since $t\in(c,c+\delta)$, we have $$\frac{\frac{f(y)}{g(y)}-\frac{f(x)}{g(y)}}{1-\frac{g(x)}{g(y)}}=\frac{f'(t)}{g'(t)}\in(L-\varepsilon/2,L+\varepsilon/2)$$
Since $\lim_{x\to c}f(x)=\lim_{x\to c}g(x)=0$,
$$\lim_{x\to c}\frac{\frac{f(y)}{g(y)}-\frac{f(x)}{g(y)}}{1-\frac{g(x)}{g(y)}}=\frac{f(y)}{g(y)}$$
and we have $\frac{f(y)}{g(y)}\in[L-\varepsilon/2,L+\varepsilon/2]\subset(L-\varepsilon,L+\varepsilon)$,
where $y$ can be arbitrarily chosen in $(c,c+\delta)$.
By a symmetric argument, there exists $\delta'\gt0$ such that for all $y\in (c-\delta',c)$, $\frac{f(y)}{g(y)}\in(L-\varepsilon,L+\varepsilon)$.
Therefore, let $\delta^*=\inf(\delta,\delta')$, then for all $x\in (c-\delta^*,c+\delta^*)\setminus\{c\}$, $\frac{f(x)}{g(x)}\in(L-\varepsilon,L+\varepsilon)$.
We have shown that $\lim_{x\to c}\frac{f(x)}{g(x)}=L$.
Suppose $\lim_{x\to c}g(x)=\pm\infty$.
Note that we have
$$\frac{f'(t)}{g'(t)}=\frac{f(x)-f(y)}{g(x)-g(y)}=\frac{\frac{f(x)}{g(x)}-\frac{f(y)}{g(x)}}{1-\frac{g(y)}{g(x)}}$$
$$\frac{f'(t)}{g'(t)}\p{1-\frac{g(y)}{g(x)}}=\frac{f(x)}{g(x)}-\frac{f(y)}{g(x)}$$
$$\frac{f'(t)}{g'(t)}=\frac{f(x)}{g(x)}-\p{\frac{f(y)}{g(x)}-\frac{f'(t)}{g'(t)}\frac{g(y)}{g(x)}}$$
And we will denote $\frac{f(y)}{g(x)}-\frac{f'(t)}{g'(t)}\frac{g(y)}{g(x)}$ by $r(x)$,
where $y$ is fixed and $t$ depends on $x$, so that
$$\frac{f(x)}{g(x)}=\frac{f'(t)}{g'(t)}+r(x)$$
Since $t\in(c,c+\delta)$, $\frac{f'(t)}{g'(t)}\in(L-\varepsilon/2,L+\varepsilon/2)$,
so $\frac{f(x)}{g(x)}\in(L-\varepsilon/2+r(x),L+\varepsilon/2+r(x))$.
Note that, since $\lim_{x\to c}g(x)=\pm\infty$, we have $\lim_{x\to c}\frac{1}{g(x)}=0$.
Combining with the fact that $\frac{f'(t)}{g'(t)}$, as a function of $x$, is bounded in $(c,y)$,
we have $$\lim_{x\to c^+}r(x)=0$$
So given the same $\varepsilon$, there exists $\sigma\gt0$ such that $c+\sigma\lt y$ and for all $x\in (c,c+\sigma)$, $r(x)\in(-\varepsilon/2,\varepsilon/2)$.
Then for all $x\in (c,c+\sigma)$, $\frac{f(x)}{g(x)}\in(L-\varepsilon,L+\varepsilon)$.
By a symmetric argument, there exists $\sigma'\gt0$ such that for all $x\in (c-\sigma',c)$, $\frac{f(x)}{g(x)}\in(L-\varepsilon,L+\varepsilon)$.
Therefore, let $\sigma^*=\inf(\sigma,\sigma')$, then for all $x\in (c-\sigma^*,c+\sigma^*)\setminus\{c\}$, $\frac{f(x)}{g(x)}\in(L-\varepsilon,L+\varepsilon)$.
We have shown that $\lim_{x\to c}\frac{f(x)}{g(x)}=L$.
The cases where $c=\pm\infty$ or $\lim_{x\to c}\frac{f'(x)}{g'(x)}=\pm\infty$ can be proven similarly.
If $c=\pm\infty$, we only need to prove one side of the limit.
$\blacksquare$
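Note. A classical instance of the rule is $\lim_{x\to0}\frac{\sin x}{x}=\lim_{x\to0}\frac{\cos x}{1}=1$; the following Python sketch checks that both quotients approach $1$ at the expected rate.

```python
import math

# L'Hôpital's rule for sin(x)/x as x -> 0: both f(x)/g(x) and
# f'(x)/g'(x) = cos(x)/1 tend to 1; sin(x)/x = 1 - x^2/6 + O(x^4).
for x in (0.1, 0.01, 0.001):
    ratio = math.sin(x) / x
    derivative_ratio = math.cos(x) / 1.0
    assert abs(ratio - 1.0) < x * x
    assert abs(derivative_ratio - 1.0) < x * x
```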
Differentiability class
Suppose $U$ is open. Let $k\in N$. If for every $j\in\{1,\ldots,m\}$ and every $k$-tuple $\vb i$ of elements of $\{1,\ldots,n\}$,
there exists a $(k+1)$-tuple of functions $(g_0,\ldots,g_k)$ from $U$ to $R$ such that
- $g_0=f_j$,
- $g_{l}=D_{i_l}g_{l-1}$ for $l\in\{1,\ldots,k\}$, and
- $g_k$ is continuous,
then $f$ is said to be of differentiability class $k$, and we also say that $f$ is $C^k$.
The set of all $C^k$ functions on $U$ is denoted $C^k(U)$.
Note that $C^0$ is equivalent to continuity and $C^1$ is equivalent to continuous differentiability.
Smoothness
Suppose $U$ is open. If $f\in C^k(U)$ for every natural number $k$, then $f$ is said to be smooth, and we also say that $f$ is $C^\infty$.
The set of all $C^\infty$ functions on $U$ is denoted $C^\infty(U)$.
Note.
By definition, $f$ is smooth (or $C^k$) if and only if all component functions are smooth (or $C^k$).
Also, smoothness implies continuous differentiability and hence continuity.
Smoothness of function operations
Given smooth (or $C^k$) functions $f:U\to R^m$ and $g:U\to R^m$ where $U\subseteq R^n$, and given $c\in R$, the functions $cf$, $f\pm g$, and $f\cdot g$ are smooth (or $C^k$);
if $f$ and $g$ are real-valued, then $fg$ is smooth (or $C^k$).
(show proof)
Proof.
We will only show the smooth case; the $C^k$ case follows similarly.
Note that $(cf)_j=cf_j$.
For each depth $k$, suppose
$$D_{i_{k-1}}\ldots D_{i_1}(cf)_j=c\alpha$$
where $\alpha:U\to R$ is smooth.
Then
$$D_{i_k}\ldots D_{i_1}(cf)_j=D_{i_k}(c\alpha)=J(c\alpha)_{i_k}=cJ\alpha_{i_k}=cD_{i_k}\alpha$$
which takes the form $c\beta$ where $\beta:U\to R$ is smooth.
Hence $cf$ is smooth.
Note that $(f+g)_j=f_j+g_j$.
For each depth $k$, suppose
$$D_{i_{k-1}}\ldots D_{i_1}(f+g)_j=\alpha+\beta$$
where $\alpha:U\to R$ and $\beta:U\to R$ are smooth.
Then
$$D_{i_k}\ldots D_{i_1}(f+g)_j=D_{i_k}(\alpha+\beta)=J(\alpha+\beta)_{i_k}=J\alpha_{i_k}+J\beta_{i_k}=D_{i_k}\alpha+D_{i_k}\beta$$
which takes the form $\alpha'+\beta'$ where $\alpha':U\to R$ and $\beta':U\to R$ are smooth.
Hence $f+g$ is smooth.
For each depth $k$, suppose
$$D_{i_{k-1}}\ldots D_{i_1}(f\cdot g)=\sum_t(\alpha_t\beta_t)$$
where each $\alpha_t:U\to R$ and $\beta_t:U\to R$ are smooth.
Then
$$
D_{i_{k}}\ldots D_{i_1}(f\cdot g)
=\p{J\p{\sum_t(\alpha_t\beta_t)}}_{i_{k}}
=\sum_tJ(\alpha_t\beta_t)_{i_{k}}
=\sum_t((J\alpha_t)^T\beta_t+(J\beta_t)^T\alpha_t)^T_{i_{k}}
=\sum_t(D_{i_{k}}\alpha_t\beta_t)+\sum_t(D_{i_{k}}\beta_t\alpha_t)
$$
which takes the form $\sum_s(\alpha_s\beta_s)$ where
each $\alpha_s:U\to R$ and $\beta_s:U\to R$ are smooth.
Hence $f\cdot g$ is smooth.
$\blacksquare$
Smoothness of composite functions
Given smooth (or $C^k$) functions $f:V\to R^l$ and $g:U\to R^m$ where $V\subseteq R^m$ and $U\subseteq R^n$
such that $g(U)\subseteq V$, the composite function $f\circ g:U\to R^l$ is smooth (or $C^k$).
(show proof)
Proof.
We will only show the smooth case; the $C^k$ case follows similarly.
Note that $(f\circ g)_j=f_j\circ g$.
For each depth $k$, suppose
$$D_{i_{k-1}}\ldots D_{i_1}(f\circ g)_j=\sum_t((\alpha_t\circ\beta_t)\gamma_t)$$
where $\alpha_t:V\to R$, $\beta_t:U\to V$, $\gamma_t:U\to R$ are smooth.
Then
$$D_{i_k}\ldots D_{i_1}(f\circ g)_j
=\p{J\sum_t((\alpha_t\circ\beta_t)\gamma_t)}_{i_k}
=\sum_t(J((\alpha_t\circ\beta_t)\gamma_t))_{i_k}
=\sum_t((J(\alpha_t\circ\beta_t)^T\gamma_t+J\gamma_t^T(\alpha_t\circ\beta_t))^T)_{i_k}
=\sum_t((J(\alpha_t\circ\beta_t)^T\gamma_t)_{i_k}+(J\gamma_t^T(\alpha_t\circ\beta_t))_{i_k})
$$ $$
=\sum_t(J(\alpha_t\circ\beta_t)_{i_k}\gamma_t+(J\gamma_t)_{i_k}(\alpha_t\circ\beta_t))
=\sum_t\p{\p{\sum_j((J\alpha_t)_j\circ\beta_t)(J\beta_t)_{ji_k}}\gamma_t+(J\gamma_t)_{i_k}(\alpha_t\circ\beta_t)}
=\sum_t\sum_j(D_j\alpha_t\circ\beta_t)D_{i_k}{\beta_t}_j\gamma_t+\sum_t(\alpha_t\circ\beta_t)D_{i_k}\gamma_t$$
which takes the form $\sum_s((\alpha_s\circ\beta_s)\gamma_s)$ where
$\alpha_s:V\to R$, $\beta_s:U\to V$, $\gamma_s:U\to R$ are smooth.
Hence $f\circ g$ is smooth.
$\blacksquare$
Proposition.
Let $f:U\to R^m$, where $U\subseteq R^n$ is open, be $C^2$, then $$D_iD_jf=D_jD_if$$ for all $i,j\in\{1,\ldots,n\}$.
(show proof)
Proof.
Since the claim can be verified componentwise, assume $f$ is real-valued.
Let $\vb p\in U$. Since $U$ is open, there exists $r\gt0$ such that $\vb p+s\vb e_i+t\vb e_j\in U$ whenever $\abs{s},\abs{t}\lt r$.
For non-zero $h,k\in(-r/2,r/2)$, define
$$\Delta(h,k)=f(\vb p+h\vb e_i+k\vb e_j)-f(\vb p+h\vb e_i)-(f(\vb p+k\vb e_j)-f(\vb p))$$
Fix such $h$ and $k$, and define $u(t)=f(\vb p+t\vb e_i+k\vb e_j)-f(\vb p+t\vb e_i)$,
then $u'(t)=D_if(\vb p+t\vb e_i+k\vb e_j)-D_if(\vb p+t\vb e_i)$ and $\Delta(h,k)=u(h)-u(0)$.
By the mean value theorem, there exists $s$ strictly between $0$ and $h$ such that
$$\Delta(h,k)=hu'(s)=h(D_if(\vb p+s\vb e_i+k\vb e_j)-D_if(\vb p+s\vb e_i))$$
Applying the mean value theorem again, to $t\mapsto D_if(\vb p+s\vb e_i+t\vb e_j)$, there exists $t$ strictly between $0$ and $k$ such that
$$\Delta(h,k)=hk\,D_jD_if(\vb p+s\vb e_i+t\vb e_j)$$
Since $\abs{s}\lt\abs{h}$ and $\abs{t}\lt\abs{k}$, and since $D_jD_if$ is continuous at $\vb p$ ($f$ being $C^2$),
$$\lim_{(h,k)\to(0,0)}\frac{\Delta(h,k)}{hk}=D_jD_if(\vb p)$$
On the other hand, for each fixed non-zero $h$, by definition of $D_jf$,
$$\lim_{k\to0}\frac{\Delta(h,k)}{hk}=\frac{D_jf(\vb p+h\vb e_i)-D_jf(\vb p)}{h}$$
Since the joint limit exists and each inner limit exists, the iterated limit exists and equals the joint limit, so
$$D_iD_jf(\vb p)=\lim_{h\to0}\frac{D_jf(\vb p+h\vb e_i)-D_jf(\vb p)}{h}=\lim_{h\to0}\lim_{k\to0}\frac{\Delta(h,k)}{hk}=D_jD_if(\vb p)$$
$\blacksquare$
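Note. The symmetry of mixed partials can be observed numerically; the following Python sketch (with the arbitrary $C^2$ choice $f(x,y)=x^3y+e^{xy}$) approximates the mixed partial by the symmetric second difference, which treats the two orders of differentiation identically.

```python
import math

# Symmetry of mixed partials for f(x, y) = x^3 y + exp(xy):
# the symmetric second difference approximates both D_1 D_2 f and D_2 D_1 f.
def f(x, y):
    return x ** 3 * y + math.exp(x * y)

def mixed_fd(x, y, h=1e-4):
    return (f(x + h, y + h) - f(x + h, y - h)
            - f(x - h, y + h) + f(x - h, y - h)) / (4 * h * h)

x, y = 0.5, 0.3
# analytic value: d/dx (x^3 + x e^{xy}) = 3x^2 + e^{xy} (1 + xy)
analytic = 3 * x * x + math.exp(x * y) * (1 + x * y)
assert abs(mixed_fd(x, y) - analytic) < 1e-5
```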
Generalization of smoothness
Assume $m,n\neq0$ as usual. A map $f:U\to R^m$ where $U$ is any subset of $R^n$ is called smooth if and only if
for all $x\in U$, there exist an open subset $U_x$ of $R^n$ containing $x$ and a smooth map
$f_x:U_x\to R^m$ that agrees with $f$ on $U_x\cap U$.
If $m=0$ or $n=0$, we define $f$ to be smooth.
When $m\neq0$, $n\neq0$, and $U$ is open, this definition is equivalent to the original definition of smoothness
(show proof).
Proof.
If $f$ is smooth by the original definition, it is trivial that $f$ is smooth by the generalized definition.
Suppose $f$ is smooth by the generalized definition.
Let $j\in\{1,\ldots,m\}$, let $l\in N$, let $\vb i$ be an $l$-tuple of $\{1,\ldots,n\}$.
Define $g_0=f_j$.
Let $x\in U$, then there exists $r_x\gt0$ such that $f(u)=f_x(u)$ for all $u\in B_{r_x}(x)\subseteq U\cap U_x$.
And there exists an $(l+1)$-tuple of functions $(g_{x,0},\ldots,g_{x,l})$ from $U_x$ to $R$ such that
- $g_{x,0}={f_x}_j$,
- $g_{x,k}=D_{i_k}g_{x,k-1}$ for $k\in\{1,\ldots,l\}$, and
- $g_{x,l}$ is continuous.
And we have $g_{x,0}(u)={f_x}_j(u)=f_j(u)=g_0(u)$ for all $u\in B_{r_x}(x)$.
Inductively, for $k$ from $1$ to $l$, suppose for all $x\in U$, for all $u\in B_{r_x}(x)$, $g_{x,k-1}(u)=g_{k-1}(u)$.
Let $x\in U$, then $g_{x,k}=D_{i_k}g_{x,k-1}$, by localness of derivative, for all $u\in B_{r_x}(x)$, $D_{i_k}g_{k-1}(u)=g_{x,k}(u)$.
Thus $D_{i_k}g_{k-1}(x)=g_{x,k}(x)$. Define $g_k=D_{i_k}g_{k-1}$,
then for all $x\in U$, for all $u\in B_{r_x}(x)$, $g_{x,k}(u)=D_{i_k}g_{k-1}(u)=g_k(u)$.
By induction, $g_{k}=D_{i_k}g_{k-1}$ for $k\in\{1,\ldots,l\}$.
And for all $x\in U$, for all $u\in B_{r_x}(x)$, $g_{x,l}(u)=g_l(u)$.
Let $x\in U$, since $g_{x,l}$ is continuous, by localness of continuity, $g_l$ is continuous at $x$.
Thus $g_l$ is continuous.
We have shown that $f$ is smooth by the original definition.
$\blacksquare$
Trivially, smoothness in this definition implies continuity.
Localness of smoothness
Let $m,n\in N$ and let $f:U\to R^m$, where $U$ is any subset of $R^n$, be smooth.
Then given any subset $V$ of $U$, $f|_V$ is smooth.
(show proof)
Proof.
Trivial.
$\blacksquare$
Generalized smoothness of function operations
Let $m,n\in N$. Given smooth functions $f:U\to R^m$ and $g:U\to R^m$ where $U\subseteq R^n$, and given $c\in R$, the functions $cf$, $f\pm g$, and $f\cdot g$ are smooth;
if $f$ and $g$ are real-valued, then $fg$ is smooth.
if $f$ and $g$ are real valued, then $fg$ is smooth.
(show proof)
Proof.
Trivial.
$\blacksquare$
Generalized smoothness of composite functions
Let $l,m,n\in N$. Given smooth functions $f:V\to R^l$ and $g:U\to R^m$ where $V\subseteq R^m$ and $U\subseteq R^n$
such that $g(U)\subseteq V$, the composite function $f\circ g:U\to R^l$ is smooth.
(show proof)
Proof.
If $n=0$ or $l=0$, then $f\circ g$ is smooth by definition.
If $m=0$, then $f\circ g$ is constant and thus smooth.
Suppose $l,m,n\neq0$.
Let $x\in U$, then there exists an open subset $U_x$ of $R^n$ containing $x$ and a smooth map
$g_x:U_x\to R^m$ that agrees with $g$ on $U_x\cap U$.
Note that $g(x)\in V$, thus there exists an open subset $V_x$ of $R^m$ containing $g(x)$ and a smooth map
$f_x:V_x\to R^l$ that agrees with $f$ on $V_x\cap V$.
Then there exists $r_x\gt0$ such that $B_{r_x}(g(x))\subseteq V_x$.
Since smoothness implies continuity, there exists $s_x\gt0$ such that $B_{s_x}(x)\subseteq U_x$ and for all $u\in B_{s_x}(x)$, $g_x(u)\in B_{r_x}(g(x))$.
By smoothness of composite functions, $f_x|_{B_{r_x}(g(x))}\circ g_x|_{B_{s_x}(x)}$ is smooth.
And for all $u\in B_{s_x}(x)\cap U$, we have
$(f_x|_{B_{r_x}(g(x))}\circ g_x|_{B_{s_x}(x)})(u)=f_x(g_x(u))=f(g(u))=(f\circ g)(u)$.
We have shown that $f\circ g$ is smooth.
$\blacksquare$
Diffeomorphism
Let $U\subseteq R^n$ and $V\subseteq R^m$ where $n,m\in N$.
If $f:U\to V$ is a smooth bijection whose inverse is also smooth, then $f$ is said to be a diffeomorphism,
and $U$ and $V$ are said to be diffeomorphic.
Trivially, a diffeomorphism is also a homeomorphism, in the topological sense.
Proposition.
If $f:V\to W$ and $g:U\to V$ are diffeomorphisms, then $f\circ g:U\to W$ is a diffeomorphism.
(show proof)
Proof.
This follows directly from generalized smoothness of composite functions.
$\blacksquare$
Proposition.
Let $U$ be an open subset of $R^n$ and $V$ be an open subset of $R^m$ where $n,m\in N$
such that $U$ and $V$ are non-empty and diffeomorphic, then $n=m$,
and if $n,m$ are non-zero, given any diffeomorphism $f:U\to V$, for all $p\in U$, $Df(p)^{-1}=D(f^{-1})(f(p))$.
(show proof)
Proof.
If $n=0$ or $m=0$, then we trivially have $n=m$. Now suppose $n,m$ are non-zero.
Let $f:U\to V$ be a diffeomorphism.
Let $p\in U$.
Then $I_n=D(f^{-1}\circ f)(p)=D(f^{-1})(f(p))Df(p)$
and $I_m=D(f\circ f^{-1})(f(p))=Df(p)D(f^{-1})(f(p))$.
Thus $n=\tr(D(f^{-1})(f(p))Df(p))=\tr(Df(p)D(f^{-1})(f(p)))=m$.
And we have $Df(p)^{-1}=D(f^{-1})(f(p))$.
Since $U$ and $V$ are diffeomorphic and $U$ is non-empty, such $f$ and $p$ exist, thus $n=m$.
$\blacksquare$
Definition.
Suppose $f:X\to X$ where $X$ is a metric space. If there exists $c\in[0,1)$ such that for all $x,y\in X$, we have
$$d(f(x),f(y))\le cd(x,y)$$
then $f$ is said to be a contraction on $X$.
Lemma.
Let $X$ be a metric space such that every Cauchy sequence converges.
Then given a contraction $f$ on $X$, there exists a unique $x\in X$ such that $f(x)=x$.
(show proof)
Proof.
Let $x^*\in X$. Use recursion theorem to define a function $\phi:N\to X$ such that $\phi(0)=x^*$ and $\phi(n+1)=f(\phi(n))$.
Then $\phi$ defines a sequence $(x_n)$ of $X$. Since $f$ is a contraction on $X$,
there exists $c\lt1$ such that for all $x,y\in X$, we have $d(f(x),f(y))\le cd(x,y)$.
Note that $c\ge0$.
By induction, $d(x_n,x_{n+1})\le c^nd(x_0,x_1)$.
Suppose $n\lt m$, then
$$d(x_n,x_m)\le\sum_{i=n}^{m-1}d(x_i,x_{i+1})\le\sum_{i=n}^{m-1}(c^id(x_0,x_1))=d(x_0,x_1)\sum_{i=n}^{m-1}c^i
=d(x_0,x_1)\frac{c^n(1-c^{m-n})}{1-c}\le d(x_0,x_1)\frac{c^n}{1-c}$$
For every $\varepsilon\gt0$, there exists $N_0$ such that $d(x_0,x_1)\frac{c^{N_0}}{1-c}\lt\varepsilon$,
so for all $m,n\ge N_0$, $d(x_n,x_m)\le d(x_0,x_1)\frac{c^{N_0}}{1-c}\lt\varepsilon$.
Hence $(x_n)$ is a Cauchy sequence, and so $\lim x_n=x$ for some $x\in X$.
Now suppose $f(x)\neq x$, then $d(x,f(x))\gt0$, so there exists some $n$ such that $d(x_n,x)\lt\frac{d(x,f(x))}{4}$ and $d(x_n,x_{n+1})\lt\frac{d(x,f(x))}{4}$.
But then we have $d(x,x_{n+1})\le d(x_n,x)+d(x_n,x_{n+1})\lt\frac{d(x,f(x))}{2}$, so
$$d(f(x_n),f(x))=d(x_{n+1},f(x))\ge d(x,f(x))-d(x,x_{n+1})\gt\frac{d(x,f(x))}{2}\gt\frac{d(x,f(x))}{4}\gt d(x_n,x)$$
a contradiction.
Therefore, $f(x)=x$.
To show uniqueness, suppose $f(x)=x$ and $f(y)=y$, then $d(x,y)=d(f(x),f(y))\le cd(x,y)$, so $d(x,y)=0$, implying $x=y$.
$\blacksquare$
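Note. The fixed-point iteration in the proof is constructive; the following Python sketch runs it for $f(x)=\cos x$, which is a contraction on $[0,1]$ (it maps $[0,1]$ into itself and $\abs{f'(x)}=\abs{\sin x}\le\sin 1\lt1$ there).

```python
import math

# Fixed-point iteration for the contraction f(x) = cos(x) on [0, 1]:
# the iterates x, f(x), f(f(x)), ... converge to the unique fixed point.
def f(x):
    return math.cos(x)

x = 1.0
for _ in range(100):
    x = f(x)
assert abs(f(x) - x) < 1e-9   # x is (numerically) the fixed point
```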
Lemma.
Suppose $U$ is an open ball of $R^n$ or $U=R^n$, $f$ is differentiable, and there exists $M\in R$ such that for every $\vb x\in U$, we have $\norm{Df(\vb x)}\le M$.
Then for all $\vb a,\vb b\in U$, $$\norm{f(\vb b)-f(\vb a)}\le M\norm{\vb b-\vb a}$$
(show proof)
Proof.
Let $\vb a,\vb b\in U$. Define $s(t)=(1-t)\vb a+t\vb b$.
If $U$ is an open ball $B_r(\vb p)$, then by the lemma on line segments in open balls, $s(t)\in U$ when $t\in[0,1]$;
this is trivially true if $U$ is $R^n$.
Define $g(t)=f(s(t))$, then when $t\in[0,1]$, $Dg(t)=Df(s(t))Ds(t)=Df(s(t))(\vb b-\vb a)$ since $s(t)$ is the sum of a linear map and a constant map.
And we have $\norm{Dg(t)}\le\norm{Df(s(t))}\norm{\vb b-\vb a}\le M\norm{\vb b-\vb a}$.
Let $\vb z=g(1)-g(0)$ and define $c(t)=\vb z\cdot g(t)$ on $t\in[0,1]$.
Clearly, $c(t)$ is continuous on $[0,1]$ and differentiable in $(0,1)$.
By mean value theorem, there exists $x\in(0,1)$ such that
$$c(1)-c(0)=c'(x)=\vb z\cdot Dg(x)$$
But then we also have $$c(1)-c(0)=\vb z\cdot g(1)-\vb z\cdot g(0)=\vb z\cdot\vb z=\Vert\vb z\Vert^2$$
so by Cauchy-Schwarz inequality,
$$\Vert\vb z\Vert^2=\vb z\cdot Dg(x)\le\Vert\vb z\Vert\norm{Dg(x)}$$
and we have $\norm{f(\vb b)-f(\vb a)}=\norm{g(1)-g(0)}\le\norm{Dg(x)}\le M\norm{\vb b-\vb a}$.
$\blacksquare$
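Note. As a numerical illustration of the mean value inequality, the following Python sketch uses $f(x)=(\sin x,\cos x)$ on $R$, for which $\norm{Df(x)}=1$, so $\norm{f(b)-f(a)}\le\abs{b-a}$ (geometrically, a chord is no longer than its arc).

```python
import math

# Mean value inequality for f(x) = (sin x, cos x): ||Df(x)|| = 1 on R,
# so ||f(b) - f(a)|| <= |b - a| for all a, b.
def f(x):
    return (math.sin(x), math.cos(x))

def dist(u, v):
    return math.sqrt((u[0] - v[0]) ** 2 + (u[1] - v[1]) ** 2)

for a, b in ((0.0, 1.0), (-2.0, 3.5), (0.1, 0.2)):
    assert dist(f(b), f(a)) <= abs(b - a) + 1e-12
```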
Lemma.
$\frac{1}{x}$ is smooth on $R\setminus\{0\}$.
(show proof)
Proof.
Let $p\in R\setminus\{0\}$.
$$
\lim_{h\to 0}\frac{\frac{1}{p+h}-\frac{1}{p}}{h}
=\lim_{h\to 0}\frac{-1}{p(p+h)}\frac{h}{h}
=\lim_{h\to 0}\frac{-1}{p(p+h)}\lim_{h\to 0}\frac{h}{h}
=\frac{-1}{p^2}
$$
Thus $$\dv{}{x}\frac{1}{x}=\frac{-1}{x^2}$$
Given a function of the form $c\p{\frac{1}{x}}^k$ where $c\in R$ and $k\in N\setminus\{0\}$,
we have
$$
\dv{}{x}c\p{\frac{1}{x}}^k
=ck\p{\frac{1}{x}}^{k-1}\frac{-1}{x^2}
=-ck\p{\frac{1}{x}}^{k+1}
$$
which is again of the form $c\p{\frac{1}{x}}^k$ with $k\neq0$.
By induction, every function of the form $c\p{\frac{1}{x}}^k$ with $k\neq0$ is smooth.
Since $\frac{1}{x}$ is itself of the form $c\p{\frac{1}{x}}^k$ with $k\neq0$, it is smooth.
$\blacksquare$
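Note. The derivative pattern from the induction, $\dv{}{x}\p{\frac{1}{x}}^k=-k\p{\frac{1}{x}}^{k+1}$, can be checked numerically; the following Python sketch compares a finite difference against the formula at $x=2$.

```python
# d/dx (1/x)^k = -k (1/x)^(k+1), checked at x = 2 for k = 1, 2, 3
# via a central finite difference.
def fd(fun, x, h=1e-6):
    return (fun(x + h) - fun(x - h)) / (2 * h)

x = 2.0
for k in (1, 2, 3):
    numeric = fd(lambda t: (1.0 / t) ** k, x)
    analytic = -k * (1.0 / x) ** (k + 1)
    assert abs(numeric - analytic) < 1e-8
```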
Lemma.
Let $U\subseteq R^n$ be open.
Suppose we have $a_{ij}:U\to R$ and $b_{ij}:U\to R$ for $i,j\in\{1,\ldots,n\}$,
such that for all $\vb p\in U$, $(a_{ij}(\vb p))_{ij}(b_{ij}(\vb p))_{ij}=I$,
then every $a_{ij}$ is smooth (or $C^k$) if and only if every $b_{ij}$ is smooth (or $C^k$).
(show proof)
Proof.
This follows from the inverse matrix formula $A^{-1}=\frac{1}{\det(A)}\text{adj}(A)$,
the properties of smooth (or $C^k$) functions, and that $\frac{1}{x}$ is smooth.
$\blacksquare$
Inverse function theorem
Suppose $f:U\to R^n$ is smooth (or $C^k$ with $k\gt0$), where $U\subseteq R^n$ is open.
If $\vb p\in U$ and $Df(\vb p)$ is invertible, then there exists an open subset $V$ of $U$
such that
- $\vb p\in V$,
- $f(V)$ is open,
- $f|_V:V\to f(V)$ is a bijection, and
- let $g$ denote ${f|_V}^{-1}$, then $g$ is smooth (or $C^k$) and
$$Dg(\vb y)=Df(g(\vb y))^{-1}$$
for all $\vb y\in f(V)$.
(show proof)
Proof.
Whether $f$ is smooth or $C^k$ with $k\gt0$, it is continuously differentiable.
Let $A=Df(\vb p)$ and $\varepsilon=\frac{1}{2\norm{A^{-1}}}$.
Since $f$ is continuously differentiable at $\vb p$,
there exists an open ball $V\subseteq U$ centered at $\vb p$ such that for all $\vb x\in V$, $\norm{Df(\vb x)-A}\lt\varepsilon$.
For every $\vb y\in R^n$, define a function
$$\varphi_{\vb y}(\vb x)=\vb x+A^{-1}(\vb y-f(\vb x))$$
on $V$, then
$$\norm{D(\varphi_{\vb y})(\vb x)}=\norm{I-A^{-1}Df(\vb x)}=\norm{A^{-1}(A-Df(\vb x))}\le\norm{A^{-1}}\norm{A-Df(\vb x)}\lt\norm{A^{-1}}\varepsilon=\frac{1}{2}$$
For all $\vb x_1,\vb x_2\in V$, if $f(\vb x_1)=f(\vb x_2)$, let $\vb y=f(\vb x_1)=f(\vb x_2)$, then
$$\norm{\vb x_1-\vb x_2}=\norm{\varphi_{\vb y}(\vb x_1)-\varphi_{\vb y}(\vb x_2)}\le\frac{1}{2}\norm{\vb x_1-\vb x_2}$$
implying $\vb x_1=\vb x_2$. We have shown that $f|_V:V\to R^n$ is injective, thus $f|_V:V\to f(V)$ is bijective.
Let $\vb y^*\in f(V)$. Then there exists $\vb x^*\in V$ such that $f(\vb x^*)=\vb y^*$.
Let $B$ be an open ball centered at $\vb x^*$ with radius $r$ such that $\overline B\subseteq V$.
Let $\vb y\in B_{r\varepsilon}(\vb y^*)$,
then $$\norm{\varphi_{\vb y}(\vb x^*)-\vb x^*}=\norm{A^{-1}(\vb y-\vb y^*)}\lt\norm{A^{-1}}r\varepsilon=\frac{r}{2}$$
When $\vb x\in\overline B$,
$$\norm{\varphi_{\vb y}(\vb x)-\vb x^*}\le\norm{\varphi_{\vb y}(\vb x)-\varphi_{\vb y}(\vb x^*)}+\norm{\varphi_{\vb y}(\vb x^*)-\vb x^*}\lt\frac{1}{2}\norm{\vb x-\vb x^*}+\frac{r}{2}\le r$$
thus $\varphi_{\vb y}(\vb x)\in B$. Hence $\varphi_{\vb y}$ maps $\overline B$ into $\overline B$, and since $\norm{\varphi_{\vb y}(\vb x_1)-\varphi_{\vb y}(\vb x_2)}\le\frac{1}{2}\norm{\vb x_1-\vb x_2}$ for all $\vb x_1,\vb x_2\in V$, the restriction of $\varphi_{\vb y}$ to $\overline B$ is a contraction on $\overline B$.
Since every Cauchy sequence converges in $R^n$, every Cauchy sequence of $\overline B$ also converges in $R^n$,
and hence in $\overline B$.
Thus, there exists a unique $\vb x\in\overline B$ such that $\varphi_{\vb y}|_{\overline B}(\vb x)=\vb x$.
For this $\vb x$, $A^{-1}(\vb y-f(\vb x))=\vb0$, thus $\vb y-f(\vb x)=\vb 0$ since $A^{-1}$ is bijective.
Therefore, $\vb y=f(\vb x)\in f(V)$. We have shown that $f(V)$ is open.
Let $\vb y\in f(V)$ and denote $g(\vb y)$ by $\vb x$. Let $r\gt0$ such that $B_r(\vb y)\subseteq f(V)$ and let $\vb k\in B_r(\vb 0)$,
then $\vb y+\vb k\in f(V)$. Let $\vb h=g(\vb y+\vb k)-\vb x$, then $\vb x+\vb h\in V$.
Note that for all $\vb k\in B_r(\vb 0)\setminus\{\vb 0\}$, since $g$ is bijective, $g(\vb y)\neq g(\vb y+\vb k)$, thus $\vb h\neq\vb0$.
Now we have
$$\varphi_{\vb y}(\vb x+\vb h)-\varphi_{\vb y}(\vb x)=\vb h+A^{-1}(f(\vb x)-f(\vb x+\vb h))=\vb h-A^{-1}\vb k$$
Hence $$\norm{\vb h-A^{-1}\vb k}=\norm{\varphi_{\vb y}(\vb x+\vb h)-\varphi_{\vb y}(\vb x)}\le\frac{1}{2}\norm{\vb h}$$
Since $$\norm{\vb h}\le\norm{\vb h-A^{-1}\vb k}+\norm{A^{-1}\vb k}\le\frac{1}{2}\norm{\vb h}+\norm{A^{-1}\vb k}$$
we have $$\norm{\vb h}\le2\norm{A^{-1}\vb k}\le2\norm{A^{-1}}\norm{\vb k}=\frac{\norm{\vb k}}{\varepsilon}$$
For all $\sigma\gt0$, let $\delta=\inf(\sigma\varepsilon,r)$, then for all $\vb k\in B_\delta(\vb0)$,
$\norm{\vb h}\le\frac{\norm{\vb k}}{\varepsilon}\lt\sigma$. Hence $\lim_{\vb k\to\vb 0}\vb h=\vb0$.
Note that, since $\vb x\in V$, $d(Df(\vb x),A)=\norm{Df(\vb x)-A}\lt\frac{1}{2\norm{A^{-1}}}\lt\frac{1}{\norm{A^{-1}}}$,
and by a lemma proven in the "linear algebra" chapter, $Df(\vb x)$ is invertible, and we denote $Df(\vb x)^{-1}$ by $T$.
Since $$g(\vb y+\vb k)-g(\vb y)-T\vb k=\vb h-T\vb k=T(T^{-1}\vb h-\vb k)=-T(f(\vb x+\vb h)-f(\vb x)-Df(\vb x)\vb h)$$
we have $$\frac{\norm{g(\vb y+\vb k)-g(\vb y)-T\vb k}}{\norm{\vb k}}\le\frac{\norm{T}}{\varepsilon}\frac{\norm{f(\vb x+\vb h)-f(\vb x)-Df(\vb x)\vb h}}{\norm{\vb h}}$$
For all $\sigma\gt0$, there exists $\delta\gt0$ with $B_\delta(\vb x)\subseteq V$ such that for all $\vb h^*\in B_\delta(\vb 0)\setminus\{\vb 0\}$,
$\frac{\norm{f(\vb x+\vb h^*)-f(\vb x)-Df(\vb x)\vb h^*}}{\norm{\vb h^*}}\in B_\sigma(0)$,
and for that $\delta$, there exists $\eta\gt0$ with $\eta\le r$ such that for all $\vb k\in B_\eta(\vb 0)\setminus\{\vb 0\}$,
$\vb h\in B_\delta(\vb 0)\setminus\{\vb 0\}$, and hence $\frac{\norm{f(\vb x+\vb h)-f(\vb x)-Df(\vb x)\vb h}}{\norm{\vb h}}\in B_\sigma(0)$.
Therefore, $$\lim_{\vb k\to\vb 0}\frac{\norm{f(\vb x+\vb h)-f(\vb x)-Df(\vb x)\vb h}}{\norm{\vb h}}=0$$
Since $\frac{\norm{T}}{\varepsilon}$ does not depend on $\vb k$,
we have $$\lim_{\vb k\to\vb 0}\frac{\norm{T}}{\varepsilon}\frac{\norm{f(\vb x+\vb h)-f(\vb x)-Df(\vb x)\vb h}}{\norm{\vb h}}=0$$
By squeeze theorem, $$\lim_{\vb k\to\vb 0}\frac{\norm{g(\vb y+\vb k)-g(\vb y)-T\vb k}}{\norm{\vb k}}=0$$
Therefore, $$Dg(\vb y)=T=Df(g(\vb y))^{-1}$$
Since $g$ is differentiable, it is $C^0$.
Suppose $g$ is $C^k$ and $f$ is $C^{k+1}$ where $k\in N$.
Then every entry $D_if_j\circ g$ of $Jf\circ g$ is $C^k$.
Since $Jg(\vb y)=(Jf(g(\vb y)))^{-1}$ for all $\vb y\in f(V)$, by the lemma on inverses of matrix-valued functions, every entry $D_ig_j$ of $Jg$ is $C^k$,
implying $g$ is $C^{k+1}$.
By induction, if $f$ is $C^k$ for some non-zero $k$, then $g$ is $C^k$;
if $f$ is smooth, then $g$ is smooth.
$\blacksquare$
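Note. The derivative formula $Dg(\vb y)=Df(g(\vb y))^{-1}$ can be illustrated numerically in one variable; the following Python sketch uses $f(x)=x^3+x$, which has $f'(x)=3x^2+1\gt0$ everywhere, and inverts it with a hypothetical Newton-iteration helper `g` (not part of the theorem).

```python
# Inverse function theorem in one variable: f(x) = x^3 + x is strictly
# increasing, and the theorem gives g'(y) = 1 / f'(g(y)) for g = f^{-1}.
def f(x):
    return x ** 3 + x

def f_prime(x):
    return 3 * x * x + 1

def g(y, iters=60):
    # invert f numerically by Newton's method (a helper for this check only)
    x = 0.0
    for _ in range(iters):
        x -= (f(x) - y) / f_prime(x)
    return x

y = f(1.5)                      # = 4.875
x = g(y)
assert abs(x - 1.5) < 1e-10
h = 1e-6
g_prime_numeric = (g(y + h) - g(y - h)) / (2 * h)
assert abs(g_prime_numeric - 1.0 / f_prime(x)) < 1e-6
```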
Notation.
Given $(a_1,\ldots,a_n)=\vb a\in R^n$ and $(b_1,\ldots,b_m)=\vb b\in R^m$,
denote $(a_1,\ldots,a_n,b_1,\ldots,b_m)\in R^{n+m}$ by $(\vb a,\vb b)$.
Given $A\in R^{m\times (n+m)}$, let $A_x$ denote an $m\times n$ matrix such that ${A_x}_{ij}=A_{i,j}$,
and let $A_y$ denote an $m\times m$ matrix such that ${A_y}_{ij}=A_{i,j+n}$,
then clearly, for all $\vb h\in R^n$ and $\vb k\in R^m$,
$$A(\vb h,\vb k)=A_x\vb h+A_y\vb k$$
Lemma.
If $A\in R^{m\times (n+m)}$ and $A_y$ is invertible, then for every $\vb h\in R^n$,
there exists a unique $\vb k\in R^m$ such that $A(\vb h,\vb k)=\vb0$; namely,
$$\vb k=-(A_y)^{-1}A_x\vb h$$
(show proof)
Proof.
$A(\vb h,\vb k)=\vb0$ if and only if $A_x\vb h+A_y\vb k=\vb0$ if and only if $\vb k=-(A_y)^{-1}A_x\vb h$.
$\blacksquare$
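Note. For a minimal numeric instance of the lemma with $n=m=1$, the $1\times2$ "matrix" $A=[\,2\mid4\,]$ splits into $A_x=2$ and $A_y=4$; the following Python sketch checks the solution formula.

```python
# Block solve A(h, k) = 0 with A = [A_x | A_y] for n = m = 1:
# k = -(A_y)^{-1} A_x h; with A_x = 2, A_y = 4, h = 3, this gives k = -1.5.
a_x, a_y = 2.0, 4.0
h = 3.0
k = -(1.0 / a_y) * a_x * h
assert a_x * h + a_y * k == 0.0
```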
Implicit function theorem
Suppose $f:U\to R^m$ is smooth (or $C^k$ with $k\gt0$), where $U\subseteq R^{n+m}$ is open.
If $\vb a\in R^n$ and $\vb b\in R^m$ such that
- $(\vb a,\vb b)\in U$,
- $f(\vb a,\vb b)=\vb0$, and
- with $Df(\vb a,\vb b)$ denoted $A$, $A_y$ is invertible,
then there exist $V\subseteq R^{n+m}$ and $W\subseteq R^n$ such that
- $(\vb a,\vb b)\in V, \vb a\in W$;
- $V,W$ are open;
- for every $\vb x\in W$, there exists a unique $\vb y\in R^m$ such that $(\vb x,\vb y)\in V$ and $f(\vb x,\vb y)=\vb0$;
- if we define $f^*:W\to R^m$ such that $f^*(\vb x)$ is the unique $\vb y$ above,
then
- $f^*$ is smooth (or $C^k$);
- $f^*(\vb a)=\vb b$; and
- $Df^*(\vb a)=-(A_y)^{-1}A_x$.
(show proof)
Proof.
Define $F:U\to R^{n+m}$ by $F(\vb x,\vb y)=(\vb x,f(\vb x,\vb y))$.
Then $F$ is smooth (or $C^k$).
For some $\delta\gt0$, we have $B_\delta(\vb a,\vb b)\subseteq U$.
Suppose $(\vb h,\vb k)\in B_\delta(\vb0,\vb0)$, then $(\vb a+\vb h,\vb b+\vb k)\in U$.
Note that $f(\vb a,\vb b)=\vb0$. If we let $r(\vb h,\vb k)=f((\vb a,\vb b)+(\vb h,\vb k))-A(\vb h,\vb k)$, then
$$F((\vb a,\vb b)+(\vb h,\vb k))-F(\vb a,\vb b)=(\vb h,f(\vb a+\vb h,\vb b+\vb k))=(\vb h,A(\vb h,\vb k))+(\vb0,r(\vb h,\vb k))$$
hence
$$\lim_{(\vb h,\vb k)\to\vb0}\frac{\norm{F((\vb a,\vb b)+(\vb h,\vb k))-F(\vb a,\vb b)-(\vb h,A(\vb h,\vb k))}}{\norm{(\vb h,\vb k)}}
=\lim_{(\vb h,\vb k)\to\vb0}\frac{\norm{(\vb0,r(\vb h,\vb k))}}{\norm{(\vb h,\vb k)}}
=\lim_{(\vb h,\vb k)\to\vb0}\frac{\norm{r(\vb h,\vb k)}}{\norm{(\vb h,\vb k)}}
=0$$
by definition of total derivative.
Since $L(\vb h,\vb k)=(\vb h,A(\vb h,\vb k))$ is clearly linear, we have $DF(\vb a,\vb b)=L$.
Note that if $DF(\vb a,\vb b)(\vb h,\vb k)=\vb0$, then $(\vb h,A(\vb h,\vb k))=(\vb0,\vb0)$, so $\vb h=\vb0$ and $A(\vb 0,\vb k)=\vb0$,
implying $\vb k=-(A_y)^{-1}A_x\vb 0=\vb0$. This shows that $DF(\vb a,\vb b)$ is injective, and hence is invertible.
By inverse function theorem, there exists an open subset $V\subseteq U$ such that
$(\vb a,\vb b)\in V$, $F(V)$ is open, and the restriction $F|_V:V\to F(V)$ is bijective. Note that $(\vb a,\vb0)=F(\vb a,\vb b)\in F(V)$.
If we define $W$ to be the set of all $\vb x\in R^n$ such that $(\vb x,\vb0)\in F(V)$, then $W$ is open, being the preimage of the open set $F(V)$ under the continuous map $\vb x\mapsto(\vb x,\vb0)$, and $\vb a\in W$.
For every $\vb x\in W$, $(\vb x,\vb0)\in F(V)$, then there exists $(\vb x',\vb y)\in V$ such that $F(\vb x',\vb y)=(\vb x,\vb0)$,
which then implies that $\vb x'=\vb x$ and $f(\vb x,\vb y)=\vb0$. Note that $\vb y\in R^m$ and $(\vb x,\vb y)\in V$.
To show uniqueness, suppose $\vb y'\in R^m$ such that $(\vb x,\vb y')\in V$ and $f(\vb x,\vb y')=\vb0$,
then $F(\vb x,\vb y')=(\vb x,f(\vb x,\vb y'))=(\vb x,f(\vb x,\vb y))=F(\vb x,\vb y)$.
Since $(\vb x,\vb y'),(\vb x,\vb y)\in V$, $F|_V(\vb x,\vb y')=F|_V(\vb x,\vb y)$, and by injectivity of $F|_V$, $(\vb x,\vb y')=(\vb x,\vb y)$, hence $\vb y'=\vb y$.
Now define $f^*:W\to R^m$ such that $f^*(\vb x)$ is the unique $\vb y\in R^m$ with $(\vb x,\vb y)\in V$ and $f(\vb x,\vb y)=\vb0$.
Then for all $\vb x\in W$, $f(\vb x,f^*(\vb x))=\vb0$.
Since $f(\vb a,\vb b)=\vb0$, we have $f^*(\vb a)=\vb b$.
Also by inverse function theorem,
denote ${F|_V}^{-1}:F(V)\to V$ by $G$, then $G$ is smooth (or $C^k$),
and for all $\vb x\in W$, since $F|_V(\vb x,f^*(\vb x))=(\vb x,\vb 0)$, $G(\vb x,\vb 0)=(\vb x,f^*(\vb x))$.
Define $H:W\to V$ by $H(\vb x)=G(\vb x,\vb 0)$, then $H$ is smooth (or $C^k$).
Note that for all $\vb x\in W$, $H(\vb x)=G(\vb x,\vb 0)=(\vb x,f^*(\vb x))$.
Thus $f^*$ is smooth (or $C^k$).
Let $\vb x\in W$, then $DH(\vb x)$ is $(n+m)\times n$ with $DH(\vb x)_{i,j}=I_{ij}$ for all $i\in\{1,\ldots,n\}$
and $DH(\vb x)_{n+i,j}=Df^*(\vb x)_{ij}$ for all $i\in\{1,\ldots,m\}$.
Hence for all $\vb h\in R^n$, $DH(\vb x)\vb h=(\vb h,Df^*(\vb x)\vb h)$.
Since $(f\circ H)(\vb x)=f(\vb x,f^*(\vb x))=\vb 0$ for all $\vb x\in W$,
we have $Df(H(\vb x))DH(\vb x)=D(f\circ H)(\vb x)=0$.
With $Df(H(\vb a))=Df(\vb a,\vb b)=A$,
we have $ADH(\vb a)=0$.
It follows that, for all $\vb h\in R^n$,
$$A_x\vb h+A_yDf^*(\vb a)\vb h=A(\vb h,Df^*(\vb a)\vb h)=ADH(\vb a)\vb h=\vb0$$
implying $Df^*(\vb a)\vb h=-(A_y)^{-1}A_x\vb h$.
Therefore, $$Df^*(\vb a)=-(A_y)^{-1}A_x$$
$\blacksquare$
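Note. A standard instance of the theorem is the unit circle $f(x,y)=x^2+y^2-1=0$. At $(\vb a,\vb b)=(0.6,0.8)$ we have $A=Df(\vb a,\vb b)=[\,2a\mid2b\,]$, so $A_y=2b$ is invertible and the theorem predicts $Df^*(\vb a)=-(A_y)^{-1}A_x=-a/b=-0.75$; near $b\gt0$ the implicit function is $f^*(x)=\sqrt{1-x^2}$. The following Python sketch checks this.

```python
import math

# Implicit function theorem on the unit circle f(x, y) = x^2 + y^2 - 1 = 0:
# at (a, b) = (0.6, 0.8), the predicted slope of the implicit function
# is -(A_y)^{-1} A_x = -(2a)/(2b) = -a/b = -0.75.
a, b = 0.6, 0.8
assert abs(a * a + b * b - 1.0) < 1e-12

def f_star(x):
    # the locally unique solution y of f(x, y) = 0 near b > 0
    return math.sqrt(1.0 - x * x)

h = 1e-6
slope_numeric = (f_star(a + h) - f_star(a - h)) / (2 * h)
slope_theorem = -a / b
assert abs(slope_numeric - slope_theorem) < 1e-6
```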