Note.
We will use the following default settings:
- If not otherwise defined, $f$ is a function $f:U\to R^m$ where $U$ is a subset of $R^n$ and $n,m$ are non-zero natural numbers.
- $f_1,\ldots,f_m$ are component functions of $f$ such that $f(\vb x)=(f_1(\vb x),\ldots,f_m(\vb x))$ for all $\vb x\in U$.
By default, component functions share the same domain $U$ with $f$.
- We may implicitly regard $R^1$ as $R$ and $R$ as $R^1$, so that statements about $R^n$ implicitly cover $R$,
and statements about $R$ implicitly cover $R^1$.
Derivative
Suppose $U\subseteq R$ and $f$ is real-valued.
If $p$ is an interior point of $U$, and if the limit
$$\lim_{h\to 0}\frac{f(p+h)-f(p)}{h}$$
exists, with $h$ defined on $\{h\in R|p+h\in U,h\neq0\}$,
then the derivative of $f$ at $p$, denoted $f'(p)$ or $\dv{f}{x}(p)$ if we denote the variable as $x$, is defined as that limit,
and $f$ is said to be differentiable at $p$.
If $f$ is differentiable at every point in $U$, then $f$ is said to be differentiable,
and $f'$, a function from $U$ to $R$, is called the derivative of $f$.
If $f'$ is continuous, then $f$ is called continuously differentiable.
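The limit definition can be probed numerically. The sketch below (an illustration, not part of the formal development) uses the hypothetical example $f(x)=x^2$ at $p=3$, whose derivative is $6$, and shows the difference quotients approaching that value as $h\to0$.

```python
def difference_quotient(f, p, h):
    """Evaluate the difference quotient (f(p + h) - f(p)) / h for nonzero h."""
    return (f(p + h) - f(p)) / h

# Hypothetical example: f(x) = x^2, so f'(3) = 6.
square = lambda x: x * x
quotients = [difference_quotient(square, 3.0, 10.0 ** -k) for k in range(1, 7)]
# The quotients 6.1, 6.01, 6.001, ... approach f'(3) = 6 as h shrinks.
```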
Partial derivative
Suppose $f$ is real-valued.
If $\vb p$ is a point such that for some $r\gt0$, for all $h\in(-r,r)$, $\vb p+h\vb e_i\in U$, and if the limit
$$\lim_{h\to 0}\frac{f(\vb p+h\vb e_i)-f(\vb p)}{h}$$
exists, with $h$ defined on $\{h\in R|\vb p+h\vb e_i\in U,h\neq0\}$,
then the partial derivative of $f$ at $\vb p$ with respect to the $i$th variable, denoted $D_if(\vb p)$ or $\pdv{f}{x_i}(\vb p)$
if we denote the $i$th variable as $x_i$, is defined as that limit.
Total derivative
If $\vb p$ is an interior point of $U$, and if there exists a linear map $L:R^n\to R^m$ such that
$$\lim_{\vb h\to \vb 0}\frac{\Vert f(\vb p+\vb h)-f(\vb p)-L(\vb h)\Vert}{\Vert\vb h\Vert}=0$$
with $\vb h$ defined on $\{\vb h\in R^n|\vb p+\vb h\in U,\vb h\neq\vb0\}$,
then $L$ is called the total derivative of $f$ at $\vb p$, denoted $Df(\vb p)$.
Since $Df(\vb p)$ is a linear map, it has a standard matrix representation $Af(\vb p)$.
Since $Df(\vb p)(\vb h)=Af(\vb p)\vb h$,
we will simply denote $Af(\vb p)$ as $Df(\vb p)$ for convenience.
As a result, $Df(\vb p)$ may represent a linear map from $R^n$ to $R^m$, or an $m\times n$ matrix.
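The defining limit can also be checked numerically. In this sketch (a hypothetical example, not part of the formal development), $f(x,y)=(xy,\ x+y)$ and the candidate matrix for $Df(\vb p)$ has rows $(y,x)$ and $(1,1)$; the remainder quotient shrinks roughly linearly with $\norm{\vb h}$, as the definition requires.

```python
import math

def f(x, y):
    return (x * y, x + y)

def remainder_quotient(p, h):
    """Compute ||f(p + h) - f(p) - L(h)|| / ||h|| for the candidate L at p."""
    px, py = p
    hx, hy = h
    fx, fy = f(px + hx, py + hy)
    gx, gy = f(px, py)
    lx = py * hx + px * hy      # first row of L is (py, px)
    ly = hx + hy                # second row of L is (1, 1)
    return math.hypot(fx - gx - lx, fy - gy - ly) / math.hypot(hx, hy)

quotients = [remainder_quotient((2.0, 3.0), (t, t)) for t in (1e-1, 1e-2, 1e-3)]
# The quotients shrink roughly linearly in ||h||.
```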
Localness of derivative
Due to the localness of limits, derivatives, partial derivatives, and total derivatives are local,
meaning that if $f$ and $g$ agree on some neighborhood of $p$ or $\vb p$ (or, for partial derivatives, on some neighborhood along the $i$th coordinate direction),
then a real number or a linear map is a derivative/partial derivative/total derivative of $f$ at $p$ or $\vb p$
if and only if it is one of $g$ at that point.
Proposition.
If a derivative/partial derivative/total derivative exists, it is unique.
(show proof)
Proof.
Uniqueness of derivative and partial derivative follows directly from uniqueness of limit.
For the total derivative,
suppose $A$ and $B$ are both total derivatives of $f$ at $\vb p$.
Since $\vb p$ is an interior point of $U$, there exists $r\gt0$ such that $B_r(\vb p)\subseteq U$.
Let $B_r(\vb 0)\setminus\{\vb 0\}$ be the domain for $\vb h$, then $\vb p+\vb h\in B_r(\vb p)\subseteq U$
and the limits for $A$ and $B$ are both still $0$ for this domain for $\vb h$, due to localness of limit. Since
$$\frac{\norm{f(\vb p+\vb h)-f(\vb p)-A(\vb h)}}{\norm{\vb h}}+\frac{\norm{f(\vb p+\vb h)-f(\vb p)-B(\vb h)}}{\norm{\vb h}}
\ge\frac{\norm{B(\vb h)-A(\vb h)}}{\norm{\vb h}}
\ge0$$
and
$$\lim_{\vb h\to\vb 0}\p{\frac{\norm{f(\vb p+\vb h)-f(\vb p)-A(\vb h)}}{\norm{\vb h}}+\frac{\norm{f(\vb p+\vb h)-f(\vb p)-B(\vb h)}}{\norm{\vb h}}}=0$$
by squeeze theorem,
$$\lim_{\vb h\to\vb 0}\frac{\norm{(B-A)(\vb h)}}{\norm{\vb h}}=\lim_{\vb h\to\vb 0}\frac{\norm{B(\vb h)-A(\vb h)}}{\norm{\vb h}}=0$$
where $B-A$ is a linear map.
Suppose there exists $\vb x\in R^n\setminus\{\vb 0\}$ such that $(B-A)(\vb x)\neq\vb 0$.
Let $\varepsilon=\frac{\norm{(B-A)(\vb x)}}{\Vert\vb x\Vert}$, then $\varepsilon\gt0$,
and for all $\delta\gt0$,
we have $\frac{\delta\vb x}{2\Vert\vb x\Vert}\in B_\delta(\vb 0)\setminus\{\vb 0\}$,
such that
$$\abs{\frac{\norm{(B-A)(\frac{\delta\vb x}{2\Vert\vb x\Vert})}}{\norm{\frac{\delta\vb x}{2\Vert\vb x\Vert}}}-0}
=\frac{\frac{\delta}{2\Vert\vb x\Vert}\norm{(B-A)(\vb x)}}{\frac{\delta}{2}}
=\frac{\norm{(B-A)(\vb x)}}{\Vert\vb x\Vert}
\ge\varepsilon$$
We have shown that $0$ is not the limit of $\frac{\norm{(B-A)(\vb h)}}{\norm{\vb h}}$ as $\vb h\to\vb 0$,
which is a contradiction.
Therefore, for all $\vb x\in R^n$, $(B-A)(\vb x)=\vb 0$, implying $A=B$.
$\blacksquare$
Proposition.
If $U\subseteq R$ and $f$ is real-valued, differentiability and total differentiability are equivalent.
In particular, given $f'(p)$, $Df(p)(h)=f'(p)h$; given $Df(p)$, $f'(p)=Df(p)(1)$.
(show proof)
Proof.
We will regard $1$-vectors as scalars.
Suppose $f$ is differentiable at $p$, then $g(h)=f'(p)h$ is a linear map.
We have $\lim_{h\to 0}\frac{f(p+h)-f(p)-g(h)}{h}=\lim_{h\to 0}\frac{f(p+h)-f(p)}{h}-\lim_{h\to 0}\frac{g(h)}{h}=f'(p)-f'(p)=0$.
Hence $\lim_{h\to 0}\frac{\abs{f(p+h)-f(p)-g(h)}}{\abs{h}}=\lim_{h\to 0}\abs{\frac{f(p+h)-f(p)-g(h)}{h}}=0$,
implying $f$ is totally differentiable at $p$.
Also, $Df(p)(h)=g(h)=f'(p)h$.
Suppose $f$ is totally differentiable at $p$, then for some linear map $g$,
$\lim_{h\to 0}\abs{\frac{f(p+h)-f(p)-g(h)}{h}}=\lim_{h\to 0}\frac{\abs{f(p+h)-f(p)-g(h)}}{\abs{h}}=0$.
Hence $\lim_{h\to 0}\frac{f(p+h)-f(p)-g(h)}{h}=0$.
Since $g(h)=hg(1)$, $\lim_{h\to 0}\frac{g(h)}{h}=g(1)$,
so $\lim_{h\to 0}\frac{f(p+h)-f(p)}{h}=\lim_{h\to 0}\frac{f(p+h)-f(p)-g(h)}{h}+\lim_{h\to 0}\frac{g(h)}{h}=g(1)$,
implying $f$ is differentiable at $p$.
Also, $f'(p)=g(1)=Df(p)(1)$.
$\blacksquare$
Note.
We will use the term differentiability in place of total differentiability since there is no ambiguity.
Note.
With norm and distance for matrices defined, totally continuous differentiability can be naturally defined.
Trivially, if $U\subseteq R$ and $f$ is real-valued, continuous differentiability and totally continuous differentiability are equivalent,
hence we will use the term continuous differentiability in place of totally continuous differentiability since there is no ambiguity.
Proposition.
Linearity implies differentiability.
(show proof)
Proof.
Let $f$ be linear, then for all $\vb p\in R^n$,
$$\lim_{\vb h\to \vb 0}\frac{\Vert f(\vb p+\vb h)-f(\vb p)-f(\vb h)\Vert}{\Vert\vb h\Vert}
=\lim_{\vb h\to \vb 0}\frac{\Vert \vb 0\Vert}{\Vert\vb h\Vert}=0$$
so $f$ is differentiable.
$\blacksquare$
Lemma.
A linear map is continuous at $\vb 0$.
(show proof)
Proof.
Suppose $f$ is linear and let $A$ be its matrix representation, which is $m\times n$.
Let $M$ be the maximum among absolute values of entries of $A$.
If $M=0$, then $A$ is a zero matrix, so $f$ is zero everywhere, implying it is continuous at $\vb 0$.
Now suppose $M\neq0$.
For every $\varepsilon\gt0$,
let $\delta=\varepsilon/(nM\sqrt m)$, then for every $\vb x\in R^n$ such that $\Vert\vb x\Vert\lt\delta$, by triangle inequality, we have
$$\norm{A\vb x}
=\sqrt{\sum_{j=1}^m\p{\sum_{i=1}^n(a_{ji}x_i)}^2}
=\sqrt{\sum_{j=1}^m\abs{\sum_{i=1}^n(a_{ji}x_i)}^2}
\le\sqrt{\sum_{j=1}^m\p{\sum_{i=1}^n\abs{a_{ji}x_i}}^2}
\le\sqrt{\sum_{j=1}^m\p{\sum_{i=1}^nM\Vert\vb x\Vert}^2}$$
$$=\sqrt{\sum_{j=1}^m\p{nM\Vert\vb x\Vert}^2}
=\sqrt{m\p{nM\Vert\vb x\Vert}^2}
=\sqrt{m}nM\Vert\vb x\Vert
\lt \sqrt{m}nM\delta
=\varepsilon$$
so $\lim_{\vb x\to \vb 0}f(\vb x)=\vb 0=f(\vb 0)$, implying $f$ is continuous at $\vb 0$.
$\blacksquare$
Proposition.
Differentiability implies continuity.
(show proof)
Proof.
Let $f$ be differentiable at $\vb p$, and let $r\gt0$ be such that $B_r(\vb p)\subseteq U$.
Since $\lim_{\vb h\to \vb 0}\frac{\Vert f(\vb p+\vb h)-f(\vb p)-Df(\vb p)(\vb h)\Vert}{\Vert\vb h\Vert}=0$
and $\lim_{\vb h\to \vb 0}\Vert\vb h\Vert=0$,
we have $\lim_{\vb h\to \vb 0}\Vert f(\vb p+\vb h)-f(\vb p)-Df(\vb p)(\vb h)\Vert=0$.
Because $Df(\vb p)$ is a linear map, it is continuous at $\vb 0$,
so $\lim_{\vb h\to \vb 0}Df(\vb p)(\vb h)=Df(\vb p)(\vb 0)=\vb 0$,
and thus $\lim_{\vb h\to \vb 0}\Vert Df(\vb p)(\vb h)\Vert=0$.
For $\vb h\in B_r(\vb 0)\setminus\{\vb 0\}$, let $g(\vb h)=\Vert f(\vb p+\vb h)-f(\vb p)-Df(\vb p)(\vb h)\Vert+\Vert Df(\vb p)(\vb h)\Vert$,
then $\lim_{\vb h\to \vb 0}g(\vb h)=0$ and $\Vert f(\vb p+\vb h)-f(\vb p)\Vert \le g(\vb h)$ by triangle inequality.
For every $\varepsilon\gt 0$, there exists $r\gt\delta\gt 0$ such that $\vb h\in B_\delta(\vb 0)\setminus\{\vb 0\}$ implies $g(\vb h)\in B_\varepsilon(0)$,
then with the same $\delta$, we also have $\vb x\in B_\delta(\vb p)\setminus\{\vb p\}$ implies $\vb x-\vb p\in B_\delta(\vb 0)\setminus\{\vb 0\}$ and
$\Vert f(\vb x)-f(\vb p)\Vert
=\Vert f(\vb p+(\vb x-\vb p))-f(\vb p)\Vert
\le g(\vb x-\vb p)
\in B_\varepsilon(0)$,
or $f(\vb x)\in B_\varepsilon(f(\vb p))$.
So we have $\lim_{\vb x\to \vb p}f(\vb x)=f(\vb p)$, which means $f$ is continuous at $\vb p$.
$\blacksquare$
Lemma.
$\Vert\vb v\Vert\le\sum_i\abs{v_i}$.
(show proof)
Proof.
$$(\Vert\vb v\Vert)^2=\sum_i\abs{v_i}^2\le\sum_i\sum_j\abs{v_i}\abs{v_j}=(\sum_i\abs{v_i})^2$$
therefore $\Vert\vb v\Vert\le\sum_i\abs{v_i}$.
$\blacksquare$
Derivative of component functions
$f$ is differentiable at $\vb p$ if and only if all component functions $f_j$ are differentiable at $\vb p$.
In addition, if $f$ or all $f_j$ are differentiable at $\vb p$, then
$$Df(\vb p)_{j,*}=Df_j(\vb p)$$
(show proof)
Proof.
Suppose $Df(\vb p)$ exists.
Then we have
$$\lim_{\vb h\to \vb 0}\frac{\Vert f(\vb p+\vb h)-f(\vb p)-Df(\vb p)(\vb h)\Vert}{\Vert\vb h\Vert}=0$$
For all $j\in\{1,\ldots,m\}$, by squeeze theorem, we have
$$\lim_{\vb h\to \vb 0}\frac{|f_j(\vb p+\vb h)-f_j(\vb p)-\p{Df(\vb p)(\vb h)}_j|}{\Vert\vb h\Vert}
=\lim_{\vb h\to \vb 0}\frac{|\p{f(\vb p+\vb h)-f(\vb p)-Df(\vb p)(\vb h)}_j|}{\Vert\vb h\Vert}
=0$$
Note that $\vb h\mapsto\p{Df(\vb p)(\vb h)}_j$ is a linear map from $R^n$ to $R$.
This shows that $Df_j(\vb p)$ exists, and $Df_j(\vb p)(\vb h)=\p{Df(\vb p)(\vb h)}_j$.
Now suppose all $Df_j(\vb p)$ exist.
Define an $m\times n$ matrix $L$ by $L_{j,*}=Df_j(\vb p)$,
where $L$ can be regarded as a matrix or a linear map, interchangeably. Then we have
$$\lim_{\vb h\to \vb 0} \sum_j\frac{\abs{(f(\vb p+\vb h)-f(\vb p)-L(\vb h))_j}}{\norm{\vb h}}
=\sum_j\lim_{\vb h\to \vb 0} \frac{\abs{(f(\vb p+\vb h)-f(\vb p)-L(\vb h))_j}}{\norm{\vb h}}
=\sum_j\lim_{\vb h\to \vb 0} \frac{\abs{f_j(\vb p+\vb h)-f_j(\vb p)-Df_j(\vb p)(\vb h)}}{\norm{\vb h}}
=0$$
Let $B_r(\vb p)\subseteq U$.
Then for $\vb h\in B_r(\vb 0)\setminus\{\vb 0\}$, we have
$$0
\le\frac{\norm{f(\vb p+\vb h)-f(\vb p)-L(\vb h)}}{\norm{\vb h}}
\le\frac{\sum_j\abs{(f(\vb p+\vb h)-f(\vb p)-L(\vb h))_j}}{\norm{\vb h}}$$
Then by squeeze theorem,
$$\lim_{\vb h\to \vb 0} \frac{\norm{f(\vb p+\vb h)-f(\vb p)-L(\vb h)}}{\norm{\vb h}}=0$$
Hence $Df(\vb p)=L$.
$\blacksquare$
Proposition.
Existence of total derivative implies existence of all partial derivatives.
In particular, $\pdv{f_j}{x_i}(\vb p)=Df(\vb p)_{ji}$.
(show proof)
Proof.
Suppose $Df(\vb p)$ exists. Then all $Df_j(\vb p)$ exist,
so we have $$\lim_{\vb h\to \vb 0}\frac{|f_j(\vb p+\vb h)-f_j(\vb p)-Df_j(\vb p)(\vb h)|}{\Vert\vb h\Vert}=0$$
Then for all $i\in\{1,\ldots,n\}$, we have
$$\lim_{h\to 0}\abs{\frac{f_j(\vb p+h\vb e_i)-f_j(\vb p)-hDf_j(\vb p)(\vb e_i)}{h}}
=\lim_{h\to 0}\frac{|f_j(\vb p+h\vb e_i)-f_j(\vb p)-Df_j(\vb p)(h\vb e_i)|}{\Vert h\vb e_i\Vert}
=0$$
which implies
$$\lim_{h\to 0}\frac{f_j(\vb p+h\vb e_i)-f_j(\vb p)-hDf_j(\vb p)(\vb e_i)}{h}=0$$
Since $$\lim_{h\to 0}\frac{hDf_j(\vb p)(\vb e_i)}{h}=Df_j(\vb p)(\vb e_i)$$
we conclude that
$$\pdv{f_j}{x_i}(\vb p)
=\lim_{h\to 0}\frac{f_j(\vb p+h\vb e_i)-f_j(\vb p)}{h}
=\lim_{h\to 0}\frac{f_j(\vb p+h\vb e_i)-f_j(\vb p)-hDf_j(\vb p)(\vb e_i)}{h}+\lim_{h\to 0}\frac{hDf_j(\vb p)(\vb e_i)}{h}
=Df_j(\vb p)(\vb e_i)
=(Df(\vb p)(\vb e_i))_j
=Df(\vb p)_{ji}$$
$\blacksquare$
Jacobian matrix
Given differentiable $f:U\to R^m$ where $U\subseteq R^n$,
for all $i\in\{1,\ldots,n\}$ and $j\in\{1,\ldots,m\}$, there exists a unique function $\pdv{f_j}{x_i}:U\to R$
that maps every $\vb p\in U$ to $\pdv{f_j}{x_i}(\vb p)$.
Then we can uniquely define a map $Jf$ for $f$ such that $Jf_{ji}=\pdv{f_j}{x_i}$.
We call $Jf$ the Jacobian matrix of $f$.
And we can consider $J$ an operator that maps every differentiable $f:U\to R^m$ to $Jf$.
Operations on Jacobian matrices, where well-defined, follow the usual rules of matrix operations, applied pointwise.
Note that, for all $\vb p\in U$, $$Jf_{ji}(\vb p)=\pdv{f_j}{x_i}(\vb p)=Df(\vb p)_{ji}$$
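Since $Jf_{ji}=\pdv{f_j}{x_i}$, a Jacobian matrix can be approximated entrywise by partial-derivative difference quotients. A minimal sketch, using the hypothetical map $f(x,y)=(xy,\ x+y)$, whose Jacobian at $(2,3)$ has rows $(3,2)$ and $(1,1)$:

```python
def f(v):
    x, y = v
    return [x * y, x + y]

def jacobian(f, p, h=1e-6):
    """Approximate the m-by-n Jacobian of f at p by one-sided difference quotients."""
    fp = f(p)
    m, n = len(fp), len(p)
    J = [[0.0] * n for _ in range(m)]
    for i in range(n):
        q = list(p)
        q[i] += h                   # step along the i-th coordinate direction
        fq = f(q)
        for j in range(m):
            J[j][i] = (fq[j] - fp[j]) / h
    return J

J = jacobian(f, [2.0, 3.0])
# J is close to [[3, 2], [1, 1]].
```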
Derivative of function operations
Suppose $f:U\to R^m$ and $g:V\to R^m$, where $U,V\subseteq R^n$, are differentiable at $\vb p$, and $c\in R$, then
$$D(cf)(\vb p)=cDf(\vb p)$$
$$D(f\pm g)(\vb p)=Df(\vb p)\pm Dg(\vb p)$$
and for all $\vb h\in R^n$,
$$D(f\cdot g)(\vb p)(\vb h)=Df(\vb p)(\vb h)\cdot g(\vb p)+Dg(\vb p)(\vb h)\cdot f(\vb p)$$
where the last equation is called
product rule
(show proof).
Proof.
For the first equation,
$$\lim_{\vb h\to \vb 0}\frac{\norm{(cf)(\vb p+\vb h)-(cf)(\vb p)-cDf(\vb p)(\vb h)}}{\norm{\vb h}}
=\lim_{\vb h\to \vb 0}\frac{\abs{c}\norm{f(\vb p+\vb h)-f(\vb p)-Df(\vb p)(\vb h)}}{\norm{\vb h}}
=\abs{c}\lim_{\vb h\to \vb 0}\frac{\norm{f(\vb p+\vb h)-f(\vb p)-Df(\vb p)(\vb h)}}{\norm{\vb h}}
=0$$
Hence $D(cf)(\vb p)=cDf(\vb p)$.
Since $\vb p$ is an interior point of both $U$ and $V$, it is an interior point of $U\cap V$.
For the second equation, let $B_r(\vb p)\subseteq U\cap V$, and $\vb h\in B_r(\vb 0)\setminus\{\vb 0\}$, then
$$\frac{\norm{f(\vb p+\vb h)-f(\vb p)-Df(\vb p)(\vb h)}}{\norm{\vb h}}+\frac{\norm{g(\vb p+\vb h)-g(\vb p)-Dg(\vb p)(\vb h)}}{\norm{\vb h}}
\ge\frac{\norm{(f\pm g)(\vb p+\vb h)-(f\pm g)(\vb p)-(Df(\vb p)\pm Dg(\vb p))(\vb h)}}{\norm{\vb h}}
\ge0$$
Since
$$\lim_{\vb h\to \vb 0}\p{\frac{\norm{f(\vb p+\vb h)-f(\vb p)-Df(\vb p)(\vb h)}}{\norm{\vb h}}+\frac{\norm{g(\vb p+\vb h)-g(\vb p)-Dg(\vb p)(\vb h)}}{\norm{\vb h}}}=0$$
by squeeze theorem,
$$\lim_{\vb h\to \vb 0}\frac{\norm{(f\pm g)(\vb p+\vb h)-(f\pm g)(\vb p)-(Df(\vb p)\pm Dg(\vb p))(\vb h)}}{\norm{\vb h}}=0$$
Hence $D(f\pm g)(\vb p)=Df(\vb p)\pm Dg(\vb p)$.
For the third equation, let $j\in\{1,\ldots,m\}$, then both $f_j$ and $g_j$ are differentiable at $\vb p$.
For $\vb h\in B_r(\vb 0)\setminus\{\vb 0\}$, let
$$\varepsilon_j(\vb h)=\cfrac{f_j(\vb p+\vb h)-f_j(\vb p)-Df_j(\vb p)(\vb h)}{\norm{\vb h}}
\quad\text{and}\quad\eta_j(\vb h)=\cfrac{g_j(\vb p+\vb h)-g_j(\vb p)-Dg_j(\vb p)(\vb h)}{\norm{\vb h}}$$
then $$f_j(\vb p+\vb h)-f_j(\vb p)=Df_j(\vb p)(\vb h)+\norm{\vb h}\varepsilon_j(\vb h)
\quad\text{and}\quad g_j(\vb p+\vb h)-g_j(\vb p)=Dg_j(\vb p)(\vb h)+\norm{\vb h}\eta_j(\vb h)$$
Hence
$$\frac{f_j(\vb p+\vb h)g_j(\vb p+\vb h)-f_j(\vb p)g_j(\vb p)}{\norm{\vb h}}
=\frac{f_j(\vb p+\vb h)g_j(\vb p+\vb h)-f_j(\vb p)g_j(\vb p+\vb h)+f_j(\vb p)g_j(\vb p+\vb h)-f_j(\vb p)g_j(\vb p)}{\norm{\vb h}}$$
$$=\frac{\p{f_j(\vb p+\vb h)-f_j(\vb p)}g_j(\vb p+\vb h)+\p{g_j(\vb p+\vb h)-g_j(\vb p)}f_j(\vb p)}{\norm{\vb h}}
=\frac{\p{Df_j(\vb p)(\vb h)+\norm{\vb h}\varepsilon_j(\vb h)}g_j(\vb p+\vb h)+\p{Dg_j(\vb p)(\vb h)+\norm{\vb h}\eta_j(\vb h)}f_j(\vb p)}{\norm{\vb h}}$$
$$=\frac{Df_j(\vb p)(\vb h)\p{g_j(\vb p)+Dg_j(\vb p)(\vb h)+\norm{\vb h}\eta_j(\vb h)}+Dg_j(\vb p)(\vb h)f_j(\vb p)}{\norm{\vb h}}+\varepsilon_j(\vb h)g_j(\vb p+\vb h)+\eta_j(\vb h)f_j(\vb p)$$
$$=\frac{Df_j(\vb p)(\vb h)g_j(\vb p)+Dg_j(\vb p)(\vb h)f_j(\vb p)}{\norm{\vb h}}+\frac{Df_j(\vb p)(\vb h)Dg_j(\vb p)(\vb h)}{\norm{\vb h}}+Df_j(\vb p)(\vb h)\eta_j(\vb h)+\varepsilon_j(\vb h)g_j(\vb p+\vb h)+\eta_j(\vb h)f_j(\vb p)$$
By Cauchy-Schwarz inequality,
$$\abs{\frac{Df_j(\vb p)(\vb h)Dg_j(\vb p)(\vb h)}{\norm{\vb h}}}
=\frac{\abs{(Df_j(\vb p))^T\cdot\vb h}\abs{Dg_j(\vb p)(\vb h)}}{\norm{\vb h}}
\le\frac{\norm{(Df_j(\vb p))^T}\norm{\vb h}\abs{Dg_j(\vb p)(\vb h)}}{\norm{\vb h}}
=\norm{(Df_j(\vb p))^T}\abs{Dg_j(\vb p)(\vb h)}$$
Since $Df_j(\vb p),Dg_j(\vb p)$ are both linear, they are continuous at $\vb0$, so
$$\lim_{\vb h\to \vb 0}Df_j(\vb p)(\vb h)=0 \quad\text{and}\quad \lim_{\vb h\to \vb 0}Dg_j(\vb p)(\vb h)=0$$
implying
$$\lim_{\vb h\to \vb 0}\abs{Dg_j(\vb p)(\vb h)}=0$$
So
$$\lim_{\vb h\to \vb 0}\norm{(Df_j(\vb p))^T}\abs{Dg_j(\vb p)(\vb h)}
=\norm{(Df_j(\vb p))^T}\lim_{\vb h\to \vb 0}\abs{Dg_j(\vb p)(\vb h)}
=0$$
By squeeze theorem,
$$\lim_{\vb h\to \vb 0}\abs{\frac{Df_j(\vb p)(\vb h)Dg_j(\vb p)(\vb h)}{\norm{\vb h}}}=0$$
implying
$$\lim_{\vb h\to \vb 0}\frac{Df_j(\vb p)(\vb h)Dg_j(\vb p)(\vb h)}{\norm{\vb h}}=0$$
Note that
$$\lim_{\vb h\to \vb 0}\abs{\varepsilon_j(\vb h)}=0 \quad\text{and}\quad \lim_{\vb h\to \vb 0}\abs{\eta_j(\vb h)}=0$$
implying
$$\lim_{\vb h\to \vb 0}\varepsilon_j(\vb h)=0 \quad\text{and}\quad \lim_{\vb h\to \vb 0}\eta_j(\vb h)=0$$
Since $g_j$ is differentiable at $\vb p$, it is continuous at $\vb p$, thus
$$\lim_{\vb h\to \vb 0}g_j(\vb p+\vb h)=g_j(\vb p)$$
Now we have
$$\lim_{\vb h\to \vb 0}\p{\frac{Df_j(\vb p)(\vb h)Dg_j(\vb p)(\vb h)}{\norm{\vb h}}+Df_j(\vb p)(\vb h)\eta_j(\vb h)+\varepsilon_j(\vb h)g_j(\vb p+\vb h)+\eta_j(\vb h)f_j(\vb p)}$$
$$=\lim_{\vb h\to \vb 0}\frac{Df_j(\vb p)(\vb h)Dg_j(\vb p)(\vb h)}{\norm{\vb h}}+\lim_{\vb h\to \vb 0}Df_j(\vb p)(\vb h)\eta_j(\vb h)+\lim_{\vb h\to \vb 0}\varepsilon_j(\vb h)g_j(\vb p+\vb h)+\lim_{\vb h\to \vb 0}\eta_j(\vb h)f_j(\vb p)$$
$$=\lim_{\vb h\to \vb 0}Df_j(\vb p)(\vb h)\lim_{\vb h\to \vb 0}\eta_j(\vb h)+\lim_{\vb h\to \vb 0}\varepsilon_j(\vb h)\lim_{\vb h\to \vb 0}g_j(\vb p+\vb h)+f_j(\vb p)\lim_{\vb h\to \vb 0}\eta_j(\vb h)
=0$$
Then we have
$$\lim_{\vb h\to \vb 0}\frac{\abs{f_j(\vb p+\vb h)g_j(\vb p+\vb h)-f_j(\vb p)g_j(\vb p)-(Df_j(\vb p)(\vb h)g_j(\vb p)+Dg_j(\vb p)(\vb h)f_j(\vb p))}}{\norm{\vb h}}$$
$$=\lim_{\vb h\to \vb 0}\abs{\frac{Df_j(\vb p)(\vb h)Dg_j(\vb p)(\vb h)}{\norm{\vb h}}+Df_j(\vb p)(\vb h)\eta_j(\vb h)+\varepsilon_j(\vb h)g_j(\vb p+\vb h)+\eta_j(\vb h)f_j(\vb p)}
=0$$
Since
$$\frac{\sum_{j=1}^m\abs{f_j(\vb p+\vb h)g_j(\vb p+\vb h)-f_j(\vb p)g_j(\vb p)-(Df_j(\vb p)(\vb h)g_j(\vb p)+Dg_j(\vb p)(\vb h)f_j(\vb p))}}{\norm{\vb h}}$$
$$\ge\frac{\abs{\sum_{j=1}^m\p{f_j(\vb p+\vb h)g_j(\vb p+\vb h)-f_j(\vb p)g_j(\vb p)-(Df_j(\vb p)(\vb h)g_j(\vb p)+Dg_j(\vb p)(\vb h)f_j(\vb p))}}}{\norm{\vb h}}$$
$$=\frac{\abs{(f\cdot g)(\vb p+\vb h)-(f\cdot g)(\vb p)-(Df(\vb p)(\vb h)\cdot g(\vb p)+Dg(\vb p)(\vb h)\cdot f(\vb p))}}{\norm{\vb h}}$$
and we know that
$$\lim_{\vb h\to \vb 0}\frac{\sum_{j=1}^m\abs{f_j(\vb p+\vb h)g_j(\vb p+\vb h)-f_j(\vb p)g_j(\vb p)-(Df_j(\vb p)(\vb h)g_j(\vb p)+Dg_j(\vb p)(\vb h)f_j(\vb p))}}{\norm{\vb h}}$$
$$=\sum_{j=1}^m\lim_{\vb h\to \vb 0}\frac{\abs{f_j(\vb p+\vb h)g_j(\vb p+\vb h)-f_j(\vb p)g_j(\vb p)-(Df_j(\vb p)(\vb h)g_j(\vb p)+Dg_j(\vb p)(\vb h)f_j(\vb p))}}{\norm{\vb h}}
=0$$
by squeeze theorem,
$$\lim_{\vb h\to \vb 0}\frac{\abs{(f\cdot g)(\vb p+\vb h)-(f\cdot g)(\vb p)-(Df(\vb p)(\vb h)\cdot g(\vb p)+Dg(\vb p)(\vb h)\cdot f(\vb p))}}{\norm{\vb h}}=0$$
Clearly, $Df(\vb p)(\vb h)\cdot g(\vb p)+Dg(\vb p)(\vb h)\cdot f(\vb p)$, as a function of $\vb h$, is linear.
Hence $D(f\cdot g)(\vb p)$ exists and is defined by
$$D(f\cdot g)(\vb p)(\vb h)=Df(\vb p)(\vb h)\cdot g(\vb p)+Dg(\vb p)(\vb h)\cdot f(\vb p)$$
$\blacksquare$
If $f$ and $g$ are real-valued with real domains, then clearly,
$$(cf)'(p)=cf'(p)$$
$$(f\pm g)'(p)=f'(p)\pm g'(p)$$
$$(fg)'(p)=f'(p)g(p)+g'(p)f(p)$$
Suppose $f$ and $g$ share the same domain and are both differentiable, then
$$J(cf)=cJf$$
$$J(f\pm g)=Jf\pm Jg$$
$$J(f\cdot g)=((Jf)^Tg+(Jg)^Tf)^T$$
(show proof)
Proof.
$$J(cf)_{ji}(\vb p)=D(cf)(\vb p)_{ji}=(cDf(\vb p))_{ji}=cDf(\vb p)_{ji}=cJf_{ji}(\vb p)$$
$$J(f\pm g)_{ji}(\vb p)=D(f\pm g)(\vb p)_{ji}=(Df(\vb p)\pm Dg(\vb p))_{ji}=Df(\vb p)_{ji}\pm Dg(\vb p)_{ji}=Jf_{ji}(\vb p)\pm Jg_{ji}(\vb p)=(Jf_{ji}\pm Jg_{ji})(\vb p)=(Jf\pm Jg)_{ji}(\vb p)$$
$$
J(f\cdot g)^T_i(\vb p)
=J(f\cdot g)_i(\vb p)
=D(f\cdot g)(\vb p)_i
=D(f\cdot g)(\vb p)(\vb e_i)
=Df(\vb p)(\vb e_i)\cdot g(\vb p)+Dg(\vb p)(\vb e_i)\cdot f(\vb p)
=\sum_j(Df(\vb p)(\vb e_i))_jg(\vb p)_j+\sum_j(Dg(\vb p)(\vb e_i))_jf(\vb p)_j
$$ $$
=\sum_jDf(\vb p)_{ji}g(\vb p)_j+\sum_jDg(\vb p)_{ji}f(\vb p)_j
=\sum_j(Jf)^T_{ij}(\vb p)g_j(\vb p)+\sum_j(Jg)^T_{ij}(\vb p)f_j(\vb p)
=((Jf)^Tg+(Jg)^Tf)_i(\vb p)
$$
$\blacksquare$
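The product rule can be sanity-checked numerically. Below is a sketch with the hypothetical maps $f(x,y)=(x,y)$ and $g(x,y)=(y,x)$, so $(f\cdot g)(x,y)=2xy$; the increment of $f\cdot g$ along a small $\vb h$ agrees with $Df(\vb p)(\vb h)\cdot g(\vb p)+Dg(\vb p)(\vb h)\cdot f(\vb p)$ up to second order in $\norm{\vb h}$.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def fg(x, y):
    # (f . g)(x, y) for f(x, y) = (x, y) and g(x, y) = (y, x), i.e. 2*x*y
    return dot((x, y), (y, x))

p = (2.0, 3.0)
h = (1e-6, 2e-6)
increment = fg(p[0] + h[0], p[1] + h[1]) - fg(*p)
# Product rule prediction: Df(p)(h) = h and Dg(p)(h) = (h2, h1) for these linear maps.
predicted = dot(h, (p[1], p[0])) + dot((h[1], h[0]), p)
# increment and predicted agree up to O(||h||^2).
```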
Derivative of composite functions
Also called
chain rule.
Let $V\subseteq R^n$ and $U\subseteq R^m$, and let $g:V\to R^m$ with $g(V)\subseteq U$ and $f:U\to R^l$. Suppose $g$ is differentiable at $\vb p$
and $f$ is differentiable at $g(\vb p)$, then
$$D(f\circ g)(\vb p)=Df(g(\vb p))\circ Dg(\vb p)$$
(show proof).
Proof.
Since $g$ is differentiable at $\vb p$ and $f$ is differentiable at $g(\vb p)$, there exists $s\gt0$
such that $B_s(\vb p)\subseteq V$ and $B_s(g(\vb p))\subseteq U$.
Since $g$ is differentiable at $\vb p$, it is also continuous at $\vb p$.
So there exists $0\lt r\lt s$ such that $\vb x\in B_r(\vb p)$ implies $g(\vb x)\in B_s(g(\vb p))$.
For all $\vb h\in B_r(\vb 0)$ and $\vb k\in B_s(\vb 0)$,
define $$u(\vb h)=g(\vb p+\vb h)-g(\vb p)-Dg(\vb p)(\vb h) \quad\text{and}\quad v(\vb k)=f(g(\vb p)+\vb k)-f(g(\vb p))-Df(g(\vb p))(\vb k)$$
Also define $\varepsilon(\vb h)$ and $\eta(\vb k)$ such that when $\vb h\neq\vb 0$, $\varepsilon(\vb h)=\frac{\norm{u(\vb h)}}{\norm{\vb h}}$,
when $\vb k\neq\vb 0$, $\eta(\vb k)=\frac{\norm{v(\vb k)}}{\norm{\vb k}}$,
and $\varepsilon(\vb 0)=\eta(\vb 0)=0$.
Then $$\lim_{\vb h\to\vb 0}\varepsilon(\vb h)=0 \quad\text{and}\quad \lim_{\vb k\to\vb 0}\eta(\vb k)=0$$
and $$\norm{u(\vb h)}=\varepsilon(\vb h)\norm{\vb h} \quad\text{and}\quad \norm{v(\vb k)}=\eta(\vb k)\norm{\vb k}$$
where $\vb h$ or $\vb k$ may or may not be $\vb 0$.
Given $\vb h\in B_r(\vb 0)$, let $\vb k=g(\vb p+\vb h)-g(\vb p)$, then $\vb k\in B_s(\vb 0)$,
and since $g$ is continuous at $\vb p$, $\lim_{\vb h\to\vb 0}\vb k=\vb 0$.
Now we have
$$\norm{\vb k}=\norm{u(\vb h)+Dg(\vb p)(\vb h)}\le\norm{u(\vb h)}+\norm{Dg(\vb p)(\vb h)}\le(\varepsilon(\vb h)+\norm{Dg(\vb p)})\norm{\vb h}$$
and
$$(f\circ g)(\vb p+\vb h)-(f\circ g)(\vb p)-(Df(g(\vb p))\circ Dg(\vb p))(\vb h)
=f(g(\vb p)+\vb k)-f(g(\vb p))-(Df(g(\vb p))\circ Dg(\vb p))(\vb h)$$
$$=v(\vb k)+Df(g(\vb p))(\vb k)-Df(g(\vb p))(Dg(\vb p)(\vb h))
=v(\vb k)+Df(g(\vb p))(g(\vb p+\vb h)-g(\vb p)-Dg(\vb p)(\vb h))
=v(\vb k)+Df(g(\vb p))(u(\vb h))$$
and hence
$$\norm{(f\circ g)(\vb p+\vb h)-(f\circ g)(\vb p)-(Df(g(\vb p))\circ Dg(\vb p))(\vb h)}
=\norm{v(\vb k)+Df(g(\vb p))(u(\vb h))}$$
$$\le\norm{v(\vb k)}+\norm{Df(g(\vb p))(u(\vb h))}
\le\eta(\vb k)\norm{\vb k}+\norm{Df(g(\vb p))}\norm{u(\vb h)}
\le\eta(\vb k)(\varepsilon(\vb h)+\norm{Dg(\vb p)})\norm{\vb h}+\norm{Df(g(\vb p))}\varepsilon(\vb h)\norm{\vb h}$$
Thus, when $\vb h\neq\vb 0$,
$$\frac{\norm{(f\circ g)(\vb p+\vb h)-(f\circ g)(\vb p)-(Df(g(\vb p))\circ Dg(\vb p))(\vb h)}}{\norm{\vb h}}
\le\eta(\vb k)(\varepsilon(\vb h)+\norm{Dg(\vb p)})+\norm{Df(g(\vb p))}\varepsilon(\vb h)$$
Note that $\lim_{\vb h\to\vb 0}\varepsilon(\vb h)=0$ and, since $\lim_{\vb k\to\vb 0}\eta(\vb k)=0=\eta(\vb 0)$,
by limit of composite functions, $\lim_{\vb h\to\vb 0}\eta(\vb k)=\eta(\lim_{\vb h\to\vb 0}\vb k)=\eta(\vb 0)=0$.
We have $$\lim_{\vb h\to\vb 0}\p{\eta(\vb k)(\varepsilon(\vb h)+\norm{Dg(\vb p)})+\norm{Df(g(\vb p))}\varepsilon(\vb h)}=0$$
Therefore, by squeeze theorem, $$\lim_{\vb h\to\vb 0}\frac{\norm{(f\circ g)(\vb p+\vb h)-(f\circ g)(\vb p)-(Df(g(\vb p))\circ Dg(\vb p))(\vb h)}}{\norm{\vb h}}=0$$
We have shown that $D(f\circ g)(\vb p)$ exists and $D(f\circ g)(\vb p)=Df(g(\vb p))\circ Dg(\vb p)$.
$\blacksquare$
If $U,V\subseteq R$ and $f,g$ are real-valued, then clearly,
$$(f\circ g)'(p)=f'(g(p))g'(p)$$
Suppose $f$ and $g$ are differentiable, and the range of $g$ is a subset of the domain of $f$, then
$$J(f\circ g)=(Jf\circ g)Jg$$
where $Jf\circ g$ is a matrix of functions with $(Jf\circ g)_{kj}=Jf_{kj}\circ g$.
(show proof)
Proof.
$$J(f\circ g)_{ki}(\vb p)=D(f\circ g)(\vb p)_{ki}=(Df(g(\vb p))\circ Dg(\vb p))_{ki}=(Df(g(\vb p))Dg(\vb p))_{ki}$$
$$=\sum_jDf(g(\vb p))_{kj}Dg(\vb p)_{ji}=\sum_j(Jf_{kj}\circ g)(\vb p)Jg_{ji}(\vb p)=(\sum_j(Jf\circ g)_{kj}Jg_{ji})(\vb p)=((Jf\circ g)Jg)_{ki}(\vb p)$$
$\blacksquare$
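As a sketch of the identity $J(f\circ g)=(Jf\circ g)Jg$, take the hypothetical maps $g(x)=(x^2,x^3)$ and $f(u,v)=uv$, so that $(f\circ g)(x)=x^5$; the matrix product of the Jacobians reproduces the direct derivative $5x^4$.

```python
def composite_derivative(x):
    """Evaluate the product Jf(g(x)) Jg(x) for g(x) = (x^2, x^3) and f(u, v) = u*v."""
    u, v = x ** 2, x ** 3        # g(x)
    jf = [v, u]                  # Jf(g(x)) = [v, u], a 1x2 matrix
    jg = [2 * x, 3 * x ** 2]     # Jg(x), a 2x1 matrix
    return jf[0] * jg[0] + jf[1] * jg[1]

# composite_derivative(x) equals the derivative of x^5, namely 5*x^4.
```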
Proposition.
$$\dv{}{x}x^{n+1}=(n+1)x^n$$
(show proof)
Proof.
For $n=0$,
$\lim_{h\to 0}\frac{(p+h)-p}{h}=1$, thus $\dv{}{x}x^{0+1}=1=(0+1)x^0$.
Suppose $\dv{}{x}x^{k+1}=(k+1)x^k$,
then
$$
\dv{}{x}x^{(k+1)+1}
=\p{\dv{}{x}x^{k+1}}x+x^{k+1}\dv{}{x}x
=(k+1)x^{k+1}+x^{k+1}
=((k+1)+1)x^{k+1}
$$
By induction, for all $n\in N$, $$\dv{}{x}x^{n+1}=(n+1)x^n$$
$\blacksquare$
Local extremum
Suppose $f$ is real-valued. If there exists $\vb p\in U$, such that there exists $\delta\gt0$ such that for all $\vb q\in U$ with $d(\vb p,\vb q)\lt\delta$,
$f(\vb q)\le f(\vb p)$, then we say $f$ has a local maximum at $\vb p$.
We define local minimum similarly. We say $f$ has a local extremum at $\vb p$ if
$f$ has a local maximum or a local minimum at $\vb p$.
Proposition.
Suppose $f$ is a real-valued function defined on $(a,b)$ where $a\lt b$. If $f$ has a local extremum at some $x\in(a,b)$ such that $f$ is also differentiable at $x$,
then $f'(x)=0$.
(show proof)
Proof.
Suppose $f$ has a local maximum at $x$.
There exists $\delta\gt0$ such that $(x-\delta,x+\delta)\subseteq(a,b)$ and for all $y\in(x-\delta,x+\delta)$, $f(y)\le f(x)$.
Note that for all $h\in(-\delta,0)$, $\frac{f(x+h)-f(x)}{h}\ge0$, and for all $h\in(0,\delta)$,
$\frac{f(x+h)-f(x)}{h}\le0$. Hence $f'(x)$ cannot take a value greater than $0$ or less than $0$.
Since $f'(x)\in R$, we have $f'(x)=0$.
The case where $f$ has a local minimum at $x$ can be proven similarly.
$\blacksquare$
Proposition.
Let $f:U\to R$ be continuous where $U\subseteq R^n$ is compact, then $f$ is bounded.
(show proof)
Proof.
Suppose $f$ is not bounded above, then for every positive natural number $n$, there exists $x_n\in U$ such that $f(x_n)\gt n$.
By axiom of choice, this defines a sequence $(x_n)$ of $U$ such that $f(x_n)\gt n$.
Since $U$ is bounded, $(x_n)$ is bounded, and by Bolzano-Weierstrass theorem, it has a convergent subsequence $(x_{n_k})$.
Suppose $(x_{n_k})$ converges to $c$, since $U$ is closed, $c\in U$.
Since $f$ is continuous at $c$, $\lim_{x\to c}f(x)=f(c)$, so $\lim_{k\to\infty}f(x_{n_k})=f(c)$, a contradiction to $f(x_{n_k})\gt n_k\ge k$.
Therefore, $f$ is bounded above. By a symmetric argument, $f$ is also bounded below.
$\blacksquare$
Extreme value theorem
Let $f:U\to R$ be continuous where $U\subseteq R^n$ is non-empty and compact, then $f$ attains a maximum and a minimum in $U$.
(show proof)
Proof.
By the above proposition, $f$ is bounded.
By completeness of real numbers, the image $f(U)$ has a supremum $M$.
Let $n$ be a positive natural number. Since $M$ is the supremum of $f(U)$, $M-1/n$ is not an upper bound of $f(U)$.
Therefore, for every positive natural number $n$, there exists $x_n\in U$ such that $M\ge f(x_n)\gt M-1/n$.
By axiom of choice, this defines a sequence $(x_n)$ of $U$ such that $f(x_n)$ converges to $M$.
Since $U$ is bounded, $(x_n)$ is bounded.
By Bolzano-Weierstrass theorem, there exists a convergent subsequence $(x_{n_k})$ of $(x_n)$.
Suppose $(x_{n_k})$ converges to $c$, then $c\in U$ as $U$ is closed.
Since $f$ is continuous at $c$, $\lim_{x\to c}f(x)=f(c)$, so $\lim_{k\to\infty}f(x_{n_k})=f(c)$, which implies $f(c)=M$.
Therefore, $f$ attains a maximum in $U$. By a symmetric argument, $f$ also attains a minimum in $U$.
$\blacksquare$
Intermediate value theorem
Let $f:[a,b]\to R$ be continuous where $a\le b$, then for every real number $y$ between $f(a)$ and $f(b)$, there exists $x\in[a,b]$ such that $f(x)=y$.
(show proof)
Proof.
Since $f(a)$ and $f(b)$ are attained at $a$ and $b$, we only need to consider values strictly between them.
If $f(a)=f(b)$, the theorem is trivially true.
Now suppose $f(a)\lt f(b)$. For any $u\in(f(a),f(b))$,
let $S$ be the set of all $x\in [a,b]$ such that $f(x)\le u$. Then $S$ is non-empty since $a$ is an element of $S$.
Because $S$ is also bounded, by completeness of real numbers, its supremum, denoted $c$, exists, and clearly, $c\in[a,b]$.
Since $f$ is continuous at $c$, $\lim_{x\to c}f(x)=f(c)$. This means that for every $\varepsilon\gt 0$, there exists $\delta\gt 0$ such that $x\in[a,b]\cap(c-\delta,c+\delta)$
implies $f(x)\in (f(c)-\varepsilon,f(c)+\varepsilon)$, that is, $f(c)\in(f(x)-\varepsilon,f(x)+\varepsilon)$.
By definition of supremum, there exists $s\in S$ such that $s\in(c-\delta,c]$, and we also have $s\in[a,b]$, so $f(c)\lt f(s)+\varepsilon\le u+\varepsilon$.
Now if $c\lt b$, take $r\in[a,b]\cap(c,c+\delta)$, then $r\notin S$, so $f(c)\gt f(r)-\varepsilon\gt u-\varepsilon$;
otherwise, we have $c=b$, so $f(c)=f(b)\gt u\gt u-\varepsilon$. So we have $f(c)\in(u-\varepsilon,u+\varepsilon)$.
Since $\varepsilon\gt0$ is arbitrarily chosen, we conclude that $f(c)=u$.
For the case $f(a)\gt f(b)$, take $g=-f$, then $g(a)\lt g(b)$, so for any $u\in(f(b),f(a))$, we have $-u\in(g(a),g(b))$, and there exists $c\in[a,b]$ such that
$g(c)=-u$, hence $f(c)=-g(c)=-(-u)=u$.
$\blacksquare$
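The intermediate value theorem is the correctness argument behind the bisection method. A minimal sketch, assuming $f$ is continuous with $f(a)\le y\le f(b)$ (the decreasing case is symmetric):

```python
def bisect(f, a, b, y, tol=1e-10):
    """Locate x in [a, b] with f(x) = y, assuming f is continuous and f(a) <= y <= f(b)."""
    lo, hi = a, b
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) <= y:
            lo = mid            # invariant f(lo) <= y is preserved
        else:
            hi = mid            # invariant f(hi) >= y is preserved
    return (lo + hi) / 2

root = bisect(lambda x: x ** 3, 0.0, 2.0, 5.0)
# root is close to the cube root of 5.
```

The loop maintains $f(\mathrm{lo})\le u\le f(\mathrm{hi})$, mirroring the set $S$ and its supremum $c$ in the proof above.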
Cauchy's mean value theorem
Let $f:[a,b]\to R$ and $g:[a,b]\to R$ be continuous on $[a,b]$ and differentiable in $(a,b)$ where $a\lt b$, then there exists $x\in(a,b)$ such that
$$(f(b)-f(a))g'(x)=(g(b)-g(a))f'(x)$$
(show proof)
Proof.
Let $h(t)=(f(b)-f(a))g(t)-(g(b)-g(a))f(t)$ for $t\in[a,b]$.
Then $h$ is continuous on $[a,b]$ and differentiable in $(a,b)$,
and $h(a)=f(b)g(a)-f(a)g(b)=h(b)$.
Suppose for all $t\in(a,b)$, $h(t)=h(a)$; then $h$ is constant on $[a,b]$, so for $c=(a+b)/2$ we have $c\in(a,b)$ and $h'(c)=0$.
Now suppose there exists $t\in(a,b)$ such that $h(t)\neq h(a)$.
Consider the case where $h(t)\gt h(a)$. By extreme value theorem, there exists $c\in[a,b]$ such that $h$ attains a maximum at $c$.
Since $h(t)\gt h(a)=h(b)$, $c\in(a,b)$. By the proposition on local extrema, the restriction $h|_{(a,b)}$ of $h$ satisfies $h|_{(a,b)}'(c)=0$, hence $h'(c)=0$.
By a symmetric argument, for the case where $h(t)\lt h(a)$, there exists $c\in(a,b)$ such that $h'(c)=0$.
Therefore, in every case, there exists $c\in(a,b)$ such that $h'(c)=0$. And for that $c$, we have $(f(b)-f(a))g'(c)-(g(b)-g(a))f'(c)=h'(c)=0$,
so $(f(b)-f(a))g'(c)=(g(b)-g(a))f'(c)$.
$\blacksquare$
Mean value theorem
Let $f:[a,b]\to R$ be continuous on $[a,b]$ and differentiable in $(a,b)$ where $a\lt b$, then there exists $x\in(a,b)$ such that
$$(f(b)-f(a))=(b-a)f'(x)$$
(show proof)
Proof.
Take $g(x)=x$. Since $g$ is linear, it is differentiable and hence continuous.
So the restriction $g|_{[a,b]}$ of $g$ is continuous on $[a,b]$ and differentiable in $(a,b)$.
Since $\lim_{h\to 0}\frac{g(x+h)-g(x)}{h}=\lim_{h\to 0}\frac{h}{h}=1$ for all $x\in R$,
we have $g|_{[a,b]}'(x)=1$ for all $x\in(a,b)$.
By Cauchy's mean value theorem, there exists $x\in(a,b)$ such that $(f(b)-f(a))g'(x)=(g(b)-g(a))f'(x)$, implying $(f(b)-f(a))=(b-a)f'(x)$ for that $x$.
$\blacksquare$
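A worked instance (a hypothetical example, not part of the proof): for $f(x)=x^2$ on $[0,2]$, the mean slope is $2$ and $f'(x)=2x$, so the theorem's point is $x=1$.

```python
a, b = 0.0, 2.0
f = lambda x: x ** 2
mean_slope = (f(b) - f(a)) / (b - a)   # (4 - 0) / 2 = 2
x = mean_slope / 2                      # solve f'(x) = 2*x = mean_slope
# x = 1.0 lies in (a, b) and f(b) - f(a) = (b - a) * f'(x) holds exactly.
```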
Lemma.
Let $B_r(\vb p)$ be some open ball in $R^n$, then given $\vb a,\vb b\in B_r(\vb p)$, define $s(t)=(1-t)\vb a+t\vb b$,
then $s(t)\in B_r(\vb p)$ for all $t\in[0,1]$.
(show proof)
Proof.
Let $\vb a,\vb b\in B_r(\vb p)$. Define $s(t)=(1-t)\vb a+t\vb b$.
When $t\in[0,1]$, we have
$$d(\vb p,s(t))=\norm{s(t)-\vb p}=\norm{(1-t)(\vb a-\vb p)+t(\vb b-\vb p)}\le\norm{(1-t)(\vb a-\vb p)}+\norm{t(\vb b-\vb p)}$$
$$=\abs{1-t}\norm{\vb a-\vb p}+\abs{t}\norm{\vb b-\vb p}\le\max(\norm{\vb a-\vb p},\norm{\vb b-\vb p})\lt r$$
so $s(t)\in B_r(\vb p)$.
$\blacksquare$
Lemma.
Given $\vb v\in R^n$, $$\Vert\vb v\Vert\le\sum_i\abs{v_i}\le\sqrt n\Vert\vb v\Vert$$
(show proof)
Proof.
Since $$\sum_i\abs{v_i}^2\le(\sum_i\abs{v_i})^2$$
we have $$\Vert\vb v\Vert=\sqrt{\sum_i\abs{v_i}^2}\le\sum_i\abs{v_i}$$
By the Cauchy-Schwarz inequality,
$$\sum_i\abs{v_i}=\sum_i(\abs{v_i}\cdot1)\le\sqrt{\sum_i\abs{v_i}^2}\sqrt{n}=\sqrt{n}\Vert\vb v\Vert$$
$\blacksquare$
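Note. The two inequalities can be checked on a concrete vector; the following Python sketch uses the sample vector $(3,-4,12)$, whose Euclidean norm is $13$.

```python
import math

# Check ||v|| <= sum_i |v_i| <= sqrt(n) ||v|| on a sample vector.
v = [3.0, -4.0, 12.0]
n = len(v)
norm = math.sqrt(sum(x * x for x in v))   # Euclidean norm: sqrt(169) = 13
abs_sum = sum(abs(x) for x in v)          # 3 + 4 + 12 = 19
assert norm <= abs_sum <= math.sqrt(n) * norm
```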
Note.
Note that the concept of continuous differentiability can be extended to total derivatives, with a metric on $R^{m\times n}$ defined in the "linear transformation" section of the "linear algebra" chapter.
Proposition.
Suppose $U$ is an open set. Then $f$ is continuously differentiable if and only if all partial derivatives $D_if_j$ exist and are continuous on $U$.
(show proof)
Proof.
Suppose $f$ is continuously differentiable. Then all partial derivatives of $f$ exist on $U$.
Given a partial derivative $D_if_j$, for all $\vb x,\vb y\in U$,
$$d(D_if_j(\vb x),D_if_j(\vb y))=\norm{(Df(\vb x)\vb e_i)_j-(Df(\vb y)\vb e_i)_j}
=\norm{((Df(\vb x)-Df(\vb y))\vb e_i)_j}\le\norm{(Df(\vb x)-Df(\vb y))\vb e_i}\le\norm{Df(\vb x)-Df(\vb y)}=d(Df(\vb x),Df(\vb y))$$
Hence for all $\vb p\in U$, for all $\varepsilon\gt0$, there exists $\delta\gt0$ such that $B_\delta(\vb p)\subseteq U$ and for all $\vb q\in B_\delta(\vb p)$,
$Df(\vb q)\in B_\varepsilon(Df(\vb p))$, hence $D_if_j(\vb q)\in B_\varepsilon(D_if_j(\vb p))$, implying $D_if_j$ is continuous in $U$.
Now suppose all partial derivatives $D_if_j$ exist and are continuous on $U$.
Given $\vb p\in U$, for all $\varepsilon\gt0$, let $\eta=\frac{\varepsilon}{2nm}$, and choose $r\gt0$ such that $B_r(\vb p)\subseteq U$ and
for all $\vb q\in B_r(\vb p)$, $D_if_j(\vb q)\in B_{\eta}(D_if_j(\vb p))$.
Let $\vb h\in B_r(\vb 0)\setminus\{\vb0\}$. For some scalars $h_i$, $\vb h=\sum_ih_i\vb e_i$.
Denote $\sum_{k=1}^ih_k\vb e_k$ by $\vb v_i$ for $i$ from $0$ to $n$, then
$$f_j(\vb p+\vb h)-f_j(\vb p)=\sum_i(f_j(\vb p+\vb v_i)-f_j(\vb p+\vb v_{i-1}))$$
Note that for each $\vb v_i$, $d(\vb p+\vb v_i,\vb p)=\norm{(\vb p+\vb v_i)-\vb p}=\norm{\vb v_i}\le\norm{\vb h}\lt r$, so $\vb p+\vb v_i\in B_r(\vb p)$.
If we define $s_i(t)=(1-t)(\vb p+\vb v_{i-1})+t(\vb p+\vb v_i)$, then $s_i(t)\in B_r(\vb p)$ for all $t\in[0,1]$.
Suppose $h_i\neq0$, then for all $t\in[0,1]$,
$$(f_j\circ s_i)'(t)
=\lim_{h\to0}\frac{f_j(s_i(t)+hh_i\vb e_i)-f_j(s_i(t))}{h}
=h_i\lim_{h\to0}\frac{f_j(s_i(t)+hh_i\vb e_i)-f_j(s_i(t))}{hh_i}
=h_iD_if_j(s_i(t))$$
Suppose $h_i=0$, then $f_j\circ s_i$ is a constant function, so we still have $(f_j\circ s_i)'(t)=h_iD_if_j(s_i(t))$ for all $t\in[0,1]$.
By mean value theorem, there exists $c_i\in(0,1)$ such that
$$h_iD_if_j(s_i(c_i))=(f_j\circ s_i)'(c_i)=(f_j\circ s_i)(1)-(f_j\circ s_i)(0)=f_j(\vb p+\vb v_i)-f_j(\vb p+\vb v_{i-1})$$
Hence $$f_j(\vb p+\vb h)-f_j(\vb p)=\sum_i(h_iD_if_j(s_i(c_i)))$$
And we have
$$\norm{f_j(\vb p+\vb h)-f_j(\vb p)-\sum_i(h_iD_if_j(\vb p))}
=\norm{\sum_i(h_iD_if_j(s_i(c_i)))-\sum_i(h_iD_if_j(\vb p))}
=\norm{\sum_i(h_i(D_if_j(s_i(c_i))-D_if_j(\vb p)))}
\le\sum_i\abs{h_i}\norm{D_if_j(s_i(c_i))-D_if_j(\vb p)}
\lt\sum_i\abs{h_i}\eta
\le\sqrt{n}\norm{\vb h}\eta
\lt\norm{\vb h}\varepsilon$$
Therefore, $f_j$ is differentiable at $\vb p$ and $Df_j(\vb p)_i=D_if_j(\vb p)$.
Since each component function is differentiable, $f$ is differentiable at $\vb p$ and $Df(\vb p)_{j,*}=Df_j(\vb p)$.
For continuity of $Df$, with $\vb p$, $\varepsilon$ and $r$ defined the same way,
for all $\vb q\in B_r(\vb p)$, given $\vb x$ such that $\Vert\vb x\Vert\le1$,
we have
$$\norm{(Df(\vb q)-Df(\vb p))\vb x}
\le\sum_j\abs{\sum_i(D_if_j(\vb q)-D_if_j(\vb p))x_i}
\le\sum_j\sum_i\abs{(D_if_j(\vb q)-D_if_j(\vb p))}\abs{x_i}
\le\sum_j\sum_i\frac{\varepsilon}{2nm}\Vert\vb x\Vert
\le\frac{\varepsilon}{2}$$
Hence $d(Df(\vb q),Df(\vb p))=\norm{(Df(\vb q)-Df(\vb p))}\le\frac{\varepsilon}{2}\lt\varepsilon$.
$\blacksquare$
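Note. The proposition identifies the entries of the total derivative with the partial derivatives $D_if_j$. As a numerical sketch (the map $f(x,y)=(x^2y,\sin x+y)$ is an arbitrary choice), the following Python code compares central finite differences against the analytic partials.

```python
import math

# Compare finite-difference partial derivatives of
# f(x, y) = (x^2 y, sin x + y) against the analytic Jacobian entries.
def f(x, y):
    return (x * x * y, math.sin(x) + y)

def partial_fd(j, i, x, y, h=1e-6):
    # central finite difference approximating D_i f_j at (x, y); 0-based i, j
    dx = (h, 0.0) if i == 0 else (0.0, h)
    return (f(x + dx[0], y + dx[1])[j] - f(x - dx[0], y - dx[1])[j]) / (2 * h)

x, y = 0.5, 2.0
analytic = [[2 * x * y, x * x],     # D_1 f_1, D_2 f_1
            [math.cos(x), 1.0]]     # D_1 f_2, D_2 f_2
for j in range(2):
    for i in range(2):
        assert abs(partial_fd(j, i, x, y) - analytic[j][i]) < 1e-5
```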
Proposition.
Let $V\subseteq R^n$ and $U\subseteq R^m$ be open sets, let $g:V\to R^m$ and $f:U\to R^l$ be continuously differentiable.
If for all $\vb x\in V$, $g(\vb x)\in U$, then $f\circ g$ is continuously differentiable.
(show proof)
Proof.
Differentiability of $f\circ g$ follows directly from chain rule.
Note that $J(f\circ g)=(Jf\circ g)Jg$. Thus $$J(f\circ g)_{ki}=\sum_j(Jf\circ g)_{kj}Jg_{ji}=\sum_j(Jf_{kj}\circ g)Jg_{ji}$$
By continuous differentiability of $f$ and $g$, every $Jf_{kj}$ is continuous on $U$ and
every $Jg_{ji}$ is continuous on $V$. By differentiability and hence continuity of $g$, and continuity of composite functions,
every $Jf_{kj}\circ g$ is continuous on $V$. Hence, by continuity of function operations,
every $J(f\circ g)_{ki}$ is continuous on $V$. Therefore, $f\circ g$ is continuously differentiable.
$\blacksquare$
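Note. The identity $J(f\circ g)=(Jf\circ g)Jg$ can be checked numerically in one variable; the following Python sketch (with the arbitrary choices $f(u)=u^2$ and $g(x)=\sin x$) compares a finite-difference derivative of the composite against the chain-rule product.

```python
import math

# Chain rule J(f∘g) = (Jf∘g) Jg in one variable:
# for f(u) = u^2 and g(x) = sin x, (f∘g)'(x) = 2 sin(x) cos(x).
def fd(fun, x, h=1e-6):
    # central finite difference
    return (fun(x + h) - fun(x - h)) / (2 * h)

g = math.sin
f = lambda u: u * u
x = 0.7
lhs = fd(lambda t: f(g(t)), x)          # derivative of the composite
rhs = 2 * math.sin(x) * math.cos(x)     # (Jf∘g) Jg = 2 g(x) · g'(x)
assert abs(lhs - rhs) < 1e-6
```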
L'Hôpital's rule
Suppose we have $f:U\to R$ and $g:V\to R$ where $U,V\subseteq R$, an open interval $I$ with $I\setminus\{c\}\subseteq U\cap V$, and $c\in I$, such that
- $\lim_{x\to c}f(x)=\lim_{x\to c}g(x)=0$ or $\lim_{x\to c}g(x)=\pm\infty$,
- $f$ and $g$ are differentiable in $I\setminus\{c\}$,
- $g'(x)\neq0$ for all $x\in I\setminus\{c\}$, and
- $\lim_{x\to c}\frac{f'(x)}{g'(x)}$ exists in $\overline R$,
then $$\lim_{x\to c}\frac{f(x)}{g(x)}=\lim_{x\to c}\frac{f'(x)}{g'(x)}$$
Note that $c$ can be replaced by $\pm\infty$, with $I$ taken to be an open interval having $c$ as an endpoint.
(show proof)
Proof.
Let $I_1$ and $I_2$ be open intervals obtained by splitting $I$ by $c$.
Note that $g'$ is non-zero in $I_1$ and $I_2$.
Suppose $a,b\in I_1$ with $g(a)=g(b)=0$. If $a\neq b$, then there exists $x$ strictly between $a$ and $b$ such that $g'(x)=0$, a contradiction.
Hence $g$ either has no zero or has a unique zero in $I_1$. Same for $I_2$.
Therefore, we can find an open interval $c\in(a,b)\subseteq I$ such that $g$ is non-zero in $(a,b)\setminus\{c\}$.
Suppose $\lim_{x\to c}\frac{f'(x)}{g'(x)}=L\in R$.
Then for every $\varepsilon\gt0$, there exists $\delta\gt0$ such that $c+\delta\lt b$ and for all $x\in(c,c+\delta)$, $\frac{f'(x)}{g'(x)}\in(L-\varepsilon/2,L+\varepsilon/2)$.
Let $y\in(c,c+\delta)$, then for every $x\in(c,y)$,
by Cauchy's mean value theorem, there exists $t\in(x,y)$ such that $(f(y)-f(x))g'(t)=(g(y)-g(x))f'(t)$.
Again, since $g'$ is non-zero in $(c,b)$, $g(y)-g(x)\neq0$.
Hence $$\frac{f'(t)}{g'(t)}=\frac{f(y)-f(x)}{g(y)-g(x)}$$
Suppose $\lim_{x\to c}f(x)=\lim_{x\to c}g(x)=0$.
Note that we have $$\frac{f'(t)}{g'(t)}=\frac{f(y)-f(x)}{g(y)-g(x)}=\frac{\frac{f(y)}{g(y)}-\frac{f(x)}{g(y)}}{1-\frac{g(x)}{g(y)}}$$
Since $t\in(c,c+\delta)$, we have $$\frac{\frac{f(y)}{g(y)}-\frac{f(x)}{g(y)}}{1-\frac{g(x)}{g(y)}}=\frac{f'(t)}{g'(t)}\in(L-\varepsilon/2,L+\varepsilon/2)$$
Since $\lim_{x\to c}f(x)=\lim_{x\to c}g(x)=0$,
$$\lim_{x\to c}\frac{\frac{f(y)}{g(y)}-\frac{f(x)}{g(y)}}{1-\frac{g(x)}{g(y)}}=\frac{f(y)}{g(y)}$$
and we have $\frac{f(y)}{g(y)}\in[L-\varepsilon/2,L+\varepsilon/2]\subset(L-\varepsilon,L+\varepsilon)$,
where $y$ can be arbitrarily chosen in $(c,c+\delta)$.
By a symmetric argument, there exists $\delta'\gt0$ such that for all $y\in (c-\delta',c)$, $\frac{f(y)}{g(y)}\in(L-\varepsilon,L+\varepsilon)$.
Therefore, let $\delta^*=\inf(\delta,\delta')$, then for all $x\in (c-\delta^*,c+\delta^*)\setminus\{c\}$, $\frac{f(x)}{g(x)}\in(L-\varepsilon,L+\varepsilon)$.
We have shown that $\lim_{x\to c}\frac{f(x)}{g(x)}=L$.
Suppose $\lim_{x\to c}g(x)=\pm\infty$.
Note that we have
$$\frac{f'(t)}{g'(t)}=\frac{f(x)-f(y)}{g(x)-g(y)}=\frac{\frac{f(x)}{g(x)}-\frac{f(y)}{g(x)}}{1-\frac{g(y)}{g(x)}}$$
$$\frac{f'(t)}{g'(t)}\p{1-\frac{g(y)}{g(x)}}=\frac{f(x)}{g(x)}-\frac{f(y)}{g(x)}$$
$$\frac{f'(t)}{g'(t)}=\frac{f(x)}{g(x)}-\p{\frac{f(y)}{g(x)}-\frac{f'(t)}{g'(t)}\frac{g(y)}{g(x)}}$$
And we will denote $\frac{f(y)}{g(x)}-\frac{f'(t)}{g'(t)}\frac{g(y)}{g(x)}$ by $r(x)$,
where $y$ is fixed and $t$ depends on $x$, so that
$$\frac{f(x)}{g(x)}=\frac{f'(t)}{g'(t)}+r(x)$$
Since $t\in(c,c+\delta)$, $\frac{f'(t)}{g'(t)}\in(L-\varepsilon/2,L+\varepsilon/2)$,
so $\frac{f(x)}{g(x)}\in(L-\varepsilon/2+r(x),L+\varepsilon/2+r(x))$.
Note that, since $\lim_{x\to c}g(x)=\pm\infty$, we have $\lim_{x\to c}\frac{1}{g(x)}=0$.
Combining with the fact that $\frac{f'(t)}{g'(t)}$, as a function of $x$, is bounded in $(c,y)$,
we have $$\lim_{x\to c^+}r(x)=0$$
So given the same $\varepsilon$, there exists $\sigma\gt0$ such that $c+\sigma\lt y$ and for all $x\in (c,c+\sigma)$, $r(x)\in(-\varepsilon/2,\varepsilon/2)$.
Then for all $x\in (c,c+\sigma)$, $\frac{f(x)}{g(x)}\in(L-\varepsilon,L+\varepsilon)$.
By a symmetric argument, there exists $\sigma'\gt0$ such that for all $x\in (c-\sigma',c)$, $\frac{f(x)}{g(x)}\in(L-\varepsilon,L+\varepsilon)$.
Therefore, let $\sigma^*=\inf(\sigma,\sigma')$, then for all $x\in (c-\sigma^*,c+\sigma^*)\setminus\{c\}$, $\frac{f(x)}{g(x)}\in(L-\varepsilon,L+\varepsilon)$.
We have shown that $\lim_{x\to c}\frac{f(x)}{g(x)}=L$.
The cases where $c=\pm\infty$ or $\lim_{x\to c}\frac{f'(x)}{g'(x)}=\pm\infty$ can be proven similarly.
If $c=\pm\infty$, we only need to prove one side of the limit.
$\blacksquare$
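Note. A classical instance of the rule is $\lim_{x\to0}\frac{\sin x}{x}=\lim_{x\to0}\frac{\cos x}{1}=1$; the following Python sketch checks that both quotients approach $1$ at the expected rate.

```python
import math

# L'Hôpital's rule for sin(x)/x as x -> 0: both f(x)/g(x) and
# f'(x)/g'(x) = cos(x)/1 tend to 1; sin(x)/x = 1 - x^2/6 + O(x^4).
for x in (0.1, 0.01, 0.001):
    ratio = math.sin(x) / x
    derivative_ratio = math.cos(x) / 1.0
    assert abs(ratio - 1.0) < x * x
    assert abs(derivative_ratio - 1.0) < x * x
```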
Differentiability class
Suppose $U$ is open. Let $k\in N$. If for every $j\in\{1,\ldots,m\}$ and every $k$-tuple $\vb i$ of elements of $\{1,\ldots,n\}$,
there exists a $(k+1)$-tuple of functions $(g_0,\ldots,g_k)$ from $U$ to $R$ such that
- $g_0=f_j$,
- $g_{l}=D_{i_l}g_{l-1}$ for $l\in\{1,\ldots,k\}$, and
- $g_k$ is continuous,
then $f$ is said to be of differentiability class $k$, and we also say that $f$ is $C^k$.
The set of all $C^k$ functions on $U$ is denoted $C^k(U)$.
Note that $C^0$ is equivalent to continuity and $C^1$ is equivalent to continuous differentiability.
Smoothness
Suppose $U$ is open. If $f\in C^k(U)$ for every natural number $k$, then $f$ is said to be smooth, and we also say that $f$ is $C^\infty$.
The set of all $C^\infty$ functions on $U$ is denoted $C^\infty(U)$.
Note.
By definition, $f$ is smooth (or $C^k$) if and only if all component functions are smooth (or $C^k$).
Also, smoothness implies continuous differentiability and hence continuity.
Smoothness of function operations
Given smooth (or $C^k$) functions $f:U\to R^m$ and $g:U\to R^m$ where $U\subseteq R^n$, and given $c\in R$, the functions $cf$, $f\pm g$, and $f\cdot g$ are smooth (or $C^k$);
if $f$ and $g$ are real-valued, then $fg$ is smooth (or $C^k$).
(show proof)
Proof.
We will only show the smooth case; the $C^k$ case follows similarly.
Note that $(cf)_j=cf_j$.
For each depth $k$, suppose
$$D_{i_{k-1}}\ldots D_{i_1}(cf)_j=c\alpha$$
where $\alpha:U\to R$ is smooth.
Then
$$D_{i_k}\ldots D_{i_1}(cf)_j=D_{i_k}(c\alpha)=J(c\alpha)_{i_k}=cJ\alpha_{i_k}=cD_{i_k}\alpha$$
which takes the form $c\beta$ where $\beta:U\to R$ is smooth.
Hence $cf$ is smooth.
Note that $(f+g)_j=f_j+g_j$.
For each depth $k$, suppose
$$D_{i_{k-1}}\ldots D_{i_1}(f+g)_j=\alpha+\beta$$
where $\alpha:U\to R$ and $\beta:U\to R$ are smooth.
Then
$$D_{i_k}\ldots D_{i_1}(f+g)_j=D_{i_k}(\alpha+\beta)=J(\alpha+\beta)_{i_k}=J\alpha_{i_k}+J\beta_{i_k}=D_{i_k}\alpha+D_{i_k}\beta$$
which takes the form $\alpha'+\beta'$ where $\alpha':U\to R$ and $\beta':U\to R$ are smooth.
Hence $f+g$ is smooth.
For each depth $k$, suppose
$$D_{i_{k-1}}\ldots D_{i_1}(f\cdot g)=\sum_t(\alpha_t\beta_t)$$
where each $\alpha_t:U\to R$ and $\beta_t:U\to R$ are smooth.
Then
$$
D_{i_{k}}\ldots D_{i_1}(f\cdot g)
=\p{J\p{\sum_t(\alpha_t\beta_t)}}_{i_{k}}
=\sum_tJ(\alpha_t\beta_t)_{i_{k}}
=\sum_t((J\alpha_t)^T\beta_t+(J\beta_t)^T\alpha_t)^T_{i_{k}}
=\sum_t(D_{i_{k}}\alpha_t\beta_t)+\sum_t(D_{i_{k}}\beta_t\alpha_t)
$$
which takes the form $\sum_s(\alpha_s\beta_s)$ where
each $\alpha_s:U\to R$ and $\beta_s:U\to R$ are smooth.
Hence $f\cdot g$ is smooth.
$\blacksquare$
Smoothness of composite functions
Given smooth (or $C^k$) functions $f:V\to R^l$ and $g:U\to R^m$ where $V\subseteq R^m$ and $U\subseteq R^n$
such that $g(U)\subseteq V$, the composite function $f\circ g:U\to R^l$ is smooth (or $C^k$).
(show proof)
Proof.
We will only show the smooth case; the $C^k$ case follows similarly.
Note that $(f\circ g)_j=f_j\circ g$.
For each depth $k$, suppose
$$D_{i_{k-1}}\ldots D_{i_1}(f\circ g)_j=\sum_t((\alpha_t\circ\beta_t)\gamma_t)$$
where $\alpha_t:V\to R$, $\beta_t:U\to V$, $\gamma_t:U\to R$ are smooth.
Then
$$D_{i_k}\ldots D_{i_1}(f\circ g)_j
=\p{J\sum_t((\alpha_t\circ\beta_t)\gamma_t)}_{i_k}
=\sum_t(J((\alpha_t\circ\beta_t)\gamma_t))_{i_k}
=\sum_t((J(\alpha_t\circ\beta_t)^T\gamma_t+J\gamma_t^T(\alpha_t\circ\beta_t))^T)_{i_k}
=\sum_t((J(\alpha_t\circ\beta_t)^T\gamma_t)_{i_k}+(J\gamma_t^T(\alpha_t\circ\beta_t))_{i_k})
$$ $$
=\sum_t(J(\alpha_t\circ\beta_t)_{i_k}\gamma_t+(J\gamma_t)_{i_k}(\alpha_t\circ\beta_t))
=\sum_t\p{\p{\sum_j((J\alpha_t)_j\circ\beta_t)(J\beta_t)_{ji_k}}\gamma_t+(J\gamma_t)_{i_k}(\alpha_t\circ\beta_t)}
=\sum_t\sum_j(D_j\alpha_t\circ\beta_t)D_{i_k}{\beta_t}_j\gamma_t+\sum_t(\alpha_t\circ\beta_t)D_{i_k}\gamma_t$$
which takes the form $\sum_s((\alpha_s\circ\beta_s)\gamma_s)$ where
$\alpha_s:V\to R$, $\beta_s:U\to V$, $\gamma_s:U\to R$ are smooth.
Hence $f\circ g$ is smooth.
$\blacksquare$
Proposition.
Let $f:U\to R^m$, where $U\subseteq R^n$ is open, be $C^2$, then $$D_iD_jf=D_jD_if$$ for all $i,j\in\{1,\ldots,n\}$.
(show proof)
Proof.
Since the claim can be verified componentwise, assume $f$ is real-valued.
Let $\vb p\in U$. Since $U$ is open, there exists $r\gt0$ such that $\vb p+s\vb e_i+t\vb e_j\in U$ whenever $\abs{s},\abs{t}\lt r$.
For non-zero $h,k\in(-r/2,r/2)$, define
$$\Delta(h,k)=f(\vb p+h\vb e_i+k\vb e_j)-f(\vb p+h\vb e_i)-(f(\vb p+k\vb e_j)-f(\vb p))$$
Fix such $h$ and $k$, and define $u(t)=f(\vb p+t\vb e_i+k\vb e_j)-f(\vb p+t\vb e_i)$,
then $u'(t)=D_if(\vb p+t\vb e_i+k\vb e_j)-D_if(\vb p+t\vb e_i)$ and $\Delta(h,k)=u(h)-u(0)$.
By the mean value theorem, there exists $s$ strictly between $0$ and $h$ such that
$$\Delta(h,k)=hu'(s)=h(D_if(\vb p+s\vb e_i+k\vb e_j)-D_if(\vb p+s\vb e_i))$$
Applying the mean value theorem again, to $t\mapsto D_if(\vb p+s\vb e_i+t\vb e_j)$, there exists $t$ strictly between $0$ and $k$ such that
$$\Delta(h,k)=hk\,D_jD_if(\vb p+s\vb e_i+t\vb e_j)$$
Since $\abs{s}\lt\abs{h}$ and $\abs{t}\lt\abs{k}$, and since $D_jD_if$ is continuous at $\vb p$ ($f$ being $C^2$),
$$\lim_{(h,k)\to(0,0)}\frac{\Delta(h,k)}{hk}=D_jD_if(\vb p)$$
On the other hand, for each fixed non-zero $h$, by definition of $D_jf$,
$$\lim_{k\to0}\frac{\Delta(h,k)}{hk}=\frac{D_jf(\vb p+h\vb e_i)-D_jf(\vb p)}{h}$$
Since the joint limit exists and each inner limit exists, the iterated limit exists and equals the joint limit, so
$$D_iD_jf(\vb p)=\lim_{h\to0}\frac{D_jf(\vb p+h\vb e_i)-D_jf(\vb p)}{h}=\lim_{h\to0}\lim_{k\to0}\frac{\Delta(h,k)}{hk}=D_jD_if(\vb p)$$
$\blacksquare$
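Note. The symmetry of mixed partials can be observed numerically; the following Python sketch (with the arbitrary $C^2$ choice $f(x,y)=x^3y+e^{xy}$) approximates the mixed partial by the symmetric second difference, which treats the two orders of differentiation identically.

```python
import math

# Symmetry of mixed partials for f(x, y) = x^3 y + exp(xy):
# the symmetric second difference approximates both D_1 D_2 f and D_2 D_1 f.
def f(x, y):
    return x ** 3 * y + math.exp(x * y)

def mixed_fd(x, y, h=1e-4):
    return (f(x + h, y + h) - f(x + h, y - h)
            - f(x - h, y + h) + f(x - h, y - h)) / (4 * h * h)

x, y = 0.5, 0.3
# analytic value: d/dx (x^3 + x e^{xy}) = 3x^2 + e^{xy} (1 + xy)
analytic = 3 * x * x + math.exp(x * y) * (1 + x * y)
assert abs(mixed_fd(x, y) - analytic) < 1e-5
```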
Generalization of smoothness
Assume $m,n\neq0$ as usual. A map $f:U\to R^m$ where $U$ is any subset of $R^n$ is called smooth if and only if
for all $x\in U$, there exist an open subset $U_x$ of $R^n$ containing $x$ and a smooth map
$f_x:U_x\to R^m$ that agrees with $f$ on $U_x\cap U$.
If $m=0$ or $n=0$, we define $f$ to be smooth.
When $m\neq0$, $n\neq0$, and $U$ is open, this definition is equivalent to the original definition of smoothness
(show proof).
Proof.
If $f$ is smooth by the original definition, it is trivial that $f$ is smooth by the generalized definition.
Suppose $f$ is smooth by the generalized definition.
Let $j\in\{1,\ldots,m\}$, let $l\in N$, let $\vb i$ be an $l$-tuple of $\{1,\ldots,n\}$.
Define $g_0=f_j$.
Let $x\in U$, then there exists $r_x\gt0$ such that $f(u)=f_x(u)$ for all $u\in B_{r_x}(x)\subseteq U\cap U_x$.
And there exists an $(l+1)$-tuple of functions $(g_{x,0},\ldots,g_{x,l})$ from $U_x$ to $R$ such that
- $g_{x,0}={f_x}_j$,
- $g_{x,k}=D_{i_k}g_{x,k-1}$ for $k\in\{1,\ldots,l\}$, and
- $g_{x,l}$ is continuous.
And we have $g_{x,0}(u)={f_x}_j(u)=f_j(u)=g_0(u)$ for all $u\in B_{r_x}(x)$.
Inductively, for $k$ from $1$ to $l$, suppose for all $x\in U$, for all $u\in B_{r_x}(x)$, $g_{x,k-1}(u)=g_{k-1}(u)$.
Let $x\in U$, then $g_{x,k}=D_{i_k}g_{x,k-1}$, by localness of derivative, for all $u\in B_{r_x}(x)$, $D_{i_k}g_{k-1}(u)=g_{x,k}(u)$.
Thus $D_{i_k}g_{k-1}(x)=g_{x,k}(x)$. Define $g_k=D_{i_k}g_{k-1}$,
then for all $x\in U$, for all $u\in B_{r_x}(x)$, $g_{x,k}(u)=D_{i_k}g_{k-1}(u)=g_k(u)$.
By induction, $g_{k}=D_{i_k}g_{k-1}$ for $k\in\{1,\ldots,l\}$.
And for all $x\in U$, for all $u\in B_{r_x}(x)$, $g_{x,l}(u)=g_l(u)$.
Let $x\in U$, since $g_{x,l}$ is continuous, by localness of continuity, $g_l$ is continuous at $x$.
Thus $g_l$ is continuous.
We have shown that $f$ is smooth by the original definition.
$\blacksquare$
Trivially, smoothness in this definition implies continuity.
Localness of smoothness
Let $m,n\in N$ and let $f:U\to R^m$, where $U$ is any subset of $R^n$, be smooth.
Then given any subset $V$ of $U$, $f|_V$ is smooth.
(show proof)
Proof.
Trivial.
$\blacksquare$
Generalized smoothness of function operations
Let $m,n\in N$. Given smooth functions $f:U\to R^m$ and $g:U\to R^m$ where $U\subseteq R^n$, and given $c\in R$, the functions $cf$, $f\pm g$, and $f\cdot g$ are smooth;
if $f$ and $g$ are real-valued, then $fg$ is smooth.
if $f$ and $g$ are real valued, then $fg$ is smooth.
(show proof)
Proof.
Trivial.
$\blacksquare$
Generalized smoothness of composite functions
Let $l,m,n\in N$. Given smooth functions $f:V\to R^l$ and $g:U\to R^m$ where $V\subseteq R^m$ and $U\subseteq R^n$
such that $g(U)\subseteq V$, the composite function $f\circ g:U\to R^l$ is smooth.
(show proof)
Proof.
If $n=0$ or $l=0$, then $f\circ g$ is smooth by definition.
If $m=0$, then $f\circ g$ is constant and thus smooth.
Suppose $l,m,n\neq0$.
Let $x\in U$, then there exists an open subset $U_x$ of $R^n$ containing $x$ and a smooth map
$g_x:U_x\to R^m$ that agrees with $g$ on $U_x\cap U$.
Note that $g(x)\in V$, thus there exists an open subset $V_x$ of $R^m$ containing $g(x)$ and a smooth map
$f_x:V_x\to R^l$ that agrees with $f$ on $V_x\cap V$.
Then there exists $r_x\gt0$ such that $B_{r_x}(g(x))\subseteq V_x$.
Since smoothness implies continuity, there exists $s_x\gt0$ such that $B_{s_x}(x)\subseteq U_x$ and for all $u\in B_{s_x}(x)$, $g_x(u)\in B_{r_x}(g(x))$.
By smoothness of composite functions, $f_x|_{B_{r_x}(g(x))}\circ g_x|_{B_{s_x}(x)}$ is smooth.
And for all $u\in B_{s_x}(x)\cap U$, we have
$(f_x|_{B_{r_x}(g(x))}\circ g_x|_{B_{s_x}(x)})(u)=f_x(g_x(u))=f(g(u))=(f\circ g)(u)$.
We have shown that $f\circ g$ is smooth.
$\blacksquare$
Diffeomorphism
Let $U\subseteq R^n$ and $V\subseteq R^m$ where $n,m\in N$.
If $f:U\to V$ is a smooth bijection whose inverse is also smooth, then $f$ is said to be a diffeomorphism,
and $U$ and $V$ are said to be diffeomorphic.
Trivially, a diffeomorphism is also a homeomorphism, in the topological sense.
Proposition.
If $f:V\to W$ and $g:U\to V$ are diffeomorphisms, then $f\circ g:U\to W$ is a diffeomorphism.
(show proof)
Proof.
This follows directly from generalized smoothness of composite functions.
$\blacksquare$
Proposition.
Let $U$ be an open subset of $R^n$ and $V$ be an open subset of $R^m$ where $n,m\in N$
such that $U$ and $V$ are non-empty and diffeomorphic, then $n=m$,
and if $n,m$ are non-zero, given any diffeomorphism $f:U\to V$, for all $p\in U$, $Df(p)^{-1}=D(f^{-1})(f(p))$.
(show proof)
Proof.
If $n=0$ or $m=0$, then we trivially have $n=m$. Now suppose $n,m$ are non-zero.
Let $f:U\to V$ be a diffeomorphism.
Let $p\in U$.
Then $I_n=D(f^{-1}\circ f)(p)=D(f^{-1})(f(p))Df(p)$
and $I_m=D(f\circ f^{-1})(f(p))=Df(p)D(f^{-1})(f(p))$.
Thus $n=\tr(D(f^{-1})(f(p))Df(p))=\tr(Df(p)D(f^{-1})(f(p)))=m$.
And we have $Df(p)^{-1}=D(f^{-1})(f(p))$.
Since $U$ and $V$ are diffeomorphic and $U$ is non-empty, such $f$ and $p$ exist, thus $n=m$.
$\blacksquare$
Definition.
Suppose $f:X\to X$ where $X$ is a metric space. If there exists $c\in[0,1)$ such that for all $x,y\in X$, we have
$$d(f(x),f(y))\le cd(x,y)$$
then $f$ is said to be a contraction on $X$.
Lemma.
Let $X$ be a metric space such that every Cauchy sequence converges.
Then given a contraction $f$ on $X$, there exists a unique $x\in X$ such that $f(x)=x$.
(show proof)
Proof.
Let $x^*\in X$. Use recursion theorem to define a function $\phi:N\to X$ such that $\phi(0)=x^*$ and $\phi(n+1)=f(\phi(n))$.
Then $\phi$ defines a sequence $(x_n)$ of $X$. Since $f$ is a contraction on $X$,
there exists $c\lt1$ such that for all $x,y\in X$, we have $d(f(x),f(y))\le cd(x,y)$.
Note that $c\ge0$.
By induction, $d(x_n,x_{n+1})\le c^nd(x_0,x_1)$.
Suppose $n\lt m$, then
$$d(x_n,x_m)\le\sum_{i=n}^{m-1}d(x_i,x_{i+1})\le\sum_{i=n}^{m-1}(c^id(x_0,x_1))=d(x_0,x_1)\sum_{i=n}^{m-1}c^i
=d(x_0,x_1)\frac{c^n(1-c^{m-n})}{1-c}\le d(x_0,x_1)\frac{c^n}{1-c}$$
For every $\varepsilon\gt0$, there exists $N_0$ such that $d(x_0,x_1)\frac{c^{N_0}}{1-c}\lt\varepsilon$,
so for all $m,n\ge N_0$, $d(x_n,x_m)\le d(x_0,x_1)\frac{c^{N_0}}{1-c}\lt\varepsilon$.
Hence $(x_n)$ is a Cauchy sequence, and so $\lim x_n=x$ for some $x\in X$.
Now suppose $f(x)\neq x$, then $d(x,f(x))\gt0$, so there exists some $n$ such that $d(x_n,x)\lt\frac{d(x,f(x))}{4}$ and $d(x_n,x_{n+1})\lt\frac{d(x,f(x))}{4}$.
But then we have $d(x,x_{n+1})\le d(x_n,x)+d(x_n,x_{n+1})\lt\frac{d(x,f(x))}{2}$, so
$$d(f(x_n),f(x))=d(x_{n+1},f(x))\ge d(x,f(x))-d(x,x_{n+1})\gt\frac{d(x,f(x))}{2}\gt\frac{d(x,f(x))}{4}\gt d(x_n,x)$$
a contradiction.
Therefore, $f(x)=x$.
To show uniqueness, suppose $f(x)=x$ and $f(y)=y$, then $d(x,y)=d(f(x),f(y))\le cd(x,y)$, so $d(x,y)=0$, implying $x=y$.
$\blacksquare$
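Note. The fixed-point iteration in the proof is constructive; the following Python sketch runs it for $f(x)=\cos x$, which is a contraction on $[0,1]$ (it maps $[0,1]$ into itself and $\abs{f'(x)}=\abs{\sin x}\le\sin 1\lt1$ there).

```python
import math

# Fixed-point iteration for the contraction f(x) = cos(x) on [0, 1]:
# the iterates x, f(x), f(f(x)), ... converge to the unique fixed point.
def f(x):
    return math.cos(x)

x = 1.0
for _ in range(100):
    x = f(x)
assert abs(f(x) - x) < 1e-9   # x is (numerically) the fixed point
```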
Lemma.
Suppose $U$ is an open ball of $R^n$ or $U=R^n$, $f$ is differentiable, and there exists $M\in R$ such that for every $\vb x\in U$, we have $\norm{Df(\vb x)}\le M$.
Then for all $\vb a,\vb b\in U$, $$\norm{f(\vb b)-f(\vb a)}\le M\norm{\vb b-\vb a}$$
(show proof)
Proof.
Let $\vb a,\vb b\in U$. Define $s(t)=(1-t)\vb a+t\vb b$.
If $U$ is an open ball $B_r(\vb p)$, then by the lemma on line segments in open balls, $s(t)\in U$ when $t\in[0,1]$;
this is trivially true if $U$ is $R^n$.
Define $g(t)=f(s(t))$, then when $t\in[0,1]$, $Dg(t)=Df(s(t))Ds(t)=Df(s(t))(\vb b-\vb a)$ since $s(t)$ is the sum of a linear map and a constant map.
And we have $\norm{Dg(t)}\le\norm{Df(s(t))}\norm{\vb b-\vb a}\le M\norm{\vb b-\vb a}$.
Let $\vb z=g(1)-g(0)$ and define $c(t)=\vb z\cdot g(t)$ on $t\in[0,1]$.
Clearly, $c(t)$ is continuous on $[0,1]$ and differentiable in $(0,1)$.
By mean value theorem, there exists $x\in(0,1)$ such that
$$c(1)-c(0)=c'(x)=\vb z\cdot Dg(x)$$
But then we also have $$c(1)-c(0)=\vb z\cdot g(1)-\vb z\cdot g(0)=\vb z\cdot\vb z=\Vert\vb z\Vert^2$$
so by Cauchy-Schwarz inequality,
$$\Vert\vb z\Vert^2=\vb z\cdot Dg(x)\le\Vert\vb z\Vert\norm{Dg(x)}$$
and we have $\norm{f(\vb b)-f(\vb a)}=\norm{g(1)-g(0)}\le\norm{Dg(x)}\le M\norm{\vb b-\vb a}$.
$\blacksquare$
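Note. As a numerical illustration of the mean value inequality, the following Python sketch uses $f(x)=(\sin x,\cos x)$ on $R$, for which $\norm{Df(x)}=1$, so $\norm{f(b)-f(a)}\le\abs{b-a}$ (geometrically, a chord is no longer than its arc).

```python
import math

# Mean value inequality for f(x) = (sin x, cos x): ||Df(x)|| = 1 on R,
# so ||f(b) - f(a)|| <= |b - a| for all a, b.
def f(x):
    return (math.sin(x), math.cos(x))

def dist(u, v):
    return math.sqrt((u[0] - v[0]) ** 2 + (u[1] - v[1]) ** 2)

for a, b in ((0.0, 1.0), (-2.0, 3.5), (0.1, 0.2)):
    assert dist(f(b), f(a)) <= abs(b - a) + 1e-12
```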
Lemma.
$\frac{1}{x}$ is smooth on $R\setminus\{0\}$.
(show proof)
Proof.
Let $p\in R\setminus\{0\}$.
$$
\lim_{h\to 0}\frac{\frac{1}{p+h}-\frac{1}{p}}{h}
=\lim_{h\to 0}\frac{-1}{p(p+h)}\frac{h}{h}
=\lim_{h\to 0}\frac{-1}{p(p+h)}\lim_{h\to 0}\frac{h}{h}
=\frac{-1}{p^2}
$$
Thus $$\dv{}{x}\frac{1}{x}=\frac{-1}{x^2}$$
Given a function of the form $c\p{\frac{1}{x}}^k$ where $c\in R$ and $k\in N\setminus\{0\}$,
we have
$$
\dv{}{x}c\p{\frac{1}{x}}^k
=ck\p{\frac{1}{x}}^{k-1}\frac{-1}{x^2}
=-ck\p{\frac{1}{x}}^{k+1}
$$
which is again of the form $c\p{\frac{1}{x}}^k$ with $k\neq0$.
By induction, every function of the form $c\p{\frac{1}{x}}^k$ with $k\neq0$ is smooth.
Since $\frac{1}{x}$ is itself of the form $c\p{\frac{1}{x}}^k$ with $k\neq0$, it is smooth.
$\blacksquare$
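Note. The derivative pattern from the induction, $\dv{}{x}\p{\frac{1}{x}}^k=-k\p{\frac{1}{x}}^{k+1}$, can be checked numerically; the following Python sketch compares a finite difference against the formula at $x=2$.

```python
# d/dx (1/x)^k = -k (1/x)^(k+1), checked at x = 2 for k = 1, 2, 3
# via a central finite difference.
def fd(fun, x, h=1e-6):
    return (fun(x + h) - fun(x - h)) / (2 * h)

x = 2.0
for k in (1, 2, 3):
    numeric = fd(lambda t: (1.0 / t) ** k, x)
    analytic = -k * (1.0 / x) ** (k + 1)
    assert abs(numeric - analytic) < 1e-8
```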
Lemma.
Let $U\subseteq R^n$ be open.
Suppose we have $a_{ij}:U\to R$ and $b_{ij}:U\to R$ for $i,j\in\{1,\ldots,n\}$,
such that for all $\vb p\in U$, $(a_{ij}(\vb p))_{ij}(b_{ij}(\vb p))_{ij}=I$,
then every $a_{ij}$ is smooth (or $C^k$) if and only if every $b_{ij}$ is smooth (or $C^k$).
(show proof)
Proof.
This follows from the inverse matrix formula $A^{-1}=\frac{1}{\det(A)}\text{adj}(A)$,
the properties of smooth (or $C^k$) functions, and that $\frac{1}{x}$ is smooth.
$\blacksquare$
Inverse function theorem
Suppose $f:U\to R^n$ is smooth (or $C^k$ with $k\gt0$), where $U\subseteq R^n$ is open.
If $\vb p\in U$ and $Df(\vb p)$ is invertible, then there exists an open subset $V$ of $U$
such that
- $\vb p\in V$,
- $f(V)$ is open,
- $f|_V:V\to f(V)$ is a bijection, and
- let $g$ denote ${f|_V}^{-1}$, then $g$ is smooth (or $C^k$) and
$$Dg(\vb y)=Df(g(\vb y))^{-1}$$
for all $\vb y\in f(V)$.
(show proof)
Proof.
Whether $f$ is smooth or $C^k$ with $k\gt0$, it is continuously differentiable.
Let $A=Df(\vb p)$ and $\varepsilon=\frac{1}{2\norm{A^{-1}}}$.
Since $f$ is continuously differentiable at $\vb p$,
there exists an open ball $V\subseteq U$ centered at $\vb p$ such that for all $\vb x\in V$, $\norm{Df(\vb x)-A}\lt\varepsilon$.
For every $\vb y\in R^n$, define a function
$$\varphi_{\vb y}(\vb x)=\vb x+A^{-1}(\vb y-f(\vb x))$$
on $V$, then
$$\norm{D(\varphi_{\vb y})(\vb x)}=\norm{I-A^{-1}Df(\vb x)}=\norm{A^{-1}(A-Df(\vb x))}\le\norm{A^{-1}}\norm{A-Df(\vb x)}\lt\norm{A^{-1}}\varepsilon=\frac{1}{2}$$
For all $\vb x_1,\vb x_2\in V$, if $f(\vb x_1)=f(\vb x_2)$, let $\vb y=f(\vb x_1)=f(\vb x_2)$, then
$$\norm{\vb x_1-\vb x_2}=\norm{\varphi_{\vb y}(\vb x_1)-\varphi_{\vb y}(\vb x_2)}\le\frac{1}{2}\norm{\vb x_1-\vb x_2}$$
implying $\vb x_1=\vb x_2$. We have shown that $f|_V:V\to R^n$ is injective, thus $f|_V:V\to f(V)$ is bijective.
Let $\vb y^*\in f(V)$. Then there exists $\vb x^*\in V$ such that $f(\vb x^*)=\vb y^*$.
Let $B$ be an open ball centered at $\vb x^*$ with radius $r$ such that $\overline B\subseteq V$.
Let $\vb y\in B_{r\varepsilon}(\vb y^*)$,
then $$\norm{\varphi_{\vb y}(\vb x^*)-\vb x^*}=\norm{A^{-1}(\vb y-\vb y^*)}\lt\norm{A^{-1}}r\varepsilon=\frac{r}{2}$$
When $\vb x\in\overline B$,
$$\norm{\varphi_{\vb y}(\vb x)-\vb x^*}\le\norm{\varphi_{\vb y}(\vb x)-\varphi_{\vb y}(\vb x^*)}+\norm{\varphi_{\vb y}(\vb x^*)-\vb x^*}\lt\frac{1}{2}\norm{\vb x-\vb x^*}+\frac{r}{2}\le r$$
thus $\varphi_{\vb y}(\vb x)\in B$. Hence $\varphi_{\vb y}$ maps $\overline B$ into $\overline B$, and since $\norm{\varphi_{\vb y}(\vb x_1)-\varphi_{\vb y}(\vb x_2)}\le\frac{1}{2}\norm{\vb x_1-\vb x_2}$ for all $\vb x_1,\vb x_2\in V$, the restriction of $\varphi_{\vb y}$ to $\overline B$ is a contraction on $\overline B$.
Since every Cauchy sequence converges in $R^n$, every Cauchy sequence of $\overline B$ also converges in $R^n$,
and hence in $\overline B$.
Thus, there exists a unique $\vb x\in\overline B$ such that $\varphi_{\vb y}|_{\overline B}(\vb x)=\vb x$.
For this $\vb x$, $A^{-1}(\vb y-f(\vb x))=\vb0$, thus $\vb y-f(\vb x)=\vb 0$ since $A^{-1}$ is bijective.
Therefore, $\vb y=f(\vb x)\in f(V)$. We have shown that $f(V)$ is open.
Let $\vb y\in f(V)$ and denote $g(\vb y)$ by $\vb x$. Let $r\gt0$ such that $B_r(\vb y)\subseteq f(V)$ and let $\vb k\in B_r(\vb 0)$,
then $\vb y+\vb k\in f(V)$. Let $\vb h=g(\vb y+\vb k)-\vb x$, then $\vb x+\vb h\in V$.
Note that for all $\vb k\in B_r(\vb 0)\setminus\{\vb 0\}$, since $g$ is bijective, $g(\vb y)\neq g(\vb y+\vb k)$, thus $\vb h\neq\vb0$.
Now we have
$$\varphi_{\vb y}(\vb x+\vb h)-\varphi_{\vb y}(\vb x)=\vb h+A^{-1}(f(\vb x)-f(\vb x+\vb h))=\vb h-A^{-1}\vb k$$
Hence $$\norm{\vb h-A^{-1}\vb k}=\norm{\varphi_{\vb y}(\vb x+\vb h)-\varphi_{\vb y}(\vb x)}\le\frac{1}{2}\norm{\vb h}$$
Since $$\norm{\vb h}\le\norm{\vb h-A^{-1}\vb k}+\norm{A^{-1}\vb k}\le\frac{1}{2}\norm{\vb h}+\norm{A^{-1}\vb k}$$
we have $$\norm{\vb h}\le2\norm{A^{-1}\vb k}\le2\norm{A^{-1}}\norm{\vb k}=\frac{\norm{\vb k}}{\varepsilon}$$
For all $\sigma\gt0$, let $\delta=\inf(\sigma\varepsilon,r)$, then for all $\vb k\in B_\delta(\vb0)$,
$\norm{\vb h}\le\frac{\norm{\vb k}}{\varepsilon}\lt\sigma$. Hence $\lim_{\vb k\to\vb 0}\vb h=\vb0$.
Note that, since $\vb x\in V$, $d(Df(\vb x),A)=\norm{Df(\vb x)-A}\lt\frac{1}{2\norm{A^{-1}}}\lt\frac{1}{\norm{A^{-1}}}$,
and by a lemma proven in the "linear algebra" chapter, $Df(\vb x)$ is invertible, and we denote $Df(\vb x)^{-1}$ by $T$.
Since $$g(\vb y+\vb k)-g(\vb y)-T\vb k=\vb h-T\vb k=T(T^{-1}\vb h-\vb k)=-T(f(\vb x+\vb h)-f(\vb x)-Df(\vb x)\vb h)$$
we have $$\frac{\norm{g(\vb y+\vb k)-g(\vb y)-T\vb k}}{\norm{\vb k}}\le\frac{\norm{T}}{\varepsilon}\frac{\norm{f(\vb x+\vb h)-f(\vb x)-Df(\vb x)\vb h}}{\norm{\vb h}}$$
For all $\sigma\gt0$, there exists $\delta\gt0$ with $B_\delta(\vb x)\subseteq V$ such that for all $\vb h^*\in B_\delta(\vb 0)\setminus\{\vb 0\}$,
$\frac{\norm{f(\vb x+\vb h^*)-f(\vb x)-Df(\vb x)\vb h^*}}{\norm{\vb h^*}}\in B_\sigma(0)$,
and for that $\delta$, there exists $\eta\gt0$ with $\eta\le r$ such that for all $\vb k\in B_\eta(\vb 0)\setminus\{\vb 0\}$,
$\vb h\in B_\delta(\vb 0)\setminus\{\vb 0\}$, and hence $\frac{\norm{f(\vb x+\vb h)-f(\vb x)-Df(\vb x)\vb h}}{\norm{\vb h}}\in B_\sigma(0)$.
Therefore, $$\lim_{\vb k\to\vb 0}\frac{\norm{f(\vb x+\vb h)-f(\vb x)-Df(\vb x)\vb h}}{\norm{\vb h}}=0$$
Since $\frac{\norm{T}}{\varepsilon}$ does not depend on $\vb k$,
we have $$\lim_{\vb k\to\vb 0}\frac{\norm{T}}{\varepsilon}\frac{\norm{f(\vb x+\vb h)-f(\vb x)-Df(\vb x)\vb h}}{\norm{\vb h}}=0$$
By squeeze theorem, $$\lim_{\vb k\to\vb 0}\frac{\norm{g(\vb y+\vb k)-g(\vb y)-T\vb k}}{\norm{\vb k}}=0$$
Therefore, $$Dg(\vb y)=T=Df(g(\vb y))^{-1}$$
Since $g$ is differentiable, it is $C^0$.
Suppose $g$ is $C^k$ and $f$ is $C^{k+1}$ where $k\in N$.
Then every entry $D_if_j\circ g$ of $Jf\circ g$ is $C^k$.
Since $Jg(\vb y)=(Jf(g(\vb y)))^{-1}$ for all $\vb y\in f(V)$, by the lemma on inverses of matrix-valued functions, every entry $D_ig_j$ of $Jg$ is $C^k$,
implying $g$ is $C^{k+1}$.
By induction, if $f$ is $C^k$ for some non-zero $k$, then $g$ is $C^k$;
if $f$ is smooth, then $g$ is smooth.
$\blacksquare$
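Note. The derivative formula $Dg(\vb y)=Df(g(\vb y))^{-1}$ can be illustrated numerically in one variable; the following Python sketch uses $f(x)=x^3+x$, which has $f'(x)=3x^2+1\gt0$ everywhere, and inverts it with a hypothetical Newton-iteration helper `g` (not part of the theorem).

```python
# Inverse function theorem in one variable: f(x) = x^3 + x is strictly
# increasing, and the theorem gives g'(y) = 1 / f'(g(y)) for g = f^{-1}.
def f(x):
    return x ** 3 + x

def f_prime(x):
    return 3 * x * x + 1

def g(y, iters=60):
    # invert f numerically by Newton's method (a helper for this check only)
    x = 0.0
    for _ in range(iters):
        x -= (f(x) - y) / f_prime(x)
    return x

y = f(1.5)                      # = 4.875
x = g(y)
assert abs(x - 1.5) < 1e-10
h = 1e-6
g_prime_numeric = (g(y + h) - g(y - h)) / (2 * h)
assert abs(g_prime_numeric - 1.0 / f_prime(x)) < 1e-6
```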
Notation.
Given $(a_1,\ldots,a_n)=\vb a\in R^n$ and $(b_1,\ldots,b_m)=\vb b\in R^m$,
denote $(a_1,\ldots,a_n,b_1,\ldots,b_m)\in R^{n+m}$ by $(\vb a,\vb b)$.
Given $A\in R^{m\times (n+m)}$, let $A_x$ denote an $m\times n$ matrix such that ${A_x}_{ij}=A_{i,j}$,
and let $A_y$ denote an $m\times m$ matrix such that ${A_y}_{ij}=A_{i,j+n}$,
then clearly, for all $\vb h\in R^n$ and $\vb k\in R^m$,
$$A(\vb h,\vb k)=A_x\vb h+A_y\vb k$$
Lemma.
If $A\in R^{m\times (n+m)}$ and $A_y$ is invertible, then for every $\vb h\in R^n$,
there exists a unique $\vb k\in R^m$ such that $A(\vb h,\vb k)=\vb0$; namely,
$$\vb k=-(A_y)^{-1}A_x\vb h$$
(show proof)
Proof.
$A(\vb h,\vb k)=\vb0$ if and only if $A_x\vb h+A_y\vb k=\vb0$ if and only if $\vb k=-(A_y)^{-1}A_x\vb h$.
$\blacksquare$
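Note. For a minimal numeric instance of the lemma with $n=m=1$, the $1\times2$ "matrix" $A=[\,2\mid4\,]$ splits into $A_x=2$ and $A_y=4$; the following Python sketch checks the solution formula.

```python
# Block solve A(h, k) = 0 with A = [A_x | A_y] for n = m = 1:
# k = -(A_y)^{-1} A_x h; with A_x = 2, A_y = 4, h = 3, this gives k = -1.5.
a_x, a_y = 2.0, 4.0
h = 3.0
k = -(1.0 / a_y) * a_x * h
assert a_x * h + a_y * k == 0.0
```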
Implicit function theorem
Suppose $f:U\to R^m$ is smooth (or $C^k$ with $k\gt0$), where $U\subseteq R^{n+m}$ is open.
If $\vb a\in R^n$ and $\vb b\in R^m$ such that
- $(\vb a,\vb b)\in U$,
- $f(\vb a,\vb b)=\vb0$, and
- with $Df(\vb a,\vb b)$ denoted $A$, $A_y$ is invertible,
then there exist $V\subseteq R^{n+m}$ and $W\subseteq R^n$ such that
- $(\vb a,\vb b)\in V, \vb a\in W$;
- $V,W$ are open;
- for every $\vb x\in W$, there exists a unique $\vb y\in R^m$ such that $(\vb x,\vb y)\in V$ and $f(\vb x,\vb y)=\vb0$;
- if we define $f^*:W\to R^m$ such that $f^*(\vb x)$ is the unique $\vb y$ above,
then
- $f^*$ is smooth (or $C^k$);
- $f^*(\vb a)=\vb b$; and
- $Df^*(\vb a)=-(A_y)^{-1}A_x$.
(show proof)
Proof.
Define $F:U\to R^{n+m}$ by $F(\vb x,\vb y)=(\vb x,f(\vb x,\vb y))$.
Then $F$ is smooth (or $C^k$).
For some $\delta\gt0$, we have $B_\delta(\vb a,\vb b)\subseteq U$.
Suppose $(\vb h,\vb k)\in B_\delta(\vb0,\vb0)$, then $(\vb a+\vb h,\vb b+\vb k)\in U$.
Note that $f(\vb a,\vb b)=\vb0$. If we let $r(\vb h,\vb k)=f((\vb a,\vb b)+(\vb h,\vb k))-A(\vb h,\vb k)$, then
$$F((\vb a,\vb b)+(\vb h,\vb k))-F(\vb a,\vb b)=(\vb h,f(\vb a+\vb h,\vb b+\vb k))=(\vb h,A(\vb h,\vb k))+(\vb0,r(\vb h,\vb k))$$
hence
$$\lim_{(\vb h,\vb k)\to\vb0}\frac{\norm{F((\vb a,\vb b)+(\vb h,\vb k))-F(\vb a,\vb b)-(\vb h,A(\vb h,\vb k))}}{\norm{(\vb h,\vb k)}}
=\lim_{(\vb h,\vb k)\to\vb0}\frac{\norm{(\vb0,r(\vb h,\vb k))}}{\norm{(\vb h,\vb k)}}
=\lim_{(\vb h,\vb k)\to\vb0}\frac{\norm{r(\vb h,\vb k)}}{\norm{(\vb h,\vb k)}}
=0$$
by definition of total derivative.
Since $L(\vb h,\vb k)=(\vb h,A(\vb h,\vb k))$ is clearly linear, we have $DF(\vb a,\vb b)=L$.
Note that if $DF(\vb a,\vb b)(\vb h,\vb k)=\vb0$, then $(\vb h,A(\vb h,\vb k))=(\vb0,\vb0)$, so $\vb h=\vb0$ and $A(\vb 0,\vb k)=\vb0$,
implying $\vb k=-(A_y)^{-1}A_x\vb 0=\vb0$. This shows that $DF(\vb a,\vb b)$ is injective, and hence is invertible.
By inverse function theorem, there exists an open subset $V\subseteq U$ such that
$(\vb a,\vb b)\in V$, $F(V)$ is open, and the restriction $F|_V:V\to F(V)$ is bijective. Note that $(\vb a,\vb0)=F(\vb a,\vb b)\in F(V)$.
If we define $W$ to be the set of all $\vb x\in R^n$ such that $(\vb x,\vb0)\in F(V)$, then $W$ is open, being the preimage of the open set $F(V)$ under the continuous map $\vb x\mapsto(\vb x,\vb0)$, and $\vb a\in W$.
For every $\vb x\in W$, $(\vb x,\vb0)\in F(V)$, then there exists $(\vb x',\vb y)\in V$ such that $F(\vb x',\vb y)=(\vb x,\vb0)$,
which then implies that $\vb x'=\vb x$ and $f(\vb x,\vb y)=\vb0$. Note that $\vb y\in R^m$ and $(\vb x,\vb y)\in V$.
To show uniqueness, suppose $\vb y'\in R^m$ such that $(\vb x,\vb y')\in V$ and $f(\vb x,\vb y')=\vb0$,
then $F(\vb x,\vb y')=(\vb x,f(\vb x,\vb y'))=(\vb x,f(\vb x,\vb y))=F(\vb x,\vb y)$.
Since $(\vb x,\vb y'),(\vb x,\vb y)\in V$, $F|_V(\vb x,\vb y')=F|_V(\vb x,\vb y)$, and by injectivity of $F|_V$, $(\vb x,\vb y')=(\vb x,\vb y)$, hence $\vb y'=\vb y$.
Now define $f^*:W\to R^m$ such that $f^*(\vb x)$ is the unique $\vb y\in R^m$ with $(\vb x,\vb y)\in V$ and $f(\vb x,\vb y)=\vb0$.
Then for all $\vb x\in W$, $f(\vb x,f^*(\vb x))=\vb0$.
Since $f(\vb a,\vb b)=\vb0$, we have $f^*(\vb a)=\vb b$.
Also by inverse function theorem,
denote ${F|_V}^{-1}:F(V)\to V$ by $G$, then $G$ is smooth (or $C^k$),
and for all $\vb x\in W$, since $F|_V(\vb x,f^*(\vb x))=(\vb x,\vb 0)$, $G(\vb x,\vb 0)=(\vb x,f^*(\vb x))$.
Define $H:W\to V$ by $H(\vb x)=G(\vb x,\vb 0)$, then $H$ is smooth (or $C^k$).
Note that for all $\vb x\in W$, $H(\vb x)=G(\vb x,\vb 0)=(\vb x,f^*(\vb x))$.
Thus $f^*$ is smooth (or $C^k$).
Let $\vb x\in W$, then $DH(\vb x)$ is $(n+m)\times n$ with $DH(\vb x)_{i,j}=I_{ij}$ for all $i\in\{1,\ldots,n\}$
and $DH(\vb x)_{n+i,j}=Df^*(\vb x)_{ij}$ for all $i\in\{1,\ldots,m\}$.
Hence for all $\vb h\in R^n$, $DH(\vb x)\vb h=(\vb h,Df^*(\vb x)\vb h)$.
Since $(f\circ H)(\vb x)=f(\vb x,f^*(\vb x))=\vb 0$ for all $\vb x\in W$,
we have $Df(H(\vb x))DH(\vb x)=D(f\circ H)(\vb x)=0$.
With $Df(H(\vb a))=Df(\vb a,\vb b)=A$,
we have $ADH(\vb a)=0$.
It follows that, for all $\vb h\in R^n$,
$$A_x\vb h+A_yDf^*(\vb a)\vb h=A(\vb h,Df^*(\vb a)\vb h)=ADH(\vb a)\vb h=\vb0$$
implying $Df^*(\vb a)\vb h=-(A_y)^{-1}A_x\vb h$.
Therefore, $$Df^*(\vb a)=-(A_y)^{-1}A_x$$
$\blacksquare$
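Note. A standard instance of the theorem is the unit circle $f(x,y)=x^2+y^2-1=0$. At $(\vb a,\vb b)=(0.6,0.8)$ we have $A=Df(\vb a,\vb b)=[\,2a\mid2b\,]$, so $A_y=2b$ is invertible and the theorem predicts $Df^*(\vb a)=-(A_y)^{-1}A_x=-a/b=-0.75$; near $b\gt0$ the implicit function is $f^*(x)=\sqrt{1-x^2}$. The following Python sketch checks this.

```python
import math

# Implicit function theorem on the unit circle f(x, y) = x^2 + y^2 - 1 = 0:
# at (a, b) = (0.6, 0.8), the predicted slope of the implicit function
# is -(A_y)^{-1} A_x = -(2a)/(2b) = -a/b = -0.75.
a, b = 0.6, 0.8
assert abs(a * a + b * b - 1.0) < 1e-12

def f_star(x):
    # the locally unique solution y of f(x, y) = 0 near b > 0
    return math.sqrt(1.0 - x * x)

h = 1e-6
slope_numeric = (f_star(a + h) - f_star(a - h)) / (2 * h)
slope_theorem = -a / b
assert abs(slope_numeric - slope_theorem) < 1e-6
```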