From proof to simpler proof

Solve \sin(x)=2x for x.

By mere inspection, we have x=0.

Visually, it appears that 0 is the only solution (see Fig.1 or Fig. 2)

Fig. 1

Fig. 2

To show that 0 is the only solution of \sin(x)=2x analytically, let

f(x) = \sin(x)-2x

0 is a solution of \sin(x)=2x means

f(0) = 0\quad\quad\quad(1)

Suppose there is a solution

x^* \neq 0\quad\quad\quad(2)



Since f(x) is an function continuous and differentiable on (-\infty, +\infty),

by Lagrange’s Mean-Value Theorem (see “A Sprint to FTC“), there \exists c \in (0, x^*) such that


\overset{(1), (3)}{\implies} 0 = f'(c) x^*\quad\quad\quad(4)

\overset{(2)}{\implies} f'(c)=0\quad\quad\quad(5).

We know

f'(x) = \cos(x)-2.

From (5), we have




This is not possible since \forall x \in R, -1 \le \cos(x) \le 1.

A simpler alternative without direct applying Lagrange’s Mean-Value Theorem is:

f(x) = \sin(x)-2x \implies f'(x) = \cos(x) - 2 \overset{-1 \le \cos(x) \le 1}{\implies} f'(x) < 0\implies

f(x) =\sin(x)-2x is a strictly decreasing function.

Since f(0) = 0, \forall x<0, f(x) > 0 and \forall x>0, f(x) < 0.

Therefore, 0 is the only solution of f(x)=0. i.e.,

0 is the only solution of \sin(x) = 2x.


Analyzing Heron’s Algorithm

Finding the value of \sqrt{a} for a>0 is equivalent to seeking a positive number x whose square yields a: x^2=a. In other words, the solution to x^2-a=0.

We can find \sqrt{4} by solving x^2-4=0. It is 2 by mere inspection. \sqrt{9} can also be obtained by solving x^2-9=0, again, by mere inspection. But ‘mere inspection’ can only go so far. For Example, what is \sqrt{15241578750190521}?

Method for finding the square root of a positive real number date back to the acient Greek and Babylonian eras. Heron’s algorithm, also known as the Babylonian method, is an algorithm named after the 1^{st} – century Greek mathematician Hero of Alexandria, It proceeds as follows:

  1. Begin with an arbitrary positive starting value x_0.
  2. Let x_{k+1} be the average of x_k and \frac{a}{x_k}
  3. Repeat step 2 until the desired accuracy is achieved.

This post offers an analysis of Heron’s algorithm. Our aim is a better understanding of the algorithm through mathematics.

Let us begin by arbitrarily choosing a number x_0>0. If x_0^2-a=0, then x_0 is \sqrt{a}, and we have guessed the exact value of the square root. Otherwise, we are in one of the following cases:

Case 1: x_0^2-a > 0 \implies x_0^2 > a \implies \sqrt{a} x_0 > a \implies \sqrt{a} > \frac{a}{x_0} \implies \frac{a}{x_0} < \sqrt{a} < x_0

Case 2: x_0^2-a <0 \implies x_0 < \sqrt{a} \implies \sqrt{a} x_0 < a \implies \sqrt{a} < \frac{a}{x_0} \implies x_0 < \sqrt{a} < \frac{a}{x_0}

Both cases indicate that \sqrt{a} lies somewhere between \frac{a}{x_0} and x_0.

Let us define e_0 as the relative error of approximating \sqrt{a} by x_0:

e_0 =\frac{x_0-\sqrt{a}}{\sqrt{a}}

The closer e_0 is to 0, the better x_0 is as an approximation of \sqrt{a}.

Since x_0>0, \sqrt{a}>0,

e_0 + 1  = \frac{x_0-\sqrt{a}}{\sqrt{a}} + 1 = \frac{x_0}{\sqrt{a}} > 0\quad\quad\quad(1)

By (1),

x_0 = \sqrt{a} (e_0+1)\quad\quad\quad(2)

Let x_1 be the mid-point of x_0 and \frac{a}{x_0}:

x_1 = \frac{1}{2} (x_0 + \frac{a}{x_0})\quad\quad\quad(3)

and, e_1 the relative error of x_1 approximating \sqrt{a},

e_1 = \frac{x_1-\sqrt{a}}{\sqrt{a}}\quad\quad\quad(4)

We have

x_1-\sqrt{a}\overset{(3)}{=}\frac{1}{2}(x_0+\frac{a}{x_0}) - \sqrt{a} \overset{(2)}{=}\frac{1}{2}(\sqrt{a}(e_0+1)+\frac{a}{\sqrt{a}(e_0+1)})-\sqrt{a}=\frac{\sqrt{a}}{2}\frac{e_0^2}{e_0+1}\overset{(1)}{>}0\quad(5)

e_1 = \frac{x_1-\sqrt{a}}{\sqrt{a}} \overset{(5)}{=} \frac{\frac{\sqrt{a}}{2}\frac{e_0^2}{e_0+1}}{\sqrt{a}} =  \frac{1}{2}\frac{e_0^2}{e_0+1}\overset{(1)}{ > }0\quad\quad\quad(6)

By (5),

x_1 > \sqrt{a} \implies \sqrt{a} x_1 > a \implies \sqrt{a} > \frac{a}{x_1}\implies \frac{\sqrt{a}}{x_1} < \sqrt{a} < x_1

i.e., \sqrt{a} lies between \frac{a}{x_1} and x_1.

We can generate more values in stages by

x_{k+1} = \frac{1}{2}(x_k + \frac{a}{x_k}), \;k=1, 2, 3, ...\quad\quad(8)


\forall k, x_k >0.


e_k = \frac{x_k-\sqrt{a}}{\sqrt{a}}.

We have

x_k = \sqrt{a}(e_k+1)\quad\quad\quad(9)


e_{k+1} = \frac{x_{k+1}-\sqrt{a}}{\sqrt{a}}=\frac{\frac{1}{2}(\sqrt{a}(e_k+1) + \frac{a}{\sqrt{a}(e_k+1)})-\sqrt{a}}{\sqrt{a}}=\frac{1}{2}\frac{e_k^2}{e_k + 1}\quad\quad\quad(10)

Consequently, we can prove that

\forall k \ge 1, e_{k+1} > 0\quad\quad\quad(11)

by induction:

When k = 1, e_{1+1} \overset{(10)}{=} \frac{1}{2}\frac{e_1^2}{e_1+1} \overset{(6)} {>} 0.

Assume when k = p,

e_{p+1} =\frac{1}{2}\frac{e_p^2}{e_p^2+1} >0\quad\quad\quad(12)

When k = p + 1, e_{(p+1)+1} \overset{(10)}{=}\frac{1}{2}\frac{e_{p+1}^2}{e_{p+1} + 1}\overset{(12)}{>}0.

Since (11) implies

\forall k \ge 2, e_k > 0\quad\quad\quad(13)

(10) and (13) together implies

\forall k \ge 1, e_k > 0\quad\quad\quad(14)

It follows that

\forall k \ge 1,  0 \overset{(14)}{<} e_{k+1} \overset{(10)}{=} \frac{1}{2}\frac{e_k^2}{e_k+1} \overset{(14)}{<} \frac{1}{2}\frac{e_k^2}{e_k}=\frac{1}{2}e_k\quad\quad\quad(15)


\forall k \ge 1, 0 \overset{(14)}{<} e_{k+1} \overset{(10)}{=} \frac{1}{2}\frac{e_k^2}{e_k+1}\overset{(14)}{<}\frac{1}{2}\frac{e_k^2}{1}<e_k^2\quad\quad\quad(16)

(15) indicates that the error is cut at least in half at each stage after the first.

Therefore, we will reach a stage n that

0 < e_n < 1\quad\quad\quad(17)

It can be shown that

\forall k \ge 1, 0< e_{n+k} < e_n^{2^k}\quad\quad\quad(18)

by induction:

When k = 1,

0 \overset{(14)}{<} e_{n+1} \overset{(10)}{=}\frac{1}{2}\frac{e_n^2}{e_n+1} \overset{}{<} e_n^2=e_n^{2^1}\quad\quad\quad(19)

Assume when k = p,

0 < e_{n+p} < e_n^{2^p}\quad\quad\quad(20)

When k = p+1,

0 \overset{(14)}{<} e_{n+(p+1)} = e_{(n+p)+1} \overset{(16)}{<}e_{n+p}^2\overset{(20)}{<}(e_n^{2^p})^2 = e_n^{2^p\cdot 2} = e_n^{2^{p+1}}

From (17) and (18), we see that the error decreases quadratically.

Illustrated below is a script that implements the Heron’s Algorithm:

a   - the positive number whose square root we are seeking
x0  - the initial guess
eps - the desired accuracy
square_root(a, x0, eps) := block(
        if abs(xk^2-a) < eps then return(xk),

Suppose we want to find \sqrt{2} and start with x_0 = 99. Fig. 2 shows the Heron’s algorithm in action.

Fig. 1

Déjà vu!

The stages of a two stage rocket have initial masses m_1 and m_2 respectively and carry a payload of mass P. Both stages have equal structure factors and equal relative exhaust speeds. If the rocket mass, m_1+m_2, is fixed, show that the condition for maximal final speed is

m_2^2 + P m_2 = P m_1.

Find the optimal ratio \frac{m_1}{m_2} when \frac{P}{m_1+m_2} = b.

According to multi-stage rocket’s flight equation (see “Viva Rocketry! Part 2“), the final speed of a two stage rocket is

v = -c \log(1-\frac{e \cdot m_1}{m_1 + m_2 +P})- c\log(1-\frac{e \cdot m_2}{m_2+P})

Let m_0 = m_1 + m_2, we have

m_1 = m_0-m_2


v = -c \log(1-\frac{e \cdot (m_0-m_2)}{m_0 +P})- c\log(1-\frac{e \cdot m_2}{m_2+P})

Differentiate v with respect to m_2 gives

v' = \frac{c(e-1)e(2m_2 P-m_0 P +m_2^2)}{(P+m_2)(P-e m_2+m_2)(P+e m_2-e m_0+m_0)}= \frac{c(e-1)e(2m_2-m_0 P+m_2^2)}{(P+m_2)(P+(1-e)m_2)(P+e m_2 + (1-e)m_0)}\quad\quad\quad(1)

It follows that v'=0 implies

2m_2 P-m_0 P + m_2^2 =0.

That is, 2 m_2 P - (m_1+m_2) P + m_2^2 = (m_2-m_1) P + m_2^2=0. i.e.,

m_2^2 + P m_2 = P m_1\quad\quad\quad(2)

It is the condition for an extreme value of v. Specifically, the condition to attain a maximum (see Exercise-2)

When \frac{P}{m_1+m_2} = b, solving

\begin{cases} (m_2-m_1)P+m_2^2=0\\ \frac{P}{m_1+m_2} = b\end{cases}

yields two pairs:

\begin{cases} m_1=\frac{(\sqrt{b}\sqrt{b+1}+b+1)P}{b}\\m_2= -\frac{\sqrt{b^2+b}P+bP}{b}\end{cases}


\begin{cases} m_1= - \frac{(\sqrt{b}\sqrt{b+1}-b-1)P}{b}\\ m_2=\frac{\sqrt{b^2+b}P-bP}{b}\end{cases}\quad\quad\quad(3)

Only (3) is valid (see Exercise-1)


\frac{m_1}{m_2} = -\frac{(\sqrt{b}\sqrt{b+1}-b-1)P}{\sqrt{b^2+b}P-b P} = \frac{\sqrt{b+1}}{\sqrt{b}} = \sqrt{1+\frac{1}{b}}

The entire process is captured in Fig. 2.

Fig. 2

Exercise-1 Given b>0, 0<e<1, P>0, prove:

  1. - \frac{(\sqrt{b}\sqrt{b+1}-b-1)P}{b}>0
  2. \frac{\sqrt{b^2+b}P-bP}{b}>0

Exercise-2 From (1), prove the extreme value attained under (2) is a maximum.

Boosting rocket flight performance without calculus

Fig. 1

The stages of a two-stage rocket have initial masses m_1 and m_2 respectively and carry a payload of mass P. Both stages have equal structure factors e and equal relative exhaust speed c. The rocket mass, m_1+m_2 is fixed and \frac{P}{m_1+m_2}=b.

According to multi-stage rocket’s flight equation (see “Viva Rocketry! Part 2“), the final speed of a two-stage rocket is

v = -c \log(1-\frac{em_1}{m_1+m_2+P}) - c\log(1-\frac{em_2}{m_2+P})

Let a = \frac{m1}{m_2}, it becomes

v = -c\log(1-\frac{ea}{a+1+b(a+1)})-c \log(1-\frac{e}{1+b(a+1)})

where a>0, b>0, c>0, 0< e < 1. We will maximize v with an appropriate choice of a.

That is, given

v = c\log(\frac{(a+1)(b+1)((a+1)b+1)}{(1-e+(a+1)b)((1-e)a+(a+1)b+1})

where a>0, b>0, c>0, 0<e<1. Maximize v with an appropriate value of a.

The above optimization problem is solved using calculus (see “Viva Rocketry! Part 2“). However, there is an alternative that requires only high school mathematics with the help of a Computer Algebra System (CAS). This non-calculus approach places more emphasis on problem solving through mathematical thinking, as all symbolic calculations are carried out by the CAS (e.g., see Fig. 2). It also makes a range of interesting problems readily tackled with minimum mathematical prerequisites.

The fact that

\log is a monotonic increasing function \implies v_{max} = c\log(w_{max})


w = \frac{(a+1)(b+1)((a+1)b+1)}{(1-e+(a+1)b)((1-e)a+(a+1)b+1)}


w(1-e+(a+1)b)((1-e)a+(a+1)b+1) - (a+1)(b+1)((a+1)b+1)=0\quad\quad\quad(1)

(1) can be written as

A_1 a^2 + B_1 a +C_1= 0


A_1 = -bew+b^2w+bw-b^2-b

B_1 = e^2w-2bew-2ew+2b^2w+3bw+w-2b^2-3b-1,

C_1 = -bew-ew+b^2w+2bw+w-b^2-2b-1.

Since A_1 = 0 means

-b e w + b^2  w + b w - b^2 - b =0.

That is

w = -\frac{b+1}{e-b-1}.


\frac{(a+1)(b+1)((a+1)b+1)}{(1-e+(a+1)b)((1-e)a+(a+1)b+1)} = -\frac{b+1}{e-b-1}

for a gives a = 0 \implies A_1 \neq 0 if a > 0.

Hence, (1) is a quadratic equation. For it to have solution, its discriminant B_1^2-4A_1C_1 must be nonnegative, i.e.,

(e^2-2be-2e+b+1)^2 w^2-2(b+1)(2be^2+e^2-2be-2e+b+1) w +(b+1)^2 \geq 0\quad(2)


(e^2-2be-2e+b+1)^2 w^2-2(b+1)(2be^2+e^2-2be-2e+b+1)w+(b+1)^2 = 0\quad(3)

If e^2-2be-2e+b+1 \neq 0, (3) is a quadratic equation.

Solving (3) yields two solutions

w_1 = -\frac{(b+1)(2\sqrt{b(b+1)}e^2-2be^2-e^2-2\sqrt{b(b+1)}e+2be+2e-b-1)}{(e^2-2be-2e+b+1)^2},


Since 0 < e < 1,

w_1 - w_2 = -\frac{4(b+1)\sqrt{b(b+1)}(e-1)e}{(e^2-2be-2e+b+1)^2} > 0\quad(4)

(4) implies

w_1 > w_2

and, the solution to (2) is

w \leq w_2 or w \ge w_1


w \leq \frac{(b+1)(2\sqrt{b(b+1)}e^2+2be^2+e^2-2\sqrt{b(b+1)}e-2be-2e+b+1)}{(e^2-2be-2e+b+1)^2}\quad\quad\quad(4)


w \ge -\frac{(b+1)(2\sqrt{b(b+1)}e^2-2be^2-e^2-2\sqrt{b(b+1)}e+2be+2e-b-1)}{(e^2-2be-2e+b+1)^2}\quad\quad\quad(5)

We prove that (4) is true by showing (5) is false:

Consider w - w_1= 0:

\frac{(b+1)(e-1) e \cdot f(a)}{(1-e+ab+b)(a(1-e)+ab+b+1)(e^2-2be-2e+b+1)^2} = 0\quad\quad\quad(6)


f(a) = 2a\sqrt{b(b+1)}e^2+a^2be^2+be^2+e^2-2a^2b\sqrt{b(b+1)}

-4ab\sqrt{b(b+1)}e - 2b\sqrt{b(b+1)}e - 4a\sqrt{b(b+1)}e


-2e+2a^2b^2\sqrt{b(b+1)}+4ab^2\sqrt{b(b+1)} + 2b^2\sqrt{b(b+1)} +2a^2b\sqrt{b(b+1)}

+6ab\sqrt{b(b+1)} + 4b\sqrt{b(b+1)} + 2a\sqrt{b(b+1)} + 2\sqrt{b(b+1)}

+2a^2b^3 + 4ab^3 + 2b^3 +3a^2b^2 + 8ab^2 +5b^2+a^2b+4ab+4b+1.

It can be written as

A_2a^2 + B_2a + C_2\quad\quad\quad(7)


A_2 = be^2-2b\sqrt{b(b+1)}e-2b^2e-2be+2b^2\sqrt{b(b+1)}+2b\sqrt{b(b+1)}


B_2 = 2\sqrt{b(b+1)}e^2-4b\sqrt{b(b+1)}e -4\sqrt{b(b+1)}e




Since A_2 > 0 (see Exercise 1) and,

solve (7) for a yields

a = -\sqrt{1+\frac{1}{b}}.

It follows that for a > 0, f(a) > 0.

Consequently, w-w_1 is a negative quantity. i.e.,

w-w_1 <  0

which tells that (5) is false.

Hence, when e^2-2be-2e+b+1 \neq 0, the global maximum w_{max} is w_2.

Solving w = w_2 for a:

\frac{(a+1)(b+1)((a+1)b+1}{(1-e+(a+1)b)((1-e)a+(a+1)b+1)} = \frac{(b+1)(2\sqrt{b(b+1)}e^2+2be^2+e^2-2\sqrt{b(b+1)}e-2be-2e+b+1)}{(e^2-2be-2e+b+1)^2},

we have

a = \sqrt{1+ \frac{1}{b}}.


e^2-2be-2e+b+1 \neq 0 \implies w attains maximum at a = \sqrt{1+ \frac{1}{b}}.

In fact, w attains maxima at a = \sqrt{1+\frac{1}{b}} even when e^2-2be-2e+b+1 = 0, as shown below:

Solving e^2-2be-2e+b+1 = 0 for e, we have

e_1= -\sqrt{b(b+1)}+b+1 or e_2 = \sqrt{b(b+1)} + b + 1.

Only e_1 is valid (see Exercise-2),

When e = e_1,

w(\sqrt{1+\frac{1}{b}} )- w(a) = - \frac{(b+1)g(a)}{4 \sqrt{b(b+1)} (\sqrt{b(b+1)}+ab) (a\sqrt{b(b+1)}+b+1)}\quad(8)


g(a) = (2a^2b+4ab+2b+2a+2)\sqrt{b(b+1)}-2a^2b^2-4ab^2-2b^2-a^2b-4ab-3b-1

Solve quadratic equation g(a) = 0 for a yields

a = \sqrt{1+\frac{1}{b}}.

The coefficient of a^2 in g(a) is 2b\sqrt{b(b+1)}-2b^2-b, a negative quantity (see Exercise-3).

The implication is that g(a) is a negative quantity when a \neq \sqrt{1 + \frac{1}{b}}.

Hence, (8) is a positive quantity, i.e.,

e^2-2be-2e+b+1 = 0, a \neq \sqrt{1+\frac{1}{b}} \implies w(\sqrt{1+\frac{1}{b}})-w(a) > 0

We therefore conclude

\forall 0 < e < 1, b > 0, w attains its maximum at a = \sqrt{1+\frac{1}{b}}.

Fig. 2

Exercise-1 Prove:0<e<1, b>0 \implies

be^2-2b\sqrt{b(b+1)}e-2b^2e-2be+2b^2\sqrt{b(b+1)}+2b\sqrt{b(b+1)}+2b^3+3b^2+b > 0

Exercise-2 Prove: b > 0 \implies 0 <-\sqrt{b(b+1)} + b +1 <1

Exercise-3 Prove: b > 0 \implies 2b\sqrt{b(b+1)}-2b^2-b < 0

Prelude to Taylor’s theorem

As an application of derivative, we may prove the Binomial theorem that concerns the expansion of (1+x)^n as a polynomial. Namely,

(1+x)^n = 1+ a_1 x + a_2 x^2 + ... + a_n x^n\quad\quad\quad(1)

where a_k = \frac{n(n-1)(n-2)...(n-k+1)}{k!}, 1 \leq k \leq n, n \in N.

There are two steps:

Step 1) Prove (1+x)^n can be expressed as a polynomial 1+a_1 x + a_2 x^2 + ... + a_n x^n, i.e.,

(1+x)^n = 1 + \sum\limits_{i=1}^{n}a_i x^i

where a_ks are constants.

Step 2) Show that

a_k = \frac{n(n-1)(n-2)...(n-k+1)}{k!}

We use mathematical induction first.

When n = 1,

(1+x)^1 = 1+x = 1 +a_1 x

where a_1 = 1.

Assume when n=k, (1+x)^kis a polynomial:

(1+x)^k = 1+b_1 x + b_2 x + ... +b_k x^k\quad\quad\quad(2)

When n = k+1,

(1+x)^{k+1} = (1+x)^k (1+x).

By (2), it is

(1+b_1 x+b_2 x^2+ ...+ b_k x^k) (1+x)

= (1+\sum\limits_{i=1}^{k} b_i x^i)(1+x)

= 1+ x + \sum\limits_{i=1}^{k}b_i x^i (1+x)

= 1+x + \sum\limits_{i=1}^{k}(b_i x^i +b_i x^{i+1})

= 1+x +\sum\limits_{i=1}^{k}b_i x^i + \sum\limits_{i=1}^{k} b_i x^{i+1}

= 1+x + (b_1 x + b_2 x^2 +... +b_k x^k) + (b_1 x^2  + ... + b_{k-1} x^k + b_k x^{k+1})

= 1+ (b_1+1)x + (b_2 + b_1) x^2 + ... + (b_k  + b_{k-1}) x^k + b_k x^{k+1}

= 1 +a_1 x + a_2 x^2 + ... + a_k x^k + a_{k+1} x^{k+1}

where a_1 = b_1+1, a_2=b_2+b_1, ..., a_k = b_k + b_{k-1}, a_{k+1} = b_k.

Once (1) is established, we proceed to step 2) to construct a_k:

From (1),

\frac{d^k}{dx^k}(1+x)^n = \frac{d^k}{dx^k}(1+a_1 x + a_2 x^2 + ... + a_k x^k + ... +a_n x^n).

That is

n(n-1)(n-2)...(n-k+1)(1+x)^{n-k} = k(k-1)(k-2)...1 \cdot  {a_k} + \sum\limits_{i=1}^{n-k} c_i x^i\quad\quad\quad(3)

where c_is are constants.

Let x = 0, (3) becomes

n(n-1)(n-2)...(n-k+1) \cdot 1= k(k-1)(k-2)...1 \cdot {a_k} + 0


n(n-1)(n-2)...(n-k+1) = k!\;a_k\quad\quad\quad(4)

Solving (4) for a_k gives

a_k = \frac{n(n-1)(n-2)...(n-k+1)}{k!}.


\frac{n(n-1)(n-2)...(n-k+1)}{k!} = \frac{n(n-1)(n-k+1)\boxed{(n-k)(n-k-1)...1}}{\boxed{(n-k)(n-k-1)...1}\;k!}=\frac{n!}{(n-k)!\;k!},

a_k is often expressed as

a_k = \frac{n!}{(n-k)!\;k!}