Round Two on Finite Difference Approximations of Derivatives

There is another way to obtain the results stated in “Finite Difference Approximations of Derivatives”.

Let y'_i, y''_i denote y'(x_i) and y''(x_i) respectively, and

h=x_{i+1}-x_i=x_i-x_{i-1} >0.

We define

y'_i = \alpha_1 y_{i-1} + \alpha_2 y_{i+1}\quad\quad\quad(1).

By Taylor’s expansion around x_i,

y_{i-1} \approx y_i + y'_i(-h)+\frac{y''_i}{2!}(-h)^2\quad\quad\quad(1-1)

y_{i+1} \approx y_i + y'_ih+\frac{y''_i}{2!}h^2\quad\quad\quad(1-2)

Substituting (1-1), (1-2) into (1),

y'_i \approx \alpha_1 y_i - \alpha_1 y'_ih +\alpha_1\frac{y''_i}{2}h^2 + \alpha_2 y_i +\alpha_2y'_i h +\alpha_2\frac{y''_i}{2}h^2.

That is,

y'_i \approx (\alpha_1+\alpha_2) y_i +(\alpha_2-\alpha_1)h y'_i + (\alpha_1+\alpha_2)\frac{y''_i}{2}h^2.

It follows that

\begin{cases} \alpha_1+\alpha_2=0 \\ (\alpha_2-\alpha_1)h=1\\ (\alpha_1+\alpha_2)\frac{h^2}{2}=0 \end{cases}\quad\quad\quad(1-3)

Fig. 1

Solving (1-3) for \alpha_1, \alpha_2 (see Fig. 1) yields

\alpha_1=-\frac{1}{2h}, \alpha_2=\frac{1}{2h}.


y'_i \approx -\frac{1}{2h} y_{i-1} + \frac{1}{2h} y_{i+1} = \frac{y_{i+1}-y_{i-1}}{2h}


\boxed{y'_i \approx \frac{y_{i+1}-y_{i-1}}{2h}}

Now, let

y''_i = \alpha_1y_{i-1}+\alpha_2 y_i + \alpha_3 y_{i+1}.


y_{i-1} \approx  y_i - y'_i h + \frac{y''_i}{2!} h^2


y_{i+1} \approx y_i +y'_i h + \frac{y''_i}{2!}h^2,

we have,

y''_i \approx \alpha_1 y_i - \alpha_1 y'_i h + \frac{\alpha_1}{2}y''_i h^2 + \alpha_2 y_i + \alpha_3 y_i + \alpha_3 y'_i h + \frac{\alpha_3}{2}y''_i h^2.

It leads to

\begin{cases} \alpha_1+\alpha_2+\alpha_3=0\\(-\alpha_1+\alpha_3)h=0\\(\frac{\alpha_1}{2}+\frac{\alpha_3}{2})h^2=1 \end{cases}

Fig. 2

whose solution (see Fig. 2) is

\alpha_1 = \frac{1}{h^2}, \alpha_2 = -\frac{2}{h^2}, \alpha_3=\frac{1}{h^2}.


y''_i \approx \frac{1}{h^2}y_{i-1}-\frac{2}{h^2}y_i +\frac{1}{h^2}y_{i+1}=\frac{y_{i-1}-2y_i+y_{i+1}}{h^2}


\boxed{ y''_i \approx \frac{y_{i+1}-2y_i + y_{i-1}}{h^2}}
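Both boxed approximations can be sanity-checked numerically. A minimal Python sketch (the test function y=\sin(x), the point x=1 and the step h are assumptions, chosen so the exact derivatives are known):

```python
import math

# Hypothetical test function: y = sin(x), so y' = cos(x) and y'' = -sin(x).
y = math.sin
x, h = 1.0, 1e-4

d1 = (y(x + h) - y(x - h)) / (2 * h)           # central difference for y'_i
d2 = (y(x - h) - 2 * y(x) + y(x + h)) / h**2   # central difference for y''_i

assert abs(d1 - math.cos(x)) < 1e-7    # second-order accurate
assert abs(d2 + math.sin(x)) < 1e-4
```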

A Primer on Mathematical Epidemiology

In Memory of Dr. Li WenLiang (1986-2020)

This post is an introduction to deterministic models of infectious diseases and their Computer Algebra-Aided mathematical analysis.

We assume the following for the simplistic SI model:

(A1-1) The population N under consideration remains a constant.

(A1-2) The population is divided into two categories: the infectious and the susceptible. Their percentages are denoted by i(t) and s(t) respectively. At t=0, i(0) = i_0.

(A1-3) Each infectious individual has \lambda encounters with other individuals per unit time. Upon an encounter with an infectious individual, a susceptible becomes infected.

When an infectious host has \lambda encounters with the population, \lambda\cdot s(t) susceptibles become infected. There are N\cdot i(t) infectious hosts in total at time t. It means that within any time interval [t, t+\Delta t], the infectious will increase by N\cdot i(t) \cdot \lambda s(t)\cdot \Delta t, i.e.,

N\cdot i(t+\Delta t) -N\cdot i(t) = N\cdot i(t)\cdot\lambda s(t)\cdot\Delta t.

Cancelling out the N's,

\frac{i(t+\Delta t)-i(t)}{\Delta t} = \lambda\cdot i(t) s(t)

and so

\lim\limits_{\Delta t\rightarrow 0}\frac{i(t+\Delta t)-i(t)}{\Delta t} = \lambda i(t) s(t).

That is,

\frac{d i(t)}{dt} = \lambda\cdot  i(t) s(t).

Deducing further from (A1-2) (i+s=1), we obtain

\begin{cases} \frac{d i}{dt} = \lambda i (1-i) \\ i(0) = i_0 \end{cases}\quad\quad\quad(1-1)

Let’s examine (1-1) qualitatively first.

We see that the SI model has two critical points:

i_1^* = 0, \quad i_2^* = 1.


\forall i \in (0, 1), \frac{d i}{dt} = \lambda i(1-i) > 0 \implies \lim\limits_{t \rightarrow \infty} i(t)=1.

This indicates that in the presence of any initial infectious hosts, the entire population will be infected in time. The rate of infection is at its peak when i = 0.5.

The qualitative results can be verified quantitatively by Omega CAS Explorer.

Fig. 1-1

From Fig. 1-1, we see that

i(t) = \frac{1}{1-(1-\frac{1}{i_0})e^{-\lambda t}}.


\lim\limits_{t \rightarrow \infty} i(t) = \lim\limits_{t \rightarrow \infty} \frac{1}{1-(1-\frac{1}{i_0})e^{-\lambda t}} = 1.
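The limit can also be confirmed by direct numerical integration of (1-1). A minimal sketch (the values of \lambda and i_0 below are assumptions for illustration), comparing a forward-Euler integration against the closed-form i(t):

```python
import math

lam, i0 = 0.5, 0.01            # assumed sample values of lambda and i(0)

def i_exact(t):
    # closed-form solution of (1-1)
    return 1 / (1 - (1 - 1 / i0) * math.exp(-lam * t))

# forward Euler for di/dt = lambda*i*(1-i)
i, dt = i0, 1e-3
for _ in range(int(30 / dt)):            # integrate to t = 30
    i += dt * lam * i * (1 - i)

assert abs(i - i_exact(30.0)) < 1e-3     # numerical and exact solutions agree
assert abs(i_exact(60.0) - 1.0) < 1e-6   # i(t) -> 1 as t grows
```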

Fig. 1-2 confirms that the higher the number of initial infectious hosts (i_0), the sooner the entire population becomes infected (i = 1).

Fig. 1-2

The SI model does not take into consideration any medical practice in combating the spread of infectious disease. It is pessimistic and unrealistic.

An improved model is the SIR model. The assumptions are

(A2-1) See (A1-1)

(A2-2) See (A1-2)

(A2-3) See (A1-3)


(A2-4) The fraction of the infectious recovering from the disease per unit time is \mu. The recovered are without immunity and can be infected again.

By (A2-1) – (A2-4), the modified model is

N \frac{d i}{dt} = \lambda N s i - \mu N i


That is,

\begin{cases} \frac{d i}{dt} = \lambda i \cdot(1-i)-\mu\cdot i \\ i(0) = i_0 \end{cases}


Letting

\sigma = \frac{\lambda}{\mu},

we have

\begin{cases} \frac{d i}{dt} = -\lambda i \cdot(i - (1-\frac{1}{\sigma}))\\ i(0) = i_0 \end{cases}\quad\quad\quad(2-1)

The new model has two critical points:

i_1^*=0, \quad i_2^* = 1-\frac{1}{\sigma}.


\frac{d^2i}{dt^2} =\frac{d}{dt}(\frac{di}{dt})=\frac{d}{di}(-\lambda i (i-(1-\frac{1}{\sigma})))\cdot \frac{di}{dt}=-2\lambda (i-\frac{1}{2}(1-\frac{1}{\sigma}))\cdot \frac{di}{dt}


\frac{d^2i}{dt^2} = 2\lambda^2 i\cdot (i-\frac{1}{2}(1-\frac{1}{\sigma}))\cdot (i-(1-\frac{1}{\sigma}))

Without solving (2-1), we extract from it the following qualitative behavior:

Case-1 \sigma > 1

[2-1-1] \lim\limits_{t \rightarrow \infty} i(t) = 1-\frac{1}{\sigma}

[2-1-2] \forall i>1-\frac{1}{\sigma}, \frac{di}{dt} < 0.

[2-1-3] \forall i < 1-\frac{1}{\sigma}, \frac{di}{dt} > 0.

[2-1-4] \forall i > 1-\frac{1}{\sigma}, \frac{d^2i}{dt^2}  > 0.

[2-1-5] \forall (\frac{1}{2}(1-\frac{1}{\sigma}) < i < (1-\frac{1}{\sigma})), \frac{d^2i}{dt^2} < 0.

[2-1-6] \forall(0 < i < \frac{1}{2}(1-\frac{1}{\sigma})), \frac{d^2i}{dt^2} > 0.

Case-2 \sigma < 1

[2-2-1] \lim\limits_{t\rightarrow \infty} i(t) = 0.

[2-2-2] \forall i, \frac{di}{dt} < 0.

[2-2-3] \forall i, \frac{d^2i}{dt^2} >0.

Case-3 \sigma = 1

[2-3-1] \lim\limits_{t\rightarrow \infty} i(t) = 0.

[2-3-2] \forall i, \frac{di}{dt} < 0.

[2-3-3] \forall i, \frac{d^2i}{dt^2} >0.

The cases are illustrated by solving (2-1) analytically using Omega CAS Explorer (see Fig. 2-1,2-2,2-3)

Fig. 2-1 \sigma> 1, i approaches 1-\frac{1}{\sigma} asymptotically.

Fig. 2-2 \sigma < 1, i approaches 0 asymptotically.

Fig. 2-3 \sigma = 1, i approaches 0 asymptotically.

Fig. 2-4 \sigma > 1, i's monotonicity depends on i_0.

Fig. 2-4 shows that for \sigma > 1, if \frac{1}{2}(1-\frac{1}{\sigma})< i_0 < 1-\frac{1}{\sigma}, then i increases along a concave curve ([2-1-5]). Otherwise, \forall i_0 < \frac{1}{2}(1-\frac{1}{\sigma}), i increases along a convex curve first ([2-1-6]); the curve turns concave after i reaches \frac{1}{2}(1-\frac{1}{\sigma}). However, \forall i_0 > (1-\frac{1}{\sigma}), i monotonically decreases along a convex curve ([2-1-4]).

Fig. 2-5 \sigma < 1, i monotonically decreases.

Fig. 2-5 illustrates the case \sigma < 1.

We also have:

Fig. 2-6 \sigma =1, i monotonically decreases.

From these results we may draw the following conclusion:

If \sigma >1, the monotonicity of i(t) depends on the level of i_0. Otherwise (\sigma = \frac{\lambda}{\mu} \le 1), i(t) will decrease and approach 0, since the rate of recovery \mu from medical treatment is at least on par with the rate of infection \lambda.
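The conclusion can be spot-checked by integrating (2-1) numerically. A minimal forward-Euler sketch (the \lambda, \mu, i_0 values below are assumptions):

```python
def limit_i(lam, mu, i0, t_end=400.0, dt=1e-3):
    # forward Euler for di/dt = lambda*i*(1-i) - mu*i, model (2-1)
    i = i0
    for _ in range(int(t_end / dt)):
        i += dt * (lam * i * (1 - i) - mu * i)
    return i

# sigma > 1: i tends to 1 - 1/sigma
assert abs(limit_i(0.5, 0.25, 0.01) - (1 - 0.25 / 0.5)) < 1e-6
# sigma < 1: i tends to 0
assert limit_i(0.25, 0.5, 0.2) < 1e-6
```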

This model is only valid for infectious diseases that confer no immunity, such as the common cold and dysentery. Those who recover from such a disease become susceptible again and can be re-infected.

However, for many diseases such as smallpox and measles, the recovered are immune and therefore fall into a category that is neither infectious nor susceptible. To model this type of disease, a new mathematical model is needed.

Enter the Kermack-McKendrick model of infectious disease with immunity.

There are three assumptions:

(A3-1) The total population N does not change.

(A3-2) Let i(t),  s(t) and r(t) denote the percentage of the infectious, susceptible and recovered respectively. At t=0, i(0) = i_0, s(0) = s_0, r(0) = r_0 = 0. The recovered are the individuals who have been infected and then recovered from the disease. They will not be infected again or transmit the infection to others.

(A3-3) \lambda is the unit time number of encounters with the infectious, \mu the unit time recoveries from the disease.

For the recovered, we have

N\frac{dr}{dt} = \mu N  i.


\begin{cases} \frac{d i(t)}{dt} = \lambda s(t) i(t) - \mu i(t),\;i(0) = i_0\quad\quad\; (3-1-1)\\ \frac{d s(t)}{dt} = -\lambda s(t)i(t),\;s(0) = s_0\quad\quad\quad\quad\;\;(3-1-2) \\ \frac{d r(t)}{dt} = \mu i(t),\;r(0)=r_0=0\quad\quad\quad\quad\;\;\;\;(3-1-3)\\i_0, s_0 \in (0, 1), i_0+s_0=1\quad\quad\quad\quad\quad\;\;\;\;(3-1-4)\end{cases}

This system of differential equations appears to defy any attempt to obtain an analytic solution (i.e., no solution can be expressed in terms of known functions).

Numerical treatments for two sets of given \lambda, \mu and (s_0, i_0) are depicted in Fig. 3-1 and Fig. 3-2.

Fig. 3-1 \lambda = 0.25, \mu=0.1, i_0= 0.15, s_0=0.85

Fig. 3-2 \lambda = 0.25, \mu=0.1, i_0=0.75, s_0=0.25
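Before the general analysis, the model can be previewed numerically with the parameters of Fig. 3-1: i should peak when s crosses \rho = \frac{\mu}{\lambda}, the infection should die out, and s should stay positive. A minimal forward-Euler sketch:

```python
# forward Euler for (3-1-1) and (3-1-2), using the parameters of Fig. 3-1
lam, mu = 0.25, 0.1
i, s = 0.15, 0.85
rho = mu / lam                           # = 0.4
dt, peak_s, i_prev = 1e-3, None, 0.15
for _ in range(int(400 / dt)):
    di = (lam * s * i - mu * i) * dt
    ds = (-lam * s * i) * dt
    i, s = i + di, s + ds
    if peak_s is None and i < i_prev:    # first step where i starts to fall
        peak_s = s
    i_prev = i

assert abs(peak_s - rho) < 1e-2          # i peaks when s is near rho
assert i < 1e-4 and s > 0                # infection dies out; s stays positive
```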

However, only a rigorous analysis in general terms gives the correct interpretations of, and insights into, the model.

To this end, we let

\rho = \frac{\mu}{\lambda}.

For \frac{ds}{dt} \ne 0 (i.e., i \ne 0, s \ne 0),

\frac{\frac{di}{dt}}{\frac{ds}{dt}} = \frac{di}{ds} = -1+\rho\frac{1}{s}



It has the following qualitative properties:

[3-1] \forall s < \rho, \frac{di}{ds} > 0 \implies i increases.

[3-2] s = \rho, \frac{di}{ds} = 0\implies i reaches its maximum.

[3-3] \forall s>\rho, \frac{di}{ds} < 0\implies i decreases.

The analytical solution to

\begin{cases} \frac{di}{ds} = -1+\frac{\rho}{s}\\ i(s_0) = i_0\end{cases}\quad\quad\quad(3-2)

(see Fig. 3-3) is

i(s) = \rho\log(\frac{s}{s_0}) -s +i_0+s_0 \overset{(3-1-4)}{=} \rho\log(\frac{s}{s_0}) -s +1\quad\quad\quad(3-3)

Fig. 3-3

Notice that

\forall s, (s, 0) is a critical point of (3-1-1) and (3-1-2).


all points on the s-axis of the s-i phase plane are critical points of (3-1-1) and (3-1-2).

By a theorem of qualitative theory of ordinary differential equations (see Fred Brauer and John Nohel: The Qualitative Theory of Ordinary Differential Equations, p. 192, Lemma 5.2),

\forall t>0, i(t)>0\quad\quad\quad(3-4)

Moreover, let s' = s_0e^{-\frac{1}{\rho}},

i(s')=1-s' + \rho\log(\frac{s'}{s_0}) <1 + \rho\log(\frac{s'}{s_0}) = 1+ \rho\log(\frac{s_0e^{-\frac{1}{\rho}}}{s_0})=1+\rho\cdot(-\frac{1}{\rho})=0


i(s') < 0\quad\quad\quad(3-5)

Since s' = s_0e^{-\frac{1}{\rho}} = \frac{s_0}{e^{\frac{1}{\rho}}} \overset{\frac{1}{\rho}>0\implies e^{\frac{1}{\rho}}>e^0=1}{<} s_0, together, (3-5) and i(s_0) = i_0 >0 implies

\exists s_{\infty} : s' < s_{\infty}<s_0 \ni 1-s_{\infty} + \rho \log(\frac{s_{\infty}}{s_0})=0


\exists s_{\infty} : 0 \overset{s' = s_0e^{-\frac{1}{\rho}}> 0}{<} s_{\infty} < s_0 \ni 1-s_{\infty} + \rho\log(\frac{s_{\infty}}{s_0})=0.

Clearly, (s_{\infty}, 0) is a critical point of (3-1-1) and (3-1-2). Lemma 5.2 thus ensures

\forall t>0, s(t) > s_{\infty} > 0\quad\quad\quad(3-6)

It follows that

\forall t>0, \frac{ds(t)}{dt} = -\lambda i(t)\cdot s(t) \overset{(3-4), (3-6)}{<} 0\quad\quad\quad(3-7)

To the list ([3-1]-[3-3]), we now add:

[3-4] \forall t_2> t_1>0, s(t_2) \overset{(3-7)}{<} s(t_1)

[3-5] \lim\limits_{t \rightarrow \infty} i(t) = 0,  \lim\limits_{t \rightarrow \infty} s(t) = s_{\infty}

[3-6] \forall t>0, \frac{dr(t)}{dt} = \mu i(t) \overset{(3-4)}{>}0

And so, for all (s_0, i_0) : 0 < s_0 <1, 0<i_0 < 1,

if \rho < s_0 <1 then

\forall  s_+: \rho \le s_+ < s_0, i_+ = i(s_+) \overset{[3-3], [3-4], (3-1-4)}{>} i(s_0)=i_0.


i_{max} \overset{[3-2]}{=} i(\rho) \overset{(3-3)}{=} \rho\log(\frac{\rho}{s_0})-\rho+1\overset{s_0 > \rho \implies \frac{\rho}{s_0} < 1 \implies \log(\frac{\rho}{s_0}) < 0}{<}1\quad\quad\quad(3-8),

we have

\forall s_+: 0 < s_+ \le \rho,  i_+ = i(s_+) \le i_{max} < 1.

Fig. 3-4

If s_0=\rho then,

i_+ = i(s_+ < \rho) \overset{[3-2], [3-4]}{<} i_{max} \overset{(3-8)}{<} 1.

Fig. 3-5

If 0 <s_0 < \rho then

i_+=i(s_+ < s_0) \overset{[3-1], [3-4]}{<} i(s_0)=i_0 < 1.

Fig. 3-6

In fact, for all finite t>0,

\forall s_+ : 0<s_+ <s_0, i_+ = i(s_+) <1.

Thus, the orbits of (3-1-1) and (3-1-2) have the form illustrated in Fig. 3-7.

Fig. 3-7

For example,

Fig. 3-8 \rho=0.3

What we see is that as time t advances, (i(t), s(t)) moves along the curve (3-3) in the direction of decreasing s. Consequently, if s_0 is less than \rho, then i(t) decreases monotonically to 0, and s(t) decreases to s_{\infty}. Therefore, if a small group of infectious i_0 is introduced into the population with the susceptibles s_0, with s_0 < \rho, the disease will die out rapidly. On the other hand, if s_0 is greater than \rho, then i(t) increases as s(t) decreases to \rho. Only after attaining its maximum at s=\rho, i(t) starts to decrease when the number of susceptibles falls below the threshold value \rho.

We therefore conclude:

An epidemic will occur only if the number of susceptibles in a population exceeds the threshold value \rho.

It means a larger \rho is preferred.

To increase \rho = \frac{\mu}{\lambda}, the recovery rate \mu is boosted through adequate medical care and treatment. Meanwhile, \lambda is reduced by quarantine and social distancing.

In addition to increasing \rho, we can also decrease s_0 by immunizing the population.

If the number of susceptibles s_0 is initially greater than, but close to the threshold value \rho:

\rho < s_0 \approx \rho\quad\quad\quad(3-9),

and i_0 is very small compared to s_0:

i_0 \ll s_0\quad\quad\quad(3-10),

we can estimate the number of individuals who ultimately contracted the disease.

From (3-1-4), we have

s(t) + i(t) + r(t) = 1\quad\quad\quad(3-11)

and \lim\limits_{t\rightarrow \infty}(s(t) + i(t) + r(t)) yields

s_{\infty} + 0 + r_{\infty} = 1\quad\quad\quad(3-12)


s_0-s_{\infty} + i_0 - r_{\infty} =0 \overset {(3-10)}{\implies} s_0 - s_{\infty} - r_{\infty} = 0


r_{\infty} = s_0 - s_{\infty}\quad\quad\quad(3-13)

From (3-9), we deduce that

1< \frac{s_0}{\rho} \approx 1 \implies 0 < \frac{s_0}{\rho}-1 \approx 0 \implies 0 <\frac{s_0-\rho}{\rho} \approx 0\implies 0 <\frac{s_0-\rho}{\rho} \ll 1


\rho < s_0 \approx \rho \implies 0<\frac{s_0-\rho}{\rho} \ll 1\quad\quad\quad(3-14)

(3-1-2) / (3-1-3) gives

\frac{\frac{ds}{dt}}{\frac{dr}{dt}}=\frac{ds}{dr} = -\frac{\lambda}{\mu}s=-\frac{1}{\rho}s


\begin{cases} \frac{ds}{dr} =-\frac{1}{\rho}s\\s(r_0=0)=s_0\end{cases}


s = s_0 e^{-\frac{r}{\rho}}\quad\quad\quad(3-15)

After substituting (3-15) in (3-1-3),

\frac{dr}{dt} = \mu i = \mu(1-r-s) \overset{(3-15)}{=}\mu(1 - r -s_0 e^{-\frac{r}{\rho}})\quad\quad\quad(3-16)

In view of the fact that

\frac{r}{\rho} \overset{[3-6]}{<} \frac{s_0-s_{\infty}}{\rho} \overset{(3-9)}{\approx} \frac{s_0-s_{\infty}}{s_0}<1\quad\quad\quad(3-17),

we approximate the term e^{-\frac{r}{\rho}} in (3-16) with its Taylor expansion up to the second order (see Fig. 3-9)

Fig. 3-9

The result is an approximation of equation (3-16):

\frac{dr}{dt} = \mu\left((1-s_0) + (\frac{s_0}{\rho}-1)r - \frac{s_0}{2\rho^2}r^2\right)


It can be solved analytically (see Fig. 3-10).

Fig. 3-10

As a result,

\lim\limits_{t \rightarrow \infty} r(t) = 2\rho(1-\frac{\rho}{s_0})\quad\quad\quad(3-18)

It follows from

2\rho(1-\frac{\rho}{s_0}) = 2\rho (\frac{s_0-\rho}{s_0}) =2(s_0-\rho)(\frac{\rho}{s_0-\rho+\rho})=2(s_0-\rho)(\frac{1}{1+\frac{s_0-\rho}{\rho}}) \overset{(3-14)}{\approx} 2(s_0-\rho)

that (3-18) yields

r_{\infty} \approx 2(s_0-\rho)

Namely, the size of the epidemic is roughly 2(s_0-\rho). Consequently, by (3-13),

s_{\infty} \approx s_0 - 2(s_0-\rho).

The above analysis leads to the following threshold theorem of epidemiology:

(a) An epidemic occurs if and only if s_0 exceeds the threshold \rho.

(b) If \rho<s_0 \approx \rho and i_0 \ll s_0, then after the epidemic, the number of susceptible individuals is reduced by an amount approximately 2(s_0-\rho), namely, s_{\infty} \approx s_0-2(s_0-\rho).
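Part (b) is easy to check numerically. A minimal sketch (the values of \rho and s_0 below are assumptions satisfying (3-9)): in the i_0 \ll s_0 limit, s_{\infty} solves s_0 - s_{\infty} + \rho\log(\frac{s_{\infty}}{s_0}) = 0, which bisection can locate; r_{\infty} = s_0 - s_{\infty} should then be close to 2(s_0-\rho):

```python
import math

rho, s0 = 0.95, 0.97      # assumed sample values with rho < s0 close to rho

def h(s):
    # i(s) from (3-3) in the i0 -> 0 limit
    return s0 - s + rho * math.log(s / s0)

# bracket the root below s0 as in (3-5)/(3-6), then bisect
lo, hi = s0 * math.exp(-1 / rho), (s0 + rho) / 2   # h(lo) < 0 < h(hi)
for _ in range(100):
    mid = (lo + hi) / 2
    if h(mid) < 0:
        lo = mid
    else:
        hi = mid
s_inf = (lo + hi) / 2

r_inf = s0 - s_inf                          # (3-13)
assert abs(r_inf - 2 * (s0 - rho)) < 1e-3   # r_infinity ~ 2(s0 - rho)
```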

We can also obtain (b) without solving for r(t):

From (3-3), as t\rightarrow \infty,

0 =\rho \log(\frac{s_{\infty}}{s_0})-s_{\infty} + i_0 + s_0

\overset{i_0 \ll s_0}{\approx} s_0 - s_{\infty} + \rho\log(\frac{s_0-s_0+s_{\infty}}{s_0})

= s_0 -s_{\infty} + \rho\log(\frac{s_0-(s_0-s_{\infty})}{s_0})

= s_0 -s_{\infty} + \rho\log(1-\frac{s_0-s_{\infty}}{s_0})

When s_0-s_{\infty} is small compared to s_0 (see (3-17)), we approximate \log(1-\frac{s_0-s_{\infty}}{s_0}) with a truncated Taylor series after two terms. Namely,

\log(1-\frac{s_0-s_{\infty}}{s_0}) = -\frac{s_0-s_{\infty}}{s_0} - \frac{1}{2}(\frac{s_0-s_{\infty}}{s_0})^2+ ...


0 = s_0-s_{\infty}+ \rho(-\frac{s_0-s_{\infty}}{s_0}-\frac{1}{2}(\frac{s_0-s_{\infty}}{s_0})^2)

=(s_0-s_{\infty})(1-\frac{\rho}{s_0} - \frac{\rho}{2s_0^2}(s_0-s_{\infty}))

Solving for s_0-s_{\infty} yields

s_0-s_{\infty} = 2s_0(\frac{s_0}{\rho}-1)

= 2(s_0-\rho+\rho)(\frac{s_0-\rho}{\rho})

= 2(s_0-\rho)(\frac{s_0-\rho+\rho}{\rho})

= 2(s_0-\rho)(1+\frac{s_0-\rho}{\rho})

\overset{(3-14)}{\approx} 2(s_0-\rho)

Exercise-1 For the Kermack-McKendrick model, show that \forall t>0,

  1. s(t)+i(t)+r(t)=1.
  2. 0<r(t)<1.

Restate Feynman’s “Great Identity”

In a letter dated July 7, 1948, Richard Feynman, a Nobel laureate in physics (1965) made a claim:

I am the possessor of a swanky new scheme to do each problem in terms of one with one less energy denominator. It is based on the great identity \frac{1}{ab} = \int \limits_{0}^{1} \frac{1}{(ax+b(1-x))^2}\;dx.

(The Beat of a Different Drum: The Life and Science of Richard Feynman, by Jagdish Mehra, page 262)

Assuming non-zero constants a, b are both real but otherwise arbitrary, let’s check the validity of Feynman’s “great identity”.

If a \ne b,

\int \frac{1}{(ax+b(1-x))^2}\;dx

= \int \frac{1}{((a-b)x+b)^2}\;dx

=  \int \frac{1}{a-b}\cdot \frac{a-b}{((a-b)x+b)^2}\;dx

= \frac{1}{a-b}\int \frac{((a-b)x+b)'}{((a-b)x+b)^2}\;dx

= \frac{1}{a-b}\cdot \frac{-1}{(a-b)x+b}.

Using Leibniz’s rule,

\int\limits_{0}^{1}\frac{1}{(ax+b(1-x))^2}\;dx = \frac{1}{a-b}\cdot\frac{-1}{(a-b)x+b}\bigg| _{0}^{1} =\frac{1}{a-b}(\frac{-1}{a}-\frac{-1}{b})=\frac{1}{ab}.

When a=b,

\int \frac{1}{(ax+b(1-x))^2}\;dx = \int \frac{1}{b^2}\;dx =\frac{1}{b^2}x


\int \limits_{0}^{1}\frac{1}{(ax+b(1-x))^2}\;dx = \frac{1}{b^2}x\bigg|_{0}^{1} = \frac{1}{b^2}=\frac{1}{ab}.

All is as Feynman claimed:

\int\limits_{0}^{1}\frac{1}{(ax+b(1-x))^2}\;dx = \frac{1}{ab}\quad\quad\quad(\star)

There is something amiss:

If a and b have opposite signs i.e., ab<0 then the right hand side of (\star) is negative. But the integrand is squared so the integral on the left hand side of (\star) is never negative, no matter what a and b may be.

Let’s figure it out!

In its full glory, Leibniz’s rule we used to obtain (\star) is

If the real-valued function F on an open interval I in R has the continuous derivative f and a, b \in I then \int\limits_{a}^{b} f(x)\; dx = F(b)-F(a).

Essentially, the rule requires the integrand f to be a continuous function on an open interval that contains a and b.

Solving ax+b(1-x)=0 for x yields

x = \frac{b}{b-a},

the singularity of the integrand \frac{1}{(ax+b(1-x))^2} in (\star).

For ab<0, we consider the following two cases:

Case (1-1) (a>0, b<0) \implies (a>0, b<0, b-a<0)\implies (\frac{b}{b-a}>0, \frac{b}{b-a} = \frac{b-a+a}{b-a} = 1+\frac{a}{b-a}<1)\implies 0<\frac{b}{b-a}<1

Case (1-2) (a<0, b>0) \implies (a<0, b>0, b-a>0) \implies (\frac{b}{b-a}>0, \frac{b}{b-a} = \frac{b-a+a}{b-a} = 1+\frac{a}{b-a}<1) \implies  0<\frac{b}{b-a}<1

From both cases, we see that

when ab<0, \frac{1}{(ax+b(1-x))^2} has a singularity in (0, 1) \implies \frac{1}{(ax+b(1-x))^2} is not continuous in (0, 1).

Applying Leibniz’s rule to \int\limits_{0}^{1} \frac{1}{(ax+b(1-x))^2}\;dx in spite of the integrand’s singularity thus produced an absurd result.

However, ab>0 paints a different picture.

We have

Case (2-0) a=b \implies \frac{1}{(ax+b(1-x))^2}=\frac{1}{b^2} \implies no singularity

Case (2-1) (a>b, a>0, b>0) \implies (b-a<0, a>0, b>0) \implies \frac{b}{b-a}<0

Case (2-2) (a<b, a>0, b>0) \implies (b-a>0, a>0, b>0)

\implies \frac{b}{b-a} = \frac{b-a+a}{b-a}=1+\frac{a}{b-a}>1

Case (2-3) (a>b, a<0, b<0) \implies (b-a<0, a<0, b<0) \implies \frac{b}{b-a}=1+\frac{a}{b-a}>1

Case (2-4) (a<b, a<0, b<0) \implies (b-a>0, a<0, b<0) \implies \frac{b}{b-a} <0

All cases show that when ab>0, the integrand has no singularity in (0,1).

It means that \frac{1}{(ax+b(1-x))^2} is continuous in (0, 1) and therefore, Leibniz’s rule applies.

So let’s restate Feynman’s “great identity”:

a\cdot b > 0 \iff \frac{1}{ab} = \int\limits_{0}^{1}\frac{1}{(ax+b(1-x))^2}\;dx
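For sample values with ab>0 (the pairs below are assumptions), the restated identity can be verified numerically with a midpoint rule:

```python
def feynman_integral(a, b, n=200000):
    # midpoint-rule estimate of the integral over [0, 1]
    h = 1.0 / n
    return sum(h / (a * (k + 0.5) * h + b * (1 - (k + 0.5) * h)) ** 2
               for k in range(n))

for a, b in [(2.0, 3.0), (-1.5, -0.5)]:     # both satisfy a*b > 0
    assert abs(feynman_integral(a, b) - 1 / (a * b)) < 1e-6
```

(For ab<0 the integrand blows up inside (0, 1) and the integral diverges, so no such check is possible.)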

Exercise-1 Evaluate \int\limits_{0}^{1}\frac{1}{(ax+b(1-x))^2}\;dx using Omega CAS Explorer.

(hint : for ab>0, specify a > b or b>a)

Oh! Matryoshka!

Given polynomial f(x) = a_0 + a_1 x+a_2 x^2 + ... + a_{n-1}x^{n-1}+a_n x^n, we wish to evaluate the integral

\int \frac{f(x)}{(x-a)^p}\;dx, \quad p \in N^+\quad\quad\quad(1)

When p = 1,

\int \frac{f(x)}{x-a} \;dx= \int \frac{f(x)-f(a)+f(a)}{x-a}\;dx

= \int \frac{f(x)-f(a)}{x-a}\;dx + \int \frac{f(a)}{x-a}\;dx

=\int \frac{f(x)-f(a)}{x-a}\;dx + f(a)\cdot \log(x-a).


f(x) = a_0 + a_1x + a_2x^2 + ... + a_{n-1}x^{n-1} + a_n x^n


f(a) = a_0 + a_1 a + a_2 a^2 + ... + a_{n-1}a^{n-1} + a_n a^n

It follows that

f(x)-f(a) = a_1(x-a) + a_2(x^2-a^2) + ... + a_{n-1}(x^{n-1}-a^{n-1}) + a_n (x^n-a^n).

That is

f(x)-f(a) = \sum\limits_{k=1}^{n}a_k(x^k-a^k)

By the fact (see “Every dog has its day“) that

x^k-a^k =(x-a)\sum\limits_{i=1}^{k}x^{k-i}a^{i-1},

we have

f(x)-f(a) = \sum\limits_{k=1}^{n}a_k(x-a)\sum\limits_{i=1}^{k}x^{k-i}a^{i-1}=(x-a)\sum\limits_{k=1}^{n}(a_k\sum\limits_{i=1}^{k}x^{k-i}a^{i-1})


\frac{f(x)-f(a)}{x-a}=  \sum\limits_{k=1}^{n}(a_k\sum\limits_{i=1}^{k}x^{k-i}a^{i-1})\quad\quad\quad(2)


\int\frac{f(x)}{x-a}\;dx = \int \sum\limits_{k=1}^{n}(a_k \sum\limits_{i=1}^{k}x^{k-i}a^{i-1})\;dx + f(a)\log(x-a)

=\sum\limits_{k=1}^{n}(a_k \sum\limits_{i=1}^{k}\int x^{k-i}a^{i-1}\; dx)+ f(a)\log(x-a)


\int \frac{f(x)}{x-a}\;dx = \sum\limits_{k=1}^{n}(a_k\sum\limits_{i=1}^{k}\frac{x^{k-i+1}}{k-i+1}a^{i-1})+ f(a)\log(x-a)

Let us now consider the case when p>1:

\int \frac{f(x)}{(x-a)^p}\; dx

=\int \frac{f(x)-f(a)+f(a)}{(x-a)^p}\;dx

=\int  \frac{f(x)-f(a)}{(x-a)^p} + \frac{f(a)}{(x-a)^p}\;dx

=\int \frac{f(x)-f(a)}{(x-a)}\cdot\frac{1}{(x-a)^{p-1}} + \frac{f(a)}{(x-a)^p}\;dx

= \int \frac{f(x)-f(a)}{x-a}\cdot\frac{1}{(x-a)^{p-1}}\;dx + \int\frac{f(a)}{(x-a)^p}\; dx

\overset{(2)}{=}\int \frac{g(x)}{(x-a)^{p-1}}\;dx + \frac{f(a)(x-a)^{1-p}}{1-p}


where g(x) = \frac{f(x)-f(a)}{x-a}=\sum\limits_{k=1}^{n}(a_k\sum\limits_{i=1}^{k}x^{k-i}a^{i-1}), a polynomial of degree n-1.

What emerges from the two cases of p is a recursive algorithm for evaluating (1):

Given polynomial f(x) = \sum\limits_{k=0}^{n} a_k x^k,

\int \frac{f(x)}{(x-a)^p} \;dx\;(p \in N^+) = \begin{cases}p=1: \sum\limits_{k=1}^{n}(a_k\sum\limits_{i=1}^{k}\frac{x^{k-i+1}}{k-i+1}a^{i-1})+ f(a)\log(x-a) \\p>1:  \int \frac{g(x)}{(x-a)^{p-1}}\;dx + \frac{f(a)(x-a)^{1-p}}{1-p}, \\ \quad\quad\quad g(x) = \frac{f(x)-f(a)}{x-a}=\sum\limits_{k=1}^{n}(a_k\sum\limits_{i=1}^{k}x^{k-i}a^{i-1}). \end{cases}
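The recursion can be sketched in code. A minimal Python version (the function names, the spot-check polynomial, and the pole a are all mine), representing f(x) by its coefficient list [a_0, ..., a_n] and returning the antiderivative as a callable, which is then checked against the integrand by a numerical derivative:

```python
import math

def f_eval(coeffs, x):
    # polynomial with coefficient list [a0, a1, ..., an]
    return sum(c * x**k for k, c in enumerate(coeffs))

def antiderivative(coeffs, a, p):
    """Return F with F'(x) = f(x)/(x-a)^p, following the recursion above."""
    fa = f_eval(coeffs, a)
    if p == 1:
        def F(x):
            s = sum(coeffs[k] * sum(x**(k - i + 1) / (k - i + 1) * a**(i - 1)
                                    for i in range(1, k + 1))
                    for k in range(1, len(coeffs)))
            return s + fa * math.log(x - a)
        return F
    # g(x) = (f(x)-f(a))/(x-a): coefficient of x^m is sum_{k>m} a_k * a^(k-m-1)
    g = [sum(coeffs[k] * a**(k - m - 1) for k in range(m + 1, len(coeffs)))
         for m in range(len(coeffs) - 1)]
    G = antiderivative(g, a, p - 1)
    return lambda x: G(x) + fa * (x - a)**(1 - p) / (1 - p)

# spot-check: numerical derivative of F matches f(x)/(x-a)^p
coeffs, a, p, x, h = [1.0, 2.0, 0.0, 3.0], 1.0, 3, 2.5, 1e-6
F = antiderivative(coeffs, a, p)
lhs = (F(x + h) - F(x - h)) / (2 * h)
rhs = f_eval(coeffs, x) / (x - a)**p
assert abs(lhs - rhs) < 1e-4
```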

Exercise-1 Optimize the above recursive algorithm (hint: examine how it handles the case when f(x)=0)

Integration of Trigonometric Expressions

We will introduce an algorithm for obtaining indefinite integrals such as

\int \frac{(1+\sin(x))}{\sin(x)(1+\cos(x))}\;dx

or, in general, integral of the form

\int R(\sin(x), \cos(x))\;dx\quad\quad\quad(1)

where R is any rational function R(p, q), with p=\sin(x), q=\cos(x).


Let

t = \tan(\frac{x}{2})\quad\quad\quad(2)

Solving (2) for x, we have

x = 2\cdot\arctan(t)\quad\quad\quad(3)

which provides

\frac{dx}{dt} = \frac{2}{1+t^2}\quad\quad\quad(4)


\sin(x) =2\sin(\frac{x}{2})\cos(\frac{x}{2})\overset{\cos^2(\frac{x}{2})+\sin^2(\frac{x}{2})=1}{=}\frac{2\sin(\frac{x}{2})\cos(\frac{x}{2})}{\cos^2(\frac{x}{2})+\sin^2(\frac{x}{2})}=\frac{2\frac{\sin(\frac{x}{2})}{\cos(\frac{x}{2})}}{1+\frac{\sin^2(\frac{x}{2})}{\cos^2(\frac{x}{2})}}=\frac{2\tan(\frac{x}{2})}{1+\tan^2(\frac{x}{2})}


\sin(x) = \frac{2 t}{1+t^2}\quad\quad\quad(5)


\cos(x) = \cos^2(\frac{x}{2})-\sin^2(\frac{x}{2})=\frac{\cos^2(\frac{x}{2})-\sin^2(\frac{x}{2})}{\cos^2(\frac{x}{2})+\sin^2(\frac{x}{2})}=\frac{1-\frac{\sin^2(\frac{x}{2})}{\cos^2(\frac{x}{2})}}{1+\frac{\sin^2(\frac{x}{2})}{\cos^2(\frac{x}{2})}}=\frac{1-\tan^2(\frac{x}{2})}{1+\tan^2(\frac{x}{2})}

That is,

\cos(x) = \frac{1-t^2}{1+t^2}\quad\quad\quad(6)
We also have (see “Finding Indefinite Integrals” )

\int f(x)\;dx \overset{x=\phi(t)}{=} \int f(\phi(t))\cdot\frac{d\phi(t)}{dt}\;dt.


\int R(\sin(x), \cos(x))\;dx \overset{(2), (4), (5), (6)}{=} \int R(\frac{2t}{1+t^2}, \frac{1-t^2}{1+t^2})\cdot\frac{2}{1+t^2}\;dt,

and (1) is reduced to an integral of rational functions in t.

Example-1 Evaluate \int \csc(x)\;dx.

Solution: \csc(x) = \frac{1}{\sin(x)}\implies \int \csc(x)\;dx = \int \frac{1}{\sin(x)}\;dx

= \int \frac{1}{\frac{2t}{1+t^2}}\cdot\frac{2}{1+t^2}\;dt=\int\frac{1}{t}\;dt = \log(t) = \log(\tan(\frac{x}{2})).
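The antiderivative found in Example-1 can be sanity-checked numerically (the interval [0.5, 2.0] is an assumption, chosen inside (0, \pi) where \csc(x) is continuous): a midpoint-rule estimate of \int_a^b \csc(x)\,dx should match \log(\tan(\frac{x}{2})) evaluated at the endpoints.

```python
import math

a, b, n = 0.5, 2.0, 400000
h = (b - a) / n
# midpoint-rule estimate of the integral of csc(x) over [a, b]
midpoint = sum(h / math.sin(a + (k + 0.5) * h) for k in range(n))
exact = math.log(math.tan(b / 2)) - math.log(math.tan(a / 2))
assert abs(midpoint - exact) < 1e-8
```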

Example-2 Evaluate \int \sec(x)\;dx.

Solution: \sec(x) = \frac{1}{\cos(x)}\implies \int \sec(x)\; dx =\int \frac{1}{\cos(x)}\;dx

=  \int \frac{1}{\frac{1-t^2}{1+t^2}}\cdot \frac{2}{1+t^2}\; dt=\int \frac{2}{1-t^2}\;dt=\int \frac{2}{(1+t)(1-t)}\;dt=\int \frac{1}{1+t} + \frac{1}{1-t}\;dt

=\int \frac{1}{1+t}\;dt - \int \frac{-1}{1-t}\;dt

=\log(1+t) -\log(1-t) =\log\frac{1+t}{1-t}=\log(\frac{1+\tan(\frac{x}{2})}{1-\tan(\frac{x}{2})}).

According to CAS (see Fig. 1),

Fig. 1

However, the two results are equivalent as a CAS-aided verification (see Fig. 2) confirms their difference is a constant (see Corollary 2 in “Sprint to FTC“).

Fig. 2

Exercise-1 Evaluate \int \csc(x)\;dx with a CAS and show that its result is equivalent to the one obtained in Example-1.

Exercise-2 Try

\int \frac{1}{\sin(x)+1}\;dx

\int \frac{1}{\sin(x)+\cos(x)}\;dx

\int \frac{1}{(2+\cos(x))\sin(x)}\;dx

\int \frac{1}{5+4\sin(x)}\;dx

\int \frac{1}{2\sin(x)-\cos(x)+5}\;dx

and of course,

\int \frac{1+\sin(x)}{\sin(x)(1+\cos(x))}\;dx

Deriving Lagrange’s multiplier

We wish to consider a special type of optimization problem:

Find the maximum (or minimum) of a function f(x,y) subject to the condition g(x,y)=0\quad\quad(1)

If it is possible to solve g(x,y)=0 for y so that it can be expressed explicitly as y=\psi(x), then by substituting y in (1), the problem becomes

Find the maximum (or minimum) of a single variable function f(x, \psi(x)).

In the case that y can not be obtained from solving g(x,y)=0, we re-state the problem as:

Find the maximum (or minimum) of a single variable function z=f(x,y) where y is a function of x, implicitly defined by g(x, y)=0\quad\quad\quad(2)

Following the traditional procedure of finding the maximum (or minimum) of a single variable function, we differentiate z with respect to x:

\frac{dz}{dx} = f_x(x,y) + f_y(x,y)\cdot \frac{dy}{dx}\quad\quad\quad(3)


Differentiating g(x, y)=0 with respect to x gives

g_x(x,y) + g_y(x,y)\cdot \frac{dy}{dx}=0\quad\quad\quad(4)

By grouping g(x, y)=0 and (3), we have

\begin{cases} \frac{dz}{dx}= f_x(x, y)+f_y(x, y)\cdot \frac{dy}{dx}\\ g(x,y) = 0\end{cases}\quad\quad\quad(5)

The fact that \frac{dz}{dx}= 0 at any stationary point x^* means for all (x^*, y^*) where g(x^*, y^*)=0,

\begin{cases} f_x(x^*, y^*)+f_y(x^*, y^*)\cdot \frac{dy}{dx}\vert_{x=x^*}=0 \\ g(x^*,y^*) = 0\end{cases}\quad\quad\quad(6)

If g_y(x^*,y^*) \ne 0 then from (4),

\frac{dy}{dx}\vert_{x=x^*} = \frac{-g_x(x^*, y^*)}{g_y(x^*, y^*)}

Substitute it into (6),

\begin{cases} f_x(x^*, y^*)+f_y(x^*, y^*)\cdot (\frac{-g_x(x^*, y^*)}{g_y(x^*, y^*)})=f_x(x^*, y^*)+g_x(x^*, y^*)\cdot (\frac{-f_y(x^*, y^*)}{g_y(x^*, y^*)})=0\\ g(x^*,y^*) = 0\end{cases}\quad\quad\quad(7)

Letting \lambda = \frac{-f_y(x^*, y^*)}{g_y(x^*, y^*)}, we have

f_y(x^*, y^*) + \lambda g_y(x^*, y^*) =0\quad\quad\quad(8)

Combining (7) and (8) gives

\begin{cases} f_x(x^*, y^*)+\lambda g_x(x^*, y^*) = 0 \\ f_y(x^*, y^*)+\lambda g_y(x^*, y^*)=0 \\ g(x^*, y^*) = 0\end{cases}

It follows that to find the stationary points of z, we solve

\begin{cases} f_x(x, y)+\lambda g_x(x, y) = 0 \\ f_y(x, y)+\lambda g_y(x, y)=0 \\ g(x, y) = 0\end{cases}\quad\quad\quad(9)

for x, y and \lambda.

This is known as the method of Lagrange’s multiplier.

Let F(x,y,\lambda) = f(x,y) + \lambda g(x,y).


F_x(x,y,\lambda) = f_x(x,y) + \lambda g_x(x,y),

F_y(x,y,\lambda)=f_y(x,y) + \lambda  g_y(x,y),

F_{\lambda}(x,y,\lambda) = g(x, y),

(9) is equivalent to

\begin{cases} F_x(x, y, \lambda)=0 \\ F_y(x,y,\lambda)=0 \\ F_{\lambda}(x, y, \lambda) = 0\end{cases}\quad\quad\quad(10)

Let’s look at some examples.

Example-1 Find the minimum of f(x, y) = x^2+y^2 subject to the condition that x+y=4

Let F(x, y, \lambda) = x^2+y^2+\lambda(x+y-4).


Solving

\begin{cases}F_x=2x+\lambda=0  \\ F_y = 2y+\lambda = 0 \\ F_{\lambda} = x+y-4=0\end{cases}

for x, y, \lambda gives x=y=2, \lambda=-4.

When x=2, y=2, x^2+y^2=2^2+2^2=8.

\forall  (x, y) \ne (2, 2), x+y=4, we have

(x-2)^2 + (y-2)^2 > 0.

That is,

x^2-4x+4 + y^2-4y+4 = x^2+y^2-4(x+y)+8 \overset{x+y=4}{=} x^2+y^2-16+8>0.


x^2+y^2>8, (x,y) \ne (2,2).

The target function x^2+y^2 with constraint x+y=4 indeed attains its global minimum at (x, y) = (2, 2).
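A brute-force search over the constraint line confirms this (the grid range and step are assumptions):

```python
# scan x over [-10, 10]; y is forced by the constraint x + y = 4
best_x, best_val = None, float("inf")
for k in range(200001):
    x = -10.0 + k * 1e-4
    y = 4.0 - x
    v = x * x + y * y
    if v < best_val:
        best_x, best_val = x, v

assert abs(best_x - 2.0) < 1e-3      # minimizer (2, 2)
assert abs(best_val - 8.0) < 1e-6    # minimum value 8
```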

I first encountered this problem during junior high school and solved it:

(x-y)^2 \ge 0 \implies x^2+y^2 \ge 2xy

x+y=4\implies (x+y)^2=16\implies x^2+2xy +y^2=16\implies 2xy=16-(x^2+y^2)

x^2+y^2 \ge 16-(x^2+y^2) \implies x^2+y^2 \ge 8\implies z_{min} = 8.

I solved it again in high school when quadratic equations were discussed:

x+y=4 \implies y =4-x

z=x^2+y^2 \implies z = x^2+(4-x)^2 \implies 2x^2-8x+16-z=0

\Delta =  64-4 \cdot 2\cdot (16-z) \ge 0 \implies z \ge 8\implies z_{min} = 8

In my freshman calculus class, I solved it yet again:

x+y=4 \implies y=4-x

z = x^2+(4-x)^2

\frac{dz}{dx} = 2x+2(4-x)(-1)=2x-8+2x=4x-8

\frac{dz}{dx} =0  \implies x=2

\frac{d^2 z}{dx^2} = 4 > 0 \implies x=2, z_{min}=2^2+(4-2)^2=8

Example-2 Find the shortest distance from the point (1,0) to the parabola y^2=4x.

We minimize f = (x-1)^2+y^2 where y^2=4x.

If we eliminate y^2 in f, then f = (x-1)^2+4x. Solving \frac{df}{dx} = 2x+2=0 gives x=-1. Clearly, this is not valid, for it would suggest that y^2=-4 from y^2=4x, an absurdity.

By Lagrange’s method, we solve

\begin{cases} 2(x-1)-4\lambda=0 \\2y\lambda+2y = 0 \\y^2-4x=0\end{cases}

Fig. 1

The only valid solution is x=0, y=0, \lambda=-\frac{1}{2}. At (x, y) = (0, 0), f=(0-1)^2+0^2=1. It is the global minimum:

\forall (x, y) \ne (0, 0), y^2=4x \implies x>0.

(x-1)^2+y^2 \overset{y^2=4x}{=}(x-1)^2+4x=x^2-2x+1+4x=x^2+2x+1\overset{x>0}{>}1=f(0,0)
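The global minimum can likewise be confirmed by brute force over the parabola, parametrized by y = t, x = \frac{t^2}{4} (the grid is an assumption):

```python
best_t, best_val = None, float("inf")
for k in range(-40000, 40001):
    t = k * 1e-3
    x, y = t * t / 4, t                 # a point on y^2 = 4x
    v = (x - 1) ** 2 + y ** 2
    if v < best_val:
        best_t, best_val = t, v

assert abs(best_t) < 1e-9               # minimizer (x, y) = (0, 0)
assert abs(best_val - 1.0) < 1e-9       # shortest squared distance is 1
```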

Example-3 Find the shortest distance from the point (a, b) to the line Ax+By+C=0.

We want to find a point (x_0, y_0) on the line Ax+By+C=0 so that the distance between (a, b) and (x_0, y_0) is minimal.

To this end, we minimize (x_0-a)^2+(y_0-b)^2 where Ax_0+By_0+C=0 (see Fig. 2)

Fig. 2

We found that

x_0=\frac{aB^2-bAB-AC}{A^2+B^2}, y_0=\frac{bA^2-aAB-BC}{A^2+B^2}

and the distance between (a, b) and (x_0, y_0) is

\sqrt{(x_0-a)^2+(y_0-b)^2}=\frac{|Aa+Bb+C|}{\sqrt{A^2+B^2}}\quad\quad\quad(11)
To show that (11) is the minimal distance, let (x, y) \ne (x_0, y_0) be any point on Ax+By+C=0.

Let d_1 = x-x_0, d_2=y-y_0; then

x = x_0 + d_1, y=y_0 + d_2, (d_1, d_2) \ne (0, 0).

Since Ax+By+C=0,

A(x_0+d_1)+B(y_0+d_2)+C=0.

That is,

Ax_0+By_0+C+Ad_1+Bd_2=0.
By the fact that Ax_0+By_0+C=0, we have

Ad_1 + Bd_2 =0\quad\quad\quad(12)

Compute (x-a)^2+(y-b)^2 - ((x_0-a)^2+(y_0-b)^2) (see Fig. 3)

Fig. 3



After some rearrangement and factoring, it becomes

\frac{-2C}{A^2+B^2}(Ad_1+Bd_2)+\frac{-2B}{A^2+B^2}(Ad_1+Bd_2)+\frac{-2A}{A^2+B^2}(Ad_1+Bd_2) + d_1^2+d_2^2

By (12), it reduces to

d_1^2 + d_2^2.

This is clearly a positive quantity. Therefore,

\forall (x, y) \ne (x_0, y_0), Ax+By+C=0 \implies (x-a)^2+(y-b)^2> (x_0-a)^2+(y_0-b)^2
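The formulas for (x_0, y_0) can be spot-checked numerically (the sample A, B, C, a, b are assumptions): the point must lie on the line, and its distance to (a, b) must equal \frac{|Aa+Bb+C|}{\sqrt{A^2+B^2}}.

```python
import math

A, B, C, a, b = 3.0, -4.0, 12.0, 2.0, 5.0   # assumed sample values

x0 = (a * B**2 - b * A * B - A * C) / (A**2 + B**2)
y0 = (b * A**2 - a * A * B - B * C) / (A**2 + B**2)

assert abs(A * x0 + B * y0 + C) < 1e-9      # (x0, y0) lies on the line
d = math.hypot(x0 - a, y0 - b)
assert abs(d - abs(A * a + B * b + C) / math.hypot(A, B)) < 1e-9
```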

Introducing Operator Delta

The r^{th} order finite difference of function f(x) is defined by

\Delta^r f(x) = \begin{cases} f(x+1)-f(x), r=1\\ \Delta(\Delta^{r-1}f(x)), r > 1\end{cases}

From this definition, we have

\Delta f(x) = \Delta^1 f(x) = f(x+1)-f(x)


\Delta^2 f(x) = \Delta (\Delta^{2-1} f(x))

= \Delta (\Delta f(x))

= \Delta( f(x+1)-f(x))

= (f(x+2)-f(x+1)) - (f(x+1)-f(x))

= f(x+2)-2f(x+1)+f(x)

as well as

\Delta^3 f(x) = \Delta (\Delta^2 f(x))

= \Delta (f(x+2)-2f(x+1)+f(x))

= (f(x+3)-2f(x+2)+f(x+1)) - (f(x+2)-2f(x+1)+f(x))

= f(x+3)-3f(x+2)+3f(x+1)-f(x)

The function shown below generates \Delta^r f(x), r:1\rightarrow 5 (see Fig. 1).

delta_(g, n) := block([d : subst(x + 1, x, g) - g],
    print(expand(d)),
    for i : 2 thru n do (d : subst(x + 1, x, d) - d, print(expand(d))));


Fig. 1

Compare to the result of expanding (f(x)-1)^r=\sum\limits_{i=0}^r(-1)^i \binom{r}{i} f(x)^{r-i}, r:1\rightarrow 5 (see Fig. 2)

Fig. 2

It seems that

\Delta^r f(x) = \sum\limits_{i=0}^r(-1)^i \binom{r}{i} f(x+r-i)\quad\quad\quad(1)
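Before proving (1), it can be spot-checked in exact integer arithmetic: the recursive definition of \Delta^r and the right-hand side of (1) should agree (the test polynomial and the evaluation point x=10 are assumptions):

```python
from math import comb

def delta(f, x, r):
    # r-th finite difference by the recursive definition
    if r == 1:
        return f(x + 1) - f(x)
    return delta(f, x + 1, r - 1) - delta(f, x, r - 1)

f = lambda x: x**5 - 3 * x**2 + 7       # assumed test polynomial
for r in range(1, 6):
    rhs = sum((-1)**i * comb(r, i) * f(10 + r - i) for i in range(r + 1))
    assert delta(f, 10, r) == rhs       # identity (1) holds exactly
```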

Let’s prove it!

We have already shown that (1) is true for r= 1, 2, 3.

Assuming (1) is true when r=k-1 \ge 3:

\Delta^{k-1} f(x) = \sum\limits_{i=0}^{k-1}(-1)^i \binom{k-1}{i} f(x+k-1-i)\quad\quad\quad(2)

When r=k,

\Delta^k f(x) = \Delta(\Delta^{k-1} f(x))

\overset{(2)}{=}\Delta (\sum\limits_{i=0}^{k-1}(-1)^i \binom{k-1}{i}f(x+k-1-i))

=\sum\limits_{i=0}^{k-1}(-1)^i \binom{k-1}{i}f(x+1+k-1-i)-\sum\limits_{i=0}^{k-1}(-1)^i \binom{k-1}{i}f(x+k-1-i)

=(-1)^0 \binom{k-1}{0}f(x+k-0)

+\sum\limits_{i=1}^{k-1}(-1)^i \binom{k-1}{i}f(x+k-i)-\sum\limits_{i=0}^{k-2}(-1)^i \binom{k-1}{i}f(x+k-1-i)


\overset{\binom{k-1}{0} = \binom{k-1}{k-1}=1}{=}

f(x+k)+ \sum\limits_{i=1}^{k-1}(-1)^i \binom{k-1}{i}f(x+k-i)-\sum\limits_{i=0}^{k-2}(-1)^i \binom{k-1}{i}f(x+k-1-i) -(-1)^{k-1}f(x)

=f(x+k)+ \sum\limits_{i=1}^{k-1}(-1)^i \binom{k-1}{i}f(x+k-i)+\sum\limits_{i=0}^{k-2}(-1)^{i+1}\binom{k-1}{i}f(x+k-1-i) -(-1)^{k-1}f(x)

\overset{j=i+1, i:0 \rightarrow k-2\implies j:1 \rightarrow k-1}{=}

f(x+k)+ \sum\limits_{i=1}^{k-1}(-1)^i\binom{k-1}{i}f(x+k-i) + \sum\limits_{j=1}^{k-1}(-1)^j \binom{k-1}{j-1}f(x+k-j)-(-1)^{k-1}f(x)

= f(x+k)+ \sum\limits_{i=1}^{k-1}(-1)^i\binom{k-1}{i}f(x+k-i) + \sum\limits_{i=1}^{k-1}(-1)^i\binom{k-1}{i-1}f(x+k-i)+(-1)^k f(x)

= f(x+k) + \sum\limits_{i=1}^{k-1}(-1)^i f(x+k-i) (\binom{k-1}{i} + \binom{k-1}{i-1})+(-1)^k f(x)

\overset{\binom{k-1}{i} + \binom{k-1}{i-1}=\binom{k}{i}}{=}

f(x+k)+ \sum\limits_{i=1}^{k-1}(-1)^i \binom{k}{i} f(x+k-i)+(-1)^k f(x)

= (-1)^0 \binom{k}{0}f(x+k-0)+\sum\limits_{i=1}^{k-1}(-1)^i \binom{k}{i} f(x+k-i)+(-1)^k \binom{k}{k} f(x+k-k)

= \sum\limits_{i=0}^{k}(-1)^i \binom{k}{i} f(x+k-i)