<Numerical Analysis> (2nd ed.), by Timothy Sauer - Notes

Date: 2021-10-15 14:35:26

DEFINITION 1.3
A solution is correct within p decimal places if the error is less than 0.5 × 10$^{−p}$ .
-
P29
The advantage of the Bisection Method is that the number of steps is known in advance (iterate until the interval length is below the required accuracy).
For the methods introduced later, the interval is not predetermined, so when should the computation stop? Unknown. Hence a "stopping criterion" is defined to decide when to terminate.
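A minimal Matlab sketch of bisection (example f, a, b, tol are mine, not the book's code):
f = @(x) x^3 + x - 1; a = 0; b = 1; tol = 0.5e-6;
fa = f(a);
while (b - a)/2 > tol           % number of steps is fixed once tol is chosen
    c = (a + b)/2;
    if fa*f(c) < 0              % root lies in [a, c]
        b = c;
    else                        % root lies in [c, b]
        a = c; fa = f(c);
    end
end
xc = (a + b)/2                  % root estimate, correct within tol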
-
-
-
DEFINITION 1.4
The real number r is a fixed point of the function g if g(r) = r.
-
Fixed-Point Iteration (abbrev. FPI)
-
P32
Can every equation f(x) = 0 be turned into a fixed-point problem g(x) = x?
Yes, and in many different ways. Whether FPI converges depends on S = |g'(r)| < 1.
-
FPI is only locally convergent; it converges linearly with rate S.
-
If 0 < S < 1/2, FPI converges faster than the Bisection Method.
If S = 1/2, FPI converges at the same rate as the Bisection Method.
If 1/2 < S < 1, FPI converges more slowly than the Bisection Method.
If S >= 1, FPI diverges.
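A minimal FPI sketch (example g and starting guess are mine):
g = @(x) cos(x);        % solve x = cos(x)
x = 0.5;                % initial guess
for i = 1:40
    x = g(x);           % x_{i+1} = g(x_i)
end
x                       % ≈ 0.7391; here S = |g'(r)| = |sin(r)| ≈ 0.67 < 1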
-
-
P45
----------------------- direction in which the problem is posed ----------------------->
Data that defines problem  --->  Solution process  --->  Solution
Equation  ----------------->  Equation solver  --->  Solution
-
Backward error: |f(x$_a$)|.    Forward error: |r − x$_a$|.
(r: the exact root; x$_a$: the value computed by the algorithm)
-
Backward error is on the left or input (problem data) side.
Forward error is the error on the right or output side.
-
Backward and forward error are what stopping criteria are based on.
-
A problem is called sensitive (ill-conditioned) if small errors in the input lead to large errors in the output, or solution.
-
error magnification factor = (relative forward error)/(relative backward error)
= |Δr/r| / |ε·g(r)/g(r)| = |g(r)| / |r·f'(r)|
(for the perturbed equation f(x) + ε·g(x) = 0, whose root moves by Δr ≈ −ε·g(r)/f'(r))
-
condition number: a measure of error magnification due to the problem (a property of the input data, i.e., the problem itself)
stability: refers to the magnification of small input errors due to the algorithm, not the problem itself (a property of the algorithm)
-
-
1.4 NEWTON’S METHOD------------------
x$_{i+1}$ = x$_i$ − f(x$_i$)/f'(x$_i$)
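A minimal Newton sketch (example f and starting guess are mine):
f  = @(x) x^3 + x - 1;
fp = @(x) 3*x^2 + 1;      % f'
x = 1;                    % initial guess near the root
for i = 1:8
    x = x - f(x)/fp(x);   % x_{i+1} = x_i − f(x_i)/f'(x_i)
end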
-
-
DEFINITION 1.10
Let e$_i$ denote the error after step i of an iterative method. The iteration is quadratically convergent if
M = lim(i→∞) e$_{i+1}$ / e$_i^2$ < ∞
-
-
Faster than linear convergence: Newton's Method is in fact quadratically convergent, but not always. Near a root of multiplicity m, Newton's Method converges only linearly, with rate S = (m−1)/m.
But if the multiplicity m of the root is known in advance, the Modified Newton's Method restores quadratic convergence.
-
Modified Newton's Method:
x$_{i+1}$ = x$_i$ − m·f(x$_i$)/f'(x$_i$)
-
If f'(x$_i$) = 0, Newton's Method breaks down (fails to converge).
-
The theorems above guarantee convergence only when the initial guess is close enough to the root.
-
Newton's Method is faster than FPI and bisection because it uses more information about f (the tangent line).
When the derivative needed for the tangent line is unavailable, use the Secant Method,
which replaces the tangent line with a secant line.
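A minimal Secant Method sketch (example f and starting guesses are mine):
f = @(x) x^3 + x - 1;
x0 = 0; x1 = 1;                                 % two starting guesses
for i = 1:10
    x2 = x1 - f(x1)*(x1 - x0)/(f(x1) - f(x0));  % secant step
    x0 = x1; x1 = x2;
end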
-
-
-
-
-
CH2 Systems of Equations============================================
Gaussian elimination (a direct method: gives the exact solution within a finite number of steps; direct methods stand in contrast to iterative methods)
LU factorization is a matrix representation of Gaussian elimination:
Ax = b  ==>  MAx = Mb  ==>  Ux = Mb  ==>  M$^{-1}$Ux = b
Setting L = M$^{-1}$ gives A = LU
-
Solving Ax = b with LU splits into two triangular solves: 1) solve Lc = b for c; 2) solve Ux = c for x.
-
Some matrices have no LU factorization, but every square matrix has a PA = LU factorization.
The errors of Gaussian elimination come from: 1) ill-conditioning, 2) swamping.
The former should be detected as early as possible (via the condition number of the matrix); the latter is eliminated by partial pivoting.
-
-
DEFINITION 2.4
Given Ax = b. backward error: ||b − Ax$_a$||$_∞$; forward error: ||x − x$_a$||$_∞$,
where x is the exact solution and x$_a$ is the computed solution.
-
-
Infinity norm of a vector v: ||v||$_∞$ = max|v$_i$|
Infinity norm of a square matrix A: ||A||$_∞$ = max(absolute row sum)
Condition number of a square matrix: cond(A) = ||A|| · ||A$^{−1}$||
-
Using cond to estimate the accuracy of the solution:
In other words, if cond(A) ≈ 10$^k$, we should prepare to lose k digits of accuracy in computing x.
In Example 2.11, cond(A) ≈ 4 × 10$^4$, so in double precision we should expect about 16 − 4 = 12 correct digits in the solution x.
-
Matlab’s best general-purpose linear equation solver: \
e.g.
>> A = [1 1;1.0001 1]; b=[2;2.0001];
>> xa = A\b
xa =
1.00000000000222
0.99999999999778
-
-
-
2.4.1 Partial pivoting
In column j, find the entry a$_{ij}$ with the largest absolute value, then swap row i with the pivot row.
Since partial pivoting performs only row exchanges, it can be represented by a (permutation) matrix P: PA = LU
-
DEFINITION 2.7
A 【permutation matrix】 is an n × n matrix consisting of all zeros, except for a single 1 in
every row and column.
A permutation matrix represents row-exchange operations.
-
Procedure for solving Ax = b:
while building L and U we obtain P, so that PAx = LUx = Pb.
The LU solve then splits into: 1) solve Lc = Pb for c; 2) solve Ux = c for x.
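A minimal sketch of the PA = LU solve using Matlab's lu (example A, b are mine):
A = [2 1 5; 4 4 -4; 1 3 1]; b = [5; 0; 6];
[L, U, P] = lu(A);      % PA = LU with partial pivoting
c = L \ (P*b);          % step 1: solve Lc = Pb
x = U \ c               % step 2: solve Ux = c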
-
-
Matlab’s lu command:
>> A=[2 1 5; 4 4 -4; 1 3 1];
>> [L,U,P]=lu(A)
.
.
2.5 ITERATIVE METHODS (in contrast to direct methods)
2.5.1 Jacobi Method
DEFINITION 2.9
The n × n matrix A = (a$_{ij}$) is 【strictly diagonally dominant】 if, for each 1 ≤ i ≤ n, |a$_{ii}$| > \sum_{j≠i} |a$_{ij}$|. In other words, each main diagonal entry dominates its row in the sense that it is greater in magnitude than the sum of magnitudes of the remainder of the entries in its row.
Properties of a 【strictly diagonally dominant】 matrix (p112):
1) it is nonsingular
2) the Jacobi and Gauss–Seidel Methods converge for it
.
Jacobi Method
The Jacobi Method is a form of FPI (with A split as A = L + D + U, D the diagonal part):
x$_{k+1}$ = D$^{−1}$ (b − (L + U)·x$_k$)
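A minimal Jacobi sketch (example A, b are mine; A is strictly diagonally dominant):
A = [3 1; 1 2]; b = [5; 5];
D = diag(diag(A));         % diagonal part of A
LU = A - D;                % L + U, the off-diagonal parts
x = zeros(2,1);
for k = 1:25
    x = D \ (b - LU*x);    % x_{k+1} = D^{-1}(b − (L+U)x_k)
end
x                          % → [1; 2]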
.
.
2.5.2 Gauss–Seidel Method
the most recently updated values of the unknowns are used at each step, even if the updating occurs in the current step.
x$_{k+1}$ = D$^{−1}$ (b − U*x$_k$ − L*x$_{k+1}$)
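A minimal Gauss–Seidel sketch, written componentwise so that updated values are used immediately (example A, b are mine):
A = [3 1; 1 2]; b = [5; 5];
n = length(b); x = zeros(n,1);
for k = 1:25
    for i = 1:n
        j = [1:i-1, i+1:n];                    % all columns except i
        x(i) = (b(i) - A(i,j)*x(j)) / A(i,i);  % uses already-updated x(1..i-1)
    end
end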
.
A further refinement of Gauss–Seidel is Successive Over-Relaxation (SOR).
.
.
.Since direct methods reach the solution in finitely many steps, why study iterative methods, which give only approximate solutions? The root cause is that direct methods cost O(n^3) operations, which is too slow!
1) When the solution x$_0$ of Ax = b is already known and A and b change slightly, x can be recomputed with an iterative method using x$_0$ as the initial guess. 【polishing】
2) Sparse matrices, with only O(n) nonzero entries, for which Gaussian elimination is too expensive.
.
.
2.6 Symmetric positive-definite matrices
For Ax = b with A positive-definite, the system can be solved without a full factorization, using the conjugate gradient method, which is well suited to large sparse matrices.

DEFINITION 2.12
The n × n matrix A is symmetric if A$^T$ = A. The matrix A is positive-definite if x$^T$Ax > 0 for all vectors x ≠ 0.

p119
A positive-definite matrix A can be factored as A = R$^T$R, where R is an upper triangular matrix. This is the 【Cholesky factorization】.
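Checking the Cholesky factorization with Matlab's chol (example A is mine, SPD):
A = [4 2; 2 3];
R = chol(A);         % upper triangular, A = R'*R
norm(A - R'*R)       % ≈ 0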

Properties of a positive-definite matrix A:
- it is symmetric,
- its eigenvalues are > 0,
- if A is n × n symmetric positive-definite and X is an n × m matrix of full rank with n ≥ m, then X$^T$AX is m × m symmetric positive-definite,
- any principal submatrix of A is symmetric positive-definite.
.
.
.
DEFINITION 2.13
A 【principal submatrix】 of a square matrix A is a square submatrix whose diagonal entries are diagonal entries of A.
.
.
.2.6.3 Conjugate Gradient Method (suited to sparse matrices; finishes in at most n steps) P122
.DEFINITION 2.15
Let A be a symmetric positive-definite n × n matrix. For two n-vectors v and w, define the A-【inner product】: (v, w)$_A$ = v$^T$Aw.
The vectors v and w are A-【conjugate】 if (v, w)$_A$ = 0.
.
Properties of the inner product: symmetry, linearity, positive-definiteness.
.
How the Conjugate Gradient Method works (P123):
The method succeeds because each residual is arranged to be orthogonal to all previous residuals. If this can be done, the method runs out of orthogonal directions in which to look, and must reach a zero residual and a correct solution in at most n steps. The key to accomplishing the orthogonality among residuals turns out to be choosing the search directions d$_k$ pairwise conjugate. The concept of conjugacy generalizes orthogonality and gives its name to the algorithm.
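A minimal Conjugate Gradient sketch (example A, b are mine; A symmetric positive-definite):
A = [2 1; 1 3]; b = [3; 4];
x = zeros(2,1);
r = b - A*x; d = r;                 % initial residual and search direction
for k = 1:2                         % at most n steps in exact arithmetic
    alpha = (r'*r) / (d'*(A*d));    % step length along d
    x = x + alpha*d;
    rnew = r - alpha*(A*d);         % new residual, orthogonal to r
    beta = (rnew'*rnew) / (r'*r);   % keeps directions pairwise A-conjugate
    d = rnew + beta*d;
    r = rnew;
end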
.
.
The Conjugate Gradient Method costs O(n^3) operations in general, slower than Gaussian elimination's O(n^3/3).
For ill-conditioned matrices, the Conjugate Gradient Method performs worse than Gaussian elimination with partial pivoting. So before using the Conjugate Gradient Method, a preconditioning step is applied to transform the ill-conditioned matrix. This is the preconditioned Conjugate Gradient Method.
.

The idea of preconditioning is to reduce the effective condition number of the problem, thereby speeding up convergence.
M$^{−1}$Ax = M$^{−1}$b,
M: preconditioner
M should be (1) as close to A as possible and (2) simple to invert. These two goals usually conflict.
One choice for M is the Jacobi preconditioner D, D = diag(A)



.
2.7  NONLINEAR SYSTEMS OF EQUATIONS
In the multivariate case, f' becomes the Jacobian matrix DF(x), where F(u, v, w) = (f$_1$, f$_2$, f$_3$)
Multivariate Newton's Method
x$_{k+1}$ = x$_k$ − (DF(x$_k$))$^{−1}$ F(x$_k$)
To avoid computing the inverse, use a trick: solve the linear system DF(x$_k$)·s = −F(x$_k$) and set x$_{k+1}$ = x$_k$ + s. P132
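A minimal multivariate Newton sketch (example system is mine):
F  = @(x) [x(1)^2 + x(2)^2 - 1; x(1) - x(2)];   % solve F(x) = 0
DF = @(x) [2*x(1), 2*x(2); 1, -1];              % Jacobian matrix
x = [1; 0];
for k = 1:8
    s = DF(x) \ (-F(x));    % one linear solve per step, no inverse needed
    x = x + s;
end
x                           % → (sqrt(2)/2, sqrt(2)/2)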
.
2.7.2 Broyden's Method
If DF is unavailable, use Broyden's Method. To also avoid inverting the approximate Jacobian, Broyden's Method II was devised.
.
.
==================================================
CH3 Interpolation
Polynomial interpolation, spline interpolation
The Lagrange interpolating polynomial is unique.
.
By the Fundamental Theorem of Algebra, a degree-d polynomial can have at most d zeros, unless it is the identically zero polynomial.
.
.
3.1.2 Newton's divided differences
New data points that arrive after computing the original interpolating polynomial can be easily added. Moreover, the x-values of the newly added points need not be increasing. P143 EXAMPLE 3.4
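A minimal divided-differences sketch (data points are mine):
x = [0 2 3]; y = [1 2 4];
n = length(x);
c = y;                                      % will hold the Newton coefficients
for j = 2:n
    for i = n:-1:j
        c(i) = (c(i) - c(i-1)) / (x(i) - x(i-j+1));
    end
end
t = 1; p = c(n);                            % evaluate by nested multiplication
for i = n-1:-1:1
    p = c(i) + (t - x(i))*p;
end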

3.1.5 Representing functions by approximating polynomials
A major use of polynomial interpolation is to replace the evaluation of a complicated function with polynomial evaluation. This can be seen as a kind of compression: turning a complicated problem into a simple one.


3.2.1 Interpolation error formula  P152: used to bound the interpolation error.
3.2.3 Runge phenomenon: with evenly spaced nodes, the oscillation (interpolation error) near the endpoints of the interval is large.
The remedy is unevenly spaced nodes: Chebyshev interpolation. (THEOREM 3.6)



>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Polynomial interpolation uses a single interpolating function over all the points. Therefore:
- If there are many interpolation points, the degree of the polynomial gets high; this is understandable.
- My guess at the root of the Runge phenomenon: 1) there is only one interpolating function; 2) this (single) interpolating function cannot hold on at the points near both ends (they are too far away), so the interpolation error there is large.
Hence spline interpolation appeared: 1) the interpolating function is no longer unique (different points are covered by different functions); 2) since a single function is no longer used everywhere, the interval governed by each function is small, so the interpolation error is small and the polynomial degree drops.
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
.
natural spline:
S''$_1$(x$_1$) = S''$_{n-1}$(x$_n$) = 0
.
Curvature-adjusted cubic spline:
S''$_1$(x$_1$) = v$_1$, S''$_{n-1}$(x$_n$) = v$_2$ (arbitrary specified values)
.
Clamped cubic spline:
S'$_1$(x$_1$) = v$_1$, S'$_{n-1}$(x$_n$) = v$_2$ (specified slopes)
.
Parabolically terminated cubic spline:
.
Not-a-knot cubic spline:
S'''$_1$(x$_2$) = S'''$_2$(x$_2$) and S'''$_{n−2}$(x$_{n−1}$) = S'''$_{n−1}$(x$_{n−1}$)
.
.
.
Bézier curves
They lack the following property of cubic splines: smoothness of the first and second derivatives across the knots.
Bézier splines are appropriate for cases where corners (discontinuous first derivatives) and abrupt changes in curvature (discontinuous second derivatives) are occasionally needed.
Bézier curves are building blocks that can be stacked to fit arbitrary function values
and slopes. They are an improvement over cubic splines, in the sense that the slopes at the nodes can be specified as the user wants them. However, this freedom comes at the expense of smoothness: The second derivatives from the two different directions generally disagree at the nodes. In some applications, this disagreement is an advantage.
.
.
.

===============================================================
CH4 Least Squares
4.1.1 Inconsistent systems of equations
A system of equations with no solution is called 【inconsistent】.
.
The residual vector b − Ax̄ is perpendicular to the plane {Ax | x ∈ R$^n$}.
Why perpendicular? See P190, figure 4.1. (If A is an n-dimensional matrix, A can be extended to an (n+1)-dimensional matrix A' satisfying A'x' = b'.)
.
A$^T$Ax̄ = A$^T$b   (the normal equations)
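A minimal normal-equations sketch (example data is mine):
A = [1 1; 1 2; 1 3]; b = [2; 3; 5];    % inconsistent system
xbar = (A'*A) \ (A'*b);                % solve the normal equations
r = b - A*xbar;                        % residual
A'*r                                   % ≈ 0: r is perpendicular to {Ax}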
.
The key property of an orthogonal matrix is that it preserves the Euclidean norm of a vector:
If Q is an orthogonal m × m matrix and x is an m-dimensional vector, then ||Qx||$_2$ = ||x||$_2$.
.
.
Fitting data by least squares
STEP 1. Choose a model.
STEP 2. Force the model to fit the data
STEP 3. Solve the normal equations.
.
.
Matlab commands:
polyfit
polyval
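e.g. a degree-2 least squares fit (example data is mine):
t = [0 1 2 3]; y = [1 2 5 10];
c = polyfit(t, y, 2);       % least squares coefficients, highest degree first
yfit = polyval(c, t);       % evaluate the fitted polynomial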
.
But sometimes the condition number of A$^T$A is too large, and the result comes out wrong. There is a method (QR factorization, below) that avoids forming A$^T$A.
.
.
.
4.3   QR FACTORIZATION
LU factorization is for solving linear systems; QR factorization is for solving least squares problems.
Gram–Schmidt orthogonalization produces the QR factorization:
Q has orthonormal columns, R is an upper triangular matrix


LEMMA 4.2
If Q is an orthogonal m × m matrix and x is an m-dimensional vector, then ||Qx||$_2$ = ||x||$_2$. (Orthogonal transformations preserve vector length.)


Matlab command
>> [Q,R]=qr(A,0)  returns the reduced QR factorization
>> [Q,R]=qr(A)     returns the full QR factorization.


Applications of the QR factorization:
1) Solving linear systems; this costs about 3 times as much as the LU method. For an ill-conditioned matrix A, QR avoids computing A$^T$A.
2) Solving least squares problems.
3) Computing eigenvalues: apply Householder reflectors to put matrices into upper Hessenberg form. See CH12.
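Least squares via the reduced QR factorization, avoiding A$^T$A (example data is mine):
A = [1 1; 1 2; 1 3]; b = [2; 3; 5];
[Q, R] = qr(A, 0);       % reduced QR
xbar = R \ (Q'*b);       % solve R*xbar = Q'*b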
.
.
Doing calculations with orthogonal matrices is preferable because (1) they are easy to invert (Q$^{−1}$ = Q$^T$ by definition), and (2) by Lemma 4.2, they do not magnify errors.

.
.
4.3.2 Modified Gram–Schmidt orthogonalization
To improve accuracy in machine arithmetic, Modified Gram–Schmidt orthogonalization was devised, but it is still not ideal.
Orthogonalization by Householder reflectors is the more stable algorithm.
.
LEMMA 4.3
Assume that x and w are vectors of the same Euclidean length, ||x||$_2$ = ||w||$_2$. Then w − x and w + x are perpendicular.


A = H$_1$H$_2$H$_3$R = QR    (Note that H$_i^{−1}$ = H$_i$, since H$_i$ is symmetric orthogonal)
.
A projection matrix is a matrix that satisfies P*P = P .

Define the vector v = w − x
P = (v·v$^T$)/(v$^T$·v)
Set H = I − 2P
The matrix H is called a Householder reflector (Example 4.17)

THEOREM 4.4
Householder reflectors. Let x and w be vectors with ||x||$_2$ = ||w||$_2$ and define v = w − x.
Then H = I − 2(v·v$^T$)/(v$^T$·v) is a symmetric orthogonal matrix and H·x = w.

A is symmetric orthogonal  <==> A = A$^-1$
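A minimal Householder reflector check (example x is mine):
x = [3; 4];
w = [norm(x); 0];                % same Euclidean length as x
v = w - x;
H = eye(2) - 2*(v*v')/(v'*v);    % symmetric orthogonal reflector
H*x                              % = w, up to rounding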


4.4 Generalized Minimum Residual (GMRES) Method (one of the Krylov methods)
Krylov space: the vector space spanned by {r, Ar, ..., A$^k$r}, where r = b − Ax$_0$ is the residual vector of the initial guess
.
A convenient feature of GMRES is that the backward error ||b − Ax$_k$||$_2$ decreases monotonically with k. The reason is clear from the fact that the least squares problem in step k minimizes ||r − Ax$_{add}$||$_2$ for x$_{add}$ in the k-dimensional Krylov space. As GMRES proceeds, the Krylov space is enlarged, so the next approximation cannot do worse.
.
.
4.5    NONLINEAR LEAST SQUARES
4.5.1 Gauss–Newton Method
4.5.2 Models with nonlinear parameters
The matrix Dr is the matrix of partial derivatives of the errors r$_i$ with respect to the parameters c$_j$.
4.5.3 The Levenberg–Marquardt Method
If the matrix A is ill-conditioned, the condition number of A$^T$A is much larger, and for nonlinear systems of equations this gets even worse: many otherwise-decent algorithms then produce a bad Dr matrix. The Levenberg–Marquardt Method solves this problem.

=======================================================
CH5 Numerical Differentiation and Integration
numerical computing
symbolic computing:

Two-point forward-difference formula:
f' (x) = (f (x + h) − f (x))/h - h * f''(c) / 2

Three-point centered-difference formula:
f' (x) = ( f(x+h) − f (x-h) )/(2h) - h^2*f'''(c)/6                           (x-h< c <x+h)        (5.7)

Three-point centered-difference formula for second derivative
f'' (x) = ( f(x+h) − 2 f(x) + f (x-h) )/(h^2) - h^2*f''''(c)/12          (x-h< c <x+h)         (5.8)

Error analysis:  p248
The main message is that the three-point centered-difference formula will improve in
accuracy as h is decreased until h becomes about the size of the cube root of machine
epsilon. As h drops below this size, the error may begin increasing again.
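A quick check of this (example f is mine; the error bottoms out near h ≈ eps^(1/3)):
f = @(x) sin(x); x = 1;
for h = 10.^(-1:-2:-11)
    err = abs((f(x+h) - f(x-h))/(2*h) - cos(x));
    fprintf('h = %8.0e   error = %8.2e\n', h, err);
end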



Extrapolation for an order-n formula:
Q ≈ (2$^n$·F(h/2) − F(h)) / (2$^n$ − 1)
It gives a higher-order approximation of Q than F(h).
Use extrapolation to construct the formulas you need.
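A minimal extrapolation sketch, applied to the order-2 centered-difference formula (example f is mine):
f = @(x) exp(x); x = 0; h = 0.1;
F = @(h) (f(x+h) - f(x-h))/(2*h);       % order n = 2 approximation of f'(x)
Q = (2^2*F(h/2) - F(h)) / (2^2 - 1)     % order-4 extrapolated approximation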
.
.
5.2.1 Trapezoid Rule  (a closed Newton–Cotes formula)
5.2.2 Simpson's Rule (parabola)  (a closed Newton–Cotes formula)
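A minimal composite Trapezoid Rule sketch (example integrand is mine):
f = @(x) exp(-x.^2); a = 0; b = 1; m = 100;
x = linspace(a, b, m+1); y = f(x);
h = (b - a)/m;
T = h*(sum(y) - (y(1) + y(end))/2)      % composite trapezoid approximation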
.
Open Newton–Cotes formulas can be used for improper integrals; the method is: sample, evaluate, sum.
.
5.3    ROMBERG INTEGRATION
5.4    ADAPTIVE QUADRATURE
Condition for deciding whether to subdivide the sampling interval: accept the result on [a, b] when |S$_{[a,b]}$ − (S$_{[a,c]}$ + S$_{[c,b]}$)| is small enough (for the Trapezoid Rule version, the test is < 3·TOL); otherwise subdivide.
Subdivision is by halving the interval (c is the midpoint).
5.5    GAUSSIAN QUADRATURE
Its aim is higher degree of precision.
Using the degree-n Legendre polynomial on [−1, 1] gives degree of precision 2n − 1.
=================================================================
CH 6 Ordinary Differential Equations
=================================================================
CH 7 Boundary Value Problems
=================================================================
CH 8 Partial Differential Equations
=================================================================
CH 9 Random Numbers and Applications
linear congruential generator
.
Minimal standard random number generator
.
Box–Muller Method
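A minimal Box–Muller sketch: two uniform samples give two independent normal samples:
u1 = rand; u2 = rand;
z1 = sqrt(-2*log(u1)) * cos(2*pi*u2);
z2 = sqrt(-2*log(u1)) * sin(2*pi*u2);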
.
Matlab function for generating (normally distributed) random numbers:
randn
.
9.2  MONTE CARLO SIMULATION
9.2.2  Quasi-random numbers
The idea of quasi-random numbers is to sacrifice the independence property of ran-
dom numbers when it is not really essential to the problem being solved. Sacrificing
independence means that quasi-random numbers are not only not random, but unlike
pseudo-random numbers, they do not pretend to be random. This sacrifice is made in the hope of faster convergence to the correct value in a Monte Carlo setting. Sequences of quasi-random numbers are designed to be self-avoiding rather than independent. That is, the stream of numbers tries to efficiently fill in the gaps left by previous numbers and to avoid clustering.
.
A base-p low-discrepancy sequence converges faster than pseudo-random numbers.
A stochastic process has many realizations (instances).
.
To obtain an 【analytic solution of an SDE】, use the Ito formula:
If y = f (t, x), then
dy = ∂f(t, x)/∂t * dt + ∂f(t, x)/∂x * dx + 1/2* ∂$^2$f(t, x)/∂x$^2$ * dxdx         (9.17)
where the dx·dx term is interpreted by using the identities dt·dt = 0, dt·dB$_t$ = dB$_t$·dt = 0, and dB$_t$·dB$_t$ = dt.
.
【Numerical solution of SDEs】
dt --> Δt$_i$
dB$_t$ --> ΔB$_i$
Each realization of Brownian motion will force a different realization of the solution y(t)
.
It is a surprise that unlike the ODE case where the Euler Method has order 1, the
Euler–Maruyama Method for SDEs has order m = 1/2. To build an order 1 method for
SDEs, another term in the “stochastic Taylor series’’ must be added to the method.
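A minimal Euler–Maruyama sketch (example SDE is mine: dy = r·y·dt + σ·y·dB$_t$):
r = 0.1; sigma = 0.3; T = 1; n = 500;
dt = T/n; w = 1;                     % initial condition y(0) = 1
for i = 1:n
    dB = sqrt(dt)*randn;             % Brownian increment over Δt
    w = w + r*w*dt + sigma*w*dB;     % w_{i+1} = w_i + f·Δt + g·ΔB
end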
.
Milstein method (for dy(t) = f(t, y)dt + g(t, y)dB$_t$):
w$_{i+1}$ = w$_i$ + f(t$_i$, w$_i$)·Δt$_i$ + g(t$_i$, w$_i$)·ΔB$_i$ + (1/2)·g(t$_i$, w$_i$)·(∂g/∂y)(t$_i$, w$_i$)·((ΔB$_i$)$^2$ − Δt$_i$)
The Milstein method has order 1.
The drawback of the Milstein method is the partial-derivative term, which must be supplied by the user.
The fix is to rewrite the partial derivative using numerical differentiation (just as, for ODEs, the Runge–Kutta methods handle the derivative terms of the Taylor expansion). This gives the First-order stochastic Runge–Kutta Method (P460).
.
======================================================
CH10   Trigonometric Interpolation and the FFT
Trigonometric interpolation via the FT is very efficient.
The efficiency of trigonometric interpolation comes from orthogonality: orthogonal basis functions make interpolation and fitting simpler and more accurate.
For the DFT there is a method that makes the computation very fast: the FFT.
The FFT matters greatly for digital filtering and signal processing.
.
A complex number z is an n-th 【root of unity】 if z$^n$ = 1.
An n-th root of unity is called 【primitive】 if it is not a k-th root of unity for any k < n.

LEMMA 10.1  Primitive roots of unity:
Let ω be a primitive n-th root of unity and k be an integer. Then
Sum{j=0 → n-1} ω$^{jk}$ = n   if k/n is an integer
Sum{j=0 → n-1} ω$^{jk}$ = 0   otherwise
.
.
【Discrete Fourier Transform (DFT)】
y$_k$ = n$^{-1/2}$ · Sum{j=0 → n-1} ω$^{jk}$·x$_j$
The n × n matrix in (10.8) is called the 【Fourier matrix】, written F$_n$:
y = F$_n$·x = 1/sqrt(n) · M$_n$·x


F$_n$ is symmetric and invertible.



DFT (as a matrix multiply) ~ O(n$^2$)
FFT ~ O(n·log n)
Inverting the FFT is very simple: P475, (10.15)

A unitary matrix, like the Fourier matrix, is the complex version of a real orthogonal matrix.


P447
In other words, the Fourier transform F$_n$ transforms data {x$_j$} into interpolation coefficients.

P480
Conversely, we can turn interpolation coefficients into data points. Instead of evaluating (10.19), just invert the DFT: multiply the vector of interpolation coefficients {a$_k$ + i·b$_k$} by F$_n^{−1}$.

a matrix U is orthogonal if U$^−1$ = U$^T$



The Matlab commands:
fft   (not normalized by 1/sqrt(n))
so F$_n$·x = fft(x)/sqrt(n)
ifft
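Checking the normalization (F$_n$ is unitary, so it preserves length; example x is mine):
x = [1; 0; -1; 0];
y = fft(x)/sqrt(length(x));     % y = F_n * x
norm(y) - norm(x)               % ≈ 0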


THEOREM 10.9 Orthogonal Function Interpolation Theorem.
Given points to interpolate (t$_0$, x$_0$), ..., (t$_{n−1}$, x$_{n−1}$):
construct a symmetric orthogonal n × n matrix A whose row k is the basis function f$_k$ evaluated at the nodes t$_0$, ..., t$_{n−1}$ (i.e. A$_{kj}$ = f$_k$(t$_j$)),
then compute y = Ax.
The interpolating function is F(t) = SUM{k, 0, n-1} y$_k$·f$_k$(t).
Note that A is square. If A is not square, use the generalized Theorem 10.11 below.

THEOREM 10.11    Orthogonal Function Least Squares Approximation Theorem.
Given points (t$_0$, x$_0$), ..., (t$_{n−1}$, x$_{n−1}$), compute y = Ax as above.
The interpolating function is F(t) = SUM{k, 0, n-1} y$_k$·f$_k$(t);
the BLS fitting function keeps only the first m terms: F$_m$(t) = SUM{k, 0, m-1} y$_k$·f$_k$(t).
BLS: best least squares
(m is the number of basis functions kept, n is the number of sample points)
This is a beautiful and useful fact. It says that, given n data points, to find the best least squares trigonometric function with m < n terms fitting the data, it suffices to compute the actual interpolant with n terms and keep only the desired first m terms. In other words, the interpolating coefficients Ax for x degrade as gracefully as possible when terms are dropped from the highest frequencies. Keeping the m lowest terms in the n-term expansion guarantees the best fit possible with m lowest frequency terms. This property reflects the "orthogonality" of the basis functions.

P488
We can think of least squares as “filtering out’’ the highest frequency contributions of the order n interpolant and leaving only the lowest m frequency contributions.




P489
10.3.3 Sound, noise, and filtering
We are using the Fourier transform as a way to transfer the information of a signal {x0 , . . . , xn−1 } from the 【time domain】 to the 【frequency domain】 where it is easier to manipulate.
When we finish changing what we want to change, we send the signal back to the time domain by an inverse FFT.



Filtering is used in two ways. It can be used to match the original sound wave as closely as possible with a simpler function.
The second major application of filtering is to remove noise.



The Wiener Filter
By definition, noise is uncorrelated with the signal. In other words, if r is noise, the expected value of the inner product  c$^T$ * r  is zero.

================================================================
CH 11 Compression
The FT is simple because it is orthogonal.
Discrete Cosine Transform (DCT): it is real (no imaginary part), which is exactly why it is useful.
Discrete Fourier Transform (DFT)
Fast Fourier Transform (FFT)
The Discrete Cosine Transform has a representation as a real orthogonal matrix; it is used for audio and image compression.
JPEG format ~ two-dimensional DCT1
modern audio compression formats (MP3, AAC, WMA) ~ Modified Discrete Cosine Transform (MDCT)

P508
11.2.3 Quantization
The idea of quantization will allow the effects of low-pass filtering to be achieved in a more selective way. Instead of completely ignoring coefficients, we will retain low-accuracy versions of some coefficients at a lower storage cost. This idea exploits the same aspects of the human visual system: that it is less sensitive to higher spatial frequencies.
The main idea is to assign fewer bits to store information about the lower right corner of the transform matrix Y , instead of throwing it away.

Quantization modulo q
Quantization: z = round(y / q)
Dequantization: y' = qz

【quantization error】 = | y - y' |
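A minimal quantization sketch (example y, q are mine):
y = 13.7; q = 4;
z = round(y/q);       % quantize: store the small integer z
yprime = q*z;         % dequantize: reconstruct
abs(y - yprime)       % quantization error, at most q/2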

Q = [ q$_{kl}$ ]   (quantization matrix)
The entries q$_{kl}$ regulate how many bits we assign to each entry of the transform matrix Y. Replace Y by the compressed matrix
Y$_Q$ = [ round(y$_{kl}$ / q$_{kl}$) ]      (11.20)    (Y$_Q$ is Y after compression by Q)

The larger the entry of Q, the more is potentially lost to quantization.
(The larger q$_{kl}$ is, the more aggressively Y is compressed.)

linear quantization:  q$_{kl}$ = 8p(k + l + 1)   (0 ≤ k, l ≤ 7)
p: loss parameter

The larger the q$_{kl}$, the larger the potential error in reconstructing the image.
On the other hand, the larger the q$_{kl}$, the smaller the integer entries of Y$_Q$, and the fewer bits will be needed to store them.
quantization accomplishes two things:
1)Many small contributions from higher frequencies are immediately set to zero by (11.20),
2) the contributions that remain nonzero are reduced in size, so that they can be transmitted or stored by using fewer bits.
.
.
11.4   MODIFIED DCT AND AUDIO COMPRESSION
因为人的听觉系统对频率很敏感,所以音频压缩和解压带来的瑕疵很容易被人察觉。
所以音频压缩的难点在于如何隐藏这些瑕疵,不被人察觉。
There are four different versions of the DCT that are commonly used.
DCT1~图片压缩
DCT4~音频压缩
.

DEFINITION 11.9
The Discrete Cosine Transform (version 4) (DCT4)
y = Ex, where E is an n × n matrix

Use the DCT4 matrix E to build the MDCT matrix M.
.
Lemma 11.10
Denote by c$_j$ the j-th column of the (extended) DCT4 matrix (11.28). Then
 (a) c$_j$ = c$_{−1−j}$ for all integers j (the columns are symmetric around j = −1/2),
 (b) c$_j$ = −c$_{2n−1−j}$ for all integers j (the columns are antisymmetric around j = n − 1/2).
.
For integers j outside that range, the column defined by c$_j$ still corresponds to one of the n columns of DCT4.
.
.
DEFINITION 11.11
Let n be an even positive integer. The 【Modified Discrete Cosine Transform (MDCT)】 of x = (x$_0$, ..., x$_{2n−1}$)$^T$ is the n-dimensional vector   y = Mx,
where M is the n × 2n matrix   M$_{ij}$ = ...
.
.
E = [A|B], where A and B are n × (n/2) matrices
M = (B | −BR | −AR | −A)    (R: the reversal matrix)
Mx = Bx$_1$ − BRx$_2$ − ARx$_3$ − Ax$_4$ = ... = E·[−Rx$_3$ − x$_4$;  x$_1$ − Rx$_2$]

The MDCT is an n × 2n matrix! So it is not invertible, but compression is still possible by overlapping vectors (two adjacent MDCTs can have rank 2n in total and, working together, can reconstruct the input x-values perfectly).
For how this is done, see:
THEOREM 11.12   Inversion of MDCT through overlapping
Given N = M$^T$ and n-dimensional vectors u$_1$, u$_2$, u$_3$:
v$_1$ = M[u$_1$; u$_2$], v$_2$ = M[u$_2$; u$_3$]
(Since M, N are built from the DCT matrix, v$_1$, v$_2$ are frequency-domain vectors, so (fixed-frequency) noise can be removed from them.)
[w$_1$; w$_2$] = Nv$_1$, [w$_3$; w$_4$] = Nv$_2$
Then u$_2$ can be recovered: u$_2$ = (w$_2$ + w$_3$)/2

Note that we cannot recover u$_1$ and u$_m$; they should either be unimportant parts of the signal or padding that is added beforehand.

Its advantage is that it allows overlapping of adjacent vectors in an efficient way. The effect is to average contributions from two vectors, reducing artifacts from abrupt  transitions seen at boundaries.


Compared with the quantization used in image compression, audio compression uses 【bit quantization】.


=========================================================
CH 12 Eigenvalues and Singular Values
The singular value decomposition reveals 【the basic structure of a matrix】 and is heavily used in statistical applications to find relations between data.


There is no direct method for computing eigenvalues. The approach given in the appendix (the one taught in undergraduate linear algebra) is practical only for matrices as small as 2×2. Larger matrices require root-finding-style methods, i.e., iterative methods.


This section introduces methods based on multiplying high powers of the matrix times a vector, which usually will turn into an eigenvector as the power is raised.
12.1.1 Power Iteration (also called the Power Method)

The motivation behind Power Iteration is that multiplication by a matrix tends to move
vectors toward the dominant eigenvector direction.

Geometric interpretation of Power Iteration: the unit sphere gets squashed more and more until it ends up looking like a thin spindle along the direction of the dominant eigenvector. (See <Discrete Differential Geometry: An Applied Introduction> (SIGGRAPH 2013).)

【dominant eigenvalue】
【dominant eigenvector】
Power Iteration yields the (dominant) eigenvector; but how to get the eigenvalue? Use the Rayleigh quotient:
λ = x$^T$Ax / (x$^T$x),   where x is the eigenvector
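A minimal Power Iteration sketch with the Rayleigh quotient (example A is mine):
A = [2 1; 1 3];
x = [1; 0];                  % (almost) any initial vector works
for k = 1:50
    x = A*x;
    x = x / norm(x);         % normalize to avoid overflow
end
lambda = (x'*A*x) / (x'*x)   % ≈ dominant eigenvalue (5 + sqrt(5))/2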


12.1.2 Convergence of Power Iteration
THEOREM 12.2
Let A be an m × m matrix with real eigenvalues λ1 , . . . , λm satisfying |λ1 | > |λ2 | ≥ |λ3 | ≥· · · ≥ |λm |. Assume that the eigenvectors of A span R^m . For 【almost every】 initial vector, Power Iteration converges linearly to an eigenvector associated to λ 1 with convergence rate constant S = |λ2 / λ1|
PI finds the dominant eigenvalue, and Inverse Power Iteration finds the smallest one. How to find the other eigenvalues? If an eigenvalue is known to be near b, compute the eigenvalue c of A − bI; then c + b is an eigenvalue of A.

Inverse Power Iteration converges linearly.
Rayleigh Quotient Iteration converges quadratically, and will converge cubically if the matrix is symmetric.


Power Iteration is limited to locating the eigenvalue of largest magnitude (absolute value).
If Power Iteration is applied to the inverse of the matrix, the smallest eigenvalue can be
found.

The methods above find one eigenvalue at a time.

p539
12.2 QR ALGORITHM
The goal of this section is to develop methods for finding all eigenvalues at once.

p541
THEOREM 12.4
Assume that A is a symmetric m×m matrix with eigenvalues λi satisfying |λ1| > |λ2| > ··· > |λm|. The unshifted QR algorithm converges linearly to the eigenvectors and eigenvalues of A.As j → ∞, Aj converges to a diagonal matrix containing the eigenvalues on the main diagonal and Qj = Q1 ···Qj converges to an orthogonal matrix whose columns are the eigenvectors.
If A is an m×m symmetric matrix, the unshifted QR algorithm iterates: A$_0$ = A; factor A$_{j-1}$ = Q$_j$R$_j$, then set A$_j$ = R$_j$Q$_j$.
As j → ∞, the main diagonal of A$_j$ carries the eigenvalues, and the columns of Q$_1$Q$_2$···Q$_j$ are the eigenvectors.
...
At the j-th step, the columns of Q$_j$ are approximations to the eigenvectors of A, and the diagonal elements r$^j_{11}$, ..., r$^j_{mm}$ are approximations to the eigenvalues.
the 【unshifted QR algorithm】 does the same calculations as 【Normalized Simultaneous Iteration】
The point of the QR algorithm is to find a similar matrix whose eigenvalues are easy to obtain.

From the Schur factorization, the eigenvalues and eigenvectors are easy to obtain.

The full QR algorithm iteratively moves an arbitrary matrix A toward its Schur factorization by a series of similarity transformations. (All the A$_j$ are similar to A.)

Similar matrices have the same eigenvalues.

12.2.2 Real Schur form and the QR algorithm
DEFINITION 12.5
A matrix T has real Schur form if it is upper triangular, except possibly for 2 × 2 blocks on the main diagonal.


Every real square matrix is similar to a matrix in real Schur form.

p543
Finally, to allow for the calculation of complex eigenvalues, we must allow for the existence of 2 × 2 blocks on the diagonal of the real Schur form.
.
.
.
Efficiency of the QR algorithm increases considerably if we first put A into 【upper Hessenberg form】.

DEFINITION 12.7
The m × n matrix A is in upper Hessenberg form if a$_{ij}$ = 0 for i > j + 1.

THEOREM 12.8: a square matrix A can be transformed into an upper Hessenberg matrix B by a similarity transformation Q.

B is constructed by means of Householder reflectors.

In fact, there is no finite algorithm that computes a similarity transformation between an arbitrary matrix and an upper triangular matrix.

Thus, we finally have a complete method for finding all eigenvalues of an arbitrary square matrix A. The matrix is first put into upper Hessenberg form with the use of a
similarity transformation (Program 12.8), and then the shifted QR algorithm is applied (Program 12.7).


Matlab
The eig command computes eigenvalues by exactly this method.


p544
Matrices like this one, with a repeated complex eigenvalue, may not be moved into real Schur form by shifted QR. The extra assistance needed for these more difficult examples is to replace A by a similar matrix in upper Hessenberg form, which is the focus of the next section.

A symmetric matrix has orthogonal eigenvectors and real eigenvalues.

12.3  SINGULAR VALUE DECOMPOSITION
singular value decomposition (SVD)
A = USV$^T$   (A: m×n; U: m×m; V: n×n; S: m×n; U, V are orthogonal matrices)
.
.
LEMMA 12.10    Let A be an m × n matrix. The eigenvalues of A$^T$A are nonnegative.

The SVD is not unique for a given matrix A

Matlab command:
>>[u,s,v]=svd(a)
..
.
SVD of a symmetric matrix:


Applications of the SVD:
the SVD turns out to be the best means of finding the rank of a matrix.
Properties of the SVD:
1) The rank of the matrix A = USV$^T$ is the number of nonzero entries in S.
2) If A is an n × n matrix, |det(A)| = s$_1$···s$_n$.
3) If A is an invertible m × m matrix, then A$^{−1}$ = VS$^{−1}$U$^T$.
4) The m × n matrix A can be written as the sum of rank-one matrices:
A = SUM{i, 1, r} s$_i$·u$_i$·v$_i^T$      (12.31)
Property 4 is the low rank approximation property of the SVD. The best least squares approximation to A of rank p ≤ r is provided by retaining the first p terms of (12.31).
This is the main idea behind the dimension reduction and compression applications of the SVD.
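A minimal low-rank approximation sketch (Property 4; example A is mine):
A = magic(4);
[U, S, V] = svd(A);
p = 2;                                    % keep the p largest singular values
Ap = U(:,1:p) * S(1:p,1:p) * V(:,1:p)';   % best rank-p approximation to A
norm(A - Ap)                              % = the (p+1)-th singular value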

12.4.2 Dimension reduction (based on Property 4)
The space <u$_1$, ..., u$_p$> spanned by the left singular vectors u$_1$, ..., u$_p$ is the best-approximating dimension-p subspace to a$_1$, ..., a$_n$ in the sense of least squares, and the orthogonal projections of the columns a$_i$ of A into this space are the columns of A$_p$. In other words, the projection of a collection of vectors a$_1$, ..., a$_n$ to their best least squares p-dimensional subspace is precisely the best rank-p approximation matrix A$_p$.   See EXAMPLE 12.9

12.4.3 Compression (based on Property 4)
12.4.4 Calculating the SVD
Computing the SVD by forming A$^T$A is not recommended: if A's condition number is large, the condition number of A$^T$A is even larger.
Instead, it is recommended to use
B = [ 0    A$^T$
      A    0    ]
and put B into upper Hessenberg form.
=======================================================
CH13 Optimization
Newton's Method requires computing the Hessian (easy in 2D, hard in high dimensions).
So in high dimensions use Steepest Descent and Conjugate Gradient Search, since these two methods need only the gradient.

RealityCheck 13
Molecular Conformation and Numerical Optimization
One current approach to predicting the conformations of the proteins is to find the
minimum potential energy of the total configuration of amino acids.