File name: Matrix Differential Calculus with Applications in Statistics and Econometrics
File size: 1.68 MB
File format: PDF
Updated: 2014-01-06 07:07:55
Matrix differential calculus
Contents

Preface

Part One — Matrices

1 Basic properties of vectors and matrices
    1 Introduction
    2 Sets
    3 Matrices: addition and multiplication
    4 The transpose of a matrix
    5 Square matrices
    6 Linear forms and quadratic forms
    7 The rank of a matrix
    8 The inverse
    9 The determinant
    10 The trace
    11 Partitioned matrices
    12 Complex matrices
    13 Eigenvalues and eigenvectors
    14 Schur's decomposition theorem
    15 The Jordan decomposition
    16 The singular-value decomposition
    17 Further results concerning eigenvalues
    18 Positive (semi)definite matrices
    19 Three further results for positive definite matrices
    20 A useful result
    Miscellaneous exercises
    Bibliographical notes

2 Kronecker products, the vec operator and the Moore-Penrose inverse
    1 Introduction
    2 The Kronecker product
    3 Eigenvalues of a Kronecker product
    4 The vec operator
    5 The Moore-Penrose (MP) inverse
    6 Existence and uniqueness of the MP inverse
    7 Some properties of the MP inverse
    8 Further properties
    9 The solution of linear equation systems
    Miscellaneous exercises
    Bibliographical notes

3 Miscellaneous matrix results
    1 Introduction
    2 The adjoint matrix
    3 Proof of Theorem 1
    4 Bordered determinants
    5 The matrix equation AX = 0
    6 The Hadamard product
    7 The commutation matrix K_mn
    8 The duplication matrix D_n
    9 Relationship between D_{n+1} and D_n, I
    10 Relationship between D_{n+1} and D_n, II
    11 Conditions for a quadratic form to be positive (negative) subject to linear constraints
    12 Necessary and sufficient conditions for r(A : B) = r(A) + r(B)
    13 The bordered Gramian matrix
    14 The equations X_1 A + X_2 B′ = G_1, X_1 B = G_2
    Miscellaneous exercises
    Bibliographical notes

Part Two — Differentials: the theory

4 Mathematical preliminaries
    1 Introduction
    2 Interior points and accumulation points
    3 Open and closed sets
    4 The Bolzano-Weierstrass theorem
    5 Functions
    6 The limit of a function
    7 Continuous functions and compactness
    8 Convex sets
    9 Convex and concave functions
    Bibliographical notes

5 Differentials and differentiability
    1 Introduction
    2 Continuity
    3 Differentiability and linear approximation
    4 The differential of a vector function
    5 Uniqueness of the differential
    6 Continuity of differentiable functions
    7 Partial derivatives
    8 The first identification theorem
    9 Existence of the differential, I
    10 Existence of the differential, II
    11 Continuous differentiability
    12 The chain rule
    13 Cauchy invariance
    14 The mean-value theorem for real-valued functions
    15 Matrix functions
    16 Some remarks on notation
    Miscellaneous exercises
    Bibliographical notes

6 The second differential
    1 Introduction
    2 Second-order partial derivatives
    3 The Hessian matrix
    4 Twice differentiability and second-order approximation, I
    5 Definition of twice differentiability
    6 The second differential
    7 (Column) symmetry of the Hessian matrix
    8 The second identification theorem
    9 Twice differentiability and second-order approximation, II
    10 Chain rule for Hessian matrices
    11 The analogue for second differentials
    12 Taylor's theorem for real-valued functions
    13 Higher-order differentials
    14 Matrix functions
    Bibliographical notes

7 Static optimization
    1 Introduction
    2 Unconstrained optimization
    3 The existence of absolute extrema
    4 Necessary conditions for a local minimum
    5 Sufficient conditions for a local minimum: first-derivative test
    6 Sufficient conditions for a local minimum: second-derivative test
    7 Characterization of differentiable convex functions
    8 Characterization of twice differentiable convex functions
    9 Sufficient conditions for an absolute minimum
    10 Monotonic transformations
    11 Optimization subject to constraints
    12 Necessary conditions for a local minimum under constraints
    13 Sufficient conditions for a local minimum under constraints
    14 Sufficient conditions for an absolute minimum under constraints
    15 A note on constraints in matrix form
    16 Economic interpretation of Lagrange multipliers
    Appendix: the implicit function theorem
    Bibliographical notes

Part Three — Differentials: the practice

8 Some important differentials
    1 Introduction
    2 Fundamental rules of differential calculus
    3 The differential of a determinant
    4 The differential of an inverse
    5 Differential of the Moore-Penrose inverse
    6 The differential of the adjoint matrix
    7 On differentiating eigenvalues and eigenvectors
    8 The differential of eigenvalues and eigenvectors: symmetric case
    9 The differential of eigenvalues and eigenvectors: complex case
    10 Two alternative expressions for dλ
    11 Second differential of the eigenvalue function
    12 Multiple eigenvalues
    Miscellaneous exercises
    Bibliographical notes

9 First-order differentials and Jacobian matrices
    1 Introduction
    2 Classification
    3 Bad notation
    4 Good notation
    5 Identification of Jacobian matrices
    6 The first identification table
    7 Partitioning of the derivative
    8 Scalar functions of a vector
    9 Scalar functions of a matrix, I: trace
    10 Scalar functions of a matrix, II: determinant
    11 Scalar functions of a matrix, III: eigenvalue
    12 Two examples of vector functions
    13 Matrix functions
    14 Kronecker products
    15 Some other problems
    Bibliographical notes

10 Second-order differentials and Hessian matrices
    1 Introduction
    2 The Hessian matrix of a matrix function
    3 Identification of Hessian matrices
    4 The second identification table
    5 An explicit formula for the Hessian matrix
    6 Scalar functions
    7 Vector functions
    8 Matrix functions, I
    9 Matrix functions, II

Part Four — Inequalities

11 Inequalities
    1 Introduction
    2 The Cauchy-Schwarz inequality
    3 Matrix analogues of the Cauchy-Schwarz inequality
    4 The theorem of the arithmetic and geometric means
    5 The Rayleigh quotient
    6 Concavity of λ_1, convexity of λ_n
    7 Variational description of eigenvalues
    8 Fischer's min-max theorem
    9 Monotonicity of the eigenvalues
    10 The Poincaré separation theorem
    11 Two corollaries of Poincaré's theorem
    12 Further consequences of the Poincaré theorem
    13 Multiplicative version
    14 The maximum of a bilinear form
    15 Hadamard's inequality
    16 An interlude: Karamata's inequality
    17 Karamata's inequality applied to eigenvalues
    18 An inequality concerning positive semidefinite matrices
    19 A representation theorem for (Σ a_i^p)^{1/p}
    20 A representation theorem for (tr A^p)^{1/p}
    21 Hölder's inequality
    22 Concavity of log|A|
    23 Minkowski's inequality
    24 Quasilinear representation of |A|^{1/n}
    25 Minkowski's determinant theorem
    26 Weighted means of order p
    27 Schlömilch's inequality
    28 Curvature properties of M_p(x, a)
    29 Least squares
    30 Generalized least squares
    31 Restricted least squares
    32 Restricted least squares: matrix version
    Miscellaneous exercises
    Bibliographical notes

Part Five — The linear model

12 Statistical preliminaries
    1 Introduction
    2 The cumulative distribution function
    3 The joint density function
    4 Expectations
    5 Variance and covariance
    6 Independence of two random variables
    7 Independence of n random variables
    8 Sampling
    9 The one-dimensional normal distribution
    10 The multivariate normal distribution
    11 Estimation
    Miscellaneous exercises
    Bibliographical notes

13 The linear regression model
    1 Introduction
    2 Affine minimum-trace unbiased estimation
    3 The Gauss-Markov theorem
    4 The method of least squares
    5 Aitken's theorem
    6 Multicollinearity
    7 Estimable functions
    8 Linear constraints: the case M(R′) ⊂ M(X′)
    9 Linear constraints: the general case
    10 Linear constraints: the case M(R′) ∩ M(X′) = {0}
    11 A singular variance matrix: the case M(X) ⊂ M(V)
    12 A singular variance matrix: the case r(X′V^+X) = r(X)
    13 A singular variance matrix: the general case, I
    14 Explicit and implicit linear constraints
    15 The general linear model, I
    16 A singular variance matrix: the general case, II
    17 The general linear model, II
    18 Generalized least squares
    19 Restricted least squares
    Miscellaneous exercises
    Bibliographical notes

14 Further topics in the linear model
    1 Introduction
    2 Best quadratic unbiased estimation of σ^2
    3 The best quadratic and positive unbiased estimator of σ^2
    4 The best quadratic unbiased estimator of σ^2
    5 Best quadratic invariant estimation of σ^2
    6 The best quadratic and positive invariant estimator of σ^2
    7 The best quadratic invariant estimator of σ^2
    8 Best quadratic unbiased estimation: multivariate normal case
    9 Bounds for the bias of the least squares estimator of σ^2, I
    10 Bounds for the bias of the least squares estimator of σ^2, II
    11 The prediction of disturbances
    12 Best linear unbiased predictors with scalar variance matrix
    13 Best linear unbiased predictors with fixed variance matrix, I
    14 Best linear unbiased predictors with fixed variance matrix, II
    15 Local sensitivity of the posterior mean
    16 Local sensitivity of the posterior precision
    Bibliographical notes

Part Six — Applications to maximum likelihood estimation

15 Maximum likelihood estimation
    1 Introduction
    2 The method of maximum likelihood (ML)
    3 ML estimation of the multivariate normal distribution
    4 Symmetry: implicit versus explicit treatment
    5 The treatment of positive definiteness
    6 The information matrix
    7 ML estimation of the multivariate normal distribution: distinct means
    8 The multivariate linear regression model
    9 The errors-in-variables model
    10 The non-linear regression model with normal errors
    11 Special case: functional independence of mean- and variance parameters
    12 Generalization of Theorem 6
    Miscellaneous exercises
    Bibliographical notes

16 Simultaneous equations
    1 Introduction
    2 The simultaneous equations model
    3 The identification problem
    4 Identification with linear constraints on B and Γ only
    5 Identification with linear constraints on B, Γ and Σ
    6 Non-linear constraints
    7 Full-information maximum likelihood (FIML): the information matrix (general case)
    8 Full-information maximum likelihood (FIML): the asymptotic variance matrix (special case)
    9 Limited-information maximum likelihood (LIML): the first-order conditions
    10 Limited-information maximum likelihood (LIML): the information matrix
    11 Limited-information maximum likelihood (LIML): the asymptotic variance matrix
    Bibliographical notes