Modification in Functions of Scalar and Matrix Argument

Defining the Gradient for Matrix Functionals with Symmetric Arguments

by Mamta Lata Chouhan*,

- Published in Journal of Advances and Scholarly Researches in Allied Education, E-ISSN: 2230-7540

Volume 17, Issue No. 2, Oct 2020, Pages 188 - 194 (7)

Published by: Ignited Minds Journals


ABSTRACT

Matrix functionals defined over an inner-product space of square matrices are a common construct in applied mathematics. In most cases, the object of interest is not the matrix functional itself but its derivative or gradient (when it is differentiable), and this notion is unambiguous. The Fréchet derivative, being a linear functional, readily yields the definition of the gradient via the Riesz representation theorem. However, there is a sub-class of matrix functionals that frequently occurs in practice whose argument is a symmetric matrix. For instance, in the theory of elasticity and continuum thermodynamics, the stress (a second-order, symmetric tensor) is defined to be the gradient of the strain energy functional or Helmholtz potential with respect to the (symmetric) strain tensor, while the strain is defined to be the gradient of the Gibbs potential with respect to the stress. Such functionals and their gradients also occur in the analysis and control of dynamical systems, which are described by matrix differential equations, and in maximum likelihood estimation in statistics, econometrics and machine learning. For this sub-class of matrix functionals with symmetric arguments, there appear to be two approaches to defining the gradient, and they lead to different results.

KEYWORDS

matrix functional, derivative, gradient, Fréchet derivative, symmetric matrix, stress, strain energy functional, Helmholtz potential, Gibbs potential, dynamical systems

INTRODUCTION

Engineers and researchers in the field of continuum mechanics work with the definition of the Fréchet derivative over the vector space of square matrices, specialize it to the symmetric matrices, which form a proper subspace, and then obtain the gradient (denoted Gsym for convenience) as described earlier. However, in the other fields named above, the tool of choice is matrix calculus, wherein a different idea emerged and has now taken hold, that of a "symmetric gradient". The root of this idea is the fact that while the space of square matrices R^{n×n} has dimension n^2, the subspace of symmetric matrices has dimension n(n + 1)/2. The second approach aims to take the symmetry of the matrix elements explicitly into account: it views the matrix functional as one defined on the vector space R^{n(n+1)/2}, computes its gradient in this space, and finally reinterprets it as a symmetric matrix (the "symmetric gradient" Gs) in R^{n×n}. However, the two gradients so computed, Gsym and Gs, are not equal. The question raised in the title of this article refers to this dichotomy. A perusal of the literature reveals that in the two communities that dominantly used matrix calculus, statisticians and electrical engineers, the idea of the "symmetric gradient" came into being at around the same time. Early work in the 1960s does not make any mention of a need for special formulae to treat the case of a symmetric matrix, but does note that all the matrix elements must be functionally independent. Among statisticians, Gebhardt in 1971 seems to have been the first to remark that the derivative formulae do not consider symmetry explicitly, but he concluded that no adjustment was necessary in his case since the gradient obtained was already symmetric. Tracy and Singh in 1975 echo the same sentiments as Gebhardt about the need for special formulae. By the end of the decade, the "symmetric gradient" makes its appearance in some form or other in the work of Henderson in 1979, a review by Nel and a book by Rogers in 1980, and McCulloch proves the expression for the "symmetric gradient" that we quote here. By 1982, it was included in the authoritative and influential textbook by Searle. Today the idea is firmly entrenched, as evidenced by several books and the notes by Minka. In the electrical engineering community (as represented by publications in IEEE), Geering in 1976 exhibited an example calculation (the gradient of the determinant of a symmetric 2 × 2 matrix) to justify the definition of the "symmetric gradient". We shall show that his reasoning was flawed.

Brewer in 1977 assumes that all the elements of the matrix are independent, which is not true for a symmetric matrix, and proceeds to derive the "symmetric gradient" using the rules of matrix calculus for use in sensitivity analysis of optimal estimation systems. At present, the "symmetric gradient" formula is also recorded in handy references for engineers and scientists working on interdisciplinary topics in statistics and machine learning, and its appearance there shows that it is no longer restricted to a particular community of researchers. Thus, both notions of the gradient are well established, and the fact that these two notions do not agree is a source of enormous confusion for researchers who straddle application areas, a point to which the authors can emphatically attest. On the popular site Mathematics Stack Exchange, there are multiple questions related to this theme, but their answers misguide and deepen rather than alleviate the existing confusion. Depending on the context, this disagreement between the two notions of gradient has implications that range from serious to none. In the context of extremizing a matrix functional, such as when calculating a maximum likelihood estimator, both approaches yield the same critical point. If the gradient is used in an optimization routine, such as steepest descent, one of the gradients is clearly not the steepest descent direction, and using it will lead to sub-optimal convergence. Indeed, since these two are the most common contexts, the discrepancy probably escaped scrutiny until now. However, in the context of mechanics, the discrepancy is a serious issue, since gradients of matrix functionals are used to describe physical quantities like stress and strain in a body.

Problem formulation. To fix our notation, we introduce the following. We denote by S^{n×n} the subspace of all symmetric matrices in R^{n×n}. The space R^{n×n} (and subsequently S^{n×n}) is an inner product space with the natural inner product ⟨·,·⟩_F.

Definition 2.1. For two matrices A, B in R^{n×n},

⟨A, B⟩_F := tr(AᵀB)

defines an inner product and induces the Frobenius norm on R^{n×n} via ‖A‖_F := ⟨A, A⟩_F^{1/2}.

Corollary 2.2. We collect a few useful facts about the inner product defined above that are essential for this paper.

1. For A symmetric and B skew-symmetric in R^{n×n}, ⟨A, B⟩_F = 0.

2. For A in R^{n×n} and H in S^{n×n}, ⟨A, H⟩_F = ⟨sym(A), H⟩_F, where sym(A) := (A + Aᵀ)/2.
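These facts are easy to verify numerically. The following minimal check is a sketch, assuming NumPy is available; the matrix size, random seed and tolerances are illustrative choices and not part of the original text.

import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))

def frob_inner(X, Y):
    # Frobenius inner product <X, Y>_F = tr(X^T Y)
    return np.trace(X.T @ Y)

sym = lambda X: 0.5 * (X + X.T)
skew = lambda X: 0.5 * (X - X.T)

B = skew(rng.standard_normal((n, n)))   # skew-symmetric
H = sym(rng.standard_normal((n, n)))    # symmetric

# Fact 1: a symmetric and a skew-symmetric matrix are orthogonal
print(abs(frob_inner(sym(A), B)) < 1e-12)
# Fact 2: <A, H>_F = <sym(A), H>_F for symmetric H
print(abs(frob_inner(A, H) - frob_inner(sym(A), H)) < 1e-12)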

Proof. See, e.g., [31]. Consider a real-valued function φ : R^{n×n} → R. We say that φ is differentiable if its Fréchet derivative, defined below, exists.

Definition 2.3. The Fréchet derivative of φ at A in R^{n×n} is the unique linear functional Dφ(A) on R^{n×n} such that

lim_{‖H‖_F → 0} |φ(A + H) − φ(A) − Dφ(A)[H]| / ‖H‖_F = 0

for any H in R^{n×n}. The Riesz representation theorem then asserts the existence of the gradient ∇φ(A) in R^{n×n} such that ⟨∇φ(A), H⟩_F = Dφ(A)[H]. Note that if A is a symmetric matrix, then the gradient ∇φ(A) given by the Fréchet derivative defined above is not guaranteed to be symmetric. Also, observe that the dimension of S^{n×n} is m = n(n + 1)/2; hence, it is natural to identify S^{n×n} with R^m. The reduced dimension, together with the fact that Definition 2.3 does not account for the symmetric structure of the matrix when the argument of φ is a symmetric matrix, served as the motivation to define a "constrained gradient" or "symmetric gradient" in R^{n×n} that is reasoned to account for the symmetry in S^{n×n}.
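To make the connection between the Fréchet derivative and the gradient concrete, the following sketch assembles ∇φ(A) entrywise from finite-difference approximations of the directional derivatives Dφ(A)[E_ij], where E_ij are the standard basis matrices. NumPy, the test functional φ(X) = tr(XᵀX)/2 and the step size are illustrative assumptions, not part of the original text.

import numpy as np

def numerical_gradient(phi, A, eps=1e-6):
    """Assemble grad(phi)(A) on R^{n x n} from directional derivatives.

    Uses <grad(phi)(A), E_ij>_F = D(phi)(A)[E_ij], i.e. the (i, j) entry of the
    gradient, with each directional derivative approximated by a central difference.
    """
    n = A.shape[0]
    G = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            E = np.zeros((n, n))
            E[i, j] = 1.0
            G[i, j] = (phi(A + eps * E) - phi(A - eps * E)) / (2.0 * eps)
    return G

# Illustrative functional: phi(X) = tr(X^T X) / 2, whose gradient is X itself.
phi = lambda X: 0.5 * np.trace(X.T @ X)
A = np.random.default_rng(1).standard_normal((3, 3))
print(np.allclose(numerical_gradient(phi, A), A, atol=1e-5))

Restricting the perturbations to symmetric directions, i.e. replacing E_ij by sym(E_ij), recovers sym(G) instead, in line with Corollary 2.6 below.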

Claim 2.4. Let φ : R^{n×n} → R and let φsym be the real-valued function that is the restriction of φ to S^{n×n}, i.e., φsym := φ|_{S^{n×n}} : S^{n×n} → R. Let G be the gradient of φ as defined in Definition 2.3. Then the matrix Gs(A) in S^{n×n} that is claimed to be the "symmetric gradient" of φsym at A is related to the gradient G as follows:

Gs(A) = G(A) + G(A)ᵀ − G(A) ∘ I,

where ∘ denotes the element-wise Hadamard product of G(A) and the identity I.

Theorem 3.7 in the next section will demonstrate that this claim is false. Before that, however, note that S^{n×n} is a subspace of R^{n×n} with the induced inner product of Definition 2.1. Thus, the derivative in Definition 2.3 is naturally defined for all scalar functions of symmetric matrices, and the Fréchet derivative of φ, when restricted to the subspace S^{n×n}, automatically accounts for the symmetric structure. For the subspace S^{n×n} we have the following definition.

Definition 2.5. The Fréchet derivative of φsym at A in S^{n×n} is the unique linear functional Dφsym(A) on S^{n×n} such that

lim_{‖H‖_F → 0} |φsym(A + H) − φsym(A) − Dφsym(A)[H]| / ‖H‖_F = 0

for any H in S^{n×n}. The Riesz representation theorem then asserts the existence of the gradient Gsym(A) := ∇φsym(A) in S^{n×n} such that ⟨Gsym(A), H⟩_F = Dφsym(A)[H]. There is a natural relationship between the gradient in the larger space R^{n×n} and the gradient in the subspace S^{n×n}. The following corollary states this relationship.

Corollary 2.6. If G ∈ R^{n×n} is the gradient of φ : R^{n×n} → R, then Gsym = sym(G) is the gradient in S^{n×n}.

Proof. From Definition 2.3, we know that Dφ(A)[H] = ⟨G(A), H⟩_F for any H in R^{n×n}. If we restrict attention to H in S^{n×n}, then Dφ(A)[H] = ⟨G(A), H⟩_F = ⟨sym(G(A)), H⟩_F, and also Dφ(A)[H] = ⟨∇φsym(A), H⟩_F. This holds for any H in S^{n×n}, so by Corollary 2.2 and the uniqueness of the gradient, Gsym(A) = sym(G(A)) is the gradient in S^{n×n}.
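A numerical illustration of this corollary, and of how it differs from Claim 2.4, is sketched below, assuming NumPy. The functional φ(X) = det(X) is an illustrative choice (echoing Geering's determinant example mentioned in the introduction); by Jacobi's formula its gradient on R^{n×n} is det(X) X^{−T}. A finite-difference directional derivative along a symmetric direction H matches ⟨sym(G), H⟩_F, and in general does not match ⟨G + Gᵀ − G ∘ I, H⟩_F.

import numpy as np

rng = np.random.default_rng(2)
n = 3

# A symmetric positive definite test point (illustrative)
M = rng.standard_normal((n, n))
X = M @ M.T + n * np.eye(n)

phi = np.linalg.det
G = np.linalg.det(X) * np.linalg.inv(X).T      # gradient of det on R^{n x n}
G_sym = 0.5 * (G + G.T)                        # Corollary 2.6
G_claim = G + G.T - G * np.eye(n)              # Claim 2.4 ("symmetric gradient")

# Directional derivative along a random symmetric direction H
H = rng.standard_normal((n, n)); H = 0.5 * (H + H.T)
eps = 1e-6
dd = (phi(X + eps * H) - phi(X - eps * H)) / (2 * eps)

inner = lambda P, Q: np.trace(P.T @ Q)
print(np.isclose(dd, inner(G_sym, H), rtol=1e-4))    # True: sym(G) is the gradient
print(np.isclose(dd, inner(G_claim, H), rtol=1e-4))  # generally False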

An Illustrative Example 1. This example illustrates the difference between the gradient on R^{n×n} and on S^{n×n}. Fix a non-symmetric matrix A in R^{n×n} and consider the linear functional φ : R^{n×n} → R given by φ(X) = tr(AᵀX) for any X in R^{n×n}. The gradient ∇φ in R^{n×n} is equal to A, as given by the Fréchet derivative of Definition 2.3. However, if φ is restricted to S^{n×n}, then the gradient of φsym is sym(A) = (A + Aᵀ)/2 according to Corollary 2.6. Thus the gradient of a real-valued function defined on S^{n×n}, as given by Corollary 2.6, is guaranteed to be symmetric. We will demonstrate that Claim 2.4 is unnecessary; in fact, the correct symmetric gradient is the one given by the Fréchet derivative in Definition 2.5 and Corollary 2.6, i.e. sym(G). To do this, we first illustrate through a simple example that Gs as defined in Claim 2.4 gives an incorrect gradient.

Hypergeometric Functions

The noncentral χ², noncentral F and multiple correlation distributions, as found by Fisher (1928), involve Bessel and hypergeometric functions which can all be written as special cases, for particular integers p and q, of the generalized hypergeometric function

pFq(a1, …, ap; b1, …, bq; x) = Σ_{k=0}^{∞} [(a1)_k ⋯ (ap)_k / ((b1)_k ⋯ (bq)_k)] x^k / k!,

where the hypergeometric coefficient (a)_k is given by (a)_k = a(a + 1) ⋯ (a + k − 1). Particular cases include the exponential and binomial series, the Bessel function (in the noncentral χ² distribution), the confluent hypergeometric function (in the noncentral F distribution) and the Gaussian hypergeometric function. The corresponding multivariate distributions involve a generalization of this function to the case in which the variable x is replaced by a symmetric matrix S and the function is a real- or complex-valued symmetric function of the latent roots of S. The hypergeometric functions which appear in the distributions of the matrix variates were given by Constantine (1963).

DEFINITION

pFq(a1, …, ap; b1, …, bq; S) = Σ_{k=0}^{∞} Σ_κ [(a1)_κ ⋯ (ap)_κ / ((b1)_κ ⋯ (bq)_κ)] C_κ(S) / k!,

where (a)_κ denotes the generalized hypergeometric coefficient (the matrix analogue of (a)_k above), the inner sum runs over all partitions κ of k, and C_κ(S) is the zonal polynomial of the symmetric matrix S. The latent-root distributions involve functions of both the population and the sample roots, namely the hypergeometric function of two matrix arguments defined below.

DEFINITION

pFq(a1, …, ap; b1, …, bq; S, T) = Σ_{k=0}^{∞} Σ_κ [(a1)_κ ⋯ (ap)_κ / ((b1)_κ ⋯ (bq)_κ)] C_κ(S) C_κ(T) / (C_κ(I_m) k!).

Hypergeometric functions of the product ST of symmetric matrices are defined as symmetric functions of the latent roots of ST; although ST may not be a symmetric matrix, its latent roots are equal to the latent roots of S^{1/2}TS^{1/2}, T^{1/2}ST^{1/2} and TS.
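This latent-root identity is easy to confirm numerically; the sketch below assumes NumPy, and the random positive definite matrices are illustrative.

import numpy as np

rng = np.random.default_rng(3)
m = 4

def random_spd(m):
    X = rng.standard_normal((m, m))
    return X @ X.T + m * np.eye(m)

def spd_sqrt(S):
    # Symmetric square root via the eigendecomposition of an SPD matrix
    w, V = np.linalg.eigh(S)
    return V @ np.diag(np.sqrt(w)) @ V.T

S, T = random_spd(m), random_spd(m)
Shalf = spd_sqrt(S)

ev = lambda M: np.sort(np.linalg.eigvals(M).real)
# Latent roots of ST agree with those of S^{1/2} T S^{1/2} and of TS
print(np.allclose(ev(S @ T), ev(Shalf @ T @ Shalf)))
print(np.allclose(ev(S @ T), ev(T @ S)))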

Special Cases of Generalized Hypergeometric Functions of Matrix Argument

Special cases of the generalized hypergeometric function of matrix argument include

0F0(S) = etr(S) = exp(tr S)   and   1F0(a; S) = |I − S|^{−a}.

The following integrals, which are generalizations of the Laplace and inverse Laplace transforms, were used by Bochner (1952) to define the Bessel function and by Herz (1955) to define the hypergeometric functions of matrix argument; the integrals are taken over all real symmetric matrices, for fixed positive definite X0 and arbitrary real symmetric Y, and are normalized by the multivariate gamma function Γ_m(·). The hypergeometric function of two matrix arguments follows from that of one by an average over the orthogonal group O(m). The function of two arguments clearly does not depend upon the order in which they occur, and it has the same Laplace and inverse Laplace transform properties with respect to either variable. The power series for the function which occurs in the distribution (18) of latent roots with unequal covariance matrices may not converge for all requisite values of S and T, but the corresponding integral is well defined for all S > 0 and T > 0. Further integral representations of hypergeometric functions are given by Herz (1955), notably of 1F1 as the moment generating function of the multivariate beta distribution. The hypergeometric functions of matrix argument also satisfy some of the Kummer relations given by Herz (1955), along with the obvious confluences.

Probability Distribution with Matrix Argument

Here we define a number of probability distributions associated with special functions of matrix argument.

(i) From the gamma function of matrix argument, the corresponding probability distribution follows by normalization.

(ii) From the beta function of matrix argument, the corresponding probability distribution follows by normalization, for Z > 0.

(iii) From the H-function of matrix argument, the corresponding probability distribution follows by normalization.

(iv)-(vi) In the same manner, probability distributions follow from the remaining special functions of matrix argument considered above.
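The special cases 0F0(S) = etr(S) and 1F0(a; S) = |I − S|^{−a} quoted above are symmetric functions of the latent roots of S, and this is easy to confirm numerically. The sketch below assumes NumPy; the random matrix, its scaling and the parameter a are illustrative.

import numpy as np

rng = np.random.default_rng(4)
m = 4
a = 2.5  # illustrative hypergeometric parameter

# A symmetric matrix with latent roots inside (0, 1) so that 1F0 is well defined
X = rng.standard_normal((m, m))
S = X @ X.T
S = 0.4 * S / np.linalg.norm(S, 2)   # scale the spectral radius to 0.4

lam = np.linalg.eigvalsh(S)          # latent roots of S

# 0F0(S) = etr(S) = exp(tr S), a symmetric function of the latent roots
print(np.isclose(np.exp(np.trace(S)), np.prod(np.exp(lam))))
# 1F0(a; S) = |I - S|^(-a)
print(np.isclose(np.linalg.det(np.eye(m) - S) ** (-a), np.prod((1 - lam) ** (-a))))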

Bivariate Probability Distribution with Matrix Argument

Here we define a number of bivariate probability distributions associated with special functions of matrix argument.

(i)-(iii) From the special functions of matrix argument considered above, the corresponding bivariate probability distributions follow by normalization.

(iv) In the same manner, a bivariate probability distribution is obtained for Z > 0 and H(·) > 0.

(v) A further bivariate probability distribution is obtained analogously.

CONCLUSION

In this article, we investigated the hypergeometric function of a matrix argument, which is a generalization of the classical hypergeometric series. It is a function defined by an infinite summation that can be used to evaluate certain multivariate integrals. Hypergeometric functions of a matrix argument have applications in random matrix theory. For example, the distributions of the extreme eigenvalues of random matrices are often expressed in terms of the hypergeometric function of a matrix argument.

REFERENCES

[1] ANDERSON, T.W. (1958). An Introduction to Multivariate Statistical Analysis. John Wiley and Sons, New York.
[2] COY, J.W. (1955). A Differential Calculus for Functions of Matrices. Doctoral Dissertation, University of Michigan.
[3] DEDRAJ (1954). On a generalized Bessel function population. Ganita, 3, pp. 111-115.
[4] ERDELYI, A. et al. (1953). Higher Transcendental Functions, Vols. I, II. McGraw-Hill, New York.
[5] FISHER, R.A. (1924). On a distribution yielding the error functions of several well known statistics. Proc. Internat. Math. Congress, Toronto, pp. 805-813.
[6] ROGERS, G.S. (1980). Matrix Derivatives. Marcel Dekker, New York.
[7] HSU, P.L. (1939). On the distribution of the roots of certain determinantal equations. Ann. Eugenics, 9, pp. 250-258.
[8] McCULLOCH, C.E. (1980). Symmetric matrix derivatives with applications. Journal of the American Statistical Association, March 1980.
[9] SEARLE, S.R. (1982). Matrix Algebra for Statistics. John Wiley, New York.
[11] SEBER, G.A.F. (2008). A Matrix Handbook for Statisticians. John Wiley & Sons, New Jersey.
[12] MINKA, T. (2001). Old and New Matrix Algebra Useful for Statistics. [Online; accessed 1 Nov. 2019].
[13] GEERING, H. (1976). On calculating gradient matrices. IEEE Transactions on Automatic Control, 21(4), pp. 615-616.
[14] BREWER, J. (1977). The gradient with respect to a symmetric matrix. IEEE Transactions on Automatic Control, 22(2), pp. 265-267.
[15] PETERSEN, K.B. and PEDERSEN, M.S. (2012). The Matrix Cookbook. Version 20121115.
[16] MURRAY, I. (2016). Differentiation of the Cholesky decomposition. ArXiv e-prints, Feb. 2016.
[17] me10240 (https://math.stackexchange.com/users/66158/me10240). Making sense of matrix derivative formula for determinant of symmetric matrix as a Frechet derivative? Mathematics Stack Exchange. URL: https://math.stackexchange.com/q/2436680 (version: 2017-09-21).
[18] tomka (https://math.stackexchange.com/users/118706/tomka). What is the derivative of the determinant of a symmetric positive definite matrix? Mathematics Stack Exchange. URL: https://math.stackexchange.com/q/1981210 (version: 2017-04-13).

Corresponding Author Mamta Lata Chouhan*

Madhyanchal Professional University, Bhopal