
Introduction Test Procedures for the Covariance Matrix

Estimation of the Covariance Matrix Conclusions Remarks

Statistical Inference On the High-dimensional Gaussian Covariance Matrix

Xiaoqian SUN, Colin Gallagher, Thomas Fisher

Department of Mathematical Sciences, Clemson University

June 6, 2011

Xiaoqian SUN, Colin Gallagher, Thomas Fisher Statistical Inference On the High-dimensional Gaussian Covariance Matrix


Problem Setup Statistical Inference High-Dimensional Data Sets

Outline

Introduction

Hypothesis Testing on the Covariance Matrix

Estimation of the Covariance Matrix

Conclusions and Future Work


Problem Setup

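Consider $X_1, X_2, \ldots, X_N$ independent and identically distributed as $N_p(\mu, \Sigma)$:

$\mu \in \mathbb{R}^p$ and $\Sigma > 0$ (positive definite)

Both $\mu$ and $\Sigma$ are unknown.

$(\bar{X}, S)$, the sample mean vector and sample covariance matrix, is a sufficient statistic.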

Σ is the parameter of interest.
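For concreteness, the sufficient statistic $(\bar{X}, S)$ can be computed directly from the data. The following is a minimal pure-Python sketch; it assumes $S$ uses the divisor $n = N - 1$ (the slides do not state the convention), and the function names are hypothetical.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def sample_mean(xs: List[Vector]) -> Vector:
    """Sample mean vector X-bar of N observations in R^p."""
    N, p = len(xs), len(xs[0])
    return [sum(x[j] for x in xs) / N for j in range(p)]


def sample_covariance(xs: List[Vector]) -> Matrix:
    """Sample covariance S with divisor n = N - 1 (assumed convention)."""
    N, p = len(xs), len(xs[0])
    xbar = sample_mean(xs)
    n = N - 1
    S = [[0.0] * p for _ in range(p)]
    for x in xs:
        d = [x[j] - xbar[j] for j in range(p)]  # centered observation
        for a in range(p):
            for b in range(p):
                S[a][b] += d[a] * d[b] / n
    return S


if __name__ == "__main__":
    data = [[1.0, 2.0], [3.0, 4.0], [5.0, 9.0]]  # N = 3, p = 2
    print(sample_mean(data))        # [3.0, 5.0]
    print(sample_covariance(data))  # [[4.0, 7.0], [7.0, 13.0]]
```

Swapping the divisor $N - 1$ for $N$ gives the MLE $\hat\Sigma$ instead of $S$; the two differ only by the factor $n/N$.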


Statistical Inference

Classical inference

Based on the likelihood approach

Assume $N = n + 1 > p$ and $N \to \infty$ with $p$ fixed

Results appear in most multivariate analysis textbooks

High-dimensional Inference

Assume both $n, p \to \infty$

No general approach exists

Fujikoshi, Ulyanov and Shimizu (2010), “Multivariate Statistics: High-Dimensional and Large-Sample Approximation”, Wiley


High-Dimensional Data Sets

Examples:

1 Microarray gene data in genetics

2 Financial data in stock markets

3 Curve data in engineering

4 Image data in computer science

.....

Comments:

The dimension exceeds the sample size, i.e., $p > N$.

Collecting additional data may be expensive or infeasible.

Few analyses of such data before 1970

Fast computers ⇒ new methods needed


Likelihood ratio test Previous High-Dimensional Sphericity Testing New Testing Procedure Simulation Study and Data Analysis

Hypothesis Testing on the Sphericity

Consider $H_0: \Sigma = \sigma^2 I$ vs. $H_1: \Sigma \neq \sigma^2 I$.

The likelihood ratio test (LRT) statistic for this hypothesis is

$$\Lambda(x) = \left[\frac{\prod_{i=1}^{p} l_i^{1/p}}{\frac{1}{p}\sum_{i=1}^{p} l_i}\right]^{\frac{1}{2}pN}$$

where $l_1, l_2, \ldots, l_p \ge 0$ are the eigenvalues of $\hat\Sigma$, the MLE of $\Sigma$.

When $p > n$, $\hat\Sigma$ is singular and hence has zero eigenvalues.

Even when $p \le n$, the eigenvalues of $S$ are more spread out than the true eigenvalues of $\Sigma$, increasingly so as $p/n$ grows
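Because $\prod_i l_i = \det(\hat\Sigma)$ and $\sum_i l_i = \operatorname{tr}(\hat\Sigma)$, the LRT statistic can be evaluated without an eigen-decomposition. The sketch below works on the log scale, $\log\Lambda = \frac{pN}{2}\left[\frac{1}{p}\sum_i \log l_i - \log\left(\frac{1}{p}\sum_i l_i\right)\right]$, and uses a Cholesky factorization for $\log\det$. The function names and the relative tolerance `rel_tol` used to declare a pivot zero are illustrative assumptions, not part of the original method; the caller is assumed to pass the MLE $\hat\Sigma$ (symmetric, positive semi-definite).

```python
import math
from typing import List

Matrix = List[List[float]]


def log_det_psd(A: Matrix, rel_tol: float = 1e-12) -> float:
    """log det(A) for a symmetric positive semi-definite A via Cholesky.

    Returns -inf when a pivot is (numerically) zero, i.e. A is singular.
    rel_tol is an assumed tolerance, scaled by the largest diagonal entry.
    """
    p = len(A)
    eps = rel_tol * max(A[k][k] for k in range(p))
    L = [[0.0] * p for _ in range(p)]
    logdet = 0.0
    for i in range(p):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= eps:  # zero pivot: at least one eigenvalue is 0
                    return -math.inf
                L[i][i] = math.sqrt(s)
                logdet += 2.0 * math.log(L[i][i])
            else:
                L[i][j] = s / L[j][j]
    return logdet


def log_lrt_sphericity(sigma_hat: Matrix, N: int) -> float:
    """log Lambda for H0: Sigma = sigma^2 I, computed from the MLE sigma_hat.

    Uses prod(l_i) = det(sigma_hat) and sum(l_i) = tr(sigma_hat).
    """
    p = len(sigma_hat)
    mean_eig = sum(sigma_hat[i][i] for i in range(p)) / p
    mean_log_eig = log_det_psd(sigma_hat) / p
    return 0.5 * p * N * (mean_log_eig - math.log(mean_eig))


if __name__ == "__main__":
    print(log_lrt_sphericity([[1.0, 0.0], [0.0, 4.0]], N=10))  # ≈ -2.231, i.e. 10*log(0.8)
    print(log_lrt_sphericity([[1.0, 1.0], [1.0, 1.0]], N=10))  # -inf: singular MLE
```

A singular $\hat\Sigma$ makes $\log\Lambda = -\infty$, i.e. $\Lambda = 0$, whatever the data; working on the log scale also avoids underflow from the large exponent $pN/2$.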


Sample Eigenvalue Dispersion ($\Sigma = I$) [figure not reproduced]
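Since the figure is not available, the following hypothetical simulation illustrates the same effect numerically rather than graphically. It draws $N$ observations from $N_p(0, I)$ and measures the spread (variance) of the eigenvalues of $S$ through traces, $\operatorname{tr}(S^2)/p - (\operatorname{tr}(S)/p)^2$, so no eigensolver is needed. Every true eigenvalue of $I$ equals 1, so the true spread is 0, whereas for Wishart data the expected sample spread is close to $(p+1)/n$. The sample size, dimensions, seed and function names are assumptions chosen for illustration.

```python
import random
from typing import List

Matrix = List[List[float]]


def sample_covariance(xs: List[List[float]]) -> Matrix:
    """Sample covariance with divisor n = N - 1 (assumed convention)."""
    N, p = len(xs), len(xs[0])
    xbar = [sum(x[j] for x in xs) / N for j in range(p)]
    S = [[0.0] * p for _ in range(p)]
    for x in xs:
        d = [x[j] - xbar[j] for j in range(p)]
        for a in range(p):
            for b in range(p):
                S[a][b] += d[a] * d[b] / (N - 1)
    return S


def eigenvalue_spread(S: Matrix) -> float:
    """Variance of the eigenvalues l_1..l_p of a symmetric S, via traces:
    (1/p) sum l_i^2 - ((1/p) sum l_i)^2 = tr(S^2)/p - (tr(S)/p)^2."""
    p = len(S)
    a1 = sum(S[i][i] for i in range(p)) / p
    a2 = sum(S[i][j] ** 2 for i in range(p) for j in range(p)) / p
    return a2 - a1 * a1


def simulate_spread(N: int, p: int, seed: int = 0) -> float:
    """Draw N observations from N_p(0, I) and return the spread of eig(S)."""
    rng = random.Random(seed)
    xs = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(N)]
    return eigenvalue_spread(sample_covariance(xs))


if __name__ == "__main__":
    N = 50  # hypothetical sample size
    for p in (5, 25, 45):
        # simulated spread next to the rough expectation (p + 1) / n
        print(p, round(simulate_spread(N, p), 3), round((p + 1) / (N - 1), 3))
```

The spread grows roughly linearly in $p/n$ even though the population eigenvalues never change, which is the dispersion the slide refers to.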


Effects on LRT under High-Dimensionality

If $p > N$, the LRT is degenerate: $\hat\Sigma$ has zero eigenvalues, so $\Lambda(x) = 0$ for every sample

If $N > p$ but $p$ is close to $N$, the LRT becomes computationally unstable and unreliable

The LRT therefore cannot be used in high-dimensional settings.
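The degeneracy follows from a short rank argument, sketched here under the setup above with the MLE $\hat\Sigma = \frac{1}{N}\sum_{i=1}^{N}(X_i - \bar X)(X_i - \bar X)^\top$ and eigenvalues ordered as $l_1 \ge \cdots \ge l_p$. The centered observations sum to zero, so they span at most $n = N - 1$ dimensions:

$$
\sum_{i=1}^{N}(X_i-\bar X) = 0
\;\Longrightarrow\; \operatorname{rank}(\hat\Sigma) \le N-1 = n,
$$

$$
p > n \;\Longrightarrow\; l_{n+1} = \cdots = l_p = 0
\;\Longrightarrow\; \prod_{i=1}^{p} l_i^{1/p} = 0
\;\Longrightarrow\; \Lambda(x) = 0 \ \text{for every sample.}
$$

For $p$ just below $n$, the problem is numerical rather than exact: when $\Sigma = I$ and $p/n \to y \in (0,1)$, the smallest eigenvalue of $S$ tends to $(1-\sqrt{y})^2$, which approaches 0 as $y \to 1$, so the geometric mean in $\Lambda$ is driven toward 0 by a few near-zero eigenvalues.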


Previous Work on High-Dimensional Sphericity Test

John (1971) U test statistic:

$$U = \frac{1}{p}\operatorname{tr}\left[\left(\frac{S}{(1/p)\operatorname{tr}(S)} - I\right)^2\right].$$

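It is based on the first and second arithmetic means of the eigenvalues of $S$, $a_1 = \operatorname{tr}(S)/p$ and $a_2 = \operatorname{tr}(S^2)/p$, since $U = a_2/a_1^2 - 1$.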
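As a check on the identity $U = a_2/a_1^2 - 1$, the sketch below computes John's statistic both directly from its definition and from the two eigenvalue means; neither form needs an eigen-decomposition. The function names are hypothetical, and $S$ is assumed to be a symmetric sample covariance matrix with $\operatorname{tr}(S) > 0$.

```python
from typing import List

Matrix = List[List[float]]


def johns_u(S: Matrix) -> float:
    """John's U = (1/p) tr[(S / ((1/p) tr S) - I)^2], computed directly."""
    p = len(S)
    a1 = sum(S[i][i] for i in range(p)) / p
    # D = S / a1 - I
    D = [[S[i][j] / a1 - (1.0 if i == j else 0.0) for j in range(p)] for i in range(p)]
    # tr(D^2) = sum over i, j of D[i][j] * D[j][i]
    return sum(D[i][j] * D[j][i] for i in range(p) for j in range(p)) / p


def johns_u_from_moments(S: Matrix) -> float:
    """Same statistic written as a2 / a1^2 - 1 with a_k = tr(S^k) / p."""
    p = len(S)
    a1 = sum(S[i][i] for i in range(p)) / p
    a2 = sum(S[i][j] * S[j][i] for i in range(p) for j in range(p)) / p
    return a2 / (a1 * a1) - 1.0


if __name__ == "__main__":
    S = [[1.0, 0.0], [0.0, 3.0]]
    print(johns_u(S))               # 0.25
    print(johns_u_from_moments(S))  # 0.25
```

Both forms are invariant to rescaling $S$ and equal 0 exactly when all eigenvalues of $S$ coincide, which is why $U$ measures departure from sphericity.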

Ledoit and Wolf (2002) show its (n, p)-asymptotic null distribut