Correntropy Matching Pursuit With Application to Robust Digit and Face Recognition

Yulong Wang, Student Member, IEEE, Yuan Yan Tang, Life Fellow, IEEE, and Luoqing Li
Abstract—As an efficient sparse representation algorithm, orthogonal matching pursuit (OMP) has attracted massive attention in recent years. However, OMP and most of its variants estimate the sparse vector using the mean square error (MSE) criterion, which relies on the assumption that the error distribution is Gaussian. A violation of this assumption, e.g., non-Gaussian noise, may lead to performance degradation. In this paper, a correntropy matching pursuit (CMP) method is proposed to alleviate this problem of OMP. Unlike many other matching pursuit methods, our method is independent of the error distribution. We show that CMP can adaptively assign small weights to severely corrupted entries of the data and large weights to clean ones, thus reducing the effect of large noise. Our other contribution is to develop a robust sparse representation-based recognition method built on CMP. Experiments on synthetic and real data show the effectiveness of our method for both sparse approximation and pattern recognition, especially for noisy, corrupted, and incomplete data.

Index Terms—Correntropy, matching pursuit (MP), representation-based classifier (RC), robust recognition, sparse representation.
I. INTRODUCTION

Orthogonal matching pursuit (OMP) is a popular sparse representation method and has been widely used in pattern recognition [1]–[4] and signal processing [5]–[7]. The major advantages that make OMP attractive are its simplicity, low complexity, and competitive performance. Another widely used method for sparse representation is $\ell_1$ minimization, also known as the lasso [8] or basis pursuit [9]. While the lasso enjoys striking theoretical properties under appropriate conditions, most algorithms for solving it incur a heavy computational burden [10]. In comparison, OMP is more efficient and easier to implement.
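As a quick illustration of this tradeoff (not an experiment from the paper), the sketch below recovers a synthetic sparse signal with both approaches using scikit-learn's off-the-shelf solvers; the problem sizes and the names `Phi` and `x0` are arbitrary illustrative choices.

```python
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit, Lasso

rng = np.random.default_rng(0)
m, n, k = 64, 256, 8                       # underdetermined system, k-sparse signal
Phi = rng.standard_normal((m, n))
Phi /= np.linalg.norm(Phi, axis=0)         # unit-norm atoms
x0 = np.zeros(n)
x0[rng.choice(n, size=k, replace=False)] = rng.standard_normal(k)
y = Phi @ x0                               # noiseless measurements

# OMP: greedy, only needs the target sparsity level
omp = OrthogonalMatchingPursuit(n_nonzero_coefs=k, fit_intercept=False).fit(Phi, y)

# Lasso: convex l1-regularized least squares, solved iteratively
lasso = Lasso(alpha=1e-3, fit_intercept=False, max_iter=100000).fit(Phi, y)

print("OMP error:  ", np.linalg.norm(omp.coef_ - x0))
print("Lasso error:", np.linalg.norm(lasso.coef_ - x0))
```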
Manuscript received September 4, 2015; revised January 23, 2016; accepted March 18, 2016. This work was supported in part by the Research Grants of the University of Macau under Grant MYRG2015-00049-FST, Grant MYRG2015-00050-FST, and Grant RDG009/FST-TYY/2012, in part by the Science and Technology Development Fund of Macau under Grant 100-2012-A3 and Grant 026-2013-A, in part by the Macau-China Joint Project under Grant 008-2014-AMJ, and in part by the National Natural Science Foundation of China under Grant 61273244 and Grant 11371007. This paper was recommended by Associate Editor W. X. Zheng.

Y. Wang and Y. Y. Tang are with the Faculty of Science and Technology, University of Macau, Macau 999078, China (e-mail: [email protected]; [email protected]).

L. Li is with the Faculty of Mathematics and Statistics, Hubei University, Wuhan 430062, China (e-mail: [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TCYB.2016.2544852
Inspired by OMP, a variety of variants have been developed to improve its efficiency and performance. For example, Wang et al. [11] proposed a generalized OMP (GOMP) method that improves how OMP identifies atoms, i.e., the columns of the measurement matrix. Specifically, unlike OMP, which identifies one atom in each iteration, GOMP selects the multiple atoms most correlated with the residual. Analogously, Needell and Vershynin [12] devised a regularized OMP (ROMP) approach that utilizes a predefined regularization rule. Unlike GOMP and ROMP, the compressive sampling matching pursuit (CoSaMP) method [13] refines OMP by incorporating an additional pruning step.

Most of the matching pursuit (MP) methods above utilize the mean square error (MSE) criterion and solve a least squares (LSs) problem to approximate the target sparse vector in each iteration. However, MSE is known to depend heavily on the assumption that the error distribution is Gaussian [14]–[17]. A violation of this assumption, such as impulsive noise or outliers, may lead to severe performance degradation. Here, impulsive noise is a category of noise that consists of short-duration noise pulses and is characterized by a small percentage of samples having very large values [17]. Additionally, these MP methods use the correlation between the atoms and the residual to identify important atoms in each iteration. When the measurement vector is contaminated with severe noise, the identified atoms may also be inaccurate.

To alleviate these limitations, we develop a correntropy-based MP method. The correntropy [15] between two random variables $X$ and $Y$ is defined as

$$V(X, Y) := \mathbb{E}[\kappa_\sigma(X - Y)]$$

where $\mathbb{E}$ denotes the mathematical expectation and $\kappa_\sigma(\cdot)$ is the Gaussian kernel with bandwidth $\sigma$, which will be introduced in detail in later sections. Specifically, we propose to minimize the correntropy induced metric (CIM) of the residual in the estimation step and to compute weighted residuals. Unlike most existing MP methods, our scheme is agnostic to the error distribution.
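To make the criterion concrete, the following is a minimal numerical sketch of the sample correntropy, the CIM, and the entrywise weights they induce. It assumes the common unnormalized Gaussian kernel $\kappa_\sigma(x) = \exp(-x^2/(2\sigma^2))$ and the CIM form of [15]; the exact kernel normalization introduced later in the paper may differ, and the residual vector below is arbitrary illustrative data.

```python
import numpy as np

def gaussian_kernel(x, sigma):
    # kappa_sigma(x) = exp(-x^2 / (2 sigma^2)); normalization constant omitted
    return np.exp(-np.square(x) / (2.0 * sigma ** 2))

def correntropy(a, b, sigma):
    # Sample estimate of V(A, B) = E[kappa_sigma(A - B)]
    return np.mean(gaussian_kernel(a - b, sigma))

def cim(e, sigma):
    # Correntropy induced metric of a residual vector e [15]:
    # CIM(e) = sqrt(kappa_sigma(0) - (1/m) * sum_i kappa_sigma(e(i)))
    return np.sqrt(gaussian_kernel(0.0, sigma) - np.mean(gaussian_kernel(e, sigma)))

# Entries with large residuals receive exponentially small weights; this is
# how a correntropy criterion suppresses impulsive noise and outliers.
e = np.array([0.1, -0.2, 0.05, 8.0])   # last entry mimics a grossly corrupted sample
print(gaussian_kernel(e, sigma=1.0))   # ~[0.995, 0.980, 0.999, 1.3e-14]
print(cim(e, sigma=1.0))
```

Note how the clean entries keep weights near one while the corrupted entry is effectively ignored, which is the behavior exploited by the weighted residuals described above.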
The contributions of this paper are summarized as follows.

1) We present a correntropy matching pursuit (CMP) method for sparse representation. In the presence of non-Gaussian noise, it is shown that CMP improves on OMP and most of its variants with notable performance gains.

2) With the devised optimization method, we obtain as a byproduct a weight vector indicating the importance of the measurements. The weight vector is used to further improve CMP in identifying accurate atoms.

3) A CMP-based classifier (CMPC) is developed in the framework of the representation-based classifier (RC) [3]. Our classifier improves on the standard sparse representation-based classifier (SRC) [18] in computing both the representation vectors and the class-dependent residuals. This makes CMPC more attractive for classifying noisy, corrupted, and incomplete data.

The remainder of this paper is structured as follows. In Section II, we briefly review some related works. Section III describes the proposed method along with its analysis. In Section IV, we verify the efficacy of the CMP method for sparse representation through experiments on synthetic data. In Section V, several experiments on popular real-world databases are conducted to evaluate CMPC against related state-of-the-art methods. Finally, Section VI concludes this paper.

II. RELATED WORKS

Before formulating the problem, we first introduce the notation used in this paper. Vectors and matrices are represented by boldface lowercase and boldface capital letters, respectively. For any vector $\mathbf{x}$, we use $x(i)$ to denote its $i$th entry. The notation $\mathbf{x}|_I$ denotes the subvector of $\mathbf{x} \in \mathbb{R}^n$ with entries indexed by the set $I \subset \Omega = \{1, 2, \ldots, n\}$, and the complementary set of $I$ is denoted $I^c = \Omega - I$. Table I summarizes the key acronyms and notations used in this paper.

TABLE I
ACRONYMS AND NOTATIONS

Let $\Phi \in \mathbb{R}^{m \times n}$ ($m < n$) and $\mathbf{x}_0 \in \mathbb{R}^n$ be the measurement matrix and the sparse vector, respectively. The measurement vector $\mathbf{y} \in \mathbb{R}^m$ is formulated as

$$\mathbf{y} = \Phi \mathbf{x}_0 + \mathbf{e} \tag{1}$$

where $\mathbf{e} \in \mathbb{R}^m$ denotes the noise vector.

A. OMP [6]

First, we initialize the residual $\mathbf{r}_0 = \mathbf{y}$, the index set $\Lambda_0 = \emptyset$, and the iteration counter $k = 1$. In the $k$th iteration, OMP first identifies the column of $\Phi$ most correlated with the residual

$$\lambda_k = \arg\max_{i \in \Omega} |\langle \mathbf{r}_{k-1}, \boldsymbol{\varphi}_i \rangle|$$

where $\mathbf{r}_{k-1}$ denotes the residual computed in the $(k-1)$th iteration and $\boldsymbol{\varphi}_i$ represents the $i$th column of $\Phi$. The index set is augmented with the selected index, $\Lambda_k = \Lambda_{k-1} \cup \{\lambda_k\}$. Then, a new approximation $\mathbf{x}_k$ supported on $\Lambda_k$ is obtained by solving an LSs problem with the MSE criterion

$$\mathbf{x}_k = \arg\min_{\mathrm{supp}(\mathbf{x}) \subseteq \Lambda_k} \|\mathbf{y} - \Phi\mathbf{x}\|_2 \tag{2}$$

where $\mathrm{supp}(\mathbf{x})$ denotes the support set of $\mathbf{x}$. The updated residual in the $k$th iteration is $\mathbf{r}_k = \mathbf{y} - \Phi\mathbf{x}_k$. Finally, we check a stopping criterion. Two common stopping criteria are reaching the sparsity level $K_0$ of the ideal signal $\mathbf{x}_0$ and an error tolerance on the residual, respectively. If the stopping criterion is met, we output $\mathbf{x}_k$ as the estimate of $\mathbf{x}_0$; otherwise, we set $k = k + 1$ and proceed to the next iteration.

Using the mutual coherence, Donoho et al. [19] proved a recovery error bound for OMP under small noise magnitude. Let $M(\Phi)$ be the mutual coherence of the dictionary $\Phi$, i.e., $M(\Phi) := \max_{1 \le i < j \le n} |\langle \boldsymbol{\varphi}_i, \boldsymbol{\varphi}_j \rangle|$.
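To make the above steps concrete, here is a compact NumPy sketch of the OMP loop and of the mutual coherence computation. It is an illustrative reimplementation under the stated assumptions (unit-norm atoms, sparsity-level and tolerance stopping rules), not the authors' code.

```python
import numpy as np

def omp(Phi, y, K0, tol=1e-6):
    """Greedy OMP following the steps above: identify, augment, estimate
    by least squares on the current support (2), then update the residual."""
    m, n = Phi.shape
    r = y.copy()                                   # r_0 = y
    support = []                                   # Lambda_0 = empty set
    x = np.zeros(n)
    for _ in range(K0):                            # sparsity-level stopping rule
        lam = int(np.argmax(np.abs(Phi.T @ r)))    # atom most correlated with r
        support.append(lam)                        # Lambda_k = Lambda_{k-1} U {lam}
        coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        x = np.zeros(n)
        x[support] = coef                          # x_k supported on Lambda_k
        r = y - Phi @ x                            # r_k = y - Phi x_k
        if np.linalg.norm(r) < tol:                # error-tolerance stopping rule
            break
    return x

def mutual_coherence(Phi):
    # M(Phi) = max_{i < j} |<phi_i, phi_j>|, assuming unit-norm columns
    G = np.abs(Phi.T @ Phi)
    np.fill_diagonal(G, 0.0)
    return G.max()
```

The least squares step is the place where the MSE criterion enters; CMP, described later, replaces exactly this step with a correntropy-based objective.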