Face Recognition System for Increased Feature Extraction

Efficient feature selection for improved face recognition and biometric authentication

by Baljeet Kaur*, Dr. Kalpna Midha

- Published in Journal of Advances and Scholarly Researches in Allied Education, E-ISSN: 2230-7540

Volume 12, Issue No. 23, Oct 2016, Pages 408 - 413 (6)

Published by: Ignited Minds Journals


ABSTRACT

Feature extraction is the most vital stage in pattern recognition and data mining. In this stage, a meaningful feature subset is extracted from the original data by applying certain rules. For reliable recognition, it is desirable to extract an appropriate feature space, since not all extracted features contribute positively to classification. In this paper, several feature extraction methods and algorithms were studied and compared, and ways of improving feature selection through dimension reduction were explained. It was concluded that only a small number of features is usually required and selected for optimized compression, so that a huge amount of data can be reduced to a relatively small set that is computationally faster to process. Hence, efficient feature selection is a key step towards efficient face recognition and biometric authentication.

KEYWORDS

face recognition system, feature extraction, pattern recognition, data mining, feature subset, classification, feature selection, dimension reduction, optimized compression, biometric authentication

INTRODUCTION

Feature extraction is the application of extraction algorithms to digital images in order to reduce the redundancy and irrelevancy present in the image. The main goals of feature extraction are to reduce machine training time and space complexity by achieving dimension reduction (Khokher et al., 2015). Feature extraction algorithms transform the input data into a set of features while selecting the features that carry the most relevant information from the original data. Feature extraction maintains acceptable classification accuracy while removing as many irrelevant features as possible. It is of great importance in data analysis, pattern classification, biometrics, computer vision, multimedia information retrieval, remote sensing, machine learning, medical data processing, and data mining applications. There are two common approaches to extracting facial features: geometric feature-based methods and appearance-based (holistic) methods.

Geometric Feature Based Methods

The feature-based or analytic approach computes a set of geometrical features of facial components such as the eyes, mouth, and nose. In this representation, the outline of the face and the positions of the different facial features form a feature vector. For a good extraction process, the feature points are usually chosen in terms of their reliability for automatic extraction and their significance for face representation. The locations of these points are then used to compute the geometrical relationships between them. Such a system is insensitive to position variations in the image. In short, geometric features represent the shape and locations of facial components, which are extracted to form a feature vector that represents the face.
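To make the idea concrete, the sketch below builds a geometric feature vector from a handful of landmark positions. It is a minimal illustration under stated assumptions: the landmark names and coordinates are hypothetical, landmark detection itself is not shown, and normalizing by the inter-ocular distance is one common choice rather than the method of any cited work.

```python
import itertools

import numpy as np


def geometric_feature_vector(landmarks):
    """Pairwise landmark distances, normalized by the inter-ocular distance.

    landmarks: dict mapping a landmark name to its (x, y) pixel position.
    Must contain "left_eye" and "right_eye" (a hypothetical naming convention).
    """
    names = sorted(landmarks)
    pts = {n: np.asarray(landmarks[n], dtype=float) for n in names}
    # Using distances between points removes the effect of translation;
    # dividing by the eye distance removes the effect of uniform scaling.
    scale = np.linalg.norm(pts["left_eye"] - pts["right_eye"])
    return np.array([np.linalg.norm(pts[a] - pts[b]) / scale
                     for a, b in itertools.combinations(names, 2)])


# Hypothetical landmark positions (pixels) for one face
face = {"left_eye": (30, 40), "right_eye": (70, 40),
        "nose_tip": (50, 60), "mouth_center": (50, 80)}
v = geometric_feature_vector(face)

# The same face shifted and scaled by 2 yields the same feature vector
moved = {k: (2 * x + 15, 2 * y - 7) for k, (x, y) in face.items()}
print(np.allclose(v, geometric_feature_vector(moved)))  # True
```

Angles between landmark triples could be appended to the vector in the same way.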

Holistic Based Methods

In holistic or appearance-based methods, the global properties of the human face pattern are considered. Unlike feature-based methods, which use a few points from different regions of the face, holistic methods recognize the whole face region. Commonly, holistic methods encode the pixel intensity array representation of faces without detecting any facial features. This class of methods is more widely applicable and easier to implement than geometric feature-based methods, since detection of geometric facial features is not required. Holistic methods rely on techniques that convert the image into a low-dimensional feature space with improved discriminating power. This is necessary because, in a high-dimensional feature space, the distances from a given probe to its nearest and farthest neighbors may become indistinguishable. Like most natural signals, face images contain significant redundancies or statistical regularities. Therefore, several dimensionality reduction frameworks that exploit these statistical regularities have been developed to discover low-dimensional representations of human face images. With appearance-based methods, image filters, such […] 2014).
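The claim that nearest and farthest neighbors become indistinguishable in high dimensions can be checked numerically. The sketch below uses uniformly random synthetic points rather than face images (an assumption made only for illustration) and reports a "relative contrast", a name introduced here for the farthest-minus-nearest distance divided by the nearest distance, from one probe to a fixed-size dataset.

```python
import numpy as np


def relative_contrast(dim, n_points=1000, seed=0):
    """(d_max - d_min) / d_min for one random probe against random points.

    Points and probe are drawn uniformly from the unit hypercube [0, 1]^dim
    (synthetic data, not faces).
    """
    rng = np.random.default_rng(seed)
    data = rng.random((n_points, dim))
    probe = rng.random(dim)
    dist = np.linalg.norm(data - probe, axis=1)
    return (dist.max() - dist.min()) / dist.min()


for dim in (2, 10, 100, 1000):
    print(f"dim={dim:5d}  relative contrast={relative_contrast(dim):.3f}")
```

In two dimensions the nearest point is many times closer than the farthest one, whereas in 1000 dimensions all points lie at almost the same distance from the probe, which is why holistic methods first project faces into a low-dimensional space.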

Appearance-based algorithms include Principal Component Analysis (PCA), Independent Component Analysis (ICA), Locality Preserving Projections (LPP), Linear Discriminant Analysis (LDA), Gabor wavelets, Local Binary Patterns (LBP), and the Discrete Cosine Transform (DCT). Additionally, to form a complete face recognition system, two or more algorithms are usually combined into a hybrid approach, which achieves high recognition rates under many conditions.

NEW ALGORITHMS FOR FACE RECOGNITION

Based on how the face is represented, existing face recognition techniques can be broadly divided into four categories: 1) appearance-based methods, explained above, which use holistic texture features; 2) model-based methods, which employ the shape and texture of the face along with 3D depth information; 3) template-based methods; and 4) techniques using neural networks. A summary of these techniques is illustrated in Figure 1.

Figure 1: Classification of face recognition methods

Line Edge Map

Humans recognize rough line drawings quickly and accurately, and the Line Edge Map (LEM) approach builds on this ability. LEM extracts lines from a face edge map as features, based on a combination of geometrical feature matching and template matching. Edge images of objects can be used for object recognition and achieve accuracy similar to that of gray-level images (Kaushik et al., 2014). The Sobel edge detection algorithm was used to encode faces into binary edge maps, and the Hausdorff distance was chosen to measure the similarity between the edge maps of two faces. This distance is calculated without an […]; the approach is invariant to illumination and shows high recognition performance using template matching. In addition, the LEM face representation integrates spatial information with the structural information of a face image by grouping the pixels of face edge maps into line segments, which improves recognition accuracy. After the edge map is thinned, a polygonal line fitting process is applied to generate the LEM of the face. The LEM representation keeps only the end points of the line segments on curves, which further reduces storage requirements. Since the LEM is an intermediate-level image representation derived from a low-level edge map representation, it is less sensitive to illumination changes. The basic unit of the LEM is the line segment, grouped from pixels of the edge map. To match the LEMs of faces, a line segment Hausdorff distance (LHD) measure is used. The LHD is a shape comparison measure based on LEMs, defined as a distance between two line sets. The LHD has better discriminative power because it uses additional structural attributes: line orientation, line-point association, and number disparity in obtaining the LEM.
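As a hedged illustration of the matching step, the sketch below computes the plain point-set Hausdorff distance between the edge pixels of two binary edge maps. It does not implement the line segment Hausdorff distance (LHD), which works on line segments and adds orientation, line-point association, and number-disparity terms; the toy edge maps are assumptions. The brute-force pairwise matrix uses O(|A||B|) memory, which is acceptable for small edge maps; SciPy's `scipy.spatial.distance.directed_hausdorff` is an alternative for larger point sets.

```python
import numpy as np


def edge_points(edge_map):
    """(row, col) coordinates of the nonzero pixels of a binary edge map."""
    return np.argwhere(edge_map).astype(float)


def directed_hausdorff(a, b):
    """h(A, B): largest distance from a point of A to its nearest point of B."""
    pairwise = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return pairwise.min(axis=1).max()


def hausdorff(a, b):
    """Symmetric Hausdorff distance H(A, B) = max(h(A, B), h(B, A))."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


# Two toy 5x5 edge maps: a horizontal edge, and the same edge one row lower
e1 = np.zeros((5, 5), dtype=bool)
e1[2, 0:4] = True
e2 = np.zeros((5, 5), dtype=bool)
e2[3, 0:4] = True
print(hausdorff(edge_points(e1), edge_points(e2)))  # 1.0
```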

Elastic Graph Matching

Elastic Graph Matching (EGM) extends the dynamic link architecture method to increase matching accuracy on larger databases and to handle larger variations in pose. EGM employs object-adaptive graphs whose nodes refer to fiducial points on the face, such as the pupils, the corners of the mouth, and the tip of the nose. The goal of EGM on a test image is to find the fiducial points and thus extract from the image the graph that maximizes the similarity. It uses the phase of the complex Gabor wavelet coefficients to achieve a more accurate location of the nodes and to disambiguate patterns that would be similar in their coefficient magnitudes. A set of 25 facial landmarks was localized using the Elastic Bunch Graph Matching framework (Déniz et al., 2011). Morphological elastic graph matching has been proposed as a further improvement.
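EGM compares graphs node by node through the Gabor "jets" (vectors of complex filter coefficients) attached to each node. The sketch below shows only the simpler magnitude-based jet similarity often used for this comparison; the phase-sensitive similarity and the elastic search over node positions described above are not reproduced. The jet length of 40 (5 scales by 8 orientations) and the random jets are assumptions for illustration.

```python
import numpy as np


def jet_similarity(jet_a, jet_b):
    """Magnitude-based similarity of two Gabor jets, in [0, 1].

    A jet is the 1-D array of complex Gabor coefficients taken at one graph
    node (one entry per filter scale and orientation). Only magnitudes are
    compared, so changes that only rotate the phases do not affect the score.
    """
    a, b = np.abs(jet_a), np.abs(jet_b)
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))


def graph_similarity(graph_a, graph_b):
    """Mean jet similarity over corresponding nodes of two face graphs."""
    return float(np.mean([jet_similarity(ja, jb)
                          for ja, jb in zip(graph_a, graph_b)]))


rng = np.random.default_rng(0)
n_coeffs = 40  # assumed: 5 scales x 8 orientations
mags = rng.random(n_coeffs) + 0.1
jet1 = mags * np.exp(1j * rng.uniform(0, 2 * np.pi, n_coeffs))
jet2 = mags * np.exp(1j * rng.uniform(0, 2 * np.pi, n_coeffs))  # same magnitudes, new phases
jet3 = (rng.random(n_coeffs) + 0.1) * np.exp(1j * rng.uniform(0, 2 * np.pi, n_coeffs))
print(round(jet_similarity(jet1, jet2), 6))  # 1.0
```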

Neural Networks

Neural Network (NN) approaches have been widely explored for feature representation and face recognition, and face recognition based on hybrid neural networks has been proposed (El-Dahshan et al., 2010). In general, NN approaches encounter problems when the number of classes (i.e., individuals) increases. Moreover, they are not suitable for single-model-image recognition tasks, because multiple model images per person are necessary to train the system, and as the number of persons increases, the computational expense becomes more demanding. However, their distributed computing principles are relatively easy to implement on parallel computers. It was reported that NN-based face recognition could recognize up to 200 people and achieve up to a 96% correct recognition rate in approximately 1 second.

Support Vector Machines

Support Vector Machines (SVMs) are an active research topic in the machine learning community, generating enthusiasm similar to that for artificial neural networks (Meyer & Wien, 2014). LIBSVM, a library for support vector machines, has been under active development since 2000, with the aim of helping users easily apply SVMs in their applications. The SVM tries to find the optimal separating hyperplane that maximizes the margin of separation, in order to minimize the risk of misclassification for both the training samples and the unseen data in the test set. The resulting classifier is defined by a weighted combination of a small subset of the training vectors, called support vectors (Chang and Lin, 2011). The class-separating hyperplane is chosen to minimize the expected classification error. Estimating the optimal hyperplane is equivalent to solving a linearly constrained quadratic programming problem. The SVM algorithm is applied to the classification of facial expression characteristics in Hemalatha and Sumathi (2014). SVM training can be considered a new paradigm for training polynomial, radial basis function, and neural network classifiers, and it operates on a different induction principle, called structural risk minimization, which aims at minimizing an upper bound on the expected generalization error. By contrast, most methods for training a classifier are based on minimizing the training error (i.e., empirical risk).
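The margin-maximization idea can be sketched in a few lines. Rather than solving the quadratic program mentioned above (as LIBSVM does), this minimal example trains a linear soft-margin SVM with Pegasos, a stochastic sub-gradient method on the equivalent hinge-loss objective; this swapped-in solver, the regularization value `lam`, and the synthetic two-class data are assumptions for illustration only.

```python
import numpy as np


def train_linear_svm(X, y, lam=0.01, epochs=20, seed=0):
    """Linear soft-margin SVM trained with Pegasos (stochastic sub-gradient).

    Minimizes lam/2 * ||w||^2 + mean_i max(0, 1 - y_i * <w, [x_i, 1]>).
    A constant 1 is appended to every sample so the last weight acts as the
    bias; regularizing it too is a simplification assumed here.
    X: (n_samples, n_features); y: labels in {-1, +1}.
    """
    rng = np.random.default_rng(seed)
    Xa = np.hstack([X, np.ones((len(X), 1))])
    w = np.zeros(Xa.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(Xa)):
            t += 1
            eta = 1.0 / (lam * t)                 # Pegasos step size
            in_margin = y[i] * (Xa[i] @ w) < 1    # hinge loss is non-zero
            w *= 1.0 - eta * lam                  # step on the regularizer
            if in_margin:
                w += eta * y[i] * Xa[i]           # step on the hinge loss
    return w


def decision_function(X, w):
    return X @ w[:-1] + w[-1]


def predict(X, w):
    return np.where(decision_function(X, w) >= 0, 1, -1)


# Two well-separated synthetic classes (an assumption for the demo)
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-3.0, 0.5, (50, 2)), rng.normal(3.0, 0.5, (50, 2))])
y = np.array([-1] * 50 + [1] * 50)
w = train_linear_svm(X, y)
print((predict(X, w) == y).mean())       # 1.0
# Training points on or inside the margin play the role of support vectors
margin_points = np.flatnonzero(y * decision_function(X, w) <= 1.0)
```

Because Pegasos returns an approximate solution, `margin_points` only approximates the exact support vector set that a QP solver would report.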

Hidden Markov Models

Hidden Markov Models (HMMs) are one of the most widely used statistical tools for modeling discrete time series. HMMs have been used successfully to model temporal information in many speech and image applications. HMMs have also been used to perform face recognition in video signals, treating each frame of the video sequence as an observation. When a traditional 1D Markov chain is used to model a 2D image signal, the signal has to be transformed into a 1D observation sequence. It has also been noted that learning HMMs from data is, in general, computationally hard (Hsu et al., 2012).
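The sketch below shows how an HMM scores a 1D observation sequence with the forward algorithm, and one hypothetical way of turning a 2D image into such a sequence. The strip-mean quantization in `image_to_observations` and all model parameters are made-up assumptions; practical face HMMs typically use richer per-strip features, and parameter learning (for example Baum-Welch) is not shown. A brute-force sum over all state paths is included only to check the result.

```python
import itertools

import numpy as np


def forward_likelihood(obs, pi, A, B):
    """P(obs | model) for a discrete HMM, via the forward algorithm.

    pi: (N,) initial state probabilities
    A:  (N, N) transitions, A[i, j] = P(next state j | current state i)
    B:  (N, M) emissions,   B[i, k] = P(symbol k | state i)
    """
    alpha = pi * B[:, obs[0]]           # alpha_1(i) = pi_i * b_i(o_1)
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]   # alpha_{t+1}(j) = sum_i alpha_t(i) a_ij * b_j(o)
    return float(alpha.sum())


def brute_force_likelihood(obs, pi, A, B):
    """Same quantity by summing over every state path (exponential cost)."""
    total = 0.0
    for path in itertools.product(range(len(pi)), repeat=len(obs)):
        p = pi[path[0]] * B[path[0], obs[0]]
        for t in range(1, len(obs)):
            p *= A[path[t - 1], path[t]] * B[path[t], obs[t]]
        total += p
    return total


def image_to_observations(img, strip_height, n_levels):
    """Hypothetical 2-D to 1-D conversion: scan horizontal strips from top to
    bottom and quantize each strip's mean intensity (0-255) into n_levels symbols."""
    n_strips = img.shape[0] // strip_height
    strips = img[:n_strips * strip_height].reshape(n_strips, strip_height, -1)
    means = strips.mean(axis=(1, 2))
    return np.minimum((means / 256 * n_levels).astype(int), n_levels - 1).tolist()


pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
img = np.zeros((8, 4))
img[4:] = 255                      # dark top half, bright bottom half
obs = image_to_observations(img, strip_height=2, n_levels=3)
print(obs)                         # [0, 0, 2, 2]
print(np.isclose(forward_likelihood(obs, pi, A, B),
                 brute_force_likelihood(obs, pi, A, B)))  # True
```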

Discrete Cosine Transform

Discrete Cosine Transform (DCT) is one of the transform-based extraction techniques. The DCT was introduced by Ahmed et al. (1974) in the early seventies. Since then, it has become very popular, and several versions of it have been proposed (Rao & Yip, 1990). A few DCT coefficients are used to reduce redundancy and recover […]. The performance of the DCT is largely affected by changes in the magnitudes of the coefficients at the top-left corner of the coefficient matrix, which vary with illumination. Thus, illumination normalization before feature extraction compensates for these coefficient variations and gives the blocks a more equalized intensity. After the DCT is applied to the input image, the DCT coefficients are traced in zig-zag order to convert the coefficient matrix into a feature vector (Sandeep et al., 2011). The DCT has long been a popular technique in signal and image processing. It has very good energy compaction: most of the information tends to concentrate in the low-frequency coefficients at the upper-left part of the DCT coefficient matrix. It has low computational complexity and is very effective for recognition. The DCT exploits inter-pixel redundancies to render excellent decorrelation for most natural images. Moreover, each DCT coefficient can be encoded independently without compromising coding efficiency.
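The pipeline described above (2-D DCT, zig-zag scan, keep the first coefficients) can be sketched as follows. This is a minimal illustration: the orthonormal DCT-II is built from its basis matrix in plain NumPy to keep the block self-contained (SciPy's `scipy.fft.dctn` with `norm="ortho"` computes the same transform), illumination normalization is not shown, and the 8x8 block and number of retained coefficients are illustrative assumptions.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II basis: C[k, i] = a_k * cos(pi * (2i + 1) * k / (2n))."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    C[0, :] = np.sqrt(1.0 / n)          # a_0 = sqrt(1/n), a_k = sqrt(2/n) otherwise
    return C


def dct2(block):
    """Separable 2-D DCT: transform the columns, then the rows."""
    block = np.asarray(block, dtype=float)
    return dct_matrix(block.shape[0]) @ block @ dct_matrix(block.shape[1]).T


def zigzag_indices(n):
    """(row, col) pairs of an n x n matrix in JPEG-style zig-zag order."""
    cells = [(r, c) for r in range(n) for c in range(n)]
    # Walk the anti-diagonals r + c = s; the direction alternates with s.
    return sorted(cells, key=lambda rc: (rc[0] + rc[1],
                                         rc[0] if (rc[0] + rc[1]) % 2 else -rc[0]))


def dct_features(block, n_coeffs):
    """First n_coeffs zig-zag-ordered DCT coefficients of a square block."""
    coeffs = dct2(block)
    order = zigzag_indices(len(coeffs))[:n_coeffs]
    return np.array([coeffs[r, c] for r, c in order])


ramp = np.add.outer(np.arange(8), np.arange(8)).astype(float)   # smooth 8x8 block
feats = dct_features(ramp, 3)
# Share of the block's energy captured by just 3 low-frequency coefficients
print(round(np.sum(feats**2) / np.sum(ramp**2), 3))
```

Because the transform is orthonormal, the total energy of the coefficients equals that of the pixels, so the printed share directly measures energy compaction.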

Discrete Wavelet Transform

The Discrete Wavelet Transform (DWT) has been used successfully in image processing since 1985, and it has been used to design pose-invariant face recognition (Demirel et al., 2010). Because it provides spatial and frequency representations of an image simultaneously, it is used for feature extraction. It decomposes the input data into several levels of resolution in space and frequency. It also isolates into certain sub-bands the frequency components introduced by intrinsic deformations (due to expression) or by extrinsic factors (such as illumination). Wavelet-based methods focus on the spatial or frequency sub-bands that contain the most relevant information, to better represent the data and aid in the classification of images, while pruning away the variable sub-bands. The Haar wavelet transform is a widely used technique (Kumar & Sood, 2012) that is simple and powerful for the multi-resolution decomposition of time series.
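A single level of the 2-D Haar transform, and its repetition on the low-pass band, can be sketched in plain NumPy. Keeping only the LL (approximation) band as a compact feature is an illustrative choice in the spirit of pruning the variable sub-bands, not a specific cited method; the random test image is an assumption, and sub-band naming (LH versus HL) differs between libraries.

```python
import numpy as np


def haar_dwt2(img):
    """One level of the orthonormal 2-D Haar wavelet transform.

    img must have even height and width. Returns (LL, LH, HL, HH), each half
    the size of img in both directions. LL is the low-pass approximation; the
    other three hold detail coefficients.
    """
    x = np.asarray(img, dtype=float)
    s = np.sqrt(2.0)
    lo = (x[:, 0::2] + x[:, 1::2]) / s     # low-pass along each row
    hi = (x[:, 0::2] - x[:, 1::2]) / s     # high-pass along each row
    LL = (lo[0::2] + lo[1::2]) / s         # then low-/high-pass along columns
    LH = (lo[0::2] - lo[1::2]) / s
    HL = (hi[0::2] + hi[1::2]) / s
    HH = (hi[0::2] - hi[1::2]) / s
    return LL, LH, HL, HH


def haar_approximation(img, levels):
    """Keep only the LL sub-band after `levels` decompositions
    (each side of img must be divisible by 2**levels)."""
    LL = np.asarray(img, dtype=float)
    for _ in range(levels):
        LL = haar_dwt2(LL)[0]
    return LL


img = np.random.default_rng(0).random((64, 64))
bands = haar_dwt2(img)
print([b.shape for b in bands])            # [(32, 32), (32, 32), (32, 32), (32, 32)]
print(haar_approximation(img, 3).shape)    # (8, 8)
```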

Gabor Filter Representation

The Gabor filter was first introduced by Dennis Gabor in 1946, and Gabor functions were later shown to model simple-cell receptive fields (Chelali & Djeradi, 2015). Gabor features are less sensitive to variations in illumination, pose, and expression than holistic features such as eigenfaces. Face representation using Gabor features has been a research interest in image processing, computer vision, and pattern recognition. Gabor filters exploit salient visual properties such as spatial localization, spatial frequency characteristics, and orientation selectivity. Besides, Gabor filters and the DCT are well-known techniques in facial recognition. Evidently, face […] performance for feature extraction, but it is time consuming and sensitive to non-linear distortions and rotations.
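The orientation selectivity mentioned above can be illustrated with a small filter bank. The sketch below builds the real part of 2-D Gabor kernels at eight orientations and applies them to a synthetic stripe image; the kernel size, sigma, wavelength, aspect ratio, and number of orientations are illustrative assumptions, and a full face system would typically use complex filters at several scales.

```python
import numpy as np


def gabor_kernel(size, sigma, theta, wavelength, gamma=0.5, psi=0.0):
    """Real part of a 2-D Gabor filter on a (2*size+1) x (2*size+1) grid.

    theta: orientation in radians; wavelength: period of the carrier in
    pixels; gamma: spatial aspect ratio of the Gaussian; psi: phase offset.
    """
    y, x = np.mgrid[-size:size + 1, -size:size + 1].astype(float)
    xr = x * np.cos(theta) + y * np.sin(theta)      # rotated coordinates
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr**2 + (gamma * yr)**2) / (2 * sigma**2))
    carrier = np.cos(2 * np.pi * xr / wavelength + psi)
    return envelope * carrier


def gabor_bank(size=7, sigma=3.0, wavelength=6.0, n_orient=8):
    """Kernels at n_orient evenly spaced orientations in [0, pi).
    The default parameter values are illustrative assumptions."""
    return [gabor_kernel(size, sigma, k * np.pi / n_orient, wavelength)
            for k in range(n_orient)]


def filter_response(img, kernel):
    """Correlate img with kernel over the 'valid' region (no padding)."""
    windows = np.lib.stride_tricks.sliding_window_view(img, kernel.shape)
    return np.einsum("ijkl,kl->ij", windows, kernel)


# Synthetic image of vertical stripes with a 6-pixel period
stripes = np.tile(np.cos(2 * np.pi * np.arange(64) / 6.0), (64, 1))
energies = [np.abs(filter_response(stripes, k)).mean() for k in gabor_bank()]
print(int(np.argmax(energies)))   # 0: the theta = 0 filter responds most strongly
```

Repeating this for every filter in the bank and concatenating (possibly downsampled) response magnitudes gives a Gabor feature vector; its size grows with the number of filters, which is one source of the computational cost noted above.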

Principal Component Analysis

Principal Component Analysis (PCA) and other statistics-based feature extraction approaches, such as ICA and LDA, make use of algebraic methods. The PCA algorithm identifies patterns in data and expresses the data in a way that highlights their similarities and differences. In face recognition, it expresses the large 1-D vector of pixels constructed from a 2-D facial image in terms of the compact principal components of the feature space. It thus finds the vectors that best account for the distribution of face images within the image space. These vectors define the subspace of face images, which is called the face space. The eigenfaces are the principal components of the distribution of faces. The main advantage of PCA is data compression, achieved by reducing the number of dimensions without much loss of information (Mahto & Yadav, 2014). The eigenspace is calculated by identifying the eigenvectors of the covariance matrix derived from a set of facial images (vectors). Zhiguo and Xuehong (2010) suggested a Weighted Principal Component Analysis (WPCA) based on multi-feature fusion. Kernel PCA is also an eigenvector-based method, but it uses a nonlinear mapping.
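The eigenface computation described above can be sketched as follows. This minimal example obtains the leading eigenvectors of the covariance matrix from the SVD of the mean-centered data, which yields the same vectors without forming the large covariance matrix; the synthetic "faces" (random vectors that vary along only three hidden directions) and the nearest-neighbor matching in face space are assumptions for illustration.

```python
import numpy as np


def fit_eigenfaces(faces, n_components):
    """faces: (n_images, n_pixels), one flattened face image per row.

    Returns the mean face, the top n_components eigenfaces (as rows) and the
    fraction of variance each one explains. The eigenfaces are the leading
    eigenvectors of the pixel covariance matrix, taken here from the SVD of
    the centered data.
    """
    mean = faces.mean(axis=0)
    _, s, vt = np.linalg.svd(faces - mean, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    return mean, vt[:n_components], explained[:n_components]


def project(faces, mean, eigenfaces):
    """Coordinates of each face in the low-dimensional face space."""
    return (faces - mean) @ eigenfaces.T


def reconstruct(weights, mean, eigenfaces):
    return mean + weights @ eigenfaces


def nearest_gallery_index(probe, gallery, mean, eigenfaces):
    """Index of the gallery face closest to the probe in face space."""
    g = project(gallery, mean, eigenfaces)
    p = project(probe[None, :], mean, eigenfaces)
    return int(np.argmin(np.linalg.norm(g - p, axis=1)))


# Synthetic stand-in for face data: 20 "images" of 50 pixels that vary
# along only 3 hidden directions (an assumption for the demo)
rng = np.random.default_rng(0)
faces = rng.standard_normal((20, 3)) @ rng.standard_normal((3, 50)) + 5.0
mean, eig, explained = fit_eigenfaces(faces, n_components=3)
w = project(faces, mean, eig)
print(w.shape)                                         # (20, 3)
print(np.allclose(reconstruct(w, mean, eig), faces))   # True
```

On real face images the data are not exactly low-rank, so the number of retained eigenfaces is a trade-off between compression and reconstruction error, which `explained` helps to judge.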

Linear Discriminant Analysis

Linear Discriminant Analysis (LDA) and the related Fisher's linear discriminant are methods used in statistics, pattern recognition, and machine learning to find a linear combination of features that separates two or more classes of objects or events. LDA is the dominant algorithm for feature selection in appearance-based methods; it yields an effective representation that linearly transforms the original data space into a low-dimensional feature space where the data is well separated (Mohammed & Gupta, 2013). The selected dataset should have many samples per class for good extraction of discriminating features. In a comparison of appearance-based statistical methods on color face images, LDA outperformed ICA under different illuminations, but it was also more sensitive than PCA and ICA to partial occlusions. LDA defines a projection that makes the within-class scatter small and the between-class scatter large. It suffers from the small sample size problem when dealing with high-dimensional data. Finding non-linear correlation models among different biometric features can improve the system, and it has been recommended that the intra-class correlation of different feature sets be further explored on extended biometric data sets (Soviany & Puscoci, 2013).
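For two classes, the projection that makes the within-class scatter small and the between-class scatter large has a closed form, w proportional to Sw^{-1}(m1 - m0). The sketch below computes it and compares the Fisher ratio of that direction with a few alternatives. The synthetic correlated Gaussian classes and the small ridge term `reg` (one common workaround, assumed here, for the singular Sw of the small sample size problem) are illustrative assumptions; multi-class LDA, which solves a generalized eigenproblem, is not shown.

```python
import numpy as np


def fisher_direction(X0, X1, reg=1e-6):
    """Two-class Fisher discriminant direction, w ~ Sw^{-1} (m1 - m0).

    reg adds a small ridge to the within-class scatter Sw so that it stays
    invertible when there are more dimensions than samples (assumed fix).
    """
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    Sw = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
    Sw = Sw + reg * np.eye(Sw.shape[0])
    w = np.linalg.solve(Sw, m1 - m0)
    return w / np.linalg.norm(w)


def fisher_ratio(w, X0, X1):
    """Between-class over within-class scatter of the 1-D projections."""
    p0, p1 = X0 @ w, X1 @ w
    between = (p1.mean() - p0.mean()) ** 2
    within = ((p0 - p0.mean()) ** 2).sum() + ((p1 - p1.mean()) ** 2).sum()
    return between / within


rng = np.random.default_rng(0)
cov = [[2.0, 1.5], [1.5, 2.0]]                  # strongly correlated features
X0 = rng.multivariate_normal([0.0, 0.0], cov, size=100)
X1 = rng.multivariate_normal([2.0, 0.0], cov, size=100)
w = fisher_direction(X0, X1)
for name, d in [("Fisher", w), ("x-axis", np.array([1.0, 0.0])),
                ("mean difference", X1.mean(0) - X0.mean(0))]:
    print(f"{name:16s} J = {fisher_ratio(d, X0, X1):.3f}")
```

Because the features are correlated, simply projecting onto the difference of the class means gives a lower ratio than the Fisher direction, which accounts for the shape of the within-class scatter.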

Independent Component Analysis

Independent Component Analysis (ICA) is a computational method for separating a multivariate signal into additive subcomponents, assuming that the subcomponents are non-Gaussian signals that are statistically independent of each other. ICA is a special case of blind source separation. It yields statistically independent basis images that are sparse and localized in space, resembling facial features.

Hough Transform

Hough Transform (HT) is one of the feature extraction techniques used in image analysis, computer vision, and digital image processing (Shapiro & Stockman, 2001). It was patented by Paul Hough in 1962 and assigned to the U.S. Atomic Energy Commission under the name "Method and Means for Recognizing Complex Patterns". The HT universally applied today was invented by Duda and Hart (1972), who called it a "generalized Hough transform". The transform was popularized in the computer vision community by Dana H. Ballard through a 1981 journal article titled "Generalizing the Hough transform to detect arbitrary shapes"; beyond straight lines, it is most commonly used to detect circles or ellipses. The use of the transform to detect straight lines in digital images is probably one of the most widely used procedures in computer vision (Sufyanu et al., 2015). Accordingly, recent studies have proposed face recognition using the HT.
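The line-detecting form of the transform can be sketched with the (rho, theta) parametrization, rho = x cos(theta) + y sin(theta): every edge pixel votes for all lines passing through it, and peaks in the accumulator correspond to lines. The 1-degree and 1-pixel resolutions and the toy edge map are assumptions; edge detection and non-maximum suppression of peaks are not shown.

```python
import numpy as np


def hough_lines(edges, theta_deg=None):
    """Vote in (rho, theta) space for lines rho = x cos(theta) + y sin(theta).

    edges: 2-D boolean edge map (rows are y, columns are x).
    Returns the accumulator (rows: rho + max_rho, columns: theta),
    the theta values in degrees, and max_rho.
    """
    if theta_deg is None:
        theta_deg = np.arange(-90, 90)          # 1-degree resolution (assumed)
    ys, xs = np.nonzero(edges)
    thetas = np.deg2rad(theta_deg)
    max_rho = int(np.ceil(np.hypot(*edges.shape)))
    acc = np.zeros((2 * max_rho + 1, len(thetas)), dtype=int)
    cols = np.arange(len(thetas))
    for x, y in zip(xs, ys):
        rhos = np.round(x * np.cos(thetas) + y * np.sin(thetas)).astype(int)
        acc[rhos + max_rho, cols] += 1          # one vote per theta for this pixel
    return acc, theta_deg, max_rho


def strongest_line(acc, theta_deg, max_rho):
    """(rho, theta in degrees) of the accumulator cell with the most votes."""
    r, t = np.unravel_index(np.argmax(acc), acc.shape)
    return int(r - max_rho), int(theta_deg[t])


edges = np.zeros((100, 10), dtype=bool)
edges[:, 3] = True                              # a vertical edge at column x = 3
acc, thetas, max_rho = hough_lines(edges)
print(strongest_line(acc, thetas, max_rho))     # (3, 0): rho = 3, theta = 0 degrees
```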

COMPARISON OF RELATED WORK

Face recognition is the process of authenticating users through their facial attributes. Feature-based methods have found increasing use in many applications such as object recognition, 3D reconstruction, and mosaicing (Gupta & George, 2014), and several face recognition algorithms are based on them. Such methods detect a set of geometrical features, such as the eyes, eyebrows, nose, and mouth of a face, and use the areas, distances, and angles between the feature points as descriptors for face recognition. However, there is no fixed set of points that gives the best performance; the important features are not generally known, and how to extract them depends on the algorithm. The performance of face recognition based on geometrical features depends on the accuracy of the feature location algorithm. Appearance-based methods have received much interest in face recognition since the 1990s. The algorithms in this class find the closest pattern by projecting an image into a subspace. The two most widely used approaches for dimensionality reduction and feature extraction are PCA and LDA (Xu et al., 2011). Many leading commercial face recognition products employ methods based on PCA or Karhunen–Loeve (KL) expansion techniques, such as eigenfaces and LFA. […] representation and face recognition for unsupervised dimensionality reduction; this approach outperforms KL when the data distribution is far from a multidimensional Gaussian. LDA is the dominant algorithm for feature selection in appearance-based methods; it linearly transforms the original data space into a low-dimensional feature space where the data is well separated, provided the dataset has many samples per class (Mohammed & Gupta, 2013).
Exploring non-linear correlation models among different biometric features has also been recommended as a way to improve such systems. Fisher Linear Discriminant Analysis aims to find a set of the most discriminative linear projections by maximizing the ratio of the determinant of the between-class scatter matrix to that of the within-class scatter matrix (Kan et al., 2011). Several kernel-based approaches that lead to additional improvements at a very affordable cost were proposed by Perronnin et al. (2010). PCA uses eigenvectors to determine basis vectors that retain the maximum variance of the images. Face recognition based on ICA has also been introduced; ICA is a generalization of PCA that is sensitive to higher-order statistics, not only second-order relationships, because much of the important information may be contained in the higher-order relationships. ICA yields a set of basis vectors with maximal statistical independence. Face recognition techniques based on neural networks (Agarwal et al., 2010) and elastic graph matching (Bhat & Wani, 2015) have also reported successful results. SVMs have been applied to face recognition by Meyer and Wien (2014) and to gender classification by Hu et al. (2014). An SVM finds the optimal separating hyperplane that maximizes the margin of separation, in order to minimize the risk of misclassification not only for the training samples but also for the unseen data in the test set. Based on a combination of geometrical feature matching and template matching, the LEM approach extracts lines from a face edge map as features. A nearest feature line (NFL) classifier was proposed to account for variations in illumination, pose, and expression within a face class. It finds the candidate person with the minimum distance between the feature point of the query image and any feature line connecting two prototype feature points of that person.
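The nearest feature line rule can be sketched directly from that description: project the query onto every line through two prototypes of the same person and pick the person whose line is closest. The 2-D toy feature points and labels are hypothetical (in practice the points would be, for example, eigenface coefficients). Because feature lines are infinite, they extrapolate the variation between two prototypes, which is how the method tolerates changes in illumination, pose, and expression.

```python
import itertools

import numpy as np


def feature_line_distance(q, x1, x2):
    """Distance from query q to the (infinite) feature line through x1 and x2."""
    d = x2 - x1
    t = (q - x1) @ d / (d @ d)      # position of q's projection along the line
    return float(np.linalg.norm(q - (x1 + t * d)))


def nfl_classify(q, prototypes):
    """prototypes: dict label -> (n_i, dim) array of that person's feature
    points, with n_i >= 2. Returns the label owning the nearest feature line."""
    best_label, best_dist = None, np.inf
    for label, P in prototypes.items():
        for i, j in itertools.combinations(range(len(P)), 2):
            dist = feature_line_distance(q, P[i], P[j])
            if dist < best_dist:
                best_label, best_dist = label, dist
    return best_label


# Hypothetical gallery: two prototypes per person in a 2-D feature space
gallery = {"A": np.array([[0.0, 0.0], [1.0, 0.0]]),
           "B": np.array([[0.0, 5.0], [1.0, 5.0]])}
print(nfl_classify(np.array([10.0, 1.0]), gallery))   # A
```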
The eigenface method has been a successful holistic approach to face recognition based on the Karhunen-Loeve Transform (KLT). The KLT produces an expansion of an input image in terms of a set of basis images, or eigen-images (Siddiqi et al., 2014). The KLT offers optimal energy compaction but only moderate compression, since its basis functions are source dependent; hence, the transform itself must be coded (Biswas et al., 2010). A technique for prediction-error block coding using the KLT has been proposed that does not require coding of the KLT bases; instead, the basis functions are derived at the decoder in the same manner as at the encoder. Moreover, holistic approaches to face recognition depend largely on techniques that reduce the dimension of images, and many of these techniques need to be combined to achieve illumination invariance or redundancy elimination. In this study, some of these face recognition techniques are improved to form a hybrid system.

CONCLUSION

Feature extraction determines an appropriate subspace of the given image. Feature selection can be improved through dimension reduction or feature compression. The number of features is kept as small as possible because this reduces classification error and measurement cost, and because the reduced features provide better decorrelation than the original features. It is concluded that only a small number of features is usually required and selected for optimized compression. Hence, recognition without dimension reduction or feature selection is not promising for large-scale systems. Accordingly, many hybrid systems enhance dimension reduction efficiently.

REFERENCES

1. Agarwal, M., Jain, N., Kumar, M. M., & Agrawal, H. (2010). Face Recognition using Eigen faces and Artificial Neural Network. International Journal of Computer Theory and Engineering, Vol. 2, No. 4, pp. 1793-8201.
2. Ahmed, N., Natarajan, T., & Rao, K. R. (1974). Discrete Cosine Transform. IEEE Transactions on Computers, Vol. 23, No. 1, pp. 90–93.
3. Bhat, F., & Wani, M. A. (2015). Elastic Bunch Graph Matching Based Face Recognition Under Varying Lighting, Pose, and Expression Conditions. International Journal of Advance Foundation and Research in Science & Engineering (IJAFRSE), Vol. 1, No. 8, pp. 51-59.
4. Biswas, M., Pickering, M. R., & Frater, M. R. (2010). Improved H.264-Based Video Coding using an Adaptive Transform. In Image Processing (ICIP), 2010 17th IEEE International Conference, pp. 165-168.

5. Chang, C. C., & Lin, C. J. (2011). LIBSVM: A Library for Support Vector Machines. ACM Transactions on Intelligent Systems and Technology (TIST), Vol. 2, No. 3, pp. 1-27.
6. Chelali, F. Z., & Djeradi, A. (2015). Face Recognition Using MLP and RBF Neural Network with Gabor and Discrete Wavelet Transform Characterization: A Comparative Study. Mathematical Problems in Engineering, Hindawi Publishing Corporation, pp. 1-16.
7. Demirel, H., Ozcinar, C., & Anbarjafari, G. (2010). Satellite Image Contrast Enhancement using Discrete Wavelet Transform and Singular Value Decomposition. IEEE Geoscience and Remote Sensing Letters, Vol. 7, No. 2, pp. 333-337.
8. Déniz, O., Bueno, G., Salido, J., & De la Torre, F. (2011). Face Recognition using Histograms of Oriented Gradients. Pattern Recognition Letters, Vol. 32, No. 12, pp. 1598-1603.
9. Duda, R. O., & Hart, P. E. (1972). Use of the Hough Transformation to Detect Lines and Curves in Pictures. Communications of the ACM, Vol. 15, No. 1, pp. 11–15.
10. El-Dahshan, E. S. A., Hosny, T., & Salem, A. B. M. (2010). Hybrid Intelligent Techniques for MRI Brain Images Classification. Digital Signal Processing, Vol. 20, No. 2, pp. 433-441.
11. Gupta, P., & George, N. V. (2014). An Improved Face Recognition Scheme using Transform Domain Features. International Conference on Signal Processing and Integrated Networks.
12. Hemalatha, G., & Sumathi, C. P. (2014). A Study of Techniques for Facial Detection and Expression Classification. International Journal of Computer Science & Engineering Survey (IJCSES), Vol. 5, No. 2, pp. 27-37, DOI: 10.5121/ijcses.2014.5203.
13. Hsu, D., Kakade, S. M., & Zhang, T. (2012). A Spectral Algorithm for Learning Hidden Markov Models. Journal of Computer and System Sciences, Vol. 78, No. 5, pp. 1460-1480.
14. Hu, M., Zheng, Y., Ren, F., & Jiang, H. (2014). Age Estimation and Gender Classification of Facial Images Based on Local Directional Pattern. In Cloud Computing and Intelligence Systems (CCIS), 2014 IEEE 3rd International Conference, pp. 103-107.

Corresponding Author Baljeet Kaur*

Research Scholar, OPJS University, Churu, Rajasthan