Table 2 RECOGNITION ACCURACY COMPARISON ON CK+ DATASET

wearing different artifacts etc. We also evaluate the performance of the proposed network on the 7-class expression task, where the datasets also include neutral expressions. Furthermore, to extract the facial region, we use the Viola-Jones [30] face detection algorithm instead of the manual face cropping employed by existing methods [11, 12]. This better reflects a real-life scenario and improves the adaptability of the model to real-life conditions, as faces in images do not have a fixed position or background. Moreover, to validate the outcomes, the experimental setup includes an N-fold cross-validation scheme: the image set is partitioned into N equal-sized folds, of which N-1 folds are used for training and the remaining one as the test set. Further, to prepare the datasets, we extract the most expressive image frames and divide them in an 80:20 ratio, 80% for training and the remaining 20% for testing. To train the network, the training set is again divided in a 70:30 ratio, with 70% of the images used to train the network and 30% to validate the accuracy outcomes. The recognition rate of the network is calculated using Eq. 10.
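The splitting scheme described above can be sketched in plain Python. This is a minimal illustration, not the authors' code: the function names (`split_dataset`, `n_fold`, `recognition_rate`) are hypothetical, frame extraction and the network itself are outside its scope, and Eq. 10 is assumed here to be standard accuracy (correctly classified samples over total samples).

```python
import random

def split_dataset(frames, seed=0):
    """Partition the extracted expressive frames as described:
    80:20 into a training pool and a test set, then 70:30 of the
    training pool into training and validation subsets.
    (Illustrative sketch; names and seed are assumptions.)"""
    rng = random.Random(seed)
    frames = frames[:]
    rng.shuffle(frames)
    n_test = int(round(0.20 * len(frames)))   # 20% held-out test set
    test, pool = frames[:n_test], frames[n_test:]
    n_val = int(round(0.30 * len(pool)))      # 30% of training pool for validation
    val, train = pool[:n_val], pool[n_val:]
    return train, val, test

def n_fold(frames, n):
    """N-fold cross-validation: partition into N (near-)equal folds;
    each fold serves once as the test set while the other N-1 train."""
    folds = [frames[i::n] for i in range(n)]
    for i in range(n):
        test = folds[i]
        train = [x for j, f in enumerate(folds) if j != i for x in f]
        yield train, test

def recognition_rate(predicted, actual):
    """Recognition rate, assumed to match Eq. 10 as plain accuracy:
    correctly classified samples / total samples."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)
```

For example, with 100 frames the 80:20 then 70:30 scheme yields 56 training, 24 validation, and 20 test images, and `n_fold` with N=10 produces ten 90/10 train/test partitions.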