Papers by Yousef Masmoudi

Procedia Computer Science, 2016
Many methods have been proposed to measure the similarity between time series data sets, each with its own advantages and weaknesses, and it is important to choose the most appropriate similarity measure for the intended application domain and the data considered. The performance of machine learning algorithms depends on the metric used to compare two objects. For time series, Dynamic Time Warping (DTW) is the most widely used distance measure, and many variants of DTW have been proposed to accelerate its calculation. Distance learning is an already well-studied subject: Data Mining tools such as the k-Means clustering algorithm and K-Nearest Neighbor classification require a similarity/distance measure, and this measure must be adapted to the application domain. For this reason, it is important to develop effective computation methods and algorithms that can be applied to large data sets while integrating the constraints of the specific field of study. In this paper, a new hybrid approach to learn a global constraint of the DTW distance is proposed. The approach is based on Large Margin Nearest Neighbors classification and the Tabu Search algorithm. Experiments show the effectiveness of this approach in improving time series classification results.
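The global constraint learned in this paper restricts the DTW warp path to a band around the diagonal. A minimal sketch of DTW with a Sakoe-Chiba band follows; the function and parameter names are illustrative, not the authors' code.

```python
# Hypothetical sketch: DTW restricted to a Sakoe-Chiba band of
# half-width `window` around the diagonal. Cells outside the band are
# never filled, which is what makes global constraints cheaper than
# the full quadratic DTW computation.

def dtw_banded(a, b, window):
    """DTW distance between sequences a and b under a global constraint."""
    n, m = len(a), len(b)
    INF = float("inf")
    # (n+1) x (m+1) cost matrix; cells outside the band stay at infinity
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        lo = max(1, i - window)
        hi = min(m, i + window)
        for j in range(lo, hi + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j],      # insertion
                                 D[i][j - 1],      # deletion
                                 D[i - 1][j - 1])  # match
    return D[n][m]
```

With `window` equal to the series length this reduces to unconstrained DTW; smaller windows trade alignment flexibility for speed, which is exactly the size the paper's hybrid approach learns.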

Tabu search for dynamic time warping global constraint learning
Measuring similarity or distance between two data points is fundamental to many Machine Learning algorithms such as K-Nearest-Neighbor and Clustering. Depending on the nature of the data points, various measures can be used. DTW is widely used for mining time series, but it does not scale to large data sets because of its quadratic complexity. Global constraints narrow the search path in the matrix, which significantly decreases the number of performed calculations. Ideally, the distance between examples from the same class is small, while instances from different classes are separated by large distances; the field of metric learning was introduced to satisfy such criteria. In some time series classification tasks, it is common for two time series to be out of phase even though they share the same class label. An appropriate DTW constraint can strongly improve classification performance, so the challenge is to choose the appropriate size of the global constraint. A Tabu Search algorithm is used to find the optimal size of the global constraint. Results show the efficiency of the proposed method in terms of both improved classification results and CPU time.
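A tabu search over the constraint size can be sketched as follows. The toy objective and all names here are illustrative assumptions; in the paper the objective would be a classification-quality criterion, not this stand-in.

```python
# Minimal tabu-search sketch for choosing a DTW window size.
# `evaluate` is a stand-in objective (e.g. a cross-validated 1-NN error);
# the neighbourhood, tabu-list length and toy objective are assumptions.

def tabu_search(evaluate, start, max_size, iters=20, tabu_len=5):
    current = best = start
    best_score = evaluate(start)
    tabu = [start]
    for _ in range(iters):
        # neighbourhood: shrink or grow the window by one step
        neighbours = [w for w in (current - 1, current + 1)
                      if 0 <= w <= max_size and w not in tabu]
        if not neighbours:
            break
        # move to the best admissible neighbour, even if it is worse
        # than the current point (this is how tabu search escapes local optima)
        current = min(neighbours, key=evaluate)
        tabu.append(current)
        tabu = tabu[-tabu_len:]          # fixed-length tabu list
        score = evaluate(current)
        if score < best_score:
            best, best_score = current, score
    return best, best_score

# Toy objective whose error is minimised at window size 7
best, err = tabu_search(lambda w: (w - 7) ** 2, start=0, max_size=20)
```

The tabu list forbids revisiting recently explored window sizes, so the search keeps moving even after finding a local optimum.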

Accurate and fast Dynamic Time Warping approximation using upper bounds
The purpose of Dynamic Time Warping (DTW) is to determine the shortest warp path, corresponding to the optimal alignment between two sequences. It is one of the most widely used distance measures for time series. DTW was introduced to the community as a Data Mining utility for various time series tasks such as classification and clustering. Many variants of DTW aim to accelerate the calculation of this distance, while others aim to overcome DTW's weak points, such as the singularity problem. We propose a new approach called Dynamic Warping Window (DWW) for speeding up the DTW algorithm based on an upper bound. It gives an accurate approximation of DTW in linear time and accelerates the calculation of the DTW distance. Results show that the new approach provides a good compromise between the accuracy of the DTW approximation and the non-degradation of KNN classification results.
International Journal of Data Mining, Modelling and Management, 2016
Dynamic time warping (DTW) consists of finding the best alignment between two time series. It was introduced into pattern recognition and data mining for many time series tasks such as clustering and classification. DTW has a quadratic time complexity, and several methods have been proposed to speed up its computation. In this paper, we propose a new variant of DTW called dynamic warping window (DWW). It gives a good approximation of DTW in a competitive CPU time. The accuracy of DWW was evaluated to prove its efficiency, and KNN classification was then applied with several distance measures (dynamic time warping, derivative dynamic time warping, fast dynamic time warping and DWW). Results show that DWW offers a good compromise between computational speed and the accuracy of KNN classification.
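The upper-bound idea these two abstracts rely on can be illustrated simply: for equal-length series, the diagonal alignment is itself an admissible warp path, so its cost bounds DTW from above. This sketch is ours, not the DWW algorithm; the names are illustrative.

```python
# For equal-length series, the cost of the diagonal warp path is an
# upper bound on DTW, because DTW minimises over all admissible warp
# paths and the diagonal is one of them. This computes in linear time.

def diagonal_upper_bound(a, b):
    """Cost of the diagonal alignment: an upper bound on DTW(a, b)."""
    assert len(a) == len(b), "this bound applies to equal-length series"
    return sum(abs(x - y) for x, y in zip(a, b))

# The bound is tight whenever the best alignment is the diagonal itself
ub = diagonal_upper_bound([1, 2, 3, 4], [1, 2, 2, 4])
```

Bounds like this are what make a linear-time approximation of the quadratic DTW computation possible.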
Optimal feature extraction and ulcer classification from WCE image data using deep learning
Soft Computing, Mar 15, 2022

Advances in Intelligent Systems and Computing, 2017
Forecasting is an important data analysis technique that aims to study historical data in order to explore and predict its future values. Different methods have been tested and applied for forecasting, from regression to neural network models. In this research, we propose an Elman Recurrent Neural Network (ERNN) to forecast the elements of the Mackey-Glass time series. Experimental results show that our scheme outperforms other state-of-the-art studies.
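The Mackey-Glass series used as the forecasting benchmark can be generated from its delay differential equation. A sketch with a simple Euler discretisation and the commonly used parameters follows; the step size and initial history are our assumptions, not taken from the paper.

```python
# Sketch of generating the Mackey-Glass benchmark series:
#   dx/dt = beta * x(t - tau) / (1 + x(t - tau)^n) - gamma * x(t)
# with the standard parameters beta=0.2, gamma=0.1, n=10, tau=17.
# Euler discretisation; dt and the constant initial history are assumptions.

def mackey_glass(length, tau=17, beta=0.2, gamma=0.1, n=10, dt=1.0, x0=1.2):
    history = [x0] * (tau + 1)          # constant initial history x(t<=0) = x0
    series = []
    for _ in range(length):
        x_t = history[-1]
        x_tau = history[-(tau + 1)]     # delayed value x(t - tau)
        dx = beta * x_tau / (1.0 + x_tau ** n) - gamma * x_t
        x_next = x_t + dt * dx
        history.append(x_next)
        series.append(x_next)
    return series
```

With tau = 17 the dynamics are chaotic, which is why this series is a standard test for forecasting models such as the ERNN.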

Hadoop Distributed File System for Big data analysis
2019 4th World Conference on Complex Systems (WCCS), 2019
Hadoop is a framework for processing data volumes too large to be handled by conventional systems. Hadoop provides a distributed file system, the Hadoop Distributed File System (HDFS), with a NameNode and DataNodes, where data is divided into blocks based on the total size of the dataset. In addition, Hadoop provides MapReduce, in which a dataset is processed in a mapping phase followed by a reducing phase. Using Hadoop for big data analysis has revealed important information that can be used for analytical purposes and for enabling new products. Big data can be found in many different sources, such as social networks, web server logs, broadcast audio streams and banking transactions. In this paper, we illustrate the main steps to set up Hadoop and MapReduce. The version illustrated in this work is the latest release, Hadoop 3.1.1, for big data analysis. Simplified pseudo code is provided to show the functionality of the Map and Reduce classes. The developed steps are applied to a given example that can be generalized to bigger data.
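The map and reduce phases described above can be simulated in a few lines, using word count as the canonical example. This is a pure-Python sketch that mimics the shape of a Hadoop MapReduce job; it is not the paper's pseudo code or actual Hadoop API calls.

```python
# Pure-Python simulation of the MapReduce word-count pattern:
# a Mapper emits (key, value) pairs, a shuffle step groups them by key,
# and a Reducer aggregates each group. Illustrative only, not Hadoop code.

from collections import defaultdict

def map_phase(line):
    # Mapper: emit (word, 1) for every word in one input line
    return [(word.lower(), 1) for word in line.split()]

def reduce_phase(pairs):
    # Shuffle/sort: group values by key, then sum each group (Reducer)
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return {key: sum(values) for key, values in grouped.items()}

lines = ["Hadoop processes big data", "big data needs Hadoop"]
intermediate = [pair for line in lines for pair in map_phase(line)]
counts = reduce_phase(intermediate)
```

In Hadoop itself, the mapping and reducing would run in parallel across DataNodes, with HDFS blocks as the input splits; the data flow, however, is exactly this one.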
A balanced approach for hazardous waste allocation problem
2013 5th International Conference on Modeling, Simulation and Applied Optimization (ICMSAO), 2013
Hazardous waste management is composed of three components: the allocation of the different hazardous waste generators, hazardous waste routing and hazardous waste location. In this paper, we focus on the allocation problem. Performing this task well is very important, since it affects both the location and the routing problems. Minimizing risk and maximizing the equity of the risk distribution are the major objectives of hazardous waste management. Hence, the clustering approach used to classify the hazardous waste generators needs to be balanced. In this regard, a balancing approach is proposed to ensure equity among the hazardous waste clusters.
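One simple way to balance a clustering is to cap how many generators each cluster may receive. The greedy rule and all names below are our assumptions for illustration, not the authors' procedure.

```python
# Illustrative balancing sketch: assign each generator (here a 1-D
# coordinate) to the nearest cluster centre that still has spare
# capacity, so no cluster absorbs a disproportionate share of the risk.

def balanced_assign(points, centers, capacity):
    """Greedy capacity-constrained nearest-centre assignment."""
    loads = [0] * len(centers)
    assignment = []
    for p in points:
        # rank centres by distance, take the nearest one with spare capacity
        order = sorted(range(len(centers)), key=lambda c: abs(p - centers[c]))
        chosen = next(c for c in order if loads[c] < capacity)
        loads[chosen] += 1
        assignment.append(chosen)
    return assignment

# Four generators, two centres, at most two generators per cluster:
# the third point overflows from the full first cluster to the second.
labels = balanced_assign([0.0, 0.1, 0.2, 9.0], [0.0, 10.0], capacity=2)
```

Unconstrained nearest-centre assignment would put three of the four points in the first cluster; the capacity cap forces the equitable 2/2 split.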
Upper bounds for time and accuracy improvement of dynamic time warping approximation
International Journal of Data Mining, Modelling and Management
DTW-Global Constraint Learning Using Tabu Search Algorithm
Procedia Computer Science, 2016
Accurate and fast Dynamic Time Warping approximation using upper bounds
2015 38th International Conference on Telecommunications and Signal Processing (TSP), 2015

A binarization strategy for modelling mixed data in multigroup classification
This paper presents a binarization pre-processing strategy for mixed datasets. We propose that representing nominal and integer data with binary attributes is beneficial for classification accuracy, and we describe a procedure to convert integer and nominal data into binary attributes. The Expectation-Maximization (EM) clustering algorithm was applied to group the values of wide-range attributes so that only a small number of binary attributes is needed. Once the data set is pre-processed, we use a Support Vector Machine (LibSVM) for classification. The proposed method was tested on datasets from the literature. We demonstrate the improved accuracy and efficiency of the presented binarization strategy for modelling mixed and complex data in comparison with classification of the original, nominal and binary datasets.
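The binarization idea can be sketched as follows: nominal attributes become one-hot indicator vectors, and wide-range integer attributes are first grouped into a few intervals so that only a handful of binary attributes is needed. Simple equal-width binning stands in here for the paper's EM clustering step; all names are illustrative.

```python
# Binarization sketch for mixed data: one-hot encoding for nominal
# attributes, and interval grouping + one-hot for wide-range integers.
# Equal-width binning is a stand-in for the paper's EM clustering.

def one_hot(values):
    """Map each nominal value to a binary indicator vector."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]

def bin_integers(values, n_bins=3):
    """Group integers into n_bins equal-width intervals, then one-hot
    the bin index, yielding n_bins binary attributes per value."""
    lo, hi = min(values), max(values)
    width = max(1, (hi - lo + 1) // n_bins)
    bins = [min((v - lo) // width, n_bins - 1) for v in values]
    return one_hot(bins)

colors_bin = one_hot(["red", "blue", "red"])
ages_bin = bin_integers([21, 45, 70], n_bins=3)
```

After this step every attribute is binary, so an SVM such as LibSVM sees a uniform feature space instead of a mix of nominal and integer scales.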
Industrial application of a new classification procedure based on mathematical programming
2009 International Conference on Computers & Industrial Engineering, 2009
The aim of this paper is to present an industrial application of a new classification procedure. The problem is solved by minimizing the distance between the components and the centers of the clusters. It is therefore critical to determine the best centers of the ...
Tabu search for dynamic time warping global constraint learning
2014 6th International Conference of Soft Computing and Pattern Recognition (SoCPaR), 2014