Mining the web usage data of e-business organizations is essential for understanding clients' web utilization patterns, which can help these businesses arrive at vital business decisions. Because of the non-deterministic web access behavior of web clients, web user session data is usually noisy and imperfect. Such imperfection has a negative impact on the pattern discovery process. One of the real issues with the widely used Fuzzy c-Means (FCM) and Fuzzy c-Medoids (FCMdd) methods is that they are not robust against noise, because a single outlier object can lead to a very different clustering result. In this research we propose a robust Fuzzy c-Least Medians (FCLMdn) clustering framework to deal with user session data contaminated with noise and outlier user session objects, with the objective of improving the quality of the extracted patterns. To deal with the high dimensionality of user session data, which may contain noise and outliers, a fuzzy set theoretic approach for assigning fuzzy weights to user sessions and associated URLs is proposed. Our results indicate that the quality of user session clusters formed using the FCLMdn algorithm is much better than that of clusters formed using the FCM and FCMdd algorithms in terms of various cluster validity indices.
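The sensitivity to a single outlier mentioned above can be illustrated with a small sketch. This is not the FCLMdn algorithm itself; it only contrasts a mean-based location estimate with a median-based one, and the sample values are hypothetical.

```python
from statistics import mean, median

# Hypothetical one-dimensional "session feature" values (illustrative only).
clean = [2.0, 3.0, 4.0, 5.0, 6.0]
noisy = clean + [1000.0]  # the same data with one outlier added

# How far does each location estimate move when the outlier is added?
mean_shift = abs(mean(noisy) - mean(clean))
median_shift = abs(median(noisy) - median(clean))

print(f"mean shift:   {mean_shift}")
print(f"median shift: {median_shift}")
```

The mean moves by a large amount while the median barely changes, which is the general motivation for median-based prototypes in noisy data.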
A Novel Data Mining Approach for Multi Variant Text Classification
Text classification, which aims to assign a document to one or more categories based on its content, is a fundamental task for Web and document data mining applications. Text classification is emerging as an important part of the natural language processing and information extraction fields, where it can be used to discover useful information from large databases. These approaches allow individuals to construct classifiers that are relevant to a variety of domains. Existing tools such as SVM Light have limited GUI support and take more time to perform the classification task. In the presented work, classification of multi-domain documents is performed using the Weka LibSVM classifier. The vector space model is used to transform the collected training set and test set documents into a term-document matrix (TDM), which the classifier then uses to generate predicted results. The results obtained from Weka, with its GUI support and the TDM, show a quick response time in classifying the documents.
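A term-document matrix of the kind described above can be sketched as follows. This is a minimal illustration assuming raw term counts and whitespace tokenization; the function name `build_tdm` and the sample documents are hypothetical, and the paper's actual preprocessing and weighting are not specified in the abstract.

```python
from collections import Counter

def build_tdm(documents):
    """Return (vocabulary, matrix), where matrix[i][j] is the number of
    times vocabulary[i] occurs in documents[j]."""
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({term for tokens in tokenized for term in tokens})
    counts = [Counter(tokens) for tokens in tokenized]
    # Counter returns 0 for terms that do not occur in a document.
    matrix = [[c[term] for c in counts] for term in vocabulary]
    return vocabulary, matrix

# Hypothetical sample documents.
docs = [
    "heat transfer in fluid flow",
    "fluid flow simulation",
    "text mining of web data",
]
vocab, tdm = build_tdm(docs)
for term, row in zip(vocab, tdm):
    print(f"{term:12s} {row}")
```

Each column of the matrix is the vector representation of one document, which is the input a classifier would consume.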
Heat transfer analysis coupled with fluid flow is important in many real-world application areas, ranging from micro-channels to spacecraft. Numerical prediction of thermal and fluid flow behavior, using either computational fluid dynamics software or in-house codes, has become very common. One of the major issues in numerical analysis is the immense computational time required for repeated analysis. In this article, the technique applied to parallelize an in-house generic code using the CUDA and OpenMP paradigms is discussed. The parallelized finite-volume method (FVM)-based code is analyzed for various problems and different boundary conditions. Two GPUs (graphics processing units) are used for parallel execution. Of the four functions in the code (U, V, P, and T), only the P function is parallelized using CUDA, as it consumes 91% of the computational time; the remaining functions are parallelized using OpenMP. Parallel performance analysis is carried out for 400, 625, and 900 threads launched from the host for parallel execution. The improvement in speedup using CUDA compared with the speedup using complete OpenMP parallelization on different computing machines is also provided. The parallel efficiency of the FVM code is also evaluated for different grid sizes, Reynolds numbers, internal flow, and external flow. It is found that the GPU provides immense speedup and largely outperforms OpenMP. Parallel execution on the GPU gives results in an acceptable amount of time. The parallel efficiency is found to be close to 90% for internal flow and 10% for external flow.
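To see why targeting the function that consumes 91% of the run time matters, Amdahl's law gives a rough bound on the overall speedup. The sketch below assumes that only that 91% portion is accelerated and the rest runs unchanged, which simplifies the setup in the abstract (where the remaining functions are also parallelized with OpenMP); the section speedup values are hypothetical.

```python
def amdahl_speedup(parallel_fraction, section_speedup):
    """Overall speedup when a fraction of the run time is accelerated
    by section_speedup and the remainder is left unchanged."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / section_speedup)

fraction = 0.91  # share of run time spent in the P function, from the abstract
for s in (2, 10, 100):  # hypothetical speedups of the accelerated section
    print(f"section speedup {s:>3}: overall {amdahl_speedup(fraction, s):.2f}")

# Upper bound as the section speedup grows without limit.
upper_bound = 1.0 / (1.0 - fraction)
print(f"upper bound: {upper_bound:.2f}")
```

Under this simplification the overall speedup cannot exceed roughly 11, however fast the accelerated section becomes.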
Arabian journal for science and engineering, May 28, 2020
Conjugate heat transfer and fluid flow is a common phenomenon occurring in parallel plate channels. The finite volume method (FVM) formulation-based semi-implicit pressure-linked equations algorithm is a common technique for solving the Navier-Stokes equations for fluid flow simulation in such phenomena, but it is computationally expensive. In this article, an indigenous FVM code is developed for numerical analysis of conjugate heat transfer and fluid flow, considering different problems. Around 90% of the total execution time is found to be spent solving the pressure (P) correction equation. The remaining time is spent on the U and V velocity and temperature (T) functions, which use the tri-diagonal matrix algorithm. To carry out the numerical analysis faster, the developed FVM code is parallelized using the OpenMP paradigm. All the functions of the code (U, V, T, and P) are parallelized using OpenMP, and the parallel performance is analyzed for different fluid flows, grid sizes, and boundary conditions. The analysis is done with and without nested OpenMP parallelization on computing machines with different configurations. From the complete analysis, it is observed that the flow Reynolds number (Re) has a significant impact on the sequential execution time of the FVM code but a negligible role in affecting speedup and parallel efficiency. OpenMP parallelization of the FVM code provides a maximum speedup of up to 1.5 for the considered conditions.
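The tri-diagonal matrix algorithm mentioned above (often called the Thomas algorithm) can be sketched in a few lines. This is a generic textbook version, not the authors' FVM code; the function name `thomas_solve` and the array layout are assumptions made for illustration, and the sketch assumes a well-conditioned (for example, diagonally dominant) system with no pivoting.

```python
def thomas_solve(lower, diag, upper, rhs):
    """Solve a tridiagonal system A x = rhs.

    lower[i] multiplies x[i-1] (lower[0] is unused),
    diag[i]  multiplies x[i],
    upper[i] multiplies x[i+1] (upper[-1] is unused).
    """
    n = len(diag)
    c_prime = [0.0] * n
    d_prime = [0.0] * n

    # Forward sweep: eliminate the lower diagonal.
    c_prime[0] = upper[0] / diag[0]
    d_prime[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c_prime[i - 1]
        c_prime[i] = upper[i] / denom if i < n - 1 else 0.0
        d_prime[i] = (rhs[i] - lower[i] * d_prime[i - 1]) / denom

    # Back substitution.
    x = [0.0] * n
    x[-1] = d_prime[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d_prime[i] - c_prime[i] * x[i + 1]
    return x

# Small hypothetical system whose exact solution is [1, 2, 3].
solution = thomas_solve(
    lower=[0.0, -1.0, -1.0],
    diag=[2.0, 2.0, 2.0],
    upper=[-1.0, -1.0, 0.0],
    rhs=[0.0, 0.0, 4.0],
)
print(solution)
```

Both sweeps carry a dependency from one row to the next, so a single line solve is inherently sequential; parallelism in such codes typically comes from solving many independent lines at once.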
Archives of Computational Methods in Engineering, Jan 13, 2016
Computational fluid dynamics (CFD) is a rapidly emerging field of fluid mechanics used to analyze fluid flow situations. This analysis is based on simulations carried out on computing machines. For complex configurations, the number of grid points is so large that the computational time required to obtain results is very high. Parallel computing is adopted to reduce the computational time of CFD by utilizing the available computing resources. Parallel computing tools such as OpenMP, MPI, CUDA, combinations of these, and a few others are used to parallelize CFD software. This article provides a comprehensive state-of-the-art review of important CFD areas and parallelization strategies for the related software. Issues related to computational time complexity and parallelization of CFD software are highlighted. The benefits and issues of using various parallel computing tools to parallelize CFD software are summarized. Open areas of CFD where parallelization has seldom been attempted are identified, and parallel computing tools that can be useful for parallelizing CFD software are highlighted. A few suggestions for future work in parallel computing of CFD software are also provided.
Literature Survey for the Comparative Study of Various High Performance Computing Techniques
International Journal of Computer Trends and Technology, Sep 25, 2015
The advent of high performance computing (HPC) and graphics processing units (GPUs) presents an enormous computational resource for large data transactions (big data) that require parallel processing for robust and prompt data analysis. In this paper, we give an overview of four parallel programming models: OpenMP, CUDA, MapReduce, and MPI. The goal is to explore the literature on the subject and provide a high-level view of the features of these programming models, to give high performance computing users a concise understanding of parallel programming concepts.
Parallelization of Computational Fluid Dynamics Software Codes
Computational fluid dynamics (CFD) is a rapidly emerging field of fluid mechanics used to analyze fluid flow situations. This analysis is based on simulations carried out on computing machines. For complex configurations, the number of grid points is so large that the computational time required to obtain results is very high. Parallel computing is adopted to reduce the computational time of CFD by utilizing the available computing resources. Parallel computing tools such as OpenMP, MPI, CUDA, combinations of these, and a few others are used to parallelize CFD software. This book provides a comprehensive state-of-the-art review of important CFD areas and parallelization strategies for the related software. Issues related to computational time complexity and parallelization of CFD software are highlighted. The benefits and issues of using parallel computing tools to parallelize CFD software are summarized. Open areas of CFD where parallelization has seldom been attempted are identified, and parallel computing tools that can be useful for parallelizing CFD software are highlighted. Suggestions for future work in parallel computing of CFD software are also provided.
Experimental Exploration of Support Vector Machine for Cancer Cell Classification
Text classification is the task of automatically categorizing collections of electronic textual documents into predefined classes based on their contents. Due to the increase in the amount of text data in recent years, document classification has emerged in the form of text classification systems. These have been widely implemented in a large number of applications such as spam filtering, email, knowledge repositories, and ontology mapping. The main aim is to propose a classification technique based on feature selection that reduces the dimensionality of the feature vector and increases classification accuracy through pre-processing. This paper gives a detailed study of how the support vector machine (SVM) can be used to classify uncertain data. SVM is a powerful supervised learning method based on the structural risk minimization principle. During training, the algorithm creates a hyperplane that separates positive and negative samples. The type of kernel used for the SVM classifier has a major impact on classification results. In this paper, the Breast Cancer Wisconsin (Diagnostic) data sets are classified using four types of SVM kernel: linear, polynomial, sigmoid, and radial. The classification results reveal that the radial kernel is best suited to these data sets. To measure the suitability of each kernel, various factors from the classification results are compared, such as accuracy, kappa value, sensitivity, specificity, and precision.
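The four kernel types compared above can be written out as plain functions. The sketch below uses the commonly cited textbook forms; the parameter names `gamma`, `coef0`, and `degree` and their default values are assumptions for illustration, not settings reported in the paper.

```python
import math

def dot(x, y):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))

def linear_kernel(x, y):
    return dot(x, y)

def polynomial_kernel(x, y, gamma=1.0, coef0=1.0, degree=3):
    return (gamma * dot(x, y) + coef0) ** degree

def sigmoid_kernel(x, y, gamma=1.0, coef0=0.0):
    return math.tanh(gamma * dot(x, y) + coef0)

def rbf_kernel(x, y, gamma=1.0):
    """Radial basis function kernel: exp(-gamma * ||x - y||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)

# Hypothetical feature vectors.
u, v = [1.0, 2.0], [3.0, 4.0]
print(linear_kernel(u, v))
print(polynomial_kernel(u, v, degree=2))
print(sigmoid_kernel(u, v, gamma=0.01))
print(rbf_kernel(u, v, gamma=0.1))
```

The radial kernel depends only on the distance between the two samples, so it equals 1 for identical inputs and decays toward 0 as they move apart.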
Web usage mining is the application of data mining techniques to discover usage patterns from Web data, in order to understand and better serve the needs of Web-based applications. Many algorithms exist for analyzing patterns in large transactional databases. One that is very simple to use and easy to implement is the Apriori algorithm. However, Apriori is time-consuming during candidate item-set generation. IP-Apriori, i.e., Improved Pruning in Apriori, is a variation of the Apriori algorithm that improves its pruning step. It uses average support instead of minimum support in the pruning step to generate probabilistic item-sets instead of large item-sets. This work analyzes the IP-Apriori algorithm on different datasets. Based on a comparison of the frequent item-sets generated and the time consumed, it is shown that the IP-Apriori algorithm performs better than the Apriori algorithm.
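One possible reading of pruning by average support is sketched below for a single candidate level. The abstract does not define the rule precisely, so this is an assumption: the threshold is taken to be the mean support count of the current candidates, and a candidate is kept when its count is at least that mean. The function names and the transactions are hypothetical.

```python
def support_counts(transactions, candidates):
    """Count how many transactions contain each candidate item-set."""
    return {c: sum(1 for t in transactions if c <= t) for c in candidates}

def prune_by_average_support(counts):
    """Keep candidates whose support count is at least the average count
    of all candidates at this level (assumed reading of the rule)."""
    average = sum(counts.values()) / len(counts)
    kept = {c: n for c, n in counts.items() if n >= average}
    return kept, average

# Hypothetical transactions.
transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"a"},
    {"b", "d"},
]
items = sorted({item for t in transactions for item in t})
candidates = [frozenset([item]) for item in items]

counts = support_counts(transactions, candidates)
kept, average = prune_by_average_support(counts)
print("average support count:", average)
print("kept:", sorted("".join(sorted(c)) for c in kept))
```

Because the threshold is derived from the data at each level, no minimum support value has to be chosen by the user in this reading.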
Performance Analysis of Self-Organizing Neural Network- Based Clustering
Data mining and knowledge discovery in databases have been attracting a significant amount of research, industry, and media attention. Data mining is the process of analyzing the large amounts of data stored in data warehouses. Data analysis, classification, clustering, and similar tasks can be performed on huge data sets using different algorithms. It is important to evaluate the performance of various clustering techniques, because applying different techniques generally results in different sets of clusters. Performance can be evaluated in terms of the accuracy and validity of the clusters, and the time required to generate them, using appropriate performance measures. In this paper, we analyze the performance of self-organizing neural network-based clustering and k-Means clustering using MATLAB (Matrix Laboratory). These techniques are tested against various datasets. Finally, their performance results are compared and presented. Keywords: Clustering...
Empirical Analysis of K-means, Fuzzy C-means and Particle Swarm Optimization for Data Clustering
Clustering is a fundamental data mining task that places similar data objects in one group and dissimilar objects in another. The aim of this paper is to compare the quality of clusters produced by K-Means, Particle Swarm Optimization (PSO), and Fuzzy C-Means (FCM) for data clustering. The K-Means algorithm is the most widely used partitional clustering algorithm in industry and academia. The algorithm is simple and easy to implement. Its main drawback is that it is sensitive to the selection of the initial cluster centers and may converge to local optima. Fuzzy C-Means is a popular algorithm in the field of fuzzy clustering. Fuzzy clustering using FCM can provide a data partition that is both better and more meaningful than hard clustering approaches. Particle Swarm Optimization (PSO) is an evolutionary computational technique motivated by the behavior of organisms, such as schooling of fish and ...
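The difference between hard and fuzzy partitions can be made concrete with the standard FCM membership update, in which each point receives a degree of membership in every cluster. The sketch below covers only that one step for one-dimensional data with fixed centers; the fuzzifier `m = 2` and the sample values are assumptions for illustration, not settings from the paper.

```python
def fcm_memberships(points, centers, m=2.0):
    """Standard FCM membership update for 1-D points and fixed centers.

    u[j][i] = 1 / sum_k (d(j, i) / d(j, k)) ** (2 / (m - 1))
    """
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for p in points:
        dists = [abs(p - c) for c in centers]
        if 0.0 in dists:
            # A point that coincides with a center belongs fully to it
            # (shared equally if several centers coincide).
            row = [1.0 if d == 0.0 else 0.0 for d in dists]
            total = sum(row)
            row = [r / total for r in row]
        else:
            row = [
                1.0 / sum((d_i / d_k) ** exponent for d_k in dists)
                for d_i in dists
            ]
        memberships.append(row)
    return memberships

# Hypothetical points and cluster centers.
points = [0.0, 0.5, 1.0, 2.0]
centers = [0.0, 2.0]
for p, row in zip(points, fcm_memberships(points, centers)):
    print(p, [round(u, 3) for u in row])
```

A point midway between two centers gets equal membership in both, whereas a hard assignment would have to pick one arbitrarily.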
Clustering of COVID-19 data for knowledge discovery using c-means and fuzzy c-means
Results in Physics, 2021
In this work, partitioning clustering of COVID-19 data is carried out using the c-Means (c-M) and Fuzzy c-Means (Fc-M) algorithms. Based on the data available from January 2020, the confirmed daily cases, recoveries, and deaths are clustered with respect to location, i.e., longitude and latitude on the globe. In the analysis, the maximum cluster size is treated as a variable and is varied from 5 to 50 in both algorithms to find an optimum number. The performance and validity indices of the clusters formed are analyzed to assess cluster quality. The quality of all the COVID-19 clusters is assessed using the Zahid SC (Separation Compaction) index, Xie-Beni index, Fukuyama–Sugeno index, validity function, PC (performance coefficient), and CE (entropy) indexes. The results indicate that five clusters were identified as major centroids where the pandemic appears concentrated. Additionally, the observations reveal that the pandemic spreads easily to any global location, and that there are several centroids of COVID-19, which primarily act as epicentres. However, the three main COVID-19 clusters identified are 1) cases with a value <50,000, 2) cases with a value between 0.1 million and 2 million, and 3) cases above 2 million. These centroids are located in the US, Brazil, and India, around which the remaining small clusters of the pandemic appear to be oriented. Furthermore, the Fc-M technique seems to provide much better clusters than the c-M algorithm.
Computational Fluid Dynamics in Turbomachinery: A Review of State of the Art
Archives of Computational Methods in Engineering, 2016
Computational fluid dynamics (CFD) plays an essential role in analyzing fluid flow and heat transfer situations using numerical methods. Turbomachines involve internal and external fluid flow problems in compressors and turbines. CFD is at present one of the most important tools for designing and analyzing all types of turbomachinery. The main purpose of this paper is to review the state-of-the-art work carried out in the field of turbomachinery using CFD. A literature review of research work pertaining to CFD analysis of turbines, compressors, and centrifugal pumps is presented. Various issues of the CFD codes used in turbomachinery, and the parallelization strategies adopted for them, are highlighted. Furthermore, the prevailing merits and demerits of CFD in turbomachinery are provided. Open areas pertinent to CFD investigation in turbomachinery and CFD code parallelization are also described.
Papers by Zahid A Ansari