Key research themes
1. How can the Hadoop ecosystem be optimized for scalable and efficient big data storage and processing?
This research area focuses on the architectural and configuration aspects of Hadoop and its core components, the Hadoop Distributed File System (HDFS) and MapReduce, covering fault-tolerance mechanisms, data locality, replication strategies, and overall performance in large-scale distributed storage and computation environments. Understanding these optimizations is crucial for enabling Hadoop to reliably process petabyte-scale datasets on commodity hardware with high throughput.
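To make the configuration dimension of this theme concrete, the sketch below is a minimal illustration (not taken from the surveyed work) of how a Hadoop client can override two of the knobs mentioned above, the replication factor and the block size, when writing to HDFS. It assumes the Hadoop client libraries are on the classpath and a NameNode is reachable at the placeholder address `hdfs://namenode:8020`.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.net.URI;

public class HdfsTuningSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Client-side overrides of the cluster defaults from hdfs-site.xml:
        conf.set("dfs.replication", "3");        // three replicas per block for fault tolerance
        conf.set("dfs.blocksize", "268435456");  // 256 MB blocks to reduce NameNode metadata load

        // "hdfs://namenode:8020" is a placeholder for an assumed cluster address.
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf);

        // Write a small file; which DataNodes receive each replica is governed by
        // the cluster's rack-awareness and data-locality policies.
        try (FSDataOutputStream out = fs.create(new Path("/tmp/tuning-demo.txt"))) {
            out.writeUTF("hello hdfs");
        }
        fs.close();
    }
}
```

Cluster-wide values for the same properties would normally live in `hdfs-site.xml`; the per-client override shown here is simply the smallest self-contained way to demonstrate the trade-off between replication (fault tolerance, storage overhead) and block size (throughput, NameNode memory).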
2. What advances in SQL and query engines have improved interactive and high-performance analytics on Hadoop?
This theme investigates the integration and performance of SQL-on-Hadoop engines designed to enable interactive, low-latency, high-concurrency analytics directly on Hadoop data. Focusing on systems such as Impala, this research area is significant because traditional batch-oriented frameworks such as Apache Hive cannot meet the latency and concurrency requirements of many BI and analytic workloads. Improvements in front-end optimizers, execution engines, and resource management are key enablers of scalable SQL processing over big data stored in Hadoop.
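As a minimal sketch of the interactive-analytics workflow this theme describes, the example below issues a BI-style aggregation against Impala over JDBC. It is illustrative only: it assumes an Impala daemon reachable at the placeholder host `impala-host` on its HiveServer2-compatible port (21050) with no authentication, the Hive JDBC driver on the classpath, and a hypothetical `sales` table.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ImpalaQuerySketch {
    public static void main(String[] args) throws Exception {
        // Placeholder connection string for an assumed, unauthenticated deployment.
        String url = "jdbc:hive2://impala-host:21050/default;auth=noSasl";

        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement();
             // An interactive aggregation executed directly on data stored in Hadoop.
             ResultSet rs = stmt.executeQuery(
                 "SELECT category, COUNT(*) AS n FROM sales "
                     + "GROUP BY category ORDER BY n DESC LIMIT 10")) {
            while (rs.next()) {
                System.out.println(rs.getString("category") + "\t" + rs.getLong("n"));
            }
        }
    }
}
```

The point of the sketch is the access pattern, not the engine internals: the same short, ad hoc query that a batch engine would turn into a multi-minute job is expected to return in interactive time when the optimizer, execution engine, and resource manager are designed for low latency and high concurrency.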
3. Which distributed computing frameworks beyond MapReduce are promising for overcoming big data analysis challenges in Hadoop environments?
This research area reviews the limitations of MapReduce-based frameworks such as Hadoop MapReduce in handling contemporary big data analysis tasks, particularly those requiring complex, iterative, or memory-efficient computations. It also investigates alternative distributed computing frameworks that can reduce I/O overhead, scale beyond single-node memory constraints, and support inherently serial algorithms. Exploring such frameworks is vital for evolving big data analytics to keep pace with ever-growing data volume and complexity.
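To illustrate the iterative workloads that motivate this theme, the sketch below uses Apache Spark, chosen here purely as a representative in-memory alternative (the theme itself does not single out a specific engine). It runs a toy iterative refinement over a cached dataset in local mode; the cached reuse across iterations is exactly the step that a chain of MapReduce jobs would pay for with disk I/O on every pass.

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

import java.util.ArrayList;
import java.util.List;

public class IterativeSparkSketch {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("iterative-sketch").setMaster("local[*]");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            // Toy dataset standing in for records that would normally be read from HDFS.
            List<Double> values = new ArrayList<>();
            for (int i = 1; i <= 100_000; i++) values.add((double) i);

            // cache() keeps the partitions in executor memory, so each iteration
            // below reuses them instead of re-reading input from disk.
            JavaRDD<Double> data = sc.parallelize(values).cache();

            double estimate = 0.0;
            for (int iter = 0; iter < 10; iter++) {
                final double current = estimate;
                // Each pass refines the estimate using the cached data
                // (here it simply converges toward the mean of the values).
                double correction =
                    data.map(v -> v - current).reduce(Double::sum) / values.size();
                estimate = current + 0.5 * correction;
            }
            System.out.println("estimate after 10 iterations: " + estimate);
        }
    }
}
```

The algorithm itself is deliberately trivial; what matters for the comparison is the loop structure, where intermediate state lives in memory across iterations rather than being materialized to HDFS between successive MapReduce jobs.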