Term Frequency-Inverse Document Frequency Research Papers

Application of machine learning methods to analysis and evaluation of distance education

by International Journal of Electrical and Computer Engineering (IJECE)

2025, International Journal of Electrical and Computer Engineering (IJECE)

In recent decades, distance learning has become an essential component of the modern educational system, providing students with flexibility and access to knowledge regardless of location. This paper discusses creating a hybrid... more

Reddit social media text analysis for depression prediction: using logistic regression with enhanced term frequency-inverse document frequency features

by International Journal of Electrical and Computer Engineering (IJECE) and

2024, International Journal of Electrical and Computer Engineering (IJECE)

Language provides significant insights into an individual's emotional state, social status, and personality traits. This research aims to enhance depression detection through the analysis of linguistic features and various dataset... more

A Comparative Study of Various Text Summarization Methods

by CHINNI MOHITH

2024

Text summarization refers to the process of condensing long texts into short notes while keeping the most significant information; it is an application of natural language processing. This research provides an overview of text... more

Text summaries cut down on word count without compromising meaning. For humans, summarizing long articles can be challenging. Methods include extractive (selecting important sections) and abstractive (rewriting). This survey focuses on extractive techniques for text summarization, which are crucial for understanding advancements in natural language processing. Summarization entails comprehending the preceding text through text analysis and interpretation using language techniques. The goal of abstractive summarization is to provide a generalized summary that communicates information succinctly. Figure 1 shows the steps of text summarization in the model.

There are two primary categories of text summarization techniques: extractive and abstractive methods. Both machine learning (ML) and deep learning (DL) techniques have been researched extensively for these tasks. For extractive summarization, machine learning techniques such as Naive Bayes, Support Vector Machines (SVM), and decision trees are commonly employed. These methods pick out the most pertinent sentences or phrases from the text. Although they are adept at recognizing factual information, they frequently have trouble deciphering subtler contextual cues.

Table 1. Summary of Key Findings of Various Methods

V. CONCLUSION

The above results demonstrate the efficiency of neural networks in the context of text summarization across several important aspects. Neural networks are known for their ability to identify complex patterns, and hence they outperformed conventional techniques such as TF-IDF and clustering at maintaining context relevance. Additionally, they are better at identifying important terms, which resulted in summaries with better keyword counts. Neural networks also performed better than the other models in the accuracy evaluation, which demonstrates their ability to learn from large datasets.

A Comparative Study of Various Text Summarization Methods

by Chinni Sai Raj Dheeraj and

2024, International Journal of Novel Research and Development

Text summarization refers to the process of condensing long texts into short notes while keeping the most significant information; it is an application of natural language processing. This research provides an overview of text... more

Uma experiência de utilização da análise semântica latente para o tratamento de documentos

by Celso Kaestner

2024

Abstract. This article reports experiments on the automatic execution of Information Retrieval tasks: document retrieval and clustering. This approach employs Latent Semantic Analysis (Latent Semantic... more

Text Classification for Arabic Words Using Rep-Tree

by Hamza Naji

2024, International Journal of Computer Science and Information Technology

The amount of text data mining in the world and in our lives seems to be ever increasing, with no end in sight. Text data mining is defined as the process of deriving high-quality information from text. It has been applied on... more

SENTIMENT ANALYSIS ON EMPLOYEE LAYOFFS BASED ON HYBRID FEATURE EXTRACTION AND LONG SHORT TERM MEMORY NETWORK

by IAEME Publication

2024, IAEME Publication

In recent decades, sentiment analysis has become crucial for understanding the opinions and emotions expressed in different forms of communication, namely speech, text, etc. Particularly, in the scenario of employee layoffs, sentiment... more

A Proposed Method for Summarizing Arabic Single Document

by Asmaa Bialy

2024, International Journal of Computer Applications

This paper proposes an automatic text summarization method, which is considered a process of selecting the most important information in the original text. Summarization can be divided into two types: extractive and abstractive. In this study,... more

Analysis of Information Retrieval in Call Center Documents - Case Study in Computer Solutions Bases

by Leila Weitzel

2023, IEEE International Conference on Cloud Computing Technology and Science

Call centers seek to be more productive by providing standardized service to their customers. To achieve this goal, procedures containing a set of possible solutions are used. The search engine... more

Recuperação da Informação aplicada à extração de relações semânticas em uma coleção fechada de documentos Psicológicos coletados na Web

by Murilo Chaves Jayme

2023

Owing to the growing volume of information on the internet, the various information retrieval techniques are continually being improved in order to achieve more efficient and effective results for finding documents ever... more

Sistema de digitalización y estructuración de información clínica con técnicas de reconocimiento óptico de caracteres y procesamiento del lenguaje natural

by Walter Barrientos

2023, CIIS Ulima Congreso Internacional de Ingeniería de Sistemas

This work seeks to develop a system for digitizing and structuring clinical records written down by the doctor in the traditional way, using optical character recognition techniques and... more

Text Classification for Arabic Words Using Rep-Tree

by Hamza Naji

2023, International Journal of Computer Science and Information Technology

The amount of text data mining in the world and in our lives seems to be ever increasing, with no end in sight. Text data mining is defined as the process of deriving high-quality information from text. It has been applied on... more

Automatic summarization of YouTube video transcription text using term frequency-inverse document frequency

by Hiba Aleqabie

2023, Indonesian Journal of Electrical Engineering and Computer Science

Automatic summarization is a technique for quickly introducing key information by abbreviating large sections of material. Summarization may be applied to text and video, using different methods to present an abstract of the subject. Natural... more

Figure 1. The proposed system block diagram

This paper proposes a system, based on TF-IDF, for summarizing the text of video content. The proposed model uses the Python 3.5 programming language to implement the system's functions and steps. The main aim of the proposed system is to generate a text summary of the content of a YouTube video that contains all the important information. The general block diagram of the proposed system's stages is illustrated in Figure 1. The CNN-dailymail-master dataset is used in this paper; it is a text summarization dataset containing online news stories written by journalists at CNN and the Daily Mail, paired with multi-sentence summaries. The models are evaluated using ROUGE-1, ROUGE-2, and ROUGE-L [19], [20]. The details of the steps are demonstrated in the following subsections.

3.3. Generate unique word frequency

Preprocessing is the first significant stage in natural language processing and consists of three steps. The first is tokenization, which splits each phrase into a series of words or terms. The second is the elimination of English stop words, that is, removing letters and words that carry no meaning in the sentence and are repeated many times in the text, so that the text is free of stop words. Table 1 shows a sample of the stop words. The last step is word stemming; the central idea is to handle word endings or beginnings by reducing phrases or words to their word roots, known as lemmas. Stemming is typically performed before the word's final assignment to the index by deleting all affixes (suffixes and prefixes) from index words. In this step, the unique words are computed using the unique word frequency (UWF). The occurrences of each unique word are counted to determine which words are essential for the summary. The counts are sorted in descending order. Table 2 shows a sample of the word frequencies in our example.
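The three preprocessing steps and the unique word frequency count described above can be sketched as follows. This is a minimal illustration rather than the paper's implementation: the stop-word list, the suffix list, and the helper names (`tokenize`, `naive_stem`, `unique_word_frequency`) are all assumptions made for the example, and the suffix stripper is a deliberately crude stand-in for a real stemmer.

```python
import re
from collections import Counter

# Illustrative stop-word list (assumption); the paper's own list is not reproduced here.
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to"}

# Illustrative suffixes for a naive stemmer (assumption, not the paper's stemmer).
SUFFIXES = ("ing", "ed", "es", "s")


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def naive_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def unique_word_frequency(text):
    """Return (stem, count) pairs sorted by descending count."""
    stems = [naive_stem(t) for t in tokenize(text) if t not in STOP_WORDS]
    return Counter(stems).most_common()


sample = "The cats are chasing the cat. Cats chased a mouse in the garden."
uwf = unique_word_frequency(sample)
print(uwf)
```

A production system would more likely rely on an established tokenizer, stop-word list, and stemmer; the sketch only shows how the three steps feed the frequency table.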

Finding keywords is the most important step in the system. To filter the words in the text, TF-IDF was used; this approach measures how important a word is in a sentence and the number of times the word appears in the text. A word is very important if it is repeated often in a sentence but rarely in the document [21], [22]. TF and IDF were computed using (1) and (2), respectively.
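As a rough illustration of the keyword-weighting idea above, the sketch below treats each sentence as a small document, scores it by the sum of the TF-IDF weights of its words, and keeps the top-scoring sentences. The specific formulas are assumptions chosen for the example (TF as count divided by sentence length, IDF as the natural log of the number of sentences divided by the number of sentences containing the term); the paper's own equations are not available in this excerpt, and the function names are hypothetical.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    """Split on whitespace that follows sentence-ending punctuation."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase the sentence and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", sentence.lower())


def tfidf_sentence_scores(sentences):
    """Score each sentence by the sum of the TF-IDF weights of its distinct terms."""
    token_lists = [tokenize(s) for s in sentences]
    n = len(token_lists)
    # Document frequency: in how many sentences each term appears.
    df = Counter()
    for tokens in token_lists:
        df.update(set(tokens))
    scores = []
    for tokens in token_lists:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero for empty sentences
        scores.append(
            sum((c / total) * math.log(n / df[t]) for t, c in counts.items())
        )
    return scores


def summarize(text, k=1):
    """Return the k highest-scoring sentences in their original order."""
    sentences = split_sentences(text)
    scores = tfidf_sentence_scores(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


print(summarize("The cat sat. The cat sat. Dogs bark loudly.", k=1))
```

With this weighting, a term that appears in every sentence contributes nothing, so sentences made of rarer terms rise to the top. Other TF and IDF variants would rank sentences differently.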

Table 3. Evaluation result using ROUGE-1, ROUGE-2, ROUGE-L and ROUGE-SU

Table 4. Comparison results using ROUGE recall at various lengths

Comparação de Atributos Estilométricos para Identificação de Autoria de Escrita: Um Estudo de Caso de Guimarães Rosa versus Clarice Lispector

by Diego Flores

2023, Anais do Encontro Nacional de Inteligência Artificial e Computacional (ENIAC)

When writers express themselves, they must decide among a series of choices, such as which words or expressions to use or how the text should be punctuated. These choices define the writer's individual characteristics and the... more

Comparação de Atributos Estilométricos para Identificação de Autoria de Escrita: Um Estudo de Caso de Guimarães Rosa versus Clarice Lispector

by Diego Soto Flores

2023, Anais do Encontro Nacional de Inteligência Artificial e Computacional (ENIAC)

When writers express themselves, they must decide among a series of choices, such as which words or expressions to use or how the text should be punctuated. These choices define the writer's individual characteristics and the... more

Named Entity Recognition and Hashtag Decomposition to Improve the Classification of Tweets

by billal belainine

2023

In social network services like Twitter, users are overwhelmed with a huge amount of social data, most of which is short, unstructured and highly noisy. Identifying accurate information from this huge amount of data is indeed a hard task.... more

by Douglas Rolins de Santana

2023, Anais Estendidos do XXXVII Simpósio Brasileiro de Banco de Dados (SBBD Estendido 2022)

A similarity join returns all pairs of objects whose similarity is not less than a specified threshold. This operation is of fundamental importance for data cleaning and integration. A popular approach is to adopt a... more
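The definition of a similarity join given in this abstract (all pairs whose similarity is not less than a threshold) can be illustrated with a brute-force sketch. The choice of Jaccard similarity over whitespace tokens is an assumption made for the example, since the excerpt is cut off before it names the approach the authors adopt; the function names and sample records are likewise hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets are treated as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def similarity_join(records, threshold):
    """Return (i, j, similarity) for every pair with similarity >= threshold."""
    token_sets = [set(r.lower().split()) for r in records]
    result = []
    # Brute force: compare every unordered pair exactly once.
    for i, j in combinations(range(len(records)), 2):
        sim = jaccard(token_sets[i], token_sets[j])
        if sim >= threshold:
            result.append((i, j, sim))
    return result


records = ["data cleaning tools", "tools for data cleaning", "graph databases"]
pairs = similarity_join(records, threshold=0.7)
print(pairs)
```

Comparing every pair costs quadratic time in the number of records, so practical similarity joins usually add filtering or indexing to skip pairs that cannot reach the threshold.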

A Review of Feature Selection Techniques for Heart Disease Prediction

by Richard Rimiru

2023, IJARKE Science & Technology Journal

Cardiovascular diseases (CVDs) are currently the number one cause of death globally (WHO, 2017), and in Kenya cardiovascular issues such as heart attacks are the number one cause of death in adults over 30. However, the trend of the disease... more

Cardiovascular diseases (CVDs) are currently the number one cause of death globally (WHO, 2017), and in Kenya cardiovascular issues such as heart attacks are the number one cause of death in adults over 30. However, the trend of the disease is fast shifting to the youth as more young people are diagnosed with heart conditions. At least 60 per cent of patients who go to hospital with heart attacks are between 20 and 30 (Merab, 2016). Medical diagnosis is an essential yet difficult task that needs to be carried out accurately and efficiently. The computerization of medical diagnosis is exceptionally advantageous. A classification system can assist physicians in examining a patient. The system can predict whether the patient is likely to have a certain disease. This paper explores the different feature selection techniques that are used to reduce features in order to improve the accuracy of classification techniques in the health care field and aid in the prediction of heart diseases. These techniques reveal unseen patterns in healthcare data that can be used for disease diagnosis. Feature selection techniques are effective approaches to the latest and high dimensional data. The relevant features obtained, which lack noise, will improve disease diagnosis in the healthcare industry. Feature selection techniques like Correlation Feature Selection, Information Gain, Relief, and Genetic Search are used to choose the most relevant features from the high dimensional feature set. The rest of the paper is organized as follows: in section 2 we provide an introduction to feature selection frameworks and different feature selection techniques as well as a comprehensive comparison between them; section 3 briefly introduces several feature selection research works for heart disease prediction. Finally, we draw conclusions and recommendations in section 4.

2. Feature Selection

One way to use data mining tools more effectively is through data preprocessing, a fact that most researchers and practitioners have realized (Sindhiya and Gunasundari, 2014). Feature selection is one of the essential and commonly used methods in data preprocessing. Computing the result using the entire feature set may not always give the best result because of redundant and irrelevant features, also referred to as noisy features. To remove these redundant features, it is essential to use a
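Of the feature selection techniques named in this excerpt, Information Gain is the simplest to show in a few lines. The sketch below computes it for a single discrete feature as the entropy of the class labels minus the label entropy that remains after splitting on the feature. The toy labels and feature values are invented for illustration and are not taken from the paper, and the function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(
        (c / total) * math.log2(c / total) for c in Counter(labels).values()
    )


def information_gain(feature_values, labels):
    """Entropy of the labels minus their entropy conditioned on the feature."""
    total = len(labels)
    # Group the labels by the feature value they co-occur with.
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Invented toy data: 1 = disease present, 0 = absent.
labels = [1, 1, 0, 0]
informative = ["high", "high", "low", "low"]  # separates the classes perfectly
noisy = ["a", "b", "a", "b"]                  # carries no class information

print(information_gain(informative, labels))
print(information_gain(noisy, labels))
```

Ranking features by this value and discarding those near zero is one way to drop the redundant or irrelevant features the passage describes. Continuous features would need to be discretized first.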

Aplicação de um método LSA na avaliação automática de respostas discursivas

by Tacio Ribeiro

2023, Anais Do Workshop De Desafios Da Computacao Aplicada a Educacao

To meet the needs of virtual learning environments, this paper presents an application of LSA (Latent Semantic Analysis) to automatically estimate scores for open-ended questions, because there is still no method with an acceptable... more

Short text classification using feature enrichment from credible texts

by issa alsmadi

2022, International Journal of Web Engineering and Technology

Classifying the contents of tweets can be a useful feature for other application tasks. However, such classification can be quite challenging due to the short length and sparsity of tweet contents. Although individual tweets have limited... more

Automatic summarization of YouTube video transcription text using term frequency-inverse document frequency

by Indonesian Journal of Electrical Engineering and Computer Science

2022, Indonesian Journal of Electrical Engineering and Computer Science

Automatic summarization is a technique for quickly introducing key information by abbreviating large sections of material. Summarization may be applied to text and video, using different methods to present an abstract of the subject. Natural... more

Categorização e Análise de Informações Médicas

by Rebeca N.Alves

2022

Abstract - The Latent Semantic Analysis (LSA) method can be used to build a semantic space in which the meanings of words and texts are represented by vectors, and the proximity between these meanings is... more

Topic of Interest Discovery on Social Media Using Knowledge Base and Term Frequency – Inverse Document Frequency Techniques

by Kennedy Ogada Agayi

2022

Online users frequently post comments in their social network profiles; these comments leave unique traces of attributes such as keywords, interests of an entity and its related connections, especially in microblogs such as Twitter. The... more

by Reginalda Santos Silva and

2022

This article presents a literature review aiming to identify similarity analysis techniques for data represented in XML. Articles that addressed techniques to verify the similarity of XML were searched. During the research and... more

Uma Avaliação das Prevenções de Phishing em Navegadores Web

by Eduardo Feitosa

2022, Anais do XVII Simpósio Brasileiro de Segurança da Informação e de Sistemas Computacionais (SBSeg 2017)

Web browsers are extremely important tools for consuming data on the internet, since they enable interaction with and consumption of information provided by the various services available on the Web. Several companies... more

Comparação de Atributos Estilométricos para Identificação de Autoria de Escrita: Um Estudo de Caso de Guimarães Rosa versus Clarice Lispector

by Diego Flores

2022, Anais do Encontro Nacional de Inteligência Artificial e Computacional (ENIAC)

When writers express themselves, they must decide among a series of choices, such as which words or expressions to use or how the text should be punctuated. These choices define the writer's individual characteristics and the... more

Comparação de Atributos Estilométricos para Identificação de Autoria de Escrita: Um Estudo de Caso de Guimarães Rosa versus Clarice Lispector

by Diego flores

2022, Anais do Encontro Nacional de Inteligência Artificial e Computacional (ENIAC)

When writers express themselves, they must decide among a series of choices, such as which words or expressions to use or how the text should be punctuated. These choices define the writer's individual characteristics and the... more

Automatic Text Ontological Representation and Classification via Fundamental to Specific Conceptual Elements (TOR-FUSE)

by Amir Hossein Razavi

2022

In this dissertation, we introduce a novel text representation method used mainly for text classification. The presented representation method is initially based on a variety of closeness relationships between pairs of words in... more

Detecção Multilíngue de Serviços Web Duplicados Baseada na Similaridade Textual

by Daniela Claro

2022

Grouping by similarity represents a significant step in strategies of Web Services discovery and composition. Many clustering methods process the service descriptions in natural language to estimate the degree of correlation between them.... more

Alternative Title: Analyzing the vector model effectiveness in ordering questionnaires

by Richard Henrique de Souza

2022

Preparing questionnaires for use in interviews, statistical surveys or scientific research is not a trivial task, because poorly crafted questions can lead to direct answers with meaningless or naive interpretations.... more

Ciência de Dados: Percurso Inicial para Tratamento do Dataset CORD-19

by Klenilmar Dias

2022, Anais da I Escola Regional de Alto Desempenho Norte 2 (ERAD-ERAMIA-NO2 2021)

This article presents a preliminary path for the initial processing phase of the CORD-19 dataset, applying some data science techniques based on Python scientific libraries.

Profile specific Document Weighted approach using a New Term Weighting Measure for Author Profiling

by Dr. B.Vishnu Vardhan

2022, International Journal of Intelligent Engineering and Systems

Author profiling is a text classification technique for predicting demographic features of authors, such as age, gender, native language, location, and educational background, by analyzing their writing styles. Term weight measures identify... more

Detecção Multilíngue de Serviços Web Duplicados Baseada na Similaridade Textual

by Daniela Claro

2022

Grouping by similarity represents a significant step in strategies of Web Services discovery and composition. Many clustering methods process the service descriptions in natural language to estimate the degree of correlation between them.... more

Aplicação de um método LSA na avaliação automática de respostas discursivas

by Joaquim Queiroz

2022

To meet the needs of virtual learning environments, this paper presents an application of LSA (Latent Semantic Analysis) to automatically estimate scores for open-ended questions, because there is still no method with a... more

Short text classification using feature enrichment from credible texts

by issa alsmadi

2022, Int. J. Web Eng. Technol.

Classifying the contents of tweets can be a useful feature for other application tasks. However, such classification can be quite challenging due to the short length and sparsity of tweet contents. Although individual tweets have... more

MIKE: An Interactive Microblogging Keyword Extractor using Contextual Semantic Smoothing

by Osama Khan

2022

Social media, such as tweets on Twitter and Short Message Service (SMS) messages on cellular networks, are short-length textual documents (short texts or microblog posts) exchanged among users on the Web and/or their mobile devices.... more

Efficient Feature Selection and Domain Relevance Term Weighting Method for Document Classification

by Aurangzeb khan

2021, 2010 Second International Conference on Computer Engineering and Applications

Feature selection is of paramount concern in the document classification process, as it improves the efficiency and accuracy of the text classifier. The Vector Space Model is used to represent the "Bag of Words" (BOW) of the documents with term weighting... more

Efficient feature selection and domain relevance term weighting method for document classification

by Aurangzeb khan

2021

Feature selection is of paramount concern in the document classification process, as it improves the efficiency and accuracy of the text classifier. The Vector Space Model is used to represent the "Bag of Words" (BOW) of the documents with term weighting... more

Topic of Interest Discovery on Social Media Using Knowledge Base and Term Frequency – Inverse Document Frequency Techniques

by Kennedy Ogada

2021

Online users frequently post comments in their social network profiles; these comments leave unique traces of attributes such as keywords, interests of an entity and its related connections, especially in microblogs such as Twitter. The... more

Information Extraction from Microblog for Disaster Related Event

by Prasenjit Majumder

2021

This paper presents the participation of the Information Retrieval Lab (IRLAB) at DAIICT Gandhinagar, India, in the Data Challenge track of SMERP 2017. This year the SMERP Data Challenge track offered a task called Text Extraction on the Italy... more

Text Classification for Arabic Words Using Rep-Tree

by Wesam Ashour

2021, International Journal of Computer Science and Information Technology

The amount of text data mining in the world and in our lives seems to be ever increasing, with no end in sight. Text data mining is defined as the process of deriving high-quality information from text. It has been applied on... more

Uma arquitetura de question-answering instanciada no domínio de doenças crônicas

by Luciana Almansa

2021

The medical record describes the health conditions of patients, helping experts make decisions about treatment. Biomedical scientific knowledge can improve the prevention and treatment of diseases. However, the search for... more

Twitter User Topic Profiling Using Knowledge Base and Term Frequency – Inverse Document Frequency Feature Selection Method

by Athman Masoud

2021, IJARKE Science & Technology Journal

Social media platforms such as Twitter have been used extensively by organizations and individuals from different geographical locations, religions, languages and cultural backgrounds to post tweets and comments for branding,... more

Social media platforms such as Twitter have been used extensively by organizations and individuals from different geographical locations, religions, languages and cultural backgrounds to post tweets and comments for branding, sensitization, knowledge dissemination, message exchange, etc. Real-world posts are noisy and complex, which makes our problem difficult. Tweets are intentionally short (limited to just 140 characters), which forces users to be creative in how they constrain the text while preserving meaning. As with text messages, users sometimes rely on common acronyms (e.g., "d/r" means "dressing room" in sports) or shorthand (e.g., "Hawks" to mean "Chicago Blackhawks"); in general, this leads to noise (Dredze, McNamee, Rao, Gerber, & Finin, 2010). Another noisy and complex aspect of social post analysis arises when a user switches to another language, either within the same comment or tweet or in a different one. Furthermore, the user may have social profiles in two or more other languages; this requires a wide diversity of algorithms and approaches to help identify user-specific interests in social media (Dredze et al., 2010). Therefore, in this research, social media posts were analyzed using the term frequency-inverse document frequency (TFIDF) feature selection method and knowledge base synsets merged with keywords in the text representation model. Various machine learning algorithms such as K-Nearest Neighbor (KNN), Support Vector Machine (SVM) and Decision Tree (DT) were used for training in the discovery of users' topics of interest. In addition, social media includes all the channels through which people connect to each other on social sites. Mobile devices, social networks, email, texting, micro-blogging and location sharing are just a few of the many ways people engage in creating virtual communities. As people link, like, follow, friend, reply, retweet, comment, tag, rate, review, edit, update, and text one another (among other channels), they form collections of connections. These collections contain network structures that can be extracted, analyzed and visualized. The result can be insights into the structure, size, and key positions in these networks (Raad et al., 2010).
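The combination of TFIDF features with a K-Nearest Neighbor classifier mentioned in this excerpt can be sketched on toy data as follows. This is only an illustration under stated assumptions: the posts and topic labels are invented, whitespace tokenization and a smoothed IDF are choices made for the example, the knowledge base synsets are left out entirely, and none of the function names come from the paper.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (dicts) and return them with the IDF table."""
    token_lists = [d.lower().split() for d in docs]
    n = len(token_lists)
    df = Counter()
    for tokens in token_lists:
        df.update(set(tokens))
    # Smoothed IDF (assumption) so terms found in every post keep a small weight.
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    vectors = [
        {t: c * idf[t] for t, c in Counter(tokens).items()}
        for tokens in token_lists
    ]
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(
        sum(w * w for w in v.values())
    )
    return dot / norm if norm else 0.0


def knn_predict(post, train_vectors, train_labels, idf, k=3):
    """Label a post by majority vote among its k most similar training posts."""
    counts = Counter(post.lower().split())
    # Words never seen in training are ignored.
    query = {t: c * idf[t] for t, c in counts.items() if t in idf}
    ranked = sorted(
        range(len(train_vectors)),
        key=lambda i: cosine(query, train_vectors[i]),
        reverse=True,
    )
    votes = Counter(train_labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


# Invented toy posts and topic labels.
posts = [
    "great goal in the football match",
    "the team won the football league",
    "new phone camera review",
    "phone battery and camera test",
]
topics = ["sports", "sports", "tech", "tech"]

vectors, idf = tfidf_vectors(posts)
print(knn_predict("football team match", vectors, topics, idf, k=3))
```

Real posts would need the noise handling the passage describes (acronyms, language switching) before a vocabulary-based representation like this one becomes reliable.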

Aplicação das Bibliotecas Python para tratamento de dados em tempo real

by denis Vicentainer

2021, Metodologias e Aprendizado

The need to monitor the spread of Covid-19 (Sars-Cov-2) created an unprecedented demand for data storage, processing, and analysis. This demand requires researchers to act with greater speed and agility in... more

Recuperação da Informação aplicada à extração de relações semânticas em uma coleção fechada de documentos psicológicos coletados na Web

by Murilo C Jayme

2020, Recuperação da Informação aplicada à extração de relações semânticas em uma coleção fechada de documentos psicológicos coletados na Web

Owing to the growing volume of information on the internet, the various information retrieval techniques are continually being improved in order to achieve more efficient and effective results for finding documents ever... more
