Academia.edu

Keyphrases Extraction

22 papers
1 follower
About this topic
Keyphrases extraction is a natural language processing task that involves identifying and extracting significant phrases from a text document that capture its main topics or themes, facilitating information retrieval, summarization, and content analysis.

Key research themes

1. How can novel unsupervised tree and graph-based structures enhance domain-independent keyphrase extraction?

This theme focuses on techniques that aim to improve keyphrase extraction without reliance on supervised training data or domain knowledge. Such methods leverage tree or graph structures to capture term cohesiveness, semantic relations, and document topology, addressing limitations in widely used unsupervised methods like TextRank. These approaches matter because they enable scalable, domain-agnostic extraction applicable to resource-scarce scenarios and diverse languages.

Key finding: TeKET introduces the KeyPhrase Extraction (KePhEx) tree and a novel Cohesiveness Index measure for ranking candidate phrases, achieving domain- and language-independence without requiring training data. This tree-based...
Key finding: GTEK employs a sentence clustering approach via Graph-based Growing Self-Organizing Map (G-GSOM) to represent subtopics within documents, then applies TextRank within these clusters to extract keyphrases. This guarantees...
Key finding: SemGraph builds a global semantic relationship graph by statistically filtering co-occurrences across documents and enriching this graph with WordNet semantic relations. Unlike conventional co-occurrence graphs limited to...
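
The findings above all build on or compare against graph-based ranking in the style of TextRank. As a rough illustration of that baseline only (not of TeKET, GTEK, or SemGraph), the sketch below links words that co-occur within a small window and scores them with PageRank-style power iteration. The function names, the window size, the damping factor, and the sample sentence are all assumptions chosen for illustration.

```python
from collections import defaultdict


def cooccurrence_graph(tokens, window=2):
    """Link two different words if they appear within `window` positions."""
    graph = defaultdict(set)
    for i, word in enumerate(tokens):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            if tokens[j] != word:
                graph[word].add(tokens[j])
                graph[tokens[j]].add(word)
    return graph


def rank_words(graph, damping=0.85, iterations=50):
    """PageRank-style power iteration on an unweighted, undirected graph."""
    scores = {node: 1.0 for node in graph}
    for _ in range(iterations):
        new_scores = {}
        for node in graph:
            # Each neighbour shares its score equally among its own neighbours.
            incoming = sum(scores[nb] / len(graph[nb]) for nb in graph[node])
            new_scores[node] = (1 - damping) + damping * incoming
        scores = new_scores
    return scores


# Hypothetical, already-tokenized input (no stop-word or POS filtering here).
tokens = ("graph ranking extracts keyphrases using graph structure "
          "and graph centrality").split()
scores = rank_words(cooccurrence_graph(tokens))
top_word = max(scores, key=scores.get)
```

A full pipeline would normally filter candidates by part of speech and merge adjacent high-scoring words into multi-word phrases; those steps are omitted here to keep the ranking idea visible.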

2. What role can hybrid or collaborative approaches combining supervised and unsupervised methods play in improving keyphrase extraction?

This theme investigates approaches that reconcile the strengths of supervised learning, which leverages labeled data and global corpus knowledge, with those of unsupervised methods, which adapt to domain shifts and require no training data. Hybrid models aim to integrate local document structure and global statistics for more accurate and robust keyphrase extraction, especially in short or noisy documents where each approach alone may underperform.

Key finding: HybridRank synergistically combines a supervised Naïve Bayes classifier (KEA) and an unsupervised graph-based ranking algorithm (TextRank) by merging their ranked outputs through a collaborative scoring scheme. Tested on...
Key finding: This study reframes keyphrase extraction as a ranking problem addressed via a multilayer perceptron neural network, classifying candidate phrases and supplementing ranked lists with uncertain candidates to meet desired...
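
The summary above does not spell out HybridRank's collaborative scoring scheme, so the sketch below substitutes a different, generic technique, reciprocal rank fusion, simply to show what merging a supervised and an unsupervised ranked list can look like. The function name `fuse_rankings`, the constant `k=60`, and both candidate lists are assumptions for illustration.

```python
def fuse_rankings(rankings, k=60):
    """Merge ranked lists with reciprocal rank fusion (RRF).

    Each phrase gets 1 / (k + rank) from every list that contains it;
    phrases ranked highly by several lists accumulate the largest totals.
    """
    scores = {}
    for ranking in rankings:
        for position, phrase in enumerate(ranking, start=1):
            scores[phrase] = scores.get(phrase, 0.0) + 1.0 / (k + position)
    # Sort by descending score; break ties alphabetically for determinism.
    return sorted(scores, key=lambda p: (-scores[p], p))


# Hypothetical outputs of a supervised and an unsupervised extractor.
supervised = ["neural network", "keyphrase extraction", "text mining"]
unsupervised = ["keyphrase extraction", "graph ranking", "neural network"]
fused = fuse_rankings([supervised, unsupervised])
```

Rank-based fusion avoids having to calibrate classifier probabilities against graph centrality scores, which live on different scales.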

3. How do feature engineering and linguistic insights, including document structure and positional features, contribute to improving supervised keyphrase extraction performance?

This theme covers methods that enhance candidate representation through linguistic and structural features such as phrase morphology, position in document sections, and word-level statistics. Understanding feature importance and utilizing richer linguistic cues improves supervised classification or ranking models, offering better generalization and interpretability in keyphrase extraction tasks.

Key finding: By augmenting the baseline KEA system with features that encode phrase position relative to logical document sections and morphological markers like acronyms and terminologically productive suffixes, this work tailors...
Key finding: This study exploits detailed XML document markup from scientific articles to incorporate structural features (such as frequency in title, abstract, and sections) into candidate ranking. Results show that structural features...
Key finding: Through empirical analysis of five keyphrase attributes (Term Frequency, First Occurrence, Last Occurrence, Phrase Position in Sentences, and Term Cohesion Degree) using author-supplied keyphrases as ground truth, this study...
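
Three of the attributes named above (Term Frequency, First Occurrence, Last Occurrence) are simple to compute for a candidate phrase. The sketch below shows one plausible formulation, assuming whitespace tokenization and positions normalized by document length. The cited studies may define these features differently, and `phrase_features` is a hypothetical helper.

```python
def phrase_features(text, phrase):
    """Compute simple frequency and position features for a candidate phrase.

    Positions are word offsets divided by the document length, so values
    near 0.0 mean the phrase occurs early in the document.
    """
    words = text.lower().split()
    target = phrase.lower().split()
    n = len(target)
    positions = [i for i in range(len(words) - n + 1)
                 if words[i:i + n] == target]
    if not positions:
        return None  # the phrase does not occur in the text
    total = len(words)
    return {
        "term_frequency": len(positions),
        "first_occurrence": positions[0] / total,
        "last_occurrence": positions[-1] / total,
    }


# Hypothetical document and candidate phrase.
doc = ("keyphrase extraction finds topics and "
       "keyphrase extraction aids retrieval")
features = phrase_features(doc, "keyphrase extraction")
```

Feature dictionaries like this one would typically be turned into vectors and passed to a supervised classifier or ranker.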

All papers in Keyphrases Extraction

Nowadays, automatic multi-document text summarization systems can successfully retrieve summary sentences from the input documents. However, they have many limitations, such as inaccurate extraction of essential sentences, low coverage, poor... more
Automatic keyphrase extraction is useful in efficiently locating specific documents in online databases. While several techniques have been introduced over the years, improvement on accuracy rate is minimal. This research examines... more
One of the most useful techniques for improving the contact experience is predicting the most likely word for immediate selection. Socializing has become much easier thanks to the advancement of mobile phones and the broad... more
Data mining is the process of finding new, useful knowledge from data using different techniques. These techniques give us faster and better search of the large amounts of data that we face every day. Clustering of data is one of the... more
In this paper, we propose a new technique to improve QRS complex detection. This technique consists of incorporating an autoencoder and bidirectional long short term memory (BiLSTM). The autoencoder used is a stacked autoencoder and... more
As the number of internet users increases, online electronic content grows proportionally, irrespective of language. Much research on English text summarization has come to light to deal with this gigantic body of online... more
Nowadays, the number of Arabic documents has increased significantly in different domains, such as news articles, emails, business summaries, biomedicine, web sites and social media documents. Some databases have increased in size to... more
With the rapid development of the WWW, the number of documents in use is increasing rapidly. This produces gigabytes of text documents to process. For indexing as well as retrieving the required text documents, efficient algorithms... more
Multi-document summarization has had a great impact on the research community ever since the growth of online information and its availability. Selecting the most important sentences from such a huge repository of data is quite tricky and... more
Cluster analysis is a statistical approach that identifies uniform clusters within data. The closeness of data is measured quantitatively using distance functions. Specifically for text data mining, clustering serves as a method of... more
Automatic keyphrases extraction (AKE) is a principal task in natural language processing (NLP). Several techniques have been exploited to improve the process of extracting keyphrases from documents. Deep learning (DL) algorithms are the... more
The exponential growth of online textual data triggered the crucial need for an effective and powerful tool that automatically provides the desired content in a summarized form while preserving core information. In this paper, we propose... more
An effective keyphrase extraction system must produce self-contained, high-quality phrases that are also key to the document topic. This paper presents BERT-JointKPE, a multi-task BERT-based model for keyphrase extraction. JointKPE... more
Clustering is one of the most important data mining or text mining algorithms and is used to group similar objects together. The aim of clustering is to find the relationship among the data objects, and classify them into meaningful... more
In the text categorization problem, the most widely used method for document representation is based on word frequency vectors, called the VSM (Vector Space Model). This representation is based only on words from documents and in this case loses any... more
With the problem of ever-expanding information resources and the remarkable rate of data retrieval, the need for automated summarization techniques has emerged. As summarization is needed the most at present searching information on... more
With the increased advancement in technology and the proliferation of internet applications, electronic mail has become an increasingly essential means of communication for both individuals and organizations. In recent years, however,... more
The graph-based approach has proven to be the most effective method of extracting keyphrases. Existing graph-based extraction methods do not include nouns as a component, resulting in keyphrases that are not noun-centric, leading to... more
The methods of Automatic Extractive Summarization (AES) use the features of the sentences of the original text to extract the most important information that will be considered in the summary. It is known that the first sentences of the text... more
This study presents the results of an experimental study of two document clustering techniques, k-means and k-means++. In particular, we compare the two main approaches in crime document clustering. The drawback of k-means is that... more
In this work, the improvement of automatic keyphrase extraction using deep linguistic features and a supervised machine learning algorithm is discussed. The n-gram method for extracting important keyphrases produces a huge number of candidate... more
Punjabi, the official language of the Indian Punjab, is an Indo-Aryan language. Keyphrase extraction is a Natural Language Processing task to identify keyphrases in a document. Keyphrases are groups of words that describe the meaning of a... more
This paper focuses on keyphrase extraction for news articles because the news article is one of the most popular document genres on the web and most news articles have no author-assigned keyphrases. Existing methods for single document keyphrase... more
State-of-the-art research in unsupervised automatic keyphrase extraction has focused on graph analysis. Keyphrase ranking is a critical step in graph-based approaches. In this paper, we follow two main purposes including choice of good... more
This paper introduces the methods and experiments applied in CIST system participating in the CLSciSumm 2016 Shared Task at BIRNDL 2016. We have participated in the TAC 2014 Biomedical Summarization Track, so we develop the system based... more
We examine the effect of probabilistic topic model-based word representations, on sentence-based extractive summarization. We formulate the task of sentence selection as a binary classification problem, and we test a variety of machine... more
A textual data processing task that involves the automatic extraction of relevant and salient keyphrases from a document that expresses all the important concepts of the document is called keyphrase extraction. Due to technological... more
A growing number of universities offer recordings of lectures, seminars and talks in an online e-learning portal. However, the user is often not interested in the entire recording, but is looking for parts covering a certain topic.... more
A keyphrase can be described as a brief phrase comprising between one to five words that correspond to significant perceptions in an article. Text summarization, automatic indexing, classification and text mining are some of the many... more
Automatic keyphrase extraction aims to extract a set of phrases that are related to the main topics discussed in a document. They have served in several areas of text mining such as information retrieval and classification of a large text... more
Yogeswari Magar. Abstract: Nowadays the volume of data is increasing from a variety of sources, so this volume of text data needs to be summarized effectively to be useful. This paper is a comprehensive literature review of Automatic text... more
Clustering algorithms are used to generate clusters of elements having similar characteristics. Among the different groups of clustering algorithms, agglomerative algorithm is widely used in the document clustering domain. This study... more
Text Summarization is the process of generating a short summary for the document that contains the overall meaning. This paper explains the extractive technique of summarization which consists of selecting important sentences from the... more
The purpose of the study is to verify whether some correlation exists between soil erodibility (i.e., the K factor mentioned in the RUSLE model) and data obtained from satellite images. This piece of work represents a first attempt towards a... more
This chapter addresses the issue of topic extraction from text corpora for ontology learning. The first part provides an overview of some of the most significant solutions present today in the literature. These solutions deal mainly with... more
A summary system comprises a subtraction of text document contents to generate a new form that delivers the essentials contents of the text documents. Due to the hassle of documents overload, getting the right information and... more
We present a novel approach to intro-to-programming domain model discovery from textbooks using an over-generation and ranking strategy. We first extract candidate key phrases from each chapter in a Computer Science textbook focusing on... more
Sentence-based extractive summarization aims at automatically generating shorter versions of texts by extracting from them the minimal set of sentences that are necessary and sufficient to cover their content. Providing effective... more
Arabic Documents Clustering is an important task for obtaining good results with the traditional Information Retrieval (IR) systems especially with the rapid growth of the number of online documents present in Arabic language. Documents... more
This paper introduces an automatic method to extend existing WordNets via machine translation. Our proposal relies on the hierarchical skeleton of the English Princeton WordNet (PWN) as a backbone to extend their taxonomies. Our proposal... more