Academia.edu

Keyphrases Extraction

22 papers
1 follower
About this topic
Keyphrases extraction is a natural language processing task that involves identifying and extracting significant phrases from a text document that capture its main topics or themes, facilitating information retrieval, summarization, and content analysis.

Key research themes

1. How can novel unsupervised tree and graph-based structures enhance domain-independent keyphrase extraction?

This theme focuses on techniques that aim to improve keyphrase extraction without reliance on supervised training data or domain knowledge. Such methods leverage tree or graph structures to capture term cohesiveness, semantic relations, and document topology, addressing limitations in widely used unsupervised methods like TextRank. These approaches matter because they enable scalable, domain-agnostic extraction applicable to resource-scarce scenarios and diverse languages.

Key finding: TeKET introduces the KeyPhrase Extraction (KePhEx) tree and a novel Cohesiveness Index measure for ranking candidate phrases, achieving domain- and language-independence without requiring training data. This tree-based...
Key finding: GTEK employs a sentence clustering approach via Graph-based Growing Self-Organizing Map (G-GSOM) to represent subtopics within documents, then applies TextRank within these clusters to extract keyphrases. This guarantees...
Key finding: SemGraph builds a global semantic relationship graph by statistically filtering co-occurrences across documents and enriching this graph with WordNet semantic relations. Unlike conventional co-occurrence graphs limited to...
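For orientation, the TextRank baseline that these methods build on or compare against can be sketched as a PageRank-style iteration over a word co-occurrence graph. This is a minimal illustration of that general idea only; it is not the TeKET, GTEK, or SemGraph method. The function names, the window size, the damping factor, and the iteration count are assumptions chosen for the example.

```python
from collections import defaultdict


def build_cooccurrence_graph(tokens, window=2):
    """Link every pair of distinct tokens that occur within `window` positions."""
    graph = defaultdict(set)
    for i, word in enumerate(tokens):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            if tokens[j] != word:
                graph[word].add(tokens[j])
                graph[tokens[j]].add(word)
    return graph


def textrank_scores(graph, damping=0.85, iterations=50):
    """Run a fixed number of PageRank-style updates on an undirected graph."""
    scores = {node: 1.0 for node in graph}
    for _ in range(iterations):
        new_scores = {}
        for node in graph:
            # Each neighbour shares its score equally among its own neighbours.
            rank = sum(scores[nb] / len(graph[nb]) for nb in graph[node])
            new_scores[node] = (1 - damping) + damping * rank
        scores = new_scores
    return scores


def top_words(tokens, k=3, window=2):
    """Return the k highest-scoring words (ties broken alphabetically)."""
    graph = build_cooccurrence_graph(tokens, window)
    scores = textrank_scores(graph)
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]


if __name__ == "__main__":
    sample = "graph ranking graph structure graph extraction graph clustering".split()
    print(top_words(sample, k=1, window=1))
```

A full extractor would also filter candidates by part of speech and merge adjacent high-scoring words into phrases; those steps are omitted here to keep the sketch short.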

2. What role can hybrid or collaborative approaches combining supervised and unsupervised methods play in improving keyphrase extraction?

This theme investigates approaches that combine the strengths of supervised learning, which leverages labeled data and global corpus knowledge, and unsupervised methods, which are adaptable to domain shifts and require no training data. Hybrid models aim to integrate local document structure and global statistics for more accurate and robust keyphrase extraction, especially in short or noisy documents where each approach alone may underperform.

Key finding: HybridRank synergistically combines a supervised Naïve Bayes classifier (KEA) and an unsupervised graph-based ranking algorithm (TextRank) by merging their ranked outputs through a collaborative scoring scheme. Tested on...
Key finding: This study reframes keyphrase extraction as a ranking problem addressed via a multilayer perceptron neural network, classifying candidate phrases and supplementing ranked lists with uncertain candidates to meet desired...
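The general idea of merging two ranked keyphrase lists can be illustrated with reciprocal rank fusion, a generic rank-aggregation technique. This is a stand-in chosen for illustration: the excerpt above does not describe HybridRank's collaborative scoring scheme, and the sketch below should not be read as that scheme. The function name, the constant `k=60`, and the sample phrases are assumptions.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked lists: each phrase earns 1 / (k + rank) from every list it appears in."""
    scores = {}
    for ranking in rankings:
        for rank, phrase in enumerate(ranking, start=1):
            scores[phrase] = scores.get(phrase, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda p: (-scores[p], p))


if __name__ == "__main__":
    # Hypothetical outputs of a supervised and an unsupervised extractor.
    supervised = ["neural network", "keyphrase extraction", "ranking"]
    unsupervised = ["keyphrase extraction", "graph", "neural network"]
    print(reciprocal_rank_fusion([supervised, unsupervised]))
```

Phrases that both extractors rank highly rise to the top, while a phrase found by only one extractor is kept but ranked lower, which is the behaviour a hybrid approach generally aims for.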

3. How do feature engineering and linguistic insights, including document structure and positional features, contribute to improving supervised keyphrase extraction performance?

This theme covers methods that enhance candidate representation through linguistic and structural features such as phrase morphology, position in document sections, and word-level statistics. Understanding feature importance and utilizing richer linguistic cues improves supervised classification or ranking models, offering better generalization and interpretability in keyphrase extraction tasks.

Key finding: By augmenting the baseline Kea system with features that encode phrase position relative to logical document sections and morphological markers like acronyms and terminologically productive suffixes, this work tailors...
Key finding: This study exploits detailed XML document markup from scientific articles to incorporate structural features, such as frequency in title, abstract, and sections, into candidate ranking. Results show that structural features...
Key finding: Through empirical analysis of five keyphrase attributes (Term Frequency, First Occurrence, Last Occurrence, Phrase Position in Sentences, and Term Cohesion Degree) using author-supplied keyphrases as ground truth, this study...
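Three of the attributes named above (Term Frequency, First Occurrence, Last Occurrence) can be computed for a candidate phrase roughly as follows. This is a minimal sketch under assumed definitions: positions are token offsets normalized by document length, and tokenization is a simple lowercase whitespace split. The studies listed here may define these features differently, and the function and key names are hypothetical. Phrase Position in Sentences and Term Cohesion Degree are left out because the excerpt does not define them.

```python
def phrase_positions(tokens, phrase_tokens):
    """Return every token offset at which the phrase starts."""
    n = len(phrase_tokens)
    return [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == phrase_tokens]


def positional_features(text, phrase):
    """Compute frequency and relative first/last position of `phrase` in `text`."""
    tokens = text.lower().split()
    phrase_tokens = phrase.lower().split()
    if not phrase_tokens:
        return None
    positions = phrase_positions(tokens, phrase_tokens)
    if not positions:
        return None  # the candidate does not occur in the document
    total = len(tokens)
    return {
        "term_frequency": len(positions),
        # Relative offsets in [0, 1): smaller means earlier in the document.
        "first_occurrence": positions[0] / total,
        "last_occurrence": positions[-1] / total,
    }


if __name__ == "__main__":
    doc = "keyphrase extraction helps retrieval and keyphrase extraction also helps summarization"
    print(positional_features(doc, "keyphrase extraction"))
```

Feature dictionaries of this kind would typically be fed to a supervised classifier or ranker, one row per candidate phrase.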

All papers in Keyphrases Extraction

The goal of document clustering algorithms is to create clusters that are internally coherent but clearly different from each other. The useful expressions in documents are often accompanied by a large amount of noise that is caused by the...
This paper focuses on keyphrase extraction for news articles because the news article is one of the popular document genres on the web and most news articles have no author-assigned keyphrases. Existing methods for single document keyphrase...
Current multi-document summarization systems can successfully extract summary sentences, but with many limitations, including low coverage, inaccurate extraction of important sentences, redundancy, and poor coherence among the selected...
In this paper, a supervised learning technique for extracting keyphrases of Arabic documents is presented. The extractor is supplied with linguistic knowledge to enhance its efficiency instead of relying only on statistical information...
Document clustering is a branch of a larger area of scientific study known as data mining. It is an unsupervised classification technique used to find structure in a collection of unlabeled data. The useful information in the documents can be...