Academia.edu

Topic Classification

16 papers
1 follower
About this topic
Topic classification is the process of categorizing text or documents into predefined topics or themes using algorithms and machine learning techniques. It involves analyzing the content to identify relevant features and assigning labels that represent the main subject matter, facilitating information retrieval and organization.

Key research themes

1. How do machine learning approaches address document representation and classifier construction for effective automated text classification?

This research focuses on the machine learning (ML) paradigm for automated text classification, emphasizing methods for representing textual data as numerical features, constructing classifiers from labeled datasets, and evaluating their effectiveness. This theme is vital because textual data is inherently high-dimensional and sparse, and successful categorization demands carefully engineered document representations and robust classification algorithms to improve accuracy while ensuring scalability.
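The pipeline described above (represent documents as numerical features, build a classifier from labeled data, then apply it) can be illustrated with a minimal sketch. It assumes a bag-of-words representation and a multinomial Naive Bayes classifier with add-one smoothing; the class name, the toy corpus, and the whitespace tokenizer are hypothetical choices for illustration and are not taken from any of the listed papers.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace; a deliberately crude tokenizer."""
    return text.lower().split()


class NaiveBayesTopicClassifier:
    """Multinomial Naive Bayes over bag-of-words counts (add-one smoothing)."""

    def fit(self, documents, labels):
        self.class_doc_counts = Counter(labels)
        self.term_counts = defaultdict(Counter)  # label -> term -> count
        for text, label in zip(documents, labels):
            self.term_counts[label].update(tokenize(text))
        self.vocabulary = {
            term for counts in self.term_counts.values() for term in counts
        }
        self.total_docs = len(labels)
        return self

    def log_scores(self, text):
        """Return the unnormalized log posterior of each label for `text`."""
        vocab_size = len(self.vocabulary)
        scores = {}
        for label, doc_count in self.class_doc_counts.items():
            counts = self.term_counts[label]
            total_terms = sum(counts.values())
            score = math.log(doc_count / self.total_docs)  # class prior
            for term in tokenize(text):
                if term not in self.vocabulary:
                    continue  # terms never seen in training are ignored
                score += math.log(
                    (counts[term] + 1) / (total_terms + vocab_size)
                )
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Hypothetical toy corpus; real experiments would use a held-out test set.
train_docs = [
    "the team won the match with a late goal",
    "the striker scored in the final match",
    "the central bank raised interest rates",
    "markets fell after the bank announced new rates",
]
train_labels = ["sports", "sports", "finance", "finance"]

classifier = NaiveBayesTopicClassifier().fit(train_docs, train_labels)
prediction_sports = classifier.predict("a late goal decided the match")
prediction_finance = classifier.predict("the bank cut interest rates")
```

Even in this toy setting the feature space has one dimension per vocabulary term, which hints at why high dimensionality and sparsity dominate the design of real document representations.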

Key finding: This foundational survey delineates the transformation from knowledge-engineering to machine learning approaches for text classification, highlighting three main problems: document representation (text indexing), classifier...
Key finding: This survey contextualizes text classification as a variant of the broader classification problem and highlights specific challenges arising from text's high-dimensionality and sparse features. It analyzes various...
Key finding: The study presents the application of supervised machine learning algorithms, namely Naive Bayes, Vector Space Model (VSM)-based classifiers, and methods incorporating syntactic features (e.g., Stanford Tagger), evaluating their...
Key finding: This paper introduces a novel supervised feature extraction method that projects high-dimensional document vectors into an abstract feature space with dimensions equal to the number of classes by aggregating evidence for each...
Key finding: This work proposes a classification framework using a hierarchical topic dictionary enhanced by relevance and discrimination weights on keywords and ontology nodes. By propagating weighted relevance scores up the hierarchy,...

2. Can ontology-based methods enable dynamic, training-free text classification with semantic understanding?

This area investigates approaches that leverage structured domain knowledge in the form of ontologies to classify text without relying on labeled data for training. By modeling documents and classes semantically rather than statistically, these methods allow flexible, dynamic topic definitions and the incorporation of background knowledge, addressing limitations of conventional supervised algorithms with fixed categories and training dependence.
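The idea of classifying without training data can be sketched very roughly as follows. The sketch assumes that an ontology can be reduced to a dictionary mapping each topic to concepts and their surface terms, and that a document is scored by counting matched terms. Both are strong simplifications: real ontology-based methods model relations between concepts and use semantic matching. The ontology contents and the function name are hypothetical.

```python
from collections import Counter

# Hypothetical toy ontology: topic -> concept -> surface terms.
ontology = {
    "cardiology": {
        "heart": {"heart", "cardiac"},
        "blood_pressure": {"hypertension", "pressure"},
    },
    "astronomy": {
        "star": {"star", "stellar", "sun"},
        "orbit": {"orbit", "orbital"},
    },
}


def classify_with_ontology(text, ontology):
    """Score each topic by how often its concept terms occur in the text.

    Returns (best_topic, scores); best_topic is None when nothing matches.
    No labeled documents are involved at any point.
    """
    token_counts = Counter(text.lower().split())
    scores = {
        topic: sum(
            token_counts[term]
            for terms in concepts.values()
            for term in terms
        )
        for topic, concepts in ontology.items()
    }
    best_topic = max(scores, key=scores.get)
    return (best_topic if scores[best_topic] > 0 else None), scores


topic_before, _ = classify_with_ontology(
    "cardiac arrest raises blood pressure", ontology
)

# Topics are dynamic: a new one is added by editing the ontology,
# with no retraining step.
ontology["finance"] = {"banking": {"bank", "rates", "loan"}}
topic_after, _ = classify_with_ontology("the bank raised rates", ontology)
unmatched, _ = classify_with_ontology("nothing relevant here", ontology)
```

The point of the design is that the category set lives in the knowledge structure rather than in a trained model, so changing the topics means editing data, not collecting new labels.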

Key finding: This paper introduces an innovative classification approach using ontological contexts as dynamic topics, treating the ontology as a classifier and obviating the need for pre-classified training documents. It employs semantic...

3. How do topic modeling techniques facilitate the discovery and classification of latent subjects in textual corpora?

Topic modeling comprises unsupervised methods that extract latent semantic structures from text collections, which aids document clustering, classification, and exploration. This research theme focuses on statistical models such as Latent Dirichlet Allocation (LDA) and Latent Semantic Analysis (LSA), their extensions, evaluation metrics like topic coherence, and the challenges of dealing with short texts and semantic ambiguities.
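To make the notion of latent topics concrete, here is a minimal sketch of LDA fitted with collapsed Gibbs sampling, using only the Python standard library. The function name, the hyperparameter values, and the toy documents are assumptions chosen for illustration; production work would normally rely on an established library, and none of the listed papers is claimed to use this exact procedure.

```python
import random


def fit_lda(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling on pre-tokenized documents.

    Returns (theta, topic_word, vocab): theta[d][k] is the estimated
    proportion of topic k in document d, and topic_word[k][v] counts how
    often vocabulary item v is currently assigned to topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({word for doc in docs for word in doc})
    word_id = {word: i for i, word in enumerate(vocab)}
    vocab_size = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * vocab_size for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assignments = []

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        doc_assignments = []
        for word in doc:
            z = rng.randrange(num_topics)
            doc_assignments.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[word]] += 1
            topic_total[z] += 1
        assignments.append(doc_assignments)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                v = word_id[word]
                z = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][z] -= 1
                topic_word[z][v] -= 1
                topic_total[z] -= 1
                # Conditional probability of each topic, up to a constant.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][v] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][v] += 1
                topic_total[z] += 1

    theta = [
        [(count + alpha) / (len(doc) + num_topics * alpha) for count in counts]
        for counts, doc in zip(doc_topic, docs)
    ]
    return theta, topic_word, vocab


# Hypothetical toy corpus, already tokenized.
docs = [
    "goal match team striker goal".split(),
    "match team coach goal".split(),
    "bank rates market loan".split(),
    "market bank rates inflation".split(),
]
theta, topic_word, vocab = fit_lda(docs, num_topics=2)
```

Each row of `theta` is a low-dimensional description of one document, which is the kind of topic-distribution feature that can then be passed to a downstream classifier or clustering step.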

Key finding: The paper elaborates on the theoretical underpinnings of LDA in comparison to related methods like LSA and probabilistic LSA, emphasizing LDA’s virtue as a fully generative Bayesian model that assigns topics as latent...
Key finding: This empirical study compares LSA and LDA applied to full-text classification of a large corpus of e-books, finding that while both can effectively cluster documents by topic, LSA shows superiority in some recommendation...
Key finding: The authors develop several topic modeling-based classification frameworks for clinical imaging reports, including binary and aggregate topic classifiers built around LDA. They show that topic distribution features outperform...
Key finding: Employing scientometric and bibliometric analyses on publications from Web of Science and Scopus, this study maps the evolution, primary application areas, and prominent models used in topic modeling research. It identifies...
Key finding: This paper extends the traditional LDA framework by integrating external ontology concepts to capture semantic relationships and address polysemy and word ambiguity more effectively. Semantic-LDA computes dynamic word-concept...

All papers in Topic Classification

Diabetes mellitus is a chronic metabolic disease of pandemic scale that seriously threatens human health. Accurate and early prediction of diabetes is an important factor in medical treatment and diabetes management. In the...
Huge volumes of unstructured text are generated daily from sources such as emails, tweets, social media posts, customer comments, reviews, and reports in many different fields. Unstructured text data can be analyzed to...
The authors present a system developed for the 2011 i2b2 Challenge on Sentiment Classification, whose aim was to automatically classify sentences in suicide notes using a scheme of 15 topics, mostly emotions. The system combines machine...
The purpose of this thesis is to assist in automating the detection of Fake News by identifying which features are more useful for different classifiers. The effectiveness of different extracted features for Fake News detection is going...
Topic classification of texts is one of the most interesting challenges in Natural Language Processing (NLP). Topic classifiers commonly use a bag-of-words approach, in which the classifier uses (and is trained with) selected terms from...
Due to the rapid growth of online information, text classification has become one of the key techniques for handling and organizing text data. One of the reasons to build a taxonomy of documents is to make it easier to find relevant documents,...
We present a topic identification system for news, which is based upon an evaluation of similarity between the topics and a large number of documents in the news database. Our system is able to provide the topics for every news sample....
The review process is essential to ensure the quality of publications. Recently, the increase of submissions for top venues in machine learning and NLP has caused a problem of excessive burden on reviewers and has often caused concerns...
With the rapid growth of information technology, the amount of unstructured text data in digital libraries has rapidly increased, making it a major challenge to analyze, organize, and automatically classify text in E-research...
News articles are important for providing timely, historic information. However, the Internet is replete with text that may contain irrelevant or unhelpful information; therefore, means of processing it and distilling...
by Prof. Mona Nasr and 1 more
Huge volumes of unstructured text are generated daily from sources such as emails, tweets, social media posts, customer comments, reviews, and reports in many different fields. Unstructured text data can be analyzed to...
Social network users generate a large number of reviews and comments that express their opinions on different topics. As a result, there is a great need to understand and classify these opinions. Sentiment...