Academia.edu

Short-Text Semantic Similarity

7 papers
2 followers
About this topic
Short-Text Semantic Similarity is a subfield of natural language processing that focuses on measuring the degree of similarity in meaning between short text segments, such as sentences or phrases. It employs various computational techniques, including vector space models and deep learning, to quantify semantic relationships and enhance understanding of textual content.
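
A minimal sketch of the vector space view mentioned above: the snippet below compares two short texts with cosine similarity over simple bag-of-words count vectors. The whitespace tokenization and raw term counts are assumptions for illustration only; practical systems typically use TF-IDF weighting, word embeddings, or deep sentence encoders.

# Minimal sketch: bag-of-words cosine similarity between two short texts.
# Tokenization and weighting are deliberately simplistic (an assumption);
# real systems typically use TF-IDF, embeddings, or sentence encoders.
import math
from collections import Counter

def cosine_similarity(text_a: str, text_b: str) -> float:
    # Lowercased whitespace tokenization into term-count vectors.
    vec_a = Counter(text_a.lower().split())
    vec_b = Counter(text_b.lower().split())
    dot = sum(vec_a[w] * vec_b[w] for w in set(vec_a) & set(vec_b))
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

print(cosine_similarity("a cat sat on the mat", "a cat lay on the rug"))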

Key research themes

1. What semantic knowledge sources and methodological frameworks can most effectively measure short-text semantic similarity?

This research theme focuses on classifying and evaluating various semantic similarity measurement techniques by leveraging different semantic knowledge sources such as string-based, corpus-based, knowledge-based, and hybrid methods. Understanding these frameworks is critical to developing effective algorithms that capture semantic similarity beyond surface lexical matching, which is particularly challenging in short texts due to limited context and high ambiguity.

Key finding: This comprehensive review categorizes short-text similarity methods into string-based, corpus-based, knowledge-based, and hybrid techniques, identifying four semantic knowledge bases and eight corpus resources as external... Read more
Key finding: The paper divides STS methods into topological/knowledge-based, statistical/corpus-based, and string-based categories, with special emphasis on WordNet taxonomy for topological methods. It contributes a novel hybrid approach... Read more
Key finding: Proposes a corpus-based semantic word similarity measure integrated with a modified and normalized Longest Common Subsequence (LCS) algorithm for text similarity. Experimentation on multiple datasets shows superior... Read more
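
To make the LCS-based component in the last finding concrete, here is a minimal sketch of a normalized Longest Common Subsequence similarity over word tokens. The normalization by the length of the longer sequence is an assumption for illustration, and the corpus-based word-similarity component described in that work is not reproduced here.

# Minimal sketch: normalized Longest Common Subsequence (LCS) similarity
# over word tokens. The normalization scheme is an assumption; the cited
# work combines a modified LCS with a corpus-based word similarity
# measure, which is not reproduced here.
def lcs_length(a, b):
    # Classic dynamic-programming LCS over two token sequences.
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]

def normalized_lcs_similarity(text_a: str, text_b: str) -> float:
    tokens_a, tokens_b = text_a.lower().split(), text_b.lower().split()
    if not tokens_a or not tokens_b:
        return 0.0
    return lcs_length(tokens_a, tokens_b) / max(len(tokens_a), len(tokens_b))

print(normalized_lcs_similarity("the cat sat on the mat", "the cat is on the mat"))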

2. How can lexical, syntactic, and semantic features be integrated via machine learning to improve short-text semantic similarity prediction?

This research area investigates the integration of multiple linguistic feature types—lexical overlap, syntactic structures, semantic relations—through supervised machine learning techniques like Support Vector Machines (SVM). The goal is to construct robust feature representations that capture semantic equivalence or similarity between short texts, enabling systems to generalize across languages and domains, including resource-scarce settings.

Key finding: This work presents an SVM-based system utilizing diverse linguistically motivated features including distributional, conceptual, semantic similarity measures, and multiword expressions. It performed well on SemEval-2015 Task... Read more
Key finding: Employs a supervised learning regression model combining lexical, syntactic, and semantic metrics such as word overlap, BLEU scores on base-phrases, named entity preservation, and predicate-argument alignment. While... Read more
Key finding: Proposes a bag-of-words statistical model augmented with a part-of-speech weighting scheme as proxy for deeper syntactic information, enhancing semantic similarity measurement without requiring resource-heavy parsing. It... Read more
Key finding: Offers a systematic analysis and classification of syntactic information usage—including word order, POS tagging, parsing, semantic role labeling—in STS algorithms, evaluated on the Microsoft Research Paraphrase Corpus.... Read more
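
The findings above share a common pipeline: extract lexical, syntactic, and semantic features from a sentence pair and feed them to a supervised regressor. The sketch below illustrates that setup with two shallow lexical features and scikit-learn's SVR as an assumed stand-in for the models used in these systems; the feature set and toy training pairs are illustrative only and far simpler than those in the cited work.

# Minimal sketch of a feature-based supervised STS regressor.
# The two features (Jaccard word overlap, length ratio) and the toy
# training pairs are assumptions for illustration; the cited systems use
# much richer lexical, syntactic, and semantic feature sets.
from sklearn.svm import SVR

def pair_features(a: str, b: str):
    tok_a, tok_b = set(a.lower().split()), set(b.lower().split())
    overlap = len(tok_a & tok_b) / max(len(tok_a | tok_b), 1)
    len_ratio = min(len(tok_a), len(tok_b)) / max(len(tok_a), len(tok_b), 1)
    return [overlap, len_ratio]

# Toy sentence pairs with gold similarity scores on a SemEval-style 0-5 scale.
train_pairs = [
    ("a man is playing a guitar", "a man plays the guitar", 4.8),
    ("a dog runs in the park", "a cat sleeps on the sofa", 0.5),
    ("children are playing football", "kids play soccer outside", 3.9),
]
X = [pair_features(a, b) for a, b, _ in train_pairs]
y = [score for _, _, score in train_pairs]

model = SVR(kernel="rbf").fit(X, y)
print(model.predict([pair_features("a man is playing music", "a man plays the guitar")]))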

3. What role do lexico-syntactic pattern-based corpus methods play in capturing semantic similarity in short texts without reliance on fine-grained semantic resources?

This theme examines approaches that extract semantic similarity via lexico-syntactic patterns from large corpora instead of curated semantic lexical resources like WordNet, aiming to overcome coverage limitations and resource constraints. It focuses on pattern-based methods employing finite-state transducers and corpus mining to capture semantic relations robustly and their utility in short-text similarity and relation extraction.

Key finding: Introduces PatternSim, a novel corpus-based semantic similarity measure leveraging 18 hand-crafted lexico-syntactic patterns encoded as finite-state transducers applied to massive corpora (WACYPEDIA, UKWAC). Without relying... Read more
Key finding: Applies various similarity matching methods—ranging from simple word overlap to dependency graph matching and feature-based vector similarity incorporating lexical, syntactic, and semantic features—for multiple-choice... Read more
Key finding: Develops a system originally designed for textual entailment that uses multiple WordNet-based word-to-word similarity measures aggregated at sentence level to assess semantic textual similarity. Achieves competitive Pearson... Read more
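
As a rough illustration of the WordNet-based word-to-word measures aggregated at sentence level in the last finding, the sketch below uses NLTK's WordNet interface with path similarity and a greedy best-match average. The choice of path similarity and the simple symmetric averaging are assumptions, not the exact measures used in the cited system.

# Minimal sketch: aggregate WordNet word-to-word similarity to sentence
# level. Path similarity and the "best match, then average" aggregation
# are assumptions for illustration. Requires: nltk.download("wordnet").
from nltk.corpus import wordnet as wn

def word_similarity(w1: str, w2: str) -> float:
    # Best path similarity over all synset pairs of the two words.
    best = 0.0
    for s1 in wn.synsets(w1):
        for s2 in wn.synsets(w2):
            sim = s1.path_similarity(s2)
            if sim is not None and sim > best:
                best = sim
    return best

def sentence_similarity(a: str, b: str) -> float:
    tok_a, tok_b = a.lower().split(), b.lower().split()
    if not tok_a or not tok_b:
        return 0.0
    # Average each word's best match in the other sentence, in both
    # directions, to keep the measure symmetric.
    ab = sum(max(word_similarity(w, v) for v in tok_b) for w in tok_a) / len(tok_a)
    ba = sum(max(word_similarity(w, v) for v in tok_a) for w in tok_b) / len(tok_b)
    return (ab + ba) / 2

print(sentence_similarity("a dog barks loudly", "a puppy is barking"))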

All papers in Short-Text Semantic Similarity

Semantic Textual Similarity (STS) algorithms play a key role in Natural Language Processing (NLP) studies, since they can support various NLP tasks such as Text Summarization and Information Retrieval. Although we found several STS... more
This study presents a malware classification system designed to classify malicious processes at run-time on production hosts. The system monitors process-level system call activity and uses information extracted from system call traces as... more
We analyze methods for selecting topics in news articles to explain stock returns. We find, through empirical and theoretical results, that supervised Latent Dirichlet Allocation (sLDA) implemented through Gibbs sampling in a stochastic... more
Extracting knowledge from unstructured text and then classifying it is gaining importance after the data explosion on the web. The traditional text classification approaches are becoming ubiquitous, but the hybrid of semantic knowledge... more
Automatic short answer grading is a significant problem in E-assessment. Several models have been proposed to deal with it. Evaluation and comparison of such solutions require the availability of datasets with manual examples. In this paper,... more
Technical writing in professional environments, such as user manual authoring, requires the use of uniform language. Nonuniform language detection is a novel task, which aims to guarantee consistency in technical writing by detecting... more
Semantic Textual Similarity (STS) aims at computing the proximity of meaning transmitted by two sentences. In 2016, the ASSIN shared task targeted STS in Portuguese and released training and test collections. This paper describes the... more
A significant portion of the world's text is tagged by readers on social bookmarking websites. Credit attribution is an inherent problem in these corpora because most pages have multiple tags, but the tags do not always apply with equal... more
Electronic health records (EHRs) contain important clinical information about patients. Some of these data are in the form of free text and require preprocessing to be usable in automated systems. Efficient and effective use of... more
Conventional schemes for document classification need labeled data to build consistent and precise classifiers. On the other hand, labeled data are rarely available, and normally too expensive to obtain. Provided a learning task for which... more
Web page recommendations are generated using the navigational history from web server log files. The Semantic Variable Length Markov Chain Model (SVLMC) is a web page recommendation system used to generate recommendations by combining a... more
Detecting the semantic coherence of a document is a challenging task and has several applications such as in text segmentation and categorization. This paper is an attempt to distinguish between a 'semantically coherent' true document and... more
Text mining to a great extent depends on the various text preprocessing techniques. The preprocessing methods and tools which are used to prepare texts for further mining can be divided into those which are and those which are not... more
Word embedding methods represent words as vectors in a space that is structured using word co-occurrences, so that words with close meanings are close in this space. These vectors are then provided as input to automatic systems to... more
ParsiPardaz Toolkit (Persian Language Processing Toolkit), which is introduced in this paper, is a comprehensive suite of Persian language processing tools, providing many computational linguistic applications. This system can process and... more
When relevance feedback, one of the most popular information retrieval models, is used in an information retrieval system, related words are extracted based on the first retrieval result. These words are then added to the... more
We present a method for measuring the semantic similarity of texts using a corpus-based measure of semantic word similarity and a normalized and modified version of the Longest Common Subsequence (LCS) string matching algorithm. Existing... more
This paper presents the Serbian datasets developed within the project Advancing Novel Textual Similarity-based Solutions in Software Development-AVANTES, intended for the study of Cross-Level Semantic Similarity (CLSS). CLSS measures the... more
This paper presents an overview of the open access datasets in Serbian that have been manually annotated for the tasks of semantic textual similarity and short-text sentiment classification. In addition, it describes several kinds of... more
Understanding and detecting the intended meaning in social media is challenging because social media messages contain a variety of noise and chaos that is irrelevant to the themes of interest. For example, conventional supervised... more
The bag-of-words representation of documents is often unsatisfactory as it ignores relationships between important terms that do not co-occur literally. Improvements might be achieved by expanding the vocabulary with other relevant words,... more
Classifying tweets is an intrinsically hard task, as tweets are short messages, which makes traditional bag-of-words approaches inefficient. In fact, bag-of-words approaches ignore relationships between important terms that do not... more
Topic modeling is a technique for reducing the dimensionality of large corpora of text. Latent Dirichlet allocation (LDA), the most prevalent form of topic modeling, improved upon earlier methods by introducing Bayesian iterative updates,... more
Web behaviour analysis of a collective user has provided a powerful means for studying the collective user interests on the Internet. However, the existing research merely analyses the behaviour of a single user who accesses multiple... more
This paper describes ASAPPpy – a framework fully-developed in Python for computing Semantic Textual Similarity (STS) between Portuguese texts – and its participation in the ASSIN 2 shared task on this topic. ASAPPpy follows other versions... more
Topic models provide insights into document collections, and their supervised extensions also capture associated document-level metadata such as sentiment. However, inferring such models from data is often slow and cannot scale to big... more
Supervised models of NLP rely on large collections of text which closely resemble the intended testing setting. Unfortunately matching text is often not available in sufficient quantity, and moreover, within any domain of text, data is... more
The semantics derived from textual data provide representations for Machine Learning algorithms. These representations are an interpretable form of high-dimensional sparse matrices that are given as input to the machine learning... more
Conventional schemes to document classification need labeled data to build consistent and precise classifiers. On the other hand, labeled data are rarely available, and normally too expensive to obtain. Provided a learning task for which... more
Inferring locations from user texts on social media platforms is a non-trivial and challenging problem relating to public safety. We propose a novel non-uniform grid-based approach for location inference from Twitter messages using... more
Topic modelling is the new revolution in text mining. It is a statistical technique for revealing the underlying semantic structure in large collections of documents. After analysing approximately 300 research articles on topic modeling, a... more
The use of semantic models is relevant in automated learning systems, in solving certain tasks, such as: extracting knowledge from texts, information retrieval, abstracting, checking the correctness of vocabulary terms and definitions,... more
The histogram method is a powerful non-parametric approach for estimating the probability density function of a continuous variable. But the construction of a histogram, compared to the parametric approaches, demands a large number of... more
The structure of membrane proteins is considerably harder to determine experimentally than that of soluble proteins. To develop a reliable model for protein structure prediction, it must be optimized on as large a... more
Topic modeling has emerged as a popular learning technique not only in mining text representations, but also in modeling authors’ interests and influence, as well as predicting linkage among documents or authors. However, few existing... more
Twitter acts as one of the most important media for communication and information sharing. As tweets do not provide sufficient word occurrences, i.e. due to the 140-character limit, classification methods that use traditional approaches like... more
One of the key problems encountered when using text classification learning algorithms is that they require a huge number of labelled examples to learn accurately. The objective of this paper is to propose a novel method of topic... more
Sentiment analysis predicts a one-dimensional quantity describing the positive or negative emotion of an author. Mood analysis extends the one-dimensional sentiment response to a multi-dimensional quantity, describing a diverse set of... more
Huge amounts of unstructured text data are obtained daily from various sources such as emails, tweets, social media posts, customer comments, reviews, and reports in many different fields. Unstructured text data can be analyzed to... more
Small-sample classification is a challenging problem in computer vision. In this work, we show how to efficiently and effectively utilize semantic information of the annotations to improve the performance of small-sample classification.... more
Capital investment in the automated analysis of electronic documents has increased rapidly since the growth of text categorization and classification. In recent times, various works have been done in the context of text mining and... more
This paper reports about results collected during the development of a scalable Information Retrieval system for near real-time analytics on social networks. More precisely, we present the end-user functionalities provided by the system,... more
Twitter Sentiment Analysis is the task of detecting opinions and sentiments in tweets using different algorithms. In our research work, we conducted a study to analyze and compare different Machine Learning Algorithms (MLAs) for the... more
This paper discusses the classification process for medical data. In this paper, we use the data from ACM KDDCup 2008 to demonstrate our classification process based on latent topic discovery. In this data set, the target set and outliers... more
Oftentimes, the question "what is this poem about?" has no trivial answer, regardless of length, style, author, or context in which the poem is found. We propose a simple system of multi-label classification of poems based on... more