Topic Identification

156 papers

3 followers

About this topic

Topic identification is the process of determining and articulating the central theme or subject matter of a research study. It involves analyzing existing literature, understanding research gaps, and formulating specific questions or hypotheses that guide the investigation, ensuring relevance and clarity in the research objectives.

Key research themes

1. How can unsupervised topic modeling techniques identify and track the evolution of scientific ideas and fields over time?

This research area focuses on applying unsupervised probabilistic topic modeling methods, such as Latent Dirichlet Allocation (LDA), to large scientific corpora to analyze the temporal dynamics of research topics and intellectual trends. Understanding how scientific ideas emerge, grow, decline, or shift in prominence over time provides insights into paradigm changes and the structural evolution of academic disciplines. It matters because it offers a data-driven, quantitative complement to traditional historiographic methods, enabling nuanced tracking of thematic diversity and convergence across venues and subfields.
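As a rough illustration of the temporal analysis described above, the sketch below averages per-document topic proportions by publication year. It assumes a topic model (for example LDA) has already produced a list of topic weights for each document. The function name `topic_prevalence_by_year` and the toy numbers are hypothetical and do not come from any of the listed papers.

```python
from collections import defaultdict


def topic_prevalence_by_year(docs):
    """Average each topic's proportion over all documents published in a year.

    docs: iterable of (year, proportions) pairs, where proportions is a list of
    per-topic weights for one document (assumed output of a fitted topic model).
    """
    totals = {}
    counts = defaultdict(int)
    for year, proportions in docs:
        if year not in totals:
            totals[year] = [0.0] * len(proportions)
        for k, p in enumerate(proportions):
            totals[year][k] += p
        counts[year] += 1
    # One averaged topic vector per year, in chronological order.
    return {
        year: [t / counts[year] for t in totals[year]]
        for year in sorted(totals)
    }


# Toy data invented for illustration: two topics, three documents.
toy_docs = [
    (1988, [0.8, 0.2]),
    (1988, [0.6, 0.4]),
    (1989, [0.3, 0.7]),
]
prevalence = topic_prevalence_by_year(toy_docs)
```

Plotting each topic's yearly average then gives a simple picture of which topics rise or decline, under the assumption that averaged proportions are an adequate proxy for prominence.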

Studying the History of Ideas Using Topic Models

by Chris Manning

2015

Key finding: Applied LDA to over 12,500 computational linguistics papers from the ACL Anthology spanning 1978-2006, revealing significant historical trends such as the rise of probabilistic methods from 1988 and decline in semantics... Read more

Modeling Evolution of Topics in Large-Scale Temporal Text Corpora

by Palash Goyal

2023, Proceedings of the International AAAI Conference on Web and Social Media

Key finding: Proposed a novel computational approach combining word embeddings with dynamic semantic similarity networks and clustering to detect temporal evolution of topics in large corpora. Demonstrated the ability to model complex... Read more

A Semantics-enhanced Topic Modelling Technique: Semantic-LDA

by Dakshi Tharanga

2024

Key finding: Introduced semantic-LDA, an enhanced topic modeling approach which integrates external ontologies (Probase) to capture word semantics more accurately within the input corpus context by quantifying word-concept relationships... Read more

Topic Modeling: Perspectives From a Literature Review

by Andres Grisales

2023, IEEE Access

Key finding: Provided a scientometric analysis of topic modeling research evolution, highlighting LDA's dominance and applications across multiple domains, particularly in large-scale text analysis. It underscored LDA’s theoretical... Read more

2. What methods exist for topic identification in massive, heterogeneous text corpora, and how do they compare in scalability and interpretability?

This research area investigates diverse computational approaches for discovering latent topics in large and diverse textual datasets, emphasizing techniques that differ in scalability, parameter requirements, and interpretability. It includes probabilistic generative models like LDA which require pre-specification of topics, and alternative hashing-based and graph-based algorithms able to handle massive vocabularies and documents without strict prior constraints. Understanding these methods aids in selecting effective solutions for practical large-scale applications such as social media analytics and web corpus organization.
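To make the hashing-based family mentioned above more concrete, here is a minimal sketch of plain MinHash, which estimates the Jaccard similarity of two token sets from short signatures. It is not an implementation of any specific method from the papers listed here. The function names, the choice of salted BLAKE2b as the hash family, and the example sentences are all assumptions made for illustration.

```python
import hashlib


def minhash_signature(tokens, num_hashes=64):
    """Return a MinHash signature for a non-empty collection of string tokens.

    Each of the num_hashes positions holds the smallest 64-bit hash value seen
    over the token set under a differently salted hash function.
    """
    token_set = set(tokens)
    if not token_set:
        raise ValueError("tokens must not be empty")
    signature = []
    for seed in range(num_hashes):
        salt = seed.to_bytes(8, "little")  # a distinct salt acts as a distinct hash function
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(tok.encode("utf-8"), digest_size=8, salt=salt).digest(),
                "big",
            )
            for tok in token_set
        ))
    return signature


def estimated_jaccard(sig_a, sig_b):
    """Fraction of matching signature positions, an estimate of Jaccard similarity."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


# Invented example documents.
doc_a = "topic models find latent themes in text".split()
doc_b = "topic models find hidden themes in documents".split()
similarity = estimated_jaccard(minhash_signature(doc_a), minhash_signature(doc_b))
```

Because signatures have a fixed length regardless of vocabulary size, comparisons stay cheap on very large corpora, which is the scalability property this theme emphasizes.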

Topic Discovery in Massive Text Corpora Based on Min-Hashing

by Gibran Fuentes-Pineda

2018

Key finding: Presented Sampled Min-Hashing (SMH), an alternative to LDA for massive corpora topic discovery that obviates the need to predetermine topic number and dramatically reduces computational resource requirements. By generating... Read more

Topic detection with recursive consensus clustering and semantic enrichment

by Andrea Filetti

2023, Humanities and Social Sciences Communications

Key finding: Proposed an iterative clustering approach based on consensus matrices combined with semantic enrichment via word embeddings for topic detection in short texts like tweets. This method addresses instability and noise... Read more

Topic Modelling and Event Identification from Twitter Textual Data

by Joshua Ramisch

2023, arXiv (Cornell University)

Key finding: Applied LDA to Twitter datasets concerning social events in Kenya, utilizing evaluation metrics such as Normalized Mutual Information (NMI) and topic coherence to select optimal models. Demonstrated that LDA effectively... Read more

Taming the Tiger Topic: An XCES Compliant Corpus Portal to Generate Subcorpora Based on Automatic Text-Topic Identification

by Fernando Paulovich

2016, ucrel.lancs.ac.uk

Key finding: Developed an XCES-compliant corpus portal for Brazilian Portuguese newspaper corpora enabling corpus partitioning based on automatic topic identification using term covariance and multidimensional projections (Projection... Read more

3. How can topic identification facilitate practical applications such as social media analysis, information retrieval, and cyber-security through tailored approaches?

This research area concentrates on leveraging topic identification methods specifically designed or adapted for domains like social media analytics, text classification in cybercrime, and enterprise network security. It involves integrating topic detection with sentiment analysis, classification techniques, and domain-specific preprocessing to extract actionable insights from noisy, multilingual, or domain-specific textual data. These application-driven studies inform the development of targeted computational tools that enhance real-time monitoring, information filtering, or anomaly detection in complex operational environments.

An unsupervised multilingual approach for online social media topic identification

by David Cornforth

2023, Expert Systems with Applications

Key finding: Proposed an unsupervised approach combining term ranking, localized language analysis (including informal language like Singlish), multilingual sentiment analysis, and unsupervised clustering to extract relevant topics from... Read more

Cited in Nor Muhammad Farhan Nor Muhamad Nizam, et al. (2024). Text Classification on Cybercrime Cases From News Articles Using Supervised Learning. International Journal of Computing and Digital Systems

by José Octavio Islas Carmona

2024, International Journal of Computing and Digital Systems

Key finding: Applied supervised machine learning models combined with feature extractors (TF-IDF, Word2Vec) for automatic classification of cybercrime news articles into types. Found Random Forest with TF-IDF achieved highest accuracy... Read more

Topic modelling of authentication events in an enterprise computer network

by Nick Heard

2024, 2016 IEEE Conference on Intelligence and Security Informatics (ISI)

Key finding: Utilized Latent Dirichlet Allocation to model patterns of user authentication events from real enterprise network data at Los Alamos National Laboratory. Treated daily authentication logs as documents and destination... Read more

An Unsupervised Graph-Based Approach for Detecting Relevant Topics: A Case Study on the Italian Twitter Cohort during the Russia–Ukraine Conflict

by Antonello Rizzi

2023, Information

Key finding: Implemented an intelligent topic tracking and infoveillance system combining NLP and graph mining to analyze streams of Italian tweets related to the Russia-Ukraine conflict. This unsupervised graph-based method effectively... Read more

All papers in Topic Identification

Topic Identification based on Bayesian Belief Networks in the context of an Air Traffic Control Task

by Rubén San Segundo

2025

Abstract: In this article we present a topic identification task based on Bayesian networks. These networks are trained on the semantic concepts that have been labeled for each sentence to be processed and that have been... more

A study of context inference for Web-based information systems

by Sangjun Kim

2025, Electronic Commerce Research and Applications

Recently, context-awareness has been a hot topic in the ubiquitous computing field. Numerous methods for capturing, representing and inferring context have been developed and relevant projects have been performed. Existing research has... more

Topic Identification based on Bayesian Belief Networks in the context of an Air Traffic Control Task

by Juan M Montero

2025

Automatic topic identification for two-level call routing

by John Golden

2025, 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258)

This paper presents an approach to routing telephone calls automatically, based upon their speech content. Our data consist of a set of calls collected from a customer-service center with a two-level menu, which allows jumping past the... more

A study of context inference algorithm on the web-based information system

by Sangjun Kim

2025, Proceedings of pacific asia conference on …

Automatic knowledge extraction from manufacturing research publications

by Serge Tichkiewitch

2025, CIRP Annals

Knowledge mining is a young and rapidly growing discipline aiming at automatically identifying valuable knowledge in digital documents. This paper presents the results of a study of the application of document retrieval and text mining... more

Social network analysis of an online melanoma discussion group

by Kathleen Durant

2025, AMIA Joint Summits on Translational Science proceedings AMIA Summit on Translational Science

We have developed tools to explore social networks that share information in medical forums to better understand the unmet informational needs of patients and family members facing cancer treatments. We define metrics that demonstrate... more

Automatic knowledge extraction from manufacturing research publications

by Niek Du Preez

2024, CIRP Annals

Automatic Document Topic Identification using Wikipedia Hierarchical Ontology

by Dr.Mostafa M.A.Hassan

2024

The rapid growth in the number of documents available to various end users from around the world has led to a greatly increased need for machine understanding of their topics, as well as for automatic grouping of related documents. This... more

by José Octavio Islas Carmona

2024, International Journal of Computing and Digital Systems

Abstract: The number of cybercrime cases has increased in this country, especially after the pandemic. The nation has created numerous strategic plans, including the introduction of the Malaysia Cyber Security Strategy (MCSS), which... more

Continuing to analyze the word cloud, we can see that the frequency of words such as data, company, attack, hacker, and security indicates the common themes and topics discussed in Type 1 cybercrime news. The presence of words like China and Facebook suggests that there may be specific incidents or events involving these entities that are of interest in the cybercrime landscape. By examining these patterns and trends, we can better understand the current state of cyber threats and potentially identify areas for further research and investigation. This insight can be valuable in developing strategies to prevent and mitigate cyber-attacks in the future. In Type 2 cybercrime news, the presence of words like police, victim, and scam indicates their association with this type of cybercrime (see Fig. 2). The word cloud also shows that the words bank account, Malaysia, 'datuk', and woman are frequently mentioned in Type 2 cybercrime news. The word cloud raised questions and prompted a deeper analysis of the frequent mention of specific words in both Type 1 and Type 2 cybercrime news. Figure 2. Example of a figure caption. (figure caption)

as shown in Fig. 1. These words are commonly linked to Type 1 cybercrime news. Additionally, words such as China and Facebook indicate local Type 1 cybercrime news related to these entities. Figure 1. Example of a figure caption. (figure caption)

we hope to gain a better understanding of the common themes and issues present in both types of cybercrime news. By utilizing topic modeling and LDA, we aim to uncover the underlying topics that are prevalent in these news articles and shed light on the key factors driving cybercrime in today's digital landscape. This research will provide valuable insights into the motivations and tactics of cybercriminals, ultimately helping to inform strategies for the prevention and mitigation of cyber threats. For example, by analyzing a dataset of cybercrime news articles using topic modeling, researchers may discover prevalent themes such as phishing scams, ransomware attacks, and data breaches. By identifying these common topics, they can better understand the emerging trends and patterns in cybercrime activity. This information can then be used to develop targeted interventions and strategies to combat cyber threats effectively. Figure 3. Example of a figure caption. (figure caption)

Based on Figure 3, 200 topics have been tested for topic discovery in the study. For Type 1 cybercrime news, the highest coherence score of 23 is associated with the discovery of topics. However, the results also reveal the presence of many overlapping topics, which can affect the outcome. Therefore, 8 topics are determined to be the most suitable number for analysis: this setting has the highest coherence score with the least amount of topic overlap. In the case of Type 2 cybercrime news, 6 topics are considered the most appropriate for exploration, since this setting scores the highest and has the least overlap with other topics. For topic interpretation, it is essential to utilize the language model (LM), specifically a generative-pretrained transformer (GPT), known for its

Mining Text in News Channels: A Case Study from Facebook

by Khaled Shaalan

2024

Recently, the usage of social media websites has become an attractive phenomenon in our daily life. These sites allow their users to communicate with each other through various tools. This results in learning and sharing of valuable... more

Mining Student Information System Records to Predict Students’ Academic Performance

by Khaled Shaalan

2024, Advances in Intelligent Systems and Computing

Educational Data Mining (EDM) is an emerging field that is concerned with mining and exploring the useful patterns in educational data. The main objective of this study is to predict the students' academic performance based on a new... more

Analyzing the Arab Gulf Newspapers Using Text Mining Techniques

by Khaled Shaalan

2024, Proceedings of the International Conference on Advanced Intelligent Systems and Informatics 2017

Nowadays, the broadcasting of news via social media networks is almost provided in a textual format. The nature of the broadcasted text is considered as unstructured text. Text mining techniques play an essential role in converting the... more

Approaches and Trends of Automatic Bangla Text Summarization

by Md.Anowar Hossain

2024, International Journal of Technology Diffusion

As the number of internet users increases, online electronic content grows proportionally, irrespective of language. Many research works on English text summarization have come to light to deal with this gigantic body of online... more

Conflict Ontology Enrichment Based on Triggers, in "The 2nd International workshop on Ontologies and Information Systems for the Semantic Web", United States of America

by ENS Chahnez Zakaria

2024

In this paper, we propose an ontology-based approach that makes it possible to detect the emergence of relational conflicts between persons who cooperate on computer-supported projects. In order to detect these conflicts, we analyze, using this... more

Taming the Tiger Topic: An XCES Compliant Corpus Portal to Generate Subcorpora Based on Automatic Text-Topic Identification

by Kleber Infante

2024, ucrel.lancs.ac.uk

Taming the Tiger Topic: An XCES Compliant Corpus Portal to Generate Subcorpora Based on Automatic Text-Topic Identification. Marcelo Muniz (1), Fernando V. Paulovich (1), Rosane Minghim (1), Kleber Infante (1), Fernando Muniz (1), Renata Vieira (2)... more

Experiments with Arabic Topic Detection

by Abdelouafi Meziane

2024

The continuous growth of information on the Internet and the availability of a large mass of electronic documents in Arabic language make Natural Language processing (NLP) tasks play an important role to enhance and facilitate the access... more

An Online Semantic-enhanced Dirichlet Model for Short Text Stream Clustering

by Jay Kumar

2024, Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Clustering short text streams is a challenging task due to its unique properties: infinite length, sparse data representation and cluster evolution. Existing approaches often exploit short text streams in a batch way. However, determine... more

A New Model for Arabic Multi-Document Text Summarization

by Khulood Abu Maria

2024, International Journal of Innovative Computing, Information and Control

Nowadays, the amount of Arabic documents has increased significantly in different domains, such as news articles, emails, business summary, biomedicine, web sites and social media documents. Some databases have increased in its size to... more

Topic Identification based on Bayesian Belief Networks in the context of an Air Traffic Control Task

by Luis D'Haro

2024

Addressee detection for dialog systems using temporal and spectral dimensions of speaking style

by Elizabeth Shriberg

2023, Interspeech 2013

As dialog systems evolve to handle unconstrained input and for use in open environments, addressee detection (detecting speech to the system versus to other people) becomes an increasingly important challenge. We study a corpus in which... more

The CALO Meeting Assistant System

by Elizabeth Shriberg

2023, IEEE Transactions on Audio, Speech, and Language Processing

The CALO Meeting Assistant (MA) provides for distributed meeting capture, annotation, automatic transcription and semantic analysis of multiparty meetings, and is part of the larger CALO personal assistant system. This paper presents the... more

Tramas del linaje en "Muerte de Narciso" de José Lezama Lima

by Daniela E. Chazarreta

2023

Extraction Based Multi Document Summarization using Single Document Summary Cluster

by Hariharan Shanmugasundaram

2023, International Journal

Multi-document summarization has had a great impact on the research community ever since the growth of online information and its availability. Selecting the most important sentences from such a huge repository of data is quite tricky and... more

We have also carried out a study to obtain 100% accuracy by varying the compression rates for both documents in each cluster. We have represented the seven-point summary sheet of the documents using Minimum (Min), Maximum (Max), Median, Quartile 1 (Q1), Quartile 3 (Q3), Standard deviation (SD), and Mean in Table 5.

Fig. 3. Comparison of MEAD and our Summarizer at various compression rates

From the study, we found that summary generation at a specified compression ratio is proportional to the single document summary generated at the same compression. The results could be enhanced further using linguistic processing tools to achieve 100% accuracy for the system with minimal compression ratio.

Table 5: Seven-point summary sheet
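The seven statistics named for the summary sheet (Min, Max, Median, Q1, Q3, SD, Mean) can be computed with the Python standard library, as in the sketch below. The excerpt does not say which quartile convention or which standard deviation (sample or population) the authors used, so the "exclusive" quartile method and the sample standard deviation are assumptions here, and the input values are invented.

```python
import statistics


def seven_point_summary(values):
    """Return Min, Max, Median, Q1, Q3, SD and Mean for a list of numbers.

    Assumptions: quartiles use the default 'exclusive' method of
    statistics.quantiles, and SD is the sample standard deviation.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "Min": min(values),
        "Max": max(values),
        "Median": statistics.median(values),
        "Q1": q1,
        "Q3": q3,
        "SD": statistics.stdev(values),
        "Mean": statistics.mean(values),
    }


# Invented scores, for illustration only.
summary = seven_point_summary([10, 20, 30, 40, 50])
```

Switching to `statistics.pstdev` or to `method="inclusive"` would change SD, Q1 and Q3, so the convention should be stated alongside any reported table.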

Who is "you"?

by Stanley Peters

2023, Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics on - EACL '09

We explore the problem of resolving the second person English pronoun you in multi-party dialogue, using a combination of linguistic and visual features. First, we distinguish generic and referential uses, then we classify the referential... more

The CALO Meeting Assistant System

by Stanley Peters

2023, IEEE Transactions on Audio, Speech, and Language Processing

Fig. 3. Snapshot from the CALO-MA offline meeting browser.

Leveraging Minimal User Input to Improve Targeted Extraction of Action Items

by Stanley Peters

2023, staff.science.uva.nl

In face-to-face meetings, assigning and agreeing to carry out future actions is a frequent subject of conversation. Work thus far on identifying these action item discussions has focused on extracting them from entire transcripts of... more

Browsing Meetings: Automatic Understanding, Presentation and Feedback for Multi-Party Conversations

by Stanley Peters

2023

We present a system for extracting useful information from multi-party meetings and presenting the results to users via a browser. Users can view automatically extracted discussion topics and action items, initially seeing high-level... more

Generating time-constrained audio presentations of structured information

by Rohit Kumar

2023, Interspeech 2006

Presenting complex information in an understandable manner using speech is a challenging task to do well. Significant limitations, both in the generation process and from the human listeners' capabilities, typically make for poorly... more

Natural Language Queries on Natural Language Data: a Database of Meeting Dialogues

by Martin Rajman

2023, Applications of Natural Language to Data Bases

This paper describes an integrated system that enables the storage and retrieval of meeting transcripts (e.g. staff meetings). The system gives users who have not attended a meeting, or who want to review a particular point, enhanced... more

Comparing TR-Classifier and KNN by using Reduced Sizes of Vocabularies

by Mourad Abbas

2023, HAL (Le Centre pour la Communication Scientifique Directe)

The aim of this study is topic identification by using two methods, in this case, a new one that we have proposed: TR-classifier which is based on computing triggers, and the well-known k Nearest Neighbors. Performances are acceptable,... more

Evaluation of Topic Identification Methods on Arabic Corpora

by Mourad Abbas

2023, HAL (Le Centre pour la Communication Scientifique Directe)

Topic Identification is one of the important keys for the success of many applications. Indeed, there are few works in this field concerning Arabic language because of lack of standard corpora. In this study, we will provide directly... more

A topic identification task for modern standard Arabic

by Mourad Abbas

2023, Annual Conference on Computers

In this paper we present two well-known categorization methods and their use in topic identification for Modern Standard Arabic. The first one is the TFIDF approach, and the second is a Support Vector Machines (SVM) based classifier. In... more

TR-Classifier and kNN Evaluation for Topic Identification tasks

by Mourad Abbas

2023, HAL (Le Centre pour la Communication Scientifique Directe)

This paper focuses on studying topic identification for Arabic language by using two methods. The first method is the well-known kNN (k Nearest Neighbors) which is used as baseline. The second one is the TR-Classifier, mainly based on... more

Adobe-MIT submission to the DSTC 4 Spoken Language Understanding pilot task

by Trung H. Bui

2023, arXiv (Cornell University)

The Dialog State Tracking Challenge 4 (DSTC 4) proposes several pilot tasks. In this paper, we focus on the spoken language understanding pilot task, which consists of tagging a given utterance with speech acts and semantic slots. We compare different classifiers: the best system obtains 0.52 and 0.67 F1-scores on the test set for speech act recognition for the tourist and the guide respectively, and 0.52 F1-score for semantic tagging for both the guide and the tourist.

1 Speech act recognition

Recognizing the speech acts of the current utterance is one of the two goals of the spoken language understanding pilot task. In the training and development sets, each utterance is annotated with one speech act. One speech act is composed of zero, one or two speech act categories. Each speech act category has in turn zero, one or two speech act attributes. There are 4 speech act categories, and 22 speech act attributes. [6] and [7] give further details on the task. The main approaches for this task are presented in [15, 1, 17, 5, 16, 19, 10, 3].

We submitted 5 systems. Systems 3 and 5 were the best performing ones. System 3 is based on a support vector machine (SVM) classifier to recognize the speech acts: the features are the 5000 most common unigrams, bigrams, trigrams, as well as a binary feature indicating whether the current speaker is different from the speaker in the last utterance. To account for the history, each feature is computed for both the current and the previous utterance. Two SVM classifiers were trained: one for each speaker. The kernel function as well as the penalty parameter of the error term were both optimized with 5-fold cross-validation. System 5 is similar, but with logistic regression as the classifier; moreover, it uses one single speaker-independent model instead of one model per speaker, as it slightly improves the results on the development set.

Systems 3 and 5 assume that each utterance contains exactly one speech act category and one speech act attribute: they are therefore multiclass, monolabel classifiers, with 88 possible classes (4 speech act categories × 22 speech act attributes).
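The feature construction described for Systems 3 and 5 can be sketched roughly as follows. This is an illustrative reading, not the authors' code: binary n-gram indicators, a single shared vocabulary across n-gram orders, whitespace tokenization, and all function names are assumptions, and the two-utterance dialog is invented. The classifier itself (SVM or logistic regression) is left out.

```python
from collections import Counter


def ngrams(tokens, max_n=3):
    """All unigrams, bigrams and trigrams of a token list, as strings."""
    return [
        " ".join(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    ]


def build_vocabulary(dialog, size=5000):
    """Most common n-grams over all utterances (assumed: one shared list)."""
    counts = Counter()
    for _, text in dialog:
        counts.update(ngrams(text.lower().split()))
    return [gram for gram, _ in counts.most_common(size)]


def featurize(dialog, vocabulary):
    """One row per utterance: features of the current utterance followed by
    the same features of the previous one (all zeros for the first)."""
    index = {gram: i for i, gram in enumerate(vocabulary)}
    width = len(vocabulary) + 1  # n-gram indicators + speaker-change flag
    rows, previous, last_speaker = [], [0] * width, None
    for speaker, text in dialog:
        current = [0] * width
        for gram in ngrams(text.lower().split()):
            if gram in index:
                current[index[gram]] = 1
        # Binary feature: did the speaker change since the last utterance?
        current[-1] = int(last_speaker is not None and speaker != last_speaker)
        rows.append(current + previous)
        previous, last_speaker = current, speaker
    return rows


# Invented two-turn dialog for illustration.
toy_dialog = [("guide", "hello there"), ("tourist", "hello")]
toy_vocab = build_vocabulary(toy_dialog)
toy_rows = featurize(toy_dialog, toy_vocab)
num_classes = 4 * 22  # categories x attributes, as stated in the text
```

Each row could then be passed to any multiclass classifier with 88 output labels, one per category and attribute pair.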

The CALO Meeting Assistant

by Patrick Ehlen

2023

The CALO Meeting Assistant is an integrated, multimodal meeting assistant technology that captures speech, gestures, and multimodal data from multiparty interactions during meetings, and uses machine learning and robust discourse... more

GECKO - A Tool for Effective Annotation of Human Conversations

by Eduard Golshtein

2023, Conference of the International Speech Communication Association

With the dramatic improvement in automated speech recognition (ASR) accuracy, a variety of machine learning (ML) and natural language processing (NLP) algorithms are designed for human conversation data. Supervised machine learning and... more

An efficient single document Arabic text summarization using a combination of statistical and semantic features

by Wasel Ghanem

2023, Journal of King Saud University - Computer and Information Sciences

The exponential growth of online textual data triggered the crucial need for an effective and powerful tool that automatically provides the desired content in a summarized form while preserving core information. In this paper, we propose... more

Topic Segmentation

by Narjès Boufaden

2023

We study the problem of topic segmentation of manually transcribed speech in order to facilitate information extraction from dialogs. Our approach is based on a combination of multi-source knowledge modeled by hidden Markov models. We... more

Classic Term Weighting Technique for Mining Web Content Outliers

by Adamu Mustapha

2023

Abstract: Outlier analysis has become a popular topic in the field of data mining, but there has been less work on how to detect outliers in web content. Mining Web Content Outliers is used to detect irrelevant web content within a web... more

Ovidio y sus imágenes: estéticas de la modernidad

by FRANCISCO GARCIA JURADO

2023

A Novel Lecture Browser Using Key Phrases and Stream Graphs

by Elmar Noeth

2023

We present a novel lecture browser that utilizes ranked key phrases displayed on a stream graph to overcome the shortcomings of traditional extractive (query-based) summaries. The system extracts key phrases from the ASR transcripts,... more

Automatic human utility evaluation of ASR systems: does WER really predict performance?

by Gerald Penn

2023, Interspeech 2013

We propose an alternative evaluation metric to Word Error Rate (WER) for the decision audit task of meeting recordings, which exemplifies how to evaluate speech recognition within a legitimate application context. Using machine learning... more

Running a human-subject experiment such as this one is time-consuming and expensive. Our objective is therefore to find an automated means of anticipating the results of running a new human-subject experiment, given a new ASR system.

Figure 2: Precision/recall curve for each of the leave-one-out models. The upper family of curves was trained with all features whereas only WER was used in the lower curves.

Figure 3: Feature ablation experiment: F-score when each subset of features is removed from training
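For reference, the baseline metric that this paper argues against, Word Error Rate, is conventionally defined as the word-level edit distance between the reference and the recognizer output, divided by the number of reference words. The sketch below implements that standard definition. It is not the alternative metric proposed in the paper, and the function name and whitespace tokenization are assumptions.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference length,
    computed with a word-level Levenshtein distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # previous[j] = edit distance between the ref prefix so far and hyp[:j]
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            current.append(min(
                previous[j] + 1,                          # deletion
                current[j - 1] + 1,                       # insertion
                previous[j - 1] + (ref_word != hyp_word)  # substitution or match
            ))
        previous = current
    return previous[-1] / len(ref)
```

Note that WER can exceed 1.0 when the hypothesis contains many insertions, one reason it is sometimes considered a poor proxy for downstream usefulness.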

Domain-Specific Utterance End-Point Detection for Speech Recognition

by Gautam Tiwari

2023, Interspeech 2017

The task of automatically detecting the end of a device-directed user request is particularly challenging when switching between short commands and long free-form utterances. While low-latency end-pointing configurations typically lead to... more

Índice de Documentos con una Jerarquía de Conceptos

by Adolfo Guzmán Arenas

2023, Computación y Sistemas

Given a large hierarchical concept dictionary (thesaurus, or ontology), the task of selection of the concepts that describe the contents of a given document is considered. A statistical method of document indexing driven by such a... more

Meetings about meetings: research at ICSI on speech in multiparty conversations

by Sonali Bhagat

2023, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03).

In early 2001 we reported (at the Human Language Technology meeting) the early stages of an ICSI project on processing speech from meetings (in collaboration with other sites, principally SRI, Columbia, and UW). In this paper we report... more

TopCat: data mining for topic identification in a text corpus

by Sakthi Vel

2023, IEEE Transactions on Knowledge and Data Engineering

TopCat (Topic Categories) is a technique for identifying topics that recur in articles in a text corpus. Natural language processing techniques are used to identify key entities in individual articles, allowing us to represent an article... more

Optimal Keyword Search for Audio Libraries

by Abubakar Siddik

2023

Keywords are used to index data, generate tag clouds, or for searching. Alchemy API's keyword extraction API is capable of finding keywords in text and ranking them. This paper addresses the problem of getting the related keywords from... more
