Text Clustering

397 papers

131 followers

About this topic

Text clustering is a natural language processing technique that involves grouping a set of documents or text data into clusters based on their similarity, typically using algorithms that analyze the content and structure of the text. This method aids in organizing, summarizing, and retrieving information from large datasets.

Key research themes

1. How do similarity and distance measures impact the effectiveness of partitional text clustering?

This research area investigates the selection and comparative effectiveness of similarity and distance functions when applied within partitional clustering algorithms like K-means for text documents. Since the quality of clustering heavily depends on accurately capturing the closeness between documents, the choice of measure such as cosine similarity, Euclidean distance, Jaccard coefficient, and Kullback-Leibler divergence is critical. These measures differ in their mathematical properties and suitability for high-dimensional, sparse textual data, and empirical evaluations on diverse datasets help identify best practices for clustering performance.
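As a rough illustration of how these measures differ, the sketch below computes four of them on raw term-frequency vectors. It is a minimal example and not code from any of the listed papers: the whitespace tokenizer, the function names, and the Jensen-Shannon-style symmetrization used for the Kullback-Leibler variant are all assumptions made for illustration.

```python
import math
from collections import Counter


def tf(text):
    """Term-frequency vector from a naive whitespace tokenizer (an assumption)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    # Dot product over shared terms, divided by the product of vector norms.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def euclidean_distance(a, b):
    # Counter returns 0 for missing terms, so the union of terms is enough.
    return math.sqrt(sum((a[t] - b[t]) ** 2 for t in a.keys() | b.keys()))


def jaccard_coefficient(a, b):
    # Set-based variant: shared vocabulary over combined vocabulary.
    union = a.keys() | b.keys()
    return len(a.keys() & b.keys()) / len(union) if union else 0.0


def symmetrized_kl(a, b):
    """Jensen-Shannon-style average of KL(p||m) and KL(q||m), m = (p + q) / 2.

    One common way to symmetrize KL divergence; the listed papers may use a
    different weighting. Both documents are assumed to be non-empty.
    """
    total_a, total_b = sum(a.values()), sum(b.values())
    divergence = 0.0
    for t in a.keys() | b.keys():
        p, q = a[t] / total_a, b[t] / total_b
        m = 0.5 * (p + q)
        if p > 0:
            divergence += 0.5 * p * math.log(p / m)
        if q > 0:
            divergence += 0.5 * q * math.log(q / m)
    return divergence


d1 = tf("the cat sat on the mat")
d2 = tf("the cat sat on the mat")
d3 = tf("stock markets fell sharply")
```

Identical documents score 1 on the two similarity measures and 0 on the two distances, while documents with no shared terms score 0 on cosine and Jaccard. Euclidean distance also grows with document length, which is one reason it tends to be a poor fit for sparse text vectors unless they are normalized first.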

by Adaikalam S

2016

Key finding: This paper empirically compared five commonly used distance/similarity measures (Euclidean distance, cosine similarity, Jaccard coefficient, Pearson correlation coefficient, and averaged Kullback-Leibler divergence) within...

A Review of Data and Document Clustering pertaining to various Distance Measures

by Hannah Grace

2025, Salud, Ciencia y Tecnología

Key finding: The paper surveyed the impact of diverse distance measures on clustering quality across different data types, emphasizing text data characterized by high dimensionality and sparsity. It reinforced that cosine similarity often...

ACONS: A New Algorithm for Clustering Documents

by José Medina-Pagola

2022, Lecture Notes in Computer Science

Key finding: Introducing the Condensed Star (ACONS) algorithm that operates on a thresholded similarity graph representation of documents, this work showed improved clustering quality and reduced cluster count compared to Star and...

2. What challenges do short texts pose to clustering algorithms and what methods improve short text clustering effectiveness?

Short text clustering (STC) deals with clustering highly sparse, context-poor, and noisy textual data such as tweets, search queries, and social media posts. Due to limited length, traditional clustering approaches often underperform on short texts. This research theme centers on addressing the representation, similarity measure, dimensionality reduction, and algorithmic adaptations necessary to overcome data sparsity and high dimensionality while preserving semantic coherence in STC. Advances in approaches specifically tailored to short text characteristics are critical for enhancing applications in social media analysis, sentiment detection, and real-time information extraction.

Short Text Clustering Algorithms, Application and Challenges: A Survey

by Nor Samsiah Sani

2023, Applied Sciences

Key finding: This comprehensive survey identified the intrinsic challenges in short text clustering, such as data sparsity, meaningful feature representation, and noisy, informal language. It emphasized that conventional methods like...

Lifting the Curse: Exploring Dimensionality Reduction on Text Clustering Applications

by Leonidas Akritidis and

2022, 2022 13th International Conference on Information, Intelligence, Systems & Applications (IISA)

Key finding: Through empirical evaluation of eight clustering algorithms across six high-dimensional text datasets, the study demonstrated that dimensionality reduction techniques, such as Singular Value Decomposition (SVD) and Principal...

The process of summarization in the pre-processing stage in order to improve measurement of texts when clustering

by Marcus Vinicius Carvalho Guelpeli

2023

Key finding: The Cassiopeia model introduced pre-processing via summarization to reduce dimensionality and sparse data problems prior to clustering. Experiments comparing clustering on full-text versus summarized texts showed that...

3. How can semantic-enriched document representations and modern embedding methods enhance text clustering quality?

This theme explores advances in document representation that go beyond simple bag-of-words and frequency-based vectors to incorporate semantic relations, word embeddings, lexical databases, and representations derived from large language models (LLMs). By capturing synonymy, polysemy, and contextual meanings, semantic-enriched methods aim to improve the discrimination and coherence of clusters. The impact of such advanced embeddings and domain-specific knowledge sources on clustering algorithms like K-means, spectral clustering, and fuzzy clustering is a key focus. The theme also includes leveraging LLM embeddings and hybrid semantic techniques to improve clustering purity and topic separability.
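To make the embedding-plus-clustering idea concrete, here is a minimal sketch of K-means with cosine similarity (often called spherical K-means) over dense document vectors. The three-dimensional vectors are hypothetical stand-ins for real LLM or word-embedding outputs, the function names are illustrative, and the choice of spherical K-means with fixed initial centroids is an assumption for the example rather than the method of any listed paper.

```python
import math


def normalize(vector):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm else list(vector)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def spherical_kmeans(vectors, initial_centroids, iterations=10):
    """Assign each vector to the most cosine-similar centroid, then update.

    Initial centroids are passed in explicitly to keep the example
    deterministic; real pipelines usually pick them at random or with a
    seeding heuristic.
    """
    points = [normalize(v) for v in vectors]
    centroids = [normalize(c) for c in initial_centroids]
    labels = []
    for _ in range(iterations):
        # Assignment step: nearest centroid by cosine similarity.
        labels = [
            max(range(len(centroids)), key=lambda k: dot(p, centroids[k]))
            for p in points
        ]
        # Update step: re-normalized mean of each cluster's members.
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                mean = [sum(column) / len(members) for column in zip(*members)]
                centroids[k] = normalize(mean)
    return labels, centroids


# Hypothetical document embeddings: two about one topic, two about another.
embeddings = [
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.1],
    [0.1, 0.9, 0.2],
    [0.0, 0.8, 0.3],
]
labels, centroids = spherical_kmeans(embeddings, [embeddings[0], embeddings[2]])
```

With these toy vectors the first two documents end up in one cluster and the last two in the other. Swapping in embeddings from a real model changes only the input vectors, not the clustering loop.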

Fuzzy WordNet-Based Document Representation and Clustering using Regularized K-Means

by Aravind Dupati and

2023, Fuzzy WordNet-Based Document Representation and Clustering using Regularized K-Means

Key finding: By integrating fuzzy membership values derived from semantic relationships captured by WordNet into TF-IDF weighted document vectors, and applying a regularized K-means clustering with adaptive group LASSO penalties, this...

Clustering Document based on Semantic Similarity Using Graph Base Spectral Algorithm

by rowaida ibrahim and

2023, IEEE Xplore

Key finding: This work utilizes semantic similarity derived from document summaries and lexical preprocessing from NLTK to construct TF-IDF matrices, subsequently clustered via graph-based spectral methods. Applying this approach to movie...

Text Clustering with Large Language Model Embeddings

by Nuno Fachada

2024, International Journal of Cognitive Computing in Engineering

Key finding: This study evaluates embeddings from large language models (LLMs) such as GPT-3.5 Turbo and BERT in combination with traditional clustering algorithms, demonstrating that LLM embeddings capture language subtleties and...

Clustering Data Text Based on Semantic

by Parisa Zandieh

2023

Key finding: This paper proposed a novel hierarchical clustering algorithm incorporating semantic similarity between words calculated via WordNet ontology combined with TF-IDF vectors. By representing documents semantically, the algorithm...

All papers in Text Clustering

Machines in the conversation: Detecting themes and trends in informal communication streams

by James Newswanger

2025, IBM Systems Journal

Data-mining techniques that detect trends and patterns in structured data are often ill-suited for analysis of unstructured text. Information critical to business (and generated by groups such as employees, customers, and the public) appears...

Novel semantic tagging detection algorithms based non-negative matrix factorization

by Mohamed Haggag

2025, SN Applied Sciences

Tagging addresses the challenge of searching for relevant text documents given a set of tags. In addition, tag-based approaches have received wide attention as a possible solution for big content. Probabilistic topic model methods,...

Automatic Non-Horizontal Scene Text Recognition from a Document Image

by Prof. Shivanand Sharnappa Gornale

2025, International Journal of Machine Intelligence

The detection and extraction of scene text from document images is a challenging research area. Many researchers have detected and extracted text from plain text backgrounds, but multi-oriented scene text detection is one...

Using a knowledge base to disambiguate personal name in web search results

by Quang Minh Vũ

2025, Proceedings of the 2007 ACM symposium on Applied computing

Results of queries by personal names often contain documents related to several people because of the namesake problem. In order to differentiate documents related to different people, an effective method is needed to measure document...

Overview of RepLab 2012: Evaluating Online Reputation Management Systems

by Enrique Amigó

2025, CLEF (Online Working Notes/Labs/Workshop)

This paper summarizes the goals, organization and results of the first RepLab competitive evaluation campaign for Online Reputation Management Systems (RepLab 2012). RepLab focused on the reputation of companies, and asked participant...

WePS-3 Evaluation Campaign: Overview of the Web People Search Clustering and Attribute Extraction Tasks

by Enrique Amigó

2025

The third WePS (Web People Search) Evaluation campaign took place in 2009-2010 and attracted the participation of 13 research groups from Europe, Asia and North America. Given the top web search results for a person name, two tasks were...

Text classification toward a scientific forum

by Xijin Tang

2025, Journal of Systems Science and Systems Engineering

Text mining, also known as knowledge discovery from text, has emerged as a possible solution to the current information explosion and refers to the process of extracting non-trivial and useful patterns from unstructured text...

Study of Text Based Mining

by Ranjna Garg and

2025

Text-based mining is the process of analyzing a document or set of documents to understand the content and meaning of the information they contain. Text mining enhances humans' ability to process massive quantities of information and it...

Boyut Azaltmanın Bulanık C-Ortalama Kümeleme Teknikleri Üzerindeki Etkisi

by CEMALETTİN KUBAT

2025, Veri Bilimi

Fuzzy c-means clustering is one of the common clustering algorithms used across different fields in the literature. Dimensionality reduction, which transforms large datasets into equivalent lower-dimensional datasets with minimal loss of information, is a...

Knowledge-based vector space model for text clustering

by Joshua Zhexue Huang

2025, Knowledge and Information Systems

This paper presents a new knowledge-based vector space model (VSM) for text clustering. In the new model, semantic relationships between terms (e.g., words or concepts) are included in representing text documents as a set of vectors. The...

İnsansız Araçlarla Düzlemsel Olmayan Araçların Taranması

by fatih semiz

2025, arXiv (Cornell University)

Today, with the growing need for area coverage and the increasing use of unmanned vehicles, area coverage with unmanned vehicles (that is, traversing all or part of an area with unmanned vehicles at the least effort) is rapidly gaining importance...

Patch Relational Neural Gas – Clustering of Huge Dissimilarity Datasets

by Alexander Hasenfuß

2025, Springer eBooks

Clustering constitutes a ubiquitous problem when dealing with huge data sets for data compression, visualization, or preprocessing. Prototype-based neural methods such as neural gas or the self-organizing map offer an intuitive and fast...

Patch Relational Neural Gas – Clustering of Huge Dissimilarity Datasets

by Alexander Hasenfuß

2025, Artificial Neural Networks in Pattern Recognition

Particle swarm optimizer for variable weighting in clustering high-dimensional data

by Shengrui Wang

2025, Machine Learning

In this paper, we present a particle swarm optimizer (PSO) to solve the variable weighting problem in projected clustering of high-dimensional data. Many subspace clustering algorithms fail to yield good cluster quality because they do...

News Representation with Multi-Word Features

by Christoph Schommer

2025

Information is commonly reflected in news articles. However, texts are unstructured and thus demanding to analyze automatically. To identify and capture the facts in a news story we propose a novel approach, which utilizes natural...

Aggregating skip bigrams into key phrase-based vector space model for web person disambiguation

by Qin Lu

2025

Web Person Disambiguation (WPD) is often done through clustering of web documents to identify the different namesakes for a given name. This paper presents a clustering algorithm using key phrases as the basic feature. However, key...

Candidate Cluster Extraction for Hierarchical Document Clustering

by Mohammad Atique

2025

Text documents are increasing tremendously on the internet, and hierarchical document clustering has proven useful for grouping similar documents in large applications. Still, most documents suffer from problems of high dimensionality,...

Uma abordagem baseada na web para resolução de entidades e criação de aquivos de autoridade

by denilson pereira

2025

Latent Ontological Feature Discovery for Text Clustering

by Van Khanh Duong

2025, 2009 IEEE-RIVF International Conference on Computing and Communication Technologies

The content of a text is mainly defined by keywords and named entities occurring in it. In particular for news articles, named entities are usually important to define their semantics. However, named entities have ontological features,...

Coloured semantic networks for content analysis

by Jean Pierre Malrieu

2025, Quality & Quantity

This paper adapts a widespread formalism of Knowledge Representation known in the AI literature as J. Sowa's Conceptual Graphs to the purposes of Content Analysis. It is proposed that instead of nested contexts, negation and modalities...

A Fuzzy-Ontology Based Information Retrieval System for Relevant Feedback

by Ifiok Udo

2025

Obtaining correct and relevant information at the right time in response to a user's query is quite a difficult task. This becomes even more complex if the query terms have many meanings and occur in a variety of domains. This paper presents a...

A Review on Novel approach for Text Compression

by Pallavi Surwade

2025

Generally, textual data sets are represented using different models. However, some models do not capture the text structure, while others preserve it. The vector space model is also known as the 'bag-of-words model'. To...

Novel Approach for Text Compression

by Pallavi Surwade

2025, International Journal of Advance Research and Innovative Ideas in Education

Generally, textual data sets are represented using different models. However, these models sometimes do not capture the text arrangement as it is. The vector space model is also known as the bag-of-words model. To represent textual document using...

Detection of Real Target Number from Radar Target Echo Signal Environment based on Fuzzy C Mean Clustering

by Derya AVCI

2024, DergiPark (Istanbul University)

Today, determining the real number of targets in complex radar target echo signal environments is among the important topics in defense systems and communications. In particular, where more than one target signal is present, this...

Transformer Mimarisi: Dil Modeli Katmanlarının Matematiksel Temelleri

by Çağrı Bilgehan

2024, F. Cagri Bilgehan

This study focuses on the mathematical foundations of the input layers in the Transformer architecture. Input layers play a critical role in transforming raw data into a meaningful format that can be processed by the model. In this context, the processes of tokenization, embedding, and positional encoding are analyzed in detail. During tokenization, raw data is divided into meaningful units based on specific rules, and these units are then converted into numerical representations to form a suitable input sequence for the model. In the embedding stage, these numerical representations are expressed as high-dimensional vectors, providing meaningful vector representations for each token. The positional encoding process incorporates additional information to preserve token order using sine- and cosine-based functions. This study elaborates on the mathematical models of all these processes and highlights the contribution of input layers to the overall performance of the Transformer. Understanding the mathematical foundations of input layers offers a deeper insight into the contextual modeling capacity of the Transformer architecture.
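The sine- and cosine-based positional encoding mentioned in this abstract can be sketched as follows. The abstract does not give the exact formula, so this example assumes the widely used form from the original Transformer paper, where even dimensions use sine, odd dimensions use cosine, and wavelengths grow geometrically with a base of 10000. The function name and parameters are illustrative.

```python
import math


def positional_encoding(seq_len, d_model, base=10000.0):
    """Return a seq_len x d_model table of sinusoidal position encodings.

    Assumed formula (not stated in the abstract):
        PE[pos][2i]     = sin(pos / base^(2i / d_model))
        PE[pos][2i + 1] = cos(pos / base^(2i / d_model))
    """
    table = []
    for pos in range(seq_len):
        row = []
        for dim in range(d_model):
            # Each sine/cosine pair shares one frequency, indexed by dim // 2.
            angle = pos / (base ** ((2 * (dim // 2)) / d_model))
            row.append(math.sin(angle) if dim % 2 == 0 else math.cos(angle))
        table.append(row)
    return table


encoding = positional_encoding(seq_len=4, d_model=8)
```

Each row would be added to the token embedding at the same position, so that two identical tokens at different positions reach the model with different input vectors.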

AntTree: A Web Document Clustering Using Artificial Ants

by Christiane Guinot

2024

We present in this work a new algorithm for hierarchical document clustering and automatic generation of portal sites. This model is inspired by the self-assembling behavior observed in real ants where ants progressively get attached...

Cross-domain Text Classification using Wikipedia

by Carlotta Domeniconi

2024, IEEE Intelligent Informatics Bulletin

Traditional approaches to document classification require labeled data in order to construct reliable and accurate classifiers. Unfortunately, labeled data are seldom available, and often too expensive to obtain, especially for...

The university of amsterdam at weps3

by Manos Tsagkias

2024, CLEF 2010 (Notebook Papers/LABs/Workshops)

In this paper we describe our participation in the Third Web People Search (WePS3) evaluation campaign. We took part in the Online Reputation Management (ORM) task. Ambiguity of organization names (e.g., “Amazon” or “Apple”) raises...

The university of amsterdam at weps2

by Manos Tsagkias

2024, 2nd Web People Search Evaluation Workshop (WePS 2009), 18th WWW Conference

In this paper we describe our participation in the Second Web People Search workshop (WePS2) and detail our approaches. For the clustering task, our focus was on replicating the lessons learned at WEPS1 on the data set made available as...

Text Clustering with Large Language Model Embeddings

by Nuno Fachada

2024, International Journal of Cognitive Computing in Engineering

Text clustering is an important method for organising the increasing volume of digital content, aiding in the structuring and discovery of hidden patterns in uncategorised data. The effectiveness of text clustering largely depends on the...

Unsupervised text mining methods for literature analysis: a case study for Thomas Pynchon's V

by Iraklis Tsatsoulis

2024, Orbit: Writing Around Pynchon

We describe latent Dirichlet allocation (LDA), a generative probabilistic model for collections of discrete data such as text corpora. LDA is a three-level hierarchical Bayesian model, in which each item of a collection is modeled as a...

Rightnow eservice center: internet customer service using a self-learning knowledge base

by Doug Warner

2024, National Conference on Artificial Intelligence

Delivering effective customer service via the Internet requires attention to many aspects of knowledge management if it is to be convenient and satisfying for customers, while at the same time efficient and economical for the company or...

Çoklu Gösterim Veritabanları Ve Navigasyon Haritası Tasarımı

by Necla Ulugtekin

2024, İTÜDERGİSİ/d

Automated Text Summarization Base on Lexicales Chain and graph Using of WordNet and Wikipedia Knowledge Base

by Mohsen Pourvali

2024

The technology of automatic document summarization is maturing and may provide a solution to the information overload problem. Nowadays, document summarization plays an important role in information retrieval. With a large volume of...

The algorithm presented in this paper is based on lexical chains, so the system needs to analyze the text deeply. Each word has a sense determined by its position in the sentence. For instance, the word "bank" has different senses in the following sentences: "Beautiful bank of river" and "Bank failures were a major disaster". In the first sentence "bank" means the river's shore, but in the second it means a financial institution. The most appropriate sense must be chosen for the word, which increases the connectedness of a lexical chain. In the algorithm presented in this paper, word senses are computed locally, and in this way the best word sense is extracted. We also use WordNet as an external source for disambiguation.

Fig. 2. Sample graph built on the two words.

Let $w_i$ be a word in the document with $n$ senses $\{w_{i1}, w_{i2}, \ldots, w_{in}\}$. To find the meanings of two words that are locally related and placed in the same sentence, we take all possible senses of each word as the first level of that word's traversal tree and then process every sense recursively. Next, we attach all the relations of each sense as its descendants; these descendants are generated through relations such as Hypernym, ... We repeat this process recursively for n levels. Every first-level sense of one word is then compared with all the first-level senses of the other word, and the number of matches is recorded as an integer. The same comparison is done for the other word. If there is no match, we choose for each word its first sense, which is the most common. The figure above illustrates the relations of the tree. The root of the tree is the target word, and the first-level nodes are the senses of the target word. The nodes at the second, third, ... levels are senses related to the first-level nodes through Hypernym, ... relations. The tree is generated with recursive functions and is also traversed recursively.
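The sense-selection procedure described above can be sketched roughly as follows. This is only one reading of the description: the tiny sense inventory and relation graph are hypothetical stand-ins for WordNet, the sense labels such as `bank#river` are invented for the example, and scoring a sense by the number of nodes its expanded tree shares with the other word's trees is an assumption about how the "number of equalities" is counted.

```python
# Hypothetical sense inventory: senses are listed most common first.
SENSES = {
    "bank": ["bank#finance", "bank#river"],
    "river": ["river#stream"],
    "failure": ["failure#business"],
    "mat": ["mat#floor"],
}

# Hypothetical hypernym-like relations standing in for WordNet links.
RELATIONS = {
    "bank#finance": ["financial_institution"],
    "financial_institution": ["economy"],
    "bank#river": ["shore"],
    "shore": ["waterside"],
    "river#stream": ["stream"],
    "stream": ["waterside"],
    "failure#business": ["insolvency"],
    "insolvency": ["economy"],
}


def expand(sense, depth):
    """Recursively collect a sense and its related nodes down to `depth` levels."""
    nodes = {sense}
    if depth > 0:
        for related in RELATIONS.get(sense, []):
            nodes |= expand(related, depth - 1)
    return nodes


def disambiguate(word, context_word, depth=2):
    """Pick the sense of `word` whose tree overlaps most with `context_word`.

    Falls back to the first (most common) sense when nothing overlaps.
    """
    best_sense, best_overlap = SENSES[word][0], 0
    for sense in SENSES[word]:
        tree = expand(sense, depth)
        overlap = max(
            len(tree & expand(other, depth)) for other in SENSES[context_word]
        )
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


river_sense = disambiguate("bank", "river")
finance_sense = disambiguate("bank", "failure")
fallback_sense = disambiguate("bank", "mat")
```

In this toy graph, "bank" next to "river" resolves to the riverside sense, "bank" next to "failure" resolves to the financial sense, and an unrelated context word leaves the most common sense in place.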

Table 2: Average values of evaluation metrics for summarization methods (DUC02 dataset).

Validation of Text Clustering Based on Document Contents

by A. Visa

2024, Lecture Notes in Computer Science

In this paper some results of a new text clustering methodology are presented. A prototype is an interesting document or a part of an extracted, interesting text. The given prototype is matched with the existing document database or the...

Combining data and text mining techniques for analysing financial reports

by A. Visa

2024, Intelligent Systems in Accounting, Finance & Management

There is a vast amount of financial information on companies' financial performance available to investors today. While automatic analysis of financial figures is common, it has been difficult to automatically extract meaning from the...

Figure 1. The Feature Planes for the Ratios Operating Margin, Return on Total Assets, and Equity to Capital. The map was created using SOM_PAK, a SOM training software package developed at the Helsinki University of Technology (Kohonen et al., 1996). The U-matrix map is visualized using the software Nenet v1.1a. The trained map from Karlsson et al. (2001a) was also used in the experiment, and the relevant data were mapped on to this existing map. The training parameters for the map are illustrated in Table 1.

Figure 2. The Identified Clusters and the Quarterly Movements of Ericsson, Motorola, and Nokia

Table 1. The Used Training Parameters Defining the Clusters

Table 3. The Closest Matches to Every Report in the Collection (Sentence Level)

Geographic Spatiotemporal Dynamic Model using Cellular Automata and Data Mining Techniques

by Aniati M. Arymurthy

2024, ijcsi.org

IJCSI International Journal of Computer Science Issues, Vol. 8, Issue 3, No. 2, May 2011 ISSN (Online): 1694-0814 www.IJCSI.org ... Ahmad Zuhdi1, Aniati Murni Arymurthy2 and Heru Suhartanto3 ... 1 Informatics Engineering Dept., Trisakti...

Pragmatic text mining

by Jaap Suermondt

2024, Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining

We discuss our experiences in analyzing customer-support issues from the unstructured free-text fields of technical-support call logs. The identification of frequent issues and their accurate quantification is essential in order to track...

Galaksimizdeki 20 Açık Yıldız Kümesinin CCD UBVRI Fotometrisi

by Yüksel Karataş

2024

In this study, using CCD UBVRI photometry data of 20 open star clusters observed with the 84 cm telescope at the San Pedro Martir (SPM) National Astronomical Observatory in Mexico, the observational parameters, namely color excesses, metal and heavy-element...

Interactive Discovery System for Direct Democracy

by Andreas Kaltenbrunner

2024

Decide Madrid is the civic technology of Madrid City Council which allows users to create and support online petitions. Despite the initial success, the platform is encountering problems with the growth of petition signing because...

Assessing the feasibility of self-organizing maps for data mining financial information

by A. Visa

2024

Analyzing financial performance in today's information-rich society can be a daunting task. With the evolution of the Internet, access to massive amounts of financial data, typically in the form of financial statements, is widespread...

Galaksimizdeki 20 Açık Yıldız Kümesinin CCD UBVRI Fotometrisi

by Yusuf Karataş

2024

Domain Based Punjabi Text Document Clustering

by Vishal Rajan

2024

Text clustering is a text mining technique used to group similar documents into a single cluster using a similarity measure while separating dissimilar documents. Popular clustering algorithms available for text...

Using Morpheme-Level Attention Mechanism for Turkish Sequence Labelling

by Burcu Can

2024

Since deep learning began to be used for natural language processing problems, there have been substantial improvements in solving many problems in the field. Sequence labelling problems have likewise frequently been addressed with deep learning methods...

Context Based Automatic Spelling Correction for Turkish

by Burcu Can

2024
