Exploring Inter Tagger Consistency Measures
Abstract
Margaret E.I. Kipp. In 20th Annual SIG/CR Classification Research Workshop (6 November 2009). Kipp and Campbell (2006) examined tags assigned to the same URL in del.icio.us and determined that multidimensional scaling (MDS) and frequency graphs showed clusters of related terms as well as divergences ...
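Kipp's abstract is truncated before it names the consistency measures used, so the following is offered only as a point of reference. Two classic inter-indexer consistency measures from the indexing literature can be applied directly to the tag sets that different users assign to the same URL:

- Hooper's measure, C / (A + B - C).
- Rolling's measure, 2C / (A + B).

Here A and B are the numbers of tags each user assigned, and C is the number they share. This is a minimal sketch, not the paper's method. The lower-casing and trimming step, the function names, and the example tag sets are all illustrative assumptions.

```python
from itertools import combinations


def normalize(tags):
    """Lower-case and trim tags so trivial variants compare equal (assumption)."""
    return {t.strip().lower() for t in tags if t.strip()}


def hooper(a, b):
    """Hooper's consistency: shared tags / distinct tags used by either tagger."""
    common = len(a & b)
    union = len(a) + len(b) - common
    return common / union if union else 1.0


def rolling(a, b):
    """Rolling's consistency: 2 * shared tags / total tags assigned."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 1.0


def mean_pairwise(taggings, measure):
    """Average a pairwise measure over every pair of taggers of one resource."""
    pairs = list(combinations(taggings, 2))
    if not pairs:
        raise ValueError("need at least two taggings of the same resource")
    return sum(measure(a, b) for a, b in pairs) / len(pairs)


# Hypothetical tag sets from three users bookmarking the same URL.
users = [
    normalize({"python", "programming", "tutorial"}),
    normalize({"Python", "tutorial", "reference"}),
    normalize({"programming", "python"}),
]
print(round(mean_pairwise(users, hooper), 3))   # → 0.472
print(round(mean_pairwise(users, rolling), 3))  # → 0.622
```

Rolling's value is always at least Hooper's for the same pair (Rolling = 2H / (1 + H)). The two measures rank pairs identically but are not numerically interchangeable.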
Related papers
Folksonomies (networks of users, resources, and tags) allow users to easily retrieve, organize, and browse web content. However, their advantages are still limited, mainly because of the noisiness of user-provided tags. To overcome this issue, we propose an approach for characterizing related tags in folksonomies. We use tag co-occurrence statistics and Laplacian score-based feature selection to create an empirical co-occurrence probability distribution for each tag. We then identify related tags on the basis of the dissimilarity between their distributions. For this purpose, we introduce a variant of the Jensen-Shannon divergence that is more robust to statistical noise. We evaluate our approach experimentally using WordNet and compare it with a common tag-relatedness approach based on cosine similarity. The results show the effectiveness of our approach and its advantage over the competing method.
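The first two steps of this abstract can be sketched as: count tag co-occurrences, apply Laplacian-score feature selection, then build one empirical distribution per tag. This is a minimal sketch under stated assumptions.

- The Laplacian Score follows its standard definition (He, Cai and Niyogi, 2005), using a k-nearest-neighbour heat-kernel graph over tags.
- The values of k, the kernel width, and the number of context tags kept are hypothetical.
- The authors' exact settings and their noise-robust Jensen-Shannon variant are not reproduced here.

```python
import numpy as np


def cooccurrence_matrix(posts, vocab):
    """C[i, j] = number of posts in which tags i and j (i != j) both occur."""
    index = {t: i for i, t in enumerate(vocab)}
    C = np.zeros((len(vocab), len(vocab)))
    for post in posts:
        ids = {index[t] for t in post if t in index}
        for i in ids:
            for j in ids:
                if i != j:
                    C[i, j] += 1
    return C


def laplacian_scores(X, k=3):
    """Laplacian Score of each column (feature) of X. Lower means the feature
    better preserves the local neighbourhood structure of the rows (samples).
    Requires k < number of rows."""
    n = X.shape[0]
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)  # squared distances
    t = sq.mean() or 1.0                  # heat-kernel width (ad hoc choice)
    dist = sq.copy()
    np.fill_diagonal(dist, np.inf)        # a sample is not its own neighbour
    nn = np.argsort(dist, axis=1)[:, :k]  # k nearest neighbours of each row
    S = np.zeros((n, n))
    for i in range(n):
        S[i, nn[i]] = np.exp(-sq[i, nn[i]] / t)
    S = np.maximum(S, S.T)                # symmetric kNN affinity graph
    d = S.sum(axis=1)
    L = np.diag(d) - S                    # graph Laplacian
    scores = np.full(X.shape[1], np.inf)  # constant features get +inf
    for r in range(X.shape[1]):
        f = X[:, r] - (X[:, r] @ d) / d.sum()  # remove the degree-weighted mean
        denom = f @ (d * f)
        if denom > 1e-12:
            scores[r] = (f @ L @ f) / denom
    return scores


def tag_distributions(C, n_features, k=3):
    """Keep the n_features context tags with the lowest Laplacian Score and
    normalize each row into an empirical co-occurrence distribution."""
    keep = np.argsort(laplacian_scores(C, k))[:n_features]
    P = C[:, keep]
    mass = P.sum(axis=1, keepdims=True)
    return np.divide(P, mass, out=np.zeros_like(P), where=mass > 0), keep


# Hypothetical posts (one set of tags per bookmark).
posts = [
    {"python", "programming", "code"},
    {"python", "code", "tutorial"},
    {"java", "programming", "code"},
    {"java", "code", "tutorial"},
    {"cooking", "food", "recipes"},
    {"cooking", "food", "baking"},
]
vocab = sorted(set().union(*posts))
C = cooccurrence_matrix(posts, vocab)
P, kept = tag_distributions(C, n_features=5)
print([vocab[j] for j in kept])
```

Rows of `P` that lose all their mass after feature selection are left as zeros. A divergence between distributions should only be computed for rows that sum to 1.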
2019 ACM/IEEE Joint Conference on Digital Libraries (JCDL), 2019
A collection of scientific papers is usually accompanied by tags (keywords, topics, concepts, etc.) associated with each paper. Sometimes these tags are human-generated; sometimes they are machine-generated. Evaluating tagging quality is an important problem. We propose a simple metric of tagging consistency for scientific papers: whether the tags are predictive of citations. Since authors tend to cite papers on topics close to those of their own publications, consistent tagging should be able to predict citations. We present an algorithm to calculate consistency and show experiments with human- and machine-generated tags. We show that adding machine-generated tags to the manual ones can enhance tagging consistency. We further introduce cross-consistency metrics: the ability to predict citation links between papers tagged by different taggers, e.g. humans and computers. Cross-consistency metrics can be used to evaluate tagging quality of a tagger when...
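The abstract does not spell out its consistency algorithm. One hypothetical way to turn "tags are predictive of citations" into a number works in three steps:

1. Score every ordered pair of papers by tag overlap (Jaccard similarity).
2. Split the pairs into citing/cited pairs and all other pairs.
3. Compute the ROC AUC: the probability that a citing/cited pair scores higher than a non-citing pair.

Passing tags from two different taggers gives an analogous cross-consistency score. The Jaccard similarity, the AUC formulation, and all names below are assumptions for illustration, not the authors' method.

```python
from itertools import permutations


def jaccard(a, b):
    """Tag-set overlap in [0, 1]."""
    return len(a & b) / len(a | b) if a | b else 0.0


def consistency_auc(tags_citing, tags_cited, citations, papers):
    """AUC of tag similarity as a predictor of citation links.

    tags_citing / tags_cited: dict paper -> set of tags. Pass the same dict
    twice for plain consistency, or two taggers' dicts for cross-consistency.
    citations: set of (citing, cited) paper pairs.
    """
    pos, neg = [], []
    for a, b in permutations(papers, 2):
        s = jaccard(tags_citing[a], tags_cited[b])
        (pos if (a, b) in citations else neg).append(s)
    if not pos or not neg:
        raise ValueError("need both citing and non-citing pairs")
    # Count wins (ties count half) of cited pairs over non-cited pairs.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy corpus: paper A cites paper B.
human = {"A": {"tagging", "folksonomy"}, "B": {"tagging", "folksonomy"}, "C": {"optics"}}
print(consistency_auc(human, human, {("A", "B")}, ["A", "B", "C"]))
```

An AUC of 0.5 means the tags carry no information about citations. Values near 1 mean that citing/cited pairs almost always share more tags than unrelated pairs.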
Folksonomies, collections of user-contributed tags, have proved efficient at reducing the inherent semantic gap. However, user tags are noisy, so they need to be processed before they can be used by further applications. In this paper, we propose an approach for bootstrapping semantics from folksonomy tags. Our goal is to automatically identify semantically related tags. The approach is based on creating a probability distribution for each tag from co-occurrence statistics. The similarity between two tags is then determined by the distance between their corresponding probability distributions. For this purpose, we propose an extension of the well-known Jensen-Shannon divergence. We compared our approach to a widely used method for identifying similar tags based on the cosine measure. The evaluation shows promising results and highlights the advantage of our approach.
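The comparison described here can be sketched by ranking a query tag's neighbours in two ways: by Jensen-Shannon divergence between co-occurrence distributions, and by cosine similarity between raw co-occurrence vectors.

This sketch uses the standard base-2 Jensen-Shannon divergence. The authors' proposed extension is not described in the abstract and is not reproduced. The co-occurrence counts and function names are hypothetical.

```python
import numpy as np


def cosine(u, v):
    """Cosine similarity of two co-occurrence count vectors."""
    nu, nv = np.linalg.norm(u), np.linalg.norm(v)
    return float(u @ v / (nu * nv)) if nu > 0 and nv > 0 else 0.0


def jsd(u, v):
    """Standard Jensen-Shannon divergence (base 2, in [0, 1]) between the
    distributions obtained by normalizing two non-negative count vectors."""
    p, q = u / u.sum(), v / v.sum()
    m = 0.5 * (p + q)

    def kl(a):  # KL(a || m); entries where a == 0 contribute nothing
        nz = a > 0
        return float(np.sum(a[nz] * np.log2(a[nz] / m[nz])))

    return 0.5 * kl(p) + 0.5 * kl(q)


def related_tags(query, C, vocab, top=3):
    """Rank the other tags by JSD (ascending) and by cosine (descending)."""
    i = vocab.index(query)
    if C[i].sum() == 0:
        raise ValueError(f"{query!r} never co-occurs with another tag")
    others = [j for j in range(len(vocab)) if j != i and C[j].sum() > 0]
    by_jsd = sorted(others, key=lambda j: jsd(C[i], C[j]))[:top]
    by_cos = sorted(others, key=lambda j: -cosine(C[i], C[j]))[:top]
    return [vocab[j] for j in by_jsd], [vocab[j] for j in by_cos]


# Hypothetical symmetric co-occurrence counts.
vocab = ["python", "java", "code", "snake"]
C = np.array([[0, 1, 8, 2],
              [1, 0, 7, 0],
              [8, 7, 0, 1],
              [2, 0, 1, 0]], dtype=float)
print(related_tags("python", C, vocab, top=2))
```

Cosine is sensitive only to the direction of the raw count vectors. JSD compares normalized distributions and is bounded, so its values can be compared across tags with very different frequencies.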
Research Journal of Applied Sciences, Engineering and Technology
With the advent of the web, massive amounts of information are available to internet users. One can acquire information according to one's own field of interest. For example, a large amount of information on bioinformatics is available on the web, and the computer research community can find almost any type of published data from any period of time with a single click on Google or any other well-known web search engine. Filtering the most relevant information from a large dump of online information is considered a challenging task, and it is gaining attention in the web research community. Various scientific tools and techniques have been introduced that enable users to extract relevant and required information, but the accuracy of the extracted information remains questionable. In the research community, citation is a very common term. Citations are used to extract historical information relevant to a particular topic. But the citation of a specific research ...
ACM Transactions on the Web ( …, 2009
This paper uses data from the social bookmarking site del.icio.us to empirically examine the dynamics of collaborative tagging systems and to study how coherent categorization schemes emerge from unsupervised tagging by individual users.
In this paper, we introduce a tag recommendation algorithm that mimics the way humans draw on items in their long-term memory. This approach uses the frequency and recency of previous tag assignments to estimate the probability of reusing a particular tag. Using three real-world folksonomies gathered from bookmarks in BibSonomy, CiteULike and Flickr, we show that adding a time-dependent component outperforms conventional "most popular tags" approaches. It also outperforms another existing time-dependent recommendation mechanism that is very effective but less theory-driven. By combining our approach with a simple resource-specific frequency analysis, our algorithm outperforms other well-established algorithms, such as FolkRank, Pairwise Interaction Tensor Factorization and Collaborative Filtering. We conclude that our approach provides an accurate and computationally efficient model of a user's temporal tagging behavior. We show how effective principles for information retrieval can be designed and implemented if human memory processes are taken into account.
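The abstract does not give its formula. A standard way to model frequency and recency from human memory research is the base-level learning (BLL) equation of the ACT-R architecture: B_i = ln(sum_j (t_now - t_j)^(-d)), with decay d = 0.5 by convention.

This minimal sketch assumes that form and combines it with a resource-specific tag popularity. The softmax normalization, the mixing weight `beta`, the time units, and the function names are hypothetical choices, not necessarily the authors'.

```python
import math
from collections import Counter, defaultdict


def bll_activation(history, now, d=0.5):
    """ACT-R base-level activation per tag: B = ln(sum_j (now - t_j) ** -d).

    history: (tag, timestamp) pairs for one user's past tag assignments.
    Ages below 1 time unit are clamped to 1 to avoid division by zero.
    """
    sums = defaultdict(float)
    for tag, ts in history:
        sums[tag] += max(now - ts, 1) ** (-d)
    return {tag: math.log(s) for tag, s in sums.items()}


def recommend(history, resource_tags, now, k=5, beta=0.5, d=0.5):
    """Mix the user's BLL-based tag probabilities with resource tag popularity
    (a hypothetical linear combination weighted by beta)."""
    act = bll_activation(history, now, d)
    z = sum(math.exp(b) for b in act.values())
    p_user = {t: math.exp(b) / z for t, b in act.items()}  # softmax over activations
    counts = Counter(resource_tags)
    total = sum(counts.values())
    p_res = {t: c / total for t, c in counts.items()}
    candidates = set(p_user) | set(p_res)
    score = {t: beta * p_user.get(t, 0.0) + (1 - beta) * p_res.get(t, 0.0)
             for t in candidates}
    return sorted(candidates, key=lambda t: (-score[t], t))[:k]


# Hypothetical history: "python" used twice long ago, "java" once very recently.
history = [("python", 0), ("python", 50), ("java", 99)]
print(recommend(history, ["python", "web"], now=100, k=3))
```

With the power-law decay, one very recent use can outweigh several older ones. A "most popular tags" baseline would instead rank tags purely by count.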
Proceedings of the ASIST Annual Meeting, 2006
This paper analyzes the tagging patterns exhibited by users of del.icio.us, to assess how collaborative tagging supports and enhances traditional ways of classifying and indexing documents. Using frequency data and co-word analysis matrices analyzed by multi-dimensional scaling, the authors discovered that tagging practices to some extent work in ways that are continuous with conventional indexing. Small numbers of tags tend to emerge by unspoken consensus, and inconsistencies follow several predictable patterns that can easily be anticipated. However, the tags also indicated intriguing practices relating to time and task which suggest the presence of an extra dimension in classification and organization, a dimension which conventional systems are unable to facilitate.
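The co-word analysis plus multidimensional scaling workflow described here can be sketched in three steps:

1. Build a co-word matrix from the tag sets of one resource.
2. Convert it to dissimilarities with Salton's cosine normalization.
3. Embed the tags in two dimensions.

This sketch substitutes classical (Torgerson) MDS. The authors may have used a different MDS variant, and the example tag sets and function names are hypothetical.

```python
import numpy as np


def coword_matrix(taggings, vocab):
    """M[i, j] = number of taggings containing both tags; M[i, i] = tag frequency."""
    index = {t: i for i, t in enumerate(vocab)}
    M = np.zeros((len(vocab), len(vocab)))
    for tags in taggings:
        ids = {index[t] for t in tags if t in index}
        for i in ids:
            for j in ids:
                M[i, j] += 1
    return M


def salton_dissimilarity(M):
    """1 - c_ij / sqrt(c_i * c_j); every tag in vocab must occur at least once."""
    f = np.sqrt(np.diag(M))
    return 1.0 - M / np.outer(f, f)


def classical_mds(D, dims=2):
    """Classical MDS: embed a dissimilarity matrix in `dims` dimensions."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centring matrix
    B = -0.5 * J @ (D ** 2) @ J              # double-centred squared dissimilarities
    vals, vecs = np.linalg.eigh(B)           # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:dims]
    return vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))


# Hypothetical tag sets assigned by different users to one URL.
taggings = [{"css", "design", "web"}, {"css", "web"}, {"design", "inspiration"},
            {"css", "reference"}, {"design", "web", "inspiration"}]
vocab = sorted(set().union(*taggings))
coords = classical_mds(salton_dissimilarity(coword_matrix(taggings, vocab)))
for tag, (x, y) in zip(vocab, coords):
    print(f"{tag:12s} {x:+.2f} {y:+.2f}")
```

Tags that are often assigned together end up close in the plot. This proximity is how clusters of related terms show up in MDS maps.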
2015
Collaborative tagging systems allow users to assign keywords, so-called tags, to resources (anything with a URL), giving them a meaning based on their expertise or knowledge; the result is what is called a folksonomy. However, these systems require a means of interpreting this meaning and finding the patterns that arise from the collaborative tagging process. Several recommendation algorithms have been proposed and implemented to solve this problem. Most of these algorithms use well-known techniques, mainly from the fields of Machine Learning (ML), Artificial Intelligence (AI) and Information Retrieval (IR). Others are based on graph theory and co-occurrence counting, exploiting the structure of a folksonomy.
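The co-occurrence-counting family of recommenders mentioned at the end of this abstract can be illustrated with a minimal sketch. Given the tags a user has already entered, it suggests the tags that most often appeared alongside them in earlier posts. The raw-count scoring and the function names are hypothetical; practical systems usually normalize counts or weight them by tag popularity.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_cooccurrence(posts):
    """co[a][b] = number of posts in which tags a and b (a != b) appear together."""
    co = defaultdict(Counter)
    for post in posts:
        for a, b in permutations(set(post), 2):
            co[a][b] += 1
    return co


def recommend_by_cooccurrence(co, given, k=3):
    """Sum co-occurrence counts over the given tags and return the top k others."""
    scores = Counter()
    for t in given:
        scores.update(co.get(t, {}))
    for t in given:
        scores.pop(t, None)  # never recommend a tag the user already chose
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [t for t, _ in ranked[:k]]


# Hypothetical folksonomy posts.
posts = [{"python", "code"}, {"python", "code", "tutorial"}, {"python", "snake"}]
co = build_cooccurrence(posts)
print(recommend_by_cooccurrence(co, ["python"], k=2))
```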
2012
An examination of the search result terms (SRT) of two major search engines and the classification of these terms into the three thesaural relationships (equivalence, hierarchical and associative), indicating their occurrence outside of a controlled vocabulary setting and demonstrating a naturally occurring phenomenon in language.
Proceedings of the 16th …, 2007
The debate within the Web community over the optimal means by which to organize information often pits formalized classifications against distributed collaborative tagging systems. A number of questions remain unanswered, however, regarding the nature of collaborative tagging systems including whether coherent categorization schemes can emerge from unsupervised tagging by users. This paper uses data from the social bookmarking site del.icio.us to examine the dynamics of collaborative tagging systems. In particular, we examine whether the distribution of the frequency of use of tags for "popular" sites with a long history (many tags and many users) can be described by a power law distribution, often characteristic of what are considered complex systems. We produce a generative model of collaborative tagging in order to understand the basic dynamics behind tagging, including how a power law distribution of tags could arise. We empirically examine the tagging history of sites in order to determine how this distribution arises over time and to determine the patterns prior to a stable distribution. Lastly, by focusing on the high-frequency tags of a site where the distribution of tags is a stabilized power law, we show how tag co-occurrence networks for a sample domain of tags can be used to analyze the meaning of particular tags given their relationship to other tags.
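The kind of generative process this abstract refers to can be illustrated with a minimal sketch. Each new tag assignment either introduces a fresh tag or reuses an existing one with probability proportional to its current frequency. The fitting step then estimates a power-law exponent for the resulting tag frequencies.

This substitutes a generic Simon-style preferential-attachment (urn) model, which is not necessarily the paper's model. The exponent estimate uses the discrete maximum-likelihood approximation of Clauset, Shalizi and Newman (2009), alpha ≈ 1 + n / sum(ln(x_i / (x_min - 1/2))), which is more reliable for larger `xmin`. The parameters `p_new`, `seed` and `xmin` are hypothetical.

```python
import math
import random
from collections import Counter


def simulate_tagging(n_assignments, p_new=0.1, seed=0):
    """Urn model: with probability p_new add a brand-new tag, otherwise copy a
    uniformly chosen past assignment (i.e. reuse tags proportionally to frequency)."""
    rng = random.Random(seed)
    history, next_id = [], 0
    for _ in range(n_assignments):
        if not history or rng.random() < p_new:
            history.append(f"tag{next_id}")
            next_id += 1
        else:
            history.append(rng.choice(history))
    return Counter(history)


def powerlaw_alpha(values, xmin=1):
    """Approximate discrete MLE of the exponent for P(x) ~ x**-alpha, x >= xmin."""
    xs = [x for x in values if x >= xmin]
    if not xs:
        raise ValueError("no values at or above xmin")
    return 1 + len(xs) / sum(math.log(x / (xmin - 0.5)) for x in xs)


counts = simulate_tagging(20000, p_new=0.1, seed=42)
freqs = list(counts.values())  # how often each distinct tag was used
print(len(counts), "distinct tags; estimated alpha:", round(powerlaw_alpha(freqs, xmin=5), 2))
```

The exponent here describes the distribution of per-tag usage counts. A rank-frequency plot of the same data shows a related but differently parameterized power law.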
