Exploring Inter Tagger Consistency Measures

Margaret Kipp

Abstract

Margaret EI Kipp. In 20th Annual SIG/CR Classification Research Workshop (6 November 2009). Kipp and Campbell (2006) examined tags assigned to the same URL in del.icio.us and determined that MDS and frequency graphs showed clusters of related terms as well as divergences ...

Related papers

Tag relatedness in image folksonomies

Gabriele Gianini

Folksonomies, networks of users, resources, and tags, allow users to easily retrieve, organize, and browse web content. However, their advantages are still limited, mainly because user-provided tags are noisy. To overcome this issue, we propose an approach for characterizing related tags in folksonomies: we use tag co-occurrence statistics and Laplacian-score-based feature selection to create an empirical co-occurrence probability distribution for each tag; we then identify related tags on the basis of the dissimilarity between their distributions. For this purpose, we introduce a variant of the Jensen-Shannon Divergence that is more robust to statistical noise. We experimentally evaluate our approach using WordNet and compare it to a common tag-relatedness approach based on cosine similarity. The results show the effectiveness of our approach and its advantage over the competing method.

How to Measure the Consistency of the Tagging of Scientific Papers?

Boris Veytsman

2019 ACM/IEEE Joint Conference on Digital Libraries (JCDL), 2019

A collection of scientific papers is usually accompanied by tags (keywords, topics, concepts, etc.) associated with each paper. Sometimes these tags are human-generated, sometimes they are machine-generated. The evaluation of the tagging quality is an important problem. We propose a simple metric of tagging consistency for scientific papers: whether these tags are predictive of citations. Since authors tend to cite papers about topics close to those of their own publications, a consistent tagging should be able to predict citations. We present an algorithm to calculate consistency, and show experiments with human- and machine-generated tags. We show that the addition of machine-generated tags to the manual ones can enhance tagging consistency. We further introduce cross-consistency metrics, the ability to predict citation links between papers tagged by different taggers, e.g. humans and computers. Cross-consistency metrics can be used to evaluate tagging quality of a tagger when...

Tag Similarity in Folksonomies

Gabriele Gianini

Folksonomies, collections of user-contributed tags, have proved to be efficient in reducing the inherent semantic gap. However, user tags are noisy; thus, they need to be processed before they can be used by further applications. In this paper, we propose an approach for bootstrapping semantics from folksonomy tags. Our goal is to automatically identify semantically related tags. The approach is based on creating a probability distribution for each tag from co-occurrence statistics. Subsequently, the similarity between two tags is determined by the distance between their corresponding probability distributions. For this purpose, we propose an extension of the well-known Jensen-Shannon Divergence. We compared our approach to a widely used method for identifying similar tags based on the cosine measure. The evaluation shows promising results and emphasizes the advantage of our approach.
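
The comparison described in this abstract can be illustrated with a minimal Python sketch. It uses the standard Jensen-Shannon divergence, not the authors' extension (which the abstract does not specify), and the toy posts and the function names cooccurrence_distribution, jensen_shannon, and cosine are assumptions made for illustration only.

```python
import math
from collections import Counter


def cooccurrence_distribution(posts, tag):
    """Empirical distribution of the tags that co-occur with `tag`."""
    counts = Counter()
    for tags in posts:
        if tag in tags:
            counts.update(t for t in tags if t != tag)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()} if total else {}


def jensen_shannon(p, q):
    """Standard Jensen-Shannon divergence (log base 2, range 0..1)."""
    keys = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl_to_mixture(a):
        # Kullback-Leibler divergence from distribution a to the mixture m.
        return sum(v * math.log2(v / m[k]) for k, v in a.items() if v > 0)

    return 0.5 * kl_to_mixture(p) + 0.5 * kl_to_mixture(q)


def cosine(p, q):
    """Cosine similarity between two sparse co-occurrence vectors."""
    dot = sum(v * q.get(k, 0.0) for k, v in p.items())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0


# Hypothetical toy data: each post is the set of tags one user assigned.
posts = [
    {"python", "code", "tutorial"},
    {"programming", "code", "tutorial"},
    {"python", "code"},
    {"programming", "code"},
    {"recipe", "food", "dinner"},
]

py = cooccurrence_distribution(posts, "python")
prog = cooccurrence_distribution(posts, "programming")
recipe = cooccurrence_distribution(posts, "recipe")

print(jensen_shannon(py, prog), cosine(py, prog))      # related tags
print(jensen_shannon(py, recipe), cosine(py, recipe))  # unrelated tags
```

A low divergence (or high cosine similarity) marks two tags as used in similar contexts. In this toy data "python" and "programming" share identical contexts, while "python" and "recipe" share none.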

Measuring the Relevancy between Tags and Citation in Social Web

Ghani Rehman

Research Journal of Applied Sciences, Engineering and Technology

With the advent of the web, massive amounts of information are available to internet users. One can acquire information according to one's own field of interest; for example, a large amount of information on bioinformatics is available on the web, and the computer research community can find any type of published data from any period with a single click on Google or any other well-known web search engine. Filtering the most relevant information from a large dump of online information is considered a challenging task, and one that is gaining popularity in the web research community. Various scientific tools and techniques have now been introduced that enable users to extract the relevant and required information. The accuracy of the extracted information remains an open question. In the research community, citation is a very common term. Citations are used to extract the historic information relevant to some particular topic. But the citation of a specific research ...

Emergence of consensus and shared vocabularies in collaborative tagging systems

Harry Halpin

ACM Transactions on the Web ( …, 2009

This paper uses data from the social bookmarking site del.icio.us to empirically examine the dynamics of collaborative tagging systems and to study how coherent categorization schemes emerge from unsupervised tagging by individual users.

Long time no see: The probability of reusing tags as a function of frequency and recency

P. Seitlinger, D. Kowald, Christoph Trattner

In this paper, we introduce a tag recommendation algorithm that mimics the way humans draw on items in their long-term memory. This approach uses the frequency and recency of previous tag assignments to estimate the probability of reusing a particular tag. Using three real-world folksonomies gathered from bookmarks in BibSonomy, CiteULike and Flickr, we show how adding a time-dependent component outperforms conventional "most popular tags" approaches and another existing and very effective but less theory-driven, time-dependent recommendation mechanism. By combining our approach with a simple resource-specific frequency analysis, our algorithm outperforms other well-established algorithms, such as FolkRank, Pairwise Interaction Tensor Factorization and Collaborative Filtering. We conclude that our approach provides an accurate and computationally efficient model of a user's temporal tagging behavior. We show how effective principles for information retrieval can be designed and implemented if human memory processes are taken into account.
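
One way to combine frequency and recency into a reuse estimate is sketched below in Python. The power-law decay of each past use, the decay exponent of 0.5, and the names tag_strength and reuse_probabilities are assumptions borrowed from common memory-activation models; the abstract does not state the exact formula, so this is an illustration rather than the authors' algorithm.

```python
def tag_strength(timestamps, now, decay=0.5):
    """Sum of power-law-decayed contributions from each past use of a tag.

    More uses (frequency) and more recent uses (recency) both raise the sum.
    The decay exponent is an assumed parameter, not taken from the paper.
    """
    return sum((now - t) ** -decay for t in timestamps if t < now)


def reuse_probabilities(history, now, decay=0.5):
    """Normalize per-tag strengths into an estimated probability of reuse."""
    strengths = {tag: tag_strength(ts, now, decay) for tag, ts in history.items()}
    total = sum(strengths.values())
    if total == 0:
        return {tag: 0.0 for tag in history}
    return {tag: s / total for tag, s in strengths.items()}


# Hypothetical history: tag -> times (in arbitrary units) at which a user applied it.
history = {
    "python": [1, 2, 9],  # three uses, one very recent
    "java": [1, 2, 3],    # three uses, all old
    "rust": [9],          # one recent use
}

probs = reuse_probabilities(history, now=10)
for tag, p in sorted(probs.items(), key=lambda kv: -kv[1]):
    print(f"{tag}: {p:.3f}")
```

With equal usage counts, the tag used more recently ("python") ranks above the one whose uses are all old ("java"), which is the time-dependent effect the abstract contrasts with plain "most popular tags" ranking.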

Patterns and Inconsistencies in Collaborative Tagging Systems: An Examination of Tagging Practices

Margaret Kipp

Proceedings of The Asist Annual Meeting, 2006

This paper analyzes the tagging patterns exhibited by users of del.icio.us, to assess how collaborative tagging supports and enhances traditional ways of classifying and indexing documents. Using frequency data and co-word analysis matrices analyzed by multi-dimensional scaling, the authors discovered that tagging practices to some extent work in ways that are continuous with conventional indexing. Small numbers of tags tend to emerge by unspoken consensus, and inconsistencies follow several predictable patterns that can easily be anticipated. However, the tags also indicated intriguing practices relating to time and task which suggest the presence of an extra dimension in classification and organization, a dimension which conventional systems are unable to facilitate.

Begelman vs. FolkRank. The Comparison of Two Algorithms in the Tag Recommendation Context: An Exploratory Study

Maria Emmanouil

2015

Collaborative tagging systems allow users to assign keywords, so-called tags, to resources (anything with a URL), giving them a meaning based on their expertise or knowledge; the result is called a Folksonomy. However, these systems require a means of interpreting this meaning and finding the patterns that arise from the collaborative tagging process. Several recommendation algorithms have been proposed and implemented to solve this problem. Most of these algorithms use well-known techniques, mainly from the Machine Learning (ML), Artificial Intelligence (AI), and Information Retrieval (IR) fields. Others are based on graph theory and co-occurrence counting, exploiting the structure of a Folksonomy.

Classifying Web Term Relationships : An Examination of the Search Result Pages of Two Major Search Engines

Elizabeth Milonas

2012

An examination of search result terms (SRT) of two major search engines and the classification of these terms into the three thesaural relationships – equivalence, hierarchical and associative, indicating their occurrence outside of a controlled vocabulary setting and demonstrating a naturally occurring phenomenon in language.

The complex dynamics of collaborative tagging

Harry Halpin

Proceedings of the 16th …, 2007

The debate within the Web community over the optimal means by which to organize information often pits formalized classifications against distributed collaborative tagging systems. A number of questions remain unanswered, however, regarding the nature of collaborative tagging systems including whether coherent categorization schemes can emerge from unsupervised tagging by users. This paper uses data from the social bookmarking site del.icio.us to examine the dynamics of collaborative tagging systems. In particular, we examine whether the distribution of the frequency of use of tags for "popular" sites with a long history (many tags and many users) can be described by a power law distribution, often characteristic of what are considered complex systems. We produce a generative model of collaborative tagging in order to understand the basic dynamics behind tagging, including how a power law distribution of tags could arise. We empirically examine the tagging history of sites in order to determine how this distribution arises over time and to determine the patterns prior to a stable distribution. Lastly, by focusing on the high-frequency tags of a site where the distribution of tags is a stabilized power law, we show how tag co-occurrence networks for a sample domain of tags can be used to analyze the meaning of particular tags given their relationship to other tags.

Fabio Crestani

Proceedings of the 32nd …, 2009

Clemens Cap

Proceeding of the 2008 ACM workshop on Search in social media - SSM '08, 2008

Recent growth of social classification systems due to steadily increasing popularity has established a multitude of heterogeneous, isolated, non-integrated, and non-interoperable tag spaces. Contrary to current research predominantly focusing on single folksonomies, we exploit cross-space similarities to improve a variety of tagging use cases beyond the limits of one folksonomy. This paper presents the results of practical studies concerning cross-space analysis of (co-)tag spaces of five well-established social classification services for tagging of bookmarks (del.icio.us, BibSonomy bookmarks), and publications (BibSonomy publications, CiteULike, Connotea). The studies are based on one month data sets of RSS recent feeds from the same time scope. We provide a profound motivation for cross-space tagging, and give insight into similarities and intersections of (top ranking) (co-)tag spaces as well as convergence aspects over time.

08391 Working Group Summary--Analyzing Tag Semantics Across Tagging Systems

Vito Servedio

The objective of our group was to exploit state-of-the-art Information Retrieval methods for finding associations and dependencies between tags, capturing and representing differences in tagging behavior and vocabulary of various folksonomies, with the overall aim to better understand the semantics of tags and the tagging process. Therefore we analyze the semantic content of tags in the Flickr and Delicious folksonomies. We find that: tag context similarity leads to meaningful results in Flickr, despite its narrow folksonomy character; the comparison of tags across Flickr and Delicious shows little semantic overlap, with tags in Flickr associated more with visual aspects than with the technological ones that seem to dominate in Delicious; there are regions in the tag-tag space, provided with the cosine similarity metric, that are characterized by high density; the order of tags inside a post has a semantic relevance.

Quality Metrics for Tags of Broad Folksonomies

Tanguy Coenen

Proceedings of I-Semantics, …, 2008

Ontologies and tag-statistics

Tamás Vicsek

New Journal of Physics, 2012

Due to the increasing popularity of collaborative tagging systems, the research on tagged networks, hypergraphs, ontologies, folksonomies and other related concepts is becoming an important interdisciplinary area with great potential and relevance for practical applications. In most collaborative tagging systems the tagging by the users is completely 'flat', while in some cases they are allowed to define a shallow hierarchy for their own tags. However, usually no overall hierarchical organization of the tags is given, and one of the interesting challenges of this area is to provide an algorithm generating the ontology of the tags from the available data. In contrast, there are also other types of tagged networks available for research, where the tags are already organized into a directed acyclic graph (DAG), encapsulating the 'is a sub-category of' type of hierarchy between each other. In this paper, we study how this DAG affects the statistical distribution of tags on the nodes marked by the tags in various real networks. The motivation for this research was the fact that understanding the tagging based on a known hierarchy can help in revealing the hidden hierarchy of tags in collaborative tagging systems. We analyse the relation between the tag frequency and the position of the tag in the DAG in two large sub-networks of the English Wikipedia and a protein-protein interaction network. We also study the tag co-occurrence statistics by introducing a two-dimensional (2D) tag-distance distribution preserving both the difference in the levels and the absolute distance in the DAG for the co-occurring pairs of tags. Our most interesting finding is that the local relevance of tags in the DAG (i.e. their rank or significance as characterized by, e.g., the length of the branches starting from them) is much more important than their global distance from the root. Furthermore, we also introduce a simple tagging model based on random walks on the DAG, capable of reproducing the main statistical features of tag co-occurrence. This model has high potential for further practical applications, e.g., it can provide the starting point for a benchmark system in ontology retrieval or it may help pinpoint unusual correlations in the co-occurrence of tags.

Classifying Web Term Relationships - ISKO 2012

Elizabeth Milonas

Exploiting tag similarities to discover synonyms and homonyms in folksonomies

Luca Mazzola

Software: Practice and Experience, 2013

Tag-based systems are widely available, thanks to their intrinsic advantages, such as self-organization, currency, and ease of use. Although they represent a precious source of semantic metadata, their utility is still limited. The inherent lexical ambiguities of tags strongly affect the extraction of structured knowledge and the quality of tag-based recommendation systems. In this paper, we propose a methodology for the analysis of tag-based systems, addressing tag synonymy and homonymy at the same time in a holistic approach: in more detail, we exploit a tripartite graph to reduce the problem of synonyms and homonyms; we apply a customized version of Tag Context Similarity to detect them, overcoming the limitations of current similarity metrics; finally, we propose the application of an overlapping clustering algorithm to detect contexts and homonymies, then evaluate its performance, and introduce a methodology for the interpretation of its results. ... journal special issues (e.g., ACM RecSys or UMAP conference, or SASWeb workshops series, ACM Transactions on Intelligent Systems and Technology, and so on) are devoted to them.

The dynamics and semantics of collaborative tagging

Harry Halpin

… of the 1st Semantic Authoring and …, 2006

The debate within the Web community over the optimal means by which to organize information often pits formalized classifications against distributed collaborative tagging systems. A number of questions remain unanswered, however, regarding the nature of collaborative tagging systems including the dynamics of such systems and whether coherent classification schemes can emerge from undirected tagging by users. Currently millions of users are using collaborative tagging without centrally organizing principles, and many suspect this exhibits features considered to be indicative of a complex system. If this is the case, it remains to be seen whether collaborative tagging by users over time leads to emergent classification schemes that could be formalized into an ontology usable by the Semantic Web. This paper uses data from "popular" tagged sites on the social bookmarking site del.icio.us to examine the dynamics of such collaborative tagging systems. In particular, we are trying to determine whether the distribution of tag frequencies stabilizes, which indicates a degree of cohesion or consensus among users about the optimal tags to describe particular sites. We use tag co-occurrence networks for a sample domain of tags to analyze the meaning of particular tags given their relationship to other tags and automatically create an ontology. We also produce a generative model of collaborative tagging in order to model and understand some of the basic dynamics behind the process.

Extracting Usage Patterns and the Analysis of Tag Connection Dynamics within Collaborative Tagging Systems

Nicolae Tomai

Informatica Economica, 2013

Collaborative tagging has become a very popular form of annotation, because any entity may be labeled by any individual for their own reasons. In this paper we present the results of a case study based on data gathered at different time intervals from the social tagging system developed and implemented on Întelepciune.ro. Analyzing collective data on the way community members associate different tags, we observed that links form between tags and become increasingly stable over time. By applying network analysis methods, we extracted information on tag popularity, the influence of tags within the network, and the degree to which one tag depends upon another. In this way we determined different semantic structures within the collective tagging system and traced their evolution at different stages in time. Furthermore, we have shown how tag recommendations can be produced and integrated within recommendation systems. Thus, we will be able to identify experts and trustworthy content based on different categories of interest.

An integrated approach to discover tag semantics

Davide Eynard, Luca Mazzola

2011

Tag-based systems have become very common for online classification thanks to their intrinsic advantages such as self-organization and rapid evolution. However, they are still affected by some issues that limit their utility, mainly due to the inherent ambiguity in the semantics of tags. Synonyms, homonyms, and polysemous words, while not harmful for the casual user, strongly affect the quality of search results and the performance of tag-based recommendation systems.
