Papers by George Papadakis
Proceedings of the International Workshop on Semantic Web Information Management - SWIM '11, 2011
Blocking methods are crucial for making the inherently quadratic task of Entity Resolution more efficient. The blocking methods proposed in the literature rely on the homogeneity of data and the availability of binding schema information; thus, they are inapplicable to the voluminous, noisy, and highly heterogeneous data of the Web 2.0 user-generated content. To deal with such data, attribute-agnostic blocking has been recently introduced, following a two-fold strategy: the first layer places entities into overlapping blocks in order to achieve high effectiveness, while the second layer reduces the number of unnecessary comparisons in order to enhance efficiency.
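The first layer of this strategy can be illustrated with a minimal sketch of token blocking, the classic attribute-agnostic approach: every token of every attribute value becomes a block key, so two entities co-occur in a block whenever they share a token, regardless of schema. The function name and the toy profiles below are illustrative assumptions, not artifacts of the paper.

```python
from collections import defaultdict

def token_blocking(entities):
    """Attribute-agnostic (token) blocking: every token appearing in any
    attribute value becomes a block key, ignoring attribute names entirely."""
    blocks = defaultdict(set)
    for eid, profile in entities.items():
        for value in profile.values():
            for token in str(value).lower().split():
                blocks[token].add(eid)
    # keep only blocks that yield at least one comparison
    return {key: ids for key, ids in blocks.items() if len(ids) > 1}

# Toy profiles with heterogeneous schemata (different attribute names).
entities = {
    1: {"name": "John Smith", "city": "Athens"},
    2: {"fullName": "John A. Smith", "location": "Athens"},
    3: {"title": "Mary Jones"},
}
blocks = token_blocking(entities)
```

Because the blocks overlap (entities 1 and 2 co-occur in three of them), many comparisons are redundant; discarding such unnecessary comparisons is exactly what the second, efficiency-oriented layer targets.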
ACM International Conference Proceeding Series, 2012
Text classification constitutes a popular task in Web research, with applications that range from spam filtering to sentiment analysis. To address it, patterns of co-occurring words or characters are typically extracted from the textual content of Web documents. However, not all documents are of the same quality; for example, the curated content of news articles usually entails lower levels of noise than the user-generated content of blog posts and other Social Media.
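As a rough illustration of the character-level patterns mentioned above, which tend to tolerate the spelling noise of user-generated text better than word-level features, consider this sketch (the function name and example string are hypothetical, not from the paper):

```python
from collections import Counter

def char_ngrams(text, n=3):
    """Extract overlapping character n-grams as a bag of features;
    robust to misspellings common in noisy, user-generated content."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

features = char_ngrams("spam offer!!", n=3)
```

A misspelling such as "offfer" still shares most of its trigrams ("off", "fer") with the correct form, whereas a word-level feature would miss the match entirely.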
Proceedings of the 12th International Conference on Information Integration and Web-based Applications & Services - iiWAS '10, 2010
The Semantic Web is constantly gaining momentum, as more and more Web sites and content providers adopt its principles. At the core of these principles lies the Linked Data movement, which demands that data on the Web shall be annotated and linked among different sources, instead of being isolated in data silos. In order to materialize this vision of a web of semantics, existing resource identifiers should be reused and shared between different Web sites. This is not always the case with the current state of the Semantic Web, since multiple identifiers are, more often than not, redundantly introduced for the same resources.

Proceedings of the fourth ACM international conference on Web search and data mining - WSDM '11, 2011
We have recently witnessed an enormous growth in the volume of structured and semi-structured data sets available on the Web. An important prerequisite for using and combining such data sets is the detection and merging of information that describes the same real-world entities, a task known as Entity Resolution. To make this quadratic task efficient, blocking techniques are typically employed. However, the high dynamics, loose schema binding, and heterogeneity of (semi-)structured data pose new challenges to entity resolution. Existing blocking approaches become inapplicable because they rely on the homogeneity of the considered data and a priori known schemata. In this paper, we introduce a novel approach for entity resolution, scaling it up for large, noisy, and heterogeneous information spaces. It combines an attribute-agnostic mechanism for building blocks with intelligent block processing techniques that boost blocks with high expected utility, propagate knowledge about identified matches, and preempt the resolution process when it gets too expensive. Our extensive evaluation on real-world, large, heterogeneous data sets verifies that the suggested approach is both effective and efficient.
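A highly simplified sketch of the block-processing ideas above (utility-ordered scheduling, match propagation, and preemption) might look as follows. The function names, the utility proxy (smaller blocks first), and the comparison budget are illustrative assumptions, not the paper's actual algorithms:

```python
from itertools import combinations

def resolve(blocks, match, budget):
    """Process blocks in order of expected utility (smaller blocks first),
    skip entities whose match is already known, and stop once the
    comparison budget is exhausted."""
    matched = set()        # entity ids whose duplicate is already found
    pairs = []
    comparisons = 0
    for key in sorted(blocks, key=lambda k: len(blocks[k])):
        for a, b in combinations(sorted(blocks[key]), 2):
            if a in matched or b in matched:
                continue   # propagate knowledge about identified matches
            if comparisons >= budget:
                return pairs  # preempt when resolution gets too expensive
            comparisons += 1
            if match(a, b):
                pairs.append((a, b))
                matched.update((a, b))
    return pairs

blocks = {"john": {1, 2}, "athens": {1, 2, 3}}
pairs = resolve(blocks, match=lambda a, b: (a, b) == (1, 2), budget=10)
```

Once the pair (1, 2) is resolved inside the small, high-utility block, the comparisons it would incur in the larger block are skipped for free.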

Proceedings of the 21st international conference companion on World Wide Web - WWW '12 Companion, 2012
Social Network (SN) environments are the ideal future service marketplaces. It is well known and documented that SN users are growing at a tremendous pace. Taking advantage of these social dynamics, as well as of the vast volumes of amateur content generated every second, is a major step towards creating a potentially huge market of services. In this paper, we describe the external web services that the SocIoS project is researching and developing in support of the Social Media community. Aiming to help the end users of SNs automate their transactions and improve the production and performance of their workflows over SN inputs and content, this work presents the main architecture, functionality, and benefits of each external service. Finally, it introduces the end user to the new era of SNs, with business applicability and better social transactions over SN content.
Proceedings of the 2nd International Conference on Web Intelligence, Mining and Semantics - WIMS '12, 2012
Users of Social Media typically gather into communities on the basis of some common interest. Their interactions inside these on-line communities follow several interesting patterns. For example, users differ in the level of influence they exert on the rest of the group: some community members are actively involved, affecting a large part of the community with their actions, while the majority comprises plain participants (e.g., information consumers). Identifying users of the former category lies at the focus of many recent works, as they can be employed in a variety of applications, like targeted marketing.
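A naive baseline for spotting such influential members is to rank users by their activity in the community's interaction graph, e.g., by out-degree. The sketch below is an illustrative assumption (names and data are invented), not the method studied in the paper:

```python
from collections import Counter

def top_influencers(interactions, k):
    """Rank community members by out-degree in the interaction graph,
    a simple proxy for the influence a user exerts on the community."""
    out_degree = Counter(src for src, dst in interactions)
    return [user for user, _ in out_degree.most_common(k)]

# Directed interactions: (who acted, who was affected).
interactions = [("ann", "bob"), ("ann", "cat"), ("bob", "cat"),
                ("ann", "dan"), ("cat", "ann")]
leaders = top_influencers(interactions, k=1)
```

Real influence measures are usually richer than raw degree (e.g., they account for how far an action propagates), but the degree baseline is a common starting point.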

IEEE Transactions on Knowledge and Data Engineering, 2014
Entity Resolution is an inherently quadratic task that typically scales to large data collections through blocking. In the context of highly heterogeneous information spaces, blocking methods rely on redundancy in order to ensure high effectiveness at the cost of lower efficiency (i.e., more comparisons). This effect is partially ameliorated by coarse-grained block processing techniques that discard entire blocks either a priori or during the resolution process. In this paper, we introduce meta-blocking as a generic procedure that intervenes between the creation and the processing of blocks, transforming an initial set of blocks into a new one with substantially fewer comparisons and equally high effectiveness. In essence, meta-blocking aims at extracting the most similar pairs of entities by leveraging the information that is encapsulated in the block-to-entity relationships. To this end, it first builds an abstract graph representation of the original set of blocks, with the nodes corresponding to entity profiles and the edges connecting the co-occurring ones. During the creation of this structure, all redundant comparisons are discarded, while the superfluous ones can be removed by pruning the edges with the lowest weight. We analytically examine both procedures, proposing a multitude of edge weighting schemes and graph pruning algorithms, as well as pruning criteria. Our approaches are schema-agnostic, thus accommodating any type of blocks. We evaluate their performance through a thorough experimental study over three large-scale, real-world datasets, with the outcomes verifying significant efficiency enhancements at a negligible cost in effectiveness.
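To make the graph construction and pruning concrete, here is a minimal sketch of one possible instantiation: edges are weighted by the number of blocks two entities share, and edges below the average weight are pruned. It is an illustrative simplification under assumed names, not the paper's exact weighting schemes or pruning algorithms:

```python
from collections import defaultdict
from itertools import combinations

def meta_blocking(blocks):
    """Build the blocking graph (nodes = entities, edge weight = number of
    common blocks), then prune edges whose weight falls below the average.
    Redundant comparisons collapse into a single weighted edge."""
    weights = defaultdict(int)
    for entity_ids in blocks.values():
        for a, b in combinations(sorted(entity_ids), 2):
            weights[(a, b)] += 1
    threshold = sum(weights.values()) / len(weights)
    return {pair: w for pair, w in weights.items() if w >= threshold}

blocks = {"john": {1, 2}, "smith": {1, 2}, "athens": {1, 2, 3}}
kept = meta_blocking(blocks)
```

Entities 1 and 2 co-occur in three blocks, so their edge carries weight 3 and survives pruning, while the weight-1 edges involving entity 3 are discarded; the surviving edges define the new, smaller set of comparisons.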