Entity Resolution

257 papers, 65 followers

About this topic

Entity Resolution is the process of identifying and merging records that refer to the same real-world entity across different data sources, ensuring data consistency and accuracy. It involves techniques from data cleaning, deduplication, and record linkage to resolve ambiguities and discrepancies in data representation.
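To make the definition concrete, the sketch below shows only the pairwise matching decision. Each record is reduced to a bag of lower-cased tokens, ignoring attribute names, and two records are declared a match when their Jaccard similarity reaches a threshold. The helper names, the sample records, and the 0.5 threshold are illustrative assumptions. Real systems add blocking, richer similarity functions, and clustering around this step.

```python
import re


def normalize(record: dict) -> set[str]:
    """Lower-case every attribute value and split it into alphanumeric tokens."""
    tokens: set[str] = set()
    for value in record.values():
        tokens.update(re.findall(r"[a-z0-9]+", str(value).lower()))
    return tokens


def jaccard(a: set[str], b: set[str]) -> float:
    """|a & b| / |a | b|, defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def is_match(r1: dict, r2: dict, threshold: float = 0.5) -> bool:
    """Hypothetical decision rule: match if token-level Jaccard >= threshold."""
    return jaccard(normalize(r1), normalize(r2)) >= threshold


# Attribute names differ between sources; the token bags still overlap.
r1 = {"name": "John A. Smith", "city": "New York"}
r2 = {"full_name": "Smith, John", "location": "new york"}
r3 = {"name": "Jane Doe", "city": "Boston"}
print(is_match(r1, r2))  # True  (Jaccard = 4/5)
print(is_match(r1, r3))  # False (no shared tokens)
```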

Key research themes

1. How can schema-agnostic and scalable blocking techniques improve entity resolution on heterogeneous and noisy big data?

This research theme focuses on developing blocking methods that do not require prior schema knowledge and can efficiently handle large, heterogeneous, and noisy datasets. Blocking is a critical step in entity resolution (ER) that partitions datasets into smaller blocks to reduce the quadratic comparison cost. Addressing schema heterogeneity and noise while maintaining blocking effectiveness and scalability is essential for processing Big Data ER tasks.
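A minimal sketch of schema-agnostic token blocking, assuming in-memory records with arbitrary attribute names: every token in any attribute value becomes a blocking key, so no schema alignment is required, and only records that share at least one key become candidate pairs. The data and IDs are made up. The papers below build on this basic idea with n-grams, LSH, attribute selection, or distributed execution.

```python
import re
from collections import defaultdict
from itertools import combinations


def token_blocks(records: dict[str, dict]) -> dict[str, set[str]]:
    """Map every token found in any attribute value to the IDs of records containing it."""
    blocks: dict[str, set[str]] = defaultdict(set)
    for rid, record in records.items():
        for value in record.values():
            for token in re.findall(r"[a-z0-9]+", str(value).lower()):
                blocks[token].add(rid)
    # A block holding a single record yields no comparisons, so drop it.
    return {key: ids for key, ids in blocks.items() if len(ids) > 1}


def candidate_pairs(blocks: dict[str, set[str]]) -> set[tuple[str, str]]:
    """Union of within-block pairs; a pair shared by several blocks is kept once."""
    pairs: set[tuple[str, str]] = set()
    for ids in blocks.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


records = {
    "a1": {"name": "Apple iPhone 13", "maker": "Apple"},
    "b7": {"title": "iphone 13 128GB"},
    "c3": {"product": "Galaxy S21", "brand": "Samsung"},
    "d9": {"label": "Samsung Galaxy S21 Ultra"},
}
pairs = candidate_pairs(token_blocks(records))
n = len(records)
print(sorted(pairs))                       # [('a1', 'b7'), ('c3', 'd9')]
print(len(pairs), "of", n * (n - 1) // 2)  # 2 of 6
```

Here only 2 of the 6 possible pairs are compared. Frequent tokens produce very large blocks, which is one reason the literature adds block purging or meta-blocking on top of plain token blocking.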

An Effective Entity Resolution Approach for Big Data

by Ali El-bastawissy

2022, International Journal of Innovative Technology and Exploring Engineering

Key finding: Proposes a novel schema-agnostic ER approach that treats entity attributes as bags of words and uses n-grams combined with Apache Spark for scalable processing. The method avoids complex schema alignment and meta-blocking...

Incremental Entity Blocking over Heterogeneous Streaming Data

by Tiago Brasileiro

2023, Information

Key finding: Introduces a schema-agnostic blocking technique that incrementally processes streaming, noisy, and heterogeneous data using distributed infrastructure. The approach applies attribute selection and top-n neighborhood...
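The incremental idea can be pictured as an inverted index that is updated as each record arrives: a new record is compared only with earlier records that share one of its tokens, and is then added to the index. The class below is a hypothetical single-process sketch (the name `IncrementalTokenBlocker` is made up); it omits the attribute selection, top-n neighborhood pruning, and distributed infrastructure that the paper describes.

```python
import re


class IncrementalTokenBlocker:
    """Inverted index token -> IDs of records seen so far, updated once per arriving record."""

    def __init__(self) -> None:
        self.index: dict[str, set[str]] = {}

    def add(self, rid: str, record: dict) -> set[str]:
        """Return IDs of earlier records sharing a token with this one, then index it."""
        tokens = {t for v in record.values() for t in re.findall(r"[a-z0-9]+", str(v).lower())}
        candidates: set[str] = set()
        for token in tokens:
            candidates |= self.index.get(token, set())
        for token in tokens:
            self.index.setdefault(token, set()).add(rid)
        return candidates


blocker = IncrementalTokenBlocker()
stream = [
    ("r1", {"name": "Maria Silva", "city": "Recife"}),
    ("r2", {"city": "Porto Alegre"}),
    ("r3", {"full_name": "silva maria"}),
]
for rid, rec in stream:
    print(rid, sorted(blocker.add(rid, rec)))
# r1 []
# r2 []
# r3 ['r1']
```

Each arrival costs work proportional to the blocks it touches rather than to the whole history, which is what makes the approach suitable for streams.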

A noise tolerant and schema-agnostic blocking technique for entity resolution

by Demetrio Mestre

2025, Proceedings of the 34th ACM/SIGAPP Symposium on Applied Computing

Key finding: Presents NA-BLOCKER, a novel noise-tolerant, schema-agnostic blocking technique using Locality Sensitive Hashing (LSH) to hash attribute values. NA-BLOCKER enhances block quality and effectiveness over the state-of-the-art...
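As a generic illustration of LSH-based blocking (not NA-BLOCKER's exact scheme), the sketch below computes MinHash signatures over character 3-grams and splits them into bands; two values share a block if any band matches, so values differing by a small typo still tend to collide. The 3-gram shingling, 16 bands of 2 rows, and seeded `hashlib.blake2b` hashing are illustrative choices.

```python
import hashlib
from collections import defaultdict


def shingles(text: str, k: int = 3) -> set[str]:
    """Character k-grams of the lower-cased, whitespace-normalized text."""
    s = " ".join(text.lower().split())
    return {s[i:i + k] for i in range(max(len(s) - k + 1, 1))}


def minhash(items: set[str], num_hashes: int) -> list[int]:
    """For each seeded hash function, the minimum hash value over the items."""
    signature = []
    for seed in range(num_hashes):
        salt = seed.to_bytes(8, "little")  # blake2b accepts salts of up to 16 bytes
        signature.append(min(
            int.from_bytes(hashlib.blake2b(x.encode(), digest_size=8, salt=salt).digest(), "big")
            for x in items))
    return signature


def lsh_blocks(values: dict[str, str], bands: int = 16, rows: int = 2) -> dict[tuple, set[str]]:
    """Band each signature; records sharing any (band index, band values) key share a block."""
    blocks: dict[tuple, set[str]] = defaultdict(set)
    for rid, text in values.items():
        sig = minhash(shingles(text), bands * rows)
        for b in range(bands):
            blocks[(b, tuple(sig[b * rows:(b + 1) * rows]))].add(rid)
    return {key: ids for key, ids in blocks.items() if len(ids) > 1}


values = {"r1": "Jonathan Smith", "r2": "Jonathon  Smith", "r3": "Maria Garcia"}
for key, ids in lsh_blocks(values).items():
    print("band", key[0], sorted(ids))
```

With b bands of r rows, a pair whose 3-gram Jaccard similarity is s shares at least one band with probability 1 - (1 - s^r)^b. For the values above (s = 0.6, r = 2, b = 16) that is about 0.999, while for s = 0.1 it drops to about 0.15.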

2. What are effective distributed and clustering-based methods for scalable multi-source entity resolution?

This theme investigates methods that use distributed computing frameworks and clustering algorithms to tackle ER involving multiple heterogeneous data sources. By focusing on clustering to group matching entities across many datasets and exploiting parallel processing platforms like Apache Flink and Apache Spark, these approaches aim to improve scalability and integration quality in multi-source ER scenarios.

Comparative Evaluation of Distributed Clustering Schemes for Multi-source Entity Resolution

by Alieh Saeedi

2022

Key finding: Implements distributed versions of six clustering algorithms on Apache Flink for multi-source ER, demonstrating that clustering-based approaches improve match quality and scalability by grouping related entities across...
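Clustering turns pairwise match decisions into entities. The simplest scheme, and a common baseline, is connected components: every matched pair is an edge, and each component is one entity. The union-find sketch below illustrates only that baseline on a single machine; it does not reproduce the six distributed algorithms evaluated on Apache Flink in the paper, and the source-prefixed record IDs are made up.

```python
def connected_components(ids: list[str], matches: list[tuple[str, str]]) -> list[set[str]]:
    """Group record IDs into clusters, treating every matched pair as an undirected edge."""
    parent = {i: i for i in ids}

    def find(x: str) -> str:
        # Path halving keeps the union-find trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in matches:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    clusters: dict[str, set[str]] = {}
    for i in ids:
        clusters.setdefault(find(i), set()).add(i)
    return list(clusters.values())


# IDs are "<source>:<local id>"; matches come from a pairwise matcher.
ids = ["s1:a", "s2:a", "s3:a", "s1:b", "s2:b", "s3:c"]
matches = [("s1:a", "s2:a"), ("s2:a", "s3:a"), ("s1:b", "s2:b")]
print(sorted(sorted(c) for c in connected_components(ids, matches)))
# [['s1:a', 's2:a', 's3:a'], ['s1:b', 's2:b'], ['s3:c']]
```

Because connected components is transitive, a single false-positive edge can merge two otherwise separate entities; more selective clustering schemes trade some recall to avoid that.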

Three-dimensional Entity Resolution with JedAI

by Sonia BERGAMASCHI

2025, Information Systems

Key finding: The JedAI system facilitates building end-to-end ER pipelines combining schema-awareness, budget-awareness, and execution mode dimensions, supporting schema-agnostic and schema-based blocking/matching. It offers both serial and...

Scaling Up Record-level Matching Rules

by Sonia BERGAMASCHI

2025, SEBD

Key finding: Proposes RulER, a method to efficiently execute complex record-level matching rules combining multiple similarity predicates on distributed MapReduce-like systems. It enables parallel and distributed processing of similarity...
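A record-level matching rule combines several attribute-level similarity predicates with Boolean operators. The sketch below evaluates one hypothetical rule for a single pair of records, to show the kind of rule RulER is designed to execute at scale. The attribute names, thresholds, and the use of `difflib.SequenceMatcher.ratio` as a stand-in for a normalized edit similarity are assumptions; none of the paper's distributed execution is shown.

```python
from difflib import SequenceMatcher


def jaccard_tokens(a: str, b: str) -> float:
    """Jaccard similarity of lower-cased, whitespace-separated tokens."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def seq_ratio(a: str, b: str) -> float:
    """difflib similarity ratio in [0, 1], used here as an edit-based similarity."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def rule(r1: dict, r2: dict) -> bool:
    """Hypothetical rule: (JS(title) >= 0.6 AND sim(authors) >= 0.8) OR identical titles."""
    return (
        (jaccard_tokens(r1["title"], r2["title"]) >= 0.6
         and seq_ratio(r1["authors"], r2["authors"]) >= 0.8)
        or r1["title"].strip().lower() == r2["title"].strip().lower()
    )


p1 = {"title": "Scaling up record-level matching rules", "authors": "J. Doe, A. Roe"}
p2 = {"title": "Scaling Up Record-level Matching Rules (extended)", "authors": "J. Doe and A. Roe"}
p3 = {"title": "Matching rules for product data", "authors": "J. Doe"}
print(rule(p1, p2))  # True  (title Jaccard 5/6, author ratio about 0.84)
print(rule(p1, p3))  # False (title Jaccard 2/8, titles differ)
```

Evaluating cheap predicates first and short-circuiting the rest is a natural optimization for such rules; how RulER schedules predicates is described in the paper, not here.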

3. How can multi-type or graph-based entity representations improve unsupervised entity resolution and disambiguation?

This research area explores leveraging graph structures and multi-type entity models for unsupervised entity resolution and named entity disambiguation. Techniques focus on jointly resolving entities of different types by summarizing multi-typed RDF graphs, exploiting relational context, and applying graph-based semantic relatedness for disambiguating entities without relying on supervised learning.

Unsupervised Entity Resolution on Multi-type Graphs

by Linhong Zhu

2018

Key finding: Formulates ER as a multi-type graph summarization problem, jointly clustering nodes of different types that represent the same entity, and inferring the importance of relations between entity types. The approach outperforms...

Cultural Knowledge for Named Entity Disambiguation: A Graph-Based Semantic Relatedness Approach

by Ziqi Zhang

2023, Serdica Journal of Computing

Key finding: Proposes a graph-based random walk semantic relatedness method over Wikipedia that models only named entities and their contextual links for Named Entity Disambiguation (NED). The approach achieves state-of-the-art accuracy...

Probabilistic Entity Linkage for Heterogeneous Information Spaces

by Ekaterini Ioannou

2024, Lecture Notes in Computer Science

Key finding: Introduces a Bayesian network-based algorithm for probabilistic entity linkage that models evidence and its interdependencies from heterogeneous information spaces. The method supports incremental update of matching...

All papers in Entity Resolution

Duplicate Record Detection: A Survey

2007, IEEE Transactions on Knowledge and Data Engineering

Often, in the real world, entities have two or more representations in databases. Duplicate records do not share a common key and/or they contain errors that make duplicate matching a difficult task. Errors are introduced as the result of...

Frameworks for entity matching: A comparison

by Erhard Rahm

2010, Data & Knowledge Engineering

Entity matching is a crucial and difficult task for data integration. Entity matching frameworks provide several methods and their combination to effectively solve different match tasks. In this paper, we comparatively analyze 11 proposed...

Evaluation of entity resolution approaches on real-world match problems

by Erhard Rahm

2010, Proceedings of the VLDB Endowment

Despite the huge amount of recent research efforts on entity resolution (matching) there has not yet been a comparative evaluation on the relative effectiveness and efficiency of alternate approaches. We therefore present such an...
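Comparative evaluations of ER approaches commonly score predicted matches against a gold standard with precision, recall, and F-measure over record pairs. The sketch below shows that pair-based computation; the gold and predicted pairs are made up, and the paper's own datasets and results are not reproduced.

```python
def pair_metrics(predicted: set[frozenset], gold: set[frozenset]) -> tuple[float, float, float]:
    """Pair-level precision, recall, and F1; pairs are unordered, hence frozensets."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


gold = {frozenset(p) for p in [("a", "b"), ("c", "d"), ("e", "f"), ("g", "h")]}
predicted = {frozenset(p) for p in [("b", "a"), ("c", "d"), ("a", "c")]}
print(", ".join(f"{m:.3f}" for m in pair_metrics(predicted, gold)))  # 0.667, 0.500, 0.571
```

Using `frozenset` makes ("b", "a") and ("a", "b") the same pair, so the two true positives are counted regardless of order.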

Entity resolution with iterative blocking

by Georgia Koutrika

2009, Proceedings of the 35th SIGMOD international conference on Management of data - SIGMOD '09

Entity Resolution (ER) is the problem of identifying which records in a database refer to the same real-world entity. An exhaustive ER process involves computing the similarities between pairs of records, which can be very expensive for...

A taxonomy of privacy-preserving record linkage techniques

by Dinusha Vatsalan, P. Christen, and V.S. Verykios

2012

Please cite this article as: D. Vatsalan et al., A taxonomy of privacy-preserving record linkage techniques, Information Systems (2013), http://dx.

Fig. 1. Outline of the general record linkage process as discussed in detail in Section 3.

The second step in record linkage is indexing [19], which is aimed at reducing the number of comparisons that need to be conducted between records by removing as many record pairs as possible that are unlikely to correspond to matches [17]. Only pairs that are potentially matching, the so-called 'candidate record pairs' among which we expect to find matches, are brought together to be compared in detail in the next step, the...

Fig. 2. Outline of the general privacy-preserving record linkage process as described in Section 4.

Fig. 3. The 15 dimensions used to characterize privacy-preserving record linkage techniques. Abbreviations shown in brackets are those used in Table 1.

The evaluation of linkage quality in a privacy-preserving context is challenging, because in PPRL access to the actual record values is unlikely to be possible as this would reveal private or confidential information about these records. How to evaluate linkage quality using any of the measures...

In this section we describe a taxonomy for PPRL techniques. Our aim in developing this taxonomy is to provide a clearer picture of current approaches to PPRL, and to identify gaps in these techniques which will help us to identify directions for future research. We describe 15 dimensions of PPRL which we categorize into five main...

Fig. 4. Secure hash encoding for exact matching as used by Van Eycken et al. [126], Weber et al. [76], and Quantin et al. [86-89].
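In the hash-encoding scheme of Fig. 4, each party normalizes its values and shares only one-way hash codes, so exact matches can be found by comparing codes. The sketch below assumes a keyed HMAC-SHA256 with a secret shared by the database owners (a common hardening against dictionary attacks on unkeyed hashes); the key, the normalization, and the `|`-joined field concatenation are assumptions, not the exact encodings of the cited works.

```python
import hashlib
import hmac

SHARED_KEY = b"hypothetical-secret-agreed-by-both-parties"


def encode(*fields: str) -> str:
    """Normalize fields, join them with '|', and return a keyed SHA-256 hex digest."""
    normalized = "|".join(" ".join(f.lower().split()) for f in fields)
    return hmac.new(SHARED_KEY, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


# Each party encodes locally and exchanges only the codes.
party_a = {encode("Peter", "Smith", "1970-01-02"): "A-17"}
party_b = {encode("peter ", "SMITH", "1970-01-02"): "B-03",
           encode("Petra", "Smith", "1970-01-02"): "B-04"}
matches = [(party_a[c], party_b[c]) for c in party_a.keys() & party_b.keys()]
print(matches)  # [('A-17', 'B-03')]
```

This only supports exact matching, as the figure caption says: "Petra" and "Peter" yield unrelated codes, so any typo that survives normalization causes a missed match.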

Kan08: A multi-party approach based on a generalization technique (k-anonymity) for person-specific biomedical data was introduced by Kantarcioglu et al. [108] in 2008. This approach performs efficient secure joins of encrypted databases by a third party without decrypting or inferring the contents of the joined records. It is guaranteed that each record can be linked to no less than...

Fig. 6. k-Anonymized records (k=2) as used by Kantarcioglu et al. [108], Inan et al. [74], and Mohammed et al. [109].
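k-anonymity requires that every combination of quasi-identifier values in a released table be shared by at least k records, as in the k = 2 records of Fig. 6. The check below is a generic sketch of that definition; the column names and generalized values (age ranges, truncated postcodes) are hypothetical.

```python
from collections import Counter


def is_k_anonymous(rows: list[dict], quasi_identifiers: list[str], k: int) -> bool:
    """True if each combination of quasi-identifier values occurs in at least k rows."""
    counts = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return all(c >= k for c in counts.values())


generalized = [
    {"age": "30-39", "zip": "130**", "diagnosis": "flu"},
    {"age": "30-39", "zip": "130**", "diagnosis": "asthma"},
    {"age": "40-49", "zip": "148**", "diagnosis": "flu"},
    {"age": "40-49", "zip": "148**", "diagnosis": "diabetes"},
]
print(is_k_anonymous(generalized, ["age", "zip"], k=2))  # True
print(is_k_anonymous(generalized, ["age", "zip"], k=3))  # False
```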

Dur10: Durham et al. [113] in 2010 adopted Schnell et al.'s Bloom filters approach [114] in their work to evaluate three different PPRL approaches. They investigated deterministic classification techniques for exact comparison, probabilistic...

Fig. 7. Bloom filter mapping as used by Schnell et al. [114], Karakasidis et al. [115], and Durham et al. [113,128].
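In the Bloom filter encoding of Schnell et al. that Durham et al. adopted, each value is split into character bigrams, every bigram is hashed by several hash functions into a bit array, and encodings are compared with the Dice coefficient, so similar strings yield similar bit patterns. The sketch below follows that outline with double hashing over keyed SHA-1 and MD5. The filter length, number of hash functions, space padding, and keys are assumptions, and the bit array is represented as the set of positions set to 1.

```python
import hashlib
import hmac

L_BITS = 500    # filter length (assumed)
K_HASHES = 10   # hash functions per bigram (assumed)
KEY1, KEY2 = b"hypothetical-key-1", b"hypothetical-key-2"


def bigrams(value: str) -> set[str]:
    """Character bigrams of the value, padded with one space at each end."""
    s = f" {value.lower().strip()} "
    return {s[i:i + 2] for i in range(len(s) - 1)}


def bloom_encode(value: str) -> set[int]:
    """Positions set to 1, via double hashing (f + i*g) mod L_BITS."""
    bits: set[int] = set()
    for gram in bigrams(value):
        data = gram.encode("utf-8")
        f = int.from_bytes(hmac.new(KEY1, data, hashlib.sha1).digest(), "big")
        g = int.from_bytes(hmac.new(KEY2, data, hashlib.md5).digest(), "big")
        for i in range(K_HASHES):
            bits.add((f + i * g) % L_BITS)
    return bits


def dice(a: set[int], b: set[int]) -> float:
    """Dice coefficient 2|A & B| / (|A| + |B|) over the sets of 1-bits."""
    return 2 * len(a & b) / (len(a) + len(b)) if a or b else 0.0


print(round(dice(bloom_encode("smith"), bloom_encode("smyth")), 2))
print(round(dice(bloom_encode("smith"), bloom_encode("jones")), 2))
```

"smith" and "smyth" share four of their six bigrams, so their filters overlap heavily, while "smith" and "jones" share no bigrams and overlap only through hash collisions.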

Fig. 8. Secure edit-distance for PPRL as proposed by Atallah et al. [129].

Rav04: In 2004, Ravikumar et al. [130] used secure multi-party computation (SMC) techniques for secure computation of several distance functions. In their work, they presented methods for approximate comparison of values using string distance metrics, specifically TF-IDF, SoftTF-IDF and the Euclidean distance. They use a secure stochastic dot product protocol for secure computation of these distance metrics. The protocol is developed in the setting of two parties with an honest-but-curious (HBC) adversary model. The use of SMC computations for achieving privacy makes the protocol computationally intensive. To overcome this drawback, they use sampling techniques to control the amount of communication between the two parties. Experiments on the publicly available Cora bibliographic dataset [13] showed high linkage quality with average precision of 0.85 after 1000 samples.

Fig. 10. Blocking aware private record linkage using hash signatures (HS) as proposed by Al-Lawati et al. [72]. F is an array of floating-point numbers containing TF-IDF weights.

Fig. 12. Reference value based similarity calculation as used by Pang et al. [103]. With ED being the edit distance function described in Section 3.2, the triangle inequality holds: ED(‘pete’, ‘pedro’) < ED(‘pete’, ‘peter’) + ED(‘pedro’, ‘peter’).
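The distances in the Fig. 12 example can be checked with the standard Levenshtein dynamic program, assuming ED uses unit-cost insertions, deletions, and substitutions: ED(‘pete’, ‘pedro’) = 3, ED(‘pete’, ‘peter’) = 1, and ED(‘pedro’, ‘peter’) = 3, so 3 < 1 + 3. The sketch also prints the lower bound from the same inequality, which shows how distances to a shared reference value bracket the distance between two parties' values.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


d_ab = levenshtein("pete", "pedro")
d_ar = levenshtein("pete", "peter")    # distance to the reference value 'peter'
d_br = levenshtein("pedro", "peter")
print(d_ab, d_ar, d_br)                          # 3 1 3
print(abs(d_ar - d_br) <= d_ab <= d_ar + d_br)   # True
```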

Fig. 11. Value generalization hierarchies as used by Inan et al. [74] and Mohammed et al. [109].

Characterization of the privacy-preserving record linkage techniques surveyed in Section 6.

Framework for evaluating clustering algorithms in duplicate detection

by Oktie Hassanzadeh

2009, Proceedings of the …

Data Cleaning and Query Answering with Matching Dependencies and Matching Functions

by Laks Lakshmanan

2013, Theory of Computing Systems

Matching dependencies were recently introduced as declarative rules for data cleaning and entity resolution. Enforcing a matching dependency on a database instance identifies the values of some attributes for two tuples, provided that the...

EXAMPLE 2. Consider the set of MDs $\Sigma$ consisting of $\varphi_1\colon R[A] \approx R[A] \rightarrow R[B] \rightleftharpoons R[B]$ and $\varphi_2\colon R[B,C] \approx R[B,C] \rightarrow R[D] \rightleftharpoons R[D]$. The similarities are: $a_1 \approx a_2$, $b_2 \approx b_3$, $c_2 \approx c_3$. Instance $D_0$ below is not a stable instance, i.e., it does not satisfy $\varphi_1, \varphi_2$. We start by enforcing $\varphi_1$ on $D_0$. Let $\langle b_1, b_2 \rangle$ in instance $D_1$ denote the value that replaces $b_1$ and $b_2$ to enforce $\varphi_1$ on instance $D_0$, and assume that $\langle b_1, b_2 \rangle \approx b_3$. Now, $(D_0, D_1) \models \varphi_1$. However, $(D_0, D_1) \not\models \varphi_2$.

EXAMPLE 9. (Example 8 continued.) By assuming that old similarities hold after applying matching functions (e.g., $\langle b_1, b_2 \rangle \approx b_3$), we obtain the $(D_0, \Sigma)$-over-clean instance $D^{+}$ shown below. Notice that for the two $(D_0, \Sigma)$-clean instances $D_2, D_3$ in Example 4 we have $D_2 \sqsubseteq D^{+}$ and $D_3 \sqsubseteq D^{+}$. If we pose the query $Q\colon \pi_C(\sigma_{A=a_2} R)$ to $D^{+}$, we obtain $Q(D^{+}) = \{c_1, c_2, c_3\}$. Observe that $\mathit{Poss}_Q(D_0) = \{c_1, c_2, c_3\}$, and thus $Q(D^{+})$ provides an over-approximation for $\mathit{Poss}_Q(D_0)$. It can be seen that an arbitrary $(D_0, \Sigma)$-clean instance, say $D_2$, may not provide a complete approximation to possible answers since $\mathit{Poss}_Q(D_0) \not\subseteq Q(D_2) = \{c_1, c_2\}$.

Search engine driven author disambiguation

by Min-Yen Kan

2006, Proceedings of the 6th ACM/IEEE-CS joint …

In scholarly digital libraries, author disambiguation is an important task that attributes a scholarly work with specific authors. This is critical when individuals share the same name. We present an approach to this task that analyzes...

Parallel sorted neighborhood blocking with mapreduce

by Erhard Rahm

2010

Cloud infrastructures enable the efficient parallel execution of data-intensive tasks such as entity resolution on large datasets. We investigate challenges and possible solutions of using the MapReduce programming model for parallel...
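The sorted neighborhood method, which this paper parallelizes with MapReduce, sorts records by a blocking key and compares only records that fall within a sliding window of w positions. The sequential sketch below illustrates the method itself; the blocking key (first three letters of the surname plus birth year), the window sizes, and the records are assumptions, and the MapReduce partitioning is not shown.

```python
from typing import Callable


def sorted_neighborhood(records: dict[str, dict], key: Callable[[dict], str],
                        window: int = 3) -> set[tuple[str, str]]:
    """Sort record IDs by key(record); pair each record with the next window - 1 records."""
    order = sorted(records, key=lambda rid: key(records[rid]))
    pairs: set[tuple[str, str]] = set()
    for i, rid in enumerate(order):
        for other in order[i + 1:i + window]:
            pairs.add(tuple(sorted((rid, other))))
    return pairs


def blocking_key(r: dict) -> str:
    """Hypothetical key: first three letters of the surname followed by the birth year."""
    return r["surname"][:3].lower() + str(r["year"])


people = {
    "1": {"surname": "Miller", "year": 1980},
    "2": {"surname": "Millar", "year": 1980},
    "3": {"surname": "Adams", "year": 1975},
    "4": {"surname": "Milner", "year": 1980},
    "5": {"surname": "Zhang", "year": 1990},
}
print(sorted(sorted_neighborhood(people, blocking_key, window=2)))
# [('1', '2'), ('1', '3'), ('2', '4'), ('4', '5')]
```

With `window=2`, Miller/Milner (`'1'`, `'4'`) is missed because Millar sorts between them; a window of 3 recovers the pair at the cost of more comparisons.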

Efficient entity resolution for large heterogeneous information spaces

by George Papadakis

2011, Proceedings of the fourth ACM international conference on Web search and data mining - WSDM '11

We have recently witnessed an enormous growth in the volume of structured and semi-structured data sets available on the Web. An important prerequisite for using and combining such data sets is the detection and merge of information that...

Creating probabilistic databases from duplicated data

by Oktie Hassanzadeh

2009, … Journal on Very Large Data Bases

MOMA - A Mapping-based Object Matching System

by Erhard Rahm

2007, Conference on Innovative Data Systems Research

Object matching or object consolidation is a crucial task for data integration and data cleaning. It addresses the problem of identifying object instances in data sources referring to the same real-world entity. We propose a flexible...

Unifying Logical and Statistical AI

by Stanley Kok

Intelligent agents must be able to handle the complexity and uncertainty of the real world. Logical AI has focused mainly on the former, and statistical AI on the latter. Markov logic combines the two by attaching weights to first-order...

Figure 1: Ground Markov network obtained by applying an MLN containing the formulas $\forall x\, \mathit{Smokes}(x) \Rightarrow \mathit{Cancer}(x)$ and $\forall x \forall y\, \mathit{Friends}(x, y) \Rightarrow (\mathit{Smokes}(x) \Leftrightarrow \mathit{Smokes}(y))$ to the constants Anna (A) and Bob (B).
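In Markov logic, the probability of a world is proportional to exp(Σ_i w_i n_i), where n_i is the number of true groundings of formula i and w_i its weight. For the two formulas and two constants of Figure 1, the ground network has eight atoms, so the distribution can be computed by brute-force enumeration of all 2^8 worlds, as in the sketch below. The weights 1.5 and 1.1 are illustrative assumptions; practical MLN systems use approximate inference rather than enumeration.

```python
import itertools
import math

PEOPLE = ["A", "B"]               # the constants Anna (A) and Bob (B)
W_SMOKES_IMPLIES_CANCER = 1.5     # assumed weight
W_FRIENDS_SMOKE_ALIKE = 1.1       # assumed weight

ATOMS = ([("Smokes", x) for x in PEOPLE] + [("Cancer", x) for x in PEOPLE]
         + [("Friends", x, y) for x in PEOPLE for y in PEOPLE])


def score(world: dict) -> float:
    """Sum over formulas of weight * number of true groundings in this world."""
    # Smokes(x) => Cancer(x)
    n1 = sum(1 for x in PEOPLE if not world[("Smokes", x)] or world[("Cancer", x)])
    # Friends(x, y) => (Smokes(x) <=> Smokes(y))
    n2 = sum(1 for x in PEOPLE for y in PEOPLE
             if not world[("Friends", x, y)] or world[("Smokes", x)] == world[("Smokes", y)])
    return W_SMOKES_IMPLIES_CANCER * n1 + W_FRIENDS_SMOKE_ALIKE * n2


# All 256 truth assignments to the ground atoms, with unnormalized weights exp(score).
WORLDS = [dict(zip(ATOMS, values))
          for values in itertools.product([False, True], repeat=len(ATOMS))]
WEIGHTS = [math.exp(score(w)) for w in WORLDS]


def prob(query, evidence=lambda w: True) -> float:
    """P(query | evidence) as a ratio of summed weights; the partition function cancels."""
    num = sum(wt for w, wt in zip(WORLDS, WEIGHTS) if evidence(w) and query(w))
    den = sum(wt for w, wt in zip(WORLDS, WEIGHTS) if evidence(w))
    return num / den


def cancer_a(w: dict) -> bool:
    return w[("Cancer", "A")]


def smokes_a(w: dict) -> bool:
    return w[("Smokes", "A")]


print(round(prob(cancer_a, smokes_a), 3))                   # 0.818
print(round(prob(cancer_a, lambda w: not smokes_a(w)), 3))  # 0.5
```

Cancer(A) appears only in the grounding Smokes(A) ⇒ Cancer(A), so given Smokes(A) its probability is e^1.5 / (1 + e^1.5), and given ¬Smokes(A) the formula is satisfied either way, giving 0.5.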

Entity Matching in Online Social Networks

by Michael Fire et al.

SocialCom 2013

In recent years, Online Social Networks (OSNs) have essentially become an integral part of our daily lives. There are hundreds of OSNs, each with its own focus and each offering particular services and functionalities. To take...

Qualitative effects of knowledge rules and user feedback in probabilistic data integration

by Maurice Van Keulen

2009, The VLDB Journal

DAMIA - A Data Mashup Fabric for Intranet Applications

by Ashutosh Singh

2007

Damia is a lightweight enterprise data integration service where line of business users can create and catalog high value data feeds for consumption by situational applications. Damia is inspired by the Web 2.0 mashup phenomenon. It...

Figure 1: Architecture of the Damia server.

This section provides an overview of the Damia system by briefly describing its main components, which are depicted in Figure 1.

2.1 User interface

available as web services. Damia provides operators to extract information from sequences (Extract), to filter tuples (Filter), to iterate over items in a sequence (Iterate), to construct a new sequence from other sequences (Construct), as well as operators to join (Fuse), sort (Sort), aggregate (Group), and perform other sophisticated operations over the sequence data.

In this scenario, an insurance agent wants to know which of his home-owner insurance customers are at risk because of a storm.

Figure 4: Spreadsheet with policy holders

Benchmarking declarative approximate selection predicates

by Oktie Hassanzadeh

2007, Proceedings of the …

A generic Web-based entity resolution framework

by Laender Alves

2011, Journal of The American Society for Information Science and Technology

Web data repositories usually contain references to thousands of real-world entities from multiple sources. It is not uncommon that multiple entities share the same label (polysemes) and that distinct label variations are associated with...

Just Add Weights: Markov Logic for the Semantic Web

by Stanley Kok

2008, Lecture Notes in Computer Science

In recent years, it has become increasingly clear that the vision of the Semantic Web requires uncertain reasoning over rich, first-order representations. Markov logic brings the power of probabilistic modeling to first-order logic by...

Efficient Spectral Neighborhood Blocking for Entity Resolution

by Liangcai Shu

research.google.com

In many telecom and web applications, there is a need to identify whether data objects in the same source or different sources represent the same entity in the real world. This problem arises for subscribers in multiple services,...

by Peter Christen et al.

2009, … of the 18th ACM conference on …

Entity resolution, also known as data matching or record linkage, is the task of identifying and matching records from several databases that refer to the same entities. Traditionally, entity resolution has been applied in batch-mode and...

Figure 3: Similarity-aware index resulting from the example records from Figure 1. The similarity index is shown in the top left, the block index in the middle right, and the record identifier index at the bottom.

Figure 4: Summary experimental results: Build time (left); memory usage (middle); and average query time per record (right). Note that all three graphs are shown with a logarithmic y-axis scale.

Figure 5: Query matching accuracy for the full test data set for varying number of modifications per record. Similar accuracy results were achieved for the smaller test data sets.

Figure 7: Proportion of case 1 (query attribute value is available in similarity-aware index) to case 1 plus case 2 (new unknown attribute value) for varying number of modifications per record.

Figure 1: Example records with surname values and their Soundex encodings, used to illustrate the two index approaches in Figures 2 and 3.

Table 1: Characteristics of the data set used for experiments.
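Figure 1 of this paper illustrates its indexes with surname values and their Soundex encodings. Soundex maps names that sound alike to the same four-character code, which makes it a common blocking key for names. The sketch below implements the widely used American Soundex variant, including the rule that H and W do not separate letters with the same code; which variant the paper uses is an assumption, and the names are illustrative.

```python
CODES = {**dict.fromkeys("bfpv", "1"), **dict.fromkeys("cgjkqsxz", "2"),
         **dict.fromkeys("dt", "3"), "l": "4", **dict.fromkeys("mn", "5"), "r": "6"}


def soundex(name: str) -> str:
    """American Soundex: first letter plus three digits, zero-padded."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for c in letters[1:]:
        if c in "hw":
            continue  # H and W are skipped without resetting prev
        code = CODES.get(c, "")
        if code and code != prev:
            result += code
            if len(result) == 4:
                break
        prev = code  # vowels reset prev to "", so a repeated code after a vowel is kept
    return result.ljust(4, "0")


print(soundex("Smith"), soundex("Smyth"), soundex("Robert"), soundex("Rupert"))
# S530 S530 R163 R163
```

Grouping records by this code places "Smith" and "Smyth" in the same block even though the strings differ.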

HIL

by Georgia Koutrika

2013, Proceedings of the 16th International Conference on Extending Database Technology - EDBT '13

We introduce HIL, a high-level scripting language for entity resolution and integration. HIL aims at providing the core logic for complex data processing flows that aggregate facts from large collections of structured or unstructured data...

Figure 4: Integration times in the Financial Scenario

Figure 5: Performance of HIL fusion over Twitter data: (a) effect of co-group optimization, (b) total fusion time.

Table 1: Characteristics of the SEC data to integrate.

Comparative evaluation of entity resolution approaches with FEVER

by Erhard Rahm

2009, Proceedings of the VLDB Endowment

We present FEVER, a new evaluation platform for entity resolution approaches. The modular structure of the FEVER framework supports the incorporation or reconstruction of many previously proposed approaches for entity resolution. A...

Linking Entity Resolution and Risk

by German G Creamer

2010, Eastern Economic Journal

A major emerging problem among consumer finance institutions is that customers that are not well recognized might be riskier than customers that are fully recognized. Fortunately, financial institutions can count on external vendors...

A Self-Verifying Clustering Approach to Unsupervised Matching of Product Titles

by Leonidas Akritidis

2020, Artificial Intelligence Review

The continuous growth of the e-commerce industry has rendered the problem of product retrieval particularly important. As more enterprises move their activities to the Web, the volume and the diversity of the product-related information...

AI and Global Science and Technology Assessment

by Chaomei Chen

2009, IEEE Expert / IEEE Intelligent Systems

Addressing the research opportunities we've identified could substantially broaden the spectrum of multilingual text-mining and its practicality for supporting global S&T knowledge management. These opportunities also share a common set...

Towards scalable real-time entity resolution using a similarity-aware inverted index approach

by Peter Christen et al.

2008, Proceedings of AusDM

Most research into entity resolution (also known as record linkage or data matching) has concentrated on the quality of the matching results. In this paper, we focus on matching time and scalability, with the aim to achieve large-scale...

The missing links

by George Papadakis

2010, Proceedings of the 12th International Conference on Information Integration and Web-based Applications & Services - iiWAS '10

The Semantic Web is constantly gaining momentum, as more and more Web sites and content providers adopt its principles. At the core of these principles lies the Linked Data movement, which demands that data on the Web shall be annotated...

In order to evaluate the vector-based approach to URI similarity, we consider the following setting: given a query $q\colon (u_i = u_j)$, the system will respond yes if $u_j$ appears in the top-k matching candidates of $u_i$, otherwise it will respond no. Table 4 presents the effectiveness of this approach based on the inverted index of tokens of the URIs in ManuallySplit. We can see that Precision is high, because it is unlikely that the system replies “yes” for URIs that do not match; they always have a rather low TF-IDF similarity. Recall, on the other hand, is generally low and increases only when raising the number k of retrieved URIs.

Table 4: Effectiveness of the baseline for various values of threshold k on ManuallySplit.

We applied the method using the PI(S) form on the other two, much larger ground-truths, namely SameAs and IFP. Table 5 summarizes the effectiveness measures. We notice that Recall is higher for SameAs, while Precision lies at the same level for both datasets. The difference in the Recall should be expected, due to the presence of machine-generated, straightforward links between resources in the SameAs dataset. Contrariwise, the IFP dataset contains statements that are mostly manually generated, and thus more difficult to be identified. The fairly competitive recall of 0.66 verifies the robustness of our approach as well as its applicability to URIs that have not been mapped automatically.

Effective Unsupervised Matching of Product Titles with k-Combinations and Permutations

by Leonidas Akritidis

2018, Proceedings of the 14th IEEE International Conference on Innovations in Intelligent Systems and Applications (INISTA)

The problem of matching product titles is of particular interest for both users and marketers. The former frequently search the Web with the aim of comparing prices and characteristics, or obtaining and aggregating information provided...

Declarative entity resolution via matching dependencies and answer set programs

by Laks Lakshmanan et al.

2012

Entity resolution (ER) is an important and common problem in data cleaning. It is about identifying and merging records in a database that represent the same real-world entity. Recently, matching dependencies (MDs) have been introduced...

Declarative analysis of noisy information networks

by Walaa Eldin Moustafa

2011

There is a growing interest in methods for analyzing data describing networks of all types, including information, biological, physical, and social networks. Typically the data describing these networks is observational, and thus noisy...

A Hybrid Model Words-Driven Approach for Web Product Duplicate Detection

by Flavius Frasincar

2013, Lecture Notes in Computer Science

The detection of product duplicates is one of the challenges that Web shop aggregators are currently facing. In this paper, we focus on solving the problem of product duplicate detection on the Web. Our proposed method extends a...

To compare or not to compare

by George Papadakis

2011, Proceedings of the International Workshop on Semantic Web Information Management - SWIM '11

Blocking methods are crucial for making the inherently quadratic task of Entity Resolution more efficient. The blocking methods proposed in the literature rely on the homogeneity of data and the availability of binding schema information;...

Fast and accurate incremental entity resolution relative to an entity knowledge base

by Aamod Sane

2012, Proceedings of the 21st ACM international conference on Information and knowledge management - CIKM '12

User-facing topical web applications such as events or shopping sites rely on large collections of data records about real-world entities that are updated at varying latencies ranging from days to seconds. For example, event venue details...

Learning-based entity resolution with MapReduce

by Lars Kolb et al.

2011, Proceedings of the third international workshop on Cloud data management - CloudDB '11

Entity resolution is a crucial step for data quality and data integration. Learning-based approaches show high effectiveness at the expense of poor efficiency. To reduce the typically high execution times, we investigate how learning-based...

Scaling multiple-source entity resolution using statistically efficient transfer learning

by Benjamin Rubinstein

2012, Proceedings of the 21st ACM international conference on Information and knowledge management - CIKM '12

We consider a serious, previously-unexplored challenge facing almost all approaches to scaling up entity resolution (ER) to multiple data sources: the prohibitive cost of labeling training data for supervised learning of similarity scores...

Extraction and Compilation of Events and Sub-events from Twitter

by Lipika Dey

2012, 2012 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology

Twitter has emerged as a great source to provide insights about upcoming planned and unplanned events of social, economic and political relevance. Big events are publicized and known in advance, but smaller, unplanned sub-events around...

Addressing mobile information overload in the universal inbox through lenses

by Christopher Paretti

2010, Proceedings of the 12th international conference on Human computer interaction with mobile devices and services - MobileHCI '10

Increasingly, smartphones are being used to access all manner of information: email messages, Facebook status updates, tweets, RSS feeds, photographs and more. Approaches to dealing with this multi-faceted information stream developed on...

Accuracy of approximate string joins using grams

by Oktie Hassanzadeh

Proc. of the International Workshop on …

Web-based affiliation matching

by Erhard Rahm et al.

2009

Authors of scholarly publications state their affiliation in various forms. This kind of heterogeneity makes bibliographic analysis tasks on institutions impossible unless a comprehensive cleaning and consolidation of affiliation data is...

Merging data sources based on semantics, contexts and trust

by Dejan Lavbič

An Entity-Driven Recursive Neural Network Model for Chinese Discourse Coherence Modeling

by International Journal of Artificial Intelligence (IJAIA)

Chinese discourse coherence modeling remains a challenging task in the Natural Language Processing field. Existing approaches mostly focus on the need for feature engineering, which adopts sophisticated features to capture the logic or syntactic...

A Link-Based Approach to Entity Resolution in Social Networks

by Computer Science & Information Technology (CS & IT) Computer Science Conference Proceedings (CSCP)

Social networks initially had been places for people to contact each other, find friends or new acquaintances. As such, they have always proved interesting for machine-aided analysis. Recent developments, however, pivoted social networks to being...

Trio-ER: The Trio System as a Workbench for Entity-Resolution

by Parag Agrawal

Entity Identification in Documents Expressing Shared Relationships

by JaMia Moore

2007

This paper addresses the problem of entity identification in documents in which key identity attributes are missing. The most common approach is to take a single entity reference and determine the "best match" of its attributes to a set...

Tractable vs. Intractable Cases of Matching Dependencies for Query Answering under Entity Resolution

by Leopoldo Bertossi

Matching Dependencies (MDs) are a relatively recent proposal for declarative entity resolution. They are rules that specify, on the basis of similarities satisfied by values in a database, what values should be considered duplicates, and...

Theoretical foundations of entity resolution models

by András József Molnár

2014, Annales Univ. Sci. Budapest., Sect. Comp.

Data quality is crucial in all information systems. As a key step in obtaining clean data, record linkage or entity resolution (ER) groups database records by the underlying real world entities. In this paper we give practical...

Significant information encapsulation and valence exploitation (SIEVE) for discovery

by Rakesh Nagi

2011, 14th International Conference on Information Fusion

In intelligence analysis environments, content such as entities, events and relationships appear in different source documents and contexts, and relating them is a challenging and intensive task. This paper presents an approach to...

Simulated Entity Resolution by Diverse Means: DIMACS Work on the KDD Challenge of 2005

by Alexander Genkin et al.

2006

Establishing traveler identity using collective identity resolution

by Donald Kretz

2010, … for Homeland Security (HST), 2010 IEEE …

Every day, millions of people cross international borders by air or sea. A nation's ability to identify and neutralize threats posed by travelers depends heavily on an accurate and proactive methodology for establishing traveler identity....
