Embracing Uncertainty in Entity Linking

2012, Semantic Search over the Web

https://doi.org/10.1007/978-3-642-25008-8_9

Abstract

The paper presents a novel approach to entity linkage for heterogeneous, uncertain, and volatile data on the Web. It addresses the limitations of traditional methods, which assume static data and rely on predefined thresholds. Instead, the approach attaches probabilities that express confidence in entity attributes. This makes linking entities extracted from diverse sources, such as Wikipedia and online databases, more robust and accurate.
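The contrast between a fixed threshold and retained probabilities can be illustrated with a small sketch. It is not the paper's algorithm. The `Entity` record, the confidence-weighted `attribute_agreement` score, and the 0.8 cut-off are all hypothetical choices made for illustration. Each attribute value carries a confidence, so low-confidence values count for less. Every candidate link is then kept together with its probability rather than being accepted or rejected outright.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Entity:
    """Hypothetical entity description: attribute -> (value, confidence in [0, 1])."""
    eid: str
    attrs: dict[str, tuple[str, float]] = field(default_factory=dict)


def attribute_agreement(a: Entity, b: Entity) -> float:
    """Hypothetical score: confidence-weighted fraction of shared attributes that agree.

    Each shared attribute contributes weight conf_a * conf_b, so values that
    a source is unsure about influence the link probability less.
    """
    shared = a.attrs.keys() & b.attrs.keys()
    total = 0.0
    agree = 0.0
    for name in shared:
        va, ca = a.attrs[name]
        vb, cb = b.attrs[name]
        weight = ca * cb
        total += weight
        if va.strip().lower() == vb.strip().lower():
            agree += weight
    return agree / total if total > 0 else 0.0


def probabilistic_links(entities: list[Entity]) -> dict[tuple[str, str], float]:
    """Keep every candidate link with non-zero probability, with no hard decision."""
    links: dict[tuple[str, str], float] = {}
    for i, a in enumerate(entities):
        for b in entities[i + 1:]:
            p = attribute_agreement(a, b)
            if p > 0.0:  # pairs with no agreeing evidence are not candidates
                links[(a.eid, b.eid)] = p
    return links


def thresholded_links(links: dict[tuple[str, str], float],
                      threshold: float = 0.8) -> set[tuple[str, str]]:
    """Threshold-style alternative: a fixed cut-off that discards the uncertainty."""
    return {pair for pair, p in links.items() if p >= threshold}


if __name__ == "__main__":
    wiki = Entity("wiki:1", {"name": ("Ada Lovelace", 0.9), "born": ("1815", 0.95)})
    db = Entity("db:7", {"name": ("ada lovelace", 0.8), "born": ("1816", 0.5)})
    links = probabilistic_links([wiki, db])
    print(links)                     # the candidate link survives with its probability
    print(thresholded_links(links))  # a 0.8 cut-off drops it entirely
```

With these example records, the conflicting but low-confidence birth year pulls the link probability down to roughly 0.6. A 0.8 threshold would silently discard the pair. The probabilistic version keeps it so that later queries or new evidence can still use it.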
