Efficient Ranking on Entity Graphs with Personalized Relationships
Sign up for access to the world's latest research
Abstract
Authority flow techniques like PageRank and ObjectRank can provide personalized ranking of typed entity-relationship graphs.
Related papers
Proceedings of the VLDB Endowment, 2014
We propose a new scalable algorithm that can compute Personalized PageRank (PPR) very quickly. The Power method is a state-of-the-art algorithm for computing exact PPR; however, it requires many iterations. Thus reducing the number of iterations is the main challenge. We achieve this by exploiting graph structures of web graphs and social networks. The convergence of our algorithm is very fast. In fact, it requires up to 7.5 times fewer iterations than the Power method and is up to five times faster in actual computation time. To the best of our knowledge, this is the first time to use graph structures explicitly to solve PPR quickly. Our contributions can be summarized as follows. 1. We provide an algorithm for computing a tree decomposition, which is more efficient and scalable than any previous algorithm. 2. Using the above algorithm, we can obtain a core-tree decomposition of any web graph and social network. This allows us to decompose a web graph and a social network into (1) the core, which behaves like an expander graph, and (2) a small tree-width graph, which behaves like a tree in an algorithmic sense. 3. We apply a direct method to the small tree-width graph to construct an LU decomposition. 4. Building on the LU decomposition and using it as preconditoner, we apply GMRES method (a state-of-theart advanced iterative method) to compute PPR for whole web graphs and social networks.
International Conference on Management of Data, 2012
There is a large amount of textual data on the Web and in Wikipedia, where mentions of entities (such as Gandhi) are annotated with a link to the disambiguated entity (such as M. K. Gandhi). Such annotation may have been done manually (as in Wikipedia) or can be done using named entity recognition/disambiguation techniques. Such an annotated corpus allows queries to return entities, instead of documents. Entity ranking queries retrieve entities that are related to keywords in the query and belong to a given type/category specified in the query; entity ranking has been an active area of research in the past few years. More recently, there have been extensions to allow entity-relationship queries, which allow specification of multiple sets of entities as well as relationships between them. In this paper we address the problem of entity ranking ("near") queries and entity-relationship queries on the Wikipedia corpus. We first present an extended graph model which combines the power of graph models used earlier for structured/semi-structured data, with information from textual data. Based on this model, we show how to specify entity and entity-relationship queries, and defined scoring methods for ranking answers. Finally, we provide efficient algorithms for answering such queries, exploiting a space efficient in-memory graph structure. A performance comparison with the ERQ system proposed earlier shows significant improvement in answer quality for most queries, while also handling a much larger set of entity types.
Abstract. As the wealth of data on the World Wide Web grows, and as the structuring of that data improves, more sophisticated applications can be developed to derive meaningful characteristics relating to the content and structure of that data. In particular, ranking the various elements of sets of structured information is of great utility with respect to semantic network analysis.
Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '06, 2006
This paper introduces a family of link-based ranking algorithms that propagate page importance through links. In these algorithms there is a damping function that decreases with distance, so a direct link implies more endorsement than a link through a long path. PageRank is the most widely known ranking function of this family.
Web pages are often recognized by others through contexts. These contexts determine how linked pages influence and interact with each other. When differentiating such interactions, the authority of web pages can be better estimated by controlling the authority flows among pages. In this work, we determine the authority distribution by examining the topicality relationship between associated pages. In addition, we find it is not enough to quantify the influence of authority propagation from only one type of neighbor, such as parent pages in PageRank algorithm, since web pages, like people, are influenced by diverse types of neighbors within the same network. We propose a probabilistic method to model authority flows from different sources of neighbor pages. In this way, we distinguish page authority interaction by incorporating the topical context and the relationship between associated pages. Experiments on the 2003 and 2004 TREC Web Tracks demonstrate that this approach outperforms other competitive topical ranking models and produces a more than 10% improvement over PageRank on the quality of top 10 search results. When increasing the types of incorporated neighbor sources, the performance shows stable improvements.
We introduce a tool which is an application of personalized pagerank vectors such as personalized search engines. We use pre-computed pagerank vectors to rank the search results in favor of user preferences. We describe the design and architecture of our tool. By using pre-computed personalized pagerank vectors we generate search results biased to user preferences such as top-level domain and regional preferences. We conduct a user study to evaluate search results of three different ranking methods such as similarity-based ranking, plain PageRank and weighted (personalized) PageRank ranking methods. We discuss the results of our user study and evaluate the benefits our personalized PageRank vectors in personalized search engines.
Proceedings of the Thirteenth ACM conference on Information and knowledge management - CIKM '04, 2004
Our work is motivated by the problem of ranking hyperlinked documents for a given query. Given an arbitrary directed graph with edge and node labels, we present a new flow-based model and an efficient method to dynamically rank the nodes of this graph with respect to any of the original labels. Ranking documents for a given query in a hyperlinked document set and ranking of authors/articles for a given topic in a citation database are some typical applications of our method. We outline the structural conditions that the graph must satisfy for our ranking to be different from the traditional PageRank. We have built a system using two indices that is capable of dynamically ranking documents for any given query. We validate our system and method using experiments on a few datasets: a crawl of the IBM Intranet (12 million pages), a crawl of the www (30 million pages) and the DBLP citation dataset. We compare our method to existing schemes for topic-biased ranking that require a classifier and the traditional PageRank. In these experiments, we demonstrate that our method is well suited for fine-grained ranking and that our method performs better than the existing schemes. We also demonstrate that our system can obtain an improved ranking with very little impact on query time.
Proceedings of the 20th international joint conference …, 2007
Social networks allow users getting personalized recommendations for interesting resources like websites or scientific papers by using reviews of users they trust. Search engines rank documents by using the reference structure to compute a visibility for each document with reference structure-based functions like PageRank. Personalized document visibilities can be computed by integrating both approaches. We present a framework for incorporating the information from both networks, and ranking algorithms using this information for personalized recommendations. Because the computation of document visibilities is costly and therefore cannot be done in runtime, i.e., when a user searches a document repository, we pay special attention to develop algorithms providing an efficient calculation of personalized visibilities at query time based on precalculated global visibilities. The presented ranking algorithms are evaluated by a simulation study.
2009
As the amount of information grows and as users become more sophisticated, ranking techniques become important building blocks to meet user needs when answering queries. PageRank is one of the most successful link-based ranking methods, which iteratively computes the importance scores for web pages based on the importance scores of incoming pages. Due to its success, PageRank has been applied in a number of applications that require customization. We address the scalability challenges for two types of customized ranking. The first challenge is to compute the ranking of a subgraph. Various Web applications focus on identifying a subgraph, such as focused crawlers and localized search engines. The second challenge is to compute online personalized ranking. Personalized search improves the quality of search results for each user. The user needs are represented by a personalized set of pages or personalized link importance in an entity relationship graph. This requires an efficient online computation. To solve the subgraph ranking problem efficiently, we estimate the ranking scores for a subgraph. We propose a framework of an exact solution (IdealRank) and
Proceedings of the 12th International Conference on Extending Database Technology Advances in Database Technology - EDBT '09, 2009
There has been an explosion of hyperlinked data in many domains, e.g., the biological Web. Expressive query languages and effective ranking techniques are required to convert this data into browsable knowledge. We propose the Graph Information Discovery (GID) framework to support sophisticated user queries on a rich web of annotated and hyperlinked data entries, where query answers need to be ranked in terms of some customized ranking criteria, e.g., PageRank or ObjectRank. GID has a data model that includes a schema graph and a data graph, and an intuitive query interface. The GID framework allows users to easily formulate queries consisting of sequences of hard filters (selection predicates) and soft filters (ranking criteria); it can also be combined with other specialized graph query languages to enhance their ranking capabilities. GID queries have a well-defined semantics and are implemented by a set of physical operators, each of which produces a ranked result graph. We discuss rewriting opportunities to provide an efficient evaluation of GID queries. Soft filters are a key feature of GID and they are implemented using authority flow ranking techniques; these are query dependent rankings and are expensive to compute at runtime. We present approximate optimization techniques for GID soft filter queries based on the properties of random walks, and using novel path-length-bound and graphsampling approximation techniques. We experimentally validate our optimization techniques on large biological and bibliographic datasets. Our techniques can produce high quality (Top K) answers with a savings of up to an order of magnitude, in comparison to the evaluation time for the exact solution.

Loading Preview
Sorry, preview is currently unavailable. You can download the paper by clicking the button above.