Modelling and Mining of Networked Information Spaces

Project Title: Improved Web search methods
Participants: Allan Borodin, Hyun Chul Lee
Project Description: Current search engines make heavy use of graph-theoretic methods to rank the results retrieved after a Web search. PageRank, the ranking method used by Google, and the Hub-Authority method used by Teoma, derive their rankings from an iterative process in which the rank of a web page is updated at each step according to the ranks of the pages it links to, or that link to it. The general idea behind these methods is that a hyperlink to a page acts as a vote of confidence in that page. Hence, a page with many in-links will be considered more important, and if the pages pointing to it have high rankings themselves, then the page will be more important still. The effectiveness of such methods has been persuasively demonstrated by the success of Google. However, many questions regarding the limits of their performance remain open.
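The two iterative schemes described above can be sketched on a toy directed graph. This is a minimal illustration only; the function names, the damping factor 0.85, and the example graph are assumptions for this sketch, not the engines' actual code.

```python
def pagerank(out_links, damping=0.85, iters=50):
    """Power iteration for PageRank: each step, a page's score is
    redistributed along its out-links, plus a uniform 'teleport' share."""
    nodes = list(out_links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {v: (1.0 - damping) / n for v in nodes}
        for v, targets in out_links.items():
            if targets:
                share = damping * rank[v] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # dangling page: spread its rank uniformly over all pages
                for t in nodes:
                    new[t] += damping * rank[v] / n
        rank = new
    return rank

def hits(out_links, iters=50):
    """Hub-Authority (HITS) iteration: authority comes from in-links,
    hub quality from out-links, renormalized each step."""
    nodes = list(out_links)
    hub = {v: 1.0 for v in nodes}
    for _ in range(iters):
        auth = {v: sum(hub[u] for u in nodes if v in out_links[u])
                for v in nodes}
        norm = sum(a * a for a in auth.values()) ** 0.5
        auth = {v: a / norm for v, a in auth.items()}
        hub = {v: sum(auth[t] for t in out_links[v]) for v in nodes}
        norm = sum(h * h for h in hub.values()) ** 0.5
        hub = {v: h / norm for v, h in hub.items()}
    return auth, hub

# page "c" has the most in-links, so both methods score it highest
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
pr = pagerank(graph)
auth, hub = hits(graph)
```

Note how both methods realize the "vote of confidence" idea: a page's score depends not just on how many pages link to it, but on the scores of those linking pages, computed to a fixed point by iteration.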

The first phase of our project focused on notions of stability, convergence, and similarity between methods. Mathematically precise definitions of these concepts were formulated, and an analysis of the most common ranking methods based on these concepts was performed. In the future, we will extend this work to a probabilistic analysis utilizing some of the probabilistic graph models being proposed in the literature, which are also being studied in the context of this project. More generally, we are considering the question: what makes a search query hard for current search engines? In particular, we will attempt to define classes of information needs that are hard to resolve using current search engines. Defining such classes may then lead to new methods that make such information needs easier to resolve.
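One simple way to make "similarity between methods" concrete is to count pairwise disagreements (a normalized Kendall tau distance) between the orderings two ranking methods induce on the same set of pages. The function name and normalization below are assumptions for this sketch, not the project's actual definitions.

```python
from itertools import combinations

def kendall_distance(scores_a, scores_b):
    """Fraction of page pairs that the two score dictionaries order
    differently: 0.0 means identical orderings, 1.0 means fully reversed."""
    pages = list(scores_a)
    pairs = list(combinations(pages, 2))
    disagreements = sum(
        1 for u, v in pairs
        if (scores_a[u] - scores_a[v]) * (scores_b[u] - scores_b[v]) < 0
    )
    return disagreements / len(pairs)

# identical orderings (even with different score scales) give distance 0;
# a fully reversed ordering gives distance 1
same = kendall_distance({"a": 3, "b": 2, "c": 1}, {"a": 9, "b": 5, "c": 2})
flipped = kendall_distance({"a": 3, "b": 2, "c": 1}, {"a": 1, "b": 2, "c": 3})
```

A distance like this depends only on the induced orderings, not on raw scores, which makes it a natural candidate when comparing methods whose score scales differ.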