Modelling and Mining of Networked Information Spaces

Project Title: Inference of community structure from a geometric graph model
Participants: Dr. Bill Aiello, Dr. Anthony Bonato, Dr. Jeannette Janssen, Dr. Pawel Pralat
Project Description: Generally, a community in a social network or networked information space consists of entities that are somehow close to each other; the definition of close will depend on the context. In social networks this, a community could be a group of members with common interests. In networked information spaces, it will be formed by a collection of information entities (scientific papers, Web pages) that have similar content.

Most recent studies define a community simply as a group of nodes with higher density than the rest of the graph. We propose a holistic approach: a graph or network is seen as the visible manifestation of an invisible underlying structure, much like the form of an organism is the visible manifestation of the underlying genetic material. The invisible structure can be modeled as a collection of points in a high-dimensional feature space, where closeness between points is a measure of the degree to which the corresponding individuals are related or similar. The network is formed from this collection of points by a graph generation process. Consequently, some of the information about the relative positions of the points is retained in the structure of the network.

The main goal is to gain an understanding of the process that leads from the invisible to the visible structure. A. Bonato, C. Cooper and J. Janssen recently proposed a new graph generation model which combines a geometric positioning of the nodes with a link pattern based on preferential attachment. In the model, a region of influence is created around each node. The area of the region of influence is determined by the in-degree of the node. Moreover, in each time step all regions shrink by a certain factor. A new node v can only link to an existing node u if v falls within the region of influence of u. If v falls into the region of influence u, it will link to u with probability p. We are currently exploring the properties of the model, to see whether it leads to the typical structure of a social or information network. Initial results indicate that the model produces a power law degree distribution — next steps will be to verify the small world property and the locally dense structure.

Another goal is to develop a methodology for a reverse mapping of the nodes into Euclidean space, under the assumption that the graph was produced by our geometric model. By identifying points in close proximity of each other, we can identify communities. Also, we will be able to infer the interaction of communities with each other. In contrast with most current heuristics for community extraction, we will also be able to present a measure of confidence in our findings, by measuring how closely our findings conform to the assumption of the geometric model.