Modelling and Mining of Networked Information Spaces

 

Identifying Communities

The link structures of networked information spaces share particular properties. In the undirected sense, they are "small-world networks", in that they are locally dense and globally sparse. The locally dense structure likely represent "communities" of some sort. The challenge is how to extract these communities. Structural analysis of dynamic communication graphs

  • Identifying communities and detecting patterns

  • Identifying Communities in network traffic: encrypted traffic identification

     

    Designing Attack-Detector Communities

    All users of virus checkers, firewalls and more generally intrusion detection systems are familiar with the need to continuously receive updates to the original base system. The basic nature of the intrusion detection problem is that 'new' attacks are continuously under development. As a consequence patches to your personal firewall, virus checker or enterprise intrusion detection system are also required in order to plug the current favorite attack instance. The bottom line however, is that an omnipresent third party is available who is capable of recognizing unseen attacks and then develop the necessary signature patch. Thus, your detector is only as good as the most recent attacks you are able to label correctly. The principal objective of the attacker is to camouflage a 'generic' attack in such a way that the signatures at the detector are unable to discover the true nature of the user. The motivation for this work is to present a methodology for enabling the detector to automatically generalize to a wide range of embodiments of the same generic attack. To do so, we develop both detectors and corresponding attackers, and formulate the problem as an "arms race" between both parties. Within such a context the detector is encouraged to generalize beyond recognizing the specific instance of a single attack, therefore freeing the detector from working in a purely reactive manner. There are three basic components to this project:

  • Attack

  • Detector

  • Integration

     

    Modelling the link structure

    Research on the properties of networked information spaces has determined that the various organically grown small-world graphs share a lot of similarities, but also several differences. The design of new graph models is coupled with the fitting of these models on real graphs, where the most suitable model must be found and its parameters selected in an optimal manner. The subprojects of modelling networked information spaces using stochastic and deterministic models follow this methodology.

  • Inference of community structure from a geometric graph model

  • Geometric Properties of Models of Complex Networks

  • Dynamic models for on-line social networks

  • Modelling events on the network

  • Graph similarity and model validation





    Improved Web search and crawling

    Focussed crawlers have been proposed as a solution to the exponentially increasing size of the Web. They take the user's description of a topic of interest and search for web pages that are relevant to the topic. A good focussed crawler will allow for much faster updates of relevant information than can be maintained by a general purpose search engine, and are able to run on the user's workstation. These properties make focussed crawlers a valued tool for professionals with severe time constraints, such as medical doctors, that need to extract up-to-date information from a networked information space, such as the most recent medical literature. The latest generation of search engines makes heavy use of graph-theoretic methods to rank the results retrieved after a web search. The value of such methods is exemplified by the success of Google; however many theoretical issues remain to be resolved. Analysis of the algorithms, as well as of the quality of the graph that is the input to the algorithms, will lead to improved performance.

  • Focused crawling by learning the user's topic-specific browsing patterns

  • Improved web search methods