MoMiNIS 
  Seminar Series at Dalhousie University
Location: Room 430, Goldberg Computer Science Building, 6050 University Avenue (Building E600 on Studley Campus Map), Halifax, Nova Scotia, Canada
Time: Tuesday 11:30-13:00 (Thursday 11:30-13:00 or another time is possible if necessary).
The MoMiNIS Seminar Series provides a research-oriented forum for prominent researchers to present their current research on the Modelling and Mining of Networked Information Spaces. The seminars are meant to appeal to a broad audience, and present both theoretical results in graph theory, machine learning and text mining, and their practical ramifications in areas such as Web mining, social network analysis, network management and security, and digital libraries.
Seminar Co-ordinators: Jeannette Janssen, Evangelos Milios, Nur Zincir-Heywood
| Date & Time | Speaker | Topic | Slides | Host |
| June 21, 2012 | Ricardo Baeza-Yates, Yahoo! Research, Barcelona | Search Engines and Social Media | | jj, eem |
| June 23 or 24, 2012 | Nelly Litvak, University of Twente | Extremal properties of Web graphs | | jj |
| June 23 or 24, 2012 | Jennifer Chayes, Microsoft Research Cambridge, Mass | A weak distributional limit for preferential attachment graphs (tentative) | | jj |
| TBA | Mourad Debbabi, Concordia University | Network security protocols | | nzh |
| PAST SEMINARS | | | | |
| Nov. 21, 2008 | Allan Borodin, Univ. of Toronto | Personalized Search, Community Extraction in Blog sites | | jj, eem |
| Mar. 3, 2009 | George Forman, HP Labs | What Are You Talking About? Topic Recognition via Machine Learning Text Classification and Quantification | | nzh, eem |
| Mar. 11, 2009 | Ellen Zegura, Georgia Tech | NetMark: Selecting a Benchmark of Network Topologies | | jj |
| Aug. 12, 2009, 2:30pm | Natasa Przulj, Dept. of Computer Science, UC Irvine | From Network Topology to Biological Function and Disease | | jj |
| Aug. 13, 2009 | Russell Greiner, Dept. of Computer Science, U. of Alberta | Budgeted Learning of Probabilistic Classifiers | | eem |
| Dec. 10, 2009, 10:30am | Stan Matwin, University of Ottawa | Privacy and Data Mining: New Developments and Challenges | | eem |
| Feb. 18, 2010, 2:30pm | Hugh Chipman, Acadia Univ. | Mixed Membership Stochastic Blockmodels for multi-recipient transactions on a network (joint work with M. Mahdi Shafiei) | | jj, eem |
| Mar. 30, 2010, 2:30pm | Aaron Clauset, Santa Fe Institute | The trouble with community detection in complex networks | | jj |
| Sep. 17, 2010 | Edo Airoldi, Harvard University | Modeling approaches for analyzing complex networks | | jj |
| March 24, 2011 | Bernie Hogan, Oxford Internet Institute, University of Oxford | Capture of online networks | | ag, jj |
| May 3, 2011 | Frank Tompa, University of Waterloo | Finding implicit lists and tables in web pages | | eem, nzh |
| May 10, 2011 | Laks V.S. Lakshmanan, University of British Columbia | Musings on Next Generation Recommender Systems | | eem |
| May 19, 2011 | Julita Vassileva, University of Saskatchewan | Sharing experience in Social Computing, Persuasion and Science Outreach | | nzh, eem |
| May 19, 2011 | Jian Pei, Simon Fraser University | Query Friendly Compression and Analysis of Social Networks Using Multi-Position Linearization | | jj |
| March 15, 2012 | Denilson Barbosa, University of Alberta | Towards Summarizing and Making Sense of the Blogosphere | | eem |
Seminars will be announced by e-mail 1-2 weeks in advance. 
  To subscribe to the mailing list, please send an email to cs-seminars 
  AT cs.dal.ca with your name, position, email address and home page. 
To suggest speakers or topics, or to volunteer for a seminar, please contact one of the co-ordinators.
We gratefully acknowledge MPrime (formerly MITACS), NSERC and the Faculty of Computer Science, Dalhousie University, for their financial and logistical support of the seminar series.
Last updated Saturday, 17-Mar-2012 20:20
---------------------------------------
  Title: WHAT ARE YOU TALKING ABOUT? 
  Text classification and quantification via machine learning
  Speaker: Dr. George Forman
  Hewlett Packard Labs, Port Orchard, WA 
Date: Tuesday, March 3, 2009
  Time: 11:30 a.m.
  Location: Jacob Slonim Conference Room (430), 6050 University Ave., Halifax
  
  In theory, practice is the same as theory, but in practice it is not. In the 
  process of applying proven text classification methods from the
  research literature to business-driven problems at Hewlett-Packard, I have encountered 
  substantial failures and gaps. Investigating the
  failures in detail has repeatedly led to new discoveries and perspectives for 
  research that are simply not afforded by the academic
  benchmark datasets and problem formulations. In this talk, I will describe two 
  interesting applications of supervised machine learning
  that we have deployed inside Hewlett-Packard, as well as the challenging, fundamental 
  research opportunities they have led to.
Short Biography:
  George Forman is a senior research scientist at Hewlett-Packard Labs. His research 
  interests stem from practical issues that arise in the
  application of machine learning to industrial problems, e.g. feature selection, 
  robustness, small training sets, and novel problem
  formulations, such as quantification. His Ph.D. in Computer Science & Engineering 
  is from the University of Washington, Seattle, 1996.
  Speaker URL: http://www.hpl.hp.com/personal/George_Forman/
 Title: NetMark: Selecting a Benchmark 
  of Network Topologies
  Speaker: Dr. Ellen Zegura, Georgia Tech
  
  Wednesday, March 11, 9.30am, Colloquium room, Chase building
Prof. Zegura's research work concerns the development of wide-area (Internet) networking services and, more recently, mobile wireless networking. Wide-area services are utilized by applications that are distributed across multiple administrative domains (e.g., web, file sharing, multi-media distribution). Her focus is on services implemented both at the network layer, as part of network infrastructure, and at the application layer. In the context of mobile wireless networking, she is interested in challenged environments where traditional ad-hoc and infrastructure-based networking approaches fail. These environments have been termed Disruption Tolerant Networks.
Ellen W. Zegura received the B.S. degree in Computer Science (1987), the B.S. 
  degree in Electrical Engineering (1987), the M.S. degree in 
  Computer Science (1990) and the D.Sc. in Computer Science (1993) all from Washington 
  University, St. Louis, Missouri. Since 1993, she has been on the faculty in 
  the College of Computing at Georgia Tech. She was an Assistant Dean in charge 
  of Space and Facilities Planning from Fall 2000 to January 2003. She served 
  as Interim Dean of the College for six months in 2002. Since February 2003, 
  she has been an Associate Dean, with responsibilities ranging from Research 
  and Graduate Programs to Space and Facilities Planning. She has spent five years 
  as the user representative in the planning of the Klaus Advanced Computing Technologies 
  Building, scheduled to open in Fall 2006. Starting in August 2005, she has chaired 
  the Computing Science and Systems Division of the College of Computing. She 
  is the proud mom of two girls, Carmen (born in August 1998) and Bethany (born 
  in May 2001).
Title: From Network Topology to 
  Biological Function and Disease
  Speaker: Natasa Przulj, Dept. of Computer Science, UC Irvine
  
  Date: August 12, 2.30pm
We discuss our new tools that are advancing network analysis towards a theoretical 
  understanding of the structure of biological networks. Analogous to tools for 
  analyzing and comparing genetic sequences, we are developing new tools that 
  decipher large network data sets, with the goal of improving biological understanding 
  and contributing to development of new therapeutics. We demonstrate that local 
  node similarity corresponds to similarity in biological function and involvement 
  in disease. We introduce a systematic highly constraining measure of a network's 
  local structure and demonstrate that protein-protein interaction (PPI) networks 
  are better modeled by geometric graphs than by any previous model. The geometric 
  model is further corroborated by demonstrating that PPI networks can explicitly 
  be embedded into a low-dimensional geometric space. We also present a new network 
  alignment algorithm.
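
  As an illustration of the geometric graph model mentioned in the abstract, the following is a minimal sketch of a random geometric graph generator. It is not the speaker's code: the unit square, the Euclidean metric, the function name random_geometric_graph and the parameters n and radius are all assumptions chosen for illustration.

```python
import math
import random


def random_geometric_graph(n, radius, seed=None):
    """Place n points uniformly in the unit square (an assumed space) and
    connect every pair whose Euclidean distance is at most `radius`."""
    rng = random.Random(seed)
    points = [(rng.random(), rng.random()) for _ in range(n)]
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            # Edges are stored once, as (smaller index, larger index).
            if math.dist(points[i], points[j]) <= radius:
                edges.add((i, j))
    return points, edges


if __name__ == "__main__":
    points, edges = random_geometric_graph(n=50, radius=0.2, seed=1)
    print(len(points), "nodes,", len(edges), "edges")
```

  In such a model, nodes that are close in the underlying space tend to share neighbours, which is the kind of local structure the abstract compares against PPI networks.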
  
  Bio: 
  Dr. Przulj is an Assistant Professor in the Department of Computer Science, UC 
  Irvine. She is also a member of the UCI Cancer Center, the UCI Center for Complex 
  Biological Systems (CCBS), the UCI's program in Mathematical, Computational 
  and Systems Biology (MCSB), and the UCI’s Institute for Genomics and Bioinformatics 
  (IGB). She received an NSF CAREER award for 2007-2011. She is on the Editorial 
  Review Board of the International Journal of Knowledge Discovery in Bioinformatics 
  (IJKDB). Dr. Przulj's research involves applications of graph theory, mathematical 
  modeling, and computational techniques to solving large-scale problems in computational 
  and systems biology. She is interested in computational and theoretical solutions 
  to practical problems in many areas of systems biology, planar cell polarity, 
  proteomics, cancer informatics, and chemo-informatics.
Title: Budgeted Learning of Probabilistic 
  Classifiers
  Speaker: Russell Greiner, Dept. of Computer Science, U. of Alberta
 Researchers often use clinical trials to collect the data needed to evaluate 
  some hypothesis, or produce a classifier. During this process, they have to 
  pay the cost of performing each test. Many studies will run a comprehensive 
  battery of tests on each subject, for as many subjects as their budget will 
  allow -- i.e., "round robin" (RR). We consider a more general model, 
  where the researcher can sequentially decide which single test to perform on 
  which specific individual; again subject to spending only the available funds. 
  Our goal here is to use these funds most effectively, to collect the data that 
  allows us to learn the most accurate classifier.
  We first explore the simplified "coins version" of this task. After 
  observing that this is NP-hard, we consider a range of heuristic algorithms, 
  both standard and novel, and observe that our "biased robin" approach 
  is both efficient and much more effective than most other approaches, including 
  the standard RR approach. We then apply these ideas to learning a naive Bayes 
  classifier, and see similar behavior. Finally, we consider the most realistic 
  model, where both the researcher gathering data to build the classifier, and 
  the user (e.g., physician) applying this classifier to an instance (patient) must 
  pay for the features used; e.g., the researcher has $10,000 to acquire the 
  feature values needed to produce an optimal $30/patient classifier. Again, we 
  see that our novel approaches are almost always much more effective than the 
  standard RR model. 
  This is joint work with Aloak Kapoor, Dan Lizotte and Omid Madani.
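
  The two purchasing policies named in the abstract can be sketched on the "coins version" of the task. The abstract does not define "biased robin", so the rule used here (keep flipping the same coin while it comes up heads, move to the next coin after a tail) is an assumption, as are the function names and the Beta(1, 1) rule used to pick the final coin. The sketch only shows how the budget is spent; it does not reproduce the reported results.

```python
import random


def round_robin(coins, budget, rng):
    """Spend the budget by flipping the coins in strict cyclic order."""
    heads = [0] * len(coins)
    flips = [0] * len(coins)
    for t in range(budget):
        i = t % len(coins)
        flips[i] += 1
        heads[i] += 1 if rng.random() < coins[i] else 0
    return heads, flips


def biased_robin(coins, budget, rng):
    """Assumed rule: stay on a coin after a head, advance after a tail."""
    heads = [0] * len(coins)
    flips = [0] * len(coins)
    i = 0
    for _ in range(budget):
        flips[i] += 1
        if rng.random() < coins[i]:
            heads[i] += 1                # success: flip the same coin again
        else:
            i = (i + 1) % len(coins)     # failure: move to the next coin
    return heads, flips


def best_coin(heads, flips):
    """Pick the coin with the highest posterior mean under a Beta(1, 1) prior."""
    return max(range(len(flips)), key=lambda i: (heads[i] + 1) / (flips[i] + 2))


if __name__ == "__main__":
    coins = [0.3, 0.5, 0.8]              # hypothetical head probabilities
    for policy in (round_robin, biased_robin):
        heads, flips = policy(coins, 30, random.Random(0))
        print(policy.__name__, flips, "-> picks coin", best_coin(heads, flips))
```

  Under this assumed rule, biased robin concentrates flips on coins that keep succeeding, whereas round robin spreads the budget evenly regardless of outcomes.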
  
  Bio:
  After earning a PhD from Stanford, Russ Greiner worked in both academic and 
  industrial research before settling at the University of Alberta, where he is 
  now a Professor in Computing Science and the founding Scientific Director of 
  the Alberta Ingenuity Centre for Machine Learning, which won the ASTech Award 
  for "Outstanding Leadership in Technology" in 2006. He has been Program 
  Chair for the 2004 "Int'l Conf. on Machine Learning", Conference Chair 
  for 2006 "Int'l Conf. on Machine Learning", Editor-in-Chief for "Computational 
  Intelligence", and is serving on the editorial boards of a number of other 
  journals. He was elected a Fellow of the AAAI (Association for the Advancement 
  of Artificial Intelligence) in 2007, and was awarded a McCalla Professorship 
  in 2005-06 and a Killam Professorship in 2007. He has published over 100 refereed 
  papers and patents, most in the areas of machine learning and knowledge representation. 
  The main foci of his current work are (1) bioinformatics and medical informatics; 
  (2) learning effective probabilistic models and (3) formal foundations of learnability.
Title: Privacy and Data Mining: 
  New Developments and Challenges
  Speaker: Stan Matwin, University of Ottawa
There is little doubt that data mining technologies create new challenges in 
  the area of data privacy. In this talk, we will review some of the new developments 
  in Privacy-preserving Data Mining. In particular, we will discuss techniques 
  in which data mining results can reveal personal data, and how this can be prevented. 
  We will look at the practically interesting situations where data to be mined 
  is distributed among several parties. We will mention new applications in which 
  mining spatio-temporal data can lead to identification of personal information. We 
  will argue that methods that effectively protect personal data, while at the 
  same time preserving the quality of the data from the data analysis perspective, 
  are some of the principal new challenges before the field.
 Title: Mixed-Membership Stochastic 
  Block-Models for Transactional Data
  Speaker: Hugh Chipman, Acadia University
  Time: Thursday, February 18, 2.30pm
Abstract:
  Transactional network data arise in many fields. Although social network models 
  have been applied to transactional data, these models typically assume binary 
  relations between pairs of nodes. We develop a latent mixed membership model 
  capable of modelling richer forms of transactional data. Estimation and inference 
  are accomplished via a variational EM algorithm. Simulations indicate that the 
  learning algorithm can recover the correct generative model. We further present 
  results on a subset of the Enron email dataset. This is joint work with Mahdi 
  Shafiei.
About the speaker: Dr. Hugh Chipman is a Canada Research Chair in Mathematical 
  Modelling at Acadia University, and the director of the Acadia Centre for Mathematical 
  Modelling and Computations. His research focuses on statistical models for extracting 
  information from large and complex datasets. He completed his PhD studies 
  at the University of Waterloo in 1994, and held a faculty position at the University 
  of Chicago before moving to Acadia. In 2009, he was awarded the CRM-SSC Prize 
  for his outstanding contributions to the application of Bayesian statistical 
  inference for data analysis.
Title: The trouble with community 
  detection in complex networks
  Speaker: Aaron Clauset, Santa Fe Institute
  Date: Tuesday, March 30, 2.30pm
  
  Abstract: Although widely used in practice, the performance of the popular network 
  clustering technique called "modularity
  maximization" is not well understood when applied to networks with unknown 
  modular structure. In this talk, I'll show that precisely in
  the case we want it to perform the best---that is, on modular networks---the 
  modularity function Q exhibits extreme degeneracies,
  in which the global maximum is hidden among an exponential number of high-modularity 
  solutions. Further, these degenerate solutions can
  be structurally very dissimilar, suggesting that any particular high-modularity 
  partition, or statistical summary of its structure,
  should not be taken as representative of the other degenerate solutions. These 
  results partly explain why so many heuristics do
  well at finding high-modularity partitions and why different heuristics can 
  disagree on the modular composition of the same network.
  I'll conclude with some forward-looking thoughts about the general problem of 
  identifying network modules from connectivity data alone,
  and the likelihood of circumventing this degeneracy problem.
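
  For readers unfamiliar with the modularity function Q discussed in the abstract, the following is a minimal sketch of how Q is computed for a given partition of an undirected, unweighted graph, using the standard Newman-Girvan definition. The function name, the edge-list representation and the toy graph (two triangles joined by one edge) are assumptions for illustration; the sketch evaluates Q only and does not perform modularity maximization.

```python
from collections import defaultdict


def modularity(edges, community):
    """Modularity Q of a partition of an undirected, unweighted graph.

    Q = sum over communities c of [ e_c / m - (d_c / (2 m))^2 ], where m is
    the number of edges, e_c the number of edges inside c, and d_c the sum
    of the degrees of the nodes in c.
    """
    m = len(edges)
    internal = defaultdict(int)   # e_c per community
    degree = defaultdict(int)     # d_c per community
    for u, v in edges:
        degree[community[u]] += 1
        degree[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


if __name__ == "__main__":
    # Toy graph: triangle 0-1-2 and triangle 3-4-5 joined by the edge 2-3.
    edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
    by_triangle = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
    shifted = {0: "a", 1: "a", 2: "b", 3: "b", 4: "b", 5: "b"}
    print("two triangles:", modularity(edges, by_triangle))
    print("node 2 moved: ", modularity(edges, shifted))
```

  Comparing Q across different partitions of the same graph in this way is the basic operation behind the degeneracy argument: on larger modular networks, many structurally different partitions can score close to the maximum.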
Title: Modeling approaches for 
  analyzing complex networks
  Speaker: Edo Airoldi, Department of Statistics, FAS Center for Systems Biology, 
  Faculty of Arts & Sciences, Harvard University
  Date: Friday Sept 17, 2010, 9:30 a.m.
Abstract: Networks are ubiquitous in science and have become a focal point 
  for discussion in everyday life. Formal statistical models for the
  analysis of network data have emerged as a major topic of interest in diverse 
  areas of study, and most of these involve collections of
  measurements on pairs of objects. Probability models on graphs date back to 
  1959. Along with empirical studies in social psychology and sociology
  from the 1960s, these early works generated an active “network community” 
  and a substantial literature in the 1970s. This effort moved
  into the statistical literature in the late 1970s and 1980s, and the past decade 
  has seen a burgeoning network literature in statistical
  physics and computer science. The growth of the World Wide Web and the emergence 
  of online “networking communities” such as Facebook and
  LinkedIn, and a host of more specialized professional network communities has 
  intensified interest in the study of networks and
  network data. In this talk, I will review a few ideas that are central to this 
  burgeoning literature, placing emphasis on modeling approaches
  available for data analysis, and review some of the recent work that is going 
  on in my group.
  
  Speaker Bio: In December 2006, Dr. Airoldi received a Ph.D. from Carnegie Mellon, 
  working on statistical machine learning and the
  analysis of complex systems with Stephen Fienberg and Kathleen Carley. His dissertation 
  introduced statistical and computational elements of graph theory that support 
  data analysis of complex systems and their evolution. Till December 2008, he 
  was a postdoctoral fellow in the Lewis-Sigler Institute for Integrative Genomics 
  of Princeton University working with Olga Troyanskaya, David Botstein, and James 
  Broach. He developed mechanistic models to gain computational insights into 
  aspects of the molecular and cellular biology that are not directly observable 
  with experimental probes. Since then, he has been working closely with biologists in 
  the areas of cellular differentiation, cellular development and cancer.
  Speaker URL: http://www.people.fas.harvard.edu/~airoldi/
  
Title: Facebook as a data capture 
  site: Techniques, Traps, Terms and Conditions
  
  This talk will give an overview of the sorts of social network data that are 
  accessible through the Facebook API and some of the issues that come with downloading 
  and processing this data. In the first part of the talk, I review several pieces 
  of software that allow for the download and capture of social networks, including 
  NodeXL, NetVizz, NameGenWeb, iGraph and Pajek. I walk through different routines 
  and cover efficiency through FQL queries. The talk will also walk through three 
  recent examples of privacy leaks with the Facebook data (The "Taste, Ties 
  and Time" data set, Pete Warden's open profiles data and the Oxford 100 
  schools data set) and how privacy issues inhibited their full use. I tie this 
  to the evolving developer terms of use on Facebook, as well as some of the other 
  emergent API issues (such as Twitter's recent decision to no longer whitelist 
  accounts). My intention is to end the talk by reinforcing the importance of 
  careful and minimal data collection efforts rather than a cavalier approach 
  indifferent to the risks of real world data. I also wish to make an appeal to 
  technical fields whose ethics procedures tend to be inadequate for this sort 
  of semi-private and sensitive data.
Slides -- Slides from March 25 seminar in the Social Media Lab
Bio:
  Bernie Hogan is a Research Fellow at the Oxford Internet Institute. He specializes 
  in novel methods for online data capture and analysis,
  especially via social media. Recent work has focused on the capture and analysis 
  of Facebook networks, particularly through his application
  namegenweb, which downloads a social network for visualization in network programs 
  such as NodeXL. Past work included an online audit
  study of racism on Craigslist, pen and paper methods for visualizing social 
  networks, the analysis of profile photos and techniques for
  online surveys of spouses and partners. Bernie completed his dissertation at 
  the University of Toronto in 2009 under Barry
  Wellman. This thesis won the Dordick award for Best Dissertation from the Communication 
  and Technology section of the International
  Communication Association. 
Speaker's contact info:
  Dr Bernie Hogan
  Research Fellow, Oxford Internet Institute
  University of Oxford
  http://www.oii.ox.ac.uk/people/?id=140
  Title: Towards Summarizing and 
  Making Sense of the Blogosphere
Speaker: Prof. Denilson Barbosa
  Department of Computing Science, Univ. of Alberta
Date: Thursday March 15, 2012
Time: 2:40 p.m.
Location: Jacob Slonim Conference Room (430), Computer Science
  Dalhousie University
  6050 University Avenue, Halifax
 Abstract:
  The extraction of structured information from text is a fast-improving subfield 
  of Natural Language Processing, which has been re-invigorated with the ever-increasing 
  availability of user-generated textual content online. One environment which 
  stands out as a source of invaluable information is the blogosphere: the 
  network of social media sites, in which individuals express and discuss opinions, 
  facts, events, and ideas pertaining to their own lives, their community, profession, 
  or society at large. Indeed, the automatic extraction of reliable information 
  from the blogosphere promises a viable approach for discovering very rich social 
  data: the issues that engage society in thousands of collective and parallel 
  conversations online. Considerable attention has been given to the problem of 
  automatically extracting and studying the social dynamics among the participants 
  (i.e., authors) in shared environments like the blogosphere. In that line of 
  work, the goal is to understand how the network of humans conversing in the 
  blogosphere is formed, evolves over time, and influences others in their own 
  opinions. Our goal, on the other hand, is to extract the network of entities, 
  facts, ideas and opinions expressed in social media sites, as well as the relationships 
  among them. Such structured data can be organized as one or more information 
  networks, which in turn are powerful metaphors for the study and visualization 
  of various kinds of complex systems. In this talk, I will cover the basic NLP 
  tools that are necessary for automatically extracting information networks from 
  social media text, relying to a large extent on the experiences gathered on 
  our ongoing SONEX project.
Speaker Bio:
  I am an Associate Professor at the University of Alberta, which I joined in 
  2008. I completed my Ph.D. at the University of Toronto in 2005, working on 
  XML data management, and held an academic position at the University of Calgary between 
  2005 and 2008. I am interested in databases on their own merit, and also in 
  the application of database and information retrieval principles to the management 
  of linked data. I am a member of the NSERC Strategic Network on Business Intelligence, 
  where I work on information extraction, and the Canadian Writing Research Collaboratory, 
  where I work on text mining, data management for prosopography, and document engineering.
  Speaker url: http://webdocs.cs.ualberta.ca/~denilson/
Host: Evangelos Milios (eem@cs.dal.ca)