MoMiNIS Seminar Series at Dalhousie University
Location: Room 430, Goldberg Computer Science Building, 6050 University Avenue (Building E600 on Studley Campus Map), Halifax, Nova Scotia, Canada
Time: Tuesday 11:30-13:00 (Thursday 11:30-13:00 or other times are possible if necessary).
The MoMiNIS Seminar Series provides a research-oriented forum for prominent researchers to present their current research on the Modelling and Mining of Networked Information Spaces. The seminars are meant to appeal to a broad audience and present both theoretical results in graph theory, machine learning and text mining, and their practical ramifications in areas such as Web mining, social network analysis, network management and security, and digital libraries.
Seminar Co-ordinators: Jeannette Janssen, Evangelos Milios, Nur Zincir-Heywood
|June 21, 2012||Ricardo
Yahoo! Research, Barcelona
|Search Engines and Social Media||
|June 23 or 24, 2012
University of Twente
|Extremal properties of Web graphs||
|June 23 or 24, 2012
Microsoft Research Cambridge, Mass
|A weak distributional limit for preferential attachment graphs (tentative)||
|Network security protocols||
|Nov. 21, 2008||Allan Borodin,
Univ. of Toronto
|Personalized Search, Community Extraction in Blog sites||
|Mar. 3, 2009||George Forman,
Hewlett-Packard Labs
|What Are You Talking About? Topic Recognition via Machine Learning Text Classification and Quantification||
|Mar. 11, 2009||Ellen Zegura,
Georgia Tech
|NetMark: Selecting a Benchmark of Network Topologies||
|Aug. 12, 2009, 2.30pm||Natasa Przulj,
Dept. of Computer Science, UC Irvine
|From Network Topology to Biological Function and Disease||
|Aug. 13, 2009||Russell Greiner,
Dept. of Computer Science, U. of Alberta
|Budgeted Learning of Probabilistic Classifiers||
|Dec. 10, 2009||Stan Matwin,
University of Ottawa
|Privacy and Data Mining: New Developments and Challenges||
|Feb. 18, 2010||Hugh Chipman,
Acadia University
|Mixed Membership Stochastic Blockmodels for multi-recipient transactions on a network (joint work with M. Mahdi Shafiei).||
|Mar. 30, 2010, 2:30pm||Aaron Clauset
Santa Fe Institute
|The trouble with community detection in complex networks||
|Sep. 17, 2010||Edo Airoldi
Harvard University
|Modeling approaches for analyzing complex networks||
|March 24, 2011||Bernie Hogan
Oxford Internet Institute,
University of Oxford
|Capture of online networks||
|May 3, 2011||Frank Tompa
University of Waterloo
|Finding implicit lists and tables in web pages||
|May 10, 2011||Laks V.S. Lakshmanan
University of British Columbia
|Musings on Next Generation Recommender Systems||
|May 19, 2011||Julita Vassileva
University of Saskatchewan
|Sharing experience in Social Computing, Persuasion and Science Outreach||
|May 19, 2011||Jian Pei
Simon Fraser University
|Query Friendly Compression and Analysis of Social Networks Using Multi-Position Linearization||
|March 15, 2012||Denilson Barbosa
University of Alberta
|Towards Summarizing and Making Sense of the Blogosphere||
Seminars will be announced by e-mail 1-2 weeks in advance.
To subscribe to the mailing list, please send an email to cs-seminars AT cs.dal.ca with your name, position, email address and home page.
To suggest speakers or topics, or to volunteer for a seminar, please contact one of the co-ordinators.
We gratefully acknowledge MPrime (formerly MITACS), NSERC and the Faculty of Computer Science, Dalhousie University, for their financial and logistical support of the seminar series.
Last updated Saturday, 17-Mar-2012 20:20
Title: WHAT ARE YOU TALKING ABOUT? Text classification and quantification via machine learning
Speaker: Dr. George Forman
Hewlett Packard Labs, Port Orchard, WA
Date: Tuesday, March 3, 2009
Time: 11:30 a.m.
Location: Jacob Slonim Conference Room (430), 6050 University Ave., Halifax
In theory, practice is the same as theory, but in practice it is not. In the process of applying proven text classification methods from the
research literature to business-driven problems at Hewlett-Packard, I have encountered substantial failures and gaps. Investigating the
failures in detail has repeatedly led to new discoveries and perspectives for research that are simply not afforded by the academic
benchmark datasets and problem formulations. In this talk, I will describe two interesting applications of supervised machine learning
that we have deployed inside Hewlett-Packard, as well as the challenging, fundamental research opportunities they have led to.
George Forman is a senior research scientist at Hewlett-Packard Labs. His research interests stem from practical issues that arise in the
application of machine learning to industrial problems, e.g. feature selection, robustness, small training sets, and novel problem
formulations, such as quantification. He received his Ph.D. in Computer Science & Engineering from the University of Washington, Seattle, in 1996.
Speaker URL: http://www.hpl.hp.com/personal/George_Forman/
Title: NetMark: Selecting a Benchmark of Network Topologies
Speaker: Dr. Ellen Zegura, Georgia Tech
Date: Wednesday, March 11, 2009
Time: 9:30 a.m.
Location: Colloquium Room, Chase Building
Prof. Zegura's research work concerns the development of wide-area (Internet) networking services and, more recently, mobile wireless networking. Wide-area services are utilized by applications that are distributed across multiple administrative domains (e.g., web, file sharing, multi-media distribution). Her focus is on services implemented both at the network layer, as part of network infrastructure, and at the application layer. In the context of mobile wireless networking, she is interested in challenged environments where traditional ad-hoc and infrastructure-based networking approaches fail. These environments have been termed Disruption Tolerant Networks.
Ellen W. Zegura received the B.S. degree in Computer Science (1987), the B.S. degree in Electrical Engineering (1987), the M.S. degree in Computer Science (1990) and the D.Sc. in Computer Science (1993), all from Washington University, St. Louis, Missouri. Since 1993, she has been on the faculty in the College of Computing at Georgia Tech. She was an Assistant Dean in charge of Space and Facilities Planning from Fall 2000 to January 2003. She served as Interim Dean of the College for six months in 2002. Since February 2003, she has been an Associate Dean, with responsibilities ranging from Research and Graduate Programs to Space and Facilities Planning. She has spent five years as the user representative in the planning of the Klaus Advanced Computing Technologies Building, scheduled to open in Fall 2006. Since August 2005, she has chaired the Computing Science and Systems Division of the College of Computing. She is the proud mom of two girls, Carmen (born in August 1998) and Bethany (born in May 2001).
Title: From Network Topology to Biological Function and Disease
Speaker: Natasa Przulj, Dept. of Computer Science, UC Irvine
Date: August 12, 2.30pm
We discuss our new tools that are advancing network analysis towards a theoretical
understanding of the structure of biological networks. Analogous to tools for
analyzing and comparing genetic sequences, we are developing new tools that
decipher large network data sets, with the goal of improving biological understanding
and contributing to development of new therapeutics. We demonstrate that local
node similarity corresponds to similarity in biological function and involvement
in disease. We introduce a systematic, highly constraining measure of a network's
local structure and demonstrate that protein-protein interaction (PPI) networks
are better modeled by geometric graphs than by any previous model. The geometric
model is further corroborated by demonstrating that PPI networks can explicitly
be embedded into a low-dimensional geometric space. We also present a new network
Dr. Przulj is an Assistant Professor in the Department of Computer Science, UC Irvine. She is also a member of the UCI Cancer Center, the UCI Center for Complex Biological Systems (CCBS), UCI's program in Mathematical, Computational and Systems Biology (MCSB), and UCI's Institute for Genomics and Bioinformatics (IGB). She received an NSF CAREER award for 2007-2011. She is on the Editorial Review Board of the International Journal of Knowledge Discovery in Bioinformatics (IJKDB). Dr. Przulj's research involves applications of graph theory, mathematical modeling, and computational techniques to solving large-scale problems in computational and systems biology. She is interested in computational and theoretical solutions to practical problems in many areas of systems biology, planar cell polarity, proteomics, cancer informatics, and chemo-informatics.
Title: Budgeted Learning of Probabilistic Classifiers
Speaker: Russell Greiner, Dept. of Computer Science, Univ. of Alberta
Researchers often use clinical trials to collect the data needed to evaluate
some hypothesis, or produce a classifier. During this process, they have to
pay the cost of performing each test. Many studies will run a comprehensive
battery of tests on each subject, for as many subjects as their budget will
allow, i.e., "round robin" (RR). We consider a more general model,
where the researcher can sequentially decide which single test to perform on
which specific individual, again subject to spending only the available funds.
Our goal here is to use these funds most effectively, to collect the data that
allows us to learn the most accurate classifier.
We first explore the simplified "coins version" of this task. After observing that this is NP-hard, we consider a range of heuristic algorithms, both standard and novel, and observe that our "biased robin" approach is both efficient and much more effective than most other approaches, including the standard RR approach. We then apply these ideas to learning a naive Bayes classifier, and see similar behavior. Finally, we consider the most realistic model, where both the researcher gathering data to build the classifier and the user (e.g., a physician) applying this classifier to an instance (a patient) must pay for the features used; e.g., the researcher has $10,000 to acquire the feature values needed to produce an optimal $30/patient classifier. Again, we see that our novel approaches are almost always much more effective than the standard RR model.
This is joint work with Aloak Kapoor, Dan Lizotte and Omid Madani.
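The abstract names the two purchasing policies without defining them, so the following Python sketch is only an illustration under stated assumptions. It assumes the "coins version" means: given a fixed budget of flips, decide which coin to flip next, then report the coin believed most likely to land heads. It also assumes that "biased robin" is a play-the-winner rule (keep flipping the same coin after heads, move to the next coin after tails). The function names and the Beta(1,1) prior used in pick_best are hypothetical choices, not taken from the talk.

```python
import random


def round_robin(coins, budget, rng):
    """Flip the coins in cyclic order until the budget is spent."""
    heads = [0] * len(coins)
    flips = [0] * len(coins)
    for t in range(budget):
        i = t % len(coins)
        flips[i] += 1
        if rng.random() < coins[i]:  # coins[i] is the true heads probability
            heads[i] += 1
    return heads, flips


def biased_robin(coins, budget, rng):
    """ASSUMED rule: stay on a coin after heads, move to the next after tails."""
    heads = [0] * len(coins)
    flips = [0] * len(coins)
    i = 0
    for _ in range(budget):
        flips[i] += 1
        if rng.random() < coins[i]:
            heads[i] += 1             # success: keep flipping the same coin
        else:
            i = (i + 1) % len(coins)  # failure: move on to the next coin
    return heads, flips


def pick_best(heads, flips):
    """Report the coin with the highest posterior mean under a Beta(1,1) prior."""
    return max(range(len(flips)), key=lambda i: (heads[i] + 1) / (flips[i] + 2))


if __name__ == "__main__":
    rng = random.Random(0)
    coins = [0.3, 0.5, 0.7]  # hypothetical heads probabilities
    for policy in (round_robin, biased_robin):
        heads, flips = policy(coins, 30, rng)
        print(policy.__name__, flips, "-> picks coin", pick_best(heads, flips))
```

With the degenerate coins [0.0, 1.0, 0.0] and a budget of 10, the assumed biased robin rule spends one flip on the first coin and the remaining nine on the second, whereas round robin spreads its flips 4, 3, 3 regardless of the outcomes.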
After earning a PhD from Stanford, Russ Greiner worked in both academic and industrial research before settling at the University of Alberta, where he is now a Professor in Computing Science and the founding Scientific Director of the Alberta Ingenuity Centre for Machine Learning, which won the ASTech Award for "Outstanding Leadership in Technology" in 2006. He has been Program Chair for the 2004 "Int'l Conf. on Machine Learning", Conference Chair for 2006 "Int'l Conf. on Machine Learning", Editor-in-Chief for "Computational Intelligence", and is serving on the editorial boards of a number of other journals. He was elected a Fellow of the AAAI (Association for the Advancement of Artificial Intelligence) in 2007, and was awarded a McCalla Professorship in 2005-06 and a Killam Professorship in 2007. He has published over 100 refereed papers and patents, most in the areas of machine learning and knowledge representation. The main foci of his current work are (1) bioinformatics and medical informatics; (2) learning effective probabilistic models and (3) formal foundations of learnability.
Title: Privacy and Data Mining: New Developments and Challenges
Speaker: Stan Matwin, University of Ottawa
There is little doubt that data mining technologies create new challenges in
the area of data privacy. In this talk, we will review some of the new developments
in Privacy-preserving Data Mining. In particular, we will discuss ways
in which data mining results can reveal personal data, and how this can be prevented.
We will look at the practically interesting situations where data to be mined
is distributed among several parties. We will mention new applications in which
spatio-temporal data can lead to identification of personal information. We will argue that devising methods that effectively protect personal data, while at the same time preserving the quality of the data from the data analysis perspective, is one of the principal new challenges before the field.
Title: Mixed-Membership Stochastic Block-Models for Transactional Data
Speaker: Hugh Chipman, Acadia University
Time: Thursday, February 18, 2.30pm
Transactional network data arise in many fields. Although social network models have been applied to transactional data, these models typically assume binary relations between pairs of nodes. We develop a latent mixed membership model capable of modelling richer forms of transactional data. Estimation and inference are accomplished via a variational EM algorithm. Simulations indicate that the learning algorithm can recover the correct generative model. We further present results on a subset of the Enron email dataset. This is joint work with Mahdi Shafiei.
About the speaker: Dr. Hugh Chipman is a Canada Research Chair in Mathematical
Modelling at Acadia University, and the director of the Acadia Centre for Mathematical
Modelling and Computations. His research focuses on statistical models for extracting
information from large and complex datasets. He completed his PhD studies
at the University of Waterloo in 1994, and held a faculty position at the University
of Chicago before moving to Acadia. In 2009, he was awarded the CRM-SSC Prize
for his outstanding contributions to the application of Bayesian statistical
inference for data analysis.
Title: The trouble with community detection in complex networks
Speaker: Aaron Clauset, Santa Fe Institute
Date: Tuesday, March 30, 2.30pm
Abstract: Although widely used in practice, the performance of the popular network clustering technique called "modularity
maximization" is not well understood when applied to networks with unknown modular structure. In this talk, I'll show that precisely in
the case we want it to perform the best---that is, on modular networks---the modularity function Q exhibits extreme degeneracies,
in which the global maximum is hidden among an exponential number of high-modularity solutions. Further, these degenerate solutions can
be structurally very dissimilar, suggesting that any particular high-modularity partition, or statistical summary of its structure,
should not be taken as representative of the other degenerate solutions. These results partly explain why so many heuristics do
well at finding high-modularity partitions and why different heuristics can disagree on the modular composition of the same network.
I'll conclude with some forward-looking thoughts about the general problem of identifying network modules from connectivity data alone,
and the likelihood of circumventing this degeneracy problem.
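As a small illustration of the degeneracy described above, the following Python sketch computes the standard Newman-Girvan modularity, Q = sum over communities c of [e_c/m - (d_c/(2m))^2], where m is the number of edges, e_c the number of edges inside c, and d_c the total degree of the nodes in c. The ring of eight triangles and the three partitions are toy choices made for this example, not taken from the talk, and the helper names are hypothetical.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman-Girvan modularity Q of a partition of an undirected, unweighted graph."""
    m = len(edges)
    degree = defaultdict(int)
    internal = defaultdict(int)  # edges with both endpoints in the same community
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    degree_sum = defaultdict(int)  # total degree per community
    for node, k in degree.items():
        degree_sum[community[node]] += k
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


def ring_of_triangles(n):
    """n triangles; triangle i is joined to triangle i+1 (cyclically) by one edge."""
    edges = []
    for i in range(n):
        a, b, c = 3 * i, 3 * i + 1, 3 * i + 2
        edges += [(a, b), (b, c), (a, c)]      # the triangle itself
        edges.append((c, 3 * ((i + 1) % n)))   # link to the next triangle
    return edges


n = 8
edges = ring_of_triangles(n)
nodes = range(3 * n)

single = {v: v // 3 for v in nodes}                   # one community per triangle
pairs = {v: (v // 3) // 2 for v in nodes}             # triangles merged as (0,1), (2,3), ...
shifted = {v: ((v // 3 + 1) % n) // 2 for v in nodes}  # merged as (7,0), (1,2), ...

q_single = modularity(edges, single)
q_pairs = modularity(edges, pairs)
q_shifted = modularity(edges, shifted)
print(q_single, q_pairs, q_shifted)
```

On this toy graph all three partitions score Q = 0.625, even though they group the nodes differently, so the value of Q alone cannot single out one of them.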
Title: Modeling approaches for analyzing complex networks
Speaker: Edo Airoldi, Department of Statistics, FAS Center for Systems Biology, Faculty of Arts & Sciences, Harvard University
Date: Friday Sept 17, 2010, 9:30 a.m.
Abstract: Networks are ubiquitous in science and have become a focal point
for discussion in everyday life. Formal statistical models for the
analysis of network data have emerged as a major topic of interest in diverse areas of study, and most of these involve collections of
measurements on pairs of objects. Probability models on graphs date back to 1959. Along with empirical studies in social psychology and sociology
from the 1960s, these early works generated an active “network community” and a substantial literature in the 1970s. This effort moved
into the statistical literature in the late 1970s and 1980s, and the past decade has seen a burgeoning network literature in statistical
physics and computer science. The growth of the World Wide Web and the emergence of online “networking communities” such as Facebook and
LinkedIn, along with a host of more specialized professional network communities, have intensified interest in the study of networks and
network data. In this talk, I will review a few ideas that are central to this burgeoning literature, placing emphasis on modeling approaches
available for data analysis, and review some of the recent work that is going on in my group.
Speaker Bio: In December 2006, Dr. Airoldi received a Ph.D. from Carnegie Mellon, working on statistical machine learning and the analysis of complex systems with Stephen Fienberg and Kathleen Carley. His dissertation introduced statistical and computational elements of graph theory that support data analysis of complex systems and their evolution. Until December 2008, he was a postdoctoral fellow in the Lewis-Sigler Institute for Integrative Genomics at Princeton University, working with Olga Troyanskaya, David Botstein, and James Broach. He developed mechanistic models to gain computational insights into aspects of molecular and cellular biology that are not directly observable with experimental probes. Since then, he has been working closely with biologists in the areas of cellular differentiation, cellular development, and cancer.
Speaker URL: http://www.people.fas.harvard.edu/~airoldi/
Title: Facebook as a data capture site: Techniques, Traps, Terms and Conditions
careful and minimal data collection efforts rather than a cavalier approach indifferent to the risks of real world data. I also wish to make an appeal to technical fields whose ethics procedures tend to be inadequate for this sort of semi-private and sensitive data.
Slides -- Slides from March 25 seminar in the Social Media Lab
Bernie Hogan is a Research Fellow at the Oxford Internet Institute. He specializes in novel methods for online data capture and analysis,
especially via social media. Recent work has focused on the capture and analysis of Facebook networks, particularly through his application
namegenweb, which downloads a social network for visualization in network programs such as NodeXL. Past work included an online audit
study of racism on Craigslist, pen and paper methods for visualizing social networks, the analysis of profile photos and techniques for
online surveys of spouses and partners. Bernie completed his dissertation at the University of Toronto in 2009 under Barry
Wellman. This thesis won the Dordick award for Best Dissertation from the Communication and Technology section of the International
Speaker's contact info:
Dr Bernie Hogan
Research Fellow, Oxford Internet Institute
University of Oxford
Title: Towards Summarizing and Making Sense of the Blogosphere
Speaker: Prof. Denilson Barbosa
Department of Computing Science, Univ. of Alberta
Date: Thursday March 15, 2012
Time: 2:40 p.m.
Location: Jacob Slonim Conference Room (430), Computer Science
6050 University Avenue, Halifax
The extraction of structured information from text is a fast-improving subfield of Natural Language Processing which has been re-invigorated by the ever-increasing availability of user-generated textual content online. One environment which stands out as a source of invaluable information is the blogosphere, the network of social media sites in which individuals express and discuss opinions, facts, events, and ideas pertaining to their own lives, their community, profession, or society at large. Indeed, the automatic extraction of reliable information from the blogosphere promises a viable approach for discovering very rich social data: the issues that engage society in thousands of collective and parallel conversations online. Considerable attention has been given to the problem of automatically extracting and studying the social dynamics among the participants (i.e., authors) in shared environments like the blogosphere. In that line of work, the goal is to understand how the network of humans conversing in the blogosphere is formed, evolves over time, and influences others in their own opinions. Our goal, on the other hand, is to extract the network of entities, facts, ideas and opinions expressed in social media sites, as well as the relationships among them. Such structured data can be organized as one or more information networks, which in turn are powerful metaphors for the study and visualization of various kinds of complex systems. In this talk, I will cover the basic NLP tools that are necessary for automatically extracting information networks from social media text, relying to a large extent on the experience gathered in our ongoing SONEX project.
I am an Associate Professor at the University of Alberta, which I joined in 2008. I completed my Ph.D. at the University of Toronto in 2005, working on XML data management, and held an academic position at the University of Calgary between 2005 and 2008. I am interested in databases on their own merit, and also in the application of database and information retrieval principles to the management of linked data. I am a member of the NSERC Strategic Network on Business Intelligence, where I work on information extraction, and the Canadian Writing Research Collaboratory, where I work on text mining, data management for prosopography, and document engineering.
Speaker url: http://webdocs.cs.ualberta.ca/~denilson/
Host: Evangelos Milios