Seminars / 2010-2011

June 08, 2011, 11:00
This seminar will be given on Wednesday, in collaboration with MLG.

The framework of this seminar is the problem of density estimation on
structured data (finite strings or trees). The most widely used model for
this kind of task is the Hidden Markov Model, or HMM, whose training is
done by the Baum-Welch algorithm, which maximizes the likelihood of a
training sample. It is then necessary to limit the expressiveness of the
considered HMM family by setting a maximum number of hidden states, in
order to avoid overfitting. This is usually done using a priori knowledge
or heuristics.

May 27, 2011, 11:00
The Belgian Federal Police maintains a database of all recorded criminal facts in the country. For each fact that has been solved, one or several authors are known. One can then construct a network of crime authors by connecting authors to the crimes in which they are involved. In this seminar, I will present this unusual database and its particularities, and I will address the problem of community detection in this network.

April 29, 2011, 11:00
Since the beginning of the nineties, the Internet has undergone impressive growth. This growth can be appreciated in terms of the equipment, such as routers and links, that has been added, as well as in the number of users and the value of the commerce it supports. In parallel with this expansion, over the past decade the networking research community has shown a growing interest in discovering and analyzing the Internet topology. Some researchers have developed tools for gathering network topology data, while others have tried to understand and model the Internet's properties. These efforts have brought us to a crucial juncture for topology measurement infrastructures: while, previously, these were both small (in terms of the number of measurement points) and monolithic, we are starting to see the deployment of large-scale distributed systems composed of hundreds or thousands of monitors. As we look forward to this next generation of systems, we take stock of what has been achieved so far. In this survey, we discuss past and current mechanisms for discovering the Internet topology at various levels: the IP interface, the router, and the AS level. In addition to discovery techniques, we provide insights into some of the well-known properties of the Internet topology.

April 18, 2011, 10:00
This seminar will be given on Monday at 10am.

Understanding the dynamics of large-scale networks is a major challenge facing the network research community.
Traditional graph-theoretic approaches have their own limitations and are not applicable due to the large size and dynamic nature of the network. Against this background, my talk addresses two different issues in technological and social networks. The first half of my talk is directed towards understanding the resilience and emergence of technological networks, specifically superpeer networks. We propose an analytical framework to measure the resilience of superpeer networks in the face of peer churn and attacks. On the other hand, it is not obvious why the bootstrapping of peer nodes and other local dynamics result in the appearance of a bimodal degree distribution in superpeer networks such as Gnutella. We develop a formalism which enables us to explain the emergence of this bimodal network in the face of dynamics such as peer bootstrapping, churn, and link rewiring. Further analysis leads us to formulate interesting bootstrapping protocols such that the superpeer network evolves with desired topological properties.
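The resilience question in the abstract above can be illustrated with a toy experiment: build a small bimodal (superpeer/peer) topology and compare the size of the largest connected component before and after a targeted attack that removes the high-degree superpeers. This is only a sketch under simplifying assumptions; the ring-of-superpeers topology, the node labels, and the parameters are illustrative, not the talk's actual model.

```python
from collections import deque

def largest_component(adj):
    """Size of the largest connected component of an undirected graph."""
    seen, best = set(), 0
    for s in adj:
        if s in seen:
            continue
        q, size = deque([s]), 0
        seen.add(s)
        while q:
            u = q.popleft()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    q.append(v)
        best = max(best, size)
    return best

def make_superpeer_net(n_super=4, leaves=5):
    """Toy bimodal topology: a ring of superpeers, each serving `leaves` peers."""
    adj = {}
    def add_edge(u, v):
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    for s in range(n_super):
        add_edge(("sp", s), ("sp", (s + 1) % n_super))  # superpeer ring
        for l in range(leaves):
            add_edge(("sp", s), ("peer", s, l))          # attached ordinary peers
    return adj

adj = make_superpeer_net()  # degrees are bimodal: superpeers 7, peers 1

# Targeted attack: delete the high-degree superpeers and their edges.
attacked = {u: {v for v in nb if v[0] != "sp"}
            for u, nb in adj.items() if u[0] != "sp"}
print(largest_component(adj), largest_component(attacked))  # 24 1
```

The network collapses from one component of 24 nodes to isolated peers, the extreme fragility under targeted attack that bimodal superpeer topologies are known for.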
April 07, 2011, 11:00
The shortest-path problem is a central problem in graph theory and provides a formalism which can be used to describe and analyse numerous problems. For decades, the operations research community has been studying the intimate link that relates this theory to universal algebra. This connection is achieved through the use of algebraic structures such as semirings and dioids. The goal of this presentation is to underline these links and to provide pointers to various resources and possibilities offered by this particular perception of Euler's legacy.

April 01, 2011, 11:00
When does a given finite set of linear operators possess a common invariant cone? This question has found important applications in recent papers on neural and gene networks, linear switched systems, functional equations, etc. In fact, linear operators with an invariant cone inherit many properties of nonnegative matrices (Perron-Frobenius theory, etc.). For one operator this problem was solved in the classical works of Krein, Rutman and Vandergraft. For several operators it is more difficult. We present a sharp criterion for the existence of a common invariant cone for an arbitrary set of matrices. It allows us, at least theoretically, to decide the existence of a common invariant cone just by comparing two special numbers depending on the matrices. However, computing those numbers is hard. In fact, we show that this problem for arbitrary matrices with integer entries is algorithmically undecidable. On the other hand, some corollaries of the criterion lead to simple necessary and sufficient conditions. Finally, we formulate an approximate analogue of the problem and introduce a ``co-directional number'' of several matrices. Possible approaches to its computation are discussed.

March 25, 2011, 11:00
Presentation of the results from her master's thesis.
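Returning to the April 07 abstract on shortest paths and semirings: the algebraic view can be sketched by writing Floyd-Warshall generically over a semiring (plus, times, zero, one), then instantiating it with (min, +) to recover classical shortest paths. This is a minimal illustration, not material from the talk; the function name and example weights are hypothetical, and convergence assumes a well-behaved (e.g. idempotent) semiring.

```python
import math

def algebraic_paths(w, plus, times, zero, one):
    """Generic Floyd-Warshall over a semiring (plus, times, zero, one).

    w[i][j] is the weight of arc i -> j, or `zero` if the arc is absent.
    Returns the matrix of semiring path weights (diagonal initialized to `one`).
    """
    n = len(w)
    d = [[one if i == j else w[i][j] for j in range(n)] for i in range(n)]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                # plus aggregates alternative paths; times concatenates arcs.
                d[i][j] = plus(d[i][j], times(d[i][k], d[k][j]))
    return d

INF = math.inf
W = [[INF, 3, 8],
     [INF, INF, 2],
     [INF, INF, INF]]

# The (min, +) semiring yields classical shortest paths.
sp = algebraic_paths(W, min, lambda a, b: a + b, INF, 0.0)
print(sp[0][2])  # min(8, 3 + 2) = 5
```

Swapping in (max, min) over capacities gives widest (bottleneck) paths from the same code, which is the kind of uniformity the dioid formalism is after.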
March 18, 2011, 11:00
For undirected graphs, it is well known how to obtain the expected inter-vertex commute times from the graph Laplacian matrix. We show that the same formulas hold in the case of strongly connected directed graphs. Our result is obtained by deriving a close relation between the Laplacian, with a particular scaling, and the so-called fundamental matrix of an associated random walk over the graph. We find that the commute times still form a metric and give bounds in terms of the stationary probabilities of the random walk. We compare these commute times with those obtained from a previously proposed symmetrized Laplacian derived from a related weighted undirected graph.

March 07, 2011, 12:00
This seminar will be given on Monday.

Our contemporary societies rely more and more on a steady and reliable power supply to function well. During the last few decades a number of large-scale power blackouts have been witnessed around the world, and this has caused major concern among politicians and citizens. In this talk we will discuss a few major power blackouts, the sequence of events involved, and why they occurred. These empirical examples show that major power blackouts are often the result of a cascade of failures (a ``domino effect''). We will introduce a generic (random walk) model for the study of cascading failures in networks, and investigate the impact of transient dynamics caused by the redistribution of loads after an initial network failure (triggering event). It is found that considering only the stationary states, as has been done in the past, may dramatically overestimate (by 80-95%) the robustness of the network. This is due to transient oscillations or overshooting in the loads as the flow dynamics adjusts to the new (remaining) network structure. Consequently, additional parts of the network may be overloaded and therefore fail before the stationary state is reached.
The dynamical effects are strongest on links in the neighborhood of broken links. This can cause failure waves in the network along neighboring links, while the evaluation of the stationary solution predicts a different order of link failures.

March 01, 2011, 15:00
This seminar will be given on Tuesday as a Cesame seminar.

The main goal of whole-transcriptome analyses is to identify, characterize and catalogue all the transcripts expressed within a specific cell or tissue - at a particular stage - with the great potential to determine the correct splicing patterns and the precise structure of genes, and to quantify the differential expression of transcripts in both physiological and pathological conditions. Until 2004, hybridization and tag-based technologies, such as microarrays, allowed researchers to obtain intriguing insights into human genetics, even though microarray techniques suffer from background cross-hybridization issues and a narrow detection range, and tag-based approaches require laborious, time-consuming and costly steps for the cloning of fragments prior to sequencing. Recently, the introduction of massively parallel sequencing on Next Generation Sequencing (NGS) platforms has completely revolutionized the way of thinking in molecular biology. Among the different applications of NGS platforms, RNA-Seq is probably one of the most complex of the various sequencing protocols developed so far. RNA-Seq has clear advantages over the existing approaches. First, RNA-Seq is not limited to the detection of known transcripts, thus allowing the identification, characterization and quantification of new splice isoforms. In addition, it permits researchers to determine the correct gene annotation, also defining the transcriptional boundaries of genes. Other advantages of RNA-Seq are the low "background signal", the absence of an upper limit for quantification and, consequently, the larger dynamic range of expression levels over which transcripts can be detected.
RNA-Seq data also show high levels of reproducibility. In this seminar we will illustrate some of the computational tools available for the analysis of RNA-Seq data and discuss the main challenges faced when analyzing RNA-Seq experiments. In particular we will focus on: 1) identification and quantification of transcriptional regions; 2) identification and quantification of isoforms; 3) detection of differentially expressed events between two or more experimental conditions; 4) the connection between RNA-Seq and ChIP-Seq data.

February 24, 2011, 14:00
The unprecedented amount of data from mobile phones creates new possibilities to analyze various facets of human behavior. Over the last few years, much attention has been devoted to studying the mobility patterns of humans. In this presentation we will focus on unusually large gatherings of people, i.e. social events. We introduce a methodology for detecting such social events in massive mobile phone data, based on a Bayesian location inference framework. More specifically, we also develop a framework for deciding who is attending an event. We demonstrate the method on a few examples. Finally we suggest some future approaches for event detection, and some possible analyses of the detected social events.

February 18, 2011, 12:00
This presentation will focus on the detection and localization of communities in France based on phone calls.

January 27, 2011, 11:00
Survivability in optical networks is an important issue due to the huge bandwidth offered by optical technology. Survivability means that the network has the ability to maintain an acceptable service level even after an occurrence of failures within the network. In this talk, we first review and classify the various survivability mechanisms proposed in the literature. Then we focus on p-cycle design. The major challenge of p-cycle design resides in finding an optimal set of p-cycles protecting the network for a given working capacity.
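The p-cycle idea in the January 27 abstract rests on one structural fact: a pre-configured cycle offers one protection path to spans lying on the cycle and two protection paths to straddling spans (chords whose endpoints are both on the cycle). A minimal sketch of that classification, with hypothetical node labels and spans, not the talk's design method:

```python
def pcycle_protection(cycle, spans):
    """Classify working spans by how the p-cycle `cycle` protects them.

    A span on the cycle is protected by one backup path (the rest of the
    cycle); a straddling span (both endpoints on the cycle, but not a cycle
    edge) is protected by two; any span with an endpoint off the cycle is
    unprotected by this cycle.
    """
    on_cycle = {frozenset((cycle[i], cycle[(i + 1) % len(cycle)]))
                for i in range(len(cycle))}
    nodes = set(cycle)
    protection = {}
    for span in spans:
        s = frozenset(span)
        if s in on_cycle:
            protection[span] = 1   # on-cycle: one protection path
        elif s <= nodes:
            protection[span] = 2   # straddling: two protection paths
        else:
            protection[span] = 0   # endpoint off the cycle: unprotected
    return protection

# Hypothetical 5-node p-cycle A-B-C-D-E with a chord span B-E and an
# off-cycle span C-F.
print(pcycle_protection(["A", "B", "C", "D", "E"],
                        [("A", "B"), ("B", "E"), ("C", "F")]))
```

The optimization challenge the abstract mentions is then to choose a set of cycles maximizing such protected working capacity for a given spare-capacity budget.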
January 19, 2011, 14:00
The structure of complex networks is often described as being composed of several groups of nodes with many links inside the groups but only a few between them. These groups are called communities, and we will study the detection of such communities in evolving networks. We will briefly present some existing methods. Then we will show that classical static community detection algorithms fail to discover a meaningful structure because of their instability, and we propose a modification to stabilize one of them. Finally we will present a new way to detect communities that are good on average, and not only at one moment.

December 16, 2010, 11:00
Classical spectral clustering methods are relaxations of NP-hard graph
partitioning problems. These relaxations lead to eigenvalue problems of
a graph Laplacian matrix. The relaxed solutions correspond to particular
eigenvectors which contain the grouping information. A different
formulation of spectral clustering has recently been introduced as a
primal-dual optimization setting allowing natural out-of-sample
extensions and model selection. The formulation in the dual corresponds
to an eigenvalue problem and can be interpreted as a weighted version of
kernel PCA for a particular choice of weights.

December 09, 2010, 11:00
With modern technology developing fast, most entities can be observed from different perspectives. This multiple-view information allows us to find better patterns, provided we integrate the views in an appropriate way. Clustering by integrating multi-view representations that describe the same class of entities has therefore become a crucial issue for knowledge discovery. By extending graph Laplacian-based spectral clustering, we formulate a hybrid clustering of multiple graphs as a simultaneous trace maximization. An alternating least squares mechanism is adopted to obtain the optimal subspace while the effect of each graph is leveraged. A pre-processing step of dimension reduction based on the multilinear singular value decomposition (MLSVD) is employed for large-scale data. Our hybrid clustering algorithm is applied to the real application of scientific publication analysis and is cross-compared with the relevant hybrid clustering methods.

November 24, 2010, 11:00
Given a weighted directed graph with a selected root node and a set of terminal nodes, the directed Steiner tree problem (DSTP) is to find a directed tree of minimal weight which is rooted in the root node and spans all terminal nodes. We consider the DSTP with a constraint on the total number of arcs (hops) in the tree. This problem is known to be NP-hard, and therefore only heuristics can be applied in the case of large-scale instances.

November 23, 2010, 11:00
We study the spreading of information in a mobile phone network by using time-stamped data for an SI model. In spite of the fact that the aggregate network is a small world, the spreading is surprisingly slow. By using an appropriate set of null models we identify the correlations that are mostly responsible for this slowing down: topological correlations and their relation to the intensity of interactions, as well as the bursty character of activities.
We give a detailed analysis of burstiness and conclude that none of the current models is able to describe the empirical findings.

November 03, 2010, 11:30
We address the problem of fitting curves to data points on a manifold M. Regression serves at least two goals: (i) reducing noise on measurements and (ii) filling the gaps where data is missing.

November 03, 2010, 11:00
In search engines, it is critical to be able to sort web pages according to their relative importance. The importance of a page can be seen as the average portion of time spent on it by an infinite random walk on the web. Since the web can be modeled as a graph, the importance of a page can be measured by the PageRank of its corresponding node. Furthermore, the problem of optimizing the PageRank of a node in a directed graph when the presence of some links is uncertain has been intensively studied lately. Here, we explain how this problem can be formulated as a Markov decision process. Classical algorithms can then be used to solve this new formulation of the problem. Among these, one algorithm, built following the principle of policy iteration, seems to perform very well in practice. Yet its theoretical behavior is unknown. In this talk, we study the complexity of this algorithm and show that it converges in polynomial time under the natural assumption that the random walk restarts with a fixed probability, called damping. We also try to extend our result to the case in which there is no damping, and we give some clues towards new results in this direction.

October 21, 2010, 11:00
Quadrilateral surface meshes are sometimes considered superior to triangular meshes for finite element simulations. Discussions about whether and why quadrilaterals are better than triangles are usually passionate in the finite element community. Here, we will not try to argue about that thorny question: we assume that quadrilateral meshes are useful for some people.
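The PageRank quantity discussed in the November 03 abstract above, the stationary share of time a damped random walk spends on each page, can be sketched with a plain power iteration. This is a generic illustration, not the talk's policy-iteration algorithm; the tiny link graph and function name are hypothetical.

```python
def pagerank(links, alpha=0.85, iters=100):
    """Damped power iteration on a small link graph.

    `links[u]` lists the pages u points to. With probability 1 - alpha the
    walk restarts uniformly at random (the damping assumption the talk's
    polynomial-time result relies on). Dangling pages are treated as
    linking to every page.
    """
    nodes = sorted(links)
    n = len(nodes)
    r = {u: 1.0 / n for u in nodes}
    for _ in range(iters):
        nxt = {u: (1.0 - alpha) / n for u in nodes}
        for u in nodes:
            out = links[u] if links[u] else nodes  # dangling -> uniform
            share = alpha * r[u] / len(out)
            for v in out:
                nxt[v] += share
        r = nxt
    return r

# Tiny web: pages "a" and "b" both point at hub "c", which links back to "a".
r = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
print(max(r, key=r.get))  # the hub "c" accumulates the most rank
```

Turning some links on or off to maximize one node's value of `r` is exactly the uncertain-links optimization the abstract casts as a Markov decision process.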
October 14, 2010, 11:00
Complex networks appear in various contexts and are at the center of many
applied and theoretical problems. Typical examples are internet topology,
mobility graphs, peer-to-peer exchanges, web graphs, and social networks.
All these graphs have nontrivial properties in common and raise similar
questions, which makes it relevant to study them as a whole.
Most of these networks evolve with time, with the appearance and disappearance
of nodes and links. Studying these dynamics is a crucial topic, which has
however received limited attention until now.

September 08, 2010, 11:30
Problems of coordinated motion of vehicular formations have received much attention lately, especially in the areas of formation flight and so-called vehicular platoons. We investigate fundamental limitations that the use of only local feedback control imposes on the coherence of large formations that are subjected to stochastic disturbances. The notion of coherence translates roughly into how closely such formations can resemble a solid object, and is unrelated to the notion of string instability. This question is also of interest in the context of naturally occurring swarms, such as birds in flying formations and schools of fish. Similar dynamical phenomena occur in distributed load balancing in parallel processing and in consensus-type algorithms. We examine these dynamical effects using the network topology of regular lattices, and investigate the role of topological dimension. A common phenomenon appears in which a higher spatial dimension implies a more favorable scaling of coherence measures, with dimensions of 3 and 5 being necessary to achieve coherence in consensus and vehicular formations, respectively, with only local feedback. We show that microscopic error measures that involve only neighboring vehicles scale much more favorably, and in some cases do not suffer from this effect. This phenomenon reflects the fact that in lower spatial dimensions, local stabilizing feedbacks are relatively unable to regulate against spatially large-scale disturbances, resulting in an unregulated, "undulating" type of motion for the entire formation. We point out connections between this analysis and the statistical mechanics of harmonic solids, where such phenomena have long been observed, as well as connections with the theory of resistive lattices, as has been observed by others.
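The resistive-lattice connection mentioned at the end of the abstract above can be made concrete: coherence-type measures on a lattice reduce to effective resistances of the same lattice viewed as a unit-resistor network. A minimal sketch (pure Gaussian elimination on the grounded Laplacian; the function name and the ring example are illustrative, not the talk's formations):

```python
def effective_resistance(n_nodes, edges, a, b):
    """Effective resistance between nodes a and b in a unit-resistor network.

    Grounds node 0, injects a unit current at a and extracts it at b, and
    solves the reduced Laplacian system for the node potentials; the
    resistance is the potential difference between a and b.
    """
    # Graph Laplacian of the resistor network.
    L = [[0.0] * n_nodes for _ in range(n_nodes)]
    for u, v in edges:
        L[u][u] += 1.0
        L[v][v] += 1.0
        L[u][v] -= 1.0
        L[v][u] -= 1.0
    # Drop the row and column of the grounded node 0 (makes L invertible).
    m = n_nodes - 1
    A = [[L[i + 1][j + 1] for j in range(m)] for i in range(m)]
    rhs = [0.0] * m
    if a > 0:
        rhs[a - 1] += 1.0
    if b > 0:
        rhs[b - 1] -= 1.0
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        rhs[col], rhs[piv] = rhs[piv], rhs[col]
        for r in range(col + 1, m):
            f = A[r][col] / A[col][col]
            rhs[r] -= f * rhs[col]
            for c in range(col, m):
                A[r][c] -= f * A[col][c]
    # Back substitution gives the node potentials (node 0 held at 0 V).
    v = [0.0] * m
    for r in range(m - 1, -1, -1):
        v[r] = (rhs[r] - sum(A[r][c] * v[c] for c in range(r + 1, m))) / A[r][r]
    pot = [0.0] + v
    return pot[a] - pot[b]

# On a ring of n unit resistors (a 1-D lattice with periodic boundary), the
# resistance across ring distance d is d * (n - d) / n: it grows with n,
# mirroring the poor coherence scaling of 1-D formations.
n = 6
ring = [(i, (i + 1) % n) for i in range(n)]
print(effective_resistance(n, ring, 0, 2))  # 2 * 4 / 6 = 1.333...
```

In higher-dimensional lattices the analogous resistances stay bounded, which is the same dimension dependence the talk identifies for coherence under local feedback.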