List of our seminars

(reverse chronological order)

DAPA seminar of 13 / 9 / 2018 at 14:00

Diurnal variations of psychometric indicators in Twitter content

Fabon Dzogang (Intelligent Systems Laboratory, University of Bristol, Bristol, UK)

Venue: LIP6, room 101 (1st floor), corridor 26-00, 4 place Jussieu, 75005 Paris

The psychological state of a person is characterised by cognitive and emotional variables which can be inferred by psychometric methods. Using the word lists from the Linguistic Inquiry and Word Count, designed to infer a range of psychological states from the word usage of a person, we studied temporal changes in the average expression of psychological traits in the general population. We sampled the contents of Twitter in the United Kingdom at hourly intervals for a period of four years, revealing a strong diurnal rhythm in most of the psychometric variables, and finding that two independent factors can explain 85% of the variance across their 24-h profiles. The first has peak expression time starting at 5am/6am; it correlates with measures of analytical thinking, with the language of drive (e.g., power and achievement), and with personal concerns. It is anticorrelated with the language of negative affect and social concerns. The second factor has peak expression time starting at 3am/4am; it correlates with the language of existential concerns, and anticorrelates with expression of positive emotions. Overall, we see strong evidence that our language changes dramatically between night and day, reflecting changes in our concerns and underlying cognitive and emotional processes. These shifts occur at times associated with major changes in neural activity and hormonal levels.


Fabon obtained his PhD in Computer Science in 2013, on “Learning and Representation from Texts for both Emotional and dynamical Information”, at the Université Pierre et Marie Curie, in the DAPA department at LIP6. After graduating he held a short post-doctoral position at LIP6, working on building interpretable models for the classification of multivariate time series data. During this time he developed an interest in the analysis of time series data, and in the Fourier transform as a means to extract meaningful features from data. He joined the University of Bristol as a research associate in 2014, where he worked on efficient machine learning algorithms for data streams, and developed tools to study human behaviour at a collective level via the analysis of social media and large samples of press archives. He combined his work on information dynamics and his interest in the study of emotions to research periodic patterns of emotions and mental health. His results provide evidence that a share of the variance in our collective behaviours and emotions is predictable across the year, and over the 24-h cycle.

More information about Fabon Dzogang:

DAPA seminar of 28 / 6 / 2018 at 14:00

A Cognitive Architecture for Object Recognition in Video

Jose C. Principe (Computational NeuroEngineering Lab, University of Florida)

Venue: LIP6, room 105 (1st floor), corridor 25-26, 4 place Jussieu, 75005 Paris

This talk describes our efforts to abstract from the animal visual system the computational principles to explain images in video. We develop a hierarchical, distributed architecture of dynamical systems that self-organizes to explain the input imagery using an empirical Bayes criterion with sparseness constraints and dual state estimation. The interpretation of the images is mediated through causes that flow top down and change the priors for the bottom up processing. We will present preliminary results on several data sets.


Jose C. Principe (M’83-SM’90-F’00) is a Distinguished Professor of Electrical and Computer Engineering and Biomedical Engineering at the University of Florida, where he teaches advanced signal processing, machine learning and artificial neural network (ANN) modeling. He is BellSouth Professor and the Founder and Director of the University of Florida Computational NeuroEngineering Laboratory (CNEL). His primary area of interest is processing of time varying signals with adaptive neural models. The CNEL Lab has been studying signal and pattern recognition principles based on information theoretic criteria (entropy and mutual information).

Dr. Principe is an IEEE Fellow. He is a past Chair of the Technical Committee on Neural Networks of the IEEE Signal Processing Society, Past-President of the International Neural Network Society, and past Editor-in-Chief of the IEEE Transactions on Biomedical Engineering. He is a member of the Advisory Board of the University of Florida Brain Institute. Dr. Principe has more than 800 publications. He has directed 92 Ph.D. dissertations and 65 Master's theses. In 2000 he wrote an interactive electronic book entitled “Neural and Adaptive Systems”, published by John Wiley and Sons, and more recently co-authored several books: “Brain Machine Interface Engineering” (Morgan and Claypool), “Information Theoretic Learning” (Springer), and “Kernel Adaptive Filtering” (Wiley).

More information about Jose C. Principe:

DAPA seminar of 1 / 2 / 2018 at 10:00

Machine Learning @dailymotion: Toward better content understanding and more accurate recommendation

Yves Mabiala (Dailymotion)

Venue: LIP6, room 105 (1st floor), corridor 25-26, 4 place Jussieu, 75005 Paris

In this talk I will describe two of the main subjects the data science team at Dailymotion is focusing on. I will start by describing how a video is automatically characterized in terms of verticals (sport, music, ...) and topics (coming from Wikipedia) using multi-modal approaches based on the sound and the images of the video as well as the text describing it. I will then describe how we are able to pick, out of a catalog of 250 million videos, the most relevant videos for millions of users, in particular using sequence models for session-based recommendation.


Yves Mabiala is a data scientist leading the data science team at Dailymotion. He is currently working on large scale recommendation problems and content characterization from raw signals (audio, video).
Prior to Dailymotion he worked at Thales as a research scientist in the data science lab, where he focused on large scale unsupervised anomaly detection in cyber-security, credit card fraud detection, and unsupervised sequence learning, especially applied to predictive maintenance.
He was also a member of the LIP6/Thales joint lab, where he worked with the ComplexNetwork team on studying the dynamics of large graphs and with the MLIA team on time series representation learning.

DAPA seminar of 18 / 1 / 2018 at 10:30

Circadian Mood Variations in Twitter Content

Fabon Dzogang (Intelligent Systems Laboratory (ISL), University of Bristol, Bristol, UK)

Venue: LIP6, room 105 (1st floor), corridor 25-26, 4 place Jussieu, 75005 Paris

Circadian regulation of sleep, cognition, and metabolic state is driven by a central clock, which is in turn entrained by environmental signals. Understanding the circadian regulation of mood, which is vital for coping with day-to-day needs, requires large datasets and has classically utilised subjective reporting. We use a massive dataset of over 800 million Twitter messages collected over the course of 4 years in the United Kingdom. We extract robust signals of the changes that happen over the course of the day in the collective expression of emotions and fatigue. We use methods of statistical analysis and Fourier analysis to identify periodic structures, extrema, and change-points, and compare the stability of these events across seasons and weekends. We reveal strong, but different, circadian patterns for positive and negative moods. The cycles of fatigue and anger appear remarkably stable across seasons and weekend/weekday boundaries. Positive mood and sadness interact more in response to these changing conditions. Anger and, to a lesser extent, fatigue show a pattern that inversely mirrors the known circadian variation of plasma cortisol concentrations. Most quantities show a strong inflexion in the morning. Since circadian rhythm and sleep disorders have been reported across the whole spectrum of mood disorders, we suggest that analysis of social media could provide a valuable resource for the understanding of mental disorders.
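
As a rough illustration of the Fourier step mentioned in the abstract (and not the authors' actual pipeline), the sketch below finds the dominant period of an hourly series with a plain discrete Fourier transform. The synthetic signal, the one-week window, and the function names `dft_power` and `dominant_period` are assumptions made for this example.

```python
import cmath
import math


def dft_power(series):
    """Return the spectral power at each non-negative frequency bin."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]  # remove the constant offset
    power = []
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        power.append(abs(coeff) ** 2)
    return power


def dominant_period(series):
    """Return the period (in samples) of the strongest non-constant component."""
    power = dft_power(series)
    best_bin = max(range(1, len(power)), key=lambda k: power[k])
    return len(series) / best_bin


# Hypothetical hourly indicator over 7 days: a 24-hour cycle plus a weaker
# 12-hour harmonic (synthetic data, for illustration only).
signal = [
    5.0
    + 2.0 * math.cos(2 * math.pi * h / 24)
    + 0.5 * math.cos(2 * math.pi * h / 12)
    for h in range(24 * 7)
]
period_hours = dominant_period(signal)
```

On real data the series would be noisy, so one would usually inspect the whole spectrum and test the stability of each peak across seasons or weekdays instead of trusting a single maximum.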


Fabon defended his PhD thesis in Computer Science in 2013, on “Learning and Representation from Texts for both Emotional and dynamical Information”, at the Université Pierre et Marie Curie, in the DAPA department at LIP6. After graduating he held a short post-doctoral position at LIP6, working on building interpretable models for the classification of multivariate time series data. During this time he developed an interest in the analysis of time series data, and in the Fourier transform as a means to extract meaningful features from data. He joined the University of Bristol as a research associate in 2014, where he worked on efficient machine learning algorithms for data streams, and developed tools to study human behaviour at a collective level via the analysis of social media and large samples of press archives. He combined his work on information dynamics and his interest in the study of emotions to research periodic patterns of emotions and mental health. His results provide evidence that a share of the variance in our collective behaviours and emotions is predictable across the year, and over the 24-h cycle.

More information about Fabon Dzogang:

DAPA seminar of 29 / 11 / 2017 at 14:00

Exploring the Trade-Offs of Web Interfaces to Support Live Queries over (Semantic) Web Data

Olaf Hartig (Linköping University)

Venue: LIP6, room 405 (4th floor), corridor 24-25, 4 place Jussieu, 75005 Paris

In the context of the Linked Open Data effort, a significant number
of public SPARQL endpoints have been made available on the Web to provide
query-based access to various types of datasets. Many such endpoints have
sacrificed high availability because maintaining a server that provides a
reliable SPARQL endpoint is costly. To address this issue we have started
investigating approaches that shift some of the effort of executing queries
from the server to the clients; these approaches rely only on data access
interfaces that are limited to simple types of requests. In this two-part
talk I will first introduce two such interfaces and present experimental
results that highlight their respective properties. Thereafter, in the second
part of the talk, I will introduce an abstract machine model that allows us to
study such client-server scenarios formally. I will present results of such a
study based on which we have drawn a fairly complete expressiveness lattice
that shows the interplay between several combinations of client and server
capabilities. Additionally, I will show the usefulness of our model to
formally analyze the fine-grained interplay between several metrics such as the
number of requests sent to the server, and the bandwidth of communication
between client and server.


Olaf is an Assistant Professor at the Department of Computer and
Information Science of Linköping University. He holds a Ph.D. in Computer
Science from the Humboldt-Universität zu Berlin, and worked previously as a
postdoctoral research fellow at the Cheriton School of Computer Science at the
University of Waterloo and, thereafter, at the Hasso Plattner Institute,
Potsdam. Olaf is interested in problems related to the management of data and
databases. His focus in this broad context is on data on the Web and on graph
data, as well as on problems in which the data is distributed over multiple,
autonomous and/or heterogeneous sources. Regarding these topics, Olaf's
interests range from systems-building related research (e.g., efficient storage
of data, query processing, and query optimization) all the way to theoretical
foundations (e.g., complexity and expressive power of query languages). Olaf
was honored with the SWSA Distinguished Dissertation Award in 2015 for his
Ph.D. dissertation “Querying a Web of Linked Data: Foundations and Query
Execution,” and he has received two best research paper awards (ESWC 2009 and
ESWC 2015). Olaf leads or contributes to several open source projects,
most notably SQUIN, which is a novel query processing system for the Semantic
Web. He co-organized international research workshops, served on multiple
program committees, and participated as an invited expert in the provenance
incubator group and the provenance working group of the World Wide Web

More information about Olaf Hartig:

DAPA seminar of 2 / 10 / 2017 at 13:00

Deep learning and music generation

Jean-Pierre Briot (LIP6, UPMC)

Venue: LIP6, room 105 (1st floor), corridor 25-26, 4 place Jussieu, 75005 Paris

Deep learning has established itself in the landscape of data-driven machine learning, with large-scale applications in image recognition, speech recognition and translation. Because of its DNA, inherited from artificial neural networks and linear regression, it is naturally well suited to prediction and classification tasks. Recent work has applied it to content generation (images, text and music), benefiting from its ability to learn from a corpus and thus to learn a style. Current challenges include the ability to impose global constraints on generation (e.g., tonality, structure…) and to encourage originality in the generated content, which is not the primary objective of deep learning. We will present various approaches, identified from the analysis of numerous scientific articles and recent works in this very active field, such as: control of sample generation (sampling), manipulation of input data, generative adversarial architectures (GAN), reinforcement learning, and the selection and concatenation of musical units. We will present a few representative examples of such approaches.

This talk is based on the recent draft book on the subject, written in collaboration with Gaëtan Hadjeres and François Pachet:


Jean-Pierre Briot is a CNRS Research Director and a member of LIP6, in the SMA team of the DESIR department. Having mainly worked on programming and design models for adaptive and cooperative software (objects, concurrent actors, distributed components, agents), he has recently returned to computer music, which he first took up during his PhD thesis between IRCAM and LITP (one of the founding laboratories of LIP6) in the mid-1980s, and has also recently become interested in the phenomenon of deep learning.

More information about Jean-Pierre Briot:

DAPA seminar of 4 / 7 / 2017 at 10:00

Big Data Analytics using Deep Learning and Information Theoretical Learning: Applications to Astronomy

Pablo A. Estévez (Department of Electrical Engineering, University of Chile, and Millennium Institute of Astrophysics, Chile)

Venue: LIP6, room 105 (1st floor), corridor 25-26, 4 place Jussieu, 75005 Paris

Astronomy is facing a paradigm shift caused by the exponential growth of the sample size, data complexity and data generation rates of new sky surveys. To cope with this shift to data-driven science, new computational intelligence, machine learning and statistical approaches are needed. In this talk I will present two main applications. The first is to discriminate periodic versus non-periodic light curves, and then estimate the period of the periodic ones. Light curves are one-dimensional time series of the brightness of a star versus time. We have developed several methods based on the correntropy function (generalized correlation using information theoretical learning concepts), which outperform conventional approaches. Results using 32.8 million light curves will be presented. Interestingly, some of these techniques can be applied to other problems such as sleep EEG analysis, and I will present preliminary results on this topic too.
The second application is the automated real-time transient detection in astronomical images. The aim is to achieve real-time detection of supernovae and other transients with the Dark Energy Camera. A novel transient detection pipeline was developed. We have been applying convolutional neural nets (deep learning) to discriminate between true transients and bogus transients, among other techniques, e.g., non-negative matrix factorization combined with random forests. Results using 1.5 million images will be presented. The new pipeline was successfully tested online in February 2015, finding more than 100 supernovae in a few days of telescope observation.
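
The correntropy function mentioned in the first application can be sketched as follows. This is a minimal illustration on an evenly sampled, noise-free synthetic series, using the standard Gaussian-kernel autocorrentropy estimator; it is not the speaker's period-estimation method, and real light curves are often unevenly sampled and noisy. The names `autocorrentropy` and `estimate_period`, the kernel width `sigma`, and the test signal are assumptions made for this example.

```python
import math


def gaussian_kernel(u, sigma):
    """Gaussian kernel of width sigma evaluated at u."""
    norm = math.sqrt(2.0 * math.pi) * sigma
    return math.exp(-u * u / (2.0 * sigma * sigma)) / norm


def autocorrentropy(x, lag, sigma):
    """Average kernel similarity between the series and its lagged copy."""
    n = len(x)
    total = sum(gaussian_kernel(x[i] - x[i - lag], sigma) for i in range(lag, n))
    return total / (n - lag)


def estimate_period(x, max_lag, sigma=0.5):
    """Return the lag in 1..max_lag with the highest autocorrentropy."""
    return max(
        range(1, max_lag + 1),
        key=lambda lag: autocorrentropy(x, lag, sigma),
    )


# Hypothetical evenly sampled light curve with a period of 20 samples.
light_curve = [math.sin(2 * math.pi * n / 20) for n in range(200)]
period = estimate_period(light_curve, max_lag=30)
```

The search range is kept below twice the true period here, because multiples of the period score equally well on a noise-free signal; a practical estimator needs an explicit rule for choosing among such peaks.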


Pablo A. Estévez received his professional title in electrical engineering (EE) from Universidad de Chile, in 1981, and the M.Sc. and Dr.Eng. degrees from the University of Tokyo, Japan, in 1992 and 1995, respectively. He is a Full Professor with the Electrical Engineering Department, University of Chile, and former Chairman of the EE Department in the period 2006-2010.

Prof. Estévez is one of the founders of the Millennium Institute of Astrophysics (MAS), Chile, which was created in January 2014. He is currently leading the Astroinformatics/Astrostatistics group at MAS. He has been an Invited Researcher with the NTT Communication Science Laboratory, Kyoto, Japan, and the Ecole Normale Supérieure, Lyon, France, and a Visiting Professor with the University of Tokyo.

Prof. Estévez is an IEEE Fellow. He is currently the President of the IEEE Computational Intelligence Society (CIS) for the term 2016-2017. He has served as IEEE CIS President-elect (2015), CIS Vice-president of Members Activities (2011-2014), CIS ADCOM Member-at-Large (2008-2010), CIS Distinguished Lecturer (2006-2011) and as an Associate Editor of the IEEE Transactions on Neural Networks (2007-2012).

Prof. Estévez served as conference chair of the International Joint Conference on Neural Networks (IJCNN), held in July 2016, in Vancouver, Canada, and general chair of the Workshop on Self-Organizing Maps (WSOM), held in December 2012, in Santiago, Chile. Currently he is serving as general co-chair of the 2018 IEEE World Congress on Computational Intelligence, WCCI 2018, to be held in Rio de Janeiro, Brazil, July 2018.

His current research interests include big data, deep learning, neural networks, self-organizing maps, data visualization, feature selection, information-theoretic learning, time series analysis, and advanced signal and image processing. One of his main topics of research is the application of computational intelligence techniques to astronomical datasets and EEG signals.

More information about Pablo A. Estévez:

DAPA seminar of 4 / 5 / 2017 at 10:00

Multi-Criteria Decision Making and Uncertainty

Ronald R. Yager (Machine Intelligence Institute, Iona College, New Rochelle, NY, USA)

Venue: LIP6, room 105 (1st floor), corridor 25-26, 4 place Jussieu, 75005 Paris

Multi-criteria aggregation is a pervasive problem appearing in many technological domains. During this presentation we shall discuss some issues related to this task. One issue is the modeling of multi-criteria decision functions, and a related issue is the evaluation of these decision functions in the face of uncertain information. One case we shall consider is the evaluation of the OWA operator when the satisfaction of the individual criteria is expressed via a probability distribution. We shall also consider the case of interval criteria satisfactions. We shall look at the role of fuzzy measures in the modeling process. One issue that must be dealt with is the ordering of the complex uncertain criteria satisfactions that is required to use the Choquet integral in the criteria aggregation.
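
For readers unfamiliar with the OWA (ordered weighted averaging) operator, the sketch below shows its standard definition: the weights are applied to the criteria satisfactions after sorting them in decreasing order. The second function is one possible reading of "satisfaction expressed via a probability distribution", namely the expected OWA value under independent discrete distributions; independence and the function names `owa` and `expected_owa` are assumptions of this example and are not taken from the talk.

```python
from itertools import product


def owa(values, weights):
    """Ordered weighted average: weights apply to values sorted descending."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    ordered = sorted(values, reverse=True)
    return sum(w * v for w, v in zip(weights, ordered))


def expected_owa(distributions, weights):
    """Expected OWA value when each criterion satisfaction is uncertain.

    distributions: one dict {satisfaction: probability} per criterion,
    assumed independent (an assumption of this sketch).
    """
    total = 0.0
    for combo in product(*(d.items() for d in distributions)):
        prob = 1.0
        values = []
        for value, p in combo:
            prob *= p
            values.append(value)
        total += prob * owa(values, weights)
    return total


# Weights (1, 0, 0) give the maximum, (0, 0, 1) the minimum.
best = owa([0.2, 0.9, 0.5], [1.0, 0.0, 0.0])
worst = owa([0.2, 0.9, 0.5], [0.0, 0.0, 1.0])

# Two criteria: the first is 0 or 1 with equal probability, the second is 0.5.
uncertain = expected_owa([{0.0: 0.5, 1.0: 0.5}, {0.5: 1.0}], [1.0, 0.0])
```

The enumeration grows exponentially with the number of criteria, so it only serves to make the definition concrete on small cases.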


Ronald R. Yager is Director of the Machine Intelligence Institute and Professor of Information Systems at Iona College. He is editor-in-chief of the International Journal of Intelligent Systems. He has published over 500 papers and edited over 30 books in areas related to fuzzy sets, human behavioral modeling, decision-making under uncertainty and the fusion of information. He is among the world’s most highly cited researchers with over 57,000 citations in Google Scholar. He was the 2016 recipient of the IEEE Frank Rosenblatt Award, the most prestigious honor given out by the IEEE Computational Intelligence Society. He was the recipient of the IEEE Computational Intelligence Society Pioneer award in Fuzzy Systems. He received the special honorary medal of the 50th Anniversary of the Polish Academy of Sciences. He received the Lifetime Outstanding Achievement Award from the International Fuzzy Systems Association. He received honorary doctorate degrees, honoris causa, from the Azerbaijan Technical University and the State University of Information Technologies, Sofia, Bulgaria. Dr. Yager is a fellow of the IEEE, the New York Academy of Sciences and the Fuzzy Systems Association. He has served at the National Science Foundation as program director in the Information Sciences program. He was a NASA/Stanford visiting fellow and a research associate at the University of California, Berkeley. He has been a lecturer at NATO Advanced Study Institutes. He was a visiting distinguished scientist at King Saud University, Riyadh, Saudi Arabia. He was an honorary professor at Aalborg University in Denmark. He received his undergraduate degree from the City College of New York and his Ph.D. from the Polytechnic Institute of New York University. He recently edited a volume entitled Intelligent Methods for Cyber Warfare.

More information about Ronald R. Yager:

DAPA seminar of 28 / 3 / 2017 at 11:00

Some Recent Results in the Field of Intelligent Distributed Robotic Systems

Didier El Baz (Laboratoire d'analyse et d'architecture des systèmes)

Venue: LIP6, room 105 (1st floor), corridor 25-26, 4 place Jussieu, 75005 Paris

In this talk we present our research in the field of intelligent distributed robotic systems. We present in particular the work carried out within the ANR projects Smart Surface and Smart Blocks, which dealt with the design and fabrication of reconfigurable distributed conveyors. We detail the aspects relating to distributed algorithms for recognizing parts on a distributed conveyor and for moving the blocks. We conclude with new results on the design of the blocks and their linear motors.


Dr. Didier El Baz holds an engineering degree in Electrical Engineering from INSA Toulouse (1981). He holds a Docteur Ingénieur degree in Control Engineering from INSA Toulouse (1984), is a graduate of the MIT Data Networks Summer Program, USA (1984), and received the Habilitation à Diriger des Recherches from the Institut National Polytechnique de Toulouse in 1998. He was an INRIA postdoctoral fellow at the Laboratory for Information and Decision Systems at MIT, USA, from March 1984 to February 1985.

Didier El Baz is a CNRS researcher, and the founder and head of the Distributed Computing and Asynchronism team (Calcul Distribué et Asynchronisme, CDA) at LAAS-CNRS. He was the principal investigator and coordinator of the ANR project Calcul intensif pair à pair (peer-to-peer high-performance computing, ANR-07-CIS7-011), which began in 2008 and ended in 2011.

Didier El Baz's research areas are high-performance computing, distributed computing, the design and analysis of parallel or distributed algorithms, and asynchronous iterations. The applications addressed range from optimal control to the solution of discretized partial differential equations, by way of nonlinear optimization, combinatorial optimization and robotics. Didier El Baz is the author of forty articles in international scientific journals and seventy articles in international conferences with proceedings. He has supervised eleven doctoral theses.

Didier El Baz has been a member of the Program Committee of the Parallel, Distributed and Network-based Processing (PDP) conference since 2003. He has been a member of the PDP Steering Committee since 2008. He was Program Committee Chair of the PDP conference in 2008 and 2009 and Organizing Committee Chair of PDP in 2008. He was General Co-Chair of the international conference IEEE iThings 2013 in Beijing, after serving as Workshops Chair of IEEE iThings in 2012 in Besançon. He has chaired the Program Committee of sixteen international workshops on parallel and distributed computing, notably in conjunction with symposia such as IEEE IPDPS. Dr. Didier El Baz was General Chair of the 16th IEEE Scalable Computing and Communications conference, of the IEEE Cloud and Big Data Computing conference, and of the thirteenth IEEE Ubiquitous Intelligence and Computing conference, as well as

DAPA seminar of 23 / 3 / 2017 at 10:00

Text and visual mining: innovations and industrial transfer

Hervé Le Borgne (CEA LIST)
Benjamin Labbe (CEA LIST)

Venue: LIP6, room 105, corridor 25-26, 4 place Jussieu, 75005 Paris

The first part covers our scientific activities through two recently
published works. The first [1,2] concerns the representation of texts and
images in common representation spaces, which can be useful for many
applications such as automatic caption generation, automatic illustration
of a text, or image retrieval from a natural-language query. We will
present various principles for building such spaces, then present some
representation biases that have been identified in them.
The second work presented [3] concerns transfer learning with
convolutional networks (CNN). We address the question of the universality
of the representation in this context. We propose a training scheme for
CNNs that significantly improves this universality, simply by integrating
the levels of learning of human categorization.
The second part of the presentation will cover the team's "technology
transfer" activities, through two applications for industrial partners.
Beyond the technologies themselves, we will discuss the difficulties
encountered and the scale of the problems and systems handled.

[1] T. Q. N. Tran, H. Le Borgne, M. Crucianu, Aggregating Image and Text Quantized Correlated Components, CVPR 2016.

[2] T. Q. N. Tran, H. Le Borgne, M. Crucianu, Cross-modal Classification by Completing Unimodal Representations, ACM Multimedia 2016 Workshop: Vision and Language Integration Meets Multimedia Fusion.

[3] Y. Tamaazousti, H. Le Borgne, C. Hudelot, MuCaLe-Net: MultiCategorical-Level Networks to Generate More Discriminating Features, CVPR 2017.


Hervé Le Borgne has been a researcher at CEA LIST since 2006, carrying out
research on computer vision and multimedia retrieval. Previously, he
received his PhD from INP Grenoble in 2004 and worked as a post-doc
at Dublin City University from 2004 to 2006. He has published more than 50
articles in international conferences and journals. His research
interests include multimedia retrieval, computer vision, machine
learning and, more generally, multimedia mining in order to extract
semantics. He has served as a reviewer for several international
conferences and journals, including Computer Vision and Image
Understanding and Multimedia Tools and Applications. He has been a
project manager since 2006, both for publicly funded projects and
industrial contracts. He has supervised 15 master's students and co-advised
one PhD in collaboration with Ecole Centrale Paris. Currently, he
co-advises two PhD students, in collaboration with CNAM and Ecole
Centrale Paris.

More information about Hervé Le Borgne:


Benjamin Labbé has been a researcher at CEA LIST since 2011, carrying out
technology transfer and research on computer vision and multimedia retrieval.
He received his PhD in computer science from INSA Rouen in 2011.
His research interests began with machine learning: his PhD dealt with the
design of multiclass and novelty-detecting support vector machines in the
context of naval infrared defensive systems. His interests then extended
to computer vision and large scale multimedia retrieval. One of his latest
achievements is the transfer to industrial partners of the image retrieval
software framework ELISE for copy detection, instance search and semantic image annotation.

DAPA seminar of 9 / 3 / 2017 at 10:00

Massive Online Analytics for the Internet of Things (IoT)

Albert Bifet (Telecom ParisTech)

Venue: room 405, corridor 24-25, 4 place Jussieu, 75005 Paris

Big Data and the Internet of Things (IoT) have the potential to fundamentally shift the way we interact with our surroundings. The challenge of deriving insights from the Internet of Things (IoT) has been recognized as one of the most exciting and key opportunities for both academia and industry. Advanced analysis of big data streams from sensors and devices is bound to become a key area of data mining research as the number of applications requiring such processing increases. Dealing with the evolution over time of such data streams, i.e., with concepts that drift or change completely, is one of the core issues in stream mining. In this talk, I will present an overview of data stream mining, and I will introduce some popular open source tools for data stream mining.


Albert Bifet is Associate Professor at Telecom ParisTech and Honorary Research Associate at the WEKA Machine Learning Group at University of Waikato. Previously he worked at Huawei Noah's Ark Lab in Hong Kong, Yahoo Labs in Barcelona, University of Waikato and UPC BarcelonaTech. He is the author of a book on Adaptive Stream Mining and Pattern Learning and Mining from Evolving Data Streams. He is one of the leaders of the MOA and Apache SAMOA software environments for implementing algorithms and running experiments for online learning from evolving data streams. He served as Co-Chair of the Industrial track of IEEE MDM 2016 and ECML PKDD 2015, and as Co-Chair of BigMine (2015, 2014, 2013, 2012) and of the ACM SAC Data Streams Track (2017, 2016, 2015, 2014, 2013, 2012).

More information about Albert Bifet:

DAPA seminar of 1 / 3 / 2017 at 14:00

Models for pessimistic or optimistic decisions under different uncertain scenarios

Giulianella Coletti (University of Perugia, Italy)

Venue: LIP6, room 105, corridor 25-26, 4 place Jussieu, 75005 Paris

We consider a problem in the domain of decisions under uncertainty, i.e., we study situations where a (not necessarily complete) preference relation is given on an arbitrary set of gambles and the decision model of reference is the Choquet expected value with respect to a belief or a plausibility function. To this end we introduce two rationality principles which are necessary and sufficient conditions for the existence of a belief function or a plausibility function such that the corresponding Choquet integral represents the relation.
Nevertheless, a decision maker may sometimes be either unable or uninterested in giving preferences among gambles, and may only be able to specify preferences under the hypothesis that a particular event happens. In other words, the decision maker may be unable to express a preference relation under a generic scenario, but may be able to assess it under various scenarios, which are taken into account at the same time. So we need to consider the Choquet expected values with respect to a conditional belief or plausibility function as the decision model.
As is well known, there are several notions of conditioning for belief or plausibility functions in the literature. The choice among these conditioning notions heavily affects the properties of the relations represented by the above model.
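
To make the unconditional decision model concrete, the sketch below computes the Choquet expected value of a gamble on a finite set of outcomes with respect to a belief function and to its dual plausibility function, both derived from a mass assignment. It covers only the unconditional case, not the conditioning notions discussed above. The outcome names, the mass values, and the function names `belief`, `plausibility` and `choquet` are assumptions made for this example.

```python
def belief(event, mass):
    """Bel(A): total mass of the focal sets contained in the event."""
    return sum(m for focal, m in mass.items() if focal <= event)


def plausibility(event, mass):
    """Pl(A): total mass of the focal sets that intersect the event."""
    return sum(m for focal, m in mass.items() if focal & event)


def choquet(gamble, capacity):
    """Choquet integral of a gamble (dict outcome -> payoff).

    capacity maps a frozenset of outcomes to a number in [0, 1] and is
    assumed to give 1 to the whole set of outcomes.
    """
    outcomes = sorted(gamble, key=gamble.get)  # payoffs in increasing order
    total = 0.0
    previous = 0.0
    for i, outcome in enumerate(outcomes):
        upper_set = frozenset(outcomes[i:])  # outcomes paying at least this much
        total += (gamble[outcome] - previous) * capacity(upper_set)
        previous = gamble[outcome]
    return total


# Hypothetical mass assignment on the outcomes {a, b, c}.
mass = {
    frozenset("a"): 0.5,
    frozenset("bc"): 0.3,
    frozenset("abc"): 0.2,
}
gamble = {"a": 10.0, "b": 0.0, "c": 4.0}

lower = choquet(gamble, lambda event: belief(event, mass))
upper = choquet(gamble, lambda event: plausibility(event, mass))
```

With a belief function the integral weights each focal set by its worst payoff, and with a plausibility function by its best payoff, which is why the two values are commonly read as a pessimistic and an optimistic evaluation of the same gamble.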


Giulianella Coletti is a Full Professor of Probability and Mathematical Statistics at the University of Perugia, Italy. She has been Coordinator of the "Dottorato di Ricerca" (Ph.D. school) "Mathematics and Informatics to Handle and Represent Information and Knowledge" at the University of Perugia since 2003 and Supervisor for the "Dottorato di Ricerca" (Ph.D. school) in Mathematics, Computer Science and Statistics, promoted by the University of Florence, the University of Perugia and INdAM, since 2012. She has also been a member of the Scientific Committee of INdAM (Istituto Nazionale di Alta Matematica) since 2013. Her main fields of interest are probability, non-additive uncertainty measures, decision making, and theory of measurements.
She is the author of more than 160 articles and 1 book (published by Kluwer), and the editor of 4 books (published by Elsevier, Springer, Plenum Press, and CNR Applied Mathematics Monographs - Giardini Editori) and of some special issues of international journals.

More information about Giulianella Coletti:

DAPA Seminar, 22 / 2 / 2017, 14:00

Riding the Big IoT Data Wave: Complex Analytics for IoT Data Series

Themis Palpanas (LIPADE, Université Paris-Descartes)

Location: LIP6, room 101, corridor 26-00, 4 place Jussieu, 75005 Paris

The realization of the Internet of Things (IoT) is creating an unprecedented tidal wave of data, consisting of continuous measurements collected from an enormous number of sensors. The goal is to better understand, model, and analyze real-world phenomena, interactions, and behaviors. Consequently, there is an increasingly pressing need for techniques able to index and mine very large collections of sequences, or data series. This need is also present across several applications in diverse domains, ranging (among others) from engineering, telecommunications, and finance, to astronomy, neuroscience, and the web. It is not unusual for the applications mentioned above to involve numbers of data series on the order of hundreds of millions to billions, which are often not analyzed in full detail because of their sheer size.

In this talk, we describe recent efforts in designing techniques for indexing and mining truly massive collections of data series that will enable scientists to easily analyze their data. We show that the main bottleneck in mining such massive datasets is the time taken to build the index, and we thus introduce solutions to this problem. Furthermore, we discuss novel techniques that adaptively create data series indexes, allowing users to correctly answer queries before the indexing task is finished. We also show how our methods allow mining on datasets that would otherwise be completely untenable, including the first published experiments using one billion data series.

Finally, we present our vision for the future in big sequence management research.


Themis Palpanas is a professor of computer science at Paris
Descartes University (France), where he is a director of the Data
Intensive and Knowledge Oriented Systems (diNo) group. He received
the BS degree from the National Technical University of Athens,
Greece, and the MSc and PhD degrees from the University of Toronto,
Canada. He has previously held positions at the University of Trento
and the IBM T.J. Watson Research Center. He has also worked for the
University of California, Riverside, and visited Microsoft Research
and the IBM Almaden Research Center. His research solutions have been
implemented in world-leading commercial data management products
and he is the author of nine US patents. He is the recipient of
three Best Paper awards (including ICDE and PERCOM), and the IBM
Shared University Research (SUR) Award in 2012, which recognizes
research excellence at a worldwide level. He has been
a member of the IBM Academy of Technology Study on Event Processing,
and is a founding member of the Event Processing Technical Society.
He has served as General Chair for VLDB 2013, the top international
conference on databases. His research has been supported by the EU,
CNRS, NSF, Facebook, IBM Research, Hewlett Packard Labs, and Telecom

More information about Themis Palpanas:

DAPA Seminar, 2 / 2 / 2017, 14:00

From mining under constraints to mining with constraints

Ahmed Samet (IRISA, University of Rennes 1)

Location: room 101, corridor 26-00, 4 place Jussieu, 75005 Paris

The mining of frequent itemsets from uncertain databases has become a very hot topic within the data mining community over the last few years. Whereas extraction from binary databases is a deterministic problem, the uncertain case is based on expectation. Recently, a new type of database has emerged, referred to as an evidential database, which handles data that are both uncertain and imprecise. In this talk, we present a case study on the use of evidential databases in chemistry. We then shed light on the WEvAC approach for predicting the properties of amphiphilic molecules.

Furthermore, most existing approaches to pattern mining, which are based on procedural programs (as we often use and develop), would require specific and lengthy development to support the addition of extra constraints. Given this lack of flexibility, such systems are not suitable for experts analyzing their data. Recent research on pattern mining has suggested using declarative paradigms such as SAT, ASP or CP to provide more flexible tools. The ASP framework has proven to be a good candidate for developing flexible pattern mining tools: it provides a simple and principled way of incorporating expert constraints within programs.


Ahmed Samet is a post-doctoral researcher at the University of Rennes 1. He received his M.Sc. degree in Computer Science from the Université de Tunis (Tunisia) in 2010. He then obtained a Ph.D. in Computer Science under a cotutelle agreement between the Université de Tunis (Tunisia) and the Université d'Artois (France). He first held a postdoctoral position with Sorbonne University: Université de technologie de Compiègne (France). His research topics include decision making, machine learning under uncertainty and data mining.

More information about Ahmed Samet:

DAPA Seminar, 14 / 12 / 2016, 10:00

Challenges and issues with data quality measurement

Antoon Bronselaer (DDCM research group, Ghent University, Belgium)

Location: room 105, corridor 25-26, 4 place Jussieu, 75005 Paris

Over the past years, challenges in data management have gained more and more attention. Assessing the quality of data is one such challenge that has tremendous potential. In this talk, we review the current state of the art in the measurement of data quality and argue that there is a great need for fundamental research to establish formal systems for measuring data quality. We review a formal framework that was proposed very recently and expresses quality in an ordinal manner. We then show the role of uncertainty modelling within this framework. We conclude the talk by reviewing the role of fusion functions within systems of measurement of data.


Antoon Bronselaer is an assistant professor at Ghent University and a member of the DDCM research group. Over the past ten years, he has been conducting research in the field of data quality, with an emphasis on the application of uncertainty models.

More information about Antoon Bronselaer:

DAPA Seminar, 8 / 12 / 2016, 11:00

Linguistic summaries of process data

Anna Wilbik (Eindhoven University of Technology, The Netherlands)

Location: room 105, corridor 25-26, 4 place Jussieu, 75005 Paris

Linguistic summarization techniques make it easy to gain insight into large amounts of data by describing the main properties of the data linguistically. We focus on a specific type of data, namely process data, i.e., event logs that contain information about when some activities were performed for a particular customer case. An event log may contain many different sequences, because actions or events are often performed in slightly different orders for different customer cases.

We discuss protoforms that are designed to capture process-specific information. Linguistic summaries can capture information on the tasks or sequences of tasks that are frequently executed, as well as properties of these tasks or sequences, such as their throughput and service time. Such information is of specific interest in the context of process analysis and diagnosis.
Through a case study with data from practice, we show that the knowledge derived from these linguistic summaries is useful for identifying problems in processes and establishing best practices.


Anna Wilbik received her Ph.D. degree in computer science from the Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland, in 2010. She is currently an Assistant Professor at the School of Industrial Engineering, Eindhoven University of Technology, The Netherlands. In 2011 she was a Post-doctoral Fellow at the Electrical and Computer Engineering Department, University of Missouri, Columbia, MO, USA. In 2012 she participated in the TOP 500 Innovators: Science - Management - Commercialization Program of the Polish Ministry of Science and Higher Education. Her research interests include linguistic summaries, data analysis, machine learning, and computational intelligence, with a focus on applications in healthcare.

More information about Anna Wilbik:

DAPA Seminar, 20 / 7 / 2016, 10:00

Soft Hierarchical Analytics for Discrete Event Sequences

Trevor Martin (Artificial Intelligence Group, Bristol University)

Location: room 105, corridor 25-26, 4 place Jussieu, 75005 Paris

Over recent years, increasing quantities of data have been generated and recorded about many aspects of our lives. In cases such as internet logs, physical access logs, transaction records, email and phone records, the data consists of multiple overlapping sequences of events related to different individuals and entities. Identification and analysis of such event sequences is an important task which can be used to find similar groups, predict future behaviour and to detect anomalies. It is ideally suited to a collaborative intelligence approach, in which human analysts provide insight and interpretation, while machines perform data collection, repetitive processing and visualisation. An important aspect of this process is the common definition of terms used by humans and machines to identify and categorise similar (and dissimilar) events.

In this talk we will argue that fuzzy set theory gives a natural framework for the exchange of information, and interaction, between analysts and machines. We will describe a new approach to the definition of fuzzy hierarchies, and show how this enables event sequences to be extracted, compared and mined at different levels of resolution.


Trevor Martin (M’07) is a Professor of artificial intelligence at the University of Bristol, U.K. He received the B.Sc. degree in chemical physics from the University of Manchester, in 1978, and the Ph.D. degree in quantum chemistry from the University of Bristol, in 1984. Since 2001, he has been funded by British Telecommunications (BT) as a Senior Research Fellow, for his research on soft computing in intelligent information management, including areas such as the semantic Web, soft concept hierarchies, and user modeling.

More information about Trevor Martin:

DAPA Seminar, 9 / 6 / 2016, 10:00

Towards an Agile Approach to Business Intelligence Based on Soft Computing

Gregory Smits (IUT de Lannion, R&T department)

Location: room 105, corridor 25-26, 4 place Jussieu, 75005 Paris

The added value of a data set lies in the knowledge that a domain expert can extract from it. To cope with the constant growth in the volume of data that experts have to process, efficient tools must be developed, in particular to generate concise, relevant and intelligible explanations describing the data and their structure. The term Agile Business Intelligence (Informatique Décisionnelle Agile, IDA) refers to techniques that aim to help experts (insurers, decision makers, communication professionals, etc.) analyze business data. In this seminar, an IDA approach based on theories and techniques from soft computing will be presented. In this application setting, soft computing is used to build an interface between the numerical/categorical space in which the data are described and the conceptual/linguistic space of human reasoning. Based on a model of the expert's subjective vocabulary, personalized linguistic explanations are generated efficiently to offer a synthetic view of the data and their intrinsic structure. These linguistic explanations are then translated into a graphical visualization, which also serves as an expressive interface for exploring the data. The results of initial experiments show the relevance and efficiency of soft computing in this context.


Grégory Smits received a Ph.D. in computer science, in the field of natural language processing, from the University of Caen (France) in 2008. He is currently an associate professor (maître de conférences) at the IUT de Lannion (Université de Rennes 1) and a member of the IRISA laboratory (Institut de Recherche en Informatique et Systèmes Aléatoires). Within the SHAMAN team (Data and Knowledge Management department), his research mainly concerns flexible querying of databases and cooperative answering strategies.

More information about Gregory Smits:

DAPA Seminar, 7 / 3 / 2016, 17:00

A Survey of Applications and Future Directions of Computational Intelligence

Gary Fogel (Natural Selection, Inc.)

Location: room 105, corridor 25-26, 4 place Jussieu, 75005 Paris

Computational intelligence (CI) has a rich history of inspiration from natural systems. While the field continues to grow with new inspiration and methods, application of CI to real-world problems also continues to increase. CI is used on a daily basis in everything from rice cookers to electronic games, by transportation companies and financial analysts, in drug design and diagnostics. While these "non-traditional" methods of decision making accelerate to the marketplace, their acceptance requires an understanding of the advantages and limitations of CI approaches. This survey will introduce the public to CI methods and provide the necessary background to appreciate and understand their current and possible future applications in society.


Gary B. Fogel is Chief Executive Officer of Natural Selection, Inc. in San Diego, California. He received a B.A. in Biology from the University of California, Santa Cruz and a Ph.D. in Biology from the University of California, Los Angeles in 1998. His current research interests are the broad application of computational intelligence approaches to industry, medicine, and defense, focusing mainly on biomedical and chemical applications. He has authored over 100 peer-reviewed publications, including the co-edited books Evolutionary Computation in Bioinformatics (Morgan Kaufmann, 2003) and Computational Intelligence in Bioinformatics (Wiley-IEEE Press, 2008). He is an IEEE Fellow, serves as Editor-in-Chief of the Elsevier journal BioSystems, and has served on the editorial boards of 7 other journals. He currently serves on the Administrative Committee for the IEEE Computational Intelligence Society and will soon receive the 2016 IEEE CIS Meritorious Service Award.

DAPA Seminar, 22 / 10 / 2015, 10:00

Ensemble Approaches in Learning

Xin Yao (University of Birmingham, United Kingdom, President of the IEEE Computational Intelligence Society)

Location: room 105, corridor 25-26, 4 place Jussieu, 75005 Paris

Designing a monolithic system for a large and complex learning task is hard.
Divide-and-conquer is a common strategy in tackling such large and complex
problems. Ensembles can be regarded as an automatic approach to
divide-and-conquer. Many ensemble methods, including boosting, bagging,
negative correlation, etc., have been used in machine learning and data mining
for many years. This talk will describe three examples of ensemble methods,
i.e., multi-objective learning, online learning with concept drift, and
multi-class imbalance learning. Given the important role of diversity in
ensemble methods, some discussions and analyses will be given to gain a better
understanding of how and when diversity may help ensemble learning.

Some materials used in the talk are based on the following papers:

  • A. Chandra and X. Yao, "Ensemble learning using multi-objective evolutionary
    algorithms," Journal of Mathematical Modelling and Algorithms, 5(4):417-445,
    December 2006.
  • L. L. Minku and X. Yao, "DDD: A New Ensemble Approach For Dealing With Concept
    Drift," IEEE Transactions on Knowledge and Data Engineering, 24(4):619-633,
    April 2012.
  • S. Wang and X. Yao, "Multi-Class Imbalance Problems: Analysis and Potential
    Solutions," IEEE Transactions on Systems, Man and Cybernetics, Part B,
    42(4):1119-1130, August 2012.

Xin Yao is a Chair (Professor) of Computer Science and the Director of CERCIA
(Centre of Excellence for Research in Computational Intelligence and
Applications) at the University of Birmingham, UK. He is an IEEE Fellow and
the President (2014-15) of the IEEE Computational Intelligence Society (CIS). His
work won the 2001 IEEE Donald G. Fink Prize Paper Award, 2010 and 2015 IEEE
Transactions on Evolutionary Computation Outstanding Paper Awards, 2010 BT
Gordon Radley Award for Best Author of Innovation (Finalist), 2011 IEEE
Transactions on Neural Networks Outstanding Paper Award, and many other best
paper awards. He won the prestigious Royal Society Wolfson Research Merit Award
in 2012 and the 2013 IEEE CIS Evolutionary Computation Pioneer Award.
His major research interests include evolutionary computation, ensemble
learning, and their applications, especially in software engineering.

More information about Xin Yao: