IX Advanced Multimodal Information Retrieval int'l summer school


From Cognition to Information Retrieval

23rd, 24th & 25th of Sept. 2014, Ile de Porquerolles - National Park - Var

NEWS : ERMITES 2015 on Big Bioacoustic Data is in April 2015

ERMITES 2014 BOOK OF PROCEEDINGS (video links not yet included), 354 pages, .pdf, 93 MB

The ERMITES 2014 Summer School brings together leading international researchers and gives participants the opportunity to gain deeper insight into current research trends in scaled audiovisual information retrieval within an interdisciplinary framework. It is organized as a series of long talks, during which attendees are invited to interact.

The target audience is wide, from graduate and PhD students, post-doctoral researchers, to academic or industrial researchers.
Any participant can present their research (poster or oral), to be published in the official registered proceedings (email us).
The number of participants is limited (first come, first served).

- Extended deadline for registration: 5th of Sept. - Optional proposal submission deadline: 15th of July (acceptance notification: 25th of July (extended)).


Overview and Objectives

H. Glotin *, Professor Inst. Univ France, UTLN, CNRS, LSIS

An overview, illustrated with the Cognilego ANR project: from Pixels to Semantics, 40 min


A) Cognitive Development

J. Grainger *, Research Director Laboratoire de Psychologie Cognitive CNRS & Aix-Marseille Univ. LPC

Orthographic Processing in Human, Monkey & Machine, 2h


Orthographic processing lies at the interface between non-linguistic visual processing and the processing of truly linguistic entities, and therefore offers a privileged window onto how perception can shape language, and vice versa. In this talk I will describe how skilled readers process orthographic information, how this skill develops during reading acquisition, and how basic visual processes might adapt to the specific characteristics of printed words. Four main sources of evidence will be presented: 1) experiments examining basic processes in single word reading in children and adults; 2) experiments comparing perceptual processing of letter strings with strings of other kinds of visual stimuli; 3) a study of orthographic processing in baboons; and 4) neural network simulations of the results of the aforementioned studies.

C. Touzet *, Senior Researcher AMU CNRS LNIA

Cognition Neural Theory & Reading, 2h


Formalized in 2010, the Theory of neuronal Cognition (TnC) departs from all existing materialist theories of mind by claiming that our brain does not process information, but only represents information. The logical implication is that we are only a crystallization of our interactions with the environment. Since « extraordinary claims require extraordinary proofs », the goal of my talk will be to provide the audience with the neuronal blueprints of a number of cognitive functions and concepts. Reading will illustrate my description of the cortex as a hierarchy of self-organizing associative memories. Afterwards, I will show how the synergy between sensory and sensory-motor maps generates behaviors, and offer explanations of intelligence (a side effect of the observer's knowledge), consciousness (an automatic verbalization), endogenous and exogenous attention, episodic and semantic memories, and motivation or joy (a side effect of the functioning of associative memories). Last, I will present new insights into how unsupervised systems achieve homeostasis.

T. Hannagan *, Coordinator of Neurocomputation group in ERC Brain & Language Research Institute

Spherical reader and Convolutional Neural Net, 2h + demo

Brain and Language Research Institute – ERC

"What are the cognitive representations that children use for letters and for words in the very first stages of reading? I will describe a deep learning convolutional model that operates with a plausible developmental timeline and with a realistic visual environment. With this model, I will then explore the possible mechanisms whereby mirror invariance could be formed and selectively broken in the child's visual system, upon learning about letter and word stimuli."

B) Simulated Information Development

P.-Y. Oudeyer *, Research Director INRIA

Curiosity-driven automatic learning and information development with robots, 2h + demo of the OPEN SOURCE POPPY ROBOT


A great mystery is how human infants develop: how they progressively discover their bodies, how they learn to interact with objects and social peers, and how they accumulate new skills throughout their lives. Constructing robots, and building mechanisms that model such developmental processes, is key to advancing our understanding of human development. I will present examples of robotic models of curiosity-driven learning and exploration, and show how developmental trajectories can self-organize, starting from discovery of the body, then object affordances, then vocal babbling and vocal interactions with others. In particular, I will show that the onset of language spontaneously forms out of such sensorimotor development.
A demonstration of our new fully Open Source Poppy robot will be given (can be previewed here).

B. de Boer *, Professor, ULB, Belgium

Evolution of Language Learning, 2h + demo


Acquisition of speech and language can be seen as an example of sophisticated information retrieval, yet it is performed effortlessly by children. This feat is even more amazing if we take into account that children start essentially from scratch, knowing neither the signals nor the meanings they have to learn. However, when we consider that language has evolved both biologically and culturally, it will become clear that its acquisition may be less mysterious than once thought. In this contribution, we will discuss what students of information retrieval can learn from studying the acquisition of language, as well as what linguists can learn from studying information retrieval. It will contain a brief overview of what children do, what we (think we) know about evolution and what role computer models have played. The focus will be on speech, not only because this is the presenter's specialty, but also because it is a physical signal (which makes it easier to study directly) as well as a continuous signal (which makes it a special challenge for studying it computationally).

A. Graves *, Senior Researcher, Deep Mind Tech., Google, London

Teaching Neural Net how to Write Joined up, 2h + demo


The idea of building a machine able to perform the quintessentially human act of cursive handwriting has fascinated scientists and inventors for centuries. As well as being a challenging task in fine motor control, handwriting is interesting from the perspective of pattern discovery due to the great diversity of writing styles and letter forms. This talk describes a novel recurrent neural network architecture able to transform character sequences into highly realistic pen trajectories. Unlike most handwriting synthesis methods (which are trained for a single writer), the network learns to model, and interpolate between, a wide variety of writing styles. It can also be used to mimic – and even improve – the writing of a particular individual.

C) Robust Scaled Information Retrieval

C. Kermorvant *, Research Manager A2IA

Deep Neural Net for Industrial Written Text Recognition, 2h + demo


Since their first success in 2009, deep neural networks have been widely adopted by the written text recognition community. Today, most state-of-the-art systems for this task include deep and recurrent neural networks for feature extraction, classification and/or sequence modeling. We will present how deep architectures are used in text recognition systems and how these systems performed in recent international evaluations.

Y. Li *, Associate Researcher at INRA, Paris. CNRS LIP6

Multimedia Maximal Marginal Relevance for Multi-video Summarization, 1h30 + demo


The amount of video from mobile phones, personal DV cameras, video surveillance, the movie industry and so on is rapidly increasing on the Internet and in our daily life. Consequently, how to manage such a large amount of visual data is now an active research topic. Video summarization has been identified as an important component for dealing with large-scale video data. It produces an abbreviated form of a video by extracting its most important and pertinent content. I will present a novel video summarization algorithm, Maximal Marginal Relevance (MMR), which incrementally constructs the video summary by exploiting all the multimodal indices in the video, including the text, the video and the audio. MMR is a universal approach for all video genres and does not require a priori knowledge. Most of this work was conducted at Eurecom.
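The incremental selection described above can be sketched with the classic Maximal Marginal Relevance rule from text retrieval: at each step, pick the item that is most relevant to a reference while being least similar to what has already been selected. This is only a minimal sketch under stated assumptions, not the speaker's multimodal algorithm: the function names (`cosine`, `mmr_select`), the trade-off parameter `lam`, the use of cosine similarity, and the toy feature vectors are all hypothetical choices made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def mmr_select(segments, query, k, lam=0.7):
    """Greedily pick up to k segment indices.

    Each step maximizes
        lam * sim(segment, query) - (1 - lam) * max sim(segment, selected)
    so a segment is penalized when it duplicates one already in the summary.
    """
    selected = []
    remaining = list(range(len(segments)))
    while remaining and len(selected) < k:

        def score(i):
            relevance = cosine(segments[i], query)
            redundancy = max(
                (cosine(segments[i], segments[j]) for j in selected),
                default=0.0,  # nothing selected yet: no redundancy penalty
            )
            return lam * relevance - (1.0 - lam) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy feature vectors (hypothetical): segments 0 and 1 are near-duplicates.
segments = [[1.0, 0.0], [0.99, 0.05], [0.0, 1.0]]
query = [1.0, 1.0]  # hypothetical representation of the whole video set

summary = mmr_select(segments, query, k=2, lam=0.5)
print(summary)
```

With `lam` close to 1 the selection is driven by relevance alone; lowering it favors diversity, which is what keeps near-duplicate segments out of the summary.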

P. Bellot *, Professor AMU CNRS LSIS

Information Retrieval in Big Text, 2h + demo


Information retrieval focuses on automatically linking textual user queries and documents: web pages, books, news, tweets... The first numerical models defined statistical and probabilistic criteria to represent how representative a word is of a document collection and how likely a document is to be relevant for a user. This led to the concept of user profiles and to "learning to rank" information retrieval models. On the other hand, the Web allowed the development of models that take into account the hyperlinks between pages, while the construction of large semantic networks and of natural language processing software allowed the inclusion of high-level features in the retrieval models. In this talk, I will describe some models that are effective on very large collections of documents and I will show how different disciplines can work together to achieve more adaptive and personalized search models and systems. I will present the French "equipment of excellence" OpenEdition.org, a Digital Library for Open Humanities, which aims to develop new capabilities for browsing, searching and reading recommendation.

ERMITES is supported by TOULON PROVENCE MEDITERRANEE (TPM), USTV, Fed. for Computer Sciences and Interactions (FRIIAM), MASTODONS CNRS project, IUF, INRIA, CNRS, LSIS, ARIA, and ANR COGNILEGO.

ERMITES is recognized by the doctoral schools as disciplinary lectures, for a total of 25 hours.

Link to online videos of previous editions and link to previous ERMITES editions.

Registration Fees (payment by CB or invoice to UTLN)
You may choose between a 1-day or 3-day pack, and a single or shared studio room.
The 3-day pack includes: 2 nights, 5 meals, 2 breakfasts, coffee breaks, proceedings,
with D1 or D2 registrations for double shared room studio,
D1 : Only for PhD., Post-doctorate, Master = 310 euros,
D2 : Other (Full position, company) = 460 euros.
Or with S1 or S2 formula for single room,
S1 : Like D1 but single room = 340 euros,
S2 : Like D2 but single room = 490 euros.

The daily pack includes 1 meal, coffee break, proceedings, without sleeping accommodations.
Daily student: PhD, Post-doctorate, Master = 80 euros per day,
Daily non-student = 110 euros per day.
You can register by invoice or credit card: DO ONLINE REGISTRATION HERE (few rooms left, extended deadline = 5th of Sept.)

Access : ERMITES 14 is at the IGESA center, in the middle of Porquerolles island, with access from Hyeres TGV station then bus (67), or Toulon International Airport, then boat (15 min). We may also organize car transfers from Hyeres to the boat - More details on trains / boats.

Social activities : a short walk from IGESA to the Cap Grand Langoustier will give attendees a breath of fresh air in this paradise, and the opportunity to continue informal discussions.

Committees :

Organizing co. : Pierre-Hugues Joalland and J. Razik (pres), H. Glotin, M. Bartcus
Program co. : H. Glotin (Pres.), S. Bengio, J. Grainger, C. Kermorvant, C. Touzet, T. Hannagan, S. Paris, J. Razik, F. Chamroukhi.
Contact : ermites@gmail.com