10th Advanced Multimodal Information Retrieval international summer school (anniversary edition)

[ ERMITES 2015 ]

Big Data Sciences for Bioacoustic Environmental Survey

21 and 22 April 2015, Toulon, Provence-Côte d'Azur, France



ERMITES brings together leading international researchers and offers participants the opportunity to gain deeper insight into advanced research trends in scaled audiovisual information retrieval within an interdisciplinary framework. It is organized as a series of long talks during which attendees are invited to interact (links to online videos of previous editions are available).

The goal of this edition is to train you to improve the performance of soundscape or bioacoustic pattern detection and classification at low signal-to-noise ratio and within the Big Data paradigm. The objectives are thus three-fold: (a) to make the signal representation more robust; (b) to develop more efficient supervised or unsupervised classification of complex bioacoustic patterns; (c) to collect and manage Big Bioacoustic Data to improve model performance with respect to the variability of the targets.

ERMITES'15 will therefore focus on methods scaled to environmental survey using passive acoustics: the design of accurate mid-level or high-level features based on advanced signal decomposition, compressed sensing for large-scale analyses, deep neural networks for accurate classification, and methods for real-time 3D tracking.

Illustrations will be given from cetaceans to bird songs, from bat and dolphin biosonars to other animals of deep forests and abysses. Biodiversity analysis and environmental protection projects are some of the direct outcomes of these algorithms. You'll learn how to develop such systems and how to conduct research in this area. A special session will present a synthesis of data processing methods for passive acoustic biotope survey, with software and hardware demonstrations under monitoring scenarios (such as those developed / tested within the SABIOD.ORG project: the Jason and Bombyx platforms, waveShark, SoundChaser, SeaPro, ...).

Topics: Machine learning, Estimation, Unsupervised learning, Big data science, Signal processing, Wavelets, Deep learning, Sequence analysis, Statistics, Software platforms, Python, GPU programming, Passive acoustic hardware ...

Participation / Publication / Award

The target audience is wide, ranging from graduate and PhD students (ERMITES is recognized by the doctoral schools as disciplinary lectures, for a total of 25 hours) and post-doctoral researchers to academic and industrial researchers. Participants can submit a poster or an oral presentation. An award will be given to the best student communication. Communications will be published in registered proceedings (abstracts must be sent before 15 March to ermites@gmail.com; the final paper, 4 to 16 pages, must be sent by 5 April; the style template is here).
The number of participants is limited to 40 (first come, first served).

Program


    H. Glotin

    Introduction, presentation of the sessions and objectives. (10 min)

    A) Methods

    P. Flandrin, Research Director, CNRS, Académie des Sciences, Ecole Normale Supérieure Lyon

    A time-frequency perspective on biosonar signals. (2 hours)

    My talk will be devoted to a time-frequency description and interpretation of some natural sonar signals used in bioacoustic active navigation systems, namely bat echolocation calls. We'll observe and explain some properties of the acoustic waveforms emitted by bats, which consist of ultrasonic transient chirps of a few ms, with a wideband spectrum spanning tens of kHz between 40 and 100 kHz. We'll show theoretically that their performance is close to optimal, with adaptation to multiple tasks (detection, estimation, recognition, interference rejection, ...). Besides biosonar issues, we'll also give a comprehensive overview of the mathematical tools (based on Fourier, Wigner-Ville and other time-frequency representations) involved in the analysis. You can see this introduction to my talk with C. Villani, along with some detailed mathematics, here.
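
    The signals the talk analyses can be previewed numerically. The short sketch below is an illustrative assumption, not material from the talk: it synthesises a 5 ms downward FM sweep between 100 kHz and 40 kHz (linear for simplicity; many real bat calls are hyperbolic) and reads the frequency ridge off a spectrogram, the most basic of the time-frequency representations covered.

```python
import numpy as np
from scipy.signal import chirp, spectrogram

fs = 250_000                              # 250 kHz sampling, enough for a 100 kHz call
t = np.arange(0, 0.005, 1 / fs)           # 5 ms transient, a typical bat-call duration
# Hypothetical downward FM sweep (linear here; many real calls are hyperbolic)
x = chirp(t, f0=100_000, f1=40_000, t1=t[-1], method='linear')

f, tau, Sxx = spectrogram(x, fs=fs, nperseg=256, noverlap=224)
ridge = f[np.argmax(Sxx, axis=0)]         # dominant frequency in each time frame
# The ridge descends from near 100 kHz toward 40 kHz across the 5 ms call
```

    The Wigner-Ville and reassigned representations discussed in the talk sharpen exactly this kind of ridge beyond what the plain spectrogram resolves.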

    V. Lostanlen, PhD student, Dpt of Computer Science, Ecole Normale Supérieure Paris

    Multivariate scattering for bioacoustic similarity retrieval. (40 min)

    Most of the relevant information in bioacoustic signals is carried in their transient structure. However, this structure is subject to many factors of intra-class variability: pitch shifts, changes in chirping contour and formantic profile, rhythmic drifts, and so forth. Thus, a desirable representation for environmental sounds should disentangle these factors of variability, hence providing a simple characterization of long-range interactions. To this aim, we will present the scattering transform, a multiscale operator that is able to represent accurately acoustic transients while remaining stable to small deformations in the time-frequency plane. We will report a numerical evaluation of this approach on the large-scale species identification challenge BirdCLEF.
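
    As a rough intuition for the operator (a toy sketch only, nothing like a production implementation such as ScatNet or Kymatio), first-order scattering applies a wavelet filterbank, takes the complex modulus, and averages with a low-pass window; the global averaging below makes the output invariant to time shifts:

```python
import numpy as np

def scattering_order1(x, n_filters=8, fs=1.0):
    """Toy first-order scattering: |x * psi_lambda| averaged by a low-pass phi.
    Illustrative only -- real implementations use carefully designed wavelet
    filterbanks and higher scattering orders."""
    N = len(x)
    X = np.fft.fft(x)
    freqs = np.fft.fftfreq(N, d=1 / fs)
    # Log-spaced Gaussian band-pass (Morlet-like) filters on positive frequencies
    centers = 0.5 * fs * 2.0 ** (-np.arange(1, n_filters + 1))
    coeffs = []
    for fc in centers:
        psi = np.exp(-0.5 * ((freqs - fc) / (0.25 * fc)) ** 2)  # analytic band-pass
        u = np.abs(np.fft.ifft(X * psi))                         # wavelet modulus
        coeffs.append(u.mean())                                  # global low-pass phi
    return np.array(coeffs)

x = np.sin(2 * np.pi * 0.05 * np.arange(1024))
x_shift = np.roll(x, 7)                    # small (circular) time shift
s1, s2 = scattering_order1(x), scattering_order1(x_shift)
# Global averaging makes the coefficients invariant to the shift: s1 == s2
```

    Restricting the averaging window instead of taking a global mean recovers the locally translation-stable representation used in practice.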

    H. Glotin, Pr. Univ. Toulon, Inst. Universitaire de France (IUF), CNRS LSIS, with J. Razik, S. Paris, Toulon

    Methods for communicative sounds mining: large-scale bird and whale song classification, or how sparse coding enables efficient bioacoustic indexing. (1 hour)

    Sound is the primary carrier of communication and exploration for most animals, enabling rapid information transfer in every ecosystem, from deep forests (birds, insects, frogs...) to long distances (thousands of km for whales), and from infrasound to ultrasound (bats...). Scaled passive acoustic monitoring has recently been developed to assess changes in fauna composition and biodiversity with respect to anthropic impact, and to improve the management of species and natural resources (forests, oceans...). We'll present recent advances within the 'Scaled Acoustic Biodiversity' (SABIOD) network of CNRS MASTODONS. Terabytes are now recorded each month within SABIOD, in forests and the deep ocean, by innovative autonomous sensor arrays.
    The main difficulties are the time-frequency variability of the sources, the heterogeneity and velocity of the recordings, the complexity of the acoustic paths, and the mixture of sources.
    We'll demonstrate strategies to tackle these issues through the advanced signal processing and machine learning methods that we develop to detect, classify and localize bioacoustic sources in various ecosystems and at various space/time scales. More precisely, we'll show how pattern analyses, including sparse coding and scattering, are useful for tropical forest soundscape analyses (cf. NIPS4B, ICML4B, and the LifeClef 2014 and 2015 1000-species challenges) and for undersea acoustic surveys (at the Toulon astrophysics observatory, ONC Victoria...). We demonstrate efficient sparse-coding classification on worldwide whale song recordings and perspectives for bird song analyses.
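
    As a concrete and purely illustrative sketch of the sparse-coding step (my assumptions throughout; the synthetic frames below merely stand in for real log-spectrogram data), one can learn an overcomplete dictionary over spectrogram frames and encode each frame with a handful of atoms:

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

rng = np.random.default_rng(0)
# Stand-in for log-spectrogram frames (rows = time frames, cols = frequency bins);
# in practice these would come from field recordings.
frames = rng.standard_normal((200, 64))

# Learn an overcomplete dictionary, then encode each frame with few active atoms
dico = MiniBatchDictionaryLearning(n_components=96, alpha=1.0,
                                   transform_algorithm='omp',
                                   transform_n_nonzero_coefs=5,
                                   random_state=0)
codes = dico.fit(frames).transform(frames)            # sparse codes, shape (200, 96)
sparsity = np.mean(np.count_nonzero(codes, axis=1))   # at most 5 atoms per frame
```

    The sparse codes, rather than the raw frames, are then fed to the classifier, which is what makes the indexing efficient at scale.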

    P. Dugan (by videoconference), Dr., Cornell Univ., New York

    High-Performance Computing Platform for Analyzing Big Bioacoustic Data. (25 min)

    Marine mammals are dependent on access to their normal acoustic habitats for basic life functions, including communication, food finding, navigation and predator detection. Cetaceans are adapted to produce and perceive a great variety of sounds that collectively span 4-6 orders of magnitude along the dimensions of frequency, time and space. Sounds from human activities (vessel noise, energy exploration, commercial shipping) can result in measurable losses of marine mammal acoustic habitats, which drives the need for technology capable of finding whale sounds in large databases of recordings. When sounds are converted to pictures using spectrograms, the human visual system is very good at finding whale calls and song, but the process is inefficient and tedious for the human operator. This talk will focus on advanced developments in computer algorithm technologies designed to automatically find whale sounds in large datasets of acoustic recordings. Recent developments in advanced computing have allowed researchers to unlock new information about marine mammals in large datasets. The authors will summarize specific examples, recorded in the Stellwagen Bank National Marine Sanctuary, MA, USA, of processing large quantities of continuous sound data using advanced detection-classification analytics. The talk will also cover the application of a high-performance computing system called the acoustic data accelerator (HPC-ADA) to explore the spatio-temporal dynamics of a suite of acoustically active marine mammals (fin, humpback, minke, and right whales). The mechanics of the HPC-ADA will be discussed, along with how distributed processing tackles large datasets and high sample rates (200 kHz). The results yield insights into the acoustic behavior of marine mammals, with the goal of better understanding marine ecology for large cetaceans (see some implementation details here).
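
    The first stage of such a pipeline can be caricatured as a frame-energy detector scanned over a long recording. The sketch below is a hypothetical stand-in for illustration, not the HPC-ADA's actual analytics; the inserted 20 Hz pulse loosely mimics a fin-whale-like signal:

```python
import numpy as np

def energy_detector(x, fs, frame_len=0.5, threshold_db=6.0):
    """Flag frames whose energy exceeds the median noise floor by threshold_db.
    A minimal stand-in for one detection stage; the HPC-ADA distributes far
    more sophisticated detection-classification analytics over many nodes."""
    n = int(frame_len * fs)
    frames = x[: len(x) // n * n].reshape(-1, n)
    e = 10 * np.log10(np.mean(frames ** 2, axis=1) + 1e-12)
    return np.where(e > np.median(e) + threshold_db)[0]

fs = 2000
rng = np.random.default_rng(0)
x = 0.1 * rng.standard_normal(fs * 60)          # one minute of synthetic sea noise
# Hypothetical 20 Hz fin-whale-like tone inserted at t = 30 s, lasting 1 s
t = np.arange(0, 1.0, 1 / fs)
x[30 * fs: 30 * fs + len(t)] += np.sin(2 * np.pi * 20 * t)
hits = energy_detector(x, fs)                   # the two 0.5 s frames covering t = 30-31 s
```

    Distributing exactly this kind of embarrassingly parallel per-file scan across a cluster is what lets continuous multi-terabyte archives be processed faster than real time.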

    A. Joly, Researcher, INRIA, Montpellier

    Towards multimodal environmental data indexing (crowd-sourced audiovisual contents in LifeCLEF). (1 hour)

    We will discuss the applicability of content-based indexing methods for the identification of living organisms in crowd-sourced audio-visual data. We first review the basics of content-based indexing techniques and give a comprehensive overview of the state of the art in this domain.
    Then we will present how such methods can be efficiently deployed as instance-based classifiers for the identification of living organisms, in particular in large-scale corpora built from crowd-sourced observations. We will therefore introduce LifeCLEF, an international evaluation campaign gathering tens of research groups worldwide working on 3 real-world biodiversity challenges (Pl@ntNet images, xeno-canto bird recordings, Fish4knowledge fish videos).
    We will more particularly describe our research group's 2014 participation in these challenges and demonstrate the genericity of content-based indexing methods for handling radically different modalities.

    B) Research on the field / industrial applications

    G. Pavan, Pr. Univ. Pavia, Italia

    Long-term bioacoustic and ecological analysis of marine and terrestrial soundscapes combined with noise monitoring. (1 hour)

    The long-term monitoring of the natural environment and of the changes induced by human activities is an emerging issue worldwide. Acoustic monitoring of the underwater world has mainly been developed in the last two decades to monitor marine mammal presence and distribution in relation to the impact of anthropogenic underwater sound sources (Navy sonar, seismic exploration, ship noise). In the terrestrial environment, acoustic studies have mainly been driven by the interest in studying individual species and, more recently, in monitoring the biodiversity of habitats, whether pristine or under anthropogenic pressure.
    Examples from the NEMO-ONDE-SMO experiments monitoring marine sounds and from the SABIOD project focused on the monitoring of an Integral Nature Reserve in central Italy will be presented. Original hardware and software solutions will also be illustrated.

    C. Gervaise, Researcher, GIPSA Lab, France

    Biophonics and 3D transect for the classification of marine biotopes. (25 min)

    In recent years, underwater soundscapes have emerged as a promising fingerprint of the state of marine ecosystems. Marine soundscapes carry information about the three major ecosystem components: biotic life, abiotic forcing, and anthropic pressures. Thanks to recent instrumental developments, marine soundscapes may be recorded over long time periods to account for several relevant temporal scales. In addition to this temporal characterization, spatial characterization (trends, variabilities, environmental factors) must be assessed. In the presentation, we will show new developments for studying the spatial variability of soundscapes at two different scales (local scale, 100 m x 100 m; meso scale, 3 km x 3 km). In situ protocols and instrumentation, data processing and examples on real data will be presented in detail.

    D. Mauuary, PhD, PDG Cyberio, France

    Real-time tracking of chiroptera and crowdsourcing: the 'My City my Bats' project. (1 hour)

    Bats use wideband acoustic signals (10-100 kHz) to echolocate their close surroundings and prey. We show that these signals are also ideal for tracking by lightweight multi-sensor devices that are easily deployable in the field. Biologists may therefore now easily gain new knowledge about animal behavior and ecology. Acoustic tracking of bats may be considered at several scales: microscopic, mesoscopic and macroscopic. We propose different kinds of sensor coverage of the land to address these different scales.
    The mesoscopic scale is currently under experimental investigation in an urban context (the 'My City my Bats' project).
    Finally, we suggest that flight-path tracking technology gives rise to new insights and research paradigms about how the bat performs acoustic tracking in its brain.

    N. Boucher, PDG, SoundId, Australia

    Solutions for long distance recordings, azimuth and classification. (30 min)

    I'll present our R&D advances in detection and recognition. First, I'll present an already developed system that extracts the callers from a dawn chorus faster than real time. I will then discuss our methods for estimating the directionality and range of the source with an n x 8 channel system for air and submarine use, and how to obtain a good S/N with this system. To improve these approaches, we'll show our new scaled algorithm that finds, automatically and quickly, how many unique calls are present in a reference sample of any size. We will show real-time recognition: we can compare millions of full spectral images with full spectral references per second. Finally, I'll present a 1+ km air sound recorder (and its extension to a 10+ km system using inexpensive parts, under $2000). The system records successfully in wind and rain.
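
    The azimuth step in such multi-channel systems typically reduces to time-difference-of-arrival estimation. The two-microphone sketch below illustrates the principle under stated assumptions (far-field source, no GCC-PHAT weighting, no sub-sample interpolation); it is not the presented system:

```python
import numpy as np

def azimuth_from_pair(x1, x2, fs, mic_distance, c=343.0):
    """Estimate far-field source azimuth (degrees, 0 = broadside) from the
    time difference of arrival between two microphones, found as the peak
    of the cross-correlation. tdoa > 0 means x2 arrives after x1."""
    corr = np.correlate(x1, x2, mode='full')
    k = np.argmax(corr) - (len(x2) - 1)       # correlation lag in samples
    tdoa = -k / fs
    s = np.clip(tdoa * c / mic_distance, -1.0, 1.0)
    return np.degrees(np.arcsin(s))           # far field: tdoa = (d/c) * sin(theta)

fs, d = 48_000, 0.5                           # 48 kHz sampling, mics 0.5 m apart
sig = np.random.default_rng(1).standard_normal(4096)
x1, x2 = sig, np.roll(sig, 20)                # x2 delayed by 20 samples
az = azimuth_from_pair(x1, x2, fs, d)         # about 16.6 degrees
```

    With eight channels, many such pairwise delays are combined, which is what yields both azimuth and range rather than a single bearing.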

    G. Pavan, H. Glotin, D. Mauuary, P. Arlotto, N. Boucher

    From individual sounds to soundscape analysis. Demonstration of hardware and software tools. (40 min)

    We present efficient hardware and software solutions for long-term and precise recording used in the field within the SABIOD.ORG project. Some of them are developed by Univ. Pavia and Univ. Toulon, GIPSA or Cyberio, to record Alpine fauna, whales off Toulon, or bats in Port-Cros National Park (BOMBYX / JASON, involving 30 researchers in Toulon and international collaborations).

    C) Other communications

    Dr. T. Papadopoulos, S. Roberts, K. Willis, James Martin Research Fellow, University of Oxford, UK

    Detecting bird sound in unknown acoustic background using crowdsourced training data. (25 min)

    Biodiversity monitoring using audio recordings is achievable at a truly global scale via large-scale deployment of inexpensive, unattended recording stations, or by large-scale crowdsourcing using recording and species recognition on mobile devices. The ability to reliably identify vocalizing animal species is, however, limited by the fact that the acoustic signatures of interest in such recordings are typically embedded in a diverse and complex acoustic background. To avoid the problems associated with modeling such backgrounds, we build generative models of bird sounds and use the concept of novelty detection to screen recordings for sections of data that are likely to be bird vocalisations. We present detection results against various acoustic environments and different signal-to-noise ratios. We discuss the issues related to selecting the cost function and setting detection thresholds in such algorithms. Our methods are designed to be scalable and automatically applicable to arbitrary selections of species depending on the specific geographic region and time period of deployment.
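
    In spirit, the novelty-detection step looks like the minimal sketch below (synthetic stand-in features rather than real bird recordings, and a Gaussian mixture as a generic generative model): fit a model of bird sound features only, then flag anything whose likelihood falls below a threshold set on the training data.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Stand-in features for bird vocalisations (e.g. MFCC vectors); here two
# synthetic clusters stand in for two call types.
bird = np.vstack([rng.normal(0, 1, (300, 6)), rng.normal(5, 1, (300, 6))])

# Generative model of the "bird" class only -- no background model needed
gmm = GaussianMixture(n_components=2, random_state=0).fit(bird)

# Novelty threshold: e.g. the 1st percentile of training log-likelihoods
threshold = np.percentile(gmm.score_samples(bird), 1)

background = rng.normal(20, 1, (50, 6))          # unseen acoustic background
is_bird = gmm.score_samples(background) > threshold
# Background lies far from the bird model: none of it is flagged as bird
```

    The advantage the abstract points out is visible here: the background never has to be modelled, only scored against the bird model.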

    A. Eldridge (a), M.Casey (b), P. Moscoso (a), M. Peck (a) & N. Morales (c) - (a) University of Sussex, UK (b) University of Dartmouth, US (c) Santa Lucia Forest Reserve, Ecuador

    Toward the Extraction of Ecologically-Meaningful Soundscape Objects: A New Direction for Soundscape Ecology and Rapid Acoustic Biodiversity Assessment? (25 min)

    Efficient methods of biodiversity assessment and monitoring are central to ecological research and crucial in conservation management; technological advances in remote acoustic sensing inspire new approaches. In line with the emerging field of Soundscape Ecology, the acoustic approach is based on the rationale that the ecological processes occurring within a landscape are tightly linked to, and reflected in, the high-level structure of the patterns of sounds emanating from those landscapes: the soundscape. Rather than attempting to recognise species-specific calls, either manually or automatically, analysis of the high-level structure of the soundscape tackles the problem of diversity assessment at the community (rather than species) level. Preliminary work has attempted to make a case for community-level acoustic indices. Existing indices provide simple statistical summaries (e.g. Shannon entropy calculated on the frequency- or time-domain signal). In doing so, structural complexities arising from spectro-temporal partitioning are lost, limiting their power both as monitoring and investigative tools. In this paper we consider sparse-coding and source separation algorithms as a means to access and summarise ecologically-meaningful sound objects. In doing so we highlight a potentially fruitful union of the conceptual framework of Soundscape Ecology and source separation methods as a new direction for understanding and assessing ecologically relevant interactions in the soundscape.
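
    The kind of simple statistical summary being critiqued can be written in a few lines. This sketch (a toy under my own assumptions, not any published index implementation) computes a normalised spectral entropy, which saturates for a dense broadband chorus and collapses for a single tonal caller, while saying nothing about spectro-temporal structure:

```python
import numpy as np
from scipy.signal import spectrogram

def spectral_entropy(x, fs):
    """Shannon entropy of the mean spectrum, normalised to [0, 1].
    A single-number community-level index of the kind the talk critiques:
    it discards all spectro-temporal partitioning of the soundscape."""
    _, _, Sxx = spectrogram(x, fs=fs, nperseg=512)
    p = Sxx.mean(axis=1)
    p = p / p.sum()
    H = -np.sum(p * np.log2(p + 1e-12))
    return H / np.log2(len(p))

fs = 22050
t = np.arange(0, 1.0, 1 / fs)
tone = np.sin(2 * np.pi * 2000 * t)                        # single tonal caller
noise = np.random.default_rng(0).standard_normal(len(t))   # broadband chorus
# spectral_entropy(tone, fs) is low; spectral_entropy(noise, fs) is near 1
```

    Sparse coding and source separation, by contrast, aim to recover the individual sound objects that this scalar discards.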

    T. Emam, University of Hull, UK

    The Yorkshire Soundscape Project. (25 min)

    Recent developments in the field of ecoacoustics have yielded different approaches to environmental monitoring through the use of sound recording, including the capturing and analysis of entire soundscapes (rather than individual species). May 2015 will mark exactly forty years since the World Soundscape Project (from Simon Fraser University, Vancouver) visited places within Europe in order to produce Five Village Soundscapes and the accompanying European Sound Diary (both 1977). This project focuses on the capturing and analysis of a range of soundscapes using rural Yorkshire as its study area, developing effective sound recording and analysis techniques in order to enhance environmental monitoring, and taking an interdisciplinary approach combining sonic arts and ecology practices. It begins with a comparative analysis of archived sound recordings from 1975, part of the World Soundscape Project database at Simon Fraser University. This paper will elaborate on the current, early stages of the project, which have involved relocating the soundmarks of said catalogue within the Yorkshire Dales, making preliminary notes on the changes that have occurred, sonically and geographically, and developing hypotheses to work from once rigorous data collection has taken place during the anniversary period in May 2015. It will review relevant field and compositional techniques including the creation of sound maps (e.g. Waldock 2011 and Pijanowski et al. 2011), soundscape composition, spectral analysis, the ontological challenges of working with aural archives (e.g. Chasalow 2006), and a comparison of technological changes in the field of sound recording (e.g. Sterne 2003). In this sense, practical and theoretical methodologies work hand in hand.

    G. Lauvin, P. Sinclair (a), H. Glotin (b) - (a) Univ. AMU, ESA, Bourges ENSAB, (b) Univ. Toulon

    Locus Sonus Open Microphone Project and Split Soundscape: a multi medium listening and analysis. (25 min)

    Locus Sonus is a research group maintained by the art colleges of Aix-en-Provence (ESA Aix) and Bourges (ENSAB). Its main aim is to explore the ever-evolving relationship between sound, place and usage. Since 2006 Locus Sonus has been organizing and maintaining a network of 'Open Microphones' that permanently stream live audio captures of soundscapes to the internet. During a recent encounter between H. Glotin, head of the SABIOD project, and Peter Sinclair, head of Locus Sonus, it appeared that there are unexpected similarities between the research projects of their respective labs. G. Lauvin and Locus Sonus are developing a sound installation for the second edition of the sound art exhibition Domaine des Murmures, at the Chateau d'Avignon, in Camargue. The artwork will use the system developed by SABIOD to capture and transmit audio data from a bio-acoustically rich location close to the Chateau: aquatic, ultrasonic and (audible) sonic data will be used to (re)create a natural symphony. The naturally occurring acoustic environment will be orchestrated with sonified data corresponding to the subaquatic and ultrasonic sounds. This project is part of G. Lauvin's doctoral research, Split Soundscape, which focuses on the paradigm of real-time transmission and reconstitution of soundscapes.

    U. Moliterno de Camargo, University of Helsinki, Finland

    Identification uncertainty and probabilistic classification methods: from DNA sequences to bird species identity. (25 min)

    Just as next-generation sequencers have fueled the growth of genomics, autonomous recording units have brought rapid growth to bioacoustic data processing. The leap in data volume from manual listening by field observers to autonomous recorders is comparable to that from conventional sequencing to massively parallel sequencing. Species identification through genetic barcoding is based on comparing the similarity of a query sequence to reference sequences obtained from well-identified samples. Similarly, acoustic species identification is based on comparing the similarity of a query vocalization to those present in a reference database. Both approaches have traditionally been based on arbitrary threshold levels of similarity between samples, resulting in a poor description of identification uncertainty. I will present a Bayesian approach to estimating identification uncertainty using DNA barcoding data and illustrate how I am adapting this framework to the automated identification of bird vocalizations.
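
    A deliberately simplified sketch of such a posterior (a uniform prior and an isotropic Gaussian likelihood per reference sample; these are illustrative assumptions, not the talk's actual model) makes the idea concrete: instead of a hard threshold, the query gets a probability per candidate species.

```python
import numpy as np

def species_posterior(query, refs, labels, sigma=1.0):
    """Posterior over species for a query feature vector, assuming a uniform
    prior and an isotropic Gaussian likelihood around each reference sample.
    Toy stand-in for a similarity-based Bayesian identification framework."""
    species = sorted(set(labels))
    likes = []
    for s in species:
        d2 = np.array([np.sum((query - r) ** 2)
                       for r, l in zip(refs, labels) if l == s])
        likes.append(np.mean(np.exp(-0.5 * d2 / sigma ** 2)))
    post = np.array(likes)
    return species, post / post.sum()

rng = np.random.default_rng(0)
refs = np.vstack([rng.normal(0, 0.3, (10, 4)), rng.normal(3, 0.3, (10, 4))])
labels = ['A'] * 10 + ['B'] * 10
species, post = species_posterior(rng.normal(0, 0.3, 4), refs, labels)
# Posterior mass concentrates on species 'A', with explicit residual uncertainty
```

    An ambiguous query, equidistant from both clusters, would instead yield a near-uniform posterior, which is exactly the uncertainty a hard similarity threshold hides.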

    J. Sebastian Ulloa (c), A. Gasc (a), P. Gaucher (b), J. Sueur (a)
    (a) Inst. Systematique, Evolution, Biodiversite, ISYEB UMR 7205 CNRS MNHN UPMC EPHE, Museum Nat. d'Histoire Naturelle, Sorbonne Univ., Paris ; (b) CNRS USR 3456 Guyane, Cayenne, France ; (c) Equipe Communications Acoustiques, Neuro-PSI, UMR 9197 CNRS-Univ. Paris-Sud, Orsay, France

    Screening big audio data to determine the time and space distribution of a bird in a tropical forest. (25 min)

    The tropical forest is a complex acoustic environment: besides background noise from wind and rain, there are multiple sources of sound due to the diversity of co-existing species. Using signal processing and pattern recognition techniques, we implemented and tuned an automatic system to detect a specific acoustic signature: the song of the Amazonian bird Lipaugus vociferans. The detection system scanned nearly 60 thousand files, revealing novel ecological information and raising further questions about this species. Our results showed that the vocal activity of L. vociferans follows a clear circadian rhythm. Spatial distribution patterns corresponded to specific habitat features, which we evidenced by confronting the results with hydrology and vegetation maps.
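
    Signature detection of this kind can be caricatured as template matching on the spectrogram. The sketch below (cosine similarity over sliding windows, on synthetic data; a hypothetical illustration, not the authors' actual system) shows the principle of scanning a long recording for one known signature:

```python
import numpy as np

def detect(spec, template, threshold=0.95):
    """Return start frames where the cosine similarity between a call
    template and a sliding spectrogram window exceeds the threshold."""
    F, T = template.shape
    tvec = template.ravel() / np.linalg.norm(template)
    hits = []
    for i in range(spec.shape[1] - T + 1):
        w = spec[:, i:i + T].ravel()
        if w @ tvec / (np.linalg.norm(w) + 1e-12) > threshold:
            hits.append(i)
    return hits

rng = np.random.default_rng(0)
template = rng.random((32, 10))          # stand-in spectral signature of the song
spec = 0.1 * rng.random((32, 200))       # quiet background "recording"
spec[:, 50:60] = template                # one song occurrence starting at frame 50
# detect(spec, template) finds the single occurrence at frame 50
```

    Run over tens of thousands of files, the hit times and recorder locations are exactly what yields the circadian and spatial patterns described above.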

Registration (80 euros) by 20 March

Registration fees are only 80 euros, including 2 meals, coffee breaks, and proceedings (an additional 20 euros to subscribe to the gala dinner).

Registration is now open here: payments can be made now by bank card or by order form to UTLN (or by check) before 20 March (click through and fill in the form fields).


ERMITES 15 is hosted by the Université de Toulon in the city center, near the Mayol stadium and the port. This campus can be reached by train (e.g. TGV from Paris) or by plane (nearby airports: Hyères-Toulon, Marseille, Nice). For further information on how to get to Toulon, please refer here.

Social activities

A gala dinner will be organized on the evening of the 21st. The port visit and the surrounding Port-Cros National Park are wonderful trips in the Provence springtime.


Program: H. Glotin (pres.), P. Flandrin, A. Joly, G. Pavan, C. Gervaise, D. Mauuary, L. Alecu, J. Razik.
Organization: L. Alecu (pres.), J. Razik, R. Balestriero, H. Glotin.



ERMITES is supported by the Université de Toulon, the SABIOD MASTODONS CNRS project, IUF and LSIS