EDITE: Simon BOZONNET
Identity
Simon BOZONNET
Academic status
Thesis defended on 2012-05-02
Subject: Speaker diarization for audio content analysis
Thesis supervision:
Laboratory:
Neighbourhood
Blue ellipse: doctoral student, yellow ellipse: doctor, green rectangle: permanent staff member, yellow rectangle: HDR. Green line: thesis advisor, blue line: thesis director, dotted line: mid-term evaluation committee or thesis committee.
Scientific publications
SB:Eusipco10
A multimodal approach to initialisation for top-down speaker diarization of television shows
Eusipco 2010-08
oai:hal.archives-ouvertes.fr:hal-00601383
The LIA-Eurecom RT'09 speaker diarization system : enhancements in speaker modelling and cluster purification
There are two approaches to speaker diarization: bottom-up and top-down. Our work on top-down systems shows that they can deliver competitive results compared to bottom-up systems and that they are extremely computationally efficient, but also that they are particularly prone to poor model initialisation and cluster impurities. In this paper we present enhancements to our state-of-the-art, top-down approach to speaker diarization that deliver improved stability across three different datasets composed of conference meetings from five standard NIST RT evaluations. We report an improved approach to speaker modelling which, despite having greater chances for cluster impurities, delivers a 35% relative improvement in DER for the MDM condition. We also describe new work to incorporate cluster purification into a top-down system which delivers relative improvements of 44% over the baseline system without compromising computational efficiency.
ICASSP 2010, 35th International Conference on Acoustics, Speech, and Signal Processing, March 14-19, 2010, Dallas, Texas, USA. Proceeding with peer review, 2010-03-14
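The "relative improvement in DER" quoted in the abstract above is usually read as the reduction in error expressed as a fraction of the baseline error. A minimal sketch of that arithmetic follows; the function name and the DER values are hypothetical and are not taken from the paper.

```python
def relative_improvement(baseline_der: float, new_der: float) -> float:
    """Return the DER reduction as a percentage of the baseline DER."""
    if baseline_der <= 0:
        raise ValueError("baseline DER must be positive")
    return 100.0 * (baseline_der - new_der) / baseline_der


# Hypothetical figures chosen for illustration only (not results from the paper):
# a baseline DER of 20.0% reduced to 13.0% is a 35% relative improvement.
print(relative_improvement(20.0, 13.0))  # 35.0
```

Note that a relative improvement of 35% is not a drop of 35 percentage points: in the illustration above the absolute reduction is 7 points.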
oai:hal.archives-ouvertes.fr:hal-00601390
An integrated top-down/bottom-up approach to speaker diarization
Most speaker diarization systems fit into one of two categories: bottom-up or top-down. Bottom-up systems are the most popular but can sometimes suffer from instability from merging and stopping criteria difficulties. Top-down systems deliver competitive results but are particularly prone to poor model initialization which often leads to large variations in performance. This paper presents a new integrated bottom-up/top-down approach to speaker diarization which aims to harness the strengths of each system and thus to improve performance and stability. In contrast to previous work, here the two systems are fused at the heart of the segmentation and clustering stage. Experimental results show improvements in speaker diarization performance for both meeting and TV-show domain data indicating increased intra- and inter-domain stability. On the TV-show data in particular, an average relative improvement of 32% DER is obtained.
Interspeech 2010, September 26-30, Makuhari, Japan. Proceeding with peer review, 2010-09-26
oai:hal.archives-ouvertes.fr:hal-00601409
System output combination for improved speaker diarization
System combination or fusion is a popular, successful and sometimes straightforward means of improving performance in many fields of statistical pattern classification, including speech and speaker recognition. Whilst there is significant work in the literature which aims to improve speaker diarization performance by combining multiple feature streams, there is little work which aims to combine the outputs of multiple systems. This paper reports our first attempts to combine the outputs of two state-of-the-art speaker diarization systems, namely ICSI's bottom-up and LIA-EURECOM's top-down systems. We show that a cluster matching procedure reliably identifies corresponding speaker clusters in the two system outputs and that, when they are used in a new realignment and resegmentation stage, the combination leads to relative improvements of 13% and 7% DER on independent development and evaluation sets.
Interspeech 2010, September 26-30, Makuhari, Japan. Proceeding with peer review, 2010-09-26
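The abstract above mentions a cluster matching procedure that identifies corresponding speaker clusters in two system outputs, without giving its details. One common way to illustrate the idea is to pair labels by the amount of speech time they share. The sketch below is a greedy overlap-based matcher written under that assumption. It is not the procedure from the paper, and the segment format, function names, and example data are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Segment = Tuple[float, float, str]  # (start time, end time, speaker label)


def overlap_table(a: List[Segment], b: List[Segment]) -> Dict[Tuple[str, str], float]:
    """Accumulate the time shared by each (label in a, label in b) pair."""
    table: Dict[Tuple[str, str], float] = defaultdict(float)
    for start_a, end_a, label_a in a:
        for start_b, end_b, label_b in b:
            shared = min(end_a, end_b) - max(start_a, start_b)
            if shared > 0:
                table[(label_a, label_b)] += shared
    return table


def match_clusters(a: List[Segment], b: List[Segment]) -> Dict[str, str]:
    """Greedily pair labels, taking the largest shared duration first."""
    ranked = sorted(overlap_table(a, b).items(), key=lambda item: item[1], reverse=True)
    pairs: Dict[str, str] = {}
    used_b = set()
    for (label_a, label_b), _duration in ranked:
        if label_a not in pairs and label_b not in used_b:
            pairs[label_a] = label_b
            used_b.add(label_b)
    return pairs


# Hypothetical outputs of two systems on the same 30-second recording.
system_a = [(0.0, 10.0, "A1"), (10.0, 20.0, "A2"), (20.0, 30.0, "A1")]
system_b = [(0.0, 9.0, "B2"), (9.0, 21.0, "B1"), (21.0, 30.0, "B2")]
print(match_clusters(system_a, system_b))  # {'A1': 'B2', 'A2': 'B1'}
```

Greedy pairing is the simplest choice here. An optimal one-to-one assignment (for example with the Hungarian algorithm) would be a natural alternative when the two systems disagree on the number of speakers.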
Defense
Thesis: New insights into hierarchical clustering and linguistic normalization for speaker diarization
Defense: 2012-05-02