EDITE Simon LEGLAIVE
Identity
Simon LEGLAIVE
Academic status
Thesis in progress...
Subject: Under-determined separation of sound sources in reverberant environments
Thesis supervision:
Laboratory:
Neighborhood
Blue ellipse: PhD student; yellow ellipse: PhD graduate; green rectangle: permanent staff; yellow rectangle: HDR holder. Green line: thesis advisor; blue line: thesis director; dotted line: mid-term evaluation committee or thesis committee.
Scientific output
oai:hal.archives-ouvertes.fr:hal-01206808
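A priori probabiliste anéchoïque pour la séparation sous-déterminée de sources sonores en milieu réverbérant [Anechoic probabilistic prior for under-determined separation of sound sources in reverberant environments]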
International audience

In this paper, we show that an anechoic probabilistic prior on the mixing filters helps blind, under-determined separation of audio sources in reverberant environments. Under an anechoic model of the mixing filters, the contribution of each source to each mixture channel can be represented by a random process following a Markov chain model in frequency. This model is used as a prior to estimate the mixing filters in the Maximum A Posteriori (MAP) sense using the Expectation-Maximization (EM) algorithm. Several separation experiments on reverberant synthetic mixtures and on real recordings show that MAP estimation with the anechoic prior yields better separation results than Maximum Likelihood (ML) estimation without a prior.
Colloque GRETSI https://hal-institut-mines-telecom.archives-ouvertes.fr/hal-01206808 Colloque GRETSI, Sep 2015, Lyon, France. 2015. 2015-09
oai:hal.archives-ouvertes.fr:hal-01219635
Multichannel Audio Source Separation with Probabilistic Reverberation Modeling
International audience
In this paper we show that considering early contributions of mixing filters through a probabilistic prior can help blind source separation in reverberant recording conditions. By modeling mixing filters as the direct path plus R−1 reflections, we represent the propagation from a source to a mixture channel as an autoregressive process of order R in the frequency domain. This model is used as a prior to derive a Maximum A Posteriori (MAP) estimation of the mixing filters using the Expectation-Maximization (EM) algorithm. Experimental results over reverberant synthetic mixtures and live recordings show that MAP estimation with this prior provides better separation results than a Maximum Likelihood (ML) estimation.
Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) https://hal.inria.fr/hal-01219635 IEEE. Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), Oct 2015, New Paltz, NY, United States. pp.5, 2015, <http://www.waspaa.com/> 2015-10-18
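The abstract above links a filter made of a direct path plus R−1 reflections to an order-R autoregressive structure in frequency. A minimal sketch of the deterministic intuition behind that link is given here; it is not the paper's probabilistic model. The gains, delays, and function names are hypothetical values chosen for illustration. The frequency response of R delayed impulses is a sum of R complex exponentials in the frequency index, which satisfies a linear recurrence of order R exactly.

```python
import numpy as np

def frequency_response(gains, delays, n_freq):
    """H[k] = sum_r gains[r] * exp(-2j*pi*k*delays[r]/n_freq), k = 0..n_freq-1."""
    k = np.arange(n_freq)
    phases = np.exp(-2j * np.pi * np.outer(k, delays) / n_freq)
    return (gains[None, :] * phases).sum(axis=1)

def recurrence_coefficients(delays, n_freq):
    """Monic polynomial whose roots are the per-path phase factors."""
    poles = np.exp(-2j * np.pi * delays / n_freq)
    return np.poly(poles)  # length R + 1, highest degree first

# Hypothetical filter: direct path plus two reflections (R = 3 paths).
gains = np.array([1.0, 0.6, -0.4])
delays = np.array([3.0, 17.0, 29.0])  # delays in samples (assumed values)
n_freq = 64

H = frequency_response(gains, delays, n_freq)
c = recurrence_coefficients(delays, n_freq)
R = len(delays)

# Order-R recurrence: sum_{i=0}^{R} c[i] * H[k - i] = 0 for every k >= R.
residual = np.array(
    [np.dot(c, H[k - np.arange(R + 1)]) for k in range(R, n_freq)]
)
max_residual = np.max(np.abs(residual))
print("largest recurrence residual:", max_residual)
```

The residual vanishes up to floating-point rounding, which is why an order-R recursion across frequency bins is a natural starting point for a prior on such filters.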
oai:hal.archives-ouvertes.fr:hal-01322937
Autoregressive Moving Average Modeling of Late Reverberation in the Frequency Domain
International audience
In this paper, the late part of a room response is modeled in the frequency domain as a complex Gaussian random process. The autocovariance function (ACVF) and power spectral density (PSD) are theoretically defined from the exponential decay of the late reverberation power. Furthermore we show that the ACVF and PSD are accurately parametrized by an autoregressive moving average (ARMA) model. This leads to a new generative model of late reverberation in the frequency domain. The ARMA parameters are easily estimated from the theoretical ACVF. The statistical characterization is consistent with empirical results on simulated and real data. This model could be used to incorporate priors in audio source separation and dereverberation.
European Signal Processing Conference (EUSIPCO) https://hal.archives-ouvertes.fr/hal-01322937 EURASIP. European Signal Processing Conference (EUSIPCO), Aug 2016, Budapest, Hungary. 2016, Proc. of European Signal Processing Conference (EUSIPCO). <http://www.eusipco2016.org/> 2016-08-29
oai:hal.archives-ouvertes.fr:hal-01370051
Multichannel Audio Source Separation with Probabilistic Reverberation Priors
International audience
Incorporating prior knowledge about the sources and/or the mixture is a way to improve under-determined audio source separation performance. A great number of informed source separation techniques concentrate on taking priors on the sources into account, but fewer works have focused on constraining the mixing model. In this paper we address the problem of under-determined multichannel audio source separation in reverberant conditions. We target a semi-informed scenario where some room parameters are known. Two probabilistic priors on the frequency response of the mixing filters are proposed. Early reverberation is characterized by an autoregressive model while, according to statistical room acoustics results, late reverberation is represented by an autoregressive moving average model. Both reverberation models are defined in the frequency domain. They aim to translate the temporal characteristics of the mixing filters into frequency-domain correlations. Our approach leads to a maximum a posteriori estimation of the mixing filters, which is carried out with an expectation-maximization algorithm. We experimentally show the superiority of this approach compared with a maximum likelihood estimation of the mixing filters.
IEEE/ACM Transactions on Audio, Speech and Language Processing https://hal.archives-ouvertes.fr/hal-01370051 IEEE/ACM Transactions on Audio, Speech and Language Processing, 2016. 2016-09-21
oai:hal.archives-ouvertes.fr:hal-01416366
Alpha-Stable Multichannel Audio Source Separation
International audience
In this paper, we focus on modeling multichannel audio signals in the short-time Fourier transform domain for the purpose of source separation. We propose a probabilistic model based on a class of heavy-tailed distributions, in which the observed mixtures and the latent sources are jointly modeled by using a certain class of multivariate alpha-stable distributions. As opposed to the conventional Gaussian models, where the observations are constrained to lie just within a few standard deviations near the mean, the proposed heavy-tailed model allows us to account for spurious data or important uncertainties in the model. We develop a Monte Carlo Expectation-Maximization algorithm for making inference in the proposed model. We show that our approach leads to significant improvements in audio source separation under corrupted mixtures and in spatial audio object coding.
42nd International Conference on Acoustics, Speech and Signal Processing (ICASSP) https://hal.archives-ouvertes.fr/hal-01416366 42nd International Conference on Acoustics, Speech and Signal Processing (ICASSP), Mar 2017, New Orleans, United States. 2017, Proc. 42nd International Conference on Acoustics, Speech and Signal Processing (ICASSP). <http://www.ieee-icassp2017.org/> 2017-03-05
oai:hal.archives-ouvertes.fr:hal-01416347
Multichannel audio source separation: variational inference of time-frequency sources from time-domain observations
International audience
A great number of methods for multichannel audio source separation are based on probabilistic approaches in which the sources are modeled as latent random variables in a time-frequency (TF) domain. For reverberant mixtures, most of the methods approximate the time-domain convolutive mixing process in the TF domain, assuming short mixing filters. The TF latent sources are then inferred from the TF mixture observations. In this paper we propose to infer latent TF sources from the time-domain observations. This approach allows us to exactly model the convolutive mixing process. The inference procedure relies on a variational expectation-maximization algorithm. In significant reverberation conditions, we show that our approach leads to a Signal-to-Distortion Ratio improvement of 5.5 dB.
42nd International Conference on Acoustics, Speech and Signal Processing (ICASSP) https://hal.archives-ouvertes.fr/hal-01416347 42nd International Conference on Acoustics, Speech and Signal Processing (ICASSP), Mar 2017, New Orleans, United States. 2017, Proc. 42nd International Conference on Acoustics, Speech and Signal Processing (ICASSP). <http://www.ieee-icassp2017.org/> 2017-03-05
oai:hal.archives-ouvertes.fr:hal-01531243
Semi-Blind Student's t Source Separation for Multichannel Audio Convolutive Mixtures
International audience
This paper addresses the problem of multichannel audio source separation in under-determined convolutive mixtures. We target a semi-blind scenario assuming that the mixing filters are known. The convolutive mixing process is exactly modeled using the time-domain impulse responses of the mixing filters. We propose a Student's t time-frequency source model based on non-negative matrix factorization (NMF). The Student's t distribution being heavy-tailed with respect to the Gaussian, it provides some flexibility in the modeling of the sources. We also study a simpler Student's t sparse source model within the same general source separation framework. The inference procedure relies on a variational expectation-maximization algorithm. Experiments show the advantage of using an NMF model compared with the sparse source model. While the Student's t NMF source model leads to slightly better results than our previous Gaussian one, we demonstrate the superiority of our method over two other approaches from the literature.
25th European Signal Processing Conference (EUSIPCO) https://hal.archives-ouvertes.fr/hal-01531243 25th European Signal Processing Conference (EUSIPCO), Aug 2017, Kos, Greece. pp.2323-2327, 2017, Proc. of 25th European Signal Processing Conference (EUSIPCO). <http://www.eusipco2017.org/> 2017-08-28
oai:hal.archives-ouvertes.fr:hal-01540481
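Séparation de sources audio en milieu réverbérant : Factorisation en matrices non-négatives et représentation temporelle du mélange convolutif [Audio source separation in reverberant environments: non-negative matrix factorization and time-domain representation of the convolutive mixture]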
International audience
This paper addresses the problem of multichannel audio source separation in under-determined reverberant mixtures. We target a semi-blind scenario assuming that the mixing filters are known. The proposed method works directly with the time-domain mixture signals. This approach makes it possible to represent the convolutive mixing process exactly; it is therefore suitable for the separation of highly reverberant mixtures. The source signals are represented in the modified discrete cosine transform domain with a Gaussian model based on non-negative matrix factorization (NMF). Source inference is based on a variational expectation-maximization algorithm. We experimentally show the advantage of jointly using a time-domain representation of the convolutive mixture and a source model based on NMF.
Colloque GRETSI https://hal.archives-ouvertes.fr/hal-01540481 Colloque GRETSI, Sep 2017, Juan-Les-Pins, France. 2017, Actes du XXVIème Colloque GRETSI. <http://gretsi.fr/colloque2017/> 2017-09-05
oai:hal.archives-ouvertes.fr:hal-01548469
Separating Time-Frequency Sources from Time-Domain Convolutive Mixtures Using Non-negative Matrix Factorization
International audience
This paper addresses the problem of under-determined audio source separation in multichannel reverberant mixtures. We target a semi-blind scenario assuming that the mixing filters are known. Source separation is performed from the time-domain mixture signals in order to accurately model the convolutive mixing process. The source signals are however modeled as latent variables in a time-frequency domain. In a previous paper we proposed to use the modified discrete cosine transform. The present paper generalizes the method to the use of the odd-frequency short-time Fourier transform. In this domain, the source coefficients are modeled as centered complex Gaussian random variables whose variances are structured by means of a non-negative matrix factorization model. The inference procedure relies on a variational expectation-maximization algorithm. In the experiments we discuss the choice of the source representation and we show that the proposed approach outperforms two methods from the literature.
IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) https://hal.archives-ouvertes.fr/hal-01548469 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), Oct 2017, New Paltz, New York, United States. 2017, Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA). <http://www.waspaa.com/> 2017-10-15
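The source model described in the abstract above (centered complex Gaussian coefficients whose variances follow a non-negative matrix factorization) can be sketched generatively as follows. This is a sketch under stated assumptions, not the paper's implementation: the dimensions, the random factors W and H, and the circular symmetry of the Gaussian are illustrative choices, and the variational inference step is not shown.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: frequency bins, time frames, NMF rank.
n_freq, n_frames, rank = 6, 8, 2

# Non-negative factors (random here; in practice they would be estimated).
W = rng.random((n_freq, rank)) + 0.1    # spectral templates
H = rng.random((rank, n_frames)) + 0.1  # temporal activations

# NMF-structured variances: V[f, n] = sum_k W[f, k] * H[k, n].
V = W @ H

# Centered complex Gaussian draw with variance V[f, n] per coefficient,
# assuming circular symmetry: real and imaginary parts each carry V / 2.
noise = rng.standard_normal(V.shape) + 1j * rng.standard_normal(V.shape)
S = np.sqrt(V / 2) * noise

print(V.shape, S.dtype)
```

The low-rank product W @ H ties the variances of all time-frequency coefficients to a small number of spectral templates and activations, which is the structure the abstract refers to.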