Variational Speech Separation of More Sources than Mixtures

 

Steven J. Rennie, Kannan Achan, Brendan J. Frey, Parham Aarabi

Department of Electrical and Computer Engineering

University of Toronto

 

 

Abstract:

 

We present a novel structured variational inference algorithm for probabilistic speech separation. The algorithm is built upon a new generative probability model of speech production and mixing in the full spectral domain. The model uses a detailed probability model of speech trained in the magnitude spectral domain, and takes the position ensemble of the underlying sources as a natural, low-dimensional parameterization of the mixing process. The algorithm produces high-quality estimates of the underlying source configurations, even when there are more underlying sources than available microphone recordings. It also recovers spectral phase estimates for all underlying speakers automatically, so the source estimates can be transformed directly into the time domain to yield speech signals of high perceptual quality.
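
To illustrate why source positions give a low-dimensional parameterization of the mixing process, here is a minimal sketch. It is not the model from the paper: it assumes free-field (anechoic) propagation, where each source reaches each microphone with a pure delay and 1/distance attenuation, and a speed of sound of 343 m/s. The function names `steering_matrix` and `mix`, the array geometry, and the random stand-in spectra are all hypothetical.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s; assumed value, not taken from the paper


def steering_matrix(mic_pos, src_pos, freqs_hz):
    """Per-frequency mixing matrices A with shape (F, M, S).

    Assumes free-field propagation: source s reaches microphone m with
    delay dist/c and amplitude 1/dist. Only the source positions (a few
    numbers per source) determine all F*M*S complex mixing coefficients.
    """
    # Pairwise microphone-to-source distances, shape (M, S).
    dist = np.linalg.norm(mic_pos[:, None, :] - src_pos[None, :, :], axis=-1)
    delays = dist / SPEED_OF_SOUND
    # Phase shift of a pure delay at each frequency, shape (F, M, S).
    phase = np.exp(-2j * np.pi * freqs_hz[:, None, None] * delays[None, :, :])
    return phase / dist[None, :, :]


def mix(source_spectra, A):
    """Mix complex source spectra (F, S) into microphone spectra (F, M)."""
    return np.einsum("fms,fs->fm", A, source_spectra)


# Hypothetical geometry: 2 microphones, 3 sources (more sources than mixtures).
mic_pos = np.array([[0.0, 0.0], [0.2, 0.0]])
src_pos = np.array([[1.0, 1.0], [-1.0, 2.0], [0.5, -1.5]])
freqs_hz = np.linspace(0.0, 8000.0, 257)

A = steering_matrix(mic_pos, src_pos, freqs_hz)

# Random complex spectra stand in for one frame of real speech spectra.
rng = np.random.default_rng(0)
S = rng.standard_normal((257, 3)) + 1j * rng.standard_normal((257, 3))
X = mix(S, A)  # noiseless microphone observations, shape (257, 2)
```

With 3 sources and only 2 microphones, each per-frequency matrix `A[f]` is 2-by-3 and cannot be inverted, which is why a separation method for this setting has to rely on a prior model of speech rather than on unmixing alone.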

 

Publication:

 

    Rennie, S., Achan, K., Frey, B., Aarabi, P., Variational speech separation of more sources than mixtures, Tenth International Workshop on Artificial Intelligence and Statistics, Barbados, January 2005. "pdf" "ps"

         

Audio Demonstrations:   

 

Test Scenario #1: 

- 6 underlying speech sources

- only 4 microphone observations

- 20 dB microphone noise corruption

                                     

   Microphone Observations:

 

        Microphone 1: Audio Signal

        Microphone 2: Audio Signal

 

        Microphone 3: Audio Signal

        Microphone 4: Audio Signal

 

   Separation Results:

 

      Source 1:  Beamforming Result*

                      Variational Inference Result

      Source 2:  Beamforming Result*

                      Variational Inference Result

 

      Source 3:  Beamforming Result*

                      Variational Inference Result

      Source 4:  Beamforming Result*

                      Variational Inference Result

      Source 5:  Beamforming Result*

                      Variational Inference Result

      Source 6:  Beamforming Result*

                      Variational Inference Result

 

 

Test Scenario #2: 

- 5 underlying speech sources

- only 4 microphone observations

- 20 dB microphone noise corruption

                                     

   Microphone Observations:

 

        Microphone 1: Audio Signal

        Microphone 2: Audio Signal

 

        Microphone 3: Audio Signal

        Microphone 4: Audio Signal

 

   Separation Results:

 

      Source 1:  Beamforming Result*

                      Variational Inference Result

      Source 2:  Beamforming Result*

                      Variational Inference Result

 

      Source 3:  Beamforming Result*

                      Variational Inference Result

      Source 4:  Beamforming Result*

                      Variational Inference Result

      Source 5:  Beamforming Result*

                      Variational Inference Result

 

 

 

Test Scenario #3: 

- 3 underlying speech sources

- only 2 microphone observations

- 20 dB microphone noise corruption

                                     

   Microphone Observations:

 

        Microphone 1: Audio Signal

        Microphone 2: Audio Signal

 

 

   Separation Results:

 

      Source 1:  Beamforming Result*

                      Variational Inference Result

      Source 2:  Beamforming Result*

                      Variational Inference Result

 

      Source 3:  Beamforming Result*

                      Variational Inference Result

     

 

 

Test Scenario #4: 

- 4 underlying speech sources

- only 2 microphone observations

- 20 dB microphone noise corruption

                                     

   Microphone Observations:

 

        Microphone 1: Audio Signal

        Microphone 2: Audio Signal

 

 

   Separation Results:

 

      Source 1:  Beamforming Result*

                      Variational Inference Result

      Source 2:  Beamforming Result*

                      Variational Inference Result

 

      Source 3:  Beamforming Result*

                      Variational Inference Result

      Source 4:  Beamforming Result*

                      Variational Inference Result

 

 

Test Scenario #5: 

- 4 underlying speech sources

- 4 microphone observations

- 20 dB microphone noise corruption

                                     

   Microphone Observations:

 

        Microphone 1: Audio Signal

        Microphone 2: Audio Signal

 

        Microphone 3: Audio Signal

        Microphone 4: Audio Signal

 

   Separation Results:

 

      Source 1:  Beamforming Result*

                      Variational Inference Result

      Source 2:  Beamforming Result*

                      Variational Inference Result

 

      Source 3:  Beamforming Result*

                      Variational Inference Result

      Source 4:  Beamforming Result*

                      Variational Inference Result

 

*Beamforming Estimate: