Variational Speech Separation of More Sources than Mixtures
Steven J. Rennie, Kannan Achan, Brendan J. Frey, Parham Aarabi
Department of Electrical and Computer Engineering
Abstract:
We present a novel structured variational inference algorithm for probabilistic speech separation. The algorithm is built upon a new generative probability model of speech production and mixing in the full spectral domain. The model combines a detailed probability model of speech, trained in the magnitude spectral domain, with the position ensemble of the underlying sources, which serves as a natural, low-dimensional parameterization of the mixing process. The algorithm produces high-quality estimates of the underlying source configurations, even when there are more underlying sources than available microphone recordings. It also automatically recovers spectral phase estimates for all underlying speakers, so the source estimates can be transformed directly into the time domain to yield speech signals of high perceptual quality.
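To illustrate why having more sources than microphone recordings is difficult, the following is a minimal sketch and not the model from the paper. It assumes an anechoic, free-field setting at a single frequency, where each source reaches each microphone with a delay and attenuation set only by distance. The function name `steering_matrix`, the array geometry, the source positions, and the 1 kHz frequency are all hypothetical choices made for illustration.

```python
import numpy as np

def steering_matrix(mic_pos, src_pos, freq_hz, c=343.0):
    """Assumed free-field mixing matrix: A[m, s] = exp(-j*2*pi*f*d/c) / d."""
    # Distance from every microphone to every source, shape (M, S).
    d = np.linalg.norm(mic_pos[:, None, :] - src_pos[None, :, :], axis=-1)
    return np.exp(-2j * np.pi * freq_hz * d / c) / d

rng = np.random.default_rng(0)

# Hypothetical geometry: 4 microphones in a line, 6 sources on a 2 m arc.
mics = np.array([[0.0, 0.0], [0.1, 0.0], [0.2, 0.0], [0.3, 0.0]])
angles = np.linspace(0.2, 2.9, 6)
srcs = 2.0 * np.stack([np.cos(angles), np.sin(angles)], axis=1)

A = steering_matrix(mics, srcs, freq_hz=1000.0)   # shape (4, 6)

# One frequency bin of 6 complex source spectra, mixed down to 4 observations.
s = rng.standard_normal(6) + 1j * rng.standard_normal(6)
x = A @ s

# The rank of A cannot exceed the number of microphones, so a purely linear
# inverse cannot recover all 6 sources. The minimum-norm solution explains
# the observations but generally differs from the true sources.
rank = np.linalg.matrix_rank(A)
s_ls = np.linalg.pinv(A) @ x

print("mixing matrix shape:", A.shape)
print("rank:", rank)
print("observations reproduced:", np.allclose(A @ s_ls, x))
print("sources recovered:", np.allclose(s_ls, s))
```

Because many source configurations explain the same observations in this setting, additional knowledge is needed to choose among them. In the abstract above, that role is played by the trained probability model of speech.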
Publication:
Rennie, S., Achan, K., Frey, B., Aarabi, P. Variational speech separation of more sources than mixtures. Tenth International Workshop on Artificial Intelligence and
Audio Demonstrations:
Test Scenario #1:
- 6 underlying speech sources
- only 4 microphone observations
- 20 dB microphone noise corruption

Microphone Observations:
Microphone 1: Audio Signal
Microphone 2: Audio Signal
Microphone 3: Audio Signal
Microphone 4: Audio Signal

Separation Results:
Source 1: Beamforming Result*
Source 2: Beamforming Result*
Source 3: Beamforming Result*
Source 4: Beamforming Result*
Source 5: Beamforming Result*
Source 6: Beamforming Result*

Test Scenario #2:
- 5 underlying speech sources
- only 4 microphone observations
- 20 dB microphone noise corruption

Microphone Observations:
Microphone 1: Audio Signal
Microphone 2: Audio Signal
Microphone 3: Audio Signal
Microphone 4: Audio Signal

Separation Results:
Source 1: Beamforming Result*
Source 2: Beamforming Result*
Source 3: Beamforming Result*
Source 4: Beamforming Result*
Source 5: Beamforming Result*

Test Scenario #3:
- 3 underlying speech sources
- only 2 microphone observations
- 20 dB microphone noise corruption

Microphone Observations:
Microphone 1: Audio Signal
Microphone 2: Audio Signal

Separation Results:
Source 1: Beamforming Result*
Source 2: Beamforming Result*
Source 3: Beamforming Result*

Test Scenario #4:
- 4 underlying speech sources
- only 2 microphone observations
- 20 dB microphone noise corruption

Microphone Observations:
Microphone 1: Audio Signal
Microphone 2: Audio Signal

Separation Results:
Source 1: Beamforming Result*
Source 2: Beamforming Result*
Source 3: Beamforming Result*
Source 4: Beamforming Result*

Test Scenario #5:
- 4 underlying speech sources
- 4 microphone observations
- 20 dB microphone noise corruption

Microphone Observations:
Microphone 1: Audio Signal
Microphone 2: Audio Signal
Microphone 3: Audio Signal
Microphone 4: Audio Signal

Separation Results:
Source 1: Beamforming Result*
Source 2: Beamforming Result*
Source 3: Beamforming Result*
Source 4: Beamforming Result*

*Beamforming Estimate:
