Author(s): Cha Zhang, Dinei Florencio
In distributed meeting applications, microphone arrays have been widely used to capture superior speech sound and perform speaker localization through sound source localization (SSL) and beamforming. This paper presents a unified maximum likelihood framework of these two techniques, and demonstrates how such a framework can be adapted to create efficient SSL and beamforming algorithms for reverberant rooms and unknown directional patterns of microphones. The proposed method is closely related to steered response power-based algorithms, which are known to work extremely well in real-world environments. We demonstrate the effectiveness of the proposed method on challenging synthetic and real-world datasets, including over six hours of recorded meetings.