Tutorial description: Speech front-end processing aims to recover clean speech (as close as possible to a close-talking recording) in arbitrary scenarios, covering both close-talking and distant-talking modes. In the distant-talking (far-field) mode, possible “contaminations” such as acoustic echoes, coherent interference, background noise and room reverberation add together and become more severe as the distance increases. In this tutorial, speech front-end processing methods for tackling these “contaminations” are categorized from the perspectives of physical modelling and data-driven modelling, which illustrates both the history and the methodology of the research area. The details of the tutorial are as follows.
Firstly, from the perspective of physical modelling, state-of-the-art single-channel and multi-channel front-end processing in the time, spectral and spatial domains is presented. In more detail, echo cancellation, target speech detection / voice activity detection, linear/nonlinear beamforming and dereverberation are all taken into consideration, resulting in a brief summary of the classical methods of the past two decades and the conclusions drawn from them. On this basis, practical improvements to speech front-end processing for real applications are organized.
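As one illustration of the spatial-domain processing listed above, a delay-and-sum beamformer can be sketched in a few lines of numpy. This is a generic textbook formulation under an assumed uniform linear array (spacing, sample rate and channel count are illustrative), not the specific beamformers covered in the tutorial.

```python
import numpy as np

C = 343.0    # speed of sound (m/s)
FS = 16000   # sampling rate (Hz)
M = 4        # number of microphones
D = 0.05     # inter-microphone spacing (m), uniform linear array

def delay_and_sum(frames, theta):
    """Frequency-domain delay-and-sum beamformer.

    frames: (M, F) complex STFT bins of one frame, one row per microphone.
    theta:  steering angle in radians (0 = endfire along the array axis).
    Returns the (F,) enhanced spectrum steered towards theta.
    """
    n_bins = frames.shape[1]
    freqs = np.fft.rfftfreq(2 * (n_bins - 1), d=1.0 / FS)
    taus = np.arange(M) * D * np.cos(theta) / C           # per-mic delays (s)
    steering = np.exp(-2j * np.pi * freqs[None, :] * taus[:, None])
    # Undo each channel's propagation delay in phase, then average.
    return (np.conj(steering) * frames).sum(axis=0) / M
```

When steered at the target, the channels add coherently while diffuse noise adds incoherently; this basic spatial gain is what the more advanced linear/nonlinear beamformers build upon.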
Secondly, from the perspective of data-driven modelling: in recent years, non-negative matrix factorization (NMF) has been widely used in many audio applications, such as source separation and speech enhancement. The clean speech spectrum is estimated as a linear combination of speech bases weighted by their corresponding activations. To tackle non-stationary noise, speech-like interference and multi-source scenarios, properties of speech such as temporal dependency and sparsity can be exploited. With limited training data, or with noise of low-rank spectral structure, NMF is able to improve the SNR significantly.
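A minimal sketch of the NMF-based enhancement just described: speech and noise bases are learned from (here synthetic) training spectrograms, activations are estimated on the mixture with the dictionary held fixed, and a soft time-frequency mask is built from the speech part of the model. The rank, iteration counts and Euclidean cost are illustrative choices, not the tutorial's specific configuration.

```python
import numpy as np

rng = np.random.default_rng(0)
F, T, R = 64, 50, 8          # frequency bins, frames, bases per source

def nmf(V, W=None, rank=8, n_iter=200):
    """Multiplicative updates for V ~= W @ H (Euclidean cost).
    If W is given, it is held fixed and only the activations H are learned."""
    fixed = W is not None
    if not fixed:
        W = rng.random((V.shape[0], rank)) + 1e-3
    H = rng.random((W.shape[1], V.shape[1])) + 1e-3
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + 1e-9)
        if not fixed:
            W *= (V @ H.T) / (W @ H @ H.T + 1e-9)
    return W, H

# "Training": learn speech and noise bases from synthetic magnitude spectrograms.
V_speech = rng.random((F, T)) ** 2
V_noise = rng.random((F, T)) ** 2
W_s, _ = nmf(V_speech, rank=R)
W_n, _ = nmf(V_noise, rank=R)

# Enhancement: decompose the mixture on the concatenated dictionary,
# then build a Wiener-like soft mask from the speech part of the model.
V_mix = V_speech + V_noise
W = np.concatenate([W_s, W_n], axis=1)
_, H = nmf(V_mix, W=W)
speech_model = W_s @ H[:R]
mask = speech_model / (W @ H + 1e-9)     # soft time-frequency mask in [0, 1]
S_hat = mask * V_mix                     # enhanced magnitude spectrum
```

The speech properties mentioned above (temporal dependency, sparsity) would enter as regularizers on H in the update rules.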
The deep neural network (DNN) is another emerging data-driven technique in front-end processing, especially after its comprehensive success in speech recognition, image recognition and many other areas. Intuitive usages such as time-frequency classification and spectral mapping have shown promising results, and DNNs show particular capability in tackling non-stationary noise, which has challenged traditional speech enhancement approaches for decades.
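The time-frequency classification idea can be sketched with a single logistic unit per frequency bin standing in for the DNN: the feature at each T-F unit is classified as speech-dominated or noise-dominated, with the ideal binary mask as the training target. The synthetic spectrograms and the tiny model here are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
F, T = 32, 400

# Stand-ins for magnitude spectrograms of speech and noise.
S = np.abs(rng.normal(2.0, 1.0, (F, T)))
N = np.abs(rng.normal(1.0, 1.0, (F, T)))
X = np.log(S + N + 1e-9)          # feature observed at each noisy T-F unit
Y = (S > N).astype(float)         # ideal binary mask: 1 = speech-dominated

# One logistic unit per frequency bin, trained by gradient descent on the
# cross-entropy between the predicted and the ideal mask.
w, b, lr = np.zeros(F), np.zeros(F), 0.5
for _ in range(300):
    p = 1.0 / (1.0 + np.exp(-(w[:, None] * X + b[:, None])))
    g = p - Y                     # d(cross-entropy)/d(logit)
    w -= lr * (g * X).mean(axis=1)
    b -= lr * g.mean(axis=1)

mask = 1.0 / (1.0 + np.exp(-(w[:, None] * X + b[:, None])))
acc = ((mask > 0.5) == (Y > 0.5)).mean()   # per-unit classification accuracy
```

A real system replaces the per-bin logistic unit with a deep network over context windows of features, and the estimated mask (or a directly mapped spectrum) is applied to the noisy magnitudes before resynthesis.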
However, data-driven modelling does not solve every problem, nor does it by itself teach us more about speech. A DNN, being an inherently supervised approach, only learns what it is taught to learn. Modelling how speech is produced and how it is perceived has long been the work of signal processing. The idea, then, is to let the DNN do what it does best, namely learning prior information from training data, while the signal processing part decides how to use that information. Some recent research updates based on this starting point are presented, combining traditional signal processing with machine learning.
Finally, a brief introduction to real applications is presented, demonstrating the effectiveness of speech front-end processing.
Relevant publications from presenter(s):
 Yueyue Na, Yanmeng Guo, Qiang Fu, and Yonghong Yan, "Cross Array and Rank-1 MUSIC Algorithm for Acoustic Highway Lane Detection," IEEE Transactions on Intelligent Transportation Systems, accepted, 2016.
 Chao Wu, Xiaofei Wang, Yanmeng Guo, Qiang Fu, and Yonghong Yan, "Robust Uncertainty Control of the Simplified Kalman Filter for Acoustic Echo Cancelation," Circuits, Systems, and Signal Processing, Feb. 2016.
 Y. Na, Y. Guo, Q. Fu, and Y. Yan, "An Acoustic Traffic Monitoring System: Design and Implementation," presented at the 12th IEEE International Conference on Ubiquitous Intelligence and Computing (UIC2015), 2015.
 Chao Wu, et al., "Robust beamforming using beam-to-reference weighting diagonal loading and Bayesian framework," Electronics Letters, vol. 51, no. 22, pp. 1772-1774, 2015.
 C. Wu, X. Wang, Y. Guo, Q. Fu, and Y. Yan, "Robust Huber M-estimator based proportionate affine projection algorithm with variable cutoff updating," Electronics Letters, vol. 51, pp. 2113-2115, 2015.
 Xiaofei Wang, Yanmeng Guo, Fengpei Ge, Chao Wu, Qiang Fu, and Yonghong Yan, "Speech-picking for speech systems with auditory attention ability," Scientia Sinica Informationis, 2015.
 X. Wang, Y. Guo, C. Wu, Q. Fu, and Y. Yan, "A reverberation robust target speech detection method using dual-microphone in distant-talking scene," Speech Communication, vol. 72, pp. 47-58, 2015.
 Chao Wu, Kaiyu Jiang, Xiaofei Wang, Yanmeng Guo, Qiang Fu, and Yonghong Yan, "A robust step-size control technique based on proportionate constraints on filter update for acoustic echo cancellation," Chinese Journal of Electronics, vol.?, 2014
 Chao Wu, Kaiyu Jiang, Yanmeng Guo, Qiang Fu, and Yonghong Yan, "A robust step-size control algorithm for frequency domain acoustic echo cancellation," presented at the InterSpeech, Singapore, 2014.
 Xiaofei Wang, Yanmeng Guo, Qiang Fu, and Yonghong Yan, "Reverberation robust two-microphone target signal detection algorithm with coherent interference," presented at the IEEE China Summit & International Conference on Signal and Information Processing (ChinaSIP), Xi'an, 2014.
 Xiaofei Wang, Yanmeng Guo, Xi Yang, Qiang Fu, and Yonghong Yan, "Acoustic Scene Aware Dereverberation using 2-channel spectral enhancement for REVERB Challenge," presented at the IEEE Workshop on REVERB Challenge, Florence, Italy, 2014.
 Xiaofei Wang, Yanmeng Guo, Qiang Fu, and Yonghong Yan, "Speech Enhancement Using Multi-channel Post-filtering with Modified Signal Presence Probability in Reverberant Environment," Chinese Journal of Electronics, vol. 23, pp. 598-604, 2014.
 Kaiyu Jiang, Chao Wu, Yanmeng Guo, Qiang Fu, and Yonghong Yan, "Acoustic echo control with frequency-domain stage-wise regression," IEEE Signal Processing Letters, vol. 21, pp. 1265-1269, 2014.
 Kaiyu Jiang, Yanmeng Guo, Qiang Fu, and Yonghong Yan, "Controlled cross spectrum whitening for coherence based two-microphone speech enhancement," presented at the 21st International Congress on Sound and Vibration (ICSV), Beijing, China, 2014.
 Chao Wu, Qiang Fu, and Yonghong Yan, "A double-talk detection method based on noise estimation and energy ratio," presented at the National Conference on Man-Machine Speech Communication (NCMMSC), Guiyang, China, 2013.
 Xiaofei Wang, Kaiyu Jiang, Yanmeng Guo, Qiang Fu, and Yonghong Yan, "A reverberation suppression method based on spatial sound field diffuseness information," Journal of Tsinghua University (Science and Technology), vol. 53, pp. 917-920, 2013.
 Y. Guo, K. Li, Q. Fu, and Y. Yan, "A two-microphone based voice activity detection for distant-talking speech in wide range of direction of arrival," presented at the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Kyoto, Japan, 2012.
 Yanmeng Guo, Kai Li, Qiang Fu, and Yonghong Yan, "Target speech detection based on microphone array using inter-channel phase differences," presented at the IEEE International Conference on Consumer Electronics(ICCE), Las Vegas, USA, 2012.
 Kai Li, Qiang Fu, and Yonghong Yan, "Speech enhancement using robust generalized sidelobe canceller with multi-channel post-filtering in adverse environments," Chinese Journal of Electronics, vol. 21, pp. 85-90, 2012.
 Kai Li, Yanmeng Guo, Qiang Fu, Junfeng Li, and Yonghong Yan, "Two-microphone noise reduction using spatial information-based spectral amplitude estimation," IEICE Transactions on Information and Systems, vol. E95-D, pp. 1454-1464, 2012.
 Kai Li, Yanmeng Guo, Qiang Fu, and Yonghong Yan, "A two microphone-based approach for speech enhancement in adverse environments," presented at the IEEE International Conference on Consumer Electronics(ICCE), Las Vegas, USA, 2012.
 Kai Li, Qiang Fu, and Yonghong Yan, "Dual-channel optimally modified log-spectral amplitude estimator using spatial information," presented at the 4th International Congress on Image and Signal Processing, Shanghai, China, 2011.
 Kai Li, Qiang Fu, Junfeng Li, and Yonghong Yan, "Noise cross power spectral density estimation using spatial information controlled recursive averaging," presented at the Inter-Noise, Osaka, Japan, 2011.