P. Scalart and J. V. Filho, "Speech enhancement based on a priori signal to noise estimation," Proc. IEEE ICASSP, 629-632 (1996).
Y. Ephraim and D. Malah, "Speech enhancement using a minimum mean-square error short-time spectral amplitude estimator," IEEE Trans. Acoust. Speech Signal Process. 32, 1109-1121 (1984). 10.1109/TASSP.1984.1164453
N. Mohammadiha, P. Smaragdis, and A. Leijon, "Supervised and unsupervised speech enhancement using nonnegative matrix factorization," IEEE Trans. Audio, Speech Lang. Process. 21, 2140-2151 (2013). 10.1109/TASL.2013.2270369
Y. Xu, J. Du, L.-R. Dai, and C.-H. Lee, "A regression approach to speech enhancement based on deep neural networks," IEEE/ACM Trans. Audio, Speech Lang. Process. 23, 7-19 (2015). 10.1109/TASLP.2014.2364452
S. R. Park and J. W. Lee, "A fully convolutional neural network for speech enhancement," Proc. Interspeech, 1993-1997 (2017). 10.21437/Interspeech.2017-1465
A. L. Maas, Q. V. Le, T. M. O'Neil, O. Vinyals, P. Nguyen, and A. Y. Ng, "Recurrent neural networks for noise reduction in robust ASR," Proc. Interspeech, 22-25 (2012).
X. Feng, Y. Zhang, and J. Glass, "Speech feature denoising and dereverberation via deep autoencoders for noisy reverberant speech recognition," Proc. IEEE ICASSP, 1759-1763 (2014). 10.1109/ICASSP.2014.6853900
B. Li and K. C. Sim, "A spectral masking approach to noise-robust speech recognition using deep neural networks," IEEE/ACM Trans. Audio, Speech Lang. Process. 22, 1296-1305 (2014). 10.1109/TASLP.2014.2329237
D. Wang and J. Chen, "Supervised speech separation based on deep learning: An overview," IEEE/ACM Trans. Audio, Speech Lang. Process. 26, 1702-1726 (2018). 10.1109/TASLP.2018.2842159
D. Berthelot, T. Schumm, and L. Metz, "BEGAN: Boundary equilibrium generative adversarial networks," arXiv preprint arXiv:1703.10717 (2017).
S. Tulyakov, M.-Y. Liu, X. Yang, and J. Kautz, "MoCoGAN: Decomposing motion and content for video generation," Proc. IEEE CVPR, 1526-1535 (2018). 10.1109/CVPR.2018.00165
L. Yu, W. Zhang, J. Wang, and Y. Yu, "SeqGAN: Sequence generative adversarial nets with policy gradient," Proc. Thirty-First AAAI Conference on Artificial Intelligence, 2852-2858 (2017).
S. Pascual, A. Bonafonte, and J. Serrà, "SEGAN: Speech enhancement generative adversarial network," Proc. Interspeech, 3642-3646 (2017). 10.21437/Interspeech.2017-1428
A. Pandey and D. Wang, "On adversarial training and loss functions for speech enhancement," Proc. IEEE ICASSP, 5414-5418 (2018). 10.1109/ICASSP.2018.8462614
C. Donahue, B. Li, and R. Prabhavalkar, "Exploring speech enhancement with generative adversarial networks for robust speech recognition," Proc. IEEE ICASSP, 5024-5028 (2018). 10.1109/ICASSP.2018.8462581
W. Chan, N. Jaitly, Q. V. Le, and O. Vinyals, "Listen, attend and spell: A neural network for large vocabulary conversational speech recognition," Proc. IEEE ICASSP, 4960-4964 (2016). 10.1109/ICASSP.2016.7472621
A. van den Oord, S. Dieleman, H. Zen, K. Simonyan, O. Vinyals, A. Graves, N. Kalchbrenner, A. Senior, and K. Kavukcuoglu, "WaveNet: A generative model for raw audio," arXiv preprint arXiv:1609.03499 (2016).
D. Michelsanti and Z.-H. Tan, "Conditional generative adversarial networks for speech enhancement and noise-robust speaker verification," Proc. Interspeech, 2008-2012 (2017). 10.21437/Interspeech.2017-1620
M. Mimura, S. Sakai, and T. Kawahara, "Cross-domain speech recognition using nonparallel corpora with cycle-consistent adversarial networks," Proc. IEEE Automatic Speech Recognition and Understanding Workshop, 134-140 (2017). 10.1109/ASRU.2017.8268927
H. Zhang, C. Liu, N. Inoue, and K. Shinoda, "Multi-task autoencoder for noise-robust speech recognition," Proc. IEEE ICASSP, 5599-5603 (2018). 10.1109/ICASSP.2018.8461446
K. He, X. Zhang, S. Ren, and J. Sun, "Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification," Proc. IEEE ICCV, 1026-1034 (2015). 10.1109/ICCV.2015.123
M. D. Zeiler, M. Ranzato, R. Monga, M. Mao, K. Yang, Q. V. Le, P. Nguyen, A. Senior, V. Vanhoucke, J. Dean, and G. E. Hinton, "On rectified linear units for speech processing," Proc. IEEE ICASSP, 3517-3521 (2013).
X. Glorot and Y. Bengio, "Understanding the difficulty of training deep feedforward neural networks," Proc. Thirteenth International Conference on Artificial Intelligence and Statistics (AISTATS), 249-256 (2010).
S. X. Wen, J. Du, and C.-H. Lee, "On generating mixing noise signals with basis functions for simulating noisy speech and learning DNN-based speech enhancement models," Proc. IEEE International Workshop on MLSP, 1-6 (2017). 10.1109/MLSP.2017.8168192
ITU-T Rec. P.56, Objective Measurement of Active Speech Level (2011).
X. Lu, Y. Tsao, S. Matsuda, and C. Hori, "Speech enhancement based on deep denoising autoencoder," Proc. Interspeech, 436-440 (2013).
R. Pascanu, T. Mikolov, and Y. Bengio, "On the difficulty of training recurrent neural networks," Proc. 30th ICML, 2347-2355 (2013).
- Publisher: The Acoustical Society of Korea
- Publisher (Ko): 한국음향학회
- Journal Title: The Journal of the Acoustical Society of Korea
- Journal Title (Ko): 한국음향학회지
- Volume: 38
- No.: 6
- Pages: 670-677
- Received Date: 2019. 10. 22
- Accepted Date: 2019. 11. 11