2019 Vol. 38, Issue 6

November 2019. pp. 670-677
Abstract


References
1 

P. Scalart and J. V. Filho, "Speech enhancement based on a priori signal to noise estimation," Proc. IEEE ICASSP. 629-632 (1996).

2 

Y. Ephraim and D. Malah, "Speech enhancement using a minimum mean-square error short-time spectral amplitude estimator," IEEE Trans. Acoust. Speech Signal Process. 32, 1109-1121 (1984).

10.1109/TASSP.1984.1164453
3 

N. Mohammadiha, P. Smaragdis, and A. Leijon, "Supervised and unsupervised speech enhancement using nonnegative matrix factorization," IEEE Trans. Audio, Speech Lang. Process. 21, 2140-2151 (2013).

10.1109/TASL.2013.2270369
4 

Y. Xu, J. Du, L.-R. Dai, and C.-H. Lee, "A regression approach to speech enhancement based on deep neural networks," IEEE Trans. Audio, Speech Lang. Process. 23, 7-19 (2015).

10.1109/TASLP.2014.2364452
5 

S. R. Park and J. W. Lee, "A fully convolutional neural network for speech enhancement," Proc. Interspeech, 1993-1997 (2017).

10.21437/Interspeech.2017-1465
6 

A. L. Maas, Q. V. Le, T. M. O'Neil, O. Vinyals, P. Nguyen, and A. Y. Ng, "Recurrent neural networks for noise reduction in robust ASR," Proc. Interspeech, 22-25 (2012).

7 

X. Feng, Y. Zhang, and J. Glass, "Speech feature denoising and dereverberation via deep autoencoders for noisy reverberant speech recognition," Proc. IEEE ICASSP. 1759-1763 (2014).

10.1109/ICASSP.2014.6853900
8 

B. Li and K. C. Sim, "A spectral masking approach to noise-robust speech recognition using deep neural networks," IEEE Trans. Audio, Speech Lang. Process. 22, 1296-1305 (2014).

10.1109/TASLP.2014.2329237
9 

D. Wang and J. Chen, "Supervised speech separation based on deep learning: An overview," IEEE/ACM Trans. Audio, Speech Lang. Process. 26, 1702-1726 (2018).

10.1109/TASLP.2018.2842159 (PMID 31223631, PMCID PMC6586438)
10 

D. Berthelot, T. Schumm, and L. Metz, "BEGAN: Boundary equilibrium generative adversarial networks," arXiv preprint arXiv:1703.10717 (2017).

11 

S. Tulyakov, M.-Y. Liu, X. Yang, and J. Kautz, "MoCoGAN: Decomposing motion and content for video generation," Proc. IEEE Conference on Computer Vision and Pattern Recognition, 1526-1535 (2018).

10.1109/CVPR.2018.00165
12 

L. Yu, W. Zhang, J. Wang, and Y. Yu, "SeqGAN: Sequence generative adversarial nets with policy gradient," Proc. Thirty-First AAAI Conference on Artificial Intelligence, 2852-2858 (2017).

13 

S. Pascual, A. Bonafonte, and J. Serra, "SEGAN: Speech enhancement generative adversarial network," Proc. Interspeech, 3642-3646 (2017).

10.21437/Interspeech.2017-1428
14 

A. Pandey and D. Wang, "On adversarial training and loss functions for speech enhancement," Proc. IEEE ICASSP. 5414-5418 (2018).

10.1109/ICASSP.2018.8462614
15 

C. Donahue, B. Li, and R. Prabhavalkar, "Exploring speech enhancement with generative adversarial networks for robust speech recognition," Proc. IEEE ICASSP. 5024-5028 (2018).

10.1109/ICASSP.2018.8462581
16 

W. Chan, N. Jaitly, Q. V. Le, and O. Vinyals, "Listen, attend and spell: A neural network for large vocabulary conversational speech recognition," Proc. IEEE ICASSP. 4960-4964 (2016).

10.1109/ICASSP.2016.7472621
17 

A. van den Oord, S. Dieleman, H. Zen, K. Simonyan, O. Vinyals, A. Graves, N. Kalchbrenner, A. Senior, and K. Kavukcuoglu, "WaveNet: A generative model for raw audio," arXiv preprint arXiv:1609.03499 (2016).

18 

D. Michelsanti and Z. H. Tan, "Conditional generative adversarial networks for speech enhancement and noise-robust speaker verification," Proc. Interspeech, 2008-2012 (2017).

10.21437/Interspeech.2017-1620
19 

M. Mimura, S. Sakai, and T. Kawahara, "Cross-domain speech recognition using nonparallel corpora with cycle-consistent adversarial networks," Proc. IEEE Automatic Speech Recognition and Understanding Workshop, 134-140 (2017).

10.1109/ASRU.2017.8268927
20 

H. Zhang, C. Liu, N. Inoue, and K. Shinoda, "Multi-task autoencoder for noise-robust speech recognition," Proc. IEEE ICASSP. 5599-5603 (2018).

10.1109/ICASSP.2018.8461446 (PMCID PMC5999154)
21 

K. He, X. Zhang, S. Ren, and J. Sun, "Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification," Proc. IEEE International Conference on Computer Vision, 1026-1034 (2015).

10.1109/ICCV.2015.123
22 

M. D. Zeiler, M. Ranzato, R. Monga, M. Mao, K. Yang, Q. V. Le, P. Nguyen, A. Senior, V. Vanhoucke, J. Dean, and G. E. Hinton, "On rectified linear units for speech processing," Proc. IEEE ICASSP. 3517-3521 (2013).

23 

X. Glorot and Y. Bengio, "Understanding the difficulty of training deep feedforward neural networks," Proc. Thirteenth International Conference on Artificial Intelligence and Statistics, 249-256 (2010).

24 

S. X. Wen, J. Du, and C.-H. Lee, "On generating mixing noise signals with basis functions for simulating noisy speech and learning DNN-based speech enhancement models," Proc. IEEE International Workshop on MLSP. 1-6 (2017).

10.1109/MLSP.2017.8168192
25 

ITU-T Rec. P.56: Objective Measurement of Active Speech Level, 2011.

26 

X. Lu, Y. Tsao, S. Matsuda, and C. Hori, "Speech enhancement based on deep denoising autoencoder," Proc. Interspeech, 436-440 (2013).

27 

R. Pascanu, T. Mikolov, and Y. Bengio, "On the difficulty of training recurrent neural networks," Proc. 30th ICML. 2347-2355 (2013).

Information
  • Publisher: The Acoustical Society of Korea
  • Publisher (Ko): 한국음향학회
  • Journal Title: The Journal of the Acoustical Society of Korea
  • Journal Title (Ko): 한국음향학회지
  • Volume: 38
  • No: 6
  • Pages: 670-677
  • Received Date: 2019. 10. 22
  • Accepted Date: 2019. 11. 11