
2020, Vol. 39, Issue 5

Research Article

September 2020, pp. 447-453
References
1. E. Variani, X. Lei, E. McDermott, I. L. Moreno, and J. Gonzalez-Dominguez, "Deep neural networks for small footprint text-dependent speaker verification," Proc. IEEE ICASSP. 4052-4056 (2014). DOI: 10.1109/ICASSP.2014.6854363
2. D. Snyder, D. Garcia-Romero, G. Sell, D. Povey, and S. Khudanpur, "X-vectors: Robust DNN embeddings for speaker recognition," Proc. IEEE ICASSP. 5329-5333 (2018). DOI: 10.1109/ICASSP.2018.8461375
3. J. Tai, X. Jia, Q. Huang, W. Zhang, and S. Zhang, "SEF-ALDR: A speaker embedding framework via adversarial learning based disentangled representation," arXiv preprint arXiv:1912.02608 (2020).
4. C. Li, X. Ma, B. Jiang, X. Li, X. Zhang, X. Liu, Y. Cao, A. Kannan, and Z. Zhu, "Deep Speaker: An end-to-end neural speaker embedding system," arXiv preprint arXiv:1705.02304 (2017).
5. I. Kim, K. Kim, J. Kim, and C. Choi, "Deep speaker representation using orthogonal decomposition and recombination for speaker verification," Proc. IEEE ICASSP. 6126-6130 (2019). DOI: 10.1109/ICASSP.2019.8683332
6. I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, "Generative adversarial nets," Advances in NIPS. 2672-2680 (2014).
7. W. Ding and L. He, "MTGAN: Speaker verification through multitasking triplet generative adversarial networks," arXiv preprint arXiv:1803.09059 (2018). DOI: 10.21437/Interspeech.2018-1023
8. Y. Liu, Z. Wang, H. Jin, and I. Wassell, "Multi-task adversarial network for disentangled feature learning," Proc. IEEE CVPR. 3743-3751 (2018). DOI: 10.1109/CVPR.2018.00394
9. J. S. Chung, A. Nagrani, and A. Zisserman, "VoxCeleb2: Deep speaker recognition," arXiv preprint arXiv:1806.05622 (2018). DOI: 10.21437/Interspeech.2018-1929
10. A. Nagrani, J. S. Chung, and A. Zisserman, "VoxCeleb: A large-scale speaker identification dataset," arXiv preprint arXiv:1706.08612 (2017).
11. K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," Proc. IEEE CVPR. 770-778 (2016). DOI: 10.1109/CVPR.2016.90
12. A. Radford, L. Metz, and S. Chintala, "Unsupervised representation learning with deep convolutional generative adversarial networks," arXiv preprint arXiv:1511.06434 (2015).
13. W. Cai, J. Chen, and M. Li, "Exploring the encoding layer and loss function in end-to-end speaker and language recognition system," arXiv preprint arXiv:1804.05160 (2018). DOI: 10.21437/Odyssey.2018-11
14. W. Xie, A. Nagrani, J. S. Chung, and A. Zisserman, "Utterance-level aggregation for speaker recognition in the wild," Proc. IEEE ICASSP. 5791-5795 (2019). DOI: 10.1109/ICASSP.2019.8683120
15. L. van der Maaten and G. Hinton, "Visualizing data using t-SNE," J. Machine Learning Research, 9, 2579-2605 (2008).
Information
  • Publisher: The Acoustical Society of Korea
  • Publisher (Korean): 한국음향학회
  • Journal Title: The Journal of the Acoustical Society of Korea
  • Journal Title (Korean): 한국음향학회지
  • Volume: 39
  • No.: 5
  • Pages: 447-453
  • Received Date: 2020. 07. 31
  • Accepted Date: 2020. 09. 16