
Research Article

30 November 2023. pp. 536-543
Information
  • Publisher: The Acoustical Society of Korea
  • Publisher (Korean): 한국음향학회
  • Journal Title: The Journal of the Acoustical Society of Korea
  • Journal Title (Korean): 한국음향학회지
  • Volume: 42
  • Issue: 6
  • Pages: 536-543
  • Received Date: 2023-08-08
  • Accepted Date: 2023-09-06