2020 Vol.39, Issue 5

Research Article

30 September 2020. pp. 414-423
References
1
P. K. Atrey, N. C. Maddage, and M. S. Kankanhalli, "Audio based event detection for multimedia surveillance," Proc. IEEE ICASSP. 5. V-V (2006).
2
J. Maxime, X. Alameda-Pineda, L. Girin, and R. Horaud, "Sound representation and classification benchmark for domestic robots," Proc. IEEE ICRA. 6285-6292 (2014).
10.1109/ICRA.2014.6907786
3
D. Stowell, M. Wood, Y. Stylianou, and H. Glotin, "Bird detection in audio: a survey and a challenge," Proc. IEEE 26th MLSP. 1-6 (2016).
10.1109/MLSP.2016.7738875
4
D. Stowell and M. D. Plumbley, "Audio-only bird classification using unsupervised feature learning," Proc. CLEF. 673-684 (2014).
5
K. Ko, J. Park, D. K. Han, and H. Ko, "Channel and frequency attention module for diverse animal sound classification," IEICE Trans. on Information and Systems, E102-D, 2615-2618 (2019).
10.1587/transinf.2019EDL8128
6
S. Park, M. Elhilali, D. K. Han, and H. Ko, "Amphibian sounds generating network based on adversarial learning," IEEE Signal Processing Letters, 27, 640-644 (2020).
10.1109/LSP.2020.2988199
7
K. Ko, S. Park, and H. Ko, "Convolutional neural network based amphibian sound classification using covariance and modulogram" (in Korean), J. Acoust. Soc. Kr. 37, 61-65 (2018).
8
D. Stowell, D. Giannoulis, E. Benetos, M. Lagrange, and M. D. Plumbley, "Detection and classification of acoustic scenes and events," IEEE Trans. Multimedia, 17, 1733-1746 (2015).
10.1109/TMM.2015.2428998
9
G. Parascandolo, H. Huttunen, and T. Virtanen, "Recurrent neural networks for polyphonic sound event detection in real life recordings," Proc. IEEE ICASSP. 6440-6444 (2016).
10.1109/ICASSP.2016.7472917
10
A. Mesaros, T. Heittola, and T. Virtanen, "TUT database for acoustic scene classification and sound event detection," Proc. 24th EUSIPCO. 1128-1132 (2016).
10.1109/EUSIPCO.2016.7760424
11
S.-Y. Chou, J.-S. R. Jang, and Y.-H. Yang, "Frame CNN: A weakly-supervised learning framework for frame-wise acoustic event detection and classification," DCASE. Tech. Rep., 2017.
12
A. Kumar and B. Raj, "Deep CNN framework for audio event recognition using weakly labeled web data," arXiv: 1707.02530 (2017).
13
Y. Xu, Q. Kong, W. Wang, and M. D. Plumbley, "Large-scale weakly supervised audio classification using gated convolutional neural network," Proc. IEEE ICASSP. 121-125 (2018).
10.1109/ICASSP.2018.8461975
14
Q. Kong, Y. Xu, I. Sobieraj, W. Wang, and M. D. Plumbley, "Sound event detection and time-frequency segmentation from weakly labelled data," IEEE/ACM Trans. on Audio, Speech, and Lang. Processing, 27, 777-787 (2019).
10.1109/TASLP.2019.2895254
15
K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv: 1409.1556 (2014).
16
K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," Proc. IEEE CVPR. 770-778 (2016).
10.1109/CVPR.2016.90
17
Y. N. Dauphin, A. Fan, M. Auli, and D. Grangier, "Language modeling with gated convolutional networks," Proc. PMLR. 70, 933-941 (2017).
18
Y. Chen, Q. Guo, X. Liang, J. Wang, and Y. Qian, "Environmental sound classification with dilated convolutions," Applied Acoustics, 148, 123-132 (2019).
10.1016/j.apacoust.2018.12.019
19
J. Salamon, D. MacConnell, M. Cartwright, P. Li, and J. P. Bello, "Scaper: a library for soundscape synthesis and augmentation," Proc. IEEE WASPAA. 344-348 (2017).
10.1109/WASPAA.2017.8170052
20
A. Kolesnikov and C. H. Lampert, "Seed, expand and constrain: Three principles for weakly-supervised image segmentation," Proc. ECCV. 695-711 (2016).
10.1007/978-3-319-46493-0_42
21
Q. Kong, T. Iqbal, Y. Xu, W. Wang, and M. D. Plumbley, "DCASE 2018 challenge baseline with convolutional neural networks," DCASE. Tech. Rep., 2018.
22
K. Miyazaki, T. Komatsu, T. Hayashi, S. Watanabe, T. Toda, and K. Takeda, "Weakly-supervised sound event detection with self-attention," Proc. IEEE ICASSP. 66-70 (2020).
10.1109/ICASSP40776.2020.9053609
23
Y. Li, M. Liu, K. Drossos, and T. Virtanen, "Sound event detection via dilated convolutional recurrent neural networks," Proc. IEEE ICASSP. 286-290 (2020).
10.1109/ICASSP40776.2020.9054433
24
D. Kingma and J. Ba, "Adam: a method for stochastic optimization," arXiv: 1412.6980 (2015).
25
S. Ioffe and C. Szegedy, "Batch normalization: accelerating deep network training by reducing internal covariate shift," Proc. 32nd ICML. 448-456 (2015).
26
J. A. Hanley and B. J. McNeil, "The meaning and use of the area under a receiver operating characteristic (ROC) curve," Radiology, 143, 29-36 (1982).
10.1148/radiology.143.1.7063747
27
R. Girshick, J. Donahue, T. Darrell, and J. Malik, "Rich feature hierarchies for accurate object detection and semantic segmentation," Proc. IEEE CVPR. 580-587 (2014).
10.1109/CVPR.2014.81
Information
  • Publisher: The Acoustical Society of Korea
  • Publisher (Ko): 한국음향학회
  • Journal Title: The Journal of the Acoustical Society of Korea
  • Journal Title (Ko): 한국음향학회지
  • Volume: 39
  • No: 5
  • Pages: 414-423
  • Received Date: 2020-07-24
  • Accepted Date: 2020-08-27