Multi-band multi-scale DenseNet with dilated convolution for background music separation

Woon-Haeng Heo; Hyemi Kim; Oh-Wook Kwon

doi:10.7776/ASK.2019.38.6.697

2019 Vol. 38, Issue 6

Research Article

30 November 2019. pp. 697-702

Abstract

We propose a multi-band multi-scale DenseNet with dilated convolution that separates background music signals from the mixed signals of broadcast content. Dilated convolution makes it easier to learn the multi-scale context information contained in a spectrogram. In computer simulation experiments, the proposed architecture improves the signal-to-distortion ratio (SDR) by 0.15 dB and 0.27 dB in 0 dB and -10 dB signal-to-noise ratio (SNR) environments, respectively.
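As a rough illustration of why dilation helps capture context at several scales, the sketch below applies a 1-D dilated convolution to a short sequence and computes the receptive field of a stack of such layers. It is not the authors' architecture: the paper operates on 2-D spectrograms inside a DenseNet, whereas this example is a 1-D, pure-Python simplification. The function names, the 3-tap kernel, and the dilation rates (1, 2, 4) are assumptions chosen only for illustration.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """Valid-mode 1-D dilated convolution (cross-correlation form, as used in deep learning)."""
    # Number of input samples spanned by one output value.
    span = (len(kernel) - 1) * dilation + 1
    if len(signal) < span:
        return []
    return [
        sum(kernel[k] * signal[start + k * dilation] for k in range(len(kernel)))
        for start in range(len(signal) - span + 1)
    ]


def receptive_field(kernel_size, dilations):
    """Receptive field (in samples) of stacked stride-1 dilated convolutions."""
    field = 1
    for d in dilations:
        field += (kernel_size - 1) * d
    return field


# Hypothetical data: a stand-in for one row of a spectrogram.
frame = [1, 2, 3, 4, 5, 6, 7, 8, 9]
kernel = [1, 0, -1]  # hypothetical 3-tap difference kernel

dense = dilated_conv1d(frame, kernel, dilation=1)  # compares samples 2 apart
wide = dilated_conv1d(frame, kernel, dilation=2)   # same 3 weights, samples 4 apart

print(dense)
print(wide)
print(receptive_field(3, [1, 1, 1]))  # three ordinary layers
print(receptive_field(3, [1, 2, 4]))  # three layers with growing dilation
```

With the same number of weights per layer, the stack with growing dilation covers 15 input samples instead of 7, which is the sense in which dilated convolution widens the context without adding parameters.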

Keywords

Broadcast content

Background music separation

Dilated convolution

DenseNet

Information

Publisher: The Acoustical Society of Korea
Publisher (Ko): 한국음향학회
Journal Title: The Journal of the Acoustical Society of Korea
Journal Title (Ko): 한국음향학회지
Volume: 38
No: 6
Pages: 697-702
Received Date: 2019-09-10
Accepted Date: 2019-09-20
DOI: https://doi.org/10.7776/ASK.2019.38.6.697

The Journal of the Acoustical Society of Korea, ISSN: 1225-4428 (Print), 2287-3775 (Online), 한국음향학회
