2020, Vol. 39, Issue 6

Review Article

30 November 2020, pp. 505-514
Information
  • Publisher: The Acoustical Society of Korea
  • Publisher (Ko): 한국음향학회
  • Journal Title: The Journal of the Acoustical Society of Korea
  • Journal Title (Ko): 한국음향학회지
  • Volume: 39
  • No.: 6
  • Pages: 505-514
  • Received Date: 2020-05-12
  • Revised Date: 2020-08-18
  • Accepted Date: 2020-09-18