
For Your Collection | Nearly 200 Classic Machine Learning and Deep Learning Papers, Compiled

zenRRan
Published 2018-12-24 17:29:09

Estimated reading time: about 22 minutes. Follow this little blogger and improve a tiny bit every day.

Reposted from SIGAI

This post compiles the classic papers in machine learning and deep learning. To keep the reading load manageable, only the most essential ones are listed; feel free to extend the list according to your own needs.

Machine Learning Theory

PAC Learning Theory

[1] L. Valiant. A theory of the learnable. Communications of the ACM, 27, 1984.

VC Dimension

[1] Blumer, A., Ehrenfeucht, A., Haussler, D., Warmuth, M. K. Learnability and the Vapnik–Chervonenkis dimension. Journal of the ACM. 36 (4): 929–965, 1989.

[2] Natarajan, B.K. On Learning sets and functions. Machine Learning. 4: 67–97, 1989.

[3] Karpinski, Marek; Macintyre, Angus. Polynomial Bounds for VC Dimension of Sigmoidal and General Pfaffian Neural Networks. Journal of Computer and System Sciences. 54 (1): 169–176, 1997.

Generalization Theory

[1] Wolpert, D.H., Macready, W.G. No Free Lunch Theorems for Optimization. IEEE Transactions on Evolutionary Computation, 1 (1): 67–82, 1997.

[2] Wolpert, David. The Lack of A Priori Distinctions between Learning Algorithms. Neural Computation, pp. 1341-1390, 1996.

[3] Wolpert, D.H., and Macready, W.G. Coevolutionary free lunches. IEEE Transactions on Evolutionary Computation, 9(6): 721-735, 2005.

[4] Whitley, Darrell, and Jean Paul Watson. Complexity theory and the no free lunch theorem. In Search Methodologies, pp. 317-339. Springer, Boston, MA, 2005.

[5] Kawaguchi, K., Kaelbling, L.P., and Bengio, Y. Generalization in deep learning. 2017.

[6] Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, Oriol Vinyals. Understanding deep learning requires rethinking generalization. international conference on learning representations, 2017.

Optimization Theory and Methods

[1] L. Bottou. Stochastic Gradient Descent Tricks. Neural Networks: Tricks of the Trade. Springer, 2012.

[2] I. Sutskever, J. Martens, G. Dahl, and G. Hinton. On the Importance of Initialization and Momentum in Deep Learning. Proceedings of the 30th International Conference on Machine Learning, 2013.

[3] J. Duchi, E. Hazan, and Y. Singer. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization. The Journal of Machine Learning Research, 2011.

[4] M. Zeiler. ADADELTA: An Adaptive Learning Rate Method. arXiv preprint, 2012.

[5] T. Tieleman, and G. Hinton. RMSProp: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning. Technical report, 2012.

[6] D. Kingma, J. Ba. Adam: A Method for Stochastic Optimization. International Conference for Learning Representations, 2015.

[7] Hardt, Moritz, Ben Recht, and Yoram Singer. Train faster, generalize better: Stability of stochastic gradient descent. Proceedings of The 33rd International Conference on Machine Learning. 2016.
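
As a concrete reference point for the adaptive-gradient methods listed above (AdaGrad, RMSProp, Adam), here is a minimal NumPy sketch of the Adam update rule from entry [6]; the toy quadratic objective and the hyperparameter values are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update: biased moment estimates, bias correction, per-parameter step."""
    m = beta1 * m + (1 - beta1) * grad          # first moment (mean of gradients)
    v = beta2 * v + (1 - beta2) * grad ** 2     # second moment (uncentered variance)
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Toy problem (an illustrative assumption): minimize f(theta) = ||theta - target||^2.
target = np.array([3.0, -2.0])
theta = np.zeros(2)
m, v = np.zeros(2), np.zeros(2)
for t in range(1, 501):
    grad = 2 * (theta - target)                 # exact gradient of the quadratic
    theta, m, v = adam_step(theta, grad, m, v, t, lr=0.05)
print(theta)                                    # should approach [3.0, -2.0]
```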

Decision Trees

[1] Breiman, L., Friedman, J., Olshen, R. and Stone, C. Classification and Regression Trees, Wadsworth, 1984.

[2] J. Ross Quinlan. Induction of decision trees. Machine Learning, 1(1): 81-106, 1986.

[3] J. Ross Quinlan. Learning efficient classification procedures and their application to chess end games. In Machine Learning: An Artificial Intelligence Approach, 1983.

[4] J. Ross Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann, San Francisco, CA, 1993.
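
For readers who want to see a CART-style tree (entry [1]) in action, the following scikit-learn sketch trains a small classification tree; the dataset and the depth limit are illustrative assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Gini impurity with a depth limit roughly mirrors CART-style axis-aligned splits.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
tree = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=0)
tree.fit(X_train, y_train)
print("test accuracy:", tree.score(X_test, y_test))
```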

Bayesian Classifiers

[1] Rish, Irina. An empirical study of the naive Bayes classifier. IJCAI Workshop on Empirical Methods in Artificial Intelligence, 2001.

Dimensionality Reduction

Principal Component Analysis (PCA)

[1] Pearson, K. On Lines and Planes of Closest Fit to Systems of Points in Space. Philosophical Magazine. 2 (11): 559–572. 1901.

[2] Ian T. Jolliffe. Principal Component Analysis. Springer Verlag, New York, 1986.

[3] Scholkopf, B., Smola, A., Muller, K.-R. Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 10(5): 1299-1319, 1998.

[4] Sebastian Mika, Bernhard Scholkopf, Alexander J Smola, Klaus-Robert Muller, Matthias Scholz, Gunnar Ratsch. Kernel PCA and de-noising in feature spaces. neural information processing systems, 1999.
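
To make the linear and kernel variants above concrete, here is a short sketch that computes plain PCA via the SVD of the centered data matrix and kernel PCA via scikit-learn; the random toy data and the RBF kernel width are illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import KernelPCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))

# Plain PCA (Pearson 1901 / Jolliffe 1986): project onto the top right singular
# vectors of the mean-centered data matrix.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
X_pca = Xc @ Vt[:2].T                       # scores on the first two principal components

# Kernel PCA (entry [3]): the same idea carried out in an RBF feature space.
X_kpca = KernelPCA(n_components=2, kernel="rbf", gamma=0.1).fit_transform(X)
print(X_pca.shape, X_kpca.shape)
```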

Manifold Learning

[1] Roweis, Sam T and Saul, Lawrence K. Nonlinear dimensionality reduction by locally linear embedding. Science, 290(5500). 2000: 2323-2326.

[2] Belkin, Mikhail and Niyogi, Partha. Laplacian eigenmaps for dimensionality reduction and data representation. Neural computation. 15(6). 2003:1373-1396.

[3] He Xiaofei and Niyogi, Partha. Locality preserving projections. NIPS. 2003:234-241.

[4] Tenenbaum, Joshua B and De Silva, Vin and Langford, John C. A global geometric framework for nonlinear dimensionality reduction. Science, 290(5500). 2000: 2319-2323.

[5] Laurens Van Der Maaten, Geoffrey E Hinton. Visualizing Data using t-SNE. 2008, Journal of Machine Learning Research.
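
Scikit-learn ships implementations of several of the methods above (LLE, Isomap, t-SNE); the sketch below shows typical usage on a Swiss-roll toy dataset, with the neighbor counts and perplexity chosen purely for illustration.

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding, Isomap, TSNE

X, _ = make_swiss_roll(n_samples=1000, random_state=0)

X_lle = LocallyLinearEmbedding(n_neighbors=12, n_components=2).fit_transform(X)   # entry [1]
X_iso = Isomap(n_neighbors=12, n_components=2).fit_transform(X)                   # entry [4]
X_tsne = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)     # entry [5]
print(X_lle.shape, X_iso.shape, X_tsne.shape)
```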

Linear Discriminant Analysis (LDA)

[1] Ronald A. Fisher. The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7 Part 2: 179-188, 1936.

[2] Geoffrey J. McLachlan. Discriminant Analysis and Statistical Pattern Recognition. Wiley, New York, 1992.

Logistic Regression

[1] Cox, DR. The regression analysis of binary sequences (with discussion). J Roy Stat Soc B. 20 (2): 215–242, 1958.

[2] David W. Hosmer, Stanley Lemeshow. Applied Logistic Regression. Wiley, 2000.

[3] Thomas P. Minka. A comparison of numerical optimizers for logistic regression, 2003.

[4] Kwangmoo Koh, Seung-Jean Kim, and Stephen Boyd. An interior-point method for large scale l1-regularized logistic regression. Journal of Machine Learning Research, 8:1519-1555, 2007.

[5] Chih-Jen Lin, Ruby C. Weng, S. Sathiya Keerthi. Trust Region Newton Method for Large-Scale Logistic Regression. Journal of Machine Learning Research, 9: 627-650, 2008.

[6] Rong-En Fan, Kai-Wei Chang, Cho-Jui Hsieh, Xiang-Rui Wang, Chih-Jen Lin. LIBLINEAR: A Library for Large Linear Classification. Journal of Machine Learning Research, 9, 1871-1874, 2008.
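
The LIBLINEAR library from entry [6] is exposed through scikit-learn's LogisticRegression; a minimal sketch with the liblinear solver and L1 regularization (in the spirit of entry [4]) follows, with the dataset and regularization strength as illustrative assumptions.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# solver="liblinear" dispatches to the LIBLINEAR library; penalty="l1" yields a
# sparse weight vector, in the spirit of l1-regularized logistic regression.
clf = LogisticRegression(solver="liblinear", penalty="l1", C=1.0, max_iter=1000)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```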

Support Vector Machines (SVM)

[1] B. E. Boser, I. Guyon, and V. Vapnik. A training algorithm for optimal margin classifiers. In Proceedings of the Fifth Annual Workshop on Computational Learning Theory, pages 144-152. ACM Press, 1992.

[2] Cortes, C. and Vapnik, V. Support vector networks. Machine Learning, 20, 273-297, 1995.

[3] Bernhard Scholkopf, Christopher J. C. Burges, and Vladimir Vapnik. Extracting support data for a given task. 1995.

[4] C. J. C. Burges. A tutorial on support vector machines for pattern recognition. Bell Laboratories, Lucent Technologies, 1997.

[5] B. Scholkopf, Christopher J. C. Burges, and Alexander J. Smola, editors. Advances in Kernel Methods - Support Vector Learning. Cambridge, MA: MIT Press, 1998.

[6] John C. Platt. Fast training of support vector machines using sequential minimal optimization. 1998.

[7] C.-C. Chang and C.-J. Lin. LIBSVM: a Library for Support Vector Machines. ACM TIST, 2:27:1-27:27, 2011.
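
LIBSVM (entry [7]) backs scikit-learn's SVC class; the sketch below fits a soft-margin RBF-kernel classifier in the Cortes–Vapnik formulation, with C and gamma chosen purely for illustration.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=500, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# SVC wraps LIBSVM; C controls the soft-margin penalty, gamma the RBF kernel width.
svm = SVC(kernel="rbf", C=1.0, gamma="scale")
svm.fit(X_train, y_train)
print("test accuracy:", svm.score(X_test, y_test))
```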

Distance Metric Learning

[1] S. Chopra, R. Hadsell, Y. LeCun. Learning a similarity metric discriminatively, with application to face verification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2005), pages 349-356, San Diego, CA, 2005.

[2] Kilian Q Weinberger, Lawrence K Saul. Distance Metric Learning for Large Margin Nearest Neighbor Classification. Journal of Machine Learning Research, 2009.

Ensemble Learning

Bagging and Random Forests

[1] Breiman, Leo. Random Forests. Machine Learning 45 (1), 5-32, 2001.

Boosting Algorithms

[1] Freund, Y. Boosting a weak learning algorithm by majority. Information and Computation, 1995.

[2] Yoav Freund, Robert E Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. computational learning theory. 1995.

[3] Freund, Y. An adaptive version of the boost by majority algorithm. In Proceedings of the Twelfth Annual Conference on Computational Learning Theory, 1999.

[4] R.Schapire. The boosting approach to machine learning: An overview. In MSRI Workshop on Nonlinear Estimation and Classification, Berkeley, CA, 2001.

[5] Freund Y, Schapire RE. A short introduction to boosting. Journal of Japanese Society for Artificial Intelligence, 14(5):771-780. 1999.

[6] Jerome Friedman, Trevor Hastie and Robert Tibshirani. Additive logistic regression: a statistical view of boosting. Annals of Statistics 28(2), 337–407. 2000.

[7] Jerome H Friedman. Greedy function approximation: A gradient boosting machine. Annals of Statistics, 2001.

[8] Tianqi Chen, Carlos Guestrin. XGBoost: A Scalable Tree Boosting System. knowledge discovery and data mining, 2016.

[9] Guolin Ke, Qi Meng, Thomas Finley, Taifeng Wang, Wei Chen, Weidong Ma, Qiwei Ye, Tieyan Liu. LightGBM: A Highly Efficient Gradient Boosting Decision Tree. neural information processing systems, 2017.
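
As a runnable counterpart to Friedman's gradient boosting machine (entry [7]), here is a minimal scikit-learn sketch; XGBoost and LightGBM expose very similar fit/predict interfaces. The synthetic dataset and the hyperparameters are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each stage fits a shallow regression tree to the negative gradient of the loss,
# then adds it to the ensemble scaled by a shrinkage factor (learning_rate).
gbm = GradientBoostingClassifier(n_estimators=200, learning_rate=0.1, max_depth=3)
gbm.fit(X_train, y_train)
print("test accuracy:", gbm.score(X_test, y_test))
```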

Probabilistic Graphical Models

Bayesian Networks

[1] Nir Friedman, Dan Geiger, Moises Goldszmidt. Bayesian Network Classifiers. Machine Learning, 1997.

Hidden Markov Models

[1] Baum, L. E., Petrie, T. Statistical Inference for Probabilistic Functions of Finite State Markov Chains. The Annals of Mathematical Statistics. 37 (6): 1554–1563. 1966.

[2] Baum, L. E., Eagon, J. A. An inequality with applications to statistical estimation for probabilistic functions of Markov processes and to a model for ecology. Bulletin of the American Mathematical Society. 73 (3): 360. 1967.

[3] Baum, L. E., Petrie, T., Soules, G., Weiss, N. A Maximization Technique Occurring in the Statistical Analysis of Probabilistic Functions of Markov Chains. The Annals of Mathematical Statistics. 41: 164. 1970

[4] Baum, L.E. An Inequality and Associated Maximization Technique in Statistical Estimation of Probabilistic Functions of a Markov Process. Inequalities. 3: 1–8. 1972.

[5] Lawrence R. Rabiner. A tutorial on Hidden Markov Models and selected applications in speech recognition. Proceedings of the IEEE. 77 (2): 257–286. 1989.

Conditional Random Fields

[1] Lafferty, J., McCallum, A., Pereira, F. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. Proc. 18th International Conf. on Machine Learning. Morgan Kaufmann. pp. 282–289. 2001.

Artificial Neural Networks

[1] Anil K. Jain, Jianchang Mao, and K. Moidin Mohiuddin. Artificial neural networks: A tutorial. Computer, 29(3):31-44, 1996.

[2] Richard Lippmann. An introduction to computing with neural nets. IEEE ASSP Magazine, pages 4-22, April 1987.

[3] David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams. Learning representations by back-propagating errors. Nature, 323: 533-536, 1986.

[4] Robert Hecht-Nielsen. Theory of the backpropagation neural network. In Proceedings of the International Joint Conference on Neural Networks (IJCNN), volume 1, 593-605. IEEE, New York, 1989.

[5] Kurt Hornik. Approximation capabilities of multilayer feedforward networks. Neural Networks, 1991.

[6] Hava T Siegelmann, Eduardo D Sontag. On the computational power of neural nets. conference on learning theory. 1992.

[7] Hornik, K., Stinchcombe, M., and White, H. Multilayer feedforward networks are universal approximators. Neural Networks, 2, 359-366, 1989.

[8] Cybenko, G. Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals, and Systems, 2, 303-314, 1989.

[9] Hornik, K., Stinchcombe, M., and White, H. Universal approximation of an unknown mapping and its derivatives using multilayer feedforward networks. Neural networks, 3(5), 551-560, 1990.

[10] Leshno, M., Lin, V. Y., Pinkus, A., and Schocken, S. Multilayer feedforward networks with a nonpolynomial activation function can approximate any function. Neural Networks, 6, 861-867, 1993.

[11] Barron, A. R. Universal approximation bounds for superpositions of a sigmoidal function. IEEE Transactions on Information Theory, 39, 930-945, 1993.

[12] Raman Arora, Amitabh Basu, Poorya Mianjy, Anirbit Mukherjee. Understanding Deep Neural Networks with Rectified Linear Units. Electronic Colloquium on Computational Complexity. 2016.

[13] Xavier Glorot, Yoshua Bengio. Understanding the difficulty of training deep feedforward neural networks. Journal of Machine Learning Research. 2010.

[14] M. Riedmiller and H. Braun, A Direct Adaptive Method for Faster Backpropagation Learning: The RPROP Algorithm, Proc. ICNN, San Francisco (1993).

[15] Choromanska A, Henaff M, Mathieu M, Ben Arous G, LeCun Y. The loss surfaces of multilayer networks. arXiv:1412.0233. 2014.

[16] Gori, M, Tesi, A. On the problem of local minima in backpropagation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 14(1), 76-86. 1992.

[17] Saxe, A. M., McClelland, J. L., Ganguli, S. Exact solutions to the nonlinear dynamics of learning in deep linear neural networks. ICLR. 2013.

[18] Dauphin, Y., Pascanu, R., Gulcehre, C., Cho, K., Ganguli, S., Bengio, Y. 2014. Identifying and attacking the saddle point problem in high-dimensional non-convex optimization. NIPS 2014.

[19] Goodfellow, I. J., Vinyals, O., Saxe, A. M. Qualitatively characterizing neural network optimization problems. International Conference on Learning Representations. 2015.

Convolutional Neural Networks

[1] Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel. Backpropagation applied to handwritten zip code recognition. Neural Computation, 1989.

[2] Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel. Handwritten digit recognition with a back-propagation network. In David Touretzky, editor, Advances in Neural Information Processing Systems 2 (NIPS*89), Denver, CO, Morgan Kaufman, 1990.

[3] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, November 1998.

[4] Alex Krizhevsky, Ilya Sutskever, Geoffrey E.Hinton. ImageNet Classification with Deep Convolutional Neural Networks. 2012.

[5] Zeiler M D, Fergus R. Visualizing and Understanding Convolutional Networks. European Conference on Computer Vision, 2013.

[6] Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, Andrew Rabinovich, Going Deeper with Convolutions, Arxiv Link: http://arxiv.org/abs/1409.4842.

[7] K. Simonyan and A. Zisserman. Very Deep Convolutional Networks for Large-Scale Image Recognition. international conference on learning representations. 2015.

[8] Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun. Deep Residual Learning for Image Recognition. computer vision and pattern recognition, 2015.

[9] Andreas Veit, Michael J Wilber, Serge J Belongie. Residual Networks Behave Like Ensembles of Relatively Shallow Networks. neural information processing systems, 2016.

[10] Long J, Shelhamer E, Darrell T, et al. Fully convolutional networks for semantic segmentation. Computer Vision and Pattern Recognition, 2015.
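
To anchor the LeNet-to-ResNet lineage above in code, here is a minimal PyTorch sketch of a LeNet-style network for 28x28 grayscale inputs; the layer sizes are illustrative assumptions rather than a reproduction of any particular paper.

```python
import torch
import torch.nn as nn

class SmallConvNet(nn.Module):
    """LeNet-style stack: two conv+pool blocks followed by fully connected layers."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 7 * 7, 128), nn.ReLU(),
            nn.Linear(128, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

net = SmallConvNet()
x = torch.randn(8, 1, 28, 28)          # a dummy batch of 8 grayscale images
print(net(x).shape)                    # torch.Size([8, 10])
```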

Recurrent Neural Networks

[1] Ronald J Williams, David Zipser. A learning algorithm for continually running fully recurrent neural networks. Neural Computation. 1989.

[2] Mikael Boden. A guide to recurrent neural networks and backpropagation. 2001.

[3] Razvan Pascanu, Caglar Gulcehre, Kyunghyun Cho, Yoshua Bengio. How to Construct Deep Recurrent Neural Networks. international conference on learning representations. 2014.

[4] Fernando J Pineda. Generalization of back-propagation to recurrent neural networks. Physical Review Letters. 1987.

[5] Paul J Werbos. Backpropagation through time: what it does and how to do it. Proceedings of the IEEE, 1990.

[6] Razvan Pascanu, Tomas Mikolov, Yoshua Bengio. On the difficulty of training recurrent neural networks. international conference on machine learning. 2013.

[7] Y. Bengio, P. Simard, P. Frasconi. Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks, 5(2):157-166, 1994.

[8] S. Hochreiter, J. Schmidhuber. Long short-term memory. Neural computation, 9(8): 1735-1780, 1997.

[9] M. Schuster and K. K. Paliwal. Bidirectional recurrent neural networks. IEEE Transactions on Signal Processing, 45(11):2673-2681, 1997.

[10] Alex Graves. Supervised Sequence Labelling with Recurrent Neural Networks. 2012.

[11] Junyoung Chung, Caglar Gulcehre, Kyunghyun Cho, Yoshua Bengio. Gated Feedback Recurrent Neural Networks. international conference on machine learning. 2015.

[12] Alex Graves, Santiago Fernandez, Faustino J Gomez, Jurgen Schmidhuber. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. international conference on machine learning. 2006.

[13] A. Graves, A.Mohamed, G. Hinton, Speech Recognition with Deep Recurrent Neural Networks, ICASSP 2013.

[14] Ilya Sutskever, Oriol Vinyals, Quoc V Le. Sequence to Sequence Learning with Neural Networks. neural information processing systems. 2014.

[15] Tomas Mikolov, Martin Karafiat, Lukas Burget, Jan Cernocký, Sanjeev Khudanpur. Recurrent neural network based language model. 2010, conference of the international speech communication association.

[16] Graves, Alex. Generating sequences with recurrent neural networks. arXiv preprint arXiv:1308.0850 (2013).
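
For the sequence models above, and entry [8]'s LSTM in particular, the following PyTorch sketch builds a small LSTM classifier that reads a token sequence and predicts a label from the final hidden state; the vocabulary size, layer dimensions, and classification head are illustrative assumptions.

```python
import torch
import torch.nn as nn

class LSTMClassifier(nn.Module):
    """Embed token ids, run an LSTM, classify from the final hidden state."""
    def __init__(self, vocab_size=1000, embed_dim=64, hidden_dim=128, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, num_classes)

    def forward(self, tokens):                 # tokens: (batch, seq_len) int64
        emb = self.embed(tokens)
        _, (h_n, _) = self.lstm(emb)           # h_n: (1, batch, hidden_dim)
        return self.head(h_n[-1])

model = LSTMClassifier()
tokens = torch.randint(0, 1000, (4, 20))       # dummy batch of 4 sequences, length 20
print(model(tokens).shape)                     # torch.Size([4, 2])
```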

Autoencoders

[1] J. Deng, Z. X. Zhang, M. Erik. Sparse auto-encoder based feature transfer learning for speech emotion recognition. Proc. of Humaine Association Conference on Affective Computing and Intelligent Interaction, 511-516, 2013.

[2] J. Gehring, Y. J. Miao, F. Metze. Extracting deep bottleneck features using stacked auto-encoders. Proc. of the 26th IEEE International Conference on Acoustics, Speech and Signal Processing, 2013: 3377-3381.

[3] Salah Rifai, Pascal Vincent, Xavier Muller, Xavier Glorot, Yoshua Bengio. Contractive Auto-Encoders: Explicit Invariance During Feature Extraction. international conference on machine learning, 2011.

[4] Pascal Vincent, Hugo Larochelle, Yoshua Bengio, Pierreantoine Manzagol. Extracting and composing robust features with denoising auto encoders. international conference on machine learning, 2008.

[5] Junbo Jake Zhao, Michael Mathieu, Ross Goroshin, Yann Lecun. Stacked What-Where Auto-encoders. arXiv: Machine Learning, 2015.

[6] Pascal Vincent, Hugo Larochelle, Isabelle Lajoie, Yoshua Bengio, Pierreantoine Manzagol. Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion. Journal of Machine Learning Research, 2010.

[7] G.E.Hinton, A.Krizhevsky, S.D.Wang. Transforming Auto-encoders. international conference on artificial neural networks, 2011.

[8] G. E. Hinton, R. R. Salakhutdinov. Reducing the Dimensionality of Data with Neural Networks. Science, 313: 504-507, 2006.
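
A minimal PyTorch sketch of a denoising autoencoder in the spirit of entries [4] and [6]: corrupt the input with Gaussian noise and train the network to reconstruct the clean version. The architecture, noise level, and training loop are illustrative assumptions.

```python
import torch
import torch.nn as nn

class DenoisingAutoencoder(nn.Module):
    def __init__(self, dim_in=784, dim_hidden=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(dim_in, dim_hidden), nn.ReLU())
        self.decoder = nn.Sequential(nn.Linear(dim_hidden, dim_in), nn.Sigmoid())

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = DenoisingAutoencoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

x = torch.rand(32, 784)                        # dummy clean inputs in [0, 1]
for _ in range(10):                            # a few illustrative training steps
    x_noisy = (x + 0.3 * torch.randn_like(x)).clamp(0.0, 1.0)   # corrupt the input
    loss = loss_fn(model(x_noisy), x)          # reconstruct the clean target
    opt.zero_grad()
    loss.backward()
    opt.step()
print(float(loss))
```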

Restricted Boltzmann Machines

[1] Nicolas Le Roux, Yoshua Bengio. Representational power of Restricted Boltzmann Machines and deep belief networks. Neural Computation, 2008.

[2] Geoffrey E Hinton. A Practical Guide to Training Restricted Boltzmann Machines. 2012.

[3] Ruslan Salakhutdinov, Geoffrey E Hinton. Deep Boltzmann Machines. international conference on artificial intelligence and statistics, 2009.

[4] Ruslan Salakhutdinov, Andriy Mnih, Geoffrey E Hinton. Restricted Boltzmann machines for collaborative filtering. international conference on machine learning, 2007.

Generative Adversarial Networks

[1] Goodfellow Ian, Pouget-Abadie J, Mirza M, et al. Generative adversarial nets. Advances in Neural Information Processing Systems, 2672-2680, 2014.

[2] Mirza M, Osindero S. Conditional Generative Adversarial Nets. arXiv preprint arXiv:1411.1784, 2014.

[3] Radford A, Metz L, Chintala S. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434, 2015.

[4] Denton E L, Chintala S, Fergus R. Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks. Advances in neural information processing systems. 1486-1494, 2015.

[5] S Reed, Zeynep Akata, Xinchen Yan, Lajanugen Logeswaran, Bernt Schiele, Honglak Lee. Generative Adversarial Text to Image Synthesis. international conference on machine learning, 2016.

[6] Christian Ledig, Lucas Theis, Ferenc Huszar, Jose Caballero, Andrew Aitken, Alykhan Tejani, Johannes Totz, Zehan Wang, Wenzhe Shi. Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network. Computer Vision and Pattern Recognition, 2016.

[7] Chen X, Duan Y, Houthooft R, et al. InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets. arXiv preprint arXiv:1606.03657, 2016.

[8] Martin Arjovsky, Leon Bottou. Towards Principled Methods for Training Generative Adversarial Networks. ICLR 2017.

Variational Autoencoders

[1] Diederik P Kingma, Max Welling. Auto-Encoding Variational Bayes. international conference on learning representations, 2014.

Deep Model Compression and Optimization

[1] Song Han, Huizi Mao, William J Dally. Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding. international conference on learning representations, 2016.

[2] Mohammad Rastegari, Vicente Ordonez, Joseph Redmon, Ali Farhadi. XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks. european conference on computer vision, 2016.

[3] Matthieu Courbariaux, Itay Hubara, Daniel Soudry, Ran Elyaniv, Yoshua Bengio. Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1. arXiv: Learning, 2016.

[4] Han, Song, Pool, Jeff, Tran, John, and Dally, William J. Learning both weights and connections for efficient neural networks. In Advances in Neural Information Processing Systems, 2015.

[5] Denton, E.L, Zaremba, W, Bruna, J, LeCun, Y, Fergus, R. Exploiting linear structure within convolutional networks for efficient evaluation. In Advances in Neural Information Processing Systems. 1269-1277, 2014.

[6] Jaderberg. M, Vedaldi, A. Zisserman, A. Speeding up convolutional neural networks with low rank expansions. arXiv: 1405.3866, 2014.

[7] Shuchang Zhou, Yuxin Wu, Zekun Ni, Xinyu Zhou, He Wen, Yuheng Zou. DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients. arXiv preprint, 2016.

[8] Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, Hartwig Adam. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. 2017.

Clustering Algorithms

[1] MacQueen, J. B. Some Methods for classification and Analysis of Multivariate Observations. Proceedings of 5th Berkeley Symposium on Mathematical Statistics and Probability. 1. University of California Press. pp. 281–297, 1967

[2] Martin Ester, Hanspeter Kriegel, Jorg Sander, Xu Xiaowei. A density-based algorithm for discovering clusters in large spatial databases with noise. Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, pp. 226–231, 1996.

[3] Mihael Ankerst, Markus M Breunig, Hanspeter Kriegel, Jorg Sander. OPTICS: Ordering Points To Identify the Clustering Structure. ACM SIGMOD international conference on Management of data. ACM Press. pp. 49–60, 1999.

[4] Yizong Cheng. Mean Shift, Mode Seeking, and Clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1995.

[5] J C Dunn. A Fuzzy Relative of the ISODATA Process and Its Use in Detecting Compact Well-Separated Clusters. Cybernetics and Systems, 1973.

[6] Charles Elkan. Using the triangle inequality to accelerate k-means. international conference on machine learning, 2003.

[7] Arthur P Dempster, Nan M Laird, Donald B Rubin. Maximum Likelihood from Incomplete Data via the EM Algorithm. Journal of the royal statistical society series b-methodological, 1977.

[8] Dan Pelleg, Andrew W Moore. X-means: Extending K-means with Efficient Estimation of the Number of Clusters. In ICML (Vol. 1), 2000.

[9] Greg Hamerly, Charles Elkan. Learning the k in k-means. Advances in neural information processing systems, 2004.

[10] M Emre Celebi, Hassan A Kingravi, Patricio A Vela. A comparative study of efficient initialization methods for the k-means clustering algorithm. Expert Systems With Applications, 2013.

[11] Andrew Y Ng, Michael I Jordan, Yair Weiss. On Spectral Clustering: Analysis and an algorithm. neural information processing systems, 2002.

[12] Ulrike Von Luxburg. A tutorial on spectral clustering. Statistics and Computing, 2007.

[13] Dorin Comaniciu, Peter Meer. Mean shift: a robust approach toward feature space analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2002.
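
For the k-means thread (entries [1], [6], [8]-[10]) and the mean-shift papers (entries [4], [13]), scikit-learn provides standard implementations; the sketch below runs both on synthetic blob data, with the cluster count and data parameters chosen purely for illustration.

```python
from sklearn.cluster import KMeans, MeanShift
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=600, centers=3, cluster_std=1.0, random_state=0)

# Lloyd-style k-means with several random restarts (n_init).
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print("k-means inertia:", km.inertia_)

# Mean shift infers the number of clusters from the data via mode seeking.
ms = MeanShift().fit(X)
print("mean-shift clusters:", len(ms.cluster_centers_))
```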

Semi-Supervised Learning

[1] Olivier Chapelle, Bernhard Scholkopf, Alexander Zien. Semi-supervised learning. Cambridge, MA: MIT Press, 2006. ISBN 978-0-262-03358-9.

[2] Isaac Triguero, Salvador Garcia, Francisco Herrera. Self-labeled techniques for semi-supervised learning: taxonomy, software and empirical study. Knowledge and Information Systems, 2015.

[3] Zhu, Xiaojin. Semi-supervised learning literature survey. Computer Sciences, University of Wisconsin-Madison, 2008.

[4] Antti Rasmus, Harri Valpola, Mikko Honkala, Mathias Berglund, Tapani Raiko. Semi-supervised learning with Ladder networks. neural information processing systems, 2015.

[5] M. Belkin, P. Niyogi. Semi-supervised Learning on Riemannian Manifolds. Machine Learning. 56 (Special Issue on Clustering): 209–239, 2004.

[6] Diederik P Kingma, Shakir Mohamed, Danilo Jimenez Rezende, Max Welling. Semi-supervised Learning with Deep Generative Models. neural information processing systems, 2014.

[7] Thorsten Joachims. Transductive inference for text classification using support vector machines. international conference on machine learning, 1999.

Reinforcement Learning

[1] Kaelbling, Leslie P., Littman, Michael L., Moore, Andrew W. Reinforcement Learning: A Survey. Journal of Artificial Intelligence Research. 4: 237-285, 1996.

[2] Richard Sutton. Learning to predict by the methods of temporal differences. Machine Learning, 3(1): 9-44, 1988.

[3] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou. Playing Atari with Deep Reinforcement Learning. NIPS 2013.

[4] Mnih, Volodymyr, et al. Human-level control through deep reinforcement learning. Nature. 518 (7540): 529-533, 2015.

[5] John N Tsitsiklis, B Van Roy. An analysis of temporal-difference learning with function approximation. IEEE Transactions on Automatic Control, 1997.

[6] Richard S Sutton, David A Mcallester, Satinder P Singh, Yishay Mansour. Policy Gradient methods for reinforcement learning with function approximation. neural information processing systems, 2000.

[7] Timothy P Lillicrap, Jonathan J Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa. Continuous Control With Deep Reinforcement Learning. international conference on learning representations, 2016.

[8] Volodymyr Mnih, Adria Puigdomenech Badia, Mehdi Mirza, Alex Graves, Tim Harley, Timothy P Lill. Asynchronous methods for deep reinforcement learning. international conference on machine learning, 2016.

[9] Yuxi Li. Deep Reinforcement Learning: An Overview. 2017.

[10] David Silver, et al. Mastering the Game of Go with Deep Neural Networks and Tree Search. Nature, 2016.

[11] David Silver, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew Lai, Arthur Guez. AlphaZero: Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm. arXiv: Artificial Intelligence, 2017.

[12] Christopher JCH Watkins and Peter Dayan. Q-learning. Machine learning, 8(3-4):279–292, 1992.

[13] Gerald Tesauro. Temporal difference learning and TD-gammon. Communications of the ACM, 38(3):58–68, 1995.
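
To close the list with something runnable, here is a tabular Q-learning sketch (entry [12]) on a tiny chain environment; the environment itself is a made-up illustration, not taken from any of the papers. The behavior policy is uniformly random, which is acceptable because Q-learning is off-policy.

```python
import numpy as np

# Toy chain MDP (an illustrative assumption): states 0..4, actions {0: left, 1: right},
# reward 1 for reaching the terminal state 4; each episode starts in state 0.
n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma = 0.1, 0.9
rng = np.random.default_rng(0)

def step(state, action):
    nxt = min(state + 1, n_states - 1) if action == 1 else max(state - 1, 0)
    reward = 1.0 if nxt == n_states - 1 else 0.0
    return nxt, reward, nxt == n_states - 1

for _ in range(500):                           # episodes
    state, done = 0, False
    while not done:
        action = int(rng.integers(n_actions))  # uniformly random behavior policy
        nxt, reward, done = step(state, action)
        # Off-policy Q-learning update: bootstrap from the greedy value of the next state.
        Q[state, action] += alpha * (reward + gamma * np.max(Q[nxt]) - Q[state, action])
        state = nxt

# For the non-terminal states 0..3 the greedy policy should prefer "right" (action 1).
print(np.argmax(Q, axis=1))
```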

PS: Packed with good stuff, isn't it? Why not bookmark it or share it?
