Public datasets keep appearing, e.g. MNIST, CIFAR-10, and ImageNet, and many more can be obtained through portals such as Kaggle, Google Dataset Search, and Elsevier Data Search. For special tasks, however, especially medical tasks or tasks involving personal privacy, data are hard to acquire, so a suitable dataset is often unavailable or very small. There are two main approaches to this problem: data generation and data search.
Data generation
Images:
Cubuk, Ekin D., et al. “Autoaugment: Learning augmentation policies from data.” arXiv preprint arXiv:1805.09501 (2018).
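As a concrete illustration, here is a minimal hand-written augmentation pipeline using torchvision (assumed installed); AutoAugment's contribution is to search over compositions of exactly these kinds of primitives instead of fixing them by hand. The operations and magnitudes below are illustrative.

```python
# A minimal sketch of hand-designed image augmentation with torchvision.
# AutoAugment searches over policies built from primitives like these.
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomRotation(degrees=15),
    transforms.ColorJitter(brightness=0.4, contrast=0.4),
    transforms.RandomResizedCrop(size=32, scale=(0.8, 1.0)),
    transforms.ToTensor(),
])
# Usage: augmented = augment(pil_image)  # pil_image is a PIL.Image
```

Recent torchvision releases also ship learned policies directly as transforms.AutoAugment.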
Speech:
Park, Daniel S., et al. “Specaugment: A simple data augmentation method for automatic speech recognition.” arXiv preprint arXiv:1904.08779 (2019).
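The core of SpecAugment is cheap masking applied directly to the spectrogram. A toy NumPy sketch (shapes and mask widths are illustrative assumptions; the paper's time-warping step is omitted):

```python
import numpy as np

def spec_augment(spec, max_freq_mask=8, max_time_mask=20, rng=None):
    """spec: (num_mel_bins, num_frames) array; returns a masked copy."""
    rng = rng or np.random.default_rng()
    out = spec.copy()
    n_mels, n_frames = out.shape
    f = rng.integers(0, max_freq_mask + 1)   # frequency-mask width
    f0 = rng.integers(0, n_mels - f + 1)
    out[f0:f0 + f, :] = 0.0                  # mask a band of mel channels
    t = rng.integers(0, max_time_mask + 1)   # time-mask width
    t0 = rng.integers(0, n_frames - t + 1)
    out[:, t0:t0 + t] = 0.0                  # mask a span of frames
    return out

masked = spec_augment(np.random.randn(80, 300))  # fake log-mel spectrogram
```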
Text:
Xie, Ziang, et al. “Data noising as smoothing in neural network language models.” arXiv preprint arXiv:1703.02573 (2017).
Yu, Adams Wei, et al. “Qanet: Combining local convolution with global self-attention for reading comprehension.” arXiv preprint arXiv:1804.09541 (2018).
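A toy sketch of “blank” and unigram noising in the spirit of Xie et al.: each token is independently dropped or swapped for a word drawn from the vocabulary. The probabilities are illustrative, and real unigram noising samples by corpus frequency rather than uniformly.

```python
import random

def noise_tokens(tokens, vocab, p_blank=0.1, p_swap=0.1, rng=random):
    noised = []
    for tok in tokens:
        r = rng.random()
        if r < p_blank:
            noised.append("_")                # blank noising
        elif r < p_blank + p_swap:
            noised.append(rng.choice(vocab))  # unigram noising (uniform here)
        else:
            noised.append(tok)
    return noised

print(noise_tokens("the cat sat on the mat".split(),
                   vocab=["the", "cat", "dog", "sat", "mat", "on"]))
```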
GANs:
Karras, Tero, Samuli Laine, and Timo Aila. “A style-based generator architecture for generative adversarial networks.” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019.
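StyleGAN refines the basic adversarial recipe with a style-based generator; the generic skeleton that makes GAN-based data generation possible is small enough to sketch. A bare-bones PyTorch example on 1-D toy data (network sizes and the target distribution are made up):

```python
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))  # generator
D = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1))  # discriminator
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

for step in range(1000):
    real = torch.randn(64, 1) * 0.5 + 2.0            # "dataset": N(2, 0.5)
    fake = G(torch.randn(64, 8))
    # Discriminator step: push real toward 1, fake toward 0.
    d_loss = (bce(D(real), torch.ones(64, 1))
              + bce(D(fake.detach()), torch.zeros(64, 1)))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # Generator step: make D classify fresh fakes as real.
    g_loss = bce(D(G(torch.randn(64, 8))), torch.ones(64, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

print(G(torch.randn(5, 8)).detach().squeeze())  # synthetic samples near 2
```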
Simulators:
Brockman, Greg, et al. “Openai gym.” arXiv preprint arXiv:1606.01540 (2016).
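A simulator can generate unlimited interaction data. A minimal Gym rollout with a random policy (the classic pre-0.26 gym API is assumed; the gymnasium fork later changed the reset/step signatures):

```python
import gym

env = gym.make("CartPole-v1")
obs = env.reset()
done, total_reward = False, 0.0
while not done:
    action = env.action_space.sample()          # random policy
    obs, reward, done, info = env.step(action)  # classic 4-tuple API
    total_reward += reward
print("episode return:", total_reward)
env.close()
```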
Data search
Roh, Yuji, Geon Heo, and Steven Euijong Whang. “A survey on data collection for machine learning: a big data-AI integration perspective.” arXiv preprint arXiv:1811.03402 (2018).
Yarowsky, David. “Unsupervised word sense disambiguation rivaling supervised methods.” 33rd Annual Meeting of the Association for Computational Linguistics. 1995.
Zhou, Yan, and Sally Goldman. “Democratic co-learning.” 16th IEEE International Conference on Tools with Artificial Intelligence. IEEE, 2004.
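The Yarowsky and co-learning papers above both bootstrap labels from a small seed set. A minimal self-training loop in that spirit, using scikit-learn (the confidence threshold and round count are illustrative):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, random_state=0)
X_lab, y_lab, X_unlab = X[:50], y[:50], X[50:]   # small labeled seed set

clf = LogisticRegression(max_iter=1000)
for _ in range(5):                                # a few bootstrap rounds
    clf.fit(X_lab, y_lab)
    if len(X_unlab) == 0:
        break
    proba = clf.predict_proba(X_unlab)
    confident = proba.max(axis=1) > 0.95          # keep confident predictions
    if not confident.any():
        break
    X_lab = np.vstack([X_lab, X_unlab[confident]])
    y_lab = np.concatenate([y_lab, proba[confident].argmax(axis=1)])
    X_unlab = X_unlab[~confident]                 # shrink the unlabeled pool
```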
Data cleaning
Krishnan, Sanjay, and Eugene Wu. “Alphaclean: Automatic generation of data cleaning pipelines.” arXiv preprint arXiv:1904.11827 (2019).
Chu, Xu, et al. “Katara: A data cleaning system powered by knowledge bases and crowdsourcing.” Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data. ACM, 2015.
Krishnan, Sanjay, et al. “Activeclean: An interactive data cleaning framework for modern machine learning.” Proceedings of the 2016 International Conference on Management of Data. ACM, 2016.
Krishnan, Sanjay, et al. “SampleClean: Fast and Reliable Analytics on Dirty Data.” IEEE Data Eng. Bull. 38.3 (2015): 59-75.
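The systems above automate cleaning end to end; the individual operations they compose are familiar. A generic pandas sketch (not AlphaClean or KATARA themselves; the columns and values are made up):

```python
import pandas as pd

df = pd.DataFrame({
    "age":  ["23", "23", "??", "41", None],
    "city": ["NYC", "NYC", "LA", None, "LA"],
})
df = df.drop_duplicates()                              # exact duplicate rows
df["age"] = pd.to_numeric(df["age"], errors="coerce")  # "??" -> NaN
df["age"] = df["age"].fillna(df["age"].median())       # numeric imputation
df["city"] = df["city"].fillna(df["city"].mode()[0])   # categorical imputation
print(df)
```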
Feature engineering
Feature engineering can be divided into three parts (a combined sketch follows the reference lists below):
Feature selection
Feature construction
H. Vafaie and K. De Jong, “Evolutionary feature space transformation,” in Feature Extraction, Construction and Selection. Springer, 1998, pp. 307–323
J. Gama, “Functional trees,” Machine Learning, vol. 55, no. 3, pp. 219–250, 2004.
D. Roth and K. Small, “Interactive feature space construction using semantic information,” in Proceedings of the Thirteenth Conference on Computational Natural Language Learning. Association for Computational Linguistics, 2009, pp. 66–74.
Feature extraction
Q. Meng, D. Catchpoole, D. Skillicorn, and P. J. Kennedy, “Relational autoencoder for feature extraction,” in 2017 International Joint Conference on Neural Networks (IJCNN). IEEE, 2017, pp. 364–371.
O. Irsoy and E. Alpaydın, “Unsupervised feature extraction with autoencoder trees,” Neurocomputing, vol. 258, pp. 63–73, 2017.
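A compact scikit-learn sketch of all three parts: selection keeps the most informative existing features, construction builds new ones (here, polynomial interaction terms), and extraction maps the data into a new lower-dimensional space. k and the dimensions are illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.preprocessing import PolynomialFeatures

X, y = load_iris(return_X_y=True)

X_sel = SelectKBest(f_classif, k=2).fit_transform(X, y)  # selection
X_con = PolynomialFeatures(degree=2).fit_transform(X)    # construction
X_ext = PCA(n_components=2).fit_transform(X)             # extraction
print(X_sel.shape, X_con.shape, X_ext.shape)  # (150, 2) (150, 15) (150, 2)
```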
Search space
B. Zoph and Q. V. Le, “Neural architecture search with reinforcement learning.” [Online]. Available: http://arxiv.org/abs/1611.01578
H. Pham, M. Y. Guan, B. Zoph, Q. V. Le, and J. Dean, “Efficient neural architecture search via parameter sharing,” in International Conference on Machine Learning (ICML), 2018. [Online]. Available: http://arxiv.org/abs/1802.03268
B. Zoph, V. Vasudevan, J. Shlens, and Q. V. Le, “Learning transferable architectures for scalable image recognition.” [Online]. Available: http://arxiv.org/abs/1707.07012
Z. Zhong, J. Yan, W. Wu, J. Shao, and C.-L. Liu, “Practical block-wise neural network architecture generation.” [Online]. Available: http://arxiv.org/abs/1708.05552
B. Baker, O. Gupta, N. Naik, and R. Raskar, “Designing neural network architectures using reinforcement learning,” in International Conference on Learning Representations (ICLR), 2017. [Online]. Available: http://arxiv.org/abs/1611.02167
E. Real, S. Moore, A. Selle, S. Saxena, Y. L. Suematsu, J. Tan, Q. Le, and A. Kurakin, “Large-scale evolution of image classifiers.” [Online]. Available: http://arxiv.org/abs/1703.01041
E. Real, A. Aggarwal, Y. Huang, and Q. V. Le, “Regularized evolution for image classifier architecture search.” [Online]. Available: http://arxiv.org/abs/1802.01548
T. Chen, I. Goodfellow, and J. Shlens, “Net2net: Accelerating learning via knowledge transfer,” arXiv preprint arXiv:1511.05641, 2015.
T. Elsken, J. H. Metzen, and F. Hutter, “Efficient multi-objective neural architecture search via lamarckian evolution.” [Online]. Available: http://arxiv.org/abs/1804.09081
H. Cai, T. Chen, W. Zhang, Y. Yu, and J. Wang, “Efficient architecture search by network transformation,” in Thirty-Second AAAI Conference on Artificial Intelligence, 2018
Search strategies
Grid search
H. H. Hoos, “Automated algorithm configuration and parameter tuning,” in Autonomous Search. Springer, 2011.
I. Czogiel, K. Luebke, and C. Weihs, Response surface methodology for optimizing hyper parameters. Universitätsbibliothek Dortmund, 2006.
C.-W. Hsu, C.-C. Chang, C.-J. Lin et al., “A practical guide to support vector classification,” 2003.
J. Y. Hesterman, L. Caucci, M. A. Kupinski, H. H. Barrett, and L. R. Furenlid, “Maximum-likelihood estimation with a contracting-grid search algorithm,” IEEE transactions on nuclear science, vol. 57, no. 3, pp. 1077–1084, 2010.
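Grid search simply evaluates every combination of a predefined grid by cross-validation. A standard scikit-learn example (the grid values are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
grid = {"C": [0.1, 1, 10, 100], "gamma": [0.001, 0.01, 0.1]}
search = GridSearchCV(SVC(), grid, cv=5).fit(X, y)  # 12 configs x 5 folds
print(search.best_params_, search.best_score_)
```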
Random search
J. Bergstra and Y. Bengio, “Random search for hyper-parameter optimization,” Journal of Machine Learning Research, vol. 13, pp. 281–305, 2012.
H. Larochelle, D. Erhan, A. Courville, J. Bergstra, and Y. Bengio, “An empirical evaluation of deep architectures on problems with many factors of variation,” in Proceedings of the 24th international conference on Machine learning. ACM, 2007, pp. 473–480.
L. Li, K. Jamieson, G. DeSalvo, A. Rostamizadeh, and A. Talwalkar, “Hyperband: A novel bandit-based approach to hyperparameter optimization.” [Online]. Available: http://arxiv.org/abs/1603.06560
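The matching random-search variant samples a fixed budget of configurations from distributions instead of enumerating a grid; Bergstra and Bengio argue this covers the important dimensions far more efficiently when only a few hyperparameters matter.

```python
from scipy.stats import loguniform
from sklearn.datasets import load_iris
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
dists = {"C": loguniform(1e-2, 1e2), "gamma": loguniform(1e-4, 1e0)}
search = RandomizedSearchCV(SVC(), dists, n_iter=20, cv=5,
                            random_state=0).fit(X, y)
print(search.best_params_, search.best_score_)
```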
Reinforcement learning
B. Zoph and Q. V. Le, “Neural architecture search with reinforcement learning.” [Online]. Available: http://arxiv.org/abs/1611.01578
B. Baker, O. Gupta, N. Naik, and R. Raskar, “Designing neural network architectures using reinforcement learning,” in International Conference on Learning Representations (ICLR), 2017. [Online]. Available: http://arxiv.org/abs/1611.02167
H. Pham, M. Y. Guan, B. Zoph, Q. V. Le, and J. Dean, “Efficient neural architecture search via parameter sharing,” in International Conference on Machine Learning (ICML), 2018. [Online]. Available: http://arxiv.org/abs/1802.03268
B. Zoph, V. Vasudevan, J. Shlens, and Q. V. Le, “Learning transferable architectures for scalable image recognition.” [Online]. Available: http://arxiv.org/abs/1707.07012
Z. Zhong, J. Yan, W. Wu, J. Shao, and C.-L. Liu, “Practical block-wise neural network architecture generation.” [Online]. Available: http://arxiv.org/abs/1708.05552
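These works train a controller with policy gradients to emit architectures whose validation accuracy serves as the reward. A toy REINFORCE sketch of that idea, with a softmax policy over four hypothetical candidate architectures and a simulated reward (all numbers are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
true_acc = np.array([0.70, 0.75, 0.90, 0.80])  # hidden quality per candidate
logits = np.zeros(4)                           # controller parameters
lr, baseline = 0.1, 0.0

for step in range(500):
    p = np.exp(logits - logits.max()); p /= p.sum()  # softmax policy
    a = rng.choice(4, p=p)                           # sample an architecture
    reward = true_acc[a] + rng.normal(0, 0.02)       # noisy "validation acc"
    baseline = 0.9 * baseline + 0.1 * reward         # moving-average baseline
    grad = -p; grad[a] += 1.0                        # d log p(a) / d logits
    logits += lr * (reward - baseline) * grad        # REINFORCE update
print("policy:", np.round(p, 2))  # probability mass concentrates on index 2
```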
Evolutionary algorithms
M. Suganuma, S. Shirakawa, and T. Nagao, “A genetic programming approach to designing convolutional neural network architectures.” [Online]. Available: http://arxiv.org/abs/1704.00764
E. Real, S. Moore, A. Selle, S. Saxena, Y. L. Suematsu, J. Tan, Q. Le, and A. Kurakin, “Large-scale evolution of image classifiers.” [Online]. Available: http://arxiv.org/abs/1703.01041
T. Elsken, J. H. Metzen, and F. Hutter, “Efficient multi-objective neural architecture search via lamarckian evolution.” [Online]. Available: http://arxiv.org/abs/1804.09081
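A toy aging-evolution loop in the spirit of Real et al.'s regularized evolution: keep a fixed-size population, mutate the best member of a random sample, and always discard the oldest individual. The bit-string "architecture" and its fitness function are stand-ins for a real encoding and trained validation accuracy.

```python
import random
from collections import deque

def fitness(arch):        # stand-in for trained validation accuracy
    return sum(arch)

def mutate(arch):
    child = list(arch)
    child[random.randrange(len(child))] ^= 1   # flip one architectural choice
    return child

random.seed(0)
population = deque([[random.randint(0, 1) for _ in range(16)]
                    for _ in range(20)])
for _ in range(200):
    sample = random.sample(list(population), 5)   # tournament sample
    parent = max(sample, key=fitness)
    population.append(mutate(parent))             # add the child
    population.popleft()                          # age out the oldest member
print(max(fitness(a) for a in population))
```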
Bayesian optimization
J. Snoek, H. Larochelle, and R. P. Adams, “Practical bayesian optimization of machine learning algorithms,” in Advances in Neural Information Processing Systems, 2012, pp. 2951–2959.
S. Falkner, A. Klein, and F. Hutter, “BOHB: Robust and efficient hyperparameter optimization at scale,” in International Conference on Machine Learning (ICML), 2018.
F. Hutter, H. H. Hoos, and K. Leyton-Brown, “Sequential model-based optimization for general algorithm configuration,” in Learning and Intelligent Optimization, C. A. C. Coello, Ed. Springer Berlin Heidelberg, vol. 6683, pp. 507–523. [Online]. Available: http://link.springer.com/10.1007/978-3-642-25566-3_40
J. Bergstra, D. Yamins, and D. D. Cox, “Making a science of model search: Hyperparameter optimization in hundreds of dimensions for vision architectures,” in International Conference on Machine Learning (ICML), 2013.
A. Klein, S. Falkner, S. Bartels, P. Hennig, and F. Hutter, “Fast bayesian optimization of machine learning hyperparameters on large datasets.” [Online]. Available: http://arxiv.org/abs/1605.07079
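Bayesian optimization fits a probabilistic surrogate (typically a Gaussian process) to past evaluations and uses an acquisition function to pick the next configuration. A hedged example using scikit-optimize (assumed installed); the one-dimensional objective stands in for a real train-and-validate run:

```python
from skopt import gp_minimize

def objective(params):        # pretend this trains and validates a model
    lr, = params
    return (lr - 0.01) ** 2   # minimum at lr = 0.01

res = gp_minimize(objective, [(1e-4, 1.0)], n_calls=20, random_state=0)
print(res.x, res.fun)         # best configuration found and its score
```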
Gradient descent
H. Liu, K. Simonyan, and Y. Yang, “DARTS: Differentiable architecture search.” [Online]. Available: http://arxiv.org/abs/1806.09055
S. Saxena and J. Verbeek, “Convolutional neural fabrics,” in Advances in Neural Information Processing Systems, 2016, pp. 4053–4061.
K. Ahmed and L. Torresani, “Connectivity learning in multi-branch networks,” arXiv preprint arXiv:1709.09582, 2017.
R. Shin, C. Packer, and D. Song, “Differentiable neural network architecture search,” 2018.
D. Maclaurin, D. Duvenaud, and R. Adams, “Gradient-based hyperparameter optimization through reversible learning,” in International Conference on Machine Learning, 2015, pp. 2113–2122.
F. Pedregosa, “Hyperparameter optimization with approximate gradient,” arXiv preprint arXiv:1602.02355, 2016.
H. Cai, L. Zhu, and S. Han, “ProxylessNAS: Direct neural architecture search on target task and hardware,” in International Conference on Learning Representations (ICLR), 2019.
A. Hundt, V. Jain, and G. D. Hager, “sharpDARTS: Faster and more accurate differentiable architecture search.” [Online]. Available: https://arxiv.org/pdf/1903.09900.pdf
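The key move in DARTS is the continuous relaxation: each edge outputs a softmax-weighted mixture of candidate operations, so the architecture parameters receive gradients like ordinary weights. A minimal PyTorch sketch with an illustrative three-operation set (the real method alternates alpha and weight updates on separate data splits):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedOp(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.ops = nn.ModuleList([
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.Conv2d(channels, channels, 5, padding=2),
            nn.Identity(),
        ])
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))  # arch params

    def forward(self, x):
        w = F.softmax(self.alpha, dim=0)        # relaxed architecture choice
        return sum(wi * op(x) for wi, op in zip(w, self.ops))

op = MixedOp(8)
loss = op(torch.randn(1, 8, 16, 16)).sum()
loss.backward()
print(op.alpha.grad)                            # alphas are trainable
```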
Low fidelity
A. Klein, S. Falkner, S. Bartels, P. Hennig, and F. Hutter, “Fast bayesian optimization of machine learning hyperparameters on large datasets.” [Online]. Available: http://arxiv.org/abs/1605.07079
B. Zoph, V. Vasudevan, J. Shlens, and Q. V. Le, “Learning transferable architectures for scalable image recognition.” [Online]. Available: http://arxiv.org/abs/1707.07012
E. Real, A. Aggarwal, Y. Huang, and Q. V. Le, “Regularized evolution for image classifier architecture search.” [Online]. Available: http://arxiv.org/abs/1802.01548
A. Zela, A. Klein, S. Falkner, and F. Hutter, “Towards automated deep learning: Efficient joint neural architecture and hyperparameter search.” [Online]. Available: http://arxiv.org/abs/1807.06906
Y.-Q. Hu, Y. Yu, W.-W. Tu, Q. Yang, Y. Chen, and W. Dai, “Multi-fidelity automatic hyper-parameter tuning via transfer series expansion,” in AAAI Conference on Artificial Intelligence, 2019.
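A toy low-fidelity evaluation: rank candidate configurations on a cheap proxy (a subsample of the data, fewer epochs, or a smaller model) and pay the full cost only for the winner. The subset size here is illustrative.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)

def low_fidelity_score(c, n=300):               # evaluate on a small subset
    return cross_val_score(SVC(C=c), X[:n], y[:n], cv=3).mean()

candidates = [0.1, 1, 10]
best = max(candidates, key=low_fidelity_score)  # cheap ranking
full = cross_val_score(SVC(C=best), X, y, cv=5).mean()  # full cost once
print(best, full)
```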
Transfer learning
C. Wong, N. Houlsby, Y. Lu, and A. Gesmundo, “Transfer learning with neural automl,” in Advances in Neural Information Processing Systems, 2018, pp. 8356–8365.
T. Wei, C. Wang, Y. Rui, and C. W. Chen, “Network morphism,” in International Conference on Machine Learning, 2016, pp. 564–572.
T. Chen, I. Goodfellow, and J. Shlens, “Net2net: Accelerating learning via knowledge transfer,” arXiv preprint arXiv:1511.05641, 2015.
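A common transfer-learning recipe: reuse a pretrained backbone, freeze it, and train only a new task-specific head on the small target dataset. The sketch assumes torchvision >= 0.13 for the weights= API, and the 10-class head is illustrative:

```python
import torch.nn as nn
from torchvision import models

model = models.resnet18(weights="IMAGENET1K_V1")  # ImageNet-pretrained
for p in model.parameters():
    p.requires_grad = False                       # freeze transferred weights
model.fc = nn.Linear(model.fc.in_features, 10)    # new head for 10 classes
# Only model.fc.parameters() need to be given to the optimizer.
```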
Surrogate-based methods
K. Eggensperger, F. Hutter, H. H. Hoos, and K. Leyton-Brown, “Surrogate benchmarks for hyperparameter optimization,” in MetaSel@ECAI, 2014, pp. 24–31.
C. Wang, Q. Duan, W. Gong, A. Ye, Z. Di, and C. Miao, “An evaluation of adaptive surrogate modeling based optimization with two benchmark problems,” Environmental Modelling & Software, vol. 60, pp. 167–179, 2014.
K. Eggensperger, F. Hutter, H. Hoos, and K. Leyton-Brown, “Efficient benchmarking of hyperparameter optimizers via surrogates,” in Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015.
K. K. Vu, C. D’Ambrosio, Y. Hamadi, and L. Liberti, “Surrogate-based methods for black-box optimization,” International Transactions in Operational Research, vol. 24, no. 3, pp. 393–424, 2017.
C. Liu, B. Zoph, M. Neumann, J. Shlens, W. Hua, L.-J. Li, L. Fei-Fei, A. Yuille, J. Huang, and K. Murphy, “Progressive neural architecture search.” [Online]. Available: http://arxiv.org/abs/1712.00559
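A toy surrogate: fit a regressor on (configuration, observed accuracy) pairs from a few real runs, then rank a large pool of unseen configurations by predicted accuracy instead of training them all. The configurations and accuracies here are simulated.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
configs = rng.uniform(0, 1, size=(30, 2))       # e.g. (lr, dropout) pairs
accs = 0.9 - (configs[:, 0] - 0.3) ** 2 + rng.normal(0, 0.01, 30)

surrogate = RandomForestRegressor(random_state=0).fit(configs, accs)
pool = rng.uniform(0, 1, size=(1000, 2))        # cheap-to-score candidates
best = pool[surrogate.predict(pool).argmax()]   # most promising config
print(best)
```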
Early stopping
A. Klein, S. Falkner, J. T. Springenberg, and F. Hutter, “Learning curve prediction with bayesian neural networks,” 2016.
B. Deng, J. Yan, and D. Lin, “Peephole: Predicting network performance before training,” arXiv preprint arXiv:1712.03351, 2017.
T. Domhan, J. T. Springenberg, and F. Hutter, “Speeding up automatic hyperparameter optimization of deep neural networks by extrapolation of learning curves,” in Twenty-Fourth International Joint Conference on Artificial Intelligence, 2015.
M. Mahsereci, L. Balles, C. Lassner, and P. Hennig, “Early stopping without a validation set,” arXiv preprint arXiv:1703.09580, 2017.
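A plain early-stopping loop with patience: training stops once the validation loss has not improved for a fixed number of checks. The simulated loss curve stands in for a real training run.

```python
import math
import random

random.seed(0)
best_loss, patience, bad = float("inf"), 5, 0
for epoch in range(100):
    # simulated validation loss: improves, then plateaus with noise
    val_loss = math.exp(-epoch / 10) + 0.2 + random.gauss(0, 0.01)
    if val_loss < best_loss - 1e-4:
        best_loss, bad = val_loss, 0   # improvement: reset patience counter
    else:
        bad += 1
        if bad >= patience:            # no improvement for 5 checks in a row
            print("early stop at epoch", epoch)
            break
```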