Goodfellow I, Bengio Y and Courville A. Deep Learning. Cambridge: MIT Press, 2016.
Settles B. Active learning literature survey. Technical Report 1648. Department of Computer Sciences, University of Wisconsin at Madison, Madison, WI, 2010 [http://pages.cs.wisc.edu/~bsettles/pub/settles.activelearning.pdf].
Chapelle O, Schölkopf B and Zien A (eds). Semi-Supervised Learning. Cambridge: MIT Press, 2006.
Zhu X. Semi-supervised learning literature survey. Technical Report 1530. Department of Computer Sciences, University of Wisconsin at Madison, Madison, WI, 2008 [http://www.cs.wisc.edu/~jerryzhu/pub/ssl_survey.pdf].
Zhou Z-H and Li M. Semi-supervised learning by disagreement. Knowl Inform Syst 2010; 24: 415–39.
Huang SJ, Jin R and Zhou ZH. Active learning by querying informative and representative examples. IEEE Trans Pattern Anal Mach Intell 2014; 36: 1936–49.
Lewis D and Gale W. A sequential algorithm for training text classifiers. In 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Dublin, Ireland, 1994; 3–12.
Seung H, Opper M and Sompolinsky H. Query by committee. In 5th ACM Workshop on Computational Learning Theory, Pittsburgh, PA, 1992; 287–94.
Abe N and Mamitsuka H. Query learning strategies using boosting and bagging. In 15th International Conference on Machine Learning, Madison, WI, 1998; 1–9.
Nguyen HT and Smeulders AWM. Active learning using pre-clustering. In 21st International Conference on Machine Learning, Banff, Canada, 2004; 623–30.
Dasgupta S and Hsu D. Hierarchical sampling for active learning. In 25th International Conference on Machine Learning, Helsinki, Finland, 2008; 208–15.
Wang Z and Ye J. Querying discriminative and representative samples for batch mode active learning. In 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Chicago, IL, 2013; 158–66.
Dasgupta S, Kalai AT and Monteleoni C. Analysis of perceptron-based active learning. In 18th Annual Conference on Learning Theory, Bertinoro, Italy, 2005; 249–63.
Dasgupta S. Analysis of a greedy active learning strategy. In Advances in Neural Information Processing Systems 17, Cambridge, MA: MIT Press, 2005; 337–44.
Kääriäinen M. Active learning in the non-realizable case. In 17th International Conference on Algorithmic Learning Theory, Barcelona, Spain, 2006; 63–77.
Balcan MF, Broder AZ and Zhang T. Margin based active learning. In 20th Annual Conference on Learning Theory, San Diego, CA, 2007; 35–50.
Hanneke S. Adaptive rates of convergence in active learning. In 22nd Conference on Learning Theory, Montreal, Canada, 2009.
Wang W and Zhou ZH. Multi-view active learning in the non-realizable case. In Advances in Neural Information Processing Systems 23, Cambridge, MA: MIT Press, 2010; 2388–96.
Miller DJ and Uyar HS. A mixture of experts classifier with learning based on both labelled and unlabelled data. In Advances in Neural Information Processing Systems 9, Cambridge, MA: MIT Press, 1997; 571–7.
Nigam K, McCallum AK and Thrun S et al. Text classification from labeled and unlabeled documents using EM. Mach Learn 2000; 39: 103–34.
Dempster AP, Laird NM and Rubin DB. Maximum likelihood from incomplete data via the EM algorithm. J Roy Stat Soc B Stat Meth 1977; 39: 1–38.
Fujino A, Ueda N and Saito K. A hybrid generative/discriminative approach to semi-supervised classifier design. In 20th National Conference on Artificial Intelligence, Pittsburgh, PA, 2005; 764–9.
Blum A and Chawla S. Learning from labeled and unlabeled data using graph mincuts. In 18th International Conference on Machine Learning, Williamstown, MA, 2001; 19–26.
Zhu X, Ghahramani Z and Lafferty J. Semi-supervised learning using Gaussian fields and harmonic functions. In 20th International Conference on Machine Learning, Washington, DC, 2003; 912–9.
Zhou D, Bousquet O and Lal TN et al. Learning with local and global consistency. In Advances in Neural Information Processing Systems 16, Cambridge, MA: MIT Press, 2004; 321–8.
Carreira-Perpinan MA and Zemel RS. Proximity graphs for clustering and manifold learning. In Advances in Neural Information Processing Systems 17, Cambridge, MA: MIT Press, 2005; 225–32.
Wang F and Zhang C. Label propagation through linear neighborhoods. In 23rd International Conference on Machine Learning, Pittsburgh, PA, 2006; 985–92.
Hein M and Maier M. Manifold denoising. In Advances in Neural Information Processing Systems 19, Cambridge, MA: MIT Press, 2007; 561–8.
Joachims T. Transductive inference for text classification using support vector machines. In 16th International Conference on Machine Learning, Bled, Slovenia, 1999; 200–9.
Chapelle O and Zien A. Semi-supervised learning by low density separation. In 10th International Workshop on Artificial Intelligence and Statistics, Barbados, 2005; 57–64.
Li YF, Tsang IW and Kwok JT et al. Convex and scalable weakly labeled SVMs. J Mach Learn Res 2013; 14: 2151–88.
Blum A and Mitchell T. Combining labeled and unlabeled data with co-training. In 11th Conference on Computational Learning Theory, Madison, WI, 1998; 92–100.
Zhou Z-H and Li M. Tri-training: exploiting unlabeled data using three classifiers. IEEE Trans Knowl Data Eng 2005; 17: 1529–41.
Zhou Z-H. When semi-supervised learning meets ensemble learning. In 8th International Workshop on Multiple Classifier Systems, Reykjavik, Iceland, 2009; 529–38.
Cozman FG and Cohen I. Unlabeled data can degrade classification performance of generative classifiers. In 15th International Conference of the Florida Artificial Intelligence Research Society, Pensacola, FL, 2002; 327–31.
Li YF and Zhou ZH. Towards making unlabeled data never hurt. IEEE Trans Pattern Anal Mach Intell 2015; 37: 175–88.
Castelli V and Cover TM. On the exponential value of labeled samples. Pattern Recogn Lett 1995; 16: 105–11.
Wang W and Zhou ZH. Theoretical foundation of co-training and disagreement-based algorithms. arXiv:1708.04403, 2017.
Dietterich TG, Lathrop RH and Lozano-Pérez T. Solving the multiple-instance problem with axis-parallel rectangles. Artif Intell 1997; 89: 31–71.
Foulds J and Frank E. A review of multi-instance learning assumptions. Knowl Eng Rev 2010; 25: 1–25.
Zhou Z-H and Zhang M-L. Solving multi-instance problems with classifier ensemble based on constructive clustering. Knowl Inform Syst 2007; 11: 155–70.
Wei X-S, Wu J and Zhou Z-H. Scalable algorithms for multi-instance learning. IEEE Trans Neural Network Learn Syst 2017; 28: 975–87.
Amores J. Multiple instance classification: review, taxonomy and comparative study. Artif Intell 2013; 201: 81–105.
Zhou Z-H and Xu J-M. On the relation between multi-instance learning and semi-supervised learning. In 24th International Conference on Machine Learning, Corvallis, OR, 2007; 1167–74.
Zhou Z-H, Sun Y-Y and Li Y-F. Multi-instance learning by treating instances as non-i.i.d. samples. In 26th International Conference on Machine Learning, Montreal, Canada, 2009; 1249–56.
Chen Y and Wang JZ. Image categorization by learning and reasoning with regions. J Mach Learn Res 2004; 5: 913–39.
Zhang Q, Yu W and Goldman SA et al. Content-based image retrieval using multiple-instance learning. In 19th International Conference on Machine Learning, Sydney, Australia, 2002; 682–9.
Tang JH, Li HJ and Qi GJ et al. Image annotation by graph-based inference with integrated multiple/single instance representations. IEEE Trans Multimed 2010; 12: 131–41.
Andrews S, Tsochantaridis I and Hofmann T. Support vector machines for multiple-instance learning. In Advances in Neural Information Processing Systems 15, Cambridge, MA: MIT Press, 2003; 561–8.
Settles B, Craven M and Ray S. Multiple-instance active learning. In Advances in Neural Information Processing Systems 20, Cambridge, MA: MIT Press, 2008; 1289–96.
Jorgensen Z, Zhou Y and Inge M. A multiple instance learning strategy for combating good word attacks on spam filters. J Mach Learn Res 2008; 8: 993–1019.
Fung G, Dundar M and Krishnapuram B et al. Multiple instance learning for computer aided diagnosis. In Advances in Neural Information Processing Systems 19, Cambridge, MA: MIT Press, 2007; 425–32.
Viola P, Platt J and Zhang C. Multiple instance boosting for object detection. In Advances in Neural Information Processing Systems 18, Cambridge, MA: MIT Press, 2006; 1419–26.
Felzenszwalb PF, Girshick RB and McAllester D et al. Object detection with discriminatively trained part-based models. IEEE Trans Pattern Anal Mach Intell 2010; 32: 1627–45.
Zhu J-Y, Wu J and Xu Y et al. Unsupervised object class discovery via saliency-guided multiple class learning. IEEE Trans Pattern Anal Mach Intell 2015; 37: 862–75.
Babenko B, Yang MH and Belongie S. Robust object tracking with online multiple instance learning. IEEE Trans Pattern Anal Mach Intell 2011; 33: 1619–32.
Wei X-S and Zhou Z-H. An empirical study on image bag generators for multi-instance learning. Mach Learn 2016; 105: 155–98.
Liu G, Wu J and Zhou ZH. Key instance detection in multi-instance learning. In 4th Asian Conference on Machine Learning, Singapore, 2012; 253–68.
Xu X and Frank E. Logistic regression and boosting for labeled bags of instances. In 8th Pacific-Asia Conference on Knowledge Discovery and Data Mining, Sydney, Australia, 2004; 272–81.
Chen Y, Bi J and Wang JZ. MILES: multiple-instance learning via embedded instance selection. IEEE Trans Pattern Anal Mach Intell 2006; 28: 1931–47.
Weidmann N, Frank E and Pfahringer B. A two-level learning method for generalized multi-instance problem. In 14th European Conference on Machine Learning, Cavtat-Dubrovnik, Croatia, 2003; 468–79.
Long PM and Tan L. PAC learning axis-aligned rectangles with respect to product distributions from multiple-instance examples. Mach Learn 1998; 30: 7–21.
Auer P, Long PM and Srinivasan A. Approximating hyper-rectangles: learning and pseudo-random sets. J Comput Syst Sci 1998; 57: 376–88.
Blum A and Kalai A. A note on learning from multiple-instance examples. Mach Learn 1998; 30: 23–9.
Sabato S and Tishby N. Homogenous multi-instance learning with arbitrary dependence. In 22nd Conference on Learning Theory, Montreal, Canada, 2009.
Frénay B and Verleysen M. Classification in the presence of label noise: a survey. IEEE Trans Neural Network Learn Syst 2014; 25: 845–69.
Angluin D and Laird P. Learning from noisy examples. Mach Learn 1988; 2: 343–70.
Blum A, Kalai A and Wasserman H. Noise-tolerant learning, the parity problem, and the statistical query model. J ACM 2003; 50: 506–19.
Gao W, Wang L and Li YF et al. Risk minimization in the presence of label noise. In 30th AAAI Conference on Artificial Intelligence, Phoenix, AZ, 2016; 1575–81.
Brodley CE and Friedl MA. Identifying mislabeled training data. J Artif Intell Res 1999; 11: 131–67.
Muhlenbach F, Lallich S and Zighed DA. Identifying and handling mislabelled instances. J Intell Inform Syst 2004; 22: 89–109.
Brabham DC. Crowdsourcing as a model for problem solving: an introduction and cases. Convergence 2008; 14: 75–90.
Sheng VS, Provost FJ and Ipeirotis PG. Get another label? Improving data quality and data mining using multiple, noisy labelers. In 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Las Vegas, NV, 2008; 614–22.
Snow R, O’Connor B and Jurafsky D et al. Cheap and fast - but is it good? Evaluating non-expert annotations for natural language tasks. In 2008 Conference on Empirical Methods in Natural Language Processing, Honolulu, HI, 2008; 254–63.
Raykar VC, Yu S and Zhao LH et al. Learning from crowds. J Mach Learn Res 2010; 11: 1297–322.
Whitehill J, Ruvolo P and Wu T et al. Whose vote should count more: optimal integration of labels from labelers of unknown expertise. In Advances in Neural Information Processing Systems 22, Cambridge, MA: MIT Press, 2009; 2035–43.
Raykar VC and Yu S. Eliminating spammers and ranking annotators for crowdsourced labeling tasks. J Mach Learn Res 2012; 13: 491–518.
Wang W and Zhou ZH. Crowdsourcing label quality: a theoretical analysis. Sci China Inform Sci 2015; 58: 1–12.
Dekel O and Shamir O. Good learners for evil teachers. In 26th International Conference on Machine Learning, Montreal, Canada, 2009; 233–40.
Urner R, Ben-David S and Shamir O. Learning from weak teachers. In 15th International Conference on Artificial Intelligence and Statistics, La Palma, Canary Islands, 2012; 1252–60.
Wang L and Zhou ZH. Cost-saving effect of crowdsourcing learning. In 25th International Joint Conference on Artificial Intelligence, New York, NY, 2016; 2111–7.
Karger DR, Oh S and Shah D. Iterative learning for reliable crowdsourcing systems. In Advances in Neural Information Processing Systems 24, Cambridge, MA: MIT Press, 2011; 1953–61.
Tran-Thanh L, Venanzi M and Rogers A et al. Efficient budget allocation with accuracy guarantees for crowdsourcing classification tasks. In 12th International Conference on Autonomous Agents and Multi-Agent Systems, Saint Paul, MN, 2013; 901–8.
Ho CJ, Jabbari S and Vaughan JW. Adaptive task assignment for crowdsourced classification. In 30th International Conference on Machine Learning, Atlanta, GA, 2013; 534–42.
Chen X, Lin Q and Zhou D. Optimistic knowledge gradient policy for optimal budget allocation in crowdsourcing. In 30th International Conference on Machine Learning, Atlanta, GA, 2013; 64–72.
Dawid AP and Skene AM. Maximum likelihood estimation of observer error-rates using the EM algorithm. J Roy Stat Soc C Appl Stat 1979; 28: 20–8.
Zhong J, Tang K and Zhou Z-H. Active learning from crowds with unsure option. In 24th International Joint Conference on Artificial Intelligence, Buenos Aires, Argentina, 2015; 1061–7.
Ding YX and Zhou ZH. Crowdsourcing with unsure opinion. arXiv:1609.00292, 2016.
Shah NB and Zhou D. Double or nothing: multiplicative incentive mechanisms for crowdsourcing. In Advances in Neural Information Processing Systems 28, Cambridge, MA: MIT Press, 2015; 1–9.
Rahmani R and Goldman SA. MISSL: multiple-instance semi-supervised learning. In 23rd International Conference on Machine Learning, Pittsburgh, PA, 2006; 705–12.
Yan Y, Rosales R and Fung G et al. Active learning from crowds. In 28th International Conference on Machine Learning, Bellevue, WA, 2011; 1161–8.
Sutton RS and Barto AG. Reinforcement Learning: An Introduction. Cambridge: MIT Press, 1998.
Schwenker F and Trentin E. Partially supervised learning for pattern recognition. Pattern Recogn Lett 2014; 37: 1–3.
Garcia-Garcia D and Williamson RC. Degrees of supervision. In Advances in Neural Information Processing Systems 17, Cambridge, MA: MIT Press Workshops, 2011.
Hernández-González J, Inza I and Lozano JA. Weak supervision and other non-standard classification problems: a taxonomy. Pattern Recogn Lett 2016; 69: 49–55.
Kuncheva LI, Rodríguez JJ and Jackson AS. Restricted set classification: who is there? Pattern Recogn 2017; 63: 158–70.
Zhang M-L and Zhou Z-H. A review on multi-label learning algorithms. IEEE Trans Knowl Data Eng 2014; 26: 1819–37.
Sun YY, Zhang Y and Zhou ZH. Multi-label learning with weak label. In 24th AAAI Conference on Artificial Intelligence, Atlanta, GA, 2010; 593–8.
Li X and Guo Y. Active learning with multi-label SVM classification. In 23rd International Joint Conference on Artificial Intelligence, Beijing, China, 2013; 1479–85.
Qi GJ, Hua XS and Rui Y et al. Two-dimensional active learning for image classification. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Anchorage, AK, 2008.
Huang SJ, Chen S and Zhou ZH. Multi-label active learning: query type matters. In 24th International Joint Conference on Artificial Intelligence, Buenos Aires, Argentina, 2015; 946–52.
Zhi-Hua Zhou: Professor at the National Key Laboratory for Novel Software Technology, Nanjing University. Guest Editor of this NSR Special Topic.