Maximum Likelihood Estimation, Laplace Smoothing, and the m-Estimate in Bayesian Probability Estimation

In this report we discuss the following question: given the outcomes of n trials, how do we estimate the probability of success in the next, (n+1)-th, trial? We devote special attention to the cases in which the sample size is effectively small. In this context we discuss and compare three methods: relative frequency, Laplace's law of succession, and the m-estimate.

What exactly do we mean by an effectively small sample size? Good (1965) states that when there are more than three successes and more than three failures in the example set, there is little difference between any of the described methods for many practical purposes. So we will mostly deal with cases in which either the number of successes or the number of failures is small (e.g. 0, 1, 2). Note that such situations can also occur when the actual sample size is large, but we divide the trial set into subsets that fulfill certain conditions in order to estimate conditional probabilities in those subsets.

2 Estimates for success in the next trial

2.1 Relative frequency

Relative frequency is sometimes also called the maximum likelihood estimate. Given r successes in n trials, the probability of success in the next trial is computed by the following formula:

P = r/n

Estimating probabilities can be regarded as a relatively simple task when the sample size is large enough; in that case we would hardly require any theory. Bernoulli's theorem states that when n is large enough, the probability of success in the next trial can be reliably estimated by the relative frequency r/n. More formally, for arbitrarily small ε and δ, an n0 can be found such that for every n ≥ n0 the following inequality holds:

P(|r/n − p| < ε) > 1 − δ

However, after completing just one trial that was a failure, the relative-frequency estimate of success in the next trial would be 0.

2.2 Laplace's law of succession

To alleviate such zero probability estimates, a modified formula was proposed:

P = (r + 1)/(n + 2)

This formula assumes a uniform prior probability (Good, 1965). The rationale behind adding 1 to the numerator and 2 to the denominator is the following: before performing the actual experiment, we assume that two trials have already taken place, one a success and one a failure.

2.3 Bayesian m-estimate

A more general Bayesian probability estimate is described in (Cestnik, 1990, 1991).
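The behaviour of the three estimates on a small sample can be illustrated with a short numeric sketch. Python is assumed here purely for illustration; the report itself gives only the formulas. The m-estimate uses the formula P = (r + pa·m)/(n + m) given below, where pa is the prior probability and m a parameter of the method.

```python
def relative_frequency(r, n):
    """Maximum likelihood estimate P = r/n (undefined for n = 0)."""
    return r / n

def laplace(r, n):
    """Laplace's law of succession P = (r + 1)/(n + 2)."""
    return (r + 1) / (n + 2)

def m_estimate(r, n, pa, m):
    """Bayesian m-estimate P = (r + pa*m)/(n + m), with prior
    probability pa and parameter m weighting the prior."""
    return (r + pa * m) / (n + m)

# One trial, one failure (r = 0, n = 1): relative frequency gives the
# extreme estimate 0, while the other two pull the estimate toward the prior.
print(relative_frequency(0, 1))        # 0.0
print(laplace(0, 1))                   # (0 + 1)/(1 + 2) = 1/3
print(m_estimate(0, 1, pa=0.5, m=2))   # (0 + 0.5*2)/(1 + 2) = 1/3
```

Note that with a uniform prior pa = 0.5 and m = 2 (two "virtual" trials), the m-estimate coincides with Laplace's law of succession, which matches the rationale given above for adding 1 and 2 in Laplace's formula.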
To calculate a general Bayesian probability estimate on the basis of evidence, a prior probability distribution has to be assumed first. Then, given the evidence, the prior distribution is updated to a posterior one, from which the expectation can be taken as a point estimate of p. Determining the prior distribution has been identified as an intrinsic difficulty of the Bayesian approach. The m-estimate is computed as:

P = (r + pa·m)/(n + m)

where pa is the prior probability of success and m is a parameter of the method.

3 Estimation of conditional probabilities

The basic idea behind the proposed m-estimate (Cestnik, 1990, 1991) for estimating conditional probabilities is that the prior probabilities can be estimated from an unconditional sample. The remaining parameter m, which is related to the variance of the prior distribution, also has to be determined:

Var(p) = pa(1 − pa)/(m + 1)

The parameter m is thus inversely related to the variance of the prior distribution. It also governs the tradeoff between the relative frequency and the prior probability, as can be seen from the following form of the m-estimate:

P = n/(n + m) · r/n + m/(n + m) · pa

4 Literature

B. Cestnik: Estimating probabilities in machine learning. Ph.D. thesis, University of Ljubljana, Faculty of Computer and Information Science, 1991.

B. Cestnik: Estimating probabilities: A crucial task in machine learning. In: L. Carlucci Aiello (ed.), ECAI 90. London: Pitman, 1990, pp. 147-149.

B. Cestnik, I. Bratko: On estimating probabilities in tree pruning. In: Y. Kodratoff (ed.), Machine Learning - EWSL-91: European Working Session on Learning, Porto, Portugal, March 6-8, 1991, Proceedings (Lecture Notes in Computer Science / Lecture Notes in Artificial Intelligence, 482). Berlin: Springer-Verlag, 1991, pp. 138-150.

S. Džeroski, B. Cestnik, I. Petrovski: Using the m-estimate in rule induction. CIT. J. Comput. Inf. Technol., 1993, vol. 1, pp. 37-46.

I. J. Good: The Estimation of Probabilities: An Essay on Modern Bayesian Methods. Cambridge, Mass.: M.I.T. Press, 1965.
