6.4 MAXIMUM LIKELIHOOD AND LEAST-SQUARED ERROR HYPOTHESES
As illustrated in the above section, Bayesian analysis can sometimes be used to show that a particular learning algorithm outputs MAP hypotheses even though it may not explicitly use Bayes rule or calculate probabilities in any form.

In this section we consider the problem of learning a continuous-valued target function, a problem faced by many learning approaches such as neural network learning, linear regression, and polynomial curve fitting. A straightforward Bayesian analysis will show that under certain assumptions any learning algorithm that minimizes the squared error between the output hypothesis predictions and the training data will output a maximum likelihood hypothesis. The significance of this result is that it provides a Bayesian justification (under certain assumptions) for many neural network and other curve fitting methods that attempt to minimize the sum of squared errors over the training data.

Consider the following problem setting. Learner L considers an instance space X and a hypothesis space H consisting of some class of real-valued functions defined over X (i.e., each h in H is a function of the form $h : X \to \Re$, where $\Re$ represents the set of real numbers). The problem faced by L is to learn an unknown target function $f : X \to \Re$ drawn from H. A set of m training examples is provided, where the target value of each example is corrupted by random noise drawn according to a Normal probability distribution. More precisely, each training example is a pair of the form $(x_i, d_i)$ where $d_i = f(x_i) + e_i$. Here $f(x_i)$ is the noise-free value of the target function and $e_i$ is a random variable representing the noise. It is assumed that the values of the $e_i$ are drawn independently and that they are distributed according to a Normal distribution with zero mean. The task of the learner is to output a maximum likelihood hypothesis, or, equivalently, a MAP hypothesis assuming all hypotheses are equally probable a priori.

A simple example of such a problem is learning a linear function, though our analysis applies to learning arbitrary real-valued functions. Figure 6.2 illustrates a linear target function f depicted by the solid line, and a set of noisy training examples of this target function. The dashed line corresponds to the hypothesis $h_{ML}$ with least-squared training error, hence the maximum likelihood hypothesis. Notice that the maximum likelihood hypothesis is not necessarily identical to the correct hypothesis, f, because it is inferred from only a limited sample of noisy training data.

FIGURE 6.2 Learning a real-valued function. The target function f corresponds to the solid line. The training examples $(x_i, d_i)$ are assumed to have Normally distributed noise $e_i$ with zero mean added to the true target value $f(x_i)$. The dashed line corresponds to the linear function that minimizes the sum of squared errors. Therefore, it is the maximum likelihood hypothesis $h_{ML}$, given these five training examples.

Before showing why a hypothesis that minimizes the sum of squared errors in this setting is also a maximum likelihood hypothesis, let us quickly review two basic concepts from probability theory: probability densities and Normal distributions.

First, in order to discuss probabilities over continuous variables such as e, we must introduce probability densities. The reason, roughly, is that we wish for the total probability over all possible values of the random variable to sum to one. In the case of continuous variables we cannot achieve this by assigning a finite probability to each of the infinite set of possible values for the random variable. Instead, we speak of a probability density for continuous variables such as e and require that the integral of this probability density over all possible values be one. In general we will use lower case p to refer to the probability density function, to distinguish it from a finite probability P (which we will sometimes refer to as a probability mass). The probability density $p(x_0)$ is the limit, as $\epsilon$ goes to zero, of $1/\epsilon$ times the probability that x will take on a value in the interval $[x_0, x_0 + \epsilon)$.
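The limit definition of a probability density can be checked numerically. The sketch below is an illustration only: it assumes a Normal variable with mean 0 and standard deviation 1, and the function names (`normal_cdf`, `normal_pdf`, `density_estimate`) and the evaluation point `x0 = 0.5` are hypothetical choices, not taken from the text. It computes the interval probability divided by the interval width for shrinking widths and prints it next to the closed-form density.

```python
import math


def normal_cdf(x, mu=0.0, sigma=1.0):
    """P(X <= x) for a Normal(mu, sigma) variable, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Closed-form Normal density, used here only as a reference value."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def density_estimate(x0, eps, mu=0.0, sigma=1.0):
    """(1/eps) * P(x0 <= X < x0 + eps), the quantity whose limit defines p(x0)."""
    interval_prob = normal_cdf(x0 + eps, mu, sigma) - normal_cdf(x0, mu, sigma)
    return interval_prob / eps


x0 = 0.5  # hypothetical evaluation point
for eps in (1.0, 0.1, 0.01, 0.001):
    print(f"eps={eps:<6} estimate={density_estimate(x0, eps):.6f}")
print(f"closed-form density at x0: {normal_pdf(x0):.6f}")
```

As the interval width shrinks, the estimate should approach the closed-form value, while the interval probability itself goes to zero. This is why a density, rather than a finite probability, is the useful quantity for a continuous variable.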
Probability density function:

$$p(x_0) \equiv \lim_{\epsilon \to 0} \frac{1}{\epsilon} P(x_0 \le x < x_0 + \epsilon)$$

Second, we stated that the random noise variable e is generated by a Normal probability distribution. A Normal distribution is a smooth, bell-shaped distribution that can be completely characterized by its mean $\mu$ and its standard deviation $\sigma$. See Table 5.4 for a precise definition.

Given this background we now return to the main issue: showing that the least-squared error hypothesis is, in fact, the maximum likelihood hypothesis within our problem setting. We will show this by deriving the maximum likelihood hypothesis starting with our earlier definition Equation (6.3), but using lower case p to refer to the probability density:

$$h_{ML} = \operatorname*{argmax}_{h \in H} \; p(D \mid h)$$

As before, we assume a fixed set of training instances $(x_1 \ldots x_m)$ and therefore consider the data D to be the corresponding sequence of target values $D = (d_1 \ldots d_m)$. Here $d_i = f(x_i) + e_i$. Assuming the training examples are mutually independent given h, we can write $p(D \mid h)$ as the product of the various $p(d_i \mid h)$:

$$h_{ML} = \operatorname*{argmax}_{h \in H} \; \prod_{i=1}^{m} p(d_i \mid h)$$
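The independence assumption is what lets the likelihood of the whole data sequence be computed one example at a time. The sketch below is a minimal illustration under the problem setting stated earlier (zero-mean Normal noise, so each target value is Normally distributed around the hypothesis prediction). The training data, the noise standard deviation `sigma = 0.5`, and the two candidate hypotheses `h_good` and `h_poor` are all hypothetical values chosen for the example.

```python
import math


def normal_pdf(d, mu, sigma):
    """Normal density with mean mu and standard deviation sigma, evaluated at d."""
    z = (d - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def likelihood(h, xs, ds, sigma):
    """p(D|h) as the product of p(d_i|h), each a Normal density centered on h(x_i)."""
    return math.prod(normal_pdf(d, h(x), sigma) for x, d in zip(xs, ds))


# Hypothetical training examples (x_i, d_i) and an assumed noise level.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ds = [0.1, 0.9, 2.2, 2.8, 4.1]
sigma = 0.5


def h_good(x):
    """Candidate hypothesis close to the data."""
    return x


def h_poor(x):
    """Candidate hypothesis that fits the data worse."""
    return 0.5 * x + 1.0


print("p(D|h_good) =", likelihood(h_good, xs, ds, sigma))
print("p(D|h_poor) =", likelihood(h_poor, xs, ds, sigma))
```

The hypothesis whose predictions lie closer to the observed target values should receive the larger product, which is the sense in which it is the more likely explanation of the data.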
Given that the noise $e_i$ obeys a Normal distribution with zero mean and unknown variance $\sigma^2$, each $d_i$ must also obey a Normal distribution with variance $\sigma^2$ centered around the true target value $f(x_i)$ rather than zero. Therefore $p(d_i \mid h)$ can be written as a Normal distribution with variance $\sigma^2$ and mean $\mu = f(x_i)$. Let us write the formula for this Normal distribution to describe $p(d_i \mid h)$, beginning with the general formula for a Normal distribution from Table 5.4 and substituting the appropriate $\mu$ and $\sigma^2$. Because we are writing the expression for the probability of $d_i$ given that h is the correct description of the target function f, we will also substitute $\mu = f(x_i) = h(x_i)$, yielding

$$h_{ML} = \operatorname*{argmax}_{h \in H} \; \prod_{i=1}^{m} \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{1}{2\sigma^2}(d_i - \mu)^2} = \operatorname*{argmax}_{h \in H} \; \prod_{i=1}^{m} \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{1}{2\sigma^2}(d_i - h(x_i))^2}$$

We now apply a transformation that is common in maximum likelihood calculations: rather than maximizing the above complicated expression, we shall choose to maximize its (less complicated) logarithm. This is justified because $\ln p$ is a monotonic function of p. Therefore maximizing $\ln p$ also maximizes p.
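The monotonicity argument can be checked numerically on a small case. The sketch below assumes a hypothetical one-parameter family of hypotheses $h_w(x) = w x$, hypothetical training data, and an assumed noise standard deviation `sigma = 0.5`. None of these values come from the text. It searches a grid of candidate slopes three ways: maximizing the likelihood, maximizing its logarithm, and minimizing the sum of squared errors, which this section states selects the same hypothesis.

```python
import math

# Hypothetical training examples (x_i, d_i) and an assumed noise level.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ds = [0.1, 0.9, 2.2, 2.8, 4.1]
sigma = 0.5


def sse(w):
    """Sum of squared errors of the hypothesis h_w(x) = w * x."""
    return sum((d - w * x) ** 2 for x, d in zip(xs, ds))


def likelihood(w):
    """Product of Normal densities p(d_i | h_w), each centered on w * x_i."""
    norm = 1.0 / math.sqrt(2.0 * math.pi * sigma ** 2)
    return math.prod(
        norm * math.exp(-((d - w * x) ** 2) / (2.0 * sigma ** 2))
        for x, d in zip(xs, ds)
    )


def log_likelihood(w):
    """Logarithm of the likelihood, written as a sum instead of a product."""
    log_norm = -0.5 * math.log(2.0 * math.pi * sigma ** 2)
    return sum(
        log_norm - ((d - w * x) ** 2) / (2.0 * sigma ** 2)
        for x, d in zip(xs, ds)
    )


# Candidate slopes 0.50, 0.51, ..., 1.50 (a coarse grid, for illustration only).
candidates = [i / 100 for i in range(50, 151)]

best_by_likelihood = max(candidates, key=likelihood)
best_by_log_likelihood = max(candidates, key=log_likelihood)
best_by_sse = min(candidates, key=sse)

print("argmax likelihood:     ", best_by_likelihood)
print("argmax log-likelihood: ", best_by_log_likelihood)
print("argmin squared error:  ", best_by_sse)
```

Working with the sum of logarithms also avoids multiplying many small density values together, which is a practical reason the transformation is common, in addition to the algebraic simplification.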