Facial recognition

Predictive learning problems

From my previous post How to Teach a Computer to Distinguish Cats from Dogs

Predictive learning problems constitute the majority of tasks machine learning can
be used to solve today, and they apply to a wide array of situations and data types. In
this section we introduce the two major predictive learning problems: regression and
classification.


Suppose we wanted to predict the share price of a company that is about to go public (that is, when a company first starts offering its shares of stock to the public). Following the pipeline discussed in Section 1.1.1, we first gather a training set of data consisting of a number of corporations (preferably active in the same domain) with known share prices. Next, we need to design feature(s) that are thought to be relevant to the task at hand. The company’s revenue is one such potential feature, as we can expect that the higher the revenue the more expensive a share of stock should be. Now in order to connect the share price to the revenue, we train a linear model or regression line using our training data.

Figure 1.7 (top left panel) A toy training dataset of ten corporations with their associated share price and revenue values. (top right panel) A linear model is fit to the data. This trend line models the overall trajectory of the points and can be used for prediction in the future as shown in the bottom left and bottom right panels.

The top panels of Fig. 1.7 show a toy dataset comprising share price versus revenue
information for ten companies, as well as a linear model fit to this data. Once the model
is trained, the share price of a new company can be predicted based on its revenue, as
depicted in the bottom panels of this figure. Finally, by comparing the predicted price to the
actual price for a testing set of data, we can test the performance of our regression model
and apply changes as needed (e.g., choosing a different feature). This sort of task, fitting
a model to a set of training data so that predictions about a continuous-valued variable
(e.g., share price) can be made, is referred to as regression. We now discuss some further
examples of regression.
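
The regression pipeline described above (gather training data, fit a line, predict, then test) can be sketched in a few lines of Python. This is a minimal sketch, assuming NumPy is available. The revenue and share price numbers are made up for illustration and are not the data plotted in Fig. 1.7, and the helper names fit_line and predict are hypothetical.

```python
import numpy as np

# Hypothetical toy training set for ten made-up companies:
# revenue is the feature, share price is the continuous output.
revenue = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0])
share_price = np.array([12.1, 13.9, 16.2, 17.8, 20.1,
                        22.0, 23.9, 26.1, 27.8, 30.1])


def fit_line(x, y):
    """Least-squares fit of y ~ slope * x + intercept."""
    # Each row of A is [x_i, 1], so A @ [slope, intercept] approximates y.
    A = np.column_stack([x, np.ones_like(x)])
    (slope, intercept), *_ = np.linalg.lstsq(A, y, rcond=None)
    return slope, intercept


def predict(x, slope, intercept):
    """Evaluate the trained line at new feature value(s) x."""
    return slope * x + intercept


# Training: fit the regression line to the training set.
slope, intercept = fit_line(revenue, share_price)

# Prediction: estimate the share price of a new company from its revenue.
new_price = predict(11.0, slope, intercept)

# Testing: mean squared error on a small held-out set (also made up).
test_revenue = np.array([2.5, 7.5])
test_price = np.array([15.0, 25.0])
mse = np.mean((predict(test_revenue, slope, intercept) - test_price) ** 2)

print(f"slope={slope:.3f}, intercept={intercept:.3f}")
print(f"predicted price at revenue 11: {new_price:.2f}, test MSE: {mse:.3f}")
```

If the test error is too large, the text's suggestion applies: revisit the feature (here, revenue) and retrain.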


Example 1.1 The rise of student loan debt in the United States

Figure 1.8 shows the total student loan debt, that is, money borrowed by students to pay for college tuition, room and board, etc., held by citizens of the United States from 2006 to 2014, measured quarterly. Over the eight-year period reflected in this plot, total student debt has tripled, totaling over one trillion dollars by the end of 2014. The regression line (in magenta) fit to this dataset represents the data quite well and, with its sharp positive slope, emphasizes the point that student debt is rising dangerously fast. Moreover, if this trend continues, we can use the regression line to predict that total student debt will reach two trillion dollars by the year 2026.

Figure 1.8  Total student loan debt in the United States measured quarterly from 2006 to 2014. The rapid increase of the debt, measured by the slope of the trend line fit to the data, confirms the concerning claim that student debt is growing (dangerously) fast. The debt data shown in this figure was taken from [46].
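
Using a trend line to forecast, as in this example, amounts to solving slope * t + intercept = target for t. The sketch below illustrates that step under stated assumptions: the quarterly series is synthetic (it is not the debt data from [46]), NumPy is assumed to be available, and the function name year_reaching is hypothetical.

```python
import numpy as np

# Synthetic quarterly series (NOT the data from [46]): 36 quarters starting
# in 2006, in trillions of dollars, with a small deterministic wiggle.
quarters = np.arange(36)
years_since_2006 = quarters / 4.0
debt = 0.45 + 0.0775 * years_since_2006 + 0.01 * np.sin(quarters)

# Fit the trend line debt ~ slope * t + intercept (t in years since 2006).
# np.polyfit returns coefficients from the highest degree down.
slope, intercept = np.polyfit(years_since_2006, debt, 1)


def year_reaching(target, slope, intercept, origin=2006.0):
    """Calendar year at which the trend line reaches `target`."""
    if slope <= 0:
        raise ValueError("a non-increasing trend line never rises to the target")
    return origin + (target - intercept) / slope


crossing = year_reaching(2.0, slope, intercept)
print(f"slope: {slope:.4f} trillion/year; 2.0 trillion reached near {crossing:.1f}")
```

Such an extrapolation is only as good as the assumption that the linear trend continues beyond the range of the training data.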

Example 1.2 Associating genes with quantitative traits

Genome-wide association (GWA) studies (Fig. 1.9) aim at understanding the connections between tens of thousands of genetic markers, taken from across the human genome of numerous subjects, and diseases like high blood pressure/cholesterol, heart disease, diabetes, various forms of cancer, and many others [26, 76, 80]. These studies are undertaken with the hope of one day producing gene-targeted therapies, like those used to treat diseases caused by a single gene (e.g., cystic fibrosis), that can help individuals with these multifactorial diseases. Regression is a commonly employed tool in GWA studies, where it is used to understand complex relationships between genetic markers (features) and quantitative traits like the level of cholesterol or glucose (a continuous output variable).

Figure 1.9 – Conceptual illustration of a GWA study employing regression, wherein a quantitative trait is to be associated with specific genomic locations.


The machine learning task of classification is similar in principle to that of regression. The key difference between the two is that instead of predicting a continuous-valued output (e.g., share price, blood pressure, etc.), classification predicts an output that takes on discrete values, or classes. Classification problems arise in a host of forms. For example, object recognition, where different objects from a set of images are distinguished from one another (e.g., handwritten digits for the automatic sorting of mail or street signs for semi-autonomous and self-driving cars), is a very popular classification problem. The toy problem of distinguishing cats from dogs discussed in How to Teach a Computer to Distinguish Cats from Dogs was such a problem. Other common classification problems include speech recognition (recognizing different spoken words for voice recognition systems), determining the general sentiment of a social network like Twitter towards a particular product or service, and determining what kind of hand gesture someone is making from a finite set of possibilities (for use in, e.g., controlling a computer without a mouse).

Geometrically speaking, a common way of viewing the task of classification is one of finding a separating line (or hyperplane in higher dimensions) that separates the two classes of data from a training set as best as possible. This is precisely the perspective on classification we took in describing the toy example in Section 1.1, where we used a line to separate (features extracted from) images of cats and dogs. New data from a testing set is then automatically classified by simply determining which side of the line/hyperplane the data lies on. Figure 1.10 illustrates the concept of a linear model or classifier used for performing classification on a 2-dimensional toy dataset.

Figure 1.10 – (top left panel) A toy 2-dimensional training set consisting of two distinct classes, red and blue. (top right panel) A linear model is trained to separate the two classes. (bottom left panel) A test point whose class is unknown. (bottom right panel) The test point is classified as blue since it lies on the blue side of the trained linear classifier.
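
The geometric view above, where a point is classified by the side of the line it falls on, can be sketched with a perceptron. This is one simple way to find a separating line, chosen here for illustration; the text does not say which training method it uses. The toy points and the names train_perceptron and classify are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np

# Hypothetical 2-D toy training set: label -1 is "red", label +1 is "blue".
X = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.0], [2.0, 2.5],
              [4.0, 4.5], [4.5, 3.5], [5.0, 5.0], [5.5, 4.0]])
y = np.array([-1, -1, -1, -1, 1, 1, 1, 1])


def train_perceptron(X, y, epochs=1000):
    """Find (w, b) so that sign(w . x + b) matches the labels.

    Stops early once a full pass makes no mistakes, which is guaranteed to
    happen eventually only if the two classes are linearly separable.
    """
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            if yi * (xi @ w + b) <= 0:   # point is on the wrong side (or on the line)
                w += yi * xi             # nudge the line toward classifying it correctly
                b += yi
                mistakes += 1
        if mistakes == 0:
            break
    return w, b


def classify(x, w, b):
    """Return +1 (blue) or -1 (red) depending on the side of the line."""
    return 1 if x @ w + b > 0 else -1


w, b = train_perceptron(X, y)

# A test point whose class is unknown is labeled by the side it lies on.
test_point = np.array([5.0, 4.5])
label = classify(test_point, w, b)
print("blue" if label == 1 else "red")
```

The line found this way separates the training set but is not necessarily the best separator; it is only meant to make the "which side of the line" rule concrete.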


Example 1.3 Object detection

Object detection, a common classification problem, is the task of automatically identifying a specific object in a set of images or videos. Popular object detection applications include the detection of faces in images for organizational purposes and camera focusing, pedestrians for autonomous driving vehicles, and faulty components for automated quality control in electronics production. The same kind of machine learning framework, which we highlight here for the case of face detection, can be utilized for solving many such detection problems.
After training a linear classifier on a set of training data consisting of facial and nonfacial images, faces are sought in a new test image by sliding a (typically) square window over the entire image. At each location of the sliding window, the image content inside is tested to see which side of the classifier it lies on (as illustrated in Fig. 1.11). If the (feature representation of the) content lies on the “face side” of the classifier, the content is classified as a face.
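
The sliding-window procedure just described can be sketched as follows. This is a minimal sketch under several assumptions: the image is a 2-D grayscale NumPy array, the feature representation is simply the flattened raw pixels of the window, and the weights w and bias b stand in for an already trained linear classifier. The function name sliding_window_detect and the toy "bright patch" classifier are hypothetical stand-ins, not a real face detector.

```python
import numpy as np


def sliding_window_detect(image, w, b, window=8, stride=4):
    """Return the (row, col) top-left corners of windows classified as hits.

    A window is a hit when its feature vector lies on the positive side of
    the linear classifier, that is, when features . w + b > 0.
    """
    hits = []
    rows, cols = image.shape
    for r in range(0, rows - window + 1, stride):
        for c in range(0, cols - window + 1, stride):
            patch = image[r:r + window, c:c + window]
            features = patch.ravel()          # assumption: raw pixels as features
            if features @ w + b > 0:
                hits.append((r, c))
    return hits


# Toy stand-in for a trained classifier: fires when the mean intensity of
# the window exceeds 0.75 (hypothetical weights, for illustration only).
window = 8
w = np.ones(window * window) / (window * window)
b = -0.75

# Toy test image: dark background with one bright 8x8 block.
image = np.zeros((16, 16))
image[4:12, 8:16] = 1.0

hits = sliding_window_detect(image, w, b, window=window, stride=4)
print(hits)
```

In practice the stride and window size trade off detection accuracy against the number of windows that must be classified.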

Figure 1.11 – To determine if any faces are present in a test image (in this instance an image of the Wright brothers, inventors of the airplane, sitting together in one of their first motorized flying machines in 1908), a small window is scanned across its entirety. The content inside the box at each location is determined to be a face or not by checking which side of the learned classifier the feature representation of the content lies on. In the figurative illustration shown here, the areas above and below the learned classifier (shown in black on the right) are the “face” and “non-face” sides of the classifier, respectively.


Next up will be Feature designs.

