Perceptron: definition of the term, features, application


In machine learning, the perceptron is a supervised learning algorithm for binary classifiers. A binary classifier is a function that can decide whether an input, represented by a vector of numbers, belongs to some particular class. The perceptron is a type of linear classifier, that is, a classification algorithm that makes its predictions based on a linear predictor function combining a set of weights with a feature vector.
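To make the definition concrete, here is a minimal sketch of that decision rule in Python; the names `w` (weights), `b` (bias), and `x` (feature vector) are illustrative, not from the original:

```python
import numpy as np

def predict(x, w, b):
    """Perceptron decision rule: output 1 if the linear predictor
    w.x + b is positive, otherwise 0."""
    return 1 if np.dot(w, x) + b > 0 else 0
```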

Perceptron formulas

In recent years, artificial neural networks have gained attention due to advances in deep learning. But what is an artificial neural network and what does it consist of?

Meet the Perceptron

In this article, we'll take a quick look at artificial neural networks in general, then look at a single neuron, and finally (this is the coding part) we'll take the most basic version of an artificial neuron, the perceptron, and use it to classify points in a plane.

Have you ever wondered why there are tasks that are so easy for any person, but incredibly difficult for computers? Artificial Neural Networks (ANN for short) were inspired by the human central nervous system. Like their biological counterpart, ANNs are built on simple signal processing elements that are combined into a large grid.

Neural networks must learn

Unlike traditional algorithms, neural networks cannot be "programmed" or "tuned" to work as intended. Just like the human brain, they must learn to complete the task. Roughly speaking, there are three learning strategies.

Supervised learning

The easiest way. It can be used if there is a (large enough) set of examples with known results. Training then goes like this: process one set of data, compare the output with the known result, adjust the network, and try again. This is the learning strategy we will be using here.
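That loop translates almost directly into code. Below is a minimal sketch of one training pass, assuming the classic perceptron update rule (nudge the weights toward every misclassified example); the learning rate `lr` is an illustrative choice:

```python
import numpy as np

def train_epoch(X, y, w, b, lr=0.1):
    """One pass over the data: predict, compare, adjust on every error."""
    for x_i, y_i in zip(X, y):
        pred = 1 if np.dot(w, x_i) + b > 0 else 0
        error = y_i - pred           # 0 if correct, +1/-1 if wrong
        w = w + lr * error * x_i     # move the boundary toward the example
        b = b + lr * error
    return w, b
```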

Unsupervised learning

Useful if there is no test data available and if it is possible to derive some cost function from the desired behavior. The cost function tells the neural network how far it is from the target. The network can then adjust its parameters on the fly, working with real data.

Reinforcement learning

The "carrot and stick" method. Can be used if the neural network generates a continuous action. Over time, the network learns to prefer the right actions and avoid the wrong ones.

Okay, now we know a little about the nature of artificial neural networks, but what exactly are they made of? What will we see if we open the lid and look inside?

Neurons are the building blocks of neural networks. The main component of any artificial neural network is an artificial neuron. Not only are they named after their biological counterparts, but they are also modeled after the behavior of neurons in our brains.

Biology vs technology

Just as a biological neuron has dendrites to receive signals, a cell body to process them, and an axon to send signals to other neurons, an artificial neuron has multiple input channels, a processing stage, and one output that can branch out to many other artificial neurons.

Can we do something useful with a single perceptron? There is a class of problems that a single perceptron can solve. Consider the input vector as the coordinates of a point. For a vector with n elements, this point lives in n-dimensional space. To simplify life (and the code below), let's assume it's 2D. Like a piece of paper.

Next, imagine that we draw some random points on this plane and split them into two sets by drawing a straight line across the paper. This line divides the points into two sets, one above and one below the line. The two sets are then called linearly separable.

One perceptron, no matter how simple it may seem, is able to learn where this line is, and once it has finished training, it can determine whether a given point is above or below it.
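Putting the pieces together, here is a sketch of that experiment: generate random 2D points, label them by a hidden line, and train until the perceptron has found a separating line of its own. It reuses the hypothetical `train_epoch` helper from the earlier sketch, and the particular line and epoch count are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(100, 2))             # random points on the "paper"
y = (X[:, 1] > 0.5 * X[:, 0] + 0.1).astype(int)   # labels from a hidden line

w, b = np.zeros(2), 0.0
for epoch in range(500):
    w, b = train_epoch(X, y, w, b)
    if np.all((X @ w + b > 0).astype(int) == y):  # every point classified
        break
print(f"separating line found after {epoch + 1} epochs:", w, b)
```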

History of the invention

The algorithm for this method was invented in 1957 at the Cornell Aeronautical Laboratory by Frank Rosenblatt (it is often named after him), funded by the US Office of Naval Research. The perceptron was intended to be a machine rather than a program, and although its first implementation was in software for the IBM 704, it was subsequently implemented in custom-built hardware as the "Mark I Perceptron". This machine was designed for image recognition: it had an array of 400 photocells randomly connected to the neurons. The weights were encoded in potentiometers, and the weight updates during training were performed by electric motors.

At a press conference hosted by the US Navy in 1958, Rosenblatt made statements about the perceptron that caused heated debate in the young AI community; based on Rosenblatt's claims, the New York Times reported that the perceptron was "the embryonic electronic computer that the Navy expects to be able to walk, talk, see, write, reproduce itself, and be aware of its existence."

Perceptron Segments

Further developments

Although the perceptron initially seemed promising, it was quickly proven that perceptrons could not be trained to recognize many classes of patterns. This led to stagnation in the field of perceptron neural network research for many years, before it was recognized that a feed-forward neural network with two or more layers (also called a multilayer perceptron) had far more processing power than a single-layer perceptron. A single-layer perceptron is only capable of learning linearly separable structures. In 1969, the famous book "Perceptrons" by Marvin Minsky and Seymour Papert showed that these classes of networks could not learn the XOR function. This does not apply, however, to non-linear classification functions that can be used in a single-layer perceptron.
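The XOR case is easy to verify in code: no single line separates its four points, so the training loop never reaches zero errors. A small sketch, reusing the hypothetical `train_epoch` helper from earlier:

```python
import numpy as np

X_xor = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y_xor = np.array([0, 1, 1, 0])         # the XOR truth table

w, b = np.zeros(2), 0.0
for _ in range(1000):
    w, b = train_epoch(X_xor, y_xor, w, b)

# However long we train, at least one point stays misclassified.
print((X_xor @ w + b > 0).astype(int), "vs", y_xor)
```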

Rosenblatt's perceptron

The use of such functions extends the capabilities of the perceptron, including the implementation of the XOR function. It is often assumed (incorrectly) that Minsky and Papert also conjectured that a similar result would hold for a multilayer perceptron network. However, this is not the case, since both Minsky and Papert already knew that multilayer perceptrons were capable of producing an XOR function. Three years later, Stephen Grossberg published a series of papers presenting networks capable of modeling differential functions, contrast enhancement functions, and XOR functions.

These works were published in 1972 and 1973. Nevertheless, the often-miscited Minsky/Papert text caused a significant decline in interest in, and funding for, perceptron neural network research. Another ten years passed before neural network research was revived in the 1980s.

Features

The kernel perceptron algorithm was introduced as early as 1964 by Aizerman et al. Mohri and Rostamizadeh (2013) extend previous results and give new L1 bounds.
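For illustration, here is a minimal sketch of a kernel perceptron in its dual form, which tracks one mistake counter per training example; the Gaussian (RBF) kernel and the `gamma` value are illustrative choices, not prescribed by the original:

```python
import numpy as np

def rbf(a, b, gamma=1.0):
    """Gaussian (RBF) kernel -- one common choice of kernel."""
    return np.exp(-gamma * np.sum((a - b) ** 2))

def kernel_perceptron(X, y, epochs=10, kernel=rbf):
    """Dual form: alpha[i] counts the mistakes made on example i."""
    n = len(X)
    alpha = np.zeros(n)
    y_pm = 2 * np.asarray(y) - 1                  # map {0,1} labels to {-1,+1}
    K = np.array([[kernel(a, b) for b in X] for a in X])
    for _ in range(epochs):
        for i in range(n):
            score = np.sum(alpha * y_pm * K[:, i])
            if y_pm[i] * score <= 0:              # mistake: bump this counter
                alpha[i] += 1
    return alpha

def kernel_predict(X_train, y, alpha, x, kernel=rbf):
    """Classify a new point by a kernel-weighted sum over training examples."""
    y_pm = 2 * np.asarray(y) - 1
    score = sum(a * t * kernel(xi, x) for a, t, xi in zip(alpha, y_pm, X_train))
    return 1 if score > 0 else 0
```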

The perceptron is a simplified model of a biological neuron. While the complexity of biological neuron models is often required to fully understand neural behavior, research shows that a perceptron-like linear model can reproduce some of the behavior seen in real neurons.

The perceptron is a linear classifier, so it will never reach a state in which all input vectors are classified correctly if the training set D is not linearly separable, i.e. if the positive examples cannot be separated from the negative examples by a hyperplane. In this case, the standard learning algorithm does not gradually approach an "approximate" solution; instead, learning fails completely. Therefore, if linear separability of the training set is not known a priori, one of the training variants below should be used.

Perceptron Relationships

Pocket Algorithm

The pocket algorithm with ratchet solves the robustness problem of perceptron learning by keeping the best solution found so far "in its pocket". The pocket algorithm then returns the solution in the pocket rather than the last solution. It can also be used for non-separable data sets, where the goal is to find a perceptron with a small number of misclassifications. However, these solutions appear purely stochastically, so the pocket algorithm neither approaches them gradually over the course of training nor is guaranteed to find them within a given number of training steps.
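A minimal sketch of that idea, again assuming the hypothetical `train_epoch` helper from earlier: keep running ordinary perceptron updates, but only ever return the weights that made the fewest mistakes.

```python
import numpy as np

def pocket(X, y, epochs=100, lr=0.1):
    """Pocket algorithm: run perceptron updates, but keep the best
    weights seen so far 'in the pocket' and return those instead."""
    w, b = np.zeros(X.shape[1]), 0.0
    best_w, best_b = w.copy(), b
    best_errors = len(X)
    for _ in range(epochs):
        w, b = train_epoch(X, y, w, b, lr)
        errors = int(np.sum((X @ w + b > 0).astype(int) != y))
        if errors < best_errors:          # better than the pocketed weights?
            best_w, best_b, best_errors = w.copy(), b, errors
    return best_w, best_b
```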

Maxover Algorithm

The Maxover algorithm is "robust" in the sense that it converges regardless of (prior) knowledge of the linear separability of the data set. In the linearly separable case, it solves the training problem, optionally even with optimal stability (maximum margin between the classes). For non-separable data sets, it returns a solution with a small number of misclassifications. In all cases, the algorithm gradually approaches the solution during learning, without memorizing previous states and without stochastic jumps. Convergence is to global optimality for separable data sets and to local optimality for non-separable data sets.

Perceptron equation

Voted Perceptron

The voted perceptron algorithm is a variant that uses multiple weighted perceptrons. The algorithm starts a new perceptron each time an example is misclassified, initializing the weight vector with the final weights of the previous perceptron. Each perceptron is also given a weight corresponding to how many examples it classified correctly before misclassifying one, and at the end the output is a weighted vote over all these perceptrons.
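A minimal sketch of that scheme: every weight vector is archived together with its survival count, and prediction is the count-weighted vote. All names here are illustrative.

```python
import numpy as np

def voted_perceptron(X, y, epochs=10):
    """Archive every intermediate weight vector together with its
    survival count c (how many examples it got right in a row)."""
    y_pm = 2 * np.asarray(y) - 1                 # labels in {-1, +1}
    w, b, c = np.zeros(X.shape[1]), 0.0, 0
    history = []                                 # list of (w, b, c) triples
    for _ in range(epochs):
        for x_i, t in zip(X, y_pm):
            if t * (np.dot(w, x_i) + b) <= 0:    # mistake: archive, restart count
                history.append((w.copy(), b, c))
                w, b, c = w + t * x_i, b + t, 0
            else:
                c += 1                           # survived one more example
    history.append((w.copy(), b, c))
    return history

def vote(history, x):
    """Prediction: survival-count-weighted vote of all stored perceptrons."""
    s = sum(c * np.sign(np.dot(w, x) + b) for w, b, c in history)
    return 1 if s > 0 else 0
```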

Application

In separable problems, perceptron training can also be aimed at finding the largest separation margin between the classes. The so-called perceptron of optimal stability can be determined by iterative training and optimization schemes such as the Min-Over or AdaTron algorithm. AdaTron exploits the fact that the corresponding quadratic optimization problem is convex. The perceptron of optimal stability, together with the kernel trick, is the conceptual basis of the support vector machine.

Multilayer perceptron

Alternative

Another way to solve non-linear problems without using multiple layers is to use higher-order networks (sigma-pi units). In this type of network, each element of the input vector is extended with every pairwise combination of multiplied inputs (second order). This can be extended to an n-order network. The perceptron is a very flexible thing.
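A minimal sketch of that second-order expansion; reusing the hypothetical `train_epoch` helper and the XOR data from earlier, the XOR problem becomes linearly separable in the expanded space, because the product x1*x2 is exactly the feature a single line needs:

```python
import numpy as np
from itertools import combinations_with_replacement

def second_order(x):
    """Extend an input vector with all pairwise products of its entries."""
    idx = combinations_with_replacement(range(len(x)), 2)
    return np.concatenate([x, [x[i] * x[j] for i, j in idx]])

X_xor = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y_xor = np.array([0, 1, 1, 0])
X_ext = np.array([second_order(x) for x in X_xor])   # adds x1^2, x1*x2, x2^2

w, b = np.zeros(X_ext.shape[1]), 0.0
for _ in range(100):
    w, b = train_epoch(X_ext, y_xor, w, b)
print((X_ext @ w + b > 0).astype(int))               # now matches y_xor
```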

However, remember that the best classifier is not necessarily the one that classifies all the training data perfectly. Indeed, if we had the prior constraint that the data come from Gaussian distributions with equal variance, the linear separation in the input space is optimal and a non-linear solution is overfitted.

Other linear classification algorithms include Winnow, the support vector machine, and logistic regression. The perceptron is thus one member of a broad family of algorithms.


Main field of application: supervised learning

Supervised learning is the machine learning task of learning a function that maps an input to an output based on example input-output pairs. It infers a function from labeled training data consisting of a set of examples. In supervised learning, each example is a pair consisting of an input object (usually a vector) and a desired output value (also called the supervisory signal).

The supervised learning algorithm analyzes the training data and produces an inferred function that can be used to map new examples. An optimal scenario would allow the algorithm to correctly determine the class labels for unseen instances. This requires the learning algorithm to generalize from the training data to unseen situations in a "reasonable" way.

The parallel task in human and animal psychology is often called concept learning.
