An intuitive overview of a perceptron with a Python implementation (PART 1: Model)
In this article we will go over Rosenblatt’s perceptron, hopefully providing you with an intuitive understanding of how a single perceptron operates for the purpose of classification. The perceptron is considered one of the biggest breakthroughs in the history of artificial intelligence. Personally, this was the first machine learning concept I learned; it propelled my interest in this field and gave me the foundation to more easily understand other machine learning and artificial intelligence algorithms.
If you’re like me in that you understand logic-based concepts better by tracing over code rather than reading long equations with tons of notation, this read should be perfect for you. If you don’t have any Python background, you can still follow along for the high level concepts. Either way, I will do my best to explain what every single piece of code does, so that even without much Python experience you should be able to follow along.
NOTE: if you would like to download the full code used in this article you can find it here
A high level abstraction to what a perceptron does
We’ll be focusing on the use of a single-layer perceptron for classification. This method of ML is considered ‘supervised learning’, as we will feed the algorithm labelled training data. This means that the perceptron will be given a dataset containing ’n’ class-labelled samples to train on.
To give you a better idea, think of how a baby can learn to differentiate between a dog and a cat. Let’s say we show him 100 different images of cats and dogs, telling him what each one is; he’ll then be able to tell the two apart based on the features that he sees. For example, he might conclude that cats have short ears while dogs have long ones. Now of course the human mind doesn’t look at a cat and say “if the ears are between 2.345–3.67cm long it’s a cat”, but this is how a perceptron could see it.
If you’ve opened any other article or video on perceptrons before getting here, chances are that you’ve seen this diagram. Let’s first go over each component here and then go over the process itself.
x’s and w’s: each x, indexed by a subscript ‘i’, represents a feature, so here we have ‘m’ features; in the cat-dog scenario one of them could be ear length. Each feature is connected to a weight; in mathematical terms, the weight of each feature is its coefficient. A ‘weight’, as the name suggests, can be thought of as a representation of the level of influence this feature has. The ‘learning part’ in any machine learning algorithm is the process of finding the optimal value of each weight.
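Net Input: the net input (represented by the sigma in the diagram) is the weighted sum of the features, that is, each feature multiplied by its weight and then added together: z = w1*x1 + w2*x2 + … + wm*xm. In the implementation a bias term is added to this sum as well.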
Activation Function: this is the final step in our classification model, a function of the net input based on which we can ‘classify’ the data with a binary output. The function itself can have different forms but for the purposes of simplicity we will be focusing on the ‘step function’ which is the most basic implementation; as the shape in the diagram and name suggests, the model will make the classification prediction based on whether or not the net input crosses a specific threshold value.
Error: during the training process the model makes a prediction using the components we defined and checks if the output was correct. Based on the error we update our weights. Ultimately, our goal is to reduce the errors with each training iteration until our model converges.
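To make the net input and the step function concrete, here is a minimal sketch. The function names, the sample values, the threshold of 0 and the 1 / -1 outputs are assumptions chosen for illustration, not taken from the article’s own code.

```python
import numpy as np

def net_input(x, weights, bias):
    """Weighted sum of the features plus the bias."""
    return np.dot(x, weights) + bias

def step(z, threshold=0.0):
    """Step activation: 1 if the net input reaches the threshold, else -1."""
    return 1 if z >= threshold else -1

# Hypothetical values for one sample with two features.
x = np.array([2.0, 3.0])
weights = np.array([0.5, -1.0])
bias = 0.5

z = net_input(x, weights, bias)   # 2.0*0.5 + 3.0*(-1.0) + 0.5 = -1.5
prediction = step(z)              # -1, because -1.5 is below the threshold
```

Changing the weights or the bias moves the net input across the threshold, which is exactly what training will do for us.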
Putting it all together
Now let’s make sense of this model. We want to train a perceptron to classify two separate entities given a number of features. The algorithm runs for a specified number of iterations, going through the steps described in the diagram and updating the weights based on the error received with each iteration.
Once the training process is done we have an optimized set of weights that can be used along with the features to classify unlabelled data. So far these are all highly abstract concepts. If you’re a complete beginner to neural networks I don’t expect you to be able to model a perceptron classifier yet, but it’s important to have a quick overview of these concepts before diving into the algorithmic implementation.
Implementing a binary perceptron classifier in Python
Having gone over the high level concepts, we can now look into the details of a very basic perceptron implementation in Python to consolidate our understanding.
First off, let’s quickly go over the libraries we’ll be using:
- Numpy: this is one of the most common libraries used in data applications. It allows us to create ‘numpy arrays’, which are similar to lists but are implemented in a way that lets us perform computations on whole arrays at once.
- Random: we’ll be using random to initialize our weights.
- Pandas: pandas is also a very powerful library for dealing with data, but in our case we’ll be just using pandas to import our sample dataset.
- Matplotlib: lastly we’ll be using matplotlib to help us visualize concepts.
Sample dataset:
For this example, we’ll be using the ‘iris flower’ dataset. This data has 3 different flower types with 50 samples for each type, and each sample contains 4 features that describe the flower numerically (e.g. petal length). Let’s import and view this dataset to get a better picture of what we’re dealing with.
To keep things simple we’ll be using 2 out of the 3 flowers, so our perceptron will essentially predict whether a flower is a Setosa or a Versicolor based on two of the four features, such as petal length. From the graph it’s easily noticeable that our data can be separated by a straight line, and this is what our model will attempt to learn. Note that since we’re keeping it basic, the model we’re going for will only be able to classify linearly separable data.
Having imported the sample data and libraries, we can get started on our model:
Here we define a new class “perceptron”, initializing the learning rate (eta) and number of iterations (n_iter). These two values are called hyperparameters: the learning rate is a float from 0 to 1, while the number of iterations is an integer. The optimal value for these two variables depends on the problem itself. Hyperparameter optimisation is a topic for another article; for now all we need to know about these two is the following:
- Learning rate (eta): This scales the size of every weight update. The intuition here is that if the learning rate is too high our model will overshoot good weight values, while if it is too low it will need many iterations to learn.
- Number of iterations (n_iter): This is the number of times our algorithm will run over the training data.
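As a rough sketch of what such a class definition could look like (the capitalised class name and the default values 0.1 and 10 are assumptions for illustration, not necessarily what the full code uses):

```python
class Perceptron:
    """Skeleton only: stores the two hyperparameters described above."""

    def __init__(self, eta=0.1, n_iter=10):
        self.eta = eta        # learning rate, a float between 0 and 1
        self.n_iter = n_iter  # number of passes over the training data

# Creating an object simply records the chosen hyperparameters.
model = Perceptron(eta=0.1, n_iter=10)
```

Nothing is learned at this point; the constructor only remembers how the training should later be carried out.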
Next let’s take the flowchart diagram we discussed before and translate it to python code:
There are three functions defined in our perceptron model. The fit() function is where all the training happens and the weights get updated iteratively. After we finish training we should have an array of size 3: the weights for the two features, followed by the ‘bias’, which is the equivalent of the ‘y-intercept’.
The fit function consists of two nested loops. The outer loop controls how many times the algorithm runs over our training set, and the inner loop is the actual pass over the training set. Here, the ‘X’ parameter is a 2-d numpy array containing the feature values and ‘y’ holds the label for each row of features. We use the built-in Python function zip(X, y) to go over each labelled sample.
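The structure described above can be sketched as follows. This is a minimal version written for illustration, not the article’s exact code: the seed parameter, the range of the random initial weights, the threshold of 0 and the toy data are all assumptions.

```python
import random
import numpy as np

class Perceptron:
    def __init__(self, eta=0.1, n_iter=10, seed=None):
        self.eta = eta
        self.n_iter = n_iter
        self.seed = seed  # hypothetical extra parameter for reproducible runs

    def fit(self, X, y):
        rng = random.Random(self.seed)
        # One small random weight per feature, plus the bias as the last entry.
        self.w_ = np.array([rng.uniform(-0.01, 0.01)
                            for _ in range(X.shape[1] + 1)])
        self.errors_ = []
        for _ in range(self.n_iter):          # outer loop: passes over the data
            errors = 0
            for xi, target in zip(X, y):      # inner loop: one labelled sample
                update = self.eta * (target - self.predict(xi))
                self.w_[:-1] += update * xi   # feature weights
                self.w_[-1] += update         # bias
                errors += int(update != 0.0)  # count the wrong predictions
            self.errors_.append(errors)
        return self

    def net_input(self, X):
        # Weighted sum of the features plus the bias.
        return np.dot(X, self.w_[:-1]) + self.w_[-1]

    def predict(self, X):
        # Step function: 1 if the net input is at least 0, otherwise -1.
        return np.where(self.net_input(X) >= 0.0, 1, -1)

# Toy, linearly separable data (made up for this sketch).
X = np.array([[-2.0, -1.0], [-1.0, -2.0], [1.0, 2.0], [2.0, 1.0]])
y = np.array([-1, -1, 1, 1])

model = Perceptron(eta=0.1, n_iter=10, seed=0).fit(X, y)
predictions = model.predict(X)
```

With two features, model.w_ holds three values, matching the ‘array of size 3’ mentioned above, and model.errors_ records how many samples were misclassified in each pass.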
The most integral component of our learning process is the weight update. The update is defined as: update = eta * (label - prediction). Each feature weight is then changed by the update multiplied by that feature’s value, and the bias is changed by the update itself.
This can be thought of as the mathematical representation of how correct or incorrect our model’s prediction was. To understand this further, we can look at the four possible outcomes of this equation. Each flower is expressed as a numerical value of either 1 or -1, where -1 is ‘Setosa’ and 1 is ‘Versicolor’.
Now, in our inner loop the program goes over one sample (one row) at a time and attempts to ‘predict’ what this sample is, using its given features and our model’s current weights.
Taking the two scenarios where our model is incorrect (we’re ignoring the eta since it’s just a constant):
- (1)-(-1) = 2
- (-1)-(1) = -2
When our model is correct, in other words when the label and the prediction have the same value, we get:
- (1)-(1) = 0
- (-1)-(-1) = 0
When the prediction is wrong, the weights get updated in the direction of the error, whereas if it’s correct the result is zero and the weights are left unchanged.
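The four outcomes can be checked in a few lines. This time eta is included; its value of 0.1 is only an assumed example.

```python
eta = 0.1  # hypothetical learning rate

# (label, prediction) pairs: the two wrong cases, then the two correct ones.
cases = [(1, -1), (-1, 1), (1, 1), (-1, -1)]

# The update is eta times the difference between label and prediction.
updates = [eta * (label - prediction) for label, prediction in cases]

for (label, prediction), update in zip(cases, updates):
    print(f"label={label:2d}  prediction={prediction:2d}  update={update:+.1f}")
```

The two wrong predictions give updates of opposite sign, and the two correct ones give an update of zero, so the weights only move when the model makes a mistake.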
Training our perceptron
Hopefully by now you have an idea of what the model does; you can also refer to the comments in the code if anything is unclear. So let’s start training it:
First we initialize our perceptron as a new object. We already imported our dataset, but here we put it in a format that our fit() function understands: the features in the columns at index 0 and 2 as a 2-d numpy array, and the column at index 4, which holds the class of each sample, as a separate numpy array. We also use the function np.where to change the string values of the class to 1 (Versicolor) and -1 (Setosa). Then we run the fit() function on our X and y values to train our perceptron. After the specified number of iterations we should have a numpy array containing our learned weights, and we can call the predict() function on an unlabelled sample to have our perceptron classify it.
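The data preparation step can be sketched on a tiny stand-in for the dataset. The four rows below are the first two Setosa and first two Versicolor samples of the Iris data; the column order and the class strings ‘Iris-setosa’ and ‘Iris-versicolor’ follow the commonly distributed version of the file and are assumptions about the file used in this article.

```python
import numpy as np

# Columns: sepal length, sepal width, petal length, petal width, class.
rows = np.array([
    [5.1, 3.5, 1.4, 0.2, "Iris-setosa"],
    [4.9, 3.0, 1.4, 0.2, "Iris-setosa"],
    [7.0, 3.2, 4.7, 1.4, "Iris-versicolor"],
    [6.4, 3.2, 4.5, 1.5, "Iris-versicolor"],
], dtype=object)

# Features: the columns at index 0 and 2, as a 2-d float array.
X = rows[:, [0, 2]].astype(float)

# Labels: the column at index 4, mapped to -1 (Setosa) and 1 (Versicolor).
y = np.where(rows[:, 4] == "Iris-setosa", -1, 1)
```

With the full dataset the same two lines apply to the first 100 rows, after which X and y can be passed straight to fit().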
Plotting errors
The defined self.errors_ variable doesn’t actually contribute to our learning process but we can use it to plot the number of errors against each iteration:
From the graph we can see that our model converged after the 6th iteration. This can slightly change with each run since we are randomizing our initial weights.
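If you prefer to read the convergence point from the numbers rather than from the plot, a small helper like the one below can do it. The function name is hypothetical and the error counts are made-up illustration values, not the results of the run shown above.

```python
def first_error_free_epoch(errors):
    """Return the 1-based iteration from which all later counts are zero, or None."""
    epoch = None
    for i, count in enumerate(errors, start=1):
        if count == 0:
            if epoch is None:
                epoch = i      # start of a run of error-free iterations
        else:
            epoch = None       # an error appeared again, so reset
    return epoch

# Made-up error counts per iteration, in the shape of self.errors_.
example_errors = [2, 3, 1, 2, 1, 0, 0, 0, 0, 0]
converged_at = first_error_free_epoch(example_errors)
```

For this example list the helper returns 6, because the sixth iteration is the first one after which no further mistakes occur.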
Summary and Conclusions
In this article, we covered the abstract concepts of a very basic perceptron, then went into its details with a Python implementation. As I mentioned in my introduction, this article is mainly intended for anyone who is familiar with Python and wants to get into machine learning. From my personal experience, I find it best to implement an algorithm from scratch if I want to understand a concept in depth. So feel free to play around with the code, changing the hyperparameters and using print() and plots to better understand each component.
In the next part of this article, I will dive more into the visualization side of things, animating the learning process in graphs to hopefully get an even better grasp of supervised learning. Keep in mind that this is a very basic form of supervised ML and of course our model has many limitations. In the future, we’ll be going over more complex models that build on the model described here.
Hope you found this insightful!
References
Raschka, S. (2015). Chapter 2. Training Simple Machine Learning Algorithms for Classification. In Python Machine Learning (pp. 63–83). Packt Publishing.