For practically all AI tasks involving computer vision and image processing, convolutional neural networks (CNNs) are the most important artificial neural network architecture today. In this piece we will briefly explore the history of CNNs, starting with biological research in the 1950s and ending with today's sophisticated pre-trained computer vision models.
Simple and complex cells were discovered by David Hubel and Torsten Wiesel in 1959. According to their research, we use two different types of cells to recognize visual patterns. A simple cell can identify edges and bars with a specific orientation at a specific location in the image.
A complex cell also reacts to edges and bars of specific orientations, but unlike a simple cell, it has the additional capacity to respond to them wherever they appear in the scene. For instance, a complex cell can react to vertical bars positioned anywhere in the scene, whereas a simple cell can only react to vertical bars located in, say, the upper part of the scene. Complex cells achieve this location-independent recognition by combining the outputs of many simple cells. Our visual system is built from layers of both simple and complex cells working together.
The Neocognitron, Proposed by Kunihiko Fukushima
The work of Hubel and Wiesel served as a source of inspiration for Dr. Kunihiko Fukushima, who in the 1980s developed an artificial neural network that replicates the functionality of both simple and complex cells. S-cells act as artificial simple cells, while C-cells act as artificial complex cells; they are "artificial" because they mimic the mathematical behavior of simple and complex cells rather than being biological neurons. The basic idea behind Fukushima's Neocognitron was simple: capture complex patterns (like a dog) by combining cells that recognize simpler patterns (e.g., a tail).
The LeNet Proposed by Yann LeCun
Fukushima's work had a significant impact on the still-developing field of artificial intelligence, but the first modern application of convolutional neural networks came in the 1990s from Yann LeCun and his associates. They published "Gradient-Based Learning Applied to Document Recognition", one of the most influential AI papers of the 1990s (cited by 34,378 papers).
In the paper, LeCun trained a CNN on handwritten digits from the MNIST dataset. According to Wikipedia, the MNIST database contains 60,000 training images and 10,000 test images, drawn from handwriting samples of American Census Bureau employees and high school students. Each image is grayscale and carries a label identifying the digit actually written (from 0 to 9):
The concept was an extension of Fukushima's Neocognitron: combine less complex artificial cells into more sophisticated ones to build up more complex features. These steps were used to train LeNet on MNIST:
- Feed the model an example image;
- Ask the model to predict its label;
- Compare the prediction to the actual label and update the model's parameters;
- Repeat this procedure until the loss is minimized and the parameters are near optimal.
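The steps above can be sketched as a generic training loop. The code below is a minimal illustration using plain NumPy and a toy linear classifier on invented data, not LeCun's actual LeNet architecture; the data, learning rate, and model are assumptions made purely for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for MNIST: 100 examples with 4 features, binary labels
X = rng.normal(size=(100, 4))
true_w = np.array([1.0, -2.0, 0.5, 3.0])
y = (X @ true_w > 0).astype(float)

w = np.zeros(4)   # the model's parameters ("settings")
lr = 0.1          # learning rate

for step in range(500):
    # 1. Give the model examples (here: the whole batch at once)
    logits = X @ w
    # 2. Ask the model for label predictions
    probs = 1.0 / (1.0 + np.exp(-logits))
    # 3. Compare predictions to actual labels, update the parameters
    grad = X.T @ (probs - y) / len(y)
    w -= lr * grad
    # 4. Repeat until the loss stops improving

preds = (1.0 / (1.0 + np.exp(-(X @ w))) > 0.5)
accuracy = (preds == y).mean()
```

LeNet itself replaces the linear model with stacked convolutional and pooling layers, but the loop of predict, compare, and update is the same.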
The image processing and computer vision applications of today are based on LeCun’s implementation.
Historical period: 1940s through 1970s
- In 1943, neurophysiologist Warren McCulloch and mathematician Walter Pitts wrote a paper on how neurons might function. They modeled a basic neural network using electrical circuits to simulate how neurons in the brain could compute.
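The McCulloch-Pitts model can be sketched as a simple threshold unit: the neuron "fires" when the weighted sum of its inputs reaches a threshold. The weights and thresholds below are hand-picked for illustration to realize logical AND and OR, a classic demonstration of the model.

```python
def mcculloch_pitts(inputs, weights, threshold):
    """Fire (output 1) if the weighted input sum reaches the threshold."""
    total = sum(i * w for i, w in zip(inputs, weights))
    return 1 if total >= threshold else 0

# With unit weights and threshold 2, the unit computes logical AND
def and_gate(a, b):
    return mcculloch_pitts([a, b], [1, 1], threshold=2)

# Lowering the threshold to 1 turns the same unit into logical OR
def or_gate(a, b):
    return mcculloch_pitts([a, b], [1, 1], threshold=1)
```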
- In The Organization of Behavior (1949), Donald Hebb argued that neural pathways are strengthened each time they are used, an idea crucial to understanding how humans learn. When two nerves fire simultaneously, he proposed, the connection between them is strengthened.
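Hebb's principle is often summarized as "cells that fire together wire together" and written as the update Δw = η·x·y, where x is presynaptic activity, y is postsynaptic activity, and η a learning rate. A toy sketch with made-up activity values:

```python
import numpy as np

def hebbian_update(w, x, y, lr=0.1):
    """Strengthen each connection in proportion to joint pre/post activity."""
    return w + lr * x * y

w = np.zeros(3)                  # three synaptic weights
x = np.array([1.0, 0.0, 1.0])    # presynaptic activity: inputs 0 and 2 fire
y = 1.0                          # postsynaptic neuron fires

for _ in range(10):
    w = hebbian_update(w, x, y)
# Only the connections whose inputs fired alongside the output grew
```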
- In the 1950s, as computers evolved, it became possible to simulate a theoretical neural network. Nathaniel Rochester of the IBM research laboratory took the first step in this direction, though unfortunately his initial attempt was unsuccessful.
- In 1959, Stanford researchers Bernard Widrow and Marcian Hoff developed the "ADALINE" and "MADALINE" models. The names derive from their use of Multiple ADAptive LINear Elements (another demonstration of Stanford's love of acronyms). ADALINE was designed to recognize binary patterns, so that when reading streaming bits from a phone line it could predict the next bit. MADALINE, which uses an adaptive algorithm to eliminate echoes on phone lines, was the first neural network applied to a real-world problem. Although the technique is as old as air traffic control systems, it is still in commercial use.
- In 1962, Widrow and Hoff developed a learning rule that examines the value on the line before a weight and changes that weight using the formula Weight Change = (Pre-Weight Line Value) * (Error / Number of Inputs). It is based on the idea that even when one perceptron has a large error, the weight values can be adjusted to distribute that error throughout the network, or at the very least to neighboring perceptrons. Applied this way, an error remains when the line before a weight is 0, but it will eventually correct itself; if the error is preserved and distributed equally across all the weights, it is eliminated.
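The Widrow-Hoff rule (today usually called the delta or LMS rule) can be sketched in a few lines: each weight moves along its input, scaled by the error between the desired output and the current output. The data and learning rate below are invented for illustration.

```python
import numpy as np

def lms_step(w, x, target, lr=0.05):
    """Widrow-Hoff / LMS: weight change = lr * error * pre-weight line value."""
    error = target - x @ w       # compare current output to the desired value
    return w + lr * error * x    # distribute the error across the weights

rng = np.random.default_rng(1)
true_w = np.array([0.5, -1.5])   # unknown weights the rule should recover
w = np.zeros(2)

for _ in range(2000):
    x = rng.normal(size=2)
    w = lms_step(w, x, target=x @ true_w)
# After enough noiseless examples, w converges to true_w
```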
- Despite the later success of neural networks, classical von Neumann architecture dominated computing, leaving neural network research in its shadow. Ironically, John von Neumann himself suggested that telegraph relays or vacuum tubes might be used to simulate brain functions.
- According to research released around the same time, a single-layer neural network was fundamentally limited and could not simply be extended to multi-layer networks. Furthermore, a widely used learning function in the field was fundamentally flawed because it was not differentiable across its entire domain. As a result, research funding was drastically reduced.
- This was compounded by the fact that the early successes of some neural networks led to an overestimation of their potential, especially considering the technology available at the time. Promises went unfulfilled, and deeper philosophical questions occasionally provoked fear. To this day, writers still debate the impact that so-called "thinking machines" will have on people.
- The idea of a computer that can write its own programs is highly alluring: if Microsoft's Windows 2000 could reprogram itself, it might fix the tens of thousands of flaws its programming team created. Enticing as they were, these ideas proved extremely challenging to put into practice. Meanwhile, von Neumann's architecture only grew more dominant. There were a few developments in the field, but little research was done.
The 1990s and later
Convolutional neural networks developed more quickly in the 1990s, 2000s, and 2010s, as more and larger datasets were used to train increasingly complex models. The PASCAL VOC challenge began in 2005 with about 20,000 photos and 20 object classes; contestants compete to achieve the lowest loss and highest accuracy with their models. As the discipline advanced, these figures were soon dwarfed by larger collections. Beginning in 2010, Fei-Fei Li and the PASCAL VOC team worked together to make the enormous image collection known as ImageNet public, and each year researchers take up the ImageNet Large Scale Visual Recognition Challenge (ILSVRC). The ImageNet collection currently contains 14,197,122 photos across 1,000 object classes.
By leveraging GPUs, the AlexNet deep convolutional architecture achieved a 16% error rate in 2012, roughly 10 percentage points below the second-place finisher. After AlexNet's remarkable accomplishment for its time, using GPUs for computer vision tasks became the norm. At the ILSVRC in 2017, 29 of the 38 competing teams achieved error rates below 5%. The ILSVRC organizers declared that future challenges would move toward 3D object classification, since complex 2D classification challenges were now largely being solved.
What is the use of a convolutional network?
The principal applications of a convolutional neural network (CNN), which comprises one or more convolutional layers, are image processing, classification, segmentation, and other tasks on spatially correlated data. In essence, a convolution is a filter that is dragged over the input.
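The "filter dragged over the input" can be sketched as a naive 2D convolution (strictly speaking, cross-correlation, as most deep learning libraries implement it). The image and filter below are invented toy values; the filter is a simple vertical-edge detector.

```python
import numpy as np

def conv2d(image, kernel):
    """Slide the kernel over the image; each output is a patch dot product."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1   # output height (no padding, stride 1)
    ow = image.shape[1] - kw + 1   # output width
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Toy image: dark on the left, bright on the right
image = np.array([[0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1]], dtype=float)

# A vertical-edge filter responds where intensity changes left to right
edge_filter = np.array([[-1, 1],
                        [-1, 1]], dtype=float)

feature_map = conv2d(image, edge_filter)
# The feature map peaks at the column where the edge sits
```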
Why is the convolutional layer so crucial to a convolutional neural network?
The convolutional layer is the central component of a CNN, and it is where most of the computation takes place. It requires input data and a filter, and it produces a feature map, among other things. Suppose the input is a color image, which is a 3D matrix of pixels (height, width, and color channels).
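For a color image, the filter has the same depth as the image's channel dimension, and each output position sums over all channels to produce a single value in the 2D feature map. A minimal sketch, with randomly generated stand-ins for the image and filter:

```python
import numpy as np

def conv_layer(image, kernel):
    """Apply one filter to an H x W x C image, producing a 2D feature map.

    The kernel depth must match the image's channel count; each output
    element is the sum over a kh x kw x C patch multiplied by the kernel.
    """
    kh, kw, kc = kernel.shape
    assert kc == image.shape[2], "kernel depth must match channel count"
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    fmap = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            fmap[i, j] = np.sum(image[i:i + kh, j:j + kw, :] * kernel)
    return fmap

rng = np.random.default_rng(0)
image = rng.random((8, 8, 3))    # toy 8x8 "RGB image"
kernel = rng.random((3, 3, 3))   # one 3x3 filter spanning all 3 channels
fmap = conv_layer(image, kernel)
```

A real convolutional layer applies many such filters, producing one feature map per filter, and typically adds a bias and nonlinearity afterward.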