The area of Neural Networks was originally inspired primarily by the goal of modeling biological neural systems, but it has since diverged and become a matter of engineering and of achieving good results on Machine Learning tasks. Nonetheless, we begin our discussion with a very brief, high-level description of the biological system that a large portion of this area has been inspired by.

BIOLOGICAL MOTIVATIONS AND CONNECTIONS

The basic computational unit of the brain is a neuron. Approximately 86 billion neurons can be found in the human nervous system, and they are connected by approximately 10^14 - 10^15 synapses. The diagram below shows a cartoon drawing of a biological neuron (first model) and a common mathematical model (second model). Each neuron receives input signals from its dendrites and produces output signals along its (single) axon. The axon eventually branches out and connects via synapses to the dendrites of other neurons.

In the computational model of a neuron, the signals that travel along the axons (e.g. x0) interact multiplicatively (e.g. w0*x0) with the dendrites of the other neuron based on the synaptic strength at that synapse (e.g. w0). The idea is that the synaptic strengths (the weights w) are learnable and control the strength of influence (and its direction: excitatory (positive weight) or inhibitory (negative weight)) of one neuron on another. In the basic model, the dendrites carry the signal to the cell body, where all the signals get summed. If the final sum is above a certain threshold, the neuron fires, sending a spike along its axon. In the computational model, it is assumed that the precise timings of the spikes do not matter, and that only the frequency of the firing communicates information. Based on this rate code interpretation, we model the firing rate of the neuron with an activation function f applied to the weighted sum of the inputs, i.e. f(sum_i w_i*x_i + b), which represents the frequency of the spikes along the axon.
Historically, a common choice of activation function has been the sigmoid function σ, since it takes a real-valued input (the signal strength after the sum) and squashes it to the range between 0 and 1.
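To make the model above concrete, here is a minimal sketch of a single sigmoid neuron in Python with NumPy. The particular input, weight, and bias values are made up purely for illustration:

```python
import numpy as np

def sigmoid(z):
    """Squash a real-valued input into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def neuron_fire_rate(x, w, b):
    """Weighted sum of the inputs (synaptic strengths), plus a bias,
    passed through the activation function f = sigmoid."""
    return sigmoid(np.dot(w, x) + b)

x = np.array([1.0, -2.0, 0.5])   # input signals arriving on the dendrites
w = np.array([0.7, 0.3, -1.2])   # learnable synaptic weights (illustrative values)
b = 0.1                          # bias, which shifts the firing threshold

rate = neuron_fire_rate(x, w, b)
print(rate)  # a value strictly between 0 and 1
```

Positive weights are excitatory (they push the sum up) and negative weights are inhibitory, exactly as described above.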
Models mentioned above:
These neurons, when connected together in groups, create networks known as Neural Networks. The architecture of a Neural Network can be broken down into layers: an input layer, one or more hidden layers, and an output layer.
Left: A 2-layer Neural Network (one hidden layer of 4 neurons (or units) and one output layer with 2 neurons), and three inputs. Right: A 3-layer neural network with three inputs, two hidden layers of 4 neurons each and one output layer. Notice that in both cases there are connections (synapses) between neurons across layers, but not within a layer.
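A forward pass through the 2-layer network on the left (3 inputs, one hidden layer of 4 neurons, one output layer of 2 neurons) can be sketched in a few lines. The random weight values and the sigmoid activation are assumptions for illustration, not values from the figure:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Weight matrices: one row per neuron, one column per incoming connection.
W1 = rng.standard_normal((4, 3))  # hidden layer: 4 neurons, each seeing 3 inputs
b1 = np.zeros(4)
W2 = rng.standard_normal((2, 4))  # output layer: 2 neurons, each seeing 4 hidden units
b2 = np.zeros(2)

x = np.array([1.0, 0.5, -0.5])    # the three inputs (illustrative values)
h = sigmoid(W1 @ x + b1)          # activations of the 4 hidden neurons
out = sigmoid(W2 @ h + b2)        # activations of the 2 output neurons
print(out.shape)                  # (2,)
```

Note that each matrix only connects adjacent layers, which mirrors the observation above: there are synapses between layers, but none within a layer.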
Using more hidden layers gives the Neural Network more representational capacity, at the expense of being more computationally intensive and thus, in some cases, taking much longer to train. Roughly speaking, the more complex a problem is, the more hidden layers and hidden neurons will be needed.
As the demo above shows, the more hidden neurons a network has, the more precisely it can carve up the input space. Data points are shown as dots: their color represents the expected value, and the shaded regions represent the output of the network at each position, so a green dot in a red region means the network has predicted that point incorrectly. You can play with this yourself at http://cs.stanford.edu/people/karpathy/convnetjs/demo/classify2d.html
Thanks for reading! Let me know if you would like me to continue the tutorials. It may seem dry right now but as you get deeper and deeper into this topic it becomes much more interesting.
Here's a little extra video for those who are new to Machine Learning:
-Ares