Neural networks and examples of their use. The complex in simple words: what are neural networks, and what awaits us in the future?

Lately more and more people are talking about so-called neural networks: they are expected to be actively used in robotics, mechanical engineering, and many other areas of human activity, and search-engine algorithms, such as Google's, are already gradually putting them to work. What are these neural networks, how do they work, where are they applied, and how can they be useful to us? Read on to find out.

What are neural networks

Neural networks are one of the directions of scientific research in the field of creating artificial intelligence (AI), based on the desire to imitate the human nervous system, including its ability to correct errors and to learn on its own. All this, albeit somewhat crudely, should allow us to simulate the functioning of the human brain.

Biological neural networks

But the definition in the paragraph above is purely technical; in the language of biology, a neural network is the human nervous system: the set of neurons in our brain thanks to which we think, make decisions, and perceive the world around us.

A biological neuron is a special cell consisting of a nucleus, a body, and processes, and it is closely connected with thousands of other neurons. Electrochemical impulses are continually transmitted across these connections, bringing the entire neural network into a state of excitation or, conversely, of calm. For example, a pleasant and at the same time exciting event (meeting a loved one, winning a competition, etc.) will generate an electrochemical impulse in the neural network in our head and lead to its excitation. As a result, the neural network in our brain will pass its excitation on to other organs of the body, leading to an increased heart rate, more frequent blinking, and so on.

The picture shows a highly simplified model of the biological neural network of the brain. We see that a neuron consists of a cell body and a nucleus; the cell body, in turn, has many branched fibers called dendrites, while its single long process is called the axon, and it can be far longer than shown in this figure. Communication between neurons is carried out through the axons, and it is thanks to them that the biological neural network in our heads works.

History of neural networks

What is the history of the development of neural networks in science and technology? It begins with the advent of the first electronic computers. Back in the late 1940s, Donald Hebb developed a neural network learning mechanism, laying down the rules for teaching these "proto-computers."

The further chronology of events was as follows:

  • In 1954, the first practical use of neural networks in computer operation took place.
  • In 1958, Frank Rosenblatt developed the perceptron: a pattern-recognition algorithm, together with its mathematical notation.
  • In the 1960s, interest in the development of neural networks faded somewhat due to the weak computing power of the time.
  • It was revived in the 1980s; it was during this period that systems with a feedback mechanism appeared and self-learning algorithms were developed.
  • By 2000, computing power had grown so much that it could make the wildest dreams of past scientists come true. In this period, voice recognition programs, computer vision, and much more appeared.

Artificial neural networks

Artificial neural networks are commonly understood as computing systems that have the ability to learn on their own and gradually improve their performance. The main elements of a neural network's structure are:

  • Artificial neurons: elementary, interconnected units.
  • Connections: the channels used to send and receive information between neurons.
  • Signals: the actual information being transmitted.
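The three elements above can be sketched very roughly in code. Below is a minimal illustration of ours (the function and variable names are invented, not from any library): a single artificial neuron that receives signals over weighted connections and sums them into an output.

```javascript
// A minimal sketch of one artificial neuron: signals arrive over
// weighted connections, and the neuron sums the weighted signals.
// All names here are illustrative.
function neuronOutput(signals, weights) {
  var sum = 0;
  for (var i = 0; i < signals.length; i++) {
    sum += signals[i] * weights[i]; // each connection scales its signal
  }
  return sum;
}

// Two incoming signals, one strong connection and one weak one:
var output = neuronOutput([1, 1], [0.9, 0.1]);
console.log(output); // 1
```

Everything that follows in this article, from blinking rabbits to multi-layer networks, is built out of this weighted-sum operation.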

Application of neural networks

The scope of artificial neural networks is expanding every year; today they are used in such areas as:

  • Machine learning, which is a type of artificial intelligence. It is based on training AI on millions of examples of similar tasks. Nowadays, machine learning is actively used by the search engines Google, Yandex, Bing, and Baidu. Based on the millions of search queries we all enter into Google every day, its algorithms learn to show us the most relevant results, so that we can find exactly what we are looking for.
  • In robotics, neural networks are used to develop numerous algorithms for the iron “brains” of robots.
  • Computer system architects use neural networks to solve the problem of parallel computing.
  • With the help of neural networks, mathematicians can solve various complex mathematical problems.

Types of Neural Networks

In general, different types of neural networks are used for different tasks, among them:

  • convolutional neural networks,
  • recurrent neural networks,
  • Hopfield neural network.

Convolutional Neural Networks

Convolutional networks are one of the most popular types of artificial neural networks. They have proven their effectiveness in visual pattern recognition (video and images), recommender systems, and language processing.

  • Convolutional neural networks scale well and can be used for image recognition at any resolution.
  • These networks use volumetric, three-dimensional arrangements of neurons. Within a layer, each neuron is connected only to a small region of the previous layer, called its receptive field.
  • Neurons of neighboring layers are connected through a spatial localization mechanism. Stacks of many such layers act as nonlinear filters that respond to an ever larger number of pixels.
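To make the receptive-field idea concrete, here is a tiny sketch of ours (not from the article; the function name and the sample filter are invented for illustration) of a single convolution step: a small filter slides over an image, and each output value depends only on a small patch of pixels.

```javascript
// Sketch: a 2x2 filter slid over a 3x3 grayscale "image".
// Each output value depends only on a small receptive field of pixels.
function convolve2x2(image, filter) {
  var out = [];
  for (var y = 0; y + 1 < image.length; y++) {
    var row = [];
    for (var x = 0; x + 1 < image[0].length; x++) {
      row.push(
        image[y][x]     * filter[0][0] + image[y][x + 1]     * filter[0][1] +
        image[y + 1][x] * filter[1][0] + image[y + 1][x + 1] * filter[1][1]
      );
    }
    out.push(row);
  }
  return out;
}

var image = [
  [0, 1, 0],
  [0, 1, 0],
  [0, 1, 0]
];
// A filter that responds to left-to-right brightness changes (vertical edges):
var filter = [
  [-1, 1],
  [-1, 1]
];
console.log(convolve2x2(image, filter)); // [ [ 2, -2 ], [ 2, -2 ] ]
```

Real convolutional networks stack many such filters and layers, but each neuron still only "sees" its own small patch.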

Recurrent neural networks

Recurrent neural networks are networks whose connections between neurons form a directed cycle. They have the following characteristics:

  • Each connection has its own weight, also known as a priority.
  • Nodes are divided into two types: input nodes and hidden nodes.
  • Information in a recurrent neural network is passed not only forward, layer by layer, but also between the neurons themselves.
  • An important distinguishing feature of a recurrent neural network is the presence of a so-called "attention area," where the machine can be pointed at specific pieces of data that require more intensive processing.

Recurrent neural networks are used in the recognition and processing of text and speech (Google Translate, the Yandex "Palekh" algorithm, Apple's voice assistant Siri, etc.).
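The idea that information also circulates between the neurons themselves, not just forward, can be shown in a heavily simplified sketch of ours (the function name and the two weights are invented for illustration): one recurrent step, where the new hidden state depends both on the current input and on the previous hidden state.

```javascript
// Simplified recurrent step: the hidden state carries information
// forward in time, so earlier inputs influence later outputs.
// inputWeight and memoryWeight are illustrative constants.
function recurrentStep(input, prevHidden, inputWeight, memoryWeight) {
  return Math.tanh(input * inputWeight + prevHidden * memoryWeight);
}

// Feed in a short sequence; the final state "remembers" the whole sequence.
var hidden = 0;
[1, 0, 1].forEach(function (x) {
  hidden = recurrentStep(x, hidden, 0.5, 0.9);
});
console.log(hidden);
```

Even in this toy version, changing the order of the inputs changes the final hidden state, which is exactly what makes recurrent networks suitable for sequences like text and speech.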



Many of the terms in neural networks are related to biology, so let's start at the beginning:

The brain is a complex thing, but it can be divided into several main parts and operations:

A stimulus may be internal (for example, an image or an idea):

Now let's take a look at the basic, simplified parts of the brain:

The brain is generally like a cable network.

A neuron is the basic unit of computation in the brain. It receives and processes chemical signals from other neurons and, depending on a number of factors, either does nothing or generates an electrical impulse, or action potential, which then sends signals through synapses to neighboring connected neurons:

Dreams, memories, self-regulated movements, reflexes, and in general everything you think or do happens thanks to this process: millions, even billions, of neurons working at different levels, creating connections that form various parallel subsystems and make up the biological neural network.

Of course, these are all simplifications and generalizations, but thanks to them we can describe a simple neural network:

And describe it formally using a graph:

Some clarification is required here. The circles are neurons, and the lines are the connections between them; to keep things simple at this point, the connections represent the direct movement of information from left to right. The first neuron is currently active and is highlighted in gray. We have also assigned it a number (1 if it fires, 0 if it does not). The numbers between neurons show the weights of the connections.

The graphs above show the network at a single moment in time; for a more accurate representation, you need to divide it into time steps:

To create your own neural network, you need to understand how weights affect neurons and how neurons learn. As an example, let's take a rabbit (a test rabbit) and put it through a classic conditioning experiment.

When a harmless puff of air is directed at them, rabbits, like people, blink:

This behavior model can be depicted in graphs:

As in the previous diagram, these graphs show only the moment when the rabbit feels the puff of air, and here we encode the puff as a boolean value. We then calculate whether the second neuron fires based on the weight value. If the weight is 1, the sensory neuron fires and we blink; if the weight is less than 1, we do not blink: the second neuron's threshold is 1.
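The threshold logic just described can be sketched in a few lines (an illustration of ours; the function name is invented, while the threshold of 1 comes from the text):

```javascript
// The rabbit's blink neuron: it fires only when the weighted
// stimulus reaches the neuron's threshold of 1.
var threshold = 1;

function blinks(stimulus, weight) {
  return stimulus * weight >= threshold; // true = blink, false = no blink
}

console.log(blinks(1, 1.0)); // true: the puff of air makes the rabbit blink
console.log(blinks(1, 0.3)); // false: a weak connection does not trigger a blink
console.log(blinks(0, 1.0)); // false: no stimulus, no blink
```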

Let's introduce one more element: a harmless sound signal:

We can model the rabbit's interest like this:

The main difference is that now the weight is zero, so we don't get a blinking rabbit; well, not yet, at least. Now let's teach the rabbit to blink on command by mixing the stimuli (the beep and the puff of air):

It is important that these events occur in different epochs; in graph form it looks like this:

The sound itself doesn't do anything, but the airflow still causes the rabbit to blink, and we show this through the weights multiplied by the stimuli (in red).

Learning complex behavior can be expressed, in simplified form, as a gradual change over time in the weights between connected neurons.

To train a rabbit, we repeat the steps:

For the first three attempts, the schemes will look like this:

Note that the weight for the sound stimulus increases after each repetition (highlighted in red). This value is arbitrary for now; we chose 0.30, but the number could be anything, even negative. After the third repetition you will not notice any change in the rabbit's behavior, but after the fourth repetition something amazing happens: the behavior changes.

We removed the air exposure, but the rabbit still blinks when it hears the beep! Our last diagram can explain this behavior:

We trained the rabbit to respond to sound by blinking.

In a real experiment of this kind, it may take more than 60 repetitions to achieve the result.
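The whole conditioning experiment can be simulated in a few lines. This is a sketch of ours using the article's numbers (a weight increment of 0.30 per pairing and a blink threshold of 1); the variable and function names are invented for illustration:

```javascript
// Classical conditioning as gradual weight change: each time the
// sound is paired with the puff of air, the sound's weight grows.
var soundWeight = 0;
var increment = 0.30;
var threshold = 1;

function blinkOnSoundAlone() {
  return 1 * soundWeight >= threshold;
}

for (var trial = 1; trial <= 4; trial++) {
  soundWeight += increment; // pairing: beep + puff of air
  console.log("trial " + trial + ": weight " + soundWeight.toFixed(2) +
              ", blinks on sound alone: " + blinkOnSoundAlone());
}
// After trial 3 the weight is 0.90 (no blink); after trial 4 it is 1.20,
// and the rabbit blinks at the beep alone.
```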

Now we will leave the biological world of brains and rabbits and try to adapt everything we have learned to creating an artificial neural network. First, let's try a simple task.

Let's say we have a machine with four buttons that dispenses food when the correct button is pressed (or energy, if you are a robot). The task is to find out which button gives the reward:

We can depict (schematically) what a button does when clicked like this:

It's best to solve this problem exhaustively, so let's look at all the possible outcomes, including the correct one:

Click on the 3rd button to get your dinner.

To reproduce a neural network in code, we first need a model, or graph, against which the network can be checked. Here is one graph that suits the task; moreover, it reflects its biological analogue well:

This neural network simply receives incoming information; in this case, the perception of which button was pressed. Next, the network multiplies the incoming information by the weights and produces an output by summing a layer. It sounds a little confusing, but let's see how the button is represented in our model:

Note that all the weights are 0, so the neural network, like a baby, is completely blank but completely interconnected.

Thus, we map an external event onto the input layer of the neural network and calculate the value at its output. It may or may not match reality, but for now we will ignore that and begin describing the problem in a way a computer can understand. Let's start by entering the weights (we'll use JavaScript):

var inputs = [0, 0, 1, 0];
var weights = [0, 0, 0, 0];
// For convenience, these arrays can be called vectors
The next step is to create a function that takes the input values ​​and weights and calculates the output value:

function evaluateNeuralNetwork(inputVector, weightVector) {
  var result = 0;
  inputVector.forEach(function(inputValue, weightIndex) {
    var layerValue = inputValue * weightVector[weightIndex];
    result += layerValue;
  });
  return (result.toFixed(2));
}
// May seem complex, but all it does is match weight/input pairs and add up the results
As expected, if we run this code, we will get the same result as in our model or graph...

evaluateNeuralNetwork(inputs, weights); // 0.00
Live example: Neural Net 001.

The next step in improving our neural network is a way to check its output, or resulting values, against the real situation; let's first encode this specific reality into a variable:

To detect inconsistencies (and how many there are), we'll add an error function:

Error = Reality - Neural Net Output
With it we can evaluate the performance of our neural network:

But more importantly, what about situations where reality produces a positive outcome?

Now we know that our neural network model is broken (and we know how much), great! What's great is that we can now use the error function to control our learning. But all this will make sense if we redefine the error function as follows:

Error = Desired Output - Neural Net Output
An elusive but important distinction, quietly showing that we will use previous results to compare with future actions (and for learning, as we will see later). This also exists in real life, which is full of repeating patterns, so it can become an evolutionary strategy (well, in most cases).

var input = [0, 0, 1, 0];
var weights = [0, 0, 0, 0];
var desiredResult = 1;
And a new function:

function evaluateNeuralNetError(desired, actual) {
  return (desired - actual);
}
// After evaluating both the Network and the Error we would get:
// "Neural Net output: 0.00 Error: 1"
Live example: Neural Net 002.

Let's summarize. We started with a problem, built a simple model of it in the form of a biological neural network, and found a way to measure its performance against reality, or the desired result. Now we need to find a way to correct the discrepancy, a process that for both computers and humans can be thought of as learning.

How to train a neural network?

The basis of training, for both biological and artificial neural networks, is repetition and learning algorithms, so we will work with each of them separately. Let's start with learning algorithms.

In nature, learning algorithms correspond to changes in the physical or chemical characteristics of neurons after experience:

The dramatic illustration shows how two neurons change over time. In code, our "learning algorithm" model means that we will simply change the weights over time to make our lives easier. So let's add a variable to indicate how easy life is:

var learningRate = 0.20; // The larger the value, the faster the learning process will be :)
And what will this change?

It will change the weights (just like with the rabbit!), most importantly the weight of the output we want to produce:

How you code such an algorithm is up to you; for simplicity, I simply add the learning rate to the weight. Here it is in the form of a function:

function learn(inputVector, weightVector) {
  weightVector.forEach(function(weight, index, weights) {
    if (inputVector[index] > 0) {
      weights[index] = weight + learningRate;
    }
  });
}
When used, this training function simply adds the learning rate to the weights of the active neurons; before and after a round of training (or repetition), the results are as follows:

// Original weight vector: [0, 0, 0, 0]
// Neural Net output: 0.00 Error: 1
learn(input, weights);
// New weight vector: [0, 0, 0.20, 0]
// Neural Net output: 0.20 Error: 0.8
// If it's not obvious: the neural net output (0.20) is now closer to 1 (the chicken output) - which is what we wanted, so we can conclude we are moving in the right direction
Live example: Neural Net 003.

Okay, now that we're moving in the right direction, the last piece of the puzzle is implementing repetition.

It's not that complicated: in nature we just do the same thing over and over again, and in code we simply specify the number of repetitions:

var trials = 6;
And implementing this number of repetitions in our training neural network looks like this:

function train(trials) {
  for (var i = 0; i < trials; i++) {
    neuralNetResult = evaluateNeuralNetwork(input, weights);
    learn(input, weights);
  }
}
Well, here's our final report:

Neural Net output: 0.00 Error: 1.00 Weight Vector: [0, 0, 0, 0]
Neural Net output: 0.20 Error: 0.80 Weight Vector: [0, 0, 0.2, 0]
Neural Net output: 0.40 Error: 0.60 Weight Vector: [0, 0, 0.4, 0]
Neural Net output: 0.60 Error: 0.40 Weight Vector: [0, 0, 0.6, 0]
Neural Net output: 0.80 Error: 0.20 Weight Vector: [0, 0, 0.8, 0]
Neural Net output: 1.00 Error: 0.00 Weight Vector: [0, 0, 1.0, 0]
// Chicken Dinner!
Live example: Neural Net 004.

Now we have a weight vector that produces the desired output (chicken for dinner) only when the input vector matches reality (pressing the third button).

So what's the coolest thing we just did?

In this particular case, our neural network (after training) can recognize the input data and say what will lead to the desired result (we will still need to program specific situations):

In addition, it is a scalable model, a toy and a tool for our learning. We were able to learn something new about machine learning, neural networks and artificial intelligence.

Warning to users:

  • There is no mechanism for storing the learned weights, so this neural network will forget everything it knows. When updating or re-running the code, you need at least six successful iterations for the network to fully learn again; if you consider that a person or machine would press buttons at random, this will take some time.
  • Biological networks learning important things have a learning rate of 1, so only one successful iteration would be needed.
  • There is a learning algorithm that closely resembles biological neurons, and it has a catchy name: the Widrow-Hoff rule, or Widrow-Hoff learning.
  • Neuron thresholds (1 in our example) and overtraining effects (with a large number of repetitions the result exceeds 1) are not taken into account, but they are very important in nature and are responsible for large and complex blocks of behavioral responses. So are negative weights.
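For the curious, here is a rough sketch of the Widrow-Hoff (delta) rule in the spirit of this article's code. This is our own illustration, not the article's implementation; the function name and the learning rate of 0.5 are arbitrary. The key idea is that the weight update is scaled by the error, so corrections shrink as the output approaches the desired value:

```javascript
// Widrow-Hoff / delta rule sketch: the weight change is proportional
// to the error and to the input, so big mistakes cause big corrections
// and updates shrink as the output approaches the desired value.
var learningRate = 0.5;

function deltaRuleLearn(inputVector, weightVector, desired) {
  var output = 0;
  inputVector.forEach(function (inputValue, i) {
    output += inputValue * weightVector[i];
  });
  var error = desired - output;
  inputVector.forEach(function (inputValue, i) {
    weightVector[i] += learningRate * error * inputValue;
  });
  return error;
}

var input = [0, 0, 1, 0];
var weights = [0, 0, 0, 0];
for (var i = 0; i < 5; i++) {
  deltaRuleLearn(input, weights, 1);
}
console.log(weights[2]); // approaches 1: 0.5, 0.75, 0.875, 0.9375, 0.96875
```

Compare this with the fixed-step `learn` function above: the fixed step keeps growing the weight past 1, while the delta rule converges toward the desired output.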

Notes and list of references for further reading

I tried to avoid math and strict terminology, but if you're interested, know that we built a perceptron, which is defined as a supervised learning algorithm for binary classifiers: heavy stuff.

The biological structure of the brain is not a simple topic, partly because of imprecision and partly because of its complexity. It's better to start with Neuroscience (Purves) and Cognitive Neuroscience (Gazzaniga). I've modified and adapted the rabbit example from Gateway to Memory (Gluck), which is also a great introduction to the world of graphs.

An Introduction to Neural Networks (Gurney) is another great resource for all your AI needs.

And now in Python! Thanks to Ilya Andshmidt for providing the Python version:

inputs = [0, 0, 1, 0]
weights = [0, 0, 0, 0]
desired_result = 1
learning_rate = 0.2
trials = 6

def evaluate_neural_network(input_array, weight_array):
    result = 0
    for i in range(len(input_array)):
        layer_value = input_array[i] * weight_array[i]
        result += layer_value
    print("evaluate_neural_network: " + str(result))
    print("weights: " + str(weights))
    return result

def evaluate_error(desired, actual):
    error = desired - actual
    print("evaluate_error: " + str(error))
    return error

def learn(input_array, weight_array):
    print("learning...")
    for i in range(len(input_array)):
        if input_array[i] > 0:
            weight_array[i] += learning_rate

def train(trials):
    for i in range(trials):
        neural_net_result = evaluate_neural_network(inputs, weights)
        learn(inputs, weights)

train(trials)
And now in Go! Thanks to Kieran Maher for this version.

package main

import (
    "fmt"
    "math"
)

func main() {
    fmt.Println("Creating inputs and weights ...")
    inputs := []float64{0.00, 0.00, 1.00, 0.00}
    weights := []float64{0.00, 0.00, 0.00, 0.00}
    desired := 1.00
    learningRate := 0.20
    trials := 6
    train(trials, inputs, weights, desired, learningRate)
}

func train(trials int, inputs []float64, weights []float64, desired float64, learningRate float64) {
    for i := 1; i < trials; i++ {
        weights = learn(inputs, weights, learningRate)
        output := evaluate(inputs, weights)
        errorResult := evaluateError(desired, output)
        fmt.Print("Output: ")
        fmt.Print(math.Round(output*100) / 100)
        fmt.Print("\nError: ")
        fmt.Print(math.Round(errorResult*100) / 100)
        fmt.Print("\n\n")
    }
}

func learn(inputVector []float64, weightVector []float64, learningRate float64) []float64 {
    for index, inputValue := range inputVector {
        if inputValue > 0.00 {
            weightVector[index] = weightVector[index] + learningRate
        }
    }
    return weightVector
}

func evaluate(inputVector []float64, weightVector []float64) float64 {
    result := 0.00
    for index, inputValue := range inputVector {
        layerValue := inputValue * weightVector[index]
        result = result + layerValue
    }
    return result
}

func evaluateError(desired float64, actual float64) float64 {
    return desired - actual
}


The funny thing about high technology is that it is thousands of years old! For example, calculus was invented independently by Newton and Leibniz more than 300 years ago. What was once considered magic is now well understood. And, of course, we all know that Euclid invented geometry a couple of thousand years ago. The trick is that it often takes years before something becomes "popular." Neural networks are an excellent example. We've all heard about neural networks and what they promise, but for some reason we don't see regular programs based on them. The reason is that the true nature of neural networks is extremely complex mathematics: to understand it, you need to work through the complex theorems that cover it, and you may also need probability theory and combinatorial analysis, not to mention physiology and neuroscience.

The incentive for a person or people to create any technology is to create a killer program with its help. We all now know how DOOM works: it uses BSP trees. However, John Carmack did not invent them; he read about them in an article written in 1960 that described the theory behind BSP technology. John took the next step by understanding how BSP trees could be used, and DOOM was born. I suspect that neural networks will undergo a similar renaissance in the next few years. Computers are fast enough to simulate them, VLSI designers are building them directly in silicon, and there are hundreds of published books on the subject. And since a neural network is at heart a mathematical entity, it is not tied to any physical representation: we can create it in software or as real silicon models. The main thing is that the essence of a neural network is an abstract model.

In many ways, the limits of digital computing have already been reached. Of course, we will keep improving digital computers, making them faster, smaller, and cheaper, but they will always perceive only digital information, since they are based on a binary computing model. Neural networks, however, are based on a different model of computation: a high-level, distributed, probabilistic model that does not look for a solution the way a computer program does. Instead, it models a network of cells that can find, establish, or correlate possible ways of solving a problem in a more "biological" way, solving the problem in small pieces and adding the results together. This article is an overview of neural network technology, examining it in as much detail as a few pages allow.

Biological analogues

Neural networks were inspired by our own brains. Literally: a brain in someone's head once said, "I wonder how I work?", and then proceeded to create a simple model of itself. Strange, right? The standard neural-node model, based on a simplified model of the human neuron, was invented more than fifty years ago. Take a look at Figure 1.0. As you can see, a neuron has three main parts:

  • Dendrite(s)..................Responsible for collecting incoming signals
  • Soma.........................Responsible for the main processing and summation of signals
  • Axon.........................Responsible for transmitting signals to other dendrites

The average human brain contains about 100 billion, or 10 to the 11th power, neurons, and each of them has up to 10,000 connections through its dendrites. Signals are transmitted via electrochemical processes based on sodium, potassium, and chloride ions. Signals are transmitted by accumulating the potential difference caused by these ions, but the chemistry is not important here: the signals can be thought of as simple electrical impulses traveling from axon to dendrite. The attachment of a dendrite to another cell's axon is called a synapse, and synapses are the main points of impulse transmission.

So how does a neuron work? There is no simple answer to this question, but for our purposes the following explanation will suffice. The dendrites collect the signals received from other neurons, the soma performs the summation and processing of the signals and data, and finally, based on the result of that processing, it can "tell" the axon to pass the signal on. Further transmission depends on a number of factors, but we can model this behavior as a transfer function that takes inputs, processes them, and prepares an output if the properties of the transfer function are satisfied. Moreover, in real neurons the output is nonlinear: the signals are analog, not digital. In fact, neurons continuously receive and transmit signals, and their actual behavior depends on frequency and must be analyzed in the S-domain (the frequency domain). With this transfer function we have, in essence, modeled a simple biological neuron.
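The dendrite/soma/axon model just described can be hedged into a few lines of code (a sketch of ours; the function name and the firing threshold of 1 are arbitrary illustrations):

```javascript
// Model of a biological neuron: dendrites collect signals, the soma
// sums them, and the axon fires only if a firing threshold is reached.
function neuronFires(dendriteSignals, firingThreshold) {
  var somaSum = 0;
  dendriteSignals.forEach(function (s) {
    somaSum += s; // the soma sums what the dendrites collected
  });
  return somaSum >= firingThreshold; // the axon transmits, or stays silent
}

console.log(neuronFires([0.4, 0.3, 0.5], 1)); // true: the sum 1.2 reaches the threshold
console.log(neuronFires([0.1, 0.2], 1));      // false: the sum 0.3 does not
```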

Now we have some idea of ​​what neurons are and what we are actually trying to model. Let's take a moment and talk about how we can use neural networks in video games.

Games

Neural networks seem to be the answer to all our needs. If we could transfer symbols and words into little gaming brains, imagine how cool that would be. The neural network model gives us a rough structure of neurons, but not a high level of intelligence and deduction, at least in the classical sense of the word. It takes some thought to come up with ways to apply neural network technology to game AI, but once you do, you can use it in conjunction with deterministic algorithms, fuzzy logic, and genetic algorithms to create a very robust and advanced AI thinking model for your games. Without a doubt, the result will be better than anything you can achieve with hundreds of if-else blocks or scripted sequences. Neural networks can be used for things like:

Scanning and recognition of the environment— the neural network can receive information in the form of vision or hearing. This information can then be used to generate a response or reaction, or to train the network. These responses can be displayed in real time and updated to optimize responses.

Memory- a neural network can be used as a form of memory for game characters. Neural networks can learn from their own experience and expand the set of responses and reactions.

Behavioral control— the output of the neural network can be used to control the actions of the game character. The inputs can be various game engine variables. Then the network will be able to control the behavior of the game character.

Mapping responses— neural networks work really well with "associations," which is essentially the binding of one space to another. Associations come in two flavors: auto-association, which maps an input onto itself, and hetero-association, which maps an input onto something else. Response mapping uses a neural network as a back end, or output stage, to create another layer of indirect control over an object's behavior. Typically, we might have a number of control variables, but clear answers only for certain specific combinations, on which we can train the network. Then, using the neural network as an output stage, we can obtain other answers that lie approximately in the same area as our clearly predefined ones.

The examples given may seem a little vague, and they are. The point is that neural networks are a tool that we can use as we please. The key is that using them makes the task of creating AI easier and allows game characters to behave more intelligently.

Neural Networks 101

In this section, we will review the basic terms and concepts used when discussing neural networks. This is not so simple, since neural networks are truly the product of several different disciplines, each of which brings its own specific vocabulary. Alas, the vocabulary of neural networks is the intersection of the vocabularies of all these disciplines, so we simply cannot cover everything. Additionally, neural network theory is full of redundant research, which means many people keep reinventing the wheel. This has produced a number of neural network architectures, each with its own name. I will try to stick to general terms and situations so as not to get bogged down in naming. In this article we will look at some networks that are different enough to deserve different names. As you read, don't worry too much if you can't immediately grasp all the concepts and terms; just read through them, and we will cover them again in the context of the article. Let's start...

Now that we've seen the "biological" version of a neuron, let's look at the basics of an artificial neuron to set the stage for our discussion. Fig. 2.0 is the standard graphical representation of a "neurode," or artificial neuron. As you can see, it has several inputs, labeled X1 through Xn, plus B. Each input Xi has an associated weight Wi, and B has the weight b. In addition, there is a summation node Ya and a single output y. The output y of the neurode is based on a transfer, or "activation," function, which is a function of the neurode's total input. The incoming data comes from the Xs and from B, which are connected to neighboring nodes; the idea is that B is a bias input, a kind of "memory" of the past. The basic operation of the neurode is this: each input Xi is multiplied by its associated weight and summed. The output of the summation is the input activation Ya. The activation Ya is then fed into the activation function fa(x), and the final output is y. The equation for all of this is:

Eq. 1.0

           n
Ya = B*b + Σ  Xi*wi
          i=1

y = fa(Ya)

The different forms of fa(x) will be covered in a minute.

Before we continue, we need to talk about the inputs Xi, the weights Wi, and their corresponding ranges. In most cases, the inputs can be any positive or negative numbers in (-∞, +∞). However, many neural networks use simple bivalent (two-valued) inputs, like true/false. The reason for using such a simple scheme is that ultimately all complex data is converted into a pure binary representation. In addition, many of the computer problems we want to solve, such as voice recognition, suit two-valued representations well. However, this is not set in stone. In either case, the values used in a bivalent system are primarily 0 and 1 in the binary system or -1 and 1 in the bipolar system. The two systems are similar, except that the bipolar representation turns out to be mathematically more convenient than the binary one. The weights Wi at each input usually lie in the range (-∞, +∞) and are called "excitatory" or "inhibitory" for positive and negative values, respectively. There is also the additional input B, which is always set to 1.0 and multiplied by b, where b is its weight.

Continuing our analysis: after computing the activation Ya for a neurode, it is fed to the activation function and the result can be calculated. There are a number of activation functions with different uses. The basic activation functions fa(x) are:

The equations for each are quite simple, but each fits its own model or has its own set of parameters.

The step function is used in a number of neural network models to impose a criticality on the input signal. The purpose of the factor θ is to model the critical level of the incoming signal to which the neuron must respond.

A linear activation function is used when we want the neurode's output to follow the input activation as closely as possible. Such a function can be used to model linear systems, such as motion at constant velocity. Finally, the exponential function is the key to advanced neural networks: it is the only one of these that lets us create networks with nonlinear responses and model nonlinear processes. The exponential activation function is a fork in the road for neural network development, since with step and linear functions alone we can never create a network that gives a nonlinear response. However, we are not required to use this particular function; hyperbolic, logarithmic, and transcendental functions can also be used, depending on the desired properties of the network. And we can mix all of these functions if we wish.

As you can guess, one neuron won't do much, so we need to create groups of neurons and layers of neurodes, as shown in Fig. 3.0, which illustrates a small single-layer neural network. The network in Fig. 3.0 contains a number of input and output nodes. By convention it is a single-layer network, because the input layer is not counted unless it is the only layer in the network. In this case, the input layer is also the output layer, so the network is single-layer. Fig. 4.0 shows a two-layer neural network. Note that the input layer is still not counted, the inner layer is called "hidden," and the output layer is also called the response layer. Theoretically there is no limit to the number of layers in a neural network, but describing the relationships between the layers and finding acceptable training methods can become very difficult. The best way to create a multilayer neural network is to make each subnetwork one or two layers and then connect the subnetworks as components or functional blocks.

Okay, now let's talk about the topic of time. We all know that our brain works quite slowly compared to a digital computer: the brain performs roughly one cycle on the scale of milliseconds, while a digital computer's cycle time is measured in nanoseconds and below. This means that the signal path from neuron to neuron takes time. This is modeled in artificial neurons in the sense that we perform the calculations layer by layer and produce the results sequentially, which mimics the time lag present in biological systems such as our brains.

We're almost done with the preliminaries, so let's talk about some high-level concepts and then finish with a couple more terms. The question you should ask is: what's the big deal about neural networks? This is a good question, and difficult to answer definitively. A better question is: what do we want neural networks to do? Basically, they are a technique that maps one space onto another. Neural networks are, in essence, a kind of memory, and as with any memory we can use some familiar terms to describe them. They have both STM (short-term memory) and LTM (long-term memory). STM is the ability of a network to remember something it just learned; LTM is the ability of a network to remember something it learned some time ago in light of the information it has just learned. This brings us to the concept of plasticity, in other words, how a neural network behaves when presented with new information or training. Can a network learn more information and still correctly "remember" what it learned before? If a network keeps absorbing information, it can become unstable, because eventually it will contain so much information that the stored data will endlessly intersect and overlap. This brings us to another requirement: stability. The bottom line is that we want a neural network with good LTM, good STM, plasticity, and stability. Of course, some neural networks are not memory analogues; they are aimed mainly at functional mapping, and these concepts do not apply to them, but you get the idea. Now that we know these memory-related concepts, let's wrap up the overview with a few mathematical factors that help evaluate and understand these properties.

One of the main applications of neural networks is to create a memory mechanism that can process incomplete or noisy input data and return a result. The result may be the input itself (autoassociation) or a response completely different from the input (heteroassociation). It is also possible to map an N-dimensional space onto an M-dimensional space and to perform nonlinear mappings. In other words, a neural network is a kind of hyperdimensional memory unit, because it can associate an N-element input with an M-element output, where M may or may not equal N.

What neural networks essentially do is partition N-dimensional space into regions that uniquely map inputs to outputs, or classify inputs into different classes. As the number of vectors in the input data set (let's call it S) grows, it follows that it becomes harder for the network to separate the information. And as a network fills up with information, the input values it must "remember" will start to overlap, since the input space cannot hold arbitrarily many well-separated items in a finite number of dimensions. This overlap means that some recollections will not be as strong as they could be. Although this is not a problem in some cases, it is a concern when modeling neural memories. To illustrate the concept, assume we are trying to associate a set of N input vectors with some set of outputs. The output set is not as big a problem for proper functioning as the input set S is.

If the input set S is strictly binary, then we are looking at sequences of the form 1101010 ... 10110. Let's say that our inputs only have 3 bits each, so the entire input space consists of vectors:

v0 = (0,0,0), v1 = (0,0,1), v2 = (0,1,0), v3 = (0,1,1), v4 = (1,0,0), v5 = (1,0,1), v6 = (1,1,0), v7 = (1,1,1)

For greater precision, the basis for this set of vectors is:

v = (1,0,0) * b2 + (0,1,0) * b1 + (0,0,1) * b0, where each bi can take the value 0 or 1.

For example, if we take b2=1, b1=0, and b0=1, we get the following vector:

v = (1,0,0)*1 + (0,1,0)*0 + (0,0,1)*1 = (1,0,0) + (0,0,0) + (0,0,1) = (1,0,1), which is v5 of our possible input vectors.

A basis is a set of vectors whose weighted sum can describe any vector in the space; that is how v above describes all the vectors in our space. In general, without going into a long explanation, the more orthogonal the vectors in the input set, the better they will propagate through the neural network and the better they can be recalled. Orthogonality refers to the independence of vectors; in other words, if two vectors are orthogonal, their dot product is zero, their projections onto each other are zero, and they cannot be described in terms of each other. There are many orthogonal vectors in our space, but they come in small groups. For example, v0 is orthogonal to all the vectors, so we can always include it. But if we include v1 in the set S, then only v2 and v4 remain orthogonal to it:

v0 = (0,0,0), v1 = (0,0,1), v2= (0,1,0), v4 = (1,0,0)

Why? Because the dot product vi · vj is zero for every pair of distinct vectors in this set, they are all mutually orthogonal. So this set is very good as an input set for a neural network. However, the set:

v6 = (1,1,0), v7 = (1,1,1)

is potentially bad, because the dot product v6 · v7 is non-zero (it equals 2). The next question is: can we measure this orthogonality? The answer is yes. In binary vector systems there is a measure called the Hamming distance. It is used to measure the N-dimensional distance between binary vectors, and it is simply the number of bit positions in which two vectors differ. For example, the vectors:

v0 = (0,0,0), v1 = (0,0,1)

have a Hamming distance of 1 between themselves, and

v2 = (0,1,0), v4 = (1,0,0)

have a Hamming distance of 2.

We can use the Hamming distance as a measure of orthogonality in binary vector systems, and it can help us determine whether our input sets overlap. Measuring orthogonality for general real-valued input vectors is more complicated, but the principle is the same. Enough concepts and terminology; let's move on and look at actual neural networks that do something, and maybe by the end of this article you will be able to use them to improve your game's AI. We'll look at neural networks used to compute logic functions, classify inputs, and associate inputs with outputs.

Pure logic

The first artificial neural networks were created in 1943 by McCulloch and Pitts. They consisted of a number of neurodes and were used mainly to compute simple logic functions such as AND, OR, XOR, and their combinations. Fig. 5.0 shows a basic McCulloch-Pitts neurode with two inputs. If you are an electrical engineer, you will immediately notice the close resemblance to a transistor. In any case, McCulloch-Pitts neurodes have no bias, and their simple activation function fmp(x) is:

fmp(x) = 1, if x ≥ θ
         0, if x < θ

The MP (McCulloch-Pitts) neurode operates by summing the products of the inputs Xi and the weights Wi and passing the result Ya to the function fmp(x). Early McCulloch-Pitts research focused on creating complex logic circuits from neurode models. In addition, one of the rules of neurode modeling is that signal transmission from neuron to neuron takes one time step, which keeps the model closer to natural neurons. Let's look at some examples of MP neural networks that implement basic logic functions. The logical AND function has the following truth table:

X1  X2  Output
0   0   0
0   1   0
1   0   0
1   1   1

We can model this with a two-input MP neural network with weights w1=1, w2=1, and θ=2. This network is shown in Fig. 6.0a. As you can see, all input combinations work correctly. For example, if we set the inputs X1=1, X2=0, then the activation will be:

X1*w1 + X2*w2 = (1)*(1) + (0)*(1) = 1.0

If we apply 1.0 to the activation function fmp(x), the result is 0 (since 1.0 < θ), which is the correct answer. As another example, if we set the inputs X1=1, X2=1, then the activation will be:

X1*w1 + X2*w2 = (1)*(1) + (1)*(1) = 2.0

If we feed 2.0 into the activation function fmp(x), the result is 1.0, which is correct. The other cases work similarly. The OR function is similar, but the threshold θ is 1.0 instead of 2.0 as in AND. You can run the data through the truth tables yourself to see the results.

The XOR network is a little different, as it actually has 2 layers: the results of preprocessing are further processed in the output neurode. This is a good example of why a neural network needs more than one layer to solve certain problems. XOR is a classic problem that is often used to test a network's capability. In any case, XOR is not linearly separable by a single layer; it must be broken into smaller steps whose results are then summed. The truth table for XOR looks like this:

X1  X2  Output
0   0   0
0   1   1
1   0   1
1   1   0

XOR is true only when the inputs are different, and this is a problem because both of those input patterns must map to the same output. XOR is linearly inseparable, as shown in Fig. 7.0: there is no way to separate the correct responses with a single straight line. The point is that we can separate them with two lines, and that is exactly what a second layer provides. The first layer preprocesses the data, solving part of the problem, and the remaining layer completes the calculation. Referring to Fig. 6.0, the weights are W1=1, W2=-1, W3=1, W4=-1, W5=1, W6=1. The network works as follows: the first layer determines whether X1 and X2 are opposites, and the results for the cases (0,1) and (1,0) feed layer two, which sums them and outputs true if either fired. In essence, we have created the Boolean function:

z = ((X1 AND NOT X2) OR (NOT X1 AND X2))

If you would like to experiment with basic McCulloch-Pitts neurodes, the following listing is a complete two-input neurode simulator.

// MCCULLOCH-PITTS SIMULATOR
// INCLUDES
/////////////////////////////////////////////////////

#include <stdio.h>
#include <ctype.h>

// MAIN
/////////////////////////////////////////////////////

int main(void)
{
    float threshold, // this is the theta term used to threshold the summation
          w1, w2,    // these hold the weights
          x1, x2,    // inputs to the neurode
          y_in,      // summed input activation
          y_out;     // final output of neurode

    printf("\nMcCulloch-Pitts Single Neurode Simulator.\n");
    printf("\nPlease Enter Threshold?");
    scanf("%f", &threshold);

    printf("\nEnter value for weight w1?");
    scanf("%f", &w1);

    printf("\nEnter value for weight w2?");
    scanf("%f", &w2);

    printf("\n\nBeginning Simulation:");

    // enter main event loop
    while (1)
    {
        printf("\n\nSimulation Parms: threshold=%f, W=(%f,%f)\n", threshold, w1, w2);

        // request inputs from user
        printf("\nEnter input for X1?");
        scanf("%f", &x1);

        printf("\nEnter input for X2?");
        scanf("%f", &x2);

        // compute activation
        y_in = x1*w1 + x2*w2;

        // input result to activation function (simple binary step)
        if (y_in >= threshold)
            y_out = 1.0f;
        else
            y_out = 0.0f;

        // print out result
        printf("\nNeurode Output is %f\n", y_out);

        // try again
        printf("\nDo you wish to continue Y or N?");
        char ans[8];
        scanf("%s", ans);

        if (toupper(ans[0]) != 'Y')
            break;
    } // end while

    printf("\n\nSimulation Complete.\n");
    return 0;
} // end main

This concludes our discussion of the basic building blocks of MP neural networks; let's move on to more complex networks, such as those used to classify input vectors.

Classification and recognition of “images”

Finally we are ready to look at real neural networks that have found some use! To set up the discussion of Hebb and Hopfield networks, let us analyze a general neural network structure that illustrates a number of concepts, such as linear separability, bipolar representations, and the analogy between neural networks and memories. First, take a look at Fig. 8.0, which shows the basic neural network model we are going to use. As you can see, it is a single-neurode network with three inputs, including the bias B, and one output. We want to see how to use this network to implement the logical AND operation that we implemented so easily with McCulloch-Pitts neurodes.

Let's start by using bipolar representations, so that all 0's are replaced by -1's. The truth table for logical AND with bipolar inputs and outputs is shown below:

X1  X2  Output
-1  -1  -1
-1   1  -1
 1  -1  -1
 1   1   1

And here is the activation function fc(x) that we will use:


fc(x) =  1, if x ≥ θ
        -1, if x < θ

Note that the function is a step with bipolar output. Before I continue, let me plant a seed in your brain: bias and threshold ultimately do the same thing; each gives us another degree of freedom in our neurodes, allowing responses that could not be achieved without it. You will see this shortly.

The single neurode in Fig. 8.0 performs a classification for us: it tells us whether an input belongs to a given class or not. For example, is this an image of a tree or not? Or, in our case (simple logical AND), does the input belong to the +1 class or the -1 class? This is the basis of most neural networks, and it is why I talked about linear separability. We must arrive at a linear division of the space relating our inputs and outputs, so that there is a solid partition separating them. Hence we need to come up with weight and bias values that do this for us. But how? Just trial and error, or is there a methodology? The answer is that there are a number of methods for training a neural network. They rest on various mathematical results and can be proven, but for now we will simply take values that work, without worrying about how they were obtained. These exercises will lead us to learning algorithms and to more complex networks than those given here.

Okay, we are trying to find the weights wi and bias b that give the correct result for the various inputs when run through the activation function fc(x). Let's write the summation activation for our neurode and see whether we can find a relationship between the weights and inputs that might help us. Given inputs X1 and X2 with weights w1 and w2, along with B=1 and bias b, the threshold condition is:

X1*w1 + X2*w2 + B*b = θ

Since B is always equal to 1.0, the formula simplifies to:

X1*w1 + X2*w2 + b = θ

X2 = -X1*w1/w2 + (θ - b)/w2 (solving for X2)

What is this? A line! And if the left-hand side (X1*w1 + X2*w2 + b) is greater than or equal to θ, the neurode responds with 1; otherwise it outputs -1. That is, the line is the decision boundary, as Fig. 9.0 illustrates. On the graph you can see that the slope of the line is -w1/w2 and the X2 intercept is (θ - b)/w2. Now it's clear why we can get rid of θ: it is part of a constant, and we can always scale b to compensate, so we will assume θ = 0, which gives the equation:

X2 = -X1*w1/w2 - b/w2

We want to find weights w1 and w2 and a bias b that separate our outputs, classifying them into distinct regions without overlap. This is the key to linear separability. Fig. 9.0 shows a number of decision boundaries that would suffice, so we can take any of them. Let's take the simplest:

w1 = w2 = 1, b = -1

With these values ​​the decision boundary becomes:

X2 = -X1*w1/w2 - b/w2 -> X2 = -1*X1 + 1

The slope is -1 and the X2 intercept is 1. If we plug the input vectors for logical AND into this equation and use the activation fc(x), we get the correct outputs: if X1 + X2 - 1 ≥ 0, the neurode's response is 1; otherwise it is -1. Let's try it with our AND inputs and see what happens:

As you can see, a neural network with appropriate weights and bias solves the problem perfectly. Moreover, there is a whole family of weights that will do just as well (sliding the decision boundary in the direction perpendicular to itself). However, there is an important point: without the bias, only lines through the origin would be possible, because the X2 intercept would be 0. This is the main reason for using a bias, and this example clearly demonstrates that fact. So, closer to the point: how do we find the required weight values? We now have a geometric analogy, and that is the beginning of an algorithm.

Hebbian learning

Now we are ready to see the first learning algorithm and its application in a neural network. One of the simplest learning algorithms was invented by Donald Hebb, and it is based on using the input vectors to adjust the weights so that the weights create the best possible linear separation of inputs and outputs. Unfortunately, the algorithm works only in limited cases: for orthogonal inputs it works great, but for non-orthogonal inputs it falls apart. Even though the algorithm does not arrive at correct weights for all inputs, it is the basis of most learning algorithms, so we will start with it.

Before you see the algorithm, remember that it applies to a single neurode in a single-layer neural network. You can, of course, place many neurodes in a layer, but they all work in parallel and can be trained in parallel. Multi-neurode layers simply use a weight matrix instead of a single weight vector. Anyway, the algorithm is simple; it looks something like this:

  • The input vectors are in bipolar form I = (-1,1,...,-1,1) and contain k elements each.
  • There are N input vectors; we refer to the jth one as Ij.
  • The desired outputs are called yj; there is one for each input Ij.
  • The weights w1-wk are contained in a single vector W = (w1, w2, ..., wk).

Step 1. Initialize all the weights to 0, and let them be contained in a vector w of k entries. Also initialize the bias b to 0.

Step 2. For j = 1 to N, do:

b = b + yj (where yj is the desired output)

w = w + Ij * yj (remember, this is a vector operation)

The algorithm is nothing more than an accumulator of sorts, shifting the decision boundary according to the changes in inputs and outputs. The only problem is that in some cases the boundary will not move fast enough (or at all), and the "learning" will not take place.

So how can we use Hebbian learning? We use it just as we did the previous neural network, except that now we have an algorithmic method of training, and such a network is called a Hebb net. As an example, let's take our trusty logical AND function and see whether the algorithm can find proper weight and bias values for it. The summation below is equivalent to running the algorithm:

w = I1*y1 + I2*y2 + I3*y3 + I4*y4 = [(-1,-1)*(-1)] + [(-1,1)*(-1)] + [(1,-1)*(-1)] + [(1,1)*(1)] = (2,2)

b = y1 + y2 + y3 + y4 = (-1) + (-1) + (-1) + (1) = -2

Thus w1=2, w2=2, and b=-2. These are simply scaled versions of the values w1=1, w2=1, b=-1 that we obtained in the previous section. With this simple algorithm we can train a neural network (of a single neurode) to respond to a set of inputs, classifying them as true/false, or 1/-1. And if we had an array of such neurodes, we could create a network that not only labels input as yes/no, but also associates inputs with specific patterns. This is one of the foundations of the next neural network structure: Hopfield networks.

Hopfield algorithms

John Hopfield is a physicist who likes to play with neural networks (which is good for us). He came up with a simple (at least comparatively) but effective neural network model called the Hopfield network. It is used for autoassociation: if you input a vector x, you get x back as the output (hopefully). The Hopfield network is shown in Fig. 10.0. It is a single-layer network in which the number of neurodes equals the number of inputs Xi. The network is fully connected, meaning every neurode is connected to every other neurode, and the inputs are also the outputs. This may seem strange, because it creates feedback. Feedback is one of the key features of the Hopfield network, and it is one of the basic principles behind obtaining the correct result.

The Hopfield network is an iterative autoassociative memory: it may take one to several cycles to produce the correct result. To be precise, a Hopfield network takes an input and then feeds it back, and the resulting output may or may not be the desired vector. This feedback loop may iterate several times before the stored vector is returned. So the functional sequence of a Hopfield network is: first determine the weights for the inputs we want to associate, then present an input vector and see what the activation function produces. If the result is the same as our original input, we are done; if not, we take the resulting vector and feed it back through the network. Now let's look at the weight matrix and the learning algorithm used in Hopfield networks.

The learning algorithm for the Hopfield network is based on Hebb's rule, simply summing outer products. However, since a Hopfield network has multiple neurodes, the weights are no longer a single vector but a collection of vectors packed compactly into a matrix. The weight matrix W for a Hopfield network is created from this equation:

  • The input vectors are in bipolar form I = (-1,1,...,-1,1) and contain k elements each.
  • There are N input vectors; we refer to the jth one as Ij.
  • The outputs are called yj; there is one for each input Ij.
  • The weight matrix W is square, of dimension k x k, since we have k inputs.

           N
W(kxk) =   Σ   Ii^t x Ii
          i=1

Note: each outer product has dimension k x k, since we multiply a column vector by a row vector. In addition, Wii = 0 for all i (the main diagonal is zeroed).

The activation function fh(x) is shown below:

fh(x) = 1, if x ≥ 0
        0, if x < 0

fh(x) is a step function with a binary result. This means every output is binary, but didn't we say the inputs are bipolar? Well, yes and no. When the weight matrix is generated, we convert all the input vectors to bipolar form, but during normal operation we use the binary versions of the inputs and outputs, because the Hopfield network state is binary. This conversion is not strictly necessary, but it makes discussing the network a little easier. Anyway, let's look at an example. Say we want to create a 4-node Hopfield network and we want it to recall these vectors:

I1=(0,0,1,0), I2=(1,0,0,0), I3=(0,1,0,1) Note: they are orthogonal

Converting to bipolar values, we get:

I1* = (-1,-1,1,-1), I2* = (1,-1,-1,-1), I3* = (-1,1,-1,1)

Now we need to calculate W1, W2, W3, where Wi is the outer product of each bipolar input with itself (its transpose times itself):

W1 = [I1*^t x I1*] = (-1,-1,1,-1)^t x (-1,-1,1,-1) =

 1  1 -1  1
 1  1 -1  1
-1 -1  1 -1
 1  1 -1  1

W2 = [I2*^t x I2*] = (1,-1,-1,-1)^t x (1,-1,-1,-1) =

 1 -1 -1 -1
-1  1  1  1
-1  1  1  1
-1  1  1  1

W3 = [I3*^t x I3*] = (-1,1,-1,1)^t x (-1,1,-1,1) =

 1 -1  1 -1
-1  1 -1  1
 1 -1  1 -1
-1  1 -1  1

Summing W1 + W2 + W3 and zeroing the main diagonal gives us the final weight matrix:

W =
 0 -1 -1 -1
-1  0 -1  3
-1 -1  0 -1
-1  3 -1  0

Wow, now let's dance. Let's feed in our original vectors and look at the results. To do this, we simply multiply each input by the matrix and process each component of the output with the function fh(x). Here are the results:

I1 x W = (-1,-1,0,-1) and fh((-1,-1,0,-1)) = (0,0,1,0)

I2 x W = (0,-1,-1,-1) and fh((0,-1,-1,-1)) = (1,0,0,0)

I3 x W = (-2,3,-2,3) and fh((-2,3,-2,3)) = (0,1,0,1)

The inputs were perfectly “remembered”, as it should be, because they are orthogonal. As a final example, let's assume that our input (hearing, vision, etc.) is a little noisy and contains one error. Let's take I3=(0,1,0,1) and add some noise, i.e. I3 noise = (0,1,1,1). Now let's see what happens if we introduce this “noisy” vector into the Hopfield network:

I3noise x W = (-3,2,-2,2) and fh((-3,2,-2,2)) = (0,1,0,1)

Surprisingly, the original vector is recalled. That's great. This gives us the ability to create a "memory" filled with bit patterns representing, say, trees (oak, weeping willow, spruce, etc.); if we then present another tree, such as a willow the network has never seen, the network will (hopefully) output what it "thinks" a willow looks like. This is one of the strengths of associative memories: we don't have to train the network on every possible input, just enough for it to form "associations"; "close" inputs are then typically mapped to the originally learned input. This is the basis of image and voice recognition. Don't ask me where I got the "tree" analogy. In any case, at the end of this article I have included an auto-associative Hopfield network simulator that lets you create networks of up to 16 neurodes.

Brain dead...

That's all we'll look at today. I was hoping to get to perceptron networks, but oh well. I hope you now understand at least a little of what neural networks are and how to create working programs that model them. We've covered basic terms and concepts, some basic math, and some of the most common network models. However, there is still a lot more to learn about neural networks: perceptrons, fuzzy associative memories (FAMs), bidirectional associative memories (BAMs), Kohonen maps, backpropagation networks, adaptive resonance theory (ART) networks, and much more. That's it; my neural network is calling me to play!

An example of a neural network program with source code in C++.

Neural networks have been described well and in detail above; now let's try to figure out how to program one and see how it works. One of the problems solved by neural networks is classification. The program below demonstrates a neural network that classifies color.

Computers use a three-component RGB color model, with one byte allocated to each component, so a full color is represented in 24 bits, giving 16,777,216 shades. A person can assign any of these shades to one of a few named colors. So the task is:

Given: InColor, an RGB color (24 bits).

Classify the color, i.e. assign it to one of the colors in the set M = (Black, Red, Green, Yellow, Blue, Purple, Light Blue, White).

Output: OutColor, a color from the set M.

Solution number 1 (lookup table)

Create an array of 16,777,216 elements, one per 24-bit color, and store the answer for each color in advance.

Solution number 2 (analytical)

Write a function like this:

#include <cstdint>

typedef uint32_t DWORD; // assumed: the usual Windows definition
typedef int8_t   int8;  // assumed: the original code's 8-bit integer type

int8 GetColor(DWORD Color)
{
    // scale each component to the range 0..100
    double Red   = (double)((Color >> 16) & 0xFF) / 255 * 100;
    double Green = (double)((Color >> 8)  & 0xFF) / 255 * 100;
    double Blue  = (double)(Color & 0xFF) / 255 * 100;

    // find the strongest component and set the threshold at 70% of it
    double Level = Red;
    if (Green > Level)
        Level = Green;
    if (Blue > Level)
        Level = Blue;
    Level = Level * 0.7;

    // set one bit per component that exceeds the threshold
    int8 OutColor = 0;
    if (Red > Level)
        OutColor |= 1;
    if (Green > Level)
        OutColor |= 2;
    if (Blue > Level)
        OutColor |= 4;
    return OutColor;
}

This works when the problem can be described by simple equations; but when the function is so complex that it defies such a description, neural networks come to the rescue.

Solution number 3. (neural network)

The simplest neural network: a single-layer perceptron.

Everything neural is enclosed in the CNeuroNet class

Each neuron has 3 inputs, which are fed the intensities of the color components (R, G, B) in the range 0-1. There are 8 neurons in total, matching the number of colors in the output set M. As a result of the network's operation, each neuron outputs a signal in the range 0-1, representing the probability that the input is that color. We choose the maximum and get the answer.

The neurons use a sigmoid activation function, ActiveSigm(). The function ActiveSigmPro(), the derivative of the sigmoid activation function, is used to train the neural network by backpropagation.

The first line displays the color intensities. Below it is a table of weight coefficients (four per neuron). The last column contains the neurons' output values. Change the color, select the correct answer from the list, and press the Teach button to call the training function. AutoTeach calls the training procedure automatically 1000 times: a random color is classified by the formula from solution number 2, and the training function is called with that answer.


Programming artificial neural networks: I write in C++ in the object-oriented paradigm.

For simple neural architectures (structures), methods, and tasks you can use almost any language (even BASIC), but for complex projects object-oriented languages such as C++ are the best fit. I use C++, and where necessary (in the most computation-heavy parts of the program) I rewrite individual functions that need faster execution in inline assembler.

Let us show the benefits of an object-oriented approach to programming neural networks. A neural network can have many variants of neurons and/or layers (note how many different types of layers and neurons a modern convolutional network can be assembled from). Common functionality of neurons or layers can be moved into an abstract ancestor class, and classes for specific types of neurons/layers can be derived (inherited) from it; these descendant classes describe and implement the features and behavior unique to a particular type of neuron or layer. This eliminates repeated duplication of common (identical) code in the program and, through polymorphism, makes it possible to write control code that is more flexible and independent of the specific types of neurons/layers.

As an example, consider the class nomenclature for describing the layers of a neural network in one of the programs I have developed. There are three main hierarchies: one for the classes describing the structure of a layer (an inheritance chain of three classes), a second for the nonlinear functions of neurons (a base class and ten descendants), and a third for the layers themselves (a chain of 5 base/intermediate classes, with 12 concrete classes branching off this chain at its various levels).

In the program text, 31 classes are used to describe the behavior of network layers, but only 12 of them implement real layers. The remaining classes are:

  • either abstract, defining common behavior, making it easier to derive new descendants (new layer types) later and increasing how invariant the control logic (which implements the network's functioning) is to the particular layers making up the network;
  • or, conversely, subordinate, included in a layer as its "components" (an object instance describing the layer's structure and characteristics, and an object instance implementing the neurons' nonlinear function).

The operation of the created neural network is programmed through calls to the methods and properties of the abstract classes, regardless of which concrete descendant class implements a given layer of the network.

That is, the control logic here is separated from specific content and tied only to the general, invariant foundations. Only the code of the neural network "constructor" needs to know about the concrete classes, and it runs only when a new network is created from the settings specified in the interface, or when a previously saved network is loaded from a file. Adding new types of neuron layers to the program therefore does not require reworking the algorithms (logic) of the network's operation and training; it requires only a small addition to the code that creates the neural network (or reads it from a file).

There are no abstractions (classes) for individual neurons, only for layers. If a single neuron needs to be placed on some layer (namely, the output layer), then when creating an instance of the desired layer class you simply pass a neuron count equal to one.

Thus, object-oriented design and programming provide greater flexibility in applying the divide-and-conquer principle than structured programming does, through:

  • moving common code fragments into ancestor classes (the inheritance principle);
  • hiding information where necessary (the encapsulation principle);
  • building universal external control logic that is independent of the concrete class implementations (the polymorphism principle);
  • achieving cross-platform operation by moving hardware- or platform-dependent parts into descendant classes (the number of which can vary for each specific software and hardware configuration, while the base ancestor classes and all the high-level logic remain the same).

For today's tasks of developing flexible and powerful neural modeling tools, all of this turns out to be very useful.

Also see the post about projects of special ANN description languages.


Analyzing a neural network in C#

In this article, I propose to analyze how neural networks operate and to create one of the simplest variants of a neural network, one that learns with the help of a teacher.

A few words need to be said about the structure of this "great" and "terrible" neural network. For a long time, people paced back and forth and pondered the question (no, not "what is the meaning of life?"):
How can you recognize patterns?

There were a huge number of answers. There are various heuristics, comparisons based on patterns, and much, much more. One answer was a neural network. [By the way, a neural network can not only recognize images]

So, the structure of a neural network. Imagine this: a spider weaves a web, and the web catches a fly. The place where the fly lands is the neuron that turned out to be "maximally" close to the target. A neural network consists of neurons that "describe" the chances of a particular event. The description of the "probability" of each event (each neuron) can be stored, for example, in a separate file.

Now let's move on to the main topic of conversation this evening.

How does a neural network work?

How does it learn and recognize?

An example of the structure of a neural network is clearly visible in this picture:

The input receives a set of input signals X, which are multiplied by a set of weights W (Xi * Wi). Each neuron computes the sum of these products and sends the resulting number to its output.
After the values of all neurons have been computed, the largest value is found. This highest value is considered the correct answer to the question, and the program outputs the image described by the winning neuron.
In training mode, the user can correct the result (based on his own experience), after which the program recalculates the neuron weights.
The recalculation formula is approximately the following: W[i] = W[i] + Speed*Delta*X[i], where
W[i] is the weight of the i-th element,
Speed is the learning rate,
Delta is the sign (-1 or +1),
X[i] is the value of the i-th input signal (in many cases 0 or 1).

Why is delta used?

Let's look at this case.

The input to the program is a picture with the number 6.

The neural network recognized the number 8. The user corrects the number to 6. What happens next in the program?

The program recalculates the weights of the two neurons describing the digits 6 and 8: for the neuron describing 6, delta will equal +1, and for the one describing 8, delta will equal -1.

How is the speed parameter set?

The smaller this parameter, the longer but more accurate (higher-quality) the training will be; the larger it is, the faster but more "superficial" the training will be.

The Speed parameter can be set either manually by the user or fixed in the program code (for example, as a const).

As you can see, the initial weights must also be determined. What determines them in the first place? In fact, everything is simple here too: they are chosen completely at random, which avoids any initial "bias" of the neural network. Usually the interval of random values is small: -0.4…0.4 or -0.3…0.2, etc.

Now let's move on to the most interesting part: how to code this!

Let's create two classes: a neuron class and a network class (Neuron and Net, respectively).

Let us describe the main tasks of the Neuron class:

— Reaction to input signal

— Summation

— Adjustment

(You could additionally add reading from a file, creating initial values, and saving; we leave this to the readers' "conscience".)

Variables inside the Neuron class:

— LastY — the neuron's last computed ("identification") output value;

— symbol — the image (symbol) that this neuron describes.

My acquaintance with neural networks happened when the Prisma application was released. It processes any photo using neural networks and reproduces it from scratch using the selected style. Having become interested in this, I rushed to look for articles and “tutorials,” primarily on Habré. And to my great surprise, I did not find a single article that clearly and step-by-step described the algorithm for the operation of neural networks. The information was scattered and missing key points. Also, most authors rush to show code in one programming language or another without resorting to detailed explanations.

My first and most important discovery was the playlist of the American programmer Jeff Heaton, in which he explains in detail and clearly how neural networks work and how they are classified. After watching this playlist, I decided to create my own neural network, starting with the very simplest example. You probably know that when you start learning a new language, your first program is Hello World. It's a kind of tradition. The world of machine learning also has its own Hello World, and that is a neural network that solves the exclusive or (XOR) problem. The XOR table looks like this:

0 xor 0 = 0
0 xor 1 = 1
1 xor 0 = 1
1 xor 1 = 0

Accordingly, the neural network takes two numbers as input and must output another number - the answer. Now about the neural networks themselves.

What is a neural network?

A neural network is a sequence of neurons connected by synapses. The structure of a neural network came to the world of programming straight from biology. Thanks to this structure, a machine gains the ability to analyze and even memorize various information. Neural networks are capable not only of analyzing incoming information, but also of reproducing it from their memory. (For those interested, be sure to watch these two TED Talks: Video 1, Video 2.) In other words, a neural network is a machine interpretation of the human brain, which contains millions of neurons transmitting information in the form of electrical impulses.

What types of neural networks are there?

For now we will consider examples using the most basic type of neural network: the feed-forward network (hereinafter FFN). In subsequent articles I will introduce more concepts and tell you about recurrent neural networks. As the name implies, a feed-forward network connects its layers sequentially, and information always flows through it in only one direction.

What are neural networks for?

Neural networks are used to solve complex problems that require analytical calculations similar to what the human brain does. The most common applications of neural networks are:

Classification: distributing data by parameters. For example, you are given a set of people as input and must decide which of them to grant a loan and which not. A neural network can do this job by analyzing information such as age, solvency, credit history, etc.

Prediction - the ability to predict the next step. For example, the rise or fall of shares based on the situation in the stock market.

Recognition: currently the most widespread application of neural networks. It is used in Google when you search by image, or in phone cameras when they detect the position of your face and highlight it, and much more.

Now, to understand how neural networks work, let's look at their components and parameters.

What is a neuron?

A neuron is a computational unit that receives information, performs simple calculations on it, and passes it on. Neurons are divided into three main types: input (blue), hidden (red), and output (green). There are also a bias neuron and a context neuron, which we will discuss in the next article. When a neural network consists of a large number of neurons, the term layer is introduced. Accordingly, there is an input layer that receives information, n hidden layers (usually no more than 3) that process it, and an output layer that outputs the result. Each neuron has 2 main parameters: input data and output data. For an input neuron, input = output. For the rest, the input field accumulates the total information from all neurons of the previous layer, after which it is normalized with the activation function (for now just picture it as f(x)) and placed in the output field.

It is important to remember that neurons operate with numbers in the range [0,1] or [-1,1]. But how, you ask, do you then handle numbers that fall outside this range? At this stage, the simplest answer is to divide 1 by the number. This process is called normalization, and it is used very often in neural networks. More on this a little later.

What is a synapse?

A synapse is a connection between two neurons. Synapses have 1 parameter - weight. Thanks to it, input information changes as it is transmitted from one neuron to another. Let's say there are 3 neurons that transmit information to the next one. Then we have 3 weights corresponding to each of these neurons. For the neuron whose weight is greater, that information will be dominant in the next neuron (for example, mixing colors). In fact, the set of weights of a neural network or the weight matrix is ​​a kind of brain of the entire system. It is thanks to these weights that the input information is processed and turned into a result.

It is important to remember that during the initialization of the neural network, the weights are set in random order.

How does a neural network work?

This example shows part of a neural network, where the letters I denote input neurons, the letter H a hidden neuron, and the letter w the weights. The formula shows that a neuron's input is the sum of all inputs multiplied by their corresponding weights. Let us feed in 1 and 0, with w1=0.4 and w2=0.7. The input of neuron H1 is then 1*0.4 + 0*0.7 = 0.4. Now that we have the input, we can get the output by plugging the input into the activation function (more on that later). Having the output, we pass it on, and we repeat this for all layers until we reach the output neuron. Running such a network for the first time, we will see that the answer is far from correct, because the network is not trained. To improve the results we will train it. But before we learn how to do that, let's introduce a few terms and properties of a neural network.

Activation function

An activation function is a way of normalizing input data (we talked about this earlier). That is, if you have a large number at the input, passing it through the activation function, you will get an output in the range you need. There are quite a lot of activation functions, so we will consider the most basic ones: Linear, Sigmoid (Logistic) and Hyperbolic tangent. Their main differences are the range of values.

Linear function

This function is almost never used, except when you need to test a neural network or pass a value without conversion.

Sigmoid

This is the most common activation function; its range of values is (0,1). Most of the examples on the web use it, and it is also sometimes called the logistic function. Accordingly, if your case involves negative values (for example, stocks can go not only up but also down), you will need a function that also captures negative values.

Hyperbolic tangent

It only makes sense to use hyperbolic tangent when your values ​​can be both negative and positive, since the range of the function is [-1,1]. It is not advisable to use this function only with positive values ​​as this will significantly worsen the results of your neural network.

Training set

A training set is a sequence of data on which the neural network operates. In our case of exclusive or (XOR), we have only 4 different outcomes, so we will have 4 training sets: 0xor0=0, 0xor1=1, 1xor0=1, 1xor1=0.

Iteration

This is a kind of counter that increases every time the neural network goes through one training set. In other words, this is the total number of training sets completed by the neural network.

Epoch

When the neural network is initialized, this value is set to 0 and has a manually set ceiling. The larger the epoch count, the better trained the network and, accordingly, the better its result. The epoch increases each time we go through the entire collection of training sets; in our case, 4 sets, i.e. 4 iterations.

It is important not to confuse iteration with epoch and to understand the order in which they increase: first the iteration counter increments, and only then the epoch, not the other way around. In other words, you cannot first fully train the neural network on one set only, then on another, and so on. You must train on each set once per epoch. This way you avoid errors in the calculations.

Error

Error is a percentage value reflecting the discrepancy between the expected and the received answer. The error is computed every epoch and must decrease. If it does not, you are doing something wrong. The error can be calculated in different ways, but we will consider only three main methods: Mean Squared Error (hereinafter MSE), Root MSE, and Arctan. There is no usage restriction as there is with activation functions, so you are free to choose whichever method gives you the best results. You just have to keep in mind that each method measures error differently. With Arctan the error will almost always be larger, since it works on the principle: the greater the difference, the greater the error. Root MSE will have the smallest error, which is why MSE, which maintains a balance in error calculation, is used most often.

MSE = ( Σ (ideal - actual)² ) / n

Root MSE = sqrt( ( Σ (ideal - actual)² ) / n )

Arctan = ( Σ arctan²(ideal - actual) ) / n

The principle of calculating the error is the same in all cases: for each set we take the difference between the ideal answer and the obtained result, then either square it or take the squared arctangent of it, and finally divide the accumulated sum by the number of sets.

Task

Now, to test yourself, calculate the output of a given neural network using sigmoid and its error using MSE.

Data:

I1=1, I2=0, w1=0.45, w2=0.78, w3=-0.12, w4=0.13, w5=1.5, w6=-2.3.

Solution

H1input = 1*0.45+0*-0.12=0.45

H1output = sigmoid(0.45)=0.61

H2input = 1*0.78+0*0.13=0.78

H2output = sigmoid(0.78)=0.69

O1input = 0.61*1.5+0.69*-2.3=-0.672

O1output = sigmoid(-0.672)=0.33

O1ideal = 1 (0xor1=1)

Error = ((1-0.33)^2)/1=0.45

Result - 0.33, error - 45%.

Thank you very much for your attention! I hope that this article was able to help you in studying neural networks. In the next article, I will talk about bias neurons and how to train a neural network using backpropagation and gradient descent.

Resources used:

Internet