Intro. to CNN

https://www.geeksforgeeks.org/introduction-convolution-neural-network/

In a regular Neural Network there are three types of layers: input layers, hidden layers, and output layers.

A CNN consists of multiple layers, such as the input layer, convolutional layers, pooling layers, and fully connected layers.

Conv layer: applies filters to extract features

Pooling layer: downsamples to reduce computation

Fully connected layer: makes the final prediction

The network learns the optimal filters through backpropagation and gradient descent.
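
A minimal sketch of this Conv -> Pool -> Fully connected stack using the Keras Sequential API; the layer sizes and the 28x28 grayscale input are illustrative assumptions, not from the article:

  from tensorflow import keras
  from tensorflow.keras import layers

  model = keras.Sequential([
      layers.Conv2D(32, (3, 3), activation="relu", input_shape=(28, 28, 1)),  # filters extract features
      layers.MaxPooling2D((2, 2)),             # downsample to reduce computation
      layers.Flatten(),
      layers.Dense(10, activation="softmax"),  # fully connected: final prediction
  ])
  model.summary()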

Advantages of Convolutional Neural Networks (CNNs):

Disadvantages of Convolutional Neural Networks (CNNs):

Backpropagation

https://www.geeksforgeeks.org/backpropagation-in-data-mining/

Backpropagation is an algorithm that propagates the errors from the output nodes back to the input nodes.

Backpropagation is a widely used algorithm for training feedforward neural networks.

The backpropagation algorithm works by computing the gradient of the loss function with respect to each weight via the chain rule, computing the gradient layer by layer, and iterating backward from the last layer to avoid redundant computation of intermediate terms in the chain rule.
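
A minimal NumPy sketch of this layer-by-layer gradient computation for a two-layer network; the shapes, squared-error loss, and learning rate are illustrative assumptions:

  import numpy as np

  rng = np.random.default_rng(0)
  x = rng.normal(size=(4, 3))    # 4 samples, 3 features
  y = rng.normal(size=(4, 1))    # targets
  W1 = rng.normal(size=(3, 5))   # weights, layer 1
  W2 = rng.normal(size=(5, 1))   # weights, layer 2

  for step in range(100):
      # Forward pass
      h = np.tanh(x @ W1)                  # hidden activations
      y_hat = h @ W2                       # network output
      loss = ((y_hat - y) ** 2).mean()     # mean squared error

      # Backward pass: chain rule, starting from the last layer
      d_yhat = 2 * (y_hat - y) / len(y)    # dL/dy_hat
      dW2 = h.T @ d_yhat                   # dL/dW2
      d_h = d_yhat @ W2.T                  # error propagated to the hidden layer
      dW1 = x.T @ (d_h * (1 - h ** 2))     # tanh'(z) = 1 - tanh(z)^2

      # Gradient descent update
      W1 -= 0.1 * dW1
      W2 -= 0.1 * dW2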

Features of Backpropagation:

Working of Backpropagation:

Backpropagation uses supervised learning: the network's output is compared with the known target output to compute the error.

Need for Backpropagation:

Backpropagation is "backpropagation of errors" and is very useful for training neural networks. It's fast, easy to implement, and simple. Backpropagation does not require any parameters to be set, except the number of inputs. Backpropagation is a flexible method because no prior knowledge of the network is required.

Types of Backpropagation

There are two types of backpropagation networks: static backpropagation and recurrent backpropagation.

Advantages:

Disadvantages:

Activation Functions

https://www.geeksforgeeks.org/activation-functions-neural-networks/

In the process of building a neural network, one of the choices you get to make is what Activation Function to use in the hidden layer as well as at the output layer of the network.

Elements of a Neural Network

What is an activation function and why use them?

The activation function decides whether a neuron should be activated or not by calculating the weighted sum and further adding bias to it. The purpose of the activation function is to introduce non-linearity into the output of a neuron.

Why do we need Non-linear activation function?

A neural network without an activation function is essentially just a linear regression model. The activation function applies a non-linear transformation to the input, making the network capable of learning and performing more complex tasks.

Variants of Activation Function

The basic rule of thumb is: if you really don't know which activation function to use, simply use ReLU, as it is a general-purpose activation function for hidden layers and is used in most cases these days.

If your output is for binary classification, the sigmoid function is a natural choice for the output layer.

If your output is for multi-class classification, softmax is useful for predicting the probability of each class.
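
A NumPy sketch of these three rules of thumb (the input values are illustrative):

  import numpy as np

  def relu(z):                  # default choice for hidden layers
      return np.maximum(0.0, z)

  def sigmoid(z):               # binary classification output
      return 1.0 / (1.0 + np.exp(-z))

  def softmax(z):               # multi-class output: probabilities summing to 1
      e = np.exp(z - z.max())   # subtract max for numerical stability
      return e / e.sum()

  z = np.array([1.0, -2.0, 0.5])
  print(relu(z), sigmoid(z), softmax(z))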

Intro. to CNN Pooling Layer

https://www.geeksforgeeks.org/cnn-introduction-to-pooling-layer/

The pooling operation involves sliding a two-dimensional filter over each channel of the feature map and summarising the features lying within the region covered by the filter.

Why to use Pooling Layers?

Pooling layers are used to reduce the dimensions of the feature maps. Thus, it reduces the number of parameters to learn and the amount of computation performed in the network.

The pooling layer summarises the features present in a region of the feature map generated by a convolution layer. So, further operations are performed on summarised features instead of precisely positioned features generated by the convolution layer. This makes the model more robust to variations in the position of the features in the input image.

Max Pooling

Avg Pooling

Global Pooling
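
A NumPy sketch contrasting these three pooling variants on a single 4x4 feature map (the sizes and 2x2 window with stride 2 are illustrative assumptions):

  import numpy as np

  fmap = np.arange(16, dtype=float).reshape(4, 4)   # one 4x4 feature map

  # Split into non-overlapping 2x2 windows (stride 2).
  windows = fmap.reshape(2, 2, 2, 2).transpose(0, 2, 1, 3).reshape(2, 2, 4)

  max_pooled = windows.max(axis=-1)     # max pooling: strongest response per window
  avg_pooled = windows.mean(axis=-1)    # average pooling: mean response per window
  global_avg = fmap.mean()              # global average pooling: one value per map

  print(max_pooled, avg_pooled, global_avg)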

Advantages of Pooling Layer:

Disadvantages of Pooling Layer:

NN Beginners Guide

https://www.geeksforgeeks.org/neural-networks-a-beginners-guide/

NNs learn to perform tasks by being exposed to various datasets and examples without any task-specific rules. A NN generates identifying characteristics from the data it has been passed, without a pre-programmed understanding of these datasets. NNs are based on computational models for threshold logic, which is a combination of algorithms and mathematics. The components of a typical neural network are neurons, connections (known as synapses), weights, biases, a propagation function, and a learning rule.

There are basically three sequences of events in the learning process.

Types of Neural Networks

For example, a neural network will work with three vectors: a vector of attributes X, a vector of classes Y, and a vector of weights W.

The neural network does not work with any matrices where X's number of rows and columns do not match Y and W's number of rows.

Generalization in Neural Networks

https://www.kdnuggets.com/2019/11/generalization-neural-networks.html

Key words: generalisation, overfitting, balance, dropout, ensemble, regularisation

When training a neural network in deep learning, its performance on processing new data is key. Improving the model's ability to generalize relies on preventing overfitting using these important methods.

Generalisation means how good our model is at learning from the given data and applying the learnt information elsewhere.

Example:

Train the network on 10 breeds of dogs, telling it they are dogs.

Then provide 2 new breeds of dogs it has not seen before.

If the network can tell those 2 new breeds are also dogs, the network is well generalised.

Overfitting prevents generalisation (the model gains no general understanding of the concept).

Need to balance performance with generalization. More parameters may make the network perform better but it also might overfit.

We need to match the neural network's complexity with our data complexity.

Usually deep learning practitioners go about this by first training a neural network with a sufficiently high number of parameters that the model will overfit. So initially, we try to get a model that fits extremely well on the training data. Next we try to reduce the number of parameters iteratively until the model stops overfitting; this can be considered an optimal neural network. Another technique that we can use to prevent overfitting is dropout neurons.

Dropout Neurons

At a particular instant, there will be certain neurons that will not be connected to any other neuron in the network. Each neuron in the network has a certain probability of completely dropping out from the network.

At every instant during training, a different set of neurons will be dropped out in a random fashion. Hence we can say that at each instant we are effectively training a certain subset neural network that has fewer neurons than the original neural network. This subset neural network will change each and every time because of the random nature of the dropout neurons. (Some sets of neurons are trained on some examples or features but other sets are not trained on those specific examples)

Training multiple neural networks and aggregating their learnings is called "ensembling" and usually boosts performance. Using dropout essentially does this while having only one neural network.
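
A Keras sketch of dropout as a layer; the 0.5 drop rate and layer sizes are assumptions (dropout is applied during training only):

  from tensorflow import keras
  from tensorflow.keras import layers

  model = keras.Sequential([
      layers.Dense(128, activation="relu", input_shape=(20,)),
      layers.Dropout(0.5),   # each unit dropped with probability 0.5 at each training step
      layers.Dense(1, activation="sigmoid"),
  ])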

Weight Regularization

While training neural networks, there is a possibility that the value of certain weights can become very large. This happens because these weights are focusing on certain features in the training data, which is causing them to increase in value continuously throughout the training process. Because of this, the network overfits on the training data.

One of the other reasons why huge weights are bad for a neural network is the increased input-output variance. When there is a huge weight in the network, it is very susceptible to small changes in the input, but the neural network should essentially output the same thing for similar inputs. (As an analogy: between two floating-point numbers used for a comparison, the one with more precision is more sensitive to small changes in the thing it compares.)

By adding L1 or L2 regularization, the network will limit the growth of its weights, since the magnitude of the weights is part of the loss function and the network always tries to minimize the loss.
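
A Keras sketch of L2 weight regularization; the 0.01 coefficient and layer size are assumptions:

  from tensorflow.keras import layers, regularizers

  # The penalty 0.01 * sum(w^2) is added to the loss, so minimizing the
  # loss also limits the growth of this layer's weights.
  dense = layers.Dense(64, activation="relu",
                       kernel_regularizer=regularizers.l2(0.01))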

In deep learning, we usually use a mix of these methods to improve the performance of our neural networks and to improve the generalization of our models.

Understanding the Design of a Convolutional Neural Network

https://machinelearningmastery.com/understanding-the-design-of-a-convolutional-neural-network/

The most important hyperparameter of a convolutional layer is the size of the filter. Usually it is square, and you can consider it as a window or receptive field looking at the input image. Therefore, the higher the resolution of the image, the larger the filter you can expect to use.

On the other hand, a filter too large will blur the detailed features because all pixels from the receptive field through the filter will be combined into one pixel at the output feature map. Therefore, there is a trade-off for the appropriate size of the filter.

Stacking two convolutional layers (without any other layers in between) is equivalent to a single convolutional layer with a larger filter. But a typical design nowadays is two layers with small filters stacked together rather than one layer with a larger filter, as there are fewer parameters to train.
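
A quick parameter count makes the point; the channel width of 64 is an illustrative assumption, and bias terms are ignored:

  C = 64                           # channels in and out (assumed)
  two_3x3 = 2 * (3 * 3 * C * C)    # 73,728 weights, 5x5 receptive field
  one_5x5 = 5 * 5 * C * C          # 102,400 weights, same receptive field
  print(two_3x3, one_5x5)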

The exception would be a convolutional layer with a 1x1 filter. This is usually found as the beginning layer of a network. The purpose of such a convolutional layer is to combine the input channels into one rather than transforming the pixels. Conceptually, this can convert a color image into grayscale, but usually, you can use multiple ways of conversion to create more input channels than merely RGB for the network.

Also, note that in the above network, you are using Conv2D for a 2D filter. There is also a Conv3D layer for a 3D filter. The difference is whether you apply the filter separately for each channel or feature map or consider the input feature maps stacked up as a 3D array and apply a single filter to transform it altogether. Usually, the former is used as it is more reasonable to consider no particular order in which the feature maps should be stacked.

Neural Architecture Search: Everything You Need to Know

https://deci.ai/neural-architecture-search/

Neural Architecture Search is a subfield of automated machine learning (AutoML), so let's first understand what that is. Also referred to as automated ML, AutoML is an overarching term that relates to the process of automating the different tasks involved in applying machine learning to real-world problems.

A subfield of automated ML, NAS is a technique that can help discover the best neural networks for a given problem. It automates the designing of DNNs, ensuring higher performance and lower losses than manually designed architectures. It is also much faster than the traditional manual processes.

The general idea behind NAS is to select the optimal architecture from a space of allowable architectures. The selection algorithm relies on a search strategy, which in turn depends on an objective evaluation scheme. The popular convolutional neural network EfficientNet is an example of an architecture generated by NAS.

Neural Architecture Search has three main building blocks that can be categorized in terms of search space, search strategy/algorithm, and evaluation strategy (Elsken et al., 2019). Each of these components can utilize different methods.

Combining Search and Evaluation: One-Shot Approach

It's expensive to run search and evaluation independently for a large population of child models. To address this, another group of NAS methods, under the umbrella of one-shot architecture search, include a search space of sub-architectures belonging to a single super-architecture with trained weights that are shared among all sub-models (Xie et al., 2019). One-shot methods differ according to how the one-shot model is trained.

Efficient Neural Architecture Search (ENAS) by Pham et al. (2018) is a prominent example of a single-shot algorithm that achieves a 1000X speedup of the search relative to previous techniques. One of the prominent results in this venue is the "once for all" technique (Cai et al., 2019).

Here are some examples of models generated with NAS and their uses:

Transfer Learning and NAS

Transfer learning is another AutoML approach that reuses a pre-trained model that was developed for one task as a starting point for working on a new problem. The intuition behind it is that neural architectures trained on a large enough dataset can serve as generic models that generalize for similar types of problems. Transfer learning is popular in deep learning as the learned feature maps can be used to train DNNs with a small amount of data.

In contrast, the underlying idea of NAS is that every dataset and its accompanying hardware and production environment, has a specific and unique architecture that performs best with it. Unlike transfer learning, there's flexibility and customization in NAS that requires data scientists and developers to learn and train weights for the new architecture.

The Benefits: Why Use NAS?

NAS allows you to:

Selected Resources for Getting Started with NAS

Using NAS involves a lot of moving parts. Here is a list of selected work from AutoML.org

NAS Limitations

AutoML

https://www.automl.org/automl/

What is AutoML?

Traditional machine learning relies on human machine learning experts to perform the following tasks:

AutoML to Advance and Improve Research

Hyperparameter optimization and algorithm configuration provide methods to automate the tedious, time-consuming and error-prone process of tuning hyperparameters to new tasks at hand. For example, we provide packages for hyperparameter optimization:

Architecture Search and Automated Deep Learning

Packages for architecture search and hyperparameter optimization for deep learning include:

Genetic Algorithm from MathWorks

https://www.mathworks.com/help/gads/genetic-algorithm.html

Genetic algorithm solves smooth or nonsmooth optimization problems with any types of constraints, including integer constraints. It is a stochastic, population-based algorithm that searches randomly by mutation and crossover among population members.

GA Terminology

https://www.mathworks.com/help/gads/some-genetic-algorithm-terminology.html

Fitness Scaling

https://www.mathworks.com/help/gads/fitness-scaling.html

Scaling the Fitness Scores

Scaled values = expectation values.

Fitness scaling converts the raw fitness scores that are returned by the fitness function to values in a range that is suitable for the selection function. The selection function uses the scaled fitness values to select the parents of the next generation. The selection function assigns a higher probability of selection to individuals with higher scaled values.

The range of the scaled values affects the performance of the genetic algorithm. If the scaled values vary too widely, the individuals with the highest scaled values reproduce too rapidly, taking over the population gene pool too quickly, and preventing the genetic algorithm from searching other areas of the solution space. On the other hand, if the scaled values vary only a little, all individuals have approximately the same chance of reproduction and the search will progress very slowly.
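
A rough Python sketch of rank scaling, one of the options named below; the 1/sqrt(rank) form follows the MathWorks description of the 'Rank' option, and the final normalization is an assumption for illustration:

  import numpy as np

  raw = np.array([12.0, 3.0, 7.0, 5.0])   # raw scores; lower is better (minimization)
  ranks = np.empty(len(raw), dtype=int)
  ranks[raw.argsort()] = np.arange(1, len(raw) + 1)   # best score gets rank 1
  scaled = 1.0 / np.sqrt(ranks)            # expectation values, rank-based
  scaled *= len(raw) / scaled.sum()        # normalization (an assumption here)
  print(scaled)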

Scaling options: Rank, Top, etc.(?)

Refer to the webpage to see more.

GA

https://www.mathworks.com/help/gads/what-is-the-genetic-algorithm.html

What Is the Genetic Algorithm?

The genetic algorithm is a method for solving both constrained and unconstrained optimization problems that is based on natural selection, the process that drives biological evolution. The genetic algorithm repeatedly modifies a population of individual solutions. At each step, the genetic algorithm selects individuals from the current population to be parents and uses them to produce the children for the next generation. Over successive generations, the population "evolves" toward an optimal solution. You can apply the genetic algorithm to solve a variety of optimization problems that are not well suited for standard optimization algorithms, including problems in which the objective function is discontinuous, nondifferentiable, stochastic, or highly nonlinear. The genetic algorithm can address problems of mixed integer programming, where some components are restricted to be integer-valued.

The genetic algorithm differs from a classical, derivative-based, optimization algorithm in two main ways, as summarized in the following:

Classical Algorithm: generates a single point at each iteration, and the sequence of points approaches an optimal solution; it selects the next point in the sequence by a deterministic computation.

Genetic Algorithm: generates a population of points at each iteration, and the best point in the population approaches an optimal solution; it selects the next population by computation that uses random number generators.

How the Genetic Algorithm Works

https://www.mathworks.com/help/gads/how-the-genetic-algorithm-works.html

Outline of the Algorithm

... Produces children from the parents. Children are produced either by making random changes to a single parent (mutation) or by combining the vector entries of a pair of parents (crossover). ...

Initial Population

If you know approximately where the minimal point for a function lies, you should set InitialPopulationRange so that the point lies near the middle of that range.

Creating the Next Generation

The algorithm selects a group of individuals in the current population, called parents, who contribute their genes (the entries of their vectors) to their children. The algorithm usually selects individuals that have better fitness values as parents. You can specify the function that the algorithm uses to select the parents in the SelectionFcn option.

The genetic algorithm creates three types of children for the next generation: elite children, crossover children, and mutation children.

Crossover Children

At each coordinate of the child vector, the default crossover function randomly selects an entry, or gene, at the same coordinate from one of the two parents and assigns it to the child. For problems with linear constraints, the default crossover function creates the child as a random weighted average of the parents.

Mutation Children

By default, for unconstrained problems the algorithm adds a random vector from a Gaussian distribution to the parent. For bounded or linearly constrained problems, the child remains feasible.
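
A rough NumPy sketch of these two defaults, scattered crossover and Gaussian mutation; the parent vectors and the mutation scale of 0.1 are assumptions:

  import numpy as np

  rng = np.random.default_rng(1)
  p1 = np.array([1.0, 2.0, 3.0, 4.0])
  p2 = np.array([5.0, 6.0, 7.0, 8.0])

  # Scattered crossover: each gene comes from a randomly chosen parent.
  mask = rng.random(p1.shape) < 0.5
  crossover_child = np.where(mask, p1, p2)

  # Gaussian mutation: add zero-mean random noise to a single parent.
  mutation_child = p1 + rng.normal(scale=0.1, size=p1.shape)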

As the number of generations increases, the individuals in the population get closer together and approach the minimum point [0 0].

What Is Global Optimization?

https://www.mathworks.com/help/gads/what-is-global-optimization.html

Local vs. Global Optima

A local minimum of a function is a point where the function value is smaller than or equal to the value at nearby points, but possibly greater than at a distant point.

A global minimum is a point where the function value is smaller than or equal to the value at all other feasible points.

Generally, Optimization Toolbox solvers find a local optimum. (This local optimum can be a global optimum.) They find the optimum in the basin of attraction of the starting point.

In contrast, Global Optimization Toolbox solvers are designed to search through more than one basin of attraction. They search in various ways:

Basins of Attraction

Springer: The basin of attraction of an attracting set is the set of all the initial conditions in the phase space whose trajectories go to that attracting set.

Scholarpedia: For each such attractor, its basin of attraction is the set of initial conditions leading to long-time behavior that approaches that attractor.

GA from GeeksforGeeks

https://www.geeksforgeeks.org/genetic-algorithms

Genetic algorithms are based on the ideas of natural selection and genetics. They are an intelligent exploitation of random search, provided with historical data to direct the search into the region of better performance in the solution space.

Each generation consists of a population of individuals, and each individual represents a point in the search space and a possible solution. Each individual is represented as a string of characters/integers/floats/bits. This string is analogous to a chromosome.

The population of individuals is maintained within the search space. Each individual represents a solution in the search space for the given problem. Each individual is coded as a finite-length vector (analogous to a chromosome) of components. These variable components are analogous to genes. Thus a chromosome (individual) is composed of several genes (variable components).

Fitness Score

The population size is static, so room has to be created for new arrivals. So some individuals die and get replaced by new arrivals, eventually creating a new generation when all the mating opportunities of the old population are exhausted. It is hoped that over successive generations better solutions will arrive while the least fit die.

Once the offspring produced are not significantly different from the offspring produced by previous populations, the population has converged. The algorithm is said to have converged to a set of solutions for the problem.

Operators of Genetic Algorithms

The whole algorithm can be summarized as:

Example problem and solution using Genetic Algorithms

const string TARGET = "I love GeeksforGeeks";

Generation: 1 String: tO{"-?=jH[k8=B4]Oe@} Fitness: 18

Generation: 2 String: tO{"-?=jH[k8=B4]Oe@} Fitness: 18

Generation: 3 String: .#lRWf9k_Ifslw #O$k_ Fitness: 17

Generation: 4 String: .-1Rq?9mHqk3Wo]3rek_ Fitness: 16

Generation: 5 String: .-1Rq?9mHqk3Wo]3rek_ Fitness: 16

Generation: 6 String: A#ldW) #lIkslw cVek) Fitness: 14

Generation: 7 String: A#ldW) #lIkslw cVek) Fitness: 14

Generation: 8 String: (, o x _x%Rs=, 6Peek Fitness: 13

...

Generation: 29 String: I lope Geeks#o,Geeks Fitness: 3

Generation: 30 String: I loMe GeeksfoBGeeks Fitness: 2

Generation: 31 String: I love Geeksfo0Geeks Fitness: 1

Generation: 32 String: I love Geeksfo0Geeks Fitness: 1

Generation: 33 String: I love Geeksfo0Geeks Fitness: 1

Generation: 34 String: I love GeeksforGeeks Fitness: 0

As we can see from the output, our algorithm sometimes gets stuck at a local optimum; this can be improved by updating the fitness score calculation or by tweaking the mutation and crossover operators.
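
The fitness in this demo counts the characters that differ from the target, which is consistent with the generations shown above (fitness 0 is a perfect match). A minimal Python rendering of that fitness function; the article's implementation is in C++:

  TARGET = "I love GeeksforGeeks"

  def fitness(candidate: str) -> int:
      # Number of characters differing from the target; 0 is a perfect match.
      return sum(c != t for c, t in zip(candidate, TARGET))

  print(fitness("I love Geeksfo0Geeks"))   # -> 1, as in Generation 31 above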

Encoding Methods in Genetic Algorithm

https://www.geeksforgeeks.org/encoding-methods-in-genetic-algorithm

Chromosome: All living organisms consist of cells. In each cell, there is the same set of Chromosomes. Chromosomes are strings of DNA and consist of genes, blocks of DNA. Each gene encodes a trait, for example, the color of the eye.

Operation of Genetic Algorithms:

Two important elements required for any problem before a genetic algorithm can be used for a solution are:

Encoding using string:

Encoding of chromosomes is the first step in solving the problem, and it depends heavily on the problem. Encoding is the process of representing the solution in the form of a string of bits that conveys the necessary information. Just as in a chromosome each gene controls a particular characteristic of the individual, each bit in the string represents a characteristic of the solution.

Encoding Methods:

Introduction to Genetic Algorithms

https://courses.cs.washington.edu/courses/cse473/06sp/GeneticAlgDemo

Genetic algorithms are a part of evolutionary computing, which is a rapidly growing area of artificial intelligence.

History

Evolutionary computing was introduced in the 1960s by I. Rechenberg in his work "Evolution strategies" (Evolutionsstrategie in original). His idea was then developed by other researchers. Genetic Algorithms (GAs) were invented by John Holland and developed by him and his students and colleagues. This led to Holland's book "Adaptation in Natural and Artificial Systems", published in 1975.

In 1992 John Koza used genetic algorithms to evolve programs to perform certain tasks. He called his method "genetic programming" (GP). LISP programs were used because programs in this language can be expressed in the form of a "parse tree", which is the object the GA works on.

Chromosome

Chromosomes are strings of DNA and serve as a model for the whole organism.

Each gene encodes a particular protein. Basically, it can be said that each gene encodes a trait, for example color of eyes. Possible settings for a trait (e.g. blue, brown) are called alleles (uh-lee-l). Each gene has its own position in the chromosome. This position is called locus (loh-kus, a particular position, point, or place).

The complete set of genetic material (all chromosomes) is called the genome. A particular set of genes in the genome is called the genotype. The genotype, with later development after birth, forms the base for the organism's phenotype: its physical and mental characteristics, such as eye color, intelligence, etc.

Reproduction

During reproduction, recombination (or crossover) first occurs. Genes from parents combine to form a whole new chromosome. The newly created offspring can then be mutated.

Search Space

If we are solving a problem, we are usually looking for some solution which will be the best among others. The space of all feasible solutions (the set of solutions among which the desired solution resides) is called search space (also state space). Each point in the search space represents one possible solution. Each possible solution can be "marked" by its value (or fitness) for the problem. With GA we look for the best solution among a number of possible solutions - represented by one point in the search space.

Looking for a solution is then equal to looking for some extreme value (minimum or maximum) in the search space. At times the search space may be well defined, but usually we know only a few points in it. When using a GA, the process of finding solutions generates other points (possible solutions) as evolution proceeds.

The problem is that the search can be very complicated. One may not know where to look for a solution or where to start. There are many methods one can use for finding a suitable solution, but these methods do not necessarily provide the best solution. Some of these methods are hill climbing, tabu search, simulated annealing and the genetic algorithm. The solutions found by these methods are often considered as good solutions, because it is not often possible to prove what the optimum is.

NP-hard Problems

One example of a class of problems which cannot be solved in the "traditional" way is NP problems.

There are many tasks for which we may apply fast (polynomial) algorithms. There are also some problems that cannot be solved algorithmically.

There are many important problems in which it is very difficult to find a solution, but once we have it, it is easy to check. This fact led to the class of NP-complete problems. NP stands for nondeterministic polynomial, and it means that it is possible to "guess" the solution (by some nondeterministic algorithm) and then check it.

If we had a guessing machine, we might be able to find a solution in some reasonable time.

The study of NP-complete problems is, for simplicity, restricted to problems where the answer can be yes or no. Because there are tasks with more complicated outputs, a class of problems called NP-hard problems has been introduced. This class is not as limited as the class of NP-complete problems.

A characteristic of NP problems is that a simple algorithm, perhaps obvious at first sight, can be used to find usable solutions. But this approach generally provides many possible solutions, and just trying all of them is a very slow process (e.g. O(2^n)). For even slightly bigger instances of this type of problem the approach is not usable at all.

Today nobody knows if some faster algorithm exists to provide exact answers to NP-problems. The discovery of such algorithms remains a big task for researchers (maybe you! :-)). Today many people think that such algorithms do not exist and so they are looking for an alternative method. An example of an alternative method is the genetic algorithm.

Examples of NP problems are the satisfiability problem, the travelling salesman problem, and the knapsack problem.

Genetic Algorithm

Outline of the Basic Genetic Algorithm

  1. [Start] Generate random population of n chromosomes (suitable solutions for the problem)
  2. [Fitness] Evaluate the fitness f(x) of each chromosome x in the population
  3. [New population] Create a new population by repeating following steps until the new population is complete
    1. [Selection] Select two parent chromosomes from a population according to their fitness (the better fitness, the bigger chance to be selected)
    2. [Crossover] With a crossover probability cross over the parents to form new offspring (children). If no crossover was performed, offspring is the exact copy of parents.
    3. [Mutation] With a mutation probability mutate new offspring at each locus (position in chromosome).
    4. [Accepting] Place new offspring in the new population
  4. [Replace] Use new generated population for a further run of the algorithm
  5. [Test] If the end condition is satisfied, stop, and return the best solution in current population
  6. [Loop] Go to step 2
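
A compact Python rendering of the outline above, with the steps marked in comments. The real-valued chromosomes, tournament selection, averaging crossover, and the toy objective f(x) = -(x-3)^2 are all illustrative assumptions:

  import random

  def fitness(x):                                # step 2: evaluate f(x)
      return -(x - 3.0) ** 2                     # maximum at x = 3

  pop = [random.uniform(-10, 10) for _ in range(20)]   # step 1: random population

  for generation in range(100):                  # step 6: loop
      def select():                              # step 3.1: better fitness -> bigger chance
          return max(random.sample(pop, 3), key=fitness)

      new_pop = []
      while len(new_pop) < len(pop):             # step 3: build new population
          a, b = select(), select()
          child = (a + b) / 2 if random.random() < 0.8 else a   # step 3.2: crossover
          if random.random() < 0.1:              # step 3.3: mutation
              child += random.gauss(0, 0.5)
          new_pop.append(child)                  # step 3.4: accepting
      pop = new_pop                              # step 4: replace

  best = max(pop, key=fitness)                   # step 5: best solution found
  print(best, fitness(best))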

Some Comments

As you can see, the outline of the Basic GA is very general. There are many parameters and settings that can be implemented differently in various problems.

The first question to ask is how to create chromosomes and what type of encoding to choose. We then address crossover and mutation, the two basic operators of GA. Encoding, crossover and mutation are introduced in the next chapter.

The next question is how to select parents for crossover. This can be done in many ways, but the main idea is to select the better parents (best survivors) in the hope that the better parents will produce better offspring.

You may think that generating populations from only two parents may cause you to lose the best chromosome from the last population. This is true, and so elitism is often used. This means that at least one of a generation's best solutions is copied without changes to the new population, so the best solution can survive to the succeeding generation.

You might be wondering why genetic algorithms work. It can be partially explained by the Schema Theorem (Holland); however, this theorem has been criticized in recent times. If you want to know more, check other resources.

Parameters of GA

There are two basic parameters of GA - crossover probability and mutation probability.

...

Mutation generally prevents the GA from falling into local extremes. Mutation should not occur very often, because then GA will in fact change to random search.

There are also some other parameters of GA. One particularly important parameter is population size.

Population size: how many chromosomes are in the population (in one generation). If there are too few chromosomes, the GA has few possibilities to perform crossover and only a small part of the search space is explored. On the other hand, if there are too many chromosomes, the GA slows down. Research shows that beyond some limit (which depends mainly on the encoding and the problem) it is not useful to use very large populations, because they do not solve the problem faster than moderately sized populations.

Selection

There are many methods for selecting the best chromosomes. Examples are roulette wheel selection, Boltzmann selection, tournament selection, rank selection, steady-state selection and some others.

Roulette Wheel Selection

A pie proportional to fitness values.

Rank Selection

Problem: if the best chromosome fitness is 90% of the sum of all fitnesses then the other chromosomes will have very few chances to be selected.

Rank selection ranks the population first, and then every chromosome receives a fitness value determined by this ranking. The worst will have fitness 1, the second worst 2, etc., and the best will have fitness N (the number of chromosomes in the population).

A pie proportional to ranks. E.g., in roulette wheel selection the pie is proportional to fitness values (e.g. 5:10:35:60:...); in rank selection the fitness values are sorted and assigned ranks from 1 to N, so the pie is proportional to ranks (1:2:3:4:...).
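
A small sketch contrasting the two selection pies; the fitness values are illustrative:

  import numpy as np

  fitness = np.array([5.0, 10.0, 35.0, 90.0])

  roulette_p = fitness / fitness.sum()     # best gets 90/140 ~ 64% of the pie

  ranks = fitness.argsort().argsort() + 1  # worst -> 1, best -> N
  rank_p = ranks / ranks.sum()             # best gets 4/10 = 40% of the pie

  print(roulette_p, rank_p)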

Steady-State Selection

This is not a particular method of selecting parents. The main idea of this type of selection is that a big part of the chromosomes can survive to the next generation.

The steady-state selection GA works in the following way. In every generation a few good chromosomes (with higher fitness) are selected for creating new offspring. Then some bad chromosomes (with lower fitness) are removed and the new offspring are placed in their place. The rest of the population survives to the new generation.

Elitism

Encoding

Binary Encoding

Example of Encoding: Each bit says whether the corresponding item is in the knapsack.

Permutation Encoding

Example of Encoding: The chromosome describes the order of cities in which the salesman will visit them.

Value Encoding

Example of Encoding: Real values in chromosomes represent weights in the neural network.

Tree Encoding

Example of Encoding: Chromosomes are program functions represented as trees.

Crossover and Mutation

Binary Encoding

Crossover

Mutation

Permutation Encoding

Crossover

Mutation

Value Encoding

Crossover

Mutation

Tree Encoding

Crossover

Mutation

Travelling Salesman Problem

There are cities and given distances between them. The travelling salesman has to visit all of them, but he does not want to travel very much. The task is to find a sequence of cities that minimizes the travelled distance; in other words, find a minimal Hamiltonian tour in a complete graph of N nodes.
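
A sketch of two standard permutation operators for this problem, order crossover and swap mutation; these are common textbook choices, not necessarily the applet's exact operators:

  import random

  def order_crossover(p1, p2):
      # Copy a random slice from parent 1, fill the rest in parent 2's order.
      a, b = sorted(random.sample(range(len(p1)), 2))
      child = [None] * len(p1)
      child[a:b] = p1[a:b]
      rest = [city for city in p2 if city not in child]
      for i in range(len(child)):
          if child[i] is None:
              child[i] = rest.pop(0)
      return child

  def swap_mutation(tour):
      # Exchange two cities; the tour stays a valid permutation.
      i, j = random.sample(range(len(tour)), 2)
      tour[i], tour[j] = tour[j], tour[i]
      return tour

  print(order_crossover([0, 1, 2, 3, 4], [4, 3, 2, 1, 0]))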

You can select crossover and mutation type. The description of their meaning follows:

...

Mutation

Recommendations: Parameters of GA

Recommendations are often the results of empirical studies of GAs, which were frequently performed on binary encoding only.

Applications of GA

The advantage of GAs is their parallelism. A GA travels through the search space using multiple individuals (and with genotype rather than phenotype), so it is less likely to get stuck in a local extremum than other methods.

They are also easy to implement. Once you have the basic GA algorithm implemented, you only have to write a new chromosome (just one object) to solve another problem. With the same encoding you just change the fitness function and you are done. However, for some problems, choosing and implementing the encoding and fitness function can be difficult.

The disadvantage of GAs is their computational time. GAs can be slower than other methods. But since we can terminate the computation at any time, the longer run is acceptable (especially with faster and faster computers).