# Recent Tutorials

26th March 2014 at 21:16

## Introduction to Artificial Neural Networks Part 2 - Learning

Welcome to part 2 of my introduction to artificial neural networks series. If you haven't yet read part 1, you should probably go back and read that first!

## Introduction

In part 1 we were introduced to what artificial neural networks are and we learnt the basics of how they can be used to solve problems. In this tutorial we will begin to find out how artificial neural networks can learn, why learning is so useful and what the different types of learning are. We will specifically be looking at training single-layer perceptrons with the perceptron learning rule.

Before we begin, we should probably first define what we mean by the word learning in the context of this tutorial. It is still unclear whether machines will ever be able to learn in the sense of having some kind of metacognition about what they are learning, as humans do. However, they can learn how to perform tasks better with experience. So here we define learning simply as being able to perform better at a given task, or a range of tasks, with experience.

## Learning in Artificial Neural Networks

One of the most impressive features of artificial neural networks is their ability to learn. You may recall from the previous tutorial that artificial neural networks are inspired by the biological nervous system, in particular, the human brain. One of the most interesting characteristics of the human brain is its ability to learn. Our understanding of how exactly the brain does this is still very primitive, although we do have a basic grasp of the process. It is believed that during the learning process the brain's neural structure is altered, increasing or decreasing the strength of its synaptic connections depending on their activity. This is why more relevant information is easier to recall than information that hasn't been recalled for a long time. More relevant information will have stronger synaptic connections, while less relevant information will gradually have its synaptic connections weaken, making it harder to recall.

Although simplified, artificial neural networks can model this learning process by adjusting the weighted connections found between neurons in the network. This effectively emulates the strengthening and weakening of the synaptic connections found in our brains. This strengthening and weakening of the connections is what enables the network to learn.

Learning algorithms are extremely useful when it comes to problems that either can't practically be written by a programmer or can be solved more efficiently by a learning algorithm. Facial recognition is an example of a problem that is extremely hard for a human to accurately convert into code. A problem that could be solved better by a learning algorithm would be a loan granting application, which could use past loan data to classify future loan applications. Although a human could write rules to do this, a learning algorithm can better pick up on subtleties in the data which may be hard to code for.

## Learning Types

There are many different algorithms that can be used when training artificial neural networks, each with their own separate advantages and disadvantages. The learning process within artificial neural networks is a result of altering the network's weights with some kind of learning algorithm. The objective is to find a set of weight matrices which, when applied to the network, should - hopefully - map any input to a correct output. In this tutorial the learning type we will be focusing on is supervised learning. But before we begin, let's take a quick look at the three major learning paradigms.

• Supervised Learning
The learning algorithm would fall under this category if the desired output for the network is also provided with the input while training the network. By providing the neural network with both an input and output pair it is possible to calculate an error based on its target output and actual output. It can then use that error to make corrections to the network by updating its weights.
• Unsupervised Learning
In this paradigm the neural network is only given a set of inputs and it's the neural network's responsibility to find some kind of pattern within the inputs provided without any external aid. This type of learning paradigm is often used in data mining and is also used by many recommendation algorithms due to their ability to predict a user's preferences based on the preferences of other similar users it has grouped together.
• Reinforcement Learning
Reinforcement learning is similar to supervised learning in that some feedback is given, however instead of providing a target output a reward is given based on how well the system performed. The aim of reinforcement learning is to maximize the reward the system receives through trial-and-error. This paradigm relates strongly with how learning works in nature, for example an animal might remember the actions it's previously taken which helped it to find food (the reward).

## Implementing Supervised Learning

As mentioned earlier, supervised learning is a technique that uses a set of input-output pairs to train the network. The idea is to provide the network with examples of inputs and outputs, then let it find a function that can correctly map the data we provide to a correct output. If the network has been trained with a good range of training data, then once it has finished learning we should even be able to give it a new, unseen input and it should be able to map it correctly to an output.

There are many different supervised learning algorithms we could use, but the most popular, and the one we will be looking at in more detail, is backpropagation. Before we look at why backpropagation is needed to train multi-layered networks, let's first look at how we can train single-layer networks, or as they're otherwise known, perceptrons.

## The Perceptron Learning Rule

The perceptron learning rule works by finding out what went wrong in the network and making slight corrections to hopefully prevent the same errors from happening again. Here's how it works... First we take the network's actual output and compare it to the target output in our training set. If the network's actual output and target output don't match, we know something went wrong and we can update the weights based on the amount of error. Let's run through the algorithm step by step to understand exactly how it works.

First, we need to calculate the perceptron's output for each output node. As you should remember from the previous tutorial we can do this by:

output = f( input1 × weight1 + input2 × weight2 + ... )
- or -
$o = f(\sum\limits_{i=1}^n x_iw_i)$

Now we have the actual output we can compare it to the target output to find the error:

error = target output - output
- or -
E = t - o

Now we want to use the perceptron's error to adjust the weights.

weight change = learning rate × error × input
- or -
Δwi = r E xi

We want to ensure only small changes are made to the weights on each iteration, so to do this we apply a small learning rate (r). If the learning rate is too high the perceptron can jump too far and miss the solution, if it's too low, it can take an unreasonably long time to train.

This gives us a final weight update equation of:

weight change = learning rate × (target output - actual output) × input
- or -
Δwi = r ( t - o ) xi
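Putting these equations together, a single application of the rule can be sketched in a few lines of Java. This is just a minimal sketch: the class and method names are illustrative, and the starting weights of 0.3 are an assumed example value.

```java
public class PerceptronStep {
    // Applies the perceptron learning rule to one input/target pair:
    // w_i = w_i + r * (t - o) * x_i
    public static double[] updateWeights(double[] weights, int[] inputs,
                                         int target, int output, double learningRate) {
        double[] updated = weights.clone();
        int error = target - output; // E = t - o
        for (int i = 0; i < weights.length; i++) {
            updated[i] += learningRate * error * inputs[i];
        }
        return updated;
    }

    public static void main(String[] args) {
        // Assumed starting weights of 0.3, inputs {1, 1}, target 1, actual output 0
        double[] newWeights = updateWeights(new double[]{0.3, 0.3},
                new int[]{1, 1}, 1, 0, 0.1);
        System.out.println(newWeights[0] + ", " + newWeights[1]);
    }
}
```

With an error of 1 and a learning rate of 0.1, each weight connected to an active input moves up by 0.1, nudging the perceptron towards the target without overshooting.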

Here's an example of how this would work when repeatedly presenting the AND input pattern x1 = 1, x2 = 1 (target output 1), with a threshold of 1 and starting weights of w1 = 0.3, w2 = 0.3...

Iteration 1:
Learning rate = 0.1
Weighted sum = 0.3 + 0.3 = 0.6, below the threshold
Expected output = 1
Actual output = 0
Error = 1

Weight Update:
wi = wi + r E xi
w1 = 0.3 + (0.1 × 1 × 1)
w2 = 0.3 + (0.1 × 1 × 1)

New Weights:
w1 = 0.4
w2 = 0.4

Iteration 2:
Learning rate = 0.1
Weighted sum = 0.4 + 0.4 = 0.8, still below the threshold
Expected output = 1
Actual output = 0
Error = 1

Weight Update:
wi = wi + r E xi
w1 = 0.4 + (0.1 × 1 × 1)
w2 = 0.4 + (0.1 × 1 × 1)

New Weights:
w1 = 0.5
w2 = 0.5

Iteration 3:
Learning rate = 0.1
Weighted sum = 0.5 + 0.5 = 1.0, which reaches the threshold
Expected output = 1
Actual output = 1
Error = 0

No error,
training complete.

## Implementing The Perceptron Learning Rule

To help fully understand what's happening let's implement a basic example in Java.

First, we initialise our network's threshold, learning rate and weights. We could initialise the weights with small random starting values, however for simplicity here we'll just set them to 0.

double threshold = 1;
double learningRate = 0.1;
double[] weights = {0.0, 0.0};

Next, we need to create our training data to train our perceptron. In this example our perceptron will be learning the AND function.

// AND function Training data
int[][][] trainingData = {
{{0, 0}, {0}},
{{0, 1}, {0}},
{{1, 0}, {0}},
{{1, 1}, {1}},
};

Now, we need to create a loop that we can break from later if our network completes a cycle of the training data without any errors. Then, we need a second loop that will iterate over each input in the training data.

// Start training loop
while(true){
int errorCount = 0;
// Loop over training data
for(int i=0; i < trainingData.length; i++){
System.out.println("Starting weights: " + Arrays.toString(weights));

From here we can calculate the weighted sum of the inputs and get the output.

// Calculate weighted sum of inputs
double weightedSum = 0;
for(int ii=0; ii < trainingData[i][0].length; ii++) {
weightedSum += trainingData[i][0][ii] * weights[ii];
}

// Calculate output
int output = 0;
if(threshold <= weightedSum){
output = 1;
}

System.out.println("Target output: " + trainingData[i][1][0] + ", "
+ "Actual Output: " + output);

The next step is to calculate the error and adjust the weights...

// Calculate error
int error = trainingData[i][1][0] - output;

// Increase error count for incorrect output
if(error != 0){
errorCount++;
}

// Update weights
for(int ii=0; ii < trainingData[i][0].length; ii++) {
weights[ii] += learningRate * error * trainingData[i][0][ii];
}

System.out.println("New weights: " + Arrays.toString(weights));
System.out.println();

Finally, we stop if a complete pass over the training data produced no errors, then close the remaining loops.

// If there are no errors, stop
if(errorCount == 0){
System.out.println("Final weights: " + Arrays.toString(weights));
System.exit(0);
}
}

And if we put it all together...

package perceptron;
import java.util.Arrays;

public class PerceptronLearningRule {
public static void main(String args[]){
double threshold = 1;
double learningRate = 0.1;
// Init weights
double[] weights = {0.0, 0.0};

// AND function Training data
int[][][] trainingData = {
{{0, 0}, {0}},
{{0, 1}, {0}},
{{1, 0}, {0}},
{{1, 1}, {1}},
};

// Start training loop
while(true){
int errorCount = 0;
// Loop over training data
for(int i=0; i < trainingData.length; i++){
System.out.println("Starting weights: " + Arrays.toString(weights));
// Calculate weighted input
double weightedSum = 0;
for(int ii=0; ii < trainingData[i][0].length; ii++) {
weightedSum += trainingData[i][0][ii] * weights[ii];
}

// Calculate output
int output = 0;
if(threshold <= weightedSum){
output = 1;
}

System.out.println("Target output: " + trainingData[i][1][0] + ", "
+ "Actual Output: " + output);

// Calculate error
int error = trainingData[i][1][0] - output;

// Increase error count for incorrect output
if(error != 0){
errorCount++;
}

// Update weights
for(int ii=0; ii < trainingData[i][0].length; ii++) {
weights[ii] += learningRate * error * trainingData[i][0][ii];
}

System.out.println("New weights: " + Arrays.toString(weights));
System.out.println();
}

// If there are no errors, stop
if(errorCount == 0){
System.out.println("Final weights: " + Arrays.toString(weights));
System.exit(0);
}
}
}
}

## Bias units

In our last example we set our threshold to 1, which means our weighted input needs to equal or exceed 1 to give us an output of 1. This is okay when learning the AND function, because we only need an output when both inputs are set, allowing (with the correct weights) for the threshold to be reached or exceeded. In the case of the NOR function however, the network should only output 1 if both inputs are off. This means with a threshold of 1 there isn't a combination of weights that will ever make the following true,
x1 = 0
x2 = 0
1 <= x1w1 + x2w2

There's a simple fix for this though: a bias unit. A bias unit is simply a neuron with a constant output, typically of 1. Bias units are weighted just like other units in the network; the only difference is that they will always output 1 regardless of the input from the previous layer, which is where they get their name! So why are they important? Bias inputs effectively allow the neuron to learn a threshold value. Consider our previous equation; with a bias input (x0) added we can change it to,
x0 = 1
x1 = 0
x2 = 0
1 <= x0w0 + x1w1 + x2w2

Now we can satisfy that equation. You can try this yourself by updating your perceptron training set to train for the NOR function. Just add a bias input to the training data and also an additional weight for the new bias input. Here is the updated code:

// Init weights
double[] weights = {0.0, 0.0, 0.0};

// NOR function training data
int[][][] trainingData = {
{{1, 0, 0}, {1}},
{{1, 0, 1}, {0}},
{{1, 1, 0}, {0}},
{{1, 1, 1}, {0}},
};
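To see why the bias input makes NOR solvable, consider one hand-picked set of weights that satisfies the inequality above: w0 = 1, w1 = -1, w2 = -1. These particular values are an illustrative assumption (training won't necessarily converge to exactly these), but with a threshold of 1 they classify all four NOR cases correctly:

```java
public class NorCheck {
    // Step-activation perceptron with a constant bias input x0 = 1
    public static int output(double[] weights, int x1, int x2, double threshold) {
        double weightedSum = weights[0] * 1 + weights[1] * x1 + weights[2] * x2;
        return weightedSum >= threshold ? 1 : 0;
    }

    public static void main(String[] args) {
        // Hand-picked weights {w0, w1, w2} that satisfy NOR with a threshold of 1
        double[] weights = {1.0, -1.0, -1.0};
        int[][] inputs = {{0, 0}, {0, 1}, {1, 0}, {1, 1}};
        for (int[] in : inputs) {
            System.out.println(in[0] + " NOR " + in[1] + " = "
                    + output(weights, in[0], in[1], 1.0));
        }
    }
}
```

When both inputs are off, only the bias contributes, the weighted sum is exactly 1 and the neuron fires; any active input pulls the sum below the threshold and the output drops to 0.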

## To be continued...

Hopefully you now have a clearer understanding of the types of learning we can apply to neural networks and the process by which simple, single-layer perceptrons can be trained. In the next tutorial we will learn how to implement the backpropagation algorithm and why it's needed when working with multi-layer networks.

5th December 2013 at 7:42

## Introduction to Artificial Neural Networks - Part 1

This is the first part of a three-part introductory tutorial on artificial neural networks. In this first tutorial we will discover what neural networks are, why they're useful for solving certain types of tasks and finally how they work.

## Introduction

Computers are great at solving algorithmic and math problems, but often the world can't easily be defined with a mathematical algorithm. Facial recognition and language processing are a couple of examples of problems that can't easily be quantified into an algorithm, however these tasks are trivial to humans. The key to Artificial Neural Networks is that their design enables them to process information in a similar way to our own biological brains, by drawing inspiration from how our own nervous system functions. This makes them useful tools for solving problems like facial recognition, which our biological brains can do easily.

## How do they work?

First, let's take a look at what a biological neuron looks like.

Our brains use extremely large interconnected networks of neurons to process information and model the world we live in. Electrical inputs are passed through this network of neurons which result in an output being produced. In the case of a biological brain this could result in contracting a muscle or signaling your sweat glands to produce sweat. A neuron collects inputs using structures called dendrites; the neuron effectively sums all of these inputs from the dendrites and if the resulting value is greater than its firing threshold, the neuron fires. When the neuron fires it sends an electrical impulse through the neuron's axon to its boutons. These boutons can then be networked to thousands of other neurons via connections called synapses. There are about one hundred billion (100,000,000,000) neurons inside the human brain, each with about one thousand synaptic connections. It's effectively the way in which these synapses are wired that gives our brains the ability to process information the way they do.

## Modeling Artificial Neurons

Artificial neuron models are, at their core, simplified models based on biological neurons. This allows them to capture the essence of how a biological neuron functions. We usually refer to these artificial neurons as 'perceptrons'. So now let's take a look at what a perceptron looks like.

As shown in the diagram above, a typical perceptron will have many inputs and these inputs are all individually weighted. The perceptron weights can either amplify or deamplify the original input signal. For example, if the input is 1 and the input's weight is 0.2 the input will be decreased to 0.2. These weighted signals are then added together and passed into the activation function. The activation function is used to convert the input into a more useful output. There are many different types of activation function but one of the simplest is the step function. A step function will typically output a 1 if the input is higher than a certain threshold, otherwise its output will be 0.

Here's an example of how this might work:
Input 1 (x1)  = 0.6
Input 2 (x2)  = 1.0

Weight 1 (w1) = 0.5
Weight 2 (w2) = 0.8

Threshold = 1.0

First we multiply the inputs by their weights and sum them:
x1w1 + x2w2 = (0.6 x 0.5) + (1 x 0.8) = 1.1

Now we compare our input total to the perceptron's activation threshold. In this example the total input (1.1) is higher than the activation threshold (1.0) so the neuron would fire.
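The worked example above can be sketched as a small Java snippet (the class and method names here are just illustrative):

```java
public class PerceptronExample {
    // Step-activation perceptron: fires (returns 1) when the weighted sum
    // of the inputs reaches the activation threshold
    public static int activate(double[] inputs, double[] weights, double threshold) {
        double weightedSum = 0;
        for (int i = 0; i < inputs.length; i++) {
            weightedSum += inputs[i] * weights[i];
        }
        return weightedSum >= threshold ? 1 : 0;
    }

    public static void main(String[] args) {
        double[] inputs = {0.6, 1.0};
        double[] weights = {0.5, 0.8};
        // (0.6 * 0.5) + (1.0 * 0.8) = 1.1, which exceeds the threshold of 1.0
        System.out.println(activate(inputs, weights, 1.0)); // prints 1
    }
}
```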

## Implementing Artificial Neural Networks

So now you're probably wondering what an artificial neural network looks like and how it uses these artificial neurons to process information. In this tutorial we're going to be looking at feedforward networks and how their design links our perceptrons together, creating a functioning artificial neural network. Before we begin, let's take a look at what a basic feedforward network looks like:

Each input from the input layer is fed up to each node in the hidden layer, and from there to each node in the output layer. We should note that there can be any number of nodes per layer and there are usually multiple hidden layers to pass through before ultimately reaching the output layer. Choosing the right number of nodes and layers is important later on when optimising the neural network to work well on a given problem. As you can probably tell from the diagram, it's called a feedforward network because of how the signals are passed through the layers of the neural network in a single direction. These aren't the only type of neural network though. There are also feedback networks, whose architecture allows signals to travel in both directions.

## Linear separability

To explain why we usually require a hidden layer to solve our problem, take a look at the following examples:

Notice how the OR function can be separated on the graph with a single straight line; this means the function is “linearly separable” and can be modelled within our neural network without implementing a hidden layer. For example, the OR function can be modelled with a single perceptron like this:

However to model the XOR function we need to use an extra layer:

We call this type of neural network a 'multi-layer perceptron'. In almost every case you should only ever need to use one or two hidden layers, however it may take more experimentation to find the optimal number of nodes for the hidden layer(s).
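To make the contrast concrete, here's a rough sketch of how XOR can be computed once a hidden layer is available. The weights and thresholds below are hand-picked for illustration rather than learned: a hidden OR unit and a hidden NAND unit feed an output AND unit.

```java
public class XorNetwork {
    // Step activation: fires when the weighted sum reaches the threshold
    static int step(double sum, double threshold) {
        return sum >= threshold ? 1 : 0;
    }

    // XOR as a two-layer network of step perceptrons with hand-picked weights
    public static int xor(int a, int b) {
        int or   = step(a + b, 1);    // OR:   weights {1, 1},   threshold 1
        int nand = step(-a - b, -1);  // NAND: weights {-1, -1}, threshold -1
        return step(or + nand, 2);    // AND:  weights {1, 1},   threshold 2
    }

    public static void main(String[] args) {
        for (int a = 0; a <= 1; a++) {
            for (int b = 0; b <= 1; b++) {
                System.out.println(a + " XOR " + b + " = " + xor(a, b));
            }
        }
    }
}
```

No single straight line separates XOR's outputs, but each hidden unit draws one line and the output unit combines the two half-planes, which is exactly what the extra layer buys us.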

## To be continued...

So now you should have a basic understanding of some of the typical applications for neural networks and why we use them for these purposes. You should also have a rough understanding of how a basic neural network operates and how it can process data. In the next tutorial we will be looking at ways to construct a neural network and then how we can 'train' it to do the things we want it to do.

Part 2 is now available here, Introduction to Artificial Neural Networks Part 2 - Learning

11th April 2013 at 15:17

## Simulated Annealing for beginners

Finding an optimal solution for certain optimisation problems can be an incredibly difficult task, often practically impossible. This is because when a problem gets sufficiently large we need to search through an enormous number of possible solutions to find the optimal one. Even with modern computing power there are often still too many possible solutions to consider. In cases like this, because we can't realistically expect to find the optimal solution within a sensible length of time, we have to settle for something that's close enough.

An example of an optimisation problem which usually has a large number of possible solutions is the traveling salesman problem. To find a solution to a problem like this we need to use an algorithm that's able to find a good enough solution in a reasonable amount of time. In a previous tutorial we looked at how we could do this with genetic algorithms, and although genetic algorithms are one way to find a 'good-enough' solution to the traveling salesman problem, there are other, simpler algorithms we can implement that will also find a close to optimal solution. In this tutorial the algorithm we will be using is 'simulated annealing'.

If you're not familiar with the traveling salesman problem it might be worth taking a look at my previous tutorial before continuing.

## What is Simulated Annealing?

First, let's look at how simulated annealing works, and why it's good at finding solutions to the traveling salesman problem in particular. The simulated annealing algorithm was originally inspired by the process of annealing in metal work. Annealing involves heating and cooling a material to alter its physical properties due to the changes in its internal structure. As the metal cools its new structure becomes fixed, consequently causing the metal to retain its newly obtained properties. In simulated annealing we keep a temperature variable to simulate this heating process. We initially set it high and then allow it to slowly 'cool' as the algorithm runs. While this temperature variable is high the algorithm will be allowed, with more frequency, to accept solutions that are worse than our current solution. This gives the algorithm the ability to jump out of any local optimums it finds itself in early on in execution. As the temperature is reduced so is the chance of accepting worse solutions, therefore allowing the algorithm to gradually focus in on an area of the search space in which, hopefully, a close to optimum solution can be found. This gradual 'cooling' process is what makes the simulated annealing algorithm remarkably effective at finding a close to optimum solution when dealing with large problems which contain numerous local optimums. The nature of the traveling salesman problem makes it a perfect example.

You may be wondering if there is any real advantage to implementing simulated annealing over something like a simple hill climber. Although hill climbers can be surprisingly effective at finding a good solution, they also have a tendency to get stuck in local optimums. As we previously determined, the simulated annealing algorithm is excellent at avoiding this problem and is much better on average at finding an approximate global optimum.

To understand this better, let's quickly take a look at why a basic hill climbing algorithm is so prone to getting caught in local optimums.

A hill climber algorithm will simply accept neighbour solutions that are better than the current solution. When the hill climber can't find any better neighbours, it stops.

In the example above we start our hill climber off at the red arrow and it works its way up the hill until it reaches a point where it can't climb any higher without first descending. In this example we can clearly see that it's stuck in a local optimum. If this were a real world problem we wouldn't know how the search space looks so unfortunately we wouldn't be able to tell whether this solution is anywhere close to a global optimum.

Simulated annealing works slightly differently than this and will occasionally accept worse solutions. This characteristic of simulated annealing helps it to jump out of any local optimums it might have otherwise got stuck in.
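As a rough sketch of what a basic hill climber does, consider the minimal example below. The one-dimensional landscape f(x) is an assumed toy example with a single peak at x = 3, so here the climber happens to find the global optimum; on a landscape with several peaks it would simply stop at whichever peak it started nearest to.

```java
public class HillClimber {
    // The landscape being climbed; a toy curve with a single peak at x = 3
    static double f(double x) {
        return -(x - 3) * (x - 3);
    }

    // Repeatedly move to a better neighbour; stop when neither neighbour improves
    public static double climb(double start, double stepSize) {
        double x = start;
        while (true) {
            if (f(x + stepSize) > f(x)) {
                x += stepSize;
            } else if (f(x - stepSize) > f(x)) {
                x -= stepSize;
            } else {
                return x; // no better neighbour: we're at a (possibly local) optimum
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("Best x found: " + climb(0.0, 0.5));
    }
}
```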

## Acceptance Function

Let's take a look at how the algorithm decides which solutions to accept, so we can better understand how it's able to avoid these local optimums.

First we check if the neighbour solution is better than our current solution. If it is, we accept it unconditionally. If however, the neighbour solution isn't better we need to consider a couple of factors. Firstly, how much worse the neighbour solution is; and secondly, how high the current 'temperature' of our system is. At high temperatures the system is more likely to accept solutions that are worse.

The math for this is pretty simple:
exp( (solutionEnergy - neighbourEnergy) / temperature )

Basically, the smaller the change in energy (the quality of the solution), and the higher the temperature, the more likely it is for the algorithm to accept the solution.

## Algorithm Overview

So how does the algorithm look? Well, in its most basic implementation it's pretty simple.
• First we need to set the initial temperature and create a random initial solution.
• Then we begin looping until our stop condition is met. Usually either the system has sufficiently cooled, or a good-enough solution has been found.
• From here we select a neighbour by making a small change to our current solution.
• We then decide whether to move to that neighbour solution.
• Finally, we decrease the temperature and continue looping.

## Temperature Initialisation

For better optimisation, when initialising the temperature variable we should select a temperature that will initially allow for practically any move against the current solution. This gives the algorithm the ability to better explore the entire search space before cooling and settling in a more focused region.

## Example Code

Now let's use what we know to create a basic simulated annealing algorithm, and then apply it to the traveling salesman problem below. We're going to use Java in this tutorial, but the logic should hopefully be simple enough to copy to any language of your choice.

First we need to create a City class that can be used to model the different destinations of our traveling salesman.

City.java
/*
* City.java
* Models a city
*/

package sa;

public class City {
int x;
int y;

// Constructs a randomly placed city
public City(){
this.x = (int)(Math.random()*200);
this.y = (int)(Math.random()*200);
}

// Constructs a city at chosen x, y location
public City(int x, int y){
this.x = x;
this.y = y;
}

// Gets city's x coordinate
public int getX(){
return this.x;
}

// Gets city's y coordinate
public int getY(){
return this.y;
}

// Gets the distance to given city
public double distanceTo(City city){
int xDistance = Math.abs(getX() - city.getX());
int yDistance = Math.abs(getY() - city.getY());
double distance = Math.sqrt( (xDistance*xDistance) + (yDistance*yDistance) );

return distance;
}

@Override
public String toString(){
return getX()+", "+getY();
}
}

Next let's create a class that can keep track of the cities.

TourManager.java
/*
* TourManager.java
* Holds the cities of a tour
*/

package sa;

import java.util.ArrayList;

public class TourManager {

// Holds our cities
private static ArrayList<City> destinationCities = new ArrayList<City>();

// Adds a destination city
public static void addCity(City city) {
destinationCities.add(city);
}

// Get a city
public static City getCity(int index){
return destinationCities.get(index);
}

// Get the number of destination cities
public static int numberOfCities(){
return destinationCities.size();
}

}

Now to create the class that can model a traveling salesman tour.

Tour.java
/*
* Tour.java
* Stores a candidate tour through all cities
*/

package sa;

import java.util.ArrayList;
import java.util.Collections;

public class Tour{

// Holds our tour of cities
private ArrayList<City> tour = new ArrayList<City>();
// Cache
private int distance = 0;

// Constructs a blank tour
public Tour(){
for (int i = 0; i < TourManager.numberOfCities(); i++) {
tour.add(null);
}
}

// Constructs a tour from another tour
public Tour(ArrayList<City> tour){
this.tour = new ArrayList<City>(tour);
}

// Returns tour information
public ArrayList<City> getTour(){
return tour;
}

// Creates a random individual
public void generateIndividual() {
// Loop through all our destination cities and add them to our tour
for (int cityIndex = 0; cityIndex < TourManager.numberOfCities(); cityIndex++) {
setCity(cityIndex, TourManager.getCity(cityIndex));
}
// Randomly reorder the tour
Collections.shuffle(tour);
}

// Gets a city from the tour
public City getCity(int tourPosition) {
return (City)tour.get(tourPosition);
}

// Sets a city in a certain position within a tour
public void setCity(int tourPosition, City city) {
tour.set(tourPosition, city);
// If the tour's been altered we need to reset the cached distance
distance = 0;
}

// Gets the total distance of the tour
public int getDistance(){
if (distance == 0) {
int tourDistance = 0;
// Loop through our tour's cities
for (int cityIndex=0; cityIndex < tourSize(); cityIndex++) {
// Get city we're traveling from
City fromCity = getCity(cityIndex);
// City we're traveling to
City destinationCity;
// Check we're not on our tour's last city, if we are set our
// tour's final destination city to our starting city
if(cityIndex+1 < tourSize()){
destinationCity = getCity(cityIndex+1);
}
else{
destinationCity = getCity(0);
}
// Get the distance between the two cities
tourDistance += fromCity.distanceTo(destinationCity);
}
distance = tourDistance;
}
return distance;
}

// Get number of cities on our tour
public int tourSize() {
return tour.size();
}

@Override
public String toString() {
String geneString = "|";
for (int i = 0; i < tourSize(); i++) {
geneString += getCity(i)+"|";
}
return geneString;
}
}

Finally, let's create our simulated annealing algorithm.

SimulatedAnnealing.java
package sa;

public class SimulatedAnnealing {

// Calculate the acceptance probability
public static double acceptanceProbability(int energy, int newEnergy, double temperature) {
// If the new solution is better, accept it
if (newEnergy < energy) {
return 1.0;
}
// If the new solution is worse, calculate an acceptance probability
return Math.exp((energy - newEnergy) / temperature);
}

public static void main(String[] args) {
// Create and add our cities
TourManager.addCity(new City(60, 200));
TourManager.addCity(new City(180, 200));
TourManager.addCity(new City(80, 180));
TourManager.addCity(new City(140, 180));
TourManager.addCity(new City(20, 160));
TourManager.addCity(new City(100, 160));
TourManager.addCity(new City(200, 160));
TourManager.addCity(new City(140, 140));
TourManager.addCity(new City(40, 120));
TourManager.addCity(new City(100, 120));
TourManager.addCity(new City(180, 100));
TourManager.addCity(new City(60, 80));
TourManager.addCity(new City(120, 80));
TourManager.addCity(new City(180, 60));
TourManager.addCity(new City(20, 40));
TourManager.addCity(new City(100, 40));
TourManager.addCity(new City(200, 40));
TourManager.addCity(new City(20, 20));
TourManager.addCity(new City(60, 20));
TourManager.addCity(new City(160, 20));

// Set initial temp
double temp = 10000;

// Cooling rate
double coolingRate = 0.003;

// Initialize initial solution
Tour currentSolution = new Tour();
currentSolution.generateIndividual();

System.out.println("Initial solution distance: " + currentSolution.getDistance());

// Set as current best
Tour best = new Tour(currentSolution.getTour());

// Loop until system has cooled
while (temp > 1) {
// Create new neighbour tour
Tour newSolution = new Tour(currentSolution.getTour());

// Get two random positions in the tour
int tourPos1 = (int) (newSolution.tourSize() * Math.random());
int tourPos2 = (int) (newSolution.tourSize() * Math.random());

// Get the cities at selected positions in the tour
City citySwap1 = newSolution.getCity(tourPos1);
City citySwap2 = newSolution.getCity(tourPos2);

// Swap them
newSolution.setCity(tourPos2, citySwap1);
newSolution.setCity(tourPos1, citySwap2);

// Get energy of solutions
int currentEnergy = currentSolution.getDistance();
int neighbourEnergy = newSolution.getDistance();

// Decide if we should accept the neighbour
if (acceptanceProbability(currentEnergy, neighbourEnergy, temp) > Math.random()) {
currentSolution = new Tour(newSolution.getTour());
}

// Keep track of the best solution found
if (currentSolution.getDistance() < best.getDistance()) {
best = new Tour(currentSolution.getTour());
}

// Cool system
temp *= 1-coolingRate;
}

System.out.println("Final solution distance: " + best.getDistance());
System.out.println("Tour: " + best);
}
}

Output
Initial solution distance: 1966
Final solution distance: 911
Tour: |180, 200|200, 160|140, 140|180, 100|180, 60|200, 40|160, 20|120, 80|100, 40|60, 20|20, 20|20, 40|60, 80|100, 120|40, 120|20, 160|60, 200|80, 180|100, 160|140, 180|

## Conclusion

In this example we were able to more than halve the distance of our initial randomly generated route. This hopefully goes to show how handy this relatively simple algorithm is when applied to certain types of optimisation problems.