How to Build a Genetic Algorithm from Scratch in Python

Author: Murphy  |  2025-03-23

Introduction

The beauty of Genetic Algorithms is that they are directly inspired by nature, more specifically, the process of natural selection:

Natural selection, process that results in the adaptation of an organism to its environment by means of selectively reproducing changes in its genotype, or genetic constitution [1].

More broadly speaking, natural selection is one of the primary mechanisms of evolution. I am fascinated by this mechanism, which has changed the natural world over millions of years. On an even more exciting note, we can leverage the power of natural selection via Genetic Algorithms.

A Genetic Algorithm simulates natural selection in a non-natural environment, typically a business resource-optimization problem, though it is certainly not limited to such use cases.

In this article, I will show the reader how to build their own Genetic Algorithm with Python and apply it to a real-world use case.

Why use a Genetic Algorithm?

Have you ever heard of the knapsack problem? It is a hypothetical scenario in which an individual has a knapsack with limited space and needs to fill it with the most valuable items possible, provided they stay within the limited space. Now, one could fill the knapsack with items at random and calculate each iteration to find the most valuable combination; however, this could take a substantial amount of time. A Genetic Algorithm would allow you to converge on the solution much faster.
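To see why brute force struggles, here is a minimal sketch of the exhaustive approach on a toy knapsack (the item names, weights, and values are made up for illustration): it checks every one of the 2^n subsets, which is fine for five items but quickly becomes intractable as n grows.

```python
from itertools import combinations

# Hypothetical items: (name, weight, value)
items = [("rope", 3, 4), ("torch", 1, 2), ("tent", 6, 9), ("food", 4, 5), ("map", 1, 3)]
capacity = 8

best_value, best_subset = 0, ()
# Enumerate every subset: 2 ** len(items) candidates in total
for r in range(len(items) + 1):
    for subset in combinations(items, r):
        weight = sum(w for _, w, _ in subset)
        value = sum(v for _, _, v in subset)
        if weight <= capacity and value > best_value:
            best_value, best_subset = value, subset

print(best_value, [name for name, _, _ in best_subset])  # 14 ['torch', 'tent', 'map']
```

With 5 items there are only 32 subsets, but with 50 items the same loop would need over a quadrillion iterations, which is exactly where a Genetic Algorithm pays off.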

Now imagine instead of filling up a knapsack, you have to properly allocate millions of dollars worth of resources for a particular project, optimize the inventory for different stages of an extensive supply chain, or, in the example we will discuss later, optimize the merchandise placement in a large retail store. The possible number of solutions to these business problems can approach numbers that none of us could imagine. Yet, a Genetic Algorithm can help develop the optimal solution.

What makes up a Genetic Algorithm?

Before coding, it is vital to review the various components of a Genetic Algorithm. Gaining a solid understanding of these makes coding them seamless.

As mentioned previously, a Genetic Algorithm is inspired by natural selection. Natural Selection involves several individuals (a population) interacting and living within a particular environment. Over time, individuals with traits better suited for their specific environment will tend to last from generation to generation, while individuals with less suitable characteristics will die out.

Traits are passed down from generation to generation by a process known as crossover, where two individuals pair up and produce offspring with characteristics similar to their parents'. Occasionally, a mutation occurs, and the offspring develop a trait not inherited from either parent. Note that this type of occurrence is rare in the natural world.

In nature, these characteristics are encoded in something known as chromosomes. For a more formal definition, look at the below:

Chromosomes are threadlike structures made of protein and a single molecule of DNA that serve to carry the genomic information from cell to cell. In plants and animals (including humans), chromosomes reside in the nucleus of cells. Humans have 22 pairs of numbered chromosomes (autosomes) and one pair of sex chromosomes (XX or XY), for a total of 46. Each pair contains two chromosomes, one coming from each parent, which means that children inherit half of their chromosomes from their mother and half from their father. Chromosomes can be seen through a microscope when the nucleus dissolves during cell division [2].

A chromosome in a Genetic Algorithm looks like a list, with each position representing a trait. A binary indicator at each position shows whether the individual has that trait (usually 1 for present, 0 for absent).
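For instance, a chromosome for five hypothetical traits could be generated as a random binary list:

```python
import random

traits = ["Strength", "Perception", "Endurance", "Charisma", "Intelligence"]
# 1 means the individual has the trait, 0 means it does not
chromosome = [random.choice([0, 1]) for _ in range(len(traits))]
print(chromosome)  # e.g. [1, 0, 0, 1, 1]
```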

So, how do we ensure that more preferential traits are passed on from generation to generation? How do we ensure the most optimal combination of traits, let alone individual characteristics, is passed on? We can do this by applying what is known as the fitness function to the chromosome. The fitness function is the key to the success of a Genetic Algorithm. It is the survival metric applied to each individual in the population and subsequent generations. One can leverage the fitness function to ensure individuals with higher scores are more likely to have their traits passed down to the next generation.
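One common way to do this is fitness-proportional (roulette-wheel) selection, where each individual's chance of reproducing is weighted by its fitness score. The algorithm we build later uses a different, elitist scheme, but here is a minimal sketch of the idea, with made-up individuals and scores:

```python
import random

# Hypothetical individuals and their fitness scores
population = ["A", "B", "C", "D"]
fitness_scores = [10, 40, 30, 20]

# Each individual's selection probability is proportional to its fitness,
# so "B" (score 40) is four times as likely to be picked as "A" (score 10)
parents = random.choices(population, weights=fitness_scores, k=2)
print(parents)
```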

Genetic Algorithms – Visualized

Visualizing the Genetic Algorithm flow makes it much easier to understand. Let's have fun with a hypothetical population inspired by my favorite gaming series, Fallout. The image below represents a population of individuals who are the main characters in various Fallout games. In the Fallout games, players can customize their attributes to start the game. I have used this inspiration to build chromosomes for each of the characters. Finally, I have also added the Fitness Function scores for each individual. Note that in this hypothetical example, I am not doing any practical calculations to get the fitness function; this is just for visualization purposes (we'll have a much more comprehensive view of a fitness function score in the real-world example).

Image provided by the author

The above was the first generation; next, let's visualize what happens at the crossover stage. Let's assume that the Lone Wanderer and The Chosen One pair up and have two children. Each of the children will inherit traits from their parents. We will designate how this occurs by randomly selecting a cutoff point among the parents' chromosomes. To ensure that each of the children has their own unique set of chromosomes, the first child will inherit Parent 1's traits to the left of the cutoff point and Parent 2's to the right. As for the second child, the opposite will occur. Child 2 also had a mutation occur where it happened to gain an additional attribute. Finally, we calculate the fitness function scores for both of the children. Because the world of Fallout is brutal, Child 1 was the only offspring to survive because they had a higher fitness function score and, thus, were more adaptable to living in the world of Fallout than their sibling.
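A single-point crossover with a chance of mutation, like the one just described, can be sketched in a few lines (the parent chromosomes here are made up):

```python
import random

parent_1 = [1, 1, 0, 0, 1, 0]
parent_2 = [0, 1, 1, 1, 0, 1]

# Pick a random cutoff and swap tails so each child mixes both parents
cutoff = random.randint(1, len(parent_1) - 1)
child_1 = parent_1[:cutoff] + parent_2[cutoff:]
child_2 = parent_2[:cutoff] + parent_1[cutoff:]

# Rare mutation: flip one random gene with small probability
if random.random() < 0.05:
    pos = random.randint(0, len(child_2) - 1)
    child_2[pos] = 1 - child_2[pos]

print(cutoff, child_1, child_2)
```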

Image provided by the author

Real World Use Case – Merchandise Optimization

Image generated by DALL-E

Let's put the Genetic Algorithm to the test and use it to solve a real-world business problem – Merchandise Optimization. Next time you walk into a big-box store like Target or Walmart, have you ever considered how much work, planning, and strategy go into placing products on a shelf? Some of these stores have hundreds of aisles, with racks spanning down the length of the aisle that have multiple shelves, with each shelf having hundreds of products in some cases. Think of how many possible combinations this can add up to.

The Scenario

You are a data scientist for a retail company. The company just opened a micro store and has 50 products they would like to stock. Space is limited, so given that each product occupies a certain amount of space in addition to an estimated profit, you are tasked with finding the most profitable set of products to stock, given the space constraints.
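To get a feel for the scale of this problem: each of the 50 products is either stocked or not, so the number of possible product combinations is 2^50.

```python
# Each of the 50 products is either stocked (1) or not (0)
num_products = 50
search_space = 2 ** num_products
print(f"{search_space:,}")  # 1,125,899,906,842,624 possible combinations
```

Over a quadrillion candidates is far too many to enumerate, which is why we reach for a Genetic Algorithm.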

Solution – Create a Genetic Algorithm

Knowing the countless solutions to this problem, you elect to utilize a genetic algorithm. Below is a list of the available products that can be stocked. Take note that our micro store only has 100 units of shelf space.

retail_store_items = {
    "Toothpaste": {
        "shelf_space": 1,
        "expected_profit": 2.50
    },
    "Shampoo": {
        "shelf_space": 3,
        "expected_profit": 3.75
    },
    "Conditioner": {
        "shelf_space": 3,
        "expected_profit": 3.50
    },
    "Body Wash": {
        "shelf_space": 3,
        "expected_profit": 3.00
    },
    "Deodorant": {
        "shelf_space": 1,
        "expected_profit": 2.20
    },
    "Lotion": {
        "shelf_space": 3,
        "expected_profit": 3.00
    },
    "Shaving Cream": {
        "shelf_space": 2,
        "expected_profit": 2.50
    },
    "Razor Blades": {
        "shelf_space": 1,
        "expected_profit": 4.50
    },
    "Mouthwash": {
        "shelf_space": 2,
        "expected_profit": 2.00
    },
    "Hand Soap": {
        "shelf_space": 1,
        "expected_profit": 1.50
    },
    "Facial Tissue": {
        "shelf_space": 2,
        "expected_profit": 1.20
    },
    "Paper Towels": {
        "shelf_space": 5,
        "expected_profit": 3.00
    },
    "Toilet Paper": {
        "shelf_space": 5,
        "expected_profit": 2.80
    },
    "Dish Soap": {
        "shelf_space": 3,
        "expected_profit": 2.75
    },
    "Laundry Detergent": {
        "shelf_space": 6,
        "expected_profit": 4.50
    },
    "Fabric Softener": {
        "shelf_space": 3,
        "expected_profit": 3.50
    },
    "Bleach": {
        "shelf_space": 4,
        "expected_profit": 2.00
    },
    "All-Purpose Cleaner": {
        "shelf_space": 3,
        "expected_profit": 2.50
    },
    "Glass Cleaner": {
        "shelf_space": 2,
        "expected_profit": 2.00
    },
    "Furniture Polish": {
        "shelf_space": 2,
        "expected_profit": 2.50
    },
    "Trash Bags": {
        "shelf_space": 4,
        "expected_profit": 3.00
    },
    "Light Bulbs": {
        "shelf_space": 2,
        "expected_profit": 1.50
    },
    "Batteries": {
        "shelf_space": 1,
        "expected_profit": 2.00
    },
    "Extension Cords": {
        "shelf_space": 3,
        "expected_profit": 4.00
    },
    "Pet Food": {
        "shelf_space": 6,
        "expected_profit": 3.50
    },
    "Cat Litter": {
        "shelf_space": 5,
        "expected_profit": 4.00
    },
    "Dog Leash": {
        "shelf_space": 2,
        "expected_profit": 3.00
    },
    "Pet Shampoo": {
        "shelf_space": 2,
        "expected_profit": 2.50
    },
    "Pet Toys": {
        "shelf_space": 2,
        "expected_profit": 1.50
    },
    "Shower Curtain": {
        "shelf_space": 4,
        "expected_profit": 3.00
    },
    "Bath Towels": {
        "shelf_space": 4,
        "expected_profit": 4.00
    },
    "Bed Sheets": {
        "shelf_space": 6,
        "expected_profit": 8.00
    },
    "Pillows": {
        "shelf_space": 5,
        "expected_profit": 5.00
    },
    "Blankets": {
        "shelf_space": 6,
        "expected_profit": 6.00
    },
    "Curtains": {
        "shelf_space": 5,
        "expected_profit": 7.00
    },
    "Wall Clocks": {
        "shelf_space": 3,
        "expected_profit": 4.50
    },
    "Picture Frames": {
        "shelf_space": 2,
        "expected_profit": 2.00
    },
    "Candles": {
        "shelf_space": 2,
        "expected_profit": 2.50
    },
    "Vases": {
        "shelf_space": 3,
        "expected_profit": 3.00
    },
    "Decorative Pillows": {
        "shelf_space": 4,
        "expected_profit": 3.50
    },
    "Alarm Clocks": {
        "shelf_space": 2,
        "expected_profit": 2.00
    },
    "Phone Chargers": {
        "shelf_space": 1,
        "expected_profit": 3.00
    },
    "USB Cables": {
        "shelf_space": 1,
        "expected_profit": 1.50
    },
    "Headphones": {
        "shelf_space": 2,
        "expected_profit": 5.00
    },
    "Laptop Sleeves": {
        "shelf_space": 2,
        "expected_profit": 4.00
    },
    "Notebooks": {
        "shelf_space": 2,
        "expected_profit": 1.50
    },
    "Pens": {
        "shelf_space": 1,
        "expected_profit": 1.00
    },
    "Sticky Notes": {
        "shelf_space": 1,
        "expected_profit": 1.00
    },
    "Folders": {
        "shelf_space": 2,
        "expected_profit": 1.50
    },
    "Scissors": {
        "shelf_space": 1,
        "expected_profit": 2.00
    }
}

Object Oriented Programming

To execute this solution, we will build two classes in Python: one that represents a combination of the available products we can choose from and one for our genetic algorithm to help us converge on the optimal combination of products to stock in the store.

Libraries

To keep in the spirit of building this algorithm from scratch, I will utilize very few libraries, meaning I will have to construct my original classes and methods to bring this algorithm to life. Below are the only two libraries this code will utilize:

import random
from tqdm import tqdm

Individual Class

As I just mentioned, our Individual class will represent a combination of products from the retail_store_items dictionary. This class will have the following attributes:

  • merch: a nested dictionary of the available items we can choose from.
  • max_shelf_space: the maximum amount of shelf space our selection of items can occupy.
  • chromosome: a list of 1s and 0s representing whether or not a product from merch is included. Each chromosome will have the same length as merch. The 1s and 0s will be generated at random.
  • occupied_space: the amount of space occupied by the selected subset of merch. This will be calculated by the fitness function.
  • total_expected_profit: the total expected profit of the selected subset of merch. This will be calculated by the fitness function.
  • generation: a placeholder for an individual's generation once this class is incorporated into the genetic algorithm.

class Individual():
  def __init__(self,merch,max_shelf_space):
    self.merch = merch
    self.max_shelf_space = max_shelf_space
    self.chromosome = [random.choice([0, 1]) for _ in range(len(merch))]
    self.occupied_space = 0
    self.total_expected_profit = 0
    self.generation = 0

The Fitness method

Our fitness function will calculate two attributes for us: the occupied space and the total expected profit. To execute this, we reset both attributes, loop through the chromosome, use each 1 or 0 as a multiplier against each item's shelf space and expected profit, and accumulate the totals. Note that total_expected_profit is set to 0 if our subset of products exceeds max_shelf_space. Remember, this will be the heart of our genetic algorithm; it is the driving force that determines which combinations of products permeate through generations to converge on the best solution.

def fitness(self):
  # Reset totals so repeated calls (e.g., after a mutation) don't accumulate
  self.occupied_space = 0
  self.total_expected_profit = 0
  for (item, values), multiplier in zip(self.merch.items(),self.chromosome):
    self.occupied_space += values['shelf_space'] * multiplier
    self.total_expected_profit += values['expected_profit'] * multiplier
    if self.occupied_space > self.max_shelf_space:
       self.total_expected_profit = 0
       break

The Crossover method

Next, we will build our crossover method. This method takes two separate instances of the Individual class, randomly generates two cutoff points for their chromosomes, and uses them to create four new Individual instances, essentially the children of these two individuals. Earlier, we discussed how a crossover makes two children. I'm building this method to produce four because, when we develop our genetic algorithm, I want a more comprehensive selection of child instances to permeate the next generation. Also, note how the generation attribute for each child increases by 1 when this method is called.

def crossover(self,partner):
    # Keep cutoffs strictly inside the chromosome so each child mixes both parents
    cutoff_1 = random.randint(1,len(self.chromosome)-1)
    cutoff_2 = random.randint(1,len(self.chromosome)-1)

    child1 = Individual(self.merch,self.max_shelf_space)
    child2 = Individual(self.merch,self.max_shelf_space)
    child3 = Individual(self.merch,self.max_shelf_space)
    child4 = Individual(self.merch,self.max_shelf_space)

    child1.chromosome = partner.chromosome[:cutoff_1] + self.chromosome[cutoff_1:]
    child2.chromosome = self.chromosome[:cutoff_1] + partner.chromosome[cutoff_1:]
    child3.chromosome = partner.chromosome[:cutoff_2] + self.chromosome[cutoff_2:]
    child4.chromosome = self.chromosome[:cutoff_2] + partner.chromosome[cutoff_2:]

    child1.fitness()
    child2.fitness()
    child3.fitness()
    child4.fitness()

    child1.generation = self.generation + 1
    child2.generation = self.generation + 1
    child3.generation = self.generation + 1
    child4.generation = self.generation + 1

    return child1,child2,child3,child4

The Mutation method

As our algorithm progresses, there is a chance it won't converge appropriately because it never explores combinations beyond the ones reinforced by our fitness and crossover methods between generations, much like an optimizer getting stuck in a local optimum. This is where the mutation method comes in. Given a probability, our mutation method will randomly flip one gene in an individual's chromosome. It's as simple as that. When we code our genetic algorithm class, we will call the mutation method right after the crossover so that each generation's children have a chance at forming a new attribute.

def mutation(self,rate):
    if random.random() < rate:
      random_position = random.randint(0,len(self.chromosome)-1)
      if self.chromosome[random_position] == 0:
        self.chromosome[random_position] = 1
      else:
        self.chromosome[random_position] = 0
      self.fitness()

The view_shelf method

I decided to add this optional function out of curiosity. It lets us quickly see the subset of products for an individual instance.

  def view_shelf(self):
    for (item, values), multiplier in zip(self.merch.items(),self.chromosome):
      if multiplier == 1:
        print(item)

Genetic Algorithm Class

The stage is set to build our Genetic Algorithm class. Ultimately, this class will run the natural selection simulation and store the best-performing individuals as they are identified. This class will have the following attributes:

  • population_size: the size of each population within each iteration of the algorithm.
  • num_of_generations: the total number of iterations to run.
  • mutation_rate: the mutation rate applied to each child when the crossover method is called in the Individual class.
  • max_shelf_space: identical to the same attribute in the Individual class.
  • items: the nested dictionary of available items to choose from.
  • best_expected_profit: a placeholder for the best total_expected_profit recorded across all iterations.
  • best_Individuals: a list to which each Individual is appended when it beats the latest best_expected_profit.
  • population: a list holding each generation's population.

class GeneticAlgorithm():
  def __init__(self,population_size,num_of_generations,mutation_rate,max_shelf_space,items):
    self.population_size = population_size
    self.num_of_generations = num_of_generations
    self.mutation_rate = mutation_rate
    self.max_shelf_space = max_shelf_space
    self.items = items
    self.best_expected_profit = 0
    self.best_Individuals = []
    self.population = []

The run_simulation method

Looking at the code below, this method may look complicated, but it becomes straightforward once you break it down. It starts by creating an initial population of Individual objects. From this population, we check whether each object's total_expected_profit is higher than the best_expected_profit attribute.

After the initial population is created, we shuffle it randomly and apply the crossover method to every pair. Shuffling at this stage and for subsequent populations ensures we explore as many combinations as possible. For each population, we take every two Individual objects in the population list and run the crossover method on them. Four children are created, the mutation method is applied, and finally we append them to a list that we sort by total_expected_profit. Only the top two children make it to the subsequent population, and we check whether they have a higher total_expected_profit than best_expected_profit. Ensuring the top two children make it to the next generation is key to continuing to converge on an even better solution.

def run_simulation(self):
    #### Initial Population ####
    for i in range(self.population_size):
      self.population.append(Individual(merch = self.items,max_shelf_space = self.max_shelf_space))
      self.population[i].fitness()
      if self.population[i].total_expected_profit > self.best_expected_profit:
        self.best_expected_profit = self.population[i].total_expected_profit
        self.best_Individuals.append(self.population[i])
      else:
        pass

    #### Subsequent Populations ####
    for g in tqdm(range(self.num_of_generations)):
      ## Shuffle the list randomly
      random.shuffle(self.population)
      new_population = []

      #### Crossover between each pair (population_size must be even)
      for pair in range(0,len(self.population),2):
        parent_1 = self.population[pair]
        parent_2 = self.population[pair+1]
        child_1,child_2,child_3,child_4 = parent_1.crossover(partner = parent_2)
        child_1.mutation(rate = self.mutation_rate)
        child_2.mutation(rate = self.mutation_rate)
        child_3.mutation(rate = self.mutation_rate)
        child_4.mutation(rate = self.mutation_rate)
        all_children = [child_1,child_2,child_3,child_4]
        all_children.sort(key = lambda x: x.total_expected_profit,reverse = True)
        top_children = all_children[:2]
        if top_children[0].total_expected_profit > self.best_expected_profit:
          self.best_expected_profit = top_children[0].total_expected_profit
          self.best_Individuals.append(top_children[0])
        elif top_children[1].total_expected_profit > self.best_expected_profit:
          self.best_expected_profit = top_children[1].total_expected_profit
          self.best_Individuals.append(top_children[1])
        else:
          pass
        new_population.append(top_children[0])
        new_population.append(top_children[1])
      #### Assign population as the new population
      self.population = new_population

Running the Genetic Algorithm

We can now run our Genetic Algorithm! We should find a solution that makes the most of the available space and maximizes our expected profit. However, how will we know if our solution is a good one? The best approach I came up with is to compare our genetic algorithm against a baseline. To do this, we will create another class called RandomAlgorithm that generates Individual instances at random and checks which one has the best total_expected_profit. The code is below for reference.

class RandomAlgorithm():
  def __init__(self,iterations,max_shelf_space,items):
    self.iterations = iterations
    self.max_shelf_space = max_shelf_space
    self.items = items
    self.best_Individuals = []
    self.best_expected_profit = 0

  def run_simulation(self):
    #### Initial Population ####
    for i in tqdm(range(self.iterations)):
      trial = Individual(merch = self.items,max_shelf_space = self.max_shelf_space)
      trial.fitness()
      if trial.total_expected_profit > self.best_expected_profit:
        self.best_expected_profit = trial.total_expected_profit
        self.best_Individuals.append(trial)
      else:
        pass

Trial Run 1 – 1,000 instances

Now, let's do our first trial run. We will run our Genetic Algorithm through 10 generations, with each population being of size 100, which yields 1,000 total Individual instances. In addition, we will have our Random Algorithm generate 1,000 instances. Note that our max_shelf_space will be 100. Check out the results below!

Image provided by the author

Our Genetic Algorithm won the first round with a best expected profit of 118.9 vs. 105.95! Let's see if this gap increases as we increase the iterations. Next up, let's try out 10,000 instances.

Trial Run 2 – 10,000 instances

Image provided by the author

Once again, our Genetic Algorithm won! I am curious to see how this result varies if we use a population size of 1,000 with 10 generations instead of a population of 100 with 100 generations. Let's see what happens.

Image provided by the author

Our expected profit went down. Would this always be the case? I have some ideas on how to test this, but we will explore that in a future article! For now, let's keep the focus on testing the Genetic Algorithm vs. the random approach. Finally, let's see what happens with 100,000 iterations. Our Genetic Algorithm will have populations of 1,000 and 100 generations.

Trial Run 3 – 100,000 instances

Image provided by the author

Although our Genetic Algorithm won, the expected profit seems to oscillate between 117 and 125. Luckily, our Genetic Algorithm object saves each instance of the Individual who becomes the new "best Individual." Using this information, we can visualize the generations where a new "best Individual" was revealed and their expected profits. This will help us see where our algorithm converges. I've listed out the code below to plot this using Seaborn.

import seaborn as sns
import matplotlib.pyplot as plt

# Generation and profit for each new "best Individual"
gens = [ind.generation for ind in GA.best_Individuals]
profits = [ind.total_expected_profit for ind in GA.best_Individuals]

# Plot using Seaborn, with generation on the x-axis
sns.lineplot(x=gens, y=profits, label='Expected Profit')

# Adding labels and title
plt.xlabel('Generations')
plt.ylabel('Expected Profit')
plt.title('Expected Profit over Generations')
plt.legend()

# Display the plot
plt.show()

Image provided by the author

Looking at the above trend, the best expected profit increases rapidly over the first 3 generations and then slowly after that. Note that this instance ran over 100 generations, meaning we had roughly 70 generations without a new best instance. Let's put our algorithm to the test and run it over 1,000 generations, yielding 1,000,000 instances!

Image provided by the author

Our ambitious trial run still didn't show an apparent increase in expected profit. It seems the upper limit is around 125! If you are following along, have fun and try even more iterations!


If you enjoyed this story, please follow me here for more stories on Data Science, Python, Machine learning, Finance, A/B Testing & more!


References

  1. "Natural Selection." Encyclopædia Britannica, Encyclopædia Britannica, inc., 12 Apr. 2024, www.britannica.com/science/natural-selection.
  2. Bates, Sarah A. "Chromosome." Genome.Gov, 19 Apr. 2024, www.genome.gov/genetics-glossary/Chromosome.

Tags: Algorithms Data Science Genetic Algorithm Hands On Tutorials Simulation
