C
A
T
E
G
O
R
I
E
S

&

R
E
C
E
N
T

Is it a weasel? Methinks it is.

Posted by tigerhawkvok on August 30, 2009 23:11 in computers , biology , programming , misc science , internet

Since I've wasted most of today on this (oh feature creep. And an extra "-1".), I thought I'd share. Reading around on Pharyngula, I found myself wanting to make a more customizable, even-less-preprogrammed version of Richard Dawkin's WEASEL program. So, I present you with a Python 3.0+ version of Dawkin's "Methinks it is a weasel" program. It will take any nonzero length starting string, has a maximum generational cap for local maxima, with customizable rates for single-bit errors (like SNPs), duplication errors, number of offspring, weighting for approaching the target sequence, and accepted variability, and whether bad mutations are penalized or not.

Sample output. Click for larger / more lines.

The point of the program, as published in The Blind Watchmaker, is to show that with random mutations over time you can expect to see something like order pop out of a random test string. The version I have below takes the genetic modelling a bit further, starting with any string, regardless of length, and duplication / omission errors will trim down to the right solution. In debug mode, there's an additional completely random selection every 500 generations to help pull the program out of local maxima, but it is not in the primary program.

Click the fold to read the source, or download it here

You should be able to copy-paste this into any text editor and save it as a *.py file — but if I have a spacing error and the program doesn't work, just download it from the link above. The file is tab-spaced instead of single-spaced, so it may be easier to read.


# Dawkin's "METHINK IT IS LIKE A WEASEL" program.  Latch free and highly configureable 
# Inspired by http://scienceblogs.com/pharyngula/2009/08/but_why_would_dawkins_want_to.php
# CC-BY-SA_3.0 Philip Kahn 2009

import random, math, time, calendar

#The starting string. In capitals. Random start is on by default.
wstring = "A" 

# Can change to goal alphabet, such as including lower case, or GTAC.
selector = "ABCDEFGHIJKLMNOPQRSTUVWXYZ ,.'-?;!:" 

# Target output.
soln = "METHINKS" #"TO BE, OR NOT TO BE: - THAT IS THE QUESTION: - WHETHER 'TIS NOBLER IN THE MIND TO SUFFER THE SLINGS AND ARROWS OF OUTRAGEOUS FORTUNE, OR TO TAKE ARMS AGAINST A SEA OF TROUBLES, AND BY OPPOSING END THEM? - TO DIE: - TO SLEEP; - NO MORE; AND BY A SLEEP TO SAY WE END THE HEART-ACHE AND THE THOUSAND NATURAL SHOCKS THAT FLESH IS HEIR TO, - 'TIS A CONSUMMATION DEVOUTLY TO BE WISH'D. TO DIE, - TO SLEEP; - TO SLEEP! PERCHANCE TO DREAM: - AYE, THERE'S THE RUB; FOR IN THAT SLEEP OF DEATH WHAT DREAMS MAY COME?"

# Mutation rates
pointrate = .05 #Accepts to .001 percent.
duprate = .01  #Accepts to .001 percent

# Number of offspring the sample population will have, 
# starting from one individual, in the generation time.
children = 100 
generationtime = 6 # Generation time in years. For final output.

# Evolutionary peak stopper. 
#Stops when reaches a local maxima for N generations.
localpeak = 20000 
## Would reccommend 500 000 / generationtime, or
## a half million years, for physical resembelance;
## 250000/children for speed.

# What degree of unoptomized individuals in a generation prosper (score variance). 
var_mult = 1 
## Strongly reccommend > 0.3

#How "valued" are correct characters?
cstringweight = .3
## 1 would be equally valued to everything; 
## 2 (default) values correct characters over expansion of genome. 
## Values of less than 1 selects for expansion more than correctness.

### Switches
randstart = 1 #Sets the start string to something random of random length.
penalty = 1 # If mismatches are penalized. 1 for yes, 0 for no.
debug = 0 #Display debug code. 1 for yes, 0 for no.

################## End Configuration Here ##################

starttime = time.gmtime()

# Random start parameters
if randstart == 1:
    stringlength = random.randrange(1,2*len(soln)) # No more than 2x starting length
    while len(wstring) < stringlength:
        wstring = wstring + selector[random.randrange(0,len(selector))]
    print("Random starting string selected.  Beginning with string: "+wstring)

charweight = (1/len(soln))

var = var_mult * charweight

i = 0 #counter

scoreold = 0
matchold = 0
nimp=0
lgchanges = 0

wstringn = wstring
starts = wstring

while wstring != soln:
    j = 0
    wstringa = []
    while j < children:
        wstringa.append(wstring) #creates a $children number of copies
        # See if there is a bit replication
        wstringn = wstringa[j]
        ratecheck = random.randrange(1,10000)/10000
        if ratecheck < duprate:
            num = math.floor(duprate/ratecheck) #Number of duplications that appear
            slength = len(wstringn)
            if slength !=1: bit = random.randrange(0,slength)
            else: bit = 0 #you can only duplicate bit 0 if there is one bit
            if random.random() > .5: #duplication or deletion?
                if bit !=0: wstringn = wstringn[:bit] + wstringn[bit-num:] #duplicate a random bit
                else: wstringn = wstringn + wstringn #fix for single character
            else: 
                wstringn = wstringn[:bit+1] + wstringn[bit-num:] #delete a random bit
        ratecheck = random.randrange(1,10000)/10000
        if ratecheck < pointrate:
            num = math.floor(pointrate / ratecheck) #number of SNPs (single-bit changes)
            nc = 0
            while nc < num: #loop over it
                slength = len(wstringn)
                bit=random.randrange(0,slength,1)
                letter = selector[random.randrange(0,len(selector),1)]
                while letter == wstringn[bit]: # Don't replace with same letter
                    letter = selector[random.randrange(0,len(selector),1)]
                if debug ==1: 
                    if nimp > 500: print("  Before: "+wstringn+" @ bit "+str(bit)+" with "+letter)
                if bit!=0: wstringn = wstringn[:bit] + letter + wstringn[bit+1:] #replace a random bit with a random letter
                else: wstringn = letter + wstringn[1:]
                nc +=1
            if debug == 1: 
                if nimp > 500: print(wstringn)
        # Mutations are done on this child.  Replace the stored version with this one.
        wstringa[j]=wstringn
        j+=1 # Increment child number and repeat for next child
    # Score the children
    wstringscore=0
    wstringscore = []
    n=0
    matchhigh = 0
    for x in wstringa:
        if len(soln) > len(x): lenweight = len(x)
        else: lenweight = 2*len(soln) - len(x) #for when it goes above the target length
        score = lenweight*charweight #Score for length
        # Check each bit, and score for correctness
        k = 0
        scoremax = 0
        match = 0
        while k < len(x):
            if k < len(soln):
                if x[k] == soln[k]: # Score correct bits up to length of target
                    score = score + charweight*cstringweight
                    match += 1
                elif penalty == 1: score = score - charweight #If penalties are on, penalizes wrong bits
            else: score = score - charweight*cstringweight #penalizes extra bits, longer than target
            k += 1
        if match > matchhigh: # Checks matched bits
            matchhigh = match
            if matchhigh > matchold:
                if debug == 1: print("High match, "+str(matchhigh)+"|"+str(matchold)+": "+x)
                matchold = matchhigh
                if debug == 1: print(score,n)
                if debug == 1: print(wstringscore)
        if debug == 1: 
            if nimp > 500: print(score)
        wstringscore.append(score) # add the score to the list
        n+=1
    # Create a list of all items with max score, within variability tolerances
    m = 0
    higharr=0
    higharr = []
    while m < len(wstringscore):
        if wstringscore[m] > max(wstringscore)-var: # To allow variability
            higharr.append(m)
        m += 1
    index = random.randrange(0,len(higharr)) # Selects random individual from most-fit population
    if wstringscore[higharr[index]] < max(wstringscore): 
        if debug == 1: print("ERROR 1: Not most fit individual selected, or rounding error") # Debugging
        if debug == 1: time.sleep(1) # Debugging
    elif wstringscore[higharr[index]] < scoreold: 
        if debug == 1: print("ERROR 2: Fittest individual is less fit than any in previous generation.")  # Debugging
        if debug == 1: time.sleep(1) # Debugging
    else: 
        improvement = wstringscore[higharr[index]] - scoreold ###Book-keeping for generational cap
        scoreold = wstringscore[higharr[index]]###
        if improvement == 0: 
            if lgchanges != scoreold:
                lgchanges = scoreold
                lastnovel = time.gmtime()
            nimp += 1  ###
        else: nimp = 0 ###
        if debug == 1: print("Improvement: "+str(improvement)) # Debugging
        if improvement < 0: 
            if debug == 1: print("ERROR 3: Fittest regression from peak.") # Debugging
            if debug == 1: time.sleep(1) # Debugging
    winner=higharr[index] # Selects the actual individual
    if debug == 1: print("Winner: "+str(higharr[index]))
    wstring = wstringa[winner] # Stores individual for next loop
    if debug == 1: 
        if i % 500 == 0: wstring = wstringa[random.randrange(0,len(wstringscore))] # Every 500 generations we just pick a random individual
    i+=1
    ## Output formatting, and remainders from debugging
    cnum=higharr[index]
    for good in higharr:
        if good == winner: wstring = wstringa[good]
    if cnum < 10: cnum = " "+str(cnum)
    else: cnum = str(cnum)
    lenmatch = len(wstring)/len(soln)
    charmatchp=matchhigh/len(soln)
    totalmatch = (1-lenmatch*charmatchp)*100
    if debug ==1: scoreoutput = "Score: " + str(wstringscore[higharr[index]]) + " with "
    else: scoreoutput = "Currently with "
    ## output
    if len(soln) > 30:
        if i % len(soln) == 0 or wstring == soln or i == 1:
            print("Generation " + str(i) + ", child "+cnum+": " + wstring + ". "+scoreoutput+str(matchhigh)+" character matches and "+str(len(wstring)/len(soln))+" length match for "+str(totalmatch)+"% error.")
    else:
        print("Generation " + str(i) + ", child "+cnum+": " + wstring + ". "+scoreoutput+str(matchhigh)+" character matches and "+str(len(wstring)/len(soln))+" length match for "+str(totalmatch)+"% error.")
    if nimp > localpeak: wstring = soln # Escapes out from local peak
#Final outputs
if nimp < localpeak: 
    duration = calendar.timegm(time.gmtime()) - calendar.timegm(starttime)
    print("Goal "+str(len(soln))+"-bit sequence achieved from source "+str(len(starts))+"-bit sequence in "+str(generationtime*i)+" years with a "+str(len(selector))+"-bit alphabet in "+str(duration)+" seconds.")
    try:
        truerand = math.pow(len(selector),len(soln))
        statden = 100/truerand
    except OverflowError:
        truerand=str(len(selector))+"^"+str(len(soln))
        statden = 0
    individ = children * i
    statden = individ * truerand
    print("Wholly random probability (non-selective): 1 in "+str(truerand)+". Individuals processed: "+str(individ)+", or "+str(statden)+"% random-time.")
else: 
    duration = calendar.timegm(lastnovel) - calendar.timegm(starttime)
    print("Evolutionary local maxima reached in "+str((i-localpeak)*generationtime)+" years.  Further improvement required too much regression (total evolution stopped after "+str(i*generationtime)+" years). Difference from goal was "+str(totalmatch)+"% in "+str(duration)+" seconds.")
    print("Run again, or consider increasing the local maxima variable, or decreasing the weight of correct letters.")

It's pretty startling how fast it converges (if it doesn't reach a local max). Here, it converges in as little as 6500 generations, which would be 13,000 years with the parameters I have. The performance hogging is all from printing it out to the screen, but that's half the fun.

Hey Dembski, is that good enough for you? If so, I'll take a copy of Dawkin's book, please :-D

Information and Links

Join the fray by commenting, tracking what others have to say, or linking to it from your blog.



Trackbacks

Vitalita Derma Reviews

Vitalita Derma Reviews | 03/02/2015 00:10

Is it a weasel? Methinks it is. | The Dichotomous Trekkie 2.0

Vitalita Derma Reviews

Vitalita Derma Reviews | 03/02/2015 00:07

Is it a weasel? Methinks it is. | The Dichotomous Trekkie 2.0