Evaluation of The Scatter Code Encodings in GAs.
CS591 Fall 97 [Dr.Forrest]
Special Topics in Genetic Algrotithms
S Madhu. CS Dept., UNM
<madhu@cs>
Wed Aug 19 13:11:37 1998
Motivation
The way in which candidate solutions are encoded is a central factor
in the success of a genetic algorithm [Mitch97]
. Using fixed length fixed order bitstrings in binary or gray
encodings is common, the drawback being the existance of hamming
cliffs, which affect the fitness landscape in the solution search
spaces. Scatter codes [Derek94]
provide us with an an encoding from linear space that has exponential
capacity and no hamming cliffs. In this report we study the issues in
using scatter code encodings in genetic search.
Scatter Codes
Scatter codes are fixed length binary bitsrings. To generate a
sequence of N n-bit scatter codes, the first code is
chosen randomly. The remaining codes are obtained by applying the
scatter operation on the previous code (code i+1
is obtained by scattering code i.)
The scatter operation selects at random a few bits of the previous
code and flips them. Each bit is flipped with probability p,
(0<p<1) the flip-bit probability for the code.
The scatter code was invented to generate a mapping from
multi-dimensional sense data into hamming space for use in Neural
Networks, where independent scatter codes encoding different variables
(on different sense axes) can be composed naturally with the XOR
operator. The motivation to use scatter codes in GAs comes from the
expectation that abscence of hamming cliffs in the encoding lead to
fitness landscapes that are "better".
Approach
To assess the usefulness of scatter code encodings in genetic search
three methods were considered. The first two methods involve actual
experiments on GAs while the third analyses fitness distance
correlations. [The Third approach is the focus of this interim report]
The first approach is to choose a set of well known functions, and
optimize them on both binary and scatter codes, on the same domain and
see which attains the optimum first. The second approach is to use the
same functions in a fixed number of fitness evaluations and see which
encoding method finds the optimum more often. These two approaches
were implemented using the [GECO] package under Allegro Common Lisp,
and raised some issues:
- The length of the codes: scatter codes introduce "redundancy" by
using a larger number of bits than is required to encode the
information. In function optimization, assume we are looking at the
domain [0, 1]. If we use a 8 bit binary (or gray) code, we will be
encoding the domain [0-1] using 2^8 = 256 codes. The search
space will be the space of all binary bitsrings of length 8. How to
design the scatter code for comparison is not readily clear. We do not
want to use scatter codes of length 8: the number of distinct
(non-redundant) codes produced by the scatter operation on an 8 bit
bitstring is not noo many. For comparison we probably want to use 256
codes in the encoding, and of length 32,100, and 200. (in studying
scatter codes histograms of the hamming distances between scattercodes
were plotted for these sizes and for flip-bit-probabilities of 0.01
0.05 0.1 and 0.2 but are not presented here)
- The genetic algorithm used to compare the encodings was the
simple GA, with operations of mutation and uniform crossover on the
codes. Application of these operations on scatter codes will produce
bitstrings that do not belong to the encoding set. To do a fitness
evaluation, the code has to be converted back to the function domain,
so we need to deal with bitstrings that do not strictly belong in the
search space. The way this was dealt with was by computing the hamming
distance of the new point from all the scatter codes and resetting the
new bitsring to the closest bitstring (randomly chosen if there are
degenerates). This value is passed on for fitness evaluation. Many
drawbacks are perceived with this method: and it wasnt evident if
there were any gains on doing this.
- Since the scatter codes are generated with the scatter operation, it
seems natural to replace the genetic operations with the scatter
operation. This still produces offspring that are not in the
encoding-space, and again we reset the offspring to the nearest valid
scatter code. (This operation is not too different from mutation)
If we end up by coming back to the code we began, we are not making
much progress in the search space, and its not clear what the gains are
Because of these issues, it wasnt clear if scatter codes were going to
be a win in function optimization. Implementations in Common Lisp were
extremely slow. The above tests need to be reimplemented in C taking
advantage of 32bit integers before satisfactory conclusions can be
made. There was still the lurking suspicion that there might be a
class of problems where scatter codes are a win.
FDC. The third approach to assessing scatter encodings was to
use fitness distance correlation (FDC). In asking what makes a GA
"difficult" [Terry95, p172] proposes that the
relationship between fitness and distance to the goal as a measure of
GA difficulty.
The methodology here is to examine problems with known optima, take a
sample of individuals and compute the correlation coefficient
r, given the set of (fitness, distance) pairs. Given a set
F = {f_1,f_2 ... f_n} of n individual
fitnesses and a corresponding set D = {d_1,d_2
... d_n} of the n distances to the nearest global maximum, r is
computed the usual way: r =
Cov(F,D)/(sd(F).sd(D)) where
Cov(F,D) = 1/nSUM(i=1 to n{
(f_i - f_avg)(d_i - d_avg) } is the
covariance of F, and D, and sd(F), sd(D),
f_avg and i_avg are the standard deviations and means of
F and D. [Terry95, p194] uses FDC
to make predictions about the relative worth of binary and gray
encodings. We repeat these experiments with scatter codes to estimate
the relative worth of scatter codes. These give us insights into the
nature of the fitness landscape relative to other encodings.
Results
FDC Plots were generated for the 1max and riolo problems on all three
types of encodings. The results in [Terry95]
were verified on gray and binary encodings on sizes 8, 16, and 24, on
the dejong functions 1,2. in order to compare these with scatter
codes, we need to ensure that the accuracy of encodings is
comparable. In all cases we restrict the dejong functions to the 2
dimensional case and convert it as a maximization problem. If we
choose a scatter encoding with 1024 codes, and we choose a size of 20
bits in the binary and gray encodings, we get 10 bits per argument =
1024 codes. If we choose a scatter code size of 200, there are 100
bits for each argument. Each scatter code translation has 1024 codes.
The scatter encoding had a flip-bit-probablity of 0.01.
The fdc-experiments were run on the first dejong function with these
parameters:
:func #'(lambda (x1 x2) (+ (square x1) (square x2)))
:nargs 2
:range '((-5.12 5.12) (-5.12 5.12))
:optima '((-5.12 -5.12) (-5.12 5.12) (5.12 -5.12) (5.12 5.12))
We repeat generate fitness-distance pairs for each encoding. The
process is repeated 10 times, to make sure results are within
acceptable limits of standard error. The average correlation, the
standard error, and best correlations are compared for the three
encodings:
| statistic |
avg-correlation |
square(stderr) |
best-correlation |
| binary |
-0.30917525 |
1.3242558e-4 |
-0.3282121 |
| gray |
-0.37372816 |
1.2376348e-4 |
-0.38945815 |
| scatter | g
-0.38841963 |
0.01179954 |
-0.5296851 |
The scatter-plots of fitness versus distance are plotted for the
best distributions.
The scatter encodings yielded better correlations than the binary
or gray encodings. Analysing the scatter-plots the nature of the
fitness landscapes becomes apparent. Both in binary and gray codes,
because of the hamming cliffs, if x and y are close to optima, and are
seperated by say 0.02, the encoding of x E(x) and E(y) are far apart
in hamming distance. This explains the bands. The scatter encoding
however is curved up - there is almost a tracable continous path which
the ga can take in reaching the optimum, by mutations.
The 2nd dejong function was tested with these parameters:
:func #'(lambda (x1 x2)
(+ (* 100 (square (- (square x1) x2))) (square (- 1 x1))))
:nargs 2
:optima '((2.048 -2.048))
:range '((-2.048 2.048) (-2.048 2.048))
| statistic |
avg-correlation |
square(stderr) |
best-correlation |
| binary |
-0.1860648 |
3.0494604e-4 |
-0.20727673 |
| gray |
-0.34222966 |
1.7104487e-4 |
-0.37592116 |
| scatter |
-0.40781552 |
0.004300935 |
-0.5232019 |
The dejong function 3 was tested as: This function had better
correlations under binary than gray encoding but was not significantly
found to do better in binary as in the experiments of Caruaana and
Schaffer quoted and referenced in [Terry95]. Functions like these
made fdc difficult to compare with experimental results in gray and
binary encodings. An actual simulation is required to verify that
scatter does not indeed perform any worse.
:func #'(lambda (x1 x2) (+ (floor x1) (floor x2)))
:nargs 2
:range '((-5.12 5.12) (-5.12 5.12))
:optima '((5.12 5.12))
| statistic |
avg-correlation |
square(stderr) |
best-correlation |
| binary |
-0.53967196 |
1.7185335e-4 |
-0.55281687 |
| gray |
-0.2716285 |
6.7105925e-5 |
-0.28346756 |
| scatter |
-0.36458302 |
0.013899967 |
-0.5504384 |
It seems that the best correlation was not a good example of the
scatter-plot. We plot the scatter plot for the result of the fdc
experiment where:
((MEANY 97543/1000) (SUMY 390172) (VARY 120.163925)
(PAIRS
((-4 102) (4 62) (-7 101) (0 94) (-6 112) (0 107) (0 112) (-3 93) (-2 111) (-5 104) ...))
(POP-SIZE 4000) (SUMX -4197) (VARX 17.39881) (COV -11.516021) (MEANX -4197/4000)
(CORR -0.25185794))
Conclusions and Future Work
Scatter encodings lead to better fitness landscapes and therefore
make the GA easier to search in the sense of [Terry95] This was established by analysing the
fitness-distance correlations and scatterplots on function
optimization experiments primarily dejong functions 1 and 2, 3, and other
functions.
Actual simulations of scatter-encodings
on the simple ga indicate that
- the ga tends to find a known optimum much faster (in fewer number
of evaluations) than binary or gray encodings, repeatedly.
- crossover seems to have a limited effect, its not clear if it is
positive. This will be tested with the headless chicken tests in the
future.
- when no optima is known priorly, and convergance is used to
determine if we are at an optima, the scatter code loses. The effect
of the simple ga operators as applied to scatter codes (in convergance
measures) needs to be evaluated.
- Genetic operations used could be modified to take into account the
nature of the scatter code: in particular the scatter-operation.
- the indivdual evaluations take longer. Some optimizations in the
code were recognized, and will be implemented in futre work.
There were interesting (software engineering) results in the actual
code that was written. Using CLOS and mixin-patterns was an experience
in the ease of method-based Object oriented programming. The code uses
GECO and is available at
http://www.cs.unm.edu/~madhu/spoilers/ga.tgz
This was only an interim report. There is a body of work that still
needs to be evaluated and presented in a complete report,
References
- [Mitch97]
An Introduction To genetic Algorithms, Melanie Mitchell, MIT Press,
1997.
- [Derek94]
The Multidimensional Scatter Code: A Data fusion technique with
exponential Capacity, Derek Smith, Paul Stanford, Intl.Conf.on
Artificial Neural Networks, Vol II 1432-5 May 1994.
- [Terry95]
Evolutionary Algortithms, Fitness Landscapes and Search, Terry Jones,
PhD Thesis, UNM, May 1995.
- [Ackley87]
A Connectionist Machine for Genetic Hillclimbing, David H Ackley,
Kluwaver Academic Publishers, 1987
- [Goldb89]
Genetic Algorithms in Search, Optimization & Machine Learning, David E
Goldberg, Addison Wesley, 1989
[Last Touched: Fri Jul 31 19:33:08 1998
<madhu@cs>]