-
- Immune systems are adaptive systems in which learning takes place by
evolutionary mechanisms similar to biological evolution. Their major
function is to provide a defense mechanism for the body that can identify
dangerous foreign material and eliminate it. Understanding the immune
system is important, both because of its role in complex diseases such
as AIDS and because of potential applications to computational problems.
The mechanisms of the immune system are remarkably complex and poorly
understood, even by immunologists.
Our immune system models are based on a universe in which antigens (foreign
material) and lymphocytes (the cells that do the recognition of foreign
material) are represented by strings of discrete symbols. The strings
represent receptors on B cells and T cells and epitopes on antigens---the
regions of the cells in which binding takes place. The complex chemistry
of molecular binding is modeled in our system by string matching.
We have used this model to study several different aspects of the
immune system, including its ability to detect common patterns
(schemas) in noisy environments
( Forrest et al., 1993a), its ability to
discover and maintain coverage of diverse pattern classes
(Smith et al., 1993), and its ability to learn effectively, even when
not all antibodies are expressed and not all antigens are presented
(Hightower et al., 1995,
Hightower et al., 1996). Current research directions include
modeling
cross-reactive memory (work by Derek Smith) and
location-specific behavior (immune
cells that behave differently depending on their physical location in
the body). This work is in collaboration with Alan Perelson.
-
- A long-term goal of my research is to build an artificial immune
system for a computer. This immune
system would have much more sophisticated notions of identity and
protection than those afforded by current operating systems, and it
would provide a general-purpose protection system to augment current
computer security systems.
As an initial step in this direction, we developed a computer
virus detection system based on immunological principles. Our
change-detection algorithm defines a set in terms of its complement.
This allows us to create a detection system with the following
properties: it is probabilistic and tunable (the probability of detection
can be traded
off against CPU time and space), it can be distributed (providing high system-wide
reliability at low individual cost), and it can detect novel viruses
that have not previously been identified. We have written several papers
on this work
(Forrest et al., 1994a, D'haeseleer et al., 1996a, D'haeseleer, 1996).
We are currently studying a method for anomaly detection in which ``normal'' is
defined by short-range correlations in a process' system calls. In
preliminary experiments, we found this definition to be stable during
normal behavior for several standard UNIX programs. Further, it is
able to detect several common intrusions involving sendmail and other
processes.
-
- In collaboration with John Holland, Terry Jones, Peter Hraber,
Andrew Kosoresow,
and several ecologists, we are studying how genetic algorithms can be used
in ecological modeling.
Echo extends standard genetic algorithms
in several interesting ways: (i) there is no explicit fitness function, (ii)
individuals have local state,
and (iii) the genetic representation is based on a
higher-cardinality alphabet than binary strings. In Echo, fitness
evaluation takes place implicitly. That is, individuals in the
population (called {\em agents}) are allowed to make copies of
themselves anytime they acquire enough ``resources'' to replicate
their genome. Different resources are modeled by different letters of
the alphabet (say, A, B, C, D), and genomes are constructed out of
those same letters. However these resources can exist independently
of the agent's genome, either free in the environment or stored
internally by the agent. Agents acquire resources by interacting with
other agents through trading relationships and combat. Echo thus
relaxes the constraint that an explicit fitness function must return a
numerical evaluation of each agent. This ``endogenous'' fitness
function is much closer to the way fitness is assessed in natural
settings. In addition to trade and combat, a third form of
interaction between agents is ``mating.'' Mating provides
opportunities for agents to exchange genetic material through
crossover, thus creating hybrids. Mating, together with mutation,
provides the mechanism for new types of agents to evolve.
In preliminary simulations, the Echo system has demonstrated
surprisingly complex behaviors, including something resembling a
biological arms race (in which two competing species develop
progressively more complex offensive and defensive strategies),
ecological dependencies among different species, and sensitivity (in
terms of the number of different phenotypes) to differing levels of
renewable resources. We are currently studying the extent to which
macro-level behaviors of Echo mimic those of natural ecological
systems. In particular, we are quantifying the species abundance
patterns observed in Echo and comparing them with those found in
natural ecologies.
More information about the Echo project can be found
here.
-
- This project is in collaboration with Melanie Mitchell. In this work,
we have investigated
a number of anomalous results (some reported in the literature and
some resulting from our own investigations) in which the genetic
algorithm fails to perform as theory predicts. In each case, we have
found interesting explanations for these failures, which have led us
to think more deeply about how the algorithm really works
(Forrest and Mitchell, 1991; Mitchell et al., 1992; Forrest and Mitchell
1993a; Forrest and Mitchell 1993b). We have
recently described and analyzed an idealized version of the genetic
algorithm (the IGA) which contains the features of genetic algorithms
that we believe are most important to its success---independent
sampling, combining good partial solutions through crossover, and
preserving these combinations through selection. Our mathematical
analysis provides a lower bound on the time to discover an optimal
solution for one class of problems. We performed a similar analysis
for hillclimbing, which is a step towards predicting the conditions
under which the genetic algorithm will outperform hillclimbing and
vice versa (Mitchell et al., 1993a). An important area of future
research is to study what parameter settings and versions of the
genetic algorithm allow it to conform most closely to the IGA.
-
- This project is in collaboration with Rebecca Parsons.
We are applying the
genetic algorithm to a computational biology problem---the assembly of
DNA sequence fragments from a parent clone whose sequence is unknown
into a consensus sequence corresponding to the parent sequence. This
is called the ``fragment assembly problem'' and was chosen as one of
two challenge problems for the 1994-95 DIMACS competition. It is a
permutation problem, similar to the Traveling Salesman Problem (TSP),
and similarly, NP-complete, but with some important differences (circular
tours, noise, and special relationships between entities). In this work
we have studied two different
representations and many different genetic operators. We have
compared the two representations and their associated operators on
problems ranging from 2 kilobases to 34 kilobases. After
experimenting with different combinations, we were able to solve a 10
kilobase sequence, consisting of 177 fragments, with no manual
intervention. We recently extended our work to a 752 fragment
set and achieved very close agreement with the published consensus
sequence.