Last
Update: October, 2000
Abstracts:
Immune System Modeling
- Immunity by Design: An Artificial Immune System.
We describe an artificial immune system (AIS) that is distributed, robust,
dynamic, diverse and adaptive. It captures many features of the ver tebrate
immune system and places them in the context of the problem of protecting
a network of computers from illegal intrusions. Postscript PDF
-
How the immune system generates diversity:
Pathogen space coverage with random and evolved antibody libraries.
The immune system uses many strategies to gen erate its enormous repertoire
of diverse antibod ies, but their relative importance is not under stood.
Here we address the contribution of an tibody gene libraries to the antibody
repertoire. We introduce a general framework, in which we can study many
antibodypathogen match ing rules, including the widelyused shapespace
model (Perelson and Oster, 1979). We use the ge netic algorithm as a model
of evolution to inves tigate the type of antibody repertoires that might
evolve in relation to a given pathogenic environ ment. For the antibody/pathogen
matching rules that we studied, the scaling relation between fit ness and
the size of the evolved antibody library is only a shifted variant of the
scaling relation that we obtain with random libraries of the same size.
We discuss how our results compare to the antibodies that are expressed
in newborns, and we discuss the implications of our results for re cent
experiments with phage antibody libraries Postscript PDF
- Variable efficacy of repeated annual
influenza vaccination.
Conclusions have differed in studies that have compared vaccine efficacy
in groups receiving in fluenza vaccine for the first time, to efficacy
in groups vaccinated more than once. For example, the Hoskins study concluded
that repeat vaccination was not protective in the long term, while the Keitel
study concluded that repeat vaccination provided continual protection. We
propose a novel explanation, the antigenic distance hypothesis, and test
it by analyzing seven influenza outbreaks that occurred during the Hoskins
and Keitel studies. The hypothesis is that variation in repeat vaccine efficacy
is due to differences in antigenic distances among vaccine strains and between
the vaccine strains and the epidemic strain in each outbreak. To test the
hypothesis, antigenic distances were calculated from historical hemagglutination
inhibition assay tables, and a computer model of the immune response was
used to predict the vaccine efficacy of individuals given different vacci
nations. The model accurately predicted the observed vaccine efficacies
in repeat vaccinees relative to the efficacy in firsttime vaccinees (correlation
0.87). Thus, the antigenic distance hypothesis offers a parsimonious explanation
of the differences between and within the Hoskins and Keitel studies. These
results have implications for the selection of influenza vaccine strains,
and also for vaccination strategies for other antigenically variable pathogens
that might require repeated vaccination. Postscript
PDF
- Using lazy evaluation to simulate realistic-size
repertoires in models of the immune system.
We describe a method of implementing efficient computer simulations of immune
systems that have a large number of unique B- and/or T-cell clones. The
method uses an implementation technique called lazy evaluation to create
the illusion that all clones are being simulated, while only actually simulating
a much smaller number of clones that can respond to the antigens in the
simulation. The method is effective because only 0.001-0.01% of clones can
typically be stimulated by an antigen, and because many simulations involve
only a small number of distinct antigens. A lazy simulation of a realistic
number of clones and 10 distinct antigens is 1000 times faster and 10 000
times smaller than a conventional simulation-making simulations of immune
systems with realistic-size repertoires computationally tractable. Bulletin of
Mathematical Biology
- Simulated evolution of antibody gene libraries under
pathogen selection.
The immune system of vertebrates is generally viewed as a prototype of a
highly adaptive, distributed, detection system, that identifies and neutralizes
pathogenic intrusions. One of its puzzling features is that the immune receptors
(antibodies) are able to bind to pathogens that they have not been trained
to recognize. This anticipatory capability is thought to be due to a broad
coverage of the pathogen space realized by the antibodies that the immune
system can produce. What we would like to understand is how this this coverage
is achieved, given that the immune system uses a relatively small number
of genes to construct its receptors. We use an evolution- ary algorithm
to explore the strategies that the antibody libraries may evolve in order
to encode pathogen sets of various sizes. We derive a lower and an upper
bound on the performance of the evolved antibody libraries as a function
of their size and the length of the pathogen string. We also provide some
insights in the strategy of the antibody libraries. We discuss the implications
of our results for biological evolution of antibody libraries. Postscript PDF
- Modeling the effect of prior infection on vaccine
efficacy. D.J. Smith, S. Forrest and D.H. Ackley, and A.S.Perelson.
In Artificial Immune systems and their Applications.
Conclusions have differed in studies that have compared vaccine efficacy
in groups receiving in fluenza vaccine for the first time, to efficacy
in groups vaccinated more than once. For example, the Hoskins study concluded
that repeat vaccination was not protective in the long term, while the Keitel
study concluded that repeat vaccination provided continual protection. We
propose a novel explanation, the antigenic distance hypothesis, and test
it by analyzing seven influenza outbreaks that occurred during the Hoskins
and Keitel studies. The hypothesis is that variation in repeat vaccine efficacy
is due to differences in antigenic distances among vaccine strains and between
the vaccine strains and the epidemic strain in each outbreak. To test the
hypothesis, antigenic distances were calculated from historical hemagglutination
inhibition assay tables, and a computer model of the immune response was
used to predict the vaccine efficacy of individuals given different vacci
nations. The model accurately predicted the observed vaccine efficacies
in repeat vaccinees relative to the efficacy in firsttime vaccinees (correlation
0.87). Thus, the antigenic distance hypothesis offers a parsimonious explanation
of the differences between and within the Hoskins and Keitel studies. These
results have implications for the selection of influenza vaccine strains,
and also for vaccination strategies for other antigenically variable pathogens
that might require repeated vaccination. Postscript
PDF
- Deriving shape-space parameters from immunological
data for a model of cross-reactive memory.
We present a method for deriving shape space parameters that are consistent
with immuno logical data, and illustrate the method by deriving shape space
parameters for a model of crossreactive memory. Crossreactive memory responses
occur when the immune system is primed by one strain of a pathogen and challenged
with a related, but different, strain. Much of the nature of a crossreactive
response is determined by the quantity and distribution of the memory cells,
raised to the primary antigen, that crossreact with the secondary antigen.
B cells with above threshold affinity for an antigen lie in a region of
shape space that we call a ball of stimulation. In a crossreactive response,
the intersection of the balls of stimulation of the primary and secondary
antigens contains the crossreactive B cells and thus determines the degree
of crossreactivity between the antigens. We derive formulas for the volume
of intersection of balls of stimulation in different shape spaces and show
that the parameters of shape space, such as its dimensionality, have a large
impact on the number of B cells in the intersection. The application of
our method for deriving shape space parameters indicates that, for Hamming
shape spaces, twenty to twentyfive dimensions, a three or four letter alphabet,
and balls of stimulation of radius five or six, are choices that match the
experimental data. For Euclidean shape spaces, five to eight dimensions
and balls of stimulation with radius about twenty percent of the radius
of the whole space, match the experimental data. Postscript
PDF
- Evolution (and learning) of v-region genes
A binary model of the immune system is used to study the effects of evolution
on the genetic encoding for antibody molecules. One feature of this encoding
is that, unlike typical genetic algorithm experiments, not all genes found
in the genotype are expressed in the phenotype. We report experiments which
show that the evolution of immune system genes, simulated by the genetic
algorithm, can induce a high degree of genetic organization even though
that organization is not explicitly required by the fitness function. We
hypothesize about the nature of this organization and introduce a measure
called Hamming Separation to observe its change during the evolution of
the immune system. Postscript PDF
Immune System Applications and Computer Security
- Architecture for an Artificial Immune
System.
An artificial immune system (ARTIS) is described which incorporates many
properties of natural immune systems, including diversity, distributed computation,
error tolerance, dynamic learning and adaptation and self-monitoring. ARTIS
is a general framework for a distributed adaptive system and could, in principle,
be applied to many domains. In this paper, ARTIS is applied to computer
security, in the form of a network intrusion detection system called LISYS.
LISYS is described and shown to be effective at detecting intrusions, while
maintaining low false positive rates. Finally, similarities and differences
between ARTIS and Holland's classifier systems are discussed. Postscript
- "Automated Response Using System-Call
Delays."
Automated intrusion response is an important unsolved problem in computer
security. A system called pH (for process homeostasis) is described which
can success fully detect and stop intrusions before the target system is
compromised. In its current form, pH monitors ev ery executing process
on a computer at the systemcall level, and responds to anomalies by either
delaying or aborting system calls. The paper presents the rationale for
pH, its design and implementation, and a set of initial experimental results.
Postscript
- Novelty Detection in Time Series Data
using Ideas from Immunology
Intrusion detection systems rely on a wide variety of observable data to
distinguish between legitimate and illegitimate activities. In this paper
we study one such observable sequences of system calls into the kernel of
an operating system. Using systemcall data sets generated by several different
programs, we compare the ability of different data modeling methods to represent
normal behavior accurately and to recognize intrusions. We compare the follow
ing methods: Simple enumeration of observed sequences, comparison of relative
frequencies of different sequences, a rule induction technique, and Hidden
Markov Models (HMMs). We discuss the factors affecting the performance of
each method, and conclude that for this particular problem, weaker methods
than HMMs are likely sufficient. Postscript
PDF
- Detecting intrusions using system
calls: Alternative data models.
Intrusion detection systems rely on a wide variety of observable data to
distinguish between legitimate and illegitimate activities. In this paper
we study one such observable -- sequences of system calls in the kernel
of an operating system. Using system-call data sets generated by several
different programs, we compare the ability of different data-modeling methods
to represent normal behavior accurately and to recognize intrusions. We
compare the following methods: Simple enumeration of observed sequences,
comparison of relative frequencies of different sequences, a rule induction
technique, and Hidden Markov Models (HMMs). We discuss the factors affecting
the performance of each method, and conclude that for this particular problem,
weaker methods than HMMs are likely sufficient. Postscript
PDF
- Artificial Immune Systems in Industrial Applications
D. Dasgupta and S. Forrest. Accepted for presentation at the International
conference on Intelligent Processing and Manufacturing Material (IPMM).
Honolulu, HI (July 10-14, 1999). Postscript PDF
- Intrusion detection using sequences
of system calls.
A method is introduced for detecting intrusions at the level of privileged
processes. Evidence is given that short sequences of system calls executed
by running programs are a good discriminator between normal and abnormal
operating characteristics of several common UNIX programs. Normal behavior
is collected in two ways: Synthetically, by exercising as many normal modes
of usage of a program as possible, and in a live user environment by tracing
the actual execution of the program. In the former case several types of
intrusive behavior were studied; in the latter case, we analyze results
were analyzed for false positives. Postscript PDF
- Principles of a Computer Immune System
Natural immune systems provide a rich source of inspiration for computer
security in the age of the Internet. Immune systems have many features that
are desirable for the imperfect, uncontrolled, and open environments in
which most computers currently exist. These include distributability, diversity,
disposability, adaptability, autonomy, dynamic coverage, anomaly detection,
multiple layers, identity via behavior, no trusted components, and imperfect
detection. These principles suggest a wide variety of architectures for
a computer immune system. Postscript PDF
-
A distributed approach to anomaly detection
The natural immune system has evolved many interesting mechanisms to solve
the problem of self-nonself discrimination. An anomaly detection system
based upon principles derived from the immune system was introduced in Self-nonself
discrimination in a computer. Its main advantages are that it is distributable,
local, and tunable. This paper provides an overview of the theoretical,
algorithmic and practical developments extending the original proposal.
In particular, we present information theoretic results on the detection
method, show the possibility of strings that cannot be detected for a given
combination of self set and matching rule, present efficient algorithms
to generate the detector set, and provide rules of thumb for setting the
parameters to apply this method to a real data set. Postscript
- Building diverse computer systems
Diversity is an important source of robustness in biological systems. Computers,
by contrast, are notable for their lack of diversity. Although homogeneous
systems have many advantages, the beneficial effects of diversity in computing
systems have been overlooked, specifically in the area of computer security.
Several methods of achieving software diversity are discussed based on randomizations
that respect the specified behavior of the program. Such randomization could
potentially increase the robustness of software systems with minimal impact
on convenience, usability, and efficiency. Randomization of the amount of
memory allocated on a stack frame is shown to disrupt a simple buffer overflow
attack. Postscript PDF
- An immunological approach to change detection: algorithms,
analysis, and implications
We present new results on a distributable change detection method inspired
by the natural immune system. A weakness in the original algorithm was the
exponential cost of generating detectors. Two detectorgenerating algorithms
are introduced which run in linear time. The algorithms are analyzed, heuristics
are given for setting parameters based on the analysis, and the presence
of holes in detector space is examined. The analysis pro vides a basis
for assessing the practicality of the algo rithms in specific settings,
and some of the implications are discussed . Postscript
PDF
- A sense of self for Unix processes
A method for anomaly detection is introduced in which "normal" is defined
by shortrange correlations in a pro cess' system calls. Initial experiments
suggest that the definition is stable during normal behavior for standard
UNIX programs. Further, it is able to detect several common intrusions involving
sendmail and lpr. This work is part of a research program aimed at building
computer security systems that incorporate the mechanisms and algorithms
used by natural immune systems. Postscript PDF
- Self-nonself discrimination in a computer
The problem of protecting computer systems can be viewed generally as the
problem of learning to distinguish self from other. We describe a method
for change detection which is based on the generation of T cells in the
immune system. Mathematical analysis reveals computational costs of the
system, and preliminary experiments illustrate how the method might be applied
to the problem of computer viruses. Postscript PDF
Ecology and Artificial Life
- John Holland's invisible hand: An artificial
immune system.
We describe an artificial immune system (AIS) that is distributed, robust,
dynamic, diverse and adaptive. It captures many features of the ver tebrate
immune system and places them in the context of the problem of protecting
a network of computers from illegal intrusions. The AIS resembles a classifier
system in many important ways. Similarities and differences are discussed.
Postscript PDF
- The ecology of Echo
Echo is a generic ecosystem model in which evolving agents are situated
in a resourcelimited environment. The Echo model is described, and the
behavior of Echo is evaluated on two well studied measures of ecological
diversity: relative species abundance and the speciesarea scaling relation.
In simulation experiments, these measures are used to compare the behavior
of Echo with that of a neutral model, in which selection on agent genotypes
is random. These simulations show that the evolutionary component of Echo
makes a significant contribution to its behavior and that Echo shows good
qualitative agreement with naturally occurring species abundance distributions
and speciesarea scaling relations. Postscript PDF
- Modeling complex adaptive systems with Echo
Complex adaptive systems (CAS) consist of many interacting and adapting
compo nents. Echo is a computational CAS model in which evolving agents
are situated in a resourcelimited environment. Different views of the notion
of species within Echo are compared to biological experiments on relative
species abundance, specifically to Pre ston's ``canonical'' lognormal distribution.
Postscript PDF
Genetic Algorithms and Classifier Systems
- Genetic operators for the DNA fragment-assembly problem.
We study different genetic algorithm operators for one permutationproblem
associated with the Human Genome Project---the assembly of DNA sequence
fragments from a parent clone whose sequence is unknown into a consensus
sequence corresponding to the parent sequence. The sortedorder representation,
which does not require specialized operators, is compared with a more traditional
permutation representation, which does require specialized operators. The
two representations and their associated operators are compared on problems
ranging from 2K to 34K base pairs (KB). Edgerecombination crossover used
in conjunction with several specialized operators is found to perform best
in these experiments; these operators solved a 10KB sequence, consisting
of 177 fragments, with no manual intervention. Natural building blocks in
the problem are exploited at progressively higher levels through ``macrooperators.''
This significantly improves performance. Postscript PDF
- Fitness Distance Correlation as a Measure of Problem
Difficulty for Genetic Algorithms.
A measure of search difficulty, fitness distance correlation (FDC), is introduced
and examined in relation to genetic algorithm (GA) performance. In many
cases, this correlation can be used to predict the performance of a GA on
problems with known global maxima. It correctly classifies easy deceptive
problems as easy and difficult nondeceptive problems as difficult, indicates
when Gray coding will prove better than binary coding, and is consistent
with the surprises encountered when GAs were used on the Tanese and royal
road functions. The FDC measure is a consequence of an investigation into
the connection between GAs and heuristic search. Postscript PDF
- Genetic algorithms and artificial life.
Genetic algorithms are computational models of evolution that play a central
role in many artificiallife models. We review the history and current scope
of research on genetic algorithms in artificial life, using illustrative
examples in which the genetic algorithm is used to study how learning and
evolution interact, and to model ecosystems, immune system, cognitive systems,
and social systems. We also outline a number of open questions and future
directions for genetic algorithms in artificiallife research. Postscript
PDF
-
Genetic Algorithms and Heuristic Search.
Genetic algorithms (GAs) and heuristic search are shown to be structurally
similar. The strength of the correspondence and its practical consequences
are demonstrated by considering the relationship between fitness functions
in GAs the heuristic functions of AI. By examining the extent to which fitness
functions approximate an AI ideal, a measure of GA search difficulty is
defined and applied to previously studied problems. The success of the measure
in predicting GA performance (1) illustrates the potential advantages of
viewing evolutionary search from a heuristic search perspective and (2)
appears to be an important step towards answering a question that has been
the subject of much research in the GAs community; what makes search hard
(or easy) for a GA? Postscript PDF
-
Using genetic algorithms to explore pattern recognition
in the immune system.
This paper describes an immune system model based on binary strings. The
purpose of the model is to study the pattern recognition processes and learning
that take place at both the individual and species levels in the immune
system. The genetic algorithm (GA) is a central component of the model.
The paper reports simulation experiments on two pattern recognition problems
that are relevant to natural immune systems. Finally, it reviews the relation
between the model and explicit fitness sharing techniques for genetic algorithms,
showing that the immune system model implements a form of implicit fitness
sharing. Postscript PDF
-
"Searching for diverse, cooperative
populations with genetic algorithms."
In typical applications, genetic algorithms (GAs) process populations of
potential problem solutions to evolve a single population member that specifies
an ``optimized'' solution. The majority of GA analysis has focused on these
optimization applications. In other applications (notably learning classifier
systems and certain connectionist learning systems), a GA searches for a
population of cooperative structures that jointly perform a computational
task. This paper presents an analysis of this type of GA problem. The analysis
considers a simplified geneticsbased machine learning system: a model of
an immune system. In this model, a GA must discover a set of patternmatching
antibodies that effectively match a set of antigen patterns. Analysis shows
how a GA can automatically evolve and sustain a diverse, cooperative population.
The cooperation emerges as a natural part of the antigenantibody matching
procedure. This emergent effect is shown to be similar to fitness sharing,
an explicit technique for multimodal GA optimization. Further analysis
shows how the GA population can adapt to express various degrees of generalization.
The results show how GAs can automatically and simultaneously discover effective
groups of cooperative computational structures. Postscript PDF