Definitions

The following definitions will be used in this document:

ACTION
A single, atomic movement or other primitive command taken by the AGENT. The ACTION is interpreted by the WORLD SIMULATOR, which assesses the results of the AGENT's ACTION. Roughly, the ACTION can be thought of as a communication from the AGENT to the WORLD SIMULATOR. In this project, the available ACTIONs are FORWARD, BACK, TURNCLOCK, TURNCOUNTERCLOCK, and NOOP. Note: This ACTION is not to be confused with the javax.swing.Action interface that is used in Swing GUIs.
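As a purely illustrative sketch (the assignment's actual type may differ), the five ACTIONs could be represented as a Java enumeration; the comments describe the intuitive meaning of each ACTION, while the exact semantics are defined by the WORLD SIMULATOR.

    public enum Action {
        FORWARD,           // attempt to move along the current ORIENTATION
        BACK,              // attempt to move against the current ORIENTATION
        TURNCLOCK,         // rotate clockwise
        TURNCOUNTERCLOCK,  // rotate counterclockwise
        NOOP               // do nothing
    }
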
DIRECTION
Possible directions that an AGENT can move. Synonymous with ORIENTATION, although DIRECTION connotes movement, while ORIENTATION connotes static facing. Implemented in Java via the Direction enumeration. Note that a DIRECTION is distinct from an ACTION - the ACTION is what the AGENT attempts to do; the DIRECTION is the computed direction in which it ends up moving. That is, DIRECTION is a function of the AGENT's current ORIENTATION and the ACTION it chooses.
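The sketch below illustrates the idea of DIRECTION as a function of ORIENTATION and ACTION. The four compass values and the treatment of turns are assumptions made only for illustration; the project's Direction enumeration is authoritative.

    public final class DirectionExample {
        enum Direction { NORTH, EAST, SOUTH, WEST }   // assumed ORIENTATION values
        enum Action { FORWARD, BACK, TURNCLOCK, TURNCOUNTERCLOCK, NOOP }

        /** The DIRECTION the AGENT ends up moving, given its ORIENTATION and chosen ACTION. */
        static Direction movementDirection(Direction orientation, Action action) {
            switch (action) {
                case FORWARD: return orientation;            // move along the current facing
                case BACK:    return opposite(orientation);  // move against the current facing
                default:      return orientation;            // turns and NOOP: no translation;
                                                             // the current facing is a placeholder
            }
        }

        private static Direction opposite(Direction d) {
            switch (d) {
                case NORTH: return Direction.SOUTH;
                case SOUTH: return Direction.NORTH;
                case EAST:  return Direction.WEST;
                default:    return Direction.EAST;           // opposite of WEST
            }
        }
    }
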
GOAL STATE
A terminal state for a TRIAL. When an AGENT encounters a GOAL STATE, the current TRIAL ends.
LEARNING CURVE
A plot of ``amount of experience'' versus ``performance''. For a learning AGENT, its performance should improve with increasing experience. In this project ``amount of experience'' can be interpreted as ``number of TRIALs'' (Section [*]), while ``performance'' can be interpreted as ``value of each trajectory'' (Section [*]).
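As one illustrative (and entirely optional) way to gather LEARNING CURVE data, the value of each TRIAL's trajectory could be logged against the trial index and plotted with an external tool. LearningCurveLogger, writeCurve, and the CSV format below are hypothetical, not part of the project.

    import java.io.IOException;
    import java.io.PrintWriter;
    import java.util.List;

    public final class LearningCurveLogger {
        /** Writes (trial index, trajectory value) pairs as CSV for later plotting. */
        public static void writeCurve(List<Double> trajectoryValues, String fileName)
                throws IOException {
            PrintWriter out = new PrintWriter(fileName);
            out.println("trial,value");
            for (int trial = 0; trial < trajectoryValues.size(); trial++) {
                out.println(trial + "," + trajectoryValues.get(trial));
            }
            out.close();
        }
    }
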
LOCATION
An $\langle x, y\rangle$ coordinate within a single MAP. All cells in a MAP are uniquely indexable by a single LOCATION.
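A minimal sketch of how a LOCATION might be represented as an immutable (x, y) pair; the class below is hypothetical, and any location type provided with the project takes precedence.

    public final class Location {
        private final int x;
        private final int y;

        public Location(int x, int y) { this.x = x; this.y = y; }

        public int getX() { return x; }
        public int getY() { return y; }

        // LOCATIONs index MAP cells, so value equality matters.
        @Override public boolean equals(Object o) {
            if (!(o instanceof Location)) return false;
            Location other = (Location) o;
            return x == other.x && y == other.y;
        }
        @Override public int hashCode() { return 31 * x + y; }
    }
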
MAP
A representation of the topology and geography of the world, including the LOCATIONs of obstacles, terrain types, and so on. To support multiple terrain types (and other cell contents), the MAP is implemented with a Java generic class that implements the generic interface GridWorld2d<T>.
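The method names below (getWidth, getHeight, getCell) are guesses used only to illustrate how a generic cell type T lets one MAP implementation support different kinds of terrain; the provided GridWorld2d<T> interface is the authoritative definition.

    // Illustrative stand-in for the provided GridWorld2d<T> interface; names are assumed.
    public interface GridWorld2dSketch<T> {
        int getWidth();            // number of cells along x
        int getHeight();           // number of cells along y
        T getCell(int x, int y);   // the terrain/contents stored at that LOCATION
    }
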
MAY
A requirement that the designer may choose to implement, but is not required to. Can also indicate a choice among acceptable alternatives (e.g., ``The program MAY do x, y, or z.'' indicates that the choice among behaviors x, y, and z is up to the designer).
MUST
A requirement that the product must implement for full credit.
MUST NOT
A behavior or assumption that must not be violated. Violating a MUST NOT restriction will result in a penalty on the assignment.
ORIENTATION
A direction with respect to the MAP. The AGENT's STATE consists partly of its ORIENTATION, and AGENT ACTIONs move it along some ORIENTATION. Synonymous with DIRECTION. Implemented in Java via the Direction enumeration.
POLICY
The function that tells the AGENT how to act at any STATE in the world. Given any STATE, the POLICY returns a single ACTION for that state (possibly chosen at random). Written $ a=\pi(s)$, where the POLICY is the function $ \pi$ and the ACTION is $ a$.
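In code, a POLICY is simply a mapping from STATE to ACTION. The interface below is a hypothetical illustration, not a provided class; the generic parameters stand in for the project's state and action types.

    // a = pi(s): given a STATE, return one ACTION (which may be chosen at random).
    public interface Policy<S, A> {
        A getAction(S state);
    }

A uniformly random POLICY, for instance, would ignore the STATE and return one of the five ACTIONs chosen at random.
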
RECOVERABLE ERROR
An error condition that the software can ignore, correct, or otherwise recover from. The program MUST produce a warning message and then cleanly continue with no corruption or loss of valid data.
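For illustration only, the following shows the required pattern for a RECOVERABLE ERROR: warn, then continue cleanly with the valid data. The specific condition (a malformed optional setting) and the class name are hypothetical.

    public final class RecoverableErrorExample {
        /** Falls back to a default when an optional setting is malformed. */
        static int parseOptionalSetting(String text, int defaultValue) {
            try {
                return Integer.parseInt(text);
            } catch (NumberFormatException e) {
                System.err.println("Warning: ignoring malformed setting '" + text + "'");
                return defaultValue;   // recover cleanly; valid data is untouched
            }
        }
    }
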
REINFORCEMENT LEARNING
A class of control algorithms for stochastic, dynamical systems that improve their performance over time through feedback in the form of positive and negative REWARDs.
REWARD
A unit of feedback provided to the AGENT, represented as a scalar real number (i.e., a double in Java). Positive REWARDs are desirable - the AGENT seeks out positive REWARDs. Negative REWARDs are penalties and the AGENT attempts to avoid them. Written $ R(s)$ for the reward received at STATE $ s$.
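A minimal sketch of $R(s)$ in Java; the RewardFunction name is hypothetical and the generic parameter stands in for the project's state type.

    // R(s): maps a STATE to a scalar double REWARD.
    public interface RewardFunction<S> {
        double reward(S state);
    }
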
RL
Abbreviation for REINFORCEMENT LEARNING.
SHOULD
A requirement that is recommended, but not required. The designer may violate a SHOULD requirement, but should be prepared to explain why.
SARS TUPLE
A single primitive tuple of experience: initial STATE, ACTION, REWARD, and next STATE. Written $\langle s, a, r, s'\rangle$. Implemented in Java via the SARSTuple interface.
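The accessor names below are assumptions sketched from this definition; the provided SARSTuple interface is authoritative and may use different names.

    // Illustrative stand-in for the provided SARSTuple interface.
    public interface SarsTupleSketch<S, A> {
        S getState();        // s  : the initial STATE
        A getAction();       // a  : the ACTION taken
        double getReward();  // r  : the REWARD received
        S getNextState();    // s' : the resulting STATE
    }
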
START STATE
The STATE at which an AGENT begins a TRIAL. Depending on the task, a START STATE may be at a fixed LOCATION in the MAP, may be drawn from a set of possible starting LOCATIONs, or may be chosen at random.
STATE
The description of an atomic configuration of the system at some time. In this project, the entire STATE is the pair of LOCATION and ORIENTATION of the AGENT.
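A hypothetical sketch of a STATE as an immutable (LOCATION, ORIENTATION) pair; the generic parameters L and D stand in for the project's location and Direction types.

    public final class StateSketch<L, D> {
        private final L location;      // the AGENT's LOCATION
        private final D orientation;   // the AGENT's ORIENTATION

        public StateSketch(L location, D orientation) {
            this.location = location;
            this.orientation = orientation;
        }
        public L getLocation()     { return location; }
        public D getOrientation()  { return orientation; }
    }
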
STEP
A single unit of ACTION by the AGENT. A single STEP generates a single SARS TUPLE.
TRAJECTORY
The ordered sequence of all LOCATIONs that an AGENT experiences during a single TRIAL. In some contexts, the sequence of all SARS TUPLEs that the AGENT experiences during a single TRIAL.
TRIAL
A contiguous sequence of experiences for an AGENT, beginning with a START STATE and running until the AGENT encounters a GOAL STATE. Each TRIAL is associated with a TRAJECTORY of AGENT LOCATIONs (or, alternatively, a TRAJECTORY of SARS TUPLEs).
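The loop below is a purely illustrative sketch of how a TRIAL ties several of the terms above together; the Simulator and Policy interfaces and the runTrial method are hypothetical stand-ins, not the project's WorldSimulator or any provided class.

    public final class TrialSketch {
        interface Simulator<S, A> {
            S getCurrentState();
            double act(A action);          // apply the ACTION, return the REWARD
            boolean isGoalState(S state);
        }
        interface Policy<S, A> {
            A getAction(S state);
        }

        /** Runs one TRIAL and returns the total REWARD collected along its TRAJECTORY. */
        static <S, A> double runTrial(Simulator<S, A> sim, Policy<S, A> policy) {
            double totalReward = 0.0;
            // The TRIAL starts at the START STATE and ends when a GOAL STATE is reached.
            while (!sim.isGoalState(sim.getCurrentState())) {
                A action = policy.getAction(sim.getCurrentState());  // one STEP ...
                totalReward += sim.act(action);                      // ... yields one SARS TUPLE of experience
            }
            return totalReward;
        }
    }
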
TUPLE
See SARS TUPLE.
UNRECOVERABLE ERROR
An error condition from which recovery is impossible. The program MUST produce an error message describing the condition and then cleanly halt.
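For illustration only, the required pattern for an UNRECOVERABLE ERROR: describe the condition, then halt cleanly. The missing-map-file condition and the class name are hypothetical.

    import java.io.File;

    public final class UnrecoverableErrorExample {
        /** Reports a fatal condition and halts cleanly. */
        static void requireMapFile(String path) {
            if (!new File(path).exists()) {
                System.err.println("Error: cannot find map file: " + path);
                System.exit(1);   // clean halt after describing the condition
            }
        }
    }
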
WORLD SIMULATOR
The module responsible for simulating the effects of an AGENT's ACTIONs on an environment specified by a MAP. This module is responsible for maintaining the MAP, tracking the AGENT's current STATE, updating the AGENT's STATE in response to its ACTIONs, and returning REWARDs to the AGENT. Implemented in Java via the WorldSimulator interface.
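The sketch below only mirrors the responsibilities listed in this definition; its method names (getAgentState, applyAction, isGoalState) are guesses, and the provided WorldSimulator interface is the authoritative definition.

    // Illustrative stand-in for the provided WorldSimulator interface; names are assumed.
    public interface WorldSimulatorSketch<S, A> {
        S getAgentState();             // the AGENT's current STATE
        double applyAction(A action);  // update the STATE per the ACTION and return the REWARD
        boolean isGoalState(S state);  // whether this STATE ends the current TRIAL
    }
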

Terran Lane 2005-10-18