Environments, Rewards, and Parameters
The following environments will be used for these investigations:
- BoringRect(rX,rY)
- A RectangleGridWorld2d<T> containing only IndoorFloor tiles.
rX and rY are the x and y ranges of the rectangular grid,
respectively.
- BoringTorus(cX,cY)
- The same as a BoringRect(cX,cY), except built on a
TorusGridWorld2d<T> environment.
- RoomRect(rX,rY,k)
- A RectangleGridWorld2d<T> containing both IndoorFloor and
IndoorWall tiles. rX and rY are the x and y ranges of the
rectangular grid, respectively. This environment contains
"rooms", where a room is a region of IndoorFloor tiles that
is almost completely surrounded by IndoorWall tiles. Rooms
are connected by "doorways", which are just single-tile gaps in the
wall.
- RoomTorus(cX,cY,k)
- The same as a
RoomRect(cX,cY,k) except built on a toroidal topology.
- OutdoorRect(rX,rY,pG,pB,pM,pR)
- A RectangleGridWorld2d<T> of size rX × rY containing
OutdoorTerrain tiles. The parameters pG, pB, pM, and pR
represent the proportion of Grass, Bush, Mud, and Rock tiles,
respectively. (Note that each proportion lies between 0 and 1
and pG + pB + pM + pR = 1.) The designer MAY choose any
non-trivial distribution of terrain types through the world and
MAY generate this distribution in any desired way: manually, with
a fixed terrain generation algorithm, randomly, etc.
- OutdoorTorus(cX,cY,pG,pB,pM,pR)
- The same as a
OutdoorRect(cX,cY,pG,pB,pM,pR) except built on a toroidal
topology.
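
As a rough illustration of the environment definitions above, the following Python sketch lays out an OutdoorRect-style tile grid at random and shows how a single move differs between the rectangular and toroidal topologies. Everything named here (make_outdoor_grid, step, the tile strings, the seed argument) is hypothetical and is not part of the RectangleGridWorld2d<T> or TorusGridWorld2d<T> interfaces. Random placement is only one of the generation methods the text permits, and blocking moves at the rectangle's edge is an assumption of this sketch.

```python
import random

TERRAIN = ("Grass", "Bush", "Mud", "Rock")


def make_outdoor_grid(rx, ry, pG, pB, pM, pR, seed=None):
    """Return an ry-by-rx grid (list of rows) of terrain names.

    Assumes the four proportions are non-negative and sum to 1.
    Tile counts are rounded down; any leftover tiles become Grass
    (an arbitrary choice made for this sketch).
    """
    props = (pG, pB, pM, pR)
    if any(p < 0 for p in props) or abs(sum(props) - 1.0) > 1e-9:
        raise ValueError("proportions must be non-negative and sum to 1")
    total = rx * ry
    tiles = []
    for name, p in zip(TERRAIN, props):
        tiles.extend([name] * int(p * total))
    tiles.extend(["Grass"] * (total - len(tiles)))
    # Scatter the tiles uniformly at random over the grid.
    random.Random(seed).shuffle(tiles)
    return [tiles[row * rx:(row + 1) * rx] for row in range(ry)]


def step(x, y, dx, dy, size_x, size_y, torus):
    """Move one tile: wrap around on a torus, stay put at a rectangle's edge."""
    nx, ny = x + dx, y + dy
    if torus:
        return nx % size_x, ny % size_y
    if 0 <= nx < size_x and 0 <= ny < size_y:
        return nx, ny
    return x, y
```

For example, step(0, 0, -1, 0, 5, 5, torus=True) wraps around to (4, 0), while the same move with torus=False leaves the position at (0, 0).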
The following REWARD functions will be used for these investigations:
- SingleGoalReward(gx,gy)
- A reward function consisting
of a single LOCATION at coordinate (gx,gy). When the AGENT
reaches that LOCATION, it receives a feedback of
; all other
LOCATIONs provide REWARDs of 0. (All ORIENTATIONs receive the same
REWARD at the LOCATION (gx,gy).) The rewarding LOCATION is also
the GOAL STATE - when the AGENT reaches it, the TRIAL ends. The
rewarding LOCATION remains the same across all TRIALs.
- RandomGoalReward
- A reward function that changes on
every TRIAL. At the beginning of a TRIAL, a single
coordinate is chosen. That coordinate becomes the GOAL STATE for that
TRIAL. The AGENT receives a REWARD of
when it reaches that GOAL
STATE LOCATION and 0 everywhere else. (All ORIENTATIONs receive the same
REWARD at the GOAL STATE.) Upon reaching the GOAL STATE, the TRIAL
ends and a new GOAL STATE is chosen. The GOAL STATE does not move
during a single TRIAL.
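
The two reward functions described above can be sketched in Python as follows. This is a minimal illustration, not the assignment's interface: the class and method names (reward, is_goal, start_trial) are hypothetical, the goal reward value is passed in by the caller as goal_reward rather than fixed here, and drawing the random goal uniformly over the whole grid is an assumption (the sketch does not check which tile type the goal lands on).

```python
import random


class SingleGoalReward:
    """Fixed goal at (gx, gy); the same goal is used on every trial."""

    def __init__(self, gx, gy, goal_reward):
        self.goal = (gx, gy)
        self.goal_reward = goal_reward  # supplied by the caller (assumption)

    def start_trial(self):
        pass  # the goal never moves between trials

    def reward(self, x, y, orientation=None):
        # Orientation is ignored: every orientation earns the same reward.
        return self.goal_reward if (x, y) == self.goal else 0

    def is_goal(self, x, y):
        # Reaching the goal location ends the trial.
        return (x, y) == self.goal


class RandomGoalReward(SingleGoalReward):
    """Goal is re-drawn at the start of each trial and fixed within it."""

    def __init__(self, size_x, size_y, goal_reward, seed=None):
        self.size_x, self.size_y = size_x, size_y
        self.rng = random.Random(seed)
        super().__init__(0, 0, goal_reward)
        self.start_trial()

    def start_trial(self):
        # Uniform choice over all coordinates (an assumption of this sketch).
        self.goal = (self.rng.randrange(self.size_x),
                     self.rng.randrange(self.size_y))
```

A trial loop would call start_trial() once before each trial, query reward(x, y) after every move, and stop the trial as soon as is_goal(x, y) returns True.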
For these experiments, it is sufficient to consider the parameter
settings
and
. The designer MAY choose to
experiment with other parameter settings.
Terran Lane
2005-10-18