Environments, Rewards, and Parameters

The following environments will be used for these investigations:

BoringRect(rX,rY)
A RectangleGridWorld2d<T> containing only IndoorFloor tiles. rX and rY are the $x$ and $y$ ranges of the rectangular grid, respectively.
BoringTorus(cX,cY)
The same as a BoringRect(rX,rY), except built on a
TorusGridWorld2d<T> environment.
RoomRect(rX,rY,k)
A RectangleGridWorld2d<T> containing both IndoorFloor and IndoorWall tiles. rX and rY are the $x$ and $y$ ranges of the rectangular grid, respectively. This environment contains $k$ "rooms", where a room is a region of IndoorFloor tiles that is almost completely surrounded by IndoorWall tiles. Rooms are connected by "doorways", which are single-tile gaps in the wall.
RoomTorus(cX,cY,k)
The same as a RoomRect(cX,cY,k) except built on a toroidal topology.
OutdoorRect(rX,rY,pG,pB,pM,pR)
A RectangleGridWorld2d<T> of size $rX \times rY$ containing OutdoorTerrain tiles. The parameters $pG$, $pB$, $pM$, and $pR$ represent the proportions of Grass, Bush, Mud, and Rock tiles, respectively. (Note that $0 \leq p_* \leq 1$ and $pG + pB + pM + pR = 1$.) The designer MAY choose any non-trivial distribution of terrain types throughout the world and MAY generate this distribution in any desired way: manually, with a fixed terrain-generation algorithm, randomly, etc. (A sketch of one such generator appears after this list.)
OutdoorTorus(cX,cY,pG,pB,pM,pR)
The same as an OutdoorRect(cX,cY,pG,pB,pM,pR), except built on a toroidal topology.
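
For illustration, here is a minimal sketch of one way to generate an OutdoorRect terrain layout by sampling each tile independently from the proportions (pG, pB, pM, pR). The assignment's actual RectangleGridWorld2d<T> and OutdoorTerrain classes are not reproduced here; the Terrain enum, the generateTerrain method, and the use of a plain 2-D array are all assumptions made for the sketch, not part of the course code.

import java.util.Random;

/** Illustrative terrain generator for an OutdoorRect-style world.
 *  All names here are hypothetical stand-ins for the assignment's
 *  RectangleGridWorld2d<T> / OutdoorTerrain classes. */
public class OutdoorTerrainSketch {

    enum Terrain { GRASS, BUSH, MUD, ROCK }   // stand-in for the OutdoorTerrain tile types

    /** Fill an rX-by-rY grid with terrain drawn i.i.d. from the
     *  proportions (pG, pB, pM, pR); the proportions must sum to 1. */
    static Terrain[][] generateTerrain(int rX, int rY,
                                       double pG, double pB, double pM, double pR,
                                       Random rng) {
        Terrain[][] grid = new Terrain[rX][rY];
        for (int x = 0; x < rX; x++) {
            for (int y = 0; y < rY; y++) {
                double u = rng.nextDouble();           // uniform draw in [0, 1)
                if (u < pG)                grid[x][y] = Terrain.GRASS;
                else if (u < pG + pB)      grid[x][y] = Terrain.BUSH;
                else if (u < pG + pB + pM) grid[x][y] = Terrain.MUD;
                else                       grid[x][y] = Terrain.ROCK;
            }
        }
        return grid;
    }

    public static void main(String[] args) {
        // Example: a 10x10 outdoor world that is mostly Grass.
        Terrain[][] world = generateTerrain(10, 10, 0.6, 0.2, 0.1, 0.1, new Random(42));
        System.out.println(world[0][0]);
    }
}

Any other generation scheme (hand-authored maps, noise-based generators, etc.) is equally acceptable, as stated above, as long as the resulting tile proportions match the requested parameters.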

The following REWARD functions will be used for these investigations:

SingleGoalReward(gx,gy)
A reward function consisting of a single LOCATION at coordinate $\langle gx, gy\rangle$. When the AGENT reaches that LOCATION, it receives a REWARD of $+1$; all other LOCATIONs provide a REWARD of 0. (All ORIENTATIONs receive the same REWARD at the LOCATION $\langle gx, gy\rangle$.) The rewarding LOCATION is also the GOAL STATE: when the AGENT reaches it, the TRIAL ends. The rewarding LOCATION remains the same across all TRIALs.
RandomGoalReward
A reward function that changes on every TRIAL. At the beginning of a TRIAL, a single $\langle x, y\rangle$ coordinate is chosen; that coordinate becomes the GOAL STATE for that TRIAL. The AGENT receives a REWARD of $+1$ when it reaches that GOAL STATE LOCATION and 0 everywhere else. (All ORIENTATIONs receive the same REWARD at the GOAL STATE.) Upon reaching the GOAL STATE, the TRIAL ends and a new GOAL STATE is chosen. The GOAL STATE does not move during a single TRIAL. (A sketch of both reward functions appears after this list.)
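
The sketch below illustrates both reward functions under the assumption that a state can be reduced to integer $(x, y)$ coordinates for reward purposes (consistent with all ORIENTATIONs receiving the same REWARD). The class and method names (SingleGoalReward, RandomGoalReward, reward, isGoal, beginTrial) are chosen for this sketch and need not match the assignment's actual interfaces.

import java.util.Random;

/** Illustrative reward functions; plain ints stand in for the
 *  assignment's LOCATION/STATE types. */
public class RewardSketch {

    /** SingleGoalReward(gx, gy): +1 at the fixed goal LOCATION, 0 elsewhere,
     *  regardless of ORIENTATION. The goal never changes across TRIALs. */
    static class SingleGoalReward {
        final int gx, gy;
        SingleGoalReward(int gx, int gy) { this.gx = gx; this.gy = gy; }

        double reward(int x, int y) { return (x == gx && y == gy) ? 1.0 : 0.0; }
        boolean isGoal(int x, int y) { return x == gx && y == gy; }   // reaching it ends the TRIAL
    }

    /** RandomGoalReward: a fresh goal LOCATION is drawn at the start of each
     *  TRIAL and stays fixed until that TRIAL ends. */
    static class RandomGoalReward {
        final int rX, rY;
        final Random rng = new Random();
        int gx, gy;

        RandomGoalReward(int rX, int rY) { this.rX = rX; this.rY = rY; beginTrial(); }

        void beginTrial() { gx = rng.nextInt(rX); gy = rng.nextInt(rY); }  // choose the new goal
        double reward(int x, int y) { return (x == gx && y == gy) ? 1.0 : 0.0; }
        boolean isGoal(int x, int y) { return x == gx && y == gy; }
    }
}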

For these experiments, it is sufficient to consider the parameter settings $\gamma = 0.95$ and $\alpha = 0.8$. The designer MAY choose to experiment with other parameter settings.
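This section does not name the learning algorithm that uses $\gamma$ and $\alpha$; assuming a standard tabular Q-learning update (a common choice for grid-world investigations like these), the parameters would appear as the discount factor and learning rate in the update $Q(s,a) \leftarrow Q(s,a) + \alpha\,[r + \gamma \max_{a'} Q(s',a') - Q(s,a)]$. A hedged sketch, with states flattened to integer indices for brevity:

/** Hedged sketch of one tabular Q-learning update using the suggested
 *  parameters. This assumes the standard Q-learning rule; the assignment
 *  may prescribe a different algorithm. */
public class QLearningSketch {
    static final double GAMMA = 0.95;   // discount factor (gamma above)
    static final double ALPHA = 0.8;    // learning rate (alpha above)

    /** Apply one update to Q(s, a) after observing reward r and next state sPrime. */
    static void update(double[][] q, int s, int a, double r, int sPrime) {
        double best = q[sPrime][0];
        for (double v : q[sPrime]) best = Math.max(best, v);  // max over next-state actions
        q[s][a] += ALPHA * (r + GAMMA * best - q[s][a]);
    }
}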

Terran Lane 2005-10-18