Results Report and Analysis Questions
The designer MUST provide a written report that answers the following
questions. All answers MUST be substantiated with empirical evidence.
The designer MAY choose whatever experiments she or he wishes to answer
the questions, but MUST describe:
- The structure of the experiment(s): environment, learning
algorithms, parameters, number of TRIALs, etc. Illustrations of
the environment may be useful here. All such illustrations MUST
be clearly labeled and captioned.
- The results of the experiment: What happened? How did
different parameter choices, environments, algorithms, etc. affect the outcome of the experiment? Tables and/or plots of
  result data may be useful to address this point, but be sure to
  label all columns, rows, axes, etc., provide descriptive captions
  and useful legends, and use visually distinct lines on plots. (A
  minimal plotting sketch appears after this list.)
- And, most importantly, how the experiment addresses the question
being asked: What new information does the experiment provide?
Why does the outcome of this experiment help us answer the
question?
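As one illustration of the kind of labeled, captioned plot described above, the sketch below draws learning curves with labeled axes, a legend, and visually distinct lines. It is only a minimal Python sketch: the file names, array shapes, and the two algorithm labels are assumptions made for the example, not part of the assignment.

    import numpy as np
    import matplotlib.pyplot as plt

    # Hypothetical result arrays: one row per TRIAL, one column per episode.
    q_returns = np.load("q_learning_returns.npy")        # shape (n_trials, n_episodes), assumed
    sarsa_returns = np.load("sarsa_lambda_returns.npy")   # same shape, assumed

    episodes = np.arange(q_returns.shape[1])

    plt.figure(figsize=(6, 4))
    # Mean over TRIALs, drawn with visually distinct line styles.
    plt.plot(episodes, q_returns.mean(axis=0), "b-", label="Q-learning")
    plt.plot(episodes, sarsa_returns.mean(axis=0), "r--", label="SARSA(lambda)")
    plt.xlabel("Episode")                                  # label every axis
    plt.ylabel("Mean total reward per episode")
    plt.title("Learning curves averaged over TRIALs")
    plt.legend()                                           # useful legend
    plt.tight_layout()
    plt.savefig("learning_curves.png")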
The report need not be any fixed length, but it MUST be long enough to
answer all of the questions. The report MUST be in an 11 or 12 point
Times Roman font, double-spaced, on 8.5 × 11-inch paper with one-inch
margins on all sides.
The designer MUST answer the following questions:
- For a fixed RL algorithm, fixed α and γ, and using the
SingleGoalReward REWARD function, is there any
difference between learning in the BoringRect(rX,rY),
RoomRect(rX,rY,5), and
OutdoorRect(rX,rY,0.25,0.25,0.25,0.25) environments?
  Why or why not? (See the experiment-harness sketch following these
  questions for one way to structure such a comparison.)
- In an OutdoorRect(30,30,0.25,0.25,0.25,0.25) environment with a
  SingleGoalReward(7,23), is there a difference in learning
  performance between the Q-learning algorithm and the SARSA(λ)
  algorithm? Why or why not? (A sketch of the two update rules
  appears after these questions.)
- For a fixed RL algorithm, the BoringRect(rX,rY) environment, and
  the SingleGoalReward(gx,gy) REWARD function, is there a difference
  in learning performance between , , , and ? Why or why not?
- For a fixed RL algorithm, fixed α and γ, and the
  OutdoorRect(rX,rY,0.25,0.25,0.25,0.25) environment, is there a
difference in learning performance between using the
RandomGoalReward function and a fixed START STATE versus using
the SingleGoalReward function and a random START STATE?
Why or why not?
- For a fixed learning algorithm, a fixed α and γ, and using
the RandomGoalReward REWARD function, is there a
difference between learning in the BoringRect(rX,rY)
and the BoringTorus(rX,rY) environments? Why or why
not? What about in RoomRect(rX,rY,6) versus
RoomTorus(rX,rY,6)? Or in
OutdoorRect(rX,rY,0.25,0.25,0.25,0.25) versus
OutdoorTorus(rX,rY,0.25,0.25,0.25,0.25)? If there are
different answers for Boring* versus Room*
versus Outdoor*, describe why those differences occur.
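To make the Q-learning versus SARSA(λ) comparison concrete, recall that the two algorithms differ in their update targets: Q-learning bootstraps off the greedy next action (off-policy), while SARSA(λ) bootstraps off the action actually taken next and spreads credit backward through eligibility traces (on-policy). The sketch below shows tabular versions of both updates in Python; the dictionary-of-arrays Q-table and the four-action grid encoding are assumptions made for illustration, not the assignment's required interface.

    import numpy as np
    from collections import defaultdict

    N_ACTIONS = 4  # e.g., up/down/left/right in a grid world (assumption)
    Q = defaultdict(lambda: np.zeros(N_ACTIONS))   # action-value table, states as hashable keys
    E = defaultdict(lambda: np.zeros(N_ACTIONS))   # eligibility traces for SARSA(lambda)

    def q_learning_update(Q, s, a, r, s_next, alpha, gamma):
        # Off-policy target: reward plus discounted value of the greedy next action.
        target = r + gamma * np.max(Q[s_next])
        Q[s][a] += alpha * (target - Q[s][a])

    def sarsa_lambda_update(Q, E, s, a, r, s_next, a_next, alpha, gamma, lam):
        # On-policy target: reward plus discounted value of the action actually taken next.
        delta = r + gamma * Q[s_next][a_next] - Q[s][a]
        E[s][a] += 1.0                             # accumulating trace for the visited pair
        for state in list(E.keys()):
            Q[state] += alpha * delta * E[state]   # credit every recently visited pair
            E[state] *= gamma * lam                # decay all traces
        # Traces are normally cleared at the start of each episode: E.clear()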
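Several of the questions above also call for running the same learner in different environments, over a number of TRIALs, and comparing average performance. The harness below is a minimal sketch of one way to organize that in Python; the reset/step environment interface, the epsilon-greedy policy, and all parameter values are hypothetical placeholders rather than the assignment's actual class names (BoringRect, OutdoorRect, etc. would sit behind the factory functions passed in).

    import numpy as np
    from collections import defaultdict

    def run_trial(env, n_episodes, alpha=0.1, gamma=0.9, epsilon=0.1, rng=None):
        """One TRIAL: train a tabular Q-learner, return per-episode total reward."""
        rng = rng or np.random.default_rng()
        Q = defaultdict(lambda: np.zeros(env.n_actions))   # states assumed hashable, e.g. (x, y)
        returns = np.zeros(n_episodes)
        for ep in range(n_episodes):
            s = env.reset()                    # hypothetical interface
            done, total = False, 0.0
            while not done:
                if rng.random() < epsilon:     # epsilon-greedy exploration
                    a = int(rng.integers(env.n_actions))
                else:
                    a = int(np.argmax(Q[s]))
                s_next, r, done = env.step(a)  # hypothetical interface
                target = r + gamma * np.max(Q[s_next]) * (not done)
                Q[s][a] += alpha * (target - Q[s][a])
                s, total = s_next, total + r
            returns[ep] = total
        return returns

    def compare(env_factories, n_trials=20, n_episodes=500):
        """Average learning curves over TRIALs for each named environment."""
        return {name: np.mean([run_trial(make_env(), n_episodes)
                               for _ in range(n_trials)], axis=0)
                for name, make_env in env_factories.items()}

The resulting averaged curves can then be tabulated or plotted (with labeled axes, captions, and legends) as described in the reporting requirements above.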
In addition, the designer MAY choose to investigate other properties
of the RL algorithms, environments, reward functions, or parameters.
Please discuss what motivates each investigation that is reported.
Terran Lane
2005-10-18