public class PotentialShapedRMax extends MDPSolver implements LearningAgent

See the KWIKModel class for more information on defining your own model.

1. John Asmuth, Michael L. Littman, and Robert Zinkov. "Potential-based Shaping in Model-based Reinforcement Learning." AAAI 2008.

## Nested Class Summary

| Modifier and Type | Class and Description |
|---|---|
| `static class` | `PotentialShapedRMax.RMaxPotential` A potential function for vanilla RMax; all states have a potential value of R_max/(1-gamma). |
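The `RMaxPotential` entry above assigns every state the constant optimistic value R_max/(1-gamma), the largest discounted return achievable when no single-step reward exceeds R_max. A minimal self-contained sketch of that arithmetic (an illustration only, not BURLAP's actual `RMaxPotential` class):

```java
public class RMaxPotentialSketch {
    private final double vMax;

    public RMaxPotentialSketch(double maxReward, double gamma) {
        // Geometric-series bound: sum of maxReward * gamma^t over t >= 0.
        this.vMax = maxReward / (1.0 - gamma);
    }

    // Every state receives the same optimistic potential value.
    public double potentialValue() {
        return this.vMax;
    }

    public static void main(String[] args) {
        RMaxPotentialSketch potential = new RMaxPotentialSketch(2.0, 0.5);
        System.out.println(potential.potentialValue()); // 2.0 / 0.5 = 4.0
    }
}
```

Because every state shares this upper-bound value, the potential never underestimates the optimal value, which is the optimism that drives RMax-style exploration toward unknown states.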
## Field Summary

| Modifier and Type | Field and Description |
|---|---|
| `protected java.util.LinkedList<Episode>` | `episodeHistory`: the saved previous learning episodes |
| `protected int` | `maxNumSteps`: the maximum number of learning steps per episode before the agent gives up |
| `protected RMaxModel` | `model`: the model of the world that is being learned |
| `protected RewardFunction` | `modeledRewardFunction`: the modeled reward function that is being learned |
| `protected TerminalFunction` | `modeledTerminalFunction`: the modeled terminal state function |
| `protected ModelLearningPlanner` | `modelPlanner`: the model-adaptive planning algorithm to use |
| `protected int` | `numEpisodesToStore`: the number of the most recent learning episodes to store |

Fields inherited from class `MDPSolver`: `actionTypes`, `debugCode`, `domain`, `gamma`, `hashingFactory`, `usingOptionModel`
## Constructor Summary

| Constructor and Description |
|---|
| `PotentialShapedRMax(SADomain domain, double gamma, HashableStateFactory hashingFactory, double maxReward, int nConfident, double maxVIDelta, int maxVIPasses)` Initializes for a tabular model, VI valueFunction, and the standard RMax paradigm. |
| `PotentialShapedRMax(SADomain domain, double gamma, HashableStateFactory hashingFactory, PotentialFunction potential, int nConfident, double maxVIDelta, int maxVIPasses)` Initializes for a tabular model, VI valueFunction, and a potential shaped function. |
| `PotentialShapedRMax(SADomain domain, HashableStateFactory hashingFactory, PotentialFunction potential, KWIKModel model, ModelLearningPlanner plannerGenerator)` Initializes for a given model, model learning planner, and potential shaped function. |
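Taken together with `runLearningEpisode(Environment)`, the first constructor suggests a learning loop like the sketch below. This is a hedged sketch, not a verbatim recipe: `buildDomain()` and `buildEnvironment()` are hypothetical stand-ins for whatever `SADomain` and `Environment` you already have, `SimpleHashableStateFactory` and `resetEnvironment()` are assumed from BURLAP's wider API, and only the `PotentialShapedRMax` constructor and `runLearningEpisode` come from this page.

```java
// Hypothetical helpers; substitute your own domain and environment setup.
SADomain domain = buildDomain();
Environment env = buildEnvironment();

// Standard RMax paradigm: maxReward 1.0, a transition becomes "known" after
// 5 observations, VI stops at a max delta of 0.01 or after 100 passes.
PotentialShapedRMax agent = new PotentialShapedRMax(
        domain, 0.99, new SimpleHashableStateFactory(),
        1.0, 5, 0.01, 100);

for (int i = 0; i < 50; i++) {
    Episode episode = agent.runLearningEpisode(env);
    env.resetEnvironment(); // begin the next episode from a fresh initial state
}
```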
## Method Summary

| Modifier and Type | Method and Description |
|---|---|
| `protected Policy` | `createUnmodeledFavoredPolicy()` |
| `java.util.List<Episode>` | `getAllStoredLearningEpisodes()` |
| `Episode` | `getLastLearningEpisode()` |
| `RMaxModel` | `getModel()` Returns the model learning algorithm being used. |
| `RewardFunction` | `getModeledRewardFunction()` Returns the model reward function. |
| `TerminalFunction` | `getModeledTerminalFunction()` Returns the model terminal function. |
| `ModelLearningPlanner` | `getModelPlanner()` Returns the planning algorithm used on the model, which can be iteratively updated as the model changes. |
| `void` | `resetSolver()` Resets all solver results so that the solver can be restarted fresh, as if it had never solved the MDP. |
| `Episode` | `runLearningEpisode(Environment env)` |
| `Episode` | `runLearningEpisode(Environment env, int maxSteps)` |
| `void` | `setNumEpisodesToStore(int numEps)` |

Methods inherited from class `MDPSolver`: `addActionType`, `applicableActions`, `getActionTypes`, `getDebugCode`, `getDomain`, `getGamma`, `getHashingFactory`, `setActionTypes`, `setDebugCode`, `setDomain`, `setGamma`, `setHashingFactory`, `setModel`, `solverInit`, `stateHash`, `toggleDebugPrinting`
## Field Detail

protected RMaxModel model
protected RewardFunction modeledRewardFunction
protected TerminalFunction modeledTerminalFunction
protected ModelLearningPlanner modelPlanner
protected int maxNumSteps
protected java.util.LinkedList<Episode> episodeHistory
protected int numEpisodesToStore
## Constructor Detail

public PotentialShapedRMax(SADomain domain, double gamma, HashableStateFactory hashingFactory, double maxReward, int nConfident, double maxVIDelta, int maxVIPasses)

Parameters:
- `domain` - the real world domain
- `gamma` - the discount factor
- `hashingFactory` - the hashing factory to use for VI and the tabular model
- `maxReward` - the maximum possible reward
- `nConfident` - the number of observations required for the model to be confident in a transition
- `maxVIDelta` - the maximum change in the value function for VI to terminate
- `maxVIPasses` - the maximum number of VI iterations per replan

public PotentialShapedRMax(SADomain domain, double gamma, HashableStateFactory hashingFactory, PotentialFunction potential, int nConfident, double maxVIDelta, int maxVIPasses)
Parameters:
- `domain` - the real world domain
- `gamma` - the discount factor
- `hashingFactory` - the hashing factory to use for VI and the tabular model
- `potential` - the admissible potential function
- `nConfident` - the number of observations required for the model to be confident in a transition
- `maxVIDelta` - the maximum change in the value function for VI to terminate
- `maxVIPasses` - the maximum number of VI iterations per replan

public PotentialShapedRMax(SADomain domain, HashableStateFactory hashingFactory, PotentialFunction potential, KWIKModel model, ModelLearningPlanner plannerGenerator)
Parameters:
- `domain` - the real world domain
- `hashingFactory` - a state hashing factory for indexing states
- `potential` - the admissible potential function
- `model` - the model/model-learning algorithm to use
- `plannerGenerator` - a generator for a model valueFunction

## Method Detail

public RMaxModel getModel()
Specified by:
- `getModel` in interface `MDPSolverInterface`

Overrides:
- `getModel` in class `MDPSolver`
public ModelLearningPlanner getModelPlanner()
public RewardFunction getModeledRewardFunction()
public TerminalFunction getModeledTerminalFunction()
public Episode runLearningEpisode(Environment env)
Specified by:
- `runLearningEpisode` in interface `LearningAgent`
public Episode runLearningEpisode(Environment env, int maxSteps)
Specified by:
- `runLearningEpisode` in interface `LearningAgent`
protected Policy createUnmodeledFavoredPolicy()
public Episode getLastLearningEpisode()
public void setNumEpisodesToStore(int numEps)
public java.util.List<Episode> getAllStoredLearningEpisodes()
public void resetSolver()
Description copied from interface: `MDPSolverInterface`. This method resets all solver results so that the solver can be restarted fresh, as if it had never solved the MDP.

Specified by:
- `resetSolver` in interface `MDPSolverInterface`

Overrides:
- `resetSolver` in class `MDPSolver`
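For intuition on the role of the potential function in the cited Asmuth, Littman, and Zinkov paper: potential-based shaping replaces the reward r for a transition s -> s' with r + gamma * Phi(s') - Phi(s), which leaves optimal policies unchanged. The following self-contained sketch shows just that arithmetic (an illustration of the shaping formula, not BURLAP code):

```java
public class ShapingSketch {

    // Potential-based shaping: r' = r + gamma * Phi(s') - Phi(s).
    static double shapedReward(double r, double gamma,
                               double phiS, double phiSPrime) {
        return r + gamma * phiSPrime - phiS;
    }

    public static void main(String[] args) {
        double gamma = 0.5;
        // With the constant vanilla-RMax potential Phi = Rmax / (1 - gamma),
        // Phi(s) == Phi(s') everywhere, so shaping shifts every reward by the
        // same constant (gamma - 1) * Phi and cannot change the best policy.
        double phi = 1.0 / (1.0 - gamma); // 2.0
        System.out.println(shapedReward(1.0, gamma, phi, phi)); // 1.0 + 1.0 - 2.0 = 0.0
    }
}
```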