public class PotentialShapedRMax extends MDPSolver implements LearningAgent
See the Model class for more information on defining your own model.

1. John Asmuth, Michael L. Littman, and Robert Zinkov. "Potential-based Shaping in Model-based Reinforcement Learning." AAAI. 2008.

Modifier and Type | Class and Description
---|---
`protected class` | `PotentialShapedRMax.PotentialShapedRMaxRF` A special version of a potential-shaped reward function that does not remove the potential value for transitions into states whose followed actions have unknown transitions.
`class` | `PotentialShapedRMax.PotentialShapedRMaxTerminal` A terminal function that treats transitions to RMax fictitious nodes as terminal states, in addition to what the model reports as terminal states.
`static class` | `PotentialShapedRMax.RMaxPotential` A potential function for vanilla RMax; all states have a potential value of R_max/(1-gamma).
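The shaping machinery behind these nested classes is compact: a potential function Φ turns a raw reward r for a transition s → s' into r + γΦ(s') − Φ(s), and the vanilla RMax potential assigns every state Φ(s) = R_max/(1−γ). A minimal self-contained sketch of both computations, with illustrative method names that are not part of BURLAP's API:

```java
public class ShapingSketch {
    // Vanilla RMax potential: every state gets the maximum possible
    // discounted return, R_max / (1 - gamma).
    static double rmaxPotential(double maxReward, double gamma) {
        return maxReward / (1.0 - gamma);
    }

    // Potential-based shaping: augment the raw reward with the
    // discounted change in potential between source and target state.
    static double shapedReward(double rawReward, double gamma,
                               double potentialS, double potentialSPrime) {
        return rawReward + gamma * potentialSPrime - potentialS;
    }

    public static void main(String[] args) {
        double gamma = 0.5;
        double phi = rmaxPotential(1.0, gamma); // 1 / (1 - 0.5) = 2.0
        // Under a constant potential, a zero-reward transition is shaped
        // down by (1 - gamma) * phi, which here equals R_max itself.
        System.out.println(shapedReward(0.0, gamma, phi, phi)); // prints -1.0
    }
}
```

This is why unknown (fictitious) states, which keep their full potential, look more attractive to the agent than known states whose potential has been "paid off".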
Modifier and Type | Field and Description
---|---
`protected java.util.LinkedList<EpisodeAnalysis>` | `episodeHistory` The saved previous learning episodes.
`protected int` | `maxNumSteps` The maximum number of learning steps per episode before the agent gives up.
`protected Model` | `model` The model of the world that is being learned.
`protected Domain` | `modeledDomain` The modeled domain object containing the modeled actions that a valueFunction will use.
`protected RewardFunction` | `modeledRewardFunction` The modeled reward function that is being learned.
`protected TerminalFunction` | `modeledTerminalFunction` The modeled terminal state function.
`protected ModelLearningPlanner` | `modelPlanner` The model-adaptive planning algorithm to use.
`protected int` | `numEpisodesToStore` The number of the most recent learning episodes to store.
Fields inherited from class MDPSolver: `actions, debugCode, domain, gamma, hashingFactory, mapToStateIndex, rf, tf`
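The `episodeHistory` and `numEpisodesToStore` fields above suggest a bounded, first-in-first-out episode store. A minimal self-contained sketch of that behavior, with strings standing in for `EpisodeAnalysis` objects (illustrative only, not BURLAP's implementation):

```java
import java.util.LinkedList;

public class EpisodeHistorySketch {
    private final LinkedList<String> episodeHistory = new LinkedList<>();
    private final int numEpisodesToStore;

    EpisodeHistorySketch(int numEpisodesToStore) {
        this.numEpisodesToStore = numEpisodesToStore;
    }

    // Append the latest episode, evicting the oldest once the cap is hit.
    void record(String episode) {
        if (episodeHistory.size() >= numEpisodesToStore) {
            episodeHistory.removeFirst();
        }
        episodeHistory.addLast(episode);
    }

    String lastLearningEpisode() { return episodeHistory.getLast(); }

    int stored() { return episodeHistory.size(); }
}
```

A `LinkedList` suits this access pattern: evictions and appends touch only the ends of the list, so both are constant-time.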
Constructor and Description
---
`PotentialShapedRMax(Domain domain, double gamma, HashableStateFactory hashingFactory, double maxReward, int nConfident, double maxVIDelta, int maxVIPasses)` Initializes for a tabular model, VI valueFunction, and the standard RMax paradigm.
`PotentialShapedRMax(Domain domain, double gamma, HashableStateFactory hashingFactory, PotentialFunction potential, int nConfident, double maxVIDelta, int maxVIPasses)` Initializes for a tabular model, VI valueFunction, and a potential-shaped function.
`PotentialShapedRMax(Domain domain, HashableStateFactory hashingFactory, PotentialFunction potential, Model model, ModelLearningPlanner plannerGenerator)` Initializes for a given model, model learning planner, and potential-shaped function.
Modifier and Type | Method and Description
---|---
`protected Policy` | `createUnmodeledFavoredPolicy()`
`java.util.List<EpisodeAnalysis>` | `getAllStoredLearningEpisodes()`
`EpisodeAnalysis` | `getLastLearningEpisode()`
`Model` | `getModel()` Returns the model learning algorithm being used.
`Domain` | `getModeledDomain()` Returns the model domain for planning.
`RewardFunction` | `getModeledRewardFunction()` Returns the model reward function.
`TerminalFunction` | `getModeledTerminalFunction()` Returns the model terminal function.
`ModelLearningPlanner` | `getModelPlanner()` Returns the planning algorithm used on the model, which can be iteratively updated as the model changes.
`void` | `resetSolver()` Resets all solver results so that the solver can be restarted fresh, as if it had never solved the MDP.
`EpisodeAnalysis` | `runLearningEpisode(Environment env)`
`EpisodeAnalysis` | `runLearningEpisode(Environment env, int maxSteps)`
`void` | `setNumEpisodesToStore(int numEps)`
Methods inherited from class MDPSolver: `addNonDomainReferencedAction, getActions, getAllGroundedActions, getDebugCode, getDomain, getGamma, getHashingFactory, getRf, getRF, getTf, getTF, setActions, setDebugCode, setDomain, setGamma, setHashingFactory, setRf, setTf, solverInit, stateHash, toggleDebugPrinting, translateAction`
protected Model model
protected Domain modeledDomain
protected RewardFunction modeledRewardFunction
protected TerminalFunction modeledTerminalFunction
protected ModelLearningPlanner modelPlanner
protected int maxNumSteps
protected java.util.LinkedList<EpisodeAnalysis> episodeHistory
protected int numEpisodesToStore
public PotentialShapedRMax(Domain domain, double gamma, HashableStateFactory hashingFactory, double maxReward, int nConfident, double maxVIDelta, int maxVIPasses)

Parameters:
- domain - the real world domain
- gamma - the discount factor
- hashingFactory - the hashing factory to use for VI and the tabular model
- maxReward - the maximum possible reward
- nConfident - the number of observations required for the model to be confident in a transition
- maxVIDelta - the maximum change in value function for VI to terminate
- maxVIPasses - the maximum number of VI iterations per replan

public PotentialShapedRMax(Domain domain, double gamma, HashableStateFactory hashingFactory, PotentialFunction potential, int nConfident, double maxVIDelta, int maxVIPasses)

Parameters:
- domain - the real world domain
- gamma - the discount factor
- hashingFactory - the hashing factory to use for VI and the tabular model
- potential - the admissible potential function
- nConfident - the number of observations required for the model to be confident in a transition
- maxVIDelta - the maximum change in value function for VI to terminate
- maxVIPasses - the maximum number of VI iterations per replan

public PotentialShapedRMax(Domain domain, HashableStateFactory hashingFactory, PotentialFunction potential, Model model, ModelLearningPlanner plannerGenerator)

Parameters:
- domain - the real world domain
- hashingFactory - a state hashing factory for indexing states
- potential - the admissible potential function
- model - the model/model-learning algorithm to use
- plannerGenerator - a generator for a model valueFunction

public Model getModel()
public Domain getModeledDomain()
public ModelLearningPlanner getModelPlanner()
public RewardFunction getModeledRewardFunction()
public TerminalFunction getModeledTerminalFunction()
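The `nConfident` constructor parameter implies a count-based notion of "known" transitions: the tabular model only trusts a state-action pair once it has been observed enough times, and everything else is treated optimistically. A minimal self-contained sketch of that bookkeeping, with string state/action keys that are illustrative rather than the tabular model's actual representation:

```java
import java.util.HashMap;
import java.util.Map;

public class KnownnessSketch {
    private final int nConfident;
    private final Map<String, Integer> counts = new HashMap<>();

    KnownnessSketch(int nConfident) { this.nConfident = nConfident; }

    // Record one observed execution of action a in state s.
    void observe(String s, String a) {
        counts.merge(s + "|" + a, 1, Integer::sum);
    }

    // The model is confident in (s, a) after nConfident observations;
    // unconfident pairs are the ones the optimistic shaping steers
    // the agent toward exploring.
    boolean isKnown(String s, String a) {
        return counts.getOrDefault(s + "|" + a, 0) >= nConfident;
    }
}
```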
public EpisodeAnalysis runLearningEpisode(Environment env)

Specified by: runLearningEpisode in interface LearningAgent

public EpisodeAnalysis runLearningEpisode(Environment env, int maxSteps)

Specified by: runLearningEpisode in interface LearningAgent
protected Policy createUnmodeledFavoredPolicy()
public EpisodeAnalysis getLastLearningEpisode()
public void setNumEpisodesToStore(int numEps)
public java.util.List<EpisodeAnalysis> getAllStoredLearningEpisodes()
public void resetSolver()

Description copied from interface: MDPSolverInterface

Specified by: resetSolver in interface MDPSolverInterface
Overrides: resetSolver in class MDPSolver