public class MAValueIteration extends MADynamicProgramming
This class extends the MADynamicProgramming class to provide value iteration-like value function estimation. When an input state is provided via the planFromState(State) method and that state has already been seen and planned for, nothing happens. If the state has never been seen before, a state reachability analysis is performed first to find all states that may be reachable from the input state. Value iteration then proceeds over all states that have been found so far.
The runVI() method can also be called directly to force value iteration over all previously found states, but state reachability must have been performed at least once beforehand to seed the state space. State reachability can be performed manually by calling the performStateReachabilityFrom(State) method.
Value iteration continues until either the maximum change in Q-value is less than a user-provided threshold or a maximum number of iterations has passed.
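The stopping rule described above can be sketched on a toy problem. The following is a minimal, self-contained illustration and not BURLAP code: it assumes a hypothetical three-state, single-agent deterministic chain in place of a stochastic game, and a plain max over next-state Q-values in place of an SGBackupOperator. The class name ViTerminationSketch and the NEXT and REWARD tables are invented for the example; only the maxDelta and maxIterations parameters mirror the fields documented here.

```java
import java.util.Arrays;

/** Toy stand-in (not BURLAP code): Q-value sweeps over a 3-state deterministic chain. */
final class ViTerminationSketch {

    // Hypothetical model: NEXT[s][a] is the successor state, REWARD[s][a] the immediate reward.
    // State 2 is absorbing and yields no reward.
    static final int[][] NEXT = {{0, 1}, {0, 2}, {2, 2}};
    static final double[][] REWARD = {{0.0, 0.0}, {0.0, 1.0}, {0.0, 0.0}};

    /**
     * Sweeps over all state-action pairs, updating q in place, until the largest
     * Q-value change in a sweep is below maxDelta or maxIterations sweeps have run.
     * Returns the number of sweeps performed.
     */
    static int runVI(double[][] q, double discount, double maxDelta, int maxIterations) {
        int sweeps = 0;
        for (int i = 0; i < maxIterations; i++) {
            double delta = 0.0; // largest change seen in this sweep
            for (int s = 0; s < q.length; s++) {
                for (int a = 0; a < q[s].length; a++) {
                    int sp = NEXT[s][a];
                    // Assumption: a greedy max backup stands in for the solution concept.
                    double best = Arrays.stream(q[sp]).max().getAsDouble();
                    double updated = REWARD[s][a] + discount * best;
                    delta = Math.max(delta, Math.abs(updated - q[s][a]));
                    q[s][a] = updated;
                }
            }
            sweeps++;
            if (delta < maxDelta) {
                break; // threshold reached before the iteration cap
            }
        }
        return sweeps;
    }
}
```

With a discount of 0.9, a maxDelta of 1e-6, and a maxIterations of 100, this loop stops on the threshold well before the cap; with maxIterations set to 2 it stops on the cap instead.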
Nested classes inherited from MADynamicProgramming: MADynamicProgramming.BackupBasedQSource, MADynamicProgramming.JointActionTransitions
Modifier and Type | Field and Description |
---|---|
protected int | debugCode: The debug code used for printing VI progress. |
protected double | maxDelta: The threshold that causes VI to terminate when the max change in Q-value is less than it. |
protected int | maxIterations: The maximum allowable number of iterations until VI termination. |
protected java.util.Set<HashableState> | states: The set of states that have been found. |
Fields inherited from MADynamicProgramming: agentDefinitions, backupOperator, discount, domain, hashingFactory, jointModel, jointRewardFunction, planningStarted, qSources, terminalFunction, vInit
Constructor and Description |
---|
MAValueIteration(SGDomain domain, JointRewardFunction jointRewardFunction, TerminalFunction terminalFunction, double discount, HashableStateFactory hashingFactory, double qInit, SGBackupOperator backupOperator, double maxDelta, int maxIterations): Initializes. |
MAValueIteration(SGDomain domain, JointRewardFunction jointRewardFunction, TerminalFunction terminalFunction, double discount, HashableStateFactory hashingFactory, ValueFunction qInit, SGBackupOperator backupOperator, double maxDelta, int maxIterations): Initializes. |
MAValueIteration(SGDomain domain, java.util.List<SGAgentType> agentDefinitions, JointRewardFunction jointRewardFunction, TerminalFunction terminalFunction, double discount, HashableStateFactory hashingFactory, double vInit, SGBackupOperator backupOperator, double maxDelta, int maxIterations): Initializes. |
MAValueIteration(SGDomain domain, java.util.List<SGAgentType> agentDefinitions, JointRewardFunction jointRewardFunction, TerminalFunction terminalFunction, double discount, HashableStateFactory hashingFactory, ValueFunction vInit, SGBackupOperator backupOperator, double maxDelta, int maxIterations): Initializes. |
Modifier and Type | Method and Description |
---|---|
boolean | performStateReachabilityFrom(State s): Finds and stores all states that are reachable from input state s. |
void | planFromState(State s): Calling this method causes planning to be performed from State s. |
void | runVI(): Runs Value Iteration over the set of states that have been discovered. |
Methods inherited from MADynamicProgramming: backupAllValueFunctions, getQSources, hasStartedPlanning, initMAVF, setAgentDefinitions
protected java.util.Set<HashableState> states
protected double maxDelta
protected int maxIterations
protected int debugCode
public MAValueIteration(SGDomain domain, JointRewardFunction jointRewardFunction, TerminalFunction terminalFunction, double discount, HashableStateFactory hashingFactory, double qInit, SGBackupOperator backupOperator, double maxDelta, int maxIterations)
domain - the domain in which to perform planning
jointRewardFunction - the joint reward function
terminalFunction - the terminal state function
discount - the discount factor
hashingFactory - the hashing factory to use for storing states
qInit - the default Q-value to initialize all values to
backupOperator - the backup operator that defines the solution concept being solved
maxDelta - the threshold that causes VI to terminate when the max Q-value change is less than it
maxIterations - the maximum number of iterations allowed
public MAValueIteration(SGDomain domain, JointRewardFunction jointRewardFunction, TerminalFunction terminalFunction, double discount, HashableStateFactory hashingFactory, ValueFunction qInit, SGBackupOperator backupOperator, double maxDelta, int maxIterations)
domain - the domain in which to perform planning
jointRewardFunction - the joint reward function
terminalFunction - the terminal state function
discount - the discount factor
hashingFactory - the hashing factory to use for storing states
qInit - the Q-value initialization function to use
backupOperator - the backup operator that defines the solution concept being solved
maxDelta - the threshold that causes VI to terminate when the max Q-value change is less than it
maxIterations - the maximum number of iterations allowed
public MAValueIteration(SGDomain domain, java.util.List<SGAgentType> agentDefinitions, JointRewardFunction jointRewardFunction, TerminalFunction terminalFunction, double discount, HashableStateFactory hashingFactory, double vInit, SGBackupOperator backupOperator, double maxDelta, int maxIterations)
domain - the domain in which to perform planning
agentDefinitions - the agents involved in the planning problem
jointRewardFunction - the joint reward function
terminalFunction - the terminal state function
discount - the discount factor
hashingFactory - the hashing factory to use for storing states
vInit - the default value to initialize all state values to
backupOperator - the backup operator that defines the solution concept being solved
maxDelta - the threshold that causes VI to terminate when the max Q-value change is less than it
maxIterations - the maximum number of iterations allowed
public MAValueIteration(SGDomain domain, java.util.List<SGAgentType> agentDefinitions, JointRewardFunction jointRewardFunction, TerminalFunction terminalFunction, double discount, HashableStateFactory hashingFactory, ValueFunction vInit, SGBackupOperator backupOperator, double maxDelta, int maxIterations)
domain - the domain in which to perform planning
agentDefinitions - the agents involved in the planning problem
jointRewardFunction - the joint reward function
terminalFunction - the terminal state function
discount - the discount factor
hashingFactory - the hashing factory to use for storing states
vInit - the state value initialization function to use
backupOperator - the backup operator that defines the solution concept being solved
maxDelta - the threshold that causes VI to terminate when the max Q-value change is less than it
maxIterations - the maximum number of iterations allowed
public void planFromState(State s)
planFromState in class MADynamicProgramming
s - the state from which planning is to be performed
public void runVI()
If performStateReachabilityFrom(State) has not yet been called, then the state set will be empty and a runtime exception will be thrown.
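The interplay between planFromState, runVI, and the state set can be sketched as follows. This is a hypothetical stand-in rather than the class's actual source: states are plain integers, the reachability step is a placeholder that indexes s and s + 1, and the choice of IllegalStateException is an assumption (the documentation only says that a runtime exception is thrown). The viRuns counter exists only so the example can be observed.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch of the planner's control flow; not BURLAP code. */
final class PlanFlowSketch {

    private final Set<Integer> states = new HashSet<>(); // states found so far
    int viRuns = 0;                                       // counts value iteration runs

    /** Placeholder reachability: indexes s and s + 1. Returns true if s was new. */
    boolean performStateReachabilityFrom(int s) {
        if (states.contains(s)) {
            return false; // already indexed, nothing to do
        }
        states.add(s);
        states.add(s + 1);
        return true;
    }

    /** Plans only when the input state has not been seen before. */
    void planFromState(int s) {
        if (performStateReachabilityFrom(s)) {
            runVI();
        }
    }

    /** Refuses to run on an empty state set, as the documentation above describes. */
    void runVI() {
        if (states.isEmpty()) {
            // Assumption: the concrete exception type is not stated in the documentation.
            throw new IllegalStateException("No states found; run state reachability first.");
        }
        viRuns++; // a real planner would sweep over the states here
    }
}
```

In this sketch, calling runVI() on a fresh planner throws, while repeated calls to planFromState with an already indexed state leave viRuns unchanged.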
public boolean performStateReachabilityFrom(State s)
s - the state from which all reachable states will be indexed
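A reachability analysis of this kind is commonly written as a breadth-first expansion over a hashed state set. The sketch below is an illustration under stated assumptions, not the method's actual implementation: states are integers, the hypothetical successors map replaces the joint action model, and the boolean result is assumed to mean "the start state was not previously indexed", which the documentation above does not spell out.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical breadth-first reachability sketch; not BURLAP code. */
final class ReachabilitySketch {

    /**
     * Adds every state reachable from start to found.
     * Returns true if start was new, false if it had already been indexed.
     */
    static boolean indexReachable(int start,
                                  Map<Integer, List<Integer>> successors,
                                  Set<Integer> found) {
        if (!found.add(start)) {
            return false; // start already indexed, so the set is unchanged
        }
        Deque<Integer> open = new ArrayDeque<>();
        open.add(start);
        while (!open.isEmpty()) {
            int s = open.poll();
            // States with no entry in the map are treated as having no successors.
            for (int next : successors.getOrDefault(s, List.of())) {
                if (found.add(next)) {
                    open.add(next); // expand each state only once
                }
            }
        }
        return true;
    }
}
```

Storing states in a set keyed by hash is what keeps each state from being expanded more than once, which mirrors the role the hashingFactory parameter plays for the states field.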