MAValueIteration

java.lang.Object
- burlap.behavior.stochasticgames.madynamicprogramming.MADynamicProgramming
- - burlap.behavior.stochasticgames.madynamicprogramming.dpplanners.MAValueIteration

All Implemented Interfaces:

MultiAgentQSourceProvider
```
public class MAValueIteration
extends MADynamicProgramming
```
A class for performing multi-agent value iteration. This class extends the MADynamicProgramming class to provide value-iteration-style value function estimation. When an input state is provided via the planFromState(State) method and that state has already been seen and planned for, nothing happens. If the state has never been seen before, a state reachability analysis is performed first to find all states possibly reachable from the input state; value iteration then proceeds over all states that have been found so far. The runVI() method can also be called directly to force value iteration over all previously found states, but state reachability must have been performed at least once beforehand to seed the state space. State reachability can be performed manually by calling the performStateReachabilityFrom(State) method.
Value iteration continues until either the maximum change in Q-value is less than a user-provided threshold or a maximum number of iterations has passed.
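
The two stopping conditions can be illustrated with a minimal, self-contained sketch. It does not use the BURLAP API: the class name `ViTerminationSketch`, the array-based deterministic model, and the single-agent max backup are all assumptions made for illustration, whereas this class delegates the backup to the configured SGBackupOperator.

```java
// Illustrative sketch only; not BURLAP code. All names here are hypothetical.
class ViTerminationSketch {

    /** Value of a state under a max backup; a state with no actions is treated as terminal (value 0). */
    static double maxOf(double[] row) {
        if (row.length == 0) return 0.0;
        double best = row[0];
        for (double v : row) best = Math.max(best, v);
        return best;
    }

    /**
     * Sweeps all Q-values until the largest change in a sweep is below maxDelta
     * or maxIterations sweeps have run. next[s][a] is the successor state and
     * reward[s][a] the reward of action a in state s (assumed deterministic model).
     * Returns the number of sweeps performed.
     */
    static int runVI(int[][] next, double[][] reward, double[][] q,
                     double discount, double maxDelta, int maxIterations) {
        int sweeps = 0;
        for (int i = 0; i < maxIterations; i++) {
            double delta = 0.0;
            for (int s = 0; s < next.length; s++) {
                for (int a = 0; a < next[s].length; a++) {
                    double updated = reward[s][a] + discount * maxOf(q[next[s][a]]);
                    delta = Math.max(delta, Math.abs(updated - q[s][a]));
                    q[s][a] = updated;
                }
            }
            sweeps++;
            if (delta < maxDelta) break; // converged before hitting the iteration cap
        }
        return sweeps;
    }

    public static void main(String[] args) {
        // Chain 0 -> 1 -> 2, where state 2 is terminal and the last step pays reward 1.
        int[][] next = {{1}, {2}, {}};
        double[][] reward = {{0.0}, {1.0}, {}};
        double[][] q = {{0.0}, {0.0}, {}};
        int sweeps = runVI(next, reward, q, 0.5, 1e-3, 100);
        System.out.println(sweeps + " sweeps, Q(0,0) = " + q[0][0]);
    }
}
```

On this three-state chain the threshold is met after three sweeps; lowering the hypothetical `maxIterations` argument to 1 stops the loop after a single sweep regardless of the remaining change.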

Author:

James MacGlashan

Nested Class Summary
- Nested classes/interfaces inherited from class burlap.behavior.stochasticgames.madynamicprogramming.MADynamicProgramming
  MADynamicProgramming.BackupBasedQSource, MADynamicProgramming.JointActionTransitions

Field Summary

Fields
Modifier and Type	Field and Description
`protected int`	`debugCode` The debug code used for printing VI progress.
`protected double`	`maxDelta` The threshold that will cause VI to terminate when the max change in Q-value is less than it
`protected int`	`maxIterations` The maximum allowable number of iterations until VI termination
`protected java.util.Set<HashableState>`	`states` The set of states that have been found

Fields inherited from class burlap.behavior.stochasticgames.madynamicprogramming.MADynamicProgramming
agentDefinitions, backupOperator, discount, domain, hashingFactory, jointModel, jointRewardFunction, planningStarted, qSources, terminalFunction, vInit

Constructor Summary

Constructors
Constructor and Description
`MAValueIteration(SGDomain domain, JointRewardFunction jointRewardFunction, TerminalFunction terminalFunction, double discount, HashableStateFactory hashingFactory, double qInit, SGBackupOperator backupOperator, double maxDelta, int maxIterations)` Initializes.
`MAValueIteration(SGDomain domain, JointRewardFunction jointRewardFunction, TerminalFunction terminalFunction, double discount, HashableStateFactory hashingFactory, ValueFunction qInit, SGBackupOperator backupOperator, double maxDelta, int maxIterations)` Initializes.
`MAValueIteration(SGDomain domain, java.util.List<SGAgentType> agentDefinitions, JointRewardFunction jointRewardFunction, TerminalFunction terminalFunction, double discount, HashableStateFactory hashingFactory, double vInit, SGBackupOperator backupOperator, double maxDelta, int maxIterations)` Initializes.
`MAValueIteration(SGDomain domain, java.util.List<SGAgentType> agentDefinitions, JointRewardFunction jointRewardFunction, TerminalFunction terminalFunction, double discount, HashableStateFactory hashingFactory, ValueFunction vInit, SGBackupOperator backupOperator, double maxDelta, int maxIterations)` Initializes.

Method Summary

Modifier and Type	Method and Description
`boolean`	`performStateReachabilityFrom(State s)` Finds and stores all states that are reachable from input state s.
`void`	`planFromState(State s)` Calling this method causes planning to be performed from State s.
`void`	`runVI()` Runs Value Iteration over the set of states that have been discovered.

Methods inherited from class burlap.behavior.stochasticgames.madynamicprogramming.MADynamicProgramming
backupAllValueFunctions, getQSources, hasStartedPlanning, initMAVF, setAgentDefinitions

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Field Detail
  - states
```
protected java.util.Set<HashableState> states
```
    The set of states that have been found
  - maxDelta
```
protected double maxDelta
```
    The threshold that will cause VI to terminate when the max change in Q-value is less than it
  - maxIterations
```
protected int maxIterations
```
    The maximum allowable number of iterations until VI termination
  - debugCode
```
protected int debugCode
```
    The debug code used for printing VI progress.
- Constructor Detail
  - MAValueIteration
```
public MAValueIteration(SGDomain domain,
                        JointRewardFunction jointRewardFunction,
                        TerminalFunction terminalFunction,
                        double discount,
                        HashableStateFactory hashingFactory,
                        double qInit,
                        SGBackupOperator backupOperator,
                        double maxDelta,
                        int maxIterations)
```
    Initializes.
    
    Parameters:
    
    domain - the domain in which to perform planning
    
    jointRewardFunction - the joint reward function
    
    terminalFunction - the terminal state function
    
    discount - the discount
    
    hashingFactory - the hashing factory to use for storing states
    
    qInit - the default Q-value to initialize all values to
    
    backupOperator - the backup operator that defines the solution concept being solved
    
    maxDelta - the threshold that causes VI to terminate when the max Q-value change is less than it
    
    maxIterations - the maximum number of iterations allowed
  - MAValueIteration
```
public MAValueIteration(SGDomain domain,
                        JointRewardFunction jointRewardFunction,
                        TerminalFunction terminalFunction,
                        double discount,
                        HashableStateFactory hashingFactory,
                        ValueFunction qInit,
                        SGBackupOperator backupOperator,
                        double maxDelta,
                        int maxIterations)
```
    Initializes.
    
    Parameters:
    
    domain - the domain in which to perform planning
    
    jointRewardFunction - the joint reward function
    
    terminalFunction - the terminal state function
    
    discount - the discount
    
    hashingFactory - the hashing factory to use for storing states
    
    qInit - the q-value initialization function to use.
    
    backupOperator - the backup operator that defines the solution concept being solved
    
    maxDelta - the threshold that causes VI to terminate when the max Q-value change is less than it
    
    maxIterations - the maximum number of iterations allowed
  - MAValueIteration
```
public MAValueIteration(SGDomain domain,
                        java.util.List<SGAgentType> agentDefinitions,
                        JointRewardFunction jointRewardFunction,
                        TerminalFunction terminalFunction,
                        double discount,
                        HashableStateFactory hashingFactory,
                        double vInit,
                        SGBackupOperator backupOperator,
                        double maxDelta,
                        int maxIterations)
```
    Initializes.
    
    Parameters:
    
    domain - the domain in which to perform planning
    
    agentDefinitions - the agents involved in the planning problem
    
    jointRewardFunction - the joint reward function
    
    terminalFunction - the terminal state function
    
    discount - the discount
    
    hashingFactory - the hashing factory to use for storing states
    
    vInit - the default value to initialize all state values to
    
    backupOperator - the backup operator that defines the solution concept being solved
    
    maxDelta - the threshold that causes VI to terminate when the max Q-value change is less than it
    
    maxIterations - the maximum number of iterations allowed
  - MAValueIteration
```
public MAValueIteration(SGDomain domain,
                        java.util.List<SGAgentType> agentDefinitions,
                        JointRewardFunction jointRewardFunction,
                        TerminalFunction terminalFunction,
                        double discount,
                        HashableStateFactory hashingFactory,
                        ValueFunction vInit,
                        SGBackupOperator backupOperator,
                        double maxDelta,
                        int maxIterations)
```
    Initializes.
    
    Parameters:
    
    domain - the domain in which to perform planning
    
    agentDefinitions - the agents involved in the planning problem
    
    jointRewardFunction - the joint reward function
    
    terminalFunction - the terminal state function
    
    discount - the discount
    
    hashingFactory - the hashing factory to use for storing states
    
    vInit - the state value initialization function to use.
    
    backupOperator - the backup operator that defines the solution concept being solved
    
    maxDelta - the threshold that causes VI to terminate when the max Q-value change is less than it
    
    maxIterations - the maximum number of iterations allowed
- Method Detail
  - planFromState
```
public void planFromState(State s)
```
    Description copied from class: MADynamicProgramming
    
    Calling this method causes planning to be performed from State s.
    
    Specified by:
    
    planFromState in class MADynamicProgramming
    
    Parameters:
    
    s - the state from which planning is to be performed.
  - runVI
```
public void runVI()
```
    Runs Value Iteration over the set of states that have been discovered. VI terminates either when the max change in Q-value is less than the threshold stored in this object's maxDelta parameter or when the number of iterations exceeds this object's maxIterations parameter.
    If performStateReachabilityFrom(State) has not yet been called, then the state set will be empty and a runtime exception will be thrown.
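
    The calling-order contract described here can be sketched without the BURLAP API. In the following minimal example the class `PlanFlowSketch`, its integer states, and the `viRuns` counter are hypothetical stand-ins; the exception type and message are assumptions, since the documentation only says that a runtime exception is thrown.

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative sketch only; not BURLAP code. All names here are hypothetical.
class PlanFlowSketch {
    final Set<Integer> states = new HashSet<>();
    int viRuns = 0; // counts how many times value iteration was started

    /** Stand-in for reachability: indexes only s itself and reports whether it was new. */
    boolean performStateReachabilityFrom(int s) {
        return states.add(s);
    }

    /** Refuses to run until at least one state has been indexed. */
    void runVI() {
        if (states.isEmpty()) {
            throw new RuntimeException("No states found; perform state reachability first.");
        }
        viRuns++; // stand-in for the actual value iteration sweeps
    }

    /** Plans only when s has not been seen before. */
    void planFromState(int s) {
        if (performStateReachabilityFrom(s)) {
            runVI();
        }
    }

    public static void main(String[] args) {
        PlanFlowSketch planner = new PlanFlowSketch();
        planner.planFromState(7); // new state: reachability, then VI
        planner.planFromState(7); // already seen: nothing happens
        System.out.println("VI runs: " + planner.viRuns);
    }
}
```

    In this sketch, calling `runVI()` on a fresh instance throws, while calling it after any successful reachability pass forces another run over the stored states.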
  - performStateReachabilityFrom
```
public boolean performStateReachabilityFrom(State s)
```
    Finds and stores all states that are reachable from input state s.
    
    Parameters:
    
    s - the state from which all reachable states will be indexed
    
    Returns:
    
    true if input s was not previously indexed, resulting in new states being found; false if s was already indexed, resulting in no change in the discovered state set.
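
    One way to picture this behavior is a breadth-first expansion over a successor map. The sketch below is self-contained and does not reflect the actual BURLAP implementation: the class `ReachabilitySketch`, the integer states, and the `successors` map are assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative sketch only; not BURLAP code. All names here are hypothetical.
class ReachabilitySketch {
    final Set<Integer> states = new HashSet<>();
    final Map<Integer, List<Integer>> successors;

    ReachabilitySketch(Map<Integer, List<Integer>> successors) {
        this.successors = successors;
    }

    /** Indexes every state reachable from s; returns false if s was already indexed. */
    boolean performStateReachabilityFrom(int s) {
        if (states.contains(s)) {
            return false; // nothing new can be found from an already indexed state
        }
        Deque<Integer> open = new ArrayDeque<>();
        states.add(s);
        open.add(s);
        while (!open.isEmpty()) {
            int current = open.poll();
            for (int next : successors.getOrDefault(current, Collections.<Integer>emptyList())) {
                if (states.add(next)) { // add returns true only for unseen states
                    open.add(next);
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> graph = new HashMap<>();
        graph.put(0, java.util.Arrays.asList(1, 2));
        graph.put(1, java.util.Arrays.asList(2));
        ReachabilitySketch sketch = new ReachabilitySketch(graph);
        System.out.println(sketch.performStateReachabilityFrom(0)); // first call from state 0
        System.out.println(sketch.performStateReachabilityFrom(1)); // state 1 was found above
    }
}
```

    Because the discovered set persists between calls, a second call from a state that an earlier call already reached returns false and leaves the set unchanged.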
