public class BoltzmannActor extends Actor
An Actor whose policy is a Boltzmann distribution over its learned action preferences; it uses a HashableStateFactory to perform state lookups.

Nested classes/interfaces inherited from class Policy: Policy.ActionProb, Policy.GroundedAnnotatedAction, Policy.PolicyUndefinedException
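Because the policy is a Boltzmann (softmax) distribution, each action's selection probability is proportional to the exponential of its stored preference. A minimal, self-contained sketch of that computation (illustrative arithmetic only, not BURLAP's internal code):

```java
// Illustrative only: how a Boltzmann (softmax) distribution converts
// stored action preferences into selection probabilities.
public class BoltzmannSketch {
    public static void main(String[] args) {
        double[] preferences = {1.0, 2.0, 0.5}; // hypothetical preferences for three actions
        double[] probs = new double[preferences.length];
        double normalizer = 0;
        for (int i = 0; i < preferences.length; i++) {
            probs[i] = Math.exp(preferences[i]); // exponentiate each preference
            normalizer += probs[i];
        }
        for (int i = 0; i < probs.length; i++) {
            probs[i] /= normalizer; // normalize so the probabilities sum to 1
            System.out.printf("action %d: p = %.3f%n", i, probs[i]);
        }
    }
}
```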
Modifier and Type | Field and Description
---|---
protected java.util.List<Action> | actions - The actions the agent can perform.
protected boolean | containsParameterizedActions - Indicates whether the actions that this agent can perform are parameterized.
protected Domain | domain - The domain in which this agent will act.
protected HashableStateFactory | hashingFactory - The hashing factory used to hash states and evaluate state equality.
protected LearningRate | learningRate - The learning rate used to update action preferences in response to critiques.
protected java.util.Map<HashableState,burlap.behavior.singleagent.learning.actorcritic.actor.BoltzmannActor.PolicyNode> | preferences - A map from (hashed) states to policy nodes; each node contains the action preferences for the applicable actions in that state.
protected int | totalNumberOfSteps - The total number of learning steps performed by this agent.
Fields inherited from class Policy: annotateOptionDecomposition, evaluateDecomposesOptions
| Constructor and Description |
|---|
| BoltzmannActor(Domain domain, HashableStateFactory hashingFactory, double learningRate) - Initializes the Actor. |
Modifier and Type | Method and Description
---|---
void | addNonDomainReferencedAction(Action a) - Allows the actor to use actions that are not part of the domain definition.
AbstractGroundedAction | getAction(State s) - Returns an action sampled by the policy for the given state.
java.util.List<Policy.ActionProb> | getActionDistributionForState(State s) - Returns the action probability distribution defined by the policy for the given state.
protected burlap.behavior.singleagent.learning.actorcritic.actor.BoltzmannActor.ActionPreference | getMatchingPreference(HashableState sh, GroundedAction ga, burlap.behavior.singleagent.learning.actorcritic.actor.BoltzmannActor.PolicyNode node) - Returns the BoltzmannActor.ActionPreference stored in a policy node.
protected burlap.behavior.singleagent.learning.actorcritic.actor.BoltzmannActor.PolicyNode | getNode(HashableState sh) - Returns the policy node that stores the action preferences for the given state.
boolean | isDefinedFor(State s) - Specifies whether this policy is defined for the input state.
boolean | isStochastic() - Indicates whether the policy is stochastic or deterministic.
void | resetData() - Resets any data created or modified during learning so that learning can begin anew.
void | setLearningRate(LearningRate lr) - Sets the learning rate function to use.
void | updateFromCritqique(CritiqueResult critqiue) - Causes this object to update its behavior in response to a critique of its behavior.
Methods inherited from class Policy: evaluateBehavior, evaluateBehavior, evaluateBehavior, evaluateBehavior, evaluateBehavior, evaluateMethodsShouldAnnotateOptionDecomposition, evaluateMethodsShouldDecomposeOption, followAndRecordPolicy, followAndRecordPolicy, getDeterministicPolicy, getProbOfAction, getProbOfActionGivenDistribution, getProbOfActionGivenDistribution, sampleFromActionDistribution
protected Domain domain
protected java.util.List<Action> actions
protected HashableStateFactory hashingFactory
protected LearningRate learningRate
protected java.util.Map<HashableState,burlap.behavior.singleagent.learning.actorcritic.actor.BoltzmannActor.PolicyNode> preferences
protected boolean containsParameterizedActions
protected int totalNumberOfSteps
public BoltzmannActor(Domain domain, HashableStateFactory hashingFactory, double learningRate)
Initializes the Actor.
Parameters:
domain - the domain in which the agent will act
hashingFactory - the state hashing factory to use for state hashing and equality checks
learningRate - the learning rate that affects how quickly the agent adjusts its action preferences
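A minimal construction sketch; the domain source and the SimpleHashableStateFactory class name are assumptions that vary by BURLAP version:

```java
// Sketch: constructing a BoltzmannActor. The domain generator and
// SimpleHashableStateFactory are assumptions; substitute the Domain and
// HashableStateFactory implementations your BURLAP version provides.
Domain domain = myDomainGenerator.generateDomain(); // hypothetical domain source
HashableStateFactory hashingFactory = new SimpleHashableStateFactory(); // assumed implementation
BoltzmannActor actor = new BoltzmannActor(domain, hashingFactory, 0.1); // 0.1 = initial learning rate
```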
public void setLearningRate(LearningRate lr)
Sets the learning rate function to use.
Parameters:
lr - the learning rate function to use
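The double passed to the constructor presumably acts as a fixed rate; setLearningRate lets you substitute a schedule such as BURLAP's ExponentialDecayLR, though the constructor arguments below are an assumption to verify against your version:

```java
// Sketch: substituting a decaying learning rate schedule.
// ExponentialDecayLR(initialRate, decayRate) is assumed; verify against your version.
actor.setLearningRate(new ExponentialDecayLR(0.1, 0.999));
```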
public void updateFromCritqique(CritiqueResult critqiue)
Description copied from class: Actor
Causes this object to update its behavior in response to a critique of its behavior.
Specified by:
updateFromCritqique in class Actor
Parameters:
critqiue - the critique of the agent's behavior, represented by a CritiqueResult object
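Conceptually, the critique is a scalar signal (e.g., a TD error from the critic) that nudges the preference of the taken action up or down; the sketch below shows that update rule in isolation and is illustrative arithmetic, not BURLAP's exact internals:

```java
// Illustrative only: the shape of a critique-driven preference update.
double alpha = 0.1;    // learning rate
double pref = 0.5;     // stored preference for the action that was taken
double critique = 1.2; // critic's scalar evaluation, e.g., a TD error
pref += alpha * critique; // positive critiques raise the preference, negative ones lower it
```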
public void addNonDomainReferencedAction(Action a)
Description copied from class: Actor
This method allows the actor to utilize actions that are not part of the domain definition.
Specified by:
addNonDomainReferencedAction in class Actor
Parameters:
a - an action not part of the domain definition that this actor should be able to use
public AbstractGroundedAction getAction(State s)
Description copied from class: Policy
This method will return an action sampled by the policy for the given state.
Specified by:
getAction in class Policy
public java.util.List<Policy.ActionProb> getActionDistributionForState(State s)
Description copied from class: Policy
This method will return the action probability distribution defined by the policy.
Specified by:
getActionDistributionForState in class Policy
Parameters:
s - the state for which an action distribution should be returned
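The returned list can be inspected directly; this sketch assumes the actor and state from the earlier snippets and the public ga and pSelection fields of Policy.ActionProb:

```java
// Sketch: inspecting the Boltzmann action distribution for a state.
// Assumes Policy.ActionProb exposes public ga and pSelection fields.
java.util.List<Policy.ActionProb> dist = actor.getActionDistributionForState(s);
for (Policy.ActionProb ap : dist) {
    System.out.println(ap.ga + ": " + ap.pSelection);
}
```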
protected burlap.behavior.singleagent.learning.actorcritic.actor.BoltzmannActor.PolicyNode getNode(HashableState sh)
Returns the policy node that stores the action preferences for the given state.
Parameters:
sh - the (hashed) state of the BoltzmannActor.PolicyNode to return
Returns:
the BoltzmannActor.PolicyNode object for the given input state
public boolean isStochastic()
Description copied from class: Policy
Indicates whether the policy is stochastic or deterministic.
Specified by:
isStochastic in class Policy
public boolean isDefinedFor(State s)
Description copied from class: Policy
Specifies whether this policy is defined for the input state.
Specified by:
isDefinedFor in class Policy
Parameters:
s - the input state to test for whether this policy is defined
Returns:
true if this policy is defined for State s, false otherwise
public void resetData()
Description copied from class: Actor
Used to reset any data that was created/modified during learning so that learning can begin anew.
Specified by:
resetData in class Actor
protected burlap.behavior.singleagent.learning.actorcritic.actor.BoltzmannActor.ActionPreference getMatchingPreference(HashableState sh, GroundedAction ga, burlap.behavior.singleagent.learning.actorcritic.actor.BoltzmannActor.PolicyNode node)
Returns the BoltzmannActor.ActionPreference stored in a policy node. If actions are parameterized and the domain is not name dependent, then a matching between the input state and the stored state is first found and used to match the input action parameters to the stored action parameters.
Parameters:
sh - the input state on which the input action was applied
ga - the input action for which the BoltzmannActor.ActionPreference object should be returned
node - the BoltzmannActor.PolicyNode object that contains the action preference
Returns:
the BoltzmannActor.ActionPreference object for the given action stored in the given BoltzmannActor.PolicyNode; null if it does not exist
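Putting the pieces together, one hand-rolled actor-critic step samples an action, lets a critic score the resulting transition, and feeds the critique back to the actor. The critic call and the CritiqueResult constructor arguments are assumptions; in practice BURLAP's ActorCritic learning agent drives this loop:

```java
// Sketch of a single hand-rolled actor-critic step. The critic call and the
// CritiqueResult constructor (state, action, next state, critique) are assumptions;
// normally BURLAP's ActorCritic learning agent manages this loop for you.
GroundedAction ga = (GroundedAction)actor.getAction(s);  // sample from the Boltzmann policy
State sPrime = ga.executeIn(s);                          // executeIn assumed from GroundedAction
double critiqueValue = critic.getCritique(s, ga, sPrime); // hypothetical critic API
actor.updateFromCritqique(new CritiqueResult(s, ga, sPrime, critiqueValue));
```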