public class RLGlueEnvironment
extends java.lang.Object
implements org.rlcommunity.rlglue.codec.EnvironmentInterface
A DenseStateFeatures is used to flatten the BURLAP states; it should always return arrays of the same length for all visitable states.
Additionally, RLGlue does not support action preconditions, so each action must be available everywhere.
Note that RLGlue does not support observations of terminal states; it only gives the final reward upon entering a terminal state.
Therefore, this class will not terminate in a terminal state indicated by the provided TerminalFunction.
Instead, it will allow one more transition from the terminal state, which leads back to the same state with reward zero.
This is mathematically equivalent to transitioning to the terminal state and observing it.
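The terminal-state handling described above can be sketched as follows. This is a minimal illustration under stated assumptions, not this class's actual code: `StepResult`, `TerminalLoopSketch`, the one-dimensional chain of states, and the reward of 1.0 for reaching the goal are all hypothetical stand-ins.

```java
/** Hypothetical stand-in for a step result: reward, next state, terminal flag. */
final class StepResult {
    final double reward;
    final int state;
    final boolean terminal;

    StepResult(double reward, int state, boolean terminal) {
        this.reward = reward;
        this.state = state;
        this.terminal = terminal;
    }
}

/** Hypothetical chain environment with states 0..goal, where goal is terminal. */
final class TerminalLoopSketch {
    private final int goal;
    private int curState;

    TerminalLoopSketch(int goal) {
        this.goal = goal;
    }

    /** Starts a new episode in state 0. */
    int start() {
        curState = 0;
        return curState;
    }

    /** Advances one state along the chain (the action is ignored in this sketch). */
    StepResult step() {
        if (curState == goal) {
            // Already in the terminal state: transition back to the same state
            // with reward zero, and only now signal termination.
            return new StepResult(0.0, curState, true);
        }
        curState++;
        double reward = (curState == goal) ? 1.0 : 0.0; // assumed reward scheme
        // Entering the terminal state is reported as an ordinary step, so the
        // agent receives the terminal state as a regular observation.
        return new StepResult(reward, curState, false);
    }
}
```

Because the extra transition carries reward zero, the return of the episode is the same as if the episode had ended on entering the terminal state.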
Modifier and Type | Field and Description |
---|---|
protected java.util.Map<java.lang.Integer,Action> | actionMap - A mapping from action index identifiers (that RLGlue will use) to BURLAP actions and their parametrization specified as the index of objects in a state. |
protected State | curState - The current state of the environment |
protected double | discount - The discount factor of the task |
protected SADomain | domain - The BURLAP domain |
protected boolean | isEpisodic - Whether this task is episodic (false will indicate that it is continuing) |
protected org.rlcommunity.rlglue.codec.taskspec.ranges.DoubleRange | rewardRange - The reward function value range |
protected DenseStateFeatures | stateFlattener - Used to flatten states into a vector representation |
protected StateGenerator | stateGenerator - The state generator for generating states for each episode |
protected int | terminalVisits - Indicates the number of times a terminal state has been visited by the agent within the same episode. |
protected boolean | usedConstructorState - Whether the state generated from the state generator to gather auxiliary information (like the number of objects of each class) has yet been used as a starting state for an RLGlue episode. |
protected org.rlcommunity.rlglue.codec.taskspec.ranges.DoubleRange[] | valueRanges - The value ranges for the vector representation of the state |
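The idea behind an action index map can be illustrated with a small sketch. It assumes consecutive integer identifiers starting at 0 and uses plain strings in place of BURLAP actions; `ActionIndexSketch`, `indexActions`, and `resolve` are hypothetical names, and the identifiers this class actually assigns may differ.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of an integer-id-to-action lookup table. */
final class ActionIndexSketch {

    /** Assigns ids 0, 1, 2, ... to the actions in list order (an assumed scheme). */
    static Map<Integer, String> indexActions(List<String> actionNames) {
        Map<Integer, String> actionMap = new LinkedHashMap<>();
        for (String name : actionNames) {
            actionMap.put(actionMap.size(), name);
        }
        return actionMap;
    }

    /** Resolves an id received from the agent side, rejecting unknown ids. */
    static String resolve(Map<Integer, String> actionMap, int id) {
        String action = actionMap.get(id);
        if (action == null) {
            throw new IllegalArgumentException("Unknown action id: " + id);
        }
        return action;
    }
}
```

Since every action must be available in every state, a single fixed table like this is enough; no per-state filtering is needed.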
Constructor and Description |
---|
RLGlueEnvironment(SADomain domain, StateGenerator stateGenerator, DenseStateFeatures stateFlattener, org.rlcommunity.rlglue.codec.taskspec.ranges.DoubleRange[] valueRanges, org.rlcommunity.rlglue.codec.taskspec.ranges.DoubleRange rewardRange, boolean isEpisodic, double discount) - Constructs with all the BURLAP information necessary for generating an RLGlue Environment. |
Modifier and Type | Method and Description |
---|---|
protected org.rlcommunity.rlglue.codec.types.Observation | convertIntoObservation(State s) - Takes an OO-MDP state and converts it into an RLGlue observation |
void | env_cleanup() |
java.lang.String | env_init() |
java.lang.String | env_message(java.lang.String arg0) |
org.rlcommunity.rlglue.codec.types.Observation | env_start() |
org.rlcommunity.rlglue.codec.types.Reward_observation_terminal | env_step(org.rlcommunity.rlglue.codec.types.Action arg0) |
void | load() - Loads this environment into RLGlue |
void | load(java.lang.String hostAddress, java.lang.String port) - Loads this environment into RLGlue with the specified host address and port |
protected SADomain domain
protected StateGenerator stateGenerator
protected DenseStateFeatures stateFlattener
protected org.rlcommunity.rlglue.codec.taskspec.ranges.DoubleRange[] valueRanges
protected int terminalVisits
protected org.rlcommunity.rlglue.codec.taskspec.ranges.DoubleRange rewardRange
protected boolean isEpisodic
protected double discount
protected State curState
protected java.util.Map<java.lang.Integer,Action> actionMap
protected boolean usedConstructorState
public RLGlueEnvironment(SADomain domain, StateGenerator stateGenerator, DenseStateFeatures stateFlattener, org.rlcommunity.rlglue.codec.taskspec.ranges.DoubleRange[] valueRanges, org.rlcommunity.rlglue.codec.taskspec.ranges.DoubleRange rewardRange, boolean isEpisodic, double discount)
domain - the BURLAP domain
stateGenerator - a generator for generating states at the start of each episode.
stateFlattener - used to flatten states into a numeric representation
valueRanges - the value ranges of the flattened vector state
rewardRange - the reward function value range
isEpisodic - whether the task is episodic or continuing
discount - the discount factor to use for the task
public void load()
public void load(java.lang.String hostAddress, java.lang.String port)
hostAddress - the RLGlue host address
port - the RLGlue port
public void env_cleanup()
Specified by: env_cleanup in interface org.rlcommunity.rlglue.codec.EnvironmentInterface
public java.lang.String env_init()
Specified by: env_init in interface org.rlcommunity.rlglue.codec.EnvironmentInterface
public java.lang.String env_message(java.lang.String arg0)
Specified by: env_message in interface org.rlcommunity.rlglue.codec.EnvironmentInterface
public org.rlcommunity.rlglue.codec.types.Observation env_start()
Specified by: env_start in interface org.rlcommunity.rlglue.codec.EnvironmentInterface
public org.rlcommunity.rlglue.codec.types.Reward_observation_terminal env_step(org.rlcommunity.rlglue.codec.types.Action arg0)
Specified by: env_step in interface org.rlcommunity.rlglue.codec.EnvironmentInterface
protected org.rlcommunity.rlglue.codec.types.Observation convertIntoObservation(State s)
s - the OO-MDP state
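A rough sketch of why the flattened vector must keep a fixed length: the value ranges are declared once, so they can only describe a vector whose layout never changes. The code below assumes one range per vector entry and is illustrative only; `ObservationSketch`, `flatten`, and `fitsRanges` are hypothetical names, and the grid position stands in for a real BURLAP state.

```java
/** Hypothetical sketch: flatten a state and check it against per-entry ranges. */
final class ObservationSketch {

    /** Hypothetical flattener for an agent at grid cell (x, y); always length 2. */
    static double[] flatten(int x, int y) {
        return new double[] {x, y};
    }

    /**
     * Returns true when the vector has exactly one entry per range and every
     * entry lies inside its range. Each range is given as {min, max}.
     */
    static boolean fitsRanges(double[] features, double[][] ranges) {
        if (features.length != ranges.length) {
            // A length mismatch means the declared ranges no longer describe the vector.
            return false;
        }
        for (int i = 0; i < features.length; i++) {
            if (features[i] < ranges[i][0] || features[i] > ranges[i][1]) {
                return false;
            }
        }
        return true;
    }
}
```

A flattener whose output length varied between states would fail the length check for some states, which is the situation the fixed-length requirement rules out.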