If you've been following git or the google groups page, you may be aware of or were even following the development of BURALP 3, which is now finally ready for a more full release, replacing the master branch of the github repo! BURLAP 3 changes can be summarized as follows.
A good way to acquaint yourself with the changes in BURLAP 3 is to review the tutorials which have all been updated for BURALP 3 or to scan the code in the examples repository. In particular, you may want to review the updated Building a Domain tutorial which will cover the most significant changes to how BURLAP works.
Previously in BURALP 2 and 1, all states in BURLAP were OO-MDP states, and although State was an interface in BURLAP 2 that allowed you to ignore OO-MDP related methods, there were a very large number of methods for which you'd have to throw unsupported operation exceptions if you didn't want your state to be an OO-MDP state, making what was expected rather unclear. Furthermore, all variable values were pushed through nested class wrappers and you had to specify these types and various information about them very explicitly in the domain construction.
In BURLAP 3, OO-MDPs are no longer a requirement. State is now a simple interface that you implement for your problem, making the state definition equivalent to defining a Java class. The State interface only requires you to implement three methods: variableKeys(), get(Object key), and copy(). variableKeys requires you to return a list of Java type Object that are keys that can be used to refer to variables in your state. Because the list is of type Object, your keys can be any kind of structure that is most convenient for you; be it an int, a String, or something else entirely. The get method takes in a variable key and returns the variable value for that key. The value is also of type Object, meaning your state variables can be any data type you want! Because the key is an Object, you can also handle support for multiple key data representations (e.g., maybe strings and ints). Finally, the copy method simply creates a copy of your State and returns it.
If you want to provide more standard functionality with your state, you can also implement MutableState, which adds a set(Object key, Object value) method to State. Doing so will let things like standard BURALP shell commands change the value of states, or create an set of states centered on grid intersection points. And if you do want to use OO-MDP representations, there are special interfaces for that too, but it's optional. If you want to use OO-MDPs, you should look at the new tutorial specifically about that: Building and OO-MDP Domain.
One of the auxiliary advantages of this new state interface compared to the old methods is that it makes serialization very trivial. As long as your State implementations are JavaBeans (has a default constructor and getter and setters for all relevant non-public data fields), you can use the Yaml package to save and load data that includes states. For example:
String strRep = new Yaml().dump(stateObject);
Where stateObject is the State instance you want to serialize. (Naturally, this also works for serializing more complex data structures that contain State objects).
And to parse a yaml string of a State, you'd do:
State s = (State)new Yaml().load(stateStr);
In BURLAP 2 and 1, there were a number of action interfaces and abstract classes and it was often unclear to new users what piece was responsible for what. In BURLAP 3, the Action interface defines a decision (and is most similar to a GroundedAction in BURALP 2 and 1); it does not request code for transition dynamics or preconditions. Implementing an Action requires implementing only two methods: actionName() and copy(). Anything else you need to distinguish an action is entirely up to you.
Although Action is quite simple now, that doesn't mean BURLAP no longer has a means to handle preconditions or parameterized actions. Along with Action is the interface ActionType. ActionType serves as a generator for Actions and requires three methods: typeName(), associatedAction(String), and allApplicableActions(State). The associatedAction method takes a string representation of the action and generates the corresponding Action object. allApplicableActions is the main method of interest, which takes as input a State and returns all applicable actions of that ActionType. This method would handle determining which possible parameterizations there were and checking any preconditions you want to define.
Although you can implement your own Action and ActionType classes, when you have an unparameterized action set that you can execute anywhere (as is most common in MDPs), you can define the action set using the UniversalActionType, where you create a new UniversalActionType with a different name for each of your actions. (Alternatively, you can manually give a UniversalActionType your own Action instance that it will always generate). For example, for a grid world, defining the action set would look something like this in code:
domain.addActionTypes( new UniversalActionType("north"), new UniversalActionType("south"), new UniversalActionType("east"), new UniversalActionType("west"));
In the previous version of BURLAP an MDP's state transition dynamics model was stored in one of the Actions. Since the Action interfaces are much simpler now, the model is now its own interface, with SampleModel, and FullModel (the former if you can only generate samples from the transition dynamics, and the latter if you can enumerate the probability distribution). The nice thing about this factoring is that for real RL problems you don't have to provide a model at all, and can just specify the action set. Additionally, it makes working with model-based RL algorithms easier, and enables you to easily change the model you're using with an algorithm, independent of even those provided with an existing simulated domain.
Finally, this change to a more simple interface also provides trivial serialization in the same way states do. Since Actions are simple objects, you can serialize data that includes them with standard Yaml serialization methods.
Stochastic games agent indexing by int
Previously, agents in a stochastic game were identified by a name. This formulation led to more book keeping because you may have needed to ensure that your state representation was compatible with the names of the agents selecting actions. It also required duplication of existing data structures. For example, there needed to be special agent-wise action interfaces that kept track of their name and state generators that were dependent on the names of the agents being used. In BURLAP 3, agents are primarily identified by their int index in the set of agents in the game (with an agent name only providing descriptive information). That means that joint actions now consist of a list of regular Action objects, and you select an agent's action by the index of the agent in that list. Similarly, a joint reward return is a double array with each entry being the reward for each agent in the game.
License change to Apache 2.0
Previously, BURALP was licensed under LGPL 3, but we have changed licenses to the more permissive Apache 2.0 license. In short, this change allows you to create modified versions of BURLAP and license your changes under a different license.
BURLAP master and the pre-compiled jar files have been updated with some changes, one of which may require some very minor changes to your code. The primary changes of note consist of (1) a new set of interfaces for general function approximation; and (2) a new experiment "shell" for interactively controlling experiments at runtime. If you need to get the prior version of BURLAP, you can get it from the v2 branch on github.
BURLAP has always supported function approximation for various algorithms. However, for standard value function approximation, there was often a notational an implementation slant toward linear function approximation, even though it could in principle support non-linear function approximation. This slant also made it less clear how to implement your own non-linear function approximators.
At a high-level there is an interface named ParametricFunction that is used to provide general interfaces for getting and setting parameter values of the function. Common interface extensions include interfaces for parametric state and state-action value functions (ParametricStateFunction and ParametricStateActionFunction). Furthermore, inverse reinforcement learning parameterized reward functions also extend ParametricFunction to unify how you work these two kinds of objects.
There, there are also interface extensions for parametric functions that are differentiable (DifferentiableStateValue, DifferentiableStateActionValue, DifferentiableRF, etc.). These interfaces include a method that allows the gradient, with respect to the function parameters, to be returned. Gradients are provided via the interface FunctionGradient. This interface allows the retrieval and storing of partial derivatives for each parameter and also allows a list of the non-zero partial derivatives to be returned. Currently there is one concrete implementation of FunctionGradient: SparseGradient which only stores the gradient for parameters with non-zero partial derivatives with a Java hash map. This sparse data structure is especially for convenient for function approximation like tile coding. However, in principle you can implement your own FunctionGradient data structures for your own differentiable ParametricFunctions if there is a data structure that is more efficient for your needs.
With these more simple interfaces, hopefully it is more clear how to implement your own custom forms of function approximation. All the previous function approximation methods in BURLAP have been converted to this new interface. Because of that conversion, in previous client code you may have developed, there shouldn't be much changes other the names of the data types. That is, Whereas before you may have had a ValueFunctionApproximtion object, it is now probably DifferentiableStateActionValue. Correspondingly, the tutorial code has been updated to reflect these changes.
The next change is more of an additional tool to BURLAP: a framework for setting up an interactive shell that you can use to control your experiments at runtime. Note that this shell does not provide arbitrary java code execution. Instead it's more similar to a light weight "bash" for BURLAP.
At a high-level there is a class called BurlapShell that takes as input an input stream and output stream on which the shell will operate. The shell is then started with the method start (which runs the shell in a separate thread), which begins an input output sequence. The shell has some basic universal commands that allow you to do things like set aliases for commands. There are two primary subclasses for this shell, EnvironmentShell and SGWorldShell. The former is a shell that contains a reference to an Environment instance so that you can perform various controls with an environment and comes with a number of standard commands for interacting with an environment (such as executing actions, recording episodes, visualizing episodes, etc.). Analogously, the SGWorldShell contains a reference to a stochastic games World and has various similar commands for interacting with it.
Although the EnvironmentShell and SGWorld Shell come with many convenient commands you can use for these cases, any BurlapShell can be extended by giving it your own implementations of the interface ShellCommand. Being able to extend the commands allows you to create your own experiment specific controllers. Most of the existing commands make use of the JOptSimple library so that they handle standard formats for command line arguments, and you may want to make use of it as well for your own custom commands.
Its is also worth noting that the TerminalExplorer class is now just a wrapper for an EnvironmentShell that is initialized on the the standard input and output streams. Similarly for SGTerminalExplorer. The console that is accessible from a VisualExplorer or SGVisualExplorer now also makes use of the corresponding EnvironmentShell or SGWorldShell using a output and input stream from GUI elements. This inclusion allows you to have a lot of control over an experiment while it's being visualized.
Currently the shell framework does not support conditionals or looping mechanisms, but we may add these capabilities in a future version.
In the last update, we mentioned that BURLAP would be getting some more significant changes that turned the State class into an interface so that it opened the door for custom implementations for domains that needed more specific memory management. Since then we also began to implement a number of other changes that taken together are sufficient enough to be a new version since they may require some code changes by users. These changes make BURLAP in many cases easier to use, more flexible, and in some cases faster.
BURLAP version 1 is still available for download, both the pre-compiled JAR and the source code in the github (which is on branch v1), but from now on, master will point to version 2. All tutorials have also been updated to reflect the changes in BURLAP 2 (and also received some other tuning), but the version 1 tutorials are also still available.
In the remainder of this update we will review some of the main highlights of the changes in BURLAP 2.
As discussed in the previous update, the State class has now become an interface. Most of the domains in BURLAP will make use of the MutableState implementation, which is the same kind of implementation BURLAP version 1 used. However, by making State an interface, it opens the door for domain-specific memory optimization in which you can implement your own State class specific for your domain and even provide methods that are useful for quickly retrieving information.
Although an Environment class was included in the previous version of BURLAP, it was auxiliary for very specific purposes and did not strongly integrate with the rest of the BURLAP tools. Environment is now a central interface in BURLAP and used by many classes. The Environment interface provides methods for getting observations and rewards from some environment and receiving actions from an agent. There is also a standard SimulatedEnvironment class for using BURLAP domains to simulate the environment.
The Environment interface is now integrated into other BURLAP tools in a variety of ways. First, learning algorithms that implement the LearningAgent interface now all learn by interacting with the Environment. This change removes the burden of current state book keep from the learning algorithm; moreover, it provides state safety in that the learning algorithms cannot accidentally change what the state or outcome of the Environment is and are forced to take it as it comes. This is particularly useful for model-based RL algorithms in which a separate modeled domain is learned in which planning is performed. When the agent executes an action selected from the learned model policy in an environment, there is no confusion over whether it's executed the model of the action or actually executing the action in the world, because it will be executing the action through the Environment interface when it need to apply it in the world.
Second, Policy objects can now be trivially rolled out in an Environment. This paradigm is useful for using BURLAP with robotics or other external systems, since you can build a model in BURLAP, produce a policy with that model, and then roll that policy out in the external world/system by rolling it out in an Environment that interfaces with the external world/system. For example, the burlap_rosbrige extension provides an Environment implementation that interfaces with ROS so that you can have a BURLAP policy control a ROS robot by rolling out the policy in the Environment.
Third, the VisualExplorer and TerminalExplorer now operate on Environment implementations, which means you can use them to manually control external systems that are interfaced with an Environment.
The Action, GroundedAction, and the stochastic games equivalents have been a refactored. In the previous version of BURLAP, many of the critical methods of the Action class took as a method argument a String array that was used for specifying parameters. By default, these parameters were considered to be STRIPs-like OO-MDP object references, and although other kinds of parameters could be used, it was a hacky implementation.
In the new version of BURLAP, Action methods now take a GroundedAction as an argument, instead of a String array. The motivation is that if you have a special of kind of action parameterization you would like to use, you can subclass GroundedAction to contain whatever kind of data structures you want to specify any kind of parameters that you want. (For example, it would now be trivial to implement continuous valued parameterized actions.) Along with that, two of the methods that you must implement for the Action class are used to generate your instances of GroundedAction. This way, planning and learning algorithms never need to know about your parameterization types; they simply ask your Action definition to generate the permissible parameterizations and hand them back to your appropriate methods when they want to execute them or query the transition dynamics.
Along with these changes, all Action methods that are used to define an Action are all now abstract so it's entirely clear what would be expected of you to define an Action. However, if you simply want to define parameter-less actions without any preconditions, you can always just subclass the SimpleAction class which will implement those details for you.
Similar changes were made for the stochastic games classes.
POMDP support added
BURLAP now has support for POMDP problem definitions. Currently the space of implemented solvers is limited; there is a Belief MDP conversion tool so that you use standard MDP algorithms to solve the POMDP; an exact finite horizon solver, by using the Belief MDP conversion with Sparse Sampling; and QMDP. However, other algorithms are currently being implemented, including POMCP and PBVI, which will hopefully be available soon.
StateHashFactory/StateHashTuple -> HashableStateFactory/HashableState and revised
For the most part there has merely been a renaming of the StateHashFactory/StateHashTuple to HashableStateFactory/HashableState; however, the code for these classes have been significantly streamlined with better support. When in doubt, you can simply use the SimpleHashableStateFactory implementation, which has much better support and which other implementations for state abstraction or value discretization extend.
StateParser deprecated for Java bean serialization
The StateParser in version 1 was used for serializing State objects by allowing them to be turned to and from String representations. Although the code is still there, it is now in a legacy package and should not be used going forward. Instead, it is recommended that you use the SerializableState and SerializableStateFactory classes (if at all). A SerializableState is a Java Bean class representation of a state that can be trivially serialized using any number of standard Java serialization methods (as well as providing a method for turning it back into a standard State), such as JSON or YAML.
There is a standard serialization class called SimpleSerializableState that in general you can always use. In fact, now when you write an EpisodeAnalysis file to disk, it saves it in a YAML format and by default uses the SimpleSerializableState so that you never have to think about it. However, you can also always pass it a more compact SerializableState representations through the SerializableStateFactory if you're looking to keep your files compressed. Domains that previously had their own StateParser implementations now have corresponding SerializableStateFactory classes that you can use instead; though in general, you don't really need to think about state serialization anymore.
Package and class name refactoring, as well as class hierarchy refactoring
The class names and organization for a number of classes has changed. For example, OOMDPPlanner is now simply MDPSolver, which inherits from an MDPSolverInterface interface. The new Planner interface extends MDPSolverInterface, giving it the planFromState method, which now also returns a policy so that you don't have to think about what Policy to wrap around your planning algorithm. Similarly, the QComputablePlanner interface is now simply QFunction and it extends a ValueFunction interface.
Other minor reorganization changes also exist.
If you are migrating your code, many of the changes may simply require using different imports and changing the name of some elements. However, it may be worthwhile to rescan the updated tutorials to see how things have changed. Additionally, all tutorial code on the website has been included in the main distribution under the package burlap.tutorials.
The latest version of BURLAP has had various changes. Most changes will be transparent or are feature additions. For example, domains for BlockDude and Frostbite have been added, and terminal explorers now accept console commands for directly modifying states, similar to what you will find in the VisualExplorer console. However, there is one change that may require some small changes to your code. Specifically, all domain's DomainGenerators can have their physics/model parameters modified to generate a new domain, without affecting the behavior of previously generated domains from the same DomainGenerator instance. For many domains, like MountainCar, InvertedPendulum, LunarLander, etc., this change was implemented by storing all physics parameters in a data member called physParams. physParams has public data members for each physics parameter that was normally part of its corresponding DomainGenerator, so if you have code that changes a physics parameter, you will now need to reference it from the physParams data member. Note that physParams gets fully copied whenever generateDomain is called so that future changes will not affect previously generated domains.
Along with embedding physics/model parameters inside a single object that is copied, the MountainCar ClassicMCTF class is now a static class. To see how these changes affect code, see the Solving a Continuous Domain Tutorial which has had its code updated to reflect the changes. Also be sure to examine the Java doc for each DomainGenerator you are using.
Finally, we are planning a fairly significant change to BURLAP's state definitions. Currently, State is a mutable class for defining OO-MDP states with a list of ObjectInstance objects (that are also hash-indexed by their name). Although this works fine for many domains, we have found that more complex domains we are investigating would benefit from different state memory management and indexing methods. Therefore, we are planning on changing State to become an interface with the current State being a standard implementation that is available. This will enable users freedom to optimize their state definitions for the needs of their domain, providing increased CPU performance and reduced memory usage. If you would like to track the progress of this work, see the state_interface branch on git. Once that branch is fully developed, we will branch the current version of master to something else so it's always there and then pull state_interface into master. If you have comments about this new direction, please share them on the BURLAP google group.