June 17, 2016

BURLAP 3 is Here!

If you've been following the git repository or the Google Groups page, you may already be aware of, or even following, the development of BURLAP 3, which is now ready for a full release and replaces the master branch of the GitHub repo! The changes in BURLAP 3 can be summarized as follows.

  1. a simpler and more flexible State interface;
  2. a simpler and more flexible action interface and model definition;
  3. agent indexing by ints in stochastic games;
  4. shorter method and/or class names in some cases;
  5. abstract classes converted to interfaces;
  6. abstract methods/interfaces restructured to be more functional;
  7. slight package reorganizations;
  8. and a license change to Apache 2.0.

A good way to acquaint yourself with the changes in BURLAP 3 is to review the tutorials, which have all been updated for BURLAP 3, or to scan the code in the examples repository. In particular, you may want to review the updated Building a Domain tutorial, which covers the most significant changes to how BURLAP works.

February 26, 2016

Changes to BURLAP Master

BURLAP master and the pre-compiled jar files have been updated with several changes, one of which may require very minor modifications to your code. The primary changes of note are (1) a new set of interfaces for general function approximation and (2) a new experiment "shell" for interactively controlling experiments at runtime. If you need the prior version of BURLAP, you can get it from the v2 branch on GitHub.

Function approximation
BURLAP has always supported function approximation for various algorithms. However, for standard value function approximation, there was often a notational and implementation slant toward linear function approximation, even though BURLAP could in principle support non-linear function approximation. This slant also made it less clear how to implement your own non-linear function approximators.

At a high level, there is an interface named ParametricFunction that provides a general interface for getting and setting the function's parameter values. Common extensions of it include interfaces for parametric state and state-action value functions (ParametricStateFunction and ParametricStateActionFunction). Furthermore, the parameterized reward functions used in inverse reinforcement learning also extend ParametricFunction, which unifies how you work with these two kinds of objects.

There are also interface extensions for parametric functions that are differentiable (DifferentiableStateValue, DifferentiableStateActionValue, DifferentiableRF, etc.). These interfaces include a method that returns the gradient with respect to the function parameters. Gradients are provided via the FunctionGradient interface, which allows the partial derivative for each parameter to be stored and retrieved, and also allows a list of the non-zero partial derivatives to be returned. Currently there is one concrete implementation of FunctionGradient: SparseGradient, which uses a Java hash map to store only the parameters with non-zero partial derivatives. This sparse data structure is especially convenient for function approximation like tile coding, where only a few tiles are active for any given input, so most partial derivatives are zero. However, in principle you can implement your own FunctionGradient data structure for your own differentiable ParametricFunctions if another data structure is more efficient for your needs.
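To make the relationship between a parametric function and a sparse gradient more concrete, here is a minimal Java sketch. The names ParametricFunctionSketch, FunctionGradientSketch, SparseGradientSketch, and LinearBinaryFeatureFunction are hypothetical simplifications written for illustration; they do not match BURLAP's actual signatures. The sketch assumes a linear function over binary features (such as the active tiles produced by tile coding), whose partial derivative with respect to each weight is simply how often that feature is active.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified stand-ins for ParametricFunction and FunctionGradient;
// BURLAP's real interfaces have different signatures.
interface ParametricFunctionSketch {
    int numParameters();
    double getParameter(int i);
    void setParameter(int i, double value);
}

interface FunctionGradientSketch {
    void put(int parameterId, double partialDerivative);
    double getPartialDerivative(int parameterId);
    Set<Map.Entry<Integer, Double>> nonZeroPartialDerivatives();
}

// Sparse gradient: only non-zero partial derivatives are kept in a hash map.
class SparseGradientSketch implements FunctionGradientSketch {
    private final Map<Integer, Double> partials = new HashMap<>();

    public void put(int parameterId, double partialDerivative) {
        if (partialDerivative == 0.0) {
            partials.remove(parameterId); // zero entries are implicit
        } else {
            partials.put(parameterId, partialDerivative);
        }
    }

    public double getPartialDerivative(int parameterId) {
        return partials.getOrDefault(parameterId, 0.0);
    }

    public Set<Map.Entry<Integer, Double>> nonZeroPartialDerivatives() {
        return partials.entrySet();
    }
}

// Linear function over binary features: f = sum of the weights of the active features,
// so df/dw_i counts how often feature i is active and is 0 for every inactive feature.
class LinearBinaryFeatureFunction implements ParametricFunctionSketch {
    private final double[] weights;

    LinearBinaryFeatureFunction(int numFeatures) { weights = new double[numFeatures]; }

    public int numParameters() { return weights.length; }
    public double getParameter(int i) { return weights[i]; }
    public void setParameter(int i, double value) { weights[i] = value; }

    double evaluate(int[] activeFeatures) {
        double sum = 0.0;
        for (int f : activeFeatures) sum += weights[f];
        return sum;
    }

    FunctionGradientSketch gradient(int[] activeFeatures) {
        SparseGradientSketch g = new SparseGradientSketch();
        for (int f : activeFeatures) g.put(f, g.getPartialDerivative(f) + 1.0);
        return g;
    }
}

class FunctionApproxDemo {
    public static void main(String[] args) {
        LinearBinaryFeatureFunction fn = new LinearBinaryFeatureFunction(1000);
        int[] active = {3, 42, 917}; // e.g., one active tile per tiling
        double alpha = 0.1, target = 1.0;
        double error = target - fn.evaluate(active);
        // Gradient step: only the 3 non-zero partials are visited, not all 1000 weights.
        for (Map.Entry<Integer, Double> e : fn.gradient(active).nonZeroPartialDerivatives()) {
            int i = e.getKey();
            fn.setParameter(i, fn.getParameter(i) + alpha * error * e.getValue());
        }
        System.out.println("value after one update: " + fn.evaluate(active));
    }
}
```

The design point the sketch tries to capture is that an update loop only needs the non-zero partial derivatives, so the cost of a gradient step scales with the number of active features rather than the total number of parameters.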

With these simpler interfaces, it should hopefully be clearer how to implement your own custom forms of function approximation. All the previous function approximation methods in BURLAP have been converted to the new interfaces. Because of that conversion, client code you developed previously shouldn't need many changes other than the names of the data types. That is, whereas before you may have had a ValueFunctionApproximation object, it is now probably a DifferentiableStateActionValue. The tutorial code has been updated correspondingly to reflect these changes.

Shell

The next change is more of an addition to BURLAP: a framework for setting up an interactive shell that you can use to control your experiments at runtime. Note that this shell does not provide arbitrary Java code execution. Instead, it's more similar to a lightweight "bash" for BURLAP.

At a high level, there is a class called BurlapShell that takes as input an input stream and an output stream on which the shell operates. The shell is started with the method start (which runs the shell in a separate thread), which begins an input/output loop. The shell has some basic universal commands that let you do things like set aliases for commands. There are two primary subclasses of this shell, EnvironmentShell and SGWorldShell. The former holds a reference to an Environment instance so that you can control that environment, and it comes with a number of standard commands for interacting with it (such as executing actions, recording episodes, visualizing episodes, etc.). Analogously, SGWorldShell holds a reference to a stochastic games World and has similar commands for interacting with it.

Although EnvironmentShell and SGWorldShell come with many convenient commands for these cases, any BurlapShell can be extended with your own implementations of the ShellCommand interface. Being able to extend the commands allows you to create your own experiment-specific controllers. Most of the existing commands use the JOptSimple library so that they handle standard command-line argument formats, and you may want to use it for your own custom commands as well.
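To illustrate the general idea of a stream-based shell that dispatches to pluggable commands, here is a small Java sketch. MiniShell, ShellCommandSketch, and CountCommand are hypothetical names invented for this example; they are not BurlapShell or BURLAP's ShellCommand, the sketch runs synchronously rather than in a separate thread, and it parses arguments by splitting on whitespace instead of using JOptSimple.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PrintStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical command interface; BURLAP's ShellCommand has a different signature.
interface ShellCommandSketch {
    String commandName();
    int call(String[] args, PrintStream out); // returns 0 on success
}

// Minimal read-dispatch loop over arbitrary streams (e.g., stdin or streams backed by
// GUI components), with support for command aliases. Typing "quit" ends the loop.
class MiniShell {
    private final Map<String, ShellCommandSketch> commands = new HashMap<>();
    private final Map<String, String> aliases = new HashMap<>();

    void addCommand(ShellCommandSketch c) { commands.put(c.commandName(), c); }
    void setAlias(String alias, String commandName) { aliases.put(alias, commandName); }

    void run(InputStream in, PrintStream out) {
        BufferedReader reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] tokens = line.trim().split("\\s+");
                if (tokens[0].isEmpty()) continue; // blank line
                if (tokens[0].equals("quit")) break;
                String name = aliases.getOrDefault(tokens[0], tokens[0]);
                ShellCommandSketch c = commands.get(name);
                if (c == null) {
                    out.println("Unknown command: " + tokens[0]);
                } else {
                    c.call(Arrays.copyOfRange(tokens, 1, tokens.length), out);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

// Hypothetical experiment-specific command: adds an optional step (default 1) to a counter.
class CountCommand implements ShellCommandSketch {
    int count = 0;

    public String commandName() { return "count"; }

    public int call(String[] args, PrintStream out) {
        int step = args.length > 0 ? Integer.parseInt(args[0]) : 1;
        count += step;
        out.println("count = " + count);
        return 0;
    }
}

class ShellDemo {
    public static void main(String[] args) {
        MiniShell shell = new MiniShell();
        shell.addCommand(new CountCommand());
        shell.setAlias("c", "count");
        String script = "count\nc 5\nbogus\nquit\n";
        shell.run(new ByteArrayInputStream(script.getBytes(StandardCharsets.UTF_8)), System.out);
    }
}
```

Because the shell only sees an InputStream and a PrintStream, the same command set can be driven from a terminal, a scripted byte stream as in the demo, or text components in a GUI, which is the same separation that lets one shell back both a terminal explorer and a visual console.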

It is also worth noting that the TerminalExplorer class is now just a wrapper for an EnvironmentShell that is initialized on the standard input and output streams; the same is true of SGTerminalExplorer. The console accessible from a VisualExplorer or SGVisualExplorer now also uses the corresponding EnvironmentShell or SGWorldShell, with input and output streams backed by GUI elements. This gives you a lot of control over an experiment while it is being visualized.

Currently the shell framework does not support conditionals or looping mechanisms, but we may add these capabilities in a future version.

September 19, 2015

BURLAP version 2 is live!

In the last update, we mentioned that BURLAP would be getting some more significant changes that would turn the State class into an interface, opening the door to custom implementations for domains that need more specific memory management. Since then, we have also implemented a number of other changes that, taken together, are significant enough to warrant a new version, since they may require some code changes by users. These changes make BURLAP easier to use and more flexible in many cases, and faster in some.

BURLAP version 1 is still available for download, both as the pre-compiled JAR and as source code on GitHub (on branch v1), but from now on master will point to version 2. All tutorials have also been updated to reflect the changes in BURLAP 2 (and received some other tuning), but the version 1 tutorials are also still available.

In the remainder of this update we will review some of the main highlights of the changes in BURLAP 2.

Most Significant Changes

Less Significant Changes

If you are migrating your code, many of the changes may simply require using different imports and changing the names of some elements. However, it may be worthwhile to rescan the updated tutorials to see how things have changed. Additionally, all tutorial code on the website has been included in the main distribution under the package burlap.tutorials.

May 29, 2015

The latest version of BURLAP includes various changes. Most are transparent or are feature additions. For example, domains for BlockDude and Frostbite have been added, and terminal explorers now accept console commands for directly modifying states, similar to what you will find in the VisualExplorer console. However, one change may require small modifications to your code. Specifically, every domain's DomainGenerator can now have its physics/model parameters modified to generate a new domain without affecting the behavior of domains previously generated from the same DomainGenerator instance. For many domains, like MountainCar, InvertedPendulum, LunarLander, etc., this change was implemented by storing all physics parameters in a data member called physParams. physParams has a public data member for each physics parameter that was previously part of its corresponding DomainGenerator, so if you have code that changes a physics parameter, you will now need to reference it through the physParams data member. Note that physParams is fully copied whenever generateDomain is called, so future changes will not affect previously generated domains.
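The copy-on-generate behavior described above can be sketched in a few lines of Java. CarPhysicsParams, CarDomain, CarDomainGenerator, and their fields are hypothetical names with illustrative values, not the actual MountainCar classes; check the Javadoc of the DomainGenerator you use for its real parameter names.

```java
// Hypothetical sketch of the physParams pattern; class and field names are illustrative,
// not the actual MountainCar API.
class CarPhysicsParams {
    public double gravity = 0.0025;
    public double acceleration = 0.001;

    CarPhysicsParams copy() {
        CarPhysicsParams c = new CarPhysicsParams();
        c.gravity = gravity;
        c.acceleration = acceleration;
        return c;
    }
}

// A generated domain holds its own private copy of the parameters.
class CarDomain {
    final CarPhysicsParams physParams;
    CarDomain(CarPhysicsParams physParams) { this.physParams = physParams; }
}

class CarDomainGenerator {
    // Public so client code can tweak parameters before generating a domain.
    public CarPhysicsParams physParams = new CarPhysicsParams();

    CarDomain generateDomain() {
        // Copy, so later edits to this generator do not leak into already generated domains.
        return new CarDomain(physParams.copy());
    }
}

class PhysParamsDemo {
    public static void main(String[] args) {
        CarDomainGenerator gen = new CarDomainGenerator();
        CarDomain normal = gen.generateDomain();
        // Parameters are now set through physParams rather than directly on the generator.
        gen.physParams.gravity = 0.005;
        CarDomain heavy = gen.generateDomain();
        System.out.println(normal.physParams.gravity + " vs " + heavy.physParams.gravity);
    }
}
```

Copying at generation time, rather than sharing one mutable parameter object, is what lets a single generator produce several domain variants (for example, for transfer experiments) without them interfering with each other.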

Along with embedding physics/model parameters inside a single object that is copied, the MountainCar ClassicMCTF class is now a static class. To see how these changes affect code, see the Solving a Continuous Domain tutorial, whose code has been updated to reflect the changes. Also be sure to examine the Javadoc for each DomainGenerator you are using.

Finally, we are planning a fairly significant change to BURLAP's state definitions. Currently, State is a mutable class for defining OO-MDP states with a list of ObjectInstance objects (which are also hash-indexed by their name). Although this works fine for many domains, we have found that some of the more complex domains we are investigating would benefit from different state memory management and indexing methods. Therefore, we plan to change State into an interface, with the current State class becoming a standard implementation that remains available. This will give users the freedom to optimize their state definitions for the needs of their domain, which can improve CPU performance and reduce memory usage. If you would like to track the progress of this work, see the state_interface branch on git. Once that branch is fully developed, we will copy the current version of master to a separate branch so that it remains available, and then merge state_interface into master. If you have comments about this new direction, please share them on the BURLAP Google group.
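As a rough illustration of what turning State into an interface enables, here is a hedged Java sketch. StateSketch, ObjectSketch, ListIndexedState, and CompactGridState are hypothetical names, and the planned BURLAP interface may look quite different. The point is only that client code written against the interface works with both a general list-plus-hash-index implementation, like the current State, and a compact domain-specific one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified OO-MDP object: a named instance of a class with numeric attributes.
final class ObjectSketch {
    final String name;
    final String className;
    final double[] values;

    ObjectSketch(String name, String className, double[] values) {
        this.name = name;
        this.className = className;
        this.values = values.clone();
    }

    ObjectSketch copy() { return new ObjectSketch(name, className, values); }
}

// Hypothetical State interface: client code depends only on these operations.
interface StateSketch {
    ObjectSketch getObject(String name);
    List<ObjectSketch> allObjects();
    StateSketch copy();
}

// General implementation mirroring the current design: a list of objects, also hash-indexed by name.
class ListIndexedState implements StateSketch {
    private final List<ObjectSketch> objects = new ArrayList<>();
    private final Map<String, ObjectSketch> byName = new HashMap<>();

    void addObject(ObjectSketch o) {
        objects.add(o);
        byName.put(o.name, o);
    }

    public ObjectSketch getObject(String name) { return byName.get(name); }
    public List<ObjectSketch> allObjects() { return Collections.unmodifiableList(objects); }

    public StateSketch copy() { // deep copy, since objects are mutable
        ListIndexedState s = new ListIndexedState();
        for (ObjectSketch o : objects) s.addObject(o.copy());
        return s;
    }
}

// Domain-specific implementation: one grid agent stored as two ints, with no per-state maps.
class CompactGridState implements StateSketch {
    final int x, y;
    CompactGridState(int x, int y) { this.x = x; this.y = y; }

    // Builds a fresh object view on demand; mutating it does not change this state.
    public ObjectSketch getObject(String name) {
        return "agent".equals(name) ? new ObjectSketch("agent", "AGENT", new double[]{x, y}) : null;
    }
    public List<ObjectSketch> allObjects() { return Collections.singletonList(getObject("agent")); }
    public StateSketch copy() { return this; } // immutable, so sharing is safe
}

class StateDemo {
    public static void main(String[] args) {
        ListIndexedState general = new ListIndexedState();
        general.addObject(new ObjectSketch("agent", "AGENT", new double[]{3, 4}));
        StateSketch compact = new CompactGridState(3, 4);
        // The same client code works with either representation.
        for (StateSketch s : new StateSketch[]{general, compact}) {
            System.out.println(Arrays.toString(s.getObject("agent").values));
        }
    }
}
```

In this sketch the compact state trades generality for memory: it stores two ints per state and can skip copying entirely because it is immutable, while the list-indexed state supports arbitrary objects at the cost of per-state lists and hash maps.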