BURLAP stands for Brown-UMBC Reinforcement Learning And Planning and is a Java library for the use and development of single or multi-agent planning and learning algorithms and domains to accompany them.
You can either field the question to the community in our Google group or you can send an email to the creator, James MacGlashan, at jmacglashan at cs dot brown dot edu.
Java provides a nice balance between high portability and efficiency, while also being fairly well known with a wide range of available libraries. Additionally, many modern languages, such as Scala and Groovy, compile to the JVM, which makes BURLAP instantly compatible with them.
This topic is more fully described in the Building an OO-MDP Domain tutorial. In short, an OO-MDP defines states as a set of objects, each with their own attribute values, and adds a set of propositional functions that operate on the objects in a state, thereby providing additional high-level information. This representation is very powerful and allows for a range of different kinds of problems to be easily defined.
Prior to BURLAP 3, all domains were represented as OO-MDPs. However, with BURLAP 3, domains use arbitrary representations that you define, with OO-MDPs having optional interfaces and tools that you can use if you wish to use that representation.
First, be sure read the Java documentation for these classes which provides a good deal of information. In brief, a PropositionalFunction defines the function and function signature for a propositional function: a function that operates on state objects in an OO-MDP. However, when you evaluate a propositional function, you must specify on which object argument to evaluate it. A GroundedProp is a reference to a PropositionalFunction and a set of parameters/arguments on which to evaluate it.
Tabular learning and planning algorithms need a way to quickly look up values or stored actions (or otherwise) for different states, which makes Java HashMaps and HashSets especially appealing. Of course, that requires that we be able to compute hash codes for states and perform equality evaluations between them. We might imagine letting the State object simply implement the standard equals and hashCode methods to handle that. However, it is often the case that different experiments call for different ways of hashing or checking equality (e.g., when providing state abstraction or variable discretization). Therefore, hashing and equality is provided through a HashableStateFactory that can be customized by the client. When you aren't doing anything fancy, you can probably just use the SimpleHashableStateFactory, or if you do want to use a State's equals and hashCode method, you can use ReflectiveHashableStateFactory
Terminal states are defined either by an Environment implementation, or for your own simulated domains, the SampleModel. However, often times models in BURLAP are implemented with a FactoredModel that separates the state transition model, reward function, and terminal states, with the latter being defined with an object that implements the TerminalFunction interface.
Following the Sutton and Barto paradigm, planning and learning algorithms in BURLAP mathematically interpret terminal states as states from which the agent deterministically transitions back to the same state with a reward of zero. This is equivalent to all action ceasing once a terminal state is reached. Because terminal states are indicated with a flag from the model or Environment, this property of the transition dynamics does not need to be specifically coded into the state transition dynamics, which makes it easy to define various ad hoc tasks for the same domain.
Given this interpretation, the value of terminal states should be fixed at zero (and is in the existing BURLAP planning and learning algorithms). If you are are student reading from the Russell Norvig AI textbook and want to implement their small grid world, keep in mind that that their small grid world treats the terminal state has having a non-zero value.
In many MDPs or classes of domains, the state space is infinite, in which case it cannot be defined. Other times, the state space is only finite when considering the states that are reachable from some seed initial state. And even if the state space is always finite and well defined, it's often an undue burden to require the the designer to specify the entire thing manually.
However, if you'd like to gather all the possible states from a domain instance that you've created, a good way to go about this is to let BURLAP do the heavy lifting for you by using the StateReachability class, which takes as input a source state and domain, and returns all states that are reachable from the state. If your domain has states that are disconnected from each other, then you may need to run it from multiple seed states in each disconnected component. You may also find the StateEnumerator class helpful, which will let you iteratively add additional states to its set and assign a unique identifier to each.
If you are working in a continuous state space, then the number of states is probably uncountably infinite, which obviously makes state enumeration impossible; however, if you want a representation of the space, you may want to consider the FlatStateGridder class, which will sample states in the space along a regular grid that you can define.
And if you're using a planning algorithm that needs the full state space, such as value iteration, the implementations in BURLAP will automatically handle enumeration of the states from the source state that you give it.
Yes! This is precisely the role of an ActionType, which lets you define which set of actions are applicable in a given state.
Possibly. The first thing to do is to make sure that Java's JVM is being given a large enough heap. If it's not, it's possible that it's artificially using less memory than you have. To set the amount of memory Java's JVM can use, you want to add a -Xmx argument when you call java; e.g., java -Xmx2048M [class path here]. The -Xmx argument lets you specify the heap size. In the example, it would provide it with 2048MB (2GB). If you're running in Eclipse or IntelliJ, then in the run configuration you should find a text field that lets you set your VM arguments; that's where you would put that flag.
If you're still running out of memory then there are a couple of things to consider. First, try and figure out if there is a more compact way to define your states. For example, do you really need as many data members you've defined, or can they be compressed into a smaller set? Also, you may also want to consider making your states perform shallow copies so that the pool of states reuses the same data. More information about shallow copying can be found in the Building a Domain tutorial.
Sure! There are two ways you can do this. You can (1) implement your own Environment class to manage the connection, control, and state perception of your robot; or (2) use our BURLAP library extension (burlap_rosbridge) that has a standard Environment implementation for communicating with robots controlled with ROS over Rosbridge.
When using burlap_rosbridge it is expected that you will implement the method to turn the ROS message into a State object.Burlap_rosbridge also allows you to handle the execution of action in a variety of ways, including sending direct messages to the ROS controller (e.g., Twist messages) or by sending string messages to a ROS action controller. See burlap_rosbridge's github page for more information and examples of how to use it.
Burlap_rosbridge is also on Maven Central, so to link to it, simply add the following dependency alongside the BURLAP dependency in your project's pom.xml file:
Yes, we have been building a mod for Minecraft that allows you to use BURLAP's planning and learning algorithms to control the player. For more information on that project, see its github page and its accompanying wiki page.
For the moment, you'll have to cite this web page and can use James MacGlashan as the author. An academic publication is forthcoming, which you can use once it's available.