BURLAP stands for Brown-UMBC Reinforcement Learning And Planning and is a Java library for the use and development of single or multi-agent planning and learning algorithms and domains to accompany them.
BURLAP is licensed under LGPL. In brief, that means you can use it for free and even link to it in sold software.
You can either post the question to the community in our Google group or send an email to the creator, James MacGlashan, at jmacglashan at cs dot brown dot edu.
Java provides a nice balance between high portability and efficiency, while also being fairly well known with a wide range of available libraries. Additionally, many modern languages, such as Scala and Groovy, compile to the JVM, which makes BURLAP instantly compatible with them.
This topic is more fully described on page 2 of the Building a Domain tutorial. In short, an OO-MDP defines states as a set of objects, each with their own attribute values, and adds a set of propositional functions that operate on the objects in a state, thereby providing additional high-level information. This representation is very powerful and allows for a range of different kinds of problems to be easily defined.
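To make the idea concrete, here is a minimal, self-contained sketch of a state made of objects with attribute values, plus one propositional function evaluated on those objects. The names ObjectInstance and atSameCell are hypothetical and chosen only for illustration; they are not BURLAP's classes, and the tutorial remains the reference for the real API.

```java
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-ins for illustration; these are not BURLAP's classes.
class OomdpStateSketch {

    /** An object instance: a name, the class it belongs to, and its attribute values. */
    static class ObjectInstance {
        final String name;
        final String className;
        final Map<String, Integer> attributes;

        ObjectInstance(String name, String className, Map<String, Integer> attributes) {
            this.name = name;
            this.className = className;
            this.attributes = attributes;
        }
    }

    /** A propositional function over two objects: true when they occupy the same cell. */
    static boolean atSameCell(ObjectInstance a, ObjectInstance b) {
        return a.attributes.get("x").equals(b.attributes.get("x"))
                && a.attributes.get("y").equals(b.attributes.get("y"));
    }

    public static void main(String[] args) {
        // A state is simply the collection of object instances.
        ObjectInstance agent = new ObjectInstance("agent0", "agent", Map.of("x", 2, "y", 3));
        ObjectInstance goal = new ObjectInstance("goal0", "location", Map.of("x", 2, "y", 3));
        ObjectInstance pit = new ObjectInstance("pit0", "location", Map.of("x", 0, "y", 1));
        List<ObjectInstance> state = List.of(agent, goal, pit);

        System.out.println(state.size() + " objects in the state");
        System.out.println("agent at goal: " + atSameCell(agent, goal)); // true
        System.out.println("agent at pit: " + atSameCell(agent, pit));   // false
    }
}
```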
First, be sure to read the Java documentation for these classes, which provides a good deal of information. In brief, an Action is subclassed and instantiated to define your MDP actions; that means the action name, its preconditions, parameterizations, and transition dynamics. Because actions can be defined to be parameterized, decisions that an agent makes are not only over the space of action definitions, but also over the space of possible parameterizations for each action definition. As a consequence, there needs to be a way for agents to refer to which action-parameterization they're considering. The GroundedAction serves this purpose by containing a reference to an Action, as well as the parameter assignments with which the action will be applied.
When you subclass Action, the kinds of parameterizations your action supports are defined by the getAssociatedGroundedAction() and getAllApplicableGroundedActions(burlap.oomdp.core.states.State) methods, which should return a subclass of GroundedAction that contains the possible parameterizations. Because you can create your own subclasses of GroundedAction, you can define any kind of action parameterization that you'd like, from continuous-valued parameters to STRIPS-like object references.
If your action is not parameterized and has no preconditions, you should consider subclassing SimpleAction, which implements many of the Action methods and returns the parameter-free SimpleGroundedAction.
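The split between an action definition and its groundings can be sketched without the library. In the following minimal example, MoveDefinition and GroundedMove are hypothetical stand-ins (not BURLAP's Action and GroundedAction), and the state is reduced to an integer position in a corridor. The definition owns the precondition and enumerates groundings; each grounding only holds a reference to the definition plus one parameter assignment.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-ins for illustration; these are not BURLAP's classes.
class GroundingSketch {

    /** An action definition: a name plus the parameter values it can be applied with. */
    static class MoveDefinition {
        final String name = "move";
        final int maxDistance;

        MoveDefinition(int maxDistance) {
            this.maxDistance = maxDistance;
        }

        /** Precondition: the move must be non-zero and stay inside the corridor [0, length). */
        boolean applicableInState(int position, int length, int distance) {
            int target = position + distance;
            return distance != 0 && target >= 0 && target < length;
        }

        /** Enumerates one grounding per applicable parameter value. */
        List<GroundedMove> allApplicableGroundings(int position, int length) {
            List<GroundedMove> result = new ArrayList<>();
            for (int d = -maxDistance; d <= maxDistance; d++) {
                if (applicableInState(position, length, d)) {
                    result.add(new GroundedMove(this, d));
                }
            }
            return result;
        }
    }

    /** A grounding: a reference to the definition plus a concrete parameter assignment. */
    static class GroundedMove {
        final MoveDefinition action;
        final int distance;

        GroundedMove(MoveDefinition action, int distance) {
            this.action = action;
            this.distance = distance;
        }

        int executeIn(int position) {
            return position + distance;
        }

        @Override
        public String toString() {
            return action.name + "(" + distance + ")";
        }
    }

    public static void main(String[] args) {
        MoveDefinition move = new MoveDefinition(2);
        // At position 0 of a corridor of length 5, only forward moves are applicable.
        System.out.println(move.allApplicableGroundings(0, 5)); // [move(1), move(2)]
    }
}
```

An agent choosing among decisions would then iterate over the returned groundings rather than over the action definitions themselves.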
The difference between PropositionalFunction and GroundedProp is similar; PropositionalFunction defines the propositional function and GroundedProp is a reference to the PropositionalFunction along with the parameters with which to evaluate it. In this case, however, the parameters to propositional functions are always OO-MDP objects in the state.
AbstractGroundedAction is the common interface for the GroundedAction class and the action groundings used in other problem types (for example, GroundedSGAgentAction in stochastic games). This common interface allows tools in BURLAP to be reused across different problem types. For example, the Policy class can be used in both single-agent problems and stochastic games problems, because it returns AbstractGroundedAction implementations, rather than a single problem type's grounded action.
Tabular learning and planning algorithms need a way to quickly look up values or stored actions (or otherwise) for different states, which makes Java HashMaps and HashSets especially appealing. Of course, that requires that we be able to compute hash codes for OO-MDP State objects and perform equality evaluations between them. Depending on the kind of problem you're solving, there may be different ways that you need to compute the hash code and perform equality evaluations. For example, perhaps you want to plan in an abstract space that ignores certain objects or attribute values. Or maybe you need to discretize real-valued attributes before comparing the states. Or maybe you don't want to use object identifier invariance.
Since different problems may require different ways of hashing and comparing states, different HashableStateFactory implementations will produce different HashableState objects that compute hash codes and perform state equality evaluations differently. However, unless you want to do something special (like state abstraction), use SimpleHashableStateFactory. You can also always implement your own if you need special functionality not already supported in BURLAP!
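As an illustration of why the hashing policy is pluggable, the following sketch shows a factory that discretizes real-valued coordinates before hashing and comparing. DiscretizedKey and DiscretizingKeyFactory are hypothetical names for this example only; they mirror the factory idea described above but are not BURLAP's HashableState or HashableStateFactory, and the state is assumed to be just an (x, y) pair.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical, simplified stand-ins for illustration; these are not BURLAP's classes.
class HashingSketch {

    /** Wraps a 2-D real-valued state; equality and hashing use discretized coordinates. */
    static class DiscretizedKey {
        final long xCell;
        final long yCell;

        DiscretizedKey(double x, double y, double cellWidth) {
            this.xCell = (long) Math.floor(x / cellWidth);
            this.yCell = (long) Math.floor(y / cellWidth);
        }

        @Override
        public boolean equals(Object other) {
            if (!(other instanceof DiscretizedKey)) {
                return false;
            }
            DiscretizedKey that = (DiscretizedKey) other;
            return xCell == that.xCell && yCell == that.yCell;
        }

        @Override
        public int hashCode() {
            return Objects.hash(xCell, yCell);
        }
    }

    /** Factory that fixes the hashing policy, here the width of a discretization cell. */
    static class DiscretizingKeyFactory {
        final double cellWidth;

        DiscretizingKeyFactory(double cellWidth) {
            this.cellWidth = cellWidth;
        }

        DiscretizedKey hashState(double x, double y) {
            return new DiscretizedKey(x, y, cellWidth);
        }
    }

    public static void main(String[] args) {
        DiscretizingKeyFactory factory = new DiscretizingKeyFactory(0.5);
        Map<DiscretizedKey, Double> values = new HashMap<>();
        values.put(factory.hashState(1.10, 2.30), 7.0);
        // (1.20, 2.40) falls in the same 0.5-wide cell, so it finds the stored value.
        System.out.println(values.get(factory.hashState(1.20, 2.40))); // 7.0
        // (1.60, 2.40) falls in a different cell, so nothing is stored for it.
        System.out.println(values.get(factory.hashState(1.60, 2.40))); // null
    }
}
```

Swapping in a factory with a different cell width (or one that ignores a coordinate entirely) changes which states a tabular algorithm treats as the same, without touching the algorithm.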
For a larger discussion of this topic, see page 2 of the Building a Domain tutorial. In short, if a domain is object identifier independent, then the equality of two states is independent of the order of the objects (or their names) in the states. Instead, two states will be considered equal as long as there is a bijection between their objects such that each object is equal to the object it is matched with. Using object identifier independence results in a compression of the state space size, since you don't have to treat every ordering of the same objects as different states.
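The two notions of equality can be compared in a small sketch. It assumes a state is a map from object name to a list of attribute values and that all objects belong to a single, non-relational class; under those assumptions, a bijection with pairwise-equal objects exists exactly when the two states hold the same multiset of attribute values. The method names are hypothetical and this is not how BURLAP implements the check.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch for illustration; this is not BURLAP's implementation.
class IdentifierIndependenceSketch {

    /** Identifier-dependent equality: object names must match along with their values. */
    static boolean equalWithIdentifiers(Map<String, List<Integer>> a,
                                        Map<String, List<Integer>> b) {
        return a.equals(b);
    }

    /** Identifier-independent equality: compare multisets of values, ignoring names. */
    static boolean equalIgnoringIdentifiers(Map<String, List<Integer>> a,
                                            Map<String, List<Integer>> b) {
        return countValues(a).equals(countValues(b));
    }

    /** Counts how many objects carry each attribute-value list. */
    private static Map<List<Integer>, Integer> countValues(Map<String, List<Integer>> state) {
        Map<List<Integer>, Integer> counts = new HashMap<>();
        for (List<Integer> values : state.values()) {
            counts.merge(values, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Same two blocks, but their names are swapped between the states.
        Map<String, List<Integer>> s1 =
                Map.of("block0", List.of(1, 1), "block1", List.of(3, 2));
        Map<String, List<Integer>> s2 =
                Map.of("block0", List.of(3, 2), "block1", List.of(1, 1));
        System.out.println(equalWithIdentifiers(s1, s2));     // false
        System.out.println(equalIgnoringIdentifiers(s1, s2)); // true
    }
}
```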
When you don't! :-) For example, if you want to define a goal in which the condition for a specific object is satisfied, you need to differentiate between objects with different identifiers.
Additionally, if you are creating a relational domain, then you must have object identifier dependence, because being invariant to object identifiers in a relational domain would require solving graph isomorphism, a problem for which no polynomial-time algorithm is known. So while detecting graph isomorphism is doable, BURLAP does not implement object identifier independence in relational domains, since it would not be tractable in general.
Terminal states are defined with an object that implements the TerminalFunction interface. Following the Sutton and Barto paradigm, planning and learning algorithms in BURLAP mathematically interpret terminal states as states from which the agent deterministically transitions back to the same state with a reward of zero. This is equivalent to all action ceasing once a terminal state is reached. Because terminal states are indicated with a TerminalFunction class, this property of the transition dynamics does not need to be specifically coded into the action transition dynamics, which makes it easy to define various ad hoc tasks for the same domain.
Given this interpretation, the value of terminal states should be fixed at zero (and is in the existing BURLAP planning and learning algorithms). If you are a student reading the Russell and Norvig AI textbook and want to implement their small grid world, keep in mind that their small grid world treats the terminal state as having a non-zero value.
In an OO-MDP, the state space is infinite because you can always just imagine another world with an additional object. Although in practice your state space may always be well defined, the domain generators in BURLAP can support much more than any single instance, which may make enumerating the state space for any domain instance non-trivial. For example, even in grid worlds you can imagine any number of different grid worlds. In other cases, it can just be hard to manually enumerate all possible states, which would make requiring it for domains a burden on the designer.
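The effect of fixing terminal values at zero can be seen in a tiny value iteration sketch. It assumes a hypothetical corridor in which the last cell is terminal, every transition yields a reward of -1, and the agent can step left or right; none of this is BURLAP code, it only illustrates the interpretation described above.

```java
import java.util.Arrays;

// Hypothetical sketch for illustration; this is not BURLAP's value iteration.
class TerminalValueSketch {

    /** Value iteration on a corridor whose last cell is terminal and keeps value 0. */
    static double[] valueIteration(int numStates, double gamma, int sweeps) {
        double[] v = new double[numStates];
        int terminal = numStates - 1;
        for (int sweep = 0; sweep < sweeps; sweep++) {
            for (int s = 0; s < numStates; s++) {
                if (s == terminal) {
                    v[s] = 0.0; // terminal states are never backed up
                    continue;
                }
                int left = Math.max(0, s - 1); // stepping left from cell 0 stays in cell 0
                int right = s + 1;
                double qLeft = -1.0 + gamma * v[left];
                double qRight = -1.0 + gamma * v[right];
                v[s] = Math.max(qLeft, qRight);
            }
        }
        return v;
    }

    public static void main(String[] args) {
        // Four cells, discount 0.9: values approach -2.71, -1.9, -1.0 and exactly 0.0.
        System.out.println(Arrays.toString(valueIteration(4, 0.9, 50)));
    }
}
```

Note that the reward is attached to transitions, so the cell next to the terminal state still receives the reward for stepping into it even though the terminal state itself is worth zero.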
However, if you'd like to gather all the possible states from a domain instance that you've created, a good way to go about this is to let BURLAP do the heavy lifting for you by using the StateReachability class, which takes as input a source state and domain, and returns all states that are reachable from the state. If your domain has states that are disconnected from each other, then you may need to run it from multiple seed states in each disconnected component. You may also find the StateEnumerator class helpful, which will let you iteratively add additional states to its set and assign a unique identifier to each.
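Conceptually, gathering reachable states is a graph search from the seed state. The sketch below is a generic breadth-first search over a caller-supplied successor function; reachableFrom is a hypothetical helper written for this example and is not the StateReachability API. It also shows why disconnected components need their own seed states.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

// Hypothetical sketch for illustration; this is not BURLAP's StateReachability.
class ReachabilitySketch {

    /** Breadth-first search returning every state reachable from the seed state. */
    static <S> Set<S> reachableFrom(S seed, Function<S, List<S>> successors) {
        Set<S> visited = new LinkedHashSet<>();
        Deque<S> frontier = new ArrayDeque<>();
        visited.add(seed);
        frontier.add(seed);
        while (!frontier.isEmpty()) {
            S current = frontier.poll();
            for (S next : successors.apply(current)) {
                if (visited.add(next)) { // add() is true only for unseen states
                    frontier.add(next);
                }
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        // Six states on a ring where the only action jumps two cells ahead,
        // so even and odd states form two disconnected components.
        Function<Integer, List<Integer>> jumpTwo = s -> List.of((s + 2) % 6);
        System.out.println(reachableFrom(0, jumpTwo)); // [0, 2, 4]
        System.out.println(reachableFrom(1, jumpTwo)); // [1, 3, 5]
    }
}
```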
If you are working in a continuous state space, then the number of states is probably uncountably infinite, which obviously makes state enumeration impossible; however, if you want a representation of the space, you may want to consider the StateGridder class, which will sample states in the space along a regular grid that you can define.
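Sampling along a regular grid amounts to taking evenly spaced values in each dimension and forming every combination. The following sketch shows that idea on plain double arrays; sampleGrid and its parameters are hypothetical names for this example and do not reflect the StateGridder API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch for illustration; this is not BURLAP's StateGridder.
class GridSamplingSketch {

    /**
     * Returns every point of a regular grid. Dimension d gets counts[d] evenly
     * spaced values from lower[d] to upper[d], both endpoints included.
     */
    static List<double[]> sampleGrid(double[] lower, double[] upper, int[] counts) {
        int dims = lower.length;
        int total = 1;
        for (int c : counts) {
            total *= c;
        }
        List<double[]> points = new ArrayList<>(total);
        for (int index = 0; index < total; index++) {
            double[] point = new double[dims];
            int remainder = index;
            for (int d = 0; d < dims; d++) {
                int step = remainder % counts[d]; // position along dimension d
                remainder /= counts[d];
                double width = counts[d] > 1 ? (upper[d] - lower[d]) / (counts[d] - 1) : 0.0;
                point[d] = lower[d] + step * width;
            }
            points.add(point);
        }
        return points;
    }

    public static void main(String[] args) {
        // Three samples along x in [0, 1] and two along y in [0, 2]: six points in total.
        for (double[] p : sampleGrid(new double[]{0, 0}, new double[]{1, 2}, new int[]{3, 2})) {
            System.out.println(Arrays.toString(p));
        }
    }
}
```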
And if you're using a planning algorithm that needs the full state space, such as value iteration, the implementations in BURLAP will automatically handle enumeration of the states from the source state that you give it.
Transition dynamics for a single-agent domain are defined in the Action classes that you (or an existing domain) implement. Specifically, there are two methods used for defining transition dynamics: performActionHelper(State, GroundedAction) and, if your Action class implements FullActionModel, getTransitions(State, GroundedAction). The performActionHelper method, as the name implies, performs the action on the given state (with the given parameters, if any, in the GroundedAction argument) and returns the outcome state. If the domain is stochastic, then the method should randomly sample an outcome state according to the distribution. Note that this method does not require the probability of the sampled outcome state to be returned, which in some domains may be non-trivial to compute. Every Action must implement this method.
The getTransitions method returns a list of all the possible outcome states with non-zero probability as well as their probability of occurring. This method only needs to be implemented if your action implements the optional interface FullActionModel, because for some domains it is impossible to enumerate all possible outcomes. However, some algorithms, such as DynamicProgramming algorithms, require being able to access the fully enumerated transition dynamics. So if you plan on using these algorithms with your domain, your Action will need to implement the interface and method.
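The difference between the sampling view and the enumerated view can be sketched in a few lines. SlipperyStep, Outcome, and their methods below are hypothetical stand-ins (not BURLAP's Action, FullActionModel, or TransitionProbability), with the state reduced to an integer and a success probability assumed to lie strictly between 0 and 1. The point is that an expectation over next states, as used in dynamic programming, needs the enumerated outcomes, while simulation only needs a sample.

```java
import java.util.List;
import java.util.Random;

// Hypothetical, simplified stand-ins for illustration; these are not BURLAP's classes.
class TransitionSketch {

    /** One possible outcome state together with its probability. */
    static class Outcome {
        final int nextState;
        final double probability;

        Outcome(int nextState, double probability) {
            this.nextState = nextState;
            this.probability = probability;
        }
    }

    /** A step that advances one cell with some probability and otherwise stays put. */
    static class SlipperyStep {
        final double successProbability; // assumed to lie strictly between 0 and 1
        final Random random;

        SlipperyStep(double successProbability, Random random) {
            this.successProbability = successProbability;
            this.random = random;
        }

        /** Sampling view: returns one outcome state, with no probability attached. */
        int sampleNextState(int state) {
            return random.nextDouble() < successProbability ? state + 1 : state;
        }

        /** Enumerated view: every possible outcome with its probability. */
        List<Outcome> allOutcomes(int state) {
            return List.of(
                    new Outcome(state + 1, successProbability),
                    new Outcome(state, 1.0 - successProbability));
        }
    }

    /** Expected value of the next state, which requires the enumerated view. */
    static double expectedValue(List<Outcome> outcomes, double[] values) {
        double total = 0.0;
        for (Outcome o : outcomes) {
            total += o.probability * values[o.nextState];
        }
        return total;
    }

    public static void main(String[] args) {
        SlipperyStep step = new SlipperyStep(0.8, new Random(0));
        double[] values = {0.0, 1.0, 2.0, 3.0};
        System.out.println("sampled next state: " + step.sampleNextState(1));
        // 0.8 * values[2] + 0.2 * values[1], roughly 1.8
        System.out.println("expected value: " + expectedValue(step.allOutcomes(1), values));
    }
}
```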
For stochastic games problems, similar methods that need to be overridden exist in the JointActionModel abstract class: actionHelper(State, JointAction) and transitionProbsFor(State, JointAction), respectively.
Yes! When you subclass Action, one of the methods you must implement is applicableInState(State, GroundedAction), which should return true in states where your preconditions are satisfied and false in states where they are not. For stochastic games, the SGAgentAction has a method with the same name for the same purpose.
An SGAgentAction class is used to define stochastic games problems. Stochastic games are a formalism for multi-agent problems. In the definition, each agent in the world has a set of individual actions that it can apply, and at each time step, the agents each choose from their set of actions and execute them at the same time. Similar to the single-agent problem Action class, the SGAgentAction class is used to define the name, preconditions, and parameterizations of an action that an agent in a stochastic games problem can take. Since states in a stochastic game change as a function of joint actions taken by all agents in the world, SGAgentAction does not have transition dynamics defined in it. Instead, transition dynamics for stochastic games are defined in the JointActionModel abstract class.
You will see this term used in the definition of STRIPS-like ObjectParameterizedAction implementations and the PropositionalFunction class. A parameter order group is used with OO-MDP object parameters to specify whether there is symmetry between parameters. That is, parameters that belong to the same parameter order group (POG) can have their values swapped without changing the effect of the action or the evaluation of the propositional function. For example, consider the propositional function prototype touching(X, Y), which returns true when the object assigned to X is touching the object assigned to Y. This evaluation should be symmetric. That is, if touching(a, b) is true, then touching(b, a) is true (and conversely, when one evaluates to false, flipping the parameters should still result in a false evaluation). To encode that the parameters are symmetric, we assign them to the same POG (which can be named anything as long as it's the same name for both). If they are not symmetric, then we would assign different POGs to them. If you use an ObjectParameterizedAction constructor or PropositionalFunction constructor without the POG values argument, it will automatically assign each parameter to a different POG (that is, no symmetry is the default assumption).
Specifying parameter symmetry with POGs is useful because when you request a list of all grounded versions of an ObjectParameterizedAction (with the getAllApplicableGroundedActions(State) method) or propositional function (with the getAllGroundedPropsForState(State) method), it will not produce a grounding that differs from another already in the list only by a swap of parameters in the same POG. In our touching(X, Y) example, for instance, it will return a list with only the grounding for touching(a, b), or touching(b, a), but not both.
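The effect of POGs on grounding enumeration can be sketched for a two-parameter function. The groundings method below is a hypothetical helper, not BURLAP's enumeration code, and it assumes the two parameters are always bound to distinct objects. When both parameters share a POG, the swapped duplicate of each pair is skipped.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch for illustration; this is not BURLAP's grounding code.
class ParameterOrderGroupSketch {

    /**
     * Lists groundings of touching(X, Y) over the given objects. If both
     * parameters share an order group, only one ordering of each pair is kept.
     */
    static List<String> groundings(List<String> objects, boolean sameOrderGroup) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < objects.size(); i++) {
            for (int j = 0; j < objects.size(); j++) {
                if (i == j) {
                    continue; // assumption: the two parameters bind distinct objects
                }
                if (sameOrderGroup && j < i) {
                    continue; // skip the swapped duplicate of a pair already listed
                }
                result.add("touching(" + objects.get(i) + ", " + objects.get(j) + ")");
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> objects = List.of("a", "b", "c");
        System.out.println(groundings(objects, true).size());  // 3
        System.out.println(groundings(objects, false).size()); // 6
    }
}
```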
Possibly. The first thing to do is to make sure that the JVM is being given a large enough heap. If it's not, Java may be limited to less memory than your machine actually has. To set the amount of memory the JVM can use, add a -Xmx argument when you call java; e.g., java -Xmx2048M [class path here]. The -Xmx argument sets the maximum heap size. In the example, it would provide the JVM with 2048MB (2GB). If you're running in Eclipse or IntelliJ, then in the run configuration you should find a text field that lets you set your VM arguments; that's where you would put that flag.
If you're still running out of memory, then there are a couple of things to consider. First, try to figure out if there is a more compact way to define your states. For example, do you really need as many OO-MDP objects as you've defined, or can they be compressed into a smaller set?
Finally, if you're still running out of memory, you should consider writing your own implementation of the State interface that allows for the most efficient management of your state data. It's possible the standard MutableState implementation that is mostly used in BURLAP domains is too general and is being wasteful for your specific domain.
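If you want to confirm that the flag took effect, one simple check is to print the maximum heap the JVM reports at startup. The class name HeapCheck is just an example; Runtime.maxMemory() is part of the standard Java API.

```java
// Small stand-alone check of the maximum heap size available to the JVM.
class HeapCheck {

    /** Maximum heap the JVM will attempt to use, in megabytes. */
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMegabytes() + " MB");
    }
}
```

Running it with and without the -Xmx argument should show different values; the reported number can be slightly below the requested size depending on the JVM.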
Sure! There are two ways you can do this. You can (1) implement your own Environment class to manage the connection, control, and state perception of your robot; or (2) use our BURLAP library extension (burlap_rosbridge) that has a standard Environment implementation for communicating with robots controlled with ROS over Rosbridge.
When using burlap_rosbridge, it is expected that you create a ROS topic that is publishing BURLAP state messages. By default, these messages are assumed to adhere to our ROS burlap_msgs type. However, you can also subclass the RosEnvironment to implement custom parsing of differently formatted topics. Burlap_rosbridge allows you to handle the execution of actions in a variety of ways, including sending direct messages to the ROS controller (e.g., Twist messages) or sending string messages to a ROS action controller. See burlap_rosbridge's GitHub page for more information and examples of how to use it.
Burlap_rosbridge is also on Maven Central, so to link to it, simply add the following dependency alongside the BURLAP dependency in your project's pom.xml file:
Yes, we have been building a mod for Minecraft that allows you to use BURLAP's planning and learning algorithms to control the player. For more information on that project, see its GitHub page and its accompanying wiki page.
For the moment, you'll have to cite this web page and can use James MacGlashan as the author. An academic publication is forthcoming, which you can use once it's available.