Tutorial: Building a Domain

OO-MDPs

In the classic MDP formalism, each state is described only by its identity. The cell in the bottom left corner of the grid world would be state "0" and the one above it might be state "11." This is known as a flat state representation because there is no information about the states other than their identity. Although many planning/learning algorithms work just fine with flat representations, a flat representation makes defining transition dynamics and reward functions inconvenient. In fact, when we described the grid world in the previous section, we used words regarding spatial adjacency and direction to explain it. It would be nice to define the states, transitions, etc. using those same concepts. For these reasons (and others), it is often much easier to use a factored state representation, which can be exploited when defining the MDP transition dynamics and other properties.

A classic way to define a factored state representation is with a set of state variables or attributes. In our grid world, for example, we would define the state by an x-position attribute and a y-position attribute. The bottom left cell of the world would be state (0, 0); the cell directly above it would be (0, 1); and so on.
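To make the difference between the two representations concrete, here is a minimal sketch that converts between a flat state id and the factored (x, y) form. The class name GridIndexing and the grid width of 11 are assumptions made for illustration (the width is chosen so that the cell above state 0 is state 11, as in the example above); none of this is part of BURLAP.

```java
// Hypothetical helper, not part of BURLAP: converts between a flat state id
// and a factored (x, y) position in a grid world.
final class GridIndexing {
    // Assumed grid width; 11 makes the cell above state 0 come out as state 11.
    static final int WIDTH = 11;

    // Flat id of the cell in column x, row y (row 0 is the bottom row).
    static int toFlat(int x, int y) {
        return y * WIDTH + x;
    }

    // x-position attribute recovered from a flat id.
    static int xOf(int flatId) {
        return flatId % WIDTH;
    }

    // y-position attribute recovered from a flat id.
    static int yOf(int flatId) {
        return flatId / WIDTH;
    }
}
```

With the factored form, "the cell directly above" is just y + 1, whereas with flat ids the same relationship is hidden in the arithmetic of the numbering scheme.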

The factored representation that BURLAP uses is the object-oriented MDP (OO-MDP). Rather than representing a state by a single set of attributes, an OO-MDP represents a state by a set of objects. Each object belongs to an object class, and each object class has an associated set of attributes. Each attribute can be of a different type with its own value domain. An object in a state is simply a value assignment to its class's attributes. In our grid world, we can define an "agent" class that has two integer attributes, x and y, whose value domains span the width and height of the grid world, respectively. In this definition, a state would contain an object instance belonging to the agent class with a value assignment specifying the agent's x and y position.
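The relationship between object classes, attributes, and object instances can be sketched in plain Java as follows. The names SketchObjectClass and SketchObject are hypothetical and deliberately differ from BURLAP's own classes; the 0 to 10 value domain is an assumed grid size. This is only an illustration of the idea, not how BURLAP implements it.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch class (not BURLAP's API): an object class is a name
// plus a set of integer attributes, each with its own value domain.
final class SketchObjectClass {
    final String name;
    // attribute name -> {lower bound, upper bound}, both inclusive
    final Map<String, int[]> intAttributes = new LinkedHashMap<String, int[]>();

    SketchObjectClass(String name) {
        this.name = name;
    }

    SketchObjectClass addIntAttribute(String attName, int low, int high) {
        intAttributes.put(attName, new int[]{low, high});
        return this;
    }
}

// Hypothetical sketch class (not BURLAP's API): an object instance is a
// value assignment to the attributes of its class.
final class SketchObject {
    final SketchObjectClass objectClass;
    final String id;
    final Map<String, Integer> values = new LinkedHashMap<String, Integer>();

    SketchObject(SketchObjectClass objectClass, String id) {
        this.objectClass = objectClass;
        this.id = id;
    }

    // Assigns a value, rejecting unknown attributes and out-of-domain values.
    SketchObject set(String attName, int value) {
        int[] domain = objectClass.intAttributes.get(attName);
        if (domain == null) {
            throw new IllegalArgumentException("Unknown attribute: " + attName);
        }
        if (value < domain[0] || value > domain[1]) {
            throw new IllegalArgumentException("Value out of domain: " + value);
        }
        values.put(attName, value);
        return this;
    }

    int get(String attName) {
        return values.get(attName);
    }
}
```

Under these assumptions, an agent class could be built with new SketchObjectClass("agent").addIntAttribute("x", 0, 10).addIntAttribute("y", 0, 10), and an agent object with new SketchObject(agentClass, "agent0").set("x", 0).set("y", 1).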

Although grid worlds are simple enough to describe without using an OO-MDP representation, there are a number of reasons why the OO-MDP representation is useful. For example, it's trivial to define transition dynamics that create new objects in the world or remove them, merely by having the objects added or removed from the list of objects present in a state. If there are multiple objects belonging to the same class, states can also be defined invariantly to the identifier or order of the objects in the state. We call this kind of invariance object identifier independence. For example, consider a state (s0) made up of two block objects (block0 and block1) that are each defined by spatial position information (an x and y attribute). Now imagine a new state (s1) that is the result of swapping the positions of block0 and block1. Even though the object identifiers associated with the block positions (block0 and block1) are different between s0 and s1, these really are the same state and when equality is object identifier independent they will be considered equal. The below illustration helps clarify this property.
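The two-block example can also be sketched in code. The following is a minimal illustration, assuming every object in the state belongs to the same class (block) and has only x and y attributes; the class and method names are hypothetical and this is not how BURLAP performs its equality checks.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not BURLAP's implementation. A state is a map from
// object identifier to that object's {x, y} values.
final class BlockStateEquality {

    // Builds a two-block state.
    static Map<String, int[]> state(String id0, int x0, int y0,
                                    String id1, int x1, int y1) {
        Map<String, int[]> s = new LinkedHashMap<String, int[]>();
        s.put(id0, new int[]{x0, y0});
        s.put(id1, new int[]{x1, y1});
        return s;
    }

    // Identifier dependent: each named object must have the same values.
    static boolean identifierDependentEquals(Map<String, int[]> a,
                                             Map<String, int[]> b) {
        if (!a.keySet().equals(b.keySet())) {
            return false;
        }
        for (Map.Entry<String, int[]> e : a.entrySet()) {
            if (!Arrays.equals(e.getValue(), b.get(e.getKey()))) {
                return false;
            }
        }
        return true;
    }

    // Identifier independent: compare the sorted collections of value
    // assignments, ignoring which identifier holds which assignment.
    static boolean identifierIndependentEquals(Map<String, int[]> a,
                                               Map<String, int[]> b) {
        return sortedValues(a).equals(sortedValues(b));
    }

    private static List<String> sortedValues(Map<String, int[]> s) {
        List<String> vals = new ArrayList<String>();
        for (int[] p : s.values()) {
            vals.add(p[0] + "," + p[1]);
        }
        Collections.sort(vals);
        return vals;
    }
}
```

If s0 places block0 at (1, 2) and block1 at (3, 4), and s1 swaps those positions, the identifier-independent check treats s0 and s1 as equal while the identifier-dependent check does not.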

BURLAP supports both object identifier independence and dependence, depending on your needs. See the Basic Planning and Learning Tutorial for more information on using object identifier independence.

Another advantage to the OO-MDP paradigm is that it leverages the object-oriented nature to provide additional high-level state features in the form of propositional functions that operate on objects in the world. In our grid world, we can introduce an additional object class for location objects (similarly defined by x,y position attributes) and then define a propositional function called "at" that operates on the agent object and a location object and evaluates to true when they are in the same location. Including propositional functions is useful for bridging the gap between MDPs and more classic AI approaches that are based on logical representations. In this tutorial we will implement the "at" propositional function in our grid world to demonstrate how to create them.
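Before getting to the BURLAP version, the logic of "at" can be sketched in a few lines of plain Java. Here an object is represented as a map from attribute name to integer value; the class AtSketch and its methods are hypothetical names used only for illustration and are not BURLAP's PropositionalFunction API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not BURLAP's API.
final class AtSketch {

    // Builds an object (agent or location) with x and y attributes.
    static Map<String, Integer> objectAt(int x, int y) {
        Map<String, Integer> o = new HashMap<String, Integer>();
        o.put("x", x);
        o.put("y", y);
        return o;
    }

    // The "at" propositional function: true when the agent object and the
    // location object have the same x and y values.
    static boolean at(Map<String, Integer> agent, Map<String, Integer> location) {
        return agent.get("x").equals(location.get("x"))
            && agent.get("y").equals(location.get("y"));
    }
}
```

Because the function takes the two objects as parameters, the same definition can be evaluated for any agent and location pair in a state.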

BURLAP OO-MDP Java Class Overview

BURLAP implements the OO-MDP paradigm in Java with the following class structure, which can be found in the packages burlap.oomdp.core and burlap.oomdp.singleagent.

Attribute - this class defines an attribute name, data type, and value range. Attribute data types can be discrete (categorical or integers), real-valued, relational, strings, or int and double arrays for very large data.
ObjectClass - this class defines an object class name and is defined with an associated set of Attribute objects.
Value - this class provides a value assignment for a specific attribute. There is a different Value subclass for each attribute data type, but management of it is handled behind the scenes.
ObjectInstance - this interface is used to represent an OO-MDP object, which is defined by an ObjectClass and a set of Value assignments to each of the class' Attribute objects.
State - this interface provides methods to retrieve information about an OO-MDP state, such as getting the various ObjectInstance elements that define it.
PropositionalFunction - this abstract class is subclassed to define propositional functions of an OO-MDP that operate on objects in State objects.
Action - this abstract class is subclassed to provide the definitions of actions that an agent can take in the world. The action subclass defines the transition dynamics for the action, preconditions (if any), and parameters the action takes (if any).
TransitionProbability - this class is a pair consisting of a possible next state and the probability of transitioning to it; it is returned by the Action class when querying for its transition dynamics.
GroundedAction - this class provides a reference to an action and the specific parameters (if any) with which it should be applied.
GroundedProp - this class provides a reference to a PropositionalFunction and the parameters with which it will be evaluated.
SADomain - this class provides the definition of an OO-MDP domain, and includes the references to all of the ObjectClass objects, Attribute objects, PropositionalFunction objects, and Action objects that define the domain.
TerminalFunction - this interface is implemented to specify the terminal states of a task in a specific OO-MDP domain.
RewardFunction - this interface is implemented to specify the rewards received for any state, action, next state transition.
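To show how the action, transition probability, terminal function, and reward function roles fit together, here is a self-contained sketch for a grid world. Everything in it is an assumption made for illustration: the "Sketch" class names (chosen so they are not mistaken for BURLAP's classes), the 11x11 grid, the goal cell at (10, 10), the 0.8 success probability for moving north, and the reward values.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not BURLAP's TransitionProbability class:
// pairs a possible next state (x, y) with the probability of reaching it.
final class SketchTransition {
    final int nextX;
    final int nextY;
    final double probability;

    SketchTransition(int nextX, int nextY, double probability) {
        this.nextX = nextX;
        this.nextY = nextY;
        this.probability = probability;
    }
}

// Hypothetical sketch of the action, terminal function, and reward function roles.
final class SketchGridTask {
    static final int HEIGHT = 11;  // assumed grid height
    static final int GOAL_X = 10;  // assumed goal cell
    static final int GOAL_Y = 10;

    // Transition dynamics of a "north" action. Assumption: the move succeeds
    // with probability 0.8; otherwise the agent stays where it is.
    static List<SketchTransition> northTransitions(int x, int y) {
        List<SketchTransition> out = new ArrayList<SketchTransition>();
        int ny = Math.min(y + 1, HEIGHT - 1);
        if (ny == y) {
            // Already in the top row: the agent stays put with certainty.
            out.add(new SketchTransition(x, y, 1.0));
            return out;
        }
        out.add(new SketchTransition(x, ny, 0.8));
        out.add(new SketchTransition(x, y, 0.2));
        return out;
    }

    // Terminal function role: is this state the end of the task?
    static boolean isTerminal(int x, int y) {
        return x == GOAL_X && y == GOAL_Y;
    }

    // Reward function role: defined over (state, action, next state).
    // Assumption: -1 per step, 0 on entering the goal.
    static double reward(int x, int y, String action, int nextX, int nextY) {
        return isTerminal(nextX, nextY) ? 0.0 : -1.0;
    }
}
```

Note that the probabilities returned for any state sum to 1, and that the reward here happens to depend only on the next state even though the signature accepts the full transition.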

State implementations

Because State and ObjectInstance are interfaces, you must choose a specific implementation that determines how memory and the data structures for storing the information are managed. All of the domains included in BURLAP use the MutableState and MutableObjectInstance implementations, which are a good place to start. However, if your domain is computationally demanding, you may want to consider writing your own implementation that handles memory and allows access to information in the most efficient way possible.
