In the classic MDP formalism, each state is described only by its identity. The cell in the bottom-left corner of the grid world might be state "0" and the cell above it state "11." This is known as a flat state representation, because the states carry no information other than their identity. Although many planning and learning algorithms work fine with flat representations, a flat state representation makes defining transition dynamics and reward functions inconvenient. Indeed, when we described the grid world in the previous section, we explained it in terms of spatial adjacency and direction. It would be similarly convenient to define the states, transitions, and so on using those concepts. For these reasons (among others), it is often much easier to use a factored state representation, which can be exploited when defining the MDP's transition dynamics and other properties.
A classic way to define a factored state representation is with a set of state variables or attributes. In our grid world, for example, we would define the state by an x-position attribute and a y-position attribute. The bottom left cell of the world would be state (0, 0); the cell directly above it would be (0, 1); and so on.
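To make the contrast between flat and factored states concrete, here is a minimal Java sketch that converts between a flat state identifier and (x, y) attributes. It assumes cells are numbered row by row starting at 0 in the bottom-left cell, which for an 11-cell-wide grid is consistent with the "0" and "11" identifiers used above. `GridIndexing` is a hypothetical helper for illustration, not part of BURLAP.

```java
// Hypothetical helper (not part of BURLAP): converts between a flat state
// identifier and factored (x, y) attributes. Assumes cells are numbered row
// by row, starting at 0 in the bottom-left cell.
final class GridIndexing {
    private final int width;
    private final int height;

    GridIndexing(int width, int height) {
        this.width = width;
        this.height = height;
    }

    // Flat identifier -> factored {x, y} attributes.
    int[] toFactored(int flatId) {
        if (flatId < 0 || flatId >= width * height) {
            throw new IllegalArgumentException("flat id out of range: " + flatId);
        }
        return new int[] {flatId % width, flatId / width};
    }

    // Factored (x, y) attributes -> flat identifier.
    int toFlat(int x, int y) {
        if (x < 0 || x >= width || y < 0 || y >= height) {
            throw new IllegalArgumentException("cell out of bounds: (" + x + ", " + y + ")");
        }
        return y * width + x;
    }
}

final class GridIndexingDemo {
    public static void main(String[] args) {
        // Assumed 11 x 11 grid.
        GridIndexing grid = new GridIndexing(11, 11);
        // The cell directly above the bottom-left cell.
        System.out.println(grid.toFlat(0, 1)); // prints 11
        int[] xy = grid.toFactored(11);
        System.out.println(xy[0] + ", " + xy[1]); // prints 0, 1
    }
}
```

With the factored form, "the cell above" is simply y + 1; with flat identifiers the same spatial relationship is buried in the arithmetic `y * width + x`, which is exactly what makes flat representations awkward for defining dynamics.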
The factored representation that BURLAP uses is the object-oriented MDP (OO-MDP). Rather than representing a state by a set of attributes, an OO-MDP represents a state by a set of objects. Each object belongs to an object class, and each object class has an associated set of attributes. Each attribute can be of a different type with its own value domain. An object in a state is simply a value assignment to its class's attributes. In our grid world, we can define an "agent" class with two integer attributes, x and y, whose value domains span the width and height of the grid world, respectively. Under this definition, a state contains an object instance of the agent class whose attribute values specify the agent's x and y position.
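The structure described above (object classes with typed attributes, objects as value assignments, and states as collections of objects) can be sketched in a few plain Java classes. This is a simplified illustration under stated assumptions, not BURLAP's API: every attribute is assumed to be an integer with an inclusive value range, and all class names (`IntAttribute`, `ObjectClassSketch`, `ObjectSketch`, `StateSketch`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified OO-MDP structures (NOT BURLAP's classes).
// Assumption: every attribute is an integer with an inclusive [lower, upper] domain.
final class IntAttribute {
    final String name;
    final int lower;
    final int upper;

    IntAttribute(String name, int lower, int upper) {
        this.name = name;
        this.lower = lower;
        this.upper = upper;
    }
}

// An object class is a name plus a set of attributes.
final class ObjectClassSketch {
    final String name;
    final Map<String, IntAttribute> attributes = new LinkedHashMap<>();

    ObjectClassSketch(String name, IntAttribute... attrs) {
        this.name = name;
        for (IntAttribute a : attrs) {
            attributes.put(a.name, a);
        }
    }
}

// An object is a value assignment to its class's attributes.
final class ObjectSketch {
    final String id;
    final ObjectClassSketch objectClass;
    final Map<String, Integer> values = new LinkedHashMap<>();

    ObjectSketch(String id, ObjectClassSketch objectClass) {
        this.id = id;
        this.objectClass = objectClass;
    }

    // Assigns a value, rejecting unknown attributes and out-of-domain values.
    ObjectSketch set(String attribute, int value) {
        IntAttribute a = objectClass.attributes.get(attribute);
        if (a == null) {
            throw new IllegalArgumentException(objectClass.name + " has no attribute " + attribute);
        }
        if (value < a.lower || value > a.upper) {
            throw new IllegalArgumentException(attribute + " out of domain: " + value);
        }
        values.put(attribute, value);
        return this;
    }

    int get(String attribute) {
        Integer v = values.get(attribute);
        if (v == null) {
            throw new IllegalStateException(attribute + " has not been assigned");
        }
        return v;
    }
}

// A state is simply a collection of objects.
final class StateSketch {
    final List<ObjectSketch> objects = new ArrayList<>();

    StateSketch add(ObjectSketch o) {
        objects.add(o);
        return this;
    }

    List<ObjectSketch> objectsOfClass(String className) {
        List<ObjectSketch> result = new ArrayList<>();
        for (ObjectSketch o : objects) {
            if (o.objectClass.name.equals(className)) {
                result.add(o);
            }
        }
        return result;
    }
}

final class OoMdpSketchDemo {
    public static void main(String[] args) {
        // An "agent" class whose x and y domains span an assumed 11 x 11 grid.
        ObjectClassSketch agentClass = new ObjectClassSketch("agent",
                new IntAttribute("x", 0, 10), new IntAttribute("y", 0, 10));
        StateSketch s = new StateSketch()
                .add(new ObjectSketch("agent0", agentClass).set("x", 0).set("y", 0));
        ObjectSketch agent = s.objectsOfClass("agent").get(0);
        System.out.println(agent.id + " at (" + agent.get("x") + ", " + agent.get("y") + ")"); // prints agent0 at (0, 0)
    }
}
```

Keeping the attribute domain on the class rather than on each object mirrors the description above: the class defines what can be assigned, and each object is just one assignment.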
Although grid worlds are simple enough to describe without an OO-MDP representation, there are a number of reasons why the OO-MDP representation is useful. For example, it is trivial to define transition dynamics that create or remove objects: the dynamics simply add objects to, or remove them from, the list of objects present in the state. If there are multiple objects belonging to the same class, state equality can also be defined so that it is invariant to the identifiers or order of those objects. We call this kind of invariance object identifier independence. For example, consider a state (s0) made up of two block objects (block0 and block1), each defined by spatial position information (an x and a y attribute). Now imagine a new state (s1) that results from swapping the positions of block0 and block1. Even though the identifiers associated with each block position differ between s0 and s1, these are really the same state, and when equality is object identifier independent they are considered equal. The illustration below helps clarify this property.
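The block-swapping example can be sketched by comparing two notions of equality. This is an illustrative assumption of how such a check could work, not how BURLAP implements it: identifier-dependent equality compares the mapping from identifiers to positions, while identifier-independent equality compares only the multiset of positions. The classes `Block` and `BlockStateEquality` are hypothetical, and identifiers within a state are assumed unique.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch (not BURLAP's API): a block with an identifier and an (x, y) position.
final class Block {
    final String id;
    final int x;
    final int y;

    Block(String id, int x, int y) {
        this.id = id;
        this.x = x;
        this.y = y;
    }
}

final class BlockStateEquality {
    // Identifier-dependent: each identifier must carry the same values in both states.
    static boolean equalsDependent(List<Block> s0, List<Block> s1) {
        return toIdMap(s0).equals(toIdMap(s1));
    }

    // Identifier-independent: compare the multiset of attribute values,
    // ignoring which identifier holds which values.
    static boolean equalsIndependent(List<Block> s0, List<Block> s1) {
        return valueCounts(s0).equals(valueCounts(s1));
    }

    // Assumes identifiers are unique within a state.
    private static Map<String, List<Integer>> toIdMap(List<Block> s) {
        Map<String, List<Integer>> m = new HashMap<>();
        for (Block b : s) {
            m.put(b.id, List.of(b.x, b.y));
        }
        return m;
    }

    // Counts how many blocks sit at each (x, y) position.
    private static Map<List<Integer>, Integer> valueCounts(List<Block> s) {
        Map<List<Integer>, Integer> counts = new HashMap<>();
        for (Block b : s) {
            counts.merge(List.of(b.x, b.y), 1, Integer::sum);
        }
        return counts;
    }
}

final class EqualityDemo {
    public static void main(String[] args) {
        List<Block> s0 = List.of(new Block("block0", 1, 2), new Block("block1", 3, 4));
        // s1 swaps the positions of block0 and block1.
        List<Block> s1 = List.of(new Block("block0", 3, 4), new Block("block1", 1, 2));
        System.out.println(BlockStateEquality.equalsDependent(s0, s1));   // prints false
        System.out.println(BlockStateEquality.equalsIndependent(s0, s1)); // prints true
    }
}
```

Counting positions (rather than sorting objects) handles any number of blocks of the same class, including several blocks sharing identical values.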
BURLAP supports both object identifier independent and object identifier dependent state equality, so you can choose whichever suits your needs. See the Basic Planning and Learning Tutorial for more information on using object identifier independence.
Another advantage of the OO-MDP paradigm is that it leverages its object-oriented structure to provide additional high-level state features in the form of propositional functions that operate on objects in the world. In our grid world, we can introduce an additional object class for location objects (also defined by x and y position attributes) and then define a propositional function called "at" that takes an agent object and a location object and evaluates to true when they occupy the same position. Propositional functions are useful for bridging the gap between MDPs and more classic AI approaches that are based on logical representations. In this tutorial we will implement the "at" propositional function in our grid world to demonstrate how to create them.
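Before looking at BURLAP's own mechanism, the idea behind "at" can be sketched in plain Java: a function over an (agent, location) pair, plus a helper that evaluates it for every such pair in a state. This is a hypothetical illustration, not BURLAP's propositional function API; `GridObject` and `AtFunction` are invented names, and objects are assumed to carry a class name and an (x, y) position.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch (not BURLAP's API): a grid-world object with an
// object class name, an identifier, and an (x, y) position.
final class GridObject {
    final String className; // assumed to be "agent" or "location"
    final String id;
    final int x;
    final int y;

    GridObject(String className, String id, int x, int y) {
        this.className = className;
        this.id = id;
        this.x = x;
        this.y = y;
    }
}

// The "at" propositional function: true when the agent and the location
// occupy the same cell.
final class AtFunction {
    static final String NAME = "at";

    boolean isTrue(GridObject agent, GridObject location) {
        if (!agent.className.equals("agent") || !location.className.equals("location")) {
            throw new IllegalArgumentException("at expects (agent, location)");
        }
        return agent.x == location.x && agent.y == location.y;
    }

    // Evaluates "at" for every (agent, location) pair in a state and returns
    // the bindings for which it holds, e.g. "at(agent0, location0)".
    List<String> trueGroundings(List<GridObject> state) {
        List<String> result = new ArrayList<>();
        for (GridObject a : state) {
            if (!a.className.equals("agent")) {
                continue;
            }
            for (GridObject l : state) {
                if (l.className.equals("location") && isTrue(a, l)) {
                    result.add(NAME + "(" + a.id + ", " + l.id + ")");
                }
            }
        }
        return result;
    }
}

final class AtDemo {
    public static void main(String[] args) {
        GridObject agent = new GridObject("agent", "agent0", 10, 10);
        GridObject goal = new GridObject("location", "location0", 10, 10);
        GridObject start = new GridObject("location", "location1", 0, 0);
        AtFunction at = new AtFunction();
        System.out.println(at.trueGroundings(List.of(agent, goal, start))); // prints [at(agent0, location0)]
    }
}
```

Enumerating the bindings that hold turns the numeric state into a set of logical facts, which is the bridge to logic-based AI methods mentioned above.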
Because State and ObjectInstance are interfaces, you must choose a specific implementation that determines how they manage memory and which data structures store their information. All of the domains included with BURLAP use the MutableState and MutableObjectInstance implementations, which are a good place to start. However, if your domain is computationally demanding, you may want to write your own implementation that manages memory and provides access to information as efficiently as possible.