Tutorial: Building a Domain


Defining GridWorld Object Classes

To begin implementing our grid world domain in BURLAP, we will create a class that, in this tutorial, we will call ExampleGridWorld. We will make it implement the DomainGenerator interface, a common convention in BURLAP when developing domains, which requires implementing the generateDomain() method. In that method, we will create an SADomain object that keeps track of all the attributes, object classes, etc. that we define. (Note that the Domain class returned by the method is the abstract superclass of SADomain. It is abstract because stochastic games use a different domain subclass that holds some different and some shared information.) You should have code that looks like the following; to make things easier later, it includes all of the library imports that you'll need for the rest of the tutorial.

import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.geom.Ellipse2D;
import java.awt.geom.Rectangle2D;
import java.util.ArrayList;
import java.util.List;

import burlap.oomdp.auxiliary.DomainGenerator;
import burlap.oomdp.core.Attribute;
import burlap.oomdp.core.Attribute.AttributeType;
import burlap.oomdp.core.Domain;
import burlap.oomdp.core.ObjectClass;
import burlap.oomdp.core.ObjectInstance;
import burlap.oomdp.core.PropositionalFunction;
import burlap.oomdp.core.State;
import burlap.oomdp.core.TerminalFunction;
import burlap.oomdp.core.TransitionProbability;
import burlap.oomdp.singleagent.Action;
import burlap.oomdp.singleagent.GroundedAction;
import burlap.oomdp.singleagent.RewardFunction;
import burlap.oomdp.singleagent.SADomain;
import burlap.oomdp.singleagent.explorer.TerminalExplorer;
import burlap.oomdp.singleagent.explorer.VisualExplorer;
import burlap.oomdp.visualizer.ObjectPainter;
import burlap.oomdp.visualizer.StateRenderLayer;
import burlap.oomdp.visualizer.StaticPainter;
import burlap.oomdp.visualizer.Visualizer;

public class ExampleGridWorld implements DomainGenerator {

	@Override
	public Domain generateDomain() {
		SADomain domain = new SADomain();
		
		
		return domain;
	}

}
				

The first thing to decide when constructing our domain is which object classes and attributes define it. For our grid world, we will define two object classes: an agent class, to represent the agent in the world; and a location class, which we can use to refer to special places in the world. The location class isn't necessary to define grid world problems, but we're going to include it to help illustrate how to define propositional functions in BURLAP. Both the agent and location classes will be defined by their x and y position in the 2D world. It is often convenient to define the names of all attributes, object classes, etc. that the domain defines as string constants so that code that uses the domain can reference them precisely, so let's add those to our code now.

public static final String ATTX = "x";
public static final String ATTY = "y";

public static final String CLASSAGENT = "agent";
public static final String CLASSLOCATION = "location";
				

Next we will create the actual Attribute objects and the object classes that use them. Since grid worlds have discrete states and our attributes represent numeric positions, we will set our attributes to be of type int. We will also construct a world that is 11x11 in size, so we will set the attribute limits to run from 0 to 10, inclusive.

@Override
public Domain generateDomain() {
	
	SADomain domain = new SADomain();
	
	Attribute xatt = new Attribute(domain, ATTX, AttributeType.INT);
	xatt.setLims(0, 10);
	
	Attribute yatt = new Attribute(domain, ATTY, AttributeType.INT);
	yatt.setLims(0, 10);

	ObjectClass agentClass = new ObjectClass(domain, CLASSAGENT);
	agentClass.addAttribute(xatt);
	agentClass.addAttribute(yatt);
	
	ObjectClass locationClass = new ObjectClass(domain, CLASSLOCATION);
	locationClass.addAttribute(xatt);
	locationClass.addAttribute(yatt);
	
	return domain;
}
				

Notice how we never directly told the domain about the attributes or object classes that we created? That's because the constructor of the attributes and object classes will tell the domain about themselves automatically to streamline construction.
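The self-registration convention can be pictured with a small stand-alone sketch. The classes below (ToyDomain, ToyAttribute, SelfRegistrationSketch) are hypothetical stand-ins written for illustration, not BURLAP's classes; they only show how a constructor can hand the new object to the domain it receives.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a domain: it only keeps a registry of attributes.
class ToyDomain {
    private final Map<String, ToyAttribute> attributes = new LinkedHashMap<>();

    void register(ToyAttribute att) {
        attributes.put(att.name, att);
    }

    ToyAttribute getAttribute(String name) {
        return attributes.get(name);
    }

    int numAttributes() {
        return attributes.size();
    }
}

// Hypothetical stand-in for an attribute that registers itself on construction.
class ToyAttribute {
    final String name;

    ToyAttribute(ToyDomain domain, String name) {
        this.name = name;
        domain.register(this); // the constructor tells the domain about the new object
    }
}

class SelfRegistrationSketch {
    public static void main(String[] args) {
        ToyDomain domain = new ToyDomain();
        new ToyAttribute(domain, "x"); // no explicit "add to domain" call is needed
        new ToyAttribute(domain, "y");
        System.out.println(domain.numAttributes()); // prints 2
    }
}
```

The trade-off of this design is that object creation has a side effect on the domain, which keeps domain construction short at the cost of being less explicit.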

With that code written, we're already finished defining the state representation of the domain! The next step will be to define actions.

Defining GridWorld Actions

Defining actions in BURLAP means specifying the actions the agent can take, their preconditions (if any), their parameters (if any), and their transition dynamics. We will construct four actions: north, south, east, and west. None of these actions will have preconditions and they will not take any parameters. The transition dynamics we define for them will depend on the location of walls in the world. We could have potentially made the walls objects of the OO-MDP themselves, but for simplifying the state representation we will keep the walls embedded in our transition dynamics without explicit state representation. To facilitate the definition of the transition dynamics, we will create a 2D int matrix specifying the location of walls in each cell. We will also create a world with the same wall layout shown in the image at the beginning of this tutorial. To do so, add the following code.

//ordered so first dimension is x
protected int [][] map = new int[][]{
			{0,0,0,0,0,1,0,0,0,0,0},
			{0,0,0,0,0,0,0,0,0,0,0},
			{0,0,0,0,0,1,0,0,0,0,0},
			{0,0,0,0,0,1,0,0,0,0,0},
			{0,0,0,0,0,1,0,0,0,0,0},
			{1,0,1,1,1,1,1,1,0,1,1},
			{0,0,0,0,1,0,0,0,0,0,0},
			{0,0,0,0,1,0,0,0,0,0,0},
			{0,0,0,0,0,0,0,0,0,0,0},
			{0,0,0,0,1,0,0,0,0,0,0},
			{0,0,0,0,1,0,0,0,0,0,0},
	};
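Because the first index is x and the second is y, the array literal reads transposed relative to a picture of the world. The following stand-alone sketch prints the layout as it would appear on screen. The class MapPrinterSketch and the render helper are hypothetical names, and the sketch assumes that y grows upward, which is consistent with the north action adding 1 to y later in this tutorial.

```java
class MapPrinterSketch {

    // Same wall layout as the tutorial's map field; first index is x, second is y.
    static final int[][] MAP = {
        {0,0,0,0,0,1,0,0,0,0,0},
        {0,0,0,0,0,0,0,0,0,0,0},
        {0,0,0,0,0,1,0,0,0,0,0},
        {0,0,0,0,0,1,0,0,0,0,0},
        {0,0,0,0,0,1,0,0,0,0,0},
        {1,0,1,1,1,1,1,1,0,1,1},
        {0,0,0,0,1,0,0,0,0,0,0},
        {0,0,0,0,1,0,0,0,0,0,0},
        {0,0,0,0,0,0,0,0,0,0,0},
        {0,0,0,0,1,0,0,0,0,0,0},
        {0,0,0,0,1,0,0,0,0,0,0},
    };

    // Builds one text row per y value, top row (largest y) first;
    // '#' marks a wall and '.' marks a free cell.
    static String render(int[][] map) {
        StringBuilder sb = new StringBuilder();
        int width = map.length;
        int height = map[0].length;
        for (int y = height - 1; y >= 0; y--) {
            for (int x = 0; x < width; x++) {
                sb.append(map[x][y] == 1 ? '#' : '.');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(render(MAP));
    }
}
```

Printing the map this way is a quick check that the doorways sit where you expect before any transition dynamics depend on them.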


				

Next we will create our actions. We will define north, south, east, and west actions that move in the intended direction with probability 0.8 and otherwise move in one of the other three directions, with the remaining 0.2 split evenly among them (0.2/3 each). To create these actions, we will define a single subclass of the Action class, which we will call Movement, and instantiate it once for each action. Subclassing Action requires us to implement the method performActionHelper, which is called whenever an action is applied and must return the state that results from applying the action.

We will also want to override the method getTransitions, which specifies the probability of transitioning to each possible next state when applying the action in the provided state (and with the provided parameters). Overriding this method is not required if the domain is not going to be used with planning or learning algorithms that require exact and fully enumerated transition dynamics, and for some domains, it may not even be possible to enumerate all possible outcomes. However, if you do not override the method and a planning or learning algorithm tries to use it, an UnsupportedOperationException will be thrown. Since the number of possible outcomes is trivial to enumerate for our grid world and since we'd like to be able to use algorithms like Value Iteration that require the fully enumerated transition dynamics, we will implement the method.

If our actions had preconditions and were only applicable in certain states, we would also have to override the applicableInState method. Since our actions can be taken anywhere, however, we will not override it, which has the default behavior of making the actions applicable everywhere.

protected class Movement extends Action{

	@Override
	protected State performActionHelper(State s, String[] params) {
		
		return null;
	}
	
	
	@Override
	public List<TransitionProbability> getTransitions(State s, String [] params){
		return null;
	}
	
	
}
				

So that we can reuse this Action class for each movement direction, let's define a data member that specifies the probability of going in each direction and let the intended direction (which will succeed with probability 0.8) be specified in the constructor, along with a name for the action and the domain to which it belongs.

protected class Movement extends Action{

	//0: north; 1: south; 2:east; 3: west
	protected double [] directionProbs = new double[4];
	
	public Movement(String actionName, Domain domain, int direction){
		super(actionName, domain, "");
		for(int i = 0; i < 4; i++){
			if(i == direction){
				directionProbs[i] = 0.8;
			}
			else{
				directionProbs[i] = 0.2/3.;
			}
		}
	}
	
	@Override
	protected State performActionHelper(State s, String[] params) {
		
		return null;
	}
	
	
	@Override
	public List<TransitionProbability> getTransitions(State s, String [] params){
		return null;
	}
	
	
}
				

Notice how we called a super constructor with the action name, domain, and empty quotes? Calling the super constructor will automatically connect the action with the domain object and set its name. The empty quotes are used to specify that this action takes no parameters.
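As a quick sanity check on the constructor's arithmetic, the four entries should form a probability distribution: 0.8 + 3 × (0.2/3) = 1. The sketch below rebuilds the same array outside of BURLAP and checks the sum; the class DirectionProbsSketch and its helper are hypothetical names used only for illustration.

```java
import java.util.Arrays;

class DirectionProbsSketch {

    // Builds the distribution filled in by the Movement constructor:
    // 0.8 for the intended direction, 0.2/3 for each of the other three.
    static double[] directionProbs(int intended) {
        double[] probs = new double[4];
        for (int i = 0; i < 4; i++) {
            probs[i] = (i == intended) ? 0.8 : 0.2 / 3.;
        }
        return probs;
    }

    public static void main(String[] args) {
        double[] probs = directionProbs(2); // 2 is east in the tutorial's encoding
        double sum = 0.;
        for (double p : probs) {
            sum += p;
        }
        System.out.println(Arrays.toString(probs));
        // Compare with a tolerance because floating-point sums are rarely exact.
        System.out.println(Math.abs(sum - 1.0) < 1e-12);
    }
}
```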

Now we'll want to start defining our performActionHelper and getTransitions methods. To do so, let's add a helper method that computes the result of moving in a given direction. As long as the target cell is free (that is, it is inside the map and contains no wall), the agent will move to that position; otherwise, the agent will stay in the same place. The method returns the resulting x and y position of the agent.

protected int [] moveResult(int curX, int curY, int direction){
			
		//first get change in x and y from direction using 0: north; 1: south; 2:east; 3: west
		int xdelta = 0;
		int ydelta = 0;
		if(direction == 0){
			ydelta = 1;
		}
		else if(direction == 1){
			ydelta = -1;
		}
		else if(direction == 2){
			xdelta = 1;
		}
		else{
			xdelta = -1;
		}
		
		int nx = curX + xdelta;
		int ny = curY + ydelta;
		
		int width = ExampleGridWorld.this.map.length;
		int height = ExampleGridWorld.this.map[0].length;
		
		//make sure new position is valid (not a wall or out of bounds)
		if(nx < 0 || nx >= width || ny < 0 || ny >= height ||  
			ExampleGridWorld.this.map[nx][ny] == 1){
			
			nx = curX;
			ny = curY;
		}
			
		
		return new int[]{nx,ny};
		
	}
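To try the movement rule without the rest of the domain, the logic can be copied into a stand-alone class. The sketch below is one such copy under a hypothetical class name (MoveResultSketch); it replaces the if/else chain with lookup tables for the x and y deltas, which is a stylistic variation and not how the tutorial's helper is written.

```java
import java.util.Arrays;

class MoveResultSketch {

    // Same wall layout as the tutorial's map field; first index is x, second is y.
    static final int[][] MAP = {
        {0,0,0,0,0,1,0,0,0,0,0},
        {0,0,0,0,0,0,0,0,0,0,0},
        {0,0,0,0,0,1,0,0,0,0,0},
        {0,0,0,0,0,1,0,0,0,0,0},
        {0,0,0,0,0,1,0,0,0,0,0},
        {1,0,1,1,1,1,1,1,0,1,1},
        {0,0,0,0,1,0,0,0,0,0,0},
        {0,0,0,0,1,0,0,0,0,0,0},
        {0,0,0,0,0,0,0,0,0,0,0},
        {0,0,0,0,1,0,0,0,0,0,0},
        {0,0,0,0,1,0,0,0,0,0,0},
    };

    // Direction encoding follows the tutorial: 0: north; 1: south; 2: east; 3: west.
    static final int[] DX = {0, 0, 1, -1};
    static final int[] DY = {1, -1, 0, 0};

    static int[] moveResult(int curX, int curY, int direction) {
        int nx = curX + DX[direction];
        int ny = curY + DY[direction];
        boolean outOfBounds = nx < 0 || nx >= MAP.length || ny < 0 || ny >= MAP[0].length;
        // Stay in place when the target is outside the map or is a wall.
        if (outOfBounds || MAP[nx][ny] == 1) {
            return new int[]{curX, curY};
        }
        return new int[]{nx, ny};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(moveResult(0, 0, 0))); // free cell: prints [0, 1]
        System.out.println(Arrays.toString(moveResult(0, 0, 3))); // map edge: prints [0, 0]
        System.out.println(Arrays.toString(moveResult(0, 4, 0))); // wall at (0,5): prints [0, 4]
    }
}
```

Note that the bounds check runs before the map lookup, so an out-of-range index is never used to read the array.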
				

With this helper method defined, all the performActionHelper method needs to do is determine the current position of the agent, sample from the direction distribution, call the moveResult method to get the new position, and set the agent attribute values to the new position. Getting the position and setting the values can be done by grabbing the agent object instance from the state and using the pertinent value getter and setter methods. Since there is only ever one agent object instance in the state, we can retrieve the agent ObjectInstance using the getFirstObjectOfClass method, which returns the first object instance in the state that belongs to a specified class.

@Override
protected State performActionHelper(State s, String[] params) {
	
	//get agent and current position
	ObjectInstance agent = s.getFirstObjectOfClass(CLASSAGENT);
	int curX = agent.getDiscValForAttribute(ATTX);
	int curY = agent.getDiscValForAttribute(ATTY);
	
	//sample direction with random roll
	double r = Math.random();
	double sumProb = 0.;
	int dir = 0;
	for(int i = 0; i < this.directionProbs.length; i++){
		sumProb += this.directionProbs[i];
		if(r < sumProb){
			dir = i;
			break; //found direction
		}
	}
	
	//get resulting position
	int [] newPos = this.moveResult(curX, curY, dir);
	
	//set the new position
	agent.setValue(ATTX, newPos[0]);
	agent.setValue(ATTY, newPos[1]);
	
	//return the state we just modified
	return s;
}
				

Note

You might be wondering why we can modify the state directly and then return it, since it seems like that might cause problems if the passed in State object was used elsewhere in a planning or learning algorithm. Notice how the method we are overriding has a "Helper" qualifier at the end and is marked as protected? That's because there is a public method called "performAction" that will first copy any input state and then pass the copied state to the performActionHelper method that we are overriding. This copying ensures that you're never modifying a State object that is used for something else.
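The copy-then-delegate convention described in this note can be sketched in isolation. The classes below (ToyState, ToyAction, StepEast, CopyThenHelperSketch) are hypothetical and deliberately tiny; they illustrate the pattern only and are not BURLAP's State or Action implementations.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical minimal state: a bag of named integer values.
class ToyState {
    final Map<String, Integer> values = new HashMap<>();

    ToyState copy() {
        ToyState c = new ToyState();
        c.values.putAll(values);
        return c;
    }
}

// Hypothetical action base class showing the copy-then-delegate convention.
abstract class ToyAction {

    // Public entry point: copies the input so the caller's state is never mutated.
    public final ToyState performAction(ToyState s) {
        return performActionHelper(s.copy());
    }

    // Subclasses may freely modify the state they are handed, because it is a copy.
    protected abstract ToyState performActionHelper(ToyState s);
}

class StepEast extends ToyAction {
    @Override
    protected ToyState performActionHelper(ToyState s) {
        s.values.put("x", s.values.get("x") + 1);
        return s;
    }
}

class CopyThenHelperSketch {
    public static void main(String[] args) {
        ToyState start = new ToyState();
        start.values.put("x", 0);
        ToyState next = new StepEast().performAction(start);
        System.out.println(start.values.get("x")); // prints 0: the input is untouched
        System.out.println(next.values.get("x"));  // prints 1
    }
}
```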

For the getTransitions method, we'll do something similar, except that in this case we create a copy of the input state for each possible outcome, modify each copy to reflect that outcome, and return a List that pairs each outcome with its probability. In general, each possible outcome of a movement action is the result of the agent moving in one of the four directions, weighted by the probability that the action moves the agent in that direction. However, several directions may leave the agent's position unchanged if there are walls in those directions. Therefore, we should merge the directions that result in the same outcome, summing their probabilities, before returning our set of outcome states.


@Override
public List<TransitionProbability> getTransitions(State s, String [] params){
	
	//get agent and current position
	ObjectInstance agent = s.getFirstObjectOfClass(CLASSAGENT);
	int curX = agent.getDiscValForAttribute(ATTX);
	int curY = agent.getDiscValForAttribute(ATTY);
	
	List<TransitionProbability> tps = new ArrayList<TransitionProbability>(4);
	TransitionProbability noChangeTransition = null;
	for(int i = 0; i < this.directionProbs.length; i++){
		int [] newPos = this.moveResult(curX, curY, i);
		if(newPos[0] != curX || newPos[1] != curY){
			//new possible outcome
			State ns = s.copy();
			ObjectInstance nagent = ns.getFirstObjectOfClass(CLASSAGENT);
			nagent.setValue(ATTX, newPos[0]);
			nagent.setValue(ATTY, newPos[1]);
			
			//create transition probability object and add to our list of outcomes
			tps.add(new TransitionProbability(ns, this.directionProbs[i]));
		}
		else{
			//this direction didn't lead anywhere new
			//if there are existing possible directions that wouldn't lead anywhere, 
			//aggregate with them
			if(noChangeTransition != null){
				noChangeTransition.p += this.directionProbs[i];
			}
			else{
				//otherwise create this new state outcome
				noChangeTransition = new TransitionProbability(s.copy(), this.directionProbs[i]);
				tps.add(noChangeTransition);
			}
		}
	}
	
	
	return tps;
}
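The merging step can also be examined on its own. The following sketch uses a hypothetical 3x3 map and a hypothetical class name (TransitionMergeSketch), and it represents each outcome as an "x,y" string key in a map, rather than as a BURLAP State. Unlike the listing above, which tracks only the no-change outcome, it merges any duplicate outcome; in a grid world the two approaches coincide, since only staying in place can be reached from more than one direction.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class TransitionMergeSketch {

    // Hypothetical 3x3 map, first index x; a single wall sits at (1,1).
    static final int[][] SMALL_MAP = {
        {0, 0, 0},
        {0, 1, 0},
        {0, 0, 0},
    };

    // Same movement rule as the tutorial's moveResult, parameterized by the map.
    static int[] moveResult(int[][] map, int x, int y, int dir) {
        int[] dx = {0, 0, 1, -1}; // 0: north; 1: south; 2: east; 3: west
        int[] dy = {1, -1, 0, 0};
        int nx = x + dx[dir];
        int ny = y + dy[dir];
        if (nx < 0 || nx >= map.length || ny < 0 || ny >= map[0].length || map[nx][ny] == 1) {
            return new int[]{x, y};
        }
        return new int[]{nx, ny};
    }

    // Maps each outcome position ("x,y") to its total probability, summing duplicates.
    static Map<String, Double> transitions(int[][] map, int x, int y, double[] directionProbs) {
        Map<String, Double> outcomes = new LinkedHashMap<>();
        for (int dir = 0; dir < directionProbs.length; dir++) {
            int[] pos = moveResult(map, x, y, dir);
            outcomes.merge(pos[0] + "," + pos[1], directionProbs[dir], Double::sum);
        }
        return outcomes;
    }

    public static void main(String[] args) {
        double[] northProbs = {0.8, 0.2 / 3., 0.2 / 3., 0.2 / 3.};
        // From the corner (0,0), south and west are both blocked and collapse into "0,0".
        System.out.println(transitions(SMALL_MAP, 0, 0, northProbs));
    }
}
```

From the corner, the four directions yield three distinct outcomes: moving north (about 0.8), moving east (about 0.067), and staying put (about 0.133, the sum of the blocked south and west moves).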
				

With the Action class defined, we'll want to hook it up to our domain. First, let's add some string constants for the names of the actions.

public static final String ACTIONNORTH = "north";
public static final String ACTIONSOUTH = "south";
public static final String ACTIONEAST = "east";
public static final String ACTIONWEST = "west";
				

Then we just need to call the constructor in our generateDomain method. As with the object classes and attributes, calling the constructor will automatically tell our domain about it, so we don't need to do anything further.

new Movement(ACTIONNORTH, domain, 0);
new Movement(ACTIONSOUTH, domain, 1);
new Movement(ACTIONEAST, domain, 2);
new Movement(ACTIONWEST, domain, 3);
				

Defining Propositional Functions

Although we have a functioning domain at this point, we're going to add an optional propositional function to it to demonstrate how to use them. The propositional function we will create is at(agent, location). Notice that it takes two parameters: one is an OO-MDP object belonging to the OO-MDP class "agent" and the other is an OO-MDP object belonging to the OO-MDP class "location." The function should evaluate to true when the provided agent object's position is equal to the provided location object's position. To implement this function, we will need to subclass the PropositionalFunction class. Let's first create another string constant for the name of the function.

public static final String PFAT = "at";
				

And then let's make the shell of the class implementation with a constructor.

protected class AtLocation extends PropositionalFunction{

	public AtLocation(Domain domain){
		super(PFAT, domain, new String []{CLASSAGENT,CLASSLOCATION});
	}
	
	@Override
	public boolean isTrue(State s, String[] params) {
		// TODO Auto-generated method stub
		return false;
	}
	
	
	
}
				

Inside the constructor we called a super constructor that takes as arguments the name of the propositional function, the domain with which it will be associated, and an array specifying the OO-MDP class types to which its parameters must adhere.

Now let's implement the isTrue method, which is as simple as getting the object instances of the provided parameters and checking whether their attributes are equal. The parameters provided to the method are the names (or identifiers) of the objects in the world that it references, so we can retrieve the objects from the provided state object by simply querying for the object with the given identifier.

@Override
public boolean isTrue(State s, String[] params) {
	ObjectInstance agent = s.getObject(params[0]);
	ObjectInstance location = s.getObject(params[1]);
	
	int ax = agent.getDiscValForAttribute(ATTX);
	int ay = agent.getDiscValForAttribute(ATTY);
	
	int lx = location.getDiscValForAttribute(ATTX);
	int ly = location.getDiscValForAttribute(ATTY);
	
	return ax == lx && ay == ly;
}
				

Finally, let's call the constructor from our generateDomain method. As before, the constructor will automatically tell the domain about the propositional function. Our final generateDomain method will look like the below.

@Override
public Domain generateDomain() {
	
	SADomain domain = new SADomain();
	
	Attribute xatt = new Attribute(domain, ATTX, AttributeType.INT);
	xatt.setLims(0, 10);
	
	Attribute yatt = new Attribute(domain, ATTY, AttributeType.INT);
	yatt.setLims(0, 10);
	
	ObjectClass agentClass = new ObjectClass(domain, CLASSAGENT);
	agentClass.addAttribute(xatt);
	agentClass.addAttribute(yatt);
	
	ObjectClass locationClass = new ObjectClass(domain, CLASSLOCATION);
	locationClass.addAttribute(xatt);
	locationClass.addAttribute(yatt);
	
	new Movement(ACTIONNORTH, domain, 0);
	new Movement(ACTIONSOUTH, domain, 1);
	new Movement(ACTIONEAST, domain, 2);
	new Movement(ACTIONWEST, domain, 3);
	
	new AtLocation(domain);
	
	return domain;
}
				

Note

Although we have defined a propositional function, you might be wondering how you can find all possible parameters in a state with which it can be evaluated. The PropositionalFunction super class provides a method for doing just this called getAllGroundedPropsForState(State), which will return a list of GroundedProp objects that can be evaluated. A GroundedProp is simply a PropositionalFunction object reference and parameters with which to evaluate it. Similarly, if you want all possible GroundedProp objects for a list of different PropositionalFunction objects, you can use the PropositionalFunction static method getAllGroundedPropsFromPFList(List<PropositionalFunction>, State).
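To make the idea of grounding concrete, the sketch below enumerates every way to bind object names to a function's parameter classes. It is a simplified, hypothetical illustration (the class GroundingSketch and its methods are not part of BURLAP), and it makes no claim about how BURLAP performs grounding internally; for instance, it would allow the same object to fill two parameters of the same class.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class GroundingSketch {

    // Enumerates every way to bind one object name to each parameter class, in order.
    static List<String[]> allBindings(Map<String, List<String>> objectsByClass, String[] paramClasses) {
        List<String[]> results = new ArrayList<>();
        expand(objectsByClass, paramClasses, 0, new String[paramClasses.length], results);
        return results;
    }

    // Recursively fills parameter slot 'index' with each candidate object name.
    private static void expand(Map<String, List<String>> objectsByClass, String[] paramClasses,
                               int index, String[] current, List<String[]> results) {
        if (index == paramClasses.length) {
            results.add(current.clone()); // store a copy because 'current' is reused
            return;
        }
        List<String> candidates = objectsByClass.getOrDefault(paramClasses[index], new ArrayList<>());
        for (String name : candidates) {
            current[index] = name;
            expand(objectsByClass, paramClasses, index + 1, current, results);
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> objects = new LinkedHashMap<>();
        objects.put("agent", Arrays.asList("agent0"));
        objects.put("location", Arrays.asList("location0", "location1"));
        for (String[] binding : allBindings(objects, new String[]{"agent", "location"})) {
            System.out.println("at(" + String.join(", ", binding) + ")");
        }
    }
}
```

With one agent and two locations, the sketch lists two groundings: at(agent0, location0) and at(agent0, location1).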

Testing the Domain

We now have a functional domain! However, we should test that it actually works as expected. By the end of this tutorial we will have created a visualizer for our domain that we can use to interactively visualize and test it, but we can also test it from the command line. To test our domain, let's first add a method to generate a state with the agent in the bottom left corner and a location object in the top right.

public static State getExampleState(Domain domain){
	State s = new State();
	ObjectInstance agent = new ObjectInstance(domain.getObjectClass(CLASSAGENT), "agent0");
	agent.setValue(ATTX, 0);
	agent.setValue(ATTY, 0);
	
	ObjectInstance location = new ObjectInstance(domain.getObjectClass(CLASSLOCATION), "location0");
	location.setValue(ATTX, 10);
	location.setValue(ATTY, 10);
	
	s.addObject(agent);
	s.addObject(location);
	
	return s;
}
				

Note that we made this method static and had it take the domain object as a parameter. We made the method static because the domain object is only created once the generateDomain method is called, and that method can produce a different Domain object each time it is called. (Although not especially important for our example, this is useful if you have parameterizable domains.) Since ObjectInstance objects have to belong to a specific ObjectClass, we need the corresponding domain object from which to retrieve the ObjectClass.

In addition to taking an ObjectClass, the ObjectInstance constructor also takes as a parameter the name or identifier of the object instance. This name will be used to identify the object from other objects in the state and to further disambiguate it from multiple objects of the same class. This name is also what will be passed to the PropositionalFunction isTrue method parameters and for actions that are parameterized to objects (though in our Grid World, our Actions are not parameterized).

Let's now add a main method that we will launch to test our domain. To do the testing, we will make use of the TerminalExplorer, which lets us act as the agent through the command line. Create the main method as shown below.

public static void main(String [] args){
		
	ExampleGridWorld gen = new ExampleGridWorld();
	Domain domain = gen.generateDomain();
	
	State initialState = ExampleGridWorld.getExampleState(domain);
	
	TerminalExplorer exp = new TerminalExplorer(domain);
	exp.exploreFromState(initialState);
	
	
}
				

When you run main in the command line/terminal you should see a print out describing the example state. If you now type the name of an action (such as "north") and hit enter, it will change state and print out the new state. Remember that our actions are stochastic, so 20% of the time you'll find yourself going in a different direction!
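If you would rather check the 80/20 split empirically than by typing actions by hand, the sampling loop can be run many times outside of BURLAP. The sketch below uses hypothetical names (DirectionSamplingSketch, sampleDirection, intendedFraction) and differs from performActionHelper in two assumed ways: it takes a seeded java.util.Random so that runs are repeatable, and it falls back to the last direction, not direction 0, if floating-point round-off leaves the cumulative sum slightly below the random roll.

```java
import java.util.Random;

class DirectionSamplingSketch {

    // Cumulative-sum sampling, as in performActionHelper, with an injectable Random.
    static int sampleDirection(double[] directionProbs, Random rng) {
        double r = rng.nextDouble();
        double sumProb = 0.;
        for (int i = 0; i < directionProbs.length; i++) {
            sumProb += directionProbs[i];
            if (r < sumProb) {
                return i;
            }
        }
        // Guard against round-off leaving sumProb slightly below 1.
        return directionProbs.length - 1;
    }

    // Fraction of n samples that land on the intended direction.
    static double intendedFraction(double[] directionProbs, int intended, int n, long seed) {
        Random rng = new Random(seed);
        int hits = 0;
        for (int i = 0; i < n; i++) {
            if (sampleDirection(directionProbs, rng) == intended) {
                hits++;
            }
        }
        return (double) hits / n;
    }

    public static void main(String[] args) {
        double[] northProbs = {0.8, 0.2 / 3., 0.2 / 3., 0.2 / 3.};
        // With 100,000 samples the fraction should come out close to 0.8.
        System.out.println(intendedFraction(northProbs, 0, 100_000, 42L));
    }
}
```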

While the TerminalExplorer works for all domains that you might create, as your domains get more complex it can be difficult to make sense of them from text alone. Having a visualizer makes understanding what's happening in your domain much easier. Furthermore, you can reuse a visualizer not just for interacting with the world yourself, but also for visualizing the results of learning and planning algorithms. In the next sections, we will walk through how to create a state visualizer for this domain.