Tutorial: Basic Planning and Learning


Initializing the data members

Now that we have the structure of our class, we'll need to initialize our data members to instances that will create our domain and define our task. First, create a default constructor. The first thing we'll do in the constructor is create our domain.

public BasicBehavior(){
	gwdg = new GridWorldDomain(11, 11);
	gwdg.setMapToFourRooms();
	tf = new GridWorldTerminalFunction(10, 10);
	gwdg.setTf(tf);
	goalCondition = new TFGoalCondition(tf);
	domain = gwdg.generateDomain();
	
	//more to come...
}
				

The first line creates an 11x11 deterministic GridWorld. The second line sets the map to a pre-canned layout: the four rooms layout. This layout was used in the options work of Sutton, Precup, and Singh (1999) and provides a simple environment for our tests. Alternatively, you could define your own map layout, either by passing the constructor a 2D integer array (with 1s marking cells that contain walls and 0s marking open cells), or by specifying the size of the domain as we did and then using the GridWorldDomain object's horizontalWall and verticalWall methods to place walls, as in the sketch below. GridWorldDomain also supports 1-dimensional walls between cells that you can set, if you'd prefer that kind of domain. For simplicity, we'll stick with the four rooms layout.
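If you do want to build your own layout, the sketch below illustrates both approaches. It is purely illustrative and not part of the tutorial class; the variable names and wall coordinates are made up for the example.

//a minimal sketch of a custom layout (coordinates are illustrative)
GridWorldDomain customGwdg = new GridWorldDomain(11, 11); //start with an open 11x11 grid
customGwdg.horizontalWall(0, 4, 5); //wall cells at x=0..4 along row y=5
customGwdg.verticalWall(6, 10, 5);  //wall cells at y=6..10 along column x=5

//or pass a full 2D map to the constructor; 1s are walls, 0s are open cells
int[][] map = new int[11][11];
map[5][5] = 1;
GridWorldDomain fromMap = new GridWorldDomain(map);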

By default, our grid world uses a reward function that always returns -1; we still need to specify where the terminal state is. Paired with the -1 rewards, that terminal state motivates the agent to reach it as quickly as possible and complete the task. For that, we've told the grid world to use a GridWorldTerminalFunction that marks position 10, 10 as a terminal state.
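If you wanted a different reward structure, you could override the default before generating the domain. The line below is a sketch of one such variation (not used in this tutorial), assuming GridWorldDomain's setRf method and BURLAP's GoalBasedRF, which returns one reward when a goal condition is satisfied and another otherwise; the specific reward values are just illustrative.

//hypothetical alternative: goal-based rewards (+5 at the goal, -0.1 otherwise) instead of uniform -1
gwdg.setRf(new GoalBasedRF(new TFGoalCondition(tf), 5., -0.1));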

Rewards and terminal states are fine for MDP-based algorithms, but some of the algorithms in BURLAP are search-based planning algorithms: algorithms that search for a path in a deterministic domain that reaches some goal condition. For those, we create a StateConditionTest that will be passed to such algorithms, and we set it to one that marks all terminal states as goal states. StateConditionTest objects are not actually part of a domain definition, but since we will use ours in conjunction with the terminal states, we'll create it here anyway.
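If you ever need a goal condition that isn't tied to the terminal function, you can implement StateConditionTest yourself. The sketch below is purely illustrative and not part of the tutorial class: it marks any state in which the agent's x and y coordinates are both at least 6 as a goal state, reading the agent's position from the GridWorldState.

//illustrative only: a goal condition satisfied anywhere in the upper-right region
StateConditionTest regionGoal = new StateConditionTest() {
	@Override
	public boolean satisfies(State s) {
		GridAgent agent = ((GridWorldState)s).agent; //GridWorldState exposes its agent object
		return agent.x >= 6 && agent.y >= 6;
	}
};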

At this point we've fully specified the domain, and the last line generates it.


The next step is to define the initial state of this task. For GridWorlds, we use a GridWorldState instance. Add the following code.

initialState = new GridWorldState(new GridAgent(0, 0), new GridLocation(10, 10, "loc0"));
				

The GridWorldState uses an OO-MDP representation, which means the state itself consists of multiple objects (see the Building an OO-MDP Domain tutorial for more information). Here we've made it consist of an agent, located at position 0,0, and a location object located at 10,10. We're not actually going to do anything meaningful with the location object and could have omitted it, but it will give us a nice visual representation of the goal we placed at 10,10 when we visualize our results.


Next we will instantiate the HashableStateFactory that we wish to use. Since we are not doing anything fancy like state abstraction, we will use SimpleHashableStateFactory.

hashingFactory = new SimpleHashableStateFactory();
				

Finally, we will instantiate an Environment with which the agent will interact in our learning algorithm demonstrations. Since we will be using BURLAP's simulation of the environment, we will use a SimulatedEnvironment, which along with the domain, needs to be told which initial state to use for the environment.

env = new SimulatedEnvironment(domain, initialState);
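If you want to sanity check the environment before running any learning algorithm, you can interact with it directly. The snippet below is not part of the tutorial class; it is a rough sketch assuming the grid world's standard action name constants and BURLAP's SimpleAction class.

//illustrative only: take one step east and inspect the outcome
EnvironmentOutcome eo = env.executeAction(new SimpleAction(GridWorldDomain.ACTION_EAST));
System.out.println("reward: " + eo.r + ", terminal: " + eo.terminated);
env.resetEnvironment(); //return to the initial state before doing anything else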
				

At this point you should have initialized all of the data members for the class and the final constructor will look something like the below.

public BasicBehavior(){

	//create the grid world domain with the four rooms layout,
	//a terminal state at 10,10, and a goal condition built from it
	gwdg = new GridWorldDomain(11, 11);
	gwdg.setMapToFourRooms();
	tf = new GridWorldTerminalFunction(10, 10);
	gwdg.setTf(tf);
	goalCondition = new TFGoalCondition(tf);
	domain = gwdg.generateDomain();

	//initial state: agent at 0,0 and a location object marking the goal at 10,10
	initialState = new GridWorldState(new GridAgent(0, 0), new GridLocation(10, 10, "loc0"));

	//simple state hashing for tabular planning and learning
	hashingFactory = new SimpleHashableStateFactory();

	//simulated environment starting from the initial state, used by learning algorithms
	env = new SimulatedEnvironment(domain, initialState);

}
				

Setting up a result visualizer

Before we get to actually running planning and learning algorithms, we're going to want a way to visualize the results that they generate. While there are a few different ways to do this, for now we will define an offline visualizer that will allow us to run planning or learning completely, and then visualize the results after it's finished. Offline visualization has the advantage of not bogging down the runtime of planning/learning algorithms with time allocated for visualization. To create our offline visualizer, we will need to define a state Visualizer and pass it to an EpisodeSequenceVisualizer. A Visualizer is a Java JPanel that can render State objects. An EpisodeSequenceVisualizer lets you view and explore episodes (state-action-reward sequences) and can either load the episodes from files or be provided them programmatically. In this example, we will save results to file and load them back up. To handle this kind of result visualization, create the below method.

public void visualize(String outputPath){
	Visualizer v = GridWorldVisualizer.getVisualizer(gwdg.getMap());
	new EpisodeSequenceVisualizer(v, domain, outputPath);
}
				

Note that the outputPath parameter specifies the directory where our planning/learning results will be stored (we'll get to this when we actually apply a planning/learning algorithm).

The state Visualizer we will use is the one designed for rendering grid world states. It takes as input the map of the world (a 2D int array), which we retrieve from our GridWorldDomain instance. Note that other domains included in BURLAP have their own Visualizers that you can use for them.
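As an aside, the text above mentioned that EpisodeSequenceVisualizer can also be given episodes programmatically rather than loading them from files. A rough sketch of that usage follows, assuming a constructor overload that takes a Visualizer, the domain, and a List of Episode objects; the episodes list here is a placeholder you would fill during learning.

//illustrative only: visualize episodes held in memory rather than loaded from files
List<Episode> episodes = new ArrayList<Episode>();
//... add Episode objects produced during learning ...
Visualizer v = GridWorldVisualizer.getVisualizer(gwdg.getMap());
new EpisodeSequenceVisualizer(v, domain, episodes);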


Before moving on to the next part of the tutorial, let's also hook up our class constructor and visualizer method to the main method.

public static void main(String[] args) {


	BasicBehavior example = new BasicBehavior();
	String outputPath = "output/"; //directory to record results
	
	//we will call planning and learning algorithms here
	
	
	//run the visualizer
	example.visualize(outputPath);

}
				
				

Note that you can set the output path to whatever you want. If it doesn't already exist, the code that saves the results will automatically create it (more on that next).