The Brown-UMBC Reinforcement Learning and Planning (BURLAP) java code library is for the use and development of single or multi-agent planning and learning algorithms and domains to accompany them. BURLAP uses a highly flexible system for defining states and and actions of nearly any kind of form, supporting discrete continuous, and relational domains. Planning and learning algorithms range from classic forward search planning to value function-based stochastic planning and learning algorithms. Also included is a set of analysis tools such as a common framework for the visualization of domains and agent performance in various domains.

BURLAP is licensed under the permissive Apache 2.0 license.

For more background information on the project and the people involved, see the Information page.

Where to git it

BURALP uses Maven and is available on Maven Central! That means that if you'd like to create a project that uses BURLAP, all you need to do is add the following dependency to the <dependencies> section of your projects pom.xml

<dependency> <groupId>edu.brown.cs.burlap</groupId> <artifactId>burlap</artifactId> <version>3.0.0</version> </dependency> and the library will automatically be downloaded and linked to your project! If you do not have Maven installed, you can get it from here.

You can also get the full BURLAP source to manually compile/modify from Github at:

Alternatively, you can directly download precompiled jars from Maven Central from here. Use the jar-with-dependencies if you want all dependencies included.

Prior versions of BURLAP are also available on Maven Central, and branches on github.

Tutorials and Example Code

Short video tutorials, longer text tutorials, and example code are available for BURLAP. All code can be found in our examples repository, which also provides the kind of POM file and file sturcture you should consider using for a BURLAP project. The example repository can be found at:


Video Tutorials

Written Tutorials


Java documentation is provided for all of the source code in BURLAP. You can find an online copy of the javadoc at the below location.




Features in development


  1. Diuk, C., Cohen, A., and Littman, M.L.. "An object-oriented representation for efficient reinforcement learning." Proceedings of the 25th international conference on Machine learning (2008). 240-270.
  2. Pohl, Ira. "First results on the effect of error in heuristic search". Machine Intelligence 5 (1970): 219-236.
  3. Pohl, Ira. "The avoidance of (relative) catastrophe, heuristic competence, genuine dynamic weighting and computational issues in heuristic problem solving (August, 1973)
  4. Puterman, Martin L., and Moon Chirl Shin. "Modified policy iteration algorithms for discounted Markov decision problems." Management Science 24.11 (1978): 1127-1137.
  5. Barto, Andrew G., Steven J. Bradtke, and Satinder P. Singh. "Learning to act using real-time dynamic programming." Artificial Intelligence 72.1 (1995): 81-138.
  6. Kocsis, Levente, and Csaba Szepesvari. "Bandit based monte-carlo planning." ECML (2006). 282-293.
  7. Watkins, Christopher JCH, and Peter Dayan. "Q-learning." Machine learning 8.3-4 (1992): 279-292.
  8. Rummery, Gavin A., and Mahesan Niranjan. On-line Q-learning using connectionist systems. University of Cambridge, Department of Engineering, 1994.
  9. Barto, Andrew G., Richard S. Sutton, and Charles W. Anderson. "Neuronlike adaptive elements that can solve difficult learning control problems." Systems, Man and Cybernetics, IEEE Transactions on 5 (1983): 834-846.
  10. Albus, James S. "A theory of cerebellar function." Mathematical Biosciences 10.1 (1971): 25-61.
  11. Sutton, Richard S., Doina Precup, and Satinder Singh. "Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning." Artificial intelligence 112.1 (1999): 181-211.
  12. Asmuth, John, Michael L. Littman, and Robert Zinkov. "Potential-based Shaping in Model-based Reinforcement Learning." AAAI. 2008.
  13. Littman, Michael L. "Markov games as a framework for multi-agent reinforcement learning." ICML. Vol. 94. 1994.
  14. Greenwald, Amy, Keith Hall, and Roberto Serrano. "Correlated Q-learning." ICML. Vol. 3. 2003.
  15. Sodomka, Eric, Hilliard, E., Littman, M., & Greenwald, A. "Coco-Q: Learning in Stochastic Games with Side Payments." Proceedings of the 30th International Conference on Machine Learning (ICML-13). 2013.
  16. Abbeel, Pieter, and Andrew Y. Ng. "Apprenticeship learning via inverse reinforcement learning." Proceedings of the twenty-first international conference on Machine learning. ACM, 2004.
  17. Kearns, Michael, Yishay Mansour, and Andrew Y. Ng. "A sparse sampling algorithm for near-optimal planning in large Markov decision processes." Machine Learning 49.2-3 (2002): 193-208.
  18. Lagoudakis, Michail G., and Ronald Parr. "Least-squares policy iteration." The Journal of Machine Learning Research 4 (2003): 1107-1149
  19. G.D. Konidaris, S. Osentoski and P.S. Thomas. Value Function Approximation in Reinforcement Learning using the Fourier Basis. In Proceedings of the Twenty-Fifth Conference on Artificial Intelligence, pages 380-385, August 2011.
  20. Li, Lihong, Michael L. Littman, and L. Littman. Prioritized sweeping converges to the optimal value function. Tech. Rep. DCS-TR-631, 2008.
  21. McMahan, H. Brendan, Maxim Likhachev, and Geoffrey J. Gordon. "Bounded real-time dynamic programming: RTDP with monotone upper bounds and performance guarantees." Proceedings of the 22nd international conference on Machine learning. ACM, 2005.
  22. Babes, Monica, et al. "Apprenticeship learning about multiple intentions." Proceedings of the 28th International Conference on Machine Learning (ICML-11). 2011.
  23. MacGlashan, James and Littman, Micahel, "Between imitation and intention learning," in Proceedings of the International Joint Conference on Artificial Intelligence, 2015.
  24. Gordon, Geoffrey J. "Stable function approximation in dynamic programming." Proceedings of the twelfth international conference on machine learning. 1995.
  25. Littman, M.L., Cassandra, A.R., Kaelbling, L.P., "Learning Policies for Partially Observable Environments: Scaling Up," in Proceedings of the 12th Internaltion Conference on Machine Learning. 1995.