public class CartPoleDomain extends java.lang.Object implements DomainGenerator
The track can be made infinite by setting the physParams isFiniteTrack parameter to false. The infinite track is handled by never changing the position value of the cart. All model/physics parameters are stored in the CartPoleDomain.CPPhysicsParams object physParams. Modifying this generator's model parameters will not affect previously generated domains, so the same generator can be used to generate different domains without affecting others.
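The two behaviors described above (a generated domain is unaffected by later edits to the generator, and an infinite track leaves the cart position unchanged) can be illustrated with a small standalone sketch. Every class here (Params, Model, Generator) is a hypothetical stand-in written for this illustration. Only the names physParams and isFiniteTrack come from the documentation, and copying the parameters at generation time is an assumption about one way to get this isolation, not a statement of how CartPoleDomain is implemented.

```java
/** Hypothetical stand-ins showing how a generator can isolate earlier domains from later edits. */
final class GeneratorSnapshotSketch {

    /** Hypothetical parameter holder; only the isFiniteTrack name is taken from the documentation. */
    static final class Params {
        boolean isFiniteTrack = true;

        Params copy() {
            Params p = new Params();
            p.isFiniteTrack = this.isFiniteTrack;
            return p;
        }
    }

    /** Hypothetical generated model that owns its own copy of the parameters. */
    static final class Model {
        final Params params;

        Model(Params params) {
            this.params = params;
        }

        /** On an infinite track the stored position is never changed; otherwise it advances. */
        double nextPosition(double x, double xDot, double dt) {
            return params.isFiniteTrack ? x + dt * xDot : x;
        }
    }

    /** Hypothetical generator: each generate() call snapshots the current parameters. */
    static final class Generator {
        final Params physParams = new Params();

        Model generate() {
            return new Model(physParams.copy());
        }
    }
}
```

With this arrangement, flipping isFiniteTrack on the generator after a first generate() call changes only the models generated afterwards.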
By default, this implementation will use the simulation described by Florian, which corrects two problems in the classic Barto, Sutton, and Anderson paper.
The two problems were (1) gravity was specified as negative in the equations when it should have been positive and (2) friction was not calculated
correctly. However, this domain may also be set to use the classic incorrect mechanics or the classic mechanics with correct gravity for comparison
purposes. To do so, use the methods setToIncorrectClassicModel() and setToIncorrectClassicModelWithCorrectGravity(). Note that when incorrect gravity is used, the pole will "bounce" once it reaches about 90 degrees (though in most tasks the pole is never allowed to fall this far).
This domain consists of a single object with four real-valued attributes: the x position of the cart, the x velocity of the cart, the angle between the pole and the vertical axis, and the rate of change of that angle. Additionally, a fifth hidden attribute is included when the corrected physics are used; it maintains the sign of the normal force in the last step. If the classic mechanics are used instead, this hidden attribute is not included. The physics are simulated using a non-linear differential equation that is estimated using Euler's Method. All system parameters default to those used in the original paper, but they may be modified as desired.
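As a rough illustration of estimating the dynamics with Euler's Method, the sketch below advances the four visible state variables by one time step. It is not this class's implementation. It assumes a frictionless cart and pole (so the hidden normal-force sign attribute is not needed), positive gravity as in the corrected model, and constants commonly quoted for the classic task (cart mass 1.0 kg, pole mass 0.1 kg, half pole length 0.5 m, time step 0.02 s). The class name CartPoleEulerSketch and all constant names are hypothetical and are not read from CPPhysicsParams.

```java
/** Frictionless cart-pole step using explicit Euler integration (illustrative sketch only). */
final class CartPoleEulerSketch {
    // Assumed constants for illustration; not taken from CPPhysicsParams.
    static final double GRAVITY = 9.8;        // m/s^2, positive (corrected sign)
    static final double CART_MASS = 1.0;      // kg
    static final double POLE_MASS = 0.1;      // kg
    static final double HALF_POLE_LEN = 0.5;  // m
    static final double TIME_DELTA = 0.02;    // s

    /**
     * state = {x, xDot, theta, thetaDot}, with theta measured from the vertical axis.
     * Returns the state one time step later under the given horizontal force (newtons).
     */
    static double[] step(double[] state, double force) {
        double x = state[0], xDot = state[1], theta = state[2], thetaDot = state[3];
        double totalMass = CART_MASS + POLE_MASS;
        double sin = Math.sin(theta), cos = Math.cos(theta);

        // Angular acceleration of the pole with all friction terms set to zero.
        double thetaAcc = (GRAVITY * sin
                + cos * (-force - POLE_MASS * HALF_POLE_LEN * thetaDot * thetaDot * sin) / totalMass)
                / (HALF_POLE_LEN * (4.0 / 3.0 - POLE_MASS * cos * cos / totalMass));

        // Linear acceleration of the cart, which depends on the pole's angular acceleration.
        double xAcc = (force
                + POLE_MASS * HALF_POLE_LEN * (thetaDot * thetaDot * sin - thetaAcc * cos))
                / totalMass;

        // Explicit Euler: advance each variable using derivatives evaluated at the current state.
        return new double[] {
            x + TIME_DELTA * xDot,
            xDot + TIME_DELTA * xAcc,
            theta + TIME_DELTA * thetaDot,
            thetaDot + TIME_DELTA * thetaAcc
        };
    }
}
```

Calling step repeatedly with a fixed force, for example CartPoleEulerSketch.step(state, 10.0) for a push to the right, traces out a trajectory. Because explicit Euler evaluates derivatives only at the start of each step, a small time step is needed to keep the estimate stable.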
Also included with this class are default classes for reward function and terminal function for this domain.
Running the main method of this class will launch an interactive visualizer with the 'a' and 'd' keys controlling left and right movement force respectively.
1. Florian, Razvan V. "Correct equations for the dynamics of the cart-pole system." Center for Cognitive and Neural Studies (Coneural), Romania (2007).
2. Barto, Andrew G., Richard S. Sutton, and Charles W. Anderson. "Neuronlike adaptive elements that can solve difficult learning control problems." IEEE Transactions on Systems, Man, and Cybernetics 5 (1983): 834-846.
Modifier and Type | Class and Description |
---|---|
static class | CartPoleDomain.CartPoleRewardFunction - A default reward function for this task. |
static class | CartPoleDomain.CartPoleTerminalFunction - A default terminal function for this domain. |
static class | CartPoleDomain.CPPhysicsParams |
Modifier and Type | Field and Description |
---|---|
static java.lang.String | ACTION_LEFT - A constant for the name of the left action |
static java.lang.String | ACTION_RIGHT - A constant for the name of the right action |
CartPoleDomain.CPPhysicsParams | physParams - An object specifying the physics parameters for the cart pole domain. |
protected RewardFunction | rf |
protected TerminalFunction | tf |
static java.lang.String | VAR_ANGLE - A constant for the name of the angle attribute |
static java.lang.String | VAR_ANGLEV - A constant for the name of the angle velocity |
static java.lang.String | VAR_NORM_SGN - Attribute name for maintaining the direction sign of the force normal. |
static java.lang.String | VAR_V - A constant for the name of the position velocity |
static java.lang.String | VAR_X - A constant for the name of the position attribute |
Constructor and Description |
---|
CartPoleDomain() |
Modifier and Type | Method and Description |
---|---|
SADomain | generateDomain() - Returns a newly instanced Domain object |
RewardFunction | getRf() |
TerminalFunction | getTf() |
static void | main(java.lang.String[] args) - Launches an interactive visualizer in which key 'a' applies a force in the left direction and key 'd' applies a force in the right direction. |
void | setRf(RewardFunction rf) |
void | setTf(TerminalFunction tf) |
void | setToCorrectModel() - Sets to use the correct physics model by Florian. |
void | setToIncorrectClassicModel() - Sets to use the classic model by Barto, Sutton, and Anderson, which has incorrect friction forces and gravity in the wrong direction. |
void | setToIncorrectClassicModelWithCorrectGravity() - Sets to use the classic model by Barto, Sutton, and Anderson, which has incorrect friction forces but uses correct gravity. |
public static final java.lang.String VAR_X
public static final java.lang.String VAR_V
public static final java.lang.String VAR_ANGLE
public static final java.lang.String VAR_ANGLEV
public static final java.lang.String VAR_NORM_SGN
public static final java.lang.String ACTION_LEFT
public static final java.lang.String ACTION_RIGHT
public CartPoleDomain.CPPhysicsParams physParams
protected RewardFunction rf
protected TerminalFunction tf
public SADomain generateDomain()
Specified by: generateDomain in interface DomainGenerator
public TerminalFunction getTf()
public void setTf(TerminalFunction tf)
public RewardFunction getRf()
public void setRf(RewardFunction rf)
public void setToIncorrectClassicModelWithCorrectGravity()
public void setToIncorrectClassicModel()
public void setToCorrectModel()
public static void main(java.lang.String[] args)
Parameters: args - ignored.