SeqCritic

Sequential critic class with LSTM layers for the RDPG agent.

SeqCritic

 SeqCritic (state_dim:int=0, action_dim:int=0, hidden_dim:int=0,
            n_layers:int=0, batch_size:int=0, padding_value:float=0.0,
            tau:float=0.0, lr:float=0.0, ckpt_dir:pathlib.Path=Path('.'),
            ckpt_interval:int=0, logger:Optional[logging.Logger]=None,
            dict_logger:Optional[dict]=None)

Sequential critic network for the RDPG algorithm.

Attributes:

state_dim (int): Dimension of the state space.
action_dim (int): Dimension of the action space.
hidden_dim (int): Dimension of the hidden layer.
n_layers (int): Number of LSTM layers.
batch_size (int): Batch size for the network.
padding_value (float): Value used to pad the input.
tau (float): Soft update parameter.
lr (float): Learning rate for the network.
ckpt_dir (Path): Directory to restore the checkpoint from.
ckpt_interval (int): Interval between checkpoint saves.
logger (logging.Logger): Logger for the class.
dict_logger (dict): Format specs for logging.

Parameter	Type	Default	Details
state_dim	int	0	dimension of the state space: 600 for cloud, 90 for kvaser
action_dim	int	0	dimension of the action space: 68 for both cloud and kvaser
hidden_dim	int	0	dimension of the hidden layer
n_layers	int	0	number of LSTM layers
batch_size	int	0	batch size for the network
padding_value	float	0.0	value used to pad the input
tau	float	0.0	soft update parameter \(\tau\)
lr	float	0.0	learning rate for the network
ckpt_dir	Path	Path('.')	directory to restore the checkpoint from
ckpt_interval	int	0	interval between checkpoint saves
logger	Optional[logging.Logger]	None	logger for the class
dict_logger	Optional[dict]	None	format specs for logging
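An LSTM critic consumes minibatches of sequences, and sequences of different lengths have to be brought to a common length before they can be stacked into one batch, which is presumably what padding_value is for. The following is a minimal sketch, assuming each episode is a list of per-step feature vectors and shorter episodes are right-padded with padding_value up to the longest one; the helper name pad_episodes is hypothetical and not part of SeqCritic.

```python
from typing import List

Step = List[float]  # one time step: a feature vector


def pad_episodes(episodes: List[List[Step]], padding_value: float = 0.0) -> List[List[Step]]:
    """Right-pad a batch of episodes to a common length (hypothetical helper).

    Missing steps are filled with vectors of padding_value that have the
    same width as the real steps.
    """
    max_len = max(len(ep) for ep in episodes)
    width = len(episodes[0][0])  # assume all steps share one feature width
    padded = []
    for ep in episodes:
        filler = [[padding_value] * width for _ in range(max_len - len(ep))]
        padded.append([list(step) for step in ep] + filler)
    return padded


# two episodes of length 1 and 2; the padding value -1.0 is arbitrary
batch = pad_episodes([[[1.0, 2.0]], [[3.0, 4.0], [5.0, 6.0]]], padding_value=-1.0)
print(batch)  # [[[1.0, 2.0], [-1.0, -1.0]], [[3.0, 4.0], [5.0, 6.0]]]
```

If the network is built with Keras, a Masking layer with mask_value set to padding_value is one common way to make the LSTM skip padded steps; the source does not say whether SeqCritic does this.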


SeqCritic.__init__

 SeqCritic.__init__ (state_dim:int=0, action_dim:int=0, hidden_dim:int=0,
                     n_layers:int=0, batch_size:int=0,
                     padding_value:float=0.0, tau:float=0.0, lr:float=0.0,
                     ckpt_dir:pathlib.Path=Path('.'), ckpt_interval:int=0,
                     logger:Optional[logging.Logger]=None,
                     dict_logger:Optional[dict]=None)

Initialize the critic network.

Restore the checkpoint from the provided directory if it exists; otherwise initialize the network.

Args:

state_dim (int): Dimension of the state space.
action_dim (int): Dimension of the action space.
hidden_dim (int): Dimension of the hidden layer.
lr (float): Learning rate for the network.
ckpt_dir (Path): Directory to restore the checkpoint from.

Parameter	Type	Default	Details
state_dim	int	0	dimension of the state space: 600 for cloud, 90 for kvaser
action_dim	int	0	dimension of the action space: 68 for both cloud and kvaser
hidden_dim	int	0	dimension of the hidden layer
n_layers	int	0	number of LSTM layers
batch_size	int	0	batch size for the network
padding_value	float	0.0	value used to pad the input
tau	float	0.0	soft update parameter \(\tau\)
lr	float	0.0	learning rate for the network
ckpt_dir	Path	Path('.')	directory to restore the checkpoint from
ckpt_interval	int	0	interval between checkpoint saves
logger	Optional[logging.Logger]	None	logger for the class
dict_logger	Optional[dict]	None	format specs for logging


SeqCritic.clone_weights

 SeqCritic.clone_weights (moving_net)

Copy the weights of moving_net into this network.
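In DDPG-style methods the target network usually starts as an exact copy of the moving network (a hard update). The framework-free sketch below illustrates that copy, assuming weights are modelled as one list of floats per layer as a stand-in for framework tensors; the function name hard_update is hypothetical. In Keras, the equivalent one-liner is model.set_weights(other_model.get_weights()).

```python
from typing import List

Weights = List[List[float]]  # one flat parameter list per layer


def hard_update(target: Weights, source: Weights) -> None:
    """Overwrite target in place with a copy of source (hypothetical helper)."""
    if len(target) != len(source):
        raise ValueError("networks must have the same number of layers")
    for i, layer in enumerate(source):
        if len(target[i]) != len(layer):
            raise ValueError(f"layer {i} shape mismatch")
        target[i] = list(layer)  # copy, so later changes to source do not alias


moving = [[0.5, -0.2], [1.0]]
target = [[0.0, 0.0], [0.0]]
hard_update(target, moving)
moving[0][0] = 9.9  # a later training step on the moving net
print(target)  # [[0.5, -0.2], [1.0]]
```

Copying each layer, rather than assigning the outer list, keeps the target independent of subsequent updates to the moving network.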


SeqCritic.soft_update

 SeqCritic.soft_update (moving_net)

Soft-update the target critic weights toward those of moving_net, using the soft update parameter tau.
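The soft update conventionally used for DDPG-style target networks, and assumed here because the source only names \(\tau\) as the soft update parameter, is Polyak averaging: \(\theta' \leftarrow \tau\theta + (1-\tau)\theta'\), where \(\theta\) are the moving critic weights and \(\theta'\) the target weights. A minimal sketch on list-of-floats weights follows; the stand-alone function is illustrative, not the SeqCritic method itself.

```python
from typing import List

Weights = List[List[float]]  # one flat parameter list per layer


def soft_update(target: Weights, moving: Weights, tau: float) -> None:
    """Polyak averaging in place: target <- tau * moving + (1 - tau) * target.

    Assumes target and moving have identical layer shapes.
    """
    if not 0.0 <= tau <= 1.0:
        raise ValueError("tau must lie in [0, 1]")
    for t_layer, m_layer in zip(target, moving):
        for j, (t, m) in enumerate(zip(t_layer, m_layer)):
            t_layer[j] = tau * m + (1.0 - tau) * t


target = [[0.0, 2.0]]
moving = [[1.0, 4.0]]
soft_update(target, moving, tau=0.25)
print(target)  # [[0.25, 2.5]]
```

With \(\tau = 1\) this reduces to a hard copy; small values make the target trail the moving critic slowly, which is the usual reason for keeping \(\tau\) small.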


SeqCritic.save_ckpt

 SeqCritic.save_ckpt ()

Save the checkpoint.


SeqCritic.evaluate_q

 SeqCritic.evaluate_q (states, last_actions, actions)

Evaluate the action values (Q-values) for the given states and actions.

Args:

states (np.array): States in a minibatch.
last_actions (np.array): Previous actions in a minibatch.
actions (np.array): Actions in a minibatch.

Returns:

np.array: Q-values for the minibatch.
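In RDPG the critic conditions on the history of observations and previous actions, Q(h_t, a_t), which is why evaluate_q takes last_actions alongside states and the actions to be scored. The sketch below shows one way such inputs could be aligned before entering an LSTM, assuming all three arrays are shaped (batch, time, features) and concatenated per step as [state_t, last_action_t, action_t]; both the layout and the helper name critic_inputs are hypothetical.

```python
from typing import List, Sequence

Batch = Sequence[Sequence[Sequence[float]]]  # (batch, time, features)


def critic_inputs(states: Batch, last_actions: Batch, actions: Batch) -> List[List[List[float]]]:
    """Concatenate per-step features into one (batch, time, features) input.

    Assumes the three arrays share batch and time dimensions; the
    [state, last_action, action] layout is an illustrative choice.
    """
    if not (len(states) == len(last_actions) == len(actions)):
        raise ValueError("batch sizes differ")
    out = []
    for s_ep, la_ep, a_ep in zip(states, last_actions, actions):
        if not (len(s_ep) == len(la_ep) == len(a_ep)):
            raise ValueError("sequence lengths differ")
        out.append([list(s) + list(la) + list(a) for s, la, a in zip(s_ep, la_ep, a_ep)])
    return out


# one episode, two steps, state_dim=2, action_dim=1
x = critic_inputs([[[0.1, 0.2], [0.3, 0.4]]], [[[0.0], [1.0]]], [[[1.0], [2.0]]])
print(len(x), len(x[0]), len(x[0][0]))  # 1 2 4
```

Under this assumed layout, the kvaser dimensions from the parameter table (state_dim 90, action_dim 68) would give 90 + 68 + 68 = 226 features per step.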