SeqCritic
Sequential Critic class with LSTM layers for RDPG agent
SeqCritic
SeqCritic (state_dim:int=0, action_dim:int=0, hidden_dim:int=0, n_layers:int=0, batch_size:int=0, padding_value:float=0.0, tau:float=0.0, lr:float=0.0, ckpt_dir:pathlib.Path=Path('.'), ckpt_interval:int=0, logger:Optional[logging.Logger]=None, dict_logger:Optional[dict]=None)
*Sequential Critic network for the RDPG algorithm.
Attributes:
state_dim (int): Dimension of the state space.
action_dim (int): Dimension of the action space.
hidden_dim (int): Dimension of the hidden layer.
n_layers (int): Number of layers in the network.
batch_size (int): Batch size for the network.
padding_value (float): Value to pad the input with.
tau (float): Soft update parameter.
lr (float): Learning rate for the network.
ckpt_dir (pathlib.Path): Directory to restore the checkpoint from.
ckpt_interval (int): Interval to save the checkpoint.
logger (logging.Logger): Logger for the class.
dict_logger (dict): Dictionary to log the class.*
Parameter | Type | Default | Details |
---|---|---|---|
state_dim | int | 0 | dimension of the state space, 600 for cloud, 90 for kvaser |
action_dim | int | 0 | dimension of the action space, 68 for both cloud and kvaser |
hidden_dim | int | 0 | dimension of the hidden layer |
n_layers | int | 0 | number of LSTM layers |
batch_size | int | 0 | batch size for the network |
padding_value | float | 0.0 | value to pad the input with |
tau | float | 0.0 | soft update parameter \(\tau\) |
lr | float | 0.0 | learning rate for the network |
ckpt_dir | Path | . | directory to restore the checkpoint from |
ckpt_interval | int | 0 | interval to save the checkpoint |
logger | Optional[logging.Logger] | None | logger for the class |
dict_logger | Optional[dict] | None | format specs to log the class |
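The `padding_value` parameter implies that variable-length episodes are padded to a common length before they reach the LSTM layers. The sketch below shows one way such padding could be done with NumPy. The helper name `pad_sequences`, the returned mask, and the example value `-10000.0` are illustrative assumptions, not part of the `SeqCritic` API.

```python
import numpy as np


def pad_sequences(episodes, padding_value=0.0):
    """Pad a list of (T_i, feature_dim) arrays into one (batch, T_max, feature_dim) array.

    Hypothetical helper for illustration; SeqCritic may pad its inputs differently.
    """
    max_len = max(ep.shape[0] for ep in episodes)
    feature_dim = episodes[0].shape[1]
    # Start from an array filled with the padding value, then copy real steps in.
    batch = np.full((len(episodes), max_len, feature_dim), padding_value, dtype=np.float32)
    # The mask marks which time steps hold real data (True) versus padding (False).
    mask = np.zeros((len(episodes), max_len), dtype=bool)
    for i, ep in enumerate(episodes):
        batch[i, : ep.shape[0]] = ep
        mask[i, : ep.shape[0]] = True
    return batch, mask


# Two episodes of length 3 and 5 with 4 features each.
episodes = [np.ones((3, 4), dtype=np.float32), np.ones((5, 4), dtype=np.float32)]
padded, mask = pad_sequences(episodes, padding_value=-10000.0)
```

Choosing a padding value that cannot occur in real observations makes it possible for a masking step to tell padded steps apart from real ones.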
SeqCritic.__init__
SeqCritic.__init__ (state_dim:int=0, action_dim:int=0, hidden_dim:int=0, n_layers:int=0, batch_size:int=0, padding_value:float=0.0, tau:float=0.0, lr:float=0.0, ckpt_dir:pathlib.Path=Path('.'), ckpt_interval:int=0, logger:Optional[logging.Logger]=None, dict_logger:Optional[dict]=None)
*Initialize the critic network.
Restore the checkpoint from the provided directory if it exists; otherwise initialize a new network.
Args:
state_dim (int): Dimension of the state space.
action_dim (int): Dimension of the action space.
hidden_dim (int): Dimension of the hidden layer.
lr (float): Learning rate for the network.
ckpt_dir (pathlib.Path): Directory to restore the checkpoint from.*
Parameter | Type | Default | Details |
---|---|---|---|
state_dim | int | 0 | dimension of the state space, 600 for cloud, 90 for kvaser |
action_dim | int | 0 | dimension of the action space, 68 for both cloud and kvaser |
hidden_dim | int | 0 | dimension of the hidden layer |
n_layers | int | 0 | number of LSTM layers |
batch_size | int | 0 | batch size for the network |
padding_value | float | 0.0 | value to pad the input with |
tau | float | 0.0 | soft update parameter \(\tau\) |
lr | float | 0.0 | learning rate for the network |
ckpt_dir | Path | . | directory to restore the checkpoint from |
ckpt_interval | int | 0 | interval to save the checkpoint |
logger | Optional[logging.Logger] | None | logger for the class |
dict_logger | Optional[dict] | None | format specs to log the class |
SeqCritic.clone_weights
SeqCritic.clone_weights (moving_net)
Clone the weights of moving_net into this network.
SeqCritic.soft_update
SeqCritic.soft_update (moving_net)
Soft-update the target critic weights toward those of moving_net, using the soft update parameter \(\tau\).
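A soft update in DDPG-style algorithms is commonly a Polyak average of the target weights and the moving (online) weights. The sketch below illustrates that rule on plain NumPy arrays. It assumes the convention `target = tau * moving + (1 - tau) * target`; the source does not state which convention `SeqCritic.soft_update` uses, and the free function here is hypothetical.

```python
import numpy as np


def soft_update(target_weights, moving_weights, tau):
    """Return the Polyak-averaged weights: tau * moving + (1 - tau) * target.

    Illustrative only; operates on lists of arrays rather than on a network object.
    """
    return [
        tau * moving + (1.0 - tau) * target
        for target, moving in zip(target_weights, moving_weights)
    ]


# Toy weights: one kernel and one bias per network.
target = [np.zeros((2, 2)), np.zeros(2)]
moving = [np.ones((2, 2)), np.ones(2)]

# A small tau moves the target only slightly toward the moving network.
updated = soft_update(target, moving, tau=0.005)

# With tau = 1 the rule reduces to a hard copy of the moving weights.
cloned = soft_update(target, moving, tau=1.0)
```

Under this convention, a small `tau` keeps the target network changing slowly, which is the usual motivation for soft updates.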
SeqCritic.save_ckpt
SeqCritic.save_ckpt ()
Save the checkpoint.
SeqCritic.evaluate_q
SeqCritic.evaluate_q (states, last_actions, actions)
*Evaluate the action value given the state and action
Args:
states (np.array): States in a minibatch
last_actions (np.array): Previous actions in a minibatch
actions (np.array): Actions in a minibatch
Returns:
np.array: Q-value*
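The source does not say how `last_actions` relates to `actions`. One plausible reading is that `last_actions` holds the action taken one time step earlier. The sketch below builds such an array by shifting along the time axis. The helper `shift_actions`, the `(batch, T, action_dim)` layout, and the zero fill for the first step are all assumptions made for illustration.

```python
import numpy as np


def shift_actions(actions, fill_value=0.0):
    """Build last_actions so that last_actions[:, t] == actions[:, t - 1].

    actions is assumed to have shape (batch, T, action_dim). The first time
    step has no predecessor, so it is filled with fill_value.
    """
    last_actions = np.full_like(actions, fill_value)
    last_actions[:, 1:] = actions[:, :-1]
    return last_actions


# One episode, four time steps, three action dimensions.
actions = np.arange(12, dtype=np.float32).reshape(1, 4, 3)
last_actions = shift_actions(actions)
```

Arrays prepared this way share the shape of `actions`, so they can be passed side by side in a call such as `evaluate_q(states, last_actions, actions)`.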