
Sequential Critic class with LSTM layers for the RDPG agent



 SeqCritic (state_dim:int=0, action_dim:int=0, hidden_dim:int=0,
            n_layers:int=0, batch_size:int=0, padding_value:float=0.0,
            tau:float=0.0, lr:float=0.0, ckpt_dir:pathlib.Path=Path('.'),
            ckpt_interval:int=0, logger:Optional[logging.Logger]=None,

*Sequential Critic network for the RDPG algorithm.


state_dim (int): Dimension of the state space.
action_dim (int): Dimension of the action space.
hidden_dim (int): Dimension of the hidden layer.
n_layers (int): Number of LSTM layers in the network.
batch_size (int): Batch size for the network.
padding_value (float): Value to pad the input with.
tau (float): Soft update parameter.
lr (float): Learning rate for the network.
ckpt_dir (pathlib.Path): Directory to restore the checkpoint from.
ckpt_interval (int): Interval to save the checkpoint.
logger (logging.Logger): Logger for the class.
dict_logger (dict): Format specs for logging the class.*
| | Type | Default | Details |
|---|---|---|---|
| state_dim | int | 0 | dimension of the state space, 600 for cloud, 90 for kvaser |
| action_dim | int | 0 | dimension of the action space, 68 for both cloud and kvaser |
| hidden_dim | int | 0 | dimension of the hidden layer |
| n_layers | int | 0 | number of LSTM layers |
| batch_size | int | 0 | batch size for the network |
| padding_value | float | 0.0 | value to pad the input with |
| tau | float | 0.0 | soft update parameter \(\tau\) |
| lr | float | 0.0 | learning rate for the network |
| ckpt_dir | Path | . | directory to restore the checkpoint from |
| ckpt_interval | int | 0 | interval to save the checkpoint |
| logger | Optional | None | logger for the class |
| dict_logger | Optional | None | format specs to log the class |



 SeqCritic.__init__ (state_dim:int=0, action_dim:int=0, hidden_dim:int=0,
                     n_layers:int=0, batch_size:int=0,
                     padding_value:float=0.0, tau:float=0.0, lr:float=0.0,
                     ckpt_dir:pathlib.Path=Path('.'), ckpt_interval:int=0,

*Initialize the critic network.

Restore the checkpoint from the provided directory if it exists; initialize otherwise.

Args:
state_dim (int): Dimension of the state space.
action_dim (int): Dimension of the action space.
hidden_dim (int): Dimension of the hidden layer.
lr (float): Learning rate for the network.
ckpt_dir (pathlib.Path): Directory to restore the checkpoint from.*

| | Type | Default | Details |
|---|---|---|---|
| state_dim | int | 0 | dimension of the state space, 600 for cloud, 90 for kvaser |
| action_dim | int | 0 | dimension of the action space, 68 for both cloud and kvaser |
| hidden_dim | int | 0 | dimension of the hidden layer |
| n_layers | int | 0 | number of LSTM layers |
| batch_size | int | 0 | batch size for the network |
| padding_value | float | 0.0 | value to pad the input with |
| tau | float | 0.0 | soft update parameter \(\tau\) |
| lr | float | 0.0 | learning rate for the network |
| ckpt_dir | Path | . | directory to restore the checkpoint from |
| ckpt_interval | int | 0 | interval to save the checkpoint |
| logger | Optional | None | logger for the class |
| dict_logger | Optional | None | format specs to log the class |



 SeqCritic.clone_weights (moving_net)

Clone weights from a model to another model.
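
As an illustration only, a hard copy of weights can be sketched with NumPy arrays standing in for network weights. The helper below borrows the method's name, but the list-of-arrays representation and the copy direction (out of the moving network into a fresh target list) are assumptions, not the class's actual implementation.

```python
import numpy as np


def clone_weights(moving_weights):
    """Return an independent copy of every weight array (hypothetical helper)."""
    return [np.array(w, copy=True) for w in moving_weights]


moving = [np.ones((2, 3)), np.zeros(3)]
target = clone_weights(moving)

# The copy is independent: changing the source afterwards leaves the clone untouched.
moving[0][0, 0] = 5.0
```

Copying rather than sharing the arrays matters here, because a target network that aliases the moving network's weights would change with every training step.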



 SeqCritic.soft_update (moving_net)

Update the target critic weights.
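
The `tau` parameter suggests a Polyak-style soft update. A minimal sketch, assuming the common DDPG convention `target = tau * moving + (1 - tau) * target` and using plain NumPy arrays in place of network weights (the function below is hypothetical, not the method's actual body):

```python
import numpy as np


def soft_update(target_weights, moving_weights, tau):
    """Blend each moving-network array into the matching target array."""
    return [
        tau * moving + (1.0 - tau) * target
        for target, moving in zip(target_weights, moving_weights)
    ]


target = [np.zeros((2, 2)), np.zeros(2)]
moving = [np.ones((2, 2)), np.ones(2)]

# One update with tau=0.1 moves the target 10% of the way towards the moving net.
target = soft_update(target, moving, tau=0.1)
```

Under this convention `tau=1.0` reduces to a hard copy, while small values make the target network track the moving network slowly.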



 SeqCritic.save_ckpt ()

Save the checkpoint.



 SeqCritic.evaluate_q (states, last_actions, actions)

*Evaluate the action value given the states and actions.

Args:
states (np.array): States in a minibatch
last_actions (np.array): Previous actions in a minibatch
actions (np.array): Actions in a minibatch

Returns:
np.array: Q-values*
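
Because the inputs are sequences, episodes of different lengths need to be padded to a common length before they can be batched. The sketch below shows one way to do this with `padding_value` and a boolean mask marking the real time steps. The helper `pad_batch`, the toy dimensions, and the concatenation of the three inputs along the feature axis are assumptions for illustration, not confirmed behavior of the class.

```python
import numpy as np


def pad_batch(sequences, padding_value=0.0):
    """Pad [T_i, dim] arrays to one [batch, T_max, dim] array and a validity mask."""
    max_len = max(len(s) for s in sequences)
    dim = sequences[0].shape[1]
    batch = np.full((len(sequences), max_len, dim), padding_value, dtype=np.float32)
    mask = np.zeros((len(sequences), max_len), dtype=bool)
    for i, seq in enumerate(sequences):
        batch[i, : len(seq)] = seq
        mask[i, : len(seq)] = True  # True marks real (unpadded) time steps
    return batch, mask


# Two episodes of length 3 and 5 with toy dimensions (state_dim=4, action_dim=2).
lengths = [3, 5]
states, mask = pad_batch([np.ones((t, 4)) for t in lengths], padding_value=-1.0)
last_actions, _ = pad_batch([np.zeros((t, 2)) for t in lengths], padding_value=-1.0)
actions, _ = pad_batch([np.ones((t, 2)) for t in lengths], padding_value=-1.0)

# Assumed critic input: the three tensors joined along the feature axis.
critic_input = np.concatenate([states, last_actions, actions], axis=-1)
```

Building the mask from the sequence lengths, rather than by comparing against `padding_value`, avoids misclassifying real observations that happen to equal the padding value.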