RDPG
RDPG
RDPG (actor_net:Optional[tspace.agent.rdpg.seq_actor.SeqActor]=None,
      critic_net:Optional[tspace.agent.rdpg.seq_critic.SeqCritic]=None,
      target_actor_net:Optional[tspace.agent.rdpg.seq_actor.SeqActor]=None,
      target_critic_net:Optional[tspace.agent.rdpg.seq_critic.SeqCritic]=None,
      _ckpt_actor_dir:Optional[pathlib.Path]=None,
      _ckpt_critic_dir:Optional[pathlib.Path]=None,
      _truck:tspace.config.vehicles.Truck,
      _driver:tspace.config.drivers.Driver,
      _resume:bool, _coll_type:str,
      _hyper_param:Union[tspace.agent.utils.hyperparams.HyperParamDDPG, tspace.agent.utils.hyperparams.HyperParamRDPG, tspace.agent.utils.hyperparams.HyperParamIDQL],
      _pool_key:str, _data_folder:str, _infer_mode:bool,
      _buffer:Union[tspace.storage.buffer.mongo.MongoBuffer, tspace.storage.buffer.dask.DaskBuffer, NoneType]=None,
      _episode_start_dt:Optional[pandas._libs.tslibs.timestamps.Timestamp]=None,
      _observation_meta:Union[tspace.data.core.ObservationMetaCloud, tspace.data.core.ObservationMetaECU, NoneType]=None,
      _torque_table_row_names:Optional[list[str]]=None,
      _observations:Optional[list[pandas.core.series.Series]]=None,
      _epi_no:Optional[int]=None,
      logger:Optional[logging.Logger]=None,
      dict_logger:Optional[dict]=None)
*RDPG agent for VEOS.

Abstracts:

data interface:
- pool in MongoDB
- buffer in memory (numpy array)

model interface:
- actor network
- critic network

Attributes:
- actor_net: actor network
- critic_net: critic network
- target_actor_net: target actor network
- target_critic_net: target critic network
- _ckpt_actor_dir: checkpoint directory for the actor
- _ckpt_critic_dir: checkpoint directory for the critic*
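The four network attributes come in moving/target pairs. A minimal sketch of that pairing, using a generic Keras LSTM stand-in rather than the actual SeqActor/SeqCritic classes, might look like this:

```python
import tensorflow as tf

# Generic stand-in for a sequence network; the real SeqActor/SeqCritic
# architectures are not shown on this page and will differ.
def make_seq_net(obs_dim: int, out_dim: int, hidden: int = 32) -> tf.keras.Model:
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(None, obs_dim)),      # [B, T, D] sequences
        tf.keras.layers.LSTM(hidden, return_sequences=True),
        tf.keras.layers.Dense(out_dim),
    ])

actor_net = make_seq_net(obs_dim=3, out_dim=4)             # moving actor
target_actor_net = tf.keras.models.clone_model(actor_net)  # target actor
target_actor_net.set_weights(actor_net.get_weights())      # start in sync with the moving net
```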
RDPG.__post_init__
RDPG.__post_init__ ()
*initialize the RDPG agent.

Args:
- truck.ObservationNumber (int): dimension of the state space.
- padding_value (float): value to pad the state with; an impossible value for observation, action or reward.*
RDPG.__repr__
RDPG.__repr__ ()
Return repr(self).
RDPG.__str__
RDPG.__str__ ()
Return str(self).
RDPG.__hash__
RDPG.__hash__ ()
Return hash(self).
RDPG.touch_gpu
RDPG.touch_gpu ()
touch the GPU to avoid the first-time delay
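One way such a warm-up can be realized (purely illustrative; the body of touch_gpu is not shown here) is a single throwaway forward pass, so that device allocation and kernel setup are paid before the first real inference:

```python
import numpy as np
import tensorflow as tf

# Throwaway forward pass: the first call to a Keras model triggers graph
# building and device allocation, so doing it once up front hides that delay.
warmup_net = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(3,))])
_ = warmup_net(np.zeros((1, 3), dtype=np.float32))
```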
RDPG.init_checkpoint
RDPG.init_checkpoint ()
create or restore from checkpoint
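A generic create-or-restore pattern with tf.train.Checkpoint; the concrete objects tracked by RDPG.init_checkpoint and its directory layout are assumptions here:

```python
import tensorflow as tf

# Track a network and its optimizer; restore the newest checkpoint if one
# exists, otherwise write a fresh one.
net = tf.keras.Sequential([tf.keras.layers.Dense(4, input_shape=(3,))])
optimizer = tf.keras.optimizers.Adam()
ckpt = tf.train.Checkpoint(net=net, optimizer=optimizer)
manager = tf.train.CheckpointManager(ckpt, directory="./ckpt/actor", max_to_keep=5)
if manager.latest_checkpoint:
    ckpt.restore(manager.latest_checkpoint)  # resume training/inference state
else:
    manager.save()                           # create the initial checkpoint
```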
RDPG.actor_predict
RDPG.actor_predict (state:pandas.core.series.Series)
*get the action for a single observation by inference.

The batch size cannot be 1: for the LSTM to be stateful, the batch size must match the training scheme.*
|  | Type | Details |
|---|---|---|
| state | Series | state sequence of the current episode |
| Returns | ndarray | action sequence of the current episode |
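The batch-size remark above can be illustrated with a plain stateful Keras LSTM: the batch dimension is fixed when the network is built, so a single observation has to be tiled up to the training batch size before inference. This is an illustrative workaround, not necessarily how RDPG.actor_predict handles it:

```python
import numpy as np
import tensorflow as tf

# A stateful LSTM is built with a fixed batch dimension, so inference with
# batch 1 would not fit a network trained with, say, batch 4.
batch, timesteps, obs_dim = 4, 1, 3            # illustrative values
stateful_lstm = tf.keras.Sequential([
    tf.keras.layers.LSTM(8, stateful=True, return_sequences=True,
                         batch_input_shape=(batch, timesteps, obs_dim)),
    tf.keras.layers.Dense(2),
])
single_obs = np.zeros((1, timesteps, obs_dim), dtype=np.float32)
tiled = np.tile(single_obs, (batch, 1, 1))     # tile up to the training batch size
action_seq = stateful_lstm(tiled)[0]           # keep only the first row's action
```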
RDPG.actor_predict_step
RDPG.actor_predict_step (states:tensorflow.python.framework.tensor.Tensor, last_actions:tensorflow.python.framework.tensor.Tensor)
*evaluate the actor given a single observation.

Batch size is 1.*
|  | Type | Details |
|---|---|---|
| states | Tensor | state, dimension: [B,T,D] |
| last_actions | Tensor | last action, dimension: [B,T,D] |
| Returns | Tensor |  |
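One common way to feed both the state and the last action to a recurrent actor is to concatenate them along the feature axis; this is an assumption about SeqActor, used only to make the [B,T,D] shapes above concrete:

```python
import tensorflow as tf

# Concatenate observation and previous action on the feature axis, then run
# them through a recurrent policy; dimensions are illustrative.
B, T, obs_dim, act_dim = 1, 1, 3, 2
states = tf.zeros((B, T, obs_dim))
last_actions = tf.zeros((B, T, act_dim))
actor = tf.keras.Sequential([
    tf.keras.layers.LSTM(8, return_sequences=True, input_shape=(None, obs_dim + act_dim)),
    tf.keras.layers.Dense(act_dim, activation="tanh"),
])
actions = actor(tf.concat([states, last_actions], axis=-1))  # [B, T, act_dim]
```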
RDPG.train
RDPG.train ()
*train the actor and critic moving networks with truncated backpropagation through time (TBPTT), with k1 = k2 = self.hyperparam.tbptt_k1 (Keras).

Return:
- tuple: (actor_loss, critic_loss)*
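A minimal sketch of the truncation mechanics with k1 = k2 = k: chunk the episode, carry the hidden state across chunks, and backpropagate only within each chunk. The loss below is a stand-in, not the actual actor/critic objectives of RDPG.train:

```python
import tensorflow as tf

# TBPTT with equal forward/backward truncation lengths.
k, obs_dim, act_dim = 4, 3, 2
rnn = tf.keras.layers.RNN(tf.keras.layers.LSTMCell(8),
                          return_sequences=True, return_state=True)
head = tf.keras.layers.Dense(act_dim)
optimizer = tf.keras.optimizers.Adam(1e-3)

episode = tf.random.normal((1, 20, obs_dim))   # one padded episode [B, T, D]
targets = tf.random.normal((1, 20, act_dim))   # stand-in supervision signal
state = None
for start in range(0, episode.shape[1], k):
    chunk = episode[:, start:start + k, :]
    with tf.GradientTape() as tape:
        out, h, c = rnn(chunk, initial_state=state)
        loss = tf.reduce_mean(tf.square(head(out) - targets[:, start:start + k, :]))
    variables = rnn.trainable_variables + head.trainable_variables
    optimizer.apply_gradients(zip(tape.gradient(loss, variables), variables))
    state = [tf.stop_gradient(h), tf.stop_gradient(c)]  # truncate at the chunk boundary
```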
RDPG.train_step
RDPG.train_step (s_n_t, a_n_t, r_n_t, ns_n_t)
train the critic for one step using BPTT
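A generic one-step critic update over a whole sequence could look like the sketch below: TD targets built from a target network, then a mean-squared TD error regressed through time. Shapes, gamma, and the stand-in target action are illustrative, not read from the RDPG hyperparameters:

```python
import tensorflow as tf

B, T, obs_dim, act_dim, gamma = 2, 5, 3, 2, 0.99

def seq_q_net():
    # Sequence critic: Q value for every (state, action) step.
    return tf.keras.Sequential([
        tf.keras.layers.LSTM(8, return_sequences=True, input_shape=(None, obs_dim + act_dim)),
        tf.keras.layers.Dense(1),
    ])

critic, target_critic = seq_q_net(), seq_q_net()
target_critic.set_weights(critic.get_weights())
optimizer = tf.keras.optimizers.Adam(1e-3)

s_n_t = tf.random.normal((B, T, obs_dim))    # states
a_n_t = tf.random.normal((B, T, act_dim))    # actions
r_n_t = tf.random.normal((B, T, 1))          # rewards
ns_n_t = tf.random.normal((B, T, obs_dim))   # next states
next_a = tf.zeros((B, T, act_dim))           # stand-in for target_actor(ns_n_t)

with tf.GradientTape() as tape:
    target_q = r_n_t + gamma * target_critic(tf.concat([ns_n_t, next_a], axis=-1))
    td_error = target_q - critic(tf.concat([s_n_t, a_n_t], axis=-1))
    critic_loss = tf.reduce_mean(tf.square(td_error))
grads = tape.gradient(critic_loss, critic.trainable_variables)
optimizer.apply_gradients(zip(grads, critic.trainable_variables))
```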
RDPG.end_episode
RDPG.end_episode ()
end the episode by depositing the episode and resetting the states of the actor and critic networks
RDPG.get_losses
RDPG.get_losses ()
get the losses of the actor and critic networks
RDPG.notrain
RDPG.notrain ()
*purely evaluate the actor and critic networks, returning the losses without training.

Return:
- tuple: (actor_loss, critic_loss)*
RDPG.soft_update_target
RDPG.soft_update_target ()
*update the target networks with a tiny tau value, typically 0.001.

Done after each batch, this slowly updates the targets by Polyak averaging.*
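Polyak averaging itself is a one-line update per weight, target <- tau * moving + (1 - tau) * target. A self-contained sketch with placeholder Keras models standing in for the actor/critic pairs:

```python
import tensorflow as tf

tau = 0.001                                   # small mixing coefficient, as above
net = tf.keras.Sequential([tf.keras.layers.Dense(4, input_shape=(3,))])
target_net = tf.keras.models.clone_model(net)
target_net.set_weights(net.get_weights())

# Slowly pull each target weight toward its moving counterpart.
for w_target, w in zip(target_net.trainable_variables, net.trainable_variables):
    w_target.assign(tau * w + (1.0 - tau) * w_target)
```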
RDPG.save_ckpt
RDPG.save_ckpt ()
save the checkpoints of the actor and critic networks