RDPG

RDPG class

source

RDPG

 RDPG (actor_net:Optional[tspace.agent.rdpg.seq_actor.SeqActor]=None,
       critic_net:Optional[tspace.agent.rdpg.seq_critic.SeqCritic]=None,
       target_actor_net:Optional[tspace.agent.rdpg.seq_actor.SeqActor]=None,
       target_critic_net:Optional[tspace.agent.rdpg.seq_critic.SeqCritic]=None,
       _ckpt_actor_dir:Optional[pathlib.Path]=None,
       _ckpt_critic_dir:Optional[pathlib.Path]=None,
       _truck:tspace.config.vehicles.Truck,
       _driver:tspace.config.drivers.Driver, _resume:bool, _coll_type:str,
       _hyper_param:Union[tspace.agent.utils.hyperparams.HyperParamDDPG,
                          tspace.agent.utils.hyperparams.HyperParamRDPG,
                          tspace.agent.utils.hyperparams.HyperParamIDQL],
       _pool_key:str, _data_folder:str, _infer_mode:bool,
       _buffer:Union[tspace.storage.buffer.mongo.MongoBuffer,
                     tspace.storage.buffer.dask.DaskBuffer,NoneType]=None,
       _episode_start_dt:Optional[pandas._libs.tslibs.timestamps.Timestamp]=None,
       _observation_meta:Union[tspace.data.core.ObservationMetaCloud,
                               tspace.data.core.ObservationMetaECU,NoneType]=None,
       _torque_table_row_names:Optional[list[str]]=None,
       _observations:Optional[list[pandas.core.series.Series]]=None,
       _epi_no:Optional[int]=None, logger:Optional[logging.Logger]=None,
       dict_logger:Optional[dict]=None)

*RDPG agent for VEOS.

Abstractions:

data interface:
    - pool in MongoDB
    - buffer in memory (numpy array)
model interface:
    - actor network
    - critic network

Attributes:

actor_net: actor network
critic_net: critic network
target_actor_net: target actor network
target_critic_net: target critic network
_ckpt_actor_dir: checkpoint directory for actor
_ckpt_critic_dir: checkpoint directory for critic*

source

RDPG.__post_init__

 RDPG.__post_init__ ()

*initialize the RDPG agent.

args:
    truck.ObservationNumber (int): dimension of the state space.
    padding_value (float): value to pad the state with; an impossible value for observation, action or reward.*
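
A minimal sketch of the padding idea, not the tspace implementation: episodes of unequal length are padded with an "impossible" value so a masking layer can skip the padded steps. The value -10000.0 and the toy episodes are illustrative assumptions; the real value is supplied via the padding_value argument.

```python
import tensorflow as tf

# Two episodes of unequal length (3 steps and 1 step, 2 features each).
episodes = [
    [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]],
    [[0.7, 0.8]],
]
padding_value = -10000.0  # illustrative "impossible" value

# Pad to a common length so the batch is a dense [B, T, D] array.
batch = tf.keras.preprocessing.sequence.pad_sequences(
    episodes, padding="post", value=padding_value, dtype="float32"
)  # shape [2, 3, 2]

# A Masking layer marks the padded steps so downstream RNNs ignore them.
masked = tf.keras.layers.Masking(mask_value=padding_value)(batch)
```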


source

RDPG.__repr__

 RDPG.__repr__ ()

Return repr(self).


source

RDPG.__str__

 RDPG.__str__ ()

Return str(self).


source

RDPG.__hash__

 RDPG.__hash__ ()

Return hash(self).


source

RDPG.touch_gpu

 RDPG.touch_gpu ()

touch the GPU to avoid the first-time delay
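
The first call to a freshly built network pays one-time graph-tracing and GPU memory-allocation costs. A minimal warm-up sketch, assuming a toy Keras model with illustrative shapes:

```python
import tensorflow as tf

# Warm-up: one dummy forward pass forces kernel compilation and GPU memory
# allocation up front, so the first real inference call is not delayed.
# The network and the shape [B=1, T=10, D=8] are illustrative assumptions.
net = tf.keras.Sequential([tf.keras.layers.LSTM(32), tf.keras.layers.Dense(4)])
_ = net(tf.zeros((1, 10, 8)))
```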


source

RDPG.init_checkpoint

 RDPG.init_checkpoint ()

create or restore from checkpoint
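
A minimal create-or-restore sketch with tf.train.CheckpointManager; the toy network, directory, and max_to_keep are illustrative assumptions, not the tspace defaults:

```python
import tensorflow as tf

net = tf.keras.Sequential([tf.keras.layers.Dense(4)])
ckpt = tf.train.Checkpoint(net=net)
manager = tf.train.CheckpointManager(ckpt, "/tmp/rdpg_actor_ckpt", max_to_keep=10)

ckpt.restore(manager.latest_checkpoint)  # no-op when no checkpoint exists yet
if manager.latest_checkpoint:
    print(f"restored from {manager.latest_checkpoint}")
else:
    print("initializing from scratch")
```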


source

RDPG.actor_predict

 RDPG.actor_predict (state:pandas.core.series.Series)

*get the action given a single observation by inference

batch size cannot be 1. For LSTM to be stateful, batch size must match the training scheme.*

|  | Type | Details |
|---|---|---|
| state | Series | state sequence of the current episode |
| **Returns** | **ndarray** | **action sequence of the current episode** |
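
The batch-size constraint comes from stateful LSTMs: the recurrent state tensor is allocated with a fixed batch dimension. A minimal sketch of that constraint, assuming a training batch size of 4 and a feature dimension of 8 (both illustrative):

```python
import tensorflow as tf

# A stateful LSTM bakes the batch size into its recurrent state, so
# inference must use the same batch size as training (4 here).
inputs = tf.keras.Input(shape=(None, 8), batch_size=4)
outputs = tf.keras.layers.LSTM(16, stateful=True, return_sequences=True)(inputs)
model = tf.keras.Model(inputs, outputs)

ok = model(tf.zeros((4, 5, 8)))   # batch of 4 matches the state shape
# model(tf.zeros((1, 5, 8)))      # would fail: state was built for batch 4
```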

source

RDPG.actor_predict_step

 RDPG.actor_predict_step (states:tensorflow.python.framework.tensor.Tensor,
                          last_actions:tensorflow.python.framework.tensor.Tensor)

*evaluate the actor given a single observation.

batch size is 1.*

|  | Type | Details |
|---|---|---|
| states | Tensor | state, dimension: [B,T,D] |
| last_actions | Tensor | last action, dimension: [B,T,D] |
| **Returns** | **Tensor** |  |
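
An illustrative shape sketch for a single-episode (B = 1) call; the state and action dimensions Ds = 90 and Da = 68 are assumptions, not the tspace values:

```python
import numpy as np
import tensorflow as tf

# One episode of T = 7 steps, expanded to a singleton batch [1, T, D].
episode_states = np.random.rand(7, 90).astype(np.float32)   # [T, Ds]
episode_actions = np.random.rand(7, 68).astype(np.float32)  # [T, Da]

states = tf.expand_dims(episode_states, axis=0)        # [1, 7, 90]
last_actions = tf.expand_dims(episode_actions, axis=0) # [1, 7, 68]
```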

source

RDPG.train

 RDPG.train ()

*train the actor and critic moving networks with truncated backpropagation through time (TBPTT),

with k1 = k2 = self.hyperparam.tbptt_k1 (keras)

Return: tuple: (actor_loss, critic_loss)*
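
A minimal TBPTT sketch with k1 = k2, assuming a toy Keras recurrent network: the sequence is processed in chunks of length k1, the hidden state is carried across chunks, and the gradient is cut at each chunk boundary. Everything here (shapes, k1, the regression loss) is an illustrative assumption:

```python
import tensorflow as tf

k1 = 20  # truncation window, k1 == k2
lstm = tf.keras.layers.LSTM(16, return_sequences=True, return_state=True)
head = tf.keras.layers.Dense(1)
opt = tf.keras.optimizers.Adam(1e-3)

seq = tf.random.normal((4, 100, 8))     # [B, T, D]
target = tf.random.normal((4, 100, 1))

state = None
for t0 in range(0, 100, k1):
    chunk, y = seq[:, t0:t0 + k1], target[:, t0:t0 + k1]
    with tf.GradientTape() as tape:
        out, h, c = lstm(chunk, initial_state=state)
        loss = tf.reduce_mean(tf.square(head(out) - y))
    weights = lstm.trainable_weights + head.trainable_weights
    opt.apply_gradients(zip(tape.gradient(loss, weights), weights))
    state = [tf.stop_gradient(h), tf.stop_gradient(c)]  # truncate BPTT here
```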


source

RDPG.train_step

 RDPG.train_step (s_n_t, a_n_t, r_n_t, ns_n_t)

train the critic in one step using BPTT
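
A hedged sketch of what one critic BPTT step typically looks like in the RDPG family: build sequence targets y_t = r_t + gamma * Q'(s'_t, mu'(s'_t)) from the target networks, then backpropagate the TD error through the whole sequence. The toy networks and shapes mirror the argument names above but are assumptions, not the tspace implementation:

```python
import tensorflow as tf

def make_q():  # toy recurrent Q-network over [state; action] sequences
    return tf.keras.Sequential([tf.keras.layers.LSTM(16, return_sequences=True),
                                tf.keras.layers.Dense(1)])

def make_mu():  # toy recurrent policy network
    return tf.keras.Sequential([tf.keras.layers.LSTM(16, return_sequences=True),
                                tf.keras.layers.Dense(3, activation="tanh")])

critic, target_critic, target_actor = make_q(), make_q(), make_mu()
opt, gamma = tf.keras.optimizers.Adam(1e-3), 0.99

s_n_t = tf.random.normal((4, 10, 5))   # states      [B, T, Ds]
a_n_t = tf.random.normal((4, 10, 3))   # actions     [B, T, Da]
r_n_t = tf.random.normal((4, 10, 1))   # rewards     [B, T, 1]
ns_n_t = tf.random.normal((4, 10, 5))  # next states [B, T, Ds]

# TD targets from the target networks, no gradient through them.
na = target_actor(ns_n_t)
y = r_n_t + gamma * target_critic(tf.concat([ns_n_t, na], -1))

with tf.GradientTape() as tape:
    q = critic(tf.concat([s_n_t, a_n_t], -1))
    loss = tf.reduce_mean(tf.square(y - q))  # TD error over the sequence
grads = tape.gradient(loss, critic.trainable_weights)
opt.apply_gradients(zip(grads, critic.trainable_weights))
```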


source

RDPG.end_episode

 RDPG.end_episode ()

end the episode by depositing the episode and resetting the states of the actor and critic networks


source

RDPG.get_losses

 RDPG.get_losses ()

get the losses of the actor and critic networks


source

RDPG.notrain

 RDPG.notrain ()

*purely evaluate the actor and critic networks to return the losses without training.

Return: tuple: (actor_loss, critic_loss)*


source

RDPG.soft_update_target

 RDPG.soft_update_target ()

*update the target networks with a tiny tau value,

typical value 0.001. Done after each batch; slowly updates the targets by Polyak averaging.*
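
A minimal Polyak-averaging sketch, assuming plain Keras layers and tau = 0.001:

```python
import tensorflow as tf

tau = 0.001
net = tf.keras.layers.Dense(4)
target = tf.keras.layers.Dense(4)
net.build((None, 8))
target.build((None, 8))

# After each batch: move every target weight a tiny step toward the online
# weight, so the targets track the moving networks slowly and stably.
for w_t, w in zip(target.weights, net.weights):
    w_t.assign(tau * w + (1.0 - tau) * w_t)
```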


source

RDPG.save_ckpt

 RDPG.save_ckpt ()

save the checkpoints of the actor and critic networks