DDPG

DDPG (_buffer:Union[tspace.storage.buffer.mongo.MongoBuffer, tspace.storage.buffer.dask.DaskBuffer, NoneType]=None, _actor_model:Optional[keras.src.models.model.Model]=None, _critic_model:Optional[keras.src.models.model.Model]=None, _target_actor_model:Optional[keras.src.models.model.Model]=None, _target_critic_model:Optional[keras.src.models.model.Model]=None, manager_critic:Optional[tensorflow.python.checkpoint.checkpoint_management.CheckpointManager]=None, ckpt_critic:Optional[tensorflow.python.checkpoint.checkpoint.Checkpoint]=None, manager_actor:Optional[tensorflow.python.checkpoint.checkpoint_management.CheckpointManager]=None, ckpt_actor:Optional[tensorflow.python.checkpoint.checkpoint.Checkpoint]=None, actor_saved_model_path:Optional[pathlib.Path]=None, critic_saved_model_path:Optional[pathlib.Path]=None, _truck:tspace.config.vehicles.Truck, _driver:tspace.config.drivers.Driver, _resume:bool, _coll_type:str, _hyper_param:Union[tspace.agent.utils.hyperparams.HyperParamDDPG, tspace.agent.utils.hyperparams.HyperParamRDPG, tspace.agent.utils.hyperparams.HyperParamIDQL], _pool_key:str, _data_folder:str, _infer_mode:bool, _episode_start_dt:Optional[pandas._libs.tslibs.timestamps.Timestamp]=None, _observation_meta:Union[tspace.data.core.ObservationMetaCloud, tspace.data.core.ObservationMetaECU, NoneType]=None, _torque_table_row_names:Optional[list[str]]=None, _observations:Optional[list[pandas.core.series.Series]]=None, _epi_no:Optional[int]=None, logger:Optional[logging.Logger]=None, dict_logger:Optional[dict]=None)
DDPG agent.

Data interface:

- pool in MongoDB
- buffer in memory (numpy array)

Model interface:

- actor network
- critic network
Attributes:
_buffer: Optional[Union[MongoBuffer, DaskBuffer]] = None
buffer for storing data, default is None
_actor_model: Optional[tf.keras.Model] = None
actor network, default is None
_critic_model: Optional[tf.keras.Model] = None
critic network, default is None
_target_actor_model: Optional[tf.keras.Model] = None
target actor network, default is None
_target_critic_model: Optional[tf.keras.Model] = None
target critic network, default is None
manager_critic: Optional[tf.train.CheckpointManager] = None
manager for saving critic network, default is None
ckpt_critic: Optional[tf.train.Checkpoint] = None
checkpoint for saving critic network, default is None
manager_actor: Optional[tf.train.CheckpointManager] = None
manager for saving actor network, default is None
ckpt_actor: Optional[tf.train.Checkpoint] = None
checkpoint for saving actor network, default is None
actor_saved_model_path: Optional[Path] = None
path for saving actor network as saved model, default is None
critic_saved_model_path: Optional[Path] = None
path for saving critic network as saved model, default is None
DDPG.init_checkpoint
DDPG.init_checkpoint ()
Add checkpoint managers for the actor and critic networks.
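Below is a minimal sketch of the checkpoint wiring, assuming the standard `tf.train.Checkpoint`/`tf.train.CheckpointManager` pairing; the directory names are hypothetical, while the attribute names mirror `ckpt_actor`, `manager_actor`, `ckpt_critic` and `manager_critic` from the attribute list above.

```python
import tensorflow as tf

# Sketch: one Checkpoint/CheckpointManager pair per network.
# The ckpt_dir layout is hypothetical.
def init_checkpoint_sketch(agent, ckpt_dir="./checkpoints"):
    agent.ckpt_actor = tf.train.Checkpoint(model=agent._actor_model)
    agent.manager_actor = tf.train.CheckpointManager(
        agent.ckpt_actor, directory=f"{ckpt_dir}/actor", max_to_keep=10
    )
    agent.ckpt_critic = tf.train.Checkpoint(model=agent._critic_model)
    agent.manager_critic = tf.train.CheckpointManager(
        agent.ckpt_critic, directory=f"{ckpt_dir}/critic", max_to_keep=10
    )
    # Resume from the latest checkpoint if one exists; restore(None) is a no-op.
    agent.ckpt_actor.restore(agent.manager_actor.latest_checkpoint)
    agent.ckpt_critic.restore(agent.manager_critic.latest_checkpoint)
```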
DDPG.save_as_saved_model
DDPG.save_as_saved_model ()
save the actor and critic networks as saved model
DDPG.load_saved_model
DDPG.load_saved_model ()
load the actor and critic networks from saved model
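A rough sketch of the SavedModel round trip for `save_as_saved_model` and `load_saved_model`, assuming the generic `tf.saved_model` API; the actual methods may use a Keras-level saver instead. The paths correspond to the `actor_saved_model_path` and `critic_saved_model_path` attributes listed above.

```python
import tensorflow as tf

# Sketch: export both networks to SavedModel directories.
def save_as_saved_model_sketch(agent):
    tf.saved_model.save(agent._actor_model, str(agent.actor_saved_model_path))
    tf.saved_model.save(agent._critic_model, str(agent.critic_saved_model_path))

# Sketch: reload them. tf.saved_model.load returns a generic inference object,
# not a compiled keras.Model, so this is illustrative only.
def load_saved_model_sketch(agent):
    actor = tf.saved_model.load(str(agent.actor_saved_model_path))
    critic = tf.saved_model.load(str(agent.critic_saved_model_path))
    return actor, critic
```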
DDPG.convert_to_tflite
DDPG.convert_to_tflite ()
convert the actor and critic networks to tflite format
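A minimal sketch of the conversion step, assuming the networks were first exported as SavedModel directories; the output file names are hypothetical.

```python
import tensorflow as tf

# Sketch: convert each exported SavedModel directory into a .tflite flatbuffer.
def convert_to_tflite_sketch(agent):
    for name, path in (("actor", agent.actor_saved_model_path),
                       ("critic", agent.critic_saved_model_path)):
        converter = tf.lite.TFLiteConverter.from_saved_model(str(path))
        tflite_model = converter.convert()  # returns the model as bytes
        with open(f"{name}.tflite", "wb") as f:
            f.write(tflite_model)
```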
DDPG.model_summary_print
DDPG.model_summary_print (mdl:keras.src.models.model.Model, file_path:pathlib.Path)
print the model summary to a file
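A small sketch of how a Keras summary can be redirected to a file via `print_fn`, which is likely what a method with this signature does.

```python
from pathlib import Path
import tensorflow as tf

# Sketch: write the Keras summary line by line into a text file.
def model_summary_print_sketch(mdl: tf.keras.Model, file_path: Path):
    with open(file_path, "w") as f:
        mdl.summary(print_fn=lambda line: f.write(line + "\n"))
```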
DDPG.tflite_analytics_print
DDPG.tflite_analytics_print (tflite_file_path:pathlib.Path)
print the tflite model analytics to a file
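A sketch only, assuming the analytics come from `tf.lite.experimental.Analyzer`, which prints to stdout; the report path is hypothetical.

```python
import contextlib
from pathlib import Path
import tensorflow as tf

# Sketch: capture the analyzer's stdout output into a report file.
def tflite_analytics_print_sketch(tflite_file_path: Path, report_path: Path):
    with open(report_path, "w") as f, contextlib.redirect_stdout(f):
        tf.lite.experimental.Analyzer.analyze(model_path=str(tflite_file_path))
```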
DDPG.save_ckpt
DDPG.save_ckpt ()
Save checkpoints for the actor and critic networks
DDPG.update_target
DDPG.update_target (target_weights, weights, tau)
update the target networks
DDPG.soft_update_target
DDPG.soft_update_target ()
update the target networks with Polyak averaging
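Polyak averaging blends the online weights into the target weights, `target <- tau * online + (1 - tau) * target`. A minimal sketch of both methods is below; the default `tau` value is illustrative, the real one comes from the hyperparameters.

```python
import tensorflow as tf

# Sketch of the standard DDPG soft update, applied variable by variable.
@tf.function
def update_target_sketch(target_weights, weights, tau):
    for t, w in zip(target_weights, weights):
        t.assign(w * tau + t * (1.0 - tau))

def soft_update_target_sketch(agent, tau=0.005):  # tau value is illustrative
    update_target_sketch(agent._target_actor_model.variables,
                         agent._actor_model.variables, tau)
    update_target_sketch(agent._target_critic_model.variables,
                         agent._critic_model.variables, tau)
```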
DDPG.get_actor
DDPG.get_actor (num_states:int, num_actions:int, num_actor_inputs1:int=256, num_actor_inputs2:int=256, num_layers:int=2, action_bias:float=0)
Create actor network
| | Type | Default | Details |
|---|---|---|---|
| num_states | int | | number of states, 600 |
| num_actions | int | | number of actions, 68 |
| num_actor_inputs1 | int | 256 | number of inputs to the first dense layer |
| num_actor_inputs2 | int | 256 | number of inputs to the second and subsequent dense layers |
| num_layers | int | 2 | number of layers |
| action_bias | float | 0 | action bias |
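A hypothetical sketch of an actor with this parameterization: a stack of dense layers feeding a `tanh` output of size `num_actions`, shifted by `action_bias`. The layer widths and activations are assumptions, not the repository's exact architecture.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Sketch: dense trunk -> tanh head, shifted by a constant action bias.
def get_actor_sketch(num_states, num_actions, num_actor_inputs1=256,
                     num_actor_inputs2=256, num_layers=2, action_bias=0.0):
    inputs = layers.Input(shape=(num_states,))
    x = layers.Dense(num_actor_inputs1, activation="relu")(inputs)
    for _ in range(num_layers - 1):
        x = layers.Dense(num_actor_inputs2, activation="relu")(x)
    out = layers.Dense(num_actions, activation="tanh")(x)
    outputs = out + action_bias  # bias the action toward a nonzero default
    return tf.keras.Model(inputs, outputs)
```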
DDPG.get_critic
DDPG.get_critic (num_states:int, num_actions:int, num_state_input_dense1:int=16, num_state_input_dense2:int=32, num_action_input_dense:int=32, num_output_dense1:int=256, num_output_dense2:int=256, num_layers:int=2)
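A hypothetical sketch matching the parameter names above: a state branch and an action branch are embedded separately, concatenated, and passed through a joint dense stack ending in a scalar Q value. The exact topology is an assumption.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Sketch: Q(s, a) critic with separate state and action input branches.
def get_critic_sketch(num_states, num_actions, num_state_input_dense1=16,
                      num_state_input_dense2=32, num_action_input_dense=32,
                      num_output_dense1=256, num_output_dense2=256, num_layers=2):
    state_in = layers.Input(shape=(num_states,))
    s = layers.Dense(num_state_input_dense1, activation="relu")(state_in)
    s = layers.Dense(num_state_input_dense2, activation="relu")(s)

    action_in = layers.Input(shape=(num_actions,))
    a = layers.Dense(num_action_input_dense, activation="relu")(action_in)

    x = layers.Concatenate()([s, a])
    x = layers.Dense(num_output_dense1, activation="relu")(x)
    for _ in range(num_layers - 1):
        x = layers.Dense(num_output_dense2, activation="relu")(x)
    q = layers.Dense(1)(x)  # scalar Q value
    return tf.keras.Model([state_in, action_in], q)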
DDPG.policy
DDPG.policy (state:pandas.core.series.Series)
Sample actions with additive OU (Ornstein-Uhlenbeck) noise.

Input: state is a pd.Series of length 3103/4503 (rows × columns); the output is a numpy array. The action output and the noise object are both row vectors of length 2117 (rows × columns), returned as numpy arrays.
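A sketch of the additive exploration noise: an Ornstein-Uhlenbeck process kept as a row vector of the same length as the action. The class name and parameter values below are illustrative, not taken from the repository.

```python
import numpy as np

# Sketch: OU process x_{t+1} = x_t + theta*(mu - x_t)*dt + sigma*sqrt(dt)*N(0, 1).
class OUNoiseSketch:
    def __init__(self, size, mu=0.0, theta=0.15, sigma=0.2, dt=1e-2):
        self.mu, self.theta, self.sigma, self.dt = mu, theta, sigma, dt
        self.x = np.full(size, mu, dtype=np.float32)

    def __call__(self):
        dx = self.theta * (self.mu - self.x) * self.dt \
             + self.sigma * np.sqrt(self.dt) * np.random.randn(*self.x.shape)
        self.x = self.x + dx
        return self.x

# policy() then roughly computes: action = actor(flattened_state) + noise(),
# with both terms as row vectors of length r*c.
```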
DDPG.actor_predict
DDPG.actor_predict (state:pandas.core.series.Series)
Return an action sampled from the actor network without noise. An optional t argument exists only to keep a uniform interface with RDPG.
DDPG.infer_single_sample
DDPG.infer_single_sample (state_flat:tensorflow.python.framework.tensor.Tensor)
Run inference for a single sample.
DDPG.touch_gpu
DDPG.touch_gpu ()
touch gpu to initialize the graph
DDPG.sample_minibatch
DDPG.sample_minibatch ()
Convert batch type from DataFrames to flattened tensors.
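A rough sketch of the DataFrame-to-tensor step, assuming each sampled observation is flattened row-wise and stacked into one float32 tensor per field; the helper name is hypothetical.

```python
import numpy as np
import tensorflow as tf

# Sketch: flatten each sampled frame and stack the batch into one tensor.
def to_flat_tensor(frames):
    return tf.convert_to_tensor(
        np.stack([f.to_numpy().ravel() for f in frames]), dtype=tf.float32
    )
```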
DDPG.train
DDPG.train ()
Train the networks on the batch sampled from the pool.
DDPG.update_with_batch
DDPG.update_with_batch (state_batch, action_batch, reward_batch, next_state_batch, training=True)
Update the networks with the batch sampled from the pool.

Eager execution is turned on by default in TensorFlow 2. Decorating with tf.function allows TensorFlow to build a static graph out of the logic and computations in our function. This provides a large speed-up for blocks of code that contain many small TensorFlow operations such as this one.
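For reference, a sketch of the standard DDPG update under `tf.function`: the critic regresses toward the bootstrapped target and the actor follows the critic's gradient. The discount `gamma` and the `actor_optimizer`/`critic_optimizer` attributes are assumptions from the usual formulation, not names confirmed by this class.

```python
import tensorflow as tf

@tf.function
def update_with_batch_sketch(agent, state_batch, action_batch, reward_batch,
                             next_state_batch, gamma=0.99, training=True):
    # Critic: regress Q(s, a) toward y = r + gamma * Q'(s', mu'(s')).
    with tf.GradientTape() as tape:
        target_actions = agent._target_actor_model(next_state_batch, training=training)
        y = reward_batch + gamma * agent._target_critic_model(
            [next_state_batch, target_actions], training=training)
        critic_value = agent._critic_model([state_batch, action_batch], training=training)
        critic_loss = tf.math.reduce_mean(tf.math.square(y - critic_value))
    critic_grad = tape.gradient(critic_loss, agent._critic_model.trainable_variables)
    agent.critic_optimizer.apply_gradients(  # optimizer attribute is an assumption
        zip(critic_grad, agent._critic_model.trainable_variables))

    # Actor: ascend the critic, i.e. minimise -Q(s, mu(s)).
    with tf.GradientTape() as tape:
        actions = agent._actor_model(state_batch, training=training)
        actor_loss = -tf.math.reduce_mean(
            agent._critic_model([state_batch, actions], training=training))
    actor_grad = tape.gradient(actor_loss, agent._actor_model.trainable_variables)
    agent.actor_optimizer.apply_gradients(  # optimizer attribute is an assumption
        zip(actor_grad, agent._actor_model.trainable_variables))
    return critic_loss, actor_loss
```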
DDPG.get_losses
DDPG.get_losses ()
Get the losses of the networks on the batch sampled from the pool.