DDPG

DDPG (_buffer:Union[tspace.storage.buffer.mongo.MongoBuffer, tspace.storage.buffer.dask.DaskBuffer, NoneType]=None, _actor_model:Optional[keras.src.models.model.Model]=None, _critic_model:Optional[keras.src.models.model.Model]=None, _target_actor_model:Optional[keras.src.models.model.Model]=None, _target_critic_model:Optional[keras.src.models.model.Model]=None, manager_critic:Optional[tensorflow.python.checkpoint.checkpoint_management.CheckpointManager]=None, ckpt_critic:Optional[tensorflow.python.checkpoint.checkpoint.Checkpoint]=None, manager_actor:Optional[tensorflow.python.checkpoint.checkpoint_management.CheckpointManager]=None, ckpt_actor:Optional[tensorflow.python.checkpoint.checkpoint.Checkpoint]=None, actor_saved_model_path:Optional[pathlib.Path]=None, critic_saved_model_path:Optional[pathlib.Path]=None, _truck:tspace.config.vehicles.Truck, _driver:tspace.config.drivers.Driver, _resume:bool, _coll_type:str, _hyper_param:Union[tspace.agent.utils.hyperparams.HyperParamDDPG, tspace.agent.utils.hyperparams.HyperParamRDPG, tspace.agent.utils.hyperparams.HyperParamIDQL], _pool_key:str, _data_folder:str, _infer_mode:bool, _episode_start_dt:Optional[pandas._libs.tslibs.timestamps.Timestamp]=None, _observation_meta:Union[tspace.data.core.ObservationMetaCloud, tspace.data.core.ObservationMetaECU, NoneType]=None, _torque_table_row_names:Optional[list[str]]=None, _observations:Optional[list[pandas.core.series.Series]]=None, _epi_no:Optional[int]=None, logger:Optional[logging.Logger]=None, dict_logger:Optional[dict]=None)
DDPG agent.

Data interface:

- pool in MongoDB
- buffer in memory (numpy array)

Model interface:

- actor network
- critic network
Attributes:
_buffer: Optional[Union[MongoBuffer, DaskBuffer]] = None
buffer for storing data, default is None
_actor_model: Optional[tf.keras.Model] = None
actor network, default is None
_critic_model: Optional[tf.keras.Model] = None
critic network, default is None
_target_actor_model: Optional[tf.keras.Model] = None
target actor network, default is None
_target_critic_model: Optional[tf.keras.Model] = None
target critic network, default is None
manager_critic: Optional[tf.train.CheckpointManager] = None
manager for saving critic network, default is None
ckpt_critic: Optional[tf.train.Checkpoint] = None
checkpoint for saving critic network, default is None
manager_actor: Optional[tf.train.CheckpointManager] = None
manager for saving actor network, default is None
ckpt_actor: Optional[tf.train.Checkpoint] = None
checkpoint for saving actor network, default is None
actor_saved_model_path: Optional[Path] = None
path for saving actor network as saved model, default is None
critic_saved_model_path: Optional[Path] = None
path for saving critic network as saved model, default is None
DDPG.init_checkpoint
DDPG.init_checkpoint ()
Add checkpoint managers for the actor and critic networks.
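Below is a minimal sketch of the checkpoint wiring, assuming the standard `tf.train.Checkpoint`/`tf.train.CheckpointManager` pairing; the directory names are hypothetical, while the attribute names mirror `ckpt_actor`, `manager_actor`, `ckpt_critic` and `manager_critic` from the attribute list above.

```python
import tensorflow as tf

# Sketch: one Checkpoint/CheckpointManager pair per network.
# The ckpt_dir layout is hypothetical.
def init_checkpoint_sketch(agent, ckpt_dir="./checkpoints"):
    agent.ckpt_actor = tf.train.Checkpoint(model=agent._actor_model)
    agent.manager_actor = tf.train.CheckpointManager(
        agent.ckpt_actor, directory=f"{ckpt_dir}/actor", max_to_keep=10
    )
    agent.ckpt_critic = tf.train.Checkpoint(model=agent._critic_model)
    agent.manager_critic = tf.train.CheckpointManager(
        agent.ckpt_critic, directory=f"{ckpt_dir}/critic", max_to_keep=10
    )
    # Resume from the latest checkpoint if one exists; restore(None) is a no-op.
    agent.ckpt_actor.restore(agent.manager_actor.latest_checkpoint)
    agent.ckpt_critic.restore(agent.manager_critic.latest_checkpoint)
```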
DDPG.save_as_saved_model
DDPG.save_as_saved_model ()
save the actor and critic networks as saved model
DDPG.load_saved_model
DDPG.load_saved_model ()
load the actor and critic networks from saved model
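A rough sketch of the SavedModel round trip for `save_as_saved_model` and `load_saved_model`, assuming the generic `tf.saved_model` API; the actual methods may use a Keras-level saver instead. The paths correspond to the `actor_saved_model_path` and `critic_saved_model_path` attributes listed above.

```python
import tensorflow as tf

# Sketch: export both networks to SavedModel directories.
def save_as_saved_model_sketch(agent):
    tf.saved_model.save(agent._actor_model, str(agent.actor_saved_model_path))
    tf.saved_model.save(agent._critic_model, str(agent.critic_saved_model_path))

# Sketch: reload them. tf.saved_model.load returns a generic inference object,
# not a compiled keras.Model, so this is illustrative only.
def load_saved_model_sketch(agent):
    actor = tf.saved_model.load(str(agent.actor_saved_model_path))
    critic = tf.saved_model.load(str(agent.critic_saved_model_path))
    return actor, critic
```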
DDPG.convert_to_tflite
DDPG.convert_to_tflite ()
convert the actor and critic networks to tflite format
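A minimal sketch of the conversion step, assuming the networks were first exported as SavedModel directories; the output file names are hypothetical.

```python
import tensorflow as tf

# Sketch: convert each exported SavedModel directory into a .tflite flatbuffer.
def convert_to_tflite_sketch(agent):
    for name, path in (("actor", agent.actor_saved_model_path),
                       ("critic", agent.critic_saved_model_path)):
        converter = tf.lite.TFLiteConverter.from_saved_model(str(path))
        tflite_model = converter.convert()  # returns the model as bytes
        with open(f"{name}.tflite", "wb") as f:
            f.write(tflite_model)
```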
DDPG.model_summary_print
DDPG.model_summary_print (mdl:keras.src.models.model.Model, file_path:pathlib.Path)
print the model summary to a file
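A small sketch of how a Keras summary can be redirected to a file via `print_fn`, which is likely what a method with this signature does.

```python
from pathlib import Path
import tensorflow as tf

# Sketch: write the Keras summary line by line into a text file.
def model_summary_print_sketch(mdl: tf.keras.Model, file_path: Path):
    with open(file_path, "w") as f:
        mdl.summary(print_fn=lambda line: f.write(line + "\n"))
```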
DDPG.tflite_analytics_print
DDPG.tflite_analytics_print (tflite_file_path:pathlib.Path)
print the tflite model analytics to a file
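A sketch only, assuming the analytics come from `tf.lite.experimental.Analyzer`, which prints to stdout; the report path is hypothetical.

```python
import contextlib
from pathlib import Path
import tensorflow as tf

# Sketch: capture the analyzer's stdout output into a report file.
def tflite_analytics_print_sketch(tflite_file_path: Path, report_path: Path):
    with open(report_path, "w") as f, contextlib.redirect_stdout(f):
        tf.lite.experimental.Analyzer.analyze(model_path=str(tflite_file_path))
```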
DDPG.save_ckpt
DDPG.save_ckpt ()
Save checkpoints for the actor and critic networks
DDPG.update_target
DDPG.update_target (target_weights, weights, tau)
update the target networks
DDPG.soft_update_target
DDPG.soft_update_target ()
update the target networks with Polyak averaging
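Polyak averaging blends the online weights into the target weights, `target <- tau * online + (1 - tau) * target`. A minimal sketch of both methods is below; the default `tau` value is illustrative, the real one comes from the hyperparameters.

```python
import tensorflow as tf

# Sketch of the standard DDPG soft update, applied variable by variable.
@tf.function
def update_target_sketch(target_weights, weights, tau):
    for t, w in zip(target_weights, weights):
        t.assign(w * tau + t * (1.0 - tau))

def soft_update_target_sketch(agent, tau=0.005):  # tau value is illustrative
    update_target_sketch(agent._target_actor_model.variables,
                         agent._actor_model.variables, tau)
    update_target_sketch(agent._target_critic_model.variables,
                         agent._critic_model.variables, tau)
```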
DDPG.get_actor
DDPG.get_actor (num_states:int, num_actions:int, num_actor_inputs1:int=256, num_actor_inputs2:int=256, num_layers:int=2, action_bias:float=0)
Create actor network
| | Type | Default | Details |
|---|---|---|---|
| num_states | int | | number of states, 600 |
| num_actions | int | | number of actions, 68 |
| num_actor_inputs1 | int | 256 | number of inputs to the first dense layer |
| num_actor_inputs2 | int | 256 | number of inputs to the second and subsequent dense layers |
| num_layers | int | 2 | number of layers |
| action_bias | float | 0 | action bias |
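A hypothetical sketch of an actor with this parameterization: a stack of dense layers feeding a `tanh` output of size `num_actions`, shifted by `action_bias`. The layer widths and activations are assumptions, not the repository's exact architecture.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Sketch: dense trunk -> tanh head, shifted by a constant action bias.
def get_actor_sketch(num_states, num_actions, num_actor_inputs1=256,
                     num_actor_inputs2=256, num_layers=2, action_bias=0.0):
    inputs = layers.Input(shape=(num_states,))
    x = layers.Dense(num_actor_inputs1, activation="relu")(inputs)
    for _ in range(num_layers - 1):
        x = layers.Dense(num_actor_inputs2, activation="relu")(x)
    out = layers.Dense(num_actions, activation="tanh")(x)
    outputs = out + action_bias  # bias the action toward a nonzero default
    return tf.keras.Model(inputs, outputs)
```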
DDPG.get_critic
DDPG.get_critic (num_states:int, num_actions:int, num_state_input_dense1:int=16, num_state_input_dense2:int=32, num_action_input_dense:int=32, num_output_dense1:int=256, num_output_dense2:int=256, num_layers:int=2)
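A hypothetical sketch matching the parameter names above: a state branch and an action branch are embedded separately, concatenated, and passed through a joint dense stack ending in a scalar Q value. The exact topology is an assumption.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Sketch: Q(s, a) critic with separate state and action input branches.
def get_critic_sketch(num_states, num_actions, num_state_input_dense1=16,
                      num_state_input_dense2=32, num_action_input_dense=32,
                      num_output_dense1=256, num_output_dense2=256, num_layers=2):
    state_in = layers.Input(shape=(num_states,))
    s = layers.Dense(num_state_input_dense1, activation="relu")(state_in)
    s = layers.Dense(num_state_input_dense2, activation="relu")(s)

    action_in = layers.Input(shape=(num_actions,))
    a = layers.Dense(num_action_input_dense, activation="relu")(action_in)

    x = layers.Concatenate()([s, a])
    x = layers.Dense(num_output_dense1, activation="relu")(x)
    for _ in range(num_layers - 1):
        x = layers.Dense(num_output_dense2, activation="relu")(x)
    q = layers.Dense(1)(x)  # scalar Q value
    return tf.keras.Model([state_in, action_in], q)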
DDPG.policy
DDPG.policy (state:pandas.core.series.Series)
Sample actions with additive OU (Ornstein-Uhlenbeck) noise.

Input: state is a pd.Series of length 3103/4503 (rows × columns); the output is a numpy array. The action output and the noise object are both row vectors of length 2117 (rows × columns), returned as numpy arrays.
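A sketch of the additive exploration noise: an Ornstein-Uhlenbeck process kept as a row vector of the same length as the action. The class name and parameter values below are illustrative, not taken from the repository.

```python
import numpy as np

# Sketch: OU process x_{t+1} = x_t + theta*(mu - x_t)*dt + sigma*sqrt(dt)*N(0, 1).
class OUNoiseSketch:
    def __init__(self, size, mu=0.0, theta=0.15, sigma=0.2, dt=1e-2):
        self.mu, self.theta, self.sigma, self.dt = mu, theta, sigma, dt
        self.x = np.full(size, mu, dtype=np.float32)

    def __call__(self):
        dx = self.theta * (self.mu - self.x) * self.dt \
             + self.sigma * np.sqrt(self.dt) * np.random.randn(*self.x.shape)
        self.x = self.x + dx
        return self.x

# policy() then roughly computes: action = actor(flattened_state) + noise(),
# with both terms as row vectors of length r*c.
```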
DDPG.actor_predict
DDPG.actor_predict (state:pandas.core.series.Series)
Return an action sampled from the actor network without noise. An optional t argument exists only to keep a uniform interface with RDPG.
DDPG.infer_single_sample
DDPG.infer_single_sample (state_flat:tensorflow.python.framework.tensor.Tensor)
Run inference for a single sample.
DDPG.touch_gpu
DDPG.touch_gpu ()
touch gpu to initialize the graph
DDPG.sample_minibatch
DDPG.sample_minibatch ()
Convert batch type from DataFrames to flattened tensors.
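A rough sketch of the DataFrame-to-tensor step, assuming each sampled observation is flattened row-wise and stacked into one float32 tensor per field; the helper name is hypothetical.

```python
import numpy as np
import tensorflow as tf

# Sketch: flatten each sampled frame and stack the batch into one tensor.
def to_flat_tensor(frames):
    return tf.convert_to_tensor(
        np.stack([f.to_numpy().ravel() for f in frames]), dtype=tf.float32
    )
```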
DDPG.train
DDPG.train ()
Train the networks on the batch sampled from the pool.
DDPG.update_with_batch
DDPG.update_with_batch (state_batch, action_batch, reward_batch, next_state_batch, training=True)
Update the networks with the batch sampled from the pool.

Eager execution is turned on by default in TensorFlow 2. Decorating with tf.function allows TensorFlow to build a static graph out of the logic and computations in our function. This provides a large speed-up for blocks of code that contain many small TensorFlow operations such as this one.
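For reference, a sketch of the standard DDPG update under `tf.function`: the critic regresses toward the bootstrapped target and the actor follows the critic's gradient. The discount `gamma` and the `actor_optimizer`/`critic_optimizer` attributes are assumptions from the usual formulation, not names confirmed by this class.

```python
import tensorflow as tf

@tf.function
def update_with_batch_sketch(agent, state_batch, action_batch, reward_batch,
                             next_state_batch, gamma=0.99, training=True):
    # Critic: regress Q(s, a) toward y = r + gamma * Q'(s', mu'(s')).
    with tf.GradientTape() as tape:
        target_actions = agent._target_actor_model(next_state_batch, training=training)
        y = reward_batch + gamma * agent._target_critic_model(
            [next_state_batch, target_actions], training=training)
        critic_value = agent._critic_model([state_batch, action_batch], training=training)
        critic_loss = tf.math.reduce_mean(tf.math.square(y - critic_value))
    critic_grad = tape.gradient(critic_loss, agent._critic_model.trainable_variables)
    agent.critic_optimizer.apply_gradients(  # optimizer attribute is an assumption
        zip(critic_grad, agent._critic_model.trainable_variables))

    # Actor: ascend the critic, i.e. minimise -Q(s, mu(s)).
    with tf.GradientTape() as tape:
        actions = agent._actor_model(state_batch, training=training)
        actor_loss = -tf.math.reduce_mean(
            agent._critic_model([state_batch, actions], training=training))
    actor_grad = tape.gradient(actor_loss, agent._actor_model.trainable_variables)
    agent.actor_optimizer.apply_gradients(  # optimizer attribute is an assumption
        zip(actor_grad, agent._actor_model.trainable_variables))
    return critic_loss, actor_loss
```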
DDPG.get_losses
DDPG.get_losses ()
Get the losses of the networks on the batch sampled from the pool.