DPG

ABC DPG class


source

DPG

 DPG (_truck:tspace.config.vehicles.Truck,
      _driver:tspace.config.drivers.Driver, _resume:bool, _coll_type:str,
      _hyper_param:Union[tspace.agent.utils.hyperparams.HyperParamDDPG,
                         tspace.agent.utils.hyperparams.HyperParamRDPG,
                         tspace.agent.utils.hyperparams.HyperParamIDQL],
      _pool_key:str, _data_folder:str, _infer_mode:bool,
      _buffer:Union[tspace.storage.buffer.mongo.MongoBuffer,
                    tspace.storage.buffer.dask.DaskBuffer,NoneType]=None,
      _episode_start_dt:Optional[pandas._libs.tslibs.timestamps.Timestamp]=None,
      _observation_meta:Union[tspace.data.core.ObservationMetaCloud,
                              tspace.data.core.ObservationMetaECU,NoneType]=None,
      _torque_table_row_names:Optional[list[str]]=None,
      _observations:Optional[list[pandas.core.series.Series]]=None,
      _epi_no:Optional[int]=None, logger:Optional[logging.Logger]=None,
      dict_logger:Optional[dict]=None)

*Base class for differentiable policy gradient methods

Attributes:

truck_type: class variable [`Truck`](https://Binjian.github.io/tspace/03.config.vehicles.html#truck) Type,
rdpg_hyper_type: class variable [`HyperParamRDPG`](https://Binjian.github.io/tspace/07.agent.utils.hyperparams.html#hyperparamrdpg)
_truck: Truck object
_driver: Driver object
_resume: bool type, whether to resume training or start from scratch
_coll_type: str, either "RECORD" or "EPISODE"
_hyper_param: HyperParamDDPG, HyperParamRDPG or HyperParamIDQL object
_pool_key: str, database account, password, host and port specs
_data_folder: str, root of the data folder
_infer_mode: bool, whether to run pure inference without training, or both inference and training
_buffer: Buffer object, either [`MongoBuffer`](https://Binjian.github.io/tspace/05.storage.buffer.mongo.html#mongobuffer) or [`DaskBuffer`](https://Binjian.github.io/tspace/05.storage.buffer.dask.html#daskbuffer)
_episode_start_dt: Timestamp, starting time of the current episode
_observation_meta: metadata of the observation, either from Cloud or from Kvaser
_torque_table_row_names: list of str, ['r0', 'r1', 'r2', ...]
_observations: list of pd.Series, the observation quadruple (s, a, r, s')
_epi_no: int, sequence number of the episode
logger: logging.Logger, logging object
dict_logger: dict, logging format specs*
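
`DPG` is an abstract base class; concrete agents such as DDPG and RDPG derive from it and supply the network-specific logic. The skeleton below is only a sketch of what such a subclass might look like, assuming the methods documented on this page (`actor_predict`, `train`, `get_losses`, `soft_update_target`, `save_ckpt`) are the ones a subclass overrides; the import path and stub bodies are illustrative, not the actual tspace implementation.

```python
# Illustrative sketch only: module path, overridden methods and stub bodies
# are assumptions, not the actual tspace implementation.
from dataclasses import dataclass

import pandas as pd

from tspace.agent.dpg import DPG  # assumed import path


@dataclass
class MyAgent(DPG):
    """Hypothetical concrete agent built on the DPG base class."""

    def actor_predict(self, state: pd.Series) -> pd.Series:
        # Map a single observation (batch_size == 1) to an action row.
        return pd.Series(dtype=float)  # placeholder action

    def train(self) -> tuple[float, float]:
        # One optimization step over a sampled batch from the buffer.
        actor_loss, critic_loss = 0.0, 0.0  # placeholders
        return actor_loss, critic_loss

    def get_losses(self) -> tuple[float, float]:
        # Same as train(), but without applying gradients.
        return 0.0, 0.0

    def soft_update_target(self):
        # Polyak-average the target networks (see soft_update_target below).
        pass

    def save_ckpt(self):
        # Persist actor and critic checkpoints.
        pass
```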



source

DPG.__post_init__

 DPG.__post_init__ ()

*Initialize the DPG object.

Heavy lifting of data interface (buffer, pool) for both DDPG and RDPG*


source

DPG.touch_gpu

 DPG.touch_gpu ()

Warm up the GPU for computation.


source

DPG.init_checkpoint

 DPG.init_checkpoint ()

*Create the actor from scratch or restore it from a checkpoint.

Add a checkpoint manager.*
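
A common TensorFlow pattern for create-or-restore pairs a `tf.train.Checkpoint` with a `tf.train.CheckpointManager` and restores the latest checkpoint if one exists. The sketch below only illustrates that pattern; the model, optimizer and directory names are assumptions, not the actual DPG attributes.

```python
# Sketch of a create-or-restore checkpoint pattern with TensorFlow; the
# network, optimizer and directory below are illustrative placeholders.
import tensorflow as tf

actor_model = tf.keras.Sequential(
    [tf.keras.Input(shape=(4,)), tf.keras.layers.Dense(1)]
)  # placeholder actor network
actor_optimizer = tf.keras.optimizers.Adam()
ckpt_actor_dir = "/tmp/ckpt/actor"  # illustrative path

ckpt = tf.train.Checkpoint(
    step=tf.Variable(0), optimizer=actor_optimizer, net=actor_model
)
manager = tf.train.CheckpointManager(ckpt, ckpt_actor_dir, max_to_keep=10)

ckpt.restore(manager.latest_checkpoint)
if manager.latest_checkpoint:
    print(f"Restored actor from {manager.latest_checkpoint}")
else:
    print("Initializing actor from scratch")
```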


source

DPG.actor_predict

 DPG.actor_predict (state:pandas.core.series.Series)

*Evaluate the actor given a single observation.

batch_size is 1.*


source

DPG.start_episode

 DPG.start_episode (ts:pandas._libs.tslibs.timestamps.Timestamp)

Initialize the observation list.


source

DPG.deposit

 DPG.deposit (timestamp:pandas._libs.tslibs.timestamps.Timestamp,
              state:pandas.core.series.Series,
              action:pandas.core.series.Series,
              reward:pandas.core.series.Series,
              nstate:pandas.core.series.Series)

Deposit the experience quadruple into the replay buffer.

|           | Type         | Details |
|-----------|--------------|---------|
| timestamp | pd.Timestamp | timestamp of the quadruple |
| state     | pd.Series    | state [brake row -> thrust row -> timestep row -> velocity row] |
| action    | pd.Series    | action [r0, r1, r2, … rows -> speed row -> throttle row -> (flash) timestep row] |
| reward    | pd.Series    | reward [timestep row -> work row] |
| nstate    | pd.Series    | next state, like state |
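
A hedged sketch of one control step feeding the buffer. Only the call sequence (`start_episode`, `actor_predict`, `deposit`, `end_episode`) follows the API on this page; `agent` is assumed to be a concrete DPG subclass constructed elsewhere, and the Series index labels and values are illustrative placeholders for the multi-row layout described in the table above.

```python
# Illustrative only: the Series contents are placeholders; real rows carry
# the multi-row layout described in the table above.
import pandas as pd

ts = pd.Timestamp.now()
agent.start_episode(ts)  # agent: a concrete DPG subclass, assumed constructed elsewhere

state = pd.Series({"brake": 0.0, "thrust": 0.4, "timestep": 0, "velocity": 12.3})
action = agent.actor_predict(state)  # single observation, batch_size == 1
reward = pd.Series({"timestep": 0, "work": -1.7})
nstate = pd.Series({"brake": 0.0, "thrust": 0.5, "timestep": 1, "velocity": 12.9})

agent.deposit(ts, state, action, reward, nstate)
# ... repeat for each step, then close the episode:
agent.end_episode()
```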

source

DPG.end_episode

 DPG.end_episode ()

End the current episode and deposit the whole episode of experience into the replay buffer for DPG.


source

DPG.deposit_episode

 DPG.deposit_episode ()

Deposit the whole episode of experience into the replay buffer for DPG.


source

DPG.train

 DPG.train ()

*Train the actor and critic moving networks.

Return:

tuple: (actor_loss, critic_loss)*
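
In a typical off-policy loop the moving networks are trained and the targets are then softly updated. The sketch below only illustrates that ordering; `agent` and `n_batches` are assumed placeholders, not part of the documented API.

```python
# Sketch of the train / soft-update ordering; agent and n_batches are
# illustrative placeholders.
for _ in range(n_batches):
    actor_loss, critic_loss = agent.train()  # one gradient step on a batch
    agent.soft_update_target()               # Polyak-average the targets

agent.save_ckpt()                            # persist actor and critic
```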

source

DPG.get_losses

 DPG.get_losses ()

Get the actor and critic losses without calculating the gradients.


source

DPG.soft_update_target

 DPG.soft_update_target ()

*Update the target networks with a tiny tau value (typically 0.001).

This is done once after each batch, slowly updating the targets by Polyak averaging.*
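
Polyak averaging blends each target weight toward the corresponding moving weight: theta_target <- tau * theta_moving + (1 - tau) * theta_target. A minimal sketch, assuming Keras-style models whose weights are exposed as variable lists; the model names and shapes are illustrative only.

```python
# Minimal Polyak-averaging sketch; the models below are illustrative
# stand-ins for the target and moving networks.
import tensorflow as tf


def make_net():
    return tf.keras.Sequential(
        [tf.keras.Input(shape=(3,)), tf.keras.layers.Dense(4)]
    )


moving_model = make_net()
target_model = make_net()

tau = 0.001  # tiny mixing factor

# theta_target <- tau * theta_moving + (1 - tau) * theta_target
for target_var, moving_var in zip(
    target_model.trainable_variables, moving_model.trainable_variables
):
    target_var.assign(tau * moving_var + (1.0 - tau) * target_var)
```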


source

DPG.save_ckpt

 DPG.save_ckpt ()

Save checkpoints of the actor and critic.