DPG (_truck:tspace.config.vehicles.Truck, _driver:tspace.config.drivers.Driver, _resume:bool, _coll_type:str, _hyper_param:Union[tspace.agent.utils.hyperparams.HyperParamDDPG,tspace.agent.utils.hyperparams.HyperParamRDPG,tspace.agent.utils.hyperparams.HyperParamIDQL], _pool_key:str, _data_folder:str, _infer_mode:bool, _buffer:Union[tspace.storage.buffer.mongo.MongoBuffer,tspace.storage.buffer.dask.DaskBuffer,NoneType]=None, _episode_start_dt:Optional[pandas._libs.tslibs.timestamps.Timestamp]=None, _observation_meta:Union[tspace.data.core.ObservationMetaCloud,tspace.data.core.ObservationMetaECU,NoneType]=None, _torque_table_row_names:Optional[list[str]]=None, _observations:Optional[list[pandas.core.series.Series]]=None, _epi_no:Optional[int]=None, logger:Optional[logging.Logger]=None, dict_logger:Optional[dict]=None)
*Base class for differentiable policy gradient methods
Attributes:
truck_type: class variable [`Truck`](https://Binjian.github.io/tspace/03.config.vehicles.html#truck) Type,
rdpg_hyper_type: class variable [`HyperParamRDPG`](https://Binjian.github.io/tspace/07.agent.utils.hyperparams.html#hyperparamrdpg)
_truck: Truck object
_driver: Driver object
_resume: bool type, whether to resume training or start from scratch
_coll_type: str, either "RECORD" or "EPISODE"
_hyper_param: either a HyperParamDDPG, HyperParamRDPG or HyperParamIDQL object
_pool_key: str, database account, password, host and port specs
_data_folder: str, root for data folder
_infer_mode: bool, whether to run pure inference without training, or both inference and training
_buffer: Buffer object, either [`MongoBuffer`](https://Binjian.github.io/tspace/05.storage.buffer.mongo.html#mongobuffer) or [`DaskBuffer`](https://Binjian.github.io/tspace/05.storage.buffer.dask.html#daskbuffer)
_episode_start_dt: Timestamp, starting time of the current episode
_observation_meta: metadata of the observation, either from Cloud or from Kvaser
_torque_table_row_names: list of str, ['r0', 'r1', 'r2', ...]
_observations: list of pd.Series, the observation quadruple (s, a, r, s')
_epi_no: int, sequence number of the episode
logger: logging.Logger, logging object
dict_logger: dict, logging format specs*
DPG.__post_init__ ()
*Initialize the DPG object.
Heavy lifting of data interface (buffer, pool) for both DDPG and RDPG*
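The buffer wiring depends on how the pool is addressed: a database-style `_pool_key` points at a [`MongoBuffer`](https://Binjian.github.io/tspace/05.storage.buffer.mongo.html#mongobuffer), while a plain data folder points at a [`DaskBuffer`](https://Binjian.github.io/tspace/05.storage.buffer.dask.html#daskbuffer). A minimal sketch of such a dispatch, using stub classes instead of the real buffer constructors (whose signatures may differ):

```python
from dataclasses import dataclass

# Stub stand-ins for MongoBuffer and DaskBuffer; the real tspace
# constructors may take different arguments.
@dataclass
class MongoBufferStub:
    pool_key: str
    coll_type: str

@dataclass
class DaskBufferStub:
    data_folder: str
    coll_type: str

def select_buffer(pool_key: str, data_folder: str, coll_type: str):
    """Illustrative dispatch: a database account spec selects the document
    store buffer, anything else falls back to the file-backed buffer."""
    if pool_key.startswith("mongodb://") or "@" in pool_key:
        return MongoBufferStub(pool_key=pool_key, coll_type=coll_type)
    return DaskBufferStub(data_folder=data_folder, coll_type=coll_type)

print(select_buffer("account:password@host:27017", "./data", "RECORD"))
```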
DPG.touch_gpu ()
Warm up the GPU for computing.
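A GPU warm-up usually amounts to one throwaway forward pass so that kernels are compiled and device memory is allocated before the first latency-sensitive inference. A minimal TensorFlow sketch, assuming the actor is a Keras model (the actual warm-up in tspace may differ):

```python
import numpy as np
import tensorflow as tf

def touch_gpu_sketch(actor: tf.keras.Model, state_dim: int) -> None:
    """Run one dummy prediction so GPU kernels are loaded before real use."""
    dummy_state = tf.convert_to_tensor(np.zeros((1, state_dim), dtype=np.float32))
    _ = actor(dummy_state, training=False)
```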
DPG.init_checkpoint ()
*Create the actor or restore it from a checkpoint;
add the checkpoint manager*
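With TensorFlow, creating a checkpoint and restoring the latest one through a manager typically looks like the sketch below; the directory layout and `max_to_keep` value are assumptions, not the settings tspace uses:

```python
import tensorflow as tf

def init_checkpoint_sketch(actor: tf.keras.Model, ckpt_dir: str):
    """Create a checkpoint for the actor and restore the latest saved state
    if one exists; otherwise training starts from scratch."""
    ckpt = tf.train.Checkpoint(actor=actor)
    manager = tf.train.CheckpointManager(ckpt, directory=ckpt_dir, max_to_keep=10)
    if manager.latest_checkpoint:
        ckpt.restore(manager.latest_checkpoint)
    return ckpt, manager
```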
DPG.actor_predict (state:pandas.core.series.Series)
*Evaluate the actor given a single observation.
batch_size is 1.*
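Because the state arrives as a single pd.Series, it has to be expanded into a batch of one before the network can consume it. A minimal sketch; the exact tensor layout tspace uses is an assumption:

```python
import numpy as np
import pandas as pd
import tensorflow as tf

def actor_predict_sketch(actor: tf.keras.Model, state: pd.Series) -> np.ndarray:
    """Expand one observation to a batch of size 1, run the actor,
    and strip the batch dimension from the resulting action."""
    batch = tf.convert_to_tensor(state.to_numpy(dtype=np.float32)[None, :])
    action = actor(batch, training=False)
    return np.squeeze(action.numpy(), axis=0)
```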
DPG.start_episode (ts:pandas._libs.tslibs.timestamps.Timestamp)
Initialize the observation list.
DPG.deposit (timestamp:pandas._libs.tslibs.timestamps.Timestamp, state:pandas.core.series.Series, action:pandas.core.series.Series, reward:pandas.core.series.Series, nstate:pandas.core.series.Series)
Deposit the experience quadruple into the replay buffer.
| | Type | Details |
|---|---|---|
| timestamp | pd.Timestamp | timestamp of the quadruple |
| state | pd.Series | state [brake row -> thrust row -> timestep row -> velocity row] |
| action | pd.Series | action [r0, r1, r2, ... rows -> speed row -> throttle row -> (flash) timestep row] |
| reward | pd.Series | reward [timestep row -> work row] |
| nstate | pd.Series | next state, same layout as state |
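An illustrative call with placeholder data; the row labels and lengths below only mimic the layout in the table and are not the structured rows tspace actually builds:

```python
import numpy as np
import pandas as pd

# Placeholder quadruple; real observations carry structured rows
# (brake/thrust/velocity for state, torque table rows for action, etc.).
timestamp = pd.Timestamp.now()
state = pd.Series(np.random.rand(12), name="state")
action = pd.Series(np.random.rand(6), name="action")
reward = pd.Series([0.02, 42.0], index=["timestep", "work"], name="reward")
nstate = pd.Series(np.random.rand(12), name="nstate")

# agent is a concrete DPG subclass (e.g. DDPG); the call is shown for illustration.
# agent.deposit(timestamp, state, action, reward, nstate)
```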
DPG.end_episode ()
Deposit the whole episode of experience into the replay buffer for DPG.
DPG.deposit_episode ()
Deposit the whole episode of experience into the replay buffer for DPG.
DPG.train ()
*Train the actor and critic moving networks.
Return:
tuple: (actor_loss, critic_loss)*
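Putting the pieces together, one episode with a concrete DPG subclass could be driven roughly as follows. This is a usage sketch only: `agent`, `steps_per_episode`, and `observe_step` are assumed names, not part of this module:

```python
import pandas as pd

agent.start_episode(pd.Timestamp.now())          # reset the observation list
for _ in range(steps_per_episode):
    # observe_step is a hypothetical stand-in for the vehicle/driver interface
    timestamp, state, action, reward, nstate = observe_step()
    agent.deposit(timestamp, state, action, reward, nstate)
agent.end_episode()                              # flush the episode into the buffer
actor_loss, critic_loss = agent.train()          # one pass over sampled experience
agent.soft_update_target()                       # Polyak-average the target networks
agent.save_ckpt()                                # persist actor and critic checkpoints
```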
DPG.get_losses ()
Get the actor and critic losses without calculating the gradients.
DPG.soft_update_target ()
*Update the target networks with a tiny tau value, typically 0.001.
This is done once after each batch, slowly updating the targets by Polyak averaging.*
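Polyak averaging nudges every target weight toward its moving-network counterpart: theta_target <- tau * theta + (1 - tau) * theta_target. A minimal TensorFlow sketch (function and argument names are assumptions):

```python
import tensorflow as tf

def soft_update_sketch(target: tf.keras.Model, source: tf.keras.Model,
                       tau: float = 0.001) -> None:
    """Polyak averaging: move every target weight a small step (tau)
    toward the corresponding moving-network weight."""
    for t_var, s_var in zip(target.variables, source.variables):
        t_var.assign(tau * s_var + (1.0 - tau) * t_var)
```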
DPG.save_ckpt ()
Save checkpoints of the actor and critic.