DPG (_truck:tspace.config.vehicles.Truck, _driver:tspace.config.drivers.Driver, _resume:bool, _coll_type:str, _hyper_param:Union[tspace.agent.utils.hyperparams.HyperParamDDPG,tspace.agent.utils.hyperparams.HyperParamRDPG,tspace.agent.utils.hyperparams.HyperParamIDQL], _pool_key:str, _data_folder:str, _infer_mode:bool, _buffer:Union[tspace.storage.buffer.mongo.MongoBuffer,tspace.storage.buffer.dask.DaskBuffer,NoneType]=None, _episode_start_dt:Optional[pandas._libs.tslibs.timestamps.Timestamp]=None, _observation_meta:Union[tspace.data.core.ObservationMetaCloud,tspace.data.core.ObservationMetaECU,NoneType]=None, _torque_table_row_names:Optional[list[str]]=None, _observations:Optional[list[pandas.core.series.Series]]=None, _epi_no:Optional[int]=None, logger:Optional[logging.Logger]=None, dict_logger:Optional[dict]=None)
*Base class for differentiable policy gradient methods
Attributes:
truck_type: class variable [`Truck`](https://Binjian.github.io/tspace/03.config.vehicles.html#truck) Type,
rdpg_hyper_type: class variable [`HyperParamRDPG`](https://Binjian.github.io/tspace/07.agent.utils.hyperparams.html#hyperparamrdpg)
_truck: Truck object
_driver: Driver object
_resume: bool type, whether to resume training or start from scratch
_coll_type: str, either "RECORD" or "EPISODE"
_hyper_param: either a HyperParamDDPG, HyperParamRDPG or HyperParamIDQL object
_pool_key: str, database account, password, host and port specs
_data_folder: str, root for data folder
_infer_mode: bool, whether to run pure inference without training, or both inference and training
_buffer: Buffer object, either [`MongoBuffer`](https://Binjian.github.io/tspace/05.storage.buffer.mongo.html#mongobuffer) or [`DaskBuffer`](https://Binjian.github.io/tspace/05.storage.buffer.dask.html#daskbuffer)
_episode_start_dt: Timestamp, starting time of the current episode
_observation_meta: metadata of the observation, either from Cloud or from Kvaser
_torque_table_row_names: list of str, ['r0', 'r1', 'r2', ...]
_observations: list of pd.Series, the observation quadruple (s, a, r, s')
_epi_no: int, sequence number of the episode
logger: logging.Logger, logging object
dict_logger: dict, logging format specs*
DPG.__post_init__ ()
*Initialize the DPG object.
Heavy lifting of data interface (buffer, pool) for both DDPG and RDPG*
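The buffer wiring depends on how the pool is addressed: a database-style `_pool_key` points at a [`MongoBuffer`](https://Binjian.github.io/tspace/05.storage.buffer.mongo.html#mongobuffer), while a plain data folder points at a [`DaskBuffer`](https://Binjian.github.io/tspace/05.storage.buffer.dask.html#daskbuffer). A minimal sketch of such a dispatch, using stub classes instead of the real buffer constructors (whose signatures may differ):

```python
from dataclasses import dataclass

# Stub stand-ins for MongoBuffer and DaskBuffer; the real tspace
# constructors may take different arguments.
@dataclass
class MongoBufferStub:
    pool_key: str
    coll_type: str

@dataclass
class DaskBufferStub:
    data_folder: str
    coll_type: str

def select_buffer(pool_key: str, data_folder: str, coll_type: str):
    """Illustrative dispatch: a database account spec selects the document
    store buffer, anything else falls back to the file-backed buffer."""
    if pool_key.startswith("mongodb://") or "@" in pool_key:
        return MongoBufferStub(pool_key=pool_key, coll_type=coll_type)
    return DaskBufferStub(data_folder=data_folder, coll_type=coll_type)

print(select_buffer("account:password@host:27017", "./data", "RECORD"))
```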
DPG.touch_gpu ()
Warm up the GPU for computing.
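A GPU warm-up usually amounts to one throwaway forward pass so that kernels are compiled and device memory is allocated before the first latency-sensitive inference. A minimal TensorFlow sketch, assuming the actor is a Keras model (the actual warm-up in tspace may differ):

```python
import numpy as np
import tensorflow as tf

def touch_gpu_sketch(actor: tf.keras.Model, state_dim: int) -> None:
    """Run one dummy prediction so GPU kernels are loaded before real use."""
    dummy_state = tf.convert_to_tensor(np.zeros((1, state_dim), dtype=np.float32))
    _ = actor(dummy_state, training=False)
```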
DPG.init_checkpoint ()
*Create the actor or restore it from a checkpoint;
add the checkpoint manager*
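With TensorFlow, creating a checkpoint and restoring the latest one through a manager typically looks like the sketch below; the directory layout and `max_to_keep` value are assumptions, not the settings tspace uses:

```python
import tensorflow as tf

def init_checkpoint_sketch(actor: tf.keras.Model, ckpt_dir: str):
    """Create a checkpoint for the actor and restore the latest saved state
    if one exists; otherwise training starts from scratch."""
    ckpt = tf.train.Checkpoint(actor=actor)
    manager = tf.train.CheckpointManager(ckpt, directory=ckpt_dir, max_to_keep=10)
    if manager.latest_checkpoint:
        ckpt.restore(manager.latest_checkpoint)
    return ckpt, manager
```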
DPG.actor_predict (state:pandas.core.series.Series)
*Evaluate the actor given a single observation.
batch_size is 1.*
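Because the state arrives as a single pd.Series, it has to be expanded into a batch of one before the network can consume it. A minimal sketch; the exact tensor layout tspace uses is an assumption:

```python
import numpy as np
import pandas as pd
import tensorflow as tf

def actor_predict_sketch(actor: tf.keras.Model, state: pd.Series) -> np.ndarray:
    """Expand one observation to a batch of size 1, run the actor,
    and strip the batch dimension from the resulting action."""
    batch = tf.convert_to_tensor(state.to_numpy(dtype=np.float32)[None, :])
    action = actor(batch, training=False)
    return np.squeeze(action.numpy(), axis=0)
```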
DPG.start_episode (ts:pandas._libs.tslibs.timestamps.Timestamp)
Initialize the observation list.
DPG.deposit (timestamp:pandas._libs.tslibs.timestamps.Timestamp, state:pandas.core.series.Series, action:pandas.core.series.Series, reward:pandas.core.series.Series, nstate:pandas.core.series.Series)
Deposit the experience quadruple into the replay buffer.
| | Type | Details |
|---|---|---|
| timestamp | pd.Timestamp | timestamp of the quadruple |
| state | pd.Series | state [brake row -> thrust row -> timestep row -> velocity row] |
| action | pd.Series | action [r0, r1, r2, ... rows -> speed row -> throttle row -> (flash) timestep row] |
| reward | pd.Series | reward [timestep row -> work row] |
| nstate | pd.Series | next state, same layout as state |
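An illustrative call with placeholder data; the row labels and lengths below only mimic the layout in the table and are not the structured rows tspace actually builds:

```python
import numpy as np
import pandas as pd

# Placeholder quadruple; real observations carry structured rows
# (brake/thrust/velocity for state, torque table rows for action, etc.).
timestamp = pd.Timestamp.now()
state = pd.Series(np.random.rand(12), name="state")
action = pd.Series(np.random.rand(6), name="action")
reward = pd.Series([0.02, 42.0], index=["timestep", "work"], name="reward")
nstate = pd.Series(np.random.rand(12), name="nstate")

# agent is a concrete DPG subclass (e.g. DDPG); the call is shown for illustration.
# agent.deposit(timestamp, state, action, reward, nstate)
```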
DPG.end_episode ()
Deposit the whole episode of experience into the replay buffer for DPG.
DPG.deposit_episode ()
Deposit the whole episode of experience into the replay buffer for DPG.
DPG.train ()
*Train the actor and critic moving networks.
Return:
tuple: (actor_loss, critic_loss)*
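Putting the pieces together, one episode with a concrete DPG subclass could be driven roughly as follows. This is a usage sketch only: `agent`, `steps_per_episode`, and `observe_step` are assumed names, not part of this module:

```python
import pandas as pd

agent.start_episode(pd.Timestamp.now())          # reset the observation list
for _ in range(steps_per_episode):
    # observe_step is a hypothetical stand-in for the vehicle/driver interface
    timestamp, state, action, reward, nstate = observe_step()
    agent.deposit(timestamp, state, action, reward, nstate)
agent.end_episode()                              # flush the episode into the buffer
actor_loss, critic_loss = agent.train()          # one pass over sampled experience
agent.soft_update_target()                       # Polyak-average the target networks
agent.save_ckpt()                                # persist actor and critic checkpoints
```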
DPG.get_losses ()
Get the actor and critic losses without calculating the gradients.
DPG.soft_update_target ()
*Update the target networks with a tiny tau value, typically 0.001.
This is done once after each batch, slowly updating the targets by Polyak averaging.*
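Polyak averaging nudges every target weight toward its moving-network counterpart: theta_target <- tau * theta + (1 - tau) * theta_target. A minimal TensorFlow sketch (function and argument names are assumptions):

```python
import tensorflow as tf

def soft_update_sketch(target: tf.keras.Model, source: tf.keras.Model,
                       tau: float = 0.001) -> None:
    """Polyak averaging: move every target weight a small step (tau)
    toward the corresponding moving-network weight."""
    for t_var, s_var in zip(target.variables, source.variables):
        t_var.assign(tau * s_var + (1.0 - tau) * t_var)
```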
DPG.save_ckpt ()
Save checkpoints of the actor and critic.