Committed by Davide Liu
* added experience replay and n-step
* implementing distributional q value
* added distributional q-value
* added overview in qac_dist and d4pg
* derived D4PG from DDPG
* fixed a bug when action shape >1
* benchmark D4PG mujoco + minor fixes
  -entry for DDPG mujoco
  -entry for D4PG mujoco
  -config for D4PG mujoco
  -fixed style D4PG code
  -unittests for QAC distributional
* formatted code
* minor updates (read description)
  -added d4pg seria_entry test
  -updated comments in QACDIST
  -added d4pg in commander register
  -added q_value in d4pg return dict
  -added priority update in d4pg entry
  -added assertion in QACDIST
16a89c35