Red¶
red_irl_model¶
RedRewardModel¶
- class ding.reward_model.red_irl_model.RedRewardModel(config: Dict, device: str, tb_logger: SummaryWriter)[source]¶
- Overview:
The implementation of the reward model in RED (https://arxiv.org/abs/1905.06750)
- Interface:
estimate, train, load_expert_data, collect_data, clear_data, __init__, _train
- Properties:
online_net (:obj: SENet): The reward model, by default initialized only once when training begins.
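RED assigns reward via random expert distillation: a predictor network is trained to match a frozen, randomly initialized target network on expert data, so the prediction error is small on expert-like (obs, action) pairs and large off the expert support. A minimal numpy sketch of this reward, with single linear maps standing in for the actual SENet networks (the dimensions and sigma value here are illustrative assumptions, not DI-engine defaults):

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen random target network and a predictor trained to match it on
# expert data; both reduced to single linear maps for illustration.
obs_act_dim, feat_dim = 8, 4
target_w = rng.normal(size=(obs_act_dim, feat_dim))     # frozen
predictor_w = rng.normal(size=(obs_act_dim, feat_dim))  # trained on expert data

def red_reward(x: np.ndarray, sigma: float = 0.5) -> float:
    """RED-style reward: close to 1 when the predictor matches the frozen
    target, i.e. when (obs, action) lies on the expert support."""
    err = np.sum((x @ predictor_w - x @ target_w) ** 2)
    return float(np.exp(-sigma * err))

x = rng.normal(size=obs_act_dim)
r = red_reward(x)
assert 0.0 < r <= 1.0  # reward is bounded in (0, 1]
```

When the predictor reproduces the target exactly, the squared error is zero and the reward attains its maximum of 1.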
- __init__(config: Dict, device: str, tb_logger: SummaryWriter) None [source]¶
- Overview:
Initialize self. See help(type(self)) for accurate signature.
- Arguments:
cfg (Dict): Training config
device (str): Device used, i.e. "cpu" or "cuda"
tb_logger (SummaryWriter): Logger, by default set as SummaryWriter for model summary
- clear_data()[source]¶
- Overview:
Clear collected data. Not implemented if the reward model (i.e. online_net) is trained only once; if online_net is trained continuously, the clear_data method should provide an implementation.
- collect_data(data) None [source]¶
- Overview:
Collect training data. Not implemented if the reward model (i.e. online_net) is trained only once; if online_net is trained continuously, the collect_data method should provide an implementation.
- estimate(data: list) None [source]¶
- Overview:
Estimate the reward for each piece of data by rewriting its reward key
- Arguments:
data (list): the list of data used for estimation, with at least obs and action keys.
- Effects:
This is a side effect function which updates the reward values in place.
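A minimal sketch of the in-place rewrite that estimate performs, with dummy predictor/target linear maps standing in for the actual SENet networks (all names, shapes, and the sigma value here are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)

# Dummy frozen target and trained predictor (illustrative stand-ins).
target_w = rng.normal(size=(6, 3))
predictor_w = rng.normal(size=(6, 3))

def estimate(data: list, sigma: float = 0.5) -> None:
    """Overwrite each transition's 'reward' key in place with a
    RED-style prediction-error reward."""
    for transition in data:
        x = np.concatenate([transition['obs'], transition['action']])
        err = np.sum((x @ predictor_w - x @ target_w) ** 2)
        transition['reward'] = float(np.exp(-sigma * err))

batch = [{'obs': rng.normal(size=4), 'action': rng.normal(size=2),
          'reward': 0.0} for _ in range(3)]
estimate(batch)  # mutates batch; nothing is returned
assert all(0.0 < t['reward'] <= 1.0 for t in batch)
```

Because the rewrite is a side effect, callers keep their existing references to the transitions; only the reward values change.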