Base Model

base_reward_estimate

BaseRewardModel

class ding.reward_model.base_reward_model.BaseRewardModel[source]
Overview:

The base class of reward models.

Interface:

default_config, estimate, train, clear_data, collect_data, load_expert_data
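
The interface can be exercised by subclassing. Below is a minimal sketch in Python; the dict-based transition format, the ConstantRewardModel name and its reward_value hyper-parameter are illustrative assumptions, not part of the documented interface:

    from ding.reward_model.base_reward_model import BaseRewardModel


    class ConstantRewardModel(BaseRewardModel):
        """Illustrative reward model that assigns a fixed reward to every transition."""

        def __init__(self, reward_value: float = 1.0) -> None:
            # ``reward_value`` is an assumed hyper-parameter, not part of the base interface.
            self.reward_value = reward_value
            self.train_data = []

        def collect_data(self, data) -> None:
            # Side effect: append raw transitions to the internal training buffer.
            self.train_data.extend(data)

        def clear_data(self) -> None:
            # Side effect: drop everything collected so far.
            self.train_data.clear()

        def train(self, data) -> None:
            # A real model would fit its parameters here; a constant reward has nothing to learn.
            pass

        def estimate(self, data: list) -> None:
            # Side effect: write the estimated reward back into each transition dict
            # (the dict-based transition format is an assumption for this sketch).
            for item in data:
                item['reward'] = self.reward_value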

abstract clear_data() → None[source]
Overview:

Clear the training data. This can be a side-effect function which clears the data attribute in self.

abstract collect_data(data) → None[source]
Overview:

Collect training data in the designated format or with the designated transition.

Arguments:
  • data (Any): Raw training data (e.g. some form of states, actions, obs, etc.)

Returns / Effects:
  • This can be a side effect function which updates the data attribute in self

abstract estimate(data: list) → None[source]
Overview:

Estimate the reward.

Arguments:
  • data (List): the list of data used for estimation

Returns / Effects:
  • This can be a side effect function which updates the reward value

  • If this function returns a value instead, an example return object is reward (Any): the estimated reward
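
For instance, with the illustrative ConstantRewardModel sketched above (a hypothetical subclass, not part of the library), the side-effect style of estimate could look like this:

    # Hypothetical transitions; the exact transition format depends on the concrete reward model.
    transitions = [
        {'obs': [0.0, 1.0], 'action': 0, 'reward': 0.0},
        {'obs': [1.0, 0.0], 'action': 1, 'reward': 0.0},
    ]

    model = ConstantRewardModel(reward_value=0.5)
    model.estimate(transitions)  # side effect: overwrites each transition's 'reward' field
    print([t['reward'] for t in transitions])  # prints [0.5, 0.5]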

load_expert_data(data) → None[source]
Overview:

Get the expert data, usually used in inverse RL reward models.

Arguments:
  • data (Any): Expert data

Effects:
  • This is mostly a side effect function which updates the expert data attribute (e.g. self.expert_data)
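
As a hedged sketch, an inverse-RL style subclass could simply store the passed expert trajectories; the pickle file name below and the reuse of the ConstantRewardModel sketch are illustrative assumptions:

    import pickle


    class InverseRLRewardModel(ConstantRewardModel):
        """Illustrative inverse-RL style model that keeps expert data around."""

        def load_expert_data(self, data) -> None:
            # Side effect: store the expert trajectories on the model (self.expert_data).
            self.expert_data = data


    model = InverseRLRewardModel()
    # 'expert_demos.pkl' is an assumed file; the on-disk format depends on how the demos were saved.
    with open('expert_demos.pkl', 'rb') as f:
        model.load_expert_data(pickle.load(f))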

abstract train(data) → None[source]
Overview:

Train the reward model.

Arguments:
  • data (Any): Data used for training

Effects:
  • This is mostly a side effect function which updates the reward model
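
Putting the pieces together, a typical usage cycle in a training loop might look like the sketch below; the dummy transitions and the per-iteration training frequency are assumptions about the surrounding pipeline, not requirements of the base class:

    model = ConstantRewardModel()

    for iteration in range(10):
        # Transitions would normally come from environment interaction; these are dummies.
        new_transitions = [{'obs': [float(iteration)], 'action': 0, 'reward': 0.0} for _ in range(4)]

        model.collect_data(new_transitions)  # side effect: extend the internal training buffer
        model.train(model.train_data)        # side effect: update the reward model parameters
        model.estimate(new_transitions)      # side effect: overwrite the 'reward' field in place
        model.clear_data()                   # side effect: empty the buffer before the next cycle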

create_reward_model

Overview:

Create a reward estimation model according to the given config.

Arguments:
  • cfg (Dict): Training config

  • device (str): Device usage, i.e. “cpu” or “cuda”

  • tb_logger (str): Logger, set to ‘SummaryWriter’ by default for model summary

Returns:
  • reward (Any): The reward model
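
A hedged usage sketch of the factory, assuming a config whose type field names a registered reward model; the ‘rnd’ type, the EasyDict/tensorboardX imports and the log directory are illustrative assumptions rather than details documented here:

    from easydict import EasyDict
    from tensorboardX import SummaryWriter

    from ding.reward_model import create_reward_model

    # Illustrative config; a real config must also include every field that the chosen
    # reward model type expects (only the ``type`` selector is shown here).
    cfg = EasyDict(dict(type='rnd'))

    tb_logger = SummaryWriter('./log/reward_model')  # assumed log directory
    reward_model = create_reward_model(cfg, device='cpu', tb_logger=tb_logger)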