Base Model

base_reward_estimate

BaseRewardModel

class ding.reward_model.base_reward_model.BaseRewardModel[source]
Overview:

The base class of reward models.

Interface:

default_config, estimate, train, clear_data, collect_data, load_expert_data
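
The interface can be exercised by subclassing. Below is a minimal sketch in Python; the dict-based transition format, the ConstantRewardModel name and its reward_value hyper-parameter are illustrative assumptions, not part of the documented interface:

    from ding.reward_model.base_reward_model import BaseRewardModel


    class ConstantRewardModel(BaseRewardModel):
        """Illustrative reward model that assigns a fixed reward to every transition."""

        def __init__(self, reward_value: float = 1.0) -> None:
            # ``reward_value`` is an assumed hyper-parameter, not part of the base interface.
            self.reward_value = reward_value
            self.train_data = []

        def collect_data(self, data) -> None:
            # Side effect: append raw transitions to the internal training buffer.
            self.train_data.extend(data)

        def clear_data(self) -> None:
            # Side effect: drop everything collected so far.
            self.train_data.clear()

        def train(self, data) -> None:
            # A real model would fit its parameters here; a constant reward has nothing to learn.
            pass

        def estimate(self, data: list) -> None:
            # Side effect: write the estimated reward back into each transition dict
            # (the dict-based transition format is an assumption for this sketch).
            for item in data:
                item['reward'] = self.reward_value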

abstract clear_data() → None[source]
Overview:

Clear the training data. This can be a side-effect function which clears the data attribute in self.

abstract collect_data(data) → None[source]
Overview:

Collect training data in the designated format or with the designated transition.

Arguments:
  • data (Any): Raw training data (e.g. some form of states, actions, obs, etc.)

Returns / Effects:
  • This can be a side effect function which updates the data attribute in self

abstract estimate(data: list) → None[source]
Overview:

Estimate the reward.

Arguments:
  • data (List): the list of data used for estimation

Returns / Effects:
  • This can be a side effect function which updates the reward value

  • If this function returns a value instead, an example return object is reward (Any): the estimated reward
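
For instance, with the illustrative ConstantRewardModel sketched above (a hypothetical subclass, not part of the library), the side-effect style of estimate could look like this:

    # Hypothetical transitions; the exact transition format depends on the concrete reward model.
    transitions = [
        {'obs': [0.0, 1.0], 'action': 0, 'reward': 0.0},
        {'obs': [1.0, 0.0], 'action': 1, 'reward': 0.0},
    ]

    model = ConstantRewardModel(reward_value=0.5)
    model.estimate(transitions)  # side effect: overwrites each transition's 'reward' field
    print([t['reward'] for t in transitions])  # prints [0.5, 0.5]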

load_expert_data(data) → None[source]
Overview:

Get the expert data, usually used in inverse RL reward models.

Arguments:
  • data (Any): Expert data

Effects:
  • This is mostly a side effect function which updates the expert data attribute (e.g. self.expert_data)
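
As a hedged sketch, an inverse-RL style subclass could simply store the passed expert trajectories; the pickle file name below and the reuse of the ConstantRewardModel sketch are illustrative assumptions:

    import pickle


    class InverseRLRewardModel(ConstantRewardModel):
        """Illustrative inverse-RL style model that keeps expert data around."""

        def load_expert_data(self, data) -> None:
            # Side effect: store the expert trajectories on the model (self.expert_data).
            self.expert_data = data


    model = InverseRLRewardModel()
    # 'expert_demos.pkl' is an assumed file; the on-disk format depends on how the demos were saved.
    with open('expert_demos.pkl', 'rb') as f:
        model.load_expert_data(pickle.load(f))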

abstract train(data) → None[source]
Overview:

Train the reward model.

Arguments:
  • data (Any): Data used for training

Effects:
  • This is mostly a side effect function which updates the reward model
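
Putting the pieces together, a typical usage cycle in a training loop might look like the sketch below; the dummy transitions and the per-iteration training frequency are assumptions about the surrounding pipeline, not requirements of the base class:

    model = ConstantRewardModel()

    for iteration in range(10):
        # Transitions would normally come from environment interaction; these are dummies.
        new_transitions = [{'obs': [float(iteration)], 'action': 0, 'reward': 0.0} for _ in range(4)]

        model.collect_data(new_transitions)  # side effect: extend the internal training buffer
        model.train(model.train_data)        # side effect: update the reward model parameters
        model.estimate(new_transitions)      # side effect: overwrite the 'reward' field in place
        model.clear_data()                   # side effect: empty the buffer before the next cycle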

create_reward_model

Overview:

Create a reward estimation model according to the given config.

Arguments:
  • cfg (Dict): Training config

  • device (str): Device usage, i.e. “cpu” or “cuda”

  • tb_logger (str): Logger, set to ‘SummaryWriter’ by default for model summary

Returns:
  • reward (Any): The reward model
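
A hedged usage sketch of the factory, assuming a config whose type field names a registered reward model; the ‘rnd’ type, the EasyDict/tensorboardX imports and the log directory are illustrative assumptions rather than details documented here:

    from easydict import EasyDict
    from tensorboardX import SummaryWriter

    from ding.reward_model import create_reward_model

    # Illustrative config; a real config must also include every field that the chosen
    # reward model type expects (only the ``type`` selector is shown here).
    cfg = EasyDict(dict(type='rnd'))

    tb_logger = SummaryWriter('./log/reward_model')  # assumed log directory
    reward_model = create_reward_model(cfg, device='cpu', tb_logger=tb_logger)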