Model (*Forward Part*)
=======================

Methods
----------
1. ``policy(self, *args)``

    Define the network structure here. The algorithm calls this method to predict the probabilities of actions.
    Implementing this method is optional.
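
    As a rough, framework-free illustration of what a ``policy`` method computes, the sketch below maps an observation to action probabilities with one linear layer followed by a softmax. ``TinyPolicyModel`` and its weights are hypothetical and not part of the library; a real model would build the network from the library's layers instead.

    .. code-block:: python

        import math
        import random


        class TinyPolicyModel:
            """Hypothetical stand-in for a Model subclass; not part of the library."""

            def __init__(self, obs_dim, act_dim, seed=0):
                rng = random.Random(seed)
                # One weight row per action: a single linear layer without bias.
                self.weights = [[rng.uniform(-0.1, 0.1) for _ in range(obs_dim)]
                                for _ in range(act_dim)]

            def policy(self, obs):
                # Linear layer: one logit per action.
                logits = [sum(w * x for w, x in zip(row, obs)) for row in self.weights]
                # Numerically stable softmax turns the logits into probabilities.
                peak = max(logits)
                exps = [math.exp(v - peak) for v in logits]
                total = sum(exps)
                return [e / total for e in exps]


        model = TinyPolicyModel(obs_dim=4, act_dim=3)
        probs = model.policy([0.5, -1.0, 0.25, 2.0])

    The returned list has one entry per action, and the entries sum to 1.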

2. ``value(self, *args)``

    Returns ``values``, a dict of estimated values for the current observations and states,
    keyed by name, for example ``"q_value"`` and ``"v_value"``.
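
    The shape of the returned dict can be sketched without any framework. ``TinyValueModel``, its fixed weights, and the use of two separate linear heads are assumptions made only for illustration; only the keys ``"q_value"`` and ``"v_value"`` come from the description above.

    .. code-block:: python

        class TinyValueModel:
            """Hypothetical stand-in for a Model subclass; not part of the library."""

            def __init__(self):
                # Fixed toy weights: two actions, three observation features.
                self.q_weights = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]]
                self.v_weights = [0.2, 0.2, 0.2]

            def value(self, obs):
                # One Q estimate per action, plus a single state-value estimate.
                q_value = [sum(w * x for w, x in zip(row, obs))
                           for row in self.q_weights]
                v_value = sum(w * x for w, x in zip(self.v_weights, obs))
                # Return every estimate under a descriptive key.
                return {"q_value": q_value, "v_value": v_value}


        values = TinyValueModel().value([1.0, 2.0, 3.0])

    Callers can then pick the estimate they need by key, such as ``values["q_value"]``.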

3. ``sync_params_to(self, target_net, gpu_id, decay=0.0, share_vars_parallel_executor=None)``

    Copy the parameters of the current network to the target network. The two networks must have the same structure.
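
    The text above does not say how ``decay`` is applied. The sketch below assumes the common soft-update convention ``target = decay * target + (1 - decay) * source``, under which the default ``decay=0.0`` reduces to a plain copy. ``sync_params`` and the parameter dicts are hypothetical helpers, not the library's implementation.

    .. code-block:: python

        def sync_params(source, target, decay=0.0):
            """Blend `source` parameters into `target` in place (hypothetical helper)."""
            if source.keys() != target.keys():
                raise ValueError("networks must have the same structure")
            for name, src in source.items():
                tgt = target[name]
                # Assumed convention: decay=0.0 copies, larger values update slowly.
                target[name] = [decay * t + (1.0 - decay) * s
                                for s, t in zip(src, tgt)]


        current = {"fc.w": [1.0, 2.0], "fc.b": [0.5]}
        target = {"fc.w": [0.0, 0.0], "fc.b": [0.0]}

        # Default decay=0.0: the target becomes an exact copy of the current network.
        sync_params(current, target)
        hard = {name: list(vals) for name, vals in target.items()}

        # decay=0.5: the target moves halfway toward the new source parameters.
        sync_params({"fc.w": [3.0, 4.0], "fc.b": [1.5]}, target, decay=0.5)

    Under this assumption, a ``decay`` close to 1 makes the target network track the current network slowly.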

An example
------------
.. code-block:: python
    :linenos:

    from copy import deepcopy

    # ``Model`` and ``layers`` come from the library; the import path depends on the installed version.

    class MLPModel(Model):
        def __init__(self):
            self.fc = layers.fc(size=64)

        def policy(self, obs):
            out = self.fc(obs)
            return out

    model = MLPModel()
    # deepcopy automatically creates new, unique parameter names for target_model.fc
    target_model = deepcopy(model)

    # build program
    x = layers.data(name='x', shape=[100], dtype="float32")
    y1 = model.policy(x)
    y2 = target_model.policy(x)
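
The comment in the example says that ``deepcopy`` gives ``target_model.fc`` new, unique parameter names. One way such behaviour could be implemented is sketched below in plain Python. ``NamedLayer``, ``ToyModel``, and the naming scheme are hypothetical and only illustrate the idea; they are not the library's actual mechanism.

.. code-block:: python

    import itertools
    from copy import deepcopy

    _counter = itertools.count()


    class NamedLayer:
        """Hypothetical layer wrapper that owns one uniquely named parameter."""

        def __init__(self, size):
            self.size = size
            self.param_name = "fc_{}.w".format(next(_counter))

        def __deepcopy__(self, memo):
            # Build a fresh layer so the copy receives its own parameter name.
            clone = NamedLayer(self.size)
            memo[id(self)] = clone
            return clone


    class ToyModel:
        def __init__(self):
            self.fc = NamedLayer(size=64)


    model = ToyModel()
    target_model = deepcopy(model)

Because the two models hold differently named parameters, both can be built into the same program without sharing weights.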