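From an optimization point of view, how are derivatives computed for non-differentiable ops such as layers.elementwise_max and layers.clip, and how do these ops take part in gradient-based optimization?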
Created by: ydchen9
For example, take the code below, which comes from line 182 of this file: https://github.com/PaddlePaddle/PARL/blob/develop/parl/algorithms/fluid/ppo.py
pg_ratio = layers.exp(logprob - old_logprob)
clipped_pg_ratio = layers.clip(pg_ratio, 1 - self.epsilon,
                               1 + self.epsilon)
surrogate_loss = layers.elementwise_min(
    advantages * pg_ratio, advantages * clipped_pg_ratio)
loss = 0 - layers.reduce_mean(surrogate_loss)
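
To make the question concrete, here is a minimal pure-Python sketch of the usual convention for piecewise-linear ops like these: clip passes a gradient of 1 inside the interval and 0 outside, and an elementwise min routes the gradient to whichever input is smaller. This is an assumption about the general technique (picking a subgradient at the kinks), not a statement of how Paddle implements these ops internally. The helper names (`clip_grad`, `surrogate_grad`, `finite_diff`) are hypothetical and exist only for this illustration.

```python
def clip(x, lo, hi):
    """Scalar version of clipping x into [lo, hi]."""
    return max(lo, min(x, hi))


def clip_grad(x, lo, hi):
    # Assumed convention: derivative is 1 inside the interval, 0 outside.
    # At the two kinks the function is not differentiable, so this simply
    # picks one valid subgradient.
    return 1.0 if lo <= x <= hi else 0.0


def surrogate(ratio, adv, eps):
    """Scalar form of min(adv * ratio, adv * clip(ratio, 1 - eps, 1 + eps))."""
    return min(adv * ratio, adv * clip(ratio, 1 - eps, 1 + eps))


def surrogate_grad(ratio, adv, eps):
    """Derivative of surrogate() with respect to ratio under the conventions above."""
    unclipped = adv * ratio
    clipped = adv * clip(ratio, 1 - eps, 1 + eps)
    if unclipped <= clipped:
        # min selects the unclipped branch, whose derivative is adv.
        return adv
    # min selects the clipped branch, so the chain rule goes through clip.
    return adv * clip_grad(ratio, 1 - eps, 1 + eps)


def finite_diff(ratio, adv, eps, h=1e-6):
    """Central finite difference, used only as a numerical sanity check."""
    return (surrogate(ratio + h, adv, eps) - surrogate(ratio - h, adv, eps)) / (2 * h)


if __name__ == "__main__":
    eps = 0.2
    for ratio, adv in [(1.0, 2.0), (1.5, 2.0), (1.5, -2.0), (0.5, 2.0), (0.5, -2.0)]:
        print(ratio, adv, surrogate_grad(ratio, adv, eps), finite_diff(ratio, adv, eps))
```

Away from the kinks the function is locally linear, so the analytic value and the finite difference agree. The non-differentiable points form a set of measure zero, which is why a fixed subgradient choice there rarely matters for gradient descent in practice.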