The output place of the cast operator should be the same as its input place
Created by: QiJune
We create the global step variable on the CPU, and then add a cast operator.
def _decay_step_counter():
    # the first global step is zero in learning rate decay
    global_step = nn.autoincreased_step_counter(
        counter_name='@LR_DECAY_COUNTER@', begin=0, step=1)
    global_step = tensor.cast(global_step, 'float32')
    return global_step
If we run the program on a GPU, the cast operator gets a CUDADeviceContext and places its output on the GPU, even though its input lives on the CPU. This causes a segmentation fault.
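
The expected behavior can be sketched with a toy model. This is only an illustration, not Paddle code: `ToyTensor`, `toy_cast`, and the `context_place` parameter are hypothetical names made up for this example. The point it shows is that the output place is taken from the input tensor, not from the device context the operator runs under.

```python
from dataclasses import dataclass


# Hypothetical stand-ins for illustration only; these are not Paddle APIs.
@dataclass
class ToyTensor:
    data: list
    dtype: str
    place: str  # "cpu" or "gpu"


def toy_cast(x, dtype, context_place):
    """Cast x to dtype, keeping the result on the input's place.

    context_place is the place of the device context the operator runs
    under. It is deliberately ignored when choosing the output place.
    """
    converters = {"float32": float, "int64": int}
    convert = converters[dtype]
    return ToyTensor([convert(v) for v in x.data], dtype, x.place)


# A counter created on the CPU, cast while the program runs under a GPU context.
step = ToyTensor([0], "int64", "cpu")
step_f = toy_cast(step, "float32", context_place="gpu")
```

In this sketch `step_f` stays on the CPU, next to the counter it was cast from, regardless of which context ran the operator.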