Using loss scaling to improve model accuracy
Created by: honshj
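Due to the 64-bit fixed-point type used in MPC, the smallest representable positive number is 2^-16. As a result, gradients with magnitude below 2^-16 become zero during the backward pass, which degrades the accuracy of the resulting model. A technique such as loss scaling (https://arxiv.org/abs/1710.03740) can be used to alleviate this issue: the loss is multiplied by a constant factor before the backward pass so that small gradients stay above the representable threshold, and the gradients are divided by the same factor before the weight update.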
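
To make the underflow concrete, here is a minimal plain-Python sketch of the effect. It is not the project's MPC code: it assumes a fixed-point encoding with 16 fractional bits (consistent with the 2^-16 resolution above), truncation after multiplication, and a single chain-rule product standing in for the backward pass. All names (`encode`, `decode`, `fixed_mul`, `LOSS_SCALE`) are hypothetical, and the final unscaling is done in ordinary floating point purely for illustration.

```python
# Assumption: 16 fractional bits, so one fixed-point step equals 2^-16.
FRAC_BITS = 16
SCALE = 1 << FRAC_BITS


def encode(x: float) -> int:
    """Encode a real number as a fixed-point integer (truncates toward zero)."""
    return int(x * SCALE)


def decode(v: int) -> float:
    """Decode a fixed-point integer back to a float."""
    return v / SCALE


def fixed_mul(a: int, b: int) -> int:
    """Multiply two fixed-point values and rescale, truncating toward zero."""
    p = a * b
    return p >> FRAC_BITS if p >= 0 else -((-p) >> FRAC_BITS)


def backward(upstream: float, local: float) -> int:
    """One chain-rule step in fixed point: upstream gradient * local derivative."""
    return fixed_mul(encode(upstream), encode(local))


upstream_grad = 2.0 ** -10   # hypothetical dL/dy
local_deriv = 2.0 ** -10     # hypothetical dy/dw
LOSS_SCALE = 2.0 ** 8        # hypothetical constant scale factor

# Without scaling: the true gradient 2^-20 is below 2^-16 and truncates to zero.
plain = decode(backward(upstream_grad, local_deriv))

# With scaling: the upstream gradient is multiplied by LOSS_SCALE, so the
# product (2^-12) is representable; it is unscaled afterwards in floating point.
scaled = decode(backward(upstream_grad * LOSS_SCALE, local_deriv))
recovered = scaled / LOSS_SCALE

print(plain, recovered)
```

In this toy setting the unscaled gradient is lost entirely, while the scaled run recovers it. In a real MPC training loop the unscaling step would itself need care (for example, folding it into the learning rate), and too large a scale factor risks overflow, so the factor here should be read as a placeholder rather than a recommended value.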