Created by: sidgoyal78
This closes #4622 by adding two basic activation functions: logsigmoid and softshrink.
- Logsigmoid is defined as:
y = log(1 / (1 + exp(-x)))
However, to keep the computation numerically stable for large negative values of x, the well-known "log-sum-exp" trick (https://hips.seas.harvard.edu/blog/2013/01/09/computing-log-sum-exp/) is used: rewriting log(1 / (1 + exp(-x))) as -log(exp(0) + exp(-x)) and factoring out the maximum exponent keeps every argument of exp() non-positive, so nothing overflows.
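For reference, here is a minimal NumPy sketch of the stable formulation (the actual operator in this PR is implemented in C++; the function name and dtype here are illustrative):

```python
import numpy as np

def logsigmoid(x):
    # log(1 / (1 + exp(-x))) = -log(exp(0) + exp(-x)).
    # Factoring out m = max(0, -x) keeps every exponent non-positive,
    # so exp() never overflows for large negative x.
    x = np.asarray(x, dtype=np.float64)
    m = np.maximum(0.0, -x)
    return -(m + np.log(np.exp(-m) + np.exp(-x - m)))

# The naive form overflows at x = -1000; the stable form returns ~ -1000.
print(logsigmoid(-1000.0))  # -1000.0
print(logsigmoid(0.0))      # -0.6931... = -log(2)
```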
- Softshrink is defined as follows (for a non-negative lambda):

y = x - lambda,  if x > lambda
    x + lambda,  if x < -lambda
    0,           otherwise
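For completeness, a minimal NumPy sketch of the same piecewise rule (again illustrative, not the C++ kernel; the parameter name `lambd` and its default value are assumptions):

```python
import numpy as np

def softshrink(x, lambd=0.5):
    # Shrinks each element toward zero by lambd and zeroes out
    # everything inside the dead zone [-lambd, lambd].
    x = np.asarray(x, dtype=np.float64)
    return np.where(x > lambd, x - lambd,
                    np.where(x < -lambd, x + lambd, 0.0))

print(softshrink(np.array([-2.0, -0.3, 0.0, 0.3, 2.0])))
# [-1.5  0.   0.   0.   1.5]
```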