Created by: yu239-zz
I closed two old pull requests and combined them into this one.
- rotate_layer and flip_layer. Both operations are applied independently to each input vector (one row of the mini-batch matrix)
- added getMin and getMax for GpuMatrix
- bug fix for matrix transpose when memalloc is true (the matrix pointer needs to be created)
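
To illustrate what "performed on each input vector" means for the two layers, here is a minimal pure-Python sketch, not the code in this PR. It assumes each row of the mini-batch is a flattened height x width feature map in row-major order, that rotation is 90 degrees clockwise, and that the flip is horizontal; none of these details are stated above. The names `rotate_row`, `flip_row`, and `apply_per_row` are hypothetical.

```python
def rotate_row(row, height, width):
    """Rotate one flattened height x width map 90 degrees clockwise (assumed direction)."""
    assert len(row) == height * width
    out = [0] * (height * width)
    for r in range(height):
        for c in range(width):
            # Input (r, c) lands at (c, height - 1 - r) in the width x height output.
            out[c * height + (height - 1 - r)] = row[r * width + c]
    return out


def flip_row(row, height, width):
    """Mirror one flattened height x width map left to right (assumed axis)."""
    assert len(row) == height * width
    out = [0] * (height * width)
    for r in range(height):
        for c in range(width):
            out[r * width + (width - 1 - c)] = row[r * width + c]
    return out


def apply_per_row(batch, op, height, width):
    """Apply op to every row of the mini-batch independently."""
    return [op(row, height, width) for row in batch]


# Two samples, each a flattened 2 x 3 map.
batch = [[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12]]
rotated = apply_per_row(batch, rotate_row, 2, 3)
flipped = apply_per_row(batch, flip_row, 2, 3)
```

Each sample is transformed on its own, so the batch size is unchanged and no values move between rows.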