1. 27 Nov, 2018 · 4 commits
    • polish code, test=develop · c3c3c0b3
      JiabinYang committed
    • Make NCE_OP more efficient and support SelectedRows (#14469) · 56a4912b
      tangwei12 committed
      * Fix truncated normal.
      
      * Fix.
      
      * Make nce support more distribution.
      
      * Fix API.spec.
      
      * Fix python API.
      
      * Fix.
      test=develop
      
      * Fix API.spec
      test=develop
      
      * Fix sampler.
      
      * Fix order of arguments in python API.
      test=develop
      
      * NCE add selectedrows support
      
      * NCE update weighted sampling
      
      * fix bugs in nce_op, and assign_value_op optimized
      
      * fix bugs in nce_op, revert assign_value_op
      
      * nce_op optimize
      
      * nce_op optimize
      
      * nce_op optimize
      
      * add selectedRows test later
      
      test=develop
      
      * add selectedRows supported
      
      * add selectedRows supported
      
      test=develop
      
      * add selectedRows supported
      
      * add nce selectedRows supported, test=develop
      
      * add nce selectedRows supported
      
      * add nce selectedRows supported, test=develop
      
      * fix height in nce, test=develop
      
      * add ut
      
      * add ut, test=develop
      
      * make AutoGrownIndex inline
      test=develop
      
      * fix tiny error, test=develop
    • minor fix · 38715e6f
      peizhilin committed
    • refine code and add none bias ut, test=develop · b10df8bc
      JiabinYang committed
  2. 26 Nov, 2018 · 3 commits
  3. 23 Nov, 2018 · 4 commits
  4. 22 Nov, 2018 · 5 commits
    • Refine cublas to support CUBLAS_TENSOR_OP_MATH (#13929) · 00b9e9a1
      chengduo committed
      * refine cublas
      test=develop
      
      * code refine
      
      * refine cublas
      
      * add GEMM_EX
      
      * add enable_cublas_tensor_op_math doc and add cublasCall
      test=develop
      
      * fix CublasCall for cuda version
      test=develop
      
      * fix error
      test=develop
      
      * fix GEMM_EX to be compatible with gcc 4.8
      test=develop
      
      * add GEMM_EX
      test=develop
      
      * to be compatible with gcc 4.8
      test=develop
    • fix unit test cases · 7c8c9dc9
      peizhilin committed
    • enable peephole jitcode · 0c5ed5f6
      tensor-tang committed
      test=develop
    • init gru jitcode and fix lstm jitcode · e3b61cf5
      tensor-tang committed
      test=develop
    • Windows/online (#14474) · d9a1f3e5
      wopeizl committed
      * add recordio support
      
      * disable openblas multi-threading on windows since it is not supported
      adjust the python script
      
      * code style
      
      * code style
      test=develop
      
      * add create_recordio_file_reader back
      
      * fix code style
      test=develop
      
      * fix the gtest.cmake on windows
      
      * fix cc_test on windows
      
      * fix the win build
      test=develop
      
      * remove fused compile support on windows
      test=develop
      
      * add the jit support
      test=develop
      
      * add the jit support, test=develop
      
      * add the jit support, test=develop
      
      * add the jit back
      fix compile error on windows
      
      * rollback test=develop
      
      * test case fix
      
      * disable DSO by default on windows
      
      * exclude warpctc_op on windows
      
      * exclude the dynload_warpctc out on windows
      test=develop
      
      * fix the scripts error
      test=develop
      
      * disable avx on windows by default
      test=develop
      
      * re-organize the cmake file
      
      * disable mkl on windows by default
      
      * add warp_ctc back
      
      * fix the dependency
      
      * fix the dependency
      
      * fix the build issue on windows
      
      * remove unsupported flag on windows
      
      * code style
      
      * code style
      test=develop
      
      * fix issue
      
      * add profiler, parallel_executor back
      
      * clean up the pre-definitions on windows
      
      * fix build issue
      
      * test=develop
  5. 21 Nov, 2018 · 3 commits
  6. 20 Nov, 2018 · 2 commits
  7. 19 Nov, 2018 · 1 commit
    • Optimize the layer_norm operator with AVX intrinsic function (#14417) · f4c869d8
      Yihua Xu committed
      * Optimize layer_norm operator with AVX intrinsic functions
      
      * Revert the wrong modifications
      
      * Implement the jit kernel for layer_norm operator
      
      * Add math header file to fix the compile issue (test=develop)
      
      * Add math header file to fix the compile issue (test=develop)
      
      * Fixed the intrinsic header file issue (test=develop)
      
      * Fix the conflicts (test=develop)
      
      * Revert for CUDA compiler (test=develop)
      
      * Fixed the CUDA dependency (test=develop)
      
      * Fix the macro issues (test=develop)
  8. 18 Nov, 2018 · 3 commits
  9. 17 Nov, 2018 · 6 commits
  10. 16 Nov, 2018 · 9 commits