1. 26 Oct, 2020 3 commits
  2. 23 Oct, 2020 1 commit
    • H
      Fix test_parallel_executor_test_while_train Random Failure by Decreasing GPU Usage (#28213) · a1e7fd4a
      Committed by Huihuang Zheng
      Recently, test_parallel_executor_test_while_train has been failing randomly on CI. Every CI log showed that either NCCL initialization or cusolver initialization failed. From what I found online, such failures are usually caused by a shortage of GPU resources. Those APIs call CUDA APIs directly, so the allocator should not be the problem. Something in PaddlePaddle may be increasing GPU usage.
      
      However, I ran this test 1000 times on my machine and on the CI machine, and neither of them reproduced the random failure. The cause may be something specific to the test environment.
      
      To verify my assumption that something in PaddlePaddle increases GPU usage, and to fix this CI failure, I decreased the batch_size to see whether the random failure disappears in the test environment.
      a1e7fd4a
  3. 22 Oct, 2020 5 commits
  4. 21 Oct, 2020 8 commits
  5. 20 Oct, 2020 10 commits
  6. 19 Oct, 2020 13 commits
    • Y
      xpu adam op (#28031) · 6f0c3d1f
      Committed by yinhaofeng
      * lookup_table_xpu op report errors;test=kunlun
      
      * add adam xpu op;test=kunlun
      
      * reset lookup
      
      * change adam wrong;test=kunlun
      6f0c3d1f
    • T
      Add xpu transpose2 op.test=kunlun (#28086) · a5c95cd5
      Committed by TeslaZhao
      a5c95cd5
    • K
      hapi/model step learning rate on batch end. (#27991) · a5f65d51
      Committed by Kaipeng Deng
      * hapi/model step learning rate on batch end. test=develop
      a5f65d51
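To illustrate what stepping the learning rate on batch end means in practice, here is a minimal sketch in plain Python. `StepDecay` and `train` are hypothetical names made up for this illustration; they are not the hapi or PaddlePaddle API, and the loop only records the rate each batch would use.

```python
class StepDecay:
    """Hypothetical scheduler: scale the rate by gamma every `step_size` calls to step()."""

    def __init__(self, base_lr, step_size, gamma=0.1):
        self.base_lr = base_lr
        self.step_size = step_size
        self.gamma = gamma
        self.last_step = 0

    def get_lr(self):
        # Rate for the current position in the schedule.
        return self.base_lr * self.gamma ** (self.last_step // self.step_size)

    def step(self):
        # Advance the schedule by one unit (a batch or an epoch, depending on the caller).
        self.last_step += 1


def train(num_epochs, batches_per_epoch, scheduler, step_on_batch_end=True):
    """Return the learning rate used for every batch (no real training happens here)."""
    used_rates = []
    for _ in range(num_epochs):
        for _ in range(batches_per_epoch):
            used_rates.append(scheduler.get_lr())
            if step_on_batch_end:
                scheduler.step()  # schedule advances once per batch
        if not step_on_batch_end:
            scheduler.step()  # schedule advances once per epoch
    return used_rates
```

With `step_size=2`, stepping per batch halves the rate after two batches, while stepping per epoch would only halve it after two epochs, so the same scheduler settings decay much faster under per-batch stepping.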
    • L
      Fix diag OP bug on Windows Python3.8 · c8d32c8c
      Committed by LutaoChu
      Fix diag OP bug on Windows Python3.8, remove the std::min
      c8d32c8c
    • M
      fleet support paddle.optimizer (#28026) · 55098b97
      Committed by MRXLT
      fleet support paddle.optimizer
      
      * bug fix
      
      * fix fleet_base
      
      * bug fix
      
      * fix coverage
      55098b97
    • L
      add doc for ReduceOp (#28051) · 5bb348a1
      Committed by lilong12
      * add doc, test=document_fix
      5bb348a1
    • Z
      fix optimizer init (#27995) · 086b92df
      Committed by Zhou Wei
      086b92df
    • L
      [API 2.0: doc] transfer from paddle.fluid.layers.assign() into creation.py (#27999) · e21b13fb
      Committed by liuyuhui
      * transfer from paddle.fluid.layers.assign() into creation.py,test=develop
      
      * fix ut fail,add support for paddle.assign,test=develop
      
      * fix,test=develop
      
      * fix UT coverage,test=coverage
      
      * fix UT fail,test=coverage
      
      * fix doc,test=develop
      e21b13fb
    • H
      Allclose op (#27891) · d4668938
      Committed by huangxu96
      * Still has bugs.
      
      * Fixed allclose_op bug, which cannot deal with some cases of fp64 inputs.
      
      * improved CUDA kernel performance.
      
      * Changed CUDA code.
      
      * Fixed a bug in the CUDA kernel, which could not handle large-dimension input, and added a unit test for it.
      
      * Add a test case for float32 input.
      d4668938
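For context on what an allclose check computes, here is a minimal pure-Python sketch. It assumes the NumPy-style definition, in which two values are close when |x - y| <= atol + rtol * |y|. The function name, the default tolerances, and the handling of NaN and infinity are illustrative assumptions, not the kernel added in this commit.

```python
import math


def allclose(xs, ys, rtol=1e-05, atol=1e-08, equal_nan=False):
    """Return True if every pair (x, y) satisfies |x - y| <= atol + rtol * |y|."""
    if len(xs) != len(ys):
        raise ValueError("inputs must have the same length")
    for x, y in zip(xs, ys):
        if math.isnan(x) or math.isnan(y):
            # NaN compares close only to NaN, and only when equal_nan is set.
            if equal_nan and math.isnan(x) and math.isnan(y):
                continue
            return False
        if x == y:
            # Exact matches, including infinities of the same sign.
            continue
        if math.isinf(x) or math.isinf(y):
            # An infinity is never close to a different value.
            return False
        if abs(x - y) > atol + rtol * abs(y):
            return False
    return True
```

Python floats are double precision, so this sketch corresponds to the fp64 case. Note that the tolerance scales with `|y|`, so the check is not symmetric in its two arguments.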
    • P
      Fix error message of multinomial op (#27946) · 975bd887
      Committed by pangyoki
      * fix multinomial doc
      
      * fix multinomial error message
      
      * little doc change
      
      * fix Categorical class doc
      
      * optimize format of error message
      
      * fix CPU Kernel error message format
      
      * fix isinf and isnan error in WindowsOPENBLAS CI
      
      * delete inf and nan
      
      * add manual_seed in sample code
      
      * little error message change
      
      * change error message to InvalidArgument
      
      * add full point for error message and add manual_seed in CPU environment
      975bd887
    • P
      Add truncated_gaussian_random XPU kernel (#27861) · 4c5b779a
      Committed by pangyoki
      * Add truncated_gaussian_random_op XPU kernel
      
      * Add truncated_gaussian_random_op XPU kernel, test=kunlun
      
      * little change, test=kunlun
      
      * change boost_get to BOOST_GET_CONST
      
      * change boost_get to BOOST_GET_CONST, test=kunlun
      
      * little change, test=kunlun
      
      * use Generator to generate random number and optimize format, test=kunlun
      
      * little change, test=kunlun
      
      * add TODO, test=kunlun
      4c5b779a
    • P
      Add gaussian_random XPU kernels (#27853) · 5b8e5001
      Committed by pangyoki
      * Add gaussian_random XPU kernels
      
      * commit kunlun, test=kunlun
      
      * new version, test=kunlun
      
      * change boost_get to BOOST_GET_CONST, test=kunlun
      
      * use Generator to generate random number and optimize format, test=kunlun
      
      * add TODO, test=kunlun
      5b8e5001
    • P
      Add uniform_random XPU kernel (#27846) · 74ce0397
      Committed by pangyoki
      * support uniform_random op on Baidu Kunlun
      
      * change dtype of attr shape from int to int64_t
      
      * kunlun ci, test=kunlun
      
      * new version, test=kunlun
      
      * change boost_get to BOOST_GET_CONST
      
      * change boost_get to BOOST_GET_CONST, test=kunlun
      
      * use Generator to generate random number and optimize format
      
      * run Kunlun CI, test=kunlun
      
      * add TODO, test=kunlun
      74ce0397