- 09 Apr, 2023 1 commit
-
-
Committed by Yiqun Liu
* Register exp/expm1/logit bf16 activation op kernels (#48702)
* register more bf16 ops
* update to register coresponding backward ops
* Addition of bf16 type support for Compare OP (#46413)
* first commit
* clarify the quotes
* change code style format
* support bfloat16
* add bfloat16 support for more ops (#48272)
* [Bfloat16]register bfloat16 datatype for squared l2 norm (#50908)
* Sync the pull request #51903.
* Add some header files back.
* modify cmake file for cuda11.8 compile (#49020)
* modify cmake file for cuda11.8 compile
* add op_library(fused_embedding_eltwise_layernorm_op DEPS bert_encoder_functor)
* Fix compling error.
* Cherry-pick pull request #51396.
---------
Co-authored-by: sneaxiy <32832641+sneaxiy@users.noreply.github.com>
Co-authored-by: limingshu <61349199+JamesLim-sy@users.noreply.github.com>
Co-authored-by: Shaojie WANG <wsjmessi@163.com>
Co-authored-by: zqw_1997 <118182234+zhengqiwen1997@users.noreply.github.com>
-
- 29 Jul, 2022 1 commit
-
-
Committed by Leo Chen
* remove cudaDeviceContext
* remove more template
* fix rocm compile
-
- 26 Jun, 2022 1 commit
-
-
Committed by Sing_chan
-
- 01 Apr, 2022 1 commit
-
-
Committed by Chen Weihang
* add cross_entropy_with_softmax phi kernel
* remove softmax_with_cross_entropy kernel
* add softmax_with_cross_entropy grad kernel
* remove original op kernel
* refine cross entropy impl
* fix pointer error
* revert kernel cu change
* fix xpu failed
* fix cinn failed
* fix npu failed
* add forward sig
* add check_nan_inf for pt kernel
* remove repeat cmake item
* fix unittest error
-
- 15 Feb, 2022 1 commit
-
-
Committed by Aurelius84
* #1 migrate dist-related type()-> dtype()
* move datatype function from pten -> fluid/framework
* change type() in imperative into convert(dtype())
* modify xx_tensor->type into xx_tensor->dtype
* change the set_type interface and the caller
* modify xx_tensor.type into xx_tensor.dtype
* fix mutable_data(place, dtype())
* change caller of mutable_data in pten and distributed
* change the caller of mutable_data in fluid/framework
* change the caller of mutable_data in imperative directory
* mutable_data: inference
* update the call of mutable_data
* transfer MakePenScalarArray MakePtenScalar ResetHolderWithType
* pass the compile. the next step is remove VarType in Pten
* fix all and remove VarType from pten. success in linux. Next task is other platform
* fix conflict with develop
* fix compiled error
* Fix reset conversion
* fix conflict
* fix compiled problem
* fix typo
* Fix << in tensor_utils.cc
* fix type->dtype
* fix unittest
* fix tensor init constructor
* fix DataTypeSize for BFloat16
* fix code style
* fix npu compiled error
* fix npu
* compile npu sucessfully
* fix conflict
* fix conflict
Co-authored-by: xiongkun <xiongkun03@baidu.com>
-
- 09 Feb, 2022 1 commit
-
-
Committed by sneaxiy
-
- 03 Dec, 2021 1 commit
-
-
Committed by ronnywang
* refine structure for cuda and rocm
* update
* update
* update
* update
-
- 02 Apr, 2021 1 commit
-
-
Committed by ronnywang
-
- 14 Sep, 2020 1 commit
-
-
Committed by Zhong Hui
Enhance the error messages for files in operators/math
-
- 11 Jul, 2020 1 commit
-
-
Committed by Chen Weihang
* fix softmax_with_cross_entropy cuda kernel overflow bug, test=develop
* replace old macro & for condition, test=develop
* polish details, test=develop
-
- 05 Sep, 2019 1 commit
-
-
Committed by Tao Luo
* remove assert.h
* change PADDLE_ASSERT_MSG to PADDLE_ENFORCE test=develop
* fix tensorrt paddle_enforce test=develop
-
- 03 Sep, 2019 1 commit
-
-
Committed by Tao Luo
* remove unused PADDLE_ASSERT(_IS_NOT_ERROR)
* replace PADDLE_ASSERT with PADDLE_ASSERT_MSG test=develop
-
- 07 May, 2019 1 commit
-
-
Committed by Kaipeng Deng
* add attr axis infershape. test=develop
* add CUDA kernel. test=develop
* fix unittest. test=develop
* fix unittest for soft_label. test=develop
* fix fp16 unittest. test=develop
* remove comment code. test=develop
* refine test for axis. test=develop
* add python api. test=develop
* fix doc. test=develop
* fix fp16 unittest. test=develop
* fix ngraph test. test=develop
* fix ENFORCE for test_imperative_transformer. test=develop
* fit for ngraph test. test=develop
* fix after rebase develop. test=develop
* fix doc. test=develop
* fix API.spec. test=develop
* fix test_layers. test=develop
* fix format. test=develop
-
- 14 Mar, 2019 2 commits
-
-
Committed by sneaxiy
test=develop
-
Committed by Zeng Jinle
test=develop
-
- 12 Mar, 2019 1 commit
-
-
Committed by sneaxiy
test=develop
-
- 07 Nov, 2018 2 commits
-
-
Committed by chengduo
test=develop
-
Committed by chengduo
* add fp16 backward support test=develop
* add sum_op fp16 test
* disable test_dist_save_load test=develop
* add check_grad for sum
* add unit test for softmax_grad fp16 test=develop
* add scale_op unit test
* add mul_grad_op unit test for fp16
* add cross_entropy_grad and eman_grad unit test for fp16 test=develop
* fix cross_entropy unit test
* add pool2d fp16 unit test
* refine conv2d fp16 unit test test=develop
* refine activation unit test test=develop
* fix ci test=develop
* follow zhihong's comment, copy from https://github.com/PaddlePaddle/Paddle/pull/12796 test=develop
-
- 11 Sep, 2018 1 commit
-
-
Committed by Bai Yifan
* add ignore index
* update api.spec
* enhance softmax_with_cross_entropy
-
- 17 Aug, 2018 1 commit
-
- 16 Aug, 2018 1 commit
-
-
Committed by dzhwinter
* "cherry picked operators changes"
* "remove duplicated code"
* "add constant setter"
* "add get expected kernel"
* "fix ci"
* "add fill constant"
-
- 03 May, 2018 1 commit
-
-
Committed by chengduo
* fix __shfl_down_sync_ of cross_entropy
* use reduceSum
* "fix ci"
-
- 30 Apr, 2018 1 commit
-
-
Committed by dzhwinter
* "re-commit "
* "picked up"
* "fix ci"
* "fix pdb hang up issue in cuda 9"
-
- 28 Apr, 2018 1 commit
-
-
Committed by Abhinav Arora
* Fix CPPLint errors
* Fix CPPLint errors in sequence2batch
* Fix compilation
* Fix LSTM op and GRU op
* Fix LSTMP op
* Fix more cpplint errors in operators/math
* Address Code review feedback
-
- 12 Feb, 2018 1 commit
-
-
Committed by qingqing01
-
- 10 Feb, 2018 2 commits
- 26 Dec, 2017 1 commit
-
-
Committed by Luo Tao
-
- 12 Dec, 2017 1 commit
-
-
Committed by QI JUN
There are mainly following fixes: - take `DeviceContext` as the template parameter of math functors and OpKernel instead of `Place` - remove `eigen_device` interface in base class `DeviceContext` - remove `GetEigenDevice` interface in `ExecutionContext` and base class `DeviceContext` - remove unused `platform::EigenDeviceConverter` - rename `REGISTER_OP_GPU_KERNEL` to `REGISTER_OP_CUDA_KERNEL` - rename `USE_GPU_ONLY_OP` to `USE_CUDA_ONLY_OP`
-
- 29 Oct, 2017 1 commit
-
-
Committed by QI JUN
* add sparse support for sum op
* typo fix
* fix gpu build error
* fix unittest error
* typo fix
* infer var type and shape in op_test
* follow comments
* fix build error
* bypass some unittests depend on NetOp
* support sparse output for lookup table grad op
* refine codes
* fix gpu build error
* fix lookup table grad gpu kernel
* fix ci
* fix ci
* fix ci
* fix bug in lookup_table_grad op
* fix bug in test_word2vec
* register double kernel for some operators
* set is_sparse=True in test_word2vec
* fix lookup table grad op CUDA kernel bug
* disable test_modified_huber_loss_op temporarily
* disable test_lstm_unit_op temporarily
-
- 26 Oct, 2017 1 commit
-
-
Committed by Yu Yang
* Cross Entropy Wrong
* Fix XE
* Polish gradient check for xe
* Fix compile
-
- 12 Oct, 2017 1 commit
-
-
Committed by dangqingqing
-
- 29 Sep, 2017 2 commits
- 27 Sep, 2017 1 commit
-
-
Committed by caoying03
-