- 31 5月, 2022 1 次提交
-
-
由 Li Min 提交于
* replace dropout_is_test with is_test. * improve atol on a100.
-
- 30 5月, 2022 2 次提交
- 27 5月, 2022 1 次提交
-
-
由 zyfncg 提交于
* refactor the optional tensor * remove optiona<MetaTensor> in InferMeta * fix bug * fix optional<vector<Tensor>> * fix bug * fix rmsprop * fix amp of eager_gen * polish code * fix deleted code * fix merge conflict * polish code * remove is_nullopt_ * fix merge conflict * fix merge conflict
-
- 25 5月, 2022 1 次提交
-
-
由 Leo Chen 提交于
* fix maybe-uninitialized warning * fix compile * fix xpu compile * fix npu compile * fix infer compile * fix compile * fix compile
-
- 24 5月, 2022 1 次提交
-
-
由 YuanRisheng 提交于
* move grad_add * fix unittest bugs * fix compile bugs
-
- 20 5月, 2022 1 次提交
-
-
由 WangXi 提交于
-
- 17 5月, 2022 1 次提交
-
-
由 zhupengyang 提交于
-
- 16 5月, 2022 2 次提交
-
-
由 niuliling123 提交于
-
由 WangXi 提交于
-
- 12 5月, 2022 1 次提交
-
-
由 Shuangchi He 提交于
-
- 06 5月, 2022 1 次提交
-
-
由 Zhang Zheng 提交于
-
- 02 5月, 2022 1 次提交
-
-
由 Zhang Zheng 提交于
* Fix test_cudnn_norm_conv and test_cudnn_bn_add_relu in CUDA11.2 * no throw in V100 for some cases
-
- 28 4月, 2022 3 次提交
-
-
由 Zhang Zheng 提交于
* Suppport more scenes for fused_fast_ln * fix
-
由 WangXi 提交于
-
由 WangXi 提交于
-
- 26 4月, 2022 1 次提交
-
-
由 WangXi 提交于
-
- 22 4月, 2022 1 次提交
-
-
由 Ming-Xu Huang 提交于
* Fix leading dimension setting error in fused_gemm_epilogue_grad_op. * Add dyload to cuBlasLt functions. * Added cublasLtMatmulAlgoGetHeuristic to improve performance. * Added FLAGS_cublaslt_exhaustive_search_times to cublasLt epilogue * Added UTs to FLAGS_cublaslt_exhaustive_search_times * Added warmup runs in algo searching of Gemm epilogue. * Update copyright and documents. * Fixed error handling.
-
- 19 4月, 2022 1 次提交
-
-
由 WangXi 提交于
-
- 16 4月, 2022 1 次提交
-
-
由 王明冬 提交于
-
- 09 4月, 2022 1 次提交
-
-
由 limingshu 提交于
* Using the maximum workspace_size of all alogirhms to limit the workspace size in exhaustive search mode. * Use the system cudaMalloc and cudaFree to allocate workspace during searching. * Enable switch of two kind of workspace setting methods. Co-authored-by: NLiu Yiqun <liuyiqun01@baidu.com>
-
- 28 3月, 2022 1 次提交
-
-
由 danleifeng 提交于
* add fused_seqpool_cvm op;test=develop
-
- 19 3月, 2022 1 次提交
-
-
由 hong 提交于
* add infer meta; test=develop * add histogram infer meta; test=develop * fix unitest bug; test=develop * format; test=develop * format; test=develop * bn not use new infer meta; test=develop * add infer meta; test=develop * fixbug; test=develop * fix bug; * recover unitest; test=develop
-
- 17 3月, 2022 1 次提交
-
-
由 hong 提交于
* update * fix bugs; test=develop * update; test=develop * fix test compile error; test=develop * fix cpu compile error; test=develop * fix test error; test=develo * fix layer_norm_op plugin error; test=develop * fix error; test=develop * fix test bug; test=develop * update; test=develop * polish code; test=develop * fix bugs; test=develop * remove unused depency; test=develop * polish code; test=develop
-
- 11 3月, 2022 2 次提交
-
-
由 Chen Weihang 提交于
* remove needless deps in unittests * add gpu marco * fix other unittests * fix kernel name error * fix test_prepare_op * fix failed dygraph unittests * fix gpu failed tests * fix cinn test failed * fix cinn test failed * fix dropout tests
-
由 Yuang Liu 提交于
-
- 10 3月, 2022 1 次提交
-
-
由 hong 提交于
* move dropout to phi; test=develop * fix xpu, npu compile error; test=develop
-
- 09 3月, 2022 1 次提交
-
-
由 WangXi 提交于
-
- 08 3月, 2022 1 次提交
-
-
由 Zhang Zheng 提交于
* Add throw for norm_conv when platform is not supported * fix format
-
- 07 3月, 2022 1 次提交
-
-
由 Ming-Xu Huang 提交于
* Added cuBlasLtHandle_t to device context. * Added fused_gemm_epilogue op. 1. Added fused_gemm_epilogue op to leverage cuBlastLt Epilogue. 2. Support fusion Act(X*Y + bias), X'dims >=2 and Y'dims shoule be 2. 2. Act currently only be supported ReLU. (Will add GeLU in the future). * Added UT to fused_gemm_epilogue op. * Added LinearAct Pattern 1. Added LinearAct into graph_pattern_detector.* to define (2.)'s pattern. 2. LinearAct is used to detect act(element_add(matmul_v2(x, w), bias)). 3. act currently only support ReLU (Will support GeLU in the future). * Added FuseGemmEpiloguePass 1, Added FuseGemmEpiloguePass to handle nn.Linear + Act{ReLU} fusion (GeLU will be supported in the future). 2. Only support matmul_v2 from nn.Linear. * Added pybind to BuildStrageter.fuse_gemm_epilogue_. * Added UT for fuse_gemm_epilogue_pass. * GeLU support and EpilogueSingleton 1. Added GeLU support to fused_gemm_epilogue op. 2. Added EpilogueSingleton to cache auxiliary pointer. 3. Added related UTs. * Rename cublaslt_epilogue_opto gemm_epilogue_op.*. * Added both train and infer pattern to LinearAct. 1. Added support of fwd graph with grap_ops linking to LinearAct. 2. Added related changes to fuse_gemm_epilogue_pass for above modification. * Changed CUDA requirement from 11.4 to 11.6 for fuse_gemm_epilogue_pass. * Added identity activation support to gemm_epilogue_op. * Added Linear Fusion (matmul_v2 + ele_add) 1. Added matmul_v2 + ele_add pattern to LinearActPattern. 2. Added matmul_v2 + ele_add support to fuse_gemm_epilogue_pass. * Rename gemm_epilogue_op.* to fused_gemm_epilogue_op.* * Add fused_gemm_epilogue_grad op. 1. Added fused_gemm_epilogue_grad to support backward epilogue fusion. * Add UTs to fused_gemm_epilogue_grad_op. * Change attribute name in fused_gemm_epilogue_grad_op for clearing. * Allow DX and DBias be dispensable to fused_gemm_epilogue_grad op. * Added ElementwiseAdd+Matmul+Act graph pattern detection. * Fuse backward of Linear( Act(x)) 1. Added backward fusion pass to Linear( Act(x)). 2. Added backward fusion pass to Linear(x). * Added UTs to backward fusion of Linear(Act(x)). * Complete document of arguments to fused_gemm_epilogue_op. * Made arguments of some functions pass by reference. * Modify code with review comments. 1. Made arguments of some function pass by reference. 2. Removed redundant code. 3. Followed Google code style to change code. * Made 'const' code style be consistent * Fixed random seed of python UTs. * Set Compiling constrains to cuBlasLt 1. Require CUDA 11.6+ 2. Remove fuse_gemm_epilogue related tests when CUDA < 11.6. * Code Reivew from Paddle 1. Changed arguments name is_first_gemm to without_x_gradient for clearing. 2. Applied PADDLE_THROW in fused_gemm_epilogue_op. * Remove EpilogueSingleton 1. Applied ReserveSpace to replace Epilogue for passing auxiliary pointers between FWD and BWD. * Fix a logical error and enhance UTs. 1. Added act op count checking in UTs. 2. Fix issue to fuse backward or ReLU(Linear(X)). 3. TODO: solve GELU fusion issues. * Fix Linear and GeLU fusion issues. 1. Modified graph_detech_pattern to fit with both linear wiht gelu or relu. 2. Modified data range in Uts to allow negative values. * Removed fused_gemm_epilogue_op.h. * Rename namespace pten to phi. * Rename name of arguments in fused_gemm_epilogue_op 1. bias -> Bias. 2. out -> Out. 3. reserve_space -> ReserveSpace. * Change EpiloguePassActivationCache as local variable. 1. Removed singleton in EpiloguePassActivationCache. 2. Made EpiloguePassActivationCache as an argument to each pass functions.
-
- 04 3月, 2022 4 次提交
-
-
由 Feiyu Chan 提交于
move cpu_vec.h to phi/kernels/funcs.
-
由 Leo Chen 提交于
* clean distribution_helper, index_impl, aligned_vector code in fluid * fix conflicts
-
由 chentianyu03 提交于
* move reduce gpu impl funcs into pten/kernels/funcs * change reduce header name and namespace * fix spell word error * change mutable_data to dev_ctx.Alloc * modify place to devcontex * format code style * fix build error * fix build error * fix conflict
-
由 hong 提交于
* move conv to pten * move conv to pten; test=develop * fix bug; * add conv cudnn impl; test=develop * update * update operator; test=develop * fix bug; test=develop * move operator and prepared_operator to develop; test=develop * resolve conflict; test=develop * remove useless code;test=develop * add depency ; test=develop * fix bug; * add sig.cc ; test=develop * fix use_op error; test=develop * fix bug; test=develop * fix bug; test=develop * add conv3d register; test=develop * fix star gan and conv_nn_grad test failed; test=develop * add header; test=develop * manul to recover to develop; * resolve confilct; test=develop * remove useless code * fix bug; * remove conv2d_cudnn; test=develop * fix bugs; test=develop * fix cpu rocm compile bugs; test=develop * fix blas error; test=develop * fix compile bug; test=develop * fix windows compile error; test=develop * fix windows error; test=develop * resolve confilct; test=develop
-
- 03 3月, 2022 2 次提交
-
-
由 xiongkun 提交于
* add pad forward * fix error * transfer pad and pass the test_pad_op
-
由 hong 提交于
* add bn cpu version; test=develop * move batch norm to pten * move batch norm to pten; test=develop * fix bug; test=develop * fix func::tranpose depend bug; test=develop * fix compile bugs; test=develop * fix use_op batch_norm bug; test=develop * fix cudnn bn add relu test; test=develop * fix pten context build and double grad bug; test= develop * remve useless code; test=develop * add batch norm gpu fp16 support; test=develop * fix test bn op bug; test=develop * remove output dtype set; test=develop * fix bug; test=develop * fix bug; test=develop * fix applay pass to program bug; test=develop * revert to develop; test=develop * fix rocm bug; test=develop * revert operator to develop; test=develop * fix pre_commit; test=develop * fix statci check error; test=develop * resolve conflict; test=develop * ana batch norm bug; * revert batch norm op * resolve conlict * fix nan inf and speed bug; test=develop * fix bug; test=develop * fix error; test=develop * test expand op; test=develop * fix bug; test=develop * resolve confilct * resolve confilct; test=develop * polish code; test=develop * polish code; test=develop * change mutable data to ctx alloc; test=develop * make format same with ci; test=develop * fix format error with ci; test=develop
-
- 02 3月, 2022 1 次提交
-
-
由 Feiyu Chan 提交于
* move sequence2batch * move lstm and gru * Add phi/kernels directory into exclusion to stop using hipcc to compile non .cu files in it.
-
- 28 2月, 2022 1 次提交
-
-
由 Chen Weihang 提交于
* rename pten_utils to phi_utils * rename pten_utils target * rename Pten to Phi * replace pten with phi * resolve conflict
-
- 25 2月, 2022 1 次提交
-
-
由 Chen Weihang 提交于
* support cudnn kernel moving * polish cmake rules * add unittest for coverage * remove orig kernel * remove softmax cudnn kernel * fix softmax test failed * fix npu func error * resolve conflict * rename gpu dnn kernels * fix name rule error * fix compile error * update fp16 namespace
-
- 22 2月, 2022 1 次提交
-
-
由 xiongkun 提交于
* change Vector to std::vector and provide MixVector class as a helper wrapper class * solve the multi-gpu hang problem * remove the duplicate template instantialize * Copy vector to cpu * add CopyToCPU * xxx * final version: fix the problem of all reduce * remove mixvector dependence * fix * merge * fix code * fix by CI
-