- 26 Dec, 2021, 1 commit
-
-
Committed by Zhanlue Yang
* Replaced pten::LoD with paddle::framework::LoD * Overrode CPUVector with CUDAVector * Refactored paddle::framework::Vector
-
- 24 Dec, 2021, 8 commits
-
-
Committed by seemingwang
* graph engine demo * upload unsaved changes * fix dependency error * fix shard_num problem * py client * remove lock and graph-type * add load direct graph * add load direct graph * add load direct graph * batch random_sample * batch_sample_k * fix num_nodes size * batch brpc * batch brpc * add test * add test * add load_nodes; change add_node function * change sample return type to pair * resolve conflict * resolved conflict * resolved conflict * separate server and client * merge pair type * fix * resolved conflict * fixed segment fault; high-level VLOG for load edges and load nodes * random_sample return 0 * rm useless loop * test:load edge * fix ret -1 * test: rm sample * rm sample * random_sample return future * random_sample return int * test fake node * fixed here * memory leak * remove test code * fix return problem * add common_graph_table * random sample node &test & change data-structure from linkedList to vector * add common_graph_table * sample with srand * add node_types * optimize nodes sample * recover test * random sample * destruct weighted sampler * GraphEdgeBlob * WeightedGraphEdgeBlob to GraphEdgeBlob * WeightedGraphEdgeBlob to GraphEdgeBlob * pybind sample nodes api * pull nodes with step * fixed pull_graph_list bug; add test for pull_graph_list by step * add graph table;name * add graph table;name * add pybind * add pybind * add FeatureNode * add FeatureNode * add FeatureNode Serialize * add FeatureNode Serialize * get_feat_node * avoid local rpc * fix get_node_feat * fix get_node_feat * remove log * get_node_feat return py:bytes * merge develop with graph_engine * fix threadpool.h head * fix * fix typo * resolve conflict * fix conflict * recover lost content * fix pybind of FeatureNode * recover cmake * recover tools * resolve conflict * resolve linking problem * code style * change test_server port * fix code problems * remove shard_num config * remove redundant threads * optimize start server * remove logs * fix code problems by reviewers' suggestions * move graph files into a folder * code style change * remove graph operations from base table * optimize get_feat function of graph engine * fix long long count problem * remove redundant graph files * remove unused shell * recover dropout_op_pass.h * fix potential stack overflow when request number is too large & node add & node clear & node remove * when sample k is larger than neighbor num, return directly * using random seed generator of paddle to speed up * fix bug of random sample k * fix code style * fix code style * add remove graph to fleet_py.cc * fix blocking_queue problem * fix style * fix * recover capacity check * add remove graph node; add set_feature * add remove graph node; add set_feature * add remove graph node; add set_feature * add remove graph node; add set_feature * fix distributed op combining problems * optimize * remove logs * fix MultiSlotDataGenerator error * cache for graph engine * fix type compare error * more test&fix thread terminating problem * remove header * change time interval of shrink * use cache when sample nodes * remove unused function * change unique_ptr to shared_ptr * simplify cache template * cache api on client * fix * reduce sample threads when cache is not used * reduce cache memory * cache optimization * remove test function * remove extra fetch function * graph-engine data transfer optimization * support graph_split load&query * remove logs * change shards to pointer vector * use inference * remove test code * renorm op * simplify renorm op * recover local changes * recover renorm op kernel * fix init * add blanklines in renorm doc * fix import * fix import
Co-authored-by: Huang Zhengjie <270018958@qq.com>
Co-authored-by: Weiyue Su <weiyue.su@gmail.com>
Co-authored-by: suweiyue <suweiyue@baidu.com>
Co-authored-by: luobin06 <luobin06@baidu.com>
Co-authored-by: liweibin02 <liweibin02@baidu.com>
Co-authored-by: tangwei12 <tangwei12@baidu.com>
-
Committed by zhangbo9674
-
Committed by chentianyu03
* combine reduce_cuda codes * support float16 in pten reduce_mean * replace ReduceCudaKernel impl with pten reduce impl * mv reduce funcs into reduce_cuda_impl * rm unused codes and headers * mv GetReduceDim into reduce_cuda_impl * recover GetReduceDim in reduce_op.h * add new dispatch macro * fix pool op output not inited and cause transform to pten::denseTensor error * fix output tensor not initialized error * rename new dispatch macro and format code style * rm reduce_functor_op.h file
-
Committed by zhouweiwei2014
* add new API/OP:paddle.Tensor.exponential_ * fix CI
-
Committed by 努力努力在努力丶
* [MLU]add mlu op interface * [MLU]fix alpha of activation op
-
Committed by yaoxuefeng
add pull gpups sparse op
-
Committed by zhiboniu
-
Committed by zhouweiwei2014
* add new API/OP:paddle.poisson * fix comment
-
- 23 Dec, 2021, 5 commits
-
-
Committed by Chen Weihang
-
Committed by Jacek Czaja
* First set of fixes * - Make GetBlob more likely to find a blob * - Lint
-
Committed by wuhuanzhou
* add erfinv API, test=develop * fix gradient accuracy error, test=develop * fix cuda compilation error on Windows, test=develop * fix M_2_SQRTPI undeclared identifier on Windows, test=develop
-
Committed by zyfncg
* add empty and empty_like kernel in pten * add empty dev_api
-
Committed by Chen Weihang
-
- 22 Dec, 2021, 3 commits
-
-
Committed by crystal
* optimize gelu backward * optimize gelu backward * optimize code * Number to expression * Replacement number
-
Committed by YuanRisheng
* move flatten * fix bugs of test * modify header file * add copy declare * fix compile bugs
-
Committed by joanna.wozna.intel
-
- 21 Dec, 2021, 4 commits
-
-
Committed by Chen Weihang
* rename cuda to gpu * revert CMake change * resolve conflict * rename other cuda to gpu * polish details
-
Committed by crystal
* relu forward opt * add gelu functor * optimize code
-
Committed by arlesniak
-
Committed by sneaxiy
* mean first version * fix scalar mean * add fp16 dtype for api
-
- 20 Dec, 2021, 9 commits
-
-
Committed by chentianyu03
* add pten conj kernel * modify conj_kernel file path * add defined cuda macro to cuda/conj_kernel.h
-
Committed by baoachun
-
Committed by fwenguang
-
Committed by sneaxiy
* support FP16 for more ops * add amp list tests * refine reduce_mean_grad * fix OP benchmark ci * fix fp16 reduce_mean * update ut, but still have some problems * remove mean/reduce_mean fp16 kernel
-
Committed by Feng Xing
softmax_with_cross_entropy optimization with soft label. This PR optimizes two kernels:
* "SoftmaxWithCrossEntropySoftLabel": computes log_softmax and then the loss.
* "CrossEntropySoftLabel": computes the loss with softmax as input.
The optimizations use the following techniques:
* read data into a buffer with vectorization
* compute max and sum within a warp
* fix the loop size with a macro
Performance (computation time):
* softmax_with_cross_entropy_0 (forward): -40.1%
* softmax_with_cross_entropy_0 (backward): -41%
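For readers unfamiliar with the two code paths named in this commit message, the following is a minimal CPU-side sketch in plain Python of the underlying math only. It is not the CUDA kernel from the PR; the function names and sample values are hypothetical, and the max-subtraction step is the standard numerically stable formulation, which is assumed here to correspond to the "compute max and sum" step the message mentions.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over one row of logits."""
    m = max(logits)  # subtract the row max so exp() cannot overflow
    log_sum = math.log(sum(math.exp(x - m) for x in logits))
    return [x - m - log_sum for x in logits]

def soft_label_cross_entropy(logits, soft_labels):
    """First path: compute log_softmax, then the loss."""
    log_probs = log_softmax(logits)
    return -sum(p * lp for p, lp in zip(soft_labels, log_probs))

def soft_label_cross_entropy_from_softmax(probs, soft_labels):
    """Second path: the softmax output is already given as input."""
    return -sum(p * math.log(q) for p, q in zip(soft_labels, probs))

# Hypothetical sample row: the soft labels form a distribution over 3 classes.
logits = [2.0, 1.0, 0.1]
soft_labels = [0.7, 0.2, 0.1]

loss = soft_label_cross_entropy(logits, soft_labels)
probs = [math.exp(lp) for lp in log_softmax(logits)]
loss_from_softmax = soft_label_cross_entropy_from_softmax(probs, soft_labels)
```

Both paths give the same loss up to rounding. Working from log_softmax avoids taking the logarithm of a probability that has underflowed to zero, which is one plausible reason for computing log_softmax first.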
-
Committed by 石晓伟
-
Committed by Feiyu Chan
-
Committed by Sylwester Fraczek
-
Committed by YuanRisheng
* fix bugs when run reshape * fix ci bug
-
- 18 Dec, 2021, 3 commits
-
-
Committed by Noel
-
Committed by Guoxia Wang
-
Committed by Feiyu Chan
* add complex op and `paddle.complex`.
-
- 17 Dec, 2021, 6 commits
-
-
Committed by sneaxiy
* support multi precision update for LAMB * hide some api * fix ci uts * fix lamb output of dygraph * remove some changes to some PR * try to fix Py3 CI compile error * fix test_imperative_optimizer, add lars ut, add layer_norm ut * fix ut, fix format * fix ut * fix windows ci
-
Committed by chentianyu03
* modify sum mean args * add GetExpectedPtenKernelArgs for reduce_op * modify kernel args number * modify kernel args number
-
Committed by kuizhiqing
-
Committed by zlsh80826
From --ptxas-options=-v, SegmentOpsKernel uses 66 registers in a block. There are two ways to resolve this problem:
* Reduce the threads per block in the launch configuration.
* Add __launch_bounds__ to give the nvcc compiler the information it needs to reduce register usage.
This PR chooses the __launch_bounds__ solution because changing gpu_launch_config may affect other ops.
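The register pressure described in this commit message can be illustrated with some back-of-the-envelope arithmetic. The sketch below is in Python rather than CUDA and rests on assumptions that the commit does not state: that the figure of 66 is a per-thread register count, that the kernel is launched with 1024 threads per block, and that the device allows 65,536 registers per block (true of many recent NVIDIA architectures, but check the target device). It also ignores register allocation granularity. All names are hypothetical.

```python
# Assumed hardware limit; not taken from the commit message.
REGISTERS_PER_BLOCK_LIMIT = 65536

def registers_needed(regs_per_thread, threads_per_block):
    """Total registers one block requests (allocation granularity ignored)."""
    return regs_per_thread * threads_per_block

def block_fits(regs_per_thread, threads_per_block):
    """True if a block of this size stays within the assumed register limit."""
    return registers_needed(regs_per_thread, threads_per_block) <= REGISTERS_PER_BLOCK_LIMIT

def max_regs_per_thread(threads_per_block):
    """Largest per-thread register count that still lets the block launch."""
    return REGISTERS_PER_BLOCK_LIMIT // threads_per_block

# Option 1 from the commit: shrink the block (hypothetical sizes).
fits_at_1024 = block_fits(66, 1024)  # 66 * 1024 = 67584 registers requested
fits_at_512 = block_fits(66, 512)    # 66 * 512 = 33792 registers requested

# Option 2 from the commit: keep the block size and cap registers per thread,
# which is the kind of budget a launch-bounds hint lets the compiler target.
reg_cap_at_1024 = max_regs_per_thread(1024)
```

Under these assumptions a 1024-thread block at 66 registers per thread exceeds the limit, while either halving the block size or capping the kernel at 64 registers per thread brings it back within budget. Declaring the maximum block size with `__launch_bounds__` gives nvcc the information to apply such a cap to this kernel alone, which matches the commit's stated reason for not touching the shared launch configuration.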
-
Committed by niuliling123
-
Committed by limingshu
* fix_bugs_for_elementwise_branch_selection * fix merge_dims bugs * fix all influenced file
-
- 16 Dec, 2021, 1 commit
-
-
Committed by Tomasz Socha
* Faster implementation of CPU kernel for ROI_ALIGN Operator * Add missing variable to CUDA roi_align_op * Style * Fix boundaries * Rename variables for indexes calculation * Remove unnecessary emplace * Revert "Remove unnecessary emplace" This reverts commit c10e87f7fb812f1a672fde32f2690a97d47e2f5a. * Style
-