- 01 Nov, 2021 5 commits
-
-
Committed by jiangcheng
-
Committed by zhaocaibei123
-
Committed by seemingwang
* graph engine demo * upload unsaved changes * fix dependency error * fix shard_num problem * py client * remove lock and graph-type * add load direct graph * add load direct graph * add load direct graph * batch random_sample * batch_sample_k * fix num_nodes size * batch brpc * batch brpc * add test * add test * add load_nodes; change add_node function * change sample return type to pair * resolve conflict * resolved conflict * resolved conflict * separate server and client * merge pair type * fix * resolved conflict * fixed segmentation fault; high-level VLOG for load edges and load nodes * random_sample return 0 * rm useless loop * test:load edge * fix ret -1 * test: rm sample * rm sample * random_sample return future * random_sample return int * test fake node * fixed here * memory leak * remove test code * fix return problem * add common_graph_table * random sample node &test & change data-structure from linkedList to vector * add common_graph_table * sample with srand * add node_types * optimize nodes sample * recover test * random sample * destruct weighted sampler * GraphEdgeBlob * WeightedGraphEdgeBlob to GraphEdgeBlob * WeightedGraphEdgeBlob to GraphEdgeBlob * pybind sample nodes api * pull nodes with step * fixed pull_graph_list bug; add test for pull_graph_list by step * add graph table;name * add graph table;name * add pybind * add pybind * add FeatureNode * add FeatureNode * add FeatureNode Serialize * add FeatureNode Serialize * get_feat_node * avoid local rpc * fix get_node_feat * fix get_node_feat * remove log * get_node_feat return py:bytes * merge develop with graph_engine * fix threadpool.h head * fix * fix typo * resolve conflict * fix conflict * recover lost content * fix pybind of FeatureNode * recover cmake * recover tools * resolve conflict * resolve linking problem * code style * change test_server port * fix code problems * remove shard_num config * remove redundant threads * optimize start server * remove logs * fix code problems by
reviewers' suggestions * move graph files into a folder * code style change * remove graph operations from base table * optimize get_feat function of graph engine * fix long long count problem * remove redundant graph files * remove unused shell * recover dropout_op_pass.h * fix potential stack overflow when request number is too large & node add & node clear & node remove * when sample k is larger than neighbor num, return directly * using random seed generator of paddle to speed up * fix bug of random sample k * fix code style * fix code style * add remove graph to fleet_py.cc * fix blocking_queue problem * fix style * fix * recover capacity check * add remove graph node; add set_feature * add remove graph node; add set_feature * add remove graph node; add set_feature * add remove graph node; add set_feature * fix distributed op combining problems * optimize * remove logs * fix MultiSlotDataGenerator error * cache for graph engine * fix type compare error * more test&fix thread terminating problem * remove header * change time interval of shrink Co-authored-by: Huang Zhengjie <270018958@qq.com> Co-authored-by: Weiyue Su <weiyue.su@gmail.com> Co-authored-by: suweiyue <suweiyue@baidu.com> Co-authored-by: luobin06 <luobin06@baidu.com> Co-authored-by: liweibin02 <liweibin02@baidu.com> Co-authored-by: tangwei12 <tangwei12@baidu.com>
-
Committed by Aganlengzi
* [NPU] fix lookup_table_v2_grad ACL error for model BoW * add more unit tests
-
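Committed by CtfGo
Add CinnLaunchOp, which executes the result of CINN subgraph compilation. Key points: 1. In BuildCinnPass, which partitions the graph into subgraphs, each subgraph is replaced in the original graph by a CinnLaunchOp, which calls CINN to compile and execute that subgraph. 2. The inputs/outputs of CinnLaunchOp are the inputs and outputs of the subgraph. It also carries a `compilation_key` attribute, which is used as the key to fetch the subgraph object and the compilation result from the global cache; BuildCinnPass sets this attribute when it creates the op. 3. CinnLaunchOp works as follows: - fetch the subgraph object from the global cache - fetch the subgraph's compilation result from the global cache, compiling just in time on a cache miss - initialize the runtime data and allocate host/device memory according to the variable information (data type, shape) in the compilation result - pack the runtime data as arguments and call CINN's executable object, the runtime program, to do the computation - sync the subgraph's results to the Paddle-side tensors through the argument pointers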
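The lookup-then-compile flow described in this commit message can be sketched roughly as below. This is only an illustration in plain Python, not the PR's C++ code: `CompilationCache`, `CompiledProgram`, `launch`, and the `add_relu` stand-in subgraph are hypothetical names, and "compiling" is reduced to wrapping a Python function.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class CompiledProgram:
    """Hypothetical stand-in for a compilation result (a runnable program)."""
    run: Callable


@dataclass
class CompilationCache:
    """Hypothetical global cache, indexed by a compilation key."""
    subgraphs: Dict[str, Callable] = field(default_factory=dict)
    programs: Dict[str, CompiledProgram] = field(default_factory=dict)
    compile_count: int = 0

    def register(self, key, subgraph):
        # In the commit's design, the graph pass stores the subgraph
        # and writes the key into the launch op's attribute.
        self.subgraphs[key] = subgraph

    def get_or_compile(self, key):
        # Cache hit: reuse the earlier result. Miss: compile on the spot.
        if key not in self.programs:
            subgraph = self.subgraphs[key]
            self.programs[key] = CompiledProgram(run=subgraph)
            self.compile_count += 1
        return self.programs[key]


def launch(cache, compilation_key, inputs):
    """Look up the program by key, pack the inputs, run, return outputs."""
    program = cache.get_or_compile(compilation_key)
    args = dict(inputs)  # pack runtime data as arguments
    return program.run(args)


def add_relu(args):
    # Stand-in subgraph: elementwise add followed by relu.
    return {"out": [max(a + b, 0.0) for a, b in zip(args["x"], args["y"])]}


cache = CompilationCache()
cache.register("subgraph_0", add_relu)
first = launch(cache, "subgraph_0", {"x": [1.0, -3.0], "y": [2.0, 1.0]})
second = launch(cache, "subgraph_0", {"x": [0.5, 0.5], "y": [0.5, -2.0]})
```

The second call reuses the cached program, so the stand-in compile step runs only once. Memory allocation and pointer-based result syncing are left out of the sketch.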
-
- 29 Oct, 2021 7 commits
-
-
Committed by taixiurong
* aaaa * add some ops support fp16 in kunlun2
-
Committed by baoachun
-
Committed by niuliling123
-
Committed by zhouweiwei2014
* add new API: paddle.linalg.triangular_solve * add new API/OP: paddle.linalg.triangular_solve * add new API/OP: paddle.linalg.triangular_solve * fix comment
-
Committed by wanghuancoder
* fix some bug in new executor, test=develop * fix error message, test=develop
-
Committed by Leo Chen
* enable check_nan_inf and fix variable scope * add ut * fix bug * update ut * revert doc change * fix npu compile
-
Committed by wangxinxin08
-
- 28 Oct, 2021 10 commits
-
-
Committed by Zhen Wang
* Update the content of `test_parallel_executor_run_cinn.py`. * Fix some bugs in the topological sort and `CreateNewSubGraph`. * Update the CINN commit id used by Paddle. * Update the unit test to `add+relu`. * Update according to reviewers' suggestion.
-
Committed by ronnywang
* add TypeAdapter method for npu_op_runner * add int64 supporting for elementwise_mul and reduce_sum * add int64 supporting and UT for expand_v2, scale and reduce_max * fix bug
-
Committed by wangguanqun
* add trainer desc config to distributed strategy * code style modified * data_feed set lod * fix bug * code style * fix bug * save load * save load * save unittest * add unittest of the_one_ps * unittest * add todo in communicator sendsparse
-
Committed by Liu-xiandong
-
Committed by XGZhang
* support inference for quantized matmul_v2 * update code style * code style
-
Committed by liutiexing
* add align for WorkQueue * add spinlock * merge develop * merge * Add EventsWaiter * update * update * update Error MSG * update EventsWaiter * Add Cancel For ThreadPool * Add UT for Cancel * fix Cancel
-
Committed by Li Min
* Fix bug when pre_layer_norm is false.
-
Committed by Aurelius84
* Refactor InterpreterCore code * make tuple
-
Committed by feng_shuai
* change api for support trt8 * fix:change api
-
- 27 Oct, 2021 14 commits
-
-
Committed by pangyoki
-
Committed by Qi Li
* [ROCM] add custom op support, test=develop * remove debug codes, test=develop
-
Committed by wuhuanzhou
* GeneratePass support attr condition and mapping, test=develop * fix coverage, test=develop
-
Committed by wangxinxin08
* add dcnv2 plugin
-
Committed by zlsh80826
-
Committed by piotrekobiIntel
* Add WIP version of elementwise_div_mkldnn without working dy grad * Add dy gradient calculation implementation, disable broadcast tests * Readd removed tests from static_mode_white_list * Add bfloat16 gradient tests, remove int8 and uint8 support * - Change the way dy grad is calculated to improve performance - Refactor BinaryMKLDNNHandler to use a default parameter * Change copyright year * Refactor as suggested * Attempt to bypass CI Approval not accepting max_relative_error * Fix formatting issue
-
Committed by Feiyu Chan
* WIP: add cache * delete move constructor and operator= for CuFFTHandle and FFTConfig * remove log from CuFFTHandle and FFTConfig * add lrucache for fft rocm backend * disable LRUCache when CUFFT_VERSION >= 10200 * disable copy and move for hipFFTHandle; format code * clean debug code Co-authored-by: Xiaoxu Chen <chenxx_id@163.com>
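For readers unfamiliar with the idea behind an LRU cache for FFT plans, here is a minimal sketch in plain Python. It does not mirror the commit's C++ code: `PlanCache`, `make_plan`, and the `(size, kind)` keys are hypothetical, and a "plan" is just a string.

```python
from collections import OrderedDict


class PlanCache:
    """Least-recently-used cache: keeps at most `capacity` built plans."""

    def __init__(self, capacity, build_plan):
        self.capacity = capacity
        self.build_plan = build_plan
        self.plans = OrderedDict()  # oldest entry first, newest last

    def get(self, key):
        if key in self.plans:
            # Hit: mark the entry as most recently used.
            self.plans.move_to_end(key)
            return self.plans[key]
        # Miss: build the plan, store it, evict the oldest if over capacity.
        plan = self.build_plan(key)
        self.plans[key] = plan
        if len(self.plans) > self.capacity:
            self.plans.popitem(last=False)
        return plan


built = []


def make_plan(key):
    # Hypothetical expensive step; records each build so reuse is visible.
    built.append(key)
    return "plan for " + repr(key)


plan_cache = PlanCache(capacity=2, build_plan=make_plan)
plan_cache.get((8, "c2c"))
plan_cache.get((16, "c2c"))
plan_cache.get((8, "c2c"))   # hit: no rebuild, becomes most recent
plan_cache.get((32, "r2c"))  # miss: evicts (16, "c2c"), the least recent
plan_cache.get((16, "c2c"))  # miss again: rebuilt, evicts (8, "c2c")
```

Reusing a plan avoids rebuilding it for repeated transform shapes, while the capacity bound keeps the number of live handles limited.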
-
Committed by zhangkaihuo
This PR contains the layer-level code for fused_transformer, covering the layer code for FusedFeedForward and for FusedTransformerEncoderLayer.
-
Committed by baoachun
* fix matmul dim error * fix wrong dim check in matmul
-
Committed by Feiyu Chan
* fix fftshift/ifftshift on static mode * update roll_op version * add more test cases for fftshift/ifftshift
-
Committed by taixiurong
-
Committed by Wilber
-
Committed by huangjun12
* add eigvalsh with is_test * add eigvalsh op * fix backward bug * forward and backward, float and complex, unittest * remove eigvalsh_helper.h * remove changes of cusolver.h * fix unittest * fix unittest bug * update code following eigh * fix test * update lapack * pull develop * update functor * fix unittest bug * fix details * add tensor_method_func * fix notes
-
Committed by whs
-
- 26 Oct, 2021 4 commits
-
-
Committed by Li Min
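Purpose: this PR aims to improve the computational performance of the attention module. To reduce the framework's op scheduling overhead, this PR implements the attention module by hand in C++ and exposes it as a single large attention op. To reduce memory access overhead, this PR applies two optimizations: (1) when computing q, k, and v, the input X is shared, which reduces the gemm, transpose, and bias add at that point from three calls to one; (2) kernel fusion is used so that data is passed between different CUDA kernels through registers.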
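Optimization (1) can be illustrated with a small plain-Python sketch. It is not the PR's C++/CUDA code: the helper names and the tiny matrices are made up for illustration, and the transpose and kernel-fusion parts are not shown. Concatenating the q, k, and v weights along the output dimension lets one matrix multiply and one bias add produce all three projections of the shared input.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def add_bias(m, bias):
    """Add a bias vector to every row of a matrix."""
    return [[v + b for v, b in zip(row, bias)] for row in m]


# Illustrative values: 2 tokens, hidden size 2.
x = [[1.0, 2.0], [3.0, 4.0]]
w_q, b_q = [[1.0, 0.0], [0.0, 1.0]], [0.5, 0.5]
w_k, b_k = [[2.0, 0.0], [0.0, 2.0]], [0.0, 1.0]
w_v, b_v = [[0.0, 1.0], [1.0, 0.0]], [1.0, 0.0]

# Unfused: three matrix multiplies and three bias adds over the same input.
q = add_bias(matmul(x, w_q), b_q)
k = add_bias(matmul(x, w_k), b_k)
v = add_bias(matmul(x, w_v), b_v)

# Fused: concatenate the weights column-wise and the biases end to end,
# then do a single matrix multiply and a single bias add.
w_qkv = [rq + rk + rv for rq, rk, rv in zip(w_q, w_k, w_v)]
b_qkv = b_q + b_k + b_v
qkv = add_bias(matmul(x, w_qkv), b_qkv)

# Slice the fused result back into its three parts.
hidden = len(w_q[0])
q_fused = [row[:hidden] for row in qkv]
k_fused = [row[hidden:2 * hidden] for row in qkv]
v_fused = [row[2 * hidden:] for row in qkv]
```

The fused path computes the same numbers as the three separate projections; what changes is the call count, from three gemm and bias-add calls to one of each.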
-
Committed by Feiyu Chan
-
Committed by zhulei
-
Committed by Leo Chen
* cache exception in child thread * add ut * fix ut
-