- 01 Nov, 2021: 1 commit

Committed by Chen Weihang
* initial tensor design & sign kernel demo * add move constructor for meta & add lodtensor * add dirs & sign xpu kernel * add mean cpu&cuda kernel impl * move sign & mean xpu & npu kernel * add selected_rows basic impl * refactor design, BaseTensor to DenseTensor, etc. * add scale mkldnn kernel * polish xpu & npu impl details * fix mkldnn reuse compile failed * change tensor operation lib name * rename util filename * add more comments * change TensorImplInterface to TensorInterface * add kernel key and factory * remove MKLDNNTensorMeta, add MKLDNNDenseTensor * change XXDeviceContext to XXContext * add base kernel registrar utils & test on sign * replace boost::any by paddle::any * fix several ci failed * fix npu compile error * add ordered map util * fix multiple ordered_map compile errors * move dev into include dir * support sign op in static op run * fix static op run error * fix new executor compile failed * add dygraph branch & remove sign_op.h * fix test_infer_no_need_buffer_slots * fix rocm compile link error * fix unitybuild error & clear glog * fix npu compile failed * skip quant trans test * fix part windows compile problem * fix xpu enforce error * fix inference test failed * remove ordered_map to solve quant failed * fix part of rocm compile failed * add more register kernels * revert scale kernel temporarily * fix code format error * add new kernel registrar macro * rename top to tcmpt * revert xpu, npu, mkldnn impl & remove op def * add kernel args parse functor to auto parse args * revert some change & add scale kernels * add op proto in dygraph kernelcontext building * polish kernel dispatch logic & naming rule * fix scale kernel match error * fix scale test failed * add mean API and unittest * test mean api success * add branch to solve compiled error * skip clang format error * add mean skip rule in op_library * add dot kernel, api and unittest (#6) * remove old kernel and add symbol link * fix dot compiled failed * add macro for module declare
* fix npu and xpu compile error * revert sign, mean, scale, dot kernel removing * add comment for keeping old kernel impl * fix mutable_data error * fix bfloat16 conflict * fix inference undef error * adapt to msvc compile rules * polish comment for template inst * add cmake template instantiation for win * fix backend to place device id bug * fix ifdef error * Op2functor (#7) * add kernel args maker class * make args maker non-const * remove debug log * modify codes by review options * split constructPrKernelContext function * fix output name bug * fix test_mean_op test_sign_op failed * fill_any_like kernel refactor (#10) * fill_any_like kernel refactor * remove useless code of full_like c++ api * skip dtype for fill_any_like * add attrs for kernel key construct * add use_pt_kernel Flags to control whether to use pt kernel (#13) * add use_pt_kernel Flags to control whether to use pt kernel * change the default value to true for checking pt kernels * fix mutable_data cuda place error * move high level apis into hapi * remove selectedrows adapting temporarily * Support Scalar in Tensor Compute Library (#14) * fill_any_like kernel refactor * remove useless code of full_like c++ api * Support Scalar in Tensor Compute Library * add scalar in dygraph and static graph mode * keep the basic type for attr, instead of using scalar for all * merge the code * remove mkldnn tensor & polish details * use flat_hash_map and small_vector in kernel factory * Refactor flatten kernel (#12) * refactor flatten kernel * update infershape function * fix compile bugs * fix bugs when merge * fix compiler bugs * fix bugs when run test_flatten_api * fix bugs when run test * Revert "use flat_hash_map and small_vector in kernel factory" This reverts commit 23091495cfdd3df8cc1be592d30f09ea66a7c72b.
* Move cpu, cuda and other device code into kernels (#15) * fill_any_like kernel refactor * remove useless code of full_like c++ api * Support Scalar in Tensor Compute Library * add scalar in dygraph and static graph mode * keep the basic type for attr, instead of using scalar for all * merge the code * start refactor matmul * move cpu, cuda and other device modules into kernels * merge code * polish code in operator.cc * Perfect unittests (#16) * perfect unittest * update license * replace with flat_hash_map, small_vector (#19) * fix small_vector build error on windows platform * replace with flat_hash_map, small_vector * remove todo * Perfect unittests (#20) * perfect unittest * update license * fix bug when run tcmpt_utils_test * refactor execution adapting impl * fix insert conflict * Fix CI bug of test_yolov3 (#21) * fill_any_like kernel refactor * remove useless code of full_like c++ api * Support Scalar in Tensor Compute Library * add scalar in dygraph and static graph mode * keep the basic type for attr, instead of using scalar for all * merge the code * start refactor matmul * move cpu, cuda and other device modules into kernels * merge code * polish code in operator.cc * Fix CI bug of test_yolov3 * add the tensor base class, test=develop (#17) * update the tensor base class, test=develop * remove two funcs, test=develop * update the error msg, test=develop Co-authored-by: Chen Weihang <chenweihang@baidu.com> * [no-verify] commit backend and tensor signature changes * Rename tcmpt to pten (#23) * rename tcmpt to pten * update omitted files for rename to pten * update omitted file for rename to pten * remove k of all enum var * remove kernel_instantiate (#26) * remove symbols and spatial_tensor * change common to functions * readd share tensor impl methods * add a candidate dense tensor class, test=develop (#28) * change all Pt to Pten * resolve conflict with xiaowei * Op2functor opt1 (#27) * replace to small vector and change to const & * add std::move
Co-authored-by: Chen Weihang <chenweihang@baidu.com> * polish kernel factory and kernel registry * fix operator test error msg mismatch * remove tensor signature and backend set member * move scalar and polish enforce * revert dtype layout change to fix error * fix enum operator override error * add several base unittests * add pten utils tests * polish some details * Dev/op2func refactor 3 (#30) * add a candidate dense tensor class, test=develop * remove TensorBase::backend(), test=develop * remove some ops, test=develop * cherry-pick the pr of tensor meta, test=develop * moves the dense tensor and some ops, test=develop * update the linalg operator, test=develop * update other operators, test=develop * fix errors, test=develop * fix bugs, test=develop * try to resolve the problem of windows ci, test=develop * updates codes, test=develop * fix the tensor_utils.cc, test=develop * modify the dense tensor, test=develop * fix the data type, test=develop Co-authored-by: shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com> * polish some details * polish kernel signature details * fix a bug about offsets of the tensor, test=develop (#31) Co-authored-by: shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com> * polish some details Co-authored-by: chentianyu03 <ctychentianyu@gmail.com> Co-authored-by: zyfncg <1370305206@qq.com> Co-authored-by: YuanRisheng <yuanrisheng@baidu.com> Co-authored-by: 石晓伟 <39303645+Shixiaowei02@users.noreply.github.com>

- 29 Oct, 2021: 1 commit

Committed by Leo Chen
* enable check_nan_inf and fix variable scope * add ut * fix bug * update ut * revert doc change * fix npu compile

- 24 Oct, 2021: 1 commit

Committed by Zhen Wang

- 20 Oct, 2021: 1 commit

Committed by Steffy-zxf
Add Tokenizer-related functionality for Transformer models so that text preprocessing is consistent between training and prediction. * support a text string as an input Tensor * support the "VOCAB" unordered_map<wstring, int> as an input Tensor for looking up tokens * Tokenizer used for BERT. This tokenizer applies end-to-end tokenization from a text string to wordpieces. * It first applies basic tokenization, followed by wordpiece tokenization.
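The wordpiece step described above is commonly implemented as greedy longest-match-first splitting, where pieces that continue a word carry a `##` prefix. The following minimal Python sketch illustrates that technique under stated assumptions: the toy `vocab`, the helper names `basic_tokenize`, `wordpiece_tokenize` and `tokenize`, and the simplified basic tokenizer (lowercasing plus whitespace/punctuation splitting, with no accent stripping or CJK handling) are all hypothetical and are not Paddle's tokenizer operator.

```python
def basic_tokenize(text):
    """Hypothetical, simplified basic tokenization: lowercase, split on spaces and punctuation."""
    tokens, current = [], []
    for ch in text.lower():
        if ch.isalnum():
            current.append(ch)
            continue
        if current:  # close the word in progress
            tokens.append("".join(current))
            current = []
        if not ch.isspace():  # punctuation becomes its own token
            tokens.append(ch)
    if current:
        tokens.append("".join(current))
    return tokens


def wordpiece_tokenize(word, vocab, unk_token="[UNK]", max_chars=100):
    """Split one word into wordpieces using greedy longest-match-first."""
    if len(word) > max_chars:
        return [unk_token]
    pieces, start = [], 0
    while start < len(word):
        end, match = len(word), None
        # Try the longest remaining substring first, shrinking until it is in the vocab.
        while start < end:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece  # continuation pieces carry the "##" prefix
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            return [unk_token]  # no prefix matched: the whole word is unknown
        pieces.append(match)
        start = end
    return pieces


def tokenize(text, vocab):
    """Basic tokenization followed by wordpiece tokenization, mapped to ids."""
    return [vocab[piece]  # assumes vocab contains "[UNK]"
            for word in basic_tokenize(text)
            for piece in wordpiece_tokenize(word, vocab)]


vocab = {"[UNK]": 0, "un": 1, "##aff": 2, "##able": 3, "hello": 4, "!": 5}
print(tokenize("Hello unaffable!", vocab))  # [4, 1, 2, 3, 5]
```

Here "unaffable" splits into `un`, `##aff`, `##able`, while a word with no matching prefix (for example "xyz") maps entirely to `[UNK]`.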

- 11 Oct, 2021: 1 commit

Committed by Huihuang Zheng
Add a use_cinn flag that controls whether PaddlePaddle runs with CINN. Also add: a pass that replaces the PaddlePaddle graph with a CINN graph; a PE method to feed data and run the graph with CINN.

- 28 Sep, 2021: 1 commit

Committed by Huihuang Zheng
* Add Basic CINN Runner Class * Add CinnCacheKey * Add Cache logic and improve CinnCacheKey * Modify as reviewer commented * Implement hash_combine to fix MAC build.
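A cache keyed on compilation inputs avoids recompiling the same graph for inputs it has already seen. The sketch below is hypothetical: it assumes a key made of a graph signature string plus named input shapes (the actual fields of `CinnCacheKey` may differ), and its `hash_combine` follows the widely used `boost::hash_combine` mixing formula rather than any Paddle code. `CacheKey` and `CompiledGraphCache` are illustrative names.

```python
MASK64 = (1 << 64) - 1  # emulate 64-bit size_t wrap-around


def hash_combine(seed, value):
    """Mix hash(value) into seed using the boost::hash_combine formula."""
    h = hash(value) & MASK64
    seed ^= (h + 0x9E3779B9 + ((seed << 6) & MASK64) + (seed >> 2)) & MASK64
    return seed


class CacheKey:
    """Hypothetical key: a graph signature plus named input shapes."""

    def __init__(self, graph_signature, input_shapes):
        self.graph_signature = graph_signature
        # Sort by input name so the key does not depend on dict insertion order.
        self.input_shapes = tuple(
            sorted((name, tuple(shape)) for name, shape in input_shapes.items())
        )

    def __eq__(self, other):
        return (
            isinstance(other, CacheKey)
            and self.graph_signature == other.graph_signature
            and self.input_shapes == other.input_shapes
        )

    def __hash__(self):
        seed = hash_combine(0, self.graph_signature)
        for name, shape in self.input_shapes:
            seed = hash_combine(seed, name)
            for dim in shape:
                seed = hash_combine(seed, dim)
        return seed


class CompiledGraphCache:
    """Compile each (graph, input shapes) combination at most once."""

    def __init__(self, compile_fn):
        self._compile_fn = compile_fn
        self._entries = {}
        self.compile_count = 0

    def get_or_compile(self, graph_signature, input_shapes):
        key = CacheKey(graph_signature, input_shapes)
        if key not in self._entries:
            self.compile_count += 1
            self._entries[key] = self._compile_fn(graph_signature, input_shapes)
        return self._entries[key]


cache = CompiledGraphCache(lambda sig, shapes: f"compiled<{sig}>")
cache.get_or_compile("graph_a", {"x": [8, 16]})
cache.get_or_compile("graph_a", {"x": [8, 16]})  # hit: same key
cache.get_or_compile("graph_a", {"x": [4, 16]})  # miss: different shape
print(cache.compile_count)  # 2
```

Equal keys must produce equal hashes, which is why `__hash__` is computed only from the same fields that `__eq__` compares.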

- 17 Sep, 2021: 1 commit

Committed by wuhuanzhou
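#### Background

#35602 provides a way to develop subgraph-replacement Passes on the Python side:
- use the Paddle Python API or helper types to define subgraph programs used to match/replace the graph;
- when a Pass is registered on the Python side, the registration function is ultimately converted into the protobuf-defined PassDesc format, which the C++ side parses to complete registration of the Pass instance.

This PR parses the rules described by PassDesc to generate Pass instances.

#### Design

##### Pass rule verification

In past Pass development, operator iterations could cause matching to fail or to match incorrectly. This can be detected by scanning the attribute settings and attribute types an operator supports, to decide whether the Pass should be applied or to report that the Pass code needs to be modified.

Current Pass development provides operator-compatibility checking through OpCompatSensiblePass to address this problem. A limitation remains, however: because pattern information in previously developed Passes is only available at run time, the check can only be performed when the Pass is executed.

A Pass expressed with PassDesc can verify these problems before the Pass is executed; this is done in VerifyDesc.

##### Constructing the pattern from the match subgraph

GeneratePass uses GraphPatternDetector for graph matching and replacement, so constructing the match pattern amounts to adding PDNodes and edges to its PDPattern member. This is done in the function `InitGeneratePattern`, which is not a member method of GeneratePass, mainly because new Detectors may be developed later; GeneratePass and the Detector operations are kept decoupled.

The pattern is initialized by iterating over all operators of the match-subgraph program:
1. Add a PDNode for the current operator together with its constraints (operator type, attribute restrictions, etc.);
2. Iterate over the current operator's inputs and try to get the corresponding PDNode from the pattern:
   - if a PDNode is found and it is an output node, it is an intermediate node of the match subgraph, so mark the PDNode as intermediate;
   - if no PDNode is found, add the input PDNode and mark it as an input node;
   - add the edge from the input to the operator;
3. Iterate over the current operator's outputs:
   - if a PDNode is found and it is an input node, it is an intermediate node of the match subgraph, so mark the PDNode as intermediate;
   - if no PDNode is found, add the output PDNode and mark it as an output node;
   - add the edge from the operator to the output.

##### Rewriting the graph according to the replace subgraph

The replacement is performed in the function `GetGenerateRewrite`, which, like `InitGeneratePattern`, is not a member method of GeneratePass.

The rewrite proceeds as follows:
1. Check whether the replace subgraph is redundant;
2. Iterate over all operators of the replace-subgraph program and add the replace-subgraph Nodes:
   1. add a Node for the current operator and set its attributes;
   2. iterate over the operator's inputs and add intermediate variable nodes;
   3. iterate over the operator's outputs and add intermediate variable nodes;
   4. add the edges between the input/output nodes and the operator node;
3. Remove the Nodes of the matched graph that are intermediate nodes.

##### Verifying the optimized subgraph

Whether the replace subgraph, or the graph after replacement, can run correctly can be verified while the Pass executes, which prevents errors later when the graph is run.

Currently the Pass modifies the graph directly and cannot cleanly roll back when verification fails, so subgraph verification is assumed to succeed for now; this is left for future improvement.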
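To make the three pattern-construction steps concrete, here is a minimal Python sketch that classifies each variable of a match subgraph as an input, intermediate, or output node. It uses hypothetical plain-Python structures (`PatternNode`, `Pattern`, `init_pattern`) in place of `PDPattern`/`PDNode`, and it omits the attribute constraints mentioned in step 1.

```python
from dataclasses import dataclass, field


@dataclass
class PatternNode:
    """A variable node in the match pattern (hypothetical stand-in for PDNode)."""
    name: str
    role: str  # "input", "intermediate" or "output"


@dataclass
class Pattern:
    """Hypothetical stand-in for PDPattern: op nodes, variable nodes and edges."""
    ops: list = field(default_factory=list)        # (op_node_name, op_type) pairs
    var_nodes: dict = field(default_factory=dict)  # variable name -> PatternNode
    edges: list = field(default_factory=list)      # (src, dst) name pairs


def init_pattern(program_ops):
    """program_ops: (op_type, input_names, output_names) tuples in program order."""
    pattern = Pattern()
    for idx, (op_type, inputs, outputs) in enumerate(program_ops):
        # Step 1: add a node for the operator, constrained by its type.
        op_node = f"{op_type}_{idx}"
        pattern.ops.append((op_node, op_type))
        # Step 2: inputs. An existing output node consumed here becomes intermediate.
        for name in inputs:
            node = pattern.var_nodes.get(name)
            if node is None:
                pattern.var_nodes[name] = PatternNode(name, "input")
            elif node.role == "output":
                node.role = "intermediate"
            pattern.edges.append((name, op_node))
        # Step 3: outputs. An existing input node produced here becomes intermediate.
        for name in outputs:
            node = pattern.var_nodes.get(name)
            if node is None:
                pattern.var_nodes[name] = PatternNode(name, "output")
            elif node.role == "input":
                node.role = "intermediate"
            pattern.edges.append((op_node, name))
    return pattern


# Match subgraph: out = elementwise_add(mul(x, w), b)
pattern = init_pattern([
    ("mul", ["x", "w"], ["tmp"]),
    ("elementwise_add", ["tmp", "b"], ["out"]),
])
print({name: node.role for name, node in pattern.var_nodes.items()})
# {'x': 'input', 'w': 'input', 'tmp': 'intermediate', 'b': 'input', 'out': 'output'}
```

`tmp` is first added as an output of `mul` and then reclassified as intermediate when `elementwise_add` consumes it, which is the case that the rewrite later deletes from the matched graph.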

- 16 Sep, 2021: 1 commit

Committed by wuhuanzhou
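Main purpose of this PR: support developing and registering Passes on the Python side for subgraph-replacement scenarios such as fusion.

Background: a Pass takes a deep-learning computation Graph as input, modifies it according to certain conditions, and outputs the modified Graph. Writing Pass code in the current PaddlePaddle framework has the following problems:
- users must hand-write both the conditional matching on the Graph and the code that modifies the Graph;
- operating on the Graph requires digging into low-level framework code, understanding the structure of the Graph, and knowing how related Passes are written.

We propose an optimization for subgraph-replacement Passes such as fusion that lets users develop and register Passes on the Python side, improving the secondary-development experience:
- users only supply descriptions of the match and replace subgraphs, and code provided by the framework generates the matching and replacement logic, so users do not have to match or replace on the Graph themselves;
- replacement works at the API level: users can build subgraphs with Paddle's Python API and thus write Paddle Graph Pass code without knowing the structure of the Graph.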
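The division of labor described above (the user declares the match and replace subgraphs; the framework generates the matching and rewriting) can be illustrated with a toy registry. Everything here is hypothetical: `register_pass`, `PASS_REGISTRY` and `apply_pass` are not Paddle APIs, and subgraphs are simplified to linear sequences of op types instead of programs built with the Paddle Python API and serialized to PassDesc.

```python
PASS_REGISTRY = {}  # hypothetical: pass name -> (pattern op types, replacement op types)


def register_pass(name):
    """Hypothetical decorator: the decorated function returns (pattern, replace)."""
    def decorator(fn):
        pattern, replace = fn()
        if not pattern:
            raise ValueError("pattern must not be empty")
        PASS_REGISTRY[name] = (list(pattern), list(replace))
        return fn
    return decorator


@register_pass("fuse_mul_add")
def fuse_mul_add():
    # The user only describes the two subgraphs; no graph-matching code is written here.
    return ["mul", "elementwise_add"], ["fc"]


def apply_pass(name, op_types):
    """Framework side: generic matching and replacement over a linear op sequence."""
    pattern, replace = PASS_REGISTRY[name]
    result, i = [], 0
    while i < len(op_types):
        if op_types[i:i + len(pattern)] == pattern:
            result.extend(replace)  # matched: emit the replacement subgraph
            i += len(pattern)
        else:
            result.append(op_types[i])
            i += 1
    return result


print(apply_pass("fuse_mul_add", ["mul", "elementwise_add", "relu", "mul", "elementwise_add"]))
# ['fc', 'relu', 'fc']
```

Because the user-facing function returns only descriptions, one generic `apply_pass` serves every registered pass; a real implementation would match graph structure and attributes rather than list slices.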

- 07 Sep, 2021: 1 commit

Committed by yaoxuefeng

- 24 Aug, 2021: 1 commit

Committed by Zeng Jinle

- 18 Aug, 2021: 2 commits

Committed by wanghuancoder
* code refactoring, test=develop * refine, test=develop * refine, test=develop * refine, test=develop

Committed by Chen Weihang
* fix ext_tensor.cast failed bug * remove useless deps * fix windows cmake failed * try to fix windows make failed * fix make error on Windows

- 11 Aug, 2021: 1 commit

Committed by lilong12
* add auto_parallel apis

- 10 Aug, 2021: 1 commit

Committed by chentianyu03
* add any.hpp to utils and replace boost::any with self-defined paddle::any * add copy any.hpp to custom op depends * modify any.hpp include path * remove boost from setup.py.in * add copy any.hpp to custom op depends * move any.hpp to paddle/utils/ dirs * move any.h to extension/include directory * copy utils to the right directories

- 05 Aug, 2021: 1 commit

Committed by hong
* first test version * add test exec; * add data transfer; test=develop * add new exec head; * add memcpy; test=develop * add python fetch * add new test * add graph node; test=develop * remove useless new executor test; test=develop * remove gperf dependency; test=develop * fix compile bugs; test=develop * remove useless code; test=develop * remove useless code; test=develop * add unit test; test=develop * polish code; test=develop * polish code; test=develop * add interpreter cmakefile; test=develop * remove useless code; test=develop

- 03 Aug, 2021: 2 commits

Committed by QingshuChen
* support Kunlun2 * support KL2 * support KL2

Committed by zhouweiwei2014

- 20 Jul, 2021: 1 commit

Committed by Huihuang Zheng
Add boost as a dependency to fix a random compilation failure. The failure occurs because program_processing.cc uses boost but boost was not listed in DEPS in the CMakeLists.txt.

- 15 Jul, 2021: 3 commits

Committed by huangxu96
This PR creates a class to process the program at the C++ level. Currently, this class has one class method: GetInputsOutputsInBlock()

Committed by Zhanlue Yang
* Add DCU backend support for custom ops * Added checks for DeviceCopy and renamed some macros

Committed by Aurelius84
* Refine Constructor logic of ParallelExecutor * Replace executor into ParallelExecutor in run_program_op

- 14 Jul, 2021: 1 commit

Committed by zhouweiwei2014
* Support sccache to speed up compilation on Windows * Support sccache to speed up compilation on Windows

- 13 Jul, 2021: 1 commit

Committed by Zeng Jinle

- 02 Jul, 2021: 2 commits

Committed by houj04

Committed by zhouweiwei2014

- 29 Jun, 2021: 1 commit

Committed by Thunderbrook
* remove heterbox * remove heterbox

- 07 Jun, 2021: 1 commit

Committed by 王明冬

- 03 Jun, 2021: 1 commit

Committed by 王明冬

- 25 May, 2021: 1 commit

Committed by 石晓伟
* add the op def proto, test=develop * add while.pbtxt

- 18 May, 2021: 1 commit

Committed by Thunderbrook
* unit double * unit double

- 10 May, 2021: 1 commit

Committed by Thunderbrook
* pslib with cmake * heter util * vlog * heter server test * add dtor * cmake

- 07 May, 2021: 1 commit

Committed by Zhou Wei
* Remove paddle_custom_op dynamic libraries, change link to FLUID_CORE on windows, and check copy_to * fix CI

- 28 Apr, 2021: 1 commit

Committed by Thunderbrook
* Revert "Revert "[PsCore] optimize performance of large kv (#32535)" (#32599)" This reverts commit 809ac036. * brpc dep

- 15 Apr, 2021: 1 commit

Committed by 123malin
* add index_dataset and index_sampler for tree-based model

- 09 Apr, 2021: 1 commit

Committed by Aurelius84
* Remove old custom OP to reduce whl package volume * [Custom OP]Remove old custom OP to reduce whl package volume * support macos

- 30 Mar, 2021: 1 commit

Committed by Zhou Wei
* Remove old custom OP to reduce whl package volume * [Custom OP]Remove old custom OP to reduce whl package volume

- 18 Mar, 2021: 1 commit

Committed by Chen Weihang
* support custom complex op * fix detail error * add inference support * fix setup windows failed

- 04 Mar, 2021: 1 commit

Committed by wuhuanzhou

- 27 Feb, 2021: 1 commit

Committed by 石晓伟

- 25 Feb, 2021: 1 commit

Committed by Qi Li