- 03 11月, 2021 1 次提交
-
-
由 LiYuRio 提交于
-
- 02 11月, 2021 1 次提交
-
-
由 joanna.wozna.intel 提交于
* Refactor conv2d int8 unit test * Correct according to review and add int8 check
-
- 29 10月, 2021 1 次提交
-
-
由 wanghuancoder 提交于
* fix some bug in new executor, test=develop * fix error message, test=develop
-
- 27 10月, 2021 1 次提交
-
-
由 taixiurong 提交于
-
- 22 10月, 2021 1 次提交
-
-
由 Leo Chen 提交于
* [hapi] support dygrapg amp O2 * fix problem of static pure fp16 in hapi * fix bug * fix format * fix ut * follow comments * update ut * update amp save/load * fix ut * refine code format
-
- 20 10月, 2021 1 次提交
-
-
由 Steffy-zxf 提交于
Add Tokenizer related functionalities for Transformer model in order that the process of training and predicting is consistent. * support the text string as an input Tensor * support the "VOCAB"unordered_map<wstring, int> as an input Tensor to lookup tokens * Tokenizer used for BERT. This tokenizer applies an end-to-end, text string to wordpiece tokenization. * It first applies basic tokenization, followed by wordpiece tokenization.
-
- 13 10月, 2021 1 次提交
-
-
由 Huihuang Zheng 提交于
Remove RunFromCinn method in PE because We Will Call CinnRunner in Compute method of SubgraphOp
-
- 11 10月, 2021 1 次提交
-
-
由 Huihuang Zheng 提交于
Add use_cinn flag and use it to control whether we run PaddlePaddle using CINN. Also add: Replace PaddlePaddle graph with a CINN graph in a pass PE Method to feed data and run the graph by CINN
-
- 08 10月, 2021 1 次提交
-
-
由 Zeng Jinle 提交于
* support CUDA Graph on PE * add ut, fix CI compile * reduce memory consumption * fix CUDA 10 CI * improve coverage * improve python coverage
-
- 29 9月, 2021 1 次提交
-
-
由 Zeng Jinle 提交于
* add basic support for CUDA Graph * fix ci compile error * fix LOG print, fix windows CI * follow comments and update * small fix for default ctor * fix rocm compile error * fix CPU compile error
-
- 28 9月, 2021 2 次提交
-
-
由 Yanxing Shi 提交于
* Initial Commit * add unittest and add error information * modify doc * fix some error * fix some word * fix bug cudaDeviceProp* and modify error explanation * fix cudaDeviceProp* error and unnitest samples * fix hip error and PADDLE_WITH_HIP * update style * fix error is_compiled_with_cuda * fix paddle.device.cuda.get_device_properties * fix error for multi thread safe * update style * merge conflict * modify after mentor review * update style * delete word * fix unittest error for windows * support string input and modify some code * modify doc to support string input * fix error for express information * fix error for express information * fix unnitest for windows * fix device.startswith('gpu:') * format error and doc * fix after review * format code * fix error for doc compile * fix error for doc compile * fix error for doc compile * fix error for doc compile * fix error for doc compile * fix py2 error * fix wrong words and doc * fix _gpuDeviceProperties
-
由 Siming Dai 提交于
-
- 18 9月, 2021 2 次提交
-
-
由 Huihuang Zheng 提交于
Add basic Cost Model, it uses executor to run program and profile it to get op time. This is an early basic version, we will add more functions in the future.
-
由 Aurelius84 提交于
* Clean ParaseMemInfo and fix unittest with multi-thread * fix declare
-
- 17 9月, 2021 1 次提交
-
-
由 wuhuanzhou 提交于
#### 背景 #35602 提供Python侧开发子图替换类Pass的方式: - 利用Paddle Python API或者辅助类型定义子图program用来匹配/替换图; - Python侧注册Pass时,将注册函数最终转换为protobuf定义的PassDesc数据形式,供C++侧进行解析完成Pass实例注册。 本PR即为根据PassDesc规则描述解析生成Pass实例。 #### 方案设计 ##### Pass规则验证 在以往的Pass开发中,会存在随着算子迭代引发的匹配失效或者错误匹配的问题,该问题可以通过扫描算子支持的参数设置及参数类型等来判断是否应该使用该Pass或者给出提示需要修改Pass代码。 当前Pass开发中提供了算子兼容性OpCompatSensiblePass用于解决上述问题。但同时还存在不足:由于以往Pass开发在运行时才能获取到pattern信息,所以需要在执行Pass时才可以判断。 使用PassDesc表示的Pass可以在执行Pass前验证上述问题,这个过程在VerifyDesc中完成。 ##### 根据匹配子图构造pattern GeneratePass对于图匹配和替换使用GraphPatternDecetor完成,构造匹配pattern实际上就是将对应对象成员PDPattern中添加PDNode和边关系。该过程在函数`InitGeneratePattern`中完成,该函数没有作为GeneratePass的成员方法,主要出于后续可能开发新的Decetor考虑,GeneratePass与Decetor的操作是没有关联的。 初始化pattern主要通过遍历匹配子图program的全部算子实现: 1. 添加当前算子对应PDNode及限制条件(算子类型、属性限制等); 2. 遍历当前算子对应输入并从pattern中尝试获取PDNode: - 在pattern中获取到PDNode且为输出节点:表示属于匹配子图的中间节点,将该PDNode设置为中间节点; - 在pattern中没有获取到PDNode:添加该输入PDNode并设置作为输入节点; - 设置输入到算子的边关系; 3. 遍历当前算子对应输出: - 在pattern中获取到PDNode且为输入节点:表示属于匹配子图的中间节点,将该PDNode设置为中间节点; - 在pattern中没有获取到PDNode:添加该输入PDNode并设置作为输出节点; - 设置算子到输出的边关系; ##### 根据替换子图操作graph 替换子图操作的过程在`GetGenerateRewrite`函数中完成,与`InitGeneratePattern`类似没有作为GeneratePass的成员方法。 生成替换子图操作过程如下: 1. 判断冗余替换子图; 2. 遍历替换子图program的全部算子添加替换子图Node: 1. 添加当前算子的Node及属性设置; 2. 遍历当前算子对应输入,添加中间variable节点; 3. 遍历当前算子对应输出,添加中间variable节点; 4. 添加输入/输出节点与算子节点的边关系; 3. 删除匹配图中属于中间节点的Node; ##### 优化子图验证 对于替换子图或者替换后的计算图是否可以正确运行等,可以在执行Pass时验证,从而防止在后续执行计算图时出现异常。 当前Pass执行直接修改计算图,验证失败时无法很好的完成还原操作,目前子图验证暂时默认成功,留到后续改进。
-
- 16 9月, 2021 2 次提交
-
-
由 0x45f 提交于
* fix no_grad context error in dy2stat * remove useless comments * fix error by drop_kids in python * add test and fix review
-
由 wanghuancoder 提交于
-
- 14 9月, 2021 1 次提交
-
-
由 chenenquan 提交于
* Add empty_cache api to release idle gpu memory hold by allocator,test=develop * Add empty_cache api to release idle gpu memory hold by allocator,test=develop * Add empty_cache api to release idle gpu memory hold by allocator,test=develop * Fix test coverage problem for empty_cache * delete redundant check for empty_cache * fix the problem of empty_cache's doc * delete the nvidia-smi comment in doc of empty_cache, test=document_fix
-
- 08 9月, 2021 1 次提交
-
-
由 Leo Chen 提交于
* release gil before op run * support npu grad test * fix op_test
-
- 31 8月, 2021 1 次提交
-
-
由 Aurelius84 提交于
* polish code * fix unittest on windows * refine pybind interface * support statistic MemSize of AllocatorPool * Replace mutex into atomic
-
- 26 8月, 2021 1 次提交
-
-
由 Siming Dai 提交于
* add dlpack api and fix a from_dlpack
-
- 24 8月, 2021 1 次提交
-
-
由 wanghuancoder 提交于
* add fetch, test=develop * fix fetch2op, test=develop * fix fetch2op, test=develop * refine, test=develop * fix fetch ctx, test=develop * add wait, test=develop * rename fetch2 to fetch_v2, test=develop * merge, test=develop
-
- 18 8月, 2021 2 次提交
-
-
由 wanghuancoder 提交于
* code refactoring, test=develop * refine, test=develop * refine, test=develop * refine, test=develop
-
由 Zhanlue Yang 提交于
* Add function to disable paddle signal handler Paddle used google::InstallFaultSignalHandler to handle selected system signals, mainly for debugging and bug report purposes. However, this can be conflicted with other python packages whoever captures similar signals. Such python package involves tvm and more To resolve this issue, we support a function to disable signal handler * Remove signal test from WIN32 platform * Remove redundant return from disable_signal_handler() function * Add detailed messages to en_doc
-
- 17 8月, 2021 2 次提交
-
-
由 chentianyu03 提交于
* copy boost optional.hpp to paddle * copy boost optional.hpp to paddle * move directions * del fluid/utils * modify .hpp to .h * move directions * modify to paddle::optional * add modification description * format code stype for the files in paddle/utils * format code stype
-
由 Zeng Jinle 提交于
* add inplace passes and tests * update * fix use_cuda undefined fix compile error of op compat * add more ut * fix CPU CI error * check adam unique * fix mac/windows ci, improve coverage * fix ci error * follow weihang's comment * fix BlockDesc::MoveFrom * follow qiuliang's comment * update * follow huihuang's comments
-
- 13 8月, 2021 1 次提交
-
-
由 ronnywang 提交于
-
- 11 8月, 2021 2 次提交
- 06 8月, 2021 1 次提交
-
-
由 TTerror 提交于
-
- 05 8月, 2021 1 次提交
-
-
由 hong 提交于
* first test version * add test exec; * add data transfer; test=develop * add new exec head; * add memcpy; test=develop * add python fetch * add new test * add graph node; test=develop * remove useless new executor test; test=develop * remove gperf dependency; test=develop * fix compile bugs; test=develop * remove useless code; test=develop * remove useless code; test=develop * add uni test; test=develop * polish code; test=develop * polish code; test=develop * add interpreter cmakefile; test=develop * remove useless code; test=develop
-
- 03 8月, 2021 1 次提交
-
-
由 QingshuChen 提交于
* support Kunlun2 * support KL2 * support KL2
-
- 02 8月, 2021 1 次提交
-
-
由 Zeng Jinle 提交于
* add basic APIs * add attr_types * follow comments * change pass attr types * add set pass attribute codes * refine PADDLE_THROW
-
- 29 7月, 2021 1 次提交
-
-
由 Zeng Jinle 提交于
* add fix op run order pass * add ut for fix_op_run_order * fix ci error * improve coverage * improve coverge again and fix cpu test case * follow some comments
-
- 27 7月, 2021 1 次提交
-
-
由 Aurelius84 提交于
Revert "Revert "[Dy2Stat] Refactor ExecutorCache logic and pre-support BuildStrategy for pass (#34181)" (#34348)" (#34384) This reverts commit 577fdde5.
-
- 23 7月, 2021 1 次提交
-
-
由 Aurelius84 提交于
Revert "[Dy2Stat] Refactor ExecutorCache logic and pre-support BuildStrategy for pass (#34181)" (#34348) This reverts commit 609f8225.
-
- 22 7月, 2021 2 次提交
-
-
由 Aurelius84 提交于
* modify into program_id * fix cache_info declare problem * fix python int to C long problem * modify point to reference * add ENVS
-
由 Leo Chen 提交于
-
- 19 7月, 2021 1 次提交
-
-
由 chentianyu03 提交于
* add cuda event and stream api * add cuda event and stream api * add get_current_stream api * add get_current_stream api * init streams * modify get_current_stream * modify get_cuttent_stream * add synchronize func * add current_stream doc and test file * move get_current_stream into CUDA macro * move CudaEvent into CUDA macro * move _get_current_stream and _device_synchronize into cuda macro * modify the macro of cuda stream and event * add test case for synchronize * add paddle.devices.cuda module * event and stream support hip * add doc for stream and event class * move cuda stream and event into single pybind * add cuda_streams_py.cc to cmakelist * add _device_synchronize and _get_current_stream to core module * add test case for cudastream and cudaevent * move __all__ in streams.py * fix test fail * add cuda to devices __all__ * fix current_stream doc writing error * move devices to device direction, and merge device.py into __init__.py * add required:gpu to sample codes * remove cuda direction from device/__init__.py
-
- 15 7月, 2021 1 次提交
-
-
由 Aurelius84 提交于
* Refine Constructor logic of ParallelExecutor * Replace executor into ParallelExecutor in run_program_op
-