- 17 Dec, 2021 18 commits
-
-
Committed by chentianyu03
* modify sum mean args
* add GetExpectedPtenKernelArgs for reduce_op
* modify kernel args number
* modify kernel args number
-
Committed by LiYuRio
-
Committed by Zhanlue Yang
* Rearranged Eager AutoCodeGen directory structure
* Removed USE_OP in Eager AutoCodeGen
* Enabled generation for Operators without Grad/Inputs/Outputs
* Resolved operators without input
* Fixed merge conflicts
* Enabled Eager AutoCodeGen for 10+ more operators
* Refactored Eager AutoCodeGen with more organized helper objects
* Enabled Eager AutoCodeGen for operators with multiple OpBases
* Adjusted Eager AutoCodeGen to Enable Passing Output Tensor as Input Argument
* Handled Dispensable Inputs/Outputs in Eager AutoCodeGen
* Adjusted function generation/call between Python-C API & Dygraph API
* Synchronized auto-generated Python-C API with Dygraph Forward Functions
* Generated CoreOpsInfos for potential use in append_op API
* Fixed CI problem
-
Committed by kuizhiqing
-
Committed by heliqi
* add timeout
* add timeout
-
Committed by Leo Chen
* Inspect the information inside a TRT engine.
* Follow up the google code style.
* Fix code error.
-
Committed by Aurelius84
* Add RWLock to protect loading module under multi-thread
* refine code
* remove import statement
-
Committed by zlsh80826
According to --ptxas-options=-v, SegmentOpsKernel uses 66 registers per thread. There are two ways to resolve this problem:
1. Reduce the threads-per-block launch configuration.
2. Add __launch_bounds__ to give the nvcc compiler the information it needs to reduce register usage.
This PR chooses the __launch_bounds__ solution because changing gpu_launch_config may affect other ops.
-
Committed by zhaoyingli
* add gpt modeling
* update file name
-
Committed by niuliling123
-
Committed by From00
* Get GPU BasePtr from CUDA allocation
* Fix compile error for ROCm
* Add BasePtr function for IPUPlace in naive_best_fit_allocator.cc
* Add alignment for BuddyAllocator
* Set address alignment of BuddyAllocator to 32 bytes
* Fix CI error
* Remove code for naive_best_fit strategy
-
Committed by From00
-
Committed by Chen Weihang
-
Committed by Yuang Liu
-
Committed by limingshu
* fix_bugs_for_elementwise_branch_selection
* fix merge_dims bugs
* fix all influenced file
-
Committed by houj04
-
Committed by jianghaicheng
* ipu add dockerfile
* resolve comments
-
Committed by WangXi
-
- 16 Dec, 2021 22 commits
-
-
Committed by Sing_chan
-
Committed by Leo Chen
* fix cmake
* not check execution time
-
Committed by Sing_chan
-
Committed by chentianyu03
-
Committed by Tomasz Socha
* Faster implementation of CPU kernel for ROI_ALIGN Operator
* Add missing variable to CUDA roi_align_op
* Style
* Fix boundaries
* Rename variables for indexes calculation
* Remove unnecessary emplace
* Revert "Remove unnecessary emplace". This reverts commit c10e87f7fb812f1a672fde32f2690a97d47e2f5a.
* Style
-
Committed by chentianyu03
-
Committed by Zhanlue Yang
* Rearranged Eager AutoCodeGen directory structure
* Removed USE_OP in Eager AutoCodeGen
* Enabled generation for Operators without Grad/Inputs/Outputs
* Resolved operators without input
* Fixed merge conflicts
* Enabled Eager AutoCodeGen for 10+ more operators
* Refactored Eager AutoCodeGen with more organized helper objects
* Enabled Eager AutoCodeGen for operators with multiple OpBases
* Adjusted Eager AutoCodeGen to Enable Passing Output Tensor as Input Argument
* Handled Dispensable Inputs/Outputs in Eager AutoCodeGen
* Enabled Eager AutoCodeGen for All Existing Operators & Possible Future Operators
* Fixed CI issues
* Fixed LD_LIBRARY_PATH for eager_code_generator
-
Committed by YUNSHEN XIE
-
Committed by xiaoting
* add activation
* update activation_op
* add unit test for activation
* fix acosh for init, test=develop
-
Committed by feng_shuai
* conv_transpose_eltwiseadd_bn_fuse_pass
* change timeout
* add TIMEOUT
* add random num for group and dilation
* change PassCompat
-
Committed by yeliang2258
* add test for conv_elementwise_add2_act_fuse_pass and conv_elementwise_add_act_fuse_pass
* Add conv_eltwiseadd_bn_fuse_pass test and fix test_conv_elementwise_addX_act_fuse_pass
* add tests for conv_act_mkldnn_fuse_pass
* add test for conv_bias_mkldnn_fuse_pass
* update code
* add conv_act_mkldnn_fuse_pass for relu, relu6, swish, leaky_relu
* update test
* update
* update bug
* update
* update pattern_detector
* fix test_conv_eltwiseadd_bn_fuse_pass
* add diff display notest;test=windows_ci_inference
* fix
* remove test_conv_act_mkldnn_fuse_pass.py
* ifix
-
Committed by Chen Weihang
* unify device context entrance
* move all_context include to header
* polish cmake relay for device_context
* fix npu compile failed
* fix npu compile failed
-
Committed by YUNSHEN XIE
-
Committed by zhangchunle
-
Committed by Chen Weihang
* add register_ctx_kernel and move scale kernel
* polish details by reviewer comment
* fix xpu compile failed
* fix cmake error
-
Committed by wuhuanzhou
-
Committed by Jiabin Yang
* support eager switch system
* polish code
-
Committed by danleifeng
* trainer_device fix and checknan tool for psgpu;test=develop
* disable show_one_table;test=develop
-
Committed by tianshuo78520a
-
Committed by liutiexing
* add align for WorkQueue
* add spinlock
* merge develop
* merge
* Add EventsWaiter
* Revert "Add EventsWaiter". This reverts commit e206173aa9be7401b83a53581627bfaf557c8fb2.
* add os_info
* update
* update
* update
* update
* update
* update for bugfix
* update
* update
* update
Co-authored-by: liutiexing <liutiexing@google.com>
-
Committed by LJQ❤️
Add elementwise_fmax and elementwise_fmin operators
-