- 01 8月, 2019 3 次提交
-
-
由 wawltor 提交于
* test=develop Add the op of unique_with_counts, the op is calc the unqiue input of data, and output the corresponding indices and count of data. * test=develop Check the input and dtype in the op of unique_with_counts * test=develop test=document_preview update the API.spec for `unique_with_counts`, at the same time, optimize the python api in the op of `unique_with_count` * test=develop test=document_preview Fix some python api problem in the op of `unique_with_counts`, and change the error messsage in this op. * Fix some API problem in the op of `unique_with_counts` test=develop test=document_preview * test=develop test=document_preview Fix the api sample of op `unique_with_counts`, and update api.spec
-
由 LielinJiang 提交于
* fix depthwise conv gpu kernel bug, test=develop * add more depthwise conv test, test=develop
-
由 whs 提交于
test=develop
-
- 31 7月, 2019 5 次提交
-
-
由 jiaqi 提交于
(1) set fleet_send_batch_num a default value according to trainer num, the previous 80000 is fixed,if trainer num is much less or larger than 100,global shuffle may have timeout error. (2) fix load one table bug, add barrier
-
由 chengduo 提交于
* update parallel.py test=develop
-
由 HaoRen 提交于
* support center loss * change tensor copy api to high level api tensorcopy * test=develop rewrite the center_loss cuda_kernel to make it faster and add document of the center loss api,also update test function * test=document_preview test=develop update document of center loss * test=document_preview test=develop modify API.spec modify test code remove nouse const_cast
-
由 lvmengsi 提交于
Update conv2d transpose link
-
由 Dong Daxiang 提交于
make dist unit test exclusive run
-
- 30 7月, 2019 3 次提交
-
-
由 whs 提交于
test=develop
-
由 danleifeng 提交于
-
由 chengduo 提交于
test=develop
-
- 29 7月, 2019 2 次提交
-
-
由 Zeng Jinle 提交于
* remove legacy memory optimization codes, test=develop * follow huihuang's comments,test=develop * follow luotao's comments, test=develop
-
由 Thunderbrook 提交于
* dump slot * test * proto * dump slot * test * proto * code style * code style * code style * style * add delete after unseen days * add unseen days * code style * conflict solve test=develop * add clear model * code style test=develop * code style test=develop
-
- 28 7月, 2019 2 次提交
-
-
由 Zeng Jinle 提交于
-
由 lvmengsi 提交于
* replace link * update api.spec * fix mistake
-
- 27 7月, 2019 2 次提交
- 26 7月, 2019 2 次提交
-
-
由 Adam 提交于
-
由 Zeng Jinle 提交于
* first version memory optimize pass, test=develop * remove move_tensor_sharing_pass, test=develop * refine code comments, add unittests, test=develop * turn off memory_optimize by default, test=develop * follow huihuang's comments, test=develop * follow chengduoZH's comments, test=develop * fix grammar error, add const qualifier, fix pass_test exception message, test=develop * follow chengduoZH's comments 2nd, test=develop
-
- 25 7月, 2019 4 次提交
-
-
由 石晓伟 提交于
* fix logical APIs test=develop test=document_preview * fix isfinite * update matmul comments * update API.spec test=document_preview test=develop * update API.spec test=document_preview test=develop * update API.spec test=document_preview test=develop
-
由 guru4elephant 提交于
refine launch_ps and role_maker
-
由 fuyinno4 提交于
Fix FleetWrapper: 1. fix shrink dense: just scale show 2. add datanorm scale: divide datanorm's gradient by batch_size
-
由 guru4elephant 提交于
* split test_dist_se_resnext.py into 4 testcases
-
- 24 7月, 2019 5 次提交
-
-
由 Bob Zhu 提交于
* extend matmul op to support multiple head multiplication With the support of multiple head, the multiplication of two big matrixes is split into multiplication of several (head_number) small matrixes. e.g. if Mat A is [3, 24] and Mat B is [24, 4], when multiple A and B with head_number as 4, Mat A will be split as 4 matrix of [3, 6] and Mat B will be 4 matrix of [6, 4]. The result of final matrix will be 4 matrix of [3, 4], i.e. [3, 16].
-
由 whs 提交于
* Make lod reset op support for append lod level. * Fix API.spec test=develop * Fix unitest. test=develop * Add python api for lod append. test=develop * Fix API.spec test=develop * Fix format of doc. test=develop * Fix unitest. test=develop * Fix doc. test=develop
-
由 chengduo 提交于
* prun backward ops test=develop
-
由 JesseyXujin 提交于
Modify auc doc. Add output variable description, previously was the scalar type, now changed to the tuple type.test=develop (#18771)
-
由 Thunderbrook 提交于
The change includes 2 things: 1. save delta model and shrink table are control by the same parameter before, now add delete_after_unseen_days to control shrink table. 2. value in sparse table has no slot before, now add slot in sparse table, and add DownpureCtrAccessor to support the new meta. test=develop
-
- 23 7月, 2019 3 次提交
-
-
由 jiaqi 提交于
(1)support patch data (merge slots of instances of same line id, modify dense layer which changes its size) (2)add fleet load_one_table interface, support load from paddle model and load from pslib model (3)fix push sparse bug which cause push sparse cost more time(about 10% in my testcase) (4)when some slots are not in one of your network (join/update, etc.),data feed、collect label info、push/pull sparse will skip these slots, instead of throw error. (5)add more debug info in TrainFilesWithProfiler
-
由 chengduo 提交于
* support sparse gradients test=develop
-
由 Yi Liu 提交于
* supports distributed classification training * update API.spec * fix evenly division in python3 * change "index_range" to "index_num" in shard_index operator test=document_preview test=develop
-
- 22 7月, 2019 5 次提交
-
-
由 Zeng Jinle 提交于
-
由 Huihuang Zheng 提交于
The change includes 3 things: 1. Set CPU_NUM to 1 in the tests because the ParallelExecutor will print warning that CPU_NUM is not set and use default 1. 2. Old tests compare two RNNs, hand written simple RNN and same RNN built by Paddle, but initialized RNN weights in numpy random and Paddle random separately. Fixed it by setting weights and bias values. 3. Also set numpy random seed in the tests. Now the two RNNs diff can be smaller (rtol from 0.1, 0.2 to. 0.01) in the tests. test=develop
-
由 Tao Luo 提交于
test=develop
-
由 tangwei12 提交于
do some odd jobs, test=develop
-
由 guru4elephant 提交于
* split different comm method for mnist distributed training
-
- 19 7月, 2019 3 次提交
-
-
由 Huihuang Zheng 提交于
Test PaddingRNN on V100 GPU device. Test configuration: large model, padding mode (which is the mode using recurrentOp), one GPU. GPU memory (MiB): 6414 (this PR) vs 6837 (without this PR) Speed (steps/s): 10.28 (this PR) vs 9.89 (without this PR)
-
由 Adam 提交于
test=develop
-
由 tangwei12 提交于
* add check of executor, test=develop
-
- 18 7月, 2019 1 次提交
-
-
由 Zeng Jinle 提交于
* feature/auto_growth_allocator, test=develop * add unittest of AlignedAllocator, test=develop * try to turn on auto_growth to test on CI, test=develop * fix segmentation fault in mixed_vector.h, test=develop * add unittests, test=develop
-