- 15 Nov, 2019 2 commits
-
-
Committed by xujiaqi01
* fix cache table bug * add save_paddle_inference_model * fix hdfs util bug * test=develop
-
Committed by xujiaqi01
* copy some feasigns and corresponding embeddings from one sparse table to another * copy all feasigns and corresponding embeddings from one sparse table to another * copy all dense params from one table to another * copy some local vars to other local vars
-
- 04 Nov, 2019 1 commit
-
-
Committed by Thunderbrook
test=develop
-
- 31 Oct, 2019 2 commits
-
-
Committed by Chengmo
* fix PaddleCloud Role maker & add warning in distribute transpiler & change rpc_retry_times
-
Committed by Thunderbrook
* support dump param to afs test=develop * code style test=develop * code style test=develop * dump param test=develop * dump param test=develop * dump param test=develop * dump param test=develop
-
- 25 Oct, 2019 1 commit
-
-
Committed by xujiaqi01
* no longer need to define all embedding layers (no one less) of all slots in each program. make trainer_param repeated in ps.proto. * add find_distributed_lookup_table_grads instead of hard code GRAD * support embedding stop gradient. push sparse has error before fix this. * fix fill sparse, skip slots which do not have embedding. each slot's embedding in a sparse table should be used in all training programs before fix this. * fix pull sparse, skip slots which do not have embedding. * fix collect feasign label info, skip slots which do not have embedding. * support when there are multi sparse tables in one or multi training programs, each program can pull/push its own related sparse tables instead of all sparse tables. * test=develop
-
- 18 Oct, 2019 1 commit
-
-
Committed by xujiaqi01
* add check nan / inf in downpour worker during training * test=develop
-
- 15 Oct, 2019 1 commit
-
-
Committed by Chengmo
* test=develop,Fix communicator slow bug * test=develop, delete if() in stop_worker() * test=develop * fix UT, test=develop * fix bug in fetch handler, test=develop * fix bug in fetch handler, test=develop * test=develop, fix fetch barrier bug * test=develop, bug fix * test=develop, bug fix * test=develop, fix bug
-
- 14 Oct, 2019 1 commit
-
-
Committed by Thunderbrook
* support dump multi file test=develop * dump fix num file test=develop
-
- 12 Oct, 2019 1 commit
-
-
Committed by zhang wenhui
-
- 11 Oct, 2019 1 commit
-
-
Committed by zhang wenhui
* fix fc sort . test=develop
-
- 07 Oct, 2019 1 commit
-
-
Committed by zhang wenhui
-
- 30 Sep, 2019 1 commit
-
-
Committed by Chengmo
* refactor geo sgd & communicator
-
- 24 Sep, 2019 1 commit
-
-
Committed by xujiaqi01
* support change shuffle thread num * support change train thread num * fix receive shuffle data of each channel * data norm stop gradient * add check thread_tensor type and root_tensor type when merge metric * remove sleep in shuffle, add config * add config of pslib client to client communication * fix xbox str * add data norm op testcase * add flush in trainer finalize
-
- 06 Sep, 2019 1 commit
-
-
Committed by 123malin
* fleet api add input check, test=develop
-
- 30 Aug, 2019 1 commit
-
-
Committed by yaoxuefeng
* add thread scope stat accurate metrics test=develop * fix style * fix style * fix style * fix style test=develop * fix style test=develop * fix style test=develop * fix style test=develop * fix style test=develop * fix style test=develop * fix style test=develop * fix conflict * fix style * fix style test=develop * fix error test=develop * fix error test=develop
-
- 29 Aug, 2019 2 commits
-
-
Committed by Thunderbrook
* dump slot * test * proto * dump slot * test * proto * code style * code style * code style * style * add delete after unseen days * add unseen days * code style * conflict solve test=develop * add clear model * code style test=develop * code style test=develop * support debug tensor of each ins test=develop * support debug tensor of each ins test=develop * learning rate * code style * code style * code style * code style * code style * code style * code style * code style * code style * code style * code style * code style * code style test=develop * code style test=develop * unitest * style * style * multi phase * add channel * code style * style * style * unitest * style * define * define test=develop * style test=develop * rm define test=develop * linux * linux test=develop * style test=develop * output format test=develop * windows ci test=develop
-
Committed by zhang wenhui
fleet_desc sort fc name by dictionary sort, but we want to sort by number.
-
- 28 Aug, 2019 1 commit
-
-
Committed by tangwei12
* fix correctness of the communicator * fix a bug in send thread when sending var context is empty, test=develop * add lookup_table_prefetch_op and prefetch optimize, test=develop * remove remote prefetch GPU supported * word2vec force with CPU, test=develop * test dist remote lookup table force with CPU, test=develop
-
- 27 Aug, 2019 1 commit
-
-
Committed by zhang wenhui
fix fleet_desc dense_table unsort bug; format for abacus hotstart is not supported yet.
-
- 23 Aug, 2019 1 commit
-
-
Committed by zhang wenhui
add fleet_desc config feature & multi_sparse table,
-
- 16 Aug, 2019 1 commit
-
-
Committed by gongweibao
node_num is not needed by users, so remove it and fix the related bugs.
-
- 14 Aug, 2019 3 commits
-
-
Committed by jiaqi
* fix default value in ps_pb2.py: delta_keep_days 30 -> 16 * test=develop
-
Committed by jiaqi
* add get_last_save_xbox_base/get_last_save_xbox * fix fleet_util bug of load paddle model * add doc string in fleet api
-
Committed by jiaqi
* fix default value of fleet desc, default values are same with jingpai * print log when save model
-
- 12 Aug, 2019 1 commit
-
-
Committed by gongweibao
Polish fleet API to support cuda collective mode and nccl2 mode
-
- 11 Aug, 2019 1 commit
-
-
Committed by yaoxuefeng
add save cache model api in fleet & add slots shuffle in dataset module & add metric op to calculate ctr related metrics (#18871) * add ctr related metric layer test=develop * add save cache and slots shuffle test=develop * add save cache and slots shuffle test=develop * fix error * fix error * fix style for ci * fix for comments * change SlotsShuffle input to std::string for generality * fix style * fix style * fix style * fix style * fix style * fix style * fix style * fix style * fix style * fix style * fix style * fix style * fix style * fix style * fix style * fix style * fix style * fix style * fix style * fix style * change non-const reference to pointer * fix style * fix style * fix style test=develop * fix style test=develop * add return ins num in ctr metric op * change dtype to float in metric_op.py * fix error test=develop * fix style test=develop * fix API spec * fix API spec * fix API spec test=develop * add UT test=develop
-
- 01 Aug, 2019 1 commit
-
-
Committed by jiaqi
adjust ins weight according to nid slot; user can specify adjust_ins_weight in strategy
-
- 31 Jul, 2019 1 commit
-
-
Committed by jiaqi
(1) set fleet_send_batch_num a default value according to trainer num; the previous value of 80000 was fixed, so if trainer num is much less or larger than 100, global shuffle may have a timeout error. (2) fix load one table bug, add barrier
-
- 29 Jul, 2019 1 commit
-
-
Committed by Thunderbrook
* dump slot * test * proto * dump slot * test * proto * code style * code style * code style * style * add delete after unseen days * add unseen days * code style * conflict solve test=develop * add clear model * code style test=develop * code style test=develop
-
- 25 Jul, 2019 1 commit
-
-
Committed by fuyinno4
Fix FleetWrapper: 1. fix shrink dense: just scale show 2. add datanorm scale: divide datanorm's gradient by batch_size
-
- 24 Jul, 2019 1 commit
-
-
Committed by Thunderbrook
The change includes 2 things: 1. save delta model and shrink table were controlled by the same parameter before; now add delete_after_unseen_days to control shrink table. 2. value in sparse table had no slot before; now add slot in sparse table, and add DownpureCtrAccessor to support the new meta. test=develop
-
- 23 Jul, 2019 1 commit
-
-
Committed by jiaqi
(1) support patch data (merge slots of instances of same line id, modify dense layer which changes its size) (2) add fleet load_one_table interface, support load from paddle model and load from pslib model (3) fix push sparse bug which caused push sparse to cost more time (about 10% in my testcase) (4) when some slots are not in one of your networks (join/update, etc.), data feed, collect label info, and push/pull sparse will skip these slots instead of throwing an error. (5) add more debug info in TrainFilesWithProfiler
-
- 02 Jul, 2019 1 commit
-
-
Committed by guru4elephant
make fleet support mpi job submit directly.
-
- 27 Jun, 2019 1 commit
-
-
Committed by tangwei12
* add is_runnning in communicator, test=develop
-
- 13 Jun, 2019 1 commit
-
-
Committed by tangwei12
-
- 12 Jun, 2019 1 commit
-
-
Committed by tangwei12
* fix save/load in Fleet * add UT framework of Fleet
-
- 23 May, 2019 1 commit
-
-
Committed by Qiao Longfei
Async exe support communicator
-
- 17 May, 2019 1 commit
-
-
Committed by jiaqi
test=develop
-
- 15 May, 2019 1 commit
-
-
Committed by jiaqi
* support config file, cvm, load, save, shrink test=develop * fix error of worker_num & add table.compress_in_save test=develop * fix code style test=develop * fix save model bug test=develop
-