Commits · 74626f662d7fb475266b9b7052fdf1fbb49544db · 机器未来 / Paddle

11 Apr, 2022 1 commit

Unittest recover (#41431) (#41590) · 74626f66

Committed by zhaocaibei123 on Apr 11, 2022

* update name

* update name

* fix test

* fix fleet bind

* update name

* update name

* fix test

* fix gpups wrapper

* remove Push/Pull/Load/Save with context in client and wrapper base class

* fix

* fix

* remove some interface

* fix

* remove

* code style

* recover

* fix

* remove code unused

* remove some unused table & accessor & CommonDenseTable => MemoryDenseTable

* fix

* fix

* fix

* recover

* remove unused code

* recover unittest

* fix

* remove

* fix

* remove code unuseful

* remove

* fix

* recover

* remove
Co-authored-by: esythan <esythan@126.com>

23 Mar, 2022 1 commit

two-phase training for ps (#40762) · b1a4668c

Committed by zhaocaibei123 on Mar 23, 2022

* fix benchmark and communicator config

* fix bugs of the_one_ps

* multi program and fix bug in optimizer

* multi program in the_one_ps

* public commcontext

* ps optimizer multi programs

* cvm & datanorm backend

* fix dim

* fix unittest

* fix

* the one ps merge

* remove comm

* add DownpourLiteWorker

* all

* fix

* fix

* device worker downpour lite

* fix

* fix bug in global shuffle

* save inference model

* fix & add log

* fix

* remove log

* fix

* fix save summary

* fix

* fix pscore

* fix

* fix

* fix

* fix

* fix

* remove logs

* fix

* fix

* fix

* fix

* fix

* add some comments

* fix
Co-authored-by: esythan <esythan@126.com>

15 Feb, 2022 1 commit

[PTen]Migrate proto::VarType outside of Pten (#39411) · 7e7e9404

Committed by Aurelius84 on Feb 15, 2022

* #1 migrate dist-related type()-> dtype()

* move datatype function from pten -> fluid/framework

* change type() in imperative into convert(dtype())

* modify xx_tensor->type into xx_tensor->dtype

* change the set_type interface and the caller

* modify xx_tensor.type into xx_tensor.dtype

* fix mutable_data(place, dtype())

* change caller of mutable_data in pten and distributed

* change the caller of mutable_data in fluid/framework

* change the caller of mutable_data in imperative directory

* mutable_data: inference

* update the call of mutable_data

* transfer MakePenScalarArray MakePtenScalar ResetHolderWithType

* pass the compile. the next step is remove VarType in Pten

* fix all and remove VarType from pten. success in linux. Next task is other platform

* fix conflict with develop

* fix compiled error

* Fix reset conversion

* fix conflict

* fix compiled problem

* fix typo

* Fix << in tensor_utils.cc

* fix type->dtype

* fix unittest

* fix tensor init constructor

* fix DataTypeSize for BFloat16

* fix code style

* fix npu compiled error

* fix npu

* compile npu successfully

* fix conflict

* fix conflict
Co-authored-by: xiongkun <xiongkun03@baidu.com>

11 Mar, 2021 1 commit

solve bug in heter mode (#31531) · 3789a699

Committed by Thunderbrook on Mar 11, 2021

* heter bug

* format

* format

04 Feb, 2021 1 commit

use iwyu clean include second time, test=develop (#30829) · 35c5b23f

Committed by wanghuancoder on Feb 04, 2021

* use iwyu clean include second time, test=develop

27 Nov, 2020 1 commit

add user_define_dump (#28596) · 545df287

Committed by yaoxuefeng on Nov 27, 2020

06 Aug, 2020 1 commit

add heter ps mode (#25682) · 0cb60c70

Committed by Thunderbrook on Aug 06, 2020

* add heter ps mode

* code style
test=develop

* add with_pslib
test=develop

* unitest
test=develop

* code style
test=develop

* code style
test=develop

* code style
test=develop

* code style
test=develop

* code style
test=develop

* code style
test=develop

* code style
test=develop

* code style
test=develop

* test monitor
test=develop

* prepare trainer
test=develop

* code style
test=develop

03 Aug, 2020 1 commit

fix dump, fix cvm check (#25400) · d11c140e

Committed by xujiaqi01 on Aug 03, 2020

* fix dump, fix cvm check
test=develop

* fix
test=develop

* fix
test=develop

* fix
test=develop

19 May, 2020 1 commit

Random Dump (#24477) · 0ec3a42e

Committed by hutuxian on May 19, 2020

* Refactor code for dump_field & dump_param: abstract the common functions into the base class.
* Support random dump & random dump with lineid
* Support specifying the random interval, which avoids printing too many logs.
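
The sampling idea described in this commit message can be pictured with a minimal sketch. It assumes that an interval of N means "dump roughly one record in N" and that a lineid is a plain string; the helper name `should_dump` and the use of CRC32 are illustrative assumptions, not Paddle's actual dump_field code.

```python
import random
import zlib


def should_dump(interval, line_id=None, rng=random):
    """Decide whether to dump one record (hypothetical helper, not a Paddle API).

    interval <= 1 dumps every record; otherwise about one record in `interval`.
    """
    if interval <= 1:
        return True
    if line_id is not None:
        # Hashing the line id makes the decision repeatable for the same record.
        return zlib.crc32(line_id.encode("utf-8")) % interval == 0
    # Without a line id, fall back to plain random sampling.
    return rng.randrange(interval) == 0


if __name__ == "__main__":
    ids = ["ins_%d" % i for i in range(10000)]
    dumped = [i for i in ids if should_dump(100, line_id=i)]
    print(len(dumped), "of", len(ids), "records selected for dumping")
```

Keying the decision on the lineid means the same instance is either always dumped or never dumped, which makes dumps from different runs comparable; the purely random branch only limits log volume.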

01 Apr, 2020 1 commit

add fleet pslib pull and push sparse op and push dense op (#23139) · 3a45767d

Committed by xujiaqi01 on Apr 01, 2020

* add fleet pslib pull and push sparse op and push dense op
* test=develop

02 Feb, 2020 1 commit

add GeneralRoleMaker (#22295) · 371f377b

Committed by xujiaqi01 on Feb 02, 2020

* add GeneralRoleMaker which is for general usage
* test=develop

18 Dec, 2019 1 commit

fix compiled error when with_pslib=on (#21769) · 0eb4d990

Committed by xujiaqi01 on Dec 18, 2019

* fix compiled error of butil when with_pslib=on and with_testing=on
* test=develop

28 Nov, 2019 1 commit

remove -Wno-error=sign-compare, make warning as error (#21358) · c0656dcb

Committed by Tao Luo on Nov 28, 2019

* remove -Wno-error=sign-compare, make warning as error

test=develop test=document_fix

* fix exist compile warning

test=develop

15 Oct, 2019 1 commit

Fix communicator slow bug & fix communicator stop bug (#20366) · 940c6ff1

Committed by Chengmo on Oct 15, 2019

* test=develop,Fix communicator slow bug

* test=develop, delete if() in stop_worker()

* test=develop

* fix UT, test=develop

* fix bug in fetch handler, test=develop

* fix bug in fetch handler, test=develop

* test=develop, fix fetch barrier bug

* test=develop, bug fix

* test=develop, bug fix

* test=develop, fix bug

14 Oct, 2019 1 commit

dump fix dov vec file num (#20539) · f76a32df

Committed by Thunderbrook on Oct 14, 2019

* support dump multi file
test=develop

* dump fix num file
test=develop

24 Sep, 2019 1 commit

support change shuffle and train thread num (#19841) · cedc0477

Committed by xujiaqi01 on Sep 24, 2019

* support change shuffle thread num
* support change train thread num
* fix receive shuffle data of each channel
* data norm stop gradient
* add check thread_tensor type and root_tensor type when merge metric
* remove sleep in shuffle, add config
* add config of pslib client to client communication
* fix xbox str
* add data norm op testcase
* add flush in trainer finalize

30 Aug, 2019 1 commit

add thread scope stat accurate metrics test=develop (#19480) · 10ca3f96

Committed by yaoxuefeng on Aug 30, 2019

* add thread scope stat accurate metrics test=develop

* fix style

* fix style

* fix style

* fix style test=develop

* fix style test=develop

* fix style test=develop

* fix style test=develop

* fix style test=develop

* fix style test=develop

* fix style test=develop

* fix conflict

* fix style

* fix style test=develop

* fix error test=develop

* fix error test=develop

29 Aug, 2019 1 commit

support debug each output of each ins (#19004) · 1fe468d3

Committed by Thunderbrook on Aug 29, 2019

* dump slot

* test

* proto

* dump slot

* test

* proto

* code style

* code style

* code style

* style

* add delete after unseen days

* add unseen days

* code style

* conflict solve
test=develop

* add clear model

* code style
test=develop

* code style
test=develop

* support debug tensor of each ins
test=develop

* support debug tensor of each ins
test=develop

* learning rate

* code style

* code style

* code style

* code style

* code style

* code style

* code style

* code style

* code style

* code style

* code style

* code style

* code style
test=develop

* code style
test=develop

* unitest

* style

* style

* multi phase

* add channel

* code style

* style

* style

* unitest

* style

* define

* define
test=develop

* style
test=develop

* rm define
test=develop

* linux

* linux
test=develop

* style
test=develop

* output format
test=develop

* windows ci
test=develop

21 Jun, 2019 1 commit

dataset (#17973) · 3f8031e2

Committed by jiaqi on Jun 21, 2019

(1) use channel instead of vector/BlockingQueue in Dataset, to stay consistent with the existing implementation and make the code more readable and flexible (dataset single output channel or multi output channel). one previous memory out of limit problem was caused by not releasing memory after training.
(2) add Record because MultiSlotType costs too much memory (80B), which fixes the memory out of limit problem.
(3) add Channel, Archive in paddle/fluid/framework
(4) change dataset from shared_ptr to unique_ptr in pybind
(5) move create/destroy readers from trainer to dataset
(6) move shuffle from datafeed to dataset. dataset holds memory, datafeed is only for load data and feed data to network.
(7) fix thread num bug of Dataset when filelist size < thread num
(8) support set_queue_num in InMemoryDataset
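
As a rough illustration of why a channel is a convenient replacement for a vector or blocking queue in item (1), the sketch below shows a closable FIFO channel feeding consumer threads. It is a simplified Python stand-in under stated assumptions: the class `Channel`, its methods, and the `consume` helper are hypothetical and do not mirror the C++ Channel added in paddle/fluid/framework.

```python
import threading
from collections import deque


class Channel:
    """Closable FIFO channel (illustrative only; None is reserved as the end marker)."""

    def __init__(self, capacity=None):
        self._items = deque()
        self._capacity = capacity  # None means unbounded
        self._closed = False
        self._cond = threading.Condition()

    def put(self, item):
        """Block while the channel is full; return False if it is closed."""
        with self._cond:
            while (not self._closed and self._capacity is not None
                   and len(self._items) >= self._capacity):
                self._cond.wait()
            if self._closed:
                return False
            self._items.append(item)
            self._cond.notify_all()
            return True

    def get(self):
        """Block until an item arrives; return None once closed and drained."""
        with self._cond:
            while not self._items and not self._closed:
                self._cond.wait()
            if not self._items:
                return None
            item = self._items.popleft()
            self._cond.notify_all()
            return item

    def close(self):
        with self._cond:
            self._closed = True
            self._cond.notify_all()


def consume(channel, out):
    """Read records until the channel is closed and empty."""
    while True:
        record = channel.get()
        if record is None:
            break
        out.append(record)


if __name__ == "__main__":
    ch = Channel(capacity=4)
    results = []
    workers = [threading.Thread(target=consume, args=(ch, results)) for _ in range(2)]
    for w in workers:
        w.start()
    for i in range(10):
        ch.put({"ins_id": i})
    ch.close()
    for w in workers:
        w.join()
    print(sorted(r["ins_id"] for r in results))
```

Because consumers stop on their own once the channel is closed and drained, the producer does not need to know how many readers exist, which is one way a single-output or multi-output layout can share the same code path.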

29 Mar, 2019 15 commits

move root_scope->DropKids() into Finalize() so that we do not have to drop all the kids · ba15d6b1

Committed by dongdaxiang on Mar 24, 2019

test=develop

support multi dataset && add init model && fix bug · a5b1a0e1

Committed by xujiaqi01 on Mar 20, 2019

add trainfileswithprofiler for downpour worker · 6af697ad

Committed by dongdaxiang on Mar 15, 2019

add comment for MPI Symetric role maker · 2644b886

Committed by dongdaxiang on Mar 14, 2019

test=develop

add distributed optimizer factory · cf45c543

Committed by dongdaxiang on Mar 13, 2019

fix bug && add DestroyReaders in trainer · 39449ba0

Committed by xujiaqi01 on Mar 13, 2019

refactor downpour optimization · 328f11b8

Committed by dongdaxiang on Mar 12, 2019

test=develop

fix data reading bugs in api, add VLOG(3) log for setup · b66f0074

Committed by dongdaxiang on Mar 10, 2019

make Dataset* as an argument · b415ec27

Committed by dongdaxiang on Mar 09, 2019

modify c++ and python dataset related code & fix bug · dd67ad08

Committed by xjqbest on Mar 09, 2019

add RunFromDataset in executor · 24863897

Committed by dongdaxiang on Mar 08, 2019

add DataSet and InMemoryDataFeed, support load data into memory and shuffle data · 824b84d1

Committed by xjqbest on Mar 06, 2019

fix class register problem · 39014b9f

Committed by dongdaxiang on Feb 02, 2019

refine device_worker and trainer code · c1650120

Committed by dongdaxiang on Feb 02, 2019

test=develop

add dist_multi_trainer for distributed training, add trainer_factory and... · 855bf579

Committed by dongdaxiang on Jan 28, 2019

add dist_multi_trainer for distributed training, add trainer_factory and device_worker_factory so that we can easily extend new training mode, add pull dense worker which is a singleton for parameter fetching
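
The last commit message above names two patterns: factories that make new training modes easy to add, and a singleton worker for parameter fetching. A minimal sketch of both follows, assuming a name-to-class registry; every class and function here (`register_worker`, `create_worker`, the two worker classes, `PullDenseWorker.get_instance`) is a hypothetical stand-in and not the C++ code from that commit.

```python
class DeviceWorker:
    """Base class for the illustrative workers below."""

    def train_files(self):
        raise NotImplementedError


_WORKER_REGISTRY = {}  # maps a worker name to its class


def register_worker(name):
    """Class decorator that records a worker class under `name`."""
    def decorator(cls):
        _WORKER_REGISTRY[name] = cls
        return cls
    return decorator


def create_worker(name):
    """Factory: build a worker from its registered name."""
    try:
        return _WORKER_REGISTRY[name]()
    except KeyError:
        raise ValueError("unknown device worker: %s" % name) from None


@register_worker("HogwildWorker")
class HogwildWorker(DeviceWorker):
    def train_files(self):
        return "hogwild"


@register_worker("DownpourWorker")
class DownpourWorker(DeviceWorker):
    def train_files(self):
        return "downpour"


class PullDenseWorker:
    """Process-wide singleton, created lazily on first access."""

    _instance = None

    @classmethod
    def get_instance(cls):
        if cls._instance is None:
            cls._instance = cls()
        return cls._instance


if __name__ == "__main__":
    print(create_worker("DownpourWorker").train_files())
    print(PullDenseWorker.get_instance() is PullDenseWorker.get_instance())
```

With a registry like this, adding a training mode is a matter of defining one more decorated class; the code that selects a worker by name does not change. The lazy singleton shown here is not thread-safe, which a real implementation would need to address.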

机器未来 / Paddle is in sync with the fork source project
