Unverified  Commit df39d157  Author: Fan Zhang  Committer: GitHub

bugfix - modify ps_gpu_wrapper.cc by pre-commit (#42376)

* XPUPS Adaptation (#40991)

* Adapt XPUPS - 1st version - 3.24

* Adapt XPUPS - update XPU PushSparse -  2nd version - 3.24

* Adapt XPUPS - add XPU PullSparseOp - 3nd version - 3.25

* refactor heter comm kernel

* update. test=develop

* Adapt XPUPS - modify by compilation - 4th version - 3.27

* update calc_shard_offset. test=develop

* update xpu kernel. test=develop

* update args of calc_shard_offset

* update. test=develop

* remove customGradMerger

* update. test=develop

* heter_comm update

* heter_comm update

* update calc_shard_offset. test=develop

* heter_comm update

* update args of calc_shard_offset

* update. test=develop

* remove customGradMerger

* update. test=develop

* fix. test=develop

* update. test=develop

* update. test=develop

* update optimizer kernel

* Adapt XPUPS - use WITH_XPU_KP and modify wrapper kernel function - 5th version - 3.30

* update. test=develop

* update pslib.cmake

* update. test=develop

* update. test=develop

* update. test=develop

* update. test=develop

* update. test=develop

* Adapt XPUPS - modify by kp compilation  - 6th version - 3.30

* update. test=develop

* update. test=develop

* update. test=develop

* update optimizer kernel

* update. test=develop

* update. test=develop

* update. test=develop

* update. test=develop

* update. test=develop

* update. test=develop

* update. test=develop

* update. test=develop

* fix. test=develop

* fix. test=develop

* used by minxu

* update heter_comm_inl

* fix. test=develop

* Adapt XPUPS - modify by kp compilation  - 7th version - 3.30

* fix. test=develop

* add optimizer kernel. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* 3.31 update

* Adapt XPUPS - update kp compilation path  - 8th version - 3.31

* add optimizer kernel. test=develop

* fix kunlun not support size_t. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix kunlun not support size_t. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* update heter_comm_kernel.kps 3.31

* fix. test=develop

* fix. test=develop

* update heter_comm_kernel.kps 3.31

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* update heter_comm.h 3.31

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* update hashtable. test=develop

* update. test=develop

* Adapt XPUPS - update by kp compilation  - 9th version - 4.1

* update hashtable. test=develop

* fix. test=develop

* update hashtable 4.1

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* Adapt XPUPS - update by kp compilation  - 10th version - 4.1

* fix. test=develop

* fix. test=develop

* fix. test=develop

* update. test=develop

* modify by compilation 4.1

* update. test=develop

* update. test=develop

* fix. test=develop

* modify by compilation 4.1

* update. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* modify by compilation 4.1

* fix. test=develop

* fix. test=develop

* fix. test=develop

* modify by compilation 4.1 19:30

* fix. test=develop

* update ps_gpu_wrapper.kps 4.1

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* Adapt XPUPS - update by kp compilation  - 11th version - 4.1

* fix. test=develop

* Adapt XPUPS - update by kp compilation  - 12nd version - 4.2

* fix. test=develop

* fix. test=develop

* modify by compilation 4.2

* 4.2 update

* fix. test=develop

* template init. test=develop

* update 4.6

* fix. test=develop

* template init. test=develop

* 4.6 modify by compilation

* hashtable template init. test=develop

* hashtable template init. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=devlop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=devlop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* Adapt XPUPS - update by kp compilation  - 13nd version - 4.7

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* 4.11 update

* fix. test=develop

* fix. test=develop

* 4.11 update

* update by pre-commit

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* 4.12 update

* fix. test=develop

* Adapt XPUPS - update by kp compilation  - 14th version - 4.13

* 4.13 update

* 4.14 update

* 4.14 update

* 4.14 update

* 4.14 modify by merged latest compilation

* retry CI 4.14

* 4.15 pass static check

* 4.15 modify by gpups CI

* 3.16 update by gpups CI - modify ps_gpu_wrapper.h

* 4.16 update

* 4.16 pass xpu compile

* 4.16 retry CI

* 4.16 update
Co-authored-by: Nzmxdream <zhangminxu01@baidu.com>

* modify ps_gpu_wrapper.cc

* update

* Adapt BKCL comm for XPUPS (#42168)

* Adapt XPUPS - 1st version - 3.24

* Adapt XPUPS - update XPU PushSparse -  2nd version - 3.24

* Adapt XPUPS - add XPU PullSparseOp - 3nd version - 3.25

* refactor heter comm kernel

* update. test=develop

* Adapt XPUPS - modify by compilation - 4th version - 3.27

* update calc_shard_offset. test=develop

* update xpu kernel. test=develop

* update args of calc_shard_offset

* update. test=develop

* remove customGradMerger

* update. test=develop

* heter_comm update

* heter_comm update

* update calc_shard_offset. test=develop

* heter_comm update

* update args of calc_shard_offset

* update. test=develop

* remove customGradMerger

* update. test=develop

* fix. test=develop

* update. test=develop

* update. test=develop

* update optimizer kernel

* Adapt XPUPS - use WITH_XPU_KP and modify wrapper kernel function - 5th version - 3.30

* update. test=develop

* update pslib.cmake

* update. test=develop

* update. test=develop

* update. test=develop

* update. test=develop

* update. test=develop

* Adapt XPUPS - modify by kp compilation  - 6th version - 3.30

* update. test=develop

* update. test=develop

* update. test=develop

* update optimizer kernel

* update. test=develop

* update. test=develop

* update. test=develop

* update. test=develop

* update. test=develop

* update. test=develop

* update. test=develop

* update. test=develop

* fix. test=develop

* fix. test=develop

* used by minxu

* update heter_comm_inl

* fix. test=develop

* Adapt XPUPS - modify by kp compilation  - 7th version - 3.30

* fix. test=develop

* add optimizer kernel. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* 3.31 update

* Adapt XPUPS - update kp compilation path  - 8th version - 3.31

* add optimizer kernel. test=develop

* fix kunlun not support size_t. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix kunlun not support size_t. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* update heter_comm_kernel.kps 3.31

* fix. test=develop

* fix. test=develop

* update heter_comm_kernel.kps 3.31

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* update heter_comm.h 3.31

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* update hashtable. test=develop

* update. test=develop

* Adapt XPUPS - update by kp compilation  - 9th version - 4.1

* update hashtable. test=develop

* fix. test=develop

* update hashtable 4.1

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* Adapt XPUPS - update by kp compilation  - 10th version - 4.1

* fix. test=develop

* fix. test=develop

* fix. test=develop

* update. test=develop

* modify by compilation 4.1

* update. test=develop

* update. test=develop

* fix. test=develop

* modify by compilation 4.1

* update. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* modify by compilation 4.1

* fix. test=develop

* fix. test=develop

* fix. test=develop

* modify by compilation 4.1 19:30

* fix. test=develop

* update ps_gpu_wrapper.kps 4.1

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* Adapt XPUPS - update by kp compilation  - 11th version - 4.1

* fix. test=develop

* Adapt XPUPS - update by kp compilation  - 12nd version - 4.2

* fix. test=develop

* fix. test=develop

* modify by compilation 4.2

* 4.2 update

* fix. test=develop

* template init. test=develop

* update 4.6

* fix. test=develop

* template init. test=develop

* 4.6 modify by compilation

* hashtable template init. test=develop

* hashtable template init. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=devlop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=devlop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* Adapt XPUPS - update by kp compilation  - 13nd version - 4.7

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* 4.11 update

* fix. test=develop

* fix. test=develop

* 4.11 update

* update by pre-commit

* fix. test=develop

* fix. test=develop

* fix. test=develop

* fix. test=develop

* 4.12 update

* fix. test=develop

* Adapt XPUPS - update by kp compilation  - 14th version - 4.13

* 4.13 update

* 4.14 update

* 4.14 update

* 4.14 update

* 4.14 modify by merged latest compilation

* retry CI 4.14

* 4.15 pass static check

* 4.15 modify by gpups CI

* 3.16 update by gpups CI - modify ps_gpu_wrapper.h

* 4.16 update

* 4.16 pass xpu compile

* 4.16 retry CI

* 4.16 update

* Adapt XPUPS - adapt BKCL comm for XPUPS - 4.24

* update by compilation

* Adapt XPUPS - register PSGPUTrainer for XPUPS - 4.25

* update device_worker_factory
Co-authored-by: Nzmxdream <zhangminxu01@baidu.com>

* update

* update CMakeLists

* bugfix heter_resource 4.28

* modify ps_gpu_wrapper
Co-authored-by: Nzmxdream <zhangminxu01@baidu.com>
Parent e0e534ab
@@ -497,114 +497,113 @@ void PSGPUWrapper::BuildPull(std::shared_ptr<HeterContext> gpu_task) {
    }
#endif
  };
  auto build_func = [device_num, record_status, &pass_values, &local_keys,
                     &local_ptr, &device_keys, &device_vals,
                     &device_mutex](int i) {
    std::vector<std::vector<FeatureKey>> task_keys(device_num);
#ifdef PADDLE_WITH_PSLIB
    std::vector<std::vector<paddle::ps::DownpourFixedFeatureValue*>> task_ptrs(
        device_num);
#endif
#ifdef PADDLE_WITH_PSCORE
    std::vector<std::vector<paddle::distributed::FixedFeatureValue*>> task_ptrs(
        device_num);
#endif
    for (size_t j = 0; j < local_keys[i].size(); j++) {
      int shard = local_keys[i][j] % device_num;
      task_keys[shard].push_back(local_keys[i][j]);
      task_ptrs[shard].push_back(local_ptr[i][j]);
    }
#ifdef PADDLE_WITH_PSLIB
    if (record_status) {
      size_t local_keys_size = local_keys.size();
      size_t pass_values_size = pass_values.size();
      for (size_t j = 0; j < pass_values_size; j += local_keys_size) {
        auto& shard_values = pass_values[j];
        for (size_t pair_idx = 0; pair_idx < pass_values[j].size();
             pair_idx++) {
          auto& cur_pair = shard_values[pair_idx];
          int shard = cur_pair.first % device_num;
          task_keys[shard].push_back(cur_pair.first);
          task_ptrs[shard].push_back(
              (paddle::ps::DownpourFixedFeatureValue*)cur_pair.second);
        }
      }
    }
#endif
    for (int dev = 0; dev < device_num; dev++) {
      device_mutex[dev]->lock();
      int len = task_keys[dev].size();
      int cur = device_keys[dev].size();
      device_keys[dev].resize(device_keys[dev].size() + len);
      device_vals[dev].resize(device_vals[dev].size() + len);
#ifdef PADDLE_WITH_PSLIB
      for (int j = 0; j < len; ++j) {
        device_keys[dev][cur + j] = task_keys[dev][j];
        float* ptr_val = task_ptrs[dev][j]->data();
        FeatureValue& val = device_vals[dev][cur + j];
        size_t dim = task_ptrs[dev][j]->size();
        val.delta_score = ptr_val[1];
        val.show = ptr_val[2];
        val.clk = ptr_val[3];
        val.slot = ptr_val[6];
        val.lr = ptr_val[4];
        val.lr_g2sum = ptr_val[5];
        val.cpu_ptr = (uint64_t)(task_ptrs[dev][j]);
        if (dim > 7) {
          val.mf_size = MF_DIM + 1;
          for (int x = 0; x < val.mf_size; x++) {
            val.mf[x] = ptr_val[x + 7];
          }
        } else {
          val.mf_size = 0;
          for (int x = 0; x < MF_DIM + 1; x++) {
            val.mf[x] = 0;
          }
        }
      }
#endif
#ifdef PADDLE_WITH_PSCORE
      for (int j = 0; j < len; ++j) {
        device_keys[dev][cur + j] = task_keys[dev][j];
        float* ptr_val = task_ptrs[dev][j]->data();
        FeatureValue& val = device_vals[dev][cur + j];
        size_t dim = task_ptrs[dev][j]->size();
        val.delta_score = ptr_val[2];
        val.show = ptr_val[3];
        val.clk = ptr_val[4];
        val.slot = ptr_val[0];
        val.lr = ptr_val[5];
        val.lr_g2sum = ptr_val[6];
        val.cpu_ptr = (uint64_t)(task_ptrs[dev][j]);
        if (dim > 7) {
          val.mf_size = MF_DIM + 1;
          for (int x = 0; x < val.mf_size; x++) {
            val.mf[x] = ptr_val[x + 7];
          }
        } else {
          val.mf_size = 0;
          for (int x = 0; x < MF_DIM + 1; x++) {
            val.mf[x] = 0;
          }
        }
      }
#endif
      VLOG(3) << "GpuPs build hbmps done";
    }
  };
  if (!multi_mf_dim_) {
    for (size_t i = 0; i < threads.size(); i++) {
      threads[i] = std::thread(build_func, i);
    }
  } else {
    for (int i = 0; i < thread_keys_shard_num_; i++) {
      for (int j = 0; j < multi_mf_dim_; j++) {
        threads[i * multi_mf_dim_ + j] =
......
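The build_func lambda in this hunk buckets each thread's keys by key % device_num and then appends every bucket to a shared per-device vector while holding that device's mutex. The pattern can be sketched in isolation as below. This is a minimal illustration and not the PaddlePaddle implementation: ShardKeysByDevice is a hypothetical helper, FeatureKey is assumed to be a 64-bit unsigned integer, only keys are handled (no feature values), and std::lock_guard is swapped in for the explicit lock() call seen in the diff.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Assumption: keys are 64-bit unsigned integers.
using FeatureKey = uint64_t;

// Hypothetical helper (not part of Paddle): one worker thread per entry of
// local_keys. Each worker buckets its keys by key % device_num, then appends
// each bucket to the shared per-device vector under that device's mutex.
std::vector<std::vector<FeatureKey>> ShardKeysByDevice(
    const std::vector<std::vector<FeatureKey>>& local_keys, int device_num) {
  assert(device_num > 0);
  std::vector<std::vector<FeatureKey>> device_keys(device_num);
  std::vector<std::mutex> device_mutex(device_num);

  auto build_func = [&](size_t i) {
    // Thread-private buckets, so no locking is needed while sharding.
    std::vector<std::vector<FeatureKey>> task_keys(device_num);
    for (FeatureKey key : local_keys[i]) {
      task_keys[key % device_num].push_back(key);
    }
    // Merge into the shared vectors one device at a time.
    for (int dev = 0; dev < device_num; ++dev) {
      std::lock_guard<std::mutex> guard(device_mutex[dev]);
      auto& dst = device_keys[dev];
      dst.insert(dst.end(), task_keys[dev].begin(), task_keys[dev].end());
    }
  };

  std::vector<std::thread> threads;
  threads.reserve(local_keys.size());
  for (size_t i = 0; i < local_keys.size(); ++i) {
    threads.emplace_back(build_func, i);
  }
  for (auto& t : threads) {
    t.join();
  }
  return device_keys;
}
```

The order of keys inside each device vector depends on thread scheduling. The RAII guard releases the mutex at the end of every loop iteration, including when an exception is thrown.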