Unverified commit a3e77197, authored by liuyuhui, committed by GitHub

[Kunlun]fix multi xpu dygraph hang, test=kunlun (#32662)

Parent 0f578db9
......@@ -762,10 +762,11 @@ void Reducer::MarkGroupReady(size_t group_index) {
   // TODO(liuyuhui): Add try catch to deal with exception later,
   // otherwise the main thread will continue to run when an exception is
   // thrown in comm_pool_.
-  comm_pool_->enqueue([&] {
+  auto next_group = next_group_;
+  comm_pool_->enqueue([this, run_order, next_group, &group] {
     auto dev_id = BOOST_GET_CONST(platform::XPUPlace, place_).device;
     platform::SetXPUDeviceId(dev_id);
-    FusedAllReduceSchedule(run_order, group, next_group_);
+    FusedAllReduceSchedule(run_order, group, next_group);
     {
       std::lock_guard<std::mutex> lock(mutex_);
       comm_op_count_ -= 1;  // lock
......
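
The commit message does not say why the hang occurred. The diff itself only shows that the enqueued lambda stops using a blanket `[&]` capture and instead receives a copy of `next_group_` taken before the task is queued. The sketch below illustrates that general difference and is not Paddle code: `MiniReducer`, `queue_`, `EnqueueReadingMember`, `EnqueueWithSnapshot`, and `RunDemo` are hypothetical names, and a plain vector of tasks stands in for the worker pool so the outcome is deterministic.

```cpp
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for Reducer; only next_group_ mirrors a name from the diff.
struct MiniReducer {
  int next_group_ = 0;
  std::vector<std::function<int()>> queue_;  // stands in for a worker pool

  // Old pattern: the task reaches next_group_ through `this`, so it reads
  // whatever value the member holds when the task finally runs.
  void EnqueueReadingMember() {
    queue_.push_back([this] { return next_group_; });
  }

  // New pattern: copy the member first, then capture the copy by value.
  void EnqueueWithSnapshot() {
    auto next_group = next_group_;
    queue_.push_back([next_group] { return next_group; });
  }
};

// Returns {value seen by the member-reading task, value seen by the snapshot task}.
std::pair<int, int> RunDemo() {
  MiniReducer r;
  r.next_group_ = 3;
  r.EnqueueReadingMember();
  r.EnqueueWithSnapshot();
  r.next_group_ = 4;  // the caller advances before the queued tasks run
  int seen_by_member_read = r.queue_[0]();
  int seen_by_snapshot = r.queue_[1]();
  return {seen_by_member_read, seen_by_snapshot};
}
```

Here the member-reading task observes the updated value while the snapshot task keeps the value from enqueue time. With a real thread pool the first pattern also becomes an unsynchronized read of state that another thread may be writing, which is one reason explicit by-value captures are usually preferred for tasks that outlive the enqueuing call.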