Unverified commit b969c32a, authored by Chen Weihang, committed by GitHub

fix occupied 0 device memory bug (#28771)

Parent d6aee759
@@ -104,6 +104,12 @@ void BufferedReader::ReadAsync(size_t i) {
 std::vector<void *> cuda_pinned_ptrs;
 cuda_pinned_ptrs.reserve(cpu.size());
 platform::RecordEvent record_event("BufferedReader:MemoryCopy");
+// NOTE(chenweihang): When we use CUDAPinned memory, we need to call
+// cudaHostAlloc, which is a CUDA API. Calling a CUDA API loads the CUDA
+// library onto the device, which costs hundreds of MB of GPU memory.
+// If we don't set the device here, CUDAPlace(0) is used by default.
+platform::SetDeviceId(
+    BOOST_GET_CONST(platform::CUDAPlace, place_).device);
 for (size_t i = 0; i < cpu.size(); ++i) {
   if (platform::is_cpu_place(cpu[i].place())) {
     cuda[i].Resize(cpu[i].dims());
......
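
The mechanism described in the code comment can be illustrated with a toy model. The sketch below is not CUDA and not Paddle code: `FakeRuntime`, `ReadWithoutFix`, and `ReadWithFix` are hypothetical names introduced only for illustration. It assumes, as the comment states, that a runtime call made on a thread that never selected a device lands on device 0, and that the first such call is what makes the device hold a context (and therefore memory).

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Toy stand-in for a GPU runtime. Every name here is hypothetical.
class FakeRuntime {
 public:
  // Models selecting the current device for the calling thread.
  void SetDevice(int id) { current_ = id; }

  // Models a pinned host allocation: any runtime call made while a
  // device is current initializes a context on that device.
  std::vector<char> HostAlloc(std::size_t bytes) {
    initialized_.insert(current_);
    return std::vector<char>(bytes);
  }

  // True if some earlier call initialized a context on device `id`.
  bool HasContext(int id) const { return initialized_.count(id) > 0; }

 private:
  int current_ = 0;            // device 0 is the default
  std::set<int> initialized_;  // devices that now hold a context
};

// Reader thread without the fix: it never selects its target device,
// so the allocation touches device 0.
void ReadWithoutFix(FakeRuntime& rt, int /*target_device*/) {
  rt.HostAlloc(1024);
}

// Reader thread with the fix: select the target device first, so the
// allocation initializes a context only on that device.
void ReadWithFix(FakeRuntime& rt, int target_device) {
  rt.SetDevice(target_device);
  rt.HostAlloc(1024);
}
```

In this model, calling `ReadWithoutFix(rt, 1)` leaves a context on device 0 and none on device 1, while `ReadWithFix(rt, 1)` does the opposite. That mirrors the intent of the added `platform::SetDeviceId` call.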