PaddlePaddle / DeepSpeech · Issue #309
Opened March 08, 2019 by saxon_zh (@saxon_zh, Guest)

Always OOM problem, even if batch_size==1

Created by: myrainbowandsky

python3 -u ./DeepSpeech.py \
  --train_files /home/sky-ai/xwt/DeepSpeech/data/train/train.csv \
  --dev_files /home/sky-ai/xwt/DeepSpeech/data/cv/cv.csv \
  --test_files /home/sky/sky-ai/xwt/DeepSpeech/data/test/test.csv \
  --train_batch_size 24 \
  --dev_batch_size 15 \
  --test_batch_size 20 \
  --epoch 20 \
  --display_step 1 \
  --validation_step 1 \
  --dropout_rate 0.30 \
  --default_stddev 0.046875 \
  --learning_rate 0.0001 \
  --log_level 0


2019-03-08 16:25:17.113383: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA 2019-03-08 16:25:17.213002: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties: name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.645 pciBusID: 0000:17:00.0 totalMemory: 10.92GiB freeMemory: 10.77GiB 2019-03-08 16:25:17.274068: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 1 with properties: name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.645 pciBusID: 0000:65:00.0 totalMemory: 10.92GiB freeMemory: 10.57GiB 2019-03-08 16:25:17.274882: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0, 1 2019-03-08 16:25:17.936644: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix: 2019-03-08 16:25:17.936675: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0 1 2019-03-08 16:25:17.936680: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N Y 2019-03-08 16:25:17.936683: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 1: Y N 2019-03-08 16:25:17.937213: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/device:GPU:0 with 10419 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:17:00.0, compute capability: 6.1) 2019-03-08 16:25:17.937478: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/device:GPU:1 with 10226 MB memory) -> physical GPU (device: 1, name: GeForce GTX 1080 Ti, pci bus id: 0000:65:00.0, compute capability: 6.1) D Starting coordinator… D Coordinator started. 
Thread id 140459615708928 Preprocessing [’/home/sky-ai/xwt/DeepSpeech/data/train/train.csv’] Preprocessing done Preprocessing [’/home/sky-ai/xwt/DeepSpeech/data/cv/cv.csv’] Preprocessing done W Parameter --validation_step needs to be >0 for early stopping to work 2019-03-08 16:26:26.961329: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0, 1 2019-03-08 16:26:26.961413: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix: 2019-03-08 16:26:26.961419: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0 1 2019-03-08 16:26:26.961424: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N Y 2019-03-08 16:26:26.961428: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 1: Y N 2019-03-08 16:26:26.961956: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10419 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:17:00.0, compute capability: 6.1) 2019-03-08 16:26:26.962068: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 10226 MB memory) -> physical GPU (device: 1, name: GeForce GTX 1080 Ti, pci bus id: 0000:65:00.0, compute capability: 6.1) 2019-03-08 16:26:28.980208: E tensorflow/stream_executor/cuda/cuda_driver.cc:868] failed to alloc 8589934592 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory 2019-03-08 16:26:28.980297: W ./tensorflow/core/common_runtime/gpu/cuda_host_allocator.h:44] could not allocate pinned host memory of size: 8589934592 2019-03-08 16:26:28.980344: E tensorflow/stream_executor/cuda/cuda_driver.cc:868] failed to alloc 7730940928 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory 2019-03-08 16:26:28.980353: W ./tensorflow/core/common_runtime/gpu/cuda_host_allocator.h:44] could not allocate pinned host memory of size: 7730940928 2019-03-08 16:26:28.980392: E tensorflow/stream_executor/cuda/cuda_driver.cc:868] failed to alloc 6957846528 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory 2019-03-08 16:26:28.980402: W ./tensorflow/core/common_runtime/gpu/cuda_host_allocator.h:44] could not allocate pinned host memory of size: 6957846528 2019-03-08 16:26:28.980433: E tensorflow/stream_executor/cuda/cuda_driver.cc:868] failed to alloc 6262061568 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory 2019-03-08 16:26:28.980440: W ./tensorflow/core/common_runtime/gpu/cuda_host_allocator.h:44] could not allocate pinned host memory of size: 6262061568 2019-03-08 16:26:28.980464: E tensorflow/stream_executor/cuda/cuda_driver.cc:868] failed to alloc 5635855360 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory 2019-03-08 16:26:28.980471: W ./tensorflow/core/common_runtime/gpu/cuda_host_allocator.h:44] could not allocate pinned host memory of size: 5635855360 2019-03-08 16:26:28.980494: E tensorflow/stream_executor/cuda/cuda_driver.cc:868] failed to alloc 5072269824 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory 2019-03-08 16:26:28.980501: W ./tensorflow/core/common_runtime/gpu/cuda_host_allocator.h:44] could not allocate pinned host memory of size: 5072269824 2019-03-08 16:26:28.980526: E tensorflow/stream_executor/cuda/cuda_driver.cc:868] failed to alloc 4565042688 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory 2019-03-08 16:26:28.980532: W ./tensorflow/core/common_runtime/gpu/cuda_host_allocator.h:44] could not allocate pinned host 
memory of size: 4565042688 2019-03-08 16:26:28.980551: E tensorflow/stream_executor/cuda/cuda_driver.cc:868] failed to alloc 4108538368 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory 2019-03-08 16:26:28.980556: W ./tensorflow/core/common_runtime/gpu/cuda_host_allocator.h:44] could not allocate pinned host memory of size: 4108538368 2019-03-08 16:26:28.980572: E tensorflow/stream_executor/cuda/cuda_driver.cc:868] failed to alloc 3697684480 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory 2019-03-08 16:26:28.980577: W ./tensorflow/core/common_runtime/gpu/cuda_host_allocator.h:44] could not allocate pinned host memory of size: 3697684480 2019-03-08 16:26:28.980602: E tensorflow/stream_executor/cuda/cuda_driver.cc:868] failed to alloc 8589934592 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory 2019-03-08 16:26:28.980607: W ./tensorflow/core/common_runtime/gpu/cuda_host_allocator.h:44] could not allocate pinned host memory of size: 8589934592 2019-03-08 16:26:38.980783: E tensorflow/stream_executor/cuda/cuda_driver.cc:868] failed to alloc 8589934592 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory 2019-03-08 16:26:38.980838: W ./tensorflow/core/common_runtime/gpu/cuda_host_allocator.h:44] could not allocate pinned host memory of size: 8589934592 2019-03-08 16:26:38.980875: E tensorflow/stream_executor/cuda/cuda_driver.cc:868] failed to alloc 8589934592 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory 2019-03-08 16:26:38.980886: W ./tensorflow/core/common_runtime/gpu/cuda_host_allocator.h:44] could not allocate pinned host memory of size: 8589934592 2019-03-08 16:26:38.980901: W tensorflow/core/common_runtime/bfc_allocator.cc:267] Allocator (cuda_host_bfc) ran out of memory trying to allocate 3.33GiB. Current allocation summary follows. 2019-03-08 16:26:38.980918: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (256): Total Chunks: 3, Chunks in use: 3. 768B allocated for chunks. 768B in use in bin. 12B client-requested in use in bin. 2019-03-08 16:26:38.980931: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (512): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin. 2019-03-08 16:26:38.980942: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (1024): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin. 2019-03-08 16:26:38.980954: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (2048): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin. 2019-03-08 16:26:38.980964: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (4096): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin. 2019-03-08 16:26:38.980979: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (8192): Total Chunks: 12, Chunks in use: 12. 96.0KiB allocated for chunks. 96.0KiB in use in bin. 96.0KiB client-requested in use in bin. 2019-03-08 16:26:38.980990: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (16384): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin. 2019-03-08 16:26:38.981003: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (32768): Total Chunks: 3, Chunks in use: 3. 96.0KiB allocated for chunks. 96.0KiB in use in bin. 96.0KiB client-requested in use in bin. 
2019-03-08 16:26:38.981014: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (65536): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin. 2019-03-08 16:26:38.981025: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (131072): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin. 2019-03-08 16:26:38.981036: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (262144): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin. 2019-03-08 16:26:38.981049: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (524288): Total Chunks: 1, Chunks in use: 0. 831.2KiB allocated for chunks. 0B in use in bin. 0B client-requested in use in bin. 2019-03-08 16:26:38.981061: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (1048576): Total Chunks: 3, Chunks in use: 3. 5.00MiB allocated for chunks. 5.00MiB in use in bin. 5.00MiB client-requested in use in bin. 2019-03-08 16:26:38.981075: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (2097152): Total Chunks: 3, Chunks in use: 3. 11.58MiB allocated for chunks. 11.58MiB in use in bin. 11.58MiB client-requested in use in bin. 2019-03-08 16:26:38.981086: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (4194304): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin. 2019-03-08 16:26:38.981097: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (8388608): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin. 2019-03-08 16:26:38.981110: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (16777216): Total Chunks: 9, Chunks in use: 9. 144.00MiB allocated for chunks. 144.00MiB in use in bin. 144.00MiB client-requested in use in bin. 2019-03-08 16:26:38.981122: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (33554432): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin. 2019-03-08 16:26:38.981132: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (67108864): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin. 2019-03-08 16:26:38.981146: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (134217728): Total Chunks: 4, Chunks in use: 3. 524.24MiB allocated for chunks. 384.00MiB in use in bin. 384.00MiB client-requested in use in bin. 2019-03-08 16:26:38.981158: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (268435456): Total Chunks: 3, Chunks in use: 2. 7.33GiB allocated for chunks. 6.66GiB in use in bin. 6.66GiB client-requested in use in bin. 
2019-03-08 16:26:38.981171: I tensorflow/core/common_runtime/bfc_allocator.cc:613] Bin for 3.33GiB was 256.00MiB, Chunk State: 2019-03-08 16:26:38.981186: I tensorflow/core/common_runtime/bfc_allocator.cc:619] Size: 684.82MiB | Requested Size: 0B | in_use: 0, prev: Size: 3.33GiB | Requested Size: 3.33GiB | in_use: 1 2019-03-08 16:26:38.981198: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fb956000000 of size 3576881152 2019-03-08 16:26:38.981207: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Free at 0x7fba2b32e000 of size 718086144 2019-03-08 16:26:38.981215: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fba74000000 of size 3576881152 2019-03-08 16:26:38.981224: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fbb4932e000 of size 134217728 2019-03-08 16:26:38.981232: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fbb5132e000 of size 134217728 2019-03-08 16:26:38.981240: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fbb5932e000 of size 134217728 2019-03-08 16:26:38.981248: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fbb6132e000 of size 1746688 2019-03-08 16:26:38.981256: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fbb614d8700 of size 1746688 2019-03-08 16:26:38.981264: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fbb61682e00 of size 1746688 2019-03-08 16:26:38.981273: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fbb6182d500 of size 4046848 2019-03-08 16:26:38.981281: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fbb61c09500 of size 4046848 2019-03-08 16:26:38.981289: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fbb61fe5500 of size 4046848 2019-03-08 16:26:38.981297: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fbb623c1500 of size 16777216 2019-03-08 16:26:38.981305: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fbb633c1500 of size 16777216 2019-03-08 16:26:38.981313: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fbb643c1500 of size 16777216 2019-03-08 16:26:38.981321: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fbb653c1500 of size 16777216 2019-03-08 16:26:38.981329: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fbb663c1500 of size 16777216 2019-03-08 16:26:38.981337: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fbb673c1500 of size 16777216 2019-03-08 16:26:38.981345: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fbb683c1500 of size 16777216 2019-03-08 16:26:38.981353: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fbb693c1500 of size 16777216 2019-03-08 16:26:38.981361: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fbb6a3c1500 of size 16777216 2019-03-08 16:26:38.981369: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Free at 0x7fbb6b3c1500 of size 147057408 2019-03-08 16:26:38.981378: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fbf14800000 of size 8192 2019-03-08 16:26:38.981386: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fbf14802000 of size 256 2019-03-08 16:26:38.981394: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fbf14802100 of size 8192 2019-03-08 16:26:38.981402: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fbf14804100 of size 8192 2019-03-08 16:26:38.981410: 
I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fbf14806100 of size 8192 2019-03-08 16:26:38.981418: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fbf14808100 of size 8192 2019-03-08 16:26:38.981426: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fbf1480a100 of size 8192 2019-03-08 16:26:38.981434: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fbf1480c100 of size 8192 2019-03-08 16:26:38.981442: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fbf1480e100 of size 8192 2019-03-08 16:26:38.981450: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fbf14810100 of size 8192 2019-03-08 16:26:38.981458: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fbf14812100 of size 8192 2019-03-08 16:26:38.981466: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fbf14814100 of size 8192 2019-03-08 16:26:38.981474: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fbf14816100 of size 8192 2019-03-08 16:26:38.981485: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fbf14818100 of size 256 2019-03-08 16:26:38.981493: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fbf14818200 of size 256 2019-03-08 16:26:38.981501: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fbf14818300 of size 32768 2019-03-08 16:26:38.981509: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fbf14820300 of size 32768 2019-03-08 16:26:38.981517: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fbf14828300 of size 32768 2019-03-08 16:26:38.981525: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Free at 0x7fbf14830300 of size 851200 2019-03-08 16:26:38.981533: I tensorflow/core/common_runtime/bfc_allocator.cc:638] Summary of in-use Chunks by size: 2019-03-08 16:26:38.981543: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 3 Chunks of size 256 totalling 768B 2019-03-08 16:26:38.981553: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 12 Chunks of size 8192 totalling 96.0KiB 2019-03-08 16:26:38.981562: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 3 Chunks of size 32768 totalling 96.0KiB 2019-03-08 16:26:38.981571: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 3 Chunks of size 1746688 totalling 5.00MiB 2019-03-08 16:26:38.981581: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 3 Chunks of size 4046848 totalling 11.58MiB 2019-03-08 16:26:38.981590: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 9 Chunks of size 16777216 totalling 144.00MiB 2019-03-08 16:26:38.981599: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 3 Chunks of size 134217728 totalling 384.00MiB 2019-03-08 16:26:38.981608: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 2 Chunks of size 3576881152 totalling 6.66GiB 2019-03-08 16:26:38.981617: I tensorflow/core/common_runtime/bfc_allocator.cc:645] Sum Total of in-use chunks: 7.19GiB 2019-03-08 16:26:38.981629: I tensorflow/core/common_runtime/bfc_allocator.cc:647] Stats: Limit: 68719476736 InUse: 7724988416 MaxInUse: 7724988416 NumAllocs: 38 MaxAllocSize: 3576881152

2019-03-08 16:26:38.981646: W tensorflow/core/common_runtime/bfc_allocator.cc:271] _______********* 2019-03-08 16:26:38.982719: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at save_restore_v2_ops.cc:184 : Resource exhausted: OOM when allocating tensor with shape[2048,436631] and type float on /job:localhost/replica:0/task:0/device:CPU:0 by allocator cuda_host_bfc Traceback (most recent call last): File “/home/sky-ai/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py”, line 1334, in _do_call return fn(*args) File “/home/sky-ai/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py”, line 1319, in _run_fn options, feed_dict, fetch_list, target_list, run_metadata) File “/home/sky-ai/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py”, line 1407, in _call_tf_sessionrun run_metadata) tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[2048,436631] and type float on /job:localhost/replica:0/task:0/device:CPU:0 by allocator cuda_host_bfc [[{{node save_1/RestoreV2_1}} = RestoreV2dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, …, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"]] Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

[[{{node save_1/RestoreV2_1/_43}} = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_48_save_1/RestoreV2_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]]

Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

During handling of the above exception, another exception occurred:

Traceback (most recent call last): File “./DeepSpeech.py”, line 964, in tf.app.run(main) File “/home/sky-ai/.local/lib/python3.5/site-packages/tensorflow/python/platform/app.py”, line 125, in run _sys.exit(main(argv)) File “./DeepSpeech.py”, line 916, in main train() File “./DeepSpeech.py”, line 549, in train config=Config.session_config) as session: File “/home/sky-ai/.local/lib/python3.5/site-packages/tensorflow/python/training/monitored_session.py”, line 504, in MonitoredTrainingSession stop_grace_period_secs=stop_grace_period_secs) File “/home/sky-ai/.local/lib/python3.5/site-packages/tensorflow/python/training/monitored_session.py”, line 921, in init stop_grace_period_secs=stop_grace_period_secs) File “/home/sky-ai/.local/lib/python3.5/site-packages/tensorflow/python/training/monitored_session.py”, line 643, in init self._sess = _RecoverableSession(self._coordinated_creator) File “/home/sky-ai/.local/lib/python3.5/site-packages/tensorflow/python/training/monitored_session.py”, line 1107, in init _WrappedSession.init(self, self._create_session()) File “/home/sky-ai/.local/lib/python3.5/site-packages/tensorflow/python/training/monitored_session.py”, line 1112, in _create_session return self._sess_creator.create_session() File “/home/sky-ai/.local/lib/python3.5/site-packages/tensorflow/python/training/monitored_session.py”, line 800, in create_session self.tf_sess = self._session_creator.create_session() File “/home/sky-ai/.local/lib/python3.5/site-packages/tensorflow/python/training/monitored_session.py”, line 566, in create_session init_fn=self._scaffold.init_fn) File “/home/sky-ai/.local/lib/python3.5/site-packages/tensorflow/python/training/session_manager.py”, line 288, in prepare_session config=config) File “/home/sky-ai/.local/lib/python3.5/site-packages/tensorflow/python/training/session_manager.py”, line 218, in _restore_checkpoint saver.restore(sess, ckpt.model_checkpoint_path) File “/home/sky-ai/.local/lib/python3.5/site-packages/tensorflow/python/training/saver.py”, line 1546, in restore {self.saver_def.filename_tensor_name: save_path}) File “/home/sky-ai/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py”, line 929, in run run_metadata_ptr) File “/home/sky-ai/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py”, line 1152, in _run feed_dict_tensor, options, run_metadata) File “/home/sky-ai/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py”, line 1328, in _do_run run_metadata) File “/home/sky-ai/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py”, line 1348, in _do_call raise type(e)(node_def, op, message) tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[2048,436631] and type float on /job:localhost/replica:0/task:0/device:CPU:0 by allocator cuda_host_bfc [[node save_1/RestoreV2_1 (defined at ./DeepSpeech.py:549) = RestoreV2dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, …, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"]] Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

[[{{node save_1/RestoreV2_1/_43}} = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_48_save_1/RestoreV2_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]]

Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

Caused by op ‘save_1/RestoreV2_1’, defined at: File “./DeepSpeech.py”, line 964, in tf.app.run(main) File “/home/sky-ai/.local/lib/python3.5/site-packages/tensorflow/python/platform/app.py”, line 125, in run _sys.exit(main(argv)) File “./DeepSpeech.py”, line 916, in main train() File “./DeepSpeech.py”, line 549, in train config=Config.session_config) as session: File “/home/sky-ai/.local/lib/python3.5/site-packages/tensorflow/python/training/monitored_session.py”, line 504, in MonitoredTrainingSession stop_grace_period_secs=stop_grace_period_secs) File “/home/sky-ai/.local/lib/python3.5/site-packages/tensorflow/python/training/monitored_session.py”, line 921, in init stop_grace_period_secs=stop_grace_period_secs) File “/home/sky-ai/.local/lib/python3.5/site-packages/tensorflow/python/training/monitored_session.py”, line 643, in init self._sess = _RecoverableSession(self._coordinated_creator) File “/home/sky-ai/.local/lib/python3.5/site-packages/tensorflow/python/training/monitored_session.py”, line 1107, in init _WrappedSession.init(self, self._create_session()) File “/home/sky-ai/.local/lib/python3.5/site-packages/tensorflow/python/training/monitored_session.py”, line 1112, in _create_session return self._sess_creator.create_session() File “/home/sky-ai/.local/lib/python3.5/site-packages/tensorflow/python/training/monitored_session.py”, line 800, in create_session self.tf_sess = self._session_creator.create_session() File “/home/sky-ai/.local/lib/python3.5/site-packages/tensorflow/python/training/monitored_session.py”, line 557, in create_session self._scaffold.finalize() File “/home/sky-ai/.local/lib/python3.5/site-packages/tensorflow/python/training/monitored_session.py”, line 213, in finalize self._saver = training_saver._get_saver_or_default() # pylint: disable=protected-access File “/home/sky-ai/.local/lib/python3.5/site-packages/tensorflow/python/training/saver.py”, line 886, in _get_saver_or_default saver = Saver(sharded=True, allow_empty=True) File “/home/sky-ai/.local/lib/python3.5/site-packages/tensorflow/python/training/saver.py”, line 1102, in init self.build() File “/home/sky-ai/.local/lib/python3.5/site-packages/tensorflow/python/training/saver.py”, line 1114, in build self._build(self._filename, build_save=True, build_restore=True) File “/home/sky-ai/.local/lib/python3.5/site-packages/tensorflow/python/training/saver.py”, line 1151, in _build build_save=build_save, build_restore=build_restore) File “/home/sky-ai/.local/lib/python3.5/site-packages/tensorflow/python/training/saver.py”, line 789, in _build_internal restore_sequentially, reshape) File “/home/sky-ai/.local/lib/python3.5/site-packages/tensorflow/python/training/saver.py”, line 459, in _AddShardedRestoreOps name=“restore_shard”)) File “/home/sky-ai/.local/lib/python3.5/site-packages/tensorflow/python/training/saver.py”, line 406, in _AddRestoreOps restore_sequentially) File “/home/sky-ai/.local/lib/python3.5/site-packages/tensorflow/python/training/saver.py”, line 862, in bulk_restore return io_ops.restore_v2(filename_tensor, names, slices, dtypes) File “/home/sky-ai/.local/lib/python3.5/site-packages/tensorflow/python/ops/gen_io_ops.py”, line 1466, in restore_v2 shape_and_slices=shape_and_slices, dtypes=dtypes, name=name) File “/home/sky-ai/.local/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py”, line 787, in _apply_op_helper op_def=op_def) File “/home/sky-ai/.local/lib/python3.5/site-packages/tensorflow/python/util/deprecation.py”, line 488, in new_func return func(*args, **kwargs) File 
“/home/sky-ai/.local/lib/python3.5/site-packages/tensorflow/python/framework/ops.py”, line 3274, in create_op op_def=op_def) File “/home/sky-ai/.local/lib/python3.5/site-packages/tensorflow/python/framework/ops.py”, line 1770, in init self._traceback = tf_stack.extract_stack()

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[2048,436631] and type float on /job:localhost/replica:0/task:0/device:CPU:0 by allocator cuda_host_bfc [[node save_1/RestoreV2_1 (defined at ./DeepSpeech.py:549) = RestoreV2dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, …, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"]] Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

[[{{node save_1/RestoreV2_1/_43}} = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_48_save_1/RestoreV2_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]]

Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
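For reference, the "report_tensor_allocations_upon_oom" mentioned in the hint is a field of the TensorFlow 1.x RunOptions protobuf. A minimal sketch of how that option is passed to a run call (not from the original report; it uses a toy variable rather than the DeepSpeech graph):

import tensorflow as tf  # TF 1.x session API, matching the traceback above

# Ask TensorFlow to print the list of live tensor allocations
# if this particular run call raises an OOM error.
run_options = tf.RunOptions(report_tensor_allocations_upon_oom=True)

x = tf.Variable(tf.zeros([2048, 2048]))  # toy variable, stands in for the real graph

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer(), options=run_options)
    sess.run(x, options=run_options)

Note that in this trace the OOM is raised inside the saver's restore run during MonitoredTrainingSession setup, so the option would have to reach that internal run call for the extra report to appear.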


No matter how much I reduced the training and CV data, it always reported this OOM problem. I've tried batch_size 1, but the result is the same. The machine has 16 GB of RAM, 2× GTX 1080 Ti GPUs, and an i7 CPU. No other program was running.
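As a quick sanity check on the numbers above (added here for context, not part of the original report): the failing allocation is for a checkpoint tensor of shape [2048, 436631] in float32, and its size depends only on the model, not on the batch size:

# Tensor the saver tries to restore, taken from the error message above
rows, cols = 2048, 436631
size_bytes = rows * cols * 4          # float32 = 4 bytes per element
print(size_bytes)                     # 3576881152, the chunk size reported by the BFC allocator
print(round(size_bytes / 2**30, 2))   # 3.33 GiB, matching "Bin for 3.33GiB" in the log

This is consistent with the observation that lowering batch_size, even to 1, does not change the outcome: the restore-time allocation is the same regardless of batch size.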
