After saving all parameters into a single file, files written by save_persistables and save_params can no longer be loaded interchangeably through load_vars
Created by: Vonderland
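The project originally saved each parameter to its own file, using fluid.io.save_persistables to write checkpoints and fluid.io.save_params to write parameters. Loading always went through fluid.io.load_vars, and the predicate argument passed to load_vars (one of two functions, existed_persitables or existed_params) decided whether the files were loaded as a checkpoint or as parameters. With one file per parameter, the two could be mixed: whichever function had saved the files, they could be loaded with either existed_persitables or existed_params.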
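The project now saves all parameters into a single file. In this mode the two predicates can no longer be mixed: a file saved by fluid.io.save_persistables must be loaded by load_vars with predicate set to existed_persitables, and a file saved by fluid.io.save_params must be loaded with predicate set to existed_params. This is very inconvenient, because a checkpoint can no longer be used as parameters.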
The error logs are below (project-specific parts omitted):
- Loading a param file as a checkpoint:
  [libprotobuf ERROR /paddle/build/third_party/protobuf/src/extern_protobuf/src/google/protobuf/message_lite.cc:119] Can't parse message of type "paddle.framework.proto.VarType.TensorDesc" because it is missing required fields: data_type
  /home/ol/anaconda3/lib/python3.7/site-packages/paddle/fluid/executor.py:1070: UserWarning: The following exception is not an EOF exception.
    "The following exception is not an EOF exception.")
  ....
    File "/home/ol/anaconda3/lib/python3.7/site-packages/paddle/fluid/io.py", line 751, in load_vars
      filename=filename)
    File "/home/ol/anaconda3/lib/python3.7/site-packages/paddle/fluid/io.py", line 805, in load_vars
      executor.run(load_prog)
    File "/home/ol/anaconda3/lib/python3.7/site-packages/paddle/fluid/executor.py", line 1071, in run
      six.reraise(*sys.exc_info())
    File "/home/ol/anaconda3/lib/python3.7/site-packages/six.py", line 693, in reraise
      raise value
    File "/home/ol/anaconda3/lib/python3.7/site-packages/paddle/fluid/executor.py", line 1066, in run
      return_merged=return_merged)
    File "/home/ol/anaconda3/lib/python3.7/site-packages/paddle/fluid/executor.py", line 1154, in _run_impl
      use_program_cache=use_program_cache)
    File "/home/ol/anaconda3/lib/python3.7/site-packages/paddle/fluid/executor.py", line 1229, in _run_program
      fetch_var_name)
  paddle.fluid.core_avx.EnforceNotMet:
  Error Message Summary:
  InvalidArgumentError: Cannot parse tensor desc [Hint: Expected desc.ParseFromArray(buf.get(), size) == true, but received desc.ParseFromArray(buf.get(), size):0 != true:1.] at (/paddle/paddle/fluid/framework/tensor_util.cu:527) [operator < load_combine > error]
- Loading a checkpoint as a param file:
  /home/ol/anaconda3/lib/python3.7/site-packages/paddle/fluid/executor.py:1070: UserWarning: The following exception is not an EOF exception.
    "The following exception is not an EOF exception.")
    File "/home/ol/anaconda3/lib/python3.7/site-packages/paddle/fluid/io.py", line 751, in load_vars
      filename=filename)
    File "/home/ol/anaconda3/lib/python3.7/site-packages/paddle/fluid/io.py", line 805, in load_vars
      executor.run(load_prog)
    File "/home/ol/anaconda3/lib/python3.7/site-packages/paddle/fluid/executor.py", line 1071, in run
      six.reraise(*sys.exc_info())
    File "/home/ol/anaconda3/lib/python3.7/site-packages/six.py", line 693, in reraise
      raise value
    File "/home/ol/anaconda3/lib/python3.7/site-packages/paddle/fluid/executor.py", line 1066, in run
      return_merged=return_merged)
    File "/home/ol/anaconda3/lib/python3.7/site-packages/paddle/fluid/executor.py", line 1154, in _run_impl
      use_program_cache=use_program_cache)
    File "/home/ol/anaconda3/lib/python3.7/site-packages/paddle/fluid/executor.py", line 1229, in _run_program
      fetch_var_name)
  paddle.fluid.core_avx.EnforceNotMet: UnavailableError: Not allowed to load partial data via load_combine_op, please use load_op instead. [Hint: Expected buffer->eof() == true, but received buffer->eof():0 != true:1.] at (/paddle/paddle/fluid/operators/load_combine_op.h:115) [operator < load_combine > error]
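
One possible reading of the two errors above, sketched as a toy model and not as Paddle's actual implementation: if the single file holds the tensors back to back in variable-list order, without storing their names, then loading it with a different variable list either runs out of data or leaves data unread. Everything in the sketch is an assumption made for illustration: the record layout, the helpers `save_combined`, `load_combined` and `try_load`, and the variable names `fc.w`, `fc.b`, `fc.w_moment` and `learning_rate` are all hypothetical.

```python
import io
import struct

# Toy stand-in for a combined parameter file (NOT Paddle's real format):
# records are written back to back in list order; names are not stored.


def save_combined(values, names):
    """Serialize the listed variables, in order, into one byte string."""
    buf = io.BytesIO()
    for name in names:
        data = values[name]
        buf.write(struct.pack("<I", len(data)))            # header: element count
        buf.write(struct.pack("<%dd" % len(data), *data))  # payload: doubles
    return buf.getvalue()


def load_combined(blob, names):
    """Read one record per requested name, then require end of file."""
    buf = io.BytesIO(blob)
    loaded = {}
    for name in names:
        header = buf.read(4)
        if len(header) < 4:
            raise ValueError("cannot parse record header for %r" % name)
        (count,) = struct.unpack("<I", header)
        payload = buf.read(8 * count)
        if len(payload) < 8 * count:
            raise ValueError("truncated record for %r" % name)
        loaded[name] = list(struct.unpack("<%dd" % count, payload))
    if buf.read(1):
        raise ValueError("partial load: unread data remains in the file")
    return loaded


def try_load(blob, names):
    """Return the loaded dict, or the error text if loading fails."""
    try:
        return load_combined(blob, names)
    except ValueError as err:
        return "error: %s" % err


# Hypothetical variables: parameters are a subset of the persistables.
values = {
    "fc.w": [1.0, 2.0],
    "fc.w_moment": [0.1, 0.2],
    "fc.b": [3.0],
    "learning_rate": [0.01],
}
persistable_names = ["fc.w", "fc.w_moment", "fc.b", "learning_rate"]
param_names = ["fc.w", "fc.b"]

ckpt_blob = save_combined(values, persistable_names)   # like a checkpoint
param_blob = save_combined(values, param_names)        # like a param file

print(try_load(ckpt_blob, persistable_names))  # same list: round-trips
print(try_load(ckpt_blob, param_names))        # shorter list: data left unread
print(try_load(param_blob, persistable_names)) # longer list: runs out of records
```

In this model, a checkpoint loaded with the parameter list fails the end-of-file check, which resembles the "Not allowed to load partial data via load_combine_op" error, and a param file loaded with the longer persistable list runs out of records, which resembles the "Cannot parse tensor desc" error. With one file per variable, each variable would presumably be looked up by its own file name, which would explain why mixing worked before.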
Is this a known issue? How should it be handled? Looking forward to your reply.