Saving all trained params in a single file
Created by: sidgoyal78
Merging all params in a single file
For inference, we want to have 2 files: one for the programDesc
and one that has all the params together. We look at one approach to do this.
Understanding save/load ops (C++ side)
- The model_format design doc gives some details in a table, but they are not entirely clear, so we look at the implementation instead.
To understand the current serialization, we look at save_op:
- In save_op, the main work is performed by SerializeToStream( <ofstream>, <framework::LoDTensor>, .. ). This function saves a version number, the size of the LoD, and the actual LoD data.
- It then calls SerializeToStream(<ofstream>, <Tensor> ..). This function saves a version number, the tensor description as a serialized protobuf, and the actual data.
The corresponding load_op does the deserialization accordingly, respecting the ordering used in save_op.
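The field order described above can be sketched in Python. This is only an illustration of the ordering (version, LoD size, LoD data, then version, tensor description, tensor data). The field widths, the little-endian byte order, the length prefixes, and the function names serialize_tensor and serialize_lod_tensor are assumptions made for this sketch, not the actual on-disk format.

```python
import io
import struct


def serialize_tensor(stream, desc_bytes, data_bytes, version=0):
    """Write the tensor part: version, serialized description, raw data."""
    stream.write(struct.pack("<I", version))
    # Assumed: a length prefix so a reader knows where the description ends.
    stream.write(struct.pack("<i", len(desc_bytes)))
    stream.write(desc_bytes)
    stream.write(data_bytes)


def serialize_lod_tensor(stream, lod, desc_bytes, data_bytes, version=0):
    """Write the LoD part first, then delegate to serialize_tensor."""
    stream.write(struct.pack("<I", version))
    stream.write(struct.pack("<Q", len(lod)))  # number of LoD levels
    for level in lod:
        # Assumed: each level is stored as its byte size followed by offsets.
        stream.write(struct.pack("<Q", len(level) * 8))
        stream.write(struct.pack("<%dQ" % len(level), *level))
    serialize_tensor(stream, desc_bytes, data_bytes, version)


buf = io.BytesIO()
serialize_lod_tensor(buf, [[0, 2, 5]], b"desc", b"\x00" * 8)
serialized = buf.getvalue()
```

A loader for this toy layout would read the fields back in exactly the same order, which is the property the load_op relies on.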
Understanding how a model is saved (python api)
Next, we look at how save/load works for actual model params, through the implementation of save_vars in fluid. A new program is created, and a save op is appended for each var that is persistable. Then the executor runs this program.
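The pattern can be illustrated with a toy model in Python. Var, Op, build_save_program, and the file_path attribute name are hypothetical stand-ins invented for this sketch. They are not the fluid API, and the real save_vars hands the program to an executor rather than returning a list.

```python
from collections import namedtuple

# Hypothetical stand-ins for framework variables and ops.
Var = namedtuple("Var", ["name", "persistable"])
Op = namedtuple("Op", ["type", "input", "attrs"])


def build_save_program(variables, dirname):
    """Return a list of save ops, one for each persistable variable."""
    program = []
    for var in variables:
        if not var.persistable:
            continue  # temporaries are not part of the saved model
        program.append(
            Op(type="save", input=var.name,
               attrs={"file_path": dirname + "/" + var.name})
        )
    return program


ops = build_save_program(
    [Var("fc_w", True), Var("tmp_0", False), Var("fc_b", True)], "model"
)
```

Here each op targets its own file, which is the behaviour the single-file approach below sets out to change.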
Approach
We basically make two assumptions:
- For both load/save, the order of iterating over the variables is the same. (This should hopefully be true)
- We don't worry about the overwrite option in save_op.
While saving:
- We store a uint64_t number in addition to the actual serialized bytes written by the original save. This number gives the size of the serialized LoDTensor in bytes.
- When save is called for the first time, we create a file and build a string that holds the serialized LoDTensor data. We first store the size of this string as a fixed-width (uint64_t) number, and then store the string.
- When save is called later, we go to the end of the file and store 2 things: the size of the string and the string itself.
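A minimal Python sketch of this size-prefixed append, assuming the serialized LoDTensor is already available as a bytes object. The function name append_chunk and the little-endian byte order are assumptions made for illustration. The proposal itself concerns a C++ op.

```python
import struct


def append_chunk(path, serialized):
    """Append one chunk: an 8-byte size (uint64) followed by the bytes."""
    # "ab" creates the file on the first call and appends on later calls.
    with open(path, "ab") as f:
        f.write(struct.pack("<Q", len(serialized)))
        f.write(serialized)
```

Because append mode never truncates, a stale file left over from an earlier run would have to be removed before the first call. That is related to the overwrite option the assumptions above set aside.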
While loading:
- We pass an additional attribute in order to load the correct chunk of parameters: a counter value that gives the relative order of the different params, counting from 0.
- With this counter and the extra size information that we stored, we can hop to the appropriate part of the file, read the chunk, and deserialize it.
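The hop-by-size lookup can be sketched in Python as follows. It assumes a file made of repeated records, each an 8-byte little-endian uint64 size followed by that many bytes. The name load_chunk and the error handling are assumptions for illustration. The returned bytes would still need to be deserialized into a LoDTensor.

```python
import os
import struct


def load_chunk(path, index):
    """Return the raw bytes of the index-th chunk (counting from 0)."""
    with open(path, "rb") as f:
        for i in range(index + 1):
            header = f.read(8)
            if len(header) < 8:
                raise IndexError("chunk index out of range")
            (size,) = struct.unpack("<Q", header)
            if i < index:
                # Not the chunk we want: skip its payload without reading it.
                f.seek(size, os.SEEK_CUR)
        data = f.read(size)
        if len(data) != size:
            raise ValueError("file is truncated")
        return data
```

Reaching chunk k reads k + 1 size headers, so loading every parameter this way does a quadratic number of header reads in total. A single sequential pass would avoid that if all params are loaded together.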
For the implementation, I think it will be better to have another op for this rather than replacing the original save_op/load_op. That is easier to debug, and I don't yet know the details of how load_op and save_op are used in the distributed version.