@@ -12,24 +12,22 @@ The topology is saved as a plain text in a detailed self-contain protobuf file.
...
@@ -12,24 +12,22 @@ The topology is saved as a plain text in a detailed self-contain protobuf file.
The parameters are saved as a binary file. A protobuf message is limited to [64M in size](https://developers.google.com/protocol-buffers/docs/reference/cpp/google.protobuf.io.coded_stream#CodedInputStream.SetTotalBytesLimit.details). We ran a [benchmark experiment](https://github.com/PaddlePaddle/Paddle/pull/4610), which showed that protobuf is not a good fit for this task.
The parameters are saved as a binary file. A protobuf message is limited to [64M in size](https://developers.google.com/protocol-buffers/docs/reference/cpp/google.protobuf.io.coded_stream#CodedInputStream.SetTotalBytesLimit.details). We ran a [benchmark experiment](https://github.com/PaddlePaddle/Paddle/pull/4610), which showed that protobuf is not a good fit for this task.
As a result, we designed a dedicated format for tensor serialization. By default, every tensor in Paddle is a [LoDTensor](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/lod_tensor.md) and is described by a [LoDTensorDesc](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/framework.proto#L99) proto. We save this DescProto as the byte-string header. It contains all the necessary information, such as the `dims`, the `name` of the tensor, and the `LoD` information in [LoDTensor](https://github.com/PaddlePaddle/Paddle/blob/1c0a4c901c9fc881d120249c703b15d1c50dae7d/paddle/framework/lod_tensor.md). A tensor stores its values in a contiguous memory buffer. For speed, we dump the raw memory to disk and save it as the byte-string content. The binary format of one tensor is therefore:
As a result, we designed a dedicated format for tensor serialization. By default, every tensor in Paddle is a [LoDTensor](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/lod_tensor.md) and is described by a [LoDTensorDesc](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/framework.proto#L99) proto. We save this DescProto as the byte-string header. It contains all the necessary information, such as the `dims` and the `LoD` information in [LoDTensor](https://github.com/PaddlePaddle/Paddle/blob/1c0a4c901c9fc881d120249c703b15d1c50dae7d/paddle/framework/lod_tensor.md). A tensor stores its values in a contiguous memory buffer. For speed, we dump the raw memory to disk and save it as the byte-string content. The binary format of one tensor is therefore:
<span id="implementation"></span><h2>Implementation<a class="headerlink" href="#implementation" title="Permalink to this headline">¶</a></h2>
<span id="implementation"></span><h2>Implementation<a class="headerlink" href="#implementation" title="Permalink to this headline">¶</a></h2>
<p>The topology is saved as plain text in a detailed, self-contained protobuf file.</p>
<p>The topology is saved as plain text in a detailed, self-contained protobuf file.</p>
<p>The parameters are saved as a binary file. A protobuf message is limited to <a class="reference external" href="https://developers.google.com/protocol-buffers/docs/reference/cpp/google.protobuf.io.coded_stream#CodedInputStream.SetTotalBytesLimit.details">64M in size</a>. We ran a <a class="reference external" href="https://github.com/PaddlePaddle/Paddle/pull/4610">benchmark experiment</a>, which showed that protobuf is not a good fit for this task.</p>
<p>The parameters are saved as a binary file. A protobuf message is limited to <a class="reference external" href="https://developers.google.com/protocol-buffers/docs/reference/cpp/google.protobuf.io.coded_stream#CodedInputStream.SetTotalBytesLimit.details">64M in size</a>. We ran a <a class="reference external" href="https://github.com/PaddlePaddle/Paddle/pull/4610">benchmark experiment</a>, which showed that protobuf is not a good fit for this task.</p>
<p>As a result, we designed a dedicated format for tensor serialization. By default, every tensor in Paddle is a <a class="reference external" href="https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/lod_tensor.md">LoDTensor</a> and is described by a <a class="reference external" href="https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/framework.proto#L99">LoDTensorDesc</a> proto. We save this DescProto as the byte-string header. It contains all the necessary information, such as the <code class="docutils literal"><span class="pre">dims</span></code>, the <code class="docutils literal"><span class="pre">name</span></code> of the tensor, and the <code class="docutils literal"><span class="pre">LoD</span></code> information in <a class="reference external" href="https://github.com/PaddlePaddle/Paddle/blob/1c0a4c901c9fc881d120249c703b15d1c50dae7d/paddle/framework/lod_tensor.md">LoDTensor</a>. A tensor stores its values in a contiguous memory buffer. For speed, we dump the raw memory to disk and save it as the byte-string content. The binary format of one tensor is therefore:</p>
<p>As a result, we designed a dedicated format for tensor serialization. By default, every tensor in Paddle is a <a class="reference external" href="https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/lod_tensor.md">LoDTensor</a> and is described by a <a class="reference external" href="https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/framework.proto#L99">LoDTensorDesc</a> proto. We save this DescProto as the byte-string header. It contains all the necessary information, such as the <code class="docutils literal"><span class="pre">dims</span></code> and the <code class="docutils literal"><span class="pre">LoD</span></code> information in <a class="reference external" href="https://github.com/PaddlePaddle/Paddle/blob/1c0a4c901c9fc881d120249c703b15d1c50dae7d/paddle/framework/lod_tensor.md">LoDTensor</a>. A tensor stores its values in a contiguous memory buffer. For speed, we dump the raw memory to disk and save it as the byte-string content. The binary format of one tensor is therefore:</p>
| tensor data | void* | Tensor’s data in binary format. The length of <code class="docutils literal"><span class="pre">tensor_data</span></code> is decided by <code class="docutils literal"><span class="pre">TensorDesc.dims()</span></code> and <code class="docutils literal"><span class="pre">TensorDesc.data_type()</span></code> |
00100 1 bytes char TensorValue
| lod_level | uint64_t | Level of LoD |
00101 1 bytes char TensorValue
| length of lod[0] | uint64_t | [Optional] length of lod[0] in bytes. |
00102 1 bytes char TensorValue ..
| data of lod[0] | uint64_t* | [Optional] lod[0].data() |
...
| ... | ... | ... |</p>
</pre></div>
</div>
</div>
</div>
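As a rough illustration of the table rows above, the following Python sketch packs and unpacks the trailing fields of one tensor record: the raw tensor data, then `lod_level`, then a byte length and the offsets for each LoD level. The function names `pack_tensor_body` and `unpack_tensor_body`, the little-endian byte order, and the omission of the DescProto header are assumptions made for this example; it is not Paddle's actual implementation.

```python
import struct
from typing import List, Tuple


def pack_tensor_body(raw_data: bytes, lod: List[List[int]]) -> bytes:
    """Pack tensor data, lod_level, then (byte length, offsets) per LoD level.

    Assumption: all integers are little-endian uint64 values.
    """
    out = bytearray(raw_data)                 # tensor data: raw memory dump
    out += struct.pack("<Q", len(lod))        # lod_level
    for level in lod:
        out += struct.pack("<Q", 8 * len(level))        # length of lod[i] in bytes
        out += struct.pack(f"<{len(level)}Q", *level)   # data of lod[i]
    return bytes(out)


def unpack_tensor_body(buf: bytes, data_nbytes: int) -> Tuple[bytes, List[List[int]]]:
    """Inverse of pack_tensor_body.

    data_nbytes is assumed to be derived by the caller from the
    descriptor's dims and data type, as the table describes.
    """
    raw = buf[:data_nbytes]
    pos = data_nbytes
    (lod_level,) = struct.unpack_from("<Q", buf, pos)
    pos += 8
    lod = []
    for _ in range(lod_level):
        (nbytes,) = struct.unpack_from("<Q", buf, pos)
        pos += 8
        count = nbytes // 8                   # each offset is one uint64
        lod.append(list(struct.unpack_from(f"<{count}Q", buf, pos)))
        pos += nbytes
    return raw, lod
```

Because the tensor data is written as one contiguous block, the reader needs only its byte count to skip straight to the LoD fields, with no per-element decoding.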
<div class="section" id="summary">
<div class="section" id="summary">
<span id="summary"></span><h2>Summary<a class="headerlink" href="#summary" title="Permalink to this headline">¶</a></h2>
<span id="summary"></span><h2>Summary<a class="headerlink" href="#summary" title="Permalink to this headline">¶</a></h2>