<!DOCTYPE html> <!--[if IE 8]><html class="no-js lt-ie9" lang="en" > <![endif]--> <!--[if gt IE 8]><!--> <html class="no-js" lang="en" > <!--<![endif]--> <head> <meta charset="utf-8"> <meta name="viewport" content="width=device-width, initial-scale=1.0"> <title>Parameter Settings - PaddlePaddle Documentation</title> <link rel="stylesheet" href="../../_static/css/theme.css" type="text/css" /> <link rel="index" title="Index" href="../../genindex.html"/> <link rel="search" title="Search" href="../../search.html"/> <link rel="top" title="PaddlePaddle Documentation" href="../../index.html"/> <link rel="up" title="FAQ" href="../index_cn.html"/> <link rel="next" title="Local Training and Prediction" href="../local/index_cn.html"/> <link rel="prev" title="Model Configuration" href="../model/index_cn.html"/> <link rel="stylesheet" href="https://cdn.jsdelivr.net/perfect-scrollbar/0.6.14/css/perfect-scrollbar.min.css" type="text/css" /> <link rel="stylesheet" href="../../_static/css/override.css" type="text/css" /> <script> var _hmt = _hmt || []; (function() { var hm = document.createElement("script"); hm.src = "//hm.baidu.com/hm.js?b9a314ab40d04d805655aab1deee08ba"; var s = document.getElementsByTagName("script")[0]; s.parentNode.insertBefore(hm, s); })(); </script> <script src="../../_static/js/modernizr.min.js"></script> </head> <body class="wy-body-for-nav" role="document"> <header class="site-header"> <div class="site-logo"> <a href="/"><img src="../../_static/images/PP_w.png"></a> </div> <div class="site-nav-links"> <div class="site-menu"> <a class="fork-on-github" href="https://github.com/PaddlePaddle/Paddle" target="_blank"><i class="fa fa-github"></i>Fork me on Github</a> <div class="language-switcher dropdown"> <a type="button" data-toggle="dropdown"> <span>English</span> <i class="fa fa-angle-up"></i> <i class="fa fa-angle-down"></i> </a> <ul class="dropdown-menu"> <li><a href="/doc_cn">中文</a></li> <li><a href="/doc">English</a></li> </ul> </div> <ul class="site-page-links"> <li><a href="/">Home</a></li> </ul> </div> <div class="doc-module">
<ul class="current"> <li class="toctree-l1"><a class="reference internal" href="../../getstarted/index_cn.html">Getting Started</a></li> <li class="toctree-l1"><a class="reference internal" href="../../howto/index_cn.html">Advanced Guide</a></li> <li class="toctree-l1"><a class="reference internal" href="../../api/index_cn.html">API</a></li> <li class="toctree-l1 current"><a class="reference internal" href="../index_cn.html">FAQ</a></li> </ul> <div role="search"> <form id="rtd-search-form" class="wy-form" action="../../search.html" method="get"> <input type="text" name="q" placeholder="Search docs" /> <input type="hidden" name="check_keywords" value="yes" /> <input type="hidden" name="area" value="default" /> </form> </div> </div> </div> </header> <div class="main-content-wrap"> <nav class="doc-menu-vertical" role="navigation"> <ul class="current"> <li class="toctree-l1"><a class="reference internal" href="../../getstarted/index_cn.html">Getting Started</a><ul> <li class="toctree-l2"><a class="reference internal" href="../../getstarted/build_and_install/index_cn.html">Install and Build</a><ul> <li class="toctree-l3"><a class="reference internal" href="../../getstarted/build_and_install/pip_install_cn.html">Install with pip</a></li> <li class="toctree-l3"><a class="reference internal" href="../../getstarted/build_and_install/docker_install_cn.html">Install and Run with Docker</a></li> <li class="toctree-l3"><a class="reference internal" href="../../howto/dev/build_cn.html">Build and Test PaddlePaddle with Docker</a></li> <li class="toctree-l3"><a class="reference internal" href="../../getstarted/build_and_install/build_from_source_cn.html">Build from Source</a></li> </ul> </li> <li class="toctree-l2"><a class="reference internal" href="../../getstarted/concepts/use_concepts_cn.html">Basic Usage Concepts</a></li> </ul> </li> <li class="toctree-l1"><a class="reference internal" href="../../howto/index_cn.html">Advanced Guide</a><ul> <li class="toctree-l2"><a class="reference internal" href="../../howto/usage/cmd_parameter/index_cn.html">Set Command-line Arguments</a><ul> <li class="toctree-l3"><a
class="reference internal" href="../../howto/usage/cmd_parameter/use_case_cn.html">Use Cases</a></li> <li class="toctree-l3"><a class="reference internal" href="../../howto/usage/cmd_parameter/arguments_cn.html">Argument Overview</a></li> <li class="toctree-l3"><a class="reference internal" href="../../howto/usage/cmd_parameter/detail_introduction_cn.html">Detailed Descriptions</a></li> </ul> </li> <li class="toctree-l2"><a class="reference internal" href="../../howto/usage/cluster/cluster_train_cn.html">Distributed Training</a><ul> <li class="toctree-l3"><a class="reference internal" href="../../howto/usage/cluster/fabric_cn.html">fabric Cluster</a></li> <li class="toctree-l3"><a class="reference internal" href="../../howto/usage/cluster/openmpi_cn.html">openmpi Cluster</a></li> <li class="toctree-l3"><a class="reference internal" href="../../howto/usage/cluster/k8s_cn.html">Kubernetes on a Single Node</a></li> <li class="toctree-l3"><a class="reference internal" href="../../howto/usage/cluster/k8s_distributed_cn.html">Distributed Kubernetes</a></li> <li class="toctree-l3"><a class="reference internal" href="../../howto/usage/cluster/k8s_aws_cn.html">Kubernetes Cluster Training on AWS</a></li> </ul> </li> <li class="toctree-l2"><a class="reference internal" href="../../howto/usage/capi/index_cn.html">PaddlePaddle C-API</a><ul> <li class="toctree-l3"><a class="reference internal" href="../../howto/usage/capi/compile_paddle_lib_cn.html">Build the PaddlePaddle Inference Library</a></li> <li class="toctree-l3"><a class="reference internal" href="../../howto/usage/capi/organization_of_the_inputs_cn.html">Input/Output Data Organization</a></li> <li class="toctree-l3"><a class="reference internal" href="../../howto/usage/capi/workflow_of_capi_cn.html">C-API Workflow</a></li> </ul> </li> <li class="toctree-l2"><a class="reference internal" href="../../howto/dev/contribute_to_paddle_cn.html">How to Contribute Code</a></li> <li class="toctree-l2"><a class="reference internal" href="../../howto/dev/write_docs_cn.html">How to Contribute/Modify Documentation</a></li> <li class="toctree-l2"><a class="reference internal"
href="../../howto/deep_model/rnn/index_cn.html">RNN Models</a><ul> <li class="toctree-l3"><a class="reference internal" href="../../howto/deep_model/rnn/rnn_config_cn.html">RNN Configuration</a></li> <li class="toctree-l3"><a class="reference internal" href="../../howto/deep_model/rnn/recurrent_group_cn.html">Recurrent Group Tutorial</a></li> <li class="toctree-l3"><a class="reference internal" href="../../howto/deep_model/rnn/hierarchical_layer_cn.html">Layers that Accept Nested Sequences as Input</a></li> <li class="toctree-l3"><a class="reference internal" href="../../howto/deep_model/rnn/hrnn_rnn_api_compare_cn.html">Comparing the Single-level and Nested RNN APIs</a></li> </ul> </li> <li class="toctree-l2"><a class="reference internal" href="../../howto/optimization/gpu_profiling_cn.html">GPU Profiling and Tuning</a></li> </ul> </li> <li class="toctree-l1"><a class="reference internal" href="../../api/index_cn.html">API</a><ul> <li class="toctree-l2"><a class="reference internal" href="../../api/v2/model_configs.html">Model Configuration</a><ul> <li class="toctree-l3"><a class="reference internal" href="../../api/v2/config/activation.html">Activation</a></li> <li class="toctree-l3"><a class="reference internal" href="../../api/v2/config/layer.html">Layers</a></li> <li class="toctree-l3"><a class="reference internal" href="../../api/v2/config/evaluators.html">Evaluators</a></li> <li class="toctree-l3"><a class="reference internal" href="../../api/v2/config/optimizer.html">Optimizer</a></li> <li class="toctree-l3"><a class="reference internal" href="../../api/v2/config/pooling.html">Pooling</a></li> <li class="toctree-l3"><a class="reference internal" href="../../api/v2/config/networks.html">Networks</a></li> <li class="toctree-l3"><a class="reference internal" href="../../api/v2/config/attr.html">Parameter Attribute</a></li> </ul> </li> <li class="toctree-l2"><a class="reference internal" href="../../api/v2/data.html">Data Access</a><ul> <li class="toctree-l3"><a class="reference internal" href="../../api/v2/data/data_reader.html">Data Reader Interface</a></li> <li
class="toctree-l3"><a class="reference internal" href="../../api/v2/data/image.html">Image Interface</a></li> <li class="toctree-l3"><a class="reference internal" href="../../api/v2/data/dataset.html">Dataset</a></li> </ul> </li> <li class="toctree-l2"><a class="reference internal" href="../../api/v2/run_logic.html">Training and Inference</a></li> <li class="toctree-l2"><a class="reference internal" href="../../api/v2/fluid.html">Fluid</a><ul> <li class="toctree-l3"><a class="reference internal" href="../../api/v2/fluid/layers.html">layers</a></li> <li class="toctree-l3"><a class="reference internal" href="../../api/v2/fluid/data_feeder.html">data_feeder</a></li> <li class="toctree-l3"><a class="reference internal" href="../../api/v2/fluid/executor.html">executor</a></li> <li class="toctree-l3"><a class="reference internal" href="../../api/v2/fluid/initializer.html">initializer</a></li> <li class="toctree-l3"><a class="reference internal" href="../../api/v2/fluid/evaluator.html">evaluator</a></li> <li class="toctree-l3"><a class="reference internal" href="../../api/v2/fluid/nets.html">nets</a></li> <li class="toctree-l3"><a class="reference internal" href="../../api/v2/fluid/optimizer.html">optimizer</a></li> <li class="toctree-l3"><a class="reference internal" href="../../api/v2/fluid/param_attr.html">param_attr</a></li> <li class="toctree-l3"><a class="reference internal" href="../../api/v2/fluid/profiler.html">profiler</a></li> <li class="toctree-l3"><a class="reference internal" href="../../api/v2/fluid/regularizer.html">regularizer</a></li> <li class="toctree-l3"><a class="reference internal" href="../../api/v2/fluid/io.html">io</a></li> </ul> </li> </ul> </li> <li class="toctree-l1 current"><a class="reference internal" href="../index_cn.html">FAQ</a><ul class="current"> <li class="toctree-l2"><a class="reference internal" href="../build_and_install/index_cn.html">Build, Install, and Unit Tests</a></li> <li class="toctree-l2"><a class="reference internal"
href="../model/index_cn.html">Model Configuration</a></li> <li class="toctree-l2 current"><a class="current reference internal" href="#">Parameter Settings</a></li> <li class="toctree-l2"><a class="reference internal" href="../local/index_cn.html">Local Training and Prediction</a></li> <li class="toctree-l2"><a class="reference internal" href="../cluster/index_cn.html">Cluster Training and Prediction</a></li> </ul> </li> </ul> </nav>
<section class="doc-content-wrap"> <div role="navigation" aria-label="breadcrumbs navigation"> <ul class="wy-breadcrumbs"> <li><a href="../index_cn.html">FAQ</a> &gt; </li> <li>Parameter Settings</li> </ul> </div>
<div class="wy-nav-content" id="doc-content"> <div class="rst-content"> <div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article"> <div itemprop="articleBody"> <div class="section" id="id1">
<h1><a class="toc-backref" href="#id6">Parameter Settings</a><a class="headerlink" href="#id1" title="Permalink to this headline">¶</a></h1>
<div class="contents topic" id="contents"> <p class="topic-title first">Contents</p> <ul class="simple"> <li><a class="reference internal" href="#id1" id="id6">Parameter Settings</a><ul>
<li><a class="reference internal" href="#sgd" id="id7">1. How to choose the learning rate for SGD</a></li>
<li><a class="reference internal" href="#learning-rate-annealing" id="id8">2. How to set up learning rate annealing</a></li>
<li><a class="reference internal" href="#id2" id="id9">3. How to initialize parameters</a></li>
<li><a class="reference internal" href="#id3" id="id10">4. How to share parameters</a></li>
<li><a class="reference internal" href="#id4" id="id11">5. How to load pre-trained parameters</a></li>
<li><a class="reference internal" href="#id5" id="id12">6. What is the stored parameter format, and how to convert it to and from plain text</a></li>
<li><a class="reference internal" href="#a-protocol-message-was-rejected-because-it-was-too-big" id="id13">7. A protocol message was rejected because it was too big</a></li>
</ul> </li> </ul> </div>
<div class="section" id="sgd"> <h2><a class="toc-backref" href="#id7">1.
How to choose the learning rate for SGD</a><a class="headerlink" href="#sgd" title="Permalink to this headline">¶</a></h2>
<p>When training with sgd/async_sgd, choosing a suitable learning_rate is important. If learning_rate is too large, training may fail to converge; if it is too small, convergence may be slow and training takes too long.</p>
<p>A common approach is to start with a relatively large learning_rate and, if training does not converge, divide the learning rate by 10 and try again, repeating until training converges. How can you tell that training is not converging? Estimate cost0, the minimum cost the model could reach if it always produced the same constant output.</p>
<p>If the cost during training stays clearly above the cost of this constant output, training can be judged as not converging. For example, take a three-class problem that uses multi-class-cross-entropy as the cost, where classes 0, 1, and 2 make up <code class="code docutils literal"><span class="pre">0.2,</span> <span class="pre">0.5,</span> <span class="pre">0.3</span></code> of the data. The minimum cost a constant output can reach is <code class="code docutils literal"><span class="pre">-(0.2*log(0.2)+0.5*log(0.5)+0.3*log(0.3))=1.03</span></code> (using the natural logarithm). If the cost is still above this value after one pass (or earlier), training can be considered not converging, and the learning rate should be lowered.</p>
</div>
<div class="section" id="learning-rate-annealing"> <h2><a class="toc-backref" href="#id8">2. How to set up learning rate annealing</a><a class="headerlink" href="#learning-rate-annealing" title="Permalink to this headline">¶</a></h2>
<p>Set learning_rate_schedule and its related parameters in the optimizer you use. Taking the Adam algorithm as an example:</p>
<div class="highlight-python"><div class="highlight"><pre><span></span><span class="n">optimizer</span> <span class="o">=</span> <span class="n">paddle</span><span class="o">.</span><span class="n">optimizer</span><span class="o">.</span><span class="n">Adam</span><span class="p">(</span>
    <span class="n">learning_rate</span><span class="o">=</span><span class="mf">1e-3</span><span class="p">,</span>
    <span class="n">learning_rate_decay_a</span><span class="o">=</span><span class="mf">0.5</span><span class="p">,</span>
    <span class="n">learning_rate_decay_b</span><span class="o">=</span><span class="mf">0.75</span><span class="p">,</span>
    <span class="n">learning_rate_schedule</span><span class="o">=</span><span class="s2">"poly"</span><span class="p">,)</span>
</pre></div> </div>
<p>PaddlePaddle currently supports 8 values of learning_rate_schedule. The schedules and the way each one computes the learning rate are listed below:</p>
<ul> <li><p class="first">“constant”</p> <p>lr = learning_rate</p> </li> <li><p
class="first">“poly”</p> <p>lr = learning_rate * pow(1 + learning_rate_decay_a * num_samples_processed, -learning_rate_decay_b)</p> <p>Here num_samples_processed is the number of samples trained so far; the same applies below.</p> </li>
<li><p class="first">“caffe_poly”</p> <p>lr = learning_rate * pow(1.0 - num_samples_processed / learning_rate_decay_a, learning_rate_decay_b)</p> </li>
<li><p class="first">“exp”</p> <p>lr = learning_rate * pow(learning_rate_decay_a, num_samples_processed / learning_rate_decay_b)</p> </li>
<li><p class="first">“discexp”</p> <p>lr = learning_rate * pow(learning_rate_decay_a, floor(num_samples_processed / learning_rate_decay_b))</p> </li>
<li><p class="first">“linear”</p> <p>lr = max(learning_rate - learning_rate_decay_a * num_samples_processed, learning_rate_decay_b)</p> </li>
<li><p class="first">“manual”</p> <p>This schedule sets the learning rate piecewise by the number of samples trained so far. With this learning_rate_schedule, the <code class="code docutils literal"><span class="pre">learning_rate_args</span></code> parameter defines a piecewise decay factor, and the current learning rate is the configured <code class="code docutils literal"><span class="pre">learning_rate</span></code> multiplied by the current decay factor. Taking the Adam algorithm as an example:</p>
<div class="highlight-python"><div class="highlight"><pre><span></span><span class="n">optimizer</span> <span class="o">=</span> <span class="n">paddle</span><span class="o">.</span><span class="n">optimizer</span><span class="o">.</span><span class="n">Adam</span><span class="p">(</span>
    <span class="n">learning_rate</span><span class="o">=</span><span class="mf">1e-3</span><span class="p">,</span>
    <span class="n">learning_rate_schedule</span><span class="o">=</span><span class="s2">"manual"</span><span class="p">,</span>
    <span class="n">learning_rate_args</span><span class="o">=</span><span class="s2">"1000:1.0,2000:0.9,3000:0.8"</span><span class="p">,)</span>
</pre></div> </div>
<p>In this example, when the number of trained samples is at most 1000, the learning rate is <code class="code docutils literal"><span class="pre">1e-3</span> <span class="pre">*</span> <span class="pre">1.0</span></code>; when it is greater than 1000 and at most 2000, the learning rate is <code class="code docutils
literal"><span class="pre">1e-3</span> <span class="pre">*</span> <span class="pre">0.9</span></code>; when the number of trained samples is greater than 2000, the learning rate is <code class="code docutils literal"><span class="pre">1e-3</span> <span class="pre">*</span> <span class="pre">0.8</span></code>.</p> </li>
<li><p class="first">“pass_manual”</p> <p>This schedule sets the learning rate piecewise by the number of passes trained so far. With this learning_rate_schedule, the <code class="code docutils literal"><span class="pre">learning_rate_args</span></code> parameter defines a piecewise decay factor, and the current learning rate is the configured <code class="code docutils literal"><span class="pre">learning_rate</span></code> multiplied by the current decay factor. Taking the Adam algorithm as an example:</p>
<div class="highlight-python"><div class="highlight"><pre><span></span><span class="n">optimizer</span> <span class="o">=</span> <span class="n">paddle</span><span class="o">.</span><span class="n">optimizer</span><span class="o">.</span><span class="n">Adam</span><span class="p">(</span>
    <span class="n">learning_rate</span><span class="o">=</span><span class="mf">1e-3</span><span class="p">,</span>
    <span class="n">learning_rate_schedule</span><span class="o">=</span><span class="s2">"pass_manual"</span><span class="p">,</span>
    <span class="n">learning_rate_args</span><span class="o">=</span><span class="s2">"1:1.0,2:0.9,3:0.8"</span><span class="p">,)</span>
</pre></div> </div>
<p>In this example, when the number of trained passes is at most 1, the learning rate is <code class="code docutils literal"><span class="pre">1e-3</span> <span class="pre">*</span> <span class="pre">1.0</span></code>; when it is greater than 1 and at most 2, the learning rate is <code class="code docutils literal"><span class="pre">1e-3</span> <span class="pre">*</span> <span class="pre">0.9</span></code>; when it is greater than 2, the learning rate is <code class="code docutils literal"><span class="pre">1e-3</span> <span class="pre">*</span> <span class="pre">0.8</span></code>.</p> </li> </ul> </div>
<div class="section" id="id2"> <h2><a class="toc-backref" href="#id9">3.
How to initialize parameters</a><a class="headerlink" href="#id2" title="Permalink to this headline">¶</a></h2>
<p>By default, PaddlePaddle initializes parameters with mean 0 and standard deviation <span class="math">\(\frac{1}{\sqrt{d}}\)</span>, where <span class="math">\(d\)</span> is the width of the parameter matrix. In typical cases this initialization does not give poor results. To customize the initialization, PaddlePaddle currently provides two methods:</p>
<ul class="simple"> <li>Gaussian distribution. Set <code class="code docutils literal"><span class="pre">param_attr</span></code> to <code class="code docutils literal"><span class="pre">param_attr=ParamAttr(initial_mean=0.0,</span> <span class="pre">initial_std=1.0)</span></code></li> <li>Uniform distribution. Set <code class="code docutils literal"><span class="pre">param_attr</span></code> to <code class="code docutils literal"><span class="pre">param_attr=ParamAttr(initial_max=1.0,</span> <span class="pre">initial_min=-1.0)</span></code></li> </ul>
<p>For example, the following code sets both the weight initialization and the bias initialization of a fully connected layer.</p>
<div class="highlight-python"><div class="highlight"><pre><span></span><span class="n">hidden</span> <span class="o">=</span> <span class="n">fc_layer</span><span class="p">(</span><span class="nb">input</span><span class="o">=</span><span class="n">ipt</span><span class="p">,</span> <span class="n">param_attr</span><span class="o">=</span><span class="n">ParamAttr</span><span class="p">(</span><span class="n">initial_max</span><span class="o">=</span><span class="mf">1.0</span><span class="p">,</span> <span class="n">initial_min</span><span class="o">=-</span><span class="mf">1.0</span><span class="p">),</span>
                  <span class="n">bias_attr</span><span class="o">=</span><span class="n">ParamAttr</span><span class="p">(</span><span class="n">initial_mean</span><span class="o">=</span><span class="mf">1.0</span><span class="p">,</span> <span class="n">initial_std</span><span class="o">=</span><span class="mf">0.0</span><span class="p">))</span>
</pre></div> </div>
<p>The code above initializes every bias to 1.0 and initializes the weights from a uniform distribution over <code class="code docutils literal"><span class="pre">[-1.0,</span> <span class="pre">1.0]</span></code>.</p> </div> <div class="section"
id="id3"> <h2><a class="toc-backref" href="#id10">4. How to share parameters</a><a class="headerlink" href="#id3" title="Permalink to this headline">¶</a></h2>
<p>PaddlePaddle uses the parameter name <code class="code docutils literal"><span class="pre">name</span></code> as the parameter ID, and parameters with the same name are shared. To set a parameter's name, use <code class="code docutils literal"><span class="pre">ParamAttr(name="YOUR_PARAM_NAME")</span></code>. A more convenient way is to pass the same <code class="code docutils literal"><span class="pre">ParamAttr</span></code> object to every parameter that should be shared.</p>
<p>A configuration example of parameter sharing in a simple fully connected network:</p>
<p>Here <code class="code docutils literal"><span class="pre">hidden_a</span></code> and <code class="code docutils literal"><span class="pre">hidden_b</span></code> use the same parameter and bias, and the two inputs of the softmax layer also share the same parameter <code class="code docutils literal"><span class="pre">softmax_param</span></code>.</p> </div>
<div class="section" id="id4"> <h2><a class="toc-backref" href="#id11">5. How to load pre-trained parameters</a><a class="headerlink" href="#id4" title="Permalink to this headline">¶</a></h2>
<ul class="simple"> <li>For a layer that loads pre-trained parameters, set its parameter attribute <code class="code docutils literal"><span class="pre">is_static=True</span></code> so that the layer's parameters stay unchanged during training. Taking the embedding layer as an example:</li> </ul>
<div class="highlight-python"><div class="highlight"><pre><span></span><span class="n">emb_para</span> <span class="o">=</span> <span class="n">paddle</span><span class="o">.</span><span class="n">attr</span><span class="o">.</span><span class="n">Param</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="s1">'emb'</span><span class="p">,</span> <span class="n">is_static</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="n">paddle</span><span class="o">.</span><span class="n">layer</span><span class="o">.</span><span class="n">embedding</span><span class="p">(</span><span class="n">size</span><span class="o">=</span><span class="n">word_dim</span><span class="p">,</span> <span class="nb">input</span><span class="o">=</span><span class="n">x</span><span class="p">,</span> <span
class="n">param_attr</span><span class="o">=</span><span class="n">emb_para</span><span class="p">)</span>
</pre></div> </div>
<ul class="simple"> <li>Load the pre-trained parameters from the model file into a <code class="code docutils literal"><span class="pre">numpy.array</span></code>. After creating parameters, call <code class="code docutils literal"><span class="pre">parameters.set()</span></code> to load the pre-trained values. The first 16 bytes of a model parameter file saved by PaddlePaddle are a header, so loading into a <code class="code docutils literal"><span class="pre">numpy.array</span></code> must start from the 17th byte. Taking the embedding layer as an example:</li> </ul>
<div class="highlight-python"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="nn">np</span>

<span class="k">def</span> <span class="nf">load_parameter</span><span class="p">(</span><span class="n">file_name</span><span class="p">,</span> <span class="n">h</span><span class="p">,</span> <span class="n">w</span><span class="p">):</span>
    <span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="n">file_name</span><span class="p">,</span> <span class="s1">'rb'</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span>
        <span class="n">f</span><span class="o">.</span><span class="n">read</span><span class="p">(</span><span class="mi">16</span><span class="p">)</span>  <span class="c1"># skip the 16-byte header.</span>
        <span class="k">return</span> <span class="n">np</span><span class="o">.</span><span class="n">fromfile</span><span class="p">(</span><span class="n">f</span><span class="p">,</span> <span class="n">dtype</span><span class="o">=</span><span class="n">np</span><span class="o">.</span><span class="n">float32</span><span class="p">)</span><span class="o">.</span><span class="n">reshape</span><span class="p">(</span><span class="n">h</span><span class="p">,</span> <span class="n">w</span><span class="p">)</span>

<span class="n">parameters</span> <span class="o">=</span> <span class="n">paddle</span><span class="o">.</span><span class="n">parameters</span><span class="o">.</span><span
class="n">create</span><span class="p">(</span><span class="n">my_cost</span><span class="p">)</span>
<span class="n">parameters</span><span class="o">.</span><span class="n">set</span><span class="p">(</span><span class="s1">'emb'</span><span class="p">,</span> <span class="n">load_parameter</span><span class="p">(</span><span class="n">emb_param_file</span><span class="p">,</span> <span class="mi">30000</span><span class="p">,</span> <span class="mi">256</span><span class="p">))</span>
</pre></div> </div> </div>
<div class="section" id="id5"> <h2><a class="toc-backref" href="#id12">6. What is the stored parameter format, and how to convert it to and from plain text</a><a class="headerlink" href="#id5" title="Permalink to this headline">¶</a></h2>
<p>A model parameter file saved by PaddlePaddle consists of a 16-byte header followed by the network parameters. In the header, bytes 1-4 hold the PaddlePaddle version and should simply be filled with 0; bytes 5-8 hold the number of bytes per parameter value, which is 4 when the saved parameters are float and 8 when they are double; bytes 9-16 hold the total number of saved parameter values.</p>
<p>To convert model parameters saved by PaddlePaddle back to plain text, load the network parameters into a <code class="code docutils literal"><span class="pre">numpy.array</span></code> of the matching data type, skipping the header of the parameter file. Unless double precision was specified when PaddlePaddle was compiled, computation uses float precision by default and the saved parameters are float as well. In that case, set <code class="code docutils literal"><span class="pre">dtype=float32</span></code> when using <code class="code docutils literal"><span class="pre">numpy.array</span></code>. For example:</p>
<div class="highlight-python"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="nn">np</span>

<span class="k">def</span> <span class="nf">read_parameter</span><span class="p">(</span><span class="n">fname</span><span class="p">,</span> <span class="n">width</span><span class="p">):</span>
    <span class="c1"># open in binary mode: the file holds raw bytes, not text</span>
    <span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="n">fname</span><span class="p">,</span> <span class="s1">'rb'</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span>
        <span class="n">s</span> <span class="o">=</span> <span class="n">f</span><span class="o">.</span><span class="n">read</span><span class="p">()</span>
    <span class="c1"># skip header</span>
    <span class="n">vec</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">frombuffer</span><span class="p">(</span><span class="n">s</span><span class="p">[</span><span
class="mi">16</span><span class="p">:],</span> <span class="n">dtype</span><span class="o">=</span><span class="n">np</span><span class="o">.</span><span class="n">float32</span><span class="p">)</span>
    <span class="c1"># width is the size of the corresponding layer</span>
    <span class="n">np</span><span class="o">.</span><span class="n">savetxt</span><span class="p">(</span><span class="n">fname</span> <span class="o">+</span> <span class="s2">".csv"</span><span class="p">,</span> <span class="n">vec</span><span class="o">.</span><span class="n">reshape</span><span class="p">(</span><span class="n">width</span><span class="p">,</span> <span class="o">-</span><span class="mi">1</span><span class="p">),</span> <span class="n">fmt</span><span class="o">=</span><span class="s2">"</span><span class="si">%.6f</span><span class="s2">"</span><span class="p">,</span> <span class="n">delimiter</span><span class="o">=</span><span class="s2">","</span><span class="p">)</span>
</pre></div> </div>
<p>To convert plain-text parameters into model parameters that PaddlePaddle can load, first build the header and then write the network parameters. The code below converts a randomly generated matrix into model parameters that PaddlePaddle can load.</p>
<div class="highlight-python"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">struct</span>
<span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="nn">np</span>

<span class="k">def</span> <span class="nf">gen_rand_param</span><span class="p">(</span><span class="n">param_file</span><span class="p">,</span> <span class="n">width</span><span class="p">,</span> <span class="n">height</span><span class="p">,</span> <span class="n">need_trans</span><span class="p">):</span>
    <span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">seed</span><span class="p">()</span>
    <span class="c1"># header: version (0), bytes per value (4 for float32), number of values;</span>
    <span class="c1"># "q" is an 8-byte integer, so the header is 16 bytes on every platform</span>
    <span class="n">header</span> <span class="o">=</span> <span class="n">struct</span><span class="o">.</span><span class="n">pack</span><span class="p">(</span><span class="s2">"iiq"</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="n">height</span> <span class="o">*</span>
<span class="n">width</span><span class="p">)</span>
    <span class="n">param</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">float32</span><span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">rand</span><span class="p">(</span><span class="n">height</span><span class="p">,</span> <span class="n">width</span><span class="p">))</span>
    <span class="c1"># write in binary mode so the bytes are stored unchanged</span>
    <span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="n">param_file</span><span class="p">,</span> <span class="s2">"wb"</span><span class="p">)</span> <span class="k">as</span> <span class="n">fparam</span><span class="p">:</span>
        <span class="n">fparam</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="n">header</span> <span class="o">+</span> <span class="n">param</span><span class="o">.</span><span class="n">tobytes</span><span class="p">())</span>
</pre></div> </div> </div>
<div class="section" id="a-protocol-message-was-rejected-because-it-was-too-big"> <h2><a class="toc-backref" href="#id13">7. A protocol message was rejected because it was too big</a><a class="headerlink" href="#a-protocol-message-was-rejected-because-it-was-too-big" title="Permalink to this headline">¶</a></h2>
<p>If the following error appears while training an NLP model:</p>
<div class="highlight-bash"><div class="highlight"><pre><span></span><span class="o">[</span>libprotobuf ERROR google/protobuf/io/coded_stream.cc:171<span class="o">]</span> A protocol message was rejected because it was too big <span class="o">(</span>more than <span class="m">67108864</span> bytes<span class="o">)</span>. To increase the limit <span class="o">(</span>or to disable these warnings<span class="o">)</span>, see CodedInputStream::SetTotalBytesLimit<span class="o">()</span> in google/protobuf/io/coded_stream.h.
F1205 <span class="m">14</span>:59:50.295174 <span class="m">14703</span> TrainerConfigHelper.cpp:59<span class="o">]</span> Check failed: m-&gt;conf.ParseFromString<span class="o">(</span>configProtoStr<span class="o">)</span>
</pre></div> </div>
<p>A likely cause is that one of the args passed to the dataprovider is too large, usually because a large dictionary is passed directly. A faulty define_py_data_sources2 call looks like this:</p>
<div class="highlight-python"><div class="highlight"><pre><span></span><span class="n">src_dict</span> <span class="o">=</span> <span class="nb">dict</span><span class="p">()</span>
<span class="k">for</span> <span class="n">line_count</span><span class="p">,</span> <span class="n">line</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="nb">open</span><span class="p">(</span><span class="n">src_dict_path</span><span class="p">,</span> <span class="s2">"r"</span><span class="p">)):</span>
    <span class="n">src_dict</span><span class="p">[</span><span class="n">line</span><span class="o">.</span><span class="n">strip</span><span class="p">()]</span> <span class="o">=</span> <span class="n">line_count</span>

<span class="c1"># problematic: the whole dictionary is passed through args</span>
<span class="n">define_py_data_sources2</span><span class="p">(</span>
    <span class="n">train_list</span><span class="p">,</span>
    <span class="n">test_list</span><span class="p">,</span>
    <span class="n">module</span><span class="o">=</span><span class="s2">"dataprovider"</span><span class="p">,</span>
    <span class="n">obj</span><span class="o">=</span><span class="s2">"process"</span><span class="p">,</span>
    <span class="n">args</span><span class="o">=</span><span class="p">{</span><span class="s2">"src_dict"</span><span class="p">:</span> <span class="n">src_dict</span><span class="p">})</span>
</pre></div> </div>
<p>The fix is to pass the path of the dictionary to the dataprovider as args, and then load the dictionary from that path inside the dataprovider. That is, define_py_data_sources2 should be changed to:</p>
<div class="highlight-python"><div class="highlight"><pre><span></span><span class="n">define_py_data_sources2</span><span class="p">(</span>
    <span class="n">train_list</span><span
class="p">,</span>
    <span class="n">test_list</span><span class="p">,</span>
    <span class="n">module</span><span class="o">=</span><span class="s2">"dataprovider"</span><span class="p">,</span>
    <span class="n">obj</span><span class="o">=</span><span class="s2">"process"</span><span class="p">,</span>
    <span class="n">args</span><span class="o">=</span><span class="p">{</span><span class="s2">"src_dict_path"</span><span class="p">:</span> <span class="n">src_dict_path</span><span class="p">})</span>
</pre></div> </div>
<p>For the complete source code, see the <a class="reference external" href="https://github.com/PaddlePaddle/Paddle/tree/develop/demo/seqToseq">seqToseq</a> example.</p> </div> </div> </div> </div>
<footer> <div class="rst-footer-buttons" role="navigation" aria-label="footer navigation"> <a href="../local/index_cn.html" class="btn btn-neutral float-right" title="Local Training and Prediction" accesskey="n">Next <span class="fa fa-arrow-circle-right"></span></a> <a href="../model/index_cn.html" class="btn btn-neutral" title="Model Configuration" accesskey="p"><span class="fa fa-arrow-circle-left"></span> Previous</a> </div> <hr/> <div role="contentinfo"> <p> © Copyright 2016, PaddlePaddle developers. </p> </div> Built with <a href="http://sphinx-doc.org/">Sphinx</a> using a <a href="https://github.com/snide/sphinx_rtd_theme">theme</a> provided by <a href="https://readthedocs.org">Read the Docs</a>.
</footer> </div> </div> </section> </div> <script type="text/javascript"> var DOCUMENTATION_OPTIONS = { URL_ROOT:'../../', VERSION:'', COLLAPSE_INDEX:false, FILE_SUFFIX:'.html', HAS_SOURCE: true, SOURCELINK_SUFFIX: ".txt", }; </script> <script type="text/javascript" src="../../_static/jquery.js"></script> <script type="text/javascript" src="../../_static/underscore.js"></script> <script type="text/javascript" src="../../_static/doctools.js"></script> <script type="text/javascript" src="../../_static/translations.js"></script> <script type="text/javascript" src="https://cdn.bootcss.com/mathjax/2.7.0/MathJax.js"></script> <script type="text/javascript" src="../../_static/js/theme.js"></script> <script src="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.7/js/bootstrap.min.js" integrity="sha384-Tc5IQib027qvyjSMfHjOMaLkfuWVxZxUPnCJA7l2mCWNIpG9mGCD8wGNIcPD7Txa" crossorigin="anonymous"></script> <script src="https://cdn.jsdelivr.net/perfect-scrollbar/0.6.14/js/perfect-scrollbar.jquery.min.js"></script> <script src="../../_static/js/paddle_doc_init.js"></script> </body> </html>