

class paddle::TrainerStats

A TrainerStats object tracks the number of samples processed and the total cost.

It holds two stats, ‘AvgCost’ and ‘CurrentAvgCost’. ‘AvgCost’ is the average cost over one pass (all mini-batches). ‘CurrentAvgCost’ is the average cost over the current mini-batch.

Public Functions

void reset()

reset all stats.

often called before a pass starts.

void resetCurrentStat()

reset current stat.

‘current’ means the most recent log_period mini-batches

void addCost(int64_t numProcessed, real cost)

add cost to stat.

  • numProcessed: current mini-batch size
  • cost: current mini-batch cost

real getAvgCost() const

get average cost through one pass (all processed mini-batches)

pass average cost

real getCurrentAvgCost() const

get current mini-batch’s average cost.

mini-batch average cost

int64_t getNumProcessed() const

get the total number of processed samples

total number of processed samples
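
The accessors above can be illustrated with a minimal sketch. It is not the PaddlePaddle implementation: the class name SketchStats and its private members are hypothetical, `real` is assumed to be float, and `cost` is assumed to be the summed cost of a mini-batch, so that each average is total cost divided by sample count.

```cpp
#include <cstdint>

// Assumption: `real` is a float typedef in this sketch.
using real = float;

// Hypothetical stand-in for paddle::TrainerStats.
class SketchStats {
public:
  SketchStats() { reset(); }

  // Clear both the per-pass totals and the current window.
  void reset() {
    numProcessed_ = 0;
    totalCost_ = 0;
    resetCurrentStat();
  }

  // Clear only the current window, e.g. after a log line is printed.
  void resetCurrentStat() {
    currentSamples_ = 0;
    currentCost_ = 0;
  }

  // Assumption: `cost` is the summed cost of the mini-batch.
  void addCost(int64_t numProcessed, real cost) {
    numProcessed_ += numProcessed;
    totalCost_ += cost;
    currentSamples_ += numProcessed;
    currentCost_ += cost;
  }

  // Average over everything added since the last reset().
  real getAvgCost() const {
    return numProcessed_ ? totalCost_ / numProcessed_ : 0;
  }

  // Average over everything added since the last resetCurrentStat().
  real getCurrentAvgCost() const {
    return currentSamples_ ? currentCost_ / currentSamples_ : 0;
  }

  int64_t getNumProcessed() const { return numProcessed_; }

private:
  int64_t numProcessed_;
  real totalCost_;
  int64_t currentSamples_;
  real currentCost_;
};
```

In this sketch, calling resetCurrentStat() after every log line makes getCurrentAvgCost() cover only the mini-batches added since that line, while getAvgCost() keeps covering the whole pass.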

TrainerStats &operator+=(const std::pair<int64_t, real> &p)

same as addCost, but simpler to invoke. For example:

TrainerStats stat;
real cost = neuralNetwork.forward(batchSize);
stat += {batchSize, cost};

  • p: a pair of parameters; first is numProcessed, second is cost.
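
One way such an operator can be written is to forward the pair to addCost and return the object itself. The following is a sketch only: PairStats and its public fields are hypothetical names, and `real` is assumed to be float.

```cpp
#include <cstdint>
#include <utility>

// Assumption: `real` is a float typedef in this sketch.
using real = float;

// Hypothetical stand-in that only accumulates totals.
struct PairStats {
  int64_t numProcessed = 0;
  real totalCost = 0;

  void addCost(int64_t n, real cost) {
    numProcessed += n;
    totalCost += cost;
  }

  // Forward to addCost so that `stat += {n, cost};` compiles:
  // the braced list is converted to the std::pair parameter.
  PairStats& operator+=(const std::pair<int64_t, real>& p) {
    addCost(p.first, p.second);
    return *this;
  }
};
```

Returning a reference to the object keeps the operator consistent with the built-in compound assignments and allows chaining.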


TrainerStats Constructor.

reset stat when constructed.

void showStats(std::ostream &os, bool withCurrentCost = true) const

show stats to ostream.

If there is no need to print the current cost, set withCurrentCost to false.

  • os: output stream.
  • withCurrentCost: print current cost or not.

std::string getStats(bool withCurrentCost = true) const

get stats as a std::string

stats string
  • withCurrentCost: return current cost or not
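
The relationship between showStats and getStats can be sketched by writing the string variant on top of the stream variant. The struct StatsView, its fields, and the output format below are all assumptions made for illustration; the real output format is not specified in this reference.

```cpp
#include <cstdint>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical snapshot of the three values a stats object reports.
struct StatsView {
  int64_t numProcessed;
  float avgCost;
  float currentAvgCost;

  // Write the stats to any output stream; the format is invented for this sketch.
  void showStats(std::ostream& os, bool withCurrentCost = true) const {
    os << "samples=" << numProcessed << " AvgCost=" << avgCost;
    if (withCurrentCost) os << " CurrentCost=" << currentAvgCost;
  }

  // Reuse showStats with a string stream so both variants stay in sync.
  std::string getStats(bool withCurrentCost = true) const {
    std::ostringstream os;
    showStats(os, withCurrentCost);
    return os.str();
  }
};
```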


class paddle::RemoteParameterUpdater

Normal remote parameter updater for dense parameters.

It first packs all parameters for all pservers using the ParameterClient module, then waits for the merged parameter data from all pservers. The synchronization pattern specified by sync-sgd or async-sgd is achieved by all pservers with the help of the controller within this remote parameter updater. This module bridges the gradient machines and the parameter servers. It helps transfer parameters from the acceleration device to the CPU end for network transmission. For acceleration devices such as GPUs, it keeps additional parameter copy buffers at the CPU end; otherwise it directly uses the original parameter data to update pservers.

This remote parameter updater does not use a pipeline mechanism to hide the copy latency from GPU to CPU buffer. In addition, overlapping backward computation with communication is not supported.

Inherits from paddle::ParameterUpdater

Subclassed by paddle::ConcurrentRemoteParameterUpdater

Public Functions

RemoteParameterUpdater(const OptimizationConfig &config, int expectedPpassCount, std::unique_ptr<ParameterUpdater> &&localUpdater = nullptr)
void init(std::vector<ParameterPtr> &parameters)

initialize the internal parameter client and itself.

virtual PassType startBatch(int64_t batchSize)

start batch

Training of one batch is stateful, which helps with performance tuning and SGD optimization if necessary.

void finishBatch(real cost)

send parameters to pservers and get returned parameters from all pservers if necessary. It implicitly cooperates with the controller thread for sync-sgd.
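
The stateful batch control described above can be pictured with a small sketch that records the order of calls during one pass. Everything here is hypothetical: SketchUpdater is not the real paddle::ParameterUpdater, the BatchStatus values are invented, and the public `update` method merely stands in for the per-parameter work that the real class performs through updateImpl.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified updater; not the real paddle::ParameterUpdater API.
class SketchUpdater {
public:
  // Invented states; the real BatchStatus values are not listed in this reference.
  enum class BatchStatus { kIdle, kStarted, kFinished };

  void startPass() { log_.push_back("startPass"); }

  void startBatch(int64_t batchSize) {
    status_ = BatchStatus::kStarted;
    log_.push_back("startBatch " + std::to_string(batchSize));
  }

  // Stand-in for per-parameter work; ignored unless a batch has been started.
  void update(const std::string& paramName) {
    if (status_ == BatchStatus::kStarted) log_.push_back("update " + paramName);
  }

  // In the real updater this is where parameters are exchanged with pservers.
  void finishBatch(float /*cost*/) {
    status_ = BatchStatus::kFinished;
    log_.push_back("finishBatch");
  }

  void finishPass() { log_.push_back("finishPass"); }

  const std::vector<std::string>& log() const { return log_; }

private:
  BatchStatus status_ = BatchStatus::kIdle;
  std::vector<std::string> log_;
};

// Drive one pass of `numBatches` mini-batches over the given parameter names.
inline std::vector<std::string> runOnePass(int numBatches, int64_t batchSize,
                                           const std::vector<std::string>& params) {
  SketchUpdater updater;
  updater.startPass();
  for (int b = 0; b < numBatches; ++b) {
    updater.startBatch(batchSize);
    for (const auto& name : params) updater.update(name);
    updater.finishBatch(0.0f);
  }
  updater.finishPass();
  return updater.log();
}
```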

void startPass()
bool finishPass(real cost)
virtual void setForwardbackwardTime(uint64_t delta)
void apply()
void restore()

Protected Functions

void controller()

control all pservers with all trainers for sync-sgd

void updateImpl(Parameter *para)

work that needs to be done after finishBatch

void startController()
void copyParametersToDevice(ParameterType parameterType)

copy parameters from cpu host to device, such as gpu.

return when all data are transferred.

void copyParametersFromDevice(ParameterType parameterType)

copy parameters from device to cpu host

return when all data are transferred

Protected Attributes

OptimizationConfig config_

Optimization config used to guide initialization and finishBatch.

std::unique_ptr<ParameterClient2> parameterClient_

internal parameter client object for exchanging data with pserver

std::vector<ParameterPtr> cpuParameters_

internal shadow buffer at cpu host end, use original parameters_ if no acceleration devices are used.
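
The choice between shadow buffers and the original parameters can be sketched as follows. This is an illustration under stated assumptions, not the library's code: SketchParam, makeCpuParameters, and copyFromDevice are hypothetical names, and a plain vector copy stands in for a real device-to-host transfer.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical parameter type: just a flat buffer of values.
struct SketchParam {
  std::vector<float> value;
};
using SketchParamPtr = std::shared_ptr<SketchParam>;

// With an accelerator, allocate separate host-side shadow buffers;
// without one, alias the original parameters so no copy is needed.
inline std::vector<SketchParamPtr> makeCpuParameters(
    const std::vector<SketchParamPtr>& parameters, bool useDevice) {
  if (!useDevice) return parameters;
  std::vector<SketchParamPtr> shadows;
  for (const auto& p : parameters) {
    auto shadow = std::make_shared<SketchParam>();
    shadow->value.resize(p->value.size());
    shadows.push_back(shadow);
  }
  return shadows;
}

// Device-to-host copy before sending; skipped when host aliases device.
inline void copyFromDevice(const std::vector<SketchParamPtr>& device,
                           const std::vector<SketchParamPtr>& host) {
  for (std::size_t i = 0; i < device.size(); ++i) {
    if (host[i] != device[i]) host[i]->value = device[i]->value;
  }
}
```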

std::unique_ptr<ParameterUpdater> localUpdater_

local updater for aggregating local deltas over multiple batches

int64_t batchSize_

the size of mini-batch

int64_t numBatches_

number of batches processed

BatchStatus batchStatus_

for stateful control

std::unique_ptr<std::thread> controllerThread_

controller thread for sync-sgd

int64_t passCount_

passes already finished

int64_t expectedPassCount_

expected number of passes to finish

bool separateSendAndRecv_

use normal synchronization communication if true

bool isFirstPass_

true if it is the first pass

bool useApplyInPserver_

Protected Static Attributes

const std::string kAverage
const std::string kElasticAverage


class paddle::ConcurrentRemoteParameterUpdater

This updater adds an optimization that overlaps synchronization with pservers and backward computation.

A parameter can be sent to pservers as soon as its backward stage is finished. This concurrent updater copies data from the acceleration device to host memory asynchronously. In addition, the internal parameter client reads the data in host memory and sends it to all pservers in the next stage. This class thus pipelines the device-to-host copy and the host-to-network transfer to hide network latency in the backward stage. It contains separate send and recv threads for the pipeline.
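
The benefit of overlapping communication with the backward stage can be shown with a small timing model. It is a single-threaded simulation, not the threaded implementation of this class: serialTime and pipelinedTime are hypothetical helpers, the per-layer durations are made-up inputs, and a single network link that handles one send at a time is assumed.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// No overlap: finish every backward step, then send every parameter.
inline double serialTime(const std::vector<double>& backward,
                         const std::vector<double>& send) {
  double total = 0;
  for (double b : backward) total += b;
  for (double s : send) total += s;
  return total;
}

// Overlap: send i starts once backward step i is done and the link is free.
// Both vectors are assumed to have the same length, in backward order.
inline double pipelinedTime(const std::vector<double>& backward,
                            const std::vector<double>& send) {
  double computeDone = 0;  // time at which the latest backward step finished
  double linkFree = 0;     // time at which the network link becomes idle
  for (std::size_t i = 0; i < backward.size(); ++i) {
    computeDone += backward[i];
    linkFree = std::max(linkFree, computeDone) + send[i];
  }
  return std::max(computeDone, linkFree);
}
```

With three layers that each take 2 time units of backward computation and 1 unit to send, the serial schedule takes 9 units while the pipelined schedule takes 7, because only the last send is left exposed after computation ends.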

Inherits from paddle::RemoteParameterUpdater

Public Functions

ConcurrentRemoteParameterUpdater(OptimizationConfig config, int expectedPassCount, std::unique_ptr<ParameterUpdater> &&localUpdater)
void finishBatch(real cost)

send parameters to all pservers

It just signals the end to the internal parameter client so that it finishes the asynchronous send action. In addition, it synchronizes all asynchronous host-to-device copies.

Protected Functions

void updateImpl(Parameter *para)

work that needs to be done after finishBatch

void send(Parameter *para)

internal function called in send thread

void recv(Parameter *para)

internal function called in recv thread

void send()

send thread for relaying data from gradient to parameter client

just pipe data to internal parameter client for pipeline

void recv()

recv thread for relaying data from internal parameter client to host memory

it contains the asynchronous data copy from host to device

void copySingleParaToDevice(Parameter *para, ParameterType parameterType)

copy specified parameter from host to device

void copySingleParaFromDevice(Parameter *para, ParameterType parameterType)

copy specified parameter from device to host

bool needToUpdateRemotely()


class paddle::SparseRemoteParameterUpdater

This class is specialized for updating sparse parameters.

It allows only part of a parameter to be exchanged with all pservers. With sparse input, part of the gradients of the first hidden layer can remain zero and need not be exchanged with the pservers. This is the key optimization of this updater.

When updating sparse parameters, the latest parameters are all stored in the pservers instead of keeping a full copy at the trainer end, so the parameter weight values that can change in the next batch must be prefetched before the next forwardbackward. Also, because the parameters can be stored in the pservers instead of the trainer, specified parameters can be fetched when necessary, and huge parameters that are larger than the RAM of a single node can be supported.

Internally, this updater directs the internal parameter client to encapsulate sparse-specific messages for all pservers.
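
The prefetch idea can be sketched with an in-memory table standing in for the pservers. All names below (SketchPserver, rowsNeeded, fetch) are hypothetical, and the sketch assumes each sample is a list of row ids into one sparse parameter; it does not model the network protocol.

```cpp
#include <cstdint>
#include <set>
#include <unordered_map>
#include <vector>

using Row = std::vector<float>;

// Hypothetical stand-in for the pservers: owns the full parameter table.
struct SketchPserver {
  std::unordered_map<int64_t, Row> table;

  // Return only the requested rows; unknown ids are skipped.
  std::unordered_map<int64_t, Row> fetch(const std::set<int64_t>& ids) const {
    std::unordered_map<int64_t, Row> out;
    for (int64_t id : ids) {
      auto it = table.find(id);
      if (it != table.end()) out.emplace(id, it->second);
    }
    return out;
  }
};

// Collect the distinct row ids touched by the next mini-batch.
inline std::set<int64_t> rowsNeeded(
    const std::vector<std::vector<int64_t>>& batch) {
  std::set<int64_t> ids;
  for (const auto& sample : batch) ids.insert(sample.begin(), sample.end());
  return ids;
}
```

The trainer side then holds only the rows returned by fetch for the upcoming batch, which is why the full table never has to fit in the memory of one node.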

Inherits from paddle::ParameterUpdater

Public Functions

SparseRemoteParameterUpdater(const OptimizationConfig &config, int expectedPassCount, bool testing)
void init(std::vector<ParameterPtr> &parameters)


PassType startBatch(int64_t batchSize)

stateful batch control

void finishBatch(real cost)

send all sparse related parameters to all pservers

void startPass()
bool finishPass(real cost)
void apply()
void restore()
void loadParametersRemote(const std::string &dirName)

load parameters from pservers

void saveParametersRemote(const std::string &dirName)

save parameters to pservers

void getParametersRemote(bool fullSize, bool apply)

get latest sparse parameters value from all pservers

call it before next mini-batch

void randParametersRemote()
virtual void setForwardbackwardTime(uint64_t delta)

Protected Functions

virtual void updateImpl(Parameter *para)

update implementation; not implemented

void controller()

internal controller routine for controller thread

void startController()

start controller thread

Protected Attributes

OptimizationConfig config_

optimization config

std::unique_ptr<ParameterClient2> parameterClient_

internal parameter client

int64_t batchSize_
std::unique_ptr<std::thread> controllerThread_
int64_t passCount_
int64_t expectedPassCount_
bool testing_
bool useApplyInPserver_


class paddle::SparseRemoteParameterUpdaterComposite

Class for supporting normal updater and sparse updater

Not all parts of a model are sparse, so there is a dense updater for normal layers and a sparse updater for sparse layers.

It directly calls the internal dense and sparse updaters individually.

Inherits from paddle::ParameterUpdaterComposite

Public Types

enum [anonymous]



Public Functions

SparseRemoteParameterUpdaterComposite(const OptimizationConfig &config, int expectedPassCount, bool testing, std::unique_ptr<ParameterUpdater> &&normalUpdater)

create one dense updater and one sparse updater

use syncThreadPool to synchronize these two updaters

void init(std::vector<ParameterPtr> &parameters)

initialization of dense and sparse updaters
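
The composite idea can be sketched as a class that splits the parameters by kind and hands each group to its own updater. This is an assumption-laden illustration: RecordingUpdater, SketchParamInfo, and SketchComposite are hypothetical, and the two updaters are called one after the other here, whereas the description above mentions a syncThreadPool for synchronizing them.

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical updater that only remembers which parameters it owns.
struct RecordingUpdater {
  std::vector<std::string> owned;
  void init(const std::vector<std::string>& params) { owned = params; }
};

// Hypothetical parameter descriptor: a name plus a sparse flag.
struct SketchParamInfo {
  std::string name;
  bool sparse;
};

// Holds one dense and one sparse updater and routes parameters to them.
class SketchComposite {
public:
  SketchComposite()
      : dense_(new RecordingUpdater), sparse_(new RecordingUpdater) {}

  // Split the parameters by kind, then initialize each updater with its share.
  void init(const std::vector<SketchParamInfo>& params) {
    std::vector<std::string> denseNames, sparseNames;
    for (const auto& p : params) {
      (p.sparse ? sparseNames : denseNames).push_back(p.name);
    }
    dense_->init(denseNames);
    sparse_->init(sparseNames);
  }

  const RecordingUpdater& dense() const { return *dense_; }
  const RecordingUpdater& sparse() const { return *sparse_; }

private:
  std::unique_ptr<RecordingUpdater> dense_;
  std::unique_ptr<RecordingUpdater> sparse_;
};
```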