Trainer¶
TrainerStats¶
- 
class 
paddle::TrainerStats¶ TrainerStats collects statistics on the number of samples processed and the total cost.
It maintains two statistics, ‘AvgCost’ and ‘CurrentAvgCost’. ‘AvgCost’ is the average cost over one pass (all mini-batches), while ‘CurrentAvgCost’ is the average cost over the most recent log_period mini-batches.
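A minimal usage sketch using the members documented below; the neuralNetwork object and the batchSize, numBatches, and logPeriod variables are placeholders, not part of this class:

    TrainerStats stats;                    // stats are reset on construction
    for (int64_t batchId = 0; batchId < numBatches; ++batchId) {
      real cost = neuralNetwork.forward(batchSize);  // placeholder forward step
      stats += {batchSize, cost};          // same as stats.addCost(batchSize, cost)
      if ((batchId + 1) % logPeriod == 0) {
        stats.showStats(std::cout);        // prints AvgCost and CurrentAvgCost
        stats.resetCurrentStat();          // start a new 'current' window
      }
    }
    real passCost = stats.getAvgCost();    // average cost over the whole pass
    stats.reset();                         // reset before the next pass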
Public Functions
- 
void 
reset()¶ reset all stats.
Often called before a pass starts.
- 
void 
resetCurrentStat()¶ reset current stat.
‘current’ means the most recent log_period mini-batches
- 
void 
addCost(int64_t numProcessed, real cost)¶ add cost to stat.
- Parameters
 numProcessed: current mini-batch size
 cost: current mini-batch cost
- 
real 
getAvgCost() const¶ get average cost through one pass (all processed mini-batches)
- Return
 - pass average cost
 
- 
real 
getCurrentAvgCost() const¶ get current mini-batch’s average cost.
- Return
 - mini-batch average cost
 
- 
int64_t 
getNumProcessed() const¶ get all processed samples’ number
- Return
 - all processed samples’ number
 
- 
TrainerStats &
operator+=(const std::pair<int64_t, real> &p)¶ same function as addCost, but simpler to invoke. For example:
    TrainerStats stat;
    real cost = neuralNetwork.forward(batchSize);
    stat += {batchSize, cost};
- Return
 - *this
 - Parameters
 p: a pair; the first element is numProcessed, the second is cost.
- 
TrainerStats()¶ TrainerStats Constructor.
Stats are reset when constructed.
- 
void 
showStats(std::ostream &os, bool withCurrentCost = true) const¶ show stats to ostream.
If there is no need to print the current cost, set withCurrentCost to false.
- Parameters
 os: output stream.
 withCurrentCost: print current cost or not.
- 
std::string 
getStats(bool withCurrentCost = true) const¶ get stats as a std::string
- Return
 - stats string
 - Parameters
 withCurrentCost: return current cost or not
RemoteParameterUpdater¶
- 
class 
paddle::RemoteParameterUpdater¶ Normal remote parameter updater for dense parameters.
It first packs all parameters for all pservers using the ParameterClient module, then waits for the merged parameter data from all pservers. The synchronization pattern specified by sync-sgd or async-sgd is achieved by all pservers with the help of the controller within this remote parameter updater. This module effectively bridges the gradient machines and the parameter servers: it transfers parameters from the acceleration device to the cpu end for the network. It keeps additional parameter copy buffers at the cpu end for acceleration devices such as gpu; otherwise it directly uses the original parameter data to update the pservers.
This remote parameter updater does not use a pipeline mechanism to hide the copy latency from gpu to cpu buffer. In addition, overlapping backward computation with communication is not supported.
Inherits from paddle::ParameterUpdater
Subclassed by paddle::ConcurrentRemoteParameterUpdater
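A rough sketch of the per-pass call sequence driven on this updater, using only the members documented below; config, parameters, expectedPassCount, numBatches, batchSize, and the trainOneBatch() helper are assumed placeholders rather than actual Trainer code:

    RemoteParameterUpdater updater(config, expectedPassCount);
    updater.init(parameters);              // set up the internal parameter client

    for (int pass = 0; pass < expectedPassCount; ++pass) {
      updater.startPass();
      real cost = 0;
      for (int64_t i = 0; i < numBatches; ++i) {
        updater.startBatch(batchSize);     // stateful batch control
        cost = trainOneBatch();            // placeholder forward/backward step
        updater.finishBatch(cost);         // exchange parameters with the pservers
      }
      updater.finishPass(cost);
    }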
Public Functions
- 
RemoteParameterUpdater(const OptimizationConfig &config, int expectedPassCount, std::unique_ptr<ParameterUpdater> &&localUpdater = nullptr)¶ 
- 
~RemoteParameterUpdater()¶ 
- 
void 
init(std::vector<ParameterPtr> &parameters)¶ initialize the internal parameter client and itself.
- 
virtual PassType 
startBatch(int64_t batchSize)¶ start batch
- Note
 - batch training is stateful, which helps with performance tuning and sgd optimization if necessary.
 
- 
void 
finishBatch(real cost)¶ send parameters to pservers and get returned parameters from all pservers if necessary. It implicitly cooperates with the controller thread for sync-sgd.
- 
void 
startPass()¶ 
- 
bool 
finishPass(real cost)¶ 
- 
virtual void 
setForwardbackwardTime(uint64_t delta)¶ 
- 
void 
apply()¶ 
- 
void 
restore()¶ 
Protected Functions
- 
void 
controller()¶ control all pservers with all trainers for sync-sgd
- 
void 
startController()¶ 
- 
void 
copyParametersToDevice(ParameterType parameterType)¶ copy parameters from cpu host to device, such as gpu.
- Note
 - returns once all data have been transferred.
 
- 
void 
copyParametersFromDevice(ParameterType parameterType)¶ copy parameters from device to cpu host
- Note
 - returns once all data have been transferred.
 
Protected Attributes
- 
OptimizationConfig 
config_¶ Optimization config used to guide initialization and finishBatch.
- 
std::unique_ptr<ParameterClient2> 
parameterClient_¶ internal parameter client object for exchanging data with pserver
- 
std::vector<ParameterPtr> 
cpuParameters_¶ internal shadow buffer at the cpu host end; the original parameters_ are used directly if no acceleration devices are used.
- 
std::unique_ptr<ParameterUpdater> 
localUpdater_¶ local updater for aggregating the local delta over multiple batches
- 
int64_t 
batchSize_¶ the size of mini-batch
- 
int64_t 
numBatches_¶ batches passed
- 
BatchStatus 
batchStatus_¶ for stateful control
- 
std::unique_ptr<std::thread> 
controllerThread_¶ controller thread for sync-sgd
- 
int64_t 
passCount_¶ passes already finished
- 
int64_t 
expectedPassCount_¶ expected number of passes to finish
- 
bool 
separateSendAndRecv_¶ use normal synchronous communication if true
- 
bool 
isFirstPass_¶ true if this is the first pass
- 
bool 
useApplyInPserver_¶ 
ConcurrentRemoteParameterUpdater¶
- 
class 
paddle::ConcurrentRemoteParameterUpdater¶ This updater adds an optimization that overlaps pserver synchronization with backward computation.
A parameter can be sent to the pservers as soon as its backward stage has finished. This concurrent updater copies data from the acceleration device to host memory asynchronously; the internal parameter client then reads the data from host memory and sends it to all pservers in the next stage. The class thus pipelines the device-to-host copy with the host-to-network transfer to hide network latency during the backward stage. It contains separate send and recv threads for this pipeline.
Inherits from paddle::RemoteParameterUpdater
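The send-side pipelining idea, reduced to a standalone sketch that is only illustrative and not PaddlePaddle code: a producer (standing in for backward computation) pushes per-parameter gradients into a queue, and a dedicated send thread drains the queue while the producer keeps working.

    #include <condition_variable>
    #include <cstdio>
    #include <mutex>
    #include <queue>
    #include <thread>
    #include <vector>

    // Toy stand-in for "one parameter's gradient is ready".
    struct GradChunk { int paramId; std::vector<float> grad; };

    std::queue<GradChunk> sendQueue;
    std::mutex mtx;
    std::condition_variable cv;
    bool done = false;

    // Send thread: drains the queue and "sends" each chunk (here: prints it),
    // overlapping with the producer's remaining backward computation.
    void sendThread() {
      std::unique_lock<std::mutex> lock(mtx);
      while (true) {
        cv.wait(lock, [] { return !sendQueue.empty() || done; });
        while (!sendQueue.empty()) {
          GradChunk chunk = std::move(sendQueue.front());
          sendQueue.pop();
          lock.unlock();
          std::printf("sending gradients for parameter %d\n", chunk.paramId);  // network I/O would go here
          lock.lock();
        }
        if (done) break;
      }
    }

    int main() {
      std::thread sender(sendThread);
      for (int paramId = 0; paramId < 4; ++paramId) {   // pretend backward finishes layer by layer
        GradChunk chunk{paramId, std::vector<float>(16, 0.1f)};
        {
          std::lock_guard<std::mutex> lock(mtx);
          sendQueue.push(std::move(chunk));
        }
        cv.notify_one();
      }
      {
        std::lock_guard<std::mutex> lock(mtx);
        done = true;
      }
      cv.notify_one();
      sender.join();
      return 0;
    }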
Public Functions
- 
ConcurrentRemoteParameterUpdater(OptimizationConfig config, int expectedPassCount, std::unique_ptr<ParameterUpdater> &&localUpdater)¶ 
- 
~ConcurrentRemoteParameterUpdater()¶ 
- 
void 
finishBatch(real cost)¶ send parameters to all pservers
- Note
 - it just signals the end to the internal parameter client so it can finish the asynchronous send action. In addition, it synchronizes all asynchronous host-to-device copies.
 
Protected Functions
- 
void 
send()¶ send thread for relaying gradient data to the parameter client
- Note
 - just pipes data to the internal parameter client for the pipeline
 
- 
void 
recv()¶ recv thread for relaying data from internal parameter client to host memory
- Note
 - it contains the asynchronous data copy from host to device
 
- 
void 
copySingleParaToDevice(Parameter *para, ParameterType parameterType)¶ copy specified parameter from host to device
- 
void 
copySingleParaFromDevice(Parameter *para, ParameterType parameterType)¶ copy specified parameter from device to host
- 
bool 
needToUpdateRemotely()¶ 
SparseRemoteParameterUpdater¶
- 
class 
paddle::SparseRemoteParameterUpdater¶ This class is specialized for updating sparse parameters.
It allows only part of the parameters to be exchanged with the pservers. With sparse input, part of the gradients of the first hidden layer remain zero and do not need to be exchanged with the pservers; this is the key optimization of this updater.
When updating sparse parameters, the latest parameters are stored on the pservers instead of keeping a full copy on the trainer side, so the parameter values that may change in the next batch have to be prefetched before the next forward/backward step. Because the parameters live on the pservers rather than on the trainer, specific parameters can be fetched on demand, which makes it possible to support huge parameters that are larger than the RAM of a single node.
Internally, this updater directs the internal parameter client to encapsulate sparse-specific messages for all pservers.
Inherits from paddle::ParameterUpdater
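A rough sketch of the call order suggested by the notes below, with sparse parameter values prefetched before each mini-batch; config, parameters, expectedPassCount, numBatches, batchSize, the trainOneBatch() helper, and the argument values passed to getParametersRemote are assumptions:

    SparseRemoteParameterUpdater updater(config, expectedPassCount, /*testing=*/false);
    updater.init(parameters);

    updater.startPass();
    real cost = 0;
    for (int64_t i = 0; i < numBatches; ++i) {
      updater.getParametersRemote(/*fullSize=*/false, /*apply=*/false);  // prefetch sparse values
      updater.startBatch(batchSize);
      cost = trainOneBatch();              // placeholder forward/backward step
      updater.finishBatch(cost);           // send sparse-related parameters to the pservers
    }
    updater.finishPass(cost);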
Public Functions
- 
SparseRemoteParameterUpdater(const OptimizationConfig &config, int expectedPassCount, bool testing)¶ 
- 
~SparseRemoteParameterUpdater()¶ 
- 
void 
init(std::vector<ParameterPtr> &parameters)¶ initialization
- 
PassType 
startBatch(int64_t batchSize)¶ stateful batch control
- 
void 
finishBatch(real cost)¶ send all sparse related parameters to all pservers
- 
void 
startPass()¶ 
- 
bool 
finishPass(real cost)¶ 
- 
void 
apply()¶ 
- 
void 
restore()¶ 
- 
void 
loadParametersRemote(const std::string &dirName)¶ load parameters from pservers
- 
void 
saveParametersRemote(const std::string &dirName)¶ save parameters to pservers
- 
void 
getParametersRemote(bool fullSize, bool apply)¶ get latest sparse parameters value from all pservers
- Note
 - call it before next mini-batch
 
- 
void 
randParametersRemote()¶ 
- 
virtual void 
setForwardbackwardTime(uint64_t delta)¶ 
Protected Functions
- 
void 
controller()¶ internal controller routine for controller thread
- 
void 
startController()¶ start controller thread
Protected Attributes
- 
OptimizationConfig 
config_¶ optimization config
- 
std::unique_ptr<ParameterClient2> 
parameterClient_¶ internal parameter client
- 
int64_t 
batchSize_¶ 
- 
std::unique_ptr<std::thread> 
controllerThread_¶ 
- 
int64_t 
passCount_¶ 
- 
int64_t 
expectedPassCount_¶ 
- 
bool 
testing_¶ 
- 
bool 
useApplyInPserver_¶ 
SparseRemoteParameterUpdaterComposite¶
- 
class 
paddle::SparseRemoteParameterUpdaterComposite¶ Class for combining a normal updater and a sparse updater.
Not all parts of a model are sparse, so a dense updater exists for the normal layers while the sparse updater handles the sparse layers.
It directly calls the internal dense and sparse updaters individually.
Inherits from paddle::ParameterUpdaterComposite
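One plausible way to assemble the composite from the signatures below, with a dense RemoteParameterUpdater passed in as the normal updater; the construction details and the surrounding config, expectedPassCount, and parameters variables are assumptions:

    auto denseUpdater =
        std::make_unique<RemoteParameterUpdater>(config, expectedPassCount);
    SparseRemoteParameterUpdaterComposite updater(
        config, expectedPassCount, /*testing=*/false, std::move(denseUpdater));
    updater.init(parameters);  // initializes both updaters via syncThreadPool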
Public Types
Public Functions
- 
SparseRemoteParameterUpdaterComposite(const OptimizationConfig &config, int expectedPassCount, bool testing, std::unique_ptr<ParameterUpdater> &&normalUpdater)¶ create one dense updater and one sparse updater
- Note
 - use syncThreadPool to synchronize these two updaters
 
- 
void 
init(std::vector<ParameterPtr> &parameters)¶ initialization of dense and sparse updaters