Optimizer¶
-
namespace
paddle
¶ -
class
SgdOptimizer
¶ Inherits from paddle::ParameterOptimizer
Public Functions
-
SgdOptimizer
(const OptimizationConfig &optConfig)¶
-
virtual void
startBatch
(int64_t numSamplesProcessed)¶ called by Trainer before forward() of a batch.
-
virtual void
update
(const VectorPtr vecs[], const ParameterConfig &config, size_t sparseId) const¶ Called by the Trainer between startBatch() and finishBatch(), possibly many times; each call updates one Parameter using its gradient in PARAMETER_GRADIENT. sparseId is the row id: when it is set, the update is sparse and processes one row per call.
-
virtual void
finishBatch
()¶ called by Trainer after backward() of a batch
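As an illustration of what a single update() call does conceptually, here is a minimal sketch of a plain SGD step. The Param struct and the sgdStep function are hypothetical stand-ins, not Paddle types, and the sketch ignores momentum, decay and learning-rate scheduling, which the real optimizer may apply.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one parameter; not Paddle's VectorPtr array.
struct Param {
  std::vector<float> value;     // plays the role of PARAMETER_VALUE
  std::vector<float> gradient;  // plays the role of PARAMETER_GRADIENT
};

// One plain SGD step: value <- value - learningRate * gradient.
void sgdStep(Param& p, float learningRate) {
  assert(p.value.size() == p.gradient.size());
  for (std::size_t i = 0; i < p.value.size(); ++i) {
    p.value[i] -= learningRate * p.gradient[i];
  }
}
```

A trainer-like caller would invoke sgdStep once per parameter between its own start-of-batch and end-of-batch bookkeeping.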
-
-
class
SparseMomentumParameterOptimizer
¶ Inherits from paddle::ParameterOptimizer
Public Functions
-
SparseMomentumParameterOptimizer
(const OptimizationConfig &optConfig)¶
-
virtual void
init
(size_t numRows, const ParameterConfig *config)¶ For sparse updates, the optimizer can maintain numRows timers (t0), one per row. Some sparse optimizers need the parameter config in functions such as startBatch() and can obtain it here. Not every caller can pass a config, so the optimizer should check that config is not a null pointer before using it.
-
virtual void
startBatch
(int64_t numSamplesProcessed)¶ called by Trainer before forward() of a batch.
-
virtual void
update
(const VectorPtr vecs[], const ParameterConfig &config, size_t sparseId) const¶ Called by the Trainer between startBatch() and finishBatch(), possibly many times; each call updates one Parameter using its gradient in PARAMETER_GRADIENT. sparseId is the row id: when it is set, the update is sparse and processes one row per call.
-
virtual ParameterOptimizer::TraverseCallback
needSpecialTraversal
(const ParameterConfig &config) const¶ This hook is useful for sparse updates, because traversing a whole block is costly. The Trainer calls it after update() and before finishBatch(), for example:
startBatch();
if (dense) {
  update(blockVec);
} else {  // sparse
  for (row : rows_in_block) { update(rowVec); }
}
auto callback = needSpecialTraversal();
if (callback) {
  // do the traversal, possibly multi-threaded
  if (dense) {
    callback();
  } else {  // sparse
    for (row : all_rows_in_block) { callback(); }
  }
}
finishBatch();
- Return
- A callback if a traversal is needed, otherwise nullptr. Calling this function must not change state.
-
virtual void
finishBatch
()¶ called by Trainer after backward() of a batch
-
-
class
AdagradParameterOptimizer
¶ Inherits from paddle::ParameterOptimizer
Public Functions
-
AdagradParameterOptimizer
(const OptimizationConfig &optConfig)¶
-
virtual void
startBatch
(int64_t numSamplesProcessed)¶ called by Trainer before forward() of a batch.
-
virtual void
update
(const VectorPtr vecs[], const ParameterConfig &config, size_t sparseId) const¶ Called by the Trainer between startBatch() and finishBatch(), possibly many times; each call updates one Parameter using its gradient in PARAMETER_GRADIENT. sparseId is the row id: when it is set, the update is sparse and processes one row per call.
-
virtual ParameterOptimizer::TraverseCallback
needSpecialTraversal
(const ParameterConfig &config) const¶ This hook is useful for sparse updates, because traversing a whole block is costly. The Trainer calls it after update() and before finishBatch(), for example:
startBatch();
if (dense) {
  update(blockVec);
} else {  // sparse
  for (row : rows_in_block) { update(rowVec); }
}
auto callback = needSpecialTraversal();
if (callback) {
  // do the traversal, possibly multi-threaded
  if (dense) {
    callback();
  } else {  // sparse
    for (row : all_rows_in_block) { callback(); }
  }
}
finishBatch();
- Return
- A callback if a traversal is needed, otherwise nullptr. Calling this function must not change state.
Protected Attributes
-
int64_t
numUpdates_
¶
Protected Static Attributes
-
const int64_t
kMaxNumAccumulates
¶
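For orientation, the following is a minimal sketch of the textbook Adagrad rule: accumulate squared gradients per element and scale each step by the inverse square root of that sum. It is not taken from Paddle's implementation; AdagradState, adagradStep and the epsilon default are assumptions made for illustration, and the sketch does not model numUpdates_ or kMaxNumAccumulates.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical per-parameter state for textbook Adagrad.
struct AdagradState {
  std::vector<float> accum;  // running sum of squared gradients
};

// One Adagrad step: accum += g^2; value -= lr * g / (sqrt(accum) + epsilon).
void adagradStep(std::vector<float>& value, const std::vector<float>& grad,
                 AdagradState& state, float learningRate,
                 float epsilon = 1e-6f) {
  assert(value.size() == grad.size());
  if (state.accum.empty()) state.accum.assign(value.size(), 0.0f);
  assert(state.accum.size() == value.size());
  for (std::size_t i = 0; i < value.size(); ++i) {
    state.accum[i] += grad[i] * grad[i];
    value[i] -= learningRate * grad[i] / (std::sqrt(state.accum[i]) + epsilon);
  }
}
```

Because the accumulator only grows, the effective step size for a frequently updated element shrinks over time.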
-
-
class
AdaDeltaParameterOptimizer
¶ Inherits from paddle::ParameterOptimizer
Public Functions
-
AdaDeltaParameterOptimizer
(const OptimizationConfig &optConfig)¶
-
virtual void
startBatch
(int64_t numSamplesProcessed)¶ called by Trainer before forward() of a batch.
-
virtual void
update
(const VectorPtr vecs[], const ParameterConfig &config, size_t sparseId) const¶ Called by the Trainer between startBatch() and finishBatch(), possibly many times; each call updates one Parameter using its gradient in PARAMETER_GRADIENT. sparseId is the row id: when it is set, the update is sparse and processes one row per call.
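The sketch below shows the textbook AdaDelta rule, which keeps decayed averages of squared gradients and squared updates and needs no explicit learning rate. It is an illustration only: AdaDeltaState, adaDeltaStep and the default values of rho and epsilon are assumptions, not Paddle's code or configuration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical per-parameter state for textbook AdaDelta.
struct AdaDeltaState {
  std::vector<float> avgSqGrad;   // decayed average of g^2
  std::vector<float> avgSqDelta;  // decayed average of (update)^2
};

void adaDeltaStep(std::vector<float>& value, const std::vector<float>& grad,
                  AdaDeltaState& s, float rho = 0.95f, float epsilon = 1e-6f) {
  assert(value.size() == grad.size());
  if (s.avgSqGrad.empty()) {
    s.avgSqGrad.assign(value.size(), 0.0f);
    s.avgSqDelta.assign(value.size(), 0.0f);
  }
  assert(s.avgSqGrad.size() == value.size());
  assert(s.avgSqDelta.size() == value.size());
  for (std::size_t i = 0; i < value.size(); ++i) {
    const float g = grad[i];
    s.avgSqGrad[i] = rho * s.avgSqGrad[i] + (1.0f - rho) * g * g;
    // The step is scaled by the ratio of the two RMS values.
    const float delta =
        -std::sqrt((s.avgSqDelta[i] + epsilon) / (s.avgSqGrad[i] + epsilon)) * g;
    s.avgSqDelta[i] = rho * s.avgSqDelta[i] + (1.0f - rho) * delta * delta;
    value[i] += delta;
  }
}
```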
-
-
class
RMSPropParameterOptimizer
¶ Inherits from paddle::ParameterOptimizer
Public Functions
-
RMSPropParameterOptimizer
(const OptimizationConfig &optConfig)¶
-
virtual void
init
(size_t numRows, const ParameterConfig *config)¶ For sparse updates, the optimizer can maintain numRows timers (t0), one per row. Some sparse optimizers need the parameter config in functions such as startBatch() and can obtain it here. Not every caller can pass a config, so the optimizer should check that config is not a null pointer before using it.
-
virtual void
startBatch
(int64_t numSamplesProcessed)¶ called by Trainer before forward() of a batch.
-
virtual void
finishBatch
()¶ called by Trainer after backward() of a batch
-
virtual void
update
(const VectorPtr vecs[], const ParameterConfig &config, size_t sparseId) const¶ Called by the Trainer between startBatch() and finishBatch(), possibly many times; each call updates one Parameter using its gradient in PARAMETER_GRADIENT. sparseId is the row id: when it is set, the update is sparse and processes one row per call.
Protected Attributes
-
real
rou_
¶
-
real
epsilon_
¶
-
int64_t
timer_
¶ Counts batches; no catch-up is needed. t (timer_) is the current time, and t0 (t0Vec_) holds the last time each row i occurred. If one block is updated by multiple threads, the caller should hash sparse ids to avoid write conflicts in t0Vec_.
-
std::vector<int64_t>
t0Vec_
¶
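To show how a decay rate and an epsilon such as rou_ and epsilon_ typically enter the computation, here is a minimal sketch of the textbook RMSProp rule. It is an assumption-laden illustration: rmsPropStep is a hypothetical free function, the sparse timers (timer_, t0Vec_) are not modelled, and Paddle's actual variant may differ.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// One textbook RMSProp step (hypothetical helper, not Paddle's API):
//   avgSqGrad = rou * avgSqGrad + (1 - rou) * g^2
//   value    -= learningRate * g / sqrt(avgSqGrad + epsilon)
void rmsPropStep(std::vector<float>& value, const std::vector<float>& grad,
                 std::vector<float>& avgSqGrad, float learningRate, float rou,
                 float epsilon) {
  assert(value.size() == grad.size());
  assert(value.size() == avgSqGrad.size());
  for (std::size_t i = 0; i < value.size(); ++i) {
    avgSqGrad[i] = rou * avgSqGrad[i] + (1.0f - rou) * grad[i] * grad[i];
    value[i] -= learningRate * grad[i] / std::sqrt(avgSqGrad[i] + epsilon);
  }
}
```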
-
-
class
DecayedAdagradParameterOptimizer
¶ Inherits from paddle::ParameterOptimizer
Public Functions
-
DecayedAdagradParameterOptimizer
(const OptimizationConfig &optConfig)¶
-
virtual void
init
(size_t numRows, const ParameterConfig *config)¶ For sparse updates, the optimizer can maintain numRows timers (t0), one per row. Some sparse optimizers need the parameter config in functions such as startBatch() and can obtain it here. Not every caller can pass a config, so the optimizer should check that config is not a null pointer before using it.
-
virtual void
startBatch
(int64_t numSamplesProcessed)¶ called by Trainer before forward() of a batch.
-
virtual void
finishBatch
()¶ called by Trainer after backward() of a batch
-
virtual void
update
(const VectorPtr vecs[], const ParameterConfig &config, size_t sparseId) const¶ Called by the Trainer between startBatch() and finishBatch(), possibly many times; each call updates one Parameter using its gradient in PARAMETER_GRADIENT. sparseId is the row id: when it is set, the update is sparse and processes one row per call.
Protected Attributes
-
real
rou_
¶
-
real
epsilon_
¶
-
int64_t
timer_
¶ Counts batches; no catch-up is needed. t (timer_) is the current time, and t0 (t0Vec_) holds the last time each row i occurred. If one block is updated by multiple threads, the caller should hash sparse ids to avoid write conflicts in t0Vec_.
-
std::vector<int64_t>
t0Vec_
¶
-
-
class
AdamParameterOptimizer
¶ - #include <FirstOrderOptimizer.h>
Adam Optimizer. Reference Paper: http://arxiv.org/abs/1412.6980 Algorithm 1
Inherits from paddle::ParameterOptimizer
Public Functions
-
AdamParameterOptimizer
(const OptimizationConfig &optConfig)¶
-
virtual void
finishBatch
()¶ called by Trainer after backward() of a batch
-
virtual void
update
(const VectorPtr vecs[], const ParameterConfig &config, size_t sparseId) const¶ Called by the Trainer between startBatch() and finishBatch(), possibly many times; each call updates one Parameter using its gradient in PARAMETER_GRADIENT. sparseId is the row id: when it is set, the update is sparse and processes one row per call.
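The following sketch follows Algorithm 1 of the referenced paper (Adam) on a flat vector of floats. AdamState and adamStep are hypothetical names introduced here, and the default hyperparameters are the ones suggested in the paper rather than values read from OptimizationConfig.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-parameter state for Adam.
struct AdamState {
  std::vector<float> m;  // first-moment estimate
  std::vector<float> v;  // second-moment estimate
  std::int64_t t = 0;    // number of steps taken so far
};

void adamStep(std::vector<float>& value, const std::vector<float>& grad,
              AdamState& s, float alpha = 0.001f, float beta1 = 0.9f,
              float beta2 = 0.999f, float epsilon = 1e-8f) {
  assert(value.size() == grad.size());
  if (s.m.empty()) {
    s.m.assign(value.size(), 0.0f);
    s.v.assign(value.size(), 0.0f);
  }
  assert(s.m.size() == value.size());
  assert(s.v.size() == value.size());
  ++s.t;
  // Bias-correction terms 1 - beta^t for the two moment estimates.
  const float correction1 = 1.0f - std::pow(beta1, static_cast<float>(s.t));
  const float correction2 = 1.0f - std::pow(beta2, static_cast<float>(s.t));
  for (std::size_t i = 0; i < value.size(); ++i) {
    s.m[i] = beta1 * s.m[i] + (1.0f - beta1) * grad[i];
    s.v[i] = beta2 * s.v[i] + (1.0f - beta2) * grad[i] * grad[i];
    const float mHat = s.m[i] / correction1;
    const float vHat = s.v[i] / correction2;
    value[i] -= alpha * mHat / (std::sqrt(vHat) + epsilon);
  }
}
```

On the first step the bias correction makes the update magnitude close to alpha regardless of the gradient scale.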
-
-
class
AdamaxParameterOptimizer
¶ - #include <FirstOrderOptimizer.h>
AdaMax Optimizer. Reference Paper: http://arxiv.org/abs/1412.6980 Algorithm 2
Inherits from paddle::ParameterOptimizer
Public Functions
-
AdamaxParameterOptimizer
(const OptimizationConfig &optConfig)¶
-
virtual void
finishBatch
()¶ called by Trainer after backward() of a batch
-
virtual void
update
(const VectorPtr vecs[], const ParameterConfig &config, size_t sparseId) const¶ Called by the Trainer between startBatch() and finishBatch(), possibly many times; each call updates one Parameter using its gradient in PARAMETER_GRADIENT. sparseId is the row id: when it is set, the update is sparse and processes one row per call.
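A minimal sketch of Algorithm 2 of the referenced paper (AdaMax) is given below. AdamaxState and adamaxStep are hypothetical names, the defaults are the paper's suggested values, and the guard against a zero norm is an addition of this sketch that the paper's pseudocode does not contain.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-parameter state for AdaMax.
struct AdamaxState {
  std::vector<float> m;  // first-moment estimate
  std::vector<float> u;  // exponentially weighted infinity norm
  std::int64_t t = 0;    // number of steps taken so far
};

void adamaxStep(std::vector<float>& value, const std::vector<float>& grad,
                AdamaxState& s, float alpha = 0.002f, float beta1 = 0.9f,
                float beta2 = 0.999f) {
  assert(value.size() == grad.size());
  if (s.m.empty()) {
    s.m.assign(value.size(), 0.0f);
    s.u.assign(value.size(), 0.0f);
  }
  assert(s.m.size() == value.size());
  assert(s.u.size() == value.size());
  ++s.t;
  const float stepSize =
      alpha / (1.0f - std::pow(beta1, static_cast<float>(s.t)));
  for (std::size_t i = 0; i < value.size(); ++i) {
    s.m[i] = beta1 * s.m[i] + (1.0f - beta1) * grad[i];
    s.u[i] = std::max(beta2 * s.u[i], std::fabs(grad[i]));
    // Guard added in this sketch: skip elements that never saw a gradient.
    if (s.u[i] > 0.0f) value[i] -= stepSize * s.m[i] / s.u[i];
  }
}
```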
-
-
class
AddOptimizer
¶ Inherits from paddle::ParameterOptimizer
Public Functions
-
AddOptimizer
(const OptimizationConfig &optConfig)¶
-
virtual void
startBatch
(int64_t numSamplesProcessed)¶ called by Trainer before forward() of a batch.
-
virtual void
update
(const VectorPtr vecs[], const ParameterConfig &config, size_t sparseId) const¶ Called by the Trainer between startBatch() and finishBatch(), possibly many times; each call updates one Parameter using its gradient in PARAMETER_GRADIENT. sparseId is the row id: when it is set, the update is sparse and processes one row per call.
-
-
class
DummyOptimizer
¶ Inherits from paddle::ParameterOptimizer
Public Functions
-
DummyOptimizer
(const OptimizationConfig &optConfig)¶
-
virtual void
update
(const VectorPtr vecs[], const ParameterConfig &config, size_t sparseId) const¶ Called by the Trainer between startBatch() and finishBatch(), possibly many times; each call updates one Parameter using its gradient in PARAMETER_GRADIENT. sparseId is the row id: when it is set, the update is sparse and processes one row per call.
-
-
class
OptimizerWithGradientClipping
¶ Inherits from paddle::ParameterOptimizer
Public Functions
-
OptimizerWithGradientClipping
(const OptimizationConfig &optConfig, ParameterOptimizer *optimizer)¶
-
virtual void
init
(size_t numRows, const ParameterConfig *config)¶ For sparse updates, the optimizer can maintain numRows timers (t0), one per row. Some sparse optimizers need the parameter config in functions such as startBatch() and can obtain it here. Not every caller can pass a config, so the optimizer should check that config is not a null pointer before using it.
-
virtual void
startPass
()¶
-
virtual void
finishPass
()¶
-
virtual void
startBatch
(int64_t numSamplesProcessed)¶ called by Trainer before forward() of a batch.
-
virtual void
finishBatch
()¶ called by Trainer after backward() of a batch
-
virtual TraverseCallback
needSpecialTraversal
(const ParameterConfig &config) const¶ This hook is useful for sparse updates, because traversing a whole block is costly. The Trainer calls it after update() and before finishBatch(), for example:
startBatch();
if (dense) {
  update(blockVec);
} else {  // sparse
  for (row : rows_in_block) { update(rowVec); }
}
auto callback = needSpecialTraversal();
if (callback) {
  // do the traversal, possibly multi-threaded
  if (dense) {
    callback();
  } else {  // sparse
    for (row : all_rows_in_block) { callback(); }
  }
}
finishBatch();
- Return
- A callback if a traversal is needed, otherwise nullptr. Calling this function must not change state.
-
virtual void
update
(const VectorPtr vecs[], const ParameterConfig &config, size_t sparseId) const¶ Called by the Trainer between startBatch() and finishBatch(), possibly many times; each call updates one Parameter using its gradient in PARAMETER_GRADIENT. sparseId is the row id: when it is set, the update is sparse and processes one row per call.
-
virtual void
setNoDecay
()¶
Protected Attributes
-
std::unique_ptr<ParameterOptimizer>
optimizer_
¶
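The class wraps another optimizer (the optimizer_ member) and forwards calls to it. A minimal sketch of that wrapper pattern is shown below. Everything in it is hypothetical: the Optimizer interface is a simplified stand-in for ParameterOptimizer, and element-wise clipping to [-threshold, threshold] is an assumed clipping rule, not one confirmed by this reference.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical minimal optimizer interface; not Paddle's ParameterOptimizer.
class Optimizer {
 public:
  virtual ~Optimizer() = default;
  virtual void update(std::vector<float>& value,
                      std::vector<float>& grad) const = 0;
};

// Plain SGD used as the wrapped optimizer in this sketch.
class PlainSgd : public Optimizer {
 public:
  explicit PlainSgd(float learningRate) : learningRate_(learningRate) {}
  void update(std::vector<float>& value,
              std::vector<float>& grad) const override {
    assert(value.size() == grad.size());
    for (std::size_t i = 0; i < value.size(); ++i) {
      value[i] -= learningRate_ * grad[i];
    }
  }

 private:
  float learningRate_;
};

// Clips every gradient element, then delegates to the wrapped optimizer.
class ClippingOptimizer : public Optimizer {
 public:
  ClippingOptimizer(std::unique_ptr<Optimizer> inner, float threshold)
      : inner_(std::move(inner)), threshold_(threshold) {
    assert(inner_ != nullptr);
    assert(threshold_ > 0.0f);
  }
  void update(std::vector<float>& value,
              std::vector<float>& grad) const override {
    for (float& g : grad) g = std::max(-threshold_, std::min(g, threshold_));
    inner_->update(value, grad);
  }

 private:
  std::unique_ptr<Optimizer> inner_;
  float threshold_;
};
```

Keeping the clipping in a wrapper means any inner optimizer can be combined with it without changing the inner optimizer's code.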
-
-
namespace
paddle
¶ -
class
AverageOptimizer
¶ Inherits from paddle::ParameterOptimizer
Subclassed by paddle::AverageSparseOptimizer
Public Functions
-
AverageOptimizer
(const OptimizationConfig &optConfig, ParameterOptimizer *optimizer, bool useParameterApply)¶
-
virtual void
init
(size_t numRows, const ParameterConfig *config)¶ For sparse updates, the optimizer can maintain numRows timers (t0), one per row. Some sparse optimizers need the parameter config in functions such as startBatch() and can obtain it here. Not every caller can pass a config, so the optimizer should check that config is not a null pointer before using it.
-
virtual void
startPass
()¶
-
virtual void
finishPass
()¶
-
virtual void
startBatch
(int64_t numSamplesProcessed)¶ called by Trainer before forward() of a batch.
-
virtual void
finishBatch
()¶ called by Trainer after backward() of a batch
-
virtual void
update
(const VectorPtr vecs[], const ParameterConfig &config, size_t sparseId) const¶ Called by the Trainer between startBatch() and finishBatch(), possibly many times; each call updates one Parameter using its gradient in PARAMETER_GRADIENT. sparseId is the row id: when it is set, the update is sparse and processes one row per call.
-
virtual ParameterOptimizer::TraverseCallback
needSpecialTraversal
(const ParameterConfig &config) const¶ This hook is useful for sparse updates, because traversing a whole block is costly. The Trainer calls it after update() and before finishBatch(), for example:
startBatch();
if (dense) {
  update(blockVec);
} else {  // sparse
  for (row : rows_in_block) { update(rowVec); }
}
auto callback = needSpecialTraversal();
if (callback) {
  // do the traversal, possibly multi-threaded
  if (dense) {
    callback();
  } else {  // sparse
    for (row : all_rows_in_block) { callback(); }
  }
}
finishBatch();
- Return
- A callback if a traversal is needed, otherwise nullptr. Calling this function must not change state.
-
virtual TraverseCallback
startCatchUpWith
() const¶ startCatchUpWith() and finishCatchUpWith() are hooks for catching up with the current time during sparse updates. Call startCatchUpWith() first and check its return value; call finishCatchUpWith() at the end to finalize state. The callback does the actual work and may be called many times for sparse data. For example, the Trainer calls them like this:
auto callback = startCatchUpWith();
if (callback) {
  // catch up, possibly multi-threaded
  if (dense) {
    callback();
  } else {  // sparse
    for (row : rows_in_block) { callback(); }
  }
  // finish catching up, on the main thread
  finishCatchUpWith();
}
- Return
- A callback if catch-up is needed, otherwise nullptr. Calling this function must not change state.
-
virtual void
finishCatchUpWith
()¶
-
virtual ParameterOptimizer::TraverseCallback
apply
()¶ apply() and restore() are two hooks used by the averager. apply() applies the averager's result to the final parameter value (PARAMETER_VALUE or PARAMETER_APPLY).
restore() restores the original value if apply() wrote to PARAMETER_VALUE. The caller must ensure the optimizer has caught up with the current time before calling apply().
Use the returned callback the same way as the callback returned by ParameterOptimizer::needSpecialTraversal().
-
virtual ParameterOptimizer::TraverseCallback
restore
()¶
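The apply()/restore() pairing can be pictured with the small sketch below: the live value is backed up, overwritten with a running average, and later put back. Averager and its members are hypothetical, the simple arithmetic mean is an assumed averaging rule, and the sketch covers only the case where the average is written over the live value (the PARAMETER_VALUE case).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical averager over one parameter vector; not Paddle's class.
class Averager {
 public:
  explicit Averager(std::size_t size) : sum_(size, 0.0f) {}

  // Record the current value, e.g. once after every update.
  void accumulate(const std::vector<float>& value) {
    assert(value.size() == sum_.size());
    for (std::size_t i = 0; i < value.size(); ++i) sum_[i] += value[i];
    ++count_;
  }

  // Back up the live value and overwrite it with the running average.
  void apply(std::vector<float>& value) {
    assert(value.size() == sum_.size());
    assert(count_ > 0);
    assert(!applied_);
    backup_ = value;
    for (std::size_t i = 0; i < value.size(); ++i) {
      value[i] = sum_[i] / static_cast<float>(count_);
    }
    applied_ = true;
  }

  // Put the backed-up value back so training can continue.
  void restore(std::vector<float>& value) {
    assert(applied_);
    value = backup_;
    applied_ = false;
  }

 private:
  std::vector<float> sum_;
  std::vector<float> backup_;
  std::int64_t count_ = 0;
  bool applied_ = false;
};
```

A typical use is to call apply() before evaluation or saving, and restore() before the next training batch.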
-
virtual void
setNoDecay
()¶
Public Static Functions
-
ParameterOptimizer *
create
(const OptimizationConfig &optConfig, ParameterOptimizer *optimizer, bool isParameterSparse = false, bool useParameterApply = false)¶
Protected Attributes
-
std::unique_ptr<ParameterOptimizer>
optimizer_
¶
-
bool
useApply_
¶
-
int64_t
numUpdates_
¶
-
int64_t
prevNumUpdates_
¶
-
int64_t
numAccumulates_
¶
-
int64_t
oldNumAccumulates_
¶
-
int64_t
minAverageWindow_
¶
-
int64_t
maxAverageWindow_
¶
Protected Static Attributes
-
const int64_t
kMaxNumAccumulates
¶
-
-
class
AverageSparseOptimizer
¶ Inherits from paddle::AverageOptimizer
Public Functions
-
AverageSparseOptimizer
(const OptimizationConfig &optConfig, ParameterOptimizer *optimizer, bool useParameterApply)¶
-
virtual void
init
(size_t numRows, const ParameterConfig *config)¶ For sparse updates, the optimizer can maintain numRows timers (t0), one per row. Some sparse optimizers need the parameter config in functions such as startBatch() and can obtain it here. Not every caller can pass a config, so the optimizer should check that config is not a null pointer before using it.
-
virtual void
finishBatch
()¶ called by Trainer after backward() of a batch
-
virtual void
update
(const VectorPtr vecs[], const ParameterConfig &config, size_t sparseId) const¶ Called by the Trainer between startBatch() and finishBatch(), possibly many times; each call updates one Parameter using its gradient in PARAMETER_GRADIENT. sparseId is the row id: when it is set, the update is sparse and processes one row per call.
-
void
catchUpWith
(const VectorPtr vecs[], const ParameterConfig &paraConfig, size_t sparseId) const¶
-
virtual ParameterOptimizer::TraverseCallback
startCatchUpWith
() const¶ startCatchUpWith() and finishCatchUpWith() are hooks for catching up with the current time during sparse updates. Call startCatchUpWith() first and check its return value; call finishCatchUpWith() at the end to finalize state. The callback does the actual work and may be called many times for sparse data. For example, the Trainer calls them like this:
auto callback = startCatchUpWith();
if (callback) {
  // catch up, possibly multi-threaded
  if (dense) {
    callback();
  } else {  // sparse
    for (row : rows_in_block) { callback(); }
  }
  // finish catching up, on the main thread
  finishCatchUpWith();
}
- Return
- A callback if catch-up is needed, otherwise nullptr. Calling this function must not change state.
-
virtual void
finishCatchUpWith
()¶
-
-
namespace
paddle
-
class
ParameterOptimizer
¶ - #include <ParameterOptimizer.h>
Some member functions are declared const for two reasons:
- Thread safety for sparse updates: update() and the traverse callback (which captures a const this) may be called many times, one row per call, and may be called in parallel by multiple workers to speed up large blocks.
- Predicate functions: needSpecialTraversal() and startCatchUpWith() may be called many times, so there must be no state change between calls.
Subclassed by paddle::AdaDeltaParameterOptimizer, paddle::AdagradParameterOptimizer, paddle::AdamaxParameterOptimizer, paddle::AdamParameterOptimizer, paddle::AddOptimizer, paddle::AverageOptimizer, paddle::DecayedAdagradParameterOptimizer, paddle::DummyOptimizer, paddle::OptimizerWithGradientClipping, paddle::OptimizerWithRegularizer, paddle::RMSPropParameterOptimizer, paddle::SgdOptimizer, paddle::SparseMomentumParameterOptimizer
Public Types
-
typedef std::function<void(const VectorPtr vecs[], const ParameterConfig &config, size_t sparseId)>
TraverseCallback
¶
Public Functions
-
ParameterOptimizer
(const OptimizationConfig &optConfig)¶
-
real
calcLearningRate
(int64_t numSamplesProcessed, int64_t pass)¶
-
virtual
~ParameterOptimizer
()¶
-
virtual void
init
(size_t numRows, const ParameterConfig *config)¶ For sparse updates, the optimizer can maintain numRows timers (t0), one per row. Some sparse optimizers need the parameter config in functions such as startBatch() and can obtain it here. Not every caller can pass a config, so the optimizer should check that config is not a null pointer before using it.
-
virtual void
startPass
()¶
-
virtual void
finishPass
()¶
-
virtual void
startBatch
(int64_t numSamplesProcessed)¶ called by Trainer before forward() of a batch.
-
virtual TraverseCallback
needSpecialTraversal
(const ParameterConfig &config) const¶ This hook is useful for sparse updates, because traversing a whole block is costly. The Trainer calls it after update() and before finishBatch(), for example:
startBatch();
if (dense) {
  update(blockVec);
} else {  // sparse
  for (row : rows_in_block) { update(rowVec); }
}
auto callback = needSpecialTraversal();
if (callback) {
  // do the traversal, possibly multi-threaded
  if (dense) {
    callback();
  } else {  // sparse
    for (row : all_rows_in_block) { callback(); }
  }
}
finishBatch();
- Return
- A callback if a traversal is needed, otherwise nullptr. Calling this function must not change state.
-
virtual void
finishBatch
()¶ called by Trainer after backward() of a batch
-
virtual void
update
(const VectorPtr vecs[], const ParameterConfig &config, size_t sparseId = -1LU) const = 0¶ Called by the Trainer between startBatch() and finishBatch(), possibly many times; each call updates one Parameter using its gradient in PARAMETER_GRADIENT. sparseId is the row id: when it is set, the update is sparse and processes one row per call.
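The role of the sparseId default can be sketched as follows: a sentinel value means "no row given, update the whole block", and any other value selects a single row. RowMajorParam, kNoSparseId and sgdUpdate are hypothetical names; the row-major layout and the plain SGD rule are assumptions made only to keep the example small.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sentinel mirroring the -1LU default in the signature above.
constexpr std::size_t kNoSparseId = static_cast<std::size_t>(-1);

// Hypothetical row-major parameter block.
struct RowMajorParam {
  std::size_t rows;
  std::size_t cols;
  std::vector<float> value;     // rows * cols entries
  std::vector<float> gradient;  // rows * cols entries
};

// Dense update when sparseId is the sentinel, otherwise update one row only.
void sgdUpdate(RowMajorParam& p, float learningRate,
               std::size_t sparseId = kNoSparseId) {
  assert(p.value.size() == p.rows * p.cols);
  assert(p.gradient.size() == p.value.size());
  std::size_t begin = 0;
  std::size_t end = p.value.size();
  if (sparseId != kNoSparseId) {
    assert(sparseId < p.rows);
    begin = sparseId * p.cols;
    end = begin + p.cols;
  }
  for (std::size_t i = begin; i < end; ++i) {
    p.value[i] -= learningRate * p.gradient[i];
  }
}
```

Because a sparse call touches only its own row, calls for different rows write to disjoint memory.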
-
virtual TraverseCallback
startCatchUpWith
() const¶ startCatchUpWith() and finishCatchUpWith() are hooks for catching up with the current time during sparse updates. Call startCatchUpWith() first and check its return value; call finishCatchUpWith() at the end to finalize state. The callback does the actual work and may be called many times for sparse data. For example, the Trainer calls them like this:
auto callback = startCatchUpWith();
if (callback) {
  // catch up, possibly multi-threaded
  if (dense) {
    callback();
  } else {  // sparse
    for (row : rows_in_block) { callback(); }
  }
  // finish catching up, on the main thread
  finishCatchUpWith();
}
- Return
- A callback if catch-up is needed, otherwise nullptr. Calling this function must not change state.
-
virtual void
finishCatchUpWith
()¶
-
virtual TraverseCallback
apply
()¶ apply() and restore() are two hooks used by the averager. apply() applies the averager's result to the final parameter value (PARAMETER_VALUE or PARAMETER_APPLY).
restore() restores the original value if apply() wrote to PARAMETER_VALUE. The caller must ensure the optimizer has caught up with the current time before calling apply().
Use the returned callback the same way as the callback returned by ParameterOptimizer::needSpecialTraversal().
-
virtual TraverseCallback
restore
()¶
-
const std::vector<ParameterType> &
getParameterTypes
() const¶ return the parameter types used by this updater
-
void
addParameterType
(ParameterType type)¶
-
real
getLearningRate
() const¶
-
virtual void
setNoDecay
()¶
Public Static Functions
-
ParameterOptimizer *
create
(const OptimizationConfig &optConfig, bool inPserver = false)¶
Protected Types
-
typedef std::vector<ParameterOptimizer::TraverseCallback>
TraverseCallbackVec
¶
Protected Attributes
-
bool
applyDecay_
¶
-
const OptimizationConfig &
optConfig_
¶
-
std::vector<ParameterType>
parameterTypes_
¶
-
real
learningRate_
¶ Global learning rate, initialized to opt_config.learning_rate. The sparse regularizer reads this value once per batch, after startBatch() has been called, so if the learning rate changes in startBatch(), assign the new value to learningRate_.
-
std::unique_ptr<LearningRateScheduler>
learningRateScheduler_
¶
-
int64_t
pass_
¶
-
bool
firstTime_
¶
Protected Static Functions
-
static TraverseCallback
composeCallbacks
(const TraverseCallbackVec &callbacks)¶
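One plausible way to combine several traverse callbacks into one is sketched below. It is an assumption about the intended behavior, not the library's code: Callback here takes only a row id instead of (vecs, config, sparseId), and dropping null entries and returning nullptr for an empty list are choices made in this sketch to match the "nullptr means nothing to do" convention used by the hooks above.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Simplified hypothetical callback type; the real one also receives
// the vectors and the parameter config.
using Callback = std::function<void(std::size_t sparseId)>;
using CallbackVec = std::vector<Callback>;

// Returns nullptr if nothing needs to run, the single callback if there is
// exactly one, and otherwise a callback that runs all of them in order.
Callback composeCallbacks(const CallbackVec& callbacks) {
  CallbackVec active;
  for (const auto& cb : callbacks) {
    if (cb) active.push_back(cb);  // null means "no traversal needed"
  }
  if (active.empty()) return nullptr;
  if (active.size() == 1) return active.front();
  return [active](std::size_t sparseId) {
    for (const auto& cb : active) {
      assert(cb);  // invariant: null entries were filtered out above
      cb(sparseId);
    }
  };
}
```

With this shape, a wrapper optimizer can collect its own callback and the wrapped optimizer's callback and hand the trainer a single one.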
-
namespace
paddle
-
class
OptimizerWithRegularizer
¶ Inherits from paddle::ParameterOptimizer
Subclassed by paddle::OptimizerWithRegularizerEveryNumBatches, paddle::OptimizerWithRegularizerSparse
Public Functions
-
OptimizerWithRegularizer
(const OptimizationConfig &optConfig, ParameterOptimizer *optimizer, Regularizer *regularizer)¶
-
virtual void
init
(size_t numRows, const ParameterConfig *config)¶ For sparse updates, the optimizer can maintain numRows timers (t0), one per row. Some sparse optimizers need the parameter config in functions such as startBatch() and can obtain it here. Not every caller can pass a config, so the optimizer should check that config is not a null pointer before using it.
-
virtual void
startPass
()¶
-
virtual void
finishPass
()¶
-
virtual void
startBatch
(int64_t numSamplesProcessed)¶ called by Trainer before forward() of a batch.
-
virtual void
finishBatch
()¶ called by Trainer after backward() of a batch
-
virtual TraverseCallback
needSpecialTraversal
(const ParameterConfig &config) const¶ This hook is useful for sparse updates, because traversing a whole block is costly. The Trainer calls it after update() and before finishBatch(), for example:
startBatch();
if (dense) {
  update(blockVec);
} else {  // sparse
  for (row : rows_in_block) { update(rowVec); }
}
auto callback = needSpecialTraversal();
if (callback) {
  // do the traversal, possibly multi-threaded
  if (dense) {
    callback();
  } else {  // sparse
    for (row : all_rows_in_block) { callback(); }
  }
}
finishBatch();
- Return
- A callback if a traversal is needed, otherwise nullptr. Calling this function must not change state.
-
virtual void
update
(const VectorPtr vecs[], const ParameterConfig &config, size_t sparseId) const¶ Called by the Trainer between startBatch() and finishBatch(), possibly many times; each call updates one Parameter using its gradient in PARAMETER_GRADIENT. sparseId is the row id: when it is set, the update is sparse and processes one row per call.
Public Static Functions
-
ParameterOptimizer *
create
(const OptimizationConfig &optConfig, const ParameterConfig &paraConfig, bool isParameterSparse, bool inPserver)¶
Protected Attributes
-
std::unique_ptr<ParameterOptimizer>
optimizer_
¶
-
Regularizer *
regularizer_
¶
-
int
timer_
¶ Counts batches and is cleared after catch-up. t (timer_) is the current time, and t0 (t0Vec_) holds the last time each row i occurred. If one block is updated by multiple threads, the caller should hash sparse ids to avoid write conflicts in t0Vec_.
-
-
class
OptimizerWithRegularizerEveryNumBatches
¶ Inherits from paddle::OptimizerWithRegularizer
Public Functions
-
OptimizerWithRegularizerEveryNumBatches
(const OptimizationConfig &optConfig, ParameterOptimizer *optimizer, Regularizer *regularizer)¶
-
virtual void
startPass
()¶
-
virtual void
update
(const VectorPtr vecs[], const ParameterConfig &config, size_t sparseId) const¶ Called by the Trainer between startBatch() and finishBatch(), possibly many times; each call updates one Parameter using its gradient in PARAMETER_GRADIENT. sparseId is the row id: when it is set, the update is sparse and processes one row per call.
-
virtual ParameterOptimizer::TraverseCallback
needSpecialTraversal
(const ParameterConfig &config) const¶ This hook is useful for sparse updates, because traversing a whole block is costly. The Trainer calls it after update() and before finishBatch(), for example:
startBatch();
if (dense) {
  update(blockVec);
} else {  // sparse
  for (row : rows_in_block) { update(rowVec); }
}
auto callback = needSpecialTraversal();
if (callback) {
  // do the traversal, possibly multi-threaded
  if (dense) {
    callback();
  } else {  // sparse
    for (row : all_rows_in_block) { callback(); }
  }
}
finishBatch();
- Return
- A callback if a traversal is needed, otherwise nullptr. Calling this function must not change state.
-
void
doTraversal
(const VectorPtr vecs[], const ParameterConfig &config) const¶
-
void
catchUpWith
(const VectorPtr vecs[], const ParameterConfig &config, size_t sparseId) const¶
-
virtual ParameterOptimizer::TraverseCallback
startCatchUpWith
() const¶ startCatchUpWith() and finishCatchUpWith() are hooks for catching up with the current time during sparse updates. Call startCatchUpWith() first and check its return value; call finishCatchUpWith() at the end to finalize state. The callback does the actual work and may be called many times for sparse data. For example, the Trainer calls them like this:
auto callback = startCatchUpWith();
if (callback) {
  // catch up, possibly multi-threaded
  if (dense) {
    callback();
  } else {  // sparse
    for (row : rows_in_block) { callback(); }
  }
  // finish catching up, on the main thread
  finishCatchUpWith();
}
- Return
- A callback if catch-up is needed, otherwise nullptr. Calling this function must not change state.
-
virtual void
finishCatchUpWith
()¶
Protected Functions
-
bool
isRegularizationBatch
(const ParameterConfig &config) const¶
Protected Attributes
-
int
baseTimer_
¶ Records the timer_ value at the time catchUpWith() is called.
-
-
class
OptimizerWithRegularizerSparse
¶ Inherits from paddle::OptimizerWithRegularizer
Public Functions
-
OptimizerWithRegularizerSparse
(const OptimizationConfig &optConfig, ParameterOptimizer *optimizer, Regularizer *regularizer)¶
-
virtual void
init
(size_t numRows, const ParameterConfig *config)¶ For sparse updates, the optimizer can maintain numRows timers (t0), one per row. Some sparse optimizers need the parameter config in functions such as startBatch() and can obtain it here. Not every caller can pass a config, so the optimizer should check that config is not a null pointer before using it.
-
virtual void
update
(const VectorPtr vecs[], const ParameterConfig &config, size_t sparseId) const¶ Called by the Trainer between startBatch() and finishBatch(), possibly many times; each call updates one Parameter using its gradient in PARAMETER_GRADIENT. sparseId is the row id: when it is set, the update is sparse and processes one row per call.
-
void
catchUpWith
(const VectorPtr vecs[], const ParameterConfig &config, size_t sparseId) const¶
-
virtual ParameterOptimizer::TraverseCallback
startCatchUpWith
() const¶ startCatchUpWith() and finishCatchUpWith() are hooks for catching up with the current time during sparse updates. Call startCatchUpWith() first and check its return value; call finishCatchUpWith() at the end to finalize state. The callback does the actual work and may be called many times for sparse data. For example, the Trainer calls them like this:
auto callback = startCatchUpWith();
if (callback) {
  // catch up, possibly multi-threaded
  if (dense) {
    callback();
  } else {  // sparse
    for (row : rows_in_block) { callback(); }
  }
  // finish catching up, on the main thread
  finishCatchUpWith();
}
- Return
- A callback if catch-up is needed, otherwise nullptr. Calling this function must not change state.
-
virtual void
finishCatchUpWith
()¶
Protected Attributes
-
std::vector<int32_t>
t0Vec_
¶ t0Vec_ holds the last time each row i occurred. If one block is updated by multiple threads, the caller should hash sparse ids to avoid write conflicts in t0Vec_.
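The idea of keeping a per-row "last seen" time and catching a row up lazily can be sketched as follows. This is an illustration under stated assumptions, not the library's rule: LazyDecayTable and catchUpRow are hypothetical, and the multiplicative factor (1 - learningRate * decayRate) per missed batch is one possible L2-style decay chosen for the example.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sparse table with a per-row timestamp, in the spirit of t0Vec_.
struct LazyDecayTable {
  std::size_t cols;
  std::vector<float> value;      // rows * cols entries, row-major
  std::vector<std::int32_t> t0;  // batch index at which each row was last current
};

// Bring one row up to date with batch index `timer` by applying, in one go,
// the decay it missed while it was not touched.
void catchUpRow(LazyDecayTable& table, std::size_t row, std::int32_t timer,
                float learningRate, float decayRate) {
  assert(row < table.t0.size());
  assert(table.value.size() == table.t0.size() * table.cols);
  assert(timer >= table.t0[row]);
  const std::int32_t missed = timer - table.t0[row];
  if (missed > 0) {
    const float factor = std::pow(1.0f - learningRate * decayRate,
                                  static_cast<float>(missed));
    for (std::size_t j = 0; j < table.cols; ++j) {
      table.value[row * table.cols + j] *= factor;
    }
  }
  table.t0[row] = timer;  // this row is now current
}
```

Each call writes only the values and the timestamp of its own row, which is why assigning rows to threads by hashing the sparse id avoids write conflicts.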
-