Optimizer

namespace paddle
class SgdOptimizer

Inherits from paddle::ParameterOptimizer

Public Functions

SgdOptimizer(const OptimizationConfig &optConfig)
virtual void startBatch(int64_t numSamplesProcessed)

Called by the Trainer before forward() of a batch.

virtual void update(const VectorPtr vecs[], const ParameterConfig &config, size_t sparseId) const

Between startBatch() and finishBatch(), update() will be called by the trainer multiple times, each time updating one Parameter with its gradient in PARAMETER_GRADIENT. sparseId is the row id; when sparseId is set, the update is sparse and handles one row per call.

virtual void finishBatch()

Called by the Trainer after backward() of a batch.
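
Taken together, startBatch(), update(), and finishBatch() form the per-batch protocol. Below is a hypothetical trainer loop for one dense parameter; optimizer, numSamples, vecs, and paramConfig are placeholder names, not part of this API:

optimizer->startBatch(numSamples);     // before forward()
// ... forward() and backward() fill PARAMETER_GRADIENT in vecs ...
optimizer->update(vecs, paramConfig);  // once per Parameter; sparseId omitted => dense update
optimizer->finishBatch();              // after backward()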

class SparseMomentumParameterOptimizer

Inherits from paddle::ParameterOptimizer

Public Functions

SparseMomentumParameterOptimizer(const OptimizationConfig &optConfig)
virtual void init(size_t numRows, const ParameterConfig *config)

For sparse updates, the optimizer can maintain numRows timers (t0). Some sparse optimizers depend on the parameter config in functions such as startBatch(); the optimizer can obtain it here. Note, however, that not all callers can pass the config here, so the optimizer should check that the config passed in is not a null pointer.

virtual void startBatch(int64_t numSamplesProcessed)

Called by the Trainer before forward() of a batch.

virtual void update(const VectorPtr vecs[], const ParameterConfig &config, size_t sparseId) const

Between startBatch() and finishBatch(), update() will be called by the trainer multiple times, each time updating one Parameter with its gradient in PARAMETER_GRADIENT. sparseId is the row id; when sparseId is set, the update is sparse and handles one row per call.

virtual ParameterOptimizer::TraverseCallback needSpecialTraversal(const ParameterConfig &config) const

The following hook is useful for sparse updates, because traversing a whole block is costly. It is called by the Trainer after update() and before finishBatch(). For example, the Trainer calls it like this:

startBatch();
if (dense) {
  update(blockVec);
} else {//sparse
  for (row : rows_in_block) {update(rowVec)}
}
auto callback = needSpecialTraversal();
if (callback) {
  // do traverse, maybe multi-thread
  if (dense) {
    callback();
  } else {//sparse
    for (row : all_rows_in_block) {callback();}
  }
}
finishBatch();

Return
A callback if traversal is needed, otherwise nullptr. It must cause no state change.

virtual void finishBatch()

Called by the Trainer after backward() of a batch.

Protected Attributes

int64_t timer_
std::vector<int64_t> t0Vec_
bool isParameterSparse_

Private Members

real alpha_
real beta_
real tau_
real gamma_
real threshold_
real momentum_
real decayRate_
class AdagradParameterOptimizer

Inherits from paddle::ParameterOptimizer

Public Functions

AdagradParameterOptimizer(const OptimizationConfig &optConfig)
virtual void startBatch(int64_t numSamplesProcessed)

Called by the Trainer before forward() of a batch.

virtual void update(const VectorPtr vecs[], const ParameterConfig &config, size_t sparseId) const

Between startBatch() and finishBatch(), update() will be called by the trainer multiple times, each time updating one Parameter with its gradient in PARAMETER_GRADIENT. sparseId is the row id; when sparseId is set, the update is sparse and handles one row per call.

virtual ParameterOptimizer::TraverseCallback needSpecialTraversal(const ParameterConfig &config) const

The following hook is useful for sparse updates, because traversing a whole block is costly. It is called by the Trainer after update() and before finishBatch(). For example, the Trainer calls it like this:

startBatch();
if (dense) {
  update(blockVec);
} else {//sparse
  for (row : rows_in_block) {update(rowVec)}
}
auto callback = needSpecialTraversal();
if (callback) {
  // do traverse, maybe multi-thread
  if (dense) {
    callback();
  } else {//sparse
    for (row : all_rows_in_block) {callback();}
  }
}
finishBatch();

Return
A callback if traversal is needed, otherwise nullptr. It must cause no state change.

Protected Attributes

int64_t numUpdates_

Protected Static Attributes

const int64_t kMaxNumAccumulates
class AdaDeltaParameterOptimizer

Inherits from paddle::ParameterOptimizer

Public Functions

AdaDeltaParameterOptimizer(const OptimizationConfig &optConfig)
virtual void startBatch(int64_t numSamplesProcessed)

Called by the Trainer before forward() of a batch.

virtual void update(const VectorPtr vecs[], const ParameterConfig &config, size_t sparseId) const

Between startBatch() and finishBatch(), update() will be called by the trainer multiple times, each time updating one Parameter with its gradient in PARAMETER_GRADIENT. sparseId is the row id; when sparseId is set, the update is sparse and handles one row per call.

Protected Attributes

real rou_
real epsilon_
class RMSPropParameterOptimizer

Inherits from paddle::ParameterOptimizer

Public Functions

RMSPropParameterOptimizer(const OptimizationConfig &optConfig)
virtual void init(size_t numRows, const ParameterConfig *config)

For sparse updates, the optimizer can maintain numRows timers (t0). Some sparse optimizers depend on the parameter config in functions such as startBatch(); the optimizer can obtain it here. Note, however, that not all callers can pass the config here, so the optimizer should check that the config passed in is not a null pointer.

virtual void startBatch(int64_t numSamplesProcessed)

Called by the Trainer before forward() of a batch.

virtual void finishBatch()

Called by the Trainer after backward() of a batch.

virtual void update(const VectorPtr vecs[], const ParameterConfig &config, size_t sparseId) const

Between startBatch() and finishBatch(), update() will be called by the trainer multiple times, each time updating one Parameter with its gradient in PARAMETER_GRADIENT. sparseId is the row id; when sparseId is set, the update is sparse and handles one row per call.

Protected Attributes

real rou_
real epsilon_
int64_t timer_

Counts batches; it does not need to catch up. t (timer_) is the current time, and t0 (t0Vec_) holds the last update time of row i. If one block is updated by multiple threads, the caller should hash the sparse ids to avoid write conflicts in t0Vec_.

std::vector<int64_t> t0Vec_
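
The timer_/t0Vec_ bookkeeping above can be pictured as follows. This is a conceptual sketch only (row is a hypothetical variable); the exact catch-up arithmetic is optimizer-specific:

int64_t skipped = timer_ - t0Vec_[row];  // batches since this row was last updated
// ... account for the `skipped` batches (e.g. accumulated decay) before the new update ...
t0Vec_[row] = timer_;                    // the row is now caught up with the current time
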
class DecayedAdagradParameterOptimizer

Inherits from paddle::ParameterOptimizer

Public Functions

DecayedAdagradParameterOptimizer(const OptimizationConfig &optConfig)
virtual void init(size_t numRows, const ParameterConfig *config)

For sparse updates, the optimizer can maintain numRows timers (t0). Some sparse optimizers depend on the parameter config in functions such as startBatch(); the optimizer can obtain it here. Note, however, that not all callers can pass the config here, so the optimizer should check that the config passed in is not a null pointer.

virtual void startBatch(int64_t numSamplesProcessed)

Called by the Trainer before forward() of a batch.

virtual void finishBatch()

Called by the Trainer after backward() of a batch.

virtual void update(const VectorPtr vecs[], const ParameterConfig &config, size_t sparseId) const

Between startBatch() and finishBatch(), update() will be called by the trainer multiple times, each time updating one Parameter with its gradient in PARAMETER_GRADIENT. sparseId is the row id; when sparseId is set, the update is sparse and handles one row per call.

Protected Attributes

real rou_
real epsilon_
int64_t timer_

Counts batches; it does not need to catch up. t (timer_) is the current time, and t0 (t0Vec_) holds the last update time of row i. If one block is updated by multiple threads, the caller should hash the sparse ids to avoid write conflicts in t0Vec_.

std::vector<int64_t> t0Vec_
class AdamParameterOptimizer
#include <FirstOrderOptimizer.h>

Adam Optimizer. Reference Paper: http://arxiv.org/abs/1412.6980 Algorithm 1

Inherits from paddle::ParameterOptimizer

Public Functions

AdamParameterOptimizer(const OptimizationConfig &optConfig)
virtual void finishBatch()

Called by the Trainer after backward() of a batch.

virtual void update(const VectorPtr vecs[], const ParameterConfig &config, size_t sparseId) const

Between startBatch() and finishBatch(), update() will be called by the trainer multiple times, each time updating one Parameter with its gradient in PARAMETER_GRADIENT. sparseId is the row id; when sparseId is set, the update is sparse and handles one row per call.

Protected Attributes

real beta1_
real beta2_
real epsilon_
int64_t step_
real learningRate_
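
The referenced Algorithm 1 can be summarized element-wise using this class's member names. This is a minimal sketch only; the actual update() operates on whole vectors in vecs[] rather than scalars, and float stands in for Paddle's real typedef.

#include <cmath>
#include <cstdint>

// Element-wise Adam step (Algorithm 1 of the paper referenced above).
// g: gradient; m, v: first and second moment estimates kept per element.
inline void adamStep(float g, float &m, float &v, float &value,
                     float beta1, float beta2, float epsilon,
                     int64_t step, float learningRate) {
  m = beta1 * m + (1 - beta1) * g;                      // biased first moment
  v = beta2 * v + (1 - beta2) * g * g;                  // biased second moment
  float mHat = m / (1 - std::pow(beta1, (float)step));  // bias correction
  float vHat = v / (1 - std::pow(beta2, (float)step));
  value -= learningRate * mHat / (std::sqrt(vHat) + epsilon);
}
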
class AdamaxParameterOptimizer
#include <FirstOrderOptimizer.h>

AdaMax Optimizer. Reference Paper: http://arxiv.org/abs/1412.6980 Algorithm 2

Inherits from paddle::ParameterOptimizer

Public Functions

AdamaxParameterOptimizer(const OptimizationConfig &optConfig)
virtual void finishBatch()

Called by the Trainer after backward() of a batch.

virtual void update(const VectorPtr vecs[], const ParameterConfig &config, size_t sparseId) const

Between startBatch() and finishBatch(), update() will be called by the trainer multiple times, each time updating one Parameter with its gradient in PARAMETER_GRADIENT. sparseId is the row id; when sparseId is set, the update is sparse and handles one row per call.

Protected Attributes

real beta1_
real beta2_
int64_t step_
real learningRate_
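
Likewise, the referenced Algorithm 2 can be summarized element-wise with this class's member names; a sketch only, with float standing in for Paddle's real typedef.

#include <algorithm>
#include <cmath>
#include <cstdint>

// Element-wise AdaMax step (Algorithm 2 of the paper referenced above).
// g: gradient; m: first moment estimate; u: infinity-norm accumulator.
inline void adamaxStep(float g, float &m, float &u, float &value,
                       float beta1, float beta2, int64_t step,
                       float learningRate) {
  m = beta1 * m + (1 - beta1) * g;
  u = std::max(beta2 * u, std::fabs(g));
  value -= (learningRate / (1 - std::pow(beta1, (float)step))) * m / u;
}
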
class AddOptimizer

Inherits from paddle::ParameterOptimizer

Public Functions

AddOptimizer(const OptimizationConfig &optConfig)
virtual void startBatch(int64_t numSamplesProcessed)

Called by the Trainer before forward() of a batch.

virtual void update(const VectorPtr vecs[], const ParameterConfig &config, size_t sparseId) const

Between startBatch() and finishBatch(), update() will be called by the trainer multiple times, each time updating one Parameter with its gradient in PARAMETER_GRADIENT. sparseId is the row id; when sparseId is set, the update is sparse and handles one row per call.

class DummyOptimizer

Inherits from paddle::ParameterOptimizer

Public Functions

DummyOptimizer(const OptimizationConfig &optConfig)
virtual void update(const VectorPtr vecs[], const ParameterConfig &config, size_t sparseId) const

Between startBatch() and finishBatch(), update() will be called by the trainer multiple times, each time updating one Parameter with its gradient in PARAMETER_GRADIENT. sparseId is the row id; when sparseId is set, the update is sparse and handles one row per call.

class OptimizerWithGradientClipping

Inherits from paddle::ParameterOptimizer

Public Functions

OptimizerWithGradientClipping(const OptimizationConfig &optConfig, ParameterOptimizer *optimizer)
virtual void init(size_t numRows, const ParameterConfig *config)

For sparse updates, the optimizer can maintain numRows timers (t0). Some sparse optimizers depend on the parameter config in functions such as startBatch(); the optimizer can obtain it here. Note, however, that not all callers can pass the config here, so the optimizer should check that the config passed in is not a null pointer.

virtual void startPass()
virtual void finishPass()
virtual void startBatch(int64_t numSamplesProcessed)

Called by the Trainer before forward() of a batch.

virtual void finishBatch()

Called by the Trainer after backward() of a batch.

virtual TraverseCallback needSpecialTraversal(const ParameterConfig &config) const

The following hook is useful for sparse updates, because traversing a whole block is costly. It is called by the Trainer after update() and before finishBatch(). For example, the Trainer calls it like this:

startBatch();
if (dense) {
  update(blockVec);
} else {//sparse
  for (row : rows_in_block) {update(rowVec)}
}
auto callback = needSpecialTraversal();
if (callback) {
  // do traverse, maybe multi-thread
  if (dense) {
    callback();
  } else {//sparse
    for (row : all_rows_in_block) {callback();}
  }
}
finishBatch();

Return
A callback if traversal is needed, otherwise nullptr. It must cause no state change.

virtual void update(const VectorPtr vecs[], const ParameterConfig &config, size_t sparseId) const

Between startBatch() and finishBatch(), update() will be called by the trainer multiple times, each time updating one Parameter with its gradient in PARAMETER_GRADIENT. sparseId is the row id; when sparseId is set, the update is sparse and handles one row per call.

virtual void setNoDecay()

Protected Attributes

std::unique_ptr<ParameterOptimizer> optimizer_
namespace paddle
class AverageOptimizer

Inherits from paddle::ParameterOptimizer

Subclassed by paddle::AverageSparseOptimizer

Public Functions

AverageOptimizer(const OptimizationConfig &optConfig, ParameterOptimizer *optimizer, bool useParameterApply)
virtual void init(size_t numRows, const ParameterConfig *config)

For sparse updates, the optimizer can maintain numRows timers (t0). Some sparse optimizers depend on the parameter config in functions such as startBatch(); the optimizer can obtain it here. Note, however, that not all callers can pass the config here, so the optimizer should check that the config passed in is not a null pointer.

virtual void startPass()
virtual void finishPass()
virtual void startBatch(int64_t numSamplesProcessed)

Called by the Trainer before forward() of a batch.

virtual void finishBatch()

Called by the Trainer after backward() of a batch.

virtual void update(const VectorPtr vecs[], const ParameterConfig &config, size_t sparseId) const

Between startBatch() and finishBatch(), update() will be called by the trainer multiple times, each time updating one Parameter with its gradient in PARAMETER_GRADIENT. sparseId is the row id; when sparseId is set, the update is sparse and handles one row per call.

virtual ParameterOptimizer::TraverseCallback needSpecialTraversal(const ParameterConfig &config) const

The following hook is useful for sparse updates, because traversing a whole block is costly. It is called by the Trainer after update() and before finishBatch(). For example, the Trainer calls it like this:

startBatch();
if (dense) {
  update(blockVec);
} else {//sparse
  for (row : rows_in_block) {update(rowVec)}
}
auto callback = needSpecialTraversal();
if (callback) {
  // do traverse, maybe multi-thread
  if (dense) {
    callback();
  } else {//sparse
    for (row : all_rows_in_block) {callback();}
  }
}
finishBatch();

Return
A callback if traversal is needed, otherwise nullptr. It must cause no state change.

virtual TraverseCallback startCatchUpWith() const

The following hooks catch up with the current time for sparse updates. At the beginning, call startCatchUpWith() and check its return value; at the end, call finishCatchUpWith() to finalize the state. The callback does the actual work and can be called many times for sparse data. For example, the Trainer calls it like this:

auto callback = startCatchUpWith();
if (callback) {
  // do catch up with, maybe multi-thread
  if (dense) {
    callback();
  } else {//sparse
    for (row : rows_in_block) {callback();}
  }
  // finish catch up with, main thread
  finishCatchUpWith();
}

Return
A callback if catching up is needed, otherwise nullptr. It must cause no state change.

virtual void finishCatchUpWith()
virtual ParameterOptimizer::TraverseCallback apply()

The following two hooks are used by the averager and apply to the final parameter value (PARAMETER_VALUE or PARAMETER_APPLY).

restore() will restore the original value if apply() wrote to PARAMETER_VALUE. The caller must ensure the parameter is caught up with the current time before calling apply().

Use the returned callback the same way as the callback returned by ParameterOptimizer::needSpecialTraversal().

virtual ParameterOptimizer::TraverseCallback restore()
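
A hypothetical use of these two hooks, assuming the parameter has already been caught up (startCatchUpWith()/finishCatchUpWith() have run); optimizer, vecs, and config are placeholder names:

if (auto applyCb = optimizer->apply()) {
  applyCb(vecs, config, -1LU);    // write the averaged value for evaluation
}
// ... evaluate with the applied value (PARAMETER_VALUE or PARAMETER_APPLY) ...
if (auto restoreCb = optimizer->restore()) {
  restoreCb(vecs, config, -1LU);  // put the original value back
}
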
virtual void setNoDecay()

Public Static Functions

ParameterOptimizer *create(const OptimizationConfig &optConfig, ParameterOptimizer *optimizer, bool isParameterSparse = false, bool useParameterApply = false)
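
A hypothetical composition using the factories declared in this file, assuming optConfig is already populated; the wrappers hold the inner optimizer in a std::unique_ptr (see the protected attributes below):

ParameterOptimizer *base = ParameterOptimizer::create(optConfig);  // plain optimizer
ParameterOptimizer *averaged = AverageOptimizer::create(
    optConfig, base, /*isParameterSparse=*/false, /*useParameterApply=*/true);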

Protected Functions

void updateAverageWindowLimit()
bool isAverageWindowTooLong() const

Protected Attributes

std::unique_ptr<ParameterOptimizer> optimizer_
bool useApply_
int64_t numUpdates_
int64_t prevNumUpdates_
int64_t numAccumulates_
int64_t oldNumAccumulates_
int64_t minAverageWindow_
int64_t maxAverageWindow_

Protected Static Attributes

const int64_t kMaxNumAccumulates
class AverageSparseOptimizer

Inherits from paddle::AverageOptimizer

Public Functions

AverageSparseOptimizer(const OptimizationConfig &optConfig, ParameterOptimizer *optimizer, bool useParameterApply)
virtual void init(size_t numRows, const ParameterConfig *config)

For sparse updates, the optimizer can maintain numRows timers (t0). Some sparse optimizers depend on the parameter config in functions such as startBatch(); the optimizer can obtain it here. Note, however, that not all callers can pass the config here, so the optimizer should check that the config passed in is not a null pointer.

virtual void finishBatch()

Called by the Trainer after backward() of a batch.

virtual void update(const VectorPtr vecs[], const ParameterConfig &config, size_t sparseId) const

Between startBatch() and finishBatch(), update() will be called by the trainer multiple times, each time updating one Parameter with its gradient in PARAMETER_GRADIENT. sparseId is the row id; when sparseId is set, the update is sparse and handles one row per call.

void catchUpWith(const VectorPtr vecs[], const ParameterConfig &paraConfig, size_t sparseId) const
virtual ParameterOptimizer::TraverseCallback startCatchUpWith() const

The following hooks catch up with the current time for sparse updates. At the beginning, call startCatchUpWith() and check its return value; at the end, call finishCatchUpWith() to finalize the state. The callback does the actual work and can be called many times for sparse data. For example, the Trainer calls it like this:

auto callback = startCatchUpWith();
if (callback) {
  // do catch up with, maybe multi-thread
  if (dense) {
    callback();
  } else {//sparse
    for (row : rows_in_block) {callback();}
  }
  // finish catch up with, main thread
  finishCatchUpWith();
}

Return
A callback if catching up is needed, otherwise nullptr. It must cause no state change.

virtual void finishCatchUpWith()

Protected Attributes

int timer_

Counts batches and is cleared after catching up. t (timer_) is the current time, and t0 (t0Vec_) holds the last update time of row i. If one block is updated by multiple threads, the caller should hash the sparse ids to avoid write conflicts in t0Vec_.

std::vector<int32_t> t0Vec_
namespace paddle
class ParameterOptimizer
#include <ParameterOptimizer.h>

Some member functions are declared const for two reasons:

  1. Thread safety for sparse updates: update() and the traverse callbacks (const this) may be called many times, each time for one row, and these functions can be called in parallel by multiple workers to speed up large blocks (see the sketch after this list).
  2. Predicate functions: needSpecialTraversal() and startCatchUpWith() may be called many times and must cause no state change between calls.
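
A conceptual sketch of point 1: partition sparse rows across worker threads by row id so that no two workers touch the same per-row state (for example t0Vec_). The names rows, numThreads, optimizer, vecs, and config are placeholders for illustration.

#include <thread>
#include <vector>

// Each worker handles only the rows whose id hashes to it, so per-row state
// is written by exactly one thread and update() can run in parallel safely.
std::vector<std::thread> workers;
for (size_t tid = 0; tid < numThreads; ++tid) {
  workers.emplace_back([&, tid] {
    for (size_t row : rows) {
      if (row % numThreads == tid) {
        optimizer->update(vecs, config, row);  // or a traverse callback
      }
    }
  });
}
for (auto &w : workers) w.join();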

Subclassed by paddle::AdaDeltaParameterOptimizer, paddle::AdagradParameterOptimizer, paddle::AdamaxParameterOptimizer, paddle::AdamParameterOptimizer, paddle::AddOptimizer, paddle::AverageOptimizer, paddle::DecayedAdagradParameterOptimizer, paddle::DummyOptimizer, paddle::OptimizerWithGradientClipping, paddle::OptimizerWithRegularizer, paddle::RMSPropParameterOptimizer, paddle::SgdOptimizer, paddle::SparseMomentumParameterOptimizer

Public Types

typedef std::function<void(const VectorPtr vecs[], const ParameterConfig &config, size_t sparseId)> TraverseCallback

Public Functions

ParameterOptimizer(const OptimizationConfig &optConfig)
real calcLearningRate(int64_t numSamplesProcessed, int64_t pass)
virtual ~ParameterOptimizer()
virtual void init(size_t numRows, const ParameterConfig *config)

For sparse updates, the optimizer can maintain numRows timers (t0). Some sparse optimizers depend on the parameter config in functions such as startBatch(); the optimizer can obtain it here. Note, however, that not all callers can pass the config here, so the optimizer should check that the config passed in is not a null pointer.
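
A minimal sketch of how a sparse optimizer subclass might override init() following the note above; MySparseOptimizer and its t0Vec_ member are hypothetical here:

void MySparseOptimizer::init(size_t numRows, const ParameterConfig *config) {
  t0Vec_.assign(numRows, 0);  // one "last updated" timestamp per row
  if (config != nullptr) {
    // Only some callers pass the config; cache whatever startBatch() will need here.
  }
}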

virtual void startPass()
virtual void finishPass()
virtual void startBatch(int64_t numSamplesProcessed)

Called by the Trainer before forward() of a batch.

virtual TraverseCallback needSpecialTraversal(const ParameterConfig &config) const

The following hook is useful for sparse updates, because traversing a whole block is costly. It is called by the Trainer after update() and before finishBatch(). For example, the Trainer calls it like this:

startBatch();
if (dense) {
  update(blockVec);
} else {//sparse
  for (row : rows_in_block) {update(rowVec)}
}
auto callback = needSpecialTraversal();
if (callback) {
  // do traverse, maybe multi-thread
  if (dense) {
    callback();
  } else {//sparse
    for (row : all_rows_in_block) {callback();}
  }
}
finishBatch();

Return
A callback if traversal is needed, otherwise nullptr. It must cause no state change.

virtual void finishBatch()

Called by the Trainer after backward() of a batch.

virtual void update(const VectorPtr vecs[], const ParameterConfig &config, size_t sparseId = -1LU) const = 0

Between startBatch() and finishBatch(), update() will be called by the trainer multiple times, each time updating one Parameter with its gradient in PARAMETER_GRADIENT. sparseId is the row id; when sparseId is set, the update is sparse and handles one row per call.

virtual TraverseCallback startCatchUpWith() const

The following hooks catch up with the current time for sparse updates. At the beginning, call startCatchUpWith() and check its return value; at the end, call finishCatchUpWith() to finalize the state. The callback does the actual work and can be called many times for sparse data. For example, the Trainer calls it like this:

auto callback = startCatchUpWith();
if (callback) {
  // do catch up with, maybe multi-thread
  if (dense) {
    callback();
  } else {//sparse
    for (row : rows_in_block) {callback();}
  }
  // finish catch up with, main thread
  finishCatchUpWith();
}

Return
A callback if catching up is needed, otherwise nullptr. It must cause no state change.

virtual void finishCatchUpWith()
virtual TraverseCallback apply()

The following two hooks are used by the averager and apply to the final parameter value (PARAMETER_VALUE or PARAMETER_APPLY).

restore() will restore the original value if apply() wrote to PARAMETER_VALUE. The caller must ensure the parameter is caught up with the current time before calling apply().

Use the returned callback the same way as the callback returned by ParameterOptimizer::needSpecialTraversal().

virtual TraverseCallback restore()
const std::vector<ParameterType> &getParameterTypes() const

Returns the parameter types used by this updater.

void addParameterType(ParameterType type)
real getLearningRate() const
virtual void setNoDecay()

Public Static Functions

ParameterOptimizer *create(const OptimizationConfig &optConfig, bool inPserver = false)

Protected Types

typedef std::vector<ParameterOptimizer::TraverseCallback> TraverseCallbackVec

Protected Attributes

bool applyDecay_
const OptimizationConfig &optConfig_
std::vector<ParameterType> parameterTypes_
real learningRate_

Global learning rate; its initial value is opt_config.learning_rate. The sparse regularizer reads this value each batch, after startBatch() has been called, so if the learning rate changes in startBatch(), assign the new value to learningRate_.
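
A sketch of the convention described above; MyOptimizer is hypothetical, while calcLearningRate() and pass_ are members declared in this class:

void MyOptimizer::startBatch(int64_t numSamplesProcessed) {
  // Publish the per-batch learning rate through learningRate_ so the sparse
  // regularizer sees the value actually used for this batch.
  learningRate_ = calcLearningRate(numSamplesProcessed, pass_);
}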

std::unique_ptr<LearningRateScheduler> learningRateScheduler_
int64_t pass_
bool firstTime_

Protected Static Functions

static TraverseCallback composeCallbacks(const TraverseCallbackVec &callbacks)
namespace paddle
class OptimizerWithRegularizer

Inherits from paddle::ParameterOptimizer

Subclassed by paddle::OptimizerWithRegularizerEveryNumBatches, paddle::OptimizerWithRegularizerSparse

Public Functions

OptimizerWithRegularizer(const OptimizationConfig &optConfig, ParameterOptimizer *optimizer, Regularizer *regularizer)
virtual void init(size_t numRows, const ParameterConfig *config)

For sparse updates, the optimizer can maintain numRows timers (t0). Some sparse optimizers depend on the parameter config in functions such as startBatch(); the optimizer can obtain it here. Note, however, that not all callers can pass the config here, so the optimizer should check that the config passed in is not a null pointer.

virtual void startPass()
virtual void finishPass()
virtual void startBatch(int64_t numSamplesProcessed)

Called by the Trainer before forward() of a batch.

virtual void finishBatch()

Called by the Trainer after backward() of a batch.

virtual TraverseCallback needSpecialTraversal(const ParameterConfig &config) const

The following hook is useful for sparse updates, because traversing a whole block is costly. It is called by the Trainer after update() and before finishBatch(). For example, the Trainer calls it like this:

startBatch();
if (dense) {
  update(blockVec);
} else {//sparse
  for (row : rows_in_block) {update(rowVec)}
}
auto callback = needSpecialTraversal();
if (callback) {
  // do traverse, maybe multi-thread
  if (dense) {
    callback();
  } else {//sparse
    for (row : all_rows_in_block) {callback();}
  }
}
finishBatch();

Return
A callback if traversal is needed, otherwise nullptr. It must cause no state change.

virtual void update(const VectorPtr vecs[], const ParameterConfig &config, size_t sparseId) const

Between startBatch() and finishBatch(), update() will be called by the trainer multiple times, each time updating one Parameter with its gradient in PARAMETER_GRADIENT. sparseId is the row id; when sparseId is set, the update is sparse and handles one row per call.

Public Static Functions

ParameterOptimizer *create(const OptimizationConfig &optConfig, const ParameterConfig &paraConfig, bool isParameterSparse, bool inPserver)

Protected Attributes

std::unique_ptr<ParameterOptimizer> optimizer_
Regularizer *regularizer_
int timer_

Counts batches and is cleared after catching up. t (timer_) is the current time, and t0 (t0Vec_) holds the last update time of row i. If one block is updated by multiple threads, the caller should hash the sparse ids to avoid write conflicts in t0Vec_.

class OptimizerWithRegularizerEveryNumBatches

Inherits from paddle::OptimizerWithRegularizer

Public Functions

OptimizerWithRegularizerEveryNumBatches(const OptimizationConfig &optConfig, ParameterOptimizer *optimizer, Regularizer *regularizer)
virtual void startPass()
virtual void update(const VectorPtr vecs[], const ParameterConfig &config, size_t sparseId) const

Between startBatch() and finishBatch(), update() will be called by the trainer multiple times, each time updating one Parameter with its gradient in PARAMETER_GRADIENT. sparseId is the row id; when sparseId is set, the update is sparse and handles one row per call.

virtual ParameterOptimizer::TraverseCallback needSpecialTraversal(const ParameterConfig &config) const

The following hook is useful for sparse updates, because traversing a whole block is costly. It is called by the Trainer after update() and before finishBatch(). For example, the Trainer calls it like this:

startBatch();
if (dense) {
  update(blockVec);
} else {//sparse
  for (row : rows_in_block) {update(rowVec)}
}
auto callback = needSpecialTraversal();
if (callback) {
  // do traverse, maybe multi-thread
  if (dense) {
    callback();
  } else {//sparse
    for (row : all_rows_in_block) {callback();}
  }
}
finishBatch();

Return
A callback if traversal is needed, otherwise nullptr. It must cause no state change.

void doTraversal(const VectorPtr vecs[], const ParameterConfig &config) const
void catchUpWith(const VectorPtr vecs[], const ParameterConfig &config, size_t sparseId) const
virtual ParameterOptimizer::TraverseCallback startCatchUpWith() const

The following hooks catch up with the current time for sparse updates. At the beginning, call startCatchUpWith() and check its return value; at the end, call finishCatchUpWith() to finalize the state. The callback does the actual work and can be called many times for sparse data. For example, the Trainer calls it like this:

auto callback = startCatchUpWith();
if (callback) {
  // do catch up with, maybe multi-thread
  if (dense) {
    callback();
  } else {//sparse
    for (row : rows_in_block) {callback();}
  }
  // finish catch up with, main thread
  finishCatchUpWith();
}

Return
A callback if catching up is needed, otherwise nullptr. It must cause no state change.

virtual void finishCatchUpWith()

Protected Functions

bool isRegularizationBatch(const ParameterConfig &config) const

Protected Attributes

int baseTimer_

Records the timer_ value at the time catchUpWith() was called.

class OptimizerWithRegularizerSparse

Inherits from paddle::OptimizerWithRegularizer

Public Functions

OptimizerWithRegularizerSparse(const OptimizationConfig &optConfig, ParameterOptimizer *optimizer, Regularizer *regularizer)
virtual void init(size_t numRows, const ParameterConfig *config)

For sparse updates, the optimizer can maintain numRows timers (t0). Some sparse optimizers depend on the parameter config in functions such as startBatch(); the optimizer can obtain it here. Note, however, that not all callers can pass the config here, so the optimizer should check that the config passed in is not a null pointer.

virtual void update(const VectorPtr vecs[], const ParameterConfig &config, size_t sparseId) const

Between startBatch() and finishBatch(), update() will be called by the trainer multiple times, each time updating one Parameter with its gradient in PARAMETER_GRADIENT. sparseId is the row id; when sparseId is set, the update is sparse and handles one row per call.

void catchUpWith(const VectorPtr vecs[], const ParameterConfig &config, size_t sparseId) const
virtual ParameterOptimizer::TraverseCallback startCatchUpWith() const

The following hooks catch up with the current time for sparse updates. At the beginning, call startCatchUpWith() and check its return value; at the end, call finishCatchUpWith() to finalize the state. The callback does the actual work and can be called many times for sparse data. For example, the Trainer calls it like this:

auto callback = startCatchUpWith();
if (callback) {
  // do catch up with, maybe multi-thread
  if (dense) {
    callback();
  } else {//sparse
    for (row : rows_in_block) {callback();}
  }
  // finish catch up with, main thread
  finishCatchUpWith();
}

Return
A callback if catching up is needed, otherwise nullptr. It must cause no state change.

virtual void finishCatchUpWith()

Protected Attributes

std::vector<int32_t> t0Vec_

t0Vec_ holds the last update time of row i. If one block is updated by multiple threads, the caller should hash the sparse ids to avoid write conflicts in t0Vec_.