Optimizer
namespace paddle

class AdaDeltaParameterOptimizer
Inherits from paddle::ParameterOptimizer

Public Functions

- AdaDeltaParameterOptimizer(const OptimizationConfig &optConfig)
- virtual void startBatch(int64_t numSamplesProcessed)
  Called by the Trainer before forward() of a batch.
- virtual void update(const VectorPtr vecs[], const ParameterConfig &config, size_t sparseId) const
  Between startBatch() and finishBatch(), update() is called by the trainer multiple times, each time updating one Parameter with its gradient in PARAMETER_GRADIENT. sparseId is the row id; when sparseId is set, the update is sparse and applies one row at a time.
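
This listing does not spell out the update rule; for orientation, the standard AdaDelta rule (Zeiler, 2012) that the class name refers to keeps decaying averages of squared gradients and squared updates. The decay rate \rho and stability constant \epsilon below are the usual symbols from that paper, not fields documented here:

    E[g^2]_t            = \rho \, E[g^2]_{t-1} + (1 - \rho) \, g_t^2
    \Delta\theta_t      = - \frac{\sqrt{E[\Delta\theta^2]_{t-1} + \epsilon}}{\sqrt{E[g^2]_t + \epsilon}} \, g_t
    E[\Delta\theta^2]_t = \rho \, E[\Delta\theta^2]_{t-1} + (1 - \rho) \, \Delta\theta_t^2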
 
class AdagradParameterOptimizer
Inherits from paddle::ParameterOptimizer

Public Functions

- AdagradParameterOptimizer(const OptimizationConfig &optConfig)
- virtual void startBatch(int64_t numSamplesProcessed)
  Called by the Trainer before forward() of a batch.
- virtual void update(const VectorPtr vecs[], const ParameterConfig &config, size_t sparseId) const
  Between startBatch() and finishBatch(), update() is called by the trainer multiple times, each time updating one Parameter with its gradient in PARAMETER_GRADIENT. sparseId is the row id; when sparseId is set, the update is sparse and applies one row at a time.
- virtual ParameterOptimizer::TraverseCallback needSpecialTraversal(const ParameterConfig &config) const
  The following hooks are useful for sparse update, because traversing the whole block is costly. Called by the Trainer after update() and before finishBatch(), e.g. the Trainer calls it like this:

      startBatch();
      if (dense) { update(blockVec); }
      else { for (row : rows_in_block) update(rowVec); }  // sparse
      auto callback = needSpecialTraversal();
      if (callback) {
        // do the traversal, maybe multi-threaded
        if (dense) { callback(); }
        else { for (row : all_rows_in_block) callback(); }  // sparse
      }
      finishBatch();

  Return: the callback if a traversal is needed, else nullptr. It should cause no state change.

Protected Attributes

- int64_t numUpdates_

Protected Static Attributes

- const int64_t kMaxNumAccumulates
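
Likewise for orientation, the classical AdaGrad rule (Duchi et al., 2011) accumulates squared gradients and scales each step element-wise; the learning rate \eta and stability constant \epsilon are assumed symbols, not members of this class:

    G_t      = G_{t-1} + g_t^2
    \theta_t = \theta_{t-1} - \frac{\eta}{\sqrt{G_t} + \epsilon} \, g_t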
 
class AdamaxParameterOptimizer
#include <FirstOrderOptimizer.h>

AdaMax Optimizer. Reference paper: http://arxiv.org/abs/1412.6980, Algorithm 2.

Inherits from paddle::ParameterOptimizer

Public Functions

- AdamaxParameterOptimizer(const OptimizationConfig &optConfig)
- virtual void finishBatch()
  Called by the Trainer after backward() of a batch.
- virtual void update(const VectorPtr vecs[], const ParameterConfig &config, size_t sparseId) const
  Between startBatch() and finishBatch(), update() is called by the trainer multiple times, each time updating one Parameter with its gradient in PARAMETER_GRADIENT. sparseId is the row id; when sparseId is set, the update is sparse and applies one row at a time.
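
Algorithm 2 of the referenced paper (Kingma & Ba) replaces Adam's second-moment estimate with an infinity-norm accumulator u_t; with learning rate \alpha and decay rates \beta_1, \beta_2, the per-element update is:

    m_t      = \beta_1 m_{t-1} + (1 - \beta_1) \, g_t
    u_t      = \max(\beta_2 u_{t-1}, \, |g_t|)
    \theta_t = \theta_{t-1} - \frac{\alpha}{1 - \beta_1^t} \cdot \frac{m_t}{u_t}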
 
class AdamParameterOptimizer
#include <FirstOrderOptimizer.h>

Adam Optimizer. Reference paper: http://arxiv.org/abs/1412.6980, Algorithm 1.

Inherits from paddle::ParameterOptimizer

Public Functions

- AdamParameterOptimizer(const OptimizationConfig &optConfig)
- virtual void finishBatch()
  Called by the Trainer after backward() of a batch.
- virtual void update(const VectorPtr vecs[], const ParameterConfig &config, size_t sparseId) const
  Between startBatch() and finishBatch(), update() is called by the trainer multiple times, each time updating one Parameter with its gradient in PARAMETER_GRADIENT. sparseId is the row id; when sparseId is set, the update is sparse and applies one row at a time.
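
Algorithm 1 of the referenced paper keeps bias-corrected first- and second-moment estimates per element; with learning rate \alpha, decay rates \beta_1, \beta_2, and stability constant \epsilon:

    m_t       = \beta_1 m_{t-1} + (1 - \beta_1) \, g_t
    v_t       = \beta_2 v_{t-1} + (1 - \beta_2) \, g_t^2
    \hat{m}_t = m_t / (1 - \beta_1^t), \qquad \hat{v}_t = v_t / (1 - \beta_2^t)
    \theta_t  = \theta_{t-1} - \alpha \, \hat{m}_t / (\sqrt{\hat{v}_t} + \epsilon)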
 
class AddOptimizer
Inherits from paddle::ParameterOptimizer

Public Functions

- AddOptimizer(const OptimizationConfig &optConfig)
- virtual void startBatch(int64_t numSamplesProcessed)
  Called by the Trainer before forward() of a batch.
- virtual void update(const VectorPtr vecs[], const ParameterConfig &config, size_t sparseId) const
  Between startBatch() and finishBatch(), update() is called by the trainer multiple times, each time updating one Parameter with its gradient in PARAMETER_GRADIENT. sparseId is the row id; when sparseId is set, the update is sparse and applies one row at a time.
 
class DecayedAdagradParameterOptimizer
Inherits from paddle::ParameterOptimizer

Public Functions

- DecayedAdagradParameterOptimizer(const OptimizationConfig &optConfig)
- virtual void init(size_t numRows, const ParameterConfig *config)
  For sparse update, the optimizer can maintain numRows of timers (t0). Some sparse optimizers depend on the parameter config in functions such as startBatch(); the optimizer can get it here. Note that not all callers can pass the config here, so the optimizer should check that the config passed in is not a null pointer.
- virtual void startBatch(int64_t numSamplesProcessed)
  Called by the Trainer before forward() of a batch.
- virtual void finishBatch()
  Called by the Trainer after backward() of a batch.
- virtual void update(const VectorPtr vecs[], const ParameterConfig &config, size_t sparseId) const
  Between startBatch() and finishBatch(), update() is called by the trainer multiple times, each time updating one Parameter with its gradient in PARAMETER_GRADIENT. sparseId is the row id; when sparseId is set, the update is sparse and applies one row at a time.

Protected Attributes

- real rou_
- real epsilon_
- int64_t timer_
  Counts batches; does not need to catch up. t (timer_) is the current time and t0 (t0Vec_) is the last update time of row i. If one block is updated by multiple threads, the caller should hash the sparse ids to avoid write conflicts in t0Vec_.
- std::vector<int64_t> t0Vec_
 
class DummyOptimizer
Inherits from paddle::ParameterOptimizer

Public Functions

- DummyOptimizer(const OptimizationConfig &optConfig)
- virtual void update(const VectorPtr vecs[], const ParameterConfig &config, size_t sparseId) const
  Between startBatch() and finishBatch(), update() is called by the trainer multiple times, each time updating one Parameter with its gradient in PARAMETER_GRADIENT. sparseId is the row id; when sparseId is set, the update is sparse and applies one row at a time.
 
class OptimizerWithGradientClipping
Inherits from paddle::ParameterOptimizer

Public Functions

- OptimizerWithGradientClipping(const OptimizationConfig &optConfig, ParameterOptimizer *optimizer)
- virtual void init(size_t numRows, const ParameterConfig *config)
  For sparse update, the optimizer can maintain numRows of timers (t0). Some sparse optimizers depend on the parameter config in functions such as startBatch(); the optimizer can get it here. Note that not all callers can pass the config here, so the optimizer should check that the config passed in is not a null pointer.
- virtual void startPass()
- virtual void finishPass()
- virtual void startBatch(int64_t numSamplesProcessed)
  Called by the Trainer before forward() of a batch.
- virtual void finishBatch()
  Called by the Trainer after backward() of a batch.
- virtual TraverseCallback needSpecialTraversal(const ParameterConfig &config) const
  The following hooks are useful for sparse update, because traversing the whole block is costly. Called by the Trainer after update() and before finishBatch(), e.g. the Trainer calls it like this:

      startBatch();
      if (dense) { update(blockVec); }
      else { for (row : rows_in_block) update(rowVec); }  // sparse
      auto callback = needSpecialTraversal();
      if (callback) {
        // do the traversal, maybe multi-threaded
        if (dense) { callback(); }
        else { for (row : all_rows_in_block) callback(); }  // sparse
      }
      finishBatch();

  Return: the callback if a traversal is needed, else nullptr. It should cause no state change.

- virtual void update(const VectorPtr vecs[], const ParameterConfig &config, size_t sparseId) const
  Between startBatch() and finishBatch(), update() is called by the trainer multiple times, each time updating one Parameter with its gradient in PARAMETER_GRADIENT. sparseId is the row id; when sparseId is set, the update is sparse and applies one row at a time.
- virtual void setNoDecay()

Protected Attributes

- std::unique_ptr<ParameterOptimizer> optimizer_
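
OptimizerWithGradientClipping is a decorator around another ParameterOptimizer (the wrapped optimizer_ above). Below is a minimal sketch of that decorator idea, assuming element-wise clipping of the gradient to a configured threshold before delegating; the SgdInner type, the threshold name, and the std::vector-based signature are illustrative stand-ins, not Paddle's API:

    #include <algorithm>
    #include <cstddef>
    #include <vector>

    // Illustrative inner optimizer: plain SGD on a flat parameter block.
    struct SgdInner {
      float lr = 0.01f;
      void update(std::vector<float>& param, const std::vector<float>& grad) const {
        for (std::size_t i = 0; i < param.size(); ++i) param[i] -= lr * grad[i];
      }
    };

    // Decorator idea: clip each gradient element into [-threshold, threshold],
    // then delegate the actual step to the wrapped optimizer.
    void clippedUpdate(const SgdInner& inner, std::vector<float>& param,
                       std::vector<float> grad, float threshold) {
      for (float& g : grad) g = std::max(-threshold, std::min(threshold, g));
      inner.update(param, grad);  // the wrapped optimizer performs the real step
    }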
 
class RMSPropParameterOptimizer
Inherits from paddle::ParameterOptimizer

Public Functions

- RMSPropParameterOptimizer(const OptimizationConfig &optConfig)
- virtual void init(size_t numRows, const ParameterConfig *config)
  For sparse update, the optimizer can maintain numRows of timers (t0). Some sparse optimizers depend on the parameter config in functions such as startBatch(); the optimizer can get it here. Note that not all callers can pass the config here, so the optimizer should check that the config passed in is not a null pointer.
- virtual void startBatch(int64_t numSamplesProcessed)
  Called by the Trainer before forward() of a batch.
- virtual void finishBatch()
  Called by the Trainer after backward() of a batch.
- virtual void update(const VectorPtr vecs[], const ParameterConfig &config, size_t sparseId) const
  Between startBatch() and finishBatch(), update() is called by the trainer multiple times, each time updating one Parameter with its gradient in PARAMETER_GRADIENT. sparseId is the row id; when sparseId is set, the update is sparse and applies one row at a time.

Protected Attributes

- real rou_
- real epsilon_
- int64_t timer_
  Counts batches; does not need to catch up. t (timer_) is the current time and t0 (t0Vec_) is the last update time of row i. If one block is updated by multiple threads, the caller should hash the sparse ids to avoid write conflicts in t0Vec_.
- std::vector<int64_t> t0Vec_
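
The standard RMSProp rule keeps a decaying mean of squared gradients. Assuming rou_ plays the role of the decay rate \rho and epsilon_ of the stability constant \epsilon (a reading of the fields above, not stated in this listing), the per-element update is:

    E[g^2]_t = \rho \, E[g^2]_{t-1} + (1 - \rho) \, g_t^2
    \theta_t = \theta_{t-1} - \frac{\eta}{\sqrt{E[g^2]_t + \epsilon}} \, g_t

DecayedAdagradParameterOptimizer above exposes the same rou_/epsilon_ pair.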
 
class SgdOptimizer
Inherits from paddle::ParameterOptimizer

Public Functions

- SgdOptimizer(const OptimizationConfig &optConfig)
- virtual void startBatch(int64_t numSamplesProcessed)
  Called by the Trainer before forward() of a batch.
- virtual void update(const VectorPtr vecs[], const ParameterConfig &config, size_t sparseId) const
  Between startBatch() and finishBatch(), update() is called by the trainer multiple times, each time updating one Parameter with its gradient in PARAMETER_GRADIENT. sparseId is the row id; when sparseId is set, the update is sparse and applies one row at a time.
- virtual void finishBatch()
  Called by the Trainer after backward() of a batch.
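
For reference, plain SGD is the simplest of these rules: with the per-batch learning rate \eta (see calcLearningRate() and learningRate_ in the ParameterOptimizer base class below), each element takes the step

    \theta_t = \theta_{t-1} - \eta \, g_t

Any momentum or weight decay configured through OptimizationConfig is outside the scope of this sketch.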
 
class SparseMomentumParameterOptimizer
Inherits from paddle::ParameterOptimizer

Public Functions

- SparseMomentumParameterOptimizer(const OptimizationConfig &optConfig)
- virtual void init(size_t numRows, const ParameterConfig *config)
  For sparse update, the optimizer can maintain numRows of timers (t0). Some sparse optimizers depend on the parameter config in functions such as startBatch(); the optimizer can get it here. Note that not all callers can pass the config here, so the optimizer should check that the config passed in is not a null pointer.
- virtual void startBatch(int64_t numSamplesProcessed)
  Called by the Trainer before forward() of a batch.
- virtual void update(const VectorPtr vecs[], const ParameterConfig &config, size_t sparseId) const
  Between startBatch() and finishBatch(), update() is called by the trainer multiple times, each time updating one Parameter with its gradient in PARAMETER_GRADIENT. sparseId is the row id; when sparseId is set, the update is sparse and applies one row at a time.
- virtual ParameterOptimizer::TraverseCallback needSpecialTraversal(const ParameterConfig &config) const
  The following hooks are useful for sparse update, because traversing the whole block is costly. Called by the Trainer after update() and before finishBatch(), e.g. the Trainer calls it like this:

      startBatch();
      if (dense) { update(blockVec); }
      else { for (row : rows_in_block) update(rowVec); }  // sparse
      auto callback = needSpecialTraversal();
      if (callback) {
        // do the traversal, maybe multi-threaded
        if (dense) { callback(); }
        else { for (row : all_rows_in_block) callback(); }  // sparse
      }
      finishBatch();

  Return: the callback if a traversal is needed, else nullptr. It should cause no state change.

- virtual void finishBatch()
  Called by the Trainer after backward() of a batch.
 
 
namespace paddle

class AverageOptimizer
Inherits from paddle::ParameterOptimizer
Subclassed by paddle::AverageSparseOptimizer

Public Functions

- AverageOptimizer(const OptimizationConfig &optConfig, ParameterOptimizer *optimizer, bool useParameterApply)
- virtual void init(size_t numRows, const ParameterConfig *config)
  For sparse update, the optimizer can maintain numRows of timers (t0). Some sparse optimizers depend on the parameter config in functions such as startBatch(); the optimizer can get it here. Note that not all callers can pass the config here, so the optimizer should check that the config passed in is not a null pointer.
- virtual void startPass()
- virtual void finishPass()
- virtual void startBatch(int64_t numSamplesProcessed)
  Called by the Trainer before forward() of a batch.
- virtual void finishBatch()
  Called by the Trainer after backward() of a batch.
- virtual void update(const VectorPtr vecs[], const ParameterConfig &config, size_t sparseId) const
  Between startBatch() and finishBatch(), update() is called by the trainer multiple times, each time updating one Parameter with its gradient in PARAMETER_GRADIENT. sparseId is the row id; when sparseId is set, the update is sparse and applies one row at a time.
- virtual ParameterOptimizer::TraverseCallback needSpecialTraversal(const ParameterConfig &config) const
  The following hooks are useful for sparse update, because traversing the whole block is costly. Called by the Trainer after update() and before finishBatch(), e.g. the Trainer calls it like this:

      startBatch();
      if (dense) { update(blockVec); }
      else { for (row : rows_in_block) update(rowVec); }  // sparse
      auto callback = needSpecialTraversal();
      if (callback) {
        // do the traversal, maybe multi-threaded
        if (dense) { callback(); }
        else { for (row : all_rows_in_block) callback(); }  // sparse
      }
      finishBatch();

  Return: the callback if a traversal is needed, else nullptr. It should cause no state change.

- virtual TraverseCallback startCatchUpWith() const
  The following hooks catch up with the current time for sparse update. In the beginning, call startCatchUpWith() and check its return value; in the end, call finishCatchUpWith() to finish the state. The callback does the actual work and can be called many times for sparse data, e.g. the Trainer calls it like this:

      auto callback = startCatchUpWith();
      if (callback) {
        // do the catch-up, maybe multi-threaded
        if (dense) { callback(); }
        else { for (row : rows_in_block) callback(); }  // sparse
        // finish the catch-up, main thread
        finishCatchUpWith();
      }

  Return: the callback if a catch-up is needed, else nullptr. It should cause no state change.

- virtual void finishCatchUpWith()
- virtual ParameterOptimizer::TraverseCallback apply()
  The following two hooks are used by the averager; they apply to the final parameter value (PARAMETER_VALUE or PARAMETER_APPLY). restore() restores the original value if apply() wrote to PARAMETER_VALUE. The caller must ensure the optimizer has caught up with the current time before calling apply(). Use the returned callback the same way as the callback returned by ParameterOptimizer::needSpecialTraversal().
- virtual ParameterOptimizer::TraverseCallback restore()
- virtual void setNoDecay()

Public Static Functions

- ParameterOptimizer *create(const OptimizationConfig &optConfig, ParameterOptimizer *optimizer, bool isParameterSparse = false, bool useParameterApply = false)

Protected Attributes

- std::unique_ptr<ParameterOptimizer> optimizer_
- bool useApply_
- int64_t numUpdates_
- int64_t prevNumUpdates_
- int64_t numAccumulates_
- int64_t oldNumAccumulates_
- int64_t minAverageWindow_
- int64_t maxAverageWindow_

Protected Static Attributes

- const int64_t kMaxNumAccumulates
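
AverageOptimizer is likewise a decorator: the wrapped optimizer_ performs the actual updates while the averager accumulates a running average of the parameter over a window (numAccumulates_, minAverageWindow_, maxAverageWindow_ above), and apply()/restore() swap the averaged value in and out around evaluation. Below is a minimal sketch of that idea with illustrative names, not the Paddle API:

    #include <cstddef>
    #include <vector>

    // Keeps a running average of a parameter block alongside its live value.
    struct AveragedParam {
      std::vector<float> value;    // live value, updated every step
      std::vector<float> sum;      // accumulated values inside the window
      std::vector<float> backup;   // saved live value while the average is applied
      std::size_t numAccumulates = 0;

      void accumulate() {          // called once per update
        for (std::size_t i = 0; i < value.size(); ++i) sum[i] += value[i];
        ++numAccumulates;
      }
      void apply() {               // overwrite the live value with the average
        backup = value;            // assumes accumulate() has run at least once
        for (std::size_t i = 0; i < value.size(); ++i)
          value[i] = sum[i] / numAccumulates;
      }
      void restore() {             // bring the live value back after evaluation
        value = backup;
      }
    };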
 
class AverageSparseOptimizer
Inherits from paddle::AverageOptimizer

Public Functions

- AverageSparseOptimizer(const OptimizationConfig &optConfig, ParameterOptimizer *optimizer, bool useParameterApply)
- virtual void init(size_t numRows, const ParameterConfig *config)
  For sparse update, the optimizer can maintain numRows of timers (t0). Some sparse optimizers depend on the parameter config in functions such as startBatch(); the optimizer can get it here. Note that not all callers can pass the config here, so the optimizer should check that the config passed in is not a null pointer.
- virtual void finishBatch()
  Called by the Trainer after backward() of a batch.
- virtual void update(const VectorPtr vecs[], const ParameterConfig &config, size_t sparseId) const
  Between startBatch() and finishBatch(), update() is called by the trainer multiple times, each time updating one Parameter with its gradient in PARAMETER_GRADIENT. sparseId is the row id; when sparseId is set, the update is sparse and applies one row at a time.
- void catchUpWith(const VectorPtr vecs[], const ParameterConfig &paraConfig, size_t sparseId) const
- virtual ParameterOptimizer::TraverseCallback startCatchUpWith() const
  The following hooks catch up with the current time for sparse update. In the beginning, call startCatchUpWith() and check its return value; in the end, call finishCatchUpWith() to finish the state. The callback does the actual work and can be called many times for sparse data, e.g. the Trainer calls it like this:

      auto callback = startCatchUpWith();
      if (callback) {
        // do the catch-up, maybe multi-threaded
        if (dense) { callback(); }
        else { for (row : rows_in_block) callback(); }  // sparse
        // finish the catch-up, main thread
        finishCatchUpWith();
      }

  Return: the callback if a catch-up is needed, else nullptr. It should cause no state change.

- virtual void finishCatchUpWith()
 
 
namespace paddle

class ParameterOptimizer
#include <ParameterOptimizer.h>

Some member functions are declared const for two reasons:
- For thread-safe sparse update: update() and the traverse callback (const this) may be called many times, each time for one row, and these functions can be called in parallel by multiple workers to speed up large blocks.
- For predicate functions: needSpecialTraversal() and startCatchUpWith() may be called many times; there should be no state change between calls.

Subclassed by paddle::AdaDeltaParameterOptimizer, paddle::AdagradParameterOptimizer, paddle::AdamaxParameterOptimizer, paddle::AdamParameterOptimizer, paddle::AddOptimizer, paddle::AverageOptimizer, paddle::DecayedAdagradParameterOptimizer, paddle::DummyOptimizer, paddle::OptimizerWithGradientClipping, paddle::OptimizerWithRegularizer, paddle::RMSPropParameterOptimizer, paddle::SgdOptimizer, paddle::SparseMomentumParameterOptimizer

Public Types

- typedef std::function<void(const VectorPtr vecs[], const ParameterConfig &config, size_t sparseId)> TraverseCallback

Public Functions

- ParameterOptimizer(const OptimizationConfig &optConfig)
- real calcLearningRate(int64_t numSamplesProcessed, int64_t pass)
- virtual ~ParameterOptimizer()
- virtual void init(size_t numRows, const ParameterConfig *config)
  For sparse update, the optimizer can maintain numRows of timers (t0). Some sparse optimizers depend on the parameter config in functions such as startBatch(); the optimizer can get it here. Note that not all callers can pass the config here, so the optimizer should check that the config passed in is not a null pointer.
- virtual void startPass()
- virtual void finishPass()
- virtual void startBatch(int64_t numSamplesProcessed)
  Called by the Trainer before forward() of a batch.
- virtual TraverseCallback needSpecialTraversal(const ParameterConfig &config) const
  The following hooks are useful for sparse update, because traversing the whole block is costly. Called by the Trainer after update() and before finishBatch(), e.g. the Trainer calls it like this:

      startBatch();
      if (dense) { update(blockVec); }
      else { for (row : rows_in_block) update(rowVec); }  // sparse
      auto callback = needSpecialTraversal();
      if (callback) {
        // do the traversal, maybe multi-threaded
        if (dense) { callback(); }
        else { for (row : all_rows_in_block) callback(); }  // sparse
      }
      finishBatch();

  Return: the callback if a traversal is needed, else nullptr. It should cause no state change.

- virtual void finishBatch()
  Called by the Trainer after backward() of a batch.
- virtual void update(const VectorPtr vecs[], const ParameterConfig &config, size_t sparseId = -1LU) const = 0
  Between startBatch() and finishBatch(), update() is called by the trainer multiple times, each time updating one Parameter with its gradient in PARAMETER_GRADIENT. sparseId is the row id; when sparseId is set, the update is sparse and applies one row at a time.
- virtual TraverseCallback startCatchUpWith() const
  The following hooks catch up with the current time for sparse update. In the beginning, call startCatchUpWith() and check its return value; in the end, call finishCatchUpWith() to finish the state. The callback does the actual work and can be called many times for sparse data, e.g. the Trainer calls it like this:

      auto callback = startCatchUpWith();
      if (callback) {
        // do the catch-up, maybe multi-threaded
        if (dense) { callback(); }
        else { for (row : rows_in_block) callback(); }  // sparse
        // finish the catch-up, main thread
        finishCatchUpWith();
      }

  Return: the callback if a catch-up is needed, else nullptr. It should cause no state change.

- virtual void finishCatchUpWith()
- virtual TraverseCallback apply()
  The following two hooks are used by the averager; they apply to the final parameter value (PARAMETER_VALUE or PARAMETER_APPLY). restore() restores the original value if apply() wrote to PARAMETER_VALUE. The caller must ensure the optimizer has caught up with the current time before calling apply(). Use the returned callback the same way as the callback returned by ParameterOptimizer::needSpecialTraversal().
- virtual TraverseCallback restore()
- const std::vector<ParameterType> &getParameterTypes() const
  Return the parameter types used by this updater.
- void addParameterType(ParameterType type)
- real getLearningRate() const
- virtual void setNoDecay()

Public Static Functions

- ParameterOptimizer *create(const OptimizationConfig &optConfig, bool inPserver = false)

Protected Types

- typedef std::vector<ParameterOptimizer::TraverseCallback> TraverseCallbackVec

Protected Attributes

- bool applyDecay_
- const OptimizationConfig &optConfig_
- std::vector<ParameterType> parameterTypes_
- real learningRate_
  Global learning rate; the initial value is opt_config.learning_rate. The sparse regularizer reads this value per batch, after startBatch() has been called, so if the learning rate changes in startBatch(), assign the new value to learningRate_.
- std::unique_ptr<LearningRateScheduler> learningRateScheduler_
- int64_t pass_
- bool firstTime_

Protected Static Functions

- static TraverseCallback composeCallbacks(const TraverseCallbackVec &callbacks)
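
Tying the per-batch hooks together, the order the Trainer follows (per the comments above) is startBatch(), one or more update() calls, an optional needSpecialTraversal() callback, then finishBatch(). Below is a runnable toy sketch of that driver sequence for the dense case; ToySgdOptimizer and the std::vector signatures are illustrative stand-ins, not Paddle's VectorPtr/ParameterConfig types:

    #include <cstddef>
    #include <cstdint>
    #include <cstdio>
    #include <functional>
    #include <vector>

    // Toy optimizer mirroring the hook names documented above.
    struct ToySgdOptimizer {
      float lr = 0.1f;
      void startBatch(int64_t /*numSamplesProcessed*/) {}   // before forward()
      void update(std::vector<float>& param, const std::vector<float>& grad) const {
        for (std::size_t i = 0; i < param.size(); ++i) param[i] -= lr * grad[i];
      }
      std::function<void()> needSpecialTraversal() const { return nullptr; }
      void finishBatch() {}                                  // after backward()
    };

    int main() {
      ToySgdOptimizer opt;
      std::vector<float> param{1.0f, 2.0f}, grad{0.5f, -0.5f};
      opt.startBatch(/*numSamplesProcessed=*/2);
      // ... forward() and backward() would fill `grad` here ...
      opt.update(param, grad);                        // dense: whole block at once
      if (auto cb = opt.needSpecialTraversal()) cb(); // optional extra traversal
      opt.finishBatch();
      std::printf("param = (%.2f, %.2f)\n", param[0], param[1]);
      return 0;
    }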
 
 
namespace paddle

class OptimizerWithRegularizer
Inherits from paddle::ParameterOptimizer
Subclassed by paddle::OptimizerWithRegularizerEveryNumBatches, paddle::OptimizerWithRegularizerSparse

Public Functions

- OptimizerWithRegularizer(const OptimizationConfig &optConfig, ParameterOptimizer *optimizer, Regularizer *regularizer)
- virtual void init(size_t numRows, const ParameterConfig *config)
  For sparse update, the optimizer can maintain numRows of timers (t0). Some sparse optimizers depend on the parameter config in functions such as startBatch(); the optimizer can get it here. Note that not all callers can pass the config here, so the optimizer should check that the config passed in is not a null pointer.
- virtual void startPass()
- virtual void finishPass()
- virtual void startBatch(int64_t numSamplesProcessed)
  Called by the Trainer before forward() of a batch.
- virtual void finishBatch()
  Called by the Trainer after backward() of a batch.
- virtual TraverseCallback needSpecialTraversal(const ParameterConfig &config) const
  The following hooks are useful for sparse update, because traversing the whole block is costly. Called by the Trainer after update() and before finishBatch(), e.g. the Trainer calls it like this:

      startBatch();
      if (dense) { update(blockVec); }
      else { for (row : rows_in_block) update(rowVec); }  // sparse
      auto callback = needSpecialTraversal();
      if (callback) {
        // do the traversal, maybe multi-threaded
        if (dense) { callback(); }
        else { for (row : all_rows_in_block) callback(); }  // sparse
      }
      finishBatch();

  Return: the callback if a traversal is needed, else nullptr. It should cause no state change.

- virtual void update(const VectorPtr vecs[], const ParameterConfig &config, size_t sparseId) const
  Between startBatch() and finishBatch(), update() is called by the trainer multiple times, each time updating one Parameter with its gradient in PARAMETER_GRADIENT. sparseId is the row id; when sparseId is set, the update is sparse and applies one row at a time.

Public Static Functions

- ParameterOptimizer *create(const OptimizationConfig &optConfig, const ParameterConfig &paraConfig, bool isParameterSparse, bool inPserver)

Protected Attributes

- std::unique_ptr<ParameterOptimizer> optimizer_
- Regularizer *regularizer_
- int timer_
  Counts batches; cleared after catch-up. t (timer_) is the current time and t0 (t0Vec_) is the last update time of row i. If one block is updated by multiple threads, the caller should hash the sparse ids to avoid write conflicts in t0Vec_.
 
class OptimizerWithRegularizerEveryNumBatches
Inherits from paddle::OptimizerWithRegularizer

Public Functions

- OptimizerWithRegularizerEveryNumBatches(const OptimizationConfig &optConfig, ParameterOptimizer *optimizer, Regularizer *regularizer)
- virtual void startPass()
- virtual void update(const VectorPtr vecs[], const ParameterConfig &config, size_t sparseId) const
  Between startBatch() and finishBatch(), update() is called by the trainer multiple times, each time updating one Parameter with its gradient in PARAMETER_GRADIENT. sparseId is the row id; when sparseId is set, the update is sparse and applies one row at a time.
- virtual ParameterOptimizer::TraverseCallback needSpecialTraversal(const ParameterConfig &config) const
  The following hooks are useful for sparse update, because traversing the whole block is costly. Called by the Trainer after update() and before finishBatch(), e.g. the Trainer calls it like this:

      startBatch();
      if (dense) { update(blockVec); }
      else { for (row : rows_in_block) update(rowVec); }  // sparse
      auto callback = needSpecialTraversal();
      if (callback) {
        // do the traversal, maybe multi-threaded
        if (dense) { callback(); }
        else { for (row : all_rows_in_block) callback(); }  // sparse
      }
      finishBatch();

  Return: the callback if a traversal is needed, else nullptr. It should cause no state change.

- void doTraversal(const VectorPtr vecs[], const ParameterConfig &config) const
- void catchUpWith(const VectorPtr vecs[], const ParameterConfig &config, size_t sparseId) const
- virtual ParameterOptimizer::TraverseCallback startCatchUpWith() const
  The following hooks catch up with the current time for sparse update. In the beginning, call startCatchUpWith() and check its return value; in the end, call finishCatchUpWith() to finish the state. The callback does the actual work and can be called many times for sparse data, e.g. the Trainer calls it like this:

      auto callback = startCatchUpWith();
      if (callback) {
        // do the catch-up, maybe multi-threaded
        if (dense) { callback(); }
        else { for (row : rows_in_block) callback(); }  // sparse
        // finish the catch-up, main thread
        finishCatchUpWith();
      }

  Return: the callback if a catch-up is needed, else nullptr. It should cause no state change.

- virtual void finishCatchUpWith()

Protected Functions

- bool isRegularizationBatch(const ParameterConfig &config) const

Protected Attributes

- int baseTimer_
  Records the timer_ value when catchUpWith() is called.
 
class OptimizerWithRegularizerSparse
Inherits from paddle::OptimizerWithRegularizer

Public Functions

- OptimizerWithRegularizerSparse(const OptimizationConfig &optConfig, ParameterOptimizer *optimizer, Regularizer *regularizer)
- virtual void init(size_t numRows, const ParameterConfig *config)
  For sparse update, the optimizer can maintain numRows of timers (t0). Some sparse optimizers depend on the parameter config in functions such as startBatch(); the optimizer can get it here. Note that not all callers can pass the config here, so the optimizer should check that the config passed in is not a null pointer.
- virtual void update(const VectorPtr vecs[], const ParameterConfig &config, size_t sparseId) const
  Between startBatch() and finishBatch(), update() is called by the trainer multiple times, each time updating one Parameter with its gradient in PARAMETER_GRADIENT. sparseId is the row id; when sparseId is set, the update is sparse and applies one row at a time.
- void catchUpWith(const VectorPtr vecs[], const ParameterConfig &config, size_t sparseId) const
- virtual ParameterOptimizer::TraverseCallback startCatchUpWith() const
  The following hooks catch up with the current time for sparse update. In the beginning, call startCatchUpWith() and check its return value; in the end, call finishCatchUpWith() to finish the state. The callback does the actual work and can be called many times for sparse data, e.g. the Trainer calls it like this:

      auto callback = startCatchUpWith();
      if (callback) {
        // do the catch-up, maybe multi-threaded
        if (dense) { callback(); }
        else { for (row : rows_in_block) callback(); }  // sparse
        // finish the catch-up, main thread
        finishCatchUpWith();
      }

  Return: the callback if a catch-up is needed, else nullptr. It should cause no state change.

- virtual void finishCatchUpWith()

Protected Attributes

- std::vector<int32_t> t0Vec_
  t0Vec_ holds the last update time of row i. If one block is updated by multiple threads, the caller should hash the sparse ids to avoid write conflicts in t0Vec_.
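
The timer_/t0Vec_ bookkeeping documented above supports lazy regularization of sparse rows: a row untouched for several batches must first catch up on the regularization it missed before its next update. Below is a minimal sketch of that idea for multiplicative L2-style decay; the names and the exact decay form are illustrative assumptions, since the Regularizer interface itself is not part of this listing:

    #include <cmath>
    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // Per-row lazy decay: t0[row] remembers the last batch in which the row was
    // regularized; catching up applies the decay once per missed batch.
    struct LazyL2Decay {
      float decayPerBatch = 1e-4f;
      int32_t timer = 0;            // current batch count (like timer_)
      std::vector<int32_t> t0;      // last update time per row (like t0Vec_)

      void catchUpRow(std::vector<float>& row, std::size_t rowId) {
        int32_t missed = timer - t0[rowId];
        if (missed <= 0) return;
        float scale = std::pow(1.0f - decayPerBatch, static_cast<float>(missed));
        for (float& w : row) w *= scale;  // apply all missed decay steps at once
        t0[rowId] = timer;
      }
    };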
 