Why does l1weight impact convergence when SGD optimization is used?
Created by: backyes
settings(
    #learning_rate_decay_a = 1e-05,
    #learning_rate_decay_b = 0.0,
    learning_rate = 0.1,
    batch_size = batch_size,
    algorithm = 'sgd',
    #l1weight = 0.1,
    #num_batches_per_send_parameter = 1,
    learning_method = 'rmsprop',
)
Generally, l1weight is specified for owlqn, but with the settings above, l1weight indeed influences convergence.
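A minimal sketch (plain NumPy, not Paddle's actual implementation) of why an L1 term can change the trajectory under SGD: the penalty contributes a subgradient term l1weight * sign(w) on top of the loss gradient, independent of which base optimizer applies the update:

```python
import numpy as np

def sgd_step(w, grad, lr=0.1, l1weight=0.0):
    # The L1 penalty adds a subgradient l1weight * sign(w) to the
    # loss gradient, regardless of the base optimizer.
    return w - lr * (grad + l1weight * np.sign(w))

w = np.array([0.5, -0.3, 0.0])
grad = np.array([0.1, 0.1, 0.1])

no_l1 = sgd_step(w, grad, l1weight=0.0)
with_l1 = sgd_step(w, grad, l1weight=0.1)

# The L1 term pulls nonzero weights toward zero, so the two update
# trajectories (and hence convergence behavior) diverge.
print(no_l1)
print(with_l1)
```

So if l1weight is honored in the update rule, it would be expected to affect convergence even outside owlqn.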
With l1weight=0.1:
I /home/wangyanfei/baidu/idl/paddle/paddle/trainer/Tester.cpp:111] Test samples=10877807 cost=0.0634097 Eval: __auc_evaluator_0__=0.641067
..............I /home/wangyanfei/baidu/idl/paddle/paddle/trainer/TrainerInternal.cpp:179] Pass=0 Batch=1094 samples=43721520 AvgCost=0.0665462 Eval: __auc_evaluator_0__=0.620773
I /home/wangyanfei/baidu/idl/paddle/paddle/trainer/Tester.cpp:111] Test samples=10877807 cost=0.0635014 Eval: __auc_evaluator_0__=0.634032
With l1weight=0.1 removed:
I /home/wangyanfei/baidu/idl/paddle/paddle/trainer/Tester.cpp:111] Test samples=10877807 cost=0.0629021 Eval: __auc_evaluator_0__=0.65267
..............I /home/wangyanfei/baidu/idl/paddle/paddle/trainer/TrainerInternal.cpp:179] Pass=0 Batch=1094 samples=43721520 AvgCost=0.0658823 Eval: __auc_evaluator_0__=0.619865
I /home/wangyanfei/baidu/idl/paddle/paddle/trainer/Tester.cpp:111] Test samples=10877807 cost=0.0640271 Eval: __auc_evaluator_0__=0.600817
More tests are needed to verify this conclusion.