Unverified · Commit 7f73ef2c authored by Sing_chan, committed by GitHub

fix bfgs_doc (#41505)

* fix bfgs_doc; test=document_fix

* add parameter name; test=document_fix

* modify according to chenlong's comments;test=document_fix
Parent commit: e0abb90b
@@ -33,63 +33,43 @@ def minimize_bfgs(objective_func,
                       name=None):
    r"""
    Minimizes a differentiable function `func` using the BFGS method.
    The BFGS is a quasi-Newton method for solving an unconstrained optimization problem over a differentiable function.
    Closely related is the Newton method for minimization. Consider the iterate update formula:

    .. math::
        x_{k+1} = x_{k} - H_k \nabla{f_k}

    If :math:`H_k` is the inverse Hessian of :math:`f` at :math:`x_k`, then it's the Newton method.
    If :math:`H_k` is symmetric and positive definite, used as an approximation of the inverse Hessian, then
    it's a quasi-Newton method. In practice, the approximated Hessians are obtained
    by only using the gradients, over either the whole or part of the search
    history; the former is BFGS, the latter is L-BFGS.

    Reference:
        Jorge Nocedal, Stephen J. Wright, Numerical Optimization, Second Edition, 2006. pp140: Algorithm 6.1 (BFGS Method).
    The following summarizes the main logic of the program based on BFGS. Note: _k represents the value at
    the k-th iteration, ^T represents the transposition of a vector or matrix.

    .. code-block:: text

        repeat
            p_k = -H_k * g_k
            alpha = strong_wolfe(f, x_k, p_k)
            x_k+1 = x_k + alpha * p_k
            s_k = x_k+1 - x_k
            y_k = g_k+1 - g_k
            rho_k = 1 / (s_k^T * y_k)
            V_k^T = I - rho_k * s_k * y_k^T
            V_k = I - rho_k * y_k * s_k^T
            H_k+1 = V_k^T * H_k * V_k + rho_k * s_k * s_k^T
            check_converge
        end
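    To make the update formula above concrete, here is a minimal NumPy sketch of a single inverse-Hessian update. The helper name ``bfgs_update`` and the NumPy setting are illustrative only and are not part of the Paddle API.

    .. code-block:: python

        import numpy as np

        def bfgs_update(H, s, y):
            """One BFGS inverse-Hessian update: H_{k+1} = V_k^T H_k V_k + rho_k s_k s_k^T."""
            rho = 1.0 / np.dot(s, y)            # rho_k = 1 / (s_k^T y_k)
            I = np.eye(len(s))
            V_T = I - rho * np.outer(s, y)      # V_k^T = I - rho_k s_k y_k^T
            V = I - rho * np.outer(y, s)        # V_k   = I - rho_k y_k s_k^T
            return V_T @ H @ V + rho * np.outer(s, s)

    Iterating this update, starting from an identity estimate and using a strong-Wolfe line search to pick ``alpha``, reproduces Algorithm 6.1.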
    Args:
        objective_func: the objective function to minimize. ``objective_func`` accepts a multivariate input and returns a scalar.
        initial_position (Tensor): the starting point of the iterates.
        max_iters (int, optional): the maximum number of minimization iterations. Default value: 50.
        tolerance_grad (float, optional): terminates if the gradient norm is smaller than this. Currently the gradient norm uses the inf norm. Default value: 1e-7.
        tolerance_change (float, optional): terminates if the change of the function value/position/parameter between two iterations is smaller than this value. Default value: 1e-9.
        initial_inverse_hessian_estimate (Tensor, optional): the initial inverse Hessian approximation at initial_position. It must be symmetric and positive definite. Default value: None.
        line_search_fn (str, optional): indicates which line search method to use; only 'strong wolfe' is supported right now. May support 'Hager Zhang' in the future. Default value: 'strong wolfe'.
        max_line_search_iters (int, optional): the maximum number of line search iterations. Default value: 50.
        initial_step_length (float, optional): step length used in the first iteration of line search. Different initial_step_length may lead to different optimal results. For methods like Newton and quasi-Newton, the initial trial step length should always be 1.0. Default value: 1.0.
        dtype ('float32' | 'float64', optional): data type used in the algorithm. Default value: 'float32'.
        name (str, optional): Name for the operation. For more information, please refer to :ref:`api_guide_Name`. Default value: None.

    Returns:
        output(tuple):

            - is_converge (bool): Indicates whether the minimum was found within tolerance.
            - num_func_calls (int): number of objective function calls.
            - position (Tensor): the position of the last iteration. If the search converged, this value is the argmin of the objective function regarding the initial position.
            - objective_value (Tensor): objective function value at `position`.
            - objective_gradient (Tensor): objective function gradient at `position`.
            - inverse_hessian_estimate (Tensor): the estimate of the inverse Hessian at `position`.
    Examples:
        .. code-block:: python
......
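The example body is collapsed in the diff above. For orientation, the following is a minimal usage sketch; it assumes the ``paddle.incubate.optimizer.functional`` import path, and the quadratic objective is illustrative only.

    .. code-block:: python

        import paddle
        from paddle.incubate.optimizer.functional import minimize_bfgs

        # A simple convex objective: f(x) = x . x, minimized at the origin.
        def func(x):
            return paddle.dot(x, x)

        x0 = paddle.to_tensor([1.3, 2.7])
        results = minimize_bfgs(func, x0)
        # The returned tuple follows the Returns section above.
        print(results[0])  # is_converge
        print(results[2])  # position, expected to be close to [0., 0.]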
@@ -32,54 +32,46 @@ def minimize_lbfgs(objective_func,
                       initial_step_length=1.0,
                       dtype='float32',
                       name=None):
r"""Minimizes a differentiable function `func` using the L-BFGS method. r"""
The L-BFGS is simalar as BFGS, the only difference is that L-BFGS use historical Minimizes a differentiable function `func` using the L-BFGS method.
sk, yk, rhok rather than H_k-1 to compute Hk. The L-BFGS is a quasi-Newton method for solving an unconstrained optimization problem over a differentiable function.
Closely related is the Newton method for minimization. Consider the iterate update formula:
.. math::
x_{k+1} = x_{k} + H_k \nabla{f_k}
If :math:`H_k` is the inverse Hessian of :math:`f` at :math:`x_k`, then it's the Newton method.
If :math:`H_k` is symmetric and positive definite, used as an approximation of the inverse Hessian, then
it's a quasi-Newton. In practice, the approximated Hessians are obtained
by only using the gradients, over either whole or part of the search
history, the former is BFGS, the latter is L-BFGS.
Reference: Reference:
Jorge Nocedal, Stephen J. Wright, Numerical Optimization, Second Edition, 2006. Jorge Nocedal, Stephen J. Wright, Numerical Optimization, Second Edition, 2006. pp179: Algorithm 7.5 (L-BFGS).
pp179: Algorithm 7.5 (L-BFGS).
    The following summarizes the main logic of the program based on L-BFGS. Note: _k represents the value at
    the k-th iteration, ^T represents the transposition of a vector or matrix.

    .. code-block:: text

        repeat
            compute p_k by two-loop recursion
            alpha = strong_wolfe(f, x_k, p_k)
            x_k+1 = x_k + alpha * p_k
            s_k = x_k+1 - x_k
            y_k = g_k+1 - g_k
            rho_k = 1 / (s_k^T * y_k)
            update sk_vec, yk_vec, rhok_vec
            check_converge
        end
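    For reference, here is a minimal NumPy sketch of the two-loop recursion mentioned above (Nocedal & Wright, Algorithm 7.4). The function and variable names are illustrative and not part of the Paddle API; the lists are assumed to be ordered from the oldest stored pair to the newest.

    .. code-block:: python

        import numpy as np

        def two_loop_recursion(grad, s_list, y_list, rho_list):
            """Compute the search direction p_k = -H_k * grad from stored {s_i, y_i, rho_i}."""
            q = grad.copy()
            alphas = []
            # First loop: newest pair to oldest.
            for s, y, rho in zip(reversed(s_list), reversed(y_list), reversed(rho_list)):
                a = rho * np.dot(s, q)
                alphas.append(a)
                q = q - a * y
            # Initial Hessian scaling gamma_k = (s_{k-1}^T y_{k-1}) / (y_{k-1}^T y_{k-1}).
            if s_list:
                gamma = np.dot(s_list[-1], y_list[-1]) / np.dot(y_list[-1], y_list[-1])
            else:
                gamma = 1.0
            r = gamma * q
            # Second loop: oldest pair to newest.
            for s, y, rho, a in zip(s_list, y_list, rho_list, reversed(alphas)):
                b = rho * np.dot(y, r)
                r = r + s * (a - b)
            return -r  # descent direction p_k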
    Args:
        objective_func: the objective function to minimize. ``objective_func`` accepts a multivariate input and returns a scalar.
        initial_position (Tensor): the starting point of the iterates.
        history_size (int, optional): the number of stored vector pairs {s_k, y_k}. Default value: 100.
        max_iters (int, optional): the maximum number of minimization iterations. Default value: 50.
        tolerance_grad (float, optional): terminates if the gradient norm is smaller than this. Currently the gradient norm uses the inf norm. Default value: 1e-7.
        tolerance_change (float, optional): terminates if the change of the function value/position/parameter between two iterations is smaller than this value. Default value: 1e-9.
        initial_inverse_hessian_estimate (Tensor, optional): the initial inverse Hessian approximation at initial_position. It must be symmetric and positive definite. Default value: None.
        line_search_fn (str, optional): indicates which line search method to use; only 'strong wolfe' is supported right now. May support 'Hager Zhang' in the future. Default value: 'strong wolfe'.
        max_line_search_iters (int, optional): the maximum number of line search iterations. Default value: 50.
        initial_step_length (float, optional): step length used in the first iteration of line search. Different initial_step_length may lead to different optimal results. For methods like Newton and quasi-Newton, the initial trial step length should always be 1.0. Default value: 1.0.
        dtype ('float32' | 'float64', optional): data type used in the algorithm. Default value: 'float32'.
        name (str, optional): Name for the operation. For more information, please refer to :ref:`api_guide_Name`. Default value: None.

    Returns:
        output(tuple):

            - is_converge (bool): Indicates whether the minimum was found within tolerance.
            - num_func_calls (int): number of objective function calls.
            - position (Tensor): the position of the last iteration. If the search converged, this value is the argmin of the objective function regarding the initial position.
            - objective_value (Tensor): objective function value at `position`.
            - objective_gradient (Tensor): objective function gradient at `position`.
    Examples:
        .. code-block:: python
......
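As with BFGS, the example body is collapsed above. A minimal usage sketch under the same import-path assumption, with ``history_size`` set explicitly:

    .. code-block:: python

        import paddle
        from paddle.incubate.optimizer.functional import minimize_lbfgs

        def func(x):
            return paddle.dot(x, x)

        x0 = paddle.to_tensor([1.3, 2.7])
        results = minimize_lbfgs(func, x0, history_size=10)
        print(results[0])  # is_converge
        print(results[2])  # position, expected to approach [0., 0.]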