    mempolicy: use MPOL_PREFERRED for system-wide default policy · bea904d5
    Lee Schermerhorn committed
    Currently, when one specifies MPOL_DEFAULT via a NUMA memory policy API
    [set_mempolicy(), mbind() and internal versions], the kernel simply installs a
    NULL struct mempolicy pointer in the appropriate context: task policy, vma
    policy, or shared policy.  This causes any use of that policy to "fall back"
    to the next most specific policy scope.
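
The fallback described above can be pictured with a small user-space model. This is only a sketch under stated assumptions, not the kernel's code: `struct scope_policy` and `effective_policy()` are hypothetical names, and the model collapses vma and shared policy into a single "most specific" slot.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct mempolicy; only identity matters here. */
struct scope_policy {
    const char *label;
};

/*
 * Return the policy that governs an allocation. A NULL pointer in a scope
 * means "nothing installed here", so the lookup falls back to the next
 * scope: vma (or shared) policy, then task policy, then system default.
 */
static const struct scope_policy *
effective_policy(const struct scope_policy *vma_pol,
                 const struct scope_policy *task_pol,
                 const struct scope_policy *system_default)
{
    assert(system_default != NULL);  /* the system default always exists */

    if (vma_pol != NULL)
        return vma_pol;
    if (task_pol != NULL)
        return task_pol;
    return system_default;
}
```

In this model, specifying MPOL_DEFAULT for a scope corresponds to storing NULL in that scope's slot.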
    
    The only use of MPOL_DEFAULT to mean "local allocation" is in the system
    default policy.  This requires extra checks/cases for MPOL_DEFAULT in many
    mempolicy.c functions.
    
    There is another, "preferred" way to specify local allocation via the APIs.
    That is using the MPOL_PREFERRED policy mode with an empty nodemask.
    Internally, the empty nodemask gets converted to a preferred_node id of '-1'.
    All internal usage of MPOL_PREFERRED will convert the '-1' to the id of the
    node local to the cpu where the allocation occurs.
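
As an illustration of the '-1' convention, the following user-space sketch models the two conversions. All `sketch_` names are hypothetical, the nodemask is reduced to a single `unsigned long`, and picking the lowest set bit as the preferred node is a simplifying assumption of this model.

```c
#include <assert.h>
#include <stddef.h>

/* Values follow the kernel's numbering for these two modes. */
enum sketch_mode { SKETCH_MPOL_DEFAULT = 0, SKETCH_MPOL_PREFERRED = 1 };

struct sketch_mempolicy {
    enum sketch_mode mode;
    int preferred_node;  /* -1 means "node local to the allocating cpu" */
};

/* API side: an empty nodemask becomes preferred_node == -1. */
static int sketch_preferred_from_mask(unsigned long nodemask)
{
    int node = 0;

    if (nodemask == 0)
        return -1;
    while (!(nodemask & 1UL)) {  /* lowest set bit (model assumption) */
        nodemask >>= 1;
        node++;
    }
    return node;
}

/* Allocation side: '-1' is replaced by the local node at allocation time. */
static int sketch_preferred_target(const struct sketch_mempolicy *pol,
                                   int local_node)
{
    assert(pol != NULL && pol->mode == SKETCH_MPOL_PREFERRED);
    assert(local_node >= 0);

    return pol->preferred_node < 0 ? local_node : pol->preferred_node;
}
```

Because the local node is looked up on every call, a policy with preferred_node '-1' follows the task as it moves between cpus on different nodes.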
    
    System default policy, except during boot, is hard-coded to "local
    allocation".  By using the MPOL_PREFERRED mode with a negative value of
    preferred node for system default policy, MPOL_DEFAULT will never occur in the
    'policy' member of a struct mempolicy.  Thus, we can remove all checks for
    MPOL_DEFAULT when converting policy to a node id/zonelist in the allocation
    paths.
    
    In slab_node() return local node id when policy pointer is NULL.  No need to
    set a pol value to take the switch default.  Replace switch default with
    BUG()--i.e., shouldn't happen.
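
The shape of that slab_node() change might look roughly like the user-space model below. It is a sketch, not the patched function: the `model_` names are hypothetical, `abort()` stands in for BUG(), MPOL_INTERLEAVE is omitted, and the MPOL_BIND case is reduced to "lowest node in the mask".

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Values follow the kernel's numbering for these modes. */
enum model_mode {
    MODEL_MPOL_DEFAULT = 0,
    MODEL_MPOL_PREFERRED = 1,
    MODEL_MPOL_BIND = 2
};

struct model_policy {
    enum model_mode mode;
    int preferred_node;   /* MODEL_MPOL_PREFERRED: -1 means local node */
    unsigned long nodes;  /* MODEL_MPOL_BIND: must be non-empty */
};

static int model_slab_node(const struct model_policy *pol, int local_node)
{
    assert(local_node >= 0);

    if (pol == NULL)
        return local_node;  /* no policy installed: allocate locally */

    switch (pol->mode) {
    case MODEL_MPOL_PREFERRED:
        return pol->preferred_node < 0 ? local_node : pol->preferred_node;
    case MODEL_MPOL_BIND: {
        unsigned long mask = pol->nodes;
        int node = 0;

        if (mask == 0)
            abort();        /* an empty bind mask is invalid in this model */
        while (!(mask & 1UL)) {
            mask >>= 1;
            node++;
        }
        return node;
    }
    default:
        /* MPOL_DEFAULT is never stored in a policy, so this is a bug. */
        abort();
    }
}
```

The early return for a NULL pointer is what removes the need to set up a dummy policy value just to reach the switch default.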
    
    With this patch MPOL_DEFAULT is only used in the APIs, including internal
    calls to do_set_mempolicy() and in the display of policy in
    /proc/<pid>/numa_maps.  It always means "fall back" to the next most
    specific policy scope.  This simplifies the description of memory policies
    quite a bit, with no visible change in behavior.
    
    get_mempolicy() continues to return MPOL_DEFAULT and an empty nodemask when
    the requested policy [task or vma/shared] is NULL.  These are the values one
    would supply via set_mempolicy() or mbind() to achieve that condition--default
    behavior.
    
    This patch updates Documentation to reflect this change.
    Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
    Cc: Christoph Lameter <clameter@sgi.com>
    Cc: David Rientjes <rientjes@google.com>
    Cc: Mel Gorman <mel@csn.ul.ie>
    Cc: Andi Kleen <ak@suse.de>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
numa_memory_policy.txt 22.6 KB