    [PATCH] hugepage: Strict page reservation for hugepage inodes · b45b5bd6
    Committed by David Gibson
    These days, hugepages are demand-allocated at first fault time.  There's a
    somewhat dubious (and racy) heuristic when making a new mmap() to check if
    there are enough available hugepages to fully satisfy that mapping.
    
    A particularly obvious case where the heuristic breaks down is when a
    process maps its hugepages not as a single chunk, but as a series of
    individually mmap()ed (or shmat()ed) blocks, without touching and
    instantiating the pages between allocations.  In this case the size of
    each block is compared, on its own, against the total number of available
    hugepages.  It's thus easy for the process to become overcommitted: each
    block mapping succeeds, even though the total number of hugepages required
    by all blocks exceeds the number available.  In particular, this defeats
    programs that detect a mapping failure and adjust their hugepage usage
    downward accordingly.
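
    The failure mode above can be modelled in a few lines.  This is a
    user-space sketch, not kernel code: the function name
    heuristic_map_blocks() and the pool and block sizes are hypothetical.  It
    assumes, as the text describes, that no page is faulted in between
    mappings, so the free count the heuristic compares against never drops.

```python
# Hypothetical model of the old mmap()-time heuristic: each new block is
# compared only against the pages currently free in the pool. Because the
# blocks are never touched, no page is faulted in and the free count never
# drops between calls.

def heuristic_map_blocks(pool_free: int, block_sizes: list[int]) -> list[bool]:
    """Return, for each block, whether the heuristic would let mmap() succeed."""
    results = []
    for size in block_sizes:
        # The check sees only this block, not what earlier blocks will need.
        results.append(size <= pool_free)
    return results


if __name__ == "__main__":
    pool = 100
    blocks = [40, 40, 40, 40]
    ok = heuristic_map_blocks(pool, blocks)
    promised = sum(size for size, accepted in zip(blocks, ok) if accepted)
    # Prints [True, True, True, True] 160: 160 pages promised from a 100-page pool.
    print(ok, promised)
```

    Every block passes individually, so no mmap() fails and the program never
    gets the signal it needs to scale back.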
    
    The patch below addresses this problem by strictly reserving a number of
    physical hugepages for hugepage inodes which have been mapped, but not
    instantiated.  MAP_SHARED mappings are thus "safe": they will fail on
    mmap(), not later with an OOM SIGKILL.  MAP_PRIVATE mappings can still
    trigger an OOM.  (Actually SHARED mappings can technically still OOM, but
    only if the sysadmin explicitly reduces the hugepage pool between mapping
    and instantiation.)
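
    The reservation idea can be sketched as simple accounting.  This is a
    hypothetical user-space model, not the patch itself: the class
    HugepagePool and its methods are illustrative names, and the model tracks
    a single global count rather than per-inode reservations and ignores
    MAP_PRIVATE.  A shared mapping reserves its pages at mmap() time, so
    later mappings see a reduced count before any page is faulted in.

```python
class HugepagePool:
    """Hypothetical model of strict reservation for shared hugepage mappings."""

    def __init__(self, total: int) -> None:
        self.free = total    # pages not yet instantiated
        self.reserved = 0    # pages promised to mapped-but-uninstantiated ranges

    def mmap_shared(self, npages: int) -> bool:
        # Fail at mmap() time if the unreserved free pages cannot cover the request.
        if npages > self.free - self.reserved:
            return False
        self.reserved += npages
        return True

    def fault(self) -> None:
        """Instantiate one page of a previously reserved mapping."""
        if self.reserved == 0 or self.free == 0:
            raise RuntimeError("no reserved page to instantiate")
        # Consuming a reservation and a free page together leaves
        # free - reserved unchanged, so earlier promises stay covered.
        self.reserved -= 1
        self.free -= 1


if __name__ == "__main__":
    pool = HugepagePool(100)
    # Prints [True, True, False, False]: the third block fails at mmap() time.
    print([pool.mmap_shared(40) for _ in range(4)])
```

    With the same four 40-page blocks as before, the third mmap() now fails
    up front, which is exactly the feedback a program that scales its usage
    on mapping failure relies on.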
    
    This patch appears to address the problem at hand: for instance, it
    allows DB2, which previously suffered the failure described above, to
    start correctly.
    
    This patch causes no regressions in the libhugetlbfs testsuite, and makes
    a test designed to catch this problem, which previously failed, pass
    (ppc64, POWER5).

    Signed-off-by: David Gibson <dwg@au1.ibm.com>
    Cc: William Lee Irwin III <wli@holomorphy.com>
    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>