1. 20 Mar 2012, 1 commit
  2. 13 Jan 2012, 1 commit
  3. 11 Jan 2012, 1 commit
    • mm: avoid livelock on !__GFP_FS allocations · f90ac398
      Committed by Mel Gorman
      Colin Cross reported:
      
        Under the following conditions, __alloc_pages_slowpath can loop forever:
        gfp_mask & __GFP_WAIT is true
        gfp_mask & __GFP_FS is false
        reclaim and compaction make no progress
        order <= PAGE_ALLOC_COSTLY_ORDER
      
        These conditions happen very often during suspend and resume,
        when pm_restrict_gfp_mask() effectively converts all GFP_KERNEL
        allocations into __GFP_WAIT.
      
      The oom killer is not run because gfp_mask & __GFP_FS is false,
      but should_alloc_retry will always return true when order is no
      greater than PAGE_ALLOC_COSTLY_ORDER.
      
      In his fix, he avoided retrying the allocation if reclaim made no progress
      and __GFP_FS was not set.  The problem is that this would cause GFP_NOIO
      allocations that previously succeeded to fail, which would be very
      unfortunate.
      
      The big difference between a true GFP_NOIO allocation and suspend
      converting GFP_KERNEL to behave like GFP_NOIO is that, normally, flusher
      threads are cleaning pages and kswapd is reclaiming pages, allowing
      GFP_NOIO to succeed after a short delay.  The same does not necessarily
      apply during suspend, as the storage device may itself be suspended.
      
      This patch special-cases suspend: the page allocation fails if reclaim
      cannot make progress (see the sketch after this entry), and documentation
      is added on how gfp_allowed_mask is currently used.  Failing allocations
      like this may cause suspend to abort, but that is better than a livelock.
      
      [mgorman@suse.de: Rework fix to be suspend specific]
      [rientjes@google.com: Move suspended device check to should_alloc_retry]
      Reported-by: Colin Cross <ccross@android.com>
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Acked-by: David Rientjes <rientjes@google.com>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
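      A minimal sketch of the suspend-specific bail-out described in this
      entry; the helper name pm_suspended_storage() and the exact shape of
      should_alloc_retry() are illustrative assumptions, not quotes from the
      final patch:

      /* mm/page_alloc.c context (sketch) */

      /*
       * Sketch: treat a gfp_allowed_mask restricted by pm_restrict_gfp_mask()
       * during suspend as "the storage needed for reclaim may be suspended".
       */
      static bool pm_suspended_storage(void)
      {
              if ((gfp_allowed_mask & (__GFP_FS | __GFP_IO)) == (__GFP_FS | __GFP_IO))
                      return false;
              return true;
      }

      static inline int should_alloc_retry(gfp_t gfp_mask, unsigned int order,
                                           unsigned long did_some_progress,
                                           unsigned long pages_reclaimed)
      {
              /* ... existing OOM and reclaim checks ... */

              /*
               * During suspend neither flushers nor kswapd can make pages
               * reclaimable for us, so retrying an allocation that made no
               * progress would livelock; fail it and let suspend abort instead.
               */
              if (!did_some_progress && pm_suspended_storage())
                      return 0;

              if (order <= PAGE_ALLOC_COSTLY_ORDER)
                      return 1;

              return 0;
      }

      Whether storage is suspended is inferred purely from gfp_allowed_mask,
      which is exactly the state pm_restrict_gfp_mask() leaves behind, so the
      allocation fails quickly instead of looping forever.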
  4. 01 Nov 2011, 1 commit
    • oom: fix race while temporarily setting current's oom_score_adj · 43362a49
      Committed by David Rientjes
      test_set_oom_score_adj() was introduced in 72788c38 ("oom: replace
      PF_OOM_ORIGIN with toggling oom_score_adj") to temporarily elevate
      current's oom_score_adj for ksm and swapoff without requiring an
      additional per-process flag.
      
      Using that function both to set oom_score_adj to OOM_SCORE_ADJ_MAX and
      then to reinstate the previous value is racy, since userspace can set the
      value to something else before the old value is reinstated.  The result
      is that userspace sets current's oom_score_adj to a different value and
      the kernel immediately sets it back to its previous value without
      notification.
      
      To fix this, a new compare_swap_oom_score_adj() function is introduced
      with the same semantics as a compare-and-swap (CAS) instruction, such as
      CMPXCHG on x86.  It reinstates the previous value of oom_score_adj if and
      only if the present value is the same as the old value (see the sketch
      after this entry).
      Signed-off-by: David Rientjes <rientjes@google.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Ying Han <yinghan@google.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
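      A minimal sketch of the compare-and-swap reinstatement described in this
      entry; serializing on current->sighand->siglock is an assumption about
      how oom_score_adj updates are protected:

      /* mm/oom_kill.c context (sketch) */

      /*
       * Sketch: reinstate the old oom_score_adj only if nothing else
       * (e.g. userspace via /proc/<pid>/oom_score_adj) changed it meanwhile.
       */
      void compare_swap_oom_score_adj(int old_val, int new_val)
      {
              struct sighand_struct *sighand = current->sighand;

              spin_lock_irq(&sighand->siglock);
              if (current->signal->oom_score_adj == old_val)
                      current->signal->oom_score_adj = new_val;
              spin_unlock_irq(&sighand->siglock);
      }

      A caller such as swapoff would then pair it with the existing helper:
      old = test_set_oom_score_adj(OOM_SCORE_ADJ_MAX); ...;
      compare_swap_oom_score_adj(OOM_SCORE_ADJ_MAX, old); so a concurrent
      userspace write wins instead of being silently undone.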
  5. 31 Oct 2011, 1 commit
  6. 04 Aug 2011, 1 commit
    • mm: let swap use exceptional entries · a2c16d6c
      Committed by Hugh Dickins
      If swap entries are to be stored along with struct page pointers in a
      radix tree, they need to be distinguished as exceptional entries.
      
      Most of the handling of swap entries in the radix tree will be contained
      in shmem.c, but a few functions in filemap.c's common code need to check
      for their appearance: find_get_page(), find_lock_page(),
      find_get_pages() and find_get_pages_contig().
      
      So as not to slow their fast paths, those checks are tucked inside the
      existing unlikely() checks on radix_tree_deref_slot(); except in
      find_lock_page(), where it is an added test.  In find_get_pages_tag(),
      which is never applied to tmpfs files, an exceptional entry is made a BUG.
      
      Part of the reason for eliminating shmem_readpage() earlier was to
      minimize the places where common code would need to allow for swap
      entries.
      
      The swp_entry_t known to swapfile.c must be massaged into a slightly
      different form when stored in the radix tree, just as it gets massaged
      into a pte_t when stored in page tables.
      
      In an i386 kernel this limits its information (type and page offset) to
      30 bits: given 32 "types" of swapfile and 4kB pagesize, 5 bits go to the
      type and 25 bits to the page offset, for a maximum swapfile size of
      2^25 × 4kB = 128GB.  That is less than the 512GB we previously allowed
      with X86_PAE (where the swap entry can occupy the entire upper 32 bits
      of a pte_t), but it is not a new limitation on 32-bit without PAE; nor
      is there a new limitation on 64-bit (where swap filesize is already
      limited to 16TB by a 32-bit page offset).  Thirty areas of 128GB is
      probably still enough swap for a 64GB 32-bit machine.
      
      Provide swp_to_radix_entry() and radix_to_swp_entry() conversions, and
      enforce filesize limit in read_swap_header(), just as for ptes.
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Acked-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
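      A minimal sketch of the two conversions named in this entry; the
      RADIX_TREE_EXCEPTIONAL_ENTRY tag bit and RADIX_TREE_EXCEPTIONAL_SHIFT
      used below are assumptions about how the radix tree marks exceptional
      entries:

      /* include/linux/swapops.h context (sketch) */

      /*
       * Sketch: tag a swp_entry_t so a radix-tree lookup can tell it apart
       * from a struct page pointer, and recover the swp_entry_t again.  The
       * bits lost to the shift are what leave the 30 bits of information
       * mentioned above on a 32-bit unsigned long.
       */
      static inline void *swp_to_radix_entry(swp_entry_t entry)
      {
              unsigned long value = entry.val << RADIX_TREE_EXCEPTIONAL_SHIFT;

              return (void *)(value | RADIX_TREE_EXCEPTIONAL_ENTRY);
      }

      static inline swp_entry_t radix_to_swp_entry(void *arg)
      {
              swp_entry_t entry;

              entry.val = (unsigned long)arg >> RADIX_TREE_EXCEPTIONAL_SHIFT;
              return entry;
      }

      Lookups such as find_get_page() would then test the returned slot with
      an exceptional-entry predicate (name assumed) before treating it as a
      struct page.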
  7. 21 Jul 2011, 1 commit
  8. 28 Jun 2011, 1 commit
  9. 25 May 2011, 1 commit
    • oom: replace PF_OOM_ORIGIN with toggling oom_score_adj · 72788c38
      Committed by David Rientjes
      There's a kernel-wide shortage of per-process flags, so it's always
      helpful to trim one when possible without incurring a significant
      penalty.  It's even more important when you're planning on adding a
      per-process flag yourself, which I plan to do shortly for transparent
      hugepages.
      
      PF_OOM_ORIGIN is used by ksm and swapoff to prefer current, since it has
      a tendency to allocate large amounts of memory and should be preferred
      for killing over other tasks.  We'd rather immediately kill the task
      making the errant syscall than penalize an innocent task.
      
      This patch removes PF_OOM_ORIGIN since its behavior is equivalent to
      setting the process's oom_score_adj to OOM_SCORE_ADJ_MAX.
      
      The process's old oom_score_adj is stored, and oom_score_adj is set to
      OOM_SCORE_ADJ_MAX for the period in which PF_OOM_ORIGIN would previously
      have been set.  The old value is then reinstated when the process should
      no longer be considered a high priority for oom killing.
      Signed-off-by: David Rientjes <rientjes@google.com>
      Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Izik Eidus <ieidus@redhat.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
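      A minimal sketch of the toggle described in this entry; as above, the
      siglock serialization and the exact signature are assumptions, only the
      store-old/set-new semantics come from the text:

      /* mm/oom_kill.c context (sketch) */

      /*
       * Sketch: record the old oom_score_adj and install a new one in a
       * single critical section; the caller restores the returned value when
       * it no longer wants to be the preferred OOM victim.
       */
      int test_set_oom_score_adj(int new_val)
      {
              struct sighand_struct *sighand = current->sighand;
              int old_val;

              spin_lock_irq(&sighand->siglock);
              old_val = current->signal->oom_score_adj;
              current->signal->oom_score_adj = new_val;
              spin_unlock_irq(&sighand->siglock);

              return old_val;
      }

      ksm and swapoff then bracket their memory-hungry work with
      old = test_set_oom_score_adj(OOM_SCORE_ADJ_MAX); ...;
      test_set_oom_score_adj(old); in place of setting and clearing
      PF_OOM_ORIGIN.  As the 43362a49 entry above notes, the plain restore is
      racy against userspace writes and was later replaced by
      compare_swap_oom_score_adj().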
  10. 24 Mar 2011, 1 commit
  11. 23 Mar 2011, 27 commits
  12. 10 Mar 2011, 1 commit
  13. 25 Feb 2011, 1 commit
  14. 14 Jan 2011, 1 commit
    • thp: split_huge_page paging · 3f04f62f
      Committed by Andrea Arcangeli
      Add paging logic that splits a hugepage before it is unmapped and added
      to swap, to ensure backwards compatibility with the legacy swap code.
      Eventually swap should natively page out hugepages to increase
      performance and decrease seeking and fragmentation of swap space.
      swapoff can simply skip over huge pmds, as they cannot be part of swap
      yet.  In add_to_swap, be careful to split the page only after a valid
      swap entry has been obtained, so we don't split hugepages when swap is
      full (see the sketch after this entry).
      
      In theory we could split pages before isolating them during the lru
      scan, but for khugepaged to be safe I'm relying on either mmap_sem write
      mode or PG_lock being taken, so split_huge_page has to run with either
      mmap_sem held (read or write mode) or PG_lock taken.  Calling it from
      isolate_lru_page would complicate the locking; in addition,
      split_huge_page would deadlock if called by __isolate_lru_page, because
      it has to take the lru lock to add the tail pages.
      Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
      Acked-by: Mel Gorman <mel@csn.ul.ie>
      Acked-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
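      A minimal sketch of the add_to_swap() ordering described in this entry;
      apart from split_huge_page() and add_to_swap() themselves, the helper
      names (get_swap_page(), swapcache_free(), add_to_swap_cache()) and the
      gfp flags below are assumptions:

      /* mm/swap_state.c context (sketch) */

      /*
       * Sketch: allocate the swap entry first and only then split the
       * hugepage, so a full swap never causes a pointless split.
       */
      int add_to_swap(struct page *page)
      {
              swp_entry_t entry;

              VM_BUG_ON(!PageLocked(page));

              entry = get_swap_page();
              if (!entry.val)
                      return 0;       /* swap full: keep the hugepage intact */

              if (unlikely(PageTransHuge(page)))
                      if (unlikely(split_huge_page(page))) {
                              swapcache_free(entry, NULL);    /* assumed unwind helper */
                              return 0;
                      }

              /* add the now order-0 page to the swap cache as before */
              if (add_to_swap_cache(page, entry,
                                    __GFP_HIGH | __GFP_NOMEMALLOC | __GFP_NOWARN)) {
                      swapcache_free(entry, NULL);
                      return 0;
              }

              SetPageDirty(page);
              return 1;
      }

      Because add_to_swap runs with the page locked, splitting here also
      respects the rule stated above that split_huge_page must be called with
      either mmap_sem or PG_lock held.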