1. 21 5月, 2016 10 次提交
    • M
      mm: use existing helper to convert "on"/"off" to boolean · 2a138dc7
      Minfei Huang 提交于
      It's more convenient to use existing function helper to convert string
      "on/off" to boolean.
      
      Link: http://lkml.kernel.org/r/1461908824-16129-1-git-send-email-mnghuan@gmail.comSigned-off-by: NMinfei Huang <mnghuan@gmail.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      2a138dc7
    • M
      mm, oom: protect !costly allocations some more for !CONFIG_COMPACTION · 31e49bfd
      Michal Hocko 提交于
      Joonsoo has reported that he is able to trigger OOM for !costly high
      order requests (heavy fork() workload close the OOM) with the new oom
      detection rework.  This is because we rely only on should_reclaim_retry
      when the compaction is disabled and it only checks watermarks for the
      requested order and so we might trigger OOM when there is a lot of free
      memory.
      
      It is not very clear what are the usual workloads when the compaction is
      disabled.  Relying on high order allocations heavily without any
      mechanism to create those orders except for unbound amount of reclaim is
      certainly not a good idea.
      
      To prevent from potential regressions let's help this configuration
      some.  We have to sacrifice the determinsm though because there simply
      is none here possible.  should_compact_retry implementation for
      !CONFIG_COMPACTION, which was empty so far, will do watermark check for
      order-0 on all eligible zones.  This will cause retrying until either
      the reclaim cannot make any further progress or all the zones are
      depleted even for order-0 pages.  This means that the number of retries
      is basically unbounded for !costly orders but that was the case before
      the rework as well so this shouldn't regress.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Link: http://lkml.kernel.org/r/1463051677-29418-3-git-send-email-mhocko@kernel.orgReported-by: NJoonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: NMichal Hocko <mhocko@suse.com>
      Acked-by: NHillf Danton <hillf.zj@alibaba-inc.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      31e49bfd
    • M
      mm, oom, compaction: prevent from should_compact_retry looping for ever for costly orders · 86a294a8
      Michal Hocko 提交于
      "mm: consider compaction feedback also for costly allocation" has
      removed the upper bound for the reclaim/compaction retries based on the
      number of reclaimed pages for costly orders.  While this is desirable
      the patch did miss a mis interaction between reclaim, compaction and the
      retry logic.  The direct reclaim tries to get zones over min watermark
      while compaction backs off and returns COMPACT_SKIPPED when all zones
      are below low watermark + 1<<order gap.  If we are getting really close
      to OOM then __compaction_suitable can keep returning COMPACT_SKIPPED a
      high order request (e.g.  hugetlb order-9) while the reclaim is not able
      to release enough pages to get us over low watermark.  The reclaim is
      still able to make some progress (usually trashing over few remaining
      pages) so we are not able to break out from the loop.
      
      I have seen this happening with the same test described in "mm: consider
      compaction feedback also for costly allocation" on a swapless system.
      The original problem got resolved by "vmscan: consider classzone_idx in
      compaction_ready" but it shows how things might go wrong when we
      approach the oom event horizont.
      
      The reason why compaction requires being over low rather than min
      watermark is not clear to me.  This check was there essentially since
      56de7263 ("mm: compaction: direct compact when a high-order
      allocation fails").  It is clearly an implementation detail though and
      we shouldn't pull it into the generic retry logic while we should be
      able to cope with such eventuality.  The only place in
      should_compact_retry where we retry without any upper bound is for
      compaction_withdrawn() case.
      
      Introduce compaction_zonelist_suitable function which checks the given
      zonelist and returns true only if there is at least one zone which would
      would unblock __compaction_suitable if more memory got reclaimed.  In
      this implementation it checks __compaction_suitable with NR_FREE_PAGES
      plus part of the reclaimable memory as the target for the watermark
      check.  The reclaimable memory is reduced linearly by the allocation
      order.  The idea is that we do not want to reclaim all the remaining
      memory for a single allocation request just unblock
      __compaction_suitable which doesn't guarantee we will make a further
      progress.
      
      The new helper is then used if compaction_withdrawn() feedback was
      provided so we do not retry if there is no outlook for a further
      progress.  !costly requests shouldn't be affected much - e.g.  order-2
      pages would require to have at least 64kB on the reclaimable LRUs while
      order-9 would need at least 32M which should be enough to not lock up.
      
      [vbabka@suse.cz: fix classzone_idx vs. high_zoneidx usage in compaction_zonelist_suitable]
      [akpm@linux-foundation.org: fix it for Mel's mm-page_alloc-remove-field-from-alloc_context.patch]
      Signed-off-by: NMichal Hocko <mhocko@suse.com>
      Acked-by: NHillf Danton <hillf.zj@alibaba-inc.com>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Joonsoo Kim <js1304@gmail.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      86a294a8
    • M
      mm: consider compaction feedback also for costly allocation · 7854ea6c
      Michal Hocko 提交于
      PAGE_ALLOC_COSTLY_ORDER retry logic is mostly handled inside
      should_reclaim_retry currently where we decide to not retry after at
      least order worth of pages were reclaimed or the watermark check for at
      least one zone would succeed after reclaiming all pages if the reclaim
      hasn't made any progress.  Compaction feedback is mostly ignored and we
      just try to make sure that the compaction did at least something before
      giving up.
      
      The first condition was added by a41f24ea ("page allocator: smarter
      retry of costly-order allocations) and it assumed that lumpy reclaim
      could have created a page of the sufficient order.  Lumpy reclaim, has
      been removed quite some time ago so the assumption doesn't hold anymore.
      Remove the check for the number of reclaimed pages and rely on the
      compaction feedback solely.  should_reclaim_retry now only makes sure
      that we keep retrying reclaim for high order pages only if they are
      hidden by watermaks so order-0 reclaim makes really sense.
      
      should_compact_retry now keeps retrying even for the costly allocations.
      The number of retries is reduced wrt.  !costly requests because they are
      less important and harder to grant and so their pressure shouldn't cause
      contention for other requests or cause an over reclaim.  We also do not
      reset no_progress_loops for costly request to make sure we do not keep
      reclaiming too agressively.
      
      This has been tested by running a process which fragments memory:
      	- compact memory
      	- mmap large portion of the memory (1920M on 2GRAM machine with 2G
      	  of swapspace)
      	- MADV_DONTNEED single page in PAGE_SIZE*((1UL<<MAX_ORDER)-1)
      	  steps until certain amount of memory is freed (250M in my test)
      	  and reduce the step to (step / 2) + 1 after reaching the end of
      	  the mapping
      	- then run a script which populates the page cache 2G (MemTotal)
      	  from /dev/zero to a new file
      And then tries to allocate
      nr_hugepages=$(awk '/MemAvailable/{printf "%d\n", $2/(2*1024)}' /proc/meminfo)
      huge pages.
      
      root@test1:~# echo 1 > /proc/sys/vm/overcommit_memory;echo 1 > /proc/sys/vm/compact_memory; ./fragment-mem-and-run /root/alloc_hugepages.sh 1920M 250M
      Node 0, zone      DMA     31     28     31     10      2      0      2      1      2      3      1
      Node 0, zone    DMA32    437    319    171     50     28     25     20     16     16     14    437
      
      * This is the /proc/buddyinfo after the compaction
      
      Done fragmenting. size=2013265920 freed=262144000
      Node 0, zone      DMA    165     48      3      1      2      0      2      2      2      2      0
      Node 0, zone    DMA32  35109  14575    185     51     41     12      6      0      0      0      0
      
      * /proc/buddyinfo after memory got fragmented
      
      Executing "/root/alloc_hugepages.sh"
      Eating some pagecache
      508623+0 records in
      508623+0 records out
      2083319808 bytes (2.1 GB) copied, 11.7292 s, 178 MB/s
      Node 0, zone      DMA      3      5      3      1      2      0      2      2      2      2      0
      Node 0, zone    DMA32    111    344    153     20     24     10      3      0      0      0      0
      
      * /proc/buddyinfo after page cache got eaten
      
      Trying to allocate 129
      129
      
      * 129 hugepages requested and all of them granted.
      
      Node 0, zone      DMA      3      5      3      1      2      0      2      2      2      2      0
      Node 0, zone    DMA32    127     97     30     99     11      6      2      1      4      0      0
      
      * /proc/buddyinfo after hugetlb allocation.
      
      10 runs will behave as follows:
      Trying to allocate 130
      130
      --
      Trying to allocate 129
      129
      --
      Trying to allocate 128
      128
      --
      Trying to allocate 129
      129
      --
      Trying to allocate 128
      128
      --
      Trying to allocate 129
      129
      --
      Trying to allocate 132
      132
      --
      Trying to allocate 129
      129
      --
      Trying to allocate 128
      128
      --
      Trying to allocate 129
      129
      
      So basically 100% success for all 10 attempts.
      Without the patch numbers looked much worse:
      Trying to allocate 128
      12
      --
      Trying to allocate 129
      14
      --
      Trying to allocate 129
      7
      --
      Trying to allocate 129
      16
      --
      Trying to allocate 129
      30
      --
      Trying to allocate 129
      38
      --
      Trying to allocate 129
      19
      --
      Trying to allocate 129
      37
      --
      Trying to allocate 129
      28
      --
      Trying to allocate 129
      37
      
      Just for completness the base kernel without oom detection rework looks
      as follows:
      Trying to allocate 127
      30
      --
      Trying to allocate 129
      12
      --
      Trying to allocate 129
      52
      --
      Trying to allocate 128
      32
      --
      Trying to allocate 129
      12
      --
      Trying to allocate 129
      10
      --
      Trying to allocate 129
      32
      --
      Trying to allocate 128
      14
      --
      Trying to allocate 128
      16
      --
      Trying to allocate 129
      8
      
      As we can see the success rate is much more volatile and smaller without
      this patch. So the patch not only makes the retry logic for costly
      requests more sensible the success rate is even higher.
      Signed-off-by: NMichal Hocko <mhocko@suse.com>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Acked-by: NHillf Danton <hillf.zj@alibaba-inc.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Joonsoo Kim <js1304@gmail.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7854ea6c
    • M
      mm, oom: protect !costly allocations some more · 33c2d214
      Michal Hocko 提交于
      should_reclaim_retry will give up retries for higher order allocations
      if none of the eligible zones has any requested or higher order pages
      available even if we pass the watermak check for order-0.  This is done
      because there is no guarantee that the reclaimable and currently free
      pages will form the required order.
      
      This can, however, lead to situations where the high-order request (e.g.
      order-2 required for the stack allocation during fork) will trigger OOM
      too early - e.g.  after the first reclaim/compaction round.  Such a
      system would have to be highly fragmented and there is no guarantee
      further reclaim/compaction attempts would help but at least make sure
      that the compaction was active before we go OOM and keep retrying even
      if should_reclaim_retry tells us to oom if
      
      	- the last compaction round backed off or
      	- we haven't completed at least MAX_COMPACT_RETRIES active
      	  compaction rounds.
      
      The first rule ensures that the very last attempt for compaction was not
      ignored while the second guarantees that the compaction has done some
      work.  Multiple retries might be needed to prevent occasional pigggy
      backing of other contexts to steal the compacted pages before the
      current context manages to retry to allocate them.
      
      compaction_failed() is taken as a final word from the compaction that
      the retry doesn't make much sense.  We have to be careful though because
      the first compaction round is MIGRATE_ASYNC which is rather weak as it
      ignores pages under writeback and gives up too easily in other
      situations.  We therefore have to make sure that MIGRATE_SYNC_LIGHT mode
      has been used before we give up.  With this logic in place we do not
      have to increase the migration mode unconditionally and rather do it
      only if the compaction failed for the weaker mode.  A nice side effect
      is that the stronger migration mode is used only when really needed so
      this has a potential of smaller latencies in some cases.
      
      Please note that the compaction doesn't tell us much about how
      successful it was when returning compaction_made_progress so we just
      have to blindly trust that another retry is worthwhile and cap the
      number to something reasonable to guarantee a convergence.
      
      If the given number of successful retries is not sufficient for a
      reasonable workloads we should focus on the collected compaction
      tracepoints data and try to address the issue in the compaction code.
      If this is not feasible we can increase the retries limit.
      
      [mhocko@suse.com: fix warning]
        Link: http://lkml.kernel.org/r/20160512061636.GA4200@dhcp22.suse.czSigned-off-by: NMichal Hocko <mhocko@suse.com>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Acked-by: NHillf Danton <hillf.zj@alibaba-inc.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Joonsoo Kim <js1304@gmail.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      33c2d214
    • M
      mm: throttle on IO only when there are too many dirty and writeback pages · ede37713
      Michal Hocko 提交于
      wait_iff_congested has been used to throttle allocator before it retried
      another round of direct reclaim to allow the writeback to make some
      progress and prevent reclaim from looping over dirty/writeback pages
      without making any progress.
      
      We used to do congestion_wait before commit 0e093d99 ("writeback: do
      not sleep on the congestion queue if there are no congested BDIs or if
      significant congestion is not being encountered in the current zone")
      but that led to undesirable stalls and sleeping for the full timeout
      even when the BDI wasn't congested.  Hence wait_iff_congested was used
      instead.
      
      But it seems that even wait_iff_congested doesn't work as expected.  We
      might have a small file LRU list with all pages dirty/writeback and yet
      the bdi is not congested so this is just a cond_resched in the end and
      can end up triggering pre mature OOM.
      
      This patch replaces the unconditional wait_iff_congested by
      congestion_wait which is executed only if we _know_ that the last round
      of direct reclaim didn't make any progress and dirty+writeback pages are
      more than a half of the reclaimable pages on the zone which might be
      usable for our target allocation.  This shouldn't reintroduce stalls
      fixed by 0e093d99 because congestion_wait is called only when we are
      getting hopeless when sleeping is a better choice than OOM with many
      pages under IO.
      
      We have to preserve logic introduced by commit 373ccbe5 ("mm,
      vmstat: allow WQ concurrency to discover memory reclaim doesn't make any
      progress") into the __alloc_pages_slowpath now that wait_iff_congested
      is not used anymore.  As the only remaining user of wait_iff_congested
      is shrink_inactive_list we can remove the WQ specific short sleep from
      wait_iff_congested because the sleep is needed to be done only once in
      the allocation retry cycle.
      
      [mhocko@suse.com: high_zoneidx->ac_classzone_idx to evaluate memory reserves properly]
       Link: http://lkml.kernel.org/r/1463051677-29418-2-git-send-email-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com>
      Acked-by: NHillf Danton <hillf.zj@alibaba-inc.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Joonsoo Kim <js1304@gmail.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ede37713
    • M
      mm, oom: rework oom detection · 0a0337e0
      Michal Hocko 提交于
      __alloc_pages_slowpath has traditionally relied on the direct reclaim
      and did_some_progress as an indicator that it makes sense to retry
      allocation rather than declaring OOM.  shrink_zones had to rely on
      zone_reclaimable if shrink_zone didn't make any progress to prevent from
      a premature OOM killer invocation - the LRU might be full of dirty or
      writeback pages and direct reclaim cannot clean those up.
      
      zone_reclaimable allows to rescan the reclaimable lists several times
      and restart if a page is freed.  This is really subtle behavior and it
      might lead to a livelock when a single freed page keeps allocator
      looping but the current task will not be able to allocate that single
      page.  OOM killer would be more appropriate than looping without any
      progress for unbounded amount of time.
      
      This patch changes OOM detection logic and pulls it out from shrink_zone
      which is too low to be appropriate for any high level decisions such as
      OOM which is per zonelist property.  It is __alloc_pages_slowpath which
      knows how many attempts have been done and what was the progress so far
      therefore it is more appropriate to implement this logic.
      
      The new heuristic is implemented in should_reclaim_retry helper called
      from __alloc_pages_slowpath.  It tries to be more deterministic and
      easier to follow.  It builds on an assumption that retrying makes sense
      only if the currently reclaimable memory + free pages would allow the
      current allocation request to succeed (as per __zone_watermark_ok) at
      least for one zone in the usable zonelist.
      
      This alone wouldn't be sufficient, though, because the writeback might
      get stuck and reclaimable pages might be pinned for a really long time
      or even depend on the current allocation context.  Therefore there is a
      backoff mechanism implemented which reduces the reclaim target after
      each reclaim round without any progress.  This means that we should
      eventually converge to only NR_FREE_PAGES as the target and fail on the
      wmark check and proceed to OOM.  The backoff is simple and linear with
      1/16 of the reclaimable pages for each round without any progress.  We
      are optimistic and reset counter for successful reclaim rounds.
      
      Costly high order pages mostly preserve their semantic and those without
      __GFP_REPEAT fail right away while those which have the flag set will
      back off after the amount of reclaimable pages reaches equivalent of the
      requested order.  The only difference is that if there was no progress
      during the reclaim we rely on zone watermark check.  This is more
      logical thing to do than previous 1<<order attempts which were a result
      of zone_reclaimable faking the progress.
      
      [vdavydov@virtuozzo.com: check classzone_idx for shrink_zone]
      [hannes@cmpxchg.org: separate the heuristic into should_reclaim_retry]
      [rientjes@google.com: use zone_page_state_snapshot for NR_FREE_PAGES]
      [rientjes@google.com: shrink_zones doesn't need to return anything]
      Signed-off-by: NMichal Hocko <mhocko@suse.com>
      Acked-by: NHillf Danton <hillf.zj@alibaba-inc.com>
      Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <js1304@gmail.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0a0337e0
    • M
      mm, compaction: simplify __alloc_pages_direct_compact feedback interface · c5d01d0d
      Michal Hocko 提交于
      __alloc_pages_direct_compact communicates potential back off by two
      variables:
      	- deferred_compaction tells that the compaction returned
      	  COMPACT_DEFERRED
      	- contended_compaction is set when there is a contention on
      	  zone->lock resp. zone->lru_lock locks
      
      __alloc_pages_slowpath then backs of for THP allocation requests to
      prevent from long stalls. This is rather messy and it would be much
      cleaner to return a single compact result value and hide all the nasty
      details into __alloc_pages_direct_compact.
      
      This patch shouldn't introduce any functional changes.
      Signed-off-by: NMichal Hocko <mhocko@suse.com>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Acked-by: NHillf Danton <hillf.zj@alibaba-inc.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Joonsoo Kim <js1304@gmail.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c5d01d0d
    • M
      mm, compaction: change COMPACT_ constants into enum · ea7ab982
      Michal Hocko 提交于
      Compaction code is doing weird dances between COMPACT_FOO -> int ->
      unsigned long
      
      But there doesn't seem to be any reason for that.  All functions which
      return/use one of those constants are not expecting any other value so it
      really makes sense to define an enum for them and make it clear that no
      other values are expected.
      
      This is a pure cleanup and shouldn't introduce any functional changes.
      Signed-off-by: NMichal Hocko <mhocko@suse.com>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Acked-by: NHillf Danton <hillf.zj@alibaba-inc.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Joonsoo Kim <js1304@gmail.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ea7ab982
    • R
      mm: vmscan: reduce size of inactive file list · 59dc76b0
      Rik van Riel 提交于
      The inactive file list should still be large enough to contain readahead
      windows and freshly written file data, but it no longer is the only
      source for detecting multiple accesses to file pages.  The workingset
      refault measurement code causes recently evicted file pages that get
      accessed again after a shorter interval to be promoted directly to the
      active list.
      
      With that mechanism in place, we can afford to (on a larger system)
      dedicate more memory to the active file list, so we can actually cache
      more of the frequently used file pages in memory, and not have them
      pushed out by streaming writes, once-used streaming file reads, etc.
      
      This can help things like database workloads, where only half the page
      cache can currently be used to cache the database working set.  This
      patch automatically increases that fraction on larger systems, using the
      same ratio that has already been used for anonymous memory.
      
      [hannes@cmpxchg.org: cgroup-awareness]
      Signed-off-by: NRik van Riel <riel@redhat.com>
      Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org>
      Reported-by: NAndres Freund <andres@anarazel.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      59dc76b0
  2. 20 5月, 2016 30 次提交
    • M
      mm, page_alloc: restore the original nodemask if the fast path allocation failed · 4741526b
      Mel Gorman 提交于
      The page allocator fast path uses either the requested nodemask or
      cpuset_current_mems_allowed if cpusets are enabled.  If the allocation
      context allows watermarks to be ignored then it can also ignore memory
      policies.  However, on entering the allocator slowpath the nodemask may
      still be cpuset_current_mems_allowed and the policies are enforced.
      This patch resets the nodemask appropriately before entering the
      slowpath.
      
      Link: http://lkml.kernel.org/r/20160504143628.GU2858@techsingularity.netSigned-off-by: NVlastimil Babka <vbabka@suse.cz>
      Signed-off-by: NMel Gorman <mgorman@techsingularity.net>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      4741526b
    • V
      mm, page_alloc: uninline the bad page part of check_new_page() · 4e611801
      Vlastimil Babka 提交于
      Bad pages should be rare so the code handling them doesn't need to be
      inline for performance reasons.  Put it to separate function which
      returns void.  This also assumes that the initial page_expected_state()
      result will match the result of the thorough check, i.e.  the page
      doesn't become "good" in the meanwhile.  This matches the same
      expectations already in place in free_pages_check().
      
      !DEBUG_VM bloat-o-meter:
      
        add/remove: 1/0 grow/shrink: 0/1 up/down: 134/-274 (-140)
        function                                     old     new   delta
        check_new_page_bad                             -     134    +134
        get_page_from_freelist                      3468    3194    -274
      Signed-off-by: NVlastimil Babka <vbabka@suse.cz>
      Acked-by: NMel Gorman <mgorman@techsingularity.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      4e611801
    • M
      mm, page_alloc: don't duplicate code in free_pcp_prepare · e2769dbd
      Mel Gorman 提交于
      The new free_pcp_prepare() function shares a lot of code with
      free_pages_prepare(), which makes this a maintenance risk when some
      future patch modifies only one of them.  We should be able to achieve
      the same effect (skipping free_pages_check() from !DEBUG_VM configs) by
      adding a parameter to free_pages_prepare() and making it inline, so the
      checks (and the order != 0 parts) are eliminated from the call from
      free_pcp_prepare().
      
      !DEBUG_VM: bloat-o-meter reports no difference, as my gcc was already
      inlining free_pages_prepare() and the elimination seems to work as
      expected
      
      DEBUG_VM bloat-o-meter:
      
        add/remove: 0/1 grow/shrink: 2/0 up/down: 1035/-778 (257)
        function                                     old     new   delta
        __free_pages_ok                              297    1060    +763
        free_hot_cold_page                           480     752    +272
        free_pages_prepare                           778       -    -778
      
      Here inlining didn't occur before, and added some code, but it's ok for
      a debug option.
      
      [akpm@linux-foundation.org: fix build]
      Signed-off-by: NVlastimil Babka <vbabka@suse.cz>
      Signed-off-by: NMel Gorman <mgorman@techsingularity.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e2769dbd
    • M
      mm, page_alloc: defer debugging checks of pages allocated from the PCP · 479f854a
      Mel Gorman 提交于
      Every page allocated checks a number of page fields for validity.  This
      catches corruption bugs of pages that are already freed but it is
      expensive.  This patch weakens the debugging check by checking PCP pages
      only when the PCP lists are being refilled.  All compound pages are
      checked.  This potentially avoids debugging checks entirely if the PCP
      lists are never emptied and refilled so some corruption issues may be
      missed.  Full checking requires DEBUG_VM.
      
      With the two deferred debugging patches applied, the impact to a page
      allocator microbenchmark is
      
                                                   4.6.0-rc3                  4.6.0-rc3
                                                 inline-v3r6            deferalloc-v3r7
        Min      alloc-odr0-1               344.00 (  0.00%)           317.00 (  7.85%)
        Min      alloc-odr0-2               248.00 (  0.00%)           231.00 (  6.85%)
        Min      alloc-odr0-4               209.00 (  0.00%)           192.00 (  8.13%)
        Min      alloc-odr0-8               181.00 (  0.00%)           166.00 (  8.29%)
        Min      alloc-odr0-16              168.00 (  0.00%)           154.00 (  8.33%)
        Min      alloc-odr0-32              161.00 (  0.00%)           148.00 (  8.07%)
        Min      alloc-odr0-64              158.00 (  0.00%)           145.00 (  8.23%)
        Min      alloc-odr0-128             156.00 (  0.00%)           143.00 (  8.33%)
        Min      alloc-odr0-256             168.00 (  0.00%)           154.00 (  8.33%)
        Min      alloc-odr0-512             178.00 (  0.00%)           167.00 (  6.18%)
        Min      alloc-odr0-1024            186.00 (  0.00%)           174.00 (  6.45%)
        Min      alloc-odr0-2048            192.00 (  0.00%)           180.00 (  6.25%)
        Min      alloc-odr0-4096            198.00 (  0.00%)           184.00 (  7.07%)
        Min      alloc-odr0-8192            200.00 (  0.00%)           188.00 (  6.00%)
        Min      alloc-odr0-16384           201.00 (  0.00%)           188.00 (  6.47%)
        Min      free-odr0-1                189.00 (  0.00%)           180.00 (  4.76%)
        Min      free-odr0-2                132.00 (  0.00%)           126.00 (  4.55%)
        Min      free-odr0-4                104.00 (  0.00%)            99.00 (  4.81%)
        Min      free-odr0-8                 90.00 (  0.00%)            85.00 (  5.56%)
        Min      free-odr0-16                84.00 (  0.00%)            80.00 (  4.76%)
        Min      free-odr0-32                80.00 (  0.00%)            76.00 (  5.00%)
        Min      free-odr0-64                78.00 (  0.00%)            74.00 (  5.13%)
        Min      free-odr0-128               77.00 (  0.00%)            73.00 (  5.19%)
        Min      free-odr0-256               94.00 (  0.00%)            91.00 (  3.19%)
        Min      free-odr0-512              108.00 (  0.00%)           112.00 ( -3.70%)
        Min      free-odr0-1024             115.00 (  0.00%)           118.00 ( -2.61%)
        Min      free-odr0-2048             120.00 (  0.00%)           125.00 ( -4.17%)
        Min      free-odr0-4096             123.00 (  0.00%)           129.00 ( -4.88%)
        Min      free-odr0-8192             126.00 (  0.00%)           130.00 ( -3.17%)
        Min      free-odr0-16384            126.00 (  0.00%)           131.00 ( -3.97%)
      
      Note that the free paths for large numbers of pages is impacted as the
      debugging cost gets shifted into that path when the page data is no
      longer necessarily cache-hot.
      Signed-off-by: NMel Gorman <mgorman@techsingularity.net>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      479f854a
    • M
      mm, page_alloc: defer debugging checks of freed pages until a PCP drain · 4db7548c
      Mel Gorman 提交于
      Every page free checks a number of page fields for validity.  This
      catches premature frees and corruptions but it is also expensive.  This
      patch weakens the debugging check by checking PCP pages at the time they
      are drained from the PCP list.  This will trigger the bug but the site
      that freed the corrupt page will be lost.  To get the full context, a
      kernel rebuild with DEBUG_VM is necessary.
      
      [akpm@linux-foundation.org: fix build]
      Signed-off-by: NMel Gorman <mgorman@techsingularity.net>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      4db7548c
    • V
      cpuset: use static key better and convert to new API · 002f2906
      Vlastimil Babka 提交于
      An important function for cpusets is cpuset_node_allowed(), which
      optimizes on the fact if there's a single root CPU set, it must be
      trivially allowed.  But the check "nr_cpusets() <= 1" doesn't use the
      cpusets_enabled_key static key the right way where static keys eliminate
      branching overhead with jump labels.
      
      This patch converts it so that static key is used properly.  It's also
      switched to the new static key API and the checking functions are
      converted to return bool instead of int.  We also provide a new variant
      __cpuset_zone_allowed() which expects that the static key check was
      already done and they key was enabled.  This is needed for
      get_page_from_freelist() where we want to also avoid the relatively
      slower check when ALLOC_CPUSET is not set in alloc_flags.
      
      The impact on the page allocator microbenchmark is less than expected
      but the cleanup in itself is worthwhile.
      
                                                   4.6.0-rc2                  4.6.0-rc2
                                             multcheck-v1r20               cpuset-v1r20
        Min      alloc-odr0-1               348.00 (  0.00%)           348.00 (  0.00%)
        Min      alloc-odr0-2               254.00 (  0.00%)           254.00 (  0.00%)
        Min      alloc-odr0-4               213.00 (  0.00%)           213.00 (  0.00%)
        Min      alloc-odr0-8               186.00 (  0.00%)           183.00 (  1.61%)
        Min      alloc-odr0-16              173.00 (  0.00%)           171.00 (  1.16%)
        Min      alloc-odr0-32              166.00 (  0.00%)           163.00 (  1.81%)
        Min      alloc-odr0-64              162.00 (  0.00%)           159.00 (  1.85%)
        Min      alloc-odr0-128             160.00 (  0.00%)           157.00 (  1.88%)
        Min      alloc-odr0-256             169.00 (  0.00%)           166.00 (  1.78%)
        Min      alloc-odr0-512             180.00 (  0.00%)           180.00 (  0.00%)
        Min      alloc-odr0-1024            188.00 (  0.00%)           187.00 (  0.53%)
        Min      alloc-odr0-2048            194.00 (  0.00%)           193.00 (  0.52%)
        Min      alloc-odr0-4096            199.00 (  0.00%)           198.00 (  0.50%)
        Min      alloc-odr0-8192            202.00 (  0.00%)           201.00 (  0.50%)
        Min      alloc-odr0-16384           203.00 (  0.00%)           202.00 (  0.49%)
      Signed-off-by: NVlastimil Babka <vbabka@suse.cz>
      Signed-off-by: NMel Gorman <mgorman@techsingularity.net>
      Acked-by: NZefan Li <lizefan@huawei.com>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      002f2906
    • M
      mm, page_alloc: inline pageblock lookup in page free fast paths · 0b423ca2
      Mel Gorman 提交于
      The function call overhead of get_pfnblock_flags_mask() is measurable in
      the page free paths.  This patch uses an inlined version that is faster.
      Signed-off-by: NMel Gorman <mgorman@techsingularity.net>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0b423ca2
    • M
      mm, page_alloc: remove unnecessary variable from free_pcppages_bulk · e5b31ac2
      Mel Gorman 提交于
      The original count is never reused so it can be removed.
      Signed-off-by: NMel Gorman <mgorman@techsingularity.net>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e5b31ac2
    • M
      mm, page_alloc: pull out side effects from free_pages_check · da838d4f
      Mel Gorman 提交于
      Check without side-effects should be easier to maintain.  It also
      removes the duplicated cpupid and flags reset done in !DEBUG_VM variant
      of both free_pcp_prepare() and then bulkfree_pcp_prepare().  Finally, it
      enables the next patch.
      
      It shouldn't result in new branches, thanks to inlining of the check.
      
      !DEBUG_VM bloat-o-meter:
      
        add/remove: 0/0 grow/shrink: 0/2 up/down: 0/-27 (-27)
        function                                     old     new   delta
        __free_pages_ok                              748     739      -9
        free_pcppages_bulk                          1403    1385     -18
      
      DEBUG_VM:
      
        add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-28 (-28)
        function                                     old     new   delta
        free_pages_prepare                           806     778     -28
      
      This is also slightly faster because cpupid information is not set on
      tail pages so we can avoid resets there.
      Signed-off-by: NVlastimil Babka <vbabka@suse.cz>
      Signed-off-by: NMel Gorman <mgorman@techsingularity.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      da838d4f
    • M
      mm, page_alloc: un-inline the bad part of free_pages_check · bb552ac6
      Mel Gorman 提交于
      From: Vlastimil Babka <vbabka@suse.cz>
      
      !DEBUG_VM size and bloat-o-meter:
      
        add/remove: 1/0 grow/shrink: 0/2 up/down: 124/-370 (-246)
        function                                     old     new   delta
        free_pages_check_bad                           -     124    +124
        free_pcppages_bulk                          1288    1171    -117
        __free_pages_ok                              948     695    -253
      
      DEBUG_VM:
      
        add/remove: 1/0 grow/shrink: 0/1 up/down: 124/-214 (-90)
        function                                     old     new   delta
        free_pages_check_bad                           -     124    +124
        free_pages_prepare                          1112     898    -214
      
      [akpm@linux-foundation.org: fix whitespace]
      Signed-off-by: NVlastimil Babka <vbabka@suse.cz>
      Signed-off-by: NMel Gorman <mgorman@techsingularity.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      bb552ac6
    • M
      mm, page_alloc: check multiple page fields with a single branch · 7bfec6f4
      Mel Gorman 提交于
      Every page allocated or freed is checked for sanity to avoid corruptions
      that are difficult to detect later.  A bad page could be due to a number
      of fields.  Instead of using multiple branches, this patch combines
      multiple fields into a single branch.  A detailed check is only
      necessary if that check fails.
      
                                                   4.6.0-rc2                  4.6.0-rc2
                                              initonce-v1r20            multcheck-v1r20
        Min      alloc-odr0-1               359.00 (  0.00%)           348.00 (  3.06%)
        Min      alloc-odr0-2               260.00 (  0.00%)           254.00 (  2.31%)
        Min      alloc-odr0-4               214.00 (  0.00%)           213.00 (  0.47%)
        Min      alloc-odr0-8               186.00 (  0.00%)           186.00 (  0.00%)
        Min      alloc-odr0-16              173.00 (  0.00%)           173.00 (  0.00%)
        Min      alloc-odr0-32              165.00 (  0.00%)           166.00 ( -0.61%)
        Min      alloc-odr0-64              162.00 (  0.00%)           162.00 (  0.00%)
        Min      alloc-odr0-128             161.00 (  0.00%)           160.00 (  0.62%)
        Min      alloc-odr0-256             170.00 (  0.00%)           169.00 (  0.59%)
        Min      alloc-odr0-512             181.00 (  0.00%)           180.00 (  0.55%)
        Min      alloc-odr0-1024            190.00 (  0.00%)           188.00 (  1.05%)
        Min      alloc-odr0-2048            196.00 (  0.00%)           194.00 (  1.02%)
        Min      alloc-odr0-4096            202.00 (  0.00%)           199.00 (  1.49%)
        Min      alloc-odr0-8192            205.00 (  0.00%)           202.00 (  1.46%)
        Min      alloc-odr0-16384           205.00 (  0.00%)           203.00 (  0.98%)
      
      Again, the benefit is marginal but avoiding excessive branches is
      important.  Ideally the paths would not have to check these conditions
      at all but regrettably abandoning the tests would make use-after-free
      bugs much harder to detect.
      Signed-off-by: NMel Gorman <mgorman@techsingularity.net>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7bfec6f4
    • M
      mm, page_alloc: remove field from alloc_context · 93ea9964
      Mel Gorman 提交于
      The classzone_idx can be inferred from preferred_zoneref so remove the
      unnecessary field and save stack space.
      Signed-off-by: NMel Gorman <mgorman@techsingularity.net>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      93ea9964
    • M
      mm, page_alloc: avoid looking up the first zone in a zonelist twice · c33d6c06
      Mel Gorman 提交于
      The allocator fast path looks up the first usable zone in a zonelist and
      then get_page_from_freelist does the same job in the zonelist iterator.
      This patch preserves the necessary information.
      
                                                   4.6.0-rc2                  4.6.0-rc2
                                              fastmark-v1r20             initonce-v1r20
        Min      alloc-odr0-1               364.00 (  0.00%)           359.00 (  1.37%)
        Min      alloc-odr0-2               262.00 (  0.00%)           260.00 (  0.76%)
        Min      alloc-odr0-4               214.00 (  0.00%)           214.00 (  0.00%)
        Min      alloc-odr0-8               186.00 (  0.00%)           186.00 (  0.00%)
        Min      alloc-odr0-16              173.00 (  0.00%)           173.00 (  0.00%)
        Min      alloc-odr0-32              165.00 (  0.00%)           165.00 (  0.00%)
        Min      alloc-odr0-64              161.00 (  0.00%)           162.00 ( -0.62%)
        Min      alloc-odr0-128             159.00 (  0.00%)           161.00 ( -1.26%)
        Min      alloc-odr0-256             168.00 (  0.00%)           170.00 ( -1.19%)
        Min      alloc-odr0-512             180.00 (  0.00%)           181.00 ( -0.56%)
        Min      alloc-odr0-1024            190.00 (  0.00%)           190.00 (  0.00%)
        Min      alloc-odr0-2048            196.00 (  0.00%)           196.00 (  0.00%)
        Min      alloc-odr0-4096            202.00 (  0.00%)           202.00 (  0.00%)
        Min      alloc-odr0-8192            206.00 (  0.00%)           205.00 (  0.49%)
        Min      alloc-odr0-16384           206.00 (  0.00%)           205.00 (  0.49%)
      
      The benefit is negligible and the results are within the noise but each
      cycle counts.
      Signed-off-by: NMel Gorman <mgorman@techsingularity.net>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c33d6c06
    • M
      mm, page_alloc: shortcut watermark checks for order-0 pages · 48ee5f36
      Mel Gorman 提交于
      Watermarks have to be checked on every allocation including the number
      of pages being allocated and whether reserves can be accessed.  The
      reserves only matter if memory is limited and the free_pages adjustment
      only applies to high-order pages.  This patch adds a shortcut for
      order-0 pages that avoids numerous calculations if there is plenty of
      free memory yielding the following performance difference in a page
      allocator microbenchmark;
      
                                                   4.6.0-rc2                  4.6.0-rc2
                                               optfair-v1r20             fastmark-v1r20
        Min      alloc-odr0-1               380.00 (  0.00%)           364.00 (  4.21%)
        Min      alloc-odr0-2               273.00 (  0.00%)           262.00 (  4.03%)
        Min      alloc-odr0-4               227.00 (  0.00%)           214.00 (  5.73%)
        Min      alloc-odr0-8               196.00 (  0.00%)           186.00 (  5.10%)
        Min      alloc-odr0-16              183.00 (  0.00%)           173.00 (  5.46%)
        Min      alloc-odr0-32              173.00 (  0.00%)           165.00 (  4.62%)
        Min      alloc-odr0-64              169.00 (  0.00%)           161.00 (  4.73%)
        Min      alloc-odr0-128             169.00 (  0.00%)           159.00 (  5.92%)
        Min      alloc-odr0-256             180.00 (  0.00%)           168.00 (  6.67%)
        Min      alloc-odr0-512             190.00 (  0.00%)           180.00 (  5.26%)
        Min      alloc-odr0-1024            198.00 (  0.00%)           190.00 (  4.04%)
        Min      alloc-odr0-2048            204.00 (  0.00%)           196.00 (  3.92%)
        Min      alloc-odr0-4096            209.00 (  0.00%)           202.00 (  3.35%)
        Min      alloc-odr0-8192            213.00 (  0.00%)           206.00 (  3.29%)
        Min      alloc-odr0-16384           214.00 (  0.00%)           206.00 (  3.74%)
      Signed-off-by: NMel Gorman <mgorman@techsingularity.net>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      48ee5f36
    • M
      mm, page_alloc: reduce cost of fair zone allocation policy retry · 30534755
      Mel Gorman 提交于
      The fair zone allocation policy is not without cost but it can be
      reduced slightly.  This patch removes an unnecessary local variable,
      checks the likely conditions of the fair zone policy first, uses a bool
      instead of a flags check and falls through when a remote node is
      encountered instead of doing a full restart.  The benefit is marginal
      but it's there
      
                                                   4.6.0-rc2                  4.6.0-rc2
                                               decstat-v1r20              optfair-v1r20
        Min      alloc-odr0-1               377.00 (  0.00%)           380.00 ( -0.80%)
        Min      alloc-odr0-2               273.00 (  0.00%)           273.00 (  0.00%)
        Min      alloc-odr0-4               226.00 (  0.00%)           227.00 ( -0.44%)
        Min      alloc-odr0-8               196.00 (  0.00%)           196.00 (  0.00%)
        Min      alloc-odr0-16              183.00 (  0.00%)           183.00 (  0.00%)
        Min      alloc-odr0-32              175.00 (  0.00%)           173.00 (  1.14%)
        Min      alloc-odr0-64              172.00 (  0.00%)           169.00 (  1.74%)
        Min      alloc-odr0-128             170.00 (  0.00%)           169.00 (  0.59%)
        Min      alloc-odr0-256             183.00 (  0.00%)           180.00 (  1.64%)
        Min      alloc-odr0-512             191.00 (  0.00%)           190.00 (  0.52%)
        Min      alloc-odr0-1024            199.00 (  0.00%)           198.00 (  0.50%)
        Min      alloc-odr0-2048            204.00 (  0.00%)           204.00 (  0.00%)
        Min      alloc-odr0-4096            210.00 (  0.00%)           209.00 (  0.48%)
        Min      alloc-odr0-8192            213.00 (  0.00%)           213.00 (  0.00%)
        Min      alloc-odr0-16384           214.00 (  0.00%)           214.00 (  0.00%)
      
      The benefit is marginal at best but one of the most important benefits,
      avoiding a second search when falling back to another node is not
      triggered by this particular test so the benefit for some corner cases
      is understated.
      Signed-off-by: NMel Gorman <mgorman@techsingularity.net>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      30534755
    • M
      mm, page_alloc: shorten the page allocator fast path · 4fcb0971
      Mel Gorman 提交于
      The page allocator fast path checks page multiple times unnecessarily.
      This patch avoids all the slowpath checks if the first allocation
      attempt succeeds.
      Signed-off-by: NMel Gorman <mgorman@techsingularity.net>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      4fcb0971
    • M
      mm, page_alloc: check once if a zone has isolated pageblocks · 3777999d
      Mel Gorman 提交于
      When bulk freeing pages from the per-cpu lists the zone is checked for
      isolated pageblocks on every release.  This patch checks it once per
      drain.
      
      [mgorman@techsingularity.net: fix locking radce, per Vlastimil]
      Signed-off-by: NMel Gorman <mgorman@techsingularity.net>
      Signed-off-by: NVlastimil Babka <vbabka@suse.cz>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3777999d
    • M
      mm, page_alloc: move __GFP_HARDWALL modifications out of the fastpath · 83d4ca81
      Mel Gorman 提交于
      __GFP_HARDWALL only has meaning in the context of cpusets but the fast
      path always applies the flag on the first attempt.  Move the
      manipulations into the cpuset paths where they will be masked by a
      static branch in the common case.
      
      With the other micro-optimisations in this series combined, the impact
      on a page allocator microbenchmark is
      
                                                   4.6.0-rc2                  4.6.0-rc2
                                               decstat-v1r20                micro-v1r20
        Min      alloc-odr0-1               381.00 (  0.00%)           377.00 (  1.05%)
        Min      alloc-odr0-2               275.00 (  0.00%)           273.00 (  0.73%)
        Min      alloc-odr0-4               229.00 (  0.00%)           226.00 (  1.31%)
        Min      alloc-odr0-8               199.00 (  0.00%)           196.00 (  1.51%)
        Min      alloc-odr0-16              186.00 (  0.00%)           183.00 (  1.61%)
        Min      alloc-odr0-32              179.00 (  0.00%)           175.00 (  2.23%)
        Min      alloc-odr0-64              174.00 (  0.00%)           172.00 (  1.15%)
        Min      alloc-odr0-128             172.00 (  0.00%)           170.00 (  1.16%)
        Min      alloc-odr0-256             181.00 (  0.00%)           183.00 ( -1.10%)
        Min      alloc-odr0-512             193.00 (  0.00%)           191.00 (  1.04%)
        Min      alloc-odr0-1024            201.00 (  0.00%)           199.00 (  1.00%)
        Min      alloc-odr0-2048            206.00 (  0.00%)           204.00 (  0.97%)
        Min      alloc-odr0-4096            212.00 (  0.00%)           210.00 (  0.94%)
        Min      alloc-odr0-8192            215.00 (  0.00%)           213.00 (  0.93%)
        Min      alloc-odr0-16384           216.00 (  0.00%)           214.00 (  0.93%)
      Signed-off-by: NMel Gorman <mgorman@techsingularity.net>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      83d4ca81
    • M
      mm, page_alloc: remove unnecessary initialisation from __alloc_pages_nodemask() · 5bb1b169
      Mel Gorman 提交于
      page is guaranteed to be set before it is read with or without the
      initialisation.
      
      [akpm@linux-foundation.org: fix warning]
      Signed-off-by: NMel Gorman <mgorman@techsingularity.net>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5bb1b169
    • M
      mm, page_alloc: remove unnecessary initialisation in get_page_from_freelist · be06af00
      Mel Gorman 提交于
      Signed-off-by: NMel Gorman <mgorman@techsingularity.net>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      be06af00
    • M
      mm, page_alloc: remove unnecessary local variable in get_page_from_freelist · 4dfa6cd8
      Mel Gorman 提交于
      zonelist here is a copy of a struct field that is used once.  Ditch it.
      Signed-off-by: NMel Gorman <mgorman@techsingularity.net>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      4dfa6cd8
    • M
      mm, page_alloc: convert nr_fair_skipped to bool · fa379b95
      Mel Gorman 提交于
      The number of zones skipped to a zone expiring its fair zone allocation
      quota is irrelevant.  Convert to bool.
      Signed-off-by: NMel Gorman <mgorman@techsingularity.net>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      fa379b95
    • M
      mm, page_alloc: convert alloc_flags to unsigned · c603844b
      Mel Gorman 提交于
      alloc_flags is a bitmask of flags but it is signed which does not
      necessarily generate the best code depending on the compiler.  Even
      without an impact, it makes more sense that this be unsigned.
      Signed-off-by: NMel Gorman <mgorman@techsingularity.net>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c603844b
    • M
      mm, page_alloc: avoid unnecessary zone lookups during pageblock operations · f75fb889
      Mel Gorman 提交于
      Pageblocks have an associated bitmap to store migrate types and whether
      the pageblock should be skipped during compaction.  The bitmap may be
      associated with a memory section or a zone but the zone is looked up
      unconditionally.  The compiler should optimise this away automatically
      so this is a cosmetic patch only in many cases.
      Signed-off-by: NMel Gorman <mgorman@techsingularity.net>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f75fb889
    • M
      mm, page_alloc: use __dec_zone_state for order-0 page allocation · 754078eb
      Mel Gorman 提交于
      __dec_zone_state is cheaper to use for removing an order-0 page as it
      has fewer conditions to check.
      
      The performance difference on a page allocator microbenchmark is;
      
                                                   4.6.0-rc2                  4.6.0-rc2
                                               optiter-v1r20              decstat-v1r20
        Min      alloc-odr0-1               382.00 (  0.00%)           381.00 (  0.26%)
        Min      alloc-odr0-2               282.00 (  0.00%)           275.00 (  2.48%)
        Min      alloc-odr0-4               233.00 (  0.00%)           229.00 (  1.72%)
        Min      alloc-odr0-8               203.00 (  0.00%)           199.00 (  1.97%)
        Min      alloc-odr0-16              188.00 (  0.00%)           186.00 (  1.06%)
        Min      alloc-odr0-32              182.00 (  0.00%)           179.00 (  1.65%)
        Min      alloc-odr0-64              177.00 (  0.00%)           174.00 (  1.69%)
        Min      alloc-odr0-128             175.00 (  0.00%)           172.00 (  1.71%)
        Min      alloc-odr0-256             184.00 (  0.00%)           181.00 (  1.63%)
        Min      alloc-odr0-512             197.00 (  0.00%)           193.00 (  2.03%)
        Min      alloc-odr0-1024            203.00 (  0.00%)           201.00 (  0.99%)
        Min      alloc-odr0-2048            209.00 (  0.00%)           206.00 (  1.44%)
        Min      alloc-odr0-4096            214.00 (  0.00%)           212.00 (  0.93%)
        Min      alloc-odr0-8192            218.00 (  0.00%)           215.00 (  1.38%)
        Min      alloc-odr0-16384           219.00 (  0.00%)           216.00 (  1.37%)
      Signed-off-by: NMel Gorman <mgorman@techsingularity.net>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      754078eb
    • M
      mm, page_alloc: inline the fast path of the zonelist iterator · 682a3385
      Mel Gorman 提交于
      The page allocator iterates through a zonelist for zones that match the
      addressing limitations and nodemask of the caller but many allocations
      will not be restricted.  Despite this, there is always functional call
      overhead which builds up.
      
      This patch inlines the optimistic basic case and only calls the iterator
      function for the complex case.  A hindrance was the fact that
      cpuset_current_mems_allowed is used in the fastpath as the allowed
      nodemask even though all nodes are allowed on most systems.  The patch
      handles this by only considering cpuset_current_mems_allowed if a cpuset
      exists.  As well as being faster in the fast-path, this removes some
      junk in the slowpath.
      
      The performance difference on a page allocator microbenchmark is;
      
                                                   4.6.0-rc2                  4.6.0-rc2
                                            statinline-v1r20              optiter-v1r20
        Min      alloc-odr0-1               412.00 (  0.00%)           382.00 (  7.28%)
        Min      alloc-odr0-2               301.00 (  0.00%)           282.00 (  6.31%)
        Min      alloc-odr0-4               247.00 (  0.00%)           233.00 (  5.67%)
        Min      alloc-odr0-8               215.00 (  0.00%)           203.00 (  5.58%)
        Min      alloc-odr0-16              199.00 (  0.00%)           188.00 (  5.53%)
        Min      alloc-odr0-32              191.00 (  0.00%)           182.00 (  4.71%)
        Min      alloc-odr0-64              187.00 (  0.00%)           177.00 (  5.35%)
        Min      alloc-odr0-128             185.00 (  0.00%)           175.00 (  5.41%)
        Min      alloc-odr0-256             193.00 (  0.00%)           184.00 (  4.66%)
        Min      alloc-odr0-512             207.00 (  0.00%)           197.00 (  4.83%)
        Min      alloc-odr0-1024            213.00 (  0.00%)           203.00 (  4.69%)
        Min      alloc-odr0-2048            220.00 (  0.00%)           209.00 (  5.00%)
        Min      alloc-odr0-4096            226.00 (  0.00%)           214.00 (  5.31%)
        Min      alloc-odr0-8192            229.00 (  0.00%)           218.00 (  4.80%)
        Min      alloc-odr0-16384           229.00 (  0.00%)           219.00 (  4.37%)
      
      perf indicated that next_zones_zonelist disappeared in the profile and
      __next_zones_zonelist did not appear.  This is expected as the
      micro-benchmark would hit the inlined fast-path every time.
      Signed-off-by: NMel Gorman <mgorman@techsingularity.net>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      682a3385
    • M
      mm, page_alloc: inline zone_statistics · 060e7417
      Mel Gorman 提交于
      zone_statistics has one call-site but it's a public function.  Make it
      static and inline.
      
      The performance difference on a page allocator microbenchmark is;
      
                                                   4.6.0-rc2                  4.6.0-rc2
                                            statbranch-v1r20           statinline-v1r20
        Min      alloc-odr0-1               419.00 (  0.00%)           412.00 (  1.67%)
        Min      alloc-odr0-2               305.00 (  0.00%)           301.00 (  1.31%)
        Min      alloc-odr0-4               250.00 (  0.00%)           247.00 (  1.20%)
        Min      alloc-odr0-8               219.00 (  0.00%)           215.00 (  1.83%)
        Min      alloc-odr0-16              203.00 (  0.00%)           199.00 (  1.97%)
        Min      alloc-odr0-32              195.00 (  0.00%)           191.00 (  2.05%)
        Min      alloc-odr0-64              191.00 (  0.00%)           187.00 (  2.09%)
        Min      alloc-odr0-128             189.00 (  0.00%)           185.00 (  2.12%)
        Min      alloc-odr0-256             198.00 (  0.00%)           193.00 (  2.53%)
        Min      alloc-odr0-512             210.00 (  0.00%)           207.00 (  1.43%)
        Min      alloc-odr0-1024            216.00 (  0.00%)           213.00 (  1.39%)
        Min      alloc-odr0-2048            221.00 (  0.00%)           220.00 (  0.45%)
        Min      alloc-odr0-4096            227.00 (  0.00%)           226.00 (  0.44%)
        Min      alloc-odr0-8192            232.00 (  0.00%)           229.00 (  1.29%)
        Min      alloc-odr0-16384           232.00 (  0.00%)           229.00 (  1.29%)
      Signed-off-by: NMel Gorman <mgorman@techsingularity.net>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      060e7417
    • M
      mm, page_alloc: use new PageAnonHead helper in the free page fast path · 17514574
      Mel Gorman 提交于
      The PageAnon check always checks for compound_head but this is a
      relatively expensive check if the caller already knows the page is a
      head page.  This patch creates a helper and uses it in the page free
      path which only operates on head pages.
      
      With this patch and "Only check PageCompound for high-order pages", the
      performance difference on a page allocator microbenchmark is;
      
                                                   4.6.0-rc2                  4.6.0-rc2
                                                     vanilla           nocompound-v1r20
        Min      alloc-odr0-1               425.00 (  0.00%)           417.00 (  1.88%)
        Min      alloc-odr0-2               313.00 (  0.00%)           308.00 (  1.60%)
        Min      alloc-odr0-4               257.00 (  0.00%)           253.00 (  1.56%)
        Min      alloc-odr0-8               224.00 (  0.00%)           221.00 (  1.34%)
        Min      alloc-odr0-16              208.00 (  0.00%)           205.00 (  1.44%)
        Min      alloc-odr0-32              199.00 (  0.00%)           199.00 (  0.00%)
        Min      alloc-odr0-64              195.00 (  0.00%)           193.00 (  1.03%)
        Min      alloc-odr0-128             192.00 (  0.00%)           191.00 (  0.52%)
        Min      alloc-odr0-256             204.00 (  0.00%)           200.00 (  1.96%)
        Min      alloc-odr0-512             213.00 (  0.00%)           212.00 (  0.47%)
        Min      alloc-odr0-1024            219.00 (  0.00%)           219.00 (  0.00%)
        Min      alloc-odr0-2048            225.00 (  0.00%)           225.00 (  0.00%)
        Min      alloc-odr0-4096            230.00 (  0.00%)           231.00 ( -0.43%)
        Min      alloc-odr0-8192            235.00 (  0.00%)           234.00 (  0.43%)
        Min      alloc-odr0-16384           235.00 (  0.00%)           234.00 (  0.43%)
        Min      free-odr0-1                215.00 (  0.00%)           191.00 ( 11.16%)
        Min      free-odr0-2                152.00 (  0.00%)           136.00 ( 10.53%)
        Min      free-odr0-4                119.00 (  0.00%)           107.00 ( 10.08%)
        Min      free-odr0-8                106.00 (  0.00%)            96.00 (  9.43%)
        Min      free-odr0-16                97.00 (  0.00%)            87.00 ( 10.31%)
        Min      free-odr0-32                91.00 (  0.00%)            83.00 (  8.79%)
        Min      free-odr0-64                89.00 (  0.00%)            81.00 (  8.99%)
        Min      free-odr0-128               88.00 (  0.00%)            80.00 (  9.09%)
        Min      free-odr0-256              106.00 (  0.00%)            95.00 ( 10.38%)
        Min      free-odr0-512              116.00 (  0.00%)           111.00 (  4.31%)
        Min      free-odr0-1024             125.00 (  0.00%)           118.00 (  5.60%)
        Min      free-odr0-2048             133.00 (  0.00%)           126.00 (  5.26%)
        Min      free-odr0-4096             136.00 (  0.00%)           130.00 (  4.41%)
        Min      free-odr0-8192             138.00 (  0.00%)           130.00 (  5.80%)
        Min      free-odr0-16384            137.00 (  0.00%)           130.00 (  5.11%)
      
      There is a sizable boost to the free allocator performance.  While there
      is an apparent boost on the allocation side, it's likely a co-incidence
      or due to the patches slightly reducing cache footprint.
      Signed-off-by: NMel Gorman <mgorman@techsingularity.net>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      17514574
    • M
      mm, page_alloc: only check PageCompound for high-order pages · d61f8590
      Mel Gorman 提交于
      Another year, another round of page allocator optimisations focusing
      this time on the alloc and free fast paths.  This should be of help to
      workloads that are allocator-intensive from kernel space where the cost
      of zeroing is not nceessraily incurred.
      
      The series is motivated by the observation that page alloc
      microbenchmarks on multiple machines regressed between 3.12.44 and 4.4.
      Second, there is discussions before LSF/MM considering the possibility
      of adding another page allocator which is potentially hazardous but a
      patch series improving performance is better than whining.
      
      After the series is applied, there are still hazards.  In the free
      paths, the debugging checking and page zone/pageblock lookups dominate
      but there was not an obvious solution to that.  In the alloc path, the
      major contributers are dealing with zonelists, new page preperation, the
      fair zone allocation and numerous statistic updates.  The fair zone
      allocator is removed by the per-node LRU series if that gets merged so
      it's nor a major concern at the moment.
      
      On normal userspace benchmarks, there is little impact as the zeroing
      cost is significant but it's visible
      
        aim9
                                       4.6.0-rc3             4.6.0-rc3
                                         vanilla         deferalloc-v3
        Min      page_test   828693.33 (  0.00%)   887060.00 (  7.04%)
        Min      brk_test   4847266.67 (  0.00%)  4966266.67 (  2.45%)
        Min      exec_test     1271.00 (  0.00%)     1275.67 (  0.37%)
        Min      fork_test    12371.75 (  0.00%)    12380.00 (  0.07%)
      
      The overall impact on a page allocator microbenchmark for a range of orders
      and number of pages allocated in a batch is
      
                                                  4.6.0-rc3                  4.6.0-rc3
                                                     vanilla            deferalloc-v3r7
        Min      alloc-odr0-1               428.00 (  0.00%)           316.00 ( 26.17%)
        Min      alloc-odr0-2               314.00 (  0.00%)           231.00 ( 26.43%)
        Min      alloc-odr0-4               256.00 (  0.00%)           192.00 ( 25.00%)
        Min      alloc-odr0-8               222.00 (  0.00%)           166.00 ( 25.23%)
        Min      alloc-odr0-16              207.00 (  0.00%)           154.00 ( 25.60%)
        Min      alloc-odr0-32              197.00 (  0.00%)           148.00 ( 24.87%)
        Min      alloc-odr0-64              193.00 (  0.00%)           144.00 ( 25.39%)
        Min      alloc-odr0-128             191.00 (  0.00%)           143.00 ( 25.13%)
        Min      alloc-odr0-256             203.00 (  0.00%)           153.00 ( 24.63%)
        Min      alloc-odr0-512             212.00 (  0.00%)           165.00 ( 22.17%)
        Min      alloc-odr0-1024            221.00 (  0.00%)           172.00 ( 22.17%)
        Min      alloc-odr0-2048            225.00 (  0.00%)           179.00 ( 20.44%)
        Min      alloc-odr0-4096            232.00 (  0.00%)           185.00 ( 20.26%)
        Min      alloc-odr0-8192            235.00 (  0.00%)           187.00 ( 20.43%)
        Min      alloc-odr0-16384           236.00 (  0.00%)           188.00 ( 20.34%)
        Min      alloc-odr1-1               519.00 (  0.00%)           450.00 ( 13.29%)
        Min      alloc-odr1-2               391.00 (  0.00%)           336.00 ( 14.07%)
        Min      alloc-odr1-4               313.00 (  0.00%)           268.00 ( 14.38%)
        Min      alloc-odr1-8               277.00 (  0.00%)           235.00 ( 15.16%)
        Min      alloc-odr1-16              256.00 (  0.00%)           218.00 ( 14.84%)
        Min      alloc-odr1-32              252.00 (  0.00%)           212.00 ( 15.87%)
        Min      alloc-odr1-64              244.00 (  0.00%)           206.00 ( 15.57%)
        Min      alloc-odr1-128             244.00 (  0.00%)           207.00 ( 15.16%)
        Min      alloc-odr1-256             243.00 (  0.00%)           207.00 ( 14.81%)
        Min      alloc-odr1-512             245.00 (  0.00%)           209.00 ( 14.69%)
        Min      alloc-odr1-1024            248.00 (  0.00%)           214.00 ( 13.71%)
        Min      alloc-odr1-2048            253.00 (  0.00%)           220.00 ( 13.04%)
        Min      alloc-odr1-4096            258.00 (  0.00%)           224.00 ( 13.18%)
        Min      alloc-odr1-8192            261.00 (  0.00%)           229.00 ( 12.26%)
        Min      alloc-odr2-1               560.00 (  0.00%)           753.00 (-34.46%)
        Min      alloc-odr2-2               424.00 (  0.00%)           351.00 ( 17.22%)
        Min      alloc-odr2-4               339.00 (  0.00%)           393.00 (-15.93%)
        Min      alloc-odr2-8               298.00 (  0.00%)           246.00 ( 17.45%)
        Min      alloc-odr2-16              276.00 (  0.00%)           227.00 ( 17.75%)
        Min      alloc-odr2-32              271.00 (  0.00%)           221.00 ( 18.45%)
        Min      alloc-odr2-64              264.00 (  0.00%)           217.00 ( 17.80%)
        Min      alloc-odr2-128             264.00 (  0.00%)           217.00 ( 17.80%)
        Min      alloc-odr2-256             264.00 (  0.00%)           218.00 ( 17.42%)
        Min      alloc-odr2-512             269.00 (  0.00%)           223.00 ( 17.10%)
        Min      alloc-odr2-1024            279.00 (  0.00%)           230.00 ( 17.56%)
        Min      alloc-odr2-2048            283.00 (  0.00%)           235.00 ( 16.96%)
        Min      alloc-odr2-4096            285.00 (  0.00%)           239.00 ( 16.14%)
        Min      alloc-odr3-1               629.00 (  0.00%)           505.00 ( 19.71%)
        Min      alloc-odr3-2               472.00 (  0.00%)           374.00 ( 20.76%)
        Min      alloc-odr3-4               383.00 (  0.00%)           301.00 ( 21.41%)
        Min      alloc-odr3-8               341.00 (  0.00%)           266.00 ( 21.99%)
        Min      alloc-odr3-16              316.00 (  0.00%)           248.00 ( 21.52%)
        Min      alloc-odr3-32              308.00 (  0.00%)           241.00 ( 21.75%)
        Min      alloc-odr3-64              305.00 (  0.00%)           241.00 ( 20.98%)
        Min      alloc-odr3-128             308.00 (  0.00%)           244.00 ( 20.78%)
        Min      alloc-odr3-256             317.00 (  0.00%)           249.00 ( 21.45%)
        Min      alloc-odr3-512             327.00 (  0.00%)           256.00 ( 21.71%)
        Min      alloc-odr3-1024            331.00 (  0.00%)           261.00 ( 21.15%)
        Min      alloc-odr3-2048            333.00 (  0.00%)           266.00 ( 20.12%)
        Min      alloc-odr4-1               767.00 (  0.00%)           572.00 ( 25.42%)
        Min      alloc-odr4-2               578.00 (  0.00%)           429.00 ( 25.78%)
        Min      alloc-odr4-4               474.00 (  0.00%)           346.00 ( 27.00%)
        Min      alloc-odr4-8               422.00 (  0.00%)           310.00 ( 26.54%)
        Min      alloc-odr4-16              399.00 (  0.00%)           295.00 ( 26.07%)
        Min      alloc-odr4-32              392.00 (  0.00%)           293.00 ( 25.26%)
        Min      alloc-odr4-64              394.00 (  0.00%)           293.00 ( 25.63%)
        Min      alloc-odr4-128             405.00 (  0.00%)           305.00 ( 24.69%)
        Min      alloc-odr4-256             417.00 (  0.00%)           319.00 ( 23.50%)
        Min      alloc-odr4-512             425.00 (  0.00%)           326.00 ( 23.29%)
        Min      alloc-odr4-1024            426.00 (  0.00%)           329.00 ( 22.77%)
        Min      free-odr0-1                216.00 (  0.00%)           178.00 ( 17.59%)
        Min      free-odr0-2                152.00 (  0.00%)           125.00 ( 17.76%)
        Min      free-odr0-4                120.00 (  0.00%)            99.00 ( 17.50%)
        Min      free-odr0-8                106.00 (  0.00%)            85.00 ( 19.81%)
        Min      free-odr0-16                97.00 (  0.00%)            80.00 ( 17.53%)
        Min      free-odr0-32                92.00 (  0.00%)            76.00 ( 17.39%)
        Min      free-odr0-64                89.00 (  0.00%)            74.00 ( 16.85%)
        Min      free-odr0-128               89.00 (  0.00%)            73.00 ( 17.98%)
        Min      free-odr0-256              107.00 (  0.00%)            90.00 ( 15.89%)
        Min      free-odr0-512              117.00 (  0.00%)           108.00 (  7.69%)
        Min      free-odr0-1024             125.00 (  0.00%)           118.00 (  5.60%)
        Min      free-odr0-2048             132.00 (  0.00%)           125.00 (  5.30%)
        Min      free-odr0-4096             135.00 (  0.00%)           130.00 (  3.70%)
        Min      free-odr0-8192             137.00 (  0.00%)           130.00 (  5.11%)
        Min      free-odr0-16384            137.00 (  0.00%)           131.00 (  4.38%)
        Min      free-odr1-1                318.00 (  0.00%)           289.00 (  9.12%)
        Min      free-odr1-2                228.00 (  0.00%)           207.00 (  9.21%)
        Min      free-odr1-4                182.00 (  0.00%)           165.00 (  9.34%)
        Min      free-odr1-8                163.00 (  0.00%)           146.00 ( 10.43%)
        Min      free-odr1-16               151.00 (  0.00%)           135.00 ( 10.60%)
        Min      free-odr1-32               146.00 (  0.00%)           129.00 ( 11.64%)
        Min      free-odr1-64               145.00 (  0.00%)           130.00 ( 10.34%)
        Min      free-odr1-128              148.00 (  0.00%)           134.00 (  9.46%)
        Min      free-odr1-256              148.00 (  0.00%)           137.00 (  7.43%)
        Min      free-odr1-512              151.00 (  0.00%)           140.00 (  7.28%)
        Min      free-odr1-1024             154.00 (  0.00%)           143.00 (  7.14%)
        Min      free-odr1-2048             156.00 (  0.00%)           144.00 (  7.69%)
        Min      free-odr1-4096             156.00 (  0.00%)           142.00 (  8.97%)
        Min      free-odr1-8192             156.00 (  0.00%)           140.00 ( 10.26%)
        Min      free-odr2-1                361.00 (  0.00%)           457.00 (-26.59%)
        Min      free-odr2-2                258.00 (  0.00%)           224.00 ( 13.18%)
        Min      free-odr2-4                208.00 (  0.00%)           223.00 ( -7.21%)
        Min      free-odr2-8                185.00 (  0.00%)           160.00 ( 13.51%)
        Min      free-odr2-16               173.00 (  0.00%)           149.00 ( 13.87%)
        Min      free-odr2-32               166.00 (  0.00%)           145.00 ( 12.65%)
        Min      free-odr2-64               166.00 (  0.00%)           146.00 ( 12.05%)
        Min      free-odr2-128              169.00 (  0.00%)           148.00 ( 12.43%)
        Min      free-odr2-256              170.00 (  0.00%)           152.00 ( 10.59%)
        Min      free-odr2-512              177.00 (  0.00%)           156.00 ( 11.86%)
        Min      free-odr2-1024             182.00 (  0.00%)           162.00 ( 10.99%)
        Min      free-odr2-2048             181.00 (  0.00%)           160.00 ( 11.60%)
        Min      free-odr2-4096             180.00 (  0.00%)           159.00 ( 11.67%)
        Min      free-odr3-1                431.00 (  0.00%)           367.00 ( 14.85%)
        Min      free-odr3-2                306.00 (  0.00%)           259.00 ( 15.36%)
        Min      free-odr3-4                249.00 (  0.00%)           208.00 ( 16.47%)
        Min      free-odr3-8                224.00 (  0.00%)           186.00 ( 16.96%)
        Min      free-odr3-16               208.00 (  0.00%)           176.00 ( 15.38%)
        Min      free-odr3-32               206.00 (  0.00%)           174.00 ( 15.53%)
        Min      free-odr3-64               210.00 (  0.00%)           178.00 ( 15.24%)
        Min      free-odr3-128              215.00 (  0.00%)           182.00 ( 15.35%)
        Min      free-odr3-256              224.00 (  0.00%)           189.00 ( 15.62%)
        Min      free-odr3-512              232.00 (  0.00%)           195.00 ( 15.95%)
        Min      free-odr3-1024             230.00 (  0.00%)           195.00 ( 15.22%)
        Min      free-odr3-2048             229.00 (  0.00%)           193.00 ( 15.72%)
        Min      free-odr4-1                561.00 (  0.00%)           439.00 ( 21.75%)
        Min      free-odr4-2                418.00 (  0.00%)           318.00 ( 23.92%)
        Min      free-odr4-4                339.00 (  0.00%)           269.00 ( 20.65%)
        Min      free-odr4-8                299.00 (  0.00%)           239.00 ( 20.07%)
        Min      free-odr4-16               289.00 (  0.00%)           234.00 ( 19.03%)
        Min      free-odr4-32               291.00 (  0.00%)           235.00 ( 19.24%)
        Min      free-odr4-64               298.00 (  0.00%)           238.00 ( 20.13%)
        Min      free-odr4-128              308.00 (  0.00%)           251.00 ( 18.51%)
        Min      free-odr4-256              321.00 (  0.00%)           267.00 ( 16.82%)
        Min      free-odr4-512              327.00 (  0.00%)           269.00 ( 17.74%)
        Min      free-odr4-1024             326.00 (  0.00%)           271.00 ( 16.87%)
        Min      total-odr0-1               644.00 (  0.00%)           494.00 ( 23.29%)
        Min      total-odr0-2               466.00 (  0.00%)           356.00 ( 23.61%)
        Min      total-odr0-4               376.00 (  0.00%)           291.00 ( 22.61%)
        Min      total-odr0-8               328.00 (  0.00%)           251.00 ( 23.48%)
        Min      total-odr0-16              304.00 (  0.00%)           234.00 ( 23.03%)
        Min      total-odr0-32              289.00 (  0.00%)           224.00 ( 22.49%)
        Min      total-odr0-64              282.00 (  0.00%)           218.00 ( 22.70%)
        Min      total-odr0-128             280.00 (  0.00%)           216.00 ( 22.86%)
        Min      total-odr0-256             310.00 (  0.00%)           243.00 ( 21.61%)
        Min      total-odr0-512             329.00 (  0.00%)           273.00 ( 17.02%)
        Min      total-odr0-1024            346.00 (  0.00%)           290.00 ( 16.18%)
        Min      total-odr0-2048            357.00 (  0.00%)           304.00 ( 14.85%)
        Min      total-odr0-4096            367.00 (  0.00%)           315.00 ( 14.17%)
        Min      total-odr0-8192            372.00 (  0.00%)           317.00 ( 14.78%)
        Min      total-odr0-16384           373.00 (  0.00%)           319.00 ( 14.48%)
        Min      total-odr1-1               838.00 (  0.00%)           739.00 ( 11.81%)
        Min      total-odr1-2               619.00 (  0.00%)           543.00 ( 12.28%)
        Min      total-odr1-4               495.00 (  0.00%)           433.00 ( 12.53%)
        Min      total-odr1-8               440.00 (  0.00%)           382.00 ( 13.18%)
        Min      total-odr1-16              407.00 (  0.00%)           353.00 ( 13.27%)
        Min      total-odr1-32              398.00 (  0.00%)           341.00 ( 14.32%)
        Min      total-odr1-64              389.00 (  0.00%)           336.00 ( 13.62%)
        Min      total-odr1-128             392.00 (  0.00%)           341.00 ( 13.01%)
        Min      total-odr1-256             391.00 (  0.00%)           344.00 ( 12.02%)
        Min      total-odr1-512             396.00 (  0.00%)           349.00 ( 11.87%)
        Min      total-odr1-1024            402.00 (  0.00%)           357.00 ( 11.19%)
        Min      total-odr1-2048            409.00 (  0.00%)           364.00 ( 11.00%)
        Min      total-odr1-4096            414.00 (  0.00%)           366.00 ( 11.59%)
        Min      total-odr1-8192            417.00 (  0.00%)           369.00 ( 11.51%)
        Min      total-odr2-1               921.00 (  0.00%)          1210.00 (-31.38%)
        Min      total-odr2-2               682.00 (  0.00%)           576.00 ( 15.54%)
        Min      total-odr2-4               547.00 (  0.00%)           616.00 (-12.61%)
        Min      total-odr2-8               483.00 (  0.00%)           406.00 ( 15.94%)
        Min      total-odr2-16              449.00 (  0.00%)           376.00 ( 16.26%)
        Min      total-odr2-32              437.00 (  0.00%)           366.00 ( 16.25%)
        Min      total-odr2-64              431.00 (  0.00%)           363.00 ( 15.78%)
        Min      total-odr2-128             433.00 (  0.00%)           365.00 ( 15.70%)
        Min      total-odr2-256             434.00 (  0.00%)           371.00 ( 14.52%)
        Min      total-odr2-512             446.00 (  0.00%)           379.00 ( 15.02%)
        Min      total-odr2-1024            461.00 (  0.00%)           392.00 ( 14.97%)
        Min      total-odr2-2048            464.00 (  0.00%)           395.00 ( 14.87%)
        Min      total-odr2-4096            465.00 (  0.00%)           398.00 ( 14.41%)
        Min      total-odr3-1              1060.00 (  0.00%)           872.00 ( 17.74%)
        Min      total-odr3-2               778.00 (  0.00%)           633.00 ( 18.64%)
        Min      total-odr3-4               632.00 (  0.00%)           510.00 ( 19.30%)
        Min      total-odr3-8               565.00 (  0.00%)           452.00 ( 20.00%)
        Min      total-odr3-16              524.00 (  0.00%)           424.00 ( 19.08%)
        Min      total-odr3-32              514.00 (  0.00%)           415.00 ( 19.26%)
        Min      total-odr3-64              515.00 (  0.00%)           419.00 ( 18.64%)
        Min      total-odr3-128             523.00 (  0.00%)           426.00 ( 18.55%)
        Min      total-odr3-256             541.00 (  0.00%)           438.00 ( 19.04%)
        Min      total-odr3-512             559.00 (  0.00%)           451.00 ( 19.32%)
        Min      total-odr3-1024            561.00 (  0.00%)           456.00 ( 18.72%)
        Min      total-odr3-2048            562.00 (  0.00%)           459.00 ( 18.33%)
        Min      total-odr4-1              1328.00 (  0.00%)          1011.00 ( 23.87%)
        Min      total-odr4-2               997.00 (  0.00%)           747.00 ( 25.08%)
        Min      total-odr4-4               813.00 (  0.00%)           615.00 ( 24.35%)
        Min      total-odr4-8               721.00 (  0.00%)           550.00 ( 23.72%)
        Min      total-odr4-16              689.00 (  0.00%)           529.00 ( 23.22%)
        Min      total-odr4-32              683.00 (  0.00%)           528.00 ( 22.69%)
        Min      total-odr4-64              692.00 (  0.00%)           531.00 ( 23.27%)
        Min      total-odr4-128             713.00 (  0.00%)           556.00 ( 22.02%)
        Min      total-odr4-256             738.00 (  0.00%)           586.00 ( 20.60%)
        Min      total-odr4-512             753.00 (  0.00%)           595.00 ( 20.98%)
        Min      total-odr4-1024            752.00 (  0.00%)           600.00 ( 20.21%)
      
      This patch (of 27):
      
      order-0 pages by definition cannot be compound so avoid the check in the
      fast path for those pages.
      
      [akpm@linux-foundation.org: use unlikely(order) in free_pages_prepare(), per Vlastimil]
      Signed-off-by: NMel Gorman <mgorman@techsingularity.net>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d61f8590
    • M
      mm, oom: move GFP_NOFS check to out_of_memory · 3da88fb3
      Michal Hocko 提交于
      __alloc_pages_may_oom is the central place to decide when the
      out_of_memory should be invoked.  This is a good approach for most
      checks there because they are page allocator specific and the allocation
      fails right after for all of them.
      
      The notable exception is GFP_NOFS context which is faking
      did_some_progress and keep the page allocator looping even though there
      couldn't have been any progress from the OOM killer.  This patch doesn't
      change this behavior because we are not ready to allow those allocation
      requests to fail yet (and maybe we will face the reality that we will
      never manage to safely fail these request).  Instead __GFP_FS check is
      moved down to out_of_memory and prevent from OOM victim selection there.
      There are two reasons for that
      
      	- OOM notifiers might release some memory even from this context
      	  as none of the registered notifier seems to be FS related
      	- this might help a dying thread to get an access to memory
                reserves and move on which will make the behavior more
                consistent with the case when the task gets killed from a
                different context.
      
      Keep a comment in __alloc_pages_may_oom to make sure we do not forget
      how GFP_NOFS is special and that we really want to do something about
      it.
      
      Note to the current oom_notifier users:
      
      The observable difference for you is that oom notifiers cannot depend on
      any fs locks because we could deadlock.  Not that this would be allowed
      today because that would just lockup machine in most of the cases and
      ruling out the OOM killer along the way.  Another difference is that
      callbacks might be invoked sooner now because GFP_NOFS is a weaker
      reclaim context and so there could be reclaimable memory which is just
      not reachable now.  That would require GFP_NOFS only loads which are
      really rare and more importantly the observable result would be dropping
      of reconstructible object and potential performance drop which is not
      such a big deal when we are struggling to fulfill other important
      allocation requests.
      Signed-off-by: NMichal Hocko <mhocko@suse.com>
      Cc: Raushaniya Maksudova <rmaksudova@parallels.com>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Daniel Vetter <daniel.vetter@intel.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3da88fb3