1. 13 7月, 2019 40 次提交
    • K
      proc: use down_read_killable mmap_sem for /proc/pid/smaps_rollup · a26a9781
      Konstantin Khlebnikov 提交于
      Do not remain stuck forever if something goes wrong.  Using a killable
      lock permits cleanup of stuck tasks and simplifies investigation.
      
      Link: http://lkml.kernel.org/r/156007493429.3335.14666825072272692455.stgit@buzzSigned-off-by: NKonstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Reviewed-by: NRoman Gushchin <guro@fb.com>
      Reviewed-by: NCyrill Gorcunov <gorcunov@gmail.com>
      Reviewed-by: NKirill Tkhai <ktkhai@virtuozzo.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Michal Koutný <mkoutny@suse.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a26a9781
    • K
      proc: use down_read_killable mmap_sem for /proc/pid/maps · 8a713e7d
      Konstantin Khlebnikov 提交于
      Do not remain stuck forever if something goes wrong.  Using a killable
      lock permits cleanup of stuck tasks and simplifies investigation.
      
      This function is also used for /proc/pid/smaps.
      
      Link: http://lkml.kernel.org/r/156007493160.3335.14447544314127417266.stgit@buzzSigned-off-by: NKonstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Reviewed-by: NRoman Gushchin <guro@fb.com>
      Reviewed-by: NCyrill Gorcunov <gorcunov@gmail.com>
      Reviewed-by: NKirill Tkhai <ktkhai@virtuozzo.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Michal Koutný <mkoutny@suse.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8a713e7d
    • T
      tools/vm/slabinfo: add sorting info to help menu · cbf800d9
      Tobin C. Harding 提交于
      Passing more than one sorting option has undefined behaviour.
      
      Add an explicit statement as such to the help menu, this also has the
      advantage of highlighting all the sorting options.
      
      Link: http://lkml.kernel.org/r/20190426022622.4089-5-tobin@kernel.orgSigned-off-by: NTobin C. Harding <tobin@kernel.org>
      Cc: Alexander Duyck <alexander.duyck@gmail.com>
      Cc: Brendan Gregg <brendan.d.gregg@gmail.com>,
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Pekka Enberg <penberg@iki.fi>
      Cc: Qian Cai <cai@lca.pw>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      cbf800d9
    • T
      tools/vm/slabinfo: add option to sort by partial slabs · 53a83f97
      Tobin C. Harding 提交于
      We would like to get a better view of the level of fragmentation within
      the SLUB allocator.  Total number of partial slabs is an indicator of
      fragmentation.
      
      Add a command line option (-P | --partial) to sort the slab list by total
      number of partial slabs.
      
      Link: http://lkml.kernel.org/r/20190426022622.4089-4-tobin@kernel.orgSigned-off-by: NTobin C. Harding <tobin@kernel.org>
      Cc: Alexander Duyck <alexander.duyck@gmail.com>
      Cc: Brendan Gregg <brendan.d.gregg@gmail.com>,
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Pekka Enberg <penberg@iki.fi>
      Cc: Qian Cai <cai@lca.pw>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      53a83f97
    • T
      tools/vm/slabinfo: add partial slab listing to -X · 1106b205
      Tobin C. Harding 提交于
      We would like to see how fragmented the SLUB allocator is, one window into
      fragmentation is the total number of partial slabs.
      
      Currently `slabinfo -X` shows slabs sorted by loss and by size.  We can
      use this option to also show slabs sorted by number of partial slabs.
      
      Option '-X' can be used in conjunction with '-N' to control the number of
      slabs shown e.g.  list of top 5 slabs:
      
      	slabinfo -X -N5
      
      Add list of slabs ordered by number of partial slabs to output of
      `slabinfo -X`.
      
      Link: http://lkml.kernel.org/r/20190426022622.4089-3-tobin@kernel.orgSigned-off-by: NTobin C. Harding <tobin@kernel.org>
      Cc: Alexander Duyck <alexander.duyck@gmail.com>
      Cc: Brendan Gregg <brendan.d.gregg@gmail.com>,
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Pekka Enberg <penberg@iki.fi>
      Cc: Qian Cai <cai@lca.pw>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1106b205
    • T
      tools/vm/slabinfo: order command line options · d9149996
      Tobin C. Harding 提交于
      During recent discussion on LKML over SLAB vs SLUB it was suggested by
      Jesper that it would be nice to have a tool to view the current
      fragmentation of the slab allocators.  CC list for this set is taken
      from that thread.
      
      For SLUB we have all the information for this already exposed by the
      kernel and also we have a userspace tool for displaying this info:
      
      	tools/vm/slabinfo.c
      
      Extend slabinfo to improve the fragmentation information by enabling
      sorting of caches by number of partial slabs.
      
      Also add cache list sorted in this manner to the output of `slabinfo -X`.
      
      This patch (of 4):
      
      get_opt() has a spurious character within the option string.  Remove it
      and reorder the options in alphabetic order so that it is easier to keep
      the options correct.  Use the same ordering for command help output and
      long option handling code.
      
      Link: http://lkml.kernel.org/r/20190426022622.4089-2-tobin@kernel.orgSigned-off-by: NTobin C. Harding <tobin@kernel.org>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Cc: Pekka Enberg <penberg@iki.fi>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Qian Cai <cai@lca.pw>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Alexander Duyck <alexander.duyck@gmail.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Brendan Gregg <brendan.d.gregg@gmail.com>,
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d9149996
    • Y
      mm: vmscan: correct some vmscan counters for THP swapout · 98879b3b
      Yang Shi 提交于
      Commit bd4c82c2 ("mm, THP, swap: delay splitting THP after swapped
      out"), THP can be swapped out in a whole.  But, nr_reclaimed and some
      other vm counters still get inc'ed by one even though a whole THP (512
      pages) gets swapped out.
      
      This doesn't make too much sense to memory reclaim.
      
      For example, direct reclaim may just need reclaim SWAP_CLUSTER_MAX
      pages, reclaiming one THP could fulfill it.  But, if nr_reclaimed is not
      increased correctly, direct reclaim may just waste time to reclaim more
      pages, SWAP_CLUSTER_MAX * 512 pages in worst case.
      
      And, it may cause pgsteal_{kswapd|direct} is greater than
      pgscan_{kswapd|direct}, like the below:
      
      pgsteal_kswapd 122933
      pgsteal_direct 26600225
      pgscan_kswapd 174153
      pgscan_direct 14678312
      
      nr_reclaimed and nr_scanned must be fixed in parallel otherwise it would
      break some page reclaim logic, e.g.
      
      vmpressure: this looks at the scanned/reclaimed ratio so it won't change
      semantics as long as scanned & reclaimed are fixed in parallel.
      
      compaction/reclaim: compaction wants a certain number of physical pages
      freed up before going back to compacting.
      
      kswapd priority raising: kswapd raises priority if we scan fewer pages
      than the reclaim target (which itself is obviously expressed in order-0
      pages).  As a result, kswapd can falsely raise its aggressiveness even
      when it's making great progress.
      
      Other than nr_scanned and nr_reclaimed, some other counters, e.g.
      pgactivate, nr_skipped, nr_ref_keep and nr_unmap_fail need to be fixed too
      since they are user visible via cgroup, /proc/vmstat or trace points,
      otherwise they would be underreported.
      
      When isolating pages from LRUs, nr_taken has been accounted in base page,
      but nr_scanned and nr_skipped are still accounted in THP.  It doesn't make
      too much sense too since this may cause trace point underreport the
      numbers as well.
      
      So accounting those counters in base page instead of accounting THP as one
      page.
      
      nr_dirty, nr_unqueued_dirty, nr_congested and nr_writeback are used by
      file cache, so they are not impacted by THP swap.
      
      This change may result in lower steal/scan ratio in some cases since THP
      may get split during page reclaim, then a part of tail pages get reclaimed
      instead of the whole 512 pages, but nr_scanned is accounted by 512,
      particularly for direct reclaim.  But, this should be not a significant
      issue.
      
      Link: http://lkml.kernel.org/r/1559025859-72759-2-git-send-email-yang.shi@linux.alibaba.comSigned-off-by: NYang Shi <yang.shi@linux.alibaba.com>
      Reviewed-by: N"Huang, Ying" <ying.huang@intel.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Hillf Danton <hdanton@sina.com>
      Cc: Josef Bacik <josef@toxicpanda.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      98879b3b
    • Y
      mm: vmscan: remove double slab pressure by inc'ing sc->nr_scanned · af5d4403
      Yang Shi 提交于
      Commit 9092c71b ("mm: use sc->priority for slab shrink targets") has
      broken up the relationship between sc->nr_scanned and slab pressure.
      The sc->nr_scanned can't double slab pressure anymore.  So, it sounds no
      sense to still keep sc->nr_scanned inc'ed.  Actually, it would prevent
      from adding pressure on slab shrink since excessive sc->nr_scanned would
      prevent from scan->priority raise.
      
      The bonnie test doesn't show this would change the behavior of slab
      shrinkers.
      
      				w/		w/o
      			  /sec    %CP      /sec      %CP
      Sequential delete: 	3960.6    94.6    3997.6     96.2
      Random delete: 		2518      63.8    2561.6     64.6
      
      The slight increase of "/sec" without the patch would be caused by the
      slight increase of CPU usage.
      
      Link: http://lkml.kernel.org/r/1559025859-72759-1-git-send-email-yang.shi@linux.alibaba.comSigned-off-by: NYang Shi <yang.shi@linux.alibaba.com>
      Acked-by: NJohannes Weiner <hannes@cmpxchg.org>
      Cc: Josef Bacik <josef@toxicpanda.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Hillf Danton <hdanton@sina.com>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      af5d4403
    • A
      mm: init: report memory auto-initialization features at boot time · 23a5c8cb
      Alexander Potapenko 提交于
      Print the currently enabled stack and heap initialization modes.
      
      Stack initialization is enabled by a config flag, while heap
      initialization is configured at boot time with defaults being set in the
      config.  It's more convenient for the user to have all information about
      these hardening measures in one place at boot, so the user can reason
      about the expected behavior of the running system.
      
      The possible options for stack are:
       - "all" for CONFIG_INIT_STACK_ALL;
       - "byref_all" for CONFIG_GCC_PLUGIN_STRUCTLEAK_BYREF_ALL;
       - "byref" for CONFIG_GCC_PLUGIN_STRUCTLEAK_BYREF;
       - "__user" for CONFIG_GCC_PLUGIN_STRUCTLEAK_USER;
       - "off" otherwise.
      
      Depending on the values of init_on_alloc and init_on_free boottime options
      we also report "heap alloc" and "heap free" as "on"/"off".
      
      In the init_on_free mode initializing pages at boot time may take a while,
      so print a notice about that as well.  This depends on how much memory is
      installed, the memory bandwidth, etc.  On a relatively modern x86 system,
      it takes about 0.75s/GB to wipe all memory:
      
        [    0.418722] mem auto-init: stack:byref_all, heap alloc:off, heap free:on
        [    0.419765] mem auto-init: clearing system memory may take some time...
        [   12.376605] Memory: 16408564K/16776672K available (14339K kernel code, 1397K rwdata, 3756K rodata, 1636K init, 11460K bss, 368108K reserved, 0K cma-reserved)
      
      Link: http://lkml.kernel.org/r/20190617151050.92663-3-glider@google.comSigned-off-by: NAlexander Potapenko <glider@google.com>
      Suggested-by: NKees Cook <keescook@chromium.org>
      Acked-by: NKees Cook <keescook@chromium.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: James Morris <jmorris@namei.org>
      Cc: Jann Horn <jannh@google.com>
      Cc: Kostya Serebryany <kcc@google.com>
      Cc: Laura Abbott <labbott@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: Sandeep Patil <sspatil@android.com>
      Cc: "Serge E. Hallyn" <serge@hallyn.com>
      Cc: Souptick Joarder <jrdr.linux@gmail.com>
      Cc: Marco Elver <elver@google.com>
      Cc: Kaiwan N Billimoria <kaiwan@kaiwantech.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      23a5c8cb
    • A
      mm: security: introduce init_on_alloc=1 and init_on_free=1 boot options · 6471384a
      Alexander Potapenko 提交于
      Patch series "add init_on_alloc/init_on_free boot options", v10.
      
      Provide init_on_alloc and init_on_free boot options.
      
      These are aimed at preventing possible information leaks and making the
      control-flow bugs that depend on uninitialized values more deterministic.
      
      Enabling either of the options guarantees that the memory returned by the
      page allocator and SL[AU]B is initialized with zeroes.  SLOB allocator
      isn't supported at the moment, as its emulation of kmem caches complicates
      handling of SLAB_TYPESAFE_BY_RCU caches correctly.
      
      Enabling init_on_free also guarantees that pages and heap objects are
      initialized right after they're freed, so it won't be possible to access
      stale data by using a dangling pointer.
      
      As suggested by Michal Hocko, right now we don't let the heap users to
      disable initialization for certain allocations.  There's not enough
      evidence that doing so can speed up real-life cases, and introducing ways
      to opt-out may result in things going out of control.
      
      This patch (of 2):
      
      The new options are needed to prevent possible information leaks and make
      control-flow bugs that depend on uninitialized values more deterministic.
      
      This is expected to be on-by-default on Android and Chrome OS.  And it
      gives the opportunity for anyone else to use it under distros too via the
      boot args.  (The init_on_free feature is regularly requested by folks
      where memory forensics is included in their threat models.)
      
      init_on_alloc=1 makes the kernel initialize newly allocated pages and heap
      objects with zeroes.  Initialization is done at allocation time at the
      places where checks for __GFP_ZERO are performed.
      
      init_on_free=1 makes the kernel initialize freed pages and heap objects
      with zeroes upon their deletion.  This helps to ensure sensitive data
      doesn't leak via use-after-free accesses.
      
      Both init_on_alloc=1 and init_on_free=1 guarantee that the allocator
      returns zeroed memory.  The two exceptions are slab caches with
      constructors and SLAB_TYPESAFE_BY_RCU flag.  Those are never
      zero-initialized to preserve their semantics.
      
      Both init_on_alloc and init_on_free default to zero, but those defaults
      can be overridden with CONFIG_INIT_ON_ALLOC_DEFAULT_ON and
      CONFIG_INIT_ON_FREE_DEFAULT_ON.
      
      If either SLUB poisoning or page poisoning is enabled, those options take
      precedence over init_on_alloc and init_on_free: initialization is only
      applied to unpoisoned allocations.
      
      Slowdown for the new features compared to init_on_free=0, init_on_alloc=0:
      
      hackbench, init_on_free=1:  +7.62% sys time (st.err 0.74%)
      hackbench, init_on_alloc=1: +7.75% sys time (st.err 2.14%)
      
      Linux build with -j12, init_on_free=1:  +8.38% wall time (st.err 0.39%)
      Linux build with -j12, init_on_free=1:  +24.42% sys time (st.err 0.52%)
      Linux build with -j12, init_on_alloc=1: -0.13% wall time (st.err 0.42%)
      Linux build with -j12, init_on_alloc=1: +0.57% sys time (st.err 0.40%)
      
      The slowdown for init_on_free=0, init_on_alloc=0 compared to the baseline
      is within the standard error.
      
      The new features are also going to pave the way for hardware memory
      tagging (e.g.  arm64's MTE), which will require both on_alloc and on_free
      hooks to set the tags for heap objects.  With MTE, tagging will have the
      same cost as memory initialization.
      
      Although init_on_free is rather costly, there are paranoid use-cases where
      in-memory data lifetime is desired to be minimized.  There are various
      arguments for/against the realism of the associated threat models, but
      given that we'll need the infrastructure for MTE anyway, and there are
      people who want wipe-on-free behavior no matter what the performance cost,
      it seems reasonable to include it in this series.
      
      [glider@google.com: v8]
        Link: http://lkml.kernel.org/r/20190626121943.131390-2-glider@google.com
      [glider@google.com: v9]
        Link: http://lkml.kernel.org/r/20190627130316.254309-2-glider@google.com
      [glider@google.com: v10]
        Link: http://lkml.kernel.org/r/20190628093131.199499-2-glider@google.com
      Link: http://lkml.kernel.org/r/20190617151050.92663-2-glider@google.comSigned-off-by: NAlexander Potapenko <glider@google.com>
      Acked-by: NKees Cook <keescook@chromium.org>
      Acked-by: Michal Hocko <mhocko@suse.cz>		[page and dmapool parts
      Acked-by: James Morris <jamorris@linux.microsoft.com>]
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: "Serge E. Hallyn" <serge@hallyn.com>
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Cc: Kostya Serebryany <kcc@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Sandeep Patil <sspatil@android.com>
      Cc: Laura Abbott <labbott@redhat.com>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: Jann Horn <jannh@google.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Marco Elver <elver@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6471384a
    • K
      arm64: move jump_label_init() before parse_early_param() · ba5c5e4a
      Kees Cook 提交于
      While jump_label_init() was moved earlier in the boot process in
      efd9e03f ("arm64: Use static keys for CPU features"), it wasn't early
      enough for early params to use it.  The old state of things was as
      described here...
      
      init/main.c calls out to arch-specific things before general jump label
      and early param handling:
      
        asmlinkage __visible void __init start_kernel(void)
        {
              ...
              setup_arch(&command_line);
              ...
              smp_prepare_boot_cpu();
              ...
              /* parameters may set static keys */
              jump_label_init();
              parse_early_param();
              ...
        }
      
      x86 setup_arch() wants those earlier, so it handles jump label and
      early param:
      
        void __init setup_arch(char **cmdline_p)
        {
              ...
              jump_label_init();
              ...
              parse_early_param();
              ...
        }
      
      arm64 setup_arch() only had early param:
      
        void __init setup_arch(char **cmdline_p)
        {
              ...
              parse_early_param();
              ...
      }
      
      with jump label later in smp_prepare_boot_cpu():
      
        void __init smp_prepare_boot_cpu(void)
        {
              ...
              jump_label_init();
              ...
        }
      
      This moves arm64 jump_label_init() from smp_prepare_boot_cpu() to
      setup_arch(), as done already on x86, in preparation from early param
      usage in the init_on_alloc/free() series:
      https://lkml.kernel.org/r/1561572949.5154.81.camel@lca.pw
      
      Link: http://lkml.kernel.org/r/201906271003.005303B52@keescookSigned-off-by: NKees Cook <keescook@chromium.org>
      Acked-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
      Acked-by: NCatalin Marinas <catalin.marinas@arm.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Qian Cai <cai@lca.pw>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ba5c5e4a
    • N
      mm/large system hash: clear hashdist when only one node with memory is booted · e03a5125
      Nicholas Piggin 提交于
      CONFIG_NUMA on 64-bit CPUs currently enables hashdist unconditionally even
      when booting on single node machines.  This causes the large system hashes
      to be allocated with vmalloc, and mapped with small pages.
      
      This change clears hashdist if only one node has come up with memory.
      
      This results in the important large inode and dentry hashes using memblock
      allocations.  All others are within 4MB size up to about 128GB of RAM,
      which allows them to be allocated from the linear map on most non-NUMA
      images.
      
      Other big hashes like futex and TCP should eventually be moved over to the
      same style of allocation as those vfs caches that use HASH_EARLY if
      !hashdist, so they don't exceed MAX_ORDER on very large non-NUMA images.
      
      This brings dTLB misses for linux kernel tree `git diff` from ~45,000 to
      ~8,000 on a Kaby Lake KVM guest with 8MB dentry hash and mitigations=off
      (performance is in the noise, under 1% difference, page tables are likely
      to be well cached for this workload).
      
      Link: http://lkml.kernel.org/r/20190605144814.29319-2-npiggin@gmail.comSigned-off-by: NNicholas Piggin <npiggin@gmail.com>
      Reviewed-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e03a5125
    • N
      mm/large system hash: use vmalloc for size > MAX_ORDER when !hashdist · ec11408a
      Nicholas Piggin 提交于
      The kernel currently clamps large system hashes to MAX_ORDER when hashdist
      is not set, which is rather arbitrary.
      
      vmalloc space is limited on 32-bit machines, but this shouldn't result in
      much more used because of small physical memory limiting system hash
      sizes.
      
      Include "vmalloc" or "linear" in the kernel log message.
      
      Link: http://lkml.kernel.org/r/20190605144814.29319-1-npiggin@gmail.comSigned-off-by: NNicholas Piggin <npiggin@gmail.com>
      Reviewed-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ec11408a
    • G
    • U
      mm/vmalloc.c: switch to WARN_ON() and move it under unlink_va() · 460e42d1
      Uladzislau Rezki (Sony) 提交于
      Trigger a warning if an object that is about to be freed is detached.  We
      used to have a BUG_ON(), but even though it is considered as faulty
      behaviour that is not a good reason to break a system.
      
      Link: http://lkml.kernel.org/r/20190606120411.8298-5-urezki@gmail.comSigned-off-by: NUladzislau Rezki (Sony) <urezki@gmail.com>
      Cc: Roman Gushchin <guro@fb.com>
      Cc: Hillf Danton <hdanton@sina.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Oleksiy Avramchenko <oleksiy.avramchenko@sonymobile.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      460e42d1
    • U
      mm/vmalloc.c: get rid of one single unlink_va() when merge · 54f63d9d
      Uladzislau Rezki (Sony) 提交于
      It does not make sense to try to "unlink" the node that is definitely not
      linked with a list nor tree.  On the first merge step VA just points to
      the previously disconnected busy area.
      
      On the second step, check if the node has been merged and do "unlink" if
      so, because now it points to an object that must be linked.
      
      Link: http://lkml.kernel.org/r/20190606120411.8298-4-urezki@gmail.comSigned-off-by: NUladzislau Rezki (Sony) <urezki@gmail.com>
      Acked-by: NHillf Danton <hdanton@sina.com>
      Reviewed-by: NRoman Gushchin <guro@fb.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Oleksiy Avramchenko <oleksiy.avramchenko@sonymobile.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      54f63d9d
    • U
      mm/vmalloc.c: preload a CPU with one object for split purpose · 82dd23e8
      Uladzislau Rezki (Sony) 提交于
      Refactor the NE_FIT_TYPE split case when it comes to an allocation of one
      extra object.  We need it in order to build a remaining space.  The
      preload is done per CPU in non-atomic context with GFP_KERNEL flags.
      
      More permissive parameters can be beneficial for systems which are suffer
      from high memory pressure or low memory condition.  For example on my KVM
      system(4xCPUs, no swap, 256MB RAM) i can simulate the failure of page
      allocation with GFP_NOWAIT flags.  Using "stress-ng" tool and starting N
      workers spinning on fork() and exit(), i can trigger below trace:
      
      <snip>
      [  179.815161] stress-ng-fork: page allocation failure: order:0, mode:0x40800(GFP_NOWAIT|__GFP_COMP), nodemask=(null),cpuset=/,mems_allowed=0
      [  179.815168] CPU: 0 PID: 12612 Comm: stress-ng-fork Not tainted 5.2.0-rc3+ #1003
      [  179.815170] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
      [  179.815171] Call Trace:
      [  179.815178]  dump_stack+0x5c/0x7b
      [  179.815182]  warn_alloc+0x108/0x190
      [  179.815187]  __alloc_pages_slowpath+0xdc7/0xdf0
      [  179.815191]  __alloc_pages_nodemask+0x2de/0x330
      [  179.815194]  cache_grow_begin+0x77/0x420
      [  179.815197]  fallback_alloc+0x161/0x200
      [  179.815200]  kmem_cache_alloc+0x1c9/0x570
      [  179.815202]  alloc_vmap_area+0x32c/0x990
      [  179.815206]  __get_vm_area_node+0xb0/0x170
      [  179.815208]  __vmalloc_node_range+0x6d/0x230
      [  179.815211]  ? _do_fork+0xce/0x3d0
      [  179.815213]  copy_process.part.46+0x850/0x1b90
      [  179.815215]  ? _do_fork+0xce/0x3d0
      [  179.815219]  _do_fork+0xce/0x3d0
      [  179.815226]  ? __do_page_fault+0x2bf/0x4e0
      [  179.815229]  do_syscall_64+0x55/0x130
      [  179.815231]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [  179.815234] RIP: 0033:0x7fedec4c738b
      ...
      [  179.815237] RSP: 002b:00007ffda469d730 EFLAGS: 00000246 ORIG_RAX: 0000000000000038
      [  179.815239] RAX: ffffffffffffffda RBX: 00007ffda469d730 RCX: 00007fedec4c738b
      [  179.815240] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000001200011
      [  179.815241] RBP: 00007ffda469d780 R08: 00007fededd6e300 R09: 00007ffda47f50a0
      [  179.815242] R10: 00007fededd6e5d0 R11: 0000000000000246 R12: 0000000000000000
      [  179.815243] R13: 0000000000000020 R14: 0000000000000000 R15: 0000000000000000
      [  179.815245] Mem-Info:
      [  179.815249] active_anon:12686 inactive_anon:14760 isolated_anon:0
                      active_file:502 inactive_file:61 isolated_file:70
                      unevictable:2 dirty:0 writeback:0 unstable:0
                      slab_reclaimable:2380 slab_unreclaimable:7520
                      mapped:15069 shmem:14813 pagetables:10833 bounce:0
                      free:1922 free_pcp:229 free_cma:0
      <snip>
      
      Link: http://lkml.kernel.org/r/20190606120411.8298-3-urezki@gmail.comSigned-off-by: NUladzislau Rezki (Sony) <urezki@gmail.com>
      Cc: Hillf Danton <hdanton@sina.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Oleksiy Avramchenko <oleksiy.avramchenko@sonymobile.com>
      Cc: Roman Gushchin <guro@fb.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      82dd23e8
    • U
      mm/vmalloc.c: remove "node" argument · cacca6ba
      Uladzislau Rezki (Sony) 提交于
      Patch series "Some cleanups for the KVA/vmalloc", v5.
      
      This patch (of 4):
      
      Remove unused argument from the __alloc_vmap_area() function.
      
      Link: http://lkml.kernel.org/r/20190606120411.8298-2-urezki@gmail.comSigned-off-by: NUladzislau Rezki (Sony) <urezki@gmail.com>
      Reviewed-by: NAndrew Morton <akpm@linux-foundation.org>
      Reviewed-by: NRoman Gushchin <guro@fb.com>
      Cc: Hillf Danton <hdanton@sina.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Oleksiy Avramchenko <oleksiy.avramchenko@sonymobile.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      cacca6ba
    • J
      mm/mmu_notifier: use hlist_add_head_rcu() · 543bdb2d
      Jean-Philippe Brucker 提交于
      Make mmu_notifier_register() safer by issuing a memory barrier before
      registering a new notifier.  This fixes a theoretical bug on weakly
      ordered CPUs.  For example, take this simplified use of notifiers by a
      driver:
      
      	my_struct->mn.ops = &my_ops; /* (1) */
      	mmu_notifier_register(&my_struct->mn, mm)
      		...
      		hlist_add_head(&mn->hlist, &mm->mmu_notifiers); /* (2) */
      		...
      
      Once mmu_notifier_register() releases the mm locks, another thread can
      invalidate a range:
      
      	mmu_notifier_invalidate_range()
      		...
      		hlist_for_each_entry_rcu(mn, &mm->mmu_notifiers, hlist) {
      			if (mn->ops->invalidate_range)
      
      The read side relies on the data dependency between mn and ops to ensure
      that the pointer is properly initialized.  But the write side doesn't have
      any dependency between (1) and (2), so they could be reordered and the
      readers could dereference an invalid mn->ops.  mmu_notifier_register()
      does take all the mm locks before adding to the hlist, but those have
      acquire semantics which isn't sufficient.
      
      By calling hlist_add_head_rcu() instead of hlist_add_head() we update the
      hlist using a store-release, ensuring that readers see prior
      initialization of my_struct.  This situation is better illustated by
      litmus test MP+onceassign+derefonce.
      
      Link: http://lkml.kernel.org/r/20190502133532.24981-1-jean-philippe.brucker@arm.com
      Fixes: cddb8a5c ("mmu-notifiers: core")
      Signed-off-by: NJean-Philippe Brucker <jean-philippe.brucker@arm.com>
      Cc: Jérôme Glisse <jglisse@redhat.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      543bdb2d
    • M
      mm/memory.c: fail when offset == num in first check of __vm_map_pages() · 96756fcb
      Miguel Ojeda 提交于
      If the caller asks us for offset == num, we should already fail in the
      first check, i.e.  the one testing for offsets beyond the object.
      
      At the moment, we are failing on the second test anyway, since count
      cannot be 0.  Still, to agree with the comment of the first test, we
      should first test it there.
      
      Link: http://lkml.kernel.org/r/20190528193004.GA7744@gmail.comSigned-off-by: NMiguel Ojeda <miguel.ojeda.sandonis@gmail.com>
      Reviewed-by: NAndrew Morton <akpm@linux-foundation.org>
      Cc: Souptick Joarder <jrdr.linux@gmail.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
      Cc: Huang Ying <ying.huang@intel.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      96756fcb
    • A
      mm/pgtable: drop pgtable_t variable from pte_fn_t functions · 8b1e0f81
      Anshuman Khandual 提交于
      Drop the pgtable_t variable from all implementation for pte_fn_t as none
      of them use it.  apply_to_pte_range() should stop computing it as well.
      Should help us save some cycles.
      
      Link: http://lkml.kernel.org/r/1556803126-26596-1-git-send-email-anshuman.khandual@arm.comSigned-off-by: NAnshuman Khandual <anshuman.khandual@arm.com>
      Acked-by: NMatthew Wilcox <willy@infradead.org>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: <jglisse@redhat.com>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8b1e0f81
    • M
      unicore32: switch to generic version of pte allocation · c2471e79
      Mike Rapoport 提交于
      Replace __get_free_page() and alloc_pages() calls with the generic
      __pte_alloc_one_kernel() and __pte_alloc_one().
      
      There is no functional change for the kernel PTE allocation.
      
      The difference for the user PTEs, is that the clear_pte_table() is now
      called after pgtable_page_ctor() and the addition of __GFP_ACCOUNT to the
      GFP flags.
      
      The pte_free() and pte_free_kernel() versions are identical to the generic
      ones and can be simply dropped.
      
      Link: http://lkml.kernel.org/r/1557296232-15361-15-git-send-email-rppt@linux.ibm.comSigned-off-by: NMike Rapoport <rppt@linux.ibm.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Ley Foon Tan <lftan@altera.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Palmer Dabbelt <palmer@sifive.com>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Sam Creasey <sammy@sammy.net>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Vincent Chen <deanbo422@gmail.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
      Cc: Guo Ren <ren_guo@c-sky.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c2471e79
    • M
      um: switch to generic version of pte allocation · f32848e1
      Mike Rapoport 提交于
      um allocates PTE pages with __get_free_page() and uses
      GFP_KERNEL | __GFP_ZERO for the allocations.
      
      Switch it to the generic version that does exactly the same thing for the
      kernel page tables and adds __GFP_ACCOUNT for the user PTEs.
      
      The pte_free() and pte_free_kernel() versions are identical to the generic
      ones and can be simply dropped.
      
      Link: http://lkml.kernel.org/r/1557296232-15361-14-git-send-email-rppt@linux.ibm.comSigned-off-by: NMike Rapoport <rppt@linux.ibm.com>
      Reviewed-by: NAnton Ivanov <anton.ivanov@cambridgegreys.com>
      Acked-by: NAnton Ivanov <anton.ivanov@cambridgegreys.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Guo Ren <ren_guo@c-sky.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Ley Foon Tan <lftan@altera.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Palmer Dabbelt <palmer@sifive.com>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Sam Creasey <sammy@sammy.net>
      Cc: Vincent Chen <deanbo422@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f32848e1
    • M
      riscv: switch to generic version of pte allocation · d1b46fe5
      Mike Rapoport 提交于
      The only difference between the generic and RISC-V implementation of PTE
      allocation is the usage of __GFP_RETRY_MAYFAIL for both kernel and user
      PTEs and the absence of __GFP_ACCOUNT for the user PTEs.
      
      The conversion to the generic version removes the __GFP_RETRY_MAYFAIL and
      ensures that GFP_ACCOUNT is used for the user PTE allocations.
      
      The pte_free() and pte_free_kernel() versions are identical to the generic
      ones and can be simply dropped.
      
      Link: http://lkml.kernel.org/r/1557296232-15361-13-git-send-email-rppt@linux.ibm.comSigned-off-by: NMike Rapoport <rppt@linux.ibm.com>
      Reviewed-by: NPalmer Dabbelt <palmer@sifive.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Guo Ren <ren_guo@c-sky.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Ley Foon Tan <lftan@altera.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Sam Creasey <sammy@sammy.net>
      Cc: Vincent Chen <deanbo422@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d1b46fe5
    • M
      parisc: switch to generic version of pte allocation · 3f4a1308
      Mike Rapoport 提交于
      parisc allocates PTE pages with __get_free_page() and uses
      GFP_KERNEL | __GFP_ZERO for the allocations.
      
      Switch it to the generic version that does exactly the same thing for the
      kernel page tables and adds __GFP_ACCOUNT for the user PTEs.
      
      The pte_free_kernel() and pte_free() versions on are identical to the
      generic ones and can be simply dropped.
      
      Link: http://lkml.kernel.org/r/1557296232-15361-12-git-send-email-rppt@linux.ibm.comSigned-off-by: NMike Rapoport <rppt@linux.ibm.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Guo Ren <ren_guo@c-sky.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Ley Foon Tan <lftan@altera.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Palmer Dabbelt <palmer@sifive.com>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Sam Creasey <sammy@sammy.net>
      Cc: Vincent Chen <deanbo422@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3f4a1308
    • M
      nios2: switch to generic version of pte allocation · fc7835c2
      Mike Rapoport 提交于
      nios2 allocates kernel PTE pages with
      
              __get_free_pages(GFP_KERNEL | __GFP_ZERO, PTE_ORDER);
      
      and user page tables with
      
              pte = alloc_pages(GFP_KERNEL, PTE_ORDER);
              if (pte)
                      clear_highpage();
      
      The PTE_ORDER is hardwired to zero, which makes nios2 implementation almost
      identical to the generic one.
      
      Switch nios2 to the generic version that does exactly the same thing for
      the kernel page tables and adds __GFP_ACCOUNT for the user PTEs.
      
      The pte_free_kernel() and pte_free() versions on nios2 are identical to the
      generic ones and can be simply dropped.
      
      Link: http://lkml.kernel.org/r/1557296232-15361-11-git-send-email-rppt@linux.ibm.comSigned-off-by: NMike Rapoport <rppt@linux.ibm.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Guo Ren <ren_guo@c-sky.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Ley Foon Tan <lftan@altera.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Palmer Dabbelt <palmer@sifive.com>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Sam Creasey <sammy@sammy.net>
      Cc: Vincent Chen <deanbo422@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      fc7835c2
    • M
      nds32: switch to generic version of pte allocation · f52a8e1a
      Mike Rapoport 提交于
      The nds32 implementation of pte_alloc_one_kernel() differs from the
      generic in the use of __GFP_RETRY_MAYFAIL flag, which is removed after the
      conversion.
      
      The nds32 version of pte_alloc_one() missed the call to
      pgtable_page_ctor() and also used __GFP_RETRY_MAYFAIL.  Switching it to
      use generic __pte_alloc_one() for the PTE page allocation ensures that
      page table constructor is run and the user page tables are allocated with
      __GFP_ACCOUNT.
      
      The conversion to the generic version of pte_free_kernel() removes the
      NULL check for pte.
      
      The pte_free() version on nds32 is identical to the generic one and can be
      simply dropped.
      
      Link: http://lkml.kernel.org/r/1557296232-15361-10-git-send-email-rppt@linux.ibm.comSigned-off-by: NMike Rapoport <rppt@linux.ibm.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Guo Ren <ren_guo@c-sky.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Ley Foon Tan <lftan@altera.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Palmer Dabbelt <palmer@sifive.com>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Sam Creasey <sammy@sammy.net>
      Cc: Vincent Chen <deanbo422@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f52a8e1a
    • M
      mips: switch to generic version of pte allocation · b7902ce1
      Mike Rapoport 提交于
      MIPS allocates kernel PTE pages with
      
      	__get_free_pages(GFP_KERNEL | __GFP_ZERO, PTE_ORDER)
      
      and user PTE pages with
      
      	pte = alloc_pages(GFP_KERNEL, PTE_ORDER)
      
      and then uses clear_highpage(pte) to zero out the allocated page for the
      user page tables.
      
      The PTE_ORDER is hardwired to zero, which makes MIPS implementation almost
      identical to the generic one.
      
      Switch MIPS to the generic version that does exactly the same thing for the
      kernel page tables and adds __GFP_ACCOUNT for the user PTEs.
      
      The pte_free_kernel() and pte_free() versions on mips are identical to the
      generic ones and can be simply dropped.
      
      Link: http://lkml.kernel.org/r/1557296232-15361-9-git-send-email-rppt@linux.ibm.comSigned-off-by: NMike Rapoport <rppt@linux.ibm.com>
      Acked-by: NPaul Burton <paul.burton@mips.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Guo Ren <ren_guo@c-sky.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Ley Foon Tan <lftan@altera.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Palmer Dabbelt <palmer@sifive.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Sam Creasey <sammy@sammy.net>
      Cc: Vincent Chen <deanbo422@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b7902ce1
    • M
      m68k: sun3: switch to generic version of pte allocation · 14c0a39c
      Mike Rapoport 提交于
      The sun3 MMU variant of m68k uses GFP_KERNEL to allocate a PTE page and
      then memset(0) or clear_highpage() to clear it.
      
      This is equivalent to allocating the page with GFP_KERNEL | __GFP_ZERO,
      which allows replacing sun3 implementation of pte_alloc_one() and
      pte_alloc_one_kernel() with the generic ones.
      
      The pte_free() and pte_free_kernel() versions are identical to the generic
      ones and can be simply dropped.
      
      Link: http://lkml.kernel.org/r/1557296232-15361-8-git-send-email-rppt@linux.ibm.comSigned-off-by: NMike Rapoport <rppt@linux.ibm.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Guo Ren <ren_guo@c-sky.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Ley Foon Tan <lftan@altera.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Palmer Dabbelt <palmer@sifive.com>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Sam Creasey <sammy@sammy.net>
      Cc: Vincent Chen <deanbo422@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      14c0a39c
    • M
      csky: switch to generic version of pte allocation · bd5ff066
      Mike Rapoport 提交于
      The csky implementation pte_alloc_one(), pte_free_kernel() and pte_free()
      is identical to the generic except of lack of __GFP_ACCOUNT for the user
      PTEs allocation.
      
      Switch csky to use generic version of these functions.
      
      The csky implementation of pte_alloc_one_kernel() is not replaced because
      it does not clear the allocated page but rather sets each PTE in it to a
      non-zero value.
      
      The pte_free_kernel() and pte_free() versions on csky are identical to the
      generic ones and can be simply dropped.
      
      Link: http://lkml.kernel.org/r/1557296232-15361-6-git-send-email-rppt@linux.ibm.comSigned-off-by: NMike Rapoport <rppt@linux.ibm.com>
      Acked-by: NGuo Ren <ren_guo@c-sky.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Ley Foon Tan <lftan@altera.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Palmer Dabbelt <palmer@sifive.com>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Sam Creasey <sammy@sammy.net>
      Cc: Vincent Chen <deanbo422@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      bd5ff066
    • M
      arm64: switch to generic version of pte allocation · 50f11a8a
      Mike Rapoport 提交于
      The PTE allocations in arm64 are identical to the generic ones modulo the
      GFP flags.
      
      Using the generic pte_alloc_one() functions ensures that the user page
      tables are allocated with __GFP_ACCOUNT set.
      
      The arm64 definition of PGALLOC_GFP is removed and replaced with
      GFP_PGTABLE_USER for p[gum]d_alloc_one() for the user page tables and
      GFP_PGTABLE_KERNEL for the kernel page tables. The KVM memory cache is now
      using GFP_PGTABLE_USER.
      
      The mappings created with create_pgd_mapping() are now using
      GFP_PGTABLE_KERNEL.
      
      The conversion to the generic version of pte_free_kernel() removes the NULL
      check for pte.
      
      The pte_free() version on arm64 is identical to the generic one and
      can be simply dropped.
      
      [cai@lca.pw: fix a bogus GFP flag in pgd_alloc()]
        Link: https://lore.kernel.org/r/1559656836-24940-1-git-send-email-cai@lca.pw/
      [and fix it more]
        Link: https://lore.kernel.org/linux-mm/20190617151252.GF16810@rapoport-lnx/
      Link: http://lkml.kernel.org/r/1557296232-15361-5-git-send-email-rppt@linux.ibm.comSigned-off-by: NMike Rapoport <rppt@linux.ibm.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Guo Ren <ren_guo@c-sky.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Ley Foon Tan <lftan@altera.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Palmer Dabbelt <palmer@sifive.com>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Sam Creasey <sammy@sammy.net>
      Cc: Vincent Chen <deanbo422@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      50f11a8a
    • M
      arm: switch to generic version of pte allocation · 28bcf593
      Mike Rapoport 提交于
      Replace __get_free_page() and alloc_pages() calls with the generic
      __pte_alloc_one_kernel() and __pte_alloc_one().
      
      There is no functional change for the kernel PTE allocation.
      
      The difference for the user PTEs, is that the clear_pte_table() is now
      called after pgtable_page_ctor() and the addition of __GFP_ACCOUNT to the
      GFP flags.
      
      The conversion to the generic version of pte_free_kernel() removes the NULL
      check for pte.
      
      The pte_free() version on arm is identical to the generic one and can be
      simply dropped.
      
      Link: http://lkml.kernel.org/r/1557296232-15361-4-git-send-email-rppt@linux.ibm.comSigned-off-by: NMike Rapoport <rppt@linux.ibm.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Guo Ren <ren_guo@c-sky.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Ley Foon Tan <lftan@altera.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Palmer Dabbelt <palmer@sifive.com>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Sam Creasey <sammy@sammy.net>
      Cc: Vincent Chen <deanbo422@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      28bcf593
    • M
      alpha: switch to generic version of pte allocation · bc3ace9b
      Mike Rapoport 提交于
      alpha allocates PTE pages with __get_free_page() and uses
      GFP_KERNEL | __GFP_ZERO for the allocations.
      
      Switch it to the generic version that does exactly the same thing for the
      kernel page tables and adds __GFP_ACCOUNT for the user PTEs.
      
      The alpha pte_free() and pte_free_kernel() versions are identical to the
      generic ones and can be simply dropped.
      
      Link: http://lkml.kernel.org/r/1557296232-15361-3-git-send-email-rppt@linux.ibm.comSigned-off-by: NMike Rapoport <rppt@linux.ibm.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Guo Ren <ren_guo@c-sky.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Ley Foon Tan <lftan@altera.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Palmer Dabbelt <palmer@sifive.com>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Sam Creasey <sammy@sammy.net>
      Cc: Vincent Chen <deanbo422@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      bc3ace9b
    • M
      asm-generic, x86: introduce generic pte_{alloc,free}_one[_kernel] · 5fba4af4
      Mike Rapoport 提交于
      Most architectures have identical or very similar implementation of
      pte_alloc_one_kernel(), pte_alloc_one(), pte_free_kernel() and
      pte_free().
      
      Add a generic implementation that can be reused across architectures and
      enable its use on x86.
      
      The generic implementation uses
      
      	GFP_KERNEL | __GFP_ZERO
      
      for the kernel page tables and
      
      	GFP_KERNEL | __GFP_ZERO | __GFP_ACCOUNT
      
      for the user page tables.
      
      The "base" functions for PTE allocation, namely __pte_alloc_one_kernel()
      and __pte_alloc_one() are intended for the architectures that require
      additional actions after actual memory allocation or must use non-default
      GFP flags.
      
      x86 is switched to use generic pte_alloc_one_kernel(), pte_free_kernel() and
      pte_free().
      
      x86 still implements pte_alloc_one() to allow run-time control of GFP
      flags required for "userpte" command line option.
      
      Link: http://lkml.kernel.org/r/1557296232-15361-2-git-send-email-rppt@linux.ibm.comSigned-off-by: NMike Rapoport <rppt@linux.ibm.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Guo Ren <ren_guo@c-sky.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Ley Foon Tan <lftan@altera.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Palmer Dabbelt <palmer@sifive.com>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Sam Creasey <sammy@sammy.net>
      Cc: Vincent Chen <deanbo422@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5fba4af4
    • G
      mm/gup.c: mark undo_dev_pagemap as __maybe_unused · 790c7369
      Guenter Roeck 提交于
      Several mips builds generate the following build warning.
      
        mm/gup.c:1788:13: warning: 'undo_dev_pagemap' defined but not used
      
      The function is declared unconditionally but only called from behind
      various ifdefs. Mark it __maybe_unused.
      
      Link: http://lkml.kernel.org/r/1562072523-22311-1-git-send-email-linux@roeck-us.netSigned-off-by: NGuenter Roeck <linux@roeck-us.net>
      Reviewed-by: NAndrew Morton <akpm@linux-foundation.org>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      790c7369
    • A
      mm/gup.c: remove some BUG_ONs from get_gate_page() · b5d1c39f
      Andy Lutomirski 提交于
      If we end up without a PGD or PUD entry backing the gate area, don't BUG
      -- just fail gracefully.
      
      It's not entirely implausible that this could happen some day on x86.  It
      doesn't right now even with an execute-only emulated vsyscall page because
      the fixmap shares the PUD, but the core mm code shouldn't rely on that
      particular detail to avoid OOPSing.
      
      Link: http://lkml.kernel.org/r/a1d9f4efb75b9d464e59fd6af00104b21c58f6f7.1561610798.git.luto@kernel.orgSigned-off-by: NAndy Lutomirski <luto@kernel.org>
      Reviewed-by: NKees Cook <keescook@chromium.org>
      Reviewed-by: NAndrew Morton <akpm@linux-foundation.org>
      Cc: Florian Weimer <fweimer@redhat.com>
      Cc: Jann Horn <jannh@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b5d1c39f
    • P
      mm/gup: speed up check_and_migrate_cma_pages() on huge page · aa712399
      Pingfan Liu 提交于
      Both hugetlb and thp locate on the same migration type of pageblock, since
      they are allocated from a free_list[].  Based on this fact, it is enough
      to check on a single subpage to decide the migration type of the whole
      huge page.  By this way, it saves (2M/4K - 1) times loop for pmd_huge on
      x86, similar on other archs.
      
      Furthermore, when executing isolate_huge_page(), it avoid taking global
      hugetlb_lock many times, and meanless remove/add to the local link list
      cma_page_list.
      
      [akpm@linux-foundation.org: make `i' and `step' unsigned]
      Link: http://lkml.kernel.org/r/1561612545-28997-1-git-send-email-kernelfans@gmail.comSigned-off-by: NPingfan Liu <kernelfans@gmail.com>
      Reviewed-by: NAndrew Morton <akpm@linux-foundation.org>
      Reviewed-by: NIra Weiny <ira.weiny@intel.com>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      aa712399
    • C
      mm: mark the page referenced in gup_hugepte · 520b4a44
      Christoph Hellwig 提交于
      All other get_user_page_fast cases mark the page referenced, so do this
      here as well.
      
      Link: http://lkml.kernel.org/r/20190625143715.1689-17-hch@lst.deSigned-off-by: NChristoph Hellwig <hch@lst.de>
      Cc: Andrey Konovalov <andreyknvl@google.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: David Miller <davem@davemloft.net>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: Jason Gunthorpe <jgg@mellanox.com>
      Cc: Khalid Aziz <khalid.aziz@oracle.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      520b4a44
    • C
      mm: switch gup_hugepte to use try_get_compound_head · 01a36916
      Christoph Hellwig 提交于
      This applies the overflow fixes from 8fde12ca ("mm: prevent
      get_user_pages() from overflowing page refcount") to the powerpc hugepd
      code and brings it back in sync with the other GUP cases.
      
      Link: http://lkml.kernel.org/r/20190625143715.1689-16-hch@lst.deSigned-off-by: NChristoph Hellwig <hch@lst.de>
      Cc: Andrey Konovalov <andreyknvl@google.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: David Miller <davem@davemloft.net>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: Jason Gunthorpe <jgg@mellanox.com>
      Cc: Khalid Aziz <khalid.aziz@oracle.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      01a36916
    • C
      mm: move the powerpc hugepd code to mm/gup.c · cbd34da7
      Christoph Hellwig 提交于
      While only powerpc supports the hugepd case, the code is pretty generic
      and I'd like to keep all GUP internals in one place.
      
      Link: http://lkml.kernel.org/r/20190625143715.1689-15-hch@lst.deSigned-off-by: NChristoph Hellwig <hch@lst.de>
      Cc: Andrey Konovalov <andreyknvl@google.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: David Miller <davem@davemloft.net>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: Jason Gunthorpe <jgg@mellanox.com>
      Cc: Khalid Aziz <khalid.aziz@oracle.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      cbd34da7