1. 15 December 2016, 6 commits
    • mm: use vmf->address instead of of vmf->virtual_address · 1a29d85e
      Committed by Jan Kara
      Every single user of vmf->virtual_address cast that entry to unsigned
      long before doing anything with it, so the type of virtual_address
      does not really provide any additional safety.  Just use the masked
      vmf->address, which already has the appropriate type.
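
      In a fault handler the change looks roughly like this (a before/after
      sketch for illustration, not a hunk from the patch itself):

      /* Before: every user had to cast the void __user * entry. */
      unsigned long addr = (unsigned long)vmf->virtual_address;

      /* After: vmf->address is already an unsigned long, masked to the
       * start of the faulting page, so it can be used directly. */
      unsigned long addr = vmf->address;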
      
      Link: http://lkml.kernel.org/r/1479460644-25076-3-git-send-email-jack@suse.cz
      Signed-off-by: Jan Kara <jack@suse.cz>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: join struct fault_env and vm_fault · 82b0f8c3
      Committed by Jan Kara
      Currently we have two different structures for passing fault information
      around - struct vm_fault and struct fault_env.  DAX will need more
      information in struct vm_fault to handle its faults, so the content of
      that structure would become even closer to fault_env.  Furthermore it
      would need to generate struct fault_env to be able to call some of the
      generic functions.  So at this point I don't think there's much use in
      keeping these two structures separate.  Just embed into struct vm_fault
      all that is needed to use it for both purposes.
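
      The merged structure ends up roughly of this shape (fields abridged
      and comments paraphrased here; see the patch for the exact layout):

      struct vm_fault {
              struct vm_area_struct *vma;     /* target VMA (from fault_env) */
              unsigned int flags;             /* FAULT_FLAG_xxx flags */
              gfp_t gfp_mask;                 /* gfp mask for allocations */
              pgoff_t pgoff;                  /* logical offset based on vma */
              unsigned long address;          /* faulting virtual address */
              pmd_t *pmd;                     /* pmd entry for 'address' */
              pte_t orig_pte;                 /* PTE value at fault time */
              struct page *cow_page;          /* page a handler may COW */
              struct page *page;              /* page returned by ->fault */
              pte_t *pte;                     /* valid only with ptl held */
              spinlock_t *ptl;
              pgtable_t prealloc_pte;         /* preallocated page table */
      };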
      
      Link: http://lkml.kernel.org/r/1479460644-25076-2-git-send-email-jack@suse.cz
      Signed-off-by: Jan Kara <jack@suse.cz>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: unexport __get_user_pages_unlocked() · 8b7457ef
      Committed by Lorenzo Stoakes
      Unexport the low-level __get_user_pages_unlocked() function and replace
      its invocations with calls to more appropriate higher-level functions.
      
      In hva_to_pfn_slow() we are able to replace __get_user_pages_unlocked()
      with get_user_pages_unlocked() since we can now pass gup_flags.
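
      The resulting call is roughly of this shape (a simplified sketch of
      the hva_to_pfn_slow() path; variable names and surrounding logic are
      illustrative):

      unsigned int flags = FOLL_HWPOISON;
      struct page *page;
      long npages;

      if (write_fault)
              flags |= FOLL_WRITE;

      /* gup_flags can now be passed directly, so the low-level
       * __get_user_pages_unlocked() variant is no longer needed here. */
      npages = get_user_pages_unlocked(addr, 1, &page, flags);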
      
      In async_pf_execute() and process_vm_rw_single_vec() we need to pass
      different tsk, mm arguments, so get_user_pages_remote() is the sane
      replacement in these cases (after adding manual acquisition and
      release of mmap_sem).
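
      The replacement pattern in those callers is roughly the following
      sketch (identifiers are illustrative, and the new 'locked' argument
      is left NULL here for brevity; the next entry below shows it in use):

      /* task/mm may belong to another process, hence the _remote variant;
       * mmap_sem of that mm must be taken explicitly around the call. */
      down_read(&mm->mmap_sem);
      pinned = get_user_pages_remote(task, mm, addr, nr_pages,
                                     gup_flags, pages, NULL, NULL);
      up_read(&mm->mmap_sem);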
      
      Additionally get_user_pages_remote() reintroduces use of the FOLL_TOUCH
      flag.  However, this flag was originally silently dropped by commit
      1e987790 ("mm/gup: Introduce get_user_pages_remote()"), so this
      appears to have been unintentional and reintroducing it is therefore not
      an issue.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Link: http://lkml.kernel.org/r/20161027095141.2569-3-lstoakes@gmail.com
      Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krcmar <rkrcmar@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: add locked parameter to get_user_pages_remote() · 5b56d49f
      Committed by Lorenzo Stoakes
      Patch series "mm: unexport __get_user_pages_unlocked()".
      
      This patch series continues the cleanup of get_user_pages*() functions
      taking advantage of the fact we can now pass gup_flags as we please.
      
      It first adds a 'locked' parameter to get_user_pages_remote() to
      allow its callers to utilise VM_FAULT_RETRY functionality.  This is
      necessary as the invocation of
      __get_user_pages_unlocked() in process_vm_rw_single_vec() makes use of
      this and no other existing higher level function would allow it to do
      so.
      
      Secondly existing callers of __get_user_pages_unlocked() are replaced
      with the appropriate higher-level replacement -
      get_user_pages_unlocked() if the current task and memory descriptor are
      referenced, or get_user_pages_remote() if other task/memory descriptors
      are referenced (after acquiring mmap_sem).
      
      This patch (of 2):
      
      Add an int *locked parameter to get_user_pages_remote() to allow
      VM_FAULT_RETRY faulting behaviour similar to get_user_pages_[un]locked().
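
      A caller that wants the VM_FAULT_RETRY behaviour would use the new
      parameter roughly like this (a sketch with illustrative identifiers):

      int locked = 1;
      long pinned;

      down_read(&mm->mmap_sem);
      pinned = get_user_pages_remote(task, mm, addr, nr_pages, gup_flags,
                                     pages, NULL, &locked);
      /* On VM_FAULT_RETRY the gup code may drop mmap_sem and clear
       * 'locked', so only release the semaphore if it is still held. */
      if (locked)
              up_read(&mm->mmap_sem);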
      
      Taking into account the previous adjustments to the get_user_pages*()
      functions allowing gup_flags to be passed, we are now in a position
      where __get_user_pages_unlocked() need only be exported for its
      ability to allow VM_FAULT_RETRY behaviour.  This adjustment allows us
      to subsequently unexport __get_user_pages_unlocked() as well as
      allowing for future flexibility in the use of get_user_pages_remote().
      
      [sfr@canb.auug.org.au: merge fix for get_user_pages_remote API change]
        Link: http://lkml.kernel.org/r/20161122210511.024ec341@canb.auug.org.au
      Link: http://lkml.kernel.org/r/20161027095141.2569-2-lstoakes@gmail.com
      Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krcmar <rkrcmar@redhat.com>
      Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: add support for releasing multiple instances of a page · 44fdffd7
      Committed by Alexander Duyck
      Add a function that allows us to batch-free a page that has multiple
      references outstanding.  Specifically, this function can be used to
      drop a page being used in the page frag alloc cache.  With this,
      drivers can make use of functionality similar to the page frag alloc
      cache without having to work around the fact that there is no
      function that frees multiple references.
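
      A sketch of what such a batched release looks like inside
      mm/page_alloc.c (the function name here is illustrative; see the
      patch for the exported interface):

      void drain_page_frag_refs(struct page *page, unsigned int count)
      {
              /* Drop 'count' references at once and free the page only
               * when the last reference goes away. */
              if (page_ref_sub_and_test(page, count)) {
                      unsigned int order = compound_order(page);

                      if (order == 0)
                              free_hot_cold_page(page, false);
                      else
                              __free_pages_ok(page, order);
              }
      }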
      
      Link: http://lkml.kernel.org/r/20161110113606.76501.70752.stgit@ahduyck-blue-test.jf.intel.com
      Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: Chris Metcalf <cmetcalf@mellanox.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Hans-Christian Noren Egtvedt <egtvedt@samfundet.no>
      Cc: Helge Deller <deller@gmx.de>
      Cc: James Hogan <james.hogan@imgtec.com>
      Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: Keguang Zhang <keguang.zhang@gmail.com>
      Cc: Ley Foon Tan <lftan@altera.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Steven Miao <realmz6@gmail.com>
      Cc: Tobias Klauser <tklauser@distanz.ch>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm, compaction: allow compaction for GFP_NOFS requests · 73e64c51
      Committed by Michal Hocko
      Compaction has been disabled for GFP_NOFS and GFP_NOIO requests since
      direct compaction was introduced by commit 56de7263 ("mm:
      compaction: direct compact when a high-order allocation fails").  The
      main reason is that the migration of page cache pages might recurse
      back into the fs/io layer and we could potentially deadlock.  This is
      overly conservative because all the anonymous memory is migratable in
      the GFP_NOFS context just fine.  This might be a large portion of the
      memory in many/most workloads.
      
      Remove the GFP_NOFS restriction and make sure that we skip all fs
      pages (those with a mapping) while isolating pages to be migrated.
      We cannot consider even clean fs pages because they might need a
      metadata update, so only isolate pages without any mapping for nofs
      requests.
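
      Expressed as a predicate, the new rule is roughly the following
      sketch (the name is illustrative; the actual check sits in the page
      isolation loop of mm/compaction.c):

      static bool can_isolate_for_migration(struct page *page, gfp_t gfp_mask)
      {
              /* Full compaction (__GFP_FS set): no extra restriction. */
              if (gfp_mask & __GFP_FS)
                      return true;

              /* GFP_NOFS: only pages with no address_space mapping (e.g.
               * anonymous pages) are safe, because migrating a page cache
               * page could recurse back into the filesystem. */
              return page_mapping(page) == NULL;
      }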
      
      The effect of this patch will probably be very limited in many/most
      workloads because higher-order GFP_NOFS requests are quite rare,
      although different configurations might lead to very different
      results.  David Chinner has mentioned a heavy metadata workload with
      64kB directory blocks which, to quote him:
      
      : Unfortunately, there was an era of cargo cult configuration tweaks in the
      : Ceph community that has resulted in a large number of production machines
      : with XFS filesystems configured this way.  And a lot of them store large
      : numbers of small files and run under significant sustained memory
      : pressure.
      :
      : I am slowly working towards getting rid of these high order allocations and
      : replacing them with the equivalent number of single page allocations, but
      : I haven't got that (complex) change working yet.
      
      We can do the following to simulate that workload:
      $ mkfs.xfs -f -n size=64k <dev>
      $ mount <dev> /mnt/scratch
      $ time ./fs_mark  -D  10000  -S0  -n  100000  -s  0  -L  32 \
              -d  /mnt/scratch/0  -d  /mnt/scratch/1 \
              -d  /mnt/scratch/2  -d  /mnt/scratch/3 \
              -d  /mnt/scratch/4  -d  /mnt/scratch/5 \
              -d  /mnt/scratch/6  -d  /mnt/scratch/7 \
              -d  /mnt/scratch/8  -d  /mnt/scratch/9 \
              -d  /mnt/scratch/10  -d  /mnt/scratch/11 \
              -d  /mnt/scratch/12  -d  /mnt/scratch/13 \
              -d  /mnt/scratch/14  -d  /mnt/scratch/15
      
      and indeed it hammers the system with many high-order GFP_NOFS
      requests, as per a simple tracepoint filter during the load:
      $ echo '!(gfp_flags & 0x80) && (gfp_flags &0x400000)' > $TRACE_MNT/events/kmem/mm_page_alloc/filter
      I am getting
      5287609 order=0
           37 order=1
      1594905 order=2
      3048439 order=3
      6699207 order=4
        66645 order=5
      
      My testing was done in a kvm guest so performance numbers should be
      taken with a grain of salt but there seems to be a difference when the
      patch is applied:
      
      * Original kernel
      FSUse%        Count         Size    Files/sec     App Overhead
           1      1600000            0       4300.1         20745838
           3      3200000            0       4239.9         23849857
           5      4800000            0       4243.4         25939543
           6      6400000            0       4248.4         19514050
           8      8000000            0       4262.1         20796169
           9      9600000            0       4257.6         21288675
          11     11200000            0       4259.7         19375120
          13     12800000            0       4220.7         22734141
          14     14400000            0       4238.5         31936458
          16     16000000            0       4231.5         23409901
          18     17600000            0       4045.3         23577700
          19     19200000            0       2783.4         58299526
          21     20800000            0       2678.2         40616302
          23     22400000            0       2693.5         83973996
      
      and XFS complained about memory allocations not making progress:
      [ 2304.372647] XFS: fs_mark(3289) possible memory allocation deadlock size 65624 in kmem_alloc (mode:0x2408240)
      [ 2304.443323] XFS: fs_mark(3285) possible memory allocation deadlock size 65728 in kmem_alloc (mode:0x2408240)
      [ 4796.772477] XFS: fs_mark(3424) possible memory allocation deadlock size 46936 in kmem_alloc (mode:0x2408240)
      [ 4796.775329] XFS: fs_mark(3423) possible memory allocation deadlock size 51416 in kmem_alloc (mode:0x2408240)
      [ 4797.388808] XFS: fs_mark(3424) possible memory allocation deadlock size 65728 in kmem_alloc (mode:0x2408240)
      
      * Patched kernel
      FSUse%        Count         Size    Files/sec     App Overhead
           1      1600000            0       4289.1         19243934
           3      3200000            0       4241.6         32828865
           5      4800000            0       4248.7         32884693
           6      6400000            0       4314.4         19608921
           8      8000000            0       4269.9         24953292
           9      9600000            0       4270.7         33235572
          11     11200000            0       4346.4         40817101
          13     12800000            0       4285.3         29972397
          14     14400000            0       4297.2         20539765
          16     16000000            0       4219.6         18596767
          18     17600000            0       4273.8         49611187
          19     19200000            0       4300.4         27944451
          21     20800000            0       4270.6         22324585
          22     22400000            0       4317.6         22650382
          24     24000000            0       4065.2         22297964
      
      So the drop-off at Count 19200000 didn't happen and there was only a
      single warning about an allocation not making progress:
      [ 3063.815003] XFS: fs_mark(3272) possible memory allocation deadlock size 65624 in kmem_alloc (mode:0x2408240)
      
      This suggests that the patch has helped even though there is not all
      that much anonymous memory, as the workload mostly generates fs
      metadata.  I assume the success rate would be higher with more
      anonymous memory, which should be the case in many workloads.
      
      [akpm@linux-foundation.org: fix comment]
      Link: http://lkml.kernel.org/r/20161012114721.31853-1-mhocko@kernel.org
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Joonsoo Kim <js1304@gmail.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 13 December 2016, 34 commits