1. 30 April 2013, 5 commits
    • mm, vmalloc: iterate vmap_area_list, instead of vmlist, in vmallocinfo() · d4033afd
      Committed by Joonsoo Kim
      This patch is a preparatory step for removing vmlist entirely.  To that
      end, we change the code that iterates vmlist so that it iterates
      vmap_area_list instead.  The change is mostly trivial, but one thing
      should be noted.
      
      Using vmap_area_list in vmallocinfo() introduces an ordering problem on
      SMP systems.  In s_show(), we retrieve some values from vm_struct, but
      vm_struct's fields are not yet fully set up at the moment va->vm is
      assigned.  Full setup is signalled by removing the VM_UNLIST flag,
      without holding a lock.  So even when we see that VM_UNLIST has been
      removed, other CPUs are not guaranteed to observe the fully assigned
      vm_struct.  We therefore need smp_[rw]mb to ensure that the proper
      values are visible once VM_UNLIST is seen to be cleared.
      
      Therefore, this patch not only changes the iteration list, but also adds
      the appropriate smp_[rw]mb calls in the right places.
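
      The publish/observe pattern the patch describes can be sketched in
      userspace, with C11 release/acquire ordering standing in for the
      kernel's smp_wmb()/smp_rmb().  All names below are illustrative models,
      not the kernel's actual structures:

      ```c
      #include <assert.h>
      #include <stdatomic.h>

      /* Miniature model of the pattern: the writer fully initialises the
       * vm_struct fields, then clears the "unlisted" flag with release
       * ordering (the role smp_wmb() plays before the plain store in the
       * kernel); the reader tests the flag with acquire ordering (the role
       * smp_rmb() plays after the plain load) before trusting the fields. */

      #define VM_UNLIST_BIT 0x1u

      struct vm_struct_model {
          unsigned long addr;
          unsigned long size;
          _Atomic unsigned int flags;   /* bit 0 models VM_UNLIST */
      };

      static void publish(struct vm_struct_model *vm,
                          unsigned long addr, unsigned long size)
      {
          vm->addr = addr;
          vm->size = size;
          /* release: all writes above are visible before the flag clears */
          atomic_store_explicit(&vm->flags, 0, memory_order_release);
      }

      static int read_if_ready(struct vm_struct_model *vm,
                               unsigned long *addr, unsigned long *size)
      {
          /* acquire: if we see the flag cleared, we also see the fields */
          if (atomic_load_explicit(&vm->flags, memory_order_acquire)
              & VM_UNLIST_BIT)
              return 0;                 /* not fully set up yet */
          *addr = vm->addr;
          *size = vm->size;
          return 1;
      }
      ```

      A reader that observes the cleared flag is thereby guaranteed to
      observe the field values written before it, which is exactly the
      guarantee the added barriers provide.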
      Signed-off-by: Joonsoo Kim <js1304@gmail.com>
      Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Dave Anderson <anderson@redhat.com>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d4033afd
    • mm, vmalloc: iterate vmap_area_list in get_vmalloc_info() · f98782dd
      Committed by Joonsoo Kim
      This patch is a preparatory step for removing vmlist entirely.  To that
      end, we change the code that iterates vmlist so that it iterates
      vmap_area_list instead.  The change is mostly trivial, but one thing
      should be noted.
      
      vmlist lacks information about some areas in the vmalloc address space.
      For example, vm_map_ram() allocates an area in the vmalloc address
      space, but it doesn't link it into vmlist.  Since it is better to
      provide full information about the vmalloc address space, we stop using
      va->vm and use vmap_area directly.  This makes get_vmalloc_info() more
      precise.
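
      A toy model of what walking vmap_area records buys get_vmalloc_info():
      every allocated range is visible, including vm_map_ram() areas that
      never appear on vmlist.  Names here are illustrative, not the kernel's:

      ```c
      #include <assert.h>
      #include <stddef.h>

      struct vmap_area_model {
          unsigned long va_start, va_end;   /* [start, end) */
          struct vmap_area_model *next;     /* list sorted by va_start */
      };

      struct vmalloc_info_model {
          unsigned long used;               /* bytes in allocated areas */
          unsigned long largest_chunk;      /* biggest free gap */
      };

      /* Walk the sorted area list once, summing allocated bytes and tracking
       * the largest hole between areas (and before/after the range ends). */
      static void get_info(const struct vmap_area_model *head,
                           unsigned long vstart, unsigned long vend,
                           struct vmalloc_info_model *info)
      {
          unsigned long prev_end = vstart;

          info->used = 0;
          info->largest_chunk = 0;
          for (const struct vmap_area_model *va = head; va; va = va->next) {
              info->used += va->va_end - va->va_start;
              if (va->va_start - prev_end > info->largest_chunk)
                  info->largest_chunk = va->va_start - prev_end;
              prev_end = va->va_end;
          }
          if (vend - prev_end > info->largest_chunk)
              info->largest_chunk = vend - prev_end;
      }
      ```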
      Signed-off-by: Joonsoo Kim <js1304@gmail.com>
      Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Dave Anderson <anderson@redhat.com>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      f98782dd
    • mm, vmalloc: iterate vmap_area_list, instead of vmlist in vread/vwrite() · e81ce85f
      Committed by Joonsoo Kim
      Now, while we hold vmap_area_lock, va->vm can't be discarded.  So we
      can safely access va->vm when iterating vmap_area_list with
      vmap_area_lock held.  With this property, change the vmlist iteration
      code in vread/vwrite() to iterate vmap_area_list.
      
      There is a small difference related to locking: vmlist_lock is a mutex,
      but vmap_area_lock is a spinlock.  This may introduce spinning overhead
      while vread/vwrite() executes.  But these are debug-oriented functions,
      so the overhead is not a real problem in the common case.
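
      The core of the vread() walk can be sketched as: visit each area under
      one lock, copy only the bytes an area covers, and zero-fill the holes
      between areas.  This is a hypothetical userspace model, not the kernel
      function:

      ```c
      #include <assert.h>
      #include <stddef.h>
      #include <string.h>

      struct area_model {
          unsigned long start;              /* virtual start address */
          size_t len;
          const char *mem;                  /* backing bytes of the area */
          struct area_model *next;          /* list sorted by start */
      };

      /* Copy `count` bytes of the range beginning at `addr` into `buf`,
       * taking bytes from whichever area covers each address and leaving
       * zeroes in the gaps, the way vread() handles holes between areas. */
      static void vread_model(char *buf, unsigned long addr, size_t count,
                              const struct area_model *list)
      {
          memset(buf, 0, count);            /* holes read back as zeroes */
          for (const struct area_model *a = list; a; a = a->next) {
              unsigned long lo = addr > a->start ? addr : a->start;
              unsigned long hi_req = addr + count;
              unsigned long hi_area = a->start + a->len;
              unsigned long hi = hi_req < hi_area ? hi_req : hi_area;

              if (lo < hi)  /* request overlaps this area */
                  memcpy(buf + (lo - addr), a->mem + (lo - a->start),
                         hi - lo);
          }
      }
      ```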
      Signed-off-by: Joonsoo Kim <js1304@gmail.com>
      Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Dave Anderson <anderson@redhat.com>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e81ce85f
    • mm, vmalloc: protect va->vm by vmap_area_lock · c69480ad
      Committed by Joonsoo Kim
      Inserting and removing an entry in vmlist takes linear time, so it is
      inefficient.  Following patches will try to remove vmlist entirely;
      this patch is a preparatory step for that.
      
      To remove vmlist, code that iterates vmlist should be changed to
      iterate vmap_area_list.  Before implementing that, we should make sure
      that accessing va->vm while iterating vmap_area_list doesn't cause a
      race condition.  This patch ensures that when iterating
      vmap_area_list, there is no race condition in accessing vm_struct.
      Signed-off-by: Joonsoo Kim <js1304@gmail.com>
      Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Dave Anderson <anderson@redhat.com>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c69480ad
    • mm, vmalloc: move get_vmalloc_info() to vmalloc.c · db3808c1
      Committed by Joonsoo Kim
      Currently get_vmalloc_info() lives in fs/proc/mmu.c.  There is no
      reason this code must be there: its implementation needs vmlist_lock
      and iterates vmlist, which should be internal data structures of
      vmalloc.
      
      For maintainability, it is preferable that vmlist_lock and vmlist be
      used only in vmalloc.c, so move the code to vmalloc.c.
      Signed-off-by: Joonsoo Kim <js1304@gmail.com>
      Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Dave Anderson <anderson@redhat.com>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      db3808c1
  2. 24 February 2013, 1 commit
  3. 12 December 2012, 1 commit
  4. 09 October 2012, 2 commits
  5. 01 August 2012, 2 commits
  6. 30 July 2012, 2 commits
  7. 24 July 2012, 1 commit
  8. 30 May 2012, 2 commits
  9. 20 March 2012, 1 commit
  10. 13 January 2012, 1 commit
  11. 11 January 2012, 1 commit
  12. 21 December 2011, 1 commit
  13. 09 December 2011, 1 commit
  14. 19 November 2011, 1 commit
    • mm: add vm_area_add_early() · be9b7335
      Committed by Nicolas Pitre
      The existing vm_area_register_early() allows for early vmalloc space
      allocation.  However upcoming cleanups in the ARM architecture require
      that some fixed locations in the vmalloc area be reserved also very early.
      
      The name "vm_area_register_early" would have been a good name for the
      reservation part without the allocation.  Since it is already in use with
      different semantics, let's create vm_area_add_early() instead.
      
      Both vm_area_register_early() and vm_area_add_early() can be used
      together, meaning that the former is now implemented using the latter.
      It is ensured that no conflicting areas are added, but no attempt is
      made to make the allocation scheme in vm_area_register_early() more
      sophisticated.
      After all, you must know what you're doing when using those functions.
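
      The reservation part of this idea can be modelled roughly as a sorted
      insert that refuses overlapping ranges.  Field and function names below
      loosely mirror the kernel but are a sketch, not the real
      implementation:

      ```c
      #include <assert.h>
      #include <stddef.h>

      struct vm_model {
          unsigned long addr, size;
          struct vm_model *next;
      };

      /* Insert a caller-chosen, fixed range into the early vm list, keeping
       * the list sorted by address.  Returns 0 on success, -1 if the range
       * overlaps an existing area (the "no conflicting areas" check). */
      static int vm_area_add_early_model(struct vm_model **list,
                                         struct vm_model *vm)
      {
          struct vm_model **p;

          for (p = list; *p; p = &(*p)->next) {
              if (vm->addr + vm->size <= (*p)->addr)
                  break;                        /* fits entirely before *p */
              if ((*p)->addr + (*p)->size > vm->addr)
                  return -1;                    /* overlap detected */
          }
          vm->next = *p;
          *p = vm;
          return 0;
      }
      ```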
      Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org>
      Acked-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: linux-mm@kvack.org
      be9b7335
  15. 17 November 2011, 1 commit
  16. 01 November 2011, 3 commits
  17. 15 September 2011, 1 commit
    • mm: sync vmalloc address space page tables in alloc_vm_area() · 461ae488
      Committed by David Vrabel
      Xen backend drivers (e.g., blkback and netback) would sometimes fail to
      map grant pages into the vmalloc address space allocated with
      alloc_vm_area().  The GNTTABOP_map_grant_ref would fail because Xen could
      not find the page (in the L2 table) containing the PTEs it needed to
      update.
      
      (XEN) mm.c:3846:d0 Could not find L1 PTE for address fbb42000
      
      netback and blkback were making the hypercall from a kernel thread where
      task->active_mm != &init_mm and alloc_vm_area() was only updating the page
      tables for init_mm.  The usual method of deferring the update to the page
      tables of other processes (i.e., after taking a fault) doesn't work as a
      fault cannot occur during the hypercall.
      
      This would work on some systems depending on what else was using vmalloc.
      
      Fix this by reverting ef691947 ("vmalloc: remove vmalloc_sync_all()
      from alloc_vm_area()") and adding a comment to explain why it's needed.
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Ian Campbell <Ian.Campbell@citrix.com>
      Cc: Keir Fraser <keir.xen@gmail.com>
      Cc: <stable@kernel.org>		[3.0.x]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      461ae488
  18. 15 August 2011, 1 commit
  19. 27 July 2011, 1 commit
  20. 21 July 2011, 2 commits
  21. 25 May 2011, 2 commits
    • mm: print vmalloc() state after allocation failures · 22943ab1
      Committed by Dave Hansen
      I was tracking down a page allocation failure that ended up in vmalloc().
      Since vmalloc() uses 0-order pages, if somebody asks for an insane amount
      of memory, we'll still get a warning with "order:0" in it.  That's not
      very useful.
      
      During recovery, vmalloc() also nicely frees all of the memory that it
      got up to the point of the failure.  That is wonderful, but it also
      quickly hides any issues.  We have a much different situation if
      vmalloc() repeatedly fails 10GB into:
      
      	vmalloc(100 * 1<<30);
      
      versus repeatedly failing 4096 bytes into a:
      
      	vmalloc(8192);
      
      This patch will print out messages that look like this:
      
      [   68.123503] vmalloc: allocation failure, allocated 66805763 of 13426688 bytes
      [   68.124218] bash: page allocation failure: order:0, mode:0xd2
      [   68.124811] Pid: 3770, comm: bash Not tainted 2.6.39-rc3-00082-g85f2e689-dirty #333
      [   68.125579] Call Trace:
      [   68.125853]  [<ffffffff810f6da6>] warn_alloc_failed+0x146/0x170
      [   68.126464]  [<ffffffff8107e05c>] ? printk+0x6c/0x70
      [   68.126791]  [<ffffffff8112b5d4>] ? alloc_pages_current+0x94/0xe0
      [   68.127661]  [<ffffffff8111ed37>] __vmalloc_node_range+0x237/0x290
      ...
      
      The 'order' variable is added for clarity when calling warn_alloc_failed()
      to avoid having an unexplained '0' as an argument.
      
      The 'tmp_mask' is because adding an open-coded '| __GFP_NOWARN' would take
      us over 80 columns for the alloc_pages_node() call.  If we are going to
      add a line, it might as well be one that makes the sucker easier to read.
      
      As a side issue, I also noticed that ctl_ioctl() does vmalloc() based
      solely on an unverified value passed in from userspace.  Granted, it's
      under CAP_SYS_ADMIN, but it still frightens me a bit.
      Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      22943ab1
    • mm/vmalloc: remove guard page from between vmap blocks · 248ac0e1
      Committed by Johannes Weiner
      The vmap allocator is used to, among other things, allocate per-cpu vmap
      blocks, where each vmap block is naturally aligned to its own size.
      Obviously, leaving a guard page after each vmap area forbids packing vmap
      blocks efficiently and can make the kernel run out of possible vmap blocks
      long before overall vmap space is exhausted.
      
      The new interface to map a user-supplied page array into linear vmalloc
      space (vm_map_ram) insists on allocating from a vmap block (instead of
      falling back to a custom area) when the area size is below a certain
      threshold.  With heavy users of this interface (e.g.  XFS) and limited
      vmalloc space on 32-bit, vmap block exhaustion is a real problem.
      
      Remove the guard page from the core vmap allocator.  vmalloc and the old
      vmap interface enforce a guard page on their own at a higher level.
      
      Note that without this patch, we had accidental guard pages after those
      vm_map_ram areas that happened to be at the end of a vmap block, but not
      between every area.  This patch removes this accidental guard page only.
      
      If we want guard pages after every vm_map_ram area, this should be done
      separately.  And just like with vmalloc and the old interface on a
      different level, not in the core allocator.
      
      Mel pointed out: "If necessary, the guard page could be reintroduced as a
      debugging-only option (CONFIG_DEBUG_PAGEALLOC?).  Otherwise it seems
      reasonable."
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Dave Chinner <david@fromorbit.com>
      Acked-by: Mel Gorman <mel@csn.ul.ie>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      248ac0e1
  22. 21 May 2011, 1 commit
  23. 23 March 2011, 2 commits
    • vmalloc: remove confusing comment on vwrite() · a42931bf
      Committed by Namhyung Kim
      KM_USER1 is never used on the vwrite() path, so the caller doesn't need
      to guarantee it is unused.  The only thing the caller should guarantee
      is KM_USER0, and that is already documented in a comment.
      Signed-off-by: Namhyung Kim <namhyung@gmail.com>
      Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a42931bf
    • mm: vmap area cache · 89699605
      Committed by Nick Piggin
      Provide a free area cache for the vmalloc virtual address allocator, based
      on the algorithm used by the user virtual memory allocator.
      
      This reduces the number of rbtree operations and linear traversals over
      the vmap extents in order to find a free area, by starting off at the last
      point that a free area was found.
      
      The free area cache is reset if areas are freed behind it, or if we are
      searching for a smaller area or alignment than last time.  So allocation
      patterns are not changed (verified by corner-case and random test cases in
      userspace testing).
      
      This solves a regression caused by lazy vunmap TLB purging introduced in
      db64fe02 (mm: rewrite vmap layer).  That patch will leave extents in the
      vmap allocator after they are vunmapped, and until a significant number
      accumulate that can be flushed in a single batch.  So in a workload that
      vmalloc/vfree frequently, a chain of extents will build up from
      VMALLOC_START address, which have to be iterated over each time (giving an
      O(n) type of behaviour).
      
      After this patch, the search will start from where it left off, giving
      closer to an amortized O(1).
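
      The cached-starting-point idea can be modelled roughly as below: resume
      the linear search for a free hole where the previous search succeeded,
      and restart from the beginning when the request shrinks (since the
      skipped holes might now fit).  This is a sketch of the technique only;
      the real allocator works on an rbtree of vmap_areas:

      ```c
      #include <assert.h>
      #include <stddef.h>

      struct hole { unsigned long start, len; };

      struct area_cache {
          size_t idx;                 /* resume the search at this hole */
          unsigned long last_size;    /* size of the previous request */
      };

      /* Return the start of the first hole of at least `size` bytes,
       * scanning from the cached index when the request is no smaller than
       * the previous one; -1 if nothing fits. */
      static long find_hole(const struct hole *holes, size_t n,
                            struct area_cache *c, unsigned long size)
      {
          if (size < c->last_size)
              c->idx = 0;             /* cache reset: smaller request */
          c->last_size = size;
          for (size_t i = c->idx; i < n; i++) {
              if (holes[i].len >= size) {
                  c->idx = i;         /* remember where we found space */
                  return (long)holes[i].start;
              }
          }
          return -1;
      }
      ```

      Repeated allocations of similar sizes then skip the long chain of
      already-exhausted extents at the start of the space, which is the O(n)
      behaviour the patch removes.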
      
      This is verified to solve the regressions reported by Steven in GFS2
      and by Avi in KVM.
      
      Hugh's update:
      
      : I tried out the recent mmotm, and on one machine was fortunate to hit
      : the BUG_ON(first->va_start < addr) which seems to have been stalling
      : your vmap area cache patch ever since May.
      
      : I can get you addresses etc, I did dump a few out; but once I stared
      : at them, it was easier just to look at the code: and I cannot see how
      : you would be so sure that first->va_start < addr, once you've done
      : that addr = ALIGN(max(...), align) above, if align is over 0x1000
      : (align was 0x8000 or 0x4000 in the cases I hit: ioremaps like Steve).
      
      : I originally got around it by just changing the
      : 		if (first->va_start < addr) {
      : to
      : 		while (first->va_start < addr) {
      : without thinking about it any further; but that seemed unsatisfactory,
      : why would we want to loop here when we've got another very similar
      : loop just below it?
      
      : I am never going to admit how long I've spent trying to grasp your
      : "while (n)" rbtree loop just above this, the one with the peculiar
      : 		if (!first && tmp->va_start < addr + size)
      : in.  That's unfamiliar to me, I'm guessing it's designed to save a
      : subsequent rb_next() in a few circumstances (at risk of then setting
      : a wrong cached_hole_size?); but they did appear few to me, and I didn't
      : feel I could sign off something with that in when I don't grasp it,
      : and it seems responsible for extra code and mistaken BUG_ON below it.
      
      : I've reverted to the familiar rbtree loop that find_vma() does (but
      : with va_end >= addr as you had, to respect the additional guard page):
      : and then (given that cached_hole_size starts out 0) I don't see the
      : need for any complications below it.  If you do want to keep that loop
      : as you had it, please add a comment to explain what it's trying to do,
      : and where addr is relative to first when you emerge from it.
      
      : Aren't your tests "size <= cached_hole_size" and
      : "addr + size > first->va_start" forgetting the guard page we want
      : before the next area?  I've changed those.
      
      : I have not changed your many "addr + size - 1 < addr" overflow tests,
      : but have since come to wonder, shouldn't they be "addr + size < addr"
      : tests - won't the vend checks go wrong if addr + size is 0?
      
      : I have added a few comments - Wolfgang Wander's 2.6.13 description of
      : 1363c3cd Avoiding mmap fragmentation
      : helped me a lot, perhaps a pointer to that would be good too.  And I found
      : it easier to understand when I renamed cached_start slightly and moved the
      : overflow label down.
      
      : This patch would go after your mm-vmap-area-cache.patch in mmotm.
      : Trivially, nobody is going to get that BUG_ON with this patch, and it
      : appears to work fine on my machines; but I have not given it anything like
      : the testing you did on your original, and may have broken all the
      : performance you were aiming for.  Please take a look and test it out
      : integrate with yours if you're satisfied - thanks.
      
      [akpm@linux-foundation.org: add locking comment]
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
      Reported-and-tested-by: Steven Whitehouse <swhiteho@redhat.com>
      Reported-and-tested-by: Avi Kivity <avi@redhat.com>
      Tested-by: "Barry J. Marson" <bmarson@redhat.com>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      89699605
  24. 14 January 2011, 4 commits