1. 07 8月, 2014 2 次提交
    • E
      mm/vmalloc.c: add a schedule point to vmalloc() · 660654f9
      Eric Dumazet 提交于
      It is not uncommon on busy servers to get stuck hundred of ms in
      vmalloc() calls (like file descriptor expansions).
      
      Add a cond_resched() to __vmalloc_area_node() to be gentle to
      other tasks.
      
      [akpm@linux-foundation.org: only do it for __GFP_WAIT, per David]
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Hugh Dickins <hughd@google.com>
      Acked-by: NDavid Rientjes <rientjes@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      660654f9
    • J
      vmalloc: use rcu list iterator to reduce vmap_area_lock contention · 474750ab
      Joonsoo Kim 提交于
      Richard Yao reported a month ago that his system have a trouble with
      vmap_area_lock contention during performance analysis by /proc/meminfo.
      Andrew asked why his analysis checks /proc/meminfo stressfully, but he
      didn't answer it.
      
        https://lkml.org/lkml/2014/4/10/416
      
      Although I'm not sure that this is right usage or not, there is a
      solution reducing vmap_area_lock contention with no side-effect.  That
      is just to use rcu list iterator in get_vmalloc_info().
      
      rcu can be used in this function because all RCU protocol is already
      respected by writers, since Nick Piggin commit db64fe02 ("mm:
      rewrite vmap layer") back in linux-2.6.28
      
      Specifically :
         insertions use list_add_rcu(),
         deletions use list_del_rcu() and kfree_rcu().
      
      Note the rb tree is not used from rcu reader (it would not be safe),
      only the vmap_area_list has full RCU protection.
      
      Note that __purge_vmap_area_lazy() already uses this rcu protection.
      
              rcu_read_lock();
              list_for_each_entry_rcu(va, &vmap_area_list, list) {
                      if (va->flags & VM_LAZY_FREE) {
                              if (va->va_start < *start)
                                      *start = va->va_start;
                              if (va->va_end > *end)
                                      *end = va->va_end;
                              nr += (va->va_end - va->va_start) >> PAGE_SHIFT;
                              list_add_tail(&va->purge_list, &valist);
                              va->flags |= VM_LAZY_FREEING;
                              va->flags &= ~VM_LAZY_FREE;
                      }
              }
              rcu_read_unlock();
      
      Peter:
      
      : While rcu list traversal over the vmap_area_list is safe, this may
      : arrive at different results than the spinlocked version. The rcu list
      : traversal version will not be a 'snapshot' of a single, valid instant
      : of the entire vmap_area_list, but rather a potential amalgam of
      : different list states.
      
      Joonsoo:
      
      : Yes, you are right, but I don't think that we should be strict here.
      : Meminfo is already not a 'snapshot' at specific time.  While we try to get
      : certain stats, the other stats can change.  And, although we may arrive at
      : different results than the spinlocked version, the difference would not be
      : large and would not make serious side-effect.
      
      [edumazet@google.com: add more commit description]
      Signed-off-by: NJoonsoo Kim <iamjoonsoo.kim@lge.com>
      Reported-by: NRichard Yao <ryao@gentoo.org>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Cc: Peter Hurley <peter@hurleysoftware.com>
      Cc: Zhang Yanfei <zhangyanfei.yes@gmail.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Andi Kleen <andi@firstfloor.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      474750ab
  2. 05 6月, 2014 3 次提交
  3. 08 4月, 2014 2 次提交
    • G
      mm/vmalloc.c: enhance vm_map_ram() comment · 36437638
      Gioh Kim 提交于
      vm_map_ram() has a fragmentation problem when it cannot purge a
      chunk(ie, 4M address space) if there is a pinning object in that
      addresss space.  So it could consume all VMALLOC address space easily.
      
      We can fix the fragmentation problem by using vmap instead of
      vm_map_ram() but vmap() is known to be slow compared to vm_map_ram().
      Minchan said vm_map_ram is 5 times faster than vmap in his tests.  So I
      thought we should fix fragment problem of vm_map_ram because our
      proprietary GPU driver has used it heavily.
      
      On second thought, it's not an easy because we should reuse freed space
      for solving the problem and it could make more IPI and bitmap operation
      for searching hole.  It could mitigate API's goal which is very fast
      mapping.  And even fragmentation problem wouldn't show in 64 bit
      machine.
      
      Another option is that the user should separate long-life and short-life
      object and use vmap for long-life but vm_map_ram for short-life.  If we
      inform the user about the characteristic of vm_map_ram the user can
      choose one according to the page lifetime.
      
      Let's add some notice messages to user.
      
      [akpm@linux-foundation.org: tweak comment text]
      Signed-off-by: NGioh Kim <gioh.kim@lge.com>
      Reviewed-by: NZhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      36437638
    • G
      mm: use macros from compiler.h instead of __attribute__((...)) · 3b32123d
      Gideon Israel Dsouza 提交于
      To increase compiler portability there is <linux/compiler.h> which
      provides convenience macros for various gcc constructs.  Eg: __weak for
      __attribute__((weak)).  I've replaced all instances of gcc attributes with
      the right macro in the memory management (/mm) subsystem.
      
      [akpm@linux-foundation.org: while-we're-there consistency tweaks]
      Signed-off-by: NGideon Israel Dsouza <gidisrael@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3b32123d
  4. 28 1月, 2014 1 次提交
    • M
      Revert "mm/vmalloc: interchage the implementation of vmalloc_to_{pfn,page}" · add688fb
      malc 提交于
      Revert commit ece86e22, which was intended as a small performance
      improvement.
      
      Despite the claim that the patch doesn't introduce any functional
      changes in fact it does.
      
      The "no page" path behaves different now.  Originally, vmalloc_to_page
      might return NULL under some conditions, with new implementation it
      returns pfn_to_page(0) which is not the same as NULL.
      
      Simple test shows the difference.
      
      test.c
      
      #include <linux/kernel.h>
      #include <linux/module.h>
      #include <linux/vmalloc.h>
      #include <linux/mm.h>
      
      int __init myi(void)
      {
      	struct page *p;
      	void *v;
      
      	v = vmalloc(PAGE_SIZE);
      	/* trigger the "no page" path in vmalloc_to_page*/
      	vfree(v);
      
      	p = vmalloc_to_page(v);
      
      	pr_err("expected val = NULL, returned val = %p", p);
      
      	return -EBUSY;
      }
      
      void __exit mye(void)
      {
      
      }
      module_init(myi)
      module_exit(mye)
      
      Before interchange:
      expected val = NULL, returned val =   (null)
      
      After interchange:
      expected val = NULL, returned val = c7ebe000
      Signed-off-by: NVladimir Murzin <murzin.v@gmail.com>
      Cc: Jianyu Zhan <nasa4836@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      add688fb
  5. 22 1月, 2014 1 次提交
  6. 13 11月, 2013 6 次提交
  7. 12 9月, 2013 3 次提交
  8. 10 7月, 2013 9 次提交
  9. 04 7月, 2013 6 次提交
  10. 08 5月, 2013 1 次提交
  11. 30 4月, 2013 6 次提交
    • A
      kexec, vmalloc: export additional vmalloc layer information · 13ba3fcb
      Atsushi Kumagai 提交于
      Now, vmap_area_list is exported as VMCOREINFO for makedumpfile to get
      the start address of vmalloc region (vmalloc_start).  The address which
      contains vmalloc_start value is represented as below:
      
        vmap_area_list.next - OFFSET(vmap_area.list) + OFFSET(vmap_area.va_start)
      
      However, both OFFSET(vmap_area.va_start) and OFFSET(vmap_area.list)
      aren't exported as VMCOREINFO.
      
      So this patch exports them externally with small cleanup.
      
      [akpm@linux-foundation.org: vmalloc.h should include list.h for list_head]
      Signed-off-by: NAtsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>
      Cc: Joonsoo Kim <js1304@gmail.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Dave Anderson <anderson@redhat.com>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      13ba3fcb
    • J
      mm, vmalloc: remove list management of vmlist after initializing vmalloc · 4341fa45
      Joonsoo Kim 提交于
      Now, there is no need to maintain vmlist after initializing vmalloc.  So
      remove related code and data structure.
      Signed-off-by: NJoonsoo Kim <js1304@gmail.com>
      Signed-off-by: NJoonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Dave Anderson <anderson@redhat.com>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      4341fa45
    • J
      mm, vmalloc: export vmap_area_list, instead of vmlist · f1c4069e
      Joonsoo Kim 提交于
      Although our intention is to unexport internal structure entirely, but
      there is one exception for kexec.  kexec dumps address of vmlist and
      makedumpfile uses this information.
      
      We are about to remove vmlist, then another way to retrieve information
      of vmalloc layer is needed for makedumpfile.  For this purpose, we
      export vmap_area_list, instead of vmlist.
      Signed-off-by: NJoonsoo Kim <js1304@gmail.com>
      Signed-off-by: NJoonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: Dave Anderson <anderson@redhat.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Cc: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f1c4069e
    • J
      mm, vmalloc: iterate vmap_area_list, instead of vmlist, in vmallocinfo() · d4033afd
      Joonsoo Kim 提交于
      This patch is a preparatory step for removing vmlist entirely.  For
      above purpose, we change iterating a vmap_list codes to iterating a
      vmap_area_list.  It is somewhat trivial change, but just one thing
      should be noticed.
      
      Using vmap_area_list in vmallocinfo() introduce ordering problem in SMP
      system.  In s_show(), we retrieve some values from vm_struct.
      vm_struct's values is not fully setup when va->vm is assigned.  Full
      setup is notified by removing VM_UNLIST flag without holding a lock.
      When we see that VM_UNLIST is removed, it is not ensured that vm_struct
      has proper values in view of other CPUs.  So we need smp_[rw]mb for
      ensuring that proper values is assigned when we see that VM_UNLIST is
      removed.
      
      Therefore, this patch not only change a iteration list, but also add a
      appropriate smp_[rw]mb to right places.
      Signed-off-by: NJoonsoo Kim <js1304@gmail.com>
      Signed-off-by: NJoonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Dave Anderson <anderson@redhat.com>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d4033afd
    • J
      mm, vmalloc: iterate vmap_area_list in get_vmalloc_info() · f98782dd
      Joonsoo Kim 提交于
      This patch is a preparatory step for removing vmlist entirely.  For
      above purpose, we change iterating a vmap_list codes to iterating a
      vmap_area_list.  It is somewhat trivial change, but just one thing
      should be noticed.
      
      vmlist is lack of information about some areas in vmalloc address space.
      For example, vm_map_ram() allocate area in vmalloc address space, but it
      doesn't make a link with vmlist.  To provide full information about
      vmalloc address space is better idea, so we don't use va->vm and use
      vmap_area directly.  This makes get_vmalloc_info() more precise.
      Signed-off-by: NJoonsoo Kim <js1304@gmail.com>
      Signed-off-by: NJoonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Dave Anderson <anderson@redhat.com>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f98782dd
    • J
      mm, vmalloc: iterate vmap_area_list, instead of vmlist in vread/vwrite() · e81ce85f
      Joonsoo Kim 提交于
      Now, when we hold a vmap_area_lock, va->vm can't be discarded.  So we can
      safely access to va->vm when iterating a vmap_area_list with holding a
      vmap_area_lock.  With this property, change iterating vmlist codes in
      vread/vwrite() to iterating vmap_area_list.
      
      There is a little difference relate to lock, because vmlist_lock is mutex,
      but, vmap_area_lock is spin_lock.  It may introduce a spinning overhead
      during vread/vwrite() is executing.  But, these are debug-oriented
      functions, so this overhead is not real problem for common case.
      Signed-off-by: NJoonsoo Kim <js1304@gmail.com>
      Signed-off-by: NJoonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Dave Anderson <anderson@redhat.com>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e81ce85f