1. 14 Jan, 2011 2 commits
  2. 03 Dec, 2010 1 commit
    • vmalloc: eagerly clear ptes on vunmap · 64141da5
      Jeremy Fitzhardinge committed
      On stock 2.6.37-rc4, running:
      
        # mount lilith:/export /mnt/lilith
        # find  /mnt/lilith/ -type f -print0 | xargs -0 file
      
      crashes the machine fairly quickly under Xen.  Often it results in oops
      messages, but the couple of times I tried just now, it just hung quietly
      and made Xen print some rude messages:
      
          (XEN) mm.c:2389:d80 Bad type (saw 7400000000000001 != exp
          3000000000000000) for mfn 1d7058 (pfn 18fa7)
          (XEN) mm.c:964:d80 Attempt to create linear p.t. with write perms
          (XEN) mm.c:2389:d80 Bad type (saw 7400000000000010 != exp
          1000000000000000) for mfn 1d2e04 (pfn 1d1fb)
          (XEN) mm.c:2965:d80 Error while pinning mfn 1d2e04
      
      Which means the domain tried to map a pagetable page RW, which would
      allow it to map arbitrary memory, so Xen stopped it.  This is because
      vm_unmap_ram() left some pages mapped in the vmalloc area after NFS had
      finished with them, and those pages got recycled as pagetable pages
      while still having these RW aliases.
      
      Removing those mappings immediately removes the Xen-visible aliases, and
      so it has no problem with those pages being reused as pagetable pages.
      Deferring the TLB flush doesn't upset Xen because it can flush the TLB
      itself as needed to maintain its invariants.
      
      When unmapping a region in the vmalloc space, clear the ptes
      immediately.  There's no point in deferring this because there's no
      amortization benefit.
      
      The TLBs are left dirty, and they are flushed lazily to amortize the
      cost of the IPIs.
      
      The specific motivation for this patch is an oops-causing regression
      since 2.6.36 when using NFS under Xen, triggered by the NFS client's use
      of vm_map_ram() introduced in 56e4ebf8 ("NFS: readdir with vmapped
      pages").  XFS also uses vm_map_ram() and could cause similar problems.
      Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Bryan Schumaker <bjschuma@netapp.com>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Cc: Alex Elder <aelder@sgi.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  3. 27 Oct, 2010 3 commits
  4. 02 Oct, 2010 1 commit
  5. 17 Sep, 2010 1 commit
    • mm, x86: Saving vmcore with non-lazy freeing of vmas · 3ee48b6a
      Cliff Wickman committed
      While /proc/vmcore is being read, the kernel calls
      ioremap()/iounmap() repeatedly, and the buildup of unflushed
      vm_area_structs causes a great deal of overhead (rb_next()
      consumes most of that time).
      
      The solution is to provide a function, set_iounmap_nonlazy(). It
      causes a subsequent call to iounmap() to immediately purge the
      vma area (with try_purge_vmap_area_lazy()).
      
      With this patch we have seen the time for writing a 250MB
      compressed dump drop from 71 seconds to 44 seconds.
      Signed-off-by: Cliff Wickman <cpw@sgi.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: kexec@lists.infradead.org
      Cc: <stable@kernel.org>
      LKML-Reference: <E1OwHZ4-0005WK-Tw@eag09.americas.sgi.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  6. 08 Sep, 2010 1 commit
  7. 10 Aug, 2010 2 commits
  8. 27 Jul, 2010 1 commit
  9. 10 Jul, 2010 1 commit
    • x86, ioremap: Fix incorrect physical address handling in PAE mode · ffa71f33
      Kenji Kaneshige committed
      The current x86 ioremap() does not properly handle physical addresses
      above 32 bits in X86_32 PAE mode. When such an address is passed to
      ioremap(), its upper 32 bits are wrongly cleared. Because of this bug,
      ioremap() can map the wrong address into the linear address space.
      
      In my case, a 64-bit MMIO region was assigned to a PCI device (an ioat
      device) on my system. Because of the ioremap() bug, the wrong physical
      address (instead of the MMIO region) was mapped into the linear address
      space, and loading the ioatdma driver then caused unexpected behavior
      (kernel panic, kernel hang, ...).
      Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
      LKML-Reference: <4C1AE680.7090408@jp.fujitsu.com>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
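      
      The loss of the upper address bits described above can be reproduced in
      user space. The following is a minimal sketch, assuming the bits are
      lost when a 64-bit address is combined with a page mask that is only
      32 bits wide; all names (sketch_page_base_narrow() and so on) and the
      sample address are hypothetical and are not taken from the patch.
      
      ```c
      #include <assert.h>
      #include <stdint.h>
      
      #define SKETCH_PAGE_SIZE 4096u /* hypothetical 4 KiB page size */
      
      /* Buggy variant: the page mask is only 32 bits wide.  When it is ANDed
       * with a 64-bit address it is zero-extended, so bits 32..63 are lost. */
      uint64_t sketch_page_base_narrow(uint64_t phys)
      {
          uint32_t mask = ~(uint32_t)(SKETCH_PAGE_SIZE - 1); /* 0xfffff000 */
          return phys & mask;
      }
      
      /* Fixed variant: the mask is as wide as the address. */
      uint64_t sketch_page_base_wide(uint64_t phys)
      {
          uint64_t mask = ~(uint64_t)(SKETCH_PAGE_SIZE - 1);
          return phys & mask;
      }
      
      /* Usage: a made-up MMIO address above 4 GiB. */
      void sketch_pae_demo(void)
      {
          uint64_t phys = 0x380001234ULL;
      
          /* Upper bits dropped: a different physical page. */
          assert(sketch_page_base_narrow(phys) == 0x80001000ULL);
          /* Upper bits preserved: the intended page. */
          assert(sketch_page_base_wide(phys) == 0x380001000ULL);
      }
      ```
      
      Keeping every intermediate value as wide as the physical address is
      what prevents the silent truncation in this sketch.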
  10. 03 Feb, 2010 2 commits
    • mm: purge fragmented percpu vmap blocks · 02b709df
      Nick Piggin committed
      Improve handling of fragmented per-CPU vmaps.  Previously, a per-CPU
      map was not freed until all of its addresses had been used and freed.
      So fragmented blocks could fill up vmalloc space even if they actually
      had no active vmap regions within them.
      
      Add some logic to allow all CPUs to have these blocks purged in the case
      of failure to allocate a new vm area, and also add some logic to trim
      such blocks on the current CPU if we hit them in the allocation path (so
      as to avoid a large buildup of them).
      
      Christoph reported some vmap allocation failures when using the per CPU
      vmap APIs in XFS, which cannot be reproduced after this patch and the
      previous bug fix.
      
      Cc: linux-mm@kvack.org
      Cc: stable@kernel.org
      Tested-by: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      --
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: percpu-vmap fix RCU list walking · de560423
      Nick Piggin committed
      RCU list walking of the per-cpu vmap cache was broken.  It did not use
      RCU primitives, and also the union of free_list and rcu_head is
      obviously wrong (because free_list is indeed the list we are RCU
      walking).
      
      While we are there, remove a couple of unused fields from an earlier
      iteration.
      
      These APIs aren't actually used anywhere, because of problems with the
      XFS conversion.  Christoph has now verified that the problems are solved
      with these patches.  Also it is an exported interface, so I think it
      will be good to be merged now (and Christoph wants to get the XFS
      changes into their local tree).
      
      Cc: stable@kernel.org
      Cc: linux-mm@kvack.org
      Tested-by: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      --
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  11. 21 Jan, 2010 1 commit
    • vmalloc: remove BUG_ON due to racy counting of VM_LAZY_FREE · 88f50044
      Yongseok Koh committed
      In free_unmap_area_noflush(), va->flags is first marked VM_LAZY_FREE,
      and then vmap_lazy_nr is incremented atomically.
      
      In __purge_vmap_area_lazy(), however, nr is counted while traversing
      vmap_area_list by checking whether VM_LAZY_FREE is set in va->flags.
      After counting nr, the kernel reads vmap_lazy_nr atomically and checks a
      BUG_ON condition (nr greater than vmap_lazy_nr) meant to prevent
      vmap_lazy_nr from going negative.
      
      The problem is that, if execution is interrupted right after marking
      VM_LAZY_FREE, the increment of vmap_lazy_nr can be delayed.  The BUG_ON
      condition can then be met because nr counts more than vmap_lazy_nr.
      
      This is highly probable when vmalloc/vfree are called frequently.  The
      scenario has been verified by adding a delay between marking
      VM_LAZY_FREE and incrementing vmap_lazy_nr in free_unmap_area_noflush().
      
      Although vmap_lazy_nr is used to check a high watermark, it was never a
      strict watermark.  And although the BUG_ON condition is meant to prevent
      vmap_lazy_nr from going negative, vmap_lazy_nr is a signed variable, so
      it can temporarily drop to a negative value.
      
      Consequently, removing the BUG_ON condition is the proper fix.
      
      A possible BUG_ON message looks like this:
      
         kernel BUG at mm/vmalloc.c:517!
         invalid opcode: 0000 [#1] SMP
         EIP: 0060:[<c04824a4>] EFLAGS: 00010297 CPU: 3
         EIP is at __purge_vmap_area_lazy+0x144/0x150
         EAX: ee8a8818 EBX: c08e77d4 ECX: e7c7ae40 EDX: c08e77ec
         ESI: 000081fe EDI: e7c7ae60 EBP: e7c7ae64 ESP: e7c7ae3c
         DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
         Call Trace:
         [<c0482ad9>] free_unmap_vmap_area_noflush+0x69/0x70
         [<c0482b02>] remove_vm_area+0x22/0x70
         [<c0482c15>] __vunmap+0x45/0xe0
         [<c04831ec>] vmalloc+0x2c/0x30
         Code: 8d 59 e0 eb 04 66 90 89 cb 89 d0 e8 87 fe ff ff 8b 43 20 89 da 8d 48 e0 8d 43 20 3b 04 24 75 e7 fe 05 a8 a5 a3 c0 e9 78 ff ff ff <0f> 0b eb fe 90 8d b4 26 00 00 00 00 56 89 c6 b8 ac a5 a3 c0 31
         EIP: [<c04824a4>] __purge_vmap_area_lazy+0x144/0x150 SS:ESP 0068:e7c7ae3c
      
      [ See also http://marc.info/?l=linux-kernel&m=126335856228090&w=2 ]
      Signed-off-by: Yongseok Koh <yongseok.koh@samsung.com>
      Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
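      
      The ordering problem in this commit message can be replayed
      deterministically in user space. This is only a single-threaded sketch
      of the interleaving, assuming one page per area; the struct and function
      names are hypothetical stand-ins, not the kernel's code.
      
      ```c
      #include <assert.h>
      #include <stdbool.h>
      
      #define SKETCH_NAREAS 4
      
      struct sketch_state {
          bool lazy_free[SKETCH_NAREAS]; /* stands in for VM_LAZY_FREE in va->flags */
          long lazy_nr;                  /* stands in for vmap_lazy_nr (signed) */
      };
      
      struct sketch_result {
          long nr_counted;        /* what the purge path counted */
          long lazy_nr_in_window; /* counter right after the purge */
          long lazy_nr_final;     /* counter once the delayed increment lands */
      };
      
      /* Free path, step 1: mark the area as lazily freed. */
      void sketch_mark_lazy(struct sketch_state *s, int i)
      {
          assert(i >= 0 && i < SKETCH_NAREAS);
          s->lazy_free[i] = true;
      }
      
      /* Free path, step 2: account for it (the step that can be delayed). */
      void sketch_account_lazy(struct sketch_state *s)
      {
          s->lazy_nr += 1;
      }
      
      /* Purge path: count marked areas, unmark them, subtract the count. */
      long sketch_purge(struct sketch_state *s)
      {
          long nr = 0;
      
          for (int i = 0; i < SKETCH_NAREAS; i++) {
              if (s->lazy_free[i]) {
                  s->lazy_free[i] = false;
                  nr++;
              }
          }
          /* The removed check amounted to BUG_ON(nr > lazy_nr) at this point. */
          s->lazy_nr -= nr;
          return nr;
      }
      
      /* Replay the problematic order: mark, purge, then the late increment. */
      struct sketch_result sketch_replay_race(void)
      {
          struct sketch_state s = { .lazy_nr = 0 };
          struct sketch_result r;
      
          sketch_mark_lazy(&s, 0);
          r.nr_counted = sketch_purge(&s);
          r.lazy_nr_in_window = s.lazy_nr;
          sketch_account_lazy(&s);
          r.lazy_nr_final = s.lazy_nr;
          return r;
      }
      ```
      
      In this replay the purge counts one area while the counter is still
      zero, so the counter dips to -1 and returns to 0 once the delayed
      increment runs. That temporary dip is the state the BUG_ON used to
      treat as fatal.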
  12. 16 Dec, 2009 1 commit
  13. 29 Oct, 2009 1 commit
  14. 12 Oct, 2009 1 commit
  15. 08 Oct, 2009 2 commits
  16. 23 Sep, 2009 1 commit
  17. 22 Sep, 2009 4 commits
  18. 14 Aug, 2009 2 commits
    • vmalloc: implement pcpu_get_vm_areas() · ca23e405
      Tejun Heo committed
      To directly use spread NUMA memories for percpu units, percpu
      allocator will be updated to allow sparsely mapping units in a chunk.
      As the distances between units can be very large, this makes
      allocating single vmap area for each chunk undesirable.  This patch
      implements pcpu_get_vm_areas() and pcpu_free_vm_areas() which
      allocate and free sparse congruent vmap areas.
      
      pcpu_get_vm_areas() takes @offsets and @sizes arrays which define
      distances and sizes of vmap areas.  It scans down from the top of
      vmalloc area looking for the top-most address which can accommodate all
      the areas.  The top-down scan is to avoid interacting with regular
      vmallocs which can push these congruent areas up little by little,
      ending up wasting address space and page tables.
      
      To speed up top-down scan, the highest possible address hint is
      maintained.  Although the scan is linear from the hint, given the
      usual large holes between memory addresses between NUMA nodes, the
      scanning is highly likely to finish after finding the first hole for
      the last unit which is scanned first.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Nick Piggin <npiggin@suse.de>
    • vmalloc: separate out insert_vmalloc_vm() · cf88c790
      Tejun Heo committed
      Separate out insert_vmalloc_vm() from __get_vm_area_node().
      insert_vmalloc_vm() initializes vm_struct from vmap_area and inserts
      it into vmlist.  insert_vmalloc_vm() only initializes fields which can
      be determined from @vm, @flags and @caller.  The rest should be
      initialized by the caller.  For __get_vm_area_node(), all other fields
      just need to be cleared and this is done by using kzalloc instead of
      kmalloc.
      
      This will be used to implement pcpu_get_vm_areas().
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Nick Piggin <npiggin@suse.de>
  19. 12 Jun, 2009 2 commits
  20. 07 May, 2009 1 commit
  21. 01 Apr, 2009 1 commit
  22. 28 Feb, 2009 2 commits
    • mm: fix lazy vmap purging (use-after-free error) · cbb76676
      Vegard Nossum committed
      I just got this new warning from kmemcheck:
      
          WARNING: kmemcheck: Caught 32-bit read from freed memory (c7806a60)
          a06a80c7ecde70c1a04080c700000000a06709c1000000000000000000000000
           f f f f f f f f f f f f f f f f f f f f f f f f f f f f f f f f
           ^
      
          Pid: 0, comm: swapper Not tainted (2.6.29-rc4 #230)
          EIP: 0060:[<c1096df7>] EFLAGS: 00000286 CPU: 0
          EIP is at __purge_vmap_area_lazy+0x117/0x140
          EAX: 00070f43 EBX: c7806a40 ECX: c1677080 EDX: 00027b66
          ESI: 00002001 EDI: c170df0c EBP: c170df00 ESP: c178830c
           DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
          CR0: 80050033 CR2: c7806b14 CR3: 01775000 CR4: 00000690
          DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
          DR6: 00004000 DR7: 00000000
           [<c1096f3e>] free_unmap_vmap_area_noflush+0x6e/0x70
           [<c1096f6a>] remove_vm_area+0x2a/0x70
           [<c1097025>] __vunmap+0x45/0xe0
           [<c10970de>] vunmap+0x1e/0x30
           [<c1008ba5>] text_poke+0x95/0x150
           [<c1008ca9>] alternatives_smp_unlock+0x49/0x60
           [<c171ef47>] alternative_instructions+0x11b/0x124
           [<c171f991>] check_bugs+0xbd/0xdc
           [<c17148c5>] start_kernel+0x2ed/0x360
           [<c171409e>] __init_begin+0x9e/0xa9
           [<ffffffff>] 0xffffffff
      
      It happened here:
      
          $ addr2line -e vmlinux -i c1096df7
          mm/vmalloc.c:540
      
      Code:
      
      	list_for_each_entry(va, &valist, purge_list)
      		__free_vmap_area(va);
      
      It's this instruction:
      
          mov    0x20(%ebx),%edx
      
      Which corresponds to a dereference of va->purge_list.next:
      
          (gdb) p ((struct vmap_area *) 0)->purge_list.next
          Cannot access memory at address 0x20
      
      It seems that we should use "safe" list traversal here, as the element
      is freed inside the loop. Please verify that this is the right fix.
      Acked-by: Nick Piggin <npiggin@suse.de>
      Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: <stable@kernel.org>		[2.6.28.x]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
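      
      The "safe" traversal mentioned in this commit message means reading the
      successor before the current element is freed. Below is a minimal
      user-space sketch of that pattern on a hypothetical singly linked list
      (struct sketch_node and the sketch_* functions are illustrative names);
      the kernel's list_for_each_entry_safe() macro follows the same idea for
      struct list_head.
      
      ```c
      #include <assert.h>
      #include <stddef.h>
      #include <stdlib.h>
      
      struct sketch_node {
          int value;
          struct sketch_node *next;
      };
      
      /* Push a new node onto the front of the list; returns the new head,
       * or the old head unchanged if allocation fails. */
      struct sketch_node *sketch_push(struct sketch_node *head, int value)
      {
          struct sketch_node *n = malloc(sizeof(*n));
      
          if (!n)
              return head;
          n->value = value;
          n->next = head;
          return n;
      }
      
      /* Free every node.  The successor is read before free(), so the loop
       * never touches freed memory.  Writing "pos = pos->next" after
       * free(pos) would be the use-after-free reported above. */
      size_t sketch_free_all(struct sketch_node *head)
      {
          size_t freed = 0;
          struct sketch_node *pos = head;
      
          while (pos) {
              struct sketch_node *next = pos->next;
      
              free(pos);
              freed++;
              pos = next;
          }
          return freed;
      }
      
      /* Usage: build a list of n nodes, then free it; returns nodes freed. */
      size_t sketch_build_and_free(size_t n)
      {
          struct sketch_node *head = NULL;
      
          for (size_t i = 0; i < n; i++)
              head = sketch_push(head, (int)i);
          return sketch_free_all(head);
      }
      ```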
    • mm: vmap fix overflow · 7766970c
      Nick Piggin committed
      The new vmap allocator can wrap the address and get confused in the case
      of large allocations or VMALLOC_END near the end of address space.
      
      Problem reported by Christoph Hellwig on a 32-bit XFS workload.
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Reported-by: Christoph Hellwig <hch@lst.de>
      Cc: <stable@kernel.org>		[2.6.28.x]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
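      
      Address wrap-around of the kind described above is easy to miss in a
      range check. The sketch below models a 32-bit address space with
      uint32_t and contrasts a naive bounds check with one that detects the
      wrap; the function names and sample values are hypothetical, and this is
      not the code from the patch.
      
      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <stdint.h>
      
      /* Naive check: after a wrap the sum is small and appears to fit. */
      bool sketch_fits_naive(uint32_t addr, uint32_t size, uint32_t end)
      {
          uint32_t last = (uint32_t)(addr + size);
      
          return last <= end;
      }
      
      /* Wrap-aware check: a sum below the start address means it overflowed. */
      bool sketch_fits_checked(uint32_t addr, uint32_t size, uint32_t end)
      {
          uint32_t last = (uint32_t)(addr + size);
      
          if (last < addr)
              return false;
          return last <= end;
      }
      
      /* Usage: a request near the top of the 32-bit address space. */
      void sketch_overflow_demo(void)
      {
          /* 0xfffff000 + 0x2000 wraps to 0x1000, which fools the naive check. */
          assert(sketch_fits_naive(0xfffff000u, 0x2000u, 0xffffe000u));
          assert(!sketch_fits_checked(0xfffff000u, 0x2000u, 0xffffe000u));
          /* An ordinary request that fits is still accepted. */
          assert(sketch_fits_checked(0x1000u, 0x2000u, 0x4000u));
      }
      ```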
  23. 25 Feb, 2009 1 commit
  24. 24 Feb, 2009 1 commit
    • vmalloc: add @align to vm_area_register_early() · c0c0a293
      Tejun Heo committed
      Impact: allow larger alignment for early vmalloc area allocation
      
      Some early vmalloc users might want larger alignment, for example, for
      custom large page mapping.  Add @align to vm_area_register_early().
      While at it, drop docbook comment on non-existent @size.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
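      
      Honoring a caller-supplied alignment usually comes down to rounding an
      address up to the next multiple of @align. The following is a minimal
      sketch of that rounding, assuming @align is a power of two;
      sketch_align_up() is a hypothetical helper, not the function changed by
      this patch.
      
      ```c
      #include <assert.h>
      #include <stdint.h>
      
      /* Round addr up to the next multiple of align.
       * Assumes align is a non-zero power of two and that addr + align - 1
       * does not overflow. */
      uintptr_t sketch_align_up(uintptr_t addr, uintptr_t align)
      {
          assert(align != 0 && (align & (align - 1)) == 0);
          return (addr + align - 1) & ~(align - 1);
      }
      
      /* Usage: page-sized alignment of a few sample addresses. */
      void sketch_align_demo(void)
      {
          assert(sketch_align_up(0x1001, 0x1000) == 0x2000); /* rounded up */
          assert(sketch_align_up(0x2000, 0x1000) == 0x2000); /* already aligned */
          assert(sketch_align_up(5, 1) == 5);                /* align of 1 is a no-op */
      }
      ```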
  25. 21 Feb, 2009 1 commit
  26. 20 Feb, 2009 3 commits
    • vmalloc: add un/map_kernel_range_noflush() · 8fc48985
      Tejun Heo committed
      Impact: two more public map/unmap functions
      
      Implement map_kernel_range_noflush() and unmap_kernel_range_noflush().
      These functions map and unmap, respectively, an address range in the
      kernel VM area but don't do any vcache or TLB flushing.  They will be
      used by the new percpu allocator.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
    • vmalloc: implement vm_area_register_early() · f0aa6617
      Tejun Heo committed
      Impact: allow multiple early vm areas
      
      There are places where a kernel VM area needs to be allocated before
      vmalloc is initialized.  This is done by allocating a static vm_struct,
      initializing several fields and linking it to vmlist; vmalloc
      initialization later picks these up from vmlist.  This is currently
      done manually, and if there is more than one such area, there's no
      defined way to arbitrate who gets which address.
      
      This patch implements vm_area_register_early(), which takes a vm_area
      struct with flags and size initialized, assigns an address to it and
      puts it on the vmlist.  This way, multiple early vm areas can determine
      which addresses they should use.  The only current user - alpha mm
      init - is converted to use it.
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • vmalloc: call flush_cache_vunmap() from unmap_kernel_range() · 73426952
      Tejun Heo committed
      Impact: proper vcache flush on unmap_kernel_range()
      
      flush_cache_vunmap() should be called before pages are unmapped.  Add
      a call to it in unmap_kernel_range().
      Signed-off-by: Tejun Heo <tj@kernel.org>