1. 13 June 2013, 1 commit
    • slab: prevent warnings when allocating with __GFP_NOWARN · 907985f4
      Authored by Sasha Levin
      Sasha Levin noticed that the warning introduced by commit 6286ae97
      ("slab: Return NULL for oversized allocations") is being triggered:
      
        WARNING: CPU: 15 PID: 21519 at mm/slab_common.c:376 kmalloc_slab+0x2f/0xb0()
        can: request_module (can-proto-4) failed.
        mpoa: proc_mpc_write: could not parse ''
        Modules linked in:
        CPU: 15 PID: 21519 Comm: trinity-child15 Tainted: G W    3.10.0-rc4-next-20130607-sasha-00011-gcd78395-dirty #2
         0000000000000009 ffff880020a95e30 ffffffff83ff4041 0000000000000000
         ffff880020a95e68 ffffffff8111fe12 fffffffffffffff0 00000000000082d0
         0000000000080000 0000000000080000 0000000001400000 ffff880020a95e78
        Call Trace:
         [<ffffffff83ff4041>] dump_stack+0x4e/0x82
         [<ffffffff8111fe12>] warn_slowpath_common+0x82/0xb0
         [<ffffffff8111fe55>] warn_slowpath_null+0x15/0x20
         [<ffffffff81243dcf>] kmalloc_slab+0x2f/0xb0
         [<ffffffff81278d54>] __kmalloc+0x24/0x4b0
         [<ffffffff8196ffe3>] ? security_capable+0x13/0x20
         [<ffffffff812a26b7>] ? pipe_fcntl+0x107/0x210
         [<ffffffff812a26b7>] pipe_fcntl+0x107/0x210
         [<ffffffff812b7ea0>] ? fget_raw_light+0x130/0x3f0
         [<ffffffff812aa5fb>] SyS_fcntl+0x60b/0x6a0
         [<ffffffff8403ca98>] tracesys+0xe1/0xe6
      
      Andrew Morton writes:
      
        __GFP_NOWARN is frequently used by kernel code to probe for "how big
        an allocation can I get".  That's a bit lame, but it's used on slow
        paths and is pretty simple.
      
      However, SLAB will still spew a warning when a big allocation happens
      if the __GFP_NOWARN flag is _not_ set, in order to expose kernel bugs
      (the probing pattern is sketched after this entry).
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      [ penberg@kernel.org: improve changelog ]
      Signed-off-by: Pekka Enberg <penberg@kernel.org>
      907985f4
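
      A minimal sketch of the "probe for how big an allocation can I get"
      pattern Andrew Morton describes; the helper name and starting size are
      illustrative assumptions, not taken from the patch:

        #include <linux/mm.h>
        #include <linux/sizes.h>
        #include <linux/slab.h>

        /* Try progressively smaller buffers; __GFP_NOWARN keeps the
         * expected failures from spamming the kernel log. */
        static void *probe_alloc(size_t *out_size)
        {
                size_t size;

                for (size = SZ_1M; size >= PAGE_SIZE; size >>= 1) {
                        void *buf = kmalloc(size, GFP_KERNEL | __GFP_NOWARN);

                        if (buf) {
                                *out_size = size;
                                return buf;
                        }
                }
                return NULL;
        }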
  2. 06 June 2013, 1 commit
    • arch, mm: Remove tlb_fast_mode() · 29eb7782
      Authored by Peter Zijlstra
      Since the introduction of the preemptible mmu_gather, TLB fast mode has
      been broken.  TLB fast mode relies on there being absolutely no
      concurrency: it frees pages first and invalidates TLBs later.
      
      However, now we can get concurrency and stuff goes *bang*.
      
      This patch removes all tlb_fast_mode() code; removing it was found to be
      the better option versus trying to patch the hole by entangling TLB
      invalidation with the scheduler.
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Tony Luck <tony.luck@intel.com>
      Reported-by: Max Filippov <jcmvbkbc@gmail.com>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      29eb7782
  3. 25 May 2013, 6 commits
    • mm/pagewalk.c: walk_page_range should avoid VM_PFNMAP areas · a9ff785e
      Authored by Cliff Wickman
      A panic can be caused by simply cat'ing /proc/<pid>/smaps while an
      application has a VM_PFNMAP range.  It happened in-house when a
      benchmarker was trying to decipher the memory layout of his program.
      
      /proc/<pid>/smaps and similar walks through a user page table should not
      be looking at VM_PFNMAP areas.
      
      Certain tests in walk_page_range() (specifically split_huge_page_pmd())
      assume that all the mapped PFNs are backed with page structures.  And
      this is not usually true for VM_PFNMAP areas.  This can result in panics
      on kernel page faults when attempting to address those page structures.
      
      There are a half dozen callers of walk_page_range() that walk through a
      task's entire page table (as N.  Horiguchi pointed out).  So rather than
      change all of them, this patch changes just walk_page_range() to ignore
      VM_PFNMAP areas (a simplified sketch follows this entry).
      
      The logic of hugetlb_vma() is moved back into walk_page_range(), as we
      want to test any vma in the range.
      
      VM_PFNMAP areas are used by:
      - graphics memory manager   gpu/drm/drm_gem.c
      - global reference unit     sgi-gru/grufile.c
      - sgi special memory        char/mspec.c
      - and probably several out-of-tree modules
      
      [akpm@linux-foundation.org: remove now-unused hugetlb_vma() stub]
      Signed-off-by: Cliff Wickman <cpw@sgi.com>
      Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: David Sterba <dsterba@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a9ff785e
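
      A hedged sketch, simplified from the idea of the patch rather than the
      exact upstream diff, of the check walk_page_range() gains:

        #include <linux/mm.h>

        /* True when the walk must skip this vma: PFNs in a VM_PFNMAP
         * area are not guaranteed to have struct pages behind them. */
        static bool vma_is_unwalkable(struct vm_area_struct *vma)
        {
                return vma && (vma->vm_flags & VM_PFNMAP);
        }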
    • mm/memory_hotplug.c: fix printk format warnings · 348f9f05
      Authored by Randy Dunlap
      Fix printk format warnings in mm/memory_hotplug.c by using "%pa" (a
      usage sketch follows this entry):
      
        mm/memory_hotplug.c: warning: format '%llx' expects argument of type 'long long unsigned int', but argument 2 has type 'resource_size_t' [-Wformat]
        mm/memory_hotplug.c: warning: format '%llx' expects argument of type 'long long unsigned int', but argument 3 has type 'resource_size_t' [-Wformat]
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      348f9f05
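
      For reference, "%pa" takes a *pointer* to a phys_addr_t/resource_size_t,
      which is what makes it portable across 32- and 64-bit configurations.  A
      minimal usage sketch (the function name is illustrative):

        #include <linux/printk.h>
        #include <linux/types.h>

        static void report_range(resource_size_t start, resource_size_t end)
        {
                /* note the '&': %pa consumes a pointer to the value */
                pr_info("memory range: %pa - %pa\n", &start, &end);
        }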
    • mm/THP: use pmd_populate() to update the pmd with pgtable_t pointer · 7c342512
      Authored by Aneesh Kumar K.V
      We should not use set_pmd_at() to update a pmd_t with a pgtable_t
      pointer.  set_pmd_at() is used to set a pmd with huge pte entries, and
      architectures like ppc64 clear a few flags from the pte when saving a
      new entry (the distinction is sketched after this entry).  Without this
      change we observe bad-pte errors like the one below on ppc64 with THP
      enabled.
      
        BUG: Bad page map in process ld mm=0xc000001ee39f4780 pte:7fc3f37848000001 pmd:c000001ec0000000
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      7c342512
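
      A hedged sketch of the distinction (not the literal diff; the wrapper
      function is illustrative):

        #include <linux/mm.h>
        #include <asm/pgalloc.h>

        /* Depositing a pre-allocated page-table page under a pmd:
         * pmd_populate() understands the arch's pgtable_t layout.
         * set_pmd_at() is for real huge-pte entries only; on ppc64 it
         * strips pte flags and so corrupts a pgtable_t passed this way. */
        static void deposit_pgtable(struct mm_struct *mm, pmd_t *pmd,
                                    pgtable_t pgtable)
        {
                pmd_populate(mm, pmd, pgtable);
        }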
    • mm compaction: fix of improper cache flush in migration code · c2cc499c
      Authored by Leonid Yegoshin
      The page 'new' cannot be flushed with flush_cache_page() during
      migration.  Using flush_cache_page(vma, addr, pfn) is justified only
      once the page has been placed in the process page table, and that
      happens only after the flush_cache_page() call.  Before that, the arch
      function has no knowledge of the process PTE and does nothing.

      Besides that, flush_cache_page() flushes the page at its application
      (user) virtual address, but it was the kernel, at a different virtual
      address, that dirtied the page.

      Replace it with flush_dcache_page(new), which is the proper usage here
      (a sketch follows this entry).
      
      The old page is flushed in try_to_unmap_one() before migration.
      
      This bug shows up on a SEAD-3 board with an M14Kc MIPS CPU without cache
      aliasing (but a Harvard architecture with separate I- and D-caches) in a
      tight memory environment (128MB), once every 1-3 days of soak testing.
      cc1 fails during a kernel build (SIGILL, SIGBUS, SIGSEGV) if
      CONFIG_COMPACTION is switched on.
      Signed-off-by: Leonid Yegoshin <Leonid.Yegoshin@imgtec.com>
      Cc: Leonid Yegoshin <yegoshin@mips.com>
      Acked-by: Rik van Riel <riel@redhat.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Acked-by: Mel Gorman <mgorman@suse.de>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: David Miller <davem@davemloft.net>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c2cc499c
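
      The essence of the fix as a hedged sketch (context simplified from the
      migration copy path; the wrapper is illustrative):

        #include <linux/highmem.h>

        /* After copying data into 'new' through a kernel mapping, write
         * back the kernel's dcache lines for it.  No user PTE for 'new'
         * exists yet, so flush_cache_page() would find nothing to do. */
        static void flush_new_page(struct page *new)
        {
                flush_dcache_page(new);
        }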
    • mm: memcg: remove incorrect VM_BUG_ON for swap cache pages in uncharge · 28ccddf7
      Authored by Johannes Weiner
      Commit 0c59b89c ("mm: memcg: push down PageSwapCache check into
      uncharge entry functions") added a VM_BUG_ON() on PageSwapCache in the
      uncharge path after checking that page flag once, assuming that the
      state is stable in all paths, but this is not the case and the condition
      triggers in user environments.  An uncharge after the last page table
      reference to the page goes away can race with reclaim adding the page to
      swap cache.
      
      Swap cache pages are usually uncharged when they are freed after
      swapout, from a path that also handles swap usage accounting and memcg
      lifetime management.  However, since the last page table reference is
      gone and thus no references to the swap slot left, the swap slot will be
      freed shortly when reclaim attempts to write the page to disk.  The
      whole swap accounting is not even necessary.
      
      So while the race condition for which this VM_BUG_ON was added is real
      and has actually existed all along, it has no negative effects.  Remove
      the VM_BUG_ON again (a reconstruction of the deletion follows this
      entry).
      Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Reported-by: Lingzhu Xiang <lxiang@redhat.com>
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Hugh Dickins <hughd@google.com>
      Acked-by: Michal Hocko <mhocko@suse.cz>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      28ccddf7
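
      The change itself is just the deletion of the over-strict assertion; a
      hedged reconstruction of the diff (surrounding lines are approximate):

         void mem_cgroup_uncharge_page(struct page *page)
         {
                 VM_BUG_ON(page_mapped(page));
        -        VM_BUG_ON(PageSwapCache(page));
                 __mem_cgroup_uncharge_common(page, MEM_CGROUP_CHARGE_TYPE_ANON,
                                              false);
         }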
    • mm: mmu_notifier: re-fix freed page still mapped in secondary MMU · d34883d4
      Authored by Xiao Guangrong
      Commit 751efd86 ("mmu_notifier_unregister NULL Pointer deref and
      multiple ->release()") breaks the fix 3ad3d901 ("mm: mmu_notifier:
      fix freed page still mapped in secondary MMU").
      
      Since hlist_for_each_entry_rcu() has changed in the meantime, we cannot
      revert that patch directly, so this patch reverts the commit and simply
      fixes the bug it spotted.

      The bug spotted by commit 751efd86 is:
      
          There is a race condition between mmu_notifier_unregister() and
          __mmu_notifier_release().
      
          Assume two tasks, one calling mmu_notifier_unregister() as a result
          of a filp_close() ->flush() callout (task A), and the other calling
          mmu_notifier_release() from an mmput() (task B).
      
                              A                               B
          t1                                            srcu_read_lock()
          t2            if (!hlist_unhashed())
          t3                                            srcu_read_unlock()
          t4            srcu_read_lock()
          t5                                            hlist_del_init_rcu()
          t6                                            synchronize_srcu()
          t7            srcu_read_unlock()
          t8            hlist_del_rcu()  <--- NULL pointer deref.
      
      This can be fixed by using hlist_del_init_rcu instead of hlist_del_rcu
      (a sketch follows this entry).
      
      The other issue spotted in that commit, "multiple ->release()
      callouts", needs less attention: it is really rare (e.g. it cannot
      happen on kvm, since the mmu notifier is unregistered after
      exit_mmap()), and later calls of multiple ->release() should be fast
      since all the pages have already been released by the first call.
      Anyway, that issue should be fixed in a separate patch.

      -stable suggestions: any version that has commit 751efd86 needs this
      backported.  The oldest stable series with that commit is 3.0-stable.
      
      [akpm@linux-foundation.org: tweak comments]
      Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
      Tested-by: Robin Holt <holt@sgi.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d34883d4
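
      Why hlist_del_init_rcu() closes the race, as a hedged sketch (the lock
      and node arguments stand in for mm->mmu_notifier_mm->lock and
      mn->hlist):

        #include <linux/rculist.h>
        #include <linux/spinlock.h>

        /* hlist_del_init_rcu() checks hlist_unhashed() internally and
         * re-initializes the node, so whichever of unregister/release
         * gets here second finds the node unhashed and returns harmlessly.
         * hlist_del_rcu() would poison ->pprev and oops on the second try. */
        static void remove_notifier(spinlock_t *lock, struct hlist_node *node)
        {
                spin_lock(lock);
                hlist_del_init_rcu(node);
                spin_unlock(lock);
        }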
  4. 22 May 2013, 1 commit
    • mm: Fix virt_to_page() warning · bb3ec6b0
      Authored by Ralf Baechle
      virt_to_page() is typically implemented as a macro containing a cast so
      that it will accept both pointers and unsigned long without causing a
      warning.
      
      But MIPS virt_to_page() uses virt_to_phys(), which is a function, so
      passing an unsigned long will cause a warning:
      
          CC      mm/page_alloc.o
        mm/page_alloc.c: In function ‘free_reserved_area’:
        mm/page_alloc.c:5161:3: warning: passing argument 1 of ‘virt_to_phys’ makes pointer from integer without a cast [enabled by default]
        arch/mips/include/asm/io.h:119:100: note: expected ‘const volatile void *’ but argument is of type ‘long unsigned int’
      
      All other users of virt_to_page() in mm/ are passing a void * (the
      portable pattern is sketched after this entry).
      Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
      Reported-by: Eunbong Song <eunb.song@samsung.com>
      Cc: linux-kernel@vger.kernel.org
      Cc: linux-mm@kvack.org
      Cc: linux-mips@linux-mips.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      bb3ec6b0
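
      The portable pattern, sketched (the helper name is illustrative): cast
      to void * before calling virt_to_page() when the address is held as an
      unsigned long, since on MIPS the underlying virt_to_phys() is a real
      function expecting a const volatile void *:

        #include <linux/mm.h>

        static struct page *page_for_addr(unsigned long addr)
        {
                /* the cast keeps function-based implementations happy */
                return virt_to_page((void *)addr);
        }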
  5. 10 May 2013, 1 commit
  6. 09 May 2013, 1 commit
  7. 08 May 2013, 5 commits
    • aio: don't include aio.h in sched.h · a27bb332
      Authored by Kent Overstreet
      Faster kernel compiles by way of fewer unnecessary includes.
      
      [akpm@linux-foundation.org: fix fallout]
      [akpm@linux-foundation.org: fix build]
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      Cc: Zach Brown <zab@redhat.com>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Reviewed-by: "Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a27bb332
    • mm: remove old aio use_mm() comment · 697f4d68
      Authored by Zach Brown
      A bunch of performance improvements and cleanups that Zach Brown and I
      have been working on.  The code should be pretty solid at this point,
      though it could of course use more review and testing.
      
      The results in my testing are pretty impressive, particularly when an
      ioctx is being shared between multiple threads.  In my crappy synthetic
      benchmark, with 4 threads submitting and one thread reaping completions,
      I saw overhead in the aio code go from ~50% (mostly ioctx lock
      contention) to low single digits.  Performance with ioctx per thread
      improved too, but I'd have to rerun those benchmarks.
      
      The reason I've been focused on performance when the ioctx is shared is
      that for a fair number of real world completions, userspace needs the
      completions aggregated somehow - in practice people just end up
      implementing this aggregation in userspace today, but if it's done right
      we can do it much more efficiently in the kernel.
      
      Performance wise, the end result of this patch series is that submitting
      a kiocb writes to _no_ shared cachelines - the penalty for sharing an
      ioctx is gone there.  There's still going to be some cacheline
      contention when we deliver the completions to the aio ringbuffer (at
      least if you have interrupts being delivered on multiple cores, which
      for high end stuff you do) but I have a couple more patches not in this
      series that implement coalescing for that (by taking advantage of
      interrupt coalescing).  With that, there's basically no bottlenecks or
      performance issues to speak of in the aio code.
      
      This patch:
      
      use_mm() is used in more places than just aio.  There's no need to mention
      callers when describing the function.
      Signed-off-by: Zach Brown <zab@redhat.com>
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Acked-by: Jeff Moyer <jmoyer@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Reviewed-by: "Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      697f4d68
    • mm/vmalloc.c: add vfree comment · c9fcee51
      Authored by Andrew Morton
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c9fcee51
    • hugetlbfs: fix mmap failure in unaligned size request · af73e4d9
      Authored by Naoya Horiguchi
      The current kernel returns -EINVAL unless a given mmap length is
      "almost" hugepage aligned.  This is because in sys_mmap_pgoff() the
      given length is passed to vm_mmap_pgoff() as-is, without being aligned
      to a hugepage boundary.
      
      This is a regression introduced in commit 40716e29 ("hugetlbfs: fix
      alignment of huge page requests"), where the alignment code was pushed
      into hugetlb_file_setup() while the variable len on the caller side was
      left unchanged.
      
      To fix this, this patch partially reverts that commit and adds the
      alignment code on the caller side (a simplified sketch follows this
      entry).  It also introduces hstate_sizelog() to get the proper hstate
      for a specified hugepage size.
      
      Addresses https://bugzilla.kernel.org/show_bug.cgi?id=56881
      
      [akpm@linux-foundation.org: fix warning when CONFIG_HUGETLB_PAGE=n]
      Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Reported-by: <iceman_dvd@yahoo.com>
      Cc: Steven Truelove <steven.truelove@utoronto.ca>
      Cc: Jianguo Wu <wujianguo@huawei.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      af73e4d9
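
      A hedged sketch of the caller-side alignment, simplified from the
      sys_mmap_pgoff() change (the exact mask macro in the original diff may
      differ; MAP_HUGE_SHIFT/MAP_HUGE_MASK are the uapi names):

        #include <linux/hugetlb.h>
        #include <linux/mman.h>

        if (flags & MAP_HUGETLB) {
                struct hstate *hs;

                hs = hstate_sizelog((flags >> MAP_HUGE_SHIFT) & MAP_HUGE_MASK);
                if (!hs)
                        return -EINVAL;
                /* round the user-supplied length up to a full hugepage */
                len = ALIGN(len, huge_page_size(hs));
        }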
    • mm, memcg: add rss_huge stat to memory.stat · b070e65c
      Authored by David Rientjes
      This exports the amount of anonymous transparent hugepages for each
      memcg via the new "rss_huge" stat in memory.stat.  The units are in
      bytes.
      
      This is helpful for determining the hugepage utilization of individual
      jobs on the system in comparison to rss, and for spotting opportunities
      where MADV_HUGEPAGE may be helpful (an illustrative excerpt follows
      this entry).
      
      The amount of anonymous transparent hugepages is also included in "rss"
      for backwards compatibility.
      Signed-off-by: David Rientjes <rientjes@google.com>
      Acked-by: Michal Hocko <mhocko@suse.cz>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b070e65c
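
      For orientation, a hedged illustration of where the new counter shows
      up (the cgroup path and the values are made up; units are bytes, and
      rss_huge is also counted inside rss):

        $ grep -E '^rss' /sys/fs/cgroup/memory/job0/memory.stat
        rss 1610612736
        rss_huge 1073741824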
  8. 07 May 2013, 1 commit
  9. 06 May 2013, 1 commit
  10. 01 May 2013, 8 commits
  11. 30 April 2013, 14 commits