1. 18 Oct, 2013 10 commits
  2. 17 Oct, 2013 30 commits
    • L
      Merge branch 'akpm' (fixes from Andrew Morton) · 056cdce0
      Linus Torvalds committed
      Merge misc fixes from Andrew Morton.
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (21 commits)
        mm: revert mremap pud_free anti-fix
        mm: fix BUG in __split_huge_page_pmd
        swap: fix set_blocksize race during swapon/swapoff
        procfs: call default get_unmapped_area on MMU-present architectures
        procfs: fix unintended truncation of returned mapped address
        writeback: fix negative bdi max pause
        percpu_refcount: export symbols
        fs: buffer: move allocation failure loop into the allocator
        mm: memcg: handle non-error OOM situations more gracefully
        tools/testing/selftests: fix uninitialized variable
        block/partitions/efi.c: treat size mismatch as a warning, not an error
        mm: hugetlb: initialize PG_reserved for tail pages of gigantic compound pages
        mm/zswap: bugfix: memory leak when re-swapon
        mm: /proc/pid/pagemap: inspect _PAGE_SOFT_DIRTY only on present pages
        mm: migration: do not lose soft dirty bit if page is in migration state
        gcov: MAINTAINERS: Add an entry for gcov
        mm/hugetlb.c: correct missing private flag clearing
        mm/vmscan.c: don't forget to free shrinker->nr_deferred
        ipc/sem.c: synchronize semop and semctl with IPC_RMID
        ipc: update locking scheme comments
        ...
      056cdce0
    • H
      mm: revert mremap pud_free anti-fix · 57a8f0cd
      Hugh Dickins committed
      Revert commit 1ecfd533 ("mm/mremap.c: call pud_free() after fail
      calling pmd_alloc()").
      
      The original code was correct: pud_alloc(), pmd_alloc(), pte_alloc_map()
      ensure that the pud, pmd, pt is already allocated, and seldom do they
      need to allocate; on failure, upper levels are freed if appropriate by
      the subsequent do_munmap().  Whereas commit 1ecfd533 did an
      unconditional pud_free() of a most-likely still-in-use pud: saved only
      by the near-impossibility of pmd_alloc() failing.
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Cc: Chen Gang <gang.chen@asianux.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      57a8f0cd
    • H
      mm: fix BUG in __split_huge_page_pmd · 750e8165
      Hugh Dickins committed
      Occasionally we hit the BUG_ON(pmd_trans_huge(*pmd)) at the end of
      __split_huge_page_pmd(): seen when doing madvise(,,MADV_DONTNEED).
      
      It's invalid: we don't always have down_write of mmap_sem there: a racing
      do_huge_pmd_wp_page() might have copied-on-write to another huge page
      before our split_huge_page() got the anon_vma lock.
      
      Forget the BUG_ON, just go back and try again if this happens.
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      750e8165
    • K
      swap: fix set_blocksize race during swapon/swapoff · 5b808a23
      Krzysztof Kozlowski committed
      Fix race between swapoff and swapon.  Swapoff used old_block_size from
      swap_info outside of swapon_mutex so it could be overwritten by
      concurrent swapon.
      
      The race has visible effect only if more than one swap block device
      exists with different block sizes (e.g.  /dev/sda1 with block size 4096
      and /dev/sdb1 with 512).  In such case it leads to setting the blocksize
      of swapped off device with wrong blocksize.
      
      The bug can be triggered with multiple concurrent swapoff and swapon:
      0. Swap for some device is on.
      1. swapoff:
      First the swapoff is called on this device and "struct swap_info_struct
      *p" is assigned. This is done under swap_lock however this lock is
      released for the call try_to_unuse().
      
      2. swapon:
      After the assignment above (and before acquiring swapon_mutex &
      swap_lock by swapoff) the swapon is called on the same device.
      The p->old_block_size is assigned the block size of the device.
      This block size should be the same as previous but sometimes it is not.
      The swapon ends successfully.
      
      3. swapoff:
      Swapoff resumes, grabs the locks and mutex and continues to disable this
      swap device. Now it sets the block size to value taken from swap_info
      which was overwritten by swapon in 2.
      Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
      Reported-by: Weijie Yang <weijie.yang.kh@gmail.com>
      Cc: Bob Liu <bob.liu@oracle.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Shaohua Li <shli@fusionio.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Acked-by: Hugh Dickins <hughd@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      5b808a23
    • H
      procfs: call default get_unmapped_area on MMU-present architectures · fad1a86e
      HATAYAMA Daisuke committed
      Commit c4fe2448 ("sparc: fix PCI device proc file mmap(2)") added
      proc_reg_get_unmapped_area in proc_reg_file_ops and
      proc_reg_file_ops_no_compat, by which now mmap always returns EIO if
      get_unmapped_area method is not defined for the target procfs file,
      which causes regression of mmap on /proc/vmcore.
      
      To address this issue, like get_unmapped_area(), call default
      current->mm->get_unmapped_area on MMU-present architectures if
      pde->proc_fops->get_unmapped_area, i.e.  the one in actual file
      operation in the procfs file, is not defined.
      Reported-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
      Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: David S. Miller <davem@davemloft.net>
      Tested-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      fad1a86e
    • H
      procfs: fix unintended truncation of returned mapped address · 2cbe3b0a
      HATAYAMA Daisuke committed
      Currently, proc_reg_get_unmapped_area truncates upper 32-bit of the
      mapped virtual address returned from get_unmapped_area method in
      pde->proc_fops due to the variable rv of signed integer on x86_64.  This
      is too small to hold a virtual address of unsigned long on x86_64 since on
      x86_64, signed integer is of 4 bytes while unsigned long is of 8 bytes.
      To fix this issue, use unsigned long instead.
      
      Fixes a regression added in commit c4fe2448 ("sparc: fix PCI device
      proc file mmap(2)").
      Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: David S. Miller <davem@davemloft.net>
      Tested-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      2cbe3b0a
    • F
      writeback: fix negative bdi max pause · e3b6c655
      Fengguang Wu committed
      Toralf runs trinity on UML/i386.  After some time it hangs and the last
      message line is
      
      	BUG: soft lockup - CPU#0 stuck for 22s! [trinity-child0:1521]
      
      It's found that pages_dirtied becomes very large.  More than 1000000000
      pages in this case:
      
      	period = HZ * pages_dirtied / task_ratelimit;
      	BUG_ON(pages_dirtied > 2000000000);
      	BUG_ON(pages_dirtied > 1000000000);      <---------
      
      UML debug printf shows that we got negative pause here:
      
      	ick: pause : -984
      	ick: pages_dirtied : 0
      	ick: task_ratelimit: 0
      
      	 pause:
      	+       if (pause < 0)  {
      	+               extern int printf(char *, ...);
      	+               printf("ick : pause : %li\n", pause);
      	+               printf("ick: pages_dirtied : %lu\n", pages_dirtied);
      	+               printf("ick: task_ratelimit: %lu\n", task_ratelimit);
      	+               BUG_ON(1);
      	+       }
      	        trace_balance_dirty_pages(bdi,
      
      Since pause is bounded by [min_pause, max_pause], where min_pause is also
      bounded by max_pause, it's suspected and demonstrated that the
      max_pause calculation goes wrong:
      
      	ick: pause : -717
      	ick: min_pause : -177
      	ick: max_pause : -717
      	ick: pages_dirtied : 14
      	ick: task_ratelimit: 0
      
      The problem lies in the two "long = unsigned long" assignments in
      bdi_max_pause() which might go negative if the highest bit is 1, and the
      min_t(long, ...) check failed to protect it falling under 0.  Fix all of
      them by using "unsigned long" throughout the function.
      Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
      Reported-by: Toralf Förster <toralf.foerster@gmx.de>
      Tested-by: Toralf Förster <toralf.foerster@gmx.de>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e3b6c655
    • M
      percpu_refcount: export symbols · 5e9dd373
      Matias Bjorling committed
      Export the interface to be used within modules.
      Signed-off-by: Matias Bjorling <m@bjorling.me>
      Acked-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      5e9dd373
    • J
      fs: buffer: move allocation failure loop into the allocator · 84235de3
      Johannes Weiner committed
      Buffer allocation has a very crude indefinite loop around waking the
      flusher threads and performing global NOFS direct reclaim because it can
      not handle allocation failures.
      
      The most immediate problem with this is that the allocation may fail due
      to a memory cgroup limit, where flushers + direct reclaim might not make
      any progress towards resolving the situation at all.  Because unlike the
      global case, a memory cgroup may not have any cache at all, only
      anonymous pages but no swap.  This situation will lead to a reclaim
      livelock with insane IO from waking the flushers and thrashing unrelated
      filesystem cache in a tight loop.
      
      Use __GFP_NOFAIL allocations for buffers for now.  This makes sure that
      any looping happens in the page allocator, which knows how to
      orchestrate kswapd, direct reclaim, and the flushers sensibly.  It also
      allows memory cgroups to detect allocations that can't handle failure
      and will allow them to ultimately bypass the limit if reclaim can not
      make progress.
      Reported-by: azurIt <azurit@pobox.sk>
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      84235de3
    • J
      mm: memcg: handle non-error OOM situations more gracefully · 49426420
      Johannes Weiner committed
      Commit 3812c8c8 ("mm: memcg: do not trap chargers with full
      callstack on OOM") assumed that only a few places that can trigger a
      memcg OOM situation do not return VM_FAULT_OOM, like optional page cache
      readahead.  But there are many more and it's impractical to annotate
      them all.
      
      First of all, we don't want to invoke the OOM killer when the failed
      allocation is gracefully handled, so defer the actual kill to the end of
      the fault handling as well.  This simplifies the code quite a bit for
      added bonus.
      
      Second, since a failed allocation might not be the abrupt end of the
      fault, the memcg OOM handler needs to be re-entrant until the fault
      finishes for subsequent allocation attempts.  If an allocation is
      attempted after the task already OOMed, allow it to bypass the limit so
      that it can quickly finish the fault and invoke the OOM killer.
      Reported-by: azurIt <azurit@pobox.sk>
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      49426420
    • F
      tools/testing/selftests: fix uninitialized variable · c88b05b2
      Felipe Pena committed
      The err variable is intended to receive the timer_create() return before
      checking it
      Signed-off-by: Felipe Pena <felipensp@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c88b05b2
    • D
      block/partitions/efi.c: treat size mismatch as a warning, not an error · 87fc0ad2
      Doug Anderson committed
      In commit 27a7c642 ("partitions/efi: account for pmbr size in lba")
      we started treating bad sizes in lba field of the partition that has the
      0xEE (GPT protective) as errors.
      
      However, we may run into these "bad sizes" in the real world if someone
      uses dd to copy an image from a smaller disk to a bigger disk.  Since
      this case used to work (even without using force_gpt), keep it working
      and treat the size mismatch as a warning instead of an error.
      Reported-by: Josh Triplett <josh@joshtriplett.org>
      Reported-by: Sean Paul <seanpaul@chromium.org>
      Signed-off-by: Doug Anderson <dianders@chromium.org>
      Reviewed-by: Josh Triplett <josh@joshtriplett.org>
      Acked-by: Davidlohr Bueso <davidlohr@hp.com>
      Tested-by: Artem Bityutskiy <dedekind1@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      87fc0ad2
    • A
      mm: hugetlb: initialize PG_reserved for tail pages of gigantic compound pages · ef5a22be
      Andrea Arcangeli committed
      Commit 11feeb49 ("kvm: optimize away THP checks in
      kvm_is_mmio_pfn()") introduced a memory leak when KVM is run on gigantic
      compound pages.
      
      That commit depends on the assumption that PG_reserved is identical for
      all head and tail pages of a compound page.  So that if get_user_pages
      returns a tail page, we don't need to check the head page in order to
      know if we deal with a reserved page that requires different
      refcounting.
      
      The assumption that PG_reserved is the same for head and tail pages is
      certainly correct for THP and regular hugepages, but gigantic hugepages
      allocated through bootmem don't clear the PG_reserved on the tail pages
      (the clearing of PG_reserved is done later only if the gigantic hugepage
      is freed).
      
      This patch corrects the gigantic compound page initialization so that we
      can retain the optimization in 11feeb49.  The cacheline was already
      modified in order to set PG_tail so this won't affect the boot time of
      large memory systems.
      
      [akpm@linux-foundation.org: tweak comment layout and grammar]
      Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
      Reported-by: andy123 <ajs124.ajs124@gmail.com>
      Acked-by: Rik van Riel <riel@redhat.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Hugh Dickins <hughd@google.com>
      Acked-by: Rafael Aquini <aquini@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ef5a22be
    • W
      mm/zswap: bugfix: memory leak when re-swapon · aa9bca05
      Weijie Yang committed
      zswap_tree is not freed when swapoff, and it got re-kmalloced in swapon,
      so a memory leak occurs.
      
      Free the memory of zswap_tree in zswap_frontswap_invalidate_area().
      Signed-off-by: Weijie Yang <weijie.yang@samsung.com>
      Reviewed-by: Bob Liu <bob.liu@oracle.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Reviewed-by: Minchan Kim <minchan@kernel.org>
      Cc: <stable@vger.kernel.org>
      From: Weijie Yang <weijie.yang@samsung.com>
      Subject: mm/zswap: bugfix: memory leak when invalidate and reclaim occur concurrently
      
      Consider the following scenario:
      thread 0: reclaim entry x (get refcount, but not call zswap_get_swap_cache_page)
      thread 1: call zswap_frontswap_invalidate_page to invalidate entry x.
      	finished, entry x and its zbud is not freed as its refcount != 0
      	now, the swap_map[x] = 0
      thread 0: now call zswap_get_swap_cache_page
      	swapcache_prepare return -ENOENT because entry x is not used any more
      	zswap_get_swap_cache_page return ZSWAP_SWAPCACHE_NOMEM
      	zswap_writeback_entry do nothing except put refcount
      Now, the memory of zswap_entry x and its zpage leak.
      
      Modify:
       - check the refcount in fail path, free memory if it is not referenced.
      
       - use ZSWAP_SWAPCACHE_FAIL instead of ZSWAP_SWAPCACHE_NOMEM as the fail path
         can be not only caused by nomem but also by invalidate.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: Weijie Yang <weijie.yang@samsung.com>
      Reviewed-by: Bob Liu <bob.liu@oracle.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: <stable@vger.kernel.org>
      Acked-by: Seth Jennings <sjenning@linux.vnet.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      aa9bca05
    • C
      mm: /proc/pid/pagemap: inspect _PAGE_SOFT_DIRTY only on present pages · e9cdd6e7
      Cyrill Gorcunov committed
      If a page we are inspecting is in swap we may occasionally report it as
      having soft dirty bit (even if it is clean).  The pte_soft_dirty helper
      should be called on present pte only.
      Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Matt Mackall <mpm@selenic.com>
      Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
      Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e9cdd6e7
    • C
      mm: migration: do not lose soft dirty bit if page is in migration state · c3d16e16
      Cyrill Gorcunov committed
      If page migration is turned on in config and the page is migrating, we
      may lose the soft dirty bit.  If fork and mprotect are called on
      migrating pages (once migration is complete) pages do not obtain the
      soft dirty bit in the corresponding pte entries.  Fix it by adding an
      appropriate test on swap entries.
      Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Matt Mackall <mpm@selenic.com>
      Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c3d16e16
    • J
      mm/hugetlb.c: correct missing private flag clearing · 16c794b4
      Joonsoo Kim committed
      We should clear the page's private flag when returning the page to the
      hugepage pool.  Otherwise, marked hugepage can be allocated to the user
      who tries to allocate the non-reserved hugepage.  If this user fails to
      map this hugepage, he would try to return the page to the hugepage pool.
      Since this page has a private flag, resv_huge_pages would mistakenly
      increase.  This patch fixes this situation.
      Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Davidlohr Bueso <davidlohr.bueso@hp.com>
      Cc: David Gibson <david@gibson.dropbear.id.au>
      Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Hillf Danton <dhillf@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      16c794b4
    • A
      mm/vmscan.c: don't forget to free shrinker->nr_deferred · ae393321
      Andrew Vagin committed
      This leak was added by commit 1d3d4437 ("vmscan: per-node deferred
      work").
      
      unreferenced object 0xffff88006ada3bd0 (size 8):
        comm "criu", pid 14781, jiffies 4295238251 (age 105.641s)
        hex dump (first 8 bytes):
          00 00 00 00 00 00 00 00                          ........
        backtrace:
          [<ffffffff8170caee>] kmemleak_alloc+0x5e/0xc0
          [<ffffffff811c0527>] __kmalloc+0x247/0x310
          [<ffffffff8117848c>] register_shrinker+0x3c/0xa0
          [<ffffffff811e115b>] sget+0x5ab/0x670
          [<ffffffff812532f4>] proc_mount+0x54/0x170
          [<ffffffff811e1893>] mount_fs+0x43/0x1b0
          [<ffffffff81202dd2>] vfs_kern_mount+0x72/0x110
          [<ffffffff81202e89>] kern_mount_data+0x19/0x30
          [<ffffffff812530a0>] pid_ns_prepare_proc+0x20/0x40
          [<ffffffff81083c56>] alloc_pid+0x466/0x4a0
          [<ffffffff8105aeda>] copy_process+0xc6a/0x1860
          [<ffffffff8105beab>] do_fork+0x8b/0x370
          [<ffffffff8105c1a6>] SyS_clone+0x16/0x20
          [<ffffffff8171f739>] stub_clone+0x69/0x90
          [<ffffffffffffffff>] 0xffffffffffffffff
      Signed-off-by: Andrew Vagin <avagin@openvz.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Glauber Costa <glommer@openvz.org>
      Cc: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ae393321
    • M
      ipc/sem.c: synchronize semop and semctl with IPC_RMID · 6e224f94
      Manfred Spraul committed
      After acquiring the semlock spinlock, operations must test that the
      array is still valid.
      
       - semctl() and exit_sem() would walk stale linked lists (ugly, but
         should be ok: all lists are empty)
      
       - semtimedop() would sleep forever - and if woken up due to a signal -
         access memory after free.
      
      The patch also:
       - standardizes the tests for .deleted, so that all tests in one
         function leave the function with the same approach.
       - unconditionally tests for .deleted immediately after every call to
         sem_lock - even if it means that for semctl(GETALL), .deleted will be
         tested twice.
      
      Both changes make the review simpler: After every sem_lock, there must
      be a test of .deleted, followed by a goto to the cleanup code (if the
      function uses "goto cleanup").
      
      The only exception is semctl_down(): If sem_ids().rwsem is locked, then
      the presence in ids->ipcs_idr is equivalent to !.deleted, thus no
      additional test is required.
      Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Acked-by: Davidlohr Bueso <davidlohr@hp.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      6e224f94
    • D
      ipc: update locking scheme comments · 18ccee26
      Davidlohr Bueso committed
      The initial documentation was a bit incomplete, update accordingly.
      
      [akpm@linux-foundation.org: make it more readable in 80 columns]
      Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
      Acked-by: Manfred Spraul <manfred@colorfullife.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      18ccee26
    • D
      mm, memcg: protect mem_cgroup_read_events for cpu hotplug · 9c567512
      David Rientjes committed
      for_each_online_cpu() needs the protection of {get,put}_online_cpus() so
      cpu_online_mask doesn't change during the iteration.
      
      cpu_hotplug.lock is held while a cpu is going down, it's a coarse lock
      that is used kernel-wide to synchronize cpu hotplug activity.  Memcg has
      a cpu hotplug notifier, called while there may not be any cpu hotplug
      refcounts, which drains per-cpu event counts to memcg->nocpu_base.events
      to maintain a cumulative event count as cpus disappear.  Without
      get_online_cpus() in mem_cgroup_read_events(), it's possible to account
      for the event count on a dying cpu twice, and this value may be
      significantly large.
      
      In fact, all memcg->pcp_counter_lock use should be nested by
      {get,put}_online_cpus().
      
      This fixes that issue and ensures the reported statistics are not vastly
      over-reported during cpu hotplug.
      Signed-off-by: David Rientjes <rientjes@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      9c567512
    • Y
      driver core: Release device_hotplug_lock when store_mem_state returns EINVAL · a37f8630
      Yasuaki Ishimatsu committed
      When inserting a wrong value to /sys/devices/system/memory/memoryX/state file,
      following messages are shown. And device_hotplug_lock is never released.
      
      ================================================
      [ BUG: lock held when returning to user space! ]
      3.12.0-rc4-debug+ #3 Tainted: G        W
      ------------------------------------------------
      bash/6442 is leaving the kernel with locks still held!
      1 lock held by bash/6442:
       #0:  (device_hotplug_lock){+.+.+.}, at: [<ffffffff8146cbb5>] lock_device_hotplug_sysfs+0x15/0x50
      
      This issue was introduced by commit fa2be40f (drivers: base: use standard
      device online/offline for state change).
      
      This patch releases device_hotplug_lock when store_mem_state returns EINVAL.
      Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Reviewed-by: Toshi Kani <toshi.kani@hp.com>
      CC: Seth Jennings <sjenning@linux.vnet.ibm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      a37f8630
    • L
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · 0056019d
      Linus Torvalds committed
      Pull tmpfile fix from Al Viro:
       "A fix for double iput() in ->tmpfile() on ext3 and ext4; I'd fucked it
        up, Miklos has caught it"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        ext[34]: fix double put in tmpfile
      0056019d
    • L
      Merge tag 'dm-3.12-fix-cve' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm · 8359ffa5
      Linus Torvalds committed
      Pull device-mapper fix from Alasdair Kergon:
       "A patch to avoid data corruption in a device-mapper snapshot.
      
        This is primarily a data corruption bug that all users of
        device-mapper snapshots will want to fix.  The CVE is due to a data
        leak under specific circumstances if, for example, the snapshot is
        presented to a virtual machine: a block written as data inside the VM
        can get interpreted incorrectly on the host outside the VM as
        metadata, causing the host to provide the VM with access to blocks it
        would not otherwise see.  This is likely to affect few, if any,
        people"
      
      * tag 'dm-3.12-fix-cve' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
        dm snapshot: fix data corruption
      8359ffa5
    • L
      Merge tag 'gpio-v3.12-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio · 386aa051
      Linus Torvalds committed
      Pull gpio fixes from Linus Walleij:
       "Three GPIO fixes for the v3.12 series:
         - A fix to the Lynxpoint IRQ handler
         - Two late fixes to fallout from the gpiod refactoring"
      
      * tag 'gpio-v3.12-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
        gpiolib: let gpiod_request() return -EPROBE_DEFER
        gpiolib: safer implementation of desc_to_gpio()
        gpio/lynxpoint: check if the interrupt is enabled in IRQ handler
      386aa051
    • K
      Drivers: hv: vmbus: Fix a bug in channel rescind code · 90d33f3e
      K. Y. Srinivasan committed
      Rescinds of subchannels were not being handled correctly. Fix the bug.
      Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
      Cc: <stable@vger.kernel.org>        [3.11+]
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      90d33f3e
    • M
      usb: misc: usb3503: Fix compile error due to incorrect regmap depedency · eea88512
      Matthew Dawson committed
      The USB3503 driver had an incorrect dependency on REGMAP, instead of
      REGMAP_I2C.  This caused the build to fail since the necessary regmap
      i2c pieces were not available.
      Signed-off-by: Matthew Dawson <matthew@mjdsystems.ca>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      eea88512
    • R
      usb/chipidea: fix oops on memory allocation failure · 41314fea
      Russell King - ARM Linux committed
      When CMA fails to initialize in v3.12-rc4, the chipidea driver oopses
      the kernel while trying to remove and put the HCD which doesn't exist:
      
      WARNING: CPU: 0 PID: 6 at /home/rmk/git/linux-rmk/arch/arm/mm/dma-mapping.c:511
      __dma_alloc+0x200/0x240()
      coherent pool not initialised!
      Modules linked in:
      CPU: 0 PID: 6 Comm: kworker/u2:0 Tainted: G        W    3.12.0-rc4+ #56
      Workqueue: deferwq deferred_probe_work_func
      Backtrace:
      [<c001218c>] (dump_backtrace+0x0/0x10c) from [<c0012328>] (show_stack+0x18/0x1c)
       r6:c05fd9cc r5:000001ff r4:00000000 r3:df86ad00
      [<c0012310>] (show_stack+0x0/0x1c) from [<c05f3a4c>] (dump_stack+0x70/0x8c)
      [<c05f39dc>] (dump_stack+0x0/0x8c) from [<c00230a8>] (warn_slowpath_common+0x6c/0x8c)
       r4:df883a60 r3:df86ad00
      [<c002303c>] (warn_slowpath_common+0x0/0x8c) from [<c002316c>] (warn_slowpath_fmt+0x38/0x40)
       r8:ffffffff r7:00001000 r6:c083b808 r5:00000000 r4:df2efe80
      [<c0023134>] (warn_slowpath_fmt+0x0/0x40) from [<c00196bc>] (__dma_alloc+0x200/0x240)
       r3:00000000 r2:c05fda00
      [<c00194bc>] (__dma_alloc+0x0/0x240) from [<c001982c>] (arm_dma_alloc+0x88/0xa0)
      [<c00197a4>] (arm_dma_alloc+0x0/0xa0) from [<c03e2904>] (ehci_setup+0x1f4/0x438)
      [<c03e2710>] (ehci_setup+0x0/0x438) from [<c03cbd60>] (usb_add_hcd+0x18c/0x664)
      [<c03cbbd4>] (usb_add_hcd+0x0/0x664) from [<c03e89f4>] (host_start+0xf0/0x180)
      [<c03e8904>] (host_start+0x0/0x180) from [<c03e7c34>] (ci_hdrc_probe+0x360/0x670)
       r6:df2ef410 r5:00000000 r4:df2c3010 r3:c03e8904
      [<c03e78d4>] (ci_hdrc_probe+0x0/0x670) from [<c0311044>] (platform_drv_probe+0x20/0x24)
      [<c0311024>] (platform_drv_probe+0x0/0x24) from [<c030fcac>] (driver_probe_device+0x9c/0x234)
      ...
      ---[ end trace c88ccaf3969e8422 ]---
      Unable to handle kernel NULL pointer dereference at virtual address 00000028
      pgd = c0004000
      [00000028] *pgd=00000000
      Internal error: Oops: 17 [#1] SMP ARM
      Modules linked in:
      CPU: 0 PID: 6 Comm: kworker/u2:0 Tainted: G        W    3.12.0-rc4+ #56
      Workqueue: deferwq deferred_probe_work_func
      task: df86ad00 ti: df882000 task.ti: df882000
      PC is at usb_remove_hcd+0x10/0x150
      LR is at host_stop+0x1c/0x3c
      pc : [<c03cacec>]    lr : [<c03e88e4>]    psr: 60000013
      sp : df883b50  ip : df883b78  fp : df883b74
      r10: c11f4c54  r9 : c0836450  r8 : df30c400
      r7 : fffffff4  r6 : df2ef410  r5 : 00000000  r4 : df2c3010
      r3 : 00000000  r2 : 00000000  r1 : df86b0a0  r0 : 00000000
      Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
      Control: 10c53c7d  Table: 2f29404a  DAC: 00000015
      Process kworker/u2:0 (pid: 6, stack limit = 0xdf882240)
      Stack: (0xdf883b50 to 0xdf884000)
      ...
      Backtrace:
      [<c03cacdc>] (usb_remove_hcd+0x0/0x150) from [<c03e88e4>] (host_stop+0x1c/0x3c)
       r6:df2ef410 r5:00000000 r4:df2c3010
      [<c03e88c8>] (host_stop+0x0/0x3c) from [<c03e8aa0>] (ci_hdrc_host_destroy+0x1c/0x20)
       r5:00000000 r4:df2c3010
      [<c03e8a84>] (ci_hdrc_host_destroy+0x0/0x20) from [<c03e7c80>] (ci_hdrc_probe+0x3ac/0x670)
      [<c03e78d4>] (ci_hdrc_probe+0x0/0x670) from [<c0311044>] (platform_drv_probe+0x20/0x24)
      [<c0311024>] (platform_drv_probe+0x0/0x24) from [<c030fcac>] (driver_probe_device+0x9c/0x234)
      [<c030fc10>] (driver_probe_device+0x0/0x234) from [<c030ff28>] (__device_attach+0x44/0x48)
      ...
      ---[ end trace c88ccaf3969e8423 ]---
      
      Fix this so at least we can continue booting and get to a shell prompt.
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
      Tested-by: Russell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      41314fea
    • O
      usb-storage: add quirk for mandatory READ_CAPACITY_16 · 32c37fc3
      Oliver Neukum committed
      Some USB drive enclosures do not correctly report an
      overflow condition if they hold a drive with a capacity
      over 2TB and are confronted with a READ_CAPACITY_10.
      They answer with their capacity modulo 2TB.
      The generic layer cannot cope with that. It must be told
      to use READ_CAPACITY_16 from the beginning.
      Signed-off-by: Oliver Neukum <oneukum@suse.de>
      Cc: stable <stable@vger.kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      32c37fc3