1. 06 May, 2016: 8 commits
    • mm: thp: kvm: fix memory corruption in KVM with THP enabled · 127393fb
      Andrea Arcangeli committed
      After the THP refcounting change, obtaining a compound page from
      get_user_pages() no longer allows us to assume the entire compound page
      is immediately mappable from a secondary MMU.
      
      A secondary MMU doesn't want to call get_user_pages() more than once for
      each compound page, in order to know if it can map the whole compound
      page.  So a secondary MMU needs to know from a single get_user_pages()
      invocation when it can immediately map the entire compound page to avoid
      a flood of unnecessary secondary MMU faults and spurious
      atomic_inc()/atomic_dec() (pages don't have to be pinned by MMU notifier
      users).
      
      Ideally instead of the page->_mapcount < 1 check, get_user_pages()
      should return the granularity of the "page" mapping in the "mm" passed
      to get_user_pages().  However, it's a non-trivial change to pass the "pmd"
      status belonging to the "mm" walked by get_user_pages up the stack (up
      to the caller of get_user_pages).  So the fix just checks if there is
      not a single pte mapping on the page returned by get_user_pages, and in
      turn if the caller can assume that the whole compound page is mapped in
      the current "mm" (in a pmd_trans_huge()).  In such case the entire
      compound page is safe to map into the secondary MMU without additional
      get_user_pages() calls on the surrounding tail/head pages.  In addition
      to being faster, not having to run other get_user_pages() calls also
      reduces the memory footprint of the secondary MMU fault in case the pmd
      split happened as result of memory pressure.
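      
      A minimal user-space model of the check described above, for
      illustration only: struct model_page and can_map_whole_compound() are
      hypothetical names, not kernel APIs. The model assumes the kernel
      convention that a per-page _mapcount of -1 means no pte maps the page.
      It stores the counter with that bias, so "no pte mapping" shows up as
      a negative value.

```c
#include <stdbool.h>

/* Hypothetical model of struct page; not the kernel definition. */
struct model_page {
    bool compound; /* part of a transparent huge page */
    int mapcount;  /* pte mappings minus one: -1 means no pte maps it */
};

/*
 * True if a secondary MMU may map the whole compound page after a single
 * get_user_pages(): the page is compound and no pte maps it, so the
 * primary MMU must be mapping it through a huge pmd.
 */
bool can_map_whole_compound(const struct model_page *page)
{
    return page->compound && page->mapcount < 0;
}
```

      A compound page with one pte mapping (mapcount 0) fails the check.
      This is the pmd-split case that caused the corruption.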
      
      Without this fix after a MADV_DONTNEED (like invoked by QEMU during
      postcopy live migration or ballooning) or after generic swapping (with a
      failure in split_huge_page() that would only result in pmd splitting and
      not a physical page split), KVM would map the whole compound page into
      the shadow pagetables, even though regular faults or userfaults (like
      UFFDIO_COPY) may map regular pages into the primary MMU as result of the
      pte faults, leading to the guest mode and userland mode going out of
      sync and not working on the same memory at all times.
      
      Any other secondary MMU notifier manager (KVM is just one of the many
      MMU notifier users) will need the same information if it doesn't want to
      run a flood of get_user_pages_fast() calls and can support multiple
      granularities in the secondary MMU mappings, so I think it is justified to
      be exposed not just to KVM.
      
      The other option would be to move transparent_hugepage_adjust to
      mm/huge_memory.c but that currently has all kinds of KVM data structures
      in it, so it's definitely not cut-and-paste work, so I couldn't do a
      fix as clean as this one for 4.6.
      Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
      Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Cc: "Li, Liang Z" <liang.z.li@intel.com>
      Cc: Amit Shah <amit.shah@redhat.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • MAINTAINERS: fix Rajendra Nayak's address · ff2de822
      Eric Engestrom committed
      Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
      Cc: Rajendra Nayak <rnayak@codeaurora.org>
      Cc: Afzal Mohammed <afzal.mohd.ma@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm, cma: prevent nr_isolated_* counters from going negative · 14af4a5e
      Hugh Dickins committed
      /proc/sys/vm/stat_refresh warns that nr_isolated_anon and nr_isolated_file
      go increasingly negative under compaction, which would add delay when
      there should be none, or no delay when there should be delay.  The bug in
      compaction was due to a recent mmotm patch, but a much older instance of
      the bug was also noticed in isolate_migratepages_range(), which is used
      for CMA and gigantic hugepage allocations.
      
      The bug is caused by putback_movable_pages() in an error path
      decrementing the isolated counters without them being previously
      incremented by acct_isolated().  Fix isolate_migratepages_range() by
      removing the error-path putback, thus reaching acct_isolated() with
      migratepages still isolated, and leaving putback to caller like most
      other places do.
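      
      A toy model of the accounting imbalance, assuming a single counter and
      a single local list. The model_* functions only stand in for
      acct_isolated() and putback_movable_pages(); they are hypothetical and
      do not share the real signatures. The buggy path decrements a counter
      it never incremented. The fixed path accounts first and leaves the
      putback to the caller.

```c
#include <stddef.h>

/* Hypothetical model: one isolated-page counter and one local page list. */
struct isolate_ctx {
    long nr_isolated; /* stands in for nr_isolated_anon/nr_isolated_file */
    size_t on_list;   /* pages currently on the local migratepages list */
};

/* Stand-in for acct_isolated(): account every page on the list. */
void model_acct_isolated(struct isolate_ctx *c)
{
    c->nr_isolated += (long)c->on_list;
}

/* Stand-in for putback_movable_pages(): release and un-account pages. */
void model_putback(struct isolate_ctx *c)
{
    c->nr_isolated -= (long)c->on_list;
    c->on_list = 0;
}

/* Old behaviour: the error path puts pages back before accounting them. */
int isolate_range_buggy(struct isolate_ctx *c, size_t pages, int fail)
{
    c->on_list = pages;
    if (fail) {
        model_putback(c); /* decrements without a matching increment */
        return -1;
    }
    model_acct_isolated(c);
    return 0;
}

/* Fixed behaviour: always account; the caller puts pages back on error. */
int isolate_range_fixed(struct isolate_ctx *c, size_t pages, int fail)
{
    c->on_list = pages;
    model_acct_isolated(c);
    return fail ? -1 : 0;
}
```

      Consider a failure with 4 pages isolated. The buggy path leaves the
      counter at -4. With the fixed path, the counter returns to 0 once the
      caller calls model_putback().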
      
      Fixes: edc2ca61 ("mm, compaction: move pageblock checks up from isolate_migratepages_range()")
      [vbabka@suse.cz: expanded the changelog]
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: update min_free_kbytes from khugepaged after core initialization · bc22af74
      Jason Baron committed
      Khugepaged attempts to raise min_free_kbytes if it's set too low.
      However, on boot khugepaged sets min_free_kbytes first from
      subsys_initcall(), and then the mm 'core' overrides min_free_kbytes
      afterwards from init_per_zone_wmark_min(), via a module_init() call.
      
      Khugepaged used to use a late_initcall() to set min_free_kbytes (such
      that it occurred after the core initialization), however this was
      removed when the initialization of min_free_kbytes was integrated into
      the starting of the khugepaged thread.
      
      The fix here is simply to invoke the core initialization using a
      core_initcall() instead of module_init(), such that the previous
      initialization ordering is restored.  I didn't restore the
      late_initcall() since start_stop_khugepaged() already sets
      min_free_kbytes via set_recommended_min_free_kbytes().
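      
      A toy simulation of the ordering problem. The level numbers follow the
      kernel's initcall levels: core_initcall runs at level 1,
      subsys_initcall at 4, and module_init in built-in code at the device
      level, 6. The two init functions are hypothetical stand-ins that
      hard-code the 8GB figures quoted later in this message instead of
      computing them: 11365 from the core, 67584 as khugepaged's value. The
      model also assumes khugepaged only ever raises the value.

```c
#include <stdlib.h>

/* Kernel initcall levels used here: lower levels run earlier. */
enum { LEVEL_CORE = 1, LEVEL_SUBSYS = 4, LEVEL_DEVICE = 6 };

static long min_free_kbytes;

/* Hypothetical core init: sets its computed default unconditionally. */
static void core_init(void)
{
    min_free_kbytes = 11365;
}

/* Hypothetical khugepaged init: only ever raises the value. */
static void khugepaged_init(void)
{
    if (min_free_kbytes < 67584)
        min_free_kbytes = 67584;
}

struct initcall {
    int level;
    void (*fn)(void);
};

static int by_level(const void *a, const void *b)
{
    const struct initcall *x = a, *y = b;
    return x->level - y->level;
}

/* Run both initcalls in level order and return the final setting. */
long boot(int core_level)
{
    struct initcall calls[] = {
        { LEVEL_SUBSYS, khugepaged_init },
        { core_level, core_init },
    };
    size_t n = sizeof calls / sizeof calls[0];

    qsort(calls, n, sizeof calls[0], by_level);
    min_free_kbytes = 0;
    for (size_t i = 0; i < n; i++)
        calls[i].fn();
    return min_free_kbytes;
}
```

      boot(LEVEL_DEVICE) models the module_init() ordering and ends at 11365.
      boot(LEVEL_CORE) models core_initcall() and keeps 67584.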
      
      This was noticed when we had a number of page allocation failures when
      moving a workload to a kernel with this new initialization ordering.  On
      an 8GB system this restores min_free_kbytes back to 67584 from 11365
      when CONFIG_TRANSPARENT_HUGEPAGE=y is set and either
      CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS=y or
      CONFIG_TRANSPARENT_HUGEPAGE_MADVISE=y.
      
      Fixes: 79553da2 ("thp: cleanup khugepaged startup")
      Signed-off-by: Jason Baron <jbaron@akamai.com>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • huge pagecache: mmap_sem is unlocked when truncation splits pmd · 68428398
      Hugh Dickins committed
      zap_pmd_range()'s CONFIG_DEBUG_VM !rwsem_is_locked(&mmap_sem) BUG() will
      be invalid with huge pagecache, in whatever way it is implemented:
      truncation of a hugely-mapped file to an unhugely-aligned size would
      easily hit it.
      
      (Although anon THP could in principle apply khugepaged to private file
      mappings, which are not excluded by the MADV_HUGEPAGE restrictions, in
      practice there's a vm_ops check which excludes them, so it never hits
      this BUG() - there's no interface to "truncate" an anonymous mapping.)
      
      We could complicate the test, to check i_mmap_rwsem also when there's a
      vm_file; but my inclination was to make zap_pmd_range() more readable by
      simply deleting this check.  A search has shown no report of the issue
      in the years since commit e0897d75 ("mm, thp: print useful
      information when mmap_sem is unlocked in zap_pmd_range") expanded it
      from VM_BUG_ON() - though I cannot point to what commit I would say then
      fixed the issue.
      
      But there are a couple of other patches now floating around, neither yet
      in the tree: let's agree to retain the check as a VM_BUG_ON_VMA(), as
      Matthew Wilcox has done; but subject to a vma_is_anonymous() check, as
      Kirill Shutemov has done.  And let's get this in, without waiting for
      any particular huge pagecache implementation to reach the tree.
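      
      A sketch of the condition tested by the check agreed above: complain
      about an unlocked mmap_sem only for anonymous VMAs. struct model_vma
      and zap_pmd_check_fires() are hypothetical. In the kernel,
      vma_is_anonymous() and rwsem_is_locked() would supply the two
      booleans.

```c
#include <stdbool.h>

/* Hypothetical, simplified VMA; not struct vm_area_struct. */
struct model_vma {
    bool anonymous;       /* what vma_is_anonymous() would report */
    bool mmap_sem_locked; /* what rwsem_is_locked(&mmap_sem) would report */
};

/*
 * True when the retained debug check would fire. File-backed VMAs
 * (huge pagecache, DAX) are exempt because truncation can reach the
 * pmd split without holding mmap_sem.
 */
bool zap_pmd_check_fires(const struct model_vma *vma)
{
    return vma->anonymous && !vma->mmap_sem_locked;
}
```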
      
      Matthew said "We can reproduce this BUG() in the current Linus tree with
      DAX PMDs".
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Tested-by: Matthew Wilcox <willy@linux.intel.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andres Lagar-Cavilla <andreslc@google.com>
      Cc: Yang Shi <yang.shi@linaro.org>
      Cc: Ning Qu <quning@gmail.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Konstantin Khlebnikov <koct9i@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • rapidio/mport_cdev: fix uapi type definitions · 4e1016da
      Alexandre Bounine committed
      Fix problems in uapi definitions reported by Gabriel Laskar (see
      https://lkml.org/lkml/2016/4/5/205 for details):
      
       - move public header file rio_mport_cdev.h to include/uapi/linux directory
       - change types in data structures passed as IOCTL parameters
       - improve parameter checking in some IOCTL service routines
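      
      The commit does not spell out the new types here. As a general
      illustration, and not the actual rio_mport_cdev.h contents, uapi ioctl
      structures usually use only fixed-width fields so that 32-bit and
      64-bit user space see one layout. Kernel headers spell these __u32 and
      __u64; the stdint equivalents below keep the sketch self-contained.
      struct example_ioctl_arg is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ioctl argument; not the real rio_mport_cdev.h layout. */
struct example_ioctl_arg {
    uint64_t handle;   /* e.g. a buffer address, 64 bits on every ABI */
    uint64_t length;
    uint32_t flags;
    uint32_t reserved; /* explicit padding: without it x86-64 pads to 24
                          bytes but i386 stops at 20 */
};

/* Same size and offsets for 32-bit and 64-bit user space. */
_Static_assert(sizeof(struct example_ioctl_arg) == 24,
               "size must not depend on the user-space ABI");
_Static_assert(offsetof(struct example_ioctl_arg, flags) == 16,
               "field offsets must not depend on the user-space ABI");
```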
      Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
      Reported-by: Gabriel Laskar <gabriel@lse.epita.fr>
      Tested-by: Barry Wood <barry.wood@idt.com>
      Cc: Gabriel Laskar <gabriel@lse.epita.fr>
      Cc: Matt Porter <mporter@kernel.crashing.org>
      Cc: Aurelien Jacquiot <a-jacquiot@ti.com>
      Cc: Andre van Herk <andre.van.herk@prodrive-technologies.com>
      Cc: Barry Wood <barry.wood@idt.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: memcontrol: let v2 cgroups follow changes in system swappiness · 4550c4e1
      Johannes Weiner committed
      Cgroup2 currently doesn't have a per-cgroup swappiness setting.  We
      might want to add one later - that's a different discussion - but until
      we do, the cgroups should always follow the system setting.  Otherwise
      it will be unchangeably set to whatever the ancestor inherited from the
      system setting at the time of cgroup creation.
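      
      A toy model of the behaviour described above, assuming a cgroup2
      memcg has no swappiness setting of its own. All names are
      hypothetical. For simplicity, the v1 cgroup here snapshots the system
      value directly rather than inheriting it through its ancestors.

```c
#include <stdbool.h>

/* Hypothetical system-wide setting; 60 is the usual vm.swappiness default. */
int system_swappiness = 60;

struct model_memcg {
    bool on_v2;     /* attached to the cgroup2 (default) hierarchy */
    bool is_root;
    int swappiness; /* private copy, taken when the cgroup is created */
};

struct model_memcg model_memcg_create(bool on_v2, bool is_root)
{
    struct model_memcg m = { on_v2, is_root, system_swappiness };
    return m;
}

/* cgroup2 and root cgroups follow the live system value. */
int model_memcg_swappiness(const struct model_memcg *m)
{
    if (m->on_v2 || m->is_root)
        return system_swappiness;
    return m->swappiness;
}
```

      After system_swappiness changes from 60 to 10, a v2 cgroup reports 10.
      A v1 child created earlier still reports 60.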
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Vladimir Davydov <vdavydov@virtuozzo.com>
      Cc: <stable@vger.kernel.org>	[4.5]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: thp: correct split_huge_pages file permission · 145bdaa1
      Yang Shi committed
      split_huge_pages doesn't support a get method at all, so the read
      permission is confusing; change the permission to write-only.
      
      Also, add "\n" to the output of the set method to make it more readable.
      Signed-off-by: Yang Shi <yang.shi@linaro.org>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Hugh Dickins <hughd@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 05 May, 2016: 6 commits
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace · c5e0666c
      Linus Torvalds committed
      Pull userns fix from Eric Biederman:
       "This contains just a single fix for a nasty oops"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
        propogate_mnt: Handle the first propogated copy being a slave
    • Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost · 3cedbec3
      Linus Torvalds committed
      Pull virtio/qemu fixes from Michael Tsirkin:
       "A couple of fixes for virtio and for the new QEMU fw cfg driver"
      
      * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
        virtio: Silence uninitialized variable warning
        firmware: qemu_fw_cfg.c: potential unintialized variable
    • propogate_mnt: Handle the first propogated copy being a slave · 5ec0811d
      Eric W. Biederman committed
      When the first propagated copy was a slave, the following oops would result:
      > BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
      > IP: [<ffffffff811fba4e>] propagate_one+0xbe/0x1c0
      > PGD bacd4067 PUD bac66067 PMD 0
      > Oops: 0000 [#1] SMP
      > Modules linked in:
      > CPU: 1 PID: 824 Comm: mount Not tainted 4.6.0-rc5userns+ #1523
      > Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
      > task: ffff8800bb0a8000 ti: ffff8800bac3c000 task.ti: ffff8800bac3c000
      > RIP: 0010:[<ffffffff811fba4e>]  [<ffffffff811fba4e>] propagate_one+0xbe/0x1c0
      > RSP: 0018:ffff8800bac3fd38  EFLAGS: 00010283
      > RAX: 0000000000000000 RBX: ffff8800bb77ec00 RCX: 0000000000000010
      > RDX: 0000000000000000 RSI: ffff8800bb58c000 RDI: ffff8800bb58c480
      > RBP: ffff8800bac3fd48 R08: 0000000000000001 R09: 0000000000000000
      > R10: 0000000000001ca1 R11: 0000000000001c9d R12: 0000000000000000
      > R13: ffff8800ba713800 R14: ffff8800bac3fda0 R15: ffff8800bb77ec00
      > FS:  00007f3c0cd9b7e0(0000) GS:ffff8800bfb00000(0000) knlGS:0000000000000000
      > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      > CR2: 0000000000000010 CR3: 00000000bb79d000 CR4: 00000000000006e0
      > Stack:
      >  ffff8800bb77ec00 0000000000000000 ffff8800bac3fd88 ffffffff811fbf85
      >  ffff8800bac3fd98 ffff8800bb77f080 ffff8800ba713800 ffff8800bb262b40
      >  0000000000000000 0000000000000000 ffff8800bac3fdd8 ffffffff811f1da0
      > Call Trace:
      >  [<ffffffff811fbf85>] propagate_mnt+0x105/0x140
      >  [<ffffffff811f1da0>] attach_recursive_mnt+0x120/0x1e0
      >  [<ffffffff811f1ec3>] graft_tree+0x63/0x70
      >  [<ffffffff811f1f6b>] do_add_mount+0x9b/0x100
      >  [<ffffffff811f2c1a>] do_mount+0x2aa/0xdf0
      >  [<ffffffff8117efbe>] ? strndup_user+0x4e/0x70
      >  [<ffffffff811f3a45>] SyS_mount+0x75/0xc0
      >  [<ffffffff8100242b>] do_syscall_64+0x4b/0xa0
      >  [<ffffffff81988f3c>] entry_SYSCALL64_slow_path+0x25/0x25
      > Code: 00 00 75 ec 48 89 0d 02 22 22 01 8b 89 10 01 00 00 48 89 05 fd 21 22 01 39 8e 10 01 00 00 0f 84 e0 00 00 00 48 8b 80 d8 00 00 00 <48> 8b 50 10 48 89 05 df 21 22 01 48 89 15 d0 21 22 01 8b 53 30
      > RIP  [<ffffffff811fba4e>] propagate_one+0xbe/0x1c0
      >  RSP <ffff8800bac3fd38>
      > CR2: 0000000000000010
      > ---[ end trace 2725ecd95164f217 ]---
      
      This oops happens with the namespace_sem held and can be triggered by
      non-root users.  An all around not pleasant experience.
      
      To avoid this scenario, when finding the appropriate source mount to
      copy, stop the walk up the mnt_master chain when the first source mount
      is encountered.
      
      Further, rewrite the walk up the last_source mnt_master chain so that
      it is clear what is going on.
      
      The reason why the first source mount is special is that its
      mnt_parent is not a mount in the dest_mnt propagation tree, and as
      such termination conditions based upon the dest_mnt mount propagation
      tree do not make sense.
      
      To avoid other kinds of confusion last_dest is not changed when
      computing last_source.  last_dest is only used once in propagate_one
      and that is above the point of the code being modified, so changing
      the global variable is meaningless and confusing.
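      
      A heavily simplified model of the walk, assuming a mount needs only
      master and parent links. struct model_mount and walk_last_source() are
      hypothetical. They omit the real peer-group and marking logic in
      propagate_one(). What the sketch shows is the order of the tests: the
      first_source comparison is made before last_source->parent is
      dereferenced.

```c
#include <stddef.h>

/*
 * Hypothetical mount model; not struct mount.
 * master: the mount this one receives propagation from.
 * parent: the mount it is attached to; for first_source this lies outside
 *         the destination propagation tree (NULL in this model).
 */
struct model_mount {
    struct model_mount *master;
    struct model_mount *parent;
};

/*
 * Walk last_source up its master chain until its parent receives
 * propagation from 'stop'. The walk ends unconditionally at first_source,
 * so the parent-based termination test is never applied to it.
 */
struct model_mount *walk_last_source(struct model_mount *last_source,
                                     const struct model_mount *first_source,
                                     const struct model_mount *stop)
{
    while (last_source != first_source) {
        if (last_source->parent->master == stop)
            break;
        last_source = last_source->master;
    }
    return last_source;
}
```

      If the first_source check were dropped, the loop would read
      first_source->parent->master. In this model, that is a NULL
      dereference, much like the oops above.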
      
      Cc: stable@vger.kernel.org
      Fixes: f2ebb3a9 ("smarter propagate_mnt()")
      Reported-by: Tycho Andersen <tycho.andersen@canonical.com>
      Reviewed-by: Seth Forshee <seth.forshee@canonical.com>
      Tested-by: Seth Forshee <seth.forshee@canonical.com>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input · 21a9703d
      Linus Torvalds committed
      Pull input fixes from Dmitry Torokhov.
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
        Input: atmel_mxt_ts - use mxt_acquire_irq in mxt_soft_reset
        Input: zforce_ts - fix dual touch recognition
        Input: twl6040-vibra - fix atomic schedule panic
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security · 4810d968
      Linus Torvalds committed
      Pull IMA fix from James Morris.
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
        ima: fix the string representation of the LSM/IMA hook enumeration ordering
    • Merge tag 'for-linus-4.6-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip · 41143b77
      Linus Torvalds committed
      Pull xen regression fixes from David Vrabel:
      
       - Fix two regressions causing crashes in 32-bit PV guests
      
       - Fix a regression in the evtchn driver
      
      * tag 'for-linus-4.6-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
        xen/evtchn: fix ring resize when binding new events
        xen/balloon: Fix crash when ballooning on x86 32 bit PAE
        xen: Fix page <-> pfn conversion on 32 bit systems
  3. 04 May, 2016: 24 commits
  4. 03 May, 2016: 2 commits