1. 06 Oct, 2018 1 commit
    • M
      mm: migration: fix migration of huge PMD shared pages · 017b1660
      Mike Kravetz committed
      The page migration code employs try_to_unmap() to try and unmap the source
      page.  This is accomplished by using rmap_walk to find all vmas where the
      page is mapped.  This search stops when page mapcount is zero.  For shared
      PMD huge pages, the page map count is always 1 no matter the number of
      mappings.  Shared mappings are tracked via the reference count of the PMD
      page.  Therefore, try_to_unmap stops prematurely and does not completely
      unmap all mappings of the source page.
      
      This problem can result in data corruption as writes to the original
      source page can happen after contents of the page are copied to the target
      page.  Hence, data is lost.
      
      This problem was originally seen as DB corruption of shared global areas
      after a huge page was soft offlined due to ECC memory errors.  DB
      developers noticed they could reproduce the issue by (hotplug) offlining
      memory used to back huge pages.  A simple testcase can reproduce the
      problem by creating a shared PMD mapping (note that this must be at least
      PUD_SIZE in size and PUD_SIZE aligned (1GB on x86)), and using
      migrate_pages() to migrate process pages between nodes while continually
      writing to the huge pages being migrated.
      
      To fix, have the try_to_unmap_one routine check for huge PMD sharing by
      calling huge_pmd_unshare for hugetlbfs huge pages.  If it is a shared
      mapping it will be 'unshared' which removes the page table entry and drops
      the reference on the PMD page.  After this, flush caches and TLB.
      
      mmu notifiers are called before locking page tables, but we can not be
      sure of PMD sharing until page tables are locked.  Therefore, check for
      the possibility of PMD sharing before locking so that notifiers can
      prepare for the worst possible case.
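
The worst-case preparation can be pictured with a small user-space model. This is only a sketch under assumptions, not the kernel patch: `struct vma_model`, `range_in_vma_model()` and `widen_for_pmd_sharing()` are hypothetical names, and `PUD_SIZE_MODEL` hard-codes the 1GB x86 value mentioned in the commit message. The idea is that any PUD_SIZE-aligned block that overlaps the range and fits entirely inside a shared mapping might be PMD-shared, so the range handed to the notifiers is widened to cover it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 1GB PUD size, as on x86 per the commit message. */
#define PUD_SIZE_MODEL (UINT64_C(1) << 30)
#define PUD_MASK_MODEL (~(PUD_SIZE_MODEL - 1))

/* Hypothetical stand-in for the few vma fields this sketch needs. */
struct vma_model {
	uint64_t vm_start;	/* inclusive */
	uint64_t vm_end;	/* exclusive */
	bool may_share;		/* models a shared mapping */
};

/* True if [start, end) lies entirely inside the mapping. */
static bool range_in_vma_model(const struct vma_model *vma,
			       uint64_t start, uint64_t end)
{
	return start >= vma->vm_start && end <= vma->vm_end;
}

/*
 * Widen [*start, *end) to cover every PUD-aligned block that overlaps
 * it and fits inside the mapping, since such a block could be shared.
 */
static void widen_for_pmd_sharing(const struct vma_model *vma,
				  uint64_t *start, uint64_t *end)
{
	const uint64_t orig_end = *end;
	uint64_t addr;

	assert(*start < *end);
	if (!vma->may_share)
		return;

	for (addr = *start & PUD_MASK_MODEL; addr < orig_end;
	     addr += PUD_SIZE_MODEL) {
		uint64_t a_start = addr;
		uint64_t a_end = a_start + PUD_SIZE_MODEL;

		if (range_in_vma_model(vma, a_start, a_end)) {
			if (a_start < *start)
				*start = a_start;
			if (a_end > *end)
				*end = a_end;
		}
	}
}
```

For a 2MB page in the middle of a shared 2GB mapping that starts on a 1GB boundary, the model widens the range to the whole enclosing 1GB block; for a private mapping, or one too small to hold an aligned block, the range is left alone.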
      
      Link: http://lkml.kernel.org/r/20180823205917.16297-2-mike.kravetz@oracle.com
      [mike.kravetz@oracle.com: make _range_in_vma() a static inline]
        Link: http://lkml.kernel.org/r/6063f215-a5c8-2f0c-465a-2c515ddc952d@oracle.com
      Fixes: 39dde65c ("shared page table for hugetlb page")
      Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      017b1660
  2. 05 Oct, 2018 14 commits
  3. 04 Oct, 2018 14 commits
    • F
      drm/amdkfd: Fix incorrect use of process->mm · 11b29c9e
      Felix Kuehling committed
      This mm_struct pointer should never be dereferenced. If running in
      a user thread, just use current->mm. If running in a kernel worker
      use get_task_mm to get a safe reference to the mm_struct.
      Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
      Acked-by: Christian König <christian.koenig@amd.com>
      Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      11b29c9e
    • S
      drm/amd/display: Signal hw_done() after waiting for flip_done() · 987bf116
      Shirish S committed
      In amdgpu_dm_commit_tail(), wait until flip_done() is signaled before
      we signal hw_done().
      
      [Why]
      
      This is to temporarily address a paging error that occurs when a
      nonblocking commit contends with another commit, particularly in a
      mirrored display configuration where at least 2 CRTCs are updated.
      The error occurs in drm_atomic_helper_wait_for_flip_done(), when we
      attempt to access the contents of new_crtc_state->commit.
      
      Here's the sequence for a mirrored 2 display setup (irrelevant steps
      left out for clarity):
      
      **THREAD 1**                        | **THREAD 2**
                                          |
      Initialize atomic state for flip    |
                                          |
      Queue worker                        |
                                         ...
      
                                          | Do work for flip
                                          |
                                          | Signal hw_done() on CRTC 1
                                          | Signal hw_done() on CRTC 2
                                          |
                                          | Wait for flip_done() on CRTC 1
      
                                      <---- **PREEMPTED BY THREAD 1**
      
      Initialize atomic state for cursor  |
      update (1)                          |
                                          |
      Do cursor update work on both CRTCs |
                                          |
      Clear atomic state (2)              |
      **DONE**                            |
                                         ...
                                          |
                                          | Wait for flip_done() on CRTC 2
                                          | *ERROR*
                                          |
      
      The issue starts with (1). When the atomic state is initialized, the
      current CRTC states are duplicated to be the new_crtc_states, and
      referenced to be the old_crtc_states. (The new_crtc_states are to be
      filled with update data.)
      
      Some things to note:
      
      * Due to the mirrored configuration, the cursor updates on both CRTCs.
      
      * At this point, the pflip IRQ has already been handled, and flip_done
        signaled on all CRTCs. The cursor commit can therefore continue.
      
      * The old_crtc_states used by the cursor update are the **same states**
        as the new_crtc_states used by the flip worker.
      
      At (2), the old_crtc_state is freed (*), and the cursor commit
      completes. We then context switch back to the flip worker, where we
      attempt to access the new_crtc_state->commit object. This is
      problematic, as this state has already been freed.
      
      (*) Technically, 'state->crtcs[i].state' is freed, which was made to
          reference old_crtc_state in drm_atomic_helper_swap_state()
      
      [How]
      
      By moving hw_done() after wait_for_flip_done(), we're guaranteed that
      the new_crtc_state (from the flip worker's perspective) still exists.
      This is because any other commit will be blocked, waiting for the
      hw_done() signal.
      
      Note that both the i915 and imx drivers have this sequence flipped
      already, masking this problem.
      Signed-off-by: Shirish S <shirish.s@amd.com>
      Signed-off-by: Leo Li <sunpeng.li@amd.com>
      Reviewed-by: Harry Wentland <harry.wentland@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      987bf116
    • P
      kvm: nVMX: fix entry with pending interrupt if APICv is enabled · 7e712684
      Paolo Bonzini committed
      Commit b5861e5c introduced a check on
      the interrupt-window and NMI-window CPU execution controls in order to
      inject an external interrupt vmexit before the first guest instruction
      executes.  However, when APIC virtualization is enabled the host does not
      need a vmexit in order to inject an interrupt at the next interrupt window;
      instead, it just places the interrupt vector in RVI and the processor will
      inject it as soon as possible.  Therefore, on machines with APICv it is
      not enough to check the CPU execution controls: the same scenario can also
      happen if RVI>vPPR.
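
The RVI>vPPR condition can be written as a one-line predicate. This is a hedged sketch, not the code from the patch: `rvi_above_vppr()` is a hypothetical name, and it assumes that, as in the local APIC priority scheme, only the priority class in bits 7:4 of each value takes part in the comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumption: a pending virtual interrupt is deliverable when the
 * priority class (bits 7:4) of RVI exceeds that of the virtual PPR.
 */
static bool rvi_above_vppr(uint8_t rvi, uint32_t vppr)
{
	return (rvi & 0xf0) > (vppr & 0xf0);
}
```

Under that assumption, vector 0x51 is deliverable against a vPPR of 0x40, while vector 0x4f is not, because both fall in the same priority class.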
      
      Fixes: b5861e5c
      Reviewed-by: Nikita Leshchenko <nikita.leshchenko@oracle.com>
      Cc: Sean Christopherson <sean.j.christopherson@intel.com>
      Cc: Liran Alon <liran.alon@oracle.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      7e712684
    • M
      ovl: fix format of setxattr debug · 1a8f8d2a
      Miklos Szeredi committed
      Format has a typo: it was meant to be "%.*s", not "%*s".  But at some point
      callers grew nonprintable values as well, so use "%*pE" instead with a
      maximized length.
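
The difference between the two format strings can be shown with standard C `printf` semantics, which the kernel shares for this pair. This is a user-space sketch only: `format_name()` is a hypothetical helper, and the kernel-specific "%*pE" extension used in the fix is not available in standard `printf`, so it is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Format at most `len` bytes of `name` into `out`. The precision in
 * "%.*s" is an upper bound on the bytes read, so `name` need not be
 * NUL-terminated. With "%*s" the same argument would only set a
 * minimum field width, and the read would continue until a NUL byte.
 */
static int format_name(char *out, size_t outsz, const char *name, int len)
{
	assert(len >= 0);
	return snprintf(out, outsz, "name=\"%.*s\"", len, name);
}
```

Passing a four-byte buffer with no terminator and `len` set to 4 reads exactly those four bytes and nothing beyond them.
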
      Reported-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
      Fixes: 3a1e819b ("ovl: store file handle of lower inode on copy up")
      Cc: <stable@vger.kernel.org> # v4.12
      1a8f8d2a
    • A
      ovl: fix access beyond unterminated strings · 601350ff
      Amir Goldstein committed
      KASAN detected slab-out-of-bounds access in printk from overlayfs,
      because string format used %*s instead of %.*s.
      
      > BUG: KASAN: slab-out-of-bounds in string+0x298/0x2d0 lib/vsprintf.c:604
      > Read of size 1 at addr ffff8801c36c66ba by task syz-executor2/27811
      >
      > CPU: 0 PID: 27811 Comm: syz-executor2 Not tainted 4.19.0-rc5+ #36
      ...
      >  printk+0xa7/0xcf kernel/printk/printk.c:1996
      >  ovl_lookup_index.cold.15+0xe8/0x1f8 fs/overlayfs/namei.c:689
      
      Reported-by: syzbot+376cea2b0ef340db3dd4@syzkaller.appspotmail.com
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
      Fixes: 359f392c ("ovl: lookup index entry for copy up origin")
      Cc: <stable@vger.kernel.org> # v4.13
      601350ff
    • P
      KVM: VMX: hide flexpriority from guest when disabled at the module level · 2cf7ea9f
      Paolo Bonzini committed
      As of commit 8d860bbe ("kvm: vmx: Basic APIC virtualization controls
      have three settings"), KVM will disable VIRTUALIZE_APIC_ACCESSES when
      a nested guest writes APIC_BASE MSR and kvm-intel.flexpriority=0,
      whereas previously KVM would allow a nested guest to enable
      VIRTUALIZE_APIC_ACCESSES so long as it's supported in hardware.  That is,
      KVM now advertises VIRTUALIZE_APIC_ACCESSES to a guest but doesn't
      (always) allow setting it when kvm-intel.flexpriority=0, and may even
      initially allow the control and then clear it when the nested guest
      writes APIC_BASE MSR, which is decidedly odd even if it doesn't cause
      functional issues.
      
      Hide the control completely when the module parameter is cleared.
      Reported-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Fixes: 8d860bbe ("kvm: vmx: Basic APIC virtualization controls have three settings")
      Cc: Jim Mattson <jmattson@google.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      2cf7ea9f
    • S
      KVM: VMX: check for existence of secondary exec controls before accessing · fd6b6d9b
      Sean Christopherson committed
      Return early from vmx_set_virtual_apic_mode() if the processor doesn't
      support VIRTUALIZE_APIC_ACCESSES or VIRTUALIZE_X2APIC_MODE, both of
      which reside in SECONDARY_VM_EXEC_CONTROL.  This eliminates warnings
      due to VMWRITEs to SECONDARY_VM_EXEC_CONTROL (VMCS field 401e) failing
      on processors without secondary exec controls.
      
      Remove the similar check for TPR shadowing as it is incorporated in the
      flexpriority_enabled check and the APIC-related code in
      vmx_update_msr_bitmap() is further gated by VIRTUALIZE_X2APIC_MODE.
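
The early-return guard can be modelled as a capability check. This is a sketch under assumptions rather than the patch itself: `struct vmx_caps_model` and `may_write_secondary_apic_ctls()` are hypothetical, the bit positions are assumed to follow the architectural VMX control layout, and the real code also folds the module-level flexpriority setting into the decision, which is left out here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions of the relevant VMX execution controls. */
#define ACTIVATE_SECONDARY_CONTROLS_MODEL UINT32_C(0x80000000)
#define VIRTUALIZE_APIC_ACCESSES_MODEL    UINT32_C(0x00000001)
#define VIRTUALIZE_X2APIC_MODE_MODEL      UINT32_C(0x00000010)

/* Hypothetical summary of what the processor supports. */
struct vmx_caps_model {
	uint32_t primary_ctls;
	uint32_t secondary_ctls;
};

/*
 * Only touch SECONDARY_VM_EXEC_CONTROL when the field exists and at
 * least one of the two APIC virtualization controls is supported.
 */
static bool may_write_secondary_apic_ctls(const struct vmx_caps_model *caps)
{
	if (!(caps->primary_ctls & ACTIVATE_SECONDARY_CONTROLS_MODEL))
		return false;

	return (caps->secondary_ctls &
		(VIRTUALIZE_APIC_ACCESSES_MODEL |
		 VIRTUALIZE_X2APIC_MODE_MODEL)) != 0;
}
```

A caller would return early whenever the predicate is false, so no VMWRITE to the secondary controls is attempted on processors that lack them.
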
      Reported-by: Gerhard Wiesinger <redhat@wiesinger.com>
      Fixes: 8d860bbe ("kvm: vmx: Basic APIC virtualization controls have three settings")
      Cc: Jim Mattson <jmattson@google.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      fd6b6d9b
    • P
      KVM: PPC: Book3S HV: Avoid crash from THP collapse during radix page fault · 6579804c
      Paul Mackerras committed
      Commit 71d29f43 ("KVM: PPC: Book3S HV: Don't use compound_order to
      determine host mapping size", 2018-09-11) added a call to 
      __find_linux_pte() and a dereference of the returned PTE pointer to the
      radix page fault path in the common case where the page is normal
      system memory.  Previously, __find_linux_pte() was only called for
      mappings to physical addresses which don't have a page struct (e.g.
      memory-mapped I/O) or where the page struct is marked as reserved
      memory.
      
      This exposes us to the possibility that the returned PTE pointer
      could be NULL, for example in the case of a concurrent THP collapse
      operation.  Dereferencing the returned NULL pointer causes a host
      crash.
      
      To fix this, we check for NULL, and if it is NULL, we retry the
      operation by returning to the guest, with the expectation that it
      will generate the same page fault again (unless of course it has
      been fixed up by another CPU in the meantime).
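
The check-and-retry pattern looks roughly like this in a user-space model. Everything here is hypothetical and only illustrates the control flow: `handle_fault_model()`, the `fault_result` values and the two sample lookup functions stand in for the radix fault handler, its return codes and `__find_linux_pte()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical outcomes of the modelled fault handler. */
enum fault_result { FAULT_RETRY_GUEST, FAULT_HANDLED };

/* Stand-in for a page table lookup that may find nothing. */
typedef const uint64_t *(*pte_lookup_fn)(uint64_t addr);

static const uint64_t sample_pte = 0x1234;

/* Models a lookup that races with a collapse and finds no entry. */
static const uint64_t *lookup_missing_model(uint64_t addr)
{
	(void)addr;
	return NULL;
}

/* Models a lookup that finds a valid entry. */
static const uint64_t *lookup_present_model(uint64_t addr)
{
	(void)addr;
	return &sample_pte;
}

/*
 * Dereference the lookup result only after a NULL check. On NULL,
 * report "retry" so the guest can take the same fault again.
 */
static enum fault_result handle_fault_model(pte_lookup_fn lookup,
					    uint64_t addr, uint64_t *pte_out)
{
	const uint64_t *ptep = lookup(addr);

	if (!ptep)
		return FAULT_RETRY_GUEST;

	*pte_out = *ptep;
	return FAULT_HANDLED;
}
```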
      
      Fixes: 71d29f43 ("KVM: PPC: Book3S HV: Don't use compound_order to determine host mapping size")
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      6579804c
    • D
    • D
      Merge tag 'drm-intel-fixes-2018-10-03' of... · 659c9370
      Dave Airlie committed
      Merge tag 'drm-intel-fixes-2018-10-03' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
      
      There's one fix for our zlib incomplete Z_FINISH on our error state handling,
      plus a compilation warning fix and a tiny code clean up.
      Signed-off-by: Dave Airlie <airlied@redhat.com>
      
      From: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20181003202840.GA23560@intel.com
      659c9370
    • G
      Merge gitolite.kernel.org:/pub/scm/linux/kernel/git/davem/net · cec4de30
      Greg Kroah-Hartman committed
      David writes:
        "Networking fixes:
         1) Prefix length validation in xfrm layer, from Steffen Klassert.
      
         2) TX status reporting fix in mac80211, from Andrei Otcheretianski.
      
         3) Fix hangs due to TX_DROP in mac80211, from Bob Copeland.
      
         4) Fix DMA error regression in b43, from Larry Finger.
      
         5) Add input validation to xenvif_set_hash_mapping(), from Jan Beulich.
      
         6) SMMU unmapping fix in hns driver, from Yunsheng Lin.
      
         7) Bluetooth crash in unpairing on SMP, from Matias Karhumaa.
      
         8) WoL handling fixes in the phy layer, from Heiner Kallweit.
      
         9) Fix deadlock in bonding, from Mahesh Bandewar.
      
         10) Fill ttl inherit info in vxlan driver, from Hangbin Liu.
      
         11) Fix TX timeouts during netpoll, from Michael Chan.
      
         12) RXRPC layer fixes from David Howells.
      
         13) Another batch of ndo_poll_controller() removals to deal with
             excessive resource consumption during load.  From Eric Dumazet.
      
         14) Fix a specific TIPC failure scenario, from LUU Duc Canh.
      
         15) Really disable clocks in r8169 during suspend so that low
             power states can actually be reached.
      
         16) Fix SYN backlog lockdep issue in tcp and dccp, from Eric Dumazet.
      
         17) Fix RCU locking in netpoll SKB send, which shows up in bonding,
             from Dave Jones.
      
         18) Fix TX stalls in r8169, from Heiner Kallweit.
      
         19) Fix lockups in nfp due to control message storms, from Jakub
             Kicinski.
      
         20) Various rmnet bug fixes from Subash Abhinov Kasiviswanathan and
             Sean Tranchetti.
      
         21) Fix use after free in ip_cmsg_recv_dstaddr(), from Eric Dumazet."
      
      * gitolite.kernel.org:/pub/scm/linux/kernel/git/davem/net: (122 commits)
        ixgbe: check return value of napi_complete_done()
        sctp: fix fall-through annotation
        r8169: always autoneg on resume
        ipv4: fix use-after-free in ip_cmsg_recv_dstaddr()
        net: qualcomm: rmnet: Fix incorrect allocation flag in receive path
        net: qualcomm: rmnet: Fix incorrect allocation flag in transmit
        net: qualcomm: rmnet: Skip processing loopback packets
        net: systemport: Fix wake-up interrupt race during resume
        rtnl: limit IFLA_NUM_TX_QUEUES and IFLA_NUM_RX_QUEUES to 4096
        bonding: fix warning message
        inet: make sure to grab rcu_read_lock before using ireq->ireq_opt
        nfp: avoid soft lockups under control message storm
        declance: Fix continuation with the adapter identification message
        net: fec: fix rare tx timeout
        r8169: fix network stalls due to missing bit TXCFG_AUTO_FIFO
        tun: napi flags belong to tfile
        tun: initialize napi_mutex unconditionally
        tun: remove unused parameters
        bond: take rcu lock in netpoll_send_skb_on_dev
        rtnetlink: Fail dump if target netnsid is invalid
        ...
      cec4de30
    • S
      ixgbe: check return value of napi_complete_done() · 4233cfe6
      Song Liu committed
      The NIC driver should only enable interrupts when napi_complete_done()
      returns true. This patch adds the check for ixgbe.
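
The rule can be illustrated with a mocked poll routine. This is a user-space sketch with hypothetical stand-ins, not driver code: `struct napi_model`, `napi_complete_done_model()` and `poll_model()` only imitate the shape of a NAPI poll handler, and the real ixgbe handler does more work around this check.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the state a poll handler cares about. */
struct napi_model {
	bool rescheduled;	/* polling was re-armed elsewhere */
	bool irq_enabled;	/* device interrupts are unmasked */
};

/* Mock: completion succeeds only if polling was not re-armed. */
static bool napi_complete_done_model(struct napi_model *napi, int work_done)
{
	(void)work_done;
	return !napi->rescheduled;
}

/*
 * Re-enable interrupts only when completion reports success;
 * otherwise polling continues and interrupts stay masked.
 */
static int poll_model(struct napi_model *napi, int work_done, int budget)
{
	assert(work_done >= 0 && work_done <= budget);

	if (work_done >= budget)
		return budget;	/* budget exhausted, keep polling */

	if (napi_complete_done_model(napi, work_done))
		napi->irq_enabled = true;

	return work_done;
}
```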
      
      Cc: stable@vger.kernel.org # 4.10+
      Suggested-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Song Liu <songliubraving@fb.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4233cfe6
    • G
      Merge tag 'linux-kselftest-4.19-rc7' of... · 95773dc0
      Greg Kroah-Hartman committed
      Merge tag 'linux-kselftest-4.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
      
      Shuah writes:
        "kselftest fixes for 4.19-rc7
      
         This fixes update for 4.19-rc7 consists of one fix to the rseq test to
         prevent it from seg-faulting when compiled with -fpie."
      
      * tag 'linux-kselftest-4.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
        rseq/selftests: fix parametrized test with -fpie
      95773dc0
    • G
      sctp: fix fall-through annotation · 2cc543f5
      Gustavo A. R. Silva committed
      Replace "fallthru" with a proper "fall through" annotation.
      
      This fix is part of the ongoing efforts to enable
      -Wimplicit-fallthrough
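
For illustration, an intentional fall-through marked with the comment style the commit standardizes on might look like this. The function `steps_from()` is a hypothetical example, not code from sctp. Compilers such as GCC can treat a comment of this form as an annotation when -Wimplicit-fallthrough is enabled.

```c
#include <assert.h>

/* Count how many stages remain when starting from `state`. */
static int steps_from(int state)
{
	int steps = 0;

	switch (state) {
	case 0:
		steps++;
		/* fall through */
	case 1:
		steps++;
		/* fall through */
	case 2:
		steps++;
		break;
	default:
		steps = -1;	/* unknown state */
		break;
	}

	return steps;
}
```
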
      Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2cc543f5
  4. 03 Oct, 2018 11 commits