1. 19 Feb 2016 (1 commit)
    • mm/core: Do not enforce PKEY permissions on remote mm access · 1b2ee126
      Authored by Dave Hansen
      We try to enforce protection keys in software the same way that we
      do in hardware.  (See long example below).
      
      But, we only want to do this when accessing our *own* process's
      memory.  If GDB set PKRU[6].AD=1 (disable access to PKEY 6), then
      tried to PTRACE_POKE a target process which just happened to have
      some mprotect_pkey(pkey=6) memory, we do *not* want to deny the
      debugger access to that memory.  PKRU is fundamentally a
      thread-local structure and we do not want to enforce it on access
      to _another_ thread's data.
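      
      (For reference, PKRU holds two bits per protection key: an Access-Disable
      bit at position 2*k and a Write-Disable bit at 2*k+1, so "PKRU[6].AD=1"
      means bit 12 of the register is set.  A tiny illustrative sketch; the
      macro names below are made up for this example, not kernel identifiers:)
      
        /* illustrative only: the PKRU bits relevant to protection key k */
        #define PKRU_AD_BIT(k)   (1u << ((k) * 2))       /* Access-Disable */
        #define PKRU_WD_BIT(k)   (1u << ((k) * 2 + 1))   /* Write-Disable  */
        /*
         * PKRU_AD_BIT(6) == 0x1000: with that bit set, data accesses to pages
         * tagged with pkey 6 fault -- but only in the thread that set it,
         * because PKRU is per-thread register state.
         */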
      
      This gets especially tricky when we have workqueues or other
      delayed-work mechanisms that might run in a random process's context.
      We can check that we only enforce pkeys when operating on our *own* mm,
      but delayed work gets performed when a random user context is active.
      We might end up with a situation where a delayed-work gup fails when
      running randomly under its "own" task but succeeds when running under
      another process.  We want to avoid that.
      
      To avoid that, we use the new GUP flag: FOLL_REMOTE and add a
      fault flag: FAULT_FLAG_REMOTE.  They indicate that we are
      walking an mm which is not guaranteed to be the same as
      current->mm and should not be subject to protection key
      enforcement.
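      
      A rough sketch of the intended effect on the permission check (the helper
      names here are invented for illustration, not the literal upstream code;
      in the kernel the decision is threaded through the arch hooks):
      
        /* sketch only: never apply PKRU when the mm being walked is remote */
        static bool pkey_fault_allowed(struct vm_area_struct *vma,
                                       bool write, unsigned int fault_flags)
        {
                if (fault_flags & FAULT_FLAG_REMOTE)
                        return true;    /* not our mm: PKRU is thread-local */
                return arch_pkeys_allow_access(vma, write);  /* hypothetical */
        }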
      
      Thanks to Jerome Glisse for pointing out this scenario.
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Boaz Harrosh <boaz@plexistor.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Chinner <dchinner@redhat.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Gibson <david@gibson.dropbear.id.au>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Dominik Dingel <dingel@linux.vnet.ibm.com>
      Cc: Dominik Vogt <vogt@linux.vnet.ibm.com>
      Cc: Eric B Munson <emunson@akamai.com>
      Cc: Geliang Tang <geliangtang@163.com>
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jason Low <jason.low2@hp.com>
      Cc: Jerome Marchand <jmarchan@redhat.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Konstantin Khlebnikov <koct9i@gmail.com>
      Cc: Laurent Dufour <ldufour@linux.vnet.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Matthew Wilcox <willy@linux.intel.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mikulas Patocka <mpatocka@redhat.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: Shachar Raindel <raindel@mellanox.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Xie XiuQi <xiexiuqi@huawei.com>
      Cc: iommu@lists.linux-foundation.org
      Cc: linux-arch@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Cc: linux-mm@kvack.org
      Cc: linux-s390@vger.kernel.org
      Cc: linuxppc-dev@lists.ozlabs.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  2. 14 Dec 2015 (4 commits)
  3. 15 Oct 2015 (1 commit)
    • iommu/amd: Fix BUG when faulting a PROT_NONE VMA · d14f6fce
      Authored by Jay Cornwall
      handle_mm_fault indirectly triggers a BUG in do_numa_page
      when given a VMA without read/write/execute access. Check
      this condition in do_fault.
      
      do_fault -> handle_mm_fault -> handle_pte_fault -> do_numa_page
      
        mm/memory.c
        3147  static int do_numa_page(struct mm_struct *mm, struct vm_area_struct *vma,
        ....
        3159          /* A PROT_NONE fault should not end up here */
        3160          BUG_ON(!(vma->vm_flags & (VM_READ | VM_EXEC | VM_WRITE)));
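      
      The fix is therefore to bail out of the driver's fault handler before
      calling handle_mm_fault() when the VMA grants none of the three
      permissions; a simplified sketch of the described check (placement and
      error handling in drivers/iommu/amd_iommu_v2.c may differ):
      
        /* sketch: refuse to fault in a PROT_NONE VMA on behalf of the device */
        if (!(vma->vm_flags & (VM_READ | VM_EXEC | VM_WRITE)))
                goto out;       /* handle_mm_fault() would hit the BUG_ON above */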
      Signed-off-by: Jay Cornwall <jay@jcornwall.me>
      Cc: <stable@vger.kernel.org> # v4.1+
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
  4. 14 Aug 2015 (1 commit)
  5. 30 Jul 2015 (1 commit)
  6. 04 May 2015 (1 commit)
  7. 04 Mar 2015 (1 commit)
  8. 04 Feb 2015 (3 commits)
  9. 14 Dec 2014 (1 commit)
  10. 12 Nov 2014 (1 commit)
    • iommu/amd: Fix accounting of device_state · 1c51099a
      Authored by Oded Gabbay
      This patch fixes a bug in the accounting of the
      device_state.  In the current code, the device_state was put
      (decremented) too many times, which sometimes led to the
      driver getting stuck permanently in put_device_state_wait().
      That happened because the device_state->count would go below
      zero, which is never supposed to happen.
      
      The root cause is that the device_state was decremented in
      put_pasid_state() and put_pasid_state_wait() but also in all
      the functions that call those functions. Therefore, the
      device_state was decremented twice in each of these code
      paths.
      
      The fix is to decouple the device_state accounting from the
      pasid_state accounting: remove the calls to
      put_device_state() from put_pasid_state() and
      put_pasid_state_wait().
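      
      A simplified before/after sketch of the idea (not the literal driver
      code; the field names are approximations of the driver's pasid_state
      bookkeeping):
      
        /* before: put_pasid_state() also dropped the device_state reference,
         * but its callers dropped it as well -- one put too many */
        static void put_pasid_state(struct pasid_state *pasid_state)
        {
                if (atomic_dec_and_test(&pasid_state->count)) {
                        put_device_state(pasid_state->device_state);
                        wake_up(&pasid_state->wq);
                }
        }
      
        /* after: pasid_state refcounting only; device_state is put by callers */
        static void put_pasid_state(struct pasid_state *pasid_state)
        {
                if (atomic_dec_and_test(&pasid_state->count))
                        wake_up(&pasid_state->wq);
        }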
      Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
  11. 24 Sep 2014 (1 commit)
    • kvm: Fix page ageing bugs · 57128468
      Authored by Andres Lagar-Cavilla
      1. We were calling clear_flush_young_notify in unmap_one, but we are
      within an mmu notifier invalidate range scope. The spte exists no more
      (due to range_start) and the accessed bit info has already been
      propagated (due to kvm_set_pfn_accessed). Simply call
      clear_flush_young.
      
      2. We clear_flush_young on a primary MMU PMD, but this may be mapped
      as a collection of PTEs by the secondary MMU (e.g. during log-dirty).
      This required expanding the interface of the clear_flush_young mmu
      notifier, so a lot of code has been trivially touched.
      
      3. In the absence of shadow_accessed_mask (e.g. EPT A bit), we emulate
      the access bit by blowing the spte. This requires proper synchronizing
      with MMU notifier consumers, like every other removal of spte's does.
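      
      Point 2 above is the interface expansion: the clear_flush_young callback
      now takes a range rather than a single address, so a PMD-mapped page in
      the primary MMU can be aged across the many PTEs the secondary MMU may
      use for it.  Roughly (simplified member of struct mmu_notifier_ops; see
      include/linux/mmu_notifier.h for the real definition):
      
        /* before */
        int (*clear_flush_young)(struct mmu_notifier *mn, struct mm_struct *mm,
                                 unsigned long address);
      
        /* after: callers pass the whole [start, end) range */
        int (*clear_flush_young)(struct mmu_notifier *mn, struct mm_struct *mm,
                                 unsigned long start, unsigned long end);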
      Signed-off-by: Andres Lagar-Cavilla <andreslc@google.com>
      Acked-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  12. 30 Jul 2014 (4 commits)
  13. 10 Jul 2014 (9 commits)
  14. 09 Jul 2014 (1 commit)
  15. 20 Jun 2014 (1 commit)
    • iommu/amd: Fix small race between invalidate_range_end/start · d73a6d72
      Authored by Joerg Roedel
      Commit e79df31c introduced mmu_notifer_count to protect
      against parallel mmu_notifier_invalidate_range_start/end
      calls. The patch left a small race condition: when
      invalidate_range_end() races with a new
      invalidate_range_start(), the empty page-table may be
      reverted, leading to stale TLB entries in the IOMMU and the
      device. Use a spin_lock instead of just an atomic variable
      to eliminate the race.
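      
      A minimal sketch of the shape of the fix (the lock, counter, and helper
      names here are approximations, not the literal driver code):
      
        /* sketch only: do the start/end bookkeeping under a lock, so an
         * invalidate_range_end() racing with a fresh _start() cannot revert
         * the empty page-table too early and leave stale TLB entries behind */
        unsigned long flags;
      
        spin_lock_irqsave(&pasid_state->lock, flags);
        if (--pasid_state->mmu_notifier_count == 0)
                restore_device_page_table(pasid_state);  /* hypothetical helper */
        spin_unlock_irqrestore(&pasid_state->lock, flags);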
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
  16. 26 May 2014 (5 commits)
  17. 13 May 2014 (1 commit)
  18. 10 Nov 2014 (1 commit)
    • iommu/amd: fix accounting of device_state · a015c1e9
      Authored by Oded Gabbay
      This patch fixes a bug in the accounting of the device_state.
      In the current code, the device_state was put (decremented) too many times,
      which sometimes led to the driver getting stuck permanently in
      put_device_state_wait(). That happened because the device_state->count would
      go below zero, which is never supposed to happen.
      
      The root cause is that the device_state was decremented in put_pasid_state()
      and put_pasid_state_wait() but also in all the functions that call those
      functions. Therefore, the device_state was decremented twice in each of these
      code paths.
      
      The fix is to decouple the device_state accounting from the pasid_state
      accounting: remove the calls to put_device_state() from
      put_pasid_state() and put_pasid_state_wait().
      Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
  19. 13 Nov 2014 (1 commit)
  20. 24 Jul 2012 (1 commit)