1. 19 Feb 2016 (1 commit)
    • mm/core: Do not enforce PKEY permissions on remote mm access · 1b2ee126
      Authored by Dave Hansen
      We try to enforce protection keys in software the same way that we
      do in hardware.  (See long example below).
      
      But, we only want to do this when accessing our *own* process's
      memory.  If GDB set PKRU[6].AD=1 (disable access to PKEY 6), then
      tried to PTRACE_POKE a target process which just happened to have
      some mprotect_pkey(pkey=6) memory, we do *not* want to deny the
      debugger access to that memory.  PKRU is fundamentally a
      thread-local structure and we do not want to enforce it on access
      to _another_ thread's data.
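      
      (For reference, PKRU holds two bits per protection key: an Access-Disable
      bit at position 2*k and a Write-Disable bit at 2*k+1, so "PKRU[6].AD=1"
      means bit 12 of the register is set.  A tiny illustrative sketch; the
      macro names below are made up for this example, not kernel identifiers:)
      
        /* illustrative only: the PKRU bits relevant to protection key k */
        #define PKRU_AD_BIT(k)   (1u << ((k) * 2))       /* Access-Disable */
        #define PKRU_WD_BIT(k)   (1u << ((k) * 2 + 1))   /* Write-Disable  */
        /*
         * PKRU_AD_BIT(6) == 0x1000: with that bit set, data accesses to pages
         * tagged with pkey 6 fault -- but only in the thread that set it,
         * because PKRU is per-thread register state.
         */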
      
      This gets especially tricky when we have workqueues or other
      delayed-work mechanisms that might run in a random process's context.
      We can check that we only enforce pkeys when operating on our *own* mm,
      but delayed work gets performed when a random user context is active.
      We might end up with a situation where a delayed-work gup fails when
      running randomly under its "own" task but succeeds when running under
      another process.  We want to avoid that.
      
      To avoid that, we use the new GUP flag: FOLL_REMOTE and add a
      fault flag: FAULT_FLAG_REMOTE.  They indicate that we are
      walking an mm which is not guaranteed to be the same as
      current->mm and should not be subject to protection key
      enforcement.
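      
      A rough sketch of the intended effect on the permission check (the helper
      names here are invented for illustration, not the literal upstream code;
      in the kernel the decision is threaded through the arch hooks):
      
        /* sketch only: never apply PKRU when the mm being walked is remote */
        static bool pkey_fault_allowed(struct vm_area_struct *vma,
                                       bool write, unsigned int fault_flags)
        {
                if (fault_flags & FAULT_FLAG_REMOTE)
                        return true;    /* not our mm: PKRU is thread-local */
                return arch_pkeys_allow_access(vma, write);  /* hypothetical */
        }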
      
      Thanks to Jerome Glisse for pointing out this scenario.
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Boaz Harrosh <boaz@plexistor.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Chinner <dchinner@redhat.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Gibson <david@gibson.dropbear.id.au>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Dominik Dingel <dingel@linux.vnet.ibm.com>
      Cc: Dominik Vogt <vogt@linux.vnet.ibm.com>
      Cc: Eric B Munson <emunson@akamai.com>
      Cc: Geliang Tang <geliangtang@163.com>
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jason Low <jason.low2@hp.com>
      Cc: Jerome Marchand <jmarchan@redhat.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Konstantin Khlebnikov <koct9i@gmail.com>
      Cc: Laurent Dufour <ldufour@linux.vnet.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Matthew Wilcox <willy@linux.intel.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mikulas Patocka <mpatocka@redhat.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: Shachar Raindel <raindel@mellanox.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Xie XiuQi <xiexiuqi@huawei.com>
      Cc: iommu@lists.linux-foundation.org
      Cc: linux-arch@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Cc: linux-mm@kvack.org
      Cc: linux-s390@vger.kernel.org
      Cc: linuxppc-dev@lists.ozlabs.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  2. 14 Dec 2015 (4 commits)
  3. 15 Oct 2015 (1 commit)
    • iommu/amd: Fix BUG when faulting a PROT_NONE VMA · d14f6fce
      Authored by Jay Cornwall
      handle_mm_fault indirectly triggers a BUG in do_numa_page
      when given a VMA without read/write/execute access. Check
      this condition in do_fault.
      
      do_fault -> handle_mm_fault -> handle_pte_fault -> do_numa_page
      
        mm/memory.c
        3147  static int do_numa_page(struct mm_struct *mm, struct vm_area_struct *vma,
        ....
        3159          /* A PROT_NONE fault should not end up here */
        3160          BUG_ON(!(vma->vm_flags & (VM_READ | VM_EXEC | VM_WRITE)));
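      
      The fix is therefore to bail out of the driver's fault handler before
      calling handle_mm_fault() when the VMA grants none of the three
      permissions; a simplified sketch of the described check (placement and
      error handling in drivers/iommu/amd_iommu_v2.c may differ):
      
        /* sketch: refuse to fault in a PROT_NONE VMA on behalf of the device */
        if (!(vma->vm_flags & (VM_READ | VM_EXEC | VM_WRITE)))
                goto out;       /* handle_mm_fault() would hit the BUG_ON above */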
      Signed-off-by: Jay Cornwall <jay@jcornwall.me>
      Cc: <stable@vger.kernel.org> # v4.1+
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
  4. 14 Aug 2015 (1 commit)
  5. 30 Jul 2015 (1 commit)
  6. 04 May 2015 (1 commit)
  7. 04 Mar 2015 (1 commit)
  8. 04 Feb 2015 (3 commits)
  9. 14 Dec 2014 (1 commit)
  10. 12 Nov 2014 (1 commit)
    • iommu/amd: Fix accounting of device_state · 1c51099a
      Authored by Oded Gabbay
      This patch fixes a bug in the accounting of the
      device_state.  In the current code, the device_state was put
      (decremented) too many times, which sometimes led to the
      driver getting stuck permanently in put_device_state_wait().
      That happened because the device_state->count would go below
      zero, which is never supposed to happen.
      
      The root cause is that the device_state was decremented in
      put_pasid_state() and put_pasid_state_wait() but also in all
      the functions that call those functions. Therefore, the
      device_state was decremented twice in each of these code
      paths.
      
      The fix is to decouple the device_state accounting from the
      pasid_state accounting: remove the calls to
      put_device_state() from put_pasid_state() and
      put_pasid_state_wait().
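      
      A simplified before/after sketch of the idea (not the literal driver
      code; the field names are approximations of the driver's pasid_state
      bookkeeping):
      
        /* before: put_pasid_state() also dropped the device_state reference,
         * but its callers dropped it as well -- one put too many */
        static void put_pasid_state(struct pasid_state *pasid_state)
        {
                if (atomic_dec_and_test(&pasid_state->count)) {
                        put_device_state(pasid_state->device_state);
                        wake_up(&pasid_state->wq);
                }
        }
      
        /* after: pasid_state refcounting only; device_state is put by callers */
        static void put_pasid_state(struct pasid_state *pasid_state)
        {
                if (atomic_dec_and_test(&pasid_state->count))
                        wake_up(&pasid_state->wq);
        }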
      Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
  11. 24 Sep 2014 (1 commit)
    • kvm: Fix page ageing bugs · 57128468
      Authored by Andres Lagar-Cavilla
      1. We were calling clear_flush_young_notify in unmap_one, but we are
      within an mmu notifier invalidate range scope. The spte exists no more
      (due to range_start) and the accessed bit info has already been
      propagated (due to kvm_set_pfn_accessed). Simply call
      clear_flush_young.
      
      2. We clear_flush_young on a primary MMU PMD, but this may be mapped
      as a collection of PTEs by the secondary MMU (e.g. during log-dirty).
      This required expanding the interface of the clear_flush_young mmu
      notifier, so a lot of code has been trivially touched.
      
      3. In the absence of shadow_accessed_mask (e.g. EPT A bit), we emulate
      the access bit by blowing the spte. This requires proper synchronizing
      with MMU notifier consumers, like every other removal of spte's does.
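      
      Point 2 above is the interface expansion: the clear_flush_young callback
      now takes a range rather than a single address, so a PMD-mapped page in
      the primary MMU can be aged across the many PTEs the secondary MMU may
      use for it.  Roughly (simplified member of struct mmu_notifier_ops; see
      include/linux/mmu_notifier.h for the real definition):
      
        /* before */
        int (*clear_flush_young)(struct mmu_notifier *mn, struct mm_struct *mm,
                                 unsigned long address);
      
        /* after: callers pass the whole [start, end) range */
        int (*clear_flush_young)(struct mmu_notifier *mn, struct mm_struct *mm,
                                 unsigned long start, unsigned long end);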
      Signed-off-by: Andres Lagar-Cavilla <andreslc@google.com>
      Acked-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  12. 30 Jul 2014 (4 commits)
  13. 10 Jul 2014 (9 commits)
  14. 09 Jul 2014 (1 commit)
  15. 20 Jun 2014 (1 commit)
    • iommu/amd: Fix small race between invalidate_range_end/start · d73a6d72
      Authored by Joerg Roedel
      Commit e79df31c introduced mmu_notifer_count to protect
      against parallel mmu_notifier_invalidate_range_start/end
      calls. The patch left a small race condition: when
      invalidate_range_end() races with a new
      invalidate_range_start(), the empty page-table may be
      reverted, leading to stale TLB entries in the IOMMU and the
      device. Use a spin_lock instead of just an atomic variable
      to eliminate the race.
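      
      A minimal sketch of the shape of the fix (the lock, counter, and helper
      names here are approximations, not the literal driver code):
      
        /* sketch only: do the start/end bookkeeping under a lock, so an
         * invalidate_range_end() racing with a fresh _start() cannot revert
         * the empty page-table too early and leave stale TLB entries behind */
        unsigned long flags;
      
        spin_lock_irqsave(&pasid_state->lock, flags);
        if (--pasid_state->mmu_notifier_count == 0)
                restore_device_page_table(pasid_state);  /* hypothetical helper */
        spin_unlock_irqrestore(&pasid_state->lock, flags);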
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
  16. 26 May 2014 (5 commits)
  17. 13 May 2014 (1 commit)
  18. 10 Nov 2014 (1 commit)
    • iommu/amd: fix accounting of device_state · a015c1e9
      Authored by Oded Gabbay
      This patch fixes a bug in the accounting of the device_state.
      In the current code, the device_state was put (decremented) too many times,
      which sometimes led to the driver getting stuck permanently in
      put_device_state_wait(). That happened because the device_state->count would
      go below zero, which is never supposed to happen.
      
      The root cause is that the device_state was decremented in put_pasid_state()
      and put_pasid_state_wait() but also in all the functions that call those
      functions. Therefore, the device_state was decremented twice in each of these
      code paths.
      
      The fix is to decouple the device_state accounting from the pasid_state
      accounting: remove the calls to put_device_state() from
      put_pasid_state() and put_pasid_state_wait().
      Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
  19. 13 Nov 2014 (1 commit)
  20. 24 Jul 2012 (1 commit)