Commits · d7e1633abf9b1cc198bb673a59a01a3767f16b94 · openeuler / raspberrypi-kernel

09 May, 2016 3 commits

Committed by Alexander Yarygin on Apr 01, 2016

Let's add hypervisor-managed facility-apportionment indications field to
SCLP structs. KVM will use it to reduce maintenance cost of
Non-Hypervisor-Managed facility bits.
Signed-off-by: Alexander Yarygin <yarygin@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Eric Farman <farman@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

154fa27e

KVM: s390: cleanup cpuid handling · 9bb0ec09

Committed by David Hildenbrand on Apr 04, 2016

We only have one cpuid for all VCPUs, so let's directly use the one in the
cpu model. Also always store it directly as u64, no need for struct cpuid.
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

9bb0ec09

KVM: s390: enable SRS only if enabled for the guest · bd50e8ec

Committed by David Hildenbrand on Mar 04, 2016

If we don't have SIGP SENSE RUNNING STATUS enabled for the guest, let's
not enable interpretation so we can correctly report an invalid order.
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

bd50e8ec

23 Mar, 2016 1 commit

s390/extable: use generic search and sort routines · c352e8b6

Committed by Ard Biesheuvel on Mar 22, 2016

Replace the arch specific versions of search_extable() and
sort_extable() with calls to the generic ones, which now support
relative exception tables as well.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

c352e8b6

17 Mar, 2016 1 commit

s390: disable postinit-readonly for now · df9ceff9

Committed by Kees Cook on Mar 17, 2016

This is a temporary fix to let lkdtm run again on s390, though it'll
still fail the ro_after_init tests. Until rodata and ro_after_init
sections can be split on s390, disable special handling of ro_after_init.
Signed-off-by: Kees Cook <keescook@chromium.org>
Reported-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

df9ceff9

14 Mar, 2016 2 commits

s390/pci: enforce fmb page boundary rule · 80c544de

Committed by Sebastian Ott on Mar 14, 2016

The function measurement block must not cross a page boundary. Ensure
that by raising the alignment requirement to the smallest power of 2
larger than the size of the fmb.
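
The alignment rule can be checked with a small user-space sketch. This is not the kernel patch itself: `PAGE_SIZE`, `roundup_pow2()` and `within_one_page()` are names assumed for illustration, and the helper rounds up to a power of two that is greater than or equal to the size. Because such an alignment divides the page size, a block placed at that alignment cannot straddle a page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096u  /* assumed page size for this sketch */

/* Smallest power of two that is >= size (size must be non-zero). */
static size_t roundup_pow2(size_t size)
{
    size_t p = 1;

    assert(size > 0);
    while (p < size)
        p <<= 1;
    return p;
}

/* Returns 1 if [addr, addr + size) stays within a single page. */
static int within_one_page(uintptr_t addr, size_t size)
{
    return (addr / PAGE_SIZE) == ((addr + size - 1) / PAGE_SIZE);
}
```

For a hypothetical 80-byte block, `roundup_pow2(80)` yields 128; a 128-byte-aligned start address keeps the block inside one page, while a start address that is only 16-byte aligned may not.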

Fixes: d0b08853 ("s390/pci: performance statistics and debug infrastructure")
Cc: stable@vger.kernel.org # v3.8+
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

80c544de

ipv4: Update parameters for csum_tcpudp_magic to their original types · 01cfbad7

Committed by Alexander Duyck on Mar 11, 2016

This patch updates all instances of csum_tcpudp_magic and
csum_tcpudp_nofold to reflect the types that are usually used as the source
inputs.  For example the protocol field is populated based on nexthdr which
is actually an unsigned 8 bit value.  The length is usually populated based
on skb->len which is an unsigned integer.

This addresses an issue in which the IPv6 function csum_ipv6_magic was
generating a checksum using the full 32b of skb->len while
csum_tcpudp_magic was only using the lower 16 bits.  As a result we could
run into issues when attempting to adjust the checksum as there was no
protocol agnostic way to update it.

With this change the value is still truncated as many architectures use
"(len + proto) << 8", however this truncation only occurs for values
greater than 16776960 in length and as such is unlikely to occur as we stop
the inner headers at ~64K in size.
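
The threshold quoted above can be reproduced with a short arithmetic sketch. It assumes 32-bit arithmetic and an 8-bit protocol value; `fold_len_proto()` and `shift_truncates()` are illustrative names, not kernel functions. With the protocol at its maximum of 255, the sum reaches 2^24 as soon as the length exceeds 16776960, and the shift by 8 then discards high bits.

```c
#include <stdint.h>

/* Models the "(len + proto) << 8" step in 32-bit arithmetic. */
static uint32_t fold_len_proto(uint32_t len, uint8_t proto)
{
    return (len + proto) << 8;
}

/* Returns 1 if the 32-bit computation loses bits for these inputs. */
static int shift_truncates(uint32_t len, uint8_t proto)
{
    uint64_t wide = ((uint64_t)len + proto) << 8;

    return wide != fold_len_proto(len, proto);
}
```

Under these assumptions, `shift_truncates(16776960, 255)` reports no loss, while a length of 16776961 with the same protocol value does.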

I did have to make a few minor changes in the arm, mn10300, nios2, and
score versions of the function in order to support these changes as they
were either using things such as an OR to combine the protocol and length,
or were using ntohs to convert the length which would have truncated the
value.

I also updated a few spots in terms of whitespace and type differences for
the addresses.  Most of this was just to make sure all of the definitions
were in sync going forward.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

01cfbad7

10 Mar, 2016 1 commit

s390/mm: four page table levels vs. fork · 3446c13b

Committed by Martin Schwidefsky on Feb 15, 2016

The fork of a process with four page table levels is broken since
git commit 6252d702 "[S390] dynamic page tables."

All new mm contexts are created with three page table levels and
an asce limit of 4TB. If the parent has four levels dup_mmap will
add vmas to the new context which are outside of the asce limit.
The subsequent call to copy_page_range will walk the three level
page table structure of the new process with non-zero pgd and pud
indexes. This leads to memory clobbers as the pgd_index *and* the
pud_index are added to the mm->pgd pointer without a pgd_deref
in between.

The init_new_context() function is selecting the number of page
table levels for a new context. The function is used by mm_init()
which in turn is called by dup_mm() and mm_alloc(). These two are
used by fork() and exec(). The init_new_context() function can
distinguish the two cases by looking at mm->context.asce_limit,
for fork() the mm struct has been copied and the number of page
table levels may not change. For exec() the mm_alloc() function
sets the new mm structure to zero; in this case a three-level page
table is created as the temporary stack space is located at
STACK_TOP_MAX = 4TB.
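
The fork/exec distinction described above can be modeled in a few lines. This is a simplified sketch, not the kernel's init_new_context(): `struct mm_model`, its fields and `init_new_context_model()` are hypothetical stand-ins, and only the zero-versus-non-zero test on the asce limit is taken from the text.

```c
#include <stdint.h>

#define ASCE_LIMIT_4TB (1ULL << 42)   /* three-level limit named in the text */

struct mm_model {                 /* hypothetical stand-in for mm->context */
    uint64_t asce_limit;          /* 0 for a freshly zeroed mm (exec path) */
    int levels;                   /* number of page table levels */
};

static void init_new_context_model(struct mm_model *mm)
{
    if (mm->asce_limit == 0) {
        /* exec(): mm_alloc() zeroed the struct, start with three levels */
        mm->asce_limit = ASCE_LIMIT_4TB;
        mm->levels = 3;
    }
    /* fork(): the mm was copied from the parent; keep limit and levels */
}
```

A context copied from a four-level parent keeps its limit and level count, so later page table walks use the same depth as the parent.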

This fixes CVE-2016-2143.
Reported-by: Marcin Kościelnicki <koriakin@0x04.net>
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: stable@vger.kernel.org
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

3446c13b

08 Mar, 2016 8 commits

s390: Fix misspellings in comments · 7eb792bf

Committed by Adam Buchbinder on Mar 04, 2016

Signed-off-by: Adam Buchbinder <adam.buchbinder@gmail.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

7eb792bf

s390/mm: split arch/s390/mm/pgtable.c · 1e133ab2

Committed by Martin Schwidefsky on Mar 08, 2016

The pgtable.c file is quite big, before it grows any larger split it
into pgtable.c, pgalloc.c and gmap.c. In addition move the gmap related
header definitions into the new gmap.h header and all of the pgste
helpers from pgtable.h to pgtable.c.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

1e133ab2

s390/mm: uninline pmdp_xxx functions from pgtable.h · 227be799

Committed by Martin Schwidefsky on Mar 08, 2016

The pmdp_xxx functions are smaller than their ptep_xxx counterparts
but to keep things symmetrical uninline them as well.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

227be799

s390/mm: uninline ptep_xxx functions from pgtable.h · ebde765c

Committed by Martin Schwidefsky on Mar 08, 2016

The code in the various ptep_xxx functions has grown quite large,
consolidate them to four out-of-line functions:
ptep_xchg_direct to exchange a pte with another with immediate flushing
ptep_xchg_lazy to exchange a pte with another in a batched update
ptep_modify_prot_start to begin a protection flags update
ptep_modify_prot_commit to commit a protection flags update
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

ebde765c

KVM: s390: allocate only one DMA page per VM · c54f0d6a

Committed by David Hildenbrand on Dec 02, 2015

We can fit the 2k for the STFLE interpretation and the crypto
control block into one DMA page. As we now only have to allocate
one DMA page, we can clean up the code a bit.
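
A layout along these lines can be expressed as a single page-sized structure. The sketch below is only an illustration: `struct vm_dma_page`, its field names and the 256-byte `CRYCB_BYTES` are assumptions, and only the 2k facility list and the single-page goal come from the text.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096
#define FAC_LIST_BYTES 2048      /* the 2k for the STFLE interpretation */
#define CRYCB_BYTES 256          /* hypothetical size, for illustration only */

struct vm_dma_page {             /* hypothetical name */
    uint8_t fac_list[FAC_LIST_BYTES];
    uint8_t crycb[CRYCB_BYTES];
    uint8_t reserved[PAGE_SIZE - FAC_LIST_BYTES - CRYCB_BYTES];
};

/* Both blocks share one page, so a single page allocation covers them. */
_Static_assert(sizeof(struct vm_dma_page) == PAGE_SIZE,
               "structure must fill exactly one page");
_Static_assert(offsetof(struct vm_dma_page, crycb) == FAC_LIST_BYTES,
               "crypto block sits at a fixed offset inside the page");
```

With the crypto control block at a fixed offset inside a page-aligned allocation, its alignment no longer depends on how a separate allocation happens to be placed.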

As a nice side effect, this also fixes a problem with crycbd alignment in
case special allocation debug options are enabled, debugged by Sascha
Silbe.
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

c54f0d6a

KVM: s390: protect VCPU cpu timer with a seqcount · 9c23a131

Committed by David Hildenbrand on Feb 17, 2016

For now, only the owning VCPU thread (that has loaded the VCPU) can get a
consistent cpu timer value when calculating the delta. However, other
threads might also be interested in a more recent, consistent value. Of
special interest will be the timer callback of a VCPU that executes without
having the VCPU loaded and could run in parallel with the VCPU thread.

The cpu timer has a nice property: it is only updated by the owning VCPU
thread. And speaking about accounting, a consistent value can only be
calculated by looking at cputm_start and the cpu timer itself in
one shot, otherwise the result might be wrong.

As we only have one writing thread at a time (owning VCPU thread), we can
use a seqcount instead of a seqlock and retry if the VCPU refreshed its
cpu timer. This avoids any heavy locking and only introduces a counter
update/check plus a handful of smp_wmb().
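
The single-writer retry scheme can be sketched in user space with C11 atomics. This is not the kernel's seqcount API: `struct cputm_model`, `cputm_update()` and `cputm_read()` are hypothetical names, and the returned value (timer minus time elapsed since the start stamp) is an assumption about how the delta is applied.

```c
#include <stdatomic.h>
#include <stdint.h>

struct cputm_model {                /* hypothetical model of the timer state */
    atomic_uint seq;                /* even: stable, odd: write in progress */
    _Atomic uint64_t cputm;         /* cpu timer value */
    _Atomic uint64_t cputm_start;   /* start of the current accounting period */
};

/* Writer side: only the single owning thread may call this. */
static void cputm_update(struct cputm_model *t, uint64_t cputm, uint64_t start)
{
    atomic_fetch_add_explicit(&t->seq, 1, memory_order_relaxed);  /* -> odd */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&t->cputm, cputm, memory_order_relaxed);
    atomic_store_explicit(&t->cputm_start, start, memory_order_relaxed);
    atomic_fetch_add_explicit(&t->seq, 1, memory_order_release);  /* -> even */
}

/* Reader side: retry until both fields were read under one even sequence. */
static uint64_t cputm_read(struct cputm_model *t, uint64_t now)
{
    unsigned int s;
    uint64_t cputm, start;

    do {
        s = atomic_load_explicit(&t->seq, memory_order_acquire);
        cputm = atomic_load_explicit(&t->cputm, memory_order_relaxed);
        start = atomic_load_explicit(&t->cputm_start, memory_order_relaxed);
        atomic_thread_fence(memory_order_acquire);
    } while ((s & 1) ||
             s != atomic_load_explicit(&t->seq, memory_order_relaxed));

    return cputm - (now - start);
}
```

Readers never block the writer; they only repeat the two loads if the sequence number changed or was odd while they were reading.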

The owning VCPU thread should never have to retry on reads, and also for
other threads this might be a very rare scenario.

Please note that we have to use the raw_* variants for locking the seqcount
as lockdep will produce false warnings otherwise. The rq->lock held during
vcpu_load/put is also acquired from hardirq context. Lockdep cannot know
that we avoid potential deadlocks by disabling preemption and thereby
disable concurrent write locking attempts (via vcpu_put/load).
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

9c23a131

KVM: s390: step VCPU cpu timer during kvm_run ioctl · db0758b2

Committed by David Hildenbrand on Feb 15, 2016

Architecturally we should only provide steal time if we are scheduled
away, and not if the host interprets a guest exit. We have to step
the guest CPU timer in these cases.

In the first shot, we will step the VCPU timer only during the kvm_run
ioctl. Therefore all time spent e.g. in interception handlers or on irq
delivery will be accounted for that VCPU.

We have to take care of a few special cases:
- Other VCPUs can test for pending irqs. We can only report a consistent
  value for the VCPU thread itself when adding the delta.
- We have to take care of STP sync, therefore we have to extend
  kvm_clock_sync() and disable preemption accordingly
- During any call to disable/enable/start/stop we could get preempted
  and therefore get start/stop calls. Therefore we have to make sure we
  don't get into an inconsistent state.

Whenever a VCPU is scheduled out, sleeping, in user space or just about
to enter the SIE, the guest cpu timer isn't stepped.

Please note that all primitives are prepared to be called from both
environments (cpu timer accounting enabled or not), although not completely
used in this patch yet (e.g. kvm_s390_set_cpu_timer() will never be called
while cpu timer accounting is enabled).
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

db0758b2

PCI: Move pci_dma_* helpers to common code · bc4b024a

Committed by Christoph Hellwig on Mar 07, 2016

For a long time all architectures implement the pci_dma_* functions using
the generic DMA API, and they all use the same header to do so.

Move this header, pci-dma-compat.h, to include/linux and include it from
the generic pci.h instead of having each arch duplicate this include.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>

bc4b024a

07 Mar, 2016 3 commits

s390/pci: add ioctl interface for CLP · 988b86e6

Committed by Martin Schwidefsky on Jan 13, 2016

Provide a user space interface to issue call logical-processor instructions.
Only selected CLP commands are allowed, enough to get the full overview of
the installed PCI functions.
Reviewed-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

988b86e6

klp: remove CONFIG_LIVEPATCH dependency from klp headers · 335e073f

Committed by Jiri Kosina on Mar 06, 2016

There is no need for livepatch.h (generic and arch-specific) to depend
on CONFIG_LIVEPATCH. Remove that superfluous dependency.
Reported-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>

335e073f

klp: remove superfluous errors in asm/livepatch.h · b24b78a1

Committed by Miroslav Benes on Mar 04, 2016

There is an #error in asm/livepatch.h for both x86 and s390 in
!CONFIG_LIVEPATCH cases. It does not make much sense as pointed out by
Michael Ellerman. One can happily include asm/livepatch.h with
CONFIG_LIVEPATCH. Remove it as useless.
Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Miroslav Benes <mbenes@suse.cz>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>

b24b78a1

02 Mar, 2016 3 commits

s390/dma: Allow per device dma ops · e82becfc

Committed by Christian Borntraeger on Feb 02, 2016

As virtio-ccw will have dma ops, we can no longer default to the
zPCI ones. Make use of dev_archdata to keep the dma_ops per device.
The pci devices now use that to override the default, and the
default is changed to use the noop ops for everything that does not
specify a device specific one.
To compile without PCI support we will enable HAS_DMA all the time,
via the default config in lib/Kconfig.
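
The per-device lookup with a no-op fallback can be pictured as follows. The types and names here (`struct dma_ops_model`, `struct device_model`, `get_dma_ops_model()`) are hypothetical and do not reproduce the kernel's dev_archdata or dma_map_ops definitions.

```c
#include <stddef.h>

struct dma_ops_model { const char *name; };   /* hypothetical ops table */

static const struct dma_ops_model noop_ops = { "noop" };
static const struct dma_ops_model pci_ops  = { "zpci" };

struct device_model {                          /* hypothetical device */
    const struct dma_ops_model *dma_ops;       /* NULL: no device-specific ops */
};

/* Return the device's own ops if it set any, otherwise the noop default. */
static const struct dma_ops_model *
get_dma_ops_model(const struct device_model *dev)
{
    if (dev && dev->dma_ops)
        return dev->dma_ops;
    return &noop_ops;
}
```

In this model a PCI device would store `&pci_ops` in its own slot, while every other device falls through to the default.
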
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Joerg Roedel <jroedel@suse.de>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Acked-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

e82becfc

s390/percpu: remove this_cpu_cmpxchg_double_4 · f369b98e

Committed by Heiko Carstens on Mar 02, 2016

git commit 26f15caa ("s390/cmpxchg: simplify cmpxchg_double")
removed support for cmpxchg_double for two consecutive four byte
values, for which it would generate a cds instruction.

However I forgot to remove the corresponding define in our percpu
header file, which means that this_cpu_cmpxchg_double would now
incorrectly generate a cdsg instruction if being used on a double four
byte location. Therefore remove the percpu define as well.

There is currently no user and therefore no bug fixed with
this. Obviously any such user could and should simply use cmpxchg.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

f369b98e

s390/fault: merge report_user_fault implementations · 5d7eccec

Committed by Heiko Carstens on Feb 24, 2016

We have two close to identical report_user_fault functions.
Add a parameter to one and get rid of the other one in order
to reduce code duplication.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

5d7eccec

25 Feb, 2016 1 commit

KVM: Use simple waitqueue for vcpu->wq · 8577370f

Committed by Marcelo Tosatti on Feb 19, 2016

The problem:

On -rt, an emulated LAPIC timer instance has the following path:

1) hard interrupt
2) ksoftirqd is scheduled
3) ksoftirqd wakes up vcpu thread
4) vcpu thread is scheduled

This extra context switch introduces unnecessary latency in the
LAPIC path for a KVM guest.

The solution:

Allow waking up vcpu thread from hardirq context,
thus avoiding the need for ksoftirqd to be scheduled.

Normal waitqueues make use of spinlocks, which on -RT
are sleepable locks. Therefore, waking up a waitqueue
waiter involves locking a sleeping lock, which
is not allowed from hard interrupt context.

cyclictest command line:

This patch reduces the average latency in my tests from 14us to 11us.

Daniel writes:
Paolo asked for numbers from kvm-unit-tests/tscdeadline_latency
benchmark on mainline. The test was run 1000 times on
tip/sched/core 4.4.0-rc8-01134-g0905f04e:

  ./x86-run x86/tscdeadline_latency.flat -cpu host

with idle=poll.

The test seems not to deliver really stable numbers though most of
them are smaller. Paolo writes:

"Anything above ~10000 cycles means that the host went to C1 or
lower---the number means more or less nothing in that case.

The mean shows an improvement indeed."

Before:

               min             max         mean           std
count  1000.000000     1000.000000  1000.000000   1000.000000
mean   5162.596000  2019270.084000  5824.491541  20681.645558
std      75.431231   622607.723969    89.575700   6492.272062
min    4466.000000    23928.000000  5537.926500    585.864966
25%    5163.000000  1613252.750000  5790.132275  16683.745433
50%    5175.000000  2281919.000000  5834.654000  23151.990026
75%    5190.000000  2382865.750000  5861.412950  24148.206168
max    5228.000000  4175158.000000  6254.827300  46481.048691

After:

               min            max         mean           std
count  1000.000000     1000.00000  1000.000000   1000.000000
mean   5143.511000  2076886.10300  5813.312474  21207.357565
std      77.668322   610413.09583    86.541500   6331.915127
min    4427.000000    25103.00000  5529.756600    559.187707
25%    5148.000000  1691272.75000  5784.889825  17473.518244
50%    5160.000000  2308328.50000  5832.025000  23464.837068
75%    5172.000000  2393037.75000  5853.177675  24223.969976
max    5222.000000  3922458.00000  6186.720500  42520.379830

[Patch was originally based on the swait implementation found in the -rt
 tree. Daniel ported it to mainline's version and gathered the
 benchmark numbers for tscdeadline_latency test.]
Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: linux-rt-users@vger.kernel.org
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/1455871601-27484-4-git-send-email-wagi@monom.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

8577370f

23 Feb, 2016 6 commits

s390/mm: correct comment about segment table entries · 13c6a790

Committed by Martin Schwidefsky on Feb 10, 2016

The comment describing the bit encoding for segment table entries
is incorrect in regard to the read and write bits. The segment
read bit is 0x0002 and write is 0x0001, not the other way around.
Reported-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

13c6a790

s390/dumpstack: merge all four stack tracers · 758d39eb

Committed by Heiko Carstens on Feb 09, 2016

We have four different stack tracers, three of which had bugs. So it's
time to merge them into a single stack tracer that takes a
callback function which is called for each step.
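
The idea of one walker that serves several backends through a callback can be sketched as below. This is a simplified model rather than the kernel's dump_trace(): `struct frame_model`, `trace_fn`, `walk_frames()` and `count_frame()` are hypothetical, and the real tracer additionally deals with stack boundaries and interrupt frames.

```c
#include <stddef.h>

/* Callback type: return non-zero to stop the walk. Hypothetical signature. */
typedef int (*trace_fn)(void *data, unsigned long address);

struct frame_model {                        /* hypothetical, simplified frame */
    const struct frame_model *back_chain;   /* NULL ends the chain */
    unsigned long return_address;
};

/* Visit every frame once and hand its return address to the backend. */
static void walk_frames(const struct frame_model *frame, trace_fn fn, void *data)
{
    while (frame) {
        if (fn(data, frame->return_address))
            return;
        frame = frame->back_chain;
    }
}

/* Example backend: count the frames visited. */
static int count_frame(void *data, unsigned long address)
{
    (void)address;
    ++*(unsigned int *)data;
    return 0;
}
```

Each consumer (printing, saving a trace, profiling) then only supplies its own callback, and a check such as the scheduler-function filter is applied per frame inside that callback.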

This patch changes behavior a bit:

- the "nosched" and "in_sched_functions" check within
  save_stack_trace_tsk did work only for the last stack frame within a
  context. Now it considers the check for each stack frame like it
  should.

- both the oprofile variant and the perf_events variant did save a
  return address twice if a zero back chain was detected, which
  indicates an interrupt frame. The new dump_trace function will call
  the oprofile and perf_events backends with the psw address that is
  contained within the corresponding pt_regs structure instead.

- the original show_trace and save_context_stack functions did already
  use the psw address of the pt_regs structure if a zero back chain
  was detected. However now we ignore the psw address if it is a user
  space address. After all we trace the kernel stack and not the user
  space stack. This way we also get rid of the garbage user space
  address in case of warnings and / or panic call traces.

So this should make life easier since now there is only one stack
tracer left which we can break.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

758d39eb

s390/mm: remove unnecessary indirection with pgste_update_all · 3c2c126a

Committed by Martin Schwidefsky on Feb 05, 2016

The first parameter of pgste_update_all is a pointer to a pte.
Simplify the code by passing the pte value.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

3c2c126a

s390: add current_stack_pointer() helper function · 76737ce1

Committed by Heiko Carstens on Jan 31, 2016

Implement current_stack_pointer() helper function and use it
everywhere, instead of having several different inline assembly
variants.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Tested-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

76737ce1

s390/xor: optimized xor routine using the XC instruction · 2cfc5f9c

Committed by Martin Schwidefsky on Feb 02, 2016

Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

2cfc5f9c

s390/pci: remove pdev pointer from arch data · 9a99649f

Committed by Sebastian Ott on Jan 29, 2016

For each PCI function we need to maintain arch specific data in
struct zpci_dev which also contains a pointer to struct pci_dev.

When a function is registered or deregistered (which is triggered by PCI
common code) we need to adjust that pointer which could interfere with
the machine check handler (triggered by FW) using zpci_dev->pdev.

Since multiple instances of the same pdev could exist at a time this can't
be solved with locking.

Fix that by ditching the pdev pointer and use a bus walk to reach
struct pci_dev (only one instance of a pdev can be registered at the bus
at a time).
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

9a99649f

22 Feb, 2016 1 commit

s390/fpu: signals vs. floating point control register · 1b17cb79

Committed by Martin Schwidefsky on Feb 19, 2016

git commit 904818e2
"s390/kernel: introduce fpu-internal.h with fpu helper functions"
introduced the fpregs_store / fp_regs_load helpers. These functions
fail to save and restore the floating point control register.

The effect is that the FPC is not correctly handled on signal
delivery and signal return.

Cc: stable@vger.kernel.org # 4.4
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

1b17cb79

19 Feb, 2016 2 commits

mm/core, x86/mm/pkeys: Differentiate instruction fetches · d61172b4

Committed by Dave Hansen on Feb 12, 2016

As discussed earlier, we attempt to enforce protection keys in
software.

However, the code checks all faults to ensure that they are not
violating protection key permissions.  It was assumed that all
faults are either write faults where we check PKRU[key].WD (write
disable) or read faults where we check the AD (access disable)
bit.

But, there is a third category of faults for protection keys:
instruction faults.  Instruction faults never run afoul of
protection keys because they do not affect instruction fetches.

So, plumb the PF_INSTR bit down in to the
arch_vma_access_permitted() function where we do the protection
key checks.

We also add a new FAULT_FLAG_INSTRUCTION.  This is because
handle_mm_fault() is not passed the architecture-specific
error_code where we keep PF_INSTR, so we need to encode the
instruction fetch information in to the arch-generic fault
flags.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20160212210224.96928009@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

d61172b4

mm/core: Do not enforce PKEY permissions on remote mm access · 1b2ee126

Committed by Dave Hansen on Feb 12, 2016

We try to enforce protection keys in software the same way that we
do in hardware.  (See long example below).

But, we only want to do this when accessing our *own* process's
memory.  If GDB set PKRU[6].AD=1 (disable access to PKEY 6), then
tried to PTRACE_POKE a target process which just happened to have
some mprotect_pkey(pkey=6) memory, we do *not* want to deny the
debugger access to that memory.  PKRU is fundamentally a
thread-local structure and we do not want to enforce it on access
to _another_ thread's data.

This gets especially tricky when we have workqueues or other
delayed-work mechanisms that might run in a random process's context.
We can check that we only enforce pkeys when operating on our *own* mm,
but delayed work gets performed when a random user context is active.
We might end up with a situation where a delayed-work gup fails when
running randomly under its "own" task but succeeds when running under
another process.  We want to avoid that.

To avoid that, we use the new GUP flag: FOLL_REMOTE and add a
fault flag: FAULT_FLAG_REMOTE.  They indicate that we are
walking an mm which is not guaranteed to be the same as
current->mm and should not be subject to protection key
enforcement.
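
The resulting decision can be modeled in user space. The sketch below is an assumption-laden illustration, not the kernel's arch_vma_access_permitted(): `pkru_allows()` and `access_permitted_model()` are hypothetical names, the `remote` and `execute` parameters stand in for FAULT_FLAG_REMOTE and FAULT_FLAG_INSTRUCTION, and PKRU is modeled as two bits per key (access-disable at bit 2*key, write-disable at bit 2*key+1).

```c
#include <stdbool.h>
#include <stdint.h>

/* PKRU holds two bits per key: access-disable (AD) and write-disable (WD). */
static bool pkru_allows(uint32_t pkru, unsigned int pkey, bool write)
{
    uint32_t ad = 1u << (2 * pkey);
    uint32_t wd = 1u << (2 * pkey + 1);

    if (pkru & ad)
        return false;
    if (write && (pkru & wd))
        return false;
    return true;
}

/* Hypothetical model of the software check described above. */
static bool access_permitted_model(uint32_t pkru, unsigned int pkey,
                                   bool write, bool execute, bool remote)
{
    if (remote)       /* not our own mm: do not apply our PKRU */
        return true;
    if (execute)      /* instruction fetches are not subject to pkeys */
        return true;
    return pkru_allows(pkru, pkey, write);
}
```

In the GDB scenario above, the debugger's PKRU has the access-disable bit for key 6 set, yet a remote access to the target's memory is still permitted in this model because the check is skipped for a foreign mm.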

Thanks to Jerome Glisse for pointing out this scenario.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Boaz Harrosh <boaz@plexistor.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dominik Dingel <dingel@linux.vnet.ibm.com>
Cc: Dominik Vogt <vogt@linux.vnet.ibm.com>
Cc: Eric B Munson <emunson@akamai.com>
Cc: Geliang Tang <geliangtang@163.com>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jason Low <jason.low2@hp.com>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mikulas Patocka <mpatocka@redhat.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Shachar Raindel <raindel@mellanox.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Xie XiuQi <xiexiuqi@huawei.com>
Cc: iommu@lists.linux-foundation.org
Cc: linux-arch@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: linux-s390@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

1b2ee126

18 Feb, 2016 1 commit

mm/gup, x86/mm/pkeys: Check VMAs and PTEs for protection keys · 33a709b2

Committed by Dave Hansen on Feb 12, 2016

Today, for normal faults and page table walks, we check the VMA
and/or PTE to ensure that it is compatible with the action.  For
instance, if we get a write fault on a non-writeable VMA, we
SIGSEGV.

We try to do the same thing for protection keys.  Basically, we
try to make sure that if a user does this:

	mprotect(ptr, size, PROT_NONE);
	*ptr = foo;

they see the same effects with protection keys when they do this:

	mprotect(ptr, size, PROT_READ|PROT_WRITE);
	set_pkey(ptr, size, 4);
	wrpkru(0xffffff3f); // access disable pkey 4
	*ptr = foo;

The state to do that checking is in the VMA, but we also
sometimes have to do it on the page tables only, like when doing
a get_user_pages_fast() where we have no VMA.

We add two functions and expose them to generic code:

	arch_pte_access_permitted(pte_flags, write)
	arch_vma_access_permitted(vma, write)

These are, of course, backed up in x86 arch code with checks
against the PTE or VMA's protection key.

But, there are also cases where we do not want to respect
protection keys.  When we ptrace(), for instance, we do not want
to apply the tracer's PKRU permissions to the PTEs from the
process being traced.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Boaz Harrosh <boaz@plexistor.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: David Hildenbrand <dahi@linux.vnet.ibm.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dominik Dingel <dingel@linux.vnet.ibm.com>
Cc: Dominik Vogt <vogt@linux.vnet.ibm.com>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jason Low <jason.low2@hp.com>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mikulas Patocka <mpatocka@redhat.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Shachar Raindel <raindel@mellanox.com>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: linux-arch@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: linux-s390@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lkml.kernel.org/r/20160212210219.14D5D715@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

33a709b2

10 Feb, 2016 1 commit

KVM: s390: remove old fragment of vector registers · efa48163

Committed by David Hildenbrand on Jan 14, 2016

Since commit 9977e886 ("s390/kernel: lazy restore fpu registers"),
vregs in struct sie_page is unused. We can safely remove the field and
the definition.
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

efa48163

26 Jan, 2016 4 commits

KVM: s390: fix memory overwrites when vx is disabled · 9abc2a08

Committed by David Hildenbrand on Jan 14, 2016

The kernel now always uses vector registers when available, however KVM
has special logic if support is really enabled for a guest. If support
is disabled, guest_fpregs.fregs will only contain memory for the fpu.
The kernel, however, will store vector registers into that area,
resulting in crazy memory overwrites.

Simply extending that area is not enough, because the format of the
registers also changes. We would have to do additional conversions, making
the code even more complex. Therefore let's directly use one place for
the vector/fpu registers + fpc (in kvm_run). We just have to convert the
data properly when accessing it. This makes current code much easier.

Please note that vector/fpu registers are now always stored to
vcpu->run->s.regs.vrs. Although this data is visible to QEMU and
used for migration, we only guarantee valid values to user space when
KVM_SYNC_VRS is set. As that is only the case when we have vector
register support, we are on the safe side.

Fixes: b5510d9b ("s390/fpu: always enable the vector facility if it is available")
Cc: stable@vger.kernel.org # v4.4 d9a3a09a s390/kvm: remove dependency on struct save_area definition
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
[adapt to d9a3a09a]

9abc2a08
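
The conversion this commit message refers to can be illustrated with a minimal sketch. It relies on the architectural fact that floating point registers 0-15 overlay the leftmost 64 bits of vector registers 0-15. The type and function names below (`ex_vreg`, `ex_fp_to_vx`, `ex_vx_to_fp`) are hypothetical and are not taken from the kernel source; the sketch also assumes that copying into the vector format leaves the low half untouched.

```c
#include <stddef.h>
#include <stdint.h>

#define EX_NUM_FPRS 16 /* floating point registers */
#define EX_NUM_VRS  32 /* vector registers */

/* Hypothetical 128-bit vector register: high holds bits 0-63. */
struct ex_vreg {
	uint64_t high;
	uint64_t low;
};

/* Copy the 16 FPRs into the high halves of vector registers 0-15. */
static void ex_fp_to_vx(struct ex_vreg vrs[EX_NUM_VRS],
			const uint64_t fprs[EX_NUM_FPRS])
{
	for (size_t i = 0; i < EX_NUM_FPRS; i++)
		vrs[i].high = fprs[i];
}

/* Extract the 16 FPRs from the high halves of vector registers 0-15. */
static void ex_vx_to_fp(uint64_t fprs[EX_NUM_FPRS],
			const struct ex_vreg vrs[EX_NUM_VRS])
{
	for (size_t i = 0; i < EX_NUM_FPRS; i++)
		fprs[i] = vrs[i].high;
}
```

With a single storage area in the vector layout, a caller that only wants the fpu view would go through a helper like `ex_vx_to_fp()` instead of keeping a second, differently formatted buffer.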

s390/pci: improve ZPCI_* macros · bf19c94d

Committed by Sebastian Ott on Jan 22, 2016

Most of the constants defined in pci_io.h depend on each other
and thus can be calculated.
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

bf19c94d

s390/pci: provide ZPCI_ADDR macro · 9e00caae

Committed by Sebastian Ott on Jan 22, 2016

Provide and use a ZPCI_ADDR macro as the complement of ZPCI_IDX
to get rid of some constants in the code.
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

9e00caae
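
As a rough illustration of the idea behind the two ZPCI macro commits above (constants derived from each other, plus an address macro that complements the index macro), here is a minimal sketch. All `EX_*` names and the chosen shift and base values are assumptions made for this example; they are not copied from pci_io.h.

```c
#include <stdint.h>

/* Assumed layout: a base bit on top, an index field, then the offset. */
#define EX_IOMAP_SHIFT		48
#define EX_IOMAP_ADDR_BASE	0x8000000000000000ULL

/* Derived constants: each one is calculated from the two above. */
#define EX_IOMAP_ADDR_OFF_MASK	((1ULL << EX_IOMAP_SHIFT) - 1)
#define EX_IOMAP_ADDR_IDX_MASK	(~EX_IOMAP_ADDR_OFF_MASK - EX_IOMAP_ADDR_BASE)

/* Extract the index from an address ... */
#define EX_IDX(addr) \
	(((uint64_t)(addr) & EX_IOMAP_ADDR_IDX_MASK) >> EX_IOMAP_SHIFT)

/* ... and build the start address for an index (the complement). */
#define EX_ADDR(idx) \
	(EX_IOMAP_ADDR_BASE | ((uint64_t)(idx) << EX_IOMAP_SHIFT))
```

Under these assumptions `EX_IDX(EX_ADDR(i))` yields `i` again for any index that fits into the index field, so call sites no longer need to open-code the shift and base.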

s390/pci: adjust IOMAP_MAX_ENTRIES · c2e1fcf3

Committed by Sebastian Ott on Jan 22, 2016

ZPCI_IOMAP_MAX_ENTRIES is off by one. Let's adjust this
for the sake of correctness.
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

c2e1fcf3

21 Jan, 2016 1 commit

dma-mapping: always provide the dma_map_ops based implementation · e1c7e324

Committed by Christoph Hellwig on Jan 20, 2016

Move the generic implementation to <linux/dma-mapping.h> now that all
architectures support it and remove the HAVE_DMA_ATTR Kconfig symbol now
that everyone supports them.

[valentinrothberg@gmail.com: remove leftovers in Kconfig]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Aurelien Jacquiot <a-jacquiot@ti.com>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Cc: Helge Deller <deller@gmx.de>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com>
Cc: Ley Foon Tan <lftan@altera.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Steven Miao <realmz6@gmail.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Valentin Rothberg <valentinrothberg@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

e1c7e324

19 Jan, 2016 1 commit

s390: remove all usages of PSW_ADDR_INSN · 9cb1ccec

Committed by Heiko Carstens on Jan 18, 2016

Yet another leftover from the 31 bit era. The usual operation
"y = x & PSW_ADDR_INSN" with the PSW_ADDR_INSN mask is a nop for
CONFIG_64BIT.

Therefore remove all usages and hope the code is a bit less confusing.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>

9cb1ccec
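
To see why the masking described in this commit message is a nop on 64 bit, consider the following minimal sketch. The two mask values and the helper names are assumptions chosen for illustration: a 31-bit mask that strips the top bit of a 32-bit address, and a 64-bit mask with all bits set.

```c
#include <stdint.h>

/* Assumed mask values, for illustration only. */
#define EX_PSW_ADDR_INSN_31	0x7fffffffULL
#define EX_PSW_ADDR_INSN_64	0xffffffffffffffffULL

/* 31-bit era: the mask removes the addressing-mode bit. */
static inline uint64_t ex_insn_addr_31(uint64_t psw_addr)
{
	return psw_addr & EX_PSW_ADDR_INSN_31;
}

/* 64 bit: an all-ones mask returns its input unchanged. */
static inline uint64_t ex_insn_addr_64(uint64_t psw_addr)
{
	return psw_addr & EX_PSW_ADDR_INSN_64;
}
```

Because `x & ~0` is always `x`, every `y = x & PSW_ADDR_INSN` on a 64-bit build could be replaced by `y = x` without changing behavior.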