1. 27 7月, 2016 1 次提交
  2. 20 6月, 2016 2 次提交
    • D
      s390/mm: remember the int code for the last gmap fault · 4a494439
      David Hildenbrand 提交于
      For nested virtualization, we want to know if we are handling a protection
      exception, because these can directly be forwarded to the guest without
      additional checks.
      Acked-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: NDavid Hildenbrand <dahi@linux.vnet.ibm.com>
      Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
      4a494439
    • M
      s390/mm: add shadow gmap support · 4be130a0
      Martin Schwidefsky 提交于
      For a nested KVM guest the outer KVM host needs to create shadow
      page tables for the nested guest. This patch adds the basic support
      to the guest address space (gmap) code.
      
      For each guest address space the inner KVM host creates, the first
      outer KVM host needs to create shadow page tables. The address space
      is identified by the ASCE loaded into the control register 1 at the
      time the inner SIE instruction for the second nested KVM guest is
      executed. The outer KVM host creates the shadow tables starting with
      the table identified by the ASCE on a on-demand basis. The outer KVM
      host will get repeated faults for all the shadow tables needed to
      run the second KVM guest.
      
      While a shadow page table for the second KVM guest is active the access
      to the origin region, segment and page tables needs to be restricted
      for the first KVM guest. For region and segment and page tables the first
      KVM guest may read the memory, but write attempt has to lead to an
      unshadow.  This is done using the page invalid and read-only bits in the
      page table of the first KVM guest. If the first guest re-accesses one of
      the origin pages of a shadow, it gets a fault and the affected parts of
      the shadow page table hierarchy needs to be removed again.
      
      PGSTE tables don't have to be shadowed, as all interpretation assist can't
      deal with the invalid bits in the shadow pte being set differently than
      the original ones provided by the first KVM guest.
      
      Many bug fixes and improvements by David Hildenbrand.
      Reviewed-by: NDavid Hildenbrand <dahi@linux.vnet.ibm.com>
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
      4be130a0
  3. 13 6月, 2016 1 次提交
    • H
      s390: avoid extable collisions · 6c22c986
      Heiko Carstens 提交于
      We have some inline assemblies where the extable entry points to a
      label at the end of an inline assembly which is not followed by an
      instruction.
      
      On the other hand we have also inline assemblies where the extable
      entry points to the first instruction of an inline assembly.
      
      If a first type inline asm (extable point to empty label at the end)
      would be directly followed by a second type inline asm (extable points
      to first instruction) then we would have two different extable entries
      that point to the same instruction but would have a different target
      address.
      
      This can lead to quite random behaviour, depending on sorting order.
      
      I verified that we currently do not have such collisions within the
      kernel. However to avoid such subtle bugs add a couple of nop
      instructions to those inline assemblies which contain an extable that
      points to an empty label.
      Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      6c22c986
  4. 23 5月, 2016 1 次提交
    • M
      s390: fix info leak in do_sigsegv · cf0d44d5
      Michal Hocko 提交于
      Aleksa has reported incorrect si_errno value when stracing task which
      received SIGSEGV:
      [pid 20799] --- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_errno=2510266, si_addr=0x100000000000000}
      
      The reason seems to be that do_sigsegv is not initializing siginfo
      structure defined on the stack completely so it will leak 4B of
      the previous stack content. Fix it simply by initializing si_errno
      to 0 (same as do_sigbus does already).
      
      Cc: stable # introduced pre-git times
      Reported-by: NAleksa Sarai <asarai@suse.de>
      Signed-off-by: NMichal Hocko <mhocko@suse.com>
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      cf0d44d5
  5. 16 4月, 2016 1 次提交
  6. 08 3月, 2016 1 次提交
  7. 02 3月, 2016 1 次提交
  8. 19 1月, 2016 2 次提交
  9. 18 12月, 2015 1 次提交
  10. 14 10月, 2015 2 次提交
  11. 19 8月, 2015 1 次提交
  12. 19 5月, 2015 1 次提交
    • D
      mm/fault, arch: Use pagefault_disable() to check for disabled pagefaults in the handler · 70ffdb93
      David Hildenbrand 提交于
      Introduce faulthandler_disabled() and use it to check for irq context and
      disabled pagefaults (via pagefault_disable()) in the pagefault handlers.
      
      Please note that we keep the in_atomic() checks in place - to detect
      whether in irq context (in which case preemption is always properly
      disabled).
      
      In contrast, preempt_disable() should never be used to disable pagefaults.
      With !CONFIG_PREEMPT_COUNT, preempt_disable() doesn't modify the preempt
      counter, and therefore the result of in_atomic() differs.
      We validate that condition by using might_fault() checks when calling
      might_sleep().
      
      Therefore, add a comment to faulthandler_disabled(), describing why this
      is needed.
      
      faulthandler_disabled() and pagefault_disable() are defined in
      linux/uaccess.h, so let's properly add that include to all relevant files.
      
      This patch is based on a patch from Thomas Gleixner.
      Reviewed-and-tested-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NDavid Hildenbrand <dahi@linux.vnet.ibm.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: David.Laight@ACULAB.COM
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: airlied@linux.ie
      Cc: akpm@linux-foundation.org
      Cc: benh@kernel.crashing.org
      Cc: bigeasy@linutronix.de
      Cc: borntraeger@de.ibm.com
      Cc: daniel.vetter@intel.com
      Cc: heiko.carstens@de.ibm.com
      Cc: herbert@gondor.apana.org.au
      Cc: hocko@suse.cz
      Cc: hughd@google.com
      Cc: mst@redhat.com
      Cc: paulus@samba.org
      Cc: ralf@linux-mips.org
      Cc: schwidefsky@de.ibm.com
      Cc: yang.shi@windriver.com
      Link: http://lkml.kernel.org/r/1431359540-32227-7-git-send-email-dahi@linux.vnet.ibm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      70ffdb93
  13. 25 3月, 2015 1 次提交
    • H
      s390: remove 31 bit support · 5a79859a
      Heiko Carstens 提交于
      Remove the 31 bit support in order to reduce maintenance cost and
      effectively remove dead code. Since a couple of years there is no
      distribution left that comes with a 31 bit kernel.
      
      The 31 bit kernel also has been broken since more than a year before
      anybody noticed. In addition I added a removal warning to the kernel
      shown at ipl for 5 minutes: a960062e ("s390: add 31 bit warning
      message") which let everybody know about the plan to remove 31 bit
      code. We didn't get any response.
      
      Given that the last 31 bit only machine was introduced in 1999 let's
      remove the code.
      Anybody with 31 bit user space code can still use the compat mode.
      Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      5a79859a
  14. 30 1月, 2015 1 次提交
    • L
      vm: add VM_FAULT_SIGSEGV handling support · 33692f27
      Linus Torvalds 提交于
      The core VM already knows about VM_FAULT_SIGBUS, but cannot return a
      "you should SIGSEGV" error, because the SIGSEGV case was generally
      handled by the caller - usually the architecture fault handler.
      
      That results in lots of duplication - all the architecture fault
      handlers end up doing very similar "look up vma, check permissions, do
      retries etc" - but it generally works.  However, there are cases where
      the VM actually wants to SIGSEGV, and applications _expect_ SIGSEGV.
      
      In particular, when accessing the stack guard page, libsigsegv expects a
      SIGSEGV.  And it usually got one, because the stack growth is handled by
      that duplicated architecture fault handler.
      
      However, when the generic VM layer started propagating the error return
      from the stack expansion in commit fee7e49d ("mm: propagate error
      from stack expansion even for guard page"), that now exposed the
      existing VM_FAULT_SIGBUS result to user space.  And user space really
      expected SIGSEGV, not SIGBUS.
      
      To fix that case, we need to add a VM_FAULT_SIGSEGV, and teach all those
      duplicate architecture fault handlers about it.  They all already have
      the code to handle SIGSEGV, so it's about just tying that new return
      value to the existing code, but it's all a bit annoying.
      
      This is the mindless minimal patch to do this.  A more extensive patch
      would be to try to gather up the mostly shared fault handling logic into
      one generic helper routine, and long-term we really should do that
      cleanup.
      
      Just from this patch, you can generally see that most architectures just
      copied (directly or indirectly) the old x86 way of doing things, but in
      the meantime that original x86 model has been improved to hold the VM
      semaphore for shorter times etc and to handle VM_FAULT_RETRY and other
      "newer" things, so it would be a good idea to bring all those
      improvements to the generic case and teach other architectures about
      them too.
      Reported-and-tested-by: NTakashi Iwai <tiwai@suse.de>
      Tested-by: NJan Engelhardt <jengelh@inai.de>
      Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> # "s390 still compiles and boots"
      Cc: linux-arch@vger.kernel.org
      Cc: stable@vger.kernel.org
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      33692f27
  15. 29 1月, 2015 1 次提交
  16. 08 1月, 2015 1 次提交
  17. 21 11月, 2014 1 次提交
  18. 27 10月, 2014 1 次提交
  19. 26 8月, 2014 1 次提交
    • M
      KVM: s390/mm: use radix trees for guest to host mappings · 527e30b4
      Martin Schwidefsky 提交于
      Store the target address for the gmap segments in a radix tree
      instead of using invalid segment table entries. gmap_translate
      becomes a simple radix_tree_lookup, gmap_fault is split into the
      address translation with gmap_translate and the part that does
      the linking of the gmap shadow page table with the process page
      table.
      A second radix tree is used to keep the pointers to the segment
      table entries for segments that are mapped in the guest address
      space. On unmap of a segment the pointer is retrieved from the
      radix tree and is used to carry out the segment invalidation in
      the gmap shadow page table. As the radix tree can only store one
      pointer, each host segment may only be mapped to exactly one
      guest location.
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
      527e30b4
  20. 25 8月, 2014 1 次提交
  21. 20 5月, 2014 1 次提交
  22. 09 4月, 2014 1 次提交
  23. 03 4月, 2014 2 次提交
    • H
      s390/uaccess: rework uaccess code - fix locking issues · 457f2180
      Heiko Carstens 提交于
      The current uaccess code uses a page table walk in some circumstances,
      e.g. in case of the in atomic futex operations or if running on old
      hardware which doesn't support the mvcos instruction.
      
      However it turned out that the page table walk code does not correctly
      lock page tables when accessing page table entries.
      In other words: a different cpu may invalidate a page table entry while
      the current cpu inspects the pte. This may lead to random data corruption.
      
      Adding correct locking however isn't trivial for all uaccess operations.
      Especially copy_in_user() is problematic since that requires to hold at
      least two locks, but must be protected against ABBA deadlock when a
      different cpu also performs a copy_in_user() operation.
      
      So the solution is a different approach where we change address spaces:
      
      User space runs in primary address mode, or access register mode within
      vdso code, like it currently already does.
      
      The kernel usually also runs in home space mode, however when accessing
      user space the kernel switches to primary or secondary address mode if
      the mvcos instruction is not available or if a compare-and-swap (futex)
      instruction on a user space address is performed.
      KVM however is special, since that requires the kernel to run in home
      address space while implicitly accessing user space with the sie
      instruction.
      
      So we end up with:
      
      User space:
      - runs in primary or access register mode
      - cr1 contains the user asce
      - cr7 contains the user asce
      - cr13 contains the kernel asce
      
      Kernel space:
      - runs in home space mode
      - cr1 contains the user or kernel asce
        -> the kernel asce is loaded when a uaccess requires primary or
           secondary address mode
      - cr7 contains the user or kernel asce, (changed with set_fs())
      - cr13 contains the kernel asce
      
      In case of uaccess the kernel changes to:
      - primary space mode in case of a uaccess (copy_to_user) and uses
        e.g. the mvcp instruction to access user space. However the kernel
        will stay in home space mode if the mvcos instruction is available
      - secondary space mode in case of futex atomic operations, so that the
        instructions come from primary address space and data from secondary
        space
      
      In case of kvm the kernel runs in home space mode, but cr1 gets switched
      to contain the gmap asce before the sie instruction gets executed. When
      the sie instruction is finished cr1 will be switched back to contain the
      user asce.
      
      A context switch between two processes will always load the kernel asce
      for the next process in cr1. So the first exit to user space is a bit
      more expensive (one extra load control register instruction) than before,
      however keeps the code rather simple.
      
      In sum this means there is no need to perform any error prone page table
      walks anymore when accessing user space.
      
      The patch seems to be rather large, however it mainly removes the
      the page table walk code and restores the previously deleted "standard"
      uaccess code, with a couple of changes.
      
      The uaccess without mvcos mode can be enforced with the "uaccess_primary"
      kernel parameter.
      Reported-by: NChristian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      457f2180
    • T
      s390/irq: Use defines for external interruption codes · 1dad093b
      Thomas Huth 提交于
      Use the new defines for external interruption codes to get rid
      of "magic" numbers in the s390 source code. And while we're at it,
      also rename the (un-)register_external_interrupt function to
      something shorter so that this patch does not exceed the 80
      columns all over the place.
      Signed-off-by: NThomas Huth <thuth@linux.vnet.ibm.com>
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      1dad093b
  24. 30 1月, 2014 1 次提交
  25. 04 11月, 2013 1 次提交
    • M
      s390/mm,tlb: correct tlb flush on page table upgrade · 10607864
      Martin Schwidefsky 提交于
      The IDTE instruction used to flush TLB entries for a specific address
      space uses the address-space-control element (ASCE) to identify
      affected TLB entries. The upgrade of a page table adds a new top
      level page table which changes the ASCE. The TLB entries associated
      with the old ASCE need to be flushed and the ASCE for the address space
      needs to be replaced synchronously on all CPUs which currently use it.
      The concept of a lazy ASCE update with an exception handler is broken.
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      10607864
  26. 24 10月, 2013 1 次提交
  27. 13 9月, 2013 1 次提交
  28. 04 9月, 2013 1 次提交
  29. 15 7月, 2013 1 次提交
    • P
      s390: delete __cpuinit usage from all s390 files · e2741f17
      Paul Gortmaker 提交于
      The __cpuinit type of throwaway sections might have made sense
      some time ago when RAM was more constrained, but now the savings
      do not offset the cost and complications.  For example, the fix in
      commit 5e427ec2 ("x86: Fix bit corruption at CPU resume time")
      is a good example of the nasty type of bugs that can be created
      with improper use of the various __init prefixes.
      
      After a discussion on LKML[1] it was decided that cpuinit should go
      the way of devinit and be phased out.  Once all the users are gone,
      we can then finally remove the macros themselves from linux/init.h.
      
      Note that some harmless section mismatch warnings may result, since
      notify_cpu_starting() and cpu_up() are arch independent (kernel/cpu.c)
      are flagged as __cpuinit  -- so if we remove the __cpuinit from
      arch specific callers, we will also get section mismatch warnings.
      As an intermediate step, we intend to turn the linux/init.h cpuinit
      content into no-ops as early as possible, since that will get rid
      of these warnings.  In any case, they are temporary and harmless.
      
      This removes all the arch/s390 uses of the __cpuinit macros from
      all C files.  Currently s390 does not have any __CPUINIT used in
      assembly files.
      
      [1] https://lkml.org/lkml/2013/5/20/589
      
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: linux390@de.ibm.com
      Cc: linux-s390@vger.kernel.org
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      e2741f17
  30. 17 4月, 2013 1 次提交
    • M
      s390/mm: protection exception PSW for aborted transaction · f752ac4d
      Martin Schwidefsky 提交于
      Protection exception usually are suppressing and the fault handler
      needs to rewind the PSW by the instruction length to get the correct
      fault address. Except for protection exceptions while the CPU is in
      the middle of a transaction. The CPU stores the transaction abort
      PSW at the start of the transaction, if the transaction is aborted
      the PSW is already correct and may not be modified by the fault
      handler.
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      f752ac4d
  31. 08 1月, 2013 1 次提交
    • H
      s390/irq: remove split irq fields from /proc/stat · 420f42ec
      Heiko Carstens 提交于
      Now that irq sum accounting for /proc/stat's "intr" line works again we
      have the oddity that the sum field (first field) contains only the sum
      of the second (external irqs) and third field (I/O interrupts).
      The reason for that is that these two fields are already sums of all other
      fields. So if we would sum up everything we would count every interrupt
      twice.
      This is broken since the split interrupt accounting was merged two years
      ago: 052ff461 "[S390] irq: have detailed
      statistics for interrupt types".
      To fix this remove the split interrupt fields from /proc/stat's "intr"
      line again and only have them in /proc/interrupts.
      
      This restores the old behaviour, seems to be the only sane fix and mimics
      a behaviour from other architectures where /proc/interrupts also contains
      more than /proc/stat's "intr" line does.
      Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      420f42ec
  32. 23 11月, 2012 2 次提交
  33. 09 10月, 2012 1 次提交
  34. 26 9月, 2012 2 次提交