1. 03 Aug 2015, 1 commit
  2. 22 Jul 2015, 5 commits
    • s390/nmi: use the normal asynchronous stack for machine checks · 2acb94f4
      Committed by Martin Schwidefsky
      If a machine check is received while the CPU is in the kernel, only
      the s390_do_machine_check function will be called. The call to
      s390_handle_mcck is postponed until the CPU returns to user space.
      Because of this it is safe to use the asynchronous stack for machine
      checks even if the CPU is already handling an interrupt.
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
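      A minimal standalone C sketch of the deferral pattern this commit relies
      on: the machine check handler only records the event, and the expensive
      handling runs later on the way back to user space. The names model the
      s390 mechanism but are illustrative stand-ins, not the kernel symbols:

      ```c
      #include <stdatomic.h>
      #include <stdbool.h>
      #include <stdio.h>

      static atomic_bool mcck_pending;   /* models the per-CPU "mcck pending" flag */

      /* Runs on the asynchronous stack, possibly while an interrupt handler
       * is already active: do minimal, re-entrant work only. */
      static void do_machine_check(void)
      {
          atomic_store(&mcck_pending, true);
      }

      /* Called on the transition back to user space, where nesting is
       * impossible, so the heavy lifting is safe here. */
      static void handle_pending_mcck(void)
      {
          if (atomic_exchange(&mcck_pending, false))
              printf("processing postponed machine check work\n");
      }

      int main(void)
      {
          do_machine_check();      /* machine check arrives in kernel context */
          handle_pending_mcck();   /* handling deferred to the exit path */
          return 0;
      }
      ```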
    • s390/kernel: squeeze a few more cycles out of the system call handler · a359bb11
      Committed by Martin Schwidefsky
      Reorder the instructions of UPDATE_VTIME to improve superscalar execution,
      remove duplicate checks for problem-state from the asynchronous interrupt
      handlers, and move the check for problem-state from the synchronous
      exit path to the program check path as it is only needed for program
      checks inside the kernel.
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
    • s390/kvm: integrate HANDLE_SIE_INTERCEPT into cleanup_critical · d0fc4107
      Committed by Martin Schwidefsky
      Currently there are two mechanisms to deal with cleanup work due to
      interrupts. The HANDLE_SIE_INTERCEPT macro is used to undo the changes
      required to enter SIE in sie64a. If the SIE instruction causes a program
      check, or an asynchronous interrupt is received, the HANDLE_SIE_INTERCEPT
      code forwards program execution to sie_exit.
      
      All the other critical sections in entry.S are handled by the code in
      cleanup_critical that is called by the SWITCH_ASYNC macro.
      
      Move the sie64a function to the beginning of the critical section and
      add the code from HANDLE_SIE_INTERCEPT to cleanup_critical. Add a special
      case for the sie64a cleanup to the program check handler.
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
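      A small sketch of the table-driven idea behind cleanup_critical: if an
      interrupt hits inside a known critical range, the interrupted PSW address
      is redirected to a fixup point (for sie64a, the sie_exit label). The
      addresses and ranges below are invented for the example:

      ```c
      #include <stddef.h>
      #include <stdint.h>
      #include <stdio.h>

      struct critical_range {
          uintptr_t start, end;   /* half-open range of critical code */
          uintptr_t fixup;        /* where to resume instead */
      };

      /* pretend layout: a sie64a-like body that is forwarded to its exit
       * label, and a stack-switch section that is restarted from the top */
      static const struct critical_range ranges[] = {
          { 0x1000, 0x1080, 0x1080 },
          { 0x2000, 0x2040, 0x2000 },
      };

      static uintptr_t cleanup_critical(uintptr_t psw_addr)
      {
          for (size_t i = 0; i < sizeof(ranges) / sizeof(ranges[0]); i++)
              if (psw_addr >= ranges[i].start && psw_addr < ranges[i].end)
                  return ranges[i].fixup;
          return psw_addr;        /* not critical: resume where we were */
      }

      int main(void)
      {
          printf("interrupt at 0x1010 resumes at 0x%lx\n",
                 (unsigned long)cleanup_critical(0x1010));
          return 0;
      }
      ```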
    • s390/kvm: fix interrupt race with HANDLE_SIE_INTERCEPT · dcd2a9aa
      Committed by Martin Schwidefsky
      The HANDLE_SIE_INTERCEPT macro is used in the interrupt handlers
      and the program check handler to undo a few changes done by sie64a.
      Among them are the guest vs. host LPP, the gmap ASCE vs. the kernel
      ASCE, and the bit that indicates that SIE is currently running on the CPU.
      
      There is a race between a voluntary SIE exit and asynchronous interrupts.
      If the CPU has completed the SIE instruction and the TM instruction of
      the LPP macro when it receives an interrupt, the interrupt handler will
      run while the LPP, the ASCE and the SIE bit are still set up for guest
      execution. This might result in wrong sampling data, but it will not
      cause data corruption or lockups.
      
      The critical section in sie64a needs to be enlarged to include all
      instructions that undo the changes required for guest execution.
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
    • s390/kernel: lazy restore fpu registers · 9977e886
      Committed by Hendrik Brueckner
      Improve the save and restore behavior of FPU register contents so that
      the vector extension can be used within the kernel.
      
      The kernel does not use floating-point or vector registers, so saving
      and restoring the FPU register contents is performed only when handling
      signals or switching processes.  To prepare for using vector
      instructions and vector registers within the kernel, enhance the save
      behavior and implement a lazy restore at return to user space from a
      system call or interrupt.
      
      To implement the lazy restore, save_fpu_regs() sets a CPU information
      flag, CIF_FPU, to indicate that the FPU registers must be restored.
      Saving and setting CIF_FPU is performed in an atomic fashion to be
      interrupt-safe.  When the kernel wants to use the vector extension or
      wants to change the FPU register state of a task during signal handling,
      save_fpu_regs() must be called first.  The CIF_FPU flag is also set at
      process switch.  At return to user space, the FPU state is restored.  In
      particular, the FPU state includes the floating-point or vector register
      contents as well as the vector-enablement and floating-point controls.  The
      FPU state restore and the clearing of CIF_FPU are also performed in an
      atomic fashion.
      
      For KVM, the restore of the FPU register state is performed when restoring
      the general-purpose guest registers, before the SIE instruction is started.
      Because the path towards the SIE instruction is interruptible, the CIF_FPU
      flag must be checked again right before going into SIE.  If it is set, the
      guest registers must be reloaded by re-entering the outer SIE loop.  This
      is the same behavior as if the SIE critical section were interrupted.
      Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
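      A standalone C model of the lazy scheme; the names follow the commit but
      the bodies are stand-ins (plain arrays instead of FPU registers, and none
      of the atomicity the real code guarantees):

      ```c
      #include <stdio.h>

      #define CIF_FPU 0x01

      static unsigned int cpu_flags;
      static double user_fpu[16];          /* stand-in for FP/vector registers */
      static double hw_fpu[16];            /* stand-in for the hardware registers */

      static void save_fpu_regs(void)
      {
          if (cpu_flags & CIF_FPU)
              return;                       /* already saved, hardware regs free */
          for (int i = 0; i < 16; i++)
              user_fpu[i] = hw_fpu[i];      /* save user state */
          cpu_flags |= CIF_FPU;             /* mark: must restore before user */
      }

      static void kernel_uses_vector_unit(void)
      {
          save_fpu_regs();                  /* always save before clobbering */
          hw_fpu[0] = 42.0;                 /* kernel clobbers the registers */
      }

      static void return_to_user(void)
      {
          if (cpu_flags & CIF_FPU) {        /* lazy restore: only if needed */
              for (int i = 0; i < 16; i++)
                  hw_fpu[i] = user_fpu[i];
              cpu_flags &= ~CIF_FPU;
          }
      }

      int main(void)
      {
          hw_fpu[0] = 1.5;                  /* user value live in hardware */
          kernel_uses_vector_unit();
          return_to_user();
          printf("restored: %.1f\n", hw_fpu[0]);   /* 1.5 again */
          return 0;
      }
      ```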
  3. 20 Jul 2015, 1 commit
  4. 08 May 2015, 1 commit
  5. 25 Mar 2015, 2 commits
  6. 08 Dec 2014, 1 commit
  7. 25 Sep 2014, 1 commit
  8. 20 May 2014, 2 commits
    • s390: split TIF bits into CIF, PIF and TIF bits · d3a73acb
      Committed by Martin Schwidefsky
      The oi and ni instructions used in entry[64].S to set and clear bits
      in the thread flags are not guaranteed to be atomic with regard to other
      CPUs. Split the TIF bits into CPU-specific, pt_regs-specific and
      thread-info-specific bits. Updates of the TIF bits are done with atomic
      instructions; updates of the CPU and pt_regs bits are done with
      non-atomic instructions.
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
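      The split can be modeled in a few lines of C: TIF bits may be set by
      other CPUs and therefore need atomic updates, while the CPU-local and
      pt_regs bits are only touched by the owning CPU and can keep the cheap
      oi/ni-style read-modify-write. The flag names below are examples, not
      the full set:

      ```c
      #include <stdatomic.h>
      #include <stdio.h>

      #define TIF_SIGPENDING   (1u << 0)   /* may be set remotely: atomic */
      #define CIF_MCCK_PENDING (1u << 0)   /* strictly CPU-local: plain RMW ok */

      static atomic_uint tif_flags;        /* per-thread, visible to all CPUs */
      static unsigned int cif_flags;       /* per-CPU, owner-only */

      static void set_tif(unsigned int bit)   { atomic_fetch_or(&tif_flags, bit); }
      static void clear_tif(unsigned int bit) { atomic_fetch_and(&tif_flags, ~bit); }

      static void set_cif(unsigned int bit)   { cif_flags |= bit; }   /* like oi */
      static void clear_cif(unsigned int bit) { cif_flags &= ~bit; }  /* like ni */

      int main(void)
      {
          set_tif(TIF_SIGPENDING);       /* safe even if another CPU races */
          set_cif(CIF_MCCK_PENDING);     /* cheap, but only the owner may do it */
          clear_tif(TIF_SIGPENDING);
          clear_cif(CIF_MCCK_PENDING);
          printf("tif=%u cif=%u\n", atomic_load(&tif_flags), cif_flags);
          return 0;
      }
      ```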
    • s390/uaccess: simplify control register updates · beef560b
      Committed by Martin Schwidefsky
      Always switch to the kernel ASCE in switch_mm. Load the secondary
      space ASCE in finish_arch_post_lock_switch after checking that
      any pending page table operations have completed. The primary
      ASCE is loaded in entry[64].S. With this, the update_primary_asce
      call can be removed from the switch_to macro and from the start
      of the switch_mm function. Remove the load_primary argument from
      update_user_asce/clear_user_asce, rename update_user_asce to
      set_user_asce, and rename update_primary_asce to load_kernel_asce.
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
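      A toy model of the new ordering, with the function names from the commit
      but stand-in bodies; the "control registers" here are plain strings:

      ```c
      #include <stdbool.h>
      #include <stdio.h>

      struct mm_ctx { bool pgtable_ops_pending; const char *user_asce; };

      static const char *cr1;   /* primary ASCE */
      static const char *cr7;   /* secondary ASCE */

      static void load_kernel_asce(void)           { cr1 = "kernel asce"; }
      static void set_user_asce(struct mm_ctx *mm) { cr7 = mm->user_asce; }

      static void switch_mm(struct mm_ctx *next)
      {
          (void)next;
          load_kernel_asce();   /* always safe; the user ASCE is deferred */
      }

      static void finish_arch_post_lock_switch(struct mm_ctx *next)
      {
          while (next->pgtable_ops_pending)      /* wait for pending operations */
              next->pgtable_ops_pending = false; /* stand-in for the real wait  */
          set_user_asce(next);                   /* now load the secondary ASCE */
      }

      int main(void)
      {
          struct mm_ctx task = { true, "user asce" };
          switch_mm(&task);
          finish_arch_post_lock_switch(&task);
          printf("cr1=%s cr7=%s\n", cr1, cr7);
          return 0;
      }
      ```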
  9. 22 Apr 2014, 1 commit
  10. 03 Apr 2014, 1 commit
    • s390/uaccess: rework uaccess code - fix locking issues · 457f2180
      Committed by Heiko Carstens
      The current uaccess code uses a page table walk in some circumstances,
      e.g. for the in-atomic futex operations or when running on old
      hardware which doesn't support the mvcos instruction.
      
      However, it turned out that the page table walk code does not correctly
      lock page tables when accessing page table entries.
      In other words: a different CPU may invalidate a page table entry while
      the current CPU inspects the pte. This may lead to random data corruption.
      
      Adding correct locking however isn't trivial for all uaccess operations.
      Especially copy_in_user() is problematic, since it requires holding at
      least two locks, but must be protected against ABBA deadlock when a
      different CPU also performs a copy_in_user() operation.
      
      So the solution is a different approach where we change address spaces:
      
      User space runs in primary address mode, or in access register mode
      within the vdso code, as it already does today.
      
      The kernel usually runs in home space mode as well; however, when
      accessing user space the kernel switches to primary or secondary
      address mode if the mvcos instruction is not available or if a
      compare-and-swap (futex) instruction on a user space address is
      performed.
      KVM however is special, since it requires the kernel to run in home
      address space while implicitly accessing user space with the sie
      instruction.
      
      So we end up with:
      
      User space:
      - runs in primary or access register mode
      - cr1 contains the user asce
      - cr7 contains the user asce
      - cr13 contains the kernel asce
      
      Kernel space:
      - runs in home space mode
      - cr1 contains the user or kernel asce
        -> the kernel asce is loaded when a uaccess requires primary or
           secondary address mode
      - cr7 contains the user or kernel asce, (changed with set_fs())
      - cr13 contains the kernel asce
      
      In case of uaccess the kernel changes to:
      - primary space mode in case of a uaccess (copy_to_user) and uses
        e.g. the mvcp instruction to access user space. However, the kernel
        will stay in home space mode if the mvcos instruction is available.
      - secondary space mode in case of futex atomic operations, so that the
        instructions come from the primary address space and the data from the
        secondary space.
      
      In case of kvm the kernel runs in home space mode, but cr1 gets switched
      to contain the gmap asce before the sie instruction gets executed. When
      the sie instruction is finished, cr1 will be switched back to contain the
      user asce.
      
      A context switch between two processes will always load the kernel asce
      for the next process in cr1. So the first exit to user space is a bit
      more expensive (one extra load control register instruction) than before,
      but this keeps the code rather simple.
      
      In sum this means there is no need to perform any error-prone page table
      walks anymore when accessing user space.
      
      The patch seems to be rather large, however it mainly removes the
      page table walk code and restores the previously deleted "standard"
      uaccess code, with a couple of changes.
      
      The uaccess mode without mvcos can be enforced with the "uaccess_primary"
      kernel parameter.
      Reported-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
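      A rough model of the resulting dispatch, with stand-in functions: when
      mvcos is available the kernel stays in home space and copies directly;
      otherwise it switches the address mode (and loads the kernel asce into
      cr1) around the copy. memcpy stands in for the mvcos/mvcp/mvcs
      instructions:

      ```c
      #include <stdbool.h>
      #include <stddef.h>
      #include <stdio.h>
      #include <string.h>

      static bool have_mvcos = false;          /* old hardware in this example */

      static void enter_uaccess_mode(void) { puts("switch mode, cr1 <- kernel asce"); }
      static void leave_uaccess_mode(void) { puts("back to home space mode"); }

      static size_t copy_to_user_sketch(void *to, const void *from, size_t n)
      {
          if (have_mvcos) {
              memcpy(to, from, n);     /* mvcos: cross-space copy, no mode switch */
              return 0;
          }
          enter_uaccess_mode();        /* primary/secondary mode for the copy */
          memcpy(to, from, n);         /* mvcp/mvcs-style copy */
          leave_uaccess_mode();        /* uaccess done: back to home space */
          return 0;                    /* 0 bytes left uncopied */
      }

      int main(void)
      {
          char user_buf[8] = { 0 }, kernel_buf[8] = "s390";
          copy_to_user_sketch(user_buf, kernel_buf, sizeof(kernel_buf));
          printf("copied: %s\n", user_buf);
          return 0;
      }
      ```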
  11. 21 Feb 2014, 1 commit
    • s390/mm,tlb: race of lazy TLB flush vs. recreation of TLB entries · 53e857f3
      Committed by Martin Schwidefsky
      Git commit 050eef36 "[S390] fix tlb flushing vs. concurrent
      /proc accesses" introduced the attach counter to avoid using the
      mm_users value to decide between IPTE for every PTE and lazy TLB
      flushing with IDTE. That fixed the problem with mm_users, but it
      introduced another subtle race, fortunately one that is very hard
      to hit.
      The background is the architectural requirement that a valid
      PTE may not be changed while it can be used concurrently by another
      CPU. The decision between IPTE and lazy TLB flushing needs to be
      made while the PTE is still valid. Now if the virtual CPU is
      temporarily stopped after the decision to use lazy TLB flushing, but
      before the invalid bit of the PTE has been set, another CPU can attach
      the mm, find that flush_mm is set, do the IDTE, return to user space,
      and recreate a TLB entry that uses the PTE in question. When the first,
      stopped CPU continues, it will change the PTE while it is attached on
      another CPU. The first CPU will do another IDTE shortly after the
      modification of the PTE, which makes the race window quite short.
      
      To fix this race, the CPU that wants to attach the address space of a
      user space thread needs to wait for the end of the PTE modification.
      The number of concurrent TLB flushers for an mm is tracked in the
      upper 16 bits of the attach_count, and finish_arch_post_lock_switch
      is used to wait for the end of the flush operation if required.
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
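      A toy model of the synchronization: TLB flushers bump a count kept in
      the upper 16 bits of attach_count, and a CPU attaching the mm spins in
      finish_arch_post_lock_switch until no flush is in flight. This mirrors
      the shape of the fix, not the exact kernel encoding:

      ```c
      #include <stdatomic.h>
      #include <stdio.h>

      #define FLUSHER_ONE (1u << 16)        /* flushers counted in upper 16 bits */

      static atomic_uint attach_count;

      static void tlb_flush_begin(void) { atomic_fetch_add(&attach_count, FLUSHER_ONE); }
      static void tlb_flush_end(void)   { atomic_fetch_sub(&attach_count, FLUSHER_ONE); }

      static void finish_arch_post_lock_switch(void)
      {
          /* wait for concurrent flushers before using the address space */
          while (atomic_load(&attach_count) >> 16)
              ;   /* cpu_relax() in the real code */
      }

      int main(void)
      {
          tlb_flush_begin();
          tlb_flush_end();                  /* flush completed */
          finish_arch_post_lock_switch();   /* returns immediately: no flusher */
          printf("attached safely\n");
          return 0;
      }
      ```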
  12. 16 Jan 2014, 1 commit
  13. 16 Dec 2013, 1 commit
  14. 30 Sep 2013, 1 commit
  15. 28 Aug 2013, 1 commit
  16. 22 Aug 2013, 1 commit
    • s390: convert interrupt handling to use generic hardirq · 1f44a225
      Committed by Martin Schwidefsky
      With the introduction of PCI it became apparent that s390 should
      convert to generic hardirqs as too many drivers do not have the
      correct dependency for GENERIC_HARDIRQS. On the architecture
      level s390 does not have irq lines. It has external interrupts,
      I/O interrupts and adapter interrupts. This patch hard-codes all
      external interrupts as irq #1, all I/O interrupts as irq #2 and
      all adapter interrupts as irq #3. The additional information from
      the lowcore associated with the interrupt is stored in the
      pt_regs of the interrupt frame, where the interrupt handler can
      pick it up. For PCI/MSI interrupts the adapter interrupt handler
      scans the relevant bit fields and calls generic_handle_irq with
      the virtual irq number for the MSI interrupt.
      Reviewed-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
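      A sketch of the resulting dispatch: the interrupt classes funnel into
      three fixed irq numbers, and the per-interrupt detail travels in the
      interrupt frame. The pt_regs stand-in and the handler stub below are
      invented for the example:

      ```c
      #include <stdio.h>

      enum { EXT_INTERRUPT = 1, IO_INTERRUPT = 2, THIN_INTERRUPT = 3 };

      struct fake_pt_regs {
          unsigned int int_code;    /* lowcore data stashed in the frame */
          unsigned long int_parm;
      };

      /* stand-in for the generic irq layer entry point */
      static void generic_handle_irq_stub(int irq, struct fake_pt_regs *regs)
      {
          printf("irq %d, int_code=%#x parm=%#lx\n",
                 irq, regs->int_code, regs->int_parm);
      }

      int main(void)
      {
          struct fake_pt_regs regs = { .int_code = 0x2401, .int_parm = 0xdead };
          generic_handle_irq_stub(EXT_INTERRUPT, &regs);   /* external interrupt */
          regs.int_code = 0x0a00;
          generic_handle_irq_stub(IO_INTERRUPT, &regs);    /* I/O interrupt */
          return 0;
      }
      ```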
  17. 27 Jun 2013, 1 commit
  18. 17 Jun 2013, 1 commit
  19. 21 May 2013, 3 commits
  20. 26 Apr 2013, 2 commits
  21. 05 Mar 2013, 1 commit
    • s390: critical section cleanup vs. machine checks · 6551fbdf
      Committed by Martin Schwidefsky
      The current machine check code uses the registers stored by the machine
      in the lowcore at __LC_GPREGS_SAVE_AREA as the registers of the interrupted
      context. The registers 0-7 of a user process can get clobbered if a machine
      check interrupts the execution of a critical section in entry[64].S.
      
      The reason is that the critical section cleanup code may need to modify
      the PSW and the registers of the previous context to get to the end of a
      critical section. If registers 0-7 have to be replaced, the relevant copy
      will be in the registers, which invalidates the copy in the lowcore. The
      machine check handler needs to explicitly store registers 0-7 to the stack.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
  22. 14 Feb 2013, 1 commit
  23. 23 Nov 2012, 3 commits
    • s390/kvm: Fix address space mixup · ce6a04ac
      Committed by Christian Borntraeger
      I was chasing down a bug of random validity intercepts on s390
      (guest prefix page not mapped in the host virtual address space). It turns
      out that the problem was a wrong address space control element. The
      cause was quite complex:
      
      During paging activity a DAT protection during SIE caused a program
      interrupt. Normally, the sie retry loop tries to catch all
      interrupts during and shortly before sie to rerun the setup. The
      problem is that protection causes a suppressing program interrupt,
      so in case of DAT protection the PSW points to the instruction AFTER
      SIE. This confused the logic of the retry loop so that it did not
      trigger; instead we jumped directly back to SIE after returning from
      the program interrupt (the protection fault handler itself rewinds
      the PSW). This usually works quite well, but:
      
      If the protection fault handler now has to wait, another program
      might be scheduled in. Later on the sie process will be scheduled
      in again. In that case the content of CR1 (primary address space)
      will be wrong, because switch_to will put the user space ASCE into CR1
      and not the guest ASCE.
      
      In addition the program parameter is also wrong for every protection
      fault of a guest, since we don't issue the SPP instruction.
      
      So let's also check for PSW == instruction after SIE in the program
      check handler. Instead of expensively checking all program
      interruption codes that might be suppressing, we assume that a program
      interrupt pointing after SIE was always a program interrupt in SIE
      (otherwise we have a kernel bug anyway).
      
      We also have to compensate for the rewinding, since the C-level handlers
      will do that. Therefore we need to add a nop with the same length
      as SIE before the sie_loop.
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      CC: stable@vger.kernel.org
      CC: Heiko Carstens <heiko.carstens@de.ibm.com>
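      The added check can be sketched as follows; the addresses and the rewind
      compensation are illustrative only:

      ```c
      #include <stdbool.h>
      #include <stdint.h>
      #include <stdio.h>

      static const uintptr_t addr_after_sie = 0x1f08;   /* label after the SIE insn */

      /* A suppressing program interrupt leaves the PSW pointing after SIE, so
       * comparing the interrupted address with the known "after SIE" address
       * detects an exit from guest context. The nop placed before sie_loop
       * compensates for the rewinding done by the C-level handlers, modeled
       * here by also accepting the rewound address. */
      static bool exited_from_sie(uintptr_t psw_addr, uintptr_t rewind)
      {
          return psw_addr == addr_after_sie ||
                 psw_addr + rewind == addr_after_sie;
      }

      int main(void)
      {
          printf("%d\n", exited_from_sie(0x1f08, 0));   /* direct hit -> 1 */
          printf("%d\n", exited_from_sie(0x1f04, 4));   /* after rewind -> 1 */
          return 0;
      }
      ```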
    • s390/ptrace: race of single stepping vs signal delivery · 39efd4ec
      Committed by Martin Schwidefsky
      The current single step code is racy with regard to concurrent delivery
      of signals. If a signal is delivered after a PER program check occurred,
      but before the TIF_PER_TRAP bit has been checked in entry[64].S, the code
      clears TIF_PER_TRAP and then calls do_signal. This is wrong: if the
      instruction completed (or has been suppressed), a SIGTRAP should be
      delivered to the debugger in any case. Only if the instruction has been
      nullified may the SIGTRAP be omitted.
      
      The new logic always sets TIF_PER_TRAP if the program check indicates PER
      tracing but removes it again for all program checks that are nullifying.
      The effect is that for each change in the PSW address we now get a
      single SIGTRAP.
      Reported-by: Andreas Arnez <arnez@linux.vnet.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
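      A minimal model of the new logic: the trap flag is set whenever the
      program check reports PER and taken back for nullifying program checks,
      where the instruction never executed. The interruption codes below are
      illustrative, not the architected list:

      ```c
      #include <stdbool.h>
      #include <stdio.h>

      #define PGM_PER        0x80   /* PER event bit in the interruption code */
      #define PGM_PAGE_FAULT 0x11   /* a nullifying program check (example)  */

      static bool tif_per_trap;

      static void program_check(unsigned int code, bool nullifying)
      {
          if (code & PGM_PER)
              tif_per_trap = true;      /* instruction completed or suppressed */
          if (nullifying)
              tif_per_trap = false;     /* instruction never ran: no SIGTRAP */
      }

      int main(void)
      {
          program_check(PGM_PER, false);
          printf("deliver SIGTRAP: %d\n", tif_per_trap);   /* 1 */
          program_check(PGM_PER | PGM_PAGE_FAULT, true);
          printf("deliver SIGTRAP: %d\n", tif_per_trap);   /* 0 */
          return 0;
      }
      ```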
    • s390/traps: preinitialize program check table · b01a37a7
      Committed by Heiko Carstens
      Preinitialize the program check table, so we can put it into the
      read-only data section.
      Also use only four-byte entries for the table, since each program
      check handler resides within the first 2GB. This reduces
      the size of the table by 50% on 64-bit builds.
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
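      A portable sketch of the space trick: when every handler is reachable
      with a 31-bit value, four bytes per entry suffice and the table halves
      in size on 64-bit. The real table stores absolute addresses in the low
      2GB; this sketch uses base-relative offsets instead so it runs anywhere:

      ```c
      #include <stdint.h>
      #include <stdio.h>

      static void default_trap(void) { puts("default trap handler"); }
      static void illegal_op(void)   { puts("illegal operation");    }

      int main(void)
      {
          uintptr_t a = (uintptr_t)default_trap, b = (uintptr_t)illegal_op;
          uintptr_t base = a < b ? a : b;        /* guarantee positive offsets */

          /* the compact "program check table": 4 bytes per entry */
          uint32_t pgm_check_table[2] = {
              (uint32_t)(a - base),
              (uint32_t)(b - base),
          };

          /* dispatch on an interruption code (1 = illegal operation here) */
          void (*handler)(void) = (void (*)(void))(base + pgm_check_table[1]);
          handler();
          return 0;
      }
      ```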
  24. 29 Oct 2012, 1 commit
  25. 09 Oct 2012, 1 commit
  26. 01 Oct 2012, 3 commits
  27. 26 Sep 2012, 1 commit