1. 21 Apr 2018 (2 commits)
  2. 20 Apr 2018 (2 commits)
  3. 19 Apr 2018 (4 commits)
    • powerpc/kvm: Fix lockups when running KVM guests on Power8 · 56376c58
      Committed by Michael Ellerman
      When running KVM guests on Power8 we can see a lockup where one CPU
      stops responding. This often leads to a message such as:
      
        watchdog: CPU 136 detected hard LOCKUP on other CPUs 72
        Task dump for CPU 72:
        qemu-system-ppc R  running task    10560 20917  20908 0x00040004
      
      And then backtraces on other CPUs, such as:
      
        Task dump for CPU 48:
        ksmd            R  running task    10032  1519      2 0x00000804
        Call Trace:
          ...
          --- interrupt: 901 at smp_call_function_many+0x3c8/0x460
              LR = smp_call_function_many+0x37c/0x460
          pmdp_invalidate+0x100/0x1b0
          __split_huge_pmd+0x52c/0xdb0
          try_to_unmap_one+0x764/0x8b0
          rmap_walk_anon+0x15c/0x370
          try_to_unmap+0xb4/0x170
          split_huge_page_to_list+0x148/0xa30
          try_to_merge_one_page+0xc8/0x990
          try_to_merge_with_ksm_page+0x74/0xf0
          ksm_scan_thread+0x10ec/0x1ac0
          kthread+0x160/0x1a0
          ret_from_kernel_thread+0x5c/0x78
      
      This is caused by commit 8c1c7fb0 ("powerpc/64s/idle: avoid sync
      for KVM state when waking from idle"), which added a check in
      pnv_powersave_wakeup() to see if the kvm_hstate.hwthread_state is
      already set to KVM_HWTHREAD_IN_KERNEL, and if so to skip the store and
      test of kvm_hstate.hwthread_req.
      
      The problem is that the primary does not set KVM_HWTHREAD_IN_KVM when
      entering the guest, so it can then come out to cede with
      KVM_HWTHREAD_IN_KERNEL set. It can then go idle in kvm_do_nap after
      setting hwthread_req to 1, but because hwthread_state is still
      KVM_HWTHREAD_IN_KERNEL we will skip the test of hwthread_req when we
      wake up from idle and won't go to kvm_start_guest. From there the
      thread returns to a garbage address and crashes.
      
      Fix it by skipping the store of hwthread_state, but not the test of
      hwthread_req, when coming out of idle. It's OK to skip the sync in
      that case because hwthread_req will have been set on the same thread,
      so there is no synchronisation required.
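
      To make the ordering concrete, here is a compilable toy model in plain
      C. Every name below is an illustrative stand-in (the real fix lives in
      the powerpc idle wakeup assembly), but it captures the rule: skip the
      store, never the test.

      #include <stdio.h>

      enum hwthread_state { HWTHREAD_IN_IDLE, HWTHREAD_IN_KERNEL };

      struct hwthread {
          enum hwthread_state state;  /* models kvm_hstate.hwthread_state */
          int req;                    /* models kvm_hstate.hwthread_req */
      };

      /* Model of the fixed wakeup: the store (and its sync) may be
       * skipped, but the test of req must run unconditionally. */
      static const char *powersave_wakeup(struct hwthread *t)
      {
          if (t->state != HWTHREAD_IN_KERNEL)
              t->state = HWTHREAD_IN_KERNEL;  /* store, plus sync on real hw */

          /* If this test were skipped when state is already IN_KERNEL,
           * a secondary that ceded would never reach kvm_start_guest. */
          return t->req ? "kvm_start_guest" : "return_to_caller";
      }

      int main(void)
      {
          /* A secondary coming out of cede: state already set, req raised. */
          struct hwthread t = { HWTHREAD_IN_KERNEL, 1 };
          printf("wakeup -> %s\n", powersave_wakeup(&t));
          return 0;
      }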
      
      Fixes: 8c1c7fb0 ("powerpc/64s/idle: avoid sync for KVM state when waking from idle")
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/eeh: Fix enabling bridge MMIO windows · 13a83eac
      Committed by Michael Neuling
      On boot we save the configuration space of PCIe bridges, so that when
      we get an EEH event and everything gets reset, we can restore them.
      
      Unfortunately we save this state before we've enabled the MMIO space
      on the bridges. Hence if we have to reset the bridge, when we come back
      MMIO is not enabled and we end up taking a PE freeze when the driver
      starts accessing it again.
      
      This patch forces the memory/MMIO and bus mastering bits on when
      restoring bridges on EEH. Ideally we'd do this correctly by saving the
      configuration space later, after MMIO has been enabled, but that will
      have to wait for a larger EEH rewrite. For now we have this simple fix.
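
      The essence of the workaround fits in a few lines of portable C. The
      PCI_COMMAND bit values are the standard ones, but restore_cmd() is a
      hypothetical stand-in for the real config-space restore path:

      #include <stdio.h>
      #include <stdint.h>

      #define PCI_COMMAND_MEMORY  0x2  /* enable MMIO decoding */
      #define PCI_COMMAND_MASTER  0x4  /* enable bus mastering */

      /* The saved command register predates MMIO being enabled at boot,
       * so force the bits on when writing it back after an EEH reset. */
      static uint16_t restore_cmd(uint16_t saved_cmd)
      {
          return saved_cmd | PCI_COMMAND_MEMORY | PCI_COMMAND_MASTER;
      }

      int main(void)
      {
          printf("restored PCI_COMMAND = %#06x\n",
                 (unsigned int)restore_cmd(0x0000));
          return 0;
      }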
      
      The original bug can be triggered on a boston machine by doing:
        echo 0x8000000000000000 > /sys/kernel/debug/powerpc/PCI0001/err_injct_outbound
      On boston, this PHB has a PCIe switch on it. Without this patch,
      you'll see two EEH events, one expected and one being the failure we
      are fixing here. The second EEH event causes everything under the PHB
      to disappear (i.e. the i40e ethernet device).
      
      With this patch, only 1 EEH event occurs and devices properly recover.
      
      Fixes: 652defed ("powerpc/eeh: Check PCIe link after reset")
      Cc: stable@vger.kernel.org # v3.11+
      Reported-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com>
      Signed-off-by: Michael Neuling <mikey@neuling.org>
      Acked-by: Russell Currey <ruscur@russell.cc>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • MIPS: uaccess: Add micromips clobbers to bzero invocation · b3d7e55c
      Committed by Matt Redfearn
      The micromips implementation of bzero additionally clobbers registers
      t7 & t8. Specify this in the clobber list when invoking bzero.
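
      The bug class is easy to reproduce in miniature. A toy MIPS-only
      example (nothing here is the kernel's actual macro) showing why every
      register the asm touches, including t7 ($15) and t8 ($24), must appear
      in the clobber list:

      #include <stdio.h>

      static unsigned long double_it(unsigned long x)
      {
          __asm__ __volatile__(
              "move  $15, %0\n\t"        /* scratch in t7 */
              "addu  $24, $15, $15\n\t"  /* scratch in t8 */
              "move  %0, $24"
              : "+r"(x)
              :
              : "$15", "$24");           /* t7, t8: declare the clobbers */
          return x;
      }

      int main(void)
      {
          printf("%lu\n", double_it(21));  /* prints 42 */
          return 0;
      }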
      
      Fixes: 26c5e07d ("MIPS: microMIPS: Optimise 'memset' core library function.")
      Reported-by: James Hogan <jhogan@kernel.org>
      Signed-off-by: Matt Redfearn <matt.redfearn@mips.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: <stable@vger.kernel.org> # 3.10+
      Patchwork: https://patchwork.linux-mips.org/patch/19110/
      Signed-off-by: James Hogan <jhogan@kernel.org>
    • MIPS: memset.S: Fix clobber of v1 in last_fixup · c96eebf0
      Committed by Matt Redfearn
      The label .Llast_fixup\@ is jumped to on page fault within the final
      byte set loop of memset (on < MIPSR6 architectures). For some reason, in
      this fault handler, the v1 register is randomly set to a2 & STORMASK.
      This clobbers v1 for the calling function. This can be observed with the
      following test code:
      
      #include <linux/init.h>
      #include <linux/kernel.h>
      #include <linux/uaccess.h>
      #include <linux/vmalloc.h>

      static int __init __attribute__((optimize("O0"))) test_clear_user(void)
      {
        register int t asm("v1");
        char *test;
        int j, k;
      
        pr_info("\n\n\nTesting clear_user\n");
        test = vmalloc(PAGE_SIZE);
      
        for (j = 256; j < 512; j++) {
          t = 0xa5a5a5a5;
          if ((k = clear_user(test + PAGE_SIZE - 256, j)) != j - 256) {
              pr_err("clear_user (%px %d) returned %d\n", test + PAGE_SIZE - 256, j, k);
          }
          if (t != 0xa5a5a5a5) {
             pr_err("v1 was clobbered to 0x%x!\n", t);
          }
        }
      
        return 0;
      }
      late_initcall(test_clear_user);
      
      Which demonstrates that v1 is indeed clobbered (MIPS64):
      
      Testing clear_user
      v1 was clobbered to 0x1!
      v1 was clobbered to 0x2!
      v1 was clobbered to 0x3!
      v1 was clobbered to 0x4!
      v1 was clobbered to 0x5!
      v1 was clobbered to 0x6!
      v1 was clobbered to 0x7!
      
      Since the number of bytes that could not be set is already contained in
      a2, the andi placing a value in v1 is not necessary and actively
      harmful in clobbering v1.

      Reported-by: James Hogan <jhogan@kernel.org>
      Signed-off-by: Matt Redfearn <matt.redfearn@mips.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: stable@vger.kernel.org
      Patchwork: https://patchwork.linux-mips.org/patch/19109/
      Signed-off-by: James Hogan <jhogan@kernel.org>
  4. 18 Apr 2018 (2 commits)
  5. 17 Apr 2018 (6 commits)
    • MIPS: memset.S: Fix return of __clear_user from Lpartial_fixup · daf70d89
      Committed by Matt Redfearn
      The __clear_user function is defined to return the number of bytes that
      could not be cleared. From the underlying memset / bzero implementation
      this means setting register a2 to that number on return. Currently if a
      page fault is triggered within the memset_partial block, the value
      loaded into a2 on return is meaningless.
      
      The label .Lpartial_fixup\@ is jumped to on page fault. In order to work
      out how many bytes failed to copy, the exception handler should find how
      many bytes are left in the partial block (andi a2, STORMASK), add that to
      the partial block end address (a2), and subtract the faulting address to
      get the remainder. Currently it incorrectly subtracts the partial block
      start address (t1), which has additionally been clobbered to generate a
      jump target in memset_partial. Fix this by adding the block end address
      instead.
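
      The corrected arithmetic, modelled as standalone C (bytes_not_cleared()
      is a hypothetical helper for illustration, not the real asm):

      #include <stdio.h>
      #include <stdint.h>

      #define STORMASK (sizeof(long) - 1)  /* sub-long remainder mask */

      /* On a fault inside the partial block, the bytes NOT cleared are
       * the sub-long tail of the request plus everything between the
       * faulting address and the end of the partial block. */
      static long bytes_not_cleared(uintptr_t block_end, uintptr_t fault_addr,
                                    unsigned long len)
      {
          return (long)((len & STORMASK) + block_end - fault_addr);
      }

      int main(void)
      {
          /* Fault at the start of a 64-byte partial block with a 3-byte
           * tail outstanding: 67 bytes remain unset. */
          printf("%ld\n", bytes_not_cleared(0x1040, 0x1000, 67));
          return 0;
      }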
      
      This issue was found with the following test code:
            int j, k;
            for (j = 0; j < 512; j++) {
              if ((k = clear_user(NULL, j)) != j) {
                 pr_err("clear_user (NULL %d) returned %d\n", j, k);
              }
            }
      Which now passes on Creator Ci40 (MIPS32) and Cavium Octeon II (MIPS64).

      Suggested-by: James Hogan <jhogan@kernel.org>
      Signed-off-by: Matt Redfearn <matt.redfearn@mips.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: stable@vger.kernel.org
      Patchwork: https://patchwork.linux-mips.org/patch/19108/
      Signed-off-by: James Hogan <jhogan@kernel.org>
    • arm64: kasan: avoid pfn_to_nid() before page array is initialized · 800cb2e5
      Committed by Mark Rutland
      In arm64's kasan_init(), we use pfn_to_nid() to find the NUMA node a
      span of memory is in, hoping to allocate shadow from the same NUMA node.
      However, at this point, the page array has not been initialized, and
      thus this is bogus.
      
      Since commit:
      
        f165b378 ("mm: uninitialized struct page poisoning sanity")
      
      ... accessing fields of the page array results in a boot time Oops(),
      highlighting this problem:
      
      [    0.000000] Unable to handle kernel paging request at virtual address dfff200000000000
      [    0.000000] Mem abort info:
      [    0.000000]   ESR = 0x96000004
      [    0.000000]   Exception class = DABT (current EL), IL = 32 bits
      [    0.000000]   SET = 0, FnV = 0
      [    0.000000]   EA = 0, S1PTW = 0
      [    0.000000] Data abort info:
      [    0.000000]   ISV = 0, ISS = 0x00000004
      [    0.000000]   CM = 0, WnR = 0
      [    0.000000] [dfff200000000000] address between user and kernel address ranges
      [    0.000000] Internal error: Oops: 96000004 [#1] PREEMPT SMP
      [    0.000000] Modules linked in:
      [    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.16.0-07317-gf165b378 #42
      [    0.000000] Hardware name: ARM Juno development board (r1) (DT)
      [    0.000000] pstate: 80000085 (Nzcv daIf -PAN -UAO)
      [    0.000000] pc : __asan_load8+0x8c/0xa8
      [    0.000000] lr : __dump_page+0x3c/0x3b8
      [    0.000000] sp : ffff2000099b7ca0
      [    0.000000] x29: ffff2000099b7ca0 x28: ffff20000a1762c0
      [    0.000000] x27: ffff7e0000000000 x26: ffff2000099dd000
      [    0.000000] x25: ffff200009a3f960 x24: ffff200008f9c38c
      [    0.000000] x23: ffff20000a9d3000 x22: ffff200009735430
      [    0.000000] x21: fffffffffffffffe x20: ffff7e0001e50420
      [    0.000000] x19: ffff7e0001e50400 x18: 0000000000001840
      [    0.000000] x17: ffffffffffff8270 x16: 0000000000001840
      [    0.000000] x15: 0000000000001920 x14: 0000000000000004
      [    0.000000] x13: 0000000000000000 x12: 0000000000000800
      [    0.000000] x11: 1ffff0012d0f89ff x10: ffff10012d0f89ff
      [    0.000000] x9 : 0000000000000000 x8 : ffff8009687c5000
      [    0.000000] x7 : 0000000000000000 x6 : ffff10000f282000
      [    0.000000] x5 : 0000000000000040 x4 : fffffffffffffffe
      [    0.000000] x3 : 0000000000000000 x2 : dfff200000000000
      [    0.000000] x1 : 0000000000000005 x0 : 0000000000000000
      [    0.000000] Process swapper (pid: 0, stack limit = 0x        (ptrval))
      [    0.000000] Call trace:
      [    0.000000]  __asan_load8+0x8c/0xa8
      [    0.000000]  __dump_page+0x3c/0x3b8
      [    0.000000]  dump_page+0xc/0x18
      [    0.000000]  kasan_init+0x2e8/0x5a8
      [    0.000000]  setup_arch+0x294/0x71c
      [    0.000000]  start_kernel+0xdc/0x500
      [    0.000000] Code: aa0403e0 9400063c 17ffffee d343fc00 (38e26800)
      [    0.000000] ---[ end trace 67064f0e9c0cc338 ]---
      [    0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
      [    0.000000] ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]---
      
      Let's fix this by using early_pfn_to_nid(), as other architectures do in
      their kasan init code. Note that early_pfn_to_nid() acquires the nid from
      the memblock array, which we iterate over in kasan_init(), so this
      should be fine.
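
      A toy model of why the memblock-based lookup is safe this early (all
      data and names below are made up for illustration; the real
      early_pfn_to_nid() walks the kernel's memblock regions):

      #include <stdio.h>

      struct mem_range { unsigned long start_pfn, end_pfn; int nid; };

      /* Stand-in for the memblock array, which is already valid at this
       * point in boot, unlike the struct page array. */
      static const struct mem_range memblock_model[] = {
          { 0x80000, 0x90000, 0 },
          { 0x90000, 0xa0000, 1 },
      };

      static int early_pfn_to_nid_model(unsigned long pfn)
      {
          unsigned int i;

          for (i = 0; i < sizeof(memblock_model) / sizeof(memblock_model[0]); i++)
              if (pfn >= memblock_model[i].start_pfn &&
                  pfn < memblock_model[i].end_pfn)
                  return memblock_model[i].nid;
          return 0;  /* fallback node */
      }

      int main(void)
      {
          printf("pfn 0x95000 -> nid %d\n", early_pfn_to_nid_model(0x95000));
          return 0;
      }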

      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Fixes: 39d114dd ("arm64: add KASAN support")
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
    • powerpc/64s: Default l1d_size to 64K in RFI fallback flush · 9dfbf78e
      Committed by Madhavan Srinivasan
      If there is no d-cache-size property in the device tree, l1d_size could
      be zero. We don't actually expect that to happen; it's only been seen
      on mambo (the simulator) in some configurations.

      A zero l1d_size leads to the loop count in the asm wrapping around to
      2^64-1, then walking off the end of the fallback area and eventually
      causing a page fault, which is fatal.
      
      Just default to 64K which is correct on some CPUs, and sane enough to
      not cause a crash on others.
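
      The guard itself is tiny. Sketched as standalone C with a hypothetical
      helper name (the real change is in the powerpc RFI flush setup):

      #include <stdio.h>

      #define DEFAULT_L1D_SIZE (64 * 1024)  /* sane fallback per this fix */

      /* A zero size from the device tree would wrap the flush loop count
       * to 2^64 - 1, so substitute 64K instead. */
      static unsigned long l1d_flush_size(unsigned long dt_l1d_size)
      {
          return dt_l1d_size ? dt_l1d_size : DEFAULT_L1D_SIZE;
      }

      int main(void)
      {
          printf("%lu\n", l1d_flush_size(0));  /* prints 65536 */
          return 0;
      }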
      
      Fixes: aa8a5e00 ("powerpc/64s: Add support for RFI flush of L1-D cache")
      Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
      [mpe: Rewrite comment and change log]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • s390/signal: cleanup uapi struct sigaction · fae76491
      Committed by Martin Schwidefsky
      The struct sigaction for user space in arch/s390/include/uapi/asm/signal.h
      is ill-defined. The kernel uses two structures, 'struct sigaction' and
      'struct old_sigaction'; the correlation in the kernel, for both 31-bit
      and 64-bit, is as follows:
      
          sys_sigaction -> struct old_sigaction
          sys_rt_sigaction -> struct sigaction
      
      The (single) uapi definition of 'struct sigaction' under
      '#ifndef __KERNEL__' correlates as follows:
      
          31-bit: sys_sigaction -> uapi struct sigaction
          31-bit: sys_rt_sigaction -> no structure available
      
          64-bit: sys_sigaction -> no structure available
          64-bit: sys_rt_sigaction -> uapi struct sigaction
      
      This is quite confusing. To make it a bit less confusing make the
      uapi definition of 'struct sigaction' usable for sys_rt_sigaction for
      both 31-bit and 64-bit.
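
      A compilable sketch of the rt-style layout the single uapi definition
      now has to provide (field names follow kernel convention, but the
      types here are stand-ins, not the real s390 header):

      #include <stdio.h>
      #include <stddef.h>

      typedef void (*sighandler_fn)(int);
      typedef struct { unsigned long sig[2]; } ksigset_t;  /* stand-in */

      /* rt_sigaction-compatible shape: handler, flags, restorer, and the
       * mask last, unlike old_sigaction, which puts the mask second. */
      struct rt_sigaction_layout {
          sighandler_fn sa_handler;    /* or sa_sigaction via a union */
          unsigned long sa_flags;
          void          (*sa_restorer)(void);
          ksigset_t     sa_mask;
      };

      int main(void)
      {
          printf("sa_mask offset: %zu\n",
                 offsetof(struct rt_sigaction_layout, sa_mask));
          return 0;
      }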

      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
    • MIPS: memset.S: EVA & fault support for small_memset · 8a8158c8
      Committed by Matt Redfearn
      The MIPS kernel memset / bzero implementation includes a small_memset
      branch which is used when the region to be set is smaller than a long
      (4 bytes on 32-bit, 8 bytes on 64-bit). The current small_memset
      implementation uses a simple store-byte loop to write the destination.
      There are two issues with this implementation:
      
      1. When EVA mode is active, user and kernel address spaces may overlap.
      Currently the use of the sb instruction means kernel mode addressing is
      always used and an intended write to userspace may actually overwrite
      some critical kernel data.
      
      2. If the write triggers a page fault, for example by calling
      __clear_user(NULL, 2), the fault is not handled gracefully; instead an
      OOPS is triggered.
      
      Fix these issues by replacing the sb instruction with the EX() macro,
      which will emit EVA-compatible instructions as required. Additionally,
      implement a fault fixup for small_memset which sets a2 to the number of
      bytes that could not be cleared (as defined by __clear_user).
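
      The fixup contract can be modelled in plain C (store_byte() below is a
      hypothetical stand-in for an EX()-wrapped sb instruction; none of this
      is the real asm):

      #include <stdio.h>

      /* Simulated faulting store: fails at one chosen address. */
      static int store_byte(unsigned char *p, unsigned char *fault_at)
      {
          if (p == fault_at)
              return -1;  /* simulated page fault */
          *p = 0;
          return 0;
      }

      /* Clear n < sizeof(long) bytes one at a time; on a fault, report
       * how many bytes were left, as __clear_user requires. */
      static unsigned long small_memset_model(unsigned char *dst,
                                              unsigned long n,
                                              unsigned char *fault_at)
      {
          unsigned long i;

          for (i = 0; i < n; i++)
              if (store_byte(dst + i, fault_at))
                  return n - i;  /* fixup: bytes not cleared, i.e. "a2" */
          return 0;
      }

      int main(void)
      {
          unsigned char buf[8];

          /* Fault on the second byte of a 3-byte clear: 2 bytes remain. */
          printf("%lu\n", small_memset_model(buf, 3, buf + 1));
          return 0;
      }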

      Reported-by: Chuanhua Lei <chuanhua.lei@intel.com>
      Signed-off-by: Matt Redfearn <matt.redfearn@mips.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: stable@vger.kernel.org
      Patchwork: https://patchwork.linux-mips.org/patch/18975/
      Signed-off-by: James Hogan <jhogan@kernel.org>
    • x86/ldt: Fix support_pte_mask filtering in map_ldt_struct() · e6f39e87
      Committed by Joerg Roedel
      The |= operator will let us end up with an invalid PTE. Use
      the correct &= instead.
      
      [ The bug was also independently reported by Shuah Khan ]
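
      A two-line demonstration of the difference (generic C; the mask value
      is made up for illustration):

      #include <stdio.h>
      #include <stdint.h>

      int main(void)
      {
          uint64_t pte  = 0x8000000000000063ull;  /* example PTE bits */
          uint64_t mask = 0x7fffffffffffffffull;  /* supported-bits mask */

          /* Filtering keeps only the supported bits, so mask with &. */
          uint64_t filtered = pte & mask;
          /* | would instead turn on every supported bit: an invalid PTE. */
          uint64_t broken   = pte | mask;

          printf("filtered = %#llx\n", (unsigned long long)filtered);
          printf("broken   = %#llx\n", (unsigned long long)broken);
          return 0;
      }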
      
      Fixes: fb43d6cb ("x86/mm: Do not auto-massage page protections")
      Acked-by: Andy Lutomirski <luto@kernel.org>
      Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  6. 16 Apr 2018 (20 commits)
  7. 14 Apr 2018 (4 commits)