1. 01 May, 2020 1 commit
    • arm64: vdso: Add -fasynchronous-unwind-tables to cflags · 1578e5d0
      Vincenzo Frascino committed
      On arm64 linux gcc uses -fasynchronous-unwind-tables -funwind-tables
      by default since gcc-8, so now the de facto platform ABI is to allow
      unwinding from async signal handlers.
      
      However on bare metal targets (aarch64-none-elf), and on old gcc,
      async and sync unwind tables are not enabled by default to avoid
      runtime memory costs.
      
      This means if linux is built with a baremetal toolchain the vdso.so
      may not have unwind tables which breaks the gcc platform ABI guarantee
      in userspace.
      
      Add -fasynchronous-unwind-tables explicitly to the vgettimeofday.o
      cflags to address the ABI change.
      
      Fixes: 28b1a824 ("arm64: vdso: Substitute gettimeofday() with C implementation")
      Cc: Will Deacon <will@kernel.org>
      Reported-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
      Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
  2. 25 Apr, 2020 5 commits
  3. 23 Apr, 2020 8 commits
  4. 22 Apr, 2020 10 commits
  5. 21 Apr, 2020 8 commits
    • arm64: sync kernel APIAKey when installing · 3fabb438
      Mark Rutland committed
      A direct write to an APxxKey_EL1 register requires a context
      synchronization event to ensure that indirect reads made by subsequent
      instructions (e.g. AUTIASP, PACIASP) observe the new value.
      
      When we initialize the boot task's APIAKey in boot_init_stack_canary()
      via ptrauth_keys_switch_kernel() we miss the necessary ISB, and so there
      is a window where instructions are not guaranteed to use the new APIAKey
      value. This has been observed to result in boot-time crashes where
      PACIASP and AUTIASP within a function used a mixture of the old and new
      key values.
      
      Fix this by having ptrauth_keys_switch_kernel() synchronize the new key
      value with an ISB. At the same time, __ptrauth_key_install() is renamed
      to __ptrauth_key_install_nosync() so that it is obvious that this
      performs no synchronization itself.
      
      Fixes: 28321582 ("arm64: initialize ptrauth keys for kernel booting task")
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Reported-by: Will Deacon <will@kernel.org>
      Cc: Amit Daniel Kachhap <amit.kachhap@arm.com>
      Cc: Marc Zyngier <maz@kernel.org>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      Tested-by: Will Deacon <will@kernel.org>
    • s390/mm: fix page table upgrade vs 2ndary address mode accesses · 316ec154
      Christian Borntraeger committed
      A page table upgrade in a kernel section that uses secondary address
      mode will mess up the kernel instructions as follows:
      
      Consider the following scenario: two threads are sharing memory.
      On CPU1 thread 1 does e.g. strnlen_user().  That gets to
              old_fs = enable_sacf_uaccess();
              len = strnlen_user_srst(src, size);
      and
                      "   la    %2,0(%1)\n"
                      "   la    %3,0(%0,%1)\n"
                      "   slgr  %0,%0\n"
                      "   sacf  256\n"
                      "0: srst  %3,%2\n"
      in strnlen_user_srst().  At that point we are in secondary space mode,
      control register 1 points to kernel page table and instruction fetching
      happens via c1, rather than usual c13.  Interrupts are not disabled, for
      obvious reasons.
      
      On CPU2 thread 2 does MAP_FIXED mmap(), forcing the upgrade of page table
      from 3-level to e.g. 4-level one.  We'd allocated new top-level table,
      set it up and now we hit this:
                      notify = 1;
                      spin_unlock_bh(&mm->page_table_lock);
              }
              if (notify)
                      on_each_cpu(__crst_table_upgrade, mm, 0);
      OK, we need to actually change over to use of new page table and we
      need that to happen in all threads that are currently running.  Which
      happens to include the thread 1.  IPI is delivered and we have
      static void __crst_table_upgrade(void *arg)
      {
              struct mm_struct *mm = arg;
      
              if (current->active_mm == mm)
                      set_user_asce(mm);
              __tlb_flush_local();
      }
      run on CPU1.  That does
      static inline void set_user_asce(struct mm_struct *mm)
      {
              S390_lowcore.user_asce = mm->context.asce;
      OK, user page table address updated...
              __ctl_load(S390_lowcore.user_asce, 1, 1);
      ... and control register 1 set to it.
              clear_cpu_flag(CIF_ASCE_PRIMARY);
      }
      
      IPI is run in home space mode, so it's fine - insns are fetched
      using c13, which always points to kernel page table.  But as soon
      as we return from the interrupt, previous PSW is restored, putting
      CPU1 back into secondary space mode, at which point we no longer
      get the kernel instructions from the kernel mapping.
      
      The fix is to only fixup the control registers that are currently in use
      for user processes during the page table update.  We must also disable
      interrupts in enable_sacf_uaccess to synchronize the cr and
      thread.mm_segment updates against the on_each_cpu.
      
      Fixes: 0aaba41b ("s390: remove all code using the access register mode")
      Cc: stable@vger.kernel.org # 4.15+
      Reported-by: Al Viro <viro@zeniv.linux.org.uk>
      Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
      References: CVE-2020-11884
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
    • x86/hyperv: Suspend/resume the VP assist page for hibernation · 421f090c
      Dexuan Cui committed
      Unlike the other CPUs, CPU0 is never offlined during hibernation, so in the
      resume path, the "new" kernel's VP assist page is not suspended (i.e. not
      disabled), and later when we jump to the "old" kernel, the page is not
      properly re-enabled for CPU0 with the allocated page from the old kernel.
      
      So far, the VP assist page is used by hv_apic_eoi_write(), and is also
      used in the case of nested virtualization (running KVM atop Hyper-V).
      
      For hv_apic_eoi_write(), when the page is not properly re-enabled,
      hvp->apic_assist is always 0, so the HV_X64_MSR_EOI MSR is always written.
      This is not ideal with respect to performance, but Hyper-V can still
      correctly handle this according to the Hyper-V spec; nevertheless, Linux
      still must update the Hyper-V hypervisor with the correct VP assist page
      to prevent Hyper-V from writing to the stale page, which causes guest
      memory corruption and consequently may have caused the hangs and triple
      faults seen during non-boot CPUs resume.
      
      Fix the issue by calling hv_cpu_die()/hv_cpu_init() in the syscore ops.
      Without the fix, hibernation can fail at a rate of 1/300 ~ 1/500.
      With the fix, hibernation can pass a long-haul test of 2000 runs.
      
      In the case of nested virtualization, disabling/reenabling the assist
      page upon hibernation may be unsafe if there are active L2 guests.
      It looks like KVM should be enhanced to abort the hibernation request if
      there is any active L2 guest.
      
      Fixes: 05bd330a ("x86/hyperv: Suspend/resume the hypercall page for hibernation")
      Cc: stable@vger.kernel.org
      Signed-off-by: Dexuan Cui <decui@microsoft.com>
      Link: https://lore.kernel.org/r/1587437171-2472-1-git-send-email-decui@microsoft.com
      Signed-off-by: Wei Liu <wei.liu@kernel.org>
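The EOI behaviour described in the commit above can be modelled in plain C. This is a user-space sketch, not the kernel's hv_apic_eoi_write(): the struct, the counter standing in for the HV_X64_MSR_EOI write, and every `model_` name are hypothetical. It assumes bit 0 of apic_assist is the hypervisor's "no EOI required" hint, which the guest reads and clears before deciding whether to write the MSR.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the one field of the VP assist page used here. */
struct model_vp_assist_page {
    uint32_t apic_assist;   /* assumed: bit 0 set means "no EOI required" */
};

/* Counts simulated writes to HV_X64_MSR_EOI (a real guest would use wrmsr). */
static unsigned int model_msr_eoi_writes;

/*
 * Model of the EOI path: read and clear the assist word, and skip the
 * MSR write only when the hypervisor set the "no EOI required" bit.
 * A missing page, or a page the hypervisor no longer updates, always
 * falls through to the slower MSR write.
 */
static void model_eoi_write(struct model_vp_assist_page *hvp)
{
    if (hvp != NULL) {
        uint32_t old = hvp->apic_assist;

        hvp->apic_assist = 0;
        if (old & 0x1)
            return;
    }
    model_msr_eoi_writes++;
}
```

With a stale page the hint bit is never set in the memory the guest reads, so every call takes the MSR path. In this model that is only a performance cost; the memory corruption described in the commit comes from the hypervisor still writing to the old page, which the sketch does not attempt to reproduce.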
    • Drivers: hv: Move AEOI determination to architecture dependent code · 2ddddd0b
      Michael Kelley committed
      Hyper-V on ARM64 doesn't provide a flag for the AEOI recommendation
      in ms_hyperv.hints, so having the test in architecture independent
      code doesn't work. Resolve this by moving the check of the flag
      to an architecture dependent helper function. No functionality is
      changed.
      Signed-off-by: Michael Kelley <mikelley@microsoft.com>
      Link: https://lore.kernel.org/r/20200420164926.24471-1-mikelley@microsoft.com
      Signed-off-by: Wei Liu <wei.liu@kernel.org>
    • powerpc/setup_64: Set cache-line-size based on cache-block-size · 94c0b013
      Chris Packham committed
      If {i,d}-cache-block-size is set and {i,d}-cache-line-size is not, use
      the block-size value for both. Per the devicetree spec cache-line-size
      is only needed if it differs from the block size.
      
      Originally the code would fall back from block size to line size. An
      error message was printed if both properties were missing.
      
      Later the code was refactored to use clearer names and logic but it
      inadvertently made line size a required property, meaning on systems
      without a line size property we fall back to the default from the
      cputable.
      
      On powernv (OPAL) platforms, since the introduction of device tree CPU
      features (5a61ef74 ("powerpc/64s: Support new device tree binding
      for discovering CPU features")), that has led to the wrong value being
      used, as the fallback value is incorrect for Power8/Power9 CPUs.
      
      The incorrect values flow through to the VDSO and also to the sysconf
      values, SC_LEVEL1_ICACHE_LINESIZE etc.
      
      Fixes: bd067f83 ("powerpc/64: Fix naming of cache block vs. cache line")
      Cc: stable@vger.kernel.org # v4.11+
      Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
      Reported-by: Qian Cai <cai@lca.pw>
      [mpe: Add even more detail to change log]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20200416221908.7886-1-chris.packham@alliedtelesis.co.nz
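The fallback rule stated in the commit above (block size set, line size absent, so use the block size for both) can be sketched in C as follows. This is an illustration, not the code in setup_64.c: the structs and the function name are hypothetical, and the handling of a node that has only a line size (the block size stays at the cputable default) is an assumption the commit message does not spell out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical view of the size properties found on one cache node. */
struct model_cache_props {
    bool     has_block_size;   /* {i,d}-cache-block-size present */
    uint32_t block_size;
    bool     has_line_size;    /* {i,d}-cache-line-size present */
    uint32_t line_size;
};

struct model_cache_geometry {
    uint32_t block_size;
    uint32_t line_size;
};

/*
 * Resolve block and line size. Both start at the cputable default.
 * An explicit line size wins; otherwise the line size follows the
 * block size, since the property is only needed when the two differ.
 */
static struct model_cache_geometry
model_resolve_cache_geometry(const struct model_cache_props *p,
                             uint32_t cputable_default)
{
    struct model_cache_geometry g = { cputable_default, cputable_default };

    if (p->has_block_size)
        g.block_size = p->block_size;

    if (p->has_line_size)
        g.line_size = p->line_size;
    else if (p->has_block_size)
        g.line_size = p->block_size;

    return g;
}
```

With a block size of 128, no line size and a default of 64, both resolved values are 128. Dropping the `else if` branch reproduces the regression described above: the line size silently stays at the default of 64.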
    • bpf, x86: Fix encoding for lower 8-bit registers in BPF_STX BPF_B · aee194b1
      Luke Nelson committed
      This patch fixes an encoding bug in emit_stx for BPF_B when the source
      register is BPF_REG_FP.
      
      The current implementation for BPF_STX BPF_B in emit_stx saves one REX
      byte when the operands can be encoded using Mod-R/M alone. The lower 8
      bits of registers %rax, %rbx, %rcx, and %rdx can be accessed without using
      a REX prefix via %al, %bl, %cl, and %dl, respectively. Other registers,
      (e.g., %rsi, %rdi, %rbp, %rsp) require a REX prefix to use their 8-bit
      equivalents (%sil, %dil, %bpl, %spl).
      
      The current code checks if the source for BPF_STX BPF_B is BPF_REG_1
      or BPF_REG_2 (which map to %rdi and %rsi), in which case it emits the
      required REX prefix. However, it misses the case when the source is
      BPF_REG_FP (mapped to %rbp).
      
      The result is that BPF_STX BPF_B with BPF_REG_FP as the source operand
      will read from register %ch instead of the correct %bpl. This patch fixes
      the problem by fixing and refactoring the check on which registers need
      the extra REX byte. Since no BPF registers map to %rsp, there is no need
      to handle %spl.
      
      Fixes: 62258278 ("net: filter: x86: internal BPF JIT")
      Signed-off-by: Xi Wang <xi.wang@gmail.com>
      Signed-off-by: Luke Nelson <luke.r.nels@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20200418232655.23870-1-luke.r.nels@gmail.com
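The register rule behind the commit above can be checked with a small C model. It is a sketch under stated assumptions, not the JIT's emit_stx(): the enum and both helper names are hypothetical, and only the eight legacy registers are covered. It relies on the standard x86-64 encoding in which register numbers 4 to 7 select %ah, %ch, %dh and %bh without a REX prefix, and %spl, %bpl, %sil and %dil with one.

```c
#include <stdbool.h>

/* Hardware encodings of the eight legacy general-purpose registers. */
enum model_x86_reg {
    MODEL_RAX = 0, MODEL_RCX = 1, MODEL_RDX = 2, MODEL_RBX = 3,
    MODEL_RSP = 4, MODEL_RBP = 5, MODEL_RSI = 6, MODEL_RDI = 7
};

/*
 * Hypothetical helper: true when the low byte of the register can only
 * be addressed with a REX prefix (encodings 4..7).
 */
static bool model_byte_reg_needs_rex(enum model_x86_reg reg)
{
    return reg >= MODEL_RSP && reg <= MODEL_RDI;
}

/* Name of the 8-bit register that a given encoding actually selects. */
static const char *model_byte_reg_name(enum model_x86_reg reg, bool rex)
{
    static const char *const without_rex[8] = {
        "al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"
    };
    static const char *const with_rex[8] = {
        "al", "cl", "dl", "bl", "spl", "bpl", "sil", "dil"
    };

    return rex ? with_rex[reg] : without_rex[reg];
}
```

Calling model_byte_reg_name(MODEL_RBP, false) yields "ch", which is the wrong register the commit describes, while passing the result of model_byte_reg_needs_rex(MODEL_RBP) as the second argument yields "bpl". A range check like this covers %rbp in the same test as %rsi and %rdi, so no register in the group can be missed individually.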
    • KVM: PPC: Book3S HV: Handle non-present PTEs in page fault functions · ae49deda
      Paul Mackerras committed
      Since cd758a9b "KVM: PPC: Book3S HV: Use __gfn_to_pfn_memslot in HPT
      page fault handler", it's been possible in fairly rare circumstances to
      load a non-present PTE in kvmppc_book3s_hv_page_fault() when running a
      guest on a POWER8 host.
      
      Because that case wasn't checked for, we could misinterpret the non-present
      PTE as being a cache-inhibited PTE.  That could mismatch with the
      corresponding hash PTE, which would cause the function to fail with -EFAULT
      a little further down.  That would propagate up to the KVM_RUN ioctl()
      generally causing the KVM userspace (usually qemu) to fall over.
      
      This addresses the problem by catching that case and returning to the guest
      instead.
      
      For completeness, this fixes the radix page fault handler in the same
      way.  For radix this didn't cause any obvious misbehaviour, because we
      ended up putting the non-present PTE into the guest's partition-scoped
      page tables, leading immediately to another hypervisor data/instruction
      storage interrupt, which would go through the page fault path again
      and fix things up.
      
      Fixes: cd758a9b "KVM: PPC: Book3S HV: Use __gfn_to_pfn_memslot in HPT page fault handler"
      Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1820402
      Reported-by: David Gibson <david@gibson.dropbear.id.au>
      Tested-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
    • kvm: Disable objtool frame pointer checking for vmenter.S · 7f4b5cde
      Josh Poimboeuf committed
      Frame pointers are completely broken by vmenter.S because it clobbers
      RBP:
      
        arch/x86/kvm/svm/vmenter.o: warning: objtool: __svm_vcpu_run()+0xe4: BP used as a scratch register
      
      That's unavoidable, so just skip checking that file when frame pointers
      are configured in.
      
      On the other hand, ORC can handle that code just fine, so leave objtool
      enabled in the !FRAME_POINTER case.
      Reported-by: Randy Dunlap <rdunlap@infradead.org>
      Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Message-Id: <01fae42917bacad18be8d2cbc771353da6603473.1587398610.git.jpoimboe@redhat.com>
      Tested-by: Randy Dunlap <rdunlap@infradead.org> # build-tested
      Fixes: 199cd1d7 ("KVM: SVM: Split svm_vcpu_run inline assembly to separate file")
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  6. 20 Apr, 2020 1 commit
    • KVM: s390: Fix PV check in deliverable_irqs() · d47c4c45
      Eric Farman committed
      The diag 0x44 handler, which handles a directed yield, goes into a
      codepath that does a kvm_for_each_vcpu() and ultimately
      deliverable_irqs().  The new check for kvm_s390_pv_cpu_is_protected()
      contains an assertion that the vcpu->mutex is held, which isn't going
      to be the case in this scenario.
      
      The result is a plethora of these messages if the lock debugging
      is enabled, and thus an implication that we have a problem.
      
        WARNING: CPU: 9 PID: 16167 at arch/s390/kvm/kvm-s390.h:239 deliverable_irqs+0x1c6/0x1d0 [kvm]
        ...snip...
        Call Trace:
         [<000003ff80429bf2>] deliverable_irqs+0x1ca/0x1d0 [kvm]
        ([<000003ff80429b34>] deliverable_irqs+0x10c/0x1d0 [kvm])
         [<000003ff8042ba82>] kvm_s390_vcpu_has_irq+0x2a/0xa8 [kvm]
         [<000003ff804101e2>] kvm_arch_dy_runnable+0x22/0x38 [kvm]
         [<000003ff80410284>] kvm_vcpu_on_spin+0x8c/0x1d0 [kvm]
         [<000003ff80436888>] kvm_s390_handle_diag+0x3b0/0x768 [kvm]
         [<000003ff80425af4>] kvm_handle_sie_intercept+0x1cc/0xcd0 [kvm]
         [<000003ff80422bb0>] __vcpu_run+0x7b8/0xfd0 [kvm]
         [<000003ff80423de6>] kvm_arch_vcpu_ioctl_run+0xee/0x3e0 [kvm]
         [<000003ff8040ccd8>] kvm_vcpu_ioctl+0x2c8/0x8d0 [kvm]
         [<00000001504ced06>] ksys_ioctl+0xae/0xe8
         [<00000001504cedaa>] __s390x_sys_ioctl+0x2a/0x38
         [<0000000150cb9034>] system_call+0xd8/0x2d8
        2 locks held by CPU 2/KVM/16167:
         #0: 00000001951980c0 (&vcpu->mutex){+.+.}, at: kvm_vcpu_ioctl+0x90/0x8d0 [kvm]
         #1: 000000019599c0f0 (&kvm->srcu){....}, at: __vcpu_run+0x4bc/0xfd0 [kvm]
        Last Breaking-Event-Address:
         [<000003ff80429b34>] deliverable_irqs+0x10c/0x1d0 [kvm]
        irq event stamp: 11967
        hardirqs last  enabled at (11975): [<00000001502992f2>] console_unlock+0x4ca/0x650
        hardirqs last disabled at (11982): [<0000000150298ee8>] console_unlock+0xc0/0x650
        softirqs last  enabled at (7940): [<0000000150cba6ca>] __do_softirq+0x422/0x4d8
        softirqs last disabled at (7929): [<00000001501cd688>] do_softirq_own_stack+0x70/0x80
      
      Considering what's being done here, let's fix this by removing the
      mutex assertion rather than acquiring the mutex for every other vcpu.
      
      Fixes: 201ae986 ("KVM: s390: protvirt: Implement interrupt injection")
      Signed-off-by: Eric Farman <farman@linux.ibm.com>
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Reviewed-by: Cornelia Huck <cohuck@redhat.com>
      Link: https://lore.kernel.org/r/20200415190353.63625-1-farman@linux.ibm.com
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
  7. 18 Apr, 2020 3 commits
  8. 17 Apr, 2020 4 commits