1. 28 Jun 2020, 1 commit
  2. 26 Jun 2020, 7 commits
  3. 25 Jun 2020, 4 commits
  4. 24 Jun 2020, 6 commits
  5. 23 Jun 2020, 18 commits
    • arm64: Depend on newer binutils when building PAC · 4dc9b282
      Committed by Mark Brown
      Versions of binutils prior to 2.33.1 don't understand the ELF notes that
      are added by modern compilers to indicate the PAC and BTI options used
      to build the code. This causes them to emit large numbers of warnings in
      the form:
      
      aarch64-linux-gnu-nm: warning: .tmp_vmlinux.kallsyms2: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0000000
      
      during the kernel build, which is currently causing quite a bit of
      disruption for automated build testing with clang.
      
      In commit 15cd0e67 (arm64: Kconfig: ptrauth: Add binutils version
      check to fix mismatch) we added a dependency on binutils to avoid this
      issue when building with versions of GCC that emit the notes, but did
      not do so for clang, as it was believed that the existing check for
      .cfi_negate_ra_state already required a new enough binutils. This
      does not appear to be the case for some versions of binutils (e.g. the
      binutils in Debian 10), so instead refactor the check so that a new
      enough GNU binutils is required in all cases other than when using an
      old GCC version that does not emit the notes.
      
      Other, more exotic combinations of tools, such as clang, lld and gas
      together, are possible and may have further problems, but rather than
      adding further version checks it looks like the most robust approach
      will be to simply test that we can build cleanly with the configured
      tools. That will require more review and discussion, so do this for
      now to address the immediate problem disrupting build testing.
      Reported-by: KernelCI <bot@kernelci.org>
      Reported-by: Nick Desaulniers <ndesaulniers@google.com>
      Signed-off-by: Mark Brown <broonie@kernel.org>
      Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
      Link: https://github.com/ClangBuiltLinux/linux/issues/1054
      Link: https://lore.kernel.org/r/20200619123550.48098-1-broonie@kernel.org
      Signed-off-by: Will Deacon <will@kernel.org>
      4dc9b282
    • arm64: compat: Remove 32-bit sigreturn code from the vDSO · 2d071968
      Committed by Will Deacon
      The sigreturn code in the compat vDSO is unused. Remove it.
      Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
      Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
      Reviewed-by: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: Will Deacon <will@kernel.org>
      2d071968
    • arm64: compat: Always use sigpage for sigreturn trampoline · 8e411be6
      Committed by Will Deacon
      The 32-bit sigreturn trampoline in the compat sigpage matches the binary
      representation of the arch/arm/ sigpage exactly. This is important for
      debuggers (e.g. GDB) and unwinders (e.g. libunwind) since they rely
      on matching the instruction sequence in order to identify that they are
      unwinding through a signal. The same cannot be said for the sigreturn
      trampoline in the compat vDSO, which defeats the unwinder heuristics
      and instead relies on unwind directives for the unwinding. This is in
      contrast to arch/arm/, which never uses the vDSO for sigreturn.
      
      Ensure compatibility with arch/arm/ and existing unwinders by always
      using the sigpage for the sigreturn trampoline, regardless of the
      presence of the compat vDSO (a sketch of the matching heuristic
      follows this entry).
      Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
      Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
      Reviewed-by: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: Will Deacon <will@kernel.org>
      8e411be6
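
      As a userspace-runnable sketch of the heuristic mentioned above (not
      kernel code): an unwinder can recognize a signal frame by comparing
      the instruction words at the return address against the fixed
      trampoline encoding. The two A32 encodings below are illustrative
      values, not quoted from arch/arm/.

        #include <stdbool.h>
        #include <stdint.h>

        /* Hypothetical matcher: "mov r7, #<sigreturn-nr>; svc #0". The
         * syscall number 119 is illustrative; real unwinders compare
         * against the exact words arch/arm/ places in the sigpage. */
        static bool looks_like_arm_sigreturn(const uint32_t *pc)
        {
                return pc[0] == 0xe3a07077 &&   /* mov r7, #119 */
                       pc[1] == 0xef000000;     /* svc #0 */
        }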
    • arm64: compat: Allow 32-bit vdso and sigpage to co-exist · a39060b0
      Committed by Will Deacon
      In preparation for removing the signal trampoline from the compat vDSO,
      allow the sigpage and the compat vDSO to co-exist.
      
      For the moment, the vDSO signal trampoline will still be used when
      built; subsequent patches will switch to using the sigpage
      consistently.
      Acked-by: Dave Martin <Dave.Martin@arm.com>
      Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
      Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
      Reviewed-by: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: Will Deacon <will@kernel.org>
      a39060b0
    • arm64: vdso: Disable dwarf unwinding through the sigreturn trampoline · 87676cfc
      Committed by Will Deacon
      Commit 7e9f5e66 ("arm64: vdso: Add --eh-frame-hdr to ldflags") results
      in a .eh_frame_hdr section for the vDSO, which in turn causes the libgcc
      unwinder to unwind out of signal handlers using the .eh_frame information
      populated by our .cfi directives. In conjunction with a4eb355a
      ("arm64: vdso: Fix CFI directives in sigreturn trampoline"), this has
      been shown to cause segmentation faults originating from within the
      unwinder during thread cancellation:
      
       | Thread 14 "virtio-net-rx" received signal SIGSEGV, Segmentation fault.
       | 0x0000000000435e24 in uw_frame_state_for ()
       | (gdb) bt
       | #0  0x0000000000435e24 in uw_frame_state_for ()
       | #1  0x0000000000436e88 in _Unwind_ForcedUnwind_Phase2 ()
       | #2  0x00000000004374d8 in _Unwind_ForcedUnwind ()
       | #3  0x0000000000428400 in __pthread_unwind (buf=<optimized out>) at unwind.c:121
       | #4  0x0000000000429808 in __do_cancel () at ./pthreadP.h:304
       | #5  sigcancel_handler (sig=32, si=0xffff33c743f0, ctx=<optimized out>) at nptl-init.c:200
       | #6  sigcancel_handler (sig=<optimized out>, si=0xffff33c743f0, ctx=<optimized out>) at nptl-init.c:165
       | #7  <signal handler called>
       | #8  futex_wait_cancelable (private=0, expected=0, futex_word=0x3890b708) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
      
      After considerable bashing of heads, it appears that our CFI directives
      for unwinding out of the sigreturn trampoline are only processed by libgcc
      when both a .eh_frame_hdr section is present *and* the mysterious NOP is
      covered by an entry in .eh_frame. With both of these now in place, it has
      highlighted that our CFI directives are not comprehensive enough to
      restore the stack pointer of the interrupted context. This results in libgcc
      falling back to an arm64-specific unwinder after computing a bogus PC value
      from the unwind tables. The unwinder promptly dereferences this bogus address
      in an attempt to see if the pointed-to instruction sequence looks like
      the sigreturn trampoline.
      
      Restore the old unwind behaviour, which relied solely on heuristics in
      the unwinder, by removing the .eh_frame_hdr section from the vDSO and
      commenting out the insufficient CFI directives for now. Add comments to
      explain the current, miserable state of affairs.
      
      Cc: Tamas Zsoldos <tamas.zsoldos@arm.com>
      Cc: Szabolcs Nagy <szabolcs.nagy@arm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Daniel Kiss <daniel.kiss@arm.com>
      Acked-by: Dave Martin <Dave.Martin@arm.com>
      Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
      Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
      Reported-by: Ard Biesheuvel <ardb@kernel.org>
      Signed-off-by: Will Deacon <will@kernel.org>
      87676cfc
    • s390/debug: avoid kernel warning on too large number of pages · 827c4913
      Committed by Christian Borntraeger
      When insanely large debug buffers are specified, a kernel warning is
      printed. The debug code does handle the error gracefully, though.
      Instead of duplicating the check, silence the warning to avoid
      crashes when panic_on_warn is used (a sketch of the pattern follows
      this entry).
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      827c4913
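
      A minimal sketch of the pattern referenced above, assuming (not
      quoting) the allocation site in s390's debug code: __GFP_NOWARN makes
      an oversized request fail quietly with NULL, so the existing graceful
      error path is taken even when panic_on_warn is set.

        #include <linux/slab.h>

        /* Sketch only: function name and allocation site are assumptions,
         * not quoted from the patch. */
        static void *debug_alloc_areas(int nr_areas)
        {
                /* no warning splat on an insanely large nr_areas */
                return kmalloc_array(nr_areas, sizeof(void *),
                                     GFP_KERNEL | __GFP_NOWARN);
        }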
    • s390/kasan: fix early pgm check handler execution · 998f5bbe
      Committed by Vasily Gorbik
      Currently, if early_pgm_check_handler is called, it ends up in a
      program check loop. The problem is that early_pgm_check_handler is
      instrumented by KASAN but executed without the DAT flag enabled,
      which leads to an addressing exception when KASAN checks try to
      access shadow memory.
      
      Fix that by executing early handlers with the DAT flag on under
      KASAN, as expected.
      Reported-and-tested-by: Alexander Egorenkov <egorenar@linux.ibm.com>
      Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      998f5bbe
    • s390: fix system call single stepping · e64a1618
      Committed by Sven Schnelle
      When single-stepping an svc instruction on s390, the kernel is
      entered with a PER program check interruption. The program check
      handler then jumps to the system call handler by reloading the PSW.
      The code didn't set GPR13 to the thread pointer in struct
      task_struct, which made the kernel access invalid memory while trying
      to fetch the syscall function address. Fix this by always assigning
      GPR13 after .Lsysc_per.
      
      Fixes: 0b0ed657 ("s390: remove critical section cleanup from entry.S")
      Reported-and-tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
      Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      e64a1618
    • KVM: VMX: Remove vcpu_vmx's defunct copy of host_pkru · e4553b49
      Committed by Sean Christopherson
      Remove vcpu_vmx.host_pkru, which got left behind when PKRU support was
      moved to common x86 code.
      
      No functional change intended.
      
      Fixes: 37486135 ("KVM: x86: Fix pkru save/restore when guest CR4.PKE=0, move it to x86.c")
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20200617034123.25647-1-sean.j.christopherson@intel.com>
      Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Reviewed-by: Jim Mattson <jmattson@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      e4553b49
    • KVM: x86: allow TSC to differ by NTP correction bounds without TSC scaling · 26769f96
      Committed by Marcelo Tosatti
      The Linux TSC calibration procedure is subject to small variations
      (it's common to see a +-1 kHz difference between reboots on a given
      CPU, for example).
      
      So migrating a guest between two hosts with identical processors can
      fail if there is a small variation in the calibrated TSC between them.
      
      Without TSC scaling, the current kernel interface will either return
      an error (if user_tsc_khz <= tsc_khz) or enable TSC catchup mode.
      
      This change makes the following TSC tolerance check accept
      KVM_SET_TSC_KHZ within tsc_tolerance_ppm (which is 250 ppm by
      default).
      
              /*
               * Compute the variation in TSC rate which is acceptable
               * within the range of tolerance and decide if the
               * rate being applied is within that bounds of the hardware
               * rate.  If so, no scaling or compensation need be done.
               */
              thresh_lo = adjust_tsc_khz(tsc_khz, -tsc_tolerance_ppm);
              thresh_hi = adjust_tsc_khz(tsc_khz, tsc_tolerance_ppm);
              if (user_tsc_khz < thresh_lo || user_tsc_khz > thresh_hi) {
                      pr_debug("kvm: requested TSC rate %u falls outside tolerance [%u,%u]\n", user_tsc_khz, thresh_lo, thresh_hi);
                      use_scaling = 1;
              }
      
      The NTP daemon in the guest can correct this difference (NTP can
      correct up to 500 ppm); a runnable sketch of the threshold
      computation follows this entry.
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
      
      Message-Id: <20200616114741.GA298183@fuller.cnet>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      26769f96
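
      To make the tolerance concrete, here is a hedged, userspace-runnable
      sketch of the computation quoted above; adjust_tsc_khz() mirrors the
      arithmetic of the kernel helper of the same name, and the example
      frequency is arbitrary.

        #include <stdint.h>
        #include <stdio.h>

        /* scale a frequency by +/- ppm parts per million */
        static uint32_t adjust_tsc_khz(uint32_t khz, int32_t ppm)
        {
                return (uint32_t)(((uint64_t)khz * (1000000 + ppm)) / 1000000);
        }

        int main(void)
        {
                uint32_t tsc_khz = 2893300;      /* host's calibrated rate */
                uint32_t tolerance_ppm = 250;    /* tsc_tolerance_ppm default */
                uint32_t lo = adjust_tsc_khz(tsc_khz, -(int32_t)tolerance_ppm);
                uint32_t hi = adjust_tsc_khz(tsc_khz, tolerance_ppm);

                /* a 1 kHz calibration delta falls well inside this window */
                printf("accepted range: [%u, %u] kHz\n", lo, hi);
                return 0;
        }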
    • KVM: X86: Fix MSR range of APIC registers in X2APIC mode · bf10bd0b
      Committed by Xiaoyao Li
      Only the MSR address range 0x800 through 0x8ff is architecturally
      reserved and dedicated to accessing APIC registers in x2APIC mode (a
      sketch of the range check follows this entry).
      
      Fixes: 0105d1a5 ("KVM: x2apic interface to lapic")
      Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
      Message-Id: <20200616073307.16440-1-xiaoyao.li@intel.com>
      Cc: stable@vger.kernel.org
      Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Reviewed-by: Jim Mattson <jmattson@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      bf10bd0b
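
      A sketch of the architectural rule stated above: 0x800-0x8ff is the
      only x2APIC register window. That the original bug treated a wider
      window as APIC registers is my reading of a one-line range fix, so
      treat anything beyond the check itself as an assumption.

        #include <stdbool.h>
        #include <stdint.h>

        #define APIC_BASE_MSR 0x800u

        /* true only for the architecturally reserved x2APIC MSR range */
        static bool is_x2apic_msr(uint32_t msr)
        {
                return msr >= APIC_BASE_MSR && msr <= APIC_BASE_MSR + 0xff;
        }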
    • ARM: dts: imx6ul-kontron: Change WDOG_ANY signal from push-pull to open-drain · d22a16cc
      Committed by Frieder Schrempf
      The WDOG_ANY signal is connected to the RESET_IN signal of the SoM
      and baseboard. It is currently configured as push-pull, which means
      that if some external device like a programmer wants to assert the
      RESET_IN signal by pulling it to ground, it drives against the
      high-level WDOG_ANY output of the SoC.
      
      To fix this we set the WDOG_ANY signal to open-drain configuration.
      That way we make sure that the RESET_IN can be asserted by the
      watchdog as well as by external devices.
      
      Fixes: 1ea4b76c ("ARM: dts: imx6ul-kontron-n6310: Add Kontron i.MX6UL N6310 SoM and boards")
      Cc: stable@vger.kernel.org
      Signed-off-by: Frieder Schrempf <frieder.schrempf@kontron.de>
      Signed-off-by: Shawn Guo <shawnguo@kernel.org>
      d22a16cc
    • ARM: dts: imx6ul-kontron: Move watchdog from Kontron i.MX6UL/ULL board to SoM · 04a2c051
      Committed by Frieder Schrempf
      The watchdog's WDOG_ANY signal is used to trigger a POR of the SoC,
      if a soft reset is issued. As the SoM hardware connects the WDOG_ANY
      and the POR signals, the watchdog node itself and the pin
      configuration should be part of the common SoM devicetree.
      Let's move it from the baseboard's devicetree to its proper place.
      
      Fixes: 1ea4b76c ("ARM: dts: imx6ul-kontron-n6310: Add Kontron i.MX6UL N6310 SoM and boards")
      Cc: stable@vger.kernel.org
      Signed-off-by: Frieder Schrempf <frieder.schrempf@kontron.de>
      Signed-off-by: Shawn Guo <shawnguo@kernel.org>
      04a2c051
    • KVM: VMX: Stop context switching MSR_IA32_UMWAIT_CONTROL · bf09fb6c
      Committed by Sean Christopherson
      Remove support for context switching between the guest's and host's
      desired UMWAIT_CONTROL.  Propagating the guest's value to hardware isn't
      required for correct functionality, e.g. KVM intercepts reads and writes
      to the MSR, and the latency effects of the settings controlled by the
      MSR are not architecturally visible.
      
      As a general rule, KVM should not allow the guest to control power
      management settings unless explicitly enabled by userspace, e.g. see
      KVM_CAP_X86_DISABLE_EXITS.  E.g. Intel's SDM explicitly states that C0.2
      can improve the performance of SMT siblings.  A devious guest could
      disable C0.2 so as to improve the performance of their workloads at the
      detriment to workloads running in the host or on other VMs.
      
      Wholesale removal of UMWAIT_CONTROL context switching also fixes a
      race condition where updates from the host may cause KVM to enter the
      guest with the incorrect value.  Because updates are propagated to
      all CPUs via IPI (SMP function callback), the value in hardware may
      be stale with respect to the cached value and KVM could enter the
      guest with the wrong value in hardware.  As above, the guest can't
      observe the bad value, but it's a weird and confusing wart in the
      implementation.
      
      Removal also fixes the unnecessary usage of VMX's atomic load/store MSR
      lists.  Using the lists is only necessary for MSRs that are required for
      correct functionality immediately upon VM-Enter/VM-Exit, e.g. EFER on
      old hardware, or for MSRs that need to-the-uop precision, e.g. perf
      related MSRs.  For UMWAIT_CONTROL, the effects are only visible in
      the kernel via TPAUSE/delay(), and KVM doesn't do any form of delay
      in vmx_vcpu_run().  Using the atomic lists is undesirable as they are
      more expensive than direct RDMSR/WRMSR.
      
      Furthermore, even if giving the guest control of the MSR is legitimate,
      e.g. in pass-through scenarios, it's not clear that the benefits would
      outweigh the overhead.  E.g. saving and restoring an MSR across a VMX
      roundtrip costs ~250 cycles, and if the guest diverged from the host
      that cost would be paid on every run of the guest.  In other words, if
      there is a legitimate use case then it should be enabled by a new
      per-VM capability.
      
      Note, KVM still needs to emulate MSR_IA32_UMWAIT_CONTROL so that it can
      correctly expose other WAITPKG features to the guest, e.g. TPAUSE,
      UMWAIT and UMONITOR.
      
      Fixes: 6e3ba4ab ("KVM: vmx: Emulate MSR IA32_UMWAIT_CONTROL")
      Cc: stable@vger.kernel.org
      Cc: Jingqi Liu <jingqi.liu@intel.com>
      Cc: Tao Xu <tao3.xu@intel.com>
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20200623005135.10414-1-sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      bf09fb6c
    • KVM: nVMX: Plumb L2 GPA through to PML emulation · 2dbebf7a
      Committed by Sean Christopherson
      Explicitly pass the L2 GPA to kvm_arch_write_log_dirty(), which for all
      intents and purposes is vmx_write_pml_buffer(), instead of having the
      latter pull the GPA from vmcs.GUEST_PHYSICAL_ADDRESS.  If the dirty
      bit update is the result of KVM emulation (rare for L2), then the GPA
      in the VMCS may be stale and/or may hold a completely unrelated GPA
      (a sketch of the plumbing follows this entry).
      
      Fixes: c5f983f6 ("nVMX: Implement emulated Page Modification Logging")
      Cc: stable@vger.kernel.org
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20200622215832.22090-2-sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      2dbebf7a
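
      A sketch of the interface change, with assumed (not quoted) kernel
      signatures: the L2 GPA becomes an explicit parameter instead of being
      re-read from vmcs.GUEST_PHYSICAL_ADDRESS inside the PML write-out.

        #include <stdint.h>

        typedef uint64_t gpa_t;
        struct kvm_vcpu;                /* opaque in this sketch */

        /* after: the caller passes the GPA it is actually operating on */
        int kvm_arch_write_log_dirty(struct kvm_vcpu *vcpu, gpa_t l2_gpa);

        static int update_dirty_bit(struct kvm_vcpu *vcpu, gpa_t addr)
        {
                /* the page-table walker already holds the right L2 GPA */
                return kvm_arch_write_log_dirty(vcpu, addr);
        }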
    • KVM: x86/mmu: Avoid mixing gpa_t with gfn_t in walk_addr_generic() · 312d16c7
      Committed by Vitaly Kuznetsov
      translate_gpa() returns a GPA, so assigning it to 'real_gfn' looks
      obviously wrong. There is no real issue because both 'gpa_t' and
      'gfn_t' are u64 and we don't use the value in 'real_gfn' as a GFN; we
      do
      
       real_gfn = gpa_to_gfn(real_gfn);
      
      instead. 'If you see a "buffalo" sign on an elephant's cage, do not
      trust your eyes', but let's fix it for good (a sketch follows this
      entry).
      
      No functional change intended.
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20200622151435.752560-1-vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      312d16c7
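
      A self-contained sketch of the type fix described above. gpa_t and
      gfn_t are both u64 in the kernel, so the compiler never complained;
      naming and typing the intermediate value as a GPA is the whole point.

        #include <stdint.h>

        typedef uint64_t gpa_t;
        typedef uint64_t gfn_t;

        #define PAGE_SHIFT 12

        static inline gfn_t gpa_to_gfn(gpa_t gpa)
        {
                return gpa >> PAGE_SHIFT;
        }

        static gfn_t walk_step(gpa_t translated)
        {
                /* before: gfn_t real_gfn = translated;
                 *         real_gfn = gpa_to_gfn(real_gfn);
                 * after: keep the GPA in a gpa_t, convert exactly once */
                gpa_t real_gpa = translated;

                return gpa_to_gfn(real_gpa);
        }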
    • KVM: LAPIC: ensure APIC map is up to date on concurrent update requests · 44d52717
      Committed by Paolo Bonzini
      The following race can cause lost map update events:
      
               cpu1                            cpu2
      
                                      apic_map_dirty = true
        ------------------------------------------------------------
                                      kvm_recalculate_apic_map:
                                           pass check
                                               mutex_lock(&kvm->arch.apic_map_lock);
                                               if (!kvm->arch.apic_map_dirty)
                                           and in process of updating map
        -------------------------------------------------------------
          other calls to
             apic_map_dirty = true         might be too late for affected cpu
        -------------------------------------------------------------
                                           apic_map_dirty = false
        -------------------------------------------------------------
          kvm_recalculate_apic_map:
          bail out on
            if (!kvm->arch.apic_map_dirty)
      
      To fix it, record the beginning of an update of the APIC map in
      apic_map_dirty.  If another APIC map change switches apic_map_dirty
      back to DIRTY during the update, kvm_recalculate_apic_map should not
      make it CLEAN, and the other caller will go through the slow path (a
      sketch of the three-state logic follows this entry).
      Reported-by: Igor Mammedov <imammedo@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      44d52717
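
      A sketch of the three-state protocol described above, written with
      userspace C11 atomics rather than kernel atomics; the names mirror
      the idea, not the kernel source. An updater moves DIRTY to
      UPDATE_IN_PROGRESS and only publishes CLEAN if no new dirtying
      happened during the rebuild.

        #include <stdatomic.h>

        enum { CLEAN, UPDATE_IN_PROGRESS, DIRTY };

        static _Atomic int apic_map_dirty = CLEAN;

        static void mark_map_dirty(void)
        {
                atomic_store(&apic_map_dirty, DIRTY);
        }

        static void recalculate_apic_map(void)
        {
                int expected = DIRTY;

                if (!atomic_compare_exchange_strong(&apic_map_dirty,
                                &expected, UPDATE_IN_PROGRESS))
                        return;  /* clean, or someone else is updating */

                /* ... rebuild the map here ... */

                /* move to CLEAN only if nobody dirtied it meanwhile */
                expected = UPDATE_IN_PROGRESS;
                atomic_compare_exchange_strong(&apic_map_dirty,
                                &expected, CLEAN);
        }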
    • kvm: lapic: fix broken vcpu hotplug · af28dfac
      Committed by Igor Mammedov
      Guest fails to online hotplugged CPU with error
        smpboot: do_boot_cpu failed(-1) to wakeup CPU#4
      
      It's caused by the fact that kvm_apic_set_state(), which used to call
      recalculate_apic_map() unconditionally and thereby pulled the
      hotplugged CPU into the APIC map, now updates the map conditionally
      on state changes.  In this case the APIC map is not considered dirty
      and thus is not updated.
      
      Fix the issue by forcing an unconditional update from
      kvm_apic_set_state(), like it used to be (a sketch follows this
      entry).
      
      Fixes: 4abaffce ("KVM: LAPIC: Recalculate apic map in batch")
      Cc: stable@vger.kernel.org
      Signed-off-by: Igor Mammedov <imammedo@redhat.com>
      Message-Id: <20200622160830.426022-1-imammedo@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      af28dfac
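
      A sketch of the fix's shape, reusing the three-state helpers from the
      previous sketch (again assumed names, not kernel source): restoring
      APIC state always dirties the map before recalculating, so a
      hotplugged CPU is pulled in unconditionally.

        /* sketch: called when userspace restores a vCPU's APIC state */
        static void apic_set_state(void)
        {
                /* ... restore the APIC registers for the vCPU ... */

                mark_map_dirty();        /* force the update path */
                recalculate_apic_map();
        }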
  6. 22 Jun 2020, 2 commits
  7. 20 Jun 2020, 2 commits
    • powerpc/8xx: Provide ptep_get() with 16k pages · c0e1c8c2
      Committed by Christophe Leroy
      READ_ONCE() now enforces an atomic read, which leads to:
      
        CC      mm/gup.o
      In file included from ./include/linux/kernel.h:11:0,
                       from mm/gup.c:2:
      In function 'gup_hugepte.constprop',
          inlined from 'gup_huge_pd.isra.79' at mm/gup.c:2465:8:
      ./include/linux/compiler.h:392:38: error: call to '__compiletime_assert_222' declared with attribute error: Unsupported access size for {READ,WRITE}_ONCE().
        _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
                                            ^
      ./include/linux/compiler.h:373:4: note: in definition of macro '__compiletime_assert'
          prefix ## suffix();    \
          ^
      ./include/linux/compiler.h:392:2: note: in expansion of macro '_compiletime_assert'
        _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
        ^
      ./include/linux/compiler.h:405:2: note: in expansion of macro 'compiletime_assert'
        compiletime_assert(__native_word(t) || sizeof(t) == sizeof(long long), \
        ^
      ./include/linux/compiler.h:291:2: note: in expansion of macro 'compiletime_assert_rwonce_type'
        compiletime_assert_rwonce_type(x);    \
        ^
      mm/gup.c:2428:8: note: in expansion of macro 'READ_ONCE'
        pte = READ_ONCE(*ptep);
              ^
      In function 'gup_get_pte',
          inlined from 'gup_pte_range' at mm/gup.c:2228:9,
          inlined from 'gup_pmd_range' at mm/gup.c:2613:15,
          inlined from 'gup_pud_range' at mm/gup.c:2641:15,
          inlined from 'gup_p4d_range' at mm/gup.c:2666:15,
          inlined from 'gup_pgd_range' at mm/gup.c:2694:15,
          inlined from 'internal_get_user_pages_fast' at mm/gup.c:2795:3:
      ./include/linux/compiler.h:392:38: error: call to '__compiletime_assert_219' declared with attribute error: Unsupported access size for {READ,WRITE}_ONCE().
        _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
                                            ^
      ./include/linux/compiler.h:373:4: note: in definition of macro '__compiletime_assert'
          prefix ## suffix();    \
          ^
      ./include/linux/compiler.h:392:2: note: in expansion of macro '_compiletime_assert'
        _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
        ^
      ./include/linux/compiler.h:405:2: note: in expansion of macro 'compiletime_assert'
        compiletime_assert(__native_word(t) || sizeof(t) == sizeof(long long), \
        ^
      ./include/linux/compiler.h:291:2: note: in expansion of macro 'compiletime_assert_rwonce_type'
        compiletime_assert_rwonce_type(x);    \
        ^
      mm/gup.c:2199:9: note: in expansion of macro 'READ_ONCE'
        return READ_ONCE(*ptep);
               ^
      make[2]: *** [mm/gup.o] Error 1
      
      Define ptep_get() on 8xx when using 16k pages (a sketch of the helper
      follows this entry).
      
      Fixes: 9e343b46 ("READ_ONCE: Enforce atomicity for {READ,WRITE}_ONCE() memory accesses")
      Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
      Acked-by: Will Deacon <will@kernel.org>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/341688399c1b102756046d19ea6ce39db1ae4742.1592225558.git.christophe.leroy@csgroup.eu
      c0e1c8c2
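
      One plausible shape of the helper, assuming the 8xx 16k-page pte_t is
      a small array of sub-entries whose first (native-word) entry is the
      one that must be read atomically; this is a sketch, not the patch
      itself.

        #if defined(CONFIG_PPC_8xx) && defined(CONFIG_PPC_16K_PAGES)
        #define __HAVE_ARCH_PTEP_GET
        static inline pte_t ptep_get(pte_t *ptep)
        {
                /* READ_ONCE() on the first word only, which is of native
                 * size; the zeroed remainder is an assumption of this
                 * sketch about the 16k pte_t layout */
                pte_t pte = {READ_ONCE(ptep->pte), 0, 0, 0};

                return pte;
        }
        #endif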
    • x86/asm/64: Align start of __clear_user() loop to 16-bytes · bb5570ad
      Committed by Matt Fleming
      x86 CPUs can suffer severe performance drops if a tight loop, such as
      the ones in __clear_user(), straddles a 16-byte instruction fetch
      window, or worse, a 64-byte cacheline. This issue was discovered in
      the SUSE kernel with the following commit,
      
        11539337 ("x86/asm/64: Micro-optimize __clear_user() - Use immediate constants")
      
      which increased the code object size from 10 bytes to 15 bytes and
      caused the 8-byte copy loop in __clear_user() to be split across a
      64-byte cacheline.
      
      Aligning the start of the loop to 16 bytes makes it fit neatly inside
      a single instruction fetch window again and restores the performance
      of __clear_user(), which is used heavily when reading from /dev/zero
      (a sketch of the alignment trick follows this entry).
      
      Here are some numbers from running libmicro's read_z* and pread_z*
      microbenchmarks which read from /dev/zero:
      
        Zen 1 (Naples)
      
        libmicro-file
                                              5.7.0-rc6              5.7.0-rc6              5.7.0-rc6
                                                          revert-11539337+               align16+
        Time mean95-pread_z100k       9.9195 (   0.00%)      5.9856 (  39.66%)      5.9938 (  39.58%)
        Time mean95-pread_z10k        1.1378 (   0.00%)      0.7450 (  34.52%)      0.7467 (  34.38%)
        Time mean95-pread_z1k         0.2623 (   0.00%)      0.2251 (  14.18%)      0.2252 (  14.15%)
        Time mean95-pread_zw100k      9.9974 (   0.00%)      6.0648 (  39.34%)      6.0756 (  39.23%)
        Time mean95-read_z100k        9.8940 (   0.00%)      5.9885 (  39.47%)      5.9994 (  39.36%)
        Time mean95-read_z10k         1.1394 (   0.00%)      0.7483 (  34.33%)      0.7482 (  34.33%)
      
      Note that this doesn't affect Haswell or Broadwell microarchitectures
      which seem to avoid the alignment issue by executing the loop straight
      out of the Loop Stream Detector (verified using perf events).
      
      Fixes: 11539337 ("x86/asm/64: Micro-optimize __clear_user() - Use immediate constants")
      Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: <stable@vger.kernel.org> # v4.19+
      Link: https://lkml.kernel.org/r/20200618102002.30034-1-matt@codeblueprint.co.uk
      bb5570ad
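
      An illustrative, userspace-compilable (x86-64, GNU toolchain) sketch
      of the trick: .p2align 4 forces the loop head onto a 16-byte boundary
      so the whole loop body sits in one instruction fetch window. This is
      a generic zeroing loop, not the kernel's __clear_user().

        /* zero `words` 8-byte words starting at dst */
        static void zero_words(unsigned long *dst, unsigned long words)
        {
                if (!words)
                        return;

                asm volatile(
                        "       .p2align 4\n"   /* 2^4 = 16-byte alignment */
                        "1:     movq $0, (%[d])\n"
                        "       addq $8, %[d]\n"
                        "       decq %[n]\n"
                        "       jnz 1b\n"
                        : [d] "+r" (dst), [n] "+r" (words)
                        :
                        : "memory", "cc");
        }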