1. 16 Nov 2019 · 15 commits
  2. 07 Nov 2019 · 1 commit
  3. 06 Nov 2019 · 4 commits
  4. 05 Nov 2019 · 4 commits
    • x86/tsc: Respect tsc command line paraemeter for clocksource_tsc_early · 63ec58b4
      Michael Zhivich committed
      The introduction of clocksource_tsc_early broke the functionality of
      "tsc=reliable" and "tsc=nowatchdog" command line parameters, since
      clocksource_tsc_early is unconditionally registered with
      CLOCK_SOURCE_MUST_VERIFY and thus put on the watchdog list.
      
      This can cause the TSC to be declared unstable during boot:
      
        clocksource: timekeeping watchdog on CPU0: Marking clocksource
                     'tsc-early' as unstable because the skew is too large:
        clocksource: 'refined-jiffies' wd_now: fffb7018 wd_last: fffb6e9d
                     mask: ffffffff
        clocksource: 'tsc-early' cs_now: 68a6a7070f6a0 cs_last: 68a69ab6f74d6
                     mask: ffffffffffffffff
        tsc: Marking TSC unstable due to clocksource watchdog
      
      The corresponding elapsed times are cs_nsec=1224152026 and wd_nsec=378942392, so
      the watchdog differs from TSC by 0.84 seconds.
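      The arithmetic behind this log can be checked with a small C sketch.
      It is a hypothetical model, not kernel code: the model_* names are made
      up, the threshold assumes WATCHDOG_THRESHOLD was NSEC_PER_SEC >> 4
      (62.5 ms) at the time, and the cycles-to-nanoseconds conversion is
      skipped in favour of the cs_nsec and wd_nsec values quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: mirrors WATCHDOG_THRESHOLD (NSEC_PER_SEC >> 4, i.e. 62.5 ms). */
#define NSEC_PER_SEC 1000000000LL
#define MODEL_WATCHDOG_THRESHOLD_NS (NSEC_PER_SEC >> 4)

/* Elapsed counter ticks between two reads. Unsigned subtraction wraps
 * modulo 2^64 and the mask trims the result to the counter's width, so a
 * 32-bit counter that wrapped between reads still yields the right delta. */
static uint64_t model_counter_delta(uint64_t now, uint64_t last, uint64_t mask)
{
	assert(mask != 0);
	return (now - last) & mask;
}

/* True when clocksource and watchdog disagree by more than the threshold;
 * both intervals are already converted to nanoseconds. */
static bool model_skew_too_large(int64_t cs_nsec, int64_t wd_nsec)
{
	int64_t skew = cs_nsec - wd_nsec;

	if (skew < 0)
		skew = -skew;
	return skew > MODEL_WATCHDOG_THRESHOLD_NS;
}
```

      Fed the log's values, the model gives a refined-jiffies delta of 379
      ticks (about 379 ms if one tick is 1 ms, an assumption about HZ) and a
      skew of 845209634 ns, far above the assumed 62.5 ms threshold.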
      
      This happens when HPET is not available, so jiffies are used as the TSC
      watchdog instead, and the jiffies are not updated because timer
      interrupts are lost in periodic mode. That can happen, e.g., with
      expensive debug mechanisms enabled or under massive overload conditions
      in virtualized environments.
      
      Before the introduction of the early TSC clocksource the command line
      parameters "tsc=reliable" and "tsc=nowatchdog" could be used to work around
      this issue.
      
      Restore the behaviour by disabling the watchdog if requested on the kernel
      command line.
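      The restored behaviour boils down to clearing one flag before the early
      clocksource is registered. The C sketch below illustrates that idea
      under stated assumptions: the structs, the option parser and
      MODEL_MUST_VERIFY (a stand-in for CLOCK_SOURCE_MUST_VERIFY with a
      made-up value) are hypothetical and are not the code of this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical flag value standing in for CLOCK_SOURCE_MUST_VERIFY. */
#define MODEL_MUST_VERIFY 0x2u

struct model_clocksource {
	const char *name;
	unsigned int flags;
};

/* Hypothetical record of the "tsc=" options seen on the command line. */
struct model_tsc_opts {
	bool reliable;    /* tsc=reliable */
	bool nowatchdog;  /* tsc=nowatchdog */
};

static void model_parse_tsc_option(struct model_tsc_opts *o, const char *val)
{
	if (strcmp(val, "reliable") == 0)
		o->reliable = true;
	else if (strcmp(val, "nowatchdog") == 0)
		o->nowatchdog = true;
}

/* Run before registering the early clocksource: without MUST_VERIFY it is
 * never put on the watchdog list. */
static void model_prepare_tsc_early(struct model_clocksource *cs,
				    const struct model_tsc_opts *o)
{
	assert(cs != NULL && o != NULL);
	if (o->reliable || o->nowatchdog)
		cs->flags &= ~MODEL_MUST_VERIFY;
}

static bool model_needs_watchdog(const struct model_clocksource *cs)
{
	return (cs->flags & MODEL_MUST_VERIFY) != 0;
}
```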
      
      [ tglx: Clarify changelog ]
      
      Fixes: aa83c457 ("x86/tsc: Introduce early tsc clocksource")
      Signed-off-by: Michael Zhivich <mzhivich@akamai.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Link: https://lkml.kernel.org/r/20191024175945.14338-1-mzhivich@akamai.com
    • x86/dumpstack/64: Don't evaluate exception stacks before setup · e361362b
      Thomas Gleixner committed
      Cyrill reported the following crash:
      
        BUG: unable to handle page fault for address: 0000000000001ff0
        #PF: supervisor read access in kernel mode
        RIP: 0010:get_stack_info+0xb3/0x148
      
      It turns out that if the stack tracer is invoked before the exception stack
      mappings are initialized, in_exception_stack() can erroneously classify an
      invalid address as an address inside of an exception stack:
      
          begin = this_cpu_read(cea_exception_stacks);  <- 0
          end = begin + sizeof(exception stacks);
      
      i.e. any address between 0 and end will be considered an exception stack
      address and the subsequent code will then try to dereference the resulting
      stack frame at an unmapped address.
      
       end = begin + (unsigned long)ep->size;
           ==> end = 0x2000
      
       regs = (struct pt_regs *)end - 1;
           ==> regs = 0x2000 - sizeof(struct pt_regs) = 0x1f58

       info->next_sp   = (unsigned long *)regs->sp;
           ==> Crashes due to reading regs->sp at 0x1f58 + 0x98 = 0x1ff0
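      The fault address can be reproduced with a C model of the x86-64
      register frame. The sketch assumes the struct pt_regs field order of
      that era (21 eight-byte fields, with sp at index 19); the model_* names
      are hypothetical, and the arithmetic is done on integers so no pointer
      is ever formed from the bogus address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed x86-64 struct pt_regs field order of that era: 21 eight-byte
 * fields with sp at index 19. uint64_t keeps the size fixed on any host. */
struct model_pt_regs {
	uint64_t r15, r14, r13, r12, bp, bx, r11, r10, r9, r8;
	uint64_t ax, cx, dx, si, di, orig_ax, ip, cs, flags, sp, ss;
};

/* Integer version of "(struct pt_regs *)end - 1". */
static uint64_t model_regs_addr(uint64_t end)
{
	assert(end >= sizeof(struct model_pt_regs));
	return end - sizeof(struct model_pt_regs);
}

/* Address that "regs->sp" is loaded from. */
static uint64_t model_sp_load_addr(uint64_t end)
{
	return model_regs_addr(end) + offsetof(struct model_pt_regs, sp);
}
```

      With end = 0x2000, subtracting one 168-byte frame gives 0x1f58, and the
      0x98 offset of sp lands on 0x1ff0, the address reported in the oops.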
      
      Prevent this by checking the validity of the cea_exception_stack base
      address and bailing out if it is zero.
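      A minimal model of that guard, with hypothetical names: base stands in
      for the per-CPU cea_exception_stacks value and size for the whole
      exception stack area. The real code classifies several stacks
      individually; this sketch only shows why a zero base has to be
      rejected before any range check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Half-open range check [base, base + size). */
static bool model_in_range(uint64_t addr, uint64_t base, uint64_t size)
{
	assert(base + size >= base);  /* the area must not wrap around */
	return addr >= base && addr < base + size;
}

/* Before the fix: with base still 0, [0, size) is taken for a stack. */
static bool model_in_exception_stack_old(uint64_t addr, uint64_t base,
					 uint64_t size)
{
	return model_in_range(addr, base, size);
}

/* After the fix: a zero base means the mappings are not set up yet. */
static bool model_in_exception_stack_new(uint64_t addr, uint64_t base,
					 uint64_t size)
{
	if (base == 0)
		return false;
	return model_in_range(addr, base, size);
}
```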
      
      Fixes: afcd21da ("x86/dumpstack/64: Use cpu_entry_area instead of orig_ist")
      Reported-by: Cyrill Gorcunov <gorcunov@gmail.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: Cyrill Gorcunov <gorcunov@gmail.com>
      Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1910231950590.1852@nanos.tec.linutronix.de
    • x86/apic/32: Avoid bogus LDR warnings · fe6f85ca
      Jan Beulich committed
      The removal of the LDR initialization in the bigsmp_32 APIC code unearthed
      a problem in setup_local_APIC().
      
      The code checks unconditionally for a mismatch of the logical APIC id by
      comparing the early APIC id which was initialized in get_smp_config() with
      the actual LDR value in the APIC.
      
      Due to the removal of the bogus LDR initialization, the check can now
      trigger on bigsmp_32 APIC systems, emitting a warning for every booting
      CPU. This is of course a false positive because the APIC is not using
      logical destination mode.
      
      Restrict the check and the possibly resulting fixup to systems which are
      actually using the APIC in logical destination mode.
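      Restricting the check amounts to one extra condition. The C sketch
      below is a hypothetical model, not the setup_local_APIC() code: the
      struct and field names are made up, and the logical ids are plain
      bytes rather than decoded LDR fields.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical snapshot of one CPU's APIC state. */
struct model_apic_state {
	bool logical_dest_mode;    /* does the APIC driver use logical mode? */
	uint8_t early_logical_id;  /* id recorded from get_smp_config() */
	uint8_t ldr_logical_id;    /* id read back from the LDR */
};

/* Returns true when the mismatch warning (and fixup) would run. */
static bool model_ldr_check_fires(const struct model_apic_state *s)
{
	assert(s != NULL);
	if (!s->logical_dest_mode)
		return false;  /* physical mode: the LDR value is irrelevant */
	return s->early_logical_id != s->ldr_logical_id;
}
```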
      
      [ tglx: Massaged changelog and added Cc stable ]
      
      Fixes: bae3a8d3 ("x86/apic: Do not initialize LDR and DFR for bigsmp")
      Signed-off-by: Jan Beulich <jbeulich@suse.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/666d8f91-b5a8-1afd-7add-821e72a35f03@suse.com
    • timekeeping/vsyscall: Update VDSO data unconditionally · 52338415
      Huacai Chen committed
      The update of the VDSO data depends on __arch_use_vsyscall() returning
      True. This is a leftover from the attempt to map the features of various
      architectures 1:1 into generic code.
      
      The usage of __arch_use_vsyscall() in the actual vsyscall implementations
      got dropped and replaced by the requirement for the architecture code to
      return U64_MAX if the global clocksource is not usable in the VDSO.
      
      But the __arch_use_vsyscall() check in the update code stayed, which causes
      the VDSO data to be stale or invalid when an architecture actually
      implements that function and returns False when the current clocksource is
      not usable in the VDSO.
      
      As a consequence the VDSO implementations of clock_getres(), time(),
      clock_gettime(CLOCK_.*_COARSE) operate on invalid data and return bogus
      information.
      
      Remove the __arch_use_vsyscall() check from the VDSO update function and
      update the VDSO data unconditionally.
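      The difference between the old and new update paths can be modeled in
      a few lines of C. The struct and functions are hypothetical
      simplifications: a single coarse-seconds field stands in for the VDSO
      data page, and the boolean parameter stands in for the result of
      __arch_use_vsyscall().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified VDSO data: what time() and the
 * *_COARSE clocks would read, plus an update counter. */
struct model_vdso_data {
	uint64_t coarse_sec;
	uint64_t updates;
};

/* Old path: the whole update is skipped when the architecture reports
 * that the current clocksource is not usable in the VDSO. */
static void model_update_old(struct model_vdso_data *vd, uint64_t now_sec,
			     bool arch_use_vsyscall)
{
	assert(vd != NULL);
	if (!arch_use_vsyscall)
		return;
	vd->coarse_sec = now_sec;
	vd->updates++;
}

/* New path: always publish. An unusable clocksource is signalled to the
 * high-resolution readers instead (per the commit, via U64_MAX). */
static void model_update_new(struct model_vdso_data *vd, uint64_t now_sec)
{
	assert(vd != NULL);
	vd->coarse_sec = now_sec;
	vd->updates++;
}
```

      In this model the old path leaves coarse_sec frozen at its previous
      value, which is the stale data the changelog describes for time() and
      the *_COARSE clocks.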
      
      [ tglx: Massaged changelog and removed the now useless implementations in
        	asm-generic/ARM64/MIPS ]
      
      Fixes: 44f57d78 ("timekeeping: Provide a generic update_vsyscall() implementation")
      Signed-off-by: Huacai Chen <chenhc@lemote.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: linux-mips@vger.kernel.org
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/1571887709-11447-1-git-send-email-chenhc@lemote.com
  5. 04 Nov 2019 · 2 commits
  6. 02 Nov 2019 · 1 commit
    • powerpc/bpf: Fix tail call implementation · 7de08690
      Eric Dumazet committed
      We have seen many crashes on powerpc hosts while loading bpf programs.
      
      The problem here is that bpf_int_jit_compile() does a first pass
      to compute the program length.
      
      Then it allocates memory to store the generated program and
      calls bpf_jit_build_body() a second time (and a third time
      later).
      
      What I have observed is that the second bpf_jit_build_body()
      could end up using a few more words than expected.
      
      If bpf_jit_binary_alloc() placed the program at the end of
      the allocated page, we then write to unmapped memory.
      
      It appears that bpf_jit_emit_tail_call() calls
      bpf_jit_emit_common_epilogue() while ctx->seen might not
      be stable.
      
      Only after the second pass can we be sure that ctx->seen won't change.
      
      Trying to avoid a second pass seems quite complex and probably
      not worth it.
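      The size drift can be reproduced with a toy two-pass JIT in C.
      Everything here is hypothetical: a three-opcode program, an epilogue
      whose length grows with the number of registers recorded in 'seen',
      and a tail call that emits that epilogue inline. Because 'seen' keeps
      accumulating across passes, like ctx->seen, a register first used
      after the tail call makes the second pass longer than the sizing pass.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes for a toy program. */
enum model_op { MODEL_USE_REG, MODEL_TAIL_CALL, MODEL_EXIT };

struct model_insn {
	enum model_op op;
	int reg;  /* only meaningful for MODEL_USE_REG */
};

/* 'seen' persists across passes, as ctx->seen does; 'idx' is per pass. */
struct model_jit_ctx {
	unsigned int seen;
	int idx;
};

static int model_popcount(unsigned int v)
{
	int n = 0;

	for (; v; v >>= 1)
		n += v & 1;
	return n;
}

/* One restore per seen register plus a return. */
static void model_emit_epilogue(struct model_jit_ctx *ctx)
{
	ctx->idx += model_popcount(ctx->seen) + 1;
}

/* Returns the number of instructions this pass would emit. */
static int model_build_body(struct model_jit_ctx *ctx,
			    const struct model_insn *prog, size_t n)
{
	ctx->idx = 0;
	for (size_t i = 0; i < n; i++) {
		switch (prog[i].op) {
		case MODEL_USE_REG:
			assert(prog[i].reg >= 0 && prog[i].reg < 32);
			ctx->seen |= 1u << prog[i].reg;
			ctx->idx += 1;
			break;
		case MODEL_TAIL_CALL:
			model_emit_epilogue(ctx);  /* depends on 'seen' so far */
			ctx->idx += 1;             /* the branch itself */
			break;
		case MODEL_EXIT:
			model_emit_epilogue(ctx);
			break;
		}
	}
	return ctx->idx;
}
```

      For a program that uses r1, tail-calls, then uses r2 and r3 before
      exiting, the passes measure 10, 12 and 12 instructions, so in this
      model only a size taken after a second scouting pass is stable.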
      
      Fixes: ce076141 ("powerpc/bpf: Implement support for tail calls")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Naveen N. Rao <naveen.n.rao@linux.ibm.com>
      Cc: Sandipan Das <sandipan@linux.ibm.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Martin KaFai Lau <kafai@fb.com>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: Yonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/bpf/20191101033444.143741-1-edumazet@google.com
  7. 01 Nov 2019 · 6 commits
  8. 31 Oct 2019 · 4 commits
    • arm64: cpufeature: Enable Qualcomm Falkor errata 1009 for Kryo · 36c602dc
      Bjorn Andersson committed
      The Kryo cores share errata 1009 with Falkor, so add their model
      definitions and enable it for them as well.

      Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
      [will: Update entry in silicon-errata.rst]
      Signed-off-by: Will Deacon <will@kernel.org>
    • KVM: vmx, svm: always run with EFER.NXE=1 when shadow paging is active · 9167ab79
      Paolo Bonzini committed
      VMX already does so if the host has SMEP, in order to support the combination of
      CR0.WP=1 and CR4.SMEP=1.  However, it is perfectly safe to always do so, and in
      fact VMX already ends up running with EFER.NXE=1 on old processors that lack the
      "load EFER" controls, because it may help avoid a slow MSR write.  Removing
      all the conditionals simplifies the code.
      
      SVM does not have similar code, but it should, since recent AMD processors do
      support SMEP.  So this patch also makes the code for the two vendors more similar
      while fixing NPT=0, CR0.WP=1 and CR4.SMEP=1 on AMD processors.
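      A minimal sketch of the resulting policy, relying only on the
      architectural bit positions of EFER.NXE (bit 11) and EFER.LME (bit 8).
      The function name is hypothetical, and it models just the value loaded
      into the hardware EFER while the guest runs; how KVM keeps the guest's
      own view of EFER separate is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_EFER_LME (1ULL << 8)   /* EFER.LME, architectural bit 8 */
#define MODEL_EFER_NX  (1ULL << 11)  /* EFER.NXE, architectural bit 11 */

/* Hardware EFER while the guest runs. With shadow paging (EPT/NPT off),
 * NXE is always set, as the commit describes; otherwise this model passes
 * the guest's value through unchanged. */
static uint64_t model_hw_efer(uint64_t guest_efer, bool shadow_paging)
{
	if (shadow_paging)
		return guest_efer | MODEL_EFER_NX;
	return guest_efer;
}
```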
      
      Cc: stable@vger.kernel.org
      Cc: Joerg Roedel <jroedel@suse.de>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • x86, efi: Never relocate kernel below lowest acceptable address · 220dd769
      Kairui Song committed
      Currently, the kernel fails to boot on some Hyper-V VMs when using EFI,
      and the same issue can potentially hit any x86 platform.
      
      It's caused by broken kernel relocation on EFI systems when the following
      three conditions are met:
      
      1. Kernel image is not loaded to the default address (LOAD_PHYSICAL_ADDR)
         by the loader.
      2. There isn't enough room to contain the kernel, starting from the
         default load address (e.g. something else occupies part of the region).
      3. In the memmap provided by EFI firmware, there is a memory region that
         starts below LOAD_PHYSICAL_ADDR and is suitable for containing the
         kernel.
      
      The EFI stub performs a kernel relocation when condition 1 is met. But
      due to condition 2, it can't relocate the kernel to the preferred
      address, so it falls back to asking the EFI firmware to allocate the
      lowest usable memory region, gets the low region mentioned in
      condition 3, and relocates the kernel there.
      
      It's incorrect to relocate the kernel below LOAD_PHYSICAL_ADDR, which
      is the lowest acceptable kernel relocation address.
      
      The first thing that goes wrong is in arch/x86/boot/compressed/head_64.S.
      Kernel decompression forcibly uses LOAD_PHYSICAL_ADDR as the output
      address if the kernel is located below it. Then the relocation before
      decompression, which moves the kernel to the end of the decompression
      buffer, overwrites other memory regions, as there is not enough memory
      there.
      
      To fix it, don't let the EFI stub relocate the kernel to any address
      lower than the lowest acceptable address.
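      The policy change can be modeled in C with a toy memory map. The
      model_low_alloc_above() helper is hypothetical and only shares its idea
      with the efi_low_alloc_above() helper named in this commit's notes; the
      16 MiB minimum assumes LOAD_PHYSICAL_ADDR at its common default of
      0x1000000, and the regions and the 24 MiB kernel size are made up to
      satisfy conditions 2 and 3.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical free region from a firmware memory map. */
struct model_region {
	uint64_t start;
	uint64_t size;
};

/* Lowest aligned fit for 'size' bytes that does not start below 'min'.
 * Returns 0 when nothing fits (a real allocator would report an error
 * instead, since 0 could be a valid address). */
static uint64_t model_low_alloc_above(const struct model_region *map,
				      size_t n, uint64_t size,
				      uint64_t align, uint64_t min)
{
	uint64_t best = 0;

	assert(align != 0 && (align & (align - 1)) == 0);  /* power of two */
	for (size_t i = 0; i < n; i++) {
		uint64_t start = map[i].start < min ? min : map[i].start;
		uint64_t end = map[i].start + map[i].size;

		start = (start + align - 1) & ~(align - 1);  /* round up */
		if (start >= end || end - start < size)
			continue;
		if (best == 0 || start < best)
			best = start;
	}
	return best;
}
```

      With a map of [1 MiB, 33 MiB) and [128 MiB, 192 MiB), a 24 MiB image
      and 2 MiB alignment, the model picks 0x200000 when no minimum is given,
      below the assumed LOAD_PHYSICAL_ADDR, and 0x8000000 once the minimum is
      enforced.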
      
      [ ardb: introduce efi_low_alloc_above() to reduce the scope of the change ]
      Signed-off-by: Kairui Song <kasong@redhat.com>
      Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Acked-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-efi@vger.kernel.org
      Link: https://lkml.kernel.org/r/20191029173755.27149-6-ardb@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • parisc: fix frame pointer in ftrace_regs_caller() · 3d252454
      Sven Schnelle committed
      The current code in ftrace_regs_caller() doesn't set %r3 to the
      address of the current frame. This is hidden if the kernel is
      compiled with FRAME_POINTER, but without it the kernel just crashes
      because it tries to dereference an arbitrary address. Fix this by
      always setting %r3 to the current stack frame.

      Signed-off-by: Sven Schnelle <svens@stackframe.org>
      Signed-off-by: Helge Deller <deller@gmx.de>
  9. 30 Oct 2019 · 3 commits
    • arm64: cpufeature: Enable Qualcomm Falkor/Kryo errata 1003 · d4af3c4b
      Bjorn Andersson committed
      With the introduction of 'cce360b5 ("arm64: capabilities: Filter the
      entries based on a given mask")', the Qualcomm Falkor/Kryo errata 1003 is
      no longer applied.
      
      The result of not applying errata 1003 is that MSM8996 runs into various
      RCU stalls and fails to boot most of the time.
      
      Give 1003 a "type" to ensure they are not filtered out in
      update_cpu_capabilities().
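      Why an untyped entry disappears can be shown with a small C model. The
      scope bits, struct and helper below are hypothetical stand-ins for the
      arm64 capability table and its type-based filtering; in this model
      every entry is assumed to match the running CPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical scope bits standing in for the capability "type" field. */
#define MODEL_SCOPE_LOCAL_CPU 0x1u
#define MODEL_SCOPE_SYSTEM    0x2u

struct model_cap {
	const char *desc;
	unsigned int type;  /* 0: the entry was never given a type */
	bool enabled;
};

/* Enable every entry whose type intersects scope_mask. An entry with
 * type 0 can never pass this filter, whatever mask is used. */
static void model_update_caps(struct model_cap *caps, size_t n,
			      unsigned int scope_mask)
{
	assert(caps != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (caps[i].type & scope_mask)
			caps[i].enabled = true;
}
```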
      
      Fixes: cce360b5 ("arm64: capabilities: Filter the entries based on a given mask")
      Cc: stable@vger.kernel.org
      Reported-by: Mark Brown <broonie@kernel.org>
      Suggested-by: Will Deacon <will@kernel.org>
      Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
      Signed-off-by: Will Deacon <will@kernel.org>
    • arm64: Ensure VM_WRITE|VM_SHARED ptes are clean by default · aa57157b
      Catalin Marinas committed
      Shared and writable mappings (__S.1.) should be clean (!dirty) initially
      and made dirty on a subsequent write either through the hardware DBM
      (dirty bit management) mechanism or through a write page fault. A clean
      pte for the arm64 kernel is one that has PTE_RDONLY set and PTE_DIRTY
      clear.
      
      The PAGE_SHARED{,_EXEC} attributes have PTE_WRITE set (PTE_DBM) and
      PTE_DIRTY clear. Prior to commit 73e86cb0 ("arm64: Move PTE_RDONLY
      bit handling out of set_pte_at()"), it was the responsibility of
      set_pte_at() to set the PTE_RDONLY bit and mark the pte clean if the
      software PTE_DIRTY bit was not set. However, the above commit removed
      the pte_sw_dirty() check and the subsequent setting of PTE_RDONLY in
      set_pte_at() while leaving the PAGE_SHARED{,_EXEC} definitions
      unchanged. The result is that shared+writable mappings are now dirty by
      default.
      
      Fix the above by explicitly setting PTE_RDONLY in PAGE_SHARED{,_EXEC}.
      In addition, remove the superfluous PTE_DIRTY bit from the kernel PROT_*
      attributes.
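      The clean/dirty rule from the first paragraph can be modeled with three
      bits. The bit positions are assumptions meant to mirror the arm64
      definitions of that era (PTE_RDONLY as AP[2] at bit 7, PTE_WRITE
      aliasing the DBM bit at bit 51, software PTE_DIRTY at bit 55) and
      should be checked against the arm64 pgtable headers; the model_*
      helpers are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions, see the note above. */
#define MODEL_PTE_RDONLY (1ULL << 7)
#define MODEL_PTE_WRITE  (1ULL << 51)
#define MODEL_PTE_DIRTY  (1ULL << 55)

static bool model_pte_sw_dirty(uint64_t pte)
{
	return (pte & MODEL_PTE_DIRTY) != 0;
}

/* Hardware-dirty: writable (DBM set) and no longer marked read-only,
 * which is the state DBM leaves behind after a write. */
static bool model_pte_hw_dirty(uint64_t pte)
{
	return (pte & MODEL_PTE_WRITE) && !(pte & MODEL_PTE_RDONLY);
}

static bool model_pte_dirty(uint64_t pte)
{
	return model_pte_sw_dirty(pte) || model_pte_hw_dirty(pte);
}
```

      In this model a fresh shared+writable pte without PTE_RDONLY already
      reports dirty, while one with PTE_RDONLY set starts clean and turns
      dirty only once a write clears PTE_RDONLY.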
      
      Fixes: 73e86cb0 ("arm64: Move PTE_RDONLY bit handling out of set_pte_at()")
      Cc: <stable@vger.kernel.org> # 4.14.x-
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: Will Deacon <will@kernel.org>
    • um-ubd: Entrust re-queue to the upper layers · d848074b
      Anton Ivanov committed
      This fixes crashes caused by the ubd requeue logic conflicting with the
      blk-mq logic. The crash is reproducible in 5.0 to 5.3.
      
      Fixes: 53766def ("um: Clean-up command processing in UML UBD driver")
      Cc: stable@vger.kernel.org # v5.0+
      Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>