1. 07 8月, 2019 25 次提交
    • T
      x86/speculation/swapgs: Exclude ATOMs from speculation through SWAPGS · b88241ae
      Thomas Gleixner 提交于
      commit f36cf386e3fec258a341d446915862eded3e13d8 upstream
      
      Intel provided the following information:
      
       On all current Atom processors, instructions that use a segment register
       value (e.g. a load or store) will not speculatively execute before the
       last writer of that segment retires. Thus they will not use a
       speculatively written segment value.
      
      That means on ATOMs there is no speculation through SWAPGS, so the SWAPGS
      entry paths can be excluded from the extra LFENCE if PTI is disabled.
      
      Create a separate bug flag for the through SWAPGS speculation and mark all
      out-of-order ATOMs and AMD/HYGON CPUs as not affected. The in-order ATOMs
      are excluded from the whole mitigation mess anyway.
      Reported-by: NAndrew Cooper <andrew.cooper3@citrix.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NTyler Hicks <tyhicks@canonical.com>
      Reviewed-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b88241ae
    • J
      x86/entry/64: Use JMP instead of JMPQ · 931b6bfe
      Josh Poimboeuf 提交于
      commit 64dbc122b20f75183d8822618c24f85144a5a94d upstream
      
      Somehow the swapgs mitigation entry code patch ended up with a JMPQ
      instruction instead of JMP, where only the short jump is needed.  Some
      assembler versions apparently fail to optimize JMPQ into a two-byte JMP
      when possible, instead always using a 7-byte JMP with relocation.  For
      some reason that makes the entry code explode with a #GP during boot.
      
      Change it back to "JMP" as originally intended.
      
      Fixes: 18ec54fdd6d1 ("x86/speculation: Prepare entry code for Spectre v1 swapgs mitigations")
      Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      931b6bfe
    • J
      x86/speculation: Enable Spectre v1 swapgs mitigations · 23e7a7b3
      Josh Poimboeuf 提交于
      commit a2059825986a1c8143fd6698774fa9d83733bb11 upstream
      
      The previous commit added macro calls in the entry code which mitigate the
      Spectre v1 swapgs issue if the X86_FEATURE_FENCE_SWAPGS_* features are
      enabled.  Enable those features where applicable.
      
      The mitigations may be disabled with "nospectre_v1" or "mitigations=off".
      
      There are different features which can affect the risk of attack:
      
      - When FSGSBASE is enabled, unprivileged users are able to place any
        value in GS, using the wrgsbase instruction.  This means they can
        write a GS value which points to any value in kernel space, which can
        be useful with the following gadget in an interrupt/exception/NMI
        handler:
      
      	if (coming from user space)
      		swapgs
      	mov %gs:<percpu_offset>, %reg1
      	// dependent load or store based on the value of %reg
      	// for example: mov %(reg1), %reg2
      
        If an interrupt is coming from user space, and the entry code
        speculatively skips the swapgs (due to user branch mistraining), it
        may speculatively execute the GS-based load and a subsequent dependent
        load or store, exposing the kernel data to an L1 side channel leak.
      
        Note that, on Intel, a similar attack exists in the above gadget when
        coming from kernel space, if the swapgs gets speculatively executed to
        switch back to the user GS.  On AMD, this variant isn't possible
        because swapgs is serializing with respect to future GS-based
        accesses.
      
        NOTE: The FSGSBASE patch set hasn't been merged yet, so the above case
      	doesn't exist quite yet.
      
      - When FSGSBASE is disabled, the issue is mitigated somewhat because
        unprivileged users must use prctl(ARCH_SET_GS) to set GS, which
        restricts GS values to user space addresses only.  That means the
        gadget would need an additional step, since the target kernel address
        needs to be read from user space first.  Something like:
      
      	if (coming from user space)
      		swapgs
      	mov %gs:<percpu_offset>, %reg1
      	mov (%reg1), %reg2
      	// dependent load or store based on the value of %reg2
      	// for example: mov %(reg2), %reg3
      
        It's difficult to audit for this gadget in all the handlers, so while
        there are no known instances of it, it's entirely possible that it
        exists somewhere (or could be introduced in the future).  Without
        tooling to analyze all such code paths, consider it vulnerable.
      
        Effects of SMAP on the !FSGSBASE case:
      
        - If SMAP is enabled, and the CPU reports RDCL_NO (i.e., not
          susceptible to Meltdown), the kernel is prevented from speculatively
          reading user space memory, even L1 cached values.  This effectively
          disables the !FSGSBASE attack vector.
      
        - If SMAP is enabled, but the CPU *is* susceptible to Meltdown, SMAP
          still prevents the kernel from speculatively reading user space
          memory.  But it does *not* prevent the kernel from reading the
          user value from L1, if it has already been cached.  This is probably
          only a small hurdle for an attacker to overcome.
      
      Thanks to Dave Hansen for contributing the speculative_smap() function.
      
      Thanks to Andrew Cooper for providing the inside scoop on whether swapgs
      is serializing on AMD.
      
      [ tglx: Fixed the USER fence decision and polished the comment as suggested
        	by Dave Hansen ]
      Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NDave Hansen <dave.hansen@intel.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      23e7a7b3
    • J
      x86/speculation: Prepare entry code for Spectre v1 swapgs mitigations · befb822c
      Josh Poimboeuf 提交于
      commit 18ec54fdd6d18d92025af097cd042a75cf0ea24c upstream
      
      Spectre v1 isn't only about array bounds checks.  It can affect any
      conditional checks.  The kernel entry code interrupt, exception, and NMI
      handlers all have conditional swapgs checks.  Those may be problematic in
      the context of Spectre v1, as kernel code can speculatively run with a user
      GS.
      
      For example:
      
      	if (coming from user space)
      		swapgs
      	mov %gs:<percpu_offset>, %reg
      	mov (%reg), %reg1
      
      When coming from user space, the CPU can speculatively skip the swapgs, and
      then do a speculative percpu load using the user GS value.  So the user can
      speculatively force a read of any kernel value.  If a gadget exists which
      uses the percpu value as an address in another load/store, then the
      contents of the kernel value may become visible via an L1 side channel
      attack.
      
      A similar attack exists when coming from kernel space.  The CPU can
      speculatively do the swapgs, causing the user GS to get used for the rest
      of the speculative window.
      
      The mitigation is similar to a traditional Spectre v1 mitigation, except:
      
        a) index masking isn't possible; because the index (percpu offset)
           isn't user-controlled; and
      
        b) an lfence is needed in both the "from user" swapgs path and the
           "from kernel" non-swapgs path (because of the two attacks described
           above).
      
      The user entry swapgs paths already have SWITCH_TO_KERNEL_CR3, which has a
      CR3 write when PTI is enabled.  Since CR3 writes are serializing, the
      lfences can be skipped in those cases.
      
      On the other hand, the kernel entry swapgs paths don't depend on PTI.
      
      To avoid unnecessary lfences for the user entry case, create two separate
      features for alternative patching:
      
        X86_FEATURE_FENCE_SWAPGS_USER
        X86_FEATURE_FENCE_SWAPGS_KERNEL
      
      Use these features in entry code to patch in lfences where needed.
      
      The features aren't enabled yet, so there's no functional change.
      Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NDave Hansen <dave.hansen@intel.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      befb822c
    • F
      x86/cpufeatures: Combine word 11 and 12 into a new scattered features word · b5dd7f61
      Fenghua Yu 提交于
      commit acec0ce081de0c36459eea91647faf99296445a3 upstream
      
      It's a waste for the four X86_FEATURE_CQM_* feature bits to occupy two
      whole feature bits words. To better utilize feature words, re-define
      word 11 to host scattered features and move the four X86_FEATURE_CQM_*
      features into Linux defined word 11. More scattered features can be
      added in word 11 in the future.
      
      Rename leaf 11 in cpuid_leafs to CPUID_LNX_4 to reflect it's a
      Linux-defined leaf.
      
      Rename leaf 12 as CPUID_DUMMY which will be replaced by a meaningful
      name in the next patch when CPUID.7.1:EAX occupies world 12.
      
      Maximum number of RMID and cache occupancy scale are retrieved from
      CPUID.0xf.1 after scattered CQM features are enumerated. Carve out the
      code into a separate function.
      
      KVM doesn't support resctrl now. So it's safe to move the
      X86_FEATURE_CQM_* features to scattered features word 11 for KVM.
      Signed-off-by: NFenghua Yu <fenghua.yu@intel.com>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Aaron Lewis <aaronlewis@google.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Babu Moger <babu.moger@amd.com>
      Cc: "Chang S. Bae" <chang.seok.bae@intel.com>
      Cc: "Sean J Christopherson" <sean.j.christopherson@intel.com>
      Cc: Frederic Weisbecker <frederic@kernel.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jann Horn <jannh@google.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: kvm ML <kvm@vger.kernel.org>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Nadav Amit <namit@vmware.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
      Cc: Peter Feiner <pfeiner@google.com>
      Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
      Cc: "Radim Krčmář" <rkrcmar@redhat.com>
      Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
      Cc: Ravi V Shankar <ravi.v.shankar@intel.com>
      Cc: Sherry Hurwitz <sherry.hurwitz@amd.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Thomas Lendacky <Thomas.Lendacky@amd.com>
      Cc: x86 <x86@kernel.org>
      Link: https://lkml.kernel.org/r/1560794416-217638-2-git-send-email-fenghua.yu@intel.comSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b5dd7f61
    • B
      x86/cpufeatures: Carve out CQM features retrieval · 16ad0b63
      Borislav Petkov 提交于
      commit 45fc56e629caa451467e7664fbd4c797c434a6c4 upstream
      
      ... into a separate function for better readability. Split out from a
      patch from Fenghua Yu <fenghua.yu@intel.com> to keep the mechanical,
      sole code movement separate for easy review.
      
      No functional changes.
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: x86@kernel.org
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      16ad0b63
    • A
      x86/vdso: Prevent segfaults due to hoisted vclock reads · 3732a473
      Andy Lutomirski 提交于
      commit ff17bbe0bb405ad8b36e55815d381841f9fdeebc upstream.
      
      GCC 5.5.0 sometimes cleverly hoists reads of the pvclock and/or hvclock
      pages before the vclock mode checks.  This creates a path through
      vclock_gettime() in which no vclock is enabled at all (due to disabled
      TSC on old CPUs, for example) but the pvclock or hvclock page
      nevertheless read.  This will segfault on bare metal.
      
      This fixes commit 459e3a21535a ("gcc-9: properly declare the
      {pv,hv}clock_page storage") in the sense that, before that commit, GCC
      didn't seem to generate the offending code.  There was nothing wrong
      with that commit per se, and -stable maintainers should backport this to
      all supported kernels regardless of whether the offending commit was
      present, since the same crash could just as easily be triggered by the
      phase of the moon.
      
      On GCC 9.1.1, this doesn't seem to affect the generated code at all, so
      I'm not too concerned about performance regressions from this fix.
      
      Cc: stable@vger.kernel.org
      Cc: x86@kernel.org
      Cc: Borislav Petkov <bp@alien8.de>
      Reported-by: NDuncan Roe <duncan_roe@optusnet.com.au>
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3732a473
    • L
      gcc-9: properly declare the {pv,hv}clock_page storage · 8320768d
      Linus Torvalds 提交于
      commit 459e3a21535ae3c7a9a123650e54f5c882b8fcbf upstream.
      
      The pvlock_page and hvclock_page variables are (as the name implies)
      addresses to pages, created by the linker script.
      
      But we declared them as just "extern u8" variables, which _works_, but
      now that gcc does some more bounds checking, it causes warnings like
      
          warning: array subscript 1 is outside array bounds of ‘u8[1]’
      
      when we then access more than one byte from those variables.
      
      Fix this by simply making the declaration of the variables match
      reality, which makes the compiler happy too.
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8320768d
    • E
      ARC: enable uboot support unconditionally · 89f3896b
      Eugeniy Paltsev 提交于
      commit 493a2f812446e92bcb1e69a77381b4d39808d730 upstream.
      
      After reworking U-boot args handling code and adding paranoid
      arguments check we can eliminate CONFIG_ARC_UBOOT_SUPPORT and
      enable uboot support unconditionally.
      
      For JTAG case we can assume that core registers will come up
      reset value of 0 or in worst case we rely on user passing
      '-on=clear_regs' to Metaware debugger.
      
      Cc: stable@vger.kernel.org
      Tested-by: NCorentin LABBE <clabbe@baylibre.com>
      Signed-off-by: NEugeniy Paltsev <Eugeniy.Paltsev@synopsys.com>
      Signed-off-by: NVineet Gupta <vgupta@synopsys.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      89f3896b
    • W
      arm64: cpufeature: Fix feature comparison for CTR_EL0.{CWG,ERG} · 8dfef0f4
      Will Deacon 提交于
      commit 147b9635e6347104b91f48ca9dca61eb0fbf2a54 upstream.
      
      If CTR_EL0.{CWG,ERG} are 0b0000 then they must be interpreted to have
      their architecturally maximum values, which defeats the use of
      FTR_HIGHER_SAFE when sanitising CPU ID registers on heterogeneous
      machines.
      
      Introduce FTR_HIGHER_OR_ZERO_SAFE so that these fields effectively
      saturate at zero.
      
      Fixes: 3c739b57 ("arm64: Keep track of CPU feature registers")
      Cc: <stable@vger.kernel.org> # 4.4.x-
      Reviewed-by: NSuzuki K Poulose <suzuki.poulose@arm.com>
      Acked-by: NMark Rutland <mark.rutland@arm.com>
      Signed-off-by: NWill Deacon <will@kernel.org>
      Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8dfef0f4
    • W
      arm64: compat: Allow single-byte watchpoints on all addresses · 2bddc985
      Will Deacon 提交于
      commit 849adec41203ac5837c40c2d7e08490ffdef3c2c upstream.
      
      Commit d968d2b8 ("ARM: 7497/1: hw_breakpoint: allow single-byte
      watchpoints on all addresses") changed the validation requirements for
      hardware watchpoints on arch/arm/. Update our compat layer to implement
      the same relaxation.
      
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NWill Deacon <will@kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2bddc985
    • H
      parisc: Fix build of compressed kernel even with debug enabled · 5f80ac50
      Helge Deller 提交于
      commit 3fe6c873af2f2247544debdbe51ec29f690a2ccf upstream.
      
      With debug info enabled (CONFIG_DEBUG_INFO=y) the resulting vmlinux may get
      that huge that we need to increase the start addresss for the decompression
      text section otherwise one will face a linker error.
      Reported-by: NSven Schnelle <svens@stackframe.org>
      Tested-by: NSven Schnelle <svens@stackframe.org>
      Cc: stable@vger.kernel.org # v4.14+
      Signed-off-by: NHelge Deller <deller@gmx.de>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5f80ac50
    • Z
      x86, boot: Remove multiple copy of static function sanitize_boot_params() · 84ce0452
      Zhenzhong Duan 提交于
      [ Upstream commit 8c5477e8046ca139bac250386c08453da37ec1ae ]
      
      Kernel build warns:
       'sanitize_boot_params' defined but not used [-Wunused-function]
      
      at below files:
        arch/x86/boot/compressed/cmdline.c
        arch/x86/boot/compressed/error.c
        arch/x86/boot/compressed/early_serial_console.c
        arch/x86/boot/compressed/acpi.c
      
      That's becausethey each include misc.h which includes a definition of
      sanitize_boot_params() via bootparam_utils.h.
      
      Remove the inclusion from misc.h and have the c file including
      bootparam_utils.h directly.
      Signed-off-by: NZhenzhong Duan <zhenzhong.duan@oracle.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Link: https://lkml.kernel.org/r/1563283092-1189-1-git-send-email-zhenzhong.duan@oracle.comSigned-off-by: NSasha Levin <sashal@kernel.org>
      84ce0452
    • J
      x86/paravirt: Fix callee-saved function ELF sizes · 740e0167
      Josh Poimboeuf 提交于
      [ Upstream commit 083db6764821996526970e42d09c1ab2f4155dd4 ]
      
      The __raw_callee_save_*() functions have an ELF symbol size of zero,
      which confuses objtool and other tools.
      
      Fixes a bunch of warnings like the following:
      
        arch/x86/xen/mmu_pv.o: warning: objtool: __raw_callee_save_xen_pte_val() is missing an ELF size annotation
        arch/x86/xen/mmu_pv.o: warning: objtool: __raw_callee_save_xen_pgd_val() is missing an ELF size annotation
        arch/x86/xen/mmu_pv.o: warning: objtool: __raw_callee_save_xen_make_pte() is missing an ELF size annotation
        arch/x86/xen/mmu_pv.o: warning: objtool: __raw_callee_save_xen_make_pgd() is missing an ELF size annotation
      Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NJuergen Gross <jgross@suse.com>
      Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/afa6d49bb07497ca62e4fc3b27a2d0cece545b4e.1563413318.git.jpoimboe@redhat.comSigned-off-by: NSasha Levin <sashal@kernel.org>
      740e0167
    • J
      x86/kvm: Don't call kvm_spurious_fault() from .fixup · ba5c072f
      Josh Poimboeuf 提交于
      [ Upstream commit 3901336ed9887b075531bffaeef7742ba614058b ]
      
      After making a change to improve objtool's sibling call detection, it
      started showing the following warning:
      
        arch/x86/kvm/vmx/nested.o: warning: objtool: .fixup+0x15: sibling call from callable instruction with modified stack frame
      
      The problem is the ____kvm_handle_fault_on_reboot() macro.  It does a
      fake call by pushing a fake RIP and doing a jump.  That tricks the
      unwinder into printing the function which triggered the exception,
      rather than the .fixup code.
      
      Instead of the hack to make it look like the original function made the
      call, just change the macro so that the original function actually does
      make the call.  This allows removal of the hack, and also makes objtool
      happy.
      
      I triggered a vmx instruction exception and verified that the stack
      trace is still sane:
      
        kernel BUG at arch/x86/kvm/x86.c:358!
        invalid opcode: 0000 [#1] SMP PTI
        CPU: 28 PID: 4096 Comm: qemu-kvm Not tainted 5.2.0+ #16
        Hardware name: Lenovo THINKSYSTEM SD530 -[7X2106Z000]-/-[7X2106Z000]-, BIOS -[TEE113Z-1.00]- 07/17/2017
        RIP: 0010:kvm_spurious_fault+0x5/0x10
        Code: 00 00 00 00 00 8b 44 24 10 89 d2 45 89 c9 48 89 44 24 10 8b 44 24 08 48 89 44 24 08 e9 d4 40 22 00 0f 1f 40 00 0f 1f 44 00 00 <0f> 0b 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 55 49 89 fd 41
        RSP: 0018:ffffbf91c683bd00 EFLAGS: 00010246
        RAX: 000061f040000000 RBX: ffff9e159c77bba0 RCX: ffff9e15a5c87000
        RDX: 0000000665c87000 RSI: ffff9e15a5c87000 RDI: ffff9e159c77bba0
        RBP: 0000000000000000 R08: 0000000000000000 R09: ffff9e15a5c87000
        R10: 0000000000000000 R11: fffff8f2d99721c0 R12: ffff9e159c77bba0
        R13: ffffbf91c671d960 R14: ffff9e159c778000 R15: 0000000000000000
        FS:  00007fa341cbe700(0000) GS:ffff9e15b7400000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 00007fdd38356804 CR3: 00000006759de003 CR4: 00000000007606e0
        DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        PKRU: 55555554
        Call Trace:
         loaded_vmcs_init+0x4f/0xe0
         alloc_loaded_vmcs+0x38/0xd0
         vmx_create_vcpu+0xf7/0x600
         kvm_vm_ioctl+0x5e9/0x980
         ? __switch_to_asm+0x40/0x70
         ? __switch_to_asm+0x34/0x70
         ? __switch_to_asm+0x40/0x70
         ? __switch_to_asm+0x34/0x70
         ? free_one_page+0x13f/0x4e0
         do_vfs_ioctl+0xa4/0x630
         ksys_ioctl+0x60/0x90
         __x64_sys_ioctl+0x16/0x20
         do_syscall_64+0x55/0x1c0
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
        RIP: 0033:0x7fa349b1ee5b
      Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Acked-by: NPaolo Bonzini <pbonzini@redhat.com>
      Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/64a9b64d127e87b6920a97afde8e96ea76f6524e.1563413318.git.jpoimboe@redhat.comSigned-off-by: NSasha Levin <sashal@kernel.org>
      ba5c072f
    • Z
      xen/pv: Fix a boot up hang revealed by int3 self test · 11cb9f87
      Zhenzhong Duan 提交于
      [ Upstream commit b23e5844dfe78a80ba672793187d3f52e4b528d7 ]
      
      Commit 7457c0da024b ("x86/alternatives: Add int3_emulate_call()
      selftest") is used to ensure there is a gap setup in int3 exception stack
      which could be used for inserting call return address.
      
      This gap is missed in XEN PV int3 exception entry path, then below panic
      triggered:
      
      [    0.772876] general protection fault: 0000 [#1] SMP NOPTI
      [    0.772886] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.2.0+ #11
      [    0.772893] RIP: e030:int3_magic+0x0/0x7
      [    0.772905] RSP: 3507:ffffffff82203e98 EFLAGS: 00000246
      [    0.773334] Call Trace:
      [    0.773334]  alternative_instructions+0x3d/0x12e
      [    0.773334]  check_bugs+0x7c9/0x887
      [    0.773334]  ? __get_locked_pte+0x178/0x1f0
      [    0.773334]  start_kernel+0x4ff/0x535
      [    0.773334]  ? set_init_arg+0x55/0x55
      [    0.773334]  xen_start_kernel+0x571/0x57a
      
      For 64bit PV guests, Xen's ABI enters the kernel with using SYSRET, with
      %rcx/%r11 on the stack. To convert back to "normal" looking exceptions,
      the xen thunks do 'xen_*: pop %rcx; pop %r11; jmp *'.
      
      E.g. Extracting 'xen_pv_trap xenint3' we have:
      xen_xenint3:
       pop %rcx;
       pop %r11;
       jmp xenint3
      
      As xenint3 and int3 entry code are same except xenint3 doesn't generate
      a gap, we can fix it by using int3 and drop useless xenint3.
      Signed-off-by: NZhenzhong Duan <zhenzhong.duan@oracle.com>
      Reviewed-by: NJuergen Gross <jgross@suse.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Stefano Stabellini <sstabellini@kernel.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Andrew Cooper <andrew.cooper3@citrix.com>
      Signed-off-by: NJuergen Gross <jgross@suse.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      11cb9f87
    • A
      x86: math-emu: Hide clang warnings for 16-bit overflow · 1b84e674
      Arnd Bergmann 提交于
      [ Upstream commit 29e7e9664aec17b94a9c8c5a75f8d216a206aa3a ]
      
      clang warns about a few parts of the math-emu implementation
      where a 16-bit integer becomes negative during assignment:
      
      arch/x86/math-emu/poly_tan.c:88:35: error: implicit conversion from 'int' to 'short' changes value from 49216 to -16320 [-Werror,-Wconstant-conversion]
                                            (0x41 + EXTENDED_Ebias) | SIGN_Negative);
                                            ~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~
      arch/x86/math-emu/fpu_emu.h:180:58: note: expanded from macro 'setexponent16'
       #define setexponent16(x,y)  { (*(short *)&((x)->exp)) = (y); }
                                                            ~  ^
      arch/x86/math-emu/reg_constant.c:37:32: error: implicit conversion from 'int' to 'short' changes value from 49085 to -16451 [-Werror,-Wconstant-conversion]
      FPU_REG const CONST_PI2extra = MAKE_REG(NEG, -66,
                                     ^~~~~~~~~~~~~~~~~~
      arch/x86/math-emu/reg_constant.c:21:25: note: expanded from macro 'MAKE_REG'
                      ((EXTENDED_Ebias+(e)) | ((SIGN_##s != 0)*0x8000)) }
                       ~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~
      arch/x86/math-emu/reg_constant.c:48:28: error: implicit conversion from 'int' to 'short' changes value from 65535 to -1 [-Werror,-Wconstant-conversion]
      FPU_REG const CONST_QNaN = MAKE_REG(NEG, EXP_OVER, 0x00000000, 0xC0000000);
                                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      arch/x86/math-emu/reg_constant.c:21:25: note: expanded from macro 'MAKE_REG'
                      ((EXTENDED_Ebias+(e)) | ((SIGN_##s != 0)*0x8000)) }
                       ~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~
      
      The code is correct as is, so add a typecast to shut up the warnings.
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Link: https://lkml.kernel.org/r/20190712090816.350668-1-arnd@arndb.deSigned-off-by: NSasha Levin <sashal@kernel.org>
      1b84e674
    • Q
      x86/apic: Silence -Wtype-limits compiler warnings · 242666b2
      Qian Cai 提交于
      [ Upstream commit ec6335586953b0df32f83ef696002063090c7aef ]
      
      There are many compiler warnings like this,
      
      In file included from ./arch/x86/include/asm/smp.h:13,
                       from ./arch/x86/include/asm/mmzone_64.h:11,
                       from ./arch/x86/include/asm/mmzone.h:5,
                       from ./include/linux/mmzone.h:969,
                       from ./include/linux/gfp.h:6,
                       from ./include/linux/mm.h:10,
                       from arch/x86/kernel/apic/io_apic.c:34:
      arch/x86/kernel/apic/io_apic.c: In function 'check_timer':
      ./arch/x86/include/asm/apic.h:37:11: warning: comparison of unsigned
      expression >= 0 is always true [-Wtype-limits]
         if ((v) <= apic_verbosity) \
                 ^~
      arch/x86/kernel/apic/io_apic.c:2160:2: note: in expansion of macro
      'apic_printk'
        apic_printk(APIC_QUIET, KERN_INFO "..TIMER: vector=0x%02X "
        ^~~~~~~~~~~
      ./arch/x86/include/asm/apic.h:37:11: warning: comparison of unsigned
      expression >= 0 is always true [-Wtype-limits]
         if ((v) <= apic_verbosity) \
                 ^~
      arch/x86/kernel/apic/io_apic.c:2207:4: note: in expansion of macro
      'apic_printk'
          apic_printk(APIC_QUIET, KERN_ERR "..MP-BIOS bug: "
          ^~~~~~~~~~~
      
      APIC_QUIET is 0, so silence them by making apic_verbosity type int.
      Signed-off-by: NQian Cai <cai@lca.pw>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Link: https://lkml.kernel.org/r/1562621805-24789-1-git-send-email-cai@lca.pwSigned-off-by: NSasha Levin <sashal@kernel.org>
      242666b2
    • A
      x86: kvm: avoid constant-conversion warning · 80f58147
      Arnd Bergmann 提交于
      [ Upstream commit a6a6d3b1f867d34ba5bd61aa7bb056b48ca67cff ]
      
      clang finds a contruct suspicious that converts an unsigned
      character to a signed integer and back, causing an overflow:
      
      arch/x86/kvm/mmu.c:4605:39: error: implicit conversion from 'int' to 'u8' (aka 'unsigned char') changes value from -205 to 51 [-Werror,-Wconstant-conversion]
                      u8 wf = (pfec & PFERR_WRITE_MASK) ? ~w : 0;
                         ~~                               ^~
      arch/x86/kvm/mmu.c:4607:38: error: implicit conversion from 'int' to 'u8' (aka 'unsigned char') changes value from -241 to 15 [-Werror,-Wconstant-conversion]
                      u8 uf = (pfec & PFERR_USER_MASK) ? ~u : 0;
                         ~~                              ^~
      arch/x86/kvm/mmu.c:4609:39: error: implicit conversion from 'int' to 'u8' (aka 'unsigned char') changes value from -171 to 85 [-Werror,-Wconstant-conversion]
                      u8 ff = (pfec & PFERR_FETCH_MASK) ? ~x : 0;
                         ~~                               ^~
      
      Add an explicit cast to tell clang that everything works as
      intended here.
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Link: https://github.com/ClangBuiltLinux/linux/issues/95Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      80f58147
    • P
      MIPS: lantiq: Fix bitfield masking · a3524486
      Petr Cvek 提交于
      [ Upstream commit ba1bc0fcdeaf3bf583c1517bd2e3e29cf223c969 ]
      
      The modification of EXIN register doesn't clean the bitfield before
      the writing of a new value. After a few modifications the bitfield would
      accumulate only '1's.
      Signed-off-by: NPetr Cvek <petrcvekcz@gmail.com>
      Signed-off-by: NPaul Burton <paul.burton@mips.com>
      Cc: hauke@hauke-m.de
      Cc: john@phrozen.org
      Cc: linux-mips@vger.kernel.org
      Cc: openwrt-devel@lists.openwrt.org
      Cc: pakahmar@hotmail.com
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      a3524486
    • H
      arm64: dts: rockchip: fix isp iommu clocks and power domain · fd53e45a
      Helen Koike 提交于
      [ Upstream commit c432a29d3fc9ee928caeca2f5cf68b3aebfa6817 ]
      
      isp iommu requires wrapper variants of the clocks.
      noc variants are always on and using the wrapper variants will activate
      {A,H}CLK_ISP{0,1} due to the hierarchy.
      
      Tested using the pending isp patch set (which is not upstream
      yet). Without this patch, streaming from the isp stalls.
      
      Also add the respective power domain and remove the "disabled" status.
      
      Refer:
       RK3399 TRM v1.4 Fig. 2-4 RK3399 Clock Architecture Diagram
       RK3399 TRM v1.4 Fig. 8-1 RK3399 Power Domain Partition
      Signed-off-by: NHelen Koike <helen.koike@collabora.com>
      Tested-by: NManivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
      Signed-off-by: NHeiko Stuebner <heiko@sntech.de>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      fd53e45a
    • D
      ARM: dts: rockchip: Mark that the rk3288 timer might stop in suspend · ea26b427
      Douglas Anderson 提交于
      [ Upstream commit 8ef1ba39a9fa53d2205e633bc9b21840a275908e ]
      
      This is similar to commit e6186820 ("arm64: dts: rockchip: Arch
      counter doesn't tick in system suspend").  Specifically on the rk3288
      it can be seen that the timer stops ticking in suspend if we end up
      running through the "osc_disable" path in rk3288_slp_mode_set().  In
      that path the 24 MHz clock will turn off and the timer stops.
      
      To test this, I ran this on a Chrome OS filesystem:
        before=$(date); \
        suspend_stress_test -c1 --suspend_min=30 --suspend_max=31; \
        echo ${before}; date
      
      ...and I found that unless I plug in a device that requests USB wakeup
      to be active that the two calls to "date" would show that fewer than
      30 seconds passed.
      
      NOTE: deep suspend (where the 24 MHz clock gets disabled) isn't
      supported yet on upstream Linux so this was tested on a downstream
      kernel.
      Signed-off-by: NDouglas Anderson <dianders@chromium.org>
      Signed-off-by: NHeiko Stuebner <heiko@sntech.de>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      ea26b427
    • D
      ARM: dts: rockchip: Make rk3288-veyron-mickey's emmc work again · 22befe67
      Douglas Anderson 提交于
      [ Upstream commit 99fa066710f75f18f4d9a5bc5f6a711968a581d5 ]
      
      When I try to boot rk3288-veyron-mickey I totally fail to make the
      eMMC work.  Specifically my logs (on Chrome OS 4.19):
      
        mmc_host mmc1: card is non-removable.
        mmc_host mmc1: Bus speed (slot 0) = 400000Hz (slot req 400000Hz, actual 400000HZ div = 0)
        mmc_host mmc1: Bus speed (slot 0) = 50000000Hz (slot req 52000000Hz, actual 50000000HZ div = 0)
        mmc1: switch to bus width 8 failed
        mmc1: switch to bus width 4 failed
        mmc1: new high speed MMC card at address 0001
        mmcblk1: mmc1:0001 HAG2e 14.7 GiB
        mmcblk1boot0: mmc1:0001 HAG2e partition 1 4.00 MiB
        mmcblk1boot1: mmc1:0001 HAG2e partition 2 4.00 MiB
        mmcblk1rpmb: mmc1:0001 HAG2e partition 3 4.00 MiB, chardev (243:0)
        mmc_host mmc1: Bus speed (slot 0) = 400000Hz (slot req 400000Hz, actual 400000HZ div = 0)
        mmc_host mmc1: Bus speed (slot 0) = 50000000Hz (slot req 52000000Hz, actual 50000000HZ div = 0)
        mmc1: switch to bus width 8 failed
        mmc1: switch to bus width 4 failed
        mmc1: tried to HW reset card, got error -110
        mmcblk1: error -110 requesting status
        mmcblk1: recovery failed!
        print_req_error: I/O error, dev mmcblk1, sector 0
        ...
      
      When I remove the '/delete-property/mmc-hs200-1_8v' then everything is
      hunky dory.
      
      That line comes from the original submission of the mickey dts
      upstream, so presumably at the time the HS200 was failing and just
      enumerating things as a high speed device was fine.  ...or maybe it's
      just that some mickey devices work when enumerating at "high speed",
      just not mine?
      
      In any case, hs200 seems good now.  Let's turn it on.
      Signed-off-by: NDouglas Anderson <dianders@chromium.org>
      Signed-off-by: NHeiko Stuebner <heiko@sntech.de>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      22befe67
    • D
      ARM: dts: rockchip: Make rk3288-veyron-minnie run at hs200 · 8c5a33d3
      Douglas Anderson 提交于
      [ Upstream commit 1c0479023412ab7834f2e98b796eb0d8c627cd62 ]
      
      As some point hs200 was failing on rk3288-veyron-minnie.  See commit
      98492678 ("ARM: dts: rockchip: temporarily remove emmc hs200 speed
      from rk3288 minnie").  Although I didn't track down exactly when it
      started working, it seems to work OK now, so let's turn it back on.
      
      To test this, I booted from SD card and then used this script to
      stress the enumeration process after fixing a memory leak [1]:
        cd /sys/bus/platform/drivers/dwmmc_rockchip
        for i in $(seq 1 3000); do
          echo "========================" $i
          echo ff0f0000.dwmmc > unbind
          sleep .5
          echo ff0f0000.dwmmc > bind
          while true; do
            if [ -e /dev/mmcblk2 ]; then
              break;
            fi
            sleep .1
          done
        done
      
      It worked fine.
      
      [1] https://lkml.kernel.org/r/20190503233526.226272-1-dianders@chromium.orgSigned-off-by: NDouglas Anderson <dianders@chromium.org>
      Signed-off-by: NHeiko Stuebner <heiko@sntech.de>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      8c5a33d3
    • R
      ARM: riscpc: fix DMA · 3c1d1bad
      Russell King 提交于
      [ Upstream commit ffd9a1ba9fdb7f2bd1d1ad9b9243d34e96756ba2 ]
      
      DMA got broken a while back in two different ways:
      1) a change in the behaviour of disable_irq() to wait for the interrupt
         to finish executing causes us to deadlock at the end of DMA.
      2) a change to avoid modifying the scatterlist left the first transfer
         uninitialised.
      
      DMA is only used with expansion cards, so has gone unnoticed.
      
      Fixes: fa4e9989 ("[ARM] dma: RiscPC: don't modify DMA SG entries")
      Signed-off-by: NRussell King <rmk+kernel@armlinux.org.uk>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      3c1d1bad
  2. 04 8月, 2019 2 次提交
  3. 31 7月, 2019 13 次提交
    • M
      powerpc/tm: Fix oops on sigreturn on systems without TM · b993a66d
      Michael Neuling 提交于
      commit f16d80b75a096c52354c6e0a574993f3b0dfbdfe upstream.
      
      On systems like P9 powernv where we have no TM (or P8 booted with
      ppc_tm=off), userspace can construct a signal context which still has
      the MSR TS bits set. The kernel tries to restore this context which
      results in the following crash:
      
        Unexpected TM Bad Thing exception at c0000000000022fc (msr 0x8000000102a03031) tm_scratch=800000020280f033
        Oops: Unrecoverable exception, sig: 6 [#1]
        LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
        Modules linked in:
        CPU: 0 PID: 1636 Comm: sigfuz Not tainted 5.2.0-11043-g0a8ad0ffa4 #69
        NIP:  c0000000000022fc LR: 00007fffb2d67e48 CTR: 0000000000000000
        REGS: c00000003fffbd70 TRAP: 0700   Not tainted  (5.2.0-11045-g7142b497d8)
        MSR:  8000000102a03031 <SF,VEC,VSX,FP,ME,IR,DR,LE,TM[E]>  CR: 42004242  XER: 00000000
        CFAR: c0000000000022e0 IRQMASK: 0
        GPR00: 0000000000000072 00007fffb2b6e560 00007fffb2d87f00 0000000000000669
        GPR04: 00007fffb2b6e728 0000000000000000 0000000000000000 00007fffb2b6f2a8
        GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
        GPR12: 0000000000000000 00007fffb2b76900 0000000000000000 0000000000000000
        GPR16: 00007fffb2370000 00007fffb2d84390 00007fffea3a15ac 000001000a250420
        GPR20: 00007fffb2b6f260 0000000010001770 0000000000000000 0000000000000000
        GPR24: 00007fffb2d843a0 00007fffea3a14a0 0000000000010000 0000000000800000
        GPR28: 00007fffea3a14d8 00000000003d0f00 0000000000000000 00007fffb2b6e728
        NIP [c0000000000022fc] rfi_flush_fallback+0x7c/0x80
        LR [00007fffb2d67e48] 0x7fffb2d67e48
        Call Trace:
        Instruction dump:
        e96a0220 e96a02a8 e96a0330 e96a03b8 394a0400 4200ffdc 7d2903a6 e92d0c00
        e94d0c08 e96d0c10 e82d0c18 7db242a6 <4c000024> 7db243a6 7db142a6 f82d0c18
      
      The problem is the signal code assumes TM is enabled when
      CONFIG_PPC_TRANSACTIONAL_MEM is enabled. This may not be the case as
      with P9 powernv or if `ppc_tm=off` is used on P8.
      
      This means any local user can crash the system.
      
      Fix the problem by returning a bad stack frame to the user if they try
      to set the MSR TS bits with sigreturn() on systems where TM is not
      supported.
      
      Found with sigfuz kernel selftest on P9.
      
      This fixes CVE-2019-13648.
      
      Fixes: 2b0a576d ("powerpc: Add new transactional memory state to the signal context")
      Cc: stable@vger.kernel.org # v3.9
      Reported-by: NPraveen Pandey <Praveen.Pandey@in.ibm.com>
      Signed-off-by: NMichael Neuling <mikey@neuling.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20190719050502.405-1-mikey@neuling.orgSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b993a66d
    • G
      powerpc/xive: Fix loop exit-condition in xive_find_target_in_mask() · b9310c56
      Gautham R. Shenoy 提交于
      commit 4d202c8c8ed3822327285747db1765967110b274 upstream.
      
      xive_find_target_in_mask() has the following for(;;) loop which has a
      bug when @first == cpumask_first(@mask) and condition 1 fails to hold
      for every CPU in @mask. In this case we loop forever in the for-loop.
      
        first = cpu;
        for (;;) {
        	  if (cpu_online(cpu) && xive_try_pick_target(cpu)) // condition 1
      		  return cpu;
      	  cpu = cpumask_next(cpu, mask);
      	  if (cpu == first) // condition 2
      		  break;
      
      	  if (cpu >= nr_cpu_ids) // condition 3
      		  cpu = cpumask_first(mask);
        }
      
      This is because, when @first == cpumask_first(@mask), we never hit the
      condition 2 (cpu == first) since prior to this check, we would have
      executed "cpu = cpumask_next(cpu, mask)" which will set the value of
      @cpu to a value greater than @first or to nr_cpus_ids. When this is
      coupled with the fact that condition 1 is not met, we will never exit
      this loop.
      
      This was discovered by the hard-lockup detector while running LTP test
      concurrently with SMT switch tests.
      
       watchdog: CPU 12 detected hard LOCKUP on other CPUs 68
       watchdog: CPU 12 TB:85587019220796, last SMP heartbeat TB:85578827223399 (15999ms ago)
       watchdog: CPU 68 Hard LOCKUP
       watchdog: CPU 68 TB:85587019361273, last heartbeat TB:85576815065016 (19930ms ago)
       CPU: 68 PID: 45050 Comm: hxediag Kdump: loaded Not tainted 4.18.0-100.el8.ppc64le #1
       NIP:  c0000000006f5578 LR: c000000000cba9ec CTR: 0000000000000000
       REGS: c000201fff3c7d80 TRAP: 0100   Not tainted  (4.18.0-100.el8.ppc64le)
       MSR:  9000000002883033 <SF,HV,VEC,VSX,FP,ME,IR,DR,RI,LE>  CR: 24028424  XER: 00000000
       CFAR: c0000000006f558c IRQMASK: 1
       GPR00: c0000000000afc58 c000201c01c43400 c0000000015ce500 c000201cae26ec18
       GPR04: 0000000000000800 0000000000000540 0000000000000800 00000000000000f8
       GPR08: 0000000000000020 00000000000000a8 0000000080000000 c00800001a1beed8
       GPR12: c0000000000b1410 c000201fff7f4c00 0000000000000000 0000000000000000
       GPR16: 0000000000000000 0000000000000000 0000000000000540 0000000000000001
       GPR20: 0000000000000048 0000000010110000 c00800001a1e3780 c000201cae26ed18
       GPR24: 0000000000000000 c000201cae26ed8c 0000000000000001 c000000001116bc0
       GPR28: c000000001601ee8 c000000001602494 c000201cae26ec18 000000000000001f
       NIP [c0000000006f5578] find_next_bit+0x38/0x90
       LR [c000000000cba9ec] cpumask_next+0x2c/0x50
       Call Trace:
       [c000201c01c43400] [c000201cae26ec18] 0xc000201cae26ec18 (unreliable)
       [c000201c01c43420] [c0000000000afc58] xive_find_target_in_mask+0x1b8/0x240
       [c000201c01c43470] [c0000000000b0228] xive_pick_irq_target.isra.3+0x168/0x1f0
       [c000201c01c435c0] [c0000000000b1470] xive_irq_startup+0x60/0x260
       [c000201c01c43640] [c0000000001d8328] __irq_startup+0x58/0xf0
       [c000201c01c43670] [c0000000001d844c] irq_startup+0x8c/0x1a0
       [c000201c01c436b0] [c0000000001d57b0] __setup_irq+0x9f0/0xa90
       [c000201c01c43760] [c0000000001d5aa0] request_threaded_irq+0x140/0x220
       [c000201c01c437d0] [c00800001a17b3d4] bnx2x_nic_load+0x188c/0x3040 [bnx2x]
       [c000201c01c43950] [c00800001a187c44] bnx2x_self_test+0x1fc/0x1f70 [bnx2x]
       [c000201c01c43a90] [c000000000adc748] dev_ethtool+0x11d8/0x2cb0
       [c000201c01c43b60] [c000000000b0b61c] dev_ioctl+0x5ac/0xa50
       [c000201c01c43bf0] [c000000000a8d4ec] sock_do_ioctl+0xbc/0x1b0
       [c000201c01c43c60] [c000000000a8dfb8] sock_ioctl+0x258/0x4f0
       [c000201c01c43d20] [c0000000004c9704] do_vfs_ioctl+0xd4/0xa70
       [c000201c01c43de0] [c0000000004ca274] sys_ioctl+0xc4/0x160
       [c000201c01c43e30] [c00000000000b388] system_call+0x5c/0x70
       Instruction dump:
       78aad182 54a806be 3920ffff 78a50664 794a1f24 7d294036 7d43502a 7d295039
       4182001c 48000034 78a9d182 79291f24 <7d23482a> 2fa90000 409e0020 38a50040
      
      To fix this, move the check for condition 2 after the check for
      condition 3, so that we are able to break out of the loop soon after
      iterating through all the CPUs in the @mask in the problem case. Use
      do..while() to achieve this.
      
      Fixes: 243e2511 ("powerpc/xive: Native exploitation of the XIVE interrupt controller")
      Cc: stable@vger.kernel.org # v4.12+
      Reported-by: NIndira P. Joga <indira.priya@in.ibm.com>
      Signed-off-by: NGautham R. Shenoy <ego@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/1563359724-13931-1-git-send-email-ego@linux.vnet.ibm.comSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b9310c56
    • Z
      x86/speculation/mds: Apply more accurate check on hypervisor platform · 7d20e3ba
      Zhenzhong Duan 提交于
      commit 517c3ba00916383af6411aec99442c307c23f684 upstream.
      
      X86_HYPER_NATIVE isn't accurate for checking if running on native platform,
      e.g. CONFIG_HYPERVISOR_GUEST isn't set or "nopv" is enabled.
      
      Checking the CPU feature bit X86_FEATURE_HYPERVISOR to determine if it's
      running on native platform is more accurate.
      
      This still doesn't cover the platforms on which X86_FEATURE_HYPERVISOR is
      unsupported, e.g. VMware, but there is nothing which can be done about this
      scenario.
      
      Fixes: 8a4b06d391b0 ("x86/speculation/mds: Add sysfs reporting for MDS")
      Signed-off-by: NZhenzhong Duan <zhenzhong.duan@oracle.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/1564022349-17338-1-git-send-email-zhenzhong.duan@oracle.comSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7d20e3ba
    • H
      x86/sysfb_efi: Add quirks for some devices with swapped width and height · 5e87e8b4
      Hans de Goede 提交于
      commit d02f1aa39189e0619c3525d5cd03254e61bf606a upstream.
      
      Some Lenovo 2-in-1s with a detachable keyboard have a portrait screen but
      advertise a landscape resolution and pitch, resulting in a messed up
      display if the kernel tries to show anything on the efifb (because of the
      wrong pitch).
      
      Fix this by adding a new DMI match table for devices which need to have
      their width and height swapped.
      
      At first it was tried to use the existing table for overriding some of the
      efifb parameters, but some of the affected devices have variants with
      different LCD resolutions which will not work with hardcoded override
      values.
      
      Reference: https://bugzilla.redhat.com/show_bug.cgi?id=1730783Signed-off-by: NHans de Goede <hdegoede@redhat.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20190721152418.11644-1-hdegoede@redhat.comSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5e87e8b4
    • S
      sh: prevent warnings when using iounmap · 7bd5902a
      Sam Ravnborg 提交于
      [ Upstream commit 733f0025f0fb43e382b84db0930ae502099b7e62 ]
      
      When building drm/exynos for sh, as part of an allmodconfig build, the
      following warning triggered:
      
        exynos7_drm_decon.c: In function `decon_remove':
        exynos7_drm_decon.c:769:24: warning: unused variable `ctx'
          struct decon_context *ctx = dev_get_drvdata(&pdev->dev);
      
      The ctx variable is only used as argument to iounmap().
      
      In sh - allmodconfig CONFIG_MMU is not defined
      so it ended up in:
      
      \#define __iounmap(addr)	do { } while (0)
      \#define iounmap		__iounmap
      
      Fix the warning by introducing a static inline function for iounmap.
      
      This is similar to several other architectures.
      
      Link: http://lkml.kernel.org/r/20190622114208.24427-1-sam@ravnborg.orgSigned-off-by: NSam Ravnborg <sam@ravnborg.org>
      Reviewed-by: NGeert Uytterhoeven <geert+renesas@glider.be>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Mark Brown <broonie@kernel.org>
      Cc: Inki Dae <inki.dae@samsung.com>
      Cc: Krzysztof Kozlowski <krzk@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      7bd5902a
    • O
      powerpc/eeh: Handle hugepages in ioremap space · 7f775a67
      Oliver O'Halloran 提交于
      [ Upstream commit 33439620680be5225c1b8806579a291e0d761ca0 ]
      
      In commit 4a7b06c157a2 ("powerpc/eeh: Handle hugepages in ioremap
      space") support for using hugepages in the vmalloc and ioremap areas was
      enabled for radix. Unfortunately this broke EEH MMIO error checking.
      
      Detection works by inserting a hook which checks the results of the
      ioreadXX() set of functions.  When a read returns a 0xFFs response we
      need to check for an error which we do by mapping the (virtual) MMIO
      address back to a physical address, then mapping physical address to a
      PCI device via an interval tree.
      
      When translating virt -> phys we currently assume the ioremap space is
      only populated by PAGE_SIZE mappings. If a hugepage mapping is found we
      emit a WARN_ON(), but otherwise handles the check as though a normal
      page was found. In pathalogical cases such as copying a buffer
      containing a lot of 0xFFs from BAR memory this can result in the system
      not booting because it's too busy printing WARN_ON()s.
      
      There's no real reason to assume huge pages can't be present and we're
      prefectly capable of handling them, so do that.
      
      Fixes: 4a7b06c157a2 ("powerpc/eeh: Handle hugepages in ioremap space")
      Reported-by: NSachin Sant <sachinp@linux.vnet.ibm.com>
      Signed-off-by: NOliver O'Halloran <oohall@gmail.com>
      Tested-by: NSachin Sant <sachinp@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20190710150517.27114-1-oohall@gmail.comSigned-off-by: NSasha Levin <sashal@kernel.org>
      7f775a67
    • M
      powerpc/boot: add {get, put}_unaligned_be32 to xz_config.h · 4b9dc73a
      Masahiro Yamada 提交于
      [ Upstream commit 9e005b761e7ad153dcf40a6cba1d681fe0830ac6 ]
      
      The next commit will make the way of passing CONFIG options more robust.
      Unfortunately, it would uncover another hidden issue; without this
      commit, skiroot_defconfig would be broken like this:
      
      |   WRAP    arch/powerpc/boot/zImage.pseries
      | arch/powerpc/boot/wrapper.a(decompress.o): In function `bcj_powerpc.isra.10':
      | decompress.c:(.text+0x720): undefined reference to `get_unaligned_be32'
      | decompress.c:(.text+0x7a8): undefined reference to `put_unaligned_be32'
      | make[1]: *** [arch/powerpc/boot/Makefile;383: arch/powerpc/boot/zImage.pseries] Error 1
      | make: *** [arch/powerpc/Makefile;295: zImage] Error 2
      
      skiroot_defconfig is the only defconfig that enables CONFIG_KERNEL_XZ
      for ppc, which has never been correctly built before.
      
      I figured out the root cause in lib/decompress_unxz.c:
      
      | #ifdef CONFIG_PPC
      | #      define XZ_DEC_POWERPC
      | #endif
      
      CONFIG_PPC is undefined here in the ppc bootwrapper because autoconf.h
      is not included except by arch/powerpc/boot/serial.c
      
      XZ_DEC_POWERPC is not defined, therefore, bcj_powerpc() is not compiled
      for the bootwrapper.
      
      With the next commit passing CONFIG_PPC correctly, we would realize that
      {get,put}_unaligned_be32 was missing.
      
      Unlike the other decompressors, the ppc bootwrapper duplicates all the
      necessary helpers in arch/powerpc/boot/.
      
      The other architectures define __KERNEL__ and pull in helpers for
      building the decompressors.
      
      If ppc bootwrapper had defined __KERNEL__, lib/xz/xz_private.h would
      have included <asm/unaligned.h>:
      
      | #ifdef __KERNEL__
      | #       include <linux/xz.h>
      | #       include <linux/kernel.h>
      | #       include <asm/unaligned.h>
      
      However, doing so would cause tons of definition conflicts since the
      bootwrapper has duplicated everything.
      
      I just added copies of {get,put}_unaligned_be32, following the
      bootwrapper coding convention.
      Signed-off-by: NMasahiro Yamada <yamada.masahiro@socionext.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20190705100144.28785-1-yamada.masahiro@socionext.comSigned-off-by: NSasha Levin <sashal@kernel.org>
      4b9dc73a
    • J
      arm64: assembler: Switch ESB-instruction with a vanilla nop if !ARM64_HAS_RAS · 05959ed8
      James Morse 提交于
      [ Upstream commit 2b68a2a963a157f024c67c0697b16f5f792c8a35 ]
      
      The ESB-instruction is a nop on CPUs that don't implement the RAS
      extensions. This lets us use it in places like the vectors without
      having to use alternatives.
      
      If someone disables CONFIG_ARM64_RAS_EXTN, this instruction still has
      its RAS extensions behaviour, but we no longer read DISR_EL1 as this
      register does depend on alternatives.
      
      This could go wrong if we want to synchronize an SError from a KVM
      guest. On a CPU that has the RAS extensions, but the KConfig option
      was disabled, we consume the pending SError with no chance of ever
      reading it.
      
      Hide the ESB-instruction behind the CONFIG_ARM64_RAS_EXTN option,
      outputting a regular nop if the feature has been disabled.
      Reported-by: NJulien Thierry <julien.thierry@arm.com>
      Signed-off-by: NJames Morse <james.morse@arm.com>
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      05959ed8
    • A
      powerpc/mm: Handle page table allocation failures · d48720ba
      Aneesh Kumar K.V 提交于
      [ Upstream commit 2230ebf6e6dd0b7751e2921b40f6cfe34f09bb16 ]
      
      This fixes kernel crash that arises due to not handling page table allocation
      failures while allocating hugetlb page table.
      
      Fixes: e2b3d202 ("powerpc: Switch 16GB and 16MB explicit hugepages to a different page table format")
      Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      d48720ba
    • C
      powerpc/4xx/uic: clear pending interrupt after irq type/pol change · 52373ab6
      Christian Lamparter 提交于
      [ Upstream commit 3ab3a0689e74e6aa5b41360bc18861040ddef5b1 ]
      
      When testing out gpio-keys with a button, a spurious
      interrupt (and therefore a key press or release event)
      gets triggered as soon as the driver enables the irq
      line for the first time.
      
      This patch clears any potential bogus generated interrupt
      that was caused by the switching of the associated irq's
      type and polarity.
      Signed-off-by: NChristian Lamparter <chunkeey@gmail.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      52373ab6
    • J
      um: Silence lockdep complaint about mmap_sem · 74520144
      Johannes Berg 提交于
      [ Upstream commit 80bf6ceaf9310b3f61934c69b382d4912deee049 ]
      
      When we get into activate_mm(), lockdep complains that we're doing
      something strange:
      
          WARNING: possible circular locking dependency detected
          5.1.0-10252-gb00152307319-dirty #121 Not tainted
          ------------------------------------------------------
          inside.sh/366 is trying to acquire lock:
          (____ptrval____) (&(&p->alloc_lock)->rlock){+.+.}, at: flush_old_exec+0x703/0x8d7
      
          but task is already holding lock:
          (____ptrval____) (&mm->mmap_sem){++++}, at: flush_old_exec+0x6c5/0x8d7
      
          which lock already depends on the new lock.
      
          the existing dependency chain (in reverse order) is:
      
          -> #1 (&mm->mmap_sem){++++}:
                 [...]
                 __lock_acquire+0x12ab/0x139f
                 lock_acquire+0x155/0x18e
                 down_write+0x3f/0x98
                 flush_old_exec+0x748/0x8d7
                 load_elf_binary+0x2ca/0xddb
                 [...]
      
          -> #0 (&(&p->alloc_lock)->rlock){+.+.}:
                 [...]
                 __lock_acquire+0x12ab/0x139f
                 lock_acquire+0x155/0x18e
                 _raw_spin_lock+0x30/0x83
                 flush_old_exec+0x703/0x8d7
                 load_elf_binary+0x2ca/0xddb
                 [...]
      
          other info that might help us debug this:
      
           Possible unsafe locking scenario:
      
                 CPU0                    CPU1
                 ----                    ----
            lock(&mm->mmap_sem);
                                         lock(&(&p->alloc_lock)->rlock);
                                         lock(&mm->mmap_sem);
            lock(&(&p->alloc_lock)->rlock);
      
           *** DEADLOCK ***
      
          2 locks held by inside.sh/366:
           #0: (____ptrval____) (&sig->cred_guard_mutex){+.+.}, at: __do_execve_file+0x12d/0x869
           #1: (____ptrval____) (&mm->mmap_sem){++++}, at: flush_old_exec+0x6c5/0x8d7
      
          stack backtrace:
          CPU: 0 PID: 366 Comm: inside.sh Not tainted 5.1.0-10252-gb00152307319-dirty #121
          Stack:
           [...]
          Call Trace:
           [<600420de>] show_stack+0x13b/0x155
           [<6048906b>] dump_stack+0x2a/0x2c
           [<6009ae64>] print_circular_bug+0x332/0x343
           [<6009c5c6>] check_prev_add+0x669/0xdad
           [<600a06b4>] __lock_acquire+0x12ab/0x139f
           [<6009f3d0>] lock_acquire+0x155/0x18e
           [<604a07e0>] _raw_spin_lock+0x30/0x83
           [<60151e6a>] flush_old_exec+0x703/0x8d7
           [<601a8eb8>] load_elf_binary+0x2ca/0xddb
           [...]
      
      I think it's because in exec_mmap() we have
      
      	down_read(&old_mm->mmap_sem);
      ...
              task_lock(tsk);
      ...
      	activate_mm(active_mm, mm);
      	(which does down_write(&mm->mmap_sem))
      
      I'm not really sure why lockdep throws in the whole knowledge
      about the task lock, but it seems that old_mm and mm shouldn't
      ever be the same (and it doesn't deadlock) so tell lockdep that
      they're different.
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: NRichard Weinberger <richard@nod.at>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      74520144
    • N
      powerpc/xmon: Fix disabling tracing while in xmon · 9fac3948
      Naveen N. Rao 提交于
      [ Upstream commit aaf06665f7ea3ee9f9754e16c1a507a89f1de5b1 ]
      
      Commit ed49f7fd ("powerpc/xmon: Disable tracing when entering
      xmon") added code to disable recording trace entries while in xmon. The
      commit introduced a variable 'tracing_enabled' to record if tracing was
      enabled on xmon entry, and used this to conditionally enable tracing
      during exit from xmon.
      
      However, we are not checking the value of 'fromipi' variable in
      xmon_core() when setting 'tracing_enabled'. Due to this, when secondary
      cpus enter xmon, they will see tracing as being disabled already and
      tracing won't be re-enabled on exit. Fix the same.
      
      Fixes: ed49f7fd ("powerpc/xmon: Disable tracing when entering xmon")
      Signed-off-by: NNaveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      9fac3948
    • Q
      powerpc/cacheflush: fix variable set but not used · a80f67d5
      Qian Cai 提交于
      [ Upstream commit 04db3ede40ae4fc23a5c4237254c4a53bbe4c1f2 ]
      
      The powerpc's flush_cache_vmap() is defined as a macro and never use
      both of its arguments, so it will generate a compilation warning,
      
      lib/ioremap.c: In function 'ioremap_page_range':
      lib/ioremap.c:203:16: warning: variable 'start' set but not used
      [-Wunused-but-set-variable]
      
      Fix it by making it an inline function.
      Signed-off-by: NQian Cai <cai@lca.pw>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      a80f67d5