1. 13 Feb 2019, 5 commits
  2. 31 Jan 2019, 8 commits
  3. 26 Jan 2019, 2 commits
    • x86/topology: Use total_cpus for max logical packages calculation · a4772e8b
      Hui Wang authored
      [ Upstream commit aa02ef099cff042c2a9109782ec2bf1bffc955d4 ]
      
      nr_cpu_ids can be limited on the command line via nr_cpus=. This can break the
      logical package management because it results in a smaller number of packages
      in the kdump kernel.
      
      Consider the following case: a two-socket system, where each socket has 8
      cores, i.e. 16 logical CPUs per socket with HT turned on.
      
       0  1  2  3  4  5  6  7     |    16 17 18 19 20 21 22 23
       cores on socket 0               threads on socket 0
       8  9 10 11 12 13 14 15     |    24 25 26 27 28 29 30 31
       cores on socket 1               threads on socket 1
      
      When the kdump kernel is started with the command line option nr_cpus=16
      after a panic was triggered on one of CPUs 24-31, e.g. 26, the online CPUs
      will be 1-15 and 26 (CPU 0 is disabled in the kdump kernel), ncpus will be 16
      and __max_logical_packages will be 1, even though two packages were actually
      booted.
      
      This issue can be reproduced by setting the kdump option nr_cpus=<number of
      real physical cores> and then triggering a panic on a thread of the last
      socket, for example:
      
      taskset -c 26 echo c > /proc/sysrq-trigger
      
      Use total_cpus, which is not limited by the nr_cpus command line option, to
      calculate the value of __max_logical_packages.
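      
      A sketch of the resulting calculation (names approximate, not the
      verbatim kernel code):
      
        /*
         * total_cpus reflects the full boot-time topology and is not
         * truncated by nr_cpus=, so with 32 CPUs across two packages
         * and nr_cpus=16 this still yields 2 packages, not 1.
         */
        ncpus = cpu_data(0).booted_cores * topology_max_smt_threads();
        __max_logical_packages = DIV_ROUND_UP(total_cpus, ncpus);
      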
      Signed-off-by: Hui Wang <john.wanghui@huawei.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: <guijianfeng@huawei.com>
      Cc: <wencongyang2@huawei.com>
      Cc: <douliyang1@huawei.com>
      Cc: <qiaonuohan@huawei.com>
      Link: https://lkml.kernel.org/r/20181107023643.22174-1-john.wanghui@huawei.com
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      a4772e8b
    • x86/mce: Fix -Wmissing-prototypes warnings · 1d839c72
      Borislav Petkov authored
      [ Upstream commit 68b5e4326e4b8ac9080835005d8254fed0fb3c56 ]
      
      Add the proper includes and make smca_get_name() static.
      
      Also fix an actual bug which the warning uncovered:
      
        arch/x86/kernel/cpu/mcheck/therm_throt.c:395:39: error: conflicting \
        types for ‘smp_thermal_interrupt’
         asmlinkage __visible void __irq_entry smp_thermal_interrupt(struct pt_regs *r)
                                               ^~~~~~~~~~~~~~~~~~~~~
        In file included from arch/x86/kernel/cpu/mcheck/therm_throt.c:29:
        ./arch/x86/include/asm/traps.h:107:17: note: previous declaration of \
      	  ‘smp_thermal_interrupt’ was here
         asmlinkage void smp_thermal_interrupt(void);
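      
      The fix, sketched: make the prototype in traps.h match the definition's
      signature.
      
        /* before: */
        asmlinkage void smp_thermal_interrupt(void);
        /* after, matching the definition in therm_throt.c: */
        asmlinkage void smp_thermal_interrupt(struct pt_regs *regs);
      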
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: Yi Wang <wang.yi59@zte.com.cn>
      Cc: Michael Matz <matz@suse.de>
      Cc: x86@kernel.org
      Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1811081633160.1549@nanos.tec.linutronix.de
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      1d839c72
  4. 23 Jan 2019, 1 commit
  5. 17 Jan 2019, 1 commit
    • x86, modpost: Replace last remnants of RETPOLINE with CONFIG_RETPOLINE · 4bef2bac
      WANG Chao authored
      commit e4f358916d528d479c3c12bd2fd03f2d5a576380 upstream.
      
      Commit
      
        4cd24de3a098 ("x86/retpoline: Make CONFIG_RETPOLINE depend on compiler support")
      
      replaced the RETPOLINE define with CONFIG_RETPOLINE checks. Remove the
      remaining pieces.
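      
      The shape of the cleanup, as an illustrative sketch (not the actual
      hunks):
      
        #ifdef CONFIG_RETPOLINE		/* previously: #ifdef RETPOLINE */
        	/* retpoline-specific handling */
        #endif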
      
       [ bp: Massage commit message. ]
      
      Fixes: 4cd24de3a098 ("x86/retpoline: Make CONFIG_RETPOLINE depend on compiler support")
      Signed-off-by: WANG Chao <chao.wang@ucloud.cn>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Reviewed-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
      Reviewed-by: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Jessica Yu <jeyu@kernel.org>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
      Cc: Michal Marek <michal.lkml@markovi.net>
      Cc: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: linux-kbuild@vger.kernel.org
      Cc: srinivas.eeda@oracle.com
      Cc: stable <stable@vger.kernel.org>
      Cc: x86-ml <x86@kernel.org>
      Link: https://lkml.kernel.org/r/20181210163725.95977-1-chao.wang@ucloud.cn
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      4bef2bac
  6. 13 Jan 2019, 2 commits
  7. 10 Jan 2019, 4 commits
    • KVM: nVMX: Free the VMREAD/VMWRITE bitmaps if alloc_kvm_area() fails · c9dae887
      Sean Christopherson authored
      commit 1b3ab5ad1b8ad99bae76ec583809c5f5a31c707c upstream.
      
      Fixes: 34a1cd60 ("kvm: x86: vmx: move some vmx setting from vmx_init() to hardware_setup()")
      Cc: stable@vger.kernel.org
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      c9dae887
    • KVM: x86: Use jmp to invoke kvm_spurious_fault() from .fixup · edcf33b1
      Sean Christopherson authored
      commit e81434995081fd7efb755fd75576b35dbb0850b1 upstream.
      
      ____kvm_handle_fault_on_reboot() provides a generic exception fixup
      handler that is used to cleanly handle faults on VMX/SVM instructions
      during reboot (or at least try to).  If there isn't a reboot in
      progress, ____kvm_handle_fault_on_reboot() treats any exception as
      fatal to KVM and invokes kvm_spurious_fault(), which in turn generates
      a BUG() to get a stack trace and die.
      
      When it was originally added by commit 4ecac3fd ("KVM: Handle
      virtualization instruction #UD faults during reboot"), the "call" to
      kvm_spurious_fault() was handcoded as PUSH+JMP, where the PUSH'd value
      is the RIP of the faulting instruction.
      
      The PUSH+JMP trickery is necessary because the exception fixup handler
      code lies outside of its associated function, e.g. right after the
      function.  An actual CALL from the .fixup code would show a slightly
      bogus stack trace, e.g. an extra "random" function would be inserted
      into the trace, as the return RIP on the stack would point to no known
      function (and the unwinder will likely try to guess who owns the RIP).
      
      Unfortunately, the JMP was replaced with a CALL when the macro was
      reworked to not spin indefinitely during reboot (commit b7c4145b
      "KVM: Don't spin on virt instruction faults during reboot").  This
      causes the aforementioned behavior where a bogus function is inserted
      into the stack trace, e.g. my builds like to blame free_kvm_area().
      
      Revert the CALL back to a JMP.  The changelog for commit b7c4145b
      ("KVM: Don't spin on virt instruction faults during reboot") contains
      nothing that indicates the switch to CALL was deliberate.  This is
      backed up by the fact that the PUSH <insn RIP> was left intact.
      
      Note that an alternative to the PUSH+JMP magic would be to JMP back
      to the "real" code and CALL from there, but that would require adding
      a JMP in the non-faulting path to avoid calling kvm_spurious_fault()
      and would add no value, i.e. the stack trace would be the same.
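      
      For reference, this is approximately the shape of the macro with the
      JMP restored (reproduced as a sketch; the upstream file is
      authoritative):
      
        #define ____kvm_handle_fault_on_reboot(insn, cleanup_insn)	\
        	"666: " insn "\n\t"					\
        	"668: \n\t"						\
        	".pushsection .fixup, \"ax\" \n"			\
        	"667: \n\t"						\
        	cleanup_insn "\n\t"					\
        	"cmpb $0, kvm_rebooting \n\t"				\
        	"jne 668b \n\t"		/* rebooting: skip quietly */	\
        	"push $666b \n\t"	/* RIP of the faulting insn */	\
        	"jmp kvm_spurious_fault \n\t"	/* JMP, not CALL */	\
        	".popsection \n\t"					\
        	_ASM_EXTABLE(666b, 667b)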
      
      Using CALL:
      
      ------------[ cut here ]------------
      kernel BUG at /home/sean/go/src/kernel.org/linux/arch/x86/kvm/x86.c:356!
      invalid opcode: 0000 [#1] SMP
      CPU: 4 PID: 1057 Comm: qemu-system-x86 Not tainted 4.20.0-rc6+ #75
      Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
      RIP: 0010:kvm_spurious_fault+0x5/0x10 [kvm]
      Code: <0f> 0b 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 55 49 89 fd 41
      RSP: 0018:ffffc900004bbcc8 EFLAGS: 00010046
      RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffffffffffff
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
      RBP: ffff888273fd8000 R08: 00000000000003e8 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000784 R12: ffffc90000371fb0
      R13: 0000000000000000 R14: 000000026d763cf4 R15: ffff888273fd8000
      FS:  00007f3d69691700(0000) GS:ffff888277800000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 000055f89bc56fe0 CR3: 0000000271a5a001 CR4: 0000000000362ee0
      Call Trace:
       free_kvm_area+0x1044/0x43ea [kvm_intel]
       ? vmx_vcpu_run+0x156/0x630 [kvm_intel]
       ? kvm_arch_vcpu_ioctl_run+0x447/0x1a40 [kvm]
       ? kvm_vcpu_ioctl+0x368/0x5c0 [kvm]
       ? kvm_vcpu_ioctl+0x368/0x5c0 [kvm]
       ? __set_task_blocked+0x38/0x90
       ? __set_current_blocked+0x50/0x60
       ? __fpu__restore_sig+0x97/0x490
       ? do_vfs_ioctl+0xa1/0x620
       ? __x64_sys_futex+0x89/0x180
       ? ksys_ioctl+0x66/0x70
       ? __x64_sys_ioctl+0x16/0x20
       ? do_syscall_64+0x4f/0x100
       ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
      Modules linked in: vhost_net vhost tap kvm_intel kvm irqbypass bridge stp llc
      ---[ end trace 9775b14b123b1713 ]---
      
      Using JMP:
      
      ------------[ cut here ]------------
      kernel BUG at /home/sean/go/src/kernel.org/linux/arch/x86/kvm/x86.c:356!
      invalid opcode: 0000 [#1] SMP
      CPU: 6 PID: 1067 Comm: qemu-system-x86 Not tainted 4.20.0-rc6+ #75
      Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
      RIP: 0010:kvm_spurious_fault+0x5/0x10 [kvm]
      Code: <0f> 0b 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 55 49 89 fd 41
      RSP: 0018:ffffc90000497cd0 EFLAGS: 00010046
      RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffffffffffff
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
      RBP: ffff88827058bd40 R08: 00000000000003e8 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000784 R12: ffffc90000369fb0
      R13: 0000000000000000 R14: 00000003c8fc6642 R15: ffff88827058bd40
      FS:  00007f3d7219e700(0000) GS:ffff888277900000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007f3d64001000 CR3: 0000000271c6b004 CR4: 0000000000362ee0
      Call Trace:
       vmx_vcpu_run+0x156/0x630 [kvm_intel]
       ? kvm_arch_vcpu_ioctl_run+0x447/0x1a40 [kvm]
       ? kvm_vcpu_ioctl+0x368/0x5c0 [kvm]
       ? kvm_vcpu_ioctl+0x368/0x5c0 [kvm]
       ? __set_task_blocked+0x38/0x90
       ? __set_current_blocked+0x50/0x60
       ? __fpu__restore_sig+0x97/0x490
       ? do_vfs_ioctl+0xa1/0x620
       ? __x64_sys_futex+0x89/0x180
       ? ksys_ioctl+0x66/0x70
       ? __x64_sys_ioctl+0x16/0x20
       ? do_syscall_64+0x4f/0x100
       ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
      Modules linked in: vhost_net vhost tap kvm_intel kvm irqbypass bridge stp llc
      ---[ end trace f9daedb85ab3ddba ]---
      
      Fixes: b7c4145b ("KVM: Don't spin on virt instruction faults during reboot")
      Cc: stable@vger.kernel.org
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      edcf33b1
    • x86/mm: Drop usage of __flush_tlb_all() in kernel_physical_mapping_init() · 49102719
      Dan Williams authored
      commit ba6f508d0ec4adb09f0a939af6d5e19cdfa8667d upstream.
      
      Commit:
      
        f77084d96355 "x86/mm/pat: Disable preemption around __flush_tlb_all()"
      
      addressed a case where __flush_tlb_all() is called without preemption
      being disabled. It also left a warning to catch other cases where
      preemption is not disabled.
      
      That warning triggers for the memory hotplug path which is also used for
      persistent memory enabling:
      
       WARNING: CPU: 35 PID: 911 at ./arch/x86/include/asm/tlbflush.h:460
       RIP: 0010:__flush_tlb_all+0x1b/0x3a
       [..]
       Call Trace:
        phys_pud_init+0x29c/0x2bb
        kernel_physical_mapping_init+0xfc/0x219
        init_memory_mapping+0x1a5/0x3b0
        arch_add_memory+0x2c/0x50
        devm_memremap_pages+0x3aa/0x610
        pmem_attach_disk+0x585/0x700 [nd_pmem]
      
      Andy wondered why a path that can sleep was using __flush_tlb_all() [1]
      and Dave confirmed the expectation for TLB flush is for modifying /
      invalidating existing PTE entries, but not initial population [2]. Drop
      the usage of __flush_tlb_all() in phys_{p4d,pud,pmd}_init() on the
      expectation that this path is only ever populating empty entries for the
      linear map. Note, at linear map teardown time there is a call to the
      all-cpu flush_tlb_all() to invalidate the removed mappings.
      
      [1]: https://lkml.kernel.org/r/9DFD717D-857D-493D-A606-B635D72BAC21@amacapital.net
      [2]: https://lkml.kernel.org/r/749919a4-cdb1-48a3-adb4-adb81a5fa0b5@intel.com
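      
      The shape of the change, sketched (illustrative, not the exact hunks):
      
        for (i = 0; i < PTRS_PER_PUD; i++, pud++) {
        	/* populating a previously-empty linear-map entry */
        	set_pud(pud, __pud(pmd_phys | _PAGE_TABLE));
        	/* __flush_tlb_all();  -- dropped: no TLB entry can be
        	 * cached for an entry that was never present */
        }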
      
      [ mingo: Minor readability edits. ]
      Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
      Reported-by: Andy Lutomirski <luto@kernel.org>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: <stable@vger.kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: dave.hansen@intel.com
      Fixes: f77084d96355 ("x86/mm/pat: Disable preemption around __flush_tlb_all()")
      Link: http://lkml.kernel.org/r/154395944713.32119.15611079023837132638.stgit@dwillia2-desk3.amr.corp.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      49102719
    • x86/speculation/l1tf: Drop the swap storage limit restriction when l1tf=off · 86ba6f66
      Michal Hocko authored
      commit 5b5e4d623ec8a34689df98e42d038a3b594d2ff9 upstream.
      
      Swap storage is restricted to max_swapfile_size (~16TB on x86_64) whenever
      the system is deemed affected by the L1TF vulnerability. Even though the limit
      is quite high for most deployments, it is too restrictive for deployments
      that are willing to live with the mitigation disabled.
      
      We have a customer deploying 8x 6.4TB PCIe/NVMe SSD swap devices, which is
      clearly over the limit.
      
      Drop the swap restriction when l1tf=off is specified. It also doesn't make
      much sense to warn about too much memory for the l1tf mitigation when it is
      forcefully disabled by the administrator.
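      
      A sketch of the resulting check (close to the upstream change; the
      surrounding context may differ):
      
        unsigned long max_swapfile_size(void)
        {
        	unsigned long pages = generic_max_swapfile_size();
        
        	/* Clamp to MAX_PA/2 only while the mitigation is active. */
        	if (boot_cpu_has_bug(X86_BUG_L1TF) &&
        	    l1tf_mitigation != L1TF_MITIGATION_OFF)
        		pages = min_t(unsigned long, l1tf_pfn_limit(), pages);
        
        	return pages;
        }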
      
      [ tglx: Folded the documentation delta change ]
      
      Fixes: 377eeaa8 ("x86/speculation/l1tf: Limit swap file size to MAX_PA/2")
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Pavel Tatashin <pasha.tatashin@soleen.com>
      Reviewed-by: Andi Kleen <ak@linux.intel.com>
      Acked-by: Jiri Kosina <jkosina@suse.cz>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: <linux-mm@kvack.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20181113184910.26697-1-mhocko@kernel.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      86ba6f66
  8. 29 Dec 2018, 7 commits
    • x86/intel_rdt: Ensure a CPU remains online for the region's pseudo-locking sequence · 0a95cba5
      Reinette Chatre authored
      commit 80b71c340f17705ec145911b9a193ea781811b16 upstream.
      
      The user triggers the creation of a pseudo-locked region when writing
      the requested schemata to the schemata resctrl file. The pseudo-locking
      of a region is required to be done on a CPU that is associated with the
      cache on which the pseudo-locked region will reside. In order to run the
      locking code on a specific CPU, the needed CPU has to be selected and
      ensured to remain online during the entire locking sequence.
      
      At this time, the cpu_hotplug_lock is not taken during pseudo-lock
      region creation, so it is possible for a CPU to be selected to run
      the pseudo-locking code and then for that CPU to go offline before the
      thread is able to run on it.
      
      Fix this by taking the cpu_hotplug_lock for the time during which the CPU
      on which the code has to run must be controlled. Since the cpu_hotplug_lock
      is always taken before rdtgroup_mutex, the lock order is maintained.
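      
      The locking pattern, sketched (the code around the critical section is
      illustrative):
      
        cpus_read_lock();		/* cpu_hotplug_lock, taken first... */
        mutex_lock(&rdtgroup_mutex);	/* ...then rdtgroup_mutex */
      
        /* select a CPU on the cache domain and run the pseudo-locking
         * thread; the CPU cannot go offline while we hold the lock */
      
        mutex_unlock(&rdtgroup_mutex);
        cpus_read_unlock();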
      
      Fixes: e0bdfe8e ("x86/intel_rdt: Support creation/removal of pseudo-locked region")
      Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: gavin.hindman@intel.com
      Cc: jithu.joseph@intel.com
      Cc: stable <stable@vger.kernel.org>
      Cc: x86-ml <x86@kernel.org>
      Link: https://lkml.kernel.org/r/b7b17432a80f95a1fa21a1698ba643014f58ad31.1544476425.git.reinette.chatre@intel.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      0a95cba5
    • x86/vdso: Pass --eh-frame-hdr to the linker · 56f7bfac
      Alistair Strachan authored
      commit cd01544a268ad8ee5b1dfe42c4393f1095f86879 upstream.
      
      Commit
      
        379d98dd ("x86: vdso: Use $LD instead of $CC to link")
      
      accidentally broke unwinding from userspace, because ld would strip the
      .eh_frame sections when linking.
      
      Originally, the compiler would implicitly add --eh-frame-hdr when
      invoking the linker, but when this Makefile was converted from invoking
      ld via the compiler, to invoking it directly (like vmlinux does),
      the flag was missed. (The EH_FRAME section is important for the VDSO
      shared libraries, but not for vmlinux.)
      
      Fix the problem by explicitly specifying --eh-frame-hdr, which restores
      parity with the old method.
      
      See relevant bug reports for additional info:
      
        https://bugzilla.kernel.org/show_bug.cgi?id=201741
        https://bugzilla.redhat.com/show_bug.cgi?id=1659295
      
      Fixes: 379d98dd ("x86: vdso: Use $LD instead of $CC to link")
      Reported-by: Florian Weimer <fweimer@redhat.com>
      Reported-by: Carlos O'Donell <carlos@redhat.com>
      Reported-by: "H. J. Lu" <hjl.tools@gmail.com>
      Signed-off-by: Alistair Strachan <astrachan@google.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Tested-by: Laura Abbott <labbott@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Carlos O'Donell <carlos@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Joel Fernandes <joel@joelfernandes.org>
      Cc: kernel-team@android.com
      Cc: Laura Abbott <labbott@redhat.com>
      Cc: stable <stable@vger.kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: X86 ML <x86@kernel.org>
      Link: https://lkml.kernel.org/r/20181214223637.35954-1-astrachan@google.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      56f7bfac
    • x86/mm: Fix decoy address handling vs 32-bit builds · 1e3b98b2
      Dan Williams authored
      commit 51c3fbd89d7554caa3290837604309f8d8669d99 upstream.
      
      A decoy address is used by set_mce_nospec() to update the cache attributes
      for a page that may contain poison (multi-bit ECC error) while attempting
      to minimize the possibility of triggering a speculative access to that
      page.
      
      When reserve_memtype() is handling a decoy address it needs to convert it
      to its real physical alias. The conversion, AND'ing with __PHYSICAL_MASK,
      is broken for a 32-bit physical mask and reserve_memtype() is passed the
      last physical page. Gert reports triggering the:
      
          BUG_ON(start >= end);
      
      ...assertion when running a 32-bit non-PAE build on a platform that has
      a driver resource at the top of physical memory:
      
          BIOS-e820: [mem 0x00000000fff00000-0x00000000ffffffff] reserved
      
      Given that the decoy address scheme is only targeted at 64-bit builds and
      assumes that the top of physical address space is free for use as a decoy
      address range, simply bypass address sanitization in the 32-bit case.
      
      Lastly, there was no need to crash the system when this failure occurred,
      and no need to crash future systems if the assumptions of decoy addresses
      are ever violated. Change the BUG_ON() to a WARN() with an error return.
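      
      Both parts of the fix, sketched (names simplified):
      
        static u64 sanitize_phys(u64 address)
        {
        	/* Decoy addresses only exist on 64-bit; on 32-bit the top
        	 * of the physical space can be a real firmware resource. */
        	if (IS_ENABLED(CONFIG_X86_64))
        		return address & __PHYSICAL_MASK;
        	return address;
        }
      
        /* ...and in reserve_memtype(), don't crash the machine: */
        if (WARN_ON_ONCE(start >= end))		/* was: BUG_ON(start >= end) */
        	return -EINVAL;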
      
      Fixes: 510ee090 ("x86/mm/pat: Prepare {reserve, free}_memtype() for...")
      Reported-by: Gert Robben <t2@gert.gr>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: Gert Robben <t2@gert.gr>
      Cc: stable@vger.kernel.org
      Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: platform-driver-x86@vger.kernel.org
      Cc: <stable@vger.kernel.org>
      Link: https://lkml.kernel.org/r/154454337985.789277.12133288391664677775.stgit@dwillia2-desk3.amr.corp.intel.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      1e3b98b2
    • x86/mtrr: Don't copy uninitialized gentry fields back to userspace · c623326a
      Colin Ian King authored
      commit 32043fa065b51e0b1433e48d118821c71b5cd65d upstream.
      
      Currently the copy_to_user() of data in the gentry struct copies
      uninitialized data in the _pad field from the stack to userspace.
      
      Fix this by explicitly memset'ing gentry to zero; this also zeroes any
      compiler-added padding fields that may be in the struct (currently there
      are none).
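      
      The essence of the fix, as a sketch:
      
        struct mtrr_gentry gentry;
      
        memset(&gentry, 0, sizeof(gentry));	/* zeroes _pad and any padding */
      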
      
      Detected by CoverityScan, CID#200783 ("Uninitialized scalar variable")
      
      Fixes: b263b31e ("x86, mtrr: Use explicit sizing and padding for the 64-bit ioctls")
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Tyler Hicks <tyhicks@canonical.com>
      Cc: security@kernel.org
      Link: https://lkml.kernel.org/r/20181218172956.1440-1-colin.king@canonical.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      c623326a
    • KVM: Fix UAF in nested posted interrupt processing · 1972ca04
      Cfir Cohen authored
      commit c2dd5146e9fe1f22c77c1b011adf84eea0245806 upstream.
      
      nested_get_vmcs12_pages() processes the posted_intr address in vmcs12. It
      caches the kmap()ed page object and pointer; however, it doesn't handle
      errors correctly: it's possible to cache a valid pointer, then release
      the page and later dereference the dangling pointer.
      
      I was able to reproduce with the following steps:
      
      1. Call vmlaunch with valid posted_intr_desc_addr but an invalid
      MSR_EFER. This causes nested_get_vmcs12_pages() to cache the kmap()ed
      pi_desc_page and pi_desc. Later the invalid EFER value fails
      check_vmentry_postreqs() which fails the first vmlaunch.
      
      2. Call vmlaunch with a valid EFER but an invalid posted_intr_desc_addr
      (I set it to 2G - 0x80). The second time we call nested_get_vmcs12_pages(),
      pi_desc_page is unmapped and released and set to NULL
      (the "shouldn't happen" clause). Due to the invalid
      posted_intr_desc_addr, kvm_vcpu_gpa_to_page() fails and
      nested_get_vmcs12_pages() returns. It doesn't return an error value so
      vmlaunch proceeds. Note that at this time we have a dangling pointer in
      vmx->nested.pi_desc and POSTED_INTR_DESC_ADDR in L0's vmcs.
      
      3. Issue an IPI in L2 guest code. This triggers a call to
      vmx_complete_nested_posted_interrupt() and pi_test_and_clear_on() which
      dereferences the dangling pointer.
      
      Vulnerable code requires nested and enable_apicv variables to be set to
      true. The host CPU must also support posted interrupts.
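      
      The fix, sketched (close to the upstream error path): unmap and release
      the page on failure and, crucially, clear the cached pointer.
      
        if (vmx->nested.pi_desc_page) {
        	kunmap(vmx->nested.pi_desc_page);
        	kvm_release_page_dirty(vmx->nested.pi_desc_page);
        	vmx->nested.pi_desc_page = NULL;
        	vmx->nested.pi_desc = NULL;	/* the dangling pointer */
        }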
      
      Fixes: 5e2f30b7 "KVM: nVMX: get rid of nested_get_page()"
      Cc: stable@vger.kernel.org
      Reviewed-by: Andy Honig <ahonig@google.com>
      Signed-off-by: Cfir Cohen <cfir@google.com>
      Reviewed-by: Liran Alon <liran.alon@oracle.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      1972ca04
    • kvm: x86: Add AMD's EX_CFG to the list of ignored MSRs · 229468c6
      Eduardo Habkost authored
      commit 0e1b869fff60c81b510c2d00602d778f8f59dd9a upstream.
      
      Some guest OSes (including Windows 10) write to MSR 0xc001102c
      in some cases (possibly while trying to apply a CPU erratum).
      Make KVM ignore reads and writes to that MSR, so the guest won't
      crash.
      
      The MSR is documented as "Execution Unit Configuration (EX_CFG)",
      at AMD's "BIOS and Kernel Developer's Guide (BKDG) for AMD Family
      15h Models 00h-0Fh Processors".
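      
      A sketch of the handling in the MSR get path (the set path simply
      ignores the write; the case placement is illustrative):
      
        case 0xc001102c:		/* AMD EX_CFG */
        	msr_info->data = 0;	/* reads return 0, no #GP */
        	break;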
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      229468c6
    • KVM: X86: Fix NULL deref in vcpu_scan_ioapic · 76281d12
      Wanpeng Li authored
      commit dcbd3e49c2f0b2c2d8a321507ff8f3de4af76d7c upstream.
      
      Reported by syzkaller:
      
          CPU: 1 PID: 5962 Comm: syz-executor118 Not tainted 4.20.0-rc6+ #374
          Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
          RIP: 0010:kvm_apic_hw_enabled arch/x86/kvm/lapic.h:169 [inline]
          RIP: 0010:vcpu_scan_ioapic arch/x86/kvm/x86.c:7449 [inline]
          RIP: 0010:vcpu_enter_guest arch/x86/kvm/x86.c:7602 [inline]
          RIP: 0010:vcpu_run arch/x86/kvm/x86.c:7874 [inline]
          RIP: 0010:kvm_arch_vcpu_ioctl_run+0x5296/0x7320 arch/x86/kvm/x86.c:8074
          Call Trace:
      	 kvm_vcpu_ioctl+0x5c8/0x1150 arch/x86/kvm/../../../virt/kvm/kvm_main.c:2596
      	 vfs_ioctl fs/ioctl.c:46 [inline]
      	 file_ioctl fs/ioctl.c:509 [inline]
      	 do_vfs_ioctl+0x1de/0x1790 fs/ioctl.c:696
      	 ksys_ioctl+0xa9/0xd0 fs/ioctl.c:713
      	 __do_sys_ioctl fs/ioctl.c:720 [inline]
      	 __se_sys_ioctl fs/ioctl.c:718 [inline]
      	 __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
      	 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
      	 entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      The reason is that the testcase writes the Hyper-V SynIC HV_X64_MSR_SINT14
      MSR, which triggers the scan-ioapic logic to load SynIC vectors into the
      EOI exit bitmap. However, the irqchip is not initialized by this simple
      testcase, so the ioapic/apic objects should not be accessed.
      
      Fix this by also checking whether the APIC is present.
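      
      The guard, sketched (close to the upstream check):
      
        static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu)
        {
        	if (!kvm_apic_present(vcpu))
        		return;		/* no in-kernel APIC: nothing to scan */
      
        	/* ... proceed to build the EOI exit bitmap ... */
        }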
      
      Reported-by: syzbot+39810e6c400efadfef71@syzkaller.appspotmail.com
      Cc: stable@vger.kernel.org
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      76281d12
  9. 21 Dec 2018, 2 commits
    • x86/earlyprintk/efi: Fix infinite loop on some screen widths · 985dea32
      YiFei Zhu authored
      [ Upstream commit 79c2206d369b87b19ac29cb47601059b6bf5c291 ]
      
      An affected screen resolution is 1366 x 768, which width is not
      divisible by 8, the default font width. On such screens, when longer
      lines are earlyprintk'ed, overflow-to-next-line can never trigger,
      due to the left-most x-coordinate of the next character always less
      than the screen width. Earlyprintk will infinite loop in trying to
      print the rest of the string but unable to, due to the line being
      full.
      
      This patch makes the trigger consider the right-most x-coordinate,
      instead of left-most, as the value to compare against the screen
      width threshold.
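      
      The changed comparison, sketched (variable names follow the driver's
      style but may not match exactly):
      
        /* wrap when the glyph's right edge would cross the screen edge;
         * previously only the left edge (efi_x) was compared */
        if (efi_x + font->width > si->lfb_width) {
        	efi_x = 0;
        	efi_y += font->height;
        }
      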
      Signed-off-by: YiFei Zhu <zhuyifei1999@gmail.com>
      Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arend van Spriel <arend.vanspriel@broadcom.com>
      Cc: Bhupesh Sharma <bhsharma@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Eric Snowberg <eric.snowberg@oracle.com>
      Cc: Hans de Goede <hdegoede@redhat.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: Jon Hunter <jonathanh@nvidia.com>
      Cc: Julien Thierry <julien.thierry@arm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Cc: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: Nathan Chancellor <natechancellor@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
      Cc: Sedat Dilek <sedat.dilek@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-efi@vger.kernel.org
      Link: http://lkml.kernel.org/r/20181129171230.18699-12-ard.biesheuvel@linaro.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      985dea32
    • locking/qspinlock, x86: Provide liveness guarantee · 26586875
      Peter Zijlstra authored
      commit 7aa54be2976550f17c11a1c3e3630002dea39303 upstream.
      
      On x86 we cannot do fetch_or() with a single instruction and thus end up
      using a cmpxchg loop, which reduces determinism. Replace the fetch_or()
      with a composite operation: tas-pending + load.
      
      Using two instructions of course opens a window we previously did not
      have. Consider the scenario:
      
      	CPU0		CPU1		CPU2
      
       1)	lock
      	  trylock -> (0,0,1)
      
       2)			lock
      			  trylock /* fail */
      
       3)	unlock -> (0,0,0)
      
       4)					lock
      					  trylock -> (0,0,1)
      
       5)			  tas-pending -> (0,1,1)
      			  load-val <- (0,1,0) from 3
      
       6)			  clear-pending-set-locked -> (0,0,1)
      
      			  FAIL: _2_ owners
      
      where 5) is our new composite operation. When we consider each part of
      the qspinlock state as a separate variable (as we can when
      _Q_PENDING_BITS == 8) then the above is entirely possible, because
      tas-pending will only RmW the pending byte, so the later load is able
      to observe prior tail and lock state (but not earlier than its own
      trylock, which operates on the whole word, due to coherence).
      
      To avoid this we need 2 things:
      
       - the load must come after the tas-pending (obviously, otherwise it
         can trivially observe prior state).
      
       - the tas-pending must be a full-word RmW instruction (it cannot be an
         XCHGB, for example), such that we cannot observe other state prior to
         setting pending.
      
      On x86 we can realize this by using "LOCK BTS m32, r32" for
      tas-pending followed by a regular load.
      
      Note that observing later state is not a problem:
      
       - if we fail to observe a later unlock, we'll simply spin-wait for
         that store to become visible.
      
       - if we observe a later xchg_tail(), there is no difference from that
         xchg_tail() having taken place before the tas-pending.
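      
      A simplified sketch of the x86 implementation (the kernel uses the
      GEN_BINARY_RMWcc() machinery; this open-coded form is illustrative):
      
        static __always_inline u32
        queued_fetch_set_pending_acquire(struct qspinlock *lock)
        {
        	bool pending;
      
        	/* Full-word "LOCK BTS": atomically set the pending bit
        	 * while RmW-ing the whole 32-bit lock word. */
        	asm volatile (LOCK_PREFIX "btsl %2, %0"
        		      : "+m" (lock->val.counter), "=@ccc" (pending)
        		      : "Ir" (_Q_PENDING_OFFSET)
        		      : "memory");
      
        	/* The load comes after the RmW, as argued above. */
        	return (pending * _Q_PENDING_VAL) |
        	       (atomic_read(&lock->val) & ~_Q_PENDING_MASK);
        }
      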
      Suggested-by: Will Deacon <will.deacon@arm.com>
      Reported-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Will Deacon <will.deacon@arm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: andrea.parri@amarulasolutions.com
      Cc: longman@redhat.com
      Fixes: 59fb586b ("locking/qspinlock: Remove unbounded cmpxchg() loop from locking slowpath")
      Link: https://lkml.kernel.org/r/20181003130957.183726335@infradead.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      [bigeasy: GEN_BINARY_RMWcc macro redo]
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      26586875
  10. 20 Dec 2018, 1 commit
  11. 17 Dec 2018, 4 commits
    • Revert "xen/balloon: Mark unallocated host memory as UNUSABLE" · a9d79a07
      Igor Druzhinin authored
      [ Upstream commit 123664101aa2156d05251704fc63f9bcbf77741a ]
      
      This reverts commit b3cf8528.
      
      That commit unintentionally broke Xen balloon memory hotplug with
      "hotplug_unpopulated" set to 1. As soon as the "System RAM" resource
      got assigned under a new "Unusable memory" resource in the IO/Mem tree,
      any attempt to online this memory would fail due to the general kernel
      restriction that "System RAM" resources may live at the 1st level only.
      
      The original issue that the commit tried to work around, fa564ad9
      ("x86/PCI: Enable a 64bit BAR on AMD Family 15h (Models 00-1f, 30-3f,
      60-7f)"), was also amended by the follow-up 03a55173 ("x86/PCI: Move
      and shrink AMD 64-bit window to avoid conflict"), which made the
      original fix to Xen ballooning unnecessary.
      Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
      Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: Juergen Gross <jgross@suse.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      a9d79a07
    • x86/kvm/vmx: fix old-style function declaration · bf1b47f3
      Yi Wang authored
      [ Upstream commit 1e4329ee ]
      
      An inline keyword that is not at the beginning of the function
      declaration triggers the following build warnings, so let's fix them:
      
      arch/x86/kvm/vmx.c:1309:1: warning: ‘inline’ is not at beginning of declaration [-Wold-style-declaration]
      arch/x86/kvm/vmx.c:5947:1: warning: ‘inline’ is not at beginning of declaration [-Wold-style-declaration]
      arch/x86/kvm/vmx.c:5985:1: warning: ‘inline’ is not at beginning of declaration [-Wold-style-declaration]
      arch/x86/kvm/vmx.c:6023:1: warning: ‘inline’ is not at beginning of declaration [-Wold-style-declaration]
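      
      The fix is purely stylistic, e.g.:
      
        static void inline foo(void) { }	/* old-style declaration */
        static inline void foo(void) { }	/* fixed */
      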
      Signed-off-by: Yi Wang <wang.yi59@zte.com.cn>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      bf1b47f3
    • KVM: x86: fix empty-body warnings · d6b1692d
      Yi Wang authored
      [ Upstream commit 354cb410 ]
      
      We get the following warnings about empty statements when building
      with 'W=1':
      
      arch/x86/kvm/lapic.c:632:53: warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body]
      arch/x86/kvm/lapic.c:1907:42: warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body]
      arch/x86/kvm/lapic.c:1936:65: warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body]
      arch/x86/kvm/lapic.c:1975:44: warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body]
      
      Rework the debug helper macro to get rid of these warnings.
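      
      The usual rework, sketched (matching the upstream change in spirit):
      
        /* before: expands to nothing, leaving "if (...) ;" bodies */
        #define apic_debug(fmt, arg...)
      
        /* after: a statement-shaped no-op keeps -Wempty-body quiet */
        #define apic_debug(fmt, arg...) do {} while (0)
      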
      Signed-off-by: Yi Wang <wang.yi59@zte.com.cn>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      d6b1692d
    • KVM: VMX: Update shared MSRs to be saved/restored on MSR_EFER.LMA changes · 3c7670d5
      Liran Alon authored
      [ Upstream commit f48b4711dd6e1cf282f9dfd159c14a305909c97c ]
      
      When the guest transitions from/to long mode by modifying MSR_EFER.LMA,
      the list of shared MSRs to be saved/restored on guest<->host
      transitions is updated (see the vmx_set_efer() call to setup_msrs()).
      
      On every entry to guest, vcpu_enter_guest() calls
      vmx_prepare_switch_to_guest(). This function should also take care
      of setting the shared MSRs to be saved/restored. However, the
      function does nothing in case we are already running with loaded
      guest state (vmx->loaded_cpu_state != NULL).
      
      This means that even when the guest modifies MSR_EFER.LMA, which results
      in updating the list of shared MSRs, the update isn't taken into account
      by vmx_prepare_switch_to_guest() because it happens while we are
      running with loaded guest state.
      
      To fix the above-mentioned issue, add a flag to mark that the list of
      shared MSRs has been updated and modify vmx_prepare_switch_to_guest()
      to set the shared MSRs when running with host state *OR* when the list of
      shared MSRs has been updated.
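      
      A sketch of the fix (the flag name is illustrative):
      
        /* setup_msrs(), after rebuilding the shared-MSR list: */
        vmx->guest_msrs_dirty = true;
      
        /* vmx_prepare_switch_to_guest(): */
        if (vmx->loaded_cpu_state && !vmx->guest_msrs_dirty)
        	return;			/* nothing to refresh */
        vmx->guest_msrs_dirty = false;
        /* ... (re)load the shared MSRs ... */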
      
      Note that this issue was mistakenly introduced by commit
      678e315e ("KVM: vmx: add dedicated utility to access guest's
      kernel_gs_base") because previously vmx_set_efer() always called
      vmx_load_host_state(), which resulted in vmx_prepare_switch_to_guest()
      setting the shared MSRs.
      
      Fixes: 678e315e ("KVM: vmx: add dedicated utility to access guest's kernel_gs_base")
      Reported-by: Eyal Moscovici <eyal.moscovici@oracle.com>
      Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>
      Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
      Reviewed-by: Jim Mattson <jmattson@google.com>
      Signed-off-by: Liran Alon <liran.alon@oracle.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      3c7670d5
  12. 13 Dec 2018, 3 commits
    • x86/efi: Allocate e820 buffer before calling efi_exit_boot_service · e88ebc06
      Eric Snowberg authored
      commit b84a64fad40637b1c9fa4f4dbf847a23e29e672b upstream.
      
      The following commit:
      
        d6493401 ("x86/efi: Use efi_exit_boot_services()")
      
      introduced a regression on systems with large memory maps, causing them
      to hang on boot. The first "goto get_map" that was removed from
      exit_boot() ensured there was enough room for the memory map when
      efi_call_early(exit_boot_services) was called. This happens when
      nr_desc > ARRAY_SIZE(params->e820_table).
      
      Chain of events:
      
        exit_boot()
          efi_exit_boot_services()
            efi_get_memory_map                  <- at this point the mm can't grow over 8 desc
            priv_func()
              exit_boot_func()
                allocate_e820ext()              <- new mm grows over 8 desc from e820 alloc
            efi_call_early(exit_boot_services)  <- mm key doesn't match so retry
            efi_call_early(get_memory_map)      <- not enough room for new mm
            system hangs
      
      This patch allocates the e820 buffer before calling efi_exit_boot_services()
      and fixes the regression.
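      
      The reordering, sketched (helper names are illustrative):
      
        /* allocate room for the extended e820 map first... */
        status = alloc_e820ext(nr_desc, &e820ext, &e820ext_size);
        if (status != EFI_SUCCESS)
        	return status;
      
        /* ...so the memory map can no longer outgrow its buffer while
         * boot services are being torn down */
        status = efi_exit_boot_services(sys_table, handle, &map, priv,
        				exit_boot_func);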
      
       [ mingo: minor cleanliness edits. ]
      Signed-off-by: Eric Snowberg <eric.snowberg@oracle.com>
      Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: <stable@vger.kernel.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arend van Spriel <arend.vanspriel@broadcom.com>
      Cc: Bhupesh Sharma <bhsharma@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Hans de Goede <hdegoede@redhat.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: Jon Hunter <jonathanh@nvidia.com>
      Cc: Julien Thierry <julien.thierry@arm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Cc: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: Nathan Chancellor <natechancellor@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
      Cc: Sedat Dilek <sedat.dilek@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: YiFei Zhu <zhuyifei1999@gmail.com>
      Cc: linux-efi@vger.kernel.org
      Link: http://lkml.kernel.org/r/20181129171230.18699-2-ard.biesheuvel@linaro.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      e88ebc06
    • kprobes/x86: Fix instruction patching corruption when copying more than one RIP-relative instruction · ce74d11a
      Masami Hiramatsu authored
      
      commit 43a1b0cb4cd6dbfd3cd9c10da663368394d299d8 upstream.
      
      After copy_optimized_instructions() copies several instructions
      to the working buffer it tries to fix up the real RIP address, but it
      adjusts the RIP-relative instruction with an incorrect RIP address
      for the 2nd and subsequent instructions due to a bug in the logic.
      
      This will break the kernel pretty badly (with likely outcomes such as
      a kernel freeze, a crash, or worse) because probed instructions can refer
      to the wrong data.
      
      For example putting kprobes on cpumask_next() typically hits this bug.
      
      cpumask_next() is normally like below if CONFIG_CPUMASK_OFFSTACK=y
      (in this case nr_cpumask_bits is an alias of nr_cpu_ids):
      
       <cpumask_next>:
      	48 89 f0		mov    %rsi,%rax
      	8b 35 7b fb e2 00	mov    0xe2fb7b(%rip),%esi # ffffffff82db9e64 <nr_cpu_ids>
      	55			push   %rbp
      ...
      
      If we put a kprobe on it and it gets jump-optimized, it gets
      patched by the kprobes code like this:
      
       <cpumask_next>:
      	e9 95 7d 07 1e		jmpq   0xffffffffa000207a
      	7b fb			jnp    0xffffffff81f8a2e2 <cpumask_next+2>
      	e2 00			loop   0xffffffff81f8a2e9 <cpumask_next+9>
      	55			push   %rbp
      
      This shows that the first two MOV instructions were copied to a
      trampoline buffer at 0xffffffffa000207a.
      
      Here is the disassembled result of the trampoline, skipping
      the optprobe template instructions:
      
      	# Dump of assembly code from 0xffffffffa000207a to 0xffffffffa00020ea:
      
      	54			push   %rsp
      	...
      	48 83 c4 08		add    $0x8,%rsp
      	9d			popfq
      	48 89 f0		mov    %rsi,%rax
      	8b 35 82 7d db e2	mov    -0x1d24827e(%rip),%esi # 0xffffffff82db9e67 <nr_cpu_ids+3>
      
      This dump shows that the second MOV accesses *(nr_cpu_ids+3) instead of
      the original *nr_cpu_ids. This leads to a kernel freeze because
      cpumask_next() always returns 0 and for_each_cpu() never ends.
      
      Fix this by adding 'len' correctly to the real RIP address while
      copying.
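      
      The one-line nature of the fix, sketched:
      
        while (len < RELATIVEJUMP_SIZE) {
        	ret = __copy_instruction(dest + len, src + len,
        				 real + len,	/* the fix: was "real" */
        				 &insn);
        	if (!ret)
        		return -EINVAL;
        	len += ret;
        }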
      
      [ mingo: Improved the changelog. ]
      Reported-by: Michael Rodin <michael@rodin.online>
      Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
      Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org # v4.15+
      Fixes: 63fef14f ("kprobes/x86: Make insn buffer always ROX and use text_poke()")
      Link: http://lkml.kernel.org/r/153504457253.22602.1314289671019919596.stgit@devbox
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      ce74d11a
    • Revert "x86/e820: put !E820_TYPE_RAM regions into memblock.reserved" · bd5d1c27
      Masayoshi Mizuma authored
      [ Upstream commit 9fd61bc95130d4971568b89c9548b5e0a4e18e0e ]
      
      commit 124049de ("x86/e820: put !E820_TYPE_RAM regions into
      memblock.reserved") breaks the movable_node kernel option because it changed
      memory gap ranges into reserved memblocks. As a result, a node is marked as
      a Normal zone even if the SRAT has hot-pluggable affinity for it.
      
          =====================================================================
          kernel: BIOS-e820: [mem 0x0000180000000000-0x0000180fffffffff] usable
          kernel: BIOS-e820: [mem 0x00001c0000000000-0x00001c0fffffffff] usable
          ...
          kernel: reserved[0x12]#011[0x0000181000000000-0x00001bffffffffff], 0x000003f000000000 bytes flags: 0x0
          ...
          kernel: ACPI: SRAT: Node 2 PXM 6 [mem 0x180000000000-0x1bffffffffff] hotplug
          kernel: ACPI: SRAT: Node 3 PXM 7 [mem 0x1c0000000000-0x1fffffffffff] hotplug
          ...
          kernel: Movable zone start for each node
          kernel:  Node 3: 0x00001c0000000000
          kernel: Early memory node ranges
          ...
          =====================================================================
      
      The original issue is fixed by the preceding patches, so let's revert commit
      124049de ("x86/e820: put !E820_TYPE_RAM regions into
      memblock.reserved").
      
      Link: http://lkml.kernel.org/r/20181002143821.5112-4-msys.mizuma@gmail.com
      Signed-off-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
      Reviewed-by: Pavel Tatashin <pavel.tatashin@microsoft.com>
      Acked-by: Ingo Molnar <mingo@kernel.org>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Oscar Salvador <osalvador@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      bd5d1c27