1. 18 Nov 2020 (2 commits)
  2. 17 Nov 2020 (1 commit)
    • x86/microcode/intel: Check patch signature before saving microcode for early loading · 1a371e67
      Chen Yu authored
      Currently, scan_microcode() leverages microcode_matches() to check
      if the microcode matches the CPU by comparing the family and model.
      However, the processor stepping and flags of the microcode signature
      should also be considered when saving a microcode patch for early
      update.
      
      Use find_matching_signature() in scan_microcode() and get rid of the
      now-unused microcode_matches(), which is a good cleanup in itself.
      
      Complete the verification of the patch being saved for early loading in
      save_microcode_patch() directly. This needs to be done there because
      save_mc_for_early() also calls save_microcode_patch().

      The second reason is that the loader still tries to support, at least
      hypothetically, mixed-stepping systems and thus adds to the cache all
      patches which belong to the same CPU model, albeit with different
      steppings.
      
      For example:
      
        microcode: CPU: sig=0x906ec, pf=0x2, rev=0xd6
        microcode: mc_saved[0]: sig=0x906e9, pf=0x2a, rev=0xd6, total size=0x19400, date = 2020-04-23
        microcode: mc_saved[1]: sig=0x906ea, pf=0x22, rev=0xd6, total size=0x19000, date = 2020-04-27
        microcode: mc_saved[2]: sig=0x906eb, pf=0x2, rev=0xd6, total size=0x19400, date = 2020-04-23
        microcode: mc_saved[3]: sig=0x906ec, pf=0x22, rev=0xd6, total size=0x19000, date = 2020-04-27
        microcode: mc_saved[4]: sig=0x906ed, pf=0x22, rev=0xd6, total size=0x19400, date = 2020-04-23
      
      The patch which is being saved for early loading, however, can only be
      the one which fits the CPU this runs on so do the signature verification
      before saving.
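
      For illustration, a minimal sketch of such a check (assuming the
      find_matching_signature() semantics named above: the signature must
      match exactly and the processor-flags masks must intersect; the struct
      layout here is simplified, not the kernel's):

        #include <stdbool.h>

        /* Simplified view of an Intel microcode header (illustrative). */
        struct microcode_header {
                unsigned int sig;       /* family/model/stepping signature */
                unsigned int pf;        /* processor flags mask */
        };

        /*
         * A patch fits only if the signature matches the CPU exactly
         * (family, model *and* stepping) and the platform-flags masks
         * overlap. In the example above, only mc_saved[3] (sig=0x906ec,
         * pf=0x22) fits the CPU with sig=0x906ec, pf=0x2.
         */
        static bool mc_matches_cpu(const struct microcode_header *hdr,
                                   unsigned int cpu_sig, unsigned int cpu_pf)
        {
                return hdr->sig == cpu_sig && (hdr->pf & cpu_pf);
        }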
      
       [ bp: Do signature verification in save_microcode_patch()
             and rewrite commit message. ]
      
      Fixes: ec400dde ("x86/microcode_intel_early.c: Early update ucode on Intel's CPU")
      Signed-off-by: Chen Yu <yu.c.chen@intel.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: stable@vger.kernel.org
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=208535
      Link: https://lkml.kernel.org/r/20201113015923.13960-1-yu.c.chen@intel.com
  3. 13 Nov 2020 (1 commit)
  4. 10 Nov 2020 (1 commit)
  5. 07 Nov 2020 (3 commits)
  6. 06 Nov 2020 (1 commit)
    • x86/speculation: Allow IBPB to be conditionally enabled on CPUs with always-on STIBP · 1978b3a5
      Anand K Mistry authored
      On AMD CPUs which have the feature X86_FEATURE_AMD_STIBP_ALWAYS_ON,
      STIBP is set to on and
      
        spectre_v2_user_stibp == SPECTRE_V2_USER_STRICT_PREFERRED
      
      At the same time, IBPB can be set to conditional.
      
      However, this leads to the case where it's impossible to turn on IBPB
      for a process because in the PR_SPEC_DISABLE case in ib_prctl_set() the
      
        spectre_v2_user_stibp == SPECTRE_V2_USER_STRICT_PREFERRED
      
      condition leads to a return before the task flag is set. Similarly,
      ib_prctl_get() will return PR_SPEC_DISABLE even though IBPB is set to
      conditional.
      
      More generally, the following cases are possible:
      
      1. STIBP = conditional && IBPB = on for spectre_v2_user=seccomp,ibpb
      2. STIBP = on && IBPB = conditional for AMD CPUs with
         X86_FEATURE_AMD_STIBP_ALWAYS_ON
      
      The first case functions correctly today, but only because
      spectre_v2_user_ibpb isn't updated to reflect the IBPB mode.
      
      At a high level, this change does one thing. If either STIBP or IBPB
      is set to conditional, allow the prctl to change the task flag.
      Also, reflect that capability when querying the state. This isn't
      perfect since it doesn't take into account whether only one of STIBP
      or IBPB is unconditionally on. But it allows the conditional feature
      to work as expected, without affecting the unconditional one.
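
      A minimal sketch of the idea (the helper name and mode values mirror
      the SPECTRE_V2_USER_* conventions in bugs.c; treat this as an
      illustration of the logic rather than the exact patch):

        #include <stdbool.h>

        enum spectre_v2_user_mitigation {
                SPECTRE_V2_USER_NONE,
                SPECTRE_V2_USER_STRICT,
                SPECTRE_V2_USER_STRICT_PREFERRED,
                SPECTRE_V2_USER_PRCTL,
                SPECTRE_V2_USER_SECCOMP,
        };

        static enum spectre_v2_user_mitigation spectre_v2_user_ibpb;
        static enum spectre_v2_user_mitigation spectre_v2_user_stibp;

        /*
         * The prctl may change the task flag iff either STIBP or IBPB is
         * in one of the conditional (prctl/seccomp) modes, regardless of
         * the other one being strict or always-on.
         */
        static bool is_spec_ib_user_controlled(void)
        {
                return spectre_v2_user_ibpb == SPECTRE_V2_USER_PRCTL ||
                       spectre_v2_user_ibpb == SPECTRE_V2_USER_SECCOMP ||
                       spectre_v2_user_stibp == SPECTRE_V2_USER_PRCTL ||
                       spectre_v2_user_stibp == SPECTRE_V2_USER_SECCOMP;
        }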
      
       [ bp: Massage commit message and comment; space out statements for
         better readability. ]
      
      Fixes: 21998a35 ("x86/speculation: Avoid force-disabling IBPB based on STIBP and enhanced IBRS.")
      Signed-off-by: Anand K Mistry <amistry@google.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
      Link: https://lkml.kernel.org/r/20201105163246.v2.1.Ifd7243cd3e2c2206a893ad0a5b9a4f19549e22c6@changeid
  7. 30 Oct 2020 (3 commits)
  8. 29 Oct 2020 (1 commit)
  9. 28 Oct 2020 (3 commits)
  10. 26 Oct 2020 (1 commit)
  11. 22 Oct 2020 (1 commit)
    • x86/alternative: Don't call text_poke() in lazy TLB mode · abee7c49
      Juergen Gross authored
      When running in lazy TLB mode, the currently active page tables might
      be those of a previous process, e.g. when running a kernel thread.

      This can be problematic in case kernel code is being modified via
      text_poke() in a kernel thread, and on another CPU exit_mmap() is
      active for the process which was running on the first CPU before the
      kernel thread.

      Because text_poke() switches to a temporary address space and restores
      the former one (obtained via cpu_tlbstate.loaded_mm) afterwards, a race
      is possible when the CPU running exit_mmap() wants to make sure there
      are no stale references to that address space on any active CPU. This
      is required, for example, when running as a Xen PV guest, where the
      problem has been observed and analyzed.

      In order to avoid that, leave lazy TLB mode before switching to the
      temporary address space.
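
      A sketch of the fix in use_temporary_mm(), modeled on the upstream
      change (leave_mm() and cpu_tlbstate.is_lazy are existing arch/x86
      facilities; take the exact placement as illustrative):

        static inline temp_mm_state_t use_temporary_mm(struct mm_struct *mm)
        {
                temp_mm_state_t temp_state;

                lockdep_assert_irqs_disabled();

                /*
                 * Make sure not to be in TLB lazy mode, as otherwise the
                 * stale mm recorded in cpu_tlbstate.loaded_mm would be
                 * restored after poking, racing with a concurrent
                 * exit_mmap() on another CPU.
                 */
                if (this_cpu_read(cpu_tlbstate.is_lazy))
                        leave_mm(smp_processor_id());

                temp_state.mm = this_cpu_read(cpu_tlbstate.loaded_mm);
                switch_mm_irqs_off(NULL, mm, current);

                return temp_state;
        }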
      
      Fixes: cefa929c ("x86/mm: Introduce temporary mm structs")
      Signed-off-by: Juergen Gross <jgross@suse.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/20201009144225.12019-1-jgross@suse.com
  12. 19 Oct 2020 (1 commit)
  13. 18 Oct 2020 (1 commit)
    • task_work: cleanup notification modes · 91989c70
      Jens Axboe authored
      A previous commit changed the notification mode from true/false to an
      int, allowing notify-no, notify-yes, or signal-notify. This was
      backwards compatible in the sense that any existing true/false user
      would translate to either 0 (no notification) or 1, the latter of
      which mapped to TWA_RESUME. TWA_SIGNAL was assigned a value of 2.
      
      Clean this up properly, and define a proper enum for the notification
      mode. Now we have:
      
      - TWA_NONE. This is 0, same as before the original change, meaning no
        notification requested.
      - TWA_RESUME. This is 1, same as before the original change, meaning
        that we use TIF_NOTIFY_RESUME.
      - TWA_SIGNAL. This uses TIF_SIGPENDING/JOBCTL_TASK_WORK for the
        notification.
      
      Clean up all the callers, switching their 0/1/false/true to using the
      appropriate TWA_* mode for notifications.
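
      The resulting interface, per the description above (a sketch; the
      enum values follow the commit text, the prototype mirrors
      task_work_add()):

        enum task_work_notify_mode {
                TWA_NONE,       /* 0: no notification requested */
                TWA_RESUME,     /* 1: notify via TIF_NOTIFY_RESUME */
                TWA_SIGNAL,     /* 2: notify via TIF_SIGPENDING/JOBCTL_TASK_WORK */
        };

        int task_work_add(struct task_struct *task, struct callback_head *work,
                          enum task_work_notify_mode notify);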
      
      Fixes: e91b4816 ("task_work: teach task_work_add() to do signal_wake_up()")
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  14. 15 Oct 2020 (2 commits)
    • Drivers: hv: vmbus: Add parsing of VMbus interrupt in ACPI DSDT · 626b901f
      Michael Kelley authored
      On ARM64, Hyper-V now specifies the interrupt to be used by VMbus
      in the ACPI DSDT.  This information is not used on x86 because the
      interrupt vector must be hardcoded.  But update the generic
      VMbus driver to do the parsing and pass the information to the
      architecture-specific code that sets up the Linux IRQ.  Update
      consumers of the interrupt to get it from an architecture-specific
      function.
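
      A hedged sketch of what such DSDT parsing can look like, using the
      generic acpi_walk_resources() API (the vmbus_irq variable and the
      callback wiring are assumptions for illustration, not necessarily the
      driver's exact code):

        #include <linux/acpi.h>

        static int vmbus_irq = -1;  /* illustrative storage for the parsed IRQ */

        /* acpi_walk_resources() callback: remember an IRQ resource if present. */
        static acpi_status vmbus_walk_resources(struct acpi_resource *res, void *ctx)
        {
                switch (res->type) {
                case ACPI_RESOURCE_TYPE_IRQ:
                        vmbus_irq = res->data.irq.interrupts[0];
                        break;
                case ACPI_RESOURCE_TYPE_EXTENDED_IRQ:
                        vmbus_irq = res->data.extended_irq.interrupts[0];
                        break;
                default:
                        break;
                }
                return AE_OK;
        }

        /* Invoked with the VMbus ACPI device handle, e.g. in the probe path:
         *   acpi_walk_resources(handle, METHOD_NAME__CRS,
         *                       vmbus_walk_resources, NULL);
         */
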
      Signed-off-by: Michael Kelley <mikelley@microsoft.com>
      Link: https://lore.kernel.org/r/1597434304-40631-1-git-send-email-mikelley@microsoft.com
      Signed-off-by: Wei Liu <wei.liu@kernel.org>
    • x86/unwind/orc: Fix inactive tasks with stack pointer in %sp on GCC 10 compiled kernels · f2ac57a4
      Jiri Slaby authored
      GCC 10 optimizes the scheduler code differently than its predecessors.
      
      When CONFIG_DEBUG_SECTION_MISMATCH=y, the Makefile forces GCC not
      to inline some functions (-fno-inline-functions-called-once). Before GCC
      10, the non-inlined __schedule() starts with the usual prologue:
      
        push %bp
        mov %sp, %bp
      
      So the ORC unwinder simply picks the stack pointer from %bp and
      unwinds from __schedule() perfectly:
      
        $ cat /proc/1/stack
        [<0>] ep_poll+0x3e9/0x450
        [<0>] do_epoll_wait+0xaa/0xc0
        [<0>] __x64_sys_epoll_wait+0x1a/0x20
        [<0>] do_syscall_64+0x33/0x40
        [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      But now, with GCC 10, there is no %bp prologue in __schedule():
      
        $ cat /proc/1/stack
        <nothing>
      
      The ORC entry of the point in __schedule() is:
      
        sp:sp+88 bp:last_sp-48 type:call end:0
      
      In this case, nobody subtracts sizeof(struct inactive_task_frame) in
      __unwind_start(). The struct is put on the stack by __switch_to_asm() and
      only then does __switch_to_asm() store %sp to task->thread.sp. But we start
      unwinding from a point in __schedule() (stored in frame->ret_addr by
      'call') and not in __switch_to_asm().
      
      So for these example values in __unwind_start():
      
        sp=ffff94b50001fdc8 bp=ffff8e1f41d29340 ip=__schedule+0x1f0
      
      The stack is:
      
        ffff94b50001fdc8: ffff8e1f41578000 # struct inactive_task_frame
        ffff94b50001fdd0: 0000000000000000
        ffff94b50001fdd8: ffff8e1f41d29340
        ffff94b50001fde0: ffff8e1f41611d40 # ...
        ffff94b50001fde8: ffffffff93c41920 # bx
        ffff94b50001fdf0: ffff8e1f41d29340 # bp
        ffff94b50001fdf8: ffffffff9376cad0 # ret_addr (and end of the struct)
      
      0xffffffff9376cad0 is __schedule+0x1f0 (after the call to
      __switch_to_asm).  Now follow those 88 bytes from the ORC entry (sp+88).
      The entry is correct: __schedule() really pushes 48 bytes (8*6) + 32 bytes
      via subq to store some local values (like 4U below). So to unwind, look
      at the offset 88-sizeof(long) = 0x50 from here:
      
        ffff94b50001fe00: ffff8e1f41578618
        ffff94b50001fe08: 00000cc000000255
        ffff94b50001fe10: 0000000500000004
        ffff94b50001fe18: 7793fab6956b2d00 # NOTE (see below)
        ffff94b50001fe20: ffff8e1f41578000
        ffff94b50001fe28: ffff8e1f41578000
        ffff94b50001fe30: ffff8e1f41578000
        ffff94b50001fe38: ffff8e1f41578000
        ffff94b50001fe40: ffff94b50001fed8
        ffff94b50001fe48: ffff8e1f41577ff0
        ffff94b50001fe50: ffffffff9376cf12
      
      Here                ^^^^^^^^^^^^^^^^ is the correct ret addr from
      __schedule(). It translates to schedule+0x42 (insn after a call to
      __schedule()).
      
      BUT, unwind_next_frame() computes the return address starting from
      0xffff94b50001fdc8, i.e. thread.sp+88-sizeof(long) =
      0xffff94b50001fdc8+88-8 = 0xffff94b50001fe18, which is the garbage
      marked as NOTE above. So unwinding stops because 7793fab6956b2d00 is
      obviously not a kernel address.
      
      There was a fix to skip 'struct inactive_task_frame' in
      unwind_get_return_address_ptr() in the following commit:
      
        187b96db ("x86/unwind/orc: Fix unwind_get_return_address_ptr() for inactive tasks")
      
      But we need to skip the struct already in the unwinder proper. So
      subtract the size of the structure (i.e. increase the stack pointer) in
      __unwind_start() directly. This allows the code added by commit
      187b96db to be removed completely, as the address is now at
      '(unsigned long *)state->sp - 1', the same as in the generic case.
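
      A sketch of the resulting handling of inactive tasks in
      __unwind_start() (field names follow arch/x86's struct
      inactive_task_frame; the surrounding function is elided):

        } else {
                /*
                 * Inactive task: thread.sp points at the struct
                 * inactive_task_frame saved by __switch_to_asm(). Skip it
                 * here so that sp/bp/ip describe the frame inside
                 * __schedule() and the ORC entry offsets line up.
                 */
                struct inactive_task_frame *frame = (void *)task->thread.sp;

                state->sp = task->thread.sp + sizeof(*frame);
                state->bp = READ_ONCE_NOCHECK(frame->bp);
                state->ip = READ_ONCE_NOCHECK(frame->ret_addr);
        }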
      
      [ mingo: Cleaned up the changelog a bit, for better readability. ]
      
      Fixes: ee9f8fce ("x86/unwind: Add the ORC unwinder")
      Bug: https://bugzilla.suse.com/show_bug.cgi?id=1176907
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Link: https://lore.kernel.org/r/20201014053051.24199-1-jslaby@suse.cz
  15. 14 Oct 2020 (5 commits)
    • x86/kexec: Use up-to-date screen_info copy to fill boot params · afc18069
      Kairui Song authored
      kexec_file_load() currently reuses the old boot_params.screen_info,
      but if drivers have changed the hardware state, boot_params.screen_info
      could contain invalid info.

      For example, the video type might no longer be VGA, or the frame buffer
      address might have changed. If the kexec'ed kernel keeps using the old
      screen_info, it may attempt to write to an invalid framebuffer memory
      region.

      There are two screen_info instances globally available:
      boot_params.screen_info and screen_info. The latter is a copy, and is
      updated by drivers.

      So let kexec_file_load() use the updated copy.
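
      A minimal sketch of the resulting copy in the kexec boot-params setup
      path (the params pointer and surrounding function are elided; treat
      the exact line as illustrative):

        #include <linux/screen_info.h>  /* the driver-updated global copy */

        /* Copy the live screen_info rather than the stale
         * boot_params.screen_info the firmware handed over at boot. */
        memcpy(&params->screen_info, &screen_info,
               sizeof(struct screen_info));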
      
      [ mingo: Tidied up the changelog. ]
      Signed-off-by: Kairui Song <kasong@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Link: https://lore.kernel.org/r/20201014092429.1415040-2-kasong@redhat.com
    • x86/setup: simplify reserve_crashkernel() · 6120cdc0
      Mike Rapoport authored
      * Replace magic numbers with defines
      * Replace memblock_find_in_range() + memblock_reserve() with
        memblock_phys_alloc_range() (see the sketch below)
      * Stop checking for low memory size in reserve_crashkernel_low(). The
        allocation from a limited range will fail anyway if there is not
        enough memory, so there is no need for an extra traversal of
        memblock.memory
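
      A hedged sketch of the allocation pattern after the change
      (CRASH_ALIGN and CRASH_ADDR_MAX stand in for the commit's defines;
      memblock_phys_alloc_range() finds, reserves and returns a suitable
      physical range in a single call):

        #include <linux/memblock.h>

        /* One call replaces memblock_find_in_range() + memblock_reserve(). */
        crash_base = memblock_phys_alloc_range(crash_size, CRASH_ALIGN,
                                               CRASH_ALIGN, CRASH_ADDR_MAX);
        if (!crash_base) {
                pr_info("crashkernel reservation failed - No suitable area found.\n");
                return;
        }
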
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Baoquan He <bhe@redhat.com>
      Acked-by: Ingo Molnar <mingo@kernel.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Daniel Axtens <dja@axtens.net>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Emil Renner Berthing <kernel@esmil.dk>
      Cc: Hari Bathini <hbathini@linux.ibm.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Link: https://lkml.kernel.org/r/20200818151634.14343-15-rppt@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • x86/setup: simplify initrd relocation and reservation · 3c45ee6d
      Mike Rapoport authored
      Currently, the initrd image is reserved very early during setup and then
      it might be relocated and re-reserved after the initial physical memory
      mapping is created.  The "late" reservation of memblock verifies that
      mapped memory size exceeds the size of initrd, then checks whether
      relocation is required and, if so, relocates initrd to new memory
      allocated from memblock and frees the old location.

      The check for memory size is excessive as the memblock allocation will
      fail anyway if there is not enough memory.  Besides, there is no point
      in allocating memory from memblock using memblock_find_in_range() +
      memblock_reserve() when memblock_phys_alloc_range() exists with the
      required functionality.

      Remove the redundant check and simplify the memblock allocation.
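
      The replacement pattern this describes, sketched against an
      illustrative relocate_initrd()-style snippet (the variable names are
      assumptions):

        #include <linux/memblock.h>

        /* Before: find a candidate range, then reserve it separately. */
        relocated_ramdisk = memblock_find_in_range(0, PFN_PHYS(max_pfn_mapped),
                                                   area_size, PAGE_SIZE);
        if (relocated_ramdisk)
                memblock_reserve(relocated_ramdisk, area_size);

        /* After: a single call that searches and reserves atomically. */
        relocated_ramdisk = memblock_phys_alloc_range(area_size, PAGE_SIZE, 0,
                                                      PFN_PHYS(max_pfn_mapped));
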
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Baoquan He <bhe@redhat.com>
      Acked-by: Ingo Molnar <mingo@kernel.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Daniel Axtens <dja@axtens.net>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Emil Renner Berthing <kernel@esmil.dk>
      Cc: Hari Bathini <hbathini@linux.ibm.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Link: https://lkml.kernel.org/r/20200818151634.14343-14-rppt@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • efi/fake_mem: arrange for a resource entry per efi_fake_mem instance · 88e9a5b7
      Dan Williams authored
      In preparation for attaching a platform device per iomem resource, teach
      the efi_fake_mem code to create an e820 entry per instance.  Similar to
      E820_TYPE_PRAM, bypass resource merging when the e820 map is sanitized.
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: Ard Biesheuvel <ardb@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Ben Skeggs <bskeggs@redhat.com>
      Cc: Brice Goglin <Brice.Goglin@inria.fr>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Dave Jiang <dave.jiang@intel.com>
      Cc: David Airlie <airlied@linux.ie>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Ira Weiny <ira.weiny@intel.com>
      Cc: Jason Gunthorpe <jgg@mellanox.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Jia He <justin.he@arm.com>
      Cc: Joao Martins <joao.m.martins@oracle.com>
      Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Paul Mackerras <paulus@ozlabs.org>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Hulk Robot <hulkci@huawei.com>
      Cc: Jason Yan <yanaijie@huawei.com>
      Cc: "Jérôme Glisse" <jglisse@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: kernel test robot <lkp@intel.com>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: Stefano Stabellini <sstabellini@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Link: https://lkml.kernel.org/r/159643096068.4062302.11590041070221681669.stgit@dwillia2-desk3.amr.corp.intel.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • x86/traps: Fix #DE Oops message regression · 5f1ec1fd
      Thomas Gleixner authored
      The conversion of #DE to the idtentry mechanism introduced a change in
      the Oops message which confuses tools that parse crash information in
      dmesg.

      Remove the underscore from 'divide_error' to restore the previous
      behaviour.
      
      Fixes: 9d06c402 ("x86/entry: Convert Divide Error to IDTENTRY")
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/r/CACT4Y+bTZFkuZd7+bPArowOv-7Die+WZpfOWnEO_Wgs3U59+oA@mail.gmail.com
  16. 07 Oct 2020 (13 commits)