1. 22 2月, 2018 3 次提交
  2. 17 2月, 2018 1 次提交
  3. 16 2月, 2018 3 次提交
  4. 15 2月, 2018 9 次提交
  5. 13 2月, 2018 22 次提交
    • T
      x86/mm, mm/hwpoison: Don't unconditionally unmap kernel 1:1 pages · fd0e786d
      Tony Luck 提交于
      In the following commit:
      
        ce0fa3e5 ("x86/mm, mm/hwpoison: Clear PRESENT bit for kernel 1:1 mappings of poison pages")
      
      ... we added code to memory_failure() to unmap the page from the
      kernel 1:1 virtual address space to avoid speculative access to the
      page logging additional errors.
      
      But memory_failure() may not always succeed in taking the page offline,
      especially if the page belongs to the kernel.  This can happen if
      there are too many corrected errors on a page and either mcelog(8)
      or drivers/ras/cec.c asks to take a page offline.
      
      Since we remove the 1:1 mapping early in memory_failure(), we can
      end up with the page unmapped, but still in use. On the next access
      the kernel crashes :-(
      
      There are also various debug paths that call memory_failure() to simulate
      occurrence of an error. Since there is no actual error in memory, we
      don't need to map out the page for those cases.
      
      Revert most of the previous attempt and keep the solution local to
      arch/x86/kernel/cpu/mcheck/mce.c. Unmap the page only when:
      
      	1) there is a real error
      	2) memory_failure() succeeds.
      
      All of this only applies to 64-bit systems. 32-bit kernel doesn't map
      all of memory into kernel space. It isn't worth adding the code to unmap
      the piece that is mapped because nobody would run a 32-bit kernel on a
      machine that has recoverable machine checks.
      Signed-off-by: NTony Luck <tony.luck@intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave <dave.hansen@intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Robert (Persistent Memory) <elliott@hpe.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-mm@kvack.org
      Cc: stable@vger.kernel.org #v4.14
      Fixes: ce0fa3e5 ("x86/mm, mm/hwpoison: Clear PRESENT bit for kernel 1:1 mappings of poison pages")
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      fd0e786d
    • A
      x86/error_inject: Make just_return_func() globally visible · 01684e72
      Arnd Bergmann 提交于
      With link time optimizations enabled, I get a link failure:
      
        ./ccLbOEHX.ltrans19.ltrans.o: In function `override_function_with_return':
        <artificial>:(.text+0x7f3): undefined reference to `just_return_func'
      
      Marking the symbol .globl makes it work as expected.
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Acked-by: NMasami Hiramatsu <mhiramat@kernel.org>
      Acked-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Josef Bacik <jbacik@fb.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Nicolas Pitre <nico@linaro.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Fixes: 540adea3 ("error-injection: Separate error-injection from kprobe")
      Link: http://lkml.kernel.org/r/20180202145634.200291-3-arnd@arndb.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      01684e72
    • M
      x86/platform/UV: Fix GAM Range Table entries less than 1GB · c25d99d2
      mike.travis@hpe.com 提交于
      The latest UV platforms include the new ApachePass NVDIMMs into the
      UV address space.  This has introduced address ranges in the Global
      Address Map Table that are less than the previous lowest range, which
      was 2GB.  Fix the address calculation so it accommodates address ranges
      from bytes to exabytes.
      Signed-off-by: NMike Travis <mike.travis@hpe.com>
      Reviewed-by: NAndrew Banman <andrew.banman@hpe.com>
      Reviewed-by: NDimitri Sivanich <dimitri.sivanich@hpe.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Russ Anderson <russ.anderson@hpe.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20180205221503.190219903@stormcage.americas.sgi.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      c25d99d2
    • P
      x86/build: Add arch/x86/tools/insn_decoder_test to .gitignore · 74eb816b
      Progyan Bhattacharya 提交于
      The file was generated by make command and should not be in the source tree.
      Signed-off-by: NProgyan Bhattacharya <progyanb@acm.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      74eb816b
    • M
      x86/smpboot: Fix uncore_pci_remove() indexing bug when hot-removing a physical CPU · 295cc7eb
      Masayoshi Mizuma 提交于
      When a physical CPU is hot-removed, the following warning messages
      are shown while the uncore device is removed in uncore_pci_remove():
      
        WARNING: CPU: 120 PID: 5 at arch/x86/events/intel/uncore.c:988
        uncore_pci_remove+0xf1/0x110
        ...
        CPU: 120 PID: 5 Comm: kworker/u1024:0 Not tainted 4.15.0-rc8 #1
        Workqueue: kacpi_hotplug acpi_hotplug_work_fn
        ...
        Call Trace:
        pci_device_remove+0x36/0xb0
        device_release_driver_internal+0x145/0x210
        pci_stop_bus_device+0x76/0xa0
        pci_stop_root_bus+0x44/0x60
        acpi_pci_root_remove+0x1f/0x80
        acpi_bus_trim+0x54/0x90
        acpi_bus_trim+0x2e/0x90
        acpi_device_hotplug+0x2bc/0x4b0
        acpi_hotplug_work_fn+0x1a/0x30
        process_one_work+0x141/0x340
        worker_thread+0x47/0x3e0
        kthread+0xf5/0x130
      
      When uncore_pci_remove() runs, it tries to get the package ID to
      clear the value of uncore_extra_pci_dev[].dev[] by using
      topology_phys_to_logical_pkg(). The warning messesages are
      shown because topology_phys_to_logical_pkg() returns -1.
      
        arch/x86/events/intel/uncore.c:
        static void uncore_pci_remove(struct pci_dev *pdev)
        {
        ...
                phys_id = uncore_pcibus_to_physid(pdev->bus);
        ...
                        pkg = topology_phys_to_logical_pkg(phys_id); // returns -1
                        for (i = 0; i < UNCORE_EXTRA_PCI_DEV_MAX; i++) {
                                if (uncore_extra_pci_dev[pkg].dev[i] == pdev) {
                                        uncore_extra_pci_dev[pkg].dev[i] = NULL;
                                        break;
                                }
                        }
                        WARN_ON_ONCE(i >= UNCORE_EXTRA_PCI_DEV_MAX); // <=========== HERE!!
      
      topology_phys_to_logical_pkg() tries to find
      cpuinfo_x86->phys_proc_id that matches the phys_pkg argument.
      
        arch/x86/kernel/smpboot.c:
        int topology_phys_to_logical_pkg(unsigned int phys_pkg)
        {
                int cpu;
      
                for_each_possible_cpu(cpu) {
                        struct cpuinfo_x86 *c = &cpu_data(cpu);
      
                        if (c->initialized && c->phys_proc_id == phys_pkg)
                                return c->logical_proc_id;
                }
                return -1;
        }
      
      However, the phys_proc_id was already set to 0 by remove_siblinginfo()
      when the CPU was offlined.
      
      So, topology_phys_to_logical_pkg() cannot find the correct
      logical_proc_id and always returns -1.
      
      As the result, uncore_pci_remove() calls WARN_ON_ONCE() and the warning
      messages are shown.
      
      What is worse is that the bogus 'pkg' index results in two bugs:
      
       - We dereference uncore_extra_pci_dev[] with a negative index
       - We fail to clean up a stale pointer in uncore_extra_pci_dev[][]
      
      To fix these bugs, remove the clearing of ->phys_proc_id from remove_siblinginfo().
      
      This should not cause any problems, because ->phys_proc_id is not
      used after it is hot-removed and it is re-set while hot-adding.
      Signed-off-by: NMasayoshi Mizuma <m.mizuma@jp.fujitsu.com>
      Acked-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: yasu.isimatu@gmail.com
      Cc: <stable@vger.kernel.org>
      Fixes: 30bb9811 ("x86/topology: Avoid wasting 128k for package id array")
      Link: http://lkml.kernel.org/r/ed738d54-0f01-b38b-b794-c31dc118c207@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      295cc7eb
    • jia zhang's avatar
      x86/mm/kcore: Add vsyscall page to /proc/kcore conditionally · cd026ca2
      jia zhang 提交于
      The vsyscall page should be visible only if vsyscall=emulate/native when dumping /proc/kcore.
      Signed-off-by: jia zhang's avatarJia Zhang <zhang.jia@linux.alibaba.com>
      Reviewed-by: NJiri Olsa <jolsa@kernel.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: jolsa@redhat.com
      Link: http://lkml.kernel.org/r/1518446694-21124-3-git-send-email-zhang.jia@linux.alibaba.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      cd026ca2
    • jia zhang's avatar
      vfs/proc/kcore, x86/mm/kcore: Fix SMAP fault when dumping vsyscall user page · 595dd46e
      jia zhang 提交于
      Commit:
      
        df04abfd ("fs/proc/kcore.c: Add bounce buffer for ktext data")
      
      ... introduced a bounce buffer to work around CONFIG_HARDENED_USERCOPY=y.
      However, accessing the vsyscall user page will cause an SMAP fault.
      
      Replace memcpy() with copy_from_user() to fix this bug works, but adding
      a common way to handle this sort of user page may be useful for future.
      
      Currently, only vsyscall page requires KCORE_USER.
      Signed-off-by: jia zhang's avatarJia Zhang <zhang.jia@linux.alibaba.com>
      Reviewed-by: NJiri Olsa <jolsa@kernel.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: jolsa@redhat.com
      Link: http://lkml.kernel.org/r/1518446694-21124-2-git-send-email-zhang.jia@linux.alibaba.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      595dd46e
    • B
      x86/entry/64: Remove the unused 'icebp' macro · b498c261
      Borislav Petkov 提交于
      That macro was touched around 2.5.8 times, judging by the full history
      linux repo, but it was unused even then. Get rid of it already.
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux@dominikbrodowski.net
      Link: http://lkml.kernel.org/r/20180212201318.GD14640@pd.tnicSigned-off-by: NIngo Molnar <mingo@kernel.org>
      b498c261
    • J
      x86/entry/64: Fix paranoid_entry() frame pointer warning · b3ccefae
      Josh Poimboeuf 提交于
      With the following commit:
      
        f09d160992d1 ("x86/entry/64: Get rid of the ALLOC_PT_GPREGS_ON_STACK and SAVE_AND_CLEAR_REGS macros")
      
      ... one of my suggested improvements triggered a frame pointer warning:
      
        arch/x86/entry/entry_64.o: warning: objtool: paranoid_entry()+0x11: call without frame pointer save/setup
      
      The warning is correct for the build-time code, but it's actually not
      relevant at runtime because of paravirt patching.  The paravirt swapgs
      call gets replaced with either a SWAPGS instruction or NOPs at runtime.
      
      Go back to the previous behavior by removing the ELF function annotation
      for paranoid_entry() and adding an unwind hint, which effectively
      silences the warning.
      Reported-by: Nkbuild test robot <fengguang.wu@intel.com>
      Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Cc: Dominik Brodowski <linux@dominikbrodowski.net>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: kbuild-all@01.org
      Cc: tipbuild@zytor.com
      Fixes: f09d160992d1 ("x86/entry/64: Get rid of the ALLOC_PT_GPREGS_ON_STACK and SAVE_AND_CLEAR_REGS macros")
      Link: http://lkml.kernel.org/r/20180212174503.5acbymg5z6p32snu@trebleSigned-off-by: NIngo Molnar <mingo@kernel.org>
      b3ccefae
    • D
      x86/entry/64: Indent PUSH_AND_CLEAR_REGS and POP_REGS properly · 92816f57
      Dominik Brodowski 提交于
      ... same as the other macros in arch/x86/entry/calling.h
      Signed-off-by: NDominik Brodowski <linux@dominikbrodowski.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: dan.j.williams@intel.com
      Link: http://lkml.kernel.org/r/20180211104949.12992-8-linux@dominikbrodowski.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
      92816f57
    • D
      x86/entry/64: Get rid of the ALLOC_PT_GPREGS_ON_STACK and SAVE_AND_CLEAR_REGS macros · dde3036d
      Dominik Brodowski 提交于
      Previously, error_entry() and paranoid_entry() saved the GP registers
      onto stack space previously allocated by its callers. Combine these two
      steps in the callers, and use the generic PUSH_AND_CLEAR_REGS macro
      for that.
      
      This adds a significant amount ot text size. However, Ingo Molnar points
      out that:
      
      	"these numbers also _very_ significantly over-represent the
      	extra footprint. The assumptions that resulted in
      	us compressing the IRQ entry code have changed very
      	significantly with the new x86 IRQ allocation code we
      	introduced in the last year:
      
      	- IRQ vectors are usually populated in tightly clustered
      	  groups.
      
      	  With our new vector allocator code the typical per CPU
      	  allocation percentage on x86 systems is ~3 device vectors
      	  and ~10 fixed vectors out of ~220 vectors - i.e. a very
      	  low ~6% utilization (!). [...]
      
      	  The days where we allocated a lot of vectors on every
      	  CPU and the compression of the IRQ entry code text
      	  mattered are over.
      
      	- Another issue is that only a small minority of vectors
      	  is frequent enough to actually matter to cache utilization
      	  in practice: 3-4 key IPIs and 1-2 device IRQs at most - and
      	  those vectors tend to be tightly clustered as well into about
      	  two groups, and are probably already on 2-3 cache lines in
      	  practice.
      
      	  For the common case of 'cache cold' IRQs it's the depth of
      	  the call chain and the fragmentation of the resulting I$
      	  that should be the main performance limit - not the overall
      	  size of it.
      
      	- The CPU side cost of IRQ delivery is still very expensive
      	  even in the best, most cached case, as in 'over a thousand
      	  cycles'. So much stuff is done that maybe contemporary x86
      	  IRQ entry microcode already prefetches the IDT entry and its
      	  expected call target address."[*]
      
      [*] http://lkml.kernel.org/r/20180208094710.qnjixhm6hybebdv7@gmail.com
      
      The "testb $3, CS(%rsp)" instruction in the idtentry macro does not need
      modification. Previously, %rsp was manually decreased by 15*8; with
      this patch, %rsp is decreased by 15 pushq instructions.
      
      [jpoimboe@redhat.com: unwind hint improvements]
      Suggested-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NDominik Brodowski <linux@dominikbrodowski.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: dan.j.williams@intel.com
      Link: http://lkml.kernel.org/r/20180211104949.12992-7-linux@dominikbrodowski.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
      dde3036d
    • D
      x86/entry/64: Use PUSH_AND_CLEAN_REGS in more cases · 30907fd1
      Dominik Brodowski 提交于
      entry_SYSCALL_64_after_hwframe() and nmi() can be converted to use
      PUSH_AND_CLEAN_REGS instead of opencoded variants thereof. Due to
      the interleaving, the additional XOR-based clearing of R8 and R9
      in entry_SYSCALL_64_after_hwframe() should not have any noticeable
      negative implications.
      Suggested-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NDominik Brodowski <linux@dominikbrodowski.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: dan.j.williams@intel.com
      Link: http://lkml.kernel.org/r/20180211104949.12992-6-linux@dominikbrodowski.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
      30907fd1
    • D
      x86/entry/64: Introduce the PUSH_AND_CLEAN_REGS macro · 3f01daec
      Dominik Brodowski 提交于
      Those instances where ALLOC_PT_GPREGS_ON_STACK is called just before
      SAVE_AND_CLEAR_REGS can trivially be replaced by PUSH_AND_CLEAN_REGS.
      This macro uses PUSH instead of MOV and should therefore be faster, at
      least on newer CPUs.
      Suggested-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NDominik Brodowski <linux@dominikbrodowski.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: dan.j.williams@intel.com
      Link: http://lkml.kernel.org/r/20180211104949.12992-5-linux@dominikbrodowski.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
      3f01daec
    • D
      x86/entry/64: Interleave XOR register clearing with PUSH instructions · f7bafa2b
      Dominik Brodowski 提交于
      Same as is done for syscalls, interleave XOR with PUSH instructions
      for exceptions/interrupts, in order to minimize the cost of the
      additional instructions required for register clearing.
      Signed-off-by: NDominik Brodowski <linux@dominikbrodowski.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: dan.j.williams@intel.com
      Link: http://lkml.kernel.org/r/20180211104949.12992-4-linux@dominikbrodowski.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
      f7bafa2b
    • D
      x86/entry/64: Merge the POP_C_REGS and POP_EXTRA_REGS macros into a single POP_REGS macro · 502af0d7
      Dominik Brodowski 提交于
      The two special, opencoded cases for POP_C_REGS can be handled by ASM
      macros.
      Signed-off-by: NDominik Brodowski <linux@dominikbrodowski.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: dan.j.williams@intel.com
      Link: http://lkml.kernel.org/r/20180211104949.12992-3-linux@dominikbrodowski.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
      502af0d7
    • D
      x86/entry/64: Merge SAVE_C_REGS and SAVE_EXTRA_REGS, remove unused extensions · 2e3f0098
      Dominik Brodowski 提交于
      All current code paths call SAVE_C_REGS and then immediately
      SAVE_EXTRA_REGS. Therefore, merge these two macros and order the MOV
      sequeneces properly.
      
      While at it, remove the macros to save all except specific registers,
      as these macros have been unused for a long time.
      Suggested-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NDominik Brodowski <linux@dominikbrodowski.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: dan.j.williams@intel.com
      Link: http://lkml.kernel.org/r/20180211104949.12992-2-linux@dominikbrodowski.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
      2e3f0098
    • I
      x86/speculation: Clean up various Spectre related details · 21e433bd
      Ingo Molnar 提交于
      Harmonize all the Spectre messages so that a:
      
          dmesg | grep -i spectre
      
      ... gives us most Spectre related kernel boot messages.
      
      Also fix a few other details:
      
       - clarify a comment about firmware speculation control
      
       - s/KPTI/PTI
      
       - remove various line-breaks that made the code uglier
      Acked-by: NDavid Woodhouse <dwmw@amazon.co.uk>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      21e433bd
    • K
      KVM/nVMX: Set the CPU_BASED_USE_MSR_BITMAPS if we have a valid L02 MSR bitmap · 3712caeb
      KarimAllah Ahmed 提交于
      We either clear the CPU_BASED_USE_MSR_BITMAPS and end up intercepting all
      MSR accesses or create a valid L02 MSR bitmap and use that. This decision
      has to be made every time we evaluate whether we are going to generate the
      L02 MSR bitmap.
      
      Before commit:
      
        d28b387f ("KVM/VMX: Allow direct access to MSR_IA32_SPEC_CTRL")
      
      ... this was probably OK since the decision was always identical.
      
      This is no longer the case now since the MSR bitmap might actually
      change once we decide to not intercept SPEC_CTRL and PRED_CMD.
      Signed-off-by: NKarimAllah Ahmed <karahmed@amazon.de>
      Signed-off-by: NDavid Woodhouse <dwmw@amazon.co.uk>
      Acked-by: NPaolo Bonzini <pbonzini@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: arjan.van.de.ven@intel.com
      Cc: dave.hansen@intel.com
      Cc: jmattson@google.com
      Cc: kvm@vger.kernel.org
      Cc: sironi@amazon.de
      Link: http://lkml.kernel.org/r/1518305967-31356-6-git-send-email-dwmw@amazon.co.ukSigned-off-by: NIngo Molnar <mingo@kernel.org>
      3712caeb
    • K
      X86/nVMX: Properly set spec_ctrl and pred_cmd before merging MSRs · 206587a9
      KarimAllah Ahmed 提交于
      These two variables should check whether SPEC_CTRL and PRED_CMD are
      supposed to be passed through to L2 guests or not. While
      msr_write_intercepted_l01 would return 'true' if it is not passed through.
      
      So just invert the result of msr_write_intercepted_l01 to implement the
      correct semantics.
      Signed-off-by: NKarimAllah Ahmed <karahmed@amazon.de>
      Signed-off-by: NDavid Woodhouse <dwmw@amazon.co.uk>
      Reviewed-by: NJim Mattson <jmattson@google.com>
      Acked-by: NPaolo Bonzini <pbonzini@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: arjan.van.de.ven@intel.com
      Cc: dave.hansen@intel.com
      Cc: kvm@vger.kernel.org
      Cc: sironi@amazon.de
      Fixes: 086e7d4118cc ("KVM: VMX: Allow direct access to MSR_IA32_SPEC_CTRL")
      Link: http://lkml.kernel.org/r/1518305967-31356-5-git-send-email-dwmw@amazon.co.ukSigned-off-by: NIngo Molnar <mingo@kernel.org>
      206587a9
    • D
      KVM/x86: Reduce retpoline performance impact in slot_handle_level_range(), by... · 928a4c39
      David Woodhouse 提交于
      KVM/x86: Reduce retpoline performance impact in slot_handle_level_range(), by always inlining iterator helper methods
      
      With retpoline, tight loops of "call this function for every XXX" are
      very much pessimised by taking a prediction miss *every* time. This one
      is by far the biggest contributor to the guest launch time with retpoline.
      
      By marking the iterator slot_handle_…() functions always_inline, we can
      ensure that the indirect function call can be optimised away into a
      direct call and it actually generates slightly smaller code because
      some of the other conditionals can get optimised away too.
      
      Performance is now pretty close to what we see with nospectre_v2 on
      the command line.
      Suggested-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Tested-by: NFilippo Sironi <sironi@amazon.de>
      Signed-off-by: NDavid Woodhouse <dwmw@amazon.co.uk>
      Reviewed-by: NFilippo Sironi <sironi@amazon.de>
      Acked-by: NPaolo Bonzini <pbonzini@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: arjan.van.de.ven@intel.com
      Cc: dave.hansen@intel.com
      Cc: jmattson@google.com
      Cc: karahmed@amazon.de
      Cc: kvm@vger.kernel.org
      Cc: rkrcmar@redhat.com
      Link: http://lkml.kernel.org/r/1518305967-31356-4-git-send-email-dwmw@amazon.co.ukSigned-off-by: NIngo Molnar <mingo@kernel.org>
      928a4c39
    • D
      Revert "x86/speculation: Simplify indirect_branch_prediction_barrier()" · f208820a
      David Woodhouse 提交于
      This reverts commit 64e16720.
      
      We cannot call C functions like that, without marking all the
      call-clobbered registers as, well, clobbered. We might have got away
      with it for now because the __ibp_barrier() function was *fairly*
      unlikely to actually use any other registers. But no. Just no.
      Signed-off-by: NDavid Woodhouse <dwmw@amazon.co.uk>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: arjan.van.de.ven@intel.com
      Cc: dave.hansen@intel.com
      Cc: jmattson@google.com
      Cc: karahmed@amazon.de
      Cc: kvm@vger.kernel.org
      Cc: pbonzini@redhat.com
      Cc: rkrcmar@redhat.com
      Cc: sironi@amazon.de
      Link: http://lkml.kernel.org/r/1518305967-31356-3-git-send-email-dwmw@amazon.co.ukSigned-off-by: NIngo Molnar <mingo@kernel.org>
      f208820a
    • D
      x86/speculation: Correct Speculation Control microcode blacklist again · d37fc6d3
      David Woodhouse 提交于
      Arjan points out that the Intel document only clears the 0xc2 microcode
      on *some* parts with CPUID 506E3 (INTEL_FAM6_SKYLAKE_DESKTOP stepping 3).
      For the Skylake H/S platform it's OK but for Skylake E3 which has the
      same CPUID it isn't (yet) cleared.
      
      So removing it from the blacklist was premature. Put it back for now.
      
      Also, Arjan assures me that the 0x84 microcode for Kaby Lake which was
      featured in one of the early revisions of the Intel document was never
      released to the public, and won't be until/unless it is also validated
      as safe. So those can change to 0x80 which is what all *other* versions
      of the doc have identified.
      
      Once the retrospective testing of existing public microcodes is done, we
      should be back into a mode where new microcodes are only released in
      batches and we shouldn't even need to update the blacklist for those
      anyway, so this tweaking of the list isn't expected to be a thing which
      keeps happening.
      Requested-by: NArjan van de Ven <arjan.van.de.ven@intel.com>
      Signed-off-by: NDavid Woodhouse <dwmw@amazon.co.uk>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: arjan.van.de.ven@intel.com
      Cc: dave.hansen@intel.com
      Cc: kvm@vger.kernel.org
      Cc: pbonzini@redhat.com
      Link: http://lkml.kernel.org/r/1518449255-2182-1-git-send-email-dwmw@amazon.co.ukSigned-off-by: NIngo Molnar <mingo@kernel.org>
      d37fc6d3
  6. 12 2月, 2018 1 次提交
    • L
      vfs: do bulk POLL* -> EPOLL* replacement · a9a08845
      Linus Torvalds 提交于
      This is the mindless scripted replacement of kernel use of POLL*
      variables as described by Al, done by this script:
      
          for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do
              L=`git grep -l -w POLL$V | grep -v '^t' | grep -v /um/ | grep -v '^sa' | grep -v '/poll.h$'|grep -v '^D'`
              for f in $L; do sed -i "-es/^\([^\"]*\)\(\<POLL$V\>\)/\\1E\\2/" $f; done
          done
      
      with de-mangling cleanups yet to come.
      
      NOTE! On almost all architectures, the EPOLL* constants have the same
      values as the POLL* constants do.  But they keyword here is "almost".
      For various bad reasons they aren't the same, and epoll() doesn't
      actually work quite correctly in some cases due to this on Sparc et al.
      
      The next patch from Al will sort out the final differences, and we
      should be all done.
      Scripted-by: NAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a9a08845
  7. 11 2月, 2018 1 次提交
    • I
      x86/Kconfig: Further simplify the NR_CPUS config · aec6487e
      Ingo Molnar 提交于
      Clean up various aspects of the x86 CONFIG_NR_CPUS configuration switches:
      
      - Rename the three CONFIG_NR_CPUS related variables to create a common
        namespace for them:
      
          RANGE_BEGIN_CPUS => NR_CPUS_RANGE_BEGIN
          RANGE_END_CPUS   => NR_CPUS_RANGE_END
          DEF_CONFIG_CPUS  => NR_CPUS_DEFAULT
      
      - Align them vertically, such as:
      
          config NR_CPUS_RANGE_END
                  int
                  depends on X86_64
                  default 8192 if  SMP && ( MAXSMP ||  CPUMASK_OFFSTACK)
                  default  512 if  SMP && (!MAXSMP && !CPUMASK_OFFSTACK)
                  default    1 if !SMP
      
      - Update help text, add more comments.
      
      Test results:
      
       # i386 allnoconfig:
       CONFIG_NR_CPUS_RANGE_BEGIN=1
       CONFIG_NR_CPUS_RANGE_END=1
       CONFIG_NR_CPUS_DEFAULT=1
       CONFIG_NR_CPUS=1
      
       # i386 defconfig:
       CONFIG_NR_CPUS_RANGE_BEGIN=2
       CONFIG_NR_CPUS_RANGE_END=8
       CONFIG_NR_CPUS_DEFAULT=8
       CONFIG_NR_CPUS=8
      
       # i386 allyesconfig:
       CONFIG_NR_CPUS_RANGE_BEGIN=2
       CONFIG_NR_CPUS_RANGE_END=64
       CONFIG_NR_CPUS_DEFAULT=32
       CONFIG_NR_CPUS=32
      
       # x86_64 allnoconfig:
       CONFIG_NR_CPUS_RANGE_BEGIN=1
       CONFIG_NR_CPUS_RANGE_END=1
       CONFIG_NR_CPUS_DEFAULT=1
       CONFIG_NR_CPUS=1
      
       # x86_64 defconfig:
       CONFIG_NR_CPUS_RANGE_BEGIN=2
       CONFIG_NR_CPUS_RANGE_END=512
       CONFIG_NR_CPUS_DEFAULT=64
       CONFIG_NR_CPUS=64
      
       # x86_64 allyesconfig:
       CONFIG_NR_CPUS_RANGE_BEGIN=8192
       CONFIG_NR_CPUS_RANGE_END=8192
       CONFIG_NR_CPUS_DEFAULT=8192
       CONFIG_NR_CPUS=8192
      Acked-by: NRandy Dunlap <rdunlap@infradead.org>
      Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20180210113629.jcv6su3r4suuno63@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      aec6487e