1. 23 2月, 2018 5 次提交
    • S
      x86/topology: Update the 'cpu cores' field in /proc/cpuinfo correctly across CPU hotplug operations · 45967493
      Samuel Neves 提交于
      Without this fix, /proc/cpuinfo will display an incorrect amount
      of CPU cores, after bringing them offline and online again, as
      exemplified below:
      
        $ cat /proc/cpuinfo | grep cores
        cpu cores	: 4
        cpu cores	: 8
        cpu cores	: 8
        cpu cores	: 20
        cpu cores	: 4
        cpu cores	: 3
        cpu cores	: 2
        cpu cores	: 2
      
      This patch fixes this by always zeroing the booted_cores variable
      upon turning off a logical CPU.
      Tested-by: NDou Liyang <douly.fnst@cn.fujitsu.com>
      Signed-off-by: NSamuel Neves <sneves@dei.uc.pt>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: jgross@suse.com
      Cc: luto@kernel.org
      Cc: prarit@redhat.com
      Cc: vkuznets@redhat.com
      Link: http://lkml.kernel.org/r/20180221205036.5244-1-sneves@dei.uc.ptSigned-off-by: NIngo Molnar <mingo@kernel.org>
      45967493
    • W
      x86/intel_rdt: Fix incorrect returned value when creating rdgroup... · 36e74d35
      Wang Hui 提交于
      x86/intel_rdt: Fix incorrect returned value when creating rdgroup sub-directory in resctrl file system
      
      If no monitoring feature is detected because all monitoring features are
      disabled during boot time or there is no monitoring feature in hardware,
      creating rdtgroup sub-directory by "mkdir" command reports error:
      
        mkdir: cannot create directory ‘/sys/fs/resctrl/p1’: No such file or directory
      
      But the sub-directory actually is generated and content is correct:
      
        cpus  cpus_list  schemata  tasks
      
      The error is because rdtgroup_mkdir_ctrl_mon() returns non zero value after
      the sub-directory is created and the returned value is reported as an error
      to user.
      
      Clear the returned value to report to user that the sub-directory is
      actually created successfully.
      Signed-off-by: NWang Hui <john.wanghui@huawei.com>
      Signed-off-by: NZhang Yanfei <yanfei.zhang@huawei.com>
      Signed-off-by: NFenghua Yu <fenghua.yu@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi V Shankar <ravi.v.shankar@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vikas <vikas.shivappa@intel.com>
      Cc: Xiaochen Shen <xiaochen.shen@intel.com>
      Link: http://lkml.kernel.org/r/1519356363-133085-1-git-send-email-fenghua.yu@intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      36e74d35
    • T
      x86/apic/vector: Handle vector release on CPU unplug correctly · e84cf6aa
      Thomas Gleixner 提交于
      When a irq vector is replaced, then the previous vector is normally
      released when the first interrupt happens on the new vector. If the target
      CPU of the previous vector is already offline when the new vector is
      installed, then the previous vector is silently discarded, which leads to
      accounting issues causing suspend failures and other problems.
      
      Adjust the logic so that the previous vector is freed in the underlying
      matrix allocator to ensure that the accounting stays correct.
      
      Fixes: 69cde000 ("x86/vector: Use matrix allocator for vector assignment")
      Reported-by: NYuriy Vostrikov <delamonpansie@gmail.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Tested-by: NYuriy Vostrikov <delamonpansie@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20180222112316.930791749@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      e84cf6aa
    • D
      bpf, x64: implement retpoline for tail call · a493a87f
      Daniel Borkmann 提交于
      Implement a retpoline [0] for the BPF tail call JIT'ing that converts
      the indirect jump via jmp %rax that is used to make the long jump into
      another JITed BPF image. Since this is subject to speculative execution,
      we need to control the transient instruction sequence here as well
      when CONFIG_RETPOLINE is set, and direct it into a pause + lfence loop.
      The latter aligns also with what gcc / clang emits (e.g. [1]).
      
      JIT dump after patch:
      
        # bpftool p d x i 1
         0: (18) r2 = map[id:1]
         2: (b7) r3 = 0
         3: (85) call bpf_tail_call#12
         4: (b7) r0 = 2
         5: (95) exit
      
      With CONFIG_RETPOLINE:
      
        # bpftool p d j i 1
        [...]
        33:	cmp    %edx,0x24(%rsi)
        36:	jbe    0x0000000000000072  |*
        38:	mov    0x24(%rbp),%eax
        3e:	cmp    $0x20,%eax
        41:	ja     0x0000000000000072  |
        43:	add    $0x1,%eax
        46:	mov    %eax,0x24(%rbp)
        4c:	mov    0x90(%rsi,%rdx,8),%rax
        54:	test   %rax,%rax
        57:	je     0x0000000000000072  |
        59:	mov    0x28(%rax),%rax
        5d:	add    $0x25,%rax
        61:	callq  0x000000000000006d  |+
        66:	pause                      |
        68:	lfence                     |
        6b:	jmp    0x0000000000000066  |
        6d:	mov    %rax,(%rsp)         |
        71:	retq                       |
        72:	mov    $0x2,%eax
        [...]
      
        * relative fall-through jumps in error case
        + retpoline for indirect jump
      
      Without CONFIG_RETPOLINE:
      
        # bpftool p d j i 1
        [...]
        33:	cmp    %edx,0x24(%rsi)
        36:	jbe    0x0000000000000063  |*
        38:	mov    0x24(%rbp),%eax
        3e:	cmp    $0x20,%eax
        41:	ja     0x0000000000000063  |
        43:	add    $0x1,%eax
        46:	mov    %eax,0x24(%rbp)
        4c:	mov    0x90(%rsi,%rdx,8),%rax
        54:	test   %rax,%rax
        57:	je     0x0000000000000063  |
        59:	mov    0x28(%rax),%rax
        5d:	add    $0x25,%rax
        61:	jmpq   *%rax               |-
        63:	mov    $0x2,%eax
        [...]
      
        * relative fall-through jumps in error case
        - plain indirect jump as before
      
        [0] https://support.google.com/faqs/answer/7625886
        [1] https://github.com/gcc-mirror/gcc/commit/a31e654fa107be968b802786d747e962c2fcdb2bSigned-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      a493a87f
    • H
      x86: Treat R_X86_64_PLT32 as R_X86_64_PC32 · b21ebf2f
      H.J. Lu 提交于
      On i386, there are 2 types of PLTs, PIC and non-PIC.  PIE and shared
      objects must use PIC PLT.  To use PIC PLT, you need to load
      _GLOBAL_OFFSET_TABLE_ into EBX first.  There is no need for that on
      x86-64 since x86-64 uses PC-relative PLT.
      
      On x86-64, for 32-bit PC-relative branches, we can generate PLT32
      relocation, instead of PC32 relocation, which can also be used as
      a marker for 32-bit PC-relative branches.  Linker can always reduce
      PLT32 relocation to PC32 if function is defined locally.   Local
      functions should use PC32 relocation.  As far as Linux kernel is
      concerned, R_X86_64_PLT32 can be treated the same as R_X86_64_PC32
      since Linux kernel doesn't use PLT.
      
      R_X86_64_PLT32 for 32-bit PC-relative branches has been enabled in
      binutils master branch which will become binutils 2.31.
      
      [ hjl is working on having better documentation on this all, but a few
        more notes from him:
      
         "PLT32 relocation is used as marker for PC-relative branches. Because
          of EBX, it looks odd to generate PLT32 relocation on i386 when EBX
          doesn't have GOT.
      
          As for symbol resolution, PLT32 and PC32 relocations are almost
          interchangeable. But when linker sees PLT32 relocation against a
          protected symbol, it can resolved locally at link-time since it is
          used on a branch instruction. Linker can't do that for PC32
          relocation"
      
        but for the kernel use, the two are basically the same, and this
        commit gets things building and working with the current binutils
        master   - Linus ]
      Signed-off-by: NH.J. Lu <hjl.tools@gmail.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b21ebf2f
  2. 22 2月, 2018 1 次提交
    • I
      treewide/trivial: Remove ';;$' typo noise · ed7158ba
      Ingo Molnar 提交于
      On lkml suggestions were made to split up such trivial typo fixes into per subsystem
      patches:
      
        --- a/arch/x86/boot/compressed/eboot.c
        +++ b/arch/x86/boot/compressed/eboot.c
        @@ -439,7 +439,7 @@ setup_uga32(void **uga_handle, unsigned long size, u32 *width, u32 *height)
                struct efi_uga_draw_protocol *uga = NULL, *first_uga;
                efi_guid_t uga_proto = EFI_UGA_PROTOCOL_GUID;
                unsigned long nr_ugas;
        -       u32 *handles = (u32 *)uga_handle;;
        +       u32 *handles = (u32 *)uga_handle;
                efi_status_t status = EFI_INVALID_PARAMETER;
                int i;
      
      This patch is the result of the following script:
      
        $ sed -i 's/;;$/;/g' $(git grep -E ';;$'  | grep "\.[ch]:"  | grep -vwE 'for|ia64' | cut -d: -f1 | sort | uniq)
      
      ... followed by manual review to make sure it's all good.
      
      Splitting this up is just crazy talk, let's get over with this and just do it.
      Reported-by: NPavel Machek <pavel@ucw.cz>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      ed7158ba
  3. 21 2月, 2018 1 次提交
    • A
      x86/oprofile: Fix bogus GCC-8 warning in nmi_setup() · 85c615eb
      Arnd Bergmann 提交于
      GCC-8 shows a warning for the x86 oprofile code that copies per-CPU
      data from CPU 0 to all other CPUs, which when building a non-SMP
      kernel turns into a memcpy() with identical source and destination
      pointers:
      
       arch/x86/oprofile/nmi_int.c: In function 'mux_clone':
       arch/x86/oprofile/nmi_int.c:285:2: error: 'memcpy' source argument is the same as destination [-Werror=restrict]
         memcpy(per_cpu(cpu_msrs, cpu).multiplex,
         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                per_cpu(cpu_msrs, 0).multiplex,
                ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                sizeof(struct op_msr) * model->num_virt_counters);
                ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
       arch/x86/oprofile/nmi_int.c: In function 'nmi_setup':
       arch/x86/oprofile/nmi_int.c:466:3: error: 'memcpy' source argument is the same as destination [-Werror=restrict]
       arch/x86/oprofile/nmi_int.c:470:3: error: 'memcpy' source argument is the same as destination [-Werror=restrict]
      
      I have analyzed a number of such warnings now: some are valid and the
      GCC warning is welcome. Others turned out to be false-positives, and
      GCC was changed to not warn about those any more. This is a corner case
      that is a false-positive but the GCC developers feel it's better to keep
      warning about it.
      
      In this case, it seems best to work around it by telling GCC
      a little more clearly that this code path is never hit with
      an IS_ENABLED() configuration check.
      
      Cc:stable as we also want old kernels to build cleanly with GCC-8.
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Cc: Jessica Yu <jeyu@kernel.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Martin Sebor <msebor@gcc.gnu.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Robert Richter <rric@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: oprofile-list@lists.sf.net
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/20180220205826.2008875-1-arnd@arndb.de
      Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84095Signed-off-by: NIngo Molnar <mingo@kernel.org>
      85c615eb
  4. 20 2月, 2018 1 次提交
  5. 17 2月, 2018 1 次提交
  6. 16 2月, 2018 3 次提交
  7. 15 2月, 2018 9 次提交
  8. 13 2月, 2018 19 次提交
    • T
      x86/mm, mm/hwpoison: Don't unconditionally unmap kernel 1:1 pages · fd0e786d
      Tony Luck 提交于
      In the following commit:
      
        ce0fa3e5 ("x86/mm, mm/hwpoison: Clear PRESENT bit for kernel 1:1 mappings of poison pages")
      
      ... we added code to memory_failure() to unmap the page from the
      kernel 1:1 virtual address space to avoid speculative access to the
      page logging additional errors.
      
      But memory_failure() may not always succeed in taking the page offline,
      especially if the page belongs to the kernel.  This can happen if
      there are too many corrected errors on a page and either mcelog(8)
      or drivers/ras/cec.c asks to take a page offline.
      
      Since we remove the 1:1 mapping early in memory_failure(), we can
      end up with the page unmapped, but still in use. On the next access
      the kernel crashes :-(
      
      There are also various debug paths that call memory_failure() to simulate
      occurrence of an error. Since there is no actual error in memory, we
      don't need to map out the page for those cases.
      
      Revert most of the previous attempt and keep the solution local to
      arch/x86/kernel/cpu/mcheck/mce.c. Unmap the page only when:
      
      	1) there is a real error
      	2) memory_failure() succeeds.
      
      All of this only applies to 64-bit systems. 32-bit kernel doesn't map
      all of memory into kernel space. It isn't worth adding the code to unmap
      the piece that is mapped because nobody would run a 32-bit kernel on a
      machine that has recoverable machine checks.
      Signed-off-by: NTony Luck <tony.luck@intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave <dave.hansen@intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Robert (Persistent Memory) <elliott@hpe.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-mm@kvack.org
      Cc: stable@vger.kernel.org #v4.14
      Fixes: ce0fa3e5 ("x86/mm, mm/hwpoison: Clear PRESENT bit for kernel 1:1 mappings of poison pages")
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      fd0e786d
    • A
      x86/error_inject: Make just_return_func() globally visible · 01684e72
      Arnd Bergmann 提交于
      With link time optimizations enabled, I get a link failure:
      
        ./ccLbOEHX.ltrans19.ltrans.o: In function `override_function_with_return':
        <artificial>:(.text+0x7f3): undefined reference to `just_return_func'
      
      Marking the symbol .globl makes it work as expected.
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Acked-by: NMasami Hiramatsu <mhiramat@kernel.org>
      Acked-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Josef Bacik <jbacik@fb.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Nicolas Pitre <nico@linaro.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Fixes: 540adea3 ("error-injection: Separate error-injection from kprobe")
      Link: http://lkml.kernel.org/r/20180202145634.200291-3-arnd@arndb.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      01684e72
    • M
      x86/platform/UV: Fix GAM Range Table entries less than 1GB · c25d99d2
      mike.travis@hpe.com 提交于
      The latest UV platforms include the new ApachePass NVDIMMs into the
      UV address space.  This has introduced address ranges in the Global
      Address Map Table that are less than the previous lowest range, which
      was 2GB.  Fix the address calculation so it accommodates address ranges
      from bytes to exabytes.
      Signed-off-by: NMike Travis <mike.travis@hpe.com>
      Reviewed-by: NAndrew Banman <andrew.banman@hpe.com>
      Reviewed-by: NDimitri Sivanich <dimitri.sivanich@hpe.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Russ Anderson <russ.anderson@hpe.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20180205221503.190219903@stormcage.americas.sgi.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      c25d99d2
    • P
      x86/build: Add arch/x86/tools/insn_decoder_test to .gitignore · 74eb816b
      Progyan Bhattacharya 提交于
      The file was generated by make command and should not be in the source tree.
      Signed-off-by: NProgyan Bhattacharya <progyanb@acm.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      74eb816b
    • M
      x86/smpboot: Fix uncore_pci_remove() indexing bug when hot-removing a physical CPU · 295cc7eb
      Masayoshi Mizuma 提交于
      When a physical CPU is hot-removed, the following warning messages
      are shown while the uncore device is removed in uncore_pci_remove():
      
        WARNING: CPU: 120 PID: 5 at arch/x86/events/intel/uncore.c:988
        uncore_pci_remove+0xf1/0x110
        ...
        CPU: 120 PID: 5 Comm: kworker/u1024:0 Not tainted 4.15.0-rc8 #1
        Workqueue: kacpi_hotplug acpi_hotplug_work_fn
        ...
        Call Trace:
        pci_device_remove+0x36/0xb0
        device_release_driver_internal+0x145/0x210
        pci_stop_bus_device+0x76/0xa0
        pci_stop_root_bus+0x44/0x60
        acpi_pci_root_remove+0x1f/0x80
        acpi_bus_trim+0x54/0x90
        acpi_bus_trim+0x2e/0x90
        acpi_device_hotplug+0x2bc/0x4b0
        acpi_hotplug_work_fn+0x1a/0x30
        process_one_work+0x141/0x340
        worker_thread+0x47/0x3e0
        kthread+0xf5/0x130
      
      When uncore_pci_remove() runs, it tries to get the package ID to
      clear the value of uncore_extra_pci_dev[].dev[] by using
      topology_phys_to_logical_pkg(). The warning messesages are
      shown because topology_phys_to_logical_pkg() returns -1.
      
        arch/x86/events/intel/uncore.c:
        static void uncore_pci_remove(struct pci_dev *pdev)
        {
        ...
                phys_id = uncore_pcibus_to_physid(pdev->bus);
        ...
                        pkg = topology_phys_to_logical_pkg(phys_id); // returns -1
                        for (i = 0; i < UNCORE_EXTRA_PCI_DEV_MAX; i++) {
                                if (uncore_extra_pci_dev[pkg].dev[i] == pdev) {
                                        uncore_extra_pci_dev[pkg].dev[i] = NULL;
                                        break;
                                }
                        }
                        WARN_ON_ONCE(i >= UNCORE_EXTRA_PCI_DEV_MAX); // <=========== HERE!!
      
      topology_phys_to_logical_pkg() tries to find
      cpuinfo_x86->phys_proc_id that matches the phys_pkg argument.
      
        arch/x86/kernel/smpboot.c:
        int topology_phys_to_logical_pkg(unsigned int phys_pkg)
        {
                int cpu;
      
                for_each_possible_cpu(cpu) {
                        struct cpuinfo_x86 *c = &cpu_data(cpu);
      
                        if (c->initialized && c->phys_proc_id == phys_pkg)
                                return c->logical_proc_id;
                }
                return -1;
        }
      
      However, the phys_proc_id was already set to 0 by remove_siblinginfo()
      when the CPU was offlined.
      
      So, topology_phys_to_logical_pkg() cannot find the correct
      logical_proc_id and always returns -1.
      
      As the result, uncore_pci_remove() calls WARN_ON_ONCE() and the warning
      messages are shown.
      
      What is worse is that the bogus 'pkg' index results in two bugs:
      
       - We dereference uncore_extra_pci_dev[] with a negative index
       - We fail to clean up a stale pointer in uncore_extra_pci_dev[][]
      
      To fix these bugs, remove the clearing of ->phys_proc_id from remove_siblinginfo().
      
      This should not cause any problems, because ->phys_proc_id is not
      used after it is hot-removed and it is re-set while hot-adding.
      Signed-off-by: NMasayoshi Mizuma <m.mizuma@jp.fujitsu.com>
      Acked-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: yasu.isimatu@gmail.com
      Cc: <stable@vger.kernel.org>
      Fixes: 30bb9811 ("x86/topology: Avoid wasting 128k for package id array")
      Link: http://lkml.kernel.org/r/ed738d54-0f01-b38b-b794-c31dc118c207@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      295cc7eb
    • jia zhang's avatar
      x86/mm/kcore: Add vsyscall page to /proc/kcore conditionally · cd026ca2
      jia zhang 提交于
      The vsyscall page should be visible only if vsyscall=emulate/native when dumping /proc/kcore.
      Signed-off-by: jia zhang's avatarJia Zhang <zhang.jia@linux.alibaba.com>
      Reviewed-by: NJiri Olsa <jolsa@kernel.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: jolsa@redhat.com
      Link: http://lkml.kernel.org/r/1518446694-21124-3-git-send-email-zhang.jia@linux.alibaba.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      cd026ca2
    • jia zhang's avatar
      vfs/proc/kcore, x86/mm/kcore: Fix SMAP fault when dumping vsyscall user page · 595dd46e
      jia zhang 提交于
      Commit:
      
        df04abfd ("fs/proc/kcore.c: Add bounce buffer for ktext data")
      
      ... introduced a bounce buffer to work around CONFIG_HARDENED_USERCOPY=y.
      However, accessing the vsyscall user page will cause an SMAP fault.
      
      Replace memcpy() with copy_from_user() to fix this bug works, but adding
      a common way to handle this sort of user page may be useful for future.
      
      Currently, only vsyscall page requires KCORE_USER.
      Signed-off-by: jia zhang's avatarJia Zhang <zhang.jia@linux.alibaba.com>
      Reviewed-by: NJiri Olsa <jolsa@kernel.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: jolsa@redhat.com
      Link: http://lkml.kernel.org/r/1518446694-21124-2-git-send-email-zhang.jia@linux.alibaba.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      595dd46e
    • B
      x86/entry/64: Remove the unused 'icebp' macro · b498c261
      Borislav Petkov 提交于
      That macro was touched around 2.5.8 times, judging by the full history
      linux repo, but it was unused even then. Get rid of it already.
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux@dominikbrodowski.net
      Link: http://lkml.kernel.org/r/20180212201318.GD14640@pd.tnicSigned-off-by: NIngo Molnar <mingo@kernel.org>
      b498c261
    • J
      x86/entry/64: Fix paranoid_entry() frame pointer warning · b3ccefae
      Josh Poimboeuf 提交于
      With the following commit:
      
        f09d160992d1 ("x86/entry/64: Get rid of the ALLOC_PT_GPREGS_ON_STACK and SAVE_AND_CLEAR_REGS macros")
      
      ... one of my suggested improvements triggered a frame pointer warning:
      
        arch/x86/entry/entry_64.o: warning: objtool: paranoid_entry()+0x11: call without frame pointer save/setup
      
      The warning is correct for the build-time code, but it's actually not
      relevant at runtime because of paravirt patching.  The paravirt swapgs
      call gets replaced with either a SWAPGS instruction or NOPs at runtime.
      
      Go back to the previous behavior by removing the ELF function annotation
      for paranoid_entry() and adding an unwind hint, which effectively
      silences the warning.
      Reported-by: Nkbuild test robot <fengguang.wu@intel.com>
      Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Cc: Dominik Brodowski <linux@dominikbrodowski.net>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: kbuild-all@01.org
      Cc: tipbuild@zytor.com
      Fixes: f09d160992d1 ("x86/entry/64: Get rid of the ALLOC_PT_GPREGS_ON_STACK and SAVE_AND_CLEAR_REGS macros")
      Link: http://lkml.kernel.org/r/20180212174503.5acbymg5z6p32snu@trebleSigned-off-by: NIngo Molnar <mingo@kernel.org>
      b3ccefae
    • D
      x86/entry/64: Indent PUSH_AND_CLEAR_REGS and POP_REGS properly · 92816f57
      Dominik Brodowski 提交于
      ... same as the other macros in arch/x86/entry/calling.h
      Signed-off-by: NDominik Brodowski <linux@dominikbrodowski.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: dan.j.williams@intel.com
      Link: http://lkml.kernel.org/r/20180211104949.12992-8-linux@dominikbrodowski.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
      92816f57
    • D
      x86/entry/64: Get rid of the ALLOC_PT_GPREGS_ON_STACK and SAVE_AND_CLEAR_REGS macros · dde3036d
      Dominik Brodowski 提交于
      Previously, error_entry() and paranoid_entry() saved the GP registers
      onto stack space previously allocated by its callers. Combine these two
      steps in the callers, and use the generic PUSH_AND_CLEAR_REGS macro
      for that.
      
      This adds a significant amount ot text size. However, Ingo Molnar points
      out that:
      
      	"these numbers also _very_ significantly over-represent the
      	extra footprint. The assumptions that resulted in
      	us compressing the IRQ entry code have changed very
      	significantly with the new x86 IRQ allocation code we
      	introduced in the last year:
      
      	- IRQ vectors are usually populated in tightly clustered
      	  groups.
      
      	  With our new vector allocator code the typical per CPU
      	  allocation percentage on x86 systems is ~3 device vectors
      	  and ~10 fixed vectors out of ~220 vectors - i.e. a very
      	  low ~6% utilization (!). [...]
      
      	  The days where we allocated a lot of vectors on every
      	  CPU and the compression of the IRQ entry code text
      	  mattered are over.
      
      	- Another issue is that only a small minority of vectors
      	  is frequent enough to actually matter to cache utilization
      	  in practice: 3-4 key IPIs and 1-2 device IRQs at most - and
      	  those vectors tend to be tightly clustered as well into about
      	  two groups, and are probably already on 2-3 cache lines in
      	  practice.
      
      	  For the common case of 'cache cold' IRQs it's the depth of
      	  the call chain and the fragmentation of the resulting I$
      	  that should be the main performance limit - not the overall
      	  size of it.
      
      	- The CPU side cost of IRQ delivery is still very expensive
      	  even in the best, most cached case, as in 'over a thousand
      	  cycles'. So much stuff is done that maybe contemporary x86
      	  IRQ entry microcode already prefetches the IDT entry and its
      	  expected call target address."[*]
      
      [*] http://lkml.kernel.org/r/20180208094710.qnjixhm6hybebdv7@gmail.com
      
      The "testb $3, CS(%rsp)" instruction in the idtentry macro does not need
      modification. Previously, %rsp was manually decreased by 15*8; with
      this patch, %rsp is decreased by 15 pushq instructions.
      
      [jpoimboe@redhat.com: unwind hint improvements]
      Suggested-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NDominik Brodowski <linux@dominikbrodowski.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: dan.j.williams@intel.com
      Link: http://lkml.kernel.org/r/20180211104949.12992-7-linux@dominikbrodowski.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
      dde3036d
    • D
      x86/entry/64: Use PUSH_AND_CLEAN_REGS in more cases · 30907fd1
      Dominik Brodowski 提交于
      entry_SYSCALL_64_after_hwframe() and nmi() can be converted to use
      PUSH_AND_CLEAN_REGS instead of opencoded variants thereof. Due to
      the interleaving, the additional XOR-based clearing of R8 and R9
      in entry_SYSCALL_64_after_hwframe() should not have any noticeable
      negative implications.
      Suggested-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NDominik Brodowski <linux@dominikbrodowski.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: dan.j.williams@intel.com
      Link: http://lkml.kernel.org/r/20180211104949.12992-6-linux@dominikbrodowski.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
      30907fd1
    • D
      x86/entry/64: Introduce the PUSH_AND_CLEAN_REGS macro · 3f01daec
      Dominik Brodowski 提交于
      Those instances where ALLOC_PT_GPREGS_ON_STACK is called just before
      SAVE_AND_CLEAR_REGS can trivially be replaced by PUSH_AND_CLEAN_REGS.
      This macro uses PUSH instead of MOV and should therefore be faster, at
      least on newer CPUs.
      Suggested-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NDominik Brodowski <linux@dominikbrodowski.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: dan.j.williams@intel.com
      Link: http://lkml.kernel.org/r/20180211104949.12992-5-linux@dominikbrodowski.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
      3f01daec
    • D
      x86/entry/64: Interleave XOR register clearing with PUSH instructions · f7bafa2b
      Dominik Brodowski 提交于
      Same as is done for syscalls, interleave XOR with PUSH instructions
      for exceptions/interrupts, in order to minimize the cost of the
      additional instructions required for register clearing.
      Signed-off-by: NDominik Brodowski <linux@dominikbrodowski.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: dan.j.williams@intel.com
      Link: http://lkml.kernel.org/r/20180211104949.12992-4-linux@dominikbrodowski.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
      f7bafa2b
    • D
      x86/entry/64: Merge the POP_C_REGS and POP_EXTRA_REGS macros into a single POP_REGS macro · 502af0d7
      Dominik Brodowski 提交于
      The two special, opencoded cases for POP_C_REGS can be handled by ASM
      macros.
      Signed-off-by: NDominik Brodowski <linux@dominikbrodowski.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: dan.j.williams@intel.com
      Link: http://lkml.kernel.org/r/20180211104949.12992-3-linux@dominikbrodowski.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
      502af0d7
    • D
      x86/entry/64: Merge SAVE_C_REGS and SAVE_EXTRA_REGS, remove unused extensions · 2e3f0098
      Dominik Brodowski 提交于
      All current code paths call SAVE_C_REGS and then immediately
      SAVE_EXTRA_REGS. Therefore, merge these two macros and order the MOV
      sequeneces properly.
      
      While at it, remove the macros to save all except specific registers,
      as these macros have been unused for a long time.
      Suggested-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NDominik Brodowski <linux@dominikbrodowski.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: dan.j.williams@intel.com
      Link: http://lkml.kernel.org/r/20180211104949.12992-2-linux@dominikbrodowski.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
      2e3f0098
    • I
      x86/speculation: Clean up various Spectre related details · 21e433bd
      Ingo Molnar 提交于
      Harmonize all the Spectre messages so that a:
      
          dmesg | grep -i spectre
      
      ... gives us most Spectre related kernel boot messages.
      
      Also fix a few other details:
      
       - clarify a comment about firmware speculation control
      
       - s/KPTI/PTI
      
       - remove various line-breaks that made the code uglier
      Acked-by: NDavid Woodhouse <dwmw@amazon.co.uk>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      21e433bd
    • K
      KVM/nVMX: Set the CPU_BASED_USE_MSR_BITMAPS if we have a valid L02 MSR bitmap · 3712caeb
      KarimAllah Ahmed 提交于
      We either clear the CPU_BASED_USE_MSR_BITMAPS and end up intercepting all
      MSR accesses or create a valid L02 MSR bitmap and use that. This decision
      has to be made every time we evaluate whether we are going to generate the
      L02 MSR bitmap.
      
      Before commit:
      
        d28b387f ("KVM/VMX: Allow direct access to MSR_IA32_SPEC_CTRL")
      
      ... this was probably OK since the decision was always identical.
      
      This is no longer the case now since the MSR bitmap might actually
      change once we decide to not intercept SPEC_CTRL and PRED_CMD.
      Signed-off-by: NKarimAllah Ahmed <karahmed@amazon.de>
      Signed-off-by: NDavid Woodhouse <dwmw@amazon.co.uk>
      Acked-by: NPaolo Bonzini <pbonzini@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: arjan.van.de.ven@intel.com
      Cc: dave.hansen@intel.com
      Cc: jmattson@google.com
      Cc: kvm@vger.kernel.org
      Cc: sironi@amazon.de
      Link: http://lkml.kernel.org/r/1518305967-31356-6-git-send-email-dwmw@amazon.co.ukSigned-off-by: NIngo Molnar <mingo@kernel.org>
      3712caeb
    • K
      X86/nVMX: Properly set spec_ctrl and pred_cmd before merging MSRs · 206587a9
      KarimAllah Ahmed 提交于
      These two variables should check whether SPEC_CTRL and PRED_CMD are
      supposed to be passed through to L2 guests or not. While
      msr_write_intercepted_l01 would return 'true' if it is not passed through.
      
      So just invert the result of msr_write_intercepted_l01 to implement the
      correct semantics.
      Signed-off-by: NKarimAllah Ahmed <karahmed@amazon.de>
      Signed-off-by: NDavid Woodhouse <dwmw@amazon.co.uk>
      Reviewed-by: NJim Mattson <jmattson@google.com>
      Acked-by: NPaolo Bonzini <pbonzini@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: arjan.van.de.ven@intel.com
      Cc: dave.hansen@intel.com
      Cc: kvm@vger.kernel.org
      Cc: sironi@amazon.de
      Fixes: 086e7d4118cc ("KVM: VMX: Allow direct access to MSR_IA32_SPEC_CTRL")
      Link: http://lkml.kernel.org/r/1518305967-31356-5-git-send-email-dwmw@amazon.co.ukSigned-off-by: NIngo Molnar <mingo@kernel.org>
      206587a9