1. 21 10月, 2015 10 次提交
    • B
      x86/microcode: Remove modularization leftovers · 6b26e1bf
      Borislav Petkov 提交于
      Remove the remaining module functionality leftovers. Make
      "dis_ucode_ldr" an early_param and make it static again. Drop
      module aliases, autoloading table, description, etc.
      
      Bump version number, while at it.
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Jones <davej@codemonkey.org.uk>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Link: http://lkml.kernel.org/r/1445334889-300-4-git-send-email-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      6b26e1bf
    • B
      x86/microcode: Merge the early microcode loader · fe055896
      Borislav Petkov 提交于
      Merge the early loader functionality into the driver proper. The
      diff is huge but logically, it is simply moving code from the
      _early.c files into the main driver.
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Jones <davej@codemonkey.org.uk>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Link: http://lkml.kernel.org/r/1445334889-300-3-git-send-email-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      fe055896
    • B
      x86/microcode: Unmodularize the microcode driver · 9a2bc335
      Borislav Petkov 提交于
      Make CONFIG_MICROCODE a bool. It was practically a bool already anyway,
      since early loader was forcing it to =y.
      
      Regardless, there's no real reason to have something be a module which
      gets built-in on the majority of installations out there. And its not
      like there's noticeable change in functionality - we still can load late
      microcode - just the module glue disappears.
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Jones <davej@codemonkey.org.uk>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Link: http://lkml.kernel.org/r/1445334889-300-2-git-send-email-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      9a2bc335
    • A
      x86/mce: Fix thermal throttling reporting after kexec · 81ffdcdd
      Andi Kleen 提交于
      The per CPU thermal vector init code checks if the thermal
      vector is already installed and complains and bails out if it
      is.
      
      This happens after kexec, as kernel shut down does not clear the
      thermal vector APIC register.
      
      This causes two problems:
      
      1. So we always do not fully initialize thermal reports after
         kexec. The CPU is still likely initialized, as the previous
         kernel should have done it. But we don't set up the software
         pointer to the thermal vector, so reporting may end up with a
         unknown thermal interrupt message.
      
      2. Also it complains for every logical CPU, even though the
         value is actually derived from BP only.
      
      The problem is that we end up with one message per CPU, so on
      larger systems it becomes very noisy and messes up the otherwise
      nicely formatted CPU bootup numbers in the kernel log.
      
      Just remove the check. I checked the code and there's no valid
      code paths where the thermal init code for a CPU could be called
      multiple times.
      
      Why the kernel does not clean up this value on shutdown:
      
      The thermal monitoring is controlled per logical CPU thread.
      Normal shutdown code is just running on one CPU. To disable it
      we would need a broadcast NMI to all CPUs on shut down. That's
      overkill for this. So we just ignore it after kexec.
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: linux-edac <linux-edac@vger.kernel.org>
      Link: http://lkml.kernel.org/r/1445246268-26285-9-git-send-email-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      81ffdcdd
    • B
      x86/setup/crash: Check memblock_reserve() retval · 6f376057
      Borislav Petkov 提交于
      memblock_reserve() can fail but the crashkernel reservation code
      doesn't check that and this can lead the user into believing
      that the crashkernel region was actually reserved. Make sure we
      check that return value and we exit early with a failure message
      in the error case.
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Reviewed-by: NDave Young <dyoung@redhat.com>
      Reviewed-by: NJoerg Roedel <jroedel@suse.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: WANG Chao <chaowang@redhat.com>
      Cc: jerry_hoemann@hp.com
      Link: http://lkml.kernel.org/r/1445246268-26285-7-git-send-email-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      6f376057
    • B
      x86/setup/crash: Cleanup some more · f56d5578
      Borislav Petkov 提交于
      * Remove unused auto_set variable
      * Cleanup local function variable declarations
      * Reformat printk string and use pr_info()
      
      No functionality change.
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Reviewed-by: NDave Young <dyoung@redhat.com>
      Reviewed-by: NJoerg Roedel <jroedel@suse.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: WANG Chao <chaowang@redhat.com>
      Cc: jerry_hoemann@hp.com
      Link: http://lkml.kernel.org/r/1445246268-26285-6-git-send-email-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      f56d5578
    • B
      x86/setup/crash: Remove alignment variable · 606134f7
      Borislav Petkov 提交于
      Use a macro instead. No functionality change.
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Reviewed-by: NDave Young <dyoung@redhat.com>
      Reviewed-by: NJoerg Roedel <jroedel@suse.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: WANG Chao <chaowang@redhat.com>
      Cc: jerry_hoemann@hp.com
      Link: http://lkml.kernel.org/r/1445246268-26285-5-git-send-email-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      606134f7
    • B
      x86/setup: Cleanup crashkernel reservation functions · 97eac21b
      Borislav Petkov 提交于
      * Shorten variable names
      * Realign code, space out for better readability
      
      No code changed:
      
        # arch/x86/kernel/setup.o:
      
         text    data     bss     dec     hex filename
         4543    3096   69904   77543   12ee7 setup.o.before
         4543    3096   69904   77543   12ee7 setup.o.after
      
      md5:
         8a1b7c6738a553ca207b56bd84a8f359  setup.o.before.asm
         8a1b7c6738a553ca207b56bd84a8f359  setup.o.after.asm
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Reviewed-by: NDave Young <dyoung@redhat.com>
      Reviewed-by: NJoerg Roedel <jroedel@suse.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: WANG Chao <chaowang@redhat.com>
      Cc: jerry_hoemann@hp.com
      Link: http://lkml.kernel.org/r/1445246268-26285-4-git-send-email-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      97eac21b
    • A
      x86/amd_nb, EDAC: Rename amd_get_node_id() · 1a6775c1
      Aravind Gopalakrishnan 提交于
      This function doesn't give us the "Node ID" as the function name
      suggests. Rather, it receives a PCI device as argument, checks
      the available F3 PCI device IDs in the system and returns the
      index of the matching Bus/Device IDs.
      
      Rename it to amd_pci_dev_to_node_id().
      
      No functional change is introduced.
      Suggested-by: NIngo Molnar <mingo@kernel.org>
      Signed-off-by: NAravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-edac <linux-edac@vger.kernel.org>
      Link: http://lkml.kernel.org/r/1445246268-26285-3-git-send-email-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      1a6775c1
    • B
      x86/setup: Do not reserve crashkernel high memory if low reservation failed · eb6db83d
      Baoquan He 提交于
      People reported that when allocating crashkernel memory using
      the ",high" and ",low" syntax, there were cases where the
      reservation of the high portion succeeds but the reservation of
      the low portion fails.
      
      Then kexec can load the kdump kernel successfully, but booting
      the kdump kernel fails as there's no low memory.
      
      The low memory allocation for the kdump kernel can fail on large
      systems for a couple of reasons. For example, the manually
      specified crashkernel low memory can be too large and thus no
      adequate memblock region would be found.
      
      Therefore, we try to reserve low memory for the crash kernel
      *after* the high memory portion has been allocated. If that
      fails, we free crashkernel high memory too and return. The user
      can then take measures accordingly.
      Tested-by: NJoerg Roedel <jroedel@suse.de>
      Signed-off-by: NBaoquan He <bhe@redhat.com>
      [ Massage text. ]
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Reviewed-by: NJoerg Roedel <jroedel@suse.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: WANG Chao <chaowang@redhat.com>
      Cc: jerry_hoemann@hp.com
      Cc: yinghai@kernel.org
      Link: http://lkml.kernel.org/r/1445246268-26285-2-git-send-email-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      eb6db83d
  2. 12 10月, 2015 6 次提交
  3. 08 10月, 2015 1 次提交
    • C
      swiotlb: Enable it under x86 PAE · 9d99c712
      Christian Melki 提交于
      Most distributions end up enabling SWIOTLB already with 32-bit
      kernels due to the combination of CONFIG_HYPERVISOR_GUEST|CONFIG_XEN=y
      as those end up requiring the SWIOTLB.
      
      However for those that are not interested in virtualization and
      run in 32-bit they will discover that: "32-bit PAE 4.2.0 kernel
      (no IOMMU code) would hang when writing to my USB disk. The kernel
      spews million(-ish messages per sec) to syslog, effectively
      "hanging" userspace with my kernel.
      
      Oct  2 14:33:06 voodoochild kernel: [  223.287447] nommu_map_sg:
      overflow 25dcac000+1024 of device mask ffffffff
      Oct  2 14:33:06 voodoochild kernel: [  223.287448] nommu_map_sg:
      overflow 25dcac000+1024 of device mask ffffffff
      Oct  2 14:33:06 voodoochild kernel: [  223.287449] nommu_map_sg:
      overflow 25dcac000+1024 of device mask ffffffff
      ... etc ..."
      
      Enabling it makes the problem go away.
      
      N.B. With a6dfa128
      "config: Enable NEED_DMA_MAP_STATE by default when SWIOTLB is selected"
      we also have the important part of the SG macros enabled to make this
      work properly - in case anybody wants to backport this patch.
      Reported-and-Tested-by: NChristian Melki <christian.melki@t2data.com>
      Signed-off-by: NChristian Melki <christian.melki@t2data.com>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      9d99c712
  4. 06 10月, 2015 1 次提交
    • D
      x86/xen/p2m: hint at the last populated P2M entry · 98dd166e
      David Vrabel 提交于
      With commit 633d6f17 (x86/xen: prepare
      p2m list for memory hotplug) the P2M may be sized to accomdate a much
      larger amount of memory than the domain currently has.
      
      When saving a domain, the toolstack must scan all the P2M looking for
      populated pages.  This results in a performance regression due to the
      unnecessary scanning.
      
      Instead of reporting (via shared_info) the maximum possible size of
      the P2M, hint at the last PFN which might be populated.  This hint is
      increased as new leaves are added to the P2M (in the expectation that
      they will be used for populated entries).
      Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
      Cc: <stable@vger.kernel.org> # 4.0+
      98dd166e
  5. 02 10月, 2015 4 次提交
    • B
      x86/headers/uapi: Fix __BITS_PER_LONG value for x32 builds · f4b4aae1
      Ben Hutchings 提交于
      On x32, gcc predefines __x86_64__ but long is only 32-bit.  Use
      __ILP32__ to distinguish x32.
      
      Fixes this compiler error in perf:
      
      	tools/include/asm-generic/bitops/__ffs.h: In function '__ffs':
      	tools/include/asm-generic/bitops/__ffs.h:19:8: error: right shift count >= width of type [-Werror=shift-count-overflow]
      	  word >>= 32;
      	       ^
      
      This isn't sufficient to build perf for x32, though.
      Signed-off-by: NBen Hutchings <ben@decadent.org.uk>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Link: http://lkml.kernel.org/r/1443660043.2730.15.camel@decadent.org.ukSigned-off-by: NIngo Molnar <mingo@kernel.org>
      f4b4aae1
    • S
      x86/mm: Set NX on gap between __ex_table and rodata · ab76f7b4
      Stephen Smalley 提交于
      Unused space between the end of __ex_table and the start of
      rodata can be left W+x in the kernel page tables.  Extend the
      setting of the NX bit to cover this gap by starting from
      text_end rather than rodata_start.
      
        Before:
        ---[ High Kernel Mapping ]---
        0xffffffff80000000-0xffffffff81000000          16M                               pmd
        0xffffffff81000000-0xffffffff81600000           6M     ro         PSE     GLB x  pmd
        0xffffffff81600000-0xffffffff81754000        1360K     ro                 GLB x  pte
        0xffffffff81754000-0xffffffff81800000         688K     RW                 GLB x  pte
        0xffffffff81800000-0xffffffff81a00000           2M     ro         PSE     GLB NX pmd
        0xffffffff81a00000-0xffffffff81b3b000        1260K     ro                 GLB NX pte
        0xffffffff81b3b000-0xffffffff82000000        4884K     RW                 GLB NX pte
        0xffffffff82000000-0xffffffff82200000           2M     RW         PSE     GLB NX pmd
        0xffffffff82200000-0xffffffffa0000000         478M                               pmd
      
        After:
        ---[ High Kernel Mapping ]---
        0xffffffff80000000-0xffffffff81000000          16M                               pmd
        0xffffffff81000000-0xffffffff81600000           6M     ro         PSE     GLB x  pmd
        0xffffffff81600000-0xffffffff81754000        1360K     ro                 GLB x  pte
        0xffffffff81754000-0xffffffff81800000         688K     RW                 GLB NX pte
        0xffffffff81800000-0xffffffff81a00000           2M     ro         PSE     GLB NX pmd
        0xffffffff81a00000-0xffffffff81b3b000        1260K     ro                 GLB NX pte
        0xffffffff81b3b000-0xffffffff82000000        4884K     RW                 GLB NX pte
        0xffffffff82000000-0xffffffff82200000           2M     RW         PSE     GLB NX pmd
        0xffffffff82200000-0xffffffffa0000000         478M                               pmd
      Signed-off-by: NStephen Smalley <sds@tycho.nsa.gov>
      Acked-by: NKees Cook <keescook@chromium.org>
      Cc: <stable@vger.kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Link: http://lkml.kernel.org/r/1443704662-3138-1-git-send-email-sds@tycho.nsa.govSigned-off-by: NIngo Molnar <mingo@kernel.org>
      ab76f7b4
    • L
      x86/kexec: Fix kexec crash in syscall kexec_file_load() · e3c41e37
      Lee, Chun-Yi 提交于
      The original bug is a page fault crash that sometimes happens
      on big machines when preparing ELF headers:
      
          BUG: unable to handle kernel paging request at ffffc90613fc9000
          IP: [<ffffffff8103d645>] prepare_elf64_ram_headers_callback+0x165/0x260
      
      The bug is caused by us under-counting the number of memory ranges
      and subsequently not allocating enough ELF header space for them.
      The bug is typically masked on smaller systems, because the ELF header
      allocation is rounded up to the next page.
      
      This patch modifies the code in fill_up_crash_elf_data() by using
      walk_system_ram_res() instead of walk_system_ram_range() to correctly
      count the max number of crash memory ranges. That's because the
      walk_system_ram_range() filters out small memory regions that
      reside in the same page, but walk_system_ram_res() does not.
      
      Here's how I found the bug:
      
      After tracing prepare_elf64_headers() and prepare_elf64_ram_headers_callback(),
      the code uses walk_system_ram_res() to fill-in crash memory regions information
      to the program header, so it counts those small memory regions that
      reside in a page area.
      
      But, when the kernel was using walk_system_ram_range() in
      fill_up_crash_elf_data() to count the number of crash memory regions,
      it filters out small regions.
      
      I printed those small memory regions, for example:
      
        kexec: Get nr_ram ranges. vaddr=0xffff880077592258 paddr=0x77592258, sz=0xdc0
      
      Based on the code in walk_system_ram_range(), this memory region
      will be filtered out:
      
        pfn = (0x77592258 + 0x1000 - 1) >> 12 = 0x77593
        end_pfn = (0x77592258 + 0xfc0 -1 + 1) >> 12 = 0x77593
        end_pfn - pfn = 0x77593 - 0x77593 = 0  <=== if (end_pfn > pfn) is FALSE
      
      So, the max_nr_ranges that's counted by the kernel doesn't include
      small memory regions - causing us to under-allocate the required space.
      That causes the page fault crash that happens in a later code path
      when preparing ELF headers.
      
      This bug is not easy to reproduce on small machines that have few
      CPUs, because the allocated page aligned ELF buffer has more free
      space to cover those small memory regions' PT_LOAD headers.
      Signed-off-by: NLee, Chun-Yi <jlee@suse.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Jiang Liu <jiang.liu@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Takashi Iwai <tiwai@suse.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: kexec@lists.infradead.org
      Cc: linux-kernel@vger.kernel.org
      Cc: <stable@vger.kernel.org>
      Link: http://lkml.kernel.org/r/1443531537-29436-1-git-send-email-jlee@suse.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      e3c41e37
    • A
      arch/x86/include/asm/efi.h: fix build failure · a523841e
      Andrey Ryabinin 提交于
      With KMEMCHECK=y, KASAN=n:
      
        arch/x86/platform/efi/efi.c:673:3: error: implicit declaration of function `memcpy' [-Werror=implicit-function-declaration]
        arch/x86/platform/efi/efi_64.c:139:2: error: implicit declaration of function `memcpy' [-Werror=implicit-function-declaration]
        arch/x86/include/asm/desc.h:121:2: error: implicit declaration of function `memcpy' [-Werror=implicit-function-declaration]
      
      Don't #undef memcpy if KASAN=n.
      
      Fixes: 769a8089 ("x86, efi, kasan: #undef memset/memcpy/memmove per arch")
      Signed-off-by: NAndrey Ryabinin <ryabinin.a.a@gmail.com>
      Reported-by: NIngo Molnar <mingo@kernel.org>
      Reported-by: NSedat Dilek <sedat.dilek@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a523841e
  6. 01 10月, 2015 8 次提交
    • D
      Use WARN_ON_ONCE for missing X86_FEATURE_NRIPS · d2922422
      Dirk Müller 提交于
      The cpu feature flags are not ever going to change, so warning
      everytime can cause a lot of kernel log spam
      (in our case more than 10GB/hour).
      
      The warning seems to only occur when nested virtualization is
      enabled, so it's probably triggered by a KVM bug.  This is a
      sensible and safe change anyway, and the KVM bug fix might not
      be suitable for stable releases anyway.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NDirk Mueller <dmueller@suse.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      d2922422
    • P
      Revert "KVM: SVM: use NPT page attributes" · fc07e76a
      Paolo Bonzini 提交于
      This reverts commit 3c2e7f7d.
      Initializing the mapping from MTRR to PAT values was reported to
      fail nondeterministically, and it also caused extremely slow boot
      (due to caching getting disabled---bug 103321) with assigned devices.
      Reported-by: NMarkus Trippelsdorf <markus@trippelsdorf.de>
      Reported-by: NSebastian Schuette <dracon@ewetel.net>
      Cc: stable@vger.kernel.org # 4.2+
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      fc07e76a
    • P
      Revert "KVM: svm: handle KVM_X86_QUIRK_CD_NW_CLEARED in svm_get_mt_mask" · bcf166a9
      Paolo Bonzini 提交于
      This reverts commit 54928303.
      It builds on the commit that is being reverted next.
      
      Cc: stable@vger.kernel.org # 4.2+
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      bcf166a9
    • P
      Revert "KVM: SVM: Sync g_pat with guest-written PAT value" · 625422f6
      Paolo Bonzini 提交于
      This reverts commit e098223b,
      which has a dependency on other commits being reverted.
      
      Cc: stable@vger.kernel.org # 4.2+
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      625422f6
    • P
      Revert "KVM: x86: apply guest MTRR virtualization on host reserved pages" · 606decd6
      Paolo Bonzini 提交于
      This reverts commit fd717f11.
      It was reported to cause Machine Check Exceptions (bug 104091).
      
      Reported-by: harn-solo@gmx.de
      Cc: stable@vger.kernel.org # 4.2+
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      606decd6
    • M
      x86/efi: Fix boot crash by mapping EFI memmap entries bottom-up at runtime, instead of top-down · a5caa209
      Matt Fleming 提交于
      Beginning with UEFI v2.5 EFI_PROPERTIES_TABLE was introduced
      that signals that the firmware PE/COFF loader supports splitting
      code and data sections of PE/COFF images into separate EFI
      memory map entries. This allows the kernel to map those regions
      with strict memory protections, e.g. EFI_MEMORY_RO for code,
      EFI_MEMORY_XP for data, etc.
      
      Unfortunately, an unwritten requirement of this new feature is
      that the regions need to be mapped with the same offsets
      relative to each other as observed in the EFI memory map. If
      this is not done crashes like this may occur,
      
        BUG: unable to handle kernel paging request at fffffffefe6086dd
        IP: [<fffffffefe6086dd>] 0xfffffffefe6086dd
        Call Trace:
         [<ffffffff8104c90e>] efi_call+0x7e/0x100
         [<ffffffff81602091>] ? virt_efi_set_variable+0x61/0x90
         [<ffffffff8104c583>] efi_delete_dummy_variable+0x63/0x70
         [<ffffffff81f4e4aa>] efi_enter_virtual_mode+0x383/0x392
         [<ffffffff81f37e1b>] start_kernel+0x38a/0x417
         [<ffffffff81f37495>] x86_64_start_reservations+0x2a/0x2c
         [<ffffffff81f37582>] x86_64_start_kernel+0xeb/0xef
      
      Here 0xfffffffefe6086dd refers to an address the firmware
      expects to be mapped but which the OS never claimed was mapped.
      The issue is that included in these regions are relative
      addresses to other regions which were emitted by the firmware
      toolchain before the "splitting" of sections occurred at
      runtime.
      
      Needless to say, we don't satisfy this unwritten requirement on
      x86_64 and instead map the EFI memory map entries in reverse
      order. The above crash is almost certainly triggerable with any
      kernel newer than v3.13 because that's when we rewrote the EFI
      runtime region mapping code, in commit d2f7cbe7 ("x86/efi:
      Runtime services virtual mapping"). For kernel versions before
      v3.13 things may work by pure luck depending on the
      fragmentation of the kernel virtual address space at the time we
      map the EFI regions.
      
      Instead of mapping the EFI memory map entries in reverse order,
      where entry N has a higher virtual address than entry N+1, map
      them in the same order as they appear in the EFI memory map to
      preserve this relative offset between regions.
      
      This patch has been kept as small as possible with the intention
      that it should be applied aggressively to stable and
      distribution kernels. It is very much a bugfix rather than
      support for a new feature, since when EFI_PROPERTIES_TABLE is
      enabled we must map things as outlined above to even boot - we
      have no way of asking the firmware not to split the code/data
      regions.
      
      In fact, this patch doesn't even make use of the more strict
      memory protections available in UEFI v2.5. That will come later.
      Suggested-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
      Reported-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: NMatt Fleming <matt.fleming@intel.com>
      Cc: <stable@vger.kernel.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Chun-Yi <jlee@suse.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: James Bottomley <JBottomley@Odin.com>
      Cc: Lee, Chun-Yi <jlee@suse.com>
      Cc: Leif Lindholm <leif.lindholm@linaro.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Matthew Garrett <mjg59@srcf.ucam.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Jones <pjones@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Link: http://lkml.kernel.org/r/1443218539-7610-2-git-send-email-matt@codeblueprint.co.ukSigned-off-by: NIngo Molnar <mingo@kernel.org>
      a5caa209
    • T
      x86/process: Unify 32bit and 64bit implementations of get_wchan() · 7ba78053
      Thomas Gleixner 提交于
      The stack layout and the functionality is identical. Use the 64bit
      version for all of x86.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NBorislav Petkov <bp@alien8.de>
      Reviewed-by: NDmitry Vyukov <dvyukov@google.com>
      Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andrey Konovalov <andreyknvl@google.com>
      Cc: Kostya Serebryany <kcc@google.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: kasan-dev <kasan-dev@googlegroups.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: Wolfram Gloger <wmglo@dent.med.uni-muenchen.de>
      Link: http://lkml.kernel.org/r/20150930083302.779694618@linutronix.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      7ba78053
    • T
      x86/process: Add proper bound checks in 64bit get_wchan() · eddd3826
      Thomas Gleixner 提交于
      Dmitry Vyukov reported the following using trinity and the memory
      error detector AddressSanitizer
      (https://code.google.com/p/address-sanitizer/wiki/AddressSanitizerForKernel).
      
      [ 124.575597] ERROR: AddressSanitizer: heap-buffer-overflow on
      address ffff88002e280000
      [ 124.576801] ffff88002e280000 is located 131938492886538 bytes to
      the left of 28857600-byte region [ffffffff81282e0a, ffffffff82e0830a)
      [ 124.578633] Accessed by thread T10915:
      [ 124.579295] inlined in describe_heap_address
      ./arch/x86/mm/asan/report.c:164
      [ 124.579295] #0 ffffffff810dd277 in asan_report_error
      ./arch/x86/mm/asan/report.c:278
      [ 124.580137] #1 ffffffff810dc6a0 in asan_check_region
      ./arch/x86/mm/asan/asan.c:37
      [ 124.581050] #2 ffffffff810dd423 in __tsan_read8 ??:0
      [ 124.581893] #3 ffffffff8107c093 in get_wchan
      ./arch/x86/kernel/process_64.c:444
      
      The address checks in the 64bit implementation of get_wchan() are
      wrong in several ways:
      
       - The lower bound of the stack is not the start of the stack
         page. It's the start of the stack page plus sizeof (struct
         thread_info)
      
       - The upper bound must be:
      
             top_of_stack - TOP_OF_KERNEL_STACK_PADDING - 2 * sizeof(unsigned long).
      
         The 2 * sizeof(unsigned long) is required because the stack pointer
         points at the frame pointer. The layout on the stack is: ... IP FP
         ... IP FP. So we need to make sure that both IP and FP are in the
         bounds.
      
      Fix the bound checks and get rid of the mix of numeric constants, u64
      and unsigned long. Making all unsigned long allows us to use the same
      function for 32bit as well.
      
      Use READ_ONCE() when accessing the stack. This does not prevent a
      concurrent wakeup of the task and the stack changing, but at least it
      avoids TOCTOU.
      
      Also check task state at the end of the loop. Again that does not
      prevent concurrent changes, but it avoids walking for nothing.
      
      Add proper comments while at it.
      Reported-by: NDmitry Vyukov <dvyukov@google.com>
      Reported-by: NSasha Levin <sasha.levin@oracle.com>
      Based-on-patch-from: Wolfram Gloger <wmglo@dent.med.uni-muenchen.de>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NBorislav Petkov <bp@alien8.de>
      Reviewed-by: NDmitry Vyukov <dvyukov@google.com>
      Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andrey Konovalov <andreyknvl@google.com>
      Cc: Kostya Serebryany <kcc@google.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: kasan-dev <kasan-dev@googlegroups.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Wolfram Gloger <wmglo@dent.med.uni-muenchen.de>
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/20150930083302.694788319@linutronix.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      eddd3826
  7. 30 9月, 2015 2 次提交
  8. 29 9月, 2015 1 次提交
  9. 28 9月, 2015 5 次提交
  10. 25 9月, 2015 2 次提交
    • D
      KVM: disable halt_poll_ns as default for s390x · 920552b2
      David Hildenbrand 提交于
      We observed some performance degradation on s390x with dynamic
      halt polling. Until we can provide a proper fix, let's enable
      halt_poll_ns as default only for supported architectures.
      
      Architectures are now free to set their own halt_poll_ns
      default value.
      Signed-off-by: NDavid Hildenbrand <dahi@linux.vnet.ibm.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      920552b2
    • P
      KVM: x86: fix off-by-one in reserved bits check · 58c95070
      Paolo Bonzini 提交于
      29ecd660 ("KVM: x86: avoid uninitialized variable warning",
      2015-09-06) introduced a not-so-subtle problem, which probably
      escaped review because it was not part of the patch context.
      
      Before the patch, leaf was always equal to iterator.level.  After,
      it is equal to iterator.level - 1 in the call to is_shadow_zero_bits_set,
      and when is_shadow_zero_bits_set does another "-1" the check on
      reserved bits becomes incorrect.  Using "iterator.level" in the call
      fixes this call trace:
      
      WARNING: CPU: 2 PID: 17000 at arch/x86/kvm/mmu.c:3385 handle_mmio_page_fault.part.93+0x1a/0x20 [kvm]()
      Modules linked in: tun sha256_ssse3 sha256_generic drbg binfmt_misc ipv6 vfat fat fuse dm_crypt dm_mod kvm_amd kvm crc32_pclmul aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd fam15h_power amd64_edac_mod k10temp edac_core amdkfd amd_iommu_v2 radeon acpi_cpufreq
      [...]
      Call Trace:
        dump_stack+0x4e/0x84
        warn_slowpath_common+0x95/0xe0
        warn_slowpath_null+0x1a/0x20
        handle_mmio_page_fault.part.93+0x1a/0x20 [kvm]
        tdp_page_fault+0x231/0x290 [kvm]
        ? emulator_pio_in_out+0x6e/0xf0 [kvm]
        kvm_mmu_page_fault+0x36/0x240 [kvm]
        ? svm_set_cr0+0x95/0xc0 [kvm_amd]
        pf_interception+0xde/0x1d0 [kvm_amd]
        handle_exit+0x181/0xa70 [kvm_amd]
        ? kvm_arch_vcpu_ioctl_run+0x68b/0x1730 [kvm]
        kvm_arch_vcpu_ioctl_run+0x6f6/0x1730 [kvm]
        ? kvm_arch_vcpu_ioctl_run+0x68b/0x1730 [kvm]
        ? preempt_count_sub+0x9b/0xf0
        ? mutex_lock_killable_nested+0x26f/0x490
        ? preempt_count_sub+0x9b/0xf0
        kvm_vcpu_ioctl+0x358/0x710 [kvm]
        ? __fget+0x5/0x210
        ? __fget+0x101/0x210
        do_vfs_ioctl+0x2f4/0x560
        ? __fget_light+0x29/0x90
        SyS_ioctl+0x4c/0x90
        entry_SYSCALL_64_fastpath+0x16/0x73
      ---[ end trace 37901c8686d84de6 ]---
      Reported-by: NBorislav Petkov <bp@alien8.de>
      Tested-by: NBorislav Petkov <bp@alien8.de>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      58c95070