1. 10 December 2019 (2 commits)
    • x86/setup: Enhance the comments · 360db4ac
      Committed by Ingo Molnar
      Update various comments, fix outright mistakes and meaningless descriptions.
      
      Also harmonize the style across the file, both in form and in language.
      
      Cc: linux-kernel@vger.kernel.org
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/setup: Clean up the header portion of setup.c · 12609013
      Committed by Ingo Molnar
      In 20 years we accumulated 89 #include lines in setup.c,
      but we only need 30 of them (!) ...
      
      Get rid of the excessive ones, and while at it, sort the
      remaining ones alphabetically.
      
      Also get rid of the incomplete changelogs at the header of the file,
      and explain better what this file does.
      
      Cc: linux-kernel@vger.kernel.org
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  2. 18 November 2019 (1 commit)
  3. 12 November 2019 (1 commit)
  4. 07 November 2019 (1 commit)
  5. 05 November 2019 (1 commit)
    • x86/mm: Report actual image regions in /proc/iomem · a3299754
      Committed by Kees Cook
      The resource reservations in /proc/iomem made for the kernel image did
      not reflect the gaps between text, rodata, and data. Add the "rodata"
      resource and update the start/end calculations to match the respective
      calls to free_kernel_image_pages().
      
      Before (booted with "nokaslr" for easier comparison):
      
      00100000-bffd9fff : System RAM
        01000000-01e011d0 : Kernel code
        01e011d1-025619bf : Kernel data
        02a95000-035fffff : Kernel bss
      
      After:
      
      00100000-bffd9fff : System RAM
        01000000-01e011d0 : Kernel code
        02000000-023d4fff : Kernel rodata
        02400000-025619ff : Kernel data
        02a95000-035fffff : Kernel bss
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: linux-alpha@vger.kernel.org
      Cc: linux-arch@vger.kernel.org
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: linux-c6x-dev@linux-c6x.org
      Cc: linux-ia64@vger.kernel.org
      Cc: linuxppc-dev@lists.ozlabs.org
      Cc: linux-s390@vger.kernel.org
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
      Cc: Robert Richter <rrichter@marvell.com>
      Cc: Segher Boessenkool <segher@kernel.crashing.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Thomas Lendacky <Thomas.Lendacky@amd.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: x86-ml <x86@kernel.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Link: https://lkml.kernel.org/r/20191029211351.13243-29-keescook@chromium.org
  6. 03 September 2019 (1 commit)
  7. 28 June 2019 (1 commit)
  8. 20 June 2019 (1 commit)
  9. 21 May 2019 (1 commit)
  10. 22 April 2019 (2 commits)
    • x86/kdump: Fall back to reserve high crashkernel memory · b9ac3849
      Committed by Dave Young
      crashkernel=xM tries to reserve memory for the crash kernel under 4G,
      which is usually enough. But the reservation can fail sometimes, for
      example when one tries to reserve a big chunk like 2G.
      
      So let crashkernel=xM fall back to using high memory if it fails to
      find a suitable low range. Do not make ",high" the default, because it
      allocates extra low memory for DMA buffers and swiotlb, and that is not
      always necessary for all machines.
      
      Typically, crashkernel=128M works with a low reservation under 4G,
      so keep <4G as the default.
      
       [ bp: Massage. ]
      Signed-off-by: Dave Young <dyoung@redhat.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Reviewed-by: Ingo Molnar <mingo@kernel.org>
      Acked-by: Baoquan He <bhe@redhat.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: linux-doc@vger.kernel.org
      Cc: "Paul E. McKenney" <paulmck@linux.ibm.com>
      Cc: Petr Tesarik <ptesarik@suse.cz>
      Cc: piliu@redhat.com
      Cc: Ram Pai <linuxram@us.ibm.com>
      Cc: Sinan Kaya <okaya@codeaurora.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Thymo van Beers <thymovanbeers@gmail.com>
      Cc: vgoyal@redhat.com
      Cc: x86-ml <x86@kernel.org>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Zhimin Gu <kookoo.gu@intel.com>
      Link: https://lkml.kernel.org/r/20190422031905.GA8387@dhcp-128-65.nay.redhat.com
    • x86/kdump: Have crashkernel=X reserve under 4G by default · 9ca5c8e6
      Committed by Dave Young
      The kdump crashkernel low reservation is limited to under 896M, even on
      X86_64. This obscure and miserable limitation exists for compatibility
      with old kexec-tools, but the reason is not documented anywhere.
      
      Some more tests/investigations about the background:
      
      a) Previously, old kexec-tools could only load purgatory to memory under
         2G. Eric removed that limitation in 2012 in kexec-tools:
      
           b4f9f8599679 ("kexec x86_64: Make purgatory relocatable anywhere
      		   in the 64bit address space.")
      
      b) Back in 2013 Yinghai removed all the limitations in new kexec-tools,
         bzImage64 can be loaded anywhere:
      
           82c3dd2280d2 ("kexec, x86_64: Load bzImage64 above 4G")
      
      c) Test results with old kexec-tools with old and latest kernels:
      
        1. Old kexec-tools cannot be built with a modern toolchain anymore;
           I built it in a RHEL6 VM.
      
        2. 2.0.0 kexec-tools does not work with the latest kernel even with
           memory under 896M and gives an error:
      
           "ELF core (kcore) parse failed"
      
           For that it needs below kexec-tools fix:
      
             ed15ba1b9977 ("build_mem_phdrs(): check if p_paddr is invalid")
      
        3. Even with a patched kexec-tools that fixes 2), it still needs some
           other fixes to work correctly with KASLR-enabled kernels.
      
      So the situation is:
      
      * Old kexec-tools is already broken with the latest kernels.
      
      * We cannot keep these limitations forever just for compatibility with
        very old kexec-tools.
      
      * If one must use old tools then he/she can choose crashkernel=X@Y.
      
      * People have reported bugs where crashkernel=384M failed because KASLR
        makes the 0-896M space sparse.
      
      * Crashkernel can reserve memory in a low or high area; it is natural
        to understand "low" as memory under 4G.
      
      Hence drop the 896M limitation and change crashkernel low reservation to
      reserve under 4G by default.
      Signed-off-by: Dave Young <dyoung@redhat.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Reviewed-by: Ingo Molnar <mingo@kernel.org>
      Acked-by: Baoquan He <bhe@redhat.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Petr Tesarik <ptesarik@suse.cz>
      Cc: piliu@redhat.com
      Cc: Ram Pai <linuxram@us.ibm.com>
      Cc: Sinan Kaya <okaya@codeaurora.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: vgoyal@redhat.com
      Cc: x86-ml <x86@kernel.org>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Zhimin Gu <kookoo.gu@intel.com>
      Link: https://lkml.kernel.org/r/20190421035058.943630505@redhat.com
  11. 29 March 2019 (1 commit)
  12. 21 December 2018 (1 commit)
  13. 20 November 2018 (1 commit)
  14. 31 October 2018 (1 commit)
  15. 10 October 2018 (1 commit)
    • x86/boot: Add ACPI RSDP address to setup_header · ae7e1238
      Committed by Juergen Gross
      Xen PVH guests receive the address of the RSDP table from Xen. In order
      to support booting a Xen PVH guest via Grub2 using the standard x86
      boot entry we need a way for Grub2 to pass the RSDP address to the
      kernel.
      
      For this purpose, expand struct setup_header to hold the physical
      address of the RSDP. A value of zero means the address isn't specified
      and the RSDP has to be located the legacy way (by searching through low
      memory or the EBDA).
      
      While documenting the new setup_header layout and protocol version
      2.14 add the missing documentation of protocol version 2.13.
      
      There are Grub2 versions in several distros with a downstream patch
      violating the boot protocol by writing past the end of setup_header.
      This requires another update of the boot protocol to enable the kernel
      to distinguish between a specified RSDP address and one filled with
      garbage by such a broken Grub2.
      
      From protocol 2.14 on, Grub2 will write the version it supports (but
      never a higher value than the kernel supports) OR'ed with 0x8000 into
      the version field of setup_header. This lets the kernel know up to
      which field Grub2 has written information. All fields after that are
      to be treated as clobbered.
      Signed-off-by: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: boris.ostrovsky@oracle.com
      Cc: bp@alien8.de
      Cc: corbet@lwn.net
      Cc: linux-doc@vger.kernel.org
      Cc: xen-devel@lists.xenproject.org
      Link: http://lkml.kernel.org/r/20181010061456.22238-3-jgross@suse.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  16. 03 October 2018 (1 commit)
    • x86, hibernate: Fix nosave_regions setup for hibernation · cc55f753
      Committed by Zhimin Gu
      On 32bit systems, nosave_regions (non-RAM areas) located between
      max_low_pfn and max_pfn are currently not excluded from the hibernation
      snapshot, which may result in a machine check exception when these
      unsafe regions are accessed during hibernation:
      
      [  612.800453] Disabling lock debugging due to kernel taint
      [  612.805786] mce: [Hardware Error]: CPU 0: Machine Check Exception: 5 Bank 6: fe00000000801136
      [  612.814344] mce: [Hardware Error]: RIP !INEXACT! 60:<00000000d90be566> {swsusp_save+0x436/0x560}
      [  612.823167] mce: [Hardware Error]: TSC 1f5939fe276 ADDR dd000000 MISC 30e0000086
      [  612.830677] mce: [Hardware Error]: PROCESSOR 0:306c3 TIME 1529487426 SOCKET 0 APIC 0 microcode 24
      [  612.839581] mce: [Hardware Error]: Run the above through 'mcelog --ascii'
      [  612.846394] mce: [Hardware Error]: Machine check: Processor context corrupt
      [  612.853380] Kernel panic - not syncing: Fatal machine check
      [  612.858978] Kernel Offset: 0x18000000 from 0xc1000000 (relocation range: 0xc0000000-0xf7ffdfff)
      
      This is because on 32bit systems, pages above max_low_pfn are regarded
      as high memory, and accessing unsafe pages might cause an unexpected
      MCE. On the problematic 32bit system, there is reserved memory above
      low memory, which triggered the MCE:
      
      e820 memory mapping:
      [    0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009d7ff] usable
      [    0.000000] BIOS-e820: [mem 0x000000000009d800-0x000000000009ffff] reserved
      [    0.000000] BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff] reserved
      [    0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000d160cfff] usable
      [    0.000000] BIOS-e820: [mem 0x00000000d160d000-0x00000000d1613fff] ACPI NVS
      [    0.000000] BIOS-e820: [mem 0x00000000d1614000-0x00000000d1a44fff] usable
      [    0.000000] BIOS-e820: [mem 0x00000000d1a45000-0x00000000d1ecffff] reserved
      [    0.000000] BIOS-e820: [mem 0x00000000d1ed0000-0x00000000d7eeafff] usable
      [    0.000000] BIOS-e820: [mem 0x00000000d7eeb000-0x00000000d7ffffff] reserved
      [    0.000000] BIOS-e820: [mem 0x00000000d8000000-0x00000000d875ffff] usable
      [    0.000000] BIOS-e820: [mem 0x00000000d8760000-0x00000000d87fffff] reserved
      [    0.000000] BIOS-e820: [mem 0x00000000d8800000-0x00000000d8fadfff] usable
      [    0.000000] BIOS-e820: [mem 0x00000000d8fae000-0x00000000d8ffffff] ACPI data
      [    0.000000] BIOS-e820: [mem 0x00000000d9000000-0x00000000da71bfff] usable
      [    0.000000] BIOS-e820: [mem 0x00000000da71c000-0x00000000da7fffff] ACPI NVS
      [    0.000000] BIOS-e820: [mem 0x00000000da800000-0x00000000dbb8bfff] usable
      [    0.000000] BIOS-e820: [mem 0x00000000dbb8c000-0x00000000dbffffff] reserved
      [    0.000000] BIOS-e820: [mem 0x00000000dd000000-0x00000000df1fffff] reserved
      [    0.000000] BIOS-e820: [mem 0x00000000f8000000-0x00000000fbffffff] reserved
      [    0.000000] BIOS-e820: [mem 0x00000000fec00000-0x00000000fec00fff] reserved
      [    0.000000] BIOS-e820: [mem 0x00000000fed00000-0x00000000fed03fff] reserved
      [    0.000000] BIOS-e820: [mem 0x00000000fed1c000-0x00000000fed1ffff] reserved
      [    0.000000] BIOS-e820: [mem 0x00000000fee00000-0x00000000fee00fff] reserved
      [    0.000000] BIOS-e820: [mem 0x00000000ff000000-0x00000000ffffffff] reserved
      [    0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000041edfffff] usable
      
      Fix this problem by changing the pfn limit from max_low_pfn to max_pfn.
      This fix does not impact 64bit systems, because on 64bit max_low_pfn is
      the same as max_pfn.
      Signed-off-by: Zhimin Gu <kookoo.gu@intel.com>
      Acked-by: Pavel Machek <pavel@ucw.cz>
      Signed-off-by: Chen Yu <yu.c.chen@intel.com>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: All applicable <stable@vger.kernel.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  17. 20 July 2018 (3 commits)
    • x86/tsc: Calibrate tsc only once · cf7a63ef
      Committed by Pavel Tatashin
      During boot, the TSC is calibrated twice: once in
      tsc_early_delay_calibrate(), and a second time in tsc_init().
      
      Rename tsc_early_delay_calibrate() to tsc_early_init(), rework it so
      that the calibration is done only early, and make tsc_init() use the
      values already determined in tsc_early_init().

      Sometimes it is not possible to calibrate the TSC early, because a
      required subsystem is not yet initialized; in that case, try again
      later in tsc_init().
      Suggested-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: steven.sistare@oracle.com
      Cc: daniel.m.jordan@oracle.com
      Cc: linux@armlinux.org.uk
      Cc: schwidefsky@de.ibm.com
      Cc: heiko.carstens@de.ibm.com
      Cc: john.stultz@linaro.org
      Cc: sboyd@codeaurora.org
      Cc: hpa@zytor.com
      Cc: douly.fnst@cn.fujitsu.com
      Cc: peterz@infradead.org
      Cc: prarit@redhat.com
      Cc: feng.tang@intel.com
      Cc: pmladek@suse.com
      Cc: gnomes@lxorguk.ukuu.org.uk
      Cc: linux-s390@vger.kernel.org
      Cc: boris.ostrovsky@oracle.com
      Cc: jgross@suse.com
      Cc: pbonzini@redhat.com
      Link: https://lkml.kernel.org/r/20180719205545.16512-20-pasha.tatashin@oracle.com
    • x86/jump_label: Initialize static branching early · 8990cac6
      Committed by Pavel Tatashin
      Static branching is useful for runtime-patching branches that are in
      hot paths but change infrequently.
      
      The x86 clock framework is one example: it uses static branches to set
      up the best clock during boot and never changes it again.
      
      It is desirable to enable the TSC-based sched clock early to allow
      fine-grained boot-time analysis early on. That requires the static
      branching functionality to be functional early as well.
      
      Static branching requires patching nop instructions, thus,
      arch_init_ideal_nops() must be called prior to jump_label_init().
      
      Do all the necessary steps to call arch_init_ideal_nops() right after
      early_cpu_init(), which also makes it possible to insert a call to
      jump_label_init() right after that. jump_label_init() will be called
      again from the generic init code, but it is already protected against
      reinitialization.
      
      [ tglx: Massaged changelog ]
      Suggested-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Borislav Petkov <bp@suse.de>
      Cc: steven.sistare@oracle.com
      Cc: daniel.m.jordan@oracle.com
      Cc: linux@armlinux.org.uk
      Cc: schwidefsky@de.ibm.com
      Cc: heiko.carstens@de.ibm.com
      Cc: john.stultz@linaro.org
      Cc: sboyd@codeaurora.org
      Cc: hpa@zytor.com
      Cc: douly.fnst@cn.fujitsu.com
      Cc: prarit@redhat.com
      Cc: feng.tang@intel.com
      Cc: pmladek@suse.com
      Cc: gnomes@lxorguk.ukuu.org.uk
      Cc: linux-s390@vger.kernel.org
      Cc: boris.ostrovsky@oracle.com
      Cc: jgross@suse.com
      Cc: pbonzini@redhat.com
      Link: https://lkml.kernel.org/r/20180719205545.16512-10-pasha.tatashin@oracle.com
    • x86/kvmclock: Remove memblock dependency · 368a540e
      Committed by Pavel Tatashin
      KVM clock is initialized later than other hypervisor clocks because it
      has a dependency on the memblock allocator.
      
      Bring it in line with other hypervisors by using memory from the BSS
      instead of allocating it.
      
      The benefits:
      
        - Remove ifdef from common code
        - Earlier availability of the clock
        - Remove dependency on memblock, and reduce code
      
      The downside:

        - Static allocation of the per-CPU data structures, sized
          NR_CPUS * 64 bytes; this will be addressed in follow-up patches.
      
      [ tglx: Split out from larger series ]
      Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Paolo Bonzini <pbonzini@redhat.com>
      Cc: steven.sistare@oracle.com
      Cc: daniel.m.jordan@oracle.com
      Cc: linux@armlinux.org.uk
      Cc: schwidefsky@de.ibm.com
      Cc: heiko.carstens@de.ibm.com
      Cc: john.stultz@linaro.org
      Cc: sboyd@codeaurora.org
      Cc: hpa@zytor.com
      Cc: douly.fnst@cn.fujitsu.com
      Cc: peterz@infradead.org
      Cc: prarit@redhat.com
      Cc: feng.tang@intel.com
      Cc: pmladek@suse.com
      Cc: gnomes@lxorguk.ukuu.org.uk
      Cc: linux-s390@vger.kernel.org
      Cc: boris.ostrovsky@oracle.com
      Cc: jgross@suse.com
      Link: https://lkml.kernel.org/r/20180719205545.16512-2-pasha.tatashin@oracle.com
  18. 30 June 2018 (1 commit)
  19. 21 June 2018 (1 commit)
  20. 09 May 2018 (1 commit)
  21. 27 April 2018 (1 commit)
    • x86/setup: Do not reserve a crash kernel region if booted on Xen PV · 3db3eb28
      Committed by Petr Tesarik
      Xen PV domains cannot shut down and start a crash kernel. Instead,
      the crashing kernel makes a SCHEDOP_shutdown hypercall with the
      reason code SHUTDOWN_crash, cf. xen_crash_shutdown() machine op in
      arch/x86/xen/enlighten_pv.c.
      
      A crash kernel reservation is merely a waste of RAM in this case. It
      may also confuse users of kexec_load(2) and/or kexec_file_load(2).
      When flags include KEXEC_ON_CRASH or KEXEC_FILE_ON_CRASH,
      respectively, these syscalls return success, which is technically
      correct, but the crash kexec image will never be actually used.
      Signed-off-by: Petr Tesarik <ptesarik@suse.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Juergen Gross <jgross@suse.com>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Dou Liyang <douly.fnst@cn.fujitsu.com>
      Cc: Mikulas Patocka <mpatocka@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: xen-devel@lists.xenproject.org
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Jean Delvare <jdelvare@suse.de>
      Link: https://lkml.kernel.org/r/20180425120835.23cef60c@ezekiel.suse.cz
  22. 01 March 2018 (1 commit)
    • x86/cpu_entry_area: Sync cpu_entry_area to initial_page_table · 945fd17a
      Committed by Thomas Gleixner
      The separation of the cpu_entry_area from the fixmap missed the fact that
      on 32bit non-PAE kernels the cpu_entry_area mapping might not be covered in
      initial_page_table by the previous synchronizations.
      
      This results in suspend/resume failures because 32bit utilizes the
      initial page table for resume. The absence of the cpu_entry_area
      mapping results in a triple fault, aka an insta-reboot.
      
      With PAE enabled this works by chance, because the PGD entry which
      covers the fixmap and other parts incidentally provides the
      cpu_entry_area mapping as well.
      
      Synchronize the initial page table after setting up the cpu entry
      area. Instead of adding yet another copy of the same code, move it to a
      function and invoke it from the various places.
      
      It needs to be investigated if the existing calls in setup_arch() and
      setup_per_cpu_areas() can be replaced by the later invocation from
      setup_cpu_entry_areas(), but that's beyond the scope of this fix.
      
      Fixes: 92a0f81d ("x86/cpu_entry_area: Move it out of the fixmap")
      Reported-by: Woody Suwalski <terraluna977@gmail.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: Woody Suwalski <terraluna977@gmail.com>
      Cc: William Grant <william.grant@canonical.com>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1802282137290.1392@nanos.tec.linutronix.de
  23. 14 February 2018 (1 commit)
    • x86/mm: Make MAX_PHYSADDR_BITS and MAX_PHYSMEM_BITS dynamic · 162434e7
      Committed by Kirill A. Shutemov
      For boot-time switching between paging modes, we need to be able to
      adjust the size of the physical address space at runtime.
      
      As part of making the physical address space size variable, we have to
      make X86_5LEVEL depend on SPARSEMEM_VMEMMAP. A !SPARSEMEM_VMEMMAP
      configuration doesn't build with a variable MAX_PHYSMEM_BITS.
      
      For !SPARSEMEM_VMEMMAP SECTIONS_WIDTH depends on MAX_PHYSMEM_BITS:
      
      SECTIONS_WIDTH
        SECTIONS_SHIFT
          MAX_PHYSMEM_BITS
      
      And SECTIONS_WIDTH is used at the pre-processor stage; it doesn't work
      if it's dynamic. See include/linux/page-flags-layout.h.
      
      Effect on kernel image size:
      
         text	   data	    bss	    dec	    hex	filename
      8628393	4734340	1368064	14730797	 e0c62d	vmlinux.before
      8628892	4734340	1368064	14731296	 e0c820	vmlinux.after
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/20180214111656.88514-8-kirill.shutemov@linux.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  24. 16 January 2018 (1 commit)
  25. 03 January 2018 (1 commit)
  26. 12 December 2017 (1 commit)
  27. 10 November 2017 (1 commit)
  28. 07 November 2017 (1 commit)
  29. 04 October 2017 (1 commit)
    • x86/boot: Spell out "boot CPU" for BP · a1652bb8
      Committed by Jean Delvare
      It's not obvious to everybody that BP stands for boot processor. At
      least it was not for me. And BP is also a CPU register on x86, so it
      is ambiguous. Spell out "boot CPU" everywhere instead.
      Signed-off-by: Jean Delvare <jdelvare@suse.de>
      Cc: Alok Kataria <akataria@vmware.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  30. 26 September 2017 (1 commit)
    • x86/apic: Sanitize 32/64bit APIC callbacks · 64063505
      Committed by Thomas Gleixner
      The 32bit and the 64bit implementation of default_cpu_present_to_apicid()
      and default_check_phys_apicid_present() are exactly the same, but
      implemented and located differently.
      
      Move them to common apic code and get rid of the pointless difference.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: Juergen Gross <jgross@suse.com>
      Tested-by: Yu Chen <yu.c.chen@intel.com>
      Acked-by: Juergen Gross <jgross@suse.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Cc: Alok Kataria <akataria@vmware.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Rui Zhang <rui.zhang@intel.com>
      Cc: "K. Y. Srinivasan" <kys@microsoft.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Len Brown <lenb@kernel.org>
      Link: https://lkml.kernel.org/r/20170913213153.757329991@linutronix.de
  31. 25 September 2017 (2 commits)
  32. 13 September 2017 (1 commit)
    • x86/mm/64: Initialize CR4.PCIDE early · c7ad5ad2
      Committed by Andy Lutomirski
      cpu_init() is weird: it's called rather late (after early
      identification and after most MMU state is initialized) on the boot
      CPU but is called extremely early (before identification) on secondary
      CPUs.  It's called just late enough on the boot CPU that its CR4 value
      isn't propagated to mmu_cr4_features.
      
      Even if we put CR4.PCIDE into mmu_cr4_features, we'd hit two
      problems.  First, we'd crash in the trampoline code.  That's
      fixable, and I tried that.  It turns out that mmu_cr4_features is
      totally ignored by secondary_startup_64(), though, so even with the
      trampoline code fixed, it wouldn't help.
      
      This means that we don't currently have CR4.PCIDE reliably initialized
      before we start playing with cpu_tlbstate.  This is very fragile and
      tends to cause boot failures if I make even small changes to the TLB
      handling code.
      
      Make it more robust: initialize CR4.PCIDE earlier on the boot CPU
      and propagate it to secondary CPUs in start_secondary().
      
      ( Yes, this is ugly.  I think we should have improved mmu_cr4_features
        to actually control CR4 during secondary bootup, but that would be
        fairly intrusive at this stage. )
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Reported-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
      Tested-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
      Cc: Borislav Petkov <bpetkov@suse.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Fixes: 660da7c9 ("x86/mm: Enable CR4.PCIDE on supported systems")
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  33. 29 August 2017 (1 commit)
  34. 04 August 2017 (1 commit)
    • treewide: Consolidate Apple DMI checks · 630b3aff
      Committed by Lukas Wunner
      We're about to amend ACPI bus scan with DMI checks whether we're running
      on a Mac to support Apple device properties in AML.  The DMI checks are
      performed for every single device, adding overhead for everything x86
      that isn't Apple, which is the majority.  Rafael and Andy therefore
      request to perform the DMI match only once and cache the result.
      
      Outside of ACPI various other Apple DMI checks exist and it seems
      reasonable to use the cached value there as well.  Rafael, Andy and
      Darren suggest performing the DMI check in arch code and making it
      available with a header in include/linux/platform_data/x86/.
      
      To this end, add early_platform_quirks() to arch/x86/kernel/quirks.c
      to perform the DMI check and invoke it from setup_arch().  Switch over
      all existing Apple DMI checks, thereby fixing two deficiencies:
      
      * They are now #defined to false on non-x86 arches and can thus be
        optimized away if they're located in cross-arch code.
      
      * Some of them only match "Apple Inc." but not "Apple Computer, Inc.",
        which is used by BIOSes released between January 2006 (when the first
        x86 Macs started shipping) and January 2007 (when the company name
        changed upon introduction of the iPhone).
      Suggested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Suggested-by: Darren Hart <dvhart@infradead.org>
      Signed-off-by: Lukas Wunner <lukas@wunner.de>
      Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  35. 26 July 2017 (1 commit)
    • x86/unwind: Add the ORC unwinder · ee9f8fce
      Committed by Josh Poimboeuf
      Add the new ORC unwinder which is enabled by CONFIG_ORC_UNWINDER=y.
      It plugs into the existing x86 unwinder framework.
      
      It relies on objtool to generate the needed .orc_unwind and
      .orc_unwind_ip sections.
      
      For more details on why ORC is used instead of DWARF, see
      Documentation/x86/orc-unwinder.txt - but the short version is
      that it's a simplified, fundamentally more robust debuginfo
      data structure, which also allows up to two orders of magnitude
      faster lookups than the DWARF unwinder - which matters for
      profiling workloads like perf.
      
      Thanks to Andy Lutomirski for the performance improvement ideas:
      splitting the ORC unwind table into two parallel arrays and creating a
      fast lookup table to search a subset of the unwind table.
      Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: live-patching@vger.kernel.org
      Link: http://lkml.kernel.org/r/0a6cbfb40f8da99b7a45a1a8302dc6aef16ec812.1500938583.git.jpoimboe@redhat.com
      [ Extended the changelog. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>