1. 28 4月, 2016 1 次提交
  2. 12 3月, 2016 1 次提交
    • M
      x86/efi: Fix boot crash by always mapping boot service regions into new EFI page tables · 452308de
      Matt Fleming 提交于
      Some machines have EFI regions in page zero (physical address
      0x00000000) and historically that region has been added to the e820
      map via trim_bios_range(), and ultimately mapped into the kernel page
      tables. It was not mapped via efi_map_regions() as one would expect.
      
      Alexis reports that with the new separate EFI page tables some boot
      services regions, such as page zero, are not mapped. This triggers an
      oops during the SetVirtualAddressMap() runtime call.
      
      For the EFI boot services quirk on x86 we need to memblock_reserve()
      boot services regions until after SetVirtualAddressMap(). Doing that
      while respecting the ownership of regions that may have already been
      reserved by the kernel was the motivation behind this commit:
      
        7d68dc3f ("x86, efi: Do not reserve boot services regions within reserved areas")
      
      That patch was merged at a time when the EFI runtime virtual mappings
      were inserted into the kernel page tables as described above, and the
      trick of setting ->numpages (and hence the region size) to zero to
      track regions that should not be freed in efi_free_boot_services()
      meant that we never mapped those regions in efi_map_regions(). Instead
      we were relying solely on the existing kernel mappings.
      
      Now that we have separate page tables we need to make sure the EFI
      boot services regions are mapped correctly, even if someone else has
      already called memblock_reserve(). Instead of stashing a tag in
      ->numpages, set the EFI_MEMORY_RUNTIME bit of ->attribute. Since it
      generally makes no sense to mark a boot services region as required at
      runtime, it's pretty much guaranteed the firmware will not have
      already set this bit.
      
      For the record, the specific circumstances under which Alexis
      triggered this bug was that an EFI runtime driver on his machine was
      responding to the EVT_SIGNAL_VIRTUAL_ADDRESS_CHANGE event during
      SetVirtualAddressMap().
      
      The event handler for this driver looks like this,
      
        sub rsp,0x28
        lea rdx,[rip+0x2445] # 0xaa948720
        mov ecx,0x4
        call func_aa9447c0  ; call to ConvertPointer(4, & 0xaa948720)
        mov r11,QWORD PTR [rip+0x2434] # 0xaa948720
        xor eax,eax
        mov BYTE PTR [r11+0x1],0x1
        add rsp,0x28
        ret
      
      Which is pretty typical code for an EVT_SIGNAL_VIRTUAL_ADDRESS_CHANGE
      handler. The "mov r11, QWORD PTR [rip+0x2424]" was the faulting
      instruction because ConvertPointer() was being called to convert the
      address 0x0000000000000000, which when converted is left unchanged and
      remains 0x0000000000000000.
      
      The output of the oops trace gave the impression of a standard NULL
      pointer dereference bug, but because we're accessing physical
      addresses during ConvertPointer(), it wasn't. EFI boot services code
      is stored at that address on Alexis' machine.
      Reported-by: NAlexis Murzeau <amurzeau@gmail.com>
      Signed-off-by: NMatt Fleming <matt@codeblueprint.co.uk>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Ben Hutchings <ben@decadent.org.uk>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Maarten Lankhorst <maarten.lankhorst@canonical.com>
      Cc: Matthew Garrett <mjg59@srcf.ucam.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Raphael Hertzog <hertzog@debian.org>
      Cc: Roger Shimizu <rogershimizu@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-efi@vger.kernel.org
      Link: http://lkml.kernel.org/r/1457695163-29632-2-git-send-email-matt@codeblueprint.co.uk
      Link: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=815125Signed-off-by: NIngo Molnar <mingo@kernel.org>
      452308de
  3. 29 2月, 2016 1 次提交
    • J
      objtool: Mark non-standard object files and directories · c0dd6716
      Josh Poimboeuf 提交于
      Code which runs outside the kernel's normal mode of operation often does
      unusual things which can cause a static analysis tool like objtool to
      emit false positive warnings:
      
       - boot image
       - vdso image
       - relocation
       - realmode
       - efi
       - head
       - purgatory
       - modpost
      
      Set OBJECT_FILES_NON_STANDARD for their related files and directories,
      which will tell objtool to skip checking them.  It's ok to skip them
      because they don't affect runtime stack traces.
      
      Also skip the following code which does the right thing with respect to
      frame pointers, but is too "special" to be validated by a tool:
      
       - entry
       - mcount
      
      Also skip the test_nx module because it modifies its exception handling
      table at runtime, which objtool can't understand.  Fortunately it's
      just a test module so it doesn't matter much.
      
      Currently objtool is the only user of OBJECT_FILES_NON_STANDARD, but it
      might eventually be useful for other tools.
      Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Bernd Petrovitsch <bernd@petrovitsch.priv.at>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Chris J Arges <chris.j.arges@canonical.com>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michal Marek <mmarek@suse.cz>
      Cc: Namhyung Kim <namhyung@gmail.com>
      Cc: Pedro Alves <palves@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: live-patching@vger.kernel.org
      Link: http://lkml.kernel.org/r/366c080e3844e8a5b6a0327dc7e8c2b90ca3baeb.1456719558.git.jpoimboe@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      c0dd6716
  4. 24 2月, 2016 1 次提交
  5. 22 2月, 2016 3 次提交
    • S
      x86/efi: Only map kernel text for EFI mixed mode · 2ad510dc
      Sai Praneeth 提交于
      The correct symbol to use when figuring out the size of the kernel
      text is '_etext', not '_end' which is the symbol for the entire kernel
      image includes data and debug sections.
      Signed-off-by: NSai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
      Signed-off-by: NMatt Fleming <matt@codeblueprint.co.uk>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Shankar <ravi.v.shankar@intel.com>
      Cc: Ricardo Neri <ricardo.neri@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-efi@vger.kernel.org
      Link: http://lkml.kernel.org/r/1455712566-16727-14-git-send-email-matt@codeblueprint.co.ukSigned-off-by: NIngo Molnar <mingo@kernel.org>
      2ad510dc
    • S
      x86/efi: Map EFI_MEMORY_{XP,RO} memory region bits to EFI page tables · 6d0cc887
      Sai Praneeth 提交于
      Now that we have EFI memory region bits that indicate which regions do
      not need execute permission or read/write permission in the page tables,
      let's use them.
      
      We also check for EFI_NX_PE_DATA and only enforce the restrictive
      mappings if it's present (to allow us to ignore buggy firmware that sets
      bits it didn't mean to and to preserve backwards compatibility).
      
      Instead of assuming that firmware would set appropriate attributes in
      memory descriptor like EFI_MEMORY_RO for code and EFI_MEMORY_XP for
      data, we can expect some firmware out there which might only set *type*
      in memory descriptor to be EFI_RUNTIME_SERVICES_CODE or
      EFI_RUNTIME_SERVICES_DATA leaving away attribute. This will lead to
      improper mappings of EFI runtime regions. In order to avoid it, we check
      attribute and type of memory descriptor to update mappings and moreover
      Windows works this way.
      Signed-off-by: NSai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
      Signed-off-by: NMatt Fleming <matt@codeblueprint.co.uk>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Lee, Chun-Yi <jlee@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Shankar <ravi.v.shankar@intel.com>
      Cc: Ricardo Neri <ricardo.neri@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: linux-efi@vger.kernel.org
      Link: http://lkml.kernel.org/r/1455712566-16727-13-git-send-email-matt@codeblueprint.co.ukSigned-off-by: NIngo Molnar <mingo@kernel.org>
      6d0cc887
    • S
      x86/mm/pat: Don't implicitly allow _PAGE_RW in kernel_map_pages_in_pgd() · 15f003d2
      Sai Praneeth 提交于
      As part of the preparation for the EFI_MEMORY_RO flag added in the UEFI
      2.5 specification, we need the ability to map pages in kernel page
      tables without _PAGE_RW being set.
      
      Modify kernel_map_pages_in_pgd() to require its callers to pass _PAGE_RW
      if the pages need to be mapped read/write. Otherwise, we'll map the
      pages as read-only.
      Signed-off-by: NSai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
      Signed-off-by: NMatt Fleming <matt@codeblueprint.co.uk>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Lee, Chun-Yi <jlee@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Shankar <ravi.v.shankar@intel.com>
      Cc: Ricardo Neri <ricardo.neri@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: linux-efi@vger.kernel.org
      Link: http://lkml.kernel.org/r/1455712566-16727-12-git-send-email-matt@codeblueprint.co.ukSigned-off-by: NIngo Molnar <mingo@kernel.org>
      15f003d2
  6. 03 2月, 2016 3 次提交
    • R
      x86/efi: Show actual ending addresses in efi_print_memmap · 1e82b947
      Robert Elliott 提交于
      Adjust efi_print_memmap to print the real end address of each
      range, not 1 byte beyond. This matches other prints like those
      for SRAT and nosave memory.
      
      While investigating grub persistent memory corruption issues, it
      was helpful to make this table match the ending address
      convention used by:
      * the kernel's e820 table prints
      	BIOS-e820: [mem 0x0000001680000000-0x0000001c7fffffff] reserved
      * the kernel's nosave memory prints
      	PM: Registered nosave memory: [mem 0x880000000-0xc7fffffff]
      * the kernel's ACPI System Resource Affinity Table prints
      	SRAT: Node 1 PXM 1 [mem 0x480000000-0x87fffffff]
      * grub's lsmmap and lsefimmap commands
      	reserved  0000001680000000-0000001c7fffffff 00600000     24GiB UC WC WT WB NV
      * the UEFI shell's memmap command
      	Reserved   000000007FC00000-000000007FFFFFFF 0000000000000400 0000000000000001
      
      For example, if you grep all the various logs for c7fffffff, you
      won't find the kernel's line if it uses c80000000.
      
      Also, change the closing ) to ] to match the opening [.
      
      old:
          efi: mem61: [Persistent Memory  |   |  |  |  |  |  |   |WB|WT|WC|UC] range=[0x0000000880000000-0x0000000c80000000) (16384MB)
      
      new:
          efi: mem61: [Persistent Memory  |   |  |  |  |  |  |   |WB|WT|WC|UC] range=[0x0000000880000000-0x0000000c7fffffff] (16384MB)
      Signed-off-by: NRobert Elliott <elliott@hpe.com>
      Signed-off-by: NMatt Fleming <matt@codeblueprint.co.uk>
      Reviewed-by: NLaszlo Ersek <lersek@redhat.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Leif Lindholm <leif.lindholm@linaro.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-efi@vger.kernel.org
      Link: http://lkml.kernel.org/r/1454364428-494-12-git-send-email-matt@codeblueprint.co.ukSigned-off-by: NIngo Molnar <mingo@kernel.org>
      1e82b947
    • M
      x86/efi/bgrt: Don't ignore the BGRT if the 'valid' bit is 0 · 66dbe99c
      Môshe van der Sterre 提交于
      Unintuitively, the BGRT graphic is apparently meant to be usable
      if the valid bit in not set. The valid bit only conveys
      uncertainty about the validity in relation to the screen state.
      
      Windows 10 actually uses the BGRT image for its boot screen even
      if not 'valid', for example when the user triggered the boot
      menu. Because it is unclear if all firmwares will provide a
      usable graphic in this case, we now look at the BMP magic number
      as an additional check.
      Reviewed-by: NJosh Triplett <josh@joshtriplett.org>
      Signed-off-by: NMôshe van der Sterre <me@moshe.nl>
      Signed-off-by: NMatt Fleming <matt@codeblueprint.co.uk>
      Cc: =?UTF-8?q?M=C3=B4she=20van=20der=20Sterre?= <me@moshe.nl>
      Link: http://lkml.kernel.org/r/1454364428-494-10-git-send-email-matt@codeblueprint.co.ukSigned-off-by: NIngo Molnar <mingo@kernel.org>
      66dbe99c
    • A
      efi: Add nonblocking option to efi_query_variable_store() · ca0e30dc
      Ard Biesheuvel 提交于
      The function efi_query_variable_store() may be invoked by
      efivar_entry_set_nonblocking(), which itself takes care to only
      call a non-blocking version of the SetVariable() runtime
      wrapper. However, efi_query_variable_store() may call the
      SetVariable() wrapper directly, as well as the wrapper for
      QueryVariableInfo(), both of which could deadlock in the same
      way we are trying to prevent by calling
      efivar_entry_set_nonblocking() in the first place.
      
      So instead, modify efi_query_variable_store() to use the
      non-blocking variants of QueryVariableInfo() (and give up rather
      than free up space if the available space is below
      EFI_MIN_RESERVE) if invoked with the 'nonblocking' argument set
      to true.
      Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: NMatt Fleming <matt@codeblueprint.co.uk>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-efi@vger.kernel.org
      Link: http://lkml.kernel.org/r/1454364428-494-5-git-send-email-matt@codeblueprint.co.ukSigned-off-by: NIngo Molnar <mingo@kernel.org>
      ca0e30dc
  7. 22 1月, 2016 1 次提交
    • M
      x86/efi: Setup separate EFI page tables in kexec paths · 753b11ef
      Matt Fleming 提交于
      The switch to using a new dedicated page table for EFI runtime
      calls in commit commit 67a9108e ("x86/efi: Build our own
      page table structures") failed to take into account changes
      required for the kexec code paths, which are unfortunately
      duplicated in the EFI code.
      
      Call the allocation and setup functions in
      kexec_enter_virtual_mode() just like we do for
      __efi_enter_virtual_mode() to avoid hitting NULL-pointer
      dereferences when making EFI runtime calls.
      
      At the very least, the call to efi_setup_page_tables() should
      have existed for kexec before the following commit:
      
        67a9108e ("x86/efi: Build our own page table structures")
      
      Things just magically worked because we were actually using
      the kernel's page tables that contained the required mappings.
      Reported-by: NSrikar Dronamraju <srikar@linux.vnet.ibm.com>
      Tested-by: NSrikar Dronamraju <srikar@linux.vnet.ibm.com>
      Signed-off-by: NMatt Fleming <matt@codeblueprint.co.uk>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1453385519-11477-1-git-send-email-matt@codeblueprint.co.ukSigned-off-by: NIngo Molnar <mingo@kernel.org>
      753b11ef
  8. 19 1月, 2016 1 次提交
  9. 07 1月, 2016 1 次提交
    • M
      x86/efi-bgrt: Replace early_memremap() with memremap() · e2c90dd7
      Matt Fleming 提交于
      Môshe reported the following warning triggered on his machine since
      commit 50a0cb56 ("x86/efi-bgrt: Fix kernel panic when mapping BGRT
      data"),
      
        [    0.026936] ------------[ cut here ]------------
        [    0.026941] WARNING: CPU: 0 PID: 0 at mm/early_ioremap.c:137 __early_ioremap+0x102/0x1bb()
        [    0.026941] Modules linked in:
        [    0.026944] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.4.0-rc1 #2
        [    0.026945] Hardware name: Dell Inc. XPS 13 9343/09K8G1, BIOS A05 07/14/2015
        [    0.026946]  0000000000000000 900f03d5a116524d ffffffff81c03e60 ffffffff813a3fff
        [    0.026948]  0000000000000000 ffffffff81c03e98 ffffffff810a0852 00000000d7b76000
        [    0.026949]  0000000000000000 0000000000000001 0000000000000001 000000000000017c
        [    0.026951] Call Trace:
        [    0.026955]  [<ffffffff813a3fff>] dump_stack+0x44/0x55
        [    0.026958]  [<ffffffff810a0852>] warn_slowpath_common+0x82/0xc0
        [    0.026959]  [<ffffffff810a099a>] warn_slowpath_null+0x1a/0x20
        [    0.026961]  [<ffffffff81d8c395>] __early_ioremap+0x102/0x1bb
        [    0.026962]  [<ffffffff81d8c602>] early_memremap+0x13/0x15
        [    0.026964]  [<ffffffff81d78361>] efi_bgrt_init+0x162/0x1ad
        [    0.026966]  [<ffffffff81d778ec>] efi_late_init+0x9/0xb
        [    0.026968]  [<ffffffff81d58ff5>] start_kernel+0x46f/0x49f
        [    0.026970]  [<ffffffff81d58120>] ? early_idt_handler_array+0x120/0x120
        [    0.026972]  [<ffffffff81d58339>] x86_64_start_reservations+0x2a/0x2c
        [    0.026974]  [<ffffffff81d58485>] x86_64_start_kernel+0x14a/0x16d
        [    0.026977] ---[ end trace f9b3812eb8e24c58 ]---
        [    0.026978] efi_bgrt: Ignoring BGRT: failed to map image memory
      
      early_memremap() has an upper limit on the size of mapping it can
      handle which is ~200KB. Clearly the BGRT image on Môshe's machine is
      much larger than that.
      
      There's actually no reason to restrict ourselves to using the early_*
      version of memremap() - the ACPI BGRT driver is invoked late enough in
      boot that we can use the standard version, with the benefit that the
      late version allows mappings of arbitrary size.
      Reported-by: NMôshe van der Sterre <me@moshe.nl>
      Tested-by: NMôshe van der Sterre <me@moshe.nl>
      Signed-off-by: NMatt Fleming <matt@codeblueprint.co.uk>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
      Cc: Borislav Petkov <bp@suse.de>
      Link: http://lkml.kernel.org/r/1450707172-12561-1-git-send-email-matt@codeblueprint.co.ukSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      e2c90dd7
  10. 14 12月, 2015 2 次提交
    • S
      x86/efi-bgrt: Fix kernel panic when mapping BGRT data · 50a0cb56
      Sai Praneeth 提交于
      Starting with this commit 35eb8b81edd4 ("x86/efi: Build our own page
      table structures") efi regions have a separate page directory called
      "efi_pgd". In order to access any efi region we have to first shift %cr3
      to this page table. In the bgrt code we are trying to copy bgrt_header
      and image, but these regions fall under "EFI_BOOT_SERVICES_DATA"
      and to access these regions we have to shift %cr3 to efi_pgd and not
      doing so will cause page fault as shown below.
      
      [    0.251599] Last level dTLB entries: 4KB 64, 2MB 0, 4MB 0, 1GB 4
      [    0.259126] Freeing SMP alternatives memory: 32K (ffffffff8230e000 - ffffffff82316000)
      [    0.271803] BUG: unable to handle kernel paging request at fffffffefce35002
      [    0.279740] IP: [<ffffffff821bca49>] efi_bgrt_init+0x144/0x1fd
      [    0.286383] PGD 300f067 PUD 0
      [    0.289879] Oops: 0000 [#1] SMP
      [    0.293566] Modules linked in:
      [    0.297039] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.4.0-rc1-eywa-eywa-built-in-47041+ #2
      [    0.306619] Hardware name: Intel Corporation Skylake Client platform/Skylake Y LPDDR3 RVP3, BIOS SKLSE2R1.R00.B104.B01.1511110114 11/11/2015
      [    0.320925] task: ffffffff820134c0 ti: ffffffff82000000 task.ti: ffffffff82000000
      [    0.329420] RIP: 0010:[<ffffffff821bca49>]  [<ffffffff821bca49>] efi_bgrt_init+0x144/0x1fd
      [    0.338821] RSP: 0000:ffffffff82003f18  EFLAGS: 00010246
      [    0.344852] RAX: fffffffefce35000 RBX: fffffffefce35000 RCX: fffffffefce2b000
      [    0.352952] RDX: 000000008a82b000 RSI: ffffffff8235bb80 RDI: 000000008a835000
      [    0.361050] RBP: ffffffff82003f30 R08: 000000008a865000 R09: ffffffffff202850
      [    0.369149] R10: ffffffff811ad62f R11: 0000000000000000 R12: 0000000000000000
      [    0.377248] R13: ffff88016dbaea40 R14: ffffffff822622c0 R15: ffffffff82003fb0
      [    0.385348] FS:  0000000000000000(0000) GS:ffff88016d800000(0000) knlGS:0000000000000000
      [    0.394533] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [    0.401054] CR2: fffffffefce35002 CR3: 000000000300c000 CR4: 00000000003406f0
      [    0.409153] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [    0.417252] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [    0.425350] Stack:
      [    0.427638]  ffffffffffffffff ffffffff82256900 ffff88016dbaea40 ffffffff82003f40
      [    0.436086]  ffffffff821bbce0 ffffffff82003f88 ffffffff8219c0c2 0000000000000000
      [    0.444533]  ffffffff8219ba4a ffffffff822622c0 0000000000083000 00000000ffffffff
      [    0.452978] Call Trace:
      [    0.455763]  [<ffffffff821bbce0>] efi_late_init+0x9/0xb
      [    0.461697]  [<ffffffff8219c0c2>] start_kernel+0x463/0x47f
      [    0.467928]  [<ffffffff8219ba4a>] ? set_init_arg+0x55/0x55
      [    0.474159]  [<ffffffff8219b120>] ? early_idt_handler_array+0x120/0x120
      [    0.481669]  [<ffffffff8219b5ee>] x86_64_start_reservations+0x2a/0x2c
      [    0.488982]  [<ffffffff8219b72d>] x86_64_start_kernel+0x13d/0x14c
      [    0.495897] Code: 00 41 b4 01 48 8b 78 28 e8 09 36 01 00 48 85 c0 48 89 c3 75 13 48 c7 c7 f8 ac d3 81 31 c0 e8 d7 3b fb fe e9 b5 00 00 00 45 84 e4 <44> 8b 6b 02 74 0d be 06 00 00 00 48 89 df e8 ae 34 0$
      [    0.518151] RIP  [<ffffffff821bca49>] efi_bgrt_init+0x144/0x1fd
      [    0.524888]  RSP <ffffffff82003f18>
      [    0.528851] CR2: fffffffefce35002
      [    0.532615] ---[ end trace 7b06521e6ebf2aea ]---
      [    0.537852] Kernel panic - not syncing: Attempted to kill the idle task!
      
      As said above one way to fix this bug is to shift %cr3 to efi_pgd but we
      are not doing that way because it leaks inner details of how we switch
      to EFI page tables into a new call site and it also adds duplicate code.
      Instead, we remove the call to efi_lookup_mapped_addr() and always
      perform early_mem*() instead of early_io*() because we want to remap RAM
      regions and not I/O regions. We also delete efi_lookup_mapped_addr()
      because we are no longer using it.
      Signed-off-by: NSai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
      Reported-by: NWendy Wang <wendy.wang@intel.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Ricardo Neri <ricardo.neri@intel.com>
      Cc: Ravi Shankar <ravi.v.shankar@intel.com>
      Signed-off-by: NMatt Fleming <matt@codeblueprint.co.uk>
      50a0cb56
    • M
      x86/efi: Preface all print statements with efi* tag · 26d7f65f
      Matt Fleming 提交于
      The pr_*() calls in the x86 EFI code may or may not include a
      subsystem tag, which makes it difficult to grep the kernel log for all
      relevant EFI messages and leads users to miss important information.
      
      Recently, a bug reporter provided all the EFI print messages from the
      kernel log when trying to diagnose an issue but missed the following
      statement because it wasn't prefixed with anything indicating it was
      related to EFI,
      
        pr_err("Error ident-mapping new memmap (0x%lx)!\n", pa_memmap);
      
      Cc: Borislav Petkov <bp@suse.de>
      Reviewed-by: NJosh Triplett <josh@joshtriplett.org>
      Signed-off-by: NMatt Fleming <matt@codeblueprint.co.uk>
      26d7f65f
  11. 29 11月, 2015 4 次提交
    • M
      x86/efi: Build our own page table structures · 67a9108e
      Matt Fleming 提交于
      With commit e1a58320 ("x86/mm: Warn on W^X mappings") all
      users booting on 64-bit UEFI machines see the following warning,
      
        ------------[ cut here ]------------
        WARNING: CPU: 7 PID: 1 at arch/x86/mm/dump_pagetables.c:225 note_page+0x5dc/0x780()
        x86/mm: Found insecure W+X mapping at address ffff88000005f000/0xffff88000005f000
        ...
        x86/mm: Checked W+X mappings: FAILED, 165660 W+X pages found.
        ...
      
      This is caused by mapping EFI regions with RWX permissions.
      There isn't much we can do to restrict the permissions for these
      regions due to the way the firmware toolchains mix code and
      data, but we can at least isolate these mappings so that they do
      not appear in the regular kernel page tables.
      
      In commit d2f7cbe7 ("x86/efi: Runtime services virtual
      mapping") we started using 'trampoline_pgd' to map the EFI
      regions because there was an existing identity mapping there
      which we use during the SetVirtualAddressMap() call and for
      broken firmware that accesses those addresses.
      
      But 'trampoline_pgd' shares some PGD entries with
      'swapper_pg_dir' and does not provide the isolation we require.
      Notably the virtual address for __START_KERNEL_map and
      MODULES_START are mapped by the same PGD entry so we need to be
      more careful when copying changes over in
      efi_sync_low_kernel_mappings().
      
      This patch doesn't go the full mile, we still want to share some
      PGD entries with 'swapper_pg_dir'. Having completely separate
      page tables brings its own issues such as synchronising new
      mappings after memory hotplug and module loading. Sharing also
      keeps memory usage down.
      Signed-off-by: NMatt Fleming <matt@codeblueprint.co.uk>
      Reviewed-by: NBorislav Petkov <bp@suse.de>
      Acked-by: NBorislav Petkov <bp@suse.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Jones <davej@codemonkey.org.uk>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
      Cc: Stephen Smalley <sds@tycho.nsa.gov>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: linux-efi@vger.kernel.org
      Link: http://lkml.kernel.org/r/1448658575-17029-6-git-send-email-matt@codeblueprint.co.ukSigned-off-by: NIngo Molnar <mingo@kernel.org>
      67a9108e
    • M
      x86/efi: Hoist page table switching code into efi_call_virt() · c9f2a9a6
      Matt Fleming 提交于
      This change is a prerequisite for pending patches that switch to
      a dedicated EFI page table, instead of using 'trampoline_pgd'
      which shares PGD entries with 'swapper_pg_dir'. The pending
      patches make it impossible to dereference the runtime service
      function pointer without first switching %cr3.
      
      It's true that we now have duplicated switching code in
      efi_call_virt() and efi_call_phys_{prolog,epilog}() but we are
      sacrificing code duplication for a little more clarity and the
      ease of writing the page table switching code in C instead of
      asm.
      Signed-off-by: NMatt Fleming <matt@codeblueprint.co.uk>
      Reviewed-by: NBorislav Petkov <bp@suse.de>
      Acked-by: NBorislav Petkov <bp@suse.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Jones <davej@codemonkey.org.uk>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
      Cc: Stephen Smalley <sds@tycho.nsa.gov>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: linux-efi@vger.kernel.org
      Link: http://lkml.kernel.org/r/1448658575-17029-5-git-send-email-matt@codeblueprint.co.ukSigned-off-by: NIngo Molnar <mingo@kernel.org>
      c9f2a9a6
    • M
      x86/efi: Map RAM into the identity page table for mixed mode · b61a76f8
      Matt Fleming 提交于
      We are relying on the pre-existing mappings in 'trampoline_pgd'
      when accessing function arguments in the EFI mixed mode thunking
      code.
      
      Instead let's map memory explicitly so that things will continue
      to work when we move to a separate page table in the future.
      Signed-off-by: NMatt Fleming <matt@codeblueprint.co.uk>
      Reviewed-by: NBorislav Petkov <bp@suse.de>
      Acked-by: NBorislav Petkov <bp@suse.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: linux-efi@vger.kernel.org
      Link: http://lkml.kernel.org/r/1448658575-17029-4-git-send-email-matt@codeblueprint.co.ukSigned-off-by: NIngo Molnar <mingo@kernel.org>
      b61a76f8
    • M
      x86/mm/pat: Ensure cpa->pfn only contains page frame numbers · edc3b912
      Matt Fleming 提交于
      The x86 pageattr code is confused about the data that is stored
      in cpa->pfn, sometimes it's treated as a page frame number,
      sometimes it's treated as an unshifted physical address, and in
      one place it's treated as a pte.
      
      The result of this is that the mapping functions do not map the
      intended physical address.
      
      This isn't a problem in practice because most of the addresses
      we're mapping in the EFI code paths are already mapped in
      'trampoline_pgd' and so the pageattr mapping functions don't
      actually do anything in this case. But when we move to using a
      separate page table for the EFI runtime this will be an issue.
      Signed-off-by: NMatt Fleming <matt@codeblueprint.co.uk>
      Reviewed-by: NBorislav Petkov <bp@suse.de>
      Acked-by: NBorislav Petkov <bp@suse.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: linux-efi@vger.kernel.org
      Link: http://lkml.kernel.org/r/1448658575-17029-3-git-send-email-matt@codeblueprint.co.ukSigned-off-by: NIngo Molnar <mingo@kernel.org>
      edc3b912
  12. 28 10月, 2015 1 次提交
    • A
      efi: Use correct type for struct efi_memory_map::phys_map · 44511fb9
      Ard Biesheuvel 提交于
      We have been getting away with using a void* for the physical
      address of the UEFI memory map, since, even on 32-bit platforms
      with 64-bit physical addresses, no truncation takes place if the
      memory map has been allocated by the firmware (which only uses
      1:1 virtually addressable memory), which is usually the case.
      
      However, commit:
      
        0f96a99d ("efi: Add "efi_fake_mem" boot option")
      
      adds code that clones and modifies the UEFI memory map, and the
      clone may live above 4 GB on 32-bit platforms.
      
      This means our use of void* for struct efi_memory_map::phys_map has
      graduated from 'incorrect but working' to 'incorrect and
      broken', and we need to fix it.
      
      So redefine struct efi_memory_map::phys_map as phys_addr_t, and
      get rid of a bunch of casts that are now unneeded.
      Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
      Reviewed-by: NMatt Fleming <matt@codeblueprint.co.uk>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: izumi.taku@jp.fujitsu.com
      Cc: kamezawa.hiroyu@jp.fujitsu.com
      Cc: linux-efi@vger.kernel.org
      Cc: matt.fleming@intel.com
      Link: http://lkml.kernel.org/r/1445593697-1342-1-git-send-email-ard.biesheuvel@linaro.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      44511fb9
  13. 12 10月, 2015 2 次提交
  14. 01 10月, 2015 1 次提交
    • M
      x86/efi: Fix boot crash by mapping EFI memmap entries bottom-up at runtime, instead of top-down · a5caa209
      Matt Fleming 提交于
      Beginning with UEFI v2.5 EFI_PROPERTIES_TABLE was introduced
      that signals that the firmware PE/COFF loader supports splitting
      code and data sections of PE/COFF images into separate EFI
      memory map entries. This allows the kernel to map those regions
      with strict memory protections, e.g. EFI_MEMORY_RO for code,
      EFI_MEMORY_XP for data, etc.
      
      Unfortunately, an unwritten requirement of this new feature is
      that the regions need to be mapped with the same offsets
      relative to each other as observed in the EFI memory map. If
      this is not done crashes like this may occur,
      
        BUG: unable to handle kernel paging request at fffffffefe6086dd
        IP: [<fffffffefe6086dd>] 0xfffffffefe6086dd
        Call Trace:
         [<ffffffff8104c90e>] efi_call+0x7e/0x100
         [<ffffffff81602091>] ? virt_efi_set_variable+0x61/0x90
         [<ffffffff8104c583>] efi_delete_dummy_variable+0x63/0x70
         [<ffffffff81f4e4aa>] efi_enter_virtual_mode+0x383/0x392
         [<ffffffff81f37e1b>] start_kernel+0x38a/0x417
         [<ffffffff81f37495>] x86_64_start_reservations+0x2a/0x2c
         [<ffffffff81f37582>] x86_64_start_kernel+0xeb/0xef
      
      Here 0xfffffffefe6086dd refers to an address the firmware
      expects to be mapped but which the OS never claimed was mapped.
      The issue is that included in these regions are relative
      addresses to other regions which were emitted by the firmware
      toolchain before the "splitting" of sections occurred at
      runtime.
      
      Needless to say, we don't satisfy this unwritten requirement on
      x86_64 and instead map the EFI memory map entries in reverse
      order. The above crash is almost certainly triggerable with any
      kernel newer than v3.13 because that's when we rewrote the EFI
      runtime region mapping code, in commit d2f7cbe7 ("x86/efi:
      Runtime services virtual mapping"). For kernel versions before
      v3.13 things may work by pure luck depending on the
      fragmentation of the kernel virtual address space at the time we
      map the EFI regions.
      
      Instead of mapping the EFI memory map entries in reverse order,
      where entry N has a higher virtual address than entry N+1, map
      them in the same order as they appear in the EFI memory map to
      preserve this relative offset between regions.
      
      This patch has been kept as small as possible with the intention
      that it should be applied aggressively to stable and
      distribution kernels. It is very much a bugfix rather than
      support for a new feature, since when EFI_PROPERTIES_TABLE is
      enabled we must map things as outlined above to even boot - we
      have no way of asking the firmware not to split the code/data
      regions.
      
      In fact, this patch doesn't even make use of the more strict
      memory protections available in UEFI v2.5. That will come later.
      Suggested-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
      Reported-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: NMatt Fleming <matt.fleming@intel.com>
      Cc: <stable@vger.kernel.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Chun-Yi <jlee@suse.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: James Bottomley <JBottomley@Odin.com>
      Cc: Lee, Chun-Yi <jlee@suse.com>
      Cc: Leif Lindholm <leif.lindholm@linaro.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Matthew Garrett <mjg59@srcf.ucam.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Jones <pjones@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Link: http://lkml.kernel.org/r/1443218539-7610-2-git-send-email-matt@codeblueprint.co.ukSigned-off-by: NIngo Molnar <mingo@kernel.org>
      a5caa209
  15. 11 9月, 2015 1 次提交
    • D
      kexec: split kexec_load syscall from kexec core code · 2965faa5
      Dave Young 提交于
      There are two kexec load syscalls, kexec_load another and kexec_file_load.
       kexec_file_load has been splited as kernel/kexec_file.c.  In this patch I
      split kexec_load syscall code to kernel/kexec.c.
      
      And add a new kconfig option KEXEC_CORE, so we can disable kexec_load and
      use kexec_file_load only, or vice verse.
      
      The original requirement is from Ted Ts'o, he want kexec kernel signature
      being checked with CONFIG_KEXEC_VERIFY_SIG enabled.  But kexec-tools use
      kexec_load syscall can bypass the checking.
      
      Vivek Goyal proposed to create a common kconfig option so user can compile
      in only one syscall for loading kexec kernel.  KEXEC/KEXEC_FILE selects
      KEXEC_CORE so that old config files still work.
      
      Because there's general code need CONFIG_KEXEC_CORE, so I updated all the
      architecture Kconfig with a new option KEXEC_CORE, and let KEXEC selects
      KEXEC_CORE in arch Kconfig.  Also updated general kernel code with to
      kexec_load syscall.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: NDave Young <dyoung@redhat.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Petr Tesarik <ptesarik@suse.cz>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Cc: Josh Boyer <jwboyer@fedoraproject.org>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      2965faa5
  16. 08 8月, 2015 2 次提交
  17. 31 7月, 2015 1 次提交
    • R
      efi: Check for NULL efi kernel parameters · 9115c758
      Ricardo Neri 提交于
      Even though it is documented how to specifiy efi parameters, it is
      possible to cause a kernel panic due to a dereference of a NULL pointer when
      parsing such parameters if "efi" alone is given:
      
      PANIC: early exception 0e rip 10:ffffffff812fb361 error 0 cr2 0
      [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.2.0-rc1+ #450
      [ 0.000000]  ffffffff81fe20a9 ffffffff81e03d50 ffffffff8184bb0f 00000000000003f8
      [ 0.000000]  0000000000000000 ffffffff81e03e08 ffffffff81f371a1 64656c62616e6520
      [ 0.000000]  0000000000000069 000000000000005f 0000000000000000 0000000000000000
      [ 0.000000] Call Trace:
      [ 0.000000]  [<ffffffff8184bb0f>] dump_stack+0x45/0x57
      [ 0.000000]  [<ffffffff81f371a1>] early_idt_handler_common+0x81/0xae
      [ 0.000000]  [<ffffffff812fb361>] ? parse_option_str+0x11/0x90
      [ 0.000000]  [<ffffffff81f4dd69>] arch_parse_efi_cmdline+0x15/0x42
      [ 0.000000]  [<ffffffff81f376e1>] do_early_param+0x50/0x8a
      [ 0.000000]  [<ffffffff8106b1b3>] parse_args+0x1e3/0x400
      [ 0.000000]  [<ffffffff81f37a43>] parse_early_options+0x24/0x28
      [ 0.000000]  [<ffffffff81f37691>] ? loglevel+0x31/0x31
      [ 0.000000]  [<ffffffff81f37a78>] parse_early_param+0x31/0x3d
      [ 0.000000]  [<ffffffff81f3ae98>] setup_arch+0x2de/0xc08
      [ 0.000000]  [<ffffffff8109629a>] ? vprintk_default+0x1a/0x20
      [ 0.000000]  [<ffffffff81f37b20>] start_kernel+0x90/0x423
      [ 0.000000]  [<ffffffff81f37495>] x86_64_start_reservations+0x2a/0x2c
      [ 0.000000]  [<ffffffff81f37582>] x86_64_start_kernel+0xeb/0xef
      [ 0.000000] RIP 0xffffffff81ba2efc
      
      This panic is not reproducible with "efi=" as this will result in a non-NULL
      zero-length string.
      
      Thus, verify that the pointer to the parameter string is not NULL. This is
      consistent with other parameter-parsing functions which check for NULL pointers.
      Signed-off-by: NRicardo Neri <ricardo.neri-calderon@linux.intel.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NMatt Fleming <matt.fleming@intel.com>
      9115c758
  18. 25 6月, 2015 1 次提交
  19. 28 5月, 2015 1 次提交
    • D
      e820, efi: add ACPI 6.0 persistent memory types · ad5fb870
      Dan Williams 提交于
      ACPI 6.0 formalizes e820-type-7 and efi-type-14 as persistent memory.
      Mark it "reserved" and allow it to be claimed by a persistent memory
      device driver.
      
      This definition is in addition to the Linux kernel's existing type-12
      definition that was recently added in support of shipping platforms with
      NVDIMM support that predate ACPI 6.0 (which now classifies type-12 as
      OEM reserved).
      
      Note, /proc/iomem can be consulted for differentiating legacy
      "Persistent Memory (legacy)" E820_PRAM vs standard "Persistent Memory"
      E820_PMEM.
      
      Cc: Boaz Harrosh <boaz@plexistor.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jens Axboe <axboe@fb.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Matthew Wilcox <willy@linux.intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: NJeff Moyer <jmoyer@redhat.com>
      Acked-by: NAndy Lutomirski <luto@amacapital.net>
      Reviewed-by: NRoss Zwisler <ross.zwisler@linux.intel.com>
      Acked-by: NChristoph Hellwig <hch@lst.de>
      Tested-by: NToshi Kani <toshi.kani@hp.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      ad5fb870
  20. 01 5月, 2015 1 次提交
  21. 01 4月, 2015 3 次提交
  22. 24 2月, 2015 1 次提交
  23. 13 2月, 2015 1 次提交
    • M
      x86/efi: Avoid triple faults during EFI mixed mode calls · 96738c69
      Matt Fleming 提交于
      Andy pointed out that if an NMI or MCE is received while we're in the
      middle of an EFI mixed mode call a triple fault will occur. This can
      happen, for example, when issuing an EFI mixed mode call while running
      perf.
      
      The reason for the triple fault is that we execute the mixed mode call
      in 32-bit mode with paging disabled but with 64-bit kernel IDT handlers
      installed throughout the call.
      
      At Andy's suggestion, stop playing the games we currently do at runtime,
      such as disabling paging and installing a 32-bit GDT for __KERNEL_CS. We
      can simply switch to the __KERNEL32_CS descriptor before invoking
      firmware services, and run in compatibility mode. This way, if an
      NMI/MCE does occur the kernel IDT handler will execute correctly, since
      it'll jump to __KERNEL_CS automatically.
      
      However, this change is only possible post-ExitBootServices(). Before
      then the firmware "owns" the machine and expects for its 32-bit IDT
      handlers to be left intact to service interrupts, etc.
      
      So, we now need to distinguish between early boot and runtime
      invocations of EFI services. During early boot, we need to restore the
      GDT that the firmware expects to be present. We can only jump to the
      __KERNEL32_CS code segment for mixed mode calls after ExitBootServices()
      has been invoked.
      
      A liberal sprinkling of comments in the thunking code should make the
      differences in early and late environments more apparent.
      Reported-by: NAndy Lutomirski <luto@amacapital.net>
      Tested-by: NBorislav Petkov <bp@suse.de>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NMatt Fleming <matt.fleming@intel.com>
      96738c69
  24. 12 11月, 2014 1 次提交
  25. 04 10月, 2014 4 次提交
    • M
      x86/efi: Mark initialization code as such · 4e78eb05
      Mathias Krause 提交于
      The 32 bit and 64 bit implementations differ in their __init annotations
      for some functions referenced from the common EFI code. Namely, the 32
      bit variant is missing some of the __init annotations the 64 bit variant
      has.
      
      To solve the colliding annotations, mark the corresponding functions in
      efi_32.c as initialization code, too -- as it is such.
      
      Actually, quite a few more functions are only used during initialization
      and therefore can be marked __init. They are therefore annotated, too.
      Also add the __init annotation to the prototypes in the efi.h header so
      users of those functions will see it's meant as initialization code
      only.
      
      This patch also fixes the "prelog" typo. ("prologue" / "epilogue" might
      be more appropriate but this is C code after all, not an opera! :D)
      Signed-off-by: NMathias Krause <minipli@googlemail.com>
      Signed-off-by: NMatt Fleming <matt.fleming@intel.com>
      4e78eb05
    • M
      x86/efi: Update comment regarding required phys mapped EFI services · 0ce4605c
      Mathias Krause 提交于
      Commit 3f4a7836 ("x86/efi: Rip out phys_efi_get_time()") left
      set_virtual_address_map as the only runtime service needed with a
      phys mapping but missed to update the preceding comment. Fix that.
      Signed-off-by: NMathias Krause <minipli@googlemail.com>
      Signed-off-by: NMatt Fleming <matt.fleming@intel.com>
      0ce4605c
    • M
      x86/efi: Unexport add_efi_memmap variable · 60920685
      Mathias Krause 提交于
      This variable was accidentally exported, even though it's only used in
      this compilation unit and only during initialization.
      
      Remove the bogus export, make the variable static instead and mark it
      as __initdata.
      
      Fixes: 200001eb ("x86 boot: only pick up additional EFI memmap...")
      Cc: Paul Jackson <pj@sgi.com>
      Signed-off-by: NMathias Krause <minipli@googlemail.com>
      Signed-off-by: NMatt Fleming <matt.fleming@intel.com>
      60920685
    • L
      x86: efi: Format EFI memory type & attrs with efi_md_typeattr_format() · ace1d121
      Laszlo Ersek 提交于
      An example log excerpt demonstrating the change:
      
      Before the patch:
      
      > efi: mem00: type=7, attr=0xf, range=[0x0000000000000000-0x000000000009f000) (0MB)
      > efi: mem01: type=2, attr=0xf, range=[0x000000000009f000-0x00000000000a0000) (0MB)
      > efi: mem02: type=7, attr=0xf, range=[0x0000000000100000-0x0000000000400000) (3MB)
      > efi: mem03: type=2, attr=0xf, range=[0x0000000000400000-0x0000000000800000) (4MB)
      > efi: mem04: type=10, attr=0xf, range=[0x0000000000800000-0x0000000000808000) (0MB)
      > efi: mem05: type=7, attr=0xf, range=[0x0000000000808000-0x0000000000810000) (0MB)
      > efi: mem06: type=10, attr=0xf, range=[0x0000000000810000-0x0000000000900000) (0MB)
      > efi: mem07: type=4, attr=0xf, range=[0x0000000000900000-0x0000000001100000) (8MB)
      > efi: mem08: type=7, attr=0xf, range=[0x0000000001100000-0x0000000001400000) (3MB)
      > efi: mem09: type=2, attr=0xf, range=[0x0000000001400000-0x0000000002613000) (18MB)
      > efi: mem10: type=7, attr=0xf, range=[0x0000000002613000-0x0000000004000000) (25MB)
      > efi: mem11: type=4, attr=0xf, range=[0x0000000004000000-0x0000000004020000) (0MB)
      > efi: mem12: type=7, attr=0xf, range=[0x0000000004020000-0x00000000068ea000) (40MB)
      > efi: mem13: type=2, attr=0xf, range=[0x00000000068ea000-0x00000000068f0000) (0MB)
      > efi: mem14: type=3, attr=0xf, range=[0x00000000068f0000-0x0000000006c7b000) (3MB)
      > efi: mem15: type=6, attr=0x800000000000000f, range=[0x0000000006c7b000-0x0000000006c7d000) (0MB)
      > efi: mem16: type=5, attr=0x800000000000000f, range=[0x0000000006c7d000-0x0000000006c85000) (0MB)
      > efi: mem17: type=6, attr=0x800000000000000f, range=[0x0000000006c85000-0x0000000006c87000) (0MB)
      > efi: mem18: type=3, attr=0xf, range=[0x0000000006c87000-0x0000000006ca3000) (0MB)
      > efi: mem19: type=6, attr=0x800000000000000f, range=[0x0000000006ca3000-0x0000000006ca6000) (0MB)
      > efi: mem20: type=10, attr=0xf, range=[0x0000000006ca6000-0x0000000006cc6000) (0MB)
      > efi: mem21: type=6, attr=0x800000000000000f, range=[0x0000000006cc6000-0x0000000006d95000) (0MB)
      > efi: mem22: type=5, attr=0x800000000000000f, range=[0x0000000006d95000-0x0000000006e22000) (0MB)
      > efi: mem23: type=7, attr=0xf, range=[0x0000000006e22000-0x0000000007165000) (3MB)
      > efi: mem24: type=4, attr=0xf, range=[0x0000000007165000-0x0000000007d22000) (11MB)
      > efi: mem25: type=7, attr=0xf, range=[0x0000000007d22000-0x0000000007d25000) (0MB)
      > efi: mem26: type=3, attr=0xf, range=[0x0000000007d25000-0x0000000007ea2000) (1MB)
      > efi: mem27: type=5, attr=0x800000000000000f, range=[0x0000000007ea2000-0x0000000007ed2000) (0MB)
      > efi: mem28: type=6, attr=0x800000000000000f, range=[0x0000000007ed2000-0x0000000007ef6000) (0MB)
      > efi: mem29: type=7, attr=0xf, range=[0x0000000007ef6000-0x0000000007f00000) (0MB)
      > efi: mem30: type=9, attr=0xf, range=[0x0000000007f00000-0x0000000007f02000) (0MB)
      > efi: mem31: type=10, attr=0xf, range=[0x0000000007f02000-0x0000000007f06000) (0MB)
      > efi: mem32: type=4, attr=0xf, range=[0x0000000007f06000-0x0000000007fd0000) (0MB)
      > efi: mem33: type=6, attr=0x800000000000000f, range=[0x0000000007fd0000-0x0000000007ff0000) (0MB)
      > efi: mem34: type=7, attr=0xf, range=[0x0000000007ff0000-0x0000000008000000) (0MB)
      
      After the patch:
      
      > efi: mem00: [Conventional Memory|   |  |  |  |   |WB|WT|WC|UC] range=[0x0000000000000000-0x000000000009f000) (0MB)
      > efi: mem01: [Loader Data        |   |  |  |  |   |WB|WT|WC|UC] range=[0x000000000009f000-0x00000000000a0000) (0MB)
      > efi: mem02: [Conventional Memory|   |  |  |  |   |WB|WT|WC|UC] range=[0x0000000000100000-0x0000000000400000) (3MB)
      > efi: mem03: [Loader Data        |   |  |  |  |   |WB|WT|WC|UC] range=[0x0000000000400000-0x0000000000800000) (4MB)
      > efi: mem04: [ACPI Memory NVS    |   |  |  |  |   |WB|WT|WC|UC] range=[0x0000000000800000-0x0000000000808000) (0MB)
      > efi: mem05: [Conventional Memory|   |  |  |  |   |WB|WT|WC|UC] range=[0x0000000000808000-0x0000000000810000) (0MB)
      > efi: mem06: [ACPI Memory NVS    |   |  |  |  |   |WB|WT|WC|UC] range=[0x0000000000810000-0x0000000000900000) (0MB)
      > efi: mem07: [Boot Data          |   |  |  |  |   |WB|WT|WC|UC] range=[0x0000000000900000-0x0000000001100000) (8MB)
      > efi: mem08: [Conventional Memory|   |  |  |  |   |WB|WT|WC|UC] range=[0x0000000001100000-0x0000000001400000) (3MB)
      > efi: mem09: [Loader Data        |   |  |  |  |   |WB|WT|WC|UC] range=[0x0000000001400000-0x0000000002613000) (18MB)
      > efi: mem10: [Conventional Memory|   |  |  |  |   |WB|WT|WC|UC] range=[0x0000000002613000-0x0000000004000000) (25MB)
      > efi: mem11: [Boot Data          |   |  |  |  |   |WB|WT|WC|UC] range=[0x0000000004000000-0x0000000004020000) (0MB)
      > efi: mem12: [Conventional Memory|   |  |  |  |   |WB|WT|WC|UC] range=[0x0000000004020000-0x00000000068ea000) (40MB)
      > efi: mem13: [Loader Data        |   |  |  |  |   |WB|WT|WC|UC] range=[0x00000000068ea000-0x00000000068f0000) (0MB)
      > efi: mem14: [Boot Code          |   |  |  |  |   |WB|WT|WC|UC] range=[0x00000000068f0000-0x0000000006c7b000) (3MB)
      > efi: mem15: [Runtime Data       |RUN|  |  |  |   |WB|WT|WC|UC] range=[0x0000000006c7b000-0x0000000006c7d000) (0MB)
      > efi: mem16: [Runtime Code       |RUN|  |  |  |   |WB|WT|WC|UC] range=[0x0000000006c7d000-0x0000000006c85000) (0MB)
      > efi: mem17: [Runtime Data       |RUN|  |  |  |   |WB|WT|WC|UC] range=[0x0000000006c85000-0x0000000006c87000) (0MB)
      > efi: mem18: [Boot Code          |   |  |  |  |   |WB|WT|WC|UC] range=[0x0000000006c87000-0x0000000006ca3000) (0MB)
      > efi: mem19: [Runtime Data       |RUN|  |  |  |   |WB|WT|WC|UC] range=[0x0000000006ca3000-0x0000000006ca6000) (0MB)
      > efi: mem20: [ACPI Memory NVS    |   |  |  |  |   |WB|WT|WC|UC] range=[0x0000000006ca6000-0x0000000006cc6000) (0MB)
      > efi: mem21: [Runtime Data       |RUN|  |  |  |   |WB|WT|WC|UC] range=[0x0000000006cc6000-0x0000000006d95000) (0MB)
      > efi: mem22: [Runtime Code       |RUN|  |  |  |   |WB|WT|WC|UC] range=[0x0000000006d95000-0x0000000006e22000) (0MB)
      > efi: mem23: [Conventional Memory|   |  |  |  |   |WB|WT|WC|UC] range=[0x0000000006e22000-0x0000000007165000) (3MB)
      > efi: mem24: [Boot Data          |   |  |  |  |   |WB|WT|WC|UC] range=[0x0000000007165000-0x0000000007d22000) (11MB)
      > efi: mem25: [Conventional Memory|   |  |  |  |   |WB|WT|WC|UC] range=[0x0000000007d22000-0x0000000007d25000) (0MB)
      > efi: mem26: [Boot Code          |   |  |  |  |   |WB|WT|WC|UC] range=[0x0000000007d25000-0x0000000007ea2000) (1MB)
      > efi: mem27: [Runtime Code       |RUN|  |  |  |   |WB|WT|WC|UC] range=[0x0000000007ea2000-0x0000000007ed2000) (0MB)
      > efi: mem28: [Runtime Data       |RUN|  |  |  |   |WB|WT|WC|UC] range=[0x0000000007ed2000-0x0000000007ef6000) (0MB)
      > efi: mem29: [Conventional Memory|   |  |  |  |   |WB|WT|WC|UC] range=[0x0000000007ef6000-0x0000000007f00000) (0MB)
      > efi: mem30: [ACPI Reclaim Memory|   |  |  |  |   |WB|WT|WC|UC] range=[0x0000000007f00000-0x0000000007f02000) (0MB)
      > efi: mem31: [ACPI Memory NVS    |   |  |  |  |   |WB|WT|WC|UC] range=[0x0000000007f02000-0x0000000007f06000) (0MB)
      > efi: mem32: [Boot Data          |   |  |  |  |   |WB|WT|WC|UC] range=[0x0000000007f06000-0x0000000007fd0000) (0MB)
      > efi: mem33: [Runtime Data       |RUN|  |  |  |   |WB|WT|WC|UC] range=[0x0000000007fd0000-0x0000000007ff0000) (0MB)
      > efi: mem34: [Conventional Memory|   |  |  |  |   |WB|WT|WC|UC] range=[0x0000000007ff0000-0x0000000008000000) (0MB)
      
      Both the type enum and the attribute bitmap are decoded, with the
      additional benefit that the memory ranges line up as well.
      Signed-off-by: NLaszlo Ersek <lersek@redhat.com>
      Acked-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: NMatt Fleming <matt.fleming@intel.com>
      ace1d121