1. 13 Jun, 2017 1 commit
  2. 28 May, 2017 3 commits
    • x86/efi: Correct EFI identity mapping under 'efi=old_map' when KASLR is enabled · 94133e46
      Baoquan He committed
      For EFI with the 'efi=old_map' kernel option specified, the kernel will panic
      when KASLR is enabled:
      
        BUG: unable to handle kernel paging request at 000000007febd57e
        IP: 0x7febd57e
        PGD 1025a067
        PUD 0
      
        Oops: 0010 [#1] SMP
        Call Trace:
         efi_enter_virtual_mode()
         start_kernel()
         x86_64_start_reservations()
         x86_64_start_kernel()
         start_cpu()
      
      The root cause is that the identity mapping is not built correctly
      in the 'efi=old_map' case.
      
      On 'nokaslr' kernels, PAGE_OFFSET is 0xffff880000000000 which is PGDIR_SIZE
      aligned. We can borrow the PUD table from the direct mappings safely. Given a
      physical address X, we have pud_index(X) == pud_index(__va(X)).
      
      However, on KASLR kernels, PAGE_OFFSET is PUD_SIZE aligned. For a given physical
      address X, pud_index(X) != pud_index(__va(X)). We can't just copy the PGD entry
      from direct mapping to build identity mapping, instead we need to copy the
      PUD entries one by one from the direct mapping.
      
      Fix it.
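The index argument above can be checked with a little arithmetic. This is a minimal userspace sketch, not kernel code: it assumes 4-level x86-64 paging (PUD_SHIFT of 30, 512 entries per PUD table), and va_of() is a hypothetical stand-in for __va() that takes the direct-mapping base as a parameter.

```c
#include <stdint.h>

/* 4-level x86-64 paging: each PUD entry covers 1 GiB, 512 entries per table. */
#define PUD_SHIFT     30
#define PTRS_PER_PUD  512

/* Index of the PUD entry that covers addr within its PUD table. */
static inline unsigned int pud_index(uint64_t addr)
{
	return (unsigned int)((addr >> PUD_SHIFT) & (PTRS_PER_PUD - 1));
}

/*
 * Hypothetical stand-in for __va(): the direct-mapping virtual address
 * of physical address phys, for a given PAGE_OFFSET value.
 */
static inline uint64_t va_of(uint64_t page_offset, uint64_t phys)
{
	return page_offset + phys;
}
```

For the faulting address X = 0x7febd57e, pud_index(X) is 1. With PAGE_OFFSET = 0xffff880000000000 the direct-mapping address also has index 1, so the borrowed PUD table lines up. With a base that is only PUD_SIZE aligned, for example an assumed 0xffff8800c0000000, the virtual side has index 4 and the two no longer match.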
      Signed-off-by: Baoquan He <bhe@redhat.com>
      Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Bhupesh Sharma <bhsharma@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: Frank Ramsay <frank.ramsay@hpe.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Russ Anderson <rja@sgi.com>
      Cc: Thomas Garnier <thgarnie@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-efi@vger.kernel.org
      Link: http://lkml.kernel.org/r/20170526113652.21339-5-matt@codeblueprint.co.uk
      [ Fixed and reworded the changelog and code comments to be more readable. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/efi: Disable runtime services on kexec kernel if booted with efi=old_map · 4e52797d
      Sai Praneeth committed
      Booting kexec kernel with "efi=old_map" in kernel command line hits
      kernel panic as shown below.
      
       BUG: unable to handle kernel paging request at ffff88007fe78070
       IP: virt_efi_set_variable.part.7+0x63/0x1b0
       PGD 7ea28067
       PUD 7ea2b067
       PMD 7ea2d067
       PTE 0
       [...]
       Call Trace:
        virt_efi_set_variable()
        efi_delete_dummy_variable()
        efi_enter_virtual_mode()
        start_kernel()
        x86_64_start_reservations()
        x86_64_start_kernel()
        start_cpu()
      
      [ efi=old_map was never intended to work with kexec. The problem with
        using efi=old_map is that the virtual addresses are assigned from the
        memory region used by other kernel mappings; vmalloc() space.
        Potentially there could be collisions when booting kexec if something
        else is mapped at the virtual address we allocated for runtime service
        regions in the initial boot - Matt Fleming ]
      
      Since kexec was never intended to work with efi=old_map, disable
      runtime services in kexec if booted with efi=old_map, so that we don't
      panic.
      Tested-by: Lee Chun-Yi <jlee@suse.com>
      Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
      Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
      Acked-by: Dave Young <dyoung@redhat.com>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Shankar <ravi.v.shankar@intel.com>
      Cc: Ricardo Neri <ricardo.neri@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-efi@vger.kernel.org
      Link: http://lkml.kernel.org/r/20170526113652.21339-4-matt@codeblueprint.co.uk
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • efi: Don't issue error message when booted under Xen · 1ea34adb
      Juergen Gross committed
      When booted as Xen dom0 there won't be an EFI memmap allocated. Avoid
      issuing an error message in this case:
      
        [    0.144079] efi: Failed to allocate new EFI memmap
      Signed-off-by: Juergen Gross <jgross@suse.com>
      Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: <stable@vger.kernel.org> # v4.9+
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-efi@vger.kernel.org
      Link: http://lkml.kernel.org/r/20170526113652.21339-2-matt@codeblueprint.co.uk
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  3. 09 May, 2017 1 commit
  4. 13 Apr, 2017 1 commit
    • x86/efi: Don't try to reserve runtime regions · 6f6266a5
      Omar Sandoval committed
      Reserving a runtime region results in splitting the EFI memory
      descriptors for the runtime region. This results in runtime region
      descriptors with bogus memory mappings, leading to interesting crashes
      like the following during a kexec:
      
        general protection fault: 0000 [#1] SMP
        Modules linked in:
        CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.11.0-rc1 #53
        Hardware name: Wiwynn Leopard-Orv2/Leopard-DDR BW, BIOS LBM05   09/30/2016
        RIP: 0010:virt_efi_set_variable()
        ...
        Call Trace:
         efi_delete_dummy_variable()
         efi_enter_virtual_mode()
         start_kernel()
         ? set_init_arg()
         x86_64_start_reservations()
         x86_64_start_kernel()
         start_cpu()
        ...
        Kernel panic - not syncing: Fatal exception
      
      Runtime regions will not be freed and do not need to be reserved, so
      skip the memmap modification in this case.
      Signed-off-by: Omar Sandoval <osandov@fb.com>
      Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: <stable@vger.kernel.org> # v4.9+
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Jones <pjones@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-efi@vger.kernel.org
      Fixes: 8e80632f ("efi/esrt: Use efi_mem_reserve() and avoid a kmalloc()")
      Link: http://lkml.kernel.org/r/20170412152719.9779-2-matt@codeblueprint.co.uk
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  5. 05 Apr, 2017 2 commits
  6. 27 Mar, 2017 1 commit
  7. 23 Mar, 2017 1 commit
  8. 16 Mar, 2017 1 commit
    • x86: Remap GDT tables in the fixmap section · 69218e47
      Thomas Garnier committed
      Each processor holds a GDT in its per-cpu structure. The sgdt
      instruction gives the base address of the current GDT. This address can
      be used to bypass KASLR memory randomization. With another bug, an
      attacker could target other per-cpu structures or deduce the base of
      the main memory section (PAGE_OFFSET).
      
      This patch relocates the GDT table for each processor inside the
      fixmap section. The space is reserved based on number of supported
      processors.
      
      For consistency, the remapping is done by default on 32 and 64-bit.
      
      Each processor switches to its remapped GDT at the end of
      initialization. For hibernation, the main processor returns with the
      original GDT and switches back to the remapping at completion.
      
      This patch was tested on both architectures. Hibernation and KVM were
      both tested specially for their usage of the GDT.
      
      Thanks to Boris Ostrovsky <boris.ostrovsky@oracle.com> for testing and
      recommending changes for Xen support.
      Signed-off-by: Thomas Garnier <thgarnie@google.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jiri Kosina <jikos@kernel.org>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Lorenzo Stoakes <lstoakes@gmail.com>
      Cc: Luis R . Rodriguez <mcgrof@kernel.org>
      Cc: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Rafael J . Wysocki <rjw@rjwysocki.net>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: kasan-dev@googlegroups.com
      Cc: kernel-hardening@lists.openwall.com
      Cc: kvm@vger.kernel.org
      Cc: lguest@lists.ozlabs.org
      Cc: linux-doc@vger.kernel.org
      Cc: linux-efi@vger.kernel.org
      Cc: linux-mm@kvack.org
      Cc: linux-pm@vger.kernel.org
      Cc: xen-devel@lists.xenproject.org
      Cc: zijun_hu <zijun_hu@htc.com>
      Link: http://lkml.kernel.org/r/20170314170508.100882-2-thgarnie@google.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  9. 14 Mar, 2017 1 commit
  10. 01 Feb, 2017 3 commits
  11. 29 Jan, 2017 2 commits
    • x86/boot/e820: Simplify the e820__update_table() interface · f9748fa0
      Ingo Molnar committed
      The e820__update_table() parameters are pretty complex:
      
        arch/x86/include/asm/e820/api.h:extern int  e820__update_table(struct e820_entry *biosmap, int max_nr_map, u32 *pnr_map);
      
      But 90% of the usage is trivial:
      
        arch/x86/kernel/e820.c:	if (e820__update_table(e820_table->entries, ARRAY_SIZE(e820_table->entries), &e820_table->nr_entries))
        arch/x86/kernel/e820.c:	e820__update_table(e820_table->entries, ARRAY_SIZE(e820_table->entries), &e820_table->nr_entries);
        arch/x86/kernel/e820.c:	e820__update_table(e820_table->entries, ARRAY_SIZE(e820_table->entries), &e820_table->nr_entries);
        arch/x86/kernel/e820.c:		if (e820__update_table(e820_table->entries, ARRAY_SIZE(e820_table->entries), &e820_table->nr_entries) < 0)
        arch/x86/kernel/e820.c:	e820__update_table(boot_params.e820_table, ARRAY_SIZE(boot_params.e820_table), &new_nr);
        arch/x86/kernel/early-quirks.c:	e820__update_table(e820_table->entries, ARRAY_SIZE(e820_table->entries), &e820_table->nr_entries);
        arch/x86/kernel/setup.c:	e820__update_table(e820_table->entries, ARRAY_SIZE(e820_table->entries), &e820_table->nr_entries);
        arch/x86/kernel/setup.c:		e820__update_table(e820_table->entries, ARRAY_SIZE(e820_table->entries), &e820_table->nr_entries);
        arch/x86/platform/efi/efi.c:	e820__update_table(e820_table->entries, ARRAY_SIZE(e820_table->entries), &e820_table->nr_entries);
        arch/x86/xen/setup.c:	e820__update_table(xen_e820_table.entries, ARRAY_SIZE(xen_e820_table.entries),
        arch/x86/xen/setup.c:	e820__update_table(e820_table->entries, ARRAY_SIZE(e820_table->entries), &e820_table->nr_entries);
        arch/x86/xen/setup.c:	e820__update_table(xen_e820_table.entries, ARRAY_SIZE(xen_e820_table.entries),
      
      as it only uses an existing struct e820_table's entries array, its size and
      its current number of entries as input and output arguments.
      
      Only one use is non-trivial:
      
        arch/x86/kernel/e820.c:	e820__update_table(boot_params.e820_table, ARRAY_SIZE(boot_params.e820_table), &new_nr);
      
      ... which updates the E820 table in the zeropage in-situ, where the layout does not
      match that of 'struct e820_table' (in particular nr_entries is at a different offset,
      hardcoded by the boot protocol).
      
      Simplify all this by introducing a low level __e820__update_table() API that
      the zeropage update call can use, and simplifying the main e820__update_table()
      call signature down to:
      
      	int e820__update_table(struct e820_table *table);
      
      This visibly simplifies all the call sites:
      
        arch/x86/include/asm/e820/api.h:extern int  e820__update_table(struct e820_table *table);
        arch/x86/include/asm/e820/types.h: * call to e820__update_table() to remove duplicates.  The allowance
        arch/x86/kernel/e820.c: * The return value from e820__update_table() is zero if it
        arch/x86/kernel/e820.c:int __init e820__update_table(struct e820_table *table)
        arch/x86/kernel/e820.c:	if (e820__update_table(e820_table))
        arch/x86/kernel/e820.c:	e820__update_table(e820_table_firmware);
        arch/x86/kernel/e820.c:	e820__update_table(e820_table);
        arch/x86/kernel/e820.c:	e820__update_table(e820_table);
        arch/x86/kernel/e820.c:		if (e820__update_table(e820_table) < 0)
        arch/x86/kernel/early-quirks.c:	e820__update_table(e820_table);
        arch/x86/kernel/setup.c:	e820__update_table(e820_table);
        arch/x86/kernel/setup.c:		e820__update_table(e820_table);
        arch/x86/platform/efi/efi.c:	e820__update_table(e820_table);
        arch/x86/xen/setup.c:	e820__update_table(&xen_e820_table);
        arch/x86/xen/setup.c:	e820__update_table(e820_table);
        arch/x86/xen/setup.c:	e820__update_table(&xen_e820_table);
      
      No change in functionality.
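The split described above can be pictured as a thin wrapper over a three-argument low-level routine. This is a minimal userspace sketch under stated assumptions: the struct layouts and E820_MAX_ENTRIES are simplified for illustration, e820__update_table_lowlevel() is a hypothetical name standing in for __e820__update_table(), and its body only sorts by start address rather than doing the kernel's full overlap resolution. The -1 return for tables with fewer than two entries is also an assumption.

```c
#include <stdint.h>
#include <stdlib.h>

#define E820_MAX_ENTRIES 8	/* illustrative size, not the kernel's value */

struct e820_entry {
	uint64_t addr;
	uint64_t size;
	uint32_t type;
};

struct e820_table {
	uint32_t nr_entries;
	struct e820_entry entries[E820_MAX_ENTRIES];
};

/* Order entries by ascending start address. */
static int cmp_addr(const void *a, const void *b)
{
	const struct e820_entry *ea = a, *eb = b;

	return (ea->addr > eb->addr) - (ea->addr < eb->addr);
}

/*
 * Low-level variant taking a raw array, its capacity and a pointer to
 * the entry count, so a caller with a different layout (such as the
 * zeropage) can still use it. Stand-in body: sort only.
 */
static int e820__update_table_lowlevel(struct e820_entry *entries,
				       uint32_t max_nr_entries,
				       uint32_t *nr_entries)
{
	if (*nr_entries < 2 || *nr_entries > max_nr_entries)
		return -1;

	qsort(entries, *nr_entries, sizeof(*entries), cmp_addr);
	return 0;
}

/* Simplified main interface: everything is derived from the table itself. */
static int e820__update_table(struct e820_table *table)
{
	return e820__update_table_lowlevel(table->entries,
					   sizeof(table->entries) / sizeof(table->entries[0]),
					   &table->nr_entries);
}
```

Callers that own a struct e820_table pass only the table pointer, while the one caller with a different layout keeps passing the array, capacity and count separately to the low-level routine.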
      
      Cc: Alex Thorlton <athorlton@sgi.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Huang, Ying <ying.huang@intel.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul Jackson <pj@sgi.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rafael J. Wysocki <rjw@sisk.pl>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/boot/e820: Prefix the E820_* type names with "E820_TYPE_" · 09821ff1
      Ingo Molnar committed
      So there's a number of constants that start with "E820" but which
      are not types - these create a confusing mixture when seen together
      with 'enum e820_type' values:
      
      	E820MAP
      	E820NR
      	E820_X_MAX
      	E820MAX
      
      To better differentiate the 'enum e820_type' values prefix them
      with E820_TYPE_.
      
      No change in functionality.
      
      Cc: Alex Thorlton <athorlton@sgi.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Huang, Ying <ying.huang@intel.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul Jackson <pj@sgi.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rafael J. Wysocki <rjw@sisk.pl>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  12. 28 Jan, 2017 9 commits
    • x86/boot/e820: Create coherent API function names for E820 range operations · ab6bc04c
      Ingo Molnar committed
      We have these three related functions:
      
       extern void e820_add_region(u64 start, u64 size, int type);
       extern u64  e820_update_range(u64 start, u64 size, unsigned old_type, unsigned new_type);
       extern u64  e820_remove_range(u64 start, u64 size, unsigned old_type, int checktype);
      
      But it's not clear from the naming that they are 3 operations based around the
      same 'memory range' concept. Rename them to better signal this, and move
      the prototypes next to each other:
      
       extern void e820__range_add   (u64 start, u64 size, int type);
       extern u64  e820__range_update(u64 start, u64 size, unsigned old_type, unsigned new_type);
       extern u64  e820__range_remove(u64 start, u64 size, unsigned old_type, int checktype);
      
      Note that this improved organization of the functions shows another problem that was easy
      to miss before: sometimes the E820 entry type is 'int', sometimes 'unsigned int' - but this
      will be fixed in a separate patch.
      
      No change in functionality.
      
      Cc: Alex Thorlton <athorlton@sgi.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Huang, Ying <ying.huang@intel.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul Jackson <pj@sgi.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rafael J. Wysocki <rjw@sisk.pl>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/boot/e820: Rename e820_any_mapped()/e820_all_mapped() to e820__mapped_any()/e820__mapped_all() · 3bce64f0
      Ingo Molnar committed
      The 'any' and 'all' are modifiers of the 'mapped' concept, so move them last in the name.
      
      No change in functionality.
      
      Cc: Alex Thorlton <athorlton@sgi.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Huang, Ying <ying.huang@intel.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul Jackson <pj@sgi.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rafael J. Wysocki <rjw@sisk.pl>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/boot/e820: Rename sanitize_e820_table() to e820__update_table() · f52355a9
      Ingo Molnar committed
      sanitize_e820_table() is a minor misnomer in that it suggests that
      the E820 table requires sanitizing - which implies that it will only
      do anything if the E820 table is irregular (not sane).
      
      That is wrong, because sanitize_e820_table() also does a very regular
      sorting of the E820 table, which is a necessity in the basic
      append-only flow of E820 updates the kernel is allowed to perform to
      it.
      
      So rename it to e820__update_table() to include that purpose as well.
      
      This also lines up all the table-update functions into a coherent
      naming family:
      
        int  e820__update_table(struct e820_entry *biosmap, int max_nr_map, u32 *pnr_map);
      
        void e820__update_table_print(void);
        void e820__update_table_firmware(void);
      
      No change in functionality.
      
      Cc: Alex Thorlton <athorlton@sgi.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Huang, Ying <ying.huang@intel.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul Jackson <pj@sgi.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rafael J. Wysocki <rjw@sisk.pl>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/boot/e820: Harmonize the 'struct e820_table' fields · bf495573
      Ingo Molnar committed
      So the e820_table->map and e820_table->nr_map names are a bit
      confusing, because it's not clear what a 'map' really means
      (it could be a bitmap, or some other data structure), nor is
      it clear what nr_map means (is it a current index, or some
      other count).
      
      Rename the fields from:
      
       e820_table->map        =>     e820_table->entries
       e820_table->nr_map     =>     e820_table->nr_entries
      
      which makes it abundantly clear that these are entries
      of the table, and that the size of the table is ->nr_entries.
      
      Propagate the changes to all affected files. Where necessary,
      adjust local variable names to better reflect the new field names.
      
      No change in functionality.
      
      Cc: Alex Thorlton <athorlton@sgi.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Huang, Ying <ying.huang@intel.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul Jackson <pj@sgi.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rafael J. Wysocki <rjw@sisk.pl>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/boot/e820: Rename everything to e820_table · 61a50101
      Ingo Molnar committed
      No change in functionality.
      
      Cc: Alex Thorlton <athorlton@sgi.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Huang, Ying <ying.huang@intel.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul Jackson <pj@sgi.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rafael J. Wysocki <rjw@sisk.pl>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/boot/e820: Rename 'e820_map' variables to 'e820_array' · acd4c048
      Ingo Molnar committed
      In line with the rename to 'struct e820_array', harmonize the naming of common e820
      table variable names as well:
      
       e820          =>  e820_array
       e820_saved    =>  e820_array_saved
       e820_map      =>  e820_array
       initial_e820  =>  e820_array_init
      
      This makes the variable names more consistent and easier to grep for.
      
      No change in functionality.
      
      Cc: Alex Thorlton <athorlton@sgi.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Huang, Ying <ying.huang@intel.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul Jackson <pj@sgi.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rafael J. Wysocki <rjw@sisk.pl>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/boot/e820: Remove spurious asm/e820/api.h inclusions · 5520b7e7
      Ingo Molnar committed
      A commonly used lowlevel x86 header, asm/pgtable.h, includes asm/e820/api.h
      spuriously, without making direct use of it.
      
      Removing it is not simple: over the years various .c code learned to rely
      on this indirect inclusion.
      
      Remove the unnecessary include - this should speed up the kernel build a bit,
      as a large header is not included anymore in totally unrelated code.
      
      Cc: Alex Thorlton <athorlton@sgi.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Huang, Ying <ying.huang@intel.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul Jackson <pj@sgi.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rafael J. Wysocki <rjw@sisk.pl>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/boot/e820: Move asm/e820.h to asm/e820/api.h · 66441bd3
      Ingo Molnar committed
      In line with asm/e820/types.h, move the e820 API declarations to
      asm/e820/api.h and update all usage sites.
      
      This is just a mechanical, obviously correct move & replace patch,
      there will be subsequent changes to clean up the code and to make
      better use of the new header organization.
      
      Cc: Alex Thorlton <athorlton@sgi.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Huang, Ying <ying.huang@intel.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul Jackson <pj@sgi.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rafael J. Wysocki <rjw@sisk.pl>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/efi: Always map the first physical page into the EFI pagetables · bf29bddf
      Jiri Kosina committed
      Commit:
      
        12976670 ("x86/efi: Only map RAM into EFI page tables if in mixed-mode")
      
      stopped creating 1:1 mappings for all RAM, when running in native 64-bit mode.
      
      It turns out though that there are 64-bit EFI implementations in the wild
      (this particular problem has been reported on a Lenovo Yoga 710-11IKB),
      which still make use of the first physical page for their own private use,
      even though they explicitly mark it EFI_CONVENTIONAL_MEMORY in the memory
      map.
      
      In case there is no mapping for this particular frame in the EFI pagetables,
      as soon as firmware tries to make use of it, a triple fault occurs and the
      system reboots (in case of the Yoga 710-11IKB this is very early during bootup).
      
      Fix that by always mapping the first page of physical memory into the EFI
      pagetables. We're free to hand this page to the BIOS, as trim_bios_range()
      will reserve the first page and isolate it away from memory allocators anyway.
      
      Note that just reverting 12976670 alone is not enough on v4.9-rc1+ to fix the
      regression on affected hardware, as this commit:
      
         ab72a27d ("x86/efi: Consolidate region mapping logic")
      
      later made the first physical frame not to be mapped anyway.
      Reported-by: Hanka Pavlikova <hanka@ucw.cz>
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
      Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Laura Abbott <labbott@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vojtech Pavlik <vojtech@ucw.cz>
      Cc: Waiman Long <waiman.long@hpe.com>
      Cc: linux-efi@vger.kernel.org
      Cc: stable@kernel.org # v4.8+
      Fixes: 12976670 ("x86/efi: Only map RAM into EFI page tables if in mixed-mode")
      Link: http://lkml.kernel.org/r/20170127222552.22336-1-matt@codeblueprint.co.uk
      [ Tidied up the changelog and the comment. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  13. 14 Jan, 2017 1 commit
    • efi/x86: Prune invalid memory map entries and fix boot regression · 0100a3e6
      Peter Jones committed
      Some machines, such as the Lenovo ThinkPad W541 with firmware GNET80WW
      (2.28), include memory map entries with phys_addr=0x0 and num_pages=0.
      
      These machines fail to boot after the following commit,
      
        commit 8e80632f ("efi/esrt: Use efi_mem_reserve() and avoid a kmalloc()")
      
      Fix this by removing such bogus entries from the memory map.
      
      Furthermore, currently the log output for this case (with efi=debug)
      looks like:
      
       [    0.000000] efi: mem45: [Reserved           |   |  |  |  |  |  |  |  |  |  |  |  ] range=[0x0000000000000000-0xffffffffffffffff] (0MB)
      
      This is clearly wrong, and also not as informative as it could be.  This
      patch changes it so that if we find obviously invalid memory map
      entries, we print an error and skip those entries.  It also catches the
      overflow in the displayed address range calculation, so the new output is:
      
       [    0.000000] efi: [Firmware Bug]: Invalid EFI memory map entries:
       [    0.000000] efi: mem45: [Reserved           |   |  |  |  |  |  |  |   |  |  |  |  ] range=[0x0000000000000000-0x0000000000000000] (invalid)
      
      It also detects memory map sizes that would overflow the physical
      address, for example phys_addr=0xfffffffffffff000 and
      num_pages=0x0200000000000001, and prints:
      
       [    0.000000] efi: [Firmware Bug]: Invalid EFI memory map entries:
       [    0.000000] efi: mem45: [Reserved           |   |  |  |  |  |  |  |   |  |  |  |  ] range=[phys_addr=0xfffffffffffff000-0x20ffffffffffffffff] (invalid)
      
      It then removes these entries from the memory map.
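
The validity rule described above can be sketched in a few lines of userspace C. This is an illustration under stated assumptions, not the code from the patch: the names `sketch_memmap_entry_valid`, `SKETCH_EFI_PAGE_SHIFT` and `SKETCH_EFI_PAGES_MAX` are hypothetical, and a 4 KiB EFI page size is assumed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: EFI pages are 4 KiB, i.e. a shift of 12. */
#define SKETCH_EFI_PAGE_SHIFT 12
/* Largest page count whose size in bytes still fits in 64 bits. */
#define SKETCH_EFI_PAGES_MAX (UINT64_MAX >> SKETCH_EFI_PAGE_SHIFT)

/*
 * Hypothetical helper: returns false for entries that describe no
 * memory at all or whose end address would wrap past 2^64.
 */
static bool sketch_memmap_entry_valid(uint64_t phys_addr, uint64_t num_pages)
{
    /* An entry covering zero pages is bogus. */
    if (num_pages == 0)
        return false;

    /* The size in bytes alone would not fit in 64 bits. */
    if (num_pages > SKETCH_EFI_PAGES_MAX)
        return false;

    /*
     * Not enough pages left between phys_addr and the top of the
     * address space. This check is conservative: it also rejects a
     * region that ends exactly on the last addressable byte.
     */
    if (SKETCH_EFI_PAGES_MAX - num_pages < (phys_addr >> SKETCH_EFI_PAGE_SHIFT))
        return false;

    return true;
}
```

With these definitions, both the phys_addr=0x0/num_pages=0 entry and the phys_addr=0xfffffffffffff000/num_pages=0x0200000000000001 entry from the examples above are rejected, while an ordinary region such as 16 pages at 0x100000 is accepted.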
      Signed-off-by: Peter Jones <pjones@redhat.com>
      Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      [ardb: refactor for clarity with no functional changes, avoid PAGE_SHIFT]
      Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
      [Matt: Include bugzilla info in commit log]
      Cc: <stable@vger.kernel.org> # v4.9+
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=191121
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      0100a3e6
  14. 07 Jan, 2017 1 commit
    • N
      x86/efi: Don't allocate memmap through memblock after mm_init() · 20b1e22d
      Nicolai Stange committed
      With the following commit:
      
        4bc9f92e ("x86/efi-bgrt: Use efi_mem_reserve() to avoid copying image data")
      
      ...  efi_bgrt_init() calls into the memblock allocator through
      efi_mem_reserve() => efi_arch_mem_reserve() *after* mm_init() has been called.
      
      Indeed, KASAN reports a bad read access later on in efi_free_boot_services():
      
        BUG: KASAN: use-after-free in efi_free_boot_services+0xae/0x24c
                  at addr ffff88022de12740
        Read of size 4 by task swapper/0/0
        page:ffffea0008b78480 count:0 mapcount:-127
        mapping:          (null) index:0x1 flags: 0x5fff8000000000()
        [...]
        Call Trace:
         dump_stack+0x68/0x9f
         kasan_report_error+0x4c8/0x500
         kasan_report+0x58/0x60
         __asan_load4+0x61/0x80
         efi_free_boot_services+0xae/0x24c
         start_kernel+0x527/0x562
         x86_64_start_reservations+0x24/0x26
         x86_64_start_kernel+0x157/0x17a
         start_cpu+0x5/0x14
      
      The instruction at the given address is the first read from the memmap's
      memory, i.e. the read of md->type in efi_free_boot_services().
      
      Note that the writes earlier in efi_arch_mem_reserve() don't splat because
      they're done through early_memremap()ed addresses.
      
      So, after memblock is gone, allocations should be done through the "normal"
      page allocator. Introduce a helper, efi_memmap_alloc() for this. Use
      it from efi_arch_mem_reserve(), efi_free_boot_services() and, for the sake
      of consistency, from efi_fake_memmap() as well.
      
      Note that for the latter, the memmap allocations cease to be page aligned.
      This isn't needed though.
      Tested-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Nicolai Stange <nicstange@gmail.com>
      Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: <stable@vger.kernel.org> # v4.9
      Cc: Dave Young <dyoung@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: Mika Penttilä <mika.penttila@nextfour.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-efi@vger.kernel.org
      Fixes: 4bc9f92e ("x86/efi-bgrt: Use efi_mem_reserve() to avoid copying image data")
      Link: http://lkml.kernel.org/r/20170105125130.2815-1-nicstange@gmail.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      20b1e22d
  15. 13 Nov, 2016 2 commits
    • M
      x86/efi: Prevent mixed mode boot corruption with CONFIG_VMAP_STACK=y · f6697df3
      Matt Fleming committed
      Booting an EFI mixed mode kernel has been crashing since commit:
      
        e37e43a4 ("x86/mm/64: Enable vmapped stacks (CONFIG_HAVE_ARCH_VMAP_STACK=y)")
      
      The user-visible effect in my test setup was the kernel being unable
      to find the root file system ramdisk. This was likely caused by silent
      memory or page table corruption.
      
      Enabling CONFIG_DEBUG_VIRTUAL=y immediately flagged the thunking code as
      abusing virt_to_phys() because it was passing addresses that were not
      part of the kernel direct mapping.
      
      Use the slow version instead, which correctly handles all memory
      regions by performing a page table walk.
      Suggested-by: Andy Lutomirski <luto@amacapital.net>
      Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-efi@vger.kernel.org
      Link: http://lkml.kernel.org/r/20161112210424.5157-3-matt@codeblueprint.co.uk
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      f6697df3
    • B
      x86/efi: Fix EFI memmap pointer size warning · 02e56902
      Borislav Petkov committed
      Fix this when building on 32-bit:
      
        arch/x86/platform/efi/efi.c: In function ‘__efi_enter_virtual_mode’:
        arch/x86/platform/efi/efi.c:911:5: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
             (efi_memory_desc_t *)pa);
             ^
        arch/x86/platform/efi/efi.c:918:5: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
             (efi_memory_desc_t *)pa);
             ^
      
      The @pa local variable is declared as phys_addr_t and that is a u64 when
      CONFIG_PHYS_ADDR_T_64BIT=y. (The last is enabled on 32-bit on a PAE
      build.)
      
      However, its value comes from __pa() which is basically doing pointer
      arithmetic and checking, and returns unsigned long as it is the native
      pointer width.
      
      So let's use an unsigned long too. It should be fine to do so because
      the later users cast it to a pointer too.
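
The width mismatch can be illustrated in userspace C. The sketch below is only an analogue of the kernel change, under stated assumptions: `uintptr_t` stands in for the kernel's `unsigned long`, and `struct sketch_desc` and `sketch_desc_from_addr` are hypothetical names. Holding the address in a pointer-width integer keeps the cast to a pointer the same size on both 32-bit and 64-bit builds, whereas casting a 64-bit integer to a pointer on a 32-bit build is what triggers -Wint-to-pointer-cast.

```c
#include <stdint.h>

/* Hypothetical stand-in for a memory descriptor. */
struct sketch_desc {
    uint32_t type;
};

/*
 * The address is carried in an integer of native pointer width, so
 * the cast below never changes size, whatever the target.
 */
static struct sketch_desc *sketch_desc_from_addr(uintptr_t addr)
{
    return (struct sketch_desc *)addr;
}
```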
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-efi@vger.kernel.org
      Link: http://lkml.kernel.org/r/20161112210424.5157-2-matt@codeblueprint.co.uk
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      02e56902
  16. 21 Sep, 2016 1 commit
  17. 20 Sep, 2016 2 commits
    • M
      x86/efi: Round EFI memmap reservations to EFI_PAGE_SIZE · 92dc3350
      Matt Fleming committed
      Mike Galbraith reported that his machine started rebooting during boot
      after,
      
        commit 8e80632f ("efi/esrt: Use efi_mem_reserve() and avoid a kmalloc()")
      
      The ESRT table on his machine is 56 bytes and at no point in the
      efi_arch_mem_reserve() call path is that size rounded up to
      EFI_PAGE_SIZE, nor is the start address on an EFI_PAGE_SIZE boundary.
      
      Since the EFI memory map only deals with whole pages, inserting an EFI
      memory region with 56 bytes results in a new entry covering zero
      pages, and completely screws up the calculations for the old regions
      that were trimmed.
      
      Round all sizes upwards, and start addresses downwards, to the nearest
      EFI_PAGE_SIZE boundary.
      
      Additionally, efi_memmap_insert() expects the mem::range::end value to
      be one less than the end address for the region.
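
The rounding described above can be sketched in userspace C as follows. This is a minimal illustration, not the patch itself: `struct sketch_range`, `sketch_round_to_efi_pages` and `SKETCH_EFI_PAGE_SIZE` are hypothetical names, and a 4 KiB EFI page size is assumed.

```c
#include <stdint.h>

/* Assumption: EFI pages are 4 KiB. */
#define SKETCH_EFI_PAGE_SIZE 4096ULL

struct sketch_range {
    uint64_t start;      /* first byte of the region, page aligned */
    uint64_t end;        /* last byte of the region (inclusive) */
    uint64_t num_pages;  /* whole pages covered */
};

static struct sketch_range sketch_round_to_efi_pages(uint64_t addr, uint64_t size)
{
    struct sketch_range r;

    /* Grow the size by the offset into the first page, so the
     * original end is still covered once the start moves down. */
    size += addr % SKETCH_EFI_PAGE_SIZE;
    /* Round the size up to a whole number of pages. */
    size = (size + SKETCH_EFI_PAGE_SIZE - 1) & ~(SKETCH_EFI_PAGE_SIZE - 1);
    /* Round the start address down to a page boundary. */
    addr &= ~(SKETCH_EFI_PAGE_SIZE - 1);

    r.start = addr;
    r.end = addr + size - 1;   /* inclusive end: one less than start + size */
    r.num_pages = size / SKETCH_EFI_PAGE_SIZE;
    return r;
}
```

A 56-byte table that starts 0x10 bytes into a page then yields one whole page rather than zero, and a region that straddles a page boundary yields two.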
      Reported-by: Mike Galbraith <umgwanakikbuti@gmail.com>
      Reported-by: Mike Krinkin <krinkin.m.u@gmail.com>
      Tested-by: Mike Krinkin <krinkin.m.u@gmail.com>
      Cc: Peter Jones <pjones@redhat.com>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
      Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
      92dc3350
    • M
      x86/efi: Only map RAM into EFI page tables if in mixed-mode · 12976670
      Matt Fleming committed
      Waiman reported that booting with CONFIG_EFI_MIXED enabled on his
      multi-terabyte HP machine results in boot crashes, because the EFI
      region mapping functions loop forever while trying to map those
      regions describing RAM.
      
      While this patch doesn't fix the underlying hang, there's really no
      reason to map EFI_CONVENTIONAL_MEMORY regions into the EFI page tables
      when mixed-mode is not in use at runtime.
      Reported-by: Waiman Long <waiman.long@hpe.com>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Scott J Norton <scott.norton@hpe.com>
      Cc: Douglas Hatch <doug.hatch@hpe.com>
      Cc: <stable@vger.kernel.org> # v4.6+
      Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
      12976670
  18. 09 Sep, 2016 7 commits
    • M
      x86/efi: Use kmalloc_array() in efi_call_phys_prolog() · 20ebc15e
      Markus Elfring committed
      * The allocation size was computed by multiplying an element count by
        an element size, which indicates that an array is being allocated.
        Use the dedicated function kmalloc_array() instead.

        This issue was detected by using the Coccinelle software.

      * Replace the data type specification in sizeof with a pointer
        dereference to make the size determination a bit safer, in line with
        the Linux coding style convention.
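
The benefit of an array-aware allocator can be sketched in userspace C. This is an analogue under stated assumptions, not the kernel implementation: `sketch_malloc_array` is a hypothetical name, and `malloc()` stands in for the kernel allocator. The point is that the count-times-size multiplication is checked for overflow before any memory is requested.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical userspace analogue of an array allocator: refuse the
 * request when n * size would wrap around instead of silently
 * allocating a buffer that is too small.
 */
static void *sketch_malloc_array(size_t n, size_t size)
{
    if (size != 0 && n > SIZE_MAX / size)
        return NULL;           /* n * size would overflow */
    return malloc(n * size);
}
```

A caller would typically write `p = sketch_malloc_array(count, sizeof(*p))`, so the element size follows the pointer's type automatically if that type ever changes.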
      Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
      Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Julia Lawall <julia.lawall@lip6.fr>
      Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
      20ebc15e
    • R
      x86/efi: Defer efi_esrt_init until after memblock_x86_fill · 3dad6f7f
      Ricardo Neri committed
      Commit 7b02d53e7852 ("efi: Allow drivers to reserve boot services forever")
      introduced a new function, efi_mem_reserve(), to reserve boot services memory
      regions forever. This reservation involves allocating a new EFI memory
      range descriptor. However, the allocation can only succeed if memory is
      available for it. Otherwise, an error such as the following may
      occur:
      
      esrt: Reserving ESRT space from 0x000000003dd6a000 to 0x000000003dd6a010.
      Kernel panic - not syncing: ERROR: Failed to allocate 0x9f0 bytes below \
       0x0.
      CPU: 0 PID: 0 Comm: swapper Not tainted 4.7.0-rc5+ #503
       0000000000000000 ffffffff81e03ce0 ffffffff8131dae8 ffffffff81bb6c50
       ffffffff81e03d70 ffffffff81e03d60 ffffffff8111f4df 0000000000000018
       ffffffff81e03d70 ffffffff81e03d08 00000000000009f0 00000000000009f0
      Call Trace:
       [<ffffffff8131dae8>] dump_stack+0x4d/0x65
       [<ffffffff8111f4df>] panic+0xc5/0x206
       [<ffffffff81f7c6d3>] memblock_alloc_base+0x29/0x2e
       [<ffffffff81f7c6e3>] memblock_alloc+0xb/0xd
       [<ffffffff81f6c86d>] efi_arch_mem_reserve+0xbc/0x134
       [<ffffffff81fa3280>] efi_mem_reserve+0x2c/0x31
       [<ffffffff81fa3280>] ? efi_mem_reserve+0x2c/0x31
       [<ffffffff81fa40d3>] efi_esrt_init+0x19e/0x1b4
       [<ffffffff81f6d2dd>] efi_init+0x398/0x44a
       [<ffffffff81f5c782>] setup_arch+0x415/0xc30
       [<ffffffff81f55af1>] start_kernel+0x5b/0x3ef
       [<ffffffff81f55434>] x86_64_start_reservations+0x2f/0x31
       [<ffffffff81f55520>] x86_64_start_kernel+0xea/0xed
      ---[ end Kernel panic - not syncing: ERROR: Failed to allocate 0x9f0
           bytes below 0x0.
      
      An inspection of the memblock configuration reveals that there is no memory
      available for the allocation:
      
      MEMBLOCK configuration:
       memory size = 0x0 reserved size = 0x4f339c0
       memory.cnt  = 0x1
       memory[0x0]    [0x00000000000000-0xffffffffffffffff], 0x0 bytes on node 0\
                       flags: 0x0
       reserved.cnt  = 0x4
       reserved[0x0]  [0x0000000008c000-0x0000000008c9bf], 0x9c0 bytes flags: 0x0
       reserved[0x1]  [0x0000000009f000-0x000000000fffff], 0x61000 bytes\
                       flags: 0x0
       reserved[0x2]  [0x00000002800000-0x0000000394bfff], 0x114c000 bytes\
                       flags: 0x0
       reserved[0x3]  [0x000000304e4000-0x00000034269fff], 0x3d86000 bytes\
                       flags: 0x0
      
      This situation can be avoided if we call efi_esrt_init after memblock has
      memory regions for the allocation.
      
      Also, the EFI ESRT driver makes use of early_memremap() mappings. Therefore, we
      do not want to defer efi_esrt_init for too long. We must call this function
      while calls to early_memremap are still valid.
      
      A good place to meet the two aforementioned conditions is right after
      memblock_x86_fill, grouped with other EFI-related functions.
      Reported-by: Scott Lawson <scott.lawson@intel.com>
      Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Peter Jones <pjones@redhat.com>
      Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
      3dad6f7f
    • A
      x86/efi: Map in physical addresses in efi_map_region_fixed · 0513fe1d
      Alex Thorlton committed
      This is a simple change to add in the physical mappings as well as the
      virtual mappings in efi_map_region_fixed.  The motivation here is to
      get access to EFI runtime code that is only available via the 1:1
      mappings on a kexec'd kernel.
      
      The added call is essentially the kexec analog of the first __map_region
      that Boris put in efi_map_region in commit d2f7cbe7 ("x86/efi:
      Runtime services virtual mapping").
      Signed-off-by: Alex Thorlton <athorlton@sgi.com>
      Cc: Russ Anderson <rja@sgi.com>
      Cc: Dimitri Sivanich <sivanich@sgi.com>
      Cc: Mike Travis <travis@sgi.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
      0513fe1d
    • M
      x86/efi-bgrt: Use efi_mem_reserve() to avoid copying image data · 4bc9f92e
      Matt Fleming committed
      efi_mem_reserve() allows us to permanently mark EFI boot services
      regions as reserved, which means we no longer need to copy the image
      data out and into a separate buffer.
      
      Leaving the data in the original boot services region has the added
      benefit that BGRT images can now be passed across kexec reboot.
      Reviewed-by: Josh Triplett <josh@joshtriplett.org>
      Tested-by: Dave Young <dyoung@redhat.com> [kexec/kdump]
      Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> [arm]
      Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Leif Lindholm <leif.lindholm@linaro.org>
      Cc: Peter Jones <pjones@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Josh Boyer <jwboyer@fedoraproject.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Môshe van der Sterre <me@moshe.nl>
      Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
      4bc9f92e
    • M
      efi/runtime-map: Use efi.memmap directly instead of a copy · 31ce8cc6
      Matt Fleming committed
      Now that efi.memmap is available all of the time there's no need to
      allocate and build a separate copy of the EFI memory map.
      
      Furthermore, efi.memmap contains boot services regions but only those
      regions that have been reserved via efi_mem_reserve(). Using
      efi.memmap allows us to pass boot services across kexec reboot so that
      the ESRT and BGRT drivers will now work.
      
      Tested-by: Dave Young <dyoung@redhat.com> [kexec/kdump]
      Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> [arm]
      Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Leif Lindholm <leif.lindholm@linaro.org>
      Cc: Peter Jones <pjones@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
      31ce8cc6
    • M
      efi: Allow drivers to reserve boot services forever · 816e7612
      Matt Fleming committed
      Today, it is not possible for drivers to reserve EFI boot services for
      access after efi_free_boot_services() has been called on x86. For
      ARM/arm64 it can be done simply by calling memblock_reserve().
      
      Having this ability for all three architectures is desirable for a
      couple of reasons,
      
        1) It saves drivers copying data out of those regions
        2) kexec reboot can now make use of things like ESRT
      
      Instead of using the standard memblock_reserve() which is insufficient
      to reserve the region on x86 (see efi_reserve_boot_services()), a new
      API is introduced in this patch; efi_mem_reserve().
      
      efi.memmap now always represents which EFI memory regions are
      available. On x86 the EFI boot services regions that have not been
      reserved via efi_mem_reserve() will be removed from efi.memmap during
      efi_free_boot_services().
      
      This has implications for kexec, since it is not possible for a newly
      kexec'd kernel to access the same boot services regions that the
      initial boot kernel had access to unless they are reserved by every
      kexec kernel in the chain.
      
      Tested-by: Dave Young <dyoung@redhat.com> [kexec/kdump]
      Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> [arm]
      Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Leif Lindholm <leif.lindholm@linaro.org>
      Cc: Peter Jones <pjones@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
      816e7612
    • M
      efi: Add efi_memmap_init_late() for permanent EFI memmap · dca0f971
      Matt Fleming committed
      Drivers need a way to access the EFI memory map at runtime. ARM and
      arm64 currently provide this by remapping the EFI memory map into the
      vmalloc space before setting up the EFI virtual mappings.
      
      x86 does not provide this functionality, which has resulted in code in
      efi_mem_desc_lookup() that manually maps individual EFI memmap entries
      if the memmap has already been torn down on x86:
      
        /*
         * If a driver calls this after efi_free_boot_services,
         * ->map will be NULL, and the target may also not be mapped.
         * So just always get our own virtual map on the CPU.
         *
         */
        md = early_memremap(p, sizeof (*md));
      
      There isn't a good reason for not providing a permanent EFI memory map
      for runtime queries, especially since the EFI regions are not mapped
      into the standard kernel page tables.
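
With a permanently mapped memory map, a runtime query reduces to a plain walk over the descriptors. The userspace C sketch below illustrates that idea under stated assumptions; it is not the kernel's efi_mem_desc_lookup(). `struct sketch_mem_desc`, `sketch_mem_desc_lookup` and `SKETCH_EFI_PAGE_SHIFT` are hypothetical names, the descriptor layout is simplified, and a 4 KiB EFI page size is assumed.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: EFI pages are 4 KiB, i.e. a shift of 12. */
#define SKETCH_EFI_PAGE_SHIFT 12

/* Simplified, hypothetical descriptor with only the fields used here. */
struct sketch_mem_desc {
    uint32_t type;
    uint64_t phys_addr;
    uint64_t num_pages;
};

/*
 * Return the descriptor whose region contains addr, or NULL if no
 * entry in the map covers it. No per-entry mapping step is needed
 * because the whole map is assumed to stay accessible.
 */
static const struct sketch_mem_desc *
sketch_mem_desc_lookup(const struct sketch_mem_desc *map, size_t nr,
                       uint64_t addr)
{
    for (size_t i = 0; i < nr; i++) {
        uint64_t size = map[i].num_pages << SKETCH_EFI_PAGE_SHIFT;

        /* Written as a subtraction so the comparison cannot overflow. */
        if (addr >= map[i].phys_addr && addr - map[i].phys_addr < size)
            return &map[i];
    }
    return NULL;
}
```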
      
      Tested-by: Dave Young <dyoung@redhat.com> [kexec/kdump]
      Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> [arm]
      Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Leif Lindholm <leif.lindholm@linaro.org>
      Cc: Peter Jones <pjones@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
      dca0f971