1. 17 Aug, 2017: 2 commits
    • x86/boot/KASLR: Prefer mirrored memory regions for the kernel physical address · c05cd797
      Committed by Baoquan He
      Currently, KASLR parses all e820 entries of RAM type and adds all
      candidate positions to the slots array. It then randomly chooses one
      slot as the new position into which the kernel will be decompressed
      and from which it will run.
      
      On systems with EFI enabled, e820 memory regions are derived from EFI
      memory regions by combining adjacent regions.
      
      These EFI memory regions have various attributes, and the "mirrored"
      attribute is one of them. Physical memory regions whose descriptors in
      the EFI memory map have the EFI_MEMORY_MORE_RELIABLE attribute (bit 16)
      are mirrored. The kernel's address range mirroring feature places such
      mirrored regions in normal zones and all other regions in movable zones.
      
      With the mirroring feature enabled, the code and data of the kernel can only
      be located in the more reliable mirrored regions. However, the current KASLR
      code doesn't check EFI memory entries, and could choose a new kernel position
      in non-mirrored regions. This will break the intended functionality of the
      address range mirroring feature.
      
      To fix this, if EFI is detected, iterate over the EFI memory map and process
      only the mirrored regions when adding candidate randomization slots. If EFI
      is disabled or no mirrored region is found, fall back to processing the
      e820 memory map.
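      A minimal C sketch of the selection logic described above, assuming a
      simplified descriptor layout. struct mem_desc, process_mirrored() and
      sum_bytes() are hypothetical names, not the kernel's code. Only the
      attribute bit (EFI_MEMORY_MORE_RELIABLE, bit 16) and the
      EfiConventionalMemory type value (7) come from the UEFI memory map
      definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit 16 of the descriptor attribute field: EFI_MEMORY_MORE_RELIABLE. */
#define ATTR_MORE_RELIABLE ((uint64_t)1 << 16)
/* EfiConventionalMemory in the UEFI memory-type enumeration. */
#define TYPE_CONVENTIONAL 7u

/* Hypothetical, simplified EFI memory descriptor. */
struct mem_desc {
	uint32_t type;
	uint64_t phys_addr;
	uint64_t num_pages;	/* 4 KiB pages */
	uint64_t attribute;
};

/* Hypothetical callback that would add randomization slots for a region. */
typedef void (*slot_fn)(uint64_t start, uint64_t size, void *ctx);

/*
 * Pass only mirrored conventional-memory regions to add_slots().
 * Returns 0 when the map contains no mirrored region at all, in which
 * case the caller should fall back to walking the e820 table.
 */
static int process_mirrored(const struct mem_desc *map, size_t n,
			    slot_fn add_slots, void *ctx)
{
	int mirror_found = 0;

	for (size_t i = 0; i < n; i++)
		if (map[i].attribute & ATTR_MORE_RELIABLE)
			mirror_found = 1;
	if (!mirror_found)
		return 0;

	for (size_t i = 0; i < n; i++) {
		if (map[i].type != TYPE_CONVENTIONAL ||
		    !(map[i].attribute & ATTR_MORE_RELIABLE))
			continue;
		add_slots(map[i].phys_addr, map[i].num_pages << 12, ctx);
	}
	return 1;
}

/* Example callback: totals the bytes offered as slot candidates. */
static void sum_bytes(uint64_t start, uint64_t size, void *ctx)
{
	(void)start;
	*(uint64_t *)ctx += size;
}
```

      The two passes mirror the commit's rule. Non-mirrored memory is skipped
      only once at least one mirrored region is known to exist. Otherwise the
      caller keeps using e820.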
      Signed-off-by: Baoquan He <bhe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: ard.biesheuvel@linaro.org
      Cc: fanc.fnst@cn.fujitsu.com
      Cc: izumi.taku@jp.fujitsu.com
      Cc: keescook@chromium.org
      Cc: linux-efi@vger.kernel.org
      Cc: matt@codeblueprint.co.uk
      Cc: n-horiguchi@ah.jp.nec.com
      Cc: thgarnie@google.com
      Link: http://lkml.kernel.org/r/1502722464-20614-3-git-send-email-bhe@redhat.com
      [ Rewrote most of the text. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • efi: Introduce efi_early_memdesc_ptr to get pointer to memmap descriptor · 02e43c2d
      Committed by Baoquan He
      The existing map iteration helper for_each_efi_memory_desc_in_map can
      only be used after the kernel initializes the EFI subsystem to set up
      struct efi_memory_map.
      
      Before that, we also need to iterate over map descriptors stored in several
      intermediate structures, such as struct efi_boot_memmap for arch-independent
      usage and struct efi_info for x86 only.
      
      Introduce efi_early_memdesc_ptr() to get pointer to a map descriptor, and
      replace several places where that primitive is open coded.
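      A small C sketch shows why a dedicated accessor is needed. Firmware may
      report a descriptor size larger than the C structure, so descriptors must
      be located by byte offset rather than by array index. early_memdesc_ptr()
      and count_type() below are hypothetical stand-ins written under that
      assumption, not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Field layout of a UEFI memory descriptor. */
struct memdesc {
	uint32_t type;
	uint32_t pad;
	uint64_t phys_addr;
	uint64_t virt_addr;
	uint64_t num_pages;
	uint64_t attribute;
};

/*
 * Hypothetical accessor: locate descriptor n by stepping desc_size bytes
 * at a time. desc_size may exceed sizeof(struct memdesc), so map[n]
 * indexing would land in the wrong place.
 */
#define early_memdesc_ptr(map, desc_size, n) \
	((const struct memdesc *)((const char *)(map) + (size_t)(n) * (desc_size)))

/* Count descriptors of a given type in a raw map of map_size bytes. */
static size_t count_type(const void *map, size_t map_size, size_t desc_size,
			 uint32_t type)
{
	size_t count = 0;

	assert(desc_size >= sizeof(struct memdesc));
	for (size_t n = 0; n < map_size / desc_size; n++)
		if (early_memdesc_ptr(map, desc_size, n)->type == type)
			count++;
	return count;
}
```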
      Signed-off-by: Baoquan He <bhe@redhat.com>
      [ Various improvements to the text. ]
      Acked-by: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: ard.biesheuvel@linaro.org
      Cc: fanc.fnst@cn.fujitsu.com
      Cc: izumi.taku@jp.fujitsu.com
      Cc: keescook@chromium.org
      Cc: linux-efi@vger.kernel.org
      Cc: n-horiguchi@ah.jp.nec.com
      Cc: thgarnie@google.com
      Link: http://lkml.kernel.org/r/20170816134651.GF21273@x1
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  2. 28 Jul, 2017: 1 commit
    • x86/boot: Disable the address-of-packed-member compiler warning · 20c6c189
      Committed by Matthias Kaehlcke
      The clang warning 'address-of-packed-member' is disabled for general
      kernel code; disable it for the x86 boot code as well.
      
      This suppresses a bunch of warnings like this when building with clang:
      
      ./arch/x86/include/asm/processor.h:535:30: warning: taking address of
        packed member 'sp0' of class or structure 'x86_hw_tss' may result in an
        unaligned pointer value [-Waddress-of-packed-member]
          return this_cpu_read_stable(cpu_tss.x86_tss.sp0);
                                      ^~~~~~~~~~~~~~~~~~~
      ./arch/x86/include/asm/percpu.h:391:59: note: expanded from macro
        'this_cpu_read_stable'
          #define this_cpu_read_stable(var)       percpu_stable_op("mov", var)
                                                                          ^~~
      ./arch/x86/include/asm/percpu.h:228:16: note: expanded from macro
        'percpu_stable_op'
          : "p" (&(var)));
                   ^~~
      Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
      Cc: Doug Anderson <dianders@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20170725215053.135586-1-mka@chromium.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  3. 18 Jul, 2017: 3 commits
  4. 13 Jul, 2017: 1 commit
    • include/linux/string.h: add the option of fortified string.h functions · 6974f0c4
      Committed by Daniel Micay
      This adds support for compiling with a rough equivalent to the glibc
      _FORTIFY_SOURCE=1 feature, providing compile-time and runtime buffer
      overflow checks for string.h functions when the compiler determines the
      size of the source or destination buffer at compile-time.  Unlike glibc,
      it covers buffer reads in addition to writes.
      
      GNU C __builtin_*_chk intrinsics are avoided because they would force a
      much more complex implementation.  They aren't designed to detect read
      overflows and offer no real benefit when using an implementation based
      on inline checks.  Inline checks don't add up to much code size and
      allow full use of the regular string intrinsics while avoiding the need
      for a bunch of _chk functions and per-arch assembly to avoid wrapper
      overhead.
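      A rough C sketch of the inline-check approach, assuming GCC or Clang for
      __builtin_object_size. checked_memcpy(), fits() and CHECKED_MEMCPY are
      hypothetical names. The kernel's fortified functions differ in detail;
      for example, they also report provable overflows at compile time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* What __builtin_object_size(p, 0) yields when the size is not known. */
#define SIZE_UNKNOWN ((size_t)-1)

/* n bytes must fit in both the destination (write) and the source (read). */
static inline int fits(size_t dst_size, size_t src_size, size_t n)
{
	return n <= dst_size && n <= src_size;	/* unknown sizes always pass */
}

static void overflow_detected(const char *fn)
{
	fprintf(stderr, "detected buffer overflow in %s\n", fn);
	abort();
}

static inline void *checked_memcpy(void *dst, const void *src, size_t n,
				   size_t dst_size, size_t src_size)
{
	if (!fits(dst_size, src_size, n))
		overflow_detected("memcpy");
	return memcpy(dst, src, n);
}

/* Object sizes are taken at the call site, where the compiler can see them. */
#define CHECKED_MEMCPY(dst, src, n)				\
	checked_memcpy((dst), (src), (n),			\
		       __builtin_object_size((dst), 0),		\
		       __builtin_object_size((src), 0))
```

      Checking the source size as well as the destination size is what lets
      this style catch read overflows, which glibc's _chk variants do not
      target.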
      
      This detects various overflows at compile-time in various drivers and
      some non-x86 core kernel code.  There will likely be issues caught in
      regular use at runtime too.
      
      Future improvements were left out of the initial implementation for
      simplicity, as they are all optional and can be done incrementally:
      
      * Some of the fortified string functions (strncpy, strcat) don't yet
        place a limit on reads from the source based on __builtin_object_size of
        the source buffer.
      
      * Extending coverage to more string functions like strlcat.
      
      * It should be possible to optionally use __builtin_object_size(x, 1) for
        some functions (C strings) to detect intra-object overflows (like
        glibc's _FORTIFY_SOURCE=2), but for now this takes the conservative
        approach to avoid likely compatibility issues.
      
      * The compile-time checks should be made available via a separate config
        option which can be enabled by default (or always enabled) once enough
        time has passed to get the issues it catches fixed.
      
      Kees said:
       "This is great to have. While it was out-of-tree code, it would have
        blocked at least CVE-2016-3858 from being exploitable (improper size
        argument to strlcpy()). I've sent a number of fixes for
        out-of-bounds-reads that this detected upstream already"
      
      [arnd@arndb.de: x86: fix fortified memcpy]
        Link: http://lkml.kernel.org/r/20170627150047.660360-1-arnd@arndb.de
      [keescook@chromium.org: avoid panic() in favor of BUG()]
        Link: http://lkml.kernel.org/r/20170626235122.GA25261@beast
      [keescook@chromium.org: move from -mm, add ARCH_HAS_FORTIFY_SOURCE, tweak Kconfig help]
      Link: http://lkml.kernel.org/r/20170526095404.20439-1-danielmicay@gmail.com
      Link: http://lkml.kernel.org/r/1497903987-21002-8-git-send-email-keescook@chromium.org
      Signed-off-by: Daniel Micay <danielmicay@gmail.com>
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Acked-by: Kees Cook <keescook@chromium.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Daniel Axtens <dja@axtens.net>
      Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Cc: Chris Metcalf <cmetcalf@ezchip.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  5. 30 Jun, 2017: 3 commits
  6. 13 Jun, 2017: 5 commits
  7. 31 May, 2017: 1 commit
    • x86/KASLR: Use the right memcpy() implementation · 5b8b9cf7
      Committed by Arnd Bergmann
      The decompressor has its own implementation of the string functions,
      but has to include the right header to get those, while implicitly
      including linux/string.h may result in a link error:
      
        arch/x86/boot/compressed/kaslr.o: In function `choose_random_location':
        kaslr.c:(.text+0xf51): undefined reference to `_mmx_memcpy'
      
      This error appeared because KASLR started using memcpy(), via:
      
      	d52e7d5a ("x86/KASLR: Parse all 'memmap=' boot option entries")
      
      Other files in the decompressor already do the same thing.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Acked-by: Baoquan He <bhe@redhat.com>
      Cc: Dave Jiang <dave.jiang@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Garnier <thgarnie@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20170530091446.1000183-1-arnd@arndb.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  8. 24 May, 2017: 2 commits
  9. 21 May, 2017: 1 commit
  10. 08 May, 2017: 1 commit
    • x86/mm: Add support for gbpages to kernel_ident_mapping_init() · 66aad4fd
      Committed by Xunlei Pang
      Kernel identity mappings on x86-64 kernels are created in two
      ways: by the early x86 boot code, or by kernel_ident_mapping_init().
      
      Native kernels (the dominant use case) use the former, but the
      kexec and hibernation code use kernel_ident_mapping_init().
      
      There's a subtle difference between how these two paths create identity
      mappings: the current kernel_ident_mapping_init() code always creates
      them using 2MB pages (PMD level), while the native kernel boot path
      also uses gbpages where available.
      
      This difference is suboptimal both for performance and for memory
      usage: kernel_ident_mapping_init() needs to allocate pages for the
      page tables when creating the new identity mappings.
      
      This patch adds 1GB page (PUD level) support to kernel_ident_mapping_init()
      to address these concerns.
      
      The primary advantage would be better TLB coverage/performance,
      because we'd utilize 1GB TLBs instead of 2MB ones.
      
      It is also useful for machines with large amounts of memory, as it saves
      paging-structure allocations when setting up identity mappings for all
      of memory: around 4MB per TB with 2MB pages, but only 8KB per TB with
      1GB pages.
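      The 4MB/TB and 8KB/TB figures follow from each x86-64 paging-structure
      entry being 8 bytes. Mapping 1TB takes 2^19 PMD entries with 2MB pages,
      but only 2^10 PUD entries with 1GB pages. Below is a small C check of
      that arithmetic. leaf_entry_bytes() is a hypothetical helper, and it
      ignores the few upper-level table pages.

```c
#include <assert.h>
#include <stdint.h>

#define PTE_BYTES 8u	/* every x86-64 paging-structure entry is 8 bytes */

/*
 * Approximate bytes of leaf paging-structure entries needed to identity-map
 * mem_bytes with pages of page_bytes (PMD entries for 2MB pages, PUD entries
 * for 1GB pages). Upper-level tables are ignored as negligible.
 */
static uint64_t leaf_entry_bytes(uint64_t mem_bytes, uint64_t page_bytes)
{
	uint64_t entries = (mem_bytes + page_bytes - 1) / page_bytes;

	return entries * PTE_BYTES;
}
```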
      
      ( Note that this change alone does not activate gbpages in kexec,
        we are doing that in a separate patch. )
      Signed-off-by: Xunlei Pang <xlpang@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: akpm@linux-foundation.org
      Cc: kexec@lists.infradead.org
      Link: http://lkml.kernel.org/r/1493862171-8799-1-git-send-email-xlpang@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  11. 07 May, 2017: 1 commit
    • x86/boot: Declare error() as noreturn · 60854a12
      Committed by Kees Cook
      The compressed boot function error() is used to halt execution, but it
      wasn't marked with "noreturn". This fixes that in preparation for
      supporting kernel FORTIFY_SOURCE, which uses the noreturn annotation
      on panic, and calls error(). GCC would warn about a noreturn function
      calling a non-noreturn function:
      
        arch/x86/boot/compressed/misc.c: In function ‘fortify_panic’:
        arch/x86/boot/compressed/misc.c:416:1: warning: ‘noreturn’ function does return
         }
       ^
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Cc: Daniel Micay <danielmicay@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Link: http://lkml.kernel.org/r/20170506045116.GA2879@beast
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  12. 28 Apr, 2017: 1 commit
    • x86/KASLR: Fix kexec kernel boot crash when KASLR randomization fails · da63b6b2
      Committed by Baoquan He
      Dave found that a kdump kernel with KASLR enabled will reset to the BIOS
      immediately if physical randomization failed to find a new position for
      the kernel. A kernel with the 'nokaslr' option works in this case.
      
      The reason is that KASLR will install a new page table for the identity
      mapping, while it missed building it for the original kernel location
      if KASLR physical randomization fails.
      
      This only happens in the kexec/kdump kernel, because the 1st kernel has
      already built the identity mapping for all of memory for kexec/kdump by
      calling init_pgtable(). If physical randomization then fails, KASLR does
      not build an identity mapping for the kernel's original area, yet still
      switches to the new page table '_pgtable'. The kernel then triple faults
      immediately because that identity mapping is missing.
      
      A normally booted kernel won't see this bug, because it arrives here via
      startup_32(), where CR3 has already been set to _pgtable and the identity
      mapping has been built for the 0~4G area. KASLR's on-demand identity
      mapping only appends to that existing mapping instead of overwriting it,
      so the identity mapping for the kernel's original area is still there.
      
      To fix it we just switch to the new identity mapping page table when physical
      KASLR succeeds. Otherwise we keep the old page table unchanged just like
      "nokaslr" does.
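      The fix can be modelled in a few lines of C. This is a hypothetical
      model, not the kernel's code. struct boot_state, enum pgtable and
      finish_physical_kaslr() are invented for illustration, and a random_addr
      of 0 is assumed to mean that no slot was found.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: which page table CR3 currently points to. */
enum pgtable { PGT_INHERITED, PGT_KASLR };

struct boot_state {
	enum pgtable cr3;
	uint64_t output;	/* physical address the kernel will run at */
};

/*
 * random_addr == 0 models "physical randomization found no slot". In that
 * case the inherited page table (e.g. the one built for kexec) stays in
 * place, as with "nokaslr"; on success, switch to the table that maps the
 * new location.
 */
static void finish_physical_kaslr(struct boot_state *s, uint64_t random_addr)
{
	if (random_addr == 0)
		return;
	if (random_addr != s->output)
		s->output = random_addr;	/* real code also maps this range */
	s->cr3 = PGT_KASLR;
}
```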
      Signed-off-by: Baoquan He <bhe@redhat.com>
      Signed-off-by: Dave Young <dyoung@redhat.com>
      Acked-by: Kees Cook <keescook@chromium.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Dave Jiang <dave.jiang@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Garnier <thgarnie@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Link: http://lkml.kernel.org/r/1493278940-5885-1-git-send-email-bhe@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  13. 31 Mar, 2017: 2 commits
  14. 07 Feb, 2017: 2 commits
  15. 01 Feb, 2017: 2 commits
  16. 29 Jan, 2017: 2 commits
    • x86/boot/e820: Separate the E820 ABI structures from the in-kernel structures · 7410aa1c
      Committed by Ingo Molnar
      Linus pointed out that relying on the compiler to pack structures with
      enums is fragile not just for the kernel, but for external tooling as
      well which might rely on our UAPI headers.
      
      So separate the two from each other: introduce 'struct boot_e820_entry',
      which is the boot protocol entry format.
      
      This actually simplifies the code, as e820__update_table() is now never
      called directly with boot protocol table entries - we can rely on
      append_e820_table() and do a e820__update_table() call afterwards.
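      A C sketch of the split, assuming GCC/Clang's packed attribute. The
      boot-protocol entry uses only fixed-width fields, so its 20-byte layout
      does not depend on how the compiler sizes an enum. The in-kernel entry
      is free to use enum e820_type. The E820_TYPE_* values shown are the
      standard E820 type codes. append_entries() is a hypothetical
      simplification of the copy step, not the kernel's append_e820_table().

```c
#include <assert.h>
#include <stdint.h>

/* Boot-protocol (ABI) entry: fixed-width fields only, packed to 20 bytes. */
struct boot_e820_entry {
	uint64_t addr;
	uint64_t size;
	uint32_t type;
} __attribute__((packed));

/* In-kernel types may use an enum, since they never cross the ABI. */
enum e820_type {
	E820_TYPE_RAM      = 1,
	E820_TYPE_RESERVED = 2,
	E820_TYPE_ACPI     = 3,
	E820_TYPE_NVS      = 4,
	E820_TYPE_UNUSABLE = 5,
};

struct e820_entry {
	uint64_t addr;
	uint64_t size;
	enum e820_type type;
};

/* Copy boot entries into the in-kernel table; the source is left untouched. */
static unsigned int append_entries(struct e820_entry *dst, unsigned int max,
				   const struct boot_e820_entry *src,
				   unsigned int n)
{
	unsigned int i;

	for (i = 0; i < n && i < max; i++) {
		dst[i].addr = src[i].addr;
		dst[i].size = src[i].size;
		dst[i].type = (enum e820_type)src[i].type;
	}
	return i;
}
```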
      
      ( This will allow further simplifications of __e820__update_table(),
        but that will be done in a separate patch. )
      
      This change also has the side effect of not modifying the bootparams structure
      anymore - which might be useful for debugging. In theory we could even constify
      the boot_params structure - at least from the E820 code's point of view.
      
      Remove the uapi/asm/e820/types.h file, as it's not used anymore - all
      kernel side E820 types are defined in asm/e820/types.h.
      Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Alex Thorlton <athorlton@sgi.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Huang, Ying <ying.huang@intel.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Paul Jackson <pj@sgi.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rafael J. Wysocki <rjw@sisk.pl>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/boot/e820: Prefix the E820_* type names with "E820_TYPE_" · 09821ff1
      Committed by Ingo Molnar
      So there's a number of constants that start with "E820" but which
      are not types - these create a confusing mixture when seen together
      with 'enum e820_type' values:
      
      	E820MAP
      	E820NR
      	E820_X_MAX
      	E820MAX
      
      To better differentiate the 'enum e820_type' values prefix them
      with E820_TYPE_.
      
      No change in functionality.
      
      Cc: Alex Thorlton <athorlton@sgi.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Huang, Ying <ying.huang@intel.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul Jackson <pj@sgi.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rafael J. Wysocki <rjw@sisk.pl>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  17. 28 Jan, 2017: 4 commits
    • x86/boot/e820: Rename everything to e820_table · 61a50101
      Committed by Ingo Molnar
      No change in functionality.
      
      Cc: Alex Thorlton <athorlton@sgi.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Huang, Ying <ying.huang@intel.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul Jackson <pj@sgi.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rafael J. Wysocki <rjw@sisk.pl>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/boot/e820: Rename 'e820_map' variables to 'e820_array' · acd4c048
      Committed by Ingo Molnar
      In line with the rename to 'struct e820_array', harmonize the naming of common e820
      table variable names as well:
      
       e820          =>  e820_array
       e820_saved    =>  e820_array_saved
       e820_map      =>  e820_array
       initial_e820  =>  e820_array_init
      
      This makes the variable names more consistent and easier to grep for.
      
      No change in functionality.
      
      Cc: Alex Thorlton <athorlton@sgi.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Huang, Ying <ying.huang@intel.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul Jackson <pj@sgi.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rafael J. Wysocki <rjw@sisk.pl>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/boot/e820: Rename the basic e820 data types to 'struct e820_entry' and 'struct e820_array' · 8ec67d97
      Committed by Ingo Molnar
      The 'e820entry' and 'e820map' names have various annoyances:
      
       - the missing underscore departs from the usual kernel style
         and makes the code look weird,
      
       - in the past I kept confusing the 'map' with the 'entry', because
         a 'map' is ambiguous in that regard,
      
       - it's not really clear from the 'e820map' that this is a regular
         C array.
      
      Rename them to 'struct e820_entry' and 'struct e820_array' accordingly.
      
      ( Leave the legacy UAPI header alone but do the rename in the bootparam.h
        and e820/types.h file - outside tools relying on these defines should
        either adjust their code, or should use the legacy header, or should
        create their private copies for the definitions. )
      
      No change in functionality.
      
      Cc: Alex Thorlton <athorlton@sgi.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Huang, Ying <ying.huang@intel.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul Jackson <pj@sgi.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rafael J. Wysocki <rjw@sisk.pl>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/boot/e820: Remove spurious asm/e820/api.h inclusions · 5520b7e7
      Committed by Ingo Molnar
      A commonly used lowlevel x86 header, asm/pgtable.h, includes asm/e820/api.h
      spuriously, without making direct use of it.
      
      Removing it is not simple: over the years various .c code learned to rely
      on this indirect inclusion.
      
      Remove the unnecessary include - this should speed up the kernel build a bit,
      as a large header is not included anymore in totally unrelated code.
      
      Cc: Alex Thorlton <athorlton@sgi.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Huang, Ying <ying.huang@intel.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul Jackson <pj@sgi.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rafael J. Wysocki <rjw@sisk.pl>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  18. 25 Jan, 2017: 1 commit
    • x86/boot: Fix KASLR and memmap= collision · f2844249
      Committed by Dave Jiang
      CONFIG_RANDOMIZE_BASE=y relocates the kernel to a random base address.
      
      However it does not take into account the memmap= parameter passed in from
      the kernel command line. This results in the kernel sometimes being put in
      the middle of memmap.
      
      Teach KASLR not to place the kernel in memmap-defined regions. We support
      up to 4 memmap regions: any additional regions cause KASLR to be disabled.

      The mem_avoid set has been augmented with up to 4 unusable memmap regions
      provided by the user, excluding those regions from the set of valid
      address ranges into which the uncompressed kernel image can be placed.

      nn@ss ranges are not added to the mem_avoid set, since they indicate
      usable memory.
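      A simplified C sketch of how memmap= values might be turned into
      mem_avoid entries, assuming the nn[KMG]@ss, nn[KMG]#ss, nn[KMG]$ss and
      nn[KMG]!ss forms. parse_size(), add_memmap_avoid() and overlaps() are
      hypothetical helpers; the kernel uses its own memparse()-based parser.
      Other memmap= forms are simply ignored here.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_MEMMAP_REGIONS 4

struct region { uint64_t start, size; };

/* Parse "nn[KMG]" (simplified: no T/P/E suffixes). */
static uint64_t parse_size(const char *s, char **end)
{
	uint64_t v = strtoull(s, end, 0);

	switch (**end) {
	case 'G': case 'g':
		v <<= 10;
		/* fall through */
	case 'M': case 'm':
		v <<= 10;
		/* fall through */
	case 'K': case 'k':
		v <<= 10;
		(*end)++;
		break;
	default:
		break;
	}
	return v;
}

/* 1: region recorded; 0: nothing to avoid; -1: table full (disable KASLR). */
static int add_memmap_avoid(const char *arg, struct region *avoid, int *count)
{
	char *p;
	uint64_t size = parse_size(arg, &p);
	char op = *p;
	uint64_t start;

	if (op != '@' && op != '#' && op != '$' && op != '!')
		return 0;		/* other forms are ignored in this sketch */
	if (op == '@')
		return 0;		/* nn@ss marks usable RAM: nothing to avoid */
	start = parse_size(p + 1, &p);
	if (*count >= MAX_MEMMAP_REGIONS)
		return -1;
	avoid[*count].start = start;
	avoid[*count].size = size;
	(*count)++;
	return 1;
}

/* Does [start, start + size) intersect the avoided region? */
static int overlaps(const struct region *r, uint64_t start, uint64_t size)
{
	return start < r->start + r->size && r->start < start + size;
}
```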
      Signed-off-by: Dave Jiang <dave.jiang@intel.com>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Kees Cook <keescook@chromium.org>
      Acked-by: Baoquan He <bhe@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: dan.j.williams@intel.com
      Cc: david@fromorbit.com
      Cc: linux-nvdimm@lists.01.org
      Link: http://lkml.kernel.org/r/148417664156.131935.2248592164852799738.stgit@djiang5-desk3.ch.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  19. 14 Dec, 2016: 1 commit
    • Remove references to dead make variable LINUX_INCLUDE · 846221cf
      Committed by Paul Bolle
      Commit 4fd06960 ("Use the new x86 setup code for i386") introduced a
      reference to the make variable LINUX_INCLUDE. That reference got moved
      around a bit and copied twice and now there are three references to it.
      
      There has never been a definition of that variable. (Presumably that is
      because it started out as a mistyped reference to LINUXINCLUDE.) So this
      reference has always been an empty string. Let's remove it before it
      spreads any further.
      Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
  20. 21 Nov, 2016: 1 commit
    • x86/build: Build compressed x86 kernels as PIE when !CONFIG_RELOCATABLE as well · a980ce35
      Committed by H.J. Lu
      Since the bootloader may load the compressed x86 kernel at any address,
      it should always be built as PIE, not just when CONFIG_RELOCATABLE=y.
      
      Otherwise, the linker in binutils 2.27 will optimize GOT loads into
      absolute addresses when building the compressed x86 kernel as a non-PIE
      executable.
      Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      [ Small wording changes. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  21. 13 Nov, 2016: 1 commit
    • x86/efi: Retrieve and assign Apple device properties · 58c5475a
      Committed by Lukas Wunner
      Apple's EFI drivers supply device properties which are needed to support
      Macs optimally. They contain vital information which cannot be obtained
      any other way (e.g. Thunderbolt Device ROM). They're also used to convey
      the current device state so that OS drivers can pick up where EFI
      drivers left off (e.g. GPU mode setting).
      
      There's an EFI driver dubbed "AAPL,PathProperties" which implements a
      per-device key/value store. Other EFI drivers populate it using a custom
      protocol. The macOS bootloader /System/Library/CoreServices/boot.efi
      retrieves the properties with the same protocol. The kernel extension
      AppleACPIPlatform.kext subsequently merges them into the I/O Kit
      registry (see ioreg(8)) where they can be queried by other kernel
      extensions and user space.
      
      This commit extends the efistub to retrieve the device properties before
      ExitBootServices is called. It assigns them to devices in an fs_initcall
      so that they can be queried with the API in <linux/property.h>.
      
      Note that the device properties will only be available if the kernel is
      booted with the efistub. Distros should adjust their installers to
      always use the efistub on Macs. grub with the "linux" directive will not
      work unless the functionality of this commit is duplicated in grub.
      (The "linuxefi" directive should work but is not included upstream as of
      this writing.)
      
      The custom protocol has GUID 91BD12FE-F6C3-44FB-A5B7-5122AB303AE0 and
      looks like this:
      
      typedef struct {
      	unsigned long version; /* 0x10000 */
      	efi_status_t (*get) (
      		IN	struct apple_properties_protocol *this,
      		IN	struct efi_dev_path *device,
      		IN	efi_char16_t *property_name,
      		OUT	void *buffer,
      		IN OUT	u32 *buffer_len);
      		/* EFI_SUCCESS, EFI_NOT_FOUND, EFI_BUFFER_TOO_SMALL */
      	efi_status_t (*set) (
      		IN	struct apple_properties_protocol *this,
      		IN	struct efi_dev_path *device,
      		IN	efi_char16_t *property_name,
      		IN	void *property_value,
      		IN	u32 property_value_len);
      		/* allocates copies of property name and value */
      		/* EFI_SUCCESS, EFI_OUT_OF_RESOURCES */
      	efi_status_t (*del) (
      		IN	struct apple_properties_protocol *this,
      		IN	struct efi_dev_path *device,
      		IN	efi_char16_t *property_name);
      		/* EFI_SUCCESS, EFI_NOT_FOUND */
      	efi_status_t (*get_all) (
      		IN	struct apple_properties_protocol *this,
      		OUT	void *buffer,
      		IN OUT	u32 *buffer_len);
      		/* EFI_SUCCESS, EFI_BUFFER_TOO_SMALL */
      } apple_properties_protocol;
      
      Thanks to Pedro Vilaça for this blog post which was helpful in reverse
      engineering Apple's EFI drivers and bootloader:
      https://reverse.put.as/2016/06/25/apple-efi-firmware-passwords-and-the-scbo-myth/
      
      If someone at Apple is reading this, please note there's a memory leak
      in your implementation of the del() function as the property struct is
      freed but the name and value allocations are not.
      
      Neither the macOS bootloader nor Apple's EFI drivers check the protocol
      version, but we do to avoid breakage if it's ever changed. It's been the
      same since at least OS X 10.6 (2009).
      
      The get_all() function conveniently fills a buffer with all properties
      in marshalled form which can be passed to the kernel as a setup_data
      payload. The number of device properties is dynamic and can change
      between a first invocation of get_all() (to determine the buffer size)
      and a second invocation (to retrieve the actual buffer), hence the
      peculiar loop which does not finish until the buffer size settles.
      The macOS bootloader does the same.
      
      The setup_data payload is later on unmarshalled in an fs_initcall. The
      idea is that most buses instantiate devices in "subsys" initcall level
      and drivers are usually bound to these devices in "device" initcall
      level, so we assign the properties in-between, i.e. in "fs" initcall
      level.
      
      This assumes that devices to which properties pertain are instantiated
      from a "subsys" initcall or earlier. That should always be the case
      since on macOS, AppleACPIPlatformExpert::matchEFIDevicePath() only
      supports ACPI and PCI nodes and we've fully scanned those buses during
      "subsys" initcall level.
      
      The second assumption is that properties are only needed from a "device"
      initcall or later. Seems reasonable to me, but should this ever not work
      out, an alternative approach would be to store the property sets e.g. in
      a btree early during boot. Then whenever device_add() is called, an EFI
      Device Path would have to be constructed for the newly added device,
      and looked up in the btree. That way, the property set could be assigned
      to the device immediately on instantiation. And this would also work for
      devices instantiated in a deferred fashion. It seems like this approach
      would be more complicated and require more code. That doesn't seem
      justified without a specific use case.
      
      For comparison, the strategy on macOS is to assign properties to objects
      in the ACPI namespace (AppleACPIPlatformExpert::mergeEFIProperties()).
      That approach is definitely wrong as it fails for devices not present in
      the namespace: The NHI EFI driver supplies properties for attached
      Thunderbolt devices, yet on Macs with Thunderbolt 1 only one device
      level behind the host controller is described in the namespace.
      Consequently macOS cannot assign properties for chained devices. With
      Thunderbolt 2 they started to describe three device levels behind host
      controllers in the namespace but this grossly inflates the SSDT and
      still fails if the user daisy-chained more than three devices.
      
      We copy the property names and values from the setup_data payload to
      swappable virtual memory and afterwards make the payload available to
      the page allocator. This is just for the sake of good housekeeping; it
      wouldn't occupy a meaningful amount of physical memory (4444 bytes on my
      machine). Only the payload is freed, not the setup_data header since
      otherwise we'd break the list linkage and we cannot safely update the
      predecessor's ->next link because there's no locking for the list.
      
      The payload is currently not passed on to kexec'ed kernels, same for PCI
      ROMs retrieved by setup_efi_pci(). This can be added later if there is
      demand by amending setup_efi_state(). The payload can then no longer be
      made available to the page allocator of course.
      
      Tested-by: Lukas Wunner <lukas@wunner.de> [MacBookPro9,1]
      Tested-by: Pierre Moreau <pierre.morrow@free.fr> [MacBookPro11,3]
      Signed-off-by: Lukas Wunner <lukas@wunner.de>
      Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: Andreas Noever <andreas.noever@gmail.com>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Pedro Vilaça <reverser@put.as>
      Cc: Peter Jones <pjones@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: grub-devel@gnu.org
      Cc: linux-efi@vger.kernel.org
      Link: http://lkml.kernel.org/r/20161112213237.8804-9-matt@codeblueprint.co.uk
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      58c5475a
  22. 07 Nov, 2016 1 commit
  23. 09 Sep, 2016 1 commit
    • L
      x86/efi: Allow invocation of arbitrary boot services · 0a637ee6
      Lukas Wunner committed
      We currently allow invocation of 8 boot services with efi_call_early().
      Not included are LocateHandleBuffer and LocateProtocol in particular.
      For graphics output or to retrieve PCI ROMs and Apple device properties,
      we're thus forced to use the LocateHandle + AllocatePool + LocateHandle
      combo, which is cumbersome and needs more code.
      
      The ARM folks allow invocation of the full set of boot services but are
      restricted to our 8 boot services in functions shared across arches.
      
      Thus, rather than adding just LocateHandleBuffer and LocateProtocol to
      struct efi_config, let's rework efi_call_early() to allow invocation of
      arbitrary boot services by selecting the 64 bit vs 32 bit code path in
      the macro itself.
      
      When compiling for 32 bit or for 64 bit without mixed mode, the unused
      code path is optimized away and the binary code is the same as before.
      But on 64 bit with mixed mode enabled, this commit adds one compare
      instruction to each invocation of a boot service and, depending on the
      code path selected, two jump instructions. (Most of the time gcc
      arranges the jumps in the 32 bit code path.) The result is a minuscule
      performance penalty and the binary code becomes slightly larger and more
      difficult to read when disassembled. This isn't a hot path, so these
      drawbacks are arguably outweighed by the attainable simplification of
      the C code. We have some overhead anyway for thunking or conversion
      between calling conventions.
      
      The 8 boot services can consequently be removed from struct efi_config.
      
      No functional change intended (for now).
      
      Example -- invocation of free_pool before (64 bit code path):
      0x2d4      movq  %ds:efi_early, %rdx          ; efi_early
      0x2db      movq  %ss:arg_0-0x20(%rsp), %rsi
      0x2e0      xorl  %eax, %eax
      0x2e2      movq  %ds:0x28(%rdx), %rdi         ; efi_early->free_pool
      0x2e6      callq *%ds:0x58(%rdx)              ; efi_early->call()
      
      Example -- invocation of free_pool after (64 / 32 bit mixed code path):
      0x0dc      movq  %ds:efi_early, %rax          ; efi_early
      0x0e3      cmpb  $0, %ds:0x28(%rax)           ; !efi_early->is64 ?
      0x0e7      movq  %ds:0x20(%rax), %rdx         ; efi_early->call()
      0x0eb      movq  %ds:0x10(%rax), %rax         ; efi_early->boot_services
      0x0ef      je    $0x150
      0x0f1      movq  %ds:0x48(%rax), %rdi         ; free_pool (64 bit)
      0x0f5      xorl  %eax, %eax
      0x0f7      callq *%rdx
      ...
      0x150      movl  %ds:0x30(%rax), %edi         ; free_pool (32 bit)
      0x153      jmp   $0x0f5
      
      Size of eboot.o text section (bytes):
      CONFIG_X86_32:                         6464 before, 6318 after
      CONFIG_X86_64 && !CONFIG_EFI_MIXED:    7670 before, 7573 after
      CONFIG_X86_64 &&  CONFIG_EFI_MIXED:    7670 before, 8319 after
      Signed-off-by: Lukas Wunner <lukas@wunner.de>
      Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>