1. 29 1月, 2017 2 次提交
    • I
      x86/boot/e820: Separate the E820 ABI structures from the in-kernel structures · 7410aa1c
      Ingo Molnar 提交于
      Linus pointed out that relying on the compiler to pack structures with
      enums is fragile not just for the kernel, but for external tooling as
      well which might rely on our UAPI headers.
      
      So separate the two from each other: introduce 'struct boot_e820_entry',
      which is the boot protocol entry format.
      
      This actually simplifies the code, as e820__update_table() is now never
      called directly with boot protocol table entries - we can rely on
      append_e820_table() and do a e820__update_table() call afterwards.
      
      ( This will allow further simplifications of __e820__update_table(),
        but that will be done in a separate patch. )
      
      This change also has the side effect of not modifying the bootparams structure
      anymore - which might be useful for debugging. In theory we could even constify
      the boot_params structure - at least from the E820 code's point of view.
      
      Remove the uapi/asm/e820/types.h file, as it's not used anymore - all
      kernel side E820 types are defined in asm/e820/types.h.
      Reported-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Cc: Alex Thorlton <athorlton@sgi.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Huang, Ying <ying.huang@intel.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Paul Jackson <pj@sgi.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rafael J. Wysocki <rjw@sisk.pl>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      7410aa1c
    • I
      x86/boot/e820: Prefix the E820_* type names with "E820_TYPE_" · 09821ff1
      Ingo Molnar 提交于
      So there's a number of constants that start with "E820" but which
      are not types - these create a confusing mixture when seen together
      with 'enum e820_type' values:
      
      	E820MAP
      	E820NR
      	E820_X_MAX
      	E820MAX
      
      To better differentiate the 'enum e820_type' values prefix them
      with E820_TYPE_.
      
      No change in functionality.
      
      Cc: Alex Thorlton <athorlton@sgi.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Huang, Ying <ying.huang@intel.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul Jackson <pj@sgi.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rafael J. Wysocki <rjw@sisk.pl>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      09821ff1
  2. 28 1月, 2017 6 次提交
    • I
      x86/boot/e820: Rename everything to e820_table · 61a50101
      Ingo Molnar 提交于
      No change in functionality.
      
      Cc: Alex Thorlton <athorlton@sgi.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Huang, Ying <ying.huang@intel.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul Jackson <pj@sgi.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rafael J. Wysocki <rjw@sisk.pl>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      61a50101
    • I
      x86/boot/e820: Rename 'e820_map' variables to 'e820_array' · acd4c048
      Ingo Molnar 提交于
      In line with the rename to 'struct e820_array', harmonize the naming of common e820
      table variable names as well:
      
       e820          =>  e820_array
       e820_saved    =>  e820_array_saved
       e820_map      =>  e820_array
       initial_e820  =>  e820_array_init
      
      This makes the variable names more consistent  and easier to grep for.
      
      No change in functionality.
      
      Cc: Alex Thorlton <athorlton@sgi.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Huang, Ying <ying.huang@intel.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul Jackson <pj@sgi.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rafael J. Wysocki <rjw@sisk.pl>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      acd4c048
    • I
      x86/boot/e820: Rename the basic e820 data types to 'struct e820_entry' and 'struct e820_array' · 8ec67d97
      Ingo Molnar 提交于
      The 'e820entry' and 'e820map' names have various annoyances:
      
       - the missing underscore departs from the usual kernel style
         and makes the code look weird,
      
       - in the past I kept confusing the 'map' with the 'entry', because
         a 'map' is ambiguous in that regard,
      
       - it's not really clear from the 'e820map' that this is a regular
         C array.
      
      Rename them to 'struct e820_entry' and 'struct e820_array' accordingly.
      
      ( Leave the legacy UAPI header alone but do the rename in the bootparam.h
        and e820/types.h file - outside tools relying on these defines should
        either adjust their code, or should use the legacy header, or should
        create their private copies for the definitions. )
      
      No change in functionality.
      
      Cc: Alex Thorlton <athorlton@sgi.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Huang, Ying <ying.huang@intel.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul Jackson <pj@sgi.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rafael J. Wysocki <rjw@sisk.pl>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      8ec67d97
    • I
      x86/boot/e820: Remove assembly guard from asm/e820/types.h · b0bd00d6
      Ingo Molnar 提交于
      There's an assembly guard in asm/e820/types.h, and only
      a single .S file includes this header: arch/x86/boot/header.S,
      but it does not actually make use of any of the E820 defines.
      
      Remove the inclusion and remove the assembly guard as well.
      
      No change in functionality.
      
      Cc: Alex Thorlton <athorlton@sgi.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Huang, Ying <ying.huang@intel.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul Jackson <pj@sgi.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rafael J. Wysocki <rjw@sisk.pl>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      b0bd00d6
    • I
      x86/boot/e820: Remove spurious asm/e820/api.h inclusions · 5520b7e7
      Ingo Molnar 提交于
      A commonly used lowlevel x86 header, asm/pgtable.h, includes asm/e820/api.h
      spuriously, without making direct use of it.
      
      Removing it is not simple: over the years various .c code learned to rely
      on this indirect inclusion.
      
      Remove the unnecessary include - this should speed up the kernel build a bit,
      as a large header is not included anymore in totally unrelated code.
      
      Cc: Alex Thorlton <athorlton@sgi.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Huang, Ying <ying.huang@intel.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul Jackson <pj@sgi.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rafael J. Wysocki <rjw@sisk.pl>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      5520b7e7
    • I
      x86/boot/e820: Move asm/e820.h to asm/e820/api.h · 66441bd3
      Ingo Molnar 提交于
      In line with asm/e820/types.h, move the e820 API declarations to
      asm/e820/api.h and update all usage sites.
      
      This is just a mechanical, obviously correct move & replace patch,
      there will be subsequent changes to clean up the code and to make
      better use of the new header organization.
      
      Cc: Alex Thorlton <athorlton@sgi.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Huang, Ying <ying.huang@intel.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul Jackson <pj@sgi.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rafael J. Wysocki <rjw@sisk.pl>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      66441bd3
  3. 25 1月, 2017 1 次提交
    • D
      x86/boot: Fix KASLR and memmap= collision · f2844249
      Dave Jiang 提交于
      CONFIG_RANDOMIZE_BASE=y relocates the kernel to a random base address.
      
      However it does not take into account the memmap= parameter passed in from
      the kernel command line. This results in the kernel sometimes being put in
      the middle of memmap.
      
      Teach KASLR to not insert the kernel in memmap defined regions. We support
      up to 4 memmap regions: any additional regions will cause KASLR to disable.
      
      The mem_avoid set has been augmented to add up to 4 unusable regions of
      memmaps provided by the user to exclude those regions from the set of valid
      address range to insert the uncompressed kernel image.
      
      The nn@ss ranges will be skipped by the mem_avoid set since it indicates
      that memory is useable.
      Signed-off-by: NDave Jiang <dave.jiang@intel.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Acked-by: NKees Cook <keescook@chromium.org>
      Acked-by: NBaoquan He <bhe@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: dan.j.williams@intel.com
      Cc: david@fromorbit.com
      Cc: linux-nvdimm@lists.01.org
      Link: http://lkml.kernel.org/r/148417664156.131935.2248592164852799738.stgit@djiang5-desk3.ch.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      f2844249
  4. 09 1月, 2017 1 次提交
  5. 19 12月, 2016 1 次提交
  6. 14 12月, 2016 1 次提交
    • P
      Remove references to dead make variable LINUX_INCLUDE · 846221cf
      Paul Bolle 提交于
      Commit 4fd06960 ("Use the new x86 setup code for i386") introduced a
      reference to the make variable LINUX_INCLUDE. That reference got moved
      around a bit and copied twice and now there are three references to it.
      
      There has never been a definition of that variable. (Presumably that is
      because it started out as a mistyped reference to LINUXINCLUDE.) So this
      reference has always been an empty string. Let's remove it before it
      spreads any further.
      Signed-off-by: NPaul Bolle <pebolle@tiscali.nl>
      Signed-off-by: NJiri Kosina <jkosina@suse.cz>
      846221cf
  7. 21 11月, 2016 2 次提交
    • H
      x86/build: Build compressed x86 kernels as PIE when !CONFIG_RELOCATABLE as well · a980ce35
      H.J. Lu 提交于
      Since the bootloader may load the compressed x86 kernel at any address,
      it should always be built as PIE, not just when CONFIG_RELOCATABLE=y.
      
      Otherwise, linker in binutils 2.27 will optimize GOT load into the
      absolute address when building the compressed x86 kernel as a non-PIE
      executable.
      Signed-off-by: NH.J. Lu <hjl.tools@gmail.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      [ Small wording changes. ]
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      a980ce35
    • A
      x86/boot: Fail the boot if !M486 and CPUID is missing · ed68d7e9
      Andy Lutomirski 提交于
      Linux will have all kinds of sporadic problems on systems that don't
      have the CPUID instruction unless CONFIG_M486=y.  In particular,
      sync_core() will explode.
      
      I believe that these kernels had a better chance of working before
      commit 05fb3c19 ("x86/boot: Initialize FPU and X86_FEATURE_ALWAYS
      even if we don't have CPUID").  That commit inadvertently fixed a
      serious bug: we used to fail to detect the FPU if CPUID wasn't
      present.  Because we also used to forget to set X86_FEATURE_ALWAYS, we
      end up with no cpu feature bits set at all.  This meant that
      alternative patching didn't do anything and, if paravirt was disabled,
      we could plausibly finish the entire boot process without calling
      sync_core().
      
      Rather than trying to work around these issues, just have the kernel
      fail loudly if it's running on a CPUID-less 486, doesn't have CPUID,
      and doesn't have CONFIG_M486 set.
      Reported-by: NMatthew Whitehead <tedheadster@gmail.com>
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/70eac6639f23df8be5fe03fa1984aedd5d40077a.1479598603.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      ed68d7e9
  8. 13 11月, 2016 1 次提交
    • L
      x86/efi: Retrieve and assign Apple device properties · 58c5475a
      Lukas Wunner 提交于
      Apple's EFI drivers supply device properties which are needed to support
      Macs optimally. They contain vital information which cannot be obtained
      any other way (e.g. Thunderbolt Device ROM). They're also used to convey
      the current device state so that OS drivers can pick up where EFI
      drivers left (e.g. GPU mode setting).
      
      There's an EFI driver dubbed "AAPL,PathProperties" which implements a
      per-device key/value store. Other EFI drivers populate it using a custom
      protocol. The macOS bootloader /System/Library/CoreServices/boot.efi
      retrieves the properties with the same protocol. The kernel extension
      AppleACPIPlatform.kext subsequently merges them into the I/O Kit
      registry (see ioreg(8)) where they can be queried by other kernel
      extensions and user space.
      
      This commit extends the efistub to retrieve the device properties before
      ExitBootServices is called. It assigns them to devices in an fs_initcall
      so that they can be queried with the API in <linux/property.h>.
      
      Note that the device properties will only be available if the kernel is
      booted with the efistub. Distros should adjust their installers to
      always use the efistub on Macs. grub with the "linux" directive will not
      work unless the functionality of this commit is duplicated in grub.
      (The "linuxefi" directive should work but is not included upstream as of
      this writing.)
      
      The custom protocol has GUID 91BD12FE-F6C3-44FB-A5B7-5122AB303AE0 and
      looks like this:
      
      typedef struct {
      	unsigned long version; /* 0x10000 */
      	efi_status_t (*get) (
      		IN	struct apple_properties_protocol *this,
      		IN	struct efi_dev_path *device,
      		IN	efi_char16_t *property_name,
      		OUT	void *buffer,
      		IN OUT	u32 *buffer_len);
      		/* EFI_SUCCESS, EFI_NOT_FOUND, EFI_BUFFER_TOO_SMALL */
      	efi_status_t (*set) (
      		IN	struct apple_properties_protocol *this,
      		IN	struct efi_dev_path *device,
      		IN	efi_char16_t *property_name,
      		IN	void *property_value,
      		IN	u32 property_value_len);
      		/* allocates copies of property name and value */
      		/* EFI_SUCCESS, EFI_OUT_OF_RESOURCES */
      	efi_status_t (*del) (
      		IN	struct apple_properties_protocol *this,
      		IN	struct efi_dev_path *device,
      		IN	efi_char16_t *property_name);
      		/* EFI_SUCCESS, EFI_NOT_FOUND */
      	efi_status_t (*get_all) (
      		IN	struct apple_properties_protocol *this,
      		OUT	void *buffer,
      		IN OUT	u32 *buffer_len);
      		/* EFI_SUCCESS, EFI_BUFFER_TOO_SMALL */
      } apple_properties_protocol;
      
      Thanks to Pedro Vilaça for this blog post which was helpful in reverse
      engineering Apple's EFI drivers and bootloader:
      https://reverse.put.as/2016/06/25/apple-efi-firmware-passwords-and-the-scbo-myth/
      
      If someone at Apple is reading this, please note there's a memory leak
      in your implementation of the del() function as the property struct is
      freed but the name and value allocations are not.
      
      Neither the macOS bootloader nor Apple's EFI drivers check the protocol
      version, but we do to avoid breakage if it's ever changed. It's been the
      same since at least OS X 10.6 (2009).
      
      The get_all() function conveniently fills a buffer with all properties
      in marshalled form which can be passed to the kernel as a setup_data
      payload. The number of device properties is dynamic and can change
      between a first invocation of get_all() (to determine the buffer size)
      and a second invocation (to retrieve the actual buffer), hence the
      peculiar loop which does not finish until the buffer size settles.
      The macOS bootloader does the same.
      
      The setup_data payload is later on unmarshalled in an fs_initcall. The
      idea is that most buses instantiate devices in "subsys" initcall level
      and drivers are usually bound to these devices in "device" initcall
      level, so we assign the properties in-between, i.e. in "fs" initcall
      level.
      
      This assumes that devices to which properties pertain are instantiated
      from a "subsys" initcall or earlier. That should always be the case
      since on macOS, AppleACPIPlatformExpert::matchEFIDevicePath() only
      supports ACPI and PCI nodes and we've fully scanned those buses during
      "subsys" initcall level.
      
      The second assumption is that properties are only needed from a "device"
      initcall or later. Seems reasonable to me, but should this ever not work
      out, an alternative approach would be to store the property sets e.g. in
      a btree early during boot. Then whenever device_add() is called, an EFI
      Device Path would have to be constructed for the newly added device,
      and looked up in the btree. That way, the property set could be assigned
      to the device immediately on instantiation. And this would also work for
      devices instantiated in a deferred fashion. It seems like this approach
      would be more complicated and require more code. That doesn't seem
      justified without a specific use case.
      
      For comparison, the strategy on macOS is to assign properties to objects
      in the ACPI namespace (AppleACPIPlatformExpert::mergeEFIProperties()).
      That approach is definitely wrong as it fails for devices not present in
      the namespace: The NHI EFI driver supplies properties for attached
      Thunderbolt devices, yet on Macs with Thunderbolt 1 only one device
      level behind the host controller is described in the namespace.
      Consequently macOS cannot assign properties for chained devices. With
      Thunderbolt 2 they started to describe three device levels behind host
      controllers in the namespace but this grossly inflates the SSDT and
      still fails if the user daisy-chained more than three devices.
      
      We copy the property names and values from the setup_data payload to
      swappable virtual memory and afterwards make the payload available to
      the page allocator. This is just for the sake of good housekeeping, it
      wouldn't occupy a meaningful amount of physical memory (4444 bytes on my
      machine). Only the payload is freed, not the setup_data header since
      otherwise we'd break the list linkage and we cannot safely update the
      predecessor's ->next link because there's no locking for the list.
      
      The payload is currently not passed on to kexec'ed kernels, same for PCI
      ROMs retrieved by setup_efi_pci(). This can be added later if there is
      demand by amending setup_efi_state(). The payload can then no longer be
      made available to the page allocator of course.
      
      Tested-by: Lukas Wunner <lukas@wunner.de> [MacBookPro9,1]
      Tested-by: Pierre Moreau <pierre.morrow@free.fr> [MacBookPro11,3]
      Signed-off-by: NLukas Wunner <lukas@wunner.de>
      Signed-off-by: NMatt Fleming <matt@codeblueprint.co.uk>
      Cc: Andreas Noever <andreas.noever@gmail.com>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Pedro Vilaça <reverser@put.as>
      Cc: Peter Jones <pjones@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: grub-devel@gnu.org
      Cc: linux-efi@vger.kernel.org
      Link: http://lkml.kernel.org/r/20161112213237.8804-9-matt@codeblueprint.co.ukSigned-off-by: NIngo Molnar <mingo@kernel.org>
      58c5475a
  9. 07 11月, 2016 2 次提交
  10. 09 9月, 2016 3 次提交
    • L
      x86/efi: Allow invocation of arbitrary boot services · 0a637ee6
      Lukas Wunner 提交于
      We currently allow invocation of 8 boot services with efi_call_early().
      Not included are LocateHandleBuffer and LocateProtocol in particular.
      For graphics output or to retrieve PCI ROMs and Apple device properties,
      we're thus forced to use the LocateHandle + AllocatePool + LocateHandle
      combo, which is cumbersome and needs more code.
      
      The ARM folks allow invocation of the full set of boot services but are
      restricted to our 8 boot services in functions shared across arches.
      
      Thus, rather than adding just LocateHandleBuffer and LocateProtocol to
      struct efi_config, let's rework efi_call_early() to allow invocation of
      arbitrary boot services by selecting the 64 bit vs 32 bit code path in
      the macro itself.
      
      When compiling for 32 bit or for 64 bit without mixed mode, the unused
      code path is optimized away and the binary code is the same as before.
      But on 64 bit with mixed mode enabled, this commit adds one compare
      instruction to each invocation of a boot service and, depending on the
      code path selected, two jump instructions. (Most of the time gcc
      arranges the jumps in the 32 bit code path.) The result is a minuscule
      performance penalty and the binary code becomes slightly larger and more
      difficult to read when disassembled. This isn't a hot path, so these
      drawbacks are arguably outweighed by the attainable simplification of
      the C code. We have some overhead anyway for thunking or conversion
      between calling conventions.
      
      The 8 boot services can consequently be removed from struct efi_config.
      
      No functional change intended (for now).
      
      Example -- invocation of free_pool before (64 bit code path):
      0x2d4      movq  %ds:efi_early, %rdx          ; efi_early
      0x2db      movq  %ss:arg_0-0x20(%rsp), %rsi
      0x2e0      xorl  %eax, %eax
      0x2e2      movq  %ds:0x28(%rdx), %rdi         ; efi_early->free_pool
      0x2e6      callq *%ds:0x58(%rdx)              ; efi_early->call()
      
      Example -- invocation of free_pool after (64 / 32 bit mixed code path):
      0x0dc      movq  %ds:efi_early, %rax          ; efi_early
      0x0e3      cmpb  $0, %ds:0x28(%rax)           ; !efi_early->is64 ?
      0x0e7      movq  %ds:0x20(%rax), %rdx         ; efi_early->call()
      0x0eb      movq  %ds:0x10(%rax), %rax         ; efi_early->boot_services
      0x0ef      je    $0x150
      0x0f1      movq  %ds:0x48(%rax), %rdi         ; free_pool (64 bit)
      0x0f5      xorl  %eax, %eax
      0x0f7      callq *%rdx
      ...
      0x150      movl  %ds:0x30(%rax), %edi         ; free_pool (32 bit)
      0x153      jmp   $0x0f5
      
      Size of eboot.o text section:
      CONFIG_X86_32:                         6464 before, 6318 after
      CONFIG_X86_64 && !CONFIG_EFI_MIXED:    7670 before, 7573 after
      CONFIG_X86_64 &&  CONFIG_EFI_MIXED:    7670 before, 8319 after
      Signed-off-by: NLukas Wunner <lukas@wunner.de>
      Signed-off-by: NMatt Fleming <matt@codeblueprint.co.uk>
      0a637ee6
    • L
      x86/efi: Remove unused find_bits() function · 15cf7cae
      Lukas Wunner 提交于
      Left behind by commit fc372064 ("efi/libstub: Move Graphics Output
      Protocol handling to generic code").
      Signed-off-by: NLukas Wunner <lukas@wunner.de>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: NMatt Fleming <matt@codeblueprint.co.uk>
      15cf7cae
    • C
      x86/efi: Initialize status to ensure garbage is not returned on small size · ac0e94b6
      Colin Ian King 提交于
      Although very unlikey, if size is too small or zero, then we end up with
      status not being set and returning garbage. Instead, initializing status to
      EFI_INVALID_PARAMETER to indicate that size is invalid in the calls to
      setup_uga32 and setup_uga64.
      Signed-off-by: NColin Ian King <colin.king@canonical.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: NMatt Fleming <matt@codeblueprint.co.uk>
      ac0e94b6
  11. 05 9月, 2016 2 次提交
    • J
      x86/efi: Use efi_exit_boot_services() · d6493401
      Jeffrey Hugo 提交于
      The eboot code directly calls ExitBootServices.  This is inadvisable as the
      UEFI spec details a complex set of errors, race conditions, and API
      interactions that the caller of ExitBootServices must get correct.  The
      eboot code attempts allocations after calling ExitBootSerives which is
      not permitted per the spec.  Call the efi_exit_boot_services() helper
      intead, which handles the allocation scenario properly.
      Signed-off-by: NJeffrey Hugo <jhugo@codeaurora.org>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Leif Lindholm <leif.lindholm@linaro.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NMatt Fleming <matt@codeblueprint.co.uk>
      d6493401
    • J
      efi/libstub: Allocate headspace in efi_get_memory_map() · dadb57ab
      Jeffrey Hugo 提交于
      efi_get_memory_map() allocates a buffer to store the memory map that it
      retrieves.  This buffer may need to be reused by the client after
      ExitBootServices() is called, at which point allocations are not longer
      permitted.  To support this usecase, provide the allocated buffer size back
      to the client, and allocate some additional headroom to account for any
      reasonable growth in the map that is likely to happen between the call to
      efi_get_memory_map() and the client reusing the buffer.
      Signed-off-by: NJeffrey Hugo <jhugo@codeaurora.org>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Leif Lindholm <leif.lindholm@linaro.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NMatt Fleming <matt@codeblueprint.co.uk>
      dadb57ab
  12. 19 7月, 2016 1 次提交
    • A
      Kbuild: arch: look for generated headers in obtree · 58ab5e0c
      Arnd Bergmann 提交于
      There are very few files that need add an -I$(obj) gcc for the preprocessor
      or the assembler. For C files, we add always these for both the objtree and
      srctree, but for the other ones we require the Makefile to add them, and
      Kbuild then adds it for both trees.
      
      As a preparation for changing the meaning of the -I$(obj) directive to
      only refer to the srctree, this changes the two instances in arch/x86 to use
      an explictit $(objtree) prefix where needed, otherwise we won't find the
      headers any more, as reported by the kbuild 0day builder.
      
      arch/x86/realmode/rm/realmode.lds.S:75:20: fatal error: pasyms.h: No such file or directory
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NMichal Marek <mmarek@suse.com>
      58ab5e0c
  13. 13 7月, 2016 1 次提交
    • D
      x86/mm: Disallow running with 32-bit PTEs to work around erratum · e4a84be6
      Dave Hansen 提交于
      The Intel(R) Xeon Phi(TM) Processor x200 Family (codename: Knights
      Landing) has an erratum where a processor thread setting the Accessed
      or Dirty bits may not do so atomically against its checks for the
      Present bit.  This may cause a thread (which is about to page fault)
      to set A and/or D, even though the Present bit had already been
      atomically cleared.
      
      These bits are truly "stray".  In the case of the Dirty bit, the
      thread associated with the stray set was *not* allowed to write to
      the page.  This means that we do not have to launder the bit(s); we
      can simply ignore them.
      
      If the PTE is used for storing a swap index or a NUMA migration index,
      the A bit could be misinterpreted as part of the swap type.  The stray
      bits being set cause a software-cleared PTE to be interpreted as a
      swap entry.  In some cases (like when the swap index ends up being
      for a non-existent swapfile), the kernel detects the stray value
      and WARN()s about it, but there is no guarantee that the kernel can
      always detect it.
      
      When we have 64-bit PTEs (64-bit mode or 32-bit PAE), we were able
      to move the swap PTE format around to avoid these troublesome bits.
      But, 32-bit non-PAE is tight on bits.  So, disallow it from running
      on this hardware.  I can't imagine anyone wanting to run 32-bit
      non-highmem kernels on this hardware, but disallowing them from
      running entirely is surely the safe thing to do.
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: dave.hansen@intel.com
      Cc: linux-mm@kvack.org
      Cc: mhocko@suse.com
      Link: http://lkml.kernel.org/r/20160708001914.D0B50110@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      e4a84be6
  14. 08 7月, 2016 3 次提交
    • T
      x86/mm: Enable KASLR for physical mapping memory regions · 021182e5
      Thomas Garnier 提交于
      Add the physical mapping in the list of randomized memory regions.
      
      The physical memory mapping holds most allocations from boot and heap
      allocators. Knowing the base address and physical memory size, an attacker
      can deduce the PDE virtual address for the vDSO memory page. This attack
      was demonstrated at CanSecWest 2016, in the following presentation:
      
        "Getting Physical: Extreme Abuse of Intel Based Paged Systems":
        https://github.com/n3k/CansecWest2016_Getting_Physical_Extreme_Abuse_of_Intel_Based_Paging_Systems/blob/master/Presentation/CanSec2016_Presentation.pdf
      
      (See second part of the presentation).
      
      The exploits used against Linux worked successfully against 4.6+ but
      fail with KASLR memory enabled:
      
        https://github.com/n3k/CansecWest2016_Getting_Physical_Extreme_Abuse_of_Intel_Based_Paging_Systems/tree/master/Demos/Linux/exploits
      
      Similar research was done at Google leading to this patch proposal.
      
      Variants exists to overwrite /proc or /sys objects ACLs leading to
      elevation of privileges. These variants were tested against 4.6+.
      
      The page offset used by the compressed kernel retains the static value
      since it is not yet randomized during this boot stage.
      Signed-off-by: NThomas Garnier <thgarnie@google.com>
      Signed-off-by: NKees Cook <keescook@chromium.org>
      Cc: Alexander Kuleshov <kuleshovmail@gmail.com>
      Cc: Alexander Popov <alpopov@ptsecurity.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jan Beulich <JBeulich@suse.com>
      Cc: Joerg Roedel <jroedel@suse.de>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Lv Zheng <lv.zheng@intel.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephen Smalley <sds@tycho.nsa.gov>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: kernel-hardening@lists.openwall.com
      Cc: linux-doc@vger.kernel.org
      Link: http://lkml.kernel.org/r/1466556426-32664-7-git-send-email-keescook@chromium.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      021182e5
    • T
      x86/mm: Refactor KASLR entropy functions · d899a7d1
      Thomas Garnier 提交于
      Move the KASLR entropy functions into arch/x86/lib to be used in early
      kernel boot for KASLR memory randomization.
      Signed-off-by: NThomas Garnier <thgarnie@google.com>
      Signed-off-by: NKees Cook <keescook@chromium.org>
      Cc: Alexander Kuleshov <kuleshovmail@gmail.com>
      Cc: Alexander Popov <alpopov@ptsecurity.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jan Beulich <JBeulich@suse.com>
      Cc: Joerg Roedel <jroedel@suse.de>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Lv Zheng <lv.zheng@intel.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephen Smalley <sds@tycho.nsa.gov>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: kernel-hardening@lists.openwall.com
      Cc: linux-doc@vger.kernel.org
      Link: http://lkml.kernel.org/r/1466556426-32664-2-git-send-email-keescook@chromium.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      d899a7d1
    • B
      x86/KASLR: Fix boot crash with certain memory configurations · 6daa2ec0
      Baoquan He 提交于
      Ye Xiaolong reported this boot crash:
      
      |
      |  XZ-compressed data is corrupt
      |
      |   -- System halted
      |
      
      Fix the bug in mem_avoid_overlap() of finding the earliest overlap.
      Reported-and-tested-by: NYe Xiaolong <xiaolong.ye@intel.com>
      Signed-off-by: NBaoquan He <bhe@redhat.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      6daa2ec0
  15. 27 6月, 2016 1 次提交
  16. 26 6月, 2016 6 次提交
    • Y
      x86/KASLR: Allow randomization below the load address · e066cc47
      Yinghai Lu 提交于
      Currently the kernel image physical address randomization's lower
      boundary is the original kernel load address.
      
      For bootloaders that load kernels into very high memory (e.g. kexec),
      this means randomization takes place in a very small window at the
      top of memory, ignoring the large region of physical memory below
      the load address.
      
      Since mem_avoid[] is already correctly tracking the regions that must be
      avoided, this patch changes the minimum address to whatever is less:
      512M (to conservatively avoid unknown things in lower memory) or the
      load address. Now, for example, if the kernel is loaded at 8G, [512M,
      8G) will be added to the list of possible physical memory positions.
      Signed-off-by: NYinghai Lu <yinghai@kernel.org>
      [ Rewrote the changelog, refactored the code to use min(). ]
      Signed-off-by: NKees Cook <keescook@chromium.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: H.J. Lu <hjl.tools@gmail.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1464216334-17200-6-git-send-email-keescook@chromium.org
      [ Edited the changelog some more, plus the code comment as well. ]
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      e066cc47
    • K
      x86/KASLR: Extend kernel image physical address randomization to addresses larger than 4G · ed9f007e
      Kees Cook 提交于
      We want the physical address to be randomized anywhere between
      16MB and the top of physical memory (up to 64TB).
      
      This patch exchanges the prior slots[] array for the new slot_areas[]
      array, and lifts the limitation of KERNEL_IMAGE_SIZE on the physical
      address offset for 64-bit. As before, process_e820_entry() walks
      memory and populates slot_areas[], splitting on any detected mem_avoid
      collisions.
      
      Finally, since the slots[] array and its associated functions are not
      needed any more, so they are removed.
      
      Based on earlier patches by Baoquan He.
      
      Originally-from: Baoquan He <bhe@redhat.com>
      Signed-off-by: NKees Cook <keescook@chromium.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: H.J. Lu <hjl.tools@gmail.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Link: http://lkml.kernel.org/r/1464216334-17200-5-git-send-email-keescook@chromium.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      ed9f007e
    • B
      x86/KASLR: Randomize virtual address separately · 8391c73c
      Baoquan He 提交于
      The current KASLR implementation randomizes the physical and virtual
      addresses of the kernel together (both are offset by the same amount). It
      calculates the delta of the physical address where vmlinux was linked
      to load and where it is finally loaded. If the delta is not equal to 0
      (i.e. the kernel was relocated), relocation handling needs be done.
      
      On 64-bit, this patch randomizes both the physical address where kernel
      is decompressed and the virtual address where kernel text is mapped and
      will execute from. We now have two values being chosen, so the function
      arguments are reorganized to pass by pointer so they can be directly
      updated. Since relocation handling only depends on the virtual address,
      we must check the virtual delta, not the physical delta for processing
      kernel relocations. This also populates the page table for the new
      virtual address range. 32-bit does not support a separate virtual address,
      so it continues to use the physical offset for its virtual offset.
      
      Additionally updates the sanity checks done on the resulting kernel
      addresses since they are potentially separate now.
      
      [kees: rewrote changelog, limited virtual split to 64-bit only, update checks]
      [kees: fix CONFIG_RANDOMIZE_BASE=n boot failure]
      Signed-off-by: NBaoquan He <bhe@redhat.com>
      Signed-off-by: NKees Cook <keescook@chromium.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: H.J. Lu <hjl.tools@gmail.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Link: http://lkml.kernel.org/r/1464216334-17200-4-git-send-email-keescook@chromium.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      8391c73c
    • K
      x86/KASLR: Clarify identity map interface · 11fdf97a
      Kees Cook 提交于
      This extracts the call to prepare_level4() into a top-level function
      that the user of the pagetable.c interface must call to initialize
      the new page tables. For clarity and to match the "finalize" function,
      it has been renamed to initialize_identity_maps(). This function also
      gains the initialization of mapping_info so we don't have to do it each
      time in add_identity_map().
      
      Additionally add copyright notice to the top, to make it clear that the
      bulk of the pagetable.c code was written by Yinghai, and that I just
      added bugs later. :)
      Signed-off-by: NKees Cook <keescook@chromium.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: H.J. Lu <hjl.tools@gmail.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Link: http://lkml.kernel.org/r/1464216334-17200-3-git-send-email-keescook@chromium.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      11fdf97a
    • K
      x86/boot: Refuse to build with data relocations · 98f78525
      Kees Cook 提交于
      The compressed kernel is built with -fPIC/-fPIE so that it can run in any
      location a bootloader happens to put it. However, since ELF relocation
      processing is not happening (and all the relocation information has
      already been stripped at link time), none of the code can use data
      relocations (e.g. static assignments of pointers). This is already noted
      in a warning comment at the top of misc.c, but this adds an explicit
      check for the condition during the linking stage to block any such bugs
      from appearing.
      
      If this was in place with the earlier bug in pagetable.c, the build
      would fail like this:
      
        ...
          CC      arch/x86/boot/compressed/pagetable.o
          DATAREL arch/x86/boot/compressed/vmlinux
        error: arch/x86/boot/compressed/pagetable.o has data relocations!
        make[2]: *** [arch/x86/boot/compressed/vmlinux] Error 1
        ...
      
      A clean build shows:
      
        ...
          CC      arch/x86/boot/compressed/pagetable.o
          DATAREL arch/x86/boot/compressed/vmlinux
          LD      arch/x86/boot/compressed/vmlinux
        ...
      Suggested-by: NIngo Molnar <mingo@kernel.org>
      Signed-off-by: NKees Cook <keescook@chromium.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: H.J. Lu <hjl.tools@gmail.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Link: http://lkml.kernel.org/r/1464216334-17200-2-git-send-email-keescook@chromium.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      98f78525
    • K
      x86/KASLR, x86/power: Remove x86 hibernation restrictions · 65fe935d
      Kees Cook 提交于
      With the following fix:
      
        70595b479ce1 ("x86/power/64: Fix crash whan the hibernation code passes control to the image kernel")
      
      ... there is no longer a problem with hibernation resuming a
      KASLR-booted kernel image, so remove the restriction.
      Signed-off-by: NKees Cook <keescook@chromium.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Linux PM list <linux-pm@vger.kernel.org>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephen Smalley <sds@tycho.nsa.gov>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: linux-doc@vger.kernel.org
      Link: http://lkml.kernel.org/r/20160613221002.GA29719@www.outflux.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
      65fe935d
  17. 09 6月, 2016 2 次提交
  18. 08 6月, 2016 1 次提交
  19. 10 5月, 2016 3 次提交
    • K
      x86/KASLR: Clarify purpose of each get_random_long() · d2d3462f
      Kees Cook 提交于
      KASLR will be calling get_random_long() twice, but the debug output
      won't distinguishing between them. This patch adds a report on when it
      is fetching the physical vs virtual address. With this, once the virtual
      offset is separate, the report changes from:
      
       KASLR using RDTSC...
       KASLR using RDTSC...
      
      into:
      
       Physical KASLR using RDTSC...
       Virtual KASLR using RDTSC...
      Signed-off-by: NKees Cook <keescook@chromium.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: kernel-hardening@lists.openwall.com
      Cc: lasse.collin@tukaani.org
      Link: http://lkml.kernel.org/r/1462825332-10505-7-git-send-email-keescook@chromium.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      d2d3462f
    • B
      x86/KASLR: Add virtual address choosing function · 071a7493
      Baoquan He 提交于
      To support randomizing the kernel virtual address separately from the
      physical address, this patch adds find_random_virt_addr() to choose
      a slot anywhere between LOAD_PHYSICAL_ADDR and KERNEL_IMAGE_SIZE.
      Since this address is virtual, not physical, we can place the kernel
      anywhere in this region, as long as it is aligned and (in the case of
      kernel being larger than the slot size) placed with enough room to load
      the entire kernel image.
      
      For clarity and readability, find_random_addr() is renamed to
      find_random_phys_addr() and has "size" renamed to "image_size" to match
      find_random_virt_addr().
      Signed-off-by: NBaoquan He <bhe@redhat.com>
      [ Rewrote changelog, refactored slot calculation for readability. ]
      [ Renamed find_random_phys_addr() and size argument. ]
      Signed-off-by: NKees Cook <keescook@chromium.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: kernel-hardening@lists.openwall.com
      Cc: lasse.collin@tukaani.org
      Link: http://lkml.kernel.org/r/1462825332-10505-6-git-send-email-keescook@chromium.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      071a7493
    • K
      x86/KASLR: Return earliest overlap when avoiding regions · 06486d6c
      Kees Cook 提交于
      In preparation for being able to detect where to split up contiguous
      memory regions that overlap with memory regions to avoid, we need to
      pass back what the earliest overlapping region was. This modifies the
      overlap checker to return that information.
      
      Based on a separate mem_min_overlap() implementation by Baoquan He.
      Signed-off-by: NKees Cook <keescook@chromium.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: kernel-hardening@lists.openwall.com
      Cc: lasse.collin@tukaani.org
      Link: http://lkml.kernel.org/r/1462825332-10505-5-git-send-email-keescook@chromium.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      06486d6c