1. 09 September 2016 (2 commits)
  2. 19 July 2016 (1 commit)
    • Kbuild: arch: look for generated headers in obtree · 58ab5e0c
      Committed by Arnd Bergmann
      There are very few files that need to add an -I$(obj) gcc option for the
      preprocessor or the assembler. For C files, we always add these options for
      both the objtree and srctree, but for the other file types we require the
      Makefile to add them, and Kbuild then adds them for both trees.
      
      As a preparation for changing the meaning of the -I$(obj) directive to
      refer only to the srctree, this changes the two instances in arch/x86 to use
      an explicit $(objtree) prefix where needed; otherwise we won't find the
      headers any more, as reported by the kbuild 0day builder.
      
      arch/x86/realmode/rm/realmode.lds.S:75:20: fatal error: pasyms.h: No such file or directory
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Michal Marek <mmarek@suse.com>
  3. 13 July 2016 (1 commit)
    • x86/mm: Disallow running with 32-bit PTEs to work around erratum · e4a84be6
      Committed by Dave Hansen
      The Intel(R) Xeon Phi(TM) Processor x200 Family (codename: Knights
      Landing) has an erratum where a processor thread setting the Accessed
      or Dirty bits may not do so atomically against its checks for the
      Present bit.  This may cause a thread (which is about to page fault)
      to set A and/or D, even though the Present bit had already been
      atomically cleared.
      
      These bits are truly "stray".  In the case of the Dirty bit, the
      thread associated with the stray set was *not* allowed to write to
      the page.  This means that we do not have to launder the bit(s); we
      can simply ignore them.
      
      If the PTE is used for storing a swap index or a NUMA migration index,
      the A bit could be misinterpreted as part of the swap type.  The stray
      bits being set cause a software-cleared PTE to be interpreted as a
      swap entry.  In some cases (like when the swap index ends up being
      for a non-existent swapfile), the kernel detects the stray value
      and WARN()s about it, but there is no guarantee that the kernel can
      always detect it.
      
      When we have 64-bit PTEs (64-bit mode or 32-bit PAE), we were able
      to move the swap PTE format around to avoid these troublesome bits.
      But, 32-bit non-PAE is tight on bits.  So, disallow it from running
      on this hardware.  I can't imagine anyone wanting to run 32-bit
      non-highmem kernels on this hardware, but disallowing them from
      running entirely is surely the safe thing to do.
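      As a minimal sketch, a boot-time guard of this shape could enforce the
      restriction (the helper name and the family/model test for Knights
      Landing are illustrative assumptions, not the actual patch):

        /*
         * Sketch only: refuse to boot 32-bit non-PAE kernels on affected
         * hardware. Family 6, model 0x57 is Knights Landing.
         */
        static void check_knl_pte_erratum(struct cpuinfo_x86 *c)
        {
        #if defined(CONFIG_X86_32) && !defined(CONFIG_X86_PAE)
        	if (c->x86_vendor == X86_VENDOR_INTEL &&
        	    c->x86 == 6 && c->x86_model == 0x57)
        		panic("32-bit non-PAE kernels are unsupported on this CPU (A/D bit erratum)");
        #endif
        }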
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: dave.hansen@intel.com
      Cc: linux-mm@kvack.org
      Cc: mhocko@suse.com
      Link: http://lkml.kernel.org/r/20160708001914.D0B50110@viggo.jf.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  4. 08 July 2016 (3 commits)
    • x86/mm: Enable KASLR for physical mapping memory regions · 021182e5
      Committed by Thomas Garnier
      Add the physical mapping to the list of randomized memory regions.
      
      The physical memory mapping holds most allocations from boot and heap
      allocators. Knowing the base address and physical memory size, an attacker
      can deduce the PDE virtual address for the vDSO memory page. This attack
      was demonstrated at CanSecWest 2016, in the following presentation:
      
        "Getting Physical: Extreme Abuse of Intel Based Paged Systems":
        https://github.com/n3k/CansecWest2016_Getting_Physical_Extreme_Abuse_of_Intel_Based_Paging_Systems/blob/master/Presentation/CanSec2016_Presentation.pdf
      
      (See second part of the presentation).
      
      The exploits used against Linux worked successfully against 4.6+ but
      fail with KASLR memory enabled:
      
        https://github.com/n3k/CansecWest2016_Getting_Physical_Extreme_Abuse_of_Intel_Based_Paging_Systems/tree/master/Demos/Linux/exploits
      
      Similar research was done at Google leading to this patch proposal.
      
      Variants exist that overwrite /proc or /sys object ACLs, leading to
      elevation of privileges. These variants were tested against 4.6+.
      
      The page offset used by the compressed kernel retains the static value
      since it is not yet randomized during this boot stage.
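      Conceptually, each randomized region's base is shifted by a random,
      large-page-aligned amount within the space left over after the region's
      own size; a rough sketch (names and details here are assumptions, the
      real logic lives in arch/x86/mm/kaslr.c):

        /* Sketch: shift a region base by a random, PUD-aligned offset. */
        static unsigned long randomize_base(unsigned long base,
        				    unsigned long entropy_space,
        				    unsigned long rand)
        {
        	/* Stay PUD (1GB) aligned so large-page mappings remain usable. */
        	unsigned long slots = (entropy_space >> PUD_SHIFT) + 1;

        	return base + ((rand % slots) << PUD_SHIFT);
        }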
      Signed-off-by: Thomas Garnier <thgarnie@google.com>
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Cc: Alexander Kuleshov <kuleshovmail@gmail.com>
      Cc: Alexander Popov <alpopov@ptsecurity.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jan Beulich <JBeulich@suse.com>
      Cc: Joerg Roedel <jroedel@suse.de>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Lv Zheng <lv.zheng@intel.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephen Smalley <sds@tycho.nsa.gov>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: kernel-hardening@lists.openwall.com
      Cc: linux-doc@vger.kernel.org
      Link: http://lkml.kernel.org/r/1466556426-32664-7-git-send-email-keescook@chromium.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/mm: Refactor KASLR entropy functions · d899a7d1
      Committed by Thomas Garnier
      Move the KASLR entropy functions into arch/x86/lib to be used in early
      kernel boot for KASLR memory randomization.
      Signed-off-by: Thomas Garnier <thgarnie@google.com>
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Cc: Alexander Kuleshov <kuleshovmail@gmail.com>
      Cc: Alexander Popov <alpopov@ptsecurity.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jan Beulich <JBeulich@suse.com>
      Cc: Joerg Roedel <jroedel@suse.de>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Lv Zheng <lv.zheng@intel.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephen Smalley <sds@tycho.nsa.gov>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: kernel-hardening@lists.openwall.com
      Cc: linux-doc@vger.kernel.org
      Link: http://lkml.kernel.org/r/1466556426-32664-2-git-send-email-keescook@chromium.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/KASLR: Fix boot crash with certain memory configurations · 6daa2ec0
      Committed by Baoquan He
      Ye Xiaolong reported this boot crash:
      
      |
      |  XZ-compressed data is corrupt
      |
      |   -- System halted
      |
      
      Fix the bug in mem_avoid_overlap()'s search for the earliest overlap.
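      The likely shape of the fix, sketched from the one-line description above
      (see the fuller checker sketch under the 'Return earliest overlap when
      avoiding regions' entry below; treat the details as assumptions):

        	if (mem_overlaps(img, &mem_avoid[i]) &&
        	    mem_avoid[i].start < earliest) {
        		*overlap = mem_avoid[i];
        		earliest = overlap->start;	/* the previously missing update */
        		is_overlapping = true;
        	}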
      Reported-and-tested-by: Ye Xiaolong <xiaolong.ye@intel.com>
      Signed-off-by: Baoquan He <bhe@redhat.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  5. 27 June 2016 (1 commit)
  6. 26 June 2016 (6 commits)
    • x86/KASLR: Allow randomization below the load address · e066cc47
      Committed by Yinghai Lu
      Currently the kernel image physical address randomization's lower
      boundary is the original kernel load address.
      
      For bootloaders that load kernels into very high memory (e.g. kexec),
      this means randomization takes place in a very small window at the
      top of memory, ignoring the large region of physical memory below
      the load address.
      
      Since mem_avoid[] is already correctly tracking the regions that must be
      avoided, this patch changes the minimum address to whichever is lower:
      512M (to conservatively avoid unknown things in lower memory) or the
      load address. Now, for example, if the kernel is loaded at 8G, [512M,
      8G) will be added to the list of possible physical memory positions.
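      The heart of the change, sketched (variable names are assumptions):

        	/* Sketch: allow slots below the load address, but never below 512MB. */
        	unsigned long min_addr = min(input, 512UL << 20);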
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      [ Rewrote the changelog, refactored the code to use min(). ]
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: H.J. Lu <hjl.tools@gmail.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1464216334-17200-6-git-send-email-keescook@chromium.org
      [ Edited the changelog some more, plus the code comment as well. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/KASLR: Extend kernel image physical address randomization to addresses larger than 4G · ed9f007e
      Committed by Kees Cook
      We want the physical address to be randomized anywhere between
      16MB and the top of physical memory (up to 64TB).
      
      This patch exchanges the prior slots[] array for the new slot_areas[]
      array, and lifts the limitation of KERNEL_IMAGE_SIZE on the physical
      address offset for 64-bit. As before, process_e820_entry() walks
      memory and populates slot_areas[], splitting on any detected mem_avoid
      collisions.
      
      Finally, since the slots[] array and its associated functions are no
      longer needed, they are removed.
      
      Based on earlier patches by Baoquan He.
      
      Originally-from: Baoquan He <bhe@redhat.com>
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: H.J. Lu <hjl.tools@gmail.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Link: http://lkml.kernel.org/r/1464216334-17200-5-git-send-email-keescook@chromium.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/KASLR: Randomize virtual address separately · 8391c73c
      Committed by Baoquan He
      The current KASLR implementation randomizes the physical and virtual
      addresses of the kernel together (both are offset by the same amount). It
      calculates the delta of the physical address where vmlinux was linked
      to load and where it is finally loaded. If the delta is not equal to 0
      (i.e. the kernel was relocated), relocation handling needs to be done.
      
      On 64-bit, this patch randomizes both the physical address where kernel
      is decompressed and the virtual address where kernel text is mapped and
      will execute from. We now have two values being chosen, so the function
      arguments are reorganized to pass by pointer so they can be directly
      updated. Since relocation handling only depends on the virtual address,
      we must check the virtual delta, not the physical delta for processing
      kernel relocations. This also populates the page table for the new
      virtual address range. 32-bit does not support a separate virtual address,
      so it continues to use the physical offset for its virtual offset.
      
      This additionally updates the sanity checks done on the resulting kernel
      addresses, since they are potentially separate now.
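      Schematically, the interface change looks like this (argument names are
      assumptions based on the description):

        /*
         * Sketch: both addresses are chosen and returned via pointers, so the
         * physical and virtual offsets can differ on 64-bit.
         */
        void choose_random_location(unsigned long input, unsigned long input_size,
        			    unsigned long *output, unsigned long output_size,
        			    unsigned long *virt_addr);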
      
      [kees: rewrote changelog, limited virtual split to 64-bit only, update checks]
      [kees: fix CONFIG_RANDOMIZE_BASE=n boot failure]
      Signed-off-by: Baoquan He <bhe@redhat.com>
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: H.J. Lu <hjl.tools@gmail.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Link: http://lkml.kernel.org/r/1464216334-17200-4-git-send-email-keescook@chromium.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/KASLR: Clarify identity map interface · 11fdf97a
      Committed by Kees Cook
      This extracts the call to prepare_level4() into a top-level function
      that the user of the pagetable.c interface must call to initialize
      the new page tables. For clarity and to match the "finalize" function,
      it has been renamed to initialize_identity_maps(). This function also
      gains the initialization of mapping_info so we don't have to do it each
      time in add_identity_map().
      
      Additionally, add a copyright notice at the top, to make it clear that the
      bulk of the pagetable.c code was written by Yinghai, and that I just
      added bugs later. :)
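      A sketch of the resulting setup function (field assignments are
      assumptions based on the surrounding description):

        /* Sketch: one-time setup for the identity-map machinery. */
        void initialize_identity_maps(void)
        {
        	/* Init mapping_info once, instead of in every add_identity_map(). */
        	mapping_info.alloc_pgt_page = alloc_pgt_page;
        	mapping_info.context = &pgt_data;
        	mapping_info.page_flag = __PAGE_KERNEL_LARGE_EXEC;

        	/* ... followed by what prepare_level4() used to do. */
        }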
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: H.J. Lu <hjl.tools@gmail.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Link: http://lkml.kernel.org/r/1464216334-17200-3-git-send-email-keescook@chromium.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/boot: Refuse to build with data relocations · 98f78525
      Committed by Kees Cook
      The compressed kernel is built with -fPIC/-fPIE so that it can run in any
      location a bootloader happens to put it. However, since ELF relocation
      processing is not happening (and all the relocation information has
      already been stripped at link time), none of the code can use data
      relocations (e.g. static assignments of pointers). This is already noted
      in a warning comment at the top of misc.c, but this adds an explicit
      check for the condition during the linking stage to block any such bugs
      from appearing.
      
      If this had been in place with the earlier bug in pagetable.c, the build
      would have failed like this:
      
        ...
          CC      arch/x86/boot/compressed/pagetable.o
          DATAREL arch/x86/boot/compressed/vmlinux
        error: arch/x86/boot/compressed/pagetable.o has data relocations!
        make[2]: *** [arch/x86/boot/compressed/vmlinux] Error 1
        ...
      
      A clean build shows:
      
        ...
          CC      arch/x86/boot/compressed/pagetable.o
          DATAREL arch/x86/boot/compressed/vmlinux
          LD      arch/x86/boot/compressed/vmlinux
        ...
      Suggested-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: H.J. Lu <hjl.tools@gmail.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Link: http://lkml.kernel.org/r/1464216334-17200-2-git-send-email-keescook@chromium.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/KASLR, x86/power: Remove x86 hibernation restrictions · 65fe935d
      Committed by Kees Cook
      With the following fix:
      
        70595b479ce1 ("x86/power/64: Fix crash whan the hibernation code passes control to the image kernel")
      
      ... there is no longer a problem with hibernation resuming a
      KASLR-booted kernel image, so remove the restriction.
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Linux PM list <linux-pm@vger.kernel.org>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephen Smalley <sds@tycho.nsa.gov>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: linux-doc@vger.kernel.org
      Link: http://lkml.kernel.org/r/20160613221002.GA29719@www.outflux.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  7. 09 June 2016 (2 commits)
  8. 08 June 2016 (1 commit)
  9. 10 May 2016 (7 commits)
    • x86/KASLR: Clarify purpose of each get_random_long() · d2d3462f
      Committed by Kees Cook
      KASLR will be calling get_random_long() twice, but the debug output
      won't distinguish between them. This patch adds a report on whether it
      is fetching the physical or the virtual address. With this, once the virtual
      offset is separate, the report changes from:
      
       KASLR using RDTSC...
       KASLR using RDTSC...
      
      into:
      
       Physical KASLR using RDTSC...
       Virtual KASLR using RDTSC...
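      One way to get such output is to pass a purpose string down to the
      entropy function; a sketch (the exact signature is an assumption):

        /* Sketch: label each entropy request so the debug lines differ. */
        static unsigned long get_random_long(const char *purpose)
        {
        	unsigned long random_addr = 0;

        	debug_putstr(purpose);
        	debug_putstr(" KASLR using");
        	/* ... mix RDRAND/RDTSC/i8254 entropy as before ... */
        	return random_addr;
        }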
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: kernel-hardening@lists.openwall.com
      Cc: lasse.collin@tukaani.org
      Link: http://lkml.kernel.org/r/1462825332-10505-7-git-send-email-keescook@chromium.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/KASLR: Add virtual address choosing function · 071a7493
      Committed by Baoquan He
      To support randomizing the kernel virtual address separately from the
      physical address, this patch adds find_random_virt_addr() to choose
      a slot anywhere between LOAD_PHYSICAL_ADDR and KERNEL_IMAGE_SIZE.
      Since this address is virtual, not physical, we can place the kernel
      anywhere in this region, as long as it is aligned and (in the case of
      the kernel being larger than the slot size) placed with enough room to load
      the entire kernel image.
      
      For clarity and readability, find_random_addr() is renamed to
      find_random_phys_addr() and has "size" renamed to "image_size" to match
      find_random_virt_addr().
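      A sketch of the slot arithmetic (close to the description above, with
      alignment handling elided; treat the details as assumptions):

        /* Sketch: pick a random, aligned virtual slot with room for the image. */
        static unsigned long find_random_virt_addr(unsigned long minimum,
        					   unsigned long image_size)
        {
        	unsigned long slots;

        	/* Number of aligned slots that leave room for the whole image. */
        	slots = (KERNEL_IMAGE_SIZE - minimum - image_size) /
        		CONFIG_PHYSICAL_ALIGN + 1;

        	return minimum + (get_random_long("Virtual") % slots) *
        			 CONFIG_PHYSICAL_ALIGN;
        }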
      Signed-off-by: Baoquan He <bhe@redhat.com>
      [ Rewrote changelog, refactored slot calculation for readability. ]
      [ Renamed find_random_phys_addr() and size argument. ]
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: kernel-hardening@lists.openwall.com
      Cc: lasse.collin@tukaani.org
      Link: http://lkml.kernel.org/r/1462825332-10505-6-git-send-email-keescook@chromium.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/KASLR: Return earliest overlap when avoiding regions · 06486d6c
      Committed by Kees Cook
      In preparation for being able to detect where to split up contiguous
      memory regions that overlap with memory regions to avoid, we need to
      pass back what the earliest overlapping region was. This modifies the
      overlap checker to return that information.
      
      Based on a separate mem_min_overlap() implementation by Baoquan He.
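      A sketch of the reworked checker (names follow the KASLR code; this
      version also reflects the earliest-overlap fix from the July entry
      above, so treat the details as assumptions):

        /*
         * Sketch: report whether 'img' overlaps any avoided region and, if
         * so, which overlapping region starts earliest.
         */
        static bool mem_avoid_overlap(struct mem_vector *img,
        			      struct mem_vector *overlap)
        {
        	unsigned long earliest = img->start + img->size;
        	bool is_overlapping = false;
        	int i;

        	for (i = 0; i < MEM_AVOID_MAX; i++) {
        		if (mem_overlaps(img, &mem_avoid[i]) &&
        		    mem_avoid[i].start < earliest) {
        			*overlap = mem_avoid[i];
        			earliest = overlap->start;
        			is_overlapping = true;
        		}
        	}

        	return is_overlapping;
        }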
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: kernel-hardening@lists.openwall.com
      Cc: lasse.collin@tukaani.org
      Link: http://lkml.kernel.org/r/1462825332-10505-5-git-send-email-keescook@chromium.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/KASLR: Add 'struct slot_area' to manage random_addr slots · c401cf15
      Committed by Baoquan He
      In order to support KASLR moving the kernel anywhere in physical memory
      (which could be up to 64TB), we need to handle counting the potential
      randomization locations in a more efficient manner.
      
      In the worst case with 64TB, there could be roughly 32 * 1024 * 1024
      randomization slots if CONFIG_PHYSICAL_ALIGN is 0x1000000. Currently
      the starting address of each candidate position is stored into the slots[]
      array, one at a time. This method costs too much memory, and getting and
      saving the slot information one entry at a time is also very
      inefficient.
      
      This patch introduces 'struct slot_area' to manage each contiguous region
      of randomization slots. Each slot_area will contain the starting address
      and how many available slots are in this area. As with the original code,
      the slot_areas[] will avoid the mem_avoid[] regions.
      
      Since setup_data is a linked list, it could contain an unknown number
      of memory regions to be avoided, which could cause us to fragment
      the contiguous memory that the slot_area array is tracking. In normal
      operation this level of fragmentation will be extremely rare, but we
      choose a suitably large value (100) for the array. If setup_data forces
      the slot_area array to become highly fragmented and there are more
      slots available beyond the first 100 found, the rest will be ignored
      for KASLR selection.
      
      The function store_slot_info() is used to calculate the number of slots
      available in the passed-in memory region and stores it into slot_areas[]
      after adjusting for alignment and size requirements.
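      The data structure, as described (a sketch; the MAX_SLOT_AREA value is
      taken from the changelog above):

        /* Sketch: one contiguous run of equally spaced randomization slots. */
        struct slot_area {
        	unsigned long addr;	/* first slot's starting address */
        	int num;		/* number of slots in this area */
        };

        #define MAX_SLOT_AREA 100	/* see the fragmentation discussion above */

        static struct slot_area slot_areas[MAX_SLOT_AREA];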
      Signed-off-by: Baoquan He <bhe@redhat.com>
      [ Rewrote changelog, squashed with new functions. ]
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: kernel-hardening@lists.openwall.com
      Cc: lasse.collin@tukaani.org
      Link: http://lkml.kernel.org/r/1462825332-10505-4-git-send-email-keescook@chromium.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/boot: Add missing file header comments · cb18ef0d
      Committed by Kees Cook
      There were some files with missing header comments. Since they are
      included from both compressed and regular kernels, make note of that.
      Also corrects a typo in the mem_avoid comments.
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: kernel-hardening@lists.openwall.com
      Cc: lasse.collin@tukaani.org
      Link: http://lkml.kernel.org/r/1462825332-10505-3-git-send-email-keescook@chromium.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/KASLR: Initialize mapping_info every time · 434a6c9f
      Committed by Kees Cook
      As it turns out, mapping_info DOES need to be initialized every
      time, because the pgt_data address can change during kernel
      relocation, so it cannot be assigned at build time.

      Without this, the page tables were not being correctly updated, which
      could cause reboots when a physical address beyond 2G was chosen.
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: kernel-hardening@lists.openwall.com
      Cc: lasse.collin@tukaani.org
      Link: http://lkml.kernel.org/r/1462825332-10505-2-git-send-email-keescook@chromium.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/boot: Comment what finalize_identity_maps() does · 36a39ac9
      Committed by Borislav Petkov
      It is not really obvious that finalize_identity_maps() doesn't do any
      finalization; it *actually* writes CR3 with the ident PGD. Comment on
      that at the call site.
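      For example (wording assumed):

        	/*
        	 * This "finalizes" nothing: it actually loads the freshly
        	 * built identity-mapped PGD into CR3.
        	 */
        	finalize_identity_maps();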
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: akpm@linux-foundation.org
      Cc: bhe@redhat.com
      Cc: dyoung@redhat.com
      Cc: jkosina@suse.cz
      Cc: linux-tip-commits@vger.kernel.org
      Cc: luto@kernel.org
      Cc: vgoyal@redhat.com
      Cc: yinghai@kernel.org
      Link: http://lkml.kernel.org/r/20160507100541.GA24613@pd.tnic
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  10. 07 May 2016 (3 commits)
    • x86/KASLR: Build identity mappings on demand · 3a94707d
      Committed by Kees Cook
      Currently KASLR only supports relocation in a small physical range (from
      16M to 1G), due to using the initial kernel page table identity mapping.
      To support ranges above this, we need to have an identity mapping for the
      desired memory range before we can decompress (and later run) the kernel.
      
      32-bit kernels already have the needed identity mapping. This patch adds
      identity mappings for the needed memory ranges on 64-bit kernels. This
      happens in two possible boot paths:
      
      If loaded via startup_32(), we need to set up the needed identity map.
      
      If loaded from a 64-bit bootloader, the bootloader will have already
      set up an identity mapping, and we'll start via the compressed kernel's
      startup_64(). In this case, the bootloader's page tables need to be
      avoided while selecting the new uncompressed kernel location. If not,
      the decompressor could overwrite them during decompression.
      
      To accomplish this, we could walk the pagetable and find every page
      that is used, and add them to mem_avoid, but this needs extra code and
      will require increasing the size of the mem_avoid array.
      
      Instead, we can create a new set of page tables for our own identity
      mapping. The pages for the new page tables will come from the
      _pagetable section of the compressed kernel, which means they are
      already contained in the mem_avoid array. To do this, we reuse the code
      from the uncompressed kernel's identity mapping routines.
      
      The _pgtable will be shared by both the 32-bit and 64-bit paths to reduce
      init_size, as now the compressed kernel's _rodata to _end will contribute
      to init_size.
      
      To handle the possible mappings, we need to increase the existing page
      table buffer size:
      
      When booting via startup_64(), we need to cover the old VO, params,
      cmdline and uncompressed kernel. In an extreme case we could have them
      all beyond the 512G boundary, which needs (2+2)*4 pages with 2M mappings.
      And we'll need 2 for first 2M for VGA RAM. One more is needed for level4.
      This gets us to 19 pages total.
      
      When booting via startup_32(), KASLR could move the uncompressed kernel
      above 4G, so we need to create extra identity mappings, which should only
      need (2+2) pages at most when it is beyond the 512G boundary. So 19
      pages is sufficient for this case as well.
      
      The resulting BOOT_*PGT_SIZE defines use the "_SIZE" suffix on their
      names to maintain logical consistency with the existing BOOT_HEAP_SIZE
      and BOOT_STACK_SIZE defines.
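      The resulting worst-case budget can be summarized as (a sketch of the
      arithmetic above; the exact conditional structure in boot.h is omitted):

        /*
         * Sketch: (2+2) pages for each of the four ranges (old VO, params,
         * cmdline, new kernel) = 16, plus 2 for the first 2M (VGA RAM),
         * plus 1 for the level-4 table = 19 pages.
         */
        #define BOOT_PGT_SIZE	(19*4096)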
      
      This patch is based on earlier patches from Yinghai Lu and Baoquan He.
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: kernel-hardening@lists.openwall.com
      Cc: lasse.collin@tukaani.org
      Link: http://lkml.kernel.org/r/1462572095-11754-4-git-send-email-keescook@chromium.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/KASLR: Improve comments around the mem_avoid[] logic · ed09acde
      Committed by Kees Cook
      This attempts to improve the comments that describe how the memory
      range used for decompression is avoided. Additionally uses an enum
      instead of raw numbers for the mem_avoid[] indexing.
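      The index enum looks roughly like this (entry names are assumptions
      matching the regions discussed in the related commits):

        /* Sketch: symbolic indices for the regions KASLR must avoid. */
        enum mem_avoid_index {
        	MEM_AVOID_ZO_RANGE = 0,	/* ZO plus its heap and stack */
        	MEM_AVOID_INITRD,
        	MEM_AVOID_CMDLINE,
        	MEM_AVOID_BOOTPARAMS,
        	MEM_AVOID_MAX,
        };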
      Suggested-by: Borislav Petkov <bp@alien8.de>
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Link: http://lkml.kernel.org/r/20160506194459.GA16480@www.outflux.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/boot: Simplify pointer casting in choose_random_location() · 549f90db
      Committed by Borislav Petkov
      Pass them down as 'unsigned long' directly and get rid of more casting and
      assignments.
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: akpm@linux-foundation.org
      Cc: bhe@redhat.com
      Cc: dyoung@redhat.com
      Cc: linux-tip-commits@vger.kernel.org
      Cc: luto@kernel.org
      Cc: vgoyal@redhat.com
      Cc: yinghai@kernel.org
      Link: http://lkml.kernel.org/r/20160506115015.GI24044@pd.tnic
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  11. 06 May 2016 (2 commits)
    • x86/KASLR: Consolidate mem_avoid[] entries · 9dc1969c
      Committed by Yinghai Lu
      The mem_avoid[] array is used to track positions that should be avoided (like
      the compressed kernel, decompression code, etc) when selecting a memory
      position for the randomly relocated kernel. Since ZO is now at the end of
      the decompression buffer and the decompression code (and its heap and
      stack) are at the front, we can safely consolidate the decompression entry,
      the heap entry, and the stack entry. The boot_params memory, however, could
      be elsewhere, so it should be explicitly included.
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: Baoquan He <bhe@redhat.com>
      [ Rewrote changelog, cleaned up code comments. ]
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: kernel-hardening@lists.openwall.com
      Cc: lasse.collin@tukaani.org
      Link: http://lkml.kernel.org/r/1462486436-3707-3-git-send-email-keescook@chromium.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/boot: Clean up pointer casting · 2bc1cd39
      Committed by Kees Cook
      Currently extract_kernel() defines the input and output buffer pointers
      as "unsigned char *" since that's effectively what they are. It passes
      these to the decompressor routine and to the ELF parser, which both
      logically deal with buffer pointers too. There is some casting ("unsigned
      long") done to validate the numerical value of the pointers, but it is
      relatively limited.
      
      However, choose_random_location() operates almost exclusively on the
      numerical representation of these pointers, so it ended up carrying
      a lot of "unsigned long" casts. With the future physical/virtual split
      these casts were going to multiply, so this attempts to solve the
      problem by doing all the casting in choose_random_location()'s entry
      and return instead of throughout the code. This adjusts argument names to
      be more meaningful, and changes one use of "choice" to "output" to make
      the future physical/virtual split more clear (i.e. "choice" should be
      strictly a function return value and not used as an intermediate).
      Suggested-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: kernel-hardening@lists.openwall.com
      Cc: lasse.collin@tukaani.org
      Link: http://lkml.kernel.org/r/1462486436-3707-2-git-send-email-keescook@chromium.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  12. 03 May 2016 (2 commits)
    • x86/boot: Warn on future overlapping memcpy() use · 00ec2c37
      Committed by Kees Cook
      If an overlapping memcpy() is ever attempted, we should at least report
      it, in case it might lead to problems, so it could be changed to a
      memmove() call instead.
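      A sketch of such a guard (names are assumptions; the raw non-overlapping
      copy is called __memcpy here):

        /* Sketch: detect a forward-overlapping copy, fall back to memmove(). */
        void *memcpy(void *dest, const void *src, size_t n)
        {
        	unsigned long d = (unsigned long)dest, s = (unsigned long)src;

        	if (d > s && d - s < n) {
        		warn("Avoiding potentially unsafe overlapping memcpy()!");
        		return memmove(dest, src, n);
        	}

        	return __memcpy(dest, src, n);
        }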
      Suggested-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Lasse Collin <lasse.collin@tukaani.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Link: http://lkml.kernel.org/r/1462229461-3370-3-git-send-email-keescook@chromium.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/boot: Extract error reporting functions · dc425a6e
      Committed by Kees Cook
      Currently to use warn(), a caller would need to include misc.h. However,
      this means they would get the (unavailable during compressed boot)
      gcc built-in memcpy family of functions. But since string.c is defining
      these memcpy functions for use by misc.c, we end up in a weird circular
      dependency.
      
      To break this loop, move the error reporting functions outside of misc.c
      with their own header so that they can be independently included by
      other sources. Since the screen-writing routines use memmove(), keep the
      low-level *_putstr() functions in misc.c.
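      The extracted header can be as small as this sketch:

        /* Sketch of the extracted interface (boot/compressed/error.h). */
        #ifndef BOOT_COMPRESSED_ERROR_H
        #define BOOT_COMPRESSED_ERROR_H

        void warn(char *m);
        void error(char *m);

        #endif /* BOOT_COMPRESSED_ERROR_H */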
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Lasse Collin <lasse.collin@tukaani.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Link: http://lkml.kernel.org/r/1462229461-3370-2-git-send-email-keescook@chromium.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  13. 29 April 2016 (6 commits)
    • x86/boot: Correctly bounds-check relocations · 4abf061b
      Committed by Yinghai Lu
      Relocation handling performs bounds checking on the resulting calculated
      addresses. The existing code uses output_len (VO size plus relocs size) as
      the max address. This is not right since the max_addr check should stop at
      the end of VO and exclude bss, brk, etc., which follow.  The valid range
      should be VO [_text, __bss_start] in the loaded physical address space.
      
      This patch adds an export for __bss_start in voffset.h and uses it to
      set the correct limit for max_addr.
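      Schematically (symbol names per the changelog; treat this as a sketch):

        	/*
        	 * Sketch: relocated addresses must land inside the loaded VO
        	 * image, i.e. in [_text, __bss_start), not in the bss/brk
        	 * that follow it.
        	 */
        	min_addr = map;
        	max_addr = map + (VO___bss_start - VO__text);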
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      [ Rewrote the changelog. ]
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: lasse.collin@tukaani.org
      Link: http://lkml.kernel.org/r/1461888548-32439-7-git-send-email-keescook@chromium.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/KASLR: Clean up unused code from old 'run_size' and rename it to 'kernel_total_size' · 4d2d5424
      Committed by Yinghai Lu
      Since 'run_size' is now calculated in misc.c, the old script and associated
      argument passing is no longer needed. This patch removes them, and renames
      'run_size' to the more descriptive 'kernel_total_size'.
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: Baoquan He <bhe@redhat.com>
      [ Rewrote the changelog, renamed 'run_size' to 'kernel_total_size' ]
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Junjie Mao <eternal.n08@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: lasse.collin@tukaani.org
      Link: http://lkml.kernel.org/r/1461888548-32439-6-git-send-email-keescook@chromium.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/boot: Fix "run_size" calculation · 67b66625
      Committed by Yinghai Lu
      Currently, the "run_size" variable holds the total kernel size
      (size of code plus brk and bss) and is calculated via the shell script
      arch/x86/tools/calc_run_size.sh. It gets the file offset and mem size
      of the .bss and .brk sections from the vmlinux, and adds them as follows:
      
        run_size = $(( $offsetA + $sizeA + $sizeB ))
      
      However, this is not correct (it is too large). To illustrate, here's
      a walk-through of the script's calculation, compared to the correct way
      to find it.
      
      First, offsetA is found as the starting address of the first .bss or
      .brk section seen in the ELF file. The sizeA and sizeB values are the
      respective section sizes.
      
       [bhe@x1 linux]$ objdump -h vmlinux
      
       vmlinux:     file format elf64-x86-64
      
       Sections:
       Idx Name    Size      VMA               LMA               File off  Algn
        27 .bss    00170000  ffffffff81ec8000  0000000001ec8000  012c8000  2**12
                   ALLOC
        28 .brk    00027000  ffffffff82038000  0000000002038000  012c8000  2**0
                   ALLOC
      
      Here, offsetA is 0x012c8000, with sizeA at 0x00170000 and sizeB at
      0x00027000. The resulting run_size is 0x145f000:
      
       0x012c8000 + 0x00170000 + 0x00027000 = 0x145f000
      
      However, if we instead examine the ELF LOAD program headers, we see a
      different picture.
      
       [bhe@x1 linux]$ readelf -l vmlinux
      
       Elf file type is EXEC (Executable file)
       Entry point 0x1000000
       There are 5 program headers, starting at offset 64
      
       Program Headers:
        Type        Offset             VirtAddr           PhysAddr
                    FileSiz            MemSiz              Flags  Align
        LOAD        0x0000000000200000 0xffffffff81000000 0x0000000001000000
                    0x0000000000b5e000 0x0000000000b5e000  R E    200000
        LOAD        0x0000000000e00000 0xffffffff81c00000 0x0000000001c00000
                    0x0000000000145000 0x0000000000145000  RW     200000
        LOAD        0x0000000001000000 0x0000000000000000 0x0000000001d45000
                    0x0000000000018158 0x0000000000018158  RW     200000
        LOAD        0x000000000115e000 0xffffffff81d5e000 0x0000000001d5e000
                    0x000000000016a000 0x0000000000301000  RWE    200000
        NOTE        0x000000000099bcac 0xffffffff8179bcac 0x000000000179bcac
                    0x00000000000001bc 0x00000000000001bc         4
      
       Section to Segment mapping:
        Segment Sections...
         00     .text .notes __ex_table .rodata __bug_table .pci_fixup .tracedata
                __ksymtab __ksymtab_gpl __ksymtab_strings __init_rodata __param
                __modver
         01     .data .vvar
         02     .data..percpu
         03     .init.text .init.data .x86_cpu_dev.init .parainstructions
                .altinstructions .altinstr_replacement .iommu_table .apicdrivers
                .exit.text .smp_locks .bss .brk
         04     .notes
      
      As mentioned, run_size needs to be the size of the running kernel
      including .bss and .brk. We can see from the Section/Segment mapping
      above that .bss and .brk are included in segment 03 (which corresponds
      to the final LOAD program header). To find the run_size, we calculate
      the end of the LOAD segment from its PhysAddr start (0x0000000001d5e000)
      and its MemSiz (0x0000000000301000), minus the physical load address of
      the kernel (the first LOAD segment's PhysAddr: 0x0000000001000000). The
      resulting run_size is 0x105f000:
      
       0x0000000001d5e000 + 0x0000000000301000 - 0x0000000001000000 = 0x105f000
      
      So, from this we can see that the existing run_size calculation is
      0x400000 too high. And, as it turns out, the correct run_size is
      actually equal to VO_end - VO_text, which is certainly easier to calculate:

       _end:  0xffffffff8205f000
       _text: 0xffffffff81000000
      
       0xffffffff8205f000 - 0xffffffff81000000 = 0x105f000
      
      As a result, run_size is a simple constant, so we don't need to pass it
      around; we already have voffset.h for such things. We can share voffset.h
      between misc.c and header.S instead of getting run_size in other ways.
      This patch moves voffset.h creation code to boot/compressed/Makefile,
      and switches misc.c to use the VO_end - VO_text calculation for run_size.
      
      Dependence before:
      
       boot/header.S ==> boot/voffset.h ==> vmlinux
       boot/header.S ==> compressed/vmlinux ==> compressed/misc.c
      
      Dependence after:
      
       boot/header.S ==> compressed/vmlinux ==> compressed/misc.c ==> boot/voffset.h ==> vmlinux
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: Baoquan He <bhe@redhat.com>
      [ Rewrote the changelog. ]
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Junjie Mao <eternal.n08@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: lasse.collin@tukaani.org
      Fixes: e6023367 ("x86, kaslr: Prevent .bss from overlaping initrd")
      Link: http://lkml.kernel.org/r/1461888548-32439-5-git-send-email-keescook@chromium.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/boot: Calculate decompression size during boot not build · d607251b
      Committed by Yinghai Lu
      Currently z_extract_offset is calculated in boot/compressed/mkpiggy.c.
      This doesn't work well because mkpiggy.c doesn't know the details of the
      decompressor in use. As a result, it can only make an estimation, which
      has risks:
      
       - output + output_len (VO) could be much bigger than input + input_len
         (ZO). In this case, the decompressed kernel plus relocs could overwrite
         the decompression code while it is running.
      
       - The head code of ZO could be bigger than z_extract_offset. In this case
         an overwrite could happen when the head code is running to move ZO to
         the end of buffer. Though currently the size of the head code is very
         small it's still a potential risk. Since there is no rule to limit the
         size of the head code of ZO, it runs the risk of suddenly becoming a
         (hard to find) bug.
      
      Instead, this moves the z_extract_offset calculation into header.S, and
      makes adjustments to be sure that the above two cases can never happen,
      and further corrects the comments describing the calculations.
      
      Since we have (in the previous patch) made ZO always be located against
      the end of the decompression buffer, z_extract_offset is only used here to
      calculate an appropriate buffer size (INIT_SIZE), and is no longer used
      elsewhere. As such, it can be removed from voffset.h.
      
      Additionally, clean up the #if/#else #define blocks to improve readability.
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: Baoquan He <bhe@redhat.com>
      [ Rewrote the changelog and comments. ]
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: lasse.collin@tukaani.org
      Link: http://lkml.kernel.org/r/1461888548-32439-4-git-send-email-keescook@chromium.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/boot: Move compressed kernel to the end of the decompression buffer · 974f221c
      Committed by Yinghai Lu
      This change makes later calculations about where the kernel is located
      easier to reason about. To better understand this change, we must first
      clarify what 'VO' and 'ZO' are. These values were introduced in commits
      by hpa:
      
        77d1a499 ("x86, boot: make symbols from the main vmlinux available")
        37ba7ab5 ("x86, boot: make kernel_alignment adjustable; new bzImage fields")
      
      Specifically:
      
      All names prefixed with 'VO_':
      
       - relate to the uncompressed kernel image
      
       - the size of the VO image is: VO__end - VO__text (the "VO_INIT_SIZE" define)
      
      All names prefixed with 'ZO_':
      
       - relate to the bootable compressed kernel image (boot/compressed/vmlinux),
         which is composed of the following memory areas:
           - head text
           - compressed kernel (VO image and relocs table)
           - decompressor code
      
       - the size of the ZO image is: ZO__end - ZO_startup_32 ("ZO_INIT_SIZE" define, though see below)
      
      The 'INIT_SIZE' value is used to find the larger of the two image sizes:
      
       #define ZO_INIT_SIZE    (ZO__end - ZO_startup_32 + ZO_z_extract_offset)
       #define VO_INIT_SIZE    (VO__end - VO__text)
      
       #if ZO_INIT_SIZE > VO_INIT_SIZE
       # define INIT_SIZE ZO_INIT_SIZE
       #else
       # define INIT_SIZE VO_INIT_SIZE
       #endif
      
      The current code uses extract_offset to decide where to position the
      copied ZO (i.e. ZO starts at extract_offset). (This is why ZO_INIT_SIZE
      currently includes the extract_offset.)
      
      Why does z_extract_offset exist? It's needed because we are trying to minimize
      the amount of RAM used for the whole act of creating an uncompressed, executable,
      properly relocation-linked kernel image in system memory. We do this so that
      kernels can be booted on even very small systems.
      
      To achieve the goal of minimal memory consumption we implemented an
      in-place decompression strategy: instead of cleanly separating the VO
      and ZO images and allocating extra memory for the decompression code's
      runtime needs, we create an elaborate layout of memory buffers where
      the output (decompressed) stream, as it progresses, overlaps with and
      destroys the input (compressed) stream. This can only be done safely
      if the ZO image is placed at the end of the VO range, plus a certain
      safety distance, to make sure that when the last bytes of the VO range
      are decompressed, the compressed stream pointer is safely beyond the
      end of the VO range.
      
      z_extract_offset is calculated in arch/x86/boot/compressed/mkpiggy.c during
      the build process, at a point when we know the exact compressed and
      uncompressed size of the kernel images and can calculate this safe minimum
      offset value. (Note that the mkpiggy.c calculation is not perfect, because
      we don't know the decompressor used at that stage, so the z_extract_offset
      calculation is necessarily imprecise and is mostly based on gzip internals -
      we'll improve that in the next patch.)
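
      For illustration, a sketch of the kind of build-time estimate that
      mkpiggy.c makes (a hypothetical C fragment; 'ilen' and 'olen' stand
      for the compressed and uncompressed lengths, and the constants reflect
      gzip's worst-case bookkeeping):

       /* Sketch of the build-time estimate (names hypothetical). */
       static unsigned long estimate_extract_offset(unsigned long ilen, /* compressed   */
                                                    unsigned long olen) /* uncompressed */
       {
               unsigned long offs;

               offs  = (olen > ilen) ? olen - ilen : 0; /* worst-case overrun */
               offs += olen >> 12;          /* gzip: 8 bytes per 32K block    */
               offs += 64 * 1024 + 128;     /* slack for decompressor needs   */
               return (offs + 4095) & ~4095UL; /* round up to a 4K boundary   */
       }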
      
      When INIT_SIZE is bigger than VO_INIT_SIZE (uncommon but possible), the
      copied ZO occupies the memory from extract_offset to the end of the
      decompression buffer. It overlaps with the soon-to-be-uncompressed
      kernel like this:
      
                                  |-----compressed kernel image------|
                                  V                                  V
      0                       extract_offset                      +INIT_SIZE
      |-----------|---------------|-------------------------|--------|
                  |               |                         |        |
                VO__text      startup_32 of ZO          VO__end    ZO__end
                  ^                                         ^
                  |-------uncompressed kernel image---------|
      
      When INIT_SIZE is equal to VO_INIT_SIZE (the likely case), there is
      still space left from the end of ZO to the end of the decompression
      buffer, as shown below.
      
                                  |-compressed kernel image-|
                                  V                         V
      0                       extract_offset                      +INIT_SIZE
      |-----------|---------------|-------------------------|--------|
                  |               |                         |        |
                VO__text      startup_32 of ZO          ZO__end    VO__end
                  ^                                                  ^
                  |------------uncompressed kernel image-------------|
      
      To simplify calculations and avoid special cases, it is cleaner to
      always place the compressed kernel image in memory so that ZO__end
      is at the end of the decompression buffer, instead of placing it at
      extract_offset as is currently done.
      
      This patch adds BP_init_size (which is the INIT_SIZE as passed in from
      the boot_params) into asm-offsets.c to make it visible to the assembly
      code.
      
      Then, when moving ZO, it calculates the starting position of the copied
      ZO (via BP_init_size and the ZO run size) so that ZO__end lands at the
      end of the decompression buffer. To make the position calculation safe,
      the end of ZO is page aligned (and a comment is added to the existing
      VO alignment for good measure). A small sketch of the arithmetic
      follows.
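
      A minimal C sketch of that placement arithmetic, assuming illustrative
      names (the real code does this in assembly, using the BP_init_size
      offset generated by asm-offsets.c):

       /* Illustrative only: where to copy ZO so that its end coincides
        * with the end of the decompression buffer. */
       unsigned long zo_copy_dest(unsigned long load_addr,   /* runtime VO base   */
                                  unsigned long init_size,   /* buffer size       */
                                  unsigned long zo_run_size) /* ZO__end - startup */
       {
               /* dest + zo_run_size == load_addr + init_size (buffer end) */
               return load_addr + init_size - zo_run_size;
       }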
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      [ Rewrote changelog and comments. ]
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: lasse.collin@tukaani.org
      Link: http://lkml.kernel.org/r/1461888548-32439-3-git-send-email-keescook@chromium.org
      [ Rewrote the changelog some more. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      974f221c
    • B
      x86/KASLR: Handle kernel relocations above 2G correctly · 6f9af75f
      Committed by Baoquan He
      When processing the relocation table, the offset used to calculate the
      relocation is an 'int'. This is sufficient for calculating the physical
      address of the relocs entry on 32-bit systems and on 64-bit systems when
      the relocation is under 2G.
      
      To handle relocations above 2G (seen in situations like kexec, netboot,
      etc.), this offset needs to be calculated using a 'long' to avoid
      wrapping and miscalculating the relocation.
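
      A self-contained user-space sketch of the failure mode (the value is
      made up, and the cast stands in for the relocs-processing arithmetic;
      the kernel fix itself simply widens the offset type):

       #include <stdint.h>
       #include <stdio.h>

       int main(void)
       {
               uint64_t delta = 0xA0000000ULL;  /* a relocation delta above 2G */
               int  as_int  = (int)delta;       /* truncates, goes negative    */
               long as_long = (long)delta;      /* preserved on LP64 targets   */

               printf("as int:  %d\n",  as_int);   /* -1610612736 (wrapped)  */
               printf("as long: %ld\n", as_long);  /*  2684354560 (correct)  */
               return 0;
       }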
      Signed-off-by: Baoquan He <bhe@redhat.com>
      [ Rewrote the changelog. ]
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: lasse.collin@tukaani.org
      Link: http://lkml.kernel.org/r/1461888548-32439-2-git-send-email-keescook@chromium.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      6f9af75f
  14. 28 Apr, 2016 3 commits