1. 27 7月, 2016 1 次提交
  2. 22 7月, 2016 2 次提交
    • A
      x86/boot: Simplify EBDA-vs-BIOS reservation logic · 6a79296c
      Andy Lutomirski 提交于
      Both the intent and the effect of reserve_bios_regions() is simple:
      reserve the range from the apparent BIOS start (suitably filtered)
      through 1MB and, if the EBDA start address is sensible, extend that
      reservation downward to cover the EBDA as well.
      
      The code is overcomplicated, though, and contains head-scratchers
      like:
      
      	if (ebda_start < BIOS_START_MIN)
      		ebda_start = BIOS_START_MAX;
      
      That snipped is trying to say "if ebda_start < BIOS_START_MIN,
      ignore it".
      
      Simplify it: reorder the code so that it makes sense.  This should
      have no functional effect under any circumstances.
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Mario Limonciello <mario_limonciello@dell.com>
      Cc: Matthew Garrett <mjg59@srcf.ucam.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Link: http://lkml.kernel.org/r/ef89c0c761be20ead8bd9a3275743e6259b6092a.1469135598.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      6a79296c
    • D
      x86/fpu: Do not BUG_ON() in early FPU code · ec3ed4a2
      Dave Hansen 提交于
      I don't think it is really possible to have a system where CPUID
      enumerates support for XSAVE but that it does not have FP/SSE
      (they are "legacy" features and always present).
      
      But, I did manage to hit this case in qemu when I enabled its
      somewhat shaky XSAVE support.  The bummer is that the FPU is set
      up before we parse the command-line or have *any* console support
      including earlyprintk.  That turned what should have been an easy
      thing to debug in to a bit more of an odyssey.
      
      So a BUG() here is worthless.  All it does it guarantee that
      if/when we hit this case we have an empty console.  So, remove
      the BUG() and try to limp along by disabling XSAVE and trying to
      continue.  Add a comment on why we are doing this, and also add
      a common "out_disable" path for leaving fpu__init_system_xstate().
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20160720194551.63BB2B58@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      ec3ed4a2
  3. 21 7月, 2016 1 次提交
    • I
      x86/boot: Reorganize and clean up the BIOS area reservation code · edce2121
      Ingo Molnar 提交于
      So the reserve_ebda_region() code has accumulated a number of
      problems over the years that make it really difficult to read
      and understand:
      
      - The calculation of 'lowmem' and 'ebda_addr' is an unnecessarily
        interleaved mess of first lowmem, then ebda_addr, then lowmem tweaks...
      
      - 'lowmem' here means 'super low mem' - i.e. 16-bit addressable memory. In other
        parts of the x86 code 'lowmem' means 32-bit addressable memory... This makes it
        super confusing to read.
      
      - It does not help at all that we have various memory range markers, half of which
        are 'start of range', half of which are 'end of range' - but this crucial
        property is not obvious in the naming at all ... gave me a headache trying to
        understand all this.
      
      - Also, the 'ebda_addr' name sucks: it highlights that it's an address (which is
        obvious, all values here are addresses!), while it does not highlight that it's
        the _start_ of the EBDA region ...
      
      - 'BIOS_LOWMEM_KILOBYTES' says a lot of things, except that this is the only value
        that is a pointer to a value, not a memory range address!
      
      - The function name itself is a misnomer: it says 'reserve_ebda_region()' while
        its main purpose is to reserve all the firmware ROM typically between 640K and
        1MB, while the 'EBDA' part is only a small part of that ...
      
      - Likewise, the paravirt quirk flag name 'ebda_search' is misleading as well: this
        too should be about whether to reserve firmware areas in the paravirt case.
      
      - In fact thinking about this as 'end of RAM' is confusing: what this function
        *really* wants to reserve is firmware data and code areas! Once the thinking is
        inverted from a mixed 'ram' and 'reserved firmware area' notion to a pure
        'reserved area' notion everything becomes a lot clearer.
      
      To improve all this rewrite the whole code (without changing the logic):
      
      - Firstly invert the naming from 'lowmem end' to 'BIOS reserved area start'
        and propagate this concept through all the variable names and constants.
      
      	BIOS_RAM_SIZE_KB_PTR		// was: BIOS_LOWMEM_KILOBYTES
      
      	BIOS_START_MIN			// was: INSANE_CUTOFF
      
      	ebda_start			// was: ebda_addr
      	bios_start			// was: lowmem
      
      	BIOS_START_MAX			// was: LOWMEM_CAP
      
      - Then clean up the name of the function itself by renaming it
        to reserve_bios_regions() and renaming the ::ebda_search paravirt
        flag to ::reserve_bios_regions.
      
      - Fix up all the comments (fix typos), harmonize and simplify their
        formulation and remove comments that become unnecessary due to
        the much better naming all around.
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      edce2121
  4. 15 7月, 2016 9 次提交
  5. 12 7月, 2016 3 次提交
  6. 11 7月, 2016 8 次提交
    • Y
      x86/fpu/xstate: Re-enable XSAVES · b8be15d5
      Yu-cheng Yu 提交于
      We did not handle XSAVES instructions correctly. There were issues in
      converting between standard and compacted format when interfacing with
      user-space. These issues have been corrected.
      
      Add a WARN_ONCE() to make it clear that XSAVES supervisor states are not
      yet implemented.
      Signed-off-by: NYu-cheng Yu <yu-cheng.yu@intel.com>
      Signed-off-by: NFenghua Yu <fenghua.yu@intel.com>
      Reviewed-by: NDave Hansen <dave.hansen@intel.com>
      Cc: H. Peter Anvin <h.peter.anvin@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi V Shankar <ravi.v.shankar@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1468253937-40008-5-git-send-email-fenghua.yu@intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      b8be15d5
    • Y
      x86/fpu/xstate: Fix fpstate_init() for XRSTORS · 35ac2d7b
      Yu-cheng Yu 提交于
      In XSAVES mode if fpstate_init() is used to initialize a
      task's extended state area, xsave.header.xcomp_bv[63] must
      be set. Otherwise, when the task is scheduled, a warning is
      triggered from copy_kernel_to_xregs().
      
      One such test case is: setting an invalid extended state
      through PTRACE. When xstateregs_set() rejects the syscall
      and re-initializes the task's extended state area. This triggers
      the warning mentioned above.
      Signed-off-by: NYu-cheng Yu <yu-cheng.yu@intel.com>
      Signed-off-by: NFenghua Yu <fenghua.yu@intel.com>
      Reviewed-by: NDave Hansen <dave.hansen@intel.com>
      Cc: H. Peter Anvin <h.peter.anvin@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi V Shankar <ravi.v.shankar@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1468253937-40008-4-git-send-email-fenghua.yu@intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      35ac2d7b
    • Y
      x86/fpu/xstate: Return NULL for disabled xstate component address · 5060b915
      Yu-cheng Yu 提交于
      It is an error to request a disabled XSAVE/XSAVES component address.
      For that case, make __raw_xsave_addr() return a NULL and issue a
      warning.
      Signed-off-by: NYu-cheng Yu <yu-cheng.yu@intel.com>
      Signed-off-by: NFenghua Yu <fenghua.yu@intel.com>
      Reviewed-by: NDave Hansen <dave.hansen@intel.com>
      Cc: H. Peter Anvin <h.peter.anvin@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi V Shankar <ravi.v.shankar@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1468253937-40008-3-git-send-email-fenghua.yu@intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      5060b915
    • Y
      x86/fpu/xstate: Fix __fpu_restore_sig() for XSAVES · 1fc2b67b
      Yu-cheng Yu 提交于
      When the kernel is using XSAVES compacted format, we cannot do
      __copy_from_user() from a signal frame, which has standard-format data.
      Fix it by using copyin_to_xsaves(), which converts between formats and
      filters out all supervisor states that we do not allow userspace to
      write.
      Signed-off-by: NYu-cheng Yu <yu-cheng.yu@intel.com>
      Signed-off-by: NFenghua Yu <fenghua.yu@intel.com>
      Reviewed-by: NDave Hansen <dave.hansen@intel.com>
      Cc: H. Peter Anvin <h.peter.anvin@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi V Shankar <ravi.v.shankar@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1468253937-40008-2-git-send-email-fenghua.yu@intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      1fc2b67b
    • I
      Revert "perf/x86/intel, watchdog: Switch NMI watchdog to ref cycles on x86" · 44530d58
      Ingo Molnar 提交于
      This reverts commit 2c95afc1.
      
      Stephane reported the following regression:
      
       > Since Andi added:
       >
       > commit 2c95afc1
       > Author: Andi Kleen <ak@linux.intel.com>
       > Date:   Thu Jun 9 06:14:38 2016 -0700
       >
       >    perf/x86/intel, watchdog: Switch NMI watchdog to ref cycles on x86
       >
       > $ perf stat -e ref-cycles ls
       >   <not counted> ....
       >
       > fails systematically because the ref-cycles is now used by the
       > watchdog and given this is a system-wide pinned event, it monopolizes
       > the fixed counter 2 which is the only counter able to measure this event.
      
      Since the next merge window is near, fix the regression for now
      by reverting the commit.
      Reported-by: NStephane Eranian <eranian@google.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      44530d58
    • L
      x86/quirks: Add early quirk to reset Apple AirPort card · abb2bafd
      Lukas Wunner 提交于
      The EFI firmware on Macs contains a full-fledged network stack for
      downloading OS X images from osrecovery.apple.com. Unfortunately
      on Macs introduced 2011 and 2012, EFI brings up the Broadcom 4331
      wireless card on every boot and leaves it enabled even after
      ExitBootServices has been called. The card continues to assert its IRQ
      line, causing spurious interrupts if the IRQ is shared. It also corrupts
      memory by DMAing received packets, allowing for remote code execution
      over the air. This only stops when a driver is loaded for the wireless
      card, which may be never if the driver is not installed or blacklisted.
      
      The issue seems to be constrained to the Broadcom 4331. Chris Milsted
      has verified that the newer Broadcom 4360 built into the MacBookPro11,3
      (2013/2014) does not exhibit this behaviour. The chances that Apple will
      ever supply a firmware fix for the older machines appear to be zero.
      
      The solution is to reset the card on boot by writing to a reset bit in
      its mmio space. This must be done as an early quirk and not as a plain
      vanilla PCI quirk to successfully combat memory corruption by DMAed
      packets: Matthew Garrett found out in 2012 that the packets are written
      to EfiBootServicesData memory (http://mjg59.dreamwidth.org/11235.html).
      This type of memory is made available to the page allocator by
      efi_free_boot_services(). Plain vanilla PCI quirks run much later, in
      subsys initcall level. In-between a time window would be open for memory
      corruption. Random crashes occurring in this time window and attributed
      to DMAed packets have indeed been observed in the wild by Chris
      Bainbridge.
      
      When Matthew Garrett analyzed the memory corruption issue in 2012, he
      sought to fix it with a grub quirk which transitions the card to D3hot:
      http://git.savannah.gnu.org/cgit/grub.git/commit/?id=9d34bb85da56
      
      This approach does not help users with other bootloaders and while it
      may prevent DMAed packets, it does not cure the spurious interrupts
      emanating from the card. Unfortunately the card's mmio space is
      inaccessible in D3hot, so to reset it, we have to undo the effect of
      Matthew's grub patch and transition the card back to D0.
      
      Note that the quirk takes a few shortcuts to reduce the amount of code:
      The size of BAR 0 and the location of the PM capability is identical
      on all affected machines and therefore hardcoded. Only the address of
      BAR 0 differs between models. Also, it is assumed that the BCMA core
      currently mapped is the 802.11 core. The EFI driver seems to always take
      care of this.
      
      Michael Büsch, Bjorn Helgaas and Matt Fleming contributed feedback
      towards finding the best solution to this problem.
      
      The following should be a comprehensive list of affected models:
          iMac13,1        2012  21.5"       [Root Port 00:1c.3 = 8086:1e16]
          iMac13,2        2012  27"         [Root Port 00:1c.3 = 8086:1e16]
          Macmini5,1      2011  i5 2.3 GHz  [Root Port 00:1c.1 = 8086:1c12]
          Macmini5,2      2011  i5 2.5 GHz  [Root Port 00:1c.1 = 8086:1c12]
          Macmini5,3      2011  i7 2.0 GHz  [Root Port 00:1c.1 = 8086:1c12]
          Macmini6,1      2012  i5 2.5 GHz  [Root Port 00:1c.1 = 8086:1e12]
          Macmini6,2      2012  i7 2.3 GHz  [Root Port 00:1c.1 = 8086:1e12]
          MacBookPro8,1   2011  13"         [Root Port 00:1c.1 = 8086:1c12]
          MacBookPro8,2   2011  15"         [Root Port 00:1c.1 = 8086:1c12]
          MacBookPro8,3   2011  17"         [Root Port 00:1c.1 = 8086:1c12]
          MacBookPro9,1   2012  15"         [Root Port 00:1c.1 = 8086:1e12]
          MacBookPro9,2   2012  13"         [Root Port 00:1c.1 = 8086:1e12]
          MacBookPro10,1  2012  15"         [Root Port 00:1c.1 = 8086:1e12]
          MacBookPro10,2  2012  13"         [Root Port 00:1c.1 = 8086:1e12]
      
      For posterity, spurious interrupts caused by the Broadcom 4331 wireless
      card resulted in splats like this (stacktrace omitted):
      
          irq 17: nobody cared (try booting with the "irqpoll" option)
          handlers:
          [<ffffffff81374370>] pcie_isr
          [<ffffffffc0704550>] sdhci_irq [sdhci] threaded [<ffffffffc07013c0>] sdhci_thread_irq [sdhci]
          [<ffffffffc0a0b960>] azx_interrupt [snd_hda_codec]
          Disabling IRQ #17
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=79301
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=111781
      Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=728916
      Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=895951#c16
      Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1009819
      Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1098621
      Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1149632#c5
      Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1279130
      Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1332732
      Tested-by: Konstantin Simanov <k.simanov@stlk.ru>        # [MacBookPro8,1]
      Tested-by: Lukas Wunner <lukas@wunner.de>                # [MacBookPro9,1]
      Tested-by: Bryan Paradis <bryan.paradis@gmail.com>       # [MacBookPro9,2]
      Tested-by: Andrew Worsley <amworsley@gmail.com>          # [MacBookPro10,1]
      Tested-by: Chris Bainbridge <chris.bainbridge@gmail.com> # [MacBookPro10,2]
      Signed-off-by: NLukas Wunner <lukas@wunner.de>
      Acked-by: NRafał Miłecki <zajec5@gmail.com>
      Acked-by: NMatt Fleming <matt@codeblueprint.co.uk>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Chris Milsted <cmilsted@redhat.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Matthew Garrett <mjg59@srcf.ucam.org>
      Cc: Michael Buesch <m@bues.ch>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: b43-dev@lists.infradead.org
      Cc: linux-pci@vger.kernel.org
      Cc: linux-wireless@vger.kernel.org
      Cc: stable@vger.kernel.org
      Cc: stable@vger.kernel.org # 123456789abc: x86/quirks: Apply nvidia_bugs quirk only on root bus
      Cc: stable@vger.kernel.org # 123456789abc: x86/quirks: Reintroduce scanning of secondary buses
      Link: http://lkml.kernel.org/r/48d0972ac82a53d460e5fce77a07b2560db95203.1465690253.git.lukas@wunner.de
      [ Did minor readability edits. ]
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      abb2bafd
    • L
      x86/quirks: Reintroduce scanning of secondary buses · 850c3210
      Lukas Wunner 提交于
      We used to scan secondary buses until the following commit that
      was applied in 2009:
      
        8659c406 ("x86: only scan the root bus in early PCI quirks")
      
      which commit constrained early quirks to the root bus only. Its
      motivation was to prevent application of the nvidia_bugs quirk
      on secondary buses.
      
      We're about to add a quirk to reset the Broadcom 4331 wireless card on
      2011/2012 Macs, which is located on a secondary bus behind a PCIe root
      port. To facilitate that, reintroduce scanning of secondary buses.
      
      The commit message of 8659c406 notes that scanning only the root bus
      "saves quite some unnecessary scanning work". The algorithm used prior
      to 8659c406 was particularly time consuming because it scanned
      buses 0 to 31 brute force. To avoid lengthening boot time, employ a
      recursive strategy which only scans buses that are actually reachable
      from the root bus.
      
      Yinghai Lu pointed out that the secondary bus number read from a
      bridge's config space may be invalid, in particular a value of 0 would
      cause an infinite loop. The PCI core goes beyond that and recurses to a
      child bus only if its bus number is greater than the parent bus number
      (see pci_scan_bridge()). Since the root bus is numbered 0, this implies
      that secondary buses may not be 0. Do the same on early scanning.
      
      If this algorithm is found to significantly impact boot time or cause
      infinite loops on broken hardware, it would be possible to limit its
      recursion depth: The Broadcom 4331 quirk applies at depth 1, all others
      at depth 0, so the bus need not be scanned deeper than that for now. An
      alternative approach would be to revert to scanning only the root bus,
      and apply the Broadcom 4331 quirk to the root ports 8086:1c12, 8086:1e12
      and 8086:1e16. Apple always positioned the card behind either of these
      three ports. The quirk would then check presence of the card in slot 0
      below the root port and do its deed.
      Signed-off-by: NLukas Wunner <lukas@wunner.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: linux-pci@vger.kernel.org
      Link: http://lkml.kernel.org/r/f0daa70dac1a9b2483abdb31887173eb6ab77bdf.1465690253.git.lukas@wunner.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      850c3210
    • L
      x86/quirks: Apply nvidia_bugs quirk only on root bus · 447d29d1
      Lukas Wunner 提交于
      Since the following commit:
      
        8659c406 ("x86: only scan the root bus in early PCI quirks")
      
      ... early quirks are only applied to devices on the root bus.
      
      The motivation was to prevent application of the nvidia_bugs quirk on
      secondary buses.
      
      We're about to reintroduce scanning of secondary buses for a quirk to
      reset the Broadcom 4331 wireless card on 2011/2012 Macs. To prevent
      regressions, open code the requirement to apply nvidia_bugs only on the
      root bus.
      Signed-off-by: NLukas Wunner <lukas@wunner.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Link: http://lkml.kernel.org/r/4d5477c1d76b2f0387a780f2142bbcdd9fee869b.1465690253.git.lukas@wunner.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      447d29d1
  7. 10 7月, 2016 11 次提交
  8. 08 7月, 2016 5 次提交
    • T
      x86/mm: Enable KASLR for physical mapping memory regions · 021182e5
      Thomas Garnier 提交于
      Add the physical mapping in the list of randomized memory regions.
      
      The physical memory mapping holds most allocations from boot and heap
      allocators. Knowing the base address and physical memory size, an attacker
      can deduce the PDE virtual address for the vDSO memory page. This attack
      was demonstrated at CanSecWest 2016, in the following presentation:
      
        "Getting Physical: Extreme Abuse of Intel Based Paged Systems":
        https://github.com/n3k/CansecWest2016_Getting_Physical_Extreme_Abuse_of_Intel_Based_Paging_Systems/blob/master/Presentation/CanSec2016_Presentation.pdf
      
      (See second part of the presentation).
      
      The exploits used against Linux worked successfully against 4.6+ but
      fail with KASLR memory enabled:
      
        https://github.com/n3k/CansecWest2016_Getting_Physical_Extreme_Abuse_of_Intel_Based_Paging_Systems/tree/master/Demos/Linux/exploits
      
      Similar research was done at Google leading to this patch proposal.
      
      Variants exists to overwrite /proc or /sys objects ACLs leading to
      elevation of privileges. These variants were tested against 4.6+.
      
      The page offset used by the compressed kernel retains the static value
      since it is not yet randomized during this boot stage.
      Signed-off-by: NThomas Garnier <thgarnie@google.com>
      Signed-off-by: NKees Cook <keescook@chromium.org>
      Cc: Alexander Kuleshov <kuleshovmail@gmail.com>
      Cc: Alexander Popov <alpopov@ptsecurity.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jan Beulich <JBeulich@suse.com>
      Cc: Joerg Roedel <jroedel@suse.de>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Lv Zheng <lv.zheng@intel.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephen Smalley <sds@tycho.nsa.gov>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: kernel-hardening@lists.openwall.com
      Cc: linux-doc@vger.kernel.org
      Link: http://lkml.kernel.org/r/1466556426-32664-7-git-send-email-keescook@chromium.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      021182e5
    • T
      x86/mm: Implement ASLR for kernel memory regions · 0483e1fa
      Thomas Garnier 提交于
      Randomizes the virtual address space of kernel memory regions for
      x86_64. This first patch adds the infrastructure and does not randomize
      any region. The following patches will randomize the physical memory
      mapping, vmalloc and vmemmap regions.
      
      This security feature mitigates exploits relying on predictable kernel
      addresses. These addresses can be used to disclose the kernel modules
      base addresses or corrupt specific structures to elevate privileges
      bypassing the current implementation of KASLR. This feature can be
      enabled with the CONFIG_RANDOMIZE_MEMORY option.
      
      The order of each memory region is not changed. The feature looks at the
      available space for the regions based on different configuration options
      and randomizes the base and space between each. The size of the physical
      memory mapping is the available physical memory. No performance impact
      was detected while testing the feature.
      
      Entropy is generated using the KASLR early boot functions now shared in
      the lib directory (originally written by Kees Cook). Randomization is
      done on PGD & PUD page table levels to increase possible addresses. The
      physical memory mapping code was adapted to support PUD level virtual
      addresses. This implementation on the best configuration provides 30,000
      possible virtual addresses in average for each memory region.  An
      additional low memory page is used to ensure each CPU can start with a
      PGD aligned virtual address (for realmode).
      
      x86/dump_pagetable was updated to correctly display each region.
      
      Updated documentation on x86_64 memory layout accordingly.
      
      Performance data, after all patches in the series:
      
      Kernbench shows almost no difference (-+ less than 1%):
      
      Before:
      
      Average Optimal load -j 12 Run (std deviation): Elapsed Time 102.63 (1.2695)
      User Time 1034.89 (1.18115) System Time 87.056 (0.456416) Percent CPU 1092.9
      (13.892) Context Switches 199805 (3455.33) Sleeps 97907.8 (900.636)
      
      After:
      
      Average Optimal load -j 12 Run (std deviation): Elapsed Time 102.489 (1.10636)
      User Time 1034.86 (1.36053) System Time 87.764 (0.49345) Percent CPU 1095
      (12.7715) Context Switches 199036 (4298.1) Sleeps 97681.6 (1031.11)
      
      Hackbench shows 0% difference on average (hackbench 90 repeated 10 times):
      
      attemp,before,after 1,0.076,0.069 2,0.072,0.069 3,0.066,0.066 4,0.066,0.068
      5,0.066,0.067 6,0.066,0.069 7,0.067,0.066 8,0.063,0.067 9,0.067,0.065
      10,0.068,0.071 average,0.0677,0.0677
      Signed-off-by: NThomas Garnier <thgarnie@google.com>
      Signed-off-by: NKees Cook <keescook@chromium.org>
      Cc: Alexander Kuleshov <kuleshovmail@gmail.com>
      Cc: Alexander Popov <alpopov@ptsecurity.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jan Beulich <JBeulich@suse.com>
      Cc: Joerg Roedel <jroedel@suse.de>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Lv Zheng <lv.zheng@intel.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephen Smalley <sds@tycho.nsa.gov>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: kernel-hardening@lists.openwall.com
      Cc: linux-doc@vger.kernel.org
      Link: http://lkml.kernel.org/r/1466556426-32664-6-git-send-email-keescook@chromium.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      0483e1fa
    • B
      x86/dumpstack: Add show_stack_regs() and use it · 81c2949f
      Borislav Petkov 提交于
      Add a helper to dump supplied pt_regs and use it in the MSR exception
      handling code to have precise stack traces pointing to the actual
      function causing the MSR access exception and not the stack frame of the
      exception handler itself.
      
      The new output looks like this:
      
       unchecked MSR access error: RDMSR from 0xdeadbeef at rIP: 0xffffffff8102ddb6 (early_init_intel+0x16/0x3a0)
        00000000756e6547 ffffffff81c03f68 ffffffff81dd0940 ffffffff81c03f10
        ffffffff81d42e65 0000000001000000 ffffffff81c03f58 ffffffff81d3e5a3
        0000800000000000 ffffffff81800080 ffffffffffffffff 0000000000000000
       Call Trace:
        [<ffffffff81d42e65>] early_cpu_init+0xe7/0x136
        [<ffffffff81d3e5a3>] setup_arch+0xa5/0x9df
        [<ffffffff81d38bb9>] start_kernel+0x9f/0x43a
        [<ffffffff81d38294>] x86_64_start_reservations+0x2f/0x31
        [<ffffffff81d383fe>] x86_64_start_kernel+0x168/0x176
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Reviewed-by: NAndy Lutomirski <luto@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1467671487-10344-4-git-send-email-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      81c2949f
    • A
      x86/dumpstack: Honor supplied @regs arg · ef16dd0c
      Andy Lutomirski 提交于
      The comment suggests that show_stack(NULL, NULL) should backtrace the
      current context, but the code doesn't match the comment. If regs are
      given, start the "Stack:" hexdump at regs->sp.
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1467671487-10344-2-git-send-email-bp@alien8.de
      Link: http://lkml.kernel.org/r/efcd79bf4106d61f1cd258c2caa87f3a0618eeac.1466036668.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      ef16dd0c
    • B
      x86/mce: Fix mce_rdmsrl() warning message · 38c54ccb
      Borislav Petkov 提交于
      The MSR address we're dumping in there should be in hex, otherwise we
      get funsies like:
      
        [    0.016000] WARNING: CPU: 1 PID: 0 at arch/x86/kernel/cpu/mcheck/mce.c:428 mce_rdmsrl+0xd9/0xe0
        [    0.016000] mce: Unable to read msr -1073733631!
      				       ^^^^^^^^^^^
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Link: http://lkml.kernel.org/r/1467968983-4874-5-git-send-email-bp@alien8.de
      [ Fixed capitalization of 'MSR'. ]
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      38c54ccb