1. 14 Jun 2017, 1 commit
  2. 04 Apr 2017, 1 commit
  3. 16 Mar 2017, 1 commit
    • x86/mm: Adapt MODULES_END based on fixmap section size · f06bdd40
      Authored by Thomas Garnier
      This patch aligns MODULES_END to the beginning of the fixmap section.
      It optimizes the space available for both sections. The address is
      pre-computed based on the number of pages required by the fixmap
      section.
      
      It will allow GDT remapping in the fixmap section. The current
      MODULES_END static address does not provide enough space for the kernel
      to support a large number of processors.
      Signed-off-by: Thomas Garnier <thgarnie@google.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jiri Kosina <jikos@kernel.org>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Lorenzo Stoakes <lstoakes@gmail.com>
      Cc: Luis R . Rodriguez <mcgrof@kernel.org>
      Cc: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Rafael J . Wysocki <rjw@rjwysocki.net>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: kasan-dev@googlegroups.com
      Cc: kernel-hardening@lists.openwall.com
      Cc: kvm@vger.kernel.org
      Cc: lguest@lists.ozlabs.org
      Cc: linux-doc@vger.kernel.org
      Cc: linux-efi@vger.kernel.org
      Cc: linux-mm@kvack.org
      Cc: linux-pm@vger.kernel.org
      Cc: xen-devel@lists.xenproject.org
      Cc: zijun_hu <zijun_hu@htc.com>
      Link: http://lkml.kernel.org/r/20170314170508.100882-1-thgarnie@google.com
      [ Small build fix. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
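      To illustrate the computation (a minimal standalone sketch; the
      constants below are assumptions for the example, not the kernel's
      actual values):

        #include <stdio.h>
        #include <stdint.h>

        #define PAGE_SHIFT      12
        #define FIXADDR_TOP     0xffffffffff600000ULL /* example top      */
        #define NR_FIXMAP_PAGES 1024    /* grows with the number of CPUs  */

        int main(void)
        {
            /* The fixmap grows downward from FIXADDR_TOP, so the module
             * area can end exactly where the fixmap begins instead of at
             * a static address. */
            uint64_t fixmap_size  = (uint64_t)NR_FIXMAP_PAGES << PAGE_SHIFT;
            uint64_t fixmap_start = FIXADDR_TOP - fixmap_size;
            uint64_t modules_end  = fixmap_start; /* pre-computed */

            printf("fixmap start: %#llx\n", (unsigned long long)fixmap_start);
            printf("MODULES_END : %#llx\n", (unsigned long long)modules_end);
            return 0;
        }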
  4. 26 Oct 2016, 1 commit
    • x86/dumpstack: Remove raw stack dump · 0ee1dd9f
      Authored by Josh Poimboeuf
      For mostly historical reasons, the x86 oops dump shows the raw stack
      values:
      
        ...
        [registers]
        Stack:
         ffff880079af7350 ffff880079905400 0000000000000000 ffffc900008f3ae0
         ffffffffa0196610 0000000000000001 00010000ffffffff 0000000087654321
         0000000000000002 0000000000000000 0000000000000000 0000000000000000
        Call Trace:
        ...
      
      This seems to be an artifact from long ago, and probably isn't needed
      anymore.  It generally just adds noise to the dump, and it can be
      actively harmful because it leaks kernel addresses.
      
      Linus says:
      
        "The stack dump actually goes back to forever, and it used to be
         useful back in 1992 or so. But it used to be useful mainly because
         stacks were simpler and we didn't have very good call traces anyway. I
         definitely remember having used them - I just do not remember having
         used them in the last ten+ years.
      
         Of course, it's still true that if you can trigger an oops, you've
         likely already lost the security game, but since the stack dump is so
         useless, let's aim to just remove it and make games like the above
         harder."
      
      This also removes the related 'kstack=' cmdline option and the
      'kstack_depth_to_print' sysctl.
      Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/e83bd50df52d8fe88e94d2566426ae40d813bf8f.1477405374.git.jpoimboe@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  5. 06 Sep 2016, 1 commit
  6. 04 Aug 2016, 1 commit
  7. 08 Jul 2016, 1 commit
    • x86/mm: Implement ASLR for kernel memory regions · 0483e1fa
      Authored by Thomas Garnier
      Randomizes the virtual address space of kernel memory regions for
      x86_64. This first patch adds the infrastructure and does not randomize
      any region. The following patches will randomize the physical memory
      mapping, vmalloc and vmemmap regions.
      
      This security feature mitigates exploits relying on predictable kernel
      addresses. These addresses can be used to disclose the base addresses
      of kernel modules or to corrupt specific structures to elevate
      privileges, bypassing the current implementation of KASLR. This
      feature can be enabled with the CONFIG_RANDOMIZE_MEMORY option.
      
      The order of each memory region is not changed. The feature looks at the
      available space for the regions based on different configuration options
      and randomizes the base and space between each. The size of the physical
      memory mapping is the available physical memory. No performance impact
      was detected while testing the feature.
      
      Entropy is generated using the KASLR early boot functions now shared in
      the lib directory (originally written by Kees Cook). Randomization is
      done on PGD & PUD page table levels to increase possible addresses. The
      physical memory mapping code was adapted to support PUD level virtual
      addresses. In the best configuration, this implementation provides on
      average 30,000 possible virtual addresses for each memory region.  An
      additional low memory page is used to ensure each CPU can start with a
      PGD-aligned virtual address (for realmode).
      
      x86/dump_pagetable was updated to correctly display each region.
      
      Updated documentation on x86_64 memory layout accordingly.
      
      Performance data, after all patches in the series:
      
      Kernbench shows almost no difference (+/- less than 1%):
      
      Before:
      
      Average Optimal load -j 12 Run (std deviation): Elapsed Time 102.63 (1.2695)
      User Time 1034.89 (1.18115) System Time 87.056 (0.456416) Percent CPU 1092.9
      (13.892) Context Switches 199805 (3455.33) Sleeps 97907.8 (900.636)
      
      After:
      
      Average Optimal load -j 12 Run (std deviation): Elapsed Time 102.489 (1.10636)
      User Time 1034.86 (1.36053) System Time 87.764 (0.49345) Percent CPU 1095
      (12.7715) Context Switches 199036 (4298.1) Sleeps 97681.6 (1031.11)
      
      Hackbench shows 0% difference on average (hackbench 90 repeated 10 times):
      
      attempt,before,after
      1,0.076,0.069
      2,0.072,0.069
      3,0.066,0.066
      4,0.066,0.068
      5,0.066,0.067
      6,0.066,0.069
      7,0.067,0.066
      8,0.063,0.067
      9,0.067,0.065
      10,0.068,0.071
      average,0.0677,0.0677
      Signed-off-by: Thomas Garnier <thgarnie@google.com>
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Cc: Alexander Kuleshov <kuleshovmail@gmail.com>
      Cc: Alexander Popov <alpopov@ptsecurity.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jan Beulich <JBeulich@suse.com>
      Cc: Joerg Roedel <jroedel@suse.de>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Lv Zheng <lv.zheng@intel.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephen Smalley <sds@tycho.nsa.gov>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: kernel-hardening@lists.openwall.com
      Cc: linux-doc@vger.kernel.org
      Link: http://lkml.kernel.org/r/1466556426-32664-6-git-send-email-keescook@chromium.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
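      The shape of the randomization can be sketched in a few lines of
      standalone C (region sizes and the entropy source here are toy
      assumptions; the real code draws entropy from the shared KASLR boot
      functions and steps in PGD/PUD-sized units):

        #include <stdio.h>
        #include <stdint.h>
        #include <stdlib.h>

        struct region { const char *name; uint64_t size; };

        int main(void)
        {
            /* order is preserved; only base and padding are randomized */
            struct region r[] = {
                { "physical mapping", 64ULL << 40 },  /* 64 TB (example) */
                { "vmalloc",          32ULL << 40 },
                { "vmemmap",           1ULL << 40 },
            };
            uint64_t step = 1ULL << 30;   /* PUD-sized step (1 GB)       */
            uint64_t addr = 0xffff880000000000ULL;
            srand(12345);                 /* stand-in for KASLR entropy  */

            for (int i = 0; i < 3; i++) {
                addr += (uint64_t)(rand() % 1024) * step; /* random pad  */
                printf("%-16s base %#llx\n", r[i].name,
                       (unsigned long long)addr);
                addr += r[i].size;        /* advance past the region     */
            }
            return 0;
        }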
  8. 01 Jul 2016, 1 commit
  9. 22 Apr 2016, 1 commit
  10. 18 Feb 2016, 1 commit
  11. 29 Nov 2015, 1 commit
    • Documentation/x86: Update EFI memory region description · ff3d0a12
      Authored by Matt Fleming
      Make it clear that the EFI page tables are only available during
      EFI runtime calls, since that subject has come up a fair number
      of times in the past.
      
      Additionally, add the EFI region start and end addresses to the
      table so that it's possible to see at a glance where they fall
      in relation to other regions.
      Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
      Reviewed-by: Borislav Petkov <bp@suse.de>
      Acked-by: Borislav Petkov <bp@suse.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Jones <davej@codemonkey.org.uk>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
      Cc: Stephen Smalley <sds@tycho.nsa.gov>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: linux-efi@vger.kernel.org
      Link: http://lkml.kernel.org/r/1448658575-17029-7-git-send-email-matt@codeblueprint.co.uk
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  12. 07 Jun 2015, 1 commit
  13. 27 May 2015, 1 commit
  14. 14 Feb 2015, 1 commit
    • x86_64: add KASan support · ef7f0d6a
      Authored by Andrey Ryabinin
      This patch adds the arch-specific code for the kernel address
      sanitizer.

      16TB of virtual address space is used for shadow memory.  It's located
      in the range [ffffec0000000000 - fffffc0000000000], between vmemmap
      and the %esp fixup stacks.

      At an early stage we map the whole shadow region with the zero page.
      Later, after pages are mapped into the direct mapping address range,
      we unmap zero pages from the corresponding shadow (see
      kasan_map_shadow()) and allocate and map real shadow memory, reusing
      the vmemmap_populate() function.

      Also replace __pa with __pa_nodebug before the shadow is initialized.
      With CONFIG_DEBUG_VIRTUAL=y, __pa makes an external function call
      (__phys_addr); __phys_addr is instrumented, so __asan_load could be
      called before the shadow area is initialized.
      Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Konstantin Serebryany <kcc@google.com>
      Cc: Dmitry Chernenkov <dmitryc@google.com>
      Signed-off-by: Andrey Konovalov <adech.fo@gmail.com>
      Cc: Yuri Gribov <tetra2005@gmail.com>
      Cc: Konstantin Khlebnikov <koct9i@gmail.com>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Jim Davis <jim.epost@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
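      The shadow translation itself is compact: one shadow byte covers 8
      bytes of address space, so the 128TB kernel half needs the 16TB
      window quoted above.  A sketch of the mapping (the offset is chosen
      to reproduce that window and should be treated as illustrative):

        #include <stdio.h>
        #include <stdint.h>

        /* shadow = (addr >> 3) + offset; with this offset the kernel half
         * starting at ffff800000000000 lands in
         * [ffffec0000000000 - fffffc0000000000). */
        #define KASAN_SHADOW_OFFSET 0xdffffc0000000000ULL

        static uint64_t mem_to_shadow(uint64_t addr)
        {
            return (addr >> 3) + KASAN_SHADOW_OFFSET;
        }

        int main(void)
        {
            uint64_t a = 0xffff880000000000ULL; /* direct mapping start */
            printf("%#llx -> shadow %#llx\n", (unsigned long long)a,
                   (unsigned long long)mem_to_shadow(a));
            return 0;
        }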
  15. 03 Jan 2015, 1 commit
    • x86, entry: Switch stacks on a paranoid entry from userspace · 48e08d0f
      Authored by Andy Lutomirski
      This causes all non-NMI, non-double-fault kernel entries from
      userspace to run on the normal kernel stack.  Double-fault is
      exempt to minimize confusion if we double-fault directly from
      userspace due to a bad kernel stack.
      
      This is, surprisingly, simpler and shorter than the current code.  It
      removes the IMO rather frightening paranoid_userspace path, and it
      makes sync_regs much simpler.
      
      There is no risk of stack overflow due to this change -- the kernel
      stack that we switch to is empty.
      
      This will also enable us to create non-atomic sections within
      machine checks from userspace, which will simplify memory failure
      handling.  It will also allow the upcoming fsgsbase code to be
      simplified, because it doesn't need to worry about usergs when
      scheduling in paranoid_exit, as that code no longer exists.
      
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Tony Luck <tony.luck@intel.com>
      Acked-by: Borislav Petkov <bp@alien8.de>
      Signed-off-by: Andy Lutomirski <luto@amacapital.net>
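      The resulting policy is easy to state in pseudo-C (a sketch only; the
      real logic lives in the entry_64.S assembly, and the names here are
      illustrative):

        #include <stdio.h>

        enum stack { IST_STACK, NORMAL_KERNEL_STACK };

        static enum stack pick_stack(int from_user, int is_nmi, int is_df)
        {
            /* NMI keeps its IST stack; double-fault keeps its IST stack
             * so a bad kernel stack from userspace still gives a sane
             * dump; everything else from userspace switches to the
             * (empty) normal kernel stack. */
            if (from_user && !is_nmi && !is_df)
                return NORMAL_KERNEL_STACK;
            return IST_STACK;
        }

        int main(void)
        {
            printf("#MC from user -> %s\n",
                   pick_stack(1, 0, 0) == NORMAL_KERNEL_STACK
                       ? "normal kernel stack" : "IST stack");
            return 0;
        }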
  16. 19 Sep 2014, 1 commit
  17. 01 May 2014, 1 commit
    • x86-64, espfix: Don't leak bits 31:16 of %esp returning to 16-bit stack · 3891a04a
      Authored by H. Peter Anvin
      The IRET instruction, when returning to a 16-bit segment, only
      restores the bottom 16 bits of the user space stack pointer.  This
      causes some 16-bit software to break, but it also leaks kernel state
      to user space.  We have a software workaround for that ("espfix") for
      the 32-bit kernel, but it relies on a nonzero stack segment base which
      is not available in 64-bit mode.
      
      In checkin:
      
          b3b42ac2 x86-64, modify_ldt: Ban 16-bit segments on 64-bit kernels
      
      we "solved" this by forbidding 16-bit segments on 64-bit kernels, with
      the logic that 16-bit support is crippled on 64-bit kernels anyway (no
      V86 support), but it turns out that people are doing stuff like
      running old Win16 binaries under Wine and expect it to work.
      
      This patch works around the problem by creating percpu "ministacks",
      each of which is mapped 2^16 times, 64K apart.  When we detect that
      the return SS is on the LDT, we copy the IRET frame to the ministack
      and use the
      relevant alias to return to userspace.  The ministacks are mapped
      readonly, so if IRET faults we promote #GP to #DF which is an IST
      vector and thus has its own stack; we then do the fixup in the #DF
      handler.
      
      (Making #GP an IST exception would make the msr_safe functions unsafe
      in NMI/MC context, and quite possibly have other effects.)
      
      Special thanks to:
      
      - Andy Lutomirski, for the suggestion of using very small stack slots
        and copy (as opposed to map) the IRET frame there, and for the
        suggestion to mark them readonly and let the fault promote to #DF.
      - Konrad Wilk for paravirt fixup and testing.
      - Borislav Petkov for testing help and useful comments.
      Reported-by: Brian Gerst <brgerst@gmail.com>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      Link: http://lkml.kernel.org/r/1398816946-3351-1-git-send-email-hpa@linux.intel.com
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Andrew Lutomirski <amluto@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Dirk Hohndel <dirk@hohndel.org>
      Cc: Arjan van de Ven <arjan.van.de.ven@intel.com>
      Cc: comex <comexk@gmail.com>
      Cc: Alexander van Heukelum <heukelum@fastmail.fm>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: <stable@vger.kernel.org> # consider after upstream merge
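      The slot arithmetic can be sketched as follows (constants and the
      helper are illustrative assumptions patterned on the description
      above, not the kernel's exact layout):

        #include <stdio.h>
        #include <stdint.h>

        /* Each CPU gets a tiny ministack inside a page whose aliases
         * repeat every 64K, so bits 31:16 of the %esp value that IRET
         * leaks always come from this region rather than from the real
         * kernel stack. */
        #define ESPFIX_BASE       0xffffff0000000000ULL /* example base  */
        #define MINISTACK_SIZE    64ULL
        #define PAGE_SIZE         4096ULL
        #define STACKS_PER_PAGE   (PAGE_SIZE / MINISTACK_SIZE)

        static uint64_t espfix_slot(unsigned int cpu)
        {
            uint64_t page = cpu / STACKS_PER_PAGE; /* backing page       */
            uint64_t slot = cpu % STACKS_PER_PAGE; /* slot in the page   */
            return ESPFIX_BASE + (page << 16) + slot * MINISTACK_SIZE;
        }

        int main(void)
        {
            for (unsigned int cpu = 0; cpu < 4; cpu++)
                printf("cpu %u ministack %#llx\n", cpu,
                       (unsigned long long)espfix_slot(cpu));
            return 0;
        }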
  18. 02 Dec 2013, 1 commit
  19. 02 Nov 2013, 1 commit
    • x86/efi: Runtime services virtual mapping · d2f7cbe7
      Authored by Borislav Petkov
      We map the EFI regions needed for runtime services non-contiguously,
      with preserved alignment on virtual addresses starting from -4G down
      for a total max space of 64G. This way, we provide for stable runtime
      services addresses across kernels so that a kexec'd kernel can still use
      them.
      
      Thus, they're mapped in a separate pagetable so that we don't pollute
      the kernel namespace.
      
      Add an efi= kernel command line parameter for passing miscellaneous
      options and chicken bits from the command line.
      
      While at it, add a chicken bit called "efi=old_map" which can be used as
      a fallback to the old runtime services mapping method in case there's
      some b0rkage with a particular EFI implementation (haha, it is hard to
      hold up the sarcasm here...).
      
      Also, add the UEFI RT VA space to Documentation/x86/x86_64/mm.txt.
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Signed-off-by: Matt Fleming <matt.fleming@intel.com>
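      For example, forcing the fallback mapping on a troublesome firmware
      is a matter of appending the new parameter to the kernel command line
      (bootloader syntax varies; a GRUB-style line is shown):

        linux /boot/vmlinuz root=/dev/sda2 ro efi=old_map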
  20. 09 Jul 2013, 1 commit
  21. 11 Apr 2013, 1 commit
  22. 03 Apr 2013, 1 commit
  23. 23 Jan 2013, 1 commit
  24. 28 Sep 2012, 1 commit
  25. 10 May 2011, 1 commit
  26. 23 Mar 2011, 1 commit
  27. 18 Mar 2011, 1 commit
  28. 29 Jun 2010, 1 commit
    • x86: Always use irq stacks · 7974891d
      Authored by Christoph Hellwig
      IRQ stacks provide much better safety against unexpected stack use from
      interrupts, at the minimal downside of slightly higher memory usage.
      Enable irq stacks also for the default 8k stack on 32-bit kernels to
      minimize the problem of stack overflows through interrupt activity.
      
      This is what the 64-bit kernel and various other architectures already do.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      LKML-Reference: <20100628121554.GA6605@lst.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  29. 16 Feb 2010, 2 commits
    • x86, numa: Remove configurable node size support for numa emulation · ca2107c9
      Authored by David Rientjes
      Now that numa=fake=<size>[MG] is implemented, it is possible to remove
      configurable node size support.  The command-line parsing was already
      broken (numa=fake=*128, for example, would not work) and since fake nodes
      are now interleaved over physical nodes, this support is no longer
      required.
      Signed-off-by: David Rientjes <rientjes@google.com>
      LKML-Reference: <alpine.DEB.2.00.1002151343080.26927@chino.kir.corp.google.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
    • x86, numa: Add fixed node size option for numa emulation · 8df5bb34
      Authored by David Rientjes
      numa=fake=N specifies the number of fake nodes, N, to partition the
      system into and then allocates them by interleaving over physical nodes.
      This requires knowledge of the system capacity when attempting to
      allocate nodes of a certain size: either very large nodes to benchmark
      scalability of code that operates on individual nodes, or very small
      nodes to find bugs in the VM.
      
      This patch introduces numa=fake=<size>[MG] so it is possible to specify
      the size of each node to allocate.  When used, nodes of the size
      specified will be allocated and interleaved over the set of physical
      nodes.
      
      FAKE_NODE_MIN_SIZE was also moved to the more-appropriate
      include/asm/numa_64.h.
      Signed-off-by: David Rientjes <rientjes@google.com>
      LKML-Reference: <alpine.DEB.2.00.1002151342510.26927@chino.kir.corp.google.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
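      Taken together, the two forms of the option look like this on the
      kernel command line:

        numa=fake=8       fake 8 nodes, interleaved over physical nodes
        numa=fake=512M    fake as many 512M nodes as capacity allows
        numa=fake=2G      likewise, with 2G nodes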
  30. 11 Jun 2009, 1 commit
    • x86, mce: Add boot options for corrected errors · 62fdac59
      Authored by Hidetoshi Seto
      This patch introduces three boot options (no_cmci, dont_log_ce
      and ignore_ce) to control handling for corrected errors.
      
      The "mce=no_cmci" boot option disables the CMCI feature.
      
      Since CMCI is a new feature, having boot controls to disable
      it will help if the hardware is misbehaving.
      
      The "mce=dont_log_ce" boot option disables logging for corrected
      errors. All reported corrected errors will be cleared silently.
      This option will be useful if you never care about corrected
      errors.
      
      The "mce=ignore_ce" boot option disables features for corrected
      errors, i.e. polling timer and cmci.  All corrected events are
      not cleared and kept in bank MSRs.
      
      Usually this disablement is not recommended, however it will be
      a help if there are some conflict with the BIOS or hardware
      monitoring applications etc., that clears corrected events in
      banks instead of OS.
      
      [ And trivial cleanup (space -> tab) for doc is included. ]
      Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Reviewed-by: Andi Kleen <ak@linux.intel.com>
      LKML-Reference: <4A30ACDF.5030408@jp.fujitsu.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
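      On the kernel command line, the three options look like this:

        mce=no_cmci       disable the CMCI feature
        mce=dont_log_ce   silently clear corrected errors, no logging
        mce=ignore_ce     no polling timer or CMCI; events stay in banks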
  31. 04 Jun 2009, 1 commit
    • x86, mce: switch x86 machine check handler to Monarch election. · 3c079792
      Authored by Andi Kleen
      On Intel platforms machine check exceptions are always broadcast to
      all CPUs.  This patch makes the machine check handler synchronize all
      these machine checks, elect a Monarch to handle the event and collect
      the worst event from all CPUs and then process it first.
      
      This has some advantages:
      
      - When there is a truly data corrupting error the system panics as
        quickly as possible. This improves containment of corrupted
        data and makes sure the corrupted data never hits stable storage.
      
      - The panics are synchronized and do not reenter the panic code
        on multiple CPUs (which currently does not handle this well).
      
      - All the errors are reported. Currently it often happens that
        another CPU does the panic first, but reports useless
        information (empty machine check) because the real error
        happened on another CPU which came in later.
        This is a big advantage on Nehalem, where the 8 threads per CPU
        often lead to the wrong CPU winning the race and dumping
        useless information on a machine check.  The problem also occurs
        in a less severe form on older CPUs.
      
      - The system can detect when no CPUs detected a machine check
        and shut down the system.  This can happen when one CPU is so
        badly hung that that it cannot process a machine check anymore
        or when some external agent wants to stop the system by
        asserting the machine check pin.  This follows Intel hardware
        recommendations.
      
      - This matches the error model recommended by the CPU designers.
      
      - The events can be output in true severity order
      
      - When a panic happens on another CPU, it makes sure it is actually
        able to process the stop IPI by enabling interrupts.
      
      The code is extremely careful to handle timeouts while waiting
      for other CPUs. It can't rely on the normal timing mechanisms
      (jiffies, ktime_get) because of its asynchronous/lockless nature,
      so it uses its own timeouts based on ndelay() and a "SPINUNIT".

      The timeout is configurable. By default it waits up to one
      second for the other CPUs.  This can also be disabled.
      
      From some informal testing, AMD systems do not seem to broadcast
      machine checks, so right now it's always disabled by default on
      non-Intel CPUs and also on very old Intel systems.
      
      Includes fixes from Ying Huang
      Fixed a "ecception" in a comment (H.Seto)
      Moved global_nwo reset later based on suggestion from H.Seto
      v2: Avoid duplicate messages
      
      [ Impact: feature, fixes long standing problems. ]
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
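      The election itself is a small lock-free pattern; roughly (a
      standalone sketch with pthreads standing in for CPUs; the names are
      illustrative, and the real code adds ndelay()/SPINUNIT timeouts and
      severity grading):

        #include <stdio.h>
        #include <pthread.h>
        #include <stdatomic.h>

        #define NCPUS 4
        static atomic_int order;      /* arrival ticket                  */
        static atomic_int checked_in; /* CPUs that have arrived          */

        static void *mce_entry(void *arg)
        {
            int cpu = (int)(long)arg;
            int ticket = atomic_fetch_add(&order, 1);
            atomic_fetch_add(&checked_in, 1);

            if (ticket == 0) {
                /* Monarch: wait for everyone (with a timeout in the
                 * real code), then process the worst event first. */
                while (atomic_load(&checked_in) < NCPUS)
                    ;
                printf("cpu %d: Monarch, handling worst event\n", cpu);
            } else {
                printf("cpu %d: subject %d, waiting\n", cpu, ticket);
            }
            return NULL;
        }

        int main(void)   /* build with -pthread */
        {
            pthread_t t[NCPUS];
            for (long i = 0; i < NCPUS; i++)
                pthread_create(&t[i], NULL, mce_entry, (void *)i);
            for (int i = 0; i < NCPUS; i++)
                pthread_join(t[i], NULL);
            return 0;
        }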
  32. 29 May 2009, 1 commit
  33. 18 May 2009, 1 commit
    • mm, x86: remove MEMORY_HOTPLUG_RESERVE related code · 888a589f
      Authored by Yinghai Lu
      after:
      
       | commit b263295d
       | Author: Christoph Lameter <clameter@sgi.com>
       | Date:   Wed Jan 30 13:30:47 2008 +0100
       |
       |    x86: 64-bit, make sparsemem vmemmap the only memory model
      
      we don't have MEMORY_HOTPLUG_RESERVE anymore.
      
      Historically, x86-64 had an architecture-specific method for memory hotplug
      whereby it scanned the SRAT for physical memory ranges that could be
      potentially used for memory hot-add later. By reserving those ranges
      without physical memory, the memmap would be allocated and left dormant
      until needed. This depended on the DISCONTIG memory model which has been
      removed so the code implementing HOTPLUG_RESERVE is now dead.
      
      This patch removes the dead code used by MEMORY_HOTPLUG_RESERVE.
      
      (Changelog authored by Mel.)
      
      v2: updated changelog, and remove hotadd= in doc
      
      [ Impact: remove dead code ]
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Reviewed-by: Christoph Lameter <cl@linux-foundation.org>
      Reviewed-by: Mel Gorman <mel@csn.ul.ie>
      Workflow-found-OK-by: Andrew Morton <akpm@linux-foundation.org>
      LKML-Reference: <4A0C4910.7090508@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  34. 06 May 2009, 2 commits
  35. 30 Mar 2009, 1 commit
  36. 12 Nov 2008, 1 commit
  37. 03 Nov 2008, 1 commit
  38. 28 Oct 2008, 1 commit