1. 07 Jun, 2015 1 commit
  2. 02 Dec, 2013 1 commit
  3. 09 Jul, 2013 1 commit
  4. 11 Apr, 2013 1 commit
  5. 23 Jan, 2013 1 commit
  6. 28 Sep, 2012 1 commit
  7. 10 May, 2011 1 commit
  8. 23 Mar, 2011 1 commit
  9. 18 Mar, 2011 1 commit
  10. 16 Feb, 2010 2 commits
    • x86, numa: Remove configurable node size support for numa emulation · ca2107c9
      Committed by David Rientjes
      Now that numa=fake=<size>[MG] is implemented, it is possible to remove
      configurable node size support.  The command-line parsing was already
      broken (numa=fake=*128, for example, would not work) and since fake nodes
      are now interleaved over physical nodes, this support is no longer
      required.
      Signed-off-by: David Rientjes <rientjes@google.com>
      LKML-Reference: <alpine.DEB.2.00.1002151343080.26927@chino.kir.corp.google.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
    • x86, numa: Add fixed node size option for numa emulation · 8df5bb34
      Committed by David Rientjes
      numa=fake=N specifies the number of fake nodes, N, to partition the
      system into and then allocates them by interleaving over physical nodes.
      This requires knowledge of the system capacity when attempting to
      allocate nodes of a certain size: either very large nodes to benchmark
      scalability of code that operates on individual nodes, or very small
      nodes to find bugs in the VM.
      
      This patch introduces numa=fake=<size>[MG] so it is possible to specify
      the size of each node to allocate.  When used, nodes of the size
      specified will be allocated and interleaved over the set of physical
      nodes.
      
      FAKE_NODE_MIN_SIZE was also moved to the more-appropriate
      include/asm/numa_64.h.
      Signed-off-by: David Rientjes <rientjes@google.com>
      LKML-Reference: <alpine.DEB.2.00.1002151342510.26927@chino.kir.corp.google.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
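The difference between the two forms of the option, numa=fake=N and numa=fake=<size>[MG], can be illustrated with a small classifier. This is only a sketch, not the kernel's parser: the function name `parse_numa_fake` is hypothetical, and treating M and G as binary multiples (MiB, GiB) is an assumption.

```python
import re

# Assumption: M and G are binary multiples (MiB and GiB).
UNITS = {"M": 1 << 20, "G": 1 << 30}


def parse_numa_fake(value):
    """Classify a numa=fake= value (hypothetical helper).

    Returns ("count", n) for numa=fake=N and ("size", size_in_bytes)
    for numa=fake=<size>[MG]. Anything else raises ValueError.
    """
    match = re.fullmatch(r"(\d+)([MG]?)", value)
    if match is None:
        raise ValueError(f"unsupported numa=fake value: {value!r}")
    number, unit = int(match.group(1)), match.group(2)
    if unit:
        # A unit suffix selects the fixed-node-size form.
        return ("size", number * UNITS[unit])
    # A bare number is the number of fake nodes.
    return ("count", number)
```

For instance, `parse_numa_fake("8")` is classified as a node count, while `parse_numa_fake("512M")` is classified as a per-node size.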
  11. 11 Jun, 2009 1 commit
    • x86, mce: Add boot options for corrected errors · 62fdac59
      Committed by Hidetoshi Seto
      This patch introduces three boot options (no_cmci, dont_log_ce
      and ignore_ce) to control handling for corrected errors.
      
      The "mce=no_cmci" boot option disables the CMCI feature.
      
      Since CMCI is a new feature, having boot controls to disable
      it will help if the hardware is misbehaving.
      
      The "mce=dont_log_ce" boot option disables logging for corrected
      errors. All reported corrected errors will be cleared silently.
      This option will be useful if you never care about corrected
      errors.
      
      The "mce=ignore_ce" boot option disables features for corrected
      errors, i.e. polling timer and cmci.  All corrected events are
      not cleared and kept in bank MSRs.
      
      Usually this disablement is not recommended; however, it will
      help if there is some conflict with the BIOS or hardware
      monitoring applications etc. that clear corrected events in
      banks instead of the OS.
      
      [ And trivial cleanup (space -> tab) for doc is included. ]
      Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Reviewed-by: Andi Kleen <ak@linux.intel.com>
      LKML-Reference: <4A30ACDF.5030408@jp.fujitsu.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
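The effect of the three options described in this commit message can be summarized as a small policy table. The sketch below is a model of the description only, not kernel code: the `CorrectedErrorPolicy` class, its field names, and `apply_mce_option` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CorrectedErrorPolicy:
    """Hypothetical model of corrected-error handling; all on by default."""
    cmci: bool = True      # CMCI feature enabled
    polling: bool = True   # polling timer for corrected errors enabled
    logging: bool = True   # corrected errors are logged


def apply_mce_option(policy, option):
    """Apply one mce=<option> value as described in the commit message."""
    if option == "no_cmci":
        policy.cmci = False
    elif option == "dont_log_ce":
        # Errors are still cleared, just not logged.
        policy.logging = False
    elif option == "ignore_ce":
        # Both detection paths are off; events stay in the bank MSRs.
        policy.cmci = False
        policy.polling = False
    else:
        raise ValueError(f"not a corrected-error option: {option!r}")
    return policy
```

Modelled this way, ignore_ce is the only option that turns off both detection paths, which matches the note that corrected events are then left in the bank MSRs.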
  12. 04 Jun, 2009 1 commit
    • x86, mce: switch x86 machine check handler to Monarch election. · 3c079792
      Committed by Andi Kleen
      On Intel platforms machine check exceptions are always broadcast to
      all CPUs.  This patch makes the machine check handler synchronize all
      these machine checks, elect a Monarch to handle the event and collect
      the worst event from all CPUs and then process it first.
      
      This has some advantages:
      
      - When there is a truly data corrupting error the system panics as
        quickly as possible. This improves containment of corrupted
        data and makes sure the corrupted data never hits stable storage.
      
      - The panics are synchronized and do not reenter the panic code
        on multiple CPUs (which currently does not handle this well).
      
      - All the errors are reported. Currently it often happens that
        another CPU happens to do the panic first, but reports useless
        information (empty machine check) because the real error
        happened on another CPU which came in later.
        This is a big advantage on Nehalem, where the 8 threads per CPU
        often lead to the wrong CPU winning the race and dumping
        useless information on a machine check.  The problem also occurs
        in a less severe form on older CPUs.
      
      - The system can detect when no CPUs detected a machine check
        and shut down the system.  This can happen when one CPU is so
        badly hung that it cannot process a machine check anymore
        or when some external agent wants to stop the system by
        asserting the machine check pin.  This follows Intel hardware
        recommendations.
      
      - This matches the recommended error model by the CPU designers.
      
      - The events can be output in true severity order
      
      - When a panic happens on another CPU it makes sure to actually
        be able to process the stop IPI by enabling interrupts.
      
      The code is extremely careful to handle timeouts while waiting
      for other CPUs. It can't rely on the normal timing mechanisms
      (jiffies, ktime_get) because of its asynchronous/lockless nature,
      so it uses its own timeouts using ndelay() and a "SPINUNIT".

      The timeout is configurable. By default it waits for up to one
      second for the other CPUs.  This can also be disabled.

      From some informal testing AMD systems do not seem to broadcast
      machine checks, so right now it's always disabled by default on
      non-Intel CPUs and also on very old Intel systems.
      
      Includes fixes from Ying Huang
      Fixed a "ecception" in a comment (H.Seto)
      Moved global_nwo reset later based on suggestion from H.Seto
      v2: Avoid duplicate messages
      
      [ Impact: feature, fixes long standing problems. ]
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
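The election and grading described in this commit message can be modelled in a few lines. This is a toy model under stated assumptions, not the kernel's implementation: the function name, the integer severity scale (0 meaning "no event found on this CPU"), and the rule that the first CPU to arrive becomes the Monarch are all assumptions made for illustration. Timeout handling is left out.

```python
def monarch_summary(arrivals):
    """Toy model of a broadcast machine check (hypothetical helper).

    arrivals: list of (cpu, severity) tuples in the order the CPUs
    entered the handler. Severity 0 means the CPU found no event.
    """
    if not arrivals:
        raise ValueError("no CPU entered the handler")
    # Assumption: the first CPU to arrive acts as the Monarch.
    monarch = arrivals[0][0]
    # Stable sort, so CPUs with equal severity keep their arrival order.
    ordered = sorted(arrivals, key=lambda item: item[1], reverse=True)
    worst_cpu, worst_severity = ordered[0]
    return {
        "monarch": monarch,
        "worst_cpu": worst_cpu,
        # No CPU saw an event at all: the system is shut down.
        "action": "shutdown" if worst_severity == 0 else "process",
        # Events are reported in severity order, worst first.
        "report_order": [cpu for cpu, sev in ordered if sev > 0],
    }
```

In this model the worst event is found regardless of which CPU happened to arrive first, which is the property the commit message highlights.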
  13. 29 May, 2009 1 commit
  14. 18 May, 2009 1 commit
    • mm, x86: remove MEMORY_HOTPLUG_RESERVE related code · 888a589f
      Committed by Yinghai Lu
      after:
      
       | commit b263295d
       | Author: Christoph Lameter <clameter@sgi.com>
       | Date:   Wed Jan 30 13:30:47 2008 +0100
       |
       |    x86: 64-bit, make sparsemem vmemmap the only memory model
      
      we don't have MEMORY_HOTPLUG_RESERVE anymore.
      
      Historically, x86-64 had an architecture-specific method for memory hotplug
      whereby it scanned the SRAT for physical memory ranges that could be
      potentially used for memory hot-add later. By reserving those ranges
      without physical memory, the memmap would be allocated and left dormant
      until needed. This depended on the DISCONTIG memory model which has been
      removed so the code implementing HOTPLUG_RESERVE is now dead.
      
      This patch removes the dead code used by MEMORY_HOTPLUG_RESERVE.
      
      (Changelog authored by Mel.)
      
      v2: updated changelog, and remove hotadd= in doc
      
      [ Impact: remove dead code ]
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Reviewed-by: Christoph Lameter <cl@linux-foundation.org>
      Reviewed-by: Mel Gorman <mel@csn.ul.ie>
      Workflow-found-OK-by: Andrew Morton <akpm@linux-foundation.org>
      LKML-Reference: <4A0C4910.7090508@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  15. 03 Nov, 2008 1 commit
  16. 28 Oct, 2008 1 commit
  17. 23 Sep, 2008 1 commit
  18. 19 Sep, 2008 1 commit
  19. 29 Aug, 2008 1 commit
  20. 31 May, 2008 1 commit
    • x86: move x86-specific documentation into Documentation/x86 · 23deb068
      Committed by H. Peter Anvin
      The current organization of the x86 documentation makes it appear as
      if the "i386" documentation doesn't apply to x86-64, which it does.
      Thus, move that documentation into Documentation/x86, and move the
      x86-64-specific stuff into Documentation/x86/x86_64 with the eventual
      goal to move stuff that isn't actually 64-bit specific back into
      Documentation/x86.
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
  21. 17 Apr, 2008 1 commit
    • x86: add gbpages switches · 00d1c5e0
      Committed by Ingo Molnar
      These new controls toggle experimental support for a new CPU feature,
      the straightforward extension of largepages from the pmd level to the
      pud level, which allows 1GB (kernel) TLBs instead of 2MB TLBs.
      
      Turn it off by default, as this code has not been tested well enough yet.
      
      Use the CONFIG_DIRECT_GBPAGES=y .config option or gbpages on the
      boot line to enable it. If enabled in the .config, the
      nogbpages boot option disables it.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  22. 30 Jan, 2008 3 commits
  23. 22 Jul, 2007 2 commits
    • x86_64: mcelog tolerant level cleanup · bd78432c
      Committed by Tim Hockin
      Background:
       The MCE handler has several paths that it can take, depending on various
       conditions of the MCE status and the value of the 'tolerant' knob.  The
       exact semantics are not well defined and the code is a bit twisty.
      
      Description:
       This patch makes the MCE handler's behavior more clear by documenting the
       behavior for various 'tolerant' levels.  It also fixes or enhances
       several small things in the handler.  Specifically:
           * If RIPV is not set it is not safe to restart, so set the 'no way out'
             flag rather than the 'kill it' flag.
           * Don't panic() on correctable MCEs.
           * If the _OVER bit is set *and* the _UC bit is set (meaning possibly
             dropped uncorrected errors), set the 'no way out' flag.
           * Use EIPV for testing whether an app can be killed (SIGBUS) rather
             than RIPV.  According to docs, EIPV indicates that the error is
             related to the IP, while RIPV simply means the IP is valid to
             restart from.
           * Don't clear the MCi_STATUS registers until after the panic() path.
             This leaves the status bits set after the panic() so clever BIOSes
             can find them (and dumb BIOSes can do nothing).
      
       This patch also calls nonseekable_open() in mce_open (as suggested by akpm).
      
      Result:
       Tolerant levels behave almost identically to how they always have, but
       now it's well defined.  There's a slightly higher chance of panic()ing
       when multiple errors happen (a good thing, IMHO).  If you take an MBE and
       panic(), the error status bits are not cleared.
      
      Alternatives:
       None.
      
      Testing:
       I used software to inject correctable and uncorrectable errors.  With
       tolerant = 3, the system usually survives.  With tolerant = 2, the system
       usually panic()s (PCC) but not always.  With tolerant = 1, the system
       always panic()s.  When the system panic()s, the BIOS is able to detect
       that the cause of death was an MC4.  I was not able to reproduce the
       case of a non-PCC error in userspace, with EIPV, with (tolerant < 3).
       That will be rare at best.
      Signed-off-by: Tim Hockin <thockin@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
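The status-bit rules listed in this commit message can be written as a small decision function. This is a sketch under assumptions, not the handler itself: the bit positions are the ones I understand the Intel machine check architecture to use, the function and flag names are hypothetical, and the sketch reads RIPV as "the saved instruction pointer is valid to restart from", as the EIPV bullet describes. The 'tolerant' level and the user-mode checks are deliberately left out.

```python
# Assumed bit positions (global status and per-bank status registers).
MCG_STATUS_RIPV = 1 << 0    # restart IP valid
MCG_STATUS_EIPV = 1 << 1    # error IP valid
MCI_STATUS_UC = 1 << 61     # uncorrected error
MCI_STATUS_OVER = 1 << 62   # overflow: an earlier error was dropped


def classify(mcgstatus, bank_status):
    """Return (no_way_out, kill_candidate) for one bank (hypothetical)."""
    no_way_out = False
    if not mcgstatus & MCG_STATUS_RIPV:
        # Execution cannot be restarted reliably.
        no_way_out = True
    if bank_status & MCI_STATUS_OVER and bank_status & MCI_STATUS_UC:
        # Uncorrected errors may have been dropped.
        no_way_out = True
    # EIPV, not RIPV, says the error is tied to the interrupted code,
    # so only then is killing the task (SIGBUS) a meaningful option.
    kill_candidate = bool(mcgstatus & MCG_STATUS_EIPV) and not no_way_out
    return no_way_out, kill_candidate
```

Keeping the two flags separate mirrors the distinction the commit draws between the 'no way out' and 'kill it' paths.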
    • x86_64: remove unused variable maxcpus · d567b6a9
      Committed by Jan Beulich
      .. and adjust documentation to properly reflect options that are
      x86-64 specific.
      Signed-off-by: Jan Beulich <jbeulich@novell.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  24. 03 May, 2007 3 commits
    • [PATCH] x86-64: fixed size remaining fake nodes · 382591d5
      Committed by David Rientjes
      Extends the numa=fake x86_64 command-line option to split the remaining system
      memory into nodes of fixed size.  Any leftover memory is allocated to a final
      node unless the command-line ends with a comma.
      
      For example:
        numa=fake=2*512,*128	gives two 512M nodes and the remaining system
      			memory is split into nodes of 128M each.
      
      This is beneficial for systems where the exact size of RAM is unknown or not
      necessarily relevant, but the size of the remaining nodes to be allocated is
      known based on their capacity for resource management.
      
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: David Rientjes <rientjes@google.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Cc: Paul Jackson <pj@sgi.com>
      Cc: Christoph Lameter <clameter@engr.sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
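The "*<size>" form described above can be worked through with a short sketch. It is an illustration only, not the kernel's parser: the function name is hypothetical, sizes are plain megabyte integers, the total memory is passed in as an assumed value, and the "*<size>" term is assumed to come last.

```python
def fill_remaining_fixed(spec, total_mb):
    """Expand e.g. "2*512,*128" into a list of node sizes in MB.

    Hypothetical helper. A trailing comma in spec means leftover
    memory is not given to a final node.
    """
    drop_leftover = spec.endswith(",")
    sizes = []
    for term in spec.rstrip(",").split(","):
        left, star, right = term.partition("*")
        if not star:
            sizes.append(int(left))                  # "1024"
        elif left:
            sizes.extend([int(right)] * int(left))   # "2*512"
        else:
            size = int(right)                        # "*128"
            remaining = total_mb - sum(sizes)
            if remaining < 0:
                raise ValueError("explicit nodes exceed total memory")
            count, leftover = divmod(remaining, size)
            sizes.extend([size] * count)
            if leftover and not drop_leftover:
                sizes.append(leftover)
    return sizes
```

With an assumed 2112M of memory, "2*512,*128" yields two 512M nodes, eight 128M nodes and a final 64M node, while "2*512,*128," leaves the last 64M unallocated.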
    • [PATCH] x86-64: split remaining fake nodes equally · 14694d73
      Committed by David Rientjes
      Extends the numa=fake x86_64 command-line option to split the remaining
      system memory into equal-sized nodes.
      
      For example:
      numa=fake=2*512,4*	gives two 512M nodes and the remaining system
      			memory is split into four approximately equal
      			chunks.
      
      This is beneficial for systems where the exact size of RAM is unknown or not
      necessarily relevant, but the granularity with which nodes shall be allocated
      is known.
      
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: David Rientjes <rientjes@google.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Cc: Paul Jackson <pj@sgi.com>
      Cc: Christoph Lameter <clameter@engr.sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
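The "<count>*" form can be sketched the same way. Again this is an illustration rather than the kernel's code: the function name is hypothetical, sizes are megabyte integers, the total memory is an assumed input, and the "<count>*" term is assumed to be the last one. How the real implementation distributes a remainder that does not divide evenly is not stated in the message; the sketch simply gives the first chunks one extra megabyte.

```python
def split_remaining_equally(spec, total_mb):
    """Expand e.g. "2*512,4*" into a list of node sizes in MB (hypothetical)."""
    sizes = []
    for term in spec.split(","):
        left, star, right = term.partition("*")
        if not star:
            sizes.append(int(left))                  # "1024"
        elif right:
            sizes.extend([int(right)] * int(left))   # "2*512"
        else:
            chunks = int(left)                       # "4*"
            remaining = total_mb - sum(sizes)
            if remaining < 0:
                raise ValueError("explicit nodes exceed total memory")
            base, extra = divmod(remaining, chunks)
            # Approximately equal: the first `extra` chunks get 1 MB more.
            sizes.extend(base + (1 if i < extra else 0) for i in range(chunks))
    return sizes
```

With an assumed 4096M of memory, "2*512,4*" gives two 512M nodes followed by four 768M nodes.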
    • [PATCH] x86-64: configurable fake numa node sizes · 8b8ca80e
      Committed by David Rientjes
      Extends the numa=fake x86_64 command-line option to allow for configurable
      node sizes.  These nodes can be used in conjunction with cpusets for coarse
      memory resource management.
      
      The old command-line option is still supported:
        numa=fake=32	gives 32 fake NUMA nodes, ignoring the NUMA setup of the
      		actual machine.
      
      But now you may configure your system for the node sizes of your choice:
        numa=fake=2*512,1024,2*256
      		gives two 512M nodes, one 1024M node, two 256M nodes, and
      		the rest of system memory to a sixth node.
      
      The existing hash function is maintained to support the various node sizes
      that are possible with this implementation.
      
      Each node of the same size receives roughly the same amount of available
      pages, regardless of any reserved memory within its address range.  The total
      available pages on the system is calculated and divided by the number of equal
      nodes to allocate.  These nodes are then dynamically allocated and their
      borders extended until such time as their number of available pages reaches
      the required size.
      
      Configurable node sizes are recommended when used in conjunction with cpusets
      for memory control because it eliminates the overhead associated with scanning
      the zonelists of many smaller full nodes on page_alloc().
      
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: David Rientjes <rientjes@google.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Cc: Paul Jackson <pj@sgi.com>
      Cc: Christoph Lameter <clameter@engr.sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
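The node list syntax in the example above can be expanded mechanically. The sketch below is a simplification, not the kernel's parser: the function name is hypothetical, sizes are megabyte integers, the total memory is an assumed input, and reserved memory and the available-page accounting described in the message are ignored.

```python
def expand_fake_nodes(spec, total_mb):
    """Expand e.g. "2*512,1024,2*256" into node sizes in MB (hypothetical).

    Whatever memory is left after the listed nodes goes to one final node.
    """
    sizes = []
    for term in spec.split(","):
        if "*" in term:
            count, size = term.split("*")
            sizes.extend([int(size)] * int(count))   # "2*512"
        else:
            sizes.append(int(term))                  # "1024"
    remainder = total_mb - sum(sizes)
    if remainder < 0:
        raise ValueError("listed nodes exceed total memory")
    if remainder > 0:
        sizes.append(remainder)
    return sizes
```

With an assumed 4096M of memory, "2*512,1024,2*256" expands to five explicit nodes plus a sixth node holding the remaining 1536M.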
  25. 24 Apr, 2007 1 commit
    • [PATCH] x86: Remove noreplacement option · 9ce883be
      Committed by Andi Kleen
      noreplacement is dangerous on modern systems because it will not replace the
      context switch FNSAVE with SSE aware FXSAVE. But other places in the kernel still assume
      SSE and do FXSAVE and the CPU will then access FXSAVE information with
      FNSAVE and cause corruption.
      
      Easiest way to avoid this is to remove the option. It was mostly for paranoia
      reasons anyways and alternative()s have been stable for some time.
      
      Thanks to Jeremy F. for reporting and helping debug it.
      Signed-off-by: NAndi Kleen <ak@suse.de>
      9ce883be
  26. 13 2月, 2007 2 次提交
  27. 09 1月, 2007 1 次提交
  28. 07 12月, 2006 2 次提交
  29. 04 10月, 2006 1 次提交
  30. 30 9月, 2006 2 次提交
  31. 26 9月, 2006 1 次提交