1. 01 3月, 2010 22 次提交
  2. 27 2月, 2010 1 次提交
    • R
      x86: Enable NMI on all cpus on UV · 78c06176
      Russ Anderson 提交于
      Enable NMI on all cpus in UV system and add an NMI handler
      to dump_stack on each cpu.
      
      By default on x86 all the cpus except the boot cpu have NMI
      masked off.  This patch enables NMI on all cpus in UV system
      and adds an NMI handler to dump_stack on each cpu.  This
      way if a system hangs we can NMI the machine and get a
      backtrace from all the cpus.
      
      Version 2: Use x86_platform driver mechanism for nmi init, per
                 Ingo's suggestion.
      
      Version 3: Clean up Ingo's nits.
      Signed-off-by: NRuss Anderson <rja@sgi.com>
      LKML-Reference: <20100226164912.GA24439@sgi.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      78c06176
  3. 25 2月, 2010 1 次提交
    • I
      x86, mm: Allow highmem user page tables to be disabled at boot time · 14315592
      Ian Campbell 提交于
      Distros generally (I looked at Debian, RHEL5 and SLES11) seem to
      enable CONFIG_HIGHPTE for any x86 configuration which has highmem
      enabled. This means that the overhead applies even to machines which
      have a fairly modest amount of high memory and which therefore do not
      really benefit from allocating PTEs in high memory but still pay the
      price of the additional mapping operations.
      
      Running kernbench on a 4G box I found that with CONFIG_HIGHPTE=y but
      no actual highptes being allocated there was a reduction in system
      time used from 59.737s to 55.9s.
      
      With CONFIG_HIGHPTE=y and highmem PTEs being allocated:
        Average Optimal load -j 4 Run (std deviation):
        Elapsed Time 175.396 (0.238914)
        User Time 515.983 (5.85019)
        System Time 59.737 (1.26727)
        Percent CPU 263.8 (71.6796)
        Context Switches 39989.7 (4672.64)
        Sleeps 42617.7 (246.307)
      
      With CONFIG_HIGHPTE=y but with no highmem PTEs being allocated:
        Average Optimal load -j 4 Run (std deviation):
        Elapsed Time 174.278 (0.831968)
        User Time 515.659 (6.07012)
        System Time 55.9 (1.07799)
        Percent CPU 263.8 (71.266)
        Context Switches 39929.6 (4485.13)
        Sleeps 42583.7 (373.039)
      
      This patch allows the user to control the allocation of PTEs in
      highmem from the command line ("userpte=nohigh") but retains the
      status-quo as the default.
      
      It is possible that some simple heuristic could be developed which
      allows auto-tuning of this option however I don't have a sufficiently
      large machine available to me to perform any particularly meaningful
      experiments. We could probably handwave up an argument for a threshold
      at 16G of total RAM.
      
      Assuming 768M of lowmem we have 196608 potential lowmem PTE
      pages. Each page can map 2M of RAM in a PAE-enabled configuration,
      meaning a maximum of 384G of RAM could potentially be mapped using
      lowmem PTEs.
      
      Even allowing generous factor of 10 to account for other required
      lowmem allocations, generous slop to account for page sharing (which
      reduces the total amount of RAM mappable by a given number of PT
      pages) and other innacuracies in the estimations it would seem that
      even a 32G machine would not have a particularly pressing need for
      highmem PTEs. I think 32G could be considered to be at the upper bound
      of what might be sensible on a 32 bit machine (although I think in
      practice 64G is still supported).
      
      It's seems questionable if HIGHPTE is even a win for any amount of RAM
      you would sensibly run a 32 bit kernel on rather than going 64 bit.
      Signed-off-by: NIan Campbell <ian.campbell@citrix.com>
      LKML-Reference: <1266403090-20162-1-git-send-email-ian.campbell@citrix.com>
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      14315592
  4. 24 2月, 2010 1 次提交
  5. 20 2月, 2010 1 次提交
    • F
      hw-breakpoint: Keep track of dr7 local enable bits · 326264a0
      Frederic Weisbecker 提交于
      When the user enables breakpoints through dr7, he can choose
      between "local" or "global" enable bits but given how linux is
      implemented, both have the same effect.
      
      That said we don't keep track how the user enabled the breakpoints
      so when the user requests the dr7 value, we only translate the
      "enabled" status using the global enabled bits. It means that if
      the user enabled a breakpoint using the local enabled bit, reading
      back dr7 will set the global bit and clear the local one.
      
      Apps like Wine expect a full dr7 POKEUSER/PEEKUSER match for emulated
      softwares that implement old reverse engineering protection schemes.
      
      We fix that by keeping track of the whole dr7 value given by the user
      in the thread structure to drop this bug. We'll think about
      something more proper later.
      
      This fixes a 2.6.32 - 2.6.33-x ptrace regression.
      Reported-and-tested-by: NMichael Stefaniuc <mstefani@redhat.com>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Acked-by: NK.Prasad <prasad@linux.vnet.ibm.com>
      Cc: Alan Stern <stern@rowland.harvard.edu>
      Cc: Maneesh Soni <maneesh@linux.vnet.ibm.com>
      Cc: Alexandre Julliard <julliard@winehq.org>
      Cc: Rafael J. Wysocki <rjw@sisk.pl>
      Cc: Maciej Rutecki <maciej.rutecki@gmail.com>
      326264a0
  6. 17 2月, 2010 4 次提交
  7. 16 2月, 2010 1 次提交
    • D
      x86, numa: Add fixed node size option for numa emulation · 8df5bb34
      David Rientjes 提交于
      numa=fake=N specifies the number of fake nodes, N, to partition the
      system into and then allocates them by interleaving over physical nodes.
      This requires knowledge of the system capacity when attempting to
      allocate nodes of a certain size: either very large nodes to benchmark
      scalability of code that operates on individual nodes, or very small
      nodes to find bugs in the VM.
      
      This patch introduces numa=fake=<size>[MG] so it is possible to specify
      the size of each node to allocate.  When used, nodes of the size
      specified will be allocated and interleaved over the set of physical
      nodes.
      
      FAKE_NODE_MIN_SIZE was also moved to the more-appropriate
      include/asm/numa_64.h.
      Signed-off-by: NDavid Rientjes <rientjes@google.com>
      LKML-Reference: <alpine.DEB.2.00.1002151342510.26927@chino.kir.corp.google.com>
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      8df5bb34
  8. 14 2月, 2010 2 次提交
    • J
      x86, cpu: Print AMD virtualization features in /proc/cpuinfo · 414bb144
      Joerg Roedel 提交于
      This patch adds code to cpu initialization path to detect
      the extended virtualization features of AMD cpus to show
      them in /proc/cpuinfo.
      Signed-off-by: NJoerg Roedel <joerg.roedel@amd.com>
      LKML-Reference: <1260792521-15212-1-git-send-email-joerg.roedel@amd.com>
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      414bb144
    • A
      x86-64, rwsem: Avoid store forwarding hazard in __downgrade_write · 0d1622d7
      Avi Kivity 提交于
      The Intel Architecture Optimization Reference Manual states that a short
      load that follows a long store to the same object will suffer a store
      forwading penalty, particularly if the two accesses use different addresses.
      Trivially, a long load that follows a short store will also suffer a penalty.
      
      __downgrade_write() in rwsem incurs both penalties:  the increment operation
      will not be able to reuse a recently-loaded rwsem value, and its result will
      not be reused by any recently-following rwsem operation.
      
      A comment in the code states that this is because 64-bit immediates are
      special and expensive; but while they are slightly special (only a single
      instruction allows them), they aren't expensive: a test shows that two loops,
      one loading a 32-bit immediate and one loading a 64-bit immediate, both take
      1.5 cycles per iteration.
      
      Fix this by changing __downgrade_write to use the same add instruction on
      i386 and on x86_64, so that it uses the same operand size as all the other
      rwsem functions.
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      LKML-Reference: <1266049992-17419-1-git-send-email-avi@redhat.com>
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      0d1622d7
  9. 12 2月, 2010 1 次提交
    • S
      x86, ptrace: regset extensions to support xstate · 5b3efd50
      Suresh Siddha 提交于
      Add the xstate regset support which helps extend the kernel ptrace and the
      core-dump interfaces to support AVX state etc.
      
      This regset interface is designed to support all the future state that gets
      supported using xsave/xrstor infrastructure.
      
      Looking at the memory layout saved by "xsave", one can't say which state
      is represented in the memory layout. This is because if a particular state is
      in init state, in the xsave hdr it can be represented by bit '0'. And hence
      we can't really say by the xsave header wether a state is in init state or
      the state is not saved in the memory layout.
      
      And hence the xsave memory layout available through this regset
      interface uses SW usable bytes [464..511] to convey what state is represented
      in the memory layout.
      
      First 8 bytes of the sw_usable_bytes[464..467] will be set to OS enabled xstate
      mask(which is same as the 64bit mask returned by the xgetbv's xCR0).
      
      The note NT_X86_XSTATE represents the extended state information in the
      core file, using the above mentioned memory layout.
      Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com>
      LKML-Reference: <20100211195614.802495327@sbs-t61.sc.intel.com>
      Signed-off-by: NHongjiu Lu <hjl.tools@gmail.com>
      Cc: Roland McGrath <roland@redhat.com>
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      5b3efd50
  10. 10 2月, 2010 1 次提交
  11. 06 2月, 2010 5 次提交