1. 20 5月, 2010 2 次提交
    • H
      ACPI, APEI, Use ERST for persistent storage of MCE · 482908b4
      Huang Ying 提交于
      Traditionally, fatal MCE will cause Linux print error log to console
      then reboot. Because MCE registers will preserve their content after
      warm reboot, the hardware error can be logged to disk or network after
      reboot. But system may fail to warm reboot, then you may lose the
      hardware error log. ERST can help here. Through saving the hardware
      error log into flash via ERST before go panic, the hardware error log
      can be gotten from the flash after system boot successful again.
      
      The fatal MCE processing procedure with ERST involved is as follow:
      
      - Hardware detect error, MCE raised
      - MCE read MCE registers, check error severity (fatal), prepare error record
      - Write MCE error record into flash via ERST
      - Go panic, then trigger system reboot
      - System reboot, /sbin/mcelog run, it reads /dev/mcelog to check flash
        for error record of previous boot via ERST, and output and clear
        them if available
      - /sbin/mcelog logs error records into disk or network
      
      ERST only accepts CPER record format, but there is no pre-defined CPER
      section can accommodate all information in struct mce, so a customized
      section type is defined to hold struct mce inside a CPER record as an
      error section.
      Signed-off-by: NHuang Ying <ying.huang@intel.com>
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      Signed-off-by: NLen Brown <len.brown@intel.com>
      482908b4
    • H
      ACPI, APEI, Generic Hardware Error Source memory error support · d334a491
      Huang Ying 提交于
      Generic Hardware Error Source provides a way to report platform
      hardware errors (such as that from chipset). It works in so called
      "Firmware First" mode, that is, hardware errors are reported to
      firmware firstly, then reported to Linux by firmware. This way, some
      non-standard hardware error registers or non-standard hardware link
      can be checked by firmware to produce more valuable hardware error
      information for Linux.
      
      Now, only SCI notification type and memory errors are supported. More
      notification type and hardware error type will be added later. These
      memory errors are reported to user space through /dev/mcelog via
      faking a corrected Machine Check, so that the error memory page can be
      offlined by /sbin/mcelog if the error count for one page is beyond the
      threshold.
      
      On some machines, Machine Check can not report physical address for
      some corrected memory errors, but GHES can do that. So this simplified
      GHES is implemented firstly.
      Signed-off-by: NHuang Ying <ying.huang@intel.com>
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      Signed-off-by: NLen Brown <len.brown@intel.com>
      d334a491
  2. 15 5月, 2010 1 次提交
    • F
      x86, cacheinfo: Turn off L3 cache index disable feature in virtualized environments · 7f284d3c
      Frank Arnold 提交于
      When running a quest kernel on xen we get:
      
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000038
      IP: [<ffffffff8142f2fb>] cpuid4_cache_lookup_regs+0x2ca/0x3df
      PGD 0
      Oops: 0000 [#1] SMP
      last sysfs file:
      CPU 0
      Modules linked in:
      
      Pid: 0, comm: swapper Tainted: G        W  2.6.34-rc3 #1 /HVM domU
      RIP: 0010:[<ffffffff8142f2fb>]  [<ffffffff8142f2fb>] cpuid4_cache_lookup_regs+0x
      2ca/0x3df
      RSP: 0018:ffff880002203e08  EFLAGS: 00010046
      RAX: 0000000000000000 RBX: 0000000000000003 RCX: 0000000000000060
      RDX: 0000000000000000 RSI: 0000000000000040 RDI: 0000000000000000
      RBP: ffff880002203ed8 R08: 00000000000017c0 R09: ffff880002203e38
      R10: ffff8800023d5d40 R11: ffffffff81a01e28 R12: ffff880187e6f5c0
      R13: ffff880002203e34 R14: ffff880002203e58 R15: ffff880002203e68
      FS:  0000000000000000(0000) GS:ffff880002200000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      CR2: 0000000000000038 CR3: 0000000001a3c000 CR4: 00000000000006f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Process swapper (pid: 0, threadinfo ffffffff81a00000, task ffffffff81a44020)
      Stack:
       ffffffff810d7ecb ffff880002203e20 ffffffff81059140 ffff880002203e30
      <0> ffffffff810d7ec9 0000000002203e40 000000000050d140 ffff880002203e70
      <0> 0000000002008140 0000000000000086 ffff880040020140 ffffffff81068b8b
      Call Trace:
       <IRQ>
       [<ffffffff810d7ecb>] ? sync_supers_timer_fn+0x0/0x1c
       [<ffffffff81059140>] ? mod_timer+0x23/0x25
       [<ffffffff810d7ec9>] ? arm_supers_timer+0x34/0x36
       [<ffffffff81068b8b>] ? hrtimer_get_next_event+0xa7/0xc3
       [<ffffffff81058e85>] ? get_next_timer_interrupt+0x19a/0x20d
       [<ffffffff8142fa23>] get_cpu_leaves+0x5c/0x232
       [<ffffffff8106a7b1>] ? sched_clock_local+0x1c/0x82
       [<ffffffff8106a9a0>] ? sched_clock_tick+0x75/0x7a
       [<ffffffff8107748c>] generic_smp_call_function_single_interrupt+0xae/0xd0
       [<ffffffff8101f6ef>] smp_call_function_single_interrupt+0x18/0x27
       [<ffffffff8100a773>] call_function_single_interrupt+0x13/0x20
       <EOI>
       [<ffffffff8143c468>] ? notifier_call_chain+0x14/0x63
       [<ffffffff810295c6>] ? native_safe_halt+0xc/0xd
       [<ffffffff810114eb>] ? default_idle+0x36/0x53
       [<ffffffff81008c22>] cpu_idle+0xaa/0xe4
       [<ffffffff81423a9a>] rest_init+0x7e/0x80
       [<ffffffff81b10dd2>] start_kernel+0x40e/0x419
       [<ffffffff81b102c8>] x86_64_start_reservations+0xb3/0xb7
       [<ffffffff81b103c4>] x86_64_start_kernel+0xf8/0x107
      Code: 14 d5 40 ff ae 81 8b 14 02 31 c0 3b 15 47 1c 8b 00 7d 0e 48 8b 05 36 1c 8b
       00 48 63 d2 48 8b 04 d0 c7 85 5c ff ff ff 00 00 00 00 <8b> 70 38 48 8d 8d 5c ff
       ff ff 48 8b 78 10 ba c4 01 00 00 e8 eb
      RIP  [<ffffffff8142f2fb>] cpuid4_cache_lookup_regs+0x2ca/0x3df
       RSP <ffff880002203e08>
      CR2: 0000000000000038
      ---[ end trace a7919e7f17c0a726 ]---
      
      The L3 cache index disable feature of AMD CPUs has to be disabled if the
      kernel is running as guest on top of a hypervisor because northbridge
      devices are not available to the guest. Currently, this fixes a boot
      crash on top of Xen. In the future this will become an issue on KVM as
      well.
      
      Check if northbridge devices are present and do not enable the feature
      if there are none.
      
      [ hpa: backported to 2.6.34 ]
      Signed-off-by: NFrank Arnold <frank.arnold@amd.com>
      LKML-Reference: <1271945222-5283-3-git-send-email-bp@amd64.org>
      Acked-by: NBorislav Petkov <borislav.petkov@amd.com>
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      Cc: <stable@kernel.org>
      7f284d3c
  3. 03 5月, 2010 1 次提交
  4. 25 4月, 2010 1 次提交
    • D
      VMware Balloon driver · 453dc659
      Dmitry Torokhov 提交于
      This is a standalone version of VMware Balloon driver.  Ballooning is a
      technique that allows hypervisor dynamically limit the amount of memory
      available to the guest (with guest cooperation).  In the overcommit
      scenario, when hypervisor set detects that it needs to shuffle some
      memory, it instructs the driver to allocate certain number of pages, and
      the underlying memory gets returned to the hypervisor.  Later hypervisor
      may return memory to the guest by reattaching memory to the pageframes and
      instructing the driver to "deflate" balloon.
      
      We are submitting a standalone driver because KVM maintainer (Avi Kivity)
      expressed opinion (rightly) that our transport does not fit well into
      virtqueue paradigm and thus it does not make much sense to integrate with
      virtio.
      
      There were also some concerns whether current ballooning technique is the
      right thing.  If there appears a better framework to achieve this we are
      prepared to evaluate and switch to using it, but in the meantime we'd like
      to get this driver upstream.
      
      We want to get the driver accepted in distributions so that users do not
      have to deal with an out-of-tree module and many distributions have
      "upstream first" requirement.
      
      The driver has been shipping for a number of years and users running on
      VMware platform will have it installed as part of VMware Tools even if it
      will not come from a distribution, thus there should not be additional
      risk in pulling the driver into mainline.  The driver will only activate
      if host is VMware so everyone else should not be affected at all.
      Signed-off-by: NDmitry Torokhov <dtor@vmware.com>
      Cc: Avi Kivity <avi@redhat.com>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      453dc659
  5. 24 4月, 2010 1 次提交
    • H
      x86: Disable large pages on CPUs with Atom erratum AAE44 · 7a0fc404
      H. Peter Anvin 提交于
      Atom erratum AAE44/AAF40/AAG38/AAH41:
      
      "If software clears the PS (page size) bit in a present PDE (page
      directory entry), that will cause linear addresses mapped through this
      PDE to use 4-KByte pages instead of using a large page after old TLB
      entries are invalidated. Due to this erratum, if a code fetch uses
      this PDE before the TLB entry for the large page is invalidated then
      it may fetch from a different physical address than specified by
      either the old large page translation or the new 4-KByte page
      translation. This erratum may also cause speculative code fetches from
      incorrect addresses."
      
      [http://download.intel.com/design/processor/specupdt/319536.pdf]
      
      Where as commit 211b3d03 seems to
      workaround errata AAH41 (mixed 4K TLBs) it reduces the window of
      opportunity for the bug to occur and does not totally remove it.  This
      patch disables mixed 4K/4MB page tables totally avoiding the page
      splitting and not tripping this processor issue.
      
      This is based on an original patch by Colin King.
      Originally-by: NColin Ian King <colin.king@canonical.com>
      Cc: Colin Ian King <colin.king@canonical.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      LKML-Reference: <1269271251-19775-1-git-send-email-colin.king@canonical.com>
      Cc: <stable@kernel.org>
      7a0fc404
  6. 06 4月, 2010 1 次提交
    • V
      perf, x86: Enable Nehalem-EX support · 134fbadf
      Vince Weaver 提交于
      According to Intel Software Devel Manual Volume 3B, the
      Nehalem-EX PMU is just like regular Nehalem (except for the
      uncore support, which is completely different).
      Signed-off-by: NVince Weaver <vweaver1@eecs.utk.edu>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Lin Ming <ming.m.lin@intel.com>
      LKML-Reference: <alpine.DEB.2.00.1004060956580.1417@cl320.eecs.utk.edu>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      134fbadf
  7. 03 4月, 2010 2 次提交
    • T
      perf, x86: Fix callgraphs of 32-bit processes on 64-bit kernels · 257ef9d2
      Torok Edwin 提交于
      When profiling a 32-bit process on a 64-bit kernel, callgraph tracing
      stopped after the first function, because it has seen a garbage memory
      address (tried to interpret the frame pointer, and return address as a
      64-bit pointer).
      
      Fix this by using a struct stack_frame with 32-bit pointers when the
      TIF_IA32 flag is set.
      
      Note that TIF_IA32 flag must be used, and not is_compat_task(), because
      the latter is only set when the 32-bit process is executing a syscall,
      which may not always be the case (when tracing page fault events for
      example).
      Signed-off-by: NTörök Edwin <edwintorok@gmail.com>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: x86@kernel.org
      Cc: linux-kernel@vger.kernel.org
      LKML-Reference: <1268820436-13145-1-git-send-email-edwintorok@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      257ef9d2
    • P
      perf, x86: Fix AMD hotplug & constraint initialization · b38b24ea
      Peter Zijlstra 提交于
      Commit 3f6da390 ("perf: Rework and fix the arch CPU-hotplug hooks") moved
      the amd northbridge allocation from CPUS_ONLINE to CPUS_PREPARE_UP
      however amd_nb_id() doesn't work yet on prepare so it would simply bail
      basically reverting to a state where we do not properly track node wide
      constraints - causing weird perf results.
      
      Fix up the AMD NorthBridge initialization code by allocating from
      CPU_UP_PREPARE and installing it from CPU_STARTING once we have the
      proper nb_id. It also properly deals with the allocation failing.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      [ robustify using amd_has_nb() ]
      Signed-off-by: NStephane Eranian <eranian@google.com>
      LKML-Reference: <1269353485.5109.48.camel@twins>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      b38b24ea
  8. 01 4月, 2010 1 次提交
    • F
      perf: Use hot regs with software sched switch/migrate events · e49a5bd3
      Frederic Weisbecker 提交于
      Scheduler's task migration events don't work because they always
      pass NULL regs perf_sw_event(). The event hence gets filtered
      in perf_swevent_add().
      
      Scheduler's context switches events use task_pt_regs() to get
      the context when the event occured which is a wrong thing to
      do as this won't give us the place in the kernel where we went
      to sleep but the place where we left userspace. The result is
      even more wrong if we switch from a kernel thread.
      
      Use the hot regs snapshot for both events as they belong to the
      non-interrupt/exception based events family. Unlike page faults
      or so that provide the regs matching the exact origin of the event,
      we need to save the current context.
      
      This makes the task migration event working and fix the context
      switch callchains and origin ip.
      
      Example: perf record -a -e cs
      
      Before:
      
          10.91%      ksoftirqd/0                  0  [k] 0000000000000000
                      |
                      --- (nil)
                          perf_callchain
                          perf_prepare_sample
                          __perf_event_overflow
                          perf_swevent_overflow
                          perf_swevent_add
                          perf_swevent_ctx_event
                          do_perf_sw_event
                          __perf_sw_event
                          perf_event_task_sched_out
                          schedule
                          run_ksoftirqd
                          kthread
                          kernel_thread_helper
      
      After:
      
          23.77%  hald-addon-stor  [kernel.kallsyms]  [k] schedule
                  |
                  --- schedule
                     |
                     |--60.00%-- schedule_timeout
                     |          wait_for_common
                     |          wait_for_completion
                     |          blk_execute_rq
                     |          scsi_execute
                     |          scsi_execute_req
                     |          sr_test_unit_ready
                     |          |
                     |          |--66.67%-- sr_media_change
                     |          |          media_changed
                     |          |          cdrom_media_changed
                     |          |          sr_block_media_changed
                     |          |          check_disk_change
                     |          |          cdrom_open
      
      v2: Always build perf_arch_fetch_caller_regs() now that software
      events need that too. They don't need it from modules, unlike trace
      events, so we keep the EXPORT_SYMBOL in trace_event_perf.c
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: David Miller <davem@davemloft.net>
      e49a5bd3
  9. 30 3月, 2010 1 次提交
    • T
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking... · 5a0e3ad6
      Tejun Heo 提交于
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
      
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      
      percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities include those
      headers directly instead of assuming availability.  As this conversion
      needs to touch large number of source files, the following script is
      used as the basis of conversion.
      
        http://userweb.kernel.org/~tj/misc/slabh-sweep.py
      
      The script does the followings.
      
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there.  ie. if only gfp is used,
        gfp.h, if slab is used, slab.h.
      
      * When the script inserts a new include, it looks at the include
        blocks and try to put the new include such that its order conforms
        to its surrounding.  It's put in the include block which contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
        doesn't seem to be any matching order.
      
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
        file.
      
      The conversion was done in the following steps.
      
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
         files.
      
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition while adding it to implementation .h or
         embedding .c file was more appropriate for others.  This step added
         inclusions to around 150 files.
      
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         wildly available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
         necessary.
      
      6. percpu.h was updated not to include slab.h.
      
      7. Build test were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
      
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      
      Given the fact that I had only a couple of failures from tests on step
      6, I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the arch
      headers which should be easily discoverable easily on most builds of
      the specific arch.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Guess-its-ok-by: NChristoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
      5a0e3ad6
  10. 23 3月, 2010 1 次提交
  11. 17 3月, 2010 1 次提交
    • F
      perf: Fix unexported generic perf_arch_fetch_caller_regs · dcd5c166
      Frederic Weisbecker 提交于
      perf_arch_fetch_caller_regs() is exported for the overriden x86
      version, but not for the generic weak version.
      
      As a general rule, weak functions should not have their symbol
      exported in the same file they are defined.
      
      So let's export it on trace_event_perf.c as it is used by trace
      events only.
      
      This fixes:
      
      	ERROR: ".perf_arch_fetch_caller_regs" [fs/xfs/xfs.ko] undefined!
      	ERROR: ".perf_arch_fetch_caller_regs" [arch/powerpc/platforms/cell/spufs/spufs.ko] undefined!
      
      -v2: And also only build it if trace events are enabled.
      -v3: Fix changelog mistake
      Reported-by: NStephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1268697902-9518-1-git-send-regression-fweisbec@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      dcd5c166
  12. 14 3月, 2010 1 次提交
    • I
      x86/mce: Fix build bug with CONFIG_PROVE_LOCKING=y && CONFIG_X86_MCE_INTEL=y · 2aa2b50d
      Ingo Molnar 提交于
      Commit f56e8a07 "x86/mce: Fix RCU lockdep splats" introduced the
      following build bug:
      
        arch/x86/kernel/cpu/mcheck/mce.c: In function 'mce_log':
        arch/x86/kernel/cpu/mcheck/mce.c:166: error: 'mce_read_mutex' undeclared (first use in this function)
        arch/x86/kernel/cpu/mcheck/mce.c:166: error: (Each undeclared identifier is reported only once
        arch/x86/kernel/cpu/mcheck/mce.c:166: error: for each function it appears in.)
      
      Move the in-the-middle-of-file lock variable up to the variable
      definition section, the top of the .c file.
      
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: x86@kernel.org
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <1267830207-9474-3-git-send-email-paulmck@linux.vnet.ibm.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      2aa2b50d
  13. 11 3月, 2010 4 次提交
    • X
      perf: export perf_trace_regs and perf_arch_fetch_caller_regs · 639fe4b1
      Xiao Guangrong 提交于
      Export perf_trace_regs and perf_arch_fetch_caller_regs since module will
      use these.
      Signed-off-by: NXiao Guangrong <xiaoguangrong@cn.fujitsu.com>
      [ use EXPORT_PER_CPU_SYMBOL_GPL() ]
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <4B989C1B.2090407@cn.fujitsu.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      639fe4b1
    • P
      perf, x86: Fix hw_perf_enable() event assignment · 45e16a68
      Peter Zijlstra 提交于
      What happens is that we schedule badly like:
      
      <...>-1987  [019]   280.252808: x86_pmu_start: event-46/1300c0: idx: 0
      <...>-1987  [019]   280.252811: x86_pmu_start: event-47/1300c0: idx: 1
      <...>-1987  [019]   280.252812: x86_pmu_start: event-48/1300c0: idx: 2
      <...>-1987  [019]   280.252813: x86_pmu_start: event-49/1300c0: idx: 3
      <...>-1987  [019]   280.252814: x86_pmu_start: event-50/1300c0: idx: 32
      <...>-1987  [019]   280.252825: x86_pmu_stop: event-46/1300c0: idx: 0
      <...>-1987  [019]   280.252826: x86_pmu_stop: event-47/1300c0: idx: 1
      <...>-1987  [019]   280.252827: x86_pmu_stop: event-48/1300c0: idx: 2
      <...>-1987  [019]   280.252828: x86_pmu_stop: event-49/1300c0: idx: 3
      <...>-1987  [019]   280.252829: x86_pmu_stop: event-50/1300c0: idx: 32
      <...>-1987  [019]   280.252834: x86_pmu_start: event-47/1300c0: idx: 1
      <...>-1987  [019]   280.252834: x86_pmu_start: event-48/1300c0: idx: 2
      <...>-1987  [019]   280.252835: x86_pmu_start: event-49/1300c0: idx: 3
      <...>-1987  [019]   280.252836: x86_pmu_start: event-50/1300c0: idx: 32
      <...>-1987  [019]   280.252837: x86_pmu_start: event-51/1300c0: idx: 32 *FAIL*
      
      This happens because we only iterate the n_running events in the first
      pass, and reset their index to -1 if they don't match to force a
      re-assignment.
      
      Now, in our RR example, n_running == 0 because we fully unscheduled, so
      event-50 will retain its idx==32, even though in scheduling it will have
      gotten idx=0, and we don't trigger the re-assign path.
      
      The easiest way to fix this is the below patch, which simply validates
      the full assignment in the second pass.
      Reported-by: NStephane Eranian <eranian@google.com>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1268311069.5037.31.camel@laptop>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      45e16a68
    • M
      x86: Reduce per cpu MCA boot up messages · 10fb7f1f
      Mike Travis 提交于
      Don't write per cpu MCA boot up messages.
      Signed-of-by: NMike Travis <travis@sgi.com>
      Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Cc: x86@kernel.org
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      10fb7f1f
    • P
      x86/mce: Fix RCU lockdep splats · f56e8a07
      Paul E. McKenney 提交于
      Create an rcu_dereference_check_mce() that checks for RCU-sched
      read side and mce_read_mutex being held on update side.  Replace
      uses of rcu_dereference() in arch/x86/kernel/cpu/mcheck/mce.c
      with this new macro.
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: x86@kernel.org
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <1267830207-9474-3-git-send-email-paulmck@linux.vnet.ibm.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      f56e8a07
  14. 10 3月, 2010 13 次提交
    • F
      perf: Introduce new perf_fetch_caller_regs() for hot regs snapshot · 5331d7b8
      Frederic Weisbecker 提交于
      Events that trigger overflows by interrupting a context can
      use get_irq_regs() or task_pt_regs() to retrieve the state
      when the event triggered. But this is not the case for some
      other class of events like trace events as tracepoints are
      executed in the same context than the code that triggered
      the event.
      
      It means we need a different api to capture the regs there,
      namely we need a hot snapshot to get the most important
      informations for perf: the instruction pointer to get the
      event origin, the frame pointer for the callchain, the code
      segment for user_mode() tests (we always use __KERNEL_CS as
      trace events always occur from the kernel) and the eflags
      for further purposes.
      
      v2: rename perf_save_regs to perf_fetch_caller_regs as per
      Masami's suggestion.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Masami Hiramatsu <mhiramat@redhat.com>
      Cc: Jason Baron <jbaron@redhat.com>
      Cc: Archs <linux-arch@vger.kernel.org>
      5331d7b8
    • P
      perf, x86: Fix double enable calls · f3d46b2e
      Peter Zijlstra 提交于
      hw_perf_enable() would enable already enabled events.
      
      This causes problems with code that assumes that ->enable/->disable calls
      are balanced (like the LBR code does).
      
      What happens is that events that were already running and left in place
      would get enabled again.
      
      Avoid this by only enabling new events that match their previous
      assignment.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      Cc: paulus@samba.org
      Cc: eranian@google.com
      Cc: robert.richter@amd.com
      Cc: fweisbec@gmail.com
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      f3d46b2e
    • P
      perf, x86: Fix double disable calls · 19925ce7
      Peter Zijlstra 提交于
      hw_perf_enable() would disable events that were not yet enabled.
      
      This causes problems with code that assumes that ->enable/->disable calls
      are balanced (like the LBR code does).
      
      What happens is that we disable newly added counters that match their
      previous assignment, even though they are not yet programmed on the
      hardware.
      
      Avoid this by only doing the first pass over the existing events.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      Cc: paulus@samba.org
      Cc: eranian@google.com
      Cc: robert.richter@amd.com
      Cc: fweisbec@gmail.com
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      19925ce7
    • P
      perf, x86: Properly account n_added · 356e1f2e
      Peter Zijlstra 提交于
      Make sure n_added is properly accounted so that we can rely on the value
      to reflect the number of added counters. This is needed if its going to
      be used for more than a boolean check.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      Cc: paulus@samba.org
      Cc: eranian@google.com
      Cc: robert.richter@amd.com
      Cc: fweisbec@gmail.com
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      356e1f2e
    • P
      perf, x86: Avoid double disable on throttle vs ioctl(PERF_IOC_DISABLE) · 71e2d282
      Peter Zijlstra 提交于
      Calling ioctl(PERF_EVENT_IOC_DISABLE) on a thottled counter would result
      in a double disable, cure this by using x86_pmu_{start,stop} for
      throttle/unthrottle and teach x86_pmu_stop() to check ->active_mask.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      Cc: paulus@samba.org
      Cc: eranian@google.com
      Cc: robert.richter@amd.com
      Cc: fweisbec@gmail.com
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      71e2d282
    • P
      perf, x86: Fix x86_pmu_start · c08053e6
      Peter Zijlstra 提交于
      pmu::start should undo pmu::stop, make it so.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      Cc: paulus@samba.org
      Cc: eranian@google.com
      Cc: robert.richter@amd.com
      Cc: fweisbec@gmail.com
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      c08053e6
    • P
      perf, x86: Use unlocked bitops · 34538ee7
      Peter Zijlstra 提交于
      There is no concurrency on these variables, so don't use LOCK'ed ops.
      
      As to the intel_pmu_handle_irq() status bit clean, nobody uses that so
      remove it all together.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: paulus@samba.org
      Cc: eranian@google.com
      Cc: robert.richter@amd.com
      Cc: fweisbec@gmail.com
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      LKML-Reference: <20100304140100.240023029@chello.nl>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      34538ee7
    • P
      perf, x86: Change x86_pmu.{enable,disable} calling convention · aff3d91a
      Peter Zijlstra 提交于
      Pass the full perf_event into the x86_pmu functions so that those may
      make use of more than the hw_perf_event, and while doing this, remove the
      superfluous second argument.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: paulus@samba.org
      Cc: eranian@google.com
      Cc: robert.richter@amd.com
      Cc: fweisbec@gmail.com
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      LKML-Reference: <20100304140100.165166129@chello.nl>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      aff3d91a
    • P
      perf, x86: Remove superfluous arguments to x86_perf_event_update() · cc2ad4ba
      Peter Zijlstra 提交于
      The second and third argument to x86_perf_event_update() are superfluous
      since they are simple expressions of the first argument. Hence remove
      them.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: paulus@samba.org
      Cc: eranian@google.com
      Cc: robert.richter@amd.com
      Cc: fweisbec@gmail.com
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      LKML-Reference: <20100304140100.089468871@chello.nl>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      cc2ad4ba
    • P
      perf, x86: Remove superfluous arguments to x86_perf_event_set_period() · 07088edb
      Peter Zijlstra 提交于
      The second and third argument to x86_perf_event_set_period() are
      superfluous since they are simple expressions of the first argument.
      Hence remove them.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: paulus@samba.org
      Cc: eranian@google.com
      Cc: robert.richter@amd.com
      Cc: fweisbec@gmail.com
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      LKML-Reference: <20100304140100.006500906@chello.nl>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      07088edb
    • P
      perf, x86, Do not user perf_disable from NMI context · 3fb2b8dd
      Peter Zijlstra 提交于
      Explicitly use intel_pmu_{disable,enable}_all() in intel_pmu_handle_irq()
      to avoid the NMI race conditions in perf_{disable,enable}
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      Cc: paulus@samba.org
      Cc: eranian@google.com
      Cc: robert.richter@amd.com
      Cc: fweisbec@gmail.com
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      3fb2b8dd
    • P
      perf: Rework and fix the arch CPU-hotplug hooks · 3f6da390
      Peter Zijlstra 提交于
      Remove the hw_perf_event_*() hotplug hooks in favour of per PMU hotplug
      notifiers. This has the advantage of reducing the static weak interface
      as well as exposing all hotplug actions to the PMU.
      
      Use this to fix x86 hotplug usage where we did things in ONLINE which
      should have been done in UP_PREPARE or STARTING.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: paulus@samba.org
      Cc: eranian@google.com
      Cc: robert.richter@amd.com
      Cc: fweisbec@gmail.com
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      LKML-Reference: <20100305154128.736225361@chello.nl>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      3f6da390
    • P
      perf: Provide generic perf_sample_data initialization · dc1d628a
      Peter Zijlstra 提交于
      This makes it easier to extend perf_sample_data and fixes a bug on arm
      and sparc, which failed to set ->raw to NULL, which can cause crashes
      when combined with PERF_SAMPLE_RAW.
      
      It also optimizes PowerPC and tracepoint, because the struct
      initialization is forced to zero out the whole structure.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: NJean Pihet <jpihet@mvista.com>
      Reviewed-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Acked-by: NDavid S. Miller <davem@davemloft.net>
      Cc: Jamie Iles <jamie.iles@picochip.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: stable@kernel.org
      LKML-Reference: <20100304140100.315416040@chello.nl>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      dc1d628a
  15. 08 3月, 2010 2 次提交
  16. 07 3月, 2010 1 次提交
  17. 06 3月, 2010 1 次提交
  18. 02 3月, 2010 3 次提交
  19. 01 3月, 2010 1 次提交
    • R
      perf, x86: rename macro in ARCH_PERFMON_EVENTSEL_ENABLE · bb1165d6
      Robert Richter 提交于
      For consistency reasons this patch renames
      ARCH_PERFMON_EVENTSEL0_ENABLE to ARCH_PERFMON_EVENTSEL_ENABLE.
      
      The following is performed:
      
       $ sed -i -e s/ARCH_PERFMON_EVENTSEL0_ENABLE/ARCH_PERFMON_EVENTSEL_ENABLE/g \
         arch/x86/include/asm/perf_event.h arch/x86/kernel/cpu/perf_event.c \
         arch/x86/kernel/cpu/perf_event_p6.c \
         arch/x86/kernel/cpu/perfctr-watchdog.c \
         arch/x86/oprofile/op_model_amd.c arch/x86/oprofile/op_model_ppro.c
      Signed-off-by: NRobert Richter <robert.richter@amd.com>
      bb1165d6
  20. 27 2月, 2010 1 次提交