1. 20 May 2011, 1 commit
  2. 19 May 2011, 4 commits
  3. 17 May 2011, 3 commits
  4. 16 May 2011, 1 commit
    • x86, apic: Fix spurious error interrupts triggering on all non-boot APs · e503f9e4
      Committed by Youquan Song
      This patch fixes a bug reported by a customer, who found that
      many spurious error interrupts were reported on all non-boot
      CPUs (APs) during the system boot stage.
      
      According to Chapter 10 of the Intel Software Developer Manual
      Volume 3A, the Local APIC may signal an illegal vector error when
      an LVT entry is set to an illegal vector value (0~15) under
      FIXED delivery mode (bits 8-11 are 0), regardless of whether
      the mask bit is set or an interrupt actually happens. These
      errors are seen as error interrupts.
      
      The initial value of the thermal LVT entries on all APs always
      reads 0x10000, because the APs are woken up by the BSP issuing an
      INIT-SIPI-SIPI sequence to them, and the LVT registers are reset
      to 0 (except for the mask bits, which are set to 1) when the APs
      receive the INIT IPI.
      
      When the BIOS takes over the thermal throttling interrupt,
      the LVT thermal delivery mode should be SMI, and the kernel is
      required to keep each AP's LVT thermal monitoring register
      programmed that way as well.
      
      The issue arises when the BIOS does not take over the thermal
      throttling interrupt: each AP's LVT thermal monitor register is
      restored to 0x10000, which means vector 0 and fixed delivery mode,
      so all APs signal illegal vector error interrupts.
      
      This patch checks that the interrupt delivery mode is not fixed
      mode before restoring an AP's LVT thermal monitor register.
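      
      A minimal sketch of the check (the mask constants are defined
      locally for illustration; apic_write() and APIC_LVTTHMR are the
      usual kernel APIC accessors, but this is not the exact diff):
      
        #define LVT_DELIVERY_MODE_MASK  0x700   /* delivery mode field */
        #define LVT_DM_FIXED            0x000   /* FIXED delivery mode */
      
        /* Only restore non-FIXED (e.g. SMI) thermal LVT programming,
         * so an AP never re-installs the bogus 0x10000 reset value. */
        static void restore_ap_lvt_thermal(unsigned int saved_lvt)
        {
                if ((saved_lvt & LVT_DELIVERY_MODE_MASK) != LVT_DM_FIXED)
                        apic_write(APIC_LVTTHMR, saved_lvt);
        }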
      Signed-off-by: Youquan Song <youquan.song@intel.com>
      Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
      Acked-by: Yong Wang <yong.y.wang@intel.com>
      Cc: hpa@linux.intel.com
      Cc: joe@perches.com
      Cc: jbaron@redhat.com
      Cc: trenn@suse.de
      Cc: kent.liu@intel.com
      Cc: chaohong.guo@intel.com
      Cc: <stable@kernel.org> # As far back as possible
      Link: http://lkml.kernel.org/r/1303402963-17738-1-git-send-email-youquan.song@intel.com
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  5. 13 May 2011, 8 commits
    • x86, mce, AMD: Fix leaving freed data in a list · d9a5ac9e
      Committed by Julia Lawall
      b may be added to a list, but is not removed before being freed
      on an error path.  The removal is done in the corresponding
      deallocation function, so the code here has been changed to match
      (see the sketch after the semantic patch below).
      
      The semantic match that finds this problem is as follows:
      (http://coccinelle.lip6.fr/)
      
      // <smpl>
      @@
      expression E,E1,E2;
      identifier l;
      @@
      
      *list_add(&E->l,E1);
      ... when != E1
          when != list_del(&E->l)
          when != list_del_init(&E->l)
          when != E = E2
      *kfree(E);// </smpl>
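      
      For illustration, a minimal sketch of the bug pattern and the fix,
      using hypothetical struct and helper names rather than the actual
      mce_amd code (list_del()/kfree() are the standard <linux/list.h>
      and <linux/slab.h> calls):
      
        struct node {                      /* hypothetical list element */
                struct list_head list;
        };
      
        static int add_node(struct node *b, struct list_head *head)
        {
                list_add(&b->list, head);
      
                if (setup_node(b) < 0) {   /* stand-in for the failing call */
                        list_del(&b->list);  /* the fix: unlink before freeing */
                        kfree(b);
                        return -EINVAL;
                }
                return 0;
        }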
      Signed-off-by: Julia Lawall <julia@diku.dk>
      Cc: Borislav Petkov <borislav.petkov@amd.com>
      Cc: Robert Richter <robert.richter@amd.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
      Cc: <stable@kernel.org>
      Link: http://lkml.kernel.org/r/1305294731-12127-1-git-send-email-julia@diku.dk
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86: Fix UV BAU for non-consecutive nasids · 77ed23f8
      Committed by Cliff Wickman
      This is a fix for the SGI Altix-UV Broadcast Assist Unit code,
      which is used for TLB flushing.
      
      Certain hardware configurations (which customers are ordering)
      cause nasids (NUMA address space IDs) to be non-consecutive.
      Specifically, once you have more than 4 blades in an IRU
      (Individual Rack Unit, or 1/2 rack) but fewer than the maximum
      of 16, the nasid numbering becomes non-consecutive.  This
      currently results in a 'catastrophic error' (CATERR) detected by
      the firmware during OS boot, because the BAU generates an 'INTD'
      request that targets a non-existent nasid value. Such
      configurations may also occur when a blade is configured off
      because of hardware errors. (There is one UV hub per blade.)
      
      This patch is required to support such configurations.
      
      The problem with the tlb_uv.c code is that it uses the
      consecutive hub numbers as indices into the BAU distribution bit
      map. These are simply the ordinal position of the hub or blade
      within its partition.  It should be using physical node numbers
      (pnodes), which correspond to the physical nasid values. Use of
      the hub number only works as long as the nasids in the partition
      are consecutive and increase with a stride of 1.
      
      This patch changes the index to be the pnode number, thus
      allowing nasids to be non-consecutive.
      It also provides a table in local memory for each cpu to
      translate target cpu number to target pnode and nasid.
      And it improves naming to properly reflect 'node' and 'uvhub'
      versus 'nasid'.
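      
      A rough sketch of the translation-table idea (field names
      approximate the patch; the helper itself is illustrative, and
      set_bit() is the generic kernel bitmap primitive):
      
        /* One entry per target cpu, filled in once at BAU init time. */
        struct hub_and_pnode {
                short uvhub;   /* ordinal hub number within the partition */
                short pnode;   /* physical node number, tracks the nasid */
        };
      
        /* Index the distribution bitmap by pnode, which tolerates gaps
         * in the nasid numbering, instead of by consecutive hub number. */
        static void set_target_bit(unsigned long *distribution,
                                   struct hub_and_pnode *cpu_tbl,
                                   int tcpu, int base_pnode)
        {
                set_bit(cpu_tbl[tcpu].pnode - base_pnode, distribution);
        }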
      Signed-off-by: Cliff Wickman <cpw@sgi.com>
      Cc: <stable@kernel.org>
      Link: http://lkml.kernel.org/r/E1QJmxX-0002Mz-Fk@eag09.americas.sgi.com
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • arch/x86/xen/setup: Cleanup code/data sections definitions · ae15a3b4
      Committed by Daniel Kiper
      Clean up code/data section definitions
      in accordance with include/linux/init.h.
      Signed-off-by: Daniel Kiper <dkiper@net-space.pl>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    • arch/x86/xen/enlighten: Cleanup code/data sections definitions · ad3062a0
      Committed by Daniel Kiper
      Clean up code/data section definitions
      in accordance with include/linux/init.h.
      Signed-off-by: Daniel Kiper <dkiper@net-space.pl>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    • arch/x86/xen/irq: Cleanup code/data sections definitions · 251511a1
      Committed by Daniel Kiper
      Clean up code/data section definitions
      in accordance with include/linux/init.h.
      Signed-off-by: Daniel Kiper <dkiper@net-space.pl>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    • x86/mm: Fix section mismatch derived from native_pagetable_reserve() · 53f8023f
      Committed by Sedat Dilek
      With CONFIG_DEBUG_SECTION_MISMATCH=y I see these warnings in next-20110415:
      
        LD      vmlinux.o
        MODPOST vmlinux.o
      WARNING: vmlinux.o(.text+0x1ba48): Section mismatch in reference from the function native_pagetable_reserve() to the function .init.text:memblock_x86_reserve_range()
      The function native_pagetable_reserve() references
      the function __init memblock_x86_reserve_range().
      This is often because native_pagetable_reserve lacks a __init
      annotation or the annotation of memblock_x86_reserve_range is wrong.
      
      This patch fixes the issue.
      Thanks to pipacs from PaX project for help on IRC.
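      
      Sketched, the fix amounts to annotating the caller __init so that
      its reference to the __init-marked callee is legal (close to the
      actual one-liner, but treat it as illustrative):
      
        /* __init places the function in .init.text, matching the
         * section of memblock_x86_reserve_range() and silencing the
         * modpost warning. */
        void __init native_pagetable_reserve(u64 start, u64 end)
        {
                memblock_x86_reserve_range(start, end, "PGTABLE");
        }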
      Acked-by: N"H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: NSedat Dilek <sedat.dilek@gmail.com>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    • x86,xen: introduce x86_init.mapping.pagetable_reserve · 279b706b
      Committed by Stefano Stabellini
      Introduce a new x86_init hook called pagetable_reserve that is
      used at the end of init_memory_mapping to reserve the range of
      memory addresses holding the kernel pagetable pages we actually
      used, and to free the rest.
      
      On native it just calls memblock_x86_reserve_range while on xen it also
      takes care of setting the spare memory previously allocated
      for kernel pagetable pages from RO to RW, so that it can be used for
      other purposes.
      
      A detailed explanation of the reason why this hook is needed follows.
      
      As a consequence of the commit:
      
      commit 4b239f45
      Author: Yinghai Lu <yinghai@kernel.org>
      Date:   Fri Dec 17 16:58:28 2010 -0800
      
          x86-64, mm: Put early page table high
      
      at some point init_memory_mapping is going to reach the pagetable pages
      area and map those pages too (mapping them as normal memory that falls
      in the range of addresses passed to init_memory_mapping as argument).
      Some of those pages are already pagetable pages (they are in the range
      pgt_buf_start-pgt_buf_end) therefore they are going to be mapped RO and
      everything is fine.
      Some of these pages are not pagetable pages yet (they fall in the range
      pgt_buf_end-pgt_buf_top; for example the page at pgt_buf_end) so they
      are going to be mapped RW.  When these pages become pagetable
      pages and are hooked into the pagetable, Xen will find that the
      guest already has a RW mapping of them somewhere and will fail
      the operation.
      The reason Xen requires pagetables to be RO is that the hypervisor needs
      to verify that the pagetables are valid before using them. The validation
      operations are called "pinning" (more details in arch/x86/xen/mmu.c).
      
      To fix the issue we mark all the pages in the entire range
      pgt_buf_start-pgt_buf_top as RO; however, when the pagetable
      allocation is completed, only the range
      pgt_buf_start-pgt_buf_end is reserved by init_memory_mapping.
      Hence the kernel would crash as soon as one of the pages in the
      range pgt_buf_end-pgt_buf_top is reused (because those pages are
      still mapped RO).
      
      For this reason we need a hook to reserve the kernel pagetable pages we
      used and free the other ones so that they can be reused for other
      purposes.
      On native this just means calling memblock_x86_reserve_range; on
      Xen it also means marking RW the pagetable pages that we
      allocated earlier but never actually used.
      
      Another way to fix this without using the hook would be to add an
      'if (xen_pv_domain())' check in the 'init_memory_mapping' code and
      call the Xen counterpart directly, but that would just be nasty.
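      
      A sketch of the hook's shape and call site, mirroring the prose
      above (details illustrative; PFN_PHYS and the pgt_buf_* variables
      are real kernel symbols):
      
        struct x86_init_mapping {
                /* reserve the used pagetable range, release the rest */
                void (*pagetable_reserve)(u64 start, u64 end);
        };
      
        /* At the end of init_memory_mapping: on native this ends up in
         * memblock_x86_reserve_range-based code, on Xen in a variant
         * that also flips the unused RO pages back to RW. */
        x86_init.mapping.pagetable_reserve(PFN_PHYS(pgt_buf_start),
                                           PFN_PHYS(pgt_buf_end));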
      Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
      Acked-by: Yinghai Lu <yinghai@kernel.org>
      Acked-by: H. Peter Anvin <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    • Revert "xen/mmu: Add workaround "x86-64, mm: Put early page table high"" · 92bdaef7
      Committed by Konrad Rzeszutek Wilk
      This reverts commit a3864783.
      
      It does not work with certain AMD machines.
      
      last_pfn = 0x100000 max_arch_pfn = 0x400000000
      initial memory mapped : 0 - 02c3a000
      Base memory trampoline at [ffff88000009b000] 9b000 size 20480
      init_memory_mapping: 0000000000000000-0000000100000000
       0000000000 - 0100000000 page 4k
      kernel direct mapping tables up to 100000000 @ ff7fb000-100000000
      init_memory_mapping: 0000000100000000-00000001e0800000
       0100000000 - 01e0800000 page 4k
      kernel direct mapping tables up to 1e0800000 @ 1df0f3000-1e0000000
      xen: setting RW the range fffdc000 - 100000000
      RAMDISK: 0203b000 - 02c3a000
      No NUMA configuration found
      Faking a node at 0000000000000000-00000001e0800000
      NUMA: Using 63 for the hash shift.
      Initmem setup node 0 0000000000000000-00000001e0800000
        NODE_DATA [00000001dfffb000 - 00000001dfffffff]
      BUG: unable to handle kernel NULL pointer dereference at           (null)
      IP: [<ffffffff81cf6a75>] setup_node_bootmem+0x18a/0x1ea
      PGD 0
      Oops: 0003 [#1] SMP
      last sysfs file:
      CPU 0
      Modules linked in:
      
      Pid: 0, comm: swapper Not tainted 2.6.39-0-virtual #6~smb1
      RIP: e030:[<ffffffff81cf6a75>]  [<ffffffff81cf6a75>] setup_node_bootmem+0x18a/0x1ea
      RSP: e02b:ffffffff81c01e38  EFLAGS: 00010046
      RAX: 0000000000000000 RBX: 00000001e0800000 RCX: 0000000000001040
      RDX: 0000000000004100 RSI: 0000000000000000 RDI: ffff8801dfffb000
      RBP: ffffffff81c01e58 R08: 0000000000000020 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
      R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000bfe400
      FS:  0000000000000000(0000) GS:ffffffff81cca000(0000) knlGS:0000000000000000
      CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000000000000 CR3: 0000000001c03000 CR4: 0000000000000660
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Process swapper (pid: 0, threadinfo ffffffff81c00000, task ffffffff81c0b020)
      Stack:
       0000000000000040 0000000000000001 0000000000000000 ffffffffffffffff
       ffffffff81c01e88 ffffffff81cf6c25 0000000000000000 0000000000000000
       ffffffff81cf687f 0000000000000000 ffffffff81c01ea8 ffffffff81cf6e45
      Call Trace:
       [<ffffffff81cf6c25>] numa_register_memblks.constprop.3+0x150/0x181
       [<ffffffff81cf687f>] ? numa_add_memblk+0x7c/0x7c
       [<ffffffff81cf6e45>] numa_init.part.2+0x1c/0x7c
       [<ffffffff81cf687f>] ? numa_add_memblk+0x7c/0x7c
       [<ffffffff81cf6f67>] numa_init+0x6c/0x70
       [<ffffffff81cf7057>] initmem_init+0x39/0x3b
       [<ffffffff81ce5865>] setup_arch+0x64e/0x769
       [<ffffffff815e43c1>] ? printk+0x51/0x53
       [<ffffffff81cdf92b>] start_kernel+0xd4/0x3f3
       [<ffffffff81cdf388>] x86_64_start_reservations+0x132/0x136
       [<ffffffff81ce2ed4>] xen_start_kernel+0x588/0x58f
      Code: 41 00 00 48 8b 3c c5 a0 24 cc 81 31 c0 40 f6 c7 01 74 05 aa 66 ba ff 40 40 f6 c7 02 74 05 66 ab 83 ea 02 89 d1 c1 e9 02 f6 c2 02 <f3> ab 74 02 66 ab 80 e2 01 74 01 aa 49 63 c4 48 c1 eb 0c 44 89
      RIP  [<ffffffff81cf6a75>] setup_node_bootmem+0x18a/0x1ea
       RSP <ffffffff81c01e38>
      CR2: 0000000000000000
      ---[ end trace a7919e7f17c0a725 ]---
      Kernel panic - not syncing: Attempted to kill the idle task!
      Pid: 0, comm: swapper Tainted: G      D     2.6.39-0-virtual #6~smb1
      Reported-by: Stefan Bader <stefan.bader@canonical.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
  6. 11 May 2011, 1 commit
  7. 10 May 2011, 1 commit
    • x86, UV: Fix NMI handler for UV platforms · 1d44e828
      Committed by Jack Steiner
      This fixes problems seen on UV systems handling NMIs from the
      node controller.
      
      I isolated the "dazed..." messages that I saw earlier to a bug in
      the BMC on our platform: it was sending NMIs without properly
      setting the register that indicates the source of the NMI.
      
      So rather than _assuming_ any unhandled NMI came from the UV system
      maintenance console (SMC), add a check to verify that the SMC actually
      sent the NMI.
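      
      A minimal sketch of that check (the MMR name and mask are
      placeholders, not the actual UV register layout;
      uv_read_local_mmr() is the real UV accessor):
      
        /* Placeholder names: UVH_NMI_MMR / UV_NMI_PENDING_MASK. */
        static int uv_nmi_from_smc(void)
        {
                return uv_read_local_mmr(UVH_NMI_MMR) & UV_NMI_PENDING_MASK;
        }
      
        /* In the unknown-NMI path: only claim the NMI when the SMC
         * actually raised it; otherwise let other handlers see it. */
        if (!uv_nmi_from_smc())
                return NOTIFY_DONE;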
      Signed-off-by: Jack Steiner <steiner@sgi.com>
      Cc: gorcunov@gmail.com
      Cc: dzickus@redhat.com
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  8. 07 May 2011, 1 commit
  9. 06 May 2011, 1 commit
  10. 04 May 2011, 2 commits
    • [CPUFREQ] use dynamic debug instead of custom infrastructure · 2d06d8c4
      Committed by Dominik Brodowski
      With dynamic debug having gained the capability to report debug
      messages during the boot process as well, it offers a far superior
      interface for debug messages to the custom cpufreq infrastructure.
      As a first step, remove the old cpufreq_debug_printk() function
      and replace it with a call to the generic pr_debug() function.
      
      How can dynamic debug be used on cpufreq? You need a kernel which has
      CONFIG_DYNAMIC_DEBUG enabled.
      
      To enable debugging at runtime, mount debugfs and run
      
      $ echo -n 'module cpufreq +p' > /sys/kernel/debug/dynamic_debug/control
      
      for debugging the complete "cpufreq" module. To achieve the same goal during
      boot, append
      
      	ddebug_query="module cpufreq +p"
      
      as a boot parameter to the kernel of your choice.
      
      For more detailed instructions, please see
      Documentation/dynamic-debug-howto.txt
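      
      An illustrative before/after of the conversion (dprintk was
      cpufreq's wrapper around the removed cpufreq_debug_printk();
      the message text is made up, surrounding code omitted):
      
        /* before: custom cpufreq debug infrastructure */
        dprintk("governor switch to %s\n", governor->name);
      
        /* after: generic dynamic debug, controllable at boot and runtime */
        pr_debug("governor switch to %s\n", governor->name);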
      Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
      Signed-off-by: Dave Jones <davej@redhat.com>
    • [CPUFREQ] Fix _OSC UUID in pcc-cpufreq · 904cc1e6
      Committed by Naga Chumbalkar
      The UUID needs to be written out the way it is described in
      Sec. 18.5.124 of the ACPI 4.0a specification.
      
      Platform firmware's use of this UUID/_OSC is optional, which is
      why we didn't notice this bug earlier.
      Signed-off-by: Naga Chumbalkar <nagananda.chumbalkar@hp.com>
      Signed-off-by: Dave Jones <davej@redhat.com>
      Cc: stable@kernel.org
  11. 03 May 2011, 3 commits
    • x86, reboot: Fix relocations in reboot_32.S · 7806a49a
      Committed by H. Peter Anvin
      The use of base for %ebx in this file is arbitrary, *except* that we
      also use it to compute the real-mode segment.  Therefore, make it so
      that r_base really is the true address to which %ebx points.
      
      This resolves kernel bugzilla 33302.
      Reported-and-tested-by: Alexey Zaytsev <alexey.zaytsev@gmail.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      Link: http://lkml.kernel.org/n/tip-08os5wi3yq1no0y4i5m4z7he@git.kernel.org
    • xen: mask_rw_pte mark RO all pagetable pages up to pgt_buf_top · b9269dc7
      Committed by Stefano Stabellini
      mask_rw_pte currently treats a pfn as a pagetable page if it
      falls in the range pgt_buf_start - pgt_buf_end, but that is
      incorrect because pgt_buf_end is a moving target: pgt_buf_top is
      the real boundary.
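      
      A sketch of the corrected boundary check (simplified from the
      actual mask_rw_pte() logic; pte_wrprotect() is the standard
      kernel helper):
      
        /* Any pfn inside the pagetable allocation window must stay RO;
         * compare against the fixed pgt_buf_top, not the moving
         * pgt_buf_end. */
        if (pfn >= pgt_buf_start && pfn < pgt_buf_top)
                pte = pte_wrprotect(pte);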
      Acked-by: N"H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    • xen/mmu: Add workaround "x86-64, mm: Put early page table high" · a3864783
      Committed by Konrad Rzeszutek Wilk
      As a consequence of the commit:
      
      commit 4b239f45
      Author: Yinghai Lu <yinghai@kernel.org>
      Date:   Fri Dec 17 16:58:28 2010 -0800
      
          x86-64, mm: Put early page table high
      
      the Linux kernel crashes under Xen:
      
      mapping kernel into physical memory
      Xen: setup ISA identity maps
      about to get started...
      (XEN) mm.c:2466:d0 Bad type (saw 7400000000000001 != exp 1000000000000000) for mfn b1d89 (pfn bacf7)
      (XEN) mm.c:3027:d0 Error while pinning mfn b1d89
      (XEN) traps.c:481:d0 Unhandled invalid opcode fault/trap [#6] on VCPU 0 [ec=0000]
      (XEN) domain_crash_sync called from entry.S
      (XEN) Domain 0 (vcpu#0) crashed on cpu#0:
      ...
      
      The reason is that at some point init_memory_mapping is going to reach
      the pagetable pages area and map those pages too (mapping them as normal
      memory that falls in the range of addresses passed to init_memory_mapping
      as argument). Some of those pages are already pagetable pages (they are
      in the range pgt_buf_start-pgt_buf_end) therefore they are going to be
      mapped RO and everything is fine.
      Some of these pages are not pagetable pages yet (they fall in the range
      pgt_buf_end-pgt_buf_top; for example the page at pgt_buf_end) so they
      are going to be mapped RW.  When these pages become pagetable
      pages and are hooked into the pagetable, Xen will find that the
      guest already has a RW mapping of them somewhere and will fail
      the operation.
      The reason Xen requires pagetables to be RO is that the hypervisor needs
      to verify that the pagetables are valid before using them. The validation
      operations are called "pinning" (more details in arch/x86/xen/mmu.c).
      
      To fix the issue we mark all the pages in the entire range
      pgt_buf_start-pgt_buf_top as RO; however, when the pagetable
      allocation is completed, only the range
      pgt_buf_start-pgt_buf_end is reserved by init_memory_mapping.
      Hence the kernel would crash as soon as one of the pages in the
      range pgt_buf_end-pgt_buf_top is reused (because those pages are
      still mapped RO).
      
      For this reason this function is introduced, and it is called
      _after_ init_memory_mapping has completed (in a perfect world we
      would call it from init_memory_mapping, but let's ignore that).
      
      Because we are called _after_ init_memory_mapping the pgt_buf_[start,
      end,top] have all changed to new values (b/c another init_memory_mapping
      is called). Hence, the first time we enter this function, we save
      away the pgt_buf_start value and update the pgt_buf_[end,top].
      
      When we detect that the "old" pgt_buf_start through pgt_buf_end
      PFNs have been reserved (i.e. memblock_x86_reserve_range has been
      called), we immediately set the "old" pgt_buf_end through
      pgt_buf_top range back to RW.
      
      And then we update those "old" pgt_buf_[end|top] with the new ones
      so that we can redo this on the next pagetable.
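      
      A sketch of that bookkeeping (only the pgt_buf_* variables are
      real kernel symbols; the function and helper names are
      hypothetical):
      
        /* Saved window from the previous init_memory_mapping round. */
        static unsigned long old_start, old_end, old_top;
      
        static void xen_update_pgt_window(void)   /* hypothetical name */
        {
                /* Once the old start..end pfns are reserved (i.e.
                 * memblock_x86_reserve_range has run), the old end..top
                 * pages are unused pagetable scratch: make them RW again. */
                if (pfn_range_is_reserved(old_start, old_end))  /* hypothetical */
                        set_pfn_range_rw(old_end, old_top);     /* hypothetical */
      
                /* Remember the new window for the next pagetable round. */
                old_start = pgt_buf_start;
                old_end   = pgt_buf_end;
                old_top   = pgt_buf_top;
        }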
      Acked-by: N"H. Peter Anvin" <hpa@zytor.com>
      Reviewed-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      [v1: Updated with Jeremy's comments]
      [v2: Added the crash output]
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
  12. 02 May 2011, 2 commits
  13. 28 April 2011, 2 commits
  14. 27 April 2011, 2 commits
    • perf, x86, nmi: Move LVT un-masking into irq handlers · 2bce5dac
      Committed by Don Zickus
      It was noticed that P4 machines were generating double NMIs for
      each perf event.  These extra NMIs lead to 'Dazed and confused'
      messages on the screen.
      
      I tracked this down to a P4 quirk that said the overflow bit had
      to be cleared before re-enabling the apic LVT mask.  My first
      attempt was to move the un-masking inside the perf nmi handler
      from before the chipset NMI handler to after.
      
      This broke Nehalem boxes that seem to like the unmasking before
      the counters themselves are re-enabled.
      
      In order to keep this change simple for 2.6.39, I decided to
      just move the apic LVT un-masking to the beginning of all the
      chipset NMI handlers, with the exception of the Pentium 4's, to
      fix the double-NMI issue.
      
      Later on we can move the un-masking to later in the handlers to
      save a number of 'extra' NMIs on those particular chipsets.
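      
      A sketch of the reordering (apic_write(APIC_LVTPC, APIC_DM_NMI)
      is the standard un-mask idiom; the handler name and body are
      abbreviated for illustration):
      
        static int nhm_pmu_handle_irq(struct pt_regs *regs)  /* illustrative */
        {
                /* non-P4 chipsets: un-mask the LVT PMI entry right away */
                apic_write(APIC_LVTPC, APIC_DM_NMI);
      
                /* ... then process overflowed counters.  On P4 the
                 * un-mask must instead follow clearing of the overflow
                 * bit, or a second NMI is generated. */
                return 1;       /* handled */
        }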
      
      I tested this change on a P4 machine, an AMD machine, a Nehalem
      box, and a core2quad box.  'perf top' worked correctly along
      with various other small 'perf record' runs.  Anything high
      stress breaks all the machines but that is a different problem.
      
      Thanks to various people for testing different versions of this
      patch.
      Reported-and-tested-by: Shaun Ruffell <sruffell@digium.com>
      Signed-off-by: Don Zickus <dzickus@redhat.com>
      Cc: Cyrill Gorcunov <gorcunov@gmail.com>
      Link: http://lkml.kernel.org/r/1303900353-10242-1-git-send-email-dzickus@redhat.com
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf events, x86: Work around the Nehalem AAJ80 erratum · ec75a716
      Committed by Ingo Molnar
      On Nehalem CPUs the retired branch-misses event can be completely
      bogus when there are no branch misses occurring. When there are a
      lot of branch misses the count is pretty accurate. Still, this
      leaves us with an event that over-counts a lot.
      
      Detect this erratum and work it around by using BR_MISP_EXEC.ANY
      events. These will also count speculated branches, but in practice
      it's still a lot more precise than the architectural event.
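      
      A sketch of the remap (0x7f89 encodes event 0x89 with umask 0x7f,
      i.e. BR_MISP_EXEC.ANY; the erratum-detection condition is elided
      here):
      
        /* On an affected Nehalem, substitute BR_MISP_EXEC.ANY for the
         * architectural branch-misses event: */
        intel_perfmon_event_map[PERF_COUNT_HW_BRANCH_MISSES] = 0x7f89;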
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Link: http://lkml.kernel.org/n/tip-yyfg0bxo9jsqxd6a0ovfny27@git.kernel.org
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  15. 26 April 2011, 2 commits
  16. 25 April 2011, 1 commit
  17. 22 April 2011, 4 commits
    • perf, x86: Update/fix Intel Nehalem cache events · f4929bd3
      Committed by Peter Zijlstra
      Change the Nehalem cache events to use retired memory instruction
      counters (similar to Westmere); this greatly improves the provided
      stats.
      
      Using:
      
      int main(void)
      {
              int i;
      
              for (i = 0; i < 1000000000; i++) {
                      asm("mov (%%rsp), %%rbx;"
                          "mov %%rbx, (%%rsp);" : : : "rbx");
              }
              return 0;
      }
      
      We find:
      
       $ perf stat --repeat 10 -e instructions:u -e l1-dcache-loads:u -e l1-dcache-stores:u ./loop_1b_loads+stores
        Performance counter stats for './loop_1b_loads+stores' (10 runs):
            4,000,081,056 instructions:u           #      0.000 IPC ( +-   0.000% )
            4,999,502,846 l1-dcache-loads:u          ( +-   0.008% )
            1,000,034,832 l1-dcache-stores:u         ( +-   0.000% )
               1.565184942  seconds time elapsed   ( +-   0.005% )
      
      The 5b is surprising - we'd expect 1b:
      
       $ perf stat --repeat 10 -e instructions:u -e r10b:u -e l1-dcache-stores:u ./loop_1b_loads+stores
        Performance counter stats for './loop_1b_loads+stores' (10 runs):
            4,000,081,054 instructions:u           #      0.000 IPC ( +-   0.000% )
            1,000,021,961 r10b:u                     ( +-   0.000% )
            1,000,030,951 l1-dcache-stores:u         ( +-   0.000% )
               1.565055422  seconds time elapsed   ( +-   0.003% )
      
      Which this patch thus fixes.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Lin Ming <ming.m.lin@intel.com>
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Link: http://lkml.kernel.org/n/tip-q9rtru7b7840tws75xzboapv@git.kernel.org
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf, x86: P4 PMU - Don't forget to clear cpuc->active_mask on overflow · 1ea5a6af
      Committed by Cyrill Gorcunov
      It's not enough to simply disable the event on overflow: the
      cpuc->active_mask bit should be cleared as well, otherwise the
      counter may stall as "active" while in reality being already
      disabled (which may potentially prevent the user from using this
      counter further).
      
      Don pointed out that:
      
       " I also noticed this patch fixed some unknown NMIs
         on a P4 when I stressed the box".
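      
      A sketch of the overflow path after the fix (loop abbreviated and
      'overflowed' is a placeholder; x86_pmu_stop() is the real helper
      that both disables the event and clears its active_mask bit):
      
        if (overflowed) {
                handled++;
                /* was: p4_pmu_disable_event(event), which left the bit
                 * in cpuc->active_mask set; x86_pmu_stop() clears it. */
                x86_pmu_stop(event, 0);
        }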
      Tested-by: Lin Ming <ming.m.lin@intel.com>
      Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
      Acked-by: Don Zickus <dzickus@redhat.com>
      Signed-off-by: Don Zickus <dzickus@redhat.com>
      Cc: Cyrill Gorcunov <gorcunov@gmail.com>
      Link: http://lkml.kernel.org/r/1303398203-2918-3-git-send-email-dzickus@redhat.com
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86, perf event: Turn off unstructured raw event access to offcore registers · b52c55c6
      Committed by Ingo Molnar
      Andi Kleen pointed out that the Intel offcore support patches were
      merged without user-space tool support for the functionality:
      
       |
       | The offcore_msr perf kernel code was merged into 2.6.39-rc*, but the
       | user space bits were not. This made it impossible to set the extra mask
       | and actually do the OFFCORE profiling
       |
      
      Andi submitted a preliminary patch for user-space support, as an
      extension to perf's raw event syntax:
      
       |
       | Some raw events -- like the Intel OFFCORE events -- support additional
       | parameters. These can be appended after a ':'.
       |
       | For example on a multi socket Intel Nehalem:
       |
       |    perf stat -e r1b7:20ff -a sleep 1
       |
       | Profile the OFFCORE_RESPONSE.ANY_REQUEST with event mask REMOTE_DRAM_0
       | that measures any access to DRAM on another socket.
       |
      
      But this kind of usability is absolutely unacceptable - users should not
      be expected to type in magic, CPU and model specific incantations to get
      access to useful hardware functionality.
      
      The proper solution is to expose useful offcore functionality via
      generalized events - that way users do not have to care which specific
      CPU model they are using, they can use the conceptual event and not some
      model specific quirky hexa number.
      
      We already have such generalization in place for CPU cache events,
      and it's all very extensible.
      
      "Offcore" events measure general DRAM access patters along various
      parameters. They are particularly useful in NUMA systems.
      
      We want to support them via generalized DRAM events: either as the
      fourth level of cache (after the last-level cache), or as a separate
      generalization category.
      
      That way user-space support would be very obvious, memory access
      profiling could be done via self-explanatory commands like:
      
        perf record -e dram ./myapp
        perf record -e dram-remote ./myapp
      
      ... to measure DRAM accesses or more expensive cross-node NUMA DRAM
      accesses.
      
      These generalized events would work on all CPUs and architectures that
      have comparable PMU features.
      
      ( Note, these are just examples: the actual implementation could
        have more sophistication and more parameters - as long as they
        center around similarly simple use cases. )
      
      Now we do not want to revert *all* of the current offcore bits, as they
      are still somewhat useful for generic last-level-cache events, implemented
      in this commit:
      
        e994d7d2: perf: Fix LLC-* events on Intel Nehalem/Westmere
      
      But we definitely do not yet want to expose the unstructured raw events
      to user-space, until better generalization and usability is implemented
      for these hardware event features.
      
      ( Note: after generalization has been implemented raw offcore events can be
        supported as well: there can always be an odd event that is marginally
        useful but not useful enough to generalize. DRAM profiling is definitely
        *not* such a category so generalization must be done first. )
      
      Furthermore, PERF_TYPE_RAW access to these registers was not intended
      to go upstream without proper support - it was a side-effect of the above
      e994d7d2 commit, not mentioned in the changelog.
      
      As v2.6.39 is nearing release we go for the simplest approach: disable
      the PERF_TYPE_RAW offcore hack for now, before it escapes into a released
      kernel and becomes an ABI.
      
      Once proper structure is implemented for these hardware events and users
      are offered usable solutions we can revisit this issue.
      Reported-by: Andi Kleen <ak@linux.intel.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/1302658203-4239-1-git-send-email-andi@firstfloor.org
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf: Support Xeon E7's via the Westmere PMU driver · b2508e82
      Committed by Andi Kleen
      There's a newly public model number, 47, for Xeon E7 (aka
      Westmere EX).
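      
      The change boils down to one more case label in the CPU-model
      switch of the Intel PMU init code (the setup call shown is a
      hypothetical stand-in for the existing Westmere code path):
      
        case 37: /* 32 nm Nehalem, "Clarkdale" */
        case 44: /* 32 nm Nehalem, "Gulftown" */
        case 47: /* 32 nm Xeon E7, "Westmere EX" (the addition) */
                setup_westmere_pmu();   /* hypothetical stand-in */
                break;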
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Cc: a.p.zijlstra@chello.nl
      Link: http://lkml.kernel.org/r/1303429715-10202-1-git-send-email-andi@firstfloor.org
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  18. 21 April 2011, 1 commit