1. 01 Mar, 2010 16 commits
  2. 27 Feb, 2010 2 commits
    • x86: Enable NMI on all cpus on UV · 78c06176
      Russ Anderson committed
      Enable NMI on all cpus in UV system and add an NMI handler
      to dump_stack on each cpu.
      
      By default on x86 all the cpus except the boot cpu have NMI
      masked off.  This patch enables NMI on all cpus in UV system
      and adds an NMI handler to dump_stack on each cpu.  This
      way if a system hangs we can NMI the machine and get a
      backtrace from all the cpus.
      
      Version 2: Use x86_platform driver mechanism for nmi init, per
                 Ingo's suggestion.
      
      Version 3: Clean up Ingo's nits.
      Signed-off-by: Russ Anderson <rja@sgi.com>
      LKML-Reference: <20100226164912.GA24439@sgi.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf_event, amd: Fix spinlock initialization · 1dd2980d
      Peter Zijlstra committed
      Prevent kernels from exploding on AMD machines when they have any
      lock debugging bits enabled.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  3. 26 Feb, 2010 17 commits
    • perf_events, x86: Split PMU definitions into separate files · f22f54f4
      Peter Zijlstra committed
      Split amd, p6 and intel into separate files so that we can easily deal
      with CONFIG_CPU_SUP_* things. This is needed to make things build now
      that perf_event.c relies on symbols from amd.c.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • oprofile/x86: fix msr access to reserved counters · cfc9c0b4
      Robert Richter committed
      While switching virtual counters, the perfctr MSRs are accessed. If
      a counter is not available, the access fails because of an invalid
      address. This patch fixes this.

      Cc: stable@kernel.org
      Signed-off-by: Robert Richter <robert.richter@amd.com>
    • oprofile/x86: use kzalloc() instead of kmalloc() · c17c8fbf
      Robert Richter committed
      Cc: stable@kernel.org
      Signed-off-by: Robert Richter <robert.richter@amd.com>
    • oprofile/x86: fix perfctr nmi reservation for mulitplexing · 68dc819c
      Robert Richter committed
      Multiple virtual counters share one physical counter. The reservation
      of virtual counters fails due to duplicate allocation of the same
      counter. The counters are already reserved, so virtual counter
      reservation can be removed altogether. This also simplifies the code.
      
      Cc: stable@kernel.org
      Signed-off-by: Robert Richter <robert.richter@amd.com>
    • oprofile/x86: add comment to counter-in-use warning · 8588d106
      Naga Chumbalkar committed
      Currently, oprofile fails silently on platforms where a non-OS entity
      such as the system firmware "enables" and uses a performance
      counter. There is a warning in the code for this case.
      
      The warning indicates an already running counter. If oprofile doesn't
      collect data, then try using a different performance counter on your
      platform to monitor the desired event. Delete the counter from the
      desired event by editing the
      
       /usr/share/oprofile/<cpu_type>/<cpu>/events
      
      file. If the event cannot be monitored by any other counter, contact
      your hardware or BIOS vendor.
      
      Cc: Shashi Belur <shashi-kiran.belur@hp.com>
      Cc: Tony Jones <tonyj@suse.de>
      Signed-off-by: Naga Chumbalkar <nagananda.chumbalkar@hp.com>
      Signed-off-by: Robert Richter <robert.richter@amd.com>
    • oprofile/x86: warn user if a counter is already active · 98a2e73a
      Robert Richter committed
      This patch generates a warning if a counter is already active.
      
      Implemented for AMD and P6 models. P4 is not supported.
      
      Cc: Naga Chumbalkar <nagananda.chumbalkar@hp.com>
      Cc: Shashi Belur <shashi-kiran.belur@hp.com>
      Cc: Tony Jones <tonyj@suse.de>
      Signed-off-by: Robert Richter <robert.richter@amd.com>
    • oprofile/x86: implement randomization for IBS periodic op counter · ba52078e
      Robert Richter committed
      IBS selects an op (execution operation) for sampling by counting
      either cycles or dispatched ops. Better statistical samples can be
      produced by adding a software generated random offset to the periodic
      op counter value with each sample.
      
      This patch adds software randomization to the IBS periodic op
      counter. The lower 12 bits of the 20 bit counter are
      randomized. IbsOpCurCnt is initialized with a 12 bit random value.
      
      There is a workaround if the hardware cannot write to IbsOpCurCnt. In
      that case the lower 8 bits of the 16 bit IbsOpMaxCnt [15:0] value are
      randomized in the range of -128 to +127 by adding/subtracting an offset
      to the maximum count (IbsOpMaxCnt).

      The linear feedback shift register (LFSR) algorithm is used for
      pseudo-random number generation to keep the impact on the memory
      system low.
      Signed-off-by: Robert Richter <robert.richter@amd.com>
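
      A minimal sketch of the two randomization modes described in this
      commit message, not the patch's actual code. The helper names, the
      centering of the 8-bit value by subtracting 128, and the clamping to
      the 16-bit range are assumptions made for illustration; rnd stands for
      a 16-bit pseudo-random value from any generator.

```c
#include <stdint.h>

/* Normal mode: derive a 12-bit start value for the current count
 * (the lower 12 bits of the 20-bit counter) from a random value. */
uint32_t ibs_cur_cnt_seed(uint16_t rnd)
{
    return rnd & 0xFFFu;
}

/* Workaround mode: shift the 16-bit maximum count by a signed offset
 * in the range -128..+127 taken from the low 8 bits of rnd.
 * Clamping to 0..0xFFFF is an assumption of this sketch. */
uint16_t ibs_max_cnt_jitter(uint16_t max_cnt, uint16_t rnd)
{
    int offset = (int)(rnd & 0xFFu) - 128;
    int value = (int)max_cnt + offset;

    if (value < 0)
        value = 0;
    if (value > 0xFFFF)
        value = 0xFFFF;
    return (uint16_t)value;
}
```

      In this sketch, a random byte of 0x80 leaves the maximum count
      unchanged, 0x00 lowers it by 128 and 0xFF raises it by 127.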
    • oprofile/x86: implement lsfr pseudo-random number generator for IBS · f125be14
      Suravee Suthikulpanit committed
      This patch implements a linear feedback shift register (LFSR) for
      pseudo-random number generation for IBS.
      
      For IBS measurements it would be good to minimize memory traffic in
      the interrupt handler since every access pollutes the data
      caches. Computing a maximal period LFSR just needs shifts and ORs.
      
      The LFSR method is good enough to randomize the ops at low
      overhead. 16 pseudo-random bits are enough for the implementation and
      it doesn't matter that the pattern repeats with a fairly short
      cycle. It only needs to break up (hard) periodic sampling behavior.
      
      The logic was designed by Paul Drongowski.
      Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
      Signed-off-by: Robert Richter <robert.richter@amd.com>
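
      A minimal sketch of a maximal-period 16-bit LFSR in C, not the code
      from this patch. The tap positions (bits 0, 2, 3 and 5 of a
      right-shifting register, i.e. the polynomial x^16 + x^14 + x^13 +
      x^11 + 1) are a commonly used maximal-length choice and an assumption
      here, since the commit message does not say which taps the patch uses.
      The function names are hypothetical.

```c
#include <stdint.h>

/* Advance a 16-bit Fibonacci LFSR by one step.
 * The feedback bit is the XOR of bits 0, 2, 3 and 5; it is shifted
 * in at the top while the register shifts right by one. */
uint16_t lfsr16_next(uint16_t state)
{
    uint16_t bit = ((state >> 0) ^ (state >> 2) ^
                    (state >> 3) ^ (state >> 5)) & 1u;

    return (uint16_t)((state >> 1) | (bit << 15));
}

/* Count how many steps it takes for the register to return to seed. */
uint32_t lfsr16_period(uint16_t seed)
{
    uint16_t state = seed;
    uint32_t steps = 0;

    do {
        state = lfsr16_next(state);
        steps++;
    } while (state != seed);

    return steps;
}
```

      With these taps, any nonzero seed cycles through all 65535 nonzero
      states. The all-zero state maps to itself and must be avoided as a
      seed.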
    • oprofile/x86: implement IBS cpuid feature detection · 64683da6
      Robert Richter committed
      This patch adds IBS feature detection using cpuid flags. An IBS
      capability mask is introduced to test for certain IBS features. The
      bit mask is the same as for IBS cpuid feature flags (Fn8000_001B_EAX),
      but bit 0 is used to indicate the existence of IBS.
      
      The patch also changes the handling of the IbsOpCntCtl bit (periodic
      op counter count control). The oprofilefs file for this feature
      (ibs_op/dispatched_ops) will only be exposed if the feature is
      available. The default for the bit is set to count clock cycles.

      In general, userland can detect the availability of a feature by
      checking for the corresponding file in oprofilefs. If the file exists,
      the feature also exists. This may lead to a dynamic file layout,
      depending on the cpu type, that userland has to deal with. Current
      opcontrol is compatible.
      Signed-off-by: Robert Richter <robert.richter@amd.com>
    • oprofile/x86: remove node check in AMD IBS initialization · 89baaaa9
      Robert Richter committed
      Standard AMD systems have the same number of nodes as there are
      northbridge devices. However, there may be kernel configurations
      (especially for 32 bit) or system setups where the node number
      is different or cannot be detected properly. Thus the check is not
      reliable and may fail even though IBS setup was fine. For this reason
      it is better to remove the check.
      
      Cc: stable <stable@kernel.org>
      Signed-off-by: Robert Richter <robert.richter@amd.com>
    • oprofile/x86: remove OPROFILE_IBS config option · 013cfc50
      Robert Richter committed
      OProfile support for IBS has been in the kernel for several
      versions. The feature is stable now and the code can be activated
      permanently.

      As a side effect, IBS now also works on nosmp configs.
      Signed-off-by: Robert Richter <robert.richter@amd.com>
    • perf_events, x86: Remove superflous MSR writes · 6667661d
      Peter Zijlstra committed
      We re-program the event control register every time we reset the count.
      This appears to be superfluous, hence remove it.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf_events: Simplify code by removing cpu argument to hw_perf_group_sched_in() · 6e37738a
      Peter Zijlstra committed
      Since the cpu argument to hw_perf_group_sched_in() is always
      smp_processor_id(), simplify the code a little by removing this argument
      and using the current cpu where needed.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: David Miller <davem@davemloft.net>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <1265890918.5396.3.camel@laptop>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf_events, x86: AMD event scheduling · 38331f62
      Stephane Eranian committed
      This patch adds correct AMD NorthBridge event scheduling.
      
      NB events measure L3 cache and HyperTransport traffic. They are
      identified by an event code >= 0xe0. They measure events on the
      Northbridge, which is shared by all cores on a package. NB events are
      counted on a shared set of counters. When an NB event is programmed in a
      counter, the data actually comes from a shared counter. Thus, access to
      those counters needs to be synchronized.
      
      We implement the synchronization such that no two cores can be measuring
      NB events using the same counters. Thus, we maintain a per-NB allocation
      table. The available slot is propagated using the event_constraint
      structure.
      Signed-off-by: Stephane Eranian <eranian@google.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <4b703957.0702d00a.6bf2.7b7d@mx.google.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
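
      A rough userspace sketch of a per-northbridge allocation table, using
      C11 atomics in place of kernel primitives. It is not the patch's
      implementation: the table layout, the counter count of four, and all
      names are assumptions, and the real code propagates the chosen slot
      through the event_constraint structure, which is not modeled here.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define NB_NUM_COUNTERS 4      /* assumption: counters per northbridge */
#define NB_NO_OWNER     (-1)

/* One table per northbridge: which core currently owns each counter. */
struct nb_table {
    atomic_int owner[NB_NUM_COUNTERS];
};

/* Per the commit message, NB events have an event code >= 0xe0. */
bool is_nb_event(unsigned int event_code)
{
    return event_code >= 0xe0;
}

void nb_table_init(struct nb_table *nb)
{
    for (int i = 0; i < NB_NUM_COUNTERS; i++)
        atomic_init(&nb->owner[i], NB_NO_OWNER);
}

/* Atomically claim the first free counter for a core.
 * Returns the counter index, or -1 if every counter is taken. */
int nb_claim_counter(struct nb_table *nb, int core)
{
    for (int i = 0; i < NB_NUM_COUNTERS; i++) {
        int expected = NB_NO_OWNER;

        if (atomic_compare_exchange_strong(&nb->owner[i], &expected, core))
            return i;
    }
    return -1;
}

/* Release a counter, but only if this core actually owns it. */
void nb_release_counter(struct nb_table *nb, int idx, int core)
{
    int expected = core;

    atomic_compare_exchange_strong(&nb->owner[idx], &expected, NB_NO_OWNER);
}
```

      The compare-and-exchange ensures that two cores racing for the same
      slot cannot both win, which is the property the commit message asks
      for.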
    • perf_events: Add new start/stop PMU callbacks · d76a0812
      Stephane Eranian committed
      In certain situations, the kernel may need to stop and start the same
      event rapidly. The current PMU callbacks do not distinguish between stop
      and release (i.e., stop + free the resource). Thus, a counter may be
      released and then immediately re-acquired. Event scheduling will
      again take place with no guarantee of assigning the same counter. On some
      processors, this may even lead to a failure to assign the event back due
      to competition between cores.

      This patch adds a new pair of callbacks to stop and restart a counter
      without actually releasing the underlying counter resource. On stop, the
      counter is stopped and its value saved, and that's it. On start, the value
      is reloaded and the counter is restarted (on x86, actual restart is delayed
      until perf_enable()).
      Signed-off-by: Stephane Eranian <eranian@google.com>
      [ added fallback to ->enable/->disable for all other PMUs
        fixed x86_pmu_start() to call x86_pmu.enable()
        merged __x86_pmu_disable into x86_pmu_stop() ]
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <4b703875.0a04d00a.7896.ffffb824@mx.google.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
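
      A toy model of the difference between stop/start and disable/enable,
      including the fallback mentioned in the bracketed note. The struct
      layout, the toy_* functions and the wrapper names are hypothetical and
      do not reflect the kernel's struct pmu. The toy allocator hands out a
      fresh counter index on every enable to mimic the lack of a guarantee
      that the same counter is reassigned.

```c
#include <stddef.h>

struct event {
    int counter;   /* assigned counter index, -1 if none */
    int running;   /* nonzero while the event is counting */
};

struct pmu_ops {
    int  (*enable)(struct event *ev);   /* acquire a counter and start */
    void (*disable)(struct event *ev);  /* stop and release the counter */
    int  (*start)(struct event *ev);    /* optional: restart, keep counter */
    void (*stop)(struct event *ev);     /* optional: stop, keep counter */
};

/* Use ->start/->stop when present, else fall back to ->enable/->disable. */
int event_start(const struct pmu_ops *ops, struct event *ev)
{
    return ops->start ? ops->start(ev) : ops->enable(ev);
}

void event_stop(const struct pmu_ops *ops, struct event *ev)
{
    if (ops->stop)
        ops->stop(ev);
    else
        ops->disable(ev);
}

static int next_counter;  /* toy allocator state */

int toy_enable(struct event *ev)
{
    ev->counter = next_counter++;
    ev->running = 1;
    return 0;
}

void toy_disable(struct event *ev)
{
    ev->counter = -1;
    ev->running = 0;
}

int toy_start(struct event *ev)
{
    if (ev->counter < 0)
        return -1;      /* nothing to restart */
    ev->running = 1;
    return 0;
}

void toy_stop(struct event *ev)
{
    ev->running = 0;    /* counter assignment is kept */
}
```

      With both callbacks set, a stop followed by a start leaves the counter
      assignment untouched. With only enable and disable, the same sequence
      releases the counter and may come back with a different one.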
    • x86, mm: Unify kernel_physical_mapping_init() API · c1fd1b43
      Pekka Enberg committed
      This patch changes the 32-bit version of kernel_physical_mapping_init() to
      return the last mapped address like the 64-bit one so that we can unify the
      call-site in init_memory_mapping().
      
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
      LKML-Reference: <alpine.DEB.2.00.1002241703570.1180@melkki.cs.helsinki.fi>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
    • x86/PCI: Prevent mmconfig memory corruption · bb8d4133
      Thomas Gleixner committed
      commit ff097ddd (x86/PCI: MMCONFIG: manage pci_mmcfg_region as a
      list, not a table) introduced a nasty memory corruption when
      pci_mmcfg_list is empty.
      
      pci_mmcfg_check_end_bus_number() dereferences pci_mmcfg_list.prev even
      when the list is empty. The following write hits some variable near to
      pci_mmcfg_list.
      
      Further down a similar problem exists, where cfg->list.next is
      dereferenced unconditionally and a comparison with some variable near
      to pci_mmcfg_list happens.
      
      Add a check for the last element into the for_each_entry() loop and
      remove all the other crappy logic which is just a leftover of the old
      array based code which was replaced by the list conversion.
      Reported-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Bjorn Helgaas <bjorn.helgaas@hp.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: stable@kernel.org
      Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
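
      A self-contained sketch of the hazard and of the loop shape the commit
      message describes, using a minimal circular list in place of the
      kernel's list.h. The struct and function names are hypothetical. On an
      empty circular list, head->prev and head->next point back at the head
      itself, so converting them to an entry yields a pointer into unrelated
      memory near the head; the loop below never touches the head as if it
      were an entry.

```c
#include <stdbool.h>
#include <stddef.h>

struct list_head {
    struct list_head *next, *prev;
};

struct region {
    int start_bus;
    int end_bus;
    struct list_head list;
};

/* Recover the enclosing region from its embedded list node. */
#define region_of(ptr) \
    ((struct region *)((char *)(ptr) - offsetof(struct region, list)))

void list_init(struct list_head *head)
{
    head->next = head->prev = head;
}

bool list_is_empty(const struct list_head *head)
{
    return head->next == head;
}

void list_append(struct list_head *node, struct list_head *head)
{
    node->prev = head->prev;
    node->next = head;
    head->prev->next = node;
    head->prev = node;
}

/* Repair end_bus values. The check for the last element sits inside
 * the loop, so an empty list means the body never runs at all. */
void fix_end_bus(struct list_head *head)
{
    for (struct list_head *pos = head->next; pos != head; pos = pos->next) {
        struct region *cfg = region_of(pos);

        if (cfg->end_bus < cfg->start_bus)
            cfg->end_bus = 255;

        if (pos->next == head)      /* last entry: do not touch the head */
            break;

        struct region *next = region_of(pos->next);
        if (cfg->end_bus >= next->start_bus)
            cfg->end_bus = next->start_bus - 1;
    }
}
```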
  4. 25 Feb, 2010 3 commits
    • ftrace: Remove memory barriers from NMI code when not needed · 0c54dd34
      Steven Rostedt committed
      The code in stop_machine that modifies the kernel text has a bit
      of logic to handle the case of NMIs. stop_machine does not prevent
      NMIs from executing, and if an NMI were to trigger on another CPU
      as the modifying CPU is changing the NMI text, a GPF could result.
      
      To prevent the GPF, the NMI calls ftrace_nmi_enter() which may
      modify the code first, then any other NMIs will just change the
      text to the same content which will do no harm. The code that
      stop_machine called must wait for NMIs to finish while it changes
      each location in the kernel. That code may also change the text
      to what the NMI changed it to. The key is that the text will never
      change content while another CPU is executing it.
      
      To make the above work, the call to ftrace_nmi_enter() must also
      do a smp_mb() as well as atomic_inc().  But for applications like
      perf that require a high number of NMIs for profiling, this can have
      a dramatic effect on the system. Not only is it doing a full memory
      barrier on both nmi_enter() as well as nmi_exit() it is also
      modifying a global variable with an atomic operation. This kills
      performance on large SMP machines.
      
      Since the memory barriers are only needed when ftrace is in the
      process of modifying the text (which is seldom), this patch
      adds a "modifying_code" variable that gets set before stop machine
      is executed and cleared afterwards.
      
      The NMIs will check this variable and store it in a per CPU
      "save_modifying_code" variable that it will use to check if it
      needs to do the memory barriers and atomic dec on NMI exit.
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
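
      A userspace sketch of the flag-gated fast path described in this
      commit message, with C11 atomics standing in for smp_mb() and
      atomic_inc(). It is not the kernel code: the function names are
      hypothetical, the actual text patching is omitted, and a single plain
      variable stands in for the per-CPU "save_modifying_code".

```c
#include <stdatomic.h>
#include <stdbool.h>

static atomic_int  nmi_running;          /* NMIs currently in the slow path */
static atomic_bool modifying_code;       /* set only while text is patched */
static bool        save_modifying_code;  /* per-CPU in the kernel */

void begin_code_modification(void)
{
    atomic_store(&modifying_code, true);
}

void end_code_modification(void)
{
    atomic_store(&modifying_code, false);
}

void nmi_enter_hook(void)
{
    /* Remember what we saw so that exit undoes exactly what enter did. */
    save_modifying_code = atomic_load(&modifying_code);
    if (!save_modifying_code)
        return;                          /* fast path: no barrier, no atomic */

    atomic_fetch_add(&nmi_running, 1);
    atomic_thread_fence(memory_order_seq_cst);
}

void nmi_exit_hook(void)
{
    if (!save_modifying_code)
        return;                          /* fast path */

    atomic_thread_fence(memory_order_seq_cst);
    atomic_fetch_sub(&nmi_running, 1);
}

int nmis_in_flight(void)
{
    return atomic_load(&nmi_running);
}
```

      Saving the flag at entry matters when modification ends while an NMI
      is still running: the exit hook must still decrement the count it
      incremented, which re-reading the global flag would skip.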
    • x86, mm: Allow highmem user page tables to be disabled at boot time · 14315592
      Ian Campbell committed
      Distros generally (I looked at Debian, RHEL5 and SLES11) seem to
      enable CONFIG_HIGHPTE for any x86 configuration which has highmem
      enabled. This means that the overhead applies even to machines which
      have a fairly modest amount of high memory and which therefore do not
      really benefit from allocating PTEs in high memory but still pay the
      price of the additional mapping operations.
      
      Running kernbench on a 4G box I found that with CONFIG_HIGHPTE=y but
      no actual highptes being allocated there was a reduction in system
      time used from 59.737s to 55.9s.
      
      With CONFIG_HIGHPTE=y and highmem PTEs being allocated:
        Average Optimal load -j 4 Run (std deviation):
        Elapsed Time 175.396 (0.238914)
        User Time 515.983 (5.85019)
        System Time 59.737 (1.26727)
        Percent CPU 263.8 (71.6796)
        Context Switches 39989.7 (4672.64)
        Sleeps 42617.7 (246.307)
      
      With CONFIG_HIGHPTE=y but with no highmem PTEs being allocated:
        Average Optimal load -j 4 Run (std deviation):
        Elapsed Time 174.278 (0.831968)
        User Time 515.659 (6.07012)
        System Time 55.9 (1.07799)
        Percent CPU 263.8 (71.266)
        Context Switches 39929.6 (4485.13)
        Sleeps 42583.7 (373.039)
      
      This patch allows the user to control the allocation of PTEs in
      highmem from the command line ("userpte=nohigh") but retains the
      status-quo as the default.
      
      It is possible that some simple heuristic could be developed which
      allows auto-tuning of this option however I don't have a sufficiently
      large machine available to me to perform any particularly meaningful
      experiments. We could probably handwave up an argument for a threshold
      at 16G of total RAM.
      
      Assuming 768M of lowmem we have 196608 potential lowmem PTE
      pages. Each page can map 2M of RAM in a PAE-enabled configuration,
      meaning a maximum of 384G of RAM could potentially be mapped using
      lowmem PTEs.
      
      Even allowing a generous factor of 10 to account for other required
      lowmem allocations, generous slop to account for page sharing (which
      reduces the total amount of RAM mappable by a given number of PT
      pages) and other inaccuracies in the estimations, it would seem that
      even a 32G machine would not have a particularly pressing need for
      highmem PTEs. I think 32G could be considered to be at the upper bound
      of what might be sensible on a 32 bit machine (although I think in
      practice 64G is still supported).

      It seems questionable if HIGHPTE is even a win for any amount of RAM
      you would sensibly run a 32 bit kernel on rather than going 64 bit.
      Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
      LKML-Reference: <1266403090-20162-1-git-send-email-ian.campbell@citrix.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
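
      The lowmem estimate in this commit message (768M of lowmem, 196608 PTE
      pages, 2M mapped per page, 384G in total) can be checked with a few
      lines of C. The sketch assumes 4 KiB pages and 512 eight-byte entries
      per PTE page under PAE; the function names are made up for the
      example.

```c
#include <stdint.h>

#define PAGE_BYTES        4096ULL  /* assumption: 4 KiB pages */
#define PAE_PTES_PER_PAGE 512ULL   /* 4096 bytes / 8-byte PAE entries */

/* Upper bound on PTE pages if all of lowmem held nothing else. */
uint64_t lowmem_pte_pages(uint64_t lowmem_bytes)
{
    return lowmem_bytes / PAGE_BYTES;
}

/* RAM mapped by a single PTE page. */
uint64_t bytes_mapped_per_pte_page(void)
{
    return PAE_PTES_PER_PAGE * PAGE_BYTES;
}

/* Upper bound on RAM mappable using lowmem PTE pages only. */
uint64_t max_mappable_bytes(uint64_t lowmem_bytes)
{
    return lowmem_pte_pages(lowmem_bytes) * bytes_mapped_per_pte_page();
}
```

      For 768 MiB of lowmem this gives 196608 PTE pages, 2 MiB per page and
      384 GiB in total, matching the figures quoted in the message.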
    • x86: Do not reserve brk for DMI if it's not going to be used · e808bae2
      Thadeu Lima de Souza Cascardo committed
      This will save 64K bytes of memory when loading Linux if DMI is
      disabled, which is good for embedded systems.
      Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@holoscopio.com>
      LKML-Reference: <1265758732-19320-1-git-send-email-cascardo@holoscopio.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
  5. 24 Feb, 2010 2 commits