1. 09 Jun, 2010 1 commit
    • F
      x86: Unify dumpstack.h and stacktrace.h · c9cf4dbb
      Committed by Frederic Weisbecker
      arch/x86/include/asm/stacktrace.h and arch/x86/kernel/dumpstack.h
      declare headers of objects that deal with the same topic.
      Actually most of the files that include stacktrace.h also include
      dumpstack.h
      
      Although dumpstack.h seems more reserved for internals of stack
      traces, those are quite often needed to define specialized stack
      trace operations. And perf event arch headers are going to need
      access to such low level operations anyway. So stop bothering
      with a separate dumpstack.h, as it is no longer about isolated
      deep internals.
      
      v2: fix struct stack_frame definition conflict in sysprof
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Soeren Sandmann <sandmann@daimi.au.dk>
      c9cf4dbb
  2. 28 May, 2010 3 commits
  3. 26 May, 2010 1 commit
    • K
      driver core: add devname module aliases to allow module on-demand auto-loading · 578454ff
      Committed by Kay Sievers
      This adds:
        alias: devname:<name>
      to some common kernel modules, which will allow the on-demand loading
      of the kernel module when the device node is accessed.
      
      Ideally all these modules would be compiled-in, but distros seem so
      much in love with their modularization that we need to cover the common
      cases with this new facility. It will allow us to remove a bunch of pretty
      useless init scripts and modprobe calls from init scripts.
      
      The static device node aliases will be carried in the module itself. The
      program depmod will extract this information to a file in the module directory:
        $ cat /lib/modules/2.6.34-00650-g537b60d1-dirty/modules.devname
        # Device nodes to trigger on-demand module loading.
        microcode cpu/microcode c10:184
        fuse fuse c10:229
        ppp_generic ppp c108:0
        tun net/tun c10:200
        dm_mod mapper/control c10:235
      
      Udev will pick up the depmod created file on startup and create all the
      static device nodes which the kernel modules specify, so that these modules
      get automatically loaded when the device node is accessed:
        $ /sbin/udevd --debug
        ...
        static_dev_create_from_modules: mknod '/dev/cpu/microcode' c10:184
        static_dev_create_from_modules: mknod '/dev/fuse' c10:229
        static_dev_create_from_modules: mknod '/dev/ppp' c108:0
        static_dev_create_from_modules: mknod '/dev/net/tun' c10:200
        static_dev_create_from_modules: mknod '/dev/mapper/control' c10:235
        udev_rules_apply_static_dev_perms: chmod '/dev/net/tun' 0666
        udev_rules_apply_static_dev_perms: chmod '/dev/fuse' 0666
      
      A few device nodes are switched to statically allocated numbers, to allow
      the static nodes to work. This might also be useful for systems which still
      run a plain static /dev, which is completely unsafe to use with any dynamic
      minor numbers.
      
      Note:
      The devname aliases must be limited to the *common* and *single-instance*
      device nodes, like the misc devices, and never be used for conceptually limited
      systems like the loop devices, which should rather get fixed properly and get a
      control node for losetup to talk to, instead of creating a random number of
      device nodes in advance, regardless of whether they are ever used.
      
      This facility is to hide the mess distros are creating with overly modularized
      kernels, and just to hide that these modules are not compiled-in, and not to
      paper over broken concepts. Thanks! :)
      
      Cc: Greg Kroah-Hartman <gregkh@suse.de>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Miklos Szeredi <miklos@szeredi.hu>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Alasdair G Kergon <agk@redhat.com>
      Cc: Tigran Aivazian <tigran@aivazian.fsnet.co.uk>
      Cc: Ian Kent <raven@themaw.net>
      Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
      578454ff
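The modules.devname listing in the commit above has a simple line shape: module name, device node path, then a type letter with major:minor. As a rough illustration of how a consumer might read it, here is a minimal parsing sketch. The struct and function names are hypothetical, and accepting 'b' for block devices is an assumption; only the 'c' entries appear in the listing. This is not the udev or depmod code.

```c
#include <stdio.h>

/* One parsed entry of a modules.devname line such as "fuse fuse c10:229". */
struct devname_entry {
    char module[64];      /* kernel module to load on first access */
    char node[128];       /* device node path relative to /dev */
    char type;            /* 'c' = character device; 'b' is an assumption */
    unsigned int major;
    unsigned int minor;
};

/*
 * Parse one line. Returns 0 on success, -1 for comments, blank lines,
 * or anything that does not match "<module> <node> <type><major>:<minor>".
 */
int parse_devname_line(const char *line, struct devname_entry *out)
{
    if (line[0] == '#' || line[0] == '\n' || line[0] == '\0')
        return -1;

    int n = sscanf(line, "%63s %127s %c%u:%u",
                   out->module, out->node, &out->type,
                   &out->major, &out->minor);
    if (n != 5)
        return -1;
    if (out->type != 'c' && out->type != 'b')
        return -1;
    return 0;
}
```

Feeding it the line "tun net/tun c10:200" fills in module "tun", node "net/tun", type 'c', major 10 and minor 200, which is the information needed for the mknod calls shown in the udevd debug output.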
  4. 25 May, 2010 1 commit
    • P
      perf, trace: Fix !x86 build bug · 87f44bbc
      Committed by Peter Zijlstra
      Patch b7e2ecef (perf, trace: Optimize tracepoints by removing
      IRQ-disable from perf/tracepoint interaction) made the
      unfortunate mistake of assuming the world is x86 only, correct
      this.
      
      The problem was that perf_fetch_caller_regs() did
      local_save_flags() into regs->flags, and I re-used that to
      remove another local_save_flags(), forgetting !x86 doesn't have
      regs->flags.
      
      Do the reverse, remove the local_save_flags() from
      perf_fetch_caller_regs() and let the ftrace site do the
      local_save_flags() instead.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: Paul Mackerras <paulus@samba.org>
      Cc: acme@redhat.com
      Cc: efault@gmx.de
      Cc: fweisbec@gmail.com
      Cc: rostedt@goodmis.org
      LKML-Reference: <1274778175.5882.623.camel@twins>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      87f44bbc
  5. 21 May, 2010 8 commits
    • J
      earlyprintk,vga,kdb: Fix \b and \r for earlyprintk=vga with kdb · 61eaf539
      Committed by Jason Wessel
      Allow kdb to work properly with earlyprintk=vga by interpreting
      the backspace and carriage return output characters.  The
      interpretation of these characters is used for the simple line
      editing provided in the kdb shell.
      
      CC: Thomas Gleixner <tglx@linutronix.de>
      CC: Ingo Molnar <mingo@redhat.com>
      CC: H. Peter Anvin <hpa@zytor.com>
      CC: x86@kernel.org
      Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
      61eaf539
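To make the idea of "interpreting" backspace and carriage return concrete, the following is a small userspace sketch of a single-line text console that treats both characters as cursor movement rather than printable output. The names and the 80-column width are assumptions for illustration; this is not the kernel's early VGA console code.

```c
#include <string.h>

#define COLS 80  /* assumed line width for the illustration */

struct line_console {
    char cells[COLS];  /* characters currently shown on the line */
    int col;           /* current cursor column */
};

void console_init(struct line_console *c)
{
    memset(c->cells, ' ', COLS);
    c->col = 0;
}

/* Write one character, treating '\b' and '\r' as cursor movement. */
void console_putc(struct line_console *c, char ch)
{
    if (ch == '\b') {
        if (c->col > 0)
            c->col--;              /* step back one cell, keep its content */
    } else if (ch == '\r') {
        c->col = 0;                /* return to the start of the line */
    } else if (c->col < COLS) {
        c->cells[c->col++] = ch;   /* ordinary character: store and advance */
    }
}
```

With this in place, a shell that emits "helx", then '\b', then "p" ends up showing "help", which is the kind of simple line editing the commit message refers to.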
    • J
      x86,early dr regs,kgdb: Allow kernel debugger early dr register access · 0bb9fef9
      Committed by Jason Wessel
      If the kernel debugger was configured, attached and started with
      kgdbwait, the hardware breakpoint registers should get restored by the
      kgdb code which is managing the dr registers.
      
      CC: x86@kernel.org
      CC: Thomas Gleixner <tglx@linutronix.de>
      CC: Ingo Molnar <mingo@redhat.com>
      CC: H. Peter Anvin <hpa@zytor.com>
      Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
      0bb9fef9
    • J
      x86,kgdb: Implement early hardware breakpoint debugging · 031acd8c
      Committed by Jason Wessel
      It is not possible to use the hw_breakpoint.c API prior to mm_init(),
      but it is possible to use hardware breakpoints with the kernel
      debugger.
      
      Prior to smp_init() it is possible to simply write to the dr registers
      of the boot cpu directly.  This can be done up until
      kgdb_arch_late() is invoked, at which point the standard hw_breakpoint.c
      API will get used.
      
      CC: Frederic Weisbecker <fweisbec@gmail.com>
      CC: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
      031acd8c
    • J
      x86, kgdb, init: Add early and late debug states · 0b4b3827
      Committed by Jason Wessel
      The kernel debugger can operate well before mm_init(), but the x86
      hardware breakpoint code which uses the perf api requires that the
      kernel allocators are initialized.
      
      This means the kernel debug core needs to provide an optional arch
      specific call back to allow the initialization functions to run after
      the kernel has been further initialized.
      
      The kdb shell already had a similar restriction with an early
      initialization and late initialization.  The kdb_init() call was moved
      into the debug core's version of the late init, which is called
      dbg_late_init().
      
      CC: kgdb-bugreport@lists.sourceforge.net
      Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
      0b4b3827
    • J
      x86, kgdb: early trap init for early debug · 29c84391
      Committed by Jan Kiszka
      Allow the x86 arch to have early exception processing for the purpose
      of debugging via the kgdb.
      Signed-off-by: Jan Kiszka <jan.kiszka@web.de>
      Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
      29c84391
    • J
      x86,kgdb: Add low level debug hook · f503b5ae
      Committed by Jason Wessel
      The only way the debugger can handle a trap inside rcu_lock,
      notify_die, or atomic_notifier_call_chain without a triple fault is
      to have a low level "first opportunity handler" in the int3 exception
      handler.
      
      Generally this will be something the vast majority of folks will not
      need, but for those who need it, it is added as a kernel .config
      option called KGDB_LOW_LEVEL_TRAP.
      
      CC: Ingo Molnar <mingo@elte.hu>
      CC: Thomas Gleixner <tglx@linutronix.de>
      CC: H. Peter Anvin <hpa@zytor.com>
      CC: x86@kernel.org
      Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
      f503b5ae
    • J
      kgdb: remove post_primary_code references · 98ec1878
      Committed by Jason Wessel
      Remove all the references to the kgdb_post_primary_code.  This
      function serves no useful purpose because you can obtain the same
      information from the "struct kgdb_state *ks" from within the
      debugger, if for some reason you want the data.
      
      Also remove the unintentional duplicate assignment for ks->ex_vector.
      Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
      98ec1878
    • J
      kgdb: core changes to support kdb · dcc78711
      Committed by Jason Wessel
      These are the minimum changes to the kgdb core in order to enable an
      API to connect a new front end (kdb) to the debug core.
      
      This patch introduces the dbg_kdb_mode variable, which controls where
      the user level I/O is routed.  It will be routed to the gdbstub (kgdb)
      or to the kdb front end, which is a simple shell available over the
      kgdboc connection.
      
      You can switch back and forth between kdb or the gdb stub mode of
      operation dynamically.  From gdb stub mode you can blindly type
      "$3#33", or from the kdb mode you can enter "kgdb" to switch to the
      gdb stub.
      
      The logic in the debug core depends on kdb to look for the typical gdb
      connection sequences and return immediately with KGDB_PASS_EVENT if a
      gdb serial command sequence is detected.  That should allow a
      reasonably seamless transition between kdb -> gdb without leaving the
      kernel exception state.  The two gdb serial queries that kdb is
      responsible for detecting are the "?" and "qSupported" packets.
      
      CC: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
      Acked-by: Martin Hicks <mort@sgi.com>
      dcc78711
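The "$3#33" string mentioned above has the shape of a gdb remote serial protocol packet: a '$', the payload, a '#', and a two-digit hex checksum that is the sum of the payload bytes modulo 256. The sketch below shows how such a frame can be built; the function names are hypothetical and this is not the kgdb implementation.

```c
#include <stdio.h>
#include <stddef.h>

/* Checksum of a gdb remote serial packet: sum of payload bytes modulo 256. */
unsigned char gdb_checksum(const char *payload)
{
    unsigned char sum = 0;

    while (*payload)
        sum += (unsigned char)*payload++;
    return sum;
}

/*
 * Frame a payload as "$<payload>#<two hex digits>".
 * Returns 0 on success, -1 if the buffer is too small.
 */
int gdb_frame(const char *payload, char *buf, size_t len)
{
    int n = snprintf(buf, len, "$%s#%02x", payload, gdb_checksum(payload));

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Framing the one-byte payload "3" gives "$3#33", because the ASCII code of '3' is 0x33. The same routine frames the "?" query as "$?#3f".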
  6. 19 May, 2010 7 commits
    • G
      x86, paravirt: don't compute pvclock adjustments if we trust the tsc · 3a0d7256
      Committed by Glauber Costa
      If the HV told us we can fully trust the TSC, skip any
      correction.
      Signed-off-by: Glauber Costa <glommer@redhat.com>
      Acked-by: Zachary Amsden <zamsden@redhat.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
      3a0d7256
    • G
      x86: KVM guest: Try using new kvm clock msrs · 838815a7
      Committed by Glauber Costa
      We have now added a new set of clock-related msrs in replacement of the
      old ones. In theory, we could just try to use them and get a return value
      indicating they do not exist, due to our use of kvm_write_msr_save.
      
      However, kvm clock registration happens very early, and if we ever
      try to write to a non-existent MSR, we raise a lethal #GP, since our
      idt handlers are not in place yet.
      
      So this patch tests for a cpuid feature exported by the host to
      decide which set of msrs is supported.
      Signed-off-by: Glauber Costa <glommer@redhat.com>
      Acked-by: Zachary Amsden <zamsden@redhat.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
      838815a7
    • G
      x86, paravirt: Add a global synchronization point for pvclock · 489fb490
      Committed by Glauber Costa
      In recent stress tests, it was found that pvclock-based systems
      could seriously warp in smp systems. Using ingo's time-warp-test.c,
      I could trigger a scenario as bad as 1.5 million warps a minute in some
      systems (to be fair, it wasn't that bad in most of them). Investigating
      further, I found out that such warps were caused by the very offset-based
      calculation pvclock is based on.
      
      This happens even on some machines that report constant_tsc in their tsc
      flags, especially on multi-socket ones.
      
      Two reads of the same kernel timestamp at approximately the same time will
      likely have the tsc timestamped on different occasions too. This means the
      delta we calculate is unpredictable at best, and can probably be smaller in
      a cpu that is legitimately reading the clock at a later point.
      
      Some adjustments on the host could make this window less likely to happen,
      but still, it pretty much poses as an intrinsic problem of the mechanism.
      
      A while ago, I thought about using a shared variable anyway, to hold the
      clock's last state, but gave up due to the high contention locking was likely
      to introduce, possibly rendering the thing useless on big machines. I argue,
      however, that locking is not necessary.
      
      We do a read-and-return sequence in pvclock, and between read and return,
      the global value can have changed. However, it can only have changed
      by means of an addition of a positive value. So if we detected that our
      clock timestamp is less than the current global, we know that we need to
      return a higher one, even though it is not exactly the one we compared to.
      
      OTOH, if we detect we're greater than the current time source, we atomically
      replace the value with our new readings. This does cause contention on big
      boxes (but big here means *BIG*), but it seems like a good trade off, since
      it provides us with a time source guaranteed to be stable wrt time warps.
      
      After this patch is applied, I don't see a single warp in time during 5 days
      of execution, in any of the machines I saw them before.
      Signed-off-by: Glauber Costa <glommer@redhat.com>
      Acked-by: Zachary Amsden <zamsden@redhat.com>
      CC: Jeremy Fitzhardinge <jeremy@goop.org>
      CC: Avi Kivity <avi@redhat.com>
      CC: Marcelo Tosatti <mtosatti@redhat.com>
      CC: Zachary Amsden <zamsden@redhat.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
      489fb490
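The lock-free scheme described in this commit message (return the global value if our reading is behind it, otherwise atomically publish our reading) can be sketched in userspace with C11 atomics. This is only an illustration of the technique under that reading of the message; the names are hypothetical and the kernel code uses its own atomic primitives rather than <stdatomic.h>.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Last value handed out by any reader; shared by all CPUs or threads. */
static _Atomic uint64_t last_value;

/*
 * Given a freshly computed per-CPU reading, return a value that never
 * goes backwards relative to what any reader has already been given.
 */
uint64_t clamp_to_global(uint64_t reading)
{
    uint64_t last = atomic_load(&last_value);

    do {
        if (reading < last)
            return last;   /* another reader is ahead: hand back its value */
        /* We are ahead: try to publish. On failure 'last' is reloaded. */
    } while (!atomic_compare_exchange_weak(&last_value, &last, reading));

    return reading;
}
```

No lock is needed because the shared value only ever grows: a failed compare-and-exchange simply means someone published a newer value, and the loop re-evaluates against it.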
    • G
      x86, paravirt: Enable pvclock flags in vcpu_time_info structure · 424c32f1
      Committed by Glauber Costa
      This patch removes one padding byte and transforms it into a flags
      field. New versions of guests using pvclock will query these flags
      upon each read.
      
      Flags, however, will only be interpreted when the guest decides to.
      It uses the pvclock_valid_flags function to signal that a specific
      set of flags should be taken into consideration. Which flags are valid
      is usually determined via HV negotiation.
      Signed-off-by: Glauber Costa <glommer@redhat.com>
      CC: Jeremy Fitzhardinge <jeremy@goop.org>
      Acked-by: Zachary Amsden <zamsden@redhat.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
      424c32f1
    • S
      KVM: VMX: enable VMXON check with SMX enabled (Intel TXT) · cafd6659
      Committed by Shane Wang
      Per the documentation, for the feature control MSR:
      
        Bit 1 enables VMXON in SMX operation. If the bit is clear, execution
              of VMXON in SMX operation causes a general-protection exception.
        Bit 2 enables VMXON outside SMX operation. If the bit is clear, execution
              of VMXON outside SMX operation causes a general-protection exception.
      
      This patch is to enable this kind of check with SMX for VMXON in KVM.
      Signed-off-by: Shane Wang <shane.wang@intel.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
      cafd6659
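The two bits quoted in this commit message translate into a simple predicate: which bit must be set depends on whether the processor is in SMX operation. The following sketch encodes just that rule. The macro and function names are hypothetical, and any other bits of the register are deliberately ignored here, so this is a simplification and not the KVM check itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions as quoted in the commit message above. */
#define FC_VMXON_INSIDE_SMX   (1u << 1)
#define FC_VMXON_OUTSIDE_SMX  (1u << 2)

/*
 * Returns true if, according to the two quoted bits alone, VMXON would
 * not raise a general-protection exception in the given mode.
 */
bool vmxon_permitted(uint64_t feature_control, bool in_smx)
{
    uint64_t required = in_smx ? FC_VMXON_INSIDE_SMX : FC_VMXON_OUTSIDE_SMX;

    return (feature_control & required) != 0;
}
```

For example, a register value with only bit 2 set permits VMXON outside SMX operation but not inside it.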
    • C
      perf, x86: P4_pmu_schedule_events -- use smp_processor_id instead of raw_ · 9d36dfcf
      Committed by Cyrill Gorcunov
      This snippet somehow escaped the commit:
      
       | commit 137351e0
       | Author: Cyrill Gorcunov <gorcunov@openvz.org>
       | Date:   Sat May 8 15:25:52 2010 +0400
       |
       |    x86, perf: P4 PMU -- protect sensible procedures from preemption
      
      so bring it back at last. It helps to catch
      preemption issues (if there are any; rule of thumb:
      don't use raw_ if you can avoid it).
      Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Lin Ming <ming.m.lin@intel.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <20100518212439.167259349@openvz.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      9d36dfcf
    • C
      perf, x86: P4 PMU -- do a real check for ESCR address being in hash · 623aab89
      Committed by Cyrill Gorcunov
      To prevent clashes in future code modifications,
      do a real check for the ESCR address being in the hash. At
      the moment the callers are known to pass sane values, but
      it is better to be on the safe side.
      
      Also fix a comment.
      Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
      CC: Lin Ming <ming.m.lin@intel.com>
      CC: Peter Zijlstra <a.p.zijlstra@chello.nl>
      CC: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <20100518212439.004503600@openvz.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      623aab89
  7. 18 May, 2010 3 commits
  8. 17 May, 2010 1 commit
  9. 15 May, 2010 2 commits
    • C
      x86, perf: P4 PMU - fix counters management logic · 1ff3d7d7
      Committed by Cyrill Gorcunov
      Jaswinder reported this #GP:
      
       |
       | Message from syslogd@ht at May 14 09:39:32 ...
       | kernel:[  314.908612] EIP: [<c100ccca>]
       | x86_perf_event_set_period+0x19d/0x1b2 SS:ESP 0068:edac3d70
       |
      
      Ming has narrowed it down to a comparison issue
      between arguments with different sizes and
      signs. As a result the event index reached a wrong
      value, which in turn led to a GP fault.
      
      At the same time it was found that p4_next_cntr
      has broken logic and should return the counter
      index only if it was not yet borrowed for
      another event.
      Reported-by: Jaswinder Singh Rajput <jaswinderlinux@gmail.com>
      Reported-by: Lin Ming <ming.m.lin@intel.com>
      Bisected-by: Lin Ming <ming.m.lin@intel.com>
      Tested-by: Jaswinder Singh Rajput <jaswinderlinux@gmail.com>
      Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
      CC: Peter Zijlstra <a.p.zijlstra@chello.nl>
      CC: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <20100514190815.GG13509@lenovo>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      1ff3d7d7
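The commit message does not show the offending comparison, so the following is only a generic illustration of the class of bug it names: comparing values of different signedness. In C, when an int is compared with an unsigned int, the int is converted to unsigned first, so -1 becomes a very large number. The function names are hypothetical and this is not the P4 PMU code.

```c
/*
 * Buggy shape: 'current_max' is converted to unsigned for the comparison,
 * so a sentinel of -1 compares as UINT_MAX and no candidate ever wins.
 */
int max_buggy(int current_max, unsigned int candidate)
{
    return candidate > current_max ? (int)candidate : current_max;
}

/* Safer shape: handle the negative sentinel first, then compare like with like. */
int max_fixed(int current_max, unsigned int candidate)
{
    if (current_max < 0 || candidate > (unsigned int)current_max)
        return (int)candidate;
    return current_max;
}
```

Starting from a sentinel of -1 and a candidate of 3, the buggy version keeps -1 while the fixed version returns 3. An index derived from the wrong result can then point outside the intended range, which is the kind of failure the report describes.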
    • F
      x86, cacheinfo: Turn off L3 cache index disable feature in virtualized environments · 7f284d3c
      Committed by Frank Arnold
      When running a guest kernel on xen we get:
      
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000038
      IP: [<ffffffff8142f2fb>] cpuid4_cache_lookup_regs+0x2ca/0x3df
      PGD 0
      Oops: 0000 [#1] SMP
      last sysfs file:
      CPU 0
      Modules linked in:
      
      Pid: 0, comm: swapper Tainted: G        W  2.6.34-rc3 #1 /HVM domU
      RIP: 0010:[<ffffffff8142f2fb>]  [<ffffffff8142f2fb>] cpuid4_cache_lookup_regs+0x2ca/0x3df
      RSP: 0018:ffff880002203e08  EFLAGS: 00010046
      RAX: 0000000000000000 RBX: 0000000000000003 RCX: 0000000000000060
      RDX: 0000000000000000 RSI: 0000000000000040 RDI: 0000000000000000
      RBP: ffff880002203ed8 R08: 00000000000017c0 R09: ffff880002203e38
      R10: ffff8800023d5d40 R11: ffffffff81a01e28 R12: ffff880187e6f5c0
      R13: ffff880002203e34 R14: ffff880002203e58 R15: ffff880002203e68
      FS:  0000000000000000(0000) GS:ffff880002200000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      CR2: 0000000000000038 CR3: 0000000001a3c000 CR4: 00000000000006f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Process swapper (pid: 0, threadinfo ffffffff81a00000, task ffffffff81a44020)
      Stack:
       ffffffff810d7ecb ffff880002203e20 ffffffff81059140 ffff880002203e30
      <0> ffffffff810d7ec9 0000000002203e40 000000000050d140 ffff880002203e70
      <0> 0000000002008140 0000000000000086 ffff880040020140 ffffffff81068b8b
      Call Trace:
       <IRQ>
       [<ffffffff810d7ecb>] ? sync_supers_timer_fn+0x0/0x1c
       [<ffffffff81059140>] ? mod_timer+0x23/0x25
       [<ffffffff810d7ec9>] ? arm_supers_timer+0x34/0x36
       [<ffffffff81068b8b>] ? hrtimer_get_next_event+0xa7/0xc3
       [<ffffffff81058e85>] ? get_next_timer_interrupt+0x19a/0x20d
       [<ffffffff8142fa23>] get_cpu_leaves+0x5c/0x232
       [<ffffffff8106a7b1>] ? sched_clock_local+0x1c/0x82
       [<ffffffff8106a9a0>] ? sched_clock_tick+0x75/0x7a
       [<ffffffff8107748c>] generic_smp_call_function_single_interrupt+0xae/0xd0
       [<ffffffff8101f6ef>] smp_call_function_single_interrupt+0x18/0x27
       [<ffffffff8100a773>] call_function_single_interrupt+0x13/0x20
       <EOI>
       [<ffffffff8143c468>] ? notifier_call_chain+0x14/0x63
       [<ffffffff810295c6>] ? native_safe_halt+0xc/0xd
       [<ffffffff810114eb>] ? default_idle+0x36/0x53
       [<ffffffff81008c22>] cpu_idle+0xaa/0xe4
       [<ffffffff81423a9a>] rest_init+0x7e/0x80
       [<ffffffff81b10dd2>] start_kernel+0x40e/0x419
       [<ffffffff81b102c8>] x86_64_start_reservations+0xb3/0xb7
       [<ffffffff81b103c4>] x86_64_start_kernel+0xf8/0x107
      Code: 14 d5 40 ff ae 81 8b 14 02 31 c0 3b 15 47 1c 8b 00 7d 0e 48 8b 05 36 1c 8b
       00 48 63 d2 48 8b 04 d0 c7 85 5c ff ff ff 00 00 00 00 <8b> 70 38 48 8d 8d 5c ff
       ff ff 48 8b 78 10 ba c4 01 00 00 e8 eb
      RIP  [<ffffffff8142f2fb>] cpuid4_cache_lookup_regs+0x2ca/0x3df
       RSP <ffff880002203e08>
      CR2: 0000000000000038
      ---[ end trace a7919e7f17c0a726 ]---
      
      The L3 cache index disable feature of AMD CPUs has to be disabled if the
      kernel is running as guest on top of a hypervisor because northbridge
      devices are not available to the guest. Currently, this fixes a boot
      crash on top of Xen. In the future this will become an issue on KVM as
      well.
      
      Check if northbridge devices are present and do not enable the feature
      if there are none.
      
      [ hpa: backported to 2.6.34 ]
      Signed-off-by: Frank Arnold <frank.arnold@amd.com>
      LKML-Reference: <1271945222-5283-3-git-send-email-bp@amd64.org>
      Acked-by: Borislav Petkov <borislav.petkov@amd.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      Cc: <stable@kernel.org>
      7f284d3c
  10. 14 May, 2010 1 commit
  11. 13 May, 2010 1 commit
    • C
      x86, perf: P4 PMU -- use hash for p4_get_escr_idx() · 72001990
      Committed by Cyrill Gorcunov
      A linear search over all p4 MSRs would be fine if we
      did not use it in the event scheduling routine, which
      is pretty time critical. Let's use hashes. It should speed
      scheduling up significantly.
      
      v2: Steven proposed a more gentle approach than issuing
          BUG on error, so we use WARN_ONCE now
      Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Lin Ming <ming.m.lin@intel.com>
      LKML-Reference: <20100512174242.GA5190@lenovo>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      72001990
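One simple way to replace a linear search over register addresses with a constant-time lookup is a direct-mapped table indexed by the offset from a base address, with a check that the slot really holds the requested address. The sketch below shows that idea only. The address range, names and table layout are illustrative assumptions, not values or code taken from the patch.

```c
#include <stdint.h>

#define MSR_BASE   0x3a0u   /* hypothetical lowest register address */
#define MSR_MAX    0x3e1u   /* hypothetical highest register address */
#define TABLE_SIZE (MSR_MAX - MSR_BASE + 1)

/* Slot (addr - MSR_BASE) holds addr if it is a known register, else 0. */
static uint32_t known_msr[TABLE_SIZE];

/* Record a register address as known; out-of-range addresses are ignored. */
void register_msr(uint32_t addr)
{
    if (addr >= MSR_BASE && addr <= MSR_MAX)
        known_msr[addr - MSR_BASE] = addr;
}

/*
 * Constant-time lookup. Returns the slot index, or -1 if the address is
 * out of range or was never registered (the "real check" on the slot).
 */
int msr_index(uint32_t addr)
{
    if (addr < MSR_BASE || addr > MSR_MAX)
        return -1;

    uint32_t idx = addr - MSR_BASE;

    return known_msr[idx] == addr ? (int)idx : -1;
}
```

Returning -1 for an unknown address, rather than aborting, mirrors the softer error handling the v2 note asks for; a caller can warn once and carry on.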
  12. 11 May, 2010 4 commits
    • J
      x86/amd-iommu: Add amd_iommu=off command line option · a5235725
      Committed by Joerg Roedel
      This patch adds a command line option to tell the AMD IOMMU
      driver to not initialize any IOMMU it finds.
      Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
      a5235725
    • M
      kprobes/x86: Fix removed int3 checking order · 829e9245
      Committed by Masami Hiramatsu
      Fix kprobe/x86 to check for a removed int3 when failing to get the
      kprobe from the hlist. Since there is a time window between checking
      that int3 exists at the probed address and getting the kprobe for that
      address, we can have the following scenario:
      
       -------
       CPU1                     CPU2
       hit int3
       check int3 exists
                                remove int3
                                remove kprobe from hlist
       get kprobe from hlist
       no kprobe->OOPS!
       -------
      
      This patch moves the int3 check to the case where there is no kprobe
      at that address, fixing the problem as follows:
      
       ------
       CPU1                     CPU2
       hit int3
                                remove int3
                                remove kprobe from hlist
       get kprobe from hlist
       no kprobe->check int3 exists
                ->rollback&retry
       ------
      Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
      Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
      Cc: systemtap <systemtap@sources.redhat.com>
      Cc: DLE <dle-develop@lists.sourceforge.net>
      Cc: Dave Anderson <anderson@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <20100427223348.2322.9112.stgit@localhost6.localdomain6>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      829e9245
    • A
      x86: Introduce 'struct fpu' and related API · 86603283
      Committed by Avi Kivity
      Currently all fpu state access is through tsk->thread.xstate.  Since we wish
      to generalize fpu access to non-task contexts, wrap the state in a new
      'struct fpu' and convert existing access to use an fpu API.
      
      Signal frame handlers are not converted to the API since they will remain
      task context only things.
      Signed-off-by: Avi Kivity <avi@redhat.com>
      Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
      LKML-Reference: <1273135546-29690-3-git-send-email-avi@redhat.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      86603283
    • A
      x86: Eliminate TS_XSAVE · c9ad4882
      Committed by Avi Kivity
      The fpu code currently uses current->thread_info->status & TS_XSAVE as
      a way to distinguish between XSAVE capable processors and older processors.
      The decision is not really task specific; instead we use the task status to
      avoid a global memory reference - the value should be the same across all
      threads.
      
      Eliminate this tie-in into the task structure by using an alternative
      instruction keyed off the XSAVE cpu feature; this results in shorter and
      faster code, without introducing a global memory reference.
      
      [ hpa: in the future, this probably should use an asm jmp ]
      Signed-off-by: Avi Kivity <avi@redhat.com>
      Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
      LKML-Reference: <1273135546-29690-2-git-send-email-avi@redhat.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      c9ad4882
  13. 10 May, 2010 1 commit
    • H
      x86, hypervisor: add missing <linux/module.h> · 3998d095
      Committed by H. Peter Anvin
      EXPORT_SYMBOL() needs <linux/module.h> to be included; fixes modular
      builds of the VMware balloon driver, and any future modular drivers
      which depend on the hypervisor.
      Reported-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      Cc: Greg KH <greg@kroah.com>
      Cc: Hank Janssen <hjanssen@microsoft.com>
      Cc: Alok Kataria <akataria@vmware.com>
      Cc: Ky Srinivasan <ksrinivasan@novell.com>
      Cc: Dmitry Torokhov <dtor@vmware.com>
      LKML-Reference: <4BE49778.6060800@zytor.com>
      3998d095
  14. 09 May, 2010 1 commit
    • H
      x86, hypervisor: Export the x86_hyper* symbols · 96f6e775
      Committed by H. Peter Anvin
      Export x86_hyper and the related specific structures, allowing for
      hypervisor identification by modules.
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      Cc: Greg KH <greg@kroah.com>
      Cc: Hank Janssen <hjanssen@microsoft.com>
      Cc: Alok Kataria <akataria@vmware.com>
      Cc: Ky Srinivasan <ksrinivasan@novell.com>
      Cc: Dmitry Torokhov <dtor@vmware.com>
      LKML-Reference: <4BE49778.6060800@zytor.com>
      96f6e775
  15. 08 May, 2010 5 commits
    • C
      x86, perf: P4 PMU -- check for proper event index in RAW events · c7993165
      Committed by Cyrill Gorcunov
      RAW events are special and we should be ready for user passing
      in insane event index values.
      Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Lin Ming <ming.m.lin@intel.com>
      LKML-Reference: <20100508112717.315897547@openvz.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      c7993165
    • C
      x86, perf: P4 PMU -- Get rid of redundant check for array index · 3f51b711
      Committed by Cyrill Gorcunov
      The caller has already done such a check.
      And it was wrong anyway: it had to be '>=' rather than '>'.
      Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Lin Ming <ming.m.lin@intel.com>
      LKML-Reference: <20100508112717.130386882@openvz.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      3f51b711
    • C
      x86, perf: P4 PMU -- protect sensible procedures from preemption · 137351e0
      Committed by Cyrill Gorcunov
      Steven reported:
      
      |
      | I'm getting:
      |
      | Pid: 3477, comm: perf Not tainted 2.6.34-rc6 #2727
      | Call Trace:
      |  [<ffffffff811c7565>] debug_smp_processor_id+0xd5/0xf0
      |  [<ffffffff81019874>] p4_hw_config+0x2b/0x15c
      |  [<ffffffff8107acbc>] ? trace_hardirqs_on_caller+0x12b/0x14f
      |  [<ffffffff81019143>] hw_perf_event_init+0x468/0x7be
      |  [<ffffffff810782fd>] ? debug_mutex_init+0x31/0x3c
      |  [<ffffffff810c68b2>] T.850+0x273/0x42e
      |  [<ffffffff810c6cab>] sys_perf_event_open+0x23e/0x3f1
      |  [<ffffffff81009e6a>] ? sysret_check+0x2e/0x69
      |  [<ffffffff81009e32>] system_call_fastpath+0x16/0x1b
      |
      | When running perf record in latest tip/perf/core
      |
      
      Because p4 counters are shared between HT threads,
      we synthetically divide the whole set of counters into two
      disjoint subsets. And while we're "borrowing" counters
      from these subsets we should not be preempted (well, strictly
      speaking in p4_hw_config we just pre-set a reference to the
      subset, which allows us to save some cycles in the schedule routine
      if it happens on the same cpu). So use a get_cpu/put_cpu pair.
      
      Also p4_pmu_schedule_events should use smp_processor_id rather
      than the raw_ version. This allows us to catch preemption issues
      (if there ever are any).
      Reported-by: Steven Rostedt <rostedt@goodmis.org>
      Tested-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Lin Ming <ming.m.lin@intel.com>
      LKML-Reference: <20100508112716.963478928@openvz.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      137351e0
    • C
      x86, perf: P4 PMU -- configure predefined events · de902d96
      Committed by Cyrill Gorcunov
      If an event is not RAW we should not exit p4_hw_config
      early but call x86_setup_perfctr as well.
      Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Lin Ming <ming.m.lin@intel.com>
      Cc: Robert Richter <robert.richter@amd.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      de902d96
    • H
      x86: Clean up the hypervisor layer · e08cae41
      Committed by H. Peter Anvin
      Clean up the hypervisor layer and the hypervisor drivers, using an ops
      structure instead of an enumeration with if statements.
      
      The identity of the hypervisor, if needed, can be tested by testing
      the pointer value in x86_hyper.
      
      The MS-HyperV private state is moved into a normal global variable
      (it's per-system state, not per-CPU state).  Being a normal bss
      variable, it will be left at all zero on non-HyperV platforms, and so
      can generally be tested for HyperV-specific features without
      additional qualification.
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      Acked-by: Greg KH <greg@kroah.com>
      Cc: Hank Janssen <hjanssen@microsoft.com>
      Cc: Alok Kataria <akataria@vmware.com>
      Cc: Ky Srinivasan <ksrinivasan@novell.com>
      LKML-Reference: <4BE49778.6060800@zytor.com>
      e08cae41
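The design described in this commit message, an ops structure per hypervisor with identity tested by pointer comparison, can be sketched in plain C as follows. All names here (struct hypervisor_ops, hyper_a, hyper_b, current_hyper, probe_hypervisor) are hypothetical stand-ins for illustration, and the detection functions are stubs; this is not the kernel's x86_hyper code.

```c
#include <stdbool.h>
#include <stddef.h>

/* One descriptor per supported hypervisor, replacing an enum plus if-chains. */
struct hypervisor_ops {
    const char *name;
    bool (*detect)(void);         /* true if we are running on this one */
    void (*init_platform)(void);  /* optional hook, may be NULL */
};

/* Stub detectors standing in for real probing logic. */
static bool detect_never(void)  { return false; }
static bool detect_always(void) { return true; }

const struct hypervisor_ops hyper_a = { "hyper-a", detect_never,  NULL };
const struct hypervisor_ops hyper_b = { "hyper-b", detect_always, NULL };

static const struct hypervisor_ops *const candidates[] = { &hyper_a, &hyper_b };

/* NULL means bare metal or an unknown hypervisor in this sketch. */
const struct hypervisor_ops *current_hyper;

/* Pick the first candidate whose detect() succeeds and run its init hook. */
void probe_hypervisor(void)
{
    for (size_t i = 0; i < sizeof(candidates) / sizeof(candidates[0]); i++) {
        if (candidates[i]->detect()) {
            current_hyper = candidates[i];
            if (current_hyper->init_platform)
                current_hyper->init_platform();
            return;
        }
    }
}
```

After probing, code that cares about one specific hypervisor can test `current_hyper == &hyper_b` instead of switching on an enumeration, and adding a new hypervisor only requires a new descriptor in the candidate list.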