1. 10 Jun, 2010 1 commit
  2. 28 May, 2010 3 commits
  3. 26 May, 2010 2 commits
    • B
      x86, k8: Fix section mismatch for powernowk8_exit() · fe501f1e
      Borislav Petkov committed
      Fix the following warning:
      
      "WARNING: arch/x86/kernel/built-in.o(.exit.text+0x72):
      Section mismatch in reference from the function powernowk8_exit() to the variable .cpuinit.data:cpb_nb
      
      The function __exit powernowk8_exit() references a variable
      __cpuinitdata cpb_nb. This is often seen when error handling in the exit
      function uses functionality in the init path. The fix is often to remove
      the __cpuinitdata annotation of cpb_nb so it may be used outside an init
      section."
      
      Cc: <stable@kernel.org>
      Reported-by: H. Peter Anvin <hpa@zytor.com>
      Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
      LKML-Reference: <20100525152858.GA24836@aftab>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      fe501f1e
    • K
      driver core: add devname module aliases to allow module on-demand auto-loading · 578454ff
      Kay Sievers committed
      This adds:
        alias: devname:<name>
      to some common kernel modules, which will allow the on-demand loading
      of the kernel module when the device node is accessed.
      
      Ideally all these modules would be compiled-in, but distros seem so
      much in love with their modularization that we need to cover the common
      cases with this new facility. It will allow us to remove a bunch of pretty
      useless init scripts and modprobe calls from init scripts.
      
      The static device node aliases will be carried in the module itself. The
      program depmod will extract this information to a file in the module directory:
        $ cat /lib/modules/2.6.34-00650-g537b60d1-dirty/modules.devname
        # Device nodes to trigger on-demand module loading.
        microcode cpu/microcode c10:184
        fuse fuse c10:229
        ppp_generic ppp c108:0
        tun net/tun c10:200
        dm_mod mapper/control c10:235
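
The file format shown above can be read with a few lines of C. This is only a minimal sketch, assuming each entry has the form `<module> <node path> <type><major>:<minor>` as in the sample output; the struct and function names are hypothetical and are not taken from depmod or udev.

```c
#include <stdio.h>

/* One entry of modules.devname (hypothetical in-memory representation). */
struct devname_entry {
    char module[64];     /* kernel module name, e.g. "tun" */
    char node[128];      /* device node path relative to /dev, e.g. "net/tun" */
    char type;           /* 'c' for character device, 'b' for block device */
    unsigned int major;
    unsigned int minor;
};

/* Returns 0 on success, -1 for comments, empty lines and malformed lines. */
static int parse_devname_line(const char *line, struct devname_entry *e)
{
    if (line[0] == '#' || line[0] == '\n' || line[0] == '\0')
        return -1;
    /* Field widths keep sscanf from overflowing the fixed-size buffers. */
    if (sscanf(line, "%63s %127s %c%u:%u",
               e->module, e->node, &e->type, &e->major, &e->minor) != 5)
        return -1;
    if (e->type != 'c' && e->type != 'b')
        return -1;
    return 0;
}
```

A caller would read the file line by line and, for each successfully parsed entry, create the node under /dev with the given type and numbers.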
      
      Udev will pick up the depmod created file on startup and create all the
      static device nodes which the kernel modules specify, so that these modules
      get automatically loaded when the device node is accessed:
        $ /sbin/udevd --debug
        ...
        static_dev_create_from_modules: mknod '/dev/cpu/microcode' c10:184
        static_dev_create_from_modules: mknod '/dev/fuse' c10:229
        static_dev_create_from_modules: mknod '/dev/ppp' c108:0
        static_dev_create_from_modules: mknod '/dev/net/tun' c10:200
        static_dev_create_from_modules: mknod '/dev/mapper/control' c10:235
        udev_rules_apply_static_dev_perms: chmod '/dev/net/tun' 0666
        udev_rules_apply_static_dev_perms: chmod '/dev/fuse' 0666
      
      A few device nodes are switched to statically allocated numbers, to allow
      the static nodes to work. This might also be useful for systems which still run
      a plain static /dev, which is completely unsafe to use with any dynamic minor
      numbers.
      
      Note:
      The devname aliases must be limited to the *common* and *single*instance*
      device nodes, like the misc devices, and never be used for conceptually limited
      systems like the loop devices, which should rather get fixed properly and get a
      control node for losetup to talk to, instead of creating a random number of
      device nodes in advance, regardless if they are ever used.
      
      This facility is to hide the mess distros are creating with too modularized
      kernels, and just to hide that these modules are not compiled-in, and not to
      paper-over broken concepts. Thanks! :)
      
      Cc: Greg Kroah-Hartman <gregkh@suse.de>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Miklos Szeredi <miklos@szeredi.hu>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Alasdair G Kergon <agk@redhat.com>
      Cc: Tigran Aivazian <tigran@aivazian.fsnet.co.uk>
      Cc: Ian Kent <raven@themaw.net>
      Signed-Off-By: Kay Sievers <kay.sievers@vrfy.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
      578454ff
  4. 25 May, 2010 4 commits
    • P
      perf, trace: Fix !x86 build bug · 87f44bbc
      Peter Zijlstra committed
      Patch b7e2ecef (perf, trace: Optimize tracepoints by removing
      IRQ-disable from perf/tracepoint interaction) made the
      unfortunate mistake of assuming the world is x86 only, correct
      this.
      
      The problem was that perf_fetch_caller_regs() did
      local_save_flags() into regs->flags, and I re-used that to
      remove another local_save_flags(), forgetting !x86 doesn't have
      regs->flags.
      
      Do the reverse, remove the local_save_flags() from
      perf_fetch_caller_regs() and let the ftrace site do the
      local_save_flags() instead.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: Paul Mackerras <paulus@samba.org>
      Cc: acme@redhat.com
      Cc: efault@gmx.de
      Cc: fweisbec@gmail.com
      Cc: rostedt@goodmis.org
      LKML-Reference: <1274778175.5882.623.camel@twins>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      87f44bbc
    • G
      x86, setup: Phoenix BIOS fixup is needed on Dell Inspiron Mini 1012 · 3d6e77a3
      Gabor Gombas committed
      The low-memory corruption checker triggers during suspend/resume, so we
      need to reserve the low 64k.  Don't be fooled that the BIOS identifies
      itself as "Dell Inc.", it's still Phoenix BIOS.
      
      [ hpa: I think we blacklist almost every BIOS in existence.  We should
      either change this to a whitelist or just make it unconditional. ]
      Signed-off-by: Gabor Gombas <gombasg@digikabel.hu>
      LKML-Reference: <201005241913.o4OJDIMM010877@imap1.linux-foundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      Cc: <stable@kernel.org>
      3d6e77a3
    • J
      x86: "nosmp" command line option should force the system into UP mode · 5f2eb550
      Jan Beulich committed
      Bits set in cpu_possible_mask prior to the execution of
      prefill_possible_map() (i.e. when parsing ACPI or MPS tables) would
      prevent the SMP alternatives logic from switching to UP mode, and would
      cause unnecessary setup of per-CPU data for CPUs that can never come online.

      Additionally, without CONFIG_HOTPLUG_CPU, disabled CPUs can never come
      online, and hence setting cpu_possible_mask bits for them is again a
      simple waste of resources.
      Signed-off-by: Jan Beulich <jbeulich@novell.com>
      LKML-Reference: <201005241913.o4OJDH3Z010874@imap1.linux-foundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      5f2eb550
    • K
      x86, apic: ack all pending irqs when crashed/on kexec · 8c3ba8d0
      Kerstin Jonsson committed
      When the SMP kernel decides to crash_kexec() the local APICs may have
      pending interrupts in their vector tables.
      
      The setup routine for the local APIC has a deficient mechanism for
      clearing these interrupts: it only safely handles interrupts that have
      already been dispatched to the local core for servicing (the ISR
      register); it doesn't consider lower-priority queued interrupts stored
      in the IRR register.
      
      If you have more than one pending interrupt within the same 32 bit word in
      the LAPIC vector table registers you may find yourself entering the IO
      APIC setup with pending interrupts left in the LAPIC.  This is a situation
      for which the IO APIC setup is not prepared.  Depending on which
      interrupt vectors are stuck in the APIC tables, your system may show
      various degrees of malfunctioning.  That was the reason why
      check_timer() failed in our system: the timer interrupts were blocked by
      pending interrupts from the old kernel when routed through the IO APIC.
      
      Additional comment from Jiri Bohac:
      ==============
      If this should go into stable release,
      I'd add some kind of limit on the number of iterations, just to be safe from
      hard to debug lock-ups:
      
      +if (loops++  > MAX_LOOPS) {
      +        printk("LAPIC pending clean-up");
      +        break;
      +}
       while (queued);
      
      with MAX_LOOPS something like 1E9 this would leave plenty of time for the
      pending IRQs to be cleared and would still cause at most a second of delay
      if the loop were to lock-up for whatever reason.
      
      [trenn@suse.de:
      
      V2: Use tsc if avail to bail out after 1 sec due to possible virtual
          apic_read calls which may take rather long (suggested by: Avi Kivity
          <avi@redhat.com>) If no tsc is available bail out quickly after
          cpu_khz, if we broke out too early and still have irqs pending (which
          should never happen?) we still get a WARN_ON...
      
      V3: - Fixed indentation -> checkpatch clean
          - max_loops must be signed
      
      V4: - Fix typo, mixed up tsc and ntsc in first rdtscll() call
      
      V5: Adjust WARN_ON() condition to also catch error in cpu_has_tsc case]
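
The bail-out idea discussed above can be illustrated in isolation. The following is a rough sketch only: `struct fake_lapic` and `fake_ack_one()` are hypothetical stand-ins for the hardware, and the loop budget is a plain counter rather than the TSC-based timeout the V2 note describes.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the LAPIC: a count of interrupts still pending. */
struct fake_lapic {
    unsigned int pending;
    bool stuck;          /* when true, acknowledging never makes progress */
};

/* Acknowledge one pending interrupt, unless the fake hardware is stuck. */
static void fake_ack_one(struct fake_lapic *l)
{
    if (!l->stuck && l->pending > 0)
        l->pending--;
}

/*
 * Drain pending interrupts, but give up after max_loops iterations.
 * Returns true if nothing is left pending, false if the loop bailed out.
 * max_loops is signed, matching the "max_loops must be signed" note.
 */
static bool drain_pending(struct fake_lapic *l, long max_loops)
{
    while (l->pending > 0) {
        if (max_loops-- <= 0)
            return false;    /* bail out instead of spinning forever */
        fake_ack_one(l);
    }
    return true;
}
```

A caller that gets `false` back would then emit a warning, which corresponds to the WARN_ON mentioned in the changelog.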
      
      Cc: <jbohac@novell.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Kerstin Jonsson <kerstin.jonsson@ericsson.com>
      Cc: Avi Kivity <avi@redhat.com>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Tested-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Thomas Renninger <trenn@suse.de>
      LKML-Reference: <201005241913.o4OJDGWM010865@imap1.linux-foundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      8c3ba8d0
  5. 21 May, 2010 8 commits
    • J
      earlyprintk,vga,kdb: Fix \b and \r for earlyprintk=vga with kdb · 61eaf539
      Jason Wessel committed
      Allow kdb to work properly with earlyprintk=vga by interpreting
      the backspace and carriage return output characters.  The
      interpretation of these characters is used for simple line editing
      provided in the kdb shell.
      
      CC: Thomas Gleixner <tglx@linutronix.de>
      CC: Ingo Molnar <mingo@redhat.com>
      CC: H. Peter Anvin <hpa@zytor.com>
      CC: x86@kernel.org
      Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
      61eaf539
    • J
      x86,early dr regs,kgdb: Allow kernel debugger early dr register access · 0bb9fef9
      Jason Wessel committed
      If the kernel debugger was configured, attached and started with
      kgdbwait, the hardware breakpoint registers should get restored by the
      kgdb code which is managing the dr registers.
      
      CC: x86@kernel.org
      CC: Thomas Gleixner <tglx@linutronix.de>
      CC: Ingo Molnar <mingo@redhat.com>
      CC: H. Peter Anvin <hpa@zytor.com>
      Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
      0bb9fef9
    • J
      x86,kgdb: Implement early hardware breakpoint debugging · 031acd8c
      Jason Wessel committed
      It is not possible to use the hw_breakpoint.c API prior to mm_init(),
      but it is possible to use hardware breakpoints with the kernel
      debugger.
      
      Prior to smp_init() it is possible to simply write to the dr registers
      of the boot cpu directly.  This can be used up until the
      kgdb_arch_late() is invoked, at which point the standard hw_breakpoint.c
      API will get used.
      
      CC: Frederic Weisbecker <fweisbec@gmail.com>
      CC: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
      031acd8c
    • J
      x86, kgdb, init: Add early and late debug states · 0b4b3827
      Jason Wessel committed
      The kernel debugger can operate well before mm_init(), but the x86
      hardware breakpoint code which uses the perf api requires that the
      kernel allocators are initialized.
      
      This means the kernel debug core needs to provide an optional arch
      specific call back to allow the initialization functions to run after
      the kernel has been further initialized.
      
      The kdb shell already had a similar restriction with an early
      initialization and late initialization.  The kdb_init() was moved into
      the debug core's version of the late init which is called
      dbg_late_init();
      
      CC: kgdb-bugreport@lists.sourceforge.net
      Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
      0b4b3827
    • J
      x86, kgdb: early trap init for early debug · 29c84391
      Jan Kiszka committed
      Allow the x86 arch to have early exception processing for the purpose
      of debugging via the kgdb.
      Signed-off-by: Jan Kiszka <jan.kiszka@web.de>
      Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
      29c84391
    • J
      x86,kgdb: Add low level debug hook · f503b5ae
      Jason Wessel committed
      The only way the debugger can handle a trap inside rcu_lock,
      notify_die, or atomic_notifier_call_chain without a triple fault is
      to have a low level "first opportunity handler" in the int3 exception
      handler.
      
      Generally this will be something the vast majority of folks will not
      need, but for those who need it, it is added as a kernel .config
      option called KGDB_LOW_LEVEL_TRAP.
      
      CC: Ingo Molnar <mingo@elte.hu>
      CC: Thomas Gleixner <tglx@linutronix.de>
      CC: H. Peter Anvin <hpa@zytor.com>
      CC: x86@kernel.org
      Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
      f503b5ae
    • J
      kgdb: remove post_primary_code references · 98ec1878
      Jason Wessel committed
      Remove all the references to the kgdb_post_primary_code.  This
      function serves no useful purpose because you can obtain the same
      information from the "struct kgdb_state *ks" from within the
      debugger, if for some reason you want the data.
      
      Also remove the unintentional duplicate assignment for ks->ex_vector.
      Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
      98ec1878
    • J
      kgdb: core changes to support kdb · dcc78711
      Jason Wessel committed
      These are the minimum changes to the kgdb core in order to enable an
      API to connect a new front end (kdb) to the debug core.
      
      This patch introduces the dbg_kdb_mode variable, which controls where the
      user level I/O is routed.  It will be routed to the gdbstub (kgdb) or
      to the kdb front end which is a simple shell available over the kgdboc
      connection.
      
      You can switch back and forth between kdb or the gdb stub mode of
      operation dynamically.  From gdb stub mode you can blindly type
      "$3#33", or from the kdb mode you can enter "kgdb" to switch to the
      gdb stub.
      
      The logic in the debug core depends on kdb to look for the typical gdb
      connection sequences and return immediately with KGDB_PASS_EVENT if a
      gdb serial command sequence is detected.  That should allow a
      reasonably seamless transition between kdb -> gdb without leaving the
      kernel exception state.  The two gdb serial queries that kdb is
      responsible for detecting are the "?" and "qSupported" packets.
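
The strings mentioned above follow the gdb remote serial protocol framing, where a packet is `$<payload>#<checksum>` and the checksum is the sum of the payload bytes modulo 256, written as two hex digits. As a small illustration (the function name is hypothetical and this is not the kgdb implementation), the framing can be computed like this:

```c
#include <stdio.h>
#include <string.h>

/*
 * Frame a payload as a gdb remote serial protocol packet.
 * Returns 0 on success, -1 if the output buffer is too small.
 */
static int gdb_frame_packet(const char *payload, char *out, size_t outlen)
{
    unsigned char sum = 0;           /* wraps modulo 256 on overflow */
    size_t len = strlen(payload);

    for (size_t i = 0; i < len; i++)
        sum += (unsigned char)payload[i];

    int n = snprintf(out, outlen, "$%s#%02x", payload, sum);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

The payload "3" is the single byte 0x33, which is why the packet typed blindly from gdb stub mode reads "$3#33".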
      
      CC: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
      Acked-by: Martin Hicks <mort@sgi.com>
      dcc78711
  6. 20 May, 2010 2 commits
    • H
      ACPI, APEI, Use ERST for persistent storage of MCE · 482908b4
      Huang Ying committed
      Traditionally, a fatal MCE causes Linux to print an error log to the
      console and then reboot. Because MCE registers preserve their content
      across a warm reboot, the hardware error can be logged to disk or network
      after reboot. But the system may fail to warm reboot, and then the
      hardware error log is lost. ERST can help here. By saving the hardware
      error log into flash via ERST before panicking, the hardware error log
      can be retrieved from the flash after the system boots successfully again.

      The fatal MCE processing procedure with ERST involved is as follows:

      - Hardware detects an error, MCE is raised
      - The MCE handler reads the MCE registers, checks the error severity (fatal), and prepares an error record
      - The MCE error record is written into flash via ERST
      - The kernel panics, then triggers a system reboot
      - The system reboots and /sbin/mcelog runs; it reads /dev/mcelog to check flash
        for error records of the previous boot via ERST, and outputs and clears
        them if available
      - /sbin/mcelog logs the error records to disk or network

      ERST only accepts the CPER record format, but no pre-defined CPER
      section can accommodate all the information in struct mce, so a customized
      section type is defined to hold struct mce inside a CPER record as an
      error section.
      Signed-off-by: Huang Ying <ying.huang@intel.com>
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Len Brown <len.brown@intel.com>
      482908b4
    • H
      ACPI, APEI, Generic Hardware Error Source memory error support · d334a491
      Huang Ying committed
      Generic Hardware Error Source provides a way to report platform
      hardware errors (such as those from the chipset). It works in so-called
      "Firmware First" mode, that is, hardware errors are reported to
      firmware first, then reported to Linux by firmware. This way, some
      non-standard hardware error registers or non-standard hardware links
      can be checked by firmware to produce more valuable hardware error
      information for Linux.

      For now, only the SCI notification type and memory errors are supported. More
      notification types and hardware error types will be added later. These
      memory errors are reported to user space through /dev/mcelog by
      faking a corrected Machine Check, so that the faulty memory page can be
      offlined by /sbin/mcelog if the error count for one page exceeds the
      threshold.

      On some machines, Machine Check cannot report the physical address for
      some corrected memory errors, but GHES can. So this simplified
      GHES is implemented first.
      Signed-off-by: Huang Ying <ying.huang@intel.com>
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Len Brown <len.brown@intel.com>
      d334a491
  7. 19 May, 2010 7 commits
    • G
      x86, paravirt: don't compute pvclock adjustments if we trust the tsc · 3a0d7256
      Glauber Costa committed
      If the HV told us we can fully trust the TSC, skip any
      correction
      Signed-off-by: Glauber Costa <glommer@redhat.com>
      Acked-by: Zachary Amsden <zamsden@redhat.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
      3a0d7256
    • G
      x86: KVM guest: Try using new kvm clock msrs · 838815a7
      Glauber Costa committed
      We have now added a new set of clock-related msrs to replace the old
      ones. In theory, we could just try to use them and get a return value
      indicating they do not exist, due to our use of kvm_write_msr_save.

      However, kvm clock registration happens very early, and if we ever
      try to write to a non-existent MSR, we raise a lethal #GP, since our
      idt handlers are not in place yet.
      
      So this patch tests for a cpuid feature exported by the host to
      decide which set of msrs are supported.
      Signed-off-by: Glauber Costa <glommer@redhat.com>
      Acked-by: Zachary Amsden <zamsden@redhat.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
      838815a7
    • G
      x86, paravirt: Add a global synchronization point for pvclock · 489fb490
      Glauber Costa committed
      In recent stress tests, it was found that pvclock-based systems
      could seriously warp in smp systems. Using ingo's time-warp-test.c,
      I could trigger a scenario as bad as 1.5 million warps a minute in some systems.
      (to be fair, it wasn't that bad in most of them). Investigating further, I
      found out that such warps were caused by the very offset-based calculation
      pvclock is based on.

      This happens even on some machines that report constant_tsc in their tsc flags,
      especially on multi-socket ones.

      Two reads of the same kernel timestamp at approximately the same time will likely
      have the tsc timestamped on different occasions too. This means the delta we
      calculate is unpredictable at best, and can probably be smaller in a cpu
      that is legitimately reading the clock at a later occasion.
      
      Some adjustments on the host could make this window less likely to happen,
      but still, it pretty much poses as an intrinsic problem of the mechanism.
      
      A while ago, I thought about using a shared variable anyway, to hold the clock's
      last state, but gave up due to the high contention locking was likely
      to introduce, possibly rendering the thing useless on big machines. I argue,
      however, that locking is not necessary.
      
      We do a read-and-return sequence in pvclock, and between read and return,
      the global value can have changed. However, it can only have changed
      by means of an addition of a positive value. So if we detected that our
      clock timestamp is less than the current global, we know that we need to
      return a higher one, even though it is not exactly the one we compared to.
      
      OTOH, if we detect we're greater than the current time source, we atomically
      replace the value with our new readings. This does cause contention on big
      boxes (but big here means *BIG*), but it seems like a good trade-off, since
      it provides us with a time source guaranteed to be stable wrt time warps.
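
The lock-free scheme described in the last two paragraphs can be sketched with C11 atomics. This is an illustration of the idea only, not the kernel code: `last_value` and `monotonic_clamp()` are hypothetical names, and a user-space compare-and-swap stands in for the kernel's own atomic primitives.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Globally shared "latest timestamp handed out so far" (starts at zero). */
static _Atomic uint64_t last_value;

/*
 * Given a freshly computed per-CPU timestamp, return a value that never
 * goes backwards relative to what any other caller has already returned.
 */
static uint64_t monotonic_clamp(uint64_t now)
{
    uint64_t last = atomic_load(&last_value);

    do {
        /* Another reader already reported a later time: return that one. */
        if (now < last)
            return last;
        /* Otherwise try to publish ours; on failure 'last' is reloaded. */
    } while (!atomic_compare_exchange_weak(&last_value, &last, now));

    return now;
}
```

No lock is needed because the shared value only ever moves forward: a failed compare-and-swap simply means someone else advanced it, and the comparison is retried against the new value.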
      
      After this patch is applied, I don't see a single warp in time during 5 days
      of execution, in any of the machines I saw them before.
      Signed-off-by: Glauber Costa <glommer@redhat.com>
      Acked-by: Zachary Amsden <zamsden@redhat.com>
      CC: Jeremy Fitzhardinge <jeremy@goop.org>
      CC: Avi Kivity <avi@redhat.com>
      CC: Marcelo Tosatti <mtosatti@redhat.com>
      CC: Zachary Amsden <zamsden@redhat.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
      489fb490
    • G
      x86, paravirt: Enable pvclock flags in vcpu_time_info structure · 424c32f1
      Glauber Costa committed
      This patch removes one padding byte and transforms it into a flags
      field. New versions of guests using pvclock will query these flags
      upon each read.

      Flags, however, will only be interpreted when the guest decides to.
      It uses the pvclock_valid_flags function to signal that a specific
      set of flags should be taken into consideration. Which flags are valid
      is usually determined via HV negotiation.
      Signed-off-by: Glauber Costa <glommer@redhat.com>
      CC: Jeremy Fitzhardinge <jeremy@goop.org>
      Acked-by: Zachary Amsden <zamsden@redhat.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
      424c32f1
    • S
      KVM: VMX: enable VMXON check with SMX enabled (Intel TXT) · cafd6659
      Shane Wang committed
      Per the documentation for the feature control MSR:
      
        Bit 1 enables VMXON in SMX operation. If the bit is clear, execution
              of VMXON in SMX operation causes a general-protection exception.
        Bit 2 enables VMXON outside SMX operation. If the bit is clear, execution
              of VMXON outside SMX operation causes a general-protection exception.
      
      This patch is to enable this kind of check with SMX for VMXON in KVM.
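
The two bits quoted above translate into a simple check. The sketch below is an assumption-laden illustration: the macro and function names are hypothetical, and only the bit positions (1 and 2) come from the text.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions as quoted above; the macro names are made up for this sketch. */
#define FC_VMXON_ENABLED_INSIDE_SMX   (1ULL << 1)
#define FC_VMXON_ENABLED_OUTSIDE_SMX  (1ULL << 2)

/*
 * Returns true if VMXON is permitted for the current mode of operation,
 * i.e. it would not raise a general-protection exception for this reason.
 */
static bool vmxon_allowed(uint64_t feature_control, bool in_smx)
{
    uint64_t needed = in_smx ? FC_VMXON_ENABLED_INSIDE_SMX
                             : FC_VMXON_ENABLED_OUTSIDE_SMX;

    return (feature_control & needed) != 0;
}
```

The point of checking the mode is that a system launched with Intel TXT runs in SMX operation, so testing only bit 2 would give the wrong answer there.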
      Signed-off-by: Shane Wang <shane.wang@intel.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
      cafd6659
    • C
      perf, x86: P4_pmu_schedule_events -- use smp_processor_id instead of raw_ · 9d36dfcf
      Cyrill Gorcunov committed
      This snippet somehow escaped the commit:
      
       | commit 137351e0
       | Author: Cyrill Gorcunov <gorcunov@openvz.org>
       | Date:   Sat May 8 15:25:52 2010 +0400
       |
       |    x86, perf: P4 PMU -- protect sensible procedures from preemption
      
      so bring it back. It helps to catch
      preemption issues (if there are any; rule of thumb:
      don't use raw_ if you can avoid it).
      Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Lin Ming <ming.m.lin@intel.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <20100518212439.167259349@openvz.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      9d36dfcf
    • C
      perf, x86: P4 PMU -- do a real check for ESCR address being in hash · 623aab89
      Cyrill Gorcunov committed
      To prevent clashes in future code modifications,
      do a real check for the ESCR address being in the hash. At the
      moment the callers are known to pass sane values, but
      better to be on the safe side.
      
      And comment fix.
      Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
      CC: Lin Ming <ming.m.lin@intel.com>
      CC: Peter Zijlstra <a.p.zijlstra@chello.nl>
      CC: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <20100518212439.004503600@openvz.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      623aab89
  8. 18 May, 2010 3 commits
  9. 17 May, 2010 1 commit
  10. 15 May, 2010 2 commits
    • C
      x86, perf: P4 PMU - fix counters management logic · 1ff3d7d7
      Cyrill Gorcunov committed
      Jaswinder reported this #GP:
      
       |
       | Message from syslogd@ht at May 14 09:39:32 ...
       | kernel:[  314.908612] EIP: [<c100ccca>]
       | x86_perf_event_set_period+0x19d/0x1b2 SS:ESP 0068:edac3d70
       |
      
      Ming has narrowed it down to a comparison issue
      between arguments with different sizes and
      signs. As a result the event index reached a wrong
      value which in turn led to a GP fault.
      
      At the same time it was found that p4_next_cntr
      has broken logic and should return the counter
      index only if it was not yet borrowed for
      another event.
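
The class of bug described above, comparing values of different signedness, is easy to reproduce in plain C. The following generic sketch does not reproduce the actual P4 PMU code; both function names are hypothetical.

```c
#include <stdbool.h>

/*
 * What "a < b" really computes for an int and an unsigned int: the usual
 * arithmetic conversions turn the int into unsigned int first, so a
 * negative value becomes a huge positive one. The cast makes that explicit.
 */
static bool less_after_conversion(int a, unsigned int b)
{
    return (unsigned int)a < b;
}

/* The mathematically intended comparison: any negative a is below b. */
static bool less_as_intended(int a, unsigned int b)
{
    return a < 0 || (unsigned int)a < b;
}
```

With `a = -1` and `b = 4`, the first function reports that -1 is not less than 4, which is the kind of surprise that can let an index run past its intended range.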
      Reported-by: Jaswinder Singh Rajput <jaswinderlinux@gmail.com>
      Reported-by: Lin Ming <ming.m.lin@intel.com>
      Bisected-by: Lin Ming <ming.m.lin@intel.com>
      Tested-by: Jaswinder Singh Rajput <jaswinderlinux@gmail.com>
      Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
      CC: Peter Zijlstra <a.p.zijlstra@chello.nl>
      CC: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <20100514190815.GG13509@lenovo>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      1ff3d7d7
    • F
      x86, cacheinfo: Turn off L3 cache index disable feature in virtualized environments · 7f284d3c
      Frank Arnold committed
      When running a guest kernel on xen we get:
      
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000038
      IP: [<ffffffff8142f2fb>] cpuid4_cache_lookup_regs+0x2ca/0x3df
      PGD 0
      Oops: 0000 [#1] SMP
      last sysfs file:
      CPU 0
      Modules linked in:
      
      Pid: 0, comm: swapper Tainted: G        W  2.6.34-rc3 #1 /HVM domU
      RIP: 0010:[<ffffffff8142f2fb>]  [<ffffffff8142f2fb>] cpuid4_cache_lookup_regs+0x2ca/0x3df
      RSP: 0018:ffff880002203e08  EFLAGS: 00010046
      RAX: 0000000000000000 RBX: 0000000000000003 RCX: 0000000000000060
      RDX: 0000000000000000 RSI: 0000000000000040 RDI: 0000000000000000
      RBP: ffff880002203ed8 R08: 00000000000017c0 R09: ffff880002203e38
      R10: ffff8800023d5d40 R11: ffffffff81a01e28 R12: ffff880187e6f5c0
      R13: ffff880002203e34 R14: ffff880002203e58 R15: ffff880002203e68
      FS:  0000000000000000(0000) GS:ffff880002200000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      CR2: 0000000000000038 CR3: 0000000001a3c000 CR4: 00000000000006f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Process swapper (pid: 0, threadinfo ffffffff81a00000, task ffffffff81a44020)
      Stack:
       ffffffff810d7ecb ffff880002203e20 ffffffff81059140 ffff880002203e30
      <0> ffffffff810d7ec9 0000000002203e40 000000000050d140 ffff880002203e70
      <0> 0000000002008140 0000000000000086 ffff880040020140 ffffffff81068b8b
      Call Trace:
       <IRQ>
       [<ffffffff810d7ecb>] ? sync_supers_timer_fn+0x0/0x1c
       [<ffffffff81059140>] ? mod_timer+0x23/0x25
       [<ffffffff810d7ec9>] ? arm_supers_timer+0x34/0x36
       [<ffffffff81068b8b>] ? hrtimer_get_next_event+0xa7/0xc3
       [<ffffffff81058e85>] ? get_next_timer_interrupt+0x19a/0x20d
       [<ffffffff8142fa23>] get_cpu_leaves+0x5c/0x232
       [<ffffffff8106a7b1>] ? sched_clock_local+0x1c/0x82
       [<ffffffff8106a9a0>] ? sched_clock_tick+0x75/0x7a
       [<ffffffff8107748c>] generic_smp_call_function_single_interrupt+0xae/0xd0
       [<ffffffff8101f6ef>] smp_call_function_single_interrupt+0x18/0x27
       [<ffffffff8100a773>] call_function_single_interrupt+0x13/0x20
       <EOI>
       [<ffffffff8143c468>] ? notifier_call_chain+0x14/0x63
       [<ffffffff810295c6>] ? native_safe_halt+0xc/0xd
       [<ffffffff810114eb>] ? default_idle+0x36/0x53
       [<ffffffff81008c22>] cpu_idle+0xaa/0xe4
       [<ffffffff81423a9a>] rest_init+0x7e/0x80
       [<ffffffff81b10dd2>] start_kernel+0x40e/0x419
       [<ffffffff81b102c8>] x86_64_start_reservations+0xb3/0xb7
       [<ffffffff81b103c4>] x86_64_start_kernel+0xf8/0x107
      Code: 14 d5 40 ff ae 81 8b 14 02 31 c0 3b 15 47 1c 8b 00 7d 0e 48 8b 05 36 1c 8b
       00 48 63 d2 48 8b 04 d0 c7 85 5c ff ff ff 00 00 00 00 <8b> 70 38 48 8d 8d 5c ff
       ff ff 48 8b 78 10 ba c4 01 00 00 e8 eb
      RIP  [<ffffffff8142f2fb>] cpuid4_cache_lookup_regs+0x2ca/0x3df
       RSP <ffff880002203e08>
      CR2: 0000000000000038
      ---[ end trace a7919e7f17c0a726 ]---
      
      The L3 cache index disable feature of AMD CPUs has to be disabled if the
      kernel is running as guest on top of a hypervisor because northbridge
      devices are not available to the guest. Currently, this fixes a boot
      crash on top of Xen. In the future this will become an issue on KVM as
      well.
      
      Check if northbridge devices are present and do not enable the feature
      if there are none.
      
      [ hpa: backported to 2.6.34 ]
      Signed-off-by: Frank Arnold <frank.arnold@amd.com>
      LKML-Reference: <1271945222-5283-3-git-send-email-bp@amd64.org>
      Acked-by: Borislav Petkov <borislav.petkov@amd.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      Cc: <stable@kernel.org>
      7f284d3c
  11. 14 May, 2010 1 commit
  12. 13 May, 2010 1 commit
    • C
      x86, perf: P4 PMU -- use hash for p4_get_escr_idx() · 72001990
      Cyrill Gorcunov committed
      A linear search over all p4 MSRs would be fine if we
      did not use it in the event scheduling routine, which
      is pretty time critical. Let's use hashes. It should speed
      scheduling up significantly.

      v2: Steven proposed a gentler approach than issuing
          BUG on error, so we use WARN_ONCE now
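
To make the trade-off concrete, here is a small sketch of replacing a linear scan with a hash lookup. Everything in it is an assumption for illustration: the addresses in `msr_table` are made up, the open-addressing scheme is a generic one rather than the hash used in the patch, and a miss is reported with -1 instead of WARN_ONCE.

```c
#include <stddef.h>
#include <stdint.h>

#define HASH_SIZE 16   /* power of two, comfortably larger than the key count */

/* Made-up register addresses; the position in this table is the index we want. */
static const uint32_t msr_table[] = { 0x3a0, 0x3a4, 0x3b2, 0x3c0, 0x3ca };
#define MSR_COUNT (sizeof(msr_table) / sizeof(msr_table[0]))

static int hash_slots[HASH_SIZE];   /* index into msr_table, or -1 if empty */
static int hash_ready;

static unsigned int hash_addr(uint32_t addr)
{
    return addr & (HASH_SIZE - 1);
}

/* Fill the table once, resolving collisions by linear probing. */
static void hash_build(void)
{
    for (size_t i = 0; i < HASH_SIZE; i++)
        hash_slots[i] = -1;
    for (size_t i = 0; i < MSR_COUNT; i++) {
        unsigned int h = hash_addr(msr_table[i]);
        while (hash_slots[h] != -1)
            h = (h + 1) & (HASH_SIZE - 1);
        hash_slots[h] = (int)i;
    }
    hash_ready = 1;
}

/* Reference implementation: walk the whole table. */
static int idx_linear(uint32_t addr)
{
    for (size_t i = 0; i < MSR_COUNT; i++)
        if (msr_table[i] == addr)
            return (int)i;
    return -1;
}

/* Hashed lookup: usually one or two probes; -1 if the address is unknown. */
static int idx_hashed(uint32_t addr)
{
    if (!hash_ready)
        hash_build();
    unsigned int h = hash_addr(addr);
    while (hash_slots[h] != -1) {
        if (msr_table[hash_slots[h]] == addr)
            return hash_slots[h];
        h = (h + 1) & (HASH_SIZE - 1);
    }
    return -1;
}
```

Because the table always keeps empty slots, the probe loop is guaranteed to stop, and an unknown address is detected by an explicit comparison rather than assumed to be valid.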
      Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Lin Ming <ming.m.lin@intel.com>
      LKML-Reference: <20100512174242.GA5190@lenovo>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      72001990
  13. 12 May, 2010 1 commit
  14. 11 May, 2010 4 commits
    • J
      x86/amd-iommu: Add amd_iommu=off command line option · a5235725
      Joerg Roedel committed
      This patch adds a command line option to tell the AMD IOMMU
      driver to not initialize any IOMMU it finds.
      Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
      a5235725
    • M
      kprobes/x86: Fix removed int3 checking order · 829e9245
      Masami Hiramatsu committed
      Fix kprobe/x86 to check for a removed int3 when failing to get a kprobe
      from the hlist. Since there is a time window between checking that int3
      exists on the probed address and getting the kprobe on that address,
      we can have the following scenario:
      
       -------
       CPU1                     CPU2
       hit int3
       check int3 exists
                                remove int3
                                remove kprobe from hlist
       get kprobe from hlist
       no kprobe->OOPS!
       -------
      
      This patch moves the int3 check to the case where there is no kprobe on
      that address, fixing the problem as follows:
      
       ------
       CPU1                     CPU2
       hit int3
                                remove int3
                                remove kprobe from hlist
       get kprobe from hlist
       no kprobe->check int3 exists
                ->rollback&retry
       ------
      Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
      Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
      Cc: systemtap <systemtap@sources.redhat.com>
      Cc: DLE <dle-develop@lists.sourceforge.net>
      Cc: Dave Anderson <anderson@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <20100427223348.2322.9112.stgit@localhost6.localdomain6>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      829e9245
    • A
      x86: Introduce 'struct fpu' and related API · 86603283
      Avi Kivity committed
      Currently all fpu state access is through tsk->thread.xstate.  Since we wish
      to generalize fpu access to non-task contexts, wrap the state in a new
      'struct fpu' and convert existing access to use an fpu API.
      
      Signal frame handlers are not converted to the API since they will remain
      task context only things.
      Signed-off-by: Avi Kivity <avi@redhat.com>
      Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
      LKML-Reference: <1273135546-29690-3-git-send-email-avi@redhat.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      86603283
    • A
      x86: Eliminate TS_XSAVE · c9ad4882
      Avi Kivity committed
      The fpu code currently uses current->thread_info->status & TS_XSAVE as
      a way to distinguish between XSAVE capable processors and older processors.
      The decision is not really task specific; instead we use the task status to
      avoid a global memory reference - the value should be the same across all
      threads.
      
      Eliminate this tie-in into the task structure by using an alternative
      instruction keyed off the XSAVE cpu feature; this results in shorter and
      faster code, without introducing a global memory reference.
      
      [ hpa: in the future, this probably should use an asm jmp ]
      Signed-off-by: Avi Kivity <avi@redhat.com>
      Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
      LKML-Reference: <1273135546-29690-2-git-send-email-avi@redhat.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      c9ad4882