1. 05 2月, 2011 1 次提交
  2. 03 2月, 2011 1 次提交
    • S
      x86, mtrr: Avoid MTRR reprogramming on BP during boot on UP platforms · f7448548
      Suresh Siddha 提交于
      Markus Kohn ran into a hard hang regression on an acer aspire
      1310, when acpi is enabled. git bisect showed the following
      commit as the bad one that introduced the boot regression.
      
      	commit d0af9eed
      	Author: Suresh Siddha <suresh.b.siddha@intel.com>
      	Date:   Wed Aug 19 18:05:36 2009 -0700
      
      	    x86, pat/mtrr: Rendezvous all the cpus for MTRR/PAT init
      
      Because of the UP configuration of that platform,
      native_smp_prepare_cpus() bailed out (in smp_sanity_check())
      before doing the set_mtrr_aps_delayed_init()
      
      Further down the boot path, native_smp_cpus_done() will call the
      delayed MTRR initialization for the AP's (mtrr_aps_init()) with
      mtrr_aps_delayed_init not set. This resulted in the boot
      processor reprogramming its MTRR's to the values seen during the
      start of the OS boot. While this is not needed ideally, this
      shouldn't have caused any side-effects. This is because the
      reprogramming of MTRR's (set_mtrr_state() that gets called via
      set_mtrr()) will check if the live register contents are
      different from what is being asked to write and will do the actual
      write only if they are different.
      
      BP's mtrr state is read during the start of the OS boot and
      typically nothing would have changed when we ask to reprogram it
      on BP again because of the above scenario on an UP platform. So
      on a normal UP platform no reprogramming of BP MTRR MSR's
      happens and all is well.
      
      However, on this platform, bios seems to be modifying the fixed
      mtrr range registers between the start of OS boot and when we
      double check the live registers for reprogramming BP MTRR
      registers. And as the live registers are modified, we end up
      reprogramming the MTRR's to the state seen during the start of
      the OS boot.
      
      During ACPI initialization, something in the bios (probably smi
      handler?) don't like this fact and results in a hard lockup.
      
      We didn't see this boot hang issue on this platform before the
      commit d0af9eed, because only
      the AP's (if any) will program its MTRR's to the value that BP
      had at the start of the OS boot.
      
      Fix this issue by checking mtrr_aps_delayed_init before
      continuing further in the mtrr_aps_init(). Now, only AP's (if
      any) will program its MTRR's to the BP values during boot.
      
      Addresses https://bugzilla.novell.com/show_bug.cgi?id=623393
      
        [ By the way, this behavior of the bios modifying MTRR's after the start
          of the OS boot is not common and the kernel is not prepared to
          handle this situation well. Irrespective of this issue, during
          suspend/resume, linux kernel will try to reprogram the BP's MTRR values
          to the values seen during the start of the OS boot. So suspend/resume might
          be already broken on this platform for all linux kernel versions. ]
      Reported-and-bisected-by: NMarkus Kohn <jabber@gmx.org>
      Tested-by: NMarkus Kohn <jabber@gmx.org>
      Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com>
      Cc: Thomas Renninger <trenn@novell.com>
      Cc: Rafael Wysocki <rjw@novell.com>
      Cc: Venkatesh Pallipadi <venki@google.com>
      Cc: stable@kernel.org # [v2.6.32+]
      LKML-Reference: <1296694975.4418.402.camel@sbsiddha-MOBL3.sc.intel.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      f7448548
  3. 28 1月, 2011 1 次提交
    • S
      perf: Fix Pentium4 raw event validation · d038b12c
      Stephane Eranian 提交于
      This patch fixes some issues with raw event validation on
      Pentium 4 (Netburst) based processors.
      
      As I was testing libpfm4 Netburst support, I ran into two
      problems in the p4_validate_raw_event() function:
      
         - the shared field must be checked ONLY when HT is on
         - the binding to ESCR register was missing
      
      The second item was causing raw events to not be encoded
      correctly compared to generic PMU events.
      
      With this patch, I can now pass Netburst events to libpfm4
      examples and get meaningful results:
      
        $ task -e global_power_events:running:u  noploop 1
        noploop for 1 seconds
        3,206,304,898 global_power_events:running
      Signed-off-by: NStephane Eranian <eranian@google.com>
      Acked-by: NCyrill Gorcunov <gorcunov@openvz.org>
      Cc: peterz@infradead.org
      Cc: paulus@samba.org
      Cc: davem@davemloft.net
      Cc: fweisbec@gmail.com
      Cc: perfmon2-devel@lists.sf.net
      Cc: eranian@gmail.com
      Cc: robert.richter@amd.com
      Cc: acme@redhat.com
      Cc: gorcunov@gmail.com
      Cc: ming.m.lin@intel.com
      LKML-Reference: <4d3efb2f.1252d80a.1a80.ffffc83f@mx.google.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      d038b12c
  4. 25 1月, 2011 1 次提交
    • J
      x86-64: Don't use pointer to out-of-scope variable in dump_trace() · 2e5aa682
      Jesper Juhl 提交于
      In arch/x86/kernel/dumpstack_64.c::dump_trace() we have this code:
      
      ...
        		if (!stack) {
        			unsigned long dummy;
        			stack = &dummy;
        			if (task && task != current)
        				stack = (unsigned long *)task->thread.sp;
        		}
      
        		bp = stack_frame(task, regs);
        		/*
        		 * Print function call entries in all stacks, starting at the
        		 * current stack address. If the stacks consist of nested
        		 * exceptions
        		 */
        		tinfo = task_thread_info(task);
      
        		for (;;) {
        			char *id;
        			unsigned long *estack_end;
        			estack_end = in_exception_stack(cpu, (unsigned long)stack,
        							&used, &id);
      ...
      
      You'll notice that we assign to 'stack' the address of the variable
      'dummy' which is only in-scope inside the 'if (!stack)'. So when we later
      access stack (at the end of the above, and assuming we did not take the
      'if (task && task != current)' branch) we'll be using the address of a
      variable that is no longer in scope. I believe this patch is the proper
      fix, but I freely admit that I'm not 100% certain.
      Signed-off-by: NJesper Juhl <jj@chaosbits.net>
      LKML-Reference: <alpine.LNX.2.00.1101242232590.10252@swampdragon.chaosbits.net>
      Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      2e5aa682
  5. 22 1月, 2011 1 次提交
    • B
      x86, hotplug: Fix powersavings with offlined cores on AMD · 93789b32
      Borislav Petkov 提交于
      ea530692 made a CPU use monitor/mwait
      when offline. This is not the optimal choice for AMD wrt to powersavings
      and we'd prefer our cores to halt (i.e. enter C1) instead. For this, the
      same selection whether to use monitor/mwait has to be used as when we
      select the idle routine for the machine.
      
      With this patch, offlining cores 1-5 on a X6 machine allows core0 to
      boost again.
      
      [ hpa: putting this in urgent since it is a (power) regression fix ]
      Reported-by: NAndreas Herrmann <andreas.herrmann3@amd.com>
      Cc: stable@kernel.org # 37.x
      Cc: H. Peter Anvin <hpa@linux.intel.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Venkatesh Pallipadi <venki@google.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.hl>
      Signed-off-by: NBorislav Petkov <borislav.petkov@amd.com>
      LKML-Reference: <1295534572-10730-1-git-send-email-bp@amd64.org>
      Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      93789b32
  6. 21 1月, 2011 1 次提交
  7. 20 1月, 2011 1 次提交
  8. 19 1月, 2011 1 次提交
  9. 18 1月, 2011 2 次提交
    • B
      x86: Clear irqstack thread_info · 7b698ea3
      Brian Gerst 提交于
      Mathias Merz reported that v2.6.37 failed to boot on his
      system.
      
      Make sure that the thread_info part of the irqstack is
      initialized to zeroes.
      Reported-and-Tested-by: NMatthias Merz <linux@merz-ka.de>
      Signed-off-by: NBrian Gerst <brgerst@gmail.com>
      Acked-by: NPekka Enberg <penberg@kernel.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      LKML-Reference: <AANLkTimyKXfJ1x8tgwrr1hYnNLrPfgE1NTe4z7L6tUDm@mail.gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      7b698ea3
    • S
      x86: Make relocatable kernel work with new binutils · 86b1e8dd
      Shaohua Li 提交于
      The CONFIG_RELOCATABLE=y option is broken with new binutils, which will make
      boot panic.
      
      According to Lu Hongjiu, the affected binutils are from 2.20.51.0.12 to
      2.21.51.0.3, which are release since Oct 22 this year. At least ubuntu 10.10 is
      using such binutils. See:
      
          http://sourceware.org/bugzilla/show_bug.cgi?id=12327
      
      The reason of the boot panic is that we have 'jiffies = jiffies_64;' in
      vmlinux.lds.S. The jiffies isn't in any section. In kernel build, there is
      warning saying jiffies is an absolute address and can't be relocatable. At
      runtime, jiffies will have virtual address 0.
      
      Signed-off-by: Shaohua Li<shaohua.li@intel.com>
      Cc: Lu Hongjiu<hongjiu.lu@intel.com>
      Cc: Huang Ying <ying.huang@intel.com>
      Cc: Sam Ravnborg <sam@ravnborg.org>
      LKML-Reference: <1295312269.1949.725.camel@sli10-conroe>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      86b1e8dd
  10. 15 1月, 2011 2 次提交
    • J
      x86, mrst: Set correct APB timer IRQ affinity for secondary cpu · 6550904d
      Jacob Pan 提交于
      Offlining the secondary CPU causes the timer irq affinity to be set to
      CPU 0. When the secondary CPU is back online again, the wrong irq
      affinity will be used.
      
      This patch ensures secondary per CPU timer always has the correct
      IRQ affinity when enabled.
      Signed-off-by: NJacob Pan <jacob.jun.pan@linux.intel.com>
      LKML-Reference: <1294963604-18111-1-git-send-email-jacob.jun.pan@linux.intel.com>
      Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      Cc: <stable@kernel.org> 2.6.37
      6550904d
    • J
      x86: tsc: Fix calibration refinement conditionals to avoid divide by zero · 62627bec
      John Stultz 提交于
      Konrad Wilk reported that the new delayed calibration crashes with a
      divide by zero on Xen. The reason is that Xen sets the pmtimer
      address, but reading from it returns 0xffffff. That results in the
      ref_start and ref_stop value being the same, so the delta is zero
      which causes the divide by zero later in the calculation.
      
      The conditional (!hpet && !ref_start && !ref_stop) which sanity checks
      the calibration reference values doesn't really make sense. If the
      refs are null, but hpet is on, we still want to break out.
      
      The div by zero would be possible to trigger by chance if both reads
      from the hardware provided the exact same value (due to hardware
      wrapping).
      
      So checking if both the ref values are the same should handle if we
      don't have hardware (both null) or if they are the same value (either by
      invalid hardware, or by chance), avoiding the div by zero issue.
      
      [ tglx: Applied the same fix to native_calibrate_tsc() where this
        	check was copied from ]
      Reported-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Tested-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: NJohn Stultz <johnstul@us.ibm.com>
      LKML-Reference: <1295024788-15619-1-git-send-email-johnstul@us.ibm.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      62627bec
  11. 14 1月, 2011 5 次提交
  12. 13 1月, 2011 2 次提交
    • T
      cpuidle/x86/perf: fix power:cpu_idle double end events and throw cpu_idle... · f77cfe4e
      Thomas Renninger 提交于
      cpuidle/x86/perf: fix power:cpu_idle double end events and throw cpu_idle events from the cpuidle layer
      
      Currently intel_idle and acpi_idle driver show double cpu_idle "exit idle"
      events -> this patch fixes it and makes cpu_idle events throwing less complex.
      
      It also introduces cpu_idle events for all architectures which use
      the cpuidle subsystem, namely:
        - arch/arm/mach-at91/cpuidle.c
        - arch/arm/mach-davinci/cpuidle.c
        - arch/arm/mach-kirkwood/cpuidle.c
        - arch/arm/mach-omap2/cpuidle34xx.c
        - arch/drivers/acpi/processor_idle.c (for all cases, not only mwait)
        - arch/x86/kernel/process.c (did throw events before, but was a mess)
        - drivers/idle/intel_idle.c (did throw events before)
      
      Convention should be:
      Fire cpu_idle events inside the current pm_idle function (not somewhere
      down the the callee tree) to keep things easy.
      
      Current possible pm_idle functions in X86:
      c1e_idle, poll_idle, cpuidle_idle_call, mwait_idle, default_idle
      -> this is really easy is now.
      
      This affects userspace:
      The type field of the cpu_idle power event can now direclty get
      mapped to:
      /sys/devices/system/cpu/cpuX/cpuidle/stateX/{name,desc,usage,time,...}
      instead of throwing very CPU/mwait specific values.
      This change is not visible for the intel_idle driver.
      For the acpi_idle driver it should only be visible if the vendor
      misses out C-states in his BIOS.
      Another (perf timechart) patch reads out cpuidle info of cpu_idle
      events from:
      /sys/.../cpuidle/stateX/*, then the cpuidle events are mapped
      to the correct C-/cpuidle state again, even if e.g. vendors miss
      out C-states in their BIOS and for example only export C1 and C3.
      -> everything is fine.
      Signed-off-by: NThomas Renninger <trenn@suse.de>
      CC: Robert Schoene <robert.schoene@tu-dresden.de>
      CC: Jean Pihet <j-pihet@ti.com>
      CC: Arjan van de Ven <arjan@linux.intel.com>
      CC: Ingo Molnar <mingo@elte.hu>
      CC: Frederic Weisbecker <fweisbec@gmail.com>
      CC: linux-pm@lists.linux-foundation.org
      CC: linux-acpi@vger.kernel.org
      CC: linux-kernel@vger.kernel.org
      CC: linux-perf-users@vger.kernel.org
      CC: linux-omap@vger.kernel.org
      Signed-off-by: NLen Brown <len.brown@intel.com>
      f77cfe4e
    • T
      ACPI, intel_idle: Cleanup idle= internal variables · d1896049
      Thomas Renninger 提交于
      Having four variables for the same thing:
        idle_halt, idle_nomwait, force_mwait and boot_option_idle_overrides
      is rather confusing and unnecessary complex.
      
      if idle= boot param is passed, only set up one variable:
      boot_option_idle_overrides
      
      Introduces following functional changes/fixes:
        - intel_idle driver does not register if any idle=xy
          boot param is passed.
        - processor_idle.c will also not register a cpuidle driver
          and get active if idle=halt is passed.
          Before a cpuidle driver with one (C1, halt) state got registered
          Now the default_idle function will be used which finally uses
          the same idle call to enter sleep state (safe_halt()), but
          without registering a whole cpuidle driver.
      
      That means idle= param will always avoid cpuidle drivers to register
      with one exception (same behavior as before):
      idle=nomwait
      may still register acpi_idle cpuidle driver, but C1 will not use
      mwait, but hlt. This can be a workaround for IO based deeper sleep
      states where C1 mwait causes problems.
      Signed-off-by: NThomas Renninger <trenn@suse.de>
      cc: x86@kernel.org
      Signed-off-by: NLen Brown <len.brown@intel.com>
      d1896049
  13. 12 1月, 2011 8 次提交
  14. 11 1月, 2011 3 次提交
  15. 10 1月, 2011 1 次提交
    • P
      x86, lapic-timer: Increase the max_delta to 31 bits · 4aed89d6
      Pierre Tardy 提交于
      Latest atom socs(penwell) does not have hpet timer.
      
      As their local APIC timer is clocked at 400KHZ, and the current
      code limit their Initial Counter register to 23 bits, they
      cannot sleep more than 1.34 seconds which leads to ~2 spurious
      wakeup per second (1 per thread)
      
      These SOCs support 32bit timer so we change the max_delta to at
      least 31bits. So we can at least sleep for 300 seconds.
      
      We could not find any previous chip errata where lapic would
      only have 23 bit precision As powertop is suggesting to activate
      HPET to "sleep longer", this could mean this problem is already
      known.
      
      Problem is here since very first implementation of lapic timer
      as a clock event e9e2cdb4 [PATCH] clockevents: i386 drivers.
      Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com>
      Signed-off-by: NPierre Tardy <pierre.tardy@intel.com>
      Acked-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Adrian Bunk <bunk@stusta.de>
      Cc: H. Peter Anvin <hpa@linux.intel.com>
      Cc: john stultz <johnstul@us.ibm.com>
      Cc: Roman Zippel <zippel@linux-m68k.org>
      Cc: Andi Kleen <ak@suse.de>
      LKML-Reference: <1294327409-19426-1-git-send-email-pierre.tardy@intel.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      4aed89d6
  16. 09 1月, 2011 2 次提交
    • C
      perf, x86: P4 PMU - Fix unflagged overflows handling · 047a3772
      Cyrill Gorcunov 提交于
      Don found that P4 PMU reads CCCR register instead of counter
      itself (in attempt to catch unflagged event) this makes P4
      NMI handler to consume all NMIs it observes. So the other
      NMI users such as kgdb simply have no chance to get NMI
      on their hands.
      
      Side note: at moment there is no way to run nmi-watchdog
      together with perf tool. This is because both 'perf top' and
      nmi-watchdog use same event. So while nmi-watchdog reserves
      one event/counter for own needs there is no room for perf tool
      left (there is a way to disable nmi-watchdog on boot of course).
      
      Ming has tested this patch with the following results
      
       | 1. watchdog disabled
       |
       | kgdb tests on boot OK
       | perf works OK
       |
       | 2. watchdog enabled, without patch perf-x86-p4-nmi-4
       |
       | kgdb tests on boot hang
       |
       | 3. watchdog enabled, without patch perf-x86-p4-nmi-4 and do not run kgdb
       | tests on boot
       |
       | "perf top" partialy works
       |   cpu-cycles            no
       |   instructions          yes
       |   cache-references      no
       |   cache-misses          no
       |   branch-instructions   no
       |   branch-misses         yes
       |   bus-cycles            no
       |
       | 4. watchdog enabled, with patch perf-x86-p4-nmi-4 applied
       |
       | kgdb tests on boot OK
       | perf does not work, NMI "Dazed and confused" messages show up
       |
      
      Which means we still have problems with p4 box due to 'unknown'
      nmi happens but at least it should fix kgdb test cases.
      Reported-by: NJason Wessel <jason.wessel@windriver.com>
      Reported-by: NDon Zickus <dzickus@redhat.com>
      Signed-off-by: NCyrill Gorcunov <gorcunov@openvz.org>
      Acked-by: NDon Zickus <dzickus@redhat.com>
      Acked-by: NLin Ming <ming.m.lin@intel.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      LKML-Reference: <4D275E7E.3040903@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      047a3772
    • R
      x86: Fix sparse non-ANSI function warnings in smpboot.c · 91d88ce2
      Randy Dunlap 提交于
      Fix sparse warning for non-ANSI function declaration:
      
        arch/x86/kernel/smpboot.c:100:30: warning: non-ANSI function declaration of function 'cpu_hotplug_driver_lock'
        arch/x86/kernel/smpboot.c:105:32: warning: non-ANSI function declaration of function 'cpu_hotplug_driver_unlock'
      Signed-off-by: NRandy Dunlap <randy.dunlap@oracle.com>
      LKML-Reference: <20110108195914.95d366ea.randy.dunlap@oracle.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      91d88ce2
  17. 08 1月, 2011 1 次提交
    • F
      x86: Save rbp in pt_regs on irq entry · 625dbc3b
      Frederic Weisbecker 提交于
      From the x86_64 low level interrupt handlers, the frame pointer is
      saved right after the partial pt_regs frame.
      
      rbp is not supposed to be part of the irq partial saved registers,
      but it only requires to extend the pt_regs frame by 8 bytes to
      do so, plus a tiny stack offset fixup on irq exit.
      
      This changes a bit the semantics or get_irq_entry() that is supposed
      to provide only the value of caller saved registers and the cpu
      saved frame. However it's a win for unwinders that can walk through
      stack frames on top of get_irq_regs() snapshots.
      
      A noticeable impact is that it makes perf events cpu-clock and
      task-clock events based callchains working on x86_64.
      
      Let's then save rbp into the irq pt_regs.
      
      As a result with:
      
      	perf record -e cpu-clock perf bench sched messaging
      	perf report --stdio
      
      Before:
          20.94%             perf  [kernel.kallsyms]        [k] lock_acquire
                             |
                             --- lock_acquire
                                |
                                |--44.01%-- __write_nocancel
                                |
                                |--43.18%-- __read
                                |
                                |--6.08%-- fork
                                |          create_worker
                                |
                                |--0.88%-- _dl_fixup
                                |
                                |--0.65%-- do_lookup_x
                                |
                                |--0.53%-- __GI___libc_read
                                 --4.67%-- [...]
      
      After:
          19.23%         perf  [kernel.kallsyms]    [k] __lock_acquire
                         |
                         --- __lock_acquire
                            |
                            |--97.74%-- lock_acquire
                            |          |
                            |          |--21.82%-- _raw_spin_lock
                            |          |          |
                            |          |          |--37.26%-- unix_stream_recvmsg
                            |          |          |          sock_aio_read
                            |          |          |          do_sync_read
                            |          |          |          vfs_read
                            |          |          |          sys_read
                            |          |          |          system_call
                            |          |          |          __read
                            |          |          |
                            |          |          |--24.09%-- unix_stream_sendmsg
                            |          |          |          sock_aio_write
                            |          |          |          do_sync_write
                            |          |          |          vfs_write
                            |          |          |          sys_write
                            |          |          |          system_call
                            |          |          |          __write_nocancel
      
      v2: Fix cfi annotations.
      Reported-by: NSoeren Sandmann Pedersen <sandmann@redhat.com>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: H. Peter Anvin <hpa@zytor.com
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Jan Beulich <JBeulich@novell.com>
      625dbc3b
  18. 07 1月, 2011 6 次提交