1. 16 May 2013, 18 commits
  2. 15 May 2013, 3 commits
  3. 14 May 2013, 19 commits
    • timer: Don't reinitialize the cpu base lock during CPU_UP_PREPARE · 42a5cf46
      Authored by Tirupathi Reddy
      An inactive timer's base can refer to an offline CPU's base.
      
      In the current code, the cpu_base's lock is blindly reinitialized each
      time a CPU is brought up. If a CPU is brought online while another
      thread is modifying an inactive timer on that CPU and holding its timer
      base lock, the lock is reinitialized under its feet. This leads to the
      following SPIN_BUG():
      
      <0> BUG: spinlock already unlocked on CPU#3, kworker/u:3/1466
      <0> lock: 0xe3ebe000, .magic: dead4ead, .owner: kworker/u:3/1466, .owner_cpu: 1
      <4> [<c0013dc4>] (unwind_backtrace+0x0/0x11c) from [<c026e794>] (do_raw_spin_unlock+0x40/0xcc)
      <4> [<c026e794>] (do_raw_spin_unlock+0x40/0xcc) from [<c076c160>] (_raw_spin_unlock+0x8/0x30)
      <4> [<c076c160>] (_raw_spin_unlock+0x8/0x30) from [<c009b858>] (mod_timer+0x294/0x310)
      <4> [<c009b858>] (mod_timer+0x294/0x310) from [<c00a5e04>] (queue_delayed_work_on+0x104/0x120)
      <4> [<c00a5e04>] (queue_delayed_work_on+0x104/0x120) from [<c04eae00>] (sdhci_msm_bus_voting+0x88/0x9c)
      <4> [<c04eae00>] (sdhci_msm_bus_voting+0x88/0x9c) from [<c04d8780>] (sdhci_disable+0x40/0x48)
      <4> [<c04d8780>] (sdhci_disable+0x40/0x48) from [<c04bf300>] (mmc_release_host+0x4c/0xb0)
      <4> [<c04bf300>] (mmc_release_host+0x4c/0xb0) from [<c04c7aac>] (mmc_sd_detect+0x90/0xfc)
      <4> [<c04c7aac>] (mmc_sd_detect+0x90/0xfc) from [<c04c2504>] (mmc_rescan+0x7c/0x2c4)
      <4> [<c04c2504>] (mmc_rescan+0x7c/0x2c4) from [<c00a6a7c>] (process_one_work+0x27c/0x484)
      <4> [<c00a6a7c>] (process_one_work+0x27c/0x484) from [<c00a6e94>] (worker_thread+0x210/0x3b0)
      <4> [<c00a6e94>] (worker_thread+0x210/0x3b0) from [<c00aad9c>] (kthread+0x80/0x8c)
      <4> [<c00aad9c>] (kthread+0x80/0x8c) from [<c000ea80>] (kernel_thread_exit+0x0/0x8)
      
      As an example, this particular crash occurred when CPU #3 was executing
      mod_timer() on an inactive timer whose base referred to the offlined CPU
      #2.  The code locked the timer_base corresponding to CPU #2. Before it
      could proceed, CPU #2 came online and reinitialized the spinlock
      corresponding to its base. Thus CPU #3 now held a lock which had been
      reinitialized. When CPU #3 finally ended up unlocking the old cpu_base
      corresponding to CPU #2, we hit the above SPIN_BUG().
      
      CPU #0		CPU #3				       CPU #2
      ------		-------				       -------
      .....		 ......				      <Offline>
      		mod_timer()
      		 lock_timer_base
      		   spin_lock_irqsave(&base->lock)
      
      cpu_up(2)	 .....				        ......
      							init_timers_cpu()
      ....		 .....				    	spin_lock_init(&base->lock)
      .....		   spin_unlock_irqrestore(&base->lock)  ......
      		   <spin_bug>
      
      Allocation of the per-cpu timer vector bases is done only once, under
      the "tvec_base_done[]" check. In the current code, the spinlock
      initialization of base->lock isn't under this check, so the base lock is
      reinitialized every time a CPU comes up. Move the base spinlock
      initialization under the check.
      Signed-off-by: Tirupathi Reddy <tirupath@codeaurora.org>
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/1368520142-4136-1-git-send-email-tirupath@codeaurora.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      42a5cf46
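      A minimal sketch of the fix described above, assuming the 2013-era timer
      internals (tvec_base_done[], per-cpu tvec_bases); the body is simplified
      and not the actual patch:

      static int init_timers_cpu(int cpu)
      {
              struct tvec_base *base;

              if (!tvec_base_done[cpu]) {
                      /* First time this CPU comes up: allocate its base... */
                      base = kzalloc_node(sizeof(*base), GFP_KERNEL,
                                          cpu_to_node(cpu));
                      if (!base)
                              return -ENOMEM;
                      /*
                       * ...and initialize the lock here, under the once-only
                       * check, so a later CPU_UP_PREPARE can never reinit a
                       * lock another CPU may hold via lock_timer_base().
                       */
                      spin_lock_init(&base->lock);
                      per_cpu(tvec_bases, cpu) = base;
                      tvec_base_done[cpu] = 1;
              } else {
                      base = per_cpu(tvec_bases, cpu);
              }

              /* (Re)initialize the timer wheel lists as before. */
              return 0;
      }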
    • rcu/idle: Wrap cpu-idle poll mode within rcu_idle_enter/exit · b47430d3
      Authored by Srivatsa S. Bhat
      Bjørn Mork reported the following warning when running powertop.
      
      [   49.289034] ------------[ cut here ]------------
      [   49.289055] WARNING: at kernel/rcutree.c:502 rcu_eqs_exit_common.isra.48+0x3d/0x125()
      [   49.289244] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.10.0-bisect-rcu-warn+ #107
      [   49.289251]  ffffffff8157d8c8 ffffffff81801e28 ffffffff8137e4e3 ffffffff81801e68
      [   49.289260]  ffffffff8103094f ffffffff81801e68 0000000000000000 ffff88023afcd9b0
      [   49.289268]  0000000000000000 0140000000000000 ffff88023bee7700 ffffffff81801e78
      [   49.289276] Call Trace:
      [   49.289285]  [<ffffffff8137e4e3>] dump_stack+0x19/0x1b
      [   49.289293]  [<ffffffff8103094f>] warn_slowpath_common+0x62/0x7b
      [   49.289300]  [<ffffffff8103097d>] warn_slowpath_null+0x15/0x17
      [   49.289306]  [<ffffffff810a9006>] rcu_eqs_exit_common.isra.48+0x3d/0x125
      [   49.289314]  [<ffffffff81079b49>] ? trace_hardirqs_off_caller+0x37/0xa6
      [   49.289320]  [<ffffffff810a9692>] rcu_idle_exit+0x85/0xa8
      [   49.289327]  [<ffffffff8107076e>] trace_cpu_idle_rcuidle+0xae/0xff
      [   49.289334]  [<ffffffff810708b1>] cpu_startup_entry+0x72/0x115
      [   49.289341]  [<ffffffff813689e5>] rest_init+0x149/0x150
      [   49.289347]  [<ffffffff8136889c>] ? csum_partial_copy_generic+0x16c/0x16c
      [   49.289355]  [<ffffffff81a82d34>] start_kernel+0x3f0/0x3fd
      [   49.289362]  [<ffffffff81a8274c>] ? repair_env_string+0x5a/0x5a
      [   49.289368]  [<ffffffff81a82481>] x86_64_start_reservations+0x2a/0x2c
      [   49.289375]  [<ffffffff81a82550>] x86_64_start_kernel+0xcd/0xd1
      [   49.289379] ---[ end trace 07a1cc95e29e9036 ]---
      
      The warning is that 'rdtp->dynticks' has an unexpected value, which
      roughly translates to: the calls to rcu_idle_enter() and rcu_idle_exit()
      were not made in the correct order, or were otherwise mismatched.
      
      And Bjørn's painstaking debugging indicated that this happens when the idle
      loop enters poll mode. Looking at the poll function cpu_idle_poll(), and at
      the implementation of trace_cpu_idle_rcuidle(), the problem becomes very clear:
      cpu_idle_poll() lacks calls to rcu_idle_enter/exit(), and trace_cpu_idle_rcuidle()
      calls them in the reverse order - first rcu_idle_exit(), and then rcu_idle_enter().
      Hence the even/odd alternating sequencing of rdtp->dynticks goes for a toss.
      
      And powertop readily triggers this because powertop uses the idle-tracing
      infrastructure extensively.
      
      So, to fix this, wrap the code in cpu_idle_poll() within rcu_idle_enter/exit(),
      so that it blends properly with the calls inside trace_cpu_idle_rcuidle() and
      thus gets the function ordering right.
      Reported-and-tested-by: Bjørn Mork <bjorn@mork.no>
      Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Dipankar Sarma <dipankar@in.ibm.com>
      Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Link: http://lkml.kernel.org/r/519169BF.4080208@linux.vnet.ibm.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      b47430d3
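      A sketch of the wrapped poll loop described above (kernel-context code,
      close to the shape of the fix, but simplified rather than the exact
      upstream function):

      static int cpu_idle_poll(void)
      {
              rcu_idle_enter();       /* enter RCU idle mode first... */
              trace_cpu_idle_rcuidle(0, smp_processor_id());
              local_irq_enable();
              while (!need_resched())
                      cpu_relax();
              trace_cpu_idle_rcuidle(PWR_EVENT_EXIT, smp_processor_id());
              rcu_idle_exit();        /* ...and leave it last, so the exit/enter
                                       * pair inside the _rcuidle tracepoints
                                       * nests correctly. */
              return 1;
      }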
    • tick: Don't invoke tick_nohz_stop_sched_tick() if the cpu is offline · f7ea0fd6
      Authored by Thomas Gleixner
      commit 5b39939a (nohz: Move ts->idle_calls incrementation into strict
      idle logic) moved code out of tick_nohz_stop_sched_tick() and failed
      to bail out when the cpu is offline. That causes subsequent
      failures, as an offline CPU is supposed to die and not to fiddle with
      nohz magic.
      
      Return false in can_stop_idle_tick() if the cpu is offline.
      Reported-and-tested-by: Jiri Kosina <jkosina@suse.cz>
      Reported-and-tested-by: Prarit Bhargava <prarit@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: x86@kernel.org
      Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1305132138160.2863@ionos
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      f7ea0fd6
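      The shape of the fix, as a hedged sketch (only the early return matters
      here; the remaining checks in can_stop_idle_tick() are elided):

      static bool can_stop_idle_tick(int cpu, struct tick_sched *ts)
      {
              if (unlikely(!cpu_online(cpu)))
                      return false;   /* offline CPUs must not touch nohz state */

              /* ...existing checks (nohz mode, need_resched(), ...) ... */
              return true;
      }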
    • Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc · a2c7a54f
      Authored by Linus Torvalds
      Pull powerpc fixes from Benjamin Herrenschmidt:
       "This is mostly bug fixes (some of them regressions, some of them I
        deemed worth merging now) along with some patches from Li Zhong
        hooking up the new context tracking stuff (for the new full NO_HZ)"
      
      * 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (25 commits)
        powerpc: Set show_unhandled_signals to 1 by default
        powerpc/perf: Fix setting of "to" addresses for BHRB
        powerpc/pmu: Fix order of interpreting BHRB target entries
        powerpc/perf: Move BHRB code into CONFIG_PPC64 region
        powerpc: select HAVE_CONTEXT_TRACKING for pSeries
        powerpc: Use the new schedule_user API on userspace preemption
        powerpc: Exit user context on notify resume
        powerpc: Exception hooks for context tracking subsystem
        powerpc: Syscall hooks for context tracking subsystem
        powerpc/booke64: Fix kernel hangs at kernel_dbg_exc
        powerpc: Fix irq_set_affinity() return values
        powerpc: Provide __bswapdi2
        powerpc/powernv: Fix starting of secondary CPUs on OPALv2 and v3
        powerpc/powernv: Detect OPAL v3 API version
        powerpc: Fix MAX_STACK_TRACE_ENTRIES too low warning again
        powerpc: Make CONFIG_RTAS_PROC depend on CONFIG_PROC_FS
        powerpc: Bring all threads online prior to migration/hibernation
        powerpc/rtas_flash: Fix validate_flash buffer overflow issue
        powerpc/kexec: Fix kexec when using VMX optimised memcpy
        powerpc: Fix build errors STRICT_MM_TYPECHECKS
        ...
      a2c7a54f
    • powerpc: Set show_unhandled_signals to 1 by default · e34166ad
      Authored by Benjamin Herrenschmidt
      Just like other architectures do.
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      e34166ad
    • powerpc/perf: Fix setting of "to" addresses for BHRB · 69123184
      Authored by Michael Neuling
      Currently we only set the "to" address in the branch stack when the CPU
      explicitly gives us a value.  Unfortunately it only does this for XL-form
      branches (e.g. blr, bctr, bctar) and not I-form and B-form branches (e.g. b, bc).
      
      Fortunately if we read the instruction from memory we can extract the offset of
      a branch and calculate the target address.
      
      This adds a function power_pmu_bhrb_to() to calculate the target/to address of
      the corresponding I-form and B-form branches.  It handles branches in both user
      and kernel space.  It also plumbs this into the perf BHRB reading code.
      Signed-off-by: Michael Neuling <mikey@neuling.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      69123184
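      A hedged sketch of how the target address can be recovered from an I-form
      or B-form branch word; the helper name is illustrative, and the real
      power_pmu_bhrb_to() also has to fetch the instruction from user or kernel
      memory first, but the field masks follow the Power ISA:

      static unsigned long branch_target_of(unsigned int instr, unsigned long addr)
      {
              long imm;

              switch (instr >> 26) {                  /* primary opcode */
              case 18:                                /* I-form: b, ba, bl, bla */
                      imm = instr & 0x03fffffc;       /* LI || 0b00 */
                      if (imm & 0x02000000)           /* sign-extend 26-bit field */
                              imm -= 0x04000000;
                      break;
              case 16:                                /* B-form: bc, bca, bcl, bcla */
                      imm = instr & 0x0000fffc;       /* BD || 0b00 */
                      if (imm & 0x00008000)           /* sign-extend 16-bit field */
                              imm -= 0x00010000;
                      break;
              default:
                      return 0;                       /* not an I/B-form branch */
              }
              /* The AA bit selects absolute vs. branch-relative addressing. */
              return (instr & 2) ? (unsigned long)imm : addr + imm;
      }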
    • powerpc/pmu: Fix order of interpreting BHRB target entries · 506e70d1
      Authored by Michael Neuling
      The current Branch History Rolling Buffer (BHRB) code misinterprets the order
      of entries in the hardware buffer.  It assumes that a branch target address
      will be read _after_ its corresponding branch.  In reality the branch target
      comes before (at a lower mfbhrb entry) its corresponding branch.
      
      This is a rewrite of the code to take this into account.
      Signed-off-by: Michael Neuling <mikey@neuling.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      506e70d1
    • powerpc/perf: Move BHRB code into CONFIG_PPC64 region · d52f2dc4
      Authored by Michael Neuling
      The new Branch History Rolling Buffer (BHRB) code is only useful on 64-bit
      processors, so move it into the #ifdef CONFIG_PPC64 region.

      This avoids code bloat on 32-bit systems.
      Signed-off-by: Michael Neuling <mikey@neuling.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      d52f2dc4
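      The shape of the change, sketched below; the stub bodies are illustrative,
      the point being that 32-bit builds see empty inlines instead of the full
      BHRB code:

      #ifdef CONFIG_PPC64
      static void power_pmu_bhrb_enable(struct perf_event *event)
      {
              /* full BHRB handling, 64-bit only */
      }
      static void power_pmu_bhrb_disable(struct perf_event *event)
      {
              /* full BHRB handling, 64-bit only */
      }
      #else /* CONFIG_PPC32 */
      static inline void power_pmu_bhrb_enable(struct perf_event *event) {}
      static inline void power_pmu_bhrb_disable(struct perf_event *event) {}
      #endif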
    • powerpc: select HAVE_CONTEXT_TRACKING for pSeries · a1797b2f
      Authored by Li Zhong
      Start context tracking support with pSeries.
      Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      a1797b2f
    • powerpc: Use the new schedule_user API on userspace preemption · 5d1c5745
      Authored by Li Zhong
      This patch corresponds to
      [PATCH] x86: Use the new schedule_user API on userspace preemption
        commit 0430499c
      Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      5d1c5745
    • powerpc: Exit user context on notify resume · 106ed886
      Authored by Li Zhong
      This patch allows RCU usage in do_notify_resume, e.g. signal handling.
      It corresponds to
      [PATCH] x86: Exit RCU extended QS on notify resume
        commit edf55fda
      Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      106ed886
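      A hedged sketch of the pattern (the powerpc do_notify_resume() signature is
      real, the body here is illustrative): announce the exit from user context
      before doing work that may use RCU.

      void do_notify_resume(struct pt_regs *regs, unsigned long thread_info_flags)
      {
              user_exit();    /* leave the RCU extended quiescent state */

              /* ...signal delivery and other TIF work, which may use RCU... */
      }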
    • powerpc: Exception hooks for context tracking subsystem · ba12eede
      Authored by Li Zhong
      These are the exception hooks for the context tracking subsystem, covering
      data access, program check, single step, instruction breakpoint, machine check,
      alignment, FP unavailable, AltiVec assist, and unknown exception, whose handlers
      might use RCU.
      
      This patch corresponds to
      [PATCH] x86: Exception hooks for userspace RCU extended QS
        commit 6ba3c97a
      
      But after the exception handling was moved to generic code, and with the
      changes in the following two commits:
      56dd9470
        context_tracking: Move exception handling to generic code
      6c1e0256
        context_tracking: Restore correct previous context state on exception exit

      the exception hooks are able to use the generic code above instead of a
      redundant arch implementation.
      Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      ba12eede
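      A hedged sketch of the hook pattern using the generic helpers mentioned
      above; the handler name is hypothetical, and saving/restoring prev_state is
      what commit 6c1e0256 made possible:

      void some_exception_handler(struct pt_regs *regs)
      {
              enum ctx_state prev_state = exception_enter();

              /* ...the original handler body, now free to use RCU... */

              exception_exit(prev_state);
      }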
    • powerpc: Syscall hooks for context tracking subsystem · 22ecbe8d
      Authored by Li Zhong
      These are the syscall slow-path hooks for the context tracking subsystem,
      corresponding to
      [PATCH] x86: Syscall hooks for userspace RCU extended QS
        commit bf5a3c13
      
      TIF_MEMDIE is moved to the second 16 bits (with value 17), as there seems to
      be no asm code using it. TIF_NOHZ is added to _TIF_SYSCALL_T_OR_A, so it is
      better for it to be in the same 16 bits as the others in that group, so that
      in the asm code an andi. against this group mask still works.
      Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
      Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      22ecbe8d
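      A hedged sketch of the slow-path hooks; the function names match the
      powerpc entry/exit helpers, but the bodies are illustrative:

      long do_syscall_trace_enter(struct pt_regs *regs)
      {
              user_exit();    /* TIF_NOHZ routed us onto the slow path: we have
                               * left user mode, tell context tracking */

              /* ...existing secure_computing/tracing/audit work... */
              return regs->gpr[0];
      }

      void do_syscall_trace_leave(struct pt_regs *regs)
      {
              /* ...existing audit/tracing work... */

              user_enter();   /* about to return to user mode */
      }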
    • powerpc/booke64: Fix kernel hangs at kernel_dbg_exc · 6cecf76b
      Authored by Scott Wood
      MSR_DE is not cleared on entry to the kernel, and we don't clear it
      explicitly outside of debug code.  If we have MSR_DE set in
      prime_debug_regs(), and the new thread has events enabled in DBCR0
      (e.g. ICMP is set in thread->dbcr0, even though it was cleared in the
      real DBCR0 when the thread got scheduled out), we'll end up taking a
      debug exception in the kernel when DBCR0 is loaded.  DSRR0 will not
      point to an exception vector, and the kernel ends up hanging at
      kernel_dbg_exc.  Fix this by always clearing MSR_DE when we load new
      debug state.
      
      Another observed source of kernel_dbg_exc hangs is the branch-taken
      event.  If this event is active and we take a non-debug trap
      (e.g. a TLB miss or an asynchronous interrupt) before the next branch,
      we end up taking a branch-taken debug exception on the initial branch
      instruction of the exception vector, but because the debug exception is
      DBSR_BT rather than DBSR_IC we branch to kernel_dbg_exc before even
      checking the DSRR0 address.  Fix this by checking for DBSR_BT as well
      as DBSR_IC, which is what 32-bit does and what the comments suggest was
      intended in the 64-bit code as well.
      Signed-off-by: Scott Wood <scottwood@freescale.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      6cecf76b
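      A hedged C-level sketch of the first fix (the real change is in the
      debug-register switch path and the function name here is illustrative):
      clear MSR_DE before the new thread's DBCR0, which may have events enabled,
      is written.

      static void load_new_debug_state(struct thread_struct *new)
      {
              mtmsr(mfmsr() & ~MSR_DE);       /* no debug exceptions while loading */
              mtspr(SPRN_DBCR0, new->dbcr0);  /* event enables of the new thread */
              /* ...remaining IAC/DAC/DBCR registers... */
      }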
    • powerpc: Provide __bswapdi2 · ca9d7aea
      Authored by David Woodhouse
      Some versions of GCC apparently expect this to be provided by libgcc.

      Updates from Mikey to fix the 32-bit version and to add "r" to the registers.
      Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
      Signed-off-by: Michael Neuling <mikey@neuling.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      ca9d7aea
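      The commit supplies an asm implementation; as a hedged reference, a plain C
      version of what __bswapdi2 has to compute looks like this:

      long long __bswapdi2(long long u)
      {
              /* swap all eight bytes of the 64-bit value */
              return (((u & 0x00000000000000ffULL) << 56) |
                      ((u & 0x000000000000ff00ULL) << 40) |
                      ((u & 0x0000000000ff0000ULL) << 24) |
                      ((u & 0x00000000ff000000ULL) <<  8) |
                      ((u & 0x000000ff00000000ULL) >>  8) |
                      ((u & 0x0000ff0000000000ULL) >> 24) |
                      ((u & 0x00ff000000000000ULL) >> 40) |
                      ((u & 0xff00000000000000ULL) >> 56));
      }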
    • powerpc/powernv: Fix starting of secondary CPUs on OPALv2 and v3 · b2b48584
      Authored by Benjamin Herrenschmidt
      The current code fails to handle kexec on OPALv2. This fixes it
      and adds code to improve the situation on OPALv3 where we can
      query the CPU status from the firmware and decide what to do
      based on that.
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      b2b48584
    • powerpc/powernv: Detect OPAL v3 API version · 75b93da4
      Authored by Benjamin Herrenschmidt
      Future firmwares will support this new version.
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      75b93da4
    • powerpc: Fix MAX_STACK_TRACE_ENTRIES too low warning again · af945cf4
      Authored by Li Zhong
      Saw this warning again, and this time from the ret_from_fork path.

      It seems we could clear the back chain earlier in copy_thread(), which
      would cover both paths, and would also fix potential lockdep usage in
      schedule_tail(), or in an exception that occurs before we clear the
      back chain.
      Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      af945cf4