提交 · 434d42cfd05a7cc452457a81d2029540cba12150 · openanolis / cloud-kernel

24 5月, 2011 1 次提交

kernel/watchdog.c: Use proper ANSI C prototypes · 5f2e8e2b

由 Linus Torvalds 提交于 5月 23, 2011

We try to enforce it by using -Wstrict-prototypes, but apparently they
sometimes get through. Introduced by 4eec42f3 ("watchdog: Change
the default timeout and configure nmi watchdog period based").
Reported-by: NStephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

5f2e8e2b

23 5月, 2011 7 次提交

hrtimers: Reorder clock bases · 68fa61c0

由 Thomas Gleixner 提交于 5月 20, 2011

The ordering of the clock bases is historical due to the
CLOCK_REALTIME and CLOCK_MONOTONIC constants. Now the hrtimer bases
have their own enumeration due to the gap between CLOCK_MONOTONIC and
CLOCK_BOOTTIME. So we can be more clever as most timers end up on the
CLOCK_MONOTONIC base due to the virtue of POSIX declaring that
relative CLOCK_REALTIME timers are not affected by time changes. In
desktop environments this is slowly changing as applications switch to
absolute timers, but I've observed empty CLOCK_REALTIME bases often
enough. There is no performance penalty or overhead when
CLOCK_REALTIME timers are active, but in case they are not we don't
skip over a full cache line.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Reviewed-by: NPeter Zijlstra <peterz@infradead.org>

68fa61c0

hrtimers: Avoid touching inactive timer bases · ab8177bc

由 Thomas Gleixner 提交于 5月 20, 2011

Instead of iterating over all possible timer bases avoid it by marking
the active bases in the cpu base.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Reviewed-by: NPeter Zijlstra <peterz@infradead.org>

ab8177bc

timerfd: Manage cancelable timers in timerfd · 9ec26907

由 Thomas Gleixner 提交于 5月 20, 2011

Peter is concerned about the extra scan of CLOCK_REALTIME_COS in the
timer interrupt. Yes, I did not think about it, because the solution
was so elegant. I didn't like the extra list in timerfd when it was
proposed some time ago, but with a rcu based list the list walk it's
less horrible than the original global lock, which was held over the
list iteration.
Requested-by: NPeter Zijlstra <peterz@infradead.org>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Reviewed-by: NPeter Zijlstra <peterz@infradead.org>

9ec26907

watchdog: Change the default timeout and configure nmi watchdog period based on watchdog_thresh · 4eec42f3

由 Mandeep Singh Baines 提交于 5月 22, 2011

Before the conversion of the NMI watchdog to perf event, the
watchdog timeout was 5 seconds. Now it is 60 seconds. For my
particular application, netbooks, 5 seconds was a better
timeout. With a short timeout, we catch faults earlier and are
able to send back a panic. With a 60 second timeout, the user is
unlikely to wait and will instead hit the power button, causing
us to lose the panic info.

This change configures the NMI period to watchdog_thresh and
sets the softlockup_thresh to watchdog_thresh * 2. In addition,
watchdog_thresh was reduced to 10 seconds as suggested by Ingo
Molnar.
Signed-off-by: NMandeep Singh Baines <msb@chromium.org>
Cc: Marcin Slusarz <marcin.slusarz@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/1306127423-3347-4-git-send-email-msb@chromium.orgSigned-off-by: NIngo Molnar <mingo@elte.hu>
LKML-Reference: <20110517071642.GF22305@elte.hu>

4eec42f3

watchdog: Disable watchdog when thresh is zero · 586692a5

由 Mandeep Singh Baines 提交于 5月 22, 2011

This restores the previous behavior of softlock_thresh.

Currently, setting watchdog_thresh to zero causes the watchdog
kthreads to consume a lot of CPU.

In addition, the logic of proc_dowatchdog_thresh and
proc_dowatchdog_enabled has been factored into proc_dowatchdog.
Signed-off-by: NMandeep Singh Baines <msb@chromium.org>
Cc: Marcin Slusarz <marcin.slusarz@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/1306127423-3347-3-git-send-email-msb@chromium.orgSigned-off-by: NIngo Molnar <mingo@elte.hu>
LKML-Reference: <20110517071018.GE22305@elte.hu>

586692a5

watchdog: Only disable/enable watchdog if neccessary · e04ab2bc

由 Mandeep Singh Baines 提交于 5月 22, 2011

Don't take any action on an unsuccessful write to /proc.
Signed-off-by: NMandeep Singh Baines <msb@chromium.org>
Cc: Marcin Slusarz <marcin.slusarz@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/1306127423-3347-2-git-send-email-msb@chromium.orgSigned-off-by: NIngo Molnar <mingo@elte.hu>

e04ab2bc

watchdog: Fix rounding bug in get_sample_period() · 824c6b7f

由 Mandeep Singh Baines 提交于 5月 22, 2011

In get_sample_period(), softlockup_thresh is integer divided by
5 before the multiplication by NSEC_PER_SEC. This results in
softlockup_thresh being rounded down to the nearest integer
multiple of 5.

For example, a softlockup_thresh of 4 rounds down to 0.
Signed-off-by: NMandeep Singh Baines <msb@chromium.org>
Cc: Marcin Slusarz <marcin.slusarz@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/1306127423-3347-1-git-send-email-msb@chromium.orgSigned-off-by: NIngo Molnar <mingo@elte.hu>

824c6b7f

21 5月, 2011 3 次提交

ipc: Add missing sys_ni entries for ipc/compat.c functions · be84bfcc

由 Kevin Cernekee 提交于 5月 17, 2011

When building with:

  CONFIG_64BIT=y
  CONFIG_MIPS32_COMPAT=y
  CONFIG_COMPAT=y
  CONFIG_MIPS32_O32=y
  CONFIG_MIPS32_N32=y
  CONFIG_SYSVIPC is not set
  (and implicitly: CONFIG_SYSVIPC_COMPAT is not set)

the final link fails with unresolved symbols for:

  compat_sys_semctl, compat_sys_msgsnd, compat_sys_msgrcv,
  compat_sys_shmctl, compat_sys_msgctl, compat_sys_semtimedop

The fix is to add cond_syscall declarations for all syscalls in
ipc/compat.c
Signed-off-by: NKevin Cernekee <cernekee@gmail.com>
Acked-by: NRalf Baechle <ralf@linux-mips.org>
Acked-by: NArnd Bergmann <arnd@arndb.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

be84bfcc

SCHED_TTWU_QUEUE is not longer needed since sparc32 now implements IPI · 17d9f311

由 Daniel Hellstrom 提交于 5月 20, 2011

Signed-off-by: NDaniel Hellstrom <daniel@gaisler.com>
Reported-by: NPeter Zijlstra <peterz@infradead.org>
Acked-by: NPeter Zijlstra <peterz@infradead.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

17d9f311

sanitize <linux/prefetch.h> usage · 268bb0ce

由 Linus Torvalds 提交于 5月 20, 2011

Commit e66eed65 ("list: remove prefetching from regular list
iterators") removed the include of prefetch.h from list.h, which
uncovered several cases that had apparently relied on that rather
obscure header file dependency.

So this fixes things up a bit, using

   grep -L linux/prefetch.h $(git grep -l '[^a-z_]prefetchw*(' -- '*.[ch]')
   grep -L 'prefetchw*(' $(git grep -l 'linux/prefetch.h' -- '*.[ch]')

to guide us in finding files that either need <linux/prefetch.h>
inclusion, or have it despite not needing it.

There are more of them around (mostly network drivers), but this gets
many core ones.
Reported-by: NStephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

268bb0ce

20 5月, 2011 7 次提交

sched: Increase SCHED_LOAD_SCALE resolution · c8b28116

由 Nikhil Rao 提交于 5月 18, 2011

Introduce SCHED_LOAD_RESOLUTION, which scales is added to
SCHED_LOAD_SHIFT and increases the resolution of
SCHED_LOAD_SCALE. This patch sets the value of
SCHED_LOAD_RESOLUTION to 10, scaling up the weights for all
sched entities by a factor of 1024. With this extra resolution,
we can handle deeper cgroup hiearchies and the scheduler can do
better shares distribution and load load balancing on larger
systems (especially for low weight task groups).

This does not change the existing user interface, the scaled
weights are only used internally. We do not modify
prio_to_weight values or inverses, but use the original weights
when calculating the inverse which is used to scale execution
time delta in calc_delta_mine(). This ensures we do not lose
accuracy when accounting time to the sched entities. Thanks to
Nikunj Dadhania for fixing an bug in c_d_m() that broken fairness.

Below is some analysis of the performance costs/improvements of
this patch.

1. Micro-arch performance costs:

Experiment was to run Ingo's pipe_test_100k 200 times with the
task pinned to one cpu. I measured instruction, cycles and
stalled-cycles for the runs. See:

   http://thread.gmane.org/gmane.linux.kernel/1129232/focus=1129389

for more info.

-tip (baseline):

 Performance counter stats for '/root/load-scale/pipe-test-100k' (200 runs):

       964,991,769 instructions             #    0.82  insns per cycle
                                            #    0.33  stalled cycles per insn
                                            #    ( +-  0.05% )
     1,171,186,635 cycles                   #    0.000 GHz                      ( +-  0.08% )
       306,373,664 stalled-cycles-backend   #   26.16% backend  cycles idle     ( +-  0.28% )
       314,933,621 stalled-cycles-frontend  #   26.89% frontend cycles idle     ( +-  0.34% )

        1.122405684  seconds time elapsed  ( +-  0.05% )

-tip+patches:

 Performance counter stats for './load-scale/pipe-test-100k' (200 runs):

       963,624,821 instructions             #    0.82  insns per cycle
                                            #    0.33  stalled cycles per insn
                                            #    ( +-  0.04% )
     1,175,215,649 cycles                   #    0.000 GHz                      ( +-  0.08% )
       315,321,126 stalled-cycles-backend   #   26.83% backend  cycles idle     ( +-  0.28% )
       316,835,873 stalled-cycles-frontend  #   26.96% frontend cycles idle     ( +-  0.29% )

        1.122238659  seconds time elapsed  ( +-  0.06% )

With this patch, instructions decrease by ~0.10% and cycles
increase by 0.27%. This doesn't look statistically significant.
The number of stalled cycles in the backend increased from
26.16% to 26.83%. This can be attributed to the shifts we do in
c_d_m() and other places. The fraction of stalled cycles in the
frontend remains about the same, at 26.96% compared to 26.89% in -tip.

2. Balancing low-weight task groups

Test setup: run 50 tasks with random sleep/busy times (biased
around 100ms) in a low weight container (with cpu.shares = 2).
Measure %idle as reported by mpstat over a 10s window.

-tip (baseline):

06:47:48 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle    intr/s
06:47:49 PM  all   94.32    0.00    0.06    0.00    0.00    0.00    0.00    0.00    5.62  15888.00
06:47:50 PM  all   94.57    0.00    0.62    0.00    0.00    0.00    0.00    0.00    4.81  16180.00
06:47:51 PM  all   94.69    0.00    0.06    0.00    0.00    0.00    0.00    0.00    5.25  15966.00
06:47:52 PM  all   95.81    0.00    0.00    0.00    0.00    0.00    0.00    0.00    4.19  16053.00
06:47:53 PM  all   94.88    0.06    0.00    0.00    0.00    0.00    0.00    0.00    5.06  15984.00
06:47:54 PM  all   93.31    0.00    0.00    0.00    0.00    0.00    0.00    0.00    6.69  15806.00
06:47:55 PM  all   94.19    0.00    0.06    0.00    0.00    0.00    0.00    0.00    5.75  15896.00
06:47:56 PM  all   92.87    0.00    0.00    0.00    0.00    0.00    0.00    0.00    7.13  15716.00
06:47:57 PM  all   94.88    0.00    0.00    0.00    0.00    0.00    0.00    0.00    5.12  15982.00
06:47:58 PM  all   95.44    0.00    0.00    0.00    0.00    0.00    0.00    0.00    4.56  16075.00
Average:     all   94.49    0.01    0.08    0.00    0.00    0.00    0.00    0.00    5.42  15954.60

-tip+patches:

06:47:03 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle    intr/s
06:47:04 PM  all  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  16630.00
06:47:05 PM  all   99.69    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.31  16580.20
06:47:06 PM  all   99.69    0.00    0.06    0.00    0.00    0.00    0.00    0.00    0.25  16596.00
06:47:07 PM  all   99.20    0.00    0.74    0.00    0.00    0.06    0.00    0.00    0.00  17838.61
06:47:08 PM  all  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  16540.00
06:47:09 PM  all  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  16575.00
06:47:10 PM  all  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  16614.00
06:47:11 PM  all   99.94    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.06  16588.00
06:47:12 PM  all   99.94    0.00    0.06    0.00    0.00    0.00    0.00    0.00    0.00  16593.00
06:47:13 PM  all   99.94    0.00    0.06    0.00    0.00    0.00    0.00    0.00    0.00  16551.00
Average:     all   99.84    0.00    0.09    0.00    0.00    0.01    0.00    0.00    0.06  16711.58

We see an improvement in idle% on the system (drops from 5.42% on -tip to 0.06%
with the patches).

We see an improvement in idle% on the system (drops from 5.42%
on -tip to 0.06% with the patches).
Signed-off-by: NNikhil Rao <ncrao@google.com>
Acked-by: NPeter Zijlstra <peterz@infradead.org>
Cc: Nikunj A. Dadhania <nikunj@linux.vnet.ibm.com>
Cc: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Cc: Stephan Barwolf <stephan.baerwolf@tu-ilmenau.de>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/1305754668-18792-1-git-send-email-ncrao@google.comSigned-off-by: NIngo Molnar <mingo@elte.hu>

c8b28116

sched: Introduce SCHED_POWER_SCALE to scale cpu_power calculations · 1399fa78

由 Nikhil Rao 提交于 5月 18, 2011

SCHED_LOAD_SCALE is used to increase nice resolution and to
scale cpu_power calculations in the scheduler. This patch
introduces SCHED_POWER_SCALE and converts all uses of
SCHED_LOAD_SCALE for scaling cpu_power to use SCHED_POWER_SCALE
instead.

This is a preparatory patch for increasing the resolution of
SCHED_LOAD_SCALE, and there is no need to increase resolution
for cpu_power calculations.
Signed-off-by: NNikhil Rao <ncrao@google.com>
Acked-by: NPeter Zijlstra <peterz@infradead.org>
Cc: Nikunj A. Dadhania <nikunj@linux.vnet.ibm.com>
Cc: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Cc: Stephan Barwolf <stephan.baerwolf@tu-ilmenau.de>
Cc: Mike Galbraith <efault@gmx.de>
Link: http://lkml.kernel.org/r/1305738580-9924-3-git-send-email-ncrao@google.comSigned-off-by: NIngo Molnar <mingo@elte.hu>

1399fa78

sched: Cleanup set_load_weight() · f05998d4

由 Nikhil Rao 提交于 5月 18, 2011

Avoid using long repetitious names; make this simpler and nicer
to read. No functional change introduced in this patch.
Signed-off-by: NNikhil Rao <ncrao@google.com>
Acked-by: NPeter Zijlstra <peterz@infradead.org>
Cc: Nikunj A. Dadhania <nikunj@linux.vnet.ibm.com>
Cc: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Cc: Stephan Barwolf <stephan.baerwolf@tu-ilmenau.de>
Cc: Mike Galbraith <efault@gmx.de>
Link: http://lkml.kernel.org/r/1305738580-9924-2-git-send-email-ncrao@google.comSigned-off-by: NIngo Molnar <mingo@elte.hu>

f05998d4

T
clockevents/source: Use u64 to make 32bit happy · c0e299b1
由 Thomas Gleixner 提交于 5月 20, 2011
```
unsigned long is not 64bit on 32bit machine.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
```
c0e299b1

extable, core_kernel_data(): Make sure all archs define _sdata · a2d063ac

由 Steven Rostedt 提交于 5月 19, 2011

A new utility function (core_kernel_data()) is used to determine if a
passed in address is part of core kernel data or not. It may or may not
return true for RO data, but this utility must work for RW data.

Thus both _sdata and _edata must be defined and continuous,
without .init sections that may later be freed and replaced by
volatile memory (memory that can be freed).

This utility function is used to determine if data is safe from
ever being freed. Thus it should return true for all RW global
data that is not in a module or has been allocated, or false
otherwise.

Also change core_kernel_data() back to the more precise _sdata condition
and document the function.
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
Acked-by: NRalf Baechle <ralf@linux-mips.org>
Acked-by: NHirokazu Takata <takata@linux-m32r.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: linux-m68k@lists.linux-m68k.org
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Helge Deller <deller@gmx.de>
Cc: JamesE.J.Bottomley <jejb@parisc-linux.org>
Link: http://lkml.kernel.org/r/1305855298.1465.19.camel@gandalf.stny.rr.comSigned-off-by: NIngo Molnar <mingo@elte.hu>
----
 arch/alpha/kernel/vmlinux.lds.S   |    1 +
 arch/m32r/kernel/vmlinux.lds.S    |    1 +
 arch/m68k/kernel/vmlinux-std.lds  |    2 ++
 arch/m68k/kernel/vmlinux-sun3.lds |    1 +
 arch/mips/kernel/vmlinux.lds.S    |    1 +
 arch/parisc/kernel/vmlinux.lds.S  |    3 +++
 kernel/extable.c                  |   12 +++++++++++-
 7 files changed, 20 insertions(+), 1 deletion(-)

a2d063ac

core_kernel_data(): Fix architectures that do not define _sdata · c5fc4721

由 Ingo Molnar 提交于 5月 20, 2011

Some architectures such as Alpha do not define _sdata but _data:

  kernel/built-in.o: In function `core_kernel_data':
  kernel/extable.c:77: undefined reference to `_sdata'

So expand the scope of the data range to the text addresses too,
this might be more correct anyway because this way we can
cover readonly variables as well.

Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/n/tip-i878c8a0e0g0ep4v7i6vxnhz@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@elte.hu>

c5fc4721

Revert "rcu: Decrease memory-barrier usage based on semi-formal proof" · 80d02085

由 Paul E. McKenney 提交于 5月 12, 2011

This reverts commit e59fb312.

This reversion was due to (extreme) boot-time slowdowns on SPARC seen by
Yinghai Lu and on x86 by Ingo
.
This is a non-trivial reversion due to intervening commits.

Conflicts:

	Documentation/RCU/trace.txt
	kernel/rcutree.c
Signed-off-by: NIngo Molnar <mingo@elte.hu>

80d02085

19 5月, 2011 22 次提交

clockevents: Provide interface to reconfigure an active clock event device · 80b816b7

由 Thomas Gleixner 提交于 5月 18, 2011

Some ARM SoCs have clock event devices which have their frequency
modified due to frequency scaling. Provide an interface which allows
to reconfigure an active device. After reconfiguration reprogram the
current pending event.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Cc: LAK <linux-arm-kernel@lists.infradead.org>
Cc: John Stultz <john.stultz@linaro.org>
Acked-by: NLinus Walleij <linus.walleij@linaro.org>
Reviewed-by: NIngo Molnar <mingo@elte.hu>
Link: http://lkml.kernel.org/r/%3C20110518210136.437459958%40linutronix.de%3E

80b816b7

clockevents: Provide combined configure and register function · 57f0fcbe

由 Thomas Gleixner 提交于 5月 18, 2011

All clockevent devices have the same open coded initialization
functions. Provide an interface which does all necessary
initialization in the core code.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <john.stultz@linaro.org>
Reviewed-by: NIngo Molnar <mingo@elte.hu>
Link: http://lkml.kernel.org/r/%3C20110518210136.331975870%40linutronix.de%3E

57f0fcbe

clocksource: Get rid of the hardcoded 5 seconds sleep time limit · 724ed53e

由 Thomas Gleixner 提交于 5月 18, 2011

Slow clocksources can have a way longer sleep time than 5 seconds and
even fast ones can easily cope with 600 seconds and still maintain
proper accuracy.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <john.stultz@linaro.org>
Reviewed-by: NIngo Molnar <mingo@elte.hu>
Link: http://lkml.kernel.org/r/%3C20110518210136.109811585%40linutronix.de%3E

724ed53e

J
params.c: Use new strtobool function to process boolean inputs · f721a465
由 Jonathan Cameron 提交于 4月 19, 2011
```
Signed-off-by: NJonathan Cameron <jic23@cam.ac.uk>
Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
```
f721a465

module: Use binary search in lookup_symbol() · 9d63487f

由 Alessio Igor Bogani 提交于 5月 18, 2011

The function is_exported() with its helper function lookup_symbol() are used to
verify if a provided symbol is effectively exported by the kernel or by the
modules. Now that both have their symbols sorted we can replace a linear search
with a binary search which provide a considerably speed-up.

This work was supported by a hardware donation from the CE Linux Forum.
Signed-off-by: NAlessio Igor Bogani <abogani@kernel.org>
Acked-by: NGreg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>

9d63487f

module: Use the binary search for symbols resolution · 403ed278

由 Alessio Igor Bogani 提交于 4月 20, 2011

Takes advantage of the order and locates symbols using binary search.

This work was supported by a hardware donation from the CE Linux Forum.
Signed-off-by: NAlessio Igor Bogani <abogani@kernel.org>
Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
Tested-by: NDirk Behme <dirk.behme@googlemail.com>

403ed278

module: each_symbol_section instead of each_symbol · de4d8d53

由 Rusty Russell 提交于 4月 19, 2011

Instead of having a callback function for each symbol in the kernel,
have a callback for each array of symbols.

This eases the logic when we move to sorted symbols and binary search.
Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
Signed-off-by: NAlessio Igor Bogani <abogani@kernel.org>

de4d8d53

module: split unset_section_ro_nx function. · 01526ed0

由 Jan Glauber 提交于 5月 19, 2011

Split the unprotect function into a function per section to make
the code more readable and add the missing static declaration.
Signed-off-by: NJan Glauber <jang@linux.vnet.ibm.com>
Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>

01526ed0

module: undo module RONX protection correctly. · 448694a1

由 Jan Glauber 提交于 5月 19, 2011

While debugging I stumbled over two problems in the code that protects module
pages.

First issue is that disabling the protection before freeing init or unload of
a module is not symmetric with the enablement. For instance, if pages are set
to RO the page range from module_core to module_core + core_ro_size is
protected. If a module is unloaded the page range from module_core to
module_core + core_size is set back to RW.
So pages that were not set to RO are also changed to RW.
This is not critical but IMHO it should be symmetric.

Second issue is that while set_memory_rw & set_memory_ro are used for
RO/RW changes only set_memory_nx is involved for NX/X. One would await that
the inverse function is called when the NX protection should be removed,
which is not the case here, unless I'm missing something.
Signed-off-by: NJan Glauber <jang@linux.vnet.ibm.com>
Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>

448694a1

module: zero mod->init_ro_size after init is freed. · 4d10380e

由 Jan Glauber 提交于 5月 19, 2011

Reset mod->init_ro_size to zero after the init part of a module is unloaded.
Otherwise we need to check if module->init is NULL in the unprotect functions
in the next patch.
Signed-off-by: NJan Glauber <jang@linux.vnet.ibm.com>
Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>

4d10380e

minor ANSI prototype sparse fix · 5d05c708

由 Daniel J Blueman 提交于 3月 08, 2011

Fix function prototype to be ANSI-C compliant, consistent with other
function prototypes, addressing a sparse warning.
Signed-off-by: NDaniel J Blueman <daniel.blueman@gmail.com>
Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>

5d05c708

module: deal with alignment issues in built-in module versions · b4bc8428

由 Dmitry Torokhov 提交于 2月 07, 2011

On m68k natural alignment is 2-byte boundary but we are trying to
align structures in __modver section on sizeof(void *) boundary.
This causes trouble when we try to access elements in this section
in array-like fashion when create "version" attributes for built-in
modules.

Moreover, as DaveM said, we can't reliably put structures into
independent objects, put them into a special section, and then expect
array access over them (via the section boundaries) after linking the
objects together to just "work" due to variable alignment choices in
different situations. The only solution that seems to work reliably
is to make an array of plain pointers to the objects in question and
put those pointers in the special section.
Reported-by: NGeert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: NDmitry Torokhov <dtor@vmware.com>
Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>

b4bc8428

ftrace: Add self-tests for multiple function trace users · 95950c2e

由 Steven Rostedt 提交于 5月 06, 2011

Add some basic sanity tests for multiple users of the function
tracer at startup.
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

95950c2e

ftrace: Modify ftrace_set_filter/notrace to take ops · 936e074b

由 Steven Rostedt 提交于 5月 05, 2011

Since users of the function tracer can now pick and choose which
functions they want to trace agnostically from other users of the
function tracer, we need to pass the ops struct to the ftrace_set_filter()
functions.

The functions ftrace_set_global_filter() and ftrace_set_global_notrace()
is added to keep the old filter functions which are used to modify
the generic function tracers.
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

936e074b

ftrace: Allow dynamically allocated function tracers · cdbe61bf

由 Steven Rostedt 提交于 5月 05, 2011

Now that functions may be selected individually, it only makes sense
that we should allow dynamically allocated trace structures to
be traced. This will allow perf to allocate a ftrace_ops structure
at runtime and use it to pick and choose which functions that
structure will trace.

Note, a dynamically allocated ftrace_ops will always be called
indirectly instead of being called directly from the mcount in
entry.S. This is because there's no safe way to prevent mcount
from being preempted before calling the function, unless we
modify every entry.S to do so (not likely). Thus, dynamically allocated
functions will now be called by the ftrace_ops_list_func() that
loops through the ops that are allocated if there are more than
one op allocated at a time. This loop is protected with a
preempt_disable.

To determine if an ftrace_ops structure is allocated or not, a new
util function was added to the kernel/extable.c called
core_kernel_data(), which returns 1 if the address is between
_sdata and _edata.

Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

cdbe61bf

ftrace: Implement separate user function filtering · b848914c

由 Steven Rostedt 提交于 5月 04, 2011

ftrace_ops that are registered to trace functions can now be
agnostic to each other in respect to what functions they trace.
Each ops has their own hash of the functions they want to trace
and a hash to what they do not want to trace. A empty hash for
the functions they want to trace denotes all functions should
be traced that are not in the notrace hash.

Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

b848914c

ftrace: Free hash with call_rcu_sched() · 07fd5515

由 Steven Rostedt 提交于 5月 05, 2011

When a hash is modified and might be in use, we need to perform
a schedule RCU operation on it, as the hashes will soon be used
directly in the function tracer callback.

Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

07fd5515

ftrace: Have global_ops store the functions that are to be traced · 2b499381

由 Steven Rostedt 提交于 5月 03, 2011

This is a step towards each ops structure defining its own set
of functions to trace. As the current code with pid's and such
are specific to the global_ops, it is restructured to be used
with the global ops.
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

2b499381

ftrace: Add ops parameter to ftrace_startup/shutdown functions · bd69c30b

由 Steven Rostedt 提交于 5月 03, 2011

In order to allow different ops to enable different functions,
the ftrace_startup() and ftrace_shutdown() functions need the
ops parameter passed to them.
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

bd69c30b

ftrace: Add enabled_functions file · 647bcd03

由 Steven Rostedt 提交于 5月 03, 2011

Add the enabled_functions file that is used to show all the
functions that have been enabled for tracing as well as their
ref counts. This helps seeing if any function has been registered
and what functions are being traced.
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

647bcd03

ftrace: Use counters to enable functions to trace · ed926f9b

由 Steven Rostedt 提交于 5月 03, 2011

Every function has its own record that stores the instruction
pointer and flags for the function to be traced. There are only
two flags: enabled and free. The enabled flag states that tracing
for the function has been enabled (actively traced), and the free
flag states that the record no longer points to a function and can
be used by new functions (loaded modules).

These flags are now moved to the MSB of the flags (actually just
the top 32bits). The rest of the bits (30 bits) are now used as
a ref counter. Everytime a tracer register functions to trace,
those functions will have its counter incremented.

When tracing is enabled, to determine if a function should be traced,
the counter is examined, and if it is non-zero it is set to trace.

When a ftrace_ops is registered to trace functions, its hashes
are examined. If the ftrace_ops filter_hash count is zero, then
all functions are set to be traced, otherwise only the functions
in the hash are to be traced. The exception to this is if a function
is also in the ftrace_ops notrace_hash. Then that function's counter
is not incremented for this ftrace_ops.
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

ed926f9b

ftrace: Separate hash allocation and assignment · 33dc9b12

由 Steven Rostedt 提交于 5月 02, 2011

When filtering, allocate a hash to insert the function records.
After the filtering is complete, assign it to the ftrace_ops structure.

This allows the ftrace_ops structure to have a much smaller array of
hash buckets instead of wasting a lot of memory.

A read only empty_hash is created to be the minimum size that any ftrace_ops
can point to.

When a new hash is created, it has the following steps:

o Allocate a default hash.
o Walk the function records assigning the filtered records to the hash
o Allocate a new hash with the appropriate size buckets
o Move the entries from the default hash to the new hash.
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

33dc9b12

openanolis / cloud-kernel 接近 2 年 前同步成功

openanolis / cloud-kernel
接近 2 年前同步成功