提交 · ab5348573908dedb62dd906aa60c5989fecfe38f · openeuler / raspberrypi-kernel

28 6月, 2013 1 次提交

lib: Move fonts from drivers/video/console/ to lib/fonts/ · ee89bd6b

由 Geert Uytterhoeven 提交于 6月 09, 2013

Several drivers need font support independent of CONFIG_VT, cfr. commit
9cbce8d7e1dae0744ca4f68d62aa7de18196b6f4, "console/font: Refactor font
support code selection logic").
Hence move the fonts and their support logic from drivers/video/console/ to
its own library directory lib/fonts/.
This also allows to limit processing of drivers/video/console/Makefile to
CONFIG_VT=y again.

[Kevin Hilman <khilman@linaro.org>: Update arch/arm/boot/compressed/Makefile]
Signed-off-by: NGeert Uytterhoeven <geert@linux-m68k.org>

ee89bd6b

26 6月, 2013 6 次提交

locking-selftests: Handle unexpected failures more strictly · 166989e3

由 Maarten Lankhorst 提交于 6月 20, 2013

When CONFIG_PROVE_LOCKING is not enabled, more tests are
expected to pass unexpectedly, but there no tests that should
start to fail that pass with CONFIG_PROVE_LOCKING enabled.
Signed-off-by: NMaarten Lankhorst <maarten.lankhorst@canonical.com>
Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: dri-devel@lists.freedesktop.org
Cc: linaro-mm-sig@lists.linaro.org
Cc: rostedt@goodmis.org
Cc: daniel@ffwll.ch
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20130620113151.4001.77963.stgit@patserSigned-off-by: NIngo Molnar <mingo@kernel.org>

166989e3

mutex: Add more w/w tests to test EDEADLK path handling · f3cf139e

由 Maarten Lankhorst 提交于 6月 20, 2013

Signed-off-by: NMaarten Lankhorst <maarten.lankhorst@canonical.com>
Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: dri-devel@lists.freedesktop.org
Cc: linaro-mm-sig@lists.linaro.org
Cc: rostedt@goodmis.org
Cc: daniel@ffwll.ch
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20130620113141.4001.54331.stgit@patserSigned-off-by: NIngo Molnar <mingo@kernel.org>

f3cf139e

mutex: Add more tests to lib/locking-selftest.c · 2fe3d4b1

由 Maarten Lankhorst 提交于 6月 20, 2013

None of the ww_mutex codepaths should be taken in the 'normal'
mutex calls. The easiest way to verify this is by using the
normal mutex calls, and making sure o.ctx is unmodified.
Signed-off-by: NMaarten Lankhorst <maarten.lankhorst@canonical.com>
Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: dri-devel@lists.freedesktop.org
Cc: linaro-mm-sig@lists.linaro.org
Cc: robclark@gmail.com
Cc: rostedt@goodmis.org
Cc: daniel@ffwll.ch
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20130620113130.4001.45423.stgit@patserSigned-off-by: NIngo Molnar <mingo@kernel.org>

2fe3d4b1

mutex: Add w/w tests to lib/locking-selftest.c · 1de99445

由 Maarten Lankhorst 提交于 6月 20, 2013

This stresses the lockdep code in some ways specifically useful
to ww_mutexes. It adds checks for most of the common locking
errors.
Signed-off-by: NMaarten Lankhorst <maarten.lankhorst@canonical.com>
Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: dri-devel@lists.freedesktop.org
Cc: linaro-mm-sig@lists.linaro.org
Cc: robclark@gmail.com
Cc: rostedt@goodmis.org
Cc: daniel@ffwll.ch
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20130620113124.4001.23186.stgit@patserSigned-off-by: NIngo Molnar <mingo@kernel.org>

1de99445

mutex: Add w/w mutex slowpath debugging · 23010027

由 Daniel Vetter 提交于 6月 20, 2013

Injects EDEADLK conditions at pseudo-random interval, with
exponential backoff up to UINT_MAX (to ensure that every lock
operation still completes in a reasonable time).

This way we can test the wound slowpath even for ww mutex users
where contention is never expected, and the ww deadlock
avoidance algorithm is only needed for correctness against
malicious userspace. An example would be protecting kernel
modesetting properties, which thanks to single-threaded X isn't
really expected to contend, ever.

I've looked into using the CONFIG_FAULT_INJECTION
infrastructure, but decided against it for two reasons:

- EDEADLK handling is mandatory for ww mutex users and should
  never affect the outcome of a syscall. This is in contrast to -ENOMEM
  injection. So fine configurability isn't required.

- The fault injection framework only allows to set a simple
  probability for failure. Now the probability that a ww mutex acquire
  stage with N locks will never complete (due to too many injected
  EDEADLK backoffs) is zero. But the expected number of ww_mutex_lock
  operations for the completely uncontended case would be O(exp(N)).
  The per-acuiqire ctx exponential backoff solution choosen here only
  results in O(log N) overhead due to injection and so O(log N * N)
  lock operations. This way we can fail with high probability (and so
  have good test coverage even for fancy backoff and lock acquisition
  paths) without running into patalogical cases.

Note that EDEADLK will only ever be injected when we managed to
acquire the lock. This prevents any behaviour changes for users
which rely on the EALREADY semantics.
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: NMaarten Lankhorst <maarten.lankhorst@canonical.com>
Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: dri-devel@lists.freedesktop.org
Cc: linaro-mm-sig@lists.linaro.org
Cc: rostedt@goodmis.org
Cc: daniel@ffwll.ch
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20130620113117.4001.21681.stgit@patserSigned-off-by: NIngo Molnar <mingo@kernel.org>

23010027

mutex: Add support for wound/wait style locks · 040a0a37

由 Maarten Lankhorst 提交于 6月 24, 2013

Wound/wait mutexes are used when other multiple lock
acquisitions of a similar type can be done in an arbitrary
order. The deadlock handling used here is called wait/wound in
the RDBMS literature: The older tasks waits until it can acquire
the contended lock. The younger tasks needs to back off and drop
all the locks it is currently holding, i.e. the younger task is
wounded.

For full documentation please read Documentation/ww-mutex-design.txt.

References: https://lwn.net/Articles/548909/Signed-off-by: NMaarten Lankhorst <maarten.lankhorst@canonical.com>
Acked-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: NRob Clark <robdclark@gmail.com>
Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: dri-devel@lists.freedesktop.org
Cc: linaro-mm-sig@lists.linaro.org
Cc: rostedt@goodmis.org
Cc: daniel@ffwll.ch
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/51C8038C.9000106@canonical.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

040a0a37

19 6月, 2013 1 次提交

X.509: do not emit any informational output · a857c6e7

由 Arnd Bergmann 提交于 1月 08, 2013

When building a kernel using 'make -s', I expect to see an empty output,
except for build warnings and errors. The build_OID_registry code
always prints one line when run, which is not helpful to most people
building the kernels, and which makes it harder to automatically
check for build warnings.

Let's just remove the one line output.
Signed-off-by: NArnd Bergmann <arnd@arndb.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>

a857c6e7

17 6月, 2013 1 次提交

percpu-refcount: use RCU-sched insted of normal RCU · a4244454

由 Tejun Heo 提交于 6月 16, 2013

percpu-refcount was incorrectly using preempt_disable/enable() for RCU
critical sections against call_rcu().  6a24474d ("percpu-refcount:
consistently use plain (non-sched) RCU") fixed it by converting the
preepmtion operations with rcu_read_[un]lock() citing that there isn't
any advantage in using sched-RCU over using the usual one; however,
rcu_read_[un]lock() for the preemptible RCU implementation -
CONFIG_TREE_PREEMPT_RCU, chosen when CONFIG_PREEMPT - are slightly
more expensive than preempt_disable/enable().

In a contrived microbench which repeats the followings,

 - percpu_ref_get()
 - copy 32 bytes of data into percpu buffer
 - percpu_put_get()
 - copy 32 bytes of data into percpu buffer

rcu_read_[un]lock() used in percpu_ref_get/put() makes it go slower by
about 15% when compared to using sched-RCU.

As the RCU critical sections are extremely short, using sched-RCU
shouldn't have any latency implications.  Convert to RCU-sched.
Signed-off-by: NTejun Heo <tj@kernel.org>
Acked-by: NKent Overstreet <koverstreet@google.com>
Acked-by: N"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Rusty Russell <rusty@rustcorp.com.au>

a4244454

14 6月, 2013 3 次提交

percpu-refcount: implement percpu_tryget() along with percpu_ref_kill_and_confirm() · dbece3a0

由 Tejun Heo 提交于 6月 13, 2013

Implement percpu_tryget() which stops giving out references once the
percpu_ref is visible as killed.  Because the refcnt is per-cpu,
different CPUs will start to see a refcnt as killed at different
points in time and tryget() may continue to succeed on subset of cpus
for a while after percpu_ref_kill() returns.

For use cases where it's necessary to know when all CPUs start to see
the refcnt as dead, percpu_ref_kill_and_confirm() is added.  The new
function takes an extra argument @confirm_kill which is invoked when
the refcnt is guaranteed to be viewed as killed on all CPUs.

While this isn't the prettiest interface, it doesn't force synchronous
wait and is much safer than requiring the caller to do its own
call_rcu().

v2: Patch description rephrased to emphasize that tryget() may
    continue to succeed on some CPUs after kill() returns as suggested
    by Kent.

v3: Function comment in percpu_ref_kill_and_confirm() updated warning
    people to not depend on the implied RCU grace period from the
    confirm callback as it's an implementation detail.
Signed-off-by: NTejun Heo <tj@kernel.org>
Slightly-Grumpily-Acked-by: NKent Overstreet <koverstreet@google.com>

dbece3a0

percpu-refcount: implement percpu_ref_cancel_init() · bc497bd3

由 Tejun Heo 提交于 6月 12, 2013

Normally, percpu_ref_init() initializes and percpu_ref_kill()
initiates destruction which completes asynchronously.  The
asynchronous destruction can be problematic in init failure path where
the caller wants to destroy half-constructed object - distinguishing
half-constructed objects from the usual release method can be painful
for complex objects.

This patch implements percpu_ref_cancel_init() which synchronously
destroys the percpu_ref without invoking release.  To avoid
unintentional misuses, the function requires the ref to have finished
percpu_ref_init() but never used and triggers WARN otherwise.

v2: Explain the weird name and usage restriction in the function
    comment.
Signed-off-by: NTejun Heo <tj@kernel.org>
Acked-by: NKent Overstreet <koverstreet@google.com>

bc497bd3

percpu-refcount: add __must_check to percpu_ref_init() and don't use... · acac7883

由 Tejun Heo 提交于 6月 12, 2013

percpu-refcount: add __must_check to percpu_ref_init() and don't use ACCESS_ONCE() in percpu_ref_kill_rcu()

Two small changes.

* Unlike most init functions, percpu_ref_init() allocates memory and
  may fail.  Let's mark it with __must_check in case the caller
  forgets.

* percpu_ref_kill_rcu() is unnecessarily using ACCESS_ONCE() to
  dereference @ref->pcpu_count, which can be misleading.  The pointer
  is guaranteed to be valid and visible and can't change underneath
  the function.  Drop ACCESS_ONCE().
Signed-off-by: NTejun Heo <tj@kernel.org>

acac7883

13 6月, 2013 2 次提交

percpu-refcount: cosmetic updates · ac899061

由 Tejun Heo 提交于 6月 12, 2013

* s/percpu_ref_release/percpu_ref_func_t/ as it's customary to have _t
  postfix for types and the type is gonna be used for a different type
  of callback too.

* Add @ARG to function comments.

* Drop unnecessary and unaligned indentation from percpu_ref_init()
  function comment.
Signed-off-by: NTejun Heo <tj@kernel.org>
Acked-by: NKent Overstreet <koverstreet@google.com>

ac899061

lib/mpi/mpicoder.c: looping issue, need stop when equal to zero, found by 'EXTRA_FLAGS=-W'. · 5402b804

由 Chen Gang 提交于 6月 12, 2013

For 'while' looping, need stop when 'nbytes == 0', or will cause issue.
('nbytes' is size_t which is always bigger or equal than zero).

The related warning: (with EXTRA_CFLAGS=-W)

  lib/mpi/mpicoder.c:40:2: warning: comparison of unsigned expression >= 0 is always true [-Wtype-limits]
Signed-off-by: NChen Gang <gang.chen@asianux.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: David Howells <dhowells@redhat.com>
Cc: James Morris <james.l.morris@oracle.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: NDavid Howells <dhowells@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

5402b804

08 6月, 2013 1 次提交

kobject: sanitize argument for format string · b7165ebb

由 Kees Cook 提交于 6月 06, 2013

Unlike kobject_set_name(), the kset_create_and_add() interface does not
provide a way to use format strings, so make sure that the interface
cannot be abused accidentally. It looks like all current callers use
static strings, so there's no existing flaw.
Signed-off-by: NKees Cook <keescook@chromium.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

b7165ebb

06 6月, 2013 1 次提交

net: core: move mac_pton() to lib/net_utils.c · 4cd5773a

由 Andy Shevchenko 提交于 6月 04, 2013

Since we have at least one user of this function outside of CONFIG_NET
scope, we have to provide this function independently. The proposed
solution is to move it under lib/net_utils.c with corresponding
configuration variable and select wherever it is needed.
Signed-off-by: NAndy Shevchenko <andy.shevchenko@gmail.com>
Reported-by: NArnd Bergmann <arnd@arndb.de>
Acked-by: NDavid S. Miller <davem@davemloft.net>
Acked-by: NArnd Bergmann <arnd@arndb.de>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

4cd5773a

04 6月, 2013 3 次提交

percpu-refcount: Don't use silly cmpxchg() · c1ae6e9b

由 Kent Overstreet 提交于 6月 03, 2013

The cmpxchg() was just to ensure the debug check didn't race, which was
a bit excessive. The caller is supposed to do the appropriate
synchronization, which means percpu_ref_kill() can just do a simple
store.
Signed-off-by: NKent Overstreet <koverstreet@google.com>
Signed-off-by: NTejun Heo <tj@kernel.org>

c1ae6e9b

percpu: implement generic percpu refcounting · 215e262f

由 Kent Overstreet 提交于 5月 31, 2013

This implements a refcount with similar semantics to
atomic_get()/atomic_dec_and_test() - but percpu.

It also implements two stage shutdown, as we need it to tear down the
percpu counts.  Before dropping the initial refcount, you must call
percpu_ref_kill(); this puts the refcount in "shutting down mode" and
switches back to a single atomic refcount with the appropriate
barriers (synchronize_rcu()).

It's also legal to call percpu_ref_kill() multiple times - it only
returns true once, so callers don't have to reimplement shutdown
synchronization.

[akpm@linux-foundation.org: fix build]
[akpm@linux-foundation.org: coding-style tweak]
Signed-off-by: NKent Overstreet <koverstreet@google.com>
Cc: Zach Brown <zab@redhat.com>
Cc: Felipe Balbi <balbi@ti.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Asai Thambi S P <asamymuthupa@micron.com>
Cc: Selvan Mani <smani@micron.com>
Cc: Sam Bradshaw <sbradshaw@micron.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Reviewed-by: N"Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: NTejun Heo <tj@kernel.org>

215e262f

debugfs: add get/set for atomic types · 3a76e5e0

由 Seth Jennings 提交于 6月 03, 2013

debugfs currently lack the ability to create attributes
that set/get atomic_t values.

This patch adds support for this through a new
debugfs_create_atomic_t() function.
Signed-off-by: NSeth Jennings <sjenning@linux.vnet.ibm.com>
Acked-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: NMel Gorman <mgorman@suse.de>
Acked-by: NRik van Riel <riel@redhat.com>
Acked-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

3a76e5e0

25 5月, 2013 1 次提交

MPILIB: disable usage of floating point registers on parisc · 70ef5578

由 Helge Deller 提交于 5月 05, 2013

The umul_ppmm() macro for parisc uses the xmpyu assembler statement
which does calculation via a floating point register.

But usage of floating point registers inside the Linux kernel are not
allowed and gcc will stop compilation due to the -mdisable-fpregs
compiler option.

Fix this by disabling the umul_ppmm() and udiv_qrnnd() macros. The
mpilib will then use the generic built-in implementations instead.
Signed-off-by: NHelge Deller <deller@gmx.de>

70ef5578

24 5月, 2013 1 次提交

lib: make iovec obj instead of lib · b4d3ba33

由 Randy Dunlap 提交于 5月 22, 2013

Fix build error io vmw_vmci.ko when CONFIG_VMWARE_VMCI=m by chaning
iovec.o from lib-y to obj-y.

  ERROR: "memcpy_toiovec" [drivers/misc/vmw_vmci/vmw_vmci.ko] undefined!
  ERROR: "memcpy_fromiovec" [drivers/misc/vmw_vmci/vmw_vmci.ko] undefined!
Signed-off-by: NRandy Dunlap <rdunlap@infradead.org>
Acked-by: NRusty Russell <rusty@rustcorp.com.au>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

b4d3ba33

22 5月, 2013 1 次提交

klist: del waiter from klist_remove_waiters before wakeup waitting process · ac5a2962

由 wang, biao 提交于 5月 16, 2013

There is a race between klist_remove and klist_release. klist_remove
uses a local var waiter saved on stack. When klist_release calls
wake_up_process(waiter->process) to wake up the waiter, waiter might run
immediately and reuse the stack. Then, klist_release calls
list_del(&waiter->list) to change previous
wait data and cause prior waiter thread corrupt.

The patch fixes it against kernel 3.9.
Signed-off-by: Nwang, biao <biao.wang@intel.com>
Acked-by: NPeter Zijlstra <peterz@infradead.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

ac5a2962

20 5月, 2013 1 次提交

Hoist memcpy_fromiovec/memcpy_toiovec into lib/ · d2f83e90

由 Rusty Russell 提交于 5月 17, 2013

ERROR: "memcpy_fromiovec" [drivers/vhost/vhost_scsi.ko] undefined!

That function is only present with CONFIG_NET.  Turns out that
crypto/algif_skcipher.c also uses that outside net, but it actually
needs sockets anyway.

In addition, commit 6d4f0139 added
CONFIG_NET dependency to CONFIG_VMCI for memcpy_toiovec, so hoist
that function and revert that commit too.

socket.h already includes uio.h, so no callers need updating; trying
only broke things fo x86_64 randconfig (thanks Fengguang!).
Reported-by: NRandy Dunlap <rdunlap@infradead.org>
Acked-by: NDavid S. Miller <davem@davemloft.net>
Acked-by: NMichael S. Tsirkin <mst@redhat.com>
Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>

d2f83e90

08 5月, 2013 2 次提交

rwsem: check counter to avoid cmpxchg calls · 9607a85b

由 Davidlohr Bueso 提交于 5月 07, 2013

This patch tries to reduce the amount of cmpxchg calls in the writer
failed path by checking the counter value first before issuing the
instruction.  If ->count is not set to RWSEM_WAITING_BIAS then there is
no point wasting a cmpxchg call.

Furthermore, Michel states "I suppose it helps due to the case where
someone else steals the lock while we're trying to acquire
sem->wait_lock."

Two very different workloads and machines were used to see how this
patch improves throughput: pgbench on a quad-core laptop and aim7 on a
large 8 socket box with 80 cores.

Some results comparing Michel's fast-path write lock stealing
(tps-rwsem) on a quad-core laptop running pgbench:

  | db_size | clients  |  tps-rwsem     |   tps-patch  |
  +---------+----------+----------------+--------------+
  | 160 MB   |       1 |           6906 |         9153 | + 32.5
  | 160 MB   |       2 |          15931 |        22487 | + 41.1%
  | 160 MB   |       4 |          33021 |        32503 |
  | 160 MB   |       8 |          34626 |        34695 |
  | 160 MB   |      16 |          33098 |        34003 |
  | 160 MB   |      20 |          31343 |        31440 |
  | 160 MB   |      30 |          28961 |        28987 |
  | 160 MB   |      40 |          26902 |        26970 |
  | 160 MB   |      50 |          25760 |        25810 |
  ------------------------------------------------------
  | 1.6 GB   |       1 |           7729 |         7537 |
  | 1.6 GB   |       2 |          19009 |        23508 | + 23.7%
  | 1.6 GB   |       4 |          33185 |        32666 |
  | 1.6 GB   |       8 |          34550 |        34318 |
  | 1.6 GB   |      16 |          33079 |        32689 |
  | 1.6 GB   |      20 |          31494 |        31702 |
  | 1.6 GB   |      30 |          28535 |        28755 |
  | 1.6 GB   |      40 |          27054 |        27017 |
  | 1.6 GB   |      50 |          25591 |        25560 |
  ------------------------------------------------------
  | 7.6 GB   |       1 |           6224 |         7469 | + 20.0%
  | 7.6 GB   |       2 |          13611 |        12778 |
  | 7.6 GB   |       4 |          33108 |        32927 |
  | 7.6 GB   |       8 |          34712 |        34878 |
  | 7.6 GB   |      16 |          32895 |        33003 |
  | 7.6 GB   |      20 |          31689 |        31974 |
  | 7.6 GB   |      30 |          29003 |        28806 |
  | 7.6 GB   |      40 |          26683 |        26976 |
  | 7.6 GB   |      50 |          25925 |        25652 |
  ------------------------------------------------------

For the aim7 worloads, they overall improved on top of Michel's
patchset.  For full graphs on how the rwsem series plus this patch
behaves on a large 8 socket machine against a vanilla kernel:

  http://stgolabs.net/rwsem-aim7-results.tar.gzSigned-off-by: NDavidlohr Bueso <davidlohr.bueso@hp.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

9607a85b

kref: minor cleanup · 2d864e41

由 Anatol Pomozov 提交于 5月 07, 2013

 - make warning smp-safe
 - result of atomic _unless_zero functions should be checked by caller
   to avoid use-after-free error
 - trivial whitespace fix.

Link: https://lkml.org/lkml/2013/4/12/391

Tested: compile x86, boot machine and run xfstests
Signed-off-by: NAnatol Pomozov <anatol.pomozov@gmail.com>
[ Removed line-break, changed to use WARN_ON_ONCE()  - Linus ]
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

2d864e41

07 5月, 2013 13 次提交

rwsem: no need for explicit signed longs · b5f54181

由 Davidlohr Bueso 提交于 5月 07, 2013

Change explicit "signed long" declarations into plain "long" as suggested
by Peter Hurley.
Signed-off-by: NDavidlohr Bueso <davidlohr.bueso@hp.com>
Reviewed-by: NMichel Lespinasse <walken@google.com>
Signed-off-by: NMichel Lespinasse <walken@google.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

b5f54181

rwsem: do not block readers at head of queue if other readers are active · 25c39325

由 Michel Lespinasse 提交于 5月 07, 2013

This change fixes a race condition where a reader might determine it
needs to block, but by the time it acquires the wait_lock the rwsem has
active readers and no queued waiters.

In this situation the reader can run in parallel with the existing
active readers; it does not need to block until the active readers
complete.

Thanks to Peter Hurley for noticing this possible race.
Signed-off-by: NMichel Lespinasse <walken@google.com>
Reviewed-by: NPeter Hurley <peter@hurleysoftware.com>
Acked-by: NDavidlohr Bueso <davidlohr.bueso@hp.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

25c39325

rwsem: implement support for write lock stealing on the fastpath · fe6e674c

由 Michel Lespinasse 提交于 5月 07, 2013

When we decide to wake up readers, we must first grant them as many read
locks as necessary, and then actually wake up all these readers. But in
order to know how many read shares to grant, we must first count the
readers at the head of the queue. This might take a while if there are
many readers, and we want to be protected against a writer stealing the
lock while we're counting. To that end, we grant the first reader lock
before counting how many more readers are queued.

We also require some adjustments to the wake_type semantics.

RWSEM_WAKE_NO_ACTIVE used to mean that we had found the count to be
RWSEM_WAITING_BIAS, in which case the rwsem was known to be free as
nobody could steal it while we hold the wait_lock. This doesn't make
sense once we implement fastpath write lock stealing, so we now use
RWSEM_WAKE_ANY in that case.

Similarly, when rwsem_down_write_failed found that a read lock was
active, it would use RWSEM_WAKE_READ_OWNED which signalled that new
readers could be woken without checking first that the rwsem was
available. We can't do that anymore since the existing readers might
release their read locks, and a writer could steal the lock before we
wake up additional readers. So, we have to use a new RWSEM_WAKE_READERS
value to indicate we only want to wake readers, but we don't currently
hold any read lock.
Signed-off-by: NMichel Lespinasse <walken@google.com>
Reviewed-by: NPeter Hurley <peter@hurleysoftware.com>
Acked-by: NDavidlohr Bueso <davidlohr.bueso@hp.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

fe6e674c

rwsem: simplify __rwsem_do_wake · 8cf5322c

由 Michel Lespinasse 提交于 5月 07, 2013

This is mostly for cleanup value:

- We don't need several gotos to handle the case where the first
  waiter is a writer. Two simple tests will do (and generate very
  similar code).

- In the remainder of the function, we know the first waiter is a reader,
  so we don't have to double check that. We can use do..while loops
  to iterate over the readers to wake (generates slightly better code).
Signed-off-by: NMichel Lespinasse <walken@google.com>
Reviewed-by: NPeter Hurley <peter@hurleysoftware.com>
Acked-by: NDavidlohr Bueso <davidlohr.bueso@hp.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

8cf5322c

rwsem: skip initial trylock in rwsem_down_write_failed · 9b0fc9c0

由 Michel Lespinasse 提交于 5月 07, 2013

We can skip the initial trylock in rwsem_down_write_failed() if there
are known active lockers already, thus saving one likely-to-fail
cmpxchg.
Signed-off-by: NMichel Lespinasse <walken@google.com>
Reviewed-by: NPeter Hurley <peter@hurleysoftware.com>
Acked-by: NDavidlohr Bueso <davidlohr.bueso@hp.com>
Acked-by: NRik van Riel <riel@redhat.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

9b0fc9c0

rwsem: avoid taking wait_lock in rwsem_down_write_failed · a7d2c573

由 Michel Lespinasse 提交于 5月 07, 2013

In rwsem_down_write_failed(), if there are active locks after we wake up
(i.e.  the lock got stolen from us), skip taking the wait_lock and go
back to sleep immediately.
Signed-off-by: NMichel Lespinasse <walken@google.com>
Reviewed-by: NPeter Hurley <peter@hurleysoftware.com>
Acked-by: NDavidlohr Bueso <davidlohr.bueso@hp.com>
Acked-by: NRik van Riel <riel@redhat.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

a7d2c573

rwsem: use cmpxchg for trying to steal write lock · 5ede972d

由 Michel Lespinasse 提交于 5月 07, 2013

Using rwsem_atomic_update to try stealing the write lock forced us to
undo the adjustment in the failure path.  We can have simpler and faster
code by using cmpxchg instead.
Signed-off-by: NMichel Lespinasse <walken@google.com>
Reviewed-by: NPeter Hurley <peter@hurleysoftware.com>
Acked-by: NDavidlohr Bueso <davidlohr.bueso@hp.com>
Acked-by: NRik van Riel <riel@redhat.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

5ede972d

rwsem: more agressive lock stealing in rwsem_down_write_failed · ed00f643

由 Michel Lespinasse 提交于 5月 07, 2013

Some small code simplifications can be achieved by doing more agressive
lock stealing:

- When rwsem_down_write_failed() notices that there are no active locks
  (and thus no thread to wake us if we decided to sleep), it used to wake
  the first queued process. However, stealing the lock is also sufficient
  to deal with this case, so we don't need this check anymore.

- In try_get_writer_sem(), we can steal the lock even when the first waiter
  is a reader. This is correct because the code path that wakes readers is
  protected by the wait_lock. As to the performance effects of this change,
  they are expected to be minimal: readers are still granted the lock
  (rather than having to acquire it themselves) when they reach the front
  of the wait queue, so we have essentially the same behavior as in
  rwsem-spinlock.
Signed-off-by: NMichel Lespinasse <walken@google.com>
Reviewed-by: NRik van Riel <riel@redhat.com>
Reviewed-by: NPeter Hurley <peter@hurleysoftware.com>
Acked-by: NDavidlohr Bueso <davidlohr.bueso@hp.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

ed00f643

rwsem: simplify rwsem_down_write_failed · 023fe4f7

由 Michel Lespinasse 提交于 5月 07, 2013

When waking writers, we never grant them the lock - instead, they have
to acquire it themselves when they run, and remove themselves from the
wait_list when they succeed.

As a result, we can do a few simplifications in rwsem_down_write_failed():

- We don't need to check for !waiter.task since __rwsem_do_wake() doesn't
  remove writers from the wait_list

- There is no point releaseing the wait_lock before entering the wait loop,
  as we will need to reacquire it immediately. We can change the loop so
  that the lock is always held at the start of each loop iteration.

- We don't need to get a reference on the task structure, since the task
  is responsible for removing itself from the wait_list. There is no risk,
  like in the rwsem_down_read_failed() case, that a task would wake up and
  exit (thus destroying its task structure) while __rwsem_do_wake() is
  still running - wait_lock protects against that.
Signed-off-by: NMichel Lespinasse <walken@google.com>
Reviewed-by: NRik van Riel <riel@redhat.com>
Reviewed-by: NPeter Hurley <peter@hurleysoftware.com>
Acked-by: NDavidlohr Bueso <davidlohr.bueso@hp.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

023fe4f7

rwsem: simplify rwsem_down_read_failed · da16922c

由 Michel Lespinasse 提交于 5月 07, 2013

When trying to acquire a read lock, the RWSEM_ACTIVE_READ_BIAS
adjustment doesn't cause other readers to block, so we never have to
worry about waking them back after canceling this adjustment in
rwsem_down_read_failed().

We also never want to steal the lock in rwsem_down_read_failed(), so we
don't have to grab the wait_lock either.
Signed-off-by: NMichel Lespinasse <walken@google.com>
Reviewed-by: NRik van Riel <riel@redhat.com>
Reviewed-by: NPeter Hurley <peter@hurleysoftware.com>
Acked-by: NDavidlohr Bueso <davidlohr.bueso@hp.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

da16922c

rwsem: move rwsem_down_failed_common code into rwsem_down_{read,write}_failed · 1e78277c

由 Michel Lespinasse 提交于 5月 07, 2013

Remove the rwsem_down_failed_common function and replace it with two
identical copies of its code in rwsem_down_{read,write}_failed.

This is because we want to make different optimizations in
rwsem_down_{read,write}_failed; we are adding this pure-duplication
step as a separate commit in order to make it easier to check the
following steps.
Signed-off-by: NMichel Lespinasse <walken@google.com>
Reviewed-by: NRik van Riel <riel@redhat.com>
Reviewed-by: NPeter Hurley <peter@hurleysoftware.com>
Acked-by: NDavidlohr Bueso <davidlohr.bueso@hp.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

1e78277c

rwsem: shorter spinlocked section in rwsem_down_failed_common() · f7dd1cee

由 Michel Lespinasse 提交于 5月 07, 2013

This change reduces the size of the spinlocked and TASK_UNINTERRUPTIBLE
sections in rwsem_down_failed_common():

- We only need the sem->wait_lock to insert ourselves on the wait_list;
  the waiter node can be prepared outside of the wait_lock.

- The task state only needs to be set to TASK_UNINTERRUPTIBLE immediately
  before checking if we actually need to sleep; it doesn't need to protect
  the entire function.
Signed-off-by: NMichel Lespinasse <walken@google.com>
Reviewed-by: NRik van Riel <riel@redhat.com>
Reviewed-by: NPeter Hurley <peter@hurleysoftware.com>
Acked-by: NDavidlohr Bueso <davidlohr.bueso@hp.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

f7dd1cee

rwsem: make the waiter type an enumeration rather than a bitmask · e2d57f78

由 Michel Lespinasse 提交于 5月 07, 2013

We are not planning to add some new waiter flags, so we can convert the
waiter type into an enumeration.

Background: David Howells suggested I do this back when I tried adding
a new waiter type for unfair readers. However, I believe the cleanup
applies regardless of that use case.
Signed-off-by: NMichel Lespinasse <walken@google.com>
Reviewed-by: NRik van Riel <riel@redhat.com>
Reviewed-by: NPeter Hurley <peter@hurleysoftware.com>
Acked-by: NDavidlohr Bueso <davidlohr.bueso@hp.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

e2d57f78

06 5月, 2013 1 次提交

Give the OID registry file module info to avoid kernel tainting · 9e687946

由 David Howells 提交于 5月 04, 2013

Give the OID registry file module information so that it doesn't taint the
kernel when compiled as a module and loaded.
Reported-by: NDros Adamson <Weston.Adamson@netapp.com>
Signed-off-by: NDavid Howells <dhowells@redhat.com>
cc: Trond Myklebust <Trond.Myklebust@netapp.com>
cc: stable@vger.kernel.org
cc: linux-nfs@vger.kernel.org
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

9e687946

01 5月, 2013 1 次提交

lib/decompress.c: fix initconst · 6f9982bd

由 Andi Kleen 提交于 4月 30, 2013

Signed-off-by: NAndi Kleen <andi@firstfloor.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

6f9982bd