- 11 Mar 2014 (4 commits)

Committed by Jason Low
When running workloads that have high contention in mutexes on an 8-socket machine, mutex spinners would often spin for a long time with no lock owner. The main reason this occurs is in __mutex_unlock_common_slowpath(): if __mutex_slowpath_needs_to_unlock(), the owner needs to acquire the mutex->wait_lock before releasing the mutex (setting lock->count to 1). When the wait_lock is contended, this delays the release of the mutex. We should be able to release the mutex without holding the wait_lock.

Signed-off-by: Jason Low <jason.low2@hp.com>
Cc: chegu_vinod@hp.com
Cc: paulmck@linux.vnet.ibm.com
Cc: Waiman.Long@hp.com
Cc: torvalds@linux-foundation.org
Cc: tglx@linutronix.de
Cc: riel@redhat.com
Cc: akpm@linux-foundation.org
Cc: davidlohr@hp.com
Cc: hpa@zytor.com
Cc: andi@firstfloor.org
Cc: aswin@hp.com
Cc: scott.norton@hp.com
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1390936396-3962-4-git-send-email-jason.low2@hp.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
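A minimal user-space sketch of the reordering described above, assuming a simplified mutex model (all `sketch_*` names are hypothetical, and C11 atomics stand in for the kernel primitives). The count is set back to 1 before wait_lock is taken, so only the wakeup of a sleeping waiter has to wait for a contended wait_lock. In the real unlock path this early release is conditional on __mutex_slowpath_needs_to_unlock().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model, not the kernel's struct mutex: count is 1 when
 * unlocked, 0 when locked, negative when locked with possible waiters. */
struct sketch_mutex {
	atomic_int count;
	atomic_flag wait_lock;      /* protects nr_waiters */
	int nr_waiters;             /* stand-in for the wait list */
};

static void sketch_mutex_init(struct sketch_mutex *m)
{
	atomic_init(&m->count, 1);
	atomic_flag_clear(&m->wait_lock);
	m->nr_waiters = 0;
}

static void sketch_wait_lock(struct sketch_mutex *m)
{
	while (atomic_flag_test_and_set_explicit(&m->wait_lock, memory_order_acquire))
		;                   /* spin until we own wait_lock */
}

static void sketch_wait_unlock(struct sketch_mutex *m)
{
	atomic_flag_clear_explicit(&m->wait_lock, memory_order_release);
}

/* Step 1: release the mutex without touching wait_lock, so a spinner
 * polling count can take the lock right away. */
static void sketch_unlock_release(struct sketch_mutex *m)
{
	atomic_store_explicit(&m->count, 1, memory_order_release);
}

/* Step 2: only waking a sleeping waiter needs wait_lock.
 * Returns true if a waiter was "woken". */
static bool sketch_unlock_wake(struct sketch_mutex *m)
{
	bool woke = false;

	sketch_wait_lock(m);
	if (m->nr_waiters > 0) {
		m->nr_waiters--;
		woke = true;
	}
	sketch_wait_unlock(m);
	return woke;
}
```

Splitting the two steps is the point of the design: the release is a single store that never blocks, while the part that needs the list lock can afford to wait.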

Committed by Jason Low
The mutex->spin_mlock was introduced to ensure that only one thread spins for lock acquisition at a time, in order to reduce cache line contention. When lock->owner is NULL and lock->count is still not 1, the spinner(s) will continually release and re-acquire lock->spin_mlock. This can generate quite a bit of overhead and contention, and might also delay the spinner from getting the lock. This patch changes the way optimistic spinners are queued: they queue once before entering the optimistic spinning loop, as opposed to acquiring the queue lock before every call to mutex_spin_on_owner(). So in situations where the spinner needs a few extra spins before obtaining the lock, there will be only one spinner trying to get the lock, and it will avoid the overhead of unnecessarily unlocking and relocking the spin_mlock.

Signed-off-by: Jason Low <jason.low2@hp.com>
Cc: tglx@linutronix.de
Cc: riel@redhat.com
Cc: akpm@linux-foundation.org
Cc: davidlohr@hp.com
Cc: hpa@zytor.com
Cc: andi@firstfloor.org
Cc: aswin@hp.com
Cc: scott.norton@hp.com
Cc: chegu_vinod@hp.com
Cc: Waiman.Long@hp.com
Cc: paulmck@linux.vnet.ibm.com
Cc: torvalds@linux-foundation.org
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1390936396-3962-3-git-send-email-jason.low2@hp.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
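The change in loop shape can be illustrated with a counting model. This is not a lock: the hypothetical `sketch_queue_lock()` only records how often it is taken, and `sketch_try_acquire()` is a made-up stand-in that succeeds on a given attempt. It shows that queuing per attempt costs one queue acquisition per spin, while queuing once costs exactly one.

```c
#include <assert.h>
#include <stdbool.h>

/* Counting model only: records how often the queue "lock" is taken. */
struct sketch_queue {
	int acquisitions;
};

static void sketch_queue_lock(struct sketch_queue *q)   { q->acquisitions++; }
static void sketch_queue_unlock(struct sketch_queue *q) { (void)q; }

/* Hypothetical trylock that succeeds from attempt 'free_after' onward. */
static bool sketch_try_acquire(int attempt, int free_after)
{
	return attempt >= free_after;
}

/* Old shape: take and drop the queue lock around every spin attempt. */
static int sketch_spin_per_attempt(struct sketch_queue *q, int free_after)
{
	for (int attempt = 1;; attempt++) {
		sketch_queue_lock(q);
		bool got = sketch_try_acquire(attempt, free_after);
		sketch_queue_unlock(q);
		if (got)
			return attempt;
	}
}

/* New shape: join the queue once, spin, then leave the queue once. */
static int sketch_spin_queue_once(struct sketch_queue *q, int free_after)
{
	int attempt = 1;

	sketch_queue_lock(q);
	while (!sketch_try_acquire(attempt, free_after))
		attempt++;
	sketch_queue_unlock(q);
	return attempt;
}
```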

Committed by Jason Low
The mutex_can_spin_on_owner() function should also return false if the task needs to be rescheduled, to avoid entering the MCS queue when it needs to reschedule.

Signed-off-by: Jason Low <jason.low2@hp.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Waiman.Long@hp.com
Cc: torvalds@linux-foundation.org
Cc: tglx@linutronix.de
Cc: riel@redhat.com
Cc: akpm@linux-foundation.org
Cc: davidlohr@hp.com
Cc: hpa@zytor.com
Cc: andi@firstfloor.org
Cc: aswin@hp.com
Cc: scott.norton@hp.com
Cc: chegu_vinod@hp.com
Cc: paulmck@linux.vnet.ibm.com
Link: http://lkml.kernel.org/r/1390936396-3962-2-git-send-email-jason.low2@hp.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

Committed by Peter Zijlstra
The mcs_spinlock code is not meant (or suitable) as a generic locking primitive, so move it out of the normal includes and into kernel/locking/. This way the locking primitives implemented there can use it as part of their implementation, without the risk of it being used inappropriately.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-byirmpamgr7h25m5kyavwpzx@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

- 28 Jan 2014 (2 commits)

Committed by Tim Chen
We will need the MCS lock code for optimistic spinning in rwsem and queued rwlock. Extracting the MCS code from mutex.c and putting it into its own file allows us to reuse this code easily. We also inline the mcs_spin_lock and mcs_spin_unlock functions for better efficiency.

Note that the smp_load_acquire/smp_store_release pair used in mcs_lock and mcs_unlock is not sufficient to form a full memory barrier across CPUs on many architectures (except x86). Applications that absolutely need a full barrier across multiple CPUs with an mcs_unlock and mcs_lock pair should use smp_mb__after_unlock_lock() after mcs_lock.

Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1390347360.3138.63.camel@schen9-DESK
Signed-off-by: Ingo Molnar <mingo@kernel.org>
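For readers unfamiliar with the MCS lock, here is a user-space sketch in C11 atomics. It assumes memory_order_acquire/memory_order_release as stand-ins for smp_load_acquire()/smp_store_release(); the names `mcs_sketch_lock`/`mcs_sketch_unlock` and the struct layout are illustrative, not the kernel's mcs_spin_lock() API. Each waiter spins on its own node, so a handoff touches only the next waiter's cacheline. As the commit text notes, this acquire/release pairing orders the critical sections but is not a full barrier.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct mcs_node {
	_Atomic(struct mcs_node *) next;
	atomic_bool locked;                 /* set when the lock is handed to us */
};

struct mcs_lock {
	_Atomic(struct mcs_node *) tail;    /* last waiter, or NULL if free */
};

static void mcs_sketch_lock(struct mcs_lock *lock, struct mcs_node *node)
{
	atomic_store_explicit(&node->next, NULL, memory_order_relaxed);
	atomic_store_explicit(&node->locked, false, memory_order_relaxed);

	/* Join the queue tail: acquire pairs with the unlock that emptied
	 * the queue, release publishes our initialized node. */
	struct mcs_node *prev =
		atomic_exchange_explicit(&lock->tail, node, memory_order_acq_rel);
	if (prev == NULL)
		return;                         /* queue was empty: lock is ours */

	atomic_store_explicit(&prev->next, node, memory_order_release);
	/* Spin on our own node; pairs with the release in the unlock. */
	while (!atomic_load_explicit(&node->locked, memory_order_acquire))
		;
}

static void mcs_sketch_unlock(struct mcs_lock *lock, struct mcs_node *node)
{
	struct mcs_node *next =
		atomic_load_explicit(&node->next, memory_order_acquire);

	if (next == NULL) {
		struct mcs_node *expected = node;

		/* No visible successor: try to mark the lock free. */
		if (atomic_compare_exchange_strong_explicit(&lock->tail, &expected, NULL,
				memory_order_release, memory_order_relaxed))
			return;
		/* A successor swapped in; wait until it links itself. */
		while ((next = atomic_load_explicit(&node->next, memory_order_acquire)) == NULL)
			;
	}
	/* Hand the lock over (the smp_store_release() role). */
	atomic_store_explicit(&next->locked, true, memory_order_release);
}
```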

Committed by Waiman Long
This patch corrects the way memory barriers are used in the MCS lock, using the smp_load_acquire and smp_store_release functions. The previous barriers could leak critical sections if the MCS lock is used by itself. This is not a problem when the MCS lock is embedded in a mutex, but it will be an issue when mcs_lock is used elsewhere.

The patch removes the incorrect barriers and puts in correct ones, using the smp_load_acquire/smp_store_release pair.

Suggested-by: Michel Lespinasse <walken@google.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Waiman Long <Waiman.Long@hp.com>
Signed-off-by: Jason Low <jason.low2@hp.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1390347353.3138.62.camel@schen9-DESK
Signed-off-by: Ingo Molnar <mingo@kernel.org>

- 11 Nov 2013 (1 commit)

Committed by Peter Zijlstra
Fix this docbook error:

>> docproc: kernel/mutex.c: No such file or directory

by updating the stale references to kernel/mutex.c.

Reported-by: fengguang.wu@intel.com
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-34pikw1tlsskj65rrt5iusrq@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

- 06 Nov 2013 (1 commit)

Committed by Peter Zijlstra
Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-1ditvncg30dgbpvrz2bxfmke@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

- 19 Oct 2013 (1 commit)

Committed by Tetsuo Handa
Commit 040a0a37 ("mutex: Add support for wound/wait style locks") used "!__builtin_constant_p(p == NULL)", but gcc 3.x cannot handle such an expression correctly, leading to boot failure when built with CONFIG_DEBUG_MUTEXES=y.

Fix it by explicitly passing a bool that tells whether p != NULL or not.

[ PeterZ: This is a sad patch, but provided it actually generates similar code I suppose it's the best we can do, bar wholesale deprecating gcc-3. ]

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Cc: peterz@infradead.org
Cc: imirkin@alum.mit.edu
Cc: daniel.vetter@ffwll.ch
Cc: robdclark@gmail.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/201310171945.AGB17114.FSQVtHOJFOOFML@I-love.SAKURA.ne.jp
Signed-off-by: Ingo Molnar <mingo@kernel.org>
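The fix pattern (a literal bool passed by each caller instead of a `__builtin_constant_p()` test) can be sketched in plain C. The types and function names below are hypothetical stand-ins for struct mutex and struct ww_acquire_ctx; the point is that after inlining, `use_ctx` is a compile-time constant at each call site, so the dead branch can be discarded by any compiler.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct mutex and struct ww_acquire_ctx. */
struct sketch_lock { int locked; };
struct sketch_ctx  { int acquired; };

/* Shared slow path: callers pass a literal 'use_ctx' instead of relying
 * on __builtin_constant_p(ctx == NULL) inside this function. */
static inline int sketch_lock_common(struct sketch_lock *lock,
				     struct sketch_ctx *ctx, const bool use_ctx)
{
	lock->locked = 1;                   /* pretend the lock was taken */
	if (use_ctx && ctx)
		ctx->acquired++;            /* ww-only bookkeeping */
	return 0;
}

static int sketch_lock(struct sketch_lock *lock)
{
	return sketch_lock_common(lock, NULL, false);
}

static int sketch_ww_lock(struct sketch_lock *lock, struct sketch_ctx *ctx)
{
	return sketch_lock_common(lock, ctx, true);
}
```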

- 31 Jul 2013 (1 commit)

Committed by Maarten Lankhorst
The check needs to be for > 1, because ctx->acquired has already been incremented. This prevents ww_mutex_lock_slow from returning -EDEADLK without the mutex being locked. It caused a lot of false GPU lockups on radeon with CONFIG_DEBUG_WW_MUTEX_SLOWPATH=y, because a function that shouldn't be able to return -EDEADLK did.

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/51F775B5.201@canonical.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
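A small sketch of why the guard is `> 1`, assuming a hypothetical context where `acquired` has already been incremented for the lock just taken and a simple countdown decides when to inject. With `> 1`, the first lock held in a context (acquired == 1), which is exactly the situation of the lock taken by the slow path after a backoff, is never failed artificially.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical acquire context; 'acquired' already counts the lock
 * that was just taken when the injection check runs. */
struct sketch_ww_ctx {
	unsigned int acquired;
	unsigned int inject_countdown;      /* injection allowed once this hits 0 */
};

/* Debug-only injection check: never fail the first lock of a context. */
static int sketch_maybe_inject(struct sketch_ww_ctx *ctx)
{
	if (ctx->acquired <= 1)
		return 0;
	if (ctx->inject_countdown > 0) {
		ctx->inject_countdown--;
		return 0;
	}
	return -EDEADLK;
}
```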

- 26 Jul 2013 (1 commit)

Committed by Davidlohr Bueso
Fengguang reported the following warning when optimistic spinning is disabled (i.e. make allnoconfig):

kernel/mutex.c:599:1: warning: label 'done' defined but not used

Remove the 'done' label altogether.

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>

- 23 Jul 2013 (1 commit)

Committed by Davidlohr Bueso
Upon entering the slowpath, we immediately attempt to acquire the lock by checking whether it is already unlocked. If we are lucky enough that this is the case, we don't need to deal with any waiter-related logic. Furthermore, any checks for an empty wait_list are unnecessary, as we already know that count is non-negative and hence no one is waiting for the lock.

Move the count check and xchg calls to be done before any waiters are set up, including waiter debugging. Upon failure to acquire the lock, the xchg sets the counter to 0, instead of -1 as it was originally. This can be done here since we set it back to -1 right at the beginning of the loop, so other waiters are woken up when the lock is released.

When tested on an 8-socket (80 core) system against a vanilla 3.10-rc1 kernel, this patch provides some small performance benefits (+2-6%). While this could be considered noise level, the average percentages were stable across multiple runs and no performance regressions were seen. The two big winners, for small numbers of users (10-100), were the short and compute workloads, with +19.36% and +15.76% in jobs per minute.

Also change some break statements to 'goto slowpath', which IMO makes the code a little more intuitive to read.

Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1372450398.2106.1.camel@buesod1.americas.hpqcorp.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
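The early acquisition attempt and the later wait-loop step can be sketched with C11 atomics, using the count convention from the text (1 unlocked, 0 locked, negative locked with possible waiters). The function names are hypothetical; the sketch only shows why writing 0 on a failed early attempt is harmless: the wait loop writes -1 again before the task sleeps.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical slow-path entry: before any waiter is set up, take the
 * lock if it already looks free, marking it "locked, no waiters" (0). */
static bool sketch_slowpath_try_first(atomic_int *count)
{
	return atomic_load(count) == 1 && atomic_exchange(count, 0) == 1;
}

/* One iteration of the hypothetical wait loop: mark "waiters present"
 * (-1) and report whether the lock happened to be free. */
static bool sketch_wait_loop_try(atomic_int *count)
{
	return atomic_exchange(count, -1) == 1;
}
```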

- 22 Jul 2013 (1 commit)

Committed by Peter Zijlstra
mutex_can_spin_on_owner() is technically broken in that it would in theory allow the compiler to load lock->owner twice, seeing a pointer the first time and a NULL pointer the second time.

Linus pointed out that a compiler would have to be seriously broken not to compile this correctly, but this change is nevertheless correct, as it better documents the implementation.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Acked-by: Waiman Long <Waiman.Long@hp.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: David Howells <dhowells@redhat.com>
Link: http://lkml.kernel.org/r/20130719183101.GA20909@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
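The "load once" idea can be sketched in user-space C11, where a relaxed atomic load into a local plays the role of the kernel's ACCESS_ONCE(). The types and the function name are hypothetical, and keeping the owner task alive while it is inspected (RCU in the kernel) is out of scope here. Because only the local copy is tested and dereferenced, the NULL check and the dereference are guaranteed to see the same pointer.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical minimal task and mutex. */
struct sketch_task  { atomic_bool on_cpu; };
struct sketch_mutex { _Atomic(struct sketch_task *) owner; };

static bool sketch_can_spin_on_owner(struct sketch_mutex *m)
{
	/* Read m->owner exactly once; never re-read it below. */
	struct sketch_task *owner =
		atomic_load_explicit(&m->owner, memory_order_relaxed);

	if (owner == NULL)
		return true;    /* no owner recorded: just acquired or released */
	return atomic_load_explicit(&owner->on_cpu, memory_order_relaxed);
}
```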

- 12 Jul 2013 (1 commit)

Committed by Maarten Lankhorst
Move the definitions for wound/wait mutexes out to a separate header, ww_mutex.h. This reduces clutter in mutex.h and increases readability.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Cc: Dave Airlie <airlied@gmail.com>
Link: http://lkml.kernel.org/r/51D675DC.3000907@canonical.com
[ Tidied up the code a bit. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>

- 26 Jun 2013 (3 commits)

Committed by Daniel Vetter
Injects EDEADLK conditions at pseudo-random intervals, with exponential backoff up to UINT_MAX (to ensure that every lock operation still completes in a reasonable time).

This way we can test the wound slowpath even for ww mutex users where contention is never expected, and where the ww deadlock avoidance algorithm is only needed for correctness against malicious userspace. An example would be protecting kernel modesetting properties, which, thanks to single-threaded X, aren't really expected to contend, ever.

I've looked into using the CONFIG_FAULT_INJECTION infrastructure, but decided against it for two reasons:

- EDEADLK handling is mandatory for ww mutex users and should never affect the outcome of a syscall. This is in contrast to -ENOMEM injection. So fine configurability isn't required.

- The fault injection framework only allows setting a simple probability for failure. Now the probability that a ww mutex acquire stage with N locks will never complete (due to too many injected EDEADLK backoffs) is zero. But the expected number of ww_mutex_lock operations for the completely uncontended case would be O(exp(N)). The per-acquire-ctx exponential backoff solution chosen here only results in O(log N) overhead due to injection, and so O(log N * N) lock operations. This way we can fail with high probability (and so have good test coverage even for fancy backoff and lock acquisition paths) without running into pathological cases.

Note that EDEADLK will only ever be injected when we managed to acquire the lock. This prevents any behaviour changes for users which rely on the EALREADY semantics.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: dri-devel@lists.freedesktop.org
Cc: linaro-mm-sig@lists.linaro.org
Cc: rostedt@goodmis.org
Cc: daniel@ffwll.ch
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20130620113117.4001.21681.stgit@patser
Signed-off-by: Ingo Molnar <mingo@kernel.org>
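The per-context exponential backoff described in this commit can be sketched as a countdown whose interval grows geometrically and saturates at UINT_MAX. This is a hypothetical model, not the kernel's injection code: the doubling factor and the `sketch_*` names are assumptions. With doubling, n successful lock operations trigger only O(log n) injections, which is the property the commit relies on.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>

/* Hypothetical per-acquire-context injection state. */
struct sketch_inject {
	unsigned int countdown;    /* successful locks left before the next injection */
	unsigned int interval;     /* current spacing between injections */
};

static void sketch_inject_init(struct sketch_inject *s)
{
	s->countdown = 1;
	s->interval = 1;
}

/* Call only after a lock was actually acquired. Returns -EDEADLK to
 * force an artificial backoff, else 0. */
static int sketch_inject(struct sketch_inject *s)
{
	if (s->countdown > 0) {
		s->countdown--;
		return 0;
	}
	/* Grow the interval (doubling assumed), saturating at UINT_MAX. */
	s->interval = (s->interval > UINT_MAX / 2) ? UINT_MAX : s->interval * 2;
	s->countdown = s->interval;
	return -EDEADLK;
}
```

With this schedule the first 20 calls inject at calls 2, 5, 10 and 19, so the gaps between injections keep doubling.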

Committed by Maarten Lankhorst
Wound/wait mutexes are used when multiple locks of a similar type can be acquired in an arbitrary order. The deadlock handling used here is called wait/wound in the RDBMS literature: the older task waits until it can acquire the contended lock, while the younger task needs to back off and drop all the locks it is currently holding, i.e. the younger task is wounded.

For full documentation please read Documentation/ww-mutex-design.txt.

References: https://lwn.net/Articles/548909/
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Rob Clark <robdclark@gmail.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: dri-devel@lists.freedesktop.org
Cc: linaro-mm-sig@lists.linaro.org
Cc: rostedt@goodmis.org
Cc: daniel@ffwll.ch
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/51C8038C.9000106@canonical.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
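The older-waits / younger-backs-off rule can be reduced to a comparison of acquire-context stamps. This is a hypothetical sketch, not the ww_mutex implementation: it assumes unique stamps where a smaller stamp means an older context, and it ignores counter wraparound.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical acquire context: smaller stamp = older transaction. */
struct sketch_acquire_ctx {
	unsigned long stamp;
};

/* What a task should do when 'holder' owns the lock it wants: the older
 * task waits; the younger one gets -EDEADLK and must drop every lock it
 * holds before retrying. */
static int sketch_on_contention(const struct sketch_acquire_ctx *self,
				const struct sketch_acquire_ctx *holder)
{
	if (self->stamp > holder->stamp)
		return -EDEADLK;    /* we are younger: back off */
	return 0;               /* we are older: wait for the holder */
}
```

Because the stamp order is total, two contexts can never both decide to wait on each other, which is what rules out the deadlock.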

Committed by Maarten Lankhorst
This will allow me to call functions that have multiple arguments if the fastpath fails. This is required to support ticket mutexes, because they need to be able to pass an extra argument to the fail function.

Originally I duplicated the functions by adding __mutex_fastpath_lock_retval_arg. This ended up being just a duplicate of the existing function, so a way to test whether the fastpath was taken ended up being better.

This also cleaned up the reservation mutex patch somewhat, by making it possible to call atomic_set instead of atomic_xchg, and by making it easier to detect if the wrong unlock function was previously used.

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: dri-devel@lists.freedesktop.org
Cc: linaro-mm-sig@lists.linaro.org
Cc: robclark@gmail.com
Cc: rostedt@goodmis.org
Cc: daniel@ffwll.ch
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20130620113105.4001.83929.stgit@patser
Signed-off-by: Ingo Molnar <mingo@kernel.org>
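The idea of a fastpath that reports its result, rather than calling a fixed fail function, can be sketched as follows. Everything here is hypothetical (names, the `slowpath_calls` field, and the omitted waiting logic); it only shows that once the fastpath returns a status, the caller is free to call a slowpath with whatever extra arguments it needs.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical fastpath: decrement the count (1 = unlocked) and report
 * 0 if the lock was taken, -1 if the caller must take the slow path. */
static int sketch_fastpath_lock_retval(atomic_int *count)
{
	return (atomic_fetch_sub(count, 1) - 1 < 0) ? -1 : 0;
}

/* Hypothetical extra argument that only the slow path needs. */
struct sketch_ctx { int slowpath_calls; };

static int sketch_slowpath(atomic_int *count, struct sketch_ctx *ctx)
{
	(void)count;                /* real waiting logic omitted */
	ctx->slowpath_calls++;
	return 0;
}

static int sketch_lock_ctx(atomic_int *count, struct sketch_ctx *ctx)
{
	if (sketch_fastpath_lock_retval(count) == 0)
		return 0;                               /* uncontended */
	return sketch_slowpath(count, ctx);             /* extra argument passed */
}
```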

- 19 Apr 2013 (4 commits)

Committed by Waiman Long
Linus suggested that probably all the supported architectures can allow a negative mutex count without incorrect behavior, so we can back out the architecture-specific change and allow the mutex count to go to any negative number. That should further reduce contention on non-x86 architectures.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Waiman Long <Waiman.Long@hp.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Chandramouleeswaran Aswin <aswin@hp.com>
Cc: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Norton Scott J <scott.norton@hp.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1366226594-5506-5-git-send-email-Waiman.Long@hp.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

Committed by Waiman Long
The current mutex spinning code (with the MUTEX_SPIN_ON_OWNER option turned on) allows multiple tasks to spin on a single mutex concurrently. A potential problem with the current approach is that when the mutex becomes available, all the spinning tasks will try to acquire the mutex more or less simultaneously. As a result, there will be a lot of cacheline bouncing, especially on systems with a large number of CPUs.

This patch tries to reduce this kind of contention by putting the mutex spinners into a queue so that only the first one in the queue will try to acquire the mutex. This will reduce contention and allow all the tasks to move forward faster.

The queuing of mutex spinners is done using an MCS lock based implementation, which will further reduce contention on the mutex cacheline compared with a similar ticket spinlock based implementation. This patch adds a new field to the mutex data structure for holding the MCS lock. This expands the mutex size by 8 bytes on 64-bit systems and 4 bytes on 32-bit systems. This overhead is avoided if the MUTEX_SPIN_ON_OWNER option is turned off.

The following table shows the jobs per minute (JPM) scalability data on an 8-node 80-core Westmere box with a 3.7.10 kernel. The numactl command is used to restrict the running of the fserver workloads to 1/2/4/8 nodes with hyperthreading off.

+-----------------+-----------+-----------+-------------+----------+
|  Configuration  | Mean JPM  | Mean JPM  |  Mean JPM   | % Change |
|                 | w/o patch |  patch 1  | patches 1&2 |  1->1&2  |
+-----------------+------------------------------------------------+
|                 |             User Range 1100 - 2000             |
+-----------------+------------------------------------------------+
| 8 nodes, HT off |  227972   |  227237   |   305043    |  +34.2%  |
| 4 nodes, HT off |  393503   |  381558   |   394650    |  +3.4%   |
| 2 nodes, HT off |  334957   |  325240   |   338853    |  +4.2%   |
| 1 node , HT off |  198141   |  197972   |   198075    |  +0.1%   |
+-----------------+------------------------------------------------+
|                 |             User Range 200 - 1000              |
+-----------------+------------------------------------------------+
| 8 nodes, HT off |  282325   |  312870   |   332185    |  +6.2%   |
| 4 nodes, HT off |  390698   |  378279   |   393419    |  +4.0%   |
| 2 nodes, HT off |  336986   |  326543   |   340260    |  +4.2%   |
| 1 node , HT off |  197588   |  197622   |   197582    |   0.0%   |
+-----------------+-----------+-----------+-------------+----------+

At the low user range of 10-100, the JPM differences were within +/-1%, so they are not that interesting.

The fserver workload uses mutex spinning extensively. With just the mutex change in the first patch, there is no noticeable change in performance; rather, there is a slight drop. This mutex spinning patch more than recovers the lost performance and shows a significant increase of +30% at high user load with the full 8 nodes. Similar improvements were also seen in a 3.8 kernel.

The table below shows the %time spent in different kernel functions, as reported by perf, when running the fserver workload at 1500 users with all 8 nodes.

+-----------------------+-----------+---------+-------------+
|       Function        |  % time   | % time  |   % time    |
|                       | w/o patch | patch 1 | patches 1&2 |
+-----------------------+-----------+---------+-------------+
| __read_lock_failed    |  34.96%   | 34.91%  |   29.14%    |
| __write_lock_failed   |  10.14%   | 10.68%  |    7.51%    |
| mutex_spin_on_owner   |   3.62%   |  3.42%  |    2.33%    |
| mspin_lock            |    N/A    |   N/A   |    9.90%    |
| __mutex_lock_slowpath |   1.46%   |  0.81%  |    0.14%    |
| _raw_spin_lock        |   2.25%   |  2.50%  |    1.10%    |
+-----------------------+-----------+---------+-------------+

The fserver workload on an 8-node system is dominated by contention in the read/write lock. Mutex contention also plays a role. With the first patch only, mutex contention is down (as shown by the __mutex_lock_slowpath figure), which helps a little bit; we saw only a few percent improvement with that.

By applying patch 2 as well, the single mutex_spin_on_owner figure is now split into an additional mspin_lock figure. The combined time increases from 3.42% to 12.23% (2.33% + 9.90%). It shows a great reduction in contention among the spinners, leading to a 30% improvement. The time ratio 9.90/2.33 ≈ 4.2 indicates that there are on average 4+ spinners waiting in the spin_lock loop for each spinner in the mutex_spin_on_owner loop. Contention in other locking functions also goes down by quite a lot.

The table below shows the performance change of both patches 1 & 2 over patch 1 alone in other AIM7 workloads (at 8 nodes, hyperthreading off).

+--------------+---------------+----------------+-----------------+
|   Workload   | mean % change | mean % change  | mean % change   |
|              | 10-100 users  | 200-1000 users | 1100-2000 users |
+--------------+---------------+----------------+-----------------+
| alltests     |     0.0%      |     -0.8%      |      +0.6%      |
| five_sec     |     -0.3%     |     +0.8%      |      +0.8%      |
| high_systime |     +0.4%     |     +2.4%      |      +2.1%      |
| new_fserver  |     +0.1%     |     +14.1%     |      +34.2%     |
| shared       |     -0.5%     |     -0.3%      |      -0.4%      |
| short        |     -1.7%     |     -9.8%      |      -8.3%      |
+--------------+---------------+----------------+-----------------+

The short workload is the only one that shows a decline in performance, probably due to the spinner locking and queuing overhead.

Signed-off-by: Waiman Long <Waiman.Long@hp.com>
Reviewed-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Chandramouleeswaran Aswin <aswin@hp.com>
Cc: Norton Scott J <scott.norton@hp.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1366226594-5506-4-git-send-email-Waiman.Long@hp.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

Committed by Waiman Long
In the __mutex_lock_common() function, an initial entry into the lock slow path will cause two atomic_xchg instructions to be issued. Together with the atomic decrement in the fast path, a total of three atomic read-modify-write instructions will be issued in rapid succession. This can cause a lot of cache bouncing when many tasks are trying to acquire the mutex at the same time.

This patch reduces the number of atomic_xchg instructions used by checking the counter value first before issuing the instruction. The atomic_read() function is just a simple memory read. The atomic_xchg() function, on the other hand, can be up to two orders of magnitude or even more costly than atomic_read(). By using atomic_read() to check the value first before calling atomic_xchg(), we can avoid a lot of unnecessary cache coherency traffic. The only downside of this change is that a task on the slow path will have a slightly lower chance of getting the mutex when competing with another task in the fast path.

The same is true for the atomic_cmpxchg() function in the mutex-spin-on-owner loop, so an atomic_read() is also performed before calling atomic_cmpxchg().

The mutex locking and unlocking code for the x86 architecture can allow any negative number to be used in the mutex count to indicate that some tasks are waiting for the mutex. I am not so sure whether that is the case for the other architectures, so the default is to avoid atomic_xchg() only if the count has already been set to -1. For x86, the check is modified to include all negative numbers to cover a larger case.

The following table shows the jobs per minute (JPM) scalability data on an 8-node 80-core Westmere box with a 3.7.10 kernel. The numactl command is used to restrict the running of the high_systime workloads to 1/2/4/8 nodes with hyperthreading on and off.

+-----------------+-----------+------------+----------+
|  Configuration  | Mean JPM  |  Mean JPM  | % Change |
|                 | w/o patch | with patch |          |
+-----------------+-----------------------------------+
|                 |      User Range 1100 - 2000       |
+-----------------+-----------------------------------+
| 8 nodes, HT on  |   36980   |   148590   | +301.8%  |
| 8 nodes, HT off |   42799   |   145011   | +238.8%  |
| 4 nodes, HT on  |   61318   |   118445   |  +51.1%  |
| 4 nodes, HT off |  158481   |   158592   |  +0.1%   |
| 2 nodes, HT on  |  180602   |   173967   |  -3.7%   |
| 2 nodes, HT off |  198409   |   198073   |  -0.2%   |
| 1 node , HT on  |  149042   |   147671   |  -0.9%   |
| 1 node , HT off |  126036   |   126533   |  +0.4%   |
+-----------------+-----------------------------------+
|                 |       User Range 200 - 1000       |
+-----------------+-----------------------------------+
| 8 nodes, HT on  |   41525   |   122349   | +194.6%  |
| 8 nodes, HT off |   49866   |   124032   | +148.7%  |
| 4 nodes, HT on  |   66409   |   106984   |  +61.1%  |
| 4 nodes, HT off |  119880   |   130508   |  +8.9%   |
| 2 nodes, HT on  |  138003   |   133948   |  -2.9%   |
| 2 nodes, HT off |  132792   |   131997   |  -0.6%   |
| 1 node , HT on  |  116593   |   115859   |  -0.6%   |
| 1 node , HT off |  104499   |   104597   |  +0.1%   |
+-----------------+-----------+------------+----------+

At the low user range of 10-100, the JPM differences were within +/-1%, so they are not that interesting.

An AIM7 benchmark run has a fairly large run-to-run variance due to the random nature of the subtests executed, so a difference of less than +/-5% may not be really significant.

This patch improves high_systime workload performance at 4 nodes and up by maintaining transaction rates without a significant drop-off at high node counts. The patch has practically no impact on 1- and 2-node systems.

The table below shows the percentage of time (as reported by perf record -a -s -g) spent in the __mutex_lock_slowpath() function by the high_systime workload at 1500 users for 2/4/8-node configurations with hyperthreading off.

+---------------+-----------------+------------------+---------+
| Configuration | %Time w/o patch | %Time with patch | %Change |
+---------------+-----------------+------------------+---------+
|    8 nodes    |     65.34%      |      0.69%       |  -99%   |
|    4 nodes    |      8.70%      |      1.02%       |  -88%   |
|    2 nodes    |      0.41%      |      0.32%       |  -22%   |
+---------------+-----------------+------------------+---------+

It is obvious that the dramatic performance improvement at 8 nodes was due to the drastic cut in the time spent within the __mutex_lock_slowpath() function.

The table below shows the improvements in other AIM7 workloads (at 8 nodes, hyperthreading off).

+--------------+---------------+----------------+-----------------+
|   Workload   | mean % change | mean % change  | mean % change   |
|              | 10-100 users  | 200-1000 users | 1100-2000 users |
+--------------+---------------+----------------+-----------------+
| alltests     |     +0.6%     |    +104.2%     |     +185.9%     |
| five_sec     |     +1.9%     |     +0.9%      |      +0.9%      |
| fserver      |     +1.4%     |     -7.7%      |      +5.1%      |
| new_fserver  |     -0.5%     |     +3.2%      |      +3.1%      |
| shared       |    +13.1%     |    +146.1%     |     +181.5%     |
| short        |     +7.4%     |     +5.0%      |      +4.2%      |
+--------------+---------------+----------------+-----------------+

Signed-off-by: Waiman Long <Waiman.Long@hp.com>
Reviewed-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Chandramouleeswaran Aswin <aswin@hp.com>
Cc: Norton: Scott J <scott.norton@hp.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1366226594-5506-3-git-send-email-Waiman.Long@hp.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
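The check-before-exchange technique from this commit (often called test-and-test-and-set) can be sketched with C11 atomics. The helper names and the `any_negative` switch are hypothetical: the switch selects between the generic "== -1" check and the x86-style "< 0" check described in the text. A plain load is cheap and can be served from a shared cacheline; only when the value suggests the read-modify-write could matter is the exchange or compare-exchange issued.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Count: 1 = unlocked, 0 = locked, negative = locked with waiters. */

/* Hypothetical slow-path step: skip the exchange when the count already
 * says "locked with waiters", avoiding an exclusive cacheline fetch. */
static bool sketch_mark_waiter_and_try(atomic_int *count, bool any_negative)
{
	int c = atomic_load_explicit(count, memory_order_relaxed);

	if (any_negative ? c < 0 : c == -1)
		return false;                        /* no exchange issued */
	return atomic_exchange(count, -1) == 1;  /* acquired if it was free */
}

/* Same idea for the spin loop's cmpxchg: only try when it can succeed. */
static bool sketch_spin_trylock(atomic_int *count)
{
	int expected = 1;

	return atomic_load_explicit(count, memory_order_relaxed) == 1 &&
	       atomic_compare_exchange_strong(count, &expected, 0);
}
```

The trade-off matches the one the commit names: the extra load can lose a race against a fast-path task, but it removes most of the cache coherency traffic under contention.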

Committed by Waiman Long
As mentioned by Ingo, the SCHED_FEAT_OWNER_SPIN scheduler feature bit was really just an early hack to make with/without mutex-spinning testable, so it is no longer necessary.

This patch removes the SCHED_FEAT_OWNER_SPIN feature bit and moves the mutex spinning code from kernel/sched/core.c back to kernel/mutex.c, where it belongs.

Signed-off-by: Waiman Long <Waiman.Long@hp.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Chandramouleeswaran Aswin <aswin@hp.com>
Cc: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Norton Scott J <scott.norton@hp.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1366226594-5506-2-git-send-email-Waiman.Long@hp.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

- 08 Feb 2013 (1 commit)

Committed by Clark Williams
Move rt scheduler definitions out of include/linux/sched.h into the new file include/linux/sched/rt.h.

Signed-off-by: Clark Williams <williams@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20130207094707.7b9f825f@riff.lan
Signed-off-by: Ingo Molnar <mingo@kernel.org>

- 01 Mar 2012 (1 commit)

Committed by Thomas Gleixner
Coccinelle based conversion.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-24swm5zut3h9c4a6s46x8rws@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>

- 31 Oct 2011 (1 commit)

Committed by Paul Gortmaker
The changed files were only including linux/module.h for the EXPORT_SYMBOL infrastructure, and nothing else. Revector them onto the isolated export header for faster compile times.

Nothing to see here but a whole lot of instances of:

-#include <linux/module.h>
+#include <linux/export.h>

This commit only changes the kernel dir; next targets will probably be mm, fs, the arch dirs, etc.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>

- 25 May 2011 (1 commit)

Committed by Peter Zijlstra
In order to convert i_mmap_lock to a mutex, we need a mutex equivalent to spin_lock_nest_lock(), so provide the mutex_lock_nest_lock() annotation.

As with spin_lock_nest_lock(), mutex_lock_nest_lock() allows annotation of the locking pattern where an outer lock serializes the acquisition order of nested locks. That is, if every time you lock multiple locks A, say A1 and A2, you first acquire N, then the order of acquiring A1 and A2 is irrelevant.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Miller <davem@davemloft.net>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Tony Luck <tony.luck@intel.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- 24 Apr 2011: 1 commit

Committed by Jonathan Corbet
Neil Brown pointed out that lock_depth somehow escaped the BKL removal work. Let's get rid of it now.

Note that the perf scripting utilities still have a bunch of code for dealing with common_lock_depth in tracepoints; I have left that in place in case anybody wants to use that code with older kernels.

Suggested-by: Neil Brown <neilb@suse.de>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/20110422111910.456c0e84@bike.lwn.net
Signed-off-by: Ingo Molnar <mingo@elte.hu>
- 14 Apr 2011: 1 commit

Committed by Peter Zijlstra
Since we now have p->on_cpu unconditionally available, use it to re-implement mutex_spin_on_owner.

Requested-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frank Rowand <frank.rowand@am.sony.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20110405152728.826338173@chello.nl
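A minimal userspace model of the spin-on-owner idea, assuming C11 atomics: keep spinning only while the same task still owns the lock and its `on_cpu` flag says it is running. `struct task_model`, `struct mutex_model`, `owner_running()` and the `need_resched` callback are hypothetical stand-ins. The kernel version also relies on RCU to keep the owner's task_struct valid while it is being inspected, which this sketch omits.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for task_struct: only the on_cpu flag matters here. */
struct task_model {
    atomic_bool on_cpu;
};

/* Hypothetical stand-in for struct mutex: owner is NULL when unlocked. */
struct mutex_model {
    _Atomic(struct task_model *) owner;
};

/* True while 'owner' still holds the lock and is running on a CPU. */
static bool owner_running(struct mutex_model *lock, struct task_model *owner)
{
    if (atomic_load(&lock->owner) != owner)
        return false;
    return atomic_load(&owner->on_cpu);
}

/*
 * Spin while the owner is running. Stop when it releases the lock, hands
 * it to another task, goes to sleep, or we need to reschedule. Returns
 * true only if the lock ended up unowned, i.e. it is worth trying to take.
 */
static bool spin_on_owner(struct mutex_model *lock, struct task_model *owner,
                          bool (*need_resched)(void))
{
    while (owner_running(lock, owner)) {
        if (need_resched())
            break;
        /* the kernel relaxes the CPU here (cpu_relax() or an arch variant) */
    }
    return atomic_load(&lock->owner) == NULL;
}
```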
- 31 Mar 2011: 1 commit

Committed by Lucas De Marchi
Fixes generated by 'codespell' and manually reviewed.

Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
- 05 Jan 2011: 1 commit

Committed by Gerald Schaefer
The spinning mutex implementation uses cpu_relax() in busy loops as a compiler barrier. Depending on the architecture, cpu_relax() may do more than needed in these specific mutex spin loops. On System z we also give up the time slice of the virtual cpu in cpu_relax(), which prevents effective spinning on the mutex.

This patch replaces cpu_relax() in the spinning mutex code with arch_mutex_cpu_relax(), which can be defined by each architecture that selects HAVE_ARCH_MUTEX_CPU_RELAX. The default is still cpu_relax(), so this patch should not affect architectures other than System z for now.

Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1290437256.7455.4.camel@thinkpad>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
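The fallback described above follows the usual "override it if the architecture provides one" preprocessor pattern. A sketch, assuming a hypothetical counting `cpu_relax()` so the default path can be observed in userspace; the barrier-only override in the `#else` branch is an illustrative guess at what an architecture such as System z might supply, not its actual definition.

```c
/* Hypothetical generic relax hint; counts calls so the default is observable. */
static unsigned long generic_relax_calls;
static void cpu_relax(void) { generic_relax_calls++; }

#ifndef CONFIG_HAVE_ARCH_MUTEX_CPU_RELAX
/* Default: architectures that do not opt in keep using cpu_relax(). */
#define arch_mutex_cpu_relax() cpu_relax()
#else
/* An opting-in architecture could use a bare compiler barrier instead
 * (illustrative, GCC/Clang inline-asm syntax). */
#define arch_mutex_cpu_relax() __asm__ __volatile__("" ::: "memory")
#endif

/* Stand-in for the body of a mutex spin loop. */
static void spin_a_little(int iterations)
{
    for (int i = 0; i < iterations; i++)
        arch_mutex_cpu_relax();
}
```

Because nothing in this build defines `CONFIG_HAVE_ARCH_MUTEX_CPU_RELAX`, every spin iteration goes through the generic `cpu_relax()`.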
- 26 Nov 2010: 1 commit

Committed by Gerald Schaefer
The spinning mutex implementation uses cpu_relax() in busy loops as a compiler barrier. Depending on the architecture, cpu_relax() may do more than needed in these specific mutex spin loops. On System z we also give up the time slice of the virtual cpu in cpu_relax(), which prevents effective spinning on the mutex.

This patch replaces cpu_relax() in the spinning mutex code with arch_mutex_cpu_relax(), which can be defined by each architecture that selects HAVE_ARCH_MUTEX_CPU_RELAX. The default is still cpu_relax(), so this patch should not affect architectures other than System z for now.

Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1290437256.7455.4.camel@thinkpad>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
- 03 Sep 2010: 1 commit

Committed by Randy Dunlap
Fix kernel-doc notation in linux/mutex.h and kernel/mutex.c, then add these 2 files to the kernel-locking docbook as the Mutex API reference chapter. Add one API function to mutex-design.txt and correct a typo in that file.

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
LKML-Reference: <20100902154816.6cc2f9ad.randy.dunlap@oracle.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
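For reference, a kernel-doc comment opens with `/**`, gives the function name and a one-line summary, then lists each parameter as `@name:` followed by a description. A sketch of that notation on a hypothetical userspace helper: `demo_mutex_trylock()` is an illustration that wraps POSIX `pthread_mutex_trylock()`, not the kernel's mutex_trylock().

```c
#include <pthread.h>

/**
 * demo_mutex_trylock - try to acquire the mutex, without waiting
 * @lock: the mutex to be acquired
 *
 * Attempts to take @lock without blocking the caller.
 *
 * Return: 1 if the mutex has been acquired successfully, 0 on contention.
 */
static int demo_mutex_trylock(pthread_mutex_t *lock)
{
	/* pthread_mutex_trylock() returns 0 on success, EBUSY if already locked */
	return pthread_mutex_trylock(lock) == 0;
}
```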
- 19 May 2010: 1 commit

Committed by Tony Breeds
Currently, we can hit a nasty case with optimistic spinning on mutexes:

    CPU A tries to take a mutex while holding the BKL
    CPU B tries to take the BKL while holding the mutex

This looks like an AB-BA scenario, but in practice it is allowed and happens because the BKL is automatically released on schedule().

In that case, the optimistic spinning code can get us into a situation where, instead of going to sleep, A will spin waiting for B, who is spinning waiting for A, and the only way out of that loop is the need_resched() test in mutex_spin_on_owner().

This patch fixes it by completely disabling spinning if we own the BKL. This adds one more detail to the extensive list of reasons why it's a bad idea for kernel code to be holding the BKL.

Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: <stable@kernel.org>
LKML-Reference: <20100519054636.GC12389@ozlabs.org>
[ added an unlikely() attribute to the branch ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
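The fix amounts to an early bail-out before spinning. A minimal sketch, assuming a hypothetical `struct task_model` whose `lock_depth` follows the old convention of -1 meaning "BKL not held"; `may_spin()` is an illustrative name, and the real check lived in the kernel's spinning path and may differ in detail.

```c
#include <stdbool.h>
#include <stddef.h>

#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical task model: lock_depth is -1 unless the task holds the BKL. */
struct task_model {
    int lock_depth;
    bool on_cpu;
};

/* Decide whether 'self' may optimistically spin on a mutex held by 'owner'. */
static bool may_spin(const struct task_model *self, const struct task_model *owner)
{
    /*
     * If we hold the BKL, the owner may be spinning on it, and we only
     * drop it when we schedule(). Spinning here could loop A<->B.
     */
    if (unlikely(self->lock_depth >= 0))
        return false;
    return owner != NULL && owner->on_cpu;
}
```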
- 03 Dec 2009: 1 commit

Committed by Frederic Weisbecker
Introduce CONFIG_MUTEX_SPIN_ON_OWNER so that we can centralize in a single place the conditions that determine its definition and use.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <1259783357-8542-1-git-send-regression-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
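Centralizing the decision means use sites test one symbol instead of repeating the conditions. The sketch below models this with the C preprocessor; in the kernel the symbol comes from Kconfig, and the specific inputs used here (SMP, no mutex debugging, no architecture default of non-spinning mutexes) are assumptions for illustration, with `CONFIG_SMP` defined locally to simulate an SMP build. `mutex_spinning_enabled()` is a hypothetical use site.

```c
/* Pretend this is an SMP build without mutex debugging (assumption). */
#define CONFIG_SMP 1

/* One place derives the symbol from all the conditions (inputs assumed). */
#if defined(CONFIG_SMP) && !defined(CONFIG_DEBUG_MUTEXES) && \
    !defined(CONFIG_HAVE_DEFAULT_NO_SPIN_MUTEXES)
#define CONFIG_MUTEX_SPIN_ON_OWNER 1
#endif

/* Use sites only test the derived symbol. */
static int mutex_spinning_enabled(void)
{
#ifdef CONFIG_MUTEX_SPIN_ON_OWNER
    return 1;
#else
    return 0;
#endif
}
```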
- 30 Apr 2009: 1 commit

Committed by Andrew Morton
    include/linux/mutex.h:136: warning: 'mutex_lock' declared inline after being called
    include/linux/mutex.h:136: warning: previous declaration of 'mutex_lock' was here

Uninline it.

[ Impact: clean up and uninline, address compiler warning ]

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Eric Paris <eparis@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <200904292318.n3TNIsi6028340@imap1.linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
- 21 Apr 2009: 1 commit

Committed by Peter Zijlstra
Lai Jiangshan's patch reminded me that I promised Nick to remove that extra call overhead in schedule().

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20090313112300.927414207@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
- 10 Apr 2009: 1 commit

Committed by Heiko Carstens
Impact: performance regression fix for s390

The adaptive spinning mutexes will not always do what one would expect on virtualized architectures like s390. Especially the cpu_relax() loop in mutex_spin_on_owner might hurt if the mutex-holding cpu has been scheduled away by the hypervisor. We would end up in a cpu_relax() loop when there is no chance that the state of the mutex changes until the target cpu has been scheduled again by the hypervisor.

For that reason we should change the default behaviour to no-spin on s390.

We do have an instruction which allows us to yield the current cpu in favour of a different target cpu. We also have an instruction which allows us to figure out if the target cpu is physically backed. However, we need to do some performance tests before we can come up with a solution that will do the right thing on s390.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
LKML-Reference: <20090409184834.7a0df7b2@osiris.boeblingen.de.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
- 06 Apr 2009: 1 commit

Committed by H. Peter Anvin
Impact: build fix

mutex_lock() was defined inline in kernel/mutex.c, but was not declared inline in <linux/mutex.h>. This didn't cause a problem until checkin 3a2d367d9aabac486ac4444c6c7ec7a1dab16267 added the atomic_dec_and_mutex_lock() inline in between declaration and definition.

This broke building with CONFIG_ALLOW_WARNINGS=n, e.g. make allnoconfig.

Neither in the source code nor in the allnoconfig binary output can I find any internal references to mutex_lock() in kernel/mutex.c, so presumably this "inline" is now-useless legacy.

Cc: Eric Paris <eparis@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Orig-LKML-Reference: <tip-3a2d367d9aabac486ac4444c6c7ec7a1dab16267@git.kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
- 15 Jan 2009: 3 commits

Committed by Chris Mason
Spin more aggressively. This is less fair but also markedly faster.

The numbers:

* dbench 50 (higher is better):
    spin         1282MB/s
    v10           548MB/s
    v10 no wait  1868MB/s

* 4k creates (numbers in files/second, higher is better):
    spin         avg 200.60  median 193.20  std 19.71  high 305.93  low 186.82
    v10          avg 180.94  median 175.28  std 13.91  high 229.31  low 168.73
    v10 no wait  avg 232.18  median 222.38  std 22.91  high 314.66  low 209.12

* File stats (numbers in seconds, lower is better):
    spin         2.27s
    v10          5.1s
    v10 no wait  1.6s

( The source changes are smaller than they look; I just moved the need_resched checks in __mutex_lock_common after the cmpxchg. )

Signed-off-by: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Committed by Peter Zijlstra
Change mutex contention behaviour such that it will sometimes busy wait on acquisition, moving its behaviour closer to that of spinlocks.

This concept got ported to mainline from the -rt tree, where it was originally implemented for rtmutexes by Steven Rostedt, based on work by Gregory Haskins.

Testing with Ingo's test-mutex application (http://lkml.org/lkml/2006/1/8/50) gave a 345% boost for VFS scalability on my testbox:

    # ./test-mutex-shm V 16 10 | grep "^avg ops"
    avg ops/sec: 296604

    # ./test-mutex-shm V 16 10 | grep "^avg ops"
    avg ops/sec: 85870

The key criterion for the busy wait is that the lock owner has to be running on a (different) cpu. The idea is that as long as the owner is running, there is a fair chance it'll release the lock soon, and thus we'll be better off spinning instead of blocking/scheduling.

Since regular mutexes (as opposed to rtmutexes) do not atomically track the owner, we add the owner in a non-atomic fashion and deal with the races in the slowpath.

Furthermore, to ease the testing of the performance impact of this new code, there is a means to disable this behaviour at runtime (without having to reboot the system) when scheduler debugging is enabled (CONFIG_SCHED_DEBUG=y), by issuing the following command:

    # echo NO_OWNER_SPIN > /debug/sched_features

This command re-enables spinning (this is also the default):

    # echo OWNER_SPIN > /debug/sched_features

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
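The adaptive-spinning decision can be sketched as a loop that spins only while a recorded owner is running, tries a compare-and-swap on the lock count, and otherwise tells the caller to fall back to sleeping. This is a simplified userspace model, assuming C11 atomics and hypothetical `struct task_model`/`struct mutex_model` types, an `optimistic_spin()` helper and a `need_resched` callback. Following the "spin more aggressively" change in this log, the cmpxchg is attempted before the need_resched() check.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct task_model { atomic_bool on_cpu; };      /* hypothetical task */

struct mutex_model {
    atomic_int count;                            /* 1: unlocked, 0: locked */
    _Atomic(struct task_model *) owner;          /* set after count, not atomically with it */
};

/* Returns true if the lock was acquired by spinning, false if the caller
 * should take the sleeping slow path instead. */
static bool optimistic_spin(struct mutex_model *lock, struct task_model *self,
                            bool (*need_resched)(void))
{
    for (;;) {
        struct task_model *owner = atomic_load(&lock->owner);

        if (owner) {
            /* Spin while this owner still holds the lock and is on a CPU. */
            while (atomic_load(&lock->owner) == owner &&
                   atomic_load(&owner->on_cpu)) {
                if (need_resched())
                    return false;
            }
            /* Owner went to sleep or handed the lock on: block instead. */
            if (atomic_load(&lock->owner) != NULL)
                return false;
        }

        /* Try to grab the lock first ... */
        int expected = 1;
        if (atomic_compare_exchange_strong(&lock->count, &expected, 0)) {
            atomic_store(&lock->owner, self);
            return true;
        }

        /* ... and only then check whether we should stop spinning. The
         * owner may be NULL briefly while another task is between taking
         * count and recording itself as owner. */
        if (!owner && need_resched())
            return false;
    }
}
```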
Committed by Peter Zijlstra
The problem is that dropping the spinlock right before schedule is a voluntary preemption point and can cause a schedule, right after which we schedule again.

Fix this inefficiency by keeping preemption disabled until we schedule: do this by explicitly disabling preemption and providing a schedule() variant that assumes preemption is already disabled.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
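To see why keeping preemption disabled saves a trip through the scheduler, here is a toy single-CPU model that only counts scheduler entries. Every name in it (`model_spin_unlock()`, `model_preempt_disable()`, `wait_naive()`, `wait_fixed()` and so on) is hypothetical; it mimics the idea of a preemption point on unlock rather than reproducing the kernel's code.

```c
#include <stdbool.h>

/* Toy single-CPU state (hypothetical). */
static int preempt_count;        /* > 0 means preemption is disabled */
static bool need_resched_flag;   /* set when another task should run */
static int schedule_calls;       /* how often we entered the scheduler */

static void model_schedule(void)
{
    schedule_calls++;
    need_resched_flag = false;
}

static void model_preempt_disable(void) { preempt_count++; }

/* Re-enable preemption without acting on a pending reschedule. */
static void model_preempt_enable_no_resched(void) { preempt_count--; }

/* Taking a spinlock disables preemption ... */
static void model_spin_lock(void) { preempt_count++; }

/* ... and dropping it is a voluntary preemption point. */
static void model_spin_unlock(void)
{
    if (--preempt_count == 0 && need_resched_flag)
        model_schedule();
}

/* Old behaviour: the unlock may preempt us, then we schedule anyway. */
static void wait_naive(void)
{
    model_spin_lock();
    need_resched_flag = true;   /* e.g. a wakeup arrived meanwhile */
    model_spin_unlock();        /* first scheduler entry */
    model_schedule();           /* second scheduler entry */
}

/* New behaviour: keep preemption off across the unlock. */
static void wait_fixed(void)
{
    model_preempt_disable();
    model_spin_lock();
    need_resched_flag = true;
    model_spin_unlock();        /* count stays > 0: no preemption here */
    model_schedule();           /* the only scheduler entry */
    model_preempt_enable_no_resched();
}
```

In this model `wait_naive()` enters the scheduler twice and `wait_fixed()` once, with the preemption count balanced back to zero in both cases.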