1. 28 Feb, 2017 2 commits
    • M
      ipc/sem: add hysteresis · 9de5ab8a
      Manfred Spraul committed
      sysv sem has two lock modes: One with per-semaphore locks, one lock mode
      with a single global lock for the whole array.  When switching from the
      per-semaphore locks to the global lock, all per-semaphore locks must be
      scanned for ongoing operations.
      
      The patch adds a hysteresis for switching from the global lock to the
      per semaphore locks.  This reduces how often the per-semaphore locks
      must be scanned.
      
      Compared to the initial patch, this is a simplified solution: Setting
      USE_GLOBAL_LOCK_HYSTERESIS to 1 restores the current behavior.
      
      In theory, a workload with exactly 10 simple sops and then one complex
      op now scales a bit worse, but this is pure theory: If there is
      concurrency, then it won't be exactly 10:1:10:1:10:1:...  If there is no
      concurrency, then there is no need for scalability.
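
The hysteresis described above can be modeled with a small single-threaded sketch. This is not the kernel code: `sem_array_model`, `simple_op`, `complex_op` and `count_scans` are hypothetical names, locking is omitted entirely, and the only value taken from the commit message is the 10 implied by the "10 simple sops" example.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed value, taken from the "10 simple sops" example in the message.
 * Per the message, a value of 1 would restore the previous behavior. */
#define USE_GLOBAL_LOCK_HYSTERESIS 10

/* Hypothetical stand-in for the per-array state; not the kernel's struct. */
struct sem_array_model {
    unsigned int use_global_lock; /* > 0: array stays in global-lock mode */
    unsigned int scans;           /* scans of all per-semaphore locks so far */
};

/* A complex operation needs the global lock. Switching from per-semaphore
 * mode to global mode costs one scan of all per-semaphore locks. */
static void complex_op(struct sem_array_model *sma)
{
    if (sma->use_global_lock == 0)
        sma->scans++;
    sma->use_global_lock = USE_GLOBAL_LOCK_HYSTERESIS;
}

/* A simple operation. Returns true if it had to run in global-lock mode. */
static bool simple_op(struct sem_array_model *sma)
{
    if (sma->use_global_lock == 0)
        return false;           /* fast path: per-semaphore lock only */
    sma->use_global_lock--;     /* count down toward per-semaphore mode */
    return true;
}

/* Run `rounds` rounds of `simple_per_round` simple ops followed by one
 * complex op, and report how many scans were needed in total. */
static unsigned int count_scans(unsigned int simple_per_round,
                                unsigned int rounds)
{
    struct sem_array_model sma = { 0, 0 };

    assert(rounds > 0);
    for (unsigned int r = 0; r < rounds; r++) {
        for (unsigned int i = 0; i < simple_per_round; i++)
            simple_op(&sma);
        complex_op(&sma);
    }
    return sma.scans;
}
```

In this model, `count_scans(5, 100)` needs a single scan because the counter never reaches zero between complex operations, while `count_scans(10, 100)` hits the theoretical worst case from the paragraph above: every simple operation in later rounds runs in global mode and each complex operation still pays for a scan.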
      
      Link: http://lkml.kernel.org/r/1476851896-3590-3-git-send-email-manfred@colorfullife.com
      Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: <1vier1@web.de>
      Cc: kernel test robot <xiaolong.ye@intel.com>
      Cc: <felixh@informatik.uni-bremen.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      9de5ab8a
    • M
      ipc/sem.c: avoid using spin_unlock_wait() · 27d7be18
      Manfred Spraul committed
      a) The ACQUIRE in spin_lock() applies to the read, not to the store, at
         least for powerpc.  This forces an smp_mb() to be added to the fast path.
      
      b) The memory barrier provided by spin_unlock_wait() is right now arch
         dependent.
      
      Therefore: Use spin_lock()/spin_unlock() instead of spin_unlock_wait().
      
      Advantage: faster single op semop() calls, observed +8.9% on x86.  (The
      other solution would be arch dependencies in ipc/sem.)
      
      Disadvantage: slower complex op semop calls, if (and only if) there are
      no sleeping operations.
      
      The next patch adds hysteresis, which further reduces the probability
      that the slow path is used.
      
      Link: http://lkml.kernel.org/r/1476851896-3590-2-git-send-email-manfred@colorfullife.com
      Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: <1vier1@web.de>
      Cc: kernel test robot <xiaolong.ye@intel.com>
      Cc: <felixh@informatik.uni-bremen.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      27d7be18
  2. 25 Feb, 2017 2 commits
  3. 11 Jan, 2017 1 commit
  4. 15 Dec, 2016 9 commits
  5. 21 Nov, 2016 1 commit
  6. 28 Oct, 2016 1 commit
  7. 12 Oct, 2016 6 commits
    • N
      ipc/sem.c: add cond_resched in exit_sme · 2a1613a5
      Nikolay Borisov committed
      In a CONFIG_PREEMPT=n kernel, a softlockup was observed while running the
      for loop in exit_sem.  Apparently it's possible for the loop to take quite
      a long time and it doesn't have a scheduling point in it.  Since the code
      is executing under an rcu read section this may also cause rcu stalls,
      which in turn block synchronize_rcu operations, which more or less
      de-stabilises the whole system.
      
      Fix this by introducing a cond_resched() at the beginning of the loop.
      
      So this patch fixes the following:
      
        NMI watchdog: BUG: soft lockup - CPU#10 stuck for 23s! [httpd:18119]
        CPU: 10 PID: 18119 Comm: httpd Tainted: G           O    4.4.20-clouder2 #6
        Hardware name: Supermicro X10DRi/X10DRi, BIOS 1.1 04/14/2015
        task: ffff88348d695280 ti: ffff881c95550000 task.ti: ffff881c95550000
        RIP: 0010:[<ffffffff81614bc7>]  [<ffffffff81614bc7>] _raw_spin_lock+0x17/0x30
        RSP: 0018:ffff881c95553e40  EFLAGS: 00000246
        RAX: 0000000000000000 RBX: ffff883161b1eea8 RCX: 000000000000000d
        RDX: 0000000000000001 RSI: 000000000000000e RDI: ffff883161b1eea4
        RBP: ffff881c95553ea0 R08: ffff881c95553e68 R09: ffff883fef376f88
        R10: ffff881fffb58c20 R11: ffffea0072556600 R12: ffff883161b1eea0
        R13: ffff88348d695280 R14: ffff883dec427000 R15: ffff8831621672a0
        FS:  0000000000000000(0000) GS:ffff881fffb40000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 00007f3b3723e020 CR3: 0000000001c0a000 CR4: 00000000001406e0
        Call Trace:
          ? exit_sem+0x7c/0x280
          do_exit+0x338/0xb40
          do_group_exit+0x43/0xd0
          SyS_exit_group+0x14/0x20
          entry_SYSCALL_64_fastpath+0x16/0x6e
      
      Link: http://lkml.kernel.org/r/1475154992-6363-1-git-send-email-kernel@kyup.com
      Signed-off-by: Nikolay Borisov <kernel@kyup.com>
      Cc: Herton R. Krzesinski <herton@redhat.com>
      Cc: Fabian Frederick <fabf@skynet.be>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      2a1613a5
    • D
      ipc/msg: avoid waking sender upon full queue · ed27f912
      Davidlohr Bueso committed
      Blocked tasks queued in q_senders waiting for their message to fit in the
      queue are blindly awoken every time we think there's a remote chance this
      might happen.  This could cause numerous (and expensive -- thundering
      herd-ish) bogus wakeups if the queue is still really full.  Adding to the
      scheduling cost/overhead, there's also the fact that we need to take the
      ipc object lock and requeue ourselves in the q_senders list.
      
      By keeping track of the blocked sender's message size, we can know
      beforehand whether the wakeup ought to occur or not.  Otherwise, to
      maintain the current wakeup order we just move it to the tail.  This is
      exactly what occurs right now if the sender needs to go back to sleep.
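
The decision described above can be sketched in plain C. This is a simplified model, not the kernel's implementation: `queue_model`, `msg_fits`, `pick_senders_to_wake` and `demo_wakeups` are hypothetical names, and checking only the byte limit is an assumption made for illustration. Senders that are not woken simply stay queued in their relative order, which stands in for the move-to-tail step.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a message queue's byte accounting. */
struct queue_model {
    size_t cbytes; /* bytes currently in the queue */
    size_t qbytes; /* maximum bytes allowed in the queue */
};

/* Assumed criterion: the message fits if it stays within the byte limit. */
static bool msg_fits(const struct queue_model *q, size_t msgsz)
{
    return msgsz + q->cbytes <= q->qbytes;
}

/* Mark which blocked senders are worth waking, given the size of the
 * message each one is trying to send. Returns the number marked.
 * Each sender is checked against the current queue state on its own;
 * a woken sender is still expected to re-check before enqueueing. */
static size_t pick_senders_to_wake(const struct queue_model *q,
                                   const size_t *msgsz, size_t n,
                                   bool *wake)
{
    size_t woken = 0;

    for (size_t i = 0; i < n; i++) {
        wake[i] = msg_fits(q, msgsz[i]);
        if (wake[i])
            woken++;
    }
    return woken;
}

/* Small demo: 900 of 1024 bytes used, four blocked senders. */
static size_t demo_wakeups(void)
{
    const struct queue_model q = { 900, 1024 };
    const size_t msgsz[] = { 64, 512, 100, 200 };
    bool wake[4];

    return pick_senders_to_wake(&q, msgsz, 4, wake);
}
```

In the demo, only the 64-byte and 100-byte senders are marked for wakeup; the 512-byte and 200-byte senders would have gone straight back to sleep, so waking them would be a bogus wakeup.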
      
      The case of EIDRM is left completely untouched, as we need to wakeup all
      the tasks, and shouldn't be playing games in the first place.
      
      This patch was seen to save on the 'msgctl10' ltp testcase ~15% in context
      switches (avg out of ten runs).  Although these tests are really about
      functionality (as opposed to performance), it does show the direct
      benefits of the optimization.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Link: http://lkml.kernel.org/r/1469748819-19484-6-git-send-email-dave@stgolabs.net
      Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ed27f912
    • D
      ipc/msg: make ss_wakeup() kill arg boolean · d0d6a2a9
      Davidlohr Bueso committed
      ... 'tis annoying.
      
      Link: http://lkml.kernel.org/r/1469748819-19484-4-git-send-email-dave@stgolabs.net
      Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d0d6a2a9
    • D
      ipc/msg: batch queue sender wakeups · e3658538
      Davidlohr Bueso committed
      Currently the use of wake_qs in sysv msg queues is only for the receiver
      tasks that are blocked on the queue.  But blocked sender tasks (due to
      queue size constraints) are still awoken with the ipc object lock held,
      which can be a problem particularly for small sized queues and far from
      graceful for -rt (just like it was for the receiver side).
      
      The paths that actually wakeup a sender are obviously related to when we
      are either getting rid of the queue or after (some) space is freed-up
      after a receiver takes the msg (msgrcv).  Furthermore, with the exception
      of msgrcv, we can always piggy-back on expunge_all that has its own tasks
      lined-up for waking.  Finally, upon unlinking the message, it should be no
      problem delaying the wakeups a bit until after we've released the lock.
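
The ordering this relies on, queueing tasks while the lock is held and waking them only after it is released, can be illustrated with a single-threaded model. Everything here is hypothetical: the `*_model` names, the boolean standing in for the ipc object lock, and the integer task ids are illustration only and are not the kernel's wake_q API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_WAITERS 8

/* Hypothetical stand-in for a wake queue: a list of task ids to wake later. */
struct wake_q_model {
    int task_ids[MAX_WAITERS];
    size_t count;
};

static bool lock_held;          /* models the ipc object lock */
static size_t woken_under_lock; /* wakeups issued while the lock was held */
static size_t woken_total;      /* all wakeups issued */

/* Record a task for a later wakeup; cheap enough to do under the lock. */
static void wake_q_add_model(struct wake_q_model *q, int task_id)
{
    assert(q->count < MAX_WAITERS);
    q->task_ids[q->count++] = task_id;
}

/* Stand-in for the actual (expensive) wakeup of one task. */
static void wake_up_task_model(int task_id)
{
    (void)task_id;
    if (lock_held)
        woken_under_lock++;
    woken_total++;
}

/* Wake every queued task and empty the queue. */
static void wake_up_q_model(struct wake_q_model *q)
{
    for (size_t i = 0; i < q->count; i++)
        wake_up_task_model(q->task_ids[i]);
    q->count = 0;
}

/* Space was freed: collect blocked senders under the lock, wake after. */
static size_t free_space_and_wake(const int *blocked, size_t n)
{
    struct wake_q_model q = { {0}, 0 };

    lock_held = true;
    for (size_t i = 0; i < n; i++)
        wake_q_add_model(&q, blocked[i]);
    lock_held = false;          /* lock dropped before any wakeup runs */

    wake_up_q_model(&q);
    return woken_total;
}

/* Demo with three blocked senders. */
static size_t demo_batched_wakeups(void)
{
    const int blocked[] = { 101, 102, 103 };

    woken_total = 0;
    woken_under_lock = 0;
    return free_space_and_wake(blocked, 3);
}
```

After `demo_batched_wakeups()` all three senders have been woken and `woken_under_lock` is still zero, which is the property the batching is after: lock hold time no longer includes the cost of the wakeups.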
      
      Link: http://lkml.kernel.org/r/1469748819-19484-3-git-send-email-dave@stgolabs.net
      Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e3658538
    • S
      ipc/msg: implement lockless pipelined wakeups · ee51636c
      Sebastian Andrzej Siewior committed
      This patch moves the wake_up_process() invocation so it is not done under
      the ipc global lock by making use of a lockless wake_q.  With this change,
      the waiter is woken up once the message has been assigned and it does not
      need to loop on SMP if the message points to NULL.  In the signal case we
      still need to check the pointer under the lock to verify the state.
      
      This change should also avoid the introduction of preempt_disable() in -RT
      which avoids a busy-loop that polls for the NULL -> !NULL change if the
      waiter has a higher priority compared to the waker.
      
      By making use of wake_qs, the logic of sysv msg queues is greatly
      simplified (and very well suited as we can batch lockless wakeups),
      particularly around the lockless receive algorithm.
      
      This has been tested with Manfred's pmsg-shared tool on an "AMD A10-7800
      Radeon R7, 12 Compute Cores 4C+8G":
      
      test             |   before   |   after    | diff
      -----------------|------------|------------|----------
      pmsg-shared 8 60 | 19,347,422 | 30,442,191 | + ~57.34 %
      pmsg-shared 4 60 | 21,367,197 | 35,743,458 | + ~67.28 %
      pmsg-shared 2 60 | 22,884,224 | 24,278,200 | +  ~6.09 %
      
      Link: http://lkml.kernel.org/r/1469748819-19484-2-git-send-email-dave@stgolabs.net
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ee51636c
    • M
      ipc/sem.c: fix complex_count vs. simple op race · 5864a2fd
      Manfred Spraul committed
      Commit 6d07b68c ("ipc/sem.c: optimize sem_lock()") introduced a
      race:
      
      sem_lock has a fast path that allows parallel simple operations.
      There are two reasons why a simple operation cannot run in parallel:
       - a non-simple operation is ongoing (sma->sem_perm.lock held)
       - a complex operation is sleeping (sma->complex_count != 0)
      
      As both facts are stored independently, a thread can bypass the current
      checks by sleeping in the right positions.  See below for more details
      (or kernel bugzilla 105651).
      
      The patch fixes that by creating one variable (complex_mode)
      that tracks both reasons why parallel operations are not possible.
      
      The patch also updates stale documentation regarding the locking.
      
      With regard to stable kernels:
      The patch is required for all kernels that include the
      commit 6d07b68c ("ipc/sem.c: optimize sem_lock()") (3.10?)
      
      The alternative is to revert the patch that introduced the race.
      
      The patch is safe for backporting, i.e. it makes no assumptions
      about memory barriers in spin_unlock_wait().
      
      Background:
      Here is the race of the current implementation:
      
      Thread A: (simple op)
      - does the first "sma->complex_count == 0" test
      
      Thread B: (complex op)
      - does sem_lock(): This includes an array scan. But the scan can't
        find Thread A, because Thread A does not own sem->lock yet.
      - the thread does the operation, increases complex_count,
        drops sem_lock, sleeps
      
      Thread A:
      - spin_lock(&sem->lock), spin_is_locked(sma->sem_perm.lock)
      - sleeps before the complex_count test
      
      Thread C: (complex op)
      - does sem_lock (no array scan, complex_count==1)
      - wakes up Thread B.
      - decrements complex_count
      
      Thread A:
      - does the complex_count test
      
      Bug:
      Now both thread A and thread C operate on the same array, without
      any synchronization.
      
      Fixes: 6d07b68c ("ipc/sem.c: optimize sem_lock()")
      Link: http://lkml.kernel.org/r/1469123695-5661-1-git-send-email-manfred@colorfullife.com
      Reported-by: <felixh@informatik.uni-bremen.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: <1vier1@web.de>
      Cc: <stable@vger.kernel.org>	[3.10+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      5864a2fd
  8. 28 Sep, 2016 1 commit
  9. 23 Sep, 2016 2 commits
  10. 09 Aug, 2016 1 commit
  11. 03 Aug, 2016 2 commits
  12. 27 Jul, 2016 2 commits
  13. 24 Jun, 2016 4 commits
  14. 14 Jun, 2016 2 commits
  15. 24 May, 2016 1 commit
  16. 05 Apr, 2016 1 commit
    • K
      mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros · 09cbfeaf
      Kirill A. Shutemov committed
      PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced a *long* time
      ago with the promise that one day it would be possible to implement the
      page cache with bigger chunks than PAGE_SIZE.

      This promise never materialized.  And it is unlikely it ever will.

      We have many places where PAGE_CACHE_SIZE is assumed to be equal to
      PAGE_SIZE.  And it's a constant source of confusion whether a
      PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
      especially on the border between fs and mm.

      Globally switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause too much
      breakage to be doable.

      Let's stop pretending that pages in page cache are special.  They are
      not.
      
      The changes are pretty straight-forward:
      
       - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
      
       - page_cache_get() -> get_page();
      
       - page_cache_release() -> put_page();
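
The first two substitutions are safe only because both shift constants are equal, so the shift amount is zero. The following standalone sketch checks that; the 4 KiB page size and the helper names `old_index_to_pages` and `new_index_to_pages` are assumptions for illustration and do not appear in the patch.

```c
#include <assert.h>

/* Avoid clashing with any definitions pulled in by system headers. */
#undef PAGE_SHIFT
#undef PAGE_SIZE

/* Assumed 4 KiB pages for illustration; the real value is arch-specific. */
#define PAGE_SHIFT       12
#define PAGE_SIZE        (1UL << PAGE_SHIFT)

/* Before the patch the page-cache macros were aliases of the page macros. */
#define PAGE_CACHE_SHIFT PAGE_SHIFT
#define PAGE_CACHE_SIZE  (1UL << PAGE_CACHE_SHIFT)

/* Old style: convert a page-cache index to a page index with a shift
 * that is always by zero bits. */
static unsigned long old_index_to_pages(unsigned long index)
{
    return index << (PAGE_CACHE_SHIFT - PAGE_SHIFT);
}

/* New style after the substitution "<foo> << (...) -> <foo>". */
static unsigned long new_index_to_pages(unsigned long index)
{
    return index;
}

/* Returns 1 if both styles agree for the given index. */
static int conversions_agree(unsigned long index)
{
    assert(PAGE_CACHE_SHIFT - PAGE_SHIFT == 0);
    return old_index_to_pages(index) == new_index_to_pages(index);
}
```

Because the shift is by zero for every input, `conversions_agree()` holds for any index, and `PAGE_CACHE_SIZE` and `PAGE_SIZE` are interchangeable in the same way.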
      
      This patch contains automated changes generated with coccinelle using
      the script below.  For some reason, coccinelle doesn't patch header files.
      I've called spatch for them manually.

      The only adjustment after coccinelle is a revert of the changes to the
      PAGE_CACHE_ALIGN definition: we are going to drop it later.

      There are a few places in the code that coccinelle didn't reach.  I'll
      fix them manually in a separate patch.  Comments and documentation will
      also be addressed in a separate patch.
      
      virtual patch
      
      @@
      expression E;
      @@
      - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      expression E;
      @@
      - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      @@
      - PAGE_CACHE_SHIFT
      + PAGE_SHIFT
      
      @@
      @@
      - PAGE_CACHE_SIZE
      + PAGE_SIZE
      
      @@
      @@
      - PAGE_CACHE_MASK
      + PAGE_MASK
      
      @@
      expression E;
      @@
      - PAGE_CACHE_ALIGN(E)
      + PAGE_ALIGN(E)
      
      @@
      expression E;
      @@
      - page_cache_get(E)
      + get_page(E)
      
      @@
      expression E;
      @@
      - page_cache_release(E)
      + put_page(E)
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      09cbfeaf
  17. 23 Mar, 2016 1 commit
    • D
      ipc/sem: make semctl setting sempid consistent · a5f4db87
      Davidlohr Bueso committed
      As indicated by bug#112271, Linux sets the sempid value upon semctl, and
      not only for semop calls.  However, within semctl we only do this for
      SETVAL, leaving SETALL without updating the field, and therefore rather
      inconsistent behavior when compared to other Unices.
      
      There is really no documentation regarding this and therefore users
      should not make assumptions.  With this patch, along with updating
      semctl.2 manpages, this scenario should become less ambiguous.  As such,
      set sempid on SETALL cmd.
      
      Also update some in-code documentation, specifying where the sempid is
      set.
      
      Passes ltp and custom testcase where a child (fork) does SETALL to the
      set.
      Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
      Reported-by: Philip Semanchuk <linux_kernel.20.ick@spamgourmet.com>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: PrasannaKumar Muralidharan <prasannatsmkumar@gmail.com>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Cc: Herton R. Krzesinski <herton@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a5f4db87
  18. 19 Feb, 2016 1 commit