1. 15 Dec 2016, 5 commits
    • ipc/sem: optimize perform_atomic_semop() · 4ce33ec2
      Committed by Davidlohr Bueso
      This is the main workhorse that deals with semop user calls,
      determining whether the waitforzero or semval update operations on
      the set can complete or not as the sma currently stands.
      Currently, the set is iterated twice
      (setting semval, then backwards for the sempid value).  Slowpaths, and
      particularly SEM_UNDO calls, must undo any altered sem when it is
      detected that the caller must block or has errored-out.
      
      With larger sets, situations can occur where this takes a lot of
      cycles and makes obviously suboptimal use of cached resources in
      shared memory.  I.e., we discard cached data on other CPUs that are
      also calling semop and have the sembuf cached (and could complete),
      while the current lock holder's semop will block, error out, or
      perform a waitforzero operation.
      
      This patch proposes still iterating the set twice, but the first scan is
      read-only, and we perform the actual updates afterward, once we know
      that the call will succeed.  In order to not suffer from the overhead of
      dealing with sops that act on the same sem_num, such (rare) cases use
      perform_atomic_semop_slow(), which is exactly what we have now.
      Duplicates are detected before grabbing sem_lock, using a simple
      32/64-bit bitmap variable indexed by the sem_num we are working on.
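
      For illustration, a rough user-space sketch of that duplicate
      check (the struct layout matches sembuf(2); the helper name and
      the 64-entry cutoff are illustrative, not the kernel's exact
      code):

        #include <stdbool.h>
        #include <stdint.h>

        struct sembuf {
                unsigned short sem_num;   /* semaphore index in the set */
                short sem_op;
                short sem_flg;
        };

        /* Returns true when any sem_num repeats, in which case the
         * caller would fall back to perform_atomic_semop_slow(). */
        static bool sops_have_duplicates(const struct sembuf *sops,
                                         unsigned int nsops)
        {
                uint64_t seen = 0;

                for (unsigned int i = 0; i < nsops; i++) {
                        unsigned int num = sops[i].sem_num;

                        if (num >= 64)
                                return true;  /* bitmap too small: assume dups */
                        if (seen & (1ULL << num)) /* note the ULL, per the fix above */
                                return true;
                        seen |= 1ULL << num;
                }
                return false;
        }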
      
      In addition, add some comments on when we expect the caller to block.
      
      [akpm@linux-foundation.org: coding-style fixes]
      [colin.king@canonical.com: ensure we left shift a ULL rather than a 32 bit integer]
        Link: http://lkml.kernel.org/r/20161028181129.7311-1-colin.king@canonical.com
      Link: http://lkml.kernel.org/r/20160921194603.GB21438@linux-80c1.suse
      Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      4ce33ec2
    • ipc/sem: rework task wakeups · 9ae949fa
      Committed by Davidlohr Bueso
      Our sysv sems have been using the notion of lockless wakeups for a
      while, ever since commit 0a2b9d4c ("ipc/sem.c: move wake_up_process
      out of the spinlock section"), in order to reduce the sem_lock hold
      times.  This in-house pending queue can be replaced by wake_q (just
      like the rest of ipc now), as it provides the following advantages:
      
       o Simplifies and gets rid of unnecessary code.
      
       o We get rid of the IN_WAKEUP complexities.  Given that wake_q_add()
         grabs a reference to the task, even if the task is awoken by an
         unrelated event in the window between wake_q_add() and wake_up_q(),
         we cannot race with sys_exit and the imminent call to
         wake_up_process().
      
       o By not spinning IN_WAKEUP, we no longer need to disable preemption.
      
      In consequence, the wakeup paths (after schedule(), that is) must
      acknowledge an external signal/event, as well as spurious wakeups
      occurring during the pending-wakeup window.  Obviously there are no
      changes in semantics that could be visible to the user.  The
      fastpath is _only_ for when we know for sure that we were awoken
      due to the waker's successful semop call (queue.status is not -EINTR).
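
      As a kernel-style sketch of the resulting pattern (illustrative,
      not the actual ipc/sem.c diff; in this era the wake_q helpers
      lived in <linux/sched.h>, and DEFINE_WAKE_Q is the post-rename
      name mentioned below):

        #include <linux/sched.h>     /* task_struct, wake_q helpers */
        #include <linux/spinlock.h>

        /* Queue the wakeup under sem_lock, issue it after the unlock. */
        static void example_complete_semop(spinlock_t *lock,
                                           struct task_struct *tsk)
        {
                DEFINE_WAKE_Q(wake_q);

                spin_lock(lock);
                /* ... update semvals, mark queue.status as success ... */
                wake_q_add(&wake_q, tsk);  /* takes a task reference */
                spin_unlock(lock);

                wake_up_q(&wake_q);  /* no sem_lock held, no IN_WAKEUP spinning */
        }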
      
      On a 48-core Haswell, running the ipcscale 'waitforzero' test, the
      following is seen with increasing thread counts:
      
                                     v4.8-rc5                v4.8-rc5
                                                              semopv2
      Hmean    sembench-sem-2      574733.00 (  0.00%)   578322.00 (  0.62%)
      Hmean    sembench-sem-8      811708.00 (  0.00%)   824689.00 (  1.59%)
      Hmean    sembench-sem-12     842448.00 (  0.00%)   845409.00 (  0.35%)
      Hmean    sembench-sem-21     933003.00 (  0.00%)   977748.00 (  4.80%)
      Hmean    sembench-sem-48     935910.00 (  0.00%)  1004759.00 (  7.36%)
      Hmean    sembench-sem-79     937186.00 (  0.00%)   983976.00 (  4.99%)
      Hmean    sembench-sem-234    974256.00 (  0.00%)  1060294.00 (  8.83%)
      Hmean    sembench-sem-265    975468.00 (  0.00%)  1016243.00 (  4.18%)
      Hmean    sembench-sem-296    991280.00 (  0.00%)  1042659.00 (  5.18%)
      Hmean    sembench-sem-327    975415.00 (  0.00%)  1029977.00 (  5.59%)
      Hmean    sembench-sem-358   1014286.00 (  0.00%)  1049624.00 (  3.48%)
      Hmean    sembench-sem-389    972939.00 (  0.00%)  1043127.00 (  7.21%)
      Hmean    sembench-sem-420    981909.00 (  0.00%)  1056747.00 (  7.62%)
      Hmean    sembench-sem-451    990139.00 (  0.00%)  1051609.00 (  6.21%)
      Hmean    sembench-sem-482    965735.00 (  0.00%)  1040313.00 (  7.72%)
      
      [akpm@linux-foundation.org: coding-style fixes]
      [sfr@canb.auug.org.au: merge fix for WAKE_Q to DEFINE_WAKE_Q rename]
        Link: http://lkml.kernel.org/r/20161122210410.5eca9fc2@canb.auug.org.au
      Link: http://lkml.kernel.org/r/1474225896-10066-3-git-send-email-dave@stgolabs.net
      Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
      Acked-by: Manfred Spraul <manfred@colorfullife.com>
      Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      9ae949fa
    • ipc/sem: do not call wake_sem_queue_do() prematurely · 248e7357
      Committed by Davidlohr Bueso

      ipc/sem: do not call wake_sem_queue_do() prematurely ... as this
      call should obviously be paired with its _prepare() counterpart.
      At least whenever possible, as there is no harm in calling
      it bogusly as we do now in a few places.  Immediate error semop(2) paths
      that are far from ever having the task block can be simplified and avoid
      a few unnecessary loads on their way out of the call as it is not deeply
      nested.
      
      Link: http://lkml.kernel.org/r/1474225896-10066-2-git-send-email-dave@stgolabs.net
      Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      248e7357
    • ipc/shm.c: coding style fixes · 63980c80
      Committed by Shailesh Pandey
      This patch fixes the following warnings:
      
        WARNING: Missing a blank line after declarations
        WARNING: Block comments use a trailing */ on a separate line
        ERROR: spaces required around that '=' (ctx:WxV)
      
      The above warnings were reported by checkpatch.pl.
      
      Link: http://lkml.kernel.org/r/1478604980-18062-1-git-send-email-p.shailesh@samsung.com
      Signed-off-by: Shailesh Pandey <p.shailesh@samsung.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      63980c80
    • ipc: msg, make msgrcv work with LONG_MIN · 99989835
      Committed by Jiri Slaby
      When LONG_MIN is passed to msgrcv, one would expect to receive any
      message.  But convert_mode does *msgtyp = -*msgtyp and -LONG_MIN is
      undefined.  In particular, with my gcc -LONG_MIN produces -LONG_MIN
      again.
      
      So handle this case properly by assigning LONG_MAX to *msgtyp if
      LONG_MIN was specified as msgtyp to msgrcv.
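
      A simplified shape of that fix in convert_mode() (illustrative,
      condensed from the description above):

        #include <limits.h>

        static void msgtyp_fixup(long *msgtyp)
        {
                if (*msgtyp == LONG_MIN)       /* -LONG_MIN is undefined behavior */
                        *msgtyp = LONG_MAX;    /* still matches every message type */
                else if (*msgtyp < 0)
                        *msgtyp = -*msgtyp;
        }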
      
      This code:
        long msg[] = { 100, 200 };
        int m = msgget(IPC_PRIVATE, IPC_CREAT | 0644);
        msgsnd(m, &msg, sizeof(msg), 0);
        msgrcv(m, &msg, sizeof(msg), LONG_MIN, 0);
      
      produces currently nothing:
      
        msgget(IPC_PRIVATE, IPC_CREAT|0644)     = 65538
        msgsnd(65538, {100, "\310\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"}, 16, 0) = 0
        msgrcv(65538, ...
      
      Except a UBSAN warning:
      
        UBSAN: Undefined behaviour in ipc/msg.c:745:13
        negation of -9223372036854775808 cannot be represented in type 'long int':
      
      With the patch, I see what I expect:
      
        msgget(IPC_PRIVATE, IPC_CREAT|0644)     = 0
        msgsnd(0, {100, "\310\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"}, 16, 0) = 0
        msgrcv(0, {100, "\310\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"}, 16, -9223372036854775808, 0) = 16
      
      Link: http://lkml.kernel.org/r/20161024082633.10148-1-jslaby@suse.cz
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      99989835
  2. 21 Nov 2016, 1 commit
  3. 28 Oct 2016, 1 commit
  4. 12 Oct 2016, 6 commits
    • ipc/sem.c: add cond_resched in exit_sem · 2a1613a5
      Committed by Nikolay Borisov
      In a CONFIG_PREEMPT=n kernel, a softlockup was observed while
      executing the for loop in exit_sem.  Apparently it's possible for
      the loop to take quite a long time, and it doesn't have a
      scheduling point in it.  Since the code executes under an RCU read
      section, this may also cause RCU stalls, which in turn block
      synchronize_rcu operations and more or less destabilise the whole
      system.
      
      Fix this by introducing a cond_resched() at the beginning of the loop.
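
      The simplified shape of exit_sem() after the fix (illustrative;
      the body of the loop is elided):

        #include <linux/rcupdate.h>
        #include <linux/sched.h>    /* cond_resched() */

        /* The explicit scheduling point keeps a long undo list from
         * hogging the CPU on CONFIG_PREEMPT=n kernels and from
         * stalling RCU between iterations. */
        static void exit_sem_like_loop(void)
        {
                for (;;) {
                        cond_resched(); /* the added call, first thing in the loop */

                        rcu_read_lock();
                        /* ... look up the next sem_undo entry; if none, stop ... */
                        rcu_read_unlock();

                        /* ... lock the set and roll back its adjustments ... */
                        break;  /* placeholder: the real loop runs the list dry */
                }
        }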
      
      So this patch fixes the following:
      
        NMI watchdog: BUG: soft lockup - CPU#10 stuck for 23s! [httpd:18119]
        CPU: 10 PID: 18119 Comm: httpd Tainted: G           O    4.4.20-clouder2 #6
        Hardware name: Supermicro X10DRi/X10DRi, BIOS 1.1 04/14/2015
        task: ffff88348d695280 ti: ffff881c95550000 task.ti: ffff881c95550000
        RIP: 0010:[<ffffffff81614bc7>]  [<ffffffff81614bc7>] _raw_spin_lock+0x17/0x30
        RSP: 0018:ffff881c95553e40  EFLAGS: 00000246
        RAX: 0000000000000000 RBX: ffff883161b1eea8 RCX: 000000000000000d
        RDX: 0000000000000001 RSI: 000000000000000e RDI: ffff883161b1eea4
        RBP: ffff881c95553ea0 R08: ffff881c95553e68 R09: ffff883fef376f88
        R10: ffff881fffb58c20 R11: ffffea0072556600 R12: ffff883161b1eea0
        R13: ffff88348d695280 R14: ffff883dec427000 R15: ffff8831621672a0
        FS:  0000000000000000(0000) GS:ffff881fffb40000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 00007f3b3723e020 CR3: 0000000001c0a000 CR4: 00000000001406e0
        Call Trace:
          ? exit_sem+0x7c/0x280
          do_exit+0x338/0xb40
          do_group_exit+0x43/0xd0
          SyS_exit_group+0x14/0x20
          entry_SYSCALL_64_fastpath+0x16/0x6e
      
      Link: http://lkml.kernel.org/r/1475154992-6363-1-git-send-email-kernel@kyup.com
      Signed-off-by: Nikolay Borisov <kernel@kyup.com>
      Cc: Herton R. Krzesinski <herton@redhat.com>
      Cc: Fabian Frederick <fabf@skynet.be>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      2a1613a5
    • ipc/msg: avoid waking sender upon full queue · ed27f912
      Committed by Davidlohr Bueso
      Blocked tasks queued in q_senders waiting for their message to fit in the
      queue are blindly awoken every time we think there's a remote chance this
      might happen.  This could cause numerous (and expensive -- thundering
      herd-ish) bogus wakeups if the queue is still really full.  Adding to the
      scheduling cost/overhead, there's also the fact that we need to take the
      ipc object lock and requeue ourselves in the q_senders list.
      
      By keeping track of the blocked sender's message size, we can know
      beforehand whether the wakeup ought to occur or not.  Otherwise, to
      maintain the current wakeup order, we just move the sender to the
      tail, which is exactly what occurs right now when the sender needs
      to go back to sleep.
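
      A hedged sketch of the idea (the field and helper names here are
      illustrative, not the exact ipc/msg.c ones):

        #include <linux/list.h>
        #include <linux/sched.h>

        struct msg_sender_example {
                struct list_head list;
                struct task_struct *tsk;
                size_t msgsz;   /* size this blocked sender is waiting to fit */
        };

        /* Wake the sender only if its message would now fit in the queue. */
        static inline bool sender_should_wake(const struct msg_sender_example *mss,
                                              size_t free_bytes)
        {
                return mss->msgsz <= free_bytes;
        }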
      
      The case of EIDRM is left completely untouched, as we need to wakeup all
      the tasks, and shouldn't be playing games in the first place.
      
      This patch was seen to save on the 'msgctl10' ltp testcase ~15% in context
      switches (avg out of ten runs).  Although these tests are really about
      functionality (as opposed to performance), it does show the direct
      benefits of the optimization.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Link: http://lkml.kernel.org/r/1469748819-19484-6-git-send-email-dave@stgolabs.net
      Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ed27f912
    • ipc/msg: make ss_wakeup() kill arg boolean · d0d6a2a9
      Committed by Davidlohr Bueso
      ... 'tis annoying.
      
      Link: http://lkml.kernel.org/r/1469748819-19484-4-git-send-email-dave@stgolabs.net
      Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d0d6a2a9
    • ipc/msg: batch queue sender wakeups · e3658538
      Committed by Davidlohr Bueso
      Currently the use of wake_qs in sysv msg queues is only for the
      receiver tasks that are blocked on the queue.  But blocked sender
      tasks (due to queue size constraints) are still awoken with the ipc
      object lock held,
      which can be a problem particularly for small sized queues and far from
      gracious for -rt (just like it was for the receiver side).
      
      The paths that actually wake up a sender are obviously related to when we
      are either getting rid of the queue or after (some) space is freed-up
      after a receiver takes the msg (msgrcv).  Furthermore, with the exception
      of msgrcv, we can always piggy-back on expunge_all that has its own tasks
      lined-up for waking.  Finally, upon unlinking the message, it should be no
      problem delaying the wakeups a bit until after we've released the lock.
      
      Link: http://lkml.kernel.org/r/1469748819-19484-3-git-send-email-dave@stgolabs.net
      Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e3658538
    • ipc/msg: implement lockless pipelined wakeups · ee51636c
      Committed by Sebastian Andrzej Siewior
      This patch moves the wake_up_process() invocation so it is not done
      under the ipc global lock, by making use of a lockless wake_q.  With
      this change, the waiter is woken up once the message has been
      assigned and it does not need to loop on SMP if the message points
      to NULL.  In the signal case we still need to check the pointer
      under the lock to verify the state.

      This change should also avoid the introduction of preempt_disable()
      in -RT, which avoids a busy-loop that polls for the NULL -> !NULL
      change if the waiter has a higher priority compared to the waker.
      
      By making use of wake_qs, the logic of sysv msg queues is greatly
      simplified (and very well suited as we can batch lockless wakeups),
      particularly around the lockless receive algorithm.
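
      A sketch of the receiver-side fastpath this enables (the struct
      here is an illustrative stand-in for ipc/msg.c's receiver
      descriptor, not the real one):

        #include <linux/compiler.h>  /* READ_ONCE() */
        #include <linux/err.h>

        struct msg_msg;

        struct msg_receiver_example {
                struct msg_msg *r_msg;  /* set to ERR_PTR(-EAGAIN) before sleeping */
        };

        /* After wake-up: a non-EAGAIN pointer was written by the waker
         * before the wake_q wakeup, so it can be trusted without
         * retaking the lock. */
        static struct msg_msg *receiver_fastpath(struct msg_receiver_example *msr)
        {
                struct msg_msg *msg = READ_ONCE(msr->r_msg);

                if (msg != ERR_PTR(-EAGAIN))
                        return msg;  /* pipelined receive completed locklessly */
                return NULL;         /* signal/spurious wakeup: recheck under lock */
        }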
      
      This has been tested with Manfred's pmsg-shared tool on an "AMD
      A10-7800 Radeon R7, 12 Compute Cores 4C+8G":
      
      test             |   before   |   after    | diff
      -----------------|------------|------------|----------
      pmsg-shared 8 60 | 19,347,422 | 30,442,191 | + ~57.34 %
      pmsg-shared 4 60 | 21,367,197 | 35,743,458 | + ~67.28 %
      pmsg-shared 2 60 | 22,884,224 | 24,278,200 | +  ~6.09 %
      
      Link: http://lkml.kernel.org/r/1469748819-19484-2-git-send-email-dave@stgolabs.net
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ee51636c
    • ipc/sem.c: fix complex_count vs. simple op race · 5864a2fd
      Committed by Manfred Spraul
      Commit 6d07b68c ("ipc/sem.c: optimize sem_lock()") introduced a
      race:
      
      sem_lock has a fast path that allows parallel simple operations.
      There are two reasons why a simple operation cannot run in parallel:
       - a non-simple operation is ongoing (sma->sem_perm.lock held)
       - a complex operation is sleeping (sma->complex_count != 0)
      
      As both facts are stored independently, a thread can bypass the current
      checks by sleeping in the right positions.  See below for more details
      (or kernel bugzilla 105651).
      
      The patch fixes that by creating one variable (complex_mode)
      that tracks both reasons why parallel operations are not possible.
      
      The patch also updates stale documentation regarding the locking.
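
      A user-space model of the fix's core idea (not the kernel code,
      and the memory-ordering details of the real sem_lock are elided):

        #include <stdatomic.h>
        #include <stdbool.h>

        /* One complex_mode flag stands in for both "complex op ongoing"
         * and "complex op sleeping", so the simple-op fast path has
         * exactly one thing to check and the window between the two
         * independent tests disappears. */
        static atomic_bool complex_mode;

        static bool simple_op_fast_path_allowed(void)
        {
                return !atomic_load_explicit(&complex_mode,
                                             memory_order_acquire);
        }

        static void complex_op_begin(void)
        {
                atomic_store_explicit(&complex_mode, true,
                                      memory_order_release);
                /* ... then wait for in-flight simple ops to drain; the
                 * flag stays set while the complex op sleeps ... */
        }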
      
      With regards to stable kernels:
      The patch is required for all kernels that include the
      commit 6d07b68c ("ipc/sem.c: optimize sem_lock()") (3.10?)
      
      The alternative is to revert the patch that introduced the race.
      
      The patch is safe for backporting, i.e. it makes no assumptions
      about memory barriers in spin_unlock_wait().
      
      Background:
      Here is the race of the current implementation:
      
      Thread A: (simple op)
      - does the first "sma->complex_count == 0" test
      
      Thread B: (complex op)
      - does sem_lock(): This includes an array scan. But the scan can't
        find Thread A, because Thread A does not own sem->lock yet.
      - the thread does the operation, increases complex_count,
        drops sem_lock, sleeps
      
      Thread A:
      - spin_lock(&sem->lock), spin_is_locked(sma->sem_perm.lock)
      - sleeps before the complex_count test
      
      Thread C: (complex op)
      - does sem_lock (no array scan, complex_count==1)
      - wakes up Thread B.
      - decrements complex_count
      
      Thread A:
      - does the complex_count test
      
      Bug:
      Now both thread A and thread C operate on the same array, without
      any synchronization.
      
      Fixes: 6d07b68c ("ipc/sem.c: optimize sem_lock()")
      Link: http://lkml.kernel.org/r/1469123695-5661-1-git-send-email-manfred@colorfullife.com
      Reported-by: <felixh@informatik.uni-bremen.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: <1vier1@web.de>
      Cc: <stable@vger.kernel.org>	[3.10+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      5864a2fd
  5. 28 Sep 2016, 1 commit
  6. 23 Sep 2016, 2 commits
  7. 09 Aug 2016, 1 commit
  8. 03 Aug 2016, 2 commits
  9. 27 Jul 2016, 2 commits
  10. 24 Jun 2016, 4 commits
  11. 14 Jun 2016, 2 commits
  12. 24 May 2016, 1 commit
  13. 05 Apr 2016, 1 commit
    • mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros · 09cbfeaf
      Committed by Kirill A. Shutemov
      PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced a *long*
      time ago with the promise that one day it would be possible to
      implement the page cache with bigger chunks than PAGE_SIZE.
      
      This promise never materialized, and it likely never will.
      
      We have many places where PAGE_CACHE_SIZE is assumed to be equal to
      PAGE_SIZE.  And it's a constant source of confusion as to whether
      the PAGE_CACHE_* or PAGE_* constant should be used in a particular
      case, especially on the border between fs and mm.
      
      Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause too
      much breakage to be doable.
      
      Let's stop pretending that pages in page cache are special.  They are
      not.
      
      The changes are pretty straightforward:
      
       - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
      
       - page_cache_get() -> get_page();
      
       - page_cache_release() -> put_page();
      
      This patch contains automated changes generated with coccinelle
      using the script below.  For some reason, coccinelle doesn't patch
      header files.  I've called spatch for them manually.

      The only adjustment after coccinelle is a revert of the changes to
      the PAGE_CACHE_ALIGN definition: we are going to drop it later.
      
      There are a few places in the code that coccinelle didn't reach.
      I'll fix them manually in a separate patch.  Comments and
      documentation will also be addressed in a separate patch.
      
      virtual patch
      
      @@
      expression E;
      @@
      - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      expression E;
      @@
      - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      @@
      - PAGE_CACHE_SHIFT
      + PAGE_SHIFT
      
      @@
      @@
      - PAGE_CACHE_SIZE
      + PAGE_SIZE
      
      @@
      @@
      - PAGE_CACHE_MASK
      + PAGE_MASK
      
      @@
      expression E;
      @@
      - PAGE_CACHE_ALIGN(E)
      + PAGE_ALIGN(E)
      
      @@
      expression E;
      @@
      - page_cache_get(E)
      + get_page(E)
      
      @@
      expression E;
      @@
      - page_cache_release(E)
      + put_page(E)
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      09cbfeaf
  14. 23 Mar 2016, 1 commit
    • ipc/sem: make semctl setting sempid consistent · a5f4db87
      Committed by Davidlohr Bueso
      As indicated by bug#112271, Linux sets the sempid value upon semctl, and
      not only for semop calls.  However, within semctl we only do this for
      SETVAL, leaving SETALL without updating the field, which results in
      rather inconsistent behavior when compared to other Unices.
      
      There is really no documentation regarding this and therefore users
      should not make assumptions.  With this patch, along with the
      updated semctl(2) manpage, this scenario should become less
      ambiguous.  As such, set sempid on the SETALL cmd.
      
      Also update some in-code documentation, specifying where the sempid is
      set.
      
      Passes ltp and custom testcase where a child (fork) does SETALL to the
      set.
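
      A minimal user-space check along the same lines might look like
      this (illustrative only, not the actual testcase; run without the
      fork and with error handling omitted for brevity):

        #include <stdio.h>
        #include <sys/ipc.h>
        #include <sys/sem.h>
        #include <unistd.h>

        union semun {   /* caller-defined, as required for semctl(2) */
                int val;
                struct semid_ds *buf;
                unsigned short *array;
        };

        int main(void)
        {
                unsigned short vals[1] = { 1 };
                union semun arg = { .array = vals };
                int id = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);

                semctl(id, 0, SETALL, arg);
                /* With the patch, GETPID now reports us even though we
                 * used SETALL rather than SETVAL or semop. */
                printf("sempid=%d self=%d\n",
                       semctl(id, 0, GETPID), (int)getpid());
                semctl(id, 0, IPC_RMID);
                return 0;
        }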
      Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
      Reported-by: Philip Semanchuk <linux_kernel.20.ick@spamgourmet.com>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: PrasannaKumar Muralidharan <prasannatsmkumar@gmail.com>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Cc: Herton R. Krzesinski <herton@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a5f4db87
  15. 19 Feb 2016, 1 commit
  16. 23 Jan 2016, 2 commits
  17. 21 Jan 2016, 1 commit
  18. 15 Jan 2016, 1 commit
    • kmemcg: account certain kmem allocations to memcg · 5d097056
      Committed by Vladimir Davydov
      Mark those kmem allocations that are known to be easily triggered from
      userspace as __GFP_ACCOUNT/SLAB_ACCOUNT, which makes them accounted to
      memcg.  For the list, see below:
      
       - threadinfo
       - task_struct
       - task_delay_info
       - pid
       - cred
       - mm_struct
       - vm_area_struct and vm_region (nommu)
       - anon_vma and anon_vma_chain
       - signal_struct
       - sighand_struct
       - fs_struct
       - files_struct
       - fdtable and fdtable->full_fds_bits
       - dentry and external_name
       - inode for all filesystems. This is the most tedious part, because
         most filesystems overwrite the alloc_inode method.
      
      The list is far from complete, so feel free to add more objects.
      Nevertheless, it should be close to the "account everything" approach and
      keep most workloads within bounds.  Malevolent users will be able to
      breach the limit, but this was possible even with the former "account
      everything" approach (simply because it did not account everything in
      fact).
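
      For illustration, the two annotations this patch applies look like
      this (cache name and sizes here are made up):

        #include <linux/errno.h>
        #include <linux/init.h>
        #include <linux/slab.h>

        static struct kmem_cache *example_cachep;

        static int __init example_cache_init(void)
        {
                /* objects from this cache are charged to the allocating memcg */
                example_cachep = kmem_cache_create("example_cache", 192, 0,
                                                   SLAB_ACCOUNT, NULL);
                return example_cachep ? 0 : -ENOMEM;
        }

        /* one-off allocations userspace can trigger get __GFP_ACCOUNT */
        static void *example_alloc(size_t n)
        {
                return kmalloc(n, GFP_KERNEL | __GFP_ACCOUNT);
        }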
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      5d097056
  19. 07 Nov 2015, 1 commit
    • ipc,msg: drop dst nil validation in copy_msg · 5f2a2d5d
      Committed by Davidlohr Bueso
      Commit d0edd852 ("ipc: convert invalid scenarios to use WARN_ON")
      relaxed the nil dst parameter check, which was originally a full
      BUG_ON.  However, this check seems quite unnecessary when its only
      purpose is for checkpoint/restore (MSG_COPY flag):
      
      o The copy variable is set initially to nil, apparently as a way of
        ensuring that prepare_copy is previously called.  Which is in fact done,
        unconditionally at the beginning of do_msgrcv.
      
      o There is no concurrency with 'copy' (stack allocated in do_msgrcv).
      
      Furthermore, any errors in 'copy' (and thus prepare_copy/copy_msg)
      should always be handled by the IS_ERR() family.  Therefore remove
      this check altogether, as the condition can never occur with the
      current users.
      Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
      Cc: Stanislav Kinsbursky <skinsbursky@parallels.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      5f2a2d5d
  20. 01 Oct 2015, 1 commit
    • Initialize msg/shm IPC objects before doing ipc_addid() · b9a53227
      Committed by Linus Torvalds
      As reported by Dmitry Vyukov, we really shouldn't do ipc_addid() before
      having initialized the IPC object state.  Yes, we initialize the IPC
      object in a locked state, but with all the lockless RCU lookup work,
      that IPC object lock no longer means that the state cannot be seen.
      
      We already did this for the IPC semaphore code (see commit e8577d1f:
      "ipc/sem.c: fully initialize sem_array before making it visible") but we
      clearly forgot about msg and shm.
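
      A user-space model of the ordering rule the commit enforces (a
      release store standing in for the ipc_addid() publication point;
      this is not the kernel code):

        #include <stdatomic.h>
        #include <stddef.h>

        struct ipc_obj_model { int state; };  /* stand-in for msg/shm state */

        /* The id table that lockless RCU lookups read, modeled as an
         * atomic pointer. */
        static _Atomic(struct ipc_obj_model *) id_table_slot;

        /* Finish ALL initialization first, then publish; after the
         * publication, lockless readers may find the object, lock or
         * no lock. */
        static void init_then_publish(struct ipc_obj_model *obj)
        {
                obj->state = 1;   /* 1. fully initialize */
                atomic_store_explicit(&id_table_slot, obj,
                                      memory_order_release);  /* 2. publish */
        }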
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Cc: Davidlohr Bueso <dbueso@suse.de>
      Cc: stable@vger.kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b9a53227
  21. 11 Sep 2015, 1 commit
    • ipc: convert invalid scenarios to use WARN_ON · d0edd852
      Committed by Davidlohr Bueso
      Considering Linus' past rants about the (ab)use of BUG in the kernel, I
      took a look at how we deal with such calls in ipc.  Given that any errors
      or corruption in ipc code are most likely contained within the set of
      processes participating in the broken mechanisms, there aren't really many
      strong fatal system failure scenarios that would require a BUG call.
      Also, if something is seriously wrong, ipc might not be the place for such
      a BUG either.
      
      1. For example, recently, a customer hit one of these BUG_ONs in shm
         after failing shm_lock().  A busted ID imho does not merit a BUG_ON,
         and WARN would have been better.
      
      2. MSG_COPY functionality of sysv msgrcv(2) for checkpoint/restore.
         I don't see how we can hit this anyway -- at least it should be
         IS_ERR.  The 'copy' arg from do_msgrcv is always set by calling
         prepare_copy() first and foremost.  We could also probably drop
         this check altogether.  Either way, it does not merit a BUG_ON.
      
      3. No ->fault() callback for the fs getting the corresponding page --
         seems selfish to make the system unusable.
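
      The shape of the conversion, on a hypothetical call site (not the
      exact ipc code): warn loudly and fail the one caller instead of
      halting the machine.

        #include <linux/bug.h>
        #include <linux/err.h>

        static long example_check(void *obj)
        {
                if (WARN_ON(IS_ERR(obj)))   /* was: BUG_ON(IS_ERR(obj)) */
                        return PTR_ERR(obj);
                return 0;
        }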
      Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d0edd852
  22. 15 Aug 2015, 2 commits