1. 23 Mar, 2016 1 commit
    • ipc/sem: make semctl setting sempid consistent · a5f4db87
      Committed by Davidlohr Bueso
      As indicated by bug#112271, Linux sets the sempid value upon semctl, and
      not only for semop calls.  However, within semctl we only do this for
      SETVAL, leaving SETALL without updating the field, and therefore rather
      inconsistent behavior when compared to other Unices.
      
      There is really no documentation regarding this and therefore users
      should not make assumptions.  With this patch, along with updating
      semctl.2 manpages, this scenario should become less ambiguous.  As such,
      set sempid on SETALL cmd.
      
      Also update some in-code documentation, specifying where the sempid is
      set.
      
      Passes ltp and custom testcase where a child (fork) does SETALL to the
      set.
      Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
      Reported-by: Philip Semanchuk <linux_kernel.20.ick@spamgourmet.com>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: PrasannaKumar Muralidharan <prasannatsmkumar@gmail.com>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Cc: Herton R. Krzesinski <herton@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
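The behavior described in this commit can be observed from user space. The following is a minimal sketch, assuming a Linux system with System V IPC enabled: a forked child performs SETALL, and the parent reads the recorded PID back with GETPID. The helper name `setall_updates_sempid` is hypothetical and not part of the patch; the local `union semun` definition follows the convention described in semctl(2).

```c
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/wait.h>
#include <unistd.h>

/* On Linux the caller must define union semun itself (see semctl(2)). */
union semun {
        int val;
        struct semid_ds *buf;
        unsigned short *array;
};

/*
 * Hypothetical helper: a forked child does SETALL on a fresh
 * one-semaphore set, then the parent queries sempid with GETPID.
 * Returns 1 if sempid equals the child's PID, 0 if it does not,
 * and -1 if any call failed (for example, SysV IPC is unavailable).
 */
int setall_updates_sempid(void)
{
        unsigned short vals[1] = { 3 };
        union semun arg;
        int semid, sempid, status;
        pid_t child;

        semid = semget(IPC_PRIVATE, 1, 0600 | IPC_CREAT);
        if (semid < 0)
                return -1;

        child = fork();
        if (child < 0) {
                semctl(semid, 0, IPC_RMID);
                return -1;
        }
        if (child == 0) {
                /* Child: set every semaphore in the set, then exit. */
                arg.array = vals;
                _exit(semctl(semid, 0, SETALL, arg) < 0 ? 1 : 0);
        }

        if (waitpid(child, &status, 0) < 0 ||
            !WIFEXITED(status) || WEXITSTATUS(status) != 0) {
                semctl(semid, 0, IPC_RMID);
                return -1;
        }

        /* Parent: read the PID recorded for semaphore 0. */
        sempid = semctl(semid, 0, GETPID);
        semctl(semid, 0, IPC_RMID);
        if (sempid < 0)
                return -1;
        return sempid == (int)child;
}
```

On a kernel that includes this change the helper would be expected to return 1; on an older kernel, where SETALL leaves sempid untouched, it would be expected to return 0.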
  2. 23 Jan, 2016 1 commit
  3. 15 Aug, 2015 3 commits
    • ipc/sem.c: update/correct memory barriers · 3ed1f8a9
      Committed by Manfred Spraul
      sem_lock() did not properly pair memory barriers:
      
      !spin_is_locked() and spin_unlock_wait() are both only control barriers.
      The code needs an acquire barrier, otherwise the cpu might perform read
      operations before the lock test.
      
      As no primitive exists inside <include/spinlock.h> and since it seems
      no one wants another primitive, the code creates a local primitive within
      ipc/sem.c.
      
      With regard to -stable:
      
      The change of sem_wait_array() is a bugfix, the change to sem_lock() is a
      nop (just a preprocessor redefinition to improve the readability).  The
      bugfix is necessary for all kernels that use sem_wait_array() (i.e.:
      starting from 3.10).
      Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
      Reported-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Kirill Tkhai <ktkhai@parallels.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: <stable@vger.kernel.org>	[3.10+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
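The difference between a plain lock test and a lock test followed by an acquire barrier can be illustrated in user space. This is only an analogy using C11 atomics, not the kernel primitive the patch introduces; the struct and function names (`guarded`, `read_if_unlocked`, `publish_and_unlock`) are made up for the illustration.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a lock word and the data it protects. */
struct guarded {
        atomic_bool locked;   /* true while an owner holds the "lock" */
        int payload;          /* written by the owner before unlocking */
};

/*
 * Test that the lock is free, then add an acquire fence so that the
 * read of payload cannot be performed before the lock test.
 * Returns true and stores the payload in *out if the lock was free.
 */
bool read_if_unlocked(struct guarded *g, int *out)
{
        /* The relaxed load alone orders nothing for later reads. */
        if (atomic_load_explicit(&g->locked, memory_order_relaxed))
                return false;

        /* Acquire fence: pairs with the release store in the owner. */
        atomic_thread_fence(memory_order_acquire);

        *out = g->payload;
        return true;
}

/* Owner side: write the data, then release the lock. */
void publish_and_unlock(struct guarded *g, int value)
{
        g->payload = value;
        atomic_store_explicit(&g->locked, false, memory_order_release);
}
```

Without the fence, a reader that observes the unlocked state would have no guarantee of also observing the owner's earlier write to `payload`.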
    • ipc,sem: remove uneeded sem_undo_list lock usage in exit_sem() · a9795584
      Committed by Herton R. Krzesinski
      After we acquire the sma->sem_perm lock in exit_sem(), we are protected
      against a racing IPC_RMID operation.  Also at that point, we are the last
      user of sem_undo_list.  Therefore it isn't required that we acquire or use
      ulp->lock.
      Signed-off-by: Herton R. Krzesinski <herton@redhat.com>
      Acked-by: Manfred Spraul <manfred@colorfullife.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Rafael Aquini <aquini@redhat.com>
      CC: Aristeu Rozanski <aris@redhat.com>
      Cc: David Jeffery <djeffery@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ipc,sem: fix use after free on IPC_RMID after a task using same semaphore set exits · 602b8593
      Committed by Herton R. Krzesinski
      The current semaphore code allows a potential use after free: in
      exit_sem we may free the task's sem_undo_list while there is still
      another task looping through the same semaphore set and cleaning the
      sem_undo list in the freeary function (the task that called IPC_RMID
      for the same semaphore set).
      
      For example, with a test program [1] running which keeps forking a lot
      of processes (which then do a semop call with the SEM_UNDO flag), and
      with the parent removing the semaphore set with IPC_RMID right
      afterwards, and a kernel built with CONFIG_SLAB, CONFIG_SLAB_DEBUG and
      CONFIG_DEBUG_SPINLOCK, you can easily see something like the following
      in the kernel log:
      
         Slab corruption (Not tainted): kmalloc-64 start=ffff88003b45c1c0, len=64
         000: 6b 6b 6b 6b 6b 6b 6b 6b 00 6b 6b 6b 6b 6b 6b 6b  kkkkkkkk.kkkkkkk
         010: ff ff ff ff 6b 6b 6b 6b ff ff ff ff ff ff ff ff  ....kkkk........
         Prev obj: start=ffff88003b45c180, len=64
         000: 00 00 00 00 ad 4e ad de ff ff ff ff 5a 5a 5a 5a  .....N......ZZZZ
         010: ff ff ff ff ff ff ff ff c0 fb 01 37 00 88 ff ff  ...........7....
         Next obj: start=ffff88003b45c200, len=64
         000: 00 00 00 00 ad 4e ad de ff ff ff ff 5a 5a 5a 5a  .....N......ZZZZ
         010: ff ff ff ff ff ff ff ff 68 29 a7 3c 00 88 ff ff  ........h).<....
         BUG: spinlock wrong CPU on CPU#2, test/18028
         general protection fault: 0000 [#1] SMP
         Modules linked in: 8021q mrp garp stp llc nf_conntrack_ipv4 nf_defrag_ipv4 ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables binfmt_misc ppdev input_leds joydev parport_pc parport floppy serio_raw virtio_balloon virtio_rng virtio_console virtio_net iosf_mbi crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcspkr qxl ttm drm_kms_helper drm snd_hda_codec_generic i2c_piix4 snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore crc32c_intel virtio_pci virtio_ring virtio pata_acpi ata_generic [last unloaded: speedstep_lib]
         CPU: 2 PID: 18028 Comm: test Not tainted 4.2.0-rc5+ #1
         Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.1-20150318_183358- 04/01/2014
         RIP: spin_dump+0x53/0xc0
         Call Trace:
           spin_bug+0x30/0x40
           do_raw_spin_unlock+0x71/0xa0
           _raw_spin_unlock+0xe/0x10
           freeary+0x82/0x2a0
           ? _raw_spin_lock+0xe/0x10
           semctl_down.clone.0+0xce/0x160
           ? __do_page_fault+0x19a/0x430
           ? __audit_syscall_entry+0xa8/0x100
           SyS_semctl+0x236/0x2c0
           ? syscall_trace_leave+0xde/0x130
           entry_SYSCALL_64_fastpath+0x12/0x71
         Code: 8b 80 88 03 00 00 48 8d 88 60 05 00 00 48 c7 c7 a0 2c a4 81 31 c0 65 8b 15 eb 40 f3 7e e8 08 31 68 00 4d 85 e4 44 8b 4b 08 74 5e <45> 8b 84 24 88 03 00 00 49 8d 8c 24 60 05 00 00 8b 53 04 48 89
         RIP  [<ffffffff810d6053>] spin_dump+0x53/0xc0
          RSP <ffff88003750fd68>
         ---[ end trace 783ebb76612867a0 ]---
         NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [test:18053]
         Modules linked in: 8021q mrp garp stp llc nf_conntrack_ipv4 nf_defrag_ipv4 ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables binfmt_misc ppdev input_leds joydev parport_pc parport floppy serio_raw virtio_balloon virtio_rng virtio_console virtio_net iosf_mbi crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcspkr qxl ttm drm_kms_helper drm snd_hda_codec_generic i2c_piix4 snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore crc32c_intel virtio_pci virtio_ring virtio pata_acpi ata_generic [last unloaded: speedstep_lib]
         CPU: 3 PID: 18053 Comm: test Tainted: G      D         4.2.0-rc5+ #1
         Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.1-20150318_183358- 04/01/2014
         RIP: native_read_tsc+0x0/0x20
         Call Trace:
           ? delay_tsc+0x40/0x70
           __delay+0xf/0x20
           do_raw_spin_lock+0x96/0x140
           _raw_spin_lock+0xe/0x10
           sem_lock_and_putref+0x11/0x70
           SYSC_semtimedop+0x7bf/0x960
           ? handle_mm_fault+0xbf6/0x1880
           ? dequeue_task_fair+0x79/0x4a0
           ? __do_page_fault+0x19a/0x430
           ? kfree_debugcheck+0x16/0x40
           ? __do_page_fault+0x19a/0x430
           ? __audit_syscall_entry+0xa8/0x100
           ? do_audit_syscall_entry+0x66/0x70
           ? syscall_trace_enter_phase1+0x139/0x160
           SyS_semtimedop+0xe/0x10
           SyS_semop+0x10/0x20
           entry_SYSCALL_64_fastpath+0x12/0x71
         Code: 47 10 83 e8 01 85 c0 89 47 10 75 08 65 48 89 3d 1f 74 ff 7e c9 c3 0f 1f 44 00 00 55 48 89 e5 e8 87 17 04 00 66 90 c9 c3 0f 1f 00 <55> 48 89 e5 0f 31 89 c1 48 89 d0 48 c1 e0 20 89 c9 48 09 c8 c9
         Kernel panic - not syncing: softlockup: hung tasks
      
      I wasn't able to trigger any badness on a recent kernel without the
      proper debug config options enabled; however, I have softlockup reports
      on some kernel versions, in the semaphore code, which are similar to the
      above (the scenario is seen on some servers running IBM DB2, which uses
      semaphore syscalls).
      
      The patch here fixes the race against freeary, by acquiring or waiting
      on the sem_undo_list lock as necessary (exit_sem can race with freeary,
      while freeary sets un->semid to -1 and removes the same sem_undo from
      list_proc or when it removes the last sem_undo).
      
      After the patch I'm unable to reproduce the problem using the test case
      [1].
      
      [1] Test case used below:
      
          #include <stdio.h>
          #include <sys/types.h>
          #include <sys/ipc.h>
          #include <sys/sem.h>
          #include <sys/wait.h>
          #include <stdlib.h>
          #include <time.h>
          #include <unistd.h>
          #include <errno.h>
      
          #define NSEM 1
          #define NSET 5
      
          int sid[NSET];
      
          void thread()
          {
                  struct sembuf op;
                  int s;
                  pid_t pid = getpid();   /* per-process value used to pick a semaphore */
      
                  s = rand() % NSET;
                  op.sem_num = pid % NSEM;
                  op.sem_op = 1;
                  op.sem_flg = SEM_UNDO;
      
                  semop(sid[s], &op, 1);
                  exit(EXIT_SUCCESS);
          }
      
          void create_set()
          {
                  int i, j;
                  pid_t p;
                  union {
                          int val;
                          struct semid_ds *buf;
                          unsigned short int *array;
                          struct seminfo *__buf;
                  } un;
      
                  /* Create and initialize semaphore set */
                  for (i = 0; i < NSET; i++) {
                          sid[i] = semget(IPC_PRIVATE , NSEM, 0644 | IPC_CREAT);
                          if (sid[i] < 0) {
                                  perror("semget");
                                  exit(EXIT_FAILURE);
                          }
                  }
                  un.val = 0;
                  for (i = 0; i < NSET; i++) {
                          for (j = 0; j < NSEM; j++) {
                                  if (semctl(sid[i], j, SETVAL, un) < 0)
                                          perror("semctl");
                          }
                  }
      
                  /* Launch threads that operate on semaphore set */
                  for (i = 0; i < NSEM * NSET * NSET; i++) {
                          p = fork();
                          if (p < 0)
                                  perror("fork");
                          if (p == 0)
                                  thread();
                  }
      
                  /* Free semaphore set */
                  for (i = 0; i < NSET; i++) {
                          if (semctl(sid[i], NSEM, IPC_RMID))
                                  perror("IPC_RMID");
                  }
      
                  /* Wait for forked processes to exit */
                  while (wait(NULL)) {
                          if (errno == ECHILD)
                                  break;
                  };
          }
      
          int main(int argc, char **argv)
          {
                  pid_t p;
      
                  srand(time(NULL));
      
                  while (1) {
                          p = fork();
                          if (p < 0) {
                                  perror("fork");
                                  exit(EXIT_FAILURE);
                          }
                          if (p == 0) {
                                  create_set();
                                  goto end;
                          }
      
                          /* Wait for forked processes to exit */
                          while (wait(NULL)) {
                                  if (errno == ECHILD)
                                          break;
                          };
                  }
          end:
                  return 0;
          }
      
      [akpm@linux-foundation.org: use normal comment layout]
      Signed-off-by: Herton R. Krzesinski <herton@redhat.com>
      Acked-by: Manfred Spraul <manfred@colorfullife.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Rafael Aquini <aquini@redhat.com>
      CC: Aristeu Rozanski <aris@redhat.com>
      Cc: David Jeffery <djeffery@redhat.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  4. 01 Jul, 2015 1 commit
  5. 16 Apr, 2015 1 commit
  6. 18 Feb, 2015 1 commit
  7. 14 Dec, 2014 1 commit
  8. 04 Dec, 2014 1 commit
  9. 07 Jun, 2014 9 commits
  10. 28 Jan, 2014 6 commits
  11. 17 Oct, 2013 1 commit
    • ipc/sem.c: synchronize semop and semctl with IPC_RMID · 6e224f94
      Committed by Manfred Spraul
      After acquiring the semlock spinlock, operations must test that the
      array is still valid.
      
       - semctl() and exit_sem() would walk stale linked lists (ugly, but
         should be ok: all lists are empty)
      
       - semtimedop() would sleep forever - and if woken up due to a signal -
         access memory after free.
      
      The patch also:
       - standardizes the tests for .deleted, so that all tests in one
         function leave the function with the same approach.
       - unconditionally tests for .deleted immediately after every call to
         sem_lock - even if it means that for semctl(GETALL), .deleted will be
         tested twice.
      
      Both changes make the review simpler: After every sem_lock, there must
      be a test of .deleted, followed by a goto to the cleanup code (if the
      function uses "goto cleanup").
      
      The only exception is semctl_down(): If sem_ids().rwsem is locked, then
      the presence in ids->ipcs_idr is equivalent to !.deleted, thus no
      additional test is required.
      Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Acked-by: Davidlohr Bueso <davidlohr@hp.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
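The review rule stated in this commit (after every lock acquisition, test .deleted and jump to the cleanup code) can be sketched outside the kernel. The sketch below is a user-space toy, assuming a simple spinlock built from a C11 `atomic_flag`; `toy_array`, `toy_read_value` and `toy_remove` are hypothetical names and do not appear in ipc/sem.c.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical object that can be removed while others still hold a pointer. */
struct toy_array {
        atomic_flag lock;   /* toy spinlock */
        bool deleted;       /* set once the object has been removed */
        int value;
};

static void toy_lock(struct toy_array *a)
{
        while (atomic_flag_test_and_set_explicit(&a->lock, memory_order_acquire))
                ;   /* spin until the flag was previously clear */
}

static void toy_unlock(struct toy_array *a)
{
        atomic_flag_clear_explicit(&a->lock, memory_order_release);
}

/* Every locked section starts with the same .deleted test and cleanup path. */
int toy_read_value(struct toy_array *a, int *out)
{
        int err = 0;

        toy_lock(a);
        if (a->deleted) {
                err = -EIDRM;
                goto out_unlock;
        }
        *out = a->value;

out_unlock:
        toy_unlock(a);
        return err;
}

/* Removal marks the object as deleted under the same lock. */
void toy_remove(struct toy_array *a)
{
        toy_lock(a);
        a->deleted = true;
        toy_unlock(a);
}
```

Keeping the test in the same position in every function is what makes the pattern easy to audit: a reviewer only has to confirm that each lock call is followed by the check.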
  12. 01 Oct, 2013 4 commits
    • ipc/sem.c: update sem_otime for all operations · 0e8c6656
      Committed by Manfred Spraul
      In commit 0a2b9d4c ("ipc/sem.c: move wake_up_process out of the
      spinlock section"), the update of the semaphore's sem_otime (last semop
      time) was moved to one central position (do_smart_update).
      
      But since do_smart_update() is only called for operations that modify
      the array, this means that wait-for-zero semops do not update sem_otime
      anymore.
      
      The fix is simple:
      Non-alter operations must update sem_otime.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
      Reported-by: Jia He <jiakernel@gmail.com>
      Tested-by: Jia He <jiakernel@gmail.com>
      Cc: Davidlohr Bueso <davidlohr.bueso@hp.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
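A user-space check for the symptom described here might look like the sketch below, assuming a Linux system with System V IPC enabled. It performs a single wait-for-zero semop on a fresh set and then inspects sem_otime through IPC_STAT. The helper name `wait_for_zero_sets_otime` is hypothetical, and the local `union semun` follows semctl(2).

```c
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* On Linux the caller must define union semun itself (see semctl(2)). */
union semun {
        int val;
        struct semid_ds *buf;
        unsigned short *array;
};

/*
 * Hypothetical helper: run one wait-for-zero operation and report
 * whether sem_otime was updated.  Returns 1 if sem_otime is nonzero
 * afterwards, 0 if it is still zero, and -1 if any call failed.
 */
int wait_for_zero_sets_otime(void)
{
        /* sem_op == 0 means "wait until the value is zero". */
        struct sembuf op = { .sem_num = 0, .sem_op = 0, .sem_flg = IPC_NOWAIT };
        struct semid_ds ds;
        union semun arg;
        int semid, ret = -1;

        semid = semget(IPC_PRIVATE, 1, 0600 | IPC_CREAT);
        if (semid < 0)
                return -1;

        /* SETVAL changes sem_ctime, not sem_otime. */
        arg.val = 0;
        if (semctl(semid, 0, SETVAL, arg) < 0)
                goto out;

        /* The value is already zero, so this returns immediately. */
        if (semop(semid, &op, 1) < 0)
                goto out;

        arg.buf = &ds;
        if (semctl(semid, 0, IPC_STAT, arg) < 0)
                goto out;

        ret = ds.sem_otime != 0;
out:
        semctl(semid, 0, IPC_RMID);
        return ret;
}
```

On a kernel with this fix the helper would be expected to return 1; a return of 0 would match the regression described in the commit message.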
    • ipc/sem.c: synchronize the proc interface · d8c63376
      Committed by Manfred Spraul
      The proc interface is not aware of sem_lock(), it instead calls
      ipc_lock_object() directly.  This means that simple semop() operations
      can run in parallel with the proc interface.  Right now, this is not
      critical, because the implementation doesn't do anything that requires
      proper synchronization.
      
      But it is dangerous and therefore should be fixed.
      Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
      Cc: Davidlohr Bueso <davidlohr.bueso@hp.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Rik van Riel <riel@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ipc/sem.c: optimize sem_lock() · 6d07b68c
      Committed by Manfred Spraul
      Operations that need access to the whole array must guarantee that there
      are no simple operations ongoing.  Right now this is achieved by
      spin_unlock_wait(sem->lock) on all semaphores.
      
      If complex_count is nonzero, then this spin_unlock_wait() is not
      necessary, because it was already performed in the past by the thread
      that increased complex_count.  Even though sem_perm.lock was dropped
      in between, no simple operation could have started, because simple
      operations cannot start when complex_count is nonzero.
      Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
      Cc: Mike Galbraith <bitbucket@online.de>
      Cc: Rik van Riel <riel@redhat.com>
      Reviewed-by: Davidlohr Bueso <davidlohr@hp.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ipc/sem.c: fix race in sem_lock() · 5e9d5275
      Committed by Manfred Spraul
      The exclusion of complex operations in sem_lock() is insufficient: after
      acquiring the per-semaphore lock, a simple op must first check that
      sem_perm.lock is not locked and only after that test check
      complex_count.  The current code does it the other way around - and that
      creates a race.  Details are below.
      
      The patch is a complete rewrite of sem_lock(), based in part on the code
      from Mike Galbraith.  It removes all gotos and all loops and thus the
      risk of livelocks.
      
      I have tested the patch (together with the next one) on my i3 laptop and
      it didn't cause any problems.
      
      The bug is probably also present in 3.10 and 3.11, but for these kernels
      it might be simpler just to move the test of sma->complex_count after
      the spin_is_locked() test.
      
      Details of the bug:
      
      Assume:
       - sma->complex_count = 0.
       - Thread 1: semtimedop(complex op that must sleep)
       - Thread 2: semtimedop(simple op).
      
      Pseudo-Trace:
      
      Thread 1: sem_lock(): acquire sem_perm.lock
      Thread 1: sem_lock(): check for ongoing simple ops
      			Nothing ongoing, thread 2 is still before sem_lock().
      Thread 1: try_atomic_semop()
      	<<< preempted.
      
      Thread 2: sem_lock():
              static inline int sem_lock(struct sem_array *sma, struct sembuf *sops,
                                            int nsops)
              {
                      int locknum;
               again:
                      if (nsops == 1 && !sma->complex_count) {
                              struct sem *sem = sma->sem_base + sops->sem_num;
      
                              /* Lock just the semaphore we are interested in. */
                              spin_lock(&sem->lock);
      
                              /*
                               * If sma->complex_count was set while we were spinning,
                               * we may need to look at things we did not lock here.
                               */
                              if (unlikely(sma->complex_count)) {
                                      spin_unlock(&sem->lock);
                                      goto lock_array;
                              }
              <<<<<<<<<
              <<< complex_count is still 0.
              <<<
              <<< Here it is preempted
              <<<<<<<<<
      
      Thread 1: try_atomic_semop() returns, notices that it must sleep.
      Thread 1: increases sma->complex_count.
      Thread 1: drops sem_perm.lock
      Thread 2:
                      /*
                       * Another process is holding the global lock on the
                       * sem_array; we cannot enter our critical section,
                       * but have to wait for the global lock to be released.
                       */
                      if (unlikely(spin_is_locked(&sma->sem_perm.lock))) {
                              spin_unlock(&sem->lock);
                              spin_unlock_wait(&sma->sem_perm.lock);
                              goto again;
                      }
      	<<< sem_perm.lock already dropped, thus no "goto again;"
      
                      locknum = sops->sem_num;
      Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
      Cc: Mike Galbraith <bitbucket@online.de>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Davidlohr Bueso <davidlohr.bueso@hp.com>
      Cc: <stable@vger.kernel.org>	[3.10+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
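The ordering this commit asks for (take the per-semaphore lock, then test that the global lock is free, and only then test complex_count) can be modelled in user space. The following is a rough sketch with C11 atomics, not the rewritten sem_lock() itself; `toy_sem_array`, `toy_try_fast_path` and `toy_fast_path_unlock` are hypothetical names, and `global_locked` merely stands in for sem_perm.lock.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of the state a simple operation has to look at. */
struct toy_sem_array {
        atomic_bool global_locked;   /* stands in for sem_perm.lock */
        atomic_int complex_count;    /* number of sleeping complex ops */
        atomic_flag sem_lock;        /* stands in for one per-semaphore lock */
};

/*
 * Try to enter the fast path for a simple operation.
 * Returns true with the per-semaphore lock held, or false with it
 * released so the caller can fall back to the global lock.
 */
bool toy_try_fast_path(struct toy_sem_array *a)
{
        /* Step 1: take the per-semaphore lock. */
        while (atomic_flag_test_and_set_explicit(&a->sem_lock, memory_order_acquire))
                ;

        /* Step 2: the global lock must be observed free first. */
        if (!atomic_load_explicit(&a->global_locked, memory_order_relaxed)) {
                /* Order the following read after the lock test. */
                atomic_thread_fence(memory_order_acquire);

                /* Step 3: only now is complex_count meaningful. */
                if (atomic_load_explicit(&a->complex_count, memory_order_relaxed) == 0)
                        return true;
        }

        atomic_flag_clear_explicit(&a->sem_lock, memory_order_release);
        return false;
}

/* Leave the fast path entered by toy_try_fast_path(). */
void toy_fast_path_unlock(struct toy_sem_array *a)
{
        atomic_flag_clear_explicit(&a->sem_lock, memory_order_release);
}
```

In the pseudo-trace above, thread 2 tested complex_count before the global lock; with the order reversed as in this model, a complex_count increment made under the global lock is visible by the time the simple operation looks at it.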
  13. 25 Sep, 2013 1 commit
    • ipc: fix race with LSMs · 53dad6d3
      Committed by Davidlohr Bueso
      Currently, IPC mechanisms do security and auditing related checks under
      RCU.  However, since security modules can free the security structure,
      for example, through selinux_[sem,msg_queue,shm]_free_security(), we can
      race if the structure is freed before other tasks are done with it,
      creating a use-after-free condition.  Manfred illustrates this nicely,
      for instance with shared mem and selinux:
      
       -> do_shmat calls rcu_read_lock()
       -> do_shmat calls shm_object_check().
           Checks that the object is still valid - but doesn't acquire any locks.
           Then it returns.
       -> do_shmat calls security_shm_shmat (e.g. selinux_shm_shmat)
       -> selinux_shm_shmat calls ipc_has_perm()
       -> ipc_has_perm accesses ipc_perms->security
      
      shm_close()
       -> shm_close acquires rw_mutex & shm_lock
       -> shm_close calls shm_destroy
       -> shm_destroy calls security_shm_free (e.g. selinux_shm_free_security)
       -> selinux_shm_free_security calls ipc_free_security(&shp->shm_perm)
       -> ipc_free_security calls kfree(ipc_perms->security)
      
      This patch delays the freeing of the security structures until all RCU
      readers are done.  Furthermore it aligns the security life cycle with
      that of the rest of IPC - freeing them based on the reference counter.
      For situations where we need not free security, the current behavior is
      kept.  Linus states:
      
       "... the old behavior was suspect for another reason too: having the
        security blob go away from under a user sounds like it could cause
        various other problems anyway, so I think the old code was at least
        _prone_ to bugs even if it didn't have catastrophic behavior."
      
      I have tested this patch with IPC testcases from LTP on both my
      quad-core laptop and on a 64 core NUMA server.  In both cases selinux is
      enabled, and tests pass for both voluntary and forced preemption models.
      While the mentioned races are theoretical (at least no one has reported
      them), I wanted to make sure that this new logic doesn't break anything
      we weren't aware of.
      Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
      Acked-by: Manfred Spraul <manfred@colorfullife.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
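The idea of tying the lifetime of the security structure to the object's reference counter can be sketched in user space. This toy covers only the reference-counting half; it does not model RCU or the deferral until readers are done. All names (`toy_ipc_obj`, `toy_obj_new`, `toy_obj_get`, `toy_obj_put`) are hypothetical.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical IPC object carrying a separately allocated security blob. */
struct toy_ipc_obj {
        atomic_int refcount;
        void *security;
};

/* Allocate an object with one reference held by the caller. */
struct toy_ipc_obj *toy_obj_new(void)
{
        struct toy_ipc_obj *obj = malloc(sizeof(*obj));

        if (!obj)
                return NULL;
        obj->security = malloc(16);
        if (!obj->security) {
                free(obj);
                return NULL;
        }
        atomic_init(&obj->refcount, 1);
        return obj;
}

/* Take an additional reference, e.g. before a permission check. */
void toy_obj_get(struct toy_ipc_obj *obj)
{
        atomic_fetch_add_explicit(&obj->refcount, 1, memory_order_relaxed);
}

/*
 * Drop a reference.  The security blob is freed together with the
 * object, and only when the last reference goes away.
 * Returns 1 if this call freed the object, 0 otherwise.
 */
int toy_obj_put(struct toy_ipc_obj *obj)
{
        if (atomic_fetch_sub_explicit(&obj->refcount, 1, memory_order_acq_rel) != 1)
                return 0;
        free(obj->security);
        free(obj);
        return 1;
}
```

With this arrangement a destroy path that calls `toy_obj_put()` cannot free the blob while a checker still holds its own reference, which is the property the do_shmat/shm_close trace above lacks.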
  14. 12 Sep, 2013 1 commit
  15. 10 Jul, 2013 8 commits