1. 26 Sep, 2019 (3 commits)
  2. 08 Sep, 2019 (1 commit)
    • ipc: fix sparc64 ipc() wrapper · fb377eb8
      Authored by Arnd Bergmann
      Matt bisected a sparc64-specific issue with semctl, shmctl and msgctl
      to a commit from my y2038 series in linux-5.1, as I missed the custom
      sys_ipc() wrapper that sparc64 uses in place of the generic version
      that I patched.
      
      The problem is that the kernel's sys_{sem,shm,msg}ctl() functions no
      longer accept the IPC_64 flag, so they return -EINVAL when they don't
      recognize the command.
      
      Instead, the correct way to do this now is to call the internal
      ksys_old_{sem,shm,msg}ctl() functions to select the API version.
      
      As we generally move towards these functions anyway, change all of
      sparc_ipc() to consistently use them in place of the sys_*() versions,
      and move the required ksys_*() declarations into linux/syscalls.h.
      
      The IS_ENABLED(CONFIG_SYSVIPC) check is required to avoid link
      errors when ipc is disabled.
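
      A minimal sketch of the resulting dispatch shape, assuming the generic
      sparc_ipc() argument names (the actual diff lives in
      arch/sparc/kernel/sys_sparc_64.c and handles more calls):

        long sparc_ipc_ctl(unsigned int call, int first, unsigned long second,
                           unsigned long third, void __user *ptr)
        {
                /* The IS_ENABLED() check avoids link errors against the
                 * ksys_*() symbols when SysV IPC is compiled out. */
                if (!IS_ENABLED(CONFIG_SYSVIPC))
                        return -ENOSYS;

                switch (call) {
                case SEMCTL:
                        /* was sys_semctl(), which now assumes IPC_64 and
                         * returns -EINVAL for old-style commands */
                        return ksys_old_semctl(first, second, third,
                                               (unsigned long)ptr);
                case MSGCTL:
                        return ksys_old_msgctl(first, second,
                                               (struct msqid_ds __user *)ptr);
                case SHMCTL:
                        return ksys_old_shmctl(first, second,
                                               (struct shmid_ds __user *)ptr);
                default:
                        return -EINVAL;
                }
        }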
      Reported-by: Matt Turner <mattst88@gmail.com>
      Fixes: 275f2214 ("ipc: rename old-style shmctl/semctl/msgctl syscalls")
      Cc: stable@vger.kernel.org
      Tested-by: Matt Turner <mattst88@gmail.com>
      Tested-by: Anatoly Pugachev <matorola@gmail.com>
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      fb377eb8
  3. 06 Sep, 2019 (1 commit)
  4. 19 Jul, 2019 (1 commit)
  5. 17 Jul, 2019 (1 commit)
    • ipc/mqueue.c: only perform resource calculation if user valid · a318f12e
      Authored by Kees Cook
      Andreas Christoforou reported:
      
        UBSAN: Undefined behaviour in ipc/mqueue.c:414:49 signed integer overflow:
        9 * 2305843009213693951 cannot be represented in type 'long int'
        ...
        Call Trace:
          mqueue_evict_inode+0x8e7/0xa10 ipc/mqueue.c:414
          evict+0x472/0x8c0 fs/inode.c:558
          iput_final fs/inode.c:1547 [inline]
          iput+0x51d/0x8c0 fs/inode.c:1573
          mqueue_get_inode+0x8eb/0x1070 ipc/mqueue.c:320
          mqueue_create_attr+0x198/0x440 ipc/mqueue.c:459
          vfs_mkobj+0x39e/0x580 fs/namei.c:2892
          prepare_open ipc/mqueue.c:731 [inline]
          do_mq_open+0x6da/0x8e0 ipc/mqueue.c:771
      
      Which could be triggered by:
      
              struct mq_attr attr = {
                      .mq_flags = 0,
                      .mq_maxmsg = 9,
                      .mq_msgsize = 0x1fffffffffffffff,
                      .mq_curmsgs = 0,
              };
      
              if (mq_open("/testing", 0x40, 3, &attr) == (mqd_t) -1)
                      perror("mq_open");
      
      mqueue_get_inode() was correctly rejecting the giant mq_msgsize, and
      preparing to return -EINVAL.  During cleanup, it calls
      mqueue_evict_inode(), which performed the resource-usage tracking math
      for updating "user" before checking whether there was a valid "user"
      at all (which would indicate that the calculations would be sane).
      Instead, delay the calculation until a valid "user" has been seen.
      
      The overflow was real, but the results went unused, so while the flaw
      is harmless, it's noisy for kernel fuzzers.  Fix it by moving the
      calculation under the non-NULL "user" check, where the result actually
      gets used.
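
      A hedged sketch of the reordering (field names follow ipc/mqueue.c,
      but this is illustrative, not the exact diff):

        static void mqueue_evict_inode_sketch(struct mqueue_inode_info *info)
        {
                struct user_struct *user = info->user;

                if (user) {
                        unsigned long mq_treesize, mq_bytes;

                        /* This arithmetic previously ran before the NULL
                         * check and overflowed on the rejected giant
                         * mq_msgsize; its result is only used here. */
                        mq_treesize = info->attr.mq_maxmsg * sizeof(struct msg_msg) +
                                min_t(unsigned int, info->attr.mq_maxmsg, MQ_PRIO_MAX) *
                                        sizeof(struct posix_msg_tree_node);
                        mq_bytes = mq_treesize + 2 * (info->attr.mq_maxmsg *
                                                      info->attr.mq_msgsize);

                        spin_lock(&mq_lock);
                        user->mq_bytes -= mq_bytes;
                        spin_unlock(&mq_lock);
                        free_uid(user);
                }
        }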
      
      Link: http://lkml.kernel.org/r/201906072207.ECB65450@keescook
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Reported-by: Andreas Christoforou <andreaschristofo@gmail.com>
      Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a318f12e
  6. 05 Jun, 2019 (1 commit)
  7. 26 May, 2019 (1 commit)
  8. 24 May, 2019 (1 commit)
  9. 15 May, 2019 (6 commits)
    • ipc: do cyclic id allocation for the ipc object. · 99db46ea
      Authored by Manfred Spraul
      For ipcmni_extend mode, the sequence number space is only 7 bits.  So
      the chance of id reuse is relatively high compared with the non-extended
      mode.
      
      To alleviate this id reuse problem, this patch enables cyclic allocation
      for the index to the radix tree (idx).  The disadvantage is that this
      can cause a slight slow-down of the fast path, as the radix tree could
      be higher than necessary.
      
      To limit the radix tree height, the following limits were chosen
      (see the sketch below):
       1) The cycling is done over in_use*1.5.
       2) The cycling is done over at least:
         "normal" ipcmni mode: RADIX_TREE_MAP_SIZE elements
         "ipcmni_extend" mode: 4096 elements
      
      Result:
      - for normal mode:
      	No change for <= 42 active ipc elements. With more than 42
      	active ipc elements, a 2nd level would be added to the radix
      	tree.
      	Without cyclic allocation, a 2nd level would be added only with
      	more than 63 active elements.
      
      - for extended mode:
      	Cycling always creates at least a 2-level radix tree.
      	With more than 2730 active objects, a 3rd level is added,
      	whereas without cyclic allocation the 3rd level is only
      	added beyond 4095 active objects.
      
      For a 2-level radix tree compared to a 1-level radix tree, I have
      observed < 1% performance impact.
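
      A minimal sketch of the allocation step behind these numbers, assuming
      the kernel's ipc_ids bookkeeping fields and letting ipc_min_cycle stand
      for the per-mode floor above:

        /* Cycle over ~1.5x the live objects, floored per mode so the
         * radix tree height stays bounded, capped at the maximum. */
        int max_idx = max(ids->in_use * 3 / 2, ipc_min_cycle);

        max_idx = min(max_idx, ipc_mni);
        /* Cyclic instead of lowest-free index allocation. */
        idx = idr_alloc_cyclic(&ids->ipcs_idr, new, 0, max_idx, GFP_NOWAIT);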
      
      Notes:
      1) Normal "x=semget();y=semget();" is unaffected: the idx is then
        e.g. a and a+1, regardless of whether idr_alloc() or
        idr_alloc_cyclic() is used.
      
      2) The -1% happens in a microbenchmark after this situation:
      	x=semget();
      	for(i=0;i<4000;i++) {t=semget();semctl(t,0,IPC_RMID);}
      	y=semget();
      	Now perform semget calls on x and y that do not sleep.
      
      3) The worst-case reuse cycle time is unfortunately unaffected:
         If you have 2^24-1 ipc objects allocated, and get/remove the last
         possible element in a loop, then the id is reused after 128
         get/remove pairs.
      
      Performance check:
      A microbenchmark that performs no-op semop() randomly on two IDs,
      with only these two IDs allocated.
      The IDs were set using /proc/sys/kernel/sem_next_id.
      The test was run 5 times, averages are shown.
      
      1 & 2: Base (6.22 seconds for 10,000,000 semops)
      1 & 40: -0.2%
      1 & 3348: -0.8%
      1 & 27348: -1.6%
      1 & 15777204: -3.2%
      
      Or: ~12.6 CPU cycles per additional radix tree level.
      The CPU is an Intel i3-5010U. ~1300 CPU cycles/syscall is slower
      than what I remember (Spectre impact?).
      
      V2 of the patch:
      - use "min" and "max"
      - use RADIX_TREE_MAP_SIZE * RADIX_TREE_MAP_SIZE instead of
      	(2<<12).
      
      [akpm@linux-foundation.org: fix max() warning]
      Link: http://lkml.kernel.org/r/20190329204930.21620-3-longman@redhat.com
      Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
      Acked-by: Waiman Long <longman@redhat.com>
      Cc: "Luis R. Rodriguez" <mcgrof@kernel.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: "Eric W . Biederman" <ebiederm@xmission.com>
      Cc: Takashi Iwai <tiwai@suse.de>
      Cc: Davidlohr Bueso <dbueso@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      99db46ea
    • ipc: conserve sequence numbers in ipcmni_extend mode · 3278a2c2
      Authored by Manfred Spraul
      Rewrite, based on the patch from Waiman Long:
      
      The mixing in of a sequence number into the IPC IDs is probably to avoid
      ID reuse in userspace as much as possible.  With ipcmni_extend mode, the
      number of usable sequence numbers is greatly reduced leading to higher
      chance of ID reuse.
      
      To address this issue, we need to conserve the sequence number space
      as much as possible.  Right now, the sequence number is incremented
      for every new ID created.  In reality, we only need to increment it
      when the newly allocated index is not greater than the last one
      allocated; only then can the new ID collide with an existing one.
      This is done irrespective of the ipcmni mode.
      
      In order to avoid any races, the index is first allocated and then
      the pointer is replaced, as sketched after the list of changes below.
      
      Changes compared to the initial patch:
       - Handle failures from idr_alloc().
       - Avoid that concurrent operations can see the wrong sequence number.
         (This is achieved by using idr_replace()).
       - IPCMNI_SEQ_SHIFT is not a constant, thus renamed to
         ipcmni_seq_shift().
       - IPCMNI_SEQ_MAX is not a constant, thus renamed to ipcmni_seq_max().
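
      A hedged sketch of the resulting allocation sequence (helper names as
      in the commit, locking and error handling elided):

        /* Reserve the index first, with a NULL pointer that readers skip. */
        idx = idr_alloc(&ids->ipcs_idr, NULL, 0, 0, GFP_NOWAIT);
        if (idx >= 0) {
                /* Increment the sequence number only on wrap-around, i.e.
                 * when the new index could collide with a freed id. */
                if (idx <= ids->last_idx) {
                        ids->seq++;
                        if (ids->seq >= ipcmni_seq_max())
                                ids->seq = 0;
                }
                ids->last_idx = idx;

                new->seq = ids->seq;
                new->id = (new->seq << ipcmni_seq_shift()) + idx;
                /* Publish the fully initialised object only now. */
                idr_replace(&ids->ipcs_idr, new, idx);
        }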
      
      Link: http://lkml.kernel.org/r/20190329204930.21620-2-longman@redhat.com
      Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
      Signed-off-by: Waiman Long <longman@redhat.com>
      Suggested-by: Matthew Wilcox <willy@infradead.org>
      Acked-by: Waiman Long <longman@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Davidlohr Bueso <dbueso@suse.de>
      Cc: "Eric W . Biederman" <ebiederm@xmission.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: "Luis R. Rodriguez" <mcgrof@kernel.org>
      Cc: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      3278a2c2
    • ipc: allow boot time extension of IPCMNI from 32k to 16M · 5ac893b8
      Authored by Waiman Long
      The maximum number of unique System V IPC identifiers was limited to
      32k.  That limit should be big enough for most use cases.
      
      However, there are some users out there requesting more, especially
      those migrating from Solaris, which uses 24 bits for unique
      identifiers.  To satisfy the needs of those users, a new boot time
      kernel option "ipcmni_extend" is added to extend the IPCMNI value to
      16M.  This is a 512-fold increase, which should be big enough for
      users that need a large number of unique IPC identifiers.
      
      The use of this new option will change the pattern of the IPC
      identifiers returned by functions like shmget(2).  An application
      that depends on such a pattern may not work properly, so the option
      should only be used if users really need more than 32k unique IPC
      numbers.
      
      This new option does have the side effect of reducing the maximum number
      of unique sequence numbers from 64k down to 128.  So it is a trade-off.
      
      The computation of a new IPC id is not done in the performance critical
      path.  So a little bit of additional overhead shouldn't have any real
      performance impact.
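
      The trade-off follows from the id layout; a hedged illustration
      (constants as in ipc/util.h, composition simplified):

        #define IPCMNI_SHIFT            15      /* default: 2^15 = 32k ids  */
        #define IPCMNI_EXTEND_SHIFT     24      /* extended: 2^24 = 16M ids */

        /* An IPC id packs the sequence number above the index, so widening
         * the index to 24 of the 31 usable bits leaves only 7 bits (128
         * values) for the sequence number. */
        static inline int ipc_build_id_sketch(int seq, int idx, int shift)
        {
                return (seq << shift) + idx;
        }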
      
      Link: http://lkml.kernel.org/r/20190329204930.21620-1-longman@redhat.com
      Signed-off-by: Waiman Long <longman@redhat.com>
      Acked-by: Manfred Spraul <manfred@colorfullife.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Davidlohr Bueso <dbueso@suse.de>
      Cc: "Eric W . Biederman" <ebiederm@xmission.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: "Luis R. Rodriguez" <mcgrof@kernel.org>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      5ac893b8
    • ipc/mqueue: optimize msg_get() · a5091fda
      Authored by Davidlohr Bueso
      Our msg priorities became an rbtree as of d6629859 ("ipc/mqueue:
      improve performance of send/recv").  However, consuming a msg in
      msg_get() remains logarithmic (still better than before, of course).
      By applying the well-known technique of caching pointers, we can have
      the node with the highest priority in O(1), which is especially nice
      for the rt cases.  Furthermore, some callers can call msg_get() in a
      loop.
      
      A new msg_tree_erase() helper is also added to encapsulate the tree
      removal and node_cache game.  Passes ltp mq testcases.
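
      A hedged sketch of the cached-pointer idea, assuming the
      msg_tree_rightmost field this patch adds (the real msg_get() also pops
      from the node's message list and erases empty nodes via
      msg_tree_erase()):

        /* The rightmost rbtree node has the highest priority; keeping it
         * cached makes the common lookup O(1), with inserts and erases
         * maintaining the cache. */
        static struct posix_msg_tree_node *msg_peek(struct mqueue_inode_info *info)
        {
                struct rb_node *rightmost = info->msg_tree_rightmost;

                if (!rightmost)         /* queue empty */
                        return NULL;
                return rb_entry(rightmost, struct posix_msg_tree_node, rb_node);
        }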
      
      Link: http://lkml.kernel.org/r/20190321190216.1719-2-dave@stgolabs.net
      Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a5091fda
    • ipc/mqueue: remove redundant wq task assignment · 0ecb5821
      Authored by Davidlohr Bueso
      We already store the current task for the new waiter before calling
      wq_sleep() in both the send and recv paths.  Trivially remove the
      redundant assignment.
      
      Link: http://lkml.kernel.org/r/20190321190216.1719-1-dave@stgolabs.net
      Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      0ecb5821
    • ipc: prevent lockup on alloc_msg and free_msg · d6a2946a
      Authored by Li RongQing
      The msgctl10 test from LTP triggers the following lockup when
      CONFIG_KASAN is enabled on large-memory SMP systems: page
      initialization can take a long time when msgctl10 requests a huge
      block of memory, and it blocks the RCU scheduler, so release the CPU
      actively.
      
      Since schedule() is added to free_msg(), free_msg() can no longer be
      called while holding a spinlock; instead, add the messages to a
      temporary list and free them outside the spinlock, as sketched below.
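
      A hedged sketch of that shape (simplified from mqueue_evict_inode();
      names follow ipc/mqueue.c):

        struct msg_msg *msg, *nmsg;
        LIST_HEAD(tmp_msg);

        /* Drain the queue onto a private list while holding the lock... */
        spin_lock(&info->lock);
        while ((msg = msg_get(info)) != NULL)
                list_add_tail(&msg->m_list, &tmp_msg);
        spin_unlock(&info->lock);

        /* ...then free outside it, where free_msg() may now reschedule
         * between the large KASAN-instrumented kfree() calls. */
        list_for_each_entry_safe(msg, nmsg, &tmp_msg, m_list) {
                list_del(&msg->m_list);
                free_msg(msg);
        }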
      
        rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
        rcu:     Tasks blocked on level-1 rcu_node (CPUs 16-31): P32505
        rcu:     Tasks blocked on level-1 rcu_node (CPUs 48-63): P34978
        rcu:     (detected by 11, t=35024 jiffies, g=44237529, q=16542267)
        msgctl10        R  running task    21608 32505   2794 0x00000082
        Call Trace:
         preempt_schedule_irq+0x4c/0xb0
         retint_kernel+0x1b/0x2d
        RIP: 0010:__is_insn_slot_addr+0xfb/0x250
        Code: 82 1d 00 48 8b 9b 90 00 00 00 4c 89 f7 49 c1 ee 03 e8 59 83 1d 00 48 b8 00 00 00 00 00 fc ff df 4c 39 eb 48 89 9d 58 ff ff ff <41> c6 04 06 f8 74 66 4c 8d 75 98 4c 89 f1 48 c1 e9 03 48 01 c8 48
        RSP: 0018:ffff88bce041f758 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
        RAX: dffffc0000000000 RBX: ffffffff8471bc50 RCX: ffffffff828a2a57
        RDX: dffffc0000000000 RSI: dffffc0000000000 RDI: ffff88bce041f780
        RBP: ffff88bce041f828 R08: ffffed15f3f4c5b3 R09: ffffed15f3f4c5b3
        R10: 0000000000000001 R11: ffffed15f3f4c5b2 R12: 000000318aee9b73
        R13: ffffffff8471bc50 R14: 1ffff1179c083ef0 R15: 1ffff1179c083eec
         kernel_text_address+0xc1/0x100
         __kernel_text_address+0xe/0x30
         unwind_get_return_address+0x2f/0x50
         __save_stack_trace+0x92/0x100
         create_object+0x380/0x650
         __kmalloc+0x14c/0x2b0
         load_msg+0x38/0x1a0
         do_msgsnd+0x19e/0xcf0
         do_syscall_64+0x117/0x400
         entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
        rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
        rcu:     Tasks blocked on level-1 rcu_node (CPUs 0-15): P32170
        rcu:     (detected by 14, t=35016 jiffies, g=44237525, q=12423063)
        msgctl10        R  running task    21608 32170  32155 0x00000082
        Call Trace:
         preempt_schedule_irq+0x4c/0xb0
         retint_kernel+0x1b/0x2d
        RIP: 0010:lock_acquire+0x4d/0x340
        Code: 48 81 ec c0 00 00 00 45 89 c6 4d 89 cf 48 8d 6c 24 20 48 89 3c 24 48 8d bb e4 0c 00 00 89 74 24 0c 48 c7 44 24 20 b3 8a b5 41 <48> c1 ed 03 48 c7 44 24 28 b4 25 18 84 48 c7 44 24 30 d0 54 7a 82
        RSP: 0018:ffff88af83417738 EFLAGS: 00000282 ORIG_RAX: ffffffffffffff13
        RAX: dffffc0000000000 RBX: ffff88bd335f3080 RCX: 0000000000000002
        RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88bd335f3d64
        RBP: ffff88af83417758 R08: 0000000000000000 R09: 0000000000000000
        R10: 0000000000000001 R11: ffffed13f3f745b2 R12: 0000000000000000
        R13: 0000000000000002 R14: 0000000000000000 R15: 0000000000000000
         is_bpf_text_address+0x32/0xe0
         kernel_text_address+0xec/0x100
         __kernel_text_address+0xe/0x30
         unwind_get_return_address+0x2f/0x50
         __save_stack_trace+0x92/0x100
         save_stack+0x32/0xb0
         __kasan_slab_free+0x130/0x180
         kfree+0xfa/0x2d0
         free_msg+0x24/0x50
         do_msgrcv+0x508/0xe60
         do_syscall_64+0x117/0x400
         entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Davidlohr said:
       "So after releasing the lock, the msg rbtree/list is empty and new
        calls will not see those in the newly populated tmp_msg list, and
        therefore they cannot access the delayed msg freeing pointers, which
        is good. Also the fact that the node_cache is now freed before the
        actual messages seems to be harmless as this is wanted for
        msg_insert() avoiding GFP_ATOMIC allocations, and after releasing the
        info->lock the thing is freed anyway so it should not change things"
      
      Link: http://lkml.kernel.org/r/1552029161-4957-1-git-send-email-lirongqing@baidu.com
      Signed-off-by: Li RongQing <lirongqing@baidu.com>
      Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
      Reviewed-by: Davidlohr Bueso <dbueso@suse.de>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d6a2946a
  10. 02 May, 2019 (1 commit)
  11. 08 Apr, 2019 (1 commit)
    • rhashtable: use bit_spin_locks to protect hash bucket. · 8f0db018
      Authored by NeilBrown
      This patch changes rhashtables to use a bit_spin_lock on BIT(1) of the
      bucket pointer to lock the hash chain for that bucket.
      
      The benefits of a bit spin_lock are:
       - no need to allocate a separate array of locks.
       - no need to have a configuration option to guide the
         choice of the size of this array
       - locking cost is often a single test-and-set in a cache line
         that will have to be loaded anyway.  When inserting at, or removing
         from, the head of the chain, the unlock is free - writing the new
         address in the bucket head implicitly clears the lock bit.
         For __rhashtable_insert_fast() we ensure this always happens
         when adding a new key.
       - even when locking costs 2 updates (lock and unlock), they are
         in a cacheline that needs to be read anyway.
      
      The cost of using a bit spin_lock is a little bit of code complexity,
      which I think is quite manageable.
      
      Bit spin_locks are sometimes inappropriate because they are not fair:
      if multiple CPUs repeatedly contend for the same lock, one CPU can
      easily be starved.  This is not a credible situation with rhashtable.
      Multiple CPUs may want to repeatedly add or remove objects, but they
      will typically do so at different buckets, so they will attempt to
      acquire different locks.
      
      As we have more bit-locks than we previously had spinlocks (by at
      least a factor of two) we can expect slightly less contention to
      go with the slightly better cache behavior and reduced memory
      consumption.
      
      To enhance type checking, a new struct, "struct rhash_lock_head", is
      introduced to represent the pointer-plus-lock-bit that is stored in
      the bucket table.  It is empty; a pointer to it needs to be cast to
      either an unsigned long or a "struct rhash_head *" to be useful.
      Variables of this type are most often called "bkt".
      
      Previously "pprev" would sometimes point to a bucket, and sometimes a
      ->next pointer in an rhash_head.  As these are now different types,
      pprev is NULL when it would have pointed to the bucket. In that case,
      'blk' is used, together with correct locking protocol.
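
      A hedged sketch of the locking primitive described above (the exact
      helpers in include/linux/rhashtable.h may differ):

        #include <linux/bit_spinlock.h>

        /* BIT(1) of the bucket pointer doubles as the chain lock. */
        static inline void rht_lock_sketch(struct rhash_lock_head **bkt)
        {
                local_bh_disable();
                bit_spin_lock(1, (unsigned long *)bkt);
        }

        static inline void rht_unlock_sketch(struct rhash_lock_head **bkt)
        {
                bit_spin_unlock(1, (unsigned long *)bkt);
                local_bh_enable();
        }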
      Signed-off-by: NeilBrown <neilb@suse.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8f0db018
  12. 08 Mar, 2019 (2 commits)
  13. 28 Feb, 2019 (1 commit)
    • ipc: Convert mqueue fs to fs_context · 935c6912
      Authored by David Howells
      Convert the mqueue filesystem to use the filesystem context stuff.
      
      Notes:
      
       (1) The relevant ipc namespace is selected when the context is
           initialised (it defaults to the current task's ipc namespace).
           The caller can override this before calling vfs_get_tree().
      
       (2) Rather than simply calling kern_mount_data(), mq_init_ns() and
           mq_internal_mount() create a context, adjust it and then do the rest
           of the mount procedure.
      
       (3) The lazy mqueue mounting on creation of a new namespace is retained
           from a previous patch, but the avoidance of sget() if no superblock
           yet exists is reverted and the superblock is again keyed on the
           namespace pointer.
      
           Yes, there was a performance gain in not searching the superblock
           hash, but it's only paid once per ipc namespace - and only if someone
           uses mqueue within that namespace, so I'm not sure it's worth it,
           especially as calling sget() allows avoidance of recursion.
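
      A simplified sketch of the conversion's shape, assuming the
      mqueue_fs_context type the patch introduces:

        /* Capture the caller's ipc namespace when the context is created;
         * mq_init_ns() and mq_internal_mount() can swap it before calling
         * vfs_get_tree(). */
        static int mqueue_init_fs_context(struct fs_context *fc)
        {
                struct mqueue_fs_context *ctx;

                ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
                if (!ctx)
                        return -ENOMEM;

                ctx->ipc_ns = get_ipc_ns(current->nsproxy->ipc_ns);
                fc->fs_private = ctx;
                fc->ops = &mqueue_fs_context_ops;
                return 0;
        }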
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      935c6912
  14. 07 Feb, 2019 (1 commit)
    • y2038: syscalls: rename y2038 compat syscalls · 8dabe724
      Authored by Arnd Bergmann
      A lot of system calls that pass a time_t somewhere have an implementation
      using a COMPAT_SYSCALL_DEFINEx() on 64-bit architectures, and have
      been reworked so that this implementation can now be used on 32-bit
      architectures as well.
      
      The missing step is to redefine them using the regular SYSCALL_DEFINEx()
      to get them out of the compat namespace and make it possible to build them
      on 32-bit architectures.
      
      Any system call that ends in 'time' gets a '32' suffix on its name for
      that version, while the others get a '_time32' suffix, to distinguish
      them from the normal version, which takes a 64-bit time argument in the
      future.
      
      In this step, only 64-bit architectures are changed, doing this rename
      first lets us avoid touching the 32-bit architectures twice.
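
      A hedged before/after illustration of the naming rule, using
      clock_settime (which ends in 'time' and therefore gets the bare '32'
      suffix):

        /* before: compat-only entry point on 64-bit architectures */
        COMPAT_SYSCALL_DEFINE2(clock_settime, clockid_t, which_clock,
                               struct old_timespec32 __user *, tp)

        /* after: a regular syscall, buildable on 32-bit architectures too */
        SYSCALL_DEFINE2(clock_settime32, clockid_t, which_clock,
                        struct old_timespec32 __user *, tp)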
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      8dabe724
  15. 26 Jan, 2019 (1 commit)
    • ipc: rename old-style shmctl/semctl/msgctl syscalls · 275f2214
      Authored by Arnd Bergmann
      The behavior of these system calls is slightly different between
      architectures, as determined by the CONFIG_ARCH_WANT_IPC_PARSE_VERSION
      symbol. Most architectures that implement the split IPC syscalls don't set
      that symbol and only get the modern version, but alpha, arm, microblaze,
      mips-n32, mips-n64 and xtensa expect the caller to pass the IPC_64 flag.
      
      For the architectures that so far only implement sys_ipc(), i.e. m68k,
      mips-o32, powerpc, s390, sh, sparc, and x86-32, we want the new behavior
      when adding the split syscalls, so we need to distinguish between the
      two groups of architectures.
      
      The method I picked for this distinction is to have a separate system call
      entry point: sys_old_*ctl() now uses ipc_parse_version, while sys_*ctl()
      does not. The system call tables of the five architectures are changed
      accordingly.
      
      As an additional benefit, we no longer need the configuration specific
      definition for ipc_parse_version(), it always does the same thing now,
      but simply won't get called on architectures with the modern interface.
      
      A small downside is that on architectures that do set
      ARCH_WANT_IPC_PARSE_VERSION, we now have an extra set of entry points
      that are never called. They only add a few bytes of bloat, so it seems
      better to keep them compared to adding yet another Kconfig symbol.
      I considered adding new syscall numbers for the IPC_64 variants for
      consistency, but decided against that for now.
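
      A hedged sketch of the two entry points (routed through a hypothetical
      common helper here; the real patch uses the existing per-ipc
      implementations):

        /* Modern entry point: no version parsing, IPC_64 semantics only. */
        SYSCALL_DEFINE4(semctl, int, semid, int, semnum, int, cmd,
                        unsigned long, arg)
        {
                return do_semctl_sketch(semid, semnum, cmd, arg, IPC_64);
        }

        /* Old-style entry point: the IPC version is parsed out of cmd. */
        SYSCALL_DEFINE4(old_semctl, int, semid, int, semnum, int, cmd,
                        unsigned long, arg)
        {
                int version = ipc_parse_version(&cmd);

                return do_semctl_sketch(semid, semnum, cmd, arg, version);
        }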
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      275f2214
  16. 18 Jan, 2019 (1 commit)
  17. 31 Oct, 2018 (2 commits)
  18. 06 Oct, 2018 (1 commit)
  19. 03 Oct, 2018 (1 commit)
    • signal: Distinguish between kernel_siginfo and siginfo · ae7795bc
      Authored by Eric W. Biederman
      Linus recently observed that if we did not worry about the padding
      member in struct siginfo it is only about 48 bytes, and 48 bytes is
      much nicer than 128 bytes for allocating on the stack and copying
      around in the kernel.
      
      The obvious thing of only adding the padding when userspace is
      including siginfo.h won't work as there are sigframe definitions in
      the kernel that embed struct siginfo.
      
      So split siginfo in two: kernel_siginfo and siginfo.  The traditional
      name is kept for the userspace definition, while the version that is
      used internally in the kernel, and that ultimately will not be padded
      to 128 bytes, is called kernel_siginfo.
      
      I have put the definition of struct kernel_siginfo in include/signal_types.h
      
      A set of buildtime checks has been added to verify the two structures have
      the same field offsets.
      
      To make it easy to verify the change kernel_siginfo retains the same
      size as siginfo.  The reduction in size comes in a following change.
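
      A hedged sketch of the shape of the split and of the buildtime checks
      (macro details may differ from the actual patch):

        /* Same leading fields as userspace siginfo; the tail padding is
         * dropped in a later change. */
        typedef struct kernel_siginfo {
                __SIGINFO;
        } kernel_siginfo_t;

        /* Buildtime checks along these lines verify that the two layouts
         * keep identical field offsets. */
        #define CHECK_OFFSET(field) \
                BUILD_BUG_ON(offsetof(siginfo_t, field) != \
                             offsetof(kernel_siginfo_t, field))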
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      ae7795bc
  20. 05 Sep, 2018 (1 commit)
  21. 27 Aug, 2018 (1 commit)
    • y2038: globally rename compat_time to old_time32 · 9afc5eee
      Authored by Arnd Bergmann
      Christoph Hellwig suggested a slightly different path for handling
      backwards compatibility with the 32-bit time_t based system calls:
      
      Rather than simply reusing the compat_sys_* entry points on 32-bit
      architectures unchanged, we get rid of those entry points and the
      compat_time types by renaming them to something that makes more sense
      on 32-bit architectures (which don't have a compat mode otherwise),
      and then share the entry points under the new name with the 64-bit
      architectures that use them for implementing the compatibility.
      
      The following types and interfaces are renamed here, and moved
      from linux/compat_time.h to linux/time32.h:
      
      old				new
      ---				---
      compat_time_t			old_time32_t
      struct compat_timeval		struct old_timeval32
      struct compat_timespec		struct old_timespec32
      struct compat_itimerspec	struct old_itimerspec32
      ns_to_compat_timeval()		ns_to_old_timeval32()
      get_compat_itimerspec64()	get_old_itimerspec32()
      put_compat_itimerspec64()	put_old_itimerspec32()
      compat_get_timespec64()		get_old_timespec32()
      compat_put_timespec64()		put_old_timespec32()
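
      For illustration, the renamed 32-bit timespec as it now appears in
      linux/time32.h (a hedged excerpt, not the full header):

        typedef s32 old_time32_t;

        struct old_timespec32 {
                old_time32_t    tv_sec;
                s32             tv_nsec;
        };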
      
      As we already have aliases in place, this patch addresses only the
      instances that are relevant to the system call interface in particular,
      not those that occur in device drivers and other modules. Those
      will get handled separately, while providing the 64-bit version
      of the respective interfaces.
      
      I'm not renaming the timex, rusage and itimerval structures, as we are
      still debating what the new interface will look like, and whether we
      will need a replacement at all.
      
      This also doesn't change the names of the syscall entry points, which can
      be done more easily when we actually switch over the 32-bit architectures
      to use them, at that point we need to change COMPAT_SYSCALL_DEFINEx to
      SYSCALL_DEFINEx with a new name, e.g. with a _time32 suffix.
      Suggested-by: Christoph Hellwig <hch@infradead.org>
      Link: https://lore.kernel.org/lkml/20180705222110.GA5698@infradead.org/
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      9afc5eee
  22. 23 Aug, 2018 (10 commits)