1. 03 10月, 2018 1 次提交
    • E
      signal: Distinguish between kernel_siginfo and siginfo · ae7795bc
      Eric W. Biederman 提交于
      Linus recently observed that if we did not worry about the padding
      member in struct siginfo it is only about 48 bytes, and 48 bytes is
      much nicer than 128 bytes for allocating on the stack and copying
      around in the kernel.
      
      The obvious thing of only adding the padding when userspace is
      including siginfo.h won't work as there are sigframe definitions in
      the kernel that embed struct siginfo.
      
      So split siginfo in two; kernel_siginfo and siginfo.  Keeping the
      traditional name for the userspace definition.  While the version that
      is used internally to the kernel and ultimately will not be padded to
      128 bytes is called kernel_siginfo.
      
      The definition of struct kernel_siginfo I have put in include/signal_types.h
      
      A set of buildtime checks has been added to verify the two structures have
      the same field offsets.
      
      To make it easy to verify the change kernel_siginfo retains the same
      size as siginfo.  The reduction in size comes in a following change.
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      ae7795bc
  2. 27 8月, 2018 1 次提交
    • A
      y2038: globally rename compat_time to old_time32 · 9afc5eee
      Arnd Bergmann 提交于
      Christoph Hellwig suggested a slightly different path for handling
      backwards compatibility with the 32-bit time_t based system calls:
      
      Rather than simply reusing the compat_sys_* entry points on 32-bit
      architectures unchanged, we get rid of those entry points and the
      compat_time types by renaming them to something that makes more sense
      on 32-bit architectures (which don't have a compat mode otherwise),
      and then share the entry points under the new name with the 64-bit
      architectures that use them for implementing the compatibility.
      
      The following types and interfaces are renamed here, and moved
      from linux/compat_time.h to linux/time32.h:
      
      old				new
      ---				---
      compat_time_t			old_time32_t
      struct compat_timeval		struct old_timeval32
      struct compat_timespec		struct old_timespec32
      struct compat_itimerspec	struct old_itimerspec32
      ns_to_compat_timeval()		ns_to_old_timeval32()
      get_compat_itimerspec64()	get_old_itimerspec32()
      put_compat_itimerspec64()	put_old_itimerspec32()
      compat_get_timespec64()		get_old_timespec32()
      compat_put_timespec64()		put_old_timespec32()
      
      As we already have aliases in place, this patch addresses only the
      instances that are relevant to the system call interface in particular,
      not those that occur in device drivers and other modules. Those
      will get handled separately, while providing the 64-bit version
      of the respective interfaces.
      
      I'm not renaming the timex, rusage and itimerval structures, as we are
      still debating what the new interface will look like, and whether we
      will need a replacement at all.
      
      This also doesn't change the names of the syscall entry points, which can
      be done more easily when we actually switch over the 32-bit architectures
      to use them, at that point we need to change COMPAT_SYSCALL_DEFINEx to
      SYSCALL_DEFINEx with a new name, e.g. with a _time32 suffix.
      Suggested-by: NChristoph Hellwig <hch@infradead.org>
      Link: https://lore.kernel.org/lkml/20180705222110.GA5698@infradead.org/Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      9afc5eee
  3. 20 4月, 2018 2 次提交
    • A
      y2038: ipc: Enable COMPAT_32BIT_TIME · b0d17578
      Arnd Bergmann 提交于
      Three ipc syscalls (mq_timedsend, mq_timedreceive and and semtimedop)
      take a timespec argument. After we move 32-bit architectures over to
      useing 64-bit time_t based syscalls, we need seperate entry points for
      the old 32-bit based interfaces.
      
      This changes the #ifdef guards for the existing 32-bit compat syscalls
      to check for CONFIG_COMPAT_32BIT_TIME instead, which will then be
      enabled on all existing 32-bit architectures.
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      b0d17578
    • A
      y2038: ipc: Use __kernel_timespec · 21fc538d
      Arnd Bergmann 提交于
      This is a preparatation for changing over __kernel_timespec to 64-bit
      times, which involves assigning new system call numbers for mq_timedsend(),
      mq_timedreceive() and semtimedop() for compatibility with future y2038
      proof user space.
      
      The existing ABIs will remain available through compat code.
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      21fc538d
  4. 25 3月, 2018 1 次提交
    • E
      Revert "mqueue: switch to on-demand creation of internal mount" · cfb2f6f6
      Eric W. Biederman 提交于
      This reverts commit 36735a6a.
      
      Aleksa Sarai <asarai@suse.de> writes:
      > [REGRESSION v4.16-rc6] [PATCH] mqueue: forbid unprivileged user access to internal mount
      >
      > Felix reported weird behaviour on 4.16.0-rc6 with regards to mqueue[1],
      > which was introduced by 36735a6a ("mqueue: switch to on-demand
      > creation of internal mount").
      >
      > Basically, the reproducer boils down to being able to mount mqueue if
      > you create a new user namespace, even if you don't unshare the IPC
      > namespace.
      >
      > Previously this was not possible, and you would get an -EPERM. The mount
      > is the *host* mqueue mount, which is being cached and just returned from
      > mqueue_mount(). To be honest, I'm not sure if this is safe or not (or if
      > it was intentional -- since I'm not familiar with mqueue).
      >
      > To me it looks like there is a missing permission check. I've included a
      > patch below that I've compile-tested, and should block the above case.
      > Can someone please tell me if I'm missing something? Is this actually
      > safe?
      >
      > [1]: https://github.com/docker/docker/issues/36674
      
      The issue is a lot deeper than a missing permission check.  sb->s_user_ns
      was is improperly set as well.  So in addition to the filesystem being
      mounted when it should not be mounted, so things are not allow that should
      be.
      
      We are practically to the release of 4.16 and there is no agreement between
      Al Viro and myself on what the code should looks like to fix things properly.
      So revert the code to what it was before so that we can take our time
      and discuss this properly.
      
      Fixes: 36735a6a ("mqueue: switch to on-demand creation of internal mount")
      Reported-by: NFelix Abecassis <fabecassis@nvidia.com>
      Reported-by: NAleksa Sarai <asarai@suse.de>
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      cfb2f6f6
  5. 12 2月, 2018 1 次提交
    • L
      vfs: do bulk POLL* -> EPOLL* replacement · a9a08845
      Linus Torvalds 提交于
      This is the mindless scripted replacement of kernel use of POLL*
      variables as described by Al, done by this script:
      
          for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do
              L=`git grep -l -w POLL$V | grep -v '^t' | grep -v /um/ | grep -v '^sa' | grep -v '/poll.h$'|grep -v '^D'`
              for f in $L; do sed -i "-es/^\([^\"]*\)\(\<POLL$V\>\)/\\1E\\2/" $f; done
          done
      
      with de-mangling cleanups yet to come.
      
      NOTE! On almost all architectures, the EPOLL* constants have the same
      values as the POLL* constants do.  But they keyword here is "almost".
      For various bad reasons they aren't the same, and epoll() doesn't
      actually work quite correctly in some cases due to this on Sparc et al.
      
      The next patch from Al will sort out the final differences, and we
      should be all done.
      Scripted-by: NAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a9a08845
  6. 07 2月, 2018 1 次提交
  7. 13 1月, 2018 1 次提交
    • E
      signal: Ensure generic siginfos the kernel sends have all bits initialized · faf1f22b
      Eric W. Biederman 提交于
      Call clear_siginfo to ensure stack allocated siginfos are fully
      initialized before being passed to the signal sending functions.
      
      This ensures that if there is the kind of confusion documented by
      TRAP_FIXME, FPE_FIXME, or BUS_FIXME the kernel won't send unitialized
      data to userspace when the kernel generates a signal with SI_USER but
      the copy to userspace assumes it is a different kind of signal, and
      different fields are initialized.
      
      This also prepares the way for turning copy_siginfo_to_user
      into a copy_to_user, by removing the need in many cases to perform
      a field by field copy simply to skip the uninitialized fields.
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      faf1f22b
  8. 06 1月, 2018 7 次提交
  9. 28 11月, 2017 2 次提交
    • A
      ipc, kernel, mm: annotate ->poll() instances · 9dd95748
      Al Viro 提交于
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      9dd95748
    • L
      Rename superblock flags (MS_xyz -> SB_xyz) · 1751e8a6
      Linus Torvalds 提交于
      This is a pure automated search-and-replace of the internal kernel
      superblock flags.
      
      The s_flags are now called SB_*, with the names and the values for the
      moment mirroring the MS_* flags that they're equivalent to.
      
      Note how the MS_xyz flags are the ones passed to the mount system call,
      while the SB_xyz flags are what we then use in sb->s_flags.
      
      The script to do this was:
      
          # places to look in; re security/*: it generally should *not* be
          # touched (that stuff parses mount(2) arguments directly), but
          # there are two places where we really deal with superblock flags.
          FILES="drivers/mtd drivers/staging/lustre fs ipc mm \
                  include/linux/fs.h include/uapi/linux/bfs_fs.h \
                  security/apparmor/apparmorfs.c security/apparmor/include/lib.h"
          # the list of MS_... constants
          SYMS="RDONLY NOSUID NODEV NOEXEC SYNCHRONOUS REMOUNT MANDLOCK \
                DIRSYNC NOATIME NODIRATIME BIND MOVE REC VERBOSE SILENT \
                POSIXACL UNBINDABLE PRIVATE SLAVE SHARED RELATIME KERNMOUNT \
                I_VERSION STRICTATIME LAZYTIME SUBMOUNT NOREMOTELOCK NOSEC BORN \
                ACTIVE NOUSER"
      
          SED_PROG=
          for i in $SYMS; do SED_PROG="$SED_PROG -e s/MS_$i/SB_$i/g"; done
      
          # we want files that contain at least one of MS_...,
          # with fs/namespace.c and fs/pnode.c excluded.
          L=$(for i in $SYMS; do git grep -w -l MS_$i $FILES; done| sort|uniq|grep -v '^fs/namespace.c'|grep -v '^fs/pnode.c')
      
          for f in $L; do sed -i $f $SED_PROG; done
      Requested-by: NAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1751e8a6
  10. 04 9月, 2017 1 次提交
  11. 10 7月, 2017 1 次提交
    • C
      mqueue: fix a use-after-free in sys_mq_notify() · f991af3d
      Cong Wang 提交于
      The retry logic for netlink_attachskb() inside sys_mq_notify()
      is nasty and vulnerable:
      
      1) The sock refcnt is already released when retry is needed
      2) The fd is controllable by user-space because we already
         release the file refcnt
      
      so we when retry but the fd has been just closed by user-space
      during this small window, we end up calling netlink_detachskb()
      on the error path which releases the sock again, later when
      the user-space closes this socket a use-after-free could be
      triggered.
      
      Setting 'sock' to NULL here should be sufficient to fix it.
      Reported-by: NGeneBlue <geneblue.mail@gmail.com>
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Cc: stable@kernel.org
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f991af3d
  12. 05 7月, 2017 1 次提交
  13. 02 3月, 2017 3 次提交
  14. 28 2月, 2017 1 次提交
  15. 21 11月, 2016 1 次提交
  16. 28 9月, 2016 1 次提交
  17. 24 6月, 2016 3 次提交
    • E
      vfs: Generalize filesystem nodev handling. · a2982cc9
      Eric W. Biederman 提交于
      Introduce a function may_open_dev that tests MNT_NODEV and a new
      superblock flab SB_I_NODEV.  Use this new function in all of the
      places where MNT_NODEV was previously tested.
      
      Add the new SB_I_NODEV s_iflag to proc, sysfs, and mqueuefs as those
      filesystems should never support device nodes, and a simple superblock
      flags makes that very hard to get wrong.  With SB_I_NODEV set if any
      device nodes somehow manage to show up on on a filesystem those
      device nodes will be unopenable.
      Acked-by: NSeth Forshee <seth.forshee@canonical.com>
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      a2982cc9
    • E
      ipc/mqueue: The mqueue filesystem should never contain executables · 3ee69014
      Eric W. Biederman 提交于
      Set SB_I_NOEXEC on mqueuefs to ensure small implementation mistakes
      do not result in executable on mqueuefs by accident.
      Acked-by: NSeth Forshee <seth.forshee@canonical.com>
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      3ee69014
    • E
      vfs: Pass data, ns, and ns->userns to mount_ns · d91ee87d
      Eric W. Biederman 提交于
      Today what is normally called data (the mount options) is not passed
      to fill_super through mount_ns.
      
      Pass the mount options and the namespace separately to mount_ns so
      that filesystems such as proc that have mount options, can use
      mount_ns.
      
      Pass the user namespace to mount_ns so that the standard permission
      check that verifies the mounter has permissions over the namespace can
      be performed in mount_ns instead of in each filesystems .mount method.
      Thus removing the duplication between mqueuefs and proc in terms of
      permission checks.  The extra permission check does not currently
      affect the rpc_pipefs filesystem and the nfsd filesystem as those
      filesystems do not currently allow unprivileged mounts.  Without
      unpvileged mounts it is guaranteed that the caller has already passed
      capable(CAP_SYS_ADMIN) which guarantees extra permission check will
      pass.
      
      Update rpc_pipefs and the nfsd filesystem to ensure that the network
      namespace reference is always taken in fill_super and always put in kill_sb
      so that the logic is simpler and so that errors originating inside of
      fill_super do not cause a network namespace leak.
      Acked-by: NSeth Forshee <seth.forshee@canonical.com>
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      d91ee87d
  18. 05 4月, 2016 1 次提交
    • K
      mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros · 09cbfeaf
      Kirill A. Shutemov 提交于
      PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
      ago with promise that one day it will be possible to implement page
      cache with bigger chunks than PAGE_SIZE.
      
      This promise never materialized.  And unlikely will.
      
      We have many places where PAGE_CACHE_SIZE assumed to be equal to
      PAGE_SIZE.  And it's constant source of confusion on whether
      PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
      especially on the border between fs and mm.
      
      Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
      breakage to be doable.
      
      Let's stop pretending that pages in page cache are special.  They are
      not.
      
      The changes are pretty straight-forward:
      
       - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
      
       - page_cache_get() -> get_page();
      
       - page_cache_release() -> put_page();
      
      This patch contains automated changes generated with coccinelle using
      script below.  For some reason, coccinelle doesn't patch header files.
      I've called spatch for them manually.
      
      The only adjustment after coccinelle is revert of changes to
      PAGE_CAHCE_ALIGN definition: we are going to drop it later.
      
      There are few places in the code where coccinelle didn't reach.  I'll
      fix them manually in a separate patch.  Comments and documentation also
      will be addressed with the separate patch.
      
      virtual patch
      
      @@
      expression E;
      @@
      - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      expression E;
      @@
      - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      @@
      - PAGE_CACHE_SHIFT
      + PAGE_SHIFT
      
      @@
      @@
      - PAGE_CACHE_SIZE
      + PAGE_SIZE
      
      @@
      @@
      - PAGE_CACHE_MASK
      + PAGE_MASK
      
      @@
      expression E;
      @@
      - PAGE_CACHE_ALIGN(E)
      + PAGE_ALIGN(E)
      
      @@
      expression E;
      @@
      - page_cache_get(E)
      + get_page(E)
      
      @@
      expression E;
      @@
      - page_cache_release(E)
      + put_page(E)
      Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      09cbfeaf
  19. 23 1月, 2016 1 次提交
    • A
      wrappers for ->i_mutex access · 5955102c
      Al Viro 提交于
      parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
      inode_foo(inode) being mutex_foo(&inode->i_mutex).
      
      Please, use those for access to ->i_mutex; over the coming cycle
      ->i_mutex will become rwsem, with ->lookup() done with it held
      only shared.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      5955102c
  20. 15 1月, 2016 1 次提交
    • V
      kmemcg: account certain kmem allocations to memcg · 5d097056
      Vladimir Davydov 提交于
      Mark those kmem allocations that are known to be easily triggered from
      userspace as __GFP_ACCOUNT/SLAB_ACCOUNT, which makes them accounted to
      memcg.  For the list, see below:
      
       - threadinfo
       - task_struct
       - task_delay_info
       - pid
       - cred
       - mm_struct
       - vm_area_struct and vm_region (nommu)
       - anon_vma and anon_vma_chain
       - signal_struct
       - sighand_struct
       - fs_struct
       - files_struct
       - fdtable and fdtable->full_fds_bits
       - dentry and external_name
       - inode for all filesystems. This is the most tedious part, because
         most filesystems overwrite the alloc_inode method.
      
      The list is far from complete, so feel free to add more objects.
      Nevertheless, it should be close to "account everything" approach and
      keep most workloads within bounds.  Malevolent users will be able to
      breach the limit, but this was possible even with the former "account
      everything" approach (simply because it did not account everything in
      fact).
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: NVladimir Davydov <vdavydov@virtuozzo.com>
      Acked-by: NJohannes Weiner <hannes@cmpxchg.org>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5d097056
  21. 07 8月, 2015 1 次提交
    • M
      ipc: modify message queue accounting to not take kernel data structures into account · de54b9ac
      Marcus Gelderie 提交于
      A while back, the message queue implementation in the kernel was
      improved to use btrees to speed up retrieval of messages, in commit
      d6629859 ("ipc/mqueue: improve performance of send/recv").
      
      That patch introducing the improved kernel handling of message queues
      (using btrees) has, as a by-product, changed the meaning of the QSIZE
      field in the pseudo-file created for the queue.  Before, this field
      reflected the size of the user-data in the queue.  Since, it also takes
      kernel data structures into account.  For example, if 13 bytes of user
      data are in the queue, on my machine the file reports a size of 61
      bytes.
      
      There was some discussion on this topic before (for example
      https://lkml.org/lkml/2014/10/1/115).  Commenting on a th lkml, Michael
      Kerrisk gave the following background
      (https://lkml.org/lkml/2015/6/16/74):
      
          The pseudofiles in the mqueue filesystem (usually mounted at
          /dev/mqueue) expose fields with metadata describing a message
          queue. One of these fields, QSIZE, as originally implemented,
          showed the total number of bytes of user data in all messages in
          the message queue, and this feature was documented from the
          beginning in the mq_overview(7) page. In 3.5, some other (useful)
          work happened to break the user-space API in a couple of places,
          including the value exposed via QSIZE, which now includes a measure
          of kernel overhead bytes for the queue, a figure that renders QSIZE
          useless for its original purpose, since there's no way to deduce
          the number of overhead bytes consumed by the implementation.
          (The other user-space breakage was subsequently fixed.)
      
      This patch removes the accounting of kernel data structures in the
      queue.  Reporting the size of these data-structures in the QSIZE field
      was a breaking change (see Michael's comment above).  Without the QSIZE
      field reporting the total size of user-data in the queue, there is no
      way to deduce this number.
      
      It should be noted that the resource limit RLIMIT_MSGQUEUE is counted
      against the worst-case size of the queue (in both the old and the new
      implementation).  Therefore, the kernel overhead accounting in QSIZE is
      not necessary to help the user understand the limitations RLIMIT imposes
      on the processes.
      Signed-off-by: NMarcus Gelderie <redmnic@gmail.com>
      Acked-by: NDoug Ledford <dledford@redhat.com>
      Acked-by: NMichael Kerrisk <mtk.manpages@gmail.com>
      Acked-by: NDavidlohr Bueso <dbueso@suse.de>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: John Duffy <jb_duffy@btinternet.com>
      Cc: Arto Bendiken <arto@bendiken.net>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      de54b9ac
  22. 08 5月, 2015 1 次提交
    • D
      ipc/mqueue: Implement lockless pipelined wakeups · fa6004ad
      Davidlohr Bueso 提交于
      This patch moves the wakeup_process() invocation so it is not done under
      the info->lock by making use of a lockless wake_q. With this change, the
      waiter is woken up once it is STATE_READY and it does not need to loop
      on SMP if it is still in STATE_PENDING. In the timeout case we still need
      to grab the info->lock to verify the state.
      
      This change should also avoid the introduction of preempt_disable() in -rt
      which avoids a busy-loop which pools for the STATE_PENDING -> STATE_READY
      change if the waiter has a higher priority compared to the waker.
      
      Additionally, this patch micro-optimizes wq_sleep by using the cheaper
      cousin of set_current_state(TASK_INTERRUPTABLE) as we will block no
      matter what, thus get rid of the implied barrier.
      Signed-off-by: NDavidlohr Bueso <dbueso@suse.de>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: NGeorge Spelvin <linux@horizon.com>
      Acked-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Chris Mason <clm@fb.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: dave@stgolabs.net
      Link: http://lkml.kernel.org/r/1430748166.1940.17.camel@stgolabs.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
      fa6004ad
  23. 16 4月, 2015 1 次提交
  24. 20 11月, 2014 1 次提交
    • A
      new helper: audit_file() · 9f45f5bf
      Al Viro 提交于
      ... for situations when we don't have any candidate in pathnames - basically,
      in descriptor-based syscalls.
      
      [Folded the build fix for !CONFIG_AUDITSYSCALL configs from Chen Gang]
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      9f45f5bf
  25. 08 4月, 2014 1 次提交
  26. 26 2月, 2014 1 次提交
    • D
      ipc,mqueue: remove limits for the amount of system-wide queues · f3713fd9
      Davidlohr Bueso 提交于
      Commit 93e6f119 ("ipc/mqueue: cleanup definition names and
      locations") added global hardcoded limits to the amount of message
      queues that can be created.  While these limits are per-namespace,
      reality is that it ends up breaking userspace applications.
      Historically users have, at least in theory, been able to create up to
      INT_MAX queues, and limiting it to just 1024 is way too low and dramatic
      for some workloads and use cases.  For instance, Madars reports:
      
       "This update imposes bad limits on our multi-process application.  As
        our app uses approaches that each process opens its own set of queues
        (usually something about 3-5 queues per process).  In some scenarios
        we might run up to 3000 processes or more (which of-course for linux
        is not a problem).  Thus we might need up to 9000 queues or more.  All
        processes run under one user."
      
      Other affected users can be found in launchpad bug #1155695:
        https://bugs.launchpad.net/ubuntu/+source/manpages/+bug/1155695
      
      Instead of increasing this limit, revert it entirely and fallback to the
      original way of dealing queue limits -- where once a user's resource
      limit is reached, and all memory is used, new queues cannot be created.
      Signed-off-by: NDavidlohr Bueso <davidlohr@hp.com>
      Reported-by: NMadars Vitolins <m@silodev.com>
      Acked-by: NDoug Ledford <dledford@redhat.com>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Cc: <stable@vger.kernel.org>	[3.5+]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f3713fd9
  27. 28 1月, 2014 2 次提交