1. 03 Nov, 2011 3 commits
  2. 01 Nov, 2011 1 commit
  3. 05 Aug, 2011 1 commit
  4. 04 Aug, 2011 2 commits
  5. 31 Jul, 2011 2 commits
  6. 27 Jul, 2011 3 commits
    • ipc: introduce shm_rmid_forced sysctl · b34a6b1d
      Vasiliy Kulikov committed
      Add support for the shm_rmid_forced sysctl.  If set to 1, all shared
      memory objects in current ipc namespace will be automatically forced to
      use IPC_RMID.
      
      The POSIX way of handling shmem allows one to create shm objects and
      call shmdt(), leaving shm object associated with no process, thus
      consuming memory not counted via rlimits.
      
      With shm_rmid_forced=1 the shared memory object is counted at least for
      one process, so OOM killer may effectively kill the fat process holding
      the shared memory.
      
      It obviously breaks POSIX - some programs relying on the feature would
      stop working.  So set shm_rmid_forced=1 only if you're sure nobody uses
      "orphaned" memory.  Use shm_rmid_forced=0 by default for compatibility
      reasons.
      
      The feature was previously implemented in -ow as a configure option.
      
      [akpm@linux-foundation.org: fix documentation, per Randy]
      [akpm@linux-foundation.org: fix warning]
      [akpm@linux-foundation.org: readability/conventionality tweaks]
      [akpm@linux-foundation.org: fix shm_rmid_forced/shm_forced_rmid confusion, use standard comment layout]
      Signed-off-by: Vasiliy Kulikov <segoon@openwall.com>
      Cc: Randy Dunlap <rdunlap@xenotime.net>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: "Serge E. Hallyn" <serge.hallyn@canonical.com>
      Cc: Daniel Lezcano <daniel.lezcano@free.fr>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Cc: Solar Designer <solar@openwall.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
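The "orphaned" segment described in this commit can be probed from userspace. The following is a minimal sketch, not code from the patch: the function name `orphan_survives_detach` is hypothetical, and it assumes the caller is allowed to create SysV shared memory. It creates a private segment, attaches and detaches it, and then asks whether the segment still exists.

```c
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/*
 * Hypothetical probe (not part of the kernel patch).
 * Returns 1 if the segment survives its last detach (expected with
 * shm_rmid_forced=0), 0 if it is already gone (expected with
 * shm_rmid_forced=1), and -1 if the probe itself could not run.
 */
int orphan_survives_detach(void)
{
    struct shmid_ds ds;
    int id = shmget(IPC_PRIVATE, 4096, IPC_CREAT | 0600);
    if (id < 0)
        return -1;

    void *addr = shmat(id, NULL, 0);
    if (addr == (void *)-1) {
        shmctl(id, IPC_RMID, NULL);
        return -1;
    }
    if (shmdt(addr) < 0) {
        shmctl(id, IPC_RMID, NULL);
        return -1;
    }

    /* No process is attached now; see whether the id is still valid. */
    if (shmctl(id, IPC_STAT, &ds) < 0)
        return 0;               /* segment was destroyed on last detach */

    shmctl(id, IPC_RMID, NULL); /* remove the orphan explicitly */
    return 1;
}
```

On a kernel with the sysctl left at its default of 0, the probe would be expected to return 1, which is exactly the unaccounted memory the commit message warns about.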
    • ipc/mqueue.c: fix mq_open() return value · d40dcdb0
      Jiri Slaby committed
      We return ENOMEM from mqueue_get_inode even when we have enough memory.
      Namely in case the system rlimit of mqueue was reached.  This error
      propagates to mq_open() and the user sees the error unexpectedly.  So fix
      this up to properly return EMFILE as described in the manpage:
      
      	EMFILE The process already has the maximum number of files and
      	       message queues open.
      
      instead of:
      
      	ENOMEM Insufficient memory.
      
      With the previous patch we just switch to ERR_PTR/PTR_ERR/IS_ERR error
      handling here.
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
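The ERR_PTR/PTR_ERR/IS_ERR convention mentioned in this commit lets a pointer-returning function report which error occurred, instead of collapsing every failure into NULL (and hence ENOMEM). Below is a userspace sketch of the idea. The three helpers imitate the kernel helpers of the same name; `fake_inode`, `get_inode_model` and `open_model` are hypothetical names invented for illustration and are not the mqueue code.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Userspace stand-ins for the kernel helpers of the same name. */
static inline void *ERR_PTR(long error) { return (void *)(intptr_t)error; }
static inline long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }
static inline int IS_ERR(const void *ptr)
{
    /* Error codes live in the top MAX_ERRNO values of the address space. */
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct fake_inode { int ino; };          /* hypothetical placeholder type */
static struct fake_inode the_inode = { 1 };

/* Hypothetical allocator: distinguishes "no memory" from "limit reached". */
struct fake_inode *get_inode_model(int alloc_fails, int over_rlimit)
{
    if (alloc_fails)
        return ERR_PTR(-ENOMEM);
    if (over_rlimit)
        return ERR_PTR(-EMFILE);
    return &the_inode;
}

/* Caller forwards the encoded errno instead of assuming ENOMEM on failure. */
int open_model(int alloc_fails, int over_rlimit)
{
    struct fake_inode *inode = get_inode_model(alloc_fails, over_rlimit);

    if (IS_ERR(inode))
        return (int)PTR_ERR(inode);
    return 0;
}
```

In this model, hitting the limit yields -EMFILE at the caller, matching the manpage text quoted above, while a genuine allocation failure still yields -ENOMEM.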
    • ipc/mqueue.c: refactor failure handling · 04715206
      Jiri Slaby committed
      If new_inode fails to allocate an inode we need only to return with
      NULL.  But now we test the opposite and have all the work in a nested
      block.  So do the opposite to save one indentation level (and remove
      unnecessary line breaks).
      
      This is only a preparation/cleanup for the next patch where we fix up
      return values from mqueue_get_inode.
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  7. 26 Jul, 2011 1 commit
  8. 21 Jul, 2011 3 commits
  9. 27 May, 2011 1 commit
  10. 11 May, 2011 1 commit
  11. 31 Mar, 2011 1 commit
  12. 28 Mar, 2011 1 commit
  13. 26 Mar, 2011 1 commit
  14. 24 Mar, 2011 2 commits
  15. 07 Jan, 2011 1 commit
    • fs: icache RCU free inodes · fa0d7e3d
      Nick Piggin committed
      RCU free the struct inode. This will allow:
      
      - Subsequent store-free path walking patch. The inode must be consulted for
        permissions when walking, so an RCU inode reference is a must.
      - sb_inode_list_lock to be moved inside i_lock because sb list walkers who want
        to take i_lock no longer need to take sb_inode_list_lock to walk the list in
        the first place. This will simplify and optimize locking.
      - Could remove some nested trylock loops in dcache code
      - Could potentially simplify things a bit in VM land. Do not need to take the
        page lock to follow page->mapping.
      
      The downside of this is the performance cost of using RCU. In a simple
      creat/unlink microbenchmark, performance drops by about 10% due to inability to
      reuse cache-hot slab objects. As iterations increase and RCU freeing starts
      kicking over, this increases to about 20%.
      
      In cases where inode lifetimes are longer (ie. many inodes may be allocated
      during the average life span of a single inode), a lot of this cache reuse is
      not applicable, so the regression caused by this patch is smaller.
      
      The cache-hot regression could largely be avoided by using SLAB_DESTROY_BY_RCU,
      however this adds some complexity to list walking and store-free path walking,
      so I prefer to implement this at a later date, if it is shown to be a win in
      real situations. I haven't found a regression in any non-micro benchmark so I
      doubt it will be a problem.
      Signed-off-by: Nick Piggin <npiggin@kernel.dk>
  16. 30 Oct, 2010 1 commit
  17. 29 Oct, 2010 1 commit
  18. 28 Oct, 2010 2 commits
    • ipc: initialize structure memory to zero for compat functions · 03145beb
      Dan Rosenberg committed
      This takes care of leaking uninitialized kernel stack memory to
      userspace from non-zeroed fields in structs in compat ipc functions.
      Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ipc/shm.c: add RSS and swap size information to /proc/sysvipc/shm · b7952180
      Helge Deller committed
      The kernel currently provides no functionality to analyze the RSS and swap
      space usage of each individual sysvipc shared memory segment.
      
      This patch adds this info for each existing shm segment by extending the
      output of /proc/sysvipc/shm by two columns for RSS and swap.
      
      Since shmctl(SHM_INFO) already provides a similar calculation (it
      currently sums up all RSS/swap info for all segments), I did split out a
      static function which is now used by the /proc/sysvipc/shm output and
      shmctl(SHM_INFO).
      
      SAP products (esp.  the SAP Netweaver ABAP Kernel) use lots of big shared
      memory segments (we often have Linux systems with >= 16GB shm usage).
      Sometimes we get customer reports about "slow" system responses and while
      looking into their configurations we often find massive swapping activity
      on the system.  With this patch it's now easy to see from the command line
      if and which shm segments get swapped out (and how much) and can more
      easily give recommendations for system tuning.  Without the patch it's
      currently not possible to do such shm analysis at all.
      
      Also...
      
      Add some spaces in front of the "size" field for 64bit kernels to get the
      columns correct if you cat the contents of the file.  In
      sysvipc_shm_proc_show() the kernel prints the size value in "SIZE_SPEC"
      format, which is defined like this:
      
      #if BITS_PER_LONG <= 32
      #define SIZE_SPEC "%10lu"
      #else
      #define SIZE_SPEC "%21lu"
      #endif
      
      So, if the header is not adjusted, the columns are not correctly aligned.
      I actually tested this on 32- and 64-bit and it seems correct now.
      Signed-off-by: Helge Deller <deller@gmx.de>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Acked-by: Hugh Dickins <hughd@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
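The column alignment issue from this commit can be reproduced with plain printf-style formatting. The sketch below is an illustration rather than the kernel code: it keys the width off `ULONG_MAX` because the kernel's `BITS_PER_LONG` is not available in userspace, and the names `SIZE_WIDTH`, `format_size_header` and `format_size_value` are made up for this example.

```c
#include <limits.h>
#include <stdio.h>
#include <string.h>

/* Same idea as the SIZE_SPEC quoted in the commit, keyed off sizeof(long). */
#if ULONG_MAX <= 0xFFFFFFFFUL
#define SIZE_SPEC  "%10lu"
#define SIZE_WIDTH 10
#else
#define SIZE_SPEC  "%21lu"
#define SIZE_WIDTH 21
#endif

/* Header cell, right-aligned to the width the value column will use. */
int format_size_header(char *buf, size_t len)
{
    return snprintf(buf, len, "%*s", SIZE_WIDTH, "size");
}

/* Value cell, printed with the width-dependent format. */
int format_size_value(char *buf, size_t len, unsigned long size)
{
    return snprintf(buf, len, SIZE_SPEC, size);
}
```

Because both cells use the same width, the header and the values line up whether `long` is 32 or 64 bits wide. A header with a fixed 10-character field would drift on 64-bit builds, which is the misalignment the commit corrects.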
  19. 26 Oct, 2010 2 commits
  20. 15 Oct, 2010 1 commit
    • llseek: automatically add .llseek fop · 6038f373
      Arnd Bergmann committed
      All file_operations should get a .llseek operation so we can make
      nonseekable_open the default for future file operations without a
      .llseek pointer.
      
      The three cases that we can automatically detect are no_llseek, seq_lseek
      and default_llseek. For cases where we can automatically prove that
      the file offset is always ignored, we use noop_llseek, which maintains
      the current behavior of not returning an error from a seek.
      
      New drivers should normally not use noop_llseek but instead use no_llseek
      and call nonseekable_open at open time.  Existing drivers can be converted
      to do the same when the maintainer knows for certain that no user code
      relies on calling seek on the device file.
      
      The generated code is often incorrectly indented and right now contains
      comments that clarify for each added line why a specific variant was
      chosen. In the version that gets submitted upstream, the comments will
      be gone and I will manually fix the indentation, because there does not
      seem to be a way to do that using coccinelle.
      
      Some amount of new code is currently sitting in linux-next that should get
      the same modifications, which I will do at the end of the merge window.
      
      Many thanks to Julia Lawall for helping me learn to write a semantic
      patch that does all this.
      
      ===== begin semantic patch =====
      // This adds an llseek= method to all file operations,
      // as a preparation for making no_llseek the default.
      //
      // The rules are
      // - use no_llseek explicitly if we do nonseekable_open
      // - use seq_lseek for sequential files
      // - use default_llseek if we know we access f_pos
      // - use noop_llseek if we know we don't access f_pos,
      //   but we still want to allow users to call lseek
      //
      @ open1 exists @
      identifier nested_open;
      @@
      nested_open(...)
      {
      <+...
      nonseekable_open(...)
      ...+>
      }
      
      @ open exists@
      identifier open_f;
      identifier i, f;
      identifier open1.nested_open;
      @@
      int open_f(struct inode *i, struct file *f)
      {
      <+...
      (
      nonseekable_open(...)
      |
      nested_open(...)
      )
      ...+>
      }
      
      @ read disable optional_qualifier exists @
      identifier read_f;
      identifier f, p, s, off;
      type ssize_t, size_t, loff_t;
      expression E;
      identifier func;
      @@
      ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
      {
      <+...
      (
         *off = E
      |
         *off += E
      |
         func(..., off, ...)
      |
         E = *off
      )
      ...+>
      }
      
      @ read_no_fpos disable optional_qualifier exists @
      identifier read_f;
      identifier f, p, s, off;
      type ssize_t, size_t, loff_t;
      @@
      ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
      {
      ... when != off
      }
      
      @ write @
      identifier write_f;
      identifier f, p, s, off;
      type ssize_t, size_t, loff_t;
      expression E;
      identifier func;
      @@
      ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
      {
      <+...
      (
        *off = E
      |
        *off += E
      |
        func(..., off, ...)
      |
        E = *off
      )
      ...+>
      }
      
      @ write_no_fpos @
      identifier write_f;
      identifier f, p, s, off;
      type ssize_t, size_t, loff_t;
      @@
      ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
      {
      ... when != off
      }
      
      @ fops0 @
      identifier fops;
      @@
      struct file_operations fops = {
       ...
      };
      
      @ has_llseek depends on fops0 @
      identifier fops0.fops;
      identifier llseek_f;
      @@
      struct file_operations fops = {
      ...
       .llseek = llseek_f,
      ...
      };
      
      @ has_read depends on fops0 @
      identifier fops0.fops;
      identifier read_f;
      @@
      struct file_operations fops = {
      ...
       .read = read_f,
      ...
      };
      
      @ has_write depends on fops0 @
      identifier fops0.fops;
      identifier write_f;
      @@
      struct file_operations fops = {
      ...
       .write = write_f,
      ...
      };
      
      @ has_open depends on fops0 @
      identifier fops0.fops;
      identifier open_f;
      @@
      struct file_operations fops = {
      ...
       .open = open_f,
      ...
      };
      
      // use no_llseek if we call nonseekable_open
      ////////////////////////////////////////////
      @ nonseekable1 depends on !has_llseek && has_open @
      identifier fops0.fops;
      identifier nso ~= "nonseekable_open";
      @@
      struct file_operations fops = {
      ...  .open = nso, ...
      +.llseek = no_llseek, /* nonseekable */
      };
      
      @ nonseekable2 depends on !has_llseek @
      identifier fops0.fops;
      identifier open.open_f;
      @@
      struct file_operations fops = {
      ...  .open = open_f, ...
      +.llseek = no_llseek, /* open uses nonseekable */
      };
      
      // use seq_lseek for sequential files
      /////////////////////////////////////
      @ seq depends on !has_llseek @
      identifier fops0.fops;
      identifier sr ~= "seq_read";
      @@
      struct file_operations fops = {
      ...  .read = sr, ...
      +.llseek = seq_lseek, /* we have seq_read */
      };
      
      // use default_llseek if there is a readdir
      ///////////////////////////////////////////
      @ fops1 depends on !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
      identifier fops0.fops;
      identifier readdir_e;
      @@
      // any other fop is used that changes pos
      struct file_operations fops = {
      ... .readdir = readdir_e, ...
      +.llseek = default_llseek, /* readdir is present */
      };
      
      // use default_llseek if at least one of read/write touches f_pos
      /////////////////////////////////////////////////////////////////
      @ fops2 depends on !fops1 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
      identifier fops0.fops;
      identifier read.read_f;
      @@
      // read fops use offset
      struct file_operations fops = {
      ... .read = read_f, ...
      +.llseek = default_llseek, /* read accesses f_pos */
      };
      
      @ fops3 depends on !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
      identifier fops0.fops;
      identifier write.write_f;
      @@
      // write fops use offset
      struct file_operations fops = {
      ... .write = write_f, ...
      +	.llseek = default_llseek, /* write accesses f_pos */
      };
      
      // Use noop_llseek if neither read nor write accesses f_pos
      ///////////////////////////////////////////////////////////
      
      @ fops4 depends on !fops1 && !fops2 && !fops3 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
      identifier fops0.fops;
      identifier read_no_fpos.read_f;
      identifier write_no_fpos.write_f;
      @@
      // write fops use offset
      struct file_operations fops = {
      ...
       .write = write_f,
       .read = read_f,
      ...
      +.llseek = noop_llseek, /* read and write both use no f_pos */
      };
      
      @ depends on has_write && !has_read && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
      identifier fops0.fops;
      identifier write_no_fpos.write_f;
      @@
      struct file_operations fops = {
      ... .write = write_f, ...
      +.llseek = noop_llseek, /* write uses no f_pos */
      };
      
      @ depends on has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
      identifier fops0.fops;
      identifier read_no_fpos.read_f;
      @@
      struct file_operations fops = {
      ... .read = read_f, ...
      +.llseek = noop_llseek, /* read uses no f_pos */
      };
      
      @ depends on !has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
      identifier fops0.fops;
      @@
      struct file_operations fops = {
      ...
      +.llseek = noop_llseek, /* no read or write fn */
      };
      ===== End semantic patch =====
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Cc: Julia Lawall <julia@diku.dk>
      Cc: Christoph Hellwig <hch@infradead.org>
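The selection rules listed at the top of the semantic patch can be read as a small decision procedure. The following C function is a simplified model of that rule order only. It is not kernel code, every name in it is hypothetical, and it reduces each Coccinelle match to a boolean, so it ignores the cases where the patch cannot prove anything about f_pos and therefore adds nothing.

```c
#include <stdbool.h>

enum llseek_choice {
    KEEP_EXISTING,       /* .llseek already set, leave it alone */
    USE_NO_LLSEEK,
    USE_SEQ_LSEEK,
    USE_DEFAULT_LLSEEK,
    USE_NOOP_LLSEEK
};

/* Hypothetical summary of what was detected for one file_operations. */
struct fops_traits {
    bool has_llseek;           /* .llseek is already present */
    bool open_is_nonseekable;  /* open calls nonseekable_open(), maybe nested */
    bool read_is_seq_read;     /* .read = seq_read */
    bool has_readdir;          /* .readdir is present */
    bool read_touches_pos;     /* read handler reads or writes *off */
    bool write_touches_pos;    /* write handler reads or writes *off */
};

/* Apply the rules in the order the commit message lists them. */
enum llseek_choice pick_llseek(const struct fops_traits *t)
{
    if (t->has_llseek)
        return KEEP_EXISTING;
    if (t->open_is_nonseekable)
        return USE_NO_LLSEEK;
    if (t->read_is_seq_read)
        return USE_SEQ_LSEEK;
    if (t->has_readdir || t->read_touches_pos || t->write_touches_pos)
        return USE_DEFAULT_LLSEEK;
    return USE_NOOP_LLSEEK;    /* offset is never used, seeking stays harmless */
}
```

The last branch corresponds to the noop_llseek rules: a seek on such a file keeps succeeding as before, which is why the commit prefers it over no_llseek for existing drivers.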
  21. 02 Oct, 2010 1 commit
    • sys_semctl: fix kernel stack leakage · 982f7c2b
      Dan Rosenberg committed
      The semctl syscall has several code paths that lead to the leakage of
      uninitialized kernel stack memory (namely the IPC_INFO, SEM_INFO,
      IPC_STAT, and SEM_STAT commands) during the use of the older, obsolete
      version of the semid_ds struct.
      
      The copy_semid_to_user() function declares a semid_ds struct on the stack
      and copies it back to the user without initializing or zeroing the
      "sem_base", "sem_pending", "sem_pending_last", and "undo" pointers,
      allowing the leakage of 16 bytes of kernel stack memory.
      
      The code is still reachable on 32-bit systems - when calling semctl()
      newer glibc's automatically OR the IPC command with the IPC_64 flag, but
      invoking the syscall directly allows users to use the older versions of
      the struct.
      Signed-off-by: Dan Rosenberg <dan.j.rosenberg@gmail.com>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
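The class of bug described here (a stack struct copied out with some fields never written) has a common remedy: zero the whole struct before filling in the fields that matter. The sketch below models this in userspace. `struct legacy_semid` is a hypothetical stand-in rather than the real semid_ds layout, and `memcpy` stands in for copy_to_user().

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical old-style struct with internal pointers the kernel never fills. */
struct legacy_semid {
    int            perm_mode;
    long           otime;
    void          *base;
    void          *pending;
    void          *pending_last;
    void          *undo;
    unsigned short nsems;
};

/* Fill 'out' (standing in for the user buffer) without exposing stale bytes. */
void copy_legacy_semid(void *out, int mode, long otime, unsigned short nsems)
{
    struct legacy_semid tmp;

    /* Zero every byte first, so fields we never assign cannot leak. */
    memset(&tmp, 0, sizeof(tmp));
    tmp.perm_mode = mode;
    tmp.otime = otime;
    tmp.nsems = nsems;

    memcpy(out, &tmp, sizeof(tmp));  /* models copy_to_user() */
}
```

Without the `memset`, the four pointer fields would carry whatever was previously on the stack. On a 32-bit system that is four 4-byte pointers, the 16 bytes the commit message mentions.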
  22. 10 Aug, 2010 1 commit
  23. 21 Jul, 2010 1 commit
  24. 05 Jun, 2010 1 commit
  25. 28 May, 2010 5 commits
    • drop unused dentry argument to ->fsync · 7ea80859
      Christoph Hellwig committed
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • ipc/sem.c: use ERR_CAST · 4de85cd6
      Julia Lawall committed
      Use ERR_CAST(x) rather than ERR_PTR(PTR_ERR(x)).  The former makes more
      clear what is the purpose of the operation, which otherwise looks like a
      no-op.
      
      The semantic patch that makes this change is as follows:
      (http://coccinelle.lip6.fr/)
      
      // <smpl>
      @@
      type T;
      T x;
      identifier f;
      @@
      
      T f (...) { <+...
      - ERR_PTR(PTR_ERR(x))
      + x
       ...+> }
      
      @@
      expression x;
      @@
      
      - ERR_PTR(PTR_ERR(x))
      + ERR_CAST(x)
      // </smpl>
      Signed-off-by: Julia Lawall <julia@diku.dk>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
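To show what ERR_CAST expresses, here is a userspace sketch in which a lookup that returns one pointer type forwards an error obtained as another pointer type. The helpers imitate the kernel helpers of the same name. `perm_model`, `sem_array_model`, `lookup_perm` and `lookup_array` are hypothetical and only loosely mirror the shape of the ipc code.

```c
#include <errno.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Userspace stand-ins for the kernel helpers of the same name. */
static inline void *ERR_PTR(long error) { return (void *)(intptr_t)error; }
static inline long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }
static inline int IS_ERR(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}
/* ERR_CAST: pass an error pointer through under a different pointer type. */
static inline void *ERR_CAST(const void *ptr) { return (void *)(uintptr_t)ptr; }

struct perm_model { int id; };                        /* hypothetical */
struct sem_array_model { struct perm_model perm; };   /* perm is first member */

static struct sem_array_model the_array = { { 7 } };

/* Inner lookup: returns the embedded permission object or an error pointer. */
struct perm_model *lookup_perm(int id)
{
    if (id != the_array.perm.id)
        return ERR_PTR(-EINVAL);
    return &the_array.perm;
}

/* Outer lookup: a different return type, so the error must be re-typed. */
struct sem_array_model *lookup_array(int id)
{
    struct perm_model *p = lookup_perm(id);

    if (IS_ERR(p))
        return ERR_CAST(p);   /* clearer than ERR_PTR(PTR_ERR(p)) */
    return (struct sem_array_model *)p;
}
```

Both spellings produce the same pointer value. ERR_CAST simply states the intent (change the type, keep the error) rather than looking like a round trip that does nothing.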
    • ipc/sem.c: update description of the implementation · c5cf6359
      Manfred Spraul committed
      ipc/sem.c begins with a 15 year old description about bugs in the initial
      implementation in Linux-1.0.  The patch replaces that with a top level
      description of the current code.
      
      A TODO could be derived from this text:
      
      The opengroup man page for semop() does not mandate FIFO.  Thus there is
      no need for a semaphore array list of pending operations.
      
      If
      
      - this list is removed
      - the per-semaphore array spinlock is removed (possible if there is no
        list to protect)
      - sem_otime is moved into the semaphores and calculated on demand during
        semctl()
      
      then the array would be read-mostly - which would significantly improve
      scaling for applications that use semaphore arrays with lots of entries.
      
      The price would be expensive semctl() calls:
      
      	for(i=0;i<sma->sem_nsems;i++) spin_lock(sma->sem_lock);
      	<do stuff>
      	for(i=0;i<sma->sem_nsems;i++) spin_unlock(sma->sem_lock);
      
      I'm not sure if the complexity is worth the effort, thus here is the
      documentation of the current behavior first.
      Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Zach Brown <zach.brown@oracle.com>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: Nick Piggin <npiggin@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ipc/sem.c: move wake_up_process out of the spinlock section · 0a2b9d4c
      Manfred Spraul committed
      The wake-up part of semtimedop() consists of two steps:
      
      - the right tasks must be identified.
      - they must be woken up.
      
      Right now, both steps run while the array spinlock is held.  This patch
      reorders the code and moves the actual wake_up_process() behind the point
      where the spinlock is dropped.
      
      The code also moves setting sem->sem_otime to one place: It does not make
      sense to set the last modify time multiple times.
      
      [akpm@linux-foundation.org: repair kerneldoc]
      [akpm@linux-foundation.org: fix uninitialised retval]
      Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Zach Brown <zach.brown@oracle.com>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: Nick Piggin <npiggin@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
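The two-step structure described in this commit (identify under the lock, wake after dropping it) can be sketched in userspace with a mutex. This is a single-threaded model under loud assumptions: all names are hypothetical, setting `woken` stands in for wake_up_process(), and the model skips the handshake the real code needs so that a waiter cannot disappear between the unlock and the wake-up.

```c
#include <pthread.h>
#include <stddef.h>

#define MAX_WAITERS 8

struct waiter {
    int need;    /* amount this waiter wants to take from the counter */
    int woken;   /* set by the wake-up step */
    int queued;  /* non-zero while still waiting */
};

struct sem_model {
    pthread_mutex_t lock;
    int value;
    struct waiter waiters[MAX_WAITERS];
    size_t nwaiters;
};

/* Add 'delta', then wake every waiter that can now proceed. Returns the count. */
int release_and_wake(struct sem_model *s, int delta)
{
    struct waiter *ready[MAX_WAITERS];
    size_t nready = 0;

    /* Step 1: under the lock, only decide who may run. */
    pthread_mutex_lock(&s->lock);
    s->value += delta;
    for (size_t i = 0; i < s->nwaiters && i < MAX_WAITERS; i++) {
        struct waiter *w = &s->waiters[i];

        if (w->queued && w->need <= s->value) {
            s->value -= w->need;
            w->queued = 0;
            ready[nready++] = w;
        }
    }
    pthread_mutex_unlock(&s->lock);

    /* Step 2: the wake-ups happen with the lock already dropped. */
    for (size_t i = 0; i < nready; i++)
        ready[i]->woken = 1;

    return (int)nready;
}
```

Keeping the second loop outside the critical section shortens the time the lock is held, which is the point of the reordering described above.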
    • ipc/sem.c: optimize update_queue() for bulk wakeup calls · fd5db422
      Manfred Spraul committed
      The following series of patches tries to fix the spinlock contention
      reported by Chris Mason - his benchmark exposes problems of the current
      code:
      
      - In the worst case, the algorithm used by update_queue() is O(N^2).
        Bulk wake-up calls can enter this worst case.  The patch series fixes
        that.
      
        Note that the benchmark app doesn't expose the problem, it just should
        be fixed: Real world apps might do the wake-ups in another order than
        perfect FIFO.
      
      - The part of the code that runs within the semaphore array spinlock is
        significantly larger than necessary.
      
        The patch series fixes that.  This change is responsible for the main
        improvement.
      
      - The cacheline with the spinlock is also used for a variable that is
        read in the hot path (sem_base) and for a variable that is unnecessarily
        written to multiple times (sem_otime).  The last step of the series
        cacheline-aligns the spinlock.
      
      This patch:
      
      The SysV semaphore code allows performing multiple operations on all
      semaphores in the array as atomic operations.  After a modification,
      update_queue() checks which of the waiting tasks can complete.
      
      The algorithm that is used to identify the tasks is O(N^2) in the worst
      case.  For some cases, it is simple to avoid the O(N^2).
      
      The patch adds a detection logic for some cases, especially for the case
      of an array where all sleeping tasks are single sembuf operations and a
      multi-sembuf operation is used to wake up multiple tasks.
      
      A big database application uses that approach.
      
      The patch fixes wakeup due to semctl(,,SETALL,) - the initial version of
      the patch breaks that.
      
      [akpm@linux-foundation.org: make do_smart_update() static]
      Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Zach Brown <zach.brown@oracle.com>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: Nick Piggin <npiggin@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>