1. 12 9月, 2013 3 次提交
  2. 10 7月, 2013 4 次提交
  3. 10 5月, 2013 1 次提交
  4. 08 5月, 2013 1 次提交
  5. 01 5月, 2013 1 次提交
    • R
      ipc: sysv shared memory limited to 8TiB · d69f3bad
      Robin Holt 提交于
      Trying to run an application which was trying to put data into half of
      memory using shmget(), we found that having a shmall value below 8EiB-8TiB
      would prevent us from using anything more than 8TiB.  By setting
      kernel.shmall greater than 8EiB-8TiB would make the job work.
      
      In the newseg() function, ns->shm_tot which, at 8TiB is INT_MAX.
      
      ipc/shm.c:
       458 static int newseg(struct ipc_namespace *ns, struct ipc_params *params)
       459 {
      ...
       465         int numpages = (size + PAGE_SIZE -1) >> PAGE_SHIFT;
      ...
       474         if (ns->shm_tot + numpages > ns->shm_ctlall)
       475                 return -ENOSPC;
      
      [akpm@linux-foundation.org: make ipc/shm.c:newseg()'s numpages size_t, not int]
      Signed-off-by: NRobin Holt <holt@sgi.com>
      Reported-by: NAlex Thorlton <athorlton@sgi.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d69f3bad
  6. 24 2月, 2013 2 次提交
  7. 23 2月, 2013 2 次提交
  8. 12 12月, 2012 1 次提交
    • A
      mm: support more pagesizes for MAP_HUGETLB/SHM_HUGETLB · 42d7395f
      Andi Kleen 提交于
      There was some desire in large applications using MAP_HUGETLB or
      SHM_HUGETLB to use 1GB huge pages on some mappings, and stay with 2MB on
      others.  This is useful together with NUMA policy: use 2MB interleaving
      on some mappings, but 1GB on local mappings.
      
      This patch extends the IPC/SHM syscall interfaces slightly to allow
      specifying the page size.
      
      It borrows some upper bits in the existing flag arguments and allows
      encoding the log of the desired page size in addition to the *_HUGETLB
      flag.  When 0 is specified the default size is used, this makes the
      change fully compatible.
      
      Extending the internal hugetlb code to handle this is straight forward.
      Instead of a single mount it just keeps an array of them and selects the
      right mount based on the specified page size.  When no page size is
      specified it uses the mount of the default page size.
      
      The change is not visible in /proc/mounts because internal mounts don't
      appear there.  It also has very little overhead: the additional mounts
      just consume a super block, but not more memory when not used.
      
      I also exported the new flags to the user headers (they were previously
      under __KERNEL__).  Right now only symbols for x86 and some other
      architecture for 1GB and 2MB are defined.  The interface should already
      work for all other architectures though.  Only architectures that define
      multiple hugetlb sizes actually need it (that is currently x86, tile,
      powerpc).  However tile and powerpc have user configurable hugetlb
      sizes, so it's not easy to add defines.  A program on those
      architectures would need to query sysfs and use the appropiate log2.
      
      [akpm@linux-foundation.org: cleanups]
      [rientjes@google.com: fix build]
      [akpm@linux-foundation.org: checkpatch fixes]
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Acked-by: NRik van Riel <riel@redhat.com>
      Acked-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Hillf Danton <dhillf@gmail.com>
      Signed-off-by: NDavid Rientjes <rientjes@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      42d7395f
  9. 07 9月, 2012 1 次提交
  10. 31 7月, 2012 1 次提交
  11. 08 6月, 2012 1 次提交
  12. 01 6月, 2012 2 次提交
  13. 22 3月, 2012 1 次提交
  14. 24 1月, 2012 2 次提交
    • H
      SHM_UNLOCK: fix Unevictable pages stranded after swap · 24513264
      Hugh Dickins 提交于
      Commit cc39c6a9 ("mm: account skipped entries to avoid looping in
      find_get_pages") correctly fixed an infinite loop; but left a problem
      that find_get_pages() on shmem would return 0 (appearing to callers to
      mean end of tree) when it meets a run of nr_pages swap entries.
      
      The only uses of find_get_pages() on shmem are via pagevec_lookup(),
      called from invalidate_mapping_pages(), and from shmctl SHM_UNLOCK's
      scan_mapping_unevictable_pages().  The first is already commented, and
      not worth worrying about; but the second can leave pages on the
      Unevictable list after an unusual sequence of swapping and locking.
      
      Fix that by using shmem_find_get_pages_and_swap() (then ignoring the
      swap) instead of pagevec_lookup().
      
      But I don't want to contaminate vmscan.c with shmem internals, nor
      shmem.c with LRU locking.  So move scan_mapping_unevictable_pages() into
      shmem.c, renaming it shmem_unlock_mapping(); and rename
      check_move_unevictable_page() to check_move_unevictable_pages(), looping
      down an array of pages, oftentimes under the same lock.
      
      Leave out the "rotate unevictable list" block: that's a leftover from
      when this was used for /proc/sys/vm/scan_unevictable_pages, whose flawed
      handling involved looking at pages at tail of LRU.
      
      Was there significance to the sequence first ClearPageUnevictable, then
      test page_evictable, then SetPageUnevictable here? I think not, we're
      under LRU lock, and have no barriers between those.
      Signed-off-by: NHugh Dickins <hughd@google.com>
      Reviewed-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Shaohua Li <shaohua.li@intel.com>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: <stable@vger.kernel.org> [back to 3.1 but will need respins]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      24513264
    • H
      SHM_UNLOCK: fix long unpreemptible section · 85046579
      Hugh Dickins 提交于
      scan_mapping_unevictable_pages() is used to make SysV SHM_LOCKed pages
      evictable again once the shared memory is unlocked.  It does this with
      pagevec_lookup()s across the whole object (which might occupy most of
      memory), and takes 300ms to unlock 7GB here.  A cond_resched() every
      PAGEVEC_SIZE pages would be good.
      
      However, KOSAKI-san points out that this is called under shmem.c's
      info->lock, and it's also under shm.c's shm_lock(), both spinlocks.
      There is no strong reason for that: we need to take these pages off the
      unevictable list soonish, but those locks are not required for it.
      
      So move the call to scan_mapping_unevictable_pages() from shmem.c's
      unlock handling up to shm.c's unlock handling.  Remove the recently
      added barrier, not needed now we have spin_unlock() before the scan.
      
      Use get_file(), with subsequent fput(), to make sure we have a reference
      to mapping throughout scan_mapping_unevictable_pages(): that's something
      that was previously guaranteed by the shm_lock().
      
      Remove shmctl's lru_add_drain_all(): we don't fault in pages at SHM_LOCK
      time, and we lazily discover them to be Unevictable later, so it serves
      no purpose for SHM_LOCK; and serves no purpose for SHM_UNLOCK, since
      pages still on pagevec are not marked Unevictable.
      
      The original code avoided redundant rescans by checking VM_LOCKED flag
      at its level: now avoid them by checking shp's SHM_LOCKED.
      
      The original code called scan_mapping_unevictable_pages() on a locked
      area at shm_destroy() time: perhaps we once had accounting cross-checks
      which required that, but not now, so skip the overhead and just let
      inode eviction deal with them.
      
      Put check_move_unevictable_page() and scan_mapping_unevictable_pages()
      under CONFIG_SHMEM (with stub for the TINY case when ramfs is used),
      more as comment than to save space; comment them used for SHM_UNLOCK.
      Signed-off-by: NHugh Dickins <hughd@google.com>
      Reviewed-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Shaohua Li <shaohua.li@intel.com>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michel Lespinasse <walken@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      85046579
  15. 05 8月, 2011 1 次提交
  16. 04 8月, 2011 2 次提交
  17. 31 7月, 2011 2 次提交
  18. 27 7月, 2011 1 次提交
    • V
      ipc: introduce shm_rmid_forced sysctl · b34a6b1d
      Vasiliy Kulikov 提交于
      Add support for the shm_rmid_forced sysctl.  If set to 1, all shared
      memory objects in current ipc namespace will be automatically forced to
      use IPC_RMID.
      
      The POSIX way of handling shmem allows one to create shm objects and
      call shmdt(), leaving shm object associated with no process, thus
      consuming memory not counted via rlimits.
      
      With shm_rmid_forced=1 the shared memory object is counted at least for
      one process, so OOM killer may effectively kill the fat process holding
      the shared memory.
      
      It obviously breaks POSIX - some programs relying on the feature would
      stop working.  So set shm_rmid_forced=1 only if you're sure nobody uses
      "orphaned" memory.  Use shm_rmid_forced=0 by default for compatability
      reasons.
      
      The feature was previously impemented in -ow as a configure option.
      
      [akpm@linux-foundation.org: fix documentation, per Randy]
      [akpm@linux-foundation.org: fix warning]
      [akpm@linux-foundation.org: readability/conventionality tweaks]
      [akpm@linux-foundation.org: fix shm_rmid_forced/shm_forced_rmid confusion, use standard comment layout]
      Signed-off-by: NVasiliy Kulikov <segoon@openwall.com>
      Cc: Randy Dunlap <rdunlap@xenotime.net>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: "Serge E. Hallyn" <serge.hallyn@canonical.com>
      Cc: Daniel Lezcano <daniel.lezcano@free.fr>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Cc: Solar Designer <solar@openwall.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b34a6b1d
  19. 21 7月, 2011 1 次提交
    • J
      fs: push i_mutex and filemap_write_and_wait down into ->fsync() handlers · 02c24a82
      Josef Bacik 提交于
      Btrfs needs to be able to control how filemap_write_and_wait_range() is called
      in fsync to make it less of a painful operation, so push down taking i_mutex and
      the calling of filemap_write_and_wait() down into the ->fsync() handlers.  Some
      file systems can drop taking the i_mutex altogether it seems, like ext3 and
      ocfs2.  For correctness sake I just pushed everything down in all cases to make
      sure that we keep the current behavior the same for everybody, and then each
      individual fs maintainer can make up their mind about what to do from there.
      Thanks,
      Acked-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NJosef Bacik <josef@redhat.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      02c24a82
  20. 27 5月, 2011 1 次提交
  21. 31 3月, 2011 1 次提交
  22. 24 3月, 2011 1 次提交
  23. 30 10月, 2010 1 次提交
  24. 28 10月, 2010 1 次提交
    • H
      ipc/shm.c: add RSS and swap size information to /proc/sysvipc/shm · b7952180
      Helge Deller 提交于
      The kernel currently provides no functionality to analyze the RSS and swap
      space usage of each individual sysvipc shared memory segment.
      
      This patch adds this info for each existing shm segment by extending the
      output of /proc/sysvipc/shm by two columns for RSS and swap.
      
      Since shmctl(SHM_INFO) already provides a similiar calculation (it
      currently sums up all RSS/swap info for all segments), I did split out a
      static function which is now used by the /proc/sysvipc/shm output and
      shmctl(SHM_INFO).
      
      SAP products (esp.  the SAP Netweaver ABAP Kernel) uses lots of big shared
      memory segments (we often have Linux systems with >= 16GB shm usage).
      Sometimes we get customer reports about "slow" system responses and while
      looking into their configurations we often find massive swapping activity
      on the system.  With this patch it's now easy to see from the command line
      if and which shm segments gets swapped out (and how much) and can more
      easily give recommendations for system tuning.  Without the patch it's
      currently not possible to do such shm analysis at all.
      
      Also...
      
      Add some spaces in front of the "size" field for 64bit kernels to get the
      columns correct if you cat the contents of the file.  In
      sysvipc_shm_proc_show() the kernel prints the size value in "SPEC_SIZE"
      format, which is defined like this:
      
      #if BITS_PER_LONG <= 32
      #define SIZE_SPEC "%10lu"
      #else
      #define SIZE_SPEC "%21lu"
      #endif
      
      So, if the header is not adjusted, the columns are not correctly aligned.
      I actually tested this on 32- and 64-bit and it seems correct now.
      Signed-off-by: NHelge Deller <deller@gmx.de>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Acked-by: NHugh Dickins <hughd@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b7952180
  25. 15 10月, 2010 1 次提交
    • A
      llseek: automatically add .llseek fop · 6038f373
      Arnd Bergmann 提交于
      All file_operations should get a .llseek operation so we can make
      nonseekable_open the default for future file operations without a
      .llseek pointer.
      
      The three cases that we can automatically detect are no_llseek, seq_lseek
      and default_llseek. For cases where we can we can automatically prove that
      the file offset is always ignored, we use noop_llseek, which maintains
      the current behavior of not returning an error from a seek.
      
      New drivers should normally not use noop_llseek but instead use no_llseek
      and call nonseekable_open at open time.  Existing drivers can be converted
      to do the same when the maintainer knows for certain that no user code
      relies on calling seek on the device file.
      
      The generated code is often incorrectly indented and right now contains
      comments that clarify for each added line why a specific variant was
      chosen. In the version that gets submitted upstream, the comments will
      be gone and I will manually fix the indentation, because there does not
      seem to be a way to do that using coccinelle.
      
      Some amount of new code is currently sitting in linux-next that should get
      the same modifications, which I will do at the end of the merge window.
      
      Many thanks to Julia Lawall for helping me learn to write a semantic
      patch that does all this.
      
      ===== begin semantic patch =====
      // This adds an llseek= method to all file operations,
      // as a preparation for making no_llseek the default.
      //
      // The rules are
      // - use no_llseek explicitly if we do nonseekable_open
      // - use seq_lseek for sequential files
      // - use default_llseek if we know we access f_pos
      // - use noop_llseek if we know we don't access f_pos,
      //   but we still want to allow users to call lseek
      //
      @ open1 exists @
      identifier nested_open;
      @@
      nested_open(...)
      {
      <+...
      nonseekable_open(...)
      ...+>
      }
      
      @ open exists@
      identifier open_f;
      identifier i, f;
      identifier open1.nested_open;
      @@
      int open_f(struct inode *i, struct file *f)
      {
      <+...
      (
      nonseekable_open(...)
      |
      nested_open(...)
      )
      ...+>
      }
      
      @ read disable optional_qualifier exists @
      identifier read_f;
      identifier f, p, s, off;
      type ssize_t, size_t, loff_t;
      expression E;
      identifier func;
      @@
      ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
      {
      <+...
      (
         *off = E
      |
         *off += E
      |
         func(..., off, ...)
      |
         E = *off
      )
      ...+>
      }
      
      @ read_no_fpos disable optional_qualifier exists @
      identifier read_f;
      identifier f, p, s, off;
      type ssize_t, size_t, loff_t;
      @@
      ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
      {
      ... when != off
      }
      
      @ write @
      identifier write_f;
      identifier f, p, s, off;
      type ssize_t, size_t, loff_t;
      expression E;
      identifier func;
      @@
      ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
      {
      <+...
      (
        *off = E
      |
        *off += E
      |
        func(..., off, ...)
      |
        E = *off
      )
      ...+>
      }
      
      @ write_no_fpos @
      identifier write_f;
      identifier f, p, s, off;
      type ssize_t, size_t, loff_t;
      @@
      ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
      {
      ... when != off
      }
      
      @ fops0 @
      identifier fops;
      @@
      struct file_operations fops = {
       ...
      };
      
      @ has_llseek depends on fops0 @
      identifier fops0.fops;
      identifier llseek_f;
      @@
      struct file_operations fops = {
      ...
       .llseek = llseek_f,
      ...
      };
      
      @ has_read depends on fops0 @
      identifier fops0.fops;
      identifier read_f;
      @@
      struct file_operations fops = {
      ...
       .read = read_f,
      ...
      };
      
      @ has_write depends on fops0 @
      identifier fops0.fops;
      identifier write_f;
      @@
      struct file_operations fops = {
      ...
       .write = write_f,
      ...
      };
      
      @ has_open depends on fops0 @
      identifier fops0.fops;
      identifier open_f;
      @@
      struct file_operations fops = {
      ...
       .open = open_f,
      ...
      };
      
      // use no_llseek if we call nonseekable_open
      ////////////////////////////////////////////
      @ nonseekable1 depends on !has_llseek && has_open @
      identifier fops0.fops;
      identifier nso ~= "nonseekable_open";
      @@
      struct file_operations fops = {
      ...  .open = nso, ...
      +.llseek = no_llseek, /* nonseekable */
      };
      
      @ nonseekable2 depends on !has_llseek @
      identifier fops0.fops;
      identifier open.open_f;
      @@
      struct file_operations fops = {
      ...  .open = open_f, ...
      +.llseek = no_llseek, /* open uses nonseekable */
      };
      
      // use seq_lseek for sequential files
      /////////////////////////////////////
      @ seq depends on !has_llseek @
      identifier fops0.fops;
      identifier sr ~= "seq_read";
      @@
      struct file_operations fops = {
      ...  .read = sr, ...
      +.llseek = seq_lseek, /* we have seq_read */
      };
      
      // use default_llseek if there is a readdir
      ///////////////////////////////////////////
      @ fops1 depends on !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
      identifier fops0.fops;
      identifier readdir_e;
      @@
      // any other fop is used that changes pos
      struct file_operations fops = {
      ... .readdir = readdir_e, ...
      +.llseek = default_llseek, /* readdir is present */
      };
      
      // use default_llseek if at least one of read/write touches f_pos
      /////////////////////////////////////////////////////////////////
      @ fops2 depends on !fops1 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
      identifier fops0.fops;
      identifier read.read_f;
      @@
      // read fops use offset
      struct file_operations fops = {
      ... .read = read_f, ...
      +.llseek = default_llseek, /* read accesses f_pos */
      };
      
      @ fops3 depends on !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
      identifier fops0.fops;
      identifier write.write_f;
      @@
      // write fops use offset
      struct file_operations fops = {
      ... .write = write_f, ...
      +	.llseek = default_llseek, /* write accesses f_pos */
      };
      
      // Use noop_llseek if neither read nor write accesses f_pos
      ///////////////////////////////////////////////////////////
      
      @ fops4 depends on !fops1 && !fops2 && !fops3 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
      identifier fops0.fops;
      identifier read_no_fpos.read_f;
      identifier write_no_fpos.write_f;
      @@
      // write fops use offset
      struct file_operations fops = {
      ...
       .write = write_f,
       .read = read_f,
      ...
      +.llseek = noop_llseek, /* read and write both use no f_pos */
      };
      
      @ depends on has_write && !has_read && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
      identifier fops0.fops;
      identifier write_no_fpos.write_f;
      @@
      struct file_operations fops = {
      ... .write = write_f, ...
      +.llseek = noop_llseek, /* write uses no f_pos */
      };
      
      @ depends on has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
      identifier fops0.fops;
      identifier read_no_fpos.read_f;
      @@
      struct file_operations fops = {
      ... .read = read_f, ...
      +.llseek = noop_llseek, /* read uses no f_pos */
      };
      
      @ depends on !has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
      identifier fops0.fops;
      @@
      struct file_operations fops = {
      ...
      +.llseek = noop_llseek, /* no read or write fn */
      };
      ===== End semantic patch =====
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Cc: Julia Lawall <julia@diku.dk>
      Cc: Christoph Hellwig <hch@infradead.org>
      6038f373
  26. 28 5月, 2010 1 次提交
  27. 13 3月, 2010 1 次提交
  28. 17 1月, 2010 1 次提交
  29. 17 12月, 2009 1 次提交