1. 14 2月, 2015 1 次提交
  2. 14 12月, 2014 1 次提交
    • D
      fs, seq_file: fallback to vmalloc instead of oom kill processes · 5cec38ac
      David Rientjes 提交于
      Since commit 058504ed ("fs/seq_file: fallback to vmalloc allocation"),
      seq_buf_alloc() falls back to vmalloc() when the kmalloc() for contiguous
      memory fails.  This was done to address order-4 slab allocations for
      reading /proc/stat on large machines and noticed because
      PAGE_ALLOC_COSTLY_ORDER < 4, so there is no infinite loop in the page
      allocator when allocating new slab for such high-order allocations.
      
      Contiguous memory isn't necessary for caller of seq_buf_alloc(), however.
      Other GFP_KERNEL high-order allocations that are <=
      PAGE_ALLOC_COSTLY_ORDER will simply loop forever in the page allocator and
      oom kill processes as a result.
      
      We don't want to kill processes so that we can allocate contiguous memory
      in situations when contiguous memory isn't necessary.
      
      This patch does the kmalloc() allocation with __GFP_NORETRY for high-order
      allocations.  This still utilizes memory compaction and direct reclaim in
      the allocation path, the only difference is that it will fail immediately
      instead of oom kill processes when out of memory.
      
      [akpm@linux-foundation.org: add comment]
      Signed-off-by: NDavid Rientjes <rientjes@google.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5cec38ac
  3. 30 10月, 2014 1 次提交
  4. 04 7月, 2014 1 次提交
    • H
      fs/seq_file: fallback to vmalloc allocation · 058504ed
      Heiko Carstens 提交于
      There are a couple of seq_files which use the single_open() interface.
      This interface requires that the whole output must fit into a single
      buffer.
      
      E.g.  for /proc/stat allocation failures have been observed because an
      order-4 memory allocation failed due to memory fragmentation.  In such
      situations reading /proc/stat is not possible anymore.
      
      Therefore change the seq_file code to fallback to vmalloc allocations
      which will usually result in a couple of order-0 allocations and hence
      also work if memory is fragmented.
      
      For reference a call trace where reading from /proc/stat failed:
      
        sadc: page allocation failure: order:4, mode:0x1040d0
        CPU: 1 PID: 192063 Comm: sadc Not tainted 3.10.0-123.el7.s390x #1
        [...]
        Call Trace:
          show_stack+0x6c/0xe8
          warn_alloc_failed+0xd6/0x138
          __alloc_pages_nodemask+0x9da/0xb68
          __get_free_pages+0x2e/0x58
          kmalloc_order_trace+0x44/0xc0
          stat_open+0x5a/0xd8
          proc_reg_open+0x8a/0x140
          do_dentry_open+0x1bc/0x2c8
          finish_open+0x46/0x60
          do_last+0x382/0x10d0
          path_openat+0xc8/0x4f8
          do_filp_open+0x46/0xa8
          do_sys_open+0x114/0x1f0
          sysc_tracego+0x14/0x1a
      Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
      Tested-by: NDavid Rientjes <rientjes@google.com>
      Cc: Ian Kent <raven@themaw.net>
      Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
      Cc: Thorsten Diehl <thorsten.diehl@de.ibm.com>
      Cc: Andrea Righi <andrea@betterlinux.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Stefan Bader <stefan.bader@canonical.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      058504ed
  5. 19 11月, 2013 1 次提交
  6. 15 11月, 2013 1 次提交
  7. 25 10月, 2013 1 次提交
    • G
      seq_file: always update file->f_pos in seq_lseek() · 05e16745
      Gu Zheng 提交于
      This issue was first pointed out by Jiaxing Wang several months ago, but no
      further comments:
      https://lkml.org/lkml/2013/6/29/41
      
      As we know pread() does not change f_pos, so after pread(), file->f_pos
      and m->read_pos become different. And seq_lseek() does not update file->f_pos
      if offset equals to m->read_pos, so after pread() and seq_lseek()(lseek to
      m->read_pos), then a subsequent read may read from a wrong position, the
      following program produces the problem:
      
          char str1[32] = { 0 };
          char str2[32] = { 0 };
          int poffset = 10;
          int count = 20;
      
          /*open any seq file*/
          int fd = open("/proc/modules", O_RDONLY);
      
          pread(fd, str1, count, poffset);
          printf("pread:%s\n", str1);
      
          /*seek to where m->read_pos is*/
          lseek(fd, poffset+count, SEEK_SET);
      
          /*supposed to read from poffset+count, but this read from position 0*/
          read(fd, str2, count);
          printf("read:%s\n", str2);
      
      out put:
      pread:
       ck_netbios_ns 12665
      read:
       nf_conntrack_netbios
      
      /proc/modules:
      nf_conntrack_netbios_ns 12665 0 - Live 0xffffffffa038b000
      nf_conntrack_broadcast 12589 1 nf_conntrack_netbios_ns, Live 0xffffffffa0386000
      
      So we always update file->f_pos to offset in seq_lseek() to fix this issue.
      Signed-off-by: NJiaxing Wang <hello.wjx@gmail.com>
      Signed-off-by: NGu Zheng <guz.fnst@cn.fujitsu.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      05e16745
  8. 08 7月, 2013 1 次提交
  9. 10 4月, 2013 1 次提交
    • A
      new helper: single_open_size() · 2043f495
      Al Viro 提交于
      Same as single_open(), but preallocates the buffer of given size.
      Doesn't make any sense for sizes up to PAGE_SIZE and doesn't make
      sense if output of show() exceeds PAGE_SIZE only rarely - seq_read()
      will take care of growing the buffer and redoing show().  If you
      _know_ that it will be large, it might make more sense to look into
      saner iterator, rather than go with single-shot one.  If that's
      impossible, single_open_size() might be for you.
      
      Again, don't use that without a good reason; occasionally that's really
      the best way to go, but very often there are better solutions.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      2043f495
  10. 28 2月, 2013 3 次提交
  11. 11 1月, 2013 1 次提交
  12. 18 12月, 2012 1 次提交
  13. 15 8月, 2012 1 次提交
  14. 11 6月, 2012 1 次提交
  15. 24 3月, 2012 3 次提交
    • K
      seq_file: add seq_set_overflow(), seq_overflow() · e075f591
      KAMEZAWA Hiroyuki 提交于
      It is undocumented but a seq_file's overflow state is indicated by
      m->count == m->size.  Add seq_set_overflow() and seq_overflow() to
      set/check overflow status explicitly.
      
      Based on an idea from Eric Dumazet.
      
      [akpm@linux-foundation.org: tweak code comment]
      Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e075f591
    • K
      procfs: speed up /proc/pid/stat, statm · bda7bad6
      KAMEZAWA Hiroyuki 提交于
      Process accounting applications as top, ps visit some files under
      /proc/<pid>.  With seq_put_decimal_ull(), we can optimize /proc/<pid>/stat
      and /proc/<pid>/statm files.
      
      This patch adds
        - seq_put_decimal_ll() for signed values.
        - allow delimiter == 0.
        - convert seq_printf() to seq_put_decimal_ull/ll in /proc/stat, statm.
      
      Test result on a system with 2000+ procs.
      
      Before patch:
        [kamezawa@bluextal test]$ top -b -n 1 | wc -l
        2223
        [kamezawa@bluextal test]$ time top -b -n 1 > /dev/null
      
        real    0m0.675s
        user    0m0.044s
        sys     0m0.121s
      
        [kamezawa@bluextal test]$ time ps -elf > /dev/null
      
        real    0m0.236s
        user    0m0.056s
        sys     0m0.176s
      
      After patch:
        kamezawa@bluextal ~]$ time top -b -n 1 > /dev/null
      
        real    0m0.657s
        user    0m0.052s
        sys     0m0.100s
      
        [kamezawa@bluextal ~]$ time ps -elf > /dev/null
      
        real    0m0.198s
        user    0m0.050s
        sys     0m0.145s
      
      Considering top, ps tend to scan /proc periodically, this will reduce cpu
      consumption by top/ps to some extent.
      
      [akpm@linux-foundation.org: checkpatch fixes]
      Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      bda7bad6
    • K
      procfs: add num_to_str() to speed up /proc/stat · 1ac101a5
      KAMEZAWA Hiroyuki 提交于
      == stat_check.py
      num = 0
      with open("/proc/stat") as f:
              while num < 1000 :
                      data = f.read()
                      f.seek(0, 0)
                      num = num + 1
      ==
      
      perf shows
      
          20.39%  stat_check.py  [kernel.kallsyms]    [k] format_decode
          13.41%  stat_check.py  [kernel.kallsyms]    [k] number
          12.61%  stat_check.py  [kernel.kallsyms]    [k] vsnprintf
          10.85%  stat_check.py  [kernel.kallsyms]    [k] memcpy
           4.85%  stat_check.py  [kernel.kallsyms]    [k] radix_tree_lookup
           4.43%  stat_check.py  [kernel.kallsyms]    [k] seq_printf
      
      This patch removes most of calls to vsnprintf() by adding num_to_str()
      and seq_print_decimal_ull(), which prints decimal numbers without rich
      functions provided by printf().
      
      On my 8cpu box.
      == Before patch ==
      [root@bluextal test]# time ./stat_check.py
      
      real    0m0.150s
      user    0m0.026s
      sys     0m0.121s
      
      == After patch ==
      [root@bluextal test]# time ./stat_check.py
      
      real    0m0.055s
      user    0m0.022s
      sys     0m0.030s
      
      [akpm@linux-foundation.org: remove incorrect comment, use less statck in num_to_str(), move comment from .h to .c, simplify seq_put_decimal_ull()]
      [andrea@betterlinux.com: avoid breaking the ABI in /proc/stat]
      Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: NAndrea Righi <andrea@betterlinux.com>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Glauber Costa <glommer@parallels.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Paul Turner <pjt@google.com>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1ac101a5
  16. 22 3月, 2012 1 次提交
  17. 29 2月, 2012 1 次提交
  18. 04 1月, 2012 1 次提交
  19. 07 12月, 2011 1 次提交
    • A
      fix apparmor dereferencing potentially freed dentry, sanitize __d_path() API · 02125a82
      Al Viro 提交于
      __d_path() API is asking for trouble and in case of apparmor d_namespace_path()
      getting just that.  The root cause is that when __d_path() misses the root
      it had been told to look for, it stores the location of the most remote ancestor
      in *root.  Without grabbing references.  Sure, at the moment of call it had
      been pinned down by what we have in *path.  And if we raced with umount -l, we
      could have very well stopped at vfsmount/dentry that got freed as soon as
      prepend_path() dropped vfsmount_lock.
      
      It is safe to compare these pointers with pre-existing (and known to be still
      alive) vfsmount and dentry, as long as all we are asking is "is it the same
      address?".  Dereferencing is not safe and apparmor ended up stepping into
      that.  d_namespace_path() really wants to examine the place where we stopped,
      even if it's not connected to our namespace.  As the result, it looked
      at ->d_sb->s_magic of a dentry that might've been already freed by that point.
      All other callers had been careful enough to avoid that, but it's really
      a bad interface - it invites that kind of trouble.
      
      The fix is fairly straightforward, even though it's bigger than I'd like:
      	* prepend_path() root argument becomes const.
      	* __d_path() is never called with NULL/NULL root.  It was a kludge
      to start with.  Instead, we have an explicit function - d_absolute_root().
      Same as __d_path(), except that it doesn't get root passed and stops where
      it stops.  apparmor and tomoyo are using it.
      	* __d_path() returns NULL on path outside of root.  The main
      caller is show_mountinfo() and that's precisely what we pass root for - to
      skip those outside chroot jail.  Those who don't want that can (and do)
      use d_path().
      	* __d_path() root argument becomes const.  Everyone agrees, I hope.
      	* apparmor does *NOT* try to use __d_path() or any of its variants
      when it sees that path->mnt is an internal vfsmount.  In that case it's
      definitely not mounted anywhere and dentry_path() is exactly what we want
      there.  Handling of sysctl()-triggered weirdness is moved to that place.
      	* if apparmor is asked to do pathname relative to chroot jail
      and __d_path() tells it we it's not in that jail, the sucker just calls
      d_absolute_path() instead.  That's the other remaining caller of __d_path(),
      BTW.
              * seq_path_root() does _NOT_ return -ENAMETOOLONG (it's stupid anyway -
      the normal seq_file logics will take care of growing the buffer and redoing
      the call of ->show() just fine).  However, if it gets path not reachable
      from root, it returns SEQ_SKIP.  The only caller adjusted (i.e. stopped
      ignoring the return value as it used to do).
      Reviewed-by: NJohn Johansen <john.johansen@canonical.com>
      ACKed-by: NJohn Johansen <john.johansen@canonical.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      Cc: stable@vger.kernel.org
      02125a82
  20. 26 10月, 2010 1 次提交
  21. 23 9月, 2010 1 次提交
  22. 08 3月, 2010 1 次提交
  23. 23 2月, 2010 1 次提交
  24. 11 2月, 2010 1 次提交
  25. 24 9月, 2009 2 次提交
  26. 19 6月, 2009 1 次提交
  27. 30 3月, 2009 1 次提交
  28. 19 2月, 2009 1 次提交
    • E
      seq_file: properly cope with pread · 8f19d472
      Eric Biederman 提交于
      Currently seq_read assumes that the offset passed to it is always the
      offset it passed to user space.  In the case pread this assumption is
      broken and we do the wrong thing when presented with pread.
      
      To solve this I introduce an offset cache inside of struct seq_file so we
      know where our logical file position is.  Then in seq_read if we try to
      read from another offset we reset our data structures and attempt to go to
      the offset user space wanted.
      
      [akpm@linux-foundation.org: restore FMODE_PWRITE]
      [pjt@google.com: seq_open needs its fmode opened up to take advantage of this]
      Signed-off-by: NEric Biederman <ebiederm@xmission.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Paul Turner <pjt@google.com>
      Cc: <stable@kernel.org>		[2.6.25.x, 2.6.26.x, 2.6.27.x, 2.6.28.x]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8f19d472
  29. 06 2月, 2009 2 次提交
  30. 01 1月, 2009 1 次提交
  31. 30 12月, 2008 1 次提交
  32. 29 11月, 2008 1 次提交
  33. 24 11月, 2008 1 次提交
  34. 23 11月, 2008 1 次提交