1. 03 5月, 2009 2 次提交
    • K
      mm: fix Committed_AS underflow on large NR_CPUS environment · 00a62ce9
      KOSAKI Motohiro 提交于
      The Committed_AS field can underflow in certain situations:
      
      >         # while true; do cat /proc/meminfo  | grep _AS; sleep 1; done | uniq -c
      >               1 Committed_AS: 18446744073709323392 kB
      >              11 Committed_AS: 18446744073709455488 kB
      >               6 Committed_AS:    35136 kB
      >               5 Committed_AS: 18446744073709454400 kB
      >               7 Committed_AS:    35904 kB
      >               3 Committed_AS: 18446744073709453248 kB
      >               2 Committed_AS:    34752 kB
      >               9 Committed_AS: 18446744073709453248 kB
      >               8 Committed_AS:    34752 kB
      >               3 Committed_AS: 18446744073709320960 kB
      >               7 Committed_AS: 18446744073709454080 kB
      >               3 Committed_AS: 18446744073709320960 kB
      >               5 Committed_AS: 18446744073709454080 kB
      >               6 Committed_AS: 18446744073709320960 kB
      
      Because NR_CPUS can be greater than 1000 and meminfo_proc_show() does
      not check for underflow.
      
      But NR_CPUS proportional isn't good calculation.  In general,
      possibility of lock contention is proportional to the number of online
      cpus, not theorical maximum cpus (NR_CPUS).
      
      The current kernel has generic percpu-counter stuff.  using it is right
      way.  it makes code simplify and percpu_counter_read_positive() don't
      make underflow issue.
      Reported-by: NDave Hansen <dave@linux.vnet.ibm.com>
      Signed-off-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Eric B Munson <ebmunson@us.ibm.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: <stable@kernel.org>		[All kernel versions]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      00a62ce9
    • V
      pagemap: require aligned-length, non-null reads of /proc/pid/pagemap · 08161786
      Vitaly Mayatskikh 提交于
      The intention of commit aae8679b
      ("pagemap: fix bug in add_to_pagemap, require aligned-length reads of
      /proc/pid/pagemap") was to force reads of /proc/pid/pagemap to be a
      multiple of 8 bytes, but now it allows to read 0 bytes, which actually
      puts some data to user's buffer.  According to POSIX, if count is zero,
      read() should return zero and has no other results.
      Signed-off-by: NVitaly Mayatskikh <v.mayatskih@gmail.com>
      Cc: Thomas Tuttle <ttuttle@google.com>
      Acked-by: NMatt Mackall <mpm@selenic.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: <stable@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      08161786
  2. 23 4月, 2009 1 次提交
    • M
      [S390] /proc/stat idle field for idle cpus · e1c80530
      Martin Schwidefsky 提交于
      The cpu idle field in the output of /proc/stat is too small for cpus
      that have been idle for more than a tick. Add the architecture hook
      arch_idle_time that allows to add the not accounted idle time of a
      sleeping cpu without waking the cpu.
      
      The s390 implementation of arch_idle_time uses the already existing
      s390_idle_data per_cpu variable to find the sleep time of a neighboring
      idle cpu.
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      e1c80530
  3. 17 4月, 2009 1 次提交
  4. 09 4月, 2009 1 次提交
  5. 07 4月, 2009 1 次提交
  6. 03 4月, 2009 1 次提交
    • D
      nommu: fix a number of issues with the per-MM VMA patch · 33e5d769
      David Howells 提交于
      Fix a number of issues with the per-MM VMA patch:
      
       (1) Make mmap_pages_allocated an atomic_long_t, just in case this is used on
           a NOMMU system with more than 2G pages.  Makes no difference on a 32-bit
           system.
      
       (2) Report vma->vm_pgoff * PAGE_SIZE as a 64-bit value, not a 32-bit value,
           lest it overflow.
      
       (3) Move the allocation of the vm_area_struct slab back for fork.c.
      
       (4) Use KMEM_CACHE() for both vm_area_struct and vm_region slabs.
      
       (5) Use BUG_ON() rather than if () BUG().
      
       (6) Make the default validate_nommu_regions() a static inline rather than a
           #define.
      
       (7) Make free_page_series()'s objection to pages with a refcount != 1 more
           informative.
      
       (8) Adjust the __put_nommu_region() banner comment to indicate that the
           semaphore must be held for writing.
      
       (9) Limit the number of warnings about munmaps of non-mmapped regions.
      Reported-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Cc: Greg Ungerer <gerg@snapgear.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      33e5d769
  7. 01 4月, 2009 4 次提交
  8. 31 3月, 2009 5 次提交
  9. 30 3月, 2009 1 次提交
  10. 29 3月, 2009 1 次提交
    • H
      fix setuid sometimes wouldn't · 7c2c7d99
      Hugh Dickins 提交于
      check_unsafe_exec() also notes whether the fs_struct is being
      shared by more threads than will get killed by the exec, and if so
      sets LSM_UNSAFE_SHARE to make bprm_set_creds() careful about euid.
      But /proc/<pid>/cwd and /proc/<pid>/root lookups make transient
      use of get_fs_struct(), which also raises that sharing count.
      
      This might occasionally cause a setuid program not to change euid,
      in the same way as happened with files->count (check_unsafe_exec
      also looks at sighand->count, but /proc doesn't raise that one).
      
      We'd prefer exec not to unshare fs_struct: so fix this in procfs,
      replacing get_fs_struct() by get_fs_path(), which does path_get
      while still holding task_lock, instead of raising fs->count.
      Signed-off-by: NHugh Dickins <hugh@veritas.com>
      Cc: stable@kernel.org
      ___
      
       fs/proc/base.c |   50 +++++++++++++++--------------------------------
       1 file changed, 16 insertions(+), 34 deletions(-)
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7c2c7d99
  11. 28 3月, 2009 2 次提交
  12. 18 3月, 2009 1 次提交
    • L
      Avoid 64-bit "switch()" statements on 32-bit architectures · ee568b25
      Linus Torvalds 提交于
      Commit ee6f779b ("filp->f_pos not
      correctly updated in proc_task_readdir") changed the proc code to use
      filp->f_pos directly, rather than through a temporary variable.  In the
      process, that caused the operations to be done on the full 64 bits, even
      though the offset is never that big.
      
      That's all fine and dandy per se, but for some unfathomable reason gcc
      generates absolutely horrid code when using 64-bit values in switch()
      statements.  To the point of actually calling out to gcc helper
      functions like __cmpdi2 rather than just doing the trivial comparisons
      directly the way gcc does for normal compares.  At which point we get
      link failures, because we really don't want to support that kind of
      crazy code.
      
      Fix this by just casting the f_pos value to "unsigned long", which
      is plenty big enough for /proc, and avoids the gcc code generation issue.
      Reported-by: NAlexey Dobriyan <adobriyan@gmail.com>
      Cc: Zhang Le <r0bertz@gentoo.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ee568b25
  13. 16 3月, 2009 1 次提交
    • Z
      filp->f_pos not correctly updated in proc_task_readdir · ee6f779b
      Zhang Le 提交于
      filp->f_pos only get updated at the end of the function. Thus d_off of those
      dirents who are in the middle will be 0, and this will cause a problem in
      glibc's readdir implementation, specifically endless loop. Because when overflow
      occurs, f_pos will be set to next dirent to read, however it will be 0, unless
      the next one is the last one. So it will start over again and again.
      
      There is a sample program in man 2 gendents. This is the output of the program
      running on a multithread program's task dir before this patch is applied:
      
        $ ./a.out /proc/3807/task
        --------------- nread=128 ---------------
        i-node#  file type  d_reclen  d_off   d_name
          506442  directory    16          1  .
          506441  directory    16          0  ..
          506443  directory    16          0  3807
          506444  directory    16          0  3809
          506445  directory    16          0  3812
          506446  directory    16          0  3861
          506447  directory    16          0  3862
          506448  directory    16          8  3863
      
      This is the output after this patch is applied
      
        $ ./a.out /proc/3807/task
        --------------- nread=128 ---------------
        i-node#  file type  d_reclen  d_off   d_name
          506442  directory    16          1  .
          506441  directory    16          2  ..
          506443  directory    16          3  3807
          506444  directory    16          4  3809
          506445  directory    16          5  3812
          506446  directory    16          6  3861
          506447  directory    16          7  3862
          506448  directory    16          8  3863
      Signed-off-by: NZhang Le <r0bertz@gentoo.org>
      Acked-by: NAl Viro <viro@ZenIV.linux.org.uk>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ee6f779b
  14. 11 3月, 2009 1 次提交
  15. 25 2月, 2009 1 次提交
  16. 24 2月, 2009 1 次提交
  17. 09 1月, 2009 1 次提交
    • M
      vmcore: remove saved_max_pfn check · 921d58c0
      Magnus Damm 提交于
      Remove the saved_max_pfn check from the /proc/vmcore function
      read_from_oldmem().  No need to verify, we should be able to just trust
      that "elfcorehdr=" is correctly passed to the crash kernel on the kernel
      command line like we do with other parameters.
      
      The read_from_oldmem() function in fs/proc/vmcore.c is quite similar to
      read_from_oldmem() in drivers/char/mem.c, but only in the latter it makes
      sense to use saved_max_pfn.  For oldmem it is used to determine when to
      stop reading.  For vmcore we already have the elf header info pointing out
      the physical memory regions, no need to pass the end-of- old-memory twice.
      
      Removing the saved_max_pfn check from vmcore makes it possible for
      architectures to skip oldmem but still support crash dump through vmcore -
      without the need for the old saved_max_pfn cruft.
      
      Architectures that want to play safe can do the saved_max_pfn check in
      copy_oldmem_page().  Not sure why anyone would want to do that, but that's
      even safer than today - the saved_max_pfn check in vmcore removed by this
      patch only checks the first page.
      Signed-off-by: NMagnus Damm <damm@igel.co.jp>
      Acked-by: NVivek Goyal <vgoyal@redhat.com>
      Acked-by: NSimon Horman <horms@verge.net.au>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      921d58c0
  18. 08 1月, 2009 2 次提交
    • D
      NOMMU: Improve procfs output using per-MM VMAs · 38f71479
      David Howells 提交于
      Improve procfs output using per-MM VMAs for process memory accounting.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Tested-by: NMike Frysinger <vapier.adi@gmail.com>
      Acked-by: NPaul Mundt <lethal@linux-sh.org>
      38f71479
    • D
      NOMMU: Make VMAs per MM as for MMU-mode linux · 8feae131
      David Howells 提交于
      Make VMAs per mm_struct as for MMU-mode linux.  This solves two problems:
      
       (1) In SYSV SHM where nattch for a segment does not reflect the number of
           shmat's (and forks) done.
      
       (2) In mmap() where the VMA's vm_mm is set to point to the parent mm by an
           exec'ing process when VM_EXECUTABLE is specified, regardless of the fact
           that a VMA might be shared and already have its vm_mm assigned to another
           process or a dead process.
      
      A new struct (vm_region) is introduced to track a mapped region and to remember
      the circumstances under which it may be shared and the vm_list_struct structure
      is discarded as it's no longer required.
      
      This patch makes the following additional changes:
      
       (1) Regions are now allocated with alloc_pages() rather than kmalloc() and
           with no recourse to __GFP_COMP, so the pages are not composite.  Instead,
           each page has a reference on it held by the region.  Anything else that is
           interested in such a page will have to get a reference on it to retain it.
           When the pages are released due to unmapping, each page is passed to
           put_page() and will be freed when the page usage count reaches zero.
      
       (2) Excess pages are trimmed after an allocation as the allocation must be
           made as a power-of-2 quantity of pages.
      
       (3) VMAs are added to the parent MM's R/B tree and mmap lists.  As an MM may
           end up with overlapping VMAs within the tree, the VMA struct address is
           appended to the sort key.
      
       (4) Non-anonymous VMAs are now added to the backing inode's prio list.
      
       (5) Holes may be punched in anonymous VMAs with munmap(), releasing parts of
           the backing region.  The VMA and region structs will be split if
           necessary.
      
       (6) sys_shmdt() only releases one attachment to a SYSV IPC shared memory
           segment instead of all the attachments at that addresss.  Multiple
           shmat()'s return the same address under NOMMU-mode instead of different
           virtual addresses as under MMU-mode.
      
       (7) Core dumping for ELF-FDPIC requires fewer exceptions for NOMMU-mode.
      
       (8) /proc/maps is now the global list of mapped regions, and may list bits
           that aren't actually mapped anywhere.
      
       (9) /proc/meminfo gains a line (tagged "MmapCopy") that indicates the amount
           of RAM currently allocated by mmap to hold mappable regions that can't be
           mapped directly.  These are copies of the backing device or file if not
           anonymous.
      
      These changes make NOMMU mode more similar to MMU mode.  The downside is that
      NOMMU mode requires some extra memory to track things over NOMMU without this
      patch (VMAs are no longer shared, and there are now region structs).
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Tested-by: NMike Frysinger <vapier.adi@gmail.com>
      Acked-by: NPaul Mundt <lethal@linux-sh.org>
      8feae131
  19. 07 1月, 2009 2 次提交
  20. 06 1月, 2009 2 次提交
  21. 05 1月, 2009 6 次提交
    • W
      230e40fb
    • H
      proc: fix sparse warning · dfe6b7d9
      Hannes Eder 提交于
      fs/proc/base.c:312:4: warning: do-while statement is not a compound statement
      Signed-off-by: NHannes Eder <hannes@hanneseder.net>
      Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
      dfe6b7d9
    • K
      proc: add /proc/*/stack · 2ec220e2
      Ken Chen 提交于
      /proc/*/stack adds the ability to query a task's stack trace. It is more
      useful than /proc/*/wchan as it provides full stack trace instead of single
      depth. Example output:
      
      	$ cat /proc/self/stack
      	[<c010a271>] save_stack_trace_tsk+0x17/0x35
      	[<c01827b4>] proc_pid_stack+0x4a/0x76
      	[<c018312d>] proc_single_show+0x4a/0x5e
      	[<c016bdec>] seq_read+0xf3/0x29f
      	[<c015a004>] vfs_read+0x6d/0x91
      	[<c015a0c1>] sys_read+0x3b/0x60
      	[<c0102eda>] syscall_call+0x7/0xb
      	[<ffffffff>] 0xffffffff
      
      [add save_stack_trace_tsk() on mips, ACK Ralf --adobriyan]
      Signed-off-by: NKen Chen <kenchen@google.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
      2ec220e2
    • A
      proc: remove '##' usage · 631f9c18
      Alexey Dobriyan 提交于
      Inability to jump to /proc/*/foo handlers with ctags is annoying.
      Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
      631f9c18
    • A
      proc: remove useless WARN_ONs · ecae934e
      Alexey Dobriyan 提交于
      NULL "struct inode *" means VFS passed NULL inode to ->open.
      Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
      ecae934e
    • A
      proc: stop using BKL · b4df2b92
      Alexey Dobriyan 提交于
      There are four BKL users in proc: de_put(), proc_lookup_de(),
      proc_readdir_de(), proc_root_readdir(),
      
      1) de_put()
      -----------
      de_put() is classic atomic_dec_and_test() refcount wrapper -- no BKL
      needed. BKL doesn't matter to possible refcount leak as well.
      
      2) proc_lookup_de()
      -------------------
      Walking PDE list is protected by proc_subdir_lock(), proc_get_inode() is
      potentially blocking, all callers of proc_lookup_de() eventually end up
      from ->lookup hooks which is protected by directory's ->i_mutex -- BKL
      doesn't protect anything.
      
      3) proc_readdir_de()
      --------------------
      "." and ".." part doesn't need BKL, walking PDE list is under
      proc_subdir_lock, calling filldir callback is potentially blocking
      because it writes to luserspace. All proc_readdir_de() callers
      eventually come from ->readdir hook which is under directory's
      ->i_mutex -- BKL doesn't protect anything.
      
      4) proc_root_readdir_de()
      -------------------------
      proc_root_readdir_de is ->readdir hook, see (3).
      
      Since readdir hooks doesn't use BKL anymore, switch to
      generic_file_llseek, since it also takes directory's i_mutex.
      Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
      b4df2b92
  22. 26 12月, 2008 1 次提交
  23. 22 12月, 2008 1 次提交
    • I
      sched: fix warning in fs/proc/base.c · 826e08b0
      Ingo Molnar 提交于
      Stephen Rothwell reported this new (harmless) build warning on platforms that
      define u64 to long:
      
       fs/proc/base.c: In function 'proc_pid_schedstat':
       fs/proc/base.c:352: warning: format '%llu' expects type 'long long unsigned int', but argument 3 has type 'u64'
      
      asm-generic/int-l64.h platforms strike again: that file should be eliminated.
      
      Fix it by casting the parameters to long long.
      Reported-by: NStephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      826e08b0