Commits · 7fdf523067666b0eaff330f362401ee50ce187c4 · openeuler / raspberrypi-kernel

03 May, 2009 2 commits

mm: fix Committed_AS underflow on large NR_CPUS environment · 00a62ce9

Committed by KOSAKI Motohiro on Apr 30, 2009

The Committed_AS field can underflow in certain situations:

>         # while true; do cat /proc/meminfo  | grep _AS; sleep 1; done | uniq -c
>               1 Committed_AS: 18446744073709323392 kB
>              11 Committed_AS: 18446744073709455488 kB
>               6 Committed_AS:    35136 kB
>               5 Committed_AS: 18446744073709454400 kB
>               7 Committed_AS:    35904 kB
>               3 Committed_AS: 18446744073709453248 kB
>               2 Committed_AS:    34752 kB
>               9 Committed_AS: 18446744073709453248 kB
>               8 Committed_AS:    34752 kB
>               3 Committed_AS: 18446744073709320960 kB
>               7 Committed_AS: 18446744073709454080 kB
>               3 Committed_AS: 18446744073709320960 kB
>               5 Committed_AS: 18446744073709454080 kB
>               6 Committed_AS: 18446744073709320960 kB

This happens because NR_CPUS can be greater than 1000 and
meminfo_proc_show() does not check for underflow.

But scaling the calculation with NR_CPUS is not a good approach.  In
general, the likelihood of lock contention is proportional to the number
of online cpus, not the theoretical maximum number of cpus (NR_CPUS).

The current kernel has a generic percpu-counter facility.  Using it is the
right way: it simplifies the code, and percpu_counter_read_positive() does
not have the underflow issue.
Reported-by: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Eric B Munson <ebmunson@us.ibm.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: <stable@kernel.org>		[All kernel versions]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

00a62ce9
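
To illustrate why a clamped read avoids the huge Committed_AS values shown above, here is a minimal single-threaded sketch of a batched per-CPU counter. It is not the kernel's percpu_counter implementation; `approx_counter`, `NCPU`, `BATCH` and the function names are hypothetical and chosen only for this example.

```c
#include <assert.h>
#include <stdint.h>

#define NCPU  4    /* hypothetical number of online CPUs */
#define BATCH 32   /* hypothetical per-CPU batch size */

struct approx_counter {
    int64_t global;        /* folded total, may lag behind the true sum */
    int32_t local[NCPU];   /* per-CPU deltas not yet folded in */
};

/* Add delta on behalf of one CPU; fold into global only past BATCH. */
static void counter_add(struct approx_counter *c, int cpu, int32_t delta)
{
    assert(cpu >= 0 && cpu < NCPU);
    c->local[cpu] += delta;
    if (c->local[cpu] >= BATCH || c->local[cpu] <= -BATCH) {
        c->global += c->local[cpu];
        c->local[cpu] = 0;
    }
}

/* Cheap read: the global part alone, which can transiently be negative. */
static int64_t counter_read(const struct approx_counter *c)
{
    return c->global;
}

/* Clamped read: never hand a negative value to a caller that prints it
 * as an unsigned quantity. */
static int64_t counter_read_positive(const struct approx_counter *c)
{
    int64_t v = c->global;
    return v > 0 ? v : 0;
}
```

In this model a large decrement on one CPU can be folded before the matching increments on other CPUs, so `counter_read()` goes negative for a while; printing that as an unsigned number gives values like the ones in the report, while `counter_read_positive()` reports 0 instead.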

pagemap: require aligned-length, non-null reads of /proc/pid/pagemap · 08161786

Committed by Vitaly Mayatskikh on Apr 30, 2009

The intention of commit aae8679b
("pagemap: fix bug in add_to_pagemap, require aligned-length reads of
/proc/pid/pagemap") was to force reads of /proc/pid/pagemap to be a
multiple of 8 bytes, but it now allows a read of 0 bytes, which actually
puts some data into the user's buffer.  According to POSIX, if count is
zero, read() should return zero and have no other results.
Signed-off-by: Vitaly Mayatskikh <v.mayatskih@gmail.com>
Cc: Thomas Tuttle <ttuttle@google.com>
Acked-by: Matt Mackall <mpm@selenic.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

08161786
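
One way to satisfy both requirements (aligned length and a harmless zero-byte read) is a precheck along the following lines. This is a sketch under stated assumptions, not the patch itself: `pagemap_read_precheck()` and its return convention are hypothetical; only the 8-byte entry size comes from the commit message.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define PM_ENTRY_BYTES sizeof(uint64_t)   /* one pagemap entry is 8 bytes */

/*
 * Hypothetical helper: returns -EINVAL for a misaligned request,
 * 0 for an empty read (nothing may be copied to the user buffer),
 * and 1 when the caller should go on to fill the buffer.
 */
static int pagemap_read_precheck(uint64_t pos, size_t count)
{
    if (pos % PM_ENTRY_BYTES || count % PM_ENTRY_BYTES)
        return -EINVAL;
    if (count == 0)
        return 0;
    return 1;
}
```

The zero-count case has to be handled separately because 0 is itself a multiple of 8 and so passes the alignment test.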

23 Apr, 2009 1 commit

[S390] /proc/stat idle field for idle cpus · e1c80530

Committed by Martin Schwidefsky on Apr 23, 2009

The cpu idle field in the output of /proc/stat is too small for cpus
that have been idle for more than a tick. Add the architecture hook
arch_idle_time, which allows adding the unaccounted idle time of a
sleeping cpu without waking the cpu.

The s390 implementation of arch_idle_time uses the already existing
s390_idle_data per_cpu variable to find the sleep time of a neighboring
idle cpu.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>

e1c80530

17 Apr, 2009 1 commit

proc: mounts_poll() make consistent to mdstat_poll · 31b07093

Committed by KOSAKI Motohiro on Apr 09, 2009

In the recent sysfs_poll discussion, Neil Brown pointed out that
/proc/mounts should also be fixed.

SUSv3 says "Regular files shall always poll TRUE for reading and
writing".  See
http://www.opengroup.org/onlinepubs/009695399/functions/poll.html

So mounts_poll()'s default should be "POLLIN | POLLRDNORM", which means
always readable.

In addition, the event trigger should use "POLLERR | POLLPRI" instead of
POLLERR alone.  This makes it consistent with mdstat_poll() and
sysfs_poll(), and select(2) can handle POLLPRI easily.
Reported-by: Neil Brown <neilb@suse.de>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Ram Pai <linuxram@us.ibm.com>
Cc: Miklos Szeredi <mszeredi@suse.cz>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

31b07093

09 Apr, 2009 1 commit

nommu: fix typo vma->pg_off to vma->vm_pgoff · 4c967291

Committed by Nobuhiro Iwamatsu on Apr 07, 2009

6260a4b0 ("/proc/pid/maps: don't show
pgoff of pure ANON VMAs") had a typo.

fs/proc/task_nommu.c:138: error: 'struct vm_area_struct' has no member named 'pg_off'
distcc[21484] ERROR: compile fs/proc/task_nommu.c on sprygo/32 failed
Signed-off-by: Nobuhiro Iwamatsu <iwamatsu.nobuhiro@renesas.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

4c967291

07 Apr, 2009 1 commit

/proc/pid/maps: don't show pgoff of pure ANON VMAs · 6260a4b0

Committed by KAMEZAWA Hiroyuki on Apr 06, 2009

Recently, it has been argued that what /proc/pid/maps shows is ugly when a
32-bit binary runs on a 64-bit host.

/proc/pid/maps outputs the vma's pgoff member, but vma->vm_pgoff carries no
useful information if the vma is for ANON.  With this patch, /proc/pid/maps
shows just 0 if there is no file backing store.

[akpm@linux-foundation.org: coding-style fixes]
[kamezawa.hiroyu@jp.fujitsu.com: coding-style fixes]
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Mike Waychison <mikew@google.com>
Reported-by: Ying Han <yinghan@google.com>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

6260a4b0

03 Apr, 2009 1 commit

nommu: fix a number of issues with the per-MM VMA patch · 33e5d769

Committed by David Howells on Apr 02, 2009

Fix a number of issues with the per-MM VMA patch:

 (1) Make mmap_pages_allocated an atomic_long_t, just in case this is used on
     a NOMMU system with more than 2G pages.  Makes no difference on a 32-bit
     system.

 (2) Report vma->vm_pgoff * PAGE_SIZE as a 64-bit value, not a 32-bit value,
     lest it overflow.

 (3) Move the allocation of the vm_area_struct slab back for fork.c.

 (4) Use KMEM_CACHE() for both vm_area_struct and vm_region slabs.

 (5) Use BUG_ON() rather than if () BUG().

 (6) Make the default validate_nommu_regions() a static inline rather than a
     #define.

 (7) Make free_page_series()'s objection to pages with a refcount != 1 more
     informative.

 (8) Adjust the __put_nommu_region() banner comment to indicate that the
     semaphore must be held for writing.

 (9) Limit the number of warnings about munmaps of non-mmapped regions.
Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Cc: Greg Ungerer <gerg@snapgear.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

33e5d769
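
Item (2) of this commit is about integer width. The following sketch shows the overflow it guards against, assuming a 4 KiB page size; `offset32()` and `offset64()` are hypothetical helpers, not kernel functions.

```c
#include <stdint.h>

#define PAGE_SIZE 4096u   /* assumed page size for this example */

/* 32-bit arithmetic: the product wraps once it reaches 4 GiB. */
static uint32_t offset32(uint32_t pgoff)
{
    return pgoff * PAGE_SIZE;
}

/* Widen to 64 bits before multiplying, so large offsets survive. */
static uint64_t offset64(uint32_t pgoff)
{
    return (uint64_t)pgoff * PAGE_SIZE;
}
```

For a page offset of 0x100000 (exactly 4 GiB into the file), the 32-bit version wraps to 0 while the 64-bit version yields 0x100000000.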

01 Apr, 2009 4 commits

proc tty: remove struct tty_operations::read_proc · 0f043a81

Committed by Alexey Dobriyan on Mar 31, 2009

struct tty_operations::proc_fops took its place and there is one less
create_proc_read_entry() user now!
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

0f043a81

proc tty: add struct tty_operations::proc_fops · ae149b6b

Committed by Alexey Dobriyan on Mar 31, 2009

Used for the gradual switch of TTY drivers away from ->read_proc, which
helps with the gradual switch away from ->read_proc for the whole tree.

As a side effect, fix a possible race condition when ->data is initialized
after the PDE is hooked into the proc tree.

->proc_fops takes precedence over ->read_proc.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ae149b6b

Get rid of indirect include of fs_struct.h · 5ad4e53b

Committed by Al Viro on Mar 29, 2009

Don't pull it in sched.h; very few files actually need it and those
can include it directly.  sched.h itself only needs a forward declaration
of struct fs_struct.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

5ad4e53b

New locking/refcounting for fs_struct · 498052bb

Committed by Al Viro on Mar 30, 2009

* all changes of current->fs are done under task_lock and write_lock of
  old fs->lock
* refcount is not atomic anymore (same protection)
* its decrements are done when removing reference from current; at the
  same time we decide whether to free it.
* put_fs_struct() is gone
* new field - ->in_exec.  Set by check_unsafe_exec() if we are trying to do
  execve() and only subthreads share fs_struct.  Cleared when finishing exec
  (success and failure alike).  Makes CLONE_FS fail with -EAGAIN if set.
* check_unsafe_exec() may fail with -EAGAIN if another execve() from subthread
  is in progress.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

498052bb
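
The interaction between ->in_exec, check_unsafe_exec() and CLONE_FS described in the list above can be modelled roughly as below. This is a simplified sketch based only on the commit description: it omits task_lock and fs->lock entirely, and `fs_model` and the `*_model()` functions are hypothetical names, not the kernel's code.

```c
#include <errno.h>

/* Hypothetical, lock-free model of the two fields named in the commit. */
struct fs_model {
    int users;     /* tasks sharing this fs_struct */
    int in_exec;   /* set while an execve() is in flight */
};

/*
 * nthreads: members of the exec'ing thread group that share fs.
 * Sets *unsafe_share when somebody outside the group shares it too.
 */
static int check_unsafe_exec_model(struct fs_model *fs, int nthreads,
                                   int *unsafe_share)
{
    *unsafe_share = 0;
    if (fs->users > nthreads) {
        *unsafe_share = 1;
        return 0;
    }
    if (fs->in_exec)
        return -EAGAIN;    /* another subthread is already in execve() */
    fs->in_exec = 1;
    return 0;
}

/* Cleared when exec finishes, on success and failure alike. */
static void finish_exec_model(struct fs_model *fs)
{
    fs->in_exec = 0;
}

/* CLONE_FS: refuse to add a sharer while an exec is in progress. */
static int clone_fs_model(struct fs_model *fs)
{
    if (fs->in_exec)
        return -EAGAIN;
    fs->users++;
    return 0;
}
```

Refusing CLONE_FS while in_exec is set keeps the sharer count that exec just examined from changing underneath it.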

31 Mar, 2009 5 commits

Revert "proc: revert /proc/uptime to ->read_proc hook" · a9caa3de

Committed by Alexey Dobriyan on Feb 20, 2009

This reverts commit 6c87df37.

proc files implemented through seq_file do pread(2) now.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>

a9caa3de

proc 2/2: remove struct proc_dir_entry::owner · 99b76233

Committed by Alexey Dobriyan on Mar 25, 2009

Setting ->owner as done currently (pde->owner = THIS_MODULE) is racy,
as correctly noted in bug #12454. Someone can look up an entry with NULL
->owner, thus not pinning anything, and release it later, resulting
in module refcount underflow.

We can keep ->owner and supply it at registration time like ->proc_fops
and ->data.

But this leaves ->owner as an easily manipulated field (just one C
assignment), and somebody will forget to unpin the previous module and pin
the current one when switching ->owner. ->proc_fops is declared as "const",
which should give some food for thought.

->read_proc/->write_proc were just fixed to not require ->owner for
protection.

rmmod'ed directories will be empty and return "." and ".." -- no harm.
And directories with tricky enough readdir and lookup shouldn't be modular.
We definitely don't want such modular code.

Removing ->owner will also make PDE smaller.

So, let's nuke it.

Kudos to Jeff Layton for reminding about this, let's say, oversight.

http://bugzilla.kernel.org/show_bug.cgi?id=12454
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>

99b76233

proc 1/2: do PDE usecounting even for ->read_proc, ->write_proc · 3dec7f59

Committed by Alexey Dobriyan on Feb 20, 2009

struct proc_dir_entry::owner is going to be removed. Now it's only necessary
to protect PDEs which are using ->read_proc, ->write_proc hooks.

However, ->owner assignments are racy and make it very easy for someone to switch
->owner on live PDE (as some subsystems do) without fixing refcounts and so on.

http://bugzilla.kernel.org/show_bug.cgi?id=12454

So, ->owner is on death row.

Proxy file operations exist already (proc_file_operations), just bump usecount
when necessary.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>

3dec7f59

proc: fix sparse warnings in pagemap_read() · 09729a99

Committed by Milind Arun Choudhary on Feb 20, 2009

fs/proc/task_mmu.c:696:12: warning: cast removes address space of expression
fs/proc/task_mmu.c:696:9: warning: incorrect type in assignment (different address spaces)
fs/proc/task_mmu.c:696:9: expected unsigned long long [noderef] [usertype] <asn:1>*out
fs/proc/task_mmu.c:696:9: got unsigned long long [usertype] *<noident>
fs/proc/task_mmu.c:697:12: warning: cast removes address space of expression
fs/proc/task_mmu.c:697:9: warning: incorrect type in assignment (different address spaces)
fs/proc/task_mmu.c:697:9: expected unsigned long long [noderef] [usertype] <asn:1>*end
fs/proc/task_mmu.c:697:9: got unsigned long long [usertype] *<noident>
fs/proc/task_mmu.c:723:12: warning: cast removes address space of expression
fs/proc/task_mmu.c:723:26: error: subtraction of different types can't work (different address spaces)
fs/proc/task_mmu.c:725:24: error: subtraction of different types can't work (different address spaces)
Signed-off-by: Milind Arun Choudhary <milindchoudhary@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>

09729a99

proc: move fs/proc/inode-alloc.txt comment into a source file · 1681bc30

Committed by Randy Dunlap on Jan 13, 2009

so that people will realize that it exists and can update it as needed.
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>

1681bc30

30 Mar, 2009 1 commit

trivial: fix typo "kernal" -> "kernel" · 973c32be

Committed by Uwe Kleine-Koenig on Jan 12, 2009

Signed-off-by: Uwe Kleine-Koenig <Uwe.Kleine-Koenig@digi.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>

973c32be

29 Mar, 2009 1 commit

fix setuid sometimes wouldn't · 7c2c7d99

Committed by Hugh Dickins on Mar 28, 2009

check_unsafe_exec() also notes whether the fs_struct is being
shared by more threads than will get killed by the exec, and if so
sets LSM_UNSAFE_SHARE to make bprm_set_creds() careful about euid.
But /proc/<pid>/cwd and /proc/<pid>/root lookups make transient
use of get_fs_struct(), which also raises that sharing count.

This might occasionally cause a setuid program not to change euid,
in the same way as happened with files->count (check_unsafe_exec
also looks at sighand->count, but /proc doesn't raise that one).

We'd prefer exec not to unshare fs_struct: so fix this in procfs,
replacing get_fs_struct() by get_fs_path(), which does path_get
while still holding task_lock, instead of raising fs->count.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Cc: stable@kernel.org
___

 fs/proc/base.c |   50 +++++++++++++++--------------------------------
 1 file changed, 16 insertions(+), 34 deletions(-)
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

7c2c7d99

28 Mar, 2009 2 commits

vfs: simple_set_mnt() should return void · a3ec947c

Committed by Sukadev Bhattiprolu on Mar 04, 2009

simple_set_mnt() is defined as returning 'int' but always returns 0.
Callers assume simple_set_mnt() never fails and don't properly clean up if
it were to _ever_ fail.  For instance, get_sb_single() and get_sb_nodev()
should:

        up_write(&sb->s_umount);
        deactivate_super(sb);

if simple_set_mnt() fails.

Since simple_set_mnt() never fails, it would be cleaner if it did not
return anything.

[akpm@linux-foundation.org: fix build]
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

a3ec947c

constify dentry_operations: procfs · d72f71eb

Committed by Al Viro on Feb 20, 2009

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

d72f71eb

18 Mar, 2009 1 commit

Avoid 64-bit "switch()" statements on 32-bit architectures · ee568b25

Committed by Linus Torvalds on Mar 17, 2009

Commit ee6f779b ("filp->f_pos not
correctly updated in proc_task_readdir") changed the proc code to use
filp->f_pos directly, rather than through a temporary variable.  In the
process, that caused the operations to be done on the full 64 bits, even
though the offset is never that big.

That's all fine and dandy per se, but for some unfathomable reason gcc
generates absolutely horrid code when using 64-bit values in switch()
statements.  To the point of actually calling out to gcc helper
functions like __cmpdi2 rather than just doing the trivial comparisons
directly the way gcc does for normal compares.  At which point we get
link failures, because we really don't want to support that kind of
crazy code.

Fix this by just casting the f_pos value to "unsigned long", which
is plenty big enough for /proc, and avoids the gcc code generation issue.
Reported-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Zhang Le <r0bertz@gentoo.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ee568b25
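
The shape of the fix described above (narrow the 64-bit position once, then switch on the narrow value) looks roughly like this. It is a userspace sketch; `task_dir_entry()` and its return strings are hypothetical and only stand in for the per-position handling in the /proc readdir code.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for a readdir step: pick what to emit for a
 * given directory position. f_pos is 64-bit, but positions in this
 * directory are always small, so narrow it before the switch.
 */
static const char *task_dir_entry(int64_t f_pos)
{
    unsigned long pos = (unsigned long)f_pos;

    switch (pos) {
    case 0:
        return ".";
    case 1:
        return "..";
    default:
        return "thread";
    }
}
```

On a 32-bit target the switch then compares native-width values, so the compiler has no reason to emit calls to 64-bit comparison helpers.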

16 Mar, 2009 1 commit

filp->f_pos not correctly updated in proc_task_readdir · ee6f779b

Committed by Zhang Le on Mar 16, 2009

filp->f_pos only gets updated at the end of the function. Thus d_off of those
dirents that are in the middle will be 0, and this causes a problem in
glibc's readdir implementation, specifically an endless loop. When overflow
occurs, f_pos will be set to the next dirent to read; however, it will be 0
unless the next one is the last one. So it will start over again and again.

There is a sample program in man 2 getdents. This is the output of the program
running on a multithreaded program's task dir before this patch is applied:

  $ ./a.out /proc/3807/task
  --------------- nread=128 ---------------
  i-node#  file type  d_reclen  d_off   d_name
    506442  directory    16          1  .
    506441  directory    16          0  ..
    506443  directory    16          0  3807
    506444  directory    16          0  3809
    506445  directory    16          0  3812
    506446  directory    16          0  3861
    506447  directory    16          0  3862
    506448  directory    16          8  3863

This is the output after this patch is applied:

  $ ./a.out /proc/3807/task
  --------------- nread=128 ---------------
  i-node#  file type  d_reclen  d_off   d_name
    506442  directory    16          1  .
    506441  directory    16          2  ..
    506443  directory    16          3  3807
    506444  directory    16          4  3809
    506445  directory    16          5  3812
    506446  directory    16          6  3861
    506447  directory    16          7  3862
    506448  directory    16          8  3863
Signed-off-by: Zhang Le <r0bertz@gentoo.org>
Acked-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ee6f779b
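
The difference between the two listings above comes down to when the position is advanced. A minimal model, assuming a flat array of thread ids: `dirent_model` and `fill_task_dir()` are hypothetical names, and the real code also handles "." and "..", which this sketch leaves out.

```c
#include <assert.h>
#include <stddef.h>

struct dirent_model {
    long d_off;   /* position of the entry that follows this one */
    int  tid;
};

/*
 * Emit up to max entries starting at *pos. The position is advanced
 * as each entry is produced, so every d_off points at the following
 * entry instead of staying 0 until the end of the call.
 */
static size_t fill_task_dir(long *pos, const int *tids, size_t ntids,
                            struct dirent_model *out, size_t max)
{
    size_t n = 0;

    assert(*pos >= 0);
    while ((size_t)*pos < ntids && n < max) {
        out[n].tid = tids[*pos];
        *pos += 1;
        out[n].d_off = *pos;
        n++;
    }
    return n;
}
```

A caller whose buffer fills up can resume from the d_off of the last entry it consumed, which is what readdir implementations rely on.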

11 Mar, 2009 1 commit

proc: fix kflags to uflags copying in /proc/kpageflags · ad3bdefe

Committed by Wu Fengguang on Mar 11, 2009

Fix kpf_copy_bit(src,dst) to be kpf_copy_bit(dst,src) to match the
actual call patterns, e.g. kpf_copy_bit(kflags, KPF_LOCKED, PG_locked).

This misplacement of src/dst only affected reporting of PG_writeback,
PG_reclaim and PG_buddy. For others kflags==uflags so not affected.
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ad3bdefe
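
A bit-copy helper with the (flags, dst, src) argument order named in this commit could look like the sketch below. It is written as a plain function for illustration, and the bit positions used in the note that follows are made up, not the kernel's KPF_* or PG_* values.

```c
#include <stdint.h>

/*
 * Take bit srcpos of flags and return it moved to bit dstpos.
 * Argument order follows the call pattern quoted in the commit:
 * kpf_copy_bit(kflags, KPF_xxx, PG_xxx).
 */
static uint64_t kpf_copy_bit(uint64_t flags, int dstpos, int srcpos)
{
    return ((flags >> srcpos) & 1) << dstpos;
}
```

When the source and destination positions are equal, swapping the two arguments changes nothing, which matches the observation that only flags whose kernel and user bit numbers differ were misreported.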

25 Feb, 2009 1 commit

proc: fix PG_locked reporting in /proc/kpageflags · e07a4b92

Committed by Helge Bahmann on Feb 20, 2009

Expr always evaluates to zero.

Cc: Matt Mackall <mpm@selenic.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>

e07a4b92

24 Feb, 2009 1 commit

proc: proc_get_inode should de_put when inode already initialized · cac71121

Committed by Krzysztof Sachanowicz on Feb 23, 2009

de_get is called before every proc_get_inode, but corresponding de_put is
called only when dropping last reference to an inode. This might cause
something like
remove_proc_entry: /proc/stats busy, count=14496
to be printed to the syslog.

The fix is to call de_put in case of an already initialized inode in
proc_get_inode.
Signed-off-by: Krzysztof Sachanowicz <analyzer1@gmail.com>
Tested-by: Marcin Pilipczuk <marcin.pilipczuk@gmail.com>
Acked-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

cac71121
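
The reference imbalance and its fix can be modelled with plain counters. This is a toy model under stated assumptions: `pde_model`, `inode_model` and `proc_get_inode_model()` are hypothetical, and the real code uses atomic counts and the inode cache.

```c
struct pde_model {
    int count;                /* references held on the proc entry */
};

struct inode_model {
    int initialized;          /* inode already set up by an earlier call */
    struct pde_model *de;
};

static void de_get(struct pde_model *de) { de->count++; }
static void de_put(struct pde_model *de) { de->count--; }

/* The caller has already done de_get(de) before calling this. */
static void proc_get_inode_model(struct inode_model *inode,
                                 struct pde_model *de)
{
    if (!inode->initialized) {
        inode->de = de;       /* the inode keeps the caller's reference */
        inode->initialized = 1;
    } else {
        de_put(de);           /* the fix: drop the surplus reference */
    }
}
```

Without the else branch, every repeated lookup of the same inode would leave one extra reference behind, which is how counts such as the one in the syslog message can build up.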

09 Jan, 2009 1 commit

vmcore: remove saved_max_pfn check · 921d58c0

Committed by Magnus Damm on Jan 07, 2009

Remove the saved_max_pfn check from the /proc/vmcore function
read_from_oldmem(). No need to verify, we should be able to just trust
that "elfcorehdr=" is correctly passed to the crash kernel on the kernel
command line like we do with other parameters.

The read_from_oldmem() function in fs/proc/vmcore.c is quite similar to
read_from_oldmem() in drivers/char/mem.c, but only in the latter it makes
sense to use saved_max_pfn. For oldmem it is used to determine when to
stop reading. For vmcore we already have the elf header info pointing out
the physical memory regions, no need to pass the end-of-old-memory twice.

Removing the saved_max_pfn check from vmcore makes it possible for
architectures to skip oldmem but still support crash dump through vmcore -
without the need for the old saved_max_pfn cruft.

Architectures that want to play safe can do the saved_max_pfn check in
copy_oldmem_page(). Not sure why anyone would want to do that, but that's
even safer than today - the saved_max_pfn check in vmcore removed by this
patch only checks the first page.
Signed-off-by: Magnus Damm <damm@igel.co.jp>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Simon Horman <horms@verge.net.au>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

921d58c0

08 Jan, 2009 2 commits

NOMMU: Improve procfs output using per-MM VMAs · 38f71479

Committed by David Howells on Jan 08, 2009

Improve procfs output using per-MM VMAs for process memory accounting.
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Mike Frysinger <vapier.adi@gmail.com>
Acked-by: Paul Mundt <lethal@linux-sh.org>

38f71479

NOMMU: Make VMAs per MM as for MMU-mode linux · 8feae131

Committed by David Howells on Jan 08, 2009

Make VMAs per mm_struct as for MMU-mode linux.  This solves two problems:

 (1) In SYSV SHM where nattch for a segment does not reflect the number of
     shmat's (and forks) done.

 (2) In mmap() where the VMA's vm_mm is set to point to the parent mm by an
     exec'ing process when VM_EXECUTABLE is specified, regardless of the fact
     that a VMA might be shared and already have its vm_mm assigned to another
     process or a dead process.

A new struct (vm_region) is introduced to track a mapped region and to remember
the circumstances under which it may be shared and the vm_list_struct structure
is discarded as it's no longer required.

This patch makes the following additional changes:

 (1) Regions are now allocated with alloc_pages() rather than kmalloc() and
     with no recourse to __GFP_COMP, so the pages are not composite.  Instead,
     each page has a reference on it held by the region.  Anything else that is
     interested in such a page will have to get a reference on it to retain it.
     When the pages are released due to unmapping, each page is passed to
     put_page() and will be freed when the page usage count reaches zero.

 (2) Excess pages are trimmed after an allocation as the allocation must be
     made as a power-of-2 quantity of pages.

 (3) VMAs are added to the parent MM's R/B tree and mmap lists.  As an MM may
     end up with overlapping VMAs within the tree, the VMA struct address is
     appended to the sort key.

 (4) Non-anonymous VMAs are now added to the backing inode's prio list.

 (5) Holes may be punched in anonymous VMAs with munmap(), releasing parts of
     the backing region.  The VMA and region structs will be split if
     necessary.

 (6) sys_shmdt() only releases one attachment to a SYSV IPC shared memory
     segment instead of all the attachments at that address.  Multiple
     shmat()'s return the same address under NOMMU-mode instead of different
     virtual addresses as under MMU-mode.

 (7) Core dumping for ELF-FDPIC requires fewer exceptions for NOMMU-mode.

 (8) /proc/maps is now the global list of mapped regions, and may list bits
     that aren't actually mapped anywhere.

 (9) /proc/meminfo gains a line (tagged "MmapCopy") that indicates the amount
     of RAM currently allocated by mmap to hold mappable regions that can't be
     mapped directly.  These are copies of the backing device or file if not
     anonymous.

These changes make NOMMU mode more similar to MMU mode.  The downside is that
NOMMU mode requires some extra memory to track things over NOMMU without this
patch (VMAs are no longer shared, and there are now region structs).
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Mike Frysinger <vapier.adi@gmail.com>
Acked-by: Paul Mundt <lethal@linux-sh.org>

8feae131

07 Jan, 2009 2 commits

mm: report the MMU pagesize in /proc/pid/smaps · 3340289d

Committed by Mel Gorman on Jan 06, 2009

The KernelPageSize entry in /proc/pid/smaps is the pagesize used by the
kernel to back a VMA.  This matches the size used by the MMU in the
majority of cases.  However, one counter-example occurs on PPC64 kernels
whereby a kernel using 64K as a base pagesize may still use 4K pages for
the MMU on older processors.  To distinguish, this patch reports
MMUPageSize as the pagesize used by the MMU in /proc/pid/smaps.
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Cc: "KOSAKI Motohiro" <kosaki.motohiro@jp.fujitsu.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

3340289d

mm: report the pagesize backing a VMA in /proc/pid/smaps · 08fba699

Committed by Mel Gorman on Jan 06, 2009

It is useful to verify a hugepage-aware application is using the expected
pagesizes for its memory regions. This patch creates an entry called
KernelPageSize in /proc/pid/smaps that is the size of page used by the
kernel to back a VMA. The entry is not called PageSize as it is possible
the MMU uses a different size. This extension should not break any sensible
parser that skips lines containing unrecognised information.
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: "KOSAKI Motohiro" <kosaki.motohiro@jp.fujitsu.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

08fba699
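
A parser of the "sensible" kind mentioned above matches only the keys it knows and ignores every other line, so new fields such as KernelPageSize or MMUPageSize do not disturb it. The helper below is a sketch: `smaps_field_kb()` is a hypothetical name, and the "Key: value kB" line layout is assumed from the usual smaps format rather than stated in the commit.

```c
#include <stdio.h>
#include <string.h>

/*
 * Returns 1 and stores the value (in kB) if line has the form
 * "<key>: <number> kB" for the requested key. Returns 0 for any
 * other line, so unrecognised fields are simply skipped.
 */
static int smaps_field_kb(const char *line, const char *key,
                          unsigned long *kb)
{
    size_t klen = strlen(key);

    if (strncmp(line, key, klen) != 0 || line[klen] != ':')
        return 0;
    return sscanf(line + klen + 1, "%lu", kb) == 1;
}
```

A caller would loop over the lines of /proc/pid/smaps and call this once per key of interest, leaving all other lines untouched.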

06 Jan, 2009 2 commits

trivial: fix then -> than typos in comments and documentation · 025dfdaf

Committed by Frederik Schwarzer on Oct 16, 2008

- (better, more, bigger ...) then -> (...) than
Signed-off-by: Frederik Schwarzer <schwarzerf@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>

025dfdaf

zero i_uid/i_gid on inode allocation · 56ff5efa

Committed by Al Viro on Dec 09, 2008

... and don't bother in callers.  Don't bother with zeroing i_blocks,
while we are at it - it's already been zeroed.

i_mode is not worth the effort; it has no common default value.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

56ff5efa

05 Jan, 2009 6 commits

proc: remove write-only variable in proc_pident_lookup() · 230e40fb

Committed by WANG Cong on Dec 30, 2008

Signed-off-by: WANG Cong <wangcong@zeuux.org>
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>

230e40fb

proc: fix sparse warning · dfe6b7d9

Committed by Hannes Eder on Dec 30, 2008

fs/proc/base.c:312:4: warning: do-while statement is not a compound statement
Signed-off-by: Hannes Eder <hannes@hanneseder.net>
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>

dfe6b7d9

proc: add /proc/*/stack · 2ec220e2

Committed by Ken Chen on Nov 10, 2008

/proc/*/stack adds the ability to query a task's stack trace. It is more
useful than /proc/*/wchan as it provides a full stack trace instead of a
single depth. Example output:

	$ cat /proc/self/stack
	[<c010a271>] save_stack_trace_tsk+0x17/0x35
	[<c01827b4>] proc_pid_stack+0x4a/0x76
	[<c018312d>] proc_single_show+0x4a/0x5e
	[<c016bdec>] seq_read+0xf3/0x29f
	[<c015a004>] vfs_read+0x6d/0x91
	[<c015a0c1>] sys_read+0x3b/0x60
	[<c0102eda>] syscall_call+0x7/0xb
	[<ffffffff>] 0xffffffff

[add save_stack_trace_tsk() on mips, ACK Ralf --adobriyan]
Signed-off-by: Ken Chen <kenchen@google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>

2ec220e2

proc: remove '##' usage · 631f9c18

Committed by Alexey Dobriyan on Nov 10, 2008

Inability to jump to /proc/*/foo handlers with ctags is annoying.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>

631f9c18

proc: remove useless WARN_ONs · ecae934e

Committed by Alexey Dobriyan on Nov 09, 2008

NULL "struct inode *" means VFS passed NULL inode to ->open.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>

ecae934e

proc: stop using BKL · b4df2b92

Committed by Alexey Dobriyan on Oct 27, 2008

There are four BKL users in proc: de_put(), proc_lookup_de(),
proc_readdir_de() and proc_root_readdir().

1) de_put()
-----------
de_put() is a classic atomic_dec_and_test() refcount wrapper -- no BKL
needed. BKL doesn't matter to a possible refcount leak either.

2) proc_lookup_de()
-------------------
Walking the PDE list is protected by proc_subdir_lock, proc_get_inode() is
potentially blocking, and all callers of proc_lookup_de() eventually come
from ->lookup hooks, which are protected by the directory's ->i_mutex -- BKL
doesn't protect anything.

3) proc_readdir_de()
--------------------
The "." and ".." part doesn't need BKL, walking the PDE list is under
proc_subdir_lock, and calling the filldir callback is potentially blocking
because it writes to luserspace. All proc_readdir_de() callers
eventually come from the ->readdir hook, which is under the directory's
->i_mutex -- BKL doesn't protect anything.

4) proc_root_readdir()
----------------------
proc_root_readdir() is a ->readdir hook, see (3).

Since readdir hooks don't use BKL anymore, switch to
generic_file_llseek, since it also takes the directory's i_mutex.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>

b4df2b92
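
Point (1) rests on the fact that a decrement-and-test refcount is already safe without an outer lock. A userspace analogue using C11 atomics is sketched below; `pde_model` and `de_put_model()` are hypothetical, and the `freed` flag merely stands in for releasing the entry.

```c
#include <stdatomic.h>

struct pde_model {
    atomic_int count;   /* reference count on the proc entry */
    int freed;          /* stand-in for actually freeing the entry */
};

/*
 * Drop one reference. atomic_fetch_sub() returns the value held
 * before the subtraction, so a result of 1 means this caller dropped
 * the last reference and is the only one that may free the entry.
 */
static void de_put_model(struct pde_model *de)
{
    if (atomic_fetch_sub(&de->count, 1) == 1)
        de->freed = 1;
}
```

Because the decrement and the test happen as one atomic step, two racing callers cannot both observe the count reaching zero.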

26 Dec, 2008 1 commit

proc: remove ifdef CONFIG_SPARSE_IRQ from stat.c · 26ddd8d5

Committed by KOSAKI Motohiro on Dec 26, 2008

Impact: cleanup

irq_desc can be NULL only when CONFIG_SPARSE_IRQ=y.
Therefore, the NULL check can move into the SPARSE_IRQ version of
kstat_irqs_cpu().
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: "Yinghai Lu" <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

26ddd8d5

22 Dec, 2008 1 commit

sched: fix warning in fs/proc/base.c · 826e08b0

Committed by Ingo Molnar on Dec 22, 2008

Stephen Rothwell reported this new (harmless) build warning on platforms that
define u64 to long:

 fs/proc/base.c: In function 'proc_pid_schedstat':
 fs/proc/base.c:352: warning: format '%llu' expects type 'long long unsigned int', but argument 3 has type 'u64'

asm-generic/int-l64.h platforms strike again: that file should be eliminated.

Fix it by casting the parameters to long long.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

826e08b0
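
The portable pattern behind this kind of fix is to cast a fixed-width 64-bit value to the exact type the format specifier expects. The sketch below uses userspace types; `format_schedstat()` and its three fields are hypothetical and only loosely modelled on a schedstat-style line.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * uint64_t may be "unsigned long" or "unsigned long long" depending
 * on the platform, so cast explicitly to match "%llu" everywhere.
 */
static int format_schedstat(char *buf, size_t len,
                            uint64_t run_ns, uint64_t wait_ns,
                            unsigned long slices)
{
    return snprintf(buf, len, "%llu %llu %lu\n",
                    (unsigned long long)run_ns,
                    (unsigned long long)wait_ns,
                    slices);
}
```

The cast costs nothing at run time on platforms where both types are 64 bits wide; it only makes the argument type agree with the format string.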