Commits · 13f0feafa6b8aead57a2a328e2fca6a5828bf286 · openeuler / raspberrypi-kernel

10 Aug, 2009 1 commit

mm_for_maps: simplify, use ptrace_may_access() · 13f0feaf

Committed by Oleg Nesterov on Jun 23, 2009

It would be nice to kill __ptrace_may_access(). It requires task_lock(),
but this lock is only needed to read mm->flags in the middle.

Convert mm_for_maps() to use ptrace_may_access(), this also simplifies
the code a little bit.

Also, we do not need to take ->mmap_sem in advance. In fact I think
mm_for_maps() should not play with ->mmap_sem at all, the caller should
take this lock.

With or without this patch, without ->cred_guard_mutex held we can race
with exec() and get the new ->mm but check old creds.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>

13f0feaf

17 Jun, 2009 1 commit

oom: move oom_adj value from task_struct to mm_struct · 2ff05b2b

Committed by David Rientjes on Jun 16, 2009

The per-task oom_adj value is a characteristic of its mm more than the
task itself since it's not possible to oom kill any thread that shares the
mm.  If a task were to be killed while attached to an mm that could not be
freed because another thread were set to OOM_DISABLE, it would have
needlessly been terminated since there is no potential for future memory
freeing.

This patch moves oomkilladj (now more appropriately named oom_adj) from
struct task_struct to struct mm_struct.  This requires task_lock() on a
task to check its oom_adj value to protect against exec, but it's already
necessary to take the lock when dereferencing the mm to find the total VM
size for the badness heuristic.

This fixes a livelock if the oom killer chooses a task and another thread
sharing the same memory has an oom_adj value of OOM_DISABLE.  This occurs
because oom_kill_task() repeatedly returns 1 and refuses to kill the
chosen task while select_bad_process() will repeatedly choose the same
task during the next retry.

Taking task_lock() in select_bad_process() to check for OOM_DISABLE and in
oom_kill_task() to check for threads sharing the same memory will be
removed in the next patch in this series where it will no longer be
necessary.

Writing to /proc/pid/oom_adj for a kthread will now return -EINVAL since
these threads are immune from oom killing already.  They simply report an
oom_adj value of OOM_DISABLE.

Cc: Nick Piggin <npiggin@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

2ff05b2b

29 May, 2009 1 commit

procfs: make errno values consistent when open pident vs exit(2) race occurs · bd6daba9

Committed by KOSAKI Motohiro on May 28, 2009

proc_pident_instantiate() has the following call flow:

proc_pident_lookup()
  proc_pident_instantiate()
    proc_pid_make_inode()

And proc_pident_lookup() has the following error handling:

	const struct pid_entry *p, *last;
	error = ERR_PTR(-ENOENT);
	if (!task)
		goto out_no_task;

proc_pident_instantiate() should therefore also return ENOENT when it
races against exit(2).

EINVAL is a bad choice for two reasons:
  - it implies the caller is wrong, but the race isn't the caller's mistake.
  - man 2 open doesn't explain EINVAL, so users often don't handle it.

Note: Other proc_pid_make_inode() callers already use ENOENT properly.

Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

bd6daba9
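
For userspace readers, the practical effect of the change above is that a vanished task shows up as ENOENT. The following is a minimal sketch of a caller that treats ENOENT as "the task is gone" rather than as its own mistake. The helper name `demo_open_proc_entry` is hypothetical and is not part of the patch.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>

/*
 * Hypothetical helper: open a /proc/<pid>/... path read-only.
 * On failure, *task_gone is set when errno is ENOENT, which the
 * caller can treat as "the task has exited", not as a usage error.
 */
static int demo_open_proc_entry(const char *path, bool *task_gone)
{
    int fd = open(path, O_RDONLY);

    *task_gone = (fd < 0 && errno == ENOENT);
    return fd;
}
```

A caller would typically skip the entry silently when `*task_gone` is true and report any other errno.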

11 May, 2009 1 commit

CRED: Guard the setprocattr security hook against ptrace · 107db7c7

Committed by David Howells on May 08, 2009

Guard the setprocattr security hook against ptrace by taking the target task's
cred_guard_mutex around it. The problem is that setprocattr() may otherwise
note the lack of a debugger, and then perform an action on that basis whilst
letting a debugger attach between the two points. Holding cred_guard_mutex
across the test and the action prevents ptrace_attach() from doing that.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>

107db7c7

05 May, 2009 1 commit

proc: avoid information leaks to non-privileged processes · f83ce3e6

Committed by Jake Edge on May 04, 2009

By using the same test as is used for /proc/pid/maps and /proc/pid/smaps,
only allow processes that can ptrace() a given process to see information
that might be used to bypass address space layout randomization (ASLR).
These include eip, esp, wchan, and start_stack in /proc/pid/stat as well
as the non-symbolic output from /proc/pid/wchan.

ASLR can be bypassed by sampling eip as shown by the proof-of-concept
code at http://code.google.com/p/fuzzyaslr/. As part of a presentation
(http://www.cr0.org/paper/to-jt-linux-alsr-leak.pdf) esp and wchan were
also noted as possibly usable information leaks.  The
start_stack address also leaks potentially useful information.

Cc: Stable Team <stable@kernel.org>
Signed-off-by: Jake Edge <jake@lwn.net>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

f83ce3e6

17 Apr, 2009 1 commit

proc: mounts_poll() make consistent to mdstat_poll · 31b07093

Committed by KOSAKI Motohiro on Apr 09, 2009

In the recent sysfs_poll discussion, Neil Brown pointed out that
/proc/mounts should also be fixed.

SUSv3 says "Regular files shall always poll TRUE for reading and
writing".  See
http://www.opengroup.org/onlinepubs/009695399/functions/poll.html

So mounts_poll()'s default should be "POLLIN | POLLRDNORM", which means
always readable.

In addition, the event trigger should use "POLLERR | POLLPRI" instead of
POLLERR.  That makes it consistent with mdstat_poll() and sysfs_poll(),
and select(2) can handle POLLPRI easily.

Reported-by: Neil Brown <neilb@suse.de>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Ram Pai <linuxram@us.ibm.com>
Cc: Miklos Szeredi <mszeredi@suse.cz>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

31b07093

01 Apr, 2009 1 commit

Get rid of indirect include of fs_struct.h · 5ad4e53b

Committed by Al Viro on Mar 29, 2009

Don't pull it in sched.h; very few files actually need it and those
can include directly.  sched.h itself only needs forward declaration
of struct fs_struct.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

5ad4e53b

29 Mar, 2009 1 commit

fix setuid sometimes wouldn't · 7c2c7d99

Committed by Hugh Dickins on Mar 28, 2009

check_unsafe_exec() also notes whether the fs_struct is being
shared by more threads than will get killed by the exec, and if so
sets LSM_UNSAFE_SHARE to make bprm_set_creds() careful about euid.
But /proc/<pid>/cwd and /proc/<pid>/root lookups make transient
use of get_fs_struct(), which also raises that sharing count.

This might occasionally cause a setuid program not to change euid,
in the same way as happened with files->count (check_unsafe_exec
also looks at sighand->count, but /proc doesn't raise that one).

We'd prefer exec not to unshare fs_struct: so fix this in procfs,
replacing get_fs_struct() by get_fs_path(), which does path_get
while still holding task_lock, instead of raising fs->count.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Cc: stable@kernel.org
---

 fs/proc/base.c |   50 +++++++++++++++--------------------------------
 1 file changed, 16 insertions(+), 34 deletions(-)
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

7c2c7d99

28 Mar, 2009 1 commit

constify dentry_operations: procfs · d72f71eb

Committed by Al Viro on Feb 20, 2009

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

d72f71eb

18 Mar, 2009 1 commit

Avoid 64-bit "switch()" statements on 32-bit architectures · ee568b25

Committed by Linus Torvalds on Mar 17, 2009

Commit ee6f779b ("filp->f_pos not
correctly updated in proc_task_readdir") changed the proc code to use
filp->f_pos directly, rather than through a temporary variable.  In the
process, that caused the operations to be done on the full 64 bits, even
though the offset is never that big.

That's all fine and dandy per se, but for some unfathomable reason gcc
generates absolutely horrid code when using 64-bit values in switch()
statements.  To the point of actually calling out to gcc helper
functions like __cmpdi2 rather than just doing the trivial comparisons
directly the way gcc does for normal compares.  At which point we get
link failures, because we really don't want to support that kind of
crazy code.

Fix this by just casting the f_pos value to "unsigned long", which
is plenty big enough for /proc, and avoids the gcc code generation issue.

Reported-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Zhang Le <r0bertz@gentoo.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ee568b25
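
The fix described above amounts to narrowing the 64-bit offset before it reaches the switch(). A minimal userspace sketch, with a hypothetical function name and return strings chosen only for illustration:

```c
#include <stdint.h>

/*
 * Sketch of the pattern: the offset is stored as a 64-bit value, but
 * it is cast to unsigned long before the switch so that a 32-bit
 * build compares native-width values. This assumes the offset always
 * fits in unsigned long, as the commit states is the case for /proc.
 */
static const char *demo_entry_kind(int64_t f_pos)
{
    unsigned long pos = (unsigned long)f_pos;

    switch (pos) {
    case 0:
        return "dot";
    case 1:
        return "dotdot";
    default:
        return "task";
    }
}
```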

16 Mar, 2009 1 commit

filp->f_pos not correctly updated in proc_task_readdir · ee6f779b

Committed by Zhang Le on Mar 16, 2009

filp->f_pos only gets updated at the end of the function. Thus the d_off of
the dirents in the middle will be 0, which causes a problem in glibc's
readdir implementation, specifically an endless loop. When overflow
occurs, f_pos will be set to the next dirent to read; however, it will be 0
unless the next one is the last one. So it will start over again and again.

There is a sample program in man 2 getdents. This is the output of the program
run on a multithreaded program's task dir before this patch is applied:

  $ ./a.out /proc/3807/task
  --------------- nread=128 ---------------
  i-node#  file type  d_reclen  d_off   d_name
    506442  directory    16          1  .
    506441  directory    16          0  ..
    506443  directory    16          0  3807
    506444  directory    16          0  3809
    506445  directory    16          0  3812
    506446  directory    16          0  3861
    506447  directory    16          0  3862
    506448  directory    16          8  3863

This is the output after this patch is applied

  $ ./a.out /proc/3807/task
  --------------- nread=128 ---------------
  i-node#  file type  d_reclen  d_off   d_name
    506442  directory    16          1  .
    506441  directory    16          2  ..
    506443  directory    16          3  3807
    506444  directory    16          4  3809
    506445  directory    16          5  3812
    506446  directory    16          6  3861
    506447  directory    16          7  3862
    506448  directory    16          8  3863
Signed-off-by: Zhang Le <r0bertz@gentoo.org>
Acked-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ee6f779b
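
The behaviour after the patch above, where every emitted entry carries a usable d_off, can be modelled in userspace as follows. This is a sketch under simplifying assumptions: `struct demo_dirent` and `demo_readdir` are hypothetical, and the directory is just an array of names.

```c
#include <stddef.h>

/* Hypothetical, simplified directory entry: only the fields discussed above. */
struct demo_dirent {
    long d_off;          /* position of the entry that follows this one */
    const char *d_name;
};

/*
 * Fill up to cap entries starting at *f_pos. The position is advanced
 * for every entry, not once at the end, so each d_off is valid and a
 * reader can resume from any entry after a short read.
 */
static size_t demo_readdir(const char *const *names, size_t n_names,
                           long *f_pos, struct demo_dirent *out, size_t cap)
{
    size_t filled = 0;

    while ((size_t)*f_pos < n_names && filled < cap) {
        out[filled].d_name = names[*f_pos];
        (*f_pos)++;
        out[filled].d_off = *f_pos;
        filled++;
    }
    return filled;
}
```

With a buffer smaller than the directory, a second call continues from the saved position instead of starting over at 0.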

06 Jan, 2009 1 commit

zero i_uid/i_gid on inode allocation · 56ff5efa

Committed by Al Viro on Dec 09, 2008

... and don't bother in callers.  Don't bother with zeroing i_blocks,
while we are at it - it's already been zeroed.

i_mode is not worth the effort; it has no common default value.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

56ff5efa

05 Jan, 2009 5 commits

proc: remove write-only variable in proc_pident_lookup() · 230e40fb

Committed by WANG Cong on Dec 30, 2008

Signed-off-by: WANG Cong <wangcong@zeuux.org>
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>

230e40fb

proc: fix sparse warning · dfe6b7d9

Committed by Hannes Eder on Dec 30, 2008

fs/proc/base.c:312:4: warning: do-while statement is not a compound statement

Signed-off-by: Hannes Eder <hannes@hanneseder.net>
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>

dfe6b7d9

proc: add /proc/*/stack · 2ec220e2

Committed by Ken Chen on Nov 10, 2008

/proc/*/stack adds the ability to query a task's stack trace. It is more
useful than /proc/*/wchan as it provides full stack trace instead of single
depth. Example output:

	$ cat /proc/self/stack
	[<c010a271>] save_stack_trace_tsk+0x17/0x35
	[<c01827b4>] proc_pid_stack+0x4a/0x76
	[<c018312d>] proc_single_show+0x4a/0x5e
	[<c016bdec>] seq_read+0xf3/0x29f
	[<c015a004>] vfs_read+0x6d/0x91
	[<c015a0c1>] sys_read+0x3b/0x60
	[<c0102eda>] syscall_call+0x7/0xb
	[<ffffffff>] 0xffffffff

[add save_stack_trace_tsk() on mips, ACK Ralf --adobriyan]
Signed-off-by: Ken Chen <kenchen@google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>

2ec220e2

proc: remove '##' usage · 631f9c18

Committed by Alexey Dobriyan on Nov 10, 2008

Inability to jump to /proc/*/foo handlers with ctags is annoying.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>

631f9c18

proc: remove useless WARN_ONs · ecae934e

Committed by Alexey Dobriyan on Nov 09, 2008

NULL "struct inode *" means VFS passed NULL inode to ->open.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>

ecae934e

22 Dec, 2008 1 commit

sched: fix warning in fs/proc/base.c · 826e08b0

Committed by Ingo Molnar on Dec 22, 2008

Stephen Rothwell reported this new (harmless) build warning on platforms that
define u64 to long:

 fs/proc/base.c: In function 'proc_pid_schedstat':
 fs/proc/base.c:352: warning: format '%llu' expects type 'long long unsigned int', but argument 3 has type 'u64'

asm-generic/int-l64.h platforms strike again: that file should be eliminated.

Fix it by casting the parameters to long long.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

826e08b0
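
The cast described in the commit above is the usual portable way to print a 64-bit value with '%llu' when the underlying type may be either `unsigned long` or `unsigned long long`. A minimal userspace sketch, using `uint64_t` in place of the kernel's u64 and a hypothetical function name:

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Format a 64-bit counter. The explicit cast makes the argument type
 * match '%llu' on every platform, whether uint64_t is defined as
 * unsigned long or as unsigned long long.
 */
static int demo_format_u64(char *buf, size_t len, uint64_t value)
{
    return snprintf(buf, len, "%llu", (unsigned long long)value);
}
```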

18 Dec, 2008 1 commit

schedstat: consolidate per-task cpu runtime stats · 9c2c4802

Committed by Ken Chen on Dec 16, 2008

Impact: simplify code

When we turn on CONFIG_SCHEDSTATS, per-task cpu runtime is accumulated
twice. Once in task->se.sum_exec_runtime and once in sched_info.cpu_time.
These two stats are exactly the same.

Given that task->se.sum_exec_runtime is always accumulated by the core
scheduler, sched_info can reuse that data instead of duplicating the accounting.

Signed-off-by: Ken Chen <kenchen@google.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

9c2c4802

11 Dec, 2008 1 commit

KSYM_SYMBOL_LEN fixes · 9c246247

Committed by Hugh Dickins on Dec 09, 2008

Miles Lane tailing /sys files hit a BUG which Pekka Enberg has tracked
to my commit 966c8c12 ("sprint_symbol(): use less stack") exposing a bug
in slub's list_locations() - kallsyms_lookup() writes a 0 to
namebuf[KSYM_NAME_LEN-1], but that was beyond the end of page provided.

The 100 slop which list_locations() allows at end of page looks roughly
enough for all the other stuff it might print after the symbol before
it checks again: break out KSYM_SYMBOL_LEN earlier than before.

Latencytop and ftrace are using KSYM_NAME_LEN buffers where they
need KSYM_SYMBOL_LEN buffers, and vmallocinfo a 2*KSYM_NAME_LEN buffer
where it wants a KSYM_SYMBOL_LEN buffer: fix those before anyone copies
them.

[akpm@linux-foundation.org: ftrace.h needs module.h]
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Miles Lane <miles.lane@gmail.com>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Acked-by: Steven Rostedt <srostedt@redhat.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9c246247

14 Nov, 2008 2 commits

CRED: Use RCU to access another task's creds and to release a task's own creds · c69e8d9c

Committed by David Howells on Nov 14, 2008

Use RCU to access another task's creds and to release a task's own creds.
This means that it will be possible for the credentials of a task to be
replaced without another task (a) requiring a full lock to read them, and (b)
seeing deallocated memory.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>

c69e8d9c

CRED: Separate task security context from task_struct · b6dff3ec

Committed by David Howells on Nov 14, 2008

Separate the task security context from task_struct.  At this point, the
security data is temporarily embedded in the task_struct with two pointers
pointing to it.

Note that the Alpha arch is altered as it refers to (E)UID and (E)GID in
entry.S via asm-offsets.

With comment fixes Signed-off-by: Marc Dionne <marc.c.dionne@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>

b6dff3ec

21 Oct, 2008 1 commit

[PATCH] introduce fmode_t, do annotations · aeb5d727

Committed by Al Viro on Sep 02, 2008

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

aeb5d727

10 Oct, 2008 3 commits

proc: remove kernel.maps_protect · 3bbfe059

Committed by Alexey Dobriyan on Oct 10, 2008

After commit 831830b5, aka
"restrict reading from /proc/<pid>/maps to those who share ->mm or can ptrace",
the sysctl stopped being relevant because that commit moved the security
checks from ->show time to ->start time (mm_for_maps()).

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Kees Cook <kees.cook@canonical.com>

3bbfe059

[PATCH] proc: show personality via /proc/pid/personality · 47830723

Committed by Kees Cook on Oct 06, 2008

Make process personality flags visible in /proc.  Since a process's
personality is potentially sensitive (e.g. READ_IMPLIES_EXEC), make this
file only readable by the process owner.

Signed-off-by: Kees Cook <kees.cook@canonical.com>
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>

47830723

[PATCH] signal, procfs: some lock_task_sighand() users do not need rcu_read_lock() · a6bebbc8

Committed by Lai Jiangshan on Oct 05, 2008

lock_task_sighand() makes sure task->sighand is protected,
so we do not need rcu_read_lock().
[ exec() will get task->sighand->siglock before changing task->sighand! ]

Code using rcu_read_lock() _just_ to protect lock_task_sighand()
only appears in procfs (and some code in procfs uses lock_task_sighand()
without such redundant protection).

Other subsystems may put lock_task_sighand() inside an rcu_read_lock()
critical region, but there rcu_read_lock() is used to protect
"for_each_process()", "find_task_by_vpid()" etc., not to protect
lock_task_sighand().

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
[ok from Oleg]
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>

a6bebbc8

06 Aug, 2008 1 commit

proc: fix warnings · 7c44319d

Committed by Alexander Beregalov on Aug 05, 2008

proc: fix warnings

fs/proc/base.c:2429: warning: format '%llu' expects type 'long long unsigned int', but argument 3 has type 'u64'
fs/proc/base.c:2429: warning: format '%llu' expects type 'long long unsigned int', but argument 4 has type 'u64'
fs/proc/base.c:2429: warning: format '%llu' expects type 'long long unsigned int', but argument 5 has type 'u64'
fs/proc/base.c:2429: warning: format '%llu' expects type 'long long unsigned int', but argument 6 has type 'u64'
fs/proc/base.c:2429: warning: format '%llu' expects type 'long long unsigned int', but argument 7 has type 'u64'
fs/proc/base.c:2429: warning: format '%llu' expects type 'long long unsigned int', but argument 8 has type 'u64'
fs/proc/base.c:2429: warning: format '%llu' expects type 'long long unsigned int', but argument 9 has type 'u64'

Signed-off-by: Alexander Beregalov <a.beregalov@gmail.com>
Acked-by: Andrea Righi <righi.andrea@gmail.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

7c44319d

28 Jul, 2008 2 commits

task IO accounting: move all IO statistics in struct task_io_accounting · 940389b8

Committed by Andrea Righi on Jul 28, 2008

Simplify the code of include/linux/task_io_accounting.h.

It is also more reasonable to have all the task i/o-related statistics in a
single struct (task_io_accounting).

Signed-off-by: Andrea Righi <righi.andrea@gmail.com>
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

940389b8

task IO accounting: improve code readability · 5995477a

Committed by Andrea Righi on Jul 27, 2008

Put all i/o statistics in struct proc_io_accounting and use inline functions to
initialize and increment statistics, removing a lot of single variable
assignments.

This also reduces the kernel size as follows (with CONFIG_TASK_XACCT=y and
CONFIG_TASK_IO_ACCOUNTING=y).

    text    data     bss     dec     hex filename
   11651       0       0   11651    2d83 kernel/exit.o.before
   11619       0       0   11619    2d63 kernel/exit.o.after
   10886     132     136   11154    2b92 kernel/fork.o.before
   10758     132     136   11026    2b12 kernel/fork.o.after

 3082029  807968 4818600 8708597  84e1f5 vmlinux.o.before
 3081869  807968 4818600 8708437  84e155 vmlinux.o.after
Signed-off-by: Andrea Righi <righi.andrea@gmail.com>
Acked-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

5995477a

27 Jul, 2008 4 commits

task IO accounting: correctly account threads IO statistics · b2d002db

Committed by Andrea Righi on Jul 26, 2008

Oleg Nesterov points out that we should check that the task is still alive
before we iterate over the threads.  This patch includes a fixup for this.

Also simplify do_io_accounting() implementation.

Signed-off-by: Andrea Righi <righi.andrea@gmail.com>
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

b2d002db

[PATCH] sanitize ->permission() prototype · e6305c43

Committed by Al Viro on Jul 15, 2008

* kill nameidata * argument; map the 3 bits in ->flags anybody cares
  about to new MAY_... ones and pass with the mask.
* kill redundant gfs2_iop_permission()
* sanitize ecryptfs_permission()
* fix remaining places where ->permission() instances might barf on new
  MAY_... found in mask.

The obvious next target in that direction is permission(9)

folded fix for nfs_permission() breakage from Miklos Szeredi <mszeredi@suse.cz>

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

e6305c43

/proc/PID/syscall · ebcb6734

Committed by Roland McGrath on Jul 25, 2008

This adds /proc/PID/syscall and /proc/PID/task/TID/syscall magic files.
These use task_current_syscall() to show the task's current system call
number and argument registers, stack pointer and PC.  For a task blocked
but not in a syscall, the file shows "-1" in place of the syscall number,
followed by only the SP and PC.  For a task that's not blocked, it shows
"running".

Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ebcb6734
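
The three output shapes described in the commit above ("running", "-1" plus SP and PC, or a syscall number plus arguments) can be told apart with a small parser. This is a sketch that relies only on what the commit message states: the first token is either the word "running" or a decimal number. The enum and function names are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

enum demo_syscall_state {
    DEMO_RUNNING,                 /* task is not blocked */
    DEMO_BLOCKED_OUTSIDE_SYSCALL, /* first field is -1 */
    DEMO_IN_SYSCALL,              /* first field is the syscall number */
    DEMO_UNPARSEABLE
};

/*
 * Classify one line read from /proc/PID/syscall. On the two numeric
 * outcomes, *nr receives the first field (the syscall number, or -1).
 */
static enum demo_syscall_state demo_parse_syscall_line(const char *line,
                                                       long *nr)
{
    char *end;
    long val;

    if (strncmp(line, "running", 7) == 0)
        return DEMO_RUNNING;

    val = strtol(line, &end, 10);
    if (end == line)
        return DEMO_UNPARSEABLE;

    *nr = val;
    return val == -1 ? DEMO_BLOCKED_OUTSIDE_SYSCALL : DEMO_IN_SYSCALL;
}
```

The remaining fields (arguments, SP, PC) are left unparsed here because the commit message does not specify their exact formatting.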

tracehook: tracehook_tracer_task · 0d094efe

Committed by Roland McGrath on Jul 25, 2008

This adds the tracehook_tracer_task() hook to consolidate all forms of
"Who is using ptrace on me?" logic.  This is used for "TracerPid:" in
/proc and for permission checks.  We also clean up the selinux code that
called an identical accessor.

Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

0d094efe

26 Jul, 2008 1 commit

task IO accounting: provide distinct tgid/tid I/O statistics · 297c5d92

Committed by Andrea Righi on Jul 25, 2008

Report per-thread I/O statistics in /proc/pid/task/tid/io and aggregate
parent I/O statistics in /proc/pid/io.  This approach follows the same
model used to account per-process and per-thread CPU times.

As a practical application, this makes it possible, for example, to quickly find the top
I/O consumer when a process spawns many child threads that perform the
actual I/O work, because the aggregated I/O statistics can always be found
in /proc/pid/io.

[ Oleg Nesterov points out that we should check that the task is still
  alive before we iterate over the threads, but also says that we can do
  that fixup on top of this later.  - Linus ]
Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Signed-off-by: Andrea Righi <righi.andrea@gmail.com>
Cc: Matt Heaton <matt@hostmonster.com>
Cc: Shailabh Nagar <nagar@watson.ibm.com>
Acked-by-with-comments: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

297c5d92
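
The split between per-thread and aggregated statistics described above can be sketched as a simple sum over the threads of a process. This is a simplified userspace model, not the kernel code: `struct demo_io` and `demo_group_io` are hypothetical, only two counters are shown, and the kernel's accounting may include more than this.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-thread counters, as a thread's own io file would report. */
struct demo_io {
    uint64_t read_bytes;
    uint64_t write_bytes;
};

/*
 * Aggregate the per-thread counters into one process-wide total,
 * modelling what the /proc/pid/io view reports in this scheme.
 */
static struct demo_io demo_group_io(const struct demo_io *threads, size_t n)
{
    struct demo_io total = { 0, 0 };
    size_t i;

    for (i = 0; i < n; i++) {
        total.read_bytes += threads[i].read_bytes;
        total.write_bytes += threads[i].write_bytes;
    }
    return total;
}
```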

14 Jul, 2008 1 commit

Security: split proc ptrace checking into read vs. attach · 006ebb40

Committed by Stephen Smalley on May 19, 2008

Enable security modules to distinguish reading of process state via
proc from full ptrace access by renaming ptrace_may_attach to
ptrace_may_access and adding a mode argument indicating whether only
read access or full attach access is requested. This allows security
modules to permit access to reading process state without granting
full ptrace access. The base DAC/capability checking remains unchanged.

Read access to /proc/pid/mem continues to apply a full ptrace attach
check since check_mem_permission() already requires the current task
to be ptracing the target. The other ptrace checks within
proc for elements like environ, maps, and fds are changed to pass the
read mode instead of attach.

In the SELinux case, we model such reading of process state as a
reading of a proc file labeled with the target process' label. This
enables SELinux policy to permit such reading of process state without
permitting control or manipulation of the target process, as there are
a number of cases where programs probe for such information via proc
but do not need to be able to control the target (e.g. procps,
lsof, PolicyKit, ConsoleKit). At present we have to choose between
allowing full ptrace in policy (more permissive than required/desired)
or breaking functionality (or in some cases just silencing the denials
via dontaudit rules but this can hide genuine attacks).

This version of the patch incorporates comments from Casey Schaufler
(change/replace existing ptrace_may_attach interface, pass access
mode), and Chris Wright (provide greater consistency in the checking).

Note that like their predecessors __ptrace_may_attach and
ptrace_may_attach, the __ptrace_may_access and ptrace_may_access
interfaces use different return value conventions from each other (0
or -errno vs. 1 or 0). I retained this difference to avoid any
changes to the caller logic but made the difference clearer by
changing the latter interface to return a bool rather than an int and
by adding a comment about it to ptrace.h for any future callers.

Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Acked-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: James Morris <jmorris@namei.org>

006ebb40
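
The two return conventions mentioned at the end of the commit message above (0 or -errno for the low-level check, true or false for the wrapper) can be illustrated with a small userspace sketch. Both functions are hypothetical stand-ins, and the single `same_owner` input replaces the real DAC, capability and security-module checks.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Stand-in for the low-level check: returns 0 when access is allowed
 * and a negative errno value when it is denied.
 */
static int demo_may_access_errno(bool same_owner)
{
    return same_owner ? 0 : -EPERM;
}

/*
 * Stand-in for the wrapper: converts "0 or -errno" into a bool so the
 * two conventions cannot be confused at the call site.
 */
static bool demo_may_access(bool same_owner)
{
    return demo_may_access_errno(same_owner) == 0;
}
```

Returning bool from the wrapper makes a mistaken `if (!demo_may_access(...))` versus `if (demo_may_access_errno(...))` mix-up easier to spot in review.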

07 Jun, 2008 1 commit

proc: calculate the correct /proc/<pid> link count · aed54175

Committed by Vegard Nossum on Jun 05, 2008

This patch:

  commit e9720acd
  Author: Pavel Emelyanov <xemul@openvz.org>
  Date:   Fri Mar 7 11:08:40 2008 -0800

    [NET]: Make /proc/net a symlink on /proc/self/net (v3)

introduced a /proc/self/net directory without bumping the corresponding
link count for /proc/self.

This patch replaces the static link count initializations with a call that
counts the number of directory entries in the given pid_entry table
whenever it is instantiated, and thus relieves the burden of manually
keeping the two in sync.

[akpm@linux-foundation.org: cleanup]
Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

aed54175

17 May, 2008 1 commit

[PATCH] open sessionid permissions · 6ee65046

Committed by Steve Grubb on Apr 29, 2008

The current permissions on sessionid are a little too restrictive.

Signed-off-by: Steve Grubb <sgrubb@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

6ee65046

02 May, 2008 1 commit

[PATCH] split linux/file.h · 9f3acc31

Committed by Al Viro on Apr 24, 2008

Initial splitoff of the low-level stuff; taken to fdtable.h

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

9f3acc31

29 Apr, 2008 2 commits

procfs: mem permission cleanup · 638fa202

Committed by Roland McGrath on Apr 29, 2008

This cleans up the permission checks done for /proc/PID/mem i/o calls.  It
puts all the logic in a new function, check_mem_permission().

The old code repeated the (!MAY_PTRACE(task) || !ptrace_may_attach(task))
magical expression multiple times.  The new function does all that work in one
place, with clear comments.

The old code called security_ptrace() twice on successful checks, once in
MAY_PTRACE() and once in __ptrace_may_attach().  Now it's only called once,
and only if all other checks have succeeded.

Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

638fa202

procfs task exe symlink · 925d1c40

Committed by Matt Helsley on Apr 29, 2008

The kernel implements readlink of /proc/pid/exe by getting the file from
the first executable VMA.  Then the path to the file is reconstructed and
reported as the result.

Because of the VMA walk the code is slightly different on nommu systems.
This patch avoids separate /proc/pid/exe code on nommu systems.  Instead of
walking the VMAs to find the first executable file-backed VMA we store a
reference to the exec'd file in the mm_struct.

That reference would prevent the filesystem holding the executable file
from being unmounted even after unmapping the VMAs.  So we track the number
of VM_EXECUTABLE VMAs and drop the new reference when the last one is
unmapped.  This avoids pinning the mounted filesystem.

[akpm@linux-foundation.org: improve comments]
[yamamoto@valinux.co.jp: fix dup_mmap]
Signed-off-by: Matt Helsley <matthltc@us.ibm.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: David Howells <dhowells@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

925d1c40
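
The reference-tracking scheme described in the commit above (keep a reference to the exec'd file, count the executable VMAs, drop the reference when the last one goes away) can be modelled in userspace as follows. This is a sketch with hypothetical types and function names, and a plain integer stands in for the file's reference count.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the file and the address space. */
struct demo_file {
    int refcount;
};

struct demo_mm {
    struct demo_file *exe_file;       /* reference to the exec'd file */
    unsigned long num_exe_file_vmas;  /* executable mappings still present */
};

/* Record the exec'd file and take a reference on it. */
static void demo_set_exe_file(struct demo_mm *mm, struct demo_file *file)
{
    file->refcount++;
    mm->exe_file = file;
    mm->num_exe_file_vmas = 0;
}

/* An executable mapping of the file was added. */
static void demo_added_exe_file_vma(struct demo_mm *mm)
{
    mm->num_exe_file_vmas++;
}

/*
 * An executable mapping was removed. When the last one is gone, the
 * reference is dropped so the file no longer pins its filesystem.
 */
static void demo_removed_exe_file_vma(struct demo_mm *mm)
{
    mm->num_exe_file_vmas--;
    if (mm->num_exe_file_vmas == 0 && mm->exe_file) {
        mm->exe_file->refcount--;
        mm->exe_file = NULL;
    }
}
```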