提交 · 96c62d51cc5a3ea31ddef606544f014922591a64 · openeuler / raspberrypi-kernel

13 2月, 2007 2 次提交

[PATCH] mark struct inode_operations const 3 · c5ef1c42

由 Arjan van de Ven 提交于 2月 12, 2007

Many struct inode_operations in the kernel can be "const".  Marking them const
moves these to the .rodata section, which avoids false sharing with potential
dirty data.  In addition it'll catch accidental writes at compile time to
these shared resources.
Signed-off-by: NArjan van de Ven <arjan@linux.intel.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

c5ef1c42

[PATCH] mark struct file_operations const 6 · 00977a59

由 Arjan van de Ven 提交于 2月 12, 2007

Many struct file_operations in the kernel can be "const".  Marking them const
moves these to the .rodata section, which avoids false sharing with potential
dirty data.  In addition it'll catch accidental writes at compile time to
these shared resources.
Signed-off-by: NArjan van de Ven <arjan@linux.intel.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

00977a59

12 2月, 2007 1 次提交

[PATCH] ifdef ->rchar, ->wchar, ->syscr, ->syscw from task_struct · 4b98d11b

由 Alexey Dobriyan 提交于 2月 10, 2007

They are fat: 4x8 bytes in task_struct.
They are uncoditionally updated in every fork, read, write and sendfile.
They are used only if you have some "extended acct fields feature".

And please, please, please, read(2) knows about bytes, not characters,
why it is called "rchar"?
Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
Cc: Jay Lan <jlan@engr.sgi.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

4b98d11b

02 2月, 2007 1 次提交

[PATCH] procfs: Fix listing of /proc/NOT_A_TGID/task · 7d895244

由 Guillaume Chazarain 提交于 1月 31, 2007

Listing /proc/PID/task were PID is not a TGID should not result in
duplicated entries.

	[g ~]$ pidof thunderbird-bin
	2751
	[g ~]$ ls /proc/2751/task
	2751  2770  2771  2824  2826  2834  2835  2851  2853
	[g ~]$ ls /proc/2770/task
	2751  2770  2771  2824  2826  2834  2835  2851  2853
	2770  2771  2824  2826  2834  2835  2851  2853
	[g ~]$
Signed-off-by: NGuillaume Chazarain <guichaz@yahoo.fr>
Acked-by: N"Eric W. Biederman" <ebiederm@xmission.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

7d895244

27 1月, 2007 1 次提交

[PATCH] Fix NULL ->nsproxy dereference in /proc/*/mounts · 863c4702

由 Alexey Dobriyan 提交于 1月 26, 2007

/proc/*/mounstats was fixed, all right, but...

To reproduce:

	while true; do
		find /proc -type f 2>/dev/null | xargs cat 1>/dev/null 2>/dev/null;
	done

BUG: unable to handle kernel NULL pointer dereference at virtual address 0000000c
 printing eip:
c01754df
*pde = 00000000
Oops: 0000 [#28]
Modules linked in: af_packet ohci_hcd e1000 ehci_hcd uhci_hcd usbcore xfs
CPU:    0
EIP:    0060:[<c01754df>]    Not tainted VLI
EFLAGS: 00010286   (2.6.20-rc5 #1)
EIP is at mounts_open+0x1c/0xac
eax: 00000000   ebx: d5898ac0   ecx: d1d27b18   edx: d1d27a50
esi: e6083e10   edi: d3c87f38   ebp: d5898ac0   esp: d3c87ef0
ds: 007b   es: 007b   ss: 0068
Process cat (pid: 18071, ti=d3c86000 task=f7d5f070 task.ti=d3c86000)
Stack: d5898ac0 e6083e10 d3c87f38 c01754c3 c0147c91 c18c52c0 d343f314 d5898ac0
       00008000 d3c87f38 ffffff9c c0147e09 d5898ac0 00000000 00000000 c0147e4b
       00000000 d3c87f38 d343f314 c18c52c0 c015e53e 00001000 08051000 00000101
Call Trace:
 [<c01754c3>] mounts_open+0x0/0xac
 [<c0147c91>] __dentry_open+0xa1/0x18c
 [<c0147e09>] nameidata_to_filp+0x31/0x3a
 [<c0147e4b>] do_filp_open+0x39/0x40
 [<c015e53e>] seq_read+0x128/0x2aa
 [<c0147e8c>] do_sys_open+0x3a/0x6d
 [<c0147efa>] sys_open+0x1c/0x20
 [<c0102b76>] sysenter_past_esp+0x5f/0x85
 [<c02a0033>] unix_stream_recvmsg+0x3bf/0x4bf
 =======================
Code: 5d c3 89 d8 e8 06 e0 f9 ff eb bd 0f 0b eb fe 55 57 56 53 89 d5 8b 40 f0 31 d2 e8 02 c1 fa ff 89 c2 85 c0 74 5c 8b 80 48 04 00 00 <8b> 58 0c 85 db 74 02 ff 03 ff 4a 08 0f 94 c0 84 c0 75 74 85 db
EIP: [<c01754df>] mounts_open+0x1c/0xac SS:ESP 0068:d3c87ef0

A race with do_exit()'s call to exit_namespaces().
Signed-off-by: NAlexey Dobriyan <adobriyan@openvz.org>
Acked-by: NSerge Hallyn <serue@us.ibm.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

863c4702

11 12月, 2006 1 次提交

[PATCH] io-accounting: report in procfs · aba76fdb

由 Andrew Morton 提交于 12月 10, 2006

Add a simple /proc/pid/io to show the IO accounting fields.

Maybe this shouldn't be merged in mainline - the preferred reporting channel
is taskstats.  But given the poor state of our userspace support for
taskstats, this is useful for developer-testing, at least.  And it improves
the changes that the procps developers will wire it up into top(1).  Opinions
are sought.

The patch also wires up the existing IO-accounting fields.

It's a bit racy on 32-bit machines: if process A reads process B's
/proc/pid/io while process B is updating one of those 64-bit counters, process
A could see an intermediate result.

Cc: Jay Lan <jlan@sgi.com>
Cc: Shailabh Nagar <nagar@watson.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Chris Sturtivant <csturtiv@sgi.com>
Cc: Tony Ernst <tee@sgi.com>
Cc: Guillaume Thouvenin <guillaume.thouvenin@bull.net>
Cc: David Wright <daw@sgi.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

aba76fdb

09 12月, 2006 3 次提交

[PATCH] fault injection: process filtering for fault-injection capabilities · f4f154fd

由 Akinobu Mita 提交于 12月 08, 2006

This patch provides process filtering feature.
The process filter allows failing only permitted processes
by /proc/<pid>/make-it-fail

Please see the example that demostrates how to inject slab allocation
failures into module init/cleanup code
in Documentation/fault-injection/fault-injection.txt
Signed-off-by: NAkinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

f4f154fd

[PATCH] rename struct namespace to struct mnt_namespace · 6b3286ed

由 Kirill Korotaev 提交于 12月 08, 2006

Rename 'struct namespace' to 'struct mnt_namespace' to avoid confusion with
other namespaces being developped for the containers : pid, uts, ipc, etc.
'namespace' variables and attributes are also renamed to 'mnt_ns'
Signed-off-by: NKirill Korotaev <dev@sw.ru>
Signed-off-by: NCedric Le Goater <clg@fr.ibm.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

6b3286ed

[PATCH] proc: change uses of f_{dentry, vfsmnt} to use f_path · 2fddfeef

由 Josef "Jeff" Sipek 提交于 12月 08, 2006

Change all the uses of f_{dentry,vfsmnt} to f_path.{dentry,mnt} in the proc
filesystem code.
Signed-off-by: NJosef "Jeff" Sipek <jsipek@cs.sunysb.edu>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

2fddfeef

08 12月, 2006 2 次提交

[PATCH] make fs/proc/base.c:proc_pid_instantiate() static · 9711ef99

由 Adrian Bunk 提交于 12月 06, 2006

Signed-off-by: NAdrian Bunk <bunk@stusta.de>
Acked-by: NEric W. Biederman <ebiederm@xmission.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

9711ef99

[PATCH] Allow user processes to raise their oom_adj value · 8fb4fc68

由 Guillem Jover 提交于 12月 06, 2006

Currently a user process cannot rise its own oom_adj value (i.e.
unprotecting itself from the OOM killer).  As this value is stored in the
task structure it gets inherited and the unprivileged childs will be unable
to rise it.

The EPERM will be handled by the generic proc fs layer, as only processes
with the proper caps or the owner of the process will be able to write to
the file.  So we allow only the processes with CAP_SYS_RESOURCE to lower
the value, otherwise it will get an EACCES which seems more appropriate
than EPERM.
Signed-off-by: NGuillem Jover <guillem.jover@nokia.com>
Acked-by: NAndrea Arcangeli <andrea@novell.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

8fb4fc68

26 11月, 2006 1 次提交

[PATCH] mounstats NULL pointer dereference · 701e054e

由 Vasily Tarasov 提交于 11月 25, 2006

OpenVZ developers team has encountered the following problem in 2.6.19-rc6
kernel. After some seconds of running script

while [[ 1 ]]
do
	find  /proc -name mountstats | xargs cat
done

this Oops appears:

BUG: unable to handle kernel NULL pointer dereference at virtual address
00000010
 printing eip:
c01a6b70
*pde = 00000000
Oops: 0000 [#1]
SMP
Modules linked in: xt_length ipt_ttl xt_tcpmss ipt_TCPMSS iptable_mangle
iptable_filter xt_multiport xt_limit ipt_tos ipt_REJECT ip_tables x_tables
parport_pc lp parport sunrpc af_packet thermal processor fan button battery
asus_acpi ac ohci_hcd ehci_hcd usbcore i2c_nforce2 i2c_core tg3 floppy
pata_amd
ide_cd cdrom sata_nv libata
CPU:    1
EIP:    0060:[<c01a6b70>]    Not tainted VLI
EFLAGS: 00010246   (2.6.19-rc6 #2)
EIP is at mountstats_open+0x70/0xf0
eax: 00000000   ebx: e6247030   ecx: e62470f8   edx: 00000000
esi: 00000000   edi: c01a6b00   ebp: c33b83c0   esp: f4105eb4
ds: 007b   es: 007b   ss: 0068
Process cat (pid: 6044, ti=f4105000 task=f4104a70 task.ti=f4105000)
Stack: c33b83c0 c04ee940 f46a4a80 c33b83c0 e4df31b4 c01a6b00 f4105000 c0169231
       e4df31b4 c33b83c0 c33b83c0 f4105f20 00000003 f4105000 c0169445 f2503cf0
       f7f8c4c0 00008000 c33b83c0 00000000 00008000 c0169350 f4105f20 00008000
Call Trace:
 [<c01a6b00>] mountstats_open+0x0/0xf0
 [<c0169231>] __dentry_open+0x181/0x250
 [<c0169445>] nameidata_to_filp+0x35/0x50
 [<c0169350>] do_filp_open+0x50/0x60
 [<c01873d6>] seq_read+0xc6/0x300
 [<c0169511>] get_unused_fd+0x31/0xc0
 [<c01696d3>] do_sys_open+0x63/0x110
 [<c01697a7>] sys_open+0x27/0x30
 [<c01030bd>] sysenter_past_esp+0x56/0x79
 =======================
Code: 45 74 8b 54 24 20 89 44 24 08 8b 42 f0 31 d2 e8 47 cb f8 ff 85 c0 89 c3
74 51 8d 80 a0 04 00 00 e8 46 06 2c 00 8b 83 48 04 00 00 <8b> 78 10 85 ff 74
03
f0 ff 07 b0 01 86 83 a0 04 00 00 f0 ff 4b
EIP: [<c01a6b70>] mountstats_open+0x70/0xf0 SS:ESP 0068:f4105eb4

The problem is that task->nsproxy can be equal NULL for some time during
task exit. This patch fixes the BUG.
Signed-off-by: NVasily Tarasov <vtaras@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: "Serge E. Hallyn" <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

701e054e

21 10月, 2006 1 次提交

[PATCH] OOM killer meets userspace headers · 8ac773b4

由 Alexey Dobriyan 提交于 10月 19, 2006

Despite mm.h is not being exported header, it does contain one thing
which is part of userspace ABI -- value disabling OOM killer for given
process. So,
a) create and export include/linux/oom.h
b) move OOM_DISABLE define there.
c) turn bounding values of /proc/$PID/oom_adj into defines and export
   them too.

Note: mass __KERNEL__ removal will be done later.
Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

8ac773b4

17 10月, 2006 1 次提交

[PATCH] PROC_NUMBUF is wrong · 0187f879

由 Andrew Morton 提交于 10月 17, 2006

Actually, the decimal representation of a 32-bit signed number can take 12
bytes, including the \0.

And then some code adds a \n as well, so let's give it 13 bytes.
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

0187f879

02 10月, 2006 14 次提交

[PATCH] introduce get_task_pid() to fix unsafe get_pid() · 1a657f78

由 Oleg Nesterov 提交于 10月 02, 2006

proc_pid_make_inode:

	ei->pid = get_pid(task_pid(task));

I think this is not safe.  get_pid() can be preempted after checking "pid
!= NULL".  Then the task exits, does detach_pid(), and RCU frees the pid.
Signed-off-by: NOleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

1a657f78

[PATCH] proc: comment what proc_fill_cache does · 1c0d04c9

由 Eric W. Biederman 提交于 10月 02, 2006

Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

1c0d04c9

[PATCH] proc: remove the useless SMP-safe comments from /proc · 5e61feaf

由 Eric W. Biederman 提交于 10月 02, 2006

Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

5e61feaf

[PATCH] proc: remove trailing blank entry from pid_entry arrays · 7bcd6b0e

由 Eric W. Biederman 提交于 10月 02, 2006

It was pointed out that since I am taking ARRAY_SIZE anyway the trailing empty
entry is silly and just wastes space.
Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

7bcd6b0e

[PATCH] proc: properly compute TGID_OFFSET · 8e95bd93

由 Eric W. Biederman 提交于 10月 02, 2006

The value doesn't change but this ensures I will have the proper value when
other files are added to proc_base_stuff.
Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

8e95bd93

[PATCH] proc: Use pid_task instead of open coding it · 7fbaac00

由 Eric W. Biederman 提交于 10月 02, 2006

Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

7fbaac00

[PATCH] proc: Merge proc_tid_attr and proc_tgid_attr · 72d9dcfc

由 Eric W. Biederman 提交于 10月 02, 2006

The implementation is exactly the same and there is currently nothing to
distinguish proc_tid_attr, and proc_tgid_attr.  So it is pointless to have two
separate implementations.
Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

72d9dcfc

[PATCH] proc: Remove the hard coded inode numbers · 61a28784

由 Eric W. Biederman 提交于 10月 02, 2006

The hard coded inode numbers in proc currently limit its maintainability,
its flexibility, and what can be done with the rest of system.  /proc limits
pid-max to 32768 on 32 bit systems it limits fd-max to 32768 on all systems,
and placing the pid in the inode number really gets in the way of implementing
subdirectories of per process information.

Ever since people started adding to the middle of the file type enumeration we
haven't been maintaing the historical inode numbers, all we have really
succeeded in doing is keeping the pid in the proc inode number.  The pid is
already available in the directory name so no information is lost removing it
from the inode number.

So if something in user space cares if we remove the inode number from the
/proc inode it is almost certainly broken.
Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

61a28784

[PATCH] proc: Factor out an instantiate method from every lookup method · 444ceed8

由 Eric W. Biederman 提交于 10月 02, 2006

To remove the hard coded proc inode numbers it is necessary to be able to
create the proc inodes during readdir.  The instantiate methods are the subset
of lookup that is needed to accomplish that.

This first step just splits the lookup methods into 2 functions.
Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

444ceed8

[PATCH] proc: Make the generation of the self symlink table driven · 801199ce

由 Eric W. Biederman 提交于 10月 02, 2006

This patch generalizes the concept of files in /proc that are related to
processes but live in the root directory of /proc

Ideally this would reuse infrastructure from the rest of the process specific
parts of proc but unfortunately security_task_to_inode must not be called on
files that are not strictly per process. security_task_to_inode really needs
to be reexamined as the security label can change in important places that we
are not currently catching, but I'm not certain that simplifies this problem.

By at least matching the structure of the rest of proc we get more idiom reuse
and it becomes easier to spot problems in the way things are put together.

Later things like /proc/mounts are likely to be moved into proc_base as well.
If union mounts are ever supported we may be able to make /proc a union mount,
and properly split it into 2 filesystems.
Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

801199ce

[PATCH] namespaces: incorporate fs namespace into nsproxy · 1651e14e

由 Serge E. Hallyn 提交于 10月 02, 2006

This moves the mount namespace into the nsproxy.  The mount namespace count
now refers to the number of nsproxies point to it, rather than the number of
tasks.  As a result, the unshare_namespace() function in kernel/fork.c no
longer checks whether it is being shared.
Signed-off-by: NSerge Hallyn <serue@us.ibm.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Andrey Savochkin <saw@sw.ru>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

1651e14e

[PATCH] proc: modify proc_pident_lookup to be completely table driven · 20cdc894

由 Eric W. Biederman 提交于 10月 02, 2006

Currently proc_pident_lookup gets the names and types from a table and then
has a huge switch statement to get the inode and file operations it needs.
That is silly and is becoming increasingly hard to maintain so I just put all
of the information in the table.
Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

20cdc894

[PATCH] proc: reorder the functions in base.c · 28a6d671

由 Eric W. Biederman 提交于 10月 02, 2006

There were enough changes in my last round of cleaning up proc I had to break
up the patch series into smaller chunks, and my last chunk never got resent.

This patchset gives proc dynamic inode numbers (the static inode numbers were
a pain to maintain and prevent all kinds of things), and removes the horrible
switch statements that had to be kept in sync with everything else.  Being
fully table driver takes us 90% of the way of being able to register new
process specific attributes in proc.

This patch:

Group the functions by what they implement instead of by type of operation.
As it existed base.c was quickly approaching the point where it could not be
followed.

No functionality or code changes asside from adding/removing forward
declartions are implemented in this patch.
Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

28a6d671

[PATCH] proc: readdir race fix (take 3) · 0804ef4b

由 Eric W. Biederman 提交于 10月 02, 2006

The problem: An opendir, readdir, closedir sequence can fail to report
process ids that are continually in use throughout the sequence of system
calls.  For this race to trigger the process that proc_pid_readdir stops at
must exit before readdir is called again.

This can cause ps to fail to report processes, and it is in violation of
posix guarantees and normal application expectations with respect to
readdir.

Currently there is no way to work around this problem in user space short
of providing a gargantuan buffer to user space so the directory read all
happens in on system call.

This patch implements the normal directory semantics for proc, that
guarantee that a directory entry that is neither created nor destroyed
while reading the directory entry will be returned.  For directory that are
either created or destroyed during the readdir you may or may not see them.
 Furthermore you may seek to a directory offset you have previously seen.

These are the guarantee that ext[23] provides and that posix requires, and
more importantly that user space expects.  Plus it is a simple semantic to
implement reliable service.  It is just a matter of calling readdir a
second time if you are wondering if something new has show up.

These better semantics are implemented by scanning through the pids in
numerical order and by making the file offset a pid plus a fixed offset.

The pid scan happens on the pid bitmap, which when you look at it is
remarkably efficient for a brute force algorithm.  Given that a typical
cache line is 64 bytes and thus covers space for 64*8 == 200 pids.  There
are only 40 cache lines for the entire 32K pid space.  A typical system
will have 100 pids or more so this is actually fewer cache lines we have to
look at to scan a linked list, and the worst case of having to scan the
entire pid bitmap is pretty reasonable.

If we need something more efficient we can go to a more efficient data
structure for indexing the pids, but for now what we have should be
sufficient.

In addition this takes no additional locks and is actually less code than
what we are doing now.

Also another very subtle bug in this area has been fixed.  It is possible
to catch a task in the middle of de_thread where a thread is assuming the
thread of it's thread group leader.  This patch carefully handles that case
so if we hit it we don't fail to return the pid, that is undergoing the
de_thread dance.

Thanks to KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> for
providing the first fix, pointing this out and working on it.

[oleg@tv-sign.ru: fix it]
Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
Acked-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: NOleg Nesterov <oleg@tv-sign.ru>
Cc: Jean Delvare <jdelvare@suse.de>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

0804ef4b

30 9月, 2006 1 次提交

[PATCH] fix mem_write() return value · f7ca54f4

由 Frederik Deweerdt 提交于 9月 29, 2006

At the beginning of the routine, "copied" is set to 0, but it is no good
because in lines 805 and 812 it is set to other values.  Finally, the
routine returns as if it copied 12 (=ENOMEM) bytes less than it actually
did.
Signed-off-by: NFrederik Deweerdt <frederik.deweerdt@gmail.com>
Acked-by: NEric Biederman <ebiederm@xmission.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

f7ca54f4

16 7月, 2006 1 次提交

Don't allow chmod() on the /proc/<pid>/ files · 6d76fa58

由 Linus Torvalds 提交于 7月 15, 2006

This just turns off chmod() on the /proc/<pid>/ files, since there is no
good reason to allow it, and had we disallowed it originally, the nasty
/proc race exploit wouldn't have been possible.

The other patches already fixed the problem chmod() could cause, so this
is really just some final mop-up..

This particular version is based off a patch by Eugene and Marcel which
had much better naming than my original equivalent one.
Signed-off-by: NEugene Teo <eteo@redhat.com>
Signed-off-by: NMarcel Holtmann <marcel@holtmann.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

6d76fa58

15 7月, 2006 2 次提交

Relax /proc fix a bit · 9ee8ab9f

由 Linus Torvalds 提交于 7月 14, 2006

Clearign all of i_mode was a bit draconian. We only really care about
S_ISUID/ISGID, after all.
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

9ee8ab9f

Fix nasty /proc vulnerability · 18b0bbd8

由 Linus Torvalds 提交于 7月 14, 2006

We have a bad interaction with both the kernel and user space being able
to change some of the /proc file status.  This fixes the most obvious
part of it, but I expect we'll also make it harder for users to modify
even their "own" files in /proc.
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

18b0bbd8

01 7月, 2006 1 次提交

Remove obsolete #include <linux/config.h> · 6ab3d562

由 Jörn Engel 提交于 6月 30, 2006

Signed-off-by: NJörn Engel <joern@wohnheim.fh-wedel.de>
Signed-off-by: NAdrian Bunk <bunk@stusta.de>

6ab3d562

27 6月, 2006 7 次提交

[PATCH] SELinux: Add sockcreate node to procattr API · 42c3e03e

由 Eric Paris 提交于 6月 26, 2006

Below is a patch to add a new /proc/self/attr/sockcreate A process may write a
context into this interface and all subsequent sockets created will be labeled
with that context. This is the same idea as the fscreate interface where a
process can specify the label of a file about to be created. At this time one
envisioned user of this will be xinetd. It will be able to better label
sockets for the actual services. At this time all sockets take the label of
the creating process, so all xinitd sockets would just be labeled the same.

I tested this by creating a tcp sender and listener. The sender was able to
write to this new proc file and then create sockets with the specified label.
I am able to be sure the new label was used since the avc denial messages
kicked out by the kernel included both the new security permission
setsockcreate and all the socket denials were for the new label, not the label
of the running process.
Signed-off-by: NEric Paris <eparis@redhat.com>
Signed-off-by: NJames Morris <jmorris@namei.org>
Cc: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

42c3e03e

[PATCH] cleanup next_tid() · c1df7fb8

由 Oleg Nesterov 提交于 6月 26, 2006

Try to make next_tid() a bit more readable and deletes unnecessary
"pid_alive(pos)" check.
Signed-off-by: NOleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

c1df7fb8

[PATCH] simplify/fix first_tid() · a872ff0c

由 Oleg Nesterov 提交于 6月 26, 2006

first_tid:

	/* If nr exceeds the number of threads there is nothing todo */
	if (nr) {
		if (nr >= get_nr_threads(leader))
			goto done;
	}

This is not reliable: sub-threads can exit after this check, so the
'for' loop below can overlap and proc_task_readdir() can return an
already filldir'ed dirents.

	for (; pos && pid_alive(pos); pos = next_thread(pos)) {
		if (--nr > 0)
			continue;

Off-by-one error, will return 'leader' when nr == 1.

This patch tries to fix these problems and simplify the code.
Signed-off-by: NOleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

a872ff0c

[PATCH] proc: Remove tasklist_lock from proc_task_readdir. · cc288738

由 Eric W. Biederman 提交于 6月 26, 2006

This is just like my previous removal of tasklist_lock from first_tgid, and
next_tgid.  It simply had to wait until it was rcu safe to walk the thread
list.

This should be the last instance of the tasklist_lock in proc.  So user
processes should not be able to influence the tasklist lock hold times.
Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

cc288738

[PATCH] proc: Cleanup proc_fd_access_allowed · df26c40e

由 Eric W. Biederman 提交于 6月 26, 2006

In process of getting proc_fd_access_allowed to work it has developed a few
warts.  In particular the special case that always allows introspection and
the special case to allow inspection of kernel threads.

The special case for introspection is needed for /proc/self/mem.

The special case for kernel threads really should be overridable
by security modules.

So consolidate these checks into ptrace.c:may_attach().

The check to always allow introspection is trivial.

The check to allow access to kernel threads, and zombies is a little
trickier.  mem_read and mem_write already verify an mm exists so it isn't
needed twice.  proc_fd_access_allowed only doesn't want a check to verify
task->mm exits, s it prevents all access to kernel threads.  So just move
the task->mm check into ptrace_attach where it is needed for practical
reasons.

I did a quick audit and none of the security modules in the kernel seem to
care if they are passed a task without an mm into security_ptrace.  So the
above move should be safe and it allows security modules to come up with
more restrictive policy.
Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: James Morris <jmorris@namei.org>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

df26c40e

[PATCH] proc: Use sane permission checks on the /proc/<pid>/fd/ symlinks · 778c1144

由 Eric W. Biederman 提交于 6月 26, 2006

Since 2.2 we have been doing a chroot check to see if it is appropriate to
return a read or follow one of these magic symlinks. The chroot check was
asking a question about the visibility of files to the calling process and
it was actually checking the destination process, and not the files
themselves. That test was clearly bogus.

In my first pass through I simply fixed the test to check the visibility of
the files themselves. That naive approach to fixing the permissions was
too strict and resulted in cases where a task could not even see all of
it's file descriptors.

What has disturbed me about relaxing this check is that file descriptors
are per-process private things, and they are occasionaly used a user space
capability tokens. Looking a little farther into the symlink path on /proc
I did find userid checks and a check for capability (CAP_DAC_OVERRIDE) so
there were permissions checking this.

But I was still concerned about privacy. Besides /proc there is only one
other way to find out this kind of information, and that is ptrace. ptrace
has been around for a long time and it has a well established security
model.

So after thinking about it I finally realized that the permission checks
that make sense are the permission checks applied to ptrace_attach. The
checks are simple per process, and won't cause nasty surprises for people
coming from less capable unices.

Unfortunately there is one case that the current ptrace_attach test does
not cover: Zombies and kernel threads. Single stepping those kinds of
processes is impossible. Being able to see which file descriptors are open
on these tasks is important to lsof, fuser and friends. So for these
special processes I made the rule you can't find out unless you have
CAP_SYS_PTRACE.

These proc permission checks should now conform to the principle of least
surprise. As well as using much less code to implement :)
Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

778c1144

[PATCH] proc: optimize proc_check_dentry_visible · 5b0c1dd3

由 Eric W. Biederman 提交于 6月 26, 2006

The code doesn't need to sleep to when making this check so I can just do the
comparison and not worry about the reference counts.

TODO: While looking at this I realized that my original cleanup did not push
the permission check far enough down into the stack.  The call of
proc_check_dentry_visible needs to move out of the generic proc
readlink/follow link code and into the individual get_link instances.
Otherwise the shared resources checks are not quite correct (shared
files_struct does not require a shared fs_struct), and there are races with
unshare.
Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

5b0c1dd3