Commits · 3eb07c8c8adb6f0572baba844ba2d9e501654316 · OpenHarmony / kernel_linux

20 Oct, 2007 10 commits

pid namespaces: initialize the namespace's proc_mnt · 6f4e6433

Committed by Pavel Emelyanov on Oct 18, 2007

The namespace's proc_mnt must be kern_mount-ed to make this pointer always
valid, independently of whether the user space mounted the proc or not.  This
solves races in proc_flush_task, etc., with the proc_mnt switching from NULL
to not-NULL.

The initialization is done after the init's pid is created and hashed so that
proc_get_sb() can find it and use it for the root inode.

Since the namespace holds the vfsmnt, the vfsmnt holds the superblock and the
superblock holds the namespace, we must explicitly break this cycle to destroy
all of them.  This is done after the init of the namespace dies.  Looking a
few steps ahead: when init exits it will kill all its children, so no
proc_mnt will be needed after its death.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Cc: Paul Menage <menage@google.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

6f4e6433

pid namespaces: make proc_flush_task() actually flush entries from multiple namespaces · 130f77ec

Committed by Pavel Emelyanov on Oct 18, 2007

This means that proc_flush_task_mnt() is to be called for many proc mounts and
with different ids, depending on the namespace this pid is to be flushed from.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Cc: Paul Menage <menage@google.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

130f77ec

pid namespaces: make proc have multiple superblocks - one for each namespace · 07543f5c

Committed by Pavel Emelyanov on Oct 18, 2007

Each pid namespace has to be visible through its own proc mount.  Thus we
need to have per-namespace proc trees with their own superblocks.

We cannot easily show different pid namespaces via one global proc tree, since
each pid refers to different tasks in different namespaces.  E.g.  pid 1
refers to the init task in the initial namespace and to some other task when
seen from another namespace.  Moreover, a pid existing in one namespace may
not exist in the other.

This approach has one more advantage: the tasks from the init namespace
can see what tasks live in another namespace by reading entries from another
proc tree.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Cc: Paul Menage <menage@google.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

07543f5c

pid namespaces: helpers to find the task by its numerical ids · 198fe21b

Committed by Pavel Emelyanov on Oct 18, 2007

When searching for a task by numerical id one may need to find it using its global
pid (as it is done now in the kernel) or by its virtual id, e.g.  when sending a
signal to a task from one namespace the sender will specify the task's virtual
id and we should find the task by this value.

[akpm@linux-foundation.org: fix gfs2 linkage]
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Cc: Paul Menage <menage@google.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

198fe21b
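
The difference between a global and a virtual id lookup in 198fe21b can be pictured with a small user-space model. This is only a sketch under simplifying assumptions: it assumes one namespace per nesting level, and `toy_task`, `toy_tasks`, `toy_find_task` and `toy_find_task_by_global_id` are hypothetical names, not the helpers the commit adds.

```c
#include <stddef.h>

#define TOY_MAX_LEVELS 3

/* Hypothetical model: one numeric id per namespace level the task is visible in. */
struct toy_task {
    const char *comm;
    int level;                  /* depth of the task's own namespace, 0 = initial */
    int nr[TOY_MAX_LEVELS];     /* nr[i] is the id seen from level i, valid for i <= level */
};

static struct toy_task toy_tasks[] = {
    { "init",           0, { 1 } },
    { "container-init", 1, { 4242, 1 } },
    { "worker",         1, { 4243, 2 } },
};

/* Look a task up by the id it has at the given namespace level. */
static struct toy_task *toy_find_task(int nr, int ns_level)
{
    size_t n = sizeof(toy_tasks) / sizeof(toy_tasks[0]);

    for (size_t i = 0; i < n; i++) {
        struct toy_task *t = &toy_tasks[i];

        /* A task has no id at levels deeper than its own namespace. */
        if (ns_level <= t->level && t->nr[ns_level] == nr)
            return t;
    }
    return NULL;
}

/* Global lookup: the id as seen from the initial namespace (level 0). */
static struct toy_task *toy_find_task_by_global_id(int nr)
{
    return toy_find_task(nr, 0);
}
```

In this model the number 1 resolves to different tasks depending on the level it is looked up at, which is the ambiguity the commit message describes.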

pid namespaces: prepare proc_flush_task() to flush entries from multiple proc trees · 60347f67

Committed by Pavel Emelyanov on Oct 18, 2007

The first part is trivial - we just make the proc_flush_task() to operate on
arbitrary vfsmount with arbitrary ids and pass the pid and global proc_mnt to
it.

The other change is more tricky: I moved the proc_flush_task() call in
release_task() higher to address the following problem.

When flushing task from many proc trees we need to know the set of ids (not
just one pid) to find the dentries' names to flush.  Thus we need to pass the
task's pid to proc_flush_task() as struct pid is the only object that can
provide all the pid numbers.  But after __exit_signal() the task has detached all
its pids and this information is lost.

This creates a tiny gap for proc_pid_lookup() to bring some dentries back to the
tree and keep them in the hash (since pids are still alive before __exit_signal())
till the next shrink, but since proc_flush_task() does not provide a 100%
guarantee that the dentries will be flushed, this is OK.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Cc: Paul Menage <menage@google.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

60347f67

Make access to task's nsproxy lighter · cf7b708c

Committed by Pavel Emelyanov on Oct 18, 2007

When someone wants to deal with some other task's namespaces it has to lock
the task and then get the desired namespace if it exists.  This is
slow on read-only paths and may be impossible in some cases.

E.g.  Oleg recently noticed a race between unshare() and the (sent for
review in cgroups) pid namespaces - when the task notifies the parent it
has to know the parent's namespace, but taking the task_lock() is
impossible there - the code is under write locked tasklist lock.

On the other hand switching the namespace on a task (daemonize) and releasing
the namespace (after the last task exits) are rather rare operations and we
can sacrifice their speed to solve the issues above.

The access to other task namespaces is proposed to be performed
like this:

     rcu_read_lock();
     nsproxy = task_nsproxy(tsk);
     if (nsproxy != NULL) {
             /*
              * work with the namespaces here
              * e.g. get the reference on one of them
              */
     } /*
        * NULL task_nsproxy() means that this task is
        * almost dead (zombie)
        */
     rcu_read_unlock();

This patch has passed the review by Eric and Oleg :) and has,
of course, been tested.

[clg@fr.ibm.com: fix unshare()]
[ebiederm@xmission.com: Update get_net_ns_by_pid]
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

cf7b708c

pid namespaces: define and use task_active_pid_ns() wrapper · 2894d650

Committed by Sukadev Bhattiprolu on Oct 18, 2007

With multiple pid namespaces, a process is known by some pid_t in every
ancestor pid namespace.  Every time the process forks, the child process also
gets a pid_t in every ancestor pid namespace.

While a process is visible in >=1 pid namespaces, it can see pid_t's in only
one pid namespace.  We call this pid namespace its "active pid namespace",
and it is always the youngest pid namespace in which the process is known.

This patch defines and uses a wrapper to find the active pid namespace of a
process.  The implementation of the wrapper will be changed when support
for multiple pid namespaces is added.

Changelog:
	2.6.22-rc4-mm2-pidns1:
	- [Pavel Emelianov, Alexey Dobriyan] Back out the change to use
	  task_active_pid_ns() in child_reaper() since task->nsproxy
	  can be NULL during task exit (so child_reaper() continues to
	  use init_pid_ns).

	2.6.21-rc6-mm1:
	- Rename task_pid_ns() to task_active_pid_ns() to reflect that a
	  process can have multiple pid namespaces.

Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Acked-by: Pavel Emelianov <xemul@openvz.org>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Serge Hallyn <serue@us.ibm.com>
Cc: Herbert Poetzel <herbert@13thfloor.at>
Cc: Kirill Korotaev <dev@sw.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

2894d650
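
The relationship between a task's active pid namespace and its ancestors, as described in 2894d650, can be modelled in a few lines of user-space C. This is a sketch only: `toy_pid_ns`, `toy_nr_ids` and `toy_visible_from` are hypothetical names, and the parent pointer and level field are assumptions of the model rather than the kernel's layout.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical namespace node: each namespace knows its parent and its depth. */
struct toy_pid_ns {
    struct toy_pid_ns *parent;   /* NULL for the initial namespace */
    int level;                   /* 0 for the initial namespace */
};

/* A task whose active namespace is `active` has one pid_t per ancestor plus its own. */
static int toy_nr_ids(const struct toy_pid_ns *active)
{
    return active->level + 1;
}

/* The task is known in `observer` if that is its active namespace or an ancestor of it. */
static bool toy_visible_from(const struct toy_pid_ns *active,
                             const struct toy_pid_ns *observer)
{
    for (const struct toy_pid_ns *ns = active; ns != NULL; ns = ns->parent)
        if (ns == observer)
            return true;
    return false;
}
```

The walk goes only upwards, which matches the statement that the active namespace is the youngest one in which the process is known.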

pid namespaces: round up the API · a47afb0f

Committed by Pavel Emelianov on Oct 18, 2007

The set of functions process_session, task_session, process_group and
task_pgrp is confusing, as the names can be mixed up with each other when looking
at the code for a long time.

The proposals are to
* give the functions that return an integer the _nr suffix to
  represent that fact,
* and to make all functions work with task (not process) by giving
  them a common prefix.

For consistency the routines signal_session() and set_signal_session() are
replaced with task_session_nr() and set_task_session(), especially since they
are only used with the explicit task->signal dereference.

Signed-off-by: Pavel Emelianov <xemul@openvz.org>
Acked-by: Serge E. Hallyn <serue@us.ibm.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

a47afb0f

Task Control Groups: make cpusets a client of cgroups · 8793d854

Committed by Paul Menage on Oct 18, 2007

Remove the filesystem support logic from the cpusets system and make cpusets
a cgroup subsystem.

The "cpuset" filesystem becomes a dummy filesystem; attempts to mount it get
passed through to the cgroup filesystem with the appropriate options to
emulate the old cpuset filesystem behaviour.

Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

8793d854

Task Control Groups: add procfs interface · a424316c

Committed by Paul Menage on Oct 18, 2007

Add:

/proc/cgroups - general system info

/proc/*/cgroup - per-task cgroup membership info

[a.p.zijlstra@chello.nl: cgroups: bdi init hooks]
Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

a424316c

17 Oct, 2007 8 commits

Don't truncate /proc/PID/environ at 4096 characters · 315e28c8

Committed by James Pearson on Oct 16, 2007

/proc/PID/environ currently truncates at 4096 characters.  This patch, based on
the /proc/PID/mem code, removes that limit.

Signed-off-by: James Pearson <james-p@moving-picture.com>
Cc: Anton Arapov <aarapov@redhat.com>
Cc: Jan Engelhardt <jengelh@computergmbh.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

315e28c8

Fix f_version type: should be u64 instead of unsigned long · 2b47c361

Committed by Mathieu Desnoyers on Oct 16, 2007

Fix f_version type: should be u64 instead of unsigned long

There is a type inconsistency between struct inode i_version and struct file
f_version.

fs.h:

struct inode
  u64                     i_version;

and

struct file
  unsigned long           f_version;

Users do:

fs/ext3/dir.c:

if (filp->f_version != inode->i_version) {

So why isn't f_version a u64?  It becomes a problem if versions get
higher than 2^32 and we are on an architecture where longs are 32 bits.

This patch changes the f_version type to u64, and updates the users accordingly.

It applies to 2.6.23-rc2-mm2.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Martin Bligh <mbligh@google.com>
Cc: "Randy.Dunlap" <rdunlap@xenotime.net>
Cc: Al Viro <viro@ftp.linux.org.uk>
Cc: <linux-ext4@vger.kernel.org>
Cc: Mark Fasheh <mark.fasheh@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

2b47c361
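
The truncation problem described in 2b47c361 can be reproduced in user space by modelling a 32-bit `unsigned long` with `uint32_t`. This is a sketch of the failure mode only; `version_matches_narrow` and `version_matches_wide` are hypothetical helpers, not kernel functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Old layout on a 32-bit architecture: f_version is narrower than i_version. */
static bool version_matches_narrow(uint64_t i_version)
{
    uint32_t f_version = (uint32_t)i_version;   /* the store drops the upper 32 bits */

    /* f_version is widened back to 64 bits for the comparison. */
    return f_version == i_version;
}

/* New layout: both fields are 64 bits wide, so the stored copy always matches. */
static bool version_matches_wide(uint64_t i_version)
{
    uint64_t f_version = i_version;

    return f_version == i_version;
}
```

With the narrow field, a freshly cached version stops comparing equal as soon as the inode version exceeds 2^32, so a check like the one quoted from fs/ext3/dir.c would never see the two as in sync.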

report the per-irq statistics on all arches · f13ef775

Committed by Ravikiran G Thirumalai on Oct 16, 2007

Commit 4004c69a avoids too many remote cpu
references while reporting per-irq stats.  Since we will not have the same
performance penalty of bringing in remote cpu cachelines while reporting
per-irq stats anymore, we can now afford to be consistent and report this
statistic on all arches, all configs.

akpm: affects ia64, alpha and ppc64, mainly.

Kiran earlier said:

A read of /proc/stat takes:
Plain: 	2.622832
With speedup patch: 0.013194
With the per-irq stats commented out: 0.008124

So the performance problems which originally caused those architectures to
disable this statistic should now be fixed up.

Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

f13ef775

fs/proc/mmu.c: headers butchery · 87400c04

Committed by Alexey Dobriyan on Oct 16, 2007

fs/proc/mmu.c consists of only one function which uses only:
1) struct vmalloc_info *
2) struct vm_struct *
3) struct vmalloc_info
4) vmlist
5) VMALLOC_TOTAL, VMALLOC_START, VMALLOC_END
6) read_lock, read_unlock
7) vmlist_lock
8) struct vm_struct

This gives us linux/spinlock.h, asm/pgtable.h, "internal.h", linux/vmalloc.h.
asm/pgtable.h uses PKMAP_BASE on i386, for which asm/highmem.h is needed.
But, linux/highmem.h is actually used to make it compile everywhere.
I'll deal later with this particular i386 surprise.

Cross-compile tested on many archs and configs.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

87400c04

SLAB_PANIC more (proc, posix-timers, shmem) · 040b5c6f

Committed by Alexey Dobriyan on Oct 16, 2007

These aren't modular, so SLAB_PANIC is OK.

Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

040b5c6f

Slab API: remove useless ctor parameter and reorder parameters · 4ba9b9d0

Committed by Christoph Lameter on Oct 16, 2007

Slab constructors currently have a flags parameter that is never used.  And
the order of the arguments is opposite to other slab functions.  The object
pointer is placed before the kmem_cache pointer.

Convert

        ctor(void *object, struct kmem_cache *s, unsigned long flags)

to

        ctor(struct kmem_cache *s, void *object)

throughout the kernel.

[akpm@linux-foundation.org: coupla fixes]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

4ba9b9d0

Print out statistics in relation to fragmentation avoidance to /proc/pagetypeinfo · 467c996c

Committed by Mel Gorman on Oct 16, 2007

This patch provides fragmentation avoidance statistics via /proc/pagetypeinfo.
The information is collected only on request so there is no runtime overhead.
The statistics are in three parts:

The first part prints information on the size of blocks that pages are
being grouped on and looks like

Page block order: 10
Pages per block: 1024

The second part is a more detailed version of /proc/buddyinfo and looks like

Free pages count per migrate type at order    0    1    2    3    4    5    6    7    8    9   10
Node 0, zone DMA, type Unmovable              0    0    0    0    0    0    0    0    0    0    0
Node 0, zone DMA, type Reclaimable            1    0    0    0    0    0    0    0    0    0    0
Node 0, zone DMA, type Movable                0    0    0    0    0    0    0    0    0    0    0
Node 0, zone DMA, type Reserve                0    4    4    0    0    0    0    1    0    1    0
Node 0, zone Normal, type Unmovable         111    8    4    4    2    3    1    0    0    0    0
Node 0, zone Normal, type Reclaimable       293   89    8    0    0    0    0    0    0    0    0
Node 0, zone Normal, type Movable             1    6   13    9    7    6    3    0    0    0    0
Node 0, zone Normal, type Reserve             0    0    0    0    0    0    0    0    0    0    4

The third part looks like

Number of blocks type    Unmovable  Reclaimable      Movable      Reserve
Node 0, zone DMA                 0            1            2            1
Node 0, zone Normal              3           17           94            4

To walk the zones within a node with interrupts disabled, walk_zones_in_node()
is introduced and shared between /proc/buddyinfo, /proc/zoneinfo and
/proc/pagetypeinfo to reduce code duplication. It seems specific to what
vmstat.c requires but could be broken out as a general utility function in
mmzone.c if there were other potential users.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Andy Whitcroft <apw@shadowen.org>
Acked-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

467c996c
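
The numbers in the sample /proc/pagetypeinfo output of 467c996c are related by simple power-of-two arithmetic: a block of order n holds 2^n pages. The sketch below works that out in user space; `pages_per_block` and `free_pages_in_row` are hypothetical helper names, not functions from vmstat.c.

```c
#include <stddef.h>

/* A block of the given order spans 2^order pages, e.g. order 10 gives 1024. */
static unsigned long pages_per_block(unsigned int order)
{
    return 1UL << order;
}

/*
 * Total free pages described by one row of the per-order table:
 * count[order] free blocks, each holding 2^order pages.
 */
static unsigned long free_pages_in_row(const unsigned long *count, size_t n_orders)
{
    unsigned long total = 0;

    for (size_t order = 0; order < n_orders; order++)
        total += count[order] << order;
    return total;
}
```

Applied to the "zone Normal, type Movable" row above (1 6 13 9 7 6 3 0 0 0 0), the sum is 1 + 12 + 52 + 72 + 112 + 192 + 192 = 633 free pages.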

Group short-lived and reclaimable kernel allocations · e12ba74d

Committed by Mel Gorman on Oct 16, 2007

This patch marks a number of allocations that are either short-lived such as
network buffers or are reclaimable such as inode allocations. When something
like updatedb is called, long-lived and unmovable kernel allocations tend to
be spread throughout the address space which increases fragmentation.

This patch groups these allocations together as much as possible by adding a
new MIGRATE_TYPE. The MIGRATE_RECLAIMABLE type is for allocations that can be
reclaimed on demand, but not moved. i.e. they can be migrated by deleting
them and re-reading the information from elsewhere.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

e12ba74d

15 Oct, 2007 3 commits

sched: guest CPU accounting: add guest-CPU /proc/<pid>/stat fields · 9ac52315

Committed by Laurent Vivier on Oct 15, 2007

As for cpustat, introduce the "gtime" (guest time of the task) and
"cgtime" (guest time of the task's children) fields for
tasks. Modify signal_struct and task_struct.

Modify /proc/<pid>/stat to display these new fields.

Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Acked-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

9ac52315

sched: guest CPU accounting: add guest-CPU /proc/stat field · 5e84cfde

Committed by Laurent Vivier on Oct 15, 2007

As recent CPUs introduce a third running state, after "user" and
"system", we need a new field, "guest", in cpustat to store the time
used by the CPU to run a virtual CPU. Modify /proc/stat to display this
new field.

Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Acked-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

5e84cfde

sched: clean up schedstats, cnt -> count · 2d72376b

Committed by Ingo Molnar on Oct 15, 2007

rename all 'cnt' fields and variables to the less yucky 'count' name.

yuckage noticed by Andrew Morton.

no change in code, other than the /proc/sched_debug bkl_count string got
a bit larger:

   text    data     bss     dec     hex filename
  38236    3506      24   41766    a326 sched.o.before
  38240    3506      24   41770    a32a sched.o.after

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>

2d72376b

11 Oct, 2007 5 commits

[NETNS]: Move some code into __init section when CONFIG_NET_NS=n · 4665079c

Committed by Pavel Emelyanov on Oct 08, 2007

With the net namespaces much code has left the __init section,
thus making the kernel occupy more memory than it did before.
Since we have a config option that prohibits the namespace
creation, the functions that initialize/finalize some netns
stuff are simply not needed and can be freed after the boot.

Currently, this is almost not noticeable, since few calls
are no longer in __init, but once the namespaces are
merged it will be possible to free more code. I propose to
use the __net_init, __net_exit and __net_initdata "attributes"
for functions/variables that are not used if CONFIG_NET_NS
is not set, to save more space in memory.

The exiting functions cannot just reside in the __exit section,
as noticed by David, since the init section will have
references to them and the compilation will fail due to modpost
checks. These references can exist, since the init namespace
never dies and the exit callbacks are never called. So I
introduce the __exit_refok attribute just like it is already
done with __init_refok.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

4665079c

[NET]: Fix race when opening a proc file while a network namespace is exiting. · 077130c0

Committed by Eric W. Biederman on Sep 13, 2007

The problem:  proc_net files remember which network namespace they are
against but do not hold a reference count (as that would pin
the network namespace).   So we currently have a small window where
the reference count on a network namespace may be incremented when opening
a /proc file when it has already gone to zero.

To fix this introduce maybe_get_net and get_proc_net.

maybe_get_net increments the network namespace reference count only if it is
greater than zero, ensuring we don't increment a reference count after it
has gone to zero.

get_proc_net handles all of the magic to go from a proc inode to the network
namespace instance and call maybe_get_net on it.

PROC_NET, the old accessor, is removed so that we don't get confused and use
the wrong helper function.

Then I fix up the callers to use get_proc_net and handle the case
where get_proc_net returns NULL.  In that case I return -ENXIO because
effectively the network namespace has already gone away so the files
we are trying to access don't exist anymore.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Paul E. McKenney <paulmck@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

077130c0
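
The rule that 077130c0 states for maybe_get_net, "increment only if the count is still greater than zero", can be sketched in user space with C11 atomics. This is an illustration of the pattern, not the kernel's code: `toy_net` and `toy_maybe_get` are hypothetical names, and the sketch returns a boolean where the message describes the caller seeing NULL on failure.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical refcounted object standing in for a network namespace. */
struct toy_net {
    atomic_int count;
};

/* Take a reference only if the count has not already dropped to zero. */
static bool toy_maybe_get(struct toy_net *net)
{
    int old = atomic_load(&net->count);

    while (old > 0) {
        /*
         * Succeeds only if nobody changed the count in between. On failure
         * `old` is refreshed with the current value and the loop re-checks it.
         */
        if (atomic_compare_exchange_weak(&net->count, &old, old + 1))
            return true;
    }
    return false;   /* the object is already on its way out */
}
```

A plain unconditional increment would revive a count of zero, which is exactly the window the commit closes.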

[NETNS]: Fix export symbols. · 36ac3135

Committed by Daniel Lezcano on Sep 12, 2007

Add the appropriate EXPORT_SYMBOLS for proc_net_create,
proc_net_fops_create and proc_net_remove to fix errors when
compiling allmodconfig.

Signed-off-by: Mark Nelson <markn@au1.ibm.com>
Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

36ac3135

[NET]: Fix missed addition of fs/proc/proc_net.c · 3c12afe7

Committed by David S. Miller on Sep 12, 2007

My bad.

Signed-off-by: David S. Miller <davem@davemloft.net>

3c12afe7

[NET]: Make /proc/net per network namespace · 457c4cbc

Committed by Eric W. Biederman on Sep 12, 2007

This patch makes /proc/net per network namespace. It modifies the global
variables proc_net and proc_net_stat to be per network namespace.
The proc_net file helpers are modified to take a network namespace argument,
and all of their callers are fixed to pass &init_net for that argument.
This ensures that all of the /proc/net files are only visible and
usable in the initial network namespace until the code behind them
has been updated to handle multiple network namespaces.

Making /proc/net per namespace is necessary as at least some files
in /proc/net depend upon the set of network devices which is per
network namespace, and even more files in /proc/net have contents
that are relevant to a single network namespace.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

457c4cbc

10 Oct, 2007 1 commit

Rework /proc/locks via seq_files and seq_list helpers · 7f8ada98

Committed by Pavel Emelyanov on Oct 01, 2007

Currently /proc/locks is shown with a proc_read function, but its behavior
is rather complex as it has to manually handle current offset and buffer
length.  On the other hand, files that show objects from lists can be
easily reimplemented using the sequential files and the seq_list_XXX()
helpers.

This saves (as usual) 16 lines of code and more than 200 bytes from
the .text section.

[akpm@linux-foundation.org: no externs in C]
[akpm@linux-foundation.org: warning fixes]
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

7f8ada98

12 Sep, 2007 1 commit

Fix select on /proc files without ->poll · dd23aae4

Committed by Alexey Dobriyan on Sep 11, 2007

Taneli Vähäkangas <vahakang@cs.helsinki.fi> reported that commit
786d7e16 aka "Fix rmmod/read/write races
in /proc entries" broke SBCL + SLIME combo.

The old code in do_select() used DEFAULT_POLLMASK if it couldn't find a
->poll handler.  The new code makes ->poll always there and returns 0 by
default, which is not correct.  Return DEFAULT_POLLMASK instead.

Steps to reproduce:

	install emacs, SBCL, SLIME
	emacs
	M-x slime	in *inferior-lisp* buffer
	[watch it doing "Connecting to Swank on port X.."]

Please, apply before 2.6.23.

P.S.: why SBCL can't just read(2) /proc/cpuinfo is a mystery.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: T Taneli Vahakangas <vahakang@cs.helsinki.fi>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

dd23aae4
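
The behaviour dd23aae4 restores, falling back to a "ready" mask when the underlying file has no ->poll handler, can be sketched as a small user-space wrapper. Everything here is illustrative: `toy_file_ops`, `toy_proc_poll` and `TOY_DEFAULT_POLLMASK` are hypothetical stand-ins, and the stand-in mask is assumed to be a subset of what the kernel's DEFAULT_POLLMASK contains.

```c
#include <poll.h>
#include <stddef.h>

/* Stand-in for the kernel's DEFAULT_POLLMASK: readable and writable. */
#define TOY_DEFAULT_POLLMASK ((unsigned int)(POLLIN | POLLOUT))

/* Hypothetical operations table; the poll handler is optional. */
struct toy_file_ops {
    unsigned int (*poll)(void *file);   /* may be NULL */
};

/* Wrapper that is always present, as in the new /proc code path. */
static unsigned int toy_proc_poll(const struct toy_file_ops *ops, void *file)
{
    if (ops != NULL && ops->poll != NULL)
        return ops->poll(file);

    /* No handler: report the file as ready instead of returning 0. */
    return TOY_DEFAULT_POLLMASK;
}
```

Returning 0 from the wrapper would tell select() that the file is never ready, which is the regression the commit message describes.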

23 Aug, 2007 1 commit

sched: accounting regression since rc1 · efe567fc

Committed by Christian Borntraeger on Aug 23, 2007

Fix the accounting regression for CONFIG_VIRT_CPU_ACCOUNTING.  It
reverts parts of commit b27f03d4 by
converting fs/proc/array.c back to cputime_t.  The new functions
task_utime and task_stime now return cputime_t instead of clock_t.  If
CONFIG_VIRT_CPU_ACCOUNTING is set, task->utime and task->stime are
returned directly instead of using sum_exec_runtime.

Patch is tested on s390x with and without VIRT_CPU_ACCOUNTING as well as
on i386.

[ mingo@elte.hu: cleanups, comments. ]
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

efe567fc

01 Aug, 2007 1 commit

Fix leaks on /proc/{*/sched,sched_debug,timer_list,timer_stats} · 5ea473a1

Committed by Alexey Dobriyan on Jul 31, 2007

On every open/close one struct seq_operations leaks.
Kudos to /proc/slab_allocators.

Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

5ea473a1

29 Jul, 2007 1 commit

Fix procfs compat_ioctl regression · 778f3dd5

Committed by David Miller on Jul 27, 2007

It is important to only provide the compat_ioctl method
if the downstream de->proc_fops does too, otherwise this
utterly confuses the logic in fs/compat_ioctl.c and we
end up doing the wrong thing.

Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

778f3dd5

22 Jul, 2007 1 commit

x86_64: Avoid too many remote cpu references due to /proc/stat · c3508f8f

Committed by Ravikiran G Thirumalai on Jul 21, 2007

Too many remote cpu references due to /proc/stat.

On x86_64, with newer kernel versions, kstat_irqs is a bit of a problem.
On every call to kstat_irqs, the process brings in per-cpu data from all
online cpus.  Doing this for NR_IRQS, which is now 256 + 32 * NR_CPUS
results in (256+32*63) * 63 remote cpu references on a 64 cpu config.
/proc/stat is parsed by common commands like top, who, etc., causing lots
of cacheline transfers.

This statistic seems useless.  Other 'big iron' arches disable this.

AK: changed to remove for all SMP setups
AK: add comment

Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

c3508f8f

20 Jul, 2007 4 commits

mm: Remove slab destructors from kmem_cache_create(). · 20c2df83

Committed by Paul Mundt on Jul 20, 2007

Slab destructors were no longer supported after Christoph's
c59def9f change. They've been
BUGs for both slab and slub, and slob never supported them
either.

This rips out support for the dtor pointer from kmem_cache_create()
completely and fixes up every single callsite in the kernel (there were
about 224, not including the slab allocator definitions themselves,
or the documentation references).

Signed-off-by: Paul Mundt <lethal@linux-sh.org>

20c2df83

coredump masking: add an interface for core dump filter · 3cb4a0bb

Committed by Kawai, Hidehiro on Jul 19, 2007

This patch adds an interface to set/reset flags which determine whether each memory
segment should be dumped or not when a core file is generated.

The /proc/<pid>/coredump_filter file is provided to access the flags.  You can
read or change the flag status for a particular process by reading from or writing to
the file.

The flag status is inherited by the child process when it is created.

Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: David Howells <dhowells@redhat.com>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

3cb4a0bb

coredump masking: reimplementation of dumpable using two flags · 6c5d5238

Committed by Kawai, Hidehiro on Jul 19, 2007

This patch changes mm_struct.dumpable to a pair of bit flags.

set_dumpable() converts three-value dumpable to two flags and stores it into
lower two bits of mm_struct.flags instead of mm_struct.dumpable.
get_dumpable() behaves in the opposite way.

[akpm@linux-foundation.org: export set_dumpable]
Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: David Howells <dhowells@redhat.com>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

6c5d5238
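
Commit 6c5d5238 says the three-value dumpable setting is stored as two bits in the low end of mm_struct.flags, but the listing does not show the encoding. The sketch below assumes one plausible mapping (0 as no bits, 1 as the first bit, 2 as both bits); that mapping, the bit names and the `toy_` helpers are assumptions made for illustration, not taken from the patch.

```c
/* Hypothetical bit assignments for the two low flag bits. */
#define TOY_DUMPABLE_BIT      0x1UL
#define TOY_DUMP_SECURELY_BIT 0x2UL
#define TOY_DUMPABLE_MASK     (TOY_DUMPABLE_BIT | TOY_DUMP_SECURELY_BIT)

/* Encode the three-value setting (0, 1 or 2) into the two low bits of flags. */
static unsigned long toy_set_dumpable(unsigned long flags, int value)
{
    flags &= ~TOY_DUMPABLE_MASK;   /* leave all other flag bits untouched */

    switch (value) {
    case 1:
        flags |= TOY_DUMPABLE_BIT;
        break;
    case 2:
        flags |= TOY_DUMPABLE_BIT | TOY_DUMP_SECURELY_BIT;
        break;
    default:                       /* 0: both bits stay clear */
        break;
    }
    return flags;
}

/* Decode the two low bits back into the three-value setting. */
static int toy_get_dumpable(unsigned long flags)
{
    unsigned long bits = flags & TOY_DUMPABLE_MASK;

    return bits >= 2 ? 2 : (int)bits;
}
```

Under this assumed encoding the two helpers are inverses for the three legal values, and the remaining bits of the flags word stay free for other uses such as the core dump filter.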

Avoid too many remote cpu references due to /proc/stat · 4004c69a

Committed by Ravikiran G Thirumalai on Jul 19, 2007

Optimize show_stat to collect per-irq information just once.

On x86_64, with newer kernel versions, kstat_irqs is a bit of a problem.
On every call to kstat_irqs, the process brings in per-cpu data from all
online cpus. Doing this for NR_IRQS, which is now 256 + 32 * NR_CPUS
results in (256+32*63) * 63 remote cpu references on a 64 cpu config.
Considering the fact that we already compute this value per-cpu, we can
save on the remote references as below.

Signed-off-by: Alok N Kataria <alok.kataria@calsoftinc.com>
Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

4004c69a
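
The cost quoted in 4004c69a and the "collect once" idea can both be shown with a toy model. The sketch assumes a small fixed table of per-cpu, per-irq counters; `toy_kstat`, `toy_remote_refs` and `toy_collect` are hypothetical names, and the real show_stat() is not reproduced here.

```c
#include <stddef.h>

#define TOY_NR_CPUS 4
#define TOY_NR_IRQS 8

/* Hypothetical per-cpu, per-irq interrupt counters. */
static unsigned int toy_kstat[TOY_NR_CPUS][TOY_NR_IRQS];

/* Evaluates the expression from the commit message: (256 + 32 * n) * n. */
static unsigned long toy_remote_refs(unsigned long n)
{
    return (256 + 32 * n) * n;
}

/*
 * Single pass over the CPUs: fill per_irq_sum[] and return the grand total,
 * instead of walking every CPU again for each irq afterwards.
 */
static unsigned long toy_collect(unsigned int per_irq_sum[TOY_NR_IRQS])
{
    unsigned long total = 0;

    for (size_t irq = 0; irq < TOY_NR_IRQS; irq++)
        per_irq_sum[irq] = 0;

    for (size_t cpu = 0; cpu < TOY_NR_CPUS; cpu++) {
        for (size_t irq = 0; irq < TOY_NR_IRQS; irq++) {
            per_irq_sum[irq] += toy_kstat[cpu][irq];
            total += toy_kstat[cpu][irq];
        }
    }
    return total;
}
```

For the 64 cpu configuration in the message, `toy_remote_refs(63)` evaluates to 143136 references per read of /proc/stat.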

18 Jul, 2007 1 commit

kallsyms: make KSYM_NAME_LEN include space for trailing '\0' · 9281acea

Committed by Tejun Heo on Jul 17, 2007

KSYM_NAME_LEN is peculiar in that it does not include the space for the
trailing '\0', forcing all users to use KSYM_NAME_LEN + 1 when allocating
a buffer.  This is nonsense and error-prone.  Moreover, when the caller
forgets, it's very likely to subtly bite back by corrupting the stack
because the last position of the buffer is always cleared to zero.

This patch increments KSYM_NAME_LEN by one and updates code accordingly.

* off-by-one bug in asm-powerpc/kprobes.h::kprobe_lookup_name() macro
  is fixed.

* Where MODULE_NAME_LEN and KSYM_NAME_LEN were used together,
  MODULE_NAME_LEN was treated as if it didn't include space for the
  trailing '\0'.  Fix it.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Acked-by: Paulo Marques <pmarques@grupopie.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9281acea
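
The convention 9281acea moves to, where the length constant already includes room for the trailing '\0', can be sketched as follows. `TOY_KSYM_NAME_LEN` and its value are illustrative only, and `toy_copy_symbol` is a hypothetical helper rather than a kernel function.

```c
#include <string.h>

/* Illustrative length; like the patched constant it counts the trailing '\0'. */
#define TOY_KSYM_NAME_LEN 128

/* Copy a symbol name into a buffer declared with exactly TOY_KSYM_NAME_LEN bytes. */
static void toy_copy_symbol(char dst[TOY_KSYM_NAME_LEN], const char *src)
{
    strncpy(dst, src, TOY_KSYM_NAME_LEN - 1);

    /* The last position is always cleared, and it now lies inside the buffer. */
    dst[TOY_KSYM_NAME_LEN - 1] = '\0';
}
```

Under the old convention the terminator was written at index KSYM_NAME_LEN, so a caller that allocated only KSYM_NAME_LEN bytes had it land one byte past the end of the buffer.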

17 Jul, 2007 3 commits

move seccomp from /proc to a prctl · 1d9d02fe

Committed by Andrea Arcangeli on Jul 15, 2007

This reduces the memory footprint and it enforces that only the current
task can enable seccomp on itself (this is a requirement for a
straightforward [modulo preempt ;) ] TIF_NOTSC implementation).

Signed-off-by: Andrea Arcangeli <andrea@cpushare.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

1d9d02fe

taskstats: add context-switch counters · b663a79c

Committed by Maxim Uvarov on Jul 15, 2007

Make available to the user the following task and process performance
statistics:

	* Involuntary Context Switches (task_struct->nivcsw)
	* Voluntary Context Switches (task_struct->nvcsw)

Statistics information is available from:
	1. taskstats interface (Documentation/accounting/)
	2. /proc/PID/status (task only).

This data is useful for detecting hyperactivity patterns between processes.

[akpm@linux-foundation.org: cleanup]
Signed-off-by: Maxim Uvarov <muvarov@ru.mvista.com>
Cc: Shailabh Nagar <nagar@watson.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Jay Lan <jlan@engr.sgi.com>
Cc: Jonathan Lim <jlim@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

b663a79c
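
Reading the counters that b663a79c exposes through /proc/PID/status comes down to finding a "key: value" line. The sketch below parses an in-memory copy of such a file; `toy_status_field` is a hypothetical helper, and the field names used in the usage note (`voluntary_ctxt_switches`, `nonvoluntary_ctxt_switches`) are assumed to match what this commit prints.

```c
#include <stdio.h>
#include <string.h>

/*
 * Find "key:" at the start of a line in `status` and parse the unsigned
 * number that follows. Returns 0 on success, -1 if the key is missing or
 * has no numeric value.
 */
static int toy_status_field(const char *status, const char *key, unsigned long *out)
{
    size_t klen = strlen(key);
    const char *line = status;

    while (line != NULL && *line != '\0') {
        if (strncmp(line, key, klen) == 0 && line[klen] == ':') {
            /* %lu skips the tab or spaces between the colon and the number. */
            return sscanf(line + klen + 1, "%lu", out) == 1 ? 0 : -1;
        }
        line = strchr(line, '\n');
        if (line != NULL)
            line++;   /* step past the newline to the next line */
    }
    return -1;
}
```

Because the key is matched only at the start of a line, asking for `voluntary_ctxt_switches` does not accidentally match the `nonvoluntary_ctxt_switches` line.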

/proc/*/environ: wrong placing of ptrace_may_attach() check · da58a161

Committed by Alexey Dobriyan on Jul 15, 2007

It's a bit dopey-looking and can permit a task to cause a pagefault in an mm
which it doesn't have permission to read from.

Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

da58a161
