Commits · 51f72f4a0f92e4abde33a8bca0fac9667575d035 · openanolis / cloud-kernel

02 Feb, 2012 1 commit

sysctl: An easier to read version of find_subdir · 51f72f4a

Authored by Eric W. Biederman on Jan 30, 2012

Suggested-by: Lucian Adrian Grijincu <lucian.grijincu@gmail.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>

31 Jan, 2012 2 commits

sysctl: fix memset parameters in setup_sysctl_set() · 1347440d

Authored by Dan Carpenter on Jan 30, 2012

The current code is a nop.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>

sysctl: remove an unused variable · 47981787

Authored by Dan Carpenter on Jan 30, 2012

"links" is never used, so we can remove it.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>

25 Jan, 2012 28 commits

sysctl: Add register_sysctl for normal sysctl users · fea478d4

Authored by Eric W. Biederman on Jan 20, 2012

The plan is to convert all callers of register_sysctl_table
and register_sysctl_paths to register_sysctl.  The interface
to register_sysctl is enough nicer that this should make the callers
a bit more readable.  Additionally, after the conversion the
230 lines of backwards compatibility can be removed.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>

sysctl: Index sysctl directories with rbtrees. · ac13ac6f

Authored by Eric W. Biederman on Jan 09, 2012

One of the most important jobs of sysctl is to export network stack
tunables.  Several of those tunables are per network device.  In
several instances people are running with 1000+ network devices in
their network stacks, which makes the simple per directory linked list
in sysctl a scaling bottleneck.  Replace O(N^2) sysctl insertion and
lookup times with O(NlogN) by using an rbtree to index the sysctl
directories.

Benchmark before:
    make-dummies 0 999 -> 0.32s
    rmmod dummy        -> 0.12s
    make-dummies 0 9999 -> 1m17s
    rmmod dummy         -> 17s

Benchmark after:
    make-dummies 0 999 -> 0.074s
    rmmod dummy        -> 0.070s
    make-dummies 0 9999 -> 3.4s
    rmmod dummy         -> 0.44s

Benchmark after (without dev_snmp6):
    make-dummies 0 9999 -> 0.75s
    rmmod dummy         -> 0.44s
    make-dummies 0 99999 -> 11s
    rmmod dummy          -> 4.3s

At 10,000 dummy devices the bottleneck becomes the time to add and
remove the files under /proc/sys/net/dev_snmp6.  I have commented
out the code that adds and removes files under /proc/sys/net/dev_snmp6
and taken measurements of creating and destroying 100,000 dummies to
verify the sysctl continues to scale.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
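The scaling argument in ac13ac6f can be illustrated in userspace. The sketch below is not kernel code: it uses a sorted Python list with binary search as a stand-in for the rbtree (both give logarithmic lookups), and the device names are hypothetical, chosen to mirror the "make-dummies 0 9999" benchmark. It only models lookup cost, not insertion.

```python
import bisect

def linear_find(names, target):
    """Scan an unsorted list, as a per-directory linked list would.
    Returns (found, number_of_entries_examined)."""
    for examined, name in enumerate(names, start=1):
        if name == target:
            return True, examined
    return False, len(names)

def sorted_find(sorted_names, target):
    """Binary search in a sorted list; a stand-in for an rbtree lookup.
    Returns (found, upper bound on comparisons, i.e. ceil(log2(n + 1)))."""
    i = bisect.bisect_left(sorted_names, target)
    found = i < len(sorted_names) and sorted_names[i] == target
    return found, max(1, len(sorted_names).bit_length())

# Hypothetical device names, one sysctl directory per device.
names = ["dummy%d" % i for i in range(10000)]
index = sorted(names)

print(linear_find(names, "dummy9999"))   # examines every entry
print(sorted_find(index, "dummy9999"))   # bounded by log2 of the entry count
```

Registering N devices with a linear scan per registration costs on the order of N^2 entry visits in total, while an ordered index keeps it near N log N, which matches the direction of the benchmark numbers quoted in the commit message.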

sysctl: Make the header lists per directory. · 9e3d47df

Authored by Eric W. Biederman on Jan 07, 2012

Slightly enhance efficiency and clarity of the code by making the
header list per directory instead of per set.

Benchmark before:
    make-dummies 0 999 -> 0.63s
    rmmod dummy        -> 0.12s
    make-dummies 0 9999 -> 2m35s
    rmmod dummy         -> 18s

Benchmark after:
    make-dummies 0 999 -> 0.32s
    rmmod dummy        -> 0.12s
    make-dummies 0 9999 -> 1m17s
    rmmod dummy         -> 17s
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>

sysctl: Move sysctl_check_dups into insert_header · e54012ce

Authored by Eric W. Biederman on Jan 18, 2012

Simplify the callers of insert_header by removing explicit calls to check
for duplicates and instead have insert_header do the work.

This makes the code slightly more maintainable by enabling changes to
data structures where the insertion of new entries without duplicate
suppression is not possible.

There is not always a convenient path string where insert_header
is called so modify sysctl_check_dups to use sysctl_print_dir
when printing the full path when a duplicate is discovered.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>

sysctl: Modify __register_sysctl_paths to take a set instead of a root and an nsproxy · 60a47a2e

Authored by Eric W. Biederman on Jan 08, 2012

An nsproxy argument here has always been awkward, and now the nsproxy argument
is completely unnecessary, so remove it, replacing it with the set we want
the registered tables to show up in.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>

sysctl: Replace root_list with links between sysctl_table_sets. · 0e47c99d

Authored by Eric W. Biederman on Jan 07, 2012

Piecing together directories by looking first in one directory
tree, then in another directory tree and finally in a third
directory tree makes it hard to verify that some directory
entries are not multiply defined and makes it hard to create
efficient implementations of the sysctl filesystem.

Replace the sysctl wide list of roots with autogenerated
links from the core sysctl directory tree to the other
sysctl directory trees.

This simplifies sysctl directory reading and lookups as now
only entries in a single sysctl directory tree need to be
considered.

Benchmark before:
    make-dummies 0 999 -> 0.44s
    rmmod dummy        -> 0.065s
    make-dummies 0 9999 -> 1m36s
    rmmod dummy         -> 0.4s

Benchmark after:
    make-dummies 0 999 -> 0.63s
    rmmod dummy        -> 0.12s
    make-dummies 0 9999 -> 2m35s
    rmmod dummy         -> 18s

The slowdown is caused by the lookups used in insert_headers
and put_links to see if we need to add links or remove links.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>

sysctl: Add sysctl_print_dir and use it in get_subdir · 6980128f

Authored by Eric W. Biederman on Jan 21, 2012

When there are errors it is very nice to know the full sysctl path.
Add a simple function that computes the sysctl path and prints it
out.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>

sysctl: Stop requiring explicit management of sysctl directories · 7ec66d06

Authored by Eric W. Biederman on Dec 29, 2011

Simplify the code and the sysctl semantics by autogenerating
sysctl directories when a sysctl table is registered that needs
the directories and autodeleting the directories when there are
no more sysctl tables registered that need them.

Autogenerating directories keeps sysctl tables from depending
on each other, removing all of the arcane register/unregister
ordering constraints and makes it impossible to get the order
wrong when registering and unregistering sysctl tables.

Autogenerating directories yields one unique entity that dentries
can point to, retaining the current effective use of the dcache.

Add struct ctl_dir as the type of these new autogenerated
directories.

The attached_by and attached_to fields in ctl_table_header are
removed as they are no longer needed.

The child field in ctl_table is no longer needed by the core of
the sysctl code.  ctl_table.child can be removed once all of the
existing users have been updated.

Benchmark before:
    make-dummies 0 999 -> 0.7s
    rmmod dummy        -> 0.07s
    make-dummies 0 9999 -> 1m10s
    rmmod dummy         -> 0.4s

Benchmark after:
    make-dummies 0 999 -> 0.44s
    rmmod dummy        -> 0.065s
    make-dummies 0 9999 -> 1m36s
    rmmod dummy         -> 0.4s
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>

sysctl: Add a root pointer to ctl_table_set · 9eb47c26

Authored by Eric W. Biederman on Jan 22, 2012

Add a ctl_table_root pointer to ctl_table_set so it is easy to
go from a ctl_table_set to a ctl_table_root.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>

sysctl: Rewrite proc_sys_readdir in terms of first_entry and next_entry · 6a75ce16

Authored by Eric W. Biederman on Jan 18, 2012

Replace sysctl_head_next with first_entry and next_entry.  These new
iterators operate at the level of sysctl table entries and filter
out any sysctl tables that should not be shown.

Utilizing two specialized functions instead of a single function removes
conditionals for handling awkward special cases that only come up
at the beginning of iteration, making the iterators easier to read
and understand.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>

sysctl: Rewrite proc_sys_lookup introducing find_entry and lookup_entry. · 076c3eed

Authored by Eric W. Biederman on Jan 09, 2012

Replace the helpers that proc_sys_lookup uses with helpers that work
in terms of an entire sysctl directory.  This is worse for sysctl_lock
hold times but it is much better for code clarity and the code cleanups
to come.

find_in_table is no longer needed so it is removed.

find_entry, a general helper to find entries in a directory, is added.

lookup_entry is a simple wrapper around find_entry that takes the
sysctl_lock, increases the use count if an entry is found, and drops
the sysctl_lock.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>

sysctl: Normalize the root_table data structure. · a194558e

Authored by Eric W. Biederman on Jan 21, 2012

Every other directory has a .child member and we look at the .child
for our entries.  Do the same for the root_table.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>

sysctl: Factor out insert_header and erase_header · 8425d6aa

Authored by Eric W. Biederman on Jan 09, 2012

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>

sysctl: Factor out init_header from __register_sysctl_paths · e0d04529

Authored by Eric W. Biederman on Jan 09, 2012

Factor out a routine to initialize the sysctl_table_header.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>

sysctl: Initial support for auto-unregistering sysctl tables. · 938aaa4f

Authored by Eric W. Biederman on Jan 09, 2012

Add nreg to ctl_table_header.  When nreg drops to 0 the ctl_table_header
will be unregistered.

Factor out drop_sysctl_table from unregister_sysctl_table, and add
the logic for decrementing nreg.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>

sysctl: A more obvious version of grab_header. · 3cc3e046

Authored by Eric W. Biederman on Jan 07, 2012

Instead of relying on sysctl_head_next(NULL) to magically
return the right header for the root directory,
explicitly transform NULL into the root directory's header.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>

sysctl: Remove the now unused ctl_table parent field. · 8d6ecfcc

Authored by Eric W. Biederman on Jan 06, 2012

While useful at one time for selinux and the sysctl sanity
checks those users no longer use the parent field and we can
safely remove it.
Inspired-by: Lucian Adrian Grijincu <lucian.grijincu@gmil.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>

sysctl: Improve the sysctl sanity checks · 7c60c48f

Authored by Eric W. Biederman on Jan 21, 2012

- Stop validating subdirectories now that we only register leaf tables

- Clean up and improve the duplicate filename check.
  * Run the duplicate filename check under the sysctl_lock to guarantee
    we never add duplicate names.
  * Reduce the duplicate filename check to nearly O(M*N) where M is the
    number of entries in the table we are registering and N is the
    number of entries in the directory before we got there.

- Move the duplicate filename check into its own function and call
  it directly from __register_sysctl_table

- Kill the config option as the sanity checks are now cheap enough
  that the config option is unnecessary. The original reason for the config
  option was because we had a huge table used to verify the proc filename
  to binary sysctl mapping.  That table has now evolved into the binary_sysctl
  translation layer and is no longer part of the sysctl_check code.

- Tighten up the permission checks, guaranteeing that files only have read
  or write permissions.

- Removed redundant check for parents having a procname as now everything has
  a procname.

- Generalize the backtrace logic so that we print a backtrace from
  any failure of __register_sysctl_table that was not caused by
  a memory allocation failure.  The backtrace allows us to track
  down who erroneously registered a sysctl table.

Benchmark before (CONFIG_SYSCTL_CHECK=y):
    make-dummies 0 999 -> 12s
    rmmod dummy        -> 0.08s

Benchmark before (CONFIG_SYSCTL_CHECK=n):
    make-dummies 0 999 -> 0.7s
    rmmod dummy        -> 0.06s
    make-dummies 0 99999 -> 1m13s
    rmmod dummy          -> 0.38s

Benchmark after:
    make-dummies 0 999 -> 0.65s
    rmmod dummy        -> 0.055s
    make-dummies 0 9999 -> 1m10s
    rmmod dummy         -> 0.39s

The sysctl sanity checks now impose no measurable cost.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>

sysctl: register only tables of sysctl files · f728019b

Authored by Eric W. Biederman on Jan 22, 2012

Split the registration of a complex ctl_table array which may have
arbitrary numbers of directories (->child != NULL) and tables of files
into a series of simpler registrations that only register tables of files.

Graphically:

   register('dir', { + file-a
                     + file-b
                     + subdir1
                       + file-c
                     + subdir2
                       + file-d
                       + file-e })

is transformed into:
   wrapper->subheaders[0] = register('dir', {file-a, file-b})
   wrapper->subheaders[1] = register('dir/subdir1', {file-c})
   wrapper->subheaders[2] = register('dir/subdir2', {file-d, file-e})
   return wrapper

This guarantees that __register_sysctl_table will only see a simple
ctl_table array with all entries having (->child == NULL).

Care was taken to pass the original simple ctl_table arrays to
__register_sysctl_table whenever possible.

This change is derived from a similar patch written
by Lucian Grijincu.
Inspired-by: Lucian Adrian Grijincu <lucian.grijincu@gmail.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
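The transformation described in f728019b can be sketched in a few lines of userspace Python. This is an illustration only, not the kernel implementation: the nested-list representation, the `flatten` helper and the choice to skip directories that contain no files are all assumptions made for the sketch. A `(name, child_table)` tuple stands in for an entry with `->child != NULL`.

```python
def flatten(path, table):
    """Split a nested table into (directory_path, [file names]) pairs.

    `table` is a list whose items are either a file name (str) or a
    (subdir_name, child_table) tuple standing in for ->child != NULL.
    """
    files = [entry for entry in table if isinstance(entry, str)]
    # One registration per directory that directly holds files.
    registrations = [(path, files)] if files else []
    for entry in table:
        if not isinstance(entry, str):
            subdir, child = entry
            registrations.extend(flatten(path + "/" + subdir, child))
    return registrations

nested = ["file-a", "file-b",
          ("subdir1", ["file-c"]),
          ("subdir2", ["file-d", "file-e"])]

# Each element corresponds to one wrapper->subheaders[i] in the commit text.
subheaders = flatten("dir", nested)
for directory, files in subheaders:
    print(directory, files)
```

Every resulting registration carries only file entries, which is the property the commit relies on when it says __register_sysctl_table will only see simple arrays.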

sysctl: Add ctl_table chains into cstring paths · ec6a5266

Authored by Eric W. Biederman on Jan 21, 2012

For any component of table passed to __register_sysctl_paths
that actually serves as a path, add that to the cstring path
that is passed to __register_sysctl_table.

The result is that for most calls to __register_sysctl_paths
we only pass a table to __register_sysctl_table that contains
no child directories.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>

sysctl: Add support for register sysctl tables with a normal cstring path. · 6e9d5164

Authored by Eric W. Biederman on Jan 21, 2012

Make __register_sysctl_table the core sysctl registration operation and
make it take a char * string as path.

Now that binary paths have been banished into the realm of backwards
compatibility in kernel/binary_sysctl.c where they can be safely
ignored there is no longer a need to use struct ctl_path to represent
path names when registering ctl_tables.

Start the transition to using normal char * strings to represent
pathnames when registering sysctl tables.  Normal strings are easier
to deal with both in the internal sysctl implementation and for
programmers registering sysctl tables.

__register_sysctl_paths is turned into a backwards compatibility wrapper
that converts a ctl_path array into a normal char * string.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>

sysctl: Create local copies of directory names used in paths · f05e53a7

Authored by Eric W. Biederman on Jan 21, 2012

Creating local copies of directory names is a good idea for
two reasons.
- The dynamic names used by callers must be copied into new
  strings by the callers today to ensure the strings do not
  change between register and unregister of the sysctl table.

- Sysctl directories have a potentially different lifetime
  than the time between register and unregister of any
  particular sysctl table.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>

sysctl: Remove the unnecessary sysctl_set parent concept. · bd295b56

Authored by Eric W. Biederman on Jan 22, 2012

In sysctl_net register the two networking roots in the proper order.

In register_sysctl walk the sysctl sets in the reverse order of the
sysctl roots.

Remove parent from ctl_table_set and setup_sysctl_set as it is no
longer needed.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>

sysctl: Implement retire_sysctl_set · 97324cd8

Authored by Eric W. Biederman on Jan 09, 2012

This adds a small helper retire_sysctl_set to remove the intimate knowledge about
how a sysctl_set is implemented from net/sysctl_net.c
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>

sysctl: Make the directories have nlink == 1 · a15e2098

Authored by Eric W. Biederman on Jan 08, 2012

I goofed when I made sysctl directories have nlink == 0.
nlink == 0 means the directory has been deleted.
nlink == 1 means a directory does not count subdirectories.

Use the default nlink == 1 for sysctl directories.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>

sysctl: Move the implementation into fs/proc/proc_sysctl.c · 1f87f0b5

Authored by Eric W. Biederman on Jan 06, 2012

Move the core sysctl code from kernel/sysctl.c and kernel/sysctl_check.c
into fs/proc/proc_sysctl.c.

Currently sysctl maintenance is hampered by the sysctl implementation
being split across 3 files with artificial layering between them.
Consolidate the entire sysctl implementation into 1 file so that
it is easier to see what is going on and hopefully allowing for
simpler maintenance.

For functions that are now only used in fs/proc/proc_sysctl.c remove
their declarations from sysctl.h and make them static in fs/proc/proc_sysctl.c
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>

sysctl: Register the base sysctl table like any other sysctl table. · de4e83bd

Authored by Eric W. Biederman on Jan 06, 2012

Simplify the code by treating the base sysctl table like any other
sysctl table and register it with register_sysctl_table.

To ensure this table is registered early enough to avoid problems
call sysctl_init from proc_sys_init.

Rename sysctl_net.c:sysctl_init() to net_sysctl_init() to avoid
name conflicts now that kernel/sysctl.c:sysctl_init() is no longer
static.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>

sysctl: remove impossible condition check · 36885d7b

Authored by Lucas De Marchi on Jun 10, 2011

Remove checks for conditions that will never happen. If procname is NULL
the loop would already have bailed out, so there's no need to check it
again.

At the same time this also compacts the function find_in_table() by
refactoring it to be easier to read.
Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
Reviewed-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>

18 Jan, 2012 3 commits

proc: clean up and fix /proc/<pid>/mem handling · e268337d

Authored by Linus Torvalds on Jan 17, 2012

Jüri Aedla reported that the /proc/<pid>/mem handling really isn't very
robust, and it also doesn't match the permission checking of any of the
other related files.

This changes it to do the permission checks at open time, and instead of
tracking the process, it tracks the VM at the time of the open.  That
simplifies the code a lot, but does mean that if you hold the file
descriptor open over an execve(), you'll continue to read from the _old_
VM.

That is different from our previous behavior, but much simpler.  If
somebody actually finds a load where this matters, we'll need to revert
this commit.

I suspect that nobody will ever notice - because the process mapping
addresses will also have changed as part of the execve.  So you cannot
actually usefully access the fd across a VM change simply because all
the offsets for IO would have changed too.
Reported-by: Jüri Aedla <asd@ut.ee>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

audit: only allow tasks to set their loginuid if it is -1 · 633b4545

Authored by Eric Paris on Jan 03, 2012

At the moment we allow tasks to set their loginuid if they have
CAP_AUDIT_CONTROL. In reality we want tasks to set the loginuid when they
log in and for it to be impossible to ever reset. We had to make it mutable even
after it was once set (with the CAP) because on update an admin might have
to restart sshd. Now sshd would get his loginuid and the next user who
logged in using ssh would not be able to set his loginuid.

Systemd has changed how userspace works and allowed us to make the kernel
work the way it should. With systemd users (even admins) are not supposed
to restart services directly. The system will restart the service for
them. Thus since systemd is going to have loginuid==-1, sshd would get -1, and
sshd would be allowed to set a new loginuid without special permissions.

If an admin in this system were to manually start an sshd he is inserting
himself into the system chain of trust and thus, logically, it's his
loginuid that should be used! Since we have old systems I make this a
Kconfig option.
Signed-off-by: Eric Paris <eparis@redhat.com>

audit: remove task argument to audit_set_loginuid · 0a300be6

Authored by Eric Paris on Jan 03, 2012

The function always deals with current.  Don't expose an option
pretending one can use it for something.  You can't.
Signed-off-by: Eric Paris <eparis@redhat.com>

16 Jan, 2012 1 commit

sched/accounting, proc: Fix /proc/stat interrupts sum · f7e6746e

Authored by Russell King on Jan 14, 2012

Commit 3292beb3 ("sched/accounting: Change cpustat fields to an array")
deleted the code which provides us with the sum of all interrupts in the
system, causing vmstat to report zero interrupts occurring in the system.

Fix this by restoring the code.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Tested-by: Russell King <rmk+kernel@arm.linux.org.uk> # [on ARM]
Tested-by: Tony Luck <tony.luck@intel.com>
Tested-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Glauber Costa <glommer@parallels.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
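For readers who want to see what the restored sum looks like from userspace, here is a small sketch of reading the "intr" line of /proc/stat. It assumes the layout documented in proc(5), where the first number after "intr" is the total of all interrupts and the remaining numbers are per-interrupt counts. The sample text is synthetic, and `parse_intr` is a name made up for this example.

```python
def parse_intr(stat_text):
    """Return (total, per_irq_counts) from the "intr" line of /proc/stat text."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "intr":
            counts = [int(x) for x in fields[1:]]
            # First value is the system-wide sum; the rest are per-IRQ counts.
            return counts[0], counts[1:]
    raise ValueError("no intr line found")

# Synthetic sample; on a real system, read the text from /proc/stat instead.
sample = "cpu  10 0 5 100 0 0 0 0 0 0\nintr 42 30 0 12\nctxt 1000\n"
total, per_irq = parse_intr(sample)
print(total, per_irq)
```

With the regression described above, a tool reading the first value would have seen zero even though the per-interrupt counts were non-zero.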

13 Jan, 2012 2 commits

c/r: procfs: add start_data, end_data, start_brk members to /proc/$pid/stat v4 · b3f7f573

Authored by Cyrill Gorcunov on Jan 12, 2012

The mm->start_code/end_code, mm->start_data/end_data, mm->start_brk are
involved in the calculation of program text/data segment sizes (which might
be seen in /proc/<pid>/statm) and in the final address of the brk() call.

For restore we need to know all these values.  While
mm->start_code/end_code are already present in /proc/$pid/stat, the remaining
members are not, so this patch brings them in.

The restore procedure of these members is addressed in another patch using
prctl().
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Andrew Vagin <avagin@openvz.org>
Cc: Vasiliy Kulikov <segoon@openwall.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
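A checkpoint tool would read the new members back out of /proc/$pid/stat. The sketch below assumes the field positions documented in proc(5), where start_data, end_data and start_brk are fields 45, 46 and 47. The helper name and the synthetic input line are made up for illustration; this is not taken from any particular checkpoint/restore tool.

```python
def parse_stat_segments(stat_line):
    """Extract start_data, end_data and start_brk from a /proc/<pid>/stat line."""
    # comm (field 2) is wrapped in parentheses and may itself contain spaces
    # or ')', so split on the last ')' rather than on whitespace.
    rest = stat_line[stat_line.rindex(")") + 1:].split()

    def field(n):
        # rest[0] is field 3 (state), so field n lives at rest[n - 3].
        return int(rest[n - 3])

    return {"start_data": field(45), "end_data": field(46), "start_brk": field(47)}

# Synthetic line: every numeric field simply holds its own field number.
sample = "1234 (my prog) S " + " ".join(str(n) for n in range(4, 53))
segments = parse_stat_segments(sample)
print(segments)
```

Splitting on the last closing parenthesis matters because a process name such as "my prog" would otherwise shift every later field by one.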

proc: fix null pointer deref in proc_pid_permission() · a2ef990a

Authored by Xiaotian Feng on Jan 12, 2012

get_proc_task() can fail to find the task and return NULL;
put_task_struct() will then bomb the kernel with the following oops:

  BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
  IP: [<ffffffff81217d34>] proc_pid_permission+0x64/0xe0
  PGD 112075067 PUD 112814067 PMD 0
  Oops: 0002 [#1] PREEMPT SMP

This is a regression introduced by commit 0499680a ("procfs: add hidepid=
and gid= mount options").  The kernel should return -ESRCH if
get_proc_task() failed.
Signed-off-by: Xiaotian Feng <dannyfeng@tencent.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Vasiliy Kulikov <segoon@openwall.com>
Cc: Stephen Wilson <wilsons@start.ca>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

11 Jan, 2012 3 commits

procfs: add hidepid= and gid= mount options · 0499680a

Authored by Vasiliy Kulikov on Jan 10, 2012

Add support for mount options to restrict access to /proc/PID/
directories.  The default backward-compatible "relaxed" behaviour is left
untouched.

The first mount option is called "hidepid" and its value defines how much
info about processes we want to be available for non-owners:

hidepid=0 (default) means the old behavior - anybody may read all
world-readable /proc/PID/* files.

hidepid=1 means users may not access any /proc/<pid>/ directories other than
their own.  Sensitive files like cmdline, sched*, status are now protected
against other users.  As permission checking is done in proc_pid_permission()
and files' permissions are left untouched, programs expecting specific
files' modes are not confused.

hidepid=2 means hidepid=1 plus all /proc/PID/ will be invisible to other
users.  It doesn't mean that it hides whether a process exists (it can be
learned by other means, e.g.  by kill -0 $PID), but it hides process' euid
and egid.  It complicates an intruder's task of gathering info about running
processes, whether some daemon runs with elevated privileges, whether
another user runs some sensitive program, whether other users run any
program at all, etc.

gid=XXX defines a group that will be able to gather all processes' info
(as in hidepid=0 mode).  This group should be used instead of putting
nonroot user in sudoers file or something.  However, untrusted users (like
daemons, etc.) which are not supposed to monitor the tasks in the whole
system should not be added to the group.

hidepid=1 or higher is designed to restrict access to procfs files, which
might reveal some sensitive private information like precise keystrokes
timings:

http://www.openwall.com/lists/oss-security/2011/11/05/3

hidepid=1/2 doesn't break monitoring userspace tools.  ps, top, pgrep, and
conky gracefully handle EPERM/ENOENT and behave as if the current user is
the only user running processes.  pstree shows the process subtree which
contains "pstree" process.

Note: the patch doesn't deal with setuid/setgid issues of keeping
preopened descriptors of procfs files (like
https://lkml.org/lkml/2011/2/7/368).  We rely on the fact that the leaked
information like the scheduling counters of setuid apps doesn't threaten
anybody's privacy - only the user who started the setuid program may read the
counters.
Signed-off-by: Vasiliy Kulikov <segoon@openwall.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Randy Dunlap <rdunlap@xenotime.net>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Greg KH <greg@kroah.com>
Cc: Theodore Tso <tytso@MIT.EDU>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: James Morris <jmorris@namei.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
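The three hidepid levels and the gid= exemption described in 0499680a can be summarized as a small decision function. This is a simplified model written for illustration, not the kernel's permission check: it reduces the caller to two booleans (whether it owns the process, and whether it belongs to the gid= group) and ignores root and capabilities. The function and parameter names are hypothetical.

```python
def proc_pid_access(hidepid, is_owner, in_gid_group):
    """Return (visible_in_listing, may_open_directory) for a /proc/PID entry."""
    if hidepid not in (0, 1, 2):
        raise ValueError("hidepid must be 0, 1 or 2")
    # Owners, members of the gid= group, and everyone under hidepid=0
    # keep the old, relaxed behaviour.
    if is_owner or in_gid_group or hidepid == 0:
        return True, True
    # hidepid=1: the entry is still listed, but its contents are off limits.
    # hidepid=2: the entry is also hidden from directory listings.
    return hidepid == 1, False

for level in (0, 1, 2):
    print(level, proc_pid_access(level, is_owner=False, in_gid_group=False))
```

The split between "visible" and "may open" mirrors the EPERM versus ENOENT distinction that the commit message says tools like ps and top already handle.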

procfs: parse mount options · 97412950

Authored by Vasiliy Kulikov on Jan 10, 2012

Add support for procfs mount options.  Actual mount options are coming in
the next patches.
Signed-off-by: Vasiliy Kulikov <segoon@openwall.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Randy Dunlap <rdunlap@xenotime.net>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Greg KH <greg@kroah.com>
Cc: Theodore Tso <tytso@MIT.EDU>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: James Morris <jmorris@namei.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
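To give a feel for what parsing procfs mount options involves, here is a userspace sketch of a comma-separated option parser. It is not the kernel parser: it borrows the hidepid= and gid= option names from the follow-up commit 0499680a, and the strict rejection of unknown or malformed options is an assumption made for this example. `parse_proc_options` is a hypothetical name.

```python
def parse_proc_options(options):
    """Parse a comma-separated procfs option string into a dict."""
    parsed = {"hidepid": 0, "gid": None}  # hidepid=0 is the documented default
    for item in filter(None, options.split(",")):
        key, sep, value = item.partition("=")
        if key not in parsed or not sep or not value.isdigit():
            raise ValueError("unrecognized or malformed option: %r" % item)
        parsed[key] = int(value)
    if parsed["hidepid"] not in (0, 1, 2):
        raise ValueError("hidepid out of range: %d" % parsed["hidepid"])
    return parsed

print(parse_proc_options("hidepid=2,gid=1000"))
print(parse_proc_options(""))
```

An empty option string yields the defaults, which corresponds to the backward-compatible "relaxed" behaviour.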

procfs: introduce the /proc/<pid>/map_files/ directory · 640708a2

Authored by Pavel Emelyanov on Jan 10, 2012

This one behaves similarly to the /proc/<pid>/fd/ one - it contains
symlinks, one for each mapping with a file; the name of a symlink is
"vma->vm_start-vma->vm_end" and the target is the file.  Opening a symlink
results in a file that points to exactly the same inode as the vma's one.

For example, the ls -l of some arbitrary /proc/<pid>/map_files/:

 | lr-x------ 1 root root 64 Aug 26 06:40 7f8f80403000-7f8f80404000 -> /lib64/libc-2.5.so
 | lr-x------ 1 root root 64 Aug 26 06:40 7f8f8061e000-7f8f80620000 -> /lib64/libselinux.so.1
 | lr-x------ 1 root root 64 Aug 26 06:40 7f8f80826000-7f8f80827000 -> /lib64/libacl.so.1.1.0
 | lr-x------ 1 root root 64 Aug 26 06:40 7f8f80a2f000-7f8f80a30000 -> /lib64/librt-2.5.so
 | lr-x------ 1 root root 64 Aug 26 06:40 7f8f80a30000-7f8f80a4c000 -> /lib64/ld-2.5.so

This *helps* the checkpointing process in three ways:

1. When dumping a task's mappings we know the exact file that is mapped
   by a particular region.  We do this by opening the
   /proc/$pid/map_files/$address symlink the way we do with file
   descriptors.

2. This also helps in determining which anonymous shared mappings are
   shared with each other by comparing their inodes.

3. When restoring a set of processes, in case two of them have a shared
   mapping, we map the memory by the 1st one and then open its
   /proc/$pid/map_files/$address file and map it by the 2nd task.

Using /proc/$pid/maps for this is quite inconvenient since it requires
repeated re-reading and reparsing of this text file, which slows down the
restore procedure significantly.  Also, as pointed out in (3), it is far
easier to use a top level shared mapping in children as
/proc/$pid/map_files/$address when needed.

[akpm@linux-foundation.org: coding-style fixes]
[gorcunov@openvz.org: make map_files depend on CHECKPOINT_RESTORE]
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Reviewed-by: Vasiliy Kulikov <segoon@openwall.com>
Reviewed-by: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Tejun Heo <tj@kernel.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
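The entry names under /proc/<pid>/map_files/ encode the mapping range, so a tool can recover the addresses without parsing /proc/$pid/maps. The sketch below assumes the "<start>-<end>" hexadecimal format shown in the ls -l listing of the commit message; the helper name is hypothetical, and the range check is an extra sanity test added for the example.

```python
def parse_map_files_name(name):
    """Split a map_files entry name into (vm_start, vm_end) integers."""
    start, sep, end = name.partition("-")
    if not sep:
        raise ValueError("expected '<start>-<end>', got %r" % name)
    # Names are hexadecimal without a 0x prefix, as in the listing above.
    vm_start, vm_end = int(start, 16), int(end, 16)
    if vm_start >= vm_end:
        raise ValueError("empty or inverted range: %r" % name)
    return vm_start, vm_end

# Entry name taken from the example listing in the commit message.
start, end = parse_map_files_name("7f8f80403000-7f8f80404000")
size = end - start
print(hex(start), hex(end), size)
```

On a kernel with this feature, the names could be obtained with `os.listdir("/proc/<pid>/map_files")` and the targets with `os.readlink`, subject to the usual permission checks.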

openanolis / cloud-kernel synced successfully more than 1 year ago