提交 · 8bbf4976b59fc9fc2861e79cab7beb3f6d647640 · openeuler / Kernel

14 11月, 2008 4 次提交

KEYS: Alter use of key instantiation link-to-keyring argument · 8bbf4976

由 David Howells 提交于 11月 14, 2008

Alter the use of the key instantiation and negation functions' link-to-keyring
arguments.  Currently this specifies a keyring in the target process to link
the key into, creating the keyring if it doesn't exist.  This, however, can be
a problem for copy-on-write credentials as it means that the instantiating
process can alter the credentials of the requesting process.

This patch alters the behaviour such that:

 (1) If keyctl_instantiate_key() or keyctl_negate_key() are given a specific
     keyring by ID (ringid >= 0), then that keyring will be used.

 (2) If keyctl_instantiate_key() or keyctl_negate_key() are given one of the
     special constants that refer to the requesting process's keyrings
     (KEY_SPEC_*_KEYRING, all <= 0), then:

     (a) If sys_request_key() was given a keyring to use (destringid) then the
     	 key will be attached to that keyring.

     (b) If sys_request_key() was given a NULL keyring, then the key being
     	 instantiated will be attached to the default keyring as set by
     	 keyctl_set_reqkey_keyring().

 (3) No extra link will be made.

Decision point (1) follows current behaviour, and allows those instantiators
who've searched for a specifically named keyring in the requestor's keyring so
as to partition the keys by type to still have their named keyrings.

Decision point (2) allows the requestor to make sure that the key or keys that
get produced by request_key() go where they want, whilst allowing the
instantiator to request that the key is retained.  This is mainly useful for
situations where the instantiator makes a secondary request, the key for which
should be retained by the initial requestor:

	+-----------+        +--------------+        +--------------+
	|           |        |              |        |              |
	| Requestor |------->| Instantiator |------->| Instantiator |
	|           |        |              |        |              |
	+-----------+        +--------------+        +--------------+
	           request_key()           request_key()

This might be useful, for example, in Kerberos, where the requestor requests a
ticket, and then the ticket instantiator requests the TGT, which someone else
then has to go and fetch.  The TGT, however, should be retained in the
keyrings of the requestor, not the first instantiator.  To make this explict
an extra special keyring constant is also added.
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Reviewed-by: NJames Morris <jmorris@namei.org>
Signed-off-by: NJames Morris <jmorris@namei.org>

8bbf4976

KEYS: Disperse linux/key_ui.h · e9e349b0

由 David Howells 提交于 11月 14, 2008

Disperse the bits of linux/key_ui.h as the reason they were put here (keyfs)
didn't get in.
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Reviewed-by: NJames Morris <jmorris@namei.org>
Signed-off-by: NJames Morris <jmorris@namei.org>

e9e349b0

CRED: Wrap task credential accesses in the capabilities code · b103c598

由 David Howells 提交于 11月 14, 2008

Wrap access to task credentials so that they can be separated more easily from
the task_struct during the introduction of COW creds.

Change most current->(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id().

Change some task->e?[ug]id to task_e?[ug]id().  In some places it makes more
sense to use RCU directly rather than a convenient wrapper; these will be
addressed by later patches.
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Reviewed-by: NJames Morris <jmorris@namei.org>
Acked-by: NSerge Hallyn <serue@us.ibm.com>
Cc: Andrew G. Morgan <morgan@kernel.org>
Signed-off-by: NJames Morris <jmorris@namei.org>

b103c598

CRED: Wrap task credential accesses in the key management code · 47d804bf

由 David Howells 提交于 11月 14, 2008

Wrap access to task credentials so that they can be separated more easily from
the task_struct during the introduction of COW creds.

Change most current->(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id().

Change some task->e?[ug]id to task_e?[ug]id().  In some places it makes more
sense to use RCU directly rather than a convenient wrapper; these will be
addressed by later patches.
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Reviewed-by: NJames Morris <jmorris@namei.org>
Acked-by: NSerge Hallyn <serue@us.ibm.com>
Signed-off-by: NJames Morris <jmorris@namei.org>

47d804bf

11 11月, 2008 4 次提交

Currently SELinux jumps through some ugly hoops to not audit a capbility · 06674679

由 Eric Paris 提交于 11月 11, 2008

check when determining if a process has additional powers to override
memory limits or when trying to read/write illegal file labels.  Use
the new noaudit call instead.
Signed-off-by: NEric Paris <eparis@redhat.com>
Acked-by: NStephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: NJames Morris <jmorris@namei.org>

06674679

Add a new capable interface that will be used by systems that use audit to · 06112163

由 Eric Paris 提交于 11月 11, 2008

make an A or B type decision instead of a security decision. Currently
this is the case at least for filesystems when deciding if a process can use
the reserved 'root' blocks and for the case of things like the oom
algorithm determining if processes are root processes and should be less
likely to be killed. These types of security system requests should not be
audited or logged since they are not really security decisions. It would be
possible to solve this problem like the vm_enough_memory security check did
by creating a new LSM interface and moving all of the policy into that
interface but proves the needlessly bloat the LSM and provide complex
indirection.

This merely allows those decisions to be made where they belong and to not
flood logs or printk with denials for thing that are not security decisions.
Signed-off-by: NEric Paris <eparis@redhat.com>
Acked-by: NStephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: NJames Morris <jmorris@namei.org>

06112163

Any time fcaps or a setuid app under SECURE_NOROOT is used to result in a · 3fc689e9

由 Eric Paris 提交于 11月 11, 2008

non-zero pE we will crate a new audit record which contains the entire set
of known information about the executable in question, fP, fI, fE, fversion
and includes the process's pE, pI, pP. Before and after the bprm capability
are applied. This record type will only be emitted from execve syscalls.

an example of making ping use fcaps instead of setuid:

setcap "cat_net_raw+pe" /bin/ping

type=SYSCALL msg=audit(1225742021.015:236): arch=c000003e syscall=59 success=yes exit=0 a0=1457f30 a1=14606b0 a2=1463940 a3=321b770a70 items=2 ppid=2929 pid=2963 auid=0 uid=500 gid=500 euid=500 suid=500 fsuid=500 egid=500 sgid=500 fsgid=500 tty=pts0 ses=3 comm="ping" exe="/bin/ping" subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 key=(null)
type=UNKNOWN[1321] msg=audit(1225742021.015:236): fver=2 fp=0000000000002000 fi=0000000000000000 fe=1 old_pp=0000000000000000 old_pi=0000000000000000 old_pe=0000000000000000 new_pp=0000000000002000 new_pi=0000000000000000 new_pe=0000000000002000
type=EXECVE msg=audit(1225742021.015:236): argc=2 a0="ping" a1="127.0.0.1"
type=CWD msg=audit(1225742021.015:236): cwd="/home/test"
type=PATH msg=audit(1225742021.015:236): item=0 name="/bin/ping" inode=49256 dev=fd:00 mode=0100755 ouid=0 ogid=0 rdev=00:00 obj=system_u:object_r:ping_exec_t:s0 cap_fp=0000000000002000 cap_fe=1 cap_fver=2
type=PATH msg=audit(1225742021.015:236): item=1 name=(null) inode=507915 dev=fd:00 mode=0100755 ouid=0 ogid=0 rdev=00:00 obj=system_u:object_r:ld_so_t:s0
Signed-off-by: NEric Paris <eparis@redhat.com>
Acked-by: NSerge Hallyn <serue@us.ibm.com>
Signed-off-by: NJames Morris <jmorris@namei.org>

3fc689e9

This patch add a generic cpu endian caps structure and externally available · c0b00441

由 Eric Paris 提交于 11月 11, 2008

functions which retrieve fcaps information from disk.  This information is
necessary so fcaps information can be collected and recorded by the audit
system.
Signed-off-by: NEric Paris <eparis@redhat.com>
Acked-by: NSerge Hallyn <serue@us.ibm.com>
Signed-off-by: NJames Morris <jmorris@namei.org>

c0b00441

09 11月, 2008 1 次提交

SELinux: Use unknown perm handling to handle unknown netlink msg types · 39c9aede

由 Eric Paris 提交于 11月 05, 2008

Currently when SELinux has not been updated to handle a netlink message
type the operation is denied with EINVAL.  This patch will leave the
audit/warning message so things get fixed but if policy chose to allow
unknowns this will allow the netlink operation.
Signed-off-by: NEric Paris <eparis@redhat.com>
Acked-by: NStephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: NJames Morris <jmorris@namei.org>

39c9aede

06 11月, 2008 2 次提交

file capabilities: add no_file_caps switch (v4) · 1f29fae2

由 Serge E. Hallyn 提交于 11月 05, 2008

Add a no_file_caps boot option when file capabilities are
compiled into the kernel (CONFIG_SECURITY_FILE_CAPABILITIES=y).

This allows distributions to ship a kernel with file capabilities
compiled in, without forcing users to use (and understand and
trust) them.

When no_file_caps is specified at boot, then when a process executes
a file, any file capabilities stored with that file will not be
used in the calculation of the process' new capability sets.

This means that booting with the no_file_caps boot option will
not be the same as booting a kernel with file capabilities
compiled out - in particular a task with  CAP_SETPCAP will not
have any chance of passing capabilities to another task (which
isn't "really" possible anyway, and which may soon by killed
altogether by David Howells in any case), and it will instead
be able to put new capabilities in its pI.  However since fI
will always be empty and pI is masked with fI, it gains the
task nothing.

We also support the extra prctl options, setting securebits and
dropping capabilities from the per-process bounding set.

The other remaining difference is that killpriv, task_setscheduler,
setioprio, and setnice will continue to be hooked.  That will
be noticable in the case where a root task changed its uid
while keeping some caps, and another task owned by the new uid
tries to change settings for the more privileged task.

Changelog:
	Nov 05 2008: (v4) trivial port on top of always-start-\
		with-clear-caps patch
	Sep 23 2008: nixed file_caps_enabled when file caps are
		not compiled in as it isn't used.
		Document no_file_caps in kernel-parameters.txt.
Signed-off-by: NSerge Hallyn <serue@us.ibm.com>
Acked-by: NAndrew G. Morgan <morgan@kernel.org>
Signed-off-by: NJames Morris <jmorris@namei.org>

1f29fae2

selinux: recognize netlink messages for 'ip addrlabel' · 2f99db28

由 Michal Schmidt 提交于 11月 05, 2008

In enforcing mode '/sbin/ip addrlabel' results in a SELinux error:
type=SELINUX_ERR msg=audit(1225698822.073:42): SELinux:  unrecognized
netlink message type=74 for sclass=43

The problem is missing RTM_*ADDRLABEL entries in SELinux's netlink
message types table.

Reported in https://bugzilla.redhat.com/show_bug.cgi?id=469423Signed-off-by: NMichal Schmidt <mschmidt@redhat.com>
Acked-by: NStephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: NJames Morris <jmorris@namei.org>

2f99db28

05 11月, 2008 1 次提交

SELinux: hold tasklist_lock and siglock while waking wait_chldexit · 41d9f9c5

由 Eric Paris 提交于 11月 04, 2008

SELinux has long been calling wake_up_interruptible() on
current->parent->signal->wait_chldexit without holding any locks.  It
appears that this operation should hold the tasklist_lock to dereference
current->parent and we should hold the siglock when waking up the
signal->wait_chldexit.
Signed-off-by: NEric Paris <eparis@redhat.com>
Signed-off-by: NJames Morris <jmorris@namei.org>

41d9f9c5

02 11月, 2008 1 次提交

file caps: always start with clear bprm->caps_* · 3318a386

由 Serge Hallyn 提交于 10月 30, 2008

While Linux doesn't honor setuid on scripts.  However, it mistakenly
behaves differently for file capabilities.

This patch fixes that behavior by making sure that get_file_caps()
begins with empty bprm->caps_*.  That way when a script is loaded,
its bprm->caps_* may be filled when binfmt_misc calls prepare_binprm(),
but they will be cleared again when binfmt_elf calls prepare_binprm()
next to read the interpreter's file capabilities.
Signed-off-by: NSerge Hallyn <serue@us.ibm.com>
Acked-by: NDavid Howells <dhowells@redhat.com>
Acked-by: NAndrew G. Morgan <morgan@kernel.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

3318a386

01 11月, 2008 1 次提交

SELinux: properly handle empty tty_files list · 37dd0bd0

由 Eric Paris 提交于 10月 31, 2008

SELinux has wrongly (since 2004) had an incorrect test for an empty
tty->tty_files list.  With an empty list selinux would be pointing to part
of the tty struct itself and would then proceed to dereference that value
and again dereference that result.  An F10 change to plymouth on a ppc64
system is actually currently triggering this bug.  This patch uses
list_empty() to handle empty lists rather than looking at a meaningless
location.

[note, this fixes the oops reported in
https://bugzilla.redhat.com/show_bug.cgi?id=469079]
Signed-off-by: NEric Paris <eparis@redhat.com>
Signed-off-by: NJames Morris <jmorris@namei.org>

37dd0bd0

31 10月, 2008 1 次提交

nfsd: fix vm overcommit crash · 731572d3

由 Alan Cox 提交于 10月 29, 2008

Junjiro R.  Okajima reported a problem where knfsd crashes if you are
using it to export shmemfs objects and run strict overcommit.  In this
situation the current->mm based modifier to the overcommit goes through a
NULL pointer.

We could simply check for NULL and skip the modifier but we've caught
other real bugs in the past from mm being NULL here - cases where we did
need a valid mm set up (eg the exec bug about a year ago).

To preserve the checks and get the logic we want shuffle the checking
around and add a new helper to the vm_ security wrappers

Also fix a current->mm reference in nommu that should use the passed mm

[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: fix build]
Reported-by: NJunjiro R. Okajima <hooanon05@yahoo.co.jp>
Acked-by: NJames Morris <jmorris@namei.org>
Signed-off-by: NAlan Cox <alan@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

731572d3

30 10月, 2008 1 次提交

SELinux: check open perms in dentry_open not inode_permission · 8b6a5a37

由 Eric Paris 提交于 10月 29, 2008

Some operations, like searching a directory path or connecting a unix domain
socket, make explicit calls into inode_permission.  Our choices are to
either try to come up with a signature for all of the explicit calls to
inode_permission and do not check open on those, or to move the open checks to
dentry_open where we know this is always an open operation.  This patch moves
the checks to dentry_open.
Signed-off-by: NEric Paris <eparis@redhat.com>
Acked-by: NStephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: NJames Morris <jmorris@namei.org>

8b6a5a37

20 10月, 2008 3 次提交

devcgroup: remove spin_lock() · 47c59803

由 Lai Jiangshan 提交于 10月 18, 2008

Since we introduced rcu for read side, spin_lock is used only for update.
But we always hold cgroup_lock() when update, so spin_lock() is not need.

Additional cleanup:
1) include linux/rcupdate.h explicitly
2) remove unused variable cur_devcgroup in devcgroup_update_access()
Signed-off-by: NLai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: N"Serge E. Hallyn" <serue@us.ibm.com>
Cc: Paul Menage <menage@google.com>
Cc: James Morris <jmorris@namei.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

47c59803

devcgroup: remove unused variable · c012a54a

由 Li Zefan 提交于 10月 18, 2008

Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com>
Acked-by: NSerge Hallyn <serue@us.ibm.com>
Cc: Paul Menage <menage@google.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

c012a54a

devcgroup: use kmemdup() · 2cdc7241

由 Li Zefan 提交于 10月 18, 2008

This saves 40 bytes on my x86_32 box.
Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com>
Acked-by: NSerge Hallyn <serue@us.ibm.com>
Cc: Paul Menage <menage@google.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

2cdc7241

14 10月, 2008 3 次提交

vfs: Use const for kernel parser table · a447c093

由 Steven Whitehouse 提交于 10月 13, 2008

This is a much better version of a previous patch to make the parser
tables constant. Rather than changing the typedef, we put the "const" in
all the various places where its required, allowing the __initconst
exception for nfsroot which was the cause of the previous trouble.

This was posted for review some time ago and I believe its been in -mm
since then.
Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
Cc: Alexander Viro <aviro@redhat.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

a447c093

tty: Redo current tty locking · 934e6ebf

由 Alan Cox 提交于 10月 13, 2008

Currently it is sometimes locked by the tty mutex and sometimes by the
sighand lock. The latter is in fact correct and now we can hand back referenced
objects we can fix this up without problems around sleeping functions.
Signed-off-by: NAlan Cox <alan@redhat.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

934e6ebf

tty: Make get_current_tty use a kref · 452a00d2

由 Alan Cox 提交于 10月 13, 2008

We now return a kref covered tty reference. That ensures the tty structure
doesn't go away when you have a return from get_current_tty. This is not
enough to protect you from most of the resources being freed behind your
back - yet.

[Updated to include fixes for SELinux problems found by Andrew Morton and
 an s390 leak found while debugging the former]
Signed-off-by: NAlan Cox <alan@redhat.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

452a00d2

13 10月, 2008 1 次提交

integrity: special fs magic · 92562927

由 Mimi Zohar 提交于 10月 07, 2008

Discussion on the mailing list questioned the use of these
magic values in userspace, concluding these values are already
exported to userspace via statfs and their correct/incorrect
usage is left up to the userspace application.

  - Move special fs magic number definitions to magic.h
  - Add magic.h include
Signed-off-by: NMimi Zohar <zohar@us.ibm.com>
Reviewed-by: NJames Morris <jmorris@namei.org>
Signed-off-by: NJames Morris <jmorris@namei.org>

92562927

10 10月, 2008 11 次提交

netlabel: Changes to the NetLabel security attributes to allow LSMs to pass full contexts · 8d75899d

由 Paul Moore 提交于 10月 10, 2008

This patch provides support for including the LSM's secid in addition to
the LSM's MLS information in the NetLabel security attributes structure.
Signed-off-by: NPaul Moore <paul.moore@hp.com>
Acked-by: NJames Morris <jmorris@namei.org>

8d75899d

selinux: Cache NetLabel secattrs in the socket's security struct · 6c5b3fc0

由 Paul Moore 提交于 10月 10, 2008

Previous work enabled the use of address based NetLabel selectors, which
while highly useful, brought the potential for additional per-packet overhead
when used. This patch attempts to mitigate some of that overhead by caching
the NetLabel security attribute struct within the SELinux socket security
structure. This should help eliminate the need to recreate the NetLabel
secattr structure for each packet resulting in less overhead.
Signed-off-by: NPaul Moore <paul.moore@hp.com>
Acked-by: NJames Morris <jmorris@namei.org>

6c5b3fc0

selinux: Set socket NetLabel based on connection endpoint · 014ab19a

由 Paul Moore 提交于 10月 10, 2008

Previous work enabled the use of address based NetLabel selectors, which while
highly useful, brought the potential for additional per-packet overhead when
used.  This patch attempts to solve that by applying NetLabel socket labels
when sockets are connect()'d.  This should alleviate the per-packet NetLabel
labeling for all connected sockets (yes, it even works for connected DGRAM
sockets).
Signed-off-by: NPaul Moore <paul.moore@hp.com>
Reviewed-by: NJames Morris <jmorris@namei.org>

014ab19a

netlabel: Add functionality to set the security attributes of a packet · 948bf85c

由 Paul Moore 提交于 10月 10, 2008

This patch builds upon the new NetLabel address selector functionality by
providing the NetLabel KAPI and CIPSO engine support needed to enable the
new packet-based labeling.  The only new addition to the NetLabel KAPI at
this point is shown below:

 * int netlbl_skbuff_setattr(skb, family, secattr)

... and is designed to be called from a Netfilter hook after the packet's
IP header has been populated such as in the FORWARD or LOCAL_OUT hooks.

This patch also provides the necessary SELinux hooks to support this new
functionality.  Smack support is not currently included due to uncertainty
regarding the permissions needed to expand the Smack network access controls.
Signed-off-by: NPaul Moore <paul.moore@hp.com>
Reviewed-by: NJames Morris <jmorris@namei.org>

948bf85c

netlabel: Replace protocol/NetLabel linking with refrerence counts · b1edeb10

由 Paul Moore 提交于 10月 10, 2008

NetLabel has always had a list of backpointers in the CIPSO DOI definition
structure which pointed to the NetLabel LSM domain mapping structures which
referenced the CIPSO DOI struct. The rationale for this was that when an
administrator removed a CIPSO DOI from the system all of the associated
NetLabel LSM domain mappings should be removed as well; a list of
backpointers made this a simple operation.

Unfortunately, while the backpointers did make the removal easier they were
a bit of a mess from an implementation point of view which was making
further development difficult. Since the removal of a CIPSO DOI is a
realtively rare event it seems to make sense to remove this backpointer
list as the optimization was hurting us more then it was helping. However,
we still need to be able to track when a CIPSO DOI definition is being used
so replace the backpointer list with a reference count. In order to
preserve the current functionality of removing the associated LSM domain
mappings when a CIPSO DOI is removed we walk the LSM domain mapping table,
removing the relevant entries.
Signed-off-by: NPaul Moore <paul.moore@hp.com>
Reviewed-by: NJames Morris <jmorris@namei.org>

b1edeb10

smack: Fix missing calls to netlbl_skbuff_err() · a8134296

由 Paul Moore 提交于 10月 10, 2008

Smack needs to call netlbl_skbuff_err() to let NetLabel do the necessary
protocol specific error handling.
Signed-off-by: NPaul Moore <paul.moore@hp.com>
Acked-by: NCasey Schaufler <casey@schaufler-ca.com>

a8134296

selinux: Fix missing calls to netlbl_skbuff_err() · dfaebe98

由 Paul Moore 提交于 10月 10, 2008

At some point I think I messed up and dropped the calls to netlbl_skbuff_err()
which are necessary for CIPSO to send error notifications to remote systems.
This patch re-introduces the error handling calls into the SELinux code.
Signed-off-by: NPaul Moore <paul.moore@hp.com>
Acked-by: NJames Morris <jmorris@namei.org>

dfaebe98

selinux: Fix a problem in security_netlbl_sid_to_secattr() · 99d854d2

由 Paul Moore 提交于 10月 10, 2008

Currently when SELinux fails to allocate memory in
security_netlbl_sid_to_secattr() the NetLabel LSM domain field is set to
NULL which triggers the default NetLabel LSM domain mapping which may not
always be the desired mapping.  This patch fixes this by returning an error
when the kernel is unable to allocate memory.  This could result in more
failures on a system with heavy memory pressure but it is the "correct"
thing to do.
Signed-off-by: NPaul Moore <paul.moore@hp.com>
Acked-by: NJames Morris <jmorris@namei.org>

99d854d2

selinux: Better local/forward check in selinux_ip_postroute() · d8395c87

由 Paul Moore 提交于 10月 10, 2008

It turns out that checking to see if skb->sk is NULL is not a very good
indicator of a forwarded packet as some locally generated packets also have
skb->sk set to NULL. Fix this by not only checking the skb->sk field but also
the IP[6]CB(skb)->flags field for the IP[6]SKB_FORWARDED flag. While we are
at it, we are calling selinux_parse_skb() much earlier than we really should
resulting in potentially wasted cycles parsing packets for information we
might no use; so shuffle the code around a bit to fix this.
Signed-off-by: NPaul Moore <paul.moore@hp.com>
Acked-by: NJames Morris <jmorris@namei.org>

d8395c87

selinux: Correctly handle IPv4 packets on IPv6 sockets in all cases · aa862900

由 Paul Moore 提交于 10月 10, 2008

We did the right thing in a few cases but there were several areas where we
determined a packet's address family based on the socket's address family which
is not the right thing to do since we can get IPv4 packets on IPv6 sockets.
This patch fixes these problems by either taking the address family directly
from the packet.
Signed-off-by: NPaul Moore <paul.moore@hp.com>
Acked-by: NJames Morris <jmorris@namei.org>

aa862900

selinux: Cleanup the NetLabel glue code · accc6093

由 Paul Moore 提交于 10月 10, 2008

We were doing a lot of extra work in selinux_netlbl_sock_graft() what wasn't
necessary so this patch removes that code.  It also removes the redundant
second argument to selinux_netlbl_sock_setsid() which allows us to simplify a
few other functions.
Signed-off-by: NPaul Moore <paul.moore@hp.com>
Acked-by: NJames Morris <jmorris@namei.org>

accc6093

04 10月, 2008 2 次提交

selinux: Fix an uninitialized variable BUG/panic in selinux_secattr_to_sid() · 3040a6d5

由 Paul Moore 提交于 10月 03, 2008

At some point during the 2.6.27 development cycle two new fields were added
to the SELinux context structure, a string pointer and a length field. The
code in selinux_secattr_to_sid() was not modified and as a result these two
fields were left uninitialized which could result in erratic behavior,
including kernel panics, when NetLabel is used. This patch fixes the
problem by fully initializing the context in selinux_secattr_to_sid() before
use and reducing the level of direct context manipulation done to help
prevent future problems.

Please apply this to the 2.6.27-rcX release stream.
Signed-off-by: NPaul Moore <paul.moore@hp.com>
Signed-off-by: NJames Morris <jmorris@namei.org>

3040a6d5

selinux: Fix an uninitialized variable BUG/panic in selinux_secattr_to_sid() · 81990fbd

由 Paul Moore 提交于 10月 03, 2008

Please apply this to the 2.6.27-rcX release stream.
Signed-off-by: NPaul Moore <paul.moore@hp.com>
Signed-off-by: NJames Morris <jmorris@namei.org>

81990fbd

29 9月, 2008 1 次提交

selinux: use default proc sid on symlinks · ea6b184f

由 Stephen Smalley 提交于 9月 22, 2008

As we are not concerned with fine-grained control over reading of
symlinks in proc, always use the default proc SID for all proc symlinks.
This should help avoid permission issues upon changes to the proc tree
as in the /proc/net -> /proc/self/net example.
This does not alter labeling of symlinks within /proc/pid directories.
ls -Zd /proc/net output before and after the patch should show the difference.
Signed-off-by: NStephen D. Smalley <sds@tycho.nsa.gov>
Signed-off-by: NJames Morris <jmorris@namei.org>

ea6b184f

27 9月, 2008 1 次提交

file capabilities: uninline cap_safe_nice · de45e806

由 Serge E. Hallyn 提交于 9月 26, 2008

This reduces the kernel size by 289 bytes.
Signed-off-by: NSerge E. Hallyn <serue@us.ibm.com>
Acked-by: NAndrew G. Morgan <morgan@kernel.org>
Signed-off-by: NJames Morris <jmorris@namei.org>

de45e806

14 9月, 2008 1 次提交

timers: fix itimer/many thread hang · f06febc9

由 Frank Mayhar 提交于 9月 12, 2008

Overview

This patch reworks the handling of POSIX CPU timers, including the
ITIMER_PROF, ITIMER_VIRT timers and rlimit handling. It was put together
with the help of Roland McGrath, the owner and original writer of this code.

The problem we ran into, and the reason for this rework, has to do with using
a profiling timer in a process with a large number of threads. It appears
that the performance of the old implementation of run_posix_cpu_timers() was
at least O(n*3) (where "n" is the number of threads in a process) or worse.
Everything is fine with an increasing number of threads until the time taken
for that routine to run becomes the same as or greater than the tick time, at
which point things degrade rather quickly.

This patch fixes bug 9906, "Weird hang with NPTL and SIGPROF."

Code Changes

This rework corrects the implementation of run_posix_cpu_timers() to make it
run in constant time for a particular machine. (Performance may vary between
one machine and another depending upon whether the kernel is built as single-
or multiprocessor and, in the latter case, depending upon the number of
running processors.) To do this, at each tick we now update fields in
signal_struct as well as task_struct. The run_posix_cpu_timers() function
uses those fields to make its decisions.

We define a new structure, "task_cputime," to contain user, system and
scheduler times and use these in appropriate places:

struct task_cputime {
cputime_t utime;
cputime_t stime;
unsigned long long sum_exec_runtime;
};

This is included in the structure "thread_group_cputime," which is a new
substructure of signal_struct and which varies for uniprocessor versus
multiprocessor kernels. For uniprocessor kernels, it uses "task_cputime" as
a simple substructure, while for multiprocessor kernels it is a pointer:

struct thread_group_cputime {
struct task_cputime totals;
};

struct thread_group_cputime {
struct task_cputime *totals;
};

We also add a new task_cputime substructure directly to signal_struct, to
cache the earliest expiration of process-wide timers, and task_cputime also
replaces the it_*_expires fields of task_struct (used for earliest expiration
of thread timers). The "thread_group_cputime" structure contains process-wide
timers that are updated via account_user_time() and friends. In the non-SMP
case the structure is a simple aggregator; unfortunately in the SMP case that
simplicity was not achievable due to cache-line contention between CPUs (in
one measured case performance was actually _worse_ on a 16-cpu system than
the same test on a 4-cpu system, due to this contention). For SMP, the
thread_group_cputime counters are maintained as a per-cpu structure allocated
using alloc_percpu(). The timer functions update only the timer field in
the structure corresponding to the running CPU, obtained using per_cpu_ptr().

We define a set of inline functions in sched.h that we use to maintain the
thread_group_cputime structure and hide the differences between UP and SMP
implementations from the rest of the kernel. The thread_group_cputime_init()
function initializes the thread_group_cputime structure for the given task.
The thread_group_cputime_alloc() is a no-op for UP; for SMP it calls the
out-of-line function thread_group_cputime_alloc_smp() to allocate and fill
in the per-cpu structures and fields. The thread_group_cputime_free()
function, also a no-op for UP, in SMP frees the per-cpu structures. The
thread_group_cputime_clone_thread() function (also a UP no-op) for SMP calls
thread_group_cputime_alloc() if the per-cpu structures haven't yet been
allocated. The thread_group_cputime() function fills the task_cputime
structure it is passed with the contents of the thread_group_cputime fields;
in UP it's that simple but in SMP it must also safely check that tsk->signal
is non-NULL (if it is it just uses the appropriate fields of task_struct) and,
if so, sums the per-cpu values for each online CPU. Finally, the three
functions account_group_user_time(), account_group_system_time() and
account_group_exec_runtime() are used by timer functions to update the
respective fields of the thread_group_cputime structure.

Non-SMP operation is trivial and will not be mentioned further.

The per-cpu structure is always allocated when a task creates its first new
thread, via a call to thread_group_cputime_clone_thread() from copy_signal().
It is freed at process exit via a call to thread_group_cputime_free() from
cleanup_signal().

All functions that formerly summed utime/stime/sum_sched_runtime values from
from all threads in the thread group now use thread_group_cputime() to
snapshot the values in the thread_group_cputime structure or the values in
the task structure itself if the per-cpu structure hasn't been allocated.

Finally, the code in kernel/posix-cpu-timers.c has changed quite a bit.
The run_posix_cpu_timers() function has been split into a fast path and a
slow path; the former safely checks whether there are any expired thread
timers and, if not, just returns, while the slow path does the heavy lifting.
With the dedicated thread group fields, timers are no longer "rebalanced" and
the process_timer_rebalance() function and related code has gone away. All
summing loops are gone and all code that used them now uses the
thread_group_cputime() inline. When process-wide timers are set, the new
task_cputime structure in signal_struct is used to cache the earliest
expiration; this is checked in the fast path.

Performance

The fix appears not to add significant overhead to existing operations. It
generally performs the same as the current code except in two cases, one in
which it performs slightly worse (Case 5 below) and one in which it performs
very significantly better (Case 2 below). Overall it's a wash except in those
two cases.

I've since done somewhat more involved testing on a dual-core Opteron system.

Case 1: With no itimer running, for a test with 100,000 threads, the fixed
kernel took 1428.5 seconds, 513 seconds more than the unfixed system,
all of which was spent in the system. There were twice as many
voluntary context switches with the fix as without it.

Case 2: With an itimer running at .01 second ticks and 4000 threads (the most
an unmodified kernel can handle), the fixed kernel ran the test in
eight percent of the time (5.8 seconds as opposed to 70 seconds) and
had better tick accuracy (.012 seconds per tick as opposed to .023
seconds per tick).

Case 3: A 4000-thread test with an initial timer tick of .01 second and an
interval of 10,000 seconds (i.e. a timer that ticks only once) had
very nearly the same performance in both cases: 6.3 seconds elapsed
for the fixed kernel versus 5.5 seconds for the unfixed kernel.

With fewer threads (eight in these tests), the Case 1 test ran in essentially
the same time on both the modified and unmodified kernels (5.2 seconds versus
5.8 seconds). The Case 2 test ran in about the same time as well, 5.9 seconds
versus 5.4 seconds but again with much better tick accuracy, .013 seconds per
tick versus .025 seconds per tick for the unmodified kernel.

Since the fix affected the rlimit code, I also tested soft and hard CPU limits.

Case 4: With a hard CPU limit of 20 seconds and eight threads (and an itimer
running), the modified kernel was very slightly favored in that while
it killed the process in 19.997 seconds of CPU time (5.002 seconds of
wall time), only .003 seconds of that was system time, the rest was
user time. The unmodified kernel killed the process in 20.001 seconds
of CPU (5.014 seconds of wall time) of which .016 seconds was system
time. Really, though, the results were too close to call. The results
were essentially the same with no itimer running.

Case 5: With a soft limit of 20 seconds and a hard limit of 2000 seconds
(where the hard limit would never be reached) and an itimer running,
the modified kernel exhibited worse tick accuracy than the unmodified
kernel: .050 seconds/tick versus .028 seconds/tick. Otherwise,
performance was almost indistinguishable. With no itimer running this
test exhibited virtually identical behavior and times in both cases.

In times past I did some limited performance testing. those results are below.

On a four-cpu Opteron system without this fix, a sixteen-thread test executed
in 3569.991 seconds, of which user was 3568.435s and system was 1.556s. On
the same system with the fix, user and elapsed time were about the same, but
system time dropped to 0.007 seconds. Performance with eight, four and one
thread were comparable. Interestingly, the timer ticks with the fix seemed
more accurate: The sixteen-thread test with the fix received 149543 ticks
for 0.024 seconds per tick, while the same test without the fix received 58720
for 0.061 seconds per tick. Both cases were configured for an interval of
0.01 seconds. Again, the other tests were comparable. Each thread in this
test computed the primes up to 25,000,000.

I also did a test with a large number of threads, 100,000 threads, which is
impossible without the fix. In this case each thread computed the primes only
up to 10,000 (to make the runtime manageable). System time dominated, at
1546.968 seconds out of a total 2176.906 seconds (giving a user time of
629.938s). It received 147651 ticks for 0.015 seconds per tick, still quite
accurate. There is obviously no comparable test without the fix.
Signed-off-by: NFrank Mayhar <fmayhar@google.com>
Cc: Roland McGrath <roland@redhat.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

f06febc9

11 9月, 2008 1 次提交

Update selinux info in MAINTAINERS and Kconfig help text · f058925b

由 Stephen Smalley 提交于 9月 11, 2008

Update the SELinux entry in MAINTAINERS and drop the obsolete information
from the selinux Kconfig help text.
Signed-off-by: NStephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: NJames Morris <jmorris@namei.org>

f058925b

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功