Commits · a01af8e4a4ee1135598f157051959982418c38f8 · Linux-御风守护者 / linux

30 Nov, 2010 · 1 commit

af_unix: limit recursion level · 25888e30

Committed by Eric Dumazet on Nov 25, 2010

It's easy to eat all kernel memory and trigger the NMI watchdog, using an
exploit program that queues unix sockets on top of others.

lkml ref : http://lkml.org/lkml/2010/11/25/8

This mechanism is used in applications, so one choice we have is to add a
recursion limit.

Other limits might be needed as well (if we queue other types of files),
since the passfd mechanism is currently limited by socket receive queue
sizes only.

Add a recursion_level to unix socket, allowing up to 4 levels.

Each time we send a unix socket through the sendfd mechanism, we copy its
recursion level (plus one) to the receiver. This recursion level is cleared
when the socket receive queue is emptied.

Reported-by: Марк Коренберг <socketpair@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

25888e30
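
The bookkeeping described in this commit message can be modelled in user space. The sketch below is a toy model, not the kernel code: `struct toy_sock`, `toy_send_fds()` and `toy_queue_emptied()` are hypothetical names, and the exact comparison against the limit is an assumption. Only the limit of 4 and the rule "copy the level plus one, clear it when the queue empties" come from the message above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_RECURSION_LEVEL 4	/* limit stated in the commit message */

/* Hypothetical stand-in for a unix socket; not the kernel structure. */
struct toy_sock {
	unsigned int recursion_level;
};

/*
 * Model of passing n sockets to receiver in one message.
 * Returns 0 on success, or -ETOOMANYREFS when a passed socket is already
 * nested deeper than the limit (the comparison used here is an assumption).
 */
int toy_send_fds(struct toy_sock *receiver, struct toy_sock **passed, size_t n)
{
	unsigned int max_level = 0;
	size_t i;

	assert(receiver != NULL);

	/* Find the deepest nesting level among the sockets being passed. */
	for (i = 0; i < n; i++)
		if (passed[i]->recursion_level > max_level)
			max_level = passed[i]->recursion_level;

	if (max_level > MAX_RECURSION_LEVEL)
		return -ETOOMANYREFS;

	/* The receiver inherits the deepest passed level, plus one. */
	if (max_level + 1 > receiver->recursion_level)
		receiver->recursion_level = max_level + 1;
	return 0;
}

/* Model of the receive queue draining: the level is cleared. */
void toy_queue_emptied(struct toy_sock *sk)
{
	sk->recursion_level = 0;
}
```

In this model, passing a socket along a chain of sockets raises the level by one per hop, and a socket whose level exceeds the limit is refused until its queue has been drained.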

17 Jun, 2010 · 1 commit

af_unix: Allow credentials to work across user and pid namespaces. · 7361c36c

Committed by Eric W. Biederman on Jun 13, 2010

In unix_skb_parms store pointers to struct pid and struct cred instead
of raw uid, gid, and pid values, then translate the credentials on
reception into values that are meaningful in the receiving process's
namespaces.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

7361c36c

02 May, 2010 · 1 commit

net: sock_def_readable() and friends RCU conversion · 43815482

Committed by Eric Dumazet on Apr 29, 2010

sk_callback_lock rwlock actually protects sk->sk_sleep pointer, so we
need two atomic operations (and associated dirtying) per incoming
packet.

RCU conversion is pretty much needed:

1) Add a new structure, called "struct socket_wq" to hold all fields
that will need rcu_read_lock() protection (currently: a
wait_queue_head_t and a struct fasync_struct pointer).

[Future patch will add a list anchor for wakeup coalescing]

2) Attach one such structure to each "struct socket" created in
sock_alloc_inode().

3) Respect RCU grace period when freeing a "struct socket_wq"

4) Replace the sk_sleep pointer in "struct sock" with sk_wq, a pointer to
"struct socket_wq"

5) Change sk_sleep() function to use new sk->sk_wq instead of
sk->sk_sleep

6) Change sk_has_sleeper() to wq_has_sleeper() that must be used inside
a rcu_read_lock() section.

7) Change all sk_has_sleeper() callers to:
  - Use rcu_read_lock() instead of read_lock(&sk->sk_callback_lock)
  - Use wq_has_sleeper() to wake up tasks if there are sleepers.
  - Use rcu_read_unlock() instead of read_unlock(&sk->sk_callback_lock)

8) sock_wake_async() is modified to use rcu protection as well.

9) Exceptions:
  macvtap, drivers/net/tun.c, af_unix use integrated "struct socket_wq"
instead of dynamically allocated ones. They don't need rcu freeing.

Some cleanups or followups are probably needed (possible
sk_callback_lock conversion to a spinlock, for example).

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

43815482

27 Nov, 2008 · 1 commit

net: Fix soft lockups/OOM issues w/ unix garbage collector · 5f23b734

Committed by dann frazier on Nov 26, 2008

This is an implementation of David Miller's suggested fix in:
  https://bugzilla.redhat.com/show_bug.cgi?id=470201

It has been updated to use wait_event() instead of
wait_event_interruptible().

Paraphrasing the description from the above report, it makes sendmsg()
block while UNIX garbage collection is in progress. This avoids a
situation where child processes continue to queue new FDs over an
AF_UNIX socket to a parent which is in the exit path and running
garbage collection on these FDs. This contention can result in soft
lockups and oom-killing of unrelated processes.

Signed-off-by: dann frazier <dannf@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5f23b734

10 Nov, 2008 · 1 commit

net: unix: fix inflight counting bug in garbage collector · 6209344f

Committed by Miklos Szeredi on Nov 09, 2008

Previously I assumed that the receive queues of candidates don't
change during the GC.  This is only half true: nothing can be received
from the queues (see comment in unix_gc()), but buffers could be added
through the other half of the socket pair, which may still have file
descriptors referring to it.

This can result in inc_inflight_move_tail() erroneously increasing the
"inflight" counter for a unix socket for which dec_inflight() wasn't
previously called.  This in turn can trigger the "BUG_ON(total_refs <
inflight_refs)" in a later garbage collection run.

Fix this by only manipulating the "inflight" counter for sockets which
are candidates themselves.  Duplicating the file references in
unix_attach_fds() is also needed to prevent a socket becoming a
candidate for GC while the skb that contains it is not yet queued.

Reported-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
CC: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

6209344f

27 Jul, 2008 · 1 commit

[PATCH] f_count may wrap around · 516e0cc5

Committed by Al Viro on Jul 26, 2008

Make it atomic_long_t; while we are at it, get rid of useless checks in affs,
hfs and hpfs: ->open() always has it equal to 1, ->release() equal to 0.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

516e0cc5

29 Jan, 2008 · 2 commits

[AF_UNIX]: Remove unused declaration of sysctl_unix_max_dgram_qlen. · 27147c9e

Committed by Denis V. Lunev on Dec 11, 2007

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

27147c9e

[UNIX]: Extend unix_sysctl_(un)register prototypes · 97577e38

Committed by Pavel Emelyanov on Dec 01, 2007

Add the struct net * argument to both of them to use in
the future. Also make the register one return an error code.

It is useless right now, but will make the future patches
much simpler.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

97577e38

11 Nov, 2007 · 1 commit

[AF_UNIX]: Make unix_tot_inflight counter non-atomic · 9305cfa4

Committed by Pavel Emelyanov on Nov 10, 2007

This counter is _always_ modified under the unix_gc_lock spinlock,
so its atomicity can be provided without additional effort.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

9305cfa4

31 Jul, 2007 · 1 commit

[AF_UNIX]: Make code static. · 13111698

Committed by Adrian Bunk on Jul 30, 2007

The following code can now become static:
- struct unix_socket_table
- unix_table_lock

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

13111698

12 Jul, 2007 · 1 commit

[AF_UNIX]: Rewrite garbage collector, fixes race. · 1fd05ba5

Committed by Miklos Szeredi on Jul 11, 2007

Throw out the old mark & sweep garbage collector and put in a
refcounting cycle detecting one.

The old one had a race with recvmsg, that resulted in false positives
and hence data loss.  The old algorithm operated on all unix sockets
in the system, so any additional locking would have meant performance
problems for all users of these.

The new algorithm instead only operates on "in flight" sockets, which
are very rare, and the additional locking for these doesn't negatively
impact the vast majority of users.

In fact it's probable that there weren't *any* heavy senders of
sockets over sockets, otherwise the above race would have been
discovered long ago.

The patch works OK with the app that exposed the race with the old
code.  The garbage collection has also been verified to work in a few
simple cases.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>

1fd05ba5
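
The idea of detecting cycles by reference counting, restricted to "in flight" sockets, can be illustrated with a small user-space model. The sketch below is not the kernel implementation: `struct toy_gc`, `toy_gc_run()` and the fixed table size are hypothetical, and the model assumes that a socket is a collection candidate when every reference to it comes from a queued message. It shows one way such a collector can decide that a group of sockets only keep each other alive.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NSOCK 4	/* size of the toy system (assumption) */

/* Hypothetical model of the sockets known to the collector. */
struct toy_gc {
	int file_count[NSOCK];		/* all references to each socket */
	int queued[NSOCK][NSOCK];	/* queued[r][s]: copies of s in r's receive queue */
	bool garbage[NSOCK];		/* result: true if s is unreachable */
};

void toy_gc_run(struct toy_gc *g)
{
	int inflight[NSOCK] = {0};
	bool candidate[NSOCK], alive[NSOCK] = {false};
	bool changed = true;
	int r, s;

	assert(g != NULL);

	/* Count the references held by queued messages ("in flight"). */
	for (r = 0; r < NSOCK; r++)
		for (s = 0; s < NSOCK; s++)
			inflight[s] += g->queued[r][s];

	/* Candidate: in flight, and no reference other than in-flight ones. */
	for (s = 0; s < NSOCK; s++)
		candidate[s] = inflight[s] > 0 && g->file_count[s] == inflight[s];

	/* Drop the references that come from candidates' own queues. */
	for (r = 0; r < NSOCK; r++)
		for (s = 0; s < NSOCK; s++)
			if (candidate[r] && candidate[s])
				inflight[s] -= g->queued[r][s];

	/*
	 * A candidate with references left is queued on a live socket, so it
	 * is reachable; restore the counts of everything in its queue and
	 * repeat until nothing changes.
	 */
	while (changed) {
		changed = false;
		for (r = 0; r < NSOCK; r++) {
			if (!candidate[r] || alive[r] || inflight[r] <= 0)
				continue;
			alive[r] = true;
			changed = true;
			for (s = 0; s < NSOCK; s++)
				if (candidate[s])
					inflight[s] += g->queued[r][s];
		}
	}

	for (s = 0; s < NSOCK; s++)
		g->garbage[s] = candidate[s] && !alive[s];
}
```

In this model, two closed sockets that sit in each other's receive queues are reported as garbage, while a socket queued on a socket that a process still holds open is not. Only sockets with in-flight references are ever candidates, which mirrors the point made above that ordinary sockets are unaffected.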

04 Jun, 2007 · 1 commit

[AF_UNIX]: Make socket locking much less confusing. · 1c92b4e5

Committed by David S. Miller on May 31, 2007

The unix_state_*() locking macros imply that there is some
rwlock kind of thing going on, but the implementation is
actually a spinlock which makes the code more confusing than
it needs to be.

So use plain unix_state_lock and unix_state_unlock.

Signed-off-by: David S. Miller <davem@davemloft.net>

1c92b4e5

03 Aug, 2006 · 1 commit

[AF_UNIX]: Kernel memory leak fix for af_unix datagram getpeersec patch · dc49c1f9

Committed by Catherine Zhang on Aug 02, 2006

From: Catherine Zhang <cxzhang@watson.ibm.com>

This patch implements a cleaner fix for the memory leak problem of the
original unix datagram getpeersec patch.  Instead of creating a
security context each time a unix datagram is sent, we only create the
security context when the receiver requests it.

This new design requires modification of the current
unix_getsecpeer_dgram LSM hook and addition of two new hooks, namely,
secid_to_secctx and release_secctx.  The former retrieves the security
context and the latter releases it.  A hook is required for releasing
the security context because it is up to the security module to decide
how that's done.  In the case of SELinux, it's a simple kfree
operation.

Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: David S. Miller <davem@davemloft.net>

dc49c1f9

04 Jul, 2006 · 1 commit

[PATCH] lockdep: annotate af_unix locking · a09785a2

Committed by Ingo Molnar on Jul 03, 2006

Teach special (recursive) locking code to the lock validator.  Also splits
af_unix's sk_receive_queue.lock class from the other networking skb-queue
locks.  Has no effect on non-lockdep kernels.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

a09785a2

30 Jun, 2006 · 1 commit

[AF_UNIX]: Datagram getpeersec · 877ce7c1

Committed by Catherine Zhang on Jun 29, 2006

This patch implements an API whereby an application can determine the
label of its peer's Unix datagram sockets via the auxiliary data mechanism of
recvmsg.

Patch purpose:

This patch enables a security-aware application to retrieve the
security context of the peer of a Unix datagram socket.  The application
can then use this security context to determine the security context for
processing on behalf of the peer who sent the packet.

Patch design and implementation:

The design and implementation is very similar to the UDP case for INET
sockets.  Basically we build upon the existing Unix domain socket API for
retrieving user credentials.  Linux offers the API for obtaining user
credentials via ancillary messages (i.e., out of band/control messages
that are bundled together with a normal message).  To retrieve the security
context, the application first indicates to the kernel such desire by
setting the SO_PASSSEC option via setsockopt.  Then the application
retrieves the security context using the auxiliary data mechanism.

An example server application for Unix datagram socket should look like this:

    toggle = 1;
    toggle_len = sizeof(toggle);

    /* Ask the kernel to attach the sender's security context to each datagram. */
    setsockopt(sockfd, SOL_SOCKET, SO_PASSSEC, &toggle, toggle_len);
    recvmsg(sockfd, &msg_hdr, 0);
    /* Look for an SCM_SECURITY control message and copy out its payload. */
    if (msg_hdr.msg_controllen > sizeof(struct cmsghdr)) {
        cmsg_hdr = CMSG_FIRSTHDR(&msg_hdr);
        if (cmsg_hdr->cmsg_len <= CMSG_LEN(sizeof(scontext)) &&
            cmsg_hdr->cmsg_level == SOL_SOCKET &&
            cmsg_hdr->cmsg_type == SCM_SECURITY) {
            memcpy(&scontext, CMSG_DATA(cmsg_hdr), sizeof(scontext));
        }
    }

sock_setsockopt is enhanced with a new socket option SO_PASSSEC to allow
a server socket to receive the security context of the peer.

Testing:

We have tested the patch by setting up Unix datagram client and server
applications.  We verified that the server can retrieve the security context
using the auxiliary data mechanism of recvmsg.

Signed-off-by: Catherine Zhang <cxzhang@watson.ibm.com>
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

877ce7c1

26 Apr, 2006 · 1 commit

Don't include linux/config.h from anywhere else in include/ · 62c4f0a2

Committed by David Woodhouse on Apr 26, 2006

Signed-off-by: David Woodhouse <dwmw2@infradead.org>

62c4f0a2

21 Mar, 2006 · 1 commit

[NET]: sem2mutex part 2 · 57b47a53

Committed by Ingo Molnar on Mar 20, 2006

Semaphore to mutex conversion.

The conversion was generated via scripts, and the result was validated
automatically via a script as well.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

57b47a53

04 Jan, 2006 · 2 commits

[AF_UNIX]: Convert to use a spinlock instead of rwlock · fd19f329

Committed by Benjamin LaHaise on Jan 03, 2006

From: Benjamin LaHaise <bcrl@kvack.org>

In af_unix, a rwlock is used to protect internal state.  At least on my
P4 with HT it is faster to use a spinlock due to the simpler memory
barrier used to unlock.  This patch raises bw_unix to ~690K/s.

Signed-off-by: David S. Miller <davem@davemloft.net>

fd19f329

[AF_UNIX]: Use spinlock for unix_table_lock · fbe9cc4a

Committed by David S. Miller on Dec 13, 2005

This lock is actually taken mostly as a writer,
so using a rwlock actually just makes performance
worse, especially on chips like the Intel P4.

Signed-off-by: David S. Miller <davem@davemloft.net>

fbe9cc4a

30 Aug, 2005 · 1 commit

[NET]: Fix sparse warnings · 20380731

Committed by Arnaldo Carvalho de Melo on Aug 16, 2005

Of this type, mostly:

CHECK net/ipv6/netfilter.c
net/ipv6/netfilter.c:96:12: warning: symbol 'ipv6_netfilter_init' was not declared. Should it be static?
net/ipv6/netfilter.c:101:6: warning: symbol 'ipv6_netfilter_fini' was not declared. Should it be static?

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

20380731

17 Apr, 2005 · 1 commit

Linux-2.6.12-rc2 · 1da177e4

Committed by Linus Torvalds on Apr 16, 2005

Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.

Let it rip!

1da177e4
