- 22 2月, 2012 2 次提交
-
-
由 Pavel Emelyanov 提交于
The same here -- we can protect the sk_peek_off manipulations with the unix_sk->readlock mutex. The peeking of data from a stream socket is done in the datagram style, i.e. even if there's enough room for more data in the user buffer, only the head skb's data is copied in there. This feature is preserved when peeking data from a given offset -- the data is read till the nearest skb's boundary. Signed-off-by: NPavel Emelyanov <xemul@parallels.com> Acked-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Pavel Emelyanov 提交于
The sk_peek_off manipulations are protected with the unix_sk->readlock mutex. This mutex is enough since all we need is to syncronize setting the offset vs reading the queue head. The latter is fully covered with the mentioned lock. The recently added __skb_recv_datagram's offset is used to pick the skb to read the data from. Signed-off-by: NPavel Emelyanov <xemul@parallels.com> Acked-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 31 1月, 2012 1 次提交
-
-
由 Eric Dumazet 提交于
Commit 0884d7aa (AF_UNIX: Fix poll blocking problem when reading from a stream socket) added a regression for epoll() in Edge Triggered mode (EPOLLET) Appropriate fix is to use skb_peek()/skb_unlink() instead of skb_dequeue(), and only call skb_unlink() when skb is fully consumed. This remove the need to requeue a partial skb into sk_receive_queue head and the extra sk->sk_data_ready() calls that added the regression. This is safe because once skb is given to sk_receive_queue, it is not modified by a writer, and readers are serialized by u->readlock mutex. This also reduce number of spinlock acquisition for small reads or MSG_PEEK users so should improve overall performance. Reported-by: NNick Mathewson <nickm@freehaven.net> Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Cc: Alexey Moiseytsev <himeraster@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 08 1月, 2012 1 次提交
-
-
由 David S. Miller 提交于
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 04 1月, 2012 1 次提交
-
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 31 12月, 2011 3 次提交
-
-
由 Pavel Emelyanov 提交于
While it's not too late fix the recently added RQLEN diag extension to report rqlen and wqlen in the same way as TCP does. I.e. for listening sockets the ack backlog length (which is the input queue length for socket) in rqlen and the max ack backlog length in wqlen, and what the CINQ/OUTQ ioctls do for established. Signed-off-by: NPavel Emelyanov <xemul@parallels.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Pavel Emelyanov 提交于
Currently tcp diag reports rqlen and wqlen values similar to how the CINQ/COUTQ iotcls do. To make unix diag report these values in the same way move the respective code into helpers. Signed-off-by: NPavel Emelyanov <xemul@parallels.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Pavel Emelyanov 提交于
[ Fix indentation of sock_diag*() calls. -DaveM ] Signed-off-by: NPavel Emelyanov <xemul@parallels.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 27 12月, 2011 2 次提交
-
-
由 David S. Miller 提交于
Otherwise we leave uninitialized kernel memory in there. Reported-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Pavel Emelyanov 提交于
The NLA_PUT macro should accept the actual attribute length, not the amount of elements in array :( Signed-off-by: NPavel Emelyanov <xemul@parallels.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 21 12月, 2011 1 次提交
-
-
由 Cyrill Gorcunov 提交于
Otherwise getting | net/unix/diag.c:312:16: error: expected declaration specifiers or ‘...’ before string constant | net/unix/diag.c:313:1: error: expected declaration specifiers or ‘...’ before string constant Signed-off-by: NCyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 17 12月, 2011 10 次提交
-
-
由 Pavel Emelyanov 提交于
Signed-off-by: NPavel Emelyanov <xemul@parallels.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Pavel Emelyanov 提交于
Signed-off-by: NPavel Emelyanov <xemul@parallels.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Pavel Emelyanov 提交于
When establishing a unix connection on stream sockets the server end receives an skb with socket in its receive queue. Report who is waiting for these ends to be accepted for listening sockets via NLA. There's a lokcing issue with this -- the unix sk state lock is required to access the peer, and it is taken under the listening sk's queue lock. Strictly speaking the queue lock should be taken inside the state lock, but since in this case these two sockets are different it shouldn't lead to deadlock. Signed-off-by: NPavel Emelyanov <xemul@parallels.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Pavel Emelyanov 提交于
Report the peer socket inode ID as NLA. With this it's finally possible to find out the other end of an interesting unix connection. Signed-off-by: NPavel Emelyanov <xemul@parallels.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Pavel Emelyanov 提交于
Actually, the socket path if it's not anonymous doesn't give a clue to which file the socket is bound to. Even if the path is absolute, it can be unlinked and then new socket can be bound to it. With this NLA it's possible to check which file a particular socket is really bound to. Signed-off-by: NPavel Emelyanov <xemul@parallels.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Pavel Emelyanov 提交于
Report the sun_path when requested as NLA. With leading '\0' if present but without the leading AF_UNIX bits. Signed-off-by: NPavel Emelyanov <xemul@parallels.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Pavel Emelyanov 提交于
The socket inode is used as a key for lookup. This is effectively the only really unique ID of a unix socket, but using this for search currently has one problem -- it is O(number of sockets) :( Does it worth fixing this lookup or inventing some other ID for unix sockets? Signed-off-by: NPavel Emelyanov <xemul@parallels.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Pavel Emelyanov 提交于
Walk the unix sockets table and fill the core response structure, which includes type, state and inode. Signed-off-by: NPavel Emelyanov <xemul@parallels.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Pavel Emelyanov 提交于
Includes basic module_init/_exit functionality, dump/get_exact stubs and declares the basic API structures for request and response. Signed-off-by: NPavel Emelyanov <xemul@parallels.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Pavel Emelyanov 提交于
Signed-off-by: NPavel Emelyanov <xemul@parallels.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 27 11月, 2011 1 次提交
-
-
由 Alexey Moiseytsev 提交于
poll() call may be blocked by concurrent reading from the same stream socket. Signed-off-by: NAlexey Moiseytsev <himeraster@gmail.com> Acked-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 29 9月, 2011 1 次提交
-
-
由 Eric Dumazet 提交于
Since commit 7361c36c (af_unix: Allow credentials to work across user and pid namespaces) af_unix performance dropped a lot. This is because we now take a reference on pid and cred in each write(), and release them in read(), usually done from another process, eventually from another cpu. This triggers false sharing. # Events: 154K cycles # # Overhead Command Shared Object Symbol # ........ ....... .................. ......................... # 10.40% hackbench [kernel.kallsyms] [k] put_pid 8.60% hackbench [kernel.kallsyms] [k] unix_stream_recvmsg 7.87% hackbench [kernel.kallsyms] [k] unix_stream_sendmsg 6.11% hackbench [kernel.kallsyms] [k] do_raw_spin_lock 4.95% hackbench [kernel.kallsyms] [k] unix_scm_to_skb 4.87% hackbench [kernel.kallsyms] [k] pid_nr_ns 4.34% hackbench [kernel.kallsyms] [k] cred_to_ucred 2.39% hackbench [kernel.kallsyms] [k] unix_destruct_scm 2.24% hackbench [kernel.kallsyms] [k] sub_preempt_count 1.75% hackbench [kernel.kallsyms] [k] fget_light 1.51% hackbench [kernel.kallsyms] [k] __mutex_lock_interruptible_slowpath 1.42% hackbench [kernel.kallsyms] [k] sock_alloc_send_pskb This patch includes SCM_CREDENTIALS information in a af_unix message/skb only if requested by the sender, [man 7 unix for details how to include ancillary data using sendmsg() system call] Note: This might break buggy applications that expected SCM_CREDENTIAL from an unaware write() system call, and receiver not using SO_PASSCRED socket option. If SOCK_PASSCRED is set on source or destination socket, we still include credentials for mere write() syscalls. Performance boost in hackbench : more than 50% gain on a 16 thread machine (2 quad-core cpus, 2 threads per core) hackbench 20 thread 2000 4.228 sec instead of 9.102 sec Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Acked-by: NTim Chen <tim.c.chen@linux.intel.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 17 9月, 2011 1 次提交
-
-
由 David S. Miller 提交于
This reverts commit 0856a304. As requested by Eric Dumazet, it has various ref-counting problems and has introduced regressions. Eric will add a more suitable version of this performance fix. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 25 8月, 2011 1 次提交
-
-
由 Tim Chen 提交于
Patch series 109f6e39..7361c36c back in 2.6.36 added functionality to allow credentials to work across pid namespaces for packets sent via UNIX sockets. However, the atomic reference counts on pid and credentials caused plenty of cache bouncing when there are numerous threads of the same pid sharing a UNIX socket. This patch mitigates the problem by eliminating extraneous reference counts on pid and credentials on both send and receive path of UNIX sockets. I found a 2x improvement in hackbench's threaded case. On the receive path in unix_dgram_recvmsg, currently there is an increment of reference count on pid and credentials in scm_set_cred. Then there are two decrement of the reference counts. Once in scm_recv and once when skb_free_datagram call skb->destructor function unix_destruct_scm. One pair of increment and decrement of ref count on pid and credentials can be eliminated from the receive path. Until we destroy the skb, we already set a reference when we created the skb on the send side. On the send path, there are two increments of ref count on pid and credentials, once in scm_send and once in unix_scm_to_skb. Then there is a decrement of the reference counts in scm_destroy's call to scm_destroy_cred at the end of unix_dgram_sendmsg functions. One pair of increment and decrement of the reference counts can be removed so we only need to increment the ref counts once. By incorporating these changes, for hackbench running on a 4 socket NHM-EX machine with 40 cores, the execution of hackbench on 50 groups of 20 threads sped up by factor of 2. Hackbench command used for testing: ./hackbench 50 thread 2000 Signed-off-by: NTim Chen <tim.c.chen@linux.intel.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 20 7月, 2011 1 次提交
-
-
由 Al Viro 提交于
combination of kern_path_parent() and lookup_create(). Does *not* expose struct nameidata to caller. Syscalls converted to that... Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 24 5月, 2011 1 次提交
-
-
由 Dan Rosenberg 提交于
The %pK format specifier is designed to hide exposed kernel pointers, specifically via /proc interfaces. Exposing these pointers provides an easy target for kernel write vulnerabilities, since they reveal the locations of writable structures containing easily triggerable function pointers. The behavior of %pK depends on the kptr_restrict sysctl. If kptr_restrict is set to 0, no deviation from the standard %p behavior occurs. If kptr_restrict is set to 1, the default, if the current user (intended to be a reader via seq_printf(), etc.) does not have CAP_SYSLOG (currently in the LSM tree), kernel pointers using %pK are printed as 0's. If kptr_restrict is set to 2, kernel pointers using %pK are printed as 0's regardless of privileges. Replacing with 0's was chosen over the default "(null)", which cannot be parsed by userland %p, which expects "(nil)". The supporting code for kptr_restrict and %pK are currently in the -mm tree. This patch converts users of %p in net/ to %pK. Cases of printing pointers to the syslog are not covered, since this would eliminate useful information for postmortem debugging and the reading of the syslog is already optionally protected by the dmesg_restrict sysctl. Signed-off-by: NDan Rosenberg <drosenberg@vsecurity.com> Cc: James Morris <jmorris@namei.org> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Thomas Graf <tgraf@infradead.org> Cc: Eugene Teo <eugeneteo@kernel.org> Cc: Kees Cook <kees.cook@canonical.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: David S. Miller <davem@davemloft.net> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Eric Paris <eparis@parisplace.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 02 5月, 2011 1 次提交
-
-
由 Eric W. Biederman 提交于
This fixes the following oops discovered by Dan Aloni: > Anyway, the following is the output of the Oops that I got on the > Ubuntu kernel on which I first detected the problem > (2.6.37-12-generic). The Oops that followed will be more useful, I > guess. >[ 5594.669852] BUG: unable to handle kernel NULL pointer dereference > at (null) > [ 5594.681606] IP: [<ffffffff81550b7b>] unix_dgram_recvmsg+0x1fb/0x420 > [ 5594.687576] PGD 2a05d067 PUD 2b951067 PMD 0 > [ 5594.693720] Oops: 0002 [#1] SMP > [ 5594.699888] last sysfs file: The bug was that unix domain sockets use a pseduo packet for connecting and accept uses that psudo packet to get the socket. In the buggy seqpacket case we were allowing unconnected sockets to call recvmsg and try to receive the pseudo packet. That is always wrong and as of commit 7361c36c the pseudo packet had become enough different from a normal packet that the kernel started oopsing. Do for seqpacket_recv what was done for seqpacket_send in 2.5 and only allow it on connected seqpacket sockets. Cc: stable@kernel.org Tested-by: NDan Aloni <dan@aloni.org> Signed-off-by: NEric W. Biederman <ebiederm@xmission.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 31 3月, 2011 1 次提交
-
-
由 Lucas De Marchi 提交于
Fixes generated by 'codespell' and manually reviewed. Signed-off-by: NLucas De Marchi <lucas.demarchi@profusion.mobi>
-
- 15 3月, 2011 2 次提交
-
-
由 Al Viro 提交于
Just need to make sure that AF_UNIX garbage collector won't confuse O_PATHed socket on filesystem for real AF_UNIX opened socket. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Daniel Baluta 提交于
We latch our state using a spinlock not a r/w kind of lock. Signed-off-by: NDaniel Baluta <dbaluta@ixiacom.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 14 3月, 2011 1 次提交
-
-
由 Al Viro 提交于
all remaining callers pass LOOKUP_PARENT to it, so flags argument can die; renamed to kern_path_parent() Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 08 3月, 2011 2 次提交
-
-
由 Hagen Paul Pfeifer 提交于
Signed-off-by: NHagen Paul Pfeifer <hagen@jauu.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Rainer Weikusat 提交于
The unix_dgram_recvmsg and unix_stream_recvmsg routines in net/af_unix.c utilize mutex_lock(&u->readlock) calls in order to serialize read operations of multiple threads on a single socket. This implies that, if all n threads of a process block in an AF_UNIX recv call trying to read data from the same socket, one of these threads will be sleeping in state TASK_INTERRUPTIBLE and all others in state TASK_UNINTERRUPTIBLE. Provided that a particular signal is supposed to be handled by a signal handler defined by the process and that none of this threads is blocking the signal, the complete_signal routine in kernel/signal.c will select the 'first' such thread it happens to encounter when deciding which thread to notify that a signal is supposed to be handled and if this is one of the TASK_UNINTERRUPTIBLE threads, the signal won't be handled until the one thread not blocking on the u->readlock mutex is woken up because some data to process has arrived (if this ever happens). The included patch fixes this by changing mutex_lock to mutex_lock_interruptible and handling possible error returns in the same way interruptions are handled by the actual receive-code. Signed-off-by: NRainer Weikusat <rweikusat@mobileactivedefense.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 23 2月, 2011 1 次提交
-
-
由 Eric Dumazet 提交于
Add proper RCU annotations/verbs to sk_wq and wq members Fix __sctp_write_space() sk_sleep() abuse (and sock->wq access) Fix sunrpc sk_sleep() abuse too Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 20 1月, 2011 1 次提交
-
-
由 Alban Crequy 提交于
Signed-off-by: NAlban Crequy <alban.crequy@collabora.co.uk> Reviewed-by: NIan Molton <ian.molton@collabora.co.uk> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 19 1月, 2011 1 次提交
-
-
由 Alban Crequy 提交于
Linux Socket Filters can already be successfully attached and detached on unix sockets with setsockopt(sockfd, SOL_SOCKET, SO_{ATTACH,DETACH}_FILTER, ...). See: Documentation/networking/filter.txt But the filter was never used in the unix socket code so it did not work. This patch uses sk_filter() to filter buffers before delivery. This short program demonstrates the problem on SOCK_DGRAM. int main(void) { int i, j, ret; int sv[2]; struct pollfd fds[2]; char *message = "Hello world!"; char buffer[64]; struct sock_filter ins[32] = {{0,},}; struct sock_fprog filter; socketpair(AF_UNIX, SOCK_DGRAM, 0, sv); for (i = 0 ; i < 2 ; i++) { fds[i].fd = sv[i]; fds[i].events = POLLIN; fds[i].revents = 0; } for(j = 1 ; j < 13 ; j++) { /* Set a socket filter to truncate the message */ memset(ins, 0, sizeof(ins)); ins[0].code = BPF_RET|BPF_K; ins[0].k = j; filter.len = 1; filter.filter = ins; setsockopt(sv[1], SOL_SOCKET, SO_ATTACH_FILTER, &filter, sizeof(filter)); /* send a message */ send(sv[0], message, strlen(message) + 1, 0); /* The filter should let the message pass but truncated. */ poll(fds, 2, 0); /* Receive the truncated message*/ ret = recv(sv[1], buffer, 64, 0); printf("received %d bytes, expected %d\n", ret, j); } for (i = 0 ; i < 2 ; i++) close(sv[i]); return 0; } Signed-off-by: NAlban Crequy <alban.crequy@collabora.co.uk> Reviewed-by: NIan Molton <ian.molton@collabora.co.uk> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 06 1月, 2011 1 次提交
-
-
由 David S. Miller 提交于
unix_release() can asynchornously set socket->sk to NULL, and it does so without holding the unix_state_lock() on "other" during stream connects. However, the reverse mapping, sk->sk_socket, is only transitioned to NULL under the unix_state_lock(). Therefore make the security hooks follow the reverse mapping instead of the forward mapping. Reported-by: NJeremy Fitzhardinge <jeremy@goop.org> Reported-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 30 11月, 2010 1 次提交
-
-
由 Eric Dumazet 提交于
Its easy to eat all kernel memory and trigger NMI watchdog, using an exploit program that queues unix sockets on top of others. lkml ref : http://lkml.org/lkml/2010/11/25/8 This mechanism is used in applications, one choice we have is to have a recursion limit. Other limits might be needed as well (if we queue other types of files), since the passfd mechanism is currently limited by socket receive queue sizes only. Add a recursion_level to unix socket, allowing up to 4 levels. Each time we send an unix socket through sendfd mechanism, we copy its recursion level (plus one) to receiver. This recursion level is cleared when socket receive queue is emptied. Reported-by: NМарк Коренберг <socketpair@gmail.com> Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 25 11月, 2010 1 次提交
-
-
由 Eric Dumazet 提交于
Vegard Nossum found a unix socket OOM was possible, posting an exploit program. My analysis is we can eat all LOWMEM memory before unix_gc() being called from unix_release_sock(). Moreover, the thread blocked in unix_gc() can consume huge amount of time to perform cleanup because of huge working set. One way to handle this is to have a sensible limit on unix_tot_inflight, tested from wait_for_unix_gc() and to force a call to unix_gc() if this limit is hit. This solves the OOM and also reduce overall latencies, and should not slowdown normal workloads. Reported-by: NVegard Nossum <vegard.nossum@gmail.com> Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-