Commits · 5b35e1e6e9ca651e6b291c96d1106043c9af314a · openanolis / cloud-kernel

08 Jan, 2012 · 1 commit

net: Default UDP and UNIX diag to 'n'. · 6d62a66e

Authored by David S. Miller on Jan 07, 2012

Signed-off-by: David S. Miller <davem@davemloft.net>

04 Jan, 2012 · 1 commit

switch ->path_mknod() to umode_t · 04fc66e7

Authored by Al Viro on Nov 21, 2011

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

31 Dec, 2011 · 3 commits

unix_diag: Fixup RQLEN extension report · c9da99e6

Authored by Pavel Emelyanov on Dec 30, 2011

While it's not too late, fix the recently added RQLEN diag extension
to report rqlen and wqlen in the same way as TCP does.

I.e., for listening sockets report the ack backlog length (which is the
input queue length for the socket) in rqlen and the max ack backlog
length in wqlen; for established sockets report what the CINQ/OUTQ
ioctls do.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
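
As a rough userspace illustration of the "established socket" half of the rule above (this is not the kernel's diag code), the sketch below asks an AF_UNIX stream socket how many unread bytes it holds. It assumes the commit's "CINQ" refers to the `SIOCINQ` ioctl; the helper name `unread_bytes_after_write` is hypothetical.

```c
#include <assert.h>
#include <linux/sockios.h>   /* SIOCINQ */
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: write `len` bytes into one end of a stream
 * socketpair and return what SIOCINQ reports on the other end,
 * or -1 on error. */
int unread_bytes_after_write(const char *buf, int len)
{
    int sv[2];
    int queued = -1;

    assert(buf != NULL && len > 0);
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    if (write(sv[0], buf, (size_t)len) == len) {
        /* Ask the receiving end for its unread byte count. */
        if (ioctl(sv[1], SIOCINQ, &queued) < 0)
            queued = -1;
    }

    close(sv[0]);
    close(sv[1]);
    return queued;
}
```

Calling `unread_bytes_after_write("hello", 5)` should report the five queued bytes. The output-queue ioctl is deliberately left out of the sketch, since its value reflects kernel buffer accounting rather than a plain byte count.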

af_unix: Move CINQ/COUTQ code to helpers · 885ee74d

Authored by Pavel Emelyanov on Dec 30, 2011

Currently tcp diag reports rqlen and wqlen values similar to how
the CINQ/COUTQ ioctls do. To make unix diag report these values
in the same way, move the respective code into helpers.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

unix_diag: Add the MEMINFO extension · 257b5298

Authored by Pavel Emelyanov on Dec 30, 2011

[ Fix indentation of sock_diag*() calls. -DaveM ]
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

27 Dec, 2011 · 2 commits

unix: If we happen to find peer NULL when diag dumping, write zero. · e09e9d18

Authored by David S. Miller on Dec 26, 2011

Otherwise we leave uninitialized kernel memory in there.
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

unix_diag: Fix incoming connections nla length · 3b0723c1

Authored by Pavel Emelyanov on Dec 26, 2011

The NLA_PUT macro should accept the actual attribute length, not
the number of elements in the array :(
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
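
The class of bug described above (an element count passed where a byte length is expected) can be sketched in plain userspace C. This is only an illustration: `put_attr` is a hypothetical stand-in for an attribute writer, not the kernel's `NLA_PUT`, and the assumption that the payload is an array of 32-bit IDs is mine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for an attribute writer: copies `len` BYTES
 * of payload into dst and returns the number of bytes written. */
size_t put_attr(uint8_t *dst, size_t dst_size, const void *payload, size_t len)
{
    assert(len <= dst_size);
    memcpy(dst, payload, len);
    return len;
}

/* Byte length of an array of n 32-bit IDs (assumed payload type).
 * Passing n itself instead would copy only n bytes and truncate
 * the attribute. */
size_t id_array_bytes(size_t n)
{
    return n * sizeof(uint32_t);
}
```

With three IDs, the correct length argument is `id_array_bytes(3)`, i.e. 12 bytes; passing `3` would deliver less than one complete ID.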

21 Dec, 2011 · 1 commit

net: unix -- Add missing module.h inclusion · 2ea744a5

Authored by Cyrill Gorcunov on Dec 20, 2011

Otherwise getting

 | net/unix/diag.c:312:16: error: expected declaration specifiers or ‘...’ before string constant
 | net/unix/diag.c:313:1: error: expected declaration specifiers or ‘...’ before string constant
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

17 Dec, 2011 · 10 commits

unix_diag: Write it into kbuild · 5d531aaa

Authored by Pavel Emelyanov on Dec 15, 2011

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

unix_diag: Receive queue lenght NLA · cbf39195

Authored by Pavel Emelyanov on Dec 15, 2011

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

unix_diag: Pending connections IDs NLA · 2aac7a2c

Authored by Pavel Emelyanov on Dec 15, 2011

When establishing a unix connection on stream sockets the
server end receives an skb with socket in its receive queue.

Report who is waiting for these ends to be accepted for
listening sockets via NLA.

There's a locking issue with this -- the unix sk state lock is
required to access the peer, and it is taken under the listening
sk's queue lock. Strictly speaking the queue lock should be taken
inside the state lock, but since in this case these two sockets
are different it shouldn't lead to deadlock.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

unix_diag: Unix peer inode NLA · ac02be8d

Authored by Pavel Emelyanov on Dec 15, 2011

Report the peer socket inode ID as NLA. With this it's finally
possible to find out the other end of an interesting unix connection.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

unix_diag: Unix inode info NLA · 5f7b0569

Authored by Pavel Emelyanov on Dec 15, 2011

Actually, the socket path, if it's not anonymous, doesn't give
a clue as to which file the socket is bound to. Even if the path
is absolute, it can be unlinked and then a new socket can be
bound to it.

With this NLA it's possible to check which file a particular
socket is really bound to.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

unix_diag: Unix socket name NLA · f5248b48

Authored by Pavel Emelyanov on Dec 15, 2011

Report the sun_path when requested as NLA. With leading '\0' if
present but without the leading AF_UNIX bits.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
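
To make the "leading '\0'" remark concrete: on Linux, an abstract socket name is a sun_path whose first byte is NUL. The sketch below, a userspace illustration rather than the diag interface itself, binds such a name and returns the name length that getsockname() reports once the address-family field is excluded. The helper name `abstract_name_len` and the socket name used in the usage note are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical helper: bind a socket to the abstract name `name` and
 * return the length of the name as reported by getsockname(), counted
 * without the sun_family field. Returns -1 on any failure. */
int abstract_name_len(const char *name)
{
    struct sockaddr_un addr, got;
    socklen_t len, got_len = sizeof(got);
    size_t n = strlen(name);
    int fd, result = -1;

    assert(n > 0 && n + 1 <= sizeof(addr.sun_path));
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    /* sun_path[0] stays '\0', which marks an abstract name. */
    memcpy(addr.sun_path + 1, name, n);
    len = (socklen_t)(offsetof(struct sockaddr_un, sun_path) + 1 + n);

    fd = socket(AF_UNIX, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    memset(&got, 0xff, sizeof(got));
    if (bind(fd, (struct sockaddr *)&addr, len) == 0 &&
        getsockname(fd, (struct sockaddr *)&got, &got_len) == 0 &&
        got.sun_path[0] == '\0' &&
        memcmp(got.sun_path + 1, name, n) == 0)
        result = (int)(got_len - offsetof(struct sockaddr_un, sun_path));

    close(fd);
    return result;
}
```

For example, `abstract_name_len("gr-demo")` yields 8: the leading NUL byte plus the seven name characters, with no family bytes counted.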

unix_diag: Dumping exact socket core · 5d3cae8b

Authored by Pavel Emelyanov on Dec 15, 2011

The socket inode is used as a key for lookup. This is effectively
the only really unique ID of a unix socket, but using this for
search currently has one problem -- it is O(number of sockets) :(

Is it worth fixing this lookup or inventing some other ID for
unix sockets?
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

unix_diag: Dumping all sockets core · 45a96b9b

Authored by Pavel Emelyanov on Dec 15, 2011

Walk the unix sockets table and fill the core response structure,
which includes type, state and inode.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

unix_diag: Basic module skeleton · 22931d3b

Authored by Pavel Emelyanov on Dec 15, 2011

Includes basic module_init/_exit functionality, dump/get_exact stubs
and declares the basic API structures for request and response.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

af_unix: Export stuff required for diag module · fa7ff56f

Authored by Pavel Emelyanov on Dec 15, 2011

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

27 Nov, 2011 · 1 commit

AF_UNIX: Fix poll blocking problem when reading from a stream socket · 0884d7aa

Authored by Alexey Moiseytsev on Nov 21, 2011

poll() call may be blocked by concurrent reading from the same stream
socket.
Signed-off-by: Alexey Moiseytsev <himeraster@gmail.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

29 Sep, 2011 · 1 commit

af_unix: dont send SCM_CREDENTIALS by default · 16e57262

Authored by Eric Dumazet on Sep 19, 2011

Since commit 7361c36c (af_unix: Allow credentials to work across
user and pid namespaces) af_unix performance dropped a lot.

This is because we now take a reference on pid and cred in each write(),
and release them in read(), usually done from another process,
eventually from another cpu. This triggers false sharing.

# Events: 154K cycles
#
# Overhead  Command       Shared Object        Symbol
# ........  .......  ..................  .........................
#
    10.40%  hackbench  [kernel.kallsyms]   [k] put_pid
     8.60%  hackbench  [kernel.kallsyms]   [k] unix_stream_recvmsg
     7.87%  hackbench  [kernel.kallsyms]   [k] unix_stream_sendmsg
     6.11%  hackbench  [kernel.kallsyms]   [k] do_raw_spin_lock
     4.95%  hackbench  [kernel.kallsyms]   [k] unix_scm_to_skb
     4.87%  hackbench  [kernel.kallsyms]   [k] pid_nr_ns
     4.34%  hackbench  [kernel.kallsyms]   [k] cred_to_ucred
     2.39%  hackbench  [kernel.kallsyms]   [k] unix_destruct_scm
     2.24%  hackbench  [kernel.kallsyms]   [k] sub_preempt_count
     1.75%  hackbench  [kernel.kallsyms]   [k] fget_light
     1.51%  hackbench  [kernel.kallsyms]   [k] __mutex_lock_interruptible_slowpath
     1.42%  hackbench  [kernel.kallsyms]   [k] sock_alloc_send_pskb

This patch includes SCM_CREDENTIALS information in an af_unix message/skb
only if requested by the sender [see man 7 unix for details on how to
include ancillary data using the sendmsg() system call].

Note: This might break buggy applications that expected SCM_CREDENTIALS
from an unaware write() system call with the receiver not using the
SO_PASSCRED socket option.

If SOCK_PASSCRED is set on source or destination socket, we still
include credentials for mere write() syscalls.

Performance boost in hackbench : more than 50% gain on a 16 thread
machine (2 quad-core cpus, 2 threads per core)

hackbench 20 thread 2000

4.228 sec instead of 9.102 sec
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
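
The case the commit message says keeps working (credentials attached to a plain write() when the receiver has opted in) can be sketched from userspace as follows. This is a minimal illustration under the assumption that the receiver sets `SO_PASSCRED` before the write; the helper name `sender_pid_via_passcred` is hypothetical.

```c
#define _GNU_SOURCE            /* exposes struct ucred and SCM_CREDENTIALS in glibc */
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: returns the sender pid delivered as
 * SCM_CREDENTIALS for a plain write(), or -1 if none was attached. */
pid_t sender_pid_via_passcred(void)
{
    int sv[2], on = 1;
    char data;
    struct iovec iov = { .iov_base = &data, .iov_len = sizeof(data) };
    union {
        char buf[CMSG_SPACE(sizeof(struct ucred))];
        struct cmsghdr align;   /* forces suitable alignment of buf */
    } ctrl;
    struct msghdr msg;
    struct cmsghdr *c;
    struct ucred cred;
    pid_t pid = -1;

    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
        return -1;

    /* The receiver opts in; the sender uses an ordinary write(). */
    if (setsockopt(sv[1], SOL_SOCKET, SO_PASSCRED, &on, sizeof(on)) == 0 &&
        write(sv[0], "x", 1) == 1) {
        memset(&msg, 0, sizeof(msg));
        msg.msg_iov = &iov;
        msg.msg_iovlen = 1;
        msg.msg_control = ctrl.buf;
        msg.msg_controllen = sizeof(ctrl.buf);
        if (recvmsg(sv[1], &msg, 0) == 1) {
            for (c = CMSG_FIRSTHDR(&msg); c != NULL; c = CMSG_NXTHDR(&msg, c)) {
                if (c->cmsg_level == SOL_SOCKET &&
                    c->cmsg_type == SCM_CREDENTIALS) {
                    assert(c->cmsg_len >= CMSG_LEN(sizeof(cred)));
                    memcpy(&cred, CMSG_DATA(c), sizeof(cred));
                    pid = cred.pid;
                }
            }
        }
    }

    close(sv[0]);
    close(sv[1]);
    return pid;
}
```

Since sender and receiver live in the same process here, the returned pid should equal `getpid()`. A sender that wants credentials delivered without relying on the receiver's option would attach them explicitly with sendmsg(), as described in man 7 unix.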

17 Sep, 2011 · 1 commit

Revert "Scm: Remove unnecessary pid & credential references in Unix socket's send and receive path" · f78a5fda

Authored by David S. Miller on Sep 16, 2011

This reverts commit 0856a304.

As requested by Eric Dumazet, it has various ref-counting
problems and has introduced regressions.  Eric will add
a more suitable version of this performance fix.
Signed-off-by: David S. Miller <davem@davemloft.net>

25 Aug, 2011 · 1 commit

Scm: Remove unnecessary pid & credential references in Unix socket's send and receive path · 0856a304

Authored by Tim Chen on Aug 22, 2011

Patch series 109f6e39..7361c36c back in 2.6.36 added functionality to
allow credentials to work across pid namespaces for packets sent via
UNIX sockets.  However, the atomic reference counts on pid and
credentials caused plenty of cache bouncing when there are numerous
threads of the same pid sharing a UNIX socket.  This patch mitigates the
problem by eliminating extraneous reference counts on pid and
credentials on both send and receive path of UNIX sockets. I found a 2x
improvement in hackbench's threaded case.

On the receive path in unix_dgram_recvmsg, currently there is an
increment of reference count on pid and credentials in scm_set_cred.
Then there are two decrements of the reference counts: once in scm_recv
and once when skb_free_datagram calls the skb->destructor function
unix_destruct_scm.  One pair of increment and decrement of ref count on
pid and credentials can be eliminated from the receive path.  Until we
destroy the skb, we already set a reference when we created the skb on
the send side.

On the send path, there are two increments of ref count on pid and
credentials, once in scm_send and once in unix_scm_to_skb.  Then there
is a decrement of the reference counts in scm_destroy's call to
scm_destroy_cred at the end of unix_dgram_sendmsg functions.   One pair
of increment and decrement of the reference counts can be removed so we
only need to increment the ref counts once.

By incorporating these changes, for hackbench running on a 4 socket
NHM-EX machine with 40 cores, the execution of hackbench on
50 groups of 20 threads sped up by factor of 2.

Hackbench command used for testing:
./hackbench 50 thread 2000
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

20 Jul, 2011 · 1 commit

new helpers: kern_path_create/user_path_create · dae6ad8f

Authored by Al Viro on Jun 26, 2011

combination of kern_path_parent() and lookup_create().  Does *not*
expose struct nameidata to caller.  Syscalls converted to that...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

24 May, 2011 · 1 commit

net: convert %p usage to %pK · 71338aa7

Authored by Dan Rosenberg on May 23, 2011

The %pK format specifier is designed to hide exposed kernel pointers,
specifically via /proc interfaces.  Exposing these pointers provides an
easy target for kernel write vulnerabilities, since they reveal the
locations of writable structures containing easily triggerable function
pointers.  The behavior of %pK depends on the kptr_restrict sysctl.

If kptr_restrict is set to 0, no deviation from the standard %p behavior
occurs.  If kptr_restrict is set to 1, the default, if the current user
(intended to be a reader via seq_printf(), etc.) does not have CAP_SYSLOG
(currently in the LSM tree), kernel pointers using %pK are printed as 0's.
If kptr_restrict is set to 2, kernel pointers using %pK are printed as
0's regardless of privileges.  Replacing with 0's was chosen over the
default "(null)", which cannot be parsed by userland %p, which expects
"(nil)".

The supporting code for kptr_restrict and %pK is currently in the -mm
tree.  This patch converts users of %p in net/ to %pK.  Cases of printing
pointers to the syslog are not covered, since this would eliminate useful
information for postmortem debugging and the reading of the syslog is
already optionally protected by the dmesg_restrict sysctl.
Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com>
Cc: James Morris <jmorris@namei.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Thomas Graf <tgraf@infradead.org>
Cc: Eugene Teo <eugeneteo@kernel.org>
Cc: Kees Cook <kees.cook@canonical.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: David S. Miller <davem@davemloft.net>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Eric Paris <eparis@parisplace.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

02 May, 2011 · 1 commit

af_unix: Only allow recv on connected seqpacket sockets. · a05d2ad1

Authored by Eric W. Biederman on Apr 24, 2011

This fixes the following oops discovered by Dan Aloni:
> Anyway, the following is the output of the Oops that I got on the
> Ubuntu kernel on which I first detected the problem
> (2.6.37-12-generic). The Oops that followed will be more useful, I
> guess.

> [ 5594.669852] BUG: unable to handle kernel NULL pointer dereference at (null)
> [ 5594.681606] IP: [<ffffffff81550b7b>] unix_dgram_recvmsg+0x1fb/0x420
> [ 5594.687576] PGD 2a05d067 PUD 2b951067 PMD 0
> [ 5594.693720] Oops: 0002 [#1] SMP
> [ 5594.699888] last sysfs file:

The bug was that unix domain sockets use a pseudo packet for
connecting, and accept uses that pseudo packet to get the socket.
In the buggy seqpacket case we were allowing unconnected
sockets to call recvmsg and try to receive the pseudo packet.

That is always wrong, and as of commit 7361c36c the pseudo
packet had become different enough from a normal packet
that the kernel started oopsing.

Do for seqpacket_recv what was done for seqpacket_send in 2.5
and only allow it on connected seqpacket sockets.

Cc: stable@kernel.org
Tested-by: Dan Aloni <dan@aloni.org>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
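
The user-visible effect of this fix can be checked with a small userspace sketch: receiving on a SOCK_SEQPACKET unix socket that was never connected should fail immediately rather than touch the queue. The helper name below is hypothetical, and the expectation of `ENOTCONN` assumes a kernel that contains this change.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: try to receive on an unconnected SOCK_SEQPACKET
 * unix socket and return the resulting errno (0 if recv() succeeded). */
int recv_errno_on_unconnected_seqpacket(void)
{
    char buf[16];
    int err = 0;
    int fd = socket(AF_UNIX, SOCK_SEQPACKET, 0);

    /* Sketch only: socket creation failure is treated as fatal. */
    assert(fd >= 0);

    /* MSG_DONTWAIT keeps the call from blocking on kernels without the fix. */
    if (recv(fd, buf, sizeof(buf), MSG_DONTWAIT) < 0)
        err = errno;

    close(fd);
    return err;
}
```

On a fixed kernel the function is expected to return `ENOTCONN`.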

31 Mar, 2011 · 1 commit

Fix common misspellings · 25985edc

Authored by Lucas De Marchi on Mar 30, 2011

Fixes generated by 'codespell' and manually reviewed.
Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>

15 Mar, 2011 · 2 commits

Allow passing O_PATH descriptors via SCM_RIGHTS datagrams · 326be7b4

Authored by Al Viro on Mar 13, 2011

Just need to make sure that the AF_UNIX garbage collector won't
mistake an O_PATH-opened socket on a filesystem for a real opened
AF_UNIX socket.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

af_unix: update locking comment · e5537bfc

Authored by Daniel Baluta on Mar 14, 2011

We latch our state using a spinlock, not an r/w kind of lock.
Signed-off-by: Daniel Baluta <dbaluta@ixiacom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

14 Mar, 2011 · 1 commit

kill path_lookup() · c9c6cac0

Authored by Al Viro on Feb 16, 2011

all remaining callers pass LOOKUP_PARENT to it, so the
flags argument can die; renamed to kern_path_parent()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

08 Mar, 2011 · 2 commits

af_unix: remove unused struct sockaddr_un cruft · 6118e35a

Authored by Hagen Paul Pfeifer on Mar 04, 2011

Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: fix multithreaded signal handling in unix recv routines · b3ca9b02

Authored by Rainer Weikusat on Feb 28, 2011

The unix_dgram_recvmsg and unix_stream_recvmsg routines in
net/af_unix.c utilize mutex_lock(&u->readlock) calls in order to
serialize read operations of multiple threads on a single socket. This
implies that, if all n threads of a process block in an AF_UNIX recv
call trying to read data from the same socket, one of these threads
will be sleeping in state TASK_INTERRUPTIBLE and all others in state
TASK_UNINTERRUPTIBLE. Provided that a particular signal is supposed to
be handled by a signal handler defined by the process and that none of
these threads is blocking the signal, the complete_signal routine in
kernel/signal.c will select the 'first' such thread it happens to
encounter when deciding which thread to notify that a signal is
supposed to be handled and if this is one of the TASK_UNINTERRUPTIBLE
threads, the signal won't be handled until the one thread not blocking
on the u->readlock mutex is woken up because some data to process has
arrived (if this ever happens). The included patch fixes this by
changing mutex_lock to mutex_lock_interruptible and handling possible
error returns in the same way interruptions are handled by the actual
receive-code.
Signed-off-by: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

23 Feb, 2011 · 1 commit

net: add __rcu annotations to sk_wq and wq · eaefd110

Authored by Eric Dumazet on Feb 18, 2011

Add proper RCU annotations/verbs to sk_wq and wq members

Fix __sctp_write_space() sk_sleep() abuse (and sock->wq access)

Fix sunrpc sk_sleep() abuse too
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

20 Jan, 2011 · 1 commit

af_unix: coding style: remove one level of indentation in unix_shutdown() · 7180a031

Authored by Alban Crequy on Jan 19, 2011

Signed-off-by: Alban Crequy <alban.crequy@collabora.co.uk>
Reviewed-by: Ian Molton <ian.molton@collabora.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

19 Jan, 2011 · 1 commit

af_unix: implement socket filter · d6ae3bae

Authored by Alban Crequy on Jan 18, 2011

Linux Socket Filters can already be successfully attached and detached on unix
sockets with setsockopt(sockfd, SOL_SOCKET, SO_{ATTACH,DETACH}_FILTER, ...).
See: Documentation/networking/filter.txt

But the filter was never used in the unix socket code so it did not work. This
patch uses sk_filter() to filter buffers before delivery.

This short program demonstrates the problem on SOCK_DGRAM.

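#include <poll.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/filter.h>

int main(void) {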
  int i, j, ret;
  int sv[2];
  struct pollfd fds[2];
  char *message = "Hello world!";
  char buffer[64];
  struct sock_filter ins[32] = {{0,},};
  struct sock_fprog filter;

  socketpair(AF_UNIX, SOCK_DGRAM, 0, sv);

  for (i = 0 ; i < 2 ; i++) {
    fds[i].fd = sv[i];
    fds[i].events = POLLIN;
    fds[i].revents = 0;
  }

  for(j = 1 ; j < 13 ; j++) {

    /* Set a socket filter to truncate the message */
    memset(ins, 0, sizeof(ins));
    ins[0].code = BPF_RET|BPF_K;
    ins[0].k = j;
    filter.len = 1;
    filter.filter = ins;
    setsockopt(sv[1], SOL_SOCKET, SO_ATTACH_FILTER, &filter, sizeof(filter));

    /* send a message */
    send(sv[0], message, strlen(message) + 1, 0);

    /* The filter should let the message pass but truncated. */
    poll(fds, 2, 0);

    /* Receive the truncated message*/
    ret = recv(sv[1], buffer, 64, 0);
    printf("received %d bytes, expected %d\n", ret, j);
  }

  for (i = 0 ; i < 2 ; i++)
    close(sv[i]);

  return 0;
}
Signed-off-by: Alban Crequy <alban.crequy@collabora.co.uk>
Reviewed-by: Ian Molton <ian.molton@collabora.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

06 Jan, 2011 · 1 commit

af_unix: Avoid socket->sk NULL OOPS in stream connect security hooks. · 3610cda5

Authored by David S. Miller on Jan 05, 2011

unix_release() can asynchronously set socket->sk to NULL, and
it does so without holding the unix_state_lock() on "other"
during stream connects.

However, the reverse mapping, sk->sk_socket, is only transitioned
to NULL under the unix_state_lock().

Therefore make the security hooks follow the reverse mapping instead
of the forward mapping.
Reported-by: Jeremy Fitzhardinge <jeremy@goop.org>
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

30 Nov, 2010 · 1 commit

af_unix: limit recursion level · 25888e30

Authored by Eric Dumazet on Nov 25, 2010

It's easy to eat all kernel memory and trigger the NMI watchdog, using an
exploit program that queues unix sockets on top of others.

lkml ref : http://lkml.org/lkml/2010/11/25/8

Since this mechanism is used in applications, one choice we have is to
add a recursion limit.

Other limits might be needed as well (if we queue other types of files),
since the passfd mechanism is currently limited by socket receive queue
sizes only.

Add a recursion_level to unix socket, allowing up to 4 levels.

Each time we send a unix socket through the sendfd mechanism, we copy its
recursion level (plus one) to the receiver. This recursion level is cleared
when the socket receive queue is emptied.
Reported-by: Марк Коренберг <socketpair@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
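
For readers unfamiliar with the "sendfd" mechanism mentioned above, the sketch below passes one unix socket through another using an `SCM_RIGHTS` control message, which is one level of the nesting this commit limits. It is a userspace illustration only; the names `send_fd`, `recv_fd` and `nested_socket_roundtrip` are hypothetical.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer sized for exactly one descriptor. */
union fd_ctrl {
    char buf[CMSG_SPACE(sizeof(int))];
    struct cmsghdr align;   /* forces suitable alignment of buf */
};

/* Send descriptor `fd_to_send` over unix socket `via`; 0 on success. */
int send_fd(int via, int fd_to_send)
{
    char byte = 'F';
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_ctrl ctrl;
    struct msghdr msg;
    struct cmsghdr *c;

    assert(via >= 0 && fd_to_send >= 0);
    memset(&ctrl, 0, sizeof(ctrl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    c = CMSG_FIRSTHDR(&msg);
    c->cmsg_level = SOL_SOCKET;
    c->cmsg_type = SCM_RIGHTS;
    c->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(c), &fd_to_send, sizeof(int));

    return sendmsg(via, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one descriptor from unix socket `via`; -1 on failure. */
int recv_fd(int via)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_ctrl ctrl;
    struct msghdr msg;
    struct cmsghdr *c;
    int fd = -1;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);
    if (recvmsg(via, &msg, 0) != 1)
        return -1;

    c = CMSG_FIRSTHDR(&msg);
    if (c != NULL && c->cmsg_level == SOL_SOCKET &&
        c->cmsg_type == SCM_RIGHTS && c->cmsg_len == CMSG_LEN(sizeof(int)))
        memcpy(&fd, CMSG_DATA(c), sizeof(int));
    return fd;
}

/* Pass one end of an "inner" socketpair through a "carrier" socketpair
 * and check that the received descriptor refers to the same socket.
 * Returns 1 on success, 0 otherwise. */
int nested_socket_roundtrip(void)
{
    int carrier[2], inner[2], got, ok = 0;
    char c = 0;

    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, carrier) < 0)
        return 0;
    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, inner) < 0) {
        close(carrier[0]);
        close(carrier[1]);
        return 0;
    }

    if (send_fd(carrier[0], inner[0]) == 0 &&
        (got = recv_fd(carrier[1])) >= 0) {
        /* Data written to the peer must be readable via the copy. */
        if (write(inner[1], "k", 1) == 1 && read(got, &c, 1) == 1 && c == 'k')
            ok = 1;
        close(got);
    }

    close(inner[0]);
    close(inner[1]);
    close(carrier[0]);
    close(carrier[1]);
    return ok;
}
```

While the message sits in the carrier's receive queue, the inner socket is "in flight"; queuing sockets inside sockets repeatedly is the pattern the recursion limit bounds.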

25 Nov, 2010 · 1 commit

af_unix: limit unix_tot_inflight · 9915672d

Authored by Eric Dumazet on Nov 24, 2010

Vegard Nossum found a unix socket OOM was possible, posting an exploit
program.

My analysis is we can eat all LOWMEM memory before unix_gc() is
called from unix_release_sock(). Moreover, the thread blocked in
unix_gc() can consume a huge amount of time to perform cleanup because
of the huge working set.

One way to handle this is to have a sensible limit on unix_tot_inflight,
tested from wait_for_unix_gc() and to force a call to unix_gc() if this
limit is hit.

This solves the OOM and also reduces overall latencies, and should not
slow down normal workloads.
Reported-by: Vegard Nossum <vegard.nossum@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

09 Nov, 2010 · 3 commits

af_unix: optimize unix_dgram_poll() · 973a34aa

Authored by Eric Dumazet on Oct 31, 2010

unix_dgram_poll() is pretty expensive to check POLLOUT status, because
it has to lock the socket to get its peer, take a reference on the peer
to check its receive queue status, and queue another poll_wait on
peer_wait. This all can be avoided if the process calling
unix_dgram_poll() is not interested in POLLOUT status. It makes
unix_dgram_recvmsg() faster by not queueing irrelevant pollers in
peer_wait.

On a test program provided by Alban Crequy:

Before:

real    0m0.211s
user    0m0.000s
sys     0m0.208s

After:

real    0m0.044s
user    0m0.000s
sys     0m0.040s
Suggested-by: Davide Libenzi <davidel@xmailserver.org>
Reported-by: Alban Crequy <alban.crequy@collabora.co.uk>
Acked-by: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

af_unix: fix unix_dgram_poll() behavior for EPOLLOUT event · 5456f09a

Authored by Eric Dumazet on Oct 31, 2010

Alban Crequy reported a problem with connected dgram af_unix sockets and
provided a test program. epoll() would fail to deliver an EPOLLOUT event
when a thread dequeues a packet from the other peer, making its receive
queue not full.

This is because unix_dgram_poll() fails to call sock_poll_wait(file,
&unix_sk(other)->peer_wait, wait);
if the socket is not writeable at the time epoll_ctl(ADD) is called.

We must call sock_poll_wait(), regardless of 'writable' status, so that
epoll can be notified later of state changes.

Misc: avoids testing twice (sk->sk_shutdown & RCV_SHUTDOWN)
Reported-by: Alban Crequy <alban.crequy@collabora.co.uk>
Cc: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

af_unix: use keyed wakeups · 67426b75

Authored by Eric Dumazet on Oct 29, 2010

Instead of waking up all sleepers, use wake_up_interruptible_sync_poll() to
wake up only those interested in writing to the socket.

This patch is a specialization of commit 37e5540b (epoll keyed
wakeups: make sockets use keyed wakeups).

On a test program provided by Alban Crequy:

Before:
real    0m3.101s
user    0m0.000s
sys     0m6.104s

After:

real    0m0.211s
user    0m0.000s
sys     0m0.208s
Reported-by: Alban Crequy <alban.crequy@collabora.co.uk>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
