提交 · faf02010290e202e275c1bf94ca9dd808bf85607 · openeuler / raspberrypi-kernel

30 7月, 2012 3 次提交

A
clean unix_bind() up a bit · faf02010
由 Al Viro 提交于 7月 20, 2012
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
faf02010

pull mnt_want_write()/mnt_drop_write() into kern_path_create()/done_path_create() resp. · a8104a9f

由 Al Viro 提交于 7月 20, 2012

One side effect - attempt to create a cross-device link on a read-only fs fails
with EROFS instead of EXDEV now. Makes more sense, POSIX allows, etc.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

a8104a9f

new helper: done_path_create() · 921a1650

由 Al Viro 提交于 7月 20, 2012

releases what needs to be released after {kern,user}_path_create()
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

921a1650

26 4月, 2012 1 次提交

net: sock_diag_handler structs can be const · 8dcf01fc

由 Shan Wei 提交于 4月 24, 2012

read only, so change it to const.
Signed-off-by: NShan Wei <davidshan@tencent.com>
Acked-by: NPavel Emelyanov <xemul@parallels.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8dcf01fc

21 4月, 2012 2 次提交

net: Convert all sysctl registrations to register_net_sysctl · ec8f23ce

由 Eric W. Biederman 提交于 4月 19, 2012

This results in code with less boiler plate that is a bit easier
to read.

Additionally stops us from using compatibility code in the sysctl
core, hastening the day when the compatibility code can be removed.
Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
Acked-by: NPavel Emelyanov <xemul@parallels.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ec8f23ce

net: Move all of the network sysctls without a namespace into init_net. · 5dd3df10

由 Eric W. Biederman 提交于 4月 19, 2012

This makes it clearer which sysctls are relative to your current network
namespace.

This makes it a little less error prone by not exposing sysctls for the
initial network namespace in other namespaces.

This is the same way we handle all of our other network interfaces to
userspace and I can't honestly remember why we didn't do this for
sysctls right from the start.
Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
Acked-by: NPavel Emelyanov <xemul@parallels.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5dd3df10

16 4月, 2012 1 次提交

net: cleanup unsigned to unsigned int · 95c96174

由 Eric Dumazet 提交于 4月 15, 2012

Use of "unsigned int" is preferred to bare "unsigned" in net tree.
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

95c96174

04 4月, 2012 1 次提交

af_unix: reduce high order page allocations · eb6a2481

由 Eric Dumazet 提交于 4月 03, 2012

unix_dgram_sendmsg() currently builds linear skbs, and this can stress
page allocator with high order page allocations. When memory gets
fragmented, this can eventually fail.

We can try to use order-2 allocations for skb head (SKB_MAX_ALLOC) plus
up to 16 page fragments to lower pressure on buddy allocator.

This patch has no effect on messages of less than 16064 bytes.
(on 64bit arches with PAGE_SIZE=4096)

For bigger messages (from 16065 to 81600 bytes), this patch brings
reliability at the expense of performance penalty because of extra pages
allocations.

netperf -t DG_STREAM -T 0,2 -- -m 16064 -s 200000
->4086040 Messages / 10s

netperf -t DG_STREAM -T 0,2 -- -m 16068 -s 200000
->3901747 Messages / 10s
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

eb6a2481

24 3月, 2012 1 次提交

poll: add poll_requested_events() and poll_does_not_wait() functions · 626cf236

由 Hans Verkuil 提交于 3月 23, 2012

In some cases the poll() implementation in a driver has to do different
things depending on the events the caller wants to poll for.  An example
is when a driver needs to start a DMA engine if the caller polls for
POLLIN, but doesn't want to do that if POLLIN is not requested but instead
only POLLOUT or POLLPRI is requested.  This is something that can happen
in the video4linux subsystem among others.

Unfortunately, the current epoll/poll/select implementation doesn't
provide that information reliably.  The poll_table_struct does have it: it
has a key field with the event mask.  But once a poll() call matches one
or more bits of that mask any following poll() calls are passed a NULL
poll_table pointer.

Also, the eventpoll implementation always left the key field at ~0 instead
of using the requested events mask.

This was changed in eventpoll.c so the key field now contains the actual
events that should be polled for as set by the caller.

The solution to the NULL poll_table pointer is to set the qproc field to
NULL in poll_table once poll() matches the events, not the poll_table
pointer itself.  That way drivers can obtain the mask through a new
poll_requested_events inline.

The poll_table_struct can still be NULL since some kernel code calls it
internally (netfs_state_poll() in ./drivers/staging/pohmelfs/netfs.h).  In
that case poll_requested_events() returns ~0 (i.e.  all events).

Very rarely drivers might want to know whether poll_wait will actually
wait.  If another earlier file descriptor in the set already matched the
events the caller wanted to wait for, then the kernel will return from the
select() call without waiting.  This might be useful information in order
to avoid doing expensive work.

A new helper function poll_does_not_wait() is added that drivers can use
to detect this situation.  This is now used in sock_poll_wait() in
include/net/sock.h.  This was the only place in the kernel that needed
this information.

Drivers should no longer access any of the poll_table internals, but use
the poll_requested_events() and poll_does_not_wait() access functions
instead.  In order to enforce that the poll_table fields are now prepended
with an underscore and a comment was added warning against using them
directly.

This required a change in unix_dgram_poll() in unix/af_unix.c which used
the key field to get the requested events.  It's been replaced by a call
to poll_requested_events().

For qproc it was especially important to change its name since the
behavior of that field changes with this patch since this function pointer
can now be NULL when that wasn't possible in the past.

Any driver accessing the qproc or key fields directly will now fail to compile.

Some notes regarding the correctness of this patch: the driver's poll()
function is called with a 'struct poll_table_struct *wait' argument.  This
pointer may or may not be NULL, drivers can never rely on it being one or
the other as that depends on whether or not an earlier file descriptor in
the select()'s fdset matched the requested events.

There are only three things a driver can do with the wait argument:

1) obtain the key field:

	events = wait ? wait->key : ~0;

   This will still work although it should be replaced with the new
   poll_requested_events() function (which does exactly the same).
   This will now even work better, since wait is no longer set to NULL
   unnecessarily.

2) use the qproc callback. This could be deadly since qproc can now be
   NULL. Renaming qproc should prevent this from happening. There are no
   kernel drivers that actually access this callback directly, BTW.

3) test whether wait == NULL to determine whether poll would return without
   waiting. This is no longer sufficient as the correct test is now
   wait == NULL || wait->_qproc == NULL.

   However, the worst that can happen here is a slight performance hit in
   the case where wait != NULL and wait->_qproc == NULL. In that case the
   driver will assume that poll_wait() will actually add the fd to the set
   of waiting file descriptors. Of course, poll_wait() will not do that
   since it tests for wait->_qproc. This will not break anything, though.

   There is only one place in the whole kernel where this happens
   (sock_poll_wait() in include/net/sock.h) and that code will be replaced
   by a call to poll_does_not_wait() in the next patch.

   Note that even if wait->_qproc != NULL drivers cannot rely on poll_wait()
   actually waiting. The next file descriptor from the set might match the
   event mask and thus any possible waits will never happen.
Signed-off-by: NHans Verkuil <hans.verkuil@cisco.com>
Reviewed-by: NJonathan Corbet <corbet@lwn.net>
Reviewed-by: NAl Viro <viro@zeniv.linux.org.uk>
Cc: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: NHans de Goede <hdegoede@redhat.com>
Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
Cc: David Miller <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

626cf236

21 3月, 2012 2 次提交
- A
  switch touch_atime to struct path · 68ac1234
  由 Al Viro 提交于 3月 15, 2012
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
  68ac1234
- A
  switch unix_sock to struct path · 40ffe67d
  由 Al Viro 提交于 3月 14, 2012
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
  40ffe67d
27 2月, 2012 1 次提交

netlink: add netlink_dump_control structure for netlink_dump_start() · 80d326fa

由 Pablo Neira Ayuso 提交于 2月 24, 2012

Davem considers that the argument list of this interface is getting
out of control. This patch tries to address this issue following
his proposal:

struct netlink_dump_control c = { .dump = dump, .done = done, ... };

netlink_dump_start(..., &c);

Suggested by David S. Miller.
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

80d326fa

23 2月, 2012 1 次提交

af_unix: MSG_TRUNC support for dgram sockets · 9f6f9af7

由 Eric Dumazet 提交于 2月 21, 2012

Piergiorgio Beruto expressed the need to fetch size of first datagram in
queue for AF_UNIX sockets and suggested a patch against SIOCINQ ioctl.

I suggested instead to implement MSG_TRUNC support as a recv() input
flag, as already done for RAW, UDP & NETLINK sockets.

len = recv(fd, &byte, 1, MSG_PEEK | MSG_TRUNC);

MSG_TRUNC asks recv() to return the real length of the packet, even when
is was longer than the passed buffer.

There is risk that a userland application used MSG_TRUNC by accident
(since it had no effect on af_unix sockets) and this might break after
this patch.
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
Tested-by: NPiergiorgio Beruto <piergiorgio.beruto@gmail.com>
CC: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9f6f9af7

22 2月, 2012 2 次提交

unix: Support peeking offset for stream sockets · fc0d7536

由 Pavel Emelyanov 提交于 2月 21, 2012

The same here -- we can protect the sk_peek_off manipulations with
the unix_sk->readlock mutex.

The peeking of data from a stream socket is done in the datagram style,
i.e. even if there's enough room for more data in the user buffer, only
the head skb's data is copied in there. This feature is preserved when
peeking data from a given offset -- the data is read till the nearest
skb's boundary.
Signed-off-by: NPavel Emelyanov <xemul@parallels.com>
Acked-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

fc0d7536

unix: Support peeking offset for datagram and seqpacket sockets · f55bb7f9

由 Pavel Emelyanov 提交于 2月 21, 2012

The sk_peek_off manipulations are protected with the unix_sk->readlock mutex.
This mutex is enough since all we need is to syncronize setting the offset
vs reading the queue head. The latter is fully covered with the mentioned lock.

The recently added __skb_recv_datagram's offset is used to pick the skb to
read the data from.
Signed-off-by: NPavel Emelyanov <xemul@parallels.com>
Acked-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f55bb7f9

31 1月, 2012 1 次提交

af_unix: fix EPOLLET regression for stream sockets · 6f01fd6e

由 Eric Dumazet 提交于 1月 28, 2012

Commit 0884d7aa (AF_UNIX: Fix poll blocking problem when reading from
a stream socket) added a regression for epoll() in Edge Triggered mode
(EPOLLET)

Appropriate fix is to use skb_peek()/skb_unlink() instead of
skb_dequeue(), and only call skb_unlink() when skb is fully consumed.

This remove the need to requeue a partial skb into sk_receive_queue head
and the extra sk->sk_data_ready() calls that added the regression.

This is safe because once skb is given to sk_receive_queue, it is not
modified by a writer, and readers are serialized by u->readlock mutex.

This also reduce number of spinlock acquisition for small reads or
MSG_PEEK users so should improve overall performance.
Reported-by: NNick Mathewson <nickm@freehaven.net>
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
Cc: Alexey Moiseytsev <himeraster@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6f01fd6e

08 1月, 2012 1 次提交
- D
  net: Default UDP and UNIX diag to 'n'. · 6d62a66e
  由 David S. Miller 提交于 1月 07, 2012
```
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
```
  6d62a66e
04 1月, 2012 1 次提交
- A
  switch ->path_mknod() to umode_t · 04fc66e7
  由 Al Viro 提交于 11月 21, 2011
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
  04fc66e7
31 12月, 2011 3 次提交

unix_diag: Fixup RQLEN extension report · c9da99e6

由 Pavel Emelyanov 提交于 12月 30, 2011

While it's not too late fix the recently added RQLEN diag extension
to report rqlen and wqlen in the same way as TCP does.

I.e. for listening sockets the ack backlog length (which is the input
queue length for socket) in rqlen and the max ack backlog length in
wqlen, and what the CINQ/OUTQ ioctls do for established.
Signed-off-by: NPavel Emelyanov <xemul@parallels.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c9da99e6

af_unix: Move CINQ/COUTQ code to helpers · 885ee74d

由 Pavel Emelyanov 提交于 12月 30, 2011

Currently tcp diag reports rqlen and wqlen values similar to how
the CINQ/COUTQ iotcls do. To make unix diag report these values
in the same way move the respective code into helpers.
Signed-off-by: NPavel Emelyanov <xemul@parallels.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

885ee74d

unix_diag: Add the MEMINFO extension · 257b5298

由 Pavel Emelyanov 提交于 12月 30, 2011

[ Fix indentation of sock_diag*() calls. -DaveM ]
Signed-off-by: NPavel Emelyanov <xemul@parallels.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

257b5298

27 12月, 2011 2 次提交

unix: If we happen to find peer NULL when diag dumping, write zero. · e09e9d18

由 David S. Miller 提交于 12月 26, 2011

Otherwise we leave uninitialized kernel memory in there.
Reported-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e09e9d18

unix_diag: Fix incoming connections nla length · 3b0723c1

由 Pavel Emelyanov 提交于 12月 26, 2011

The NLA_PUT macro should accept the actual attribute length, not
the amount of elements in array :(
Signed-off-by: NPavel Emelyanov <xemul@parallels.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3b0723c1

21 12月, 2011 1 次提交

net: unix -- Add missing module.h inclusion · 2ea744a5

由 Cyrill Gorcunov 提交于 12月 20, 2011

Otherwise getting

 | net/unix/diag.c:312:16: error: expected declaration specifiers or ‘...’ before string constant
 | net/unix/diag.c:313:1: error: expected declaration specifiers or ‘...’ before string constant
Signed-off-by: NCyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2ea744a5

17 12月, 2011 10 次提交

unix_diag: Write it into kbuild · 5d531aaa

由 Pavel Emelyanov 提交于 12月 15, 2011

Signed-off-by: NPavel Emelyanov <xemul@parallels.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5d531aaa

unix_diag: Receive queue lenght NLA · cbf39195

由 Pavel Emelyanov 提交于 12月 15, 2011

Signed-off-by: NPavel Emelyanov <xemul@parallels.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

cbf39195

unix_diag: Pending connections IDs NLA · 2aac7a2c

由 Pavel Emelyanov 提交于 12月 15, 2011

When establishing a unix connection on stream sockets the
server end receives an skb with socket in its receive queue.

Report who is waiting for these ends to be accepted for
listening sockets via NLA.

There's a lokcing issue with this -- the unix sk state lock is
required to access the peer, and it is taken under the listening
sk's queue lock. Strictly speaking the queue lock should be taken
inside the state lock, but since in this case these two sockets
are different it shouldn't lead to deadlock.
Signed-off-by: NPavel Emelyanov <xemul@parallels.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2aac7a2c

unix_diag: Unix peer inode NLA · ac02be8d

由 Pavel Emelyanov 提交于 12月 15, 2011

Report the peer socket inode ID as NLA. With this it's finally
possible to find out the other end of an interesting unix connection.
Signed-off-by: NPavel Emelyanov <xemul@parallels.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ac02be8d

unix_diag: Unix inode info NLA · 5f7b0569

由 Pavel Emelyanov 提交于 12月 15, 2011

Actually, the socket path if it's not anonymous doesn't give
a clue to which file the socket is bound to. Even if the path
is absolute, it can be unlinked and then new socket can be
bound to it.

With this NLA it's possible to check which file a particular
socket is really bound to.
Signed-off-by: NPavel Emelyanov <xemul@parallels.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5f7b0569

unix_diag: Unix socket name NLA · f5248b48

由 Pavel Emelyanov 提交于 12月 15, 2011

Report the sun_path when requested as NLA. With leading '\0' if
present but without the leading AF_UNIX bits.
Signed-off-by: NPavel Emelyanov <xemul@parallels.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f5248b48

unix_diag: Dumping exact socket core · 5d3cae8b

由 Pavel Emelyanov 提交于 12月 15, 2011

The socket inode is used as a key for lookup. This is effectively
the only really unique ID of a unix socket, but using this for
search currently has one problem -- it is O(number of sockets) :(

Does it worth fixing this lookup or inventing some other ID for
unix sockets?
Signed-off-by: NPavel Emelyanov <xemul@parallels.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5d3cae8b

unix_diag: Dumping all sockets core · 45a96b9b

由 Pavel Emelyanov 提交于 12月 15, 2011

Walk the unix sockets table and fill the core response structure,
which includes type, state and inode.
Signed-off-by: NPavel Emelyanov <xemul@parallels.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

45a96b9b

unix_diag: Basic module skeleton · 22931d3b

由 Pavel Emelyanov 提交于 12月 15, 2011

Includes basic module_init/_exit functionality, dump/get_exact stubs
and declares the basic API structures for request and response.
Signed-off-by: NPavel Emelyanov <xemul@parallels.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

22931d3b

af_unix: Export stuff required for diag module · fa7ff56f

由 Pavel Emelyanov 提交于 12月 15, 2011

Signed-off-by: NPavel Emelyanov <xemul@parallels.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

fa7ff56f

27 11月, 2011 1 次提交

AF_UNIX: Fix poll blocking problem when reading from a stream socket · 0884d7aa

由 Alexey Moiseytsev 提交于 11月 21, 2011

poll() call may be blocked by concurrent reading from the same stream
socket.
Signed-off-by: NAlexey Moiseytsev <himeraster@gmail.com>
Acked-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0884d7aa

29 9月, 2011 1 次提交

af_unix: dont send SCM_CREDENTIALS by default · 16e57262

由 Eric Dumazet 提交于 9月 19, 2011

Since commit 7361c36c (af_unix: Allow credentials to work across
user and pid namespaces) af_unix performance dropped a lot.

This is because we now take a reference on pid and cred in each write(),
and release them in read(), usually done from another process,
eventually from another cpu. This triggers false sharing.

# Events: 154K cycles
#
# Overhead  Command       Shared Object        Symbol
# ........  .......  ..................  .........................
#
    10.40%  hackbench  [kernel.kallsyms]   [k] put_pid
     8.60%  hackbench  [kernel.kallsyms]   [k] unix_stream_recvmsg
     7.87%  hackbench  [kernel.kallsyms]   [k] unix_stream_sendmsg
     6.11%  hackbench  [kernel.kallsyms]   [k] do_raw_spin_lock
     4.95%  hackbench  [kernel.kallsyms]   [k] unix_scm_to_skb
     4.87%  hackbench  [kernel.kallsyms]   [k] pid_nr_ns
     4.34%  hackbench  [kernel.kallsyms]   [k] cred_to_ucred
     2.39%  hackbench  [kernel.kallsyms]   [k] unix_destruct_scm
     2.24%  hackbench  [kernel.kallsyms]   [k] sub_preempt_count
     1.75%  hackbench  [kernel.kallsyms]   [k] fget_light
     1.51%  hackbench  [kernel.kallsyms]   [k]
__mutex_lock_interruptible_slowpath
     1.42%  hackbench  [kernel.kallsyms]   [k] sock_alloc_send_pskb

This patch includes SCM_CREDENTIALS information in a af_unix message/skb
only if requested by the sender, [man 7 unix for details how to include
ancillary data using sendmsg() system call]

Note: This might break buggy applications that expected SCM_CREDENTIAL
from an unaware write() system call, and receiver not using SO_PASSCRED
socket option.

If SOCK_PASSCRED is set on source or destination socket, we still
include credentials for mere write() syscalls.

Performance boost in hackbench : more than 50% gain on a 16 thread
machine (2 quad-core cpus, 2 threads per core)

hackbench 20 thread 2000

4.228 sec instead of 9.102 sec
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
Acked-by: NTim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

16e57262

17 9月, 2011 1 次提交

Revert "Scm: Remove unnecessary pid & credential references in Unix socket's send and receive path" · f78a5fda

由 David S. Miller 提交于 9月 16, 2011

This reverts commit 0856a304.

As requested by Eric Dumazet, it has various ref-counting
problems and has introduced regressions.  Eric will add
a more suitable version of this performance fix.
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f78a5fda

25 8月, 2011 1 次提交

Scm: Remove unnecessary pid & credential references in Unix socket's send and receive path · 0856a304

由 Tim Chen 提交于 8月 22, 2011

Patch series 109f6e39..7361c36c back in 2.6.36 added functionality to
allow credentials to work across pid namespaces for packets sent via
UNIX sockets.  However, the atomic reference counts on pid and
credentials caused plenty of cache bouncing when there are numerous
threads of the same pid sharing a UNIX socket.  This patch mitigates the
problem by eliminating extraneous reference counts on pid and
credentials on both send and receive path of UNIX sockets. I found a 2x
improvement in hackbench's threaded case.

On the receive path in unix_dgram_recvmsg, currently there is an
increment of reference count on pid and credentials in scm_set_cred.
Then there are two decrement of the reference counts.  Once in scm_recv
and once when skb_free_datagram call skb->destructor function
unix_destruct_scm.  One pair of increment and decrement of ref count on
pid and credentials can be eliminated from the receive path.  Until we
destroy the skb, we already set a reference when we created the skb on
the send side.

On the send path, there are two increments of ref count on pid and
credentials, once in scm_send and once in unix_scm_to_skb.  Then there
is a decrement of the reference counts in scm_destroy's call to
scm_destroy_cred at the end of unix_dgram_sendmsg functions.   One pair
of increment and decrement of the reference counts can be removed so we
only need to increment the ref counts once.

By incorporating these changes, for hackbench running on a 4 socket
NHM-EX machine with 40 cores, the execution of hackbench on
50 groups of 20 threads sped up by factor of 2.

Hackbench command used for testing:
./hackbench 50 thread 2000
Signed-off-by: NTim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0856a304

20 7月, 2011 1 次提交

new helpers: kern_path_create/user_path_create · dae6ad8f

由 Al Viro 提交于 6月 26, 2011

combination of kern_path_parent() and lookup_create().  Does *not*
expose struct nameidata to caller.  Syscalls converted to that...
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

dae6ad8f

24 5月, 2011 1 次提交

net: convert %p usage to %pK · 71338aa7

由 Dan Rosenberg 提交于 5月 23, 2011

The %pK format specifier is designed to hide exposed kernel pointers,
specifically via /proc interfaces.  Exposing these pointers provides an
easy target for kernel write vulnerabilities, since they reveal the
locations of writable structures containing easily triggerable function
pointers.  The behavior of %pK depends on the kptr_restrict sysctl.

If kptr_restrict is set to 0, no deviation from the standard %p behavior
occurs.  If kptr_restrict is set to 1, the default, if the current user
(intended to be a reader via seq_printf(), etc.) does not have CAP_SYSLOG
(currently in the LSM tree), kernel pointers using %pK are printed as 0's.
 If kptr_restrict is set to 2, kernel pointers using %pK are printed as
0's regardless of privileges.  Replacing with 0's was chosen over the
default "(null)", which cannot be parsed by userland %p, which expects
"(nil)".

The supporting code for kptr_restrict and %pK are currently in the -mm
tree.  This patch converts users of %p in net/ to %pK.  Cases of printing
pointers to the syslog are not covered, since this would eliminate useful
information for postmortem debugging and the reading of the syslog is
already optionally protected by the dmesg_restrict sysctl.
Signed-off-by: NDan Rosenberg <drosenberg@vsecurity.com>
Cc: James Morris <jmorris@namei.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Thomas Graf <tgraf@infradead.org>
Cc: Eugene Teo <eugeneteo@kernel.org>
Cc: Kees Cook <kees.cook@canonical.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: David S. Miller <davem@davemloft.net>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Eric Paris <eparis@parisplace.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

71338aa7