提交 · 16e5726269611b71c930054ffe9b858c1cea88eb · openeuler / Kernel

29 9月, 2011 1 次提交

af_unix: dont send SCM_CREDENTIALS by default · 16e57262

由 Eric Dumazet 提交于 9月 19, 2011

Since commit 7361c36c (af_unix: Allow credentials to work across
user and pid namespaces) af_unix performance dropped a lot.

This is because we now take a reference on pid and cred in each write(),
and release them in read(), usually done from another process,
eventually from another cpu. This triggers false sharing.

# Events: 154K cycles
#
# Overhead  Command       Shared Object        Symbol
# ........  .......  ..................  .........................
#
    10.40%  hackbench  [kernel.kallsyms]   [k] put_pid
     8.60%  hackbench  [kernel.kallsyms]   [k] unix_stream_recvmsg
     7.87%  hackbench  [kernel.kallsyms]   [k] unix_stream_sendmsg
     6.11%  hackbench  [kernel.kallsyms]   [k] do_raw_spin_lock
     4.95%  hackbench  [kernel.kallsyms]   [k] unix_scm_to_skb
     4.87%  hackbench  [kernel.kallsyms]   [k] pid_nr_ns
     4.34%  hackbench  [kernel.kallsyms]   [k] cred_to_ucred
     2.39%  hackbench  [kernel.kallsyms]   [k] unix_destruct_scm
     2.24%  hackbench  [kernel.kallsyms]   [k] sub_preempt_count
     1.75%  hackbench  [kernel.kallsyms]   [k] fget_light
     1.51%  hackbench  [kernel.kallsyms]   [k]
__mutex_lock_interruptible_slowpath
     1.42%  hackbench  [kernel.kallsyms]   [k] sock_alloc_send_pskb

This patch includes SCM_CREDENTIALS information in a af_unix message/skb
only if requested by the sender, [man 7 unix for details how to include
ancillary data using sendmsg() system call]

Note: This might break buggy applications that expected SCM_CREDENTIAL
from an unaware write() system call, and receiver not using SO_PASSCRED
socket option.

If SOCK_PASSCRED is set on source or destination socket, we still
include credentials for mere write() syscalls.

Performance boost in hackbench : more than 50% gain on a 16 thread
machine (2 quad-core cpus, 2 threads per core)

hackbench 20 thread 2000

4.228 sec instead of 9.102 sec
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
Acked-by: NTim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

16e57262

12 8月, 2011 1 次提交

net: cleanup some rcu_dereference_raw · 33d480ce

由 Eric Dumazet 提交于 8月 11, 2011

RCU api had been completed and rcu_access_pointer() or
rcu_dereference_protected() are better than generic
rcu_dereference_raw()
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

33d480ce

23 6月, 2011 1 次提交

netlink: advertise incomplete dumps · 670dc283

由 Johannes Berg 提交于 6月 20, 2011

Consider the following situation:
 * a dump that would show 8 entries, four in the first
   round, and four in the second
 * between the first and second rounds, 6 entries are
   removed
 * now the second round will not show any entry, and
   even if there is a sequence/generation counter the
   application will not know

To solve this problem, add a new flag NLM_F_DUMP_INTR
to the netlink header that indicates the dump wasn't
consistent, this flag can also be set on the MSG_DONE
message that terminates the dump, and as such above
situation can be detected.

To achieve this, add a sequence counter to the netlink
callback struct. Of course, netlink code still needs
to use this new functionality. The correct way to do
that is to always set cb->seq when a dumpit callback
is invoked and call nl_dump_check_consistent() for
each new message. The core code will also call this
function for the final MSG_DONE message.

To make it usable with generic netlink, a new function
genlmsg_nlhdr() is needed to obtain the netlink header
from the genetlink user header.
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
Acked-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NJohn W. Linville <linville@tuxdriver.com>

670dc283

17 6月, 2011 1 次提交

rtnetlink: unlock on error path in netlink_dump() · c63d6ea3

由 Dan Carpenter 提交于 6月 15, 2011

In c7ac8679 "rtnetlink: Compute and store minimum ifinfo dump
size", we moved the allocation under the lock so we need to unlock
on error path.
Signed-off-by: NDan Carpenter <error27@gmail.com>
Signed-off-by: NDavid S. Miller <davem@conan.davemloft.net>

c63d6ea3

10 6月, 2011 1 次提交

rtnetlink: Compute and store minimum ifinfo dump size · c7ac8679

由 Greg Rose 提交于 6月 10, 2011

The message size allocated for rtnl ifinfo dumps was limited to
a single page.  This is not enough for additional interface info
available with devices that support SR-IOV and caused a bug in
which VF info would not be displayed if more than approximately
40 VFs were created per interface.

Implement a new function pointer for the rtnl_register service that will
calculate the amount of data required for the ifinfo dump and allocate
enough data to satisfy the request.
Signed-off-by: NGreg Rose <gregory.v.rose@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

c7ac8679

24 5月, 2011 1 次提交

net: convert %p usage to %pK · 71338aa7

由 Dan Rosenberg 提交于 5月 23, 2011

The %pK format specifier is designed to hide exposed kernel pointers,
specifically via /proc interfaces.  Exposing these pointers provides an
easy target for kernel write vulnerabilities, since they reveal the
locations of writable structures containing easily triggerable function
pointers.  The behavior of %pK depends on the kptr_restrict sysctl.

If kptr_restrict is set to 0, no deviation from the standard %p behavior
occurs.  If kptr_restrict is set to 1, the default, if the current user
(intended to be a reader via seq_printf(), etc.) does not have CAP_SYSLOG
(currently in the LSM tree), kernel pointers using %pK are printed as 0's.
 If kptr_restrict is set to 2, kernel pointers using %pK are printed as
0's regardless of privileges.  Replacing with 0's was chosen over the
default "(null)", which cannot be parsed by userland %p, which expects
"(nil)".

The supporting code for kptr_restrict and %pK are currently in the -mm
tree.  This patch converts users of %p in net/ to %pK.  Cases of printing
pointers to the syslog are not covered, since this would eliminate useful
information for postmortem debugging and the reading of the syslog is
already optionally protected by the dmesg_restrict sysctl.
Signed-off-by: NDan Rosenberg <drosenberg@vsecurity.com>
Cc: James Morris <jmorris@namei.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Thomas Graf <tgraf@infradead.org>
Cc: Eugene Teo <eugeneteo@kernel.org>
Cc: Kees Cook <kees.cook@canonical.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: David S. Miller <davem@davemloft.net>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Eric Paris <eparis@parisplace.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

71338aa7

08 5月, 2011 1 次提交

net,rcu: convert call_rcu(listeners_free_rcu) to kfree_rcu() · 37b6b935

由 Lai Jiangshan 提交于 3月 15, 2011

The rcu callback listeners_free_rcu() just calls a kfree(),
so we use kfree_rcu() instead of the call_rcu(listeners_free_rcu).
Signed-off-by: NLai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: NJosh Triplett <josh@joshtriplett.org>

37b6b935

04 3月, 2011 2 次提交

netlink: kill eff_cap from struct netlink_skb_parms · 01a16b21

由 Patrick McHardy 提交于 3月 03, 2011

Netlink message processing in the kernel is synchronous these days,
capabilities can be checked directly in security_netlink_recv() from
the current process.
Signed-off-by: NPatrick McHardy <kaber@trash.net>
Reviewed-by: NJames Morris <jmorris@namei.org>
[chrisw: update to include pohmelfs and uvesafb]
Signed-off-by: NChris Wright <chrisw@sous-sol.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

01a16b21

netlink: kill loginuid/sessionid/sid members from struct netlink_skb_parms · c53fa1ed

由 Patrick McHardy 提交于 3月 03, 2011

Netlink message processing in the kernel is synchronous these days, the
session information can be collected when needed.
Signed-off-by: NPatrick McHardy <kaber@trash.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c53fa1ed

01 3月, 2011 1 次提交

netlink: handle errors from netlink_dump() · b44d211e

由 Andrey Vagin 提交于 2月 21, 2011

netlink_dump() may failed, but nobody handle its error.
It generates output data, when a previous portion has been returned to
user space. This mechanism works when all data isn't go in skb. If we
enter in netlink_recvmsg() and skb is absent in the recv queue, the
netlink_dump() will not been executed. So if netlink_dump() is failed
one time, the new data never appear and the reader will sleep forever.

netlink_dump() is called from two places:

1. from netlink_sendmsg->...->netlink_dump_start().
   In this place we can report error directly and it will be returned
   by sendmsg().

2. from netlink_recvmsg
   There we can't report error directly, because we have a portion of
   valid output data and call netlink_dump() for prepare the next portion.
   If netlink_dump() is failed, the socket will be mark as error and the
   next recvmsg will be failed.
Signed-off-by: NAndrey Vagin <avagin@openvz.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b44d211e

20 1月, 2011 1 次提交

Revert "netlink: test for all flags of the NLM_F_DUMP composite" · b8f3ab42

由 David S. Miller 提交于 1月 18, 2011

This reverts commit 0ab03c2b.

It breaks several things including the avahi daemon.
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b8f3ab42

10 1月, 2011 1 次提交

netlink: test for all flags of the NLM_F_DUMP composite · 0ab03c2b

由 Jan Engelhardt 提交于 1月 07, 2011

Due to NLM_F_DUMP is composed of two bits, NLM_F_ROOT | NLM_F_MATCH,
when doing "if (x & NLM_F_DUMP)", it tests for _either_ of the bits
being set. Because NLM_F_MATCH's value overlaps with NLM_F_EXCL,
non-dump requests with NLM_F_EXCL set are mistaken as dump requests.

Substitute the condition to test for _all_ bits being set.
Signed-off-by: NJan Engelhardt <jengelh@medozas.de>
Acked-by: NPablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0ab03c2b

25 10月, 2010 1 次提交

netlink: fix netlink_change_ngroups() · 5c398dc8

由 Eric Dumazet 提交于 10月 24, 2010

commit 6c04bb18 (netlink: use call_rcu for netlink_change_ngroups)
used a somewhat convoluted and racy way to perform call_rcu().

The old block of memory is freed after a grace period, but the rcu_head
used to track it is located in new block.

This can clash if we call two times or more netlink_change_ngroups(),
and a block is freed before another. call_rcu() called on different cpus
makes no guarantee in order of callbacks.

Fix this using a more standard way of handling this : Each block of
memory contains its own rcu_head, so that no 'use after free' can
happens.
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
CC: Johannes Berg <johannes@sipsolutions.net>
CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5c398dc8

06 10月, 2010 1 次提交

genetlink: introduce pre_doit/post_doit hooks · ff4c92d8

由 Johannes Berg 提交于 10月 04, 2010

Each family may have some amount of boilerplate
locking code that applies to most, or even all,
commands.

This allows a family to handle such things in
a more generic way, by allowing it to
 a) include private flags in each operation
 b) specify a pre_doit hook that is called,
    before an operation's doit() callback and
    may return an error directly,
 c) specify a post_doit hook that can undo
    locking or similar things done by pre_doit,
    and finally
 d) include two private pointers in each info
    struct passed between all these operations
    including doit(). (It's two because I'll
    need two in nl80211 -- can be extended.)
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
Acked-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NJohn W. Linville <linville@tuxdriver.com>

ff4c92d8

01 9月, 2010 1 次提交

netlink: Make NETLINK_USERSOCK work again. · b963ea89

由 David S. Miller 提交于 8月 30, 2010

Once we started enforcing the a nl_table[] entry exist for
a protocol, NETLINK_USERSOCK stopped working.  Add a dummy
table entry so that it works again.
Reported-by: NThomas Voegtle <tv@lio96.de>
Tested-by: NThomas Voegtle <tv@lio96.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b963ea89

19 8月, 2010 1 次提交

netlink: fix compat recvmsg · 68d6ac6d

由 Johannes Berg 提交于 8月 15, 2010

Since
commit 1dacc76d
Author: Johannes Berg <johannes@sipsolutions.net>
Date:   Wed Jul 1 11:26:02 2009 +0000

    net/compat/wext: send different messages to compat tasks

we had a race condition when setting and then
restoring frag_list. Eric attempted to fix it,
but the fix created even worse problems.

However, the original motivation I had when I
added the code that turned out to be racy is
no longer clear to me, since we only copy up
to skb->len to userspace, which doesn't include
the frag_list length. As a result, not doing
any frag_list clearing and restoring avoids
the race condition, while not introducing any
other problems.

Additionally, while preparing this patch I found
that since none of the remaining netlink code is
really aware of the frag_list, we need to use the
original skb's information for packet information
and credentials. This fixes, for example, the
group information received by compat tasks.

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: stable@kernel.org [2.6.31+, for 2.6.35 revert 1235f504]
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
Acked-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

68d6ac6d

16 8月, 2010 1 次提交

Revert "netlink: netlink_recvmsg() fix" · daa3766e

由 David S. Miller 提交于 8月 15, 2010

This reverts commit 1235f504.

It causes regressions worse than the problem it was trying
to fix.  Eric will try to solve the problem another way.
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

daa3766e

27 7月, 2010 3 次提交

genetlink: use genl_register_family_with_ops() · 652c6717

由 Changli Gao 提交于 7月 25, 2010

Signed-off-by: NChangli Gao <xiaosuo@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

652c6717

genetlink: cleanup code according to CodingStyle · 416c2f9c

由 Changli Gao 提交于 7月 25, 2010

If the function is exported, the EXPORT* macro for it should follow immediately
after the closing function brace line.
Signed-off-by: NChangli Gao <xiaosuo@gmail.com>
----
 net/netlink/genetlink.c |    9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

416c2f9c

netlink: netlink_recvmsg() fix · 1235f504

由 Eric Dumazet 提交于 7月 26, 2010

commit 1dacc76d
(net/compat/wext: send different messages to compat tasks)
introduced a race condition on netlink, in case MSG_PEEK is used.

An skb given by skb_recv_datagram() might be shared, we must copy it
before any modification, or risk fatal corruption.
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1235f504

21 7月, 2010 1 次提交

drop_monitor: convert some kfree_skb call sites to consume_skb · 70d4bf6d

由 Neil Horman 提交于 7月 20, 2010

Convert a few calls from kfree_skb to consume_skb

Noticed while I was working on dropwatch that I was detecting lots of internal
skb drops in several places. While some are legitimate, several were not,
freeing skbs that were at the end of their life, rather than being discarded due
to an error. This patch converts those calls sites from using kfree_skb to
consume_skb, which quiets the in-kernel drop_monitor code from detecting them as
drops. Tested successfully by myself
Signed-off-by: NNeil Horman <nhorman@tuxdriver.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

70d4bf6d

17 6月, 2010 1 次提交

af_netlink: Add needed scm_destroy after scm_send. · b47030c7

由 Eric W. Biederman 提交于 6月 13, 2010

scm_send occasionally allocates state in the scm_cookie, so I have
modified netlink_sendmsg to guarantee that when scm_send succeeds
scm_destory will be called to free that state.
Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
Reviewed-by: NDaniel Lezcano <daniel.lezcano@free.fr>
Acked-by: NPavel Emelyanov <xemul@openvz.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b47030c7

22 5月, 2010 1 次提交

netlink: Implment netlink_broadcast_filtered · 910a7e90

由 Eric W. Biederman 提交于 5月 04, 2010

When netlink sockets are used to convey data that is in a namespace
we need a way to select a subset of the listening sockets to deliver
the packet to.  For the network namespace we have been doing this
by only transmitting packets in the correct network namespace.

For data belonging to other namespaces netlink_bradcast_filtered
provides a mechanism that allows us to examine the destination
socket and to decide if we should transmit the specified packet
to it.
Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
Acked-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>

910a7e90

04 4月, 2010 1 次提交

netlink: Export genl_lock() API for use by modules · f408e0ce

由 James Chapman 提交于 4月 02, 2010

This lets kernel modules which use genl netlink APIs serialize netlink
processing.
Signed-off-by: NJames Chapman <jchapman@katalix.com>
Reviewed-by: NRandy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f408e0ce

02 4月, 2010 1 次提交

net: check the length of the socket address passed to connect(2) · 6503d961

由 Changli Gao 提交于 3月 31, 2010

check the length of the socket address passed to connect(2).

Check the length of the socket address passed to connect(2). If the
length is invalid, -EINVAL will be returned.
Signed-off-by: NChangli Gao <xiaosuo@gmail.com>
----
net/bluetooth/l2cap.c | 3 ++-
net/bluetooth/rfcomm/sock.c | 3 ++-
net/bluetooth/sco.c | 3 ++-
net/can/bcm.c | 3 +++
net/ieee802154/af_ieee802154.c | 3 +++
net/ipv4/af_inet.c | 5 +++++
net/netlink/af_netlink.c | 3 +++
7 files changed, 20 insertions(+), 3 deletions(-)
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6503d961

30 3月, 2010 1 次提交

include cleanup: Update gfp.h and slab.h includes to prepare for breaking... · 5a0e3ad6

由 Tejun Heo 提交于 3月 24, 2010

include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h

percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files.  percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed.  Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability.  As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.

  http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the followings.

* Scan files for gfp and slab usages and update includes such that
  only the necessary includes are there.  ie. if only gfp is used,
  gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
  blocks and try to put the new include such that its order conforms
  to its surrounding.  It's put in the include block which contains
  core kernel includes, in the same order that the rest are ordered -
  alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
  doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
  because the file doesn't have fitting include block), it prints out
  an error message indicating which .h file needs to be added to the
  file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
   over 4000 files, deleting around 700 includes and adding ~480 gfp.h
   and ~3000 slab.h inclusions.  The script emitted errors for ~400
   files.

2. Each error was manually checked.  Some didn't need the inclusion,
   some needed manual addition while adding it to implementation .h or
   embedding .c file was more appropriate for others.  This step added
   inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
   from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
   e.g. lib/decompress_*.c used malloc/free() wrappers around slab
   APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
   editing them as sprinkling gfp.h and slab.h inclusions around .h
   files could easily lead to inclusion dependency hell.  Most gfp.h
   inclusion directives were ignored as stuff from gfp.h was usually
   wildly available and often used in preprocessor macros.  Each
   slab.h inclusion directive was examined and added manually as
   necessary.

6. percpu.h was updated not to include slab.h.

7. Build test were done on the following configurations and failures
   were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
   distributed build env didn't work with gcov compiles) and a few
   more options had to be turned off depending on archs to make things
   build (like ipr on powerpc/64 which failed due to missing writeq).

   * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
   * powerpc and powerpc64 SMP allmodconfig
   * sparc and sparc64 SMP allmodconfig
   * ia64 SMP allmodconfig
   * s390 SMP allmodconfig
   * alpha SMP allmodconfig
   * um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
   a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.
Signed-off-by: NTejun Heo <tj@kernel.org>
Guess-its-ok-by: NChristoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

5a0e3ad6

27 3月, 2010 1 次提交

netlink: use the appropriate namespace pid · 66aa4a55

由 Tom Goff 提交于 3月 19, 2010

This was included in OpenVZ kernels but wasn't integrated upstream.
>From git://git.openvz.org/pub/linux-2.6.24-openvz:

	commit 5c69402f18adf7276352e051ece2cf31feefab02
	Author: Alexey Dobriyan <adobriyan@openvz.org>
	Date:   Mon Dec 24 14:37:45 2007 +0300

	    netlink: fixup ->tgid to work in multiple PID namespaces
Signed-off-by: NTom Goff <thomas.goff@boeing.com>
Acked-by: NAlexey Dobriyan <adobriyan@openvz.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

66aa4a55

21 3月, 2010 1 次提交

netlink: fix NETLINK_RECV_NO_ENOBUFS in netlink_set_err() · 1a50307b

由 Pablo Neira Ayuso 提交于 3月 18, 2010

Currently, ENOBUFS errors are reported to the socket via
netlink_set_err() even if NETLINK_RECV_NO_ENOBUFS is set. However,
that should not happen. This fixes this problem and it changes the
prototype of netlink_set_err() to return the number of sockets that
have set the NETLINK_RECV_NO_ENOBUFS socket option. This return
value is used in the next patch in these bugfix series.
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1a50307b

28 2月, 2010 1 次提交

netlink: Adding inode field to /proc/net/netlink · cf0aa4e0

由 Masatake YAMATO 提交于 2月 27, 2010

The Inode field in /proc/net/{tcp,udp,packet,raw,...} is useful to know the types of
file descriptors associated to a process. Actually lsof utility uses the field.
Unfortunately, unlike /proc/net/{tcp,udp,packet,raw,...}, /proc/net/netlink doesn't have the field.
This patch adds the field to /proc/net/netlink.
Signed-off-by: NMasatake YAMATO <yamato@redhat.com>
Acked-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

cf0aa4e0

04 2月, 2010 1 次提交

netlink: fix for too early rmmod · 974c37e9

由 Alexey Dobriyan 提交于 1月 30, 2010

Netlink code does module autoload if protocol userspace is asking for is
not ready. However, module can dissapear right after it was autoloaded.
Example: modprobe/rmmod stress-testing and xfrm_user.ko providing NETLINK_XFRM.

netlink_create() in such situation _will_ create userspace socket and
_will_not_ pin module. Now if module was removed and we're going to call
->netlink_rcv into nothing:

BUG: unable to handle kernel paging request at ffffffffa02f842a
					       ^^^^^^^^^^^^^^^^
	modules are loaded near these addresses here

IP: [<ffffffffa02f842a>] 0xffffffffa02f842a
PGD 161f067 PUD 1623063 PMD baa12067 PTE 0
Oops: 0010 [#1] PREEMPT SMP DEBUG_PAGEALLOC
last sysfs file: /sys/devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda/uevent
CPU 1
Pid: 11515, comm: ip Not tainted 2.6.33-rc5-netns-00594-gaaa5728-dirty #6 P5E/P5E
RIP: 0010:[<ffffffffa02f842a>]  [<ffffffffa02f842a>] 0xffffffffa02f842a
RSP: 0018:ffff8800baa3db48  EFLAGS: 00010292
RAX: ffff8800baa3dfd8 RBX: ffff8800be353640 RCX: 0000000000000000
RDX: ffffffff81959380 RSI: ffff8800bab7f130 RDI: 0000000000000001
RBP: ffff8800baa3db58 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000011
R13: ffff8800be353640 R14: ffff8800bcdec240 R15: ffff8800bd488010
FS:  00007f93749656f0(0000) GS:ffff880002300000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: ffffffffa02f842a CR3: 00000000ba82b000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process ip (pid: 11515, threadinfo ffff8800baa3c000, task ffff8800bab7eb30)
Stack:
 ffffffff813637c0 ffff8800bd488000 ffff8800baa3dba8 ffffffff8136397d
<0> 0000000000000000 ffffffff81344adc 7fffffffffffffff 0000000000000000
<0> ffff8800baa3ded8 ffff8800be353640 ffff8800bcdec240 0000000000000000
Call Trace:
 [<ffffffff813637c0>] ? netlink_unicast+0x100/0x2d0
 [<ffffffff8136397d>] netlink_unicast+0x2bd/0x2d0

	netlink_unicast_kernel:
		nlk->netlink_rcv(skb);

 [<ffffffff81344adc>] ? memcpy_fromiovec+0x6c/0x90
 [<ffffffff81364263>] netlink_sendmsg+0x1d3/0x2d0
 [<ffffffff8133975b>] sock_sendmsg+0xbb/0xf0
 [<ffffffff8106cdeb>] ? __lock_acquire+0x27b/0xa60
 [<ffffffff810a18c3>] ? might_fault+0x73/0xd0
 [<ffffffff810a18c3>] ? might_fault+0x73/0xd0
 [<ffffffff8106db22>] ? __lock_release+0x82/0x170
 [<ffffffff810a190e>] ? might_fault+0xbe/0xd0
 [<ffffffff810a18c3>] ? might_fault+0x73/0xd0
 [<ffffffff81344c77>] ? verify_iovec+0x47/0xd0
 [<ffffffff8133a509>] sys_sendmsg+0x1a9/0x360
 [<ffffffff813c2be5>] ? _raw_spin_unlock_irqrestore+0x65/0x70
 [<ffffffff8106aced>] ? trace_hardirqs_on+0xd/0x10
 [<ffffffff813c2bc2>] ? _raw_spin_unlock_irqrestore+0x42/0x70
 [<ffffffff81197004>] ? __up_read+0x84/0xb0
 [<ffffffff8106ac95>] ? trace_hardirqs_on_caller+0x145/0x190
 [<ffffffff813c207f>] ? trace_hardirqs_on_thunk+0x3a/0x3f
 [<ffffffff8100262b>] system_call_fastpath+0x16/0x1b
Code:  Bad RIP value.
RIP  [<ffffffffa02f842a>] 0xffffffffa02f842a
 RSP <ffff8800baa3db48>
CR2: ffffffffa02f842a

If module was quickly removed after autoloading, return -E.

Return -EPROTONOSUPPORT if module was quickly removed after autoloading.
Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

974c37e9

14 1月, 2010 1 次提交

genetlink: optimize ctrl_dumpfamily() · e1d5a010

由 Samir Bellabes 提交于 1月 07, 2010

there is a unnecessary test which can be replaced by a good initialization in
the 'for' statement

Noticed by Serge E. Hallyn <serue@us.ibm.com>
Signed-off-by: NSamir Bellabes <sam@synack.fr>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e1d5a010

26 11月, 2009 1 次提交

net: use net_eq to compare nets · 09ad9bc7

由 Octavian Purdila 提交于 11月 25, 2009

Generated with the following semantic patch

@@
struct net *n1;
struct net *n2;
@@
- n1 == n2
+ net_eq(n1, n2)

@@
struct net *n1;
struct net *n2;
@@
- n1 != n2
+ !net_eq(n1, n2)

applied over {include,net,drivers/net}.
Signed-off-by: NOctavian Purdila <opurdila@ixiacom.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

09ad9bc7

17 11月, 2009 1 次提交

netlink: remove subscriptions check on notifier · 649300b9

由 Johannes Berg 提交于 11月 16, 2009

The netlink URELEASE notifier doesn't notify for
sockets that have been used to receive multicast
but it should be called for such sockets as well
since they might _also_ be used for sending and
not solely for receiving multicast. We will need
that for nl80211 (generic netlink sockets) in the
future.
Signed-off-by: NJohannes Berg <johannes@sipsolutions.net>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

649300b9

11 11月, 2009 1 次提交

net: netlink_getname, packet_getname -- use DECLARE_SOCKADDR guard · 13cfa97b

由 Cyrill Gorcunov 提交于 11月 08, 2009

Use guard DECLARE_SOCKADDR in a few more places which allow
us to catch if the structure copied back is too big.
Signed-off-by: NCyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

13cfa97b

06 11月, 2009 1 次提交

net: pass kern to net_proto_family create function · 3f378b68

由 Eric Paris 提交于 11月 05, 2009

The generic __sock_create function has a kern argument which allows the
security system to make decisions based on if a socket is being created by
the kernel or by userspace. This patch passes that flag to the
net_proto_family specific create function, so it can do the same thing.
Signed-off-by: NEric Paris <eparis@redhat.com>
Acked-by: NArnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3f378b68

18 10月, 2009 2 次提交

genetlink: Optimize and one bug fix in genl_generate_id() · 988ade6b

由 Krishna Kumar 提交于 10月 14, 2009

1. GENL_MIN_ID is a valid id -> no need to start at
   GENL_MIN_ID + 1.
2. Avoid going through the ids two times: If we start at
   GENL_MIN_ID+1 (*or bigger*) and all ids are over!, the
   code iterates through the list twice (*or lesser*).
3. Simplify code - no need to start at idx=0 which gets
   reset to GENL_MIN_ID.

Patch on net-next-2.6. Reboot test shows that first id
passed to genl_register_family was 16, next two were
GENL_ID_GENERATE and genl_generate_id returned 17 & 18
(user level testing of same code shows expected values
across entire range of MIN/MAX).
Signed-off-by: NKrishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

988ade6b

genetlink: Optimize genl_register_family() · 93860b08

由 Krishna Kumar 提交于 10月 14, 2009

genl_register_family() doesn't need to call genl_family_find_byid
when GENL_ID_GENERATE is passed during register.

Patch on net-next-2.6, compile and reboot testing only.
Signed-off-by: NKrishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

93860b08

07 10月, 2009 1 次提交

net: mark net_proto_ops as const · ec1b4cf7

由 Stephen Hemminger 提交于 10月 05, 2009

All usages of structure net_proto_ops should be declared const.
Signed-off-by: NStephen Hemminger <shemminger@vyatta.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ec1b4cf7

01 10月, 2009 1 次提交

net: Make setsockopt() optlen be unsigned. · b7058842

由 David S. Miller 提交于 9月 30, 2009

This provides safety against negative optlen at the type
level instead of depending upon (sometimes non-trivial)
checks against this sprinkled all over the the place, in
each and every implementation.

Based upon work done by Arjan van de Ven and feedback
from Linus Torvalds.
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b7058842

27 9月, 2009 1 次提交

net: fix nlmsg len size for skb when error bit is set. · 5dba93ae

由 John Fastabend 提交于 9月 25, 2009

Currently, the nlmsg->len field is not set correctly in  netlink_ack()
for ack messages that include the nlmsg of the error frame.  This
corrects the length field passed to __nlmsg_put to use the correct
payload size.
Signed-off-by: NJohn Fastabend <john.r.fastabend@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5dba93ae

openeuler / Kernel 接近 2 年 前同步成功

openeuler / Kernel
接近 2 年前同步成功