提交 · ccdfcc398594ddf3f77348c5a10938dbe9efefbe · openanolis / cloud-kernel

20 4月, 2013 1 次提交

netlink: mmaped netlink: ring setup · ccdfcc39

由 Patrick McHardy 提交于 4月 17, 2013

Add support for mmap'ed RX and TX ring setup and teardown based on the
af_packet.c code. The following patches will use this to add the real
mmap'ed receive and transmit functionality.
Signed-off-by: NPatrick McHardy <kaber@trash.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ccdfcc39

27 3月, 2013 1 次提交

netlink: remove duplicated NLMSG_ALIGN · a88b9ce5

由 Hong zhi guo 提交于 3月 25, 2013

NLMSG_HDRLEN is already aligned value. It's for directly reference
without extra alignment.

The redundant alignment here may confuse the API users.
Signed-off-by: NHong Zhiguo <honkiko@gmail.com>
Acked-by: NThomas Graf <tgraf@suug.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a88b9ce5

13 10月, 2012 1 次提交

UAPI: (Scripted) Disintegrate include/linux · 607ca46e

由 David Howells 提交于 10月 13, 2012

Signed-off-by: NDavid Howells <dhowells@redhat.com>
Acked-by: NArnd Bergmann <arnd@arndb.de>
Acked-by: NThomas Gleixner <tglx@linutronix.de>
Acked-by: NMichael Kerrisk <mtk.manpages@gmail.com>
Acked-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: NDave Jones <davej@redhat.com>

607ca46e

07 10月, 2012 1 次提交

netlink: add reference of module in netlink_dump_start · 6dc878a8

由 Gao feng 提交于 10月 04, 2012

I get a panic when I use ss -a and rmmod inet_diag at the
same time.

It's because netlink_dump uses inet_diag_dump which belongs to module
inet_diag.

I search the codes and find many modules have the same problem.  We
need to add a reference to the module which the cb->dump belongs to.

Thanks for all help from Stephen,Jan,Eric,Steffen and Pablo.

Change From v3:
change netlink_dump_start to inline,suggestion from Pablo and
Eric.

Change From v2:
delete netlink_dump_done,and call module_put in netlink_dump
and netlink_sock_destruct.
Signed-off-by: NGao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6dc878a8

23 9月, 2012 1 次提交
- D
  netlink: Rearrange netlink_kernel_cfg to save space on 64-bit. · c9d2ea96
  由 David S. Miller 提交于 9月 23, 2012
```
Suggested by Jan Engelhardt.
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
```
  c9d2ea96
22 9月, 2012 1 次提交

netlink: use <linux/export.h> instead of <linux/module.h> · abb17e6c

由 Pablo Neira Ayuso 提交于 9月 21, 2012

Since (9f00d977 netlink: hide struct module parameter in netlink_kernel_create),
linux/netlink.h includes linux/module.h because of the use of THIS_MODULE.

Use linux/export.h instead, as suggested by Stephen Rothwell, which is
significantly smaller and defines THIS_MODULES.
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

abb17e6c

11 9月, 2012 1 次提交

netlink: Rename pid to portid to avoid confusion · 15e47304

由 Eric W. Biederman 提交于 9月 07, 2012

It is a frequent mistake to confuse the netlink port identifier with a
process identifier.  Try to reduce this confusion by renaming fields
that hold port identifiers portid instead of pid.

I have carefully avoided changing the structures exported to
userspace to avoid changing the userspace API.

I have successfully built an allyesconfig kernel with this change.
Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: NStephen Hemminger <shemminger@vyatta.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

15e47304

09 9月, 2012 2 次提交

netlink: hide struct module parameter in netlink_kernel_create · 9f00d977

由 Pablo Neira Ayuso 提交于 9月 08, 2012

This patch defines netlink_kernel_create as a wrapper function of
__netlink_kernel_create to hide the struct module *me parameter
(which seems to be THIS_MODULE in all existing netlink subsystems).

Suggested by David S. Miller.
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9f00d977

netlink: kill netlink_set_nonroot · 9785e10a

由 Pablo Neira Ayuso 提交于 9月 08, 2012

Replace netlink_set_nonroot by one new field `flags' in
struct netlink_kernel_cfg that is passed to netlink_kernel_create.

This patch also renames NL_NONROOT_* to NL_CFG_F_NONROOT_* since
now the flags field in nl_table is generic (so we can add more
flags if needed in the future).

Also adjust all callers in the net-next tree to use these flags
instead of netlink_set_nonroot.
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9785e10a

08 9月, 2012 1 次提交

scm: Don't use struct ucred in NETLINK_CB and struct scm_cookie. · dbe9a417

由 Eric W. Biederman 提交于 9月 06, 2012

Passing uids and gids on NETLINK_CB from a process in one user
namespace to a process in another user namespace can result in the
wrong uid or gid being presented to userspace.  Avoid that problem by
passing kuids and kgids instead.

- define struct scm_creds for use in scm_cookie and netlink_skb_parms
  that holds uid and gid information in kuid_t and kgid_t.

- Modify scm_set_cred to fill out scm_creds by heand instead of using
  cred_to_ucred to fill out struct ucred.  This conversion ensures
  userspace does not get incorrect uid or gid values to look at.

- Modify scm_recv to convert from struct scm_creds to struct ucred
  before copying credential values to userspace.

- Modify __scm_send to populate struct scm_creds on in the scm_cookie,
  instead of just copying struct ucred from userspace.

- Modify netlink_sendmsg to copy scm_creds instead of struct ucred
  into the NETLINK_CB.
Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

dbe9a417

15 8月, 2012 1 次提交

netlink: Make the sending netlink socket availabe in NETLINK_CB · 3fbc2905

由 Eric W. Biederman 提交于 5月 24, 2012

The sending socket of an skb is already available by it's port id
in the NETLINK_CB.  If you want to know more like to examine the
credentials on the sending socket you have to look up the sending
socket by it's port id and all of the needed functions and data
structures are static inside of af_netlink.c.  So do the simple
thing and pass the sending socket to the receivers in the NETLINK_CB.

I intend to use this to get the user namespace of the sending socket
in inet_diag so that I can report uids in the context of the process
who opened the socket, the same way I report uids in the contect
of the process who opens files.
Acked-by: NDavid S. Miller <davem@davemloft.net>
Acked-by: NSerge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>

3fbc2905

30 6月, 2012 2 次提交

netlink: add nlk->netlink_bind hook for module auto-loading · 03292745

由 Pablo Neira Ayuso 提交于 6月 29, 2012

This patch adds a hook in the binding path of netlink.

This is used by ctnetlink to allow module autoloading for the case
in which one user executes:

 conntrack -E

So far, this resulted in nfnetlink loaded, but not
nf_conntrack_netlink.

I have received in the past many complains on this behaviour.
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

03292745

netlink: add netlink_kernel_cfg parameter to netlink_kernel_create · a31f2d17

由 Pablo Neira Ayuso 提交于 6月 29, 2012

This patch adds the following structure:

struct netlink_kernel_cfg {
        unsigned int    groups;
        void            (*input)(struct sk_buff *skb);
        struct mutex    *cb_mutex;
};

That can be passed to netlink_kernel_create to set optional configurations
for netlink kernel sockets.

I've populated this structure by looking for NULL and zero parameters at the
existing code. The remaining parameters that always need to be set are still
left in the original interface.

That includes optional parameters for the netlink socket creation. This allows
easy extensibility of this interface in the future.

This patch also adapts all callers to use this new interface.
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a31f2d17

27 6月, 2012 1 次提交

netlink: Delete NLMSG_PUT and NLMSG_NEW. · c3deafc5

由 David S. Miller 提交于 6月 26, 2012

No longer used and a poor interface as they were macros
with embedded gotos.
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c3deafc5

09 5月, 2012 1 次提交

netfilter: remove ip_queue support · d16cf20e

由 Pablo Neira Ayuso 提交于 5月 08, 2012

This patch removes ip_queue support which was marked as obsolete
years ago. The nfnetlink_queue modules provides more advanced
user-space packet queueing mechanism.

This patch also removes capability code included in SELinux that
refers to ip_queue. Otherwise, we break compilation.

Several warning has been sent regarding this to the mailing list
in the past month without anyone rising the hand to stop this
with some strong argument.
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

d16cf20e

27 2月, 2012 2 次提交

netlink: allow to pass data pointer to netlink_dump_start() callback · 7175c883

由 Pablo Neira Ayuso 提交于 2月 24, 2012

This patch allows you to pass a data pointer that can be
accessed from the dump callback.

Netfilter is going to use this patch to provide filtered dumps
to user-space. This is specifically interesting in ctnetlink that
may handle lots of conntrack entries. We can save precious
cycles by skipping the conversion to TLV format of conntrack
entries that are not interesting for user-space.

More specifically, ctnetlink will include one operation to allow
to filter the dumping of conntrack entries by ctmark values.
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7175c883

netlink: add netlink_dump_control structure for netlink_dump_start() · 80d326fa

由 Pablo Neira Ayuso 提交于 2月 24, 2012

Davem considers that the argument list of this interface is getting
out of control. This patch tries to address this issue following
his proposal:

struct netlink_dump_control c = { .dump = dump, .done = done, ... };

netlink_dump_start(..., &c);

Suggested by David S. Miller.
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

80d326fa

31 1月, 2012 1 次提交

net: Deinline __nlmsg_put and genlmsg_put. -7k code on i386 defconfig. · a46621a3

由 Denys Vlasenko 提交于 1月 30, 2012

   text	   data	    bss	    dec	    hex	filename
8455963	 532732	1810804	10799499 a4c98b	vmlinux.o.before
8448899	 532732	1810804	10792435 a4adf3	vmlinux.o

This change also removes commented-out copy of __nlmsg_put
which was last touched in 2005 with "Enable once all users
have been converted" comment on top.

Changes in v2: rediffed against net-next.
Signed-off-by: NDenys Vlasenko <vda.linux@googlemail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a46621a3

07 12月, 2011 1 次提交

inet_diag: Partly rename inet_ to sock_ · 7f1fb60c

由 Pavel Emelyanov 提交于 12月 06, 2011

The ultimate goal is to get the sock_diag module, that works in
family+protocol terms. Currently this is suitable to do on the
inet_diag basis, so rename parts of the code. It will be moved
to sock_diag.c later.
Signed-off-by: NPavel Emelyanov <xemul@parallels.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7f1fb60c

21 10月, 2011 1 次提交

crypto: Add userspace configuration API · a38f7907

由 Steffen Klassert 提交于 9月 27, 2011

This patch adds a basic userspace configuration API for the crypto layer.
With this it is possible to instantiate, remove and to show crypto
algorithms from userspace.
Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

a38f7907

27 8月, 2011 1 次提交

headers, net: Use __kernel_sa_family_t in more definitions shared with userland · bcb949b8

由 Ben Hutchings 提交于 8月 24, 2011

Complete the work started with commit
6602a4ba ('net: Make userland include
of netlink.h more sane').
Signed-off-by: NBen Hutchings <ben@decadent.org.uk>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

bcb949b8

08 8月, 2011 1 次提交

net: Make userland include of netlink.h more sane. · 6602a4ba

由 David S. Miller 提交于 8月 07, 2011

Currently userland will barf when including linux/netlink.h unless it
precisely includes sys/socket.h first.  The issue is where the
definition of "sa_family_t" comes from.

We've been back and forth on how to fix this issue in the past, see:

http://thread.gmane.org/gmane.linux.debian.devel.bugs.general/622621
http://thread.gmane.org/gmane.linux.network/143380

Ben Hutchings suggested we take a hint from how we handle the
sockaddr_storage type.  First we define a "__kernel_sa_family_t"
to linux/socket.h that is always defined.

Then if __KERNEL__ is defined, we also define "sa_family_t" as
equal to "__kernel_sa_family_t".

Then in places like linux/netlink.h we use __kernel_sa_family_t
in user visible datastructures.
Reported-by: NMichel Machado <michel@digirati.com.br>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6602a4ba

23 6月, 2011 1 次提交

netlink: advertise incomplete dumps · 670dc283

由 Johannes Berg 提交于 6月 20, 2011

Consider the following situation:
 * a dump that would show 8 entries, four in the first
   round, and four in the second
 * between the first and second rounds, 6 entries are
   removed
 * now the second round will not show any entry, and
   even if there is a sequence/generation counter the
   application will not know

To solve this problem, add a new flag NLM_F_DUMP_INTR
to the netlink header that indicates the dump wasn't
consistent, this flag can also be set on the MSG_DONE
message that terminates the dump, and as such above
situation can be detected.

To achieve this, add a sequence counter to the netlink
callback struct. Of course, netlink code still needs
to use this new functionality. The correct way to do
that is to always set cb->seq when a dumpit callback
is invoked and call nl_dump_check_consistent() for
each new message. The core code will also call this
function for the final MSG_DONE message.

To make it usable with generic netlink, a new function
genlmsg_nlhdr() is needed to obtain the netlink header
from the genetlink user header.
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
Acked-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NJohn W. Linville <linville@tuxdriver.com>

670dc283

10 6月, 2011 1 次提交

rtnetlink: Compute and store minimum ifinfo dump size · c7ac8679

由 Greg Rose 提交于 6月 10, 2011

The message size allocated for rtnl ifinfo dumps was limited to
a single page.  This is not enough for additional interface info
available with devices that support SR-IOV and caused a bug in
which VF info would not be displayed if more than approximately
40 VFs were created per interface.

Implement a new function pointer for the rtnl_register service that will
calculate the amount of data required for the ifinfo dump and allocate
enough data to satisfy the request.
Signed-off-by: NGreg Rose <gregory.v.rose@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

c7ac8679

21 5月, 2011 1 次提交

RDMA: Add netlink infrastructure · b2cbae2c

由 Roland Dreier 提交于 5月 20, 2011

Add basic RDMA netlink infrastructure that allows for registration of
RDMA clients for which data is to be exported and supplies message
construction callbacks.
Signed-off-by: NNir Muchtar <nirm@voltaire.com>

[ Reorganize a few things, add CONFIG_NET dependency.  - Roland ]
Signed-off-by: NRoland Dreier <roland@purestorage.com>

b2cbae2c

04 3月, 2011 2 次提交

netlink: kill eff_cap from struct netlink_skb_parms · 01a16b21

由 Patrick McHardy 提交于 3月 03, 2011

Netlink message processing in the kernel is synchronous these days,
capabilities can be checked directly in security_netlink_recv() from
the current process.
Signed-off-by: NPatrick McHardy <kaber@trash.net>
Reviewed-by: NJames Morris <jmorris@namei.org>
[chrisw: update to include pohmelfs and uvesafb]
Signed-off-by: NChris Wright <chrisw@sous-sol.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

01a16b21

netlink: kill loginuid/sessionid/sid members from struct netlink_skb_parms · c53fa1ed

由 Patrick McHardy 提交于 3月 03, 2011

Netlink message processing in the kernel is synchronous these days, the
session information can be collected when needed.
Signed-off-by: NPatrick McHardy <kaber@trash.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c53fa1ed

18 12月, 2010 1 次提交

netlink: fix gcc -Wconversion compilation warning · 4b8fe663

由 Dmitry V. Levin 提交于 12月 17, 2010

$ cat << EOF | gcc -Wconversion -xc -S -o/dev/null -
unsigned f(void) {return NLMSG_HDRLEN;}
EOF
<stdin>: In function 'f':
<stdin>:3:26: warning: negative integer implicitly converted to unsigned type
Signed-off-by: NDmitry V. Levin <ldv@altlinux.org>
Signed-off-by: NKirill A. Shutemov <kirill@shutemov.name>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4b8fe663

23 9月, 2010 1 次提交

net: Move "struct net" declaration inside the __KERNEL__ macro guard · 56b49f4b

由 Ollie Wild 提交于 9月 22, 2010

This patch reduces namespace pollution by moving the "struct net" declaration
out of the userspace-facing portion of linux/netlink.h.  It has no impact on
the kernel.

(This came up because we have several C++ applications which use "net" as a
namespace name.)
Signed-off-by: NOllie Wild <aaw@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

56b49f4b

22 5月, 2010 1 次提交

netlink: Implment netlink_broadcast_filtered · 910a7e90

由 Eric W. Biederman 提交于 5月 04, 2010

When netlink sockets are used to convey data that is in a namespace
we need a way to select a subset of the listening sockets to deliver
the packet to.  For the network namespace we have been doing this
by only transmitting packets in the correct network namespace.

For data belonging to other namespaces netlink_bradcast_filtered
provides a mechanism that allows us to examine the destination
socket and to decide if we should transmit the specified packet
to it.
Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
Acked-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>

910a7e90

21 3月, 2010 1 次提交

netlink: fix NETLINK_RECV_NO_ENOBUFS in netlink_set_err() · 1a50307b

由 Pablo Neira Ayuso 提交于 3月 18, 2010

Currently, ENOBUFS errors are reported to the socket via
netlink_set_err() even if NETLINK_RECV_NO_ENOBUFS is set. However,
that should not happen. This fixes this problem and it changes the
prototype of netlink_set_err() to return the number of sockets that
have set the NETLINK_RECV_NO_ENOBUFS socket option. This return
value is used in the next patch in these bugfix series.
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1a50307b

05 11月, 2009 1 次提交

net: cleanup include/linux · d94d9fee

由 Eric Dumazet 提交于 11月 04, 2009

This cleanup patch puts struct/union/enum opening braces,
in first line to ease grep games.

struct something
{

becomes :

struct something {
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d94d9fee

25 9月, 2009 1 次提交

genetlink: fix netns vs. netlink table locking (2) · b8273570

由 Johannes Berg 提交于 9月 24, 2009

Similar to commit d136f1bd,
there's a bug when unregistering a generic netlink family,
which is caught by the might_sleep() added in that commit:

    BUG: sleeping function called from invalid context at net/netlink/af_netlink.c:183
    in_atomic(): 1, irqs_disabled(): 0, pid: 1510, name: rmmod
    2 locks held by rmmod/1510:
     #0:  (genl_mutex){+.+.+.}, at: [<ffffffff8138283b>] genl_unregister_family+0x2b/0x130
     #1:  (rcu_read_lock){.+.+..}, at: [<ffffffff8138270c>] __genl_unregister_mc_group+0x1c/0x120
    Pid: 1510, comm: rmmod Not tainted 2.6.31-wl #444
    Call Trace:
     [<ffffffff81044ff9>] __might_sleep+0x119/0x150
     [<ffffffff81380501>] netlink_table_grab+0x21/0x100
     [<ffffffff813813a3>] netlink_clear_multicast_users+0x23/0x60
     [<ffffffff81382761>] __genl_unregister_mc_group+0x71/0x120
     [<ffffffff81382866>] genl_unregister_family+0x56/0x130
     [<ffffffffa0007d85>] nl80211_exit+0x15/0x20 [cfg80211]
     [<ffffffffa000005a>] cfg80211_exit+0x1a/0x40 [cfg80211]

Fix in the same way by grabbing the netlink table lock
before doing rcu_read_lock().
Signed-off-by: NJohannes Berg <johannes@sipsolutions.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b8273570

15 9月, 2009 1 次提交

genetlink: fix netns vs. netlink table locking · d136f1bd

由 Johannes Berg 提交于 9月 12, 2009

Since my commits introducing netns awareness into
genetlink we can get this problem:

BUG: scheduling while atomic: modprobe/1178/0x00000002
2 locks held by modprobe/1178:
 #0:  (genl_mutex){+.+.+.}, at: [<ffffffff8135ee1a>] genl_register_mc_grou
 #1:  (rcu_read_lock){.+.+..}, at: [<ffffffff8135eeb5>] genl_register_mc_g
Pid: 1178, comm: modprobe Not tainted 2.6.31-rc8-wl-34789-g95cb731-dirty #
Call Trace:
 [<ffffffff8103e285>] __schedule_bug+0x85/0x90
 [<ffffffff81403138>] schedule+0x108/0x588
 [<ffffffff8135b131>] netlink_table_grab+0xa1/0xf0
 [<ffffffff8135c3a7>] netlink_change_ngroups+0x47/0x100
 [<ffffffff8135ef0f>] genl_register_mc_group+0x12f/0x290

because I overlooked that netlink_table_grab() will
schedule, thinking it was just the rwlock. However,
in the contention case, that isn't actually true.

Fix this by letting the code grab the netlink table
lock first and then the RCU for netns protection.
Signed-off-by: NJohannes Berg <johannes@sipsolutions.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d136f1bd

25 8月, 2009 1 次提交

netlink: constify nlmsghdr arguments · 3a6c2b41

由 Patrick McHardy 提交于 8月 25, 2009

Consitfy nlmsghdr arguments to a couple of functions as preparation
for the next patch, which will constify the netlink message data in
all nfnetlink users.
Signed-off-by: NPatrick McHardy <kaber@trash.net>

3a6c2b41

25 3月, 2009 1 次提交

netlink: add NETLINK_NO_ENOBUFS socket flag · 38938bfe

由 Pablo Neira Ayuso 提交于 3月 24, 2009

This patch adds the NETLINK_NO_ENOBUFS socket flag. This flag can
be used by unicast and broadcast listeners to avoid receiving
ENOBUFS errors.

Generally speaking, ENOBUFS errors are useful to notify two things
to the listener:

a) You may increase the receiver buffer size via setsockopt().
b) You have lost messages, you may be out of sync.

In some cases, ignoring ENOBUFS errors can be useful. For example:

a) nfnetlink_queue: this subsystem does not have any sort of resync
method and you can decide to ignore ENOBUFS once you have set a
given buffer size.

b) ctnetlink: you can use this together with the socket flag
NETLINK_BROADCAST_SEND_ERROR to stop getting ENOBUFS errors as
you do not need to resync (packets whose event are not delivered
are drop to provide reliable logging and state-synchronization).

Moreover, the use of NETLINK_NO_ENOBUFS also reduces a "go up, go down"
effect in terms of performance which is due to the netlink congestion
control when the listener cannot back off. The effect is the following:

1) throughput rate goes up and netlink messages are inserted in the
receiver buffer.
2) Then, netlink buffer fills and overruns (set on nlk->state bit 0).
3) While the listener empties the receiver buffer, netlink keeps
dropping messages. Thus, throughput goes dramatically down.
4) Then, once the listener has emptied the buffer (nlk->state
bit 0 is set off), goto step 1.

This effect is easy to trigger with netlink broadcast under heavy
load, and it is more noticeable when using a big receiver buffer.
You can find some results in [1] that show this problem.

[1] http://1984.lsi.us.es/linux/netlink/

This patch also includes the use of sk_drop to account the number of
netlink messages drop due to overrun. This value is shown in
/proc/net/netlink.
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

38938bfe

20 2月, 2009 1 次提交

netlink: add NETLINK_BROADCAST_ERROR socket option · be0c22a4

由 Pablo Neira Ayuso 提交于 2月 18, 2009

This patch adds NETLINK_BROADCAST_ERROR which is a netlink
socket option that the listener can set to make netlink_broadcast()
return errors in the delivery to the caller. This option is useful
if the caller of netlink_broadcast() do something with the result
of the message delivery, like in ctnetlink where it drops a network
packet if the event delivery failed, this is used to enable reliable
logging and state-synchronization. If this socket option is not set,
netlink_broadcast() only reports ESRCH errors and silently ignore
ENOBUFS errors, which is what most netlink_broadcast() callers
should do.

This socket option is based on a suggestion from Patrick McHardy.
Patrick McHardy can exchange this patch for a beer from me ;).
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
Acked-by: NPatrick McHardy <kaber@trash.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

be0c22a4

20 11月, 2008 1 次提交

netlink: avoid memset of 0 bytes sparse warning · 0c19b0ad

由 Patrick McHardy 提交于 11月 20, 2008

A netlink attribute padding of zero triggers this sparse warning:

include/linux/netlink.h:245:8: warning: memset with byte count of 0

Avoid the memset when the size parameter is constant and requires no padding.
Signed-off-by: NPatrick McHardy <kaber@trash.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0c19b0ad

01 10月, 2008 1 次提交

ipsec: Put dumpers on the dump list · 12a169e7

由 Herbert Xu 提交于 10月 01, 2008

Herbert Xu came up with the idea and the original patch to make
xfrm_state dump list contain also dumpers:

As it is we go to extraordinary lengths to ensure that states
don't go away while dumpers go to sleep.  It's much easier if
we just put the dumpers themselves on the list since they can't
go away while they're going.

I've also changed the order of addition on new states to prevent
a never-ending dump.

Timo Teräs improved the patch to apply cleanly to latest tree,
modified iteration code to be more readable by using a common
struct for entries in the list, implemented the same idea for
xfrm_policy dumping and moved the af_key specific "last" entry
caching to af_key.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NTimo Teras <timo.teras@iki.fi>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

12a169e7

23 9月, 2008 1 次提交

ipsec: Fix xfrm_state_walk race · 5c182458

由 Herbert Xu 提交于 9月 22, 2008

As discovered by Timo Teräs, the currently xfrm_state_walk scheme
is racy because if a second dump finishes before the first, we
may free xfrm states that the first dump would walk over later.

This patch fixes this by storing the dumps in a list in order
to calculate the correct completion counter which cures this
problem.

I've expanded netlink_cb in order to accomodate the extra state
related to this.  It shouldn't be a big deal since netlink_cb
is kmalloced for each dump and we're just increasing it by 4 or
8 bytes.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5c182458

openanolis / cloud-kernel 大约 1 年 前同步成功

openanolis / cloud-kernel
大约 1 年前同步成功