提交 · dfc96c192ad48a16b0d5bba43165d9893a00fe37 · openanolis / cloud-kernel

10 12月, 2014 1 次提交

rtnetlink: delay RTM_DELLINK notification until after ndo_uninit() · 395eea6c

由 Mahesh Bandewar 提交于 12月 03, 2014

The commit 56bfa7ee ("unregister_netdevice : move RTM_DELLINK to
until after ndo_uninit") tried to do this ealier but while doing so
it created a problem. Unfortunately the delayed rtmsg_ifinfo() also
delayed call to fill_info(). So this translated into asking driver
to remove private state and then query it's private state. This
could have catastropic consequences.

This change breaks the rtmsg_ifinfo() into two parts - one takes the
precise snapshot of the device by called fill_info() before calling
the ndo_uninit() and the second part sends the notification using
collected snapshot.

It was brought to notice when last link is deleted from an ipvlan device
when it has free-ed the port and the subsequent .fill_info() call is
trying to get the info from the port.

kernel: [ 255.139429] ------------[ cut here ]------------
kernel: [ 255.139439] WARNING: CPU: 12 PID: 11173 at net/core/rtnetlink.c:2238 rtmsg_ifinfo+0x100/0x110()
kernel: [ 255.139493] Modules linked in: ipvlan bonding w1_therm ds2482 wire cdc_acm ehci_pci ehci_hcd i2c_dev i2c_i801 i2c_core msr cpuid bnx2x ptp pps_core mdio libcrc32c
kernel: [ 255.139513] CPU: 12 PID: 11173 Comm: ip Not tainted 3.18.0-smp-DEV #167
kernel: [ 255.139514] Hardware name: Intel RML,PCH/Ibis_QC_18, BIOS 1.0.10 05/15/2012
kernel: [ 255.139515] 0000000000000009 ffff880851b6b828 ffffffff815d87f4 00000000000000e0
kernel: [ 255.139516] 0000000000000000 ffff880851b6b868 ffffffff8109c29c 0000000000000000
kernel: [ 255.139518] 00000000ffffffa6 00000000000000d0 ffffffff81aaf580 0000000000000011
kernel: [ 255.139520] Call Trace:
kernel: [ 255.139527] [<ffffffff815d87f4>] dump_stack+0x46/0x58
kernel: [ 255.139531] [<ffffffff8109c29c>] warn_slowpath_common+0x8c/0xc0
kernel: [ 255.139540] [<ffffffff8109c2ea>] warn_slowpath_null+0x1a/0x20
kernel: [ 255.139544] [<ffffffff8150d570>] rtmsg_ifinfo+0x100/0x110
kernel: [ 255.139547] [<ffffffff814f78b5>] rollback_registered_many+0x1d5/0x2d0
kernel: [ 255.139549] [<ffffffff814f79cf>] unregister_netdevice_many+0x1f/0xb0
kernel: [ 255.139551] [<ffffffff8150acab>] rtnl_dellink+0xbb/0x110
kernel: [ 255.139553] [<ffffffff8150da90>] rtnetlink_rcv_msg+0xa0/0x240
kernel: [ 255.139557] [<ffffffff81329283>] ? rhashtable_lookup_compare+0x43/0x80
kernel: [ 255.139558] [<ffffffff8150d9f0>] ? __rtnl_unlock+0x20/0x20
kernel: [ 255.139562] [<ffffffff8152cb11>] netlink_rcv_skb+0xb1/0xc0
kernel: [ 255.139563] [<ffffffff8150a495>] rtnetlink_rcv+0x25/0x40
kernel: [ 255.139565] [<ffffffff8152c398>] netlink_unicast+0x178/0x230
kernel: [ 255.139567] [<ffffffff8152c75f>] netlink_sendmsg+0x30f/0x420
kernel: [ 255.139571] [<ffffffff814e0b0c>] sock_sendmsg+0x9c/0xd0
kernel: [ 255.139575] [<ffffffff811d1d7f>] ? rw_copy_check_uvector+0x6f/0x130
kernel: [ 255.139577] [<ffffffff814e11c9>] ? copy_msghdr_from_user+0x139/0x1b0
kernel: [ 255.139578] [<ffffffff814e1774>] ___sys_sendmsg+0x304/0x310
kernel: [ 255.139581] [<ffffffff81198723>] ? handle_mm_fault+0xca3/0xde0
kernel: [ 255.139585] [<ffffffff811ebc4c>] ? destroy_inode+0x3c/0x70
kernel: [ 255.139589] [<ffffffff8108e6ec>] ? __do_page_fault+0x20c/0x500
kernel: [ 255.139597] [<ffffffff811e8336>] ? dput+0xb6/0x190
kernel: [ 255.139606] [<ffffffff811f05f6>] ? mntput+0x26/0x40
kernel: [ 255.139611] [<ffffffff811d2b94>] ? __fput+0x174/0x1e0
kernel: [ 255.139613] [<ffffffff814e2129>] __sys_sendmsg+0x49/0x90
kernel: [ 255.139615] [<ffffffff814e2182>] SyS_sendmsg+0x12/0x20
kernel: [ 255.139617] [<ffffffff815df092>] system_call_fastpath+0x12/0x17
kernel: [ 255.139619] ---[ end trace 5e6703e87d984f6b ]---
Signed-off-by: NMahesh Bandewar <maheshb@google.com>
Reported-by: NToshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Roopa Prabhu <roopa@cumulusnetworks.com>
Cc: David S. Miller <davem@davemloft.net>
Acked-by: NEric Dumazet <edumazet@google.com>
Acked-by: NThomas Graf <tgraf@suug.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

395eea6c

03 12月, 2014 2 次提交

bridge: add brport flags to dflt bridge_getlink · 2c3c031c

由 Scott Feldman 提交于 11月 28, 2014

To allow brport device to return current brport flags set on port.  Add
returned flags to nested IFLA_PROTINFO netlink msg built in dflt getlink.
With this change, netlink msg returned for bridge_getlink contains the port's
offloaded flag settings (the port's SELF settings).
Signed-off-by: NScott Feldman <sfeldma@gmail.com>
Signed-off-by: NJiri Pirko <jiri@resnulli.us>
Acked-by: NAndy Gospodarek <gospo@cumulusnetworks.com>
Acked-by: NThomas Graf <tgraf@suug.ch>
Acked-by: NJamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2c3c031c

net: make vid as a parameter for ndo_fdb_add/ndo_fdb_del · f6f6424b

由 Jiri Pirko 提交于 11月 28, 2014

Do the work of parsing NDA_VLAN directly in rtnetlink code, pass simple
u16 vid to drivers from there.
Signed-off-by: NJiri Pirko <jiri@resnulli.us>
Acked-by: NAndy Gospodarek <gospo@cumulusnetworks.com>
Acked-by: NJamal Hadi Salim <jhs@mojatatu.com>
Acked-by: NJohn Fastabend <john.r.fastabend@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f6f6424b

14 9月, 2014 2 次提交

net: sched: RCU cls_tcindex · 331b7292

由 John Fastabend 提交于 9月 12, 2014

Make cls_tcindex RCU safe.

This patch addds a new RCU routine rcu_dereference_bh_rtnl() to check
caller either holds the rcu read lock or RTNL. This is needed to
handle the case where tcindex_lookup() is being called in both cases.
Signed-off-by: NJohn Fastabend <john.r.fastabend@intel.com>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

331b7292

net: sched: RCU cls_tcindex · 8332904a

由 John Fastabend 提交于 9月 12, 2014

Make cls_tcindex RCU safe.

This patch addds a new RCU routine rcu_dereference_bh_rtnl() to check
caller either holds the rcu read lock or RTNL. This is needed to
handle the case where tcindex_lookup() is being called in both cases.
Signed-off-by: NJohn Fastabend <john.r.fastabend@intel.com>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8332904a

11 7月, 2014 1 次提交

bridge: fdb dumping takes a filter device · 5d5eacb3

由 Jamal Hadi Salim 提交于 7月 10, 2014

Dumping a bridge fdb dumps every fdb entry
held. With this change we are going to filter
on selected bridge port.
Signed-off-by: NJamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5d5eacb3

16 5月, 2014 1 次提交

rtnetlink: wait for unregistering devices in rtnl_link_unregister() · 200b916f

由 Cong Wang 提交于 5月 12, 2014

From: Cong Wang <cwang@twopensource.com>

commit 50624c93 (net: Delay default_device_exit_batch until no
devices are unregistering) introduced rtnl_lock_unregistering() for
default_device_exit_batch(). Same race could happen we when rmmod a driver
which calls rtnl_link_unregister() as we call dev->destructor without rtnl
lock.

For long term, I think we should clean up the mess of netdev_run_todo()
and net namespce exit code.

Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: NCong Wang <cwang@twopensource.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

200b916f

18 12月, 2013 1 次提交

net: allow netdev_all_upper_get_next_dev_rcu with rtnl lock held · 85328240

由 John Fastabend 提交于 11月 26, 2013

It is useful to be able to walk all upper devices when bringing
a device online where the RTNL lock is held. In this case it
is safe to walk the all_adj_list because the RTNL lock is used
to protect the write side as well.

This patch adds a check to see if the rtnl lock is held before
throwing a warning in netdev_all_upper_get_next_dev_rcu().

Also because we now have a call site for lockdep_rtnl_is_held()
outside COFIG_LOCK_PROVING an inline definition returning 1 is
needed. Similar to the rcu_read_lock_is_held().

Fixes: 2a47fa45 ("ixgbe: enable l2 forwarding acceleration for macvlans")
CC: Veaceslav Falico <vfalico@redhat.com>
Reported-by: NYuanhan Liu <yuanhan.liu@linux.intel.com>
Signed-off-by: NJohn Fastabend <john.r.fastabend@intel.com>
Tested-by: NPhil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

85328240

26 10月, 2013 1 次提交

net: fix rtnl notification in atomic context · 7f294054

由 Alexei Starovoitov 提交于 10月 23, 2013

commit 991fb3f7 "dev: always advertise rx_flags changes via netlink"
introduced rtnl notification from __dev_set_promiscuity(),
which can be called in atomic context.

Steps to reproduce:
ip tuntap add dev tap1 mode tap
ifconfig tap1 up
tcpdump -nei tap1 &
ip tuntap del dev tap1 mode tap

[  271.627994] device tap1 left promiscuous mode
[  271.639897] BUG: sleeping function called from invalid context at mm/slub.c:940
[  271.664491] in_atomic(): 1, irqs_disabled(): 0, pid: 3394, name: ip
[  271.677525] INFO: lockdep is turned off.
[  271.690503] CPU: 0 PID: 3394 Comm: ip Tainted: G        W    3.12.0-rc3+ #73
[  271.703996] Hardware name: System manufacturer System Product Name/P8Z77 WS, BIOS 3007 07/26/2012
[  271.731254]  ffffffff81a58506 ffff8807f0d57a58 ffffffff817544e5 ffff88082fa0f428
[  271.760261]  ffff8808071f5f40 ffff8807f0d57a88 ffffffff8108bad1 ffffffff81110ff8
[  271.790683]  0000000000000010 00000000000000d0 00000000000000d0 ffff8807f0d57af8
[  271.822332] Call Trace:
[  271.838234]  [<ffffffff817544e5>] dump_stack+0x55/0x76
[  271.854446]  [<ffffffff8108bad1>] __might_sleep+0x181/0x240
[  271.870836]  [<ffffffff81110ff8>] ? rcu_irq_exit+0x68/0xb0
[  271.887076]  [<ffffffff811a80be>] kmem_cache_alloc_node+0x4e/0x2a0
[  271.903368]  [<ffffffff810b4ddc>] ? vprintk_emit+0x1dc/0x5a0
[  271.919716]  [<ffffffff81614d67>] ? __alloc_skb+0x57/0x2a0
[  271.936088]  [<ffffffff810b4de0>] ? vprintk_emit+0x1e0/0x5a0
[  271.952504]  [<ffffffff81614d67>] __alloc_skb+0x57/0x2a0
[  271.968902]  [<ffffffff8163a0b2>] rtmsg_ifinfo+0x52/0x100
[  271.985302]  [<ffffffff8162ac6d>] __dev_notify_flags+0xad/0xc0
[  272.001642]  [<ffffffff8162ad0c>] __dev_set_promiscuity+0x8c/0x1c0
[  272.017917]  [<ffffffff81731ea5>] ? packet_notifier+0x5/0x380
[  272.033961]  [<ffffffff8162b109>] dev_set_promiscuity+0x29/0x50
[  272.049855]  [<ffffffff8172e937>] packet_dev_mc+0x87/0xc0
[  272.065494]  [<ffffffff81732052>] packet_notifier+0x1b2/0x380
[  272.080915]  [<ffffffff81731ea5>] ? packet_notifier+0x5/0x380
[  272.096009]  [<ffffffff81761c66>] notifier_call_chain+0x66/0x150
[  272.110803]  [<ffffffff8108503e>] __raw_notifier_call_chain+0xe/0x10
[  272.125468]  [<ffffffff81085056>] raw_notifier_call_chain+0x16/0x20
[  272.139984]  [<ffffffff81620190>] call_netdevice_notifiers_info+0x40/0x70
[  272.154523]  [<ffffffff816201d6>] call_netdevice_notifiers+0x16/0x20
[  272.168552]  [<ffffffff816224c5>] rollback_registered_many+0x145/0x240
[  272.182263]  [<ffffffff81622641>] rollback_registered+0x31/0x40
[  272.195369]  [<ffffffff816229c8>] unregister_netdevice_queue+0x58/0x90
[  272.208230]  [<ffffffff81547ca0>] __tun_detach+0x140/0x340
[  272.220686]  [<ffffffff81547ed6>] tun_chr_close+0x36/0x60
Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com>
Acked-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7f294054

08 3月, 2013 1 次提交

net: generic fdb support for drivers without ndo_fdb_<op> · 090096bf

由 Vlad Yasevich 提交于 3月 06, 2013

If the driver does not support the ndo_op use the generic
handler for it. This should work in the majority of cases.
Eventually the fdb_dflt_add call gets translated into a
__dev_set_rx_mode() call which should handle hardware
support for filtering via the IFF_UNICAST_FLT flag.

Namely IFF_UNICAST_FLT indicates if the hardware can do
unicast address filtering. If no support is available
the device is put into promisc mode.
Signed-off-by: NVlad Yasevich <vyasevic@redhat.com>
Signed-off-by: NJohn Fastabend <john.r.fastabend@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

090096bf

01 11月, 2012 1 次提交

ixgbe: add setlink, getlink support to ixgbe and ixgbevf · 815cccbf

由 John Fastabend 提交于 10月 24, 2012

This adds support for the net device ops to manage the embedded
hardware bridge on ixgbe devices. With this patch the bridge
mode can be toggled between VEB and VEPA to support stacking
macvlan devices or using the embedded switch without any SW
component in 802.1Qbg/br environments.

Additionally, this adds source address pruning to the ixgbevf
driver to prune any frames sent back from a reflective relay on
the switch. This is required because the existing hardware does
not support this. Without it frames get pushed into the stack
with its own src mac which is invalid per 802.1Qbg VEPA
definition.
Signed-off-by: NJohn Fastabend <john.r.fastabend@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

815cccbf

13 10月, 2012 1 次提交

UAPI: (Scripted) Disintegrate include/linux · 607ca46e

由 David Howells 提交于 10月 13, 2012

Signed-off-by: NDavid Howells <dhowells@redhat.com>
Acked-by: NArnd Bergmann <arnd@arndb.de>
Acked-by: NThomas Gleixner <tglx@linutronix.de>
Acked-by: NMichael Kerrisk <mtk.manpages@gmail.com>
Acked-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: NDave Jones <davej@redhat.com>

607ca46e

11 7月, 2012 1 次提交
- D
  rtnetlink: Remove ts/tsage args to rtnl_put_cacheinfo(). · 87a50699
  由 David S. Miller 提交于 7月 10, 2012
```
Nobody provides non-zero values any longer.
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
```
  87a50699
28 6月, 2012 1 次提交

netlink: Get rid of obsolete rtnetlink macros · 4c3af034

由 Thomas Graf 提交于 6月 26, 2012

Removes all RTA_GET*() and RTA_PUT*() variations, as well as the
the unused rtattr_strcmp(). Get rid of rtm_get_table() by moving
it to its only user decnet.
Signed-off-by: NThomas Graf <tgraf@suug.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4c3af034

16 4月, 2012 1 次提交

net: add generic PF_BRIDGE:RTM_ FDB hooks · 77162022

由 John Fastabend 提交于 4月 15, 2012

This adds two new flags NTF_MASTER and NTF_SELF that can
now be used to specify where PF_BRIDGE netlink commands should
be sent. NTF_MASTER sends the commands to the 'dev->master'
device for parsing. Typically this will be the linux net/bridge,
or open-vswitch devices. Also without any flags set the command
will be handled by the master device as well so that current user
space tools continue to work as expected.

The NTF_SELF flag will push the PF_BRIDGE commands to the
device. In the basic example below the commands are then parsed
and programmed in the embedded bridge.

Note if both NTF_SELF and NTF_MASTER bits are set then the
command will be sent to both 'dev->master' and 'dev' this allows
user space to easily keep the embedded bridge and software bridge
in sync.

There is a slight complication in the case with both flags set
when an error occurs. To resolve this the rtnl handler clears
the NTF_ flag in the netlink ack to indicate which sets completed
successfully. The add/del handlers will abort as soon as any
error occurs.

To support this new net device ops were added to call into
the device and the existing bridging code was refactored
to use these. There should be no required changes in user space
to support the current bridge behavior.

A basic setup with a SR-IOV enabled NIC looks like this,

          veth0  veth2
            |      |
          ------------
          |  bridge0 |   <---- software bridging
          ------------
               /
               /
  ethx.y      ethx
    VF         PF
     \         \          <---- propagate FDB entries to HW
     \         \
  --------------------
  |  Embedded Bridge |    <---- hardware offloaded switching
  --------------------

In this case the embedded bridge must be managed to allow 'veth0'
to communicate with 'ethx.y' correctly. At present drivers managing
the embedded bridge either send frames onto the network which
then get dropped by the switch OR the embedded bridge will flood
these frames. With this patch we have a mechanism to manage the
embedded bridge correctly from user space. This example is specific
to SR-IOV but replacing the VF with another PF or dropping this
into the DSA framework generates similar management issues.

Examples session using the 'br'[1] tool to add, dump and then
delete a mac address with a new "embedded" option and enabled
ixgbe driver:

# br fdb add 22:35:19:ac:60:59 dev eth3
# br fdb
port    mac addr                flags
veth0   22:35:19:ac:60:58       static
veth0   9a:5f:81:f7:f6:ec       local
eth3    00:1b:21:55:23:59       local
eth3    22:35:19:ac:60:59       static
veth0   22:35:19:ac:60:57       static
#br fdb add 22:35:19:ac:60:59 embedded dev eth3
#br fdb
port    mac addr                flags
veth0   22:35:19:ac:60:58       static
veth0   9a:5f:81:f7:f6:ec       local
eth3    00:1b:21:55:23:59       local
eth3    22:35:19:ac:60:59       static
veth0   22:35:19:ac:60:57       static
eth3    22:35:19:ac:60:59       local embedded
#br fdb del 22:35:19:ac:60:59 embedded dev eth3

I added a couple lines to 'br' to set the flags correctly is all. It
is my opinion that the merit of this patch is now embedded and SW
bridges can both be modeled correctly in user space using very nearly
the same message passing.

[1] 'br' tool was published as an RFC here and will be renamed 'bridge'
    http://patchwork.ozlabs.org/patch/117664/

Thanks to Jamal Hadi Salim, Stephen Hemminger and Ben Hutchings for
valuable feedback, suggestions, and review.

v2: fixed api descriptions and error case with both NTF_SELF and
    NTF_MASTER set plus updated patch description.
Signed-off-by: NJohn Fastabend <john.r.fastabend@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

77162022

22 2月, 2012 1 次提交

rtnetlink: Fix problem with buffer allocation · 115c9b81

由 Greg Rose 提交于 2月 21, 2012

Implement a new netlink attribute type IFLA_EXT_MASK.  The mask
is a 32 bit value that can be used to indicate to the kernel that
certain extended ifinfo values are requested by the user application.
At this time the only mask value defined is RTEXT_FILTER_VF to
indicate that the user wants the ifinfo dump to send information
about the VFs belonging to the interface.

This patch fixes a bug in which certain applications do not have
large enough buffers to accommodate the extra information returned
by the kernel with large numbers of SR-IOV virtual functions.
Those applications will not send the new netlink attribute with
the interface info dump request netlink messages so they will
not get unexpectedly large request buffers returned by the kernel.

Modifies the rtnl_calcit function to traverse the list of net
devices and compute the minimum buffer size that can hold the
info dumps of all matching devices based upon the filter passed
in via the new netlink attribute filter mask.  If no filter
mask is sent then the buffer allocation defaults to NLMSG_GOODSIZE.

With this change it is possible to add yet to be defined netlink
attributes to the dump request which should make it fairly extensible
in the future.
Signed-off-by: NGreg Rose <gregory.v.rose@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

115c9b81

09 7月, 2011 1 次提交

rcu: treewide: Do not use rcu_read_lock_held when calling rcu_dereference_check · d8bf4ca9

由 Michal Hocko 提交于 7月 08, 2011

Since ca5ecddf (rcu: define __rcu address space modifier for sparse)
rcu_dereference_check use rcu_read_lock_held as a part of condition
automatically so callers do not have to do that as well.
Signed-off-by: NMichal Hocko <mhocko@suse.cz>
Acked-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: NJiri Kosina <jkosina@suse.cz>

d8bf4ca9

22 6月, 2011 1 次提交

net: dcbnl, add multicast group for DCB · 314b4778

由 John Fastabend 提交于 6月 21, 2011

Now that dcbnl is being used in many cases by more
than a single agent it is beneficial to be notified
when some entity either driver or user space has
changed the DCB attributes.

Today applications either end up polling the interface
or relying on a user space database to maintain the DCB
state and post events. Polling is a poor solution for
obvious reasons. And relying on a user space database
has its own downside. Namely it has created strange
boot dependencies requiring the database be populated
before any applications dependent on DCB attributes
starts or the application goes into a polling loop.
Populating the database requires negotiating link
setting with the peer and can take anywhere from less
than a second up to a few seconds depending on the switch
implementation.

Perhaps more importantly if another application or an
embedded agent sets a DCB link attribute the database
has no way of knowing other than polling the kernel.
This prevents applications from responding quickly to
changes in link events which at least in the FCoE case
and probably any other protocols expecting a lossless
link may result in IO errors.

By adding a multicast group for DCB we have clean way
to disseminate kernel DCB link attributes up to user
space. Avoiding the need for user space to maintain
a coherant database and disperse events that potentially
do not reflect the current link state.
Signed-off-by: NJohn Fastabend <john.r.fastabend@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

314b4778

16 11月, 2010 1 次提交

net: rtnetlink.h -- only include linux/netdevice.h when used by the kernel · 3b42a96d

由 Andy Whitcroft 提交于 11月 15, 2010

The commit below added a new helper dev_ingress_queue to cleanly obtain the
ingress queue pointer.  This necessitated including 'linux/netdevice.h':

  commit 24824a09
  Author: Eric Dumazet <eric.dumazet@gmail.com>
  Date:   Sat Oct 2 06:11:55 2010 +0000

    net: dynamic ingress_queue allocation

However this include triggers issues for applications in userspace
which use the rtnetlink interfaces.  Commonly this requires they include
'net/if.h' and 'linux/rtnetlink.h' leading to a compiler error as below:

  In file included from /usr/include/linux/netdevice.h:28:0,
                   from /usr/include/linux/rtnetlink.h:9,
                   from t.c:2:
  /usr/include/linux/if.h:135:8: error: redefinition of ‘struct ifmap’
  /usr/include/net/if.h:112:8: note: originally defined here
  /usr/include/linux/if.h:169:8: error: redefinition of ‘struct ifreq’
  /usr/include/net/if.h:127:8: note: originally defined here
  /usr/include/linux/if.h:218:8: error: redefinition of ‘struct ifconf’
  /usr/include/net/if.h:177:8: note: originally defined here

The new helper is only defined for the kernel and protected by __KERNEL__
therefore we can simply pull the include down into the same protected
section.
Signed-off-by: NAndy Whitcroft <apw@canonical.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3b42a96d

05 10月, 2010 2 次提交

net: relax rtnl_dereference() · 29fa060e

由 David S. Miller 提交于 10月 05, 2010

rtnl_dereference() is used in contexts where RTNL is held, to fetch an
RCU protected pointer.

Updates to this pointer are prevented by RTNL, so we dont need
smp_read_barrier_depends() and the ACCESS_ONCE() provided in
rcu_dereference_check().

rtnl_dereference() is mainly a macro to document the locking invariant.
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

29fa060e

net: dynamic ingress_queue allocation · 24824a09

由 Eric Dumazet 提交于 10月 02, 2010

ingress being not used very much, and net_device->ingress_queue being
quite a big object (128 or 256 bytes), use a dynamic allocation if
needed (tc qdisc add dev eth0 ingress ...)

dev_ingress_queue(dev) helper should be used only with RTNL taken.
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

24824a09

16 9月, 2010 1 次提交

net: add rtnl_dereference() · 7dff59ef

由 Eric Dumazet 提交于 9月 15, 2010

We sometime want to dereference an rcu protected pointer while
holding RTNL. Use a macro to hide all lockdep details.
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7dff59ef

09 9月, 2010 1 次提交

net: introduce rcu_dereference_rtnl · a6e0fc85

由 Eric Dumazet 提交于 9月 08, 2010

We use rcu_dereference_check(p, rcu_read_lock_held() ||
lockdep_rtnl_is_held()) several times in network stack.

More usages to come too, so its time to create a helper.
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a6e0fc85

23 7月, 2010 1 次提交

net: RTA_MARK addition · 963bfeee

由 Eric Dumazet 提交于 7月 20, 2010

Add a new rt attribute, RTA_MARK, and use it in
rt_fill_info()/inet_rtm_getroute() to support following commands :

ip route get 192.168.20.110 mark NUMBER
ip route get 192.168.20.108 from 192.168.20.110 iif eth1 mark NUMBER
ip route list cache [192.168.20.110] mark NUMBER
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

963bfeee

11 5月, 2010 1 次提交

ipv6: ip6mr: support multiple tables · d1db275d

由 Patrick McHardy 提交于 5月 11, 2010

This patch adds support for multiple independant multicast routing instances,
named "tables".

Userspace multicast routing daemons can bind to a specific table instance by
issuing a setsockopt call using a new option MRT6_TABLE. The table number is
stored in the raw socket data and affects all following ip6mr setsockopt(),
getsockopt() and ioctl() calls. By default, a single table (RT6_TABLE_DFLT)
is created with a default routing rule pointing to it. Newly created pim6reg
devices have the table number appended ("pim6regX"), with the exception of
devices created in the default table, which are named just "pim6reg" for
compatibility reasons.

Packets are directed to a specific table instance using routing rules,
similar to how regular routing rules work. Currently iif, oif and mark
are supported as keys, source and destination addresses could be supported
additionally.

Example usage:

- bind pimd/xorp/... to a specific table:

uint32_t table = 123;
setsockopt(fd, SOL_IPV6, MRT6_TABLE, &table, sizeof(table));

- create routing rules directing packets to the new table:

# ip -6 mrule add iif eth0 lookup 123
# ip -6 mrule add oif eth0 lookup 123
Signed-off-by: NPatrick McHardy <kaber@trash.net>

d1db275d

26 4月, 2010 1 次提交

net: rtnetlink: decouple rtnetlink address families from real address families · 25239cee

由 Patrick McHardy 提交于 4月 26, 2010

Decouple rtnetlink address families from real address families in socket.h to
be able to add rtnetlink interfaces to code that is not a real address family
without increasing AF_MAX/NPROTO.

This will be used to add support for multicast route dumping from all tables
as the proc interface can't be extended to support anything but the main table
without breaking compatibility.

This partialy undoes the patch to introduce independant families for routing
rules and converts ipmr routing rules to a new rtnetlink family. Similar to
that patch, values up to 127 are reserved for real address families, values
above that may be used arbitrarily.
Signed-off-by: NPatrick McHardy <kaber@trash.net>

25239cee

25 2月, 2010 1 次提交

net: Add checking to rcu_dereference() primitives · a898def2

由 Paul E. McKenney 提交于 2月 22, 2010

Update rcu_dereference() primitives to use new lockdep-based
checking. The rcu_dereference() in __in6_dev_get() may be
protected either by rcu_read_lock() or RTNL, per Eric Dumazet.
The rcu_dereference() in __sk_free() is protected by the fact
that it is never reached if an update could change it.  Check
for this by using rcu_dereference_check() to verify that the
struct sock's ->sk_wmem_alloc counter is zero.
Acked-by: NEric Dumazet <eric.dumazet@gmail.com>
Acked-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: josh@joshtriplett.org
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
LKML-Reference: <1266887105-1528-5-git-send-email-paulmck@linux.vnet.ibm.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

a898def2

24 12月, 2009 1 次提交

net: Add rtnetlink init_rcvwnd to set the TCP initial receive window · 31d12926

由 laurent chavey 提交于 12月 15, 2009

Add rtnetlink init_rcvwnd to set the TCP initial receive window size
advertised by passive and active TCP connections.
The current Linux TCP implementation limits the advertised TCP initial
receive window to the one prescribed by slow start. For short lived
TCP connections used for transaction type of traffic (i.e. http
requests), bounding the advertised TCP initial receive window results
in increased latency to complete the transaction.
Support for setting initial congestion window is already supported
using rtnetlink init_cwnd, but the feature is useless without the
ability to set a larger TCP initial receive window.
The rtnetlink init_rcvwnd allows increasing the TCP initial receive
window, allowing TCP connection to advertise larger TCP receive window
than the ones bounded by slow start.
Signed-off-by: NLaurent Chavey <chavey@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

31d12926

16 12月, 2009 1 次提交

tcp: Revert per-route SACK/DSACK/TIMESTAMP changes. · bb5b7c11

由 David S. Miller 提交于 12月 15, 2009

It creates a regression, triggering badness for SYN_RECV
sockets, for example:

[19148.022102] Badness at net/ipv4/inet_connection_sock.c:293
[19148.022570] NIP: c02a0914 LR: c02a0904 CTR: 00000000
[19148.023035] REGS: eeecbd30 TRAP: 0700   Not tainted  (2.6.32)
[19148.023496] MSR: 00029032 <EE,ME,CE,IR,DR>  CR: 24002442  XER: 00000000
[19148.024012] TASK = eee9a820[1756] 'privoxy' THREAD: eeeca000

This is likely caused by the change in the 'estab' parameter
passed to tcp_parse_options() when invoked by the functions
in net/ipv4/tcp_minisocks.c

But even if that is fixed, the ->conn_request() changes made in
this patch series is fundamentally wrong.  They try to use the
listening socket's 'dst' to probe the route settings.  The
listening socket doesn't even have a route, and you can't
get the right route (the child request one) until much later
after we setup all of the state, and it must be done by hand.

This stuff really isn't ready, so the best thing to do is a
full revert.  This reverts the following commits:

f55017a9
022c3f7d
1aba721e
cda42ebd
345cda2f
dc343475
05eaade2
6a2a2d6bSigned-off-by: NDavid S. Miller <davem@davemloft.net>

bb5b7c11

05 11月, 2009 1 次提交

net: cleanup include/linux · d94d9fee

由 Eric Dumazet 提交于 11月 04, 2009

This cleanup patch puts struct/union/enum opening braces,
in first line to ease grep games.

struct something
{

becomes :

struct something {
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d94d9fee

29 10月, 2009 4 次提交

Allow disabling of DSACK TCP option per route · dc343475

由 Gilad Ben-Yossef 提交于 10月 28, 2009

Add and use no DSCAK bit in the features field.
Signed-off-by: NGilad Ben-Yossef <gilad@codefidence.com>
Sigend-off-by: NOri Finkelman <ori@comsleep.com>
Sigend-off-by: NYony Amit <yony@comsleep.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

dc343475

Allow to turn off TCP window scale opt per route · 345cda2f

由 Gilad Ben-Yossef 提交于 10月 28, 2009

Add and use no window scale bit in the features field.

Note that this is not the same as setting a window scale of 0
as would happen with window limit on route.
Signed-off-by: NGilad Ben-Yossef <gilad@codefidence.com>
Sigend-off-by: NOri Finkelman <ori@comsleep.com>
Sigend-off-by: NYony Amit <yony@comsleep.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

345cda2f

Allow disabling TCP timestamp options per route · cda42ebd

由 Gilad Ben-Yossef 提交于 10月 28, 2009

Implement querying and acting upon the no timestamp bit in the feature
field.
Signed-off-by: NGilad Ben-Yossef <gilad@codefidence.com>
Sigend-off-by: NOri Finkelman <ori@comsleep.com>
Sigend-off-by: NYony Amit <yony@comsleep.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

cda42ebd

Add the no SACK route option feature · 1aba721e

由 Gilad Ben-Yossef 提交于 10月 28, 2009

Implement querying and acting upon the no sack bit in the features
field.
Signed-off-by: NGilad Ben-Yossef <gilad@codefidence.com>
Sigend-off-by: NOri Finkelman <ori@comsleep.com>
Sigend-off-by: NYony Amit <yony@comsleep.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1aba721e

09 9月, 2009 1 次提交

IPv6/addrconf: Fix minor addrlabel thinko · 5d5d9c97

由 Tushar Gohad 提交于 9月 09, 2009

Fix apparent thinko related to RTM_DELADDRLABEL, introduced by commit
2a8cc6c8 ("[IPV6] ADDRCONF: Support
RFC3484 configurable address selection policy table.").
Signed-off-by: NTushar Gohad <tgohad@mvista.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5d5d9c97

20 3月, 2009 1 次提交

rtnetlink: add new value for DHCP added routes · 2e1ab634

由 Stephen Hemminger 提交于 3月 19, 2009

To improve manageability, it would be good to be able to disambiguate routes
added by administrator from those added by DHCP client.  The only necessary
kernel change is to add value to rtnetlink include file so iproute2 utility
can use it.
Signed-off-by: NStephen Hemminger <shemminger@vyatta.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2e1ab634

25 2月, 2009 1 次提交

netlink: change nlmsg_notify() return value logic · 1ce85fe4

由 Pablo Neira Ayuso 提交于 2月 24, 2009

This patch changes the return value of nlmsg_notify() as follows:

If NETLINK_BROADCAST_ERROR is set by any of the listeners and
an error in the delivery happened, return the broadcast error;
else if there are no listeners apart from the socket that
requested a change with the echo flag, return the result of the
unicast notification. Thus, with this patch, the unicast
notification is handled in the same way of a broadcast listener
that has set the NETLINK_BROADCAST_ERROR socket flag.

This patch is useful in case that the caller of nlmsg_notify()
wants to know the result of the delivery of a netlink notification
(including the broadcast delivery) and take any action in case
that the delivery failed. For example, ctnetlink can drop packets
if the event delivery failed to provide reliable logging and
state-synchronization at the cost of dropping packets.

This patch also modifies the rtnetlink code to ignore the return
value of rtnl_notify() in all callers. The function rtnl_notify()
(before this patch) returned the error of the unicast notification
which makes rtnl_set_sk_err() reports errors to all listeners. This
is not of any help since the origin of the change (the socket that
requested the echoing) notices the ENOBUFS error if the notification
fails and should resync itself.
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
Acked-by: NPatrick McHardy <kaber@trash.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1ce85fe4

31 1月, 2009 1 次提交

headers_check fix: linux/rtnetlink.h · 541c94f1

由 Jaswinder Singh Rajput 提交于 1月 30, 2009

fix the following 'make headers_check' warning:

usr/include/linux/rtnetlink.h:328: found __[us]{8,16,32,64} type without #include <linux/types.h>
Signed-off-by: NJaswinder Singh Rajput <jaswinderrajput@gmail.com>

541c94f1

21 11月, 2008 1 次提交

ixgbe: this patch adds support for DCB to the kernel and ixgbe driver · 2f90b865

由 Alexander Duyck 提交于 11月 20, 2008

This adds support for Data Center Bridging (DCB) features in the ixgbe
driver and adds an rtnetlink interface for configuring DCB to the
kernel. The DCB feature support included are Priority Grouping (PG) -
which allows bandwidth guarantees to be allocated to groups to traffic
based on the 802.1q priority, and Priority Based Flow Control (PFC) -
which introduces a new MAC control PAUSE frame which works at
granularity of the 802.1p priority instead of the link (IEEE 802.3x).
Signed-off-by: NAlexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: NPeter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2f90b865

23 9月, 2008 1 次提交

Phonet: global definitions · bce7b154

由 Remi Denis-Courmont 提交于 9月 22, 2008

Signed-off-by: NRemi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

bce7b154

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功