1. 06 1月, 2015 2 次提交
    • D
      net: tcp: add RTAX_CC_ALGO fib handling · ea697639
      Daniel Borkmann 提交于
      This patch adds the minimum necessary for the RTAX_CC_ALGO congestion
      control metric to be set up and dumped back to user space.
      
      While the internal representation of RTAX_CC_ALGO is handled as a u32
      key, we avoided to expose this implementation detail to user space, thus
      instead, we chose the netlink attribute that is being exchanged between
      user space to be the actual congestion control algorithm name, similarly
      as in the setsockopt(2) API in order to allow for maximum flexibility,
      even for 3rd party modules.
      
      It is a bit unfortunate that RTAX_QUICKACK used up a whole RTAX slot as
      it should have been stored in RTAX_FEATURES instead, we first thought
      about reusing it for the congestion control key, but it brings more
      complications and/or confusion than worth it.
      
      Joint work with Florian Westphal.
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ea697639
    • H
      net: Do not call ndo_dflt_fdb_dump if ndo_fdb_dump is defined · 6cb69742
      Hubert Sokolowski 提交于
      Add checking whether the call to ndo_dflt_fdb_dump is needed.
      It is not expected to call ndo_dflt_fdb_dump unconditionally
      by some drivers (i.e. qlcnic or macvlan) that defines
      own ndo_fdb_dump. Other drivers define own ndo_fdb_dump
      and don't want ndo_dflt_fdb_dump to be called at all.
      At the same time it is desirable to call the default dump
      function on a bridge device.
      Fix attributes that are passed to dev->netdev_ops->ndo_fdb_dump.
      Add extra checking in br_fdb_dump to avoid duplicate entries
      as now filter_dev can be NULL.
      
      Following tests for filtering have been performed before
      the change and after the patch was applied to make sure
      they are the same and it doesn't break the filtering algorithm.
      
      [root@localhost ~]# cd /root/iproute2-3.18.0/bridge
      [root@localhost bridge]# modprobe dummy
      [root@localhost bridge]# ./bridge fdb add f1:f2:f3:f4:f5:f6 dev dummy0
      [root@localhost bridge]# brctl addbr br0
      [root@localhost bridge]# brctl addif  br0 dummy0
      [root@localhost bridge]# ip link set dev br0 address 02:00:00:12:01:04
      [root@localhost bridge]# # show all
      [root@localhost bridge]# ./bridge fdb show
      33:33:00:00:00:01 dev p2p1 self permanent
      01:00:5e:00:00:01 dev p2p1 self permanent
      33:33:ff:ac:ce:32 dev p2p1 self permanent
      33:33:00:00:02:02 dev p2p1 self permanent
      01:00:5e:00:00:fb dev p2p1 self permanent
      33:33:00:00:00:01 dev p7p1 self permanent
      01:00:5e:00:00:01 dev p7p1 self permanent
      33:33:ff:79:50:53 dev p7p1 self permanent
      33:33:00:00:02:02 dev p7p1 self permanent
      01:00:5e:00:00:fb dev p7p1 self permanent
      f2:46:50:85:6d:d9 dev dummy0 master br0 permanent
      f2:46:50:85:6d:d9 dev dummy0 vlan 1 master br0 permanent
      33:33:00:00:00:01 dev dummy0 self permanent
      f1:f2:f3:f4:f5:f6 dev dummy0 self permanent
      33:33:00:00:00:01 dev br0 self permanent
      02:00:00:12:01:04 dev br0 vlan 1 master br0 permanent
      02:00:00:12:01:04 dev br0 master br0 permanent
      [root@localhost bridge]# # filter by bridge
      [root@localhost bridge]# ./bridge fdb show br br0
      f2:46:50:85:6d:d9 dev dummy0 master br0 permanent
      f2:46:50:85:6d:d9 dev dummy0 vlan 1 master br0 permanent
      33:33:00:00:00:01 dev dummy0 self permanent
      f1:f2:f3:f4:f5:f6 dev dummy0 self permanent
      33:33:00:00:00:01 dev br0 self permanent
      02:00:00:12:01:04 dev br0 vlan 1 master br0 permanent
      02:00:00:12:01:04 dev br0 master br0 permanent
      [root@localhost bridge]# # filter by port
      [root@localhost bridge]# ./bridge fdb show brport dummy0
      f2:46:50:85:6d:d9 master br0 permanent
      f2:46:50:85:6d:d9 vlan 1 master br0 permanent
      33:33:00:00:00:01 self permanent
      f1:f2:f3:f4:f5:f6 self permanent
      [root@localhost bridge]# # filter by port + bridge
      [root@localhost bridge]# ./bridge fdb show br br0 brport dummy0
      f2:46:50:85:6d:d9 master br0 permanent
      f2:46:50:85:6d:d9 vlan 1 master br0 permanent
      33:33:00:00:00:01 self permanent
      f1:f2:f3:f4:f5:f6 self permanent
      [root@localhost bridge]#
      Signed-off-by: NHubert Sokolowski <hubert.sokolowski@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6cb69742
  2. 17 12月, 2014 1 次提交
  3. 10 12月, 2014 2 次提交
    • R
      rocker: remove swdev mode · 1d460b98
      Roopa Prabhu 提交于
      Remove use of 'swdev' mode in rocker. rocker dev offloads
      can use the BRIDGE_FLAGS_SELF to indicate offload to hardware.
      Signed-off-by: NRoopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: NScott Feldman <sfeldma@gmail.com>
      Signed-off-by: NJiri Pirko <jiri@resnulli.us>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1d460b98
    • M
      rtnetlink: delay RTM_DELLINK notification until after ndo_uninit() · 395eea6c
      Mahesh Bandewar 提交于
      The commit 56bfa7ee ("unregister_netdevice : move RTM_DELLINK to
      until after ndo_uninit") tried to do this ealier but while doing so
      it created a problem. Unfortunately the delayed rtmsg_ifinfo() also
      delayed call to fill_info(). So this translated into asking driver
      to remove private state and then query it's private state. This
      could have catastropic consequences.
      
      This change breaks the rtmsg_ifinfo() into two parts - one takes the
      precise snapshot of the device by called fill_info() before calling
      the ndo_uninit() and the second part sends the notification using
      collected snapshot.
      
      It was brought to notice when last link is deleted from an ipvlan device
      when it has free-ed the port and the subsequent .fill_info() call is
      trying to get the info from the port.
      
      kernel: [  255.139429] ------------[ cut here ]------------
      kernel: [  255.139439] WARNING: CPU: 12 PID: 11173 at net/core/rtnetlink.c:2238 rtmsg_ifinfo+0x100/0x110()
      kernel: [  255.139493] Modules linked in: ipvlan bonding w1_therm ds2482 wire cdc_acm ehci_pci ehci_hcd i2c_dev i2c_i801 i2c_core msr cpuid bnx2x ptp pps_core mdio libcrc32c
      kernel: [  255.139513] CPU: 12 PID: 11173 Comm: ip Not tainted 3.18.0-smp-DEV #167
      kernel: [  255.139514] Hardware name: Intel RML,PCH/Ibis_QC_18, BIOS 1.0.10 05/15/2012
      kernel: [  255.139515]  0000000000000009 ffff880851b6b828 ffffffff815d87f4 00000000000000e0
      kernel: [  255.139516]  0000000000000000 ffff880851b6b868 ffffffff8109c29c 0000000000000000
      kernel: [  255.139518]  00000000ffffffa6 00000000000000d0 ffffffff81aaf580 0000000000000011
      kernel: [  255.139520] Call Trace:
      kernel: [  255.139527]  [<ffffffff815d87f4>] dump_stack+0x46/0x58
      kernel: [  255.139531]  [<ffffffff8109c29c>] warn_slowpath_common+0x8c/0xc0
      kernel: [  255.139540]  [<ffffffff8109c2ea>] warn_slowpath_null+0x1a/0x20
      kernel: [  255.139544]  [<ffffffff8150d570>] rtmsg_ifinfo+0x100/0x110
      kernel: [  255.139547]  [<ffffffff814f78b5>] rollback_registered_many+0x1d5/0x2d0
      kernel: [  255.139549]  [<ffffffff814f79cf>] unregister_netdevice_many+0x1f/0xb0
      kernel: [  255.139551]  [<ffffffff8150acab>] rtnl_dellink+0xbb/0x110
      kernel: [  255.139553]  [<ffffffff8150da90>] rtnetlink_rcv_msg+0xa0/0x240
      kernel: [  255.139557]  [<ffffffff81329283>] ? rhashtable_lookup_compare+0x43/0x80
      kernel: [  255.139558]  [<ffffffff8150d9f0>] ? __rtnl_unlock+0x20/0x20
      kernel: [  255.139562]  [<ffffffff8152cb11>] netlink_rcv_skb+0xb1/0xc0
      kernel: [  255.139563]  [<ffffffff8150a495>] rtnetlink_rcv+0x25/0x40
      kernel: [  255.139565]  [<ffffffff8152c398>] netlink_unicast+0x178/0x230
      kernel: [  255.139567]  [<ffffffff8152c75f>] netlink_sendmsg+0x30f/0x420
      kernel: [  255.139571]  [<ffffffff814e0b0c>] sock_sendmsg+0x9c/0xd0
      kernel: [  255.139575]  [<ffffffff811d1d7f>] ? rw_copy_check_uvector+0x6f/0x130
      kernel: [  255.139577]  [<ffffffff814e11c9>] ? copy_msghdr_from_user+0x139/0x1b0
      kernel: [  255.139578]  [<ffffffff814e1774>] ___sys_sendmsg+0x304/0x310
      kernel: [  255.139581]  [<ffffffff81198723>] ? handle_mm_fault+0xca3/0xde0
      kernel: [  255.139585]  [<ffffffff811ebc4c>] ? destroy_inode+0x3c/0x70
      kernel: [  255.139589]  [<ffffffff8108e6ec>] ? __do_page_fault+0x20c/0x500
      kernel: [  255.139597]  [<ffffffff811e8336>] ? dput+0xb6/0x190
      kernel: [  255.139606]  [<ffffffff811f05f6>] ? mntput+0x26/0x40
      kernel: [  255.139611]  [<ffffffff811d2b94>] ? __fput+0x174/0x1e0
      kernel: [  255.139613]  [<ffffffff814e2129>] __sys_sendmsg+0x49/0x90
      kernel: [  255.139615]  [<ffffffff814e2182>] SyS_sendmsg+0x12/0x20
      kernel: [  255.139617]  [<ffffffff815df092>] system_call_fastpath+0x12/0x17
      kernel: [  255.139619] ---[ end trace 5e6703e87d984f6b ]---
      Signed-off-by: NMahesh Bandewar <maheshb@google.com>
      Reported-by: NToshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Roopa Prabhu <roopa@cumulusnetworks.com>
      Cc: David S. Miller <davem@davemloft.net>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Acked-by: NThomas Graf <tgraf@suug.ch>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      395eea6c
  4. 03 12月, 2014 4 次提交
  5. 30 11月, 2014 1 次提交
  6. 27 11月, 2014 2 次提交
  7. 04 11月, 2014 1 次提交
  8. 03 9月, 2014 4 次提交
  9. 09 8月, 2014 1 次提交
  10. 17 7月, 2014 1 次提交
  11. 16 7月, 2014 2 次提交
  12. 11 7月, 2014 2 次提交
    • J
      bridge: netlink dump interface at par with brctl · 5e6d2435
      Jamal Hadi Salim 提交于
      Actually better than brctl showmacs because we can filter by bridge
      port in the kernel.
      The current bridge netlink interface doesnt scale when you have many
      bridges each with large fdbs or even bridges with many bridge ports
      
      And now for the science non-fiction novel you have all been
      waiting for..
      
      //lets see what bridge ports we have
      root@moja-1:/configs/may30-iprt/bridge# ./bridge link show
      8: eth1 state DOWN : <BROADCAST,MULTICAST> mtu 1500 master br0 state
      disabled priority 32 cost 19
      17: sw1-p1 state DOWN : <BROADCAST,NOARP> mtu 1500 master br0 state
      disabled priority 32 cost 100
      
      // show all..
      root@moja-1:/configs/may30-iprt/bridge# ./bridge fdb show
      33:33:00:00:00:01 dev bond0 self permanent
      33:33:00:00:00:01 dev dummy0 self permanent
      33:33:00:00:00:01 dev ifb0 self permanent
      33:33:00:00:00:01 dev ifb1 self permanent
      33:33:00:00:00:01 dev eth0 self permanent
      01:00:5e:00:00:01 dev eth0 self permanent
      33:33:ff:22:01:01 dev eth0 self permanent
      02:00:00:12:01:02 dev eth1 vlan 0 master br0 permanent
      00:17:42:8a:b4:05 dev eth1 vlan 0 master br0 permanent
      00:17:42:8a:b4:07 dev eth1 self permanent
      33:33:00:00:00:01 dev eth1 self permanent
      33:33:00:00:00:01 dev gretap0 self permanent
      da:ac:46:27:d9:53 dev sw1-p1 vlan 0 master br0 permanent
      33:33:00:00:00:01 dev sw1-p1 self permanent
      
      //filter by bridge
      root@moja-1:/configs/may30-iprt/bridge# ./bridge fdb show br br0
      02:00:00:12:01:02 dev eth1 vlan 0 master br0 permanent
      00:17:42:8a:b4:05 dev eth1 vlan 0 master br0 permanent
      00:17:42:8a:b4:07 dev eth1 self permanent
      33:33:00:00:00:01 dev eth1 self permanent
      da:ac:46:27:d9:53 dev sw1-p1 vlan 0 master br0 permanent
      33:33:00:00:00:01 dev sw1-p1 self permanent
      
      // bridge sw1 has no ports attached..
      root@moja-1:/configs/may30-iprt/bridge# ./bridge fdb show br sw1
      
      //filter by port
      root@moja-1:/configs/may30-iprt/bridge# ./bridge fdb show brport eth1
      02:00:00:12:01:02 vlan 0 master br0 permanent
      00:17:42:8a:b4:05 vlan 0 master br0 permanent
      00:17:42:8a:b4:07 self permanent
      33:33:00:00:00:01 self permanent
      
      // filter by port + bridge
      root@moja-1:/configs/may30-iprt/bridge# ./bridge fdb show br br0 brport
      sw1-p1
      da:ac:46:27:d9:53 vlan 0 master br0 permanent
      33:33:00:00:00:01 self permanent
      
      // for shits and giggles (as they say in New Brunswick), lets
      // change the mac that br0 uses
      // Note: a magical fdb entry with no brport is added ...
      root@moja-1:/configs/may30-iprt/bridge# ip link set dev br0 address
      02:00:00:12:01:04
      
      // lets see if we can see the unicorn ..
      root@moja-1:/configs/may30-iprt/bridge# ./bridge fdb show
      33:33:00:00:00:01 dev bond0 self permanent
      33:33:00:00:00:01 dev dummy0 self permanent
      33:33:00:00:00:01 dev ifb0 self permanent
      33:33:00:00:00:01 dev ifb1 self permanent
      33:33:00:00:00:01 dev eth0 self permanent
      01:00:5e:00:00:01 dev eth0 self permanent
      33:33:ff:22:01:01 dev eth0 self permanent
      02:00:00:12:01:02 dev eth1 vlan 0 master br0 permanent
      00:17:42:8a:b4:05 dev eth1 vlan 0 master br0 permanent
      00:17:42:8a:b4:07 dev eth1 self permanent
      33:33:00:00:00:01 dev eth1 self permanent
      33:33:00:00:00:01 dev gretap0 self permanent
      02:00:00:12:01:04 dev br0 vlan 0 master br0 permanent <=== there it is
      da:ac:46:27:d9:53 dev sw1-p1 vlan 0 master br0 permanent
      33:33:00:00:00:01 dev sw1-p1 self permanent
      
      //can we see it if we filter by bridge?
      root@moja-1:/configs/may30-iprt/bridge# ./bridge fdb show br br0
      02:00:00:12:01:02 dev eth1 vlan 0 master br0 permanent
      00:17:42:8a:b4:05 dev eth1 vlan 0 master br0 permanent
      00:17:42:8a:b4:07 dev eth1 self permanent
      33:33:00:00:00:01 dev eth1 self permanent
      02:00:00:12:01:04 dev br0 vlan 0 master br0 permanent <=== there it is
      da:ac:46:27:d9:53 dev sw1-p1 vlan 0 master br0 permanent
      33:33:00:00:00:01 dev sw1-p1 self permanent
      Signed-off-by: NJamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5e6d2435
    • J
      bridge: fdb dumping takes a filter device · 5d5eacb3
      Jamal Hadi Salim 提交于
      Dumping a bridge fdb dumps every fdb entry
      held. With this change we are going to filter
      on selected bridge port.
      Signed-off-by: NJamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5d5eacb3
  13. 02 7月, 2014 1 次提交
  14. 13 6月, 2014 1 次提交
    • M
      rtnetlink: fix userspace API breakage for iproute2 < v3.9.0 · e5eca6d4
      Michal Schmidt 提交于
      When running RHEL6 userspace on a current upstream kernel, "ip link"
      fails to show VF information.
      
      The reason is a kernel<->userspace API change introduced by commit
      88c5b5ce ("rtnetlink: Call nlmsg_parse() with correct header length"),
      after which the kernel does not see iproute2's IFLA_EXT_MASK attribute
      in the netlink request.
      
      iproute2 adjusted for the API change in its commit 63338dca4513
      ("libnetlink: Use ifinfomsg instead of rtgenmsg in rtnl_wilddump_req_filter").
      
      The problem has been noticed before:
      http://marc.info/?l=linux-netdev&m=136692296022182&w=2
      (Subject: Re: getting VF link info seems to be broken in 3.9-rc8)
      
      We can do better than tell those with old userspace to upgrade. We can
      recognize the old iproute2 in the kernel by checking the netlink message
      length. Even when including the IFLA_EXT_MASK attribute, its netlink
      message is shorter than struct ifinfomsg.
      
      With this patch "ip link" shows VF information in both old and new
      iproute2 versions.
      Signed-off-by: NMichal Schmidt <mschmidt@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e5eca6d4
  15. 12 6月, 2014 1 次提交
  16. 09 6月, 2014 1 次提交
  17. 04 6月, 2014 1 次提交
  18. 24 5月, 2014 1 次提交
    • S
      net-next:v4: Add support to configure SR-IOV VF minimum and maximum Tx rate through ip tool. · ed616689
      Sucheta Chakraborty 提交于
      o min_tx_rate puts lower limit on the VF bandwidth. VF is guaranteed
        to have a bandwidth of at least this value.
        max_tx_rate puts cap on the VF bandwidth. VF can have a bandwidth
        of up to this value.
      
      o A new handler set_vf_rate for attr IFLA_VF_RATE has been introduced
        which takes 4 arguments:
        netdev, VF number, min_tx_rate, max_tx_rate
      
      o ndo_set_vf_rate replaces ndo_set_vf_tx_rate handler.
      
      o Drivers that currently implement ndo_set_vf_tx_rate should now call
        ndo_set_vf_rate instead and reject attempt to set a minimum bandwidth
        greater than 0 for IFLA_VF_TX_RATE when IFLA_VF_RATE is not yet
        implemented by driver.
      
      o If user enters only one of either min_tx_rate or max_tx_rate, then,
        userland should read back the other value from driver and set both
        for IFLA_VF_RATE.
        Drivers that have not yet implemented IFLA_VF_RATE should always
        return min_tx_rate as 0 when read from ip tool.
      
      o If both IFLA_VF_TX_RATE and IFLA_VF_RATE options are specified, then
        IFLA_VF_RATE should override.
      
      o Idea is to have consistent display of rate values to user.
      
      o Usage example: -
      
        ./ip link set p4p1 vf 0 rate 900
      
        ./ip link show p4p1
        32: p4p1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode
        DEFAULT qlen 1000
          link/ether 00:0e:1e:08:b0:f0 brd ff:ff:ff:ff:ff:ff
          vf 0 MAC 3e:a0:ca:bd:ae:5a, tx rate 900 (Mbps), max_tx_rate 900Mbps
          vf 1 MAC f6:c6:7c:3f:3d:6c
          vf 2 MAC 56:32:43:98:d7:71
          vf 3 MAC d6:be:c3:b5:85:ff
          vf 4 MAC ee:a9:9a:1e:19:14
          vf 5 MAC 4a:d0:4c:07:52:18
          vf 6 MAC 3a:76:44:93:62:f9
          vf 7 MAC 82:e9:e7:e3:15:1a
      
        ./ip link set p4p1 vf 0 max_tx_rate 300 min_tx_rate 200
      
        ./ip link show p4p1
        32: p4p1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode
        DEFAULT qlen 1000
          link/ether 00:0e:1e:08:b0:f0 brd ff:ff:ff:ff:ff:ff
          vf 0 MAC 3e:a0:ca:bd:ae:5a, tx rate 300 (Mbps), max_tx_rate 300Mbps,
          min_tx_rate 200Mbps
          vf 1 MAC f6:c6:7c:3f:3d:6c
          vf 2 MAC 56:32:43:98:d7:71
          vf 3 MAC d6:be:c3:b5:85:ff
          vf 4 MAC ee:a9:9a:1e:19:14
          vf 5 MAC 4a:d0:4c:07:52:18
          vf 6 MAC 3a:76:44:93:62:f9
          vf 7 MAC 82:e9:e7:e3:15:1a
      
        ./ip link set p4p1 vf 0 max_tx_rate 600 rate 300
      
        ./ip link show p4p1
        32: p4p1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode
        DEFAULT qlen 1000
          link/ether 00:0e:1e:08:b0:f brd ff:ff:ff:ff:ff:ff
          vf 0 MAC 3e:a0:ca:bd:ae:5, tx rate 600 (Mbps), max_tx_rate 600Mbps,
          min_tx_rate 200Mbps
          vf 1 MAC f6:c6:7c:3f:3d:6c
          vf 2 MAC 56:32:43:98:d7:71
          vf 3 MAC d6:be:c3:b5:85:ff
          vf 4 MAC ee:a9:9a:1e:19:14
          vf 5 MAC 4a:d0:4c:07:52:18
          vf 6 MAC 3a:76:44:93:62:f9
          vf 7 MAC 82:e9:e7:e3:15:1a
      Signed-off-by: NSucheta Chakraborty <sucheta.chakraborty@qlogic.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ed616689
  19. 16 5月, 2014 1 次提交
  20. 25 4月, 2014 3 次提交
    • D
      rtnetlink: Only supply IFLA_VF_PORTS information when RTEXT_FILTER_VF is set · c53864fd
      David Gibson 提交于
      Since 115c9b81 (rtnetlink: Fix problem with
      buffer allocation), RTM_NEWLINK messages only contain the IFLA_VFINFO_LIST
      attribute if they were solicited by a GETLINK message containing an
      IFLA_EXT_MASK attribute with the RTEXT_FILTER_VF flag.
      
      That was done because some user programs broke when they received more data
      than expected - because IFLA_VFINFO_LIST contains information for each VF
      it can become large if there are many VFs.
      
      However, the IFLA_VF_PORTS attribute, supplied for devices which implement
      ndo_get_vf_port (currently the 'enic' driver only), has the same problem.
      It supplies per-VF information and can therefore become large, but it is
      not currently conditional on the IFLA_EXT_MASK value.
      
      Worse, it interacts badly with the existing EXT_MASK handling.  When
      IFLA_EXT_MASK is not supplied, the buffer for netlink replies is fixed at
      NLMSG_GOODSIZE.  If the information for IFLA_VF_PORTS exceeds this, then
      rtnl_fill_ifinfo() returns -EMSGSIZE on the first message in a packet.
      netlink_dump() will misinterpret this as having finished the listing and
      omit data for this interface and all subsequent ones.  That can cause
      getifaddrs(3) to enter an infinite loop.
      
      This patch addresses the problem by only supplying IFLA_VF_PORTS when
      IFLA_EXT_MASK is supplied with the RTEXT_FILTER_VF flag set.
      Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: NJiri Pirko <jiri@resnulli.us>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c53864fd
    • D
      rtnetlink: Warn when interface's information won't fit in our packet · 973462bb
      David Gibson 提交于
      Without IFLA_EXT_MASK specified, the information reported for a single
      interface in response to RTM_GETLINK is expected to fit within a netlink
      packet of NLMSG_GOODSIZE.
      
      If it doesn't, however, things will go badly wrong,  When listing all
      interfaces, netlink_dump() will incorrectly treat -EMSGSIZE on the first
      message in a packet as the end of the listing and omit information for
      that interface and all subsequent ones.  This can cause getifaddrs(3) to
      enter an infinite loop.
      
      This patch won't fix the problem, but it will WARN_ON() making it easier to
      track down what's going wrong.
      Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: NJiri Pirko <jpirko@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      973462bb
    • E
      net: Use netlink_ns_capable to verify the permisions of netlink messages · 90f62cf3
      Eric W. Biederman 提交于
      It is possible by passing a netlink socket to a more privileged
      executable and then to fool that executable into writing to the socket
      data that happens to be valid netlink message to do something that
      privileged executable did not intend to do.
      
      To keep this from happening replace bare capable and ns_capable calls
      with netlink_capable, netlink_net_calls and netlink_ns_capable calls.
      Which act the same as the previous calls except they verify that the
      opener of the socket had the desired permissions as well.
      Reported-by: NAndy Lutomirski <luto@amacapital.net>
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      90f62cf3
  21. 01 4月, 2014 1 次提交
  22. 21 3月, 2014 1 次提交
  23. 19 2月, 2014 1 次提交
  24. 14 2月, 2014 1 次提交
    • C
      net: correct error path in rtnl_newlink() · 0e0eee24
      Cong Wang 提交于
      I saw the following BUG when ->newlink() fails in rtnl_newlink():
      
      [   40.240058] kernel BUG at net/core/dev.c:6438!
      
      this is due to free_netdev() is not supposed to be called before
      netdev is completely unregistered, therefore it is not correct
      to call free_netdev() here, at least for ops->newlink!=NULL case,
      many drivers call it in ->destructor so that rtnl_unlock() will
      take care of it, we probably don't need to do anything here.
      
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NCong Wang <cwang@twopensource.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0e0eee24
  25. 05 2月, 2014 1 次提交
    • F
      rtnetlink: fix oops in rtnl_link_get_slave_info_data_size · 6049f253
      Fernando Luis Vazquez Cao 提交于
      We should check whether rtnetlink link operations
      are defined before calling get_slave_size().
      
      Without this, the following oops can occur when
      adding a tap device to OVS.
      
      [   87.839553] BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8
      [   87.839595] IP: [<ffffffff813d47c0>] if_nlmsg_size+0xf0/0x220
      [...]
      [   87.840651] Call Trace:
      [   87.840664]  [<ffffffff813d694b>] ? rtmsg_ifinfo+0x2b/0x100
      [   87.840688]  [<ffffffff813c8340>] ? __netdev_adjacent_dev_insert+0x150/0x1a0
      [   87.840718]  [<ffffffff813d6a50>] ? rtnetlink_event+0x30/0x40
      [   87.840742]  [<ffffffff814b4144>] ? notifier_call_chain+0x44/0x70
      [   87.840768]  [<ffffffff813c8946>] ? __netdev_upper_dev_link+0x3c6/0x3f0
      [   87.840798]  [<ffffffffa0678d6c>] ? netdev_create+0xcc/0x160 [openvswitch]
      [   87.840828]  [<ffffffffa06781ea>] ? ovs_vport_add+0x4a/0xd0 [openvswitch]
      [   87.840857]  [<ffffffffa0670139>] ? new_vport+0x9/0x50 [openvswitch]
      [   87.840884]  [<ffffffffa067279e>] ? ovs_vport_cmd_new+0x11e/0x210 [openvswitch]
      [   87.840915]  [<ffffffff813f3efa>] ? genl_family_rcv_msg+0x19a/0x360
      [   87.840941]  [<ffffffff813f40c0>] ? genl_family_rcv_msg+0x360/0x360
      [   87.840967]  [<ffffffff813f4139>] ? genl_rcv_msg+0x79/0xc0
      [   87.840991]  [<ffffffff813b6cf9>] ? __kmalloc_reserve.isra.25+0x29/0x80
      [   87.841018]  [<ffffffff813f2389>] ? netlink_rcv_skb+0xa9/0xc0
      [   87.841042]  [<ffffffff813f27cf>] ? genl_rcv+0x1f/0x30
      [   87.841064]  [<ffffffff813f1988>] ? netlink_unicast+0xe8/0x1e0
      [   87.841088]  [<ffffffff813f1d9a>] ? netlink_sendmsg+0x31a/0x750
      [   87.841113]  [<ffffffff813aee96>] ? sock_sendmsg+0x86/0xc0
      [   87.841136]  [<ffffffff813c960d>] ? __netdev_update_features+0x4d/0x200
      [   87.841163]  [<ffffffff813ca94e>] ? ethtool_get_value+0x2e/0x50
      [   87.841188]  [<ffffffff813af269>] ? ___sys_sendmsg+0x359/0x370
      [   87.841212]  [<ffffffff813da686>] ? dev_ioctl+0x1a6/0x5c0
      [   87.841236]  [<ffffffff8109c210>] ? autoremove_wake_function+0x30/0x30
      [   87.841264]  [<ffffffff813ac59d>] ? sock_do_ioctl+0x3d/0x50
      [   87.841288]  [<ffffffff813aca68>] ? sock_ioctl+0x1e8/0x2c0
      [   87.841312]  [<ffffffff811934bf>] ? do_vfs_ioctl+0x2cf/0x4b0
      [   87.841335]  [<ffffffff813afeb9>] ? __sys_sendmsg+0x39/0x70
      [   87.841362]  [<ffffffff814b86f9>] ? system_call_fastpath+0x16/0x1b
      [   87.841386] Code: c0 74 10 48 89 ef ff d0 83 c0 07 83 e0 fc 48 98 49 01 c7 48 89 ef e8 d0 d6 fe ff 48 85 c0 0f 84 df 00 00 00 48 8b 90 08 07 00 00 <48> 8b 8a a8 00 00 00 31 d2 48 85 c9 74 0c 48 89 ee 48 89 c7 ff
      [   87.841529] RIP  [<ffffffff813d47c0>] if_nlmsg_size+0xf0/0x220
      [   87.841555]  RSP <ffff880221aa5950>
      [   87.841569] CR2: 00000000000000a8
      [   87.851442] ---[ end trace e42ab217691b4fc2 ]---
      Signed-off-by: NFernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
      Acked-by: NJiri Pirko <jiri@resnulli.us>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6049f253
  26. 24 1月, 2014 1 次提交
  27. 23 1月, 2014 1 次提交